An aero-engine life-cycle maintenance policy optimization algorithm: Reinforcement learning approach

CHINESE JOURNAL OF AERONAUTICS, 2019, Issue 9

Zhen LI, Shisheng ZHONG, Lin LIN

School of Mechatronics Engineering, Harbin Institute of Technology, Harbin 150001, China

KEYWORDS: Aero-engine; Hybrid strategy; Maintenance policy; Optimization algorithm; Reinforcement learning

Abstract An aero-engine maintenance policy plays a crucial role in reasonably reducing maintenance cost. An aero-engine is a type of complex equipment with a long service-life, and in engineering a hybrid maintenance strategy is adopted to improve aero-engine operational reliability. Thus, the long service-life and the hybrid maintenance strategy should be considered synchronously in aero-engine maintenance policy optimization. This paper proposes an aero-engine life-cycle maintenance policy optimization algorithm that synchronously considers the long service-life and the hybrid maintenance strategy. The reinforcement learning approach was adopted to illustrate the optimization framework, in which maintenance policy optimization was formulated as a Markov decision process. In the reinforcement learning framework, the Gauss-Seidel value iteration algorithm was adopted to optimize the maintenance policy. Compared with traditional aero-engine maintenance policy optimization methods, the proposed algorithm can address the long service-life and the hybrid maintenance strategy synchronously. Two numerical experiments and algorithm analyses were performed to illustrate the optimization algorithm in detail.

1. Introduction

An aero-engine, which is composed of mechanic-electric-hydraulic coupling systems, is the power plant of an aircraft [1,2]. It has been reported that more than 30% of aircraft mechanical problems are related to aero-engines, and that aero-engine maintenance contributes about 30% of an airline's direct operating cost [3]. A maintenance optimization method provides an available way to reduce the maintenance cost reasonably [4]. In general, excessive maintenance is costly, while insufficient maintenance may lead to disasters. Thus, a maintenance policy plays a crucial role in balancing maintenance cost and operational reliability [5]. However, it is not easy to optimize an aero-engine maintenance policy manually, especially when the long service-life and the hybrid maintenance strategy are taken into consideration synchronously.

In engineering, a hybrid maintenance strategy is adopted to improve civil aero-engine operational reliability. The strategies of Condition-Based Maintenance (CBM), Hard-time Maintenance (HM), and failure Corrective Maintenance (CM) are included in the hybrid maintenance strategy [6]. As aero-engine performance deterioration is inevitable [7], gas path performance parameters are monitored for CBM. A Life Limit Part (LLP) should be replaced before its life limitation [8], and HM is adopted. Moreover, CM is performed when an aero-engine is in the random failure state. Thus, the hybrid maintenance strategy should be considered in aero-engine maintenance policy optimization. However, few existing aero-engine maintenance optimization methods are able to address the hybrid strategy. At the same time, an aero-engine is a type of equipment with a long service-life [9], which should be synchronously considered in maintenance policy optimization. To the best of our knowledge, this is the first paper to investigate an aero-engine maintenance policy optimization method that can synchronously address the hybrid maintenance strategy and the long service-life.

In traditional aero-engine maintenance optimization methods, maintenance intervals decide when to repair an aero-engine, and maintenance work-scopes indicate how to carry out maintenance actions. Maintenance intervals and work-scopes can be obtained by traditional separate models. For example, LLP replacement intervals and performance recovery intervals can be optimized separately by current models [10-12]. Based on maintenance intervals, maintenance work-scopes are obtained by traditional decision-making models [13,14]. Based on traditional optimization methods, an aero-engine maintenance decision support system was proposed by Fu et al. [15], in which maintenance interval and work-scope optimization models were presented. An optimization method for reliability-centered maintenance was proposed by Crocker and Kumar [16], in which the concepts of soft life and hard life were used to optimize a military aero-engine maintenance policy. A multi-objective evolutionary algorithm was also adopted to solve the aero-engine maintenance scheduling problem [17], taking module exchange into consideration. For aero-engine maintenance policy optimization, traditional optimization methods become extremely complicated when the long service-life and the hybrid maintenance strategy are considered synchronously [18].

In general, machine learning methods include supervised learning, unsupervised learning, and reinforcement learning [19], and the reinforcement learning method has attracted increasing interest in solving decision-making problems [20]. Reinforcement learning is a machine learning method in which an agent learns how to behave through action rewards. Different from widely used supervised learning methods, there is no presentation of input-output pairs in reinforcement learning. In reinforcement learning, an agent chooses an available action according to the environment on each decision epoch. The chosen action changes the environment, along with a reward to the agent. The objective of the agent is to find the action collection whose reward is maximal in the long run.

Reinforcement learning methods have been adopted in energy system charging policy optimization [21,22], energy system and distributed system schedule determination [23,24], multiple robotic task optimization [25], demand response optimization [26,27], robust control optimization [28], multiple satellite task planning [29], etc. Although reinforcement learning methods have been successfully applied in these fields, they have not attracted much attention in aero-engine maintenance policy optimization. Reinforcement learning does provide a more appropriate way for aero-engine life-cycle maintenance policy optimization. Thus, an aero-engine life-cycle maintenance policy optimization algorithm is proposed based on reinforcement learning. The main contributions of this paper are as follows:

(1) An aero-engine life-cycle maintenance policy optimization algorithm is proposed, which can synchronously address the aero-engine long service-life and the hybrid maintenance strategy.

(2) To address the hybrid strategy, the aero-engine state is represented by a multi-dimensional state space.

(3) Reinforcement learning is adopted to illustrate the maintenance policy optimization. In the reinforcement learning framework, maintenance policy optimization is formulated as a discrete Markov Decision Process (MDP), and the Gauss-Seidel value iteration algorithm is adopted to optimize the maintenance policy.

The remainder of this paper is organized as follows. Section 2 introduces aero-engine maintenance policy optimization and the reinforcement learning approach, and the deficiencies of traditional optimization methods are analyzed. In Section 3, the proposed aero-engine life-cycle maintenance policy optimization algorithm is described in detail. In Section 4, two simulation experiments and algorithm analyses are used to illustrate the proposed optimization algorithm. Conclusions and future work are discussed in Section 5.

2. Aero-engine maintenance policy optimization and reinforcement learning

2.1. Aero-engine maintenance policy optimization

A maintenance policy indicates when and how to repair an aero-engine. Traditionally, maintenance intervals indicate when to repair an aero-engine; they are obtained by traditional separate optimization models that address the CBM, HM, and CM strategies separately. In traditional methods, maintenance work-scopes indicate how to repair an aero-engine; they are optimized based on the outputs of the interval optimization models. In summary, there are several deficiencies in traditional aero-engine maintenance optimization methods, as follows:

(1) In traditional methods, maintenance intervals and work-scopes are optimized by separate models, and work-scopes are optimized based on interval optimization results. Thus, the interactions of maintenance intervals and work-scopes are neglected. Moreover, interval optimization errors would be propagated to work-scope optimization.

(2) Because the hybrid maintenance strategy is adopted for civil aero-engines, hybrid strategies should be addressed synchronously in optimization. However, traditional optimization methods address hybrid strategies separately, and interactions of hybrid strategies are neglected.

(3) It is difficult for traditional optimization methods to address the aero-engine long service-life and the hybrid maintenance strategy synchronously.

(4) Traditional optimization methods produce definite (deterministic) results. Due to random factors, such definite results may be poorly applicable in engineering.

To deal with the aforementioned deficiencies of traditional methods, an aero-engine life-cycle maintenance policy optimization algorithm is proposed based on the reinforcement learning approach. Taking the place of the maintenance intervals and work-scopes in traditional methods, the maintenance policy indicates when and how to repair an aero-engine. The proposed optimization algorithm is able to address the hybrid maintenance strategy and the long service-life synchronously.

To address the hybrid maintenance strategy, a multi-dimensional space is adopted to represent the aero-engine state. The maintenance strategies of CBM and CM are formulated as an MDP, and imperfect repair and random factors are both considered in state transition. Aero-engine maintenance policy optimization is illustrated by the reinforcement learning approach, and the Gauss-Seidel value iteration algorithm is adopted to seek the optimal maintenance policy. In the reinforcement learning framework, the Gauss-Seidel value iteration algorithm makes it feasible to optimize the life-cycle maintenance policy. A comparison between traditional optimization methods and the proposed optimization algorithm is shown in Fig. 1.

In Fig. 1, traditional aero-engine maintenance optimization methods are shown on the left. Traditional methods are constituted by separate models, including an LLP interval optimization model, a performance interval optimization model, and a work-scope optimization model. LLP replacement and performance recovery intervals are obtained by the separate optimization models. Based on the LLP replacement, performance recovery, and corrective maintenance intervals, the maintenance work-scope is optimized. The proposed aero-engine life-cycle maintenance policy optimization algorithm is shown on the right in Fig. 1. Based on the reinforcement learning framework, the strategies of HM, CM, and CBM are addressed synchronously by the proposed optimization algorithm. The traditional separate optimization models of the LLP replacement interval, the performance recovery interval, and the maintenance work-scope are replaced by the proposed optimization algorithm.

2.2. Reinforcement learning approach

Reinforcement learning is a machine learning method that is widely used to solve multi-step, sequential decision problems. Different from supervised learning, no pre-specified model is required in reinforcement learning. In aero-engine life-cycle maintenance policy optimization, little historical data is available for training a pre-specified model. Thus, reinforcement learning provides a more appropriate way for aero-engine life-cycle maintenance policy optimization. Meanwhile, the aero-engine long service-life and the hybrid maintenance strategy can be addressed synchronously by reinforcement learning.

In reinforcement learning, an agent takes on the work of optimizing an aero-engine maintenance policy. The agent is able to respond to dynamically changing aero-engine states through ongoing learning [30]. In aero-engine maintenance policy optimization, aero-engine states are represented by a multi-dimensional state space. To optimize the maintenance policy, the agent chooses a maintenance action according to the aero-engine state. The aero-engine state is changed by the chosen maintenance action, along with a maintenance cost, as shown in Fig. 2. The objective of the agent is to find the maintenance action collection whose total cost is the minimum in the long run.

Fig. 2 Schematic diagram of reinforcement learning.

Value iteration is a reinforcement learning algorithm that is widely adopted in solving decision-making problems. In the reinforcement learning framework, the Gauss-Seidel value iteration algorithm provides an appropriate way to optimize an aero-engine life-cycle maintenance policy. According to the Gauss-Seidel value iteration algorithm, an agent runs multiple episodes for the purpose of exploring and finding the optimal policy. The learning process is conducted for a sufficient number of iterations, and the total cost of each iteration is recorded. The minimum total cost is represented as the Q-value, which is updated on every iteration, and the Bellman equation is adopted as the updating mechanism in the Gauss-Seidel value iteration algorithm. The convergence of value iteration methods has been widely proven [31]. Thus, based on the reinforcement learning framework, the Gauss-Seidel value iteration algorithm is adopted in the proposed aero-engine life-cycle maintenance policy optimization algorithm.
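To make the update mechanism concrete, the following is a minimal sketch of Gauss-Seidel value iteration for a generic cost-minimizing MDP. The transition tensor and cost matrix at the bottom are small hypothetical placeholders, not the model of this paper; the point is that each state's value is overwritten in place, so states visited later in the same sweep already use the refreshed values.

```python
import numpy as np

def gauss_seidel_value_iteration(P, C, gamma=0.9, tol=1e-6, max_iter=10000):
    """Gauss-Seidel value iteration for a cost-minimizing MDP.

    P: (n_actions, n_states, n_states) transition probabilities.
    C: (n_states, n_actions) immediate costs.
    Returns the optimal value function V and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        delta = 0.0
        for s in range(n_states):
            # Bellman update; V already holds values refreshed earlier in this sweep.
            q = C[s] + gamma * P[:, s, :] @ V
            v_new = q.min()
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    policy = np.array([(C[s] + gamma * P[:, s, :] @ V).argmin() for s in range(n_states)])
    return V, policy

# Tiny hypothetical example: 3 degradation states, 2 actions (0 = do nothing, 1 = repair).
P = np.array([[[0.7, 0.3, 0.0], [0.0, 0.6, 0.4], [0.0, 0.0, 1.0]],
              [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.9, 0.1, 0.0]]])
C = np.array([[0.0, 5.0], [1.0, 5.0], [10.0, 5.0]])
print(gauss_seidel_value_iteration(P, C))
```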

3. Aero-engine life-cycle maintenance policy optimization algorithm

The reinforcement learning approach is adopted in the proposed aero-engine life-cycle maintenance policy optimization algorithm, in which the hybrid maintenance strategy and the long service-life are considered synchronously. In the reinforcement learning framework, the aero-engine state, maintenance actions, state transition matrices, and maintenance costs should be determined. To address the hybrid maintenance strategy, a multi-dimensional state space is adopted to represent the aero-engine state, taking performance, LLP, and random failure states into consideration synchronously. The objective of the proposed optimization algorithm is to obtain a maintenance policy whose total cost is the minimum in the long run. The Gauss-Seidel value iteration reinforcement learning algorithm is adopted to address long-term optimization. In this section, the proposed optimization algorithm is described in detail.

3.1. Aero-engine states

In the reinforcement learning framework, the aero-engine state is changed by performed actions. As the hybrid maintenance strategy is considered in the proposed optimization algorithm, a multi-dimensional state space is adopted to represent the aero-engine state, in which performance, LLP, and random failure states are considered synchronously. The multi-dimensional state space is represented by S = {Xi, Yj, Zk, ...}, where Xi, Yj, Zk, ... denote the sub-states in the multi-dimensional state space. Each sub-state denotes one considered factor; for example, sub-state Xi denotes the LLP state, sub-state Yj denotes the performance state, and sub-state Zk denotes the random failure state.

(1) LLP state

In engineering, an LLP must be replaced before its life limitation, and the HM strategy is adopted for LLP maintenance. Traditionally, the flight-cycle or flight-hour is adopted to represent the LLP state. For example, a fan shaft is an LLP of a CFM56-5B aero-engine, and its life limitation is 30,000 flight-cycles; that is, a fan shaft must be replaced before 30,000 flight-cycles. For convenience, the LLP state is represented by discrete time increments [32]. In the proposed optimization algorithm, the LLP state is represented by several state levels, denoted as {Ti | i = 0, 1, 2, ...}, where T0 denotes the all-new state; when m < n, state Tn is "older" than state Tm. It is defined that an LLP is in state Tn when tn-1 < tllp ≤ tn, where tn-1 and tn are boundary values and tllp is the real LLP state, measured in flight-cycles or flight-hours.

(2) Performance state

As the CBM strategy is adopted in aero-engine maintenance, the aero-engine performance state should also be considered in the proposed optimization algorithm. Most in-service aero-engines are equipped with condition monitoring systems, and performance parameters are sent to the ground by the aircraft communication addressing and reporting system in quasi real-time. Traditionally, the aero-engine performance state is assessed by performance parameters [33,34]. In engineering, the Exhaust Gas Temperature Margin (EGTM) is adopted as an aero-engine performance indicator [35]. The EGTM is defined as the margin between the exhaust gas temperature and the red-line temperature. Aero-engine performance degrades as the engine operates, presented as EGTM decline [36,37]. When the EGTM is close to its limitation, performance recovery maintenance should be performed. The EGTM parameters of a CFM56-5B aero-engine over 1000 flight-cycles are shown in Fig. 3.

As EGTM parameters are typical time series [38], the algorithm would become extremely complicated if EGTM time series were adopted directly. For convenience, the CBM strategy is formulated as a discrete MDP in the reinforcement learning framework. Thus, the aero-engine performance state is represented by several levels, denoted as {Di | i = 0, 1, 2, ...}, where D0 denotes the all-new performance state; when m < n, state Dn is worse than state Dm. It is defined that the performance state is Dn when dn-1 < dper ≤ dn, in which dn-1 and dn are the boundary values and dper is the real EGTM value, measured in degrees centigrade. Thus, in the proposed optimization algorithm, the aero-engine performance state is indicated by the performance levels.
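As a small illustration of this discretization, the sketch below maps a raw LLP usage figure and an EGTM reading to their state levels; the boundary values are hypothetical, since the paper does not list its cut-offs.

```python
import bisect

# Hypothetical boundary values; the paper does not publish its level cut-offs.
LLP_BOUNDS = [6000, 12000, 18000, 24000, 30000]   # flight-cycles: upper bounds of T1..T5
EGTM_BOUNDS = [60, 45, 35, 25, 15]                # deg C: lower bounds of D1..D5 (best to worst)

def llp_level(t_llp):
    """Return n such that t_{n-1} < t_llp <= t_n; T0 is reserved for an all-new part."""
    if t_llp <= 0:
        return 0
    return bisect.bisect_left(LLP_BOUNDS, t_llp) + 1

def performance_level(egtm):
    """D0 is reserved for the all-new engine; D1..D5 get worse as the EGTM declines."""
    for level, lower_bound in enumerate(EGTM_BOUNDS, start=1):
        if egtm >= lower_bound:
            return level
    return len(EGTM_BOUNDS)

print(llp_level(14500), performance_level(38))   # -> 3 3
```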

Fig. 3 1000 flight-cycles EGTM of a CFM56-5B aero-engine.

(3) Random failure state

Although an aero-engine has high reliability, it is subject to random failures in practice. When an aero-engine is in the random failure state, CM should be performed to drive it back to the working state. Unlike the LLP or performance state, the random failure state is represented by two levels, denoted as {F0, F1}, where F0 denotes the working state and F1 denotes the random failure state.

From the above, when LLP, performance, and random failure states are all considered, the aero-engine state is represented by a multi-dimensional state space, denoted as

S = {Ti, Dj, Fk}

where Ti denotes the LLP state, Dj denotes the performance state, and Fk denotes the random failure state.

3.2. Maintenance actions

In reinforcement learning, maintenance actions are performed on every decision epoch. The decision epoch is denoted as {Ei | i = 0, 1, 2, ..., m, ...}, and state Si is regarded as the status between epochs Ei-1 and Ei. As LLP, performance, and random failure states are all considered, LLP replacement actions, performance recovery actions, and failure corrective actions should be determined on each decision epoch.

LLP replacement actions denote maintenance actions of replacing an aero-engine LLP. When an LLP replacement action is performed, the definite LLP is replaced, and the LLP state is changed. LLP replacement actions are represented by {Arep,i | i = 0, 1, 2, ...}, where Arep,0 denotes no LLP replaced and Arep,m (m ≠ 0) denotes LLP m replaced. When Arep,m is performed, the LLP m state is changed to the all-new state.

Performance recovery actions denote maintenance actions of recovering aero-engine performance. When a performance recovery action is performed, the performance state is recovered by a definite level, and the performance state is changed. Performance recovery actions are represented by {Arec,j | j = 0, 1, 2, ...}, where Arec,0 denotes no performance recovery action performed and Arec,m (m ≠ 0) denotes the action of recovering m performance levels.

Failure corrective actions denote maintenance actions of bringing a failure-state aero-engine back to the running state. When the aero-engine is trapped in the random failure state, a failure corrective action should be performed. Failure corrective actions are represented by Acor = {Acor,0, Acor,1}, where Acor,0 denotes no corrective maintenance performed and Acor,1 denotes corrective maintenance performed.

From the above, aero-engine maintenance actions are represented by

A = {Arep,i, Arec,j, Acor,k}

Because an aero-engine does not operate during the maintenance process, maintenance actions are assumed to be "instantaneous" in the proposed optimization algorithm [39].
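For illustration only, the combined maintenance action set can be enumerated as tuples of the three action types; the sizes below (one replaceable LLP, three recovery depths) are chosen to match the kind of setting used later in the numerical experiments and are otherwise arbitrary.

```python
from itertools import product

# Illustrative action-type sizes.
rep_actions = [0, 1]          # Arep,0 (no replacement), Arep,1 (replace the LLP)
rec_actions = [0, 1, 2, 3]    # Arec,0 .. Arec,3 (recover 0..3 performance levels)
cor_actions = [0, 1]          # Acor,0 (no corrective maintenance), Acor,1 (corrective maintenance)

# Each combined maintenance action is a tuple (Arep_i, Arec_j, Acor_k).
actions = list(product(rep_actions, rec_actions, cor_actions))
print(len(actions))   # 16 combined actions in this illustrative setting
```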

3.3. State transition

In the reinforcement learning framework, the aero-engine state is changed by performed maintenance actions. Thus, LLP, performance, and random failure state transitions are illustrated as follows.

(1) LLP state transition

In engineering, the LLP state is measured by the flight-cycle or flight-hour, which increases directly as an aero-engine operates. The LLP state is recovered to the all-new state when an LLP replacement action is performed. Thus, the LLP state transfers deterministically, without uncertainty.

When action Arep,0 is performed, LLP state Ti would transfer to the definite state Ti+1, that is, p(Ti+1 | Ti, Arep,0) = 1. When action Arep,1 is performed, LLP state Ti would transfer to T0, that is, p(T0 | Ti, Arep,1) = 1. A schematic diagram of LLP state transition is shown in Fig. 4.

(2) Performance state transition

Aero-engine performance deterioration is inevitable in engineering. As the imperfect maintenance concept and random factors are considered in the proposed optimization algorithm, the CBM strategy is formulated as a discrete MDP. Probability matrices are adopted in performance state transition.


Because the maintenance concept of "as good as new" has been proven to be far from the truth [40], a more realistic concept of imperfect repair is adopted for performance recovery actions [41,42]. That is, the performance state cannot be recovered to the all-new state by any maintenance action, and the performance state would transfer according to the transition probability matrices. When action Arec,m (m > 0) is performed, the performance state would transfer from Di to Di-m according to the probability matrix [p(Di-m | Di, Arec,m), i-m > 0]. A schematic diagram of performance state transition is shown in Fig. 5.

The performance state transition probability matrices can be calculated by survival analysis based on the Weibull distribution [43,44].
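The paper only points to Weibull-based survival analysis [43,44] for estimating these matrices; the sketch below shows the basic idea under assumed shape and scale parameters. The one-epoch degradation probability is taken as the conditional probability of dropping one performance level during the next epoch, given survival up to the current time; all parameter values are illustrative.

```python
import math

def weibull_survival(t, beta, eta):
    """Weibull survival function S(t) = exp(-(t / eta) ** beta)."""
    return math.exp(-((t / eta) ** beta))

def one_step_degradation_prob(t, dt, beta, eta):
    """Probability of dropping to the next performance level during the coming
    epoch of length dt, conditional on having stayed at the current level until t."""
    return 1.0 - weibull_survival(t + dt, beta, eta) / weibull_survival(t, beta, eta)

# Hypothetical parameters: shape beta = 2.2, scale eta = 6000 flight-cycles, 500-cycle epoch.
p_degrade = one_step_degradation_prob(t=4000, dt=500, beta=2.2, eta=6000)
print(round(p_degrade, 3))
```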

Fig. 4 Schematic diagram of LLP state transition.

Fig. 5 Schematic diagram of performance state transition.

(3) Random failure state transition

As an aero-engine may occasionally be in the random failure state, the CM strategy is also formulated as a discrete MDP. The random failure state would transfer according to probability matrices.

In the proposed optimization algorithm, an aero-engine may fall into the random failure state F1 with probability p(F1 | Dj, F0; Acor,0). To be more realistic, it is assumed that the random failure probability is lower when the aero-engine is in a better performance state. Thus, the random failure probability p(F1 | Dj, F0) is related to the performance state Dj. When the aero-engine is in the random failure state, corrective maintenance should be performed to drive it back to the working state. It is assumed that corrective maintenance is completely efficient, and the transition probability of corrective maintenance is represented as p(F0 | Dj, F1; Acor,1) = 1. A schematic diagram of random failure state transition is shown in Fig. 6.

From the above, the state transition matrix on Ei is represented by p(Si+1 | Si, Ai), where Si+1 denotes the aero-engine state on Ei+1, Si denotes the aero-engine state on Ei, and Ai denotes the maintenance action performed on Ei. Different sub-states transfer according to the different modes illustrated above.
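As a sketch of how the joint matrix p(Si+1 | Si, Ai) could be assembled, the snippet below combines a probabilistic performance/failure transition row with the deterministic LLP transition, assuming (purely for illustration) that the two sub-state transitions are independent given the chosen action.

```python
import numpy as np

def joint_transition_row(p_perf_next, llp_next, n_llp_levels):
    """Compose one row of p(S'|S,A) over (performance level, LLP level) pairs.

    p_perf_next  : 1-D array, probabilities over the next performance/failure level.
    llp_next     : int, the deterministic next LLP level under the chosen action.
    n_llp_levels : int, number of LLP levels.
    """
    row = np.zeros((len(p_perf_next), n_llp_levels))
    row[:, llp_next] = p_perf_next   # all mass lands on the single definite LLP level
    return row

# Illustrative: 7 performance/failure levels, 6 LLP levels, no replacement (LLP level 2 -> 3).
p_next = np.array([0.00, 0.10, 0.60, 0.20, 0.05, 0.03, 0.02])
row = joint_transition_row(p_next, llp_next=3, n_llp_levels=6)
print(row.shape, round(float(row.sum()), 6))   # (7, 6) 1.0
```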

3.4. Total cost and optimization

In reinforcement learning, an agent chooses maintenance actions according to action costs [44,45]. In the proposed optimization algorithm, the objective is to obtain a maintenance policy whose total cost is the minimum in the long run. The maintenance cost on decision epoch Ek is calculated by

ck = Cope,k + Crep,k + Crec,k + Ccor,k + Cother

where ck denotes the maintenance cost on decision epoch Ek; Cope,k denotes the operating cost; Crep,k, Crec,k, Ccor,k, and Cother denote the LLP replacement cost, the performance recovery cost, the corrective maintenance cost, and other costs, respectively.

Fig. 6 Schematic diagram of random failure state transition.

In general, an aero-engine in a good performance state would have better economic efficiency and a lower random failure rate. Thus, Cope,k is determined by the performance state. When an LLP replacement action is performed, material and replacement costs are both counted in Crep,k(Arep,k). LLP replacement costs vary across different LLP replacement actions. The performance recovery cost is represented by Crec,k(Arec,k), and when m > n, Crec,k(Arec,m) > Crec,k(Arec,n). The corrective maintenance cost Ccor,k is counted when corrective maintenance is performed.
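As an illustration of this per-epoch cost, the sketch below assembles ck from simple lookup tables. The LLP replacement and recovery figures echo the hypothetical values used later in Experiment 1, while the operating, corrective, and other costs are additional placeholders of our own.

```python
# Hypothetical cost tables (arbitrary units).
OPERATING_COST = {0: 10, 1: 12, 2: 15, 3: 19, 4: 24, 5: 30}   # Cope,k by performance level
LLP_REPLACEMENT_COST = {0: 0, 1: 350}                          # Crep,k for Arep,0 / Arep,1
RECOVERY_COST = {0: 0, 1: 300, 2: 350, 3: 500}                 # Crec,k for Arec,0 .. Arec,3
CORRECTIVE_COST = {0: 0, 1: 400}                               # Ccor,k for Acor,0 / Acor,1
OTHER_COST = 5                                                 # Cother

def epoch_cost(perf_level, a_rep, a_rec, a_cor):
    """c_k = Cope,k + Crep,k + Crec,k + Ccor,k + Cother."""
    return (OPERATING_COST[perf_level] + LLP_REPLACEMENT_COST[a_rep]
            + RECOVERY_COST[a_rec] + CORRECTIVE_COST[a_cor] + OTHER_COST)

print(epoch_cost(perf_level=4, a_rep=0, a_rec=2, a_cor=0))   # 24 + 0 + 350 + 0 + 5 = 379
```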

As the life-cycle maintenance policy is optimized by the proposed optimization algorithm, future maintenance costs should be counted in the optimization. In reinforcement learning, a discount factor is adopted to address long-term optimization. Thus, the optimization objective of the proposed algorithm is denoted as

min C,  C = E[ Σk γ^k ck ]

where C denotes the discounted future cost over decision epochs k = 0, 1, 2, ..., and γ (γ ∈ [0,1]) denotes the discount factor, representing the relative impact of future action costs.

In reinforcement learning, when a larger γ is adopted, future action costs have a greater impact on maintenance action selection. That is, when γ = 0, the optimized policy is shortsighted, and the maintenance action is chosen by the current cost alone; when γ = 1, all future actions are considered in action selection, which brings a heavy calculation burden. Thus, a balance between future costs and the calculation burden should be determined, and the discount factor should be set as γ ∈ (0,1), for example, γ = 0.9.
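As a quick numerical illustration, with γ = 0.9 a cost incurred 10 decision epochs in the future is weighted by 0.9^10 ≈ 0.35 of its nominal value, whereas with γ = 0.5 the same cost is weighted by only about 0.001; a relatively large discount factor is therefore appropriate for life-cycle optimization of long-lived equipment.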

Because the Gauss-Seidel value iteration algorithm is an effective reinforcement learning algorithm that is widely used in policy optimization, it is adopted to seek the maintenance action collection whose discounted long-term cost is the minimum.

4. Numerical experiments of maintenance policy optimization

Two numerical experiments were used to illustrate the proposed aero-engine life-cycle maintenance policy optimization algorithm in detail. Following the reinforcement learning framework, the determinations of aero-engine states, maintenance actions, state transition matrices, and total cost matrices are described first. As traditional methods are unable to address the long service-life and the hybrid maintenance strategy synchronously, they were not adopted as benchmark methods. The Gauss-Seidel value iteration reinforcement learning algorithm was adopted in the experiments.

4.1. Aero-engine states

In the reinforcement learning framework, the aero-engine state should be determined first. As the multi-dimensional state space was adopted to represent the aero-engine state, performance, LLP, and random failure states were all considered in the first numerical experiment.

The EGTM was adopted to represent the aero-engine performance state. For convenience, the EGTM time series were divided into several levels. In a sample fleet, there were three performance recovery actions, including minimal repair, medium repair, and overhaul repair. According to the performance recovery assumptions in Section 4.2, there should be at least five performance levels to fully illustrate the three performance recovery actions. In reinforcement learning, more performance levels would make the optimization algorithm more complicated, and five levels were sufficient to present the aero-engine performance state. In the numerical experiment, the performance state was therefore divided into five levels, denoted as {D1, D2, D3, D4, D5}, from good to bad, where D5 denoted the worst performance level. Besides, the all-new performance state was denoted as D0.

Because performance and random failure states were both transferred by probabilities, the random failure state was regarded as a specific "performance state". Thus, in the numerical experiment, performance and random failure states were represented by one state-space dimension, denoted as {D0, D1, D2, D3, D4, D5, F}.

Although there were several LLPs in an aero-engine, for convenience, LLPs with the same life limitation were regarded as the same LLP type. In the first numerical experiment, one LLP type was taken into account, and the LLP state was measured by flight-cycles. Referring to the adopted performance state levels, the LLP state was divided into five levels, denoted as {T1, T2, T3, T4, T5}. Besides, T0 denoted the all-new state. Thus, the aero-engine state was represented by a two-dimensional state space, denoted as

S = {Di, Tj},  i = 0, 1, ..., 6;  j = 0, 1, ..., 5

where Di (i = 0, 1, 2, 3, 4, 5) denoted the performance state; D6 denoted the random failure state F; Tj (j = 0, 1, 2, 3, 4, 5) denoted the LLP state. The performance and random failure states were denoted as the first dimension, and the LLP state was denoted as the second dimension.
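For illustration, the 7 × 6 = 42 states of this two-dimensional space can be enumerated directly; the indexing below simply follows the convention above.

```python
from itertools import product

PERF_LEVELS = range(7)   # D0..D5 plus D6 = random failure state F
LLP_LEVELS = range(6)    # T0..T5

states = list(product(PERF_LEVELS, LLP_LEVELS))
state_index = {s: k for k, s in enumerate(states)}   # map (Di, Tj) to a flat index
print(len(states), state_index[(2, 3)])              # 42 states; (D2, T3) -> index 15
```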

Fig. 7 Schematic diagrams of performance recovery action effects on performance states.

4.2. Maintenance actions

In reinforcement learning, the aero-engine state was changed by performed maintenance actions, and maintenance actions should be determined according to the aero-engine state. First, according to traditional and practical considerations [46], two performance recovery assumptions were made in the numerical experiment as follows:

(1) An aero-engine could not be recovered to the all-new performance state by any performance recovery actions.

(2) No maintenance action should be performed when the aero-engine was in the worst performance state.

According to the sample fleet operation, the three actions of minimal repair, medium repair, and overhaul repair were included in the performance recovery actions. Thus, three performance recovery actions were adopted in the numerical experiment, denoted as Arec = {Arec,1, Arec,2, Arec,3}, where Arec,1 was defined as the action to recover the performance state from Dx to Dx-1 when x-1 > 0, or to keep the performance state Dx when x = 1; Arec,2 was defined as the action to recover the performance state from Dx to Dx-2 when x-2 > 0, or to recover the performance state from Dx to D1 when 1 ≤ x ≤ 2; Arec,3 was defined as the action to recover the performance state from Dx to Dx-3 when x-3 > 0, or to recover the performance state from Dx to D1 when 1 ≤ x ≤ 3. Besides, Arec,0 denoted that no performance recovery action was performed. In the reinforcement learning framework, Fig. 7 shows performance recovery action effects on performance states.
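A compact sketch of the recovery rule just described (our own encoding, not the paper's code): move up by the recovery depth but never beyond D1, because assumption (1) forbids restoring the all-new state D0.

```python
def recovered_level(current_level, recovery_depth):
    """Apply Arec,m to performance level Dx: recover m levels, clipped at D1,
    since no maintenance action can restore the all-new state D0."""
    if recovery_depth == 0:
        return current_level                  # Arec,0: no recovery performed
    return max(current_level - recovery_depth, 1)

# Arec,2 from D4 -> D2; Arec,3 from D2 -> D1 (clipped); Arec,1 from D1 stays at D1.
print(recovered_level(4, 2), recovered_level(2, 3), recovered_level(1, 1))   # 2 1 1
```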

In the numerical experiment, the random failure state was regarded as a specific ‘‘performance state”. Thus, it was assumed that performance recovery actions could drive the random failure aero-engine back to the working state.

As one LLP type was considered, LLP replacement actions were denoted as Arep = {Arep,0, Arep,1}. Fig. 8 shows the LLP replacement action effect on LLP states.

From the above, maintenance actions in the numerical experiment were represented by

A = {Arep,i, Arec,j},  i = 0, 1;  j = 0, 1, 2, 3

4.3. State transition

Fig. 8 Schematic diagram of the LLP replacement action effect on LLP states.

Based on the aforementioned methods, the performance states Di and the transition probability matrix P0 were obtained from the sample fleet, as shown in

The transition matrices of Arec,1, Arec,2, and Arec,3 were P1, P2, and P3, denoted as

where nllp denoted the number of LLP types, and nreca denoted the number of performance recovery actions.

4.4. Maintenance policy optimization

Based on the aforementioned methods, aero-engine states, maintenance actions, and state transition matrices were all determined. In the reinforcement learning framework, the Gauss-Seidel value iteration algorithm was adopted to optimize the aero-engine maintenance policy. The flow diagram of the proposed aero-engine maintenance policy optimization algorithm is shown in Fig. 9.

Table 2 Transition probability matrix of an LLP replacement action.

Fig. 9 Flow diagram of maintenance policy optimization.

Because real cost data was unavailable, a hypothetical cost matrix was adopted in the numerical experiment. In engineering, the cost matrix may change according to the actual maintenance cost, which would not distort the analysis of the simulation results. In the numerical experiment, the LLP replacement cost Cllp was assumed to be 350, while the performance recovery costs Crec,1, Crec,2, and Crec,3 were assumed to be 300, 350, and 500, respectively. As an aero-engine has a long service-life, future maintenance actions should be fully considered in life-cycle maintenance policy optimization. Thus, a larger discount factor γ = 0.9 was adopted. In contrast to the Jacobi value iteration algorithm, the Gauss-Seidel value iteration algorithm converges faster. Thus, in the reinforcement learning framework, the Gauss-Seidel value iteration algorithm was adopted to optimize the aero-engine maintenance policy.
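To show how these settings plug into the optimization, the sketch below feeds Experiment-1-sized state and action spaces, a placeholder transition tensor, and a placeholder cost matrix into the gauss_seidel_value_iteration routine sketched in Section 2.2; the random numbers stand in for the paper's transition and cost data, which are given only in its tables and figures.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 42, 8, 0.9     # 7 x 6 states, 2 x 4 combined actions

# Placeholder transition tensor: random rows normalised so that each row sums to 1.
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)

# Placeholder per-state, per-action costs; real values would follow Section 3.4.
C = rng.uniform(50, 500, size=(n_states, n_actions))

# Reuse the routine from the Section 2.2 sketch.
V, policy = gauss_seidel_value_iteration(P, C, gamma=gamma)
print(policy.reshape(7, 6))   # a policy map over (performance level, LLP level)
```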

As the aero-engine state was represented by a two-dimensional space, the optimal maintenance policy was presented by a two-dimensional policy map, shown in Fig. 10.

Optimal maintenance actions on each decision epoch were shown in the policy map, and decision epochs were represented by aero-engine states. In the maintenance policy map of Fig. 10, the LLP state was regarded as the ordinate, and the performance state was regarded as the abscissa. Different maintenance actions were presented in different colors and shapes. In the legend, A0 denoted action {Arep,0, Arec,0}; A1 denoted action {Arep,0, Arec,1}; A2 denoted action {Arep,0, Arec,2}; A4 denoted action {Arep,1, Arec,0}; A6 denoted action {Arep,1, Arec,2}. In engineering, a maintenance policy could be obtained according to the aero-engine state.

Fig. 10 Maintenance policy map of the numerical experiment.

4.5. Algorithm performance analysis

In this section, a more complex numerical experiment was conducted to illustrate the proposed optimization algorithm in more detail. In engineering, an aero-engine is usually composed of more than one LLP type, and some LLP states are measured in flight-hours, different from the one in the first numerical experiment. Thus, a more complex numerical experiment with two LLP types was conducted. To distinguish the two numerical experiments, the first numerical experiment was named Experiment 1, and the numerical experiment in this section was named Experiment 2.

Table 3 Transition matrices of no LLP replacement action.

Table 4 Transition probability matrices of LLP1 replacement actions.

4.5.1. Aero-engine states

Two LLP types were considered in Experiment 2, and the two LLP states were measured in different units: one in flight-cycles and the other in flight-hours. Thus, the aero-engine state was denoted as a three-dimensional state space. The three dimensions included the performance state, the LLP state measured in flight-cycles (LLP1), and the LLP state measured in flight-hours (LLP2).

Corresponding to Experiment 1, in addition to the all-new states T0 and L0, the LLP1 state was divided into two levels, denoted as {T0, T1, T2}, and the LLP2 state was divided into three levels, denoted as {L0, L1, L2, L3}. As in Experiment 1, the performance state was represented by five levels, and the random failure state was regarded as a specific "performance state". Thus, the aero-engine state space was represented by

S = {Di, Tj, Lk}

where Di (i = 0, 1, 2, 3, 4, 5) denoted the performance state; D6 denoted the random failure state; Tj denoted the LLP1 state; Lk denoted the LLP2 state.

4.5.2. Maintenance actions and state transition

In Experiment 2, LLP1 and LLP2 replacement actions were denoted as ArepT = {ArepT,0, ArepT,1} and ArepL = {ArepL,0, ArepL,1}, respectively. Performance recovery actions were the same as those in Experiment 1, and the performance recovery assumptions also applied. Maintenance actions in Experiment 2 were represented by

A = {Arec,i, ArepT,j, ArepL,k},  i = 0, 1, 2, 3;  j, k = 0, 1

Similar to LLP1, the LLP2 state would transfer as flight-hours increased, and no transition probabilities were involved for LLP2. However, the aero-engine state transition matrices were changed by LLP2.

On Em, action Am = {Arec,i, ArepT,0, ArepL,0 | i = 0, 1, 2, 3} denoted that no LLP was replaced, and the transition matrices are presented in Table 3, denoted as p(S, Arec,i, ArepT,0, ArepL,0 | i = 0, 1, 2, 3) = p3{PF,i, PN, PR,i | i = 0, 1, 2, 3}. The concrete forms of PF, PN, and PR were the same as those in Experiment 1.

State transition matrices of Am = {Arec,i, ArepT,1, ArepL,0 | i = 0, 1, 2, 3} are presented in Table 4, denoted as p(S, Arec,i, ArepT,1, ArepL,0 | i = 0, 1, 2, 3) = p4{PF,i, PN, PR,i | i = 0, 1, 2, 3}.

State transition matrices of Am = {Arec,i, ArepT,0, ArepL,1 | i = 0, 1, 2, 3} are presented in Table 5, denoted as p(S, Arec,i, ArepT,0, ArepL,1 | i = 0, 1, 2, 3) = p5{PF,i, PN, PR,i | i = 0, 1, 2, 3}.

State transition matrices of Am = {Arec,i, ArepT,1, ArepL,1 | i = 0, 1, 2, 3} are presented in Table 6, denoted as p(S, Arec,i, ArepT,1, ArepL,1 | i = 0, 1, 2, 3) = p6{PF,i, PN, PR,i | i = 0, 1, 2, 3}.

4.5.3. Maintenance policy optimization

In Experiment 2, the Gauss-Seidel value iteration reinforcement learning algorithm was also adopted to optimize the maintenance policy. Hypothetical costs were adopted: the LLP1 replacement cost Cllp,1 was assumed to be 300, while the LLP2 replacement cost Cllp,2 was assumed to be 600. The three performance recovery action costs were assumed to be 200, 500, and 800, respectively. The discount factor was set as γ = 0.9. Because the aero-engine state was represented by a three-dimensional space, including the LLP1, LLP2, and performance states, the optimal maintenance policy was represented by a three-dimensional policy map, shown in Fig. 11.

Table 6 Transition probability matrices of LLP1 and LLP2 replacement actions.

Fig. 11 Maintenance policy map of Experiment 2.

Table 7 Algorithm information of two experiments.

In the three-dimensional policy map, the LLP1, LLP2, and performance states were regarded as the x axis, z axis, and y axis, respectively. The maintenance actions were presented in different colors and shapes. In the legend of the maintenance policy map in Fig. 11, A0 denoted action {Arec,0, ArepT,0, ArepL,0}; A1 denoted action {Arec,1, ArepT,0, ArepL,0}; A4 denoted action {Arec,0, ArepT,1, ArepL,0}; A5 denoted action {Arec,0, ArepT,0, ArepL,1}; A6 denoted action {Arec,1, ArepT,1, ArepL,0}; A9 denoted action {Arec,1, ArepT,0, ArepL,1}; A12 denoted action {Arec,1, ArepT,1, ArepL,1}; A15 denoted action {Arec,0, ArepT,1, ArepL,1}. Based on the policy map, a maintenance policy was obtained according to the aero-engine state.

4.5.4. Algorithm analysis

In the aforementioned two numerical experiments, a two-dimensional state space was adopted in Experiment 1, while a three-dimensional state space was adopted in Experiment 2. It was obvious that the state transition matrices of Experiment 2 were more complicated. Thus, the algorithm became more complex as the number of states increased. The algorithm information of the two numerical experiments is shown in Table 7.

In the aforementioned numerical experiments, performance and random failure states were defined as probability states, because they transferred by probabilities. LLP states did not transfer by probabilities and were defined as definite states. As shown in the two numerical experiments, the forms of the state transition matrices were impacted by the definite states, while the transition probabilities were impacted by the probability states. In reinforcement learning, a larger state space makes the algorithm more complicated, and more iterations are needed to seek the optimal policy. The impact of aero-engine state space complexity on the algorithm was obvious.

As the discount factor is an important coefficient in reinforcement learning, its impact was analyzed by contrast experiments, which were based on Experiment 1. Apart from the discount factor, the other parameters were all the same as those in Experiment 1. Maintenance policy maps of the discount factor analysis are shown in the subgraphs of Fig. 12.

Fig. 12 Policy maps of discount factor analysis in Example 1.

In the policy maps, different maintenance actions were represented by different colors and shapes. In the legend of Fig. 12, A0 denoted action {Arep,0, Arec,0}; A1 denoted action {Arep,0, Arec,1}; A2 denoted action {Arep,0, Arec,2}; A4 denoted action {Arep,1, Arec,0}; A5 denoted action {Arep,1, Arec,1}; A6 denoted action {Arep,1, Arec,2}. The analysis experiments showed that when the discount factor was set as γ > 0.43, the optimal policy maps were the same as that in Fig. 10; when the discount factor was set as γ < 0.35, the optimal policy maps were the same as that in Fig. 12(c). In Fig. 12, the discount factors were set as γ = 0.43, γ = 0.41, and γ = 0.35, respectively. The analysis experiments showed that when a smaller discount factor was adopted, more low-cost maintenance actions were adopted in the optimal maintenance policy, which is consistent with the aforementioned analysis of the discount factor.

As hypothetical maintenance costs were adopted in the numerical experiments, cost impacts were analyzed. Based on the aforementioned experiments, the medium performance recovery cost was set as Crec,2 = 350 and regarded as the benchmark cost. Maintenance policy maps for different cost ratios are shown in the subgraphs of Figs. 13 and 14.

Fig. 13 Policy maps of performance recovery cost analysis in Example 1.

In the legend of Fig. 13, A0 denoted action {Arep,0, Arec,0}; A1 denoted action {Arep,0, Arec,1}; A2 denoted action {Arep,0, Arec,2}; A3 denoted action {Arep,0, Arec,3}; A4 denoted action {Arep,1, Arec,0}; A5 denoted action {Arep,1, Arec,1}; A7 denoted action {Arep,1, Arec,3}. In the legend of Fig. 14, A0 denoted action {Arec,0, ArepT,0, ArepL,0}; A1 denoted action {Arec,1, ArepT,0, ArepL,0}; A3 denoted action {Arec,3, ArepT,0, ArepL,0}; A4 denoted action {Arec,0, ArepT,1, ArepL,0}; A5 denoted action {Arec,0, ArepT,0, ArepL,1}; A6 denoted action {Arec,1, ArepT,1, ArepL,0}; A8 denoted action {Arec,3, ArepT,1, ArepL,0}; A9 denoted action {Arec,1, ArepT,0, ArepL,1}; A12 denoted action {Arec,1, ArepT,1, ArepL,1}; A14 denoted action {Arec,3, ArepT,1, ArepL,1}; A15 denoted action {Arec,0, ArepT,1, ArepL,1}.

Fig. 14 Policy maps of performance recovery cost analysis in Example 2.

Fig. 15 Policy maps of LLP replacement cost analysis in Example 1.

Figs. 13(a) and 14(a) show the optimal policy when Crec,1 decreased; Figs. 13(b) and 14(b) show the optimal policy when Crec,3 decreased; Figs. 13(c) and 14(c) show the optimal policy when Crec,1 and Crec,3 decreased simultaneously. As shown in Figs. 13(c) and 14(c), when Crec,1 and Crec,3 decreased simultaneously, the changes in the optimal policy were not obvious.

The impact of the LLP replacement cost on the optimal maintenance policy was analyzed by contrast experiments. The experiment results showed that the optimal maintenance policy would not change as the LLP replacement cost increased. However, the number of LLP replacement actions would increase as the LLP replacement cost decreased. Optimal policy maps of the LLP replacement cost analysis are shown in Figs. 15-17.

In the legend of Fig. 15, A0 denoted action {Arep,0, Arec,0}; A1 denoted action {Arep,0, Arec,1}; A2 denoted action {Arep,0, Arec,2}; A4 denoted action {Arep,1, Arec,0}; A6 denoted action {Arep,1, Arec,2}. In the legends of Figs. 16 and 17, A0 denoted action {Arec,0, ArepT,0, ArepL,0}; A1 denoted action {Arec,1, ArepT,0, ArepL,0}; A2 denoted action {Arec,2, ArepT,0, ArepL,0}; A4 denoted action {Arec,0, ArepT,1, ArepL,0}; A5 denoted action {Arec,0, ArepT,0, ArepL,1}; A6 denoted action {Arec,1, ArepT,1, ArepL,0}; A7 denoted action {Arec,2, ArepT,1, ArepL,0}; A9 denoted action {Arec,1, ArepT,0, ArepL,1}; A13 denoted action {Arec,2, ArepT,1, ArepL,1}; A15 denoted action {Arec,0, ArepT,1, ArepL,1}.

As shown in Fig. 15, the LLP replacement cost decreased from subgraph (a) to (c). In Figs. 16 and 17, the LLP1 and LLP2 replacement costs decreased from subgraph (a) to (b). The results showed that the number of LLP replacements would increase as the LLP replacement cost decreased, and that policy changes appeared at the interface decision epochs.

In the aforementioned experiments, the impact of the LLP residual life was not considered. Thus, based on the assumption that the random failure probability would increase as the LLP residual life decreased, contrast experiments were performed to analyze the optimization algorithm. The optimal policy maps are shown in Fig. 18.

Fig. 16 Policy maps of LLP1 replacement cost analysis in Example 2.

Fig. 17 Policy maps of LLP2 replacement cost analysis in Example 2.

In the legend of Fig. 18, A0 denoted action {Arec,0, ArepT,0, ArepL,0}; A1 denoted action {Arec,1, ArepT,0, ArepL,0}; A4 denoted action {Arec,0, ArepT,1, ArepL,0}; A5 denoted action {Arec,0, ArepT,0, ArepL,1}; A6 denoted action {Arec,1, ArepT,1, ArepL,0}; A9 denoted action {Arec,1, ArepT,0, ArepL,1}; A12 denoted action {Arec,1, ArepT,1, ArepL,1}; A15 denoted action {Arec,0, ArepT,1, ArepL,1}. Based on Experiment 2, according to the residual life of the older LLP, the random failure probabilities were enhanced by 5%, 10%, and 15%, respectively, from subgraph (a) to (c) of Fig. 18. The policy maps showed no variation. Thus, the LLP residual life may not affect the optimal maintenance policy.

5. Conclusions

Based on the reinforcement learning approach, an aero-engine life-cycle maintenance policy optimization algorithm was proposed, which was able to address the long service-life and the hybrid maintenance strategy synchronously. To address the hybrid maintenance strategy, a multi-dimensional state space was adopted to represent the aero-engine state. Based on the reinforcement learning framework, the Gauss-Seidel value iteration algorithm was adopted to optimize the life-cycle maintenance policy.

Compared with traditional optimization methods, the optimal maintenance policy was used to indicate when and how to repair an aero-engine, taking the place of the maintenance intervals and work-scopes in traditional methods. Meanwhile, the aero-engine long service-life, the hybrid maintenance strategy, and random factors were all addressed by the proposed optimization algorithm. Because little historical data was available for training a pre-specified optimization model of the aero-engine life-cycle maintenance policy, the reinforcement learning approach provided an appropriate way. In the reinforcement learning framework, the aero-engine state space, maintenance actions, and state transition matrices were determined according to real aero-engine operation. The Gauss-Seidel value iteration algorithm was employed to solve the long-term decision-making problem. The proposed optimization algorithm would help in making a wiser aero-engine life-cycle maintenance policy, resulting in a lower life-cycle maintenance cost. Two numerical experiments and algorithm analyses were employed to illustrate the proposed optimization algorithm in detail.

As real aero-engine maintenance cost data was unavailable, hypothetical data was adopted in the numerical experiments. In future studies, maintenance cost calculation methods deserve further attention to improve the applicability of the proposed optimization algorithm.

Fig. 18 Policy maps with LLP lifetime impact on transition probability.

Acknowledgments

The authors thank the anonymous reviewers for their critical and constructive review of the manuscript. This work was co-supported by the Key National Natural Science Foundation of China (No. U1533202), the Civil Aviation Administration of China (No. MHRD20150104), and the Shandong Independent Innovation and Achievements Transformation Fund, China (No. 2014CGZH1101).