International Journal of Power Electronics and Drive Systems (IJPEDS)
Vol. 12, No. 1, March 2021, pp. 551-557
ISSN: 2088-8694, DOI: 10.11591/ijpeds.v12.i1.pp551-557

Adaptive dynamic programming algorithm for uncertain nonlinear switched systems

Dao Phuong Nam 1, Nguyen Hong Quang 2, Nguyen Nhat Tung 3, Tran Thi Hai Yen 4
1 School of Electrical Engineering, Hanoi University of Science and Technology, Bách Khoa, Hai Bà Trung, Hà Noi, Vietnam
2,4 Thai Nguyen University of Technology, So 666 D. 3/2, P, Thành pho Thái Nguyên, Thái Nguyên, Vietnam
3 Electric Power University, 235 Hoàng Quoc Viet, Co Nhue, Tu Liêm, Hà Noi 129823, Vietnam

Article Info
Article history: Received Feb 2, 2020; Revised Dec 15, 2020; Accepted Jan 10, 2021
Keywords: Adaptive dynamic programming; HJB equation; Lyapunov stability; Neural networks; Nonlinear switched systems

ABSTRACT
This paper studies an approximate dynamic programming (ADP) strategy for a class of nonlinear switched systems subject to external disturbances. Neural networks (NNs) are used to estimate the unknown parts of the actor and the critic for the corresponding nominal system. Both networks are trained simultaneously by minimizing the squared error of the Hamiltonian function. The tracking error of the closed-loop system is shown to converge to a neighborhood of the origin in the uniformly ultimately bounded (UUB) sense. Simulation results illustrate the effectiveness of the ADP-based controller.

This is an open access article under the CC BY-SA license.

Corresponding Author: Nguyen Hong Quang, Thai Nguyen University of Technology, So 666 D. 3/2, P, Thành pho Thái Nguyên, Thái Nguyên, Vietnam. Email: quang.nguyenhong@tnut.edu.vn

1.
INTRODUCTION
Many industrial systems can be described as switched systems, for example DC-DC converters [1]-[3], H-bridge inverters [4], multilevel inverters [5], and photovoltaic inverters [6]. Although many approaches for switched systems have been proposed, e.g., switching-delay tolerant control [7] and classical nonlinear control [8]-[12], optimization-based approaches, whose advantage is the ability to handle input/state constraints, have received little attention. Fuzzy and artificial neural network (ANN) approaches, as well as the particle swarm optimization (PSO) technique, have been investigated for several different systems such as photovoltaic inverters and transmission lines [13]-[17].

Adaptive dynamic programming has been considered in many situations, such as nonlinear continuous-time systems [18], actuator saturation [19], linear systems [20]-[22], and output constraints [23]. For nonlinear systems, the algorithm is implemented with neural networks (NNs), whereas the Kronecker product is employed for linear systems. Furthermore, data-driven techniques should be mentioned as a way to compute the actor/critic precisely. It should be noted that robotic systems have also been controlled by ADP algorithms [24]-[25].

This work proposes an adaptive dynamic programming solution for perturbed nonlinear switched systems based on neural networks. Considering the Hamiltonian function allows us to obtain the learning law of these neural networks. The UUB stability of the closed-loop system is analyzed, and simulation results illustrate the effectiveness of the proposed controller.

Journal homepage: http://ijpeds.iaescore.com
2. PROBLEM STATEMENTS
Consider the following uncertain nonlinear continuous-time switched system:

$$\frac{d}{dt}\xi(t) = f_i(\xi(t)) + g_i(\xi(t))\big(u + \Delta(\xi,t)\big) \tag{1}$$

where $\xi(t) \in \Omega_x \subset \mathbb{R}^n$ denotes the state variables and $u(t) \in \Omega_u \subset \mathbb{R}^m$ the control variables. The switching signal $\beta : [0,+\infty) \to \Omega = \{1, 2, \ldots, l\}$ is a piecewise-continuous function of time that selects the active subsystem, $i = \beta(t)$, and $l$ is the number of subsystems. The $f_i(\xi)$ are uncertain smooth vector functions with $f_i(0) = 0$. The $g_i(\xi)$ are smooth vector functions with the property $G_{\min} \le \|g_i(\xi)\| \le G_{\max}$. The switching index $\beta(t)$ is unknown.

Assumption 1: $\Delta(\xi,t)$ is bounded by a certain function $\varrho(\xi)$ as $\|\Delta(\xi,t)\| \le \varrho(\xi)$.

Consider the cost function associated with the uncertain switched system (1):

$$J(\xi,u) = \int_t^{\infty} r(\xi(\tau),u(\tau))\,d\tau \tag{2}$$

where $r(\xi,u) = \xi^T Q \xi + u^T R u$ and $Q = Q^T > 0$; $R = R^T > 0$. The main objective is to design a state feedback controller, together with an upper-bound term, that guarantees robust stability of the closed-loop system. Additionally, the performance index (2) is required to be bounded as $J \le K(\xi,u) \le M$.

Definition: The term $K(u)$ is called an appropriate performance index. Accordingly, the control input $u^* = \arg\min_{u \in \Omega_u} K(\xi,u)$ is referred to as the optimal control with respect to the appropriate performance index.

3. CONTROL DESIGN
The nominal system obtained by eliminating the disturbance in the switched system (1) is described by:

$$\frac{d}{dt}\xi = f_i(\xi) + g_i(\xi)u \tag{3}$$

The performance index of system (3) is modified as (4).

$$Q_1(\xi,u) = \int_t^{\infty} \Big[ r(\xi,u) + \gamma\,\varrho^2(\xi) \Big]\,d\tau \tag{4}$$

We prove that $Q_1(\xi,u)$ with $\gamma \ge \|R\|$ is one of the appropriate performance indexes of the dynamical system (1).
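The setup in (1)-(4) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the two modes `f`, `g`, the disturbance, the bounding function `varrho`, the switching signal `beta`, and the scalar input with diagonal `Q` are all hypothetical choices made for readability, not the model used in the paper's simulation section.

```python
import math

# Hypothetical two-mode example (NOT the paper's simulation model).
# Each mode i has a drift f_i with f_i(0) = 0 and an input vector g_i.
def f(i, xi):
    x1, x2 = xi
    if i == 1:
        return (-x1 + x2, -x1 - x2)
    return (-2.0 * x1, x1 - 3.0 * x2)

def g(i, xi):
    return (1.0, -1.0) if i == 1 else (1.0, 0.5)

def varrho(xi):
    # Assumed bounding function for Assumption 1: varrho(xi) = 0.5 * ||xi||.
    return 0.5 * math.hypot(*xi)

def disturbance(xi, t):
    # Assumed matched disturbance; |Delta| <= varrho(xi) by construction.
    return math.sin(t) * varrho(xi)

def beta(t):
    # Hypothetical switching signal: alternate between modes every second.
    return 1 + int(t) % 2

def xi_dot(i, xi, u, t):
    # Right-hand side of (1) for a scalar input: f_i + g_i * (u + Delta).
    v = u + disturbance(xi, t)
    return tuple(a + b * v for a, b in zip(f(i, xi), g(i, xi)))

def augmented_running_cost(xi, u, Q, R, gamma):
    # Integrand of (4): xi^T Q xi + R u^2 + gamma * varrho(xi)^2,
    # with Q given as a tuple holding the diagonal of a diagonal matrix.
    quad = sum(q * x * x for q, x in zip(Q, xi))
    return quad + R * u * u + gamma * varrho(xi) ** 2

def rollout(xi0, controller, Q, R, gamma, t_end=2.0, dt=0.01):
    # Forward-Euler simulation of (1) that accumulates the cost (4).
    xi, t, cost = tuple(xi0), 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        u = controller(xi)
        cost += augmented_running_cost(xi, u, Q, R, gamma) * dt
        dxi = xi_dot(beta(t), xi, u, t)
        xi = tuple(x + dt * d for x, d in zip(xi, dxi))
        t += dt
    return xi, cost
```

For instance, `rollout((1.0, -1.0), lambda xi: 0.0, (1.0, 3.0), 2.0, 5.0)` simulates the open-loop perturbed system and returns the final state together with a Riemann-sum approximation of the augmented cost over the horizon.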
Dene: V ( t ) = min u u Q 1 ( ξ , u ) , we ha v e (5) V ( t ) = min u u Z t r ( ξ , u ) + γ ρ 2 ( ξ ) (5) based on nominal system and cost function (4), it leads to Halminton function as (6) H ( ξ , u, V ) = r ( ξ , u ) + γ ρ 2 ( ξ ) + V ξ T ( f i ( ξ ) + g i ( ξ ) u ) (6) by using optimality principle, the optimal control input can be obtained as (7). u ( ξ ) = 1 2 R 1 ( g i ( ξ )) T V ξ (7) W e continue to utilize this control la w (7) for nonlinear continuous SW system (1) and obtain that: Theor em 1: The system (1) under the controller u ( ξ ) = 1 2 R 1 ( g i ( ξ )) T V ξ is stable with the associated L yapuno v function candidate: Int J Po w Elec & Dri Syst, V ol. 12, No. 1, March 2021 : 551 557 Evaluation Warning : The document was created with Spire.PDF for Python.
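$$V^*(t) = \int_t^{\infty} \Big[ r(\xi,u^*) + \gamma\,\varrho^2(\xi) \Big]\,d\tau \tag{8}$$

where $\gamma \ge \|R\|$.

Proof: Taking the derivative of $V^*$ along the trajectories of (1) under the control input $u^*(\xi) = -\frac{1}{2} R^{-1} (g_i(\xi))^T \nabla V^*$, we obtain (9):

$$\frac{d}{dt}V^* = -\xi^T Q \xi - \Big[\gamma\,\varrho^2(\xi) - \Delta(\xi,t)^T R\, \Delta(\xi,t)\Big] - \big(u^* + \Delta(\xi,t)\big)^T R \big(u^* + \Delta(\xi,t)\big) \tag{9}$$

Since $\|\Delta(\xi,t)\| \le \varrho(\xi)$ and $\gamma \ge \|R\|$, the bracketed term is nonnegative, and we conclude that (10) holds:

$$\dot{V}^*(t) \le -\xi^T Q \xi \tag{10}$$

Therefore, the system (1) is robustly stable.

However, it is impossible to solve the HJB equation directly. Hence, the optimal performance index $V^*$ for system (3) is described by a NN as (11).

$$V^* = w^T \sigma(\xi) + \varepsilon(\xi) \tag{11}$$

where $\sigma(\xi) : \mathbb{R}^n \to \mathbb{R}^N$ with $\sigma(0) = 0$ is the activation vector and $w \in \mathbb{R}^N$ is the constant NN weight vector. $\sigma(\xi)$ can be chosen to guarantee that $\varepsilon(\xi) \to 0$ and $\nabla\varepsilon(\xi) \to 0$ as $N \to \infty$, so for fixed $N$ we can assume the following.

Assumption 2: $\|\varepsilon(\xi)\| \le \varepsilon_{\max}$; $\|\nabla\varepsilon(\xi)\| \le \varepsilon_{\max}$; $\sigma_{\min} \le \|\nabla\sigma(\xi)\| \le \sigma_{\max}$; $\|w\| \le w_{\max}$.

Substituting (7) into (6), we obtain the HJB equation (12), where $\lambda$ denotes the weighting coefficient written as $\gamma$ in (4).

$$H(\xi,u^*,\nabla V^*) = \xi^T Q \xi + \lambda\varrho^2(\xi) + (\nabla V^*)^T f_i(\xi) - \frac{1}{4} (\nabla V^*)^T g_i(\xi) R^{-1} g_i(\xi)^T (\nabla V^*) = 0 \tag{12}$$

Formula (11) leads to (13).

$$\nabla V^* = \big(\nabla\sigma(\xi)\big)^T w + \nabla\varepsilon(\xi) \tag{13}$$

The resulting NN approximation residual is described as (14).

$$e_{NN} = -\nabla\varepsilon(\xi)^T \big(f_i(\xi) + g_i(\xi)u^*\big) + \frac{1}{4} \nabla\varepsilon(\xi)^T g_i(\xi) R^{-1} g_i(\xi)^T \nabla\varepsilon(\xi) \tag{14}$$

It follows that $e_{NN}$ converges uniformly to zero as $N \to \infty$. For each $N$, $e_{NN}$ is bounded on a region as $\|e_{NN}\| \le e_{\max}$.

Under the structure of the ADP-based controller, a critic NN is computed as (15).

$$\hat{V} = \hat{w}^T \sigma(\xi), \quad \nabla\hat{V} = \nabla\sigma(\xi)^T \hat{w}; \qquad \hat{u} = -\frac{1}{2} R^{-1} \big(g_i(\xi)\big)^T \nabla\hat{V} \tag{15}$$

The corresponding HJB residual is:

$$e_{HJB} = \xi^T Q \xi + \lambda\varrho^2(\xi) + \hat{w}^T \nabla\sigma(\xi) f_i(\xi) - \frac{1}{4} \hat{w}^T \nabla\sigma(\xi) g_i(\xi) R^{-1} g_i(\xi)^T \nabla\sigma(\xi)^T \hat{w} \tag{16}$$

The training law is based on the steepest descent method:

$$\frac{d}{dt}\hat{w} = -\alpha \frac{\partial E}{\partial \hat{w}} \tag{17}$$

with $E = \frac{1}{2} e_{HJB}^T e_{HJB}$.

Remark 1: The weight $\hat{w}$ is trained to minimize the network error $G = \frac{1}{2} e_{HJB}^T e_{HJB}$. This result is obtained from (18).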
$$\frac{\partial G}{\partial t} = -\alpha \left\| \frac{\partial G}{\partial \hat{w}} \right\|^2 \tag{18}$$
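The critic update in (16)-(17) can be illustrated with a small Python sketch. It is written under simplifying assumptions that are not taken from the paper: two states, a scalar input (so $R$ is a positive scalar), a diagonal $Q$ passed as a tuple, and a hypothetical quadratic basis $\sigma(\xi) = [\xi_1^2, \xi_1\xi_2, \xi_2^2]^T$. The function names are illustrative only.

```python
def grad_sigma(xi):
    # Jacobian (3 x 2) of the assumed basis sigma = [x1^2, x1*x2, x2^2].
    x1, x2 = xi
    return [(2.0 * x1, 0.0), (x2, x1), (0.0, 2.0 * x2)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def hjb_residual(w_hat, xi, f_val, g_val, Q, R, lam, varrho_val):
    # Returns e_HJB from (16) and its gradient with respect to w_hat.
    # f_val, g_val are f_i(xi), g_i(xi) of the active mode (scalar input).
    J = grad_sigma(xi)
    s_f = [dot(row, f_val) for row in J]   # grad_sigma @ f_i
    s_g = [dot(row, g_val) for row in J]   # grad_sigma @ g_i
    state_cost = sum(q * x * x for q, x in zip(Q, xi)) + lam * varrho_val ** 2
    wg = dot(w_hat, s_g)                   # w_hat^T grad_sigma g_i
    e = state_cost + dot(w_hat, s_f) - 0.25 * wg * wg / R
    de_dw = [sf - 0.5 * wg / R * sg for sf, sg in zip(s_f, s_g)]
    return e, de_dw

def critic_step(w_hat, xi, f_val, g_val, Q, R, lam, varrho_val, alpha, dt):
    # One forward-Euler step of (17): dw/dt = -alpha * dE/dw, E = e^2 / 2,
    # so dE/dw = e * de/dw.
    e, de_dw = hjb_residual(w_hat, xi, f_val, g_val, Q, R, lam, varrho_val)
    return [w - alpha * dt * e * d for w, d in zip(w_hat, de_dw)]
```

In a simulation loop, `critic_step` would be called once per integration step with the current state and the active mode's `f_val` and `g_val`; a small step at a fixed state reduces the squared residual, which is the behavior Remark 1 describes.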
Theorem 2: Consider the feedback controller in (15) with the critic weight updated by (17). Then the weight estimation error $\tilde{w} = w - \hat{w}$ and the closed-loop state vector $\xi(t)$ are uniformly ultimately bounded (UUB).

Proof: Choose the Lyapunov function $V(t) = V_1(t) + V_2(t)$, where:

$$V_1(t) = \frac{1}{2\alpha} \tilde{w}(t)^T \tilde{w}(t), \qquad V_2(t) = V^* \tag{19}$$

Using Assumption 3: $\|f_i(\xi) + g_i(\xi)u^*\| \le \mu_{\max}$, and the definitions $\mu_i = f_i(\xi) + g_i(\xi)u^*$; $G_i = g_i(\xi) R^{-1} g_i(\xi)^T$; $\nabla\sigma = \nabla\sigma(\xi)$; $\nabla\varepsilon = \nabla\varepsilon(\xi)$, and taking the derivative of $V_1(t)$, we obtain:

$$\dot{V}_1(t) = \tilde{w}^T \Big( e_{NN} + \tilde{w}^T \nabla\sigma\,\mu_i + \frac{1}{2} \tilde{w}^T \nabla\sigma\, G_i \nabla\varepsilon + \frac{1}{4} \tilde{w}^T \nabla\sigma\, G_i \nabla\sigma^T \tilde{w} \Big) \nabla\sigma \Big( \mu_i + \frac{1}{2} G_i \big( \nabla\sigma^T \tilde{w} + \nabla\varepsilon \big) \Big) \tag{20}$$

It leads to the estimate $\dot{V}_1(t) \le -\pi_1$.

For the term $V_2(t)$, using (12) and (15) we have (21).

$$\dot{V}_2 = (\nabla V^*)^T \big(f_i + g_i(\hat{u} + \Delta)\big) = -\xi^T Q \xi - \lambda\varrho^2(\xi) - \frac{1}{4} (\nabla V^*)^T g_i R^{-1} g_i^T (\nabla V^*) + \frac{1}{2} (\nabla V^*)^T g_i R^{-1} g_i^T \big( \nabla\sigma(\xi)^T \tilde{w} + \nabla\varepsilon(\xi) \big) + (\nabla V^*)^T g_i \Delta \tag{21}$$

Assume that $\varrho(\xi) = \varpi \|\xi\|$. From (21) we have (22).

$$\dot{V}_2 \le -\big(\lambda_{\min}(Q) + \lambda\varpi^2\big) \|\xi\|^2 + \theta_2 \tag{22}$$

with $\theta_2 = -\frac{1}{4} (\nabla V^*)^T g_i R^{-1} g_i^T (\nabla V^*) + \frac{1}{2} (\nabla V^*)^T g_i R^{-1} g_i^T \big( \nabla\sigma(\xi)^T \tilde{w} + \nabla\varepsilon(\xi) \big) + (\nabla V^*)^T g_i \Delta$.

Based on the assumptions above, we have (23).

$$\theta_2 \le \frac{1}{4} (w_{\max}\sigma_{\max} + \varepsilon_{\max})^2 G_{\max}^2 \lambda_{\max}(R^{-1}) + \frac{1}{2} (\vartheta\sigma_{\max} + \varepsilon_{\max})^2 G_{\max}^2 \lambda_{\max}(R^{-1}) + (w_{\max}\sigma_{\max} + \varepsilon_{\max})\, G_{\max}\, \varpi \|\xi\| \tag{23}$$

For $\|\xi\|$ sufficiently large, $\big(\lambda_{\min}(Q) + \lambda\varpi^2\big) \|\xi\|^2 - \theta_2 \ge \pi_2$ with $\pi_2 > 0$, and we obtain (24).

$$\dot{V}_2(t) \le -\pi_2 \tag{24}$$

Remark 2: The coefficients $\vartheta_1$; $\vartheta_2$ can be adjusted by refining the NN that approximates the optimal performance index. Moreover, for an arbitrary switching index, after a time of $V(0)/\min(\pi_1; \pi_2)$ the variables $\xi$ and $\tilde{w}$ enter their respective bounded domains.

The ADP controller $\hat{u}$ proposed in (15) tends to a neighborhood of $u^*$.

Proof: The deviation of the control input is estimated as (25).

$$\|\hat{u} - u^*\| = \left\| \frac{1}{2} R^{-1} \big(g_i(\xi)\big)^T \Big( \big(\nabla\sigma(\xi)\big)^T \tilde{w} + \nabla\varepsilon(\xi) \Big) \right\| \le \frac{1}{2} \lambda_{\max}(R^{-1})\, G_{\max}\, (\sigma_{\max}\vartheta_1 + \varepsilon_{\max}) = \vartheta_3 \tag{25}$$

Thus the proof is completed.
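The control law (15) and the deviation estimate (25) can be checked numerically with a short Python sketch. The sketch makes assumptions that are not part of the paper: two states, a scalar input with scalar $R > 0$, the hypothetical basis $\sigma(\xi) = [\xi_1^2, \xi_1\xi_2, \xi_2^2]^T$, a zero approximation error $\varepsilon$, and the Frobenius norm of $\nabla\sigma$ at the current state used as a pointwise stand-in for $\sigma_{\max}$ (it upper-bounds the spectral norm).

```python
import math

def grad_sigma(xi):
    # Jacobian (3 x 2) of the assumed basis sigma = [x1^2, x1*x2, x2^2].
    x1, x2 = xi
    return [(2.0 * x1, 0.0), (x2, x1), (0.0, 2.0 * x2)]

def control_from_weights(w, xi, g_val, R):
    # u = -(1/2) R^{-1} g^T grad_sigma^T w, as in (15), for a scalar input.
    J = grad_sigma(xi)
    grad_V = [sum(J[k][j] * w[k] for k in range(len(w))) for j in range(2)]
    return -0.5 * sum(gj * vj for gj, vj in zip(g_val, grad_V)) / R

def deviation_bound(w_tilde, xi, g_val, R):
    # Right-hand side of (25) with epsilon = 0:
    # (1/2) * (1/R) * ||g|| * ||grad_sigma||_F * ||w_tilde||.
    J = grad_sigma(xi)
    frob = math.sqrt(sum(v * v for row in J for v in row))
    w_norm = math.sqrt(sum(v * v for v in w_tilde))
    return 0.5 / R * math.hypot(*g_val) * frob * w_norm
```

With ideal weights `w` and an estimate `w_hat`, the difference `control_from_weights(w_hat, ...) - control_from_weights(w, ...)` stays within `deviation_bound(w_tilde, ...)` for `w_tilde = w - w_hat`, so a smaller weight error directly tightens the gap between the ADP controller and the optimal one.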
4. SIMULATION RESULTS
In this section, simulations are carried out to validate the performance of the established control scheme. Let $N = 2$, and let the subsystems of the switched system be (26) and (27).

$$\dot{x}_1 = -x_1^3 - 2x_2 + \big(u + \Delta_1(x,t)\big), \qquad \dot{x}_2 = x_1 + 0.5\cos\!\big(x_1^2\big)\sin\!\big(x_2^3\big) - \big(u + \Delta_1(x,t)\big) \tag{26}$$

$$\dot{x}_1 = -x_1^5 - \sin(x_2) + \big(u + \Delta_2(x,t)\big), \qquad \dot{x}_2 = \frac{1}{2}x_1 - \cos(x_1)\cos\!\big(x_2^3\big) - \big(u + \Delta_2(x,t)\big) \tag{27}$$

The initial state vector is chosen as (28).

$$x(0) = \begin{bmatrix} 5 & 5 \end{bmatrix}^T \tag{28}$$

The parameter matrices are chosen as:

$$R = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}; \quad Q = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}; \quad \alpha = 0.1; \quad \lambda = 5.$$

The simulation results shown in Figure 1 and Figure 2 validate the effectiveness of the proposed algorithm.

Figure 1. The response of $x_2$
Figure 2. The response of $x_2$

5. CONCLUSION
This paper has investigated the ADP problem for switched nonlinear systems under external disturbances. The nominal system, obtained by eliminating the disturbance, is considered first, and a classical nonlinear control technique is then applied. Neural networks have been designed to estimate the actor and the critic, and the learning algorithm is developed with simultaneous tuning. Finally, the UUB property of the closed-loop system is guaranteed.

ACKNOWLEDGEMENT
This research was supported by the Research Foundation funded by Thai Nguyen University of Technology.

REFERENCES
[1] Vu, Tran Anh and Nam, Dao Phuong and Huong, Pham Thi Viet, "Analysis and control design of transformerless high gain, high efficient buck-boost DC-DC converters," in 2016 IEEE International Conference on Sustainable Energy Technologies (ICSET), Hanoi, 2016, pp. 72-77, doi: 10.1109/ICSET.2016.7811759.
[2] Nam, Dao Phuong and Thang, Bui Minh and Thanh, Nguyen Truong, "Adaptive Tracking Control for a Boost DC–DC Converter: A Switched Systems Approach," in 2018 4th International Conference on Green Technology and Sustainable Development (GTSD), Ho Chi Minh City, 2018, pp. 702-705, doi: 10.1109/GTSD.2018.8595580.
[3] Thanh, Nguyen Truong and Sam, Pham Ngoc and Nam, Dao Phuong, "An Adaptive Backstepping Control for Switched Systems in presence of Control Input Constraint," in 2019 International Conference on System Science and Engineering (ICSSE), Dong Hoi, Vietnam, 2019, pp. 196-200, doi: 10.1109/ICSSE.2019.8823125.
[4] Panigrahi, Swetapadma and Thakur, Amarnath, "Modeling and simulation of three phases cascaded H-bridge grid-tied PV inverter," Bulletin of Electrical Engineering and Informatics (BEEI), vol. 8, no. 1, pp. 1-9, 2019, doi: 10.11591/eei.v8i1.1225.
[5] Devarajan, N and Reena, A, "Reduction of switches and DC sources in Cascaded Multilevel Inverter," Bulletin of Electrical Engineering and Informatics (BEEI), vol. 4, no. 3, pp. 186-195, 2015, doi: 10.11591/eei.v4i3.320.
[6] Venkatesan, M and Rajeshwari, R and Deverajan, N and Kaliyamoorthy, M, "Comparative study of three phase grid connected photovoltaic inverter using pi and fuzzy logic controller with switching losses calculation," International Journal of Power Electronics and Drive Systems (IJPEDS), vol. 7, no. 2, pp. 543-550, 2016.
[7] Zhang, Lixian and Xiang, Weiming, "Mode-identifying time estimation and switching-delay tolerant control for switched systems: An elementary time unit approach," Automatica, vol. 64, pp. 174-181, 2016, doi: 10.1016/j.automatica.2015.11.010.
[8] Yuan, Shuai and Zhang, Lixian and De Schutter, Bart and Baldi, Simone, "A novel Lyapunov function for a non-weighted L2 gain of asynchronously switched linear systems," Automatica, vol. 87, pp. 310-317, 2018, doi: 10.1016/j.automatica.2017.10.018.
[9] Xiang, Weiming and Lam, James and Li, Panshuo, "On stability and H∞ control of switched systems with random switching signals," Automatica, vol. 95, pp. 419-425, 2018, doi: 10.1016/j.automatica.2018.06.001.
[10] Lin, Jinxing and Zhao, Xudong and Xiao, Min and Shen, Jingjin, "Stabilization of discrete-time switched singular systems with state, output and switching delays," Journal of the Franklin Institute, vol. 356, pp. 2060-2089, 2019, doi: 10.1016/j.jfranklin.2018.11.034.
[11] Briat, Corentin, "Convex conditions for robust stabilization of uncertain switched systems with guaranteed minimum and mode-dependent dwell-time," Systems & Control Letters, vol. 78, pp. 63-72, 2015, doi: 10.1016/j.sysconle.2015.01.012.
[12] Lian, Jie and Li, Can, "Event-triggered control for a class of switched uncertain nonlinear systems," Systems & Control Letters, vol. 135, pp. 1-5, 2020, doi: 10.1016/j.sysconle.2019.104592.
[13] Anyaka, Boniface O and Manirakiza, J Felix and Chike, Kenneth C and Okoro, Prince A, "Optimal unit commitment of a power plant using particle swarm optimization approach," International Journal of Electrical and Computer Engineering (IJECE), vol. 10, no. 2, pp. 1135-1141, 2020, doi: 10.11591/ijece.v10i2.pp1135-1141.
[14] Devi, Palakaluri Srividya and Santhi, R Vijaya, "Introducing LQR-fuzzy for a dynamic multi area LFC-DR model," International Journal of Electrical & Computer Engineering, vol. 9, no. 2, pp. 861-874, 2019, doi: 10.11591/ijece.v9i2.pp861-874.
[15] Omar, Othman AM and Badra, Niveen M and Attia, Mahmoud A, "Enhancement of on-grid pv system under irradiance and temperature variations using new optimized adaptive controller," International Journal of Electrical and Computer Engineering (IJECE), vol. 8, no. 5, pp. 2650-2660, 2018, doi: 10.11591/ijece.v8i5.2650-2660.
[16] Sharma, Purva and Saini, Deepak and Saxena, Akash, "Fault detection and classification in transmission line using wavelet transform and ANN," Bulletin of Electrical Engineering and Informatics (BEEI), vol. 5, no. 3, pp. 284-295, 2016.
[17] Ilamathi, P and Selladurai, V and Balamurugan, K, "Predictive modelling and optimization of nitrogen oxides emission in coal power plant using Artificial Neural Network and Simulated Annealing," IAES International Journal of Artificial Intelligence (IJ-AI), vol. 1, no. 1, pp. 11-18, 2012.
[18] Vamvoudakis, Kyriakos G and Vrabie, Draguna and Lewis, Frank L, "Online adaptive algorithm for optimal control with integral reinforcement learning," International Journal of Robust and Nonlinear Control, vol. 24, no. 17, pp. 2686-2710, 2013, doi: 10.1002/rnc.3018.
[19] Bai, Weiwei and Zhou, Qi and Li, Tieshan and Li, Hongyi, "Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation," IEEE Transactions on Cybernetics, vol. 50, no. 8, pp. 3433-3443, Aug. 2020, doi: 10.1109/TCYB.2019.2921057.
[20] Chen, Ci and Modares, Hamidreza and Xie, Kan and Lewis, Frank L and Wan, Yan and Xie, Shengli, "Reinforcement learning-based adaptive optimal exponential tracking control of linear systems with unknown dynamics," IEEE Transactions on Automatic Control, vol. 64, no. 11, pp. 4423-4438, Nov. 2019, doi: 10.1109/TAC.2019.2905215.
[21] Vamvoudakis, Kyriakos G and Ferraz, Henrique, "Model-free event-triggered control algorithm for continuous-time linear systems with optimal performance," Automatica, vol. 87, pp. 412-420, 2018, doi: 10.1016/j.automatica.2017.03.013.
[22] Gao, Weinan and Jiang, Yu and Jiang, Zhong-Ping and Chai, Tianyou, "Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming," Automatica, vol. 72, pp. 37-45, 2016, doi: 10.1016/j.automatica.2016.05.008.
[23] Zhang, Tianping and Xu, Haoxiang, "Adaptive optimal dynamic surface control of strict-feedback nonlinear systems with output constraints," International Journal of Robust and Nonlinear Control, vol. 30, no. 5, pp. 2059–2078, 2020, doi: 10.1002/rnc.4864.
[24] Wang, Ding and Mu, Chaoxu, "Adaptive-critic-based robust trajectory tracking of uncertain dynamics and its application to a spring–mass–damper system," IEEE Transactions on Industrial Electronics, vol. 65, no. 1, pp. 654-663, Jan. 2018, doi: 10.1109/TIE.2017.2722424.
[25] Wen, Guoxing and Ge, Shuzhi Sam and Chen, CL Philip and Tu, Fangwen and Wang, Shengnan, "Adaptive tracking control of surface vessel using optimized backstepping technique," IEEE Transactions on Cybernetics, vol. 49, no. 9, pp. 3420-3431, Sept. 2019, doi: 10.1109/TCYB.2018.2844177.