Indonesian Journal of Electrical Engineering and Computer Science
Vol. 20, No. 2, September 2020, pp. 854-862
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v20i2.pp854-862

Throughput maximization for full-duplex two-way relay with finite buffers

Betene Anyugu Francis Lin
Department of Electronic Engineering, Shanghai Jiao Tong University, China

Article Info
Article history: Received Feb 11, 2020; Revised Apr 14, 2020; Accepted Apr 29, 2020
Keywords: Buffer-aided relaying; Full-duplex; Q-learning; Reinforcement learning; Throughput

ABSTRACT
Optimal queueing control of multi-hop networks remains a challenging problem, especially in two-way relaying systems, even in the most straightforward scenarios. In this paper, we explore two-way relaying with a full-duplex decode-and-forward relay equipped with two finite buffers. Principally, we propose a novel concept based on multi-agent reinforcement learning that maximizes the cumulative network throughput by combining the buffer states and the lossy links; a decision is generated as to whether the relay should transmit, receive, or simultaneously receive and transmit information. Towards this objective, based on the queue state transitions and the lossy links, an analytic Markov decision process is proposed to analyze this scheme, and the throughput and queueing delay are derived. Our numerical results reveal interesting insights. First, artificial intelligence based on reinforcement learning is optimal when the buffer length exceeds a certain threshold. Second, we demonstrate that reinforcement learning can boost transmission efficiency and prevent buffer overflow.

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Betene Anyugu Francis Lin, Department of Electrical and Computer Engineering, Shanghai Jiao Tong University, Shanghai, China.
Email: francislin@sjtu.edu.cn

1. INTRODUCTION
With the significant evolution of computers in terms of speed, it has become easy to implement new algorithms that substantially boost their performance, and artificial intelligence (AI) has become the keyword that defines the future and everything it holds. Not only has AI taken over traditional methods of computing, it has also changed the way industries operate. Recently, AI algorithms have fascinated researchers and have been applied successfully to solve problems in engineering such as visual perception, speech recognition, decision-making, and translation between languages. Our research aims to design a new concept of artificial intelligence for the full-duplex two-way relaying system based on reinforcement learning, such that the achievable data rate/throughput is maximized and the bottleneck problem of buffer overflow is remedied.

The two-way relaying system, first studied by Shannon [1], in which a relay boosts the exchange of information between two nodes, has been extensively analyzed and implemented to meet the incessant demand for speed. In [2-10], the capacity of the two-way relay channel (TWRC) in half-duplex and full-duplex (FD) modes has been extensively investigated, and the average throughput and delays of the proposed protocols were evaluated. Benefiting from AI, reinforcement learning (RL) has been used to increase the performance of cooperative wireless networks in half-duplex mode. In [11], the authors introduced the framework of communication systems and the jamming modes commonly used in communication.

Journal homepage: http://ijeecs.iaescore.com
Then the basic principle of the Q-learning algorithm is briefly introduced. However, they did not analyze the throughput and system delay. In [12], the authors examined the problem of relay node selection in cooperative networks, in which one relay node can be used by multiple source-destination transmission pairs, and all transmission pairs share the same set of relay nodes; however, the throughput was not optimized. In [13], the authors investigate the power control problem in a cooperative network with multiple wireless transmitters, multiple amplify-and-forward relays, and one destination. The goal is to maximize the long-term system throughput by fully exploiting multi-user diversity in the network. Nevertheless, the power control problem for the two-way relay was not investigated, and neither was the delay performance of the network, which is crucial for evaluating the system. In [14, 15], the authors present a novel deep reinforcement learning-based joint spectrum sensing and power control algorithm for downlink communications in a cognitive small cell; regrettably, they did not investigate the outage probability of the whole system. In [16], the authors propose an iterative optimal power allocation (OPA) method to maximize the derived end-to-end sum rate, based on the Lagrangian and Newton-Raphson algorithms under a total power constraint. The authors used reinforcement learning for two-way relaying systems; however, the system delay was not investigated. In [17], the authors investigated a practical two-way relay protocol with enhanced transmission scheduling, which jointly considers the finite relay buffers, signalling overheads, and lossy links.
In this regard, to improve system performance, full-duplex two-way relaying with buffers is proposed for its throughput merits. The authors in [18] propose a novel adaptive protocol (that maximises the cumulative network throughput) based on the combination of the buffer states, lossy links, and the outage probability; a decision is generated as to whether the relay should transmit, receive, or simultaneously receive and transmit information. Similarly, they investigated the queue states and the transition matrix: an analytic Markov chain model was proposed to analyse this scheme, and the throughput and queueing delay were derived. To the best of our knowledge from the available literature, we are the first to propose a concept of the full-duplex two-way relay (FD-TWR) with reinforcement learning, such that the achievable data rate/throughput is maximized.

The contribution of this paper can be summarized as follows. We propose a multi-agent reinforcement learning approach for the full-duplex two-way relay. As far as we know, we are the first to introduce reinforcement learning for the full-duplex two-way relay, and we take into consideration not only the instantaneous qualities of the involved links but also the states of the queues at the buffers. We present a general framework for obtaining the average throughput and the system delay based on a Markov decision process over the queue states at the buffers.

2. RESEARCH METHOD
2.1. Channel models
Throughout this paper, we assume that the source and destination always have data to transmit. We investigate the maximum rate of the three-node full-duplex two-way relay (FD-TWR) with two finite buffers, denoted $Q_a$ and $Q_b$, operating in the decode-and-forward mode. There is no direct link between node A and node B.
We consider the buffer-aided FD-TWR in which two users, A and B, exchange data via the relay R, which is equipped with two finite buffers, as shown in Figure 1. We denote T as the state in which the relay transmits data, R as the state in which the relay receives data, and TR as the state in which the relay transmits and receives data at the same time. In parallel, we denote E as the state in which the buffers are empty, F as the state in which the buffers are full, and FE as the state in which the buffers are neither empty nor full. We denote the channel coefficients $h_{AR}$, $h_{BR}$, and $h_{RR}$, representing the channels from source A to the relay R, from source B to the relay R, and the self-interference, respectively. The relay can transmit and receive data at the same time; moreover, due to the co-channel transmission and imperfect interference cancellation, the FD-TWR would undergo more severe self-interference. Thus, unlike in [19], we assume that user A and user B have enough information to send to the relay in some time slots and not enough in others; in particular, the buffers can also be empty or full in any time slot. Finally, for the theoretical approach, we assume that the full-duplex relay is perfect, meaning that $h_{RR} = 0$ and that $h_{AR}$, $h_{BR}$ are binary dependent variables.
Figure 1. System model

2.2. Reinforcement learning for multi-agent
In this section, unlike [11-15] where unsupervised and supervised learning are used, we implement the system model based on reinforcement learning, as shown in Figure 2.

Figure 2. Reinforcement learning cycle

The current state of the environment that the agent observes is defined as $s_t$, and the action of the agent is defined as $a_t$. The reward (or penalty) received for the action $a_t$ reflects how good the previous action $a_{t-1}$ was in the environment state $s_{t-1}$. The model of single-agent RL is a Markov decision process, defined as
\[
f : S \times A \rightarrow [0, 1], \qquad \rho : S \times A \rightarrow \mathbb{R}, \tag{1}
\]
where $S$ is the finite set of environment states, $A$ is the finite set of agent actions, $f$ is the state transition probability function, and $\rho$ is the reward function. For a dynamic environment, the transition probability function and the reward become $f_t$ and $\rho_t$, where $s_t$ and $s_{t-1}$ are the environment states at times $t$ and $t-1$, and $a_t$ is the agent action at time $t$. The behavior of the agent is described by the policy, which specifies how the agent chooses its actions in a given state. The policy may be either static or dynamic: in the static case, $\pi : S \rightarrow A$, the policy is stationary; in the dynamic case, $\pi_t : S \times A \rightarrow [0, 1]$, the policy is time-varying. The objective is to find a policy $\pi$ such that the return $R$ is maximized from state $s$,
\[
R = E\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{k+1} \,\Big|\, s_0 = s, \pi \Big\}. \tag{2}
\]
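The discounted return in (2) can be checked with a short sketch. The reward sequence and the discount factor below are illustrative assumptions for this sketch, not values taken from the paper's simulations (though $\gamma = 0.8$ matches the discount used in the Q-table updates later).

```python
# Discounted return of Eq. (2): R = sum_k gamma^k * r_{k+1} for one trajectory.
def discounted_return(rewards, gamma=0.8):
    """Sum gamma^k * r_{k+1} over a finite reward trajectory."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

# Hypothetical episode: three zero rewards, then a terminal reward of 100.
print(discounted_return([0, 0, 0, 100]))  # 100 * 0.8**3, i.e. about 51.2
```

With $\gamma < 1$, distant rewards are geometrically down-weighted, which is what makes the infinite sum in (2) finite.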
where $\gamma \in [0, 1)$ is the discount factor. The expectation is taken over the probabilistic state transitions under the policy $\pi$. $R$ also represents the reward accumulated by the agent; the task of the agent is therefore to maximize its long-term performance. This is achieved by computing the optimal state-action value function, called the Q-function. The Q-function is presented as
\[
Q^{\pi} : S \times A \rightarrow \mathbb{R}, \qquad
Q^{\pi}(s, a) = E\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{k+1} \,\Big|\, s_0 = s, a_0 = a, \pi \Big\}. \tag{3}
\]
The optimal Q-function is defined as
\[
Q^{*}(s, a) = \max_{\pi} Q^{\pi}(s, a). \tag{4}
\]
$Q^{*}(s, a)$ can be written as
\[
\begin{aligned}
V^{*}(s) := \max_{a \in A(s)} Q^{*}(s, a)
&= \max_{a} E\Big[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \,\Big|\, s_t = s, a_t = a \Big] \\
&= \max_{a} E\Big[ r_{t+1} + \gamma \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+2} \,\Big|\, s_t = s, a_t = a \Big] \\
&= \max_{a} E\big[ r_{t+1} + \gamma V^{*}(s_{t+1}) \,\big|\, s_t = s, a_t = a \big] \\
&= \max_{a} \sum_{s' \in S} P(s' \mid s, a) \big[ R(s, a, s') + \gamma V^{*}(s') \big]. \tag{5}
\end{aligned}
\]
In the multi-agent case, the state transitions are the result of the joint actions $a_k$ of all agents:
\[
\rho : S \times A \times S \rightarrow \mathbb{R}, \qquad
f : S \times A \times S \rightarrow [0, 1], \qquad
A = A_1 \times A_2 \times \cdots \times A_n. \tag{6}
\]
The joint policy is defined as $\pi$. The return of each agent depends on the joint policy,
\[
a_k = [a_{1,k}^{T}, \ldots, a_{n,k}^{T}]^{T}, \quad a_k \in A, \quad a_{i,k} \in A_i, \qquad
R_i^{\pi}(s) = E\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{i,k+1} \,\Big|\, s_0 = s, \pi \Big\}. \tag{7}
\]
The Q-function of each agent depends on the joint action and the joint policy,
\[
Q_i^{\pi} : S \times A \rightarrow \mathbb{R}, \qquad
Q_i^{\pi}(s, a) = E\Big\{ \sum_{k=0}^{\infty} \gamma^{k} r_{i,k+1} \,\Big|\, s_0 = s, a_0 = a, \pi \Big\}. \tag{8}
\]
The transition diagram is shown in Figure 3.
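The fixed-point form in (5) suggests value iteration: apply the Bellman optimality operator repeatedly until $V$ stops changing. The sketch below does this for a small hypothetical two-state, two-action MDP; the transition tensor `P` and reward tensor `R` are invented for illustration and are not the paper's relay model.

```python
import numpy as np

# Value iteration for Eq. (5): V(s) = max_a sum_{s'} P(s'|s,a)[R(s,a,s') + gamma V(s')].
gamma = 0.8
P = np.array([                      # P[s, a, s']: assumed transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([                      # R[s, a, s']: assumed rewards
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 1.0], [0.0, 0.0]],
])
V = np.zeros(2)
for _ in range(500):                # the operator is a gamma-contraction, so it converges
    Q = (P * (R + gamma * V)).sum(axis=2)   # Q[s, a], expectation over s'
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new
print(V)                            # optimal state values V*(s)
```

Because $\gamma < 1$ the operator is a contraction, so the loop reaches a fixed point regardless of the initial $V$.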
Figure 3. Transition diagram

2.3. Throughput-delay analysis
In this section, we present a general framework for the throughput maximization. For simplicity, the Markov decision process is reducible, since it contains isolated states. We can represent all these mechanics in a reward table, in which the rows represent the states and the columns represent the actions. The values in this matrix represent the rewards; the value $-1$ indicates that a specific action is not available. We assume that the probabilities of moving from one state to another are $p_1$ and $p_2$, and for simplicity we assume $p_1 = p_2$. Consequently, we can represent all these mechanics in a reward table $R$,
\[
R = \begin{bmatrix}
-1 & -1 & -1 & -1 & 0 & -1 \\
-1 & -1 & -1 & 0 & -1 & 100 \\
-1 & -1 & -1 & 0 & -1 & -1 \\
-1 & 0 & 0 & -1 & 0 & -1 \\
0 & -1 & -1 & 0 & -1 & 100 \\
-1 & 0 & -1 & -1 & 0 & 100
\end{bmatrix}. \tag{9}
\]
From state 5 we have three possible actions: 1, 4, or 5, with rewards 0, 0, and 100 (simply all the non-negative values of row 5), and we are interested in the one with the largest value. We then select the biggest Q-value among the possible actions $Q(5,1)$, $Q(5,4)$, $Q(5,5)$:
\[
Q = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 100\,p_1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}. \tag{10}
\]
Selecting action 1 as our next state, we then have the following possible actions:
\[
Q = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 100\,p_1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 80 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}. \tag{11}
\]
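The Q-table updates sketched above can be reproduced with tabular Q-learning on the reward table of (9). The episode count, the purely random exploration policy, and the discount factor 0.8 are assumptions for this sketch, and $p_1$ is taken as 1 so the learned entries can be compared directly with the unscaled values:

```python
import numpy as np

# Tabular Q-learning on the reward table R of Eq. (9); -1 marks unavailable actions.
# Update rule: Q(s,a) <- R(s,a) + gamma * max_a' Q(s',a'), with gamma = 0.8 and
# the chosen action index doubling as the next state, as in the worked example.
rng = np.random.default_rng(0)
gamma = 0.8
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])
Q = np.zeros_like(R, dtype=float)
goal = 5
for _ in range(5000):                        # episodes (assumed count)
    s = rng.integers(0, 6)                   # random start state
    while True:
        actions = np.flatnonzero(R[s] >= 0)  # available actions from state s
        a = rng.choice(actions)              # pure exploration
        Q[s, a] = R[s, a] + gamma * Q[a].max()
        if a == goal:
            break                            # episode ends on reaching the goal
        s = a
print(np.round(100 * Q / Q.max()))           # normalized Q-table, cf. Eq. (14)
```

After enough episodes the table stops changing, which is the convergence the text refers to before (14).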
\[
Q = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 400\,p_1 \\
0 & 0 & 0 & 0 & 320\,p_1 & 100\,p_1 \\
0 & 0 & 0 & 0 & 320\,p_1 & 0 \\
0 & 400\,p_1 & 256\,p_1 & 0 & 0 & 500\,p_1 \\
320\,p_1 & 0 & 0 & 0 & 320\,p_1 & 500\,p_1 \\
0 & 400\,p_1 & 0 & 0 & 0 & 0
\end{bmatrix} \tag{12}
\]
\[
Q(3,1) = R(3,1) + 0.8 \cdot \max\big[ Q(1,3), Q(1,5) \big] \cdot p_1 = 0 + 0.8 \cdot 100\,p_1 = 80\,p_1. \tag{13}
\]
After a large number of episodes (100000), the Q matrix can be considered to have converged, in which case Q is
\[
Q = \begin{bmatrix}
0 & 0 & 0 & 0 & 80\,p_1 & 0 \\
0 & 0 & 0 & 64\,p_1 & 0 & 100\,p_1 \\
0 & 0 & 0 & 64\,p_1 & 0 & 0 \\
0 & 80\,p_1 & 51\,p_1 & 0 & 80\,p_1 & 0 \\
64\,p_1 & 0 & 0 & 64\,p_1 & 0 & 100\,p_1 \\
0 & 80\,p_1 & 0 & 0 & 80\,p_1 & 100\,p_1
\end{bmatrix}. \tag{14}
\]
Then, we define the throughput as the amount of successfully delivered data of both A and B per second, which can be given by
\[
\eta = \sum_{q_a=1}^{Q_a} \sum_{q_b=1}^{Q_b} Q(s, a)\, T_{Q_a Q_b}
= \sum_{q_a=1}^{Q_a} \sum_{q_b=1}^{Q_b} \max_{\pi} Q^{\pi}(s, a)\, T_{Q_a Q_b}
= \sum_{q_a=1}^{Q_a} \sum_{q_b=1}^{Q_b} \max_{a} \sum_{s' \in S} P(s' \mid s, a) \big[ R(s, a, s') + \gamma V^{*}(s') \big] \pi(s, a)\, T_{Q_a Q_b}, \tag{15}
\]
where $T$ is the fixed transmission rate of the nodes. After that, we denote $\bar{Q}$ as the average length of the buffers; then we have
\[
\bar{Q} = \sum_{a=1}^{Q_a} \sum_{b=1}^{Q_b} n_{a,b}\, \pi_{a,b}
= \sum_{k=1}^{n} k p^{k}
= \frac{p \left( n p^{n+1} - (n+1) p^{n} + 1 \right)}{(p-1)^{2}}. \tag{16}
\]
The last equality in (16) can be proved by induction as follows. When $n = 1$, we have
\[
p = \frac{p \left( p^{1+1} - (1+1) p + 1 \right)}{(p-1)^{2}} = \frac{p \left( p^{2} - 2p + 1 \right)}{(p-1)^{2}}.
\]
Assume it holds for $n$, $n \geq 2$; then for $n+1$, we have
\[
\begin{aligned}
\bar{Q} &= (n+1) p^{n+1} + \frac{p \left( n p^{n+1} - (n+1) p^{n} + 1 \right)}{(p-1)^{2}} \\
&= \frac{(n+1) p^{n+1} (p-1)^{2}}{(p-1)^{2}} + \frac{p \left( n p^{n+1} - (n+1) p^{n} + 1 \right)}{(p-1)^{2}} \\
&= \frac{p \left( (n+1) p^{n+2} - (n+2) p^{n+1} + 1 \right)}{(p-1)^{2}} \\
&= \sum_{k=1}^{n+1} k p^{k}. \tag{17}
\end{aligned}
\]
Then (16) has been proved. Finally, the average packet delay $\bar{T}$ can be properly defined according to Little's Law in [25], i.e.,
\[
\bar{T} = \frac{\bar{Q}}{\eta}. \tag{18}
\]

3. RESULTS AND DISCUSSION
In this section, we evaluate the performance of the buffer-aided FD-TWR in terms of the throughput and the system delay according to the formulas derived in the above section. For the simulations, the size of the buffers ($N$) is finite, and we adopt $N = 40$ in all simulations. $T_{Q_a Q_b}$ is a variable rate, $T = 5$ is a fixed rate, and $p_1 = p_2 = 1$.

Figure 4 compares the system delay of the buffer-aided FD-TWR with Q-learning and without Q-learning versus the size of the buffers. It reveals that as the size of the buffers increases, the system delay is reduced, and the average packet delay of the full-duplex relay with Q-learning is smaller than that of the relay without Q-learning. With Q-learning, beyond a certain threshold value (15), the buffer size no longer affects the performance of the system; this is due to the reinforcement learning algorithm.

Figure 5 compares the throughput of the buffer-aided FD-TWR with Q-learning and without Q-learning versus the size of the buffers. The results show that when the buffer size exceeds 5, the throughput is constant for both curves; this is due to the optimization of the full-duplex relay, which can receive and send at the same time. Moreover, we observe a correlation between the two curves: both are steady, and the throughput of the buffer-aided FD-TWR with Q-learning overtakes the one without Q-learning.
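The closed form in (16) and the delay in (18) can be verified numerically. The sketch below checks the geometric-series identity against a direct summation and then applies Little's law; the throughput value used is an arbitrary placeholder for illustration, not a result from the simulations.

```python
# Numerical check of Eq. (16),
#   sum_{k=1}^{n} k p^k = p (n p^{n+1} - (n+1) p^n + 1) / (p - 1)^2,
# and of the average delay from Little's law, Eq. (18): T = Q / eta.
def avg_queue_closed_form(n, p):
    """Closed-form average buffer length from Eq. (16); requires p != 1."""
    return p * (n * p ** (n + 1) - (n + 1) * p ** n + 1) / (p - 1) ** 2

def avg_queue_direct(n, p):
    """Direct evaluation of sum_{k=1}^{n} k p^k."""
    return sum(k * p ** k for k in range(1, n + 1))

n, p = 40, 0.9          # buffer size 40 as in the simulations; p is an assumed value
assert abs(avg_queue_closed_form(n, p) - avg_queue_direct(n, p)) < 1e-9

eta = 1.375             # placeholder throughput, not a paper result
T_bar = avg_queue_closed_form(n, p) / eta   # Little's law, Eq. (18)
print(T_bar)
```

The assertion passing for any $n$ and $p \neq 1$ is exactly what the induction proof around (17) establishes.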
Figure 4. System delay versus buffer size (curves: delay with Q-learning and delay without Q-learning; at buffer size 15, the delay is 21.75 without Q-learning and 2.398 with Q-learning)
Figure 5. Throughput versus buffer size (curves: throughput with Q-learning and throughput without Q-learning; at buffer size 5, the throughput is 1.375 Mbps with Q-learning and 0.6768 Mbps without Q-learning)

4. CONCLUSION
In this paper, we studied the throughput and the system delay in FD-TWR networks using a reinforcement learning scheme. Specifically, we considered a buffer-aided relay with finite-length buffers. Based on the Markov decision process, we characterized a reinforcement learning approach that can maximize the aggregate network throughput by taking into consideration the constraint of the buffer state. Moreover, we compared the performance of the FD-TWR with the reinforcement learning approach to the one without it, particularly when the full-duplex relay has finite buffers. The numerical results showed that buffer-aided relaying techniques improve the capacity of relay networks in slow fading environments, and that the reinforcement learning scheme significantly improves the throughput of the system by rapidly queuing or dequeuing the buffers according to the buffer states and the lossy links. Furthermore, the reinforcement learning scheme does not need a large buffer size to maximize the average throughput and minimize the system delay. It also remedies the bottleneck problem of buffer overflow.

ACKNOWLEDGMENTS
This research was supported by Project 1236130208, supported by the CSC (China Scholarship Council).

REFERENCES
[1] C. E. Shannon, et al., "Two-way communication channels," Proc. 4th Berkeley Symp. Math. Stat. Prob., vol. 1, pp. 611-644, 1961.
[2] G. Kramer, et al., "Cooperative communications," Foundations and Trends in Networking, vol. 1, no. 3-4, pp. 271-425, 2007.
[3] P. Popovski and H.
Yomo, "Physical network coding in two-way wireless relay channels," IEEE International Conference on Communications, pp. 707-712, 2007.
[4] L. Ong, "The half-duplex Gaussian two-way relay channel with direct links," IEEE International Symposium on Information Theory, pp. 1891-1895, 2015.
[5] L. Ding, M. Tao, F. Yang, and W. Zhang, "Joint scheduling and relay selection in one- and two-way relay networks with buffering," IEEE International Conference on Communications, pp. 1-5, 2009.
[6] H. Liu, P. Popovski, E. De Carvalho, and Y. Zhao, "Sum-rate optimization in a two-way relay network with buffering," IEEE Communications Letters, vol. 17, no. 1, pp. 95-98, 2013.
[7] Y. Gu and S. Aissa, "Interference aided energy harvesting in decode-and-forward relaying systems," IEEE International Conference on Communications, pp. 5378-5382, 2014.
[8] N. Zlatanov, D. Hranilovic, and J. S. Evans, "Buffer-aided relaying improves throughput of full-duplex relay networks with fixed-rate transmissions," IEEE Communications Letters, vol. 20, no. 12, pp. 2446-2449, 2016.
[9] K. T. Phan and T. Le-Ngoc, "Power allocation for buffer-aided full-duplex relaying with imperfect self-interference cancelation and statistical delay constraint," IEEE Access, vol. 4, pp. 3961-3974, 2016.
[10] M. M. Razlighi and N. Zlatanov, "Buffer-aided relaying for the two-hop full-duplex relay channel with self-interference," IEEE Transactions on Wireless Communications, vol. 17, no. 1, pp. 477-491, 2018.
[11] Z. Zhang, Q. Wu, B. Zhang, and J. Peng, "Intelligent anti-jamming relay communication system based on reinforcement learning," 2nd International Conference on Communication Engineering and Technology, pp. 52-56, 2019.
[12] Z. Chen, T. Lin, and C. Wu, "Decentralized learning-based relay assignment for cooperative communications," IEEE Transactions on Vehicular Technology, vol. 65, no. 2, pp. 813-826, 2015.
[13] F. Shams, G. Bacci, and M. Luise, "Energy efficient power control for multiple-relay cooperative networks using Q-learning," IEEE Transactions on Wireless Communications, vol. 14, no. 3, pp. 1567-1580, 2014.
[14] X. Meng, H. Inaltekin, and B. Krongold, "Deep reinforcement learning-based power control in full-duplex cognitive radio networks," IEEE Global Communications Conference (GLOBECOM), pp. 1-7, 2018.
[15] X. Qiu, T. Jiang, and N. Wang, "Safeguarding multiuser communication using full-duplex jamming and Q-learning algorithm," IET Communications, vol. 12, no. 15, pp. 1805-1811, 2018.
[16] K. Chang and Y. Choi, "Performance evaluation of in-band full-duplex system using one-time-slot two-way relay," IEEE Systems Journal, 2019.
[17] S. Shi, S. Li, and J. Tian, "Markov modeling for practical two-way relay with finite relay buffer," IEEE Communications Letters, vol. 20, no. 4, pp. 768-771, 2016.
[18] B. A. F. Lin, X. Ye, and S. Hao, "Adaptive protocol for full-duplex two-way systems with the buffer-aided relaying," IET Communications, vol. 13, no. 1, pp. 54-58, 2018.
[19] V. Jamali, N. Zlatanov, and R.
Schober, "Bidirectional buffer-aided relay networks with fixed rate transmission, Part II: Delay-constrained case," IEEE Transactions on Wireless Communications, vol. 14, no. 3, pp. 1339-1355, 2015.
[20] A. Shafeeq and K. Hareesha, "Dynamic clustering of data with modified k-means algorithm," Proceedings of the 2012 Conference on Information and Computer Networks, pp. 221-225, 2012.
[21] N. Kamari, I. Musirin, Z. Hamid, and A. A. Ibrahim, "Optimal tuning of SVC-PI controller using whale optimization algorithm for angle stability improvement," Indonesian Journal of Electrical Engineering and Computer Science, vol. 12, no. 2, pp. 612-619, 2018.
[22] M. M. Saufi, M. A. Zamanhuri, N. Mohammad, and Z. Ibrahim, "Deep learning for Roman handwritten character recognition," International Journal of Electrical Engineering and Computer Science, vol. 12, no. 2, pp. 455-460, 2018.
[23] A. H. Basori, A. Tenriawaru, and A. B. F. Mansur, "Intelligent avatar on e-learning using facial expression and haptic," TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 9, no. 1, 2011.
[24] R. Luo, W. Liao, and Y. Pi, "Discriminative supervised neighborhood preserving embedding feature extraction for hyperspectral-image classification," TELKOMNIKA Telecommunication, Computing, Electronics and Control, vol. 10, no. 5, pp. 1051-1056, 2012.
[25] J. D. Little, "OR Forum: Little's Law as viewed on its 50th anniversary," Operations Research, vol. 59, no. 3, pp. 536-549, 2011.

BIOGRAPHIES OF AUTHORS
Francis BETENE is a professional research assistant at REMI-Ultra-electronics with a Master of Science from Hohai University, China (2013). He obtained a PhD degree in wireless technologies from Shanghai Jiao Tong University (China) in 2019. His research spans the fields of electronics, digital systems, wireless communication, signal processing, and artificial intelligence. Recently, he has tackled system applications of reinforcement learning.
He is affiliated with IEEE as a reviewer member. He has served as an invited reviewer for IJECE, IAES journals, and other scientific publications.