International Journal of Electrical and Computer Engineering (IJECE)
Vol. 7, No. 6, December 2017, pp. 3484-3491
ISSN: 2088-8708
DOI: 10.11591/ijece.v7i6.pp3484-3491
Journal Homepage: http://iaesjournal.com/online/index.php/IJECE

Design and Implementation of an Embedded System for Software Defined Radio

A. E. Abdelkareem1, Saad Mohammed Saleh2, and Ammar D. Jasim3
1,3 College of Information Engineering, Al-Nahrain University, Baghdad, Iraq
2 College of Engineering, Diyala University, Diyala, Iraq

Article Info
Article history: Received: Mar 20, 2017; Revised: Jun 18, 2017; Accepted: Jul 8, 2017
Keyword: DSP, Embedded system, Receiver, Synchronization

ABSTRACT
This paper proposes an approach to developing high-performance software for demanding real-time embedded systems. This software-based design will enable software engineers and system architects in emerging technology areas such as 5G wireless and Software Defined Networking (SDN) to build their algorithms. An ADSP-21364 floating-point SHARC Digital Signal Processor (DSP) running at 333 MHz is adopted as the platform for the embedded system. To evaluate the proposed embedded system, an implementation of frame, symbol, and carrier phase synchronization is presented as an application. Its performance is investigated with an online Quadrature Phase Shift Keying (QPSK) receiver. The obtained results show that the designed software is implemented successfully on the SHARC DSP, which can be utilized efficiently for such algorithms. In addition, the proposed embedded system is shown to be practical and capable of dealing with the memory constraints and critical timing that arise from the long interleaved coded data used for channel coding.

Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author: Dr Ammar E. Al-Qassab, College of Information Engineering (COIE), Al-Nahrain University, Baghdad, Jadiria, P.O. Box 64004, Iraq. +964-7705802111, ammar.e@coie-nahrain.edu.iq

1. INTRODUCTION
Embedded systems have gained considerable attention in recent years. Nowadays, embedded software is required everywhere, especially with the advancement of network technology, which necessitates an internet protocol (IP) address for each device to drive the Internet of Things (IoT). In terms of embedded systems, the authors in [1] develop an alarm system embedded in an Altera DE0 FPGA and validate it experimentally, whereas in [2] a software framework is suggested to increase the speed of embedded software.

Selecting the most appropriate DSP processor and handling a real-time signal are important issues. A programmable DSP is more flexible, lower in cost, and faster than other processors, so it has become the best solution for many communication, medical, and industrial products, for which traditional microprocessors are inappropriate.

SHARC has been improved by using separate memories for data and instructions. In addition, it includes a high-speed I/O controller to support Direct Memory Access (DMA). Furthermore, [3] mentions that SHARC provides shadow registers for all of the CPU's registers. They are used to service an interrupt quickly by moving the entire register contents to the shadow registers in a single clock cycle.

Signal processing functions, such as Viterbi decoding, can be implemented using DSPs. For example, the Analog Devices TigerSHARC ADSP-TS101S and SHARC ADSP-21065L can be used in baseband modem implementations. The first of these performs Viterbi decoding at 0.86 MIPS and a 1024-point complex FFT in 32.75 µs, and has been used in a multiprocessor structure with an FPGA by [4] to implement an OFDM underwater acoustic communication system. The second performs a 1024-point FFT in 0.274 ms. In addition, the TMS320C6416 is designed for 3rd Generation Partnership Project (3GPP) turbo codes and is capable of decoding up to 12 Mbps (6 iterations) [5].

In [6], a practical description is given of the design choices and hardware implementation details, based on a Spartan3 xc3s2000 FPGA, required to build an efficient symbol synchronizer. The implementation is suggested for a short-range underwater FSK acoustic modem. Furthermore, in [7] the transmitter has been implemented with multiple ADSP-TS101S DSPs and an FPGA as the logical control. It has been shown experimentally that the signal transmitter satisfies the requirements of OFDM signal transmission in real-time underwater acoustic communication.

The contribution of this paper is to introduce a DSP-based embedded system for software defined radio (SDR). This embedded software is suitable for coherent receivers that work in an online mode. By observing the performance of the system, the frame processing time is allocated to each stage of the receiver accordingly. This shows that the proposed embedded receiver performs adequately. Furthermore, in the proposed design, the reception of incoming frames and their processing phases are overlapped without using external memory.

The remainder of the paper is organized as follows. In Section 2, the proposed system is described. Section 3 introduces the experimental results. Finally, conclusions are drawn in Section 4.

2. PROPOSED SYSTEM
The input bit sequence is encoded to provide immunity against channel errors. A simple rate-1/2 nonsystematic convolutional (NSC) code with constraint length K = 5 is selected as the channel code. In order to permute the data, the encoder output is interleaved. The interleaver consequently randomizes the channel errors.
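The encoding and interleaving steps above can be illustrated with a minimal sketch. The generator polynomials (23, 35 in octal) and the row/column block interleaver are assumptions made only for illustration; the paper states the code rate and constraint length but not the polynomials or the interleaver type. The function names are hypothetical.

```python
# Minimal sketch of a rate-1/2 NSC encoder (K = 5) followed by a block interleaver.
# ASSUMPTIONS: generators (23, 35) octal and a row/column block interleaver are
# illustrative choices; the paper does not specify either.

K = 5          # constraint length (from the text)
G1 = 0o23      # assumed generator polynomial 1 (binary 10011)
G2 = 0o35      # assumed generator polynomial 2 (binary 11101)


def nsc_encode(bits, g1=G1, g2=G2, k=K):
    """Encode a bit list; two output bits per input bit, zero initial state."""
    state = 0  # k-bit shift register, newest input bit in the MSB
    out = []
    for b in bits:
        state = (state >> 1) | ((b & 1) << (k - 1))
        out.append(bin(state & g1).count("1") & 1)  # parity over taps of g1
        out.append(bin(state & g2).count("1") & 1)  # parity over taps of g2
    return out


def block_interleave(seq, rows, cols):
    """Write row by row into a rows x cols array, read column by column."""
    assert len(seq) == rows * cols
    return [seq[r * cols + c] for c in range(cols) for r in range(rows)]


def block_deinterleave(seq, rows, cols):
    """Inverse of block_interleave."""
    assert len(seq) == rows * cols
    out = [None] * (rows * cols)
    i = 0
    for c in range(cols):
        for r in range(rows):
            out[r * cols + c] = seq[i]
            i += 1
    return out
```

Feeding a single 1 followed by zeros returns the impulse response of the two generators, which is a quick way to check the tap ordering. A burst of adjacent channel errors in the interleaved stream is spread `cols` positions apart after deinterleaving.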
Interleaved bits are transmitted using quadrature phase shift keying (QPSK) with a carrier frequency of 10 kHz and a symbol rate of 4 ksps.

Figure 1. Block diagram of the transmitting system

2.1. Transmitter
The block diagram of the transmitting system is shown in Fig. 1. In this system, we suppose that the input message is converted into a bit stream to be transmitted to the other site over a wireless channel. Fig. 2 shows the frame structure. It is composed of a linear frequency modulation (LFM) signal of 10 ms, which is used for frame synchronization, followed by a silent period of 12.5 ms, which mitigates interference between the LFM and the training sequence due to multipath. In addition, a pseudo-noise (PN) sequence is transmitted to initiate carrier phase recovery in the second stage of synchronization. Fig. 3 shows the real-time transmission of the frame presented in Fig. 2. This figure depicts the entire frame structure, which contains two silent periods, the LFM, and the data.

2.2. Receiver
The receiver structure is illustrated in Fig. 4. The received frame can be represented as:

$$y(k) = r(k) + w(k) = x(k) \circledast h(k) + w(k), \tag{1}$$
where $y(k)$ is the received signal at time index $k$ and $h(k)$ represents the channel impulse response, which is given as:

$$h(k) = \sum_{l=0}^{L-1} h_l(k)\,\delta(\tau - \tau_l), \tag{2}$$

where $L$ is the number of taps, $\tau_l$ is the delay associated with the $l$-th tap, $h_l(k)$ is the complex-valued channel fading coefficient of the $l$-th tap, $w(k)$ represents the additive white Gaussian noise (AWGN) samples, and $\circledast$ denotes the circular convolution operation. The transmitted samples $x(k)$ are convolved with $h(k)$, and the received passband signal is given as:

$$r(k) = \sum_{l=0}^{L-1} h(l)\, x\big((k-l)\big)_N, \qquad k = 0, \ldots, N-1, \tag{3}$$

where $((\cdot))_N$ denotes the index taken modulo $N$.

Figure 2. Transmitter frame structure

Figure 3. On-line transmitted signal

Figure 4. Block diagram of the receiver

In this paper, two stages of synchronization are considered on the receiver side as an application for the proposed embedded system. The decoder is also implemented, to be used later. In the decoding stage, the BCJR algorithm, a maximum a posteriori (MAP) alternative to the Viterbi decoder, is considered. Refer to [8] for more details on BCJR. Once synchronization is achieved, reliable data can be delivered to the decoding stage.
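As an illustration of the channel model in Eqs. (1) and (3), the following sketch applies a circular convolution with a set of tap coefficients and adds complex Gaussian noise. The function names, the tap values used in the usage note, and the per-component noise standard deviation are assumptions for illustration only; they are not taken from the experimental setup.

```python
import random

# Minimal sketch of the channel model in Eqs. (1) and (3).
# ASSUMPTIONS: function names and the noise parameterisation (standard
# deviation per real/imaginary component) are illustrative.


def circular_convolve(x, h):
    """r(k) = sum_{l=0}^{L-1} h(l) * x((k - l) mod N), for k = 0..N-1."""
    n = len(x)
    return [sum(h[l] * x[(k - l) % n] for l in range(len(h))) for k in range(n)]


def channel(x, h, noise_std=0.0, rng=None):
    """y(k) = r(k) + w(k), with w(k) complex white Gaussian noise."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    r = circular_convolve(x, h)
    return [rk + complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            for rk in r]
```

For example, `circular_convolve([1, 2, 3, 4], [1, 0.5])` wraps the last sample around to the first output, which is what distinguishes the circular form in Eq. (3) from a linear convolution.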
A. Packet synchronization
The received signal $r(k)$ is sampled by the analog-to-digital converter (ADC) and scaled. The LFM, which forms the header of the frame structure shown in Fig. 2 and marks the start of the packet, is detected by comparing a threshold value with the cross correlation of the received signal $r(k)$ and the known LFM $t_n$. Equation (4) shows how to calculate the cross correlation [9]. The value of $k$ that corresponds to the maximum absolute value of the cross correlation is the packet timing estimate. If the cross correlation is greater than the threshold value, frame synchronization is achieved.

$$\hat{t}_s = \arg\max_{k} \left| \sum_{n=0}^{C_l - 1} r_{k+n}\, t_n \right| \tag{4}$$

In Equation (4), the length $C_l$ of the cross correlation determines the performance of the algorithm. Larger values improve performance but increase the amount of computation required. In hardware implementations, FIR correlation is adopted: the filter convolves the received signal with the flipped (time-reversed) version of the LFM. Fig. 5 depicts the real-time cross correlation detection of the LFM peak signal. This signal is then filtered in the frequency band $[f_c - R/2,\; f_c + R/2]$, where $R$ is the data rate. The sampling rate is chosen to be an integer multiple of the carrier frequency $f_c$ for the sake of simple data manipulation.

Figure 5. Peak detection of the received signal
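The detection rule of Eq. (4) can be sketched as a direct sliding correlation. The function names are hypothetical. The chirp band (8 kHz to 12 kHz), the 40 kHz sampling rate, and the half-energy threshold in the usage note are assumptions chosen to match the 10 kHz carrier and 10 ms LFM duration; the paper does not give these values.

```python
import math

# Minimal sketch of LFM packet detection by cross correlation, Eq. (4).
# ASSUMPTIONS: chirp band, sampling rate, and threshold rule are illustrative.


def lfm_chirp(f0, f1, duration, fs):
    """Real linear-FM chirp sweeping from f0 to f1 Hz over `duration` seconds."""
    n = int(round(duration * fs))
    slope = (f1 - f0) / duration
    return [math.cos(2.0 * math.pi * (f0 * (i / fs) + 0.5 * slope * (i / fs) ** 2))
            for i in range(n)]


def detect_packet(r, t, threshold):
    """Return (k_hat, peak): the lag maximising |sum_n r[k+n] t[n]|.

    k_hat is None when the peak does not exceed the threshold.
    """
    c_len = len(t)
    best_k, best_val = None, 0.0
    for k in range(len(r) - c_len + 1):
        c = abs(sum(r[k + n] * t[n] for n in range(c_len)))
        if c > best_val:
            best_k, best_val = k, c
    if best_val > threshold:
        return best_k, best_val
    return None, best_val
```

A typical call is `detect_packet(received, lfm_chirp(8000.0, 12000.0, 0.010, 40000), threshold)`. On the DSP the same correlation would normally run as an FIR filter whose coefficients are the time-reversed chirp, as described above; the explicit double loop here is only for clarity.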
B. Symbol Synchronization
Early-late timing recovery is adopted. This symbol synchronization algorithm generates its error from samples that are early and late compared to the ideal sampling point. We use a buffer of length $N = 12$ to store the matched filter output and measure the energy in the left (early) and right (late) halves of the buffer as:

$$E_{early} = \sum_{n=0}^{5} \left( x_I[n] \right)^2 + \left( x_Q[n] \right)^2, \qquad E_{late} = \sum_{n=6}^{11} \left( x_I[n] \right)^2 + \left( x_Q[n] \right)^2 \tag{5}$$

When the sampling instant is not ideal, the early and late samples have different amplitudes, and comparing them yields the timing error. The early-late synchronizer drives this error toward zero in order to maintain symbol timing. A delay line of one symbol time $T_s$ is created, and the total energy in the early samples is compared with that in the late samples. The sample used for later processing is the one that lies in the middle of the early and late samples. Symbol timing is then adjusted to maintain approximately equal energy in the two halves, so that the center of the delay line corresponds to the optimal sampling point.

C. Carrier synchronization
The I-Q constellation exhibits a varying phase offset due to the carrier frequency and phase mismatch between the transmitted and local carriers. Thus, carrier recovery is necessary for coherent receivers [10]. Decision-directed carrier phase recovery via a Costas loop is utilized, as shown in Fig. 6, and embedded in the DSP platform at the receiver side.

Figure 6. Decision-directed carrier phase loop

Fig. 6 is represented by Algorithm 1, which is implemented on the receiver side. In this algorithm, the adaptive step size is important in terms of the convergence period. It must be varied to get satisfactory results, and it is recommended to start with 0.01.
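A minimal sketch of the decision-directed phase correction loop follows. The complex LMS-style update of the correction term is an assumption, since the text does not state the exact update rule, and the step size is kept fixed here rather than varied. The function names are hypothetical.

```python
import cmath

# Minimal sketch of decision-directed carrier phase recovery for QPSK.
# ASSUMPTIONS: the LMS-style update c += mu * e * conj(x) and the fixed step
# size are illustrative; I and Q are assumed pre-scaled to about +/-1.0.


def qpsk_decision(y):
    """Nearest QPSK point with I and Q in {-1, +1}."""
    return complex(1.0 if y.real >= 0 else -1.0, 1.0 if y.imag >= 0 else -1.0)


def dd_phase_recovery(x, mu=0.01):
    """Return (corrected symbols, final phase correction term)."""
    c = complex(1.0, 0.0)              # initial phase correction 1.0 + j0
    corrected = []
    for xn in x:
        yn = xn * c                    # apply the current correction
        dn = qpsk_decision(yn)         # decision on I and Q
        en = dn - yn                   # error between decision and output
        c += mu * en * xn.conjugate()  # assumed LMS-style update
        corrected.append(yn)
    return corrected, c
```

For a constant phase offset smaller than 45 degrees, the correction term converges to the conjugate rotation, so `cmath.phase(c)` approaches the negative of the offset. A smaller step size slows convergence and a larger one speeds it up at the cost of more jitter once noise is present, which is why the step size is treated as a tuning parameter.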
Algorithm 1: Decision-directed carrier recovery algorithm via Costas loop for QPSK
Result: Phase correction
Initialization:
    Adjust the scaling of the I and Q symbol values to be close to 1.0;
    Set the initial phase correction in the adaptive algorithm to 1.0 + j0;
while no phase correction do
    Find $y_{I,Q}(n) = x(n)\,c(n)$, where $x(n)$ and $c(n)$ are the synchronizer output and the phase correction, respectively;
    Make a decision on $y_{I,Q}(n)$ to obtain $\Re\{d(n)\}$, $\Im\{d(n)\}$;
    Calculate the phase error $e_{I,Q}(n) = d_{I,Q}(n) - y_{I,Q}(n)$;
    Update the phase correction in the adaptive algorithm;
    Vary the adaptive step size;
end
where the operators $\Re$ and $\Im$ denote the real and imaginary parts, respectively.

2.3. DSP Implementation
The Von Neumann architecture uses a single memory for both data and instructions; however, this type of architecture [11] dissipates more power than a conventional DSP architecture. As the proposed receiver has multistage synchronization, it is important to avoid any wait state in the system. This is offered by the Super Harvard Architecture (SHARC), where the address lines of data and instructions are split.

In the proposed system, the ADSP-21364 SHARC EZ-KIT Lite has been selected. It consists of a 333 MHz SHARC DSP with an audio codec which provides two 24-bit ADC inputs and eight 24-bit DAC outputs at a maximum sampling rate of 96 kHz. It also provides a serial peripheral interface (SPI) link. To obtain maximum processor utilization, a double buffering technique is used. The SHARC DSP has a direct memory access (DMA) coprocessor, which reads/writes a block of data from/to memory while the processor core works on another block. The development environment was VisualDSP++. ADC data, in 24-bit signed integer format (±8388608), must first be converted to floating-point representation.

3. EXPERIMENTAL RESULTS
The proposed system is investigated and emulated in the lab with a wire link between transmitter and receiver in order to calibrate the system. In Fig. 7, the upper plot shows a cut in the reception, represented by the concentrated area of the QPSK signal. This concentrated area visualizes the state of the signal when the processor falls out of real-time operation, which consequently affects the constellation outputs, as shown in the lower plot, which is the eye diagram of the received QPSK signal. This is due to a long execution time at the receiver side, especially in the decoding stage, where time is critical, and consequently the DSP ran out of real-time mode. This type of run-time error is subtle and cannot be detected quickly. With the aid of Fig. 7, the problem was identified and handled by adopting pipelining and double buffering through interrupt programming.
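The double buffering and 24-bit ADC conversion described in Section 2.3 can be sketched in software as follows. This is a sequential simulation under stated assumptions: on the DSP the DMA engine and the core run concurrently and a block-complete interrupt triggers processing, whereas here a function call stands in for the DMA transfer. The class and function names are hypothetical and do not correspond to any VisualDSP++ API.

```python
# Minimal sketch of ping-pong (double) buffering with 24-bit ADC conversion.
# ASSUMPTIONS: sequential simulation; dma_write() stands in for the DMA engine
# and its return value stands in for the block-complete interrupt.

FULL_SCALE = 8388608  # 2**23, full scale of a signed 24-bit sample


def int24_to_float(raw):
    """Sign-extend a 24-bit two's-complement word and scale to [-1.0, 1.0)."""
    raw &= 0xFFFFFF
    if raw & 0x800000:
        raw -= 0x1000000
    return raw / FULL_SCALE


class PingPongBuffer:
    def __init__(self, block_size):
        self.buffers = [[0] * block_size, [0] * block_size]
        self.fill_index = 0  # buffer currently being filled by the "DMA"
        self.pos = 0

    def dma_write(self, raw_sample):
        """Store one sample; return the index of a completed block, else None."""
        buf = self.buffers[self.fill_index]
        buf[self.pos] = raw_sample
        self.pos += 1
        if self.pos == len(buf):
            ready = self.fill_index
            self.fill_index ^= 1  # swap: the DMA now fills the other buffer
            self.pos = 0
            return ready
        return None


def run(samples, block_size, process):
    """Feed raw samples through the ping-pong buffer, processing full blocks."""
    pp = PingPongBuffer(block_size)
    results = []
    for s in samples:
        ready = pp.dma_write(s)
        if ready is not None:  # simulated block-complete interrupt
            block = [int24_to_float(v) for v in pp.buffers[ready]]
            results.append(process(block))
    return results
```

The real-time constraint illustrated by Fig. 7 corresponds to `process` having to finish before the other buffer fills; if it does not, the DMA overwrites data that is still being processed.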
Fig. 8 depicts the real-time constellation obtained with the proposed embedded QPSK receiver. The adaptive step size was 0.005, and a zero error rate was obtained. This figure shows that multistage synchronization was achieved on the embedded system. A notable point is that the memory requirement (including the interleaver) is equivalent to $21 N_f$ symbols, which is accommodated by the 3 Mbit on-chip memory provided by the SHARC processor. In order to evaluate the proposed system, the obtained result, in terms of memory utilization, is compared with [12]; our system requires one block of memory less than theirs, which used $22 N_f$. It is worth mentioning that there was insufficient memory space to buffer the entire frame in both the transmitter and the receiver. Data memory (DM) and program memory (PM) space, together with the heap, are exploited to tackle these memory constraints. In terms of the number of operations, Table 1 lists the number of operations (multiply, divide, add, subtract) for each receiver stage; read and write operations have not been considered.

Table 1. Receiver operations

Receiver stage      No. of operations
BPF+LPF             179
Synchronization     388
Figure 7. Out of real-time reception

Figure 8. On-line synchronized reception of the QPSK signal

4. CONCLUSION
The focus of this paper was on the design and implementation of an embedded system for software defined radio using a SHARC DSP. The proposed embedded system has been investigated online through an implementation of QPSK synchronization schemes and convolutional decoding. It was assessed in the laboratory to calibrate the operation. This paper presented a technique to tackle both critical-time and memory-constraint issues. In conclusion, pipelining and double buffering are useful for gaining processing time and should be considered. The interleaver length is crucial in selecting the DSP memory, since the whole frame must be buffered. The obtained results show that the implementation is robust, works online successfully, and can be considered for many embedded systems.

REFERENCES
[1] A. Zakwan, et al., "Implementation of Algorithm for Vehicle Anti-Collision Alert System in FPGA," International Journal of Electrical and Computer Engineering (IJECE), vol. 7, pp. 775-783, Apr. 2017.
[2] M. Abdurohman and A. Sasongko, "Software for Simplifying Embedded System Design Based on Event-Driven Method," International Journal of Electrical and Computer Engineering (IJECE), vol. 5, pp. 491-502, Jun. 2015.
[3] Analog Devices, Embedded Processor and DSP Selection Guide, Analog Devices, 2005.
[4] Z. Yan, J. Huang, and C. He, "Implementation of an OFDM underwater acoustic communication system on an underwater vehicle with multiprocessor structure," Frontiers of Electrical and Electronic Engineering, vol. 2, pp. 151-155, 2007.
[5] M. R. Soleymani, Y. Gao, and U. Vilaipornsawai, Turbo Coding for Satellite and Wireless Communications, Springer Netherlands, 2002.
[6] Y. Li, et al., "Hardware Implementation of Symbol Synchronization for Underwater FSK," Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC), IEEE, 2010, pp. 82-88.
[7] L. Kaizhuo, et al., "Design and implementation of underwater OFDM acoustic communication transmitter," in Audio, Language and Image Processing, 2008. ICALIP 2008. International Conference on, 2008, pp. 609-613.
[8] L. Bahl et al., "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Transactions on Information Theory, vol. 20, pp. 284-287, Mar. 1974.
[9] J. Heiskala and J. Terry, OFDM Wireless LANs: A Theoretical and Practical Guide, SAMS, ISBN: 0672321572.
[10] J. G. Proakis and M. Salehi, Digital Communications, McGraw-Hill, Fifth Edition, 2008.
[11] W. H. Park, M. H. Sunwoo, and S. K. Oh, "Efficient DSP architecture for Viterbi decoding with small trace back latency," Asia Pacific Conference on Circuits and Systems, IEEE, 2004, pp. 2813-2818.
[12] Y. Or, G. Kutz, A. Chass, A. Gubeskys, and E. Pollak, "Iterative decoding algorithms for real time software implementation in wireless communication systems," IEEE Conf., VTS, vol. 3, pp. 1884-1888, 2001.