Indonesian Journal of Electrical Engineering and Computer Science
Vol. 16, No. 3, December 2019, pp. 1286-1296
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v16i3.pp1286-1296

A streaming multi-class support vector machine classification architecture for embedded systems

Jeevan Sirkunan, J.W. Tang, N. Shaikh-Husin, M. N. Marsono
School of Electrical Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, Johor, Malaysia

Article history: Received Dec 28, 2018; Revised Jun 1, 2019; Accepted Jun 28, 2019
Keywords: Multi-class SVM; FPGA; embedded system

ABSTRACT
Pedestrian detection, face detection, speech recognition and object detection are some of the applications that have benefited from hardware-accelerated Support Vector Machine (SVM). The computational complexity of SVM classification makes it challenging to design a hardware architecture with real-time performance and low power consumption. On an embedded streaming architecture, testing data are mostly stored in external memory. Data are transferred in streams, with the maximum bandwidth limited by the bus bandwidth. The hardware implementation of SVM classification needs to be sufficiently fast to keep up with the data transfer speed. A prior implementation throttles the data input to avoid overwhelming the computational unit, which creates a bottleneck in the streaming architecture. In this work, we propose a streaming-architecture multi-class SVM classifier for embedded systems that is fully pipelined and able to process data continuously without any need to throttle the input stream. The proposed design targets embedded platforms where testing data are transferred in streams from external memory. The architecture is modeled in Verilog, and the evaluation targets an Altera Cyclone IV field programmable gate array platform.
Performance profiling of the proposed architecture is done with regard to the number of features and support vectors. For validation, the proposed architecture is simulated using ModelSim and the results are compared with LibSVM. Based on the simulation results, the proposed architecture produces a throughput of 1/N_f classifications per clock cycle, where N_f is the number of features.

Copyright (c) 2019 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Jeevan Sirkunan, School of Electrical Engineering, Faculty of Engineering, Universiti Teknologi Malaysia, Johor, Malaysia.
Email: jeevan2@live.utm.my

1. INTRODUCTION
Pervasive computing has led to the emergence of applications that demand computational intelligence and analytics. Entertainment, sensor networks, health care and environmental monitoring are some example applications that have embedded pervasive computing into their systems [1]. These applications need to process large amounts of data, which requires massive data parallelism and high-bandwidth data transfer between the processors and off-chip memory. Such data access patterns render on-chip caches mostly ineffective [2]. At the same time, some of these applications require a small, low-power processor with real-time operating performance, especially in an embedded system [1]. There are several ways of dealing with this demand. One method is to use reconfigurable hardware such as a field programmable gate array (FPGA), which provides high-performance computation at minimal cost overhead and power consumption [3].

Journal homepage: http://ijeecs.iaescore.com/index.php/IJEECS
Support Vector Machine (SVM) is an accurate binary classifier based on a solid theoretical background [4-6]. Some applications that have benefited from SVM hardware acceleration are pedestrian detection [7], face detection [8], speech recognition [9] and object detection [10, 11]. Due to SVM's computational complexity, designing a hardware architecture with low power consumption and real-time classification performance remains a challenge [12]. Many existing custom SVM hardware implementations focus on accelerating an application-specific decision function [8, 13]. These designs cannot easily be reused for other applications, as they are optimized for specific ones [8]. Works that target SVM acceleration with a co-processor unit [2, 14, 15, 16] tend to focus on the kernel, due to its compute-intensive nature and its suitability for parallelization. The performance of a single accelerator or co-processor unit is highly dependent on the rate of data transfer. With limited on-chip memory in an embedded system, data are stored in off-chip memory. Data are transferred in streams, and the maximum bandwidth depends on the bus width of a particular platform. Kane et al. [12] proposed a fully-pipelined SVM architecture that supports multi-class classification and was tested with a wide range of data sets. However, their implementation requires input throttling to prevent the data input from overwhelming the processing unit, which limits the throughput of their architecture under high load. In this work, we propose a multi-class, fully pipelined SVM classification streaming architecture for embedded systems. The architecture produces output at the same rate as the data input, without any input throttling. The proposed hardware architecture is based on the LibSVM model data structure.
Performance evaluation of the proposed hardware implementation is done with regard to the number of support vectors and features. Results show that the proposed design has an initial latency approximately equal to the number of support vectors and is able to process data continuously without any need to throttle the data transfer. This feature is crucial, as testing data are transferred in streams from external memory.

2. RESEARCH METHOD
Traditionally, there have been two fundamentally different types of tasks in machine learning: unsupervised and supervised. The goal of unsupervised learning is to find patterns and structure in the data, while supervised learning learns a mapping from a labeled data set [18]. SVM falls under the latter category. In general, SVM tasks can be divided into two: training and classification. Based on a labeled data set, SVM is trained to obtain its decision function. With the decision function, unknown data can then be classified.

2.1. Training
SVM is inherently a binary classifier; in the training phase, it determines the decision boundary that maximizes the margin between the two classes [19]. To handle data that cannot be separated linearly in a low-dimensional feature space, kernel functions are used to project the data into a high-dimensional space where it can be linearly separated [20]. Equations (1) and (2) show the dual Lagrange problem used to obtain the SVM decision function. K(x_i, x_j) is the kernel function, b is the bias parameter and C is the trade-off parameter between maximizing the margin and minimizing the error.

$$\text{Maximize } L = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
$$\text{subject to } 0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0 \tag{1}$$

$$b = y_i - \sum_{j=1}^{N} \alpha_j y_j K(x_j, x_i) \tag{2}$$

where:
α_i : Lagrangian multiplier
N : total number of training samples
x_i : training data (feature set)
y_i : training data (label)
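As a concrete check of equations (1) and (2), a two-point problem can be solved by hand; the NumPy sketch below (our own illustration, not from the paper) verifies that the hand-derived multipliers maximize the Lagrangian and reproduce the expected bias under a linear kernel.

```python
import numpy as np

# Worked example for equations (1) and (2), assuming a linear kernel.
# Two training points x1 = -1 (y1 = -1) and x2 = +1 (y2 = +1):
# the constraint sum(alpha_i * y_i) = 0 forces alpha1 = alpha2 = a,
# and L = 2a - 2a^2 peaks at a = 0.5.
x = np.array([-1.0, 1.0])
y = np.array([-1.0, 1.0])
alpha = np.array([0.5, 0.5])

K = np.outer(x, x)                          # linear kernel matrix K(x_i, x_j)
L = alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

# Equation (2): bias computed from support vector i = 2
b = y[1] - (alpha * y) @ K[:, 1]

print(L)   # 0.5, the maximum of 2a - 2a^2
print(b)   # 0.0, the separating boundary sits at the origin
```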
The main aim of the training functions (equations (1) and (2)) is to obtain the α_i and b values. If α_i is 0, the corresponding training feature vector x_i is not a support vector. If α_i is very high, then the corresponding training feature x_i has a strong influence over the decision surface of the hyperplane (x_i becomes a support vector).

2.2. Classification
Equation (3) shows the SVM decision function d(u) for classification. The computation only extends over the number of support vectors N_sv, where the support vectors are a subset of the training data. The class of an unknown input u is determined by the sign of the function in equation (3).

$$d(u) = \mathrm{sign}\left(\sum_{i=1}^{N_{sv}} \alpha_i K(x_i, u) + b\right) \tag{3}$$

2.3. Kernel Function
Kernel functions efficiently map non-linear datasets into a high-dimensional linear feature space, which allows SVM to handle non-linear data. The type of kernel to be employed depends entirely on the characteristics of the application data sets, which is beyond the scope of this paper. Some of the main kernel functions used in the literature are linear (equation (4)), polynomial (equation (5)), Gaussian radial basis function (RBF) (equation (6)) and sigmoid (equation (7)).

$$K(x_i, u) = x_i \cdot u \tag{4}$$
$$K(x_i, u) = (\gamma (x_i \cdot u) + r)^d \tag{5}$$
$$K(x_i, u) = e^{-\gamma \|x_i - u\|^2}, \quad \gamma > 0 \tag{6}$$
$$K(x_i, u) = \tanh(\gamma (x_i \cdot u) + r) \tag{7}$$

3. RELATED WORKS
Extant works on accelerating SVM on FPGA can be divided into two main groups: training and classification. Accelerating the training phase matters for applications where online learning is required. Applications such as adaptive channel equalization [21] and sketch recognition [22] demand short training times for fast adaptation, since the characteristics of the data change over time. On the other hand, accelerating the classification phase is important when a task requires a fixed classification module with large amounts of data to be classified [8, 13].
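Equations (3)-(7) translate directly into a few lines of NumPy. The sketch below is purely illustrative (the function and parameter names are ours, and the gamma, r and d defaults are assumptions): each kernel maps the support vector matrix and an input vector to one value per SV, and the decision is the sign of the coefficient-weighted sum.

```python
import numpy as np

def linear(sv, u):
    return sv @ u                               # equation (4): dot product per SV

def polynomial(sv, u, gamma=1.0, r=0.0, d=3):
    return (gamma * (sv @ u) + r) ** d          # equation (5)

def rbf(sv, u, gamma=0.5):
    return np.exp(-gamma * np.sum((sv - u) ** 2, axis=-1))  # equation (6)

def sigmoid(sv, u, gamma=1.0, r=0.0):
    return np.tanh(gamma * (sv @ u) + r)        # equation (7)

def decide(alpha, sv, bias, u, kernel=linear):
    # equation (3): sign of the coefficient-weighted kernel sum plus bias
    return np.sign(alpha @ kernel(sv, u) + bias)

# Toy model: two SVs with opposite coefficients, zero bias
alpha = np.array([1.0, -1.0])
sv = np.array([[1.0, 0.0], [0.0, 1.0]])
print(decide(alpha, sv, 0.0, np.array([2.0, 0.0])))  # -> 1.0
```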
Sequential Minimal Optimization (SMO) [23] is one of the most commonly used algorithms for solving the SVM training problem. However, the SMO algorithm was designed for single-threaded computers [16]. Besides SMO, there are also Gilbert's algorithm [24] and the Least Squares Support Vector Machine (LS-SVM) [25] for training. Works targeting training either implement the whole system on-chip or offload to a co-processor [2, 14, 15, 16] to accelerate the training process. The task targeted for hardware implementation is the kernel, as it is a compute-intensive task that benefits from parallelization. The SVM training phase produces a model, which is later used in the classification phase. This model then dictates the architecture, logic resource and memory bit utilization of the hardware implementation for SVM classification. For the classification part, many existing custom hardware implementations focus on accelerating the decision function for a specific task [8, 9, 11, 13, 26]. These designs cannot easily be reused for other purposes, as they target a fixed number of support vectors and classes. Another approach to SVM classification on FPGA is the cascaded SVM [27]. The cascaded SVM architecture of [27] combines a high- and a low-precision module, deployed depending on the rate of the incoming testing data. The low-precision module processes incoming data first, since it has higher throughput potential. Data that cannot be classified with certainty are transferred to the high-precision module, which has lower throughput potential but can classify the data more accurately. However, [27] focused solely on the binary classification problem. The implementation of SVM classification in hardware requires many multipliers, especially in the kernel modules.
This has led to the use of the CORDIC algorithm in implementing SVM classification accelerator modules. Ruiz-Llata et al. [1] proposed an SVM classifier with multi-class support using the
CORDIC algorithm to replace multipliers [1]. However, timing performance was not reported and the work was not benchmarked against any CPU or GPU implementation. As the CORDIC method is iterative, it would suffer a performance reduction on more complex data sets with larger numbers of support vectors and classes. The CORDIC loop can be unrolled into fixed pipeline stages; however, this would consume additional hardware and would affect accuracy for certain input conditions. Multiplications in FPGA were avoided in earlier implementations because they were resource-hungry; newer FPGA families, however, have dedicated hardware multipliers and MAC units [17]. Kane et al. [12] proposed a fully pipelined SVM decision architecture for multi-class classification with a generic design, which was then tested with a wide variety of datasets. This work performed better than GPU and CPU implementations in terms of throughput, with a practically negligible trade-off in accuracy. However, their work uses a stalling mechanism to stop input data from overflowing the processing unit: the rate of input data must be throttled to match the maximum output rate of the hardware. In this work, we propose a streaming architecture for embedded systems that is fully pipelined and operates without any need to throttle the input data rate. This allows the proposed architecture to achieve the maximum allowable throughput for a given input data rate. The proposed architecture targets embedded platforms where testing data are stored in an external memory source and continuously streamed in through a fixed memory bandwidth. Testing data are computed as they arrive, which makes the output throughput equal to the input throughput.
The architecture has an initial latency approximately equal to the number of support vectors.

4. LIBSVM MODEL
The proposed architecture is based on the model generated by LibSVM [28]. During the training phase, based on a set of training data, LibSVM creates an SVM model file. This file contains the important parameters of the decision function. The classifier model also contains information on the type of classification, the kernel type, the kernel parameters, the number of support vectors from each class and a bias value. For each binary training problem, LibSVM generates a set of α coefficients that is used in the decision function. There are many multi-class approaches for SVM, such as pairwise, DAG, and one-versus-all. LibSVM uses the pairwise method. Therefore, for each multi-class SVM model there will be n!/(2(n-2)!) sets of coefficients, where n is the number of classes. Figure 1 shows the format of the SVM classifier model with four classes as generated by LibSVM. N_sv represents the number of support vectors, N_f the number of features, and N_c the number of classes. During the training phase, each support vector is compared with the other support vectors to ensure that no redundant SVs are stored in the model. In the coefficient column, an SV belonging to another binary classifier has its coefficient set to zero. Based on this structure, each data input can be multiplied with all SVs and coefficients without requiring a multiplexer for selection in the SVM hardware module.

Figure 1. SVM model format
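For the pairwise scheme above, the coefficient-set count n!/(2(n-2)!) is simply "n choose 2", and a LibSVM model file stores n-1 coefficient columns per support vector (one for each pairing of the SV's own class against another). A small bookkeeping sketch, with helper names of our own choosing:

```python
from math import comb

def num_binary_problems(n_classes):
    # n!/(2(n-2)!) pairwise classifiers, i.e. C(n, 2)
    return comb(n_classes, 2)

def coef_columns_per_sv(n_classes):
    # LibSVM's sv_coef stores n-1 coefficient columns per support vector
    return n_classes - 1

# For the four-class model of Figure 1:
print(num_binary_problems(4))   # -> 6 pairwise classifiers
print(coef_columns_per_sv(4))   # -> 3 coefficient columns per SV
```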
5. PROPOSED ARCHITECTURE
This section presents the proposed architecture; Figure 2 shows an overview. The architecture is divided into four sections: kernel computation, coefficient weighting, voting summation, and vote sorting. Each of these modules is structured to produce output every clock cycle. Our design is parameterizable in terms of N_sv and N_f in order to handle various types of applications. The input of the proposed architecture is an external memory connected to the FPGA. Unlike on-chip memory, where data can be accessed simultaneously, external memory bandwidth is limited. For FPGA devices with higher external memory transfer bandwidth, the proposed design can be expanded with the same structure to support the higher-bandwidth data input. For our implementation, we fixed the input at one feature per clock cycle.

Figure 2. Proposed architecture overview

5.1. Kernel Computation
This module performs the kernel computation as the input data is streamed in continuously every clock cycle, resulting in a system throughput of 1/N_f. In each clock cycle, a single feature is transferred to the module. Figure 3 shows an overview of the linear kernel architecture. The number of multipliers needed is based on the number of SVs available in the model. The linear kernel implementation is based on LibSVM; this structure is maintained so that it can easily be substituted with a different type of kernel in future implementations. From the SVM model in Figure 1, the kernel computation can be parallelized over either N_f or N_sv. Parallelizing over the number of SVs matches the input data rate of one feature per clock cycle; therefore, this is the option chosen for the proposed design. Figure 3.
Linear kernel architecture

In our implementation, each data input is transferred through a shift register, in contrast to the work proposed in [26], where the input data is broadcast to every SV at the same time. This reduces the fan-out of the input data, which spans the support vector memory outputs and would be large for applications with many SVs. Since data traverse the SV memory with a one-clock-cycle delay, each SV is also accessed with a one-clock-cycle delay. Therefore, to avoid multiple individual memory blocks, a counter and a complicated control unit, we concatenate all the support vectors into a single memory block based on the
format in Figure 4b. With this scheme, only a single memory block and counter are required. Figure 4a shows the conventional method [28] for storing support vectors in memory. The proposed method (Figure 4b) is based on the conventional method (Figure 4a), with each SV column shifted one row relative to its left neighbor.

(a) Conventional storage format of support vector memory [28]
(b) Proposed storage format of support vector memory
Figure 4. Storage format of support vector memory

5.2. Coefficient Weighting
Based on the SVM classification equation (3), the kernel result for each SV is multiplied by its corresponding coefficient. The results are then summed together with the bias. The sign of the final result, greater or less than zero, determines the class. The pairwise multi-class approach of [28] compares one class against another. During the training phase, one class is assigned to the positive class while the other is the negative class; the generated coefficients follow this notation. The coefficient signs of an SVM model with four classes are shown in Figure 5. If the coefficients for 1vs2 are positive, the corresponding 2vs1 coefficients are negative. Based on this, equation (3) is expanded into equation (8), where N_sv+ and N_sv- denote the SVs with positive and negative coefficients, respectively. The bias parameter b can be placed on either side of equation (8) depending on its sign; as a result, the hardware module does not require signed integer computation.

Figure 5. Coefficient sign notation for an SVM model with 4 classes

$$\sum_{i=1}^{N_{sv}^{+}} \alpha_i K(x_i, x) + b > \sum_{i=1}^{N_{sv}^{-}} \alpha_i K(x_i, x) \tag{8}$$

Figure 6 shows the structure of the coefficient weighting module. The α values are loaded into the FPGA as constants. The resultant value from each SV is summed and stored in an individual
FIFO. The FIFOs are needed in this architecture to store the results from the coefficient weighting module before they are computed in the next module. Testing data are computed from quadrant 1 through quadrant 4 (Figure 5). Once the data reach quadrant 2, the comparison between classes 1 and 2 can be performed. In quadrant 3, the comparison of classes 1 and 2 with class 3 can be performed, and finally in quadrant 4, the comparison of classes 1, 2 and 3 with class 4 can be performed. The FIFOs are loaded depending on which SVs have been completed.

Figure 6. Alpha coefficient weighting architecture

5.3. Voting Summation
The outputs of the coefficient weighting module are connected to comparators to determine the winner of each class comparison (Figure 6). Based on Figure 1, testing data inputs are computed with the SVs of class 1, followed by classes 2, 3 and finally 4. The result of each comparator is stored in another set of FIFOs. Figure 7 shows an overview of the voting summation module. The FIFOs are shifted out depending on which SVs have been completed. The inverter produces an inverted result for the opposing class. Once the SVs of all classes are completed, the results are summed and stored in registers, one register per class.

Figure 7. Voting summation architecture

5.4. Voting Sort
Finally, the results of all summations are sorted to determine the class. Figure 8 shows the architecture of the sorting mechanism. Each result register from the voting summation is concatenated with its class index. Once sorted, the index gives the labeled class. The architecture is based on bitonic sort [29], but our proposed architecture only determines the maximum value within the list. As in LibSVM, if two classes have identical summation results, the class with the smaller index number wins.
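Behaviourally, the voting summation and voting sort of Sections 5.3 and 5.4 amount to counting pairwise wins and breaking ties toward the smaller class index. A Python sketch of that semantics (our own naming, not the Verilog module interface):

```python
def vote_winner(pairwise_results):
    # pairwise_results: dict {(i, j): winner_class} over all class pairs i < j.
    # Each binary comparison contributes one vote to its winner.
    votes = {}
    for (_i, _j), winner in pairwise_results.items():
        votes[winner] = votes.get(winner, 0) + 1
    # Maximum vote count wins; ties resolve to the smaller class index,
    # matching the LibSVM convention noted above.
    return min(votes, key=lambda c: (-votes[c], c))

# Three classes: 0 beats 1, 0 beats 2, 1 beats 2 -> class 0 wins with 2 votes
print(vote_winner({(0, 1): 0, (0, 2): 0, (1, 2): 1}))  # -> 0
```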
Figure 8. Voting sort architecture

6. RESULT AND DISCUSSION
Depending on the targeted application, the numbers of SVs, features and classes vary. Based on the implementation of [12] with a linear kernel, the SVM model for the DNA dataset [30] has three classes, 180 features and 402 SVs, while the model for the Vowel dataset [31] has 11 classes, 10 features and 268 SVs. SVM configurations for different applications differ in these parameters. Our analysis covers two scenarios, evaluating the proposed architecture with different numbers of features and SVs. In Experiment 1, the number of support vectors is fixed at 20 and the input data size is a 16-bit integer; the number of features is varied from 10 to 100 in increments of 10. In Experiment 2, the number of features is fixed at 32, again with 16-bit integer input; the number of support vectors is varied from 10 to 100 in increments of 10. The number of classes is kept constant at four. For both experiments, the logic resource utilization, internal memory bit usage, maximum operating frequency (in MHz) and number of embedded multipliers are analysed. Performance profiling of the proposed architecture was done using Quartus II v13.0 software, and the target FPGA device is a Cyclone IV EP4CE115F29C7. For validation, the proposed architecture was simulated using ModelSim v10 to obtain cycle-accurate results, which were compared with LibSVM. Tables 1 and 2 show the results for Experiments 1 and 2, respectively. The number of SVs has a greater impact on logic utilization: as the number of SVs increases, the number of multipliers increases, whereas if the number of features increases, only the number of adders increases.
The internal memory bit utilization also increases with the number of SVs, mainly due to the SV memory and the FIFOs needed to store temporary results. The number of SVs does not have a significant impact on the maximum operating frequency (Table 2); the minor differences are mainly due to placement and routing. On the other hand, as the number of features increases, the maximum operating frequency decreases, due to the increase in the size of the adders and multipliers in the architecture.

Table 1. Performance evaluation of the proposed architecture for different numbers of features
No. of features   Logic elements   Memory bits   Multipliers   Max frequency (MHz)
10                6408             27776         320           90.46
20                7706             36736         376           85
30                9056             40576         432           79.13
40                10322            54656         488           76.34
50                12429            58496         532           71.78
60                14027            62336         532           68.48
70                16688            86656         532           66.54
80                19223            90496         532           63.4
90                21308            94336         532           60.51
100               23777            98176         532           58.2
Table 2. Performance evaluation of the proposed architecture for different numbers of support vectors
No. of SVs   Logic elements   Memory bits   Multipliers   Max frequency (MHz)
10           4810             20672         244           79.94
20           9300             41344         488           81.31
30           15836            46464         532           80.75
40           24276            82688         532           76.38
50           33080            87808         532           77.22
60           42538            155136        532           74.77
70           52110            160256        532           76.16
80           59679            165376        532           75.41
90           69094            170496        532           71.12
100          80540            175616        532           75.48

The performance evaluations target a generic SVM classification hardware implementation. The 16-bit data input was assumed to be a 16-bit integer; therefore the adders, multipliers and registers that follow the linear kernel module (Figure 3) increase in width to maintain lossless computation. For a real application, depending on its tolerance for lossy results, the proposed architecture can be optimized to use narrower adders and multipliers, which will lower the logic utilization and increase the maximum operating frequency. This implementation focuses only on the linear kernel; the proposed design can be extended to different kernel types with minor changes in the kernel computation module. For an RBF kernel implementation (equation (6)), exponential computation is required. For a pipelined architecture, a lookup table is more suitable than CORDIC, since it produces results within a predictable time frame. In our architecture, since the input data traverse the shift register, a shared lookup table for the RBF kernel can be used without compromising the throughput of the system. For applications where the number of SVs is less than the number of features, the proposed architecture can share a single exponential lookup table.
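A lookup-table substitute for the exponential in equation (6) can be sketched as below. The table depth, input range and the table-sharing arithmetic are illustrative assumptions of ours, not values from the paper.

```python
import math
import numpy as np

# Quantized e^{-x} lookup table for an RBF kernel (equation (6)).
# DEPTH and X_MAX are assumed design parameters, not from the paper.
DEPTH = 1024
X_MAX = 16.0
LUT = np.exp(-np.linspace(0.0, X_MAX, DEPTH, endpoint=False))

def exp_lut(x):
    # x is the non-negative value gamma * ||x_i - u||^2, clamped to the range
    idx = min(int(x / X_MAX * DEPTH), DEPTH - 1)
    return LUT[idx]

def shared_luts_needed(n_sv, n_f):
    # One table serves one SV per clock cycle and each input occupies N_f
    # cycles, so roughly ceil(N_sv / N_f) tables keep the pipeline fed.
    return max(1, math.ceil(n_sv / n_f))

print(exp_lut(0.0))                 # -> 1.0 (table entry for x = 0)
print(shared_luts_needed(268, 10))  # -> 27 (e.g. the Vowel model's shape)
```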
However, if the number of SVs is larger than the number of features, the number of required exponential lookup tables will be equivalent to N_sv/N_f, since sharing a single lookup table among all support vectors takes N_sv clock cycles, whereas only N_f clock cycles are available for each SV to be computed before the next input data starts to be computed.

7. CONCLUSION
The proposed streaming architecture for SVM classification produces output at the same rate as the data input. The throughput of the implementation is equivalent to 1/N_f.
Based on the performance profiling experiments, the resource utilization of the proposed architecture depends on the number of SVs, while the number of features determines the maximum operating frequency. For future work, the proposed architecture needs to be tested with real-world data sets, from which the accuracy trade-off between the software and hardware implementations can be observed. Besides that, different streaming hardware architectures for different multi-class SVM methods need to be explored: other multi-class approaches applied to the same problem can result in different hardware utilization and accuracy.
ACKNOWLEDGEMENT
This work is supported in part by the Collaborative Research in Engineering, Science & Technology (CREST) grant P17C114 (vote no. 4B176) and the Universiti Teknologi Malaysia Matching grant (vote no. 00M75).

REFERENCES
[1] Ruiz-Llata M, Guarnizo G, Yébenes-Calvino M. FPGA implementation of a support vector machine for classification and regression. In Proceedings of the 2010 IEEE International Joint Conference on Neural Networks (IJCNN), 2010 (pp. 1-5).
[2] Cadambi S, Durdanovic I, Jakkula V, Sankaradass M, Cosatto E, Chakradhar S, Graf HP. A massively parallel FPGA-based coprocessor for support vector machines. In Proceedings of the 2009 17th IEEE Symposium on Field Programmable Custom Computing Machines, 2009 (pp. 115-122).
[3] Véstias MP. High-performance reconfigurable computing granularity. Encyclopedia of Information Science and Technology, Third Edition. IGI Global. 2015:3558-3567.
[4] Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: A review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering. 2007;160:3-24.
[5] Vapnik V. The nature of statistical learning theory. Springer Science & Business Media, 2013.
[6] Mahmoodi D, Soleimani A, Khosravi H, Taghizadeh M. FPGA simulation of linear and nonlinear support vector machine. Journal of Software Engineering and Applications. 2011;4(05):320.
[7] Hahnle M, Saxen F, Hisung M, Brunsmann U, Doll K. FPGA-based real-time pedestrian detection on high-resolution images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2013 (pp. 629-635).
[8] Kyrkou C, Bouganis CS, Theocharides T, Polycarpou MM. Embedded hardware-efficient real-time classification with cascade support vector machines. IEEE Transactions on Neural Networks and Learning Systems. 2015;27(1):99-112.
[9] Cutajar M, Gatt E, Grech I, Casha O, Micallef J. Hardware-based support vector machine for phoneme classification. In Proceedings of the 2013 IEEE International Conference on Smart Technologies (EUROCON), 2013 (pp. 1701-1708).
[10] Mizuno K, Terachi Y, Takagi K, Izumi S, Kawaguchi H, Yoshimoto M. An FPGA implementation of a HOG-based object detection processor. IPSJ Transactions on System LSI Design Methodology. 2013;6:42-51.
[11] Qasaimeh M, Sagahyroon A, Shanableh T. FPGA-based parallel hardware architecture for real-time image classification. IEEE Transactions on Computational Imaging. 2015;1(1):56-70.
[12] Kane J, Hernandez R, Yang Q. A reconfigurable multiclass support vector machine architecture for real-time embedded systems classification. In Proceedings of the 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2015 (pp. 244-251).
[13] Kryjak T, Komorkiewicz M, Gorgon M. FPGA implementation of real-time head-shoulder detection using local binary patterns, SVM and foreground object detection. In Proceedings of the 2012 IEEE Conference on Design and Architectures for Signal and Image Processing, 2012 (pp. 1-8).
[14] Bustio-Martínez L, Cumplido R, Hernández-Palancar J, Feregrino-Uribe C. On the design of a hardware-software architecture for acceleration of SVM's training phase. In Proceedings of the Mexican Conference on Pattern Recognition, 2010 (pp. 281-290).
[15] Wang JF, Peng JS, Wang JC, Lin PC, Kuan TW. Hardware/software co-design for fast-trainable speaker identification system based on SMO. In Proceedings of the 2011 IEEE International Conference on Systems, Man, and Cybernetics, 2011 (pp. 1621-1625).
[16] Venkateshan S, Patel A, Varghese K. Hybrid working set algorithm for SVM learning with a kernel coprocessor on FPGA. IEEE Transactions on Very Large Scale Integration (VLSI) Systems. 2015;23(10):2221-32.
[17] Škoda P, Rogina BM, Sruk V. FPGA implementations of data mining algorithms. In Proceedings of the 35th International Convention on Information and Communication Technology (MIPRO), 2012 (pp. 362-367).
[18] Chapelle O, Schölkopf B, Zien A. Semi-supervised learning. IEEE Transactions on Neural Networks. 2009;20(3):542.
[19] Vapnik V. Statistical Learning Theory. John Wiley & Sons, Inc. 1998:156-60.