TELKOMNIKA Telecommunication, Computing, Electronics and Control Vol. 19, No. 1, February 2021, pp. 182-191 ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, No: 21/E/KPT/2018 DOI: 10.12928/TELKOMNIKA.v19i1.15026

Fine-grained or coarse-grained? Strategies for implementing parallel genetic algorithms in a programmable neuromorphic platform

Indar Sugiarto 1, Steve Furber 2
1 Department of Electrical Engineering, Petra Christian University, Indonesia
2 School of Computer Science, University of Manchester, United Kingdom

Article Info
Article history:
Received Dec 19, 2019
Revised Apr 10, 2020
Accepted Sep 24, 2020

Keywords:
Genetic algorithm
Network on chip
Neuromorphic computing
Parallel computing
SpiNNaker

ABSTRACT
The genetic algorithm (GA) is a popular heuristic-based optimization method that has attracted engineers and scientists for many years. With the advancement of multi- and many-core technologies, GAs have been transformed into more powerful tools by parallelising their core processes. This paper describes a feasibility study of implementing parallel GAs (pGAs) on SpiNNaker. As a many-core neuromorphic platform, SpiNNaker offers the possibility to scale up a parallelised algorithm such as a pGA, whilst keeping power consumption low for both processing and communication overhead. However, due to its small-packet distribution mechanism and constrained processing resources, parallelising the processes of a GA on SpiNNaker is challenging. In this paper we show how a pGA can be implemented on SpiNNaker and analyse its performance. Because pGAs have inherently numerous parameters and classifications, we evaluate only the most common aspects of a pGA and use some artificial benchmarking test functions. The experiments produced promising results that may lead to further development of massively parallel GAs on SpiNNaker.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Indar Sugiarto
Department of Electrical Engineering, Petra Christian University
Jl. Siwalankerto 121-131, Surabaya, 60236, Indonesia
Email: indi@petra.ac.id

1. INTRODUCTION
The advent of multi- and many-core platforms makes parallel and distributed computing more feasible and attractive. The same holds for algorithms that are inherently amenable to parallelism, such as the genetic algorithm (GA). However, the practical implementation of parallelism for such algorithms can be hindered by the fact that many current computing platforms inherit the serial processing mechanism of the von Neumann architecture [1, 2]. This makes the implementation of parallel information processing complicated and expensive [3, 4]. In the case of GAs, the parallelism concept has been applied in recent years in order to achieve higher performance throughput. One common method to achieve such acceleration is to use an accelerator such as a graphics processing unit (GPU) [5-8]; it is well known that the single instruction multiple data (SIMD) mechanism in a GPU plays an important role in achieving very high computation throughput [9, 10].

Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA
The GA belongs to the larger class of evolutionary algorithms (EAs) and inherently works as a sequential process [11, 12]. Even though the inner processes of a GA are sequential, the data on which those processes operate are partitionable. This allows a GA to be parallelized [13]. A parallel GA (pGA) usually exploits the abundant processing elements in a parallel/distributed system, and can be roughly classified into two categories: coarse-grained parallel GA (cpGA) and fine-grained parallel GA (fpGA). The most distinctive features of the two can be inferred from the GA's population granularity and the communication overhead. The most popular example of a cpGA is the island model, whereas for fpGAs the cellular GA is the most common model.

In this paper we present our preliminary study of pGA implementation on a novel many-core neuromorphic platform called SpiNNaker. SpiNNaker (spiking neural network architecture) is a many-core neuromorphic platform designed to emulate the brain's processing and connectivity at the neuronal level [14-16]. Due to the enormous feature space of pGAs [17, 18], we cannot explore and present all features of both pGA categories in this paper. Thus, this paper serves as a starting point for future exploration of pGAs on SpiNNaker. To our knowledge, this is the first paper that describes methods for implementing pGAs on a neuromorphic platform. Thus, benchmarking the performance of pGAs on SpiNNaker against other available neuromorphic platforms is not possible. Rather, the emphasis is on testing the SpiNNaker platform as a viable device on which to implement a massively parallel GA. The results of this paper can be used for future exploration of the combination of extensive optimization and deep learning [19, 20]. The paper is organized as follows.
After the introduction in this section, related works on pGAs and SpiNNaker for conventional non-spiking applications are described. In section 3, the detailed implementation of pGAs on SpiNNaker is described. Section 4 presents the experiment setup and the collected results, followed by their analysis and discussion. Finally, the paper closes with conclusions in section 5.

2. RELATED WORKS
The GA is a nature (biology) inspired algorithm that is popular for solving complex optimization problems [21, 22]. It is a metaheuristic method inspired by the process of natural selection that involves bio-inspired operations such as gene mutation and crossover. There are different families of GAs, and in the recent decade they have been shown to converge [23, 24]. Conceptually, the inner working principles of a GA are sequential processes. To speed them up, parallelism strategies are applied to those processes. To some extent, the parallelism also involves distributed processes. In this paper, we refer to a parallel distributed GA simply as a parallel GA. This is conceptually a straightforward approach, since a GA normally works on regular data in a partitioned solution space. However, in real implementations a complication arises from the various options that could be traded off in order to match the hardware platform with the GA-encoded problem description.

In general, a parallel process/algorithm can work in a fine-grained or coarse-grained fashion. A fine-grained parallel GA (fpGA) commonly takes the form of a cellular model, whereas a coarse-grained parallel GA (cpGA) utilizes a concept of group separation, such as in island models. The granularity of a pGA can also be described on a finer continuum, creating a broader range of classification; in this paradigm, any new pGA model can fall in between the fine- and coarse-grained classes.
Furthermore, they can also be stacked hierarchically to create a hybrid class from any possible combination of existing models. Comprehensive lists of pGA models have been presented in various recent review papers, for example [25-30]. The morphology, genetic operations, and communication principles between elements of those various pGA models have also been described in those papers. Hence, we will not cover the detailed aspects of a pGA in this paper. Instead, we present a general concept of pGAs only for these simplified classes: fine- and coarse-grained models. Fine-grained pGAs are strongly associated with massively parallel machines in the form of multi-core computers. On the other hand, coarse-grained pGAs are commonly implemented on distributed platforms and are closely related to the concept of migration islands. Due to the numerous tuning properties of a pGA, as well as the tight coupling between a pGA design and the problem it tries to solve, it is almost impossible to create a generic pGA framework that unifies both classes. Fortunately, the SpiNNaker platform is generic enough to be used for both parallel and distributed computing. In this paper, we use some parallelism strategies and tools developed in our previous work [31].
3. RESEARCH METHOD
3.1. Brief overview of SpiNNaker
SpiNNaker (spiking neural network architecture) is a neuromorphic platform originally designed to simulate brain processing in a biologically plausible mechanism [32]. At the hardware level, SpiNNaker is a programmable neuromorphic platform built on the many-core concept, using massively connected multi-core chips [33]. SpiNNaker's working mechanism is based on small-packet transactions between nodes/chips. These packets are distributed very quickly intra- and inter-chip to mimic the biological realm of the brain, where many small spikes are interchanged among neurons. This way, SpiNNaker allows brain-like parallel, asynchronous computation in real time, with time-steps of about 1 ms. In terms of power efficiency, SpiNNaker has demonstrated superiority over supercomputing clusters when simulating large-scale biological networks [34]. Currently, the construction of the full SpiNNaker machine is underway; it will host 50K SpiNNaker chips and hence will have one million processors with 7 terabytes of memory running in parallel. It is expected that the machine will consume at most 90 kW of electrical power [35]. Figure 1 shows the structure of the SpiNNaker machine.

Figure 1. The SpiNNaker machine is composed of many SpiNNaker chips assembled on a board. Each SpiNNaker board contains 48 chips, arranged in a 2D triangular mesh

3.2. fpGA implementation
3.2.1. Basic cellular model
An fpGA is usually implemented as a cellular model. In this model, the population is spatially distributed across a grid in which genetic interactions may only take place in a small neighborhood of each individual.
In a basic cellular model, a node/processor is responsible for processing only one (or a very limited number of) GA chromosome(s) and performs the GA operations in a reproductive cycle together with, or after interaction with, its adjacent neighbors. In this model, a 2D grid of processors is used, where the neighborhood is defined using the Manhattan distance from one point to the others in the grid. Figure 2 shows examples of fpGA networks. The neighborhood refers to the set of potential mates of an individual. Consequently, each node in the grid has a neighborhood that overlaps the neighborhoods of nearby nodes. A geometrical simplification is used in the sense that all the neighborhoods have the same size and identical shapes. There are various configurations in which a cellular pGA can be formed. The two most commonly used are L5, also called the von Neumann or NEWS (North, East, West and South) neighborhood, and C9, also known as the Moore neighborhood.

3.2.2. Realisation in SpiNNaker
Implementing the basic cellular pGA model on SpiNNaker is straightforward. However, there are several considerations for the implementation. First, SpiNNaker uses a triangular mesh layout, whereas the common cellular pGA models use a rectangular mesh topology. In our work, we modify the SpiNNaker routing such that only four links are used to mimic the standard rectangular cellular model: the north, west, south and east links are used for pGA interaction. However, we keep the other two links (northeast and southwest) alive for the purpose of fault tolerance; i.e., they are only used as backups if the main pGA links fail to work, either permanently or intermittently.
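The neighborhood definitions above can be made concrete with a short sketch (illustrative only, not the SpiNNaker code; the radii and the torus wrap-around are assumptions inferred from the neighborhood sizes L5, C9, L13 and C49 named in the text, with "L" taken as Manhattan distance and "C" as Chebyshev distance):

```python
# Sketch: enumerating cellular-GA neighborhoods on a 2D grid.
# "L" models use Manhattan distance (diamond shape), "C" models
# use Chebyshev distance (full square).

def neighborhood(x, y, radius, metric, width, height, torus=True):
    """Return the grid cells within `radius` of (x, y) under `metric`."""
    cells = []
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            if metric == "manhattan" and abs(dx) + abs(dy) > radius:
                continue  # outside the diamond of an L model
            # Chebyshev metric keeps the whole square (C models)
            nx, ny = x + dx, y + dy
            if torus:  # wrap around on a closed grid
                nx, ny = nx % width, ny % height
            elif not (0 <= nx < width and 0 <= ny < height):
                continue
            cells.append((nx, ny))
    return cells

# L5 (von Neumann/NEWS) and C9 (Moore) on a 12x12 torus:
l5 = neighborhood(5, 5, 1, "manhattan", 12, 12)
c9 = neighborhood(5, 5, 1, "chebyshev", 12, 12)
# L13 and C49, the configurations used on the 8x15 grid in this paper:
l13 = neighborhood(4, 7, 2, "manhattan", 8, 15)
c49 = neighborhood(4, 7, 3, "chebyshev", 8, 15)
print(len(l5), len(c9), len(l13), len(c49))  # 5 9 13 49
```

The cell counts follow directly from the metrics: a Manhattan ball of radius r holds 2r² + 2r + 1 cells (5 for r = 1, 13 for r = 2), while a Chebyshev ball holds (2r + 1)² cells (9 for r = 1, 49 for r = 3).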
Second, the SpiNNaker machine can be used on a per-board basis or per group of boards. In order to create a continuous torus, at least three SpiNNaker boards are needed. In this three-board setup, a closed grid of 12x12 nodes can be achieved and a normal cellular pGA model can operate properly. In some circumstances, however, a square grid cannot be closed since the torus is discontinued; for example, when only one SpiNNaker board is used. For this particular situation, we define a maximum neighborhood distance as the maximum squared odd number of chips available on the system. In this paper, we use a three-board system, but the boards are not connected in a torus. The grid structure of this three-board system is shown in Figure 3. Here we use an 8x15 grid for the model. In such a configuration, the maximum neighborhood distance allows us to create either the L13 or the C49 model.

Figure 2. Examples of fpGAs as cellular models. Colored nodes represent a neighborhood. Here, L stands for linear while C stands for compact. It is possible to arrange nodes in the grid differently, such as using a diamond-like layout

Figure 3. The proposed layout for a pGA cellular model on three SpiNNaker boards

Our method for implementing an fpGA is based on the idea of run-time improvement; here, the multi-core feature of the SpiNNaker chips provides a redundancy that can be exploited for amelioration. We propose to use the approach presented in our previous work (see [36]), where we use a task-spawning mechanism on several cores to improve performance through local parallelism (i.e., those cores will be loaded with the
same GA code and work in synchrony), while the rest of the cores serve as backup cores to provide higher reliability through a fault-tolerance mechanism. As a summary of our implementation strategy, Figure 3 shows the proposed layout for implementing an fpGA as a cellular model on SpiNNaker. The torus is created on the 8x15 grid, in which each of the blue-colored nodes processes one GA chromosome/individual. In the diagram, just two GA chromosomes are depicted to show how the L13 neighborhood is formed: one in purple, and the other in yellow. The number inside each square indicates the coordinate of the node, used by the SpiNNaker routing mechanism.

3.3. cpGA implementation
3.3.1. Basic island model
In contrast to the fpGA models, cpGA models try to achieve higher computation performance whilst reducing the communication overhead. In this paradigm, the proponents of these models argue that it is better to localise the intense GA operations within a group of processing units. In practice, this concept is commonly implemented as a network of processing islands; hence the name island model. By distributing and localising the GA processors, the solution space is expected to be partitioned, and each island can specialize itself to a certain solution region. In order to improve the convergence, an additional operation called migration is introduced in these models. The conceptual island model is presented in Figure 4.

Figure 4. Example of a cpGA in an island model

3.3.2. Realisation in SpiNNaker
In contrast to the approach for implementing an fpGA on SpiNNaker, implementing island models on SpiNNaker is not straightforward. There are some constraints that need to be considered. For example, how is an island defined in terms of SpiNNaker hardware resources; i.e., how many cores or chips are used to contain an island?
This corresponds to a common problem in the island model: specifying/tuning the island size and the number of islands. This is closely related to the mapping strategy that will be employed by the SpiNNaker tool-chain. There are two options for this: static mapping and dynamic mapping. In static mapping, the number of cores or chips that will contain an island, and the routing table for those cores/chips, are predefined at compile time; i.e., the designer must specify them explicitly (hardcoded) in the SpiNNaker program code. The second option involves a dynamic mapping approach, in the sense that the number of cores/chips along with the routing among them can be obtained automatically, though not during run-time. In this paper, we use the static mapping approach since the problem formulated for our evaluation is small. Regarding the island implementation, all cores in a chip are orchestrated to handle one population that consists of 120 individuals. Since there are only 16 cores available for computation in each chip, each core is responsible for handling up to 8 individuals (some handle just 7 individuals). In this paper, we design an island to be contained in one SpiNNaker chip.

4. RESULTS AND ANALYSIS
4.1. Experiments setup
For evaluating the performance of pGAs on SpiNNaker, we use two standard benchmarking functions: the Rastrigin function and the second Schaffer function. They are expressed in the following formulas:

f(x_1, \dots, x_n) = 10n + \sum_{i=1}^{n} \left( x_i^2 - 10 \cos(2\pi x_i) \right) \quad (1)
where -5.12 <= x_i <= 5.12, and

f(x, y) = 0.5 + \frac{\sin^2\left(\sqrt{x^2 + y^2}\right) - 0.5}{\left[ 1 + 0.001 (x^2 + y^2) \right]^2} \quad (2)

where -100 <= x, y <= 100. Both functions have their minimum value at (0,0). The Rastrigin function in (1) is an example of a non-convex and non-linear multimodal function. Finding the minimum of this function is a fairly difficult problem due to its large search space and its large number of local minima. The second Schaffer function in (2) is also a non-convex, non-linear multimodal function, but it has a broader search space than the Rastrigin function. It is used here along with the Rastrigin function so that we can evaluate the extensibility of our pGA framework for solving a similar but bigger solution space.

In this paper we use only three SpiNNaker boards. For the fpGA models, we use the L13 and C49 models as described in section 3.2.2. For the cpGA models, we use a varied number of islands, from 2 to 21. An example of a cpGA model implementation with 21 islands is depicted in Figure 5. Note that the positions of the islands do not have any effect on the pGA performance, only on the creation of the routing table for SpiNNaker packets. For the sake of simplicity, the mapping was the static one described in section 3.3.2; hence, the routes for SpiNNaker packets were defined manually. If the island positions in Figure 5 are changed, then a new routing table must be defined. We are aware of this limitation, and we plan to use dynamic mapping in our future work. To make a fair comparison between the fpGA and the cpGA, we use the same GA parameters, as shown in Table 1. Regarding the population generation in fpGAs, there are two mechanisms that can be employed: synchronous and asynchronous. In this paper, we use the synchronous approach as the parallel update policy, in which all nodes are updated simultaneously in each operation.

Figure 5.
Example of a cpGA model with 21 islands using only a single SpiNNaker board. Here the green-colored links are predefined as the routes for migration

Table 1. Common GA parameters used for both fpGA and cpGA in the experiments

Parameter         Value
GA encoding       binary 16-bit
Elitism           none
Crossover type    single point
Crossover rate    0.1
Mutation rate     0.02
Migration rate    0.05
Population size   120

4.2. Experiment results
We ran the fpGA as a cellular model for the Rastrigin and Schaffer functions on SpiNNaker using the L13 and C49 configurations. The network was executed for a maximum of 100 iterations, which corresponds to generating 100 generations in the GA process. The result is shown in Figure 6. Just for convenience, the plot
shows the objective value instead of the fitness value; plotting the fitness value would show a similar pattern. We also ran the cpGA version using an island model with various numbers of islands. We used the model to find the global minima of both the Rastrigin and the second Schaffer test functions. Similar to the previous experiment on the fpGA, we ran the model for 100 generations. The result is shown in Figure 7 (a) for the Rastrigin function, and Figure 7 (b) for the Schaffer function.

4.3. Evaluation and discussion
We have run initial experiments using two standard optimization test functions and found that our implementations of both the fpGA and the cpGA produced satisfying results. Both models found the global optimum very quickly, measured as the number of generations needed to achieve convergence. Particular evaluations of each model are presented as follows.

4.3.1. Performance of fpGA models
We use two cellular models for implementing the fpGA on SpiNNaker: L13 and C49. In both cases, the multi-core feature of the SpiNNaker chips is employed for amelioration and reliability purposes; hence, the number of individuals in the population is the same as the size of the grid, which is 120. As we can see in Figure 6, both the L13 and C49 models work properly and converge quite quickly to the correct solution. In general, we observe that solving the second Schaffer function is a bit faster than solving the Rastrigin function. This is interesting because the Schaffer function has a larger search space than the Rastrigin function; one would expect the search for the solution to take longer. However, the Rastrigin function has many more local minima than the Schaffer function. We argue that this explains the convergence speed behaviour of our models.
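For reference, the two benchmark functions of (1) and (2) can be sketched directly from their definitions (a minimal floating-point sketch, not the code actually run on SpiNNaker):

```python
import math

def rastrigin(x):
    """Rastrigin function, eq. (1); global minimum 0 at the origin."""
    n = len(x)
    return 10 * n + sum(xi**2 - 10 * math.cos(2 * math.pi * xi) for xi in x)

def schaffer2(x, y):
    """Second Schaffer function, eq. (2); global minimum 0 at (0, 0)."""
    num = math.sin(math.sqrt(x**2 + y**2))**2 - 0.5
    den = (1 + 0.001 * (x**2 + y**2))**2
    return 0.5 + num / den

print(rastrigin([0.0, 0.0]))  # 0.0
print(schaffer2(0.0, 0.0))    # 0.0
```

Both functions are non-negative and vanish only at the origin, which is why the objective-value plots in Figures 6 and 7 converge toward zero.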
Figure 6. The result of running the cellular models with the L13 and C49 configurations to find the global minima of the Rastrigin and Schaffer functions (objective value vs. generation; curves: L13_on_Schaffer, L13_on_Rastrigin, C49_on_Schaffer, C49_on_Rastrigin)

Also, we notice that the C49 model runs in a steadier fashion than the L13 model. As we can see from the plot, the C49 model needs almost the same number of generations to converge for both test functions, which happens at the 15th generation. However, for the L13 model, even though it reaches convergence for the Schaffer function at the 13th generation, convergence for the Rastrigin function only starts after the 25th generation. We admit that we cannot yet give further insight into why this happens, and we plan to carry out further evaluation using more complex functions in our future work.

4.3.2. Performance of cpGA models
A similar test mechanism was also applied to our island model as a representative of the cpGA class. Here, we use the number of islands as the indicator for performance. As we can see in Figure 7, a higher number of islands means a higher convergence rate. In both test cases, convergence was achieved before the 10th generation when using 20 islands (in the Schaffer test, convergence starts at the sixth generation). As in the fpGA case, we also notice that the network solves the Schaffer test problem a bit faster than the Rastrigin test function.
Figure 7. cpGA results for: (a) the Rastrigin, and (b) the Schaffer function (objective value vs. generation, for 2 to 20 islands)

4.3.3. Further improvement ideas
Comparing the island model and the cellular model, our experiments show that the island model converges faster than the cellular model when the number of islands is high. However, when the number of islands is small, the island model does not show better performance than the cellular model. We argue that increasing the number of islands means that the number of individuals searching for the solution is also increased. We also observe that in the cellular models, similar individuals tend to cluster, creating niches, and these groups operate as if they were separate sub-populations, similar to the islands in the island model. As practical implementation considerations, we offer two observations:
1. For solving small problems, in general, implementing the "constrained" island models on SpiNNaker is easier than the cellular models. In this case, the island models require fewer SpiNNaker resources in terms of the number of chips and the packet routing mechanism. However, this does not scale up with problem size.
2.
For solving bigger problems, which require longer GA strings for encoding, the cellular models are preferred because we can achieve higher computation performance by applying local parallelism inside the chip. This cannot be achieved with island models, because all cores in the chip are already in use, leaving no room for performance improvement.

There are several other features that still need to be explored in future. For example, in the fpGA models, what if the neighborhood takes non-canonical forms, such as a diamond or even a trapezoid shape? Also, how do synchronous and asynchronous updates affect the performance of the models? For the cpGA models, there are also many features that need to be explored. For example, what are the effects of different migration rates, as well as of the direction of migration? Also, how can islands be merged, and what is the effect of a dynamic immigrant size? Nevertheless, this paper serves as an initial starting point for exploring the vast potential of pGA applications on the SpiNNaker platform.
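As a concrete point of reference for such experiments, a minimal island-model GA with migration might look like the sketch below. It is not the SpiNNaker implementation: the binary 16-bit encoding, the absence of elitism, and the crossover, mutation, and migration rates are taken from Table 1, while the ring topology, the binary tournament selection, the 1-D Rastrigin fitness, and the split into 4 islands of 30 individuals are illustrative assumptions.

```python
import math
import random

BITS = 16                      # binary 16-bit encoding (Table 1)
CROSSOVER_RATE, MUTATION_RATE, MIGRATION_RATE = 0.1, 0.02, 0.05

def decode(bits, lo=-5.12, hi=5.12):
    """Map a 16-bit string onto the Rastrigin domain [-5.12, 5.12]."""
    return lo + int(bits, 2) * (hi - lo) / (2**BITS - 1)

def fitness(ind):
    """Negated 1-D Rastrigin value, so that higher is better."""
    x = decode(ind)
    return -(10 + x * x - 10 * math.cos(2 * math.pi * x))

def step(pop):
    """One synchronous generation: tournament, crossover, mutation."""
    new = []
    for _ in pop:
        a, b = random.sample(pop, 2)       # binary tournament selection
        child = max(a, b, key=fitness)
        if random.random() < CROSSOVER_RATE:
            cut = random.randrange(1, BITS)  # single-point crossover
            child = child[:cut] + random.choice(pop)[cut:]
        child = "".join(c if random.random() > MUTATION_RATE
                        else str(1 - int(c)) for c in child)  # bit flip
        new.append(child)
    return new

def migrate(islands):
    """Ring migration: an island's best may replace a neighbor's worst."""
    best = [max(p, key=fitness) for p in islands]
    for i, p in enumerate(islands):
        if random.random() < MIGRATION_RATE:
            p[p.index(min(p, key=fitness))] = best[i - 1]

random.seed(42)
islands = [[format(random.randrange(2**BITS), "016b") for _ in range(30)]
           for _ in range(4)]              # 4 islands x 30 = 120 (Table 1)
for gen in range(100):
    islands = [step(p) for p in islands]
    migrate(islands)
champion = max((ind for p in islands for ind in p), key=fitness)
print(decode(champion))  # evolved estimate of the minimizer
```

Varying `MIGRATION_RATE`, the ring direction, or the island count in such a sketch mirrors the open questions listed above, albeit on a single host rather than on SpiNNaker hardware.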
5. CONCLUSIONS
This paper presents preliminary work on the feasibility of implementing parallel genetic algorithms (pGAs), with some degree of flexibility, on a neuromorphic platform. SpiNNaker, a many-core programmable neuromorphic platform, is used to implement two classes of pGA: the fine-grained pGA (fpGA) and the coarse-grained pGA (cpGA). Due to the numerous variations of these models, not all possible configurations are presented. However, the experiment results show promising flexibility that can potentially be used to implement any model configuration from both the fpGA and cpGA domains. Some considerations are presented regarding the expected performance of the implemented pGAs with regard to hardware constraints. The trade-off between performance and scalability is also presented. We argue that once the hurdle of mapping complexity has been overcome, SpiNNaker will become a superior platform for implementing a massively parallel genetic algorithm framework to solve various multi-objective optimisation problems. This will be a precious added value for SpiNNaker as a many-core neuromorphic platform, not only for performing brain-style information processing, but also as a generic high-performance computing system.

ACKNOWLEDGMENT
This work was supported partially by the United Kingdom Engineering and Physical Sciences Research Council (EPSRC) through the Graceful project (grant EP/L000563/1). The design and construction of the SpiNNaker machine was supported by EPSRC under grant nos. EP/D07908X/1 and EP/G015740/1, in collaboration with the universities of Southampton, Cambridge and Sheffield and with industry partners ARM Ltd, Silistix Ltd and Thales.
Ongoing development of the software is supported by the EU ICT Flagship Human Brain Project (H2020 785907 and 945539), in collaboration with many university and industry partners across the EU and beyond. The development of the pGA test cases on SpiNNaker was supported by Petra Christian University through the Special Grant (grants 009/SP2H/LT-MULTI/LPPM-UKP/III/2020 and 016/SP2H/LT-MULTI/LPPM-UKP/III/2020).

REFERENCES
[1] A. O. Moraes, J. F. Mitre, P. L. Lage, and A. R. Secchi, "A robust parallel algorithm of the particle swarm optimization method for large dimensional engineering problems," Applied Mathematical Modelling, vol. 39, no. 14, pp. 4223-4241, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0307904X14007094
[2] J. Domínguez and E. Alba, "Dealing with hardware heterogeneity: a new parallel search model," Natural Computing, vol. 12, no. 2, pp. 179-193, 2013.
[3] K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick, "A view of the parallel computing landscape," Communications of the ACM, vol. 52, no. 10, pp. 56-67, 2009. [Online]. Available: http://doi.acm.org/10.1145/1562764.1562783
[4] D. Lim, Y.-S. Ong, Y. Jin, B. Sendhoff, and B.-S. Lee, "Efficient hierarchical parallel genetic algorithms using grid computing," Future Generation Computer Systems, vol. 23, no. 4, pp. 658-670, 2007.
[5] J. Luo, S. Fujimura, D. El Baz, and B. Plazolles, "GPU based parallel genetic algorithm for solving an energy efficient dynamic flexible flow shop scheduling problem," Journal of Parallel and Distributed Computing, vol. 133, pp. 244-257, 2019. [Online]. Available: http://doi.org/10.1016/j.jpdc.2018.07.022
[6] N. Hou, F. He, Y. Zhou, Y. Chen, and X.
Yan, "A parallel genetic algorithm with dispersion correction for hw/sw partitioning on multi-core CPU and many-core GPU," IEEE Access, vol. 6, pp. 883-898, 2018.
[7] S. Kang, S.-S. Kim, J. Won, and Y. M. Kang, "GPU-based parallel genetic approach to large-scale travelling salesman problem," The Journal of Supercomputing, vol. 72, no. 11, pp. 4399-4414, 2016.
[8] A. Ferigo and G. Iacca, "A GPU-enabled compact genetic algorithm for very large-scale optimization problems," Mathematics, vol. 8, no. 5, pp. 758 (1-26), 2020.
[9] B. Plazolles, D. El Baz, M. Spel, V. Rivola, and P. Gegout, "SIMD Monte-Carlo numerical simulations accelerated on GPU and Xeon Phi," International Journal of Parallel Programming, vol. 46, no. 3, pp. 584-606, 2018.
[10] B. Ren, S. Balakrishna, Y. Jo, S. Krishnamoorthy, K. Agrawal, and M. Kulkarni, "Extracting SIMD parallelism from recursive task-parallel programs," ACM Trans. Parallel Comput., vol. 6, no. 4, pp. 1-37, Dec. 2019.
[11] A. E. Eiben and J. E. Smith, Introduction to Evolutionary Computing, 2nd ed. Springer, 2015.
[12] J. Pérez Heredia, "A computational view on natural evolution: On the rigorous analysis of the speed of adaptation," Ph.D. dissertation, University of Sheffield, 2017.
[13] Y. Y. Liu and S. Wang, "A scalable parallel genetic algorithm for the generalized assignment problem," Parallel Computing, vol. 46, pp. 98-119, 2015.
[14] S. Furber, "Large-scale neuromorphic computing systems," Journal of Neural Engineering, vol. 13, no. 5, pp. 1-14, 2016.
[15] M. Davies, N. Srinivasa, T. Lin, G. Chinya, Y. Cao, S. H. Choday, et al., "Loihi: A neuromorphic manycore processor with on-chip learning," IEEE Micro, vol. 38, no. 1, pp. 82-99, 2018.
[16] Y. van De Burgt, A. Melianas, S. T. Keene, G. Malliaras, and A. Salleo, "Organic electronics for neuromorphic computing," Nature Electronics, vol. 1, no. 7, pp. 386-397, 2018.
[17] Y. Y. Wong, K. H.
Lee, K. S. Leung, and C.-W . Ho, A no v el approach in pa rameter adaptation and di v ersity mainte- nance for genetic algorithms, Soft Computing , v ol. 7, no. 8, pp. 506–515, 2003. TELK OMNIKA T elecommun Comput El Control, V ol. 19, No. 1, February 2021 : 182 191 Evaluation Warning : The document was created with Spire.PDF for Python.
[18] G. Karafotias, M. Hoogendoorn, and A. E. Eiben, "Parameter control in evolutionary algorithms: Trends and challenges," IEEE Transactions on Evolutionary Computation, vol. 19, no. 2, pp. 167–187, 2015.
[19] F. P. Such, V. Madhavan, E. Conti, J. Lehman, K. O. Stanley, and J. Clune, "Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning," CoRR, vol. abs/1712.06567, 2017. [Online]. Available: http://arxiv.org/abs/1712.06567
[20] A. AbuZekry, I. Sobh, E.-S. Hemayed, M. Hadhoud, and M. Fayek, "Comparative study of neuro-evolution algorithms in reinforcement learning for self-driving cars," European Journal of Engineering Science and Technology, vol. 2, no. 4, pp. 60–71, 2019.
[21] D. Whitley, "A genetic algorithm tutorial," Statistics and Computing, vol. 4, no. 2, pp. 65–85, 1994.
[22] M. Srinivas and L. M. Patnaik, "Genetic algorithms: A survey," Computer, vol. 27, no. 6, pp. 17–26, June 1994.
[23] E. Alba and M. Tomassini, "Parallelism and evolutionary algorithms," IEEE Transactions on Evolutionary Computation, vol. 6, no. 5, pp. 443–462, Oct. 2002.
[24] J. R. Cheng and M. Gen, "Accelerating genetic algorithms with GPU computing: A selective overview," Computers & Industrial Engineering, vol. 128, pp. 514–525, 2019.
[25] E. Cantú-Paz, "A survey of parallel genetic algorithms," Calculateurs Paralleles, Reseaux et Systemes Repartis, vol. 10, no. 2, pp. 141–171, 1998.
[26] E. Alba and J. M. Troya, "A survey of parallel distributed genetic algorithms," Complexity, vol. 4, no. 4, pp. 31–52, 1999.
[27] D. Knysh and V. Kureichik, "Parallel genetic algorithms: A survey and problem state of the art," Journal of Computer and Systems Sciences International, vol. 49, no. 4, pp. 579–589, 2010.
[28] B. Johnson and Y. Qu, "Towards an autonomously configured parallel genetic algorithm: Deme size and deme number," in 2016 2nd IEEE International Conference on Computer and Communications (ICCC). IEEE, 2016, pp. 278–288.
[29] H. Ma, S. Shen, M. Yu, Z. Yang, M. Fei, and H. Zhou, "Multi-population techniques in nature inspired optimization algorithms: A comprehensive survey," Swarm and Evolutionary Computation, vol. 44, pp. 365–387, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2210650217306363
[30] Y.-J. Gong, W.-N. Chen, Z.-H. Zhan, J. Zhang, Y. Li, Q. Zhang, and J.-J. Li, "Distributed evolutionary algorithms and their models: A survey of the state-of-the-art," Applied Soft Computing, vol. 34, pp. 286–300, 2015. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S1568494615002987
[31] I. Sugiarto, G. Liu, S. Davidson, L. A. Plana, and S. B. Furber, "High performance computing on SpiNNaker neuromorphic platform: A case study for energy efficient image processing," in 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC). IEEE, 2016, pp. 1–8.
[32] B. Sen-Bhattacharya, S. James, O. Rhodes, I. Sugiarto, A. Rowley, A. B. Stokes, K. Gurney, and S. B. Furber, "Building a spiking neural network model of the basal ganglia on SpiNNaker," IEEE Transactions on Cognitive and Developmental Systems, vol. 10, no. 3, pp. 823–836, 2018.
[33] A. Yousefzadeh, L. A. Plana, S. Temple, T. Serrano-Gotarredona, S. B. Furber, and B. Linares-Barranco, "Fast predictive handshaking in synchronous FPGAs for fully asynchronous multisymbol chip links: Application to SpiNNaker 2-of-7 links," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 8, pp. 763–767, 2016.
[34] T. Sharp, F. Galluppi, A. Rast, and S. Furber, "Power-efficient simulation of detailed cortical microcircuits on SpiNNaker," Journal of Neuroscience Methods, vol. 210, no. 1, pp. 110–118, 2012.
[35] A. G. D. Rowley, C. Brenninkmeijer, S. Davidson, D. Fellows, A. Gait, D. Lester, L. A. Plana, O. Rhodes, A. Stokes, and S. B. Furber, "SpiNNTools: The execution engine for the SpiNNaker platform," Frontiers in Neuroscience, vol. 13, pp. 231–251, 2019.
[36] I. Sugiarto, P. Campos, N. Dahir, G. Tempesti, and S. Furber, "Optimized task graph mapping on a many-core neuromorphic supercomputer," in 2017 IEEE High Performance Extreme Computing Conference (HPEC). IEEE, 2017, pp. 1–6.

BIOGRAPHIES OF AUTHORS

Indar Sugiarto was a postdoctoral researcher in the Advanced Processor Technologies group at the University of Manchester. He obtained his PhD degree from Technische Universität München in 2015. His research interests are in the fields of parallel computing, artificial intelligence, machine learning, and robotics. In his postdoctoral work, he developed fault-tolerance and optimization methods for the SpiNNaker neuromorphic supercomputer. He is an active member of the IEEE and the ACM, and serves as an officer for the Indonesian Chapter of the IEEE Computer Society. He is also the Deputy of the Artificial Intelligence Research Center at Petra Christian University.

Steve Furber received the B.A. degree in mathematics and the Ph.D. degree in aerodynamics from the University of Cambridge, Cambridge, U.K., in 1974 and 1980, respectively. He was with the R&D Department of Acorn Computers Ltd. from 1981 to 1990, where he was a principal designer of the BBC Micro and the ARM 32-bit RISC microprocessor. He moved to his current position as ICL Professor of Computer Engineering at the University of Manchester, Manchester, U.K., in 1990. Prof. Furber is a Fellow of the Royal Society, the Royal Academy of Engineering, the BCS, and the IET.