IAES International Journal of Articial Intellig ence (IJ- AI) V ol. 14, N o. 3, June 2025, pp. 2537 2546 ISSN: 2252-8938, DOI: 10.11591/i jai.v14.i3.pp2537-2546 2537 Ev aluating sear c h k e y distribution im pact on sear c hing per f ormance in larg e dat a str eams Bo w onsak Srisungsittisunti 1 , Jira w at Duangkae w 1 , N akarin Chaikae w 2 1 Depar tment of Computer Engineer ing, Sc hool of Inf or mation and Communication T ec hnology , U niv ersity of Pha y ao, Pha y ao, Thailand 2 Depar tment of Geog raphic Inf or mation Science, Sc hool of Inf or mation and Communication T ec hnology , U niv ersity of Pha y ao, Pha y ao, Thailand Article Inf o Ar ticle histor y : R eceiv ed Oct 9, 2024 R e vised Jan 29, 2025 A ccepted Mar 15, 2025 K eyw or ds: Binar y searc h tree Dense inde xing Inde xing eciency Lar g e-scale dataset Sparse inde xing ABS TRA CT The dis tr ibution patter n of searc h k e y s is assessed in this s tudy b y contras ting f our methods of inde x searc hing on lar g e-scale JSON les with data s treams. The A delson- V elskii and Landis (A VL) tree, binar y searc h tree (BS T), linear searc h (LS), and binar y searc h (BS) are among the searc h s trategies. W e look at the nor mal dis tr ibution, left-sk e w ed dis tr ibution, and r ight-sk e w ed dis tr ibution of searc h-k e y dis tr ibutions. A ccording to the results, LS per f or ms the slo w es t, a v eraging 653.166 milliseconds, whereas A VL tree per f or ms better than the others in dense inde x, with an a v erag e searc h time of 0.005 milliseconds. W ith 0.011 milliseconds per k e yw ord f or sparse inde x, BS outper f or ms LS, whic h a v erag es 1007.848 milliseconds. F or dense inde xing, an A VL tree w orks bes t; f or sparse inde xing, BS is recommended. This is an open access ar ticle under t he CC B Y -SA license. Cor r esponding A uthor : Jira w at Duangkae w Depar tment of Computer Engineer ing, Sc hool of Inf or mation and Communication T ec hnology U niv ersity of Pha y ao 19 Area 2 Maeka, Pha y ao, Thailand Email: jira w at.du@outlook.com 1. INTR ODUCTION In our pre vious researc h [1], w e introduced a method to enhance data retr ie v al eciency in lar g e- scale Ja v aScr ipt Object N otation (JSON) datasets b y using inde xing tec hniq ues. Our approac h applies dense inde xing tec hniq ues f or one-to-one dataset relationships and sparse inde xing f or one-to-man y relationships, in compar ison with non-inde x ed cases. The researc h w as conducted in tw o main segments. The rs t in v ol v ed e xper imenting with accessing positions within the inde x, and the second in v ol v ed data retr ie v al within segments using both inde xing types, sho w casing the ability to skip segments to access lar g e-scale Bigdata.json le sets. In ter ms of time eciency f or position access with a searc h k e y , dense inde xing a v erag ed 59.175 milliseconds, whereas non-dense inde xing a v erag ed 15,635.232 milliseconds. Sparse inde xing a v erag ed 36.387 milliseconds, while non-sparse inde xing took an a v erag e of 15,635.232 milliseconds. A ccessing data positions b y searc h k e y from pre vious researc h [1] s till f ollo w s a linear searc h (LS) patter n, whic h is ecient onl y when there is an inde x order . F or this researc h, e v aluating the impact of searc h k e y dis tr ibution on searc hing tec hniq ue per f or mance in lar g e-scale JSON les that contain data s treams is proposed. This e v aluation compares the time eciency of accessing data locations b y searc h k e y : dense inde x f or one-to-one data and sparse inde x f or one-to-man y data. Section 2.2 e xplains the approac hes to appl ying LS, binar y searc h (BS), binar y searc h tree (BS T), and A delson- V elskii Landis tree (A VL) tec hniq ues. Exper iments are conducted on the c haracter is tics Journal homepag e: http://i jai.iaescor e.com Evaluation Warning : The document was created with Spire.PDF for Python.
2538 ISSN: 2252-8938 of searc h k e y s in dense and sparse inde x es, whic h ha v e three dierent dis tr ibution patter ns as descr ibed in section 2.1, consis ting of searc h k e y s f or nor mal dis tr ibution, left-sk e w dis tr ibution, and r ight-sk e w dis tr ibution. The aim is to s tudy whic h tec hniq ue has the highes t and lo w es t time eciency in accessing data locations in the lar g e-scale JSON le set using searc h k e y s from dense and sparse inde x es that ha v e dierent searc h k e y dis tr ibution patter ns. In addition, w e sur v e y ed to oer a comprehensiv e unders tanding of cur rent researc h in this eld. Pre vious researc h in v ol v ed impro ving inde x eciency , suc h as through parallel inde xing, h ybr id methodologies, and adaptiv e inde xing, whic h ha v e the potential to signicantl y enhance inde xing per f or mance. The y can reduce searc h durations, decrease s torag e o v erhead, and manag e specialized data types or intr icate data s tr uctures [2]–[8]. The deplo yment of these tec hniq ues is par ticular l y benecial f or databases containing uncer tain data, g raph databases, lar g e-scale g eospatial data der iv ed from the inter net of things (Io T), and e xtensiv e time ser ies datasets. The researc h section w as related to databases using LS tec hniq ues. Suc h inno v ativ e methods as path- based inde x s tr uctures, online similar ity searc hes, in v er ted inde x optimizations, dynamic compressed lear ned inde x es, small-space algor ithmic anal y sis, lear ned s tr uctures f or spatial data, and dis tance-computation-free searc h sc hemes are emer ging as inno v ativ e methods to enhance database per f or mance. These tec hniq ues can s treamline q uer y processing, optimize s torag e utilization, and procientl y manag e specialized data f or mats or comple x data arc hitectures [9]–[16]. The application of these methodologies can be especiall y adv antag eous f or e xtensible mark up languag e (XML) databases, uncer tain time ser ies data, relational database manag ement sy s tems, spatial data, and binar y code databases. Mean while, researc h related to BS, suc h as ne xt-g eneration database inde xing, decision tree algor ithms tailored f or spatial data, nature-inspired optimization methods, dynamic approac hes in BS spaces, and random BS methodologies, oers considerable promise to adv ance database per f or mance. These adv ancements can s treamline searc h operations, optimize s torag e capabilities, and ecientl y handle specialized data f or mats or comple x data infras tr uctures [17]–[21]. The application of these s trategies pro v es especiall y eectiv e f or prog ressiv e database sy s tems, spatial data anal y sis, association r ule mining, BS optimizations, and neares t neighbor searc hes within binar y domains. On the other hand, researc h related to BS T , suc h as deter minis tic nite tree automata minimizatio n, space-ecient inde xing tailored f or cyber -ph y sical sy s tems, data-dr iv en cac he optimization, and B+- T ree-based searc h methods, presents signicant adv ancements in database per f or mance and secur ity . These methodologies ha v e the potential to optimize searc h operations, ensure ecient space utilization, and deliv er secure and eectiv e data retr ie v al in clo ud en vironments [22]–[26]. Appl ying these tec hniq ues is especiall y adv antag eous f or sy s tems f ocused on minimizing automata, optimizing cyber -ph y sical databases, rening cac he per f or mance in B+- T rees, ensur ing encr ypted cloud data secur ity , and ecientl y retr ie ving product inf or mation on cloud platf or ms. F inall y , w e s tudied researc h related to the A VL tree. Suc h researc h includes secure document retr ie v al in encr ypted cloud sy s tems, h ybr id sw ar m optimization f or netw ork clus ter ing, ener gy -ecient clus ter head selection, and ultra-scalable bloc kc hain platf or ms f or asset tok enization, sho w casing promising adv ancements in their respectiv e elds. These methods ensure data secur ity in cloud en vironments, optimize netw ork clus ter ing and ener gy utilization in wireless sensor netw orks, and pa v e the w a y f or lar g e-scale asset manag ement on bloc kc hain platf or ms [27]–[30]. The amalg amation of these s trategies oers subs tantial potential f or sy s tems that underscore encr ypted cloud data protection, s treamlined clus ter ing in wireless infras tr uctures, and e xtensible bloc kc hain mec hanisms f or global assets. 2. METHOD In pre vious researc h [1], w e proposed dense inde xing f or one-to-one data access, as illus trated in F igure 1. The le set is called Dense inde x position.json, with a size of 33 MB and containing 1,000,000 transactions. The le set consis ts of searc h k e y s and specic positions within the Bigdata.json dataset. While the sparse inde x f or accessing one-to-man y data is illus trated in F igure 2, the set of les is called Sparse inde x position.json, with a le size of 8 MB and con taining 5,146 transactions. This le set consis ts of searc h k e y s and multiple positions within the Bigdata.json dataset. In searc hing to access data positions using the dense and sparse inde x es, no issues ar ise when searc hing f or a single searc h k e y . Ho w e v er , when searc hing f or multiple searc h k e y s, the searc h k e y dis tr ibution mus t be considered. Theref ore, this researc h proposes an approac h to compare the eciency of access times to data positions using dense and sparse inde x es. The researc her selected main searc h k e y s from both inde x datasets Int J Ar tif Intell, V ol. 14, N o. 3, June 2025: 2537–2546 Evaluation Warning : The document was created with Spire.PDF for Python.
Int J Ar tif Intell ISSN: 2252-8938 2539 as descr ibed in section 2.1, whic h ha v e three dierent searc h k e y dis tr ibutions: nor mal dis tr ibution, left-sk e w dis tr ibution, and r ight-sk e w dis tr ibution. The objectiv e is to compare the eciency of accessing positions using the main searc h k e y s within both inde x les. The researc her applied f our tec hniq ues as e xplained in section 2.2, including LS, BS, BS T , and A VL tree. The time eciency of eac h tec hniq ue w as considered to deter mine whic h pro vided the f as tes t and slo w es t results f or accessing positions within both inde x datasets. "Phillip.Headlon@smith.com": 0, "Jonnie@pacificare.com": 1, ... "Annalee@pinnaclewest.com": 1000000 F igure 1. Ex amples of transactions in the Dense inde x position.json with a size of 33 MB and 1,000,000 transactions "Maryalice": [0, 547, 836317, 1040693, 1041786], "Luciana": [1, 3639, 9282, 1041132, 1044174], ... "Eulah": [2, 1216, 630028, 630264, 1042683] F igure 2. Ex amples of transactions in the Sparse inde x position.json with a size of 8 MB and 5,146 transactions 2.1. The selection of sear c h k e y s f or dense and sparse distribution This section e xplains the selection of 524 searc h k e y s to e v aluate the eectiv eness of the researc h. W e obtained these searc h k e y s from the dense inde x position le, whic h is 33 MB and has 1,000,000 transactions, and the sparse inde x position le, whic h is 8 MB and has 5,146 transactions. F or the dense inde x, w e created v e email in de x les, eac h containing 524 non-repeating searc h k e y s, and divided these inde x les into three main types according to position rang e. The rs t type is the dense inde x f or the nor mal dis tr ibution of the inde x, whic h has 2,070 searc h k e y s in the position rang e of 300,000 to 800,000, with 550 searc h k e y s in the left-sk e w ed dis tr ibution and r ight-sk e w ed dis tr ibution, as illus trated in F igure 3. F igure 3. Dense inde x f or nor mal dis tr ibution The second type of inde x is a dense inde x f or the left-sk e w ed dis tr ibution. As illus trated in F igure 4, the inde x is dis tr ibuted within the rang e of 100,000 to 500,000 with 1,806 searc h k e y s. Moreo v er , the dis tr ibution on the r ight side holds 814 searc h k e y s. Ev aluating sear c h key dis tribution impact on sear c hing per f or mance in ... (Bo w onsak Srisungsittisunti) Evaluation Warning : The document was created with Spire.PDF for Python.
2540 ISSN: 2252-8938 F igure 4. Dense inde x f or a left sk e w ed dis tr ibution The third type of inde x is a dense inde x f or a r ight-sk e w ed dis tr ibution. As illus trated in F igure 5, the inde x is dis tr ibuted within the rang e of 500,000 to 1,000,000, encompassing 1,999 searc h k e y s. The dis tr ibution on the left side contains 621 searc h k e y s. F or the sparse inde x, w e created v e email inde x les, eac h containing 524 non-repeating searc h k e y s. These inde x les w ere divided into three main types according to position rang e. The rs t type is the sparse inde x f or nor mal dis tr ibution of the inde x, whic h has 2,600 searc h k e y s in the 2,000 to 4,000 position rang e, with 20 searc h k e y s in the left- and r ight-sk e w ed dis tr ibution, as illus trated in F igure 6. F igure 5. Dense inde x f or a r ight sk e w ed dis tr ibution F igure 6. Sparse inde x f or nor mal dis tr ibution The second type of inde x is a sparse inde x f or the left-sk e w ed dis tr ibution. As illus trated in F igure 7, the inde x is dis tr ibuted within the rang e of 1,000 to 3,000 with 1,885 searc h k e y s. Moreo v er , the dis tr ibution on the r ight side holds 735 searc h k e y s. Int J Ar tif Intell, V ol. 14, N o. 3, June 2025: 2537–2546 Evaluation Warning : The document was created with Spire.PDF for Python.
Int J Ar tif Intell ISSN: 2252-8938 2541 The third type of inde x is a sparse inde x f or a r ight-sk e w ed dis tr ibution. As illus trated in F igure 8, the inde x is dis tr ibuted within the rang e of 3,000 to 5,000, encompassing 1,916 searc h k e y s. Moreo v er , the dis tr ibution on the left side contains 704 searc h k e y s. F igure 7. Sparse inde x f or left sk e w ed dis tr ibution F igure 8. Sparse inde x f or r ight sk e w ed dis tr ibution 2.2. Algorithms f or com paring the eciency of dense and sparse indices This section e xplains the methods f or searc hing and accessing dense and sparse inde x position les, whic h compr ise LS, BS, BS T , and A VL tree. The details of these algor ithms are pro vided as f ollo w s. 2.2.1. Dense and sparse inde x algorithms using linear sear c h F or the method of searc hing the position of searc h k e y ter ms within dense and sparse inde x les using the LS algor ithm, three dierent dis tr ibutions of tes t searc h k e y ter ms w ere used, as detailed in section 2.1. The searc h s tar ts b y c hec king the searc h k e y ter m in eac h transaction entr y within the dense inde x le. If the searc hed searc h k e y ter m matc hes the searc h k e y ter m appear ing in the le, the position of that searc h k e y ter m can be retr ie v ed. If not, the searc h continues until the desired k e y is f ound or until all entr ies are e xhaus ted. Algorithm 1 Dense and sparse inde x algor ithms using LS R eq uir e: Dense and Sparse Inde x position F ile, searc h k e y s tes t with dis tr ibution as descr ibed in section 2.1 Ensur e: P osition in Bigdata.json 1: function FindSear chKey (inde x.positions, desired.searc h.k e y s) 2: result {} 3: f or searc h.k e y desired.searc h.k e y s do 4: position.f ound F alse 5: f or inde x.entr y , inde x.entr y Enumerate(inde x.positions) do 6: if inde x.entr y[’ searc h.k e y’] == searc h.k e y then 7: result[searc h.k e y] inde x.entr y[’position ’] 8: position.f ound T r ue br eak 9: end if 10: end f or 11: if not position.f ound then 12: result[searc h.k e y] ”Searc h k e y not f ound.” 13: end if 14: end f or 15: r e turn result 16: end function F or the utilization of dense and sparse inde x algor ithms using LS to access the position of searc h k e y s within dense and sparse inde x position les, impor t tes t searc h k e y s with all dis tr ibution patter ns as descr ibed in section 2.1. Impor t one searc h k e y at a time into the searc h v er ication process. This process uses the LS tec hniq ue as e xplained in section 2.2.1. Ev aluating sear c h key dis tribution impact on sear c hing per f or mance in ... (Bo w onsak Srisungsittisunti) Evaluation Warning : The document was created with Spire.PDF for Python.
2542 ISSN: 2252-8938 2.2.2. Dense and sparse inde x algorithms using binary sear c h When searc hing f or the position of searc h k e y ter ms in dense and sparse inde x les using the BS tec hniq ue, three searc h k e y ter m dis tr ibution patter ns w ere tes ted, as descr ibed in section 2.1. The searc h process begins b y sor ting the searc h k e y ter ms in the dense and sparse inde x les. T es t searc h k e y ter ms with dierent dis tr ibutions are then used to nd the midpoint of the searc h k e y ter m seq uence. The midpoint is compared to the searc h v alue. If the searc h k e y ter m is f ound, the position of the searc h k e y ter m is retur ned. If the searc h k e y ter m is not f ound, the data seq uence is divided into tw o par ts, and the searc h continues in the par t with searc h k e y ter m v alues closes t to the desired searc h k e y ter m. Algorithm 2 Dense and sparse inde x algor ithms using binar y searc h 1: In put : Dense and Sparse Inde x position F ile, searc h k e y s tes t with dis tr ibution as descr ibed in section 2.1 2: Output : P osition in Bigdata.json 3: function bin ar y sear ch ( , ) 4: 0 5: len ( ) 1 6: while do 7:  ( + ) // 2 8:  [  ] [ 0 ] 9: if  == then 10: r e turn [  ] [ 1 ] 11: else if  < then 12:  + 1 13: else 14:  1 15: end if 16: end while 17: r e turn ”Searc h k e y not f ound.” 18: end function F or the utilization of dense and sparse inde x algor ithms using BS to access the position of searc h k e y s within dense and sparse inde x position les, impor t tes t searc h k e y s with six dierent dis tr ibution patter ns as descr ibed in section 2.1. Eac h searc h k e y is processed through the searc h v er ication process using the BS tec hniq ue, as e xplained in section 2.2.2. The eciency of accessing the searc h k e y position based on the a v erag e or w ors t-case data rang e is O (log n), but the bes t-case scenar io is O(1) w hen the tes t searc h k e y matc hes the inde x position les, retur ning the position of Bigdata.json. 2.2.3. Dense and sparse inde x algorithms using BS T and A VL T r ee This section e xplains the method of searc hing and accessing positions within dense and sparse inde x les using both the BS T and the A VL tree tec hniq ues. The algor ithms f or these methods are illus trated in Algor ithm 3, whic h integ rates both BS T and A VL tree operations f or eciency . Six dierent tes t searc h k e y dis tr ibution patter ns are used, as descr ibed in section 2.1. The searc h process begins b y ar ranging the main w ords in the inde x le. T es t searc h k e y s are then used to searc h at the root node of either the BS T or the A VL tree, depending on the desired eciency . If the root node is empty , the process retur ns none. F or both tree s tr uctures, the searc h proceeds b y compar ing the desired searc h k e y with the cur rent node s searc h k e y s in the dense inde x position le. If the searc h k e y matc hes the cur rent node, the position of the main w ord is retur ned, and the searc h f or the ne xt searc h k e y continues. Other wise, the direction of tra v ersal (left or r ight) is deter mined based on whether the searc h k e y is less or g reater than the cur rent node s searc h k e y . F or the utilization of dense and sparse inde x algor ithms using BS T or A VL trees to access the position of searc h k e y s within dense and sparse inde x position les, the tes t searc h k e y s are processed with both tec hniq ues, depending on eciency req uirements. The BS T tec hniq ue, as descr ibed in section 2.2.3, allo w s searc hing with a v erag e or w ors t time comple xity eq ual to ( ) . On the other hand, the A VL tree tec hniq ue, as descr ibed in section 2.2.4, oers an impro v ed a v erag e and w ors t-case time comple xity of ( log ) . The bes t case f or both is ( 1 ) when the searc h k e y directl y matc hes an inde x in the le, retur ning the position in Bigdata.json. Int J Ar tif Intell, V ol. 14, N o. 3, June 2025: 2537–2546 Evaluation Warning : The document was created with Spire.PDF for Python.
Int J Ar tif Intell ISSN: 2252-8938 2543 Algorithm 3 Dense and sparse inde x algor ithms using BS T and A VL T ree 1: In put : Dense and Sparse Inde x position F ile, searc h k e y s tes t with dis tr ibution as descr ibed in section 2.1 2: Output : P osition in Bigdata.json 3: function Index Sear ch (inde x le, searc h k e y s, tree type) 4: if == “BS T” then 5: ne w Binar ySearc hT ree() 6: else if == A VL then 7: ne w A VL T ree() 8: else 9: raise err or : U nsuppor ted tree type. 10: end if 11: f or ( , ) do 12: . ( , ) 13: end f or 14: { } 15: f or do 16: . ( ) 17: if NULL then 18: [ ] 19: else 20: [ ] ”Searc h k e y not f ound.” 21: end if 22: end f or 23: r e turn 24: end function 3. RESUL TS This section presents the e v aluation of searc h time eciency f or accessing data positions in tw o datasets: dense inde x position.json and sparse inde x position.json. The searc h algor ithms emplo y ed f or the anal y sis include LS, BS, BS T , and A VL tree. The e xper iments w ere conducted with three dis tinct searc h k e y dis tr ibution patter ns: nor mal, left-sk e w ed, and r ight-sk e w ed dis tr ibutions, as detailed in section 2.1. The comparativ e e xper iments f ocused on tw o types of inde x es: dense inde x as sho wn in T able 1 and sparse inde x as sho wn in T able 2. F or both inde x types, the e xper iments measured tw o pr imar y c haracter is tics: A v erag e time of 15 rounds, whic h assessed the total time req uired to access positions f or 524 searc h k e y s o v er fteen rounds, and A v erag e time per searc h k e y , whic h e v aluated the searc h time req uired to access eac h searc h k e y individuall y . The results sho w ed that sor ting and inde xing are necessar y f or BS, BS T , and A VL T ree to enhance per f or mance. F or LS, there w as no need to sor t the inde x bef orehand, as LS seq uentiall y e v aluates all transaction items, reg ardless of searc h k e y dis tr ibution. The results f or the dense inde x (T able 1) demons trate that BS had an a v erag e inde xing time of 1,795.708 milliseconds, while BS T completed inde xing in 773.149 milliseconds, and A VL T ree ac hie v ed the f as tes t time of 635.262 milliseconds. T able 1. Compar ison of dense inde x searc hing eciency with searc h k e y dis tr ibution T ec hniq ues R etr ie v e time b y dense inde x (ms) A v g. time of 15 rounds A v g. per searc h k e y Inde x le sor ting time Linear searc h N or mal dis tr ibution 342259.155 653.166 N o inde x le sor ting Left-sk e w ed dis tr ibution 337220.625 643.551 Right-sk e w ed dis tr ibution 349457.130 666.903 Binar y searc h N or mal dis tr ibution 9.811 0.019 1795.708 Left-sk e w ed dis tr ibution 10.199 0.019 Right-sk e w ed dis tr ibution 11.572 0.022 Binar y searc h tree N or mal dis tr ibution 179.970 0.343 773.149 Left-sk e w ed dis tr ibution 173.583 0.331 Right-sk e w ed dis tr ibution 155.317 0.296 A VL T ree N or mal dis tr ibution 2.703 0.005 635.262 Left-sk e w ed dis tr ibution 2.346 0.004 Right-sk e w ed dis tr ibution 2.433 0.005 The rs t e xper iment e v aluated the eciency of accessing data positions in a dense inde x. Eac h tes t round included 524 searc h k e y s, and the results w ere a v erag ed o v er 15 rounds. W hen tes ted with searc h k e y s f ollo wing a nor mal dis tr ibution, the A VL tree w as the f as tes t, with an a v erag e access time of 2.703 milliseconds. Ev aluating sear c h key dis tribution impact on sear c hing per f or mance in ... (Bo w onsak Srisungsittisunti) Evaluation Warning : The document was created with Spire.PDF for Python.
2544 ISSN: 2252-8938 On the other hand, LS w as the slo w es t, taking 342,259.155 milliseconds. F or searc h k e y s with a left-sk e w ed dis tr ibution, the A VL tree per f or med the q uic k es t, with an a v erag e time of 2.346 milliseconds, while LS w as ag ain the slo w es t, taking 337,220.625 milliseconds. In r ight-sk e w ed searc h k e y dis tr ibutions, the A VL tree w as the f as tes t, with an a v erag e time of 2.433 milliseconds, whereas LS remained the slo w es t at 349,457.13 milliseconds. The second e xper iment assessed searc h eciency in a sparse inde x. Eac h round also consis ted of 524 searc h k e y s, and the results w ere a v erag ed o v er 15 rounds. W ith a nor mal searc h k e y dis tr ibution, BS w as the f as tes t, w ith an a v erag e time of 5.647 milliseconds, while LS w as the slo w es t, taking 528,112.194 milliseconds. F or left-sk e w ed searc h k e y dis tr ibutions, BS w as the q uic k es t, a v eraging 3.868 milliseconds, while LS w as the slo w es t, taking 528,112.194 milliseconds. W ith r ight-sk e w ed searc h k e y dis tr ibutions, BS w as the f as tes t, a v eraging 5.592 milliseconds, while LS remained the slo w es t at 528,179.767 milliseconds. T able 2. Compar ison of sparse inde x searc hing eciency with searc h k e y dis tr ibution T ec hniq ues R etr ie v e time b y sparse inde x (ms) A v g. time of 15 rounds A v g. per searc h k e y Inde x le sor ting time Linear searc h N or mal dis tr ibution 528112.194 1007.848 N o inde x le sor ting Left-sk e w ed dis tr ibution 528253.244 1008.117 Right-sk e w ed dis tr ibution 528179.767 1007.977 Binar y searc h N or mal dis tr ibution 5.647 0.011 136.711 Left-sk e w ed dis tr ibution 3.868 0.007 Right-sk e w ed dis tr ibution 5.592 0.011 Binar y searc h tree N or mal dis tr ibution 161.007 0.307 N o inde x le sor ting Left-sk e w ed dis tr ibution 166.118 0.317 Right-sk e w ed dis tr ibution 158.378 0.302 A VL tree N or mal dis tr ibution 30.903 0.059 113.538 Left-sk e w ed dis tr ibution 35.200 0.067 Right-sk e w ed dis tr ibution 35.223 0.067 4. C ON CL USION This researc h co mpares the time eciency of accessing positions using both dense and sparse inde x es. The s tudy applied f our searc h tec hniq ues: LS, BS, BS T , and A VL tree, across three searc h k e y dis tr ibution patter ns: nor mal, left-sk e w ed, and r ight-sk e w ed. The e xper imental ndings indicate that f or dense inde x es with a nor mal dis tr ibution of searc h k e y s, the A VL T ree w as the f as tes t, a v eraging 0.005 milliseconds per searc h k e y . Con v ersel y , LS w as the slo w es t, with an a v erag e of 653.166 milliseconds per searc h k e y . F or left-sk e w ed dis tr ibutions, the A VL T ree w as ag ain the q uic k es t, a v eraging 0.004 milliseconds per searc h k e y , while LS w as the slo w es t at 643.551 milliseconds. The patter n w as consis tent f or r ight-sk e w ed dis tr ibutions, with the A VL T ree being the f as tes t at 0.005 milliseconds and LS being the slo w es t at 666.903 milliseconds. F or sparse inde x es, the results w ere slightl y dierent. When searc h k e y s w ere nor mall y dis tr ibuted, BS w as the mos t ecient, a v eraging 0.011 milliseconds per searc h k e y , while LS w as the slo w es t, taking 1007.848 milliseconds. F or left-sk e w ed dis tr ibutions, BS remained the f as tes t, a v eraging 0.007 milliseconds per searc h k e y , with LS being the slo w es t at 1008.117 milliseconds. Similar l y , f or r ight-sk e w ed dis tr ibutions, BS w as the q uic k es t at 0.011 milliseconds per searc h k e y , and LS w as the slo w es t, a v eraging 1007.977 milliseconds. In conclusion, the A VL T ree is highl y ecient f or dense inde x es, par ticular l y when handling lar g e v olumes of searc h k e y s, suc h as 1,000,000 transactions. On the other hand, BS is more suitable f or sparse inde x es with a smaller number of searc h k e y s, suc h as 5,000 transactions. LS consis tentl y sho w ed the leas t eciency in both dense and sparse inde x scenar ios. F uture researc h can e xplore secondar y inde xing tec hniq ues to enhance scalability and eciency f or more comple x data types and datasets. A CKN O WLEDGMENTS The authors thank R esearc h U nit f or De v elopment of Intellig ent Sy s tems and A utonomous R obots (IS AR) and Sc hool of Inf or mation T ec hnology and Communication, U niv ersity of Pha y ao, f or f acility suppor t. FUNDIN G INFORMA TION This w ork w as suppor ted b y the Super KPI project f or inter national publications, under the Sc hool of Inf or mation and Communication T ec hnology , U niv ersity of Pha y ao. Int J Ar tif Intell, V ol. 14, N o. 3, June 2025: 2537–2546 Evaluation Warning : The document was created with Spire.PDF for Python.
Int J Ar tif Intell ISSN: 2252-8938 2545 A UTHOR C ONTRIBUTIONS S T A TEMENT This jour nal uses the Contr ibutor R oles T ax onom y (CR edi T) to recognize individual author contr ibu- tions, reduce authorship disputes, and f acilitate collaboration. N ame of A uthor C M So V a F o I R D O E V i Su P Fu Bo w onsak Sr isungsittisunti Jira w at Duangkae w N akar in Chaikae w C : C onceptualization I : I n v es tig ation V i : V i sualization M : M ethodology R : R esources Su : Su per vision So : So ftw are D : D ata Curation P : P roject A dminis tration V a : V a lidation O : W r iting - O r iginal Draft F u : Fu nding A cq uisition F o : F o r mal Anal y sis E : W r iting - R e vie w & E diting C ONFLICT OF INTERES T S T A TEMENT The authors s tate no conict of interes t. D A T A A V AILABILIT Y Der iv ed data suppor ting the ndings of this s tudy are a v ailable from the cor responding author , [JD], on req ues t. REFEREN CES [1] B. Sr isungsittisunti, J. Duangkae w , S. Mekr uksa v anic h, N . Chaikae w , and P . R ojana v asu, “Enhancing data retr ie v al eciency in lar g e-scale Ja v aScr ipt object notation datasets b y using inde xing tec hniq ues, IAES Int er national Jour nal of Ar ticial Int ellig ence (IJ-AI) , v ol. 13, no. 2, pp. 2342–2353, 2024, doi:10.11591/i jai.v13.i2.pp2342-2353. [2] R. Qiu, Y . Ming, Y . Hong, H. Li, and T . Y ang, “W ind-bell inde x: to w ards ultra-f as t edg e q uer y f or g raph databases, 2023 IEEE 39t h Int er national Conf er ence on Data Engineering (ICDE) , Apr . 2023, pp. 2090–2098, doi: 10.1109/icde55515.2023.00162. [3] H. W u e t al. , A per f or mance-impro v ed and s torag e-ecient secondar y inde x f or big data processing, 2017 IEEE Int er national Conf er ence on Smar t Cloud (Smar tCloud) , N o v . 2017, pp. 161–167, doi: 10.1109/smar tcloud.2017.32. [4] R. Ma, X. Zhu, and L. Y an, A h ybr id approac h f or clus ter ing uncer tain time ser ies, Jour nal of Computing and Inf or mation T ec hnology , v ol. 28, no. 4, pp. 255–267, Oct. 2021, doi: 10.20532/cit.2020.1004802. [5] S. V . Limkar and R. K. Jha, A no v el method f or parallel inde xing of real time g eospatial big data g enerated b y Io T de vices, F utur e Gener ation Comput er Sy s t ems , v ol. 97, pp. 433–452, A ug. 2019, doi: 10.1016/j.future.2018.09.061. [6] M. A. Qader , S. Cheng, and V . Hr is tidis, A comparativ e s tudy of secondar y inde xing tec hniq ues in LSM-based N oSQL databases, Pr oceedings of t he 2018 Int er national Conf er ence on Manag ement of Data , Ma y 2018, pp. pp. 551–566, doi: 10.1145/3183713.3196900. [7] Z. W ang, Q. W ang, P . W ang, T . P alpanas, and W . W ang, “Dump y : A compact and adaptiv e inde x f or lar g e data ser ies collections, Pr oceedings of t he A CM on Manag ement of Data , v ol. 1, no. 1, pp. 1–27, Ma y 2023, doi: 10.1145/3588965. [8] M. M. La w al, H. Ibrahim, N . F . M. Sani, and R. Y aak ob, Anal y ses of inde xing tec hniq ues on uncer tain data with high dimensionality , IEEE A ccess , v ol. 8, pp. 74101–74117, 2020, doi: 10.1109/access.2020.2988487. [9] D. Gopinathan and K. Asa w a, “N e w path based inde x s tr ucture f or processing C AS q uer ies o v er XML database, Jour nal of Computing and Inf or mation T ec hnology , v ol. 25, no. 3, pp. 211–225, Oct. 2017, doi: 10.20532/cit.2017.1003557. [10] R. Ma, D. Zheng, and L. Y an, “F as t online similar ity searc h f or uncer tain time ser ies, Jour nal of Computing and Inf or mation T ec hnology , v ol. 28, no. 1, pp. 1–17, Jul. 2020, doi: 10.20532/cit.2020.1004574. [11] Y . Shin, J. Ahn, and D.-H. Im, “Join optimization f or in v er ted inde x tec hniq ue on relational database manag ement sy s tems, Exper t Sy s t ems wit h Applications , v ol. 198, Jul. 2022, doi: 10.1016/j.esw a.2022.116956. [12] P . F er ragina and G. V inciguer ra, The PGM-inde x: a full y -dynamic compressed lear ned inde x with pro v able w ors t-case bounds, Pr oceedings of t he VLDB Endo wment , v ol. 13, no. 8, pp. 1162–1175, 2020, doi: 10.14778/3389133.3389135. [13] D. Belazzougui e t al. , “Linear -time s tr ing inde xing and anal y sis in small space, A CM T r ansactions on Algorit hms (T AL G) , v ol. 16, no. 2, pp. 1–54, 2020, doi: 10.1145/3381417. [14] P . Li, H. Lu, Q. Zheng, L. Y ang, and G. P an, “LIS A: A lear ned inde x s tr ucture f or spatial data, Pr oceedings of t he 2020 A CM SIGMOD Int er national Conf er ence on Manag ement of Data , Ma y 2020, pp. 2119–2133, doi: 10.1145/3318464.3389703. [15] T . Kraska, A. Beutel, E. H. Chi, J. Dean, and N . P ol yzotis, The case f or lear ned inde x s tr uctures, Pr oceedings of t he 2018 Int er national Conf er ence on Manag ement of Data , Ma y 2018, pp. 489–504, doi: 10.1145/3183713.3196909. [16] J. Song, H. T . Shen, J. W ang, Z. Huang, N . Sebe, and J. W ang, A dis tance-computation-free searc h sc heme f or binar y code databases, IEEE T r ansactions on Multimedia , v ol. 18, no. 3, pp. 484–495, Mar . 2016, doi: 10.1109/tmm.2016.2515990. [17] J. Dittr ic h, J. Nix, and C. Sc h ¨ on, The ne xt 50 y ears in database inde xing or: the case f or automaticall y g enerated inde x s tr uctures, Pr oceedings of t he VLDB Endo wment , v ol. 15, no. 3, pp. 527–540, N o v . 2021, doi: 10.14778/3494124.3494136. Ev aluating sear c h key dis tribution impact on sear c hing per f or mance in ... (Bo w onsak Srisungsittisunti) Evaluation Warning : The document was created with Spire.PDF for Python.
2546 ISSN: 2252-8938 [18] S. Oujdi, H. Belbac hir , and F . Bouf ares, “C4.5 decision tree algor ithm f or spatial data, alter nativ es and per f or mances, Jour nal of Computing and Inf or mation T ec hnology , v ol. 27, no. 3, pp. 29–43, Ma y 2020, doi: 10.20532/cit.2019.1004651. [19] Y . Gheraibia, A. Moussaoui, S. Kabir , and P . Y . Y in, “P enguins searc h optimisation algor ithm f or association r ules mining, Jour nal of Computing and Inf or mation T ec hnology , v ol. 24, no. 2, pp. 165–179, Jun. 2016, doi: 10.20532/cit.2016.1002745. [20] A. Ba ykaso ˘ glu and F . B. Ozso ydan, “Dynamic optimization in binar y searc h spaces via w eighted super position attraction algor ithm, Exper t Sy s t ems wit h Applications , v ol. 96, pp. 157–174, Apr . 2018, doi: 10.1016/j.esw a.2017.11.048. [21] M. K omoro w ski and T . T rzci ´ nski, “Random binar y searc h trees f or appro ximate neares t neighbour searc h in binar y spaces, Applied Sof t Computing , v ol. 79, pp. 87–93, Jun. 2019, doi: 10.1016/j.asoc.2019.03.031. [22] Y . Guellouma and H. Cher roun, “Ecient implementation f or deter minis tic nite tree automata minimization, Jour nal of Computing and Inf or mation T ec hnology , v ol. 24, no. 4, pp. 311–322, Dec. 2016, doi: 10.20532/cit.2016.1002867. [23] Y .-H. K uan, Y .-H. Chang, T .- Y . Chen, P .-C. Huang, and K.- Y . Lam, “Space-ecient inde x sc heme f or PCM-based multiv ersion databases in cyber -ph y sical sy s tems, A CM T r ansactions on Embedded Computing Sy s t ems , v ol. 16, no. 1, pp. 1–26, Oct. 2016, doi: 10.1145/2950060. [24] R. K ¨ uhn, D. Bieber t, C. Hak er t, J.-J. Chen, and J. T eubner , T o w ards data-based cac he optimization of B+-trees, Pr oceedings of t he 19t h Int er national W or kshop on Data Manag ement on N ew Har dw ar e , Jun. 2023, pp. 63–69, doi: 10.1145/3592980.3595316. [25] H. Shen, L. X ue, H. W ang, L. Zhang, and J. Zhang, “B+-tree based multi-k e yw ord rank ed similar ity searc h sc heme o v er encr ypted cloud data, IEEE A ccess , v ol. 9, pp. 150865–150877, 2021, doi: 10.1109/access.2021.3125729. [26] Y .-S. Zhao and Q.- A. Zeng, “Secure and ecient product inf or mation retr ie v al in cloud computing, IEEE A ccess , v ol. 6, pp. 14747–14754, 2018, doi: 10.1109/access.2018.2816919. [27] J. F u, N . W ang, B. Cui, and B. K. Bhar g a v a, A practical frame w ork f or secure document retr ie v al in encr ypted cloud le sy s tems, IEEE T r ansactions on P ar allel and Dis tribut ed Sy s t ems , v ol. 33, no. 5, pp. 1246–1261, Ma y 2022, doi: 10.1109/tpds.2021.3107752. [28] R. M. Alamelu and K. Prabu, “Hybr idization of Pig eon inspired with glo ww or m’ sw ar m optimization based clus ter ing tec hniq ue in wireless sensor netw orks, Micr opr ocessor s and Micr osy s t ems , v ol. 91, Jun. 2022, doi: 10.1016/j.micpro.2022.104528. [29] R. R. Pr iy adarshini and N . Siv ak umar , “Clus ter head selection based on minimum connected dominating set and Bi-par tite inspired methodology f or ener gy conser v ation in W SNs, Jour nal of King Saud U niv er sity - Comput er and Inf or mation Sciences , v ol. 33, no. 9, pp. 1132–1144, N o v . 2021, doi: 10.1016/j.jksuci.2018.08.009. [30] A. Buldas e t al. , An ultra-scalable bloc kc hain platf or m f or univ ersal asset tok enization: design and implementation, IEEE A ccess , v ol. 10, pp. 77284–77322, 2022, doi: 10.1109/access.2022.3192837. BIOGRAPHIES OF A UTHORS Dr . Bo w onsak Srisungsittisunti is Assis tant Prof essor at Computer Engineer ing, Sc hool of Inf or mation and Communication tec hnology , U niv ersity of Pha y ao, Thailand. He holds a Ph.D. deg ree in computer engineer ing with specialization in data processing. His researc h areas are data processing, data anal ytic, data mining, and database sy s tem. He can be contacted at email: bo w onsak.sr@up.ac.th. Jira w at Duangkae w receiv ed a Bac helor of Science deg ree in computer science from Rambhai Bar ni Ra jabhat U niv ersity , Thailand, in 2020, and completed his Mas ter of Engineer ing in computer engineer ing at the U niv ersity of Pha y ao, Thailand, in 2024. He is cur rentl y pursuing his Ph.D. in computer engineer ing at the U niv ersity of Pha y ao, Thailand. His researc h interes ts include inde xing tec hniq ues, non-relational databases, lar g e databases, and incremental databases. He can be contacted at email: jira w at.du@outlook.com. N akarin Chaikae w is Ph.D. in remote sensing and g eog raphic inf or mation sy s tems, Asian Ins titute of T ec hnology (AIT). He is Assis tant Prof essor in g eog raphic inf or mation science, U niv ersity of Pha y ao, Thailand. He can be contacted at email: nakar in.c h@up.ac.th. Int J Ar tif Intell, V ol. 14, N o. 3, June 2025: 2537–2546 Evaluation Warning : The document was created with Spire.PDF for Python.