Indonesian Journal of Electrical Engineering and Computer Science
Vol. 39, No. 1, July 2025, pp. 310-321
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v39.i1.pp310-321

Short-term recall comparison of iconic auditory and visual feedback stimuli in a memory game

György Wersényi 1, Ádám Csapó 2,3, József Tollár 4,5

1 Department of Telecommunications, Széchenyi István University (SZE), Győr, Hungary
2 Institute for Advanced Studies, Corvinus University of Budapest, Budapest, Hungary
3 Institute of Data Analytics and Information Systems, Corvinus University of Budapest, Budapest, Hungary
4 Digital Development Center, Széchenyi István University (SZE), Győr, Hungary
5 Somogy County Kaposi Mór Teaching Hospital, Kaposvár, Hungary

Article Info
Article history: Received May 28, 2024; Revised Nov 6, 2025; Accepted Mar 25, 2025
Keywords: Audiovisual memory; Auditory icon; Human-computer interaction; Serious gaming; Sound design

ABSTRACT
Multimedia user interfaces incorporate various feedback methods using different modalities. Cognitive processing of audiovisual information requires the ability to recall visual and auditory information, either separately or in combination. Short-term memory capabilities vary individually and depend on factors such as signal presentation and the number and type of visual and auditory items. In an experiment involving 40 subjects, we aimed to compare short-term auditory and visual capabilities in a serious game application. Subjects played the ‘Pairs’ game at different resolutions, using either visual icons or audio samples, while the total time cost and number of flips were recorded. The results indicate that visual memory is not superior, and female subjects performed better than males at higher levels in the visual task.
Additionally, human sound samples, speech, and familiar auditory icons were found to be easier to recall than artificial measurement signals.

This is an open access article under the CC BY-SA license.

Corresponding Author:
György Wersényi
Department of Telecommunications, Széchenyi István University
Győr, H-9026, Hungary
Email: wersenyi@sze.hu

Journal homepage: http://ijeecs.iaescore.com

1. INTRODUCTION

Augmented and virtual reality solutions, assistive technology applications, virtual audio displays (VAD), games, and simulators are just some of the emerging fields where feedback is based on audiovisual information. Users often need to recall the visual and/or auditory representations of specific events on the screen, together with their meaning and sometimes even their spatial location. The usability of a multimedia interface varies with the number of events, user experience, and cognitive capabilities. It is essential to remember the meaning behind a given representation. This cognitive process draws on both visual and auditory memory, in the long term as well as the short term.

Early experiments in psychology did not incorporate computer-based methods. Developments in technology later allowed computers to be used both for running experiments and for data collection and evaluation. In addition, computer games evolved and introduced a variety of audiovisual information for entertainment purposes. Recently, the need has emerged to combine entertainment with experimental data collection involving human subjects. Serious gaming, or gamification, is a method used to collect scientific data through a gaming scenario. A well-designed game can enhance the user experience and maintain or increase motivation, while also allowing for the analysis of results with scientific merit. Using gamification, scientific experiments can be designed and executed to collect data in an entertaining and motivating process for any age or gender group [1]–[6].

Subjects have a limited capacity to recall information, and working memory plays a key role in this process. The terms “working memory” and “short-term memory” are often used interchangeably [7]–[11]. They both refer to immediate conscious perceptual and linguistic processing of a limited amount of information for a limited time. During this active process, temporarily stored audio and/or visual information can be accessed and manipulated. The storage time of short-term memory is generally around 20-30 seconds or even less [12]–[14]. Long-term memory differs from short-term memory primarily in duration but also in capacity [8].

The most important property of working memory is its limited capacity. It was demonstrated that visual working memory can store 3-4 objects [15]–[20]. However, a larger number of objects can also be recalled with varying precision, and there are individual differences and large variability in repeated measurements [21], [22]. In the case of auditory memory, most studies have focused on short-term effects; however, comparisons with long-term effects have also been made [23]–[26]. Capacity limits here were also suggested to be around “seven plus or minus two” [27].

The results contrasting the abilities of the audio and visual modalities have not been conclusive. Most studies have shown superior visual performance [28]–[34]. However, some experiments have found similar memory performance [35], [36]. The variability in earlier results could be attributed to the sensitivity of the experiments to initial parameters.

Auditory information can also be presented alongside visual information in a mixed mode.
Memory performance has been demonstrated to be better for semantically congruent stimuli presented together in different modalities than for stimuli presented with an incongruent or non-semantic stimulus across modalities [37]–[41]. Semantically congruent verbal and non-verbal visual stimuli presented in tandem with auditory counterparts can enhance the precision of auditory encoding. Semantically congruent presentation, where the iconic representation is easily linked to its meaning, generally aids in this process. Better performance can be achieved with meaningful stimuli and cognitive training [42]–[46]. In particular, human sounds were shown to be detected better, especially in the case of speech and human-generated vocal sounds [47], [48].

Although most previous works suggest otherwise, there is no evident consensus on the superiority of visual memory, especially in short-term recall tasks. In the case of visually impaired individuals, the processing of auditory information can be even more enhanced. They are the most important target group in the development of assistive technology, where auditory memory plays an even more significant role. Furthermore, sound design and sonification approaches constantly deal with the problem of properly selecting and optimizing auditory events for feedback.

The results can be very sensitive to the age, gender, or experience of the subjects; thus, a larger number of participants is required. This number should generally exceed 30, a requirement that is seldom met. Exhaustive laboratory procedures can be demanding, especially for the subjects; therefore, a gamification approach with a familiar game design can enhance the reliability of the data.
An application in which the number of items to be recalled can be set from “very easy” to “very difficult” can also highlight the limitations in capacity and determine whether there is a trade-off limit in cognitive processing. The purpose of our experiment is to test differences between modalities, genders, limits, and types of stimuli in a short-term recall task.

This paper presents an experiment involving untrained subjects using a serious game application based on the “Pairs” memory game in both visual and auditory modes, across various resolutions. Section 2 describes the measurement setup, including the software implementation, the experimental procedure, and data evaluation methods. Section 3 presents results based on statistical analysis. Section 4 discusses the outcomes, followed by the final conclusions.

2. MEASUREMENT SETUP

First, the software environment, including the game and the data collection module, was designed, programmed, and tested. Following this, the measurement procedure (data collection and evaluation) and the applied methods were determined. Finally, the recruitment of subjects and the laboratory setup were completed.

The memory game “Pairs” was selected for the experiment. In this game, players flip cards to match pairs. The familiar and simple gameplay, as well as the easy implementation of different modalities (audio and/or visual), were the most important factors in the decision. Furthermore, this type of game engages the players’ short-term memory.
The GUI has a simple layout. Figure 1 shows two screenshots of the game. Upon initialization, the user or the experimenter enters user-relevant data (ID, gender, and age) and selects the modality and resolution (number of pairs). Each level with a higher resolution includes all pairs from the previous level; for example, all 5 pairs in the 5 × 2 resolution are included in all subsequent resolutions. In the visual mode, black-and-white icons were displayed, while in the audio mode, short, iconic sound samples were played back.

Figure 2 illustrates all the available icons and their corresponding auditory events. The icons were designed to represent the semantic meaning of the sound samples while keeping them very simple. Auditory samples were downloaded from public databases or recorded and then modified (e.g., adjusting sound levels, cutting, and shortening). These samples were selected to represent different sound types, such as human-related sounds, everyday sounds, and meaningless sound events (acoustic measurement signals).

Upon starting the game, icons or audio samples are randomized. In both modalities, the corresponding visual icon is revealed after successfully matching a pair. If there are 10 seconds of inactivity, the game is aborted without saving the data. A more detailed description of the coding procedure can be found in [36].

Figure 1. Screenshots of the game. Initial screen (left) and an ongoing game in 4 × 4 resolution (right)

Figure 2. All visual icons and the corresponding auditory samples in the highest resolution (6 × 8). Green color indicates “artificial measurement signals”, yellow represents “human sounds” and white signifies “auditory icons or earcons”
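The level structure described above (each higher resolution reuses all pairs of the previous one, and the cards are shuffled when a game starts) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `build_board`, the `ITEMS` list, and its placeholder labels are assumptions made for this example; the actual icons and samples are those of Figure 2.

```python
import random

# Resolutions (rows, columns) used in the experiment, from easiest to hardest.
RESOLUTIONS = [(5, 2), (3, 4), (4, 4), (4, 5), (4, 6), (6, 5), (6, 6), (6, 7), (6, 8)]

# Hypothetical placeholder labels; the real game uses the 24 icons/sounds of Figure 2.
ITEMS = [f"item_{i:02d}" for i in range(1, 25)]


def build_board(rows, cols, rng=None):
    """Return a shuffled rows x cols grid in which every item occurs exactly twice."""
    n_cards = rows * cols
    if n_cards % 2 != 0:
        raise ValueError("a Pairs board needs an even number of cards")
    n_pairs = n_cards // 2
    if n_pairs > len(ITEMS):
        raise ValueError("not enough items for this resolution")
    rng = rng or random.Random()
    # Taking a prefix of ITEMS guarantees that each level contains
    # all pairs of every lower level.
    cards = ITEMS[:n_pairs] * 2
    rng.shuffle(cards)
    return [cards[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    # Example: print one randomized 4 x 4 board (8 pairs).
    for row in build_board(4, 4, random.Random(0)):
        print(row)
```

Using a fixed, ordered item list and taking its first n entries is one simple way to obtain the nesting property; any scheme that only ever appends new items at higher levels would work equally well.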
The total number of flips, the total game time, and the flip number for each pair were recorded and stored in the flexible JSON file format. For evaluation, the JSON files were imported into Excel. Statistical evaluation of the results was performed using the Excel Solver, including paired t-tests and ANOVA, followed by Tukey post-hoc analysis at the 0.05 significance level.

Forty subjects participated in the experiment: 20 males (age 18-43, mean 20.50 years; standard deviation (SD) 6.24) and 20 females (age 18-50, mean 27.85; SD 11.80). The subjects were seated in a quiet laboratory room and used a standard laptop computer with built-in speakers, which they controlled with a mouse. After the purpose of the experiment was explained, subjects played the game. They first played the visual game, starting with the smallest resolution (5 × 2) and progressing to higher resolutions (up to 6 × 8). Following a short break, the same procedure was repeated in the audio modality. Subjects were encouraged to minimize their error rate (number of flips) but could choose any gaming strategy and speed.

The game is currently not available to the general public, as further experiments are ongoing. However, after completing the laboratory measurements, both the current version of the game and an updated version with a crowdsourcing module will be published and made available for use.

3. RESULTS

The main focus of the evaluation is to detect differences based on gender, between the two modalities, and among the auditory samples, using completion time and flip numbers as metrics. In this section, results are first presented based on gender comparison, followed by comparisons of modality and resolution. Finally, specific findings for each resolution are presented. The next section discusses the findings.

3.1. Gender comparison

Tables 1 and 2 show mean and SD values for time and flips for both genders and modalities, based on gameplays at all resolutions combined. In visual mode, the difference in time cost between the genders was not significant (F=0.36; p=0.55), but the mean number of flips showed significantly better results (fewer flips) for females (F=7.73; p=0.006). However, no difference was observed in audio-only mode for either time (F=0.47; p=0.49) or flips (F=0.73; p=0.39).

Table 1. Summarized results over all resolutions of time costs (in seconds) and number of flips (mean and SD values) for each modality (males)

        Time (vision)   Flips (vision)   Time (audio)   Flips (audio)
Mean    128.01          102.18           232.73         92.63
SD      94.52           78.82            184.63         70.54

Table 2. Summarized results over all resolutions of time costs (in seconds) and number of flips (mean and SD values) for each modality (females)

        Time (vision)   Flips (vision)   Time (audio)   Flips (audio)
Mean    121.93          80.94            219.53         86.08
SD      96.71           65.50            182.08         74.90

3.2. Comparison of modalities

The time cost for visual gameplays was consistently and significantly lower than for audio mode, but this is attributed to the presentation method rather than to the cognitive functions of the subjects. Visual icons were revealed immediately after clicking on a card, whereas audio samples required 2-4 seconds each to play back. Thus, when comparing the modalities among males (Table 1), the mean completion time for visual stimuli (128.01 s) is significantly shorter than for audio (232.73 s) (F=45.89; p=5.16E-11). Interestingly, there was no significant difference in flip numbers (F=1.47; p=0.23). The same pattern holds for females (Table 2), where the difference between the mean times (121.93 s and 219.53 s) is significant (F=40.34; p=6.47E-10), but the difference in flip numbers is not (F=0.48; p=0.49).
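The F values reported above come from analysis of variance. With two groups (for example, male versus female flip counts), the statistic is the ratio of the between-group to the within-group mean square. The sketch below computes it with the Python standard library only. It is an illustration under stated assumptions: the function name `one_way_anova_f` and the sample numbers are hypothetical and are not the study's raw data, and the p-value, which requires the F distribution, is deliberately left out.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


if __name__ == "__main__":
    # Hypothetical flip counts for two small groups (not data from the experiment).
    males = [20, 24, 22, 26]
    females = [18, 20, 19, 23]
    print(one_way_anova_f([males, females]))
```

A large F indicates that the group means differ by more than the scatter within the groups would suggest; whether it is significant at the 0.05 level depends on the two degrees of freedom returned alongside it.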
3.3. Comparison depending on resolution

Figure 3 presents the results for all resolutions used and for both modalities. Mean time cost and flip values for males/females are collected and presented alongside the ANOVA results. “No” indicates a statistically insignificant difference between the means, while “yes” indicates a statistically significant difference between the genders. Lower values (less time, fewer flips) indicate better results. For instance, in the 5 × 2 resolution, the mean flip value in audio mode for males (21.50) appears higher than for females (19.90), but the difference is not significant (p=0.46). In contrast, the same comparison in visual mode shows significantly better results for females.

Using some of the data from Figure 3, we can rearrange the results to create Tables 3 and 4. Here, the time information is omitted, allowing for a comparison based solely on the mean flip numbers across all resolutions. These results support that there was no significant difference in flip number between the modalities, for either females or males, regardless of resolution. Only one of the 18 paired comparisons showed a slightly significant difference (Table 3): in the 6 × 6 resolution for males, where the mean flip number in audio mode (129.20) is better than in visual mode (156.55).

Figure 3. Summarized results for all resolutions (row × column) for gender comparison (male/female) based on time and flips

Table 3. Summarized results for modality comparison based on mean flip numbers in each resolution (males)

Resolution   Audio     Vision    ANOVA
5 × 2        21.50     21.90     F=0.06; p=0.80
3 × 4        25.70     24.50     F=0.53; p=0.47
4 × 4        39.80     43.30     F=1.49; p=0.23
4 × 5        56.50     62.00     F=1.46; p=0.19
4 × 6        71.10     78.00     F=1.20; p=0.28
6 × 5        108.50    110.50    F=0.04; p=0.84
6 × 6        129.20    156.55    F=5.36; p=0.03
6 × 7        177.60    192.40    F=0.90; p=0.35
6 × 8        205.00    230.50    F=1.45; p=0.24
Table 4. Summarized results for modality comparison based on mean flip numbers in each resolution (females)

Resolution   Audio     Vision    ANOVA
5 × 2        19.90     18.40     F=0.61; p=0.44
3 × 4        24.50     21.50     F=2.78; p=0.10
4 × 4        47.30     38.10     F=2.25; p=0.14
4 × 5        46.30     50.50     F=0.74; p=0.39
4 × 6        68.10     67.15     F=0.02; p=0.89
6 × 5        90.80     88.25     F=0.05; p=0.83
6 × 6        122.60    118.45    F=0.06; p=0.81
6 × 7        157.30    146.40    F=0.24; p=0.62
6 × 8        197.90    179.70    F=0.52; p=0.47

3.4. Results in each resolution

As expected, when comparing visual icons, there was no significant difference among the iconic representations in any of the resolutions, neither in time nor in flip numbers. However, flip numbers show that auditory samples may be recalled differently depending on the type and number of concurrent items (resolution). Findings will be discussed in section 4.3.

In the 5 × 2 resolution, there were no significant differences in time cost and flips between males and females for audio mode. In visual mode, time costs were the same, but females performed significantly better in flips. When comparing the five sound samples (combining female and male data) based on mean flip numbers, no differences were found among them.

At the 3 × 4 and 4 × 4 resolutions, there was no difference between the genders in either audio or visual mode, for both time cost and flips. Similarly, when comparing the six and eight sound samples, respectively, there were no differences among them.

At the 4 × 5 resolution, female subjects performed significantly better in audio mode for both time cost and flip number, while in visual mode, the difference was significant only for flip number. Additionally, there was a significant difference among the ten sound samples.

In the 4 × 6 resolution, there were no differences between the genders in either audio or visual mode, for both time cost and flips. However, there was a significant difference among the 12 sound samples.
Results for the highest resolutions (6 × 5, 6 × 6, 6 × 7, and 6 × 8) showed no difference between genders in audio mode for either time cost or flips. However, in visual mode, there was a significant difference in flip numbers, with females requiring fewer flips. When comparing the sound samples, significant differences were observed among them, except for 6 × 5, although this may also be considered an outlier.

4. DISCUSSION

This section analyzes and discusses the results from the previous section. The evaluation is based on gender, modality, type of stimuli, and memory capacity (resolution).

4.1. Gender

Comparison of genders can be made based on Table 1 and Table 2. In audio mode, there were no differences in time and flips. Interestingly, females performed better in visual mode regarding flip numbers, especially at higher resolutions. The only exception was 4 × 5, which we consider an outlier, as it is unlikely to be significantly different from 4 × 4 and 4 × 6. Early psychological studies did not aim to explore gender differences, and reviews suggest that neither sex can be said to have a better memory per se; rather, the two sexes differ in terms of what type of information they remember best. Variations in memory performance between men and women may be due to their physiological capabilities, their interest, their expectations, or some complex interaction of these factors [49]. A recent meta-analysis that aimed to quantify gender differences in verbal working memory showed that these differences varied across tasks [50]. Although it has been commonly held that males show an advantage on spatial tasks and females on verbal tasks, there is new evidence that gender differences are more widespread, that the female verbal advantage extends into numerous tasks, and that a small but significant advantage may exist for general episodic memory [51], [52].
Recognition-memory tests also revealed individual differences in visual episodic memory. In one experiment, females outperformed males on face recognition-memory tests, and this advantage was related to females’ scanning behavior [53]. Although in our experiment the icons in the game were spatially aligned and higher resolutions were larger in size than smaller ones, spatial attributes did not play a significant role. We speculate that the better results in the visual task may be attributed to the scanning and gaming strategies employed by females.

Regarding auditory memory, a recent study compared 30 young females and 30 males in a short-term memory test. Females performed better in the visual task, and visual memory was shown to be superior to auditory memory for both genders [54]. We can support the first observation, but we have found no difference between the modalities. A similar study also concluded that females perform better in visual tasks [55]. Another study targeting gender and age group differences in episodic memory involved a very large sample of 366 females and 330 males. Women outperformed men on auditory memory tasks, whereas male adolescents showed higher-level performance on visual episodic and visual working memory measures [56]. Our observations did not support these results; we speculate that the initial conditions of the tests play a significant role.

Former results partly support a declining performance on episodic memory and visual working memory measures with increasing age [56]. In our experiment, there was no evaluation based on the age of the subjects. All participants were relatively young, except for one outlier, a 50-year-old female, whose results in audio mode differed significantly from the means for both time and flips. Otherwise, we did not find outliers in the groups. Generally, at smaller resolutions, individual differences may be significant. Our previous experiment with this setup indicated that younger subjects produce better results [36]. However, in both experiments, the selection criteria were not suitable for a correct age comparison or for conclusive results. We suggest designing experiments specifically to test the effect of age, as it appears to be an important factor.
From an engineering point of view, gender does not appear to play a significant role in the design and development procedure of applications where episodic memory is important.

4.2. Modalities

The time cost for visual gameplays was always significantly lower than for audio mode, but this is due to the presentation method and not to the cognitive functions of the subjects. Visual icons are revealed immediately after clicking a card, whereas playback of audio samples takes several seconds each. Although it is not required to wait until the sound sample has finished, subjects usually waited until the end. To make a fair comparison, a delay would have to be inserted in visual mode to compensate for this timing difference. However, this kind of comparison would not be very meaningful. In fact, a parallel investigation that included a mixed mode (audio and visual combined) revealed that the completion time in this case lies between those of the audio-only and visual-only modes, as subjects take some time to reconsider the position of the visual icons during audio playback. Moreover, this combined audiovisual presentation seemed to decrease the mean number of flips as well.

Many previous experiments have shown visual memory to outperform auditory memory [31], [57]–[60]. The studies mentioned in the gender section also generally support this observation. However, some other papers have reported that there is no difference between them [61], [62]. Scores could even be better when processed through the auditory modality, such as for children [63], [64]. Comparing visual and auditory modalities in our experiment, there was no significant difference in flip numbers for males (F=1.47, p=0.23), and the same holds for females (F=0.48, p=0.49). Tables 3 and 4 corroborate this observation, with one exception: the 6 × 6 resolution for males showed a somewhat significant difference.
Our results indicate no significant difference between the visual and auditory modalities for flip numbers in this game, regardless of the number of pairs (ranging from 5 to 24) or gender. This finding is important from an engineering perspective, as application developers can reliably use audio information if short-term recall is important. Why, and under which parameters, audio achieves results as good as those with visual stimuli remains an open question, and further experiments should be carefully designed and conducted.

4.3. Sound comparison

Figure 2 introduced the sound samples used in the experiment, presented in the order of appearance with increasing levels. The first ten samples comprise measurement signals and male and female voice samples. Following these, Violin1 and Guitar1 are the first auditory icons, introduced at the 4 × 6 level. The sound of a “kiss” was added exclusively at the highest level, 6 × 8. Originally intended as an auditory icon, it turned out to be more akin to a “human sound”, more closely related to the voice samples.

Table 5 presents the summarized findings for all resolutions in a simplified form, indicating whether there was a significant difference among the sounds according to the mean flip numbers of the individual sound samples. The second column denotes the number of differences identified through all possible paired t-tests during the Tukey post-hoc analysis. The results indicate that up to 8 sound pairs, there was no discernible difference between the sound samples, including the male voice sample. However, the introduction of the female sample in the 4 × 5 resolution resulted in significantly better performance for this particular sample (observed 3 times). As additional samples, including different auditory icons, were introduced, some emerged as significantly better recalled than others. These include the female and male voice samples, the kiss sound, and in certain cases the toy train, the whistle (also closely resembling human sounds), and the phone ringing. Although no clear pattern emerged among the auditory icons, human sounds were generally favored and better recalled than other sounds. Notably, the 6 × 5 resolution exhibited no significant differences, but we suspect this may be an outlier.

Table 5. Results of the ANOVA and Tukey test showing how many times a paired t-test revealed a significant difference (fewer flips)

Significant difference   Differences in paired t-tests   Number of sound pairs   Resolution
No                       0                               5                       5 × 2
No                       0                               6                       3 × 4
No                       0                               8                       4 × 4
Yes                      3                               10                      4 × 5
Yes                      4                               12                      4 × 6
No                       0                               15                      6 × 5
Yes                      6                               18                      6 × 6
Yes                      1                               21                      6 × 7
Yes                      16                              24                      6 × 8

A former experiment incorporated two sets of visual icons and their auditory counterparts only in a 3 × 5 resolution [65]. Sound stimuli consisted of auditory icons and earcons. The results showed that the participants made faster and more correct matches between visual icons and auditory icons than between visual icons and earcons. We support former findings that familiar natural sounds are better recalled [65], [66]. In the case of auditory icons, the recall process may also depend on the task, and the amount of spectral–temporal structure in a sound can be indicative of memory performance [67].
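To put the counts in Table 5 into perspective, it helps to know how many pairwise comparisons are possible at each resolution: with k sound samples there are k(k-1)/2 distinct pairs. The short sketch below tabulates this. The names `SAMPLES_PER_RESOLUTION` and `pairwise_comparisons` are illustrative, the sample counts are taken from Table 5, and the sketch assumes that each count in the table refers to a distinct pair of samples.

```python
from math import comb

# Number of distinct sound samples (pairs of cards) per resolution, as listed in Table 5.
SAMPLES_PER_RESOLUTION = {
    "5x2": 5, "3x4": 6, "4x4": 8, "4x5": 10, "4x6": 12,
    "6x5": 15, "6x6": 18, "6x7": 21, "6x8": 24,
}


def pairwise_comparisons(k):
    """Number of distinct unordered pairs among k samples: k * (k - 1) / 2."""
    return comb(k, 2)


for resolution, k in SAMPLES_PER_RESOLUTION.items():
    print(f"{resolution}: {k} samples -> {pairwise_comparisons(k)} possible pairs")
```

At the 6 × 8 level, for instance, 24 samples give 276 possible pairs, of which Table 5 reports 16 as significantly different.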
Standardized measurement signals allow for easy comparison across repeated experiments. On the other hand, auditory icons and earcons can vary significantly, even when conveying the same semantic meaning (e.g., guitar, phone ringing). This variability may result in greater differences in results when using different sound samples. Speech and human sound samples represent an intermediate solution. Generally, our findings support the idea of using iconic human sound samples and auditory icons, as they are better recalled than unfamiliar and unpleasant artificial measurement signals.

Furthermore, our results indicate that there are no significant differences even between similar sounds, such as pink noise and white noise, 1 kHz sine and 1 kHz square, or 1 kHz sine and 5 kHz sine. Although some subjects reported confusion with these sounds during informal feedback after the experiment, statistical analysis did not confirm this. As mentioned previously, no difference was observed in the visual mode, as the iconic representations were intentionally designed to be similar, for example by avoiding the use of colors or different sizes.

From an engineering perspective, even short-term recall of iconic auditory events can be improved by using human-related and familiar everyday sound samples. Artificial sounds can be employed when necessary, such as for alarm sounds, neutral notifications, or when meaningful sounds might cause confusion.

4.4. Memory capacity and limitations

Short-term memory capacity has been extensively studied, particularly in psychology, neurology, and cognitive sciences, with a primary focus on visual and/or speech memory. In visual scenarios, the recall capacity was found to be influenced by the complexity of items, with simpler objects being easier to remember [68].
It was also suggested that the limited capacity of short-term memory could be a consequence of efficiency of design, with an effective upper limit of about 5 to 9 items [69]. Our results align with these, as error rates and differences among the auditory icons increased after resolution 4 × 5 (10 pairs). Informal feedback from the subjects also supported this finding, as they reported that the game was relatively easy with 5-8 pairs in both modalities. The game includes a built-in reward system to motivate players. If a player completes a game without any errors, they receive a “perfect game” feedback. Only at the lowest resolutions (up to 8 pairs) were players able to achieve this.
Although some previous studies suggested a precise capacity limit of three to five chunks, a review article presented a range of data on different capacity limits. It was proposed that a more accurate limit might be around four chunks [27], [32], [70]. Our results suggest a higher number, around 8. For auditory events, fewer results are available. An overview was presented on how auditory memory functions, with a focus on how attention influences outcomes [26].

In engineering, audiovisual memory capacity plays an important role. Our results suggest that both auditory and visual representations can be effectively recalled in the short term for up to 8-10 items. In addition, training working memory has been found to generally enhance its capacity [71]. This highlights the importance of experience and a priori training. Further investigations could focus on the effects of such training.

5. CONCLUSION

Forty subjects participated in a gamified experiment focusing on short-term audiovisual memory. Subjects played a familiar memory game in both visual-only and audio-only modes, incorporating iconic visual and auditory representations in nine different resolutions ranging from 5 × 2 to 6 × 8. Results indicated no significant difference between the visual and auditory modalities based on the number of flips. The superiority of visual presentation in completion time was due to the presentation method. During visual presentation, the mean flip number of female subjects was lower than that of male subjects only when the number of pairs exceeded 15 (6 × 5). There was no difference in the audio mode. Gender did not appear to be a significant parameter. Measurement signals, human sounds, and auditory icons were examined based on mean time cost and flip numbers. Evaluation of the sound samples indicated that human sounds can be recalled the best, followed by auditory icons.
This supports former findings about the importance of familiarity and semantic content of iconic sound samples when designing auditory displays and feedback solutions (e.g., for assistive technology, augmented reality/virtual reality (AR/VR) environments, and simulators). The results can be sensitive to initial parameters such as the age of the participants, the duration of the experiment (including the effects of training and fatigue), and the selection criteria of auditory icons. Future work will address open questions about the significance of the subjects’ age, the impact of experience, and the usability of crowdsourcing solutions for big data evaluation.

FUNDING INFORMATION

Authors state no funding involved.

AUTHOR CONTRIBUTIONS STATEMENT

Name of Author    C  M  So  Va  Fo  I  R  D  O  E  Vi  Su  P  Fu
György Wersényi
Ádám Csapó
József Tollár

C: Conceptualization
M: Methodology
So: Software
Va: Validation
Fo: Formal Analysis
I: Investigation
R: Resources
D: Data Curation
O: Writing - Original Draft
E: Writing - Review & Editing
Vi: Visualization
Su: Supervision
P: Project Administration
Fu: Funding Acquisition

CONFLICT OF INTEREST STATEMENT

Authors state no conflict of interest.

DATA AVAILABILITY

The data that support the findings of this study are available from the corresponding author, Gy.W., upon reasonable request.
REFERENCES

[1] F. Bellotti, B. Kapralos, K. Lee, P. Moreno-Ger, and R. Berta, “Assessment in and of serious games: an overview,” Advances in Human-Computer Interaction, vol. 2013, p. 1, 2013, doi: 10.1155/2013/136864.
[2] J. B. Hauge et al., “Serious game mechanics and opportunities for reuse,” in 11th International Conference eLearning and Software for Education, Apr. 2015, vol. 2, pp. 19–27, doi: 10.12753/2066-026X-15-094.
[3] A. Dimitriadou, N. Djafarova, O. Turetken, M. Verkuyl, and A. Ferworn, “Challenges in serious game design and development: educators’ experiences,” Simulation and Gaming, vol. 52, no. 2, pp. 132–152, 2021, doi: 10.1177/1046878120944197.
[4] A. C. T. Klock, I. Gasparini, M. S. Pimenta, and J. Hamari, “Tailored gamification: a review of literature,” International Journal of Human-Computer Studies, vol. 144, p. 102495, 2020.
[5] A. Rapp, F. Hopfgartner, J. Hamari, C. Linehan, and F. Cena, “Strengthening gamification studies: current trends and future opportunities of gamification research,” International Journal of Human-Computer Studies, vol. 127, pp. 1–6, 2019, doi: 10.1016/j.ijhcs.2018.11.007.
[6] D. Djaouti, J. Alvarez, and J.-P. Jessel, “Classifying serious games: the G/P/S model,” in Handbook of research on improving learning and motivation through educational games: Multidisciplinary approaches, IGI Global, 2011, pp. 118–136.
[7] S. Deterding, S. L. Björk, L. E. Nacke, D. Dixon, and E. Lawley, “Designing gamification,” in CHI ’13 Extended Abstracts on Human Factors in Computing Systems, Apr. 2013, vol. 2013-April, pp. 3263–3266, doi: 10.1145/2468356.2479662.
[8] N. Cowan, “What are the differences between long-term, short-term, and working memory?,” Progress in Brain Research, vol. 169, pp. 323–338, 2008, doi: 10.1016/S0079-6123(07)00020-9.
[9] D. Norris, “Short-term memory and long-term memory are still different,” Psychological Bulletin, vol. 143, no. 9, pp. 992–1009, 2017, doi: 10.1037/bul0000108.
[10] D. Burr and D. Alais, “Chapter 14 combining visual and auditory information,” Progress in Brain Research, vol. 155 B, pp. 243–258, 2006, doi: 10.1016/S0079-6123(06)55014-9.
[11] M. Kubovy and D. Van Valkenburg, “Auditory and visual objects,” Cognition, vol. 80, no. 1–2, pp. 97–126, Jun. 2001, doi: 10.1016/S0010-0277(00)00155-4.
[12] W. J. Chai, A. I. Abd Hamid, and J. M. Abdullah, “Working memory from the psychological and neurosciences perspectives: a review,” Frontiers in Psychology, vol. 9, no. MAR, p. 401, 2018, doi: 10.3389/fpsyg.2018.00401.
[13] P. Kelley, M. D. R. Evans, and J. Kelley, “Making memories: why time matters,” Frontiers in Human Neuroscience, vol. 12, p. 400, 2018, doi: 10.3389/fnhum.2018.00400.
[14] M. C. Potter, “Very short-term conceptual memory,” Memory & Cognition, vol. 21, no. 2, pp. 156–161, 1993, doi: 10.3758/BF03202727.
[15] S. J. Luck and E. K. Vogel, “The capacity of visual working memory for features and conjunctions,” Nature, vol. 390, no. 6657, pp. 279–284, 1997, doi: 10.1038/36846.
[16] G. Alvarez and P. Cavanagh, “The capacity of visual short-term memory is set by total informational load, not number of objects,” Journal of Vision, vol. 2, no. 7, pp. 106–111, 2002, doi: 10.1167/2.7.273.
[17] T. F. Brady and G. A. Alvarez, “No evidence for a fixed object limit in working memory: spatial ensemble representations inflate estimates of working memory capacity for complex objects,” Journal of Experimental Psychology: Learning Memory and Cognition, vol. 41, no. 3, pp. 921–929, 2015, doi: 10.1037/xlm0000075.
[18] K. O. Hardman and N. Cowan, “Remembering complex objects in visual working memory: do capacity limits restrict objects or features?,” Journal of Experimental Psychology: Learning Memory and Cognition, vol. 41, no. 2, pp. 325–347, 2015, doi: 10.1037/xlm0000031.
[19] K. Fukuda, E. Awh, and E. K. Vogel, “Discrete capacity limits in visual working memory,” Current Opinion in Neurobiology, vol. 20, no. 2, pp. 177–182, 2010, doi: 10.1016/j.conb.2010.03.005.
[20] M. W. Schurgin, “Visual memory, the long and the short of it: a review of visual working memory and long-term memory,” Attention, Perception, and Psychophysics, vol. 80, no. 5, pp. 1035–1056, 2018, doi: 10.3758/s13414-018-1522-y.
[21] P. Wilken and W. J. Ma, “A detection theory account of change detection,” Journal of Vision, vol. 4, no. 12, pp. 1120–1135, 2004, doi: 10.1167/4.12.11.
[22] T. F. Brady, T. Konkle, and G. A. Alvarez, “A review of visual memory capacity: beyond individual items and toward structured representations,” Journal of Vision, vol. 11, no. 5, pp. 1–34, 2011, doi: 10.1167/11.5.1.
[23] S. McAdams and E. Bigand, Thinking in sound: the cognitive psychology of human audition. Oxford University Press, 1993.
[24] W. Ritter, D. Deacon, H. Gomes, D. C. Javitt, and H. G. Vaughan, “The mismatch negativity of event-related potentials as a probe of transient auditory memory: a review,” Ear and Hearing, vol. 16, no. 1, pp. 52–67, 1995, doi: 10.1097/00003446-199502000-00005.
[25] J. Kaiser, “Dynamics of auditory working memory,” Frontiers in Psychology, vol. 6, no. May, p. 613, 2015, doi: 10.3389/fpsyg.2015.00613.
[26] J. F. Zimmermann, M. Moscovitch, and C. Alain, “Attending to auditory memory,” Brain Research, vol. 1640, pp. 208–221, 2016, doi: 10.1016/j.brainres.2015.11.032.
[27] N. Cowan, “The magical number 4 in short-term memory: a reconsideration of mental storage capacity,” Behavioral and Brain Sciences, vol. 24, no. 1, pp. 87–114, 2001, doi: 10.1017/S0140525X01003922.
[28] D. L. Nelson, V. S. Reed, and J. R. Walling, “Pictorial superiority effect,” Journal of Experimental Psychology: Human Learning and Memory, vol. 2, no. 5, pp. 523–528, 1976, doi: 10.1037/0278-7393.2.5.523.
[29] K. C. Backer and C. Alain, “Attention to memory: orienting attention to sound object representations,” Psychological Research, vol. 78, no. 3, pp. 439–452, 2014, doi: 10.1007/s00426-013-0531-7.
[30] J. L. Burt, D. S. Bartolome, D. W. Burdette, and J. R. Comstock Jr, “A psychophysiological evaluation of the perceived urgency of auditory warning signals,” Ergonomics, vol. 38, no. 11, pp. 2327–2340, 1995.
[31] M. A. Cohen, T. S. Horowitz, and J. M. Wolfe, “Auditory recognition memory is inferior to visual recognition memory,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 14, pp. 6008–6010, 2009, doi: 10.1073/pnas.0811884106.
[32] N. Cowan, “Visual and auditory working memory capacity,” Trends in Cognitive Sciences, vol. 2, no. 3, pp. 77–78, 1998, doi: 10.1016/S1364-6613(98)01144-9.