Indonesian Journal of Electrical Engineering and Computer Science
Vol. 39, No. 1, July 2025, pp. 310∼321
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v39.i1.pp310-321
Short-term recall comparison of iconic auditory and visual feedback stimuli in a memory game
György Wersényi1, Ádám Csapó2,3, József Tollár4,5
1Department of Telecommunications, Széchenyi István University (SZE), Győr, Hungary
2Institute for Advanced Studies, Corvinus University of Budapest, Budapest, Hungary
3Institute of Data Analytics and Information Systems, Corvinus University of Budapest, Budapest, Hungary
4Digital Development Center, Széchenyi István University (SZE), Győr, Hungary
5Somogy County Kaposi Mór Teaching Hospital, Kaposvár, Hungary
Article Info

Article history:
Received May 28, 2024
Revised Nov 6, 2024
Accepted Mar 25, 2025

Keywords:
Audiovisual memory
Auditory icon
Human-computer interaction
Serious gaming
Sound design
ABSTRACT

Multimedia user interfaces incorporate various feedback methods using different modalities. Cognitive processing of audiovisual information requires the ability to recall visual and auditory information, either separately or in combination. Short-term memory capabilities vary individually and depend on factors such as signal presentation and the number and type of visual and auditory items. In an experiment involving 40 subjects, we aimed to compare short-term auditory and visual capabilities in a serious game application. Subjects played the 'Pairs' game at different resolutions, using either visual icons or audio samples, while the total time cost and number of flips were recorded. The results indicate that visual memory is not superior, and female subjects performed better than males at higher levels in the visual task. Additionally, human sound samples, speech, and familiar auditory icons were found to be easier to recall than artificial measurement signals.

This is an open access article under the CC BY-SA license.
Corresponding Author:
György Wersényi
Department of Telecommunications, Széchenyi István University
Győr, H-9026, Hungary
Email: wersenyi@sze.hu
1. INTRODUCTION
Augmented and virtual reality solutions, assistive technology applications, virtual audio displays (VAD), games, and simulators are just some of the emerging fields where feedback is based on audiovisual information. Users often need to recall the visual and/or auditory representations of specific events on the screen and recall their meaning, and sometimes even their spatial location. Usability of the multimedia interface varies depending on the number of events, user experience, and cognitive capabilities. It is essential to remember the meaning behind a given representation. This cognitive process involves the utilization of both visual and auditory memory in the brain, both in the long term and the short term.

Early experiments in psychology did not incorporate computer-based methods. Developments in technology later allowed for using computers both for experimenting and for data collection and evaluation. In addition, computer games evolved and introduced a variety of audiovisual information for entertainment purposes. Recently, the need for combining entertainment and experimental data collection involving human subjects emerged. Serious gaming, or gamification, is a method used to collect scientific data through a gaming scenario. A well-designed game can enhance the user experience, maintain and increase motivation, while also allowing for the analysis of results with scientific merit. Using gamification, scientific experiments can be designed and executed to collect data in an entertaining and motivating process for any age or gender group [1]–[6].
Subjects have a limited capacity to recall information, and working memory plays a key role in this process. The terms "working memory" and "short-term memory" are often used interchangeably [7]–[11]. They both refer to immediate conscious perceptual and linguistic processing for a limited amount of information and time. During this active process, temporarily stored audio and/or visual information can be accessed and manipulated. The storage time for short-term memory is generally around 20-30 seconds or even less [12]–[14]. Long-term memory differs from short-term memory primarily in terms of duration but also in capacity [8].
The most important property of working memory is its limited capacity. It has been demonstrated that visual working memory can store 3-4 objects [15]–[20]. However, a larger number of objects can also be recalled with varying precision, and there are individual differences and large variability in repeated measurements [21], [22]. In the case of auditory memory, most studies have focused on short-term effects; however, comparisons with long-term effects have also been made [23]–[26]. Capacity limits here were also suggested to be around "seven plus or minus two" [27].
The results contrasting the abilities of the audio and visual modalities have not been conclusive. Most studies have shown superior visual performance [28]–[34]. However, some experiments have found similar memory performance [35], [36]. Variability in former results and outcomes could be attributed to the sensitivity of the experiments to initial parameters. Auditory information can also be presented alongside visual information in a mixed mode. Memory performance has been demonstrated to be better for semantically congruent stimuli presented together in different modalities compared to stimuli presented with an incongruent or non-semantic stimulus across modalities [37]–[41]. Semantically congruent verbal and non-verbal visual stimuli presented in tandem with auditory counterparts can enhance the precision of auditory encoding. Semantically congruent presentation, where the iconic representation is easily linked to its meaning, generally aids in this process. Better performance can be achieved with meaningful stimuli and cognitive training [42]–[46]. In particular, human sounds were shown to be detected better, especially in the case of speech and human-generated vocal sounds [47], [48].
Although most previous works suggest otherwise, there is no evident consensus on the superiority of visual memory, especially in short-term recall tasks. In the case of visually impaired individuals, the processing of auditory information can be even more enhanced. They are the most important target group in the development of assistive technology, where auditory memory plays an even more significant role. Furthermore, sound design and sonification approaches constantly deal with the problem of the proper selection and optimization of auditory events for feedback. The results can be very sensitive to the age, gender, or experience of the subjects; thus, a larger number of participants is required. This number should generally exceed 30, a requirement that is seldom met. Exhaustive laboratory procedures can be demanding, especially for the subjects; therefore, a gamification approach with a familiar game design can enhance the reliability of the data. An application with the possibility to set the number of items to be recalled from "very easy" to "very difficult" can also highlight the limitations in capacity, and determine whether there is a trade-off limit in cognitive processing.
The purpose of our experiment is to test differences between modalities, genders, limits, and types of stimuli in a short-term recall task. This paper presents an experiment involving untrained subjects using a serious game application based on the "Pairs" memory game in both visual and auditory modes, across various resolutions. Section 2 describes the measurement setup, including the software implementation, the experimental procedure, and the data evaluation methods. Section 3 presents results based on statistical analysis. Outcomes are discussed in section 4, followed by the final conclusions.
2. MEASUREMENT SETUP
First, the software environment, including the game and the data collection module, was designed, programmed, and tested. Following this, the measurement procedure (data collection and evaluation) and the applied methods were determined. Finally, the recruitment of subjects and the laboratory setup were completed.

The memory game "Pairs" was selected for the experiment. In this game, players flip cards to match pairs. The familiar and simple gameplay, as well as the easy implementation of different modalities (audio and/or visual), were the most important factors in the decision. Furthermore, this type of game engages the players' short-term memory.
The GUI is simply organized. Figure 1 shows two screenshots of the game. Upon initialization, the user or the experimenter enters user-relevant data (ID, gender, and age) and selects the modality and resolution (number of pairs). Each level with a higher resolution includes all pairs from the previous level; for example, all 5 pairs in the 5×2 resolution are included in all subsequent resolutions. In the visual mode, black-and-white icons were displayed, while in the audio mode, short, iconic sound samples were played back. Figure 2 illustrates all the available icons and their corresponding auditory events. The icons were designed to represent the semantic meaning of the sound samples while keeping them very simple. Auditory samples were downloaded from public databases or recorded, and then modified (e.g., adjusting sound levels, cutting, and shortening). These samples were selected to represent different sound types, such as human-related sounds, everyday sounds, and meaningless sound events (acoustic measurement signals). Upon starting the game, icons or audio samples are randomized. In both modalities, the corresponding visual icon is revealed after successfully matching a pair. If there are 10 seconds of inactivity, the game is aborted without saving the data. A more detailed description of the coding procedure can be found in [36].
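As an illustration of this flow, the following Python sketch mirrors the round logic described above (randomized deck, per-pair flip counting, 10-second inactivity abort, and JSON logging). It is not the implementation from [36]; all names and the player-input callback are hypothetical.

import json
import random
import time

def play_pairs(items, get_choice, inactivity_limit=10.0):
    # Play one game of Pairs; `items` are icon/sample IDs, and `get_choice`
    # asks the player for a card index, returning None on inactivity timeout.
    deck = items * 2                        # every icon/sample appears twice
    random.shuffle(deck)                    # randomized upon starting the game
    matched = set()
    flips_per_pair = {item: 0 for item in items}
    total_flips = 0
    start = time.time()
    while len(matched) < len(items):
        picks = []
        for _ in range(2):                  # one turn reveals two cards
            choice = get_choice(deck, matched, timeout=inactivity_limit)
            if choice is None:              # 10 s of inactivity: abort, no data saved
                return None
            total_flips += 1
            flips_per_pair[deck[choice]] += 1
            picks.append(choice)
        if deck[picks[0]] == deck[picks[1]]:
            matched.add(deck[picks[0]])     # a matched pair stays revealed
    return {"total_time_s": round(time.time() - start, 2),
            "total_flips": total_flips,
            "flips_per_pair": flips_per_pair}

# Hypothetical usage with some UI callback:
# log = play_pairs(["dog", "phone", "violin"], get_choice=ui_callback)
# if log is not None:
#     json.dump(log, open("game_log.json", "w"))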
Figure 1. Screenshots of the game: initial screen (left) and an ongoing game in 4×4 resolution

Figure 2. All visual icons and the corresponding auditory samples in the highest resolution (6×8). Green indicates "artificial measurement signals", yellow represents "human sounds", and white signifies "auditory icons or earcons"
The total number of flips, the total game time, and the flip number for each pair were recorded and stored using the flexible JSON file format. For evaluation of the results, the JSON files were imported into Excel. Statistical evaluation of the results was performed using the Excel Solver, including paired t-tests and ANOVA, followed by Tukey post-hoc analysis at the 0.05 significance level.
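The same tests can be reproduced outside Excel. The sketch below shows one way to load the JSON logs and run the one-way ANOVA and paired t-test with SciPy; the folder layout and field names are assumptions for illustration, not the authors' actual file schema.

import json
from pathlib import Path
from scipy import stats

def load_metric(folder, metric="total_flips"):
    # Collect one metric (e.g., total flips or total time) from all game logs
    return [json.loads(p.read_text())[metric] for p in Path(folder).glob("*.json")]

visual = load_metric("logs/visual")    # one value per subject, visual mode
audio = load_metric("logs/audio")      # same subjects, audio-only mode

# One-way ANOVA between the two modality groups, as reported in section 3
f_stat, p_anova = stats.f_oneway(visual, audio)

# Paired t-test: the same subjects played both modalities
t_stat, p_paired = stats.ttest_rel(visual, audio)

alpha = 0.05                           # significance level used in the paper
print(f"ANOVA: F={f_stat:.2f}, p={p_anova:.3g}, significant: {p_anova < alpha}")
print(f"Paired t-test: t={t_stat:.2f}, p={p_paired:.3g}")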
In the experiment, 40 subjects participated: 20 males (age 18-43, mean 20.50 years; standard deviation (SD) 6.24) and 20 females (age 18-50, mean 27.85; SD 11.80). The subjects were seated in a quiet laboratory room and used a standard laptop computer with built-in speakers, which they controlled with a mouse. After explaining the purpose of the experiment, subjects engaged in playing the game. During the process, subjects first played the visual game, starting with the smallest resolution (5×2), and then progressed to higher resolutions (up to 6×8). Following a short break, the same procedure was repeated in the audio modality. Subjects were encouraged to minimize their error rate (number of flips) but could choose any gaming strategy and speed.

The game is currently not available to the general public, as further experiments are ongoing. However, after completing the laboratory measurements, both the current version of the game and an updated version with a crowdsourcing module will be published and made available for use.
3. RESULTS
The main focus of the evaluation is to detect differences based on gender, between the two modalities, and among the auditory samples, using completion time and flip numbers as metrics. In this section, results are first presented based on gender comparison, followed by comparisons of modality and resolution. Finally, specific findings for each resolution are presented. The next section discusses the findings.
3.1. Gender comparison
Tables 1 and 2 show mean and SD values of time and flips for both genders and modalities, based on gameplays at all resolutions combined. In visual mode, the difference in time cost between the genders was not significant (F=0.36; p=0.55), but the mean number of flips showed significantly better results (fewer flips) for females (F=7.73; p=0.006). However, there was no difference observed for either time or flips in audio-only mode (F=0.47; p=0.49 and F=0.73; p=0.39, respectively).
Table 1. Summarized results over all resolutions of time costs (in seconds) and number of flips (mean and SD values) for each modality (males)

Modality   Time (Vision)   Flips (Vision)   Time (Audio)   Flips (Audio)
Mean       128.01          102.18           232.73         92.63
SD         94.52           78.82            184.63         70.54

Table 2. Summarized results over all resolutions of time costs (in seconds) and number of flips (mean and SD values) for each modality (females)

Modality   Time (Vision)   Flips (Vision)   Time (Audio)   Flips (Audio)
Mean       121.93          80.94            219.53         86.08
SD         96.71           65.50            182.08         74.90
3.2. Comparison of modalities
The time cost for visual gameplays was consistently and significantly lower than for audio mode, but this is attributed to the presentation method rather than the cognitive functions of the subjects. Visual icons were revealed immediately after clicking on a card, whereas audio samples required 2-4 seconds each to play back. Thus, when comparing the modalities among males (Table 1), the mean completion time for visual stimuli (128.01 s) is significantly faster than for audio (232.73 s) (F=45.89; p=5.16E-11). Interestingly, there was no significant difference in flip numbers (F=1.47; p=0.23). The same pattern holds for females (Table 2), where the difference between the mean times (121.93 s and 219.53 s) is significant (F=40.34; p=6.47E-10), but the difference in flip numbers is not (F=0.48; p=0.49).
3.3. Comparison depending on resolution
Figure 3 presents the results for all resolutions used and for both modalities. Mean time cost and flip values for males/females are collected and presented alongside the ANOVA results. "No" indicates a statistically insignificant difference between the means, while "yes" indicates a statistically significant difference between the genders. Lower values (less time, fewer flips) indicate better results. For instance, in the 5×2 resolution, the mean flip value in audio mode for males (21.50) appears higher than for females (19.90), but the difference is not significant (p=0.46). In contrast, the same evaluation in visual mode shows better results for females.

Using some of the data from Figure 3, we can rearrange the results to create Tables 3 and 4. Here, the time information is omitted, allowing for a comparison based solely on the mean flip numbers across all resolutions. These results support that there was no significant difference in flip number between the modalities, neither for females nor for males, regardless of resolution. Only one of the 18 paired comparisons showed a slightly significant difference (Table 3): the 6×6 resolution for males, where the mean flip number in audio mode (129.20) is better than in visual mode (156.55).
Figure 3. Summarized results for all resolutions (row × column) for gender comparison (male/female) based on time and flips
Table 3. Summarized results for modality comparison based on mean flip numbers in each resolution (males)

Resolution   Audio    Vision   ANOVA
5×2          21.50    21.90    F=0.06; p=0.80
3×4          25.70    24.50    F=0.53; p=0.47
4×4          39.80    43.30    F=1.49; p=0.23
4×5          56.50    62.00    F=1.46; p=0.19
4×6          71.10    78.00    F=1.20; p=0.28
6×5          108.50   110.50   F=0.04; p=0.84
6×6          129.20   156.55   F=5.36; p=0.03
6×7          177.60   192.40   F=0.90; p=0.35
6×8          205.00   230.50   F=1.45; p=0.24
Table 4. Summarized results for modality comparison based on mean flip numbers in each resolution (females)

Resolution   Audio    Vision   ANOVA
5×2          19.90    18.40    F=0.61; p=0.44
3×4          24.50    21.50    F=2.78; p=0.10
4×4          47.30    38.10    F=2.25; p=0.14
4×5          46.30    50.50    F=0.74; p=0.39
4×6          68.10    67.15    F=0.02; p=0.89
6×5          90.80    88.25    F=0.05; p=0.83
6×6          122.60   118.45   F=0.06; p=0.81
6×7          157.30   146.40   F=0.24; p=0.62
6×8          197.90   179.70   F=0.52; p=0.47
3.4. Results in each resolution
As expected, when comparing visual icons, there was no significant difference in any of the resolutions among the iconic representations, neither in time nor in flip numbers. However, flip numbers show that auditory samples may be recalled differently depending on the type and number of concurrent items (resolution). The findings are discussed in section 4.3.

In the 5×2 resolution, there were no significant differences in time cost and flips between males and females for audio mode. In visual mode, time costs were the same, but females performed significantly better in flips. When comparing the five sound samples (combining female and male data) based on mean flip numbers, no differences were found among them.

At the 3×4 and 4×4 resolutions, there was no difference between the genders in either audio or visual mode, for both time cost and flips. Similarly, when comparing the six and eight sound samples, respectively, there were no differences among them.

At the 4×5 resolution, female subjects performed significantly better in audio mode for both time cost and flip number, while in visual mode, the difference was significant only for flip number. Additionally, there was a significant difference among the ten sound samples.

In the 4×6 resolution, there were no differences between the genders in either audio or visual mode, for both time cost and flips. However, there was a significant difference among the 12 sound samples.

Results for the highest resolutions (6×5, 6×6, 6×7, and 6×8) showed no difference between genders in audio mode for either time cost or flips. However, in vision mode, there was a significant difference in flip numbers, with females requiring fewer flips. When comparing the sound samples, significant differences were observed among them, except for 6×5, although this may also be considered an outlier.
4. DISCUSSION
This section analyzes and discusses the results from the previous section. The evaluation is based on gender, modality, type of stimuli, and memory capacity (resolution).
4.1. Gender
Comparison of the genders can be made based on Tables 1 and 2. In audio mode, there were no differences in time and flips. Interestingly, females performed better in visual mode regarding flip numbers, especially at higher resolutions. The only exception was 4×5, which we consider an outlier, as it is unlikely to be significantly different from 4×4 and 4×6. Early psychological studies did not aim to explore gender differences, and reviews suggest that neither sex can be said to have a better memory per se; rather, the two sexes differ in terms of what type of information they remember best. Variations in memory performance between men and women may be due to their physiological capabilities, their interest, their expectations, or some complex interaction of these factors [49]. A recent meta-analysis that aimed to quantify gender differences in verbal working memory showed that these differences varied across tasks [50]. Although it has been commonly held that males show an advantage on spatial tasks and females on verbal tasks, there is new evidence that gender differences are more widespread: the female verbal advantage extends into numerous tasks, and a small but significant advantage may exist for general episodic memory [51], [52]. Recognition-memory tests have also revealed individual differences in visual episodic memory. In one experiment, females outperformed males on face recognition-memory tests, and this advantage was related to females' scanning behavior [53]. Although in our experiment the icons in the game were spatially aligned and higher resolutions were larger in size than smaller ones, spatial attributes did not play a significant role. We speculate that the better results in the visual task may be attributed to the scanning and gaming strategies employed by females.
Regarding auditory memory, a recent study compared 30 young females and 30 males in a short-term memory test. Females performed better in the visual task, and visual memory was shown to be superior to auditory memory for both genders [54]. We can support the first observation, but we have found no difference between the modalities. A similar study also concluded that females perform better in the visual task [55]. Another study targeting gender and age group differences in episodic memory involved a very large sample of 366 females and 330 males. Women outperformed men on auditory memory tasks, whereas male adolescents showed higher-level performance on visual episodic and visual working memory measures [56]. As our observations did not support these results, we can still speculate that the initial conditions of the tests play a significant role.
Former results partly support a declining performance on episodic memory and visual working memory measures with increasing age [56]. In our experiment, there was no evaluation based on the age of the subjects. All participants were relatively young, except for one outlier, a 50-year-old female, whose results in audio mode significantly differed from the means both for time and flips. Otherwise, we did not find outliers in the groups. Generally, at smaller resolutions, individual differences may be significant. Our previous experiment with this setup indicated that younger subjects produce better results [36]. However, in both experiments, the selection criteria were not suitable for a correct age comparison or for conclusive results. It is suggested to design experiments specifically to test the effect of age, as it appears to be an important factor. From an engineering point of view, gender does not appear to play a significant role in the design and development procedure of applications where episodic memory is important.
4.2. Modalities
The time cost for visual gameplays was always significantly lower than for audio mode, but this is due to the presentation method and not the cognitive functions of the subjects. Visual icons are revealed immediately after clicking a card, whereas playback of audio samples takes several seconds each. Although it is not required to wait until the sound sample is finished, subjects usually waited until the end. To make a correct comparison, a delay should be inserted in visual mode to correct for timing irregularities. However, this kind of comparison would not be very meaningful. In fact, a parallel investigation that included a mixed mode (audio and visual combined) revealed that the completion time in this case lies between the audio-only and visual-only modes, as subjects take some time to reconsider the position of the visual icons during audio playback. Moreover, this combined audiovisual presentation seemed to decrease the mean number of flips as well.

Many previous experiments have shown visual memory to outperform auditory memory [31], [57]–[60]. Also, the studies mentioned in the gender section generally support this observation. However, some other papers have reported that there is no difference between them [61], [62]. Scores could even be better when processed through the auditory modality, such as for children [63], [64].
Comparing the visual and auditory modalities in our experiment, there was no significant difference in flip numbers for males (F=1.47, p=0.23), and the same holds for females, with the difference also not being significant (F=0.48, p=0.49). Tables 3 and 4 corroborate this observation, with one exception: the 6×6 resolution for males showed a somewhat significant difference.

Our results indicate no significant difference between the visual and auditory modalities for flip numbers in this game, regardless of the number of items (ranging from 10 to 24) or gender. This finding is important from an engineering perspective, as application developers can reliably use audio information if short-term recall is important. The reason and parameters for achieving results with audio that are as good as those with visual stimuli remain an open question, and further experiments should be carefully designed and conducted.
4.3. Sound comparison
Figure 2 introduced the sound samples used in the experiment, presented in the order of appearance with increasing levels. The first ten samples comprise measurement signals and male and female voice samples. Following these, Violin1 and Guitar1 are the first auditory icons, introduced at the 4×6 level. Subsequently, the sound of a "kiss" was added exclusively at the highest level, 6×8. Originally intended as an auditory icon, it was discovered to be more akin to a "human sound," more closely related to the voice samples.

Table 5 presents the summarized findings for all resolutions in a simplified form, indicating whether there was a significant difference among the sounds according to the mean flip numbers of the individual sound samples. The second column denotes the number of differences identified through all possible paired t-tests during the Tukey post-hoc analysis. The results indicate that up to 8 sound pairs, there was no discernible difference between the sound samples, including the male voice sample. However, the introduction of the female voice sample in the 4×5 resolution resulted in significantly better performance for this particular sample (observed 3 times). As additional samples, including different auditory icons, were introduced, some emerged as significantly better recalled than others. These include the female and male voice samples, the kiss sound, and, in certain cases, the toy train, whistle (also closely resembling human sounds), and phone ringing. Although no clear pattern emerged among the auditory icons, human sounds were generally favored and better recalled than other sounds. Notably, the 6×5 resolution exhibited no significant differences, but we suspect this may be an outlier.
Table 5. Results of the ANOVA and Tukey test showing how many times a paired t-test revealed a significant difference (fewer flips)

Significant difference   Differences in paired t-tests   Number of sound pairs   Resolution
No                       0                               5                       5×2
No                       0                               6                       3×4
No                       0                               8                       4×4
Yes                      3                               10                      4×5
Yes                      4                               12                      4×6
No                       0                               15                      6×5
Yes                      6                               18                      6×6
Yes                      1                               21                      6×7
Yes                      16                              24                      6×8
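For reference, the pairwise counting in the second column can be reproduced with standard tools. The sketch below runs a one-way ANOVA followed by Tukey's HSD using SciPy's tukey_hsd (available since SciPy 1.8) and counts the significant pairs; the sample names and flip counts are invented for illustration, and the paper's actual analysis was performed in Excel.

from itertools import combinations
from scipy import stats

# Per-subject flip counts for three sound samples (invented illustrative data)
samples = {
    "female_voice": [2, 3, 2, 4, 2, 3],
    "male_voice":   [3, 3, 4, 4, 3, 5],
    "pink_noise":   [5, 6, 4, 7, 6, 5],
}

f_stat, p_value = stats.f_oneway(*samples.values())
print(f"ANOVA: F={f_stat:.2f}, p={p_value:.3g}")

if p_value < 0.05:
    result = stats.tukey_hsd(*samples.values())
    names = list(samples)
    # Count the pairwise differences at the 0.05 level, as in Table 5
    significant = [(names[i], names[j])
                   for i, j in combinations(range(len(names)), 2)
                   if result.pvalue[i, j] < 0.05]
    print(f"{len(significant)} significant pair(s): {significant}")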
A former experiment incorporated two sets of visual icons and their auditory counterparts, only in a 3×5 resolution [65]. The sound stimuli consisted of auditory icons and earcons. The results showed that the participants made faster and more correct matches between visual icons and auditory icons than between visual icons and earcons. We support the former findings that familiar natural sounds are better recalled [65], [66]. In the case of auditory icons, the recall process may also depend on the task, and the amount of spectral-temporal structure in a sound can be indicative of memory performance [67].

Standardized measurement signals allow for easy comparison across repeated experiments. On the other hand, auditory icons and earcons can vary significantly, even when conveying the same semantic meaning (e.g., guitar, phone ringing). This variability may result in greater differences in results when using different sound samples. Speech and human sound samples represent an intermediate solution. Generally, our findings support the idea of using iconic human sound samples and auditory icons, as they are better recalled than unfamiliar and unpleasant artificial measurement signals.
Furthermore, our results indicate that there are no significant differences even between similar sounds, such as pink noise vs. white noise, a 1 kHz sine vs. a 1 kHz square wave, and a 1 kHz sine vs. a 5 kHz sine. Although some subjects reported confusion with these sounds during informal feedback after the experiment, the statistical analysis did not support this speculation.
As mentioned previously, no difference was observed in the visual mode, as the iconic representations were intentionally designed to be similar, such as by avoiding the use of colors or different sizes. From an engineering perspective, even short-term recall of iconic auditory events can be improved by using human-related and familiar everyday sound samples. Artificial sounds can be employed when necessary, such as for alarm sounds, neutral notifications, or when meaningful sounds might cause confusion.
4.4. Memory capacity and limitations
Short-term memory capacity has been extensively studied, particularly in psychology, neurology, and the cognitive sciences, with a primary focus on visual and/or speech memory. In visual scenarios, the recall capacity was found to be influenced by the complexity of items, with simpler objects being easier to remember [68]. It was also suggested that the limited capacity of short-term memory could be a consequence of efficiency of design, with an effective upper limit of about 5 to 9 items [69]. Our results align with these, as error rates and differences among the auditory icons increased after resolution 4×5 (10 pairs). Informal feedback from the subjects also supported this finding, as they reported that the game was relatively easy with 5-8 pairs in both modalities. The game includes a built-in reward system to motivate players. If a player completes a game without any errors, they receive a "perfect game" feedback. Only at the lowest resolutions (up to 8 pairs) were players able to achieve this.
Although some previous studies suggested a precise capacity limit of three to five chunks, a review article presented a range of data on different capacity limits. It was proposed that a more accurate limit might be around four chunks [27], [32], [70]. Our results suggest a higher number, around 8. For auditory events, fewer results are available. An overview was presented on how auditory memory functions, with a focus on how attention influences outcomes [26]. In engineering, audiovisual memory capacity plays an important role. Our results suggest that both auditory and visual representations can be effectively recalled in the short term for up to 8-10 items. In addition, training working memory has been found to generally enhance its capacity [71]. This highlights the importance of experience and a-priori training. Further investigations could focus on the effects of such training.
5. CONCLUSION
40 subjects participated in a gamified experiment focusing on short-term audiovisual memory. Subjects played a familiar memory game in both visual-only and audio-only modes, incorporating iconic visual and auditory representations in nine different resolutions ranging from 5×2 to 6×8. Results indicated no significant difference between the visual and auditory modalities based on the number of flips. The superiority of the visual presentation in completion time was due to the presentation method. During visual presentation, the mean flip number of female subjects was lower than that of male subjects only if the number of pairs exceeded 15 (6×5). There was no difference in the audio mode. Gender did not appear to be a significant parameter.

Measurement signals, human sounds, and auditory icons were examined based on mean time cost and flip numbers. Evaluation of the sound samples indicated that human sounds can be recalled the best, followed by auditory icons. This supports former findings about the importance of familiarity and semantic content of iconic sound samples when designing auditory displays and feedback solutions (i.e., for assistive technology, augmented reality/virtual reality (AR/VR) environments, and simulators). The results can be sensitive to initial parameters such as the age of the participants, the duration of the experiment (including the effects of training and fatigue), and the selection criteria of auditory icons. Future work will address open questions about the significance of the subjects' age, the impact of experience, and the usability of crowdsourcing solutions for big data evaluation.
FUNDING INFORMATION
Authors state no funding involved.
AUTHOR CONTRIBUTIONS STATEMENT

Name of Author     C  M  So  Va  Fo  I  R  D  O  E  Vi  Su  P  Fu
György Wersényi    ✓  ✓  ✓   ✓
Ádám Csapó         ✓  ✓  ✓
József Tollár      ✓  ✓  ✓

C: Conceptualization, M: Methodology, So: Software, Va: Validation, Fo: Formal Analysis, I: Investigation, R: Resources, D: Data Curation, O: Writing - Original Draft, E: Writing - Review & Editing, Vi: Visualization, Su: Supervision, P: Project Administration, Fu: Funding Acquisition
CONFLICT OF INTEREST STATEMENT
Authors state no conflict of interest.
DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author, Gy.W., upon reasonable request.
REFERENCES
[1] F. Bellotti, B. Kapralos, K. Lee, P. Moreno-Ger, and R. Berta, "Assessment in and of serious games: an overview," Advances in Human-Computer Interaction, vol. 2013, p. 1, 2013, doi: 10.1155/2013/136864.
[2] J. B. Hauge et al., "Serious game mechanics and opportunities for reuse," in 11th International Conference eLearning and Software for Education, Apr. 2015, vol. 2, pp. 19–27, doi: 10.12753/2066-026X-15-094.
[3] A. Dimitriadou, N. Djafarova, O. Turetken, M. Verkuyl, and A. Ferworn, "Challenges in serious game design and development: educators' experiences," Simulation and Gaming, vol. 52, no. 2, pp. 132–152, 2021, doi: 10.1177/1046878120944197.
[4] A. C. T. Klock, I. Gasparini, M. S. Pimenta, and J. Hamari, "Tailored gamification: a review of literature," International Journal of Human-Computer Studies, vol. 144, p. 102495, 2020.
[5] A. Rapp, F. Hopfgartner, J. Hamari, C. Linehan, and F. Cena, "Strengthening gamification studies: current trends and future opportunities of gamification research," International Journal of Human Computer Studies, vol. 127, pp. 1–6, 2019, doi: 10.1016/j.ijhcs.2018.11.007.
[6] D. Djaouti, J. Alvarez, and J.-P. Jessel, "Classifying serious games: the G/P/S model," in Handbook of research on improving learning and motivation through educational games: Multidisciplinary approaches, IGI Global, 2011, pp. 118–136.
[7] S. Deterding, S. L. Björk, L. E. Nacke, D. Dixon, and E. Lawley, "Designing gamification," in CHI '13 Extended Abstracts on Human Factors in Computing Systems, Apr. 2013, pp. 3263–3266, doi: 10.1145/2468356.2479662.
[8] N. Cowan, "What are the differences between long-term, short-term, and working memory?," Progress in Brain Research, vol. 169, pp. 323–338, 2008, doi: 10.1016/S0079-6123(07)00020-9.
[9] D. Norris, "Short-term memory and long-term memory are still different," Psychological Bulletin, vol. 143, no. 9, pp. 992–1009, 2017, doi: 10.1037/bul0000108.
[10] D. Burr and D. Alais, "Chapter 14 Combining visual and auditory information," Progress in Brain Research, vol. 155 B, pp. 243–258, 2006, doi: 10.1016/S0079-6123(06)55014-9.
[11] M. Kubovy and D. Van Valkenburg, "Auditory and visual objects," Cognition, vol. 80, no. 1–2, pp. 97–126, Jun. 2001, doi: 10.1016/S0010-0277(00)00155-4.
[12] W. J. Chai, A. I. Abd Hamid, and J. M. Abdullah, "Working memory from the psychological and neurosciences perspectives: a review," Frontiers in Psychology, vol. 9, p. 401, 2018, doi: 10.3389/fpsyg.2018.00401.
[13] P. Kelley, M. D. R. Evans, and J. Kelley, "Making memories: why time matters," Frontiers in Human Neuroscience, vol. 12, p. 400, 2018, doi: 10.3389/fnhum.2018.00400.
[14] M. C. Potter, "Very short-term conceptual memory," Memory & Cognition, vol. 21, no. 2, pp. 156–161, 1993, doi: 10.3758/BF03202727.
[15] S. J. Luck and E. K. Vogel, "The capacity of visual working memory for features and conjunctions," Nature, vol. 390, no. 6657, pp. 279–284, 1997, doi: 10.1038/36846.
[16] G. Alvarez and P. Cavanagh, "The capacity of visual short-term memory is set by total informational load, not number of objects," Journal of Vision, vol. 2, no. 7, pp. 106–111, 2002, doi: 10.1167/2.7.273.
[17] T. F. Brady and G. A. Alvarez, "No evidence for a fixed object limit in working memory: spatial ensemble representations inflate estimates of working memory capacity for complex objects," Journal of Experimental Psychology: Learning Memory and Cognition, vol. 41, no. 3, pp. 921–929, 2015, doi: 10.1037/xlm0000075.
[18] K. O. Hardman and N. Cowan, "Remembering complex objects in visual working memory: do capacity limits restrict objects or features?," Journal of Experimental Psychology: Learning Memory and Cognition, vol. 41, no. 2, pp. 325–347, 2015, doi: 10.1037/xlm0000031.
[19] K. Fukuda, E. Awh, and E. K. Vogel, "Discrete capacity limits in visual working memory," Current Opinion in Neurobiology, vol. 20, no. 2, pp. 177–182, 2010, doi: 10.1016/j.conb.2010.03.005.
[20] M. W. Schurgin, "Visual memory, the long and the short of it: a review of visual working memory and long-term memory," Attention, Perception, and Psychophysics, vol. 80, no. 5, pp. 1035–1056, 2018, doi: 10.3758/s13414-018-1522-y.
[21] P. Wilken and W. J. Ma, "A detection theory account of change detection," Journal of Vision, vol. 4, no. 12, pp. 1120–1135, 2004, doi: 10.1167/4.12.11.
[22] T. F. Brady, T. Konkle, and G. A. Alvarez, "A review of visual memory capacity: beyond individual items and toward structured representations," Journal of Vision, vol. 11, no. 5, pp. 1–34, 2011, doi: 10.1167/11.5.1.
[23] S. McAdams and E. Bigand, Thinking in sound: the cognitive psychology of human audition. Oxford University Press, 1993.
[24] W. Ritter, D. Deacon, H. Gomes, D. C. Javitt, and H. G. Vaughan, "The mismatch negativity of event-related potentials as a probe of transient auditory memory: a review," Ear and Hearing, vol. 16, no. 1, pp. 52–67, 1995, doi: 10.1097/00003446-199502000-00005.
[25] J. Kaiser, "Dynamics of auditory working memory," Frontiers in Psychology, vol. 6, p. 613, 2015, doi: 10.3389/fpsyg.2015.00613.
[26] J. F. Zimmermann, M. Moscovitch, and C. Alain, "Attending to auditory memory," Brain Research, vol. 1640, pp. 208–221, 2016, doi: 10.1016/j.brainres.2015.11.032.
[27] N. Cowan, "The magical number 4 in short-term memory: a reconsideration of mental storage capacity," Behavioral and Brain Sciences, vol. 24, no. 1, pp. 87–114, 2001, doi: 10.1017/S0140525X01003922.
[28] D. L. Nelson, V. S. Reed, and J. R. Walling, "Pictorial superiority effect," Journal of Experimental Psychology: Human Learning and Memory, vol. 2, no. 5, pp. 523–528, 1976, doi: 10.1037/0278-7393.2.5.523.
[29] K. C. Backer and C. Alain, "Attention to memory: orienting attention to sound object representations," Psychological Research, vol. 78, no. 3, pp. 439–452, 2014, doi: 10.1007/s00426-013-0531-7.
[30] J. L. Burt, D. S. Bartolome, D. W. Burdette, and J. R. Comstock Jr., "A psychophysiological evaluation of the perceived urgency of auditory warning signals," Ergonomics, vol. 38, no. 11, pp. 2327–2340, 1995.
[31] M. A. Cohen, T. S. Horowitz, and J. M. Wolfe, "Auditory recognition memory is inferior to visual recognition memory," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 14, pp. 6008–6010, 2009, doi: 10.1073/pnas.0811884106.
[32] N. Cowan, "Visual and auditory working memory capacity," Trends in Cognitive Sciences, vol. 2, no. 3, pp. 77–78, 1998, doi: 10.1016/S1364-6613(98)01144-9.