TELKOMNIKA Telecommunication, Computing, Electronics and Control
Vol. 19, No. 5, October 2021, pp. 1622-1629
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, Decree No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v19i5.19566
PCA-based dimensionality reduction for face recognition

Md. Abu Marjan¹, Md. Rashedul Islam², Md. Palash Uddin³, Masud Ibn Afjal⁴, Md. Al Mamun⁵
¹,³,⁴ Department of Computer Science and Engineering, Hajee Mohammad Danesh Science and Technology University, Bangladesh
² Information Technology Cell, Hajee Mohammad Danesh Science and Technology University, Bangladesh
⁵ Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Bangladesh
Article Info
Article history: Received Jan 6, 2021; Revised Mar 19, 2021; Accepted Mar 30, 2021
Keywords: Data reduction; Dimensionality reduction; Eigen analysis; Face recognition; Principal component analysis
ABSTRACT
In this paper, we conduct a comprehensive study of dimensionality reduction (DR) techniques and discuss in detail the most widely used statistical DR technique, principal component analysis (PCA), with a view to addressing the classical face recognition problem. More specifically, we propose a solution to either typical-face or individual-face recognition based on the principal components, which are constructed by applying PCA to the face images. We simulate the proposed solution with several training and test sets of manually captured face images and also with the popular Olivetti Research Laboratory (ORL) and Yale face databases. The performance of the proposed face recognizer indicates its superiority.

This is an open access article under the CC BY-SA license.
Corresponding Author:
Md. Abu Marjan
Department of Computer Science and Engineering
Hajee Mohammad Danesh Science and Technology University
Dinajpur-5200, Bangladesh
Email: marjan@hstu.ac.bd
1. INTRODUCTION
Data mining is a way of extracting, or mining, knowledge from large amounts of data [1]-[4]. In developing a data mining application, the amount of data taken from various repositories, such as databases, data warehouses, and the World Wide Web (WWW), is typically too large to be either stored or processed. Analyzing complex data and mining huge amounts of data may take a long time, which sometimes makes such analysis impractical or infeasible. Data reduction techniques are traditionally applied to find a reduced representation of the dataset that is much smaller in size yet closely maintains the integrity of the original data. Consequently, mining the reduced dataset should be more efficient while producing the same, or almost the same, analytical results. The common strategies for data reduction include data cube aggregation, attribute subset selection, dimensionality reduction (DR), and numerosity reduction [1].
Recently, dataset sizes, in terms of both the number of records and the number of attributes, have been growing very rapidly, which has prompted the development of a number of big-data platforms and parallel data analytics algorithms, as well as the efficient use of DR procedures. To handle real-world data effectively, its dimensionality needs to be reduced to an effective (more economical) amount. DR is the study of transformations that reduce the number of dimensions describing high-dimensional data in order to obtain a meaningful representation of reduced dimensionality.
Theoretically, the reduced representation of a dataset should have a dimensionality that corresponds to the intrinsic dimensionality of the dataset. The intrinsic dimensionality of a dataset is the minimum number of parameters needed to account for the observed properties of the data. The general objectives of DR are to remove irrelevant and redundant data in order to reduce the manipulation cost and avoid over-fitting, and to increase the quality of the data for efficient data-intensive processing tasks such as pattern recognition, data mining, visualization, database navigation, and compression of high-dimensional data. As such, DR offers an effective solution to the diverse problems of the "curse of dimensionality" and mitigates other undesired properties of high-dimensional spaces [5].

Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA
Mathematically, a DR technique converts a given dataset, represented as an $n \times D$ matrix $X$ consisting of $n$ data vectors $x_i$, $i = 1, 2, \ldots, n$, with dimensionality $D$, into another dataset $Y$ that has an intrinsic dimensionality $d$, where $d < D$ and often $d \ll D$. The intrinsic dimensionality of the data signifies that the points in dataset $X$ lie on or near a manifold of dimensionality $d$ that is embedded in the $D$-dimensional space. In other words, DR methods encode the given dataset $X$ of dimensionality $D$ into a new dataset $Y$ of dimensionality $d$ while retaining the geometry of the data as much as possible. In general, neither the intrinsic dimensionality $d$ of the dataset $X$ nor the geometry of the data manifold is completely known. Therefore, DR of a dataset is an ill-posed problem that can only be solved by assuming certain properties of the data, such as its intrinsic dimensionality [5].
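For a linear technique such as PCA, the mapping described above can be written compactly as a matrix product. The projection matrix $W$ below is notation introduced here for illustration only; nonlinear techniques replace the product with a nonlinear map.

```latex
X \in \mathbb{R}^{n \times D}
\;\longmapsto\;
Y = X W \in \mathbb{R}^{n \times d},
\qquad
W \in \mathbb{R}^{D \times d}, \quad d < D .
```

Each row of $Y$ is then the $d$-dimensional encoding of the corresponding row of $X$.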
Some DR techniques serve the purpose of obtaining a smaller image and of compression, whereas others serve machine learning purposes (e.g., better data analysis, classification, statistics, and visualization) [6]. In machine learning, dimension reduction is usually concerned with the feature vectors. In this case, DR techniques can be divided into two categories: feature extraction and feature selection methods. Feature extraction can be further divided into linear and non-linear methods. The main goal of some methods is to preserve fidelity with respect to the original data under a certain metric such as the mean squared error, whereas the goal of other methods is to improve the performance of a particular task, such as classification, prediction, or visualization [7].
Linear feature extraction methods include principal component analysis (PCA), factor analysis, independent component analysis (ICA), and linear discriminant analysis (LDA). Non-linear feature extraction methods include front-ranked techniques such as multidimensional scaling (MDS), Isomap, maximum variance unfolding, and kernel PCA [5]. Feature selection is divided into feature ranking and feature subset selection. Feature ranking commonly uses scoring functions such as Euclidean distance, correlation, and information gain ratio. Feature subset selection methods, on the other hand, are divided into filter methods, wrapper methods, and embedded methods. The filter methods do not use any learning algorithm [8].
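As an illustration of feature ranking in the filter style, the following minimal sketch scores each feature by the absolute Pearson correlation with a target and sorts the features accordingly, without training any learner. The function names and the toy data are hypothetical and do not come from the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear information
    return sxy / math.sqrt(sxx * syy)

def rank_features(rows, target):
    """Return (feature indices sorted best first, per-feature scores)."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append(abs(pearson(column, target)))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores

# Toy data (hypothetical): feature 0 tracks the target closely,
# feature 1 is constant, feature 2 is only loosely related.
rows = [[1.0, 5.0, 2.0], [2.0, 5.0, 1.0], [3.0, 5.0, 4.0], [4.0, 5.0, 3.0]]
target = [1.1, 1.9, 3.2, 3.9]
order, scores = rank_features(rows, target)
```

A feature subset could then be taken as the first few entries of `order`; how many to keep is a separate design choice that this sketch does not address.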
In this paper, after conducting a comprehensive study of the DR techniques, we present a face recognition approach using the PCA transformation. We perform experiments using the Olivetti Research Laboratory (ORL) and Yale face databases. The experimental results manifest the superiority of the proposed method. The main contributions of this paper are: i) a comprehensive study of the DR techniques; ii) the technical and mathematical intuitions behind the PCA approach; iii) two face recognition proposals using PCA data; and iv) a performance evaluation on the ORL and Yale face databases.
The remainder of this paper is organized as follows. We provide the technical details of the PCA method in section 2. Then, we discuss the works related to ours in section 3. After that, we explain the proposed face recognition approach in section 4. The experiments and results are provided in section 5. Finally, we summarize and conclude the findings and observations in section 6.
2. PRINCIPAL COMPONENT ANALYSIS
The constituent attributes of a real-world dataset exhibit relationships among themselves. These relationships are often linear or approximately linear, which makes the attributes amenable to common analysis techniques. One such technique is PCA, which rotates the original data to new coordinates with a view to making the data as flat as possible. PCA is a statistical transformation that identifies patterns in data by detecting the correlation between attributes [9]. The attempt to reduce the dimensionality only makes sense if there exists a strong correlation between attributes. PCA finds the directions of maximum variance in high-dimensional data and then projects the data onto a lower-dimensional subspace while retaining most of the information of the original dataset [10].
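The "direction of maximum variance" can be stated as a small optimization problem. The symbols below are introduced for illustration: $A$ is the mean-centred data matrix, $\Sigma$ its covariance matrix, and $w_1$ the first principal direction.

```latex
w_1 \;=\; \arg\max_{\|w\| = 1} \operatorname{var}(A w)
    \;=\; \arg\max_{\|w\| = 1} w^{\top} \Sigma\, w ,
\qquad
\Sigma\, w_1 = \lambda_1 w_1 ,
\qquad
\operatorname{var}(A w_1) = \lambda_1 .
```

The maximizer is the eigenvector of $\Sigma$ with the largest eigenvalue $\lambda_1$, and the variance captured along it equals $\lambda_1$. Each subsequent direction solves the same problem under the extra constraint of being orthogonal to the directions already found.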
Mathematically, given a matrix of two or more attributes, PCA produces a new matrix with the same number of attributes, called the principal components. Each principal component is a linear transformation of the entire original dataset. The principal components are calculated in such a way that the first principal component holds the maximum variance, which can tentatively be thought of as the maximum information. The second principal component is calculated to have the second most variance and, significantly, is uncorrelated (in a linear sense) with the first principal component. The further principal components, if there are any, exhibit decreasing variance and are uncorrelated with all other principal components. The steps for implementing PCA are as follows [11]:
– Step 1: Take the whole dataset consisting of $d$-dimensional samples, ignoring the class labels.
– Step 2: Compute the $d$-dimensional mean vector. The mean vector consists of the means of each variable. The mean is the sum of the data points divided by the number of data points, that is, $\mu = \bar{A} = \frac{\sum_{i=1}^{n} A_i}{n}$. The mean is the value most commonly referred to as the average, and the mean vector is often referred to as the centroid. The variance is roughly the arithmetic average of the squared distance from the mean. It is defined as $\sigma^2 = s^2 = \operatorname{var}(A) = \frac{\sum_{i=1}^{n} (A_i - \bar{A})^2}{n - 1}$, where $\bar{A}$ is the mean of the data. Note that the standard deviation ($\sigma$) is the square root of the variance.
– Step 3: Compute the covariance matrix or, alternatively, the scatter matrix of the whole dataset.
a. Covariance matrix: The variance-covariance matrix consists of the variances of the variables along the main diagonal and the covariances between each pair of variables in the other matrix positions. The covariance of two variables $S$ and $T$ is computed as $\operatorname{covar}(S, T) = \frac{\sum_{i=1}^{n} (S_i - \bar{S})(T_i - \bar{T})}{n - 1}$, where $\bar{S}$ and $\bar{T}$ denote the means of $S$ and $T$, respectively. For $d$ variables, the covariance matrix is defined as

$$\Sigma = \begin{bmatrix} \sigma^2_{11} & \sigma^2_{12} & \cdots & \sigma^2_{1d} \\ \sigma^2_{21} & \sigma^2_{22} & \cdots & \sigma^2_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2_{d1} & \sigma^2_{d2} & \cdots & \sigma^2_{dd} \end{bmatrix}$$

Here, $\sigma^2_{jj}$ is the variance of the variable $A_j$ in $A$, and $\sigma^2_{jk}$ is the covariance between the variables $A_j$ and $A_k$ in $A$.
b. Scatter matrix: The scatter matrix is computed as $\sum_{i=1}^{n} (A_i - m)(A_i - m)^{T}$, where $A_i$ is the $i$-th sample and $m$ is the mean vector, defined as $m = \frac{\sum_{i=1}^{n} A_i}{n}$.
– Step 4: Perform the eigendecomposition, i.e., compute the eigenvectors $(e_1, e_2, \ldots, e_d)$ and the corresponding eigenvalues $(\lambda_1, \lambda_2, \ldots, \lambda_d)$. The eigenvectors, or principal components, determine the directions of the new feature space, and the eigenvalues determine their magnitude.
– Step 5: Sort the eigenvectors by decreasing eigenvalues and choose the $k$ eigenvectors with the largest eigenvalues to form a $d \times k$-dimensional matrix $W$, where every column represents an eigenvector and $k$ is the number of dimensions of the new feature subspace, with $k \le d$.
– Step 6: Use the $d \times k$ eigenvector projection matrix $W$ to transform the original samples onto the new subspace. This can be summarized by the equation $y = x W$, where $x$ is a $1 \times d$-dimensional vector representing one sample and $y$ is the transformed $1 \times k$-dimensional sample in the new subspace. Alternatively, the whole dataset can be transformed at once as $Y = A W$ (or $Y = W^{T} A$ when the samples are stored as the columns of $A$), where $Y$ contains the transformed $n \times k$-dimensional samples in the new subspace.
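The six steps above can be sketched in plain Python as follows. This is a minimal illustration, not the paper's implementation: a small Jacobi rotation routine stands in for a library eigensolver so that the sketch needs only the standard library, all function names are hypothetical, and the samples are projected exactly as Step 6 is written, without mean-centring.

```python
import math

def mean_vector(data):
    """Step 2: per-variable means of an n x d list-of-rows dataset."""
    n = len(data)
    return [sum(row[j] for row in data) / n for j in range(len(data[0]))]

def covariance_matrix(data):
    """Step 3a: d x d sample covariance matrix (divides by n - 1)."""
    n, d = len(data), len(data[0])
    m = mean_vector(data)
    centered = [[row[j] - m[j] for j in range(d)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def jacobi_eigh(sym, sweeps=50, tol=1e-18):
    """Step 4: eigenvalues and eigenvectors of a symmetric matrix by cyclic
    Jacobi rotations. Column j of the returned matrix pairs with value j."""
    d = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < tol:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q], with |theta| <= pi/4.
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for mat in (a, v):      # column update: M <- M J
                    for i in range(d):
                        mp, mq = mat[i][p], mat[i][q]
                        mat[i][p], mat[i][q] = c * mp - s * mq, s * mp + c * mq
                for j in range(d):      # row update: A <- J^T A
                    ap, aq = a[p][j], a[q][j]
                    a[p][j], a[q][j] = c * ap - s * aq, s * ap + c * aq
    return [a[i][i] for i in range(d)], v

def pca(data, k):
    """Steps 1-6: return (W, Y, eigenvalues sorted in decreasing order)."""
    values, vectors = jacobi_eigh(covariance_matrix(data))
    d = len(values)
    # Step 5: sort by decreasing eigenvalue and keep the first k columns.
    order = sorted(range(d), key=lambda i: values[i], reverse=True)
    w = [[vectors[i][order[c]] for c in range(k)] for i in range(d)]
    # Step 6: Y = A W, an n x k matrix of projected samples.
    y = [[sum(row[i] * w[i][c] for i in range(d)) for c in range(k)]
         for row in data]
    return w, y, [values[i] for i in order]

# Toy data: five samples lying exactly on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]
w, y, eigenvalues = pca(data, k=1)
```

For this toy dataset all the variance lies along the direction (1, 2), so the second eigenvalue is zero up to rounding and a single component describes the data completely. The sign of each eigenvector is arbitrary, so the column of `w` may come out negated on other inputs.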
3. RELATED WORK
Dash et al. [12] presented a PCA-based entropy measure for ranking features and compared it with a similar feature ranking method (Relief). Maaten, Postma, and Herik investigated the performance of nonlinear techniques on artificial and natural tasks and also conducted a review and systematic comparison of DR techniques [5]. Spectral DR methods are explained in a short tutorial in [13]. In the review work [14], the authors categorized the plethora of available DR methods and illustrated the mathematical insight behind them. Loog, Ginneken, and Duin proposed a DR technique for image features using the canonical contextual correlation projection [15]. In [16], the authors provide a comprehensive review and comparison of the performance of the principal methods of dimension reduction proposed in the approximate Bayesian computation literature. Silipo, Adae, and Berthold discussed seven techniques for DR: missing values, low variance filter, high correlation filter, PCA, random forests, backward feature elimination, and forward feature construction [17]. Joshi and Machchhar [18] conducted a comprehensive survey of DR methods and proposed a DR method that depends upon the given set of parameters and varying conditions. The authors of [19] report that recursive feature elimination, and genetic and evolutionary feature weighting and selection, give better classification results than PCA.
Several works have also addressed the recognition problem based on PCA in various ways. Huang and Yin [20] compare and investigate linear PCA and various nonlinear techniques for face recognition. Alkandari and Aljaber [21] presented the importance of PCA for identifying a facial image without human intervention. Dandpat and Meher proposed a face recognition method with improved performance using PCA and two-dimensional PCA [22]. PCA in the linear discriminant analysis space for face recognition has been proposed by Su and Wang [23]. The performance obtained when two DR methods, the self-organizing map (SOM) and PCA, are combined is investigated in [24].
4. PROPOSED APPROACH TO FACE RECOGNITION
In this paper, after discussing the working principle of PCA in detail, we propose a solution to the face recognition problem based on the principal components of the training grayscale face image matrices. The proposal is a customization of various existing classifiers based on principal components. The main customization lies in deriving the training and test sets: the images are placed as matrices, rather than as vectors as in the traditional approaches, and the transposes of the main sets are introduced, as discussed later. To implement the proposal, the face recognition problem is divided into two categories.
4.1. Problem statement-1: Recognition of a typical face
Given a new image, classify it as "face" or "non-face" using a set of $N$ original face images of people, where each image is $R$ pixels high by $C$ pixels wide, i.e., the pixel resolution is $R \times C$. To solve this, we merge the $N$ training image matrices into a single big matrix by placing them one after another. Then, we also place the input image matrix $N$ times, one after another, to form another big matrix. After that, we take the transpose of both big matrices. Subsequently, we apply PCA to the four big matrices and select $k$ eigenvectors for each. We then determine the similarity of the normal input big matrix to the normal training big matrix, and of the transposed input big matrix to the transposed training big matrix, using the selected $k$ features (eigenvectors). Finally, the decision is taken based on the similarity result. The solution is illustrated with the following steps:
– Step 1: Input the $N$ original images of size $R \times C$.
– Step 2: For each of the $N$ images, convert the image to a matrix of size (dimension) $R \times C$.
a. Step 2.1: Put all the matrices together in one big image-matrix, $Train1$, like this:

$$Train1 = \begin{bmatrix} ImageMatrix1 \\ ImageMatrix2 \\ \vdots \\ ImageMatrixN \end{bmatrix}$$

b. Step 2.2: Take the transpose of $Train1$ and assign it to another matrix, $Train2$: $Train2 = Transpose(Train1)$
Step
3:
F
or
the
ne
w
image
to
be
classified,
a.
Step
3.1:
Con
v
ert
the
image
to
a
matrix
of
length
R
C
and
put
it
N
times
together
in
another
big
image-matrix,
T
est1
lik
e
this:
T
est1
=
2
6
6
6
6
4
NewImageMatrix
NewImageMatrix
:::
:::
NewImageMatrix
3
7
7
7
7
5
b
.
Step
3.2:
T
ak
e
the
transpose
of
T
est1
and
assign
it
to
another
matrix,
T
est2
.
T
est2
=
T
r
anspose
(
T
est1
)
– Step 4: For each of the four big image matrices ($Train1$, $Train2$, $Test1$, and $Test2$),
a. Step 4.1: Apply PCA
b. Step 4.2: Select the $k$ eigenvectors with the highest eigenvalues
– Step 5: Determine the similarity of the new image with the existing images using the $k$ extracted features, i.e., determine the similarity of $Test1$ with $Train1$ and of $Test2$ with $Train2$ using the $k$ extracted features.
– Step 6: Classify the new input image as "face" if the similarity is sufficiently high, or as "non-face" otherwise.
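The construction of the four big matrices in Steps 2 and 3 can be sketched as follows. The sketch assumes that "placing one after another" means stacking the image matrices row-wise, as the block layout of Train1 suggests; the helper names and the tiny 2 x 3 "images" are hypothetical.

```python
def stack_vertically(matrices):
    """Place R x C matrices one after another to form one big matrix."""
    big = []
    for matrix in matrices:
        big.extend([row[:] for row in matrix])
    return big

def transpose(matrix):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(column) for column in zip(*matrix)]

def build_sets(train_images, new_image):
    """Return Train1, Train2, Test1, Test2 as described in Steps 2 and 3."""
    n = len(train_images)
    train1 = stack_vertically(train_images)       # (N*R) x C
    train2 = transpose(train1)                    # C x (N*R)
    test1 = stack_vertically([new_image] * n)     # new image repeated N times
    test2 = transpose(test1)
    return train1, train2, test1, test2

# Two hypothetical 2 x 3 training "images" and one new image.
img_a = [[1, 2, 3], [4, 5, 6]]
img_b = [[7, 8, 9], [10, 11, 12]]
new_img = [[0, 1, 0], [1, 0, 1]]
train1, train2, test1, test2 = build_sets([img_a, img_b], new_img)
```

Repeating the new image N times gives Test1 the same shape as Train1, so that PCA on both yields eigenvectors of matching length for the comparison in Step 5.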
4.2. Problem statement-2: Recognition of individual face
Given a new image, classify it to the most similar image(s) from a set of $N$ original face images for each of $m$ people, where each image is $R$ pixels high by $C$ pixels wide, i.e., the size is $R \times C$. To solve this, we merge the $N$ training image matrices of each of the $m$ people into a separate single big matrix by placing them one after another. Then, we also place the input image matrix $N$ times, one after another, to form another big matrix. After that, we take the transpose of all big matrices. Subsequently, we apply PCA to all big matrices and select $k$ eigenvectors for each. We then determine the similarity of the normal input big matrix to all normal training big matrices, and of the transposed input big matrix to all transposed training big matrices, using the selected $k$ features. Finally, the decision is taken based on the similarity result. The solution is illustrated with the following steps:
• Step 1: Input the $N$ original images of size $R \times C$ for each of the $m$ people.
• Step 2: For each of the $N$ images of each of the $m$ people, convert it into a matrix of size $R \times C$.
– Step 2.1: Put the matrices of each person together in a separate big image-matrix, $Train3$, like this:

$$Train3 = \begin{bmatrix} ImageMatrix1 \\ ImageMatrix2 \\ \vdots \\ ImageMatrixN \end{bmatrix}$$

– Step 2.2: Take the transpose of $Train3$ and assign it to another matrix, $Train4$: $Train4 = Transpose(Train3)$
• Step 3: For the new image to be classified,
– Step 3.1: Convert the image to a matrix of size $R \times C$ and put it $N$ times together in another big image-matrix, $Test3$, like this:

$$Test3 = \begin{bmatrix} NewImageMatrix \\ NewImageMatrix \\ \vdots \\ NewImageMatrix \end{bmatrix}$$

– Step 3.2: Take the transpose of $Test3$ and assign it to another matrix, $Test4$: $Test4 = Transpose(Test3)$
• Step 4: For each of the big image matrices,
– Step 4.1: Apply PCA
– Step 4.2: Select the $k$ eigenvectors with the highest eigenvalues
• Step 5: Determine the similarity of the new image with all the existing images of the $m$ people using the $k$ extracted features, i.e., determine the similarity of $Test3$ with $Train3$ and of $Test4$ with $Train4$ using the $k$ extracted features.
• Step 6: Classify the new input image to the most probable image(s) with the highest similarity.
To determine the similarity for both problem statements, the difference between each eigenvector in a training set and its corresponding eigenvector in the testing set is computed first. Then, the result for each eigenvector is averaged. The new instance is classified as "yes" if the average values are close to a threshold value that would ideally be around zero (0).
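One possible reading of this similarity rule is sketched below. It is an illustration under stated assumptions rather than the paper's implementation: absolute differences are used so that positive and negative entries do not cancel, the default threshold of 0.05 is an arbitrary placeholder, and the function names are hypothetical.

```python
def similarity_score(train_vectors, test_vectors):
    """Average difference between corresponding eigenvectors.

    Each argument is a list of k eigenvectors (equal-length lists).
    A score near zero means the two sets are nearly identical.
    """
    per_vector = []
    for train_vec, test_vec in zip(train_vectors, test_vectors):
        diffs = [abs(a - b) for a, b in zip(train_vec, test_vec)]
        per_vector.append(sum(diffs) / len(diffs))
    return sum(per_vector) / len(per_vector)

def is_match(score, threshold=0.05):
    """Problem statement 1: 'yes' when the score is close to zero."""
    return score <= threshold

def most_similar(scores_by_person):
    """Problem statement 2: the person whose training set scores lowest."""
    return min(scores_by_person, key=scores_by_person.get)

# Hypothetical eigenvector sets with k = 2.
train_p1 = [[1.0, 0.0], [0.0, 1.0]]
train_p2 = [[0.6, 0.8], [0.8, -0.6]]
test = [[0.6, 0.8], [0.8, -0.6]]
scores = {"p1": similarity_score(train_p1, test),
          "p2": similarity_score(train_p2, test)}
best = most_similar(scores)
```

An eigenvector is only defined up to its sign, so a practical implementation would need a sign convention (for example, making the largest-magnitude entry positive) before taking differences; otherwise two identical subspaces could produce a large score.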
5. RESULTS
The proposed method for face recognition based on principal components has been implemented on the MATLAB simulation platform. The implemented code has been tested on some common face images captured manually. In addition, it has been tested on two popular face image databases: ORL and Yale. The ORL database contains 10 different grayscale images of each of 40 distinct subjects. For some of the subjects, the images were taken at different times and with variations in lighting and facial expression. All images were captured against a dark homogeneous background with the subjects in an upright, frontal position. The Yale database contains 11 different grayscale images of each of 15 distinct subjects/individuals, one per facial expression or configuration. Yale has extensions called the Extended Yale Face Databases A and B. The Extended Yale Face Database B has 38 subjects/individuals and around 64 near-frontal images under different illuminations per subject. For both databases, images are available at two pixel resolutions: $32 \times 32$ and $64 \times 64$. Some images from the ORL, Yale, and Extended Yale Face Database B are shown in Figures 1(a)-(c), respectively [25], while Table 1 shows the results on different data distributions.
Figure 1. Face databases: (a) sample images from the ORL database, (b) sample images from the Yale database, and (c) sample images from the extended Yale face database B
For each database, the training and testing sets are created in the manner described above. For the first problem statement, a random subset of images from every subject was taken to form the training set, Train1, and thus Train2. The other images were considered to be the testing set, Test1, and thus Test2. For the second problem statement, a random subset of images of every subject was taken to form the training set, Train3, and thus Train4. Any of the remaining image(s) of the respective subject on which the training sets are formed was considered to be the testing set, Test3, and thus Test4.
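The random per-subject split described above can be sketched as follows. The number of training images per subject (`n_train=6`), the fixed seed, and the string identifiers standing in for image matrices are assumptions made for illustration; the paper does not state the split sizes.

```python
import random

def split_per_subject(images_by_subject, n_train, seed=0):
    """Randomly pick n_train images per subject for training; the
    remaining images of that subject form its testing set."""
    rng = random.Random(seed)
    train, test = {}, {}
    for subject, images in images_by_subject.items():
        shuffled = images[:]
        rng.shuffle(shuffled)
        train[subject] = shuffled[:n_train]
        test[subject] = shuffled[n_train:]
    return train, test

# ORL-like layout: 40 subjects with 10 images each (placeholder names).
dataset = {s: [f"s{s}_img{i}" for i in range(10)] for s in range(40)}
train, test = split_per_subject(dataset, n_train=6)
```

Fixing the seed makes the split reproducible, which helps when comparing runs with different values of k.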
The recognition result of the proposed method was quite acceptable, especially because of the training sets Train2 and Train4, which are the transposes of the original training sets Train1 and Train3, respectively. The recognition accuracy can decrease significantly when the training sets contain inconsistent images.

Table 1. Databases and results
| Task | Database | Total number of samples | Samples of individual subject | Result |
|---|---|---|---|---|
| Recognition of a typical face | ORL | 400 | 40 | Theoretically: 0; Practically: around 0 |
| Recognition of a typical face | Yale | 2432 | 38 | Theoretically: 0; Practically: around 0 |
| Recognition of individual face | ORL | 400 | 40 | Theoretically: 0; Practically: around 0 |
| Recognition of individual face | Yale | 2432 | 38 | Theoretically: 0; Practically: around 0 |
6. CONCLUSION AND FUTURE WORK
The comprehensive overview of DR techniques and the working principle of PCA discussed here can serve as ingredients for developing a typical image-data mining application. The proposed method for face recognition based on principal components is mostly suited to applications where a few images are enough for training. The proposed approach can be used in the same manner not only for face recognition but also for the recognition of other kinds of objects. In future, the proposed technique will be applied to the complete ORL and Yale databases, along with other face databases, and its performance will be compared with existing classifiers based on either machine learning algorithms or other statistical approaches. In addition, an adaptive range for the threshold used to recognize an instance will be determined.
REFERENCES
[1] J. Han, J. Pei, and M. Kamber, "Data mining: concepts and techniques," Elsevier, 2011.
[2] I. H. Witten and E. Frank, "Data mining: practical machine learning tools and techniques with Java implementations," ACM SIGMOD Record, vol. 31, no. 1, pp. 76-77, 2002, doi: 10.1145/507338.507355.
[3] M. F. Rabbi et al., "Performance Evaluation of Data Mining Classification Techniques for Heart Disease Prediction," American Journal of Engineering Research (AJER), vol. 7, no. 2, pp. 278-283, 2002.
[4] S. M. M. Hasan, M. A. Mamun, M. P. Uddin and M. A. Hossain, "Comparative Analysis of Classification Approaches for Heart Disease Prediction," 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2), 2018, pp. 1-4, doi: 10.1109/IC4ME2.2018.8465594.
[5] L. Maaten, E. Postma, and J. Herik, "Dimensionality Reduction: A Comparative Review," Journal of Machine Learning Research, vol. 10, no. 1, 2009.
[6] A. W. Altaher and S. K. Abbas, "Image processing analysis of sigmoidal Hadamard wavelet with PCA to detect hidden object," TELKOMNIKA Telecommunication Computing Electronics and Control, vol. 18, no. 3, pp. 1216-1223, Jun. 2020, doi: 10.12928/telkomnika.v18i3.13541.
[7] S. A. Baker, H. H. Mohammed, and H. A. Aldabagh, "Improving face recognition by artificial neural network using principal component analysis," TELKOMNIKA Telecommunication Computing Electronics and Control, vol. 18, no. 6, pp. 3357-3364, doi: 10.12928/telkomnika.v18i6.16335.
[8] R. Kavitha and E. Kannan, "An efficient framework for heart disease classification using feature extraction and feature selection technique in data mining," 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS), 2016, pp. 1-5, doi: 10.1109/ICETETS.2016.7603000.
[9] R. N. Rohmah, B. Handaga, N. Nurokhim, and I. Soesanti, "A statistical approach on pulmonary tuberculosis detection system based on X-ray image," TELKOMNIKA Telecommunication Computing Electronics and Control, vol. 17, no. 9, pp. 1474-1482, Jun. 2019, doi: 10.12928/telkomnika.v17i3.10546.
[10] O. A. Adegbola, I. A. Adeyemo, F. A. Semire, S. I. Popoola, and A. A. Atayero, "A principal component analysis-based feature dimensionality reduction scheme for content-based image retrieval system," TELKOMNIKA Telecommunication Computing Electronics and Control, vol. 18, no. 4, pp. 1892-1896, Aug. 2020, doi: 10.12928/telkomnika.v18i4.11176.
[11] S. Raschka, "Principal component analysis in 3 simple steps," 2015. [Online]. Available: http://sebastianraschka.com/Articles/2015pcain3steps.html
[12] M. Dash, H. Liu and J. Yao, "Dimensionality reduction of unsupervised data," Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence, 1997, pp. 532-539, doi: 10.1109/TAI.1997.632300.
[13] A. Ghodsi, "Dimensionality reduction a short tutorial," Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada, pp. 1-25, 2006.
[14] C. O. S. Sorzano, J. Vargas, and A. Pascual, "A survey of dimensionality reduction techniques," Machine Learning, pp. 1-35, 2014.
[15] M. Loog, B. van Ginneken, and R. P. W. Duin, "Dimensionality reduction of image features using the canonical contextual correlation projection," Pattern Recognition, vol. 38, no. 12, pp. 2409-2418, 2005, doi: 10.1016/j.patcog.2005.04.011.
[16] M. G. B. Blum, M. A. Nunes, D. Prangle, and S. A. Sisson, "A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation," Statistical Science, vol. 28, no. 2, pp. 189-208, May 2013, doi: 10.1214/12-STS406.
[17] R. Silipo, I. Adae, and A. H. M. Berthold, "Seven techniques for dimensionality reduction," Open for Innovation KNIME, 2014. [Online]. Available: https://www.knime.com/sites/default/files/inline-images/knimeseventechniquesdatadimreduction.pdf
[18] S. K. Joshi and S. Machchhar, "An evolution and evaluation of dimensionality reduction techniques - A comparative study," 2014 IEEE International Conference on Computational Intelligence and Computing Research, 2014, pp. 1-5, doi: 10.1109/ICCIC.2014.7238538.
[19] W. Nick, J. Shelton, G. Bullock, A. Esterline and K. Asamene, "Comparing dimensionality reduction techniques," SoutheastCon 2015, 2015, pp. 1-2, doi: 10.1109/SECON.2015.7132997.
[20] W. Huang and H. Yin, "Linear and nonlinear dimensionality reduction for face recognition," 2009 16th IEEE International Conference on Image Processing (ICIP), 2009, pp. 3337-3340, doi: 10.1109/ICIP.2009.5413898.
[21] A. Alkandari and S. J. Aljaber, "Principle Component Analysis algorithm (PCA) for image recognition," 2015 Second International Conference on Computing Technology and Information Management (ICCTIM), 2015, pp. 76-80, doi: 10.1109/ICCTIM.2015.7224596.
[22] S. K. Dandpat and S. Meher, "Performance improvement for face recognition using PCA and two-dimensional PCA," 2013 International Conference on Computer Communication and Informatics, 2013, pp. 1-5, doi: 10.1109/ICCCI.2013.6466291.
[23] H. Su and X. Wang, "Principal Component Analysis in Linear Discriminant Analysis Space for Face Recognition," 2014 5th International Conference on Digital Home, 2014, pp. 30-34, doi: 10.1109/ICDH.2014.13.
[24] D. Kumar, C. S. Rai and S. Kumar, "Face Recognition using Self-Organizing Map and Principal Component Analysis," 2005 International Conference on Neural Networks and Brain, 2005, pp. 1469-1473, doi: 10.1109/ICNNB.2005.1614908.
[25] D. Cai, "Four face databases in matlab format." [Online]. Available: http://www.cad.zju.edu.cn/home/dengcai/Data/FaceData.html