Indonesian Journal of Electrical Engineering and Computer Science
Vol. 16, No. 2, November 2019, pp. 827-834
ISSN: 2502-4752, DOI: 10.11591/ijeecs.v16i2.pp827-834
Journal homepage: http://iaescore.com/journals/index.php/ijeecs
An optimization of facial feature point detection program by using several types of convolutional neural network
Shyota Shindo (1), Takaaki Goto (2), Tadaaki Kirishima (3), Kensei Tsuchida (4)
(1,3,4) Toyo University, 2100 Kujirai, Kawagoe, Saitama, Japan
(2) Ryutsu Keizai University, 3-2-1 Shin-Matsudo, Matsudo, Chiba, Japan
Article Info

Article history:
Received Jan 17, 2019
Revised Apr 7, 2019
Accepted May 10, 2019

Keywords:
Facial feature point detection
Neural network
Convolutional neural network

ABSTRACT
Detection of facial feature points is an important technique used for biometric authentication and facial expression estimation. A facial feature point is a local point indicating both corners of the eyes, the holes of the nose, and the end points of the mouth in a face image. Many studies on facial feature point detection have been conducted so far, and the accuracy of facial organ point detection is improving through approaches using Convolutional Neural Networks (CNN). However, a CNN not only takes time to learn, but the neural network also becomes a complicated model, so it is necessary to improve both learning time and detection accuracy. In this research, detection accuracy and learning speed are improved by increasing the convolution layers.

Copyright (c) 2019 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Takaaki Goto, Ryutsu Keizai University, 3-2-1 Shin-Matsudo, Matsudo, Chiba, Japan.
Email: tg@gotolab.net
1. INTRODUCTION

A facial feature point is a local point indicating a place such as an eye corner or a mouth corner in a facial image. The detection of facial feature points is applied to important technologies such as facial expression estimation and biometric authentication using facial images. Many detection methods have been proposed so far, but with the advent of Convolutional Neural Networks (CNN) in recent years, many studies on detection methods using CNN have been conducted, and detection with higher accuracy is expected [1]. However, CNN learning takes time. If the layers of the CNN become deep and the amount of training data is large, the learning time becomes huge. As methods for speeding up learning, there are methods using high-performance GPUs and methods of devising the hardware, such as adding main memory. In addition, there are methods of devising the software [2, 3, 4]. Among them, there are a few methods [5] that devise pre-processing for the input data.

In this paper, we aim at improving the preprocessing of the input data and speeding up the learning of a facial feature point detection program using CNN. The CNN was implemented in Python with reference to the program of Yamashita et al. [6]. We propose a method to reduce the number of layers of the CNN by applying a Laplacian filter in preprocessing and reducing the image features.
2. RELATED WORKS

Facial feature point detection can be achieved by various methods such as CNN and image processing. As a conventional method, Cootes et al.'s Active Appearance Model (AAM) is available [7]. In this method, the average shape is obtained by using the coordinate points of the facial images of all the learning data,
and the average face is obtained by using the pixel values. Principal component analysis is performed using the coordinate points of this shape and the pixel values within the shape, and the amount of change is obtained. Appearance can show the features of the frontal face, and shape can express the orientation and shape of the face. By combining the two, it becomes possible to create a face image that can respond to changes in face orientation and shape. That is, in order to obtain the facial feature points of an input face image, shape and appearance are updated by the gradient descent method. However, this method cannot deal with unknown image data, resulting in low accuracy.
As a method of image processing, Vukadinovic et al. independently detect each feature point using a Gabor filter (a filter that extracts the direction of lines in an image) [8]. A recently used method is CNN. Since winning the object recognition category of ILSVRC in 2012, CNN has received great attention. As a facial feature point detection method using CNN, there is a method by Kimura et al. [9]. In this method, facial organ points are detected by learning input values as 100 x 100 grayscale images and learning teaching data as the coordinate values of the facial organ points of those images. This makes it possible to cope with unknown image data. There is also a method of creating an optimum mini batch in CNN mini batch learning [10].

Minagawa et al. do not use CNN but perform detection using a DNN [11]. In this proposed method, points are marked as learning samples in an image, among an existing correct feature point and a certain range around the feature point, and a transfer vector representing the relative position from the feature point to each learning sample is used. However, with this approach, it is necessary to use a separate DNN for each organ, and the accuracy is reduced.
Conventional methods were those that independently detect each feature by image processing, or ones using CNN. Although there is a method using DNN, complicated processing is involved. In this research, we aim to propose a new method which is more accurate and faster than the conventional methods. In addition, we also compare execution time and detection accuracy with detection by an effective CNN for facial feature point detection.
In [4], the authors search for an object in the image, clip it out, and make it an input value to a CNN, and the CNN judges whether it is a face or not (R-CNN). That paper is research on accelerating R-CNN. Our method seeks facial organ points in image data from which the face has already been cut out. [1] proposes the face authentication system that Facebook made. It differs from our research in that the face orientation is detected and an affine transformation is performed. In [6], the proposed method reduces the coordinate values of the facial organ points. [12] shows the result that performance becomes high when learning facial organ point regression and classification of attributes such as wearing glasses simultaneously (TCDCN). As in the model of [13], there are studies that applied a Laplacian filter to preprocessing. However, that is a technique for reducing the variability of the input pattern for a face detection program, and it does not mention the learning speed. [14] describes facial feature point detection using a Gabor filter; that method differs from the one in this research. In paper [15], face recognition is used in an alarm system; a method of extracting feature quantities by histogram is used for the face recognition method. In paper [16], the Enhanced Local Binary Pattern (EnLBP) is performed to compress the image, which is stored in a database. The authors of that paper proposed a method to recognize faces by comparing saved images with EnLBP-processed input images.
3. RESEARCH METHOD

Python is used as the programming language for the machine learning conducted in this research. Many machine learning support libraries are provided for Python, but we do not use these libraries, so as not to make the comparison of execution times ambiguous. Table 1 shows the PC environment in which machine learning was performed.
In addition, Python has a library for CUDA that allows the GPU to perform calculations, but in this research we have not done any calculations with the GPU.
Table 1. Environment

Item     Spec
CPU      Intel(R) Celeron(R) 2957U, 1.40GHz, Multi-Core
Memory   4.00GB
First of all, the CNN with the structure shown in Figure 1, which is the baseline of this research, was implemented.

Figure 1. The structure of CNN [10]
3.1. Implementation of Facial Feature Point Detection by CNN

As shown in Figure 1, the CNN hierarchically follows the convolution layer and the pooling layer after the input image, and then passes through the fully connected layer. For the structure and activation function of the CNN in this study, we refer to [10], and the convolution layers and pooling layers are as shown in Figure 1. Because CNN is supervised learning, the CNN uses the squared error between the output value and the teacher data as the loss function. Furthermore, it transmits the error to the input layer by the error backpropagation method and updates the weights by the gradient descent method.
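The loss and update rule described here can be sketched as follows. This is a minimal illustration with placeholder array shapes, not the authors' actual implementation:

```python
import numpy as np

def squared_error_loss(output, teacher):
    """Squared error between the network output and the teacher data."""
    return 0.5 * np.sum((output - teacher) ** 2)

def loss_gradient(output, teacher):
    """Gradient of the squared error w.r.t. the output; this is the
    error signal that backpropagation transmits toward the input layer."""
    return output - teacher

def gradient_descent_step(weights, grad, lr=1e-4):
    """Plain gradient descent update applied after backpropagation."""
    return weights - lr * grad

# Toy example: 20 outputs (10 feature points, an x and a y value each).
output = np.full(20, 0.5)
teacher = np.full(20, 0.4)
print(round(squared_error_loss(output, teacher), 3))  # 0.1
```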
We use the Labeled Faces in the Wild (LFW) data set. The LFW data set cuts out an image with the detection range from the forehead of the face image to below the jaw, and annotates the image after normalizing the cut-out image to 100 x 100 within that range. The coordinate values and the clipping range are publicly available. The annotation contains a total of 10 points: 4 at the outer and inner corners of both eyes, 2 at the bottom of the nose, and 4 at both ends of the lips and above and below them.
Since the learning amount is not enough with only the LFW data set, 1500 images are subjected to data augmentation to increase the number of images for learning to 20000. The data augmentation performed is noise addition, translational movement of 5 pixels up, down, left, and right, averaging by a mean filter, and sharpening by a sharpening filter.
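The four augmentation operations listed above can be sketched as follows. The kernel sizes and the noise level are illustrative assumptions, since the paper does not give those parameters:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def add_noise(img, sigma=5.0):
    """Noise addition (Gaussian, noise level is an assumption)."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0, 255)

def translate(img, dy, dx):
    """Translational movement (the paper uses 5 px up/down/left/right)."""
    return ndimage.shift(img, (dy, dx), mode="nearest")

def mean_filter(img, size=3):
    """Averaging by a mean filter (3x3 size is an assumption)."""
    return ndimage.uniform_filter(img, size=size)

def sharpen(img):
    """Sharpening filter (kernel weights sum to 1, so flat areas are kept)."""
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=float)
    return ndimage.convolve(img, kernel, mode="nearest")

img = rng.integers(0, 256, (100, 100)).astype(float)
augmented = [add_noise(img), translate(img, 5, 0), mean_filter(img), sharpen(img)]
print(len(augmented), augmented[0].shape)  # 4 (100, 100)
```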
As learning methods, batch learning, the stochastic gradient descent method, and mini batch learning can be used, but in this research mini batch learning is adopted. A mini-batch is created by random selection from the images obtained by data augmentation. In addition, we use the same mini batches, in the same order, in every epoch of learning, without shuffling. For the mini batch, we make 20 images per batch and divide the error accumulated during the gradient descent method by the number of images in the mini batch. The learning rate is 1e-4 (0.0001), the number of epochs is 20, and updating is done 20,000 times.
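A minimal sketch of this mini-batch scheme (20 images per batch, fixed batch order in every epoch, gradient averaged over the batch) on random placeholder data; the forward pass here is a stand-in linear model, not the paper's CNN:

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, batch_size = 200, 20   # the paper uses 20000 images; 200 keeps the sketch fast

# Build the mini-batches once by random selection; the same batches
# are then reused in the same order every epoch (no re-shuffling).
order = rng.permutation(n_images)
batches = [order[i:i + batch_size] for i in range(0, n_images, batch_size)]

X = rng.random((n_images, 4))    # placeholder inputs
t = rng.random((n_images, 2))    # placeholder teacher data
W = np.zeros((4, 2))
lr = 1e-4                        # learning rate used in the paper

for epoch in range(3):           # paper: 20 epochs
    for batch in batches:
        out = X[batch] @ W                    # stand-in for the forward pass
        grad = X[batch].T @ (out - t[batch])  # accumulated error gradient
        W -= lr * grad / batch_size           # divide by the mini-batch size
print(len(batches))  # 10
```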
Next, the structure of the CNN will be explained. First, as the input value, an image clipped to the range shown in the LFW data set, normalized to a size of 100 x 100, and converted to grayscale is used. Next, the convolution layers and the pooling layers are alternately arranged, and there are three of each. The filter size of each convolution layer is 9 x 9, and the stride of the convolution operation is 1 pixel. The activation function uses Maxout over two adjacent feature maps. The numbers of filters are 16, 8 x 32, and 16 x 64. Each pooling layer performs max-pooling with filter size 2 x 2.
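The Maxout activation over two adjacent feature maps and the 2 x 2 max-pooling can be sketched in numpy as follows; the shapes mirror the second convolution layer of Table 2 (32 convolution outputs of 40 x 40), but this is an illustration, not the authors' code:

```python
import numpy as np

def maxout_pairs(maps):
    """Maxout over two adjacent feature maps: (2k, H, W) -> (k, H, W).
    E.g. 32 convolution outputs become 16 feature maps."""
    return np.maximum(maps[0::2], maps[1::2])

def max_pool_2x2(maps):
    """2 x 2 max-pooling on each feature map: (k, H, W) -> (k, H/2, W/2)."""
    k, h, w = maps.shape
    return maps.reshape(k, h // 2, 2, w // 2, 2).max(axis=(2, 4))

maps = np.random.default_rng(2).random((32, 40, 40))
pooled = max_pool_2x2(maxout_pairs(maps))
print(pooled.shape)  # (16, 20, 20)
```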
Finally, the fully connected layer and the output layer are each composed of one layer, and the input value is the feature map flattened to one dimension after the last pooling layer. In this paper, the number of input values is 1152, and the number of output values is 20. The outputs are the pairs of coordinate values x and y on which the annotation is performed, so their number is twice the number of feature points. The output value is between 0 and 1, and the activation function uses a linear combination. Since the output is between 0 and 1, the teacher data is divided by 1000, and at detection time the coordinates are obtained by multiplying the output value by 1000.
Also, to prevent overlearning, Dropout is applied in all the fully connected layers. The probability of Dropout is 50%, and it is independent for each image in the subset. The structure of the CNN is shown in Table 2.
Table 2. Details of CNN

                        Size of images   Filter, number of weights   Activation function
Input Layer             100 x 100
Convolutional layer 1   92 x 92          16                          Maxout
Pooling Layer 1         46 x 46
Zero padding            48 x 48
Convolutional layer 2   40 x 40          8 x 32                      Maxout
Pooling Layer 2         20 x 20
Convolutional layer 3   12 x 12          16 x 64                     Maxout
Pooling Layer 3         6 x 6
Full connected Layer    1152
Output Layer                                                         Linear Combination
As a result, the execution time was about 61 hours. However, with only 20 epochs, the error converged only to about 0.012. Detection accuracy using test data was 96%. Since the learning was carried out for only 20 epochs this time, the accuracy deteriorated considerably and the convergence of the error was quite slow, but the error gradually became smaller, so if 300 thousand updates had been done, it is expected that the error could converge to almost 0.
In the neural network devised in this research, the first goal is to make the convergence of errors faster and better than this result. The second goal is to implement one that, beyond 20 epochs, continues to converge to an even smaller error.

Figure 2. A result of facial feature detection by CNN (Photographic images are obtained from [17])
3.2. Outline of proposed neural network

We propose a neural network which learns images with a DNN or CNN after applying the Laplacian filter. The Laplacian filter is a kind of filter processing used in image processing, and it makes it possible to extract from the image only the portions where the difference in luminance value is drastic. By utilizing this property, the amount of information in the input image is reduced, and the structure of the neural network is simplified. There are two reasons why we decided to use the Laplacian filter: (1) by using the Laplacian filter, it is not necessary to consider differences in skin color due to race, and (2) human eyes can discriminate facial feature points such as the corners of the eyes and the corners of the mouth from the outline alone, so we assumed that machine learning can also discriminate in the same way as humans.
In order to use images processed by the Laplacian filter for learning of the neural network, all input images were processed with the Laplacian filter. The coefficients of the filter are as follows:

    Filter = | 1   1   1 |
             | 1  -8   1 |
             | 1   1   1 |
Instead of processing the input image resized to 100 x 100 as it is, the input image is changed to a size of 102 x 102 with zero padding, and then the filtering is performed.
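A sketch of this preprocessing step: zero-pad the 100 x 100 image to 102 x 102, then convolve with the 3 x 3 eight-neighbour Laplacian kernel so that the output again covers the original 100 x 100 region. The sign convention of the kernel (center -8) is the standard one and is an assumption here:

```python
import numpy as np
from scipy import ndimage

# 8-neighbour Laplacian kernel (the center sign convention is an assumption)
LAPLACIAN = np.array([[1, 1, 1],
                      [1, -8, 1],
                      [1, 1, 1]], dtype=float)

def laplacian_preprocess(img):
    """Zero-pad a 100x100 image to 102x102, apply the 3x3 Laplacian
    filter, and keep the region corresponding to the original image."""
    padded = np.pad(img, 1, mode="constant", constant_values=0)  # 102 x 102
    out = ndimage.convolve(padded, LAPLACIAN, mode="constant", cval=0)
    return out[1:-1, 1:-1]  # back to 100 x 100

img = np.random.default_rng(3).random((100, 100))
print(laplacian_preprocess(img).shape)  # (100, 100)
```

On a region of constant luminance the response is 0, which is how the filter keeps only the portions where the luminance changes drastically.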
Figure 3 shows input images after applying the Laplacian filter.
Figure 3. Input images after applying Laplacian filter (Photographic images are obtained from [17])
Learning was done with neural networks using these images. As the input values, the input image is reduced by two pooling layers as preprocessing and flattened to one dimension. The structure of the neural network is shown in Figure 4.
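This preprocessing (two pooling reductions, 100 x 100 -> 50 x 50 -> 25 x 25, then flattening to the 625-unit input of Table 3) can be sketched as:

```python
import numpy as np

def pool_2x2(img):
    """One 2x2 max-pooling pass, halving each image dimension."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.random.default_rng(4).random((100, 100))
reduced = pool_2x2(pool_2x2(img))   # 100x100 -> 50x50 -> 25x25
flat = reduced.ravel()              # one-dimensional input vector
print(reduced.shape, flat.shape)    # (25, 25) (625,)
```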
Figure 4. Framework of beta type neural network
4. RESULTS AND DISCUSSION

We implemented and evaluated the beta type neural network with various configurations.
4.1. Beta type 5-layer DNN

We implemented and verified a 3-layer DNN, a 4-layer DNN, and a 5-layer DNN. Table 3 shows the beta type 5-layer DNN, which finally converged best. Learning was conducted with three intermediate layers in order to further improve the discrimination power. Also, although both Maxout and Sigmoid had been used as activation functions up to the last time, Maxout gives better convergence of the error even when layers are added, so we fixed it to Maxout. The composition of each layer is summarized in Table 3. The initial values are 1e-3 to -1e-3, and the learning rate is 1e-2.
Table 3. Framework of beta type 5-layer DNN

                       Size of images, number of units   Activation function   Probability of Dropout
Pooling Layer 1        100 x 100
Pooling Layer 2        50 x 50
Input Layer            625
Intermediate layer 1   600                               Maxout                50%
Intermediate layer 2   500                               Maxout                25%
Intermediate layer 3   400                               Maxout                10%
Output Layer           20                                Linear Combination
The execution time was about 2 hours when the number of units was the minimum and about 4 hours at the maximum. The error did not decrease below nearly 0.01. Sometimes the error was as large as 0.02. As a result of increasing the number of intermediate layers, the error convergence improved considerably. However, since the error increases with some input images, we found that the discrimination power is still weak. Therefore, we added convolution layers and considered learning with a CNN. Then we conducted another experiment to detect with a CNN. As the input value, the input image resized to 50 x 50 is used. Then, convolution layers and pooling layers are repeated several times before passing to the fully connected layers. The structure of the CNN is shown in Figure 5.
Figure 5. The structure of beta type CNN
4.2. Beta type CNN with 2-layer convolution layer

This time, the input image is passed directly to the convolution layer without being reduced by pooling layers. The number of outputs of the convolution layers gradually increases. The fully connected layers use the previous DNN. The filter size of the convolution layers is 11 x 11, the initial values are 1e-3 to -1e-3, and the learning rate is 5e-3. The composition of each layer is summarized in Table 4.
Table 4. Framework of beta type CNN with 2-layer convolution layer

                               Number of channels   Height x width   Activation function   Probability of Dropout
Input Layer                                         50 x 50
Convolution Layer 1            50                   40 x 40          Maxout
Pooling Layer 1                25                   20 x 20
Convolution Layer 2            40                   10 x 10          Maxout
Pooling Layer 2                20                   5 x 5
Input of All connected layer   500                                                         50%
Intermediate layer 1           300                                   Maxout                25%
Intermediate layer 2           200                                   Maxout                10%
Output Layer                   20                                    Linear Combination
The execution time was about 6 hours when the number of outputs of the convolution layers and the number of units of the intermediate layers were the minimum, the maximum case was about 16 hours, and the error converged to 0.02. The more convolution layers are added, the better the error convergence. From this result, it was found that the convergence of the error improved considerably when layers were added to the CNN rather than to the DNN. However, it also turned out that the execution time significantly increased. From this, it is expected that the speed of the beta type CNN can be considerably increased.
If the convolution layer has two layers and we further increase the number of outputs of the convolution layers, it will be a faster and more accurate classifier than the CNN implemented. However, in order to raise the accuracy without increasing the execution time any more, the learning was performed by further reducing the feature amount of the input image. The reduction is to set small pixel values of the input image to 0. Even if small pixel values are reduced, the edges of the eyes can still be recognized by human eyes, so learning is carried out with three patterns in which pixels of the input image with values of 10 or less, 20 or less, and 50 or less are set to 0.
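A sketch of this feature reduction: pixels at or below a threshold are set to 0, with the three thresholds tried in the paper:

```python
import numpy as np

def reduce_features(img, threshold):
    """Set small pixel values (threshold or less) to 0 to reduce
    the feature amount of the input image."""
    out = img.copy()
    out[out <= threshold] = 0
    return out

img = np.random.default_rng(5).integers(0, 256, (100, 100)).astype(float)
for th in (10, 20, 50):  # the three patterns compared in the paper
    reduced = reduce_features(img, th)
    print(th, int((reduced == 0).sum()))  # zeroed pixels grow with the threshold
```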
The execution result converges to about 0.02 if pixels of 10 or less are set to 0, and when pixels of 20 or less are set to 0, the error converged to about 0.013. And if we set pixels of 50 or less to 0, the error converged to about 0.03. From this result, very good convergence was obtained by setting pixels of 20 or less to 0. With that method, the error had already converged to about 0.013 by the time 10 epochs were reached, and the error did not become smaller thereafter.
Table 5. Result of learning time and convergence of error for each NN

                        The maximum learning time   The convergence of error
Beta type NN            10m                         Failed
Beta type 3-layer DNN   1h30m                       Failed
Beta type 4-layer DNN   3h                          Failed
Beta type 5-layer DNN   4h                          Almost success
Beta type CNN           22h                         Success
As shown in Table 5, for the beta type NN, learning failed with all patterns. For the beta type DNN, those with 3 layers and 4 layers failed to learn. The learning results with five layers were quite good, but not enough. And the beta type CNN succeeded in learning. Compared with the CNN not preprocessed with the Laplacian filter, the learning time of the beta type CNN with preprocessing was about 2.8 times faster, and the detection accuracy was 1% lower, as shown in Table 6.
Table 6. The accuracy comparison

                Baseline CNN   Beta type CNN
Learning time   61 hours       22 hours
Accuracy        96%            95%
5. CONCLUSION

In this research, facial feature points were detected by a CNN using the programming language Python. Then, we attempted to create a neural network with a faster learning time and higher precision than this CNN. In this research, input images are processed by a Laplacian filter, and then learning is done with a neural network. The devised point is that the image processing is performed first, and the learning time drastically decreased compared with the neural network in which the image processing by the Laplacian filter is not done. For future work, it is conceivable to add improvements such as using a Laplacian filter together with increasing the convolution layers of the CNN.
REFERENCES

[1] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, "Deepface: Closing the gap to human-level performance in face verification," in 2014 IEEE Conference on Computer Vision and Pattern Recognition, June 2014, pp. 1701-1708.
[2] D. Triantafyllidou and A. Tefas, "A fast deep convolutional neural network for face detection in big visual data," in Advances in Big Data, P. Angelov, Y. Manolopoulos, L. Iliadis, A. Roy, and M. Vellasco, Eds. Springer International Publishing, 2017, pp. 61-70.
[3] D. Triantafyllidou, P. Nousi, and A. Tefas, "Fast deep convolutional face detection in the wild exploiting hard sample mining," Big Data Research, vol. 11, pp. 65-76, 2018, selected papers from the 2nd INNS Conference on Big Data: Big Data and Neural Networks. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S2214579617300096
[4] S. Ren, K. He, R. B. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," CoRR, vol. abs/1506.01497, 2015. [Online]. Available: http://arxiv.org/abs/1506.01497
[5] Q. Gao, P. Forster, K. R. Mobus, and G. S. Moschytz, "Fingerprint recognition using CNNs: fingerprint preprocessing," in ISCAS 2001. The 2001 IEEE International Symposium on Circuits and Systems (Cat. No.01CH37196), vol. 3, May 2001, pp. 433-436 vol. 2.
[6] T. Yamashita, T. Watasue, Y. Yamauchi, and H. Fujiyoshi, "Facial point detection using convolutional neural network transferred from a heterogeneous task," in 2015 IEEE International Conference on Image Processing (ICIP), Sep. 2015, pp. 2725-2729.
[7] T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," in Computer Vision - ECCV'98, H. Burkhardt and B. Neumann, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998, pp. 484-498.
[8] D. Vukadinovic and M. Pantic, "Fully automatic facial feature point detection using Gabor feature based boosted classifiers," in 2005 IEEE International Conference on Systems, Man and Cybernetics, vol. 2, Oct 2005, pp. 1692-1698 Vol. 2.
[9] M. Kimura, H. Fukui, T. Yamashita, Y. Yamauchi, and H. Fujiyoshi, "Facial point detection based on deep convolutional neural network with optimal minibatch," in Technical Report of IEICE, Technical Committee on Cloud Network Robotics (CNR), vol. 114, no. 455. The Institute of Electronics, Information and Communication Engineers, Feb. 2015, pp. 87-88, (in Japanese). [Online]. Available: https://ci.nii.ac.jp/naid/110010014760/
[10] T. Yamashita, M. Kimura, H. Fukui, Y. Yamauchi, and H. Fujiyoshi, "Optimal mini-batch procedure for facial point detection based on a deep convolutional neural network," in The 21st Symposium on Sensing via Image Information, 2015, (in Japanese).
[11] Y. Minagawa, M. Abe, and Q. Zhao, "Automatic face feature extraction based on neural networks," in SICE Tohoku 284, vol. 284, no. 2, 11 2013, pp. 1-4, (in Japanese).
[12] Z. Zhang, P. Luo, C. C. Loy, and X. Tang, "Learning deep representation for face alignment with auxiliary attributes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 5, pp. 918-930, May 2016.
[13] C. Garcia and M. Delakis, "Convolutional face finder: a neural architecture for fast and robust face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, pp. 1408-1423, Nov 2004.
[14] K. Sudhakar and P. Nithyanandam, "An accurate facial component detection using Gabor filter," Bulletin of Electrical Engineering and Informatics, vol. 6, no. 3, pp. 287-294, September 2017.
[15] Ri Cerd Ng, Kian Ming Lim, Chin Poo Lee, and Siti Fatimah Abdul Razak, "Surveillance system with motion and face detection using histograms of oriented gradients," Indonesian Journal of Electrical Engineering and Computer Science, vol. 14, no. 2, pp. 869-876, May 2019.
[16] Srinivasa Perumal Ramalingam, Nadesh R. K., and SenthilKumar N. C., "Robust face recognition using enhanced local binary pattern," Bulletin of Electrical Engineering and Informatics, vol. 7, no. 1, pp. 96-101, March 2018.
[17] LFW, "Labeled Faces in the Wild (LFW)," http://vis-www.cs.umass.edu/lfw/.