TELKOMNIKA Telecommunication, Computing, Electronics and Control
Vol. 19, No. 4, August 2021, pp. 1357-1368
ISSN: 1693-6930, accredited First Grade by Kemenristekdikti, No: 21/E/KPT/2018
DOI: 10.12928/TELKOMNIKA.v19i4.20292

A machine learning approach for the recognition of melanoma skin cancer on macroscopic images

Jairo Hurtado, Francisco Reales
Pontificia Universidad Javeriana, Electronics Department, School of Engineering, Bogotá, Colombia

Article Info
Article history:
Received Aug 20, 2020
Revised Jan 02, 2021
Accepted Jan 20, 2021
Keywords:
Artificial intelligence
Image processing
Machine learning
Melanoma
Skin cancer

ABSTRACT
In recent years, computer vision systems for the detection of skin cancer have been proposed, especially systems that use machine learning techniques to classify the disease and features based on the ABCD dermatology criterion. This criterion gives information on the status of the skin lesion based on static properties such as geometry, color, and texture, which makes it appropriate for medical diagnosis systems that work through images. This paper proposes a novel skin cancer classification system that works on images taken with a standard camera and studies the impact on the results of smoothed bootstrapping, which was used to augment the original dataset. Eight classifiers with different topologies (KNN, ANN, and SVM) were compared, with and without data augmentation. The classifier with the highest performance, which was also the most balanced one, was the ANN with data augmentation: it achieved an AUC of 87.1%, an improvement over the AUC of 84.3% of the ANN trained with the original dataset.
This is an open access article under the CC BY-SA license.

Corresponding Author:
Jairo Hurtado
Electronics Department
Pontificia Universidad Javeriana. Bogotá, Colombia
Email: jhurtado@javeriana.edu.co

1. INTRODUCTION
High levels of sun exposure, low use of sunscreen, and some environmental factors lead to an increase in the number of skin disorders and diseases, including cancer. There are three main types of skin cancer: basal cell carcinoma, squamous cell carcinoma, and melanoma, with melanoma being the most lethal [1]. Rural populations in the tropics, especially in mountainous areas, are particularly affected by this disease because of their exposure to solar radiation, which results from their lifestyles, skin color, and geographical location, since UV radiation increases by between 10% and 12% for each kilometer of altitude [2]. In addition, the COVID-19 pandemic has limited physical access to health-care providers, which can further delay the treatment of melanoma, with devastating consequences for patients [3]. For this reason, the adoption of computational tools in medicine is on the rise [4].
Melanoma has been an illness of public concern because of its rapid increase of 25.9% between 2006 and 2016 [2], and the World Health Organization predicts that the number of people diagnosed with skin cancer will double in the next two decades [5]. This shows the usefulness of an algorithm that identifies malignant lesion patterns and suggests that the person see a specialist immediately, because the chance of survival is about 95% if the disease is diagnosed early [6]. Moreover, automatic diagnosis has been shown to outperform dermatologists when distinguishing malignant from benign lesions or when recognizing a particular type of lesion [7]. Marco Albrecht et al. studied different computer methods for the diagnosis and modeling of melanoma, showing how helpful melanoma pattern recognition systems are for starting treatment early.

Journal homepage: http://journal.uad.ac.id/index.php/TELKOMNIKA

In recent years, systems for the automated diagnosis of skin cancer through images have been proposed [1], [7]-[17]. They differ mainly in the type of image used as input and in the architecture of the system. Three types of images are used for this purpose: macroscopic images, which are photographs of lesions taken with standard cameras; dermatoscopic images, which are taken with a device called a dermatoscope that magnifies the skin lesion and makes malignant patterns more visible to the dermatologist [5]; and, least used, histopathological images, which are obtained through microscopic examination of a biopsy [18]. While a system that works on macroscopic images may be more useful for the general public, the number of images in datasets of macroscopic lesions is very limited. The opposite happens with dermatoscopic images, for which there are many publicly available datasets with thousands of images. With regard to the architecture, systems use either traditional machine learning with hand-crafted features or deep learning, where the features are calculated automatically.
This paper proposes an algorithm for detecting malignant patterns in a skin mole using traditional machine learning and hand-crafted features. First, a pre-processing step reduces the shadows produced in the image by the curvature of some parts of the body. Second, the skin mole is segmented using an unsupervised learning algorithm, the Gaussian mixture model. After that, 70 features based on a dermatological criterion used to diagnose melanoma skin cancer are calculated and, finally, a classification is performed.
The main contributions of this paper are:
(i) The implementation of a novel malignant pattern recognition system that works on macroscopic images of skin lesions.
(ii) A comparison of the performance of the Gaussian mixture model when segmenting different types of skin lesions.
(iii) A study of the impact of the smoothed bootstrap data augmentation method on the performance of classifiers with different topologies.
(iv) A comparison of various state-of-the-art systems with different architectures and types of input images.

2. RESEARCH METHOD
To detect malignant patterns in skin lesions, the system is based on a medical criterion called the ABCD rule. It is one of the most widely used methods, and its acronym refers to the four parameters used in clinical dermatological diagnosis: asymmetry, border, color, and structural differences [6].
Asymmetry (A): It is generated by the uncontrolled growth of the lesion caused by higher levels of melanin in different regions, so the lesion tends to have an irregular shape.
Borders (B): Melanocytic lesions have irregular borders. In contrast, benign lesions tend to have borders that fade smoothly and are symmetric.
Color (C): It is related to the excess melanin under the surface of the lesion, which causes a different pigmentation in a specific region.
Dermoscopic structures (D): It refers to the appearance of holes, points, cells, and inhomogeneity (texture) that indicate more melanin in a given region.
The ABCD rule has been tested in multiple studies, which have documented its diagnostic accuracy in clinical practice, and it has also been confirmed with digital image analysis [19]. However, it is a medical criterion that can only be applied to pigmented lesions, which are lesions that look like spots. The ABCD rule therefore cannot be applied to basal cell carcinoma or squamous cell carcinoma [1]. For this reason, the system uses the ABCD rule to distinguish only between benign lesions and melanoma.
The implementation used the Dermatology Education atlas [20], which contains 173 images of macroscopic skin lesions of two types, melanoma (84) and benign (89), with sizes from 154 by 186 to 1129 by 1241. This dataset was used to train the system. It is clearly not large enough to ensure statistical significance; however, since datasets of macroscopic images tend to be small, previous works have dealt with this situation using methods such as data augmentation [8]-[11].
Figure 1 shows the block diagram of the proposed system. The entire system was implemented in Python, using the OpenCV library for the pre-processing step and for the feature extraction, and the Scikit-learn and TensorFlow 2 libraries for the classification. Each block of the diagram in Figure 1 is explained below.

2.1. Pre-processing
This block attenuates the shadows caused by the curvature of some body parts, which can affect the system performance. Some shadows resemble the color of the skin mole, so the system cannot distinguish between shadow and mole, as shown in Figure 2. To correct this problem, another image is created from a regression on the values near the corners of the original image.
Figure 1. Block diagram of the system
Figure 2. Inaccurate segmentation caused by shadows
This shadow attenuation method relies on two assumptions: the shadows change smoothly, and the mole is located in the center of the image. The first assumption ensures that a second-degree polynomial adequately fits the shadows of the image. The second assumption ensures that samples can be taken at the corners of the image without touching the mole; these samples are subsequently used in the regression.
The data for the regression are taken from the V (value) channel of the HSV color space, and the second-degree polynomial is given in (1):
\[ z(x, y) = p_1 x^2 + p_2 y^2 + p_3 xy + p_4 x + p_5 y + p_6 \tag{1} \]
The six constants that minimize the squared error in (2) have to be found, where $(i, j)$ are the indexes of all samples in the corner set $S$ shown in Figure 3, $V$ is the value channel of the HSV space, and $z$ is the quadratic function to be found. The result is shown in Figure 4.
\[ \varepsilon = \sum_{j \in S} \sum_{i \in S} \left[ V(j, i) - z(x_j, y_i) \right]^2 \tag{2} \]
Figure 3. Samples taken from the corners and used in the regression
Figure 4. These figures are; (a) value component of the HSV space; (b) image after regression
To attenuate the shadows, the value component of the HSV space is divided by the fitted quadratic polynomial and multiplied by the ratio of the average of the $V$ values to the average of the values of $V/z$, as shown in (3). Finally, the V channel of the original image in the HSV space is replaced by the new one, and the image is converted back to the RGB color space, as shown in Figure 5.
\[ V_{new}(x, y) = \frac{\overline{V}}{\overline{V/z}} \cdot \frac{V(x, y)}{z(x, y)} \tag{3} \]
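
The regression and correction in (1) to (3) can be sketched with NumPy as below. This is only an illustration under stated assumptions, not the implementation used in the paper: the function name `attenuate_shadows`, the size of the corner patches (`corner_fraction`), and the use of `numpy.linalg.lstsq` are choices made for the example.

```python
import numpy as np


def attenuate_shadows(v, corner_fraction=0.2):
    """Return V_new of equation (3) for a 2-D value channel `v` (floats)."""
    h, w = v.shape
    ch = max(1, int(h * corner_fraction))
    cw = max(1, int(w * corner_fraction))

    # Sample set S: four corner patches, assumed to be free of the mole.
    s = np.zeros((h, w), dtype=bool)
    s[:ch, :cw] = True
    s[:ch, -cw:] = True
    s[-ch:, :cw] = True
    s[-ch:, -cw:] = True

    y, x = np.mgrid[0:h, 0:w].astype(float)

    def design(xs, ys):
        # Columns follow equation (1): x^2, y^2, xy, x, y, 1.
        return np.stack([xs**2, ys**2, xs * ys, xs, ys, np.ones_like(xs)], axis=-1)

    # Least-squares estimate of p1..p6, i.e. the minimizer of equation (2).
    p, *_ = np.linalg.lstsq(design(x[s], y[s]), v[s], rcond=None)

    z = np.maximum(design(x, y) @ p, 1e-6)  # guard against division by zero
    ratio = v / z
    # Equation (3): rescale V / z so that the mean brightness of V is kept.
    return v.mean() / ratio.mean() * ratio
```

If the input contains only a smooth quadratic shading and no lesion, the output is a flat image with the same mean brightness, which is the behavior the two assumptions of the method are meant to guarantee.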
Figure 5. These figures are; (a) original image; (b) pre-processed image

2.2. Segmentation
The purpose of this step is to detect the skin mole automatically based on the color distribution of the image. Images of skin lesions contain roughly two clusters, light and dark colors, which correspond to the background and the skin lesion respectively. The Gaussian mixture model (GMM) can therefore adequately describe the color distribution of the image and provide the parameters of each of the two clusters in order to perform a pixel-wise classification. GMM has also been shown to be capable of recognizing skin diseases with satisfactory efficiency [21].
Another reason to choose the Gaussian mixture model is that color-based clustering has been compared with other methods such as graph-cut segmentation and Otsu, and showed the best classification accuracy [2]. In addition, Pedro Pereira et al. [3] compared the performance of 39 segmentation methods across three different large datasets and concluded that Local Binary Patterns Clustering, Wu Quantifier, and Color Based Clustering had the best overall performance.
After the two clusters are found, the one with the largest area is classified as background and the other one as the lesion. The pixels of each cluster are labeled 0 and 1 respectively, which generates the segmentation mask shown in Figure 6 (a). Filling the holes of this mask gives Figure 6 (b), and dilating the result generates Figure 6 (c), which gives shape information. Figure 6 (d) is obtained by multiplying the mask by the original image. The mask is useful for evaluating the asymmetry of the mole, and the segmented image is used to evaluate borders, variation in color, and texture presence.
Figure 6. These figures are; (a) GMM clustering, (b) filling holes, (c) dilated mask, (d) skin mole segmented
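
The two-cluster step can be illustrated with a small expectation-maximization loop. The sketch below is a simplification and not the paper's implementation: it fits a one-dimensional mixture to a single intensity channel (the paper clusters the color distribution), it is written by hand in NumPy rather than with a library GMM, and the function name and iteration count are assumptions.

```python
import numpy as np


def gmm_two_cluster_mask(gray, iters=50):
    """Fit a two-component 1-D Gaussian mixture with EM; return 1 for lesion pixels."""
    x = gray.astype(float).ravel()
    mu = np.array([x.min(), x.max()])      # start from the darkest and brightest values
    var = np.full(2, x.var() + 1e-6)
    weight = np.array([0.5, 0.5])

    for _ in range(iters):
        # E-step: responsibility of each component for each pixel.
        diff = x[:, None] - mu
        pdf = weight * np.exp(-diff**2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = pdf / (pdf.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update mixture weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6

    labels = resp.argmax(axis=1).reshape(gray.shape)
    # The cluster covering the larger area is taken as background (label 0).
    background = np.bincount(labels.ravel(), minlength=2).argmax()
    return (labels != background).astype(np.uint8)
```

The hole filling and dilation that produce Figure 6 (b) and (c) are morphological operations applied to this mask and are not shown here.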
The accuracy of the proposed segmentation method was measured on different types of skin lesions that would apparently be difficult for the system to recognize: lesions with high color variation (nevus spilus, repigmented nevus, and some melanomas), café-au-lait macules, which tend to have a blurred shape, and lesions containing hair, see Figure 7. For this purpose the border error (BE) shown in (4) is calculated; it has been used in previous works to compare segmentation efficiency [22]. Here SM is the segmentation mask, calculated automatically through GMM, and GT is the ground truth, which was hand-labeled from the dataset. The BE thus measures the percentage of non-overlapping area between the segmentation mask and the ground truth.
\[ BE(SM, GT) = \frac{\mathrm{Area}(SM \oplus GT)}{\mathrm{Area}(GT)} \tag{4} \]
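
A minimal sketch of the border error follows, assuming that the non-overlapping area in (4) is the exclusive-or of the two binary masks and that the result is reported as a percentage. The function name is hypothetical.

```python
import numpy as np


def border_error(sm, gt):
    """Border error in percent between a segmentation mask and the ground truth."""
    sm = np.asarray(sm, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    # Pixels that belong to exactly one of the two masks.
    non_overlap = np.logical_xor(sm, gt).sum()
    return 100.0 * non_overlap / gt.sum()
```

Because the denominator is the ground-truth area only, the measure can exceed 100% when the automatic mask is much larger than the hand-labeled lesion.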
Figure 7. These figures are; (a) café-au-lait macule; (b) lesion with color variation; (c) containing hair; (d) lesion with color uniformity and sharp borders
Table 1 shows the average BE for the four types of lesions in Figure 7; the lower the border error, the more accurate the segmentation. The results suggest that the café-au-lait macule is the type of lesion on which the segmentation system works best. On the other hand, images that contain hair are not correctly segmented because of the similarity in color between the skin lesion and the hair. For this reason, nine samples with hair were removed from the original dataset, so the classifiers were trained with 164 samples: 85 benign lesions and 79 malignant ones.
Table 1. Border error (BE) for different types of lesions
Measure   Café-au-lait macule   Containing hair   With color variation   With color uniformity
BE (%)    31.1                  770.2             67.1                   49.0

2.3. Features based on the ABCD criterion
The skin lesion is characterized using the ABCD criterion, as it gives information on the state of pigmented skin lesions using static parameters. The process to obtain these features is described below.

2.3.1. Features based on asymmetry
To quantify the asymmetry of the mole, the mask in Figure 6 (c) is compared with other geometric figures, shown in Figure 8. The first is the ellipse, in purple (proposed in this paper): a mole whose contour or shape resembles an ellipse is less likely to be malignant. A comparison is also made with the bounding box, in red, which gives the dimensions of the lesion, and with the convex hull, in dotted blue. In addition, the areas of the quadrants of the mole must be the same if the mole is completely symmetrical.
Figure 8. Geometric parameters of the skin lesion
The parameters used are: $b_p$ and $a_p$, the minor and major axes; $A_p$, $A_c$, $A_b$, and $A_e$, the areas of the lesion, convex hull, bounding box, and ellipse, respectively; and $P_p$, $P_c$, $P_b$, and $P_e$, the perimeters of the lesion, convex hull, bounding box, and ellipse, respectively. $A_1$ and $A_2$ represent the areas on each side of the axis $a_p$, and similarly $B_1$ and $B_2$ for the axis $b_p$. The asymmetry features are presented in Table 2. So that the features do not depend on the size and resolution of the image, those with units of area were divided by the area of the bounding box $A_b$, and those with units of length were divided by its perimeter $P_b$.
Table 2. Features of asymmetry
Feature                  Expression
Lesion area              $A_p$
Solidity                 $A_p / A_c$
Equivalent diameter      $\sqrt{4 A_p / \pi}$
Convex hull area         $A_c$
Circularity              $4 \pi A_p / P_p^2$
Lesion perimeter         $P_p$
Aspect ratio             $b_p / a_p$
Convex hull perimeter    $P_c$
Aspect ratio             $b_b / a_b$
Elliptic area rate       $A_p / A_e$
Elliptic rate            $P_p / P_e$
Rate of areas $b_p$      $(B_1 - B_2) / A_p$
Rate of areas $a_p$      $(A_1 - A_2) / A_p$
Rate form $b_p$          $B_1 / B_2$
Rate form $a_p$          $A_1 / A_2$
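
As an illustration of how some of the entries of Table 2 behave, the sketch below evaluates four of them from already measured areas, perimeters, and axes. It assumes the standard definitions of equivalent diameter and circularity (with the factor $\pi$); the function and argument names are hypothetical, and the measurement of the inputs from the mask is not shown.

```python
import math


def asymmetry_features(area, perimeter, hull_area, minor_axis, major_axis):
    """Compute a few Table 2 shape features from pre-measured quantities."""
    return {
        "solidity": area / hull_area,                              # A_p / A_c
        "equivalent_diameter": math.sqrt(4.0 * area / math.pi),    # sqrt(4 A_p / pi)
        "circularity": 4.0 * math.pi * area / perimeter**2,        # 4 pi A_p / P_p^2
        "aspect_ratio": minor_axis / major_axis,                   # b_p / a_p
    }
```

For a perfect circle all four values are at their symmetric extreme (solidity, circularity, and aspect ratio equal to 1), while irregular contours push solidity and circularity below 1.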

2.3.2. Texture variation, darkness and color information image
For the features based on borders, color uniformity, and dermoscopic structures, a new image $I_N$ with three channels is created from the original image [4]. The first channel provides information on the texture variation, the second on the skin darkness, and the third on the color variation: $I_N^i$ $(i = 1, 2, 3)$.
To calculate the texture variation channel $I_N^1$, the brightness image $L$ is obtained as shown in (5), where the three channels of the original image $I_C$ are averaged.
\[ L(x, y) = \frac{\sum_{i=1}^{3} I_C^i(x, y)}{3} \tag{5} \]
The texture $\tau(x, y, \sigma)$ is defined as shown in (6), where $S(x, y, \sigma) = L(x, y) * G(\sigma)$ is the brightness smoothed by a Gaussian filter with standard deviation $\sigma$.
\[ \tau(x, y, \sigma) = \frac{L(x, y) - S(x, y, \sigma)}{L(x, y)} \tag{6} \]
The texture image $\tau(x, y, \sigma)$ is calculated for different values of $\sigma = (\sigma_1, \ldots, \sigma_N)$, and for each pixel the highest texture among all scales is selected, as shown in (7). For this paper, the standard deviation was chosen as $\sigma = 1, 11/7, \ldots, 43/7$ with a window of 7 by 7. The values of this parameter were suggested by Cavalcanti et al. [10] due to the average size of the images in the dataset [5].
\[ \tau(x, y) = \max_{\sigma} \left[ \tau(x, y, \sigma) \right] \tag{7} \]
The texture variation channel is obtained by normalizing $\tau(x, y)$ as shown in (8): the minimum among all values is subtracted from the texture image, and the result is divided by the difference between the maximum and the minimum, so that all data lie in the interval [0, 1]. The original image is shown in Figure 9 (a) and the result, $I_N^1$, in Figure 9 (b).
\[ I_N^1 = \frac{\tau(x, y) - \tau_{min}}{\tau_{max} - \tau_{min}} \tag{8} \]
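
Equations (5) to (8) can be sketched as follows. This is an illustration under assumptions rather than the paper's code: the Gaussian smoothing is written as a separable 7-tap filter in NumPy, the scales are taken as $\sigma = 7/7, 11/7, \ldots, 43/7$ (one reading of the list in the text), the texture is taken as the relative difference between brightness and smoothed brightness, and all function names are hypothetical.

```python
import numpy as np


def gaussian_smooth(img, sigma, size=7):
    """Separable Gaussian filter with a size-by-size window and edge padding."""
    r = np.arange(size) - size // 2
    k = np.exp(-r**2 / (2.0 * sigma**2))
    k /= k.sum()
    padded = np.pad(img, size // 2, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def texture_channel(rgb, sigmas=tuple((7 + 4 * k) / 7 for k in range(10))):
    """Texture variation channel for an RGB image with values in (0, 1]."""
    lum = np.maximum(rgb.mean(axis=2), 1e-6)                  # equation (5)
    per_scale = [(lum - gaussian_smooth(lum, s)) / lum for s in sigmas]  # equation (6)
    tau = np.max(per_scale, axis=0)                           # equation (7)
    span = tau.max() - tau.min()
    if span == 0:
        return np.zeros_like(tau)
    return (tau - tau.min()) / span                           # equation (8)
```

The output has the same height and width as the input and spans exactly the interval [0, 1] whenever the image is not perfectly flat.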
For the darkness information image $I_N^2$, healthy skin tends to be reddish, so a pixel whose red channel in the original image is bright belongs to the background, and a pixel whose red channel is darker belongs to the lesion. The darkness is therefore given by the complement of the red channel of the original image, as shown in (9). The result, $I_N^2$, is shown in Figure 9 (c).
\[ I_N^2 = 1 - I_C^1 \tag{9} \]
In the color information channel $I_N^3$, the three color channels of the original image $I_C$ are represented in a single channel using principal component analysis (PCA), and the absolute value is taken, creating the image $C(x, y)$, which is then normalized as shown in (10). The result, $I_N^3$, is shown in Figure 9 (d).
\[ I_N^3 = \frac{C(x, y) - C_{min}}{C_{max} - C_{min}} \tag{10} \]
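
The darkness and color information channels in (9) and (10) admit a short NumPy sketch. It assumes an RGB image scaled to [0, 1] with the red channel first, and it interprets the PCA step as projecting the mean-centered pixels onto the first principal direction obtained from a singular value decomposition; both points, and the function names, are assumptions made for the example.

```python
import numpy as np


def darkness_channel(rgb):
    """Equation (9): complement of the red channel (assumed to be channel 0)."""
    return 1.0 - rgb[..., 0]


def color_channel(rgb):
    """Equation (10): normalized absolute projection on the first principal component."""
    pixels = rgb.reshape(-1, 3).astype(float)
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    c = np.abs(centered @ vt[0]).reshape(rgb.shape[:2])
    span = c.max() - c.min()
    if span == 0:
        return np.zeros_like(c)
    return (c - c.min()) / span
```

Taking the absolute value removes the arbitrary sign of the principal direction, so the result does not depend on which way the decomposition orients it.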
Figure 9. These figures are; (a) original image, (b) texture variation, (c) darkness information, (d) color information

2.3.3. Features based on borders
To quantify the variation of the intensity at the borders of the lesion, the magnitude of the gradient vector is calculated, as it measures whether the border is sharp or soft. First, the border is obtained by subtracting the contracted mask from the dilated mask; the border must be thick enough to contain the change from the lesion to the background. Then the gradient is calculated for the values on the border $B$. The features containing information on the intensity variation of the image at the borders are calculated from the mean and variance of the gradient magnitude, as shown in (11) and (12).
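\[ \mu = \frac{1}{N} \sum_{(x, y) \in B} \left| \nabla I(x, y) \right| \tag{11} \]
\[ \sigma^2 = \frac{1}{N} \sum_{(x, y) \in B} \left( \left| \nabla I(x, y) \right| - \mu \right)^2 \tag{12} \]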
Here $\nabla I(x, y)$ is the gradient vector of the scalar field $I(x, y)$. The magnitude of the gradient vector depends on the skin color in the original image: it would be smaller if the skin color was darker. For this reason, the previously found channels $I_N^1$, $I_N^2$, and $I_N^3$ are used, since they depend less on this parameter. The gradient of the image is approximated with the Sobel operator, see Figure 10. The color information channel is also divided into eight pieces whose principal axes are oriented in the direction of the mole, which is ensured by using the eigenvectors. The mean gradient is obtained in each fragment that belongs to the border of the lesion, and the average and variance of these means give two new features for each channel. This idea is illustrated in Figure 11. The features based on the borders of the skin mole are shown in Table 3.
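
The mean and variance in (11) and (12) can be sketched as below for one channel and a given border mask. The 3-by-3 Sobel kernels are applied with array slicing instead of a library call, and the construction of the border band and of the eight oriented fragments is left out; the function names are hypothetical.

```python
import numpy as np


def sobel_magnitude(img):
    """Gradient magnitude from 3x3 Sobel kernels, with edge padding."""
    p = np.pad(img.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2]
    )
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (
        p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:]
    )
    return np.hypot(gx, gy)


def border_gradient_features(channel, border_mask):
    """Mean (11) and variance (12) of the gradient magnitude over the border B."""
    mag = sobel_magnitude(channel)[np.asarray(border_mask, dtype=bool)]
    mu = mag.mean()
    var = ((mag - mu) ** 2).mean()
    return mu, var
```

Calling `border_gradient_features` once per channel $I_N^1$, $I_N^2$, $I_N^3$ would give the first six features of Table 3.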
Figure 10. Borders and Sobel operator for $I_N^1$, $I_N^2$, and $I_N^3$
Figure 11. Pieces of the color information channel
Table 3. Features based on borders
Name                                                                 Notation                       Quantity
Average gradient for each channel                                    $\mu(|\nabla I_N^i|)$          3
Gradient variance for each channel                                   $\sigma^2(|\nabla I_N^i|)$     3
Average of the mean gradient for the eight fragments per channel     $avg(\mu(|\nabla I_N^i|))$     3
Variance of the mean gradient for the eight fragments per channel    $var(\mu(|\nabla I_N^i|))$     3

2.3.4. Features based on color uniformity
Color features are obtained from the image of the segmented lesion, shown in Figure 6 (d), and from the color information channel, shown in Figure 9 (d). To remove color noise, both images are smoothed using a Gaussian filter. Table 4 lists the tones of interest, which are the most common colors found in different types of lesions and allow the system to recognize particular color patterns. Six counters (C) are proposed, one per tone: for each pixel, the Euclidean distance between its color and every tone in Table 4 is computed, and the counter of the closest tone is incremented. This paper also proposes adding the mean and variance of a new image obtained by calculating, for each pixel, the Euclidean distance between the color of the original image and the tones of interest. These new parameters indicate how far the colors of the original image are from the tones of interest on average and how much that distance varies.
Table 4. Tones of interest in skin lesions. Alcon et al. [23]
Colour             Counter (C)   Red channel   Green channel   Blue channel
White (W)          $c_W$         1             1               1
Red (R)            $c_R$         0.8           0.2             0.2
Light brown (LB)   $c_{LB}$      0.6           0.4             0
Dark brown (DB)    $c_{DB}$      0.2           0               0
Blue-gray (BG)     $c_{BG}$      0.2           0.6             0.6
Black (BL)         $c_{BL}$      0             0               0
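
A minimal sketch of the tone counters follows, using the reference values of Table 4. It assumes lesion pixels given as RGB triples in [0, 1]; the dictionary keys, the function name, and the choice to return the per-tone mean and variance of the distances alongside the counters are illustrative.

```python
import numpy as np

# Reference tones from Table 4, as (red, green, blue) in [0, 1].
TONES = {
    "W": (1.0, 1.0, 1.0),
    "R": (0.8, 0.2, 0.2),
    "LB": (0.6, 0.4, 0.0),
    "DB": (0.2, 0.0, 0.0),
    "BG": (0.2, 0.6, 0.6),
    "BL": (0.0, 0.0, 0.0),
}


def tone_features(pixels):
    """Count the nearest tone per pixel and summarize the distance to each tone."""
    names = list(TONES)
    ref = np.array([TONES[n] for n in names])
    px = np.asarray(pixels, dtype=float).reshape(-1, 3)
    # Euclidean distance from every pixel to every tone: shape (n_pixels, 6).
    dist = np.linalg.norm(px[:, None, :] - ref[None, :, :], axis=2)
    counts = np.bincount(dist.argmin(axis=1), minlength=len(names))
    counters = dict(zip(names, counts.tolist()))
    mean_dist = dict(zip(names, dist.mean(axis=0).tolist()))
    var_dist = dict(zip(names, dist.var(axis=0).tolist()))
    return counters, mean_dist, var_dist
```

The counters describe which tones dominate the lesion, while the distance statistics describe how closely its colors match each tone.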
To capture the non-uniformity of the color distribution of the mole, features that depend on the location of a given color in the image are proposed. For this, the color information channel $I_N^3$ is divided into eight pieces, shown in Figure 11, whose main axes are oriented in the direction of the mole (this is ensured by using the eigenvectors). The mean intensity is computed in each fragment belonging to the lesion, and the average and variance of these means are calculated. Table 5 shows the features that give information on the color distribution and uniformity of the skin mole. R, G, and B are the color channels of the original image restricted to the pixels that are part of the lesion; likewise, only the values of $I_N^3$ belonging to the lesion are taken. The mean and variance of the data are represented with the symbols $\mu$ and $\sigma^2$ respectively.
Table 5. Color distribution features
$\max(R)$              $\max(G)$              $\max(B)$                 $\min(R)$                 $\min(G)$                 $\min(B)$
$\mu(R)$               $\mu(G)$               $\mu(B)$                  $\sigma^2(R)$             $\sigma^2(G)$             $\sigma^2(B)$
$\mu(|I - c_W|)$       $\mu(|I - c_R|)$       $\mu(|I - c_{LB}|)$       $\mu(|I - c_{DB}|)$       $\mu(|I - c_{BG}|)$       $\mu(|I - c_{BL}|)$
$\sigma^2(|I - c_W|)$  $\sigma^2(|I - c_R|)$  $\sigma^2(|I - c_{LB}|)$  $\sigma^2(|I - c_{DB}|)$  $\sigma^2(|I - c_{BG}|)$  $\sigma^2(|I - c_{BL}|)$
$\max(I_N^3)$          $\mu(I_N^3)$           $\sigma^2(I_N^3)$         $\mu(R) / \mu(G)$         $\mu(R) / \mu(B)$         $\mu(G) / \mu(B)$
$c_{LB}$               $c_R$                  $c_{DB}$                  $c_{BG}$                  $c_{BL}$                  $c_W$
$\mu(I_N^3; \mu; 8)$   $\sigma^2(I_N^3; \mu; 8)$

2.3.5. Features based on dermoscopic structures
Although dermoscopic structures can only be measured using a dermatoscope, a device that gives dermatologists a closer view of the skin lesion, differences between benign and malignant skin moles can be measured in macroscopic images using features based on the texture of the skin mole [4]. For this reason, the texture channel $I_N^1$, Figure 9 (b), which gives information on the rugosity of the mole (holes, points, and inhomogeneity), is used to obtain four more features: the maximum, minimum, mean, and variance of the texture variation channel calculated inside the lesion. These features are shown in Table 6.
Table 6. Texture features
Feature          Description
$\min(I_N^1)$    Minimum texture
$\max(I_N^1)$    Maximum texture
$mean(I_N^1)$    Average texture
$var(I_N^1)$     Variance of the texture channel

2.4. Data Augmentation
In the processes above, 70 features based on the ABCD medical criterion were calculated. With these, the system has enough information about the status of the skin mole to perform a classification with the labels cancer and not cancer. However, as the dataset used for training and testing has only 164 samples, statistical significance is not supported. For this reason, smoothed bootstrap data cloning is used. This (re)sampling technique is based on the idea that new data samples can be added if they are distributed according to the same probability density as the real dataset, resulting in greater statistical significance [24].
To clone data through the smoothed bootstrapping method, a Gaussian distribution is a reasonable assumption [10]. A Gaussian error with zero mean and a standard deviation ten times smaller than the deviation of each feature is added to each sample. Each new sample is given by (13), where $\vec{Y}_{k,n}, \vec{X}_n, \vec{G}_k \in \mathbb{R}^{70}$, $\vec{G}_k$ is a random vector with normal distribution, and $\vec{Y}_{k,n}$ is the new sample obtained by adding Gaussian noise to the current sample $\vec{X}_n$.
\[ \vec{Y}_{k,n} = \vec{X}_n + \vec{G}_k \tag{13} \]
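
Equation (13) can be sketched in a few lines of NumPy. The function name, the `copies` and `seed` parameters, and the decision to return the original rows followed by the clones are assumptions of this example; the noise scale of one tenth of each feature's standard deviation follows the text.

```python
import numpy as np


def smoothed_bootstrap(x, copies=1, scale=0.1, seed=0):
    """Return `x` stacked with `copies` noisy clones, one clone set per copy.

    x: array of shape (n_samples, n_features).
    Each clone is X_n + G_k with G_k ~ N(0, (scale * std_feature)^2), as in (13).
    """
    rng = np.random.default_rng(seed)
    std = x.std(axis=0)
    clones = [x + rng.standard_normal(x.shape) * (scale * std) for _ in range(copies)]
    return np.vstack([x] + clones)
```

With `copies=1` the number of samples doubles; the matching labels would be repeated in the same order.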
This data augmentation technique was used previously by Cavalcanti et al. [4]. However, in their work the entire dataset was augmented five times, so that for every sample in the test partition there was always a similar sample in the training partition. This would make k-nearest neighbors the classifier with the best accuracy, but those results would be biased by the extended dataset. To avoid this situation, only the training partition is augmented two times, and the system is tested on the remaining samples. The results of this method for different classifiers, as well as the results without data augmentation, are also studied.
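
The point about augmenting only the training partition can be made concrete with a fold generator such as the one below. It is a sketch under assumptions: the fold construction, the function name, and the reading of "augmented two times" as one noisy clone per training sample (`copies=1`, which doubles the training set) are choices of the example, not details given in the paper.

```python
import numpy as np


def train_only_augmented_folds(x, y, k=10, copies=1, scale=0.1, seed=0):
    """Yield (x_train_aug, y_train_aug, x_test, y_test) for k-fold cross-validation.

    Clones are generated from the training rows only, so no test sample has a
    noisy copy of itself in the training partition.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(x))
    for test_idx in np.array_split(order, k):
        train_idx = np.setdiff1d(order, test_idx)
        x_tr, y_tr = x[train_idx], y[train_idx]
        std = x_tr.std(axis=0)
        clones = [x_tr + rng.standard_normal(x_tr.shape) * scale * std
                  for _ in range(copies)]
        x_aug = np.vstack([x_tr] + clones)
        y_aug = np.concatenate([y_tr] * (copies + 1))
        yield x_aug, y_aug, x[test_idx], y[test_idx]
```

Augmenting before the split would place near-duplicates of test samples in the training set, which is the bias described above for nearest-neighbor classifiers.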

2.5. Classification system
In this research area, three classifiers are mainly used: k-nearest neighbors (KNN), artificial neural network (ANN), and support vector machine (SVM). These classifiers have been used in previous works, showing dermatologist-level accuracy [2], [4], [6], [7]. For this reason, the performance of these three classifiers is compared both with and without data augmentation, in order to see how their behavior changes with smoothed bootstrapping data cloning.

3. RESULTS AND DISCUSSION
The performance metrics of the classification systems are presented in Table 7. These results were generated using 10-fold cross-validation and show the accuracy, which is the overall system performance; the specificity, defined here as the ability to recognize malignant lesions; and the sensitivity, defined here as the ability to recognize benign lesions. Specificity is the most important parameter to consider, because if the system does not suggest visiting a specialist when the image shows a malignant lesion, it can endanger the patient. Therefore, specificity should be as high as possible. A high sensitivity is also desirable; however, a low value does not represent an imminent danger to the patient.
Table 7. Comparison of the performance measures of different classification systems
Classifier                                    Specificity (%)   Sensitivity (%)   Accuracy (%)
SVM with kernel grade 5                       15.9              100               70.6
SVM with kernel grade 5, augmented            15.9              100               70.6
KNN, Euclidean distance, k=2                  37.6              83.1              66.7
KNN, Euclidean distance, k=2, augmented       56.3              76.7              66.7
KNN, Mahalanobis distance, k=2                47.4              90                72.5
KNN, Mahalanobis distance, k=2, augmented     71.7              87                80.4
ANN 200-180-150-100-50-20-1                   80.2              94.4              88.2
ANN 200-180-150-100-50-20-1, augmented        86.9              87.8              86.3
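
The three measures reported in Table 7 can be sketched as follows, using the convention stated in the text: specificity is the share of malignant lesions recognized as malignant, and sensitivity is the share of benign lesions recognized as benign. The label encoding (1 for malignant, 0 for benign) and the function name are assumptions of the example.

```python
def screening_metrics(y_true, y_pred):
    """Return (specificity, sensitivity, accuracy) in percent.

    Labels: 1 = malignant, 0 = benign (assumed encoding).
    """
    pairs = list(zip(y_true, y_pred))
    malignant = [p for t, p in pairs if t == 1]
    benign = [p for t, p in pairs if t == 0]
    specificity = 100.0 * sum(p == 1 for p in malignant) / len(malignant)
    sensitivity = 100.0 * sum(p == 0 for p in benign) / len(benign)
    accuracy = 100.0 * sum(t == p for t, p in pairs) / len(pairs)
    return specificity, sensitivity, accuracy
```

Under this convention a classifier that labels every image as benign reaches 100% sensitivity and 0% specificity, which is why a high accuracy alone does not make a system safe for screening.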
Table 7 shows that the neural network has the best accuracy and that, with the original dataset, the system achieves a high sensitivity level. In contrast, when the dataset is augmented the sensitivity decreases while the specificity increases, making it a more balanced classifier. The limitation of the measures in Table 7 is that they estimate the classifier performance at a single (specificity, sensitivity) point, whereas it is better to compare the entire curve. This is done with the ROC (receiver operating characteristic) curve, which shows the dependence between sensitivity and specificity for each classifier and makes a comparison possible not only for one point but for the entire range of sensitivity and specificity values. To compare the performance of the classifiers using the ROC curve, the AUC is used, which is the area under the ROC curve and measures the ability of the system to recognize benign lesions as benign and malignant lesions as malignant. The closer the AUC is to 100%, the more accurate the system is.
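
The AUC can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one (ties counted as one half). The sketch below uses this pairwise form; the function name and the score lists are illustrative.

```python
def auc_from_scores(scores_pos, scores_neg):
    """Area under the ROC curve from classifier scores of the two classes."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))
```

A value of 1.0 means the two classes are perfectly separated by some threshold, while 0.5 corresponds to scores that carry no information about the class.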
The ROC curves and AUC for the three topologies compared (SVM, KNN, and ANN) are shown in Figure 12, which suggests that the overall performance of the neural network and the SVM increases with data augmentation. Consistent with Table 7, the neural network has the best performance among the topologies compared. Both Table 7 and Figure 12 also show that the KNN tends to work better with the Mahalanobis distance than with the Euclidean distance for this specific task. This is important because most approaches to skin cancer classification that use KNN are based on the Euclidean distance, while they could increase their performance using a Mahalanobis distance-based KNN. On the other hand, Figure 12 makes clear that when the KNN with Mahalanobis distance is trained on augmented data, the overall performance drops, the sensitivity increases, and the specificity decreases.
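
The difference between the two distances can be illustrated with a small nearest-neighbor sketch. It is not the classifier used in the paper: the covariance is estimated from the training set, ties between the k neighbors are resolved in favor of class 1, and the function name and parameters are assumptions of the example.

```python
import numpy as np


def knn_predict(x_train, y_train, x_query, k=2, metric="mahalanobis"):
    """Binary k-nearest-neighbor prediction with Euclidean or Mahalanobis distance."""
    x_train = np.asarray(x_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    y_train = np.asarray(y_train)
    if metric == "mahalanobis":
        # Inverse covariance of the training features rescales and decorrelates them.
        vi = np.linalg.pinv(np.cov(x_train, rowvar=False))
    else:
        vi = np.eye(x_train.shape[1])
    diff = x_query[:, None, :] - x_train[None, :, :]
    d2 = np.einsum("qti,ij,qtj->qt", diff, vi, diff)   # squared distances
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = y_train[nearest].mean(axis=1)
    return (votes >= 0.5).astype(int)                  # ties go to class 1 (assumption)
```

When features have very different scales, the Euclidean distance is dominated by the feature with the largest range, whereas the Mahalanobis distance weighs each feature by its spread, which is one possible reason for the behavior reported above.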
Figure 12. ROC curves for the three topologies: ANN, KNN, and SVM
Over the years, both traditional machine learning and deep learning systems have been proposed. Although a direct comparison of the performance of these systems is not possible, since they are trained on different datasets, comparing the accuracy with previous systems indicates whether the proposed approach is viable. It also makes it possible to recognize patterns across different systems, datasets, architectures, and types of images. A comparison of different state-of-the-art systems is shown in Table 8.
Traditional machine learning systems usually comprise pre-processing, segmentation, feature extraction, and classification. Yuheng et al. [9] proposed in 2019 a system based on an SVM classifier and 143 macroscopic images obtained with polarization speckle imaging, which allowed the system to increase its performance. Another interesting approach was presented by Verosha et al. [8] in 2019. Their system used 170 macroscopic images of skin lesions for training and testing and implemented both KNN with k=5 and SVM classifiers. Traditional machine learning has also been used on histopathological images; for example, Takruri et al. [12] implemented a PSO-SVM hybrid system that optimizes the SVM parameters and then performs a classification.
On the other hand, deep learning systems for the recognition of skin cancer have seen important improvements over the years. Pacheco et al. [1] proposed that the input of a convolutional neural network (CNN) be not only the image of the skin lesion but also clinical information such as age, location of the lesion, and whether it had bled, improving the average accuracy by over 7%. Maron et al. compared 112 dermatologists with a deep learning system, showing that the convolutional neural network was more accurate in both the binary problem (melanoma, benign) and the multiclass problem (type of skin disease). Table 8 shows the results obtained by the 112 dermatologists as a reference for comparing the systems. Adegun et al. [15] proposed an encoder-decoder architecture which gave the system the possibility of extracting