Indonesian
J
our
nal
of
Electrical
Engineering
and
Computer
Science
V
ol.
22,
No.
3,
June
2021,
pp.
1619
1628
ISSN:
2502-4752,
DOI:
10.11591/ijeecs.v22i3.pp1619-1628
r
1619
Unsuper
vised
featur
e
selection
with
least-squar
es
quadratic
mutual
inf
ormation
J
anya
Sainui,
Chouv
anee
Sri
visal
Di
vision
of
Computational
Science,
F
aculty
of
Science,
Prince
of
Songkla
Uni
v
ersity
,
Songkhla,
Thailand
Article
Inf
o
Article
history:
Recei
v
ed
Dec
27,
2020
Re
vised
May
10,
2021
Accepted
May
24,
2021
K
eyw
ords:
Least-squares
quadratic
mutual
information
Mutual
information
Unsupervised
feature
selection
ABSTRA
CT
W
e
propose
the
feature
selection
method
based
on
the
dependenc
y
bet
ween
features
in
an
unsupervised
manner
.
The
underlying
assumption
is
that
the
most
important
feature
should
pro
vide
high
dependenc
y
between
itself
and
the
rest
of
the
features.
Therefore,
the
top
m
features
with
maximum
dependenc
y
scores
should
be
selected,
b
ut
the
redundant
features
should
be
ignored.
T
o
deal
with
this
problem,
the
objecti
v
e
function
that
is
applied
to
e
v
aluate
the
dependenc
y
between
features
plays
a
crucial
role.
Ho
we
v
er
,
pre
vious
methods
mainly
used
the
mutual
information
(MI),
where
the
MI
esti
mator
based
on
the
k
-nearest
neighbor
graph,
resulting
in
its
estimation
de-
pendent
on
the
selection
of
paramet
er
,
k
,
without
a
systematic
w
ay
to
select
it.
This
implies
that
the
MI
estimator
tends
to
be
less
reliable.
Here,
we
introduce
the
least-
squares
quadratic
mutual
information
(LSQMI)
that
is
more
sensible
because
its
tuning
parameters
can
be
selected
by
cross-v
alidation.
W
e
sho
w
through
the
e
xperiments
that
the
use
of
LSQMI
performed
better
than
that
of
MI.
In
addition,
we
compared
the
pro-
posed
method
to
the
three
counterpart
methods
using
six
UCI
benchmark
datasets.
The
results
demonstrated
that
the
proposed
method
is
useful
for
selecting
the
informati
v
e
features
as
well
as
discarding
the
redundant
ones.
This
is
an
open
access
article
under
the
CC
BY
-SA
license
.
Corresponding
A
uthor:
Jan
ya
Sainui
Di
vision
of
Computational
Science,
F
aculty
of
Science
Prince
of
Songkla
Uni
v
ersity
Songkhla,
Thailand
Email:
jan
ya.s@psu.ac.th
1.
INTR
ODUCTION
Feature
selection
aims
to
select
the
most
informati
v
e
subset
of
features
to
capture
the
structure
of
the
original
data.
It
usually
appears
as
a
pre-processing
step
for
v
arious
tasks
such
as
classification
[1],
[2],
clustering
[3]-[5],
data
mining
[6],
resulting
in
better
results.
In
a
supervised
scenario,
the
idea
is
to
select
the
features
t
h
a
t
are
the
most
rele
v
ant
to
the
class
labels
[7],
[8].
Ho
we
v
er
,
in
this
paper
,
we
focus
on
an
unsupervised
feature
selection
that
is
more
dif
ficult
to
achie
v
e
than
the
supervised
fea
ture
selection
due
to
the
absence
of
the
class
labels
[9]-[14].
In
unsupervised
manner
,
feature
selection
is
important
as
it
helps
impro
v
e
the
performance
as
well
as
reduce
the
computational
time
of
clustering
[15]-[19].
Moreo
v
er
,
unsupervised
feature
selections
may
be
used
for
v
arious
purposes
such
as
visualization
and
pre-processing
step
for
supervised
learning
[7],
[15],
[20].
Unsupervised
feature
selection
c
an
be
di
vided
into
tw
o
cate
gories,
namely
wrapper
method
and
filter
method
[10],
[15].
Man
y
e
xisting
methods
of
unsupervised
feature
selection
are
based
on
the
wrapper
method
such
that
the
selected
features
are
dependent
on
the
specific
clustering
algorithm.
The
main
dra
wbacks
of
the
wrapper
method
are
its
computational
comple
xity
and
limited
use
with
a
particular
clustering
J
ournal
homepage:
http://ijeecs.iaescor
e
.com
Evaluation Warning : The document was created with Spire.PDF for Python.
1620
r
ISSN:
2502-4752
algorithm.
In
this
paper
,
we
focus
on
the
filter
method
which
e
v
aluates
the
important
features
without
using
an
y
clustering
algorithm
[21]-[25].
Thus,
the
filter
method
is
more
general
and
useful
than
the
wrapper
method.
W
e
re
vie
w
some
e
xisting
unsupervised
feature
selection
methods
relating
to
our
w
ork
as
follo
ws.
Laplacian
score
(LS)
[10]
is
based
on
Laplacia
n
Eigenmap
and
locality
preserving
projection.
F
or
each
feature,
the
Laplacian
score
is
estimated
using
the
nearest
neighbor
graph.
If
it
is
a
good
feature,
its
LS
tends
to
be
small.
The
LS
is
based
on
the
observ
ation
that,
tw
o
data
points
are
probably
related
to
the
same
cluster
if
the
y
are
close
to
each
other
.
The
dra
wbacks
of
this
method
are
that
there
is
no
systematic
w
ay
to
tune
the
number
of
nearest
neighbor
k
or
Gaussian
width,
and
the
redundant
features
are
not
observ
ed.
multi-cluster
feature
selection
(MCFS)
[11]
selects
the
features
while
pres
erving
the
cluster
structure
of
the
data.
The
authors
proposed
to
use
multi
ple
eigen
v
ectors
of
graph
Laplacian,
which
are
defined
on
the
af
finity
matrix
of
data
points
to
capture
the
multi-cluster
structure
of
data.
The
algorithm
performs
especially
well
when
the
number
of
selected
features
is
small
(e.g.,
<
=
50
).
Ho
we
v
er
,
it
performs
best
when
the
number
of
used
eigen
v
ectors
is
equal
to
the
number
of
clusters,
b
ut
in
an
unsupervised
manner
,
the
number
of
clusters
remains
unkno
wn.
Moreo
v
er
,
we
observ
ed
through
the
e
xperiments
that
this
method
tends
to
be
sensiti
v
e
to
noise
features.
Unsupervised
feature
selection
based
on
mutual
information
(UFSMI)
[9]
is
deri
v
ed
from
the
observ
ation
that
good
features
share
information
in
common,
and
noisy
features
are
less
correlated
with
the
other
features.
Mutual
information
(MI)
is
then
used
as
the
objecti
v
e
function
to
capture
the
shared
information
between
a
feature
and
the
rest
of
features,
and
the
y
choose
the
first
m
features
that
achie
v
e
the
higher
mutual
information.
The
higher
le
v
el
of
noise
leads
to
a
smaller
a
v
erage
v
alue
of
the
score
function
because
the
addition
of
noise
reduces
the
mutual
information
between
features.
The
desired
property
of
t
he
feature
selection
algorithm
aims
at
remo
ving
noisy
features.
Ho
we
v
er
,
the
estimation
of
MI
is
less
reliable
as
it
is
computed
using
the
k
-nearest
neighbor
graph.
Thus,
its
performance
is
based
on
the
tuning
parameter
,
k
,
resulting
in
that
a
subset
of
the
selected
features
may
not
be
the
best.
All
methods
mentioned
abo
v
e
including
se
v
eral
e
xisting
methods
such
as
[12]-[14]
are
based
on
the
k
-nearest
neighbor
graph
or
other
tuning
param
eters,
and
there
is
no
systematic
w
ay
to
choose
such
parameters
(i.e.,
k
).
In
addi
tion,
the
y
did
not
observ
e
the
redundant
features.
Therefore,
in
this
paper
,
we
w
ould
lik
e
to
deal
with
these
problems.
Our
idea
is
inspired
by
[9]
as
the
y
pro
v
ed
that
a
good
feature
is
e
xpected
to
be
dependent
on
the
rest
of
features.
Ho
we
v
er
,
the
method
did
not
tak
e
the
redundanc
y
of
features
into
account,
and
thus
may
not
produce
an
optimal
feature
subset.
Moreo
v
er
,
in
[9],
the
y
used
the
mutual
information
(MI)
as
their
dependent
measure,
and
the
MI
is
estimated
based
on
the
k
-nearest
neighbors
which
lacks
of
the
systematic
w
ay
to
choose
the
appropriate
k
.
So
that
the
estimation
of
MI
tends
to
be
less
reliable.
In
order
to
solv
e
these
problems,
we
propose
to
apply
the
L
2
-distance
v
ariant
of
MI
called
the
quadratic
mutual
information
(QMI)
[26],
where
the
least-square
method
is
used
to
approximate
QMI
[27].
Therefore,
the
used
parameters
c
an
be
obtained
automatically
by
cross-v
alidation.
The
contrib
ution
of
t
his
paper
is
a
no
v
el
method
of
unsupervised
feature
selection
based
on
the
least-squares
quadratic
mutual
information
(LSQMI),
which
is
po
werful
for
selecting
the
most
important
features
as
well
as
rejecting
the
redundant
features.
In
the
e
xperimental
results,
we
sho
w
that
the
LSQMI
is
more
reliable
than
MI.
In
addition,
we
demonstrate
that
the
proposed
method
is
promising
through
the
e
xperimental
results
on
six
UCI
machine
learning
repositories.
The
rest
of
paper
is
or
g
anized
as
follo
ws.
In
section
2,
we
re
vie
w
the
least-square
quadratic
mutual
information
and
its
norm
alization.
W
e
describe
the
proposed
method
in
section
3.
The
e
xperimental
setup
and
results
are
sho
wn
in
section
4.
In
section
5,
we
pro
vide
the
conclusion.
2.
THE
LEAST
-SQ
U
ARE
Q
U
ADRA
TIC
MUTU
AL
INFORMA
TION
In
this
section,
we
first
re
vie
w
the
least-square
quadratic
mutual
information
(LSQMI)
[27]
that
can
be
used
to
measure
a
statistical
dependence
among
features.
As
LSQMI
is
ranged
from
0
to
1
,
we
need
to
normalize
LSQMI
to
[0,1]
range.
W
e
then
re
vie
w
the
normalization
of
LSQMI
[28]
that
is
finally
used
as
the
objecti
v
e
function
for
our
proposed
unsupervised
feature
selection
method.
2.1.
LSQMI
estimation
Suppose
that
we
are
gi
v
en
a
set
of
f
(
x
i
;
y
i
)
g
n
i
=1
independently
dra
wn
from
a
joint
probabili
ty
dis-
trib
ution
with
density
p
(
x
;
y
)
.
Here,
x
2
R
n
is
a
featrue
v
ector
and
y
2
R
n
(
d
1)
is
the
rest
of
feature
v
ectors,
where
n
denotes
the
number
of
samples
and
d
denotes
the
number
of
features.
The
quadratic
mutual
information
(QMI)
[26]
between
x
and
y
is
defined
as
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
22,
No.
3,
June
2021
:
1619
–
1628
Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian
J
Elec
Eng
&
Comp
Sci
ISSN:
2502-4752
r
1621
QMI
:=
Z
Z
f
(
x
;
y
)
2
d
x
d
y
;
where
f
(
x
;
y
)
:=
p
(
x
;
y
)
p
(
x
)
p
(
y
)
;
and
p
(
x
)
and
p
(
y
)
denote
the
mar
ginal
densities
of
x
and
y
,
respecti
v
ely
.
In
LSQMI
[27],
the
density
dif
ference
f
(
x
;
y
)
is
modeled
as
g
(
x
;
y
)
=
n
X
`
=1
`
K
(
x
;
x
`
)
L
(
y
;
y
`
)
;
where
K
(
x
;
x
`
)
and
L
(
y
;
y
`
)
are
k
ernel
functions
for
x
and
y
,
respecti
v
ely
.
Then,
=
(
1
;
:
:
:
;
n
)
>
is
learned
by
least-squares
as
min
Z
Z
g
(
x
;
y
)
f
(
x
;
y
)
2
d
x
d
y
:
An
empirical
and
re
gularized
v
ersion
of
the
abo
v
e
optimization
problem
is
gi
v
en
as
b
:=
argmin
h
>
H
2
>
b
h
+
>
i
;
where
0
is
the
re
gularization
parameter
,
and
H
and
b
h
are
defined
as
H
`;`
0
:=
Z
K
(
x
;
x
`
)
K
(
x
;
x
`
0
)d
x
Z
L
(
y
;
y
`
)
L
(
y
;
y
`
0
)d
y
;
(1)
b
h
`
:=
1
n
n
X
i
=1
K
(
x
i
;
x
`
)
L
(
y
i
;
y
`
)
1
n
2
n
X
i;j
=1
K
(
x
i
;
x
`
)
L
(
y
j
;
y
`
)
:
In
this
paper
,
we
use
the
Gaussian
k
ernel
for
both
K
(
x
;
x
`
)
and
L
(
y
;
y
`
)
as
both
x
and
y
are
continuous:
K
(
x
;
x
`
)
:=
e
xp
k
x
x
`
k
2
2
2
;
L
(
y
;
y
`
)
:=
e
xp
k
y
y
`
k
2
2
2
;
so
that
the
inte
gral
in
H
(1)
can
be
computed
analytically
as
H
`;`
0
=
(
2
)
d
x
=
2
exp
k
x
`
x
`
k
2
4
2
(
2
)
d
y
=
2
exp
k
y
`
y
`
0
k
2
4
2
:
Thus,
the
solution
b
can
be
obtained
as
b
=
(
H
+
I
)
1
b
h
;
where
I
denotes
the
identity
matrix.
Finally
,
the
least-squares
quadratic
mutual
information
(LSQMI)
is
gi
v
en
by
[
QMI
:=
m
ax
(0
;
2
b
>
b
h
b
>
H
b
)
:
Unsupervised
featur
e
selection
with
least-squar
es
quadr
atic
mutual
information
(J
anya
Sainui)
Evaluation Warning : The document was created with Spire.PDF for Python.
1622
r
ISSN:
2502-4752
2.2.
The
normalisation
of
LSQMI
As
LSQMI
is
ranged
from
0
to
1
,
we
here
normalize
it
as
follo
w:
\
NQMI
:=
[
QMI(
x
;
y
)
max
(
[
QMI(
x
;
x
)
;
[
QMI(
y
;
y
))
:
(2)
The
higher
\
NQMI
indicates
the
more
information
shared
between
the
feature
x
and
the
rest
of
features
y
,
while
it
is
closed
to
zero
if
the
feature
x
is
a
noise
feature.
W
e
assume
that
the
v
alues
of
[
QMI(
y
;
y
)
and
[
QMI(
x
;
x
)
are
lar
ger
than
zero
and
that
of
[
QMI(
x
;
y
)
,
because
the
y
are
the
dependenc
y
of
themselv
es.
No
w
,
the
range
of
\
NQMI
is
[0
;
1]
.
Ho
we
v
er
,
in
practice,
\
NQMI
may
be
higher
than
1.
3.
THE
PR
OPOSED
UNSUPER
VISED
FEA
TURE
SELECTION
METHOD
In
an
unsupervi
sed
scenario,
the
objecti
v
e
function
that
is
used
to
e
v
aluate
the
important
feature
is
v
ery
important.
Ho
we
v
er
,
the
objecti
v
e
function
of
the
e
xisting
methods
tends
to
be
less
reliable
because
its
approximation
depends
on
the
parameters
included
in
the
objecti
v
e
function,
and
there
is
no
automatic
w
ay
to
choose
those
parameters.
In
this
paper
,
we
propose
an
unsupervised
feature
selection
by
using
a
more
reliable
objecti
v
e
function
called
the
least-squares
quadratic
mutual
information
(LSQMI).
The
benefits
of
LSQMI
are
that
the
parameters
needed
in
the
objecti
v
e
function
can
be
chosen
by
cross
v
alidation,
and
it
is
less
sensiti
v
e
to
noises
and
outliers
that
may
cause
the
estimation
of
the
selected
criteria
unreliable
[26],
[27].
The
proposed
method
is
described
as
follo
ws.
Gi
v
en
a
set
of
n
samples
denoted
as
f
x
i
g
n
i
=1
,
where
each
sample
x
i
consists
of
d
features
(i.e.,
x
i
=
f
f
1
;
:
:
:
;
f
d
g
),
our
goal
here
is
to
select
the
most
m
informati
v
e
features
from
d
features,
while
the
redundant
features
should
be
discarded.
T
o
address
this
problem,
we
apply
the
LSQMI
[27]
as
our
criteria
for
both
selecting
the
most
m
important
features
as
well
as
ignoring
the
redundant
ones.
Let
f
j
2
R
n
be
the
j
th
feature,
and
f
n
j
2
R
n
(
d
1
)
be
all
features
e
xcluding
f
j
.
T
o
select
the
import
ant
b
ut
not
redundant
features,
our
proposed
algorithm
is
sho
wn
in
Algorithm
1.
Algorithm
1
The
unsupervised
feature
selection
with
least-squares
quadratic
mutual
information
(UFSLSQMI)
Input:
A
set
of
features,
f
f
1
;
:::;
f
d
g
,
f
j
2
R
n
,
the
number
of
needed
features,
m
,
and
a
threshold,
.
Output:
A
subset
of
selected
features,
S
=
f
f
1
;
:
:
:
;
f
m
g
.
1:
S
?
2:
f
or
j
=
1
;
:
:
:
;
d
do
3:
compute
\
NQMI
(Equation
(2))
between
f
j
and
f
n
j
4:
end
f
or
5:
sort
features
f
f
1
;
:::;
f
d
g
according
to
their
\
NQMI
in
descending
order
resulting
in
f
f
1
;
:::;
f
d
g
6:
S
f
S
[
f
1
g
7:
k
1
8:
f
or
j
=
2
;
:
:
:
;
d
do
9:
if
(
k
<
m
)
then
10:
if
(
\
NQMI(
f
j
;
f
n
j
)
\
NQMI(
f
j
1
;
f
n
j
1
)
<
)
then
11:
S
f
S
[
f
j
g
12:
k
k
+
1
13:
end
if
14:
else
15:
break
16:
end
if
17:
end
f
or
The
inputs
of
the
algorithm
are
a
set
of
features,
f
f
1
;
:::;
f
d
g
,
the
number
of
needed
features,
m
,
and
a
threshold,
.
While
the
output
is
a
subset
of
selected
features
S
=
f
f
1
;
:
:
:
;
f
m
g
.
The
algorithm
starts
by
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
22,
No.
3,
June
2021
:
1619
–
1628
Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian
J
Elec
Eng
&
Comp
Sci
ISSN:
2502-4752
r
1623
computing
the
normalisation
of
LSQMI
as
(2)
for
all
features
(lines
2
-
4).
Then,
the
features
are
sorted
in
descending
order
according
to
their
score
(l
ine
5).
Last
ly
,
the
features
are
selected
according
to
the
ranking;
a
feature
with
the
highest
score
is
firstly
selected
and
the
feature
with
the
second
high
s
core
is
then
selected
and
so
on.
Ho
we
v
er
,
the
redundant
features
are
ignored
by
discarding
the
feature
that
has
the
normalisation
of
LSQMI
score
closed
to
the
pre
vious
one.
In
other
w
ords,
if
the
ratio
of
the
current
score
and
the
pre
vious
one
is
higher
than
the
gi
v
en
threshold
,
(line
10),
the
feature
according
to
the
smaller
score
is
ignored.
The
algorithm
is
repeated
until
the
m
features
(lines
8
-
17)
are
obtained.
The
performance
of
the
proposed
method
depends
on
the
parameters
used
for
approximating
the
LSQMI.
Ho
we
v
er
,
thanks
to
the
least-squares
method,
we
can
choose
the
appropriate
parameters
by
cross-
v
alidation
method.
Although
the
s
elected
features
are
also
depended
on
the
threshold
,
we
sho
w
you
through
the
e
xperiments
that
our
objecti
v
e
function
well
defines
the
redundant
features
and
the
noise
features.
In
other
w
ords,
if
tw
o
features
are
redundanc
y
,
their
LSQMI
scores
tend
to
be
closed
to
each
other
,
and
if
a
feature
is
the
noise
feature,
the
LSQMI
score
tends
to
be
small.
4.
RESUL
TS
AND
DISCUSSION
In
this
section,
we
demonstrate
the
ef
fecti
v
eness
of
the
proposed
unsupervised
feature
selection.
W
e
compared
the
proposed
method
with
three
competiti
v
e
methods,
including
UFSMI
[9],
LSFS
[10],
and
MCFS
[11].
All
methods
are
based
on
the
k
-nearest
neighbor
,
and
here
we
set
k
=
5
for
all
methods
as
their
def
ault
setting.
F
or
MCFS,
there
is
another
parameter
that
is
the
number
of
eigen
v
ectors,
and
we
set
this
parameter
at
5
as
the
same
reason.
F
or
the
proposed
method,
the
threshold
is
set
at
0.95.
The
competiti
v
e
methods
including
the
proposed
method
rank
the
features
according
to
the
criteria
to
select
the
top
m
features.
This
means
that
the
objecti
v
e
function
mainly
af
fect
a
subset
of
selected
features.
Thanks
to
the
reliability
of
LSQMI,
our
method
considers
not
only
the
importance
of
features
b
ut
the
redun-
danc
y
also.
Therefore,
in
the
e
xperiment
setups,
we
firstly
sho
w
you
that
our
objecti
v
e
function
i
s
ef
fecti
v
e
for
selecting
the
important
features
as
well
as
discarding
the
redundant
features
on
the
synthetic
data.
Secondly
,
we
illustrate
the
clustering
performance
after
feature
selection
using
six
UCI
benchmark
datasets
to
confirm
the
usefulness
of
our
proposed
method.
4.1.
Synthetic
data
In
the
first
e
xperiment,
we
w
ould
lik
e
to
demonstrate
the
reliability
of
the
LSQMI
comparing
to
the
competiti
v
e
methods.
Specifically
,
we
w
ould
lik
e
to
sho
w
you
that
the
LSQMI
is
more
reliable
for
e
v
aluating
both
the
informati
v
e
feature
and
the
redundant
features.
T
o
do
so,
we
conducted
a
synthetic
data
of
3
clusters
with
3-dimensionality
illustrated
in
Figure
1.
W
e
generated
100
instances
with
standard
de
viation
=
1
for
each
class.
Figure
1
(a)-(c)
sho
ws
that
the
3
r
d
feature
is
the
most
important
for
underlying
clusters,
while
the
1
st
feature
and
the
2
nd
feature
are
redundanc
y
,
because
the
y
ha
v
e
the
same
mean.
Therefore,
if
we
w
ould
lik
e
to
choose
tw
o
features
(i.e.,
m
=
2
),
the
first
selected
one
should
be
the
3
r
d
feature,
and
the
second
one
may
be
the
1
st
feature
or
the
2
nd
feature.
In
other
w
ords,
the
set
of
selected
features
should
not
be
the
1
st
feature
and
the
2
nd
feature,
because
the
y
can
capture
only
2
clusters
of
the
original
data
as
sho
wn
in
Figure
1
(a).
(a)
(b)
(c)
Figure
1.
Relationship
between
features
in
the
synthetic
data:
(a)
Feature
1
vs.
Feature
2,
(b)
Feature
1
vs.
Feature
3,
(c)
Feature
2
vs.
Feature
3
Unsupervised
featur
e
selection
with
least-squar
es
quadr
atic
mutual
information
(J
anya
Sainui)
Evaluation Warning : The document was created with Spire.PDF for Python.
1624
r
ISSN:
2502-4752
W
e
ran
all
methods
using
this
to
y
data,
where
each
method
w
as
repeated
10
times
with
dif
ferent
random
data.
W
e
found
that
UFSMI
performs
the
w
orst,
because
the
3
r
d
feature
w
as
not
selected
4
times.
LS
missed
selecting
the
3
r
d
feature
1
time,
while
the
proposed
method
and
MCFS
al
w
ays
selected
the
3
r
d
feature.
The
a
v
erage
scores
between
each
feature
and
the
rest
features
obtained
from
each
objecti
v
e
function
are
sho
wn
in
T
able
1,
sho
wing
that
our
objecti
v
e
function,
\
NQMI
,
tends
to
be
more
sensible
than
other
ones.
As
can
be
seen,
the
\
NQMI
score
between
the
3
r
d
feature
and
the
rest
features
is
significantly
highest,
while
the
\
NQMI
scores
according
to
the
2
nd
feature
and
the
1
st
feature
are
close
to
each
other
.
In
contrast
to
other
objecti
v
e
functions,
the
scores
according
to
each
feature
are
almost
the
same
for
all
features.
Notice
that
the
scores
of
LS
sho
wing
in
T
ables
1
and
2
are
the
scores
of
(1
Laplacian
score)
for
each
feature.
Thus,
as
same
as
other
objecti
v
e
functions,
the
feature
with
lar
ger
score
is
more
important.
T
able
1.
The
a
v
erage
scores
o
v
er
10
runs
between
the
feature
j
and
the
rest
of
the
features
obtained
from
dif
ferent
objecti
v
e
functions
using
the
synthetic
data
Objecti
v
e
functions
feature
1
feature
2
feature
3
LS
0.9935
0.9934
0.9936
MI
-9.3682
-9.3771
-9.3730
MCFS
0.7330
0.8010
0.9848
\
NQMI
0.7424
0.7404
1.9034
T
able
2.
The
a
v
erage
scores
o
v
er
10
runs
between
the
feature
j
and
the
rest
of
the
features
obtained
from
dif
ferent
objecti
v
e
functions
after
adding
the
noise
feature
(feature
4)
Objecti
v
e
functions
feature
1
feature
2
feature
3
feature
4
LS
0.9821
0.9818
0.9801
0.9704
MI
-8.1727
-8.1980
-8.2047
-8.8783
MCFS
0.6872
0.7244
0.8226
0.9997
\
NQMI
0.3331
0.3415
0.8952
0.0167
Ne
xt,
we
added
a
noise
feature
(the
4
th
feature)
with
mean
=
0
for
all
clusters
and
standard
de
viation
=
1
into
the
a
b
o
v
e
to
y
data.
W
e
ran
the
e
xperiment
10
times
for
each
method
ag
ain,
and
the
results
sho
w
that
our
method
outperforms
other
methods
including
MCFS.
Moreo
v
er
,
MCFS
performs
the
w
orst
in
this
e
xperiment
as
it
al
w
ays
selected
the
4
th
feature,
while
the
other
three
methods
ne
v
er
selected
the
4
th
feature.
The
a
v
erage
scores
of
each
method
are
sho
wn
in
T
able
2,
confirmi
n
g
that
\
NQMI
is
more
reliable
than
other
objecti
v
e
functions.
Notice
that
MCFS
w
orks
well
if
the
number
of
used
eigen
v
ectors
is
equal
to
the
number
of
clusters;
ho
we
v
er
,
in
an
unsupervised
scenario,
we
do
not
kno
w
the
actual
number
of
clusters.
4.2.
UCI
datasets
Here,
we
e
v
aluated
the
clustering
performance
after
feature
selection
using
six
UCI
benchmark
datasets,
namely
Abalone,
Sonar
,
Glass,
Pima,
Heart,
and
Cancer
datasets.
F
or
each
dataset,
the
number
of
clusters
(
c
),
the
number
of
features
(
d
),
and
the
number
of
samples
(
n
)
are
sho
wn
in
Figure
2.
F
or
each
run,
we
randomly
chosen
90%
samples
from
each
dataset,
then
we
performed
feature
selection.
Finally
,
we
performed
clustering
by
the
k
-means
algorithm;
we
ran
the
k
-means
algorithm
10
times
with
random
initialization
and
chose
the
best
solution
with
the
minimum
error
rate.
The
clustering
accurac
y
is
e
v
aluated.
The
results
of
each
dataset
are
sho
wn
in
Figure
2,
sho
wing
the
a
v
erage
accuracies
o
v
er
10
runs
at
dif
ferent
number
of
seleced
features,
m
.
The
results
indicate
that
the
proposed
method
(i.e.,
UFSLSQMI)
almost
performs
better
than
the
counterpart
approaches.
Ho
we
v
er
,
we
cannot
say
that
the
proposed
method
is
the
best.
Because,
in
practice,
there
is
no
method
that
w
orks
well
for
all
data.
Moreo
v
er
,
it
is
hard
to
e
xplain
the
reasons
wh
y
it
w
orks
or
it
does
not
w
ork.
What
we
can
say
is
that
the
proposed
method
can
be
an
alternati
v
e
for
solving
unsupervised
feature
selection
problem
that
may
w
ork
well
for
your
data.
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
22,
No.
3,
June
2021
:
1619
–
1628
Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian
J
Elec
Eng
&
Comp
Sci
ISSN:
2502-4752
r
1625
(a)
(b)
(c)
(d)
(e)
(f)
Figure
2.
The
a
v
erage
of
clustering
accuracies
o
v
er
10
runs
on
UCI
datasets:
(a)
Abalone
(c=10
,d=8
,
n=2493),
(b)
Sonar
(c=2
,d=
60,
n=188),
(c)
Glass
(c=6
,d=9
,
n=
193),
(d)
Pima
(c=
2,d=
8,
n=
692),
(e)
Heart
(c=2
,d=13
,
n=244),
(f)
Cancer
(c=2
,d=9
,
n=615)
4.3.
Discussion
In
the
e
xperime
nts,
we
compar
ed
the
proposed
unsupervised
feature
selection
method
with
the
other
filter
methods.
All
m
ethods
including
the
proposed
method
select
the
informati
v
e
features
from
the
rank
of
features
according
to
the
scores
of
the
objecti
v
e
function.
This
means
that
the
objecti
v
e
function
plays
an
important
role
for
this
task.
W
e
discuss
here
each
objecti
v
e
function
as
fol
lo
ws.
Firstly
,
Laplacian
score
(LS)
is
computed
by
constructing
of
a
nearest
neighbor
graph
with
the
specific-defined
number
of
nearest
neighbors,
k
.
This
is
the
reason
wh
y
the
Laplacian
score
feature
selection
does
not
al
w
ays
perform
well.
In
addit
ion,
as
can
be
seen
in
the
e
xperiments
on
to
y
data
(T
able
1
and
T
able
2),
the
Laplacian
scores
of
the
most
important
feature
and
the
noise
feature
are
not
significantly
dif
ferent
from
each
other
,
this
may
cause
the
rank
of
features
relating
to
the
important
features
is
unreliable.
Unsupervised
featur
e
selection
with
least-squar
es
quadr
atic
mutual
information
(J
anya
Sainui)
Evaluation Warning : The document was created with Spire.PDF for Python.
1626
r
ISSN:
2502-4752
Secondly
,
the
mutual
information
(MI)
is
also
computed
based
on
the
k
-nearest
neighbors
graph.
The
results
in
T
ables
1
and
2
demonstrate
that
this
MI
score
does
not
des
cribe
the
dif
ference
between
t
he
informati
v
e
feature
and
the
noise
feature
well.
In
other
w
ords,
as
same
as
LS,
the
MI
scores
of
the
important
feature
and
the
noise
feature
are
not
significantly
dif
ferent
from
each
other
.
Thirdly
,
the
MCFS
score
is
also
computed
based
on
the
k
-nearest
neighbor
graph
such
that
the
suitable
k
ef
fects
the
good
score.
Not
only
the
parameter
k
,
the
MCFS
score
also
needs
the
number
of
used
eigen
v
ectors
that
mak
es
the
MCFS
score
depend
on
these
tw
o
parame
ters.
As
can
be
seen
in
the
e
xperiment
on
to
y
data
with
noise
feature,
the
MC
FS
score
of
noise
feature
is
unreliable;
ho
we
v
er
,
if
we
change
the
number
o
f
used
eigen
v
ectors
to
be
2
instead
of
5,
the
MCFS
score
of
noise
feature
becomes
more
reliable.
This
confirms
that
the
MCFS
score
hea
vily
depends
on
the
number
of
used
eigen
v
ectors.
Lastly
,
the
proposed
method
utilizes
the
LSQMI
that
approximates
by
least-square
method
so
that
the
cross-v
alidation
can
be
included,
resulting
in
that
the
estimator
of
QMI
is
sensible.
As
can
be
seen
in
T
ables
1
and
2,
the
normalized
LSQMI
describe
the
structure
of
the
original
data
as
well;
the
LSQMI
is
highest
if
the
feature
is
the
most
important
as
the
3
r
d
feature
of
the
to
y
data;
in
contrast,
the
LSQMI
is
smallest
if
the
feature
is
the
noise
feature
that
pro
vides
no
information
about
data
lik
e
the
4
th
feature.
The
main
dra
wback
of
the
proposed
method
is
the
time
comple
xity
that
is
higher
than
other
methods
because
of
the
cross-v
alidation
process.
T
o
sum
up,
the
main
problem
of
the
filter
based
unsupervised
feature
selection
methods
is
lacking
of
the
ability
to
automatically
choose
the
parameters
included
in
their
objecti
v
e
function
so
that
the
approximation
of
the
objecti
v
e
function
may
be
unreliable.
Thus,
we
here
propose
to
use
the
LSQMI
for
unsupervised
feature
selection,
that
is
not
only
reliable
for
e
v
aluating
the
important
features
b
ut
also
for
ignoring
the
redundant
features.
5.
CONCLUSION
The
goal
of
feature
selection
is
to
select
a
subset
of
features
in
order
to
reduce
the
dimensionality
of
the
original
data,
while
the
structure
of
data
should
be
preserv
ed.
There
are
tw
o
main
points
that
are
important
in
order
to
achie
v
e
this
goal.
Firstly
,
we
need
to
select
the
most
informati
v
e
features
underlying
preserving
the
structure
of
the
original
data.
Secondly
,
we
ha
v
e
to
remo
v
e
the
redundant
features
because
the
y
do
not
pro
vide
more
information
about
data.
Ho
we
v
er
,
the
e
xisting
approaches
may
not
select
the
important
features
because
the
use
of
unreliable
objecti
v
e
functions.
Moreo
v
er
,
the
y
do
not
ignore
the
redundant
features.
In
this
paper
,
we
focus
on
these
tw
o
points
by
e
xploring
the
reliable
objecti
v
e
function
that
is
the
most
important
for
e
v
aluating
the
informati
v
e
features
as
well
as
the
redundant
features
in
an
unsupervis
ed
manner
.
Specifically
,
we
propose
to
use
the
least-squares
quadratic
mutual
information
(LSQMI)
as
the
objecti
v
e
function
to
select
the
v
aluable
features
e
xcluding
the
redundant
features.
The
benefit
of
LSQMI
is
that
the
needed
parameters
can
be
chosen
by
cross
v
alidation
resulting
that
the
estimation
of
the
objecti
v
e
function
will
be
more
reliable,
while
most
of
the
other
objecti
v
e
functions
lack
of
the
systematic
w
ay
to
choose
their
parameters.
Through
the
e
xperiments,
we
sho
w
that
the
LSQMI
can
be
used
to
determine
the
important
features
including
the
redundant
feature
as
well.
REFERENCES
[1]
S.
B.
K
ot
siantis,
”Feature
selection
for
machine
learning
classification
problems:
a
recent
o
v
ervie
w
,
”
Artificial
Intellig
ence
Re
vie
w
,
v
ol.
42,
no.
1,
pp.
157–176,
2011,
doi:
10.1007/s10462-011-9230-1.
[2]
J.
Cai,
J.
Luo,
S.
W
ang,
and
S.
Y
ang,
”Feature
selection
in
machine
lear
ning:
a
ne
w
perspecti
v
e,
”
Neur
o-
computing
,
v
ol.
300,
pp.
70-79,
2018,
doi:
10.1016/j.neucom.2017.11.077.
[3]
L.
T
ala
v
era,
C.
Nord,
and
J.
Girona,
”Dependenc
y-based
feat
u
r
e
selection
for
clustering
symbolic
data,
”
Intellig
ent
Data
Analysis
,
v
ol.
4,
no.
1,
pp.
19-28,
2000,
doi:
10.3233/ID
A-2000-4103.
[4]
V
.
Roth
and
T
.
Lange,
”Feature
selection
in
clustering
problems,
”
Advances
in
Neur
al
Information
Pr
o-
cessing
Systems
,
v
ol.
16,
pp.
473–480,
2004.
[5]
Ma.
Dash,
K.
Choi,
P
.
Scheuermann,
and
H.
Liu,
”Feature
selection
for
clustering—a
filter
solution,
”
2002
IEEE
International
Confer
ence
on
Data
Mining
,
2002.
Pr
oceedings
,
2002,
pp.
115-122,
doi:
10.1109/ICDM.2002.1183893.
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
22,
No.
3,
June
2021
:
1619
–
1628
Evaluation Warning : The document was created with Spire.PDF for Python.
Indonesian
J
Elec
Eng
&
Comp
Sci
ISSN:
2502-4752
r
1627
[6]
S.
V
isalakshi
and
V
.
Radha,
”A
lit
erature
re
vie
w
of
feature
selection
techniques
and
applications:
Re
vie
w
of
feature
selection
in
data
mining,
”
IEEE
International
Confer
ence
on
Computational
Intellig
ence
and
Computing
Resear
c
h,
2014
,
pp.
1-6,
2014,
doi:
10.1109/ICCIC.2014.7238499.
[7]
J.
T
ang,
S.
Alelyani,
and
H.
Liu,
”Feature
selection
for
classification:
a
re
vie
w
,
”
Data
classification:
Algorithms
and
applications
,
CRC
Press,
pp.
37–64,
2014.
[8]
J.
R.
V
er
g
ara
and
P
.
A.
Est
´
ev
ez,
”A
re
vie
w
of
feature
selection
methods
based
on
mutual
information,
”
Neur
al
Computing
and
Application
,
v
ol.
24,
no.
1,
pp.
175–186,
2014,
doi:
10.1007/s00521-013-1368-0.
[9]
J.
Xu,
Z.
Y
uming,
C.
Lin,
and
X.
Bao
wen,
”An
unsupervised
feature
selection
approach
based
on
mutual
information,
”
J
ournal
of
Computer
Resear
c
h
and
De
velopment
,
v
ol.
49,
no.
2,
pp.
372-382,
2012.
[10]
X.
He,
D.
Cai,
and
P
.
Niyogi,
”Laplacian
score
for
feature
selection,
”
Advances
in
Neur
al
Information
Pr
ocessing
Systems
,
2005,
pp.
507-514.
[11]
D.
Cai,
C.
Zhang,
and
X.
He,
”Unsupervised
feature
selection
for
multi-cluster
data,
”
Pr
oceedings
of
the
16th
A
CM
SIGKDD
internati
onal
confer
ence
on
Knowledg
e
disco
very
and
data
mining
,
pp.
333-342,
2010,
doi:
10.1145/1835804.1835848.
[12]
Y
.
W
ang,
”Unsupervised
Representati
v
e
Feature
Selection
Algorithm
Based
on
Information
En-
trop
y
and
Rele
v
ance
Analysis,
”
in
IEEE
Access
,
v
ol.
6,
pp.
45317-45324,
2018,
doi:
10.1109/A
C-
CESS.2018.2863752.
[13]
L.
Shi,
L.
Du,
and
Y
.
Shen,
”Rob
ust
Spectral
Learning
for
Unsupervised
Feature
Selection,
”
IEEE
Inter
-
national
Confer
ence
on
Data
Mining
,
2014,
pp.
977-982,
doi:
10.1109/ICDM.2014.58.
[14]
F
.
Nie,
W
.
Zhu,
and
X.
Li,
”Unsupervised
feature
selection
with
structured
graph
optimiza-
tion,
”
Pr
oceedings
of
the
30th
Confer
ence
on
Artificial
Intellig
ence
,
2016,
pp.
1302–1308,
doi:
10.5555/3015812.3016004
.
[15]
S.
S.-Fern
´
andez,
J.
A.
C.-Ochoa,
and
J.
Fco.
M.-T
rinidad,
”A
re
vie
w
of
unsupervised
feature
selection
methods,
”
Artificial
Intellig
ence
Re
vie
w
,
v
ol.
53,
pp.
907–948,
2020,
doi:
10.1007/s10462-019-09682-y
.
[16]
S.
Alelyani,
J.
T
ang,
and
H.
Liu,
”Feature
selection
for
clustering:
a
re
vie
w
,
”
Data
Cluster
Algorithms
Applications
,
pp.29-60,
2018.
[17]
M.
Dash
and
Y
.
Ong,
”RELIEF-C:
ef
ficient
feature
selection
for
clustering
o
v
er
noisy
data,
”
2011
IEEE
23r
d
International
Confer
ence
on
T
ools
with
Artificial
Intellig
ence
,
2011,
pp.
869-872,
doi:
10.1109/IC-
T
AI.2011.135.
[18]
V
.
M.
Rao
and
V
.
N.
Sastry
,
”Unsupervised
feature
ranking
based
on
representation
entrop
y
,
”
2012
1st
International
Confer
ence
on
Recent
Advances
in
Information
T
ec
hnolo
gy
(RAIT)
,
2012,
pp.
421-425,
doi:
10.1109/RAIT
.2012.6194631.
[19]
J.
G.
Dy
and
C.
E.
Brodle
y
,
”Feature
selection
for
unsupervised
learning,
”
J
ournal
of
Mac
hine
Learning
Resear
c
h
,
v
ol.
5,
pp.
845–889,
2004.
[20]
P
.
Y
.
Lee,
W
.
P
.
Loh,
and
J.
F
.
Chin,
”Feature
selection
in
multimedia:
the
state-of-the-art
re
vie
w
,
”
Ima
g
e
V
ision
Computing
,
v
ol.
67,
pp.
29–42,
2017,
doi:
10.1016/j.ima
vis.2017.09.004.
[21]
R.
V
arsha
vsk
y
,
A.
Gottlieb,
M.
Linial,
and
D.
Horn,
”No
v
el
unsupervised
feature
filtering
of
biological
data,
”
Bioinformatics
,
v
ol.22,
no.
14,
pp.
e507–e513,
2006,
doi:
10.1093/bioinformatics/btl214.
[22]
M.i
Banerjee
and
N.
R.
P
al,
”Feature
selection
with
SVD
entrop
y:
some
modification
and
e
xtension,
”
Information
Sciences
,
v
ol.
264,
pp.
118–134,
2014,
doi:
10.1016/j.ins.2013.12.029.
[23]
S.
S.-Fern
´
andez,
J.
F
.
Mart
´
ınez-T
rinidad,
and
J.
A.
Carrasco-Ochoa,
”A
ne
w
unsupervised
spectral
feature
selection
method
for
mi
x
ed
data:
a
filter
approach,
”
P
attern
Reco
gnition
,
v
ol.
72,
pp.
314–326,
2017,
doi:
10.1016/j.patcog.2017.07.020.
[24]
X.
Zhu,
Y
.
W
ang,
Y
.
Li,
Y
.
T
an,
G.
W
ang,
and
Q.
Song,
”A
ne
w
unsupervised
feature
selection
algorithm
using
similarity-based
feature
clustering,
”
Computational
Intellig
ence
,
v
ol.
35,
no.
1,
pp.
2-22,
2018,
doi:
10.1111/coin.12192.
[25]
H.
W
ang,
Y
.
Zhang,
J.
Zhang,
T
.
Li,
and
L.
Peng,
”A
f
actor
graph
model
for
unsupervised
feature
selec-
tion,
”
Information
Sciences
,
v
ol.
480,
pp.
144-159,
2019,
doi:
10.1016/j.ins.2018.12.034.
[26]
K.
T
orkk
ola,
”Feature
e
xtraction
by
non-parametric
mutual
information
maximization,
”
The
J
ournal
of
Mac
hine
Learning
Resear
c
h
,
v
ol.
3,
pp.
1415-1438,
2003.
[27]
J.
Sainui
and
M.
Sugiyama,
”Direct
approximation
of
quadratic
mutual
information
and
its
application
to
dependence
maximization
clustering,
”
IEICE
T
r
ansactions
on
Information
and
Systems
,
v
ol.
E96-D,
no.
10,
pp.
2282-2285,
2013,
doi:
10.1587/transinf.E96.D.2282.
Unsupervised
featur
e
selection
with
least-squar
es
quadr
atic
mutual
information
(J
anya
Sainui)
Evaluation Warning : The document was created with Spire.PDF for Python.
1628
r
ISSN:
2502-4752
[28]
J.
Sainui
and
M.
Sugiyama,
”Unsupervised
k
e
y
frame
selection
using
information
theory
and
colour
histogram
dif
ference,
”
International
J
ournal
of
Business
Intellig
ence
and
Data
Mining
,
v
ol.
16,
no.
3,
pp.
324-344,
2020,
doi:
10.1504/ijbidm.2020.106137.
BIOGRAPHIES
OF
A
UTHORS
J
anya
Sainui
recei
v
ed
her
BS
and
MS
in
Computer
Science
from
Prince
of
Songkla
Uni
v
ersity
,
Hat
Y
ai,
Songkhla,
Thailand,
in
2005
and
2009,
respecti
v
ely
.
She
is
currently
a
Lecturer
in
Computer
Science
at
Prince
of
Songkla
Uni
v
ersity
,
Thailand.
She
has
conduct
ed
research
on
topics
such
as
bitmap
inde
xing,
video
inde
xing
and
detection,
clustering
and
unsupervised
dimension
reduction.
She
is
interested
in
algorithm
and
application
of
machine
learning,
especially
,
applying
machine
learning
techniques
to
real
w
orld
applications
such
as
image/video
processing,
medical
imaging,
bioinformatics,
as
well
as
information
retrie
v
al.
Chouv
anee
Sri
visal
is
currently
a
Lecturer
in
Computer
Science
at
F
aculty
of
Science
Prince
of
Songkla
Uni
v
ersity
,
Thailand.
She
recei
v
ed
her
BS
in
Computer
Science
from
Prince
of
Songkla
Uni
v
ersity
and
MS
in
Information
T
echnology
from
King
Mongkut’
s
Institute
of
T
echnology
Lad-
krabang,
Thailand
in
1997
and
2002,
respecti
v
ely
.
She
has
interested
in
algorithm
and
application
of
machine
learning
that
for
conducting
research
on
topics
such
as
image
processing
and
information
retrie
v
al.
Indonesian
J
Elec
Eng
&
Comp
Sci,
V
ol.
22,
No.
3,
June
2021
:
1619
–
1628
Evaluation Warning : The document was created with Spire.PDF for Python.