International Journal of Electrical and Computer Engineering (IJECE)
Vol. 7, No. 2, April 2017, pp. 1071–1087
ISSN: 2088-8708, DOI: 10.11591/ijece.v7i2.pp1071-1087
Institute of Advanced Engineering and Science
www.iaesjournal.com
Credal Fusion of Classifications for Noisy and Uncertain Data

Fatma Karem (1), Mounir Dhibi (2), Arnaud Martin (3), and Med Salim Bouhlel (4)
(1,4) Research Unit SETIT, Higher Institute of Biotechnology Sfax, 3038, Tunisia
(2) Research Unit PMI 09/UR/13-0, Zarroug ISSAT Gafsa 2112, Tunisia
(3) University of Rennes 1, UMR 6074 IRISA, Edouard Branly street, BP 30219, 22302 Lannion Cedex, France
Article Info

Article history:
Received Oct 24, 2016
Revised Feb 8, 2017
Accepted Feb 22, 2017

Keywords:
Clustering
Classification
Combination
Belief function theory
Noise
ABSTRACT

This paper reports on an investigation into a classification technique employed to classify noisy and uncertain data. Classification is not an easy task, and discovering knowledge from uncertain data is a significant challenge. Several problems arise in practice. Often we do not have a good or large learning database for supervised classification. Also, when training data contain noise or missing values, classification accuracy is affected dramatically. Extracting groups from data is therefore not easy: the groups overlap and are not well separated from each other. Another problem which can be cited here is the uncertainty due to measuring devices. Consequently, the classification model is not robust enough to classify new objects. In this work, we present a novel classification algorithm to address these problems. We materialize our main idea by using belief function theory to combine classification and clustering. This theory handles very well the imprecision and uncertainty linked to classification. Experimental results show that our approach is able to significantly improve the quality of classification on generic databases.

Copyright © 2017 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Fatma Karem
Research Unit SETIT, Higher Institute of Biotechnology Sfax, 3038, Tunisia
60 street pyramids, Assala, Gafsa, 2100, Tunisia
Phone: +216.27.951.381
Email: fatoumakarem@gmail.com
1. INTRODUCTION
There are two broad families of classification techniques: supervised and unsupervised. Supervised classification is the essential tool used for extracting quantitative information based on a learning database. All extracted features are assigned labels from examples. It tries to classify objects by measuring the similarity between new objects and the learning database. The second technique builds clusters by essentially measuring two criteria, compactness and separation [1],[2]. It tries to form clusters which are as compact and as separable as possible.

Grouping data is not straightforward. Firstly, clusters overlap most of the time. Secondly, the data to classify are generally very complex. Moreover, there is no unique quality criterion to measure the goodness of a classification. Generally, a validity index is used to measure the quality of clustering. Until now, there is no standard, universal index; it varies from one application to another. Data to classify are not always correct, especially in real applications. They can be uncertain or ambiguous, since they depend on acquisition devices or expert opinions. Consequently, the result of classification will be uncertain. Besides, the labeled examples used for training may sometimes not be available.

Due to these limits, and with the objective of improving the classification process, we propose to combine classification and clustering. This combination, also named a fusion procedure, aims to exploit the complementarity between both. Clustering is used to overcome the problems of learning and of over-fitting. The combination is made using belief function theory, which is well known for treating problems of uncertainty and imprecision. In this paper, we report our recent research efforts toward this goal. First, we present the basic concepts of belief function theory. Then, we propose a novel classification mechanism based on combination. The new process aims to improve classification results in the presence of a noisy environment and missing data. We conduct experiments on generic data to show the quality of the data mining results.
The rest of this paper is organized as follows.
Related work on noise handling is discussed in the "Related works" subsection. In Section 3, we describe the details of the proposed fusion mechanism. Experimental results and discussion are presented in Section 4. In the final section, we conclude the paper and outline future work.
2. THEORETICAL BASIS
We present here essentially the belief function theory, our framework for the fusion of information. Then, we present some works done on the fusion of classifications.
2.1. Belief function theory

Fusion is the process of combining multiple data or pieces of information coming from different sources in order to make a decision. The final decision is better than the individual ones; the variety of information involved in the combination process is what creates the added value. Combination is needed in problems where ambiguity and uncertainty are large, when we may be unable to make an individual decision. To remove the ambiguity, we must fuse. The applications requiring fusion are numerous. We find medicine [3],[4] for example: it is sometimes difficult to make a good disease diagnosis individually, and it is better to fuse the opinions of several doctors. Tumor detection is the best known application. We also find image processing applications [5],[6], classification [7],[8], remote detection, artificial intelligence, pattern recognition [9], etc.
The means of combination are multiple; they are often called theories of the uncertain. We find vote theory, possibility theory, probability theory and belief function theory. The latter shows robustness against uncertainty and imprecision problems. The theory was introduced by Dempster in 1967 and taken up by Shafer; it is also called Dempster-Shafer theory [10],[11].
Belief function theory models the belief in an event by a function called the mass function. We note m_j the mass function of the source S_j. It is defined on 2^Θ, where Θ is the frame of discernment; its values lie in [0, 1] and verify the constraint:

\sum_{A \in 2^\Theta} m_j(A) = 1    (1)
2^Θ is the set of decisions, or disjunctions of the classes C_i if we talk about classification: 2^Θ = {∅, {C_1}, {C_2}, {C_1 ∪ C_2}, ..., Θ}. The subsets A of Θ such that m(A) > 0 are called focal elements; the set of focal elements is called the kernel. m(A) is the measure of evidence allocated exactly to the hypothesis X ∈ A. The classes C_i must be exclusive but not necessarily exhaustive.
Belief function theory measures imprecision and uncertainty through several functions, such as credibility and plausibility. Credibility is the minimum belief; it takes into account the conflict between sources. Credibility is defined by:

Cr_j(X) = \sum_{Y \subseteq X,\, Y \neq \emptyset} m_j(Y)    (2)
The plausibility function measures the maximal belief in X ∈ 2^Θ. We suppose that the set of decisions is complete, so we are in a closed frame of discernment:

Pl_j(X) = \sum_{Y \in 2^\Theta,\, Y \cap X \neq \emptyset} m_j(Y) = Cr_j(\Theta) - Cr_j(X^c) = 1 - m_j(\emptyset) - Cr_j(X^c)    (3)

X^c is the complement of X.
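To make the notation concrete, the following sketch (ours, not the authors' code) stores a mass function on a small frame Θ = {C1, C2, C3} as a Python dictionary keyed by frozensets and evaluates equations (1)-(3); the frame and the numerical masses are illustrative assumptions.

    # A minimal sketch: a mass function on Theta = {C1, C2, C3} as a dict
    # keyed by frozensets, with credibility (Eq. 2) and plausibility (Eq. 3).
    theta = frozenset({"C1", "C2", "C3"})

    m = {frozenset({"C1"}): 0.5,          # focal element {C1}
         frozenset({"C1", "C2"}): 0.3,    # focal element {C1 u C2}
         theta: 0.2}                      # ignorance
    assert abs(sum(m.values()) - 1.0) < 1e-9   # constraint of Eq. (1)

    def credibility(m, X):
        """Cr(X): total mass of the non-empty subsets of X (Eq. 2)."""
        return sum(v for A, v in m.items() if A and A <= X)

    def plausibility(m, X):
        """Pl(X): total mass of the focal elements intersecting X (Eq. 3)."""
        return sum(v for A, v in m.items() if A & X)

    X = frozenset({"C1"})
    print(credibility(m, X), plausibility(m, X))   # 0.5 and 1.0 on this example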
To represent a problem with the concepts of belief function theory, we should respect three steps: modelling, combination and decision. There is an intermediate step, discounting, which can be done before or after combination. It measures the reliability of the sources; a reliability coefficient, noted α, is used here. The first step is the most crucial: we must choose a suitable model to represent the mass functions. It depends on the context and the application, and it can be computed in many ways; we find essentially probabilistic and distance-based models.
For the second step, the combination can be done using different operators. The choice of the suitable operator depends on the context. Several hypotheses govern the operator, such as the independence and the reliability of the sources. We find many operators, or rules, such as the conjunctive, disjunctive and cautious ones [12]. The first supposes that the sources are independent and reliable, whereas the second supposes that at least one of them is reliable. The cautious rule does not impose the independence hypothesis on the sources, so it allows dependence and redundancy.
This situation may be encountered in practice: for example, experts may share some information, and classifiers may be trained on the same learning sets rather than on separate ones.
The conjunctive combination fuses by considering the intersections between the elements of 2^Θ. It reduces the imprecision of the focal elements and increases the belief in the elements on which the sources agree. If we have M mass functions to combine, we have the following formula:

m(A) = (m_1 \cap m_2 \cap \dots \cap m_M)(A) = \sum_{B_1 \cap B_2 \cap \dots \cap B_M = A} \; \prod_{j=1}^{M} m_j(B_j)    (4)
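As an illustration, the sketch below (our own hypothetical example, not the authors' code) applies the unnormalized conjunctive rule of equation (4) to two mass functions; combining M sources is just an iterated application of the same function.

    # A minimal sketch, assuming mass functions are dicts keyed by frozensets,
    # of the (unnormalized) conjunctive rule of Eq. (4) for two sources.
    def conjunctive(m1, m2):
        """Combine two mass functions: mass moves to the intersections."""
        out = {}
        for A, v1 in m1.items():
            for B, v2 in m2.items():
                inter = A & B          # may be the empty set (conflict)
                out[inter] = out.get(inter, 0.0) + v1 * v2
        return out

    theta = frozenset({"C1", "C2"})
    m1 = {frozenset({"C1"}): 0.6, theta: 0.4}
    m2 = {frozenset({"C2"}): 0.5, theta: 0.5}
    m12 = conjunctive(m1, m2)
    # m12 keeps a mass of 0.3 on the empty set: the conflict between the sources.
    print({tuple(sorted(k)): round(v, 3) for k, v in m12.items()})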
The cautious rule is defined as follows:

m_{1 \wedge 2} = m_1 \wedge m_2 = \bigcap_{A \subset \Theta} A^{w_1(A) \wedge w_2(A)} = \bigcap_{A \subset \Theta} A^{w_{1 \wedge 2}(A)}    (5)
m_{1∧2} is the information gained from the two sources S_1 and S_2. It must be more informative than m_1 and m_2. If we try to formalize this, we have m_{1∧2} ∈ S(m_1) ∩ S(m_2), where S(m) is the set of mass functions more informative than m. To choose the most informative mass function we apply the Least Commitment Principle (LCP): if several mass functions are compatible with some constraints, the least informative one in S(m_1) ∩ S(m_2) should be selected. This element is unique; it is the non-dogmatic mass function (m(Θ) > 0) m_{1∧2} with the following weight function:
w_{1 \wedge 2}(A) := w_1(A) \wedge w_2(A), \quad \forall A \subset \Theta    (6)
w(A) is the representation of a non-dogmatic mass function as simple mass functions; it may be computed from m as follows:

w(A) := \prod_{B \supseteq A} q(B)^{(-1)^{|B| - |A| + 1}}, \quad \forall A    (7)
q is the commonality function, defined as:

q(A) := \sum_{B \supseteq A} m(B), \quad \forall A    (8)
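The sketch below (our own illustration, with invented masses on a two-element frame) follows the chain of equations (8), (7) and (6): it computes the commonalities q, the conjunctive weights w, and the weight minimum used by the cautious rule; the combined mass itself would then be recovered by conjunctively combining the simple mass functions A^{w_{1∧2}(A)}, as in the sketch given after equation (4).

    # A minimal sketch of Eqs. (7)-(8) and of the weight minimum of Eq. (6).
    from itertools import combinations

    theta = frozenset({"a", "b"})
    subsets = [frozenset(c) for r in range(len(theta) + 1)
               for c in combinations(sorted(theta), r)]

    def commonality(m, A):
        """q(A): sum of the masses of the supersets of A (Eq. 8)."""
        return sum(v for B, v in m.items() if A <= B)

    def weights(m):
        """Conjunctive weights w(A) of a non-dogmatic m (Eq. 7), A != Theta."""
        w = {}
        for A in subsets:
            if A == theta:
                continue
            prod = 1.0
            for B in subsets:
                if A <= B:
                    prod *= commonality(m, B) ** ((-1) ** (len(B) - len(A) + 1))
            w[A] = prod
        return w

    # Two non-dogmatic mass functions on Theta (values chosen for illustration).
    m1 = {frozenset({"a"}): 0.6, theta: 0.4}
    m2 = {frozenset({"a"}): 0.3, frozenset({"b"}): 0.2, theta: 0.5}
    w1, w2 = weights(m1), weights(m2)
    # Cautious rule, Eq. (6): keep, for every A != Theta, the smaller weight.
    w12 = {A: min(w1[A], w2[A]) for A in w1}
    print(w12)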
To apply this principle, an informational ordering between mass functions has to be chosen. Several orderings can be used, such as the q-ordering and the w-ordering. The first one states that m_1 is q-more committed, or more informative, than m_2, noted

m_1 \sqsubseteq_q m_2    (9)

if it verifies the following constraint:

q_1(A) \leq q_2(A), \quad \forall A    (10)
The second one is based on the conjunctive weight function: m_1 is w-more committed than m_2 (noted m_1 \sqsubseteq_w m_2) if it verifies the following constraint:

w_1(A) \leq w_2(A), \quad \forall A    (11)
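As a concrete check of the q-ordering of equations (9)-(10), the following toy sketch (our own illustrative masses) compares the commonalities of two mass functions on a two-class frame.

    # A toy check of the q-ordering of Eqs. (9)-(10):
    # m1 is q-more committed than m2 iff q1(A) <= q2(A) for every A.
    from itertools import combinations

    theta = frozenset({"C1", "C2"})
    subsets = [frozenset(c) for r in range(len(theta) + 1)
               for c in combinations(sorted(theta), r)]

    def q(m, A):
        return sum(v for B, v in m.items() if A <= B)

    m1 = {frozenset({"C1"}): 0.7, theta: 0.3}    # more committed to C1
    m2 = {frozenset({"C1"}): 0.4, theta: 0.6}    # vaguer, closer to ignorance
    q_more_committed = all(q(m1, A) <= q(m2, A) for A in subsets)
    print(q_more_committed)    # True: m1 is q-more informative than m2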
After calculating the mass functions and combining them, we obtain the masses relative to the different elements of the frame of discernment. We must then take a decision, or assign a class if classification is the final goal. The decision is made using a criterion, and several criteria exist; we mention the maximum of plausibility, the maximum of credibility and the pignistic probability.
For the first criterion, we choose the singleton, or class C_i, giving the maximum of plausibility.
For an object, or vector, x, we decide C_i if:

Pl_j(C_i)(x) = \max_{1 \leq k \leq n} Pl(C_k)(x)    (12)
This criterion is optimistic, because the plausibility of a singleton measures the belief obtained if all the masses of the disjunctions were focused on it. The second criterion chooses C_i for x if it gives the maximum credibility:

Cr_j(C_i)(x) = \max_{1 \leq k \leq n} Cr(C_k)(x)    (13)
This criterion is more selective, because the credibility function gives the minimum belief committed to a decision. The third criterion lies between the two previous ones: it brings credibility and plausibility closer together. For a class C_i, the pignistic probability is defined as:

bet(C_i) = \sum_{A \in 2^\Theta,\, C_i \in A} \frac{m(A)}{|A| \, (1 - m(\emptyset))}    (14)
|A| is the cardinality of A. The maximum of pignistic probability decides C_i for an observation x if:

bet(C_i)(x) = \max_{1 \leq k \leq n} bet(C_k)(x)    (15)

This criterion is better adapted to a probabilistic context.
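To compare the three decision rules on the same combined mass function, here is a small sketch (our own illustrative masses; the normalization of the pignistic probability by 1 - m(∅) follows equation (14) as reconstructed above and is therefore an assumption of ours).

    # A small sketch of the decision criteria of Eqs. (12), (13) and (15).
    theta = frozenset({"C1", "C2", "C3"})
    m = {frozenset(): 0.1,                      # conflict left by the combination
         frozenset({"C1"}): 0.4,
         frozenset({"C1", "C2"}): 0.3,
         theta: 0.2}

    def credibility(m, X):
        return sum(v for A, v in m.items() if A and A <= X)

    def plausibility(m, X):
        return sum(v for A, v in m.items() if A & X)

    def pignistic(m, c):
        """BetP of the singleton {c}, normalized by 1 - m(empty) (Eq. 14)."""
        k = 1.0 - m.get(frozenset(), 0.0)
        return sum(v / (len(A) * k) for A, v in m.items() if c in A)

    for name, score in (("max credibility", credibility),
                        ("max plausibility", plausibility)):
        best = max(theta, key=lambda c: score(m, frozenset({c})))
        print(name, "->", best)
    print("max pignistic ->", max(theta, key=lambda c: pignistic(m, c)))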
In the next section, we present some works related to the combination of classifications.
2.2. Related works

Much research has been done on fusion in classification. Most of it concerns either clustering [13, 14, 15, 16, 17] or classification [18, 11]. Some studies deploy combination to improve classification performance. Others deploy fusion to construct a new classifier, such as a neural network based on belief functions [19], a credal K-Nearest Neighbors (KNN) [20] or a credal decision tree [21].
In [19], the study presents a solution to problems raised by the Bayesian model. The conditional densities and the a priori probabilities of the classes are unknown; they can be estimated from learning samples, but the estimation is not reliable, especially if the learning database is small. Moreover, it cannot represent very well the uncertainty connected to the class membership of new objects. If we only have a few labeled examples and we have to classify a new object which is very dissimilar from the other ones, the uncertainty will be large. This state of ignorance is not reflected in the outputs of a statistical classifier. This situation is met in many real applications, like disease diagnosis in medicine. The method therefore tries to measure the uncertainty bound to the class of the new object considering the information given by the learning data. Suppose that we have a new object to classify; we focus on its neighbors. They are considered as evidence elements, or hypotheses, about the class membership of the new object. Masses are assigned to each class and for each neighbor of the new object to classify. The beliefs are represented by basic belief assignments and combined with Dempster-Shafer theory to decide to which class the object belongs. The method does not depend strongly on the number of neighbors.
In [21], two-class decision trees are combined to solve a multi-class problem using belief function theory. Classic decision trees are based on probabilities. They are not always suitable for some problems, like handling uncertainty: the uncertainty of inputs and outputs cannot be modelled very well by probability. Moreover, a good learning database is not always available. The research proposes an extension of a previous study dealing with decision trees solving the two-class problem, based on belief functions. The new study aims to treat the multi-class problem by combining two-class decision trees using evidence theory.
In [22], two supervised classifiers are combined: Support Vector Machines (SVM) and K-Nearest Neighbors. The combination aims to improve classification performance, since each of them has disadvantages. SVM, for example, depends strongly on the learning samples; it is sensitive to noise and to intruder points. KNN is a statistical classifier and is also sensitive to noise. A new hybrid algorithm is proposed to overcome the limits of both classifiers.
Concerning the combination of clusterings, much research has been done. In [13], a novel classifier is proposed based on a collaboration between several clustering techniques. The process of collaboration takes place in three stages: parallel and independent clusterings, refinement and evaluation of the results, and unification. The second stage is the most difficult: a correspondence between the different clusters obtained by the classifiers is sought, and conflicts between the results may be found. An iterative resolution of conflicts is carried out in order to obtain a similar number of clusters. The possible actions to solve conflicts are the fusion, deletion and splitting of clusters. After that, the results are unified thanks to a vote technique. The combination was used to analyze multi-source images; fusion was needed because the sources are heterogeneous.
In [23], several techniques of clustering collaboration are presented. They differ by the type of result, which can be a unique partition of the data or an ensemble of clustering results.
For the first type of result, fusion techniques of classification are used. For the second, multi-objective clustering methods are used; they try to optimize several criteria simultaneously. At the end of the process, the set of results is obtained: it is the best result that compromises between the criteria to be optimized. Concerning fusion between clustering and classification, many works deploy clustering in the learning phase of supervised classification [24],[25],[26].
3. RESEARCH METHOD
This work is an improvement of a previous one. The former [27] was established to combine clustering and classification in order to improve their performance, since both have difficulties. For clustering, we have essentially the problems of complex data and of the validity index. For classification, we have the problem of the lack of a learning database. We used belief function theory to fuse, and we respect the three steps of the combination process: modelling, combination and decision.
Our frame of discernment is Θ := {q_j, j := 1, ..., n}, where n is the number of classes q_j found by the supervised classifier.
For the modelling step, both sources must give their beliefs in the classes. The unsupervised source gives clusters as outputs; the classes are unknown to it. How, then, can the clustering source give its beliefs about them? To do that, we look for the similarity between classes and clusters: the greater the similarity, the more the two classifications agree with each other. Generally, similarity is measured with a distance. If we try to measure a distance between a cluster and a class, we face a big problem, namely the choice of the best distance. We chose instead to look for the overlap between clusters and classes: the more objects they have in common, the more similar they are.
Concerning the supervised source, we used the probabilistic model of Appriou; only the singletons interest us. In the combination phase, we adopted the conjunctive rule, which works on the intersections of the elements of the frame of discernment. At the end, we must decide to which class each object belongs. The decision is made using a criterion; we decide following the pignistic criterion, which compromises between credibility and plausibility. To summarize, the process is the following:
Step 1: Modelling. Masses are computed for both sources, supervised and unsupervised.

Clustering (unsupervised source): we look for the proportions, in each cluster, of the classes q_1, ..., q_n found by the supervised classifier [14],[13]. For every x ∈ C_i, with c the number of clusters found, the mass function for an object x to be in the class q_j is as follows:
m^{ns}(q_j) = \frac{|C_i \cap q_j|}{|C_i|}    (16)
where |C_i| is the number of elements in the cluster C_i and |C_i ∩ q_j| the number of elements in the intersection between C_i and q_j. Then we discount the mass functions as follows, ∀A ∈ 2^Θ:
m_i^{ns}(A) = \alpha_i \, m^{ns}(A)    (17)

m_i^{ns}(\Theta) = 1 - \alpha_i \, (1 - m^{ns}(\Theta))    (18)
The discounting coefficient α_i depends on the objects: we cannot discount all the objects in the same way. An object situated at the center of a cluster is considered more representative of the cluster than one situated on its border, for example. The coefficient α_i is defined as (v_i being the center of cluster C_i):

\alpha_i = e^{-\|x - v_i\|^2}    (19)
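A minimal numerical sketch of equations (16)-(19) follows (the cluster, the labels and the coordinates are invented for illustration; they are not the paper's data).

    # Masses of the unsupervised source from cluster/class overlaps (Eq. 16),
    # then discounting of one object with alpha_i = exp(-||x - v_i||^2) (Eqs. 17-19).
    import numpy as np

    # Hypothetical hard cluster C_i and the supervised labels of its members.
    cluster_members = [0, 1, 2, 3, 4]                        # indices of objects in C_i
    labels = {0: "q1", 1: "q1", 2: "q2", 3: "q1", 4: "q2"}   # classifier output
    classes = ["q1", "q2"]

    # Eq. (16): proportion of each class inside the cluster.
    m_ns = {q: sum(1 for k in cluster_members if labels[k] == q) / len(cluster_members)
            for q in classes}        # {'q1': 0.6, 'q2': 0.4}

    # Eqs. (17)-(19): discount the cluster mass for one object x, v_i being the
    # cluster center; the mass removed from the singletons goes to Theta (ignorance).
    x = np.array([0.2, 0.1])
    v_i = np.array([0.0, 0.0])
    alpha = float(np.exp(-np.sum((x - v_i) ** 2)))
    m_discounted = {q: alpha * m_ns[q] for q in classes}
    m_discounted["Theta"] = 1.0 - alpha * (1.0 - 0.0)   # m_ns(Theta) = 0 here
    print(m_discounted)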
Classification (supervised source): we used the probabilistic model of Appriou:

m_s^j(q_j)(x_k) = \frac{\alpha_{ij} \, R_s \, p(q_i \mid q_j)}{1 + R_s \, p(q_i \mid q_j)}    (20)

m_s^j(q_j^c)(x_k) = \frac{\alpha_{ij}}{1 + R_s \, p(q_i \mid q_j)}    (21)
m_s^j(\Theta)(x_k) = 1 - \alpha_{ij}    (22)
q_i is the real class and α_ij the reliability coefficient of the supervised classification concerning the class q_j. The conditional probabilities are computed from confusion matrices on the learning database:
\alpha_{ij} = \max p(q_i \mid q_j), \quad i \in \{1, \dots, n\}    (23)

R_s = \Big( \max_{q_l} p(q_i \mid q_l) \Big)^{-1}, \quad i, l \in \{1, \dots, n\}    (24)
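The sketch below (our reading of equations (20)-(24), with an invented confusion matrix) shows how the three Appriou masses could be computed; the exact definitions used for α_ij and R_s here follow our reconstruction of equations (23)-(24) and are therefore assumptions to be checked against the original model.

    # A sketch of Appriou's model, Eqs. (20)-(24): for a class q_j proposed by
    # the classifier, a mass on q_j, on its complement, and on Theta.
    import numpy as np

    # Hypothetical confusion matrix on the learning set: conf[i, j] ~ p(q_i | q_j).
    conf = np.array([[0.8, 0.1, 0.1],
                     [0.1, 0.7, 0.2],
                     [0.1, 0.2, 0.7]])
    conf = conf / conf.sum(axis=0, keepdims=True)   # columns as conditional probs

    def appriou_masses(i, j):
        """Masses of source j when the real class is q_i (Eqs. 20-22)."""
        p = conf[i, j]                               # p(q_i | q_j)
        R = 1.0 / conf[i, :].max()                   # Eq. (24), as reconstructed
        alpha_ij = conf[:, j].max()                  # Eq. (23), reliability for q_j
        m_qj = alpha_ij * R * p / (1.0 + R * p)      # Eq. (20)
        m_not_qj = alpha_ij / (1.0 + R * p)          # Eq. (21)
        m_theta = 1.0 - alpha_ij                     # Eq. (22)
        return m_qj, m_not_qj, m_theta

    print(appriou_masses(i=0, j=0))   # the three masses sum to 1 by construction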
Step 2: Combination. Use of the conjunctive rule, equation (4).

Step 3: Decision. Use of the pignistic criterion, equation (15).
Three improvements are targeted in the present paper: noise, missing data (uncertain data) and the lack of a learning database; in the previous work we had supposed that the data were correct. To this end, we introduce certain modifications to the previous mechanism. To compute the masses for the supervised source, we keep Appriou's model, equations (20), (21) and (22). For the unsupervised source, we follow the next steps:
Step 1: For each cluster C_i, we combine, with the conjunctive rule, the supervised masses of the objects belonging to it:

\forall x_k \in C_i, \ \forall A \in 2^\Theta, \quad m_i(A) := \bigcap_{x_k \in C_i} m_s(A)(x_k)    (25)
Thanks to this, we have an idea of the proportion of labels present in each cluster: which class is the majority one and which are the minority ones (a small numerical sketch of this step is given below).
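A small sketch of equation (25), with hypothetical masses rather than the paper's data: the supervised mass functions of the objects falling in one cluster are conjunctively combined, which reveals the dominant label of that cluster.

    # Conjunctively combine, inside one cluster, the supervised mass functions
    # of its objects (toy values, frame {q1, q2}).
    from functools import reduce

    theta = frozenset({"q1", "q2"})

    def conjunctive(m1, m2):
        out = {}
        for A, v1 in m1.items():
            for B, v2 in m2.items():
                out[A & B] = out.get(A & B, 0.0) + v1 * v2
        return out

    # Supervised masses of three objects belonging to the same cluster C_i.
    cluster_masses = [
        {frozenset({"q1"}): 0.7, theta: 0.3},
        {frozenset({"q1"}): 0.6, theta: 0.4},
        {frozenset({"q2"}): 0.2, theta: 0.8},
    ]
    m_i = reduce(conjunctive, cluster_masses)
    print({tuple(sorted(k)): round(v, 3) for k, v in m_i.items()})
    # Most of the mass ends up on {q1}: the majority label of the cluster.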
Step 2: We obtain c masses for each element A ∈ 2^Θ, with c the number of clusters obtained, and we combine them with the conjunctive rule. We can thus see how well the two classifications agree with each other: the more the masses tend to 1, the less they are in conflict. Before combining, we discount the masses using a reliability coefficient noted degnet_ik:
\forall x_k \in C_i, \ \forall A \in 2^\Theta, \quad m_{ns}^{k}(A) := \bigcap_{s=1,\dots,c} m_s^{degnet_{ik}}(A)(x_k)    (26)
We thus obtain the belief in the elements of the frame of discernment. degnet_ik is a measure of the neatness of object x_k relative to the cluster C_i.
An object x_k may be clear or ambiguous for a given cluster. If it is at the center of a cluster, or near it, it is considered a very good one: it represents the cluster very well and we can affirm that it belongs to only one cluster. If it is situated on the border(s) between two or more clusters, it cannot be considered as a clear object for a single cluster; it is ambiguous and may belong to more than one group. The computation of degnet_ik takes into account two factors: the degree of membership to the cluster C_i and the maximal degree of overlapping in the current partition, noted S_max, which is the maximal similarity in the partition found by the clustering.
degnet_{ik} = 1 - degoverl_i    (27)

degoverl_i is the overlapping degree with respect to cluster C_i. It is computed as follows:
degoverl_i = (1 - \mu_{ik}) \, S_{max}    (28)
The degree of neatness is the complement to 1 of the degree of overlapping. It is composed of two terms: the first one, (1 - μ_ik), measures the degree of non-membership of a point x_k to a cluster C_i; the second one takes the overlapping aspect into account.
S_max measures the maximal overlapping in the partition. It is computed as follows:

S_{max} := \max\big(S(C_i, C_j)\big)    (29)
The clusters C_i and C_j are considered as fuzzy, not hard, sets.

S(C_i, C_j) = \max_{x_k \in X} \big( \min(\mu_{C_i}(x_k), \mu_{C_j}(x_k)) \big)    (30)
The similarity measure is not based on a distance measure, because of its limits: we can find two clusters separated by the same distance but which are not separable in the same way. The measure is based instead on the membership degrees. We look for the degree of co-relation between two groups, that is, the minimum level of co-relation that is guaranteed. The new measure satisfies the following properties:
Property 1: S(C_i, C_j) is the maximum degree between two clusters.
Property 2: The similarity degree is bounded: 0 ≤ S(C_i, C_j) ≤ 1.
Property 3: If C_i := C_j then S(C_i, C_j) := 1, and if C_i ∩ C_j := ∅ then S(C_i, C_j) := 0.
Property 4: The measure is commutative: S(C_i, C_j) := S(C_j, C_i).
For example, if S(C_i, C_j) := 0.4, the two clusters are similar, or in relation, with a minimum degree of 0.4; they are not connected with a degree of 0.6.
S_{max} := \max\Big( \max_{x_k \in X} \big( \min(\mu_{C_i}(x_k), \mu_{C_j}(x_k)) \big) \Big)    (31)

degnet_{ik} := 1 - (1 - \mu_{ik}) \, S_{max}    (32)
The degree of membership of an object x_k to a cluster C_i is calculated as follows:

\mu_{ik} := \left[ \sum_{l=1}^{c} \left( \frac{\|x_k - v_i\|}{\|x_k - v_l\|} \right)^{2/(m-1)} \right]^{-1}, \quad i := 1, \dots, c, \ k := 1, \dots, n_1    (33)

where v_i is the center of cluster C_i and n_1 the number of objects.
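A compact sketch of equations (27)-(33) follows (synthetic two-cluster data generated by us, with fuzzifier m = 2): it computes the fuzzy memberships μ_ik, the fuzzy overlap between clusters, S_max, and finally degnet_ik = 1 - (1 - μ_ik) S_max.

    # Neatness degree of Eqs. (27)-(33) on synthetic data.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
    V = np.array([[0.0, 0.0], [2.0, 2.0]])          # cluster centers (c = 2)
    m_fuzz = 2.0                                     # fuzzifier m of Eq. (33)

    # Eq. (33): membership of object x_k to cluster C_i.
    d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
    U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m_fuzz - 1.0)), axis=2)
    # U[k, i] = mu_ik; each row sums to 1.

    # Eqs. (30)-(31): fuzzy similarity between the clusters and maximal overlap.
    S = np.max(np.minimum(U[:, 0], U[:, 1]))        # S(C_1, C_2) with c = 2
    S_max = S                                        # only one pair of clusters here

    # Eqs. (27), (28), (32): neatness of object k with respect to cluster i.
    deg_overl = (1.0 - U) * S_max
    deg_net = 1.0 - deg_overl
    print(S_max, deg_net[:3])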
For the combination phase, we use the cautious rule, equation (5). The sources are not totally independent, because the computation of the masses for the unsupervised source is based on the classes given by the supervised source, so we cannot say that they are independent. At the end, we decide using the pignistic probability. We are interested only in the singletons: the labels given by the classification. The process of fusion is summarized in Figure 1.
4. RESULT AND ANALYSIS
In this section, we present the results obtained with our fusion approach between supervised and unsupervised classification. We conduct our experimental study on different generic databases obtained from the U.C.I. repository of Machine Learning databases. In the future, we intend to use real databases such as medical imaging or sonar imaging.
Firstly, we ran experiments on the data without any change. Secondly, we edited our data and removed some information to create missing data. Thirdly, we injected noise at different rates and took a small learning sample (10%). The aim is to demonstrate the performance of the proposed method and the influence of the fusion on the classification results in a noisy environment and with missing data.
The experiments are based on three unsupervised methods: the Fuzzy C-Means (FCM), the K-Means and the Mixture Model. For the supervised methods, we use the K-Nearest Neighbors, credal K-Nearest Neighbors, Bayes, decision tree, neural network, SVM and credal neural network.
We show in Table 2 the classification rates obtained before and after fusion for the new mechanism. The data shown are: Iris, Abalone, Breast-cancer, Car, Wine, Sensor-readings24 and Cmc.
Figure 1. Fusion mechanism

The first rates (before fusion) are those obtained with the supervised methods alone (K-Nearest Neighbors, credal K-Nearest Neighbors, Bayes, decision tree, neural network, SVM and credal neural network). The learning rate is equal to 10%.
We show in Table 3 the classification rates obtained before and after fusion for the new mechanism on missing data. The data shown are: Iris, Abalone, Wine, Sensor-readings24 and Cmc. The first rates (before fusion) are those obtained with the supervised methods alone (K-Nearest Neighbors, credal K-Nearest Neighbors, Bayes, decision tree, SVM and credal neural network). The learning rate is equal to 10%.
We show in Tables 4, 5, 6, 7, 8 and 9 the classification rates obtained before and after fusion for the new mechanism in a very noisy environment.
We vary the noise levels and show the results obtained with the following levels: 55%, 65% and 70%, for Iris, Abalone, Yeast, Wine, Sensor-readings4 and Sensor-readings2.
4.1. Experimentation
The number of clusters may be equal to the number given by the supervised classification or fixed by the user. The tests conducted are independent for the three levels of noise, meaning that they were not made in the same iteration of the program. In the following, we present the data (Table 1) and the results obtained (Tables 2, 3, 4, 5, 6, 7, 8 and 9).
Table 1. Data characteristics (NbA: number of attributes, NbC: number of classes, NbCl: number of clusters tested)

Data                 NbA   NbC   NbCl
Iris                   5     3     3
Abalone                8     2     2
Breast-cancer         11     3     3
Car                    6     4     4
Wine                  13     3     3
Sensor-readings24      5     4     4
Sensor-readings2       2     4     4
Sensor-readings4       4     4     4
Yeast                  8    10    10
Cmc                    9     3     3
4.2. Discussion
Looking at the results shown in Table 2, we note the following for each dataset:
1. Iris: The performance obtained after fusion is equal to 100%; the exceptions are decision tree and neural network, for which there is no improvement: the classification rate stays at approximately 66%.
2. Abalone: The performance obtained after fusion is better than that obtained before fusion; the exception is decision tree, for which there is no improvement: the classification rate stays at 31.28%. The best result is obtained for KNN with the mixture model: 97.58%.
3. Breast-cancer: The performance obtained after fusion is equal to 100% (KNN, Bayes, decision tree, neural network, credal KNN); the exceptions are SVM and the credal neural network, whose classification rate is approximately 65%.
4. Car: The classification rate after fusion is better in most cases: 100% (KNN and credal KNN), 96% (Bayes), 92% (decision tree). For SVM, the neural network and the credal neural network, the performance is lower than before fusion, equal to 70%.
5. Wine: The classification rates obtained after fusion are equal to 100% (KNN, Bayes, decision tree, neural network, credal KNN), 73% for the credal neural network, and approximately 40% for SVM.
6. Sensor-readings24: The classification rates obtained after fusion are equal to 100% (KNN, Bayes, decision tree, credal KNN, SVM) and 99% for the neural network.
Table 2. Classification rates (%) obtained before and after fusion

Method                                   Iris    Abalone  Breast-cancer  Car     Wine    Sensor-readings24  Cmc
KNN                                      90.37   50.73    57.87          83.67   67.50   75.36              48.38
KNN + FCM                                100     97.21    100            100     100     100                100
KNN + K-Means                            100     78.13    100            100     100     100                100
KNN + Mixture model                      100     97.58    100            100     100     100                100
Bayes                                    94.81   50.65    94.91          76.53   89.38   61.61              47.40
Bayes + FCM                              100     62.89    100            96.27   100     100                100
Bayes + K-Means                          100     62.92    100            96.27   100     100                100
Bayes + Mixture model                    100     63.39    100            96.27   100     100                100
Decision tree                            66.67   31.28    93.32          74.92   64.38   94.13              37.06
Decision tree + FCM                      66.67   31.28    100            92.22   100     100                100
Decision tree + K-Means                  66.67   31.28    100            92.22   100     100                100
Decision tree + Mixture model            66.67   31.28    100            92.22   100     100                100
Neural network                           64.44   53.02    95.23          70.10   63.13   72.10              39.17
Neural network + FCM                     66.67   79.04    100            70.03   100     99.76              65.28
Neural network + K-Means                 66.67   83.51    100            70.03   100     99.31              65.28
Neural network + Mixture model           66.67   72.44    100            70.03   100     99.63              65.28
Credal KNN                               94.81   49.88    60.25          82.57   74.38   75.82              44.15
Credal KNN + FCM                         100     56.90    100            100     100     100                100
Credal KNN + K-Means                     100     57.62    100            100     100     100                100
Credal KNN + Mixture model               100     55.60    100            100     100     100                100
SVM                                      93.33   52.86    65.50          70.35   39.38   52.77              43.09
SVM + FCM                                100     65.02    65.50          70.10   38.75   100                54.87
SVM + K-Means                            100     66.37    65.50          70.10   40.00   100                55.55
SVM + Mixture model                      100     66.45    65.50          70.10   33.13   100                61.43
Credal neural network                    96.30   53.31    65.66          73.70   66.88   64.01              45.96
Credal neural network + FCM              100     62.52    65.66          70.03   73.13   100                99.25
Credal neural network + K-Means          100     60.52    65.66          70.03   73.13   100                99.17
Credal neural network + Mixture model    100     57.81    65.66          70.03   73.13   100                99.77
7. Cmc: We obtain 100% in most cases (KNN, Bayes, decision tree, credal KNN), 99% for the credal neural network, 65%