International Journal of Public Health Science (IJPHS)
Vol.3, No.4, December 2014, pp. 231~240
ISSN: 2252-8806
Journal homepage: http://iaesjournal.com/online/index.php/IJPHS
Using Data Mining to Predict Possible Future Depression Cases
Kevin Daimi, Shadi Banitaan
Computer Science and Software Engineering, University of Detroit Mercy, USA
Article Info
Article history:
Received Oct 28, 2014
Revised Nov 14, 2014
Accepted Nov 26, 2014
ABSTRACT
Depression is a disorder characterized by misery and gloominess felt over a period of time. Some symptoms of depression overlap with those of other somatic illnesses, which makes it considerably difficult to diagnose. This paper contributes to its diagnosis through the application of data mining, namely classification, to predict patients who will most likely develop depression or are currently suffering from it. Synthetic data was used for this study. To acquire the results, WEKA, the popular suite of machine learning software, was used.
Keywords:
Classification
Data Mining
Depression
Diagnosis
Healthcare
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
Corresponding Author:
Kevin Daimi,
Department of Mathematics, Computer Science and Software Engineering,
University of Detroit Mercy,
4001 West McNichols Road, Detroit, MI 48334, USA.
Email: daimikj@udmercy.edu
1. INTRODUCTION
Data mining is a multidisciplinary field that draws on various fields, including database management systems, artificial intelligence, machine learning, neural networks, statistics, pattern recognition, knowledge-based systems, knowledge acquisition, information retrieval, high-performance computing, and data visualization. Data mining deals with the extraction of implicit, previously unknown, and potentially useful information from data [1]-[7].
Data mining applications in healthcare are constantly increasing and becoming more popular. Data mining can play a major role in healthcare by allowing insurers to uncover fraud and abuse, improving healthcare customer relationship management decisions, helping physicians identify effective treatments and best practices, identifying risk factors associated with the onset of diabetes, and enabling patients to receive better and more affordable healthcare services [8].
Healthcare data mining provides myriad opportunities for exploring hidden patterns in huge healthcare data stores. Physicians can use these patterns to establish diagnoses, prognoses, and treatments for patients in healthcare organizations [9].
Wang et al. [10] investigated the use of data mining in the healthcare industry. The enormous volumes of healthcare data are regarded as among the most challenging and difficult of all data to work with. Suitable data mining practices offer the techniques and tools to transform these voluminous amounts of data into valuable information for decision making. Within healthcare, data mining can be employed to aid in discovering cures for current diseases, uncovering patterns in genetic diseases, and recognizing the causes of new diseases worldwide.
According to Obenshain [11], “Business and marketing organizations may be ahead of healthcare in applying data mining to derive knowledge from data. This is quickly changing. Successful mining applications have been implemented in the healthcare arena. Further exploration of data mining for research related to infection control and hospital epidemiology seems in order, especially where the data volume exceeds capabilities of traditional statistical techniques.”
The techniques of data mining have a number of applications in healthcare. Agrawal et al. [12] applied classification to analyze colon surgery data. They used data mining techniques to construct risk prediction models for undesirable post-operative consequences of colon surgery.
Data mining was also applied to heart transplant data from the United Network for Organ Sharing (UNOS) program to predict the risk of mortality within 1 year of heart transplant. The goal was to aid physicians in their decision-making process by furnishing them with patient-specific risk assessments [13].
Tahsin et al. [14] described a data mining application for developing systems that automatically classify drug pairs mentioned in text into one of the following five classes: no interaction, advice, effect, mechanism, and generic interaction.
Depression is a serious medical condition accompanied by a disruption in mood, thought, and body, causing a person to feel very miserable and fruitless, and frequently to lack the ability to experience a normal life. The symptoms and impact of depression have been analyzed by a number of researchers.
Biological symptoms of depression were studied by Matthew et al. [15]. Using 37 depressed patients, they studied the frequency of episodes of diverse biological symptoms in relation to the intensity of depression and neuroticism. When the symptoms were studied individually, they observed that the best single signs of depression severity and neuroticism were early waking and excessive dreaming. They deployed stepwise multiple regression analyses of variance to find a group of biological symptoms predicting the severity of depression. The technique was less successful in providing insight into the severity of neuroticism. However, they further observed that neuroticism was a useful predictor of the biological symptoms when these were taken as a whole.
Trivedi [16] investigated the role played by physical symptoms in depression. The author stressed that unexplained aches and pains are frequently the presenting symptoms of depression. These symptoms include chronic joint pain, limb pain, back pain, gastrointestinal problems, tiredness, sleep disturbances, psychomotor activity changes, and appetite changes. A large percentage of patients who suffer from depression report only their physical symptoms, which can make diagnosing depression a difficult task. The author emphasized that physical pain and depression share a stronger biological connection than simple cause and effect.
A number of studies have addressed the prediction of depression using various techniques. Van Voorhees et al. [17] carried out a study aimed at demonstrating how a risk prediction index could support depression prevention by pinpointing the patients most likely to benefit from preventive procedures in primary care settings. They adopted social and cognitive vulnerability and mood as baseline risk factors to predict the onset of a depressive episode at 1-year follow-up. They relied on boosted classification and regression trees to develop a prediction index suitable for a personal computer or hand-held device.
De Choudhury et al. [18] looked at the potential of using social media to identify and diagnose major depressive disorder in individuals. They first employed crowdsourcing to gather a set of Twitter users who reported being diagnosed with clinical depression, based on a standard psychometric instrument. Using these users' social media postings over the year prior to the onset of depression, they measured behavioral attributes relating to social engagement, emotion, language and linguistic styles, ego network, and mentions of antidepressant medications. These behavioral cues were the basis for building a statistical classifier that offered estimates of the risk of depression.
Tung et al. [19] introduced a statistical inference approach, named the Negative Emotion Evaluation (NEE) Model, to explore the depression tendency of web posts. For this purpose, a dataset of Chinese forum posts was collected from the PTT Prozac zone in Taiwan. Each post was classified and verified in terms of four depression tendency variables, namely negative emotion, triggering event, symptom, and negative thinking. These variables were derived from the definition of a major depressive episode in the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR).
Lee et al. [8] investigated the association between the chronic obstructive pulmonary disease (COPD) assessment test (CAT) and depression in COPD patients. Their results indicated that CAT scores are significantly associated with the presence of depression and have good accuracy for predicting depression in COPD patients. In addition, among the eight items of the CAT, the energy score showed the best correlation with the presence of depression. Fuller et al. [20] investigated the association between migraine and depression. They found that migraine is associated with higher odds of current depression among Canadians. They also found that those with depression were younger, unmarried, and poorer, and had activity limitations. Further work on predicting depression can be found in [4], [21]-[24].
In this paper, a data mining application based on classification is proposed to predict who would be a possible candidate for developing depression. Synthetic data is used to train and test the classification model. Section 2 introduces the attributes used for this study. Section 3 presents the training and testing of the model. Section 4 deals with making predictions on new, unseen data. Section 5 interprets the results, and Section 6 concludes the paper. The well-known WEKA tool is adopted for this study.
2. ATTRIBUTE SELECTION
Attribute selection (here, the selection of depression symptoms) is one of the most important processes in data mining. This process involves selecting an effective subset of the relevant attributes, or features, needed for constructing the data mining model. Rushing this process can result in selecting redundant and unnecessary attributes, which could heavily affect the constructed model and the outcomes of the mining process.
This work relied on a number of online surveys and questionnaires, including those presented in [25]-[27], to select the attributes needed for classifying depression. The selected set of attributes was further enlarged by adding more attributes from the references mentioned above on depression and on predicting depression. After filtering out redundancies, the selected set contained 50 attributes. Following consultation with faculty at the College of Health Professions, this set was reduced to 31 attributes, including the class variable “May Have Depression.” The final set of attributes is presented in Table 1 below.
Table 1. Attributes Set
No. | Attribute | Values
1 | Sadness | None: 0, Mild: 1, Medium: 2, Serious: 3
2 | Discouragement | None: 0, Mild: 1, Medium: 2, Serious: 3
3 | Low self-esteem | None: 0, Mild: 1, Medium: 2, Serious: 3
4 | Inferiority | None: 0, Mild: 1, Medium: 2, Serious: 3
5 | Guilt | None: 0, Mild: 1, Medium: 2, Serious: 3
6 | Indecisiveness | None: 0, Mild: 1, Medium: 2, Serious: 3
7 | Irritability and frustration | None: 0, Mild: 1, Medium: 2, Serious: 3
8 | Loss of interest in life | None: 0, Mild: 1, Medium: 2, Serious: 3
9 | Loss of motivation | None: 0, Mild: 1, Medium: 2, Serious: 3
10 | Poor self-image | None: 0, Mild: 1, Medium: 2, Serious: 3
11 | Poor memory | None: 0, Mild: 1, Medium: 2, Serious: 3
12 | Loss of libido | None: 0, Mild: 1, Medium: 2, Serious: 3
13 | Hypochondriasis | None: 0, Mild: 1, Medium: 2, Serious: 3
14 | Suicidal impulse | None: 0, Mild: 1, Medium: 2, Serious: 3
15 | Sluggish | None: 0, Mild: 1, Medium: 2, Serious: 3
16 | Crying spells | None: 0, Mild: 1, Medium: 2, Serious: 3
17 | Lack of emotional responsiveness | None: 0, Mild: 1, Medium: 2, Serious: 3
18 | Helplessness | None: 0, Mild: 1, Medium: 2, Serious: 3
19 | Pessimism | None: 0, Mild: 1, Medium: 2, Serious: 3
20 | Agitation | None: 0, Mild: 1, Medium: 2, Serious: 3
21 | Past failure | None: 0, Mild: 1, Medium: 2, Serious: 3
22 | Reduced pain tolerance | None: 0, Mild: 1, Medium: 2, Serious: 3
23 | Desire for Social Support | None: 0, Mild: 1, Medium: 2, Serious: 3
24 | Psychomotor retardation | None: 0, Mild: 1, Medium: 2, Serious: 3
25 | Confusion | None: 0, Mild: 1, Medium: 2, Serious: 3
26 | Scatterbrained | None: 0, Mild: 1, Medium: 2, Serious: 3
27 | Cognitive impairment | None: 0, Mild: 1, Medium: 2, Serious: 3
28 | Loss of warm feeling toward family or friends | None: 0, Mild: 1, Medium: 2, Serious: 3
29 | Substance Abuse | None: 0, Mild: 1, Medium: 2, Serious: 3
30 | Childhood trauma | None: 0, Mild: 1, Medium: 2, Serious: 3
31 | May Have Depression | Yes, No
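WEKA's native input format is ARFF (Attribute-Relation File Format). The paper does not show its input file, so the following Python sketch is only one plausible encoding of Table 1: it assumes each symptom is declared as a nominal attribute with values {0,1,2,3} and the class as {Yes,No}, and the relation name `depression` and the function `to_arff` are hypothetical. Declaring the symptoms as numeric would also be valid ARFF and would change how J4.8 splits on them.

```python
# Attribute names in Table 1 order (columns 1-30 of Tables 2-4).
SYMPTOMS = [
    "Sadness", "Discouragement", "Low self-esteem", "Inferiority", "Guilt",
    "Indecisiveness", "Irritability and frustration", "Loss of interest in life",
    "Loss of motivation", "Poor self-image", "Poor memory", "Loss of libido",
    "Hypochondriasis", "Suicidal impulse", "Sluggish", "Crying spells",
    "Lack of emotional responsiveness", "Helplessness", "Pessimism", "Agitation",
    "Past failure", "Reduced pain tolerance", "Desire for Social Support",
    "Psychomotor retardation", "Confusion", "Scatterbrained",
    "Cognitive impairment", "Loss of warm feeling toward family or friends",
    "Substance Abuse", "Childhood trauma",
]


def to_arff(rows, relation="depression"):
    """Render (values, label) pairs as ARFF text; values use the 0-3 scale of Table 1."""
    lines = [f"@relation {relation}", ""]
    for name in SYMPTOMS:
        # Names are quoted because many contain spaces; each symptom is nominal 0-3.
        lines.append(f"@attribute '{name}' {{0,1,2,3}}")
    lines.append("@attribute 'May Have Depression' {Yes,No}")
    lines += ["", "@data"]
    for values, label in rows:
        if len(values) != len(SYMPTOMS) or any(v not in (0, 1, 2, 3) for v in values):
            raise ValueError("each row needs 30 symptom values in the range 0-3")
        if label not in ("Yes", "No", "?"):  # '?' marks an unknown class in ARFF
            raise ValueError("label must be Yes, No, or ?")
        lines.append(",".join(str(v) for v in values) + f",{label}")
    return "\n".join(lines) + "\n"


# Training row 301 from Table 2.
row_301 = [2, 2, 2, 1, 1, 3, 0, 1, 0, 2, 3, 1, 1, 2, 1, 1, 1, 2, 2, 3,
           0, 2, 1, 2, 2, 1, 1, 1, 2, 0]
print(to_arff([(row_301, "No")]))
```

Unseen cases such as those in Table 4 can be written with the label `?`, which is how ARFF marks a missing class value.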
3. MODEL CREATION AND TESTING
In this study, classification is deployed to find hidden patterns in data. To create the model, a classification algorithm must be applied; here, the C4.5 decision tree algorithm is employed. WEKA implements a later and slightly improved version, namely C4.5 revision 8, which it refers to as J4.8. The results of the depression classification model are obtained using J4.8.
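C4.5, and hence J4.8, grows its tree by repeatedly choosing the attribute whose split best separates the classes, measured by the gain ratio (information gain divided by split information). The Python sketch below computes only that criterion for one categorical attribute. It is not WEKA's J48 code and omits the rest of C4.5 (the average-gain filter applied before ranking by gain ratio, numeric thresholds, missing-value handling, and pruning); the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (bits) of the empirical distribution of `items`."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` by the categorical `values`."""
    if len(values) != len(labels) or not labels:
        raise ValueError("values and labels must be non-empty and of equal length")
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected class entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition itself (penalizes many small branches).
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy example: a symptom rated 0 or 2 that perfectly separates the two classes.
print(gain_ratio([0, 0, 2, 2], ["No", "No", "Yes", "Yes"]))
```

A symptom whose ratings are unrelated to the class gets a gain ratio of 0, so the tree would not choose it for a split.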
Splitting a dataset into training and testing sets is a central part of assessing data mining models. Normally, when a dataset is divided into a training set and a testing set, the larger portion of the data is used for training and a smaller fraction is used for testing. Here, the J4.8 algorithm is trained on a 600-instance dataset and tested on a 400-instance dataset.
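The paper uses two separate synthetic datasets (600 training and 400 testing instances). If the instances instead came from a single pool, a 600/400 holdout split could be sketched in Python as below; the pooled set of 1,000 instances, the function name, and the fixed seed are assumptions for illustration only.

```python
import random


def holdout_split(instances, n_train, seed=1):
    """Shuffle a copy of `instances` and split it into (train, test)."""
    if not 0 < n_train < len(instances):
        raise ValueError("n_train must leave at least one instance on each side")
    shuffled = list(instances)             # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical pool of 1,000 case identifiers split 600/400 as in the paper.
train, test = holdout_split(range(1000), 600)
print(len(train), len(test))
```

For classification, a stratified split that keeps the Yes/No ratio equal in both parts is often preferred; this sketch does not stratify.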
Table 2 depicts a sample of the training data. The numbers in the header row represent the symptoms (attributes) listed in Table 1, and column 31 is the class. The numbers in the first column indicate rows 301 to 320 out of the 600 training rows (depression cases).
Table 2. Sample training data
Row | Attribute values 1-30 (Table 1 order) | 31 (May Have Depression)
301 | 2 2 2 1 1 3 0 1 0 2 3 1 1 2 1 1 1 2 2 3 0 2 1 2 2 1 1 1 2 0 | No
302 | 2 3 2 3 1 1 2 1 1 1 1 2 2 3 2 3 1 1 2 2 1 3 3 3 2 1 3 3 2 3 | Yes
303 | 1 1 1 2 3 2 2 3 2 1 2 2 1 1 0 3 1 1 3 3 2 3 1 1 2 2 3 2 3 3 | Yes
304 | 2 3 2 2 0 3 2 1 1 1 1 1 1 1 1 1 2 0 1 1 1 1 1 1 2 1 3 1 3 0 | No
305 | 2 1 1 2 1 1 1 1 2 3 0 3 3 3 3 3 1 3 3 1 1 3 3 2 1 3 1 3 1 1 | No
306 | 1 2 2 3 3 1 1 2 1 3 3 1 2 2 2 2 1 2 3 1 2 1 1 2 2 2 3 1 2 3 | Yes
307 | 2 1 2 2 1 3 1 3 2 3 3 1 3 3 3 1 1 1 3 1 3 3 2 2 2 1 2 1 3 3 | Yes
308 | 2 3 2 2 3 2 3 2 2 1 2 3 1 2 3 1 2 2 3 3 2 2 1 3 1 1 2 1 3 2 | Yes
309 | 3 3 3 1 1 2 1 2 1 2 2 2 2 2 2 3 1 3 2 1 2 1 2 1 2 1 3 2 2 3 | Yes
310 | 3 1 2 1 3 1 3 3 2 1 1 1 1 3 1 2 2 2 2 1 0 1 3 3 3 2 3 2 3 3 | Yes
311 | 1 1 2 2 1 3 2 1 2 1 1 3 2 2 1 2 0 1 3 3 2 2 3 2 1 2 3 3 3 0 | No
312 | 1 2 3 2 3 1 1 1 1 0 3 1 1 2 1 2 2 1 2 1 1 1 0 1 1 1 1 3 1 0 | No
313 | 0 0 3 1 1 0 3 0 1 1 3 3 3 0 3 3 0 0 2 3 2 2 0 3 3 0 3 0 0 0 | No
314 | 2 2 2 2 2 1 1 3 3 2 2 1 1 1 2 3 1 2 1 1 2 3 1 2 1 2 1 3 3 3 | Yes
315 | 1 1 2 2 3 0 1 2 1 3 2 3 3 1 1 0 2 3 1 2 3 1 2 3 2 0 2 2 1 3 | No
316 | 0 3 3 1 1 3 2 3 2 2 1 3 2 2 3 1 1 3 3 1 2 2 3 2 1 2 2 3 3 3 | Yes
317 | 1 2 1 2 1 1 3 1 2 1 2 1 2 2 3 1 1 1 0 3 3 2 2 2 1 1 1 2 2 3 | No
318 | 1 1 2 1 3 1 1 3 2 2 2 2 3 3 1 2 2 3 3 2 3 1 2 1 2 2 1 2 3 1 | Yes
319 | 1 3 1 2 1 3 2 1 3 2 1 2 1 2 2 1 1 3 1 1 0 1 2 2 0 3 3 1 2 0 | No
320 | 1 3 3 1 1 2 0 2 3 3 1 3 3 3 2 1 1 1 1 3 2 1 2 3 3 3 1 3 2 2 | Yes
As mentioned above, synthetic data was used. This data was created using a Java program. Training the J4.8 algorithm provided encouraging results based on the subset of 30 attributes that were included after careful consideration and consultation with experts in the field of depression. The WEKA output of training J4.8 is summarized in Figure 1. As can be observed in Figure 1, 555 instances were correctly classified and 45 instances were incorrectly classified, so 92.5% of the instances were correctly classified. The relative absolute error was 24.94%, and the root relative squared error was 49.94%. Figure 1 also shows the confusion matrix below, indicating the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).
Confusion matrix (training) | Predicted Yes | Predicted No
Actual Yes | 263 (TP) | 34 (FN)
Actual No | 11 (FP) | 292 (TN)
Several classification metrics were used for evaluation, namely accuracy, precision, and recall. These metrics are defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
Based on the training data, accuracy = (263 + 292)/600 = 0.925, precision = 263/(263 + 11) ≈ 0.960, and recall = 263/(263 + 34) ≈ 0.886.
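These three metrics can be checked directly from the confusion-matrix counts, treating “Yes” as the positive class. The minimal Python sketch below (the function name is illustrative) applies the definitions to the training matrix and to the testing confusion matrix reported later in this section.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, and recall for the positive class ("Yes")."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),  # of the cases predicted Yes, the share that are Yes
        "recall": tp / (tp + fn),     # of the actual Yes cases, the share that were found
    }


training = classification_metrics(tp=263, fn=34, fp=11, tn=292)
testing = classification_metrics(tp=163, fn=40, fp=27, tn=170)
for name, metrics in (("training", training), ("testing", testing)):
    print(name, {k: f"{v:.4f}" for k, v in metrics.items()})
```

Exact arithmetic gives a training precision of 263/274 ≈ 0.9599 and recall of 263/297 ≈ 0.8855, and a testing accuracy of 333/400 = 0.8325.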
Figure 1. Training outcome
A test set is used to determine the accuracy (validation) of the model: the resulting model is applied to the testing instances. After the training phase was completed satisfactorily, 400 rows (depression cases) of data were used to test the created model. Table 3 shows a sample of the testing instances (cases 101 to 120), and Figure 2 depicts the testing outcome. Figure 2 illustrates that 333 instances were correctly classified and 67 instances were incorrectly classified; that is, 83.25% of the instances were correctly classified. The relative absolute error was 24.94%, and the root relative squared error was 49.94%. These relative error values compare the model's error with that of a simple baseline predictor, compensating for how inherently predictable or unpredictable the class variable is. The confusion matrix for testing is given below.
Confusion matrix (testing) | Predicted Yes | Predicted No
Actual Yes | 163 (TP) | 40 (FN)
Actual No | 27 (FP) | 170 (TN)
The testing metrics for the model are: accuracy = (163 + 170)/(163 + 170 + 40 + 27) ≈ 0.833, precision = 163/(163 + 27) ≈ 0.858, and recall = 163/(163 + 40) ≈ 0.803.
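The relative errors divide the model's total error by the total error of a trivial predictor. The Python sketch below uses the textbook definitions, RAE = Σ|p − a| / Σ|ā − a| and RRSE = √(Σ(p − a)² / Σ(ā − a)²), with a = 1 for “Yes” and 0 for “No,” p the predicted probability of “Yes,” and ā the mean of the actual values. This is a simplification and an assumption on our part: WEKA derives its baseline from the class priors of the training data and works on full class-probability vectors, so the sketch is not expected to reproduce the 24.94% and 49.94% reported above.

```python
from math import sqrt


def relative_errors(predicted, actual):
    """Return (RAE %, RRSE %) relative to a predictor that always outputs mean(actual)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    base_abs = sum(abs(mean - a) for a in actual)
    base_sq = sum((mean - a) ** 2 for a in actual)
    if base_abs == 0:
        raise ValueError("all actual values are equal; the baseline error is zero")
    model_abs = sum(abs(p - a) for p, a in zip(predicted, actual))
    model_sq = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return 100 * model_abs / base_abs, 100 * sqrt(model_sq / base_sq)


# Hypothetical probabilities of "Yes" for four cases whose true labels are Yes, No, Yes, No.
rae, rrse = relative_errors([0.9, 0.1, 0.9, 0.1], [1, 0, 1, 0])
print(f"RAE = {rae:.2f}%, RRSE = {rrse:.2f}%")
```

A model that does no better than the baseline scores 100% on both measures, so lower values are better.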
Table 3. Sample testing data
Case | Attribute values 1-30 (Table 1 order) | 31 (May Have Depression)
101 | 3 3 3 1 1 3 3 1 2 2 2 3 2 2 3 1 1 2 2 3 2 1 3 3 2 2 1 1 3 3 | Yes
102 | 2 1 1 3 1 1 1 1 1 3 1 2 1 3 3 2 1 1 2 3 3 1 1 1 1 1 1 1 3 0 | No
103 | 2 1 2 2 2 1 3 2 2 1 1 2 3 2 1 1 1 2 3 1 2 3 1 1 1 3 2 1 2 3 | No
104 | 2 2 3 3 2 3 2 2 2 3 1 2 3 2 3 0 2 1 3 3 1 2 1 2 1 2 0 3 2 1 | No
105 | 2 1 2 2 3 3 3 3 3 2 2 2 2 2 1 2 2 1 1 3 1 2 3 2 2 2 1 3 3 2 | Yes
106 | 2 2 2 2 2 2 2 3 3 1 1 1 2 2 2 1 2 1 2 3 2 2 1 3 1 2 1 1 1 2 | No
107 | 1 0 3 2 1 1 3 1 0 1 1 2 1 2 2 1 3 1 1 3 2 1 3 3 3 3 1 2 0 1 | No
108 | 3 1 2 3 3 1 1 1 1 2 1 2 1 1 1 1 1 2 2 3 1 1 2 1 1 3 1 1 1 0 | No
109 | 2 1 1 3 1 2 3 2 2 2 1 2 1 3 3 3 2 2 3 2 2 2 1 3 2 2 3 3 2 3 | Yes
110 | 1 1 3 1 2 3 0 2 1 1 3 3 2 1 3 2 3 3 2 2 2 3 1 1 3 2 2 1 3 3 | Yes
111 | 2 1 1 1 3 1 2 3 3 3 2 3 1 1 1 3 3 3 1 1 1 3 2 1 2 1 3 3 3 2 | Yes
112 | 0 3 1 3 1 2 3 3 2 1 2 2 2 3 1 2 3 2 3 2 2 3 1 3 2 3 3 2 2 2 | Yes
113 | 3 3 3 3 2 3 1 3 1 1 2 3 0 2 3 2 3 1 2 2 3 3 2 3 0 2 1 2 1 1 | No
114 | 2 0 1 3 0 1 2 3 2 1 3 3 3 2 1 2 3 1 1 1 3 0 1 1 1 1 2 3 2 0 | No
115 | 3 1 2 3 1 2 1 3 3 1 2 3 3 2 2 1 2 2 2 1 3 1 1 1 3 1 1 2 1 1 | No
116 | 1 3 3 2 3 2 3 2 1 2 3 0 2 3 1 2 3 1 2 2 2 1 1 0 3 2 3 1 3 2 | Yes
117 | 3 1 3 1 2 3 3 3 1 2 2 2 1 2 2 1 1 1 2 2 2 1 2 1 2 2 1 3 3 3 | Yes
118 | 1 1 3 2 2 1 2 1 3 2 1 2 2 1 3 1 1 1 1 1 2 2 3 1 1 3 1 2 3 3 | No
119 | 1 3 2 1 3 1 2 2 2 1 1 0 3 3 1 2 3 1 3 2 2 1 3 2 1 1 1 2 3 3 | Yes
120 | 3 3 2 3 2 1 1 3 2 3 3 0 3 2 2 1 3 1 1 0 2 1 3 3 1 2 0 1 0 1 | No
Figure 2. Testing outcome
4. MODEL USAGE
After training and testing, the created model can be used to classify unknown instances, provided that its classification accuracy is adequate. The classification model should be capable of predicting unseen instances using what it has learned. It is also desirable to re-train it periodically using new training data. The depression classification model was used to predict 20 unseen instances by re-evaluating the model on these instances. Table 4 depicts these predictions. Out of the 20 instances (unknown depression cases), 13 were classified as “No” and 7 as “Yes.” Column 31 represents the diagnosis.
Another way of showing the results is to provide the probability distribution for the predictions, as illustrated in Figure 3. The actual classes are unknown, and therefore ‘?’ is displayed under the “actual” column. The “predicted” column contains the predictions (class labels). A ‘+’ under the “error” column means that the actual and predicted classes are not the same. Since the actual class is unknown (?), none of the predicted classes matches it, so every row is marked. In other words, these marks are not real errors, because no actual classes exist or are known. On the other hand, if a ‘+’ appears during the testing of the model, the ‘+’ symbol is significant. There are two probability distribution columns: the first is for class 1 (No), and the second for class 2 (Yes). The ‘*’ next to a probability marks the probability of the predicted class, that is, the class shown under “predicted.” For each row, the probabilities of the two classes sum to 1. Taking row 1 as an example, 0.923 and 0.077 add to 1. These values indicate that class 1 was predicted with a probability of 0.923, and there is only a very small probability, 0.077, that the case belongs to class 2. Out of the twenty instances, ten were predicted with a probability of 1: 9 No's and 1 Yes. The smallest probability for predicting class 1 (No) is 0.778, and the smallest probability for predicting class 2 (Yes) is 0.829. Therefore, all predictions were made with high confidence, which suggests they are trustworthy and reliable.
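In a decision tree such as J4.8, the class distribution reported for a case comes from the training instances that reached the same leaf, and the predicted class is the most probable one. The Python sketch below illustrates that idea only. The leaf counts (12 “No” and 1 “Yes”) are hypothetical, chosen because they yield the 0.923/0.077 split reported for row 1; the paper does not report the actual leaf contents, and J48's real distributions also depend on the pruned tree and on options such as Laplace smoothing. Following the interpretation above, an unknown actual class is treated as "no error", whereas WEKA itself prints ‘+’ in that case.

```python
def leaf_distribution(counts):
    """Turn per-class training counts at a leaf into class probabilities."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("a leaf needs at least one training instance")
    return {cls: n / total for cls, n in counts.items()}


def predict(distribution, actual=None):
    """Return (predicted class, its probability, error flag).

    The error flag is only meaningful when the actual class is known;
    for unseen cases (actual is None, shown as '?' by WEKA) it is False.
    """
    predicted = max(distribution, key=distribution.get)
    error = actual is not None and actual != predicted
    return predicted, distribution[predicted], error


dist = leaf_distribution({"No": 12, "Yes": 1})  # hypothetical leaf counts
label, prob, error = predict(dist)
print(label, round(prob, 3), round(dist["Yes"], 3), error)
```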
The probability distributions mentioned above are normally important when further analysis and research by a human is needed. When dealing with depression, physicians will certainly pursue further examination before adopting the recommendation (prediction). Hence, these probability distributions are essential for medical applications of data mining, and it is appropriate to take them into consideration.
Table 4. Unknown cases with their diagnosis
Case | Attribute values 1-30 (Table 1 order) | 31 (Diagnosis)
1 | 2 1 2 2 3 2 1 2 2 2 2 1 1 2 1 3 0 2 2 2 3 1 1 2 1 1 2 1 1 1 | No
2 | 3 2 1 2 1 0 1 3 2 2 1 1 3 2 3 2 3 0 1 1 2 1 1 2 2 2 2 1 1 2 | No
3 | 2 3 2 3 1 3 1 3 2 2 3 1 1 0 1 1 1 1 1 2 1 2 1 1 1 3 3 1 3 0 | No
4 | 1 1 2 3 3 3 1 3 1 1 3 2 3 1 1 1 3 2 1 2 2 2 3 3 2 3 3 2 3 2 | Yes
5 | 2 1 0 2 2 2 2 3 1 2 1 2 2 1 1 1 3 0 1 3 2 1 3 1 3 1 2 1 3 2 | Yes
6 | 0 2 2 1 1 3 3 2 1 2 2 2 2 1 2 3 1 2 1 2 1 1 1 2 1 1 2 2 2 1 | No
7 | 1 3 1 3 3 2 1 1 2 3 2 1 2 2 1 2 1 0 2 2 1 1 1 1 2 1 2 2 2 2 | No
8 | 2 2 0 2 1 0 2 2 2 2 3 1 3 3 2 2 1 3 3 1 1 3 1 2 1 3 1 1 0 1 | No
9 | 3 2 1 3 2 1 3 1 3 1 1 2 1 2 1 2 2 2 1 3 2 3 3 3 3 2 3 2 1 1 | No
10 | 1 1 2 2 1 2 1 3 3 2 2 3 0 3 1 1 1 3 1 3 2 3 1 2 2 2 1 2 3 2 | Yes
11 | 3 2 3 3 2 3 1 1 3 2 1 3 3 2 2 1 1 2 1 3 1 3 2 3 2 1 1 1 2 1 | No
12 | 1 1 2 1 2 0 1 1 0 2 3 0 1 0 1 2 0 3 1 2 0 3 3 2 2 3 0 3 0 0 | No
13 | 2 2 1 2 2 1 3 0 2 1 1 3 3 2 3 3 2 2 2 3 2 1 3 1 2 3 1 1 0 3 | No
14 | 3 1 1 2 3 2 1 2 3 2 3 3 3 1 2 1 1 1 2 3 2 2 2 1 1 2 3 3 0 2 | No
15 | 1 1 2 1 2 1 2 2 3 3 2 1 2 1 1 1 1 3 2 3 0 3 1 2 1 2 3 3 3 2 | Yes
16 | 3 2 2 3 3 1 3 3 2 3 2 2 3 2 2 2 3 1 2 3 1 1 1 2 1 2 1 2 1 2 | No
17 | 0 1 1 1 0 3 3 2 1 3 1 1 3 1 2 3 1 2 3 1 1 1 2 2 3 1 0 2 2 2 | Yes
18 | 3 1 1 1 3 1 2 2 3 2 1 3 1 1 1 3 2 1 1 2 3 1 3 3 2 2 3 2 2 3 | Yes
19 | 1 0 3 3 3 0 2 0 3 0 1 3 3 2 1 3 0 0 2 2 2 2 2 1 2 1 3 0 0 0 | No
20 | 1 2 2 2 3 3 2 3 3 1 1 2 1 1 2 2 3 3 1 1 2 3 1 3 2 2 2 1 2 3 | Yes
Figure 3. WEKA output on unseen instances
5. RESULTS INTERPRETATION
To interpret the results, Table 4 will be used. This table contains all the data for the 20 unknown cases that need to be diagnosed. Each row (instance) represents a case to be diagnosed. The numbers in the first column represent the case number, and the numbers in the header row represent the symptoms (attributes) listed in Table 1. To interpret the diagnoses of cases (rows) 5 and 8, IF-THEN rules (IF conditions THEN conclusion) will be employed. Any symptom with a value of ‘0’ will not appear in the conditions of the rules, since Table 1 in Section 2 indicates that the value ‘0’ stands for “None.” Table 4 matches the outcomes of Figure 3 and was generated by the WEKA system. The following are two examples of using the prediction model.
Row 5 (case #5)
IF Sadness is medium & Discouragement is mild & Inferiority is medium & Guilt is medium & Indecisiveness is medium & Irritability and frustration is medium & Loss of interest in life is serious & Loss of motivation is mild & Poor self-image is medium & Poor memory is mild & Loss of libido is medium & Hypochondriasis is medium & Suicidal impulse is mild & Sluggish is mild & Crying spells is mild & Lack of emotional responsiveness is serious & Pessimism is mild & Agitation is serious & Past failure is medium & Reduced pain tolerance is mild & Desire for Social Support is serious & Psychomotor retardation is mild & Confusion is serious & Scatterbrained is mild & Cognitive impairment is medium & Loss of warm feeling toward family or friends is mild & Substance Abuse is serious & Childhood trauma is medium
THEN patient will develop depression (Yes)
Row 8 (case #8)
IF Sadness is medium & Discouragement is medium & Inferiority is medium & Guilt is mild & Irritability and frustration is medium & Loss of interest in life is medium & Loss of motivation is medium & Poor self-image is medium & Poor memory is serious & Loss of libido is mild & Hypochondriasis is serious & Suicidal impulse is serious & Sluggish is medium & Crying spells is medium & Lack of emotional responsiveness is mild & Helplessness is serious & Pessimism is serious & Agitation is mild & Past failure is mild & Reduced pain tolerance is serious & Desire for Social Support is mild & Psychomotor retardation is medium & Confusion is mild & Scatterbrained is serious & Cognitive impairment is mild & Loss of warm feeling toward family or friends is mild & Childhood trauma is mild
THEN patient will not develop depression (No)
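The two rules above restate a row of Table 4: every non-zero symptom is listed with its severity label, followed by the predicted class. They describe the case being diagnosed and are not necessarily the tests J4.8 applied along its tree path. A small Python helper (hypothetical, not part of WEKA) can generate such a rule from a row, which makes it easy to check the rules against the table.

```python
# Attribute names in Table 1 order (columns 1-30 of Table 4).
SYMPTOMS = [
    "Sadness", "Discouragement", "Low self-esteem", "Inferiority", "Guilt",
    "Indecisiveness", "Irritability and frustration", "Loss of interest in life",
    "Loss of motivation", "Poor self-image", "Poor memory", "Loss of libido",
    "Hypochondriasis", "Suicidal impulse", "Sluggish", "Crying spells",
    "Lack of emotional responsiveness", "Helplessness", "Pessimism", "Agitation",
    "Past failure", "Reduced pain tolerance", "Desire for Social Support",
    "Psychomotor retardation", "Confusion", "Scatterbrained",
    "Cognitive impairment", "Loss of warm feeling toward family or friends",
    "Substance Abuse", "Childhood trauma",
]
LEVELS = {1: "mild", 2: "medium", 3: "serious"}  # 0 ("None") is left out of rules


def case_to_rule(values, diagnosis):
    """Restate one case as an IF-THEN rule in the style of Section 5."""
    if len(values) != len(SYMPTOMS):
        raise ValueError("expected 30 symptom values")
    if diagnosis not in ("Yes", "No"):
        raise ValueError("diagnosis must be 'Yes' or 'No'")
    conditions = [f"{name} is {LEVELS[v]}" for name, v in zip(SYMPTOMS, values) if v != 0]
    outcome = "will develop depression" if diagnosis == "Yes" else "will not develop depression"
    return "IF " + " & ".join(conditions) + f" THEN patient {outcome} ({diagnosis})"


# Case 5 from Table 4, predicted "Yes".
case_5 = [2, 1, 0, 2, 2, 2, 2, 3, 1, 2, 1, 2, 2, 1, 1, 1, 3, 0, 1, 3,
          2, 1, 3, 1, 3, 1, 2, 1, 3, 2]
print(case_to_rule(case_5, "Yes"))
```

Apart from line breaks, the output for case 5 is the first rule above.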
6. CONCLUSION
Depression is an exponentially growing medical illness. It is hard to diagnose because many of its symptoms are shared with other somatic illnesses. In this paper, a large set of attributes (symptoms) was selected based on surveys and interviews with experts in the field of depression. Some of these attributes overlap with various somatic illnesses; taken together, however, the adopted attribute set is sufficient to distinguish depression from other illnesses. Synthetic data was used to train and test the classification model. As can be observed in the figures above, the outcomes for the synthetic data sets were reasonable in terms of the accuracy, precision, and recall of the training and testing processes.
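The accuracy, precision, and recall mentioned above follow their standard definitions: accuracy is the fraction of all cases classified correctly, precision is TP / (TP + FP), and recall is TP / (TP + FN), where TP, FP, and FN are the true positives, false positives, and false negatives for the positive class. The minimal Python sketch below computes them for a two-class Yes/No depression label. Treating "Yes" as the positive class, the function name `classification_metrics`, and the toy labels are illustrative assumptions, not the authors' data or WEKA's code. WEKA itself reports precision and recall per class together with a weighted average, so its figures can differ from a single-class calculation like this one.

```python
def classification_metrics(actual, predicted, positive="Yes"):
    """Return (accuracy, precision, recall) for one positive class.

    `actual` and `predicted` are equal-length sequences of class labels,
    e.g. "Yes" (will develop depression) and "No".
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    correct = sum(a == p for a, p in pairs)
    accuracy = correct / len(pairs)
    # Guard against division by zero when the positive class is never
    # predicted (precision) or never present (recall).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels for illustration only (not results from this study).
actual = ["Yes", "Yes", "No", "No", "Yes"]
predicted = ["Yes", "No", "No", "Yes", "Yes"]
print(classification_metrics(actual, predicted))
```

For these toy labels, three of five predictions are correct (accuracy 3/5 = 0.6), and with two true positives, one false positive, and one false negative, precision and recall are both 2/3.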
The depression classification application described above will be further improved in the future. First, the selected attributes will be discussed further with more experts in the field to derive the most effective attribute set. Once that is done, a survey will be created to collect real data, which will be used to train and test the model. Later, the model will be applied to unseen instances, and the outcomes will be compared with those obtained using the synthetic data.
ACKNOWLEDGMENTS
We would like to thank Dr. Carla Groh, professor of nursing at the McAuley School of Nursing, University of Detroit Mercy, for her valuable comments and suggestions regarding the attributes (symptoms) used in this study.
REFERENCES
[1] Dunham, MH., “Data Mining: Introductory and Advanced Topics”, Prentice Hall, 2003.
[2] Shapiro, G., Smyth, P., “From Data Mining to Knowledge Discovery in Databases”, AI Magazine, vol. 17, pp. 37-54, 1996.
[3] Han, J., Kamber, M., “Data Mining: Concepts and Techniques”, Morgan Kaufmann, 2006.
[4] Levin, HS., McCauley, SR., Josic, CP., Boake, C., Brown, SA., Goodman, HS., Merritt, SG., Brundage, SI., “Predicting Depression Following Mild Traumatic Brain Injury”, Archives of General Psychiatry, vol/issue: 62(5), pp. 523-528, 2005.
[5] Olson, D., Shi, Y., Kumar, V., “Introduction to Business Data Mining”, McGraw Hill, 2007.
[6] Parthasarathy, S., “Data Mining at the Crossroads: Successes, Failures and Learning From Them”, The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, pp. 1053-1055, 2007.
[7] Tan, P., Steinbach, M., Kumar, V., “Introduction to Data Mining”, Addison-Wesley, 2006.
[8] Koh, H., Tan, G., “Data Mining Applications in Healthcare”, Journal of Healthcare Information Management, vol/issue: 19(2), pp. 64-72, 2005.
[9] Milovic, B., Milovic, M., “Prediction and Decision Making in Health Care using Data Mining”, International Journal of Public Health Science (IJPHS), vol/issue: 1(2), pp. 69-78, 2012.
[10] Wang, J., Zhou, Z., Yan, R., “Benefits and Barriers in Mining the Healthcare Industry Data”, International Journal of Strategic Decision Sciences (IJSDS), vol/issue: 3(4), pp. 51-67, 2012.
[11] Obenshain, MK., “Application of Data Mining Techniques to Healthcare Data”, Infection Control and Hospital Epidemiology, vol/issue: 25(8), pp. 690-695, 2004.
[12] Fayyad, U., Piatetsky-Agrawal, A., Al-Bahrani, R., Merkow, R., Bilimoria, K., Choudhary, A., “Colon Surgery Outcome Prediction Using ACS NSQIP Data”, KDD Workshop on Data Mining for Healthcare (DMH), Chicago, IL, Aug. 2013.
[13] Agrawal, A., Russo, M., Raman, J., Choudhary, A., “Heart Transplant Outcome Prediction using UNOS Data”, KDD Workshop on Data Mining for Healthcare (DMH), Chicago, IL, Aug. 2013.
[14] Tahsin, T., Emadzadeh, E., Gonzalez, G., “Automated Extraction and Classification of Drug-Drug Interactions from Text”, KDD Workshop on Data Mining for Healthcare (DMH), Chicago, IL, Aug. 2013.
[15] Mathew, RJ., Largen, J., Claghorn, JL., “Biological Symptoms of Depression”, Psychosomatic Medicine, vol/issue: 41(6), pp. 439-443, 1979.
[16] Trivedi, MH., “The Link between Depression and Physical Symptoms”, The Primary Care Companion to the Journal of Clinical Psychiatry, vol/issue: 6(1), pp. 12-16, 2004.
[17] Van Voorhees, BW., Paunesku, D., Gollan, J., Kuwabara, S., Reinecke, M., Basu, A., “Predicting Future Risk of Depressive Episode in Adolescents: The Chicago Adolescent Depression Risk Assessment (CADRA)”, Annals of Family Medicine, vol/issue: 6(6), pp. 503-511, 2008.
[18] De Choudhury, M., Gamon, M., Counts, S., Horvitz, E., “Predicting Depression via Social Media”, The 7th International AAAI Conference on Weblogs and Social Media, Boston, Massachusetts, 2013.
[19] Tung, C., Lu, W., “Predict Depression Tendency of Web Posts using Negative Emotion Evaluation Model”, ACM SIGKDD Workshop on Health Informatics (HI-KDD 2012), Beijing, China, 2012.
[20] Fuller-Thomson, E., Schrumm, M., Brennenstuhl, S., “Migraine and Despair: Factors Associated with Depression and Suicidal Ideation among Canadian Migraineurs in a Population-Based Study”, Depression Research and Treatment, 2013.
[21] Abdel-Khalek, AM., “Can Somatic Symptoms Predict Depression?”, International Journal of Social Behavior and Personality, vol/issue: 32(7), pp. 657-666, 2004.
[22] Cloninger, CR., Svrakic, DM., Przybeck, TR., “Can Personality Assessment Predict Future Depression? A Twelve-Month Follow-Up of 631 Subjects”, Journal of Affective Disorders, vol. 92, pp. 35-44, 2006.
[23] Robinson, MS., Alloy, LB., “Negative Cognitive Styles and Stress-Reactive Rumination Interact to Predict Depression: A Prospective Study”, Cognitive Therapy and Research, vol/issue: 27(3), pp. 275-291, 2003.
[24] Rude, SS., Valdez, CR., Odom, S., Ebrahimi, A., “Negative Cognitive Biases Predict Subsequent Depression”, Cognitive Therapy and Research, vol/issue: 27(4), pp. 415-429, 2003.
[25] Beck Depression Inventory, Mood/Depression Assessment Questionnaire, Available: http://www.ibogaine.desk.nl/graphics/3639b1c_23.pdf.
[26] Burns Depression Checklist, University Health Services, University of California, Berkeley, 2010, Available: http://uhs.berkeley.edu/home/healthtopics/PDF%20Handouts/Depression%20Check%20List.pdf.
[27] Surveys of Adult U.S. Women and Doctors Gauge Perceptions about Depression through Hormonal Transitions, Society for Women's Health Research, 2007, Available: http://www.womenshealthresearch.org/site/DocServer/DepressionSurveyAnalysis.pdf?docID=1801.
BIOGRAPHIES OF AUTHORS
Kevin Daimi is a full professor and director of the Computer Science and Software Engineering programs at the University of Detroit Mercy, USA. He joined the University of Detroit Mercy in 1998 after working in industry for a number of years. Kevin received a Master of Science in Applied Computation (1980) and a Ph.D. in Computational Optimal Control (1983) from the University of Cranfield, England. He is a fellow of the British Computer Society (BCS), a senior member of the Association for Computing Machinery (ACM), a senior member of the Institute of Electrical and Electronics Engineers (IEEE), and a member of the IEEE Computer Society. His research interests include computer and network security, software engineering, data mining, and computer science and software engineering education.
Shadi Banitaan is currently an assistant professor in the Mathematics, Computer Science, and Software Engineering department at the University of Detroit Mercy. He teaches classes in Software Engineering and Computer Science. His research interests include software engineering and data mining. He is a member of the Association for Computing Machinery (ACM), a member of the Institute of Electrical and Electronics Engineers (IEEE), and a member of the IEEE Computer Society. He received a B.S. degree in Computer Science from Yarmouk University, an M.S. degree in Computer and Information Sciences from Yarmouk University, and a Ph.D. degree in Computer Science from North Dakota State University. He taught for five years at the University of Nizwa, Oman. He joined the University of Detroit Mercy in 2013.