TELKOMNIKA, Vol.14, No.2, June 2016, pp. 674~683
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i1.2385
Received February 28, 2016; Revised April 20, 2016; Accepted May 4, 2016
Cluster Analysis for SME Risk Analysis Documents
Based on Pillar K-Means
Irfan Wahyudin*¹, Taufik Djatna², Wisnu Ananta Kusuma³
¹,³Computer Science, Mathematic and Natural Science, Bogor Agricultural University
²Postgraduate Program in Dept. Agro-industrial Technology, Bogor Agricultural University,
Kampus IPB Darmaga P.O. Box 220 Bogor, (62-251)8621974/(62-251)8621974
Tel: (0251) 86228448, Fax: (0251) 8622986
*Corresponding author, e-mail: irfanwahyudin@apps.ipb.ac.id¹, taufikdjatna@ipb.ac.id², ananta@ipb.ac.id³
Abstract
In Small and Medium Enterprise (SME) financing risk analysis, a qualitative model, in which analysts give opinions regarding business risk, is implemented to overcome the subjectivity of the quantitative model. However, another problem remains: decision makers have difficulty quantifying the weight of the risks delivered through those opinions. Thus, we focused on three objectives to overcome the problems that often occur in qualitative model implementation. First, we modelled risk clusters using K-Means clustering, optimized by the Pillar Algorithm, to obtain the optimum number of clusters. Second, we performed risk measurement by calculating term-importance scores using TF-IDF combined with term-sentiment scores based on SentiWordNet 3.0 for Bahasa Indonesia. Finally, we summarized the result by correlating the featured terms in each cluster with the 5Cs Credit Criteria. The result shows that the model is effective in grouping and measuring the level of risk, and it can be used as a basis for decision makers in approving loan proposals.
Keywords: risk analysis, SME business, centroid optimization, K-Means, opinion mining, pillar algorithm, sentiment analysis
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Currently, there are two models that are widely used to implement risk assessment in financing, namely a quantitative model and a qualitative model [1]. Risk assessment for SME business in national banks in Indonesia is commonly dominated by the implementation of a credit scoring system (a quantitative model). Unfortunately, not all banks have successfully implemented these models. This condition is seen in a national private bank in Indonesia, where the Non Performing Loan (NPL) ratio has shown a rising trend over the last three years. This condition drove the top management of the bank to encourage the risk management division to refine the implementation of risk management.
From observation of the loan assessment's Standard Operating Procedure (SOP) in the bank where this research was conducted, we found some leakages in measuring the acceptance criteria. In fact, the leakages are found mainly in the credit scoring system and in the customer's financial quality analysis, which are part of what we call the quantitative model. In the quantitative model, objectivity is questionable, since the assessment is performed by the marketing staff, who stand on the customer's side. Moreover, the data originate from the customer and are therefore vulnerable to manipulation, especially when the financial statement has not been inspected by an external auditor. Hence, the qualitative model is deployed to overcome these drawbacks, where the analysis is performed objectively by risk analysts.
However, the implementation of the qualitative model is not wholly reliable, as indicated by the bank's NPL ratio. The other problem is that the qualitative model was not significantly used by the authorities in making decisions, since there is no model to quantify the weight of the business risks implicitly delivered through the opinions. Also, there are no decision criteria that can be used as a baseline in making the decision.
Accordingly, we formulate three objectives for this research to address the problem statements above: (1) to perform a clustering task to group the risk analysis documents, since no labeled documents exist yet; (2) to measure the risk level in each cluster using term-importance and sentiment analysis; and (3) to evaluate the clustering task and the sentiment measurement to reveal their implications for the criteria used in assessing loan risk. Machine learning techniques for credit scoring and risk quantification have been shown to be superior to traditional (statistical) techniques [2]. However, those machine learning techniques still used financial reports (numerical data) to compute the credit score; therefore, they are vulnerable to manipulation, as mentioned earlier. Thus, using opinion data from risk analysts, we suggest a new approach that performs sentiment analysis combined with machine learning to quantify the risk.
Regarding the techniques used in sentiment analysis, two major techniques are commonly used: machine learning based and lexicon based [3]. Commonly used supervised machine learning techniques include Support Vector Machine [4], Neural Networks [5], and Naive Bayes [6]. For unsupervised machine learning, there are several clustering techniques, such as K-Means [7] and hierarchical clustering [8].
For the lexicon-based approach, there are various lexicon resources that can be utilized as a dictionary to determine the polarity of terms, such as SentiWordNet [9], which is derived from a well-known corpus, WordNet, an English dictionary of word synonyms and antonyms. Another is SentiStrength [10], a lexicon-based technique distributed as a desktop application tool that is already combined with several popular supervised and unsupervised classifier algorithms: SVM, J48 classification tree, and Naive Bayes. A third is emoticon-based sentiment analysis, which is considered the simplest one [11]. Unfortunately, most lexicon dictionaries and corpus resources are designed for English. Some efforts have been made to overcome this shortfall, for instance by translating either the observed corpus [12] or the lexicon dictionary [13]. Moreover, sentiment analysis can also be used to support decision making.
Zhang et al. [14] employed sentiment analysis to aggregate network public sentiment for emergency decision making. In this research, Singular Value Decomposition (SVD) for dimension reduction and concept extraction is performed as an initialization, followed by a clustering task using K-Means optimized by a centroid initialization method, namely the Pillar Algorithm [15]. The term importance in the risk opinion corpus is measured using the widely used TF-IDF, combined with positive-negative polarity measurement using the SentiWordNet 3.0 library [8]. Unlike for English, as of today there is only one international research publication utilizing SWN 3.0 in Bahasa Indonesia, which aims to detect sarcasm in social media [13]. The translation problems are overcome by utilizing tools and techniques such as Google Translate and Kateglo (a Kamus Besar Bahasa Indonesia based dictionary), and by consulting banking experts on specific banking and finance terms.
2. Research Method
As a case study, we conducted this research in one of the national private banks in Indonesia, where SME financing is one of its core businesses. We collected about 519 risk analysis documents from the Risk Management division. All of the documents are in Microsoft Word format (*.doc and *.docx) and consist of narrative opinions in Bahasa Indonesia.
There are seven opinion points delivered in the documents: 1) Credit Scoring, 2) Financial Performance, 3) Proposed Loan Facility, 4) Business Performance, 5) Repayment Ability and Cash Flow, 6) Legal Analysis, and 7) Foreign Exchange (optional). All of the parts were analyzed based on the 5Cs Credit Criteria (Character, Capacity, Capital, Condition, and Collateral).
As seen in Figure 1 below, the research framework was divided into four parts: 1) Preprocessing, 2) Risk Clustering, 3) Risk Measurement, and 4) Evaluation. We discuss the details and results in the following sections.
2.1. Preprocessing
Processing every part of a risk analysis document is unnecessary, since some parts, such as the opening and closing sections, do not contain information about risk. We observed that risk analysis documents have three major parts: (1) Opening, (2) Opinion and mitigation, and (3) Closing and signature. Since this research needs only the opinions, we retrieved only the risk opinion and mitigation part by parsing the documents.
Figure 1. Research Framework
2.2. K-Means Clustering
K-Means clustering, a widely used centroid-based partitioning algorithm, was used to find how many risk clusters exist in the corpus. The original proposal of K-Means clustering used random centroid selection at the beginning of the iteration. This is not an issue for small datasets. However, for a large dataset it could take a long time to reach the best centroid selection. Thus, we implemented an optimization algorithm, the Pillar Algorithm, which was proposed to tackle this drawback of K-Means clustering in initializing centroids.
2.3. Singular Value Decomposition
The term-document matrix is a high-dimensional matrix, so computations on it could take a large amount of time. Thus, the dimension of the term-document matrix was reduced using Singular Value Decomposition (SVD). The best k vectors to represent the whole dataset [18] were selected based on formulation (1).
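$k \leftarrow \min\{\, k : \sum_{i=1}^{k} \sigma_i \,/\, \sum_{i=1}^{r} \sigma_i \ge q \,\}$   (1)
where $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r$ are the singular values of the term-document matrix and $q$ is a threshold.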
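Formulation (1) can be read as a cumulative-coverage rule over the singular values. The sketch below assumes the ratio uses the singular values themselves rather than their squares (a Frobenius-norm variant would square them); the helper name `select_rank` is hypothetical, and NumPy's `np.linalg.svd` supplies the singular values in descending order.

```python
import numpy as np


def select_rank(A: np.ndarray, q: float = 0.98) -> int:
    """Return the smallest k whose leading singular values cover a fraction q
    of the sum of all singular values of the term-document matrix A.

    Assumption: formulation (1) uses plain singular values, not their squares.
    """
    sigma = np.linalg.svd(A, compute_uv=False)  # already sorted, largest first
    coverage = np.cumsum(sigma) / sigma.sum()   # coverage[i] is the ratio for k = i + 1
    # First position where the coverage reaches q, converted to a 1-based rank.
    k = int(np.searchsorted(coverage, q) + 1)
    return min(k, len(sigma))  # guard against rounding pushing past the last value
```

For singular values 4, 3, 2, and 1 the cumulative coverage is 0.4, 0.7, 0.9, and 1.0, so a threshold of q = 0.85 selects k = 3.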
2.4. TF-IDF and Sentiment Weighting
In general, TF-IDF [14] was used to identify how important each available term is in the corpus. It is also a common technique for calculating vector weights based on semantic relatedness [17]. $\mathrm{tf}_{t,d}$, the frequency of term t in document d, is defined as formula (2).
$\mathrm{tf}_{t,d} = f_{t,d}$   (2)
where $f_{t,d}$ is the number of occurrences of term t in document d.
The inverse document frequency $\mathrm{idf}_t$ of term t in the corpus D is defined as formula (3).
$\mathrm{idf}_t = \log \dfrac{N}{n_t}$   (3)
where N is the number of documents available in the corpus and $n_t$ is the number of documents in the corpus in which term t occurs.
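Formulas (2) and (3) can be sketched in a few lines of Python. This is a minimal sketch assuming raw counts for tf and the natural logarithm for idf (the text does not state a base); terms are keyed by (word, POS tag) pairs to mirror the part-of-speech modification described in the next paragraph, and `tf_idf` is an illustrative name.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of documents, each a list of (word, pos) tuples.

    Returns one dict per document mapping term -> tf_{t,d} * idf_t,
    where tf is a raw count (formula 2) and idf_t = log(N / n_t) (formula 3).
    """
    N = len(docs)
    df = Counter()                      # n_t: number of documents containing term t
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log(N / n_t) for t, n_t in df.items()}
    return [{t: f * idf[t] for t, f in Counter(doc).items()} for doc in docs]
```

A term that appears in every document gets idf = log(1) = 0, so it contributes nothing regardless of how often it occurs.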
A small modification was made in implementing the formulas above. In basic TF-IDF, terms are distinguished only by how they are spelled, disregarding each term's part of speech in the sentence. Since SWN 3.0 entries are also distinguished by part of speech, the part-of-speech tag obtained in the POS tagging task needed to be added to each term in the term list.

In sentiment weighting, the process is simply done by comparing the negative score and the positive score in the SWN 3.0 database, as defined in the logical formulation below. The reason is that the positive and negative values vary among 0, 0.125, 0.5, 0.625, 0.75, and 1; thus, we need to define the exact sentiment of a term, as seen in Formula (4).
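$s_t = \begin{cases} 1, & \mathrm{pos}_t - \mathrm{neg}_t > 0 \\ -1, & \mathrm{pos}_t - \mathrm{neg}_t \le 0 \end{cases}$   (4)
where $\mathrm{pos}_t$ and $\mathrm{neg}_t$ are the SWN 3.0 positive and negative scores of term t.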
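Formula (4) reduces to a sign rule. The sketch below follows the later statement that a term is positive only when its positive score is greater than its negative score, so ties map to -1. The lexicon entries and their scores are hypothetical placeholders, not values taken from SWN 3.0.

```python
def sentiment_weight(pos: float, neg: float) -> int:
    """Formula (4): +1 if the positive score exceeds the negative score,
    otherwise -1 (ties therefore count as negative)."""
    return 1 if pos - neg > 0 else -1


# Hypothetical SWN-style entries keyed by (lemma, POS); the scores are made up.
lexicon = {("untung", "n"): (0.625, 0.0), ("melemah", "v"): (0.0, 0.5)}
weights = {term: sentiment_weight(p, n) for term, (p, n) in lexicon.items()}
```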
We combined Term Importance Weighting using TF-IDF with Sentiment Weighting to define the risk level in each cluster generated by the Risk Clustering process. The idea was to find out how the risk analysts emphasize the usage of terms by calculating term importance with TF-IDF, while the sentiment weighting is used to calculate polarity, that is, whether a term tends to be positive or negative. Hence, the combination of both calculations for each term is described in Formula (5), where $\mathrm{tfidf}_{t,d}$ is the TF-IDF score of term t in document d and $s_t$ is the sentiment score of term t.
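$w_{t,d} = \mathrm{tfidf}_{t,d} \times s_t$   (5)
where $w_{t,d}$ is the combined weight of term t in document d.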
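Formula (5) is an element-wise product of the two scores. In this minimal sketch, terms that have no polarity are simply skipped; that is an assumption for illustration, since the paper instead assigned polarity manually to such terms. The function name is hypothetical.

```python
def combined_weights(tfidf_doc, sentiment):
    """Formula (5): w_{t,d} = tfidf_{t,d} * s_t for one document.

    tfidf_doc maps term -> TF-IDF score; sentiment maps term -> +1 or -1.
    Terms without a polarity entry are dropped (an illustrative assumption).
    """
    return {t: w * sentiment[t] for t, w in tfidf_doc.items() if t in sentiment}
```

Because TF-IDF scores are non-negative, the sign of each combined weight is exactly the term's polarity, while its magnitude reflects how strongly the analysts emphasized the term.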
3. Results and Analysis
3.1. Preprocessing
Preprocessing is the prerequisite task that removes stop words, punctuation, and unimportant terms, and formalizes the terms used as the feature vector for the next task [16]. After scanning the seven parts of the corpus, 3289 terms were found. Several things had to be done to this term collection, such as fixing typos and formalizing the terms. To do so, a mini dictionary was created to serve as a reference for the program.
Furthermore, formalization was done for terms such as “stratejik” → “strategi” (strategy), “spekulas” → “spekulasi” (speculation), and “melamah” → “melemah” (declining). Formalization was also important because some terms could not be found in the SWN 3.0 lexicon even though they are not typos. For instance, “komperehensif” was converted to “komprehensif”, “identity” to “identitas”, and “volatility” to “volatilitas”. We also faced problems such as translation and finding proper synonyms for terms that cannot be found in SWN; these were solved by utilizing tools such as Google Translate and Kateglo.
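The dictionary-based formalization described above can be sketched as a token-level lookup. The mapping entries come from the examples in the text; the stop-word set is a tiny illustrative sample of common Indonesian function words, not the list used in the research, and `preprocess` is a hypothetical name.

```python
import re

# Mini dictionary built from the examples in the text; a real one would be larger.
NORMALIZE = {
    "stratejik": "strategi",
    "spekulas": "spekulasi",
    "melamah": "melemah",
    "komperehensif": "komprehensif",
    "identity": "identitas",
    "volatility": "volatilitas",
}

# Illustrative sample only: and, which, in, to, from.
STOPWORDS = {"dan", "yang", "di", "ke", "dari"}


def preprocess(text: str) -> list[str]:
    """Lowercase, keep alphabetic tokens only (drops punctuation and digits),
    formalize known variants, then remove stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [NORMALIZE.get(t, t) for t in tokens]
    return [t for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("Penjualan melamah, dan volatility tinggi.")` yields the tokens penjualan, melemah, volatilitas, and tinggi.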
3.2. Reduced Dataset using SVD
Applying SVD already reduces the dimension of the dataset, but we tried to obtain a smaller dataset by finding the best k-rank that represents the entire corpus, using formula (1). In this research we set the threshold q = 0.98, and the resulting selected k-rank is 300. However, there is no standard for the best threshold for selecting the k-rank. Earlier, Zhang and Dong (2004) used a threshold of 0.8, while Osinski (2004) used 0.9.
The dimension reduction objective was achieved by utilizing the document-concept matrix V and the diagonal matrix Σ [8]. Although document clustering can be achieved by performing SVD alone, the resulting number of document groups is still too large for the bank to derive a proper risk model: 300 concepts were found, in other words, 300 risk levels and concepts could be obtained, as seen in Formula (6).
$A_{300} = U_{300}\,\Sigma_{300}\,V_{300}^{T}$   (6)
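A rank-k truncated SVD, as in Formula (6), can be sketched with NumPy. Following the text, documents are represented through V and Σ; this sketch assumes that means taking the rows of $V_k \Sigma_k$ as document coordinates. The function name is hypothetical.

```python
import numpy as np


def truncated_svd(A: np.ndarray, k: int):
    """Rank-k SVD of a term-document matrix A (terms x documents).

    Returns (A_k, docs_k): the rank-k approximation U_k Sigma_k V_k^T and an
    (n_documents x k) matrix V_k Sigma_k holding each document's coordinates
    in the k-dimensional concept space (one row per document).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    A_k = U_k @ np.diag(s_k) @ Vt_k
    docs_k = Vt_k.T * s_k  # scale each concept column by its singular value
    return A_k, docs_k
```

The document rows can equivalently be computed as $A^T U_k$, since $A^T U_k = V \Sigma U^T U_k = V_k \Sigma_k$; clustering then runs on these k-column rows instead of the full term vectors.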
3.3. K-Means Clustering and Centroid Optimization using Pillar Algorithm
The algorithm was inspired by the function of the pillars of a building or construction. Pillars are commonly deployed at each edge or corner of a building, so that the mass of the building is concentrated in each pillar. The same idea was adopted for the clustering task: the best initial centroids were presumed to lie at the edges of the dataset. In other words, the k farthest objects in the dataset were selected as initial centroids, where k is the number of clusters to be observed.
Hence, to find the best cluster solution, we iterated over the possible numbers of clusters from K=2 to K=10, and each iteration was preceded by centroid optimization using the Pillar Algorithm, as seen in Figure 3 below.
Figure 3. Pseudocode to find the best cluster solution
The complete steps of the Pillar Algorithm from the original paper are described in Figure 4, where the mean is calculated for each term variable t, that is, each term available in the term-document matrix list P; n is the number of documents; and m is the mean vector of the terms. After obtaining the mean of all terms as the starting point of the iteration, the algorithm selects the k objects farthest from m, each denoted Ж (an initial centroid candidate), and checks whether Ж already exists in the SX list; if not, it is stored in SX.
The selection is done simply by sorting the distance matrix, which contains each term vector's distance to the mean. The distance formula used here is the basic Euclidean distance.
There are also two criteria variables, namely alpha (α) and beta (β). They determine, respectively, the minimum number of neighbor objects and the farthest distance of neighbor objects for the selected centroid candidates. These criteria must be fulfilled to avoid selecting an outlier as a centroid.
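The Pillar initialization and the K-Means run that follows can be sketched as below. This is a sketch under stated assumptions: it follows the distance-maximisation steps of the original Pillar paper as we understand them, with α setting the minimum neighbour count nmin = α·n/k and β setting the neighbourhood radius nbdis = β·dmax (the names nmin, nbdis, and dmax are assumptions), and it pairs the result with plain Lloyd iterations. It is not the authors' code.

```python
import numpy as np


def pillar_init(X, k, alpha, beta):
    """Choose k initial centroids from the rows of X by distance maximisation.

    A candidate with fewer than nmin = alpha * n / k objects within
    nbdis = beta * dmax of it is treated as an outlier and skipped.
    """
    n = X.shape[0]
    D = np.linalg.norm(X - X.mean(axis=0), axis=1)  # distances to the grand mean m
    nmin = alpha * n / k
    nbdis = beta * D.max()
    DM = np.zeros(n)                  # accumulated distances
    in_sx = np.zeros(n, dtype=bool)   # SX: objects already tried as candidates
    chosen = []
    while len(chosen) < k:
        DM += D
        while True:
            if in_sx.all():
                raise ValueError("no admissible candidate left; try a smaller alpha or a larger beta")
            j = int(np.argmax(np.where(in_sx, -np.inf, DM)))  # candidate Ж
            in_sx[j] = True
            D = np.linalg.norm(X - X[j], axis=1)
            if np.count_nonzero(D <= nbdis) >= nmin:  # enough neighbours: not an outlier
                break
        chosen.append(j)
    return X[chosen].astype(float)


def kmeans(X, centroids, max_iter=100):
    """Plain Lloyd iterations with Euclidean distance from the given centroids."""
    for _ in range(max_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # An empty cluster keeps its previous centroid (and may stay empty).
        new = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(len(centroids))])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```

Because the initial centroids are chosen deterministically, each (K, α, β) combination produces a single repeatable solution. That property is what makes a grid search over K = 2 to 10 and over α and β meaningful, whereas randomly initialized K-Means would need several restarts per setting.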
3.4. Sentiment Weighting
Before performing sentiment weighting, the SWN 3.0 lexicon needed to be translated into Bahasa Indonesia using the Google Translate API [13]. Using the Levenshtein distance measure, not all terms in the corpus could be matched exactly, so the polarity values were set manually for special terms in banking and finance, e.g., “collectability” or “coll”, “bowheer” (project employer), and “jaminan” (collateral), to obtain the precise positive (pos) or negative (neg) polarity value. Without these manually defined weights, the sentiment weighting task would misinterpret those terms, since they are not found in the lexicon.
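The matching step described above can be sketched as a Levenshtein lookup with manual overrides taking priority. The edit-distance cutoff `max_dist=2`, the helper names, and the example polarities are hypothetical; the text does not state which cutoff was used.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion, and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def lookup_polarity(term, manual, lexicon, max_dist=2):
    """Return a manually set polarity if present; otherwise the polarity of the
    closest lexicon entry within max_dist edits; otherwise None (not found)."""
    if term in manual:
        return manual[term]
    if not lexicon:
        return None
    best = min(lexicon, key=lambda entry: levenshtein(term, entry))
    return lexicon[best] if levenshtein(term, best) <= max_dist else None
```

A term such as “bowheer” is far from every ordinary dictionary word, so without a manual entry it falls through to None, which is the misinterpretation risk the paragraph describes.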
There were 197 terms categorized as “FIX”: typos that needed to be formalized. Of all the terms, 316 consisted of person names, place names, and special terms such as “retaksasi” (reassessed collateral) and “pinjaman rekening koran” (checking-account-based loan); these were categorized as “BNK”, i.e., special terms in banking and finance. Moreover, about 520 terms were categorized as “KAT”: terms that could not be found in the SWN 3.0 lexicon and whose proper synonyms had to be searched for in the Kateglo database.
As the SWN 3.0 lexicon provides both positive and negative scores, term polarity was defined by comparing the positive score with the negative score. If the positive score is greater than the negative score, the sentiment weight is 1; otherwise it is -1 [9].
Figure 4. Pillar Algorithm modified from Barakbah 2009.
3.5. Evaluation
3.5.1. Silhouette Function
After the clustering task, the clusters were evaluated using the silhouette function [19]. The silhouette function makes it easy to see how well an object is placed in a cluster; therefore, it was used to ensure the quality of the clustering of the risk documents. The silhouette function replaces the variance analysis used in the original Pillar Algorithm paper, since variance analysis cannot describe the quality level of a cluster result the way the silhouette does, with $s(i) \in [-1.00, 1.00]$. The formulation is given in formula (6).
$s(i) = \dfrac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$   (6)
where s(i) is the silhouette score of object i, a(i) is the average distance between object i and all other objects in the same cluster, and b(i) is the average distance between object i and all objects in the nearest other cluster (the lowest such average over all other clusters).
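The silhouette in formula (6) can be sketched directly in NumPy. This sketch uses the standard Rousseeuw convention for b(i) (the nearest other cluster) and gives singleton objects s(i) = 0; both choices are assumptions about details the text leaves open. scikit-learn's `sklearn.metrics.silhouette_samples` computes the same per-object quantity if that library is available. The second helper gives the per-cluster averages that the negative-average check relies on.

```python
import numpy as np


def silhouette_samples(X, labels):
    """s(i) = (b(i) - a(i)) / max{a(i), b(i)} for every row of X (Euclidean).

    a(i): mean distance to the other members of i's cluster.
    b(i): smallest mean distance from i to the members of any other cluster.
    Objects alone in their cluster get s(i) = 0. Requires at least 2 clusters.
    """
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] = 0, so exclude i from the count
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


def cluster_average_silhouette(s, labels):
    """Average silhouette per non-empty cluster."""
    labels = np.asarray(labels)
    return {int(c): float(s[labels == c].mean()) for c in np.unique(labels)}
```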
The experiments showed that the values of α and β play a significant role in the silhouette score. We noticed that the lowest values of α and β are 0.4 and 0.6, respectively, which are feasible up to K=10. Any combination of values lower than these is feasible only for K=2. There are 715 cluster solutions, which makes it hard to observe all of them, so we decided to pick the cluster solutions with the highest silhouette score, as listed in Table 2.
Table 2. Best cluster solution for each k, based on silhouette score
K | α | β | Silhouette Score | Number of Empty Clusters
2 | 0.85 | 0.9 | 0.494237 | 0
3 | 0.75 | 0.85 | 0.660766 | 0
4 | 0.95 | 0.85 | 0.660766 | 1
5 | 0.65 | 0.75 | 0.642436 | 2
6 | 0.75 | 0.75 | 0.642436 | 3
7 | 0.7 | 0.8 | 0.701234 | 1
8 | 0.55 | 0.75 | 0.70496 | 1
9 | 0.95 | 0.75 | 0.574014 | 4
10 | 0.7 | 0.75 | 0.624935 | 3
From Table 2, K=8 appears to be the best cluster solution to model the risk documents based on the term relationships. Nevertheless, a closer look shows that it has a cluster without any members, so we conclude that this clustering did not place the documents properly. The exploration then continued with additional conditions for selecting the best cluster solution: a cluster solution is selected only (1) if it has no empty cluster, and (2) if it has no cluster with a negative average silhouette score, as illustrated in Figure 5.
Figure 5. Comparison between a bad cluster solution with a negative average silhouette score (left) and a good cluster solution without a negative average silhouette score (right)
The second additional condition was added because a high overall silhouette score does not guarantee good quality for every cluster within a solution. For example, for k=7, α=0.65, and β=0.8, the silhouette score suggests one of the best cluster solutions, but a deeper look shows that most of its objects have negative silhouette scores.
Our exploration found that the best cluster solution is k=6, α=0.65, and β=0.8, because it has the highest silhouette score (s=0.30205) among the solutions that fulfill both additional criteria.
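The selection rule above can be sketched as a filter followed by a maximum. The `Solution` record and its field names are hypothetical, and the values in the usage example are made up purely to exercise the two conditions.

```python
from dataclasses import dataclass, field


@dataclass
class Solution:
    k: int
    alpha: float
    beta: float
    silhouette: float                                        # mean s(i) over all documents
    cluster_sizes: list = field(default_factory=list)        # documents per cluster
    cluster_silhouettes: list = field(default_factory=list)  # mean s(i) per non-empty cluster


def admissible(sol: Solution) -> bool:
    no_empty = all(size > 0 for size in sol.cluster_sizes)          # condition (1)
    no_negative = all(avg >= 0 for avg in sol.cluster_silhouettes)  # condition (2)
    return no_empty and no_negative


def best_solution(solutions):
    """Highest mean silhouette among the solutions that meet both conditions."""
    valid = [sol for sol in solutions if admissible(sol)]
    return max(valid, key=lambda sol: sol.silhouette) if valid else None
```

With this rule, a solution that has a lower overall score can still win, which is how k=6 is preferred over the higher-scoring entries in Table 2.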
3.5.2. Sum of Squared Error
This evaluation also helps to understand the nature of a cluster solution. We noticed that the greater the number of clusters in a cluster solution, the lower the resulting SSE. For instance, the top 5 cluster solutions for k=6 are listed in Table 3 below. Table 3 supports our reason for adding the additional conditions, since the cluster solutions that do not fulfill both conditions tend to have a higher SSE. Thus, those solutions are not recommended as the best, despite their high silhouette scores.
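SSE, as used in Table 3, can be sketched as the sum of squared Euclidean distances from each object to the centroid of its assigned cluster. The function name is illustrative; an empty cluster simply contributes zero.

```python
import numpy as np


def sse(X, labels, centroids):
    """Sum of squared Euclidean distances from each object to its cluster centroid."""
    labels = np.asarray(labels)
    return float(sum(((X[labels == c] - centroids[c]) ** 2).sum()
                     for c in range(len(centroids))))
```

Because an empty cluster adds nothing to the sum, a low SSE alone cannot flag the empty clusters that the additional conditions rule out.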
Table 3. SSE of the top 5 cluster solutions for k=6, and of the best cluster solution
α | β | Number of Clusters with Negative Avg Silhouette | Number of Empty Clusters | SSE
0.75 | 0.75 | 1 | 3 | 7786.063
0.7 | 0.75 | 0 | 3 | 7802.287
0.6 | 0.8 | 2 | 0 | 7358.057
0.55 | 0.8 | 2 | 0 | 7358.057
0.5 | 0.8 | 2 | 0 | 7358.057
0.65 | 0.8 | 0 | 0 | 3974.671
3.5.3. K-Means Execution Time
We also compared two clustering tasks: the first was performed without SVD decomposition and the second with SVD decomposition. The results show that reducing the dimension with SVD saves execution time. As seen in Figure 6, the clustering task with SVD (up to 50 ms) outperforms the task without SVD (400 ms to 460 ms). The comparison was made for the following parameters: 6 ≤ K ≤ 7, 0.5 ≤ α ≤ 1.0, and 0.5 ≤ β ≤ 1.0.
Figure 6. Clustering performance comparison (by execution time in milliseconds) on the dataset decomposed with SVD and without SVD
3.5.4. Sentiment Analysis Performance
Djatna T. and Morimoto T. [21] used a sorting and ranking method to select featured numerical attributes that contain correlations in databases. We used the same idea to populate and rank the most important terms; the difference is that in this research the sorting and ranking were based on the weight of the sentiment score. The sorting was limited to the 200 most frequently occurring terms, which represent the character of each cluster. Table 4 shows the most heavily weighted terms in each cluster, obtained by selecting the terms with negative polarity and then accumulating their TF-IDF and sentiment weights. To obtain a proportional measurement, the Risk Score was computed by dividing the total term score by the total number of documents in the cluster.
For instance, in cluster 1 the total term weight is -6262.070 and the total number of documents is 221; thus, the Risk Score is -28.335.
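The Risk Score computation can be sketched as follows, reproducing the cluster 1 figure from the totals quoted above. The sketch selects negative-polarity terms by keeping negative combined weights. That is an assumption: it is equivalent to filtering on polarity because TF-IDF scores are non-negative, and terms with zero TF-IDF add nothing either way. The function name is illustrative.

```python
def risk_score(doc_weights, n_documents):
    """Risk Score of one cluster: the combined weights w_{t,d} of the
    negative-polarity terms, summed over every document in the cluster and
    divided by the number of documents in that cluster."""
    total = sum(w for doc in doc_weights for w in doc.values() if w < 0)
    return total / n_documents


# Cluster 1 from the text: total term weight -6262.070 over 221 documents.
cluster1 = round(-6262.070 / 221, 3)  # -28.335
```

Dividing by the document count keeps large clusters (221 documents) comparable with very small ones (2 documents) when ranking them in Table 4.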
To indicate how large the risk is, the TF-IDF scores and negative sentiment scores were accumulated in each cluster. While each cluster has unique characteristics, each spans a variety of business sectors. This indicates that a specific business does not always carry a specific risk, so the bank should be more aware and more thorough in analyzing every incoming loan proposal, rather than treating it the same way as previous proposals from the same business sector. The result also shows that the types of risk found in the cluster solution are related to four of the 5Cs Credit criteria (Character, Capacity, Capital, Condition, and Collateral) [20] that are commonly used to make lending decisions. The criterion found most often in the corpus is Capacity, while Character and Capital are not considered very significant, as the top-ranked terms do not reflect these criteria.
The bank can use this result to sharpen its risk analysis, since only three of the 5Cs criteria are significantly exposed in at least one cluster. The analysts may have difficulties analyzing Character, since it requires more in-depth investigation in the field. However, they must also improve the Character analysis, since it is the most important criterion. Capital is a criterion for which the analysts may rely on the scoring system, so they are less concerned with delivering opinions on it.
Table 4. Risk cluster analysis and its corresponding 5Cs criteria
Cluster (Rank) | Risk Analysis | Number of Documents | Risk Score | Corresponding 5Cs Criteria
1 (2) | Related to collateral and assets (fixed assets and current assets), e.g. “asset”, “piutang” (receivables), “tanah” (land), “jaminan” (collateral) | 221 | -28.335 | Collateral
2 (3) | Related to income, e.g. “net profit”, “copat” (cash operating profit after tax), “pendapatan” (income), “leverage” (gain and loss ratio) | 213 | -27.502 | Capacity
3 (4) | Related to production capacity, e.g. “persediaan” (stock), “kapasitas” (capacity), “penjualan” (sales) | 47 | -27.447 | Capacity
4 (5) | Related to both income and financial measurements, e.g. “net profit”, “copat” (cash operating profit after tax), “pendapatan” (income), “leverage” (gain and loss ratio), “equity”, and “roe” (return on equity) | 34 | -26.102 | Capacity
5 (1) | Related to business condition, e.g. “persaingan” (business competition), “wilayah” (territory), “demonstrasi warga” (protest from local residents) | 2 | -35.064 | Condition
6 (6) | Related to business financial measurement, e.g. “perputaran” (turnover), “quick ratio”, “equity”, “return of equity”, “roe” (return on equity), and “profit” | 2 | -24.548 | Capacity
3.5.5. Comparison with the Conventional Risk Analysis
Conventional risk analysis for SME business is mostly performed by giving opinions or comments on the customer's business condition, without providing a clear risk quantification. Here we aim to improve the process by quantifying the risk through sentiment analysis, which we hope will directly support and improve the decision-making process.
Furthermore, we suggest enhancing this research in the future by adding a classification feature, so that risk analysts can classify a new customer's information against the risk clusters.
4. Conclusion
In this research, the clustering task showed that there were six clusters representing the risk exposures in SME business financing, as analyzed by risk analysts in a national private bank in Indonesia from 2013 to early 2014. The clustering was performed using the K-Means clustering algorithm optimized by the Pillar algorithm, iterating over the possible numbers of clusters from K=2 to K=10. This research also showed that sentiment analysis, which industry now mostly uses to learn how the market receives its products, can be utilized to measure risk levels despite its limitations, such as lexicon resources being available mainly in English.
Frequent updates of the data source are required to enrich the knowledge base and the information regarding risks in SME business, since this research only observed documents from 2013 to the beginning of 2014. Finding the most efficient and effective method for adding new information as new documents arrive remains a challenge for future work.
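One small building block for adding new documents could be to keep the document-frequency counts behind the IDF term. This lets IDF values be refreshed as documents arrive without re-reading the old ones. The helper below is hypothetical and assumes the same log10(N/df) IDF used in a plain TF-IDF setup.

```python
import math
from collections import Counter

class IncrementalIdf:
    """Hypothetical helper: keeps corpus size and document frequencies
    so IDF values can be refreshed as new documents are added."""

    def __init__(self):
        self.n_docs = 0
        self.df = Counter()

    def add(self, tokens):
        """Register one tokenized document."""
        self.n_docs += 1
        self.df.update(set(tokens))  # each term counted once per document

    def idf(self, term):
        """log10(N / df); terms never seen return 0."""
        df = self.df[term]
        return math.log10(self.n_docs / df) if df else 0.0

index = IncrementalIdf()
index.add(["sales", "decline"])
index.add(["sales", "growth"])
before = index.idf("growth")  # log10(2 / 1)
index.add(["growth", "debt"])
after = index.idf("growth")   # log10(3 / 2)
```

Only the counts are updated incrementally. Existing TF-IDF vectors and the cluster centroids would still need to be recomputed, or the clustering re-run, once the IDF values shift.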
References
[1] Soares J, Pina J, Ribeiro M, Catalao-Lopes M. Quantitative vs. Qualitative Criteria for Credit Risk Assessment. Frontiers in Finance and Economics. 2011; 8(1): 68-87.
[2] Chih FT, Jhen WW. Using neural network ensembles for bankruptcy prediction and credit scoring. Expert Systems with Applications. 2008; 34: 2639-2649.
[3] Medhat W, Hassan A, Korashy H. Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal. 2014: 1093-1113.
[4] Xu K, Shaoyi S, Li J, Yuxia S. Mining comparative opinions from customer reviews for Competitive Intelligence. Decision Support Systems. 2011; 50: 743-754.
[5] Ghiassi M, Skinner J, Zimbra D. Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network. Expert Systems with Applications. 2013; 40(16): 6266-6282.
[6] Li N, Wu DD. Using text mining and sentiment analysis for online forums hotspot detection. Decision Support Systems. 2010; 48: 354-368.
[7] Xu H, Zhai Z, Liu B, Jia P. Clustering Product Features for Opinion Mining. Proceedings of the Fourth ACM International Conference on Web Search and Data Mining. ACM. 2011: 347-354.
[8] Zhang D, Dong Y. Semantic, Hierarchical, Online Clustering of Web Search Results. Advanced Web Technologies and Applications. 2004; 30(07): 69-78.
[9] Esuli, Sebastiani. SentiWordNet: A publicly. International Conference on Language Resources and Evaluation (LREC). 2006; 1: 417-422.
[10] Thelwall M. Sentiment Strength Detection in Short Informal Text. Journal of the American Society for Information Science and Technology. 2010; 1: 2544-2558.
[11] Gonçalves P, Benevenuto F, Araujo M, Cha M. Comparing and Combining Sentiment Analysis Methods. Conference on Online Social Networks (COSN). 2013; 1: 27-38.
[12] Denecke K. Using SentiWordNet for Multilingual Sentiment Analysis. International Council for Open and Distance Education Conference. 2008.
[13] Lunando E, Purwarianti A. Indonesian Social Media Sentiment Analysis with Sarcasm Detection. Advanced Computer Science and Information Systems (ICACSIS). 2013; 1: 195-198.
[14] Zhang Q, Liu F, Xie B, Huang Y. Index Selection Preference and Weighting for Uncertain Network Sentiment Emergency. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2013; 11(1): 287-295.
[15] Barakbah AR, Kiyoki Y. A Pillar Algorithm for K-Means Optimization by Distance Maximization for Initial Centroid Designation. IEEE Symposium on Computational Intelligence and Data Mining (CIDM). 2009; 1: 61-68.
[16] Manning CD, Prabakhar R, Schütze H. Introduction to Information Retrieval. Cambridge: Cambridge University Press. 2009: 118-119.
[17] Zhang PY. A HowNet-based Semantic Relatedness Kernel for Text Classification. TELKOMNIKA. 2013; 11(4): 1909-1915.
[18] Osinski SL, Stefanowski J, Weiss D. Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition. Master Thesis. Poznan: Poznan University of Technology. 2003.
[19] Rousseeuw PJ. Silhouettes: a Graphical Aid to the Interpretation and Validation of Cluster Analysis. Computational and Applied Mathematics. 1986; 20: 53-65.
[20] Gumparthi S. Risk Assessment Model for Assessing NBFCs' (Asset Financing) Customers. International Journal of Trade, Economics and Finance. 2010; 1: 121-130.
[21] Djatna T, Morimoto Y. Attribute Selection for Numerical Databases that Contain Correlations. Int J Software Informatics. 2008; 2(2): 125-139.