TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 11, November 2014, pp. 7935 ~ 7945
ISSN: 2302-4046, DOI: 10.11591/telkomnika.v12i11.6481
Received July 14, 2014; Revised September 10, 2014; Accepted September 28, 2014
Winner-Takes-All based Multi-Strategy Learning for Information Extraction

Dwi Hendratmo Widyantoro*, Kurnia Muludi, Kuspriyanto
School of Electrical Engineering & Informatics, Institute of Technology Bandung
*Corresponding author, e-mail: dwi@stei.itb.ac.id
Abstract
The proliferation of information on the Internet has enabled one to find any information he/she is looking for. Nevertheless, almost all of this information is designed for human readers and is not machine readable. Information extraction is a task that addresses the above problem by extracting a piece of information from unstructured formats. This paper proposes a winner-takes-all based multi-strategy learning for information extraction. Unlike the majority of multi-strategy approaches, which commonly combine the predictions of all base learners involved, our approach takes a different strategy by employing only the best, single predictor for a specific information task. The best predictor (among other predictors) is identified during the training phase using k-fold cross validation, and is then retrained on the full training set. Empirical evaluation on two benchmark data sets demonstrates the effectiveness of our strategy. Out of 26 information extraction cases, our strategy outperforms other information extraction algorithms and strategies in 16 cases. The winner-takes-all strategy in general eliminates the difficult situation in multi-strategy learning when the majority of base learners cannot make a correct prediction, resulting in an incorrect combined prediction. In such a case, the best predictor with a correct prediction in our strategy will take over the overall prediction.
Keywords: winner takes all, multi-strategy learning, information extraction
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
The rapid growth of the Internet has caused textual information to become abundant. Until now, Information Retrieval technology has not been enough to fulfill a specific information need, because this technology only provides information at the level of a document collection. Although the current state-of-the-art search engine allows one to find relevant documents quickly, a significant cognitive effort is still required to scan the documents' content in order to extract the information needed. A tool or method that is able to carry out such a task is of great importance to address the information load problem.
Information Extraction is the process of automatically obtaining pre-specified specific information (e.g., events, entities or their relationships) from unstructured data such as text documents and web pages [1]. Information Extraction is very useful for many applications such as business intelligence, automatic annotation of web pages, text mining, and knowledge management. The basic task of much information extraction research has been focused on named entity recognition (NER) and relation extraction in texts.
A named entity is a sequence of words that represents a real-world entity such as the name of a person, organization, location, etc. [2]. Named entity recognition is a task that attempts to identify these entities and then to categorize them into entity classes. Meanwhile, relation extraction is a task to detect and identify the semantic relation between named entities, for example, the CEO of a company, or CEO(person, company). The recent development of information extraction research addresses the issue of open information extraction, which attempts to discover important relations and entities without limiting the type of relation or entity.
This paper focuses on the extraction method for NER. Two main approaches that have been well developed for the extraction of named entities are (1) rule-based [3-5] and (2) statistical-based [6, 7] extraction methods. The rule-based approach employs extraction rules, which can be manually crafted or inductively (automatically) generated from training examples through a learning process. The majority of NER work has been addressed using the rule-based approach [3-5]. The statistical-based methods consider NER as a sequence labeling problem in which each token in the text is labeled as either entity or non-entity. Belonging to this approach are the methods employing Hidden Markov Models, Maximum Entropy Markov Models and Conditional Random Fields [6, 7].
Information Extraction can also be viewed as a classification problem where text is divided into tokens and the tokens are classified into related classes [8, 9]. Generally, classification methods require a lot of training examples in order to generate accurate extraction rules. Each classification method (classifier) typically biases toward a set of assumptions that the classifier uses to make predictions. As a result, no single classifier is superior over diverse information extraction tasks (domains). Researchers address this problem by employing multiple classifiers (often termed multi-strategy or hybrid methods), expecting that the weaknesses of a single classifier can be compensated by the strengths of other classifiers.
The performance of multi-classifier methods was reported to be better than that of a single classifier [10-12]. Most multi-classifier (multi-strategy) approaches to the information extraction problem make predictions based on the combination of the predictions of different classifiers. The differences among them lie mainly in the specific method used to weigh the significance (confidence) level of each classifier during the calculation of the final prediction. Involving multiple classifiers during the classification process can avoid making an incorrect prediction when the task is relatively easy for the majority of classifiers. However, it also could prevent the system from producing a correct prediction, particularly when the task is easy to learn (i.e., can be correctly predicted) only for a few classifiers, the minority. In the latter case, the final prediction will be incorrect because the incorrect predictions from the majority of classifiers overshadow the correct prediction of the minority one.
In this paper we present our work on addressing the problem encountered by the multi-classifier approach when the task is difficult for most classifiers, as described above. Rather than involving all classifiers during the classification process, our approach only employs a single classifier for making predictions, selected among the other classifiers as the best one for the task during the training phase. This idea is based on the principle that it is better to let the best expert or specialist perform the task in his/her area of expertise. To our knowledge, there has been little prior work (if any) that discussed and evaluated this approach, particularly for the information extraction task. The main contribution of this paper is to introduce the use of a single best predictor in the multi-classifier approach (or winner-takes-all strategy) for information extraction. Our experiment on two domains with a total of 26 information tasks demonstrates the effectiveness of our approach.
The rest of the paper is organized as follows. Section 2 discusses prior relevant work and shows how these works differ from our proposed method. In Section 3, we describe information extraction as a classification problem, which will be used as the main skeleton of the extraction process. The detailed description of our multi-strategy learning based on the winner-takes-all principle is presented in Section 4. The evaluation of the proposed approach on two domains and the discussion of experiment results are provided in Section 5, followed by concluding remarks in Section 6.
2. Related Work on Multi-Strategy Learning and Information Extraction
Multi-strategy learning is an approach that attempts to learn problems with various levels of complexity by employing multiple types of biases and computational paradigms in a learning process. It typically works by running multiple classifiers and making a final decision based on the combination of these classifiers' outputs. Assuming diversity among the classifiers in their learning paradigms, in how data and noise are interpreted, and in how the outputs are carefully combined, the performance of a multi-classifier is expected to be better than that of a single classifier.
Multi-classifiers can be categorized into four topological constructions based on the classifiers' decision-making steps [13]. The first is the conditional topology, a strategy that selects a primary classifier to perform the classification task; the next classifier is selected only if the earlier one fails to identify the new data. The second is the hierarchical (serial or selection-based) topology, which employs classifiers in succession, reducing the number of possible classes as it moves from one classifier to the next. The third topology is hybrid, a topology that adopts the strategy of selecting the best classifier based on the given data. Lastly is the multiple (parallel or fusion-based) topology, where all classifiers are first run and all the results are combined. This topology is the most common implementation in multi-classifier systems. Prior works using the parallel (fusion-based) topology vary greatly in the selection of combinatorial functions, which can be fixed (linear, non-linear or statistical) functions or dynamically trained functions.
Multi-strategy learning with the multiple (parallel) topology has been shown to be useful by Freitag for the information extraction problem [11]. In this work, the classifiers are treated as black boxes; only their reliabilities, which are functions of the classifiers' confidences, are considered to model the combiners. The confidence is calculated from the validation set during the training phase. Regression is then applied to map confidence to the probability of correctness. Prediction of a new instance is performed by combining the calculated probabilities of the various classifiers to make the best choice. Their combination approach improves over the individual classifiers. Similar success with the parallel topology was also reported by Neumann, which employs two classifiers: (1) a Maximum-Entropy Modeling based classifier and (2) a tree-based classifier based on Data-Oriented Parsing. Neumann's approach used a voting mechanism applied by an iterative tag-insertion algorithm. A similar method of using a validation set for estimating the extractor's performance was also employed by Hiyakumoto, Lita & Nyberg [10]. In their work, the best extractor for a given extraction task is identified during the training phase. Instead of using the fusion-based topology, their extraction process is based on the conditional topology. In particular, the best extractor for the task (i.e., identified during the training phase) is applied first. If it fails (i.e., its confidence level is below a threshold), the second best extractor will be selected, and so on.
The work of Feilmayr, Vojinovic and Pröll [14] represents a hybrid information extraction approach based on the serial topology. The first part is a knowledge-based information extraction system that extracts an enriched feature set using manually coded rules. The results of the first part are then fed to the second part, which is an inductive-based learner.
Other major information extraction systems that have been developed include LP2 [5], SNOW-IE [15], RAPIER [4], SRV [16] and ELIE [8]. LP2 learns symbolic rules for identifying the start tag and end tag class of a slot separately [5]. It employs a covering algorithm, starting from specific rules and attempting to generalize in order to cover as many positive examples as possible. It performs a two-step process. During the first step, it learns tagging rules using simple bottom-up generalization. Start and end tags are considered positive examples while the rest are negative examples. The second step is to choose the best generalization.
SNOW-IE is an information extraction system that is based on a relational learning algorithm [15]. This system identifies text fragments completely, without separating start tag and end tag. The algorithm consists of two steps. First, all possible text fragments are filtered to separate out non-relevant negative instances. Two criteria are used for filtering: (a) whether no general features exist on positive examples, and (b) whether the confidence value of the fragment is less than a given threshold. Every fragment candidate is represented using pre-defined features. Features are extracted from three parts: the fragment itself, the preceding fragment, and the following fragment. In the second step, correct fragments are collected from the rest of the fragments. The first step results in high recall, while the second one results in high precision.
RAPIER uses Inductive Logic Programming to discover extraction rules [4]. RAPIER does not separate start tag and end tag, but learns to identify the complete relevant string. Bottom-up search is performed through the most specific rule for each example, repeatedly trying to generalize to cover more positive examples. RAPIER learns rules of pre-filler, post-filler and filler. The pre-filler tries to match text before the target slot and the post-filler tries to match text after the target slot. Every pattern is a sequence of elements that can be matched. RAPIER then proceeds to generalize these rules by selecting pairs of rules and generalizing them by obtaining the least general generalization of each pair of rules. To consider all possible pre- and post-filler patterns would be prohibitive, so RAPIER starts generating pre- and post-fillers from the filler outwards. The rules are ordered by Information Gain and weighted by the size of the rule, with small rules being preferred. If a rule has no bad prediction on training examples, it is added to the final rule set, replacing any less general rules with worse performance.
SRV employs simple feature combinations and relational features [16]. Different rule sets are learned for classifying each text fragment as an instance or non-instance of a single attribute value. SRV learns top-down, greedily adding predicates of some predefined types. Rules are validated and their accuracy is estimated using three-fold cross validation; the three resulting rule sets are then merged. The accuracy estimates are available for each prediction. An advantage of relational learners is being able to acquire powerful relational rules that cover a larger and more flexible context than most other rule-learning and statistical approaches. The downside is that the large space of possible rules can lead to high training times, and there is no guarantee of finding optimal rules (the local maxima problem).
The ELIE system adopts Support Vector Machines (SVMs) for Begin/End tagging [8]. Highly improved results are achieved by augmenting this setup with a second level (L2) of begin/end classifiers. The L2 end classifier focuses on finding suitable end tags for matching left-over begin tags from the first-level (L1) begin classifier, and the L2 begin classifier matches left-over end tags. While the L1 classifiers are trained on a very high number of tokens, almost all of which are negative instances (O), the L2 classifiers only consider the near context of left-over L1 begin/end tags, which allows a more focused classification. Hence the L1 classifiers must be tuned to favor precision over recall, to avoid producing lots of false positives (spurious extractions) from all over the text, but the L2 classifiers can be tuned to favor recall over precision since they only classify a very small subset of all the tokens. In this way, by adding the second level, the recall of the overall system can be increased without overly hurting the precision.
3. Token Classification-based Information Extraction
Information Extraction (IE) in this paper is cast as a token classification problem. In contrast to text classification, which attempts to categorize the entire text, token classification predicts whether a token is a member of a pre-specified information need (extraction slot). Casting IE as a token classification problem has the advantage of flexibility in selecting from the variety of classification methods, which have already been well studied.
In classification-based IE, texts are split into tokens. Each token is encoded with features and is labeled with its class (either belonging to an extraction slot or not) for the learning process. Any classification algorithm can then be applied to create the classification model of each extraction slot. Given a new text, the extraction process is performed by identifying the classes of tokens and filling in extraction slots from a sequence of tokens with the same classes.
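The process just described can be sketched as follows. This is a minimal illustration, not the system's code: the function name and the toy predictor are ours. A classifier assigns each token a slot label (or none), and contiguous tokens sharing the same label are merged into one extraction.

```python
# Minimal sketch of classification-based IE: label each token, then
# merge contiguous tokens with the same slot label into extractions.

def extract_slots(tokens, predict):
    """predict(token_index) -> slot label, or None if not part of any slot."""
    labels = [predict(i) for i in range(len(tokens))]
    extractions, i = [], 0
    while i < len(tokens):
        if labels[i] is not None:
            j = i
            while j + 1 < len(tokens) and labels[j + 1] == labels[i]:
                j += 1
            extractions.append((labels[i], " ".join(tokens[i:j + 1])))
            i = j + 1
        else:
            i += 1
    return extractions

# Toy predictor standing in for a learned classification model.
tokens = ["CEO", "of", "Norcros", "Plc", "said", "yes"]
toy = {2: "acquired", 3: "acquired"}
print(extract_slots(tokens, toy.get))  # [('acquired', 'Norcros Plc')]
```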
In this paper, we adopt the classification-based IE developed by Finn [8] and Siefkes [9]. For a given extraction task, the system employs two classifiers: (1) one identifies the start token of fragments and (2) another identifies the end token of fragments. The system extracts fragments from the start token to the closest following end token.
The two classifiers, during the training phase, create the two models of begin and end tokens independently. Because a token by itself is not enough for learning the token classification, an instance is described as a token feature vector. Each instance is labeled as either start tag or not for learning the model of the start token, and it is labeled as end tag or not for modelling the end token. The feature vector of a token in this paper consists of window size, Part of Speech (POS), Orthography, Token kind, Lemma and Lookup. The window size determines the number of tokens before and after the current token that will be considered. Each token is also tagged with its part of speech (e.g., Noun, Verb, etc.). The orthography features include whether the token is capitalized, upper-case or lower-case. Token kind describes whether the token is a word, numeric, symbol or punctuation. Lemma is the basic form of the token as a result of morphological analysis. Lookup provides a value associated with the token, based on a user-defined dictionary. It contains the lists of city names, country names, first names, and last names. For example, a "bandung" token could be assigned the value "city_name".
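As an illustration of the feature encoding described above, the following sketch builds a simple feature vector per token. The POS and lemma components are omitted since in practice they come from an external tagger; the helper names and the dictionary contents are assumptions of this illustration, not the authors' implementation.

```python
# Illustrative token feature vector: orthography, token kind, dictionary
# lookup, and lower-cased context tokens within a fixed window.

LOOKUP = {"bandung": "city_name", "austin": "city_name", "usa": "country_name"}

def orthography(token):
    if token.isupper():
        return "upper"
    if token[:1].isupper():
        return "capitalized"
    return "lower"

def token_kind(token):
    if token.isalpha():
        return "word"
    if token.isdigit():
        return "numeric"
    return "symbol_or_punct"

def features(tokens, i, window=2):
    fv = {
        "orthography": orthography(tokens[i]),
        "kind": token_kind(tokens[i]),
        "lookup": LOOKUP.get(tokens[i].lower(), "none"),
    }
    # Neighboring tokens inside the window are added as context features.
    for d in range(1, window + 1):
        fv[f"prev_{d}"] = tokens[i - d].lower() if i - d >= 0 else "<pad>"
        fv[f"next_{d}"] = tokens[i + d].lower() if i + d < len(tokens) else "<pad>"
    return fv

print(features(["located", "in", "Bandung"], 2)["lookup"])  # city_name
```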
Figure 1 describes the learning and extraction process of classification-based information extraction. The learning process requires two sets of training data, for learning the start_tag model and the end_tag model, respectively. The training data are sets of token feature vectors fv_t and their labels. The LEARN_MODEL in the figure implements any inductive learning algorithm for a classification task (i.e., Naïve Bayes, Decision Tree, Nearest Neighbour, etc.). The extraction process consists of two steps. The first step is to classify all tokens as either a start_tag, an end_tag or none of them. In the second step, for each identified start_tag token, it will extract the sequence of tokens from the start_tag token to the closest following token that is an end_tag.
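The two extraction steps can be illustrated with the following sketch (the names are illustrative, not the paper's code): the per-token outputs of the start and end classifiers are paired so that each start tag is matched with the closest following end tag.

```python
# Sketch of the second extraction stage: each predicted start tag is
# paired with the closest following predicted end tag, and the token
# span between them (inclusive) becomes one extraction.

def pair_tags(tokens, start_labels, end_labels):
    """start_labels/end_labels: per-token booleans from the two classifiers."""
    results = []
    for i, is_start in enumerate(start_labels):
        if not is_start:
            continue
        # closest following end tag (the start token itself may end the span)
        ends = [j for j in range(i, len(tokens)) if end_labels[j]]
        if ends:
            j = min(ends)
            results.append(" ".join(tokens[i:j + 1]))
    return results

tokens = ["30", "Aug", "1997", "in", "Austin"]
starts = [True, False, False, False, False]
ends = [False, False, True, False, False]
print(pair_tags(tokens, starts, ends))  # ['30 Aug 1997']
```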
Learn_Extraction_Model(L, TD)
Input:  L is an inductive learning algorithm.
        TD = Training_data_start_tag & Training_data_end_tag
        /* Training_data_start_tag = {〈fv_t, start_tag_label_t〉} over training tokens t
        /* Training_data_end_tag = {〈fv_t, end_tag_label_t〉} over training tokens t
start_tag_model ← LEARN_MODEL(L, Training_data_start_tag)
end_tag_model ← LEARN_MODEL(L, Training_data_end_tag)
Output: (start_tag_model, end_tag_model)
________________________________________________________
Extract(L, Extraction_Model, D)
Input:  L is an inductive learning algorithm.
        Extraction_Model = (start_tag_model, end_tag_model)
        D is a document
/* First stage: classification of each token as to whether or not it belongs to start_tag, end_tag
for each token t in document D do:
    start_tag_label_t ← CLASSIFY(L, start_tag_model, fv_t)
    end_tag_label_t ← CLASSIFY(L, end_tag_model, fv_t)
end for
/* Second stage: extracting the phrase between (and including) the tokens of the start and end tags
Result = {}
for each start_tag_label_t == '+' do:
    i ← Position(t)    // t is the i-th token in D
    j ← min Position(t') where end_tag_label_t' = '+' and Position(t') ≥ i
    extraction = sequence of tokens from the i-th to the j-th position
    Result ← Result ∪ extraction
end for
Output: Result

Figure 1. Learning and Extraction Process of Classification-based Information Extraction
4. Winner-Takes-All Multi-Strategy Learning for Information Extraction
In addition to employing multiple learning algorithms with diverse learning paradigms (biases) during the training phase, most multi-strategy approaches also use multiple models during the prediction phase by combining the models' outputs in one way or another. This approach will generally work well when the majority of base learners are able to correctly learn the underlying concepts. However, this is not the case when the problem is difficult for most of the base learners, such that they mostly predict the outputs incorrectly, causing a final prediction that will likely be incorrect.
Our strategy to address the above observations is to use a single, best learner during the prediction process. If a learner is identified as the best one for a particular task (among other learners) during the training phase, then it is better to use that best learner for predicting that particular task (i.e., the winner takes all). Therefore, other learners that do not perform well on the task will not interfere with the prediction of the best one.
As an extreme example, consider the task of extracting a "date" field. A learner that learns the regular expressions of various date formats will likely become the best learner, compared to other learners such as Bayesian and Nearest Neighbor learners. With majority voting, the correct prediction of the regular expression learner could be easily defeated by the other two learners, which might occasionally make incorrect predictions. This will not be the case if the best (regular expression) learner is consistently employed for extracting the "date" field.
Given a set of base learners L = {L1, L2, ..., Ln}, and a set of information extraction tasks S = {s1, s2, ..., sm} where si is an information slot to be extracted, our strategy can be outlined as follows:
a) Find the best learner Li in L for each extraction task s in S.
b) Re-train the best learner for each extraction task using the full training data.
c) For extraction task s, apply only the best learner (as identified in Step 1 and re-trained in Step 2) for that task.
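Step (a) of the outline amounts to an argmax over estimated accuracies, one per task. The following minimal sketch (with made-up learner names and scores) illustrates the selection; steps (b) and (c) then re-train each winner on the full training data and consult only that winner at prediction time.

```python
# Sketch of the per-task winner selection (step a). In the paper the
# accuracies come from k-fold cross validation; here they are toy values.

def select_winners(accuracy):
    """accuracy[task][learner] -> estimated accuracy; returns task -> winner."""
    return {task: max(scores, key=scores.get) for task, scores in accuracy.items()}

accuracy = {
    "date":    {"regex": 0.95, "bayes": 0.70, "knn": 0.65},
    "company": {"regex": 0.40, "bayes": 0.82, "knn": 0.78},
}
print(select_winners(accuracy))  # {'date': 'regex', 'company': 'bayes'}
```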
FIND_BEST_LEARNER(L, s, TD_s)
Input:  A set of base learners L = {L1, L2, ..., Ln},
        s is a specified information task,
        TD_s is training data for information task s.
Max_accuracy ← 0;
Best_Learner ← null;
for each Li in L do:
    accuracy ← PERFORMANCE(Li, TD_s)
    if (accuracy > Max_accuracy)
        Max_accuracy ← accuracy
        Best_Learner ← Li
end for
Output: Best_Learner

Figure 2. Finding the Best Learning Algorithm for a Specific Extraction Task
Figure 2 describes the algorithm for finding the best learner for a specific extraction task. Given a specified information extraction task (e.g., name of city), training data for the task, and a set of inductive learning algorithms, it measures the performance of each inductive learning algorithm on the same training data and selects the one with the best accuracy (the winner). The function PERFORMANCE employs k-fold cross validation in order to measure the accuracy of a learning algorithm.
k-fold cross validation is an experimental method that divides the training data into k partitions and runs k experiments, where in the i-th experiment the i-th partition is used as the validation set while the rest of the partitions form the training set. The inductive learning algorithm is trained using the training set and its accuracy is measured using the validation set. The final accuracy is averaged over the k trials.
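The procedure just described can be sketched as follows; `train` and `evaluate` stand in for the learning algorithm under test and are assumptions of this illustration, not the paper's code.

```python
# Sketch of the k-fold cross validation performed by PERFORMANCE:
# train(examples) -> model, evaluate(model, examples) -> accuracy in [0, 1].

def k_fold_accuracy(examples, train, evaluate, k=5):
    folds = [examples[i::k] for i in range(k)]  # k roughly equal partitions
    scores = []
    for i in range(k):
        validation = folds[i]  # i-th partition held out
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(evaluate(train(training), validation))
    return sum(scores) / k  # final accuracy averaged over the k trials
```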
The detail of our extraction strategy is depicted in Figure 3. During the training phase, it first identifies the best inductive learning algorithm for each extraction task (stored in the variable Best_Learner). The extraction model of each extraction task is then re-learned on the full training data using the best inductive learning algorithm for the task. Finally, for a given information task, it will use only the best extraction model for the extraction process.
/* Training Phase
Input:  Learners L = {L1, L2, ..., Ln},
        Information Extraction Tasks S = {s1, s2, ..., sm},
        Training data sets TD = {TDs1, TDs2, ..., TDsm}.
/* Find the best learner for each task
for each s in S do:
    Best_Learner[s] ← FIND_BEST_LEARNER(L, s, TD_s)
end for
/* Re-learn the extraction model for each task
for each s in S do:
    Extraction_Model[s] = LEARN_EXTRACTION_MODEL(Best_Learner[s], TD_s)
end for
Output: (Best_Learner, Extraction_Model)
________________________________________________________
/* Extraction Phase
Input:  Information Extraction Tasks S = {s1, s2, ..., sm},
        a document D
for each s in S do:
    Extraction_Result[s] = EXTRACT(Best_Learner[s], Extraction_Model[s], D)
end for
Output: Extraction_Result

Figure 3. Multi-Inductive Learning Strategy for Information Extraction Task
5. Evaluation
This section describes our experiments to evaluate the performance of our proposed strategy in comparison with other well-known information extraction methods as well as methods based on multi-strategy learning. We employ the F-measure (van Rijsbergen, 1975) as the performance measure, and we treat the precision and recall in the F-measure equally, as follows:

F_measure = (2 · precision · recall) / (precision + recall)        (1)
Precision is the portion of correct extractions over all extracted fragments, while recall measures the portion of correct extractions over all relevant fragments. High precision generally causes low recall, and vice versa. The F-measure provides the balance between precision and recall.
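Equation (1) translates directly into code; a minimal sketch:

```python
# F-measure of Equation (1), with precision and recall weighted equally.

def f_measure(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.5, 0.5))  # 0.5
```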
5.1. Data Set
For the evaluation we consider two benchmark data sets commonly used for the information extraction task. The first data set is Reuters Corporate Acquisition (Freitag, 1998b). It consists of 600 articles retrieved from Reuters Newswires covering news about corporate acquisitions. The second data set is Job Posting, which consists of 300 newsgroup messages about job vacancies in Austin, Texas. Tables 1 & 2 provide the summary of the data sets used in the experiments.
Table 1. Summary of Corporate Acquisition Data Set

No  IE Task     #Event  Example
1   acqabr      1450    Norcros
2   acqloc      213     Southern California
3   acquired    683     Norcross Plc
4   dlramt      283     542.2 MLN STG, almost a billion dlrs, not disclo
5   purchabr    1263    Sahlen, Sahlen and Associates
6   purchaser   624     Sahlen and Associated Inc
7   seller      267     CSR Ltd
8   sellerabr   431     CSR
9   status      461     proposed, approved, agreed in principle
Table 2. Summary of Job Posting Data Set

No  IE Task                   #Event  Examples
1   application               605     DB2, Oracle, DB2 server, sybase
2   area                      980     Failure analysis, multimedia, TCP/IP, internet
3   city                      639     Austin, Battle Creek, San Antonio
4   company                   291     Alliance, CPS, Charter Professional Services Inc
5   country                   363     US, USA, England, UK
6   desired_degree            21      PhD, BS, BSCS, Masters, MSCS
7   desired_years_experience  45      5, 4, 10
8   id                        299     NEWTNews.872347949.11738.consults.ws-n
9   language                  867     RPG, COBOL, CICS, Java, c, c++, SQL
10  platform                  705     AS400, Windows 95, windows, portable system, PC
11  post_date                 288     30 Aug 1997, 11 Sep 1997
12  recruiter                 325     Resource Spectrum
13  req_degree                80      BS, B.S., Bachelor, Bachelor's, BSCS
14  req_years_experience      173     2, 2+, Two, 5, 4
15  salary                    143     $50k to $70k, to $60k
16  state                     462     TX, Texas, Miami, Georgia, MI
17  title                     466     ALC Application Programmer, Visual Basic Developers
5.2. Setup of Experiments
As described in an earlier section, our multi-strategy approach requires several base inductive learning algorithms from which to select the best learning algorithm for a specified information task. In this experiment we incorporate the Perceptron Algorithm with Uneven Margin (PA) [18], Support Vector Machine (SVM), Averaged One-Dependence Estimators (AODE) [19] and k-Nearest Neighbour (k-NN) [20] as the base learners, which provide diverse learning biases.
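The selection procedure described above — identify the best base learner by k-fold cross validation, then retrain the winner on the full training set — can be sketched as follows. The two toy learners, the scalar features, and the use of held-out accuracy instead of F-measure are our own simplifications for illustration; the paper's actual base learners are PA, SVM, AODE and k-NN:

```python
class MajorityLearner:
    """Trivial baseline: always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
    def predict(self, x):
        return self.label

class OneNearestNeighbour:
    """1-NN on scalar features, a stand-in for the k-NN base learner [20]."""
    def fit(self, X, y):
        self.X, self.y = X, y
    def predict(self, x):
        i = min(range(len(self.X)), key=lambda j: abs(self.X[j] - x))
        return self.y[i]

def cross_val_accuracy(make_learner, X, y, k):
    """Mean held-out accuracy of a learner factory over k disjoint folds."""
    folds = [list(range(len(X)))[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        model = make_learner()
        model.fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(model.predict(X[i]) == y[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)

def winner_takes_all(factories, X, y, k=5):
    """Identify the best base learner by k-fold cross validation,
    then retrain the winner on the full training set."""
    best_name = max(factories,
                    key=lambda name: cross_val_accuracy(factories[name], X, y, k))
    winner = factories[best_name]()
    winner.fit(X, y)  # retrain the single winner on all training data
    return best_name, winner
```

On a toy set such as `X = [0, 1, 2, 3, 10, 11, 12, 13]` with labels `["a"]*4 + ["b"]*4`, the 1-NN learner generalises perfectly across folds and is therefore selected over the majority baseline, mirroring how a single best predictor is chosen per information task.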
For performance comparison with other existing multi-strategy learning methods, we run experiments using the Voting and Bagging algorithms. The Voting algorithm has been commonly used as the baseline for combining classifiers by averaging the probability estimates over the combined classifiers [21]. The implementation of the Voting algorithm is based on Kittler et al. [22] and Kuncheva [23]. We use the same base learners for the Voting algorithm as in our multi-strategy approach. The Bagging algorithm is implemented as in Breiman [24]. It is a meta-learning algorithm in which a number of classification iterations are performed by only a single classifier in order to reduce variance; its prediction is based on the average of the probability estimates. AODE is employed as the classifier of the Bagging algorithm. Additionally, we also compare the
performance of the WTA-based multi-strategy approach with well-known information extraction algorithms, including RAPIER [4], SRV [16], ELIE [8], LP2 [5], and SNOW-IE [15].
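The probability-averaging rule used by the Voting baseline [21] can be sketched as follows; the two-class probability estimates below are invented for illustration and are not taken from our experiments:

```python
def soft_vote(prob_estimates):
    """Combine base classifiers by averaging their class-probability
    estimates and returning the class with the highest mean [21]."""
    classes = prob_estimates[0].keys()
    mean = {c: sum(p[c] for p in prob_estimates) / len(prob_estimates)
            for c in classes}
    return max(mean, key=mean.get)

# Two of three learners prefer "other" as a hard label, yet the single
# confident learner swings the averaged estimate to "entity":
estimates = [
    {"entity": 0.40, "other": 0.60},
    {"entity": 0.45, "other": 0.55},
    {"entity": 0.95, "other": 0.05},
]
# mean: entity = 0.60, other = 0.40  ->  "entity"
```

Note that the same averaging can work against a lone correct learner: when the majority of base learners assign low probability to the correct class, their combined estimate outweighs it, which is the situation the winner-takes-all strategy is designed to avoid.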
5.3. Results
Table 3 shows the performance comparison on the Reuters Corporate Acquisition data set. Out of nine information tasks, our WTA-based multi-strategy approach outperforms the other algorithms in seven cases (acqabr, acqloc, acquired, dlramt, purchabr, purchaser, and status). The best results on the other two cases (seller and sellerabr) are provided by the SRV algorithm. The average performance of our strategy (0.46) is also significantly higher than that of the others. The dominant base learners employed in our multi-strategy approach are the SVM and PA inductive learning algorithms. Seller and sellerabr are among the most difficult information extraction tasks, since all algorithms and strategies suffer from low F-measure on these tasks.
Table 3. Performance Comparison on Reuters Corporate Acquisition Data Set

IE Task    RAPIER [4]  SRV [16]  ELIE [8]  Multi-Strategy  Multi-Strategy  WTA-based
                                           (Voting)        (Bagging)       Multi-Strategy
acqabr     0.26        0.38      0.40      0.31            0.24            0.57 (SVM)
acqloc     0.24        0.22      0.34      0.05            0.25            0.47 (SVM)
acquired   0.29        0.39      0.43      0.06            0.22            0.51 (SVM)
dlramt     0.39        0.62      0.59      0.09            0.34            0.65 (PA)
purchabr   0.24        0.48      0.29      0.28            0.39            0.49 (SVM)
purchaser  0.28        0.45      0.46      0.22            0.33            0.52 (PA)
seller     0.15        0.23      0.16      0.00            0.23            0.22 (SVM)
sellerabr  0.09        0.25      0.13      0.00            0.24            0.21 (SVM)
status     0.41        0.47      0.50      0.12            0.25            0.53 (PA)
Average    0.27        0.41      0.39      0.13            0.28            0.46
Table 4 depicts the performance comparison on the Job Posting data set. Although not as dominant as in the Corporate Acquisition data set, our WTA-based multi-strategy algorithm achieves the highest performance on nine cases (city, company, desired_degree, platform, recruiter, req_degree, salary, state, and title) out of 17 cases. The highest performance on the other cases is achieved by RAPIER (2 cases), LP2 (5 cases) and SNOW-IE (2 cases). The best performance on post_date is shared by RAPIER, LP2, SNOW-IE and our strategy, since these methods produce the same F-measure (0.99). Similar to the Corporate Acquisition data set, the average performance of our multi-strategy learning (0.81) on the Job Posting data set is significantly higher than that of the other methods.
Table 4. Performance Comparison on Job Posting Data Set

IE Task                   RAPIER [4]  LP2 [5]  SNOW-IE [15]  Multi-Strategy  Multi-Strategy  WTA-based
                                                             (Voting)        (Bagging)       Multi-Strategy
application               0.69        0.78     0.61          0.17            0.20            0.74 (PA)
area                      0.42        0.67     0.52          0.06            0.08            0.57 (PA)
city                      0.90        0.93     0.89          0.74            0.50            0.95 (SVM)
company                   0.70        0.72     0.75          0.62            0.34            0.82 (PA)
country                   0.93        0.81     0.95          0.64            0.57            0.59 (PA)
desired_degree            0.72        0.65     0.61          0.14            0.06            0.74 (PA)
desired_years_experience  0.87        0.60     0.79          0.71            0.79            0.86 (SVM)
id                        0.97        1.00     0.99          0.73            0.57            0.99 (SVM)
language                  0.81        0.91     0.82          0.37            0.39            0.88 (PA)
platform                  0.72        0.80     0.74          0.33            0.30            0.82 (PA)
post_date                 0.99        0.99     0.99          0.97            0.97            0.99 (SVM)
recruiter                 0.68        0.81     0.85          0.57            0.59            0.87 (PA)
req_degree                0.81        0.85     0.83          0.48            0.37            0.86 (PA)
req_years_experience      0.67        0.69     0.84          0.63            0.62            0.81 (SVM)
salary                    0.67        0.63     0.73          0.49            0.25            0.84 (PA)
state                     0.90        0.85     0.92          0.61            0.40            0.92 (SVM)
title                     0.40        0.44     0.53          0.27            0.16            0.69 (SVM)
Average                   0.75        0.77     0.79          0.50            0.42            0.82
6. Conclusion
This paper has described our winner-takes-all based multi-strategy learning for information extraction, which is cast as a classification problem. Unlike multi-strategy learning that generally combines the predictions of various learning algorithms, our strategy simply employs the best learning algorithm for a specific information task, selected from among learning algorithms with varying biases. Our approach has an advantage in the difficult situation when only a single learning algorithm, or a minority of them, can make correct predictions. In such a situation, the majority of learning algorithms with incorrect predictions will outweigh the final prediction, which is not the case in our winner-takes-all based multi-strategy learning. Nevertheless, our approach also has a shortcoming in that it requires more computational effort in selecting the best learning algorithm, since it has to evaluate the performance of all base learners during the training phase. Empirical evaluation on two benchmark data sets (Reuters Corporate Acquisition and Job Posting) reveals that our strategy is quite effective. In both data sets, our approach outperforms the other approaches in terms of F-measure.
References
[1] Finn A, Kushmerick N. Information Extraction by Convergent Boundary Classification. Proceedings of the AAAI Workshop on Adaptive Text Extraction and Mining. 2004.
[2] Aggarwal CC, Zhai C. Mining Text Data. Springer. 2012.
[3] Soderland S. Learning information extraction rules for semi-structured and free text. Machine Learning. 1999; 34(1-3): 233-272.
[4] Califf ME, Mooney RJ. Relational Learning of Pattern-Match Rules for Information Extraction. Proceedings of the Sixteenth National Conference on Artificial Intelligence and Eleventh Conference on Innovative Applications of Artificial Intelligence. Orlando, FL. 1999: 328-334.
[5] Ciravegna F. (LP)2, an adaptive algorithm for information extraction from Web-related texts. Proceedings of the IJCAI-2001 Workshop on Adaptive Text Extraction and Mining. Seattle, USA. 2001.
[6] McCallum A, Freitag D, Pereira FC. Maximum Entropy Markov Models for Information Extraction and Segmentation. Proceedings of the 17th International Conference on Machine Learning. 2000: 591-598.
[7] Sarawagi S, Cohen WW. Semi-Markov Conditional Random Fields for Information Extraction. Advances in Neural Information Processing Systems. 2004: 1185-1192.
[8] Finn A. A Multi-Level Boundary Classification Approach to Information Extraction. Ph.D. Thesis. Dublin: University College Dublin; 2006.
[9] Siefkes C. An Incrementally Trainable Statistical Approach to Information Extraction based on Token Classification and Rich Context Models. Ph.D. Thesis. Berlin: Freie Universität Berlin; 2007.
[10] Hiyakumoto L, Lita LV, Nyberg E. Multi-Strategy Information Extraction for Question Answering. Proceedings of the FLAIRS Conference. 2005: 678-683.
[11] Freitag D. Multistrategy Learning for Information Extraction. Proceedings of the International Conference on Machine Learning. 1998: 161-169.
[12] Sigletos G, Paliouras G, Spyropoulos CD, Stamatopoulos T. Meta-learning beyond Classification: A Framework for Information Extraction from the Web. International Workshop & Tutorial on Adaptive Text Extraction and Mining, held in conjunction with the 14th European Conference on Machine Learning. 2003: 58.
[13] Lam L. Classifier Combinations: Implementations and Theoretical Issues. Multiple Classifier Systems. Springer Berlin Heidelberg. 2000: 77-86.
[14] Feilmayr C, Vojinovic K, Proll B. Designing a Multi-dimensional Space for Hybrid Information Extraction. Database and Expert Systems Applications (DEXA), 23rd International Workshop on. IEEE. 2012: 121-125.
[15] Roth D, Yih WT. Relational Learning via Propositional Algorithms: An Information Extraction Case Study. Proceedings of the International Joint Conference on Artificial Intelligence. 2001; 17(1): 1257-1263.
[16] Freitag D. Toward General-Purpose Learning for Information Extraction. Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics. San Francisco, CA. 1998; 1: 404-408.
[17] Van Rijsbergen CJ. Information Retrieval. Second Edition. Butterworth-Heinemann. 1979.
[18] Li Y, Zaragoza H, Herbrich R, Shawe-Taylor J, Kandola JS. Perceptron Algorithm with Uneven Margin. Proceedings of the International Conference on Machine Learning. 2002; 2: 379-386.
[19] Webb GI, Boughton JR, Wang Z. Not So Naïve Bayes: Aggregating One-Dependence Estimators. Machine Learning. 2005; 58(1): 5-24.
[20] Aha DW, Kibler D, Albert MK. Instance-based Learning Algorithms. Machine Learning. 1991; 6(1): 37-66.