TELKOMNIKA, Vol.13, No.1, March 2015, pp. 357~363
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v13i1.648

Received September 30, 2014; Revised December 13, 2014; Accepted January 10, 2015

Towards a Framework for an Indonesian Medical
Question Generator

Wiwin Suwarningsih*1,2, Iping Supriana1, Ayu Purwarianti1
1 School of Electronic Engineering and Informatics, Institute Technology Bandung, Indonesia
2 Research Center for Informatics, Indonesian Institute of Science
*Corresponding author, e-mail: wiwin.suwarningsih@students.itb.ac.id

Abstract
Question generation is the task of automatically generating questions from various inputs such as raw text, databases, or semantic representations. In this paper, we describe a general framework that could help develop and characterise efforts in Indonesian medical question generation. We propose a new style of question generation that actively uses sentences within a document as a source of answers. We use manually written rules to perform a sequence of general-purpose syntactic transformations (e.g. identification of keywords or key phrases for NER based on the PICO frame) to turn a declarative sentence into questions. The final result of this research is a pattern of question and answer pairs; the test results show that the pattern matching algorithm achieves a precision of 0.101 and a recall of 0.712.

Keywords: generating question, Indonesian medical text, PICO frame

1. Introduction
Medical questions are often long and complex and take many forms [1]. One way to meet medical information needs is to refer to the published literature for related medical evidence [2]. This information is entered as free-flowing text. The goal of such recording is primarily limited to manual inspection or future referencing when the same patient asks again [3]. This paper presents a system designed to satisfy the information needs of patients. We explore these research questions in the medical and clinical domains, focusing on the information needs of patients in preventive medical action. The methods for meeting the information needs of patients are manifold, such as health information through advertising, online discussion [4], health education, and question generation (QG).
QG is the task of generating reasonable questions from a paragraph, text, sentence or database. QG involves numerous complex subtasks which lie at the interface between Natural Language Understanding and Natural Language Generation [5].
Previous research on QG covers a variety of methods, techniques and domains, such as semantics-based methods, keyword-based techniques and logic reasoning questions for students. Yao et al. [5] propose a semantics-based method of transforming the Minimal Recursion Semantics (MRS) representation of declarative sentences to that of interrogative sentences. Urlaub [6] studies the impact of the reading comprehension strategy of generating questions on literary reading development in a second language. Li and Huang [3] develop and evaluate a technique for an MQG (Medical Query Generator) that generates keyword-based queries from natural language descriptions of medical information needs. MQG aims at serving as a front-end component of a retrieval system: it generates queries to retrieve relevant texts, which are then ranked for further processing (e.g., relevant measurement and ranking). Khodeir et al. [8] estimate the student knowledge model in a probabilistic domain using automatic, adaptively generated assessment questions. The student's answers are used to measure the actual student model. Updating and verification of the model are based on the matching between the student's answers and the model answers. Wang et al. [9] present algorithms for the automatic generation of logic reasoning questions with unique solutions.
Another piece of research, by Ajmera et al. [10], proposed techniques for exploiting information to mine real customer concerns or problems and then map them to well-written knowledge articles for that enterprise. This mapping results in the generation of question-answer (QA) pairs. Bednarik and Kovack [11] describe the principles and methods applied to determine automatically the word to be extracted from a chosen sentence for question generation according to information gained by the methods mentioned above.
From these studies, we identified several aspects that can be explored further and improved to produce a question generator for Indonesian medical language.
In this paper, we propose a new style of question generation that actively uses sentences within a document as a source of answers. We use manually written rules to perform a sequence of general-purpose syntactic transformations (e.g. identification of keywords or key phrases for NER based on the PICO frame) to turn a declarative sentence into questions.
The rest of the paper is organised as follows: Section 2 describes related work on question generation, followed by Section 3, which details our QG framework. Section 4 describes Indonesian medical question generation. Section 5 shows the concept of the pattern matching algorithm for ImQG. Finally, in Section 6 we give a conclusion and future directions.

2. Related Work
The QG framework can be considered as the integration of several frameworks from related areas of text mining [12],[14], natural language processing [6],[15],[19],[25], semantic knowledge management [4],[13],[14], pedagogy and software development [5],[9],[10]. Recently, several question generator frameworks for natural language tasks have been designed. In the following, we highlight their differences.
Iwane et al. [18] define a framework for learner-centred question generation on the tablet PC. The framework is based on a learner's actions and the context of an expository text. In assisting active learning, three different types of questions can be generated according to the learner's marking actions. Some associated questions can also be generated based on the type of question. The generated questions will promote active learning by motivating further self-questionings and markings.
The framework of Sumita et al. [19] consists of three main steps: sentence extraction, determining the blank part, and generating option words. The selection of sentences, blank positions and option words is determined with the help of machine learning methods using statistical and discriminative models.
Heilman and Smith [20] define a framework for generating a ranked set of fact-based questions about the text of a given article. They describe an extensible approach to generating questions for the purpose of reading comprehension assessment and practice.
Lindbergh et al. [21] develop a template-based framework for QG. The primary motivation is that a template-based approach offers the opportunity to generate questions that are not merely declarative-to-interrogative transformations.
Ali et al. [22] consider a kind of Text-to-Question generation task framework, where the input text consists of sentences. The QG system then generates a set of questions for which the sentence contains, implies, or needs answers.

3. Proposed Method
The paper first gives an overview of the main methods of Indonesian medical question generation (ImQG); then the architecture of the proposed system is presented. For that task, we propose a general ImQG framework (Figure 1), intended as an abstract and flexible model within which to characterise and compare actual systems of question generation. A brief description of each step is given in Section 4.
Figure 1. ImQG framework

4. Indonesian Medical Question Generation
The first modules of the framework are syntax and semantic parsing, which map natural language sentences to a detailed, formal meaning representation language (e.g. tokenising and stemming). The framework first uses an integrated shallow parser to produce a semantically augmented parse tree based on phrases, in which each non-terminal node has both a syntactic and a semantic label. A compositional-semantics procedure is then used to map the augmented parse tree into a definitive meaning representation. For example, Figure 3 illustrates the syntactic parse of the Indonesian sentence "Peningkatan sistem imun menahan dari serangan virus" (Improved immune system resists virus attack).
Based on the grammar in Figure 2, the parser assigns a VP for the verb phrase, an NP for the noun phrase, a JJP for the adjective phrase, an AdvP for the adverb phrase, a NumP for the numeral phrase and a PP for the prepositional phrase.
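
As an illustration of how a phrase-level parse relates to a flat POS pattern, the sketch below encodes a parse tree for the example sentence as nested tuples and reads the tags off its leaves. It is only a minimal Python sketch: the tuple encoding, the tree shape, and the names TREE, leaves and pos_pattern are assumptions made for illustration and are not taken from the authors' system.

```python
# Hypothetical encoding of a parse tree for the example sentence.
# An inner node is (label, child, child, ...); a leaf is (tag, word).
TREE = (
    "S",
    ("SUBJ",
     ("NP", ("N", "Peningkatan"),
            ("JJP", ("N", "sistem"), ("JJ", "imun")))),
    ("PRED",
     ("VBT", ("VB", "menahan"), ("IN", "dari"),
             ("OBJ", ("NP", ("N", "serangan"), ("N", "virus"))))),
)


def leaves(node):
    """Yield (tag, word) pairs of the tree from left to right."""
    label, *children = node
    # A leaf has exactly one child and that child is the word itself.
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, children[0])
        return
    for child in children:
        yield from leaves(child)


def pos_pattern(tree):
    """Join the leaf tags into a flat POS pattern string."""
    return "+".join(tag for tag, _ in leaves(tree))


print(pos_pattern(TREE))  # N+N+JJ+VB+IN+N+N
```

The flat pattern obtained this way has the same form as the POS patterns used later in the pattern matching experiment.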
Next, named entity recognition (NER) is a subtask of medical information extraction that seeks to locate and classify elements in a text into predefined categories, such as the names of persons, organisation, location, time, problem, intervention, comparison and outcomes. The result of NER is shown in Table 1.

Figure 2. Indonesian medical grammar based on POS [7]

Table 1. Name entity recognition based on key-phrase resulted
Word/phrase | NER | Description
sistem imun (immune system) | <COMPARISON> = <C> | Key-phrase
serangan virus (virus attack) | <INTERVENSION> = <I> | Key-phrase
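
The key-phrase labelling shown in Table 1 can be illustrated with a small lexicon lookup. This is a minimal Python sketch under stated assumptions: the paper does not say how its NER step is implemented, so the hand-built lexicon and the names KEY_PHRASES and tag_key_phrases are hypothetical; only the two phrases and their labels come from Table 1.

```python
# Phrases and labels copied from Table 1; the lookup strategy is an assumption.
KEY_PHRASES = {
    "sistem imun": "COMPARISON",
    "serangan virus": "INTERVENSION",
}


def tag_key_phrases(sentence, lexicon=KEY_PHRASES):
    """Return (phrase, label) pairs for lexicon phrases found in the sentence.

    Matching is a plain case-insensitive substring test, which is enough
    for a sketch but would also match inside longer words.
    """
    text = sentence.lower()
    found = [(text.find(phrase), phrase, label)
             for phrase, label in lexicon.items() if phrase in text]
    # Sort by position so the result follows the order of the sentence.
    return [(phrase, label) for _, phrase, label in sorted(found)]


print(tag_key_phrases("Peningkatan sistem imun menahan dari serangan virus"))
```

A real system would likely match on tokens or stems rather than raw substrings.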

[Figure 1 diagram: Indonesian Medical Text -> (1) Syntax and semantic parsing (e.g. tokenize and stemming) -> (2) Named Entity Recognition -> (3) Formalization (e.g. word and phrase) -> (4) Question representation (e.g. pattern matching and classification of questions with the PICO frame) -> (5) Selecting of QA pair representation model (e.g. QA pair filtering, QA pair evaluated) -> Indonesia Medical QA-pair Model]

Grammar rules of Figure 2:
Rule description:
S = sentence; SUBJ = subjects; PRED = predicate
NP = noun phrases; AP = adverb phrases
JJP = adjective phrase; N = noun
PP = prepositional phrase (preposition)
PRP = personal pronoun
RB = adverb (e.g. sedang, agar)
VP = verb phrase
JJ = adjective (e.g. cantik, malas)
IN = preposition (e.g. di, ke, dari)
VBI = intransitive verb (e.g. pergi)
VBT = transitive verb (e.g. membeli)
CC = adverb/additional correlative conjunctions (e.g. dan)
SC = adverb/additional subordinate conjunctions (e.g. jika)
No. Rule
1. S -> SUBJ PRED
2. SUBJ -> NP
3. NP -> (AP) N (PP) (JJP)
   NP -> PRP
4. AP -> JJ (PP)
5. PP -> IN NP
6. JJP -> (N) JJ
7. PRED -> (RB) VP
8. VP -> (VBI) (VBT)
9. VBT -> VB (IN) OBJ
   VBT -> CC S
   VBT -> SC S
10. OBJ -> NP

From these studies, we identified aspects that can be explored further, including the classification of Indonesian medical questions with the PICO frame (Problem, Intervention, Comparison, Outcomes).
The efficient identification of patient, intervention, comparison, and outcome (PICO) components in medical articles is helpful in evidence-based medicine. Evidence-based medicine or practice (EBM) involves answering medical questions by analysis of related articles from literature databases such as PubMed [2]. From Table 1, we can see that the formalisation process uses an annotation module to encode the text into a more or less rigid formalism. It is advisable that the formalisation support deductive, and preferably also inductive and abductive, inferences. In this paper, the formalisation corrects words or phrases according to the rules of the Indonesian language. The chosen formalism must also allow for a comparison to a previously built knowledge base in which medical knowledge is represented at the desired level of detail.
The next step is question representation. The medical knowledge and text representations are compared within a knowledge matching module, the specifics of which depend largely on the chosen formalism. These modules should extend the representation by performing all allowable inferences (such as Problem/Patient, Intervention, Comparison, Outcomes) and then match the results to the content of the medical knowledge base. Each PICO element is associated with Indonesian question words: Patient/Problem uses Siapa (Who) or Apa (What); Intervention uses Bagaimana (How); Comparison uses Apa (What is the main alternative?); Outcome uses Apa (What are you trying to accomplish, measure, improve or affect?).
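
The association between PICO elements and Indonesian question words described above can be written down as a small lookup table. The following is a minimal Python sketch; the dictionary layout and the name candidate_question_words are assumptions for illustration, while the question words themselves are those listed in the text.

```python
# Question words per PICO element, as listed in the text.
QUESTION_WORDS = {
    "P": ("Siapa", "Apa"),   # Patient/Problem: who / what
    "I": ("Bagaimana",),     # Intervention: how
    "C": ("Apa",),           # Comparison: what is the main alternative
    "O": ("Apa",),           # Outcome: what are you trying to accomplish
}


def candidate_question_words(label):
    """Return the question words for a PICO label such as '<I>' or 'Outcome'.

    Only the first letter of the label is used, so '<C>', 'C' and
    'Comparison' are treated alike (a simplification for this sketch).
    """
    key = label.strip("<> ").upper()[:1]
    if key not in QUESTION_WORDS:
        raise ValueError(f"not a PICO label: {label!r}")
    return QUESTION_WORDS[key]


print(candidate_question_words("<I>"))  # ('Bagaimana',)
```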
The matching of patterns against a given free text is done at the lexical, syntactic and semantic levels. We also discard all matched phrases or words in which the answer does not fit the semantic category expected by the question. From the NER result in Table 1, we can generate a question for each NER (see Table 2).

Figure 3. Sentence example and its shallow parser results

The results are passed on to a question representation module, where an array of Indonesian medical question models and question-answer pairs is searched in order to identify the types of questions to be generated for the selected input. The questions thus generated are then assessed and filtered. The results of question filtering with patterns based on POS and PICO classification are shown in Table 3.

[Figure 3 parse tree: S -> SUBJ PRED; SUBJ -> NP (N "Peningkatan", JJP (N "sistem", JJ "imun")); PRED -> VBT (VB "menahan", IN "dari", OBJ -> NP (N "serangan", N "virus"))]

Table 2. Question Representation with question generation for "Peningkatan sistem imun menahan dari serangan virus"
Question generation based on NER | Expected Answer Type based on NER
1. Apakah manfaat meningkatkan sistem imun? | <INTERVENSION>
2. Apa yang dimaksud dengan sistem imun? | <DEFINISION>
3. Sistem imun itu apa ya? | <DEFINISION>
4. Apakah dampaknya bila sistem imun meningkat? | <DIAGNOSIS>
5. Bagaimana menahan serangan virus? | <COMPARISON>
6. Bagaimana virus menyerang tubuh? | <DIAGNOSIS>

Table 3. The result of question filtering for "Peningkatan sistem imun menahan dari serangan virus"
Question | Question Pattern | Answer | Answer Pattern
Apakah manfaat meningkatkan sistem imun? | Apakah + N + N + JJP | Menahan serangan virus | VB + NP
Bagaimana menahan serangan virus? | Bagaimana + VB + NP | Peningkatan sistem imun | N + JJP

The final step is the selection of a QA pair representation model (e.g. QA pair filtering, QA pair evaluation). In our framework, there are two rules for QA pair filtering. The first is that a generated question is meaningless if a phrase in it has a pronoun as its part of speech [23]. The second is that a question that does not contain any NER based on the PICO frame is meaningless.
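
The two filtering rules above can be sketched as a single predicate. This is a hedged Python sketch, not the authors' implementation: it assumes each generated question comes with its POS tags and its NER labels, that pronouns carry the PRP tag from the grammar legend, and that PICO labels are abbreviated to P, I, C and O. The name keep_question is hypothetical.

```python
PRONOUN_TAG = "PRP"                 # personal pronoun tag from the grammar legend
PICO_LABELS = {"P", "I", "C", "O"}  # assumed short forms of the PICO labels


def keep_question(pos_tags, ner_labels):
    """Return True if a generated question survives both filtering rules.

    Rule 1: a question containing a pronoun is treated as meaningless.
    Rule 2: a question without any PICO-based NER label is treated as meaningless.
    """
    has_pronoun = PRONOUN_TAG in pos_tags
    has_pico = any(label in PICO_LABELS for label in ner_labels)
    return (not has_pronoun) and has_pico


# "Bagaimana menahan serangan virus?" tagged as VB + N + N with one PICO label.
print(keep_question(["VB", "N", "N"], ["C"]))  # True
```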

5. The Concept of Pattern Matching Algorithm for ImQG
We approached the pattern matching problem with an algorithm modified from Deterministic Finite Automata (DFA) methods [24]. The concept of the pattern matching algorithm is shown in Figure 4. The proposed algorithm is extensible in that it supports various patterns based on POS/NER, and each node can have multiple labels (such as POS tagging, NER).
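
Since the cited source [24] is the Knuth-Morris-Pratt (KMP) algorithm, the idea can be illustrated with a textbook KMP search over sequences of POS tags. The Python sketch below is the standard algorithm, not the authors' modified version; the function names failure_table and find_pattern are hypothetical, and multi-label nodes are not modelled.

```python
def failure_table(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i + 1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def find_pattern(tags, pattern):
    """Return the index of the first occurrence of pattern in tags, or -1."""
    if not pattern:
        return 0
    fail = failure_table(pattern)
    k = 0  # number of pattern tags matched so far
    for i, tag in enumerate(tags):
        while k > 0 and tag != pattern[k]:
            k = fail[k - 1]  # fall back without re-reading earlier tags
        if tag == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


sentence_tags = "N N JJ VB IN N N".split()
print(find_pattern(sentence_tags, ["VB", "IN", "N"]))  # 3
```

Working on tag lists rather than characters means a pattern such as VB + NP would first have to be expanded into its POS tags.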

5.1. Experimental Data for Pattern Matching Algorithm
No Indonesian medical question generation data were available to us. For this reason, we built our own Indonesian medical question-document pairs. We collected documents from two popular Indonesian health websites (detikhealth.com and kompashealth.com). For the question collection, we asked 10 Indonesian people to write Indonesian questions based on 50 articles that we selected manually from the Indonesian medical corpus.

5.2. Experimental Result for Pattern Matching Algorithm
The scope of the test consists of sentences with a POS pattern (such as N+N+JJ+VB+IN+N+N). This was necessary so that, during the testing stage, the pattern matching algorithm would be able to recognise the Indonesian grammar. In this paper, we used 100 sentences with a POS pattern for the pattern matching experiment. For the pattern matching, we used two evaluation measures: precision and recall. Precision shows the average ratio of relevant patterns, where a relevant pattern is one that contains an Indonesian grammar based on Indonesian POS. Recall shows the number of patterns that might have a pattern match in our database. Our pattern matching achieves a precision of 0.101 and a recall of 0.712.
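
The paper does not give formulas for its two measures. Assuming the standard set-based definitions (precision is the share of retrieved patterns that are relevant, recall is the share of relevant patterns that are retrieved), they could be computed as in the Python sketch below. The function name precision_recall and the toy pattern sets are hypothetical and do not reproduce the reported figures.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of patterns.

    precision = |retrieved and relevant| / |retrieved|
    recall    = |retrieved and relevant| / |relevant|
    Empty denominators are reported as 0.0.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy example with made-up pattern identifiers, for illustration only.
p, r = precision_recall({"N+JJP", "VB+NP", "N+N", "IN+NP"}, {"N+JJP", "VB+NP", "VB+IN+N"})
print(p, r)
```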

Figure 4. Pattern matching algorithm for ImQG

6. Conclusion
In this paper, a new framework for Indonesian medical question generation and a concept of pattern matching with Deterministic Finite Automata (DFA) methods were proposed. The framework is based on a parsing process for NER identification. NER identification covers words and phrases that arise from self-questioning while identifying the POS pattern. After the system identifies a new concept or finds an answer to a question in a database sentence, it marks it on the sentence. In the proposed framework, four different types of question can be generated according to this NER identification. The final result of this research is a pattern of question and answer pairs; the test results show that the pattern matching algorithm achieves a precision of 0.101 and a recall of 0.712. Future work is to fully implement the system and evaluate the effectiveness of its use in motivating users for active learning in the medical domain.

Acknowledgment
We thank anonymous reviewers for their helpful comments.

References
[1] Jon P, Min L. An ontology for clinical questions about the contents of patient notes. Journal of Biomedical Informatics. 2012; 45(2): 292-306.
[2] Dina DF, Jimmy L. Answering Clinical Questions with Knowledge-Based and Statistical Techniques. Journal of Biomedical Informatics. 2010; 43(6): 962-971.
[3] Rey LL, Yi CH. Medical query generation by term–category correlation. Journal of Information Processing & Management. 2011; 47(1): 68-79.
[4] Erlin, Rahmiati, Unang R. Two Text Classifiers in Online Discussion: Support Vector Machine vs Back-Propagation Neural Network. TELKOMNIKA Telecommunication Computing Electronics and Control. 2014; 12(1): 189-200.
[5] Xuchen Y, Gosse B, Yi Z. Semantic based Question Generation and Implementation. Journal of Dialogue and Discourse. 2012; 3(2): 11-42.

Algorithm for pattern matching
Input: Sentence with POS pattern    // POS = part of speech
Output: Pattern of sentence found
Method: One state for each pattern based on POS/NER, and each node can have multiple labels (such as word, POS tagging, NER).
// create a pointer to the next state
Int j = 0
Int p = 0
// identification for POS on database table = d.pos
// identification for query POS from sentence = q.pos
// matching process
For (int i = 0; i < N; i++)    // N is length of pattern
    If (q.pos(i) = d.pos(j)) then    // compare POS at that position with next pattern POS
        next[j] = next[i];
        j++;    // pattern match: do copy and increment
    else
        next[j] = x + 1;
        x = next[x];    // pattern mismatch: do opposite
    endif
    // if pattern match then POS found
    If (j = p) then    // if POS at j position found at p position
        Return i-p+j
    Endif
Endfor
Return -1    // if POS at j position not found at p position

[6] Per U. Reading strategies and literature instruction: Teaching learners to generate questions to foster literary reading in the second language. System. 2012; 40(2): 296-304.
[7] Alfan FW, Ayu P. HMM Based Part-of-Speech Tagger for Bahasa Indonesia. Proceeding of the Fourth International MALINDO Workshop (MALINDO 2010). Jakarta, Indonesia. 2010; 1: 164-270.
[8] Nabila K, Nayer W, Nevin D, Nadia H. Bayesian based adaptive question generation technique. Journal of Electrical Systems and Information Technology. 2014; 1(1): 10-16.
[9] Kun W, Tao L, Jungang H, Yani L. Algorithms for Automatic Generation of Logical Questions on Mobile Devices. IERI Procedia. 2012; 2(4): 258-263.
[10] Jitendra A, Sachindra J, Ashish V, Amol M. Automatic Generation of Question Answer Pairs From Noisy Case Logs. IEEE 30th International Conference on Data Engineering (ICDE). 2014: 436-447.
[11] Laszlo B, Laszlo K. Automated EA-type Question Generation from Annotated Texts. 7th IEEE International Symposium on Applied Computational Intelligence and Informatics, Romania. 2012: 191-195.
[12] Fu YY. Multiple peer-assessment modes to augment online student question-generation processes. Computers & Education. 2011; 56(2): 484-494.
[13] Jongik K. An effective candidate generation method for improving performance of edit similarity query processing. Information Systems. 2013; 47(2): 116-128.
[14] Yukiko SA. A Subcategory-based Parser Directed to Generating Representations for Text Understanding. Procedia - Social and Behavioral Sciences. 2011; 27(3): 194-201.
[15] Dan M, Christine C, Sanda H, Daniel H. COGEX A semantically and contextually enriched logic prover for question answering. Journal of Applied Logic. 2007; 5(1): 49-69.
[16] Zheng X, Yunhuai L, Lin M, Chuanping H, Lan C. Generating temporal semantic context of concepts using web search engines. Journal of Network and Computer Applications. 2014; 43(3): 42-55.
[17] Azadeh N, Ehsan E, Graciela G. Towards generating a patient's timeline: Extracting temporal relationships from clinical notes. Journal of Biomedical Informatics. 2013; 46(Supplement): S40-S47.
[18] Noriyuki I, Chunming G, Makoto Y. Question Generation for Learner Centered Learning. IEEE 13th International Conference on Advanced Learning Technologies. 2013: 330-332.
[19] Sumita E, Sugaya F, Yamamoto S. Automatic Generation Method of a Fill-in-the-blank Question for Measuring English Proficiency. Technical report of IEICE. 2004; 104(503): 17-22.
[20] Michael H, Noah AS. Question Generation via Overgenerating Transformations and Ranking. 2009. http://www.cs.cmu.edu/~mheilman/papers/heilman-smith-qg-tech-report.pdf.
[21] David L, Fred P, John NPW. Generating Natural Language Questions to Support Learning On-Line. 14th European Workshop on Natural Language Generation, Bulgaria. 2013: 105-114.
[22] Husam A, Yllias C, Sadid AH. Automation of question generation from sentences. In Proceedings of QG2010: The Third Workshop on Question Generation. 2010: 58-67.
[23] Min KK, Han JK. Design of Question Answering System with Automated Question Generation. IEEE conference on Fourth International Conference on Networked Computing and Advanced Information Management. 2008: 365-368.
[24] Don K, Jim M, Vaughan P. Knuth-Morris-Pratt (KMP) exact pattern-matching algorithm. 2010. http://www.cs.cmu.edu/~kmp.pdf.
[25] Rosni L, Elisa MS, Rani, Monica VS, Ayunisa, Mindari, Suhendrowan PS. An Approach for Automatically Generating Star Schema from Natural Language. TELKOMNIKA Telecommunication Computing Electronics and Control. 2014; 12(2): 501-510.