TELKOM
NIKA
, Vol.12, No
.2, June 20
14
, pp. 501~5
1
0
ISSN: 1693-6
930,
accredited
A
by DIKTI, De
cree No: 58/DIK
T
I/Kep/2013
DOI
:
10.12928/TELKOMNIKA.v12i2.2011
501
Re
cei
v
ed
Jan
uary 30, 201
4
;
Revi
sed
Ma
y 14, 201
3; Accepted Ma
y
27, 2014
An Approach for Automatically Generating Star
Schema from Natural Language
Rosni Lumb
a
ntor
uan, Elisa Marga
r
eth Siba
rani, Monica Vera
w
a
ty
Sitorus, A
y
un
isa
Mindari, Suhendro
w
a
n
Putra Sinaga
Del Institute of T
e
chnolog
y
Jl. Sising
aman
gara
j
a, Sitol
u
a
m
a, Lagu
boti,
Kabu
pate
n
T
o
ba Samos
i
r, Sumatera Utar
a
*Corres
p
o
ndi
n
g
author, e-ma
i
l
: rosni@d
e
l.ac
.id,elis
a@d
e
l.a
c
.id
A
b
st
r
a
ct
T
he star sche
m
a is a for
m
of data w
a
rehous
e mod
e
lli
ng, w
h
ich acts primary storage fo
r
di
me
nsio
nal
da
ta that en
ab
les
efficient r
e
trie
val of
busi
ness
infor
m
ati
on for
decis
ion
maki
ng. Star sch
e
m
a
s
can b
e
g
e
n
e
rat
ed fro
m
busi
n
e
ss nee
ds th
at w
e
refer to as
a user
bus
in
es
s key or fro
m
a
relati
on
al sch
e
m
a
of the operati
o
nal syste
m
. T
here ar
e ma
ny
tools avail
a
b
l
e to auto
m
atic
ally ge
ner
ate
star schema from
relati
ona
l sche
m
a, suc
h
as BI
RST
and SAM
ST
AR; how
eve
r
, there is no
a
pplic
atio
n that
can a
u
to
matica
lly
gen
erate
it fro
m
a
user
bus
in
ess key
t
hat is repres
ente
d
i
n
the
for
m
of
h
u
m
a
n
lan
g
u
age.
In this
p
a
p
e
r,
w
e
offered a
n
a
p
p
roac
h for aut
omatica
lly g
e
n
e
ratin
g
st
ar sc
he
ma fro
m
us
er busi
ness k
e
y(s). It begin
s
by
process
i
ng th
e
user busi
nes
s key using a
syntactical
pa
rsing pr
ocess
to identify n
o
u
n
w
o
rds. T
hose
ide
n
tified w
o
rd
s w
ill be us
ed t
o
ge
nerate
di
mensi
on tab
l
e
ca
ndi
dates a
nd
a
fact table. The
evalu
a
tio
n
res
u
lt
indic
a
tes th
at the too
l
ca
n g
e
nerate star
sc
he
ma
bas
ed o
n
the i
n
p
u
tted
user b
u
siness
key(s) with som
e
li
mitatio
n
s i
n
that the star s
c
he
ma w
i
l
l
n
o
t be fo
r
m
e
d
if
the di
mensi
o
n
a
l tab
l
es
do
n
o
t have
a d
i
re
ct
relati
onsh
i
p.
Ke
y
w
ords
: star schema, user
busin
ess key, parsi
ng
proc
es
s, fact
table,
di
me
nsi
ona
l tabl
e
1. Introduc
tion
The data
warehou
se i
s
di
mensi
onal d
a
t
a stor
a
ge co
ntaining a
c
cu
mulated tra
n
saction
a
l
data from
a
wide
ran
ge o
f
sou
r
ce
s
within an
org
ani
sation
and i
s
use
d
to gui
de man
age
m
ent
deci
s
io
ns [1]. Its purpo
se
is to supp
ort
deci
s
io
n
-
ma
kers to take
deci
s
io
ns qui
ckly, sin
c
e th
e
operational
d
a
taba
se h
a
s
a vast am
oun
t of data an
d
i
t
is po
ssibly changi
ng ove
r
time. The initi
a
l
step to data warehouse
modelling is
explicitly stating the user
busi
n
ess
key
in the form o
f
natural
lang
u
age
so a
s
to i
dentify whi
c
h
data ne
ed
s t
o
be a
c
cum
u
l
a
ted to supp
ort the d
e
ci
si
on-
make
rs. Ho
wever, it is a
time-con
sumi
ng pro
c
e
s
s since ma
ny steps n
eed to be com
p
lete
d,
starting
from
unde
rstandi
ng the u
s
e
r
busi
n
e
ss
key
whi
c
h i
s
ex
pre
s
sed in
n
a
tural l
angu
a
ge,
contin
uing
wi
th identifying the dimen
s
io
nal table
b
a
sed on the
user bu
sin
e
ss
key and so forth.
From a p
r
act
i
cal point of view, data wareh
o
u
s
e mo
delling p
r
o
c
e
s
ses n
eed to
be improved
in
terms of the
efficien
cy of
deci
s
io
n-m
a
ki
ng time. T
h
e
r
e a
r
e
several ki
nd
s of d
a
ta ware
hou
se
model
s, and
this research
will focus o
n
dimen
s
io
na
l modellin
g,
spe
c
ifically o
n
sta
r
sch
e
m
a
s.
The star sche
ma
i
s
a
relati
onal schem
a that
ha
s one
or m
u
ltiple fa
ct table
s
which a
r
e
refe
ren
c
ed
to one or mo
re dimen
s
ion t
able
s
[2].
Relate
d re
se
arch re
gardi
ng automati
c
gen
er
atio
n
tools of star sche
ma
su
ch a
s
SAMSTAR [3], BIRST [4] is available a
nd wa
s ca
rri
e
d
out in [5] with the main objective bein
g
to
transfo
rm a
n
entity relation
ship m
odel in
to star
sch
e
m
a
. The tran
sf
ormatio
n
process acce
pts t
he
transactio
nal
databa
se an
d the ex
pect
ed output by looki
ng into the ca
rdin
ality of each tabl
e is
can
d
idate fa
ct tables. T
he
main
chall
e
n
ge i
s
if
the
d
a
taba
se
cont
ains many ta
bles,
as it
wo
uld
be a time-co
n
sumi
ng p
r
o
c
ess. Also, it woul
d be diffi
cult for the u
s
er to find a
n
d
sele
ct the fact
table if many can
d
idate fa
ct tables
are g
enerated by the appli
c
atio
n.
Therefore, th
e prop
osed solution is p
r
o
c
e
ssi
ng
the u
s
er b
u
si
ne
ss
key whi
c
h i
s
written in
human
lan
g
u
age
by impl
e
m
enting
seve
ral
step
s
defi
ned i
n
n
a
tura
l lang
uag
e p
r
oce
s
sing
(NL
P
)
.
NLP is a set of techniqu
e
s
used to examine
the g
r
ammar
stru
ct
ure an
d the meanin
g
of the
sente
n
ce
whi
c
h
wa
s given
by the
user fo
r the
pu
rp
ose
of attaini
ng i
n
formatio
n from the
sente
n
ce
[6]. The main
differen
c
e
wi
th the re
sea
r
ch in [5] is i
n
the input, which
wa
s u
s
e
r
bu
sine
ss ke
ys
,
and th
erefore
the
wh
ole
proce
s
s of
mod
e
lling th
e
st
ar sche
ma i
s
al
so
co
mpletel
y
different.
Th
e
aim of o
u
r
re
sea
r
ch
wa
s t
o
build
an
ap
plicatio
n that
is abl
e to
pro
c
e
s
s u
s
er bu
sine
ss key u
s
ing
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 2, June 20
14: 501 – 51
0
502
natural l
ang
u
age p
r
o
c
e
ssi
ng, co
ntinue
with de
si
g
n
in
g
and creatin
g
a
data wa
rehou
se usi
n
g
star
schema m
o
d
e
lling in a
cco
rdan
ce with th
e use
r
bu
sine
ss
key.
The sco
pe of
this re
se
arch is to tra
n
sl
ate a u
s
er
b
u
sin
e
ss
key i
n
the form
of natural
langu
age
written in
Engli
s
h
to p
r
ovide
th
e info
rmation
desi
r
ed
by th
e u
s
e
r
. To
do
so,
a
se
rie
s
of
natural l
ang
u
age p
r
o
c
e
ssi
ng tasks mu
st be d
one t
o
obtain
wo
rds
whi
c
h a
r
e
categ
o
ri
sed
as
noun
s
and to
be
able to
id
entify can
d
id
ates fo
r di
me
nsio
n table
s
.
Dimen
s
io
n ta
bles are u
s
e
d
as
the basi
s
for the establi
s
hment of a related fact
ta
ble, hen
ce b
o
th fact and
dimen
s
ion ta
bles
form a star schem
a. The appli
c
ation was built
usi
n
g C# langu
a
ge and the libra
ry which wa
s
use
d
for imp
l
ementing
na
tural la
ngu
ag
e proc
essin
g
is
Proxem Advance
d
Natural Lan
gu
age
Processin
g
Obje
ct-o
rient
ed Environ
m
ent (Ant
elop
e) [7]. The
evaluation
pro
c
e
ss of
the
appli
c
ation u
s
es Adventu
r
e
W
o
r
ks and
Northwi
nd data
bases.
This
pap
er i
s
org
ani
sed
as follows: Ch
a
p
te
r 1
de
scri
bes
cle
a
rly th
e problem
s o
c
curring
and the m
o
tivation for do
ing the rese
arch. The
r
ef
ore, the g
e
n
e
ral o
b
je
ctive and p
r
op
o
s
ed
solutio
n
sho
u
l
d
al
so be sta
t
ed.
We
then
contin
ue
with
ch
apte
r
2,
where
we expl
ain the
ste
p
s
of
our research
in orde
r to get the solutio
n
to t
he prob
lem stated in
chapte
r
1. We continu
e
with
cha
p
ter 3, in
whi
c
h a
de
scription of the l
i
terature
stu
d
y
is given. In
cha
p
ter 4, th
e analy
s
is
re
sult
will be explai
ned to prese
n
t our
soluti
on to so
lve t
he problem
and defin
e the re
sult of
the
evaluation p
r
oce
s
s. Finally
, in chapte
r
5
we poi
nt
to the co
ncl
u
si
o
n
of the re
se
arch ba
se
d o
n
an
evaluation
of the re
sult a
n
d
also raise recom
m
en
dat
i
ons fo
r futu
re
re
sea
r
c
h
wo
rk
relate
d to t
h
is
field.
2. Res
earc
h
Method
Our research
is mainly analytical. It trie
s to
find a sol
u
tion to automatically gen
erate st
a
r
schema from
requi
rem
ent
s expr
esse
d in huma
n
la
ngua
ge. Ou
r research
was cond
ucte
d by
applying ap
proache
s usi
n
g
t
he followin
g
step
s:
1.
Con
d
u
c
t literature
study
on natu
r
al l
a
ngua
ge
p
r
o
c
essing and data
wareho
use co
ncept,
spe
c
ifically on star
sche
ma modelli
n
g
. At
this stage, the lea
r
ning an
d un
derstandi
ng
of
those
con
c
e
p
t
s wa
s by rea
d
ing an
d un
d
e
rsta
ndin
g
pa
pers an
d textboo
ks avail
a
b
l
e on tho
s
e
fields.
2.
Explore an
d analyse seve
ral natu
r
al la
ngua
ge pr
ocessing tool
s. Some
tools a
r
e provided
i
n
the form of libra
rie
s
su
ch
as Natu
ral
Lang
uage T
o
olKit (NLTK
)
and Proxe
m
Advance
d
Natural Lan
g
uage Pro
c
e
ssing Obje
ct-o
ri
ented Enviro
nment (Antel
ope). Th
e re
sult was u
s
ed
as practi
cal consi
deration i
n
the selection of the
tools that will be used
in this research.
3.
Analyse the
pro
c
e
ss fo
r
star sche
ma d
e
sig
n
to impl
ement several finding
s in
the definin
g
solutio
n
to an
swer the rese
arch problem.
4.
Develo
p an
appli
c
ation
whi
c
h con
s
ists of basi
c
softwa
r
e en
g
i
neeri
ng ste
p
s
su
ch a
s
appli
c
ation a
nalysi
s
, desi
g
n, and co
de p
r
og
rammi
ng to build the fin
a
l appli
c
ation.
5.
Evaluate the appli
c
atio
n by usi
n
g
exis
ting d
a
taba
se
s in
SQL Serv
er which a
r
e
Adventure
W
o
r
ks and
Nort
hwin
d. We u
s
ed tho
s
e two databa
se
s as the so
urces to buildo
u
r
data ware
ho
use. T
he eva
l
uation
wa
s d
one by a
naly
s
ing th
e sta
r
schema
re
sul
t
base
d
on
the use
r
bu
si
ness key whi
c
h was
entered into t
he a
pplication. Ba
sed o
n
tho
s
e
results, we
evaluated the
result manu
ally and de
ci
ded whet
he
r
the re
sulting
star
schem
a wa
s goo
d o
r
not.
Analyse
the
e
v
aluation
re
sult in o
r
d
e
r to
su
mma
rise t
he
re
sea
r
ch q
uestio
n
a
nd
concl
ude
wheth
e
r the p
r
oble
m
wa
s a
n
swered o
r
n
o
t.
3. Analy
s
is
In this se
ctio
n, we explain
our analy
s
is re
sult
s and
approa
ch fin
d
ing
s
to auto
m
atically
gene
rating th
e star
schem
a.
3.1. User Bu
siness Ke
y
Practi
cally, the deci
s
io
n-m
a
ke
rs
pre
s
e
n
t their inform
a
t
ion need
s in
a use
r
bu
sin
e
ss
key
that is usuall
y
in the form of natural langua
ge
whi
c
h has cha
r
acteristic
s of un
limited stru
ct
ure
and sometim
e
s is
not ba
sed on the tru
e
gram
mar
of
that particul
a
r lang
uage. T
herefo
r
e, in o
r
de
r
to ena
ble
a
compute
r
to
u
nderstan
d n
a
t
ural la
ngu
ag
e, re
sea
r
ch i
s
d
one
in th
e
field
kno
w
n
as
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
An Approa
ch
for Autom
a
tically Ge
nerating St
ar Sche
m
a
from
.... (Ro
s
ni Lum
ba
ntorua
n)
503
natural la
ngu
age processi
ng [8]. The gene
ral a
r
chi
t
ecture of n
a
tural lan
gua
ge pro
c
e
s
sin
g
con
s
i
s
ts of (1
) a chun
ke
r, whi
c
h is th
e software fo
r pa
rsin
g on
e sen
t
ence i
n
to a set of words
a
n
d
is usually sto
r
ed in a
n
array or a list, (2) a
pa
rser,
whi
c
h give
s the tag ba
sed
on the type of
words
and
also rule
s of grammar
ba
se
d on the
ch
u
n
kin
g
re
sult
s
and finally fo
rms
a se
nten
ce
tree, and
(3)
a part-of-spe
ech tag
g
e
r
(POS Tagge
r), which
will tag every word that has b
e
e
n
p
a
r
s
e
d
in
a
se
n
t
e
n
c
e
b
a
s
e
d
o
n
a
c
o
mp
le
te
ta
g
s
e
t in o
r
d
e
r
to
o
b
t
a
i
n
th
e
re
la
tio
n
s
h
i
p
b
e
t
w
een
words. T
he t
ags th
at are
given are mo
re
comp
l
e
x a
nd detail
ed t
han the ta
g
by the pa
rser, so
that POS T
a
gger
can
di
stinguish va
ria
t
ions i
n
ve
rb
form
s, va
ria
t
ions
of
sing
ular an
d pl
ural
noun
s, and v
a
riation
s
in conjun
ction
s
[8].
Several stu
d
i
e
s
were co
nd
ucted o
n
buil
d
i
ng natu
r
al l
angu
age p
r
o
c
essing tool
s,
and on
e
of the re
sul
t
s wa
s Ant
e
lope
(Adva
n
ce
d Na
tu
ra
l Langu
age
Object
-O
rie
n
ted Pro
c
e
s
sing
Environme
n
t). It wa
s a
library
written i
n
C# fo
r
doing
natural
lan
g
u
age
pro
c
e
s
si
ng
whi
c
h
co
n
s
ist
s
of three
p
r
o
c
esse
s, which
are ta
ggin
g
, chun
king,
and
parsing,
and
also
a li
st of
words ne
ede
d to
pro
c
e
ss
sent
ences in Engl
ish.
We defin
e a f
o
rmat to si
m
p
lify the process fo
r the u
s
er
wh
o provides the
user
busi
n
e
s
s
key as
an in
put to the tool. Several g
u
idan
ce
s
are
available to
help u
s
ers
define thei
r use
r
busi
n
e
ss key
:
1.
The se
nten
ce
must be
an i
n
terrogative senten
ce.
Basically, a d
a
ta wa
reh
o
u
s
e is b
u
ilt to p
r
ovide the
inf
o
rmatio
n ne
e
ded by the
u
s
er,
su
ch
as
“What are the
average
sale
s?” a
nd “Wh
a
t
is the most orde
re
d pro
d
u
ct?”.
2.
The senten
ce must
conta
i
n more th
an
one
no
un a
nd should
ha
ve the sam
e
word with th
e
name of
an
existing tabl
e
in the data
b
a
se
so
urce.
There a
r
e two re
aso
n
s why we n
eed
a
use
r
bu
sine
ss key that mu
st contai
n two
noun
s:
a.
Star schem
a
modellin
g co
uld be
done
by finding an
y relation
ship
betwe
en
can
d
idate
s
of
dimen
s
ion ta
bles
whi
c
h a
r
e found from
any noun
s in
the use
r
bu
siness
key. Th
erefo
r
e if
the use
r
bu
siness key only has one
noun, then
it will need much effort
to find all
relation
shi
p
s
whi
c
h on
e ca
ndidate dim
e
nsio
n table h
a
s.
b.
If the user
bu
sine
ss key h
a
s mo
re th
an
one n
oun, th
en the p
r
o
c
e
ss
ca
n contin
ue to form
the sta
r
sch
e
ma. However, the
num
ber
of
nou
ns ca
n affect t
he pe
rforma
nce
of the
appli
c
ation. It must
se
arch
throu
gh all
p
o
ssib
le
relati
onship
s
am
o
ng the
dimen
s
ion ta
ble
can
d
idate
s
. F
o
r
example
if
the u
s
e
r
p
r
ovi
des
the
user
busi
n
e
s
s key: "Wh
o
a
r
e th
e
vendo
rs
who
s
e p
r
od
u
c
ts are the most wid
e
ly purcha
s
e
d
b
y
any custo
m
ers?". There are thre
e
noun
s
whi
c
h
were foun
d: "prod
u
ct", "cu
s
to
me
r", and
"vendor". Th
e
r
efore the n
e
x
t process
need
s to fin
d
a co
mbin
ation of po
ssible
relatio
n
s
hip
s
bet
we
en tho
s
e ca
ndidate
s
:
relation
shi
p
b
e
twee
n "pro
d
u
ct" and "cu
s
tome
r", betwee
n
"cu
s
to
mer”
and "ve
ndor", an
d
betwe
en "vendor" a
nd "p
rodu
ct". The
result w
ill d
e
finitely be that more tim
e
will be
requi
re
d to search the relationship bet
wee
n
t
hose noun
s. The
r
e
f
ore, in this rese
arch we
rest
rict the nu
mber of no
un
s to be proce
s
sed to only two.
There a
r
e two define
d
st
ructures u
s
ed
in gen
er
al fo
r interro
gative se
nten
ce in
English:
yes/no
q
u
e
s
tion
an
d word que
stion cate
gorie
s.
Th
e
s
t
ruct
u
r
e f
o
r u
s
er
bu
sin
e
s
s
key
mu
st
b
e
i
n
the form
of a
que
stion. Th
e
stru
ctures
of interrogative sente
n
ce
s which use
a qu
estion wo
rd a
r
e
as
follows
:
1.
Ques
tion Word
+
Verb
+
Ob
je
c
t
Example:
Wh
at is you
r
na
m
e
? The obje
c
t can b
e
a n
oun.
2.
Ques
tion Word
+
Auxiliary
+
Subje
c
t
+
Verb
+
Obje
ct
Example:
Wh
en d
o
e
s
the
t
eacher m
eet
the stu
dent
?
Both
subje
c
t and obje
c
t co
uld
b
e
n
oun
wor
d
s.
Therefore
the
user bu
sin
e
ss
key fo
rmat
whi
c
h
i
s
written in interroga
tive form mus
t
follow
the stru
ctures below:
1. [w
hat
/
who
/
whe
r
e
/
wh
en
] +
Verb
+ [
M
easure
]
+ of
+ [
Table'
s Na
m
e
] +
Verb
+ by +
[
Table
's Nam
e
] +
[
A
d
v
e
rb of time
/
Null
]
Example:
Wh
at is the high
est pri
c
e of th
e prod
uct pu
rcha
se
d by a
cu
stom
er
?
2. [
what
/
wh
o
/
whe
r
e
/
when
] +
Verb
+
[
T
able'
s Nam
e
] + an
d + [
wha
t
/
who
/
wh
er
e
/
whe
n
] +
Verb
+ [
Ta
ble
'
s Nam
e
]
Example:
Wh
o are ou
r cust
om
ers an
d what prod
uct
s
are the
y
bu
yi
ng
?
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 2, June 20
14: 501 – 51
0
504
3.1.1. User Business
Ke
y
Processin
g
The n
a
tural
l
angu
age
pro
c
e
ssi
ng lib
ra
ry that
we
used was Antel
ope, which p
e
rform
s
three
processes: tag
g
ing,
chun
ki
ng,
and
parsing.
Thi
s
resea
r
ch o
n
ly used t
he ta
g
g
ing
pro
c
e
s
s
in
obtainin
g
a li
st of nou
ns
b
e
ca
use the to
ol req
u
ires th
ose
nou
n words to b
u
ild a
data ware
hou
se.
In detail, the use
r
bu
sine
ss key
p
r
o
c
ess can be d
e
scribed a
s
follows:
1.
Splitting to deco
n
struct a
sente
n
ce to g
e
t a
list of wo
rds i
n
the se
ntence. An e
x
ample inp
u
t
is:
What is t
he m
o
st ord
e
red p
r
o
d
u
c
t by cu
stom
er in June 2
0
1
0
?
The process will use
String.Split, whi
c
h will
store the inform
ation as
a li
st of words in an
array of strings. The
result of thi
s
pro
c
e
s
s are:
{ "Wh
a
t", "is", "the"
, "most", "ordered", "pr
oduct", "by", "customer",
"in", "June", "2010"}.
2.
Nou
n
identification is the p
r
ocess of ide
n
tifying and g
i
ving a
tag to each wo
rd. Id
entification
is pe
rform
ed
usin
g BrillTa
ggerLexicon.txt. The list of noun words
whi
c
h were f
ound
sho
u
ld
be stored in a list of string
s. If no noun word ha
s be
en identified, then
the tool will ask the
use
r
to e
n
te
r a n
e
w
use
r
bu
sine
ss
key. The outp
u
t of this p
r
oce
s
s is: "p
rodu
ct", and
"customer".
3.
Measure ide
n
tification is t
he process o
f
searchi
ng a
word
that would be a m
e
asu
r
e in the
fact table. To
obtainin
g
a m
easure
wo
rd
a com
pari
s
o
n
w
ith the list of
statistical
wo
rd which is
listed in
Ta
bl
e 1. If the p
r
oce
s
s
could
not find
a
n
y
measure word in the
u
s
er
busi
n
e
ss
key,
we autom
atically assi
gned
"quantity"
to
be used a
s
the measure.
4.
Nou
n
fo
rm id
entification
is the p
r
o
c
e
s
s
of se
archin
g
a si
ngul
ar or
plural
form
of
the n
oun.
If
the nou
n whi
c
h
wa
s identi
f
ied is in the
sing
ular
, the
n
the pro
c
e
s
s
will find the p
l
ural fo
rm of
the noun a
n
d
vice versa. Identificatio
n wa
s pe
rform
ed usi
ng Pro
x
em.Lexicon.
dat file. The
result of this pro
c
e
ss i
s
ad
ded to the list
of nouns.
Table 1. List
of Statistical Wo
rd
No
Func
tio
n
Defini
tio
n
Ke
y
w
ord
1 AVG
average
Average, M
ean
2 SUM
total
Total, Su
mm
ary
3 MIN
minimum
amount
M
i
ni
m
u
m
,
m
i
nimal,
highest
4 MAX
max
i
mum
amoun
t
Ma
x
i
mu
m,
ma
x
i
ma
l
,
l
o
w
e
s
t
5
COU
N
T
number of occur
ence
Count, Nu
mber o
f
5.
Finding
dime
nsio
n table
s
based on th
e
noun
wo
rd
. The inp
u
t is
a list of nou
n
s
which wa
s
prod
uced by the previou
s
pro
c
e
ss. Ba
sed on ou
r example, the list of words a
r
e "prod
u
ct",
"c
us
tomer", "
p
roduc
t
s
"
, "cus
tomers
". The firs
t
s
t
ep
i
s
to elicita li
st of table’ na
mes from the
databa
se
so
urce. Thi
s
p
r
oce
s
s u
s
e
s
sysdatab
ase to retri
e
ve all
table na
me
s. The qu
ery
whi
c
h was u
s
ed to ret
r
ie
ve all table name
s
had t
he followin
g
syntax: "Se
l
ect * from
Information
_
S
chem
a.Tabl
es where Ta
ble_
Ca
talog
= 'Adventu
r
e
W
o
r
ks' an
d Table_
Type
=
'BASE TABLE' and
Table_Nam
e
NOT
IN ('sysdiagrams','dtpropert
i
es')". The
second step i
s
to compa
r
e th
e name of the table with o
u
r list
of noun
s. The output
of this proce
ss b
a
sed on
the above pro
c
e
s
sed u
s
er
busi
n
e
ss
key are: "pro
du
ct " and "cu
s
to
mer ".
3.2. Star Sch
e
ma Modelling
Data
warehouse modelling aim
s
to facilitat
e the
presentation, de
sign, availability,
acce
ssi
bility,
reliability, an
d unde
rsta
nd
ability of
info
rmation so that it can prese
n
t the facts
need
ed by d
e
ci
sion
-ma
k
e
r
s of the to
p-level ma
na
gement in a
compa
n
y. Data ware
ho
use
modellin
g ca
n be perfo
rm
ed usi
ng two
techni
que
s, which a
r
e entit
y relationship
and dimen
s
i
onal
modelling. Entity relationship mode
lling i
s
a technique used to m
o
del the data at
the conceptual
level of datab
ase d
e
si
gn [9
].
Star schema
modelling is the most com
m
only us
ed technique, whi
c
h consist
s
of a large
central tabl
e
(fa
c
t table
)
to sto
r
e th
e l
a
rge
s
t
colle
ct
ion of
useful
data
and
su
pportin
g
tabl
es
(dime
n
si
on ta
bles) [10].
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
An Approa
ch
for Autom
a
tically Ge
nerating St
ar Sche
m
a
from
.... (Ro
s
ni Lum
ba
ntorua
n)
505
3.2.1. Crea
ting Dimensio
n Tables
The m
a
in
obj
ective of thi
s
pro
c
e
s
s is to
find the ta
ble
and
col
u
mn
o
f
that table
which
ha
s
sa imilar n
a
m
e
to the candi
date dimen
s
i
on table
from
the processe
d use
r
bu
sine
ss
key and al
so
to deno
rmali
s
e the
dime
nsio
n table
whi
c
h
wa
s found. T
he d
e
tailed p
r
o
c
e
ss
of creati
n
g
dimen
s
ion ta
ble is a
s
follo
ws:
1.
Validate
the numbe
r
of dimensi
on
ta
ble
ca
ndid
a
te
s;
if it is mo
re t
han o
ne, the
n
proceed t
o
the next p
r
o
c
ess, but if
it i
s
small
e
r
or
eq
ual to o
ne, th
en
the
end
th
e process
to ask
the user
to enter anot
her u
s
e
r
bu
si
ness key.
2.
Tra
c
e the rel
a
tionship that
dimen
s
ion t
abl
e candi
dat
es have
had
in the sou
r
ce databa
se.
This process will check t
h
rough
the entire transact
ion table
in the source database to
examine th
e
relatio
n
ship
s amon
g the
dimen
s
ion
ta
ble
can
d
idat
es. If no
ca
ndidate
s
are
interconnected with each
other
, then
no new fact table
will
be formed
and th
e process will
stop. The
proce
s
s of find
ing rel
a
tion
ships
between
the dimen
s
i
on table
ca
n
d
idate
s
ha
s
three st
eps:
a.
Che
c
k the tra
n
sa
ction tabl
e that has a pr
ima
r
y key of the dimen
s
ion table
ca
ndidate
s
.
This
pro
c
e
s
s
can
be d
one
usin
g sy
sdata
bases
by ch
e
cki
ng fo
r a ta
ble that ha
s
a
foreig
n
key as
well a
s
a prim
ary key.
b.
If the transa
c
tion table ha
s no p
r
ima
r
y key col
u
mn
of the dimen
s
ion tabl
e ca
ndidate
s
then the
p
r
o
c
ess
will
che
c
k the
p
r
ima
r
y key
of the
d
i
mensi
on ta
bl
e of th
e tra
n
s
a
c
tion
table that was che
c
ked.
If
the dimensi
on table
from the tran
sa
ction table ha
s
relation
shi
p
t
o
the
dime
nsion tabl
e
can
d
idate
th
en
we co
uld say
that
they are
related.
This p
r
o
c
e
ss can b
e
do
n
e
usi
ng sy
sd
ataba
se
s by examine the
sou
r
ce tabl
e of the
foreign
key in
the dimensi
o
n table of the transactio
n
table.
c.
If the transa
c
tion table tha
t
is che
c
ked
onl
y has a
re
lationship to one of the di
mensi
on
table
candi
d
a
tes th
en ot
her
dimen
s
io
n tabl
e
ca
ndi
dates should
be
che
c
ked
with th
e
dimen
s
ion of
the transa
c
ti
on table. Thi
s
pro
c
e
s
s ca
n be done u
s
ing
sysd
ata
bases by
examining th
e sou
r
ce tabl
e of the forei
gn ke
y in the
dimensi
on ta
ble of the tra
n
sa
ction
table.
3.
The den
orm
a
lisation of dim
ensi
on
table
can
d
idate
s
h
a
s thre
e step
s:
a.
Finding all fo
reign
keys of
each table t
o
retr
ieve all
attributes
req
u
ired by the table to
sup
port the
deno
rmali
s
ati
on process
of t
hose
tabl
es.
The sea
r
ch pro
c
e
s
s can
b
e
compl
e
ted by
executing
Q
u
e
ry 1.
Query 1. Find
ing Foreign K
e
y
b.
Finding the
source table
o
f
the foreign
key
and
retri
e
ving all attri
butes that th
e sou
r
ce
table ha
s. This pro
c
e
s
s is
ca
rri
ed out by executin
gQu
e
ry 2.
Query 2. Find
ing the Source Table
c.
Adding all the
attributes ge
nerate
d
in th
e
previou
s
pro
c
e
s
ses into di
mensi
on tabl
es.
SELECT OBJECT_NAME (f.rkeyid) AS 'PKTable' FROM sysforeignkeys
f INNER JOIN syscolumns c1 ON f.fkeyid = c1.[Id] AND f.fkey =
c1.colid INNER JOIN syscolumns c2 ON f.rkeyid = c2.[Id] AND
f.rkey = c2.colid Where OBJECT_NAME(f.fkeyid) = 'Product' + "'
and c1.[name] = '<foreign key>';
SELECT ac1.[Name] AS 'ColumnName' FROM
sys.foreign_key_columns fk INNER JOIN sys.all_columns ac1 ON
fk.parent_object_id = ac1.[Object_id AND fk.parent_column_id
= ac1.column_id Where OBJECT_NAME (fk.parent_object_id) =
'Product '.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 2, June 20
14: 501 – 51
0
506
The o
u
tput re
sult of thi
s
p
r
oce
s
s in
clud
e
s
all
dimen
s
io
n table
s
tog
e
ther
with
all at
tributes
of those table
s
, as can be
see
n
in Figu
re 1.
Figure 1. Ne
w Dime
nsi
on
Table
s
3.2.2. Crea
ting Fact T
a
bl
es
A facttable was create
d
from each pri
m
ary
key fou
nd in every d
i
mensi
on tabl
e and the
statistical word co
ntained i
n
the user b
u
sin
e
ss
key will be
com
e
the mea
s
u
r
e i
n
the fact tab
l
e.
The pro
c
e
s
s contin
ue
s by asking the u
s
er for f
eedb
a
ck ab
out wh
a
t
the right measu
r
e
s
is, whi
c
h
will be used f
o
r num
eri
c
al
attributes. The detailed pr
ocess of creating fact
tables is as follows:
1.
Cho
o
se n
a
m
e
for fact
tabl
e by
com
b
ini
ng al
l
dim
e
n
s
ion tabl
es’
na
mes. F
o
r exa
m
ple, if
we
have dim
e
n
s
ion
table
s
"pro
du
ct" and "
c
u
s
to
m
e
r", then
o
u
r fa
ct tabl
e’s
name
i
s
"Custo
merP
rodu
ct".
2.
The sea
r
ch for a p
r
ima
r
y key in the di
mensi
on tabl
es i
s
ca
rri
ed
out be
cau
s
e
a fact tabl
e
contai
ns all
keys in the di
mensi
on tabl
es. To
find th
e prima
r
y key
,
Query 3 ca
n
be execute
d
.
Query 3. Find
ing Prima
r
y Keys
3.
Adding
a
pri
m
ary
key fro
m
a
dime
nsi
o
nal tabl
e
into
the fact
table.
All p
r
ima
r
y keys
whi
c
h
are
found i
n
the
previou
s
step
are
store
d
i
n
the li
st
of stri
ngs
that hold
s
the
colum
n
name
s
and
c
o
lumn types.
4.
Adding
mea
s
ure
s
to th
e f
a
ct tabl
es which
were
fo
und i
n
the
u
s
er bu
sin
e
ss key
be
side
“qua
ntity”. The measure would be
com
e
a value in
the fact table so that it has a con
c
lu
sio
n
or
sum
m
ary.
The
inp
u
t o
f
this
pro
c
e
s
s i
s
th
e li
st
of mea
s
u
r
e
s
whi
c
h
was f
ound
in th
e
previou
s
ste
p
, whi
c
h i
s
t
he u
s
e
r
bu
si
ness
key p
r
oce
s
s. It co
ntinue
s by o
b
taining th
e
measures u
n
i
t
by following these two ste
p
s:
a.
Loo
king for
a
column i
n
the dimen
s
ion t
abl
e that ha
s a numeri
c
value by ch
ecking the
data type of
each colum
n
in the dim
e
n
s
ion tabl
e
such as integ
e
r,
doubl
e, long,
etc. and
storing it in the list of colum
n
s.
b.
Askin
g
for fe
edba
ck from
the users b
y
di
splaying
a list of col
u
mns to the
u
s
er
and
asking
th
em to
sele
ct whi
c
h colu
mn can
be
calcul
ated ba
se
d o
n
the me
asu
r
es.
The
measure whi
c
h is
created
is semi
-a
dditi
ve or additive
.
5.
Definin
g
fact table column
s by following these step
s:
a.
Finding
addit
i
onal me
asure ca
ndidate
s
by obt
ainin
g
all mea
s
u
r
es
contain
e
d
in the
transactio
n
ta
ble that
have
bee
n o
b
tain
ed in
the
pro
c
e
s
s of
sea
r
ching fo
r
relati
onship
s
betwe
en dim
ensi
on table
can
d
idate
s
.
b.
Getting fee
d
back from u
s
ers by di
spla
ying
all me
a
s
ures which
are
co
ntaine
d in the
transactio
n
t
able
and
which
co
nne
ct d
i
mensi
on ta
bl
es
and
a
ski
n
g
them
to
se
lect a
n
addition
al me
asu
r
e u
s
ed in
the fact table.
SELECT col_name (ic.OBJECT_ID, ic.column_id) AS ColumnName FROM
sys.indexes as i INNER JOIN sys.index_columns AS ic ON
i.OBJECT_ID = ic.OBJECT_ID AND i.index_id = ic.index_id WHERE
i.is_primary_key = 1 and OBJECT_NAME (ic.OBJECT_ID) =
<product>;'<foreign key>';
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
An Approa
ch
for Autom
a
tically Ge
nerating St
ar Sche
m
a
from
.... (Ro
s
ni Lum
ba
ntorua
n)
507
c.
Cre
a
ting fact
table and g
i
ving it a name by joinin
g all dimen
s
i
on table na
mes, for
example "Pro
duct
C
u
s
tome
r". Column
s o
f
the fa
ct table includ
e the prima
r
y key of each
dimen
s
ion ta
ble, identified
measu
r
e
s
an
d use
r
-sel
ect
ed mea
s
u
r
e
s
.
3.2.3. Crea
ting Time Dimension Table
s
The time
dim
ensi
on ta
ble i
s
created
by
che
c
kingth
e
data type
of t
he tran
sa
ctio
n table
colum
n
whi
c
h co
nne
cts the dim
e
n
s
io
n table
s
. If the data
type
is d
a
tetime
then the ti
me
dimen
s
ion
ta
ble
can
be
created. O
n
ly if the p
r
o
c
e
s
s finds any col
u
mns with
th
e
datetime
d
a
ta
type, can we
establi
s
h a t
i
me dimen
s
io
n table.
After we succe
e
d
in creating t
he sche
ma f
o
r
dimen
s
ion
ta
bles, fa
ct tabl
es, a
nd time
dimen
s
ion
ta
bles, th
en th
e process
ca
n continu
e
a
nda
data wa
reh
o
u
s
e sch
e
ma can be create
d
and im
plem
e
n
ted into the databa
se
server.
3.3. Building Data
War
e
h
ouse Schem
a
s
Input in this pro
c
e
s
s is
a list of fact
tabl
es, di
m
ensi
onal ta
bl
es, an
d a li
st of time
dimen
s
ion
ta
bles
(if esta
b
lishe
d).
T
h
e
use
r
i
s
aske
d
to provid
e feedba
ck
fo
r data wa
reh
o
u
se
name. In deta
il, the proce
s
s ca
n be de
scrib
ed a
s
follows:
1.
Validating th
e data
wa
re
hou
se n
a
me
to pr
eve
n
t
data wareho
use
dupli
c
ati
on. If the da
ta
wareho
use n
a
me al
rea
d
y exists, then t
he next st
e
p
is to
che
c
k for redu
nda
nt tables. If it
exists, the ap
plicatio
n will
alter the tabl
e by
deleting
redun
dant ta
bles. If the entered d
a
ta
warehouse name does not
exis
t, the process
will create a
new data warehouse
schem
a.
2.
Validating the
fact table to
prevent fact tabl
e dupli
c
ati
on. If it exists then the next proce
s
s i
s
che
c
king
all
colum
n
s of t
he fa
ct table;
if any
colum
n
diffe
rs fro
m
the
colu
m
n
s
of the fa
ct
table, then
th
e p
r
o
c
e
s
s will
alter the
existing t
able.
If the p
r
o
c
e
s
s d
oes not
find t
he
sam
e
fa
ct
table then the
application w
ill add the ne
w fact table.
3.
Validating th
e dimen
s
io
n table
s
to prev
ent dime
n
s
io
n table du
plication. If it exists the
n
the
next pro
c
e
s
s
is
che
c
king
al
l colu
mn
s on
the dime
nsi
o
n table, if a
n
y colu
mn
differs fro
m
the
colum
n
s of t
he n
e
w
dime
nsio
n table
then th
e p
r
o
c
ess
will alte
r the existin
g
table. If the
pro
c
e
s
s d
o
e
s
n
o
t find
th
e same
dim
ensi
on ta
ble,
then
the
ap
plicatio
n
will
add th
e
new
dimen
s
ion ta
ble.
4.
Definin
g
a q
uery to b
u
ild
the ne
w d
a
ta wa
re
hou
se
by defining
all table n
a
m
e
s, colum
n
name
s
, data t
y
pe of each column, an
d relation
ship
of
each table in
string type
a
nd sto
r
ing it
as
a varia
b
le.
The va
riabl
e
is a
dde
d u
s
i
ng
keywo
r
d
s
with
spe
c
ific
SQL syntax t
o
form
que
ry
syntax to cre
a
te databa
se
tables o
n
the serve
r
.
5.
Adding l
og
ch
ange
s to
the l
og d
a
taba
se
to re
cord
any
cha
nge
s to
th
e data
wareh
ouse. T
h
e
log datab
ase desi
gn can b
e
see
n
in Fig
u
re 2.
6.
A data wa
re
hou
se sch
e
m
a co
uld be
cre
a
ted by
tracking
all column
s and
relation
shi
p
s
betwe
en dim
ensi
on an
d fact tables. A d
a
ta
wareho
use schema
ca
n be se
en in
Figure 3.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 2, June 20
14: 501 – 51
0
508
Figure 2Data Wa
reho
use Schem
a
Figure 3.Log
Datab
a
se
4.
Resul
t
and Discussion
A use
r
bu
sin
e
ss key is a
sente
n
ce th
at
identifies t
he key information that
deci
s
io
n-
make
rs nee
d to suppo
rt their busin
ess. T
he ke
y informatio
n contai
ns
basi
c
dataof
an
orga
nisation,
whi
c
h i
s
used to
con
d
u
c
t detailed
an
alyse
s
an
d d
e
rive b
u
si
ne
ss value. Si
nce a
use
r
bu
sin
e
ss key i
s
often pre
s
e
n
ted
usin
g natu
r
al
langua
ge, we delivered
an app
ro
ach
to
retrieve the
informatio
n by proce
s
sing
the
natural
languag
e to elicit noun
word
s thro
ug
h
syntacti
cal p
a
r
sin
g
. Identifi
ed no
un
wo
rd
s will
t
hen
be
mapp
ed
with
the table
s
in
the ope
ratio
n
a
l
databa
se to
get the dime
nsio
nal table
candi
date
s
. The process
contin
ued to
identify the fact
tables
with t
he summa
risation of the
measurem
ent
data a
s
soci
ated with
di
mensi
onal ta
bles.
Therefore, we have to first
l
y find the rela
tionship bet
wee
n
dimen
s
i
onal table
ca
ndidate
s
.
Based
on
the
app
roa
c
h,
we devel
ope
d
a tool
whi
c
h
i
s
a
b
le to
aut
omatically
ge
nerate
a
data wa
re
hou
se by a
c
cepti
ng user in
put, which is the
n
use
d
to cre
a
te a sta
r
sch
e
ma. The
r
e a
r
e
two main fun
c
tion
s of the
tool: (1) pro
c
e
ssi
ng u
s
er busin
ess ke
y to obtain noun word
s f
o
r
dimen
s
ion
ta
ble a
nd th
e
measures th
at will
be
use
d
(M
ea
sures
are
nee
ded
for fa
ct table
and
key of the dimensi
on tabl
e); (2) b
u
ild the dat
a ware
hou
se ba
sed
on the dime
nsio
n table a
n
d
measure re
sulting from the previo
u
s
proce
s
s. The fact table is formed usi
ng the prima
r
y key of
each dimen
s
i
on table an
d measure. The result of
this pro
c
e
s
s is t
he data ware
hou
se sch
e
m
a
.
F
K
_T
A
B
LE
LO
G
_
H
A
S_D
A
T
A
W
A
R
E
F
K
_AT
T
R
I
BU
T
_
H
A
S
2
_T
AB
LELO
G
D
a
t
a
W
a
r
ehou
s
e
C
hangeL
og
I
dLog
Na
m
e
M
odi
f
i
ed
D
a
t
e
i
n
t
eger
c
har
(
50)
da
t
e
<p
k
>
At
t
r
i
but
e
Log
Id
A
ttr
i
b
u
t
e
I
d
T
abl
e
Na
m
e
Ty
p
e
A
ddi
t
i
on
al
C
o
l
u
m
n
in
t
e
g
e
r
in
t
e
g
e
r
c
har
(
50)
c
har
(
50)
c
h
a
r(6
)
<p
k
>
<fk
>
T
abl
eLo
g
Id
T
a
b
l
e
I
dLog
Na
m
e
Ty
p
e
i
n
t
eger
i
n
t
eger
c
h
a
r
(
50)
c
h
a
r
(
50)
<p
k
>
<f
k
>
Evaluation Warning : The document was created with Spire.PDF for Python.
TELKOM
NIKA
ISSN:
1693-6
930
An Approa
ch
for Autom
a
tically Ge
nerating St
ar Sche
m
a
from
.... (Ro
s
ni Lum
ba
ntorua
n)
509
In orde
r to e
v
aluate the a
nalysi
s
re
sult
, we
execute
d
the tool by
providin
g the use
r
busi
n
e
ss keys as an in
p
u
t. Due to the wide
ran
ge of use
r
busi
n
e
ss key senten
ce
s, we
categ
o
ri
sedth
e
m by doing the followi
ng:
A
.
Us
er
bu
sin
e
s
s
key
ha
s a
st
at
ist
i
c
a
l
wo
r
d
an
d t
w
o
no
uns corre
s
po
nding
to th
e
name
of the
table in a rela
tional datab
a
s
e.
1.
What is the hi
ghe
st price of t
he produ
ct p
u
rcha
sed by
a cu
stome
r
?
2.
What is the a
v
erage p
u
rch
a
se of
the p
r
o
duct
s
su
pplie
d by vendors?
B.
User b
u
sin
e
ss key do
es
n
o
t contain
stat
istical
word
and two no
uns
co
rre
sp
o
nding to the
table’s n
a
me
in the relation
al databa
se.
1.
Who i
s
our
cu
stome
r
and
what
prod
uct
s
are they buyi
ng?
2.
What p
r
odu
ct
is the most
widely purcha
s
ed from any vendo
r?
C.
User
bu
sine
ss
key h
a
s m
o
re t
han t
w
o
nou
ns corre
s
po
ndin
g
to
the table
na
me in
the
relation
al dat
aba
se.
1.
Who a
r
e th
e vendo
rs
who
s
e p
r
o
d
u
cts
are th
e
most wi
del
y purcha
s
ed
by any
cu
st
ome
r
s
?
2.
What is the a
ddre
s
s of the cu
st
ome
r
wh
o boug
ht the prod
uct in 20
03?
D.
User bu
sin
e
ss key ha
s a si
ngle no
un.
1.
What is the a
m
ount of pro
duct sale
s in 2006
?
2.
What a
r
e the
averag
e sal
e
s of pro
duct
s
in Jun
e
200
6
?
E.
User bu
sin
e
ss key requi
re
s se
mantic u
nderstan
ding.
1.
What p
r
odu
ct
s are p
r
ofitabl
e?
2.
Who i
s
the m
o
st freq
uent cu
stome
r
who
make
s a pu
rchase?
F.
User bu
sin
e
ss key which a
r
e not
structu
r
ed a
s
an inte
rro
gative se
ntence.
1.
Give me the averag
e pu
rchase pro
d
u
c
t from vendo
rs
in 2004.
2.
I want to kno
w
our
cu
stom
ers a
nd what prod
uct
s
are they buying.
G.
User bu
sin
e
ss key which h
a
s no
un in pl
ural form.
Who a
r
e ou
r
cu
stome
r
s a
n
d
what produ
ct are they bu
ying?
From the
evaluation, in g
eneral, the o
u
tput
re
sult
of each user
busi
ness
ke
y being
pro
c
e
s
sed by
the tool
cou
l
d be
con
s
id
ered
succe
s
sful; howeve
r
,
we fou
nd th
ree
points th
at
need to be
co
nsid
ere
d
for the pro
p
o
s
ed
approa
ch:
1.
The nu
mbe
r
of noun
sin th
e user
busi
n
e
ss
key
will affect the al
gorit
hm complexit
y
; the more
noun
s th
ere
are, the
mo
re
com
p
lex the
algorith
m
. In
orde
r to
creat
e a fa
ct tabl
e, we
have
to
identify a tran
sa
ctional
tabl
e that
con
n
e
c
ts all
th
e di
m
ensi
onal
tabl
es. F
o
r exam
ple, with
two
dimensional t
ables, the algorith
m
will
run the method to find t
he
relationship once only. A
s
the numbe
r
of dimensi
o
n
table can
d
idates (r
e
p
re
sente
d
as n
)
incre
a
sed, the numb
e
r
relation
shi
p
s
will form a pol
ynomial se
rie
s
with Equati
on 1.
మ
ି
ଶ
(1)
2.
The app
roa
c
h was n
o
t able to define a star sch
e
ma
i
f
there is no d
i
rect relation
ship betwe
en
the dimen
s
io
nal table can
d
idate
s
.
The app
roa
c
h was n
o
t able to define a star sch
e
ma for user bu
sin
e
ss keys that
require
sema
ntic
und
ersta
ndin
g
. F
o
r exa
m
ple,
the test
scen
ario F1 ha
s
t
he word
"p
ro
fitable",
whi
c
h
mean
s all pro
duct
s
that ha
ve the highe
st sales.
Th
ere
f
ore, the tran
sa
ct
ion table
whi
c
h shoul
d be
identified wa
s SalesOrderDetail. It sho
u
ld displ
a
y the star sch
e
m
a
that has the Produ
ct table
and
Cu
stom
er ta
ble
as
dimen
s
ion
ta
bles an
d
Cu
stome
r
Pro
d
u
c
t table
s
tha
t
have th
e
same
colum
n
a
s
the Sale
sO
rd
erDetail table
.
However, d
ue to limitations
of our n
a
tural la
ngu
a
g
e
pro
c
e
ssi
ng
a
ppro
a
ch in
id
entifying sem
antic m
eani
n
g
,
those kind
s
of user bu
si
ness key
s
ca
nnot
be pro
c
e
s
sed
prope
rly.
5.
Conclu
sion
The
created
tool is a
b
le t
o
cre
a
te a
star
schem
aby
processin
g
a u
s
e
r
b
u
si
n
e
ss
key
whi
c
h is
writt
en in English
and contain
s
exactly tw
o n
oun
s. The u
s
er can u
s
e th
e tool rep
eate
d
ly
and e
n
sure t
hat it will di
splay the
ne
we
st
sta
r
schema b
a
sed
on the n
e
e
d
s of the
user.
Evaluation Warning : The document was created with Spire.PDF for Python.
ISSN: 16
93-6
930
TELKOM
NIKA
Vol. 12, No. 2, June 20
14: 501 – 51
0
510
Therefore,
th
e expe
cted
f
u
ture
develo
p
ment i
s
to
p
r
ocess u
s
er
busi
n
e
s
s keys
with m
o
re
t
han
two nou
ns in
Indone
sia
n
. The process of
cre
a
ting a st
ar sch
e
ma re
quire
s
con
s
id
erabl
e time, but
by improving
sea
r
ch algo
rit
h
ms for di
me
nsio
n and fa
ct tables, the tool is
expe
ct
ed to create star
schema in a
sho
r
ter time.
This research acts a
s
a pione
er for future
large tool develop
ment whi
c
h is able t
o
integrate
all
pro
c
e
s
ses
starting
from
u
s
er b
u
si
ne
ss key
processing
whi
c
h i
s
able to
ide
n
tify
sema
ntic me
aning
by im
p
l
ement
ing
all
processe
s i
n
natu
r
al
lan
guag
e p
r
o
c
e
ssi
ng. It
can
be
done
by providing the l
e
xicon in a
specif
ic lan
gua
ge,
and thu
s
e
nab
ling it to ident
ify whethe
r th
e
use
r
mistype
d
the bu
sine
ss key a
nd not
ify the us
er.
The outp
u
t of the use
r
bu
si
ness key sh
o
u
ld
be sh
own on
a dashbo
ard t
o
help the u
s
er und
erst
an
d the informat
ion gen
erate
d
by the tool.
Referen
ces
[1] Rainardi
V.
Bui
l
din
g
a D
a
ta W
a
reh
ouse: W
i
th
Examples i
n
S
Q
L Server
. First Edition. CA USA. Apress
Berkel
y. 20
11.
[2] Stephe
n
F
. Dashb
oard
Desi
gn for Re
al-T
i
m
e S
i
tuati
on
Aw
areness
. P
e
rceptu
a
l Ed
g
e
Cons
ulta
nc
y
.
200
7.
[3]
Song IY, Kh
a
r
e R, Da
i B.
SAMST
AR: A Semi-Auto
m
a
t
ed Lex
ical M
e
thod for G
e
n
e
ratin
g
Star
Sche
mas
fro
m
an E
n
tity-Re
l
ations
hip
Di
agr
am
. In:
10th
A
C
M Int’l W
o
rks
hop
on
Data
W
a
reho
usin
g
and OLAP (DO
L
AP 200
7). AC
M, Ne
w
York.
200
7: 9-16.
[4] http://
w
w
w
.
birs
t.com/product/te
chnology
/data-
w
a
rehous
e-automat
ion. A
u
tomatic star schema
gen
eratio
n, BIRST
.
[5]
Andre
a
s P, Dani
el N, Rin
a
S.
T
r
ansformator
Entit
y
R
e
lati
onsh
i
p Mo
del i
n
to Star Sche
ma.
Dipl
o
ma
T
hesis
. T
oba Samosir. Pol
i
tek
n
ik Informatika
Del. 20
12.
[6] Ahmad
S.
T
u
to
rial o
n
Natur
a
l
Lan
gu
age Pr
oc
essin
g
. Univ
er
sit
y
of N
o
rther
n Io
w
a
. Artificial Intelligence
F
a
ll. 200
7.
[7]
https://
w
w
w
.
proxem.com. Prox
em Antelope f
o
r Microsoft .net. Version 0.8.
7. March 200
9.
[8]
Go
yal S, Bh
at S, Gulati S,
Anantar
am C.
Onto
log
y
-
d
riv
en Ap
pro
a
ch t
o
Obtain
Sem
antica
l
l
y
Va
li
d
Chu
n
ks for Natural L
a
n
gua
ge Ena
b
l
ed B
u
sin
e
ss Appl
ic
ations.
Res
ear
ch in Co
mputi
ng Scie
nce
.
Speci
a
l Issue: Natura
l Lan
gu
age Proc
essin
g
and Its Appl
i
c
ations
. 20
10;
46(1): 10
5-1
1
6
.
[9]
Ralph K, M
a
rgy
R, Warren T
,
Joy
M, B
ob
B.
T
he Data
W
a
r
eho
use Lifecyc
le
T
o
olkit
. Sec
ond Edition.
Wi
l
e
y
.
20
08
.
[10]
Jia
w
e
i
H, Mich
elin
e K.
Data Minin
g
: Conc
e
p
ts and T
e
ch
ni
ques
. Seco
nd
Editio
n. Morga
n
Kaufman
n
.
200
5.
Evaluation Warning : The document was created with Spire.PDF for Python.