International Journal of Electrical and Computer Engineering (IJECE)
Vol. 6, No. 3, June 2016, pp. 1048~1058
ISSN: 2088-8708, DOI: 10.11591/ijece.v6i3.9790
Journal homepage: http://iaesjournal.com/online/index.php/IJECE

A Hybrid Model Schema Matching Using Constraint-Based and Instance-Based

Edhy Sutanta1, Retantyo Wardoyo2, Khabib Mustofa2, Edi Winarko2
1 Doctoral Program of Computer Science at Department of Computer Sciences & Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia
2 Department of Computer Science & Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia

Article Info
Article history:
Received Dec 25, 2015
Revised Apr 5, 2016
Accepted Apr 20, 2016

ABSTRACT
Schema matching is an important process in Enterprise Information Integration (EII). It works at the back-end level to solve problems caused by schematic heterogeneity. This paper summarizes preliminary results from the model development stage of research on models and a prototype for hybrid schema matching that combines two methods, namely constraint-based and instance-based. The discussion covers a general description of the proposed model and its development, starting from requirement analysis, data type conversion, matching mechanism, database support, constraint and instance extraction, matching and computing the similarity, preliminary result, user verification, verified result, dataset for testing, and performance measurement. In experiments on 36 datasets of heterogeneous RDBMS, the highest P value was 100.00% and the lowest was 71.43%; the highest R value was 100.00% and the lowest was 75.00%; and the highest F-Measure value was 100.00% and the lowest was 81.48%. Unsuccessful matching still occurs in the model in three situations: an id attribute with an auto-increment data type; codes that are defined in the same way but have different meanings; and common instances with the same definition but different meanings.

Keyword:
Constraint-based
Heterogeneous database
Hybrid model
Instance-based
Schema matching

Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Edhy Sutanta,
Department of Informatics Engineering, IST AKPRIND, Yogyakarta, 55222, Indonesia.
Doctoral Program of Computer Science, Department of Computer Sciences & Electronics, Universitas Gadjah Mada, Yogyakarta, 55281, Indonesia.
Email: edhy_sst@akprind.ac.id, edhy_sst@yahoo.com

1. INTRODUCTION
Schema matching is an inter-schema matching process that finds similarity relationships between pairs of attributes [1], or arranges the mapping and matching of schemas in two application systems [2]. Schema matching is a solution in Enterprise Information Integration (EII) [3] that is carried out at the back-end level to solve problems of schematic heterogeneity [4], that is, differences in naming (type, format, and precision) in the schema definitions [5]. Technically, schema matching is an integration process on heterogeneous databases and produces a generalization or specialization in the database [6]. Schema matching plays an important role in applications that require interoperability between systems with heterogeneous data sources [7]. Schema matching is a main problem in developing the relationship between elements in two database schemas [2],[8]-[11]. Schema matching was originally done manually on a specific application domain [12], so a new model is needed that is more general and appropriate for different applications and schema languages [13]. The main problems of schema matching are unclear naming in the schema, difficulties with synonyms in naming, and differences in schema languages, so a matching method may not give a 100% correct result [2]. Schema matching cannot be done fully automatically, because the computed mapping is usually corrected by the user to obtain the correct verified results [14]-[15]. The development of models and software for schema matching is still open, in particular for finding proper ways to combine existing methods [11],[16].
Combinational matchers [17]-[18] can be implemented as hybrid or as composite matchers [16]-[17]. A hybrid model is also called intra-matcher parallelism [19]; it uses several matching criteria concurrently [13],[20]-[21] to give better results and performance than an individual matcher [17]. The simple concept of a hybrid matcher is to combine two different methods that are processed simultaneously, while a composite matcher combines two methods that are processed in sequence. Schema matching using a hybrid matcher was applied in CLIO [22]-[26], CUPID [18], and SYM [27], while schema matching using a composite matcher is found in SEMINT [21],[28]-[29], LSD [30], Cupid [18], COMA [14], COMA++ [15], COMA 3.0 [31]-[32], IMAP [33], PROTOPLASM [34]-[37], FALCON-AO [2],[38], and ASMOV [39]. According to [40]-[41], the development of new schema matching models and prototypes is still open, especially for hybrid models. The next section describes the proposed new hybrid schema matching model, which was developed based on the constraint-based and instance-based methods.

2. THE PROPOSED MODEL
The proposed hybrid schema matching model combines two methods (constraint-based and instance-based) that are applied simultaneously. Constraint-based and instance-based are method categories according to [11],[16], and they involve the DTM (data type matcher), CM (constraint matcher), and IDM (instance data matcher) methods (categories according to [42]). In general, the proposed model follows the general model of data processing and consists of 4 sections, namely input, process, output, and verification and evaluation, as shown in Figure 1. Each section is described as follows:
1. Input: receives DBSource (the reference database) and DBTarget (the database to be matched) and the type of DBMS, then extracts constraints, converts data types, extracts instances, and checks the similarity between attributes in DBSource and DBTarget.
2. Process: conducts the matching process, which matches each attribute in DBSource with each attribute in DBTarget, calculates the similarity value (SIM_MN) for each possible pair of attributes, and determines which pairs of attributes are declared matched.
3. Output: shows the similarity mapping of pairs of attributes, that is, the pairs of attributes that have SIM_MN MAX or SIM_MN=1. This output is called the preliminary result.
4. Verification and Evaluation. Verification is the process of determining whether the preliminary results generated by the model are correct or still need to be manually corrected by the user. The process is therefore a supervised approach. A preliminary result that has been verified by the user produces the verified result, in the form of a mapping of pairs of attributes that are valid. The evaluation process calculates the values of the model performance parameters, which are P (Precision), R (Recall), and F (F-measure). The values of P, R, and F are calculated by comparing the preliminary result with the verified result.

[Figure 1 diagram. Blocks: DBSource; DBTarget; Schema Matching (Constraint-based, Instance-based); Hybrid Schema Matching (Constraint & Instance); Preliminary Result of Schema Matching; User Verification; Verified Result of Schema Matching; Performance Evaluation; Repository]

Figure 1. The proposed model of hybrid schema matching

3. MODEL IN DETAIL
3.1. Requirement Analysis
Requirement analysis is conducted on five aspects: functional requirements, input documents, output documents, database, and model evaluation. The functional requirements of the proposed model are as follows:
1. The input of the model is DBSource and DBTarget.
2. The model can extract the information schema to find the table names, attribute names, and constraints (type, width, domain, nullable, unique) in DBSource and DBTarget.
3. The model can convert the data types of the attributes used by the DBMS of DBSource and DBTarget into the new data types used by the model.
4. The model can extract instances in DBSource and DBTarget.
5. The model can match and compute the similarity value between each pair of attributes in DBSource and DBTarget.
6. The model can determine the matching pairs of attributes by comparing the SIM_MN value of each pair of attributes and finding the partner with the largest similarity value (SIM_MN MAX) or a pair of attributes whose similarity value is equal to 1 (SIM_MN=1).
7. The model can receive user verification of the preliminary similarity mapping of pairs of attributes.
8. The model can calculate and show the values of the parameters that indicate the effectiveness of the model.
The input documents required by the model include:
1. User name, date of analysis, type of DBMS, application domain, and size of DBSource and DBTarget.
2. The information schema document, which contains the database name, table names, attribute names, and constraints in DBSource and DBTarget, together with the instances in DBSource and DBTarget.
3. User verification of the preliminary result.
The output documents generated by the model are as follows:
1. Information about the user, the type of DBMS, database name, database size, table names, attribute names, constraints, and instances in DBSource and DBTarget.
2. The results of the data type conversion into the types used in the model.
3. The SIM_MN MAX value for each pair of attributes, and the preliminary and verified similarity mappings.
4. The test results for the model parameters, that is, P, R, and F.

3.2. Data Type Conversion
Data type conversion is required to change the data types of the DBMS used by DBSource and DBTarget into the new data types used by the model. This process is meant to facilitate the matching process. For example, in MySQL, the data types char(n) and varchar(n) are converted into string, while the data types int(n) and float(n,d) are converted into numeric.
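
The conversion described above can be illustrated with a minimal sketch. Only the four mappings named in the text (char and varchar to string, int and float to numeric) come from the paper; the function name `convert_type`, the lookup table `MODEL_TYPE`, and the "unknown" fallback for unlisted types are assumptions made for illustration.

```python
import re

# Mappings stated in the text; any other native type is not covered there.
MODEL_TYPE = {
    "char": "string",
    "varchar": "string",
    "int": "numeric",
    "float": "numeric",
}


def convert_type(native_type: str) -> str:
    """Map a native declaration such as 'varchar(30)' to a model data type.

    The base type name is taken from the start of the declaration, so the
    width or precision in parentheses is ignored here.
    """
    base = re.match(r"\s*([A-Za-z]+)", native_type)
    if base is None:
        raise ValueError(f"unrecognised type declaration: {native_type!r}")
    # Assumption: types without a stated mapping are reported as 'unknown'.
    return MODEL_TYPE.get(base.group(1).lower(), "unknown")
```

For instance, `convert_type("float(8,2)")` yields `"numeric"`, while a type the text does not mention, such as `blob`, falls through to `"unknown"`.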

3.3. Matching Mechanism and Computing the Similarity of Attribute Pair
The matching mechanism and the similarity value calculation are carried out on every possible pair formed by an attribute in DBSource (AS_i) and an attribute in DBTarget (AT_j), and every pair yields a SIM_MN value. Each type of constraint (type, width, domain, nullable, unique) and the instances are given a value according to a predetermined weight when they match, whereas a criterion that does not match is assigned a null value. A constraint on DBTarget is regarded as the same as a constraint on DBSource if both have the same constraint definition. Meanwhile, instances are regarded as the same if the instance in DBTarget appears in DBSource. The matching mechanism and the computation of SIM_MN are shown in Figure 2.

[Figure 2 diagram. Each attribute AS_1..AS_4 of a DBSource table is paired with each attribute AT_1..AT_3 of a DBTarget table, giving SIM(AS_i, AT_j) for every pair; for each AS_i the largest value, SIM(AS_i, AT_N) Max, is selected.]

Figure 2. Matching mechanism and computing the similarity value (SIM_MN)
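
The pairing scheme in Figure 2 can be sketched as follows: every source attribute is compared with every target attribute, and the target with the largest similarity is kept. This is an illustrative sketch, not the authors' implementation; the function name `best_matches` and the `sim` callback, which stands for whatever routine computes SIM_MN, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def best_matches(
    source_attrs: List[str],
    target_attrs: List[str],
    sim: Callable[[str, str], float],
) -> Dict[str, Tuple[str, float]]:
    """Return, for each source attribute, the target attribute with the
    largest similarity value and that value (the per-row maximum)."""
    if not target_attrs:
        raise ValueError("target_attrs must not be empty")
    result = {}
    for a_s in source_attrs:
        # Score the source attribute against every target attribute.
        scored = [(sim(a_s, a_t), a_t) for a_t in target_attrs]
        # Ties keep the first target encountered (max returns the first maximum).
        best_score, best_target = max(scored, key=lambda pair: pair[0])
        result[a_s] = (best_target, best_score)
    return result
```

A toy call with a lookup table in place of a real similarity function, for example `best_matches(["id", "name"], ["kode", "nama"], lambda s, t: scores[(s, t)])`, returns one best partner per source attribute.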

3.4. Support Database
All input data, process data, and results of the proposed hybrid model are stored in a relational database named dbhybridschematch. The dbhybridschematch database consists of 15 tables in third normal form; the use of each table is listed in Table 1. The support database is intended to minimize the computational load, especially during the last stage of the matching process.

3.5. Information Schema
The information schema of a database contains the metadata of all stored database objects; for example, the information schema used for the proposed model has 28 tables. Some of the information that can be explored from the information schema and is useful in the schema matching process includes table, table_constraints, referential_constraints, and statistics. Thus, the proposed model does not use XML as an intermediary language, as was done in [43],[44].

3.6. Constraint and Instance Extraction
Constraint extraction is the process of obtaining the data type, width, domain value, nullable property, and unique property of each attribute in DBSource and DBTarget. Constraints can be explored from table_constraints in the information schema or directly from each table in the database. In many cases, database designers do not define the constraints explicitly, so they are not found in the information schema and are ignored in the matching process.
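
As a rough sketch of constraint extraction on MySQL, the query below reads per-column metadata from the `information_schema.COLUMNS` view. The paper mentions table_constraints and direct table inspection; using the COLUMNS view here is an assumption, as are the helper name `row_to_constraints` and the dictionary keys. The domain constraint is not covered by this sketch, and the `%s` placeholder follows the parameter style of common MySQL drivers for Python.

```python
# Query text only; running it requires a MySQL connection, which is omitted.
CONSTRAINT_QUERY = """
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE,
       CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE, COLUMN_KEY
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = %s
ORDER BY TABLE_NAME, ORDINAL_POSITION
"""


def row_to_constraints(row):
    """Turn one result row of CONSTRAINT_QUERY into a constraint record."""
    table, column, data_type, width, nullable, key = row
    return {
        "table": table,
        "attribute": column,
        "type": data_type,
        "width": width,                   # None for non-string columns
        "nullable": nullable == "YES",    # IS_NULLABLE is 'YES' or 'NO'
        "unique": key in ("PRI", "UNI"),  # primary or unique key column
    }
```

Each resulting record carries four of the five constraints named in the text (type, width, nullable, unique) for one attribute.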
Instance extraction is the process of obtaining the instances of each attribute in DBSource and DBTarget. The instances can be explored from each table in DBSource and DBTarget. Normally, the number of instances in a table is equal to the number of records multiplied by the number of attributes. However, there is no guarantee that this value is correct, so the correct number of instances has to be determined.

Table 1. Database support for the proposed model
Table Name | Usage
mst_user | Store the user application data
mst_dbms_type | Store the data types of the DBMS
mst_data_type_conversion | Store the conversion of the original data types of the DBMS to the data types used in the model
mst_application_domain | Store the types of application field of DBSource and DBTarget
mst_alt_weight_match | Store the alternative matching criteria weights (type, width, domain, nullable, unique, & instance) specified by the user
mst_alt_string_size_match | Store the alternative string size difference for matching specified by the user
source_database | Store data about a database on DBSource
source_table | Store data about the tables in DBSource
source_attribute | Store constraint data (type, width, domain, nullable, unique) and instances for each attribute in DBSource
target_database | Store data about a database on DBTarget
target_table | Store data about the tables in DBTarget
target_attribute | Store constraint data (type, width, domain, nullable, unique) & instances for each attribute in DBTarget
matching_preliminary | Store the preliminary result, not yet verified by the user, for all alternative weighting criteria & string size differences
matching_final | Store the verified result, verified by the user, for all alternative weighting criteria & string size differences
matching_report | Store the summary data of the preliminary and verified results, & the evaluation of the schema matching model

3.7. Computing the Value of Similarity Pair of Attribute (SIM_MN)
The value of SIM_MN for each pair of attributes in DBSource and DBTarget is determined based on the similarity of the constraints (data type, width, domain value, nullable, unique) and the instances.
One problem in the matching process is that database designers are free, without limits, to specify and define the size of data of the string data type. To overcome this, the proposed model provides a feature that allows the user to choose an alternative for the difference in data size (width) of string data types before the matching process is done. The options provided are: ALT_1 (default), the string sizes of the attributes in DBSource and DBTarget must be exactly the same; ALT_2, the string sizes of the attributes in DBSource and DBTarget have a width difference of 5; ALT_3, the string sizes have a width difference of 15; and ALT_4, the string sizes have a width difference of 25.
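
The four string-size alternatives can be sketched as a tolerance check. The values 0, 5, 15, and 25 come from the text; reading each value as a maximum allowed difference (rather than an exact difference) is an assumption, and the names `WIDTH_TOLERANCE` and `width_matches` are hypothetical.

```python
# Width difference allowed by each alternative, as listed in the text.
WIDTH_TOLERANCE = {"ALT_1": 0, "ALT_2": 5, "ALT_3": 15, "ALT_4": 25}


def width_matches(source_width: int, target_width: int,
                  alternative: str = "ALT_1") -> bool:
    """Return True when two string widths agree under the chosen alternative.

    Assumption: the listed value is an upper bound on the difference.
    """
    return abs(source_width - target_width) <= WIDTH_TOLERANCE[alternative]
```

Under this reading, widths 30 and 34 do not match with the default ALT_1 but do match with ALT_2.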
The SIM_MN calculation also faces the problem of assigning a weight to each matching criterion. Assuming that the similarity of a pair of attributes can be determined by the constraints only, by the instances only, or by both simultaneously, the proposed model provides a feature that allows users to select alternative weights for the matching criteria before the calculation is done. By default (INDEX_1), the weight is 0.1 for each of the constraints (type, width, domain, nullable, unique) and 0.5 for the instance. These values are given on the assumption that matching may be based on the similarity of the constraints only or of the instances only. In the second alternative (INDEX_2), the weight of each matching criterion is 0.17. This value is given on the assumption that each criterion has the same role in determining the similarity of attributes.
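
The weighting scheme can be illustrated with a short sketch that adds up the weights of the criteria that match. The weights are those stated in the text; the names `WEIGHTS` and `similarity`, the rounding to two decimals, and the clipping to 1.0 are assumptions. The clipping is needed only because six weights of 0.17 add up to 1.02, slightly above the stated upper bound of 1.

```python
CRITERIA = ("type", "width", "domain", "nullable", "unique", "instance")

# Weights per matching criterion for the two alternatives in the text.
WEIGHTS = {
    "INDEX_1": {"type": 0.1, "width": 0.1, "domain": 0.1,
                "nullable": 0.1, "unique": 0.1, "instance": 0.5},
    "INDEX_2": {criterion: 0.17 for criterion in CRITERIA},
}


def similarity(matched: dict, index: str = "INDEX_1") -> float:
    """Sum the weights of the criteria flagged True in `matched`.

    Criteria that are missing or False contribute nothing.
    """
    weights = WEIGHTS[index]
    total = sum(weights[c] for c in CRITERIA if matched.get(c, False))
    # Assumption: round away float noise and clip to the [0, 1] range.
    return min(round(total, 2), 1.0)
```

With INDEX_1, a pair that agrees on all five constraints but shares no instance scores 0.5, the same as a pair that agrees only on the instance.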
The different combinations of string size alternative and matching criteria weights give 8 different results for SIM_MN and SIM_MN MAX, as shown in Table 2. These results are useful as material for evaluating the performance of the model and for determining the best combination of alternatives.
The SIM_MN value lies in the range between 0 and 1. SIM_MN=1 means that the attribute in DBSource matches the attribute in DBTarget, SIM_MN=0 means that the attribute in DBSource does not match the attribute in DBTarget, and 0<SIM_MN<1 means that the attribute in DBSource matches the attribute in DBTarget with a similarity level of SIM_MN.

Table 2. Combination of string size, index of matching criteria, and similarity value
Alternative of the string size (width) | INDEX_1 (Default) | INDEX_2
ALT_1 (Default) | SIM_MN MAX_11 | SIM_MN MAX_12
ALT_2 | SIM_MN MAX_21 | SIM_MN MAX_22
ALT_3 | SIM_MN MAX_31 | SIM_MN MAX_32
ALT_4 | SIM_MN MAX_41 | SIM_MN MAX_42

3.8. Preliminary Result, User Verification, and Verified Result
The model produces a list of pairs of attributes and their similarity values, called the preliminary result. A pair of attributes is declared a match if it has SIM_MN=1 or SIM_MN MAX among the candidate pairs. User verification is done by assessing whether the similarity mapping of each pair of attributes is as expected. Each assessment gives one of 4 possible values, namely TP (true positive), FP (false positive), FN (false negative), or TN (true negative), as shown in Table 3 [45]-[46]. The verified result of the model is the schema matching mapping that has been verified by the user, together with the values of the parameters P, R, and F, which show the model's performance.

Table 3. The contingency table for examining result of hybrid model schema matching
 | Relevant | Non Relevant
Retrieved | True Positive | False Positive
Not Retrieved | False Negative | True Negative
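
The contingency table can be sketched with set operations, treating the pairs declared matched by the model as "retrieved" and the pairs confirmed by the user as "relevant". This mapping of the paper's terms onto sets, and the function name `contingency`, are assumptions made for illustration.

```python
def contingency(candidates, retrieved, relevant):
    """Count TP, FP, FN, and TN over a set of candidate attribute pairs.

    candidates: every (source, target) pair that was compared
    retrieved:  pairs the model declared matched (preliminary result)
    relevant:   pairs the user accepts as true matches
    """
    candidates, retrieved, relevant = set(candidates), set(retrieved), set(relevant)
    return {
        "TP": len(retrieved & relevant),               # declared and correct
        "FP": len(retrieved - relevant),               # declared but wrong
        "FN": len(relevant - retrieved),               # correct but missed
        "TN": len(candidates - retrieved - relevant),  # rightly not declared
    }
```

For a 2 x 2 comparison with pairs (a,x), (a,y), (b,x), (b,y), where the model declares (a,x) and (b,y) and the user accepts (a,x) and (b,x), each of the four counts is 1.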

3.9. The Dataset
The hybrid schema matching model is tested using test data in the form of relational databases that are heterogeneous, that is, they differ in application domain as well as in the DBMS used. The proposed model is tested on 30 relational databases that fulfill the heterogeneity criteria, that is, different DBMS platforms (MS Access and MySQL) and different application domains (academic applications in higher education and high school, e-government, and commerce). The largest data capacity is 172,441.6 KB while the smallest is 12.2 KB; the largest number of tables is 163 while the smallest is 2; the largest number of attributes is 1,642 while the fewest is 16; and the largest number of instances is 3,596,857 while the fewest is 231, as shown in Table 4. All databases for testing the model were obtained from a survey at 11 institutions, including universities, government institutions, senior high schools, a software development company, and commercial enterprises.

Table 4. The datasets for testing of proposed hybrid model schema matching
No | Database Name | DBMS Name | Application Domain | Capacity (KB) | Table | Attribute | Instance
1 | db01_sipt_admision | MS Access | HE Academic | 75.0 | 25 | 193 | 199,064
2 | db02_sipt_academic | MS Access | HE Academic | 42.6 | 69 | 451 | 135,319
3 | db03_sipt_payroll | MS Access | HE Academic | 12.2 | 16 | 97 | 8,827
4 | db04_sipt_employ | MS Access | HE Academic | 17.8 | 16 | 97 | 6,607
5 | db05_sipt_tax_pph | MS Access | HE Academic | 1,331.2 | 10 | 57 | 627
6 | db06_sipt_research | MS Access | HE Academic | 326.6 | 9 | 63 | 3,150
7 | db07_sipt_labwork_registration | MS Access | HE Academic | 171,056.7 | 26 | 162 | 443,448
8 | db08_sipt_library | MS Access | HE Academic | 9,932.8 | 53 | 435 | 188,415
9 | db09_sipt_menwa_registration | MS Access | HE Academic | 144,00 | 8 | 42 | 231
10 | db10_nuptk | MySQL | E-government | 240.0 | 53 | 607 | 1,700,195
11 | db11_poor_dss | MySQL | E-government | 214.0 | 14 | 64 | 429,602
12 | db12_office_letter | MySQL | E-government | 224.1 | 8 | 71 | 710
13 | db13_lisence | MySQL | E-government | 578.5 | 2 | 31 | 6,200
14 | db14_lisence_sms | MySQL | E-government | 172,441.6 | 140 | 687 | 3,596,857
15 | db15_dpt_bgcipto | MySQL | E-government | 79,769.6 | 4 | 19 | 2,721
16 | db16_quickcount_bgcipto | MySQL | E-government | 138,854.4 | 15 | 88 | 7,313
17 | db17_dpt_kp | MS Access | E-government | 76,697.6 | 7 | 46 | 334,270
18 | db18_hs_sinisa | MySQL | HS Academic | 77,246.0 | 6 | 71 | 2,010
19 | db19_hs_sipp | MySQL | HS Academic | 656.4 | 18 | 151 | 737,909
20 | db20_hs_psb | MySQL | HS Academic | 540.6 | 10 | 63 | 564
21 | db21_hs_schoolgrade | MySQL | HS Academic | 49,049.6 | 22 | 190 | 12,391
22 | db22_hs_schoolgrade_online | MySQL | HS Academic | 256.2 | 4 | 27 | 567
23 | db23_hs_raport | MySQL | HS Academic | 1,024.0 | 44 | 311 | 745,655
24 | db24_hs_eraport | MySQL | HS Academic | 4,558.1 | 32 | 233 | 381,900
25 | db25_hs_websma2pwt | MySQL | HS Academic | 2,047.5 | 100 | 1,642 | 980,475
26 | db26_elearning | MySQL | HS Academic | 78.8 | 163 | 1,423 | 163,645
27 | db27_elearning_homeschooling | MySQL | HS Academic | 1,433.6 | 105 | 748 | 20,205
28 | db28_motorcycle_loan | MySQL | Commerce | 432.0 | 10 | 57 | 3,879
29 | db29_cust_telkomvision | MySQL | Commerce | 75.0 | 5 | 31 | 2,916
30 | db30_rsmitra_pharmacy | MySQL | Commerce | 42.6 | 14 | 66 | 7,453

3.10. Performance Measurement
The model is evaluated to measure its performance. The evaluation uses the parameters P, R, and F obtained from running the prototype on the test data. The values of these parameters are calculated from the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN), following the performance evaluation used in information retrieval (IR) research [45]-[46]. The values of precision (P), recall (R), and F-measure (F) are then calculated using equation (1) for P, (2) for R, and (3) for F [7],[15],[21],[39],[42],[47]-[53], that is:
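$$ P = \frac{TP}{TP + FP} \tag{1} $$
$$ R = \frac{TP}{TP + FN} \tag{2} $$
$$ F = \frac{2 \cdot P \cdot R}{P + R} \tag{3} $$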
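
A small sketch of the three measures, written with the standard information retrieval definitions P = TP/(TP+FP), R = TP/(TP+FN), and F = 2PR/(P+R). The function name `precision_recall_f` and the convention of returning 0.0 when a denominator is zero are assumptions; the counts in the usage note are hypothetical, not taken from the experiments.

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Return (P, R, F) as fractions in [0, 1] from contingency counts."""
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

With hypothetical counts TP = 5, FP = 2, and FN = 0, the call `precision_recall_f(5, 2, 0)` gives P = 5/7 (about 71.43%), R = 1 (100%), and F = 10/12 (about 83.33%).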

4. RESULT AND DISCUSSION
The hybrid schema matching model has been tested 36 times on pairs of DBSource and DBTarget. The tests used the default matching mechanism, that is, the combination of ALT_1 and INDEX_1, in three variations of DBSource and DBTarget pairs. The first test was conducted 30 times on pairs of similar DBSource and DBTarget, the second test was conducted 3 times on pairs of DBSource and DBTarget in the same application domain, and the last test was conducted 3 times on pairs of DBSource and DBTarget in different application domains.
The first experimental step is to read two databases through the import process; one acts as DBSource and the other as DBTarget. If the type of DBMS of DBSource and DBTarget differs from the DBMS used in the model, the data type conversion described in Section 3.2 is necessary. In the next step, the constraints are obtained by extracting the information schema of DBSource and DBTarget, while the instances are extracted from each of these databases. The process continues with matching and calculating SIM_MN across all possible pairs of attributes in DBSource and DBTarget, and then determining the matched pairs, which are those with the maximum SIM_MN value. This step ends by generating an output called the preliminary result, which contains the pairs of attributes declared matched and their SIM_MN values.
The preliminary result is verified by a user, which produces the verified result. The verification is performed on each pair of attributes in the preliminary result. Each verification yields a value of TP, FP, FN, or TN as stipulated in Table 3. Based on these values, the values of P, R, and F are then computed, which indicate the performance of the proposed model.
Using equations (1), (2), and (3), the experiments gave the following results: the highest P value was 100.00% while the lowest was 71.43% (Figure 3(a)); the highest R value was 100.00% while the lowest was 75.00% (Figure 3(b)); and the highest F-Measure value was 100.00% while the lowest was 81.48% (Figure 3(c)).
The highest P value, 100%, was obtained in four matching experiments on similar DBSource and DBTarget, namely db12_office_letter, db15_dpt_bgcipto, db22_hs_schoolgrade_online, and db29_cust_telkomvision, while the lowest P value, 71.43%, was obtained in the experiment on the same DBSource and DBTarget at db13_lisence. The highest R value, 100%, was obtained in four matching experiments on similar DBSource and DBTarget, namely db13_lisence, db15_dpt_bgcipto, db17_dpt_kp, and db30_rsmitra_pharmacy, while the lowest R value, 75.00%, was obtained in the matching experiment on similar DBSource and DBTarget at db29_telkomvision. The highest F value, 100%, was obtained for the matching pairs on similar DBSource and DBTarget at db15_dpt_bgcipto, while the lowest F value, 81.48%, was obtained in an experiment where DBSource and DBTarget come from different application domains, i.e. the matching between db02_sipt_academic as DBSource and db08_sipt_library as DBTarget. The experimental results show that errors in the output of the hybrid schema matching model occur in three cases: the use of an id attribute with an auto-increment data type; the use of codes that are defined in the same way (type, width, domain, nullable, unique) but have different meanings; and attributes that share the same instances and the same definitions but actually have different meanings.
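The first error case can be illustrated with a small sketch. It assumes, purely for illustration, that instance similarity is measured as the Jaccard overlap of the two value sets; the measure used by the proposed model is not restated in this section, and the table names and row counts below are hypothetical.

```python
def jaccard(a, b) -> float:
    """Jaccard overlap of two collections of instance values."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical auto-increment keys of two unrelated tables.
student_id = range(1, 101)  # 100 rows in a student table
book_id = range(1, 81)      # 80 rows in a book table

# Both columns are integer auto-increment keys, so their constraints agree
# and their value sets overlap heavily, although the meanings differ.
overlap = jaccard(student_id, book_id)
print(f"instance overlap = {overlap:.2f}")
```

Under this assumed measure the overlap is high even though a student identifier and a book identifier are semantically unrelated, which is the kind of pair that user verification would mark as FP.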
(a) Precision   (b) Recall   (c) F-Measure
Figure 3. The experimental results of hybrid model schema matching
Compared with the hybrid schema matching models developed previously by [26], which obtained P = 70.00%, R = 75.00%, and F = NA, and by [27], which obtained P = 90.00%, R = 80.00%, and F = 84.00%, the proposed model gives a fairly good result. To increase the F-Measure value, the proposed model will be enhanced further by varying the weighting on constraints, where in general the data type constraint is more dominant as a determinant for common pairs of
attributes than the width constraint; the domain-value constraint is more decisive than the width constraint; and the nullable and unique constraints have similar roles. This research will also be developed further by implementing the model as a software prototype that applies all size variations of character length (ALT_1, ALT_2, ALT_3, and ALT_4) and the varied weights used for each matching criterion (INDEX_2 and INDEX_2). The model will then be re-evaluated to determine whether the variation in data length and in the weights of each matching criterion influences the result, in order to find the variation that gives the most precise schema matching results. These improvements are expected to increase the F value, so that the proposed model generates better similarity values for mapped pairs of attributes.
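The idea of weighting the constraints can be sketched as follows. This is only an illustration under stated assumptions: the `Attribute` class, the `constraint_similarity` function, and the weight values are hypothetical and do not reproduce the INDEX weights of the proposed model; the weights are merely chosen so that data type outweighs width, domain outweighs width, and nullable and unique count equally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    data_type: str
    width: int
    domain: str
    nullable: bool
    unique: bool


# Hypothetical weights, not the values used by the authors.
WEIGHTS = {
    "data_type": 0.4,
    "domain": 0.3,
    "width": 0.1,
    "nullable": 0.1,
    "unique": 0.1,
}


def constraint_similarity(a: Attribute, b: Attribute, weights=WEIGHTS) -> float:
    """Weighted share of constraints on which the two attributes agree."""
    total = sum(weights.values())
    agreed = sum(
        w for field, w in weights.items() if getattr(a, field) == getattr(b, field)
    )
    return agreed / total


src = Attribute("student_name", "VARCHAR", 50, "text", False, False)
tgt = Attribute("member_name", "VARCHAR", 30, "text", False, False)
sim = constraint_similarity(src, tgt)  # only the width differs
print(f"constraint similarity = {sim:.2f}")
```

Changing the entries of `WEIGHTS` and recomputing P, R, and F over the verified results is one way such weight variations could be compared.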
5. CONCLUSION
A hybrid schema matching model that combines the constraint-based and instance-based methods simultaneously has been developed. The model has four main parts, namely input, process, output, and verification and evaluation. The experiments show that the proposed model gives fairly good results compared with the hybrid schema matching models developed by previous researchers. Errors in the results of the proposed model occur in three cases: the use of an id attribute with an auto-increment data type; the use of codes that are defined in the same way (type, width, domain, nullable, unique) but have different meanings; and common instances on attributes with the same definitions but different meanings. Our future work is to vary the weighting on constraints and instances, so that the model generates better similarity values for mapped attribute pairs.
REFERENCES
[1] He B. and Chang K. C. C., “Statistical schema matching across web query interfaces,” The ACM SIGMOD Int’l Conf. Management of Data. San Diego, CA, USA, pp. 217-228, 2003. DOI: 10.1145/872757.872784.
[2] Engmann D. and Massmann S., “Instance matching with COMA++. Datenbank Systeme in Business, Technologie und Web,” Proceedings of the Model Management & Metadata. Aachen, Germany, pp. 28-37, 2007. URL: http://ceur-ws.org/Vol-814/om2011_Tpaper5.pdf.
[3] Villanyi B., et al., “A novel framework for the composition of schema matchers,” The 14th WSEAS Int’l Conf. on Computers, Latest Trends on Computers. Corfu Island, Greece, pp. 379-384, 2010. URL: http://dl.acm.org/citation.cfm?id=1981573.1981641.
[4] Velicanu M., et al., “Ways to increase the efficiency of information systems,” The 10th WSEAS Int’l Conf. on Artificial Intelligence, Knowledge Engineering and Databases. Cambridge, UK, pp. 211-216, 2011. URL: http://www.wseas.us/e-library/conferences/2011/Cambridge/AIKED/AIKED-36.pdf.
[5] Kim W. and Seo J., “Classifying schematic and data heterogeneity in multidatabase systems,” IEEE, vol/issue: 24(12), pp. 12-18, 1991. DOI: 10.1109/2.116884.
[6] Kavitha C., et al., “Ontology based semantic integration of heterogeneous databases,” European Journal of Scientific Research, vol/issue: 64(1), pp. 115-122, 2011.
[7] Algergawy A., et al., “Combining effectiveness and efficiency for schema matching evaluation,” The 1st Int’l Workshop on Model-Based Software and Data Integration (MBSDI 2008). Communications In Computer and Information Science (CCIS). Berlin, Germany, pp. 19-30, 2008. DOI: 10.1007/978-3-540-78999-4_4.
[8] Bernstein P., et al., “The Microsoft repository,” The 23rd Int’l Conf. Very Large Databases (VLDB). Athens, Greece, pp. 3-12, 1997. DOI: 10.1.1.50.8527.
[9] Bernstein P. A., “Applying model management to classical meta data problems,” The 1st Int’l Conf. Innovative Data Systems Research (CIDR). Asilomar, CA, USA, pp. 209-220, 2003. DOI: 10.1.1.12.2729.
[10] Stabenau A., et al., “An overview of ensemble,” Genome Research Journal, vol/issue: 14(5), pp. 929-933, 2004. DOI: 10.1101/gr.1857204.
[11] Bernstein P. A., et al., “Generic schema matching, ten years later,” The VLDB Endowment. Seattle, Washington, USA, vol/issue: 4(11), pp. 695-701, 2011. URL: http://www.vldb.org/pvldb/vol4/p695-bernstein_madhavan_rahm.pdf.
[12] Do H. H. and Rahm E., “COMA: A system for flexible combination of schema matching approach,” The 28th Conf. on Very Large Data Bases (VLDB). Hong Kong, China, pp. 610-621, 2002. URL: http://dbs.uni-leipzig.de/file/COMA.pdf.
[13] Do H. H., “Schema matching and mapping-based data integration,” Ph.D. Thesis. Interdisciplinary Center for Bioinformatics & Dept. of Computer Science, Univ. of Leipzig. Leipzig, Germany, 2005. URL: lips.informatik.uni-leipzig.de/files/2006-4.pdf.
[14] Massmann S., et al., “Evolution of the COMA match system,” The 6th Int’l Workshop on Ontology Matching (OM). Bonn, Germany, pp. 49-60, 2011. URL: http://ceur-ws.org/Vol-814/om2011_Tpaper5.pdf.
[15] Milo T. and Zohar S., “Using schema matching to simplify heterogeneous data translation,” The 24th Int’l Conf. on Very Large Data Bases (VLDB). NY, USA, pp. 122-133, 1998. URL: http://www.vldb.org/conf/1998/p122.pdf.
[16] Özsu M. T. and Valduriez P. P., “Principles of distributed database systems,” 3rd edition. Pearson Education Inc. NY, USA, 2011. DOI: 10.1007/978-1-4419-8834-8.
[17] Rahm E. and Bernstein P. A., “A survey of approaches to automatic schema matching,” Very Large Databases (VLDB) Journal, vol/issue: 10(4), pp. 334-350, 2001. DOI: 10.1007/s007780100057.
[18] Madhavan J., et al., “Generic schema matching with Cupid,” The 27th Int’l Conf. on Very Large Data Bases (VLDB). Roma, Italy, pp. 49-58, 2001. URL: http://dl.acm.org/citation.cfm?id=645927.672191.
[19] Gross A., et al., “On matching large life science ontologies in parallel,” The 7th Int’l Conf. Data Integration in the Life Sciences (DILS). Gothenburg, Sweden, pp. 35-49, 2010. DOI: 10.1007/978-3-642-15120-0_4.
[20] Bergamaschi S., et al., “Semantic integration of semistructured and structured data sources,” ACM SIGMOD Record, vol/issue: 28(1), pp. 54-59, 1999. DOI: 10.1145/309844.309897.
[21] Li W. S. and Clifton C., “Semint: A tool for identifying attribute correspondences in heterogeneous databases using neural network,” Data and Knowledge Engineering Journal, vol/issue: 33(1), pp. 49-84, 2000. DOI: 10.1016/S0169-023X(99)00044-0.
[22] Hernández M. A., et al., “CLIO: A semi-automatic tool for schema mapping (software demonstration),” The ACM SIGMOD Int’l Conf. Management of Data. Santa Barbara, CA, USA, pp. 607, 2001. DOI: 10.1145/376284.375767.
[23] Naumann F., et al., “Attribute classification using feature analysis,” IBM research report. IBM Research Division. San Jose, CA, USA, 2002. URL: www.hpi.uni-potsdam.de/fileadmin/hpi/FG_Naumann/publications/ICDE02Poster.pdf.
[24] Popa L., et al., “Mapping XML & relational schemas with CLIO (software demonstration),” The Int’l Conf. on Data Engineering (ICDE). San Jose, CA, USA, pp. 498-499, 2002. URL: http://disi.unitn.it/~velgias/docs/PopaHVMNH02.pdf.
[25] Haas L. M., et al., “CLIO grows up: from research prototype to industrial tool,” The ACM SIGMOD Int’l Conf. Management of Data. Baltimore, Maryland, USA, pp. 805-810, 2005. DOI: 10.1145/1066157.1066252.
[26] Kang J. and Naughton J., “On schema matching with opaque column names & data values,” The ACM SIGMOD Int’l Conf. Management of Data. San Diego, CA, USA, pp. 205-216, 2003. DOI: 10.1145/872757.872783.
[27] Chien B. C. and He S. Y., “A hybrid approach for automatic schema matching,” The 9th Int’l Conf. on Machine Learning and Cybernetics. Qingdao, China, pp. 2881-2886, 2010. DOI: 10.1109/ICMLC.2010.5580776.
[28] Li W. S. and Clifton C., “Semantic integration in heterogeneous databases using neural networks,” The 20th Int’l Conf. on Very Large Data Bases (VLDB). Santiago de Chile, Chile, pp. 1-12, 1994. URL: https://www.cerias.purdue.edu/assets/pdf/bibtex_archive/2001-86-report.pdf.
[29] Li W. S., et al., “Database integration using neural networks: implementation and experiences,” Knowledge and Information Systems Journal, vol/issue: 2(1), pp. 73-96, 2000. DOI: 10.1007/s101150050004.
[30] Doan A. H., et al., “Reconciling schemas of disparate data sources: a machine-learning approach,” The ACM SIGMOD Int’l Conf. Management of Data. Santa Barbara, CA, USA, pp. 509-520, 2001. DOI: 10.1145/376284.375731.
[31] Madhavan J., et al., “Corpus-based schema matching,” The IJCAI-03 Workshop on Information Integration on the Web (IIWeb). Acapulco, Mexico, pp. 59-63, 2003. DOI: 10.1109/ICDE.2005.39.
[32] Rahm E., “Towards large-scale schema and ontology matching,” in Bellahsene Z, Bonifati A, Rahm E. Schema matching and mapping, data-centric systems & applications. Springer. NY, USA, pp. 3-28, 2011. DOI: 10.1007/978-3-642-16518-4_1.
[33] Dhamankar R., et al., “IMAP: discovering complex semantic matches between database schemas,” The ACM SIGMOD Int’l Conf. Management of Data. Paris, France, pp. 383-394, 2004. DOI: 10.1145/1007568.1007612.
[34] Bernstein P. A., et al., “Industrial-strength schema matching,” ACM SIGMOD Record, vol/issue: 33(4), pp. 38-53, 2004. DOI: 10.1145/1041410.1041417.
[35] Dragut E. and Lawrence R., “Composing mappings between schemas using a reference ontology,” The Int’l Conf. on Ontologies, Databases, & Applications of Semantics (ODBASE). Larnaca, Cyprus, pp. 783-800, 2004. DOI: 10.1007/978-3-540-30468-5_50.
[36] Mork P. and Bernstein P. A., “Adapting a generic match algorithm to align ontologies of human anatomy,” The 20th Int’l Conf. on Data Engineering (ICDE). Boston, Massachusetts, USA, pp. 787-790, 2004. DOI: 10.1109/ICDE.2004.1320047.
[37] Tu K. W. and Yu Y., “CMC: combining multiple schema-matching strategies based on credibility prediction,” The 10th Int’l Conf. on Database Systems for Advanced Applications (DASFAA). Beijing, China, pp. 888-893, 2005. DOI: 10.1007/11408079_80.
[38] Jian N., et al., “Falcon-AO: Aligning ontologies with Falcon,” The K-CAP Workshop on Integrating Ontologies (K-CAP’05). Banff, Canada, pp. 85-91, 2005. DOI: 10.1016/j.websem.2008.02.006.
[39] Jean-Mary Y. R., et al., “Ontology matching with semantic verification,” Web Semantics Journal, vol/issue: 7(3), pp. 235-251, 2009. DOI: 10.1016/j.websem.2009.04.001.
[40] Sutanta E., et al., “Kajian model dan prototipe schema matching (Studi untuk menemukan peluang pengembangan model dan prototipe baru),” Prosiding Seminar Nasional Aplikasi Teknologi Informasi (SNATI 2015). Yogyakarta, Indonesia, pp. J-9-15, 2015. URL: http://journal.uii.ac.id/index.php/Snati/article/viewFile/3556/3147.
[41] Sutanta E., et al., “Survey: Models and prototypes of Schema Matching,” Int’l Journal of Electrical and Computer Engineering (IJECE), vol/issue: 6(3), 2016. URL: http://www.iaesjournal.com/online/index.php/IJECE/article/view/9789.
[42] Karasneh Y., et al., “Integrating schemas of heterogeneous relational databases through schema matching,” The 11th Int’l Conf. on Information Integration and Web-based Applications and Services (iiWAS). Kuala Lumpur, Malaysia, pp. 209-216, 2009. DOI: 10.1145/1806338.1806380.
[43] Samini S., et al., “Bridging XML and relational databases: An effective mapping scheme based on persistent,” Int’l Journal of Electrical and Computer Engineering (IJECE), vol/issue: 2(2), pp. 239-246, 2012. DOI: 10.11591/ijece.v2i2.215.
[44] Win L. H., “XML-based RDF data management for XPath query language,” Int’l Journal of Informatics and Communication Technology (IJ-ICT), vol/issue: 2(1), pp. 1-8, 2013. DOI: 10.11591/ij-ict.v2i1.1503.
[45] Manning C. D. and Schutze H., “Foundations of statistical natural language processing,” The Massachusetts Institute of Technology Press. London, England, 1999. DOI: 10.1145/601858.601867.
[46] Bellahsene Z., et al., “Schema matching and mapping, data-centric systems and applications,” Springer. New York, USA, 2011. DOI: 10.1007/978-3-642-16518-4.
[47] Rijsbergen C. J. V., “Information Retrieval,” 2nd edition. Butterworths, London, 1979. DOI: 10.1002/asi.4630300621.
[48] Do H. H., et al., “Comparison of schema matching evaluations,” Proceedings of 2nd Int’l Workshop Web & Databases. In: Lecture Notes in Computer Science (LNCS) 2593. Springer-Verlag, Germany, pp. 221-237, 2003. URL: http://lips.informatik.uni-leipzig.de/files/2002-28.pdf.
[49] Ehrig M. and Staab S., “QOM-quick ontology mapping,” The 3rd Int’l Semantic Web Conf. (ISWC). Hiroshima, Japan, pp. 683-697, 2004. DOI: 10.1007/978-3-540-30475-3_47.
[50] Giunchiglia F., et al., “A large scale taxonomy mapping evaluation,” The 4th Int’l Conf. Semantic Web Conf. (ISWC). Galway, Ireland, pp. 67-81, 2005. DOI: 10.1007/11574620_8.
[51] Li J., et al., “RiMOM: A dynamic multistrategy ontology alignment framework,” Journal of IEEE Transaction Knowledge Data Engineering, vol/issue: 21(8), pp. 1218-1232, 2009. DOI: 10.1109/TKDE.2008.202.
[52] Martinek P., “Schema matching methodologies and runtime solutions in SOA based enterprise application integration,” Ph.D Thesis. Dept. of Electronics Technology, Budapest University of Technology & Economics. Hungary, 2009. URL: https://repozitorium.omikk.bme.hu/bitstream/handle/10890/869/tezis_eng.pdf?sequence=3&isAllowed=y.
[53] Karasneh Y., et al., “An approach for matching relational database schemas,” Journal of Digital Information Management, vol/issue: 8(4), pp. 260-269, 2010. URL: http://www.dirf.org/jdim/v8i4.asp.
BIOGRAPHIES OF AUTHORS
Edhy Sutanta received a Bachelor of Informatics Management & Computer Engineering from IST AKPRIND Yogyakarta, Indonesia in 1996 and a Master of Computer Science from Universitas Gadjah Mada, Yogyakarta, Indonesia in 2006. Currently he is a lecturer at the Department of Informatics Engineering, IST AKPRIND Yogyakarta, Indonesia, and is pursuing his doctoral program in Computer Science at the Department of Computer Sciences & Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia. His research areas of interest are database systems, database analysis & design, and information systems.
Email: edhy_sst@akprind.ac.id, edhy_sst@yahoo.com
Drs. Retantyo Wardoyo, M.Sc., Ph.D. received a Bachelor of Mathematics from Universitas Gadjah Mada, Yogyakarta, Indonesia in 1982, a Master of Computer Science from the University of Manchester, UK in 1990, and a Ph.D. of Computation from the University of Manchester Institute of Science and Technology, UK in 1996. Currently he is a lecturer at the Department of Computer Science & Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia. His research areas of interest are database systems, operating systems, management information systems, fuzzy logics, and software engineering.
Email: rw@ugm.ac.id
Dr. Techn. Khabib Mustofa, S.Si., M.Kom. received a Bachelor of Computer Science from Universitas Gadjah Mada (UGM), Yogyakarta, Indonesia in 1997, a Master of Computer Science from Universitas Gadjah Mada, Yogyakarta, Indonesia in 2001, and a Ph.D. from Vienna University of Technology, Austria in 2007. Currently he is a lecturer at the Department of Computer Science & Electronics, Universitas Gadjah Mada (UGM), Yogyakarta, Indonesia. His research areas of interest are database systems, semantic web, web engineering, and information management.
Email: khabib@ugm.ac.id