TELK
OMNIKA
T
elecommunication,
Computing,
Electr
onics
and
Contr
ol
V
ol.
18,
No.
4,
August
2020,
pp.
1884
1891
ISSN:
1693-6930,
accredited
First
Grade
by
K
emenristekdikti,
No:
21/E/KPT/2018
DOI:
10.12928/TELK
OMNIKA.v18i4.13858
r
1884
Spatial
association
disco
v
ery
pr
ocess
using
fr
equent
subgraph
mining
Gio
v
anni
Dai
´
an
Rottoli
1
,
Her
n
´
an
Merlino
2
1
Uni
v
ersidad
Nacional
de
La
Plata,
Ar
gentina
1
Uni
v
ersidad
T
ecnologica
Nacional,
Ar
gentina
1,2
Information
Systems
Research
Group,
National
Uni
v
ersity
of
Lan
´
us,
Buenos
Aires
Article
Inf
o
Article
history:
Recei
v
ed
Aug
10,
2019
Re
vised
Mar
10,
2020
Accepted
Apr
3,
2020
K
eyw
ords:
Frequent
subgraph
mining
SARM
Spatial
association
mining
Spatial
data
mining
Spatial
kno
wledge
disco
v
ery
ABSTRA
CT
Spatial
associations
are
one
of
the
most
rele
v
ant
kinds
of
patterns
used
by
b
usiness
intelligence
re
g
arding
spatial
data.
Due
to
the
characteristics
of
thi
s
particular
type
of
information,
dif
ferent
approaches
ha
v
e
been
proposed
for
spatial
association
mining.
This
wide
v
ariety
of
methods
has
entailed
the
need
for
a
process
to
inte
grate
the
ac-
ti
vities
for
association
disco
v
ery
,
one
that
is
easy
to
implement
and
fle
xible
enough
to
be
adapted
to
an
y
particular
situation,
particularly
for
small
and
medium-size
projects
to
guide
the
useful
pattern
disco
v
ery
process.
Thus,
this
w
ork
proposes
an
adaptable
kno
wledge
disco
v
ery
process
that
uses
graph
theory
to
model
dif
ferent
spatial
rela-
tionships
from
multiple
scenarios,
and
frequent
subgraph
mining
to
disco
v
er
spatial
associations.
A
proof
of
concept
is
presented
using
real
data.
This
is
an
open
access
article
under
the
CC
BY
-SA
license
.
Corresponding
A
uthor:
Gio
v
anni
Dai
´
an
Rottoli,
Departamento
de
Ingenier
´
ıa
en
Sistemas
de
Informaci
´
on,
Uni
v
ersidad
T
ecnol
´
ogica
Nacional,
F
.R.
Concepci
´
on
del
Uruguay
,
676
Ing.
Pereira
Street,
Concepci
´
ıon
del
Uruguay
(3260),
Entre
R
´
ıos,
Ar
gentina.
Email:
rottolig@frcu.utn.edu.ar
1.
INTR
ODUCTION
Spatial
kno
wledge
di
sco
v
er
y
aims
to
find
usef
u
l
and
no
v
el
patterns
in
spatial
datasets
to
support
decision-making
in
a
particular
problem
domain
[1].
Among
all
the
possible
patterns
to
disco
v
er
,
spatial
asso-
ciations
are
one
of
the
most
commonly
us
ed
today
in
multiple
fields
such
as
climatology
,
geograph
y
,
geology
,
criminology
and
ecology
,
among
man
y
others.
The
y
are
comprised
of
predicates
that
in
v
olv
e
spatial
objects
along
with
spatial
and
non-spat
ial
relationships
between
those
objects
[2].
There
are
man
y
challenges
associ-
ated
with
the
characteris
tics
of
spatial
data
that
mak
e
this
data
mining
task
more
complicated,
such
as
the
spatial
dependenc
y
data
attrib
utes,
the
multiplicity
of
spatial
data
representation
models,
the
spatial
relations
between
data
objects
and
some
particular
spatial
properties
s
uch
as
spatial
autocorrelation
and
spatial
heterogeneity
[3].
Multiple
algorithms
ha
v
e
been
de
v
eloped
for
association
pattern
mining
that
can
be
used.
Each
of
these
algorithms,
in
general,
aims
to
solv
e
particular
concerns
about
the
aforemetioned
challenges.
The
se-
lection
of
a
proper
algorithm
has
become
an
arduous
acti
vity
due
to
the
gro
wing
number
of
ne
w
alternati
v
es
and
their
v
ariants,
specially
to
ine
xperienced
users.
Thus,
it
is
necessary
to
pro
vide
a
ne
w
process
for
small
or
medium-size
application
domains,
one
that
is
easy
to
implement
and
fle
xible
enough
to
be
adapted
to
multiple
conte
xts.
Consequently
,
this
paper
proposes
a
ne
w
process
for
association
mining
disco
v
ery
from
spatial
data
J
ournal
homepage:
http://journal.uad.ac.id/inde
x.php/TELK
OMNIKA
Evaluation Warning : The document was created with Spire.PDF for Python.
TELK
OMNIKA
T
elecommun
Comput
El
Control
r
1885
that
utilizes
graph
theory
to
model
spatial
objects
and
the
relations
between
them
and
frequent
subgraph
mining
to
find
the
substructures
with
a
high
repetition
rate
inside
the
general
graph.
These
substructures
correspond
to
association
patterns.
The
proposal
is
a
ne
w
alternati
v
e
to
model
comple
x
sit
u
a
tions
from
a
particular
problem
domain,
b
ut
not
replace
or
impro
v
e
results
from
the
algorithms
in
the
state-of-the-art,
ho
we
v
er
it
pro
vides
a
road
map
to
initially
address
a
problem.
The
rest
of
this
paper
is
arranged
as
follo
ws:
section
2.
on
the
charac-
teristics
of
spatial
data;
section
3.
contains
association
patterns
and
their
characteristics
re
g
arding
spatial
data;
section
4.
includes
the
proposed
process
for
disco
v
ery
of
spatial
associations;
a
proof
of
concept
using
real
w
orld
data
is
sho
wn
in
section
5.
Lastly
,
section
6.
contains
conclusions
and
future
w
orks
.
2.
SP
A
TIAL
D
A
T
A
Spatial
data
is
a
particular
type
of
dependent
data.
F
ormally
,
a
spatial
database
D
is
a
set
of
spati
al
records
D
=
f
~
T
1
;
~
T
2
;
;
~
T
d
g
with
T
i
=
f
S
1
i
;
S
2
i
;
;
S
m
i
;
X
1
i
;
X
2
i
;
;
X
n
i
g
,
where
each
S
k
i
is
a
spatial
attrib
ute
that
stores
v
alues
about
the
spatial
conte
xts,
and
each
X
l
i
is
a
non-spatial
attrib
utes
with
v
alues
mea-
sured
at
particular
locations
[3,
4].
The
non-spatial
attrib
utes
may
be
nu
m
erical
or
cate
gorical
according
to
the
problem
domain
and
the
spatial
attrib
utes
may
be
specified
as
coordinates
or
places
(e.g.
city
name
or
state
code).
Additionally
,
there
are
three
basic
types
of
spatial
objects:
points,
used
to
model
specific
punctual
locations
in
the
space;
lines,
used
to
model
linear
e
xtensions
such
as
ri
v
ers
or
roads;
and
polygons,
used
to
represent
objects
that
ha
v
e
a
tw
o-dimensional
e
xtension
in
the
space,
such
as
re
gions
or
states.
The
dependence
of
non-spatial
attrib
utes
on
spatial
ones
means
that
dif
ferent
implicit
spatial
relati
ons
can
be
e
xtracted
from
data.
Let
D
be
a
spatial
database,
a
relation
R
D
2
is
called
spatial
if
and
only
if
it
is
defined
through
a
binary
predicate
P
(
x;
y
)
j
x;
y
2
D
that
in
v
olv
es
the
spatial
attrib
utes
from
the
spatial
objects
x
and
y
.
F
or
e
xample,
the
spatial
relation
N
D
2
,
with
x;
y
2
D
,
defined
by
the
predicate
sho
wn
in
(2),
is
the
neighborhood
relations
between
tw
o
spatial
points
using
euclidean
distance:
xN
y
(
)
D
ist
(
x;
y
)
<
;
2
R
+
These
relations
can
be
classified
as
g
eometric
,
if
the
y
are
related
to
the
principles
of
euclidean
geom-
etry
(e.g.
neighbouring
relationships);
dir
ectional
,
when
the
y
refer
t
o
relati
v
e
spatial
orientations
(e.g.
abo
v
e,
belo
w
,
north,
east);
topolo
gical
,
if
the
y
are
independent
from
the
concepts
of
distance
and
direction
and
are
not
af
fected
by
spatial
transformations
such
as
rotation
or
translation
(e.g.
intersect,
inside),
or
hybrid
,
if
the
y
are
related
to
tw
o
or
more
of
the
aforementioned
types
of
properties.
These
relationships
can
be
calculated
using
dif
ferent
methods
depending
on
the
problem
domain
and
the
class
of
spatial
data
used:
points,
lines
or
polygons
[5,
6].
On
the
other
hand,
tw
o
properties
are
deri
v
ed
from
spatial
dependence:
spatial
autocorrelation,
i.e.,
observ
ations
of
spatially
distrib
uted
random
v
ariables
are
not
location-independent,
and
spatial
heterogeneity
,
i.e.,
patterns
found
in
some
re
gion
of
the
space
may
not
ha
v
e
the
same
support
in
other
re
gion.
Spatial
auto-
correlation
refers
to
the
particularity
of
spatial
data
to
not
be
distrib
uted
independently
throughout
the
space.
The
distrib
ution
depends
on
the
characteristics
of
the
data
points,
the
characteristics
of
the
underlying
space
or
the
spatial
neighboring
relationships.
F
or
e
xample,
churches
tends
to
be
located
near
public
squares
or
animal
tends
to
tra
v
el
to
locations
that
contain
their
food
sources
[7].
Spatial
heterogeneity
is
related
to
spatial
auto-
correlation.
This
phenomenon
describes
the
local
nature
of
spatial
patterns,
which
are
subordinated
to
some
specific
locations.
Thus,
a
spatial
pattern,
such
as
association
rules,
may
ha
v
e
a
high
support
v
alue
in
a
re
gion
and
a
lo
w
support
v
alue
in
a
dif
fer
ent
one.
This
phenomenon
is
also
kno
wn
as
Simpson’
s
paradox
[8].
All
these
particular
characteristics
mak
e
kno
wledge
e
xtraction
from
spati
al
data
become
a
comple
x
acti
vity
which
not
only
has
to
consider
patterns
between
data
records,
b
ut
also
the
implicit
relationships
between
spatial
objects.
3.
SP
A
TIAL
ASSOCIA
TIONS
One
of
the
most
common
patterns
to
find
in
data
is
the
association
pattern.
An
association
pattern
P
is
defined
as
an
n-ary
predicate
P
=
(
p
1
;
p
2
;
;
p
n
)
with
a
high
probability
of
occurrence
in
the
dataset.
Its
classic
application
is
the
supermark
et
bask
et
analysis
to
disco
v
er
whether
or
not
there
is
some
correlation
between
items
that
are
bought
together
.
An
association
pattern
is
referred
to
as
spatial
if
at
least
one
of
its
atomic
predicates
p
k
in
v
olv
es
a
spatial
relationship
between
its
v
ariables
[2].
F
or
e
xample,
in
a
city
C,
churches
and
public
squares
tend
to
be
neighbors:
C
ity
(
C
)
^
C
hur
ch
(
X
)
^
P
ubl
ic
S
q
uar
e
(
Y
)
^
I
nside
(
X
;
C
)
^
I
n
s
ide
(
Y
;
C
)
^
N
eig
hbor
s
(
X
;
Y
)
As
sho
wn
in
the
pre
vious
e
xample,
I
nside
(
X
;
C
)
,
I
nside
(
Y
;
C
)
and
N
eig
hbor
s
(
X
;
Y
)
are
spatial
Spatial
association
disco
very
pr
ocess
using
fr
equent
subgr
aph
mining
(Gio
vanni
Dai
´
an
Rottoli)
Evaluation Warning : The document was created with Spire.PDF for Python.
1886
r
ISSN:
1693-6930
predicates
related
to
topological
and
geometric
relationships.
Man
y
dif
ferent
relations
must
be
tak
en
into
consideration
at
the
same
time
to
find
useful
spatial
associations.
Also,
these
relations
must
be
calculated
in
local
conte
xts,
due
to
the
aforementioned
Simpson’
s
P
aradox.
Multiple
ef
forts
ha
v
e
been
made
in
order
to
find
spatial
association
patterns
in
spatial
databases:
[7]
proposes
a
method
for
spatial
association
mining
that
consider
spatial
autocorrelation
by
using
a
cell
structure;
[9]
focuses
on
the
problem
of
rule
e
xtraction
from
spatial
data
with
crisp
condition
attrib
utes
and
fuzzy
deci-
sions.
A
rough-fuzzy
set
based
rule
e
xtraction
model
is
used
to
deal
with
both
fuzziness
and
roughness
;
[10]
combines
and
e
xtend
techni
ques
de
v
eloped
in
both
spati
al
and
fuzz
y
data
mi
ning
to
dea
l
with
the
uncertainty
found
in
typical
spatial
data.
This
proposal
uses
fuzzy
logic
to
get
rele
v
ant
information
from
transition
areas
between
spatial
neighborhoods
to
spatial
association
mining
and
for
spatial
relationships
modelling;
[11,
12]
propose
an
algorithm
for
local
patterns
disco
v
ery
considering
spatial
heterogeneity
that
incorporates
a
no
v
el
spatial
metric
for
support
e
v
aluation
based
on
e
v
ent
density
in
a
particular
area;
[13]
presents
a
specially
de-
signed
algorithm
to
disco
v
er
spatial
associations
related
to
El
Ni
˜
no
Southern
Oscillation
(ENSO);
[14]
applies
an
algorithm
that
e
xplores
multiple
spatial
objects
hierarchies;
[15]
uses
A-Priori-based
approaches
to
find
spatial
association
rules;
[6,
16]
propose
using
Induct
i
v
e
Logic
Programming
(ILP)
for
reach
this
data
mining
purpose
by
modelling
and
stracting
high
support
spatial
relations
from
spatial
data.
[17]
w
ork
ed
with
meta-
heuristics
s
uch
as
genetic
algorithms
and
e
v
olutionary
programming;
[18]
suggested
a
data-transformation
approach
before
using
traditional
association
rule
mining
algorithms;
[19]
introduced
non-tri
vial
structures
such
as
graphs
for
spatial
relationship
representation;
among
others.
Because
of
this
v
ariety
of
spatial
data
mining
approaches
for
association
disco
v
ery
,
it
is
dif
ficult
to
select
a
proper
algorithm
or
method
to
be
used
in
small
kno
wledge
disco
v
ery
application
conte
xts.
Because
of
this,
a
unified
and
general
process
is
required
to
deal
with
the
aforementioned
problems
and
it
has
to
be
fle
xible
enough
to
be
adapted
to
multiple
particular
situations
and
easy
to
implement.
4.
SP
A
TIAL
ASSOCIA
TION
DISCO
VER
Y
PR
OCESS
This
w
ork
describes
a
ne
w
process
for
spatial
association
e
xtraction
considering
the
possibility
of
ha
ving
multiple
relationships
between
spatial
objects
of
an
y
kind
(i.e.
points,
lines,
polygons),
and
considering
the
spatial
autocorrelation
and
spatial
heterogeneity
.
This
process
is
designed
as
a
first
approach
to
get
spatial
association
kno
wledge
from
data
in
particular
conte
xts
easy
to
implement
in
small
or
medium-size
projects.
The
process
Figure
1
is
di
vided
into
5
main
steps:
data
preparation
(section
4.1.),
neighborhood
definition
(section
4.2.),
modelling
of
spatial
relationships
using
graphs
(section
4.3
.)
,
frequent
subgraph
mining
(section
4.4.)
and
e
v
aluation
of
results
(section
4.5.).
Figure
1.
Spatial
association
disco
v
ery
process
4.1.
Data
pr
eparation
The
proposed
process
starts
with
a
s
p
a
tial
data
preparation
step.
It
is
necessary
to
codify
the
v
arious
spatial
datasets
obtained
from
dif
ferent
sources
in
dif
ferent
formats,
in
order
to
enable
t
he
e
xtraction
of
relations
between
all
the
data
instances
in
later
steps.
In
general
terms,
it
is
not
uncommon
to
ha
v
e
multiple
spatial
objects
layers,
each
of
them
with
a
particular
representation
type
and
related
to
a
particular
scenario
from
the
problem
domain.
On
the
other
hand,
tw
o
types
of
datasets
must
be
considered:
tar
get
datasets,
with
objects
directly
TELK
OMNIKA
T
elecommun
Comput
El
Control,
V
ol.
18,
No.
4,
August
2020
:
1884
–
1891
Evaluation Warning : The document was created with Spire.PDF for Python.
TELK
OMNIKA
T
elecommun
Comput
El
Control
r
1887
related
to
the
problem
domain
that
are
going
to
be
present
in
e
v
ery
association
pattern,
and
rele
v
ant
datasets,
that
may
or
may
not
be
related
to
the
tar
get
datasets,
b
ut
add
important
information
that
may
be
useful
for
decision
making
[20].
These
data
must
be
prepared
by
cleaning
errors,
solving
inconsistent
and
null
v
alues,
and
dealing
with
outliers.
Ne
w
attrib
utes
or
e
v
en
ne
w
data
objects
could
be
generated
using
the
input
data.
This
step
requires
considerable
ef
fort
and
may
require
man
y
iterations.
Thus,
it
is
advisable
to
implement
the
process
using
a
proper
methodology
such
as
CRISP-DM
[21].
4.2.
Neighborhood
definition
As
mentioned
before,
a
parti
cular
spatial
association
pattern
may
ha
v
e
a
higher
occurrence
probabili
ty
in
some
re
gions
and
lo
wer
probability
in
others
[8].
F
or
this
reason
it
is
preferred
to
search
for
this
kind
of
pattern
locally
.
F
or
this,
we
propose
defining
partitions
of
the
dataset,
called
neighborhoods
in
this
conte
xt,
and
the
subsequent
e
x
ecution
of
the
association
pattern
search
algorithm
on
each
of
them.
These
neighborhoods
can
be
defined
beforehand
using
kno
wledge
from
to
the
problem
domain,
or
using
spatial
clustering
techniques.
Using
density-based
or
distance-based
spatial
clustering
algorithms
[22–
24]
is
suggested
due
to
the
First
La
w
of
Geograph
y
,
which
stat
es
that
spatial
objects
located
together
are
more
closely
related
than
those
that
are
f
ar
a
w
ay
from
each
other
[25,
26].
Nonetheless,
there
is
an
issue
to
consider
in
this
step:
the
limits
between
neighborhoods
may
add
important
information
for
spatial
association
mining.
Thus,
the
use
of
fuzzy
clustering
techniques
or
fle
xible
boundaries
models
may
be
desirable.
4.3.
Modelling
of
spatial
r
elationships
using
graphs
No
w
,
we
ha
v
e
to
calculate
the
spatial
relations
between
the
tar
get
data
instances
and
the
insta
n
c
es
of
the
rele
v
ant
dataset
from
each
neighborhood.
Depending
on
the
problem
domai
n,
dif
ferent
types
of
spatial
relations
can
be
calculated:
euclidian,
topological,
directional
or
h
ybrid
relationships,
as
mentioned
abo
v
e
[6].
This
might
be
a
step
with
a
high
computational
cost.
Graph
theory
is
proposed
to
model
t
he
spatial
relationships
due
to
its
close
relation
with
first
order
logic
and
the
pattern
to
find
[16].
Graphs
are
discrete
structures
consisting
of
v
ertices
and
edges
that
connect
these
v
ertices.
There
are
dif
ferent
kinds
of
graphs,
depending
on
whether
edges
ha
v
e
directions
(digraphs),
whether
multiple
edges
can
connect
the
same
pair
of
v
ertices
(multigraphs),
and
whether
loops
are
allo
wed.
F
ormally
,
a
simple
gr
aph
G
=
(V
,
E)
consists
of
V
,
a
nonempty
set
of
v
ertices
(or
nodes)
and
E,
a
set
of
edges.
Each
edge
has
tw
o
v
ertices
associated
with
it,
called
its
endpoints.
An
edge
is
said
to
connect
its
endpoints.
T
o
relate
each
edge
to
its
endpoints,
a
function
:
E
!
f
v
1
2
V
;
v
2
2
V
g
,
called
incidence
function,
is
used.
A
multigraph,
on
the
other
hand,
is
a
graph
where
multiple
edges
can
e
xist
associated
with
the
same
endpoints.
Additionally
,
each
v
erte
x
and
each
edge
can
be
labeled
with
data
related
to
the
represented
object.
This
structure
can
be
adapted
to
multiple
scenarios
and
multiple
ef
ficient
algorithms
can
be
used
to
e
xtract
v
aluable
information
such
as
maximum
cliques
[27].
In
the
conte
xt
of
this
w
ork,
multigraphs
are
used
to
model
spatial
objects
as
v
ertices
and
the
rel
ations
between
them
as
edges.
A
small
e
xample
can
be
seen
in
Figure
2
(a).
T
w
o
sets
of
labels
and
tw
o
e
xtra
functions
to
asign
those
labels
to
the
v
ertices
and
edges
are
needed.
So,
let
G
be
a
multigraph
without
loops
G
=
(
V
;
E
;
L;
K
;
;
l
;
k
)
where:
V
is
the
v
erte
x
set
of
G
,
which
corresponds
to
the
spatial
objects
from
the
datasets;
E
is
the
edge
set
that
corresponds
to
each
calculated
relationship
between
the
spatial
objects;
L
is
the
v
erte
x
label
set
with
the
characteristics
of
the
spatial
data
objects;
K
is
the
edge
label
set,
with
the
charac
teristics
of
each
spatial
relation;
:
E
!
f
x
2
P
(
V
)
=
j
x
j
2
g
is
the
incidence
function;
l
V
L
and
k
E
K
are
labeling
relations.
The
aforementioned
structure
mak
es
it
possible
to
model
multiple
dif
ferent
relationships
with
the
same
endpoints
labeled
with
dif
ferent
attrib
utes.
Also,
man
y
attrib
utes
of
spatial
objects
could
be
tak
en
into
consideration.
Additionally
,
it
must
be
noted
that
loops
(i.e.
edges
with
only
one
endpoint)
are
not
considered
because
t
heir
lack
of
semantics
in
this
conte
xt
(there
are
not
spatial
relationships
that
in
v
olv
es
only
one
spatial
object).
Fuzzy
logic
could
also
be
a
v
aluable
tool
to
model
the
spatial
relationships,
if
the
situation
requires
it
[10].
More
information
about
fuzzy
logic
this
can
be
found
in
[28]
4.4.
Fr
equent
subgraph
mining
T
o
e
xtract
spatial
associations
wit
h
a
high
probability
of
occurrence,
frequent
subgraph
mining
is
pro-
posed
to
be
used
for
each
modeled
graph.
Gi
v
en
a
multigraph
G
=
(
V
;
E
;
L;
K
;
;
l
;
k
)
lik
e
the
one
described
in
the
pre
vious
section,
the
frequent
subgraph
mining
problem
in
a
single
multigraph
is
finding
recurring
subgraph
Spatial
association
disco
very
pr
ocess
using
fr
equent
subgr
aph
mining
(Gio
vanni
Dai
´
an
Rottoli)
Evaluation Warning : The document was created with Spire.PDF for Python.
1888
r
ISSN:
1693-6930
G
i
G
,
or
in
other
w
ords,
a
subgraph
that
has
multiple
instances
in
the
original
graph
Figure
2
(b).
It
must
be
noted
that
tw
o
graphs
are
isomorphic
if
all
of
their
v
ertices
and
edges
are
shared
including
its
labels.These
frequent
subgraphs
represent
the
rela
tionships
between
spatial
object
types
that
tak
e
place
in
the
space
with
a
high
occurrence
probability
.
Multiple
algorithms
ha
v
e
been
designed
for
frequent
subgraph
mining
in
a
single
big
graph,
calcu-
lating
the
rele
v
ance
of
a
pattern
in
dif
ferent
w
ays.
Some
well-kno
wn
e
xamples
of
this
are
IncGM+,
FSSG,
SUBDUE,
among
others
[29,
30].
A
set
of
frequent
subgraphs
for
each
neighborhood
is
obtained
as
a
result
of
this
step
and
must
be
analyzed
to
obtain
useful
kno
wledge
for
decision-making.
4.5.
Ev
aluation
of
r
esults
In
the
final
step,
frequent
subgraphs
translated
into
n-ary
predicates
that
represent
tri
vial
information
(non-no
v
el
patterns)
must
be
filtered.
The
support
and
confidence
measures
can
be
e
xtracted,
selecting
the
metrics
that
the
desicion-mak
er
consider
to
be
more
appropiate.
This
acti
vity
could
be
performed
automatically
or
manually
by
an
analyst
with
kno
wledge
about
the
problem
domain
with
help
from
an
e
xpert.
P
ubl
icS
q
uar
eX
1
C
ity
A
C
h
ur
chY
1
I
ncl
ude
I
n
c
l
ude
N
eig
hbor
s
(a)
(b)
Figure
2.
(a)
Simplified
e
xample
of
spatial
relationship
modelling
using
a
simple
graph.
(b)
Example
of
frequent
subgraph
(bottom)
found
in
a
simple
graph
without
labels
in
the
edges.
5.
PR
OOF
OF
CONCEPT
The
proof
of
concept
presented
in
this
section
is
intended
to
sho
w
ho
w
the
proposed
process
w
orks,
implemented
by
dif
ferent
programming
and
data
mining
tools.
The
data
used
in
this
e
xample
consists
of
10
data
files
containing
the
location
of
f
acilities
in
Buenos
Aires
(Ar
gentina)
and
its
surroundings.
These
f
acilities
include
libraries(74),
clinics(63),
post
of
fices(55),
sports
halls(50),
nightclubs(41),
schools(107),
g
as
stations
(97),
churches
(125),
museums(37)
or
police
stations
(93).
F
or
each
of
them,
in
the
preparation
step
of
the
proposed
process,
the
data
files
were
inte
grated
into
a
single
data
file
of
spatial
points
using
QGis
(http://qgis.or
g/).
Each
spatial
point
is
comprised
of
tw
o
spatial
attrib
utes,
Latitude
and
Longitude,
and
one
non-spatial
attrib
ute,
the
type
of
b
uilding
from
the
pre
vious
list.
After
that,
only
the
points
that
are
located
outside
Buenos
Aires
limits
were
filtered
to
reduce
the
search
space,
lea
ving
742
spatial
points
Figure
3
(a),
(orange).
Then,
in
the
neighborhood
definition
step,
the
HDDBSCAN
clustering
algorithm
[31]
from
the
’
dbscan’
library
from
R
programming
language
w
as
used
on
the
spatial
data
attrib
utes
to
generate
tw
o
neighborhoods
with
a
minimum
number
of
points
equal
to
50
in
each
of
them
Figure
3
(a),
(blue).
Only
tw
o
neighborhoods
were
used
because
of
e
xplanatory
purposes.
In
the
ne
xt
step,
for
each
of
the
generated
neighborhoods,
a
geome
tric
relationship
between
their
data
points
w
as
e
xtracted
forming
a
graph
with
v
ertices
labeled
wit
h
the
type
of
f
acility
related
to
each
data
point
and
edges
labeled
with
the
sentence
”close
to”
if
the
adjacent
points
were
less
than
150
meters
a
w
ay
from
each
other
(this
v
alue
w
as
selected
for
illustrati
v
e
purposes
only).
Thus,
tw
o
graphs
were
created:
one
with
71
v
ertices
and
45
edges
in
neighborhood
1,
and
another
with
15
v
ertices
and
11
edges
in
neighborhood
2.
T
o
ob
t
ain
the
frequent
subgraphs
of
each
of
the
generated
graphs,
SUBDUE
algorithm
w
as
used
via
its
implementation
in
Subdue
Graph
Miner
Softw
are,
using
the
compression
rate
as
support
measure.
TELK
OMNIKA
T
elecommun
Comput
El
Control,
V
ol.
18,
No.
4,
August
2020
:
1884
–
1891
Evaluation Warning : The document was created with Spire.PDF for Python.
TELK
OMNIKA
T
elecommun
Comput
El
Control
r
1889
The
result
w
as
a
subgraph
as
sho
wn
in
Figure
3
(b),
with
a
compression
rate
of
15.5%
in
neighborhood
1,
which
w
as
translated
into
the
predicate
Post
of
fice
(
x
1
)
^
Nightclub
(
x
2
)
^
Close
to
(
x
1
;
x
2
)
and
tw
o
sub-
graphs
in
neighborhood
2
,
both
with
a
compression
rate
of
27.2%
that
w
as
translated
into
the
predicates
Clinic
(
x
1
)
^
Post
of
fice
(
x
2
)
^
Close
to
(
x
1
;
x
2
)
Post
of
fice
(
x
1
)
^
Sport
hall
(
x
2
)
^
Close
to
(
x
1
;
x
2
)
(a)
(b)
Figure
3.
(a)
Spatial
neighborhoods
generated
for
the
proof
of
concept
using
HDBSCAN
algorithm;
(b)
Results
of
the
proof
of
concept.
5.1.
Discussion
The
contrib
utions
of
the
proposed
process
are,
firstly
,
the
possibility
of
adapting
it
to
multiple
s
cenar
-
ios,
due
to
its
fle
xible
underlying
struc
ture
being
based
on
graphs.
Some
of
the
aforementioned
methods
use
fle
xible
structures
too
[6,
16]
b
ut
the
comple
xity
of
these
methods
increases
because
of
the
use
of
techniques
based
on
Logic
Programming.
On
the
other
hand,
some
other
methods
do
not
tak
e
int
o
account
comple
x
pat-
terns
[19].
Furthermore,
the
possibility
of
including
v
aluable
information
related
to
the
data
objects
and
the
spatial
relations
by
using
labels
in
the
graph
representation
is
also
considered.
Generally
,
the
data
structures
in
v
olv
ed
do
not
tak
e
into
account
comple
x
data
associated
to
the
spatial
relations
between
spatial
data.
In
relation
to
the
abo
v
e,
t
he
proposed
process
consi
d
e
rs
spatial
phenomena
such
as
autocorrelation
and
heterogeneity
,
by
using
spatial
neighborhoods.
Some
alternati
v
es
such
as
[7]
considering
spatial
autocorrelation
b
ut
not
considering
spatial
heterogeneity
or
comple
x
data
relationships.
In
most
of
the
cases
studied,
these
characteristics
are
present
due
to
their
rele
v
ance
in
data
mining.
Also,
related
to
t
his,
the
proposed
process
allo
ws
its
implementation
by
using
e
xisting
tools
such
as
frequent
subgraph
mining
algorithms
and
clustering
algorithms.
Some
of
the
stat
e-of-the-art
alternati
v
es
include
v
ery
fle
xible
and
po
werful
strate
gies,
b
ut
implementation
is
hard,
making
them
not
suitable
for
appli-
cation
in
small
or
medium
size
projects
[6,
9,
16,
19].
Lastly
,
the
high
adaptability
of
the
procedure
is
a
desired
characteristic
due
to
the
possibility
of
selecting
among
man
y
algorithms
for
the
implementation
of
each
step.
Usually
,
the
state-of-the-art
methods
propose
a
single
alternati
v
e
for
its
e
x
ecution.
6.
CONCLUSION
This
w
ork
describes
a
kno
wledge
disco
v
ery
process
called
for
e
xtraction
of
spatial
associat
ions.
The
process
is
fle
xible
enough
to
tak
e
into
account
multiple
and
v
aried
spatial
relationships
between
spatial
objects
of
an
y
kind,
using
a
graph
structure
to
model
them.
Heterogeneity
and
autocorrelation
phenomena
are
also
considered,
defining
neighborhoods
where
the
search
process
is
performed
t
o
find
this
class
of
re
gularity
.
The
solution
w
as
designed
to
initially
approach
to
this
data
mining
task
without
w
orrying
too
much
about
par
-
ticular
characteristics
of
data
mining
algorithms.
In
a
lar
ge-scale
project,
this
process
could
guide
the
selection
of
specific
methods
based
on
the
results
obtained
in
first
iterations
of
an
incremental
methodology
.
A
proof
of
concept
is
presented
as
well,
using
real
data
to
illustrate
ho
w
the
process
is
implemented
using
dif
ferent
programming
and
data
mining
tools
in
each
of
the
proposed
steps.
Spatial
association
disco
very
pr
ocess
using
fr
equent
subgr
aph
mining
(Gio
vanni
Dai
´
an
Rottoli)
Evaluation Warning : The document was created with Spire.PDF for Python.
1890
r
ISSN:
1693-6930
In
future
w
orks,
the
research
will
be
focused
on
implementation
strate
gies
according
to
the
problem
domain
for
each
of
the
steps
of
the
process,
in
order
to
decrease
computational
e
x
ecution
time
when
dealing
with
lar
ge
amounts
of
spatial
objects
and
spatial
relationships.
Also,
fuzzy
methods
will
be
considered
for
relation
modelling
and
neighborhood
definition.
A
CKNO
WLEDGEMENT
The
research
presented
in
this
paper
w
as
partially
funded
by
the
PhD
Scholarship
Program
to
reinforce
R&D&I
areas
(2016-2020)
of
the
Uni
v
ersidad
T
ecnol
´
ogica
Nacional
and
the
Research
Project
80020160400001
LA
of
Nat
ional
Uni
v
ersity
of
Lan
´
us.
The
authors
also
w
ant
to
e
xtend
their
gratitude
to
K
e
vin-Mark
Bozell
Poudereux,
for
proofreading
the
translation.
REFERENCES
[1]
R.
Garcia-Martinez,
P
.
Britos,
and
D.
Rodriguez,
“Information
mining
processes
based
on
intelligent
sys-
tems,
”
International
Conference
on
Industrial,
Engineering
and
Other
Applications
of
Applied
Intelligent
Systems,
pp.
402-410,
2013.
[2]
K.
K
operski
and
J.
Han,
“Disco
v
ery
of
spatial
associati
on
rules
in
geographic
information
databases,
”
International
Symposium
on
Spatial
Databases,
pp.
47–66,
1995.
[3]
Y
.
Leung
et
al.,
”Kno
wledge
disco
v
ery
in
spatial
data,
”
Springer
,
2010.
[4]
C.
C.
Agg
arw
al,
”Data
mining:
The
te
xtbook,
”
Springer
,
2015.
[5]
R.
Agra
w
al,
et
al.,
“F
ast
algorithms
for
mining
association
rules,
”
Proc.
20th
int.
conf.
v
ery
lar
ge
data
bases,
v
ol.
1215,
pp.
487-499,
1994.
[6]
A.
Appic
e,
M.
Ceci,
A.
Lanza,
F
.
A.
Lisi,
and
D.
Malerba,
“Disco
v
ery
of
spatial
association
rules
in
geo-referenced
census
data:
A
relational
mining
approach,
”
Intelligent
Data
Analysis,
v
ol.
7,
no.
6,
pp.
541-566,
2003.
[7]
J.
Chen,
“
An
algorithm
about
association
rule
mining
based
on
spatial
autocorrelation,
”
The
International
Archi
v
es
of
the
Photogrammetry
,
Remote
Sensing
and
Spatial
Information
Sciences,
v
ol.
37,
no.
B6b,
pp.
99-106,
2008.
[8]
E.
H.
Simpson,
“The
interpret
ation
of
interaction
in
contingenc
y
tables,
”
Journal
of
the
Ro
yal
Statistical
Society
.
Series
B
(Methodological),
v
ol.
13,
no.
2,
pp.
238–241,
1951.
[9]
H.
Bai,
Y
.
Ge,
J.
W
ang,
D.
Li,
Y
.
Liao,
and
X.
Zheng,
“
A
method
for
e
xtracting
rules
from
spatial
data
based
on
rough
fuzzy
sets,
”
Kno
wledge-Based
Systems,
v
ol.
57,
pp.
28-40,
2014.
[10]
R.
Ladner
,
F
.
E.
Petry
,
and
M.
A.
Cobb,
“Fuzzy
set
approaches
to
spatial
data
mining
of
association
rules,
”
T
ransactions
in
GIS,
v
ol.
7,
no.
1,
pp.
123-138,
2003.
[11]
Z.
Sha
and
X.
Li,
“Mining
local
association
patterns
from
spatial
dataset,
”
Se
v
enth
International
Confer
-
ence
on
Fuzzy
Systems
and
Kno
wledge
Disco
v
ery
(FSKD),
v
ol.
3,
pp.
1455-1460,
2010.
[12]
Z.
Sha,
X.
T
an,
and
Y
.
Bai,
“Localized
spatial
association:
A
case
study
for
understanding
v
e
getation
successions
in
a
typical
grassland
ecosystem,
”
Geo-Informatics
in
Resource
Management
and
Susta
inable
Ecosystem,
pp.
33-45,
2015.
[13]
X.
Cunjin
and
L.
Xiaohan,
“No
v
el
algorithm
for
mining
ENSO-oriented
marine
spatial
association
pat-
terns
from
raster
-formatted
datasets,
”
ISPRS
International
Journal
of
Ge
o
-
Information,
v
ol.
6,
no.
5,
pp.
1-15,
2017.
[14]
A.Salleband
and
C.Vrain,
“
Anapplication
of
associat
ion
rules
disco
v
ery
to
geographic
information
sys-
tems,
”
European
Conference
on
Principles
of
Data,
pp.
613-618,
2000.
[15]
S.
S.
U.
Sutjipto,
I.
S.
Sitangg
ang,
and
B.
Barus,
“Potential
usage
estimation
of
ground
w
ater
using
spatial
association
rule
mining,
”
TELK
OMNIKA
T
elecommunication,
Computing,
Electronics
and
Control,
v
ol.
15,
no.
1,
pp.
504-511,
2017.
[16]
D.
Malerba,
F
.
Es
posito,
F
.
A.
Lisi,
and
A.
App
i
ce,
“Mining
spatial
association
rules
in
census
data,
”
Research
in
Of
ficial
Statistics,
v
ol.
5
no.
1,
pp.
19-44,
2003.
[17]
A.
H.
Goudarzi
and
N.
Ghadiri,
“
A
h
ybrid
spatial
data
mining
approach
based
on
fuzzy
topological
rela-
tions
and
moses
e
v
olutionary
algorithm,
”
Artificial
Intelligence,
Cornell
Uni
v
ersity
2017.
[18]
I.
Lee,
“Mining
multi
v
ariate
associations
within
gis
en
vironments,
”
International
Conference
on
Indus-
trial,
Engineering
and
Other
Applications
of
Applied
Intelligent
Systems,
pp.
1062-1071,
2004.
TELK
OMNIKA
T
elecommun
Comput
El
Control,
V
ol.
18,
No.
4,
August
2020
:
1884
–
1891
Evaluation Warning : The document was created with Spire.PDF for Python.
TELK
OMNIKA
T
elecommun
Comput
El
Control
r
1891
[19]
H.
Y
ang,
S.
P
arthasarath
y
,
and
S.
Mehta,
“Mining
spatial
object
associations
for
sc
ientific
data,
”
II
Inter
-
national
Joint
Conference
on
Artificial
Intelligence
(IJCAI),
pp.
902-907,
2005.
[20]
V
.
Bogorn
y
,
P
.
M.
Engel,
and
L.
O.
Alv
ares,
“Geoarm:
an
interoperable
frame
w
ork
to
impro
v
e
geographic
data
preprocessing
and
spatial
association
rule
mining.
”
SEKE,
pp.
79-84,
2006.
[21]
R.
W
irth
and
J
.
Hipp,
“Crisp-dm:
T
o
w
ards
a
standard
process
model
for
data
mining,
”
Proceedings
of
the
4th
international
conference
on
the
practical
applications
of
kno
wledge
disco
v
ery
and
data
mining,
pp.
29-39,
2000.
[22]
J.
Sander
,
et
al.,
“Density-based
clustering
in
spatial
databases:
The
algorithm
gdbscan
and
its
applica-
tions,
”
Data
mining
and
kno
wledge
disco
v
ery
,
v
ol.
2,
no.
2,
pp.
169-194,
1998.
[23]
Y
.
Zhu,
K.
M
.
T
ing,
and
M.
J.
Carman,
“Density-ratio
based
clustering
for
disco
v
ering
clusters
with
v
arying
densities,
”
P
attern
Recognition,
v
ol.
60,
pp.
983-997,
2016.
[24]
A.
Sharma,
R.
Gupta,
and
A.
T
iw
ari,
“Impro
v
ed
density
based
spatial
clustering
of
applications
of
noise
clustering
algorithm
for
kno
wledge
disco
v
ery
in
spatial
data,
”
Mathematical
Problems
in
Engineering,
v
ol.
2016,
2016.
[25]
W
.
R.
T
obler
,
“Cellular
geograph
y
,
”
Philosoph
y
in
geograph
y
,
pp.
379-386,
1979.
[26]
J.
Duan,
L.
W
ang,
and
X.
Hu,
“The
ef
fect
of
spatial
autocorrelation
on
spatial
co-location
pattern
mining,
”
International
Conference
on
Computer
,
Information
and
T
elecommunication
Systems,
pp.
210-214,
2017.
[27]
G.
D.
Rottoli,
H.
Merlino,
and
R.
Garc
ıa-Martinez,
“Co-location
rules
disco
v
ery
process
focused
on
ref-
erence
spatial
features
using
decision
tree
learning,
”
International
Conference
on
Industrial,
Engineering
and
Other
Applications
of
Applied
Intelligent
Systems,
pp.
221-226,
2017.
[28]
D.
J.
Dubois,
”Fuzzy
sets
and
systems:
theory
and
applications,
”
Academic
press,
v
ol.
144,
1980.
[29]
E.
Abdelhamid,
M.
Canim,
M.
Sa
d
oghi
,
B.
Bhattac
h
a
rjee,
Y
.
Chang,
and
P
.
Kalnis,
“Incremental
frequent
subgraph
mining
on
lar
ge
e
v
olving
graphs,
”
IEEE
T
ransactions
on
Kno
wledge
and
Data
Engineering,
v
ol.
29,
no.
12,
pp.
2710-2723,
2017.
[30]
D.
Ka
vitha,
V
.
Kamakshi,
and
J.
Murth
y
,
“Finding
frequent
subgraphs
in
a
single
graph
based
on
symme-
try
,
”
International
Journal
of
Computer
Applications,
v
ol,
v
ol.
146,
no.
11,
pp.
0975-8887
2016.
[31]
R.
J.
Campello,
D.
Moula
vi,
A.
Zimek,
and
J.
Sander
,
“Hierarchical
density
estimat
es
for
data
clustering,
visualization,
and
outlier
detection,
”
A
CM
T
ransactions
on
Kno
wledge
Disco
v
ery
from
Data
(TKDD),
v
ol.
10,
no.
1,
pp.
5,
2015.
BIOGRAPHIES
OF
A
UTHORS
Gio
v
anni
Dai
´
an
Rottoli
is
a
researcher
at
the
Computational
Intelligence
and
Softw
are
Engineering
Research
Group
(GIICIS)
from
the
National
Uni
v
ersity
of
T
echnology
(Ar
gentina).
He
holds
a
Bach-
elor
´
s
De
gree
in
Information
Systems
from
the
aforementioned
uni
v
ersity
(2015).
He
is
currently
a
Ph.D.
candidate
in
Computer
Science
at
the
National
Uni
v
ersity
of
La
Plata
(Ar
gentina).
He
w
orks
as
an
associate
professor
of
Discrete
Mathematics
and
Da
ta
Science
at
the
National
Uni
v
ersity
of
T
echnology
(Ar
gentina).His
research
is
focused
in
the
fields
of
spatial
data
mining
and
kno
wledge
disco
v
ery
,
artificial
intelligence
and
search-based
softw
are
engineering.
Further
info
can
be
found
on
his
profile:
http://www
.frcu.utn.edu.ar/giicis/rottolig/
Her
n
´
an
Merlino
is
the
head
of
the
Adv
anced
Information
Syste
ms
Laboratory
at
Buenos
Aires
Uni-
v
ersity
(Ar
gentina)
and
the
head
of
the
Artificial
Intelligence
Laboratory
at
the
National
Uni
v
ersity
of
Lan
´
us
(Ar
gentina).
He
is
a
fello
w
of
the
Gas
and
Petroleum
Institute
at
Buenos
Aires
Uni
v
ersity
.
He
holds
a
Bachelor’
s
De
gree
in
Informat
ion
Systems
from
the
Uni
v
ersity
of
Belgrano
(Ar
gentina),
a
Master’
s
De
gree
in
Softw
are
Engineering
from
the
Computer
Science
Department
of
the
Polytechnic
Uni
v
ersity
of
Madrid
(Spain),
and
a
Ph.D.
in
Information
Sciences
from
the
National
Uni
v
ersity
of
La
Plata
(Ar
gentina
).
He
w
orks
as
a
tenured
full
professor
in
graduate
and
postgraduate
courses
at
Buenos
Aires
Uni
v
ersity
,
Austral
Uni
v
ersity
and
Na
tional
Uni
v
ersity
of
Lan
´
us.
His
research
inter
-
ests
are:
artificial
intelligence,
data
mining,
and
blockchain
technologies.
In
the
professional
field,
he
w
orks
as
a
Scientific
Res
earch
Director
in
an
Artificial
Intelligence,
Data
Science,
Blockchain
&
Smart
Contracts
compan
y
in
Ar
gentina.
Spatial
association
disco
very
pr
ocess
using
fr
equent
subgr
aph
mining
(Gio
vanni
Dai
´
an
Rottoli)
Evaluation Warning : The document was created with Spire.PDF for Python.