International Journal of Electrical and Computer Engineering (IJECE)
Vol. 10, No. 4, August 2020, pp. 3869~3882
ISSN: 2088-8708, DOI: 10.11591/ijece.v10i4.pp3869-3882
Journal homepage: http://ijece.iaescore.com/index.php/IJECE

Vertical intent prediction approach based on Doc2vec and convolutional neural networks for improving vertical selection in aggregated search

Sanae Achsas, El Habib Nfaoui
LIIAN Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohammed Ben Abdellah University, Morocco

Article Info

Article history:
Received Jul 18, 2019
Revised Jan 13, 2020
Accepted Feb 1, 2020

ABSTRACT
Vertical selection is the task of selecting the verticals most relevant to a given query in order to improve the diversity and quality of web search results. This task requires not only predicting relevant verticals; these verticals must also be the ones the user expects to be relevant to his particular information need. Most existing works focus on using traditional machine learning techniques to combine multiple types of features for selecting several relevant verticals. Although these techniques are very efficient, handling vertical selection with high accuracy is still a challenging research task. In this paper, we propose an approach for improving vertical selection in order to satisfy the user's vertical intent and reduce the user's browsing time and effort. First, it generates query embedding vectors using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, this vector is used as input to a convolutional neural network model that enriches the representation of the query with multiple levels of abstraction, including rich semantic information, and then creates a global summarization of the query features. We demonstrate the effectiveness of our approach through comprehensive experimentation on various datasets. Our experimental findings show that our system achieves significant accuracy. Further, it makes accurate predictions on new, unseen data.

Keywords:
Aggregated search
Convolutional neural network
Deep learning
Doc2vec
Information retrieval
Vertical intent
Vertical selection

Copyright © 2020 Institute of Advanced Engineering and Science. All rights reserved.

Corresponding Author:
Sanae Achsas,
Department of Computer Science, Faculty of Sciences Dhar El Mahraz,
Sidi Mohammed Ben Abdellah University,
Fez, Morocco.
Email: sanae.achsas@usmba.ac.ma

1. INTRODUCTION
One of the most significant online developments of the last few years is the rising popularity of aggregated search (AS) systems, which have become one of the most popular web search presentation paradigms used by major search engines. This technique consists of integrating search results from a variety of diverse verticals, such as news, images, videos, health, and Wikipedia, into a single interface alongside general web search results. As shown in Figure 1 [1], research in aggregated search has taken two main directions. The first direction studies methods for predicting which verticals to present, known as vertical selection (VS). The other direction involves techniques that analyse how to present these verticals in the web results, known as vertical presentation (VP). The research problem investigated in this paper concerns the first direction. The vertical selection task consists of selecting the subset of verticals most relevant to a given user information need; it improves search effectiveness while reducing the load of querying a large set of verticals. The main goal behind this task is to help the user satisfy his information needs.
Identifying the intent behind the user query is a crucial step toward reaching this goal. In a broad sense, the automatic prediction of user intentions helps enhance the user experience by returning more relevant results to users and adapting these results to their specific needs. Thus, vertical selection is associated with two main challenges: the diversity of the verticals and the understanding of the user intent.

Figure 1. Aggregated search process [1]

Regarding the first challenge, a variety of heterogeneous vertical search engines exist on the web, and each vertical has its own features. For example, some verticals, such as the weather vertical, are not directly searchable by users, so features generated from the vertical query log will not be available for this kind of vertical. Likewise, features generated from the vertical corpus will not be available for verticals such as the calculator and language translation [2]. Therefore, researchers must deal with the fact that different verticals may require different feature representations when creating new vertical selection approaches.
The second challenge is related to the user intent. A vertical selection system focuses on retrieving relevant verticals in order to show their results to the user. For example, if the user is searching for the images and news verticals, he specifically needs the results of these verticals; he is not interested in the content of other verticals such as shopping or weather, even if their content is relevant, because the goal is not only to return relevant results but also to satisfy the user intent.
Existing research has studied the problem of vertical selection in different ways [3]. Some prior work focused on constructing models that aim at detecting queries targeting a content-specific vertical such as shopping [4], news [5], jobs [4] or question answering [6]. Other works on vertical selection [7-10] focused on using traditional machine learning techniques to combine multiple types of features for selecting several relevant verticals. Although these techniques are very efficient, handling vertical selection with high accuracy is still a challenging research task.
In this paper, we are interested in textual queries; image queries tend to have image results. Our proposed approach for predicting vertical intent consists of generating query embedding vectors using the doc2vec algorithm, which can accurately preserve syntactic and semantic information within each query; we therefore propose to use it as the primary query representation in our vertical selection pipeline. This representation is then used as input to a convolutional neural network (CNN) model that enriches it with multiple levels of abstraction comprising rich semantic information and creates a global summarization of the query features. To the best of our knowledge, this is the first time that the benefits of deep learning and paragraph vectors are exploited in the context of vertical selection, which may enable substantial progress in this area.
The remainder of this paper is structured as follows. In the next section, we review the related work concerning vertical selection. In Section 3, we describe our proposed method. Section 4 is devoted to the experimental settings. We present and discuss the experimental results in Section 5. Finally, we conclude the study and discuss future work.

2. RELATED WORK
Aggregated search can be compared to federated search, which aims to provide integrated search across multiple text collections, referred to as resources, merged into one single ranked list [11]. Like aggregated search, federated search is typically decomposed into two sub-tasks: resource selection and results merging. The main differences between federated search and aggregated search are the heterogeneity of the data and the presentation of the results.
Regarding resource selection, existing approaches can be categorized into two classes of algorithms: term-based and sample-based. Term-based algorithms represent shards by collection statistics about the terms in the search engine's vocabulary. Callan et al. [12] propose the CORI algorithm, in which a shard is represented by the number of its documents that contain the terms of a vocabulary. The shards are ranked using the INDRI version of the tf.idf score function, using the mentioned number as the frequency of each query term. CORI selects a fixed number of shards from the top of this ranking. Another work [13] proposes Taily, a novel shard selection algorithm that models a query's score distribution in each shard as a Gamma distribution and selects shards with highly scored documents in the tail of the distribution. Taily estimates the parameters of the score distributions based on the mean and variance of the score function's features in the collections and shards. The algorithms of the second category use a central sample index (CSI) of documents from each shard for shard selection. For example, the authors of [14] proposed the ReDDE algorithm, which ranks shards according to the number of the highest-ranked documents that belong to each shard, weighted by the ratio between the shard's size and the size of the shard's sample in the CSI. The SUSHI algorithm [15] chooses the best-fitting function from a list of possible functions between the estimated ranks of a shard's documents in the CSI and their observed scores from the initial search. Using this function, the algorithm estimates the scores of the top-ranked documents of each shard. SUSHI selects shards based on their estimated number of documents among a number of top-ranked documents in the global ranking.
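
To make the sample-based idea concrete, the following is a minimal sketch of a ReDDE-style shard score as it is summarized above, not the exact algorithm of [14]. The function names, the `top_n` cut-off and all numbers are hypothetical and serve only as illustration.

```python
from collections import Counter

def redde_scores(csi_ranking, shard_sizes, sample_sizes, top_n=10):
    """Score shards from a ranked list of (doc_id, shard_id) pairs retrieved from the CSI."""
    # Count how many of the top-n sampled documents come from each shard.
    counts = Counter(shard for _, shard in csi_ranking[:top_n])
    # Each sampled document stands for (shard size / sample size) documents of its shard.
    return {
        shard: hits * shard_sizes[shard] / sample_sizes[shard]
        for shard, hits in counts.items()
    }

def select_shards(scores, k):
    """Return the k shards with the highest estimated number of relevant documents."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical CSI result list for one query (assumed data, not from the paper).
ranking = [("d1", "news"), ("d2", "images"), ("d3", "news"), ("d4", "video")]
shard_sizes = {"news": 1000, "images": 5000, "video": 200}
sample_sizes = {"news": 100, "images": 100, "video": 100}

scores = redde_scores(ranking, shard_sizes, sample_sizes, top_n=4)
print(select_shards(scores, k=2))
```

In this toy example the images shard wins although it contributes only one sampled hit, because each of its sampled documents represents more documents of the full shard.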
On the other hand, identifying the query intent of a user is a well-known problem in information retrieval and has been studied by a considerable number of researchers in this field; its goal is to select relevant documents or web search results that can satisfy the user's information need. To this end, most of the approaches cited in the literature employ query logs, supervised and semi-supervised classifiers, different language models, and word embeddings. Several popular works have examined query intent detection. In [16], the authors developed a new dataset with almost 2,000 queries tagged with informational, navigational and transactional categories. They calculated features for each of these queries using a real-world query log. Jiang and Yang [17] tackle the problem of query intent inference by integrating multiple information sources in a seamless manner. They first propose a comprehensive data model called Search Query Log Structure (SQLS) that represents the relationships between search queries via the User dimension, the URL dimension, the Session dimension, and the Term dimension. They then propose three new frameworks that effectively infer query intents by mining the multidimensional structure constructed from the search query log. Another work [18] explores the competence of lexical statistics and embedding methods. First, a novel term expansion algorithm is designed to sketch all possible intent candidates. Moreover, an efficient query intent generation model is proposed, which learns latent representations for intent candidates via embedding-based methods; the vectorized intent candidates are then clustered and detected as query intents. Kim et al. [19] used enriched word embeddings, which force semantically similar or dissimilar words to be closer or farther apart in the embedding space, to improve the performance of intent detection for spoken language understanding. They do so by exploiting several semantic lexicons, such as WordNet, PPDB (Paraphrase Database), and the Macmillan Dictionary, and then using the resulting embeddings as the initial representation of words for intent detection. They train an end-to-end model that jointly learns the slot and intent classes from the training data by building a bidirectional LSTM (Long Short-Term Memory). In [20], the proposed approach treats query intent identification as a multi-class classification task and extracts query vector representations using a CNN instead of engineering query features. This method uses deep learning to find query vector representations, then uses them as features to classify queries by intent.
As the approaches reviewed above show, the problem of query intent detection has been studied without considering the vertical intent issue. Yet, with the emergence of aggregated search, web search engines no longer return uniform lists of web pages; they also include results of different types from different verticals such as images, news, videos, and so on, which means that the key component of this domain is the notion of verticals. Therefore, an aggregated search system must predict which vertical(s) are most likely to answer the issued query, which reduces the load of querying a large set of verticals and in turn improves search effectiveness.
The relevance of a vertical depends basically on two factors: the relevance of the documents within the vertical collection, and the user's intent toward the vertical (vertical intent). For the first factor, to make vertical selection decisions, various approaches use traditional machine learning techniques to combine different sources of evidence, which can be found in query features (features that depend only on the query) [6-9], in vertical features (features that depend only on the vertical) [2, 5, 21, 22], and in vertical-query features (features that aim to measure relationships between the vertical and the query, and are therefore unique to the vertical-query pair) [2, 7, 22-24].
Among these works, some integrate the content from a single vertical [3]. In this respect, Li et al. [4] address the general problem of vertical selection using an approach that focuses on the shopping and job verticals and extracts implicit feedback using semi-supervised learning based on clickthrough data. Diaz [5] also investigated the vertical selection problem with respect to the news vertical; he derived features from the news collection, web and vertical query logs, and incorporated click feedback into the model. More recent work [6] has also targeted a variant of the vertical selection problem, in which the authors use Community Question Answering (CQA) verticals for detecting queries with CQA intent.
Other approaches have been developed in which several verticals are considered simultaneously [3]. Arguello et al. [7] propose a classification-based approach for vertical selection in which they exploit features from the vertical content, the query string, and the vertical's query log. The click-through data is used to construct a descriptive language model for each vertical's related queries. Diaz and Arguello [9] also present several algorithms for combining user feedback with offline classifier information; the focus of their work was to maximize user satisfaction by presenting the appropriate vertical display. In another work, Arguello et al. [8] aim to use training data associated with a set of existing verticals in order to learn a model that can make vertical selection predictions for a target vertical. More recently, an approach was proposed in [10] in which the user's desired vertical is placed at the top of the web result page. This is achieved by predicting verticals based on the user's past behavior.
Regarding the second factor, few works address the vertical intent issue when studying query intent detection. From the user intent perspective, user vertical intent also plays an important role in improving the aggregated search process. For example, Zhou et al. [25] propose a methodology to predict the vertical intent of a query using a search engine log by exploiting click-through data. Tsur et al. [6] present a supervised classification scheme in which they aim at detecting queries with question intent as a variant of the vertical selection problem. They introduced two classification schemes that consider query structure. In the first approach, they induce features from the query structure as input to a supervised linear classifier. In the second approach, word clusters and their positions in the query are used as input to a random forest classifier to identify discriminative structural elements in the query.
Despite its importance, research in this direction is still limited, and there is little literature on vertical selection based on user vertical intent, especially in view of the evolution of IR (Information Retrieval) in recent years. Therefore, this work focuses on the problem of vertical intent prediction: we propose a new approach that combines the doc2vec algorithm and convolutional neural networks and exploits, for the first time, the benefits of both techniques in order to improve the vertical selection task.

3. PROPOSED APPROACH
During the vertical selection process, the query is processed and sent to multiple verticals as well as to the web search engine in order to decide which of them should be selected for a given query. This depends on which verticals the user intends to retrieve. We refer to this as the user vertical intent (VI), which can be defined as follows:
Given a user's query $q$ and a set of candidate verticals $V = \{v_1, v_2, \ldots, v_n\}$, the vertical intent is represented by the vector $I = \{i_1, i_2, \ldots, i_n\}$, where each value $i_k$ indicates the importance of the vertical $v_k$ to the query $q$. For each vertical $v_k$, a threshold $\sigma$ is given above which the vertical is assumed to have a high intent for the query: if $i_k > \sigma$, then we can say that the vertical $v_k$ is intended by the query $q$.
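
The thresholding rule in this definition can be illustrated with a short sketch. The vertical names, scores and threshold value below are hypothetical; the text does not state which threshold is used.

```python
def intended_verticals(intent_scores, threshold):
    """Return the verticals whose intent score is strictly above the threshold."""
    return [v for v, score in intent_scores.items() if score > threshold]

# Hypothetical vertical intent vector for one query (assumed values).
scores = {"images": 0.72, "news": 0.10, "shopping": 0.35, "web": 0.55}
print(intended_verticals(scores, threshold=0.5))
```

With these assumed values, only the images and web verticals exceed the threshold and would be treated as intended by the query.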
To address this problem, we propose an approach based on Doc2vec and convolutional neural networks. First, it generates a query embedding vector using the doc2vec algorithm, which preserves syntactic and semantic information within each query. Second, this vector is used as input to a convolutional neural network model that enriches the representation of the query with multiple levels of abstraction, including rich semantic information, and then creates a global summarization of the query features. Figure 2 shows the overall architecture of our proposed vertical selection system. It contains two main parts: the semantic representation of the query and the query level feature extraction. The subsections below describe these two parts and all the steps performed in depth.

Figure 2. Architecture of the proposed vertical selection system

3.1. Semantic representation of the query
The core algorithm of this step is doc2vec, an unsupervised model that is mostly used to construct distributed representations of arbitrarily long sentences. It is an extension of word2vec that learns fixed-length feature representations for variable-length pieces of text such as sentences, paragraphs, and documents [26]. Doc2vec, or paragraph vectors, has two different architectures: the Distributed Bag-of-Words model and the Distributed Memory model. The Distributed Bag-of-Words (DBOW) model trains faster and does not consider word order; it predicts a random group of words in a paragraph based on the provided paragraph vector. In the Distributed Memory (DM) model, the paragraph is treated as an extra word, whose vector is then averaged with the local context word vectors to make predictions. This method requires additional computation but can achieve better results than DBOW.
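
The difference between the two architectures can be sketched in a few lines of plain Python. This is only a toy illustration of the prediction step under stated assumptions (untrained, hand-picked 2-dimensional vectors and a three-word vocabulary); it is not the training procedure of doc2vec, and all names and values are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def predict_word(hidden, output_weights):
    """Probability of each vocabulary word given the hidden vector."""
    logits = [sum(h * w for h, w in zip(hidden, row)) for row in output_weights]
    return softmax(logits)

# Toy vectors (assumed values, not trained).
paragraph_vec = [0.2, -0.4]
context_vecs = [[0.6, 0.0], [0.1, 0.7]]                  # word vectors in the window
output_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # one row per vocabulary word

# DM: the paragraph vector is averaged with the context word vectors.
dm_hidden = average([paragraph_vec] + context_vecs)
dm_probs = predict_word(dm_hidden, output_weights)

# DBOW: only the paragraph vector is used; context words and their order are ignored.
dbow_probs = predict_word(paragraph_vec, output_weights)
```

The extra averaging over context vectors is the source of the additional computation that DM needs compared with DBOW.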
We chose the Doc2vec model because it overcomes the disadvantages of bag-of-words models by learning semantic relationships between words. This is why it has recently been widely used in various NLP tasks [27-29] and information retrieval works [30-35], where it has proven able to capture the semantics of paragraphs, which leads to excellent results. In this step, the goal is to learn a good semantic representation of the input query. Following this idea, we represent each query as a vector and use this vector as the features for our classification model, as presented in Figure 2. The Doc2vec algorithm thus contributes effectively to improving the performance of our system.
3.2. Query level feature extraction
Unlike traditional approaches that require handcrafted features to predict relevant verticals, our approach uses a convolutional neural network to extract the most important semantic features that represent each query and to discard those that are unnecessary. Recently, CNNs have achieved promising performance in various NLP tasks, such as information extraction [36], summarization [37], machine translation [38], classification [39], question answering [40] and other traditional NLP tasks [41-43]. The CNN architecture used in this paper is shown in Figure 3.
- Input layer: Once the query embedding vectors have been generated in the previous step, we use them to construct the input matrix needed in the embedding layer, where each query is represented as a 2-dimensional matrix.
- Convolutional layer: The primary purpose of this layer is to capture the syntactic and semantic features of the entire query and compress these valuable semantics into feature maps. Thus, we perform several convolutions over the embedded query vectors using multiple filters with different window sizes. As each filter moves along the input, various features are produced and combined into a feature map. Activation functions are added to incorporate element-wise non-linearity.
- Pooling layer: In this layer, the goal is to extract the most relevant features within each feature map; we therefore use the max-pooling strategy for the pooling operation. Since there are multiple feature maps, we obtain a vector after each pooling operation. All vectors obtained from the max-pooling layer are concatenated into a fixed-length feature vector.
- Fully connected layers: These layers constitute the classification part of our model; their main purpose is to take the high-level features obtained from the previous layers and pass them to the final softmax layer, which classifies the input query into classes based on its vertical intent scores.
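
The four layers listed above can be traced end to end with a small forward-pass sketch in plain Python. It is a simplified illustration under stated assumptions: a hand-made 4 x 2 input matrix, two untrained filters with window sizes 2 and 3, ReLU as the activation function, and a single dense layer before the softmax. None of these values or sizes come from the paper.

```python
import math

def conv1d(matrix, kernel, bias=0.0):
    """Slide a (window x dim) kernel over the rows of matrix and apply ReLU."""
    window = len(kernel)
    feature_map = []
    for start in range(len(matrix) - window + 1):
        total = bias
        for i in range(window):
            total += sum(x * w for x, w in zip(matrix[start + i], kernel[i]))
        feature_map.append(max(0.0, total))  # element-wise non-linearity
    return feature_map

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(matrix, kernels, dense_weights):
    """Convolution, max-pooling, dense layer and softmax for one query."""
    # One feature map per filter; max-pooling keeps its strongest response,
    # and the pooled values are concatenated into a fixed-length vector.
    pooled = [max(conv1d(matrix, k)) for k in kernels]
    # Dense layer followed by softmax over the vertical classes.
    logits = [sum(p * w for p, w in zip(pooled, row)) for row in dense_weights]
    return pooled, softmax(logits)

# Assumed toy input and weights (not trained, not from the paper).
query_matrix = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, -1.0]]
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],              # window size 2
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],  # window size 3
]
dense_weights = [[1.0, 0.0], [0.0, 0.5], [-1.0, 0.0]]  # three vertical classes
pooled, probs = forward(query_matrix, kernels, dense_weights)
print(pooled, probs)
```

Because max-pooling reduces every feature map to one value, the length of the pooled vector depends only on the number of filters, not on the size of the input matrix.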
In addition to these layers, we performed some optimizations in order to reduce overfitting and obtain better test accuracy. These optimizations include applying an l2-norm constraint on the weight vectors in the convolutional and fully connected layers, as well as adding several Batch Normalization layers, which normalize the activations of the previous layer for each batch during training; this helps train our model faster and consequently improves its performance.
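
Both optimizations can be written down compactly. The sketch below assumes that the l2-norm constraint is applied as a rescaling of any weight vector whose norm exceeds a maximum value, and it omits the learnable scale and shift parameters of batch normalization; the maximum norm and the sample batch are hypothetical.

```python
import math

def clip_l2_norm(weights, max_norm):
    """Rescale a weight vector so that its l2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(w * w for w in weights))
    if norm <= max_norm:
        return list(weights)
    return [w * max_norm / norm for w in weights]

def batch_norm(batch, eps=1e-5):
    """Normalize each feature (column) of a batch to zero mean and unit variance."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(c) / n for c in columns]
    variances = [sum((x - m) ** 2 for x in c) / n for c, m in zip(columns, means)]
    return [
        [(x - m) / math.sqrt(v + eps) for x, m, v in zip(row, means, variances)]
        for row in batch
    ]

# Assumed example values.
clipped = clip_l2_norm([3.0, 4.0], max_norm=1.0)      # norm 5 is rescaled to norm 1
normalized = batch_norm([[1.0, 10.0], [3.0, 30.0]])   # two samples, two features
print(clipped, normalized)
```

After normalization, the two features of the sample batch end up on the same scale even though the raw values differ by a factor of ten.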

Figure 3. Illustration of the CNN model architecture

4. EXPERIMENTS
This section describes the datasets used and the hyperparameters chosen for evaluating the performance of our system. It also gives details about how doc2vec and the CNN were trained.
4.1. Datasets
In these experiments, we employ three public datasets for training, validating and testing our proposed system. Summary statistics of these datasets are listed in Table 1. We describe each dataset in detail below:
For the first dataset, we use the official NTCIR-12 IMine-2 vertical intent collection for English subtopics [44], which was designed to explore and evaluate technologies for understanding the user intents behind a query. IMine-2 includes a set of 100 topics (i.e., queries). Each topic is labeled with a set of intents with probabilities, and there is a set of subtopics serving as relevance judgments for each vertical intent. A subtopic of a given query is viewed as a search intent that specializes and/or disambiguates the original query; there are 533 such subtopics. The general idea is to first generate a more complete representation of the different possible intents associated with the input query, and then to perform vertical selection for each intent separately.
In addition, this dataset includes five types of queries, namely “ambiguous”, “faceted”, “very clear”, “task-oriented”, and “vertical-oriented”. This allows us to investigate the performance of our system with diverse queries and varied topics. The details of the five query types are as follows [44]:
a. Ambiguous: The concepts/objects behind the query are ambiguous (e.g., “Jaguar” -> car, animal, etc.).
b. Faceted: The information needs behind the query include many facets or aspects (e.g., “harry potter” -> movie, book, Wikipedia, etc.).
c. Very clear: The information need behind the query is very clear, so that usually a single relevant document can satisfy it (e.g., “apple.com homepage”).
d. Task-oriented: The search intent behind the query relates to the searcher’s goal (e.g., “lose weight” -> exercise, healthy food, medicine, etc.).
e. Vertical-oriented: The search intent behind the query strongly indicates a specific vertical (e.g., “iPhone photo” -> Image vertical).
We use this dataset as training data, from which we construct a validation set by randomly selecting 10% of the queries.
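
A random 90/10 hold-out of this kind can be sketched as follows. The seed, the rounding rule and the placeholder query identifiers are assumptions made for illustration; the paper does not describe how the random selection was implemented.

```python
import random

def train_validation_split(items, validation_fraction=0.1, seed=0):
    """Shuffle a copy of items and hold out a fraction of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

# Placeholder identifiers standing in for the 533 queries of the training dataset.
queries = [f"query-{i}" for i in range(533)]
train, val = train_validation_split(queries)
print(len(train), len(val))
```

Fixing the seed makes the split reproducible, so the same validation queries are held out on every run.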
The second dataset used in this paper is FedWeb’14 [45], which was used in the TREC FedWeb track 2014. The collection contains search result pages from 108 web search engines (e.g., Google, Yahoo!, YouTube and Wikipedia). For each engine, 75 test topics were provided, of which 50 are used for vertical selection evaluation. We run the prediction task on these 50 test queries and compare the results obtained with the ground-truth values in the dataset.

The last dataset is the TREC 2009 Million Query Track [46]. This collection contains 40000 queries that were sampled from two large query logs. To help anonymize and select queries with relatively high volume, they were processed by a filter that converted them into queries with roughly equal frequency in a third query log. This dataset is used as training data for the doc2vec algorithm to improve its performance with large data.

Moreover, the queries used in these experiments have different lengths, from short to long, as shown in Figures 4, 5 and 6. The structure of our model allows us to learn from both kinds.

Table 1. Statistical information of various datasets
| Dataset | Description | Size (number of queries) | Max Query Length (number of words) | Task |
|---|---|---|---|---|
| NTCIR-12 IMINE-2 English Subtopic Mining [44] | It contains the Query understanding subtask and the Vertical Incorporating subtask. | 533 | 12 | 90% Training, 10% Validation |
| FedWeb’14 [45] | It is designed for resource selection, results merging and vertical selection tasks. | 50 | 9 | Testing |
| TREC Million Query Track 2009 [46] | It is an exploration of ad hoc retrieval over a large set of queries and a large collection of documents, and it investigates questions of system evaluation. | 40000 | 16 | doc2vec's training data |

Figure 4. Query length for NTCIR-12 IMine-2

Figure 5. Query length for FedWeb’14

Figure 6. Query length for TREC million query track 2009

4.2. Hyperparameters and training
To train and evaluate the proposed system, we first need to set several hyperparameters. For the first part of our architecture, we trained a doc2vec model (Python gensim library implementation) on the TREC 2009 Million Query Track dataset using the Distributed Memory model. We then transformed all queries in both the training and testing sets into Doc2Vec vectors. The parameters used to train the doc2vec model are shown in Table 2.
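The doc2vec training step can be sketched with gensim as follows. This is a hedged illustration rather than the authors' script: the parameter values are the ones listed in Table 2, `dm=1` is gensim's switch for the Distributed Memory mode, and the whitespace tokenizer and the helper names `tokenize`, `train_doc2vec` and `embed_queries` are assumptions introduced here.

```python
# Values copied from Table 2; dm=1 selects gensim's Distributed Memory mode.
DOC2VEC_PARAMS = {
    "vector_size": 300,
    "alpha": 0.025,
    "min_alpha": 0.0025,
    "min_count": 1,
    "epochs": 100,
    "dm": 1,
}


def tokenize(query):
    """Assumed preprocessing: lowercase and split on whitespace."""
    return query.lower().split()


def train_doc2vec(queries, params=DOC2VEC_PARAMS):
    """Train a Doc2Vec model on an iterable of raw query strings."""
    # Imported lazily so the parameter block can be read without gensim installed.
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    corpus = [TaggedDocument(words=tokenize(q), tags=[i])
              for i, q in enumerate(queries)]
    model = Doc2Vec(**params)
    model.build_vocab(corpus)
    model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)
    return model


def embed_queries(model, queries):
    """Infer one fixed-size vector per query with the trained model."""
    return [model.infer_vector(tokenize(q)) for q in queries]
```

Under these assumptions, `train_doc2vec` would be called once on the Million Query Track queries, and `embed_queries` would then be applied to the training, validation and test queries.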

In the second part, our network is implemented with the Keras framework on a TensorFlow backend. For the input layer, we built the input data from Gensim's doc2vec embeddings instead of using a Keras embedding layer. Building the CNN architecture involves choosing among many hyperparameters. We therefore first consider the performance of a baseline CNN configuration with the parameters described in Table 3.
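One way to express the baseline configuration of Table 3 in Keras is sketched below. Several details are assumptions, because this section does not specify them: each 300-dimensional doc2vec vector is reshaped to `(300, 1)` so that 1D convolutions can slide over it, the filter region sizes "[3-5]" are read as 3, 4 and 5, the l2 term is applied as a kernel regularizer, the output layer is a softmax over a hypothetical `num_verticals` classes, and `build_cnn` is a name introduced for illustration.

```python
# Baseline values from Table 3.
BASELINE = {
    "filter_sizes": (3, 4, 5),
    "feature_maps": 100,
    "l2": 0.01,
    "dropout": 0.5,
    "batch_size": 64,
    "optimizer": "adam",
    "loss": "mean_squared_error",
    "epochs": 100,
}


def build_cnn(num_verticals, vector_size=300, cfg=BASELINE):
    """Build a multi-branch 1D CNN over a doc2vec query vector."""
    # Imported lazily so the configuration can be read without TensorFlow installed.
    from tensorflow.keras import Model, layers, regularizers

    inputs = layers.Input(shape=(vector_size, 1))
    pooled = []
    for size in cfg["filter_sizes"]:
        # One convolution branch per filter region size, with ReLU activation.
        conv = layers.Conv1D(
            cfg["feature_maps"], size, activation="relu",
            kernel_regularizer=regularizers.l2(cfg["l2"]),
        )(inputs)
        pooled.append(layers.GlobalMaxPooling1D()(conv))  # max pooling
    merged = layers.Concatenate()(pooled)
    merged = layers.Dropout(cfg["dropout"])(merged)
    # Assumed output layer: one unit per vertical.
    outputs = layers.Dense(num_verticals, activation="softmax")(merged)
    model = Model(inputs, outputs)
    model.compile(optimizer=cfg["optimizer"], loss=cfg["loss"],
                  metrics=["accuracy"])
    return model
```

Training would then call `model.fit` with `batch_size=BASELINE["batch_size"]` and `epochs=BASELINE["epochs"]` on the reshaped query vectors.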

Table 2. Different parameters used to train Doc2Vec model
| Parameters | Value |
|---|---|
| alpha | 0.025 |
| min_alpha | 0.0025 |
| min_count | 1 |
| Vector size | 300 |
| Number of epochs | 100 |
| Library used | Gensim |

Table 3. Parameters of the baseline CNN configuration
| Parameter | Value |
|---|---|
| Filter region sizes | [3-5] |
| Feature maps | 100 |
| l2 regularization | 0.01 |
| Dropout rate | 0.5 |
| Batch size | 64 |
| Optimizer | adam |
| Activation function | ReLU |
| Pooling | max pooling |
| Loss function | Mean squared error |
| Number of epochs | 100 |

We then evaluated the effect of each parameter by holding all other settings constant and varying only the factor of interest. Throughout these experiments, we used the ReLU activation function, the max-pooling strategy, and the mean squared error loss function for our CNN model; these choices are common in CNN architectures and give good performance. Finally, we combined the best-performing settings from these experiments and used them for our suggested CNN model.
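The one-factor-at-a-time procedure can be sketched as a generator of configurations. The baseline values follow Table 3 and the candidate values are those reported in the results tables of Section 5.2; the dictionary layout, the reading of filter ranges such as "[2-4]" as (2, 3, 4), and the name `one_factor_variants` are assumptions made for illustration.

```python
BASELINE = {
    "filter_sizes": (3, 4, 5),
    "feature_maps": 100,
    "l2": 0.01,
    "dropout": 0.5,
    "batch_size": 64,
    "optimizer": "adam",
}

# Candidate values explored for each factor in Section 5.2.
SWEEPS = {
    "filter_sizes": [(2, 3, 4), (3, 4, 5), (4, 5, 6)],
    "feature_maps": [50, 100, 130, 150, 200],
    "l2": [0, 0.01, 0.1, 0.25, 0.4],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
    "batch_size": [30, 60, 120, 240],
    "optimizer": ["adam", "adadelta", "sgd"],
}


def one_factor_variants(baseline, sweeps):
    """Yield (factor, config) pairs that change one factor at a time."""
    for name, values in sweeps.items():
        for value in values:
            yield name, {**baseline, name: value}


variants = list(one_factor_variants(BASELINE, SWEEPS))
print(len(variants))  # number of training runs implied by the sweep
```

Each yielded configuration differs from the baseline in at most one entry, so any change in accuracy can be attributed to that factor alone.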

5. RESULTS AND DISCUSSION
In this section, we present and discuss the experimental results obtained by each part of our vertical selection system.

5.1. Semantic representation of the query using Doc2vec
Starting with the first part of our proposed system, once the doc2vec model is trained, we use it to generate an embedding vector for each query in our collections (training set, validation set and test set). These embedding vectors must capture the semantic meaning of each query. To verify that the model achieves this goal, we used it to find the most similar queries for two sample queries in our dataset; the resulting queries and their scores are presented in Table 4. As the table shows, the similarity scores are high for closely related pairs (e.g., 0.998 for “bananas seeds plant” and “bananas seed inside”) and low for unrelated pairs (e.g., 0.146 for “Virginia tourism” and “Iraq war pictures”). We conclude that this doc2vec model is meaningful and can successfully recognize semantic information among queries.
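The similarity check can be illustrated with plain cosine similarity, which is the measure gensim's similarity queries report; we assume here that the scores in Table 4 were obtained that way. The three-dimensional toy vectors and the helper names below are hypothetical stand-ins for real 300-dimensional query embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_u * norm_v)


def most_similar(target, candidates, top_n=3):
    """Rank candidate vectors by cosine similarity to `target`."""
    scored = [(name, cosine_similarity(target, vec))
              for name, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy embeddings; real inputs would be doc2vec vectors of the queries.
target = [0.9, 0.1, 0.0]
candidates = {
    "near": [0.8, 0.2, 0.0],
    "orthogonal": [0.0, 0.0, 1.0],
    "opposite": [-0.9, -0.1, 0.0],
}
ranked = most_similar(target, candidates)
print(ranked)
```

Scores close to 1 indicate vectors pointing in nearly the same direction, which is how the high-scoring pairs in Table 4 would be interpreted.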

Table 4. Similarity score for two sample queries from our collections
| First query | Second query | Their similarity score |
|---|---|---|
| Mother’s Day gifts shopping site | Wallpaper computer | 0.24370566066263763 |
| Make resume online free | Make resume free templates | 0.9092499447309569 |
| Apple latest news products | home cleaning products | 0.8546902948569026 |
| bananas seeds plant | bananas seed inside | 0.9979968070983887 |
| Virginia tourism | Iraq war pictures | 0.1463257649642163 |
| wallpaper magazine | Asian culture | 0.29107128845148067 |

5.2. Query level feature extraction using CNN
Having evaluated the quality of the query vectors obtained in the previous part, we now turn to the evaluation of our model architecture. We start by assessing the impact of the various parameters used to configure our CNN model in order to obtain the best possible results.

5.2.1. Impact of filter sizes
We explored the effect of various filter sizes while keeping the number of filters for each region size fixed at 100. Figure 7 shows that the curve for filter sizes [2-4] stayed above all the other curves throughout the run, and it yielded better accuracy (69.94%) than filter sizes [3-5] and [4-6] (66.39% and 67.22%, respectively), as reported in Table 5.

Figure 7. Accuracy comparison of filter size variations

Table 5. Accuracy comparison of filter size variations
| Filter sizes | [2-4] | [3-5] | [4-6] |
|---|---|---|---|
| Accuracy | 69.94% | 66.39% | 67.22% |

5.2.2. Impact of feature maps
Varying the number of filters per filter size does not help much, as shown in Figure 8, although accuracy is noticeably higher when the number of filters is 200 (68.48%), as shown in Table 6.

5.2.3. Impact of regularization
We used two common regularization strategies for our CNN: dropout and the l2 norm constraint. We explore the effect of both here. The effect of the l2 norm imposed on the weight vectors is presented in Figure 9 and Table 7. We then varied the dropout rate from 0.1 to 0.5, as shown in Figure 10 and Table 8, fixing the l2 norm constraint at 0.01. For the l2 norm constraint, classification performance is highest with a value of 0 (75.37%); higher values often hurt performance, as shown in Figure 9 and Table 7. Separately, we considered the effect of the dropout rate. Figure 10 indicates that a dropout rate of 0.2 gives the best accuracy (69.73%), and accuracy decreases as the dropout rate increases beyond that, as reported in Table 8.

5.2.4. Impact of batch size
We next investigated the effect of the batch size. Figure 11 and Table 9 show that varying the batch size also helps little; the best accuracy obtained was 67.01%, with a batch size of 60.

Figure 8. Impact of the number of filters per filter size

Figure 9. Accuracy comparison with various variations of l2 norm constraint

Table 6. Impact of the number of filters per filter size
| Number of filters | Accuracy (%) |
|---|---|
| N=50 | 66.39% |
| N=100 | 66.81% |
| N=130 | 65.97% |
| N=150 | 66.60% |
| N=200 | 68.48% |

Table 7. Accuracy comparison with various variations of l2 norm constraint
| l2 norm constraint value | Accuracy |
|---|---|
| 0 | 75.37% |
| 0.01 | 65.97% |
| 0.1 | 65.14% |
| 0.25 | 65.34% |
| 0.4 | 64.93% |

Figure 10. Accuracy comparison with various variations of dropout rate

Figure 11. Accuracy comparison with various batch sizes

Table 8. Accuracy comparison with various variations of dropout rate
| Dropout rate | Accuracy |
|---|---|
| 0.1 | 67.64% |
| 0.2 | 69.73% |
| 0.3 | 67.43% |
| 0.4 | 65.97% |
| 0.5 | 65.34% |

Table 9. Accuracy comparison with various batch sizes
| Batch sizes | Accuracy |
|---|---|
| 30 | 65.34% |
| 60 | 67.01% |
| 120 | 64.72% |
| 240 | 66.64% |

5.2.5. Impact of optimizers
Regarding the optimizers, Figure 12 clearly shows that only the curves of the “adam” and “adadelta” optimizers were at the top. They thus produce the best accuracy results (66.81% and 65.97%) compared to “sgd” (56.16%), as shown in Table 10.
Next, we used the observations above to configure our CNN model with the best-performing parameter values. We summarize the suggested parameters in Table 11.
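Selecting the best value of each factor from the reported sweeps can be sketched as below. The accuracies are copied from Tables 5 to 9; the optimizer sweep is left out because Table 10 is not reproduced in this section, and the dictionary layout is an assumption made for illustration. The sketch only picks the per-factor maximum and does not claim to reproduce Table 11.

```python
# Accuracy (%) reported for each candidate value in Tables 5 to 9.
REPORTED = {
    "filter_sizes": {"[2-4]": 69.94, "[3-5]": 66.39, "[4-6]": 67.22},
    "feature_maps": {50: 66.39, 100: 66.81, 130: 65.97, 150: 66.60, 200: 68.48},
    "l2": {0: 75.37, 0.01: 65.97, 0.1: 65.14, 0.25: 65.34, 0.4: 64.93},
    "dropout": {0.1: 67.64, 0.2: 69.73, 0.3: 67.43, 0.4: 65.97, 0.5: 65.34},
    "batch_size": {30: 65.34, 60: 67.01, 120: 64.72, 240: 66.64},
}

# For each factor, keep the candidate value with the highest accuracy.
best = {name: max(scores, key=scores.get) for name, scores in REPORTED.items()}
print(best)
```

Combining per-factor winners in this way assumes that the factors do not interact strongly, which a one-factor-at-a-time sweep cannot verify.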