Indonesian Journal of Electrical Engineering and Computer Science
Vol. 2, No. 3, June 2016, pp. 684 ~ 694
DOI: 10.11591/ijeecs.v2.i3.pp684-694
Received March 7, 2016; Revised May 3, 2016; Accepted May 19, 2016
Analysis of Reinforcement Based Adaptive Routing in MANET
Rahul Desai*1, B P Patil2
1Sinhgad College of Engineering, Army Institute of Technology, Pune, Maharashtra, India
2E & TC Department, Army Institute of Technology, Pune, Maharashtra, India
*Corresponding author, e-mail: desaimrahul@yahoo.com1, bp_patil@rediffmail.com2
Abstract
This paper describes and evaluates the performance of various reinforcement learning algorithms against the shortest path algorithms that are widely used for routing packets through a network. Shortest path routing is the simplest policy, routing packets along the path with the minimum number of hops. Under high traffic or high mobility conditions, the shortest path gets flooded with a huge number of packets and congestion occurs, so the nominal shortest path no longer delivers packets fastest and the delay for packets to reach the destination increases. Reinforcement learning algorithms are adaptive algorithms in which the path is selected based on the traffic present on the network in real time; they therefore aim at the least delivery time for packets to reach the destination. Analysis carried out on a 6 by 6 irregular grid and on a sample ad hoc network shows that the performance parameters used for judging the network, packet delivery ratio and delay, give optimum results with reinforcement learning algorithms.
Keywords: Ad Hoc Network, AODV, AOMDV, DSDV, DSR, CQ Routing, DRQ Routing, Q Routing
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Information is transmitted in a network in the form of packets. Routing is the process of transmitting these packets from one network to another. While the packets travel from the source to the destination, a number of intermediate hops come into the picture. Various performance parameters are used to judge the quality of routing, such as delay, packet delivery ratio, control overhead, throughput and jitter. Two of the most important parameters used for judging this quality are delay and packet delivery ratio.
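For concreteness, these two metrics can be computed from per-packet send and receive records as in the minimal sketch below; the record fields and values are illustrative assumptions, not the trace format used in the experiments.

```python
def packet_delivery_ratio(sent, received):
    """PDR = packets received at destinations / packets sent by sources."""
    return len(received) / len(sent) if sent else 0.0

def average_delay(received):
    """Mean end-to-end delay over successfully delivered packets."""
    delays = [r["recv_time"] - r["send_time"] for r in received]
    return sum(delays) / len(delays) if delays else 0.0

# Hypothetical per-packet records:
sent = [{"id": 1, "send_time": 0.10}, {"id": 2, "send_time": 0.20}]
received = [{"id": 1, "send_time": 0.10, "recv_time": 0.35}]
print(packet_delivery_ratio(sent, received))  # 0.5
print(average_delay(received))                # 0.25 s
```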
Different routing algorithms, such as shortest path routing and Bellman-Ford algorithms, are used. The simplest and most effective policy is shortest path routing, in which the path with the minimum number of hops is selected to deliver the packet from the source to the destination. In shortest path routing, a cost table and a neighbor table are maintained at each node to store the appropriate information, and these tables are exchanged frequently for adaptation.
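To make the table-driven idea concrete, the sketch below shows one distance-vector (Bellman-Ford style) relaxation step in which a node refreshes its hop-count cost table from the tables advertised by its neighbors; the topology and function names are assumptions made for illustration and are not taken from the paper.

```python
INF = float("inf")

def update_cost_table(own, neighbor_tables):
    """One distance-vector relaxation: cost to each destination d becomes
    1 + the minimum cost to d advertised by any neighbor (hop count metric)."""
    new_table = dict(own)
    for nbr, nbr_table in neighbor_tables.items():
        for dest, nbr_cost in nbr_table.items():
            candidate = 1 + nbr_cost          # one hop to the neighbor, then its route
            if candidate < new_table.get(dest, INF):
                new_table[dest] = candidate
    return new_table

# Hypothetical example: node A hears the tables of its neighbors B and C.
table_A = {"A": 0}
tables = {"B": {"B": 0, "D": 2}, "C": {"C": 0, "D": 1}}
print(update_cost_table(table_A, tables))  # {'A': 0, 'B': 1, 'D': 2, 'C': 1}
```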
The shortest path routing policy is effective when the number of nodes and the traffic present on the network are small. This policy is not always good, however, because some intermediate nodes in the network always get flooded with a huge number of packets. Such routes are referred to as popular routes.
In such cases, it is better to select an alternate path for transmitting the packets. This path may not be the shortest in terms of number of hops, but it results in a lower delivery time for the packets to reach the destination because there is less traffic on those routes.
Such routes are selected dynamically, in real time, based on the actual traffic present on the network. Hence, when more traffic is present on some popular routes, unpopular routes must be selected for delivering the packets. This is the main motivation for designing and implementing various adaptive routing algorithms on a network.
Learning such an effective policy for deciding routes online is a major challenge, as the decision of selecting routes must be taken in real time and packets must be diverted onto unpopular routes. The main goal is to optimize the delivery time for packets to reach the destination while preventing the network from going into congestion. There is no training signal available for deciding the optimum policy at run time; instead, decisions must be taken while packets are being routed and are reaching the destination over the popular routes.
Ad hoc networks are infrastructure-less networks consisting of mobile nodes that move randomly. Figure 1 shows an ad hoc network where multiple hops are used to deliver the packets to the destination. Routing protocols for an ad hoc network are generally classified into two types: proactive and on-demand. Proactive protocols are table-driven routing protocols that attempt to maintain consistent, up-to-date routing information from each node to every other node in the network. These protocols require each node to maintain one or more tables to store routing information, and they respond to changes in network topology by exchanging updates throughout the network.
Figure 1. Example of Mobile Ad Hoc Network
DSDV is one of the proactive routing protocols developed for ad hoc networks. It is based on the distance vector routing protocol and uses destination sequence numbers to avoid the count-to-infinity problem. The second type of routing protocol for an ad hoc network is the on-demand routing protocols, which are also known as reactive routing protocols. These routing protocols maintain routes only when required. The Dynamic Source Routing protocol (DSR) is one of the on-demand routing protocols and is characterized by the use of source routing: the sender knows the complete hop-by-hop route to the destination, and these routes are stored in a route cache. Ad Hoc On-Demand Distance Vector routing (AODV) [1, 2] is also an on-demand routing protocol. AODV uses traditional routing tables with one entry per destination, in contrast to DSR, which maintains multiple route cache entries for each destination. Being a single-path protocol, AODV has to invoke a new route discovery process whenever this single path fails. To overcome this limitation, a multipath extension to AODV called Ad Hoc On-Demand Multipath Distance Vector (AOMDV) is used.
The results obtained from the analysis of various proactive and on-demand routing protocols [3, 4] show that, for low loads and low mobility, proactive protocols such as DSDV and OLSR give better results; end-to-end delay is minimum for the DSDV and OLSR protocols. On-demand routing protocols such as AODV, DSR and AOMDV are more effective under high traffic diversity as well as high mobility. Packet delivery ratio is highest for the AODV and DSR protocols. AOMDV produces results similar to those of AODV but with larger overheads. In the AODV protocol, congestion occurs on the selected shortest path and eventually it starts dropping packets. Thus AODV and AOMDV give good performance at low loads, but in high mobility and heavy load situations both of them fail to work.
2. Literature Survey of Various Reinforcement Learning Algorithms – Problems and Proposed Solution
Reinforcement learning is learning in which the mapping from situations to actions is learned so as to maximize a numerical reward signal [5-6]. Here, reinforcement learning is used for dynamically selecting a path in a crowded network environment. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases,
actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning.
Reinforcement learning is defined not by characterizing learning methods but by characterizing a learning problem. The basic idea is simply to capture the most important aspects of the real problem facing a learning agent interacting with its environment to achieve a goal. Such an agent must be able to sense the state of the environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment. The formulation is intended to include just these three aspects, sensation, action and goal, in their simplest possible forms without trivializing any of them.
One of the challenges that arise in reinforcement learning is the trade-off between exploration and exploitation. To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it already knows in order to obtain reward, but it also has to explore in order to make better action selections in the future. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward [7].
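A common way to balance this trade-off is an epsilon-greedy rule: with a small probability the agent explores a random action, and otherwise it exploits the action with the best current estimate. The sketch below is illustrative only, since the paper does not state which exploration rule its routing agents use.

```python
import random

def epsilon_greedy(estimates, epsilon=0.1):
    """estimates: dict mapping action -> current value estimate.
    For routing, a lower estimated delivery time is better."""
    if random.random() < epsilon:
        return random.choice(list(estimates))   # explore
    return min(estimates, key=estimates.get)    # exploit (least estimated delay)

# Hypothetical next-hop estimates (seconds to destination):
print(epsilon_greedy({"via_B": 0.42, "via_C": 0.35, "via_D": 0.51}))
```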
There are two approaches to learning a controller for a given task. In the model-based approach, the learning agent must first learn a model of the environment and use this knowledge to learn an effective control policy for the task, while in the model-free approach a controller is learned directly from the actual outcomes [7-8]. Reinforcement learning is an example of the model-based approach.
Q Routing [9-11] is a reinforcement learning based routing algorithm. It is based on the Q-learning principle [12-13] in order to learn the optimal path to the destination. Each node in the network has a reinforcement learning module that dynamically determines the optimum path to the destination. To implement regular adaptive routing there would be a need for a training signal to evaluate or improve the routing policy, and such a signal cannot be generated until the packet reaches the final destination. Using reinforcement learning, however, the updates can be made more quickly and using only local information. Figure 2 illustrates the basic process of reinforcement learning. Let Qx(y, d) be the time that node x estimates it takes to deliver a packet P bound for node d by way of x's neighbor node y, including any time that P would have to spend in node x's queue. Upon sending P to y, x immediately gets back y's estimate of the time remaining in the trip [10-11].
Figure 2. Reinforcement Learning
In Q Routing, each node maintains Q values for each of its possible next hops. These Q values represent the estimated delivery time for packets to reach the destination. For every packet, the node chooses the next hop with the least estimated time to reach the destination. An update is also sent to the previous node regarding the present Q value. In order to keep the Q values as close to the actual values as possible, and to reflect changes in the state of the network, the Q value estimates need to
be updated with the minimum possible overhead [10-12]. Figure 3 shows Q routing forward exploration. As soon as node X sends a packet P(S, D) destined for node D to one of its neighboring nodes Y, node Y sends back to node X its best estimate Qy(Z, D) for the destination D. This value essentially estimates the remaining time in the journey of packet P(S, D). Upon receiving Qy(Z, D), node X computes the new estimate. The exploration involved in updating the Q value of the sending node X using the information obtained from the receiving node Y is referred to as forward exploration. With every hop of the packet P(S, D), one Q value is updated [11].
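A minimal sketch of this forward-exploration update is given below. It assumes the usual Q-routing form in which the new estimate moves toward (time spent in x's queue + transmission time to y + y's best remaining estimate) with a learning rate eta; the parameter values are assumptions, as the paper does not list them.

```python
def forward_update(Q, x, y, d, queue_time, trans_time, eta=0.5):
    """Forward exploration: after x sends a packet for d to neighbor y,
    y returns its best remaining estimate min over z of Qy(z, d); x then
    nudges Qx(y, d) toward (queue_time + trans_time + that estimate)."""
    best_from_y = min(Q[y][z][d] for z in Q[y])   # y's estimate of the time left
    target = queue_time + trans_time + best_from_y
    Q[x][y][d] += eta * (target - Q[x][y][d])
    return Q[x][y][d]

# Hypothetical three-node example where x reaches d only via y:
Q = {"x": {"y": {"d": 2.0}}, "y": {"z": {"d": 1.0}}}
print(forward_update(Q, "x", "y", "d", queue_time=0.3, trans_time=0.2))  # 1.75
```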
Figure 3. Q Routing Forward Exploration
In another, optimized form, Confidence Based Q Routing, each Q value Qx(Y, D) in the network is associated with a measure of confidence Cx(Y, D), which is a real number between 0 and 1. A value of 1 means that there is full confidence in the corresponding Q value and that this Q value reflects the current state of the network (completely reliable); in other words, this Q value has recently been updated. A value of 0, on the other hand, means that the corresponding Q value is random and does not necessarily reflect anything about the current state of the network; in other words, this Q value has never been updated. All intermediate nodes transmit the confidence value along with the Q value, and the received confidence is used to update the confidence table. Figure 4 shows Confidence based Q routing forward exploration.
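One illustrative reading of how the confidence values can be used is sketched below: the effective learning rate is scaled by the confidence of the received estimate (and by how stale the local value is), and confidence decays for values that have not been refreshed. The constants and exact weighting are assumptions, not taken from the paper.

```python
def cq_update(Q, C, x, y, d, queue_time, trans_time, best_from_y, conf_from_y,
              base_eta=0.85):
    """Confidence-based update: learn faster when y's reply is trusted
    (high conf_from_y) or when our own value is stale (low Cx(y, d))."""
    eta = base_eta * max(conf_from_y, 1.0 - C[x][y][d])
    target = queue_time + trans_time + best_from_y
    Q[x][y][d] += eta * (target - Q[x][y][d])
    C[x][y][d] = conf_from_y        # a freshly updated value inherits y's confidence
    return Q[x][y][d]

def decay_confidence(C, rate=0.95):
    """Confidence of every Q value that was not updated this step decays toward 0."""
    for x in C:
        for y in C[x]:
            for d in C[x][y]:
                C[x][y][d] *= rate
```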
Figure 4. Confidence Q Routing
Dual Reinforcement Q Routing (DRQ) is a modified version of the Q-Routing algorithm in which learning occurs in both directions. Since the learning process occurs in both directions, the learning performance of the Q-Routing algorithm doubles. Instead of relying on the single reinforcement signal, an indirect reinforcement signal is extracted from the incoming information and is used to update the local decision maker. When a node X sends a packet to a neighboring node Y, some additional routing information can be sent along with the packet. This information can be used to update node Y's decisions in the direction opposite to the direction of the packet. This update adds backward exploration to Q-Routing. Figure 5 shows the forward and backward exploration involved in the Q learning process [14].
Figure 5. Forward and Backward Exploration
The Q-Routing algorithm makes use of forward exploration in updating the Q values in the network: the Q value of the sending node is updated based on the information coming from the receiving node. DRQ routing [14] makes use of backward exploration as well. When a node X sends a packet P(S, D) to one of its neighbors Y, the packet can carry along information about the Q values of node X. When node Y receives this packet, it can use this information to update its Q values pertaining to the neighbor X. Later, when node Y has to make a decision, it can use the updated Q values for X. Q value updates in backward exploration are more accurate than Q value updates in forward exploration. Figure 6 shows dual reinforcement Q routing, which involves backward exploration.
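A minimal sketch of the backward step is given below: the packet travelling from X to Y piggybacks X's best estimate for reaching the source S, and Y uses it to refresh its own Q value for reaching S via X. The function and field names are assumptions made for illustration.

```python
def backward_update(Q, y, x, s, trans_time, best_from_x_to_s, eta=0.5):
    """Backward exploration: when Y receives a packet P(S, D) from X that
    carries X's best estimate for reaching the source S, Y updates Qy(X, S).
    Combined with the forward update, every hop refreshes two Q values,
    which is why DRQ roughly doubles the learning of plain Q routing."""
    target = trans_time + best_from_x_to_s
    Q[y][x][s] += eta * (target - Q[y][x][s])
    return Q[y][x][s]
```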
Figure 6. Backward Exploration
3. Results and Discussion
The Network Simulator NS2 is used for experimentation; NS2 is a standard network simulator used for the analysis of wired and wireless networks. Two different experiments are performed to judge the quality of the reinforcement learning algorithms using different performance parameters. In the first experiment, a 6×6 irregular grid is used to test the performance of reinforcement learning under random traffic. The second experiment is performed on an ad hoc network consisting of 10 to 100 nodes with random node mobility and random traffic generated on the network.
In the first experiment, the network topology used is the 6×6 irregular grid shown in Figure 7. In this network there are two possible ways of routing packets between the left cluster (nodes 1 through 18) and the right cluster (nodes 19 through 36): the route including nodes 12 and 25 (R1) and the route including nodes 18 and 19 (R2). For every pair of source and destination nodes in different clusters, either of the two routes, R1 or R2, can be chosen. Figure 8 shows a simulation of the 6×6 irregular grid in NS2.
Figure 7. The 6×6 Irregular Grid
Figure 8. NS2 Simulation for the 6×6 Irregular Grid
Route R1 is always selected by shortest path routing. For low loads (1 packet/simulation time step), shortest path routing gives the best results; the average packet delivery time is much lower than with the reinforcement learning methods (Figure 9).
Figure 9. Average Packet Delivery Time vs. Simulation Time step for low loads
At medium load (2 packets/simulation time step, Figure 10) or high load (3 packets/simulation time step, Figure 11) imposed on the network, it is found that shortest path routing breaks down and the average packet delivery time grows linearly as the simulation time progresses. This is because the packet queues at the particular nodes 12 and 25 grow without bound, and a lot of queue waiting time is incurred by packets going through these nodes. In reinforcement routing, 1500 to 2000 simulation time steps are required to find the optimum paths, and thereafter the algorithms settle down to the most stable routing policy.
Figure 10. Average Packet Delivery Time vs. Simulation Time step for Medium loads
Figure 11. Average Packet Delivery Time vs. Simulation Time step for High loads
The second experiment is carried out on an ad hoc network, and the various reinforcement routing algorithms are compared with existing routing protocols such as AODV, DSR and AOMDV.
Table 1. Simulation Parameters Used for Analysis
Parameter                     Value
No. of nodes                  10 to 100 nodes
Mobility model                Random Way Point Mobility Model
Simulation time               200 s
Topology size                 800 m × 800 m
Routing protocols analyzed    DSR, AODV, AOMDV, Q Routing, Confidence Q Routing and Dual Reinforcement Routing
Mobility rate                 25 m/s to 125 m/s
Pause time                    0, 50, 100, 150 and 200 s
The various parameters used in the second experiment for the analysis of reinforcement learning are shown in Table 1. Packet delivery ratio is analyzed for a medium size network (50 nodes) with the mobility rate changing from 25 m/s to 125 m/s. It is observed that, as the mobility rate increases, routing protocols such as AOMDV and DSR start dropping packets, since it becomes difficult for them to adapt to the changes in the network in a short amount of time. For obtaining a consistent packet delivery ratio in a medium size network, reinforcement routing works better than the existing protocols. Figure 13 shows the performance of an ad hoc network while changing the number of nodes in a low load network.
Figure 12. Packet Delivery Ratio vs. Packet Rate
Figure 13. Packet Delivery Ratio vs. No. of Nodes
Figure 14. Delay vs. No. of Nodes
It is observed that, as we start increasing the number of nodes, the delay starts increasing (Figure 14). DSDV, as a proactive routing protocol, provides the minimum delay, but dual
reinforcement Q routing provides prominent results as compared with AODV, DSR and AOMDV.
Figure 15. Control Overhead vs. Packet Rate
As far as control overhead is concerned (Figure 15), it is observed that in proactive protocols such as DSDV the routing overhead is high. The AOMDV protocol, although an on-demand routing protocol, also has a high routing overhead, since it is an extension of the AODV protocol that finds the maximum number of paths between source and destination. Reinforcement routing generates a smaller and consistently equal number of control packets throughout the network.
4. Conclusion
In this paper, an analysis and comparison of reinforcement routing, confidence based reinforcement routing and dual reinforcement routing were presented. Confidence based reinforcement routing and dual reinforcement routing show prominent results as compared with shortest path routing for medium and high load conditions. At high loads, dual reinforcement Q routing performs more than twice as fast as Q-Routing. For an ad hoc network, congestion occurs in the shortest path selected by the AODV protocol and eventually it starts dropping packets; thus AODV and AOMDV give good performance at low loads, but in high mobility and heavy load situations both of them fail to work.
In dual reinforcement routing, since backward exploration is involved along with the confidence measure, less time is required for the Q values to settle down, and thus they more accurately predict the state of the network at run time. It is found that, even when the mobility rate changes at a high rate and the traffic is high, dual reinforcement routing obtains more accurate results compared with Q routing.
References
[1] G Vetrichelvi and G Mohankumar. "Performance Analysis of Load Minimization in AODV and FSR". International Journal of Information and Network Security (IJINS). 2012; 1(3): 152-157.
[2] Malek Al-Gabri, Chunlin LI and Layuan Li. "Improving ZigBee AODV Mesh Routing Algorithm Topology and Simulation Analysis". TELKOMNIKA Indonesian Journal of Electrical Engineering. 2014; 12(2): 1528-1535.
[3] R Desai and BP Patil. "Analysis of Routing Protocols for Ad Hoc Networks". Circuits, Systems, Communication and Information Technology Applications (CSCITA), 2014 International Conference on, Mumbai. doi: 10.1109/CSCITA.2014.6839244. 2014: 111-115.
[4] Rahul Desai, BP Patil. "Performance Analysis of Ad Hoc Routing Protocols". 3rd International Conference on Recent Trends in Engineering & Technology (ICRTET'2014). ISBN No.: 978-93-5107-220-1. 2014: 11-15.
[5] Richard S Sutton and Andrew G Barto. "Reinforcement Learning: An Introduction". A Bradford Book, The MIT Press, Cambridge, Massachusetts, London, England.