TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 11, November 2014, pp. 7946 ~ 7951
DOI: 10.11591/telkomnika.v12i11.6478
ISSN: 2302-4046
Received July 13, 2014; Revised September 23, 2014; Accepted October 7, 2014
Audio and Video Communication Software Design
Based on SIP
Shanshan Peng
School of Information Science and Engineering, Hunan International Economics University, Changsha, China, postcode: 410205
E-mail: matlab_bysj@126.com
Abstract
This paper presents a design that implements audio and video communication based on SIP on the Win2K platform. The oSIP and eXosip libraries are used to exchange the signaling, and jrtp is used to packetize the audio and video data. Finally, multithreading is adopted so that the software performs well.
Keywords: session initiation protocol (SIP), oSIP/eXosip, jrtp, audio and video communication, real-time transport protocol (RTP)
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
With the rapid development of Internet technology, multimedia communication technology has also advanced rapidly, and audio and video communication is its most interesting area. IP telephony (VoIP) is a technology for transmitting voice over an IP network. In 1995, VocalTec launched the first Internet-based phone system [1]. Since then, IP phones have gained popularity among people of all ages. Compared with the traditional PSTN network, VoIP is cheaper and easier to deploy. Many IP telephony standards have been developed in recent years, and call quality has risen to the point where it is comparable to that of the traditional PSTN network.
Most domestic telephone networks are based on the H.323 protocol. H.323 was proposed by ITU-T as an IP multimedia standard, but it was developed from telecommunications network signaling and protocols rather than designed specifically for IP telephony, so IP networks and applications built on it are highly complex and not easily extended. Compared with H.323, SIP has inherent advantages, and it is a natural fit for multimedia communication over IP networks. Genuine SIP-based VoIP software developed in China is still rare, but there are successful examples, such as CoolSIP, developed by the NGN laboratory at Tsinghua University [2]. SIP can be used to realize not only voice communication but also video communication over the network. This paper introduces the principle and implementation of a SIP-based audio and video softphone.
2. SIP Profile
The Session Initiation Protocol (SIP) is one of the core protocols of the next generation network (NGN). It was originally developed by the IETF MMUSIC (Multiparty Multimedia Session Control) working group, and a standard was proposed in 1996 to address signaling control for IP network communications. SIP is an application-layer signaling protocol used to create, modify, and terminate multimedia sessions [3]. Compared with H.323, SIP is simple and scalable, and it fits closely with existing Internet applications; as a result, SIP applications have developed much faster than H.323 applications in recent years. The starting point of SIP is that the IP telephony service network is built on the existing Internet architecture. SIP therefore follows a completely different design philosophy from H.323: it is a decentralized protocol that pushes the complexity of the network to the devices at the network edge. Consequently, compared with IP phones based on the H.323 recommendation, SIP requires relatively intelligent terminals. Where the user terminal is not intelligent, SIP can still be used as the call signaling protocol [4].
SIP [5-9] draws heavily on two widely used Internet protocols: the Hypertext Transfer Protocol (HTTP) used by web browsers, and the Simple Mail Transfer Protocol (SMTP) used for email. From HTTP, SIP borrows the client/server (C/S) design model and the use of Uniform Resource Locators (URLs) and Uniform Resource Identifiers (URIs). From SMTP, SIP borrows the plain-text encoding scheme and header style, and it reuses SMTP headers such as To, From, Date, and Subject.
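The text-based, header-oriented format described above can be illustrated with a minimal Python sketch. It is not taken from the paper's implementation (which uses oSIP and eXosip); the helper names `build_request` and `parse_message`, the addresses, and the tag and branch values are hypothetical and chosen only for illustration.

```python
def build_request(method, request_uri, headers, body=""):
    """Serialize a SIP request as plain text: start line, headers, blank line, body."""
    lines = [f"{method} {request_uri} SIP/2.0"]
    for name, value in headers:
        lines.append(f"{name}: {value}")
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_message(text):
    """Split a SIP message into its start line, a header dict, and the body."""
    head, _, body = text.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        # Split on the first colon only; header values such as SIP URIs contain colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, body


# Hypothetical addresses and identifiers, for illustration only.
invite = build_request(
    "INVITE",
    "sip:bob@example.com",
    [
        ("Via", "SIP/2.0/UDP pc1.example.com;branch=z9hG4bK776asdhds"),
        ("Max-Forwards", "70"),
        ("To", "Bob <sip:bob@example.com>"),
        ("From", "Alice <sip:alice@example.com>;tag=1928301774"),
        ("Call-ID", "a84b4c76e66710@pc1.example.com"),
        ("CSeq", "314159 INVITE"),
        ("Contact", "<sip:alice@pc1.example.com>"),
    ],
)
start_line, headers, body = parse_message(invite)
print(start_line)
print(headers["To"])
```

The request line resembles an HTTP request line, while the To and From headers follow the SMTP style, which is the borrowing the paragraph above describes.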
3. Design Program
SIP call setup relies mainly on cooperation among various entities. A SIP entity acting as a client generates a request and sends it to the receiving SIP entity, the server. The server processes the request and returns one or more response messages to the client. A request and its corresponding responses constitute a transaction. In SIP, the communicating components fall into two categories: user agents and network servers. A user agent (UA) is an intelligent terminal system that takes part in calls on behalf of a user. It consists of two parts: the User Agent Client (UAC), which initiates call requests, and the User Agent Server (UAS), usually at the called destination, which answers calls and sends responses. Network servers include the registration server (Registrar), which keeps track of the SIP users registered in its domain; the proxy server, which is similar to an HTTP proxy in that it receives a request and forwards it; and the redirect server, which does not forward a request after receiving it but instead tells the client directly which next-hop server to contact. Figure 1 illustrates the overall audio and video communication process. Of the three kinds of network server, only the proxy server is shown in the figure, and the user registration process is omitted.
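The different roles of the three network servers can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the paper's code: the class `Registrar`, the functions `proxy_handle` and `redirect_handle`, the tuple-shaped results, and the addresses are all hypothetical. The status codes 302 (Moved Temporarily) and 404 (Not Found) are standard SIP response codes.

```python
class Registrar:
    """Keeps the current contact address for each registered SIP user."""

    def __init__(self):
        self.bindings = {}

    def register(self, address_of_record, contact):
        self.bindings[address_of_record] = contact

    def lookup(self, address_of_record):
        return self.bindings.get(address_of_record)


def proxy_handle(registrar, request_uri):
    """A proxy forwards the request itself toward the registered contact."""
    contact = registrar.lookup(request_uri)
    if contact is None:
        return ("respond", 404, None)
    return ("forward", None, contact)


def redirect_handle(registrar, request_uri):
    """A redirect server does not forward; it tells the client where to go next."""
    contact = registrar.lookup(request_uri)
    if contact is None:
        return ("respond", 404, None)
    return ("respond", 302, contact)


# Hypothetical user and contact address.
registrar = Registrar()
registrar.register("sip:bob@example.com", "sip:bob@192.0.2.4")
print(proxy_handle(registrar, "sip:bob@example.com"))
print(redirect_handle(registrar, "sip:bob@example.com"))
```

In both cases the location information comes from the registrar; the difference is only whether the server relays the request or hands the next-hop address back to the client.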
Figure 1. Audio and Video Communication Processes
Figure 2. Software Terminal Framework
In the design, the UA is the focus, because it is not only responsible for initiating and handling calls but is also the SIP entity through which human-computer interaction (HCI) takes place. The following design is for client (UC) software based on the Win2K operating system; the software structure is shown in Figure 2. The SIP softphone comprises two modules: a signaling control module and an audio and video communication module. The signaling control module implements the SIP protocol using the oSIP and eXosip protocol stacks; its main task is to create, modify, and tear down calls. The audio and video communication module consists of three sub-modules, namely the audio and video data interface, the audio and video codecs, and RTP transport; its functions are audio and video capture, encoding, transmission, and playback.
3.1. Signaling Control Module Design
Many open source SIP protocol stacks are currently available, including Vocal, sipX, ReSIProcate, and oSIP. The signaling control module uses oSIP because it is one of the few stacks written in C; it is small and concise, it concentrates on low-level SIP parsing, and it is therefore efficient. It also has disadvantages. First, its usability is poor: it offers no well-packaged API, which makes the stack hard for upper-layer applications to call. Second, it only handles protocol parsing at the transaction level and lacks handling for calls, sessions, dialogs, and similar processes, which makes application development harder. Third, it lacks a mechanism for concurrent, multithreaded processing, so its processing capability is limited. eXosip is an extension library for oSIP that partially encapsulates the oSIP stack and makes it easier to use. eXosip adds handling for calls, dialogs, registration, subscription, and other processes, and is therefore more practical. In summary, implementing SIP with oSIP plus eXosip is a good choice.
3.2. Audio and Video Transmission Module Design
The audio and video data interface covers audio and video capture and playback. Because the software is developed on the Win2K platform, the Windows API's own library functions are used for audio capture and playback: the waveInXXX family of functions is used for recording, and the waveOutXXX family is used for playing sound. Video capture uses the AVICap window class of VFW (Video for Windows), which was developed by Microsoft and is released with the Windows operating system. A key idea of VFW is that playback requires no special hardware; VFW also lets an application obtain digitized video from traditional analog video sources.
Audio and video encoding comprises audio encoding and video encoding. Captured audio and video data are usually very large, which makes them unsuitable for transmission over the network. Encoding reduces the amount of data as far as possible without noticeably compromising voice and video quality. Audio codec standards include G.711, G.723.1, and G.729a; video codec standards include H.263 and H.264.
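To give a rough sense of how much the audio codecs reduce the data rate, the following Python sketch compares nominal codec bit rates with uncompressed PCM. The capture format (8 kHz, 16-bit, mono) is an assumption made for illustration and is not stated in the paper; the codec bit rates are the nominal values from the respective ITU-T recommendations.

```python
# Nominal codec bit rates in kbit/s (from the respective ITU-T recommendations).
CODEC_KBPS = {
    "G.711": 64.0,
    "G.723.1 (low rate)": 5.3,
    "G.723.1 (high rate)": 6.3,
    "G.729a": 8.0,
}


def pcm_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


# Assumed capture format: 8 kHz sampling, 16 bits per sample, mono.
raw = pcm_kbps(8000, 16)
for name, kbps in CODEC_KBPS.items():
    print(f"{name}: {kbps} kbit/s, about {raw / kbps:.1f}x smaller than raw PCM")
```

Under these assumptions, raw speech needs 128 kbit/s, so G.711 halves the rate while the low-rate codecs reduce it by an order of magnitude or more, at the cost of more computation and some loss of fidelity.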
Audio and video packets are transmitted using the Real-Time Transport Protocol (RTP) together with the Real-Time Transport Control Protocol (RTCP).
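As an illustration of what RTP packetization involves, the sketch below packs and unpacks the 12-byte fixed RTP header using Python's `struct` module. It is a simplified sketch, not the paper's implementation, which relies on the jrtp library: the function names `pack_rtp` and `unpack_rtp` are hypothetical, and padding, header extensions, and CSRC entries are assumed to be absent.

```python
import struct

RTP_VERSION = 2


def pack_rtp(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Prepend a 12-byte fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6  # version=2, padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",  # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def unpack_rtp(packet):
    """Parse the fixed header; assumes the CSRC count is zero."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12:],
    }


# 20 ms of G.711 audio at 8 kHz is 160 bytes; payload type 0 is PCMU.
packet = pack_rtp(0, seq=1, timestamp=160, ssrc=0x12345678, payload=b"\x00" * 160)
fields = unpack_rtp(packet)
print(len(packet), fields["seq"], fields["timestamp"])
```

The sequence number lets the receiver detect loss and reordering, and the timestamp drives playout timing, which is why both are carried in every packet.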
4. Implementation
Based on the above scheme, the client software implementation consists of two parts: signaling control and audio and video communication. To use resources more efficiently, both signaling control and audio and video communication are implemented in sub-threads. The relationship between the threads is shown in Figure 3.
Figure 3. Inter-thread Communication
When the software starts, the main thread is created, and it then spawns nine sub-threads. The main thread coordinates the behavior of the sub-threads, communicates with them through a message notification mechanism, and is also responsible for sending and processing signaling.
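The coordination pattern described above, a main thread that starts sub-threads and exchanges messages with them, can be sketched in Python with `threading` and `queue`. This is only an analogy for the Win2K message notification mechanism used in the paper; the thread names, the message strings, and the use of three workers instead of nine are assumptions for illustration.

```python
import queue
import threading


def worker(name, inbox, outbox):
    """Sub-thread: wait for messages from the main thread and report back."""
    while True:
        message = inbox.get()
        if message == "STOP":
            outbox.put((name, "stopped"))
            break
        outbox.put((name, f"handled {message}"))


def run_demo(worker_names):
    """Main thread: start the sub-threads, notify them, and collect their reports."""
    outbox = queue.Queue()
    inboxes = {}
    threads = []
    for name in worker_names:
        inboxes[name] = queue.Queue()
        thread = threading.Thread(target=worker, args=(name, inboxes[name], outbox))
        thread.start()
        threads.append(thread)
    for name in worker_names:
        inboxes[name].put("START")
        inboxes[name].put("STOP")
    for thread in threads:
        thread.join()
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results


# Hypothetical thread names; the paper's software uses nine sub-threads.
results = run_demo(["signaling-monitor", "audio-send", "audio-receive"])
for name, status in results:
    print(name, status)
```

Giving each sub-thread its own inbox keeps the main thread in control of sequencing, while the shared outbox lets sub-threads report events without blocking one another.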
4.1. Signaling Control Module
The signaling control module is responsible for registering with the server and initiating calls to it. It consists mainly of a sub-thread that monitors SIP signaling and the processing performed in the main thread. In the SIP signaling exchange, the upper-layer application calls the API functions provided by the SIP protocol stack to tell the stack which action to take. The stack detects underlying events and reports them to the application layer as messages; after receiving a SIP message, the application layer handles it accordingly. The complete registration and call signaling control flow is shown in Figure 4.
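How the application layer might react to events reported by the stack can be sketched as a small table-driven state machine for an outgoing call. This is a deliberately simplified illustration and not the flow of Figure 4: the state names and the `INVITE_SENT` and `BYE` event labels are hypothetical, the ACK step is omitted, and only the numeric SIP response codes (100 Trying, 180 Ringing, 200 OK, 486 Busy Here) are standard.

```python
# (current state, event) -> next state, for the caller's side of one call.
TRANSITIONS = {
    ("IDLE", "INVITE_SENT"): "CALLING",
    ("CALLING", "100"): "CALLING",       # 100 Trying
    ("CALLING", "180"): "RINGING",       # 180 Ringing
    ("CALLING", "200"): "ESTABLISHED",   # 200 OK without a prior 180
    ("RINGING", "200"): "ESTABLISHED",   # 200 OK, the callee answered
    ("ESTABLISHED", "BYE"): "TERMINATED",
}


def next_state(state, event):
    """Advance the call state for one event reported by the stack."""
    # Any final failure response (3xx to 6xx) ends a call that is being set up.
    if event.isdigit() and int(event) >= 300 and state in ("CALLING", "RINGING"):
        return "TERMINATED"
    # Events that are not expected in the current state are ignored.
    return TRANSITIONS.get((state, event), state)


def run(events, state="IDLE"):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = next_state(state, event)
    return state


print(run(["INVITE_SENT", "100", "180", "200"]))
print(run(["INVITE_SENT", "100", "486"]))
```

Keeping the transitions in a table separates the protocol logic from the thread that receives events, so the monitoring sub-thread only needs to hand each event to the handler.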
Figure 4. Overall Signaling Call Process
4.2. Audio and Video Communication Module
In the communication module, the sending thread performs audio and video sampling, encoding, RTP packetization, and transmission; the receiving thread performs RTP packet reception, decoding, and audio and video playback. The encoding and decoding standards used during communication are agreed upon during the SIP signaling exchange through negotiation in the message body. The detailed audio and video data flow is shown in Figure 5.
Figure 5. Audio and Video Communication Process
The audio codec standards include G.711, G.723.1, and G.729a; the video codec standards include H.263 and H.264. Which standard is used in a particular call is negotiated between the two sides during the call signaling exchange.
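The negotiation can be sketched as choosing, for each media type, the first codec in the caller's preference list that the callee also supports. In SIP this exchange is normally carried as a session description (SDP) in the message body; the sketch below models only the selection logic. The function name `negotiate`, the preference orders, and the callee's capabilities are assumptions for illustration.

```python
def negotiate(offered, supported_by_callee):
    """Return the codecs both sides support, in the offerer's preference order."""
    supported = set(supported_by_callee)
    return [codec for codec in offered if codec in supported]


# Hypothetical preference lists for the caller and capabilities for the callee.
offer = {"audio": ["G.729a", "G.711", "G.723.1"], "video": ["H.264", "H.263"]}
callee = {"audio": ["G.711", "G.723.1"], "video": ["H.263"]}

agreed = {media: negotiate(offer[media], callee.get(media, [])) for media in offer}
chosen = {media: (codecs[0] if codecs else None) for media, codecs in agreed.items()}
print(chosen)
```

If a media type has no codec in common, its entry is `None`, which corresponds to that stream being left out of the call.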
5. Conclusion
This paper proposes a design for audio and video softphone software based on the SIP protocol, describes the design ideas and methods in detail, and implements the software using Visual Studio on the Win2K platform. In future work, we will consider security mechanisms [10] and the use of P2P technology, so that part of the server's functionality is pushed to the edge of the network and the load on the server is reduced. SIP has attracted wide attention for its simplicity, versatility, portability, and other characteristics [11-13]. The development and use of SIP-based multimedia communication will be a future trend in network communications.
References
[1] Bur Goode. Voice Over Internet Protocol. Proceedings of the IEEE. 2002; 90(9): 1495-1517.
[2] Cao Xuanming, Zhang Sifa, Huang Yongfeng. Base on SIP Instant Message Integrate Application. Data Communications. 2007; 6(3): 33-37.
[3] Si Duanfeng, Han Xinhui, Long Qin. A Survey on the Core Technique and Research Development in SIP Standard. Journal of Software. 2005; 16(2): 239-250.
[4] Li Jun, Xie Zanfu, Cui Huailin. Design and Implementation of Voice Communication Based on SIP Protocols. Computer Engineering. 2005; 31(24): 117-119.
[5] Bai Jianjun, Peng Hui, Tian Min. SIP Revealed. People's Posts and Telecommunications Press, Beijing, China. 2003; 6.
[6] Rosenberg J, Schulzrinne H, Camarillo G. SIP: Session initiation protocol. RFC 3261 [EB/OL]. http://www.ietf.org/rfc/rfc3261.txt, 2011.
[7] Elwell J. Connected identity in the session initiation protocol (SIP), RFC 4916 [EB/OL]. http://www.ietf.org/rfc/rfc4916.txt, 2011.
[8] Johnston AB. SIP: Understanding the Session Initiation Protocol [M]. Boston, USA: Artech House, 2004.
[9] Handley M, Jacobson V, Perkins C. SDP: Session Description Protocol. 2006.
[10] Arkko J, Torvinen V, Camarillo G. Security mechanism agreement for the session initiation protocol (SIP), RFC 3329 [EB/OL]. http://www.ietf.org/rfc/rfc3329, 2011.
[11] Liu Shenxiao, Wang Xuechun, Wang Zhigang. The Achievement of VoIP's SIP Calling in Tri-Play Switch. Bulletin of Science and Technology. 2013; (10).
[12] Xu Pengyu, Xu Zican. Design and Implementation of Monitoring System Based on SIP. Computer Engineering. 2013; (11).
[13] Peterson J, Jennings C. Enhancements for authenticated identity management in the session initiation protocol (SIP), RFC 4474 [EB/OL]. http://www.ietf.org/rfc/rfc4474, 2011.