TELKOMNIKA Indonesian Journal of Electrical Engineering
Vol. 12, No. 5, May 2014, pp. 4091 ~ 4100
DOI: http://dx.doi.org/10.11591/telkomnika.v12i5.4388
Received October 15, 2013; Revised December 26, 2013; Accepted January 19, 2014
Based on Weighted Gauss-Newton Neural Network Algorithm for Uneven Forestry Information Text Classification
Yu Chen*, Liwei Xu
Department of Information and Computer Engineering, Northeast Forestry University, China
26 Hexing Road, Xiangfang District, Harbin City, Heilongjiang Province, 150040
*Corresponding author, email: xuliwei475273608@163.com
Abstract
In order to deal with the problem of the low categorization accuracy of the minority class in uneven forestry information text classification, this paper puts forward an uneven forestry information text classification algorithm based on a weighted Gauss-Newton neural network. On the basis of the weighted Gauss-Newton algorithm, the stability of the algorithm is proved via the singular value decomposition principle. The experimental results show that the algorithm has higher classification accuracy on both the majority class and the minority class than commonly used classification algorithms. The algorithm opens up a new method for research on uneven forestry information text classification.
Keywords: text classification, weighted Gauss-Newton, iterative algorithm
Copyright © 2014 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Forestry information resources are very rich in China. As more and more people use the Internet, the amount of forestry information obtained has also shown an upward trend; in real life, however, people usually want only a small part of the forestry information the Internet provides. Forestry information text classification technology has therefore emerged as the times require.
This paper puts forward an uneven forestry information text classification algorithm based on a weighted Gauss-Newton neural network. Firstly, the uneven forestry information text is pretreated using the ICTCLAS Chinese word segmentation system of the Chinese Academy of Sciences (segmentation and stop-word removal); secondly, the classical TF-IDF formula is used to calculate the eigenvalues of the text words, which constitute the initial text feature matrix; then, principal component analysis is used to reduce the dimensionality of the feature matrix; finally, the dimensionality-reduced characteristic matrix is used for training to construct a weighted Gauss-Newton neural network classifier, in order to achieve the purpose of classification.
A large number of experiments demonstrate that the algorithm reaches the expected goal: the classification of both the minority class and the majority class has a higher correct rate.
Because of the specificity of uneven text [1-3], the global accuracy or error rate is not enough to evaluate the performance of a classifier; therefore, the geometric mean formula is introduced to consider the classification performance on both the minority class and majority class samples [4].
The classification performance of the algorithm is significantly higher than that of the classical classification methods for uneven forestry information text; it provides a new method for uneven forestry information text classification.
2. Key Technology of the Uneven Forestry Information Text Classification Algorithm Based on Weighted Gauss-Newton Neural Network
2.1. Representation of Uneven Forestry Information Text
The uneven forestry information text is pretreated using the ICTCLAS Chinese word segmentation system of the Chinese Academy of Sciences (segmentation and stop-word removal); the weight of each word of the uneven forestry information text is counted, constituting the initial text feature matrix.
Assume that the total number of all characteristics of the uneven forestry information text is $n$, forming an $n$-dimensional vector space. Each uneven forestry information text $d$ is then represented as an $n$-dimensional feature vector:
$V(d) = (T_1, W_1(d);\; T_2, W_2(d);\; \dots;\; T_n, W_n(d))$  (1)
Here, $T_i$ is a segmented term of the uneven forestry information text and $W_i(d)$ is the weight of $T_i$ in text $d$, calculated with the TF-IDF formula [5]:
$w_i(d) = \dfrac{TF_i(t_i)\,\log(N/n_i)}{\sqrt{\sum_{i=1}^{L}\left(TF_i(t_i)\,\log(N/n_i)\right)^2}}$  (2)
In formula (2), $w_i(d)$ represents the weight of the feature word $T_i$, $TF_i(t_i)$ is the number of times the feature word $T_i$ appears in the text $d$, $N$ represents the total number of uneven forestry information texts, and $n_i$ is the number of uneven forestry information texts in which the feature word $T_i$ appears; the denominator normalizes over the $L$ feature words of the text.
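To make the weighting concrete, the following short Python sketch (ours, not the paper's; the toy corpus and names are hypothetical) computes the normalized TF-IDF weights of formula (2) for a small segmented corpus:

```python
import math

# Toy corpus: each text is already segmented into terms with stop-words
# removed, as the ICTCLAS pretreatment step would produce.
corpus = [
    ["forest", "flower", "soil"],
    ["flower", "flower", "water"],
    ["insect", "soil", "forest", "forest"],
]

def tfidf_vector(doc, corpus):
    """Weights per formula (2): TF * log(N/n_i), L2-normalized over the text."""
    N = len(corpus)                                   # total number of texts
    vocab = sorted({t for d in corpus for t in d})    # the n feature words
    raw = []
    for term in vocab:
        tf = doc.count(term)                          # TF_i: occurrences in this text
        ni = sum(term in d for d in corpus)           # n_i: texts containing the term
        raw.append(tf * math.log(N / ni) if ni else 0.0)
    norm = math.sqrt(sum(w * w for w in raw))         # denominator of formula (2)
    return [w / norm if norm else 0.0 for w in raw]

print(tfidf_vector(corpus[0], corpus))
```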
2.2. Uneven Forestry Information Text Feature Selection
For dimension reduction of the text feature matrix, principal component analysis is selected.
Suppose there are $n$ samples of text and each sample has $p$ eigenvalues $X_1, X_2, \dots, X_p$; the original data feature matrix is obtained [6]:
$X = (X_1, X_2, \dots, X_p) = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}, \qquad X_i = (x_{1i}, x_{2i}, \dots, x_{ni})^T, \quad i = 1, 2, \dots, p$  (3)
Principal component scores are generated as linear combinations of the $p$ vectors $X_1, X_2, \dots, X_p$ of the text feature matrix $X$:
$\begin{cases} F_1 = a_{11}X_1 + a_{21}X_2 + \dots + a_{p1}X_p \\ F_2 = a_{12}X_1 + a_{22}X_2 + \dots + a_{p2}X_p \\ \quad\vdots \\ F_p = a_{1p}X_1 + a_{2p}X_2 + \dots + a_{pp}X_p \end{cases}$  (4)
Equivalently:

$F_i = a_{1i}X_1 + a_{2i}X_2 + \dots + a_{pi}X_p, \quad i = 1, 2, \dots, p$  (5)
The coefficients satisfy the constraint:

$a_i = (a_{1i}, a_{2i}, \dots, a_{pi})^T, \qquad a_{1i}^2 + a_{2i}^2 + \dots + a_{pi}^2 = 1, \quad i = 1, 2, \dots, p$  (6)
The covariance matrix $S = (s_{ij})_{p \times p}$ of the text feature matrix is calculated with the following formula:

$s_{ij} = \dfrac{1}{n-1}\sum_{k=1}^{n}(x_{ki} - \bar{x}_i)(x_{kj} - \bar{x}_j), \quad i, j = 1, 2, \dots, p$  (7)
The eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0$ of the covariance matrix $S$ and the corresponding eigenvectors are calculated:

$a = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{pmatrix}$  (8)
The $i$-th principal component score of the feature matrix is $F_i = a_i' X$, $i = 1, 2, \dots, p$.
The contribution rate and the cumulative contribution rate are calculated to determine which main components should be selected for experimental evaluation:

$\alpha_i = \dfrac{\lambda_i}{\sum_{j=1}^{p}\lambda_j} \qquad \text{and} \qquad G(r) = \dfrac{\sum_{j=1}^{r}\lambda_j}{\sum_{j=1}^{p}\lambda_j}$  (9)
In the experiment, the main components with a cumulative contribution rate of 99% are extracted, and the scores of the $n$ samples on the selected $r$ forestry information principal components are calculated:

$F_i = a_{1i}X_1 + a_{2i}X_2 + \dots + a_{pi}X_p, \quad i = 1, 2, \dots, r$  (10)
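A minimal sketch of this PCA selection step, assuming a toy random feature matrix and using the 99% threshold stated above (the function name and data are ours, not the paper's):

```python
import numpy as np

def pca_reduce(X, threshold=0.99):
    """Reduce an (n samples x p features) matrix to r principal component
    scores, where r is the smallest number of components whose cumulative
    contribution rate G(r) in formula (9) reaches the threshold."""
    Xc = X - X.mean(axis=0)                      # center each feature
    S = np.cov(Xc, rowvar=False)                 # covariance matrix, formula (7)
    eigvals, eigvecs = np.linalg.eigh(S)         # eigen-decomposition of S
    order = np.argsort(eigvals)[::-1]            # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    G = np.cumsum(eigvals) / eigvals.sum()       # cumulative contribution, formula (9)
    r = int(np.searchsorted(G, threshold)) + 1   # smallest r with G(r) >= threshold
    return Xc @ eigvecs[:, :r]                   # component scores, formula (10)

X = np.random.rand(100, 20)                      # hypothetical 100 texts x 20 features
print(pca_reduce(X).shape)
```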
2.3. Weighted Gauss-Newton Algorithm
Commonly used classification methods for uneven forestry information text are support vector machines, Bayesian classifiers, decision trees, etc. With these classic uneven forestry information text classification algorithms, the classification accuracy rate for the minority class is very low; the experimental results of the uneven forestry information text classification algorithm based on the weighted Gauss-Newton neural network are better and improve the accuracy of minority class classification [7, 8].
The main idea of Newton's method is to use the second-order Taylor expansion of the objective function and then find its minimizer [9]. Assuming that $f(x)$ is twice differentiable, $x_k \in R^n$, and the Hessian matrix $\nabla^2 f(x_k)$ is positive definite, the Taylor expansion of $f(x)$ gives the following formula [10]:
$f(x_k + s) \approx f(x_k) + \nabla f(x_k)^T s + \frac{1}{2}\, s^T \nabla^2 f(x_k)\, s$  (11)
In the above formula, $s = x - x_k$; the minimum value can be obtained by the following iteration:

$x_{k+1} = x_k - T_k^{-1} t_k$  (12)
The above formula is the Newton iteration formula, in which $T_k = \nabla^2 f(x_k)$ and $t_k = \nabla f(x_k)$; that is to say, $T_k$ represents the second derivative and $t_k$ the first derivative of the function [11].
In Newton's method the selection of the initial point is very important. If the initial point is far away from the optimum value, the second derivative matrix is not necessarily positive definite; the search direction therefore does not necessarily decline, and the final result will not be accurate enough [12, 13].
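A small numeric illustration of iteration (12) and this initial-point sensitivity, on a hypothetical one-dimensional function whose second derivative is not positive everywhere:

```python
# Newton iteration (12) on f(x) = x^4 - 3x^2 + x. Its second derivative
# is negative near x = 0, so a start there drives the iteration to a
# critical point with f'' < 0 (a maximum), illustrating the sensitivity
# to the initial point noted above.
f1 = lambda x: 4 * x**3 - 6 * x + 1      # first derivative t_k
f2 = lambda x: 12 * x**2 - 6             # second derivative T_k

for x0 in (2.0, 0.1):                    # far from vs. inside the non-convex region
    x = x0
    for _ in range(20):
        x = x - f1(x) / f2(x)            # x_{k+1} = x_k - T_k^{-1} t_k
    print(f"start {x0}: end {x:.4f}, f''(end) = {f2(x):.2f}")
```

Starting from 2.0 the iteration reaches a point with positive curvature (a minimum), while starting from 0.1 it settles on a point with negative curvature, exactly the failure mode described in the paragraph above.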
Gauss-Newton is an iterative algorithm for unconstrained minimization. Its basic idea is that, in the objective function of the least squares problem, the second-order information term $S(x)$ is ignored, so the quadratic model of the objective function turns into:
$m_k(x) = \frac{1}{2} r(x_k)^T r(x_k) + \left(J(x_k)^T r(x_k)\right)^T (x - x_k) + \frac{1}{2} (x - x_k)^T J(x_k)^T J(x_k)\,(x - x_k)$  (13)
With $v(x)$ representing the output error, the error function is expressed as:

$f(x) = v(x)^T v(x) = \sum_{i=1}^{n} v_i^2(x)$  (14)
The gradient of $f(x)$ is:

$\nabla f(x) = 2 J(x)^T v(x)$  (15)
The Hessian matrix of $f(x)$ is:

$\nabla^2 f(x) = 2 J(x)^T J(x) + 2 S(x)$  (16)
In these two formulas, $J(x)$ is the Jacobian matrix. Putting the last two equations into formula (12) gives the iteration formula of Gauss-Newton:

$x_{k+1} = x_k - \left(J(x_k)^T J(x_k)\right)^{-1} J(x_k)^T v(x_k)$  (17)
Comparing Gauss-Newton with Newton's method, there is no need to calculate $\nabla^2 f(x)$, which avoids the cases where the second-order matrix is not positive definite and the search direction does not necessarily decline; but in the formula $J(x_k)^T J(x_k)$ may still be irreversible, so the algorithm may not converge.
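The plain iteration (17) is easy to sketch; the example below (ours, with a hypothetical exponential-fit residual) also surfaces the failure mode just described, since numpy.linalg.solve raises an error when $J^T J$ is singular:

```python
import numpy as np

def gauss_newton_step(x, residual, jacobian):
    """One Gauss-Newton update per formula (17); np.linalg.solve raises
    LinAlgError when J^T J is singular, the failure mode noted above."""
    J = jacobian(x)
    v = residual(x)
    return x - np.linalg.solve(J.T @ J, J.T @ v)

# Hypothetical residual: fit y = exp(a*t) + b to synthetic data.
t = np.linspace(0, 1, 20)
y = np.exp(0.7 * t) + 0.3
residual = lambda x: np.exp(x[0] * t) + x[1] - y
jacobian = lambda x: np.column_stack([t * np.exp(x[0] * t), np.ones_like(t)])

x = np.array([0.0, 0.0])
for _ in range(10):
    x = gauss_newton_step(x, residual, jacobian)
print(x)   # converges toward (0.7, 0.3)
```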
Directly using the Gauss-Newton algorithm for classification is ineffective. To solve the above problems, a parameterized unit matrix $\lambda I$ is added to $J(x)^T J(x)$ in the Gaussian iteration formula, so that the Gauss-Newton algorithm has a better regularization nature and overcomes the non-convergence that occurs when $J(x_k)^T J(x_k)$ is singular. The formula is as follows:
$x_{k+1} = x_k - \left(J(x_k)^T J(x_k) + \lambda I\right)^{-1} J(x_k)^T v(x_k)$  (18)
In the above formula, $I$ is an $n \times n$ unit matrix, and $\lambda \ge 0$ is the regularization parameter.
To give the Gauss-Newton algorithm global convergence, a one-dimensional search factor with a damping function is then added to obtain the formula:

$x_{k+1} = x_k - \alpha_k \left(J(x_k)^T J(x_k) + \lambda I\right)^{-1} J(x_k)^T v(x_k)$  (19)
In the above formula, $\alpha_k$ is the one-dimensional search factor, expressed as follows:

$\alpha_k = \dfrac{\|J(x_k)^T v(x_k)\|^2}{\|J(x_k) J(x_k)^T v(x_k)\|^2}$  (20)
The parameter $\lambda_k$ is determined by the following selection:

$\lambda_k = \theta\,\|v(x_k)\| + (1 - \theta)\,\|J(x_k)^T v(x_k)\|, \quad \theta \in (0, 1)$  (21)
Here it must be noted that the parameter $\theta$ should be selected relatively large: because of the very large dimension of the text feature matrix, $J(x_k)^T v(x_k)$ will have a large norm while the norm of $v(x_k)$ will be smaller, so this choice ensures that $\lambda_k \le 1$.
Although with these different parameters and the adjustment of $J(x)^T J(x)$ the Gauss-Newton algorithm has better convergence, some restriction factors still make the classification effect poor. Therefore the formula is weighted further: a weight matrix $W_k$ is joined to the above formula to reduce the impact of the error of feature matrix dimensionality reduction on the classification, and the classification performance is improved. The weighted Gauss-Newton iterative formula is as follows:

$x_{k+1} = x_k - \alpha_k W_k \left(J(x_k)^T J(x_k) + \lambda_k I\right)^{-1} J(x_k)^T v(x_k)$  (22)
In the above formula, the weight matrix is as follows:

$W_k = \mathrm{diag}\!\left(\dfrac{1}{\tilde{\omega}_1}, \dfrac{1}{\tilde{\omega}_2}, \dots, \dfrac{1}{\tilde{\omega}_N}\right)$  (23)
Here $\tilde{\omega}_i = \omega_i / \mu$, where $\omega_i$ is the $i$-th component of the calculated step $-\left(J(x_k)^T J(x_k) + \lambda_k I\right)^{-1} J(x_k)^T v(x_k)$, and $\mu$ is the scale factor, given by the following formula:

$\mu = \left(\dfrac{1}{N}\sum_{i=1}^{N} \omega_i^2\right)^{1/2}$  (24)
Formulas (22)-(24) constitute the iterative method of the weighted Gauss-Newton algorithm.
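Putting formulas (18)-(24) together, one weighted iteration might be sketched as below. This is our reading of the reconstructed formulas, not the authors' code: the absolute value on the step components and the small eps guards are assumptions added to keep the weight matrix well defined.

```python
import numpy as np

def weighted_gn_step(x, residual, jacobian, theta=0.9, eps=1e-8):
    """One iteration of the weighted Gauss-Newton update, formula (22)."""
    J, v = jacobian(x), residual(x)
    g = J.T @ v                                        # gradient direction J^T v
    lam = theta * np.linalg.norm(v) + (1 - theta) * np.linalg.norm(g)  # formula (21)
    A = J.T @ J + lam * np.eye(x.size)                 # regularized normal matrix, (18)
    step = np.linalg.solve(A, g)                       # damped GN step components w_i
    alpha = np.linalg.norm(g) ** 2 / (np.linalg.norm(J @ g) ** 2 + eps)  # formula (20)
    mu = np.sqrt(np.mean(step ** 2)) + eps             # scale factor, formula (24)
    W = np.diag(mu / (np.abs(step) + eps))             # weight matrix, formula (23)
    return x - alpha * W @ step                        # weighted update, formula (22)

# Example: one step on a hypothetical linear residual v(x) = A x - b.
A = np.array([[2.0, 0.0], [1.0, 3.0], [0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])
print(weighted_gn_step(np.zeros(2), lambda x: A @ x - b, lambda x: A))
```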
The stability of the weighted Gauss-Newton iterative method is proved as follows. The iterative formula (22) corresponds to the linear least squares problem of the equations:

$\left[J(x_k)^T J(x_k) + \lambda I\right] s = -J(x_k)^T v(x_k)$  (25)

The undamped Gauss-Newton iterative formula (17) corresponds to the linear least squares problem for the equations:

$s_N = -\left[J(x_k)^T J(x_k)\right]^{-1} J(x_k)^T v(x_k)$  (26)

In formula (25), $\lambda$ meets $\lambda = \lambda_k$.
2.4. Text Classifier Performance Evaluation of Uneven Forestry Information Text
Using only the global accuracy or error rate to evaluate a classifier on imbalanced data is one-sided; therefore the following formulas are introduced, which consider the classification performance on both the minority class and the majority class [14].
For the correct rate of the minority class samples, $TP$ denotes the number of minority class samples correctly classified, and $FN$ refers to the number of minority class samples misclassified into the majority class:

$\text{Sensitivity} = \dfrac{TP}{TP + FN}$  (27)
For the correct rate of the majority class samples, $TN$ denotes the number of majority class samples correctly classified, and $FP$ refers to the number of majority class samples misclassified into the minority class:

$\text{Specificity} = \dfrac{TN}{FP + TN}$  (28)
$\text{Precision}$ represents the precision of the minority class:

$\text{Precision} = \dfrac{TP}{TP + FP}$  (29)
The geometric mean correct rate $G$-mean:

$G = \sqrt{\text{Sensitivity} \times \text{Specificity}}$  (30)
The minority class's $F$-measure:

$F = \dfrac{2 \times \text{Sensitivity} \times \text{Precision}}{\text{Sensitivity} + \text{Precision}}$  (31)
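Formulas (27)-(31) translate directly into code; a plain sketch with the TP/FN/TN/FP counts as arguments:

```python
import math

def evaluation_metrics(TP, FN, TN, FP):
    """Imbalanced-classification metrics, formulas (27)-(31)."""
    sensitivity = TP / (TP + FN)            # minority-class correct rate, (27)
    specificity = TN / (FP + TN)            # majority-class correct rate, (28)
    precision   = TP / (TP + FP)            # minority-class precision, (29)
    g_mean = math.sqrt(sensitivity * specificity)                         # (30)
    f_measure = 2 * sensitivity * precision / (sensitivity + precision)   # (31)
    return g_mean, f_measure
```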
3. Experimental Results
The following table shows the selected experimental samples.
Table 1. Selection Table of Uneven Forestry Information Text

class                         flowers   trees   insects   soil type   water class
number of training samples    1000      50      1000      1000        50
number of test samples        50        50      50        50          50
As shown in Table 1, five categories of uneven forestry information are selected: flowers, trees, insects, soil type, and water class. The technical point is that uneven data refers to classes with an unequal distribution over the sample sets; therefore the three kinds of samples flowers, insects, and soil are chosen as the majority classes, with 1000 samples selected for each, while the tree and water classes are the minority class types, with 50 samples each. From each class, 50 samples are selected for testing.
In the preliminary laboratory work and the algorithm design process, three parameters are added to the Gauss-Newton algorithm in order, $\lambda I$, $\alpha_k$, and $W_k$, forming the iterative formula of the weighted Gauss-Newton algorithm. The algorithm improvement process and its results are shown below.
Figure 1. Weighted Gauss-Newton Neural Network Comparison Chart
Revised Gauss-Newton-1 stands for the algorithm with the first parameter $\lambda I$ added, iterating with formula (18). Revised Gauss-Newton-2 represents the algorithm with the second parameter $\alpha_k$ added on the basis of the first, iterating with formula (19). The weighted Gauss-Newton algorithm represents the third parameter, the weight matrix $W_k$, being added as well, iterating with formula (22). The final results show that as the three parameters are gradually added into the algorithm, the rate of correct classification improves gradually; with the second parameter the correct rate increased only marginally, and therefore the third parameter was added, with which the weighted Gauss-Newton algorithm improves the accuracy of uneven forestry information classification. During the experiment, the weighted Gauss-Newton neural network algorithm is compared with the commonly used classification algorithms for uneven forestry information text.
The dimension of the initial uneven training sample feature matrix is 1127×3100 and that of the initial test sample feature matrix is 1127×250; after dimensionality reduction these two matrices form new feature matrices of dimension 213×3100 and 213×250. Using the four methods of weighted Gauss-Newton neural network, decision tree, Bayesian, and support vector machine classification, the same training and test samples are selected; the test results are shown below. The abscissa is the test sample category of uneven forestry information text, and the vertical axis is the correct rate of each type of sample classification: 1 represents the flower type, 2 the tree type, 3 the insect type, and 4 and 5 the soil and water types.
Figure 2. Four Classification Algorithms for Uneven Forestry Information Text Classification Accuracy Schematic
In Figure 2, the results of the uneven forestry information text classifiers show that with the decision tree and Bayesian classification algorithms the classification accuracy rate of the minority class samples is low, and the Bayesian recognition rate for the tree samples is very low. The classification accuracy of the support vector machine classifier for the minority class is almost zero. The classification accuracy of the weighted Gauss-Newton neural network for the minority class can reach 100%, and its accuracy for the majority classes is also high.
The above chart reflects different aspects of classifier performance. In order to measure the classification performance of the classifiers more comprehensively, highlighting the importance of the minority class in the classification process, the comprehensive index judgments F-measure and G-mean are used. The data are divided into two categories: the majority class samples and the minority class samples.
In the following table, X1 represents samples that are majority class in reality and judged as majority class; X2 represents samples that are majority class in reality but judged as minority class; X3 represents samples that are minority class in reality and judged as minority class; X4 represents samples that are minority class in reality but judged as majority class.
Table 2. The Mixed Matrix of the Test Sample Set of the Four Classification Algorithms

      Weighted G-N   Decision tree   Bayesian   SVM
X1    150            150             150        150
X2    0              0               0          0
X3    100            66              56         1
X4    0              34              44         99
Figure 3. Change Figure of the Majority Class and Minority Class Accuracy
Figure 3 shows the trend of the correct rate of the majority class and the minority class as the number of samples increases. The accuracy of the weighted Gauss-Newton neural network does not change as the minority class samples increase and is always maintained at 100%. For the decision tree, Bayesian, and SVM classifiers, the correct classification rate of the minority class shows a decreasing trend as the number of samples increases, while the majority class classification results of all four classifiers are good.
Table 3. Comprehensive Classification Effect of Four Kinds of Classifiers

                 Precision   G-mean   F-measure
weighted G-N     1           1        1
decision tree    1           0.81     0.795
Bayesian         1           0.75     0.72
SVM              1           0.1      0.02
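As a consistency check, the Table 3 values can be reproduced from the Table 2 counts, reading X1 = TN, X2 = FP, X3 = TP, X4 = FN and applying formulas (27)-(31):

```python
import math

# Confusion counts from Table 2, as (TN=X1, FP=X2, TP=X3, FN=X4).
classifiers = {
    "weighted G-N":  (150, 0, 100, 0),
    "decision tree": (150, 0, 66, 34),
    "Bayesian":      (150, 0, 56, 44),
    "SVM":           (150, 0, 1, 99),
}

for name, (TN, FP, TP, FN) in classifiers.items():
    sensitivity = TP / (TP + FN)                     # formula (27)
    specificity = TN / (FP + TN)                     # formula (28)
    precision = TP / (TP + FP)                       # formula (29)
    g = math.sqrt(sensitivity * specificity)         # formula (30)
    f = 2 * sensitivity * precision / (sensitivity + precision)  # formula (31)
    print(f"{name}: Precision={precision:.2f}  G-mean={g:.2f}  F-measure={f:.3f}")

# e.g. decision tree: G = sqrt(0.66) ≈ 0.81, F ≈ 0.795, matching Table 3.
```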
In Table 3, the G index considers the classification performance on both the minority class and the majority class samples; G increases monotonically in [0, 1] along with the Sensitivity and Specificity values, and only a large value of both can make G large, so the classification performance of the weighted Gauss-Newton neural network is good. The F-measure pays more attention to reflecting the classification effect on the minority class. In summary, the classification performance of the weighted Gauss-Newton neural network is excellent, with the majority class and minority class classification accuracy equalized, while the classification performance of the decision tree, Bayesian, and support vector machine on the minority class is poor, so their comprehensive measure F-measure is small.
The experimental results show that the proposed weighted Gauss-Newton neural network algorithm is precise and fast for the classification of the five kinds of uneven forestry information text; especially for minority class samples, the classification accuracy is significantly higher than that of the commonly used classification algorithms, the correct classification rate of the algorithm is evenly distributed, and its classification ability is strong.
4. Summary
In the weighted Gauss-Newton neural network algorithm for uneven forestry information text classification, the classical TF-IDF formula is used to calculate the eigenvalues of the text words and constitute the initial text feature matrix, and principal component analysis is used to reduce the dimensionality of the feature matrix, forming a new uneven forestry information text feature matrix that responds to the features of the text.
Experiments show that with the weighted Gauss-Newton neural network algorithm for uneven forestry information text classification, the minority class classification accuracy is significantly higher than with the classical methods of decision tree, Bayesian, and support vector machine classification. The algorithm provides a new approach for the study of uneven forestry information text classification and has high practical value.
Acknowledgements
Fundamental Research Funds for the Central Universities (DL12CB02); National 948 Project (2011-4-04); Heilongjiang Provincial Department of Education Science and Technology Research Project (12513016); Postdoctoral Fund of Heilongjiang Province; Heilongjiang Province Natural Science Fund Project (F201347); Harbin technological innovation talents special fund project (2013RFQXJ100).
References
[1] Xie Na-na, Fang Bin, Wu Lei. Study of text categorization on imbalanced data. Computer Engineering and Applications. 2012; 6(1): 1-4.
[2] Duan Li-guo, Di Peng, Li Ai-ping. A New Naïve Bayes Text Classification Algorithm. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2014; 12(2): 947-952.
[3] Pei-ying Zhang. A HowNet-based Semantic Relatedness Kernel for Text Classification. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2013; 11(4): 1909-1915.
[4] Fengfeng Bai. Uneven data sets based text classification technology research. Software Development and Design. 2009; 12(10): 21-29.
[5] Jiangli Duan. Research of Feature Selection and Weighting Algorithm in Text Classification System Based on SVM. Taiyuan: Taiyuan University of Technology. 2011: 10-15.
[6] Yafei Wang. Text Categorization Method of Reducing Features. Changchun: Changchun University. 2010: 20-25.
[7] Zulin Hua, Wei Qian, Li Gu. Application of improved LM-BP neural network in water quality evaluation. Water Resources Protection. 2008; 24(4): 23-30.
[8] Deyun Chen, Yu Chen, Lili Wang, Xiaoyang Yu. A novel Gauss-Newton Image Reconstruction Algorithm for Electrical Capacitance Tomography System. Chinese Journal of Electronics. 2009; 37(4): 739-743.
[9] Subramanian PK, Xiu NH. Convergence Analysis of Gauss-Newton Methods for the Complementarity Problem. Journal of Optimization Theory and Applications. 1997; 94: 727-738.
[10] Yu Chen. Research on Inverse Problems Solving and Image Reconstruction Algorithm for Electrical Capacitance Tomography System. Harbin: Harbin University of Science and Technology. 2010: 57-60.
[11] Xiulan Chen, Jun Wei. Improved convergence analysis of Gauss-Newton algorithm step. Journal of Chinese Science and Technology Innovation. 2012; 1(1): 110-111.
[12] Subramanian PK. Gauss-Newton Methods for the Complementarity Problem. Journal of Optimization Theory and Applications. 1993; 77: 467-482.
[13] Rubanov NS. The layer-wise method and the back propagation hybrid approach to learning a feedforward neural network. IEEE Trans. Neural Networks. 2000; 1(2): 295-305.
[14] Xinmin Tao, Furong Liu, Baoxiang Liu. Uneven data SVM classification algorithm and its application. Harbin: Heilongjiang Science and Technology Press. 2011: 14-16.