TELKOMNIKA, Vol.14, No.3, September 2016, pp. 791~799
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013
DOI: 10.12928/TELKOMNIKA.v14i3.3413
Received January 25, 2016; Revised April 20, 2016; Accepted May 6, 2016
Data Selection and Fuzzy-Rules Generation for Short-Term Load Forecasting Using ANFIS
M. Mustapha, M. W. Mustafa*, S. N. Khalid
Faculty of Electrical Engineering, Universiti Teknologi Malaysia, 81310, Johor, Johor Bahru, Malaysia
*Corresponding author, e-mail: wazir@fke.utm.my
Abstract
This paper focuses on data analysis, with the aim of determining the actual variables that affect load consumption in short-term electric load forecasting. Correlation analysis was used to determine how the load consumption is related to the forecasting variables (model inputs), and a hypothesis test was used to justify the correlation coefficient of each variable. Three different models based on data selection criteria were tested using the Adaptive Neuro-Fuzzy Inference System (ANFIS). Subtractive Clustering (SC) and Fuzzy c-means (FCM) rule generation algorithms were compared in all three models. It was observed that forecasting using hypothesis-test data with the SC algorithm gave better accuracy than the other two approaches, but the FCM algorithm was faster in all three approaches. In conclusion, a hypothesis test on the correlation coefficient of the data is a commendable practice for data selection and analysis in short-term load forecasting.
Keywords: short-term load forecasting, ANFIS, clustering algorithm, correlation analysis, hypothesis test
Copyright © 2016 Universitas Ahmad Dahlan. All rights reserved.
1. Introduction
Factors influencing energy consumption play a vital role in its determination. They are the key parameters used to forecast electricity load consumption. Depending on the forecasting horizon, these parameters range from weather variables and economic variables to customer class and demographic or population factors. Some research works restrict themselves to two or more variables, and others use only historical load data. For example, variation in the load consumption corresponds to the time of day, week or month, the temperature of the forecasting area, and the behaviour of customers towards electricity usage [1].
The first stage in obtaining an accurate load forecast is proper data processing. This is why power system organizations gather the relevant data: it has a significant influence on their business activities [2]. The information obtained from data processing and analysis gives a clue as to which method to use and how to use it. It also makes it easy to determine when consumption is low or high in the load profile, and how consumption relates to these variables. Even so, it is difficult to determine the exact relation between the load consumption and the forecasting variables, because different variables affect the load in different ways. The degree of the effect may be high or low, or even negative [3].
As suggested in [4], reducing the number of factors makes the model simple and easy to use, and subsequently gives room for determining the actual parameters that influence the load consumption.
Many methods are used to determine how the data or variables influence the load [5-8]. However, there are problems associated with variable selection. Firstly, there is often no clear, justifiable reason for selecting the variables. Secondly, the common practice is to apply historical load data over a specified period of time as the input to the forecasting model. Forecasting with such data will naturally give encouraging accuracy, because the load pattern is the same for similar time frames in most cases. From Figure 1, it can be observed that the daily load pattern is the same throughout the week, which means that using any one day to forecast another will give accurate results. This paper investigates the relationship between the load consumption and the forecasting variables, with the aim of finding the exact variables. Correlation analysis and a hypothesis test are employed to determine this relation. Also, because the study area is surrounded by water, which determines its weather, it is difficult to define the exact variables that will affect the load consumption. Correlation analysis and the hypothesis test are therefore used to determine the exact variables that influence the load consumption.
This paper addresses the issue of reducing the number of variables through correlation analysis and a hypothesis test. It also applies SC and FCM to minimize the number of rules and the slowness of the model caused by a large number of predictors.
2. Model Development
In this work, ANFIS is used to map the forecasting variables (model inputs) to the corresponding load consumption (model output). The aim is to generate a model that is not only familiar with the training set, but is also able to map the test inputs to the test outputs. To achieve this, it is necessary to determine the variables that influence the load consumption, and the model functions that give a good and accurate mapping between these variables. In this work, a test of hypothesis on correlated data is used to select the input variables. Correlation analysis and the test of hypothesis are discussed in the section below.
2.1. Correlation Analysis and Hypothesis Test
Correlation is a measure of the interrelation between the changes in two variables. It estimates how changes in one variable affect another, and describes the relation between two paired sets of data. The correlation coefficient (R) ranges from -1 to +1, and any value within this range characterizes the relationship between the variables [9]. A high magnitude (close to +1 or -1) indicates a strong relationship, and a value close to zero indicates low correlation. A zero correlation coefficient shows that there is no linear relation between the two sets of data; in other words, knowing x does not assist in determining y. The correlation coefficient R between two sets of data can be calculated using the following formula:
$$R = \frac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y} \tag{1}$$
where $\operatorname{cov}(x, y)$ is the population covariance, and $\sigma_x$ and $\sigma_y$ are the individual population standard deviations. If the input vector is $X = [X_{r1}\ X_{r2}\ X_{r3}\ \dots\ X_{rc}]$, the corresponding output vector is $Y_r$, where r denotes rows and c denotes columns. The output has only one column with r rows. Correlation analysis is carried out using equation (1) between each column of $X_{rc}$ (x) and the corresponding elements of $Y_r$ (y). This results in c correlation coefficients.
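The column-by-column use of equation (1) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the toy data are assumptions made for the example and do not come from the paper.

```python
import math


def pearson_r(x, y):
    """Population correlation coefficient R = cov(x, y) / (sigma_x * sigma_y)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations (divide by n, as in equation (1)).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sigma_x * sigma_y)


def column_correlations(X, Y):
    """Return one R per column of X (rows are samples) against the output Y."""
    n_cols = len(X[0])
    return [pearson_r([row[c] for row in X], Y) for c in range(n_cols)]


# Toy data (hypothetical): column 0 falls linearly as the load rises,
# column 1 is constructed to have zero covariance with the load.
X = [[10.0, 3.0], [12.0, 1.0], [14.0, 1.0], [16.0, 3.0]]
Y = [900.0, 850.0, 800.0, 750.0]
print(column_correlations(X, Y))
```

With c input columns the call returns c coefficients, one per candidate forecasting variable.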
A hypothesis test is a method used at the data analysis stage of a comparative analysis between sets of experimental data. Its purpose is to determine the significance of an empirical analysis using the p-value. This is the smallest level of significance that would lead to rejection of the null hypothesis. It is the area shaded around the two tail ends of the normal distribution curve. The first step is to set a null hypothesis and an alternative hypothesis on the observation, followed by pre-setting an α-value. The p-values are then computed and a conclusion is deduced. From [9], the p-value is computed using the relation:
[Equation (2): expression not recoverable from the source]
where $Z_i$ is the Z-value corresponding to one side of the hypothesis and $Z_j$ is the Z-value corresponding to the other side. For a two-sided hypothesis ($Z_i$ and $Z_j$):
[Equation (3): expression not recoverable from the source]
When this value is equal to or smaller than the α-value, the observation is significant, and it is therefore accepted against the null hypothesis.
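As a rough illustration of the decision rule, the sketch below computes a two-sided p-value for a standard-normal statistic and compares it with α. It is not the authors' exact procedure: the text does not state how the Z statistic is obtained from R, so the Fisher transformation used in `fisher_z` is an assumption, as are the function names and the sample numbers.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value of a standard-normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def fisher_z(r, n):
    """ASSUMPTION: Fisher transformation of r, z = atanh(r) * sqrt(n - 3)."""
    return math.atanh(r) * math.sqrt(n - 3)


def is_significant(p_value, alpha=0.0005):
    """The observation is significant when the p-value does not exceed alpha."""
    return p_value <= alpha


# Hypothetical example: a correlation of -0.64 estimated from 300 samples.
r, n = -0.64, 300
p = two_sided_p_value(fisher_z(r, n))
print(p, is_significant(p))
```

A variable whose p-value exceeds α would be treated as correlated with the load only by chance and dropped from the model inputs.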
2.2. Adaptive Neuro-Fuzzy Inference System (ANFIS)
ANFIS (developed by J.-S. R. Jang in 1993) combines the advantages of fuzzy systems and neural networks [10]. It is a network-based structure (Figure 2) that uses Sugeno-type 'IF…THEN' rules and a Neural Network (NN). It uses hybrid learning, in which the consequent parameters are determined by the Least Squares Algorithm (LSA) in the forward pass. In the backward pass, error measures are propagated backward through every node, and the premise parameters (the parameters associated with the membership functions) are updated using the Gradient Descent Algorithm (GDA).
Figure 2 shows a typical ANFIS structure with only two inputs (x and y) and one output (z). The structure consists of five layers with several nodes (depending on the number of input variables and linguistic variables). For a data point $x_i$, ANFIS computes a corresponding output $y_i$. For a first-order Sugeno-type fuzzy system with only two inputs, the rules are:
1. If x is $A_1$ and y is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$
2. If x is $A_2$ and y is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$
where $p_i$, $q_i$ and $r_i$ are the consequent parameters.
Figure 1. Hourly load pattern for seven days (5th to 11th May, 2014). Axes: Time (hrs) against Hourly Load Consumption (MW).
Figure 2. Typical ANFIS structure
If $O_i^j$ is the output of node i in layer j, the function of each node can be explained layer by layer:
Layer 1: Each node in this layer is an adaptive node, whose output is determined by the membership function (MF) in that node. The MF fuzzifies the input variable x in that node. For a node $A_i$ the output is given by:
$$O_i^1 = \mu_{A_i}(x) \tag{4}$$
where x is an input to node i, $A_i$ is the linguistic label associated with this node, and $\mu_{A_i}$ is the membership function (MF) of $A_i$. Generally, $O_i^1$ is the membership grade of a fuzzy set A, which can be any MF, such as the Gaussian MF of equation (5):
$$\mu_{A_i}(x) = e^{-\frac{1}{2}\left(\frac{x - c_i}{\sigma_i}\right)^2} \tag{5}$$
Here $c_i$ and $\sigma_i$ are the premise parameters of this membership function.
Layer 2: Every node of this layer is a fixed node. Its output is the firing strength of the rule, computed as the product of all the signals entering the node from the previous layer. Thus:
$$O_i^2 = w_i = \mu_{A_i} \times \mu_{B_i} \times \mu_{C_i} \times \cdots \tag{6}$$
Layer 3: This is the normalization layer. The output of each node here is the ratio of the node's firing strength to the sum of the firing strengths of all the nodes, thus:
$$O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2 + \cdots} \tag{7}$$
Layer 4: The output of each node in this layer is:
$$O_i^4 = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i) \tag{8}$$
Layer 5: In this layer, the single output node sums all the output signals of layer 4, thus:
$$O^5 = \sum_i \bar{w}_i f_i \tag{9}$$
This gives the overall output, z.
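The five layers can be traced in a few lines of code. The following is a minimal forward-pass sketch for the two-input case with Gaussian membership functions, as in equations (4) to (9); the parameter values, the function names and the data layout are illustrative assumptions, and no training (LSA or GDA) is included.

```python
import math


def gaussian_mf(x, c, sigma):
    """Gaussian membership function of equation (5)."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)


def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS with two inputs.

    premise:    one ((c_A, sigma_A), (c_B, sigma_B)) pair per rule (hypothetical layout)
    consequent: one (p, q, r) triple per rule
    """
    # Layers 1 and 2: membership grades and their product (firing strengths w_i).
    w = [gaussian_mf(x, *a) * gaussian_mf(y, *b) for a, b in premise]
    # Layer 3: normalised firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order rule outputs.
    layer4 = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: overall output z.
    return sum(layer4)


# Illustrative parameters (assumed): two rules centred at (0, 0) and (2, 2).
premise = [((0.0, 1.0), (0.0, 1.0)), ((2.0, 1.0), (2.0, 1.0))]
consequent = [(1.0, 1.0, 0.0), (0.0, 0.0, 10.0)]
print(anfis_forward(1.0, 1.0, premise, consequent))
```

At the input (1, 1) both rules fire equally, so the output is the average of the two rule outputs.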
2.3. Fuzzy Rules Generation
Generating Sugeno-type "IF…THEN" fuzzy rules is a parameter identification problem which requires membership function tuning. Among the methods available for determining the fuzzy rules, we focus on FCM and subtractive clustering. Fuzzy clustering is used to classify data into different groups based on the number of clusters. A data sample can belong to a number of cluster groups, which are identified by their degree of membership [11]. This reduces the computational burden on the system.
2.3.1. SC Method
This is a method developed by Chiu to identify fuzzy models [12]. It identifies, in a natural way, groups of data that represent the general behaviour of the system, and is aimed at reducing the computational burden in large data systems. Depending on the nature of the problem, the algorithm involves computing a density measure for every data point. Thus, for every data point $x_i$, the density measure is given by:
$$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{(r_a/2)^2}\right) \tag{10}$$
where $r_a$ is the radius that determines the neighbourhood around the data centre, and n is the number of data points. According to [12], $0.15 \le r_a \le 0.3$; therefore 0.2 is chosen in this work. For all three cases, the initial cluster centre is the data point with the highest density. If $x_{c1}$ is that cluster centre, with density measure $D_{c1}$, then the revised density measure for every other data point $x_i$ is obtained from the relation:
$$D_i = D_i - D_{c1} \exp\left(-\frac{\lVert x_i - x_{c1} \rVert^2}{(r_b/2)^2}\right) \tag{11}$$
We make $r_b$ greater than $r_a$ to disperse the clusters; typically $r_b = 1.5 r_a$ [12]. The same procedure is followed to determine the other cluster centres until a sufficient number is reached.
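The density-based selection of centres can be sketched as below, following equations (10) and (11) with $r_b = 1.5 r_a$. Two simplifications are assumptions of this example and not part of the paper: the data are taken to be already scaled to the unit hypercube, and the loop stops after a fixed number of centres (`n_centres`) instead of applying an acceptance criterion.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def subtractive_clustering(points, r_a=0.2, n_centres=2):
    """Pick cluster centres by repeatedly taking the densest remaining point."""
    r_b = 1.5 * r_a
    # Equation (10): density measure of every data point.
    density = [
        sum(math.exp(-sq_dist(p, q) / (r_a / 2) ** 2) for q in points)
        for p in points
    ]
    centres = []
    for _ in range(n_centres):
        k = max(range(len(points)), key=density.__getitem__)
        d_c, x_c = density[k], points[k]
        centres.append(x_c)
        # Equation (11): subtract the influence of the newly chosen centre.
        density = [
            d - d_c * math.exp(-sq_dist(p, x_c) / (r_b / 2) ** 2)
            for p, d in zip(points, density)
        ]
    return centres


# Toy data (hypothetical): two tight groups inside the unit square.
points = [
    (0.10, 0.10), (0.12, 0.10), (0.10, 0.12), (0.11, 0.11),
    (0.90, 0.90), (0.92, 0.90), (0.90, 0.92),
]
print(subtractive_clustering(points))
```

Each centre found this way would seed one fuzzy rule, which is why the rule count of SC depends on the data and the radius.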
2.3.2. Fuzzy C-Means Clustering (FCM)
Like SC, FCM [13] is used to reduce the model complexity by reducing the number of membership functions. FCM uses fuzzy partitioning, in which a particular data point belongs to a number of clusters with different degrees of membership. The algorithm is based on optimization of the basic c-means objective function of equation (12):
$$J_m(U, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} \mu_{ij}^{m} \lVert c_i - x_j \rVert^2 \tag{12}$$
Subject to:
$$\sum_{i=1}^{c} \mu_{ij} = 1, \quad 1 \le i \le c, \quad 1 \le j \le n$$
where n is the number of data vectors, c is the number of clusters, and $m > 1$ is used to adjust the weights associated with the membership function. U is the fuzzy partition matrix, which contains the membership of each feature vector for each cluster. The cluster matrix is given by:
$$C = [c_1, c_2, c_3, \dots, c_c] \tag{13}$$
where $c_i$ is a cluster centre of the fuzzy group and can be computed using:
$$c_i = \frac{\sum_{j=1}^{n} \mu_{ij}^{m} x_j}{\sum_{j=1}^{n} \mu_{ij}^{m}}; \quad 1 \le i \le c \tag{14}$$
$\mu_{ij}$ can be obtained using the following equation:
$$\mu_{ij} = \frac{1}{\sum_{k=1}^{c} \left(d_{ij} / d_{kj}\right)^{2/(m-1)}}, \quad \text{for } d_{ij} = \lVert c_i - x_j \rVert \tag{15}$$
FCM is an iterative algorithm which involves a few steps to compute the cluster centres and membership functions.
Step 1: Initialize the membership matrix randomly within the interval [0,1].
Step 2: Use equation (14) to compute the cluster centres.
Step 3: Use equation (12) to compute the cost function; the final value is obtained when there is no significant change between iterations.
Step 4: Using the cluster centres, compute the new $\mu_{ij}$.
Step 5: Repeat steps 2 to 4.
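The steps above can be sketched as a short loop over equations (12), (14) and (15). This is an illustrative implementation under stated assumptions: the tolerance, the iteration cap, the fixed random seed and the handling of a point that coincides with a centre are choices made for the example, not values given in the paper.

```python
import math
import random


def fcm(points, c=2, m=2.0, tol=1e-6, max_iter=100, seed=0):
    """Fuzzy c-means: returns (centres, U) with U[i][j] = membership of point j in cluster i."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Step 1: random memberships in [0, 1], normalised so each column sums to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for j in range(n):
        s = sum(U[i][j] for i in range(c))
        for i in range(c):
            U[i][j] /= s
    prev_cost = float("inf")
    centres = []
    for _ in range(max_iter):
        # Step 2: cluster centres, equation (14).
        centres = []
        for i in range(c):
            w = [U[i][j] ** m for j in range(n)]
            total = sum(w)
            centres.append(
                [sum(w[j] * points[j][d] for j in range(n)) / total for d in range(dim)]
            )
        dist = [[math.dist(centres[i], points[j]) for j in range(n)] for i in range(c)]
        # Step 3: cost function, equation (12); stop when it no longer changes.
        cost = sum(U[i][j] ** m * dist[i][j] ** 2 for i in range(c) for j in range(n))
        if abs(prev_cost - cost) < tol:
            break
        prev_cost = cost
        # Step 4: new memberships, equation (15).
        for j in range(n):
            zero = [i for i in range(c) if dist[i][j] == 0.0]
            if zero:  # point sits exactly on a centre: give it full membership there
                for i in range(c):
                    U[i][j] = 1.0 / len(zero) if i in zero else 0.0
            else:
                for i in range(c):
                    U[i][j] = 1.0 / sum(
                        (dist[i][j] / dist[k][j]) ** (2.0 / (m - 1.0)) for k in range(c)
                    )
    return centres, U


# Toy data (hypothetical): two well-separated square groups.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centres, U = fcm(data, c=2)
print(sorted(centres))
```

Unlike SC, the number of clusters c is fixed in advance here, which is consistent with FCM producing the same number of rules in every case.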
Both algorithms determine the cluster centres and subsequently the membership matrix. This reduces the number of membership functions and therefore speeds up the algorithm while retaining the required accuracy.
3. Short-Term Load Forecasting Implementation
Three different data sets from the Nova Scotia region are used in the forecasting. Nova Scotia is the second-smallest province in Canada, and no part of it is more than 67 km from the ocean; its weather is therefore controlled by the ocean. This makes it difficult to define the exact variables that will affect the load consumption. Of the four seasons, data collected in the spring are considered for this work. Spring runs from the middle of March to the middle of June. Data for three days (Tuesday, Wednesday and Thursday) in each week, from 18th March to 10th June 2014, are used: the first nine weeks for training and the last four weeks for testing. This means two months of data are used for training and the first week of the next month for testing. The data are classified based on the correlation analysis and the hypothesis test. The first class of data comprises all the available data collected from the utility company and the weather station. This covers load consumption, temperature, dew point, relative humidity, wind speed and wind direction. The second class comprises data extracted from the first class, and is built from the correlation analysis: the correlation coefficient is computed for every variable, and any variable with a value less than 0.5 is rejected from the list. The last set is based on the results of the test of hypothesis on the correlation coefficient. A fixed significance value test is used to test the significance of the correlation in the data, and all data with a p-value greater than the α-value are also rejected. The data used are presented in Table 1. Because the load consumption is similar throughout the weekdays, Tuesday, Wednesday and Thursday data are considered in the forecasting.
3.1. Correlation Coefficient and Test of Hypothesis
Generally, determining the actual variables using correlation analysis, and using a hypothesis test to validate the correlation, is good practice. In this study, we used correlation analysis to determine the relationship between the load consumption and the load forecasting variables. The hypothesis test is used to justify the correlation coefficient of every variable against the load. After computing the correlation coefficient R between the load consumption and a forecasting variable, we define the null hypothesis $H_0$ and the alternative hypothesis $H_1$. Now let:
$H_0$ = the correlation between the load consumption and the forecasting variable is by random chance
$H_1$ = the correlation between the load consumption and the forecasting variable is not by random chance.
For this purpose, a Z-test is used to determine the p-value of each forecasting variable using equation (2). We set the α-value to a fixed significance value of 0.0005. If the computed p-value is greater than this value, the null hypothesis cannot be rejected, and the variable is consequently rejected from the forecasting.
Table 1 shows the computed R and p-values and the conclusion deduced from the p-value. Of the tested variables, some are rejected based on the correlation coefficient, because the magnitude of their correlation coefficient is less than 0.4000. For the hypothesis test, some variables are also rejected because their p-values are more than the fixed significance value (α-value); their correlation is therefore attributed to random chance.
Table 1. Correlation Coefficients (R) and Hypothesis Test (p-values) for Training and Testing Data
S/N | Variables | R | p-Value
1 | Current day hourly temperature | -0.6381 | 0.0000
2 | Previous day hourly temperature | -0.6377 | 0.0000
3 | Last two days hourly temperature | -0.6376 | 0.0000
4 | Current day hourly dew point | -0.6355 | 0.0000
5 | Previous day hourly dew point | -0.6736 | 0.0000
6 | Last two days hourly dew point | -0.6925 | 0.0000
7 | Current day hourly relative humidity | -0.2099 | 0.0002
8 | Previous day hourly relative humidity | -0.2517 | 0.0000
9 | Last two days hourly relative humidity | -0.6045 | 0.0000
10 | Current day hourly wind direction | 0.1895 | 0.0008
11 | Previous day hourly wind direction | 0.0095 | 0.8679
12 | Last two days hourly wind direction | 0.1509 | 0.0076
13 | Current day hourly wind speed | 0.5295 | 0.0000
14 | Previous day hourly wind speed | 0.4221 | 0.0000
15 | Last two days hourly wind speed | 0.1940 | 0.0006
16 | Previous day hourly Load | 0.8696 | 0.0000
17 | Last two days hourly Load | 0.7967 | 0.0000
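The two selection rules can be checked directly against the values in Table 1. The sketch below copies R and the p-values from the table; the magnitude threshold of 0.4 on R is an assumption chosen because it reproduces the eleven variables reported for Case 2, and the function names are illustrative.

```python
# (variable, R, p-value) rows copied from Table 1.
TABLE_1 = [
    ("Current day hourly temperature", -0.6381, 0.0000),
    ("Previous day hourly temperature", -0.6377, 0.0000),
    ("Last two days hourly temperature", -0.6376, 0.0000),
    ("Current day hourly dew point", -0.6355, 0.0000),
    ("Previous day hourly dew point", -0.6736, 0.0000),
    ("Last two days hourly dew point", -0.6925, 0.0000),
    ("Current day hourly relative humidity", -0.2099, 0.0002),
    ("Previous day hourly relative humidity", -0.2517, 0.0000),
    ("Last two days hourly relative humidity", -0.6045, 0.0000),
    ("Current day hourly wind direction", 0.1895, 0.0008),
    ("Previous day hourly wind direction", 0.0095, 0.8679),
    ("Last two days hourly wind direction", 0.1509, 0.0076),
    ("Current day hourly wind speed", 0.5295, 0.0000),
    ("Previous day hourly wind speed", 0.4221, 0.0000),
    ("Last two days hourly wind speed", 0.1940, 0.0006),
    ("Previous day hourly Load", 0.8696, 0.0000),
    ("Last two days hourly Load", 0.7967, 0.0000),
]


def select_by_correlation(rows, r_min=0.4):
    """Keep variables whose |R| exceeds r_min (threshold is an assumption)."""
    return [name for name, r, _ in rows if abs(r) > r_min]


def select_by_p_value(rows, alpha=0.0005):
    """Keep variables whose p-value does not exceed alpha."""
    return [name for name, _, p in rows if p <= alpha]


case_2 = select_by_correlation(TABLE_1)
case_3 = select_by_p_value(TABLE_1)
print(len(TABLE_1), len(case_2), len(case_3))
```

The hypothesis-test rule keeps two relative-humidity variables that the correlation threshold discards, which is the practical difference between the second and third data sets.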
3.2. Load Forecasting Using the Developed Model
As stated in Section 3, three sets of data obtained from the correlation analysis and the test of hypothesis are used. Case 1: all the data sets are used without the correlation analysis and hypothesis test. This generates a 216 by 18 data matrix. Case 2: variables whose correlation coefficients have a magnitude greater than 0.4 are considered; thus, 11 variables are selected from those of Case 1. Case 3: variables whose p-values do not exceed the significance level of 0.0005 are considered; thus, 13 variables are selected from those of Case 1. ANFIS is applied to forecast the load using the data of these three cases, in order to compare their accuracy. SC and FCM are used to generate the fuzzy rules.
ANFIS was used to forecast the load in these three case scenarios. It was chosen because it tolerates the uncertainties in large, noisy data, is good at pattern learning, and is computationally effective [14, 15]. To simplify the mathematical difficulties, two techniques for reducing the number of fuzzy rules are considered: the first model uses FCM and the second uses SC. Besides the accuracy, these methods also improve the forecasting time. The results obtained for both methods and the three cases are described in Section 4.
The three experiments (cases) were carried out on a Windows 8.1, 64-bit computer with a Core i5 processor at 1.70 GHz and 8 GB of RAM. The results obtained are presented below and in Table 2.
Figure 3. Plot of actual and forecasted load using SC for all data
Figure 4. Plot of actual and forecasted load using FCM for all data
Figure 5. Plot of actual and forecasted load using SC for correlation data
Figure 6. Plot of actual and forecasted load using FCM for correlation data
Axes in Figures 3 to 6: Time (hrs) against Actual Load (MW) and Forecasted Load (MW).
4. Results Discussion
For all three cases, Mean Square Error (MSE), Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE) are used to evaluate the accuracy of the experiment.
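For reference, the three error measures can be computed as in the sketch below. The definitions are the standard ones; the function names and the sample values are illustrative and are not taken from the experiments.

```python
import math


def mse(actual, forecast):
    """Mean Square Error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root Mean Square Error (square root of the MSE)."""
    return math.sqrt(mse(actual, forecast))


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical hourly loads in MW.
actual = [100.0, 200.0]
forecast = [110.0, 180.0]
print(mse(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

Because RMSE is the square root of MSE, the two always rank the models identically, while MAPE weights each error by the size of the actual load and can rank them differently.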
Case 1: Here all the data sets presented in Table 1 are used. Figure 3 shows the graph of actual and predicted load for SC, and Figure 4 for FCM. For SC, an MSE of 1785.95, RMSE of 42.26 and MAPE of 3.30% are obtained. For the FCM algorithm, the forecast gives an MSE of 1734.40, RMSE of 41.65 and MAPE of 3.45%.
Case 2: Here the data are selected based on the results of the correlation coefficients. Of the 17 tested variables, 6 are rejected because the magnitude of their correlation coefficients is less than 0.4000 (refer to Table 1). Figure 5 shows the plot of actual and predicted load for SC, and Figure 6 for FCM. For SC, an MSE of 1140.13, RMSE of 33.77 and MAPE of 3.01% are obtained. For the FCM algorithm, an MSE of 1328.95, RMSE of 36.45 and MAPE of 3.15% were obtained.
Case 3: The data used in this case are selected based on the results of the correlation coefficients and the hypothesis test (refer to Table 1). Figure 7 shows the plot of actual and predicted load for SC, and Figure 8 for FCM. An MSE of 613.89, RMSE of 24.78 and MAPE of 2.19% are obtained for SC. For FCM, an MSE of 816.34, RMSE of 28.57 and MAPE of 2.23% were obtained.
Also, Table 2 shows the number of rules generated and the speed of convergence in each case. For the SC algorithm, Case 1 produces the highest number of rules and Case 2 the lowest. For FCM, the number of rules is the same in all the cases and lower than that of SC. The longest computation time is recorded in Case 1 using SC, while FCM records the shortest computation time in all the cases.
Figure 7. Plot of actual and forecasted load using SC for hypothesis data
Figure 8. Plot of actual and forecasted load using FCM for hypothesis data
Table 2. Performance evaluation and comparison of the three cases
Data used | Fuzzy rules generation method | Number of fuzzy rules generated | MSE | RMSE | MAPE (%) | Computation time (sec)
Case 1 | FCM | 15 | 1734.40 | 41.65 | 3.45 | 27.65
Case 1 | SC | 184 | 1785.95 | 42.26 | 3.30 | 12258.83
Case 2 | FCM | 15 | 1328.95 | 36.45 | 3.15 | 22.03
Case 2 | SC | 68 | 1140.13 | 33.77 | 3.01 | 472.30
Case 3 | FCM | 15 | 816.34 | 28.57 | 2.23 | 29.56
Case 3 | SC | 107 | 613.89 | 24.78 | 2.19 | 2195.33
5. Conclusion
This work investigates the effect of data selection based on statistical analysis for short-term load forecasting. Two ANFIS rule generation algorithms are tested for accuracy and speed. It was observed that SC gives more accurate results, at lower speed, than FCM. Three case scenarios are investigated based on the data used. In Case 1, all the available data, comprising seventeen variables, are used. In Case 2, eleven variables are used based on their correlation coefficients. In Case 3, thirteen variables are considered based on the hypothesis test on the correlation coefficients. Case 3 gave the lowest errors with both algorithms (Table 2); therefore, a hypothesis test on the correlation analysis of the forecasting data is good practice.
Acknowledgements
The authors acknowledge the Malaysian Ministry of Higher Education and Universiti Teknologi Malaysia for supporting this work.
References
[1] A Jain, B Satish. Clustering based Short Term Load Forecasting using Support Vector Machines. In Proceeding of 2009 IEEE Bucharest Power Tech. 2009: 1-8.
[2] A Chakravorty, C Rong, P Evensen, T Wlodarczyk, Wiktor. A Distributed Gaussian-Means Clustering Algorithm for Forecasting Domestic Energy Usage. International Conference on Smart Computing. 2014: 229-236.
[3] S Mirasgedis, Y Sarafidis, E Georgopoulou, DP Lalas, M Moschovits, F Karagiannis, D Papakonstantinou. Models for mid-term electricity demand forecasting incorporating weather influences. ENERGY. 2006; 31: 208-227.
[4] J Zhu. The Optimization Selection of Correlative Factors for Long-term power load Forecasting. IEEE Fifth Int. Conf. Intell. Human-Machine Syst. Cybern. 2013: 241–244.
[5] Y Chen, PB Luh, C Guan, Y Zhao, LD Michel, MA Coolbeth, PB Friedland, SJ Rourke, S Member. Short-Term Load Forecasting: Similar Day-Based Wavelet Neural Networks. IEEE Trans. Power Syst. 2010; 25(1): 322–330.
[6] N Sovann, P Nallagownden, Z Baharudin. A method to determine the input variable for the neural network model of the electrical system. 2014 5th Int. Conf. Intell. Adv. Syst. 2014: 1-6.
[7] H Quan, D Srinivasan, A Khosravi. Short-Term Load and Wind Power Forecasting Using Neural Network-Based Prediction Intervals. IEEE Trans. Neural Networks Learn. Syst. 2014; 25(2): 303-315.
[8] FL Quilumba, W Lee, H Huang, DY Wang, S Member, RL Szabados. Using Smart Meter Data to Improve the Accuracy of Intraday Load Forecasting Considering Customer Behavior Similarities. IEEE Trans. Smart Grid. 2015; 6(2): 911-918.
[9] DC Montgomery, GC Runger. Applied Statistics and Probability for Engineers. Third Edition. USA: WILEY. 2002.
[10] JR Jang. ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Trans. Syst. Man Cybern. 1993; 23(3): 665-685.
[11] Z Feng, B Zhang. Fuzzy Clustering Image Segmentation Based on Particle Swarm Optimization. TELKOMNIKA Telecommunication Computing Electronics and Control. 2015; 13(1): 128-136.
[12] SL Chiu. Fuzzy Model Identification Based on Cluster Estimation. J. Intell. Fuzzy Syst. 1994; 2: 267-278.
[13] JC Bezdek, R Ehrlich, W Full. FCM: The Fuzzy C-Means Clustering Algorithm. Comput. Geosci. 1984; 10(2): 191-203.
[14] A Azriyenni, MW Mustafa. Application of ANFIS for Distance Relay Protection in Transmission Line. Int. J. Electr. Comput. Eng. 2015; 5(6).
[15] PK Pandey, Z Husain, RK Jarial. ANFIS Based Approach to Estimate Remnant Life of Power Transformer by Predicting Furan Contents. Int. J. Electr. Comput. Eng. 2014; 4(4): 463-470.