Indonesian Journal of Electrical Engineering and Computer Science
Vol. 2, No. 2, May 2016, pp. 315 ~ 327
DOI: 10.11591/ijeecs.v2.i2.pp315-327

Received December 18, 2015; Revised February 5, 2016; Accepted February 26, 2016
Failure Analysis and Reliability Study of NAND Flash-
Based Solid State Drives
R Sayyad, Sangram Redkar*
The Polytechnic School, Arizona State University
Mesa, Arizona 85212, USA
*Corresponding author, e-mail: sredkar@asu.edu
Abstract
This research focuses on conducting a failure analysis and reliability study to understand and analyze the root cause of Quality and Endurance component Reliability Demonstration Test (RDT) failures and to determine SSD performance capability. It addresses essential challenges in developing techniques that utilize solid-state memory technologies (with emphasis on NAND flash memory) from device, circuit, architecture, and system perspectives. These challenges include not only the performance degradation arising from the physical nature of NAND flash memory, e.g., the inability to modify data in-place, read/write performance asymmetry, and slow and constrained erase functionality, but also the reliability drawbacks that limit Solid State Drive (SSD) performance. In order to understand the nature of failures, a Fault Tree Analysis (FTA) was performed that identified the potential causes of component failures. In the course of this research, a significant data gathering and analysis effort was carried out that led to a systematic evaluation of the components under consideration. The approach used here to estimate reliability utilized a sample of drives to reflect the reliability parameters (RBER, AFR, and MRR) over 1 year. It is anticipated that this study can provide a methodology for future reliability studies leading to systematic testing and evaluation procedures for SSD RDTs and critical components.
Keywords: Performance Analysis, Reliability, Solid State Devices
Copyright © 2016 Institute of Advanced Engineering and Science. All rights reserved.
1. Introduction
Many situations encountered in the engineering world have a chance element associated with them, such as variation in material properties, physical environments, loads, power and signal inputs. Some of the basic physical laws can be treated as deterministic: water seeks the lowest accessible level, objects float when they displace their own weight in liquid, etc. But many physical laws are probabilistic in nature. Since the beginning of history, humanity has attempted to predict the future by watching the flight of birds and the movement of leaves on the trees, to mention a few. Fortunately, today’s engineers do not have to depend on a ‘crystal ball’ in order to predict the future of their products. Through the use of statistical tools for product life data analysis, reliability engineers can determine the probability and durability of components and systems to perform their required functions for the desired life span without failure. The product life data can be measured in hours, miles, cycles-to-failure, stress cycles or any other metric with which the life or exposure of a product can be measured.
A solid-state drive (SSD) is a data storage device using integrated circuit assemblies as memory to store data persistently. SSD technology uses electronic interfaces compatible with traditional block input/output (I/O) hard disk drives, thus permitting simple replacement in common applications.
The purpose of this research is to determine the performance capability of SSD NAND Flash by conducting Regression, Quality and Endurance Reliability Demonstration Tests from a reliability standpoint, to assure the early identification of potential problems related to validation, firmware and NAND. The objectives are as follows:
1. Conduct a literature review on the methods used to perform a reliability study.
2. Gather failure data from the database for tests conducted.
3. Apply statistical tools and reliability methodologies to the failure data.
4. Analyze the results, determine the major causes of failure, and predict the reliability of the component.
ISSN: 2502-4752
IJEECS Vol. 2, No. 2, May 2016 : 315 – 327
Quality and Reliability of an SSD includes initial product risk assessment, quality and reliability testing design, planning and scheduling, testing execution, testing result rollup and result interpretation. There are three types of failures observed in solid state drives:
1) Bricked Drive: Drive is undetectable by the system.
2) Detected data error: These errors are detected using a CRC algorithm.
3) Undetected data error: These errors go undetected, resulting in corrupt data.
The SSD drive reliability is characterized in terms of the following parameters.
Annualized failure rate (AFR): It is the fraction of drives that fail in the field, per year. Manufacturers report 0.6-2.5% [1], but field data can be higher at 1.7-8.6% [2].
Mean time between failures (MTBF): This is the time elapsed between 2 consecutive field failures and is equivalent to 1/AFR.
Uncorrectable bit error rate (UBER): Number of data errors per bit read from the drive. Typically 1 in 10^13 to 10^16 for HDDs.
Undetected bit error rate: Number of undetected errors per bit read from the drive. The requirement can be the same as uncorrectable for consumer drives, but enterprise requirements can be more stringent.
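As a quick illustration of the MTBF = 1/AFR relation stated above, the following minimal sketch converts an AFR (given as a fraction per year) into an MTBF in years and hours. The function names and the 8760 hours-per-year constant are assumptions introduced here for illustration; the AFR values are the ranges quoted in the text.

```python
HOURS_PER_YEAR = 8760  # assumption: 365 days * 24 hours, ignoring leap years


def mtbf_years(afr: float) -> float:
    """Return MTBF in years for an AFR given as a fraction (0.025 means 2.5%)."""
    if afr <= 0:
        raise ValueError("AFR must be positive")
    return 1.0 / afr


def mtbf_hours(afr: float) -> float:
    """Return MTBF in hours, using the hours-per-year assumption above."""
    return mtbf_years(afr) * HOURS_PER_YEAR


# AFR ranges quoted in the text: manufacturer figures [1] and field data [2].
for label, afr in [("manufacturer, low", 0.006), ("manufacturer, high", 0.025),
                   ("field, low", 0.017), ("field, high", 0.086)]:
    print(f"{label}: AFR={afr:.1%}, MTBF={mtbf_years(afr):.1f} years")
```

Under this relation, a 2.5% AFR corresponds to an MTBF of about 40 years, which describes a population of drives rather than the expected life of any single drive.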
2. NAND/SSD Reliability
It can take a long time in development to get an SSD to the point where it is highly reliable, and there is a strong need to balance reliability, cost, and complexity.
Figure 1. 3-D Structure of the transistor
The main reasons for NAND/SSD failures are as follows.
2.1. Charge Storage: Program and Erase
Approximately 90% of an SSD core is a memory array. A memory cell is a single transistor with gate (shown in Figure 1), source, drain, and channel. The NAND organization is shown in Figure 2.
Figure 2. Structure of NAND Flash
In the memory cell, a “floating gate” (FG) is put below the control gate (CG). Negative charge on the FG opposes the CG voltage, pushing the threshold voltage (VT) higher. Positive charge on the FG aids the CG voltage, pushing the VT lower. NAND stores data by placing different cells at different VT’s (as shown in Figure 3). Sensing circuitry detects the cell current and returns a 0 or a 1. Programming means injecting electrons into the FG: a high CG voltage pulls electrons up. This is governed by quantum-mechanical tunneling through the “tunnel oxide” as shown in Figure 4A. Erase means removing electrons from the FG by tunneling in the reverse direction.
2.2. Array Distributions and Multi-Level Flash
Single-level-cell (SLC) NAND has cells divided into two VT levels (erased and programmed, c.f. Figure 3). The FG charge varies from cell to cell, so that there are distributions of VT’s for the programmed and erased states. Multi-level-cell (MLC) Flash has four levels. Sensing circuitry can distinguish 1’s from 0’s as long as the distributions don’t cross the read points. The width of any level is called the state width. The space in between is the remaining read window budget.
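The role of the read points can be sketched in a few lines of code. The example below is only an illustration: the read-point voltages are hypothetical values chosen for readability, not figures from this study, and the function names are introduced here. It shows how a four-level (MLC) cell is assigned a level from its VT, and how a cell whose VT drifts across a read point is sensed as the wrong level.

```python
from bisect import bisect_right

# Hypothetical read points (volts) separating L0 | L1 | L2 | L3.
# These values are illustrative assumptions, not measured data.
READ_POINTS = [0.0, 1.5, 3.0]


def sense_level(vt: float, read_points=READ_POINTS) -> int:
    """Return the level index (0..3) that sensing would report for a given VT."""
    return bisect_right(read_points, vt)


def margin_to_read_point(vt: float, read_points=READ_POINTS) -> float:
    """Distance from VT to the nearest read point (a per-cell view of read window)."""
    return min(abs(vt - rp) for rp in read_points)


# A cell programmed to L2 reads correctly while its VT stays between 1.5 V and 3.0 V.
print(sense_level(2.2))   # level 2
# If charge loss pulls the same cell below the 1.5 V read point, it is misread as L1.
print(sense_level(1.4))   # level 1
```

The narrower the state widths, the larger the space left between levels, and the more VT shift a cell can tolerate before it crosses a read point.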
Figure 3. Array Distribution and Multi-Level Flash
Figure 4. A. Quantum Tunneling in NAND Flash
2.3. Degradation of Tunnel Oxide
In a typical memory cell, one program operation is followed by an erase, and this is called a program/erase cycle. As NAND is used, it can undergo many cycles. Cycling is a harsh stress-inducing factor on the tunnel oxide. The magnitude of the electric field in the tunnel oxide is very large (> 10 million volts per cm). Owing to these factors (inclusive of the thin oxide layer), a lot of electrons pass through the oxide (tunneling). As a result, the oxide degrades, which results in broken atomic bonds, both in the bulk and at the interface. Broken-bond sites can trap electrons that pass through, becoming electrically negative and causing tunnel oxide degradation (as shown in Figure 4B).
Figure 4. B. Degradation of Tunnel Oxide
Negative charge in the tunnel oxide makes erase slower and program faster. This is attributed to two effects that take place at the gate-substrate interfaces:
• Electrons in the tunnel oxide repel channel electrons and raise the VT, which helps programming and hurts erase.
• Electrons in the tunnel oxide repel tunneling electrons and hurt both program and erase.
As a result, erase slows down. Program has competing effects, the net being a slight speedup. When a block erases slower than the datasheet limit, it fails and is retired. NAND datasheets allow 2% to 4% of blocks to fail in this way.
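The block retirement rule described above can be sketched as a simple check. This is a minimal illustration under stated assumptions: the `Block` record, the function names, and the erase-time limit used in the usage lines are hypothetical; only the 2% to 4% allowance comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: int
    erase_time_ms: float   # measured erase time for this block
    retired: bool = False


def retire_slow_blocks(blocks, max_erase_ms: float) -> float:
    """Retire blocks that erase slower than the limit; return the retired fraction."""
    for block in blocks:
        if block.erase_time_ms > max_erase_ms:
            block.retired = True
    return sum(block.retired for block in blocks) / len(blocks)


def within_allowance(retired_fraction: float, allowed_fraction: float = 0.02) -> bool:
    """Check the retired fraction against the datasheet allowance (2% to 4% in the text)."""
    return retired_fraction <= allowed_fraction


# Hypothetical example: 100 blocks, one of which erases slower than a 10 ms limit.
blocks = [Block(i, 2.0) for i in range(99)] + [Block(99, 12.0)]
fraction = retire_slow_blocks(blocks, max_erase_ms=10.0)
print(fraction, within_allowance(fraction))
```

Because erase slows down with cycling, the retired fraction computed this way would be expected to grow over the life of the device.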
2.4. Write Errors
Bit errors occur during write for the following reasons. Some cells that are supposed to stay erased end up programmed by mistake (“program disturb”, PD). Some cells that are supposed to program to (say) L1 instead program to L2 (“overprogramming”, OP). Some cells that are supposed to program never make it to their intended VT level (“underprogramming”), as shown in Figure 5. All these things get worse with cycling, because the trapped-up tunnel oxide gets worse with the number of cycles.
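The three write-error categories above differ only in how the level a cell ends up at compares with the level it was meant to reach. The sketch below expresses that comparison for a four-level cell; the function name and the integer level encoding (0 for erased L0, 1 to 3 for L1 to L3) are assumptions made for illustration.

```python
def classify_write_error(intended: int, actual: int) -> str:
    """Classify a write outcome for an MLC cell with levels 0 (erased) to 3."""
    if not (0 <= intended <= 3 and 0 <= actual <= 3):
        raise ValueError("levels must be in the range 0..3")
    if actual == intended:
        return "ok"
    if intended == 0:
        # The cell was supposed to stay erased but picked up charge.
        return "program disturb"
    if actual > intended:
        # The cell overshot its target level, e.g. L1 intended, L2 reached.
        return "overprogramming"
    # The cell never reached its intended VT level.
    return "underprogramming"


print(classify_write_error(0, 1))  # program disturb
print(classify_write_error(1, 2))  # overprogramming
print(classify_write_error(2, 1))  # underprogramming
```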
Figure 5. Underprogramming Write Error
2.5. Data Retention Errors
NAND is expected to retain data with the power off. There are two intrinsic mechanisms that cause programmed cells to lose VT, primarily post-cycling:
De-trapping: The negative charge trapped in the tunnel oxide detraps from the oxide. This is also called intrinsic charge loss (ICL).
SILC: Electrons tunnel off the FG through broken bonds in the oxide. This is called stress-induced leakage current (SILC), also single bit charge loss (SBCL).
These mechanisms have different characteristics:
De-trapping is a thermal effect with very little dependence on electric field.
1. Affects all three programmed levels, but mostly L1 & L2 because L3 usually has more margin.
2. Accelerated by high temperature bake (standard qualification test is 10 hours at 125 °C).
SILC, like all tunneling, is driven by electric field, not temperature.
1. Affects mostly L3, because L3 cells have the highest built-in electric field (programmed the highest).
2. High-temperature bakes, the standard way of accelerating data loss, have no effect.
Both depend on traps in the tunnel oxide, so the retention time gets worse with cycling. Erasing/writing a block resets the retention clock to zero. Data needs to be retained only for the time between writes, not for the entire product lifetime.
2.6. Read Disturb Errors
High Vpass on deselected word lines (WLs) causes reverse SILC: electrons tunneling from the channel to the FG. This phenomenon is electric-field driven, so it mostly causes L0 => L1. As with retention SILC, it worsens post-cycling because more defects are present in the tunnel oxide. As with retention, erasing and rewriting a block restarts the clock.
The failure mechanisms discussed so far are ‘analog’ failures, i.e. the cell VT shifts to a wrong level. Less common but more severe are defects, particularly shorts but sometimes opens. Shorts happen because insulators (dielectrics) break down. These shorts are due to:
• Gate-oxide breakdown
• Interconnect-to-interconnect shorting (as shown in Figure 6)
Interconnect shorts due to particles are the main source of failure for CPUs, so they have been studied extensively. They are caused by the same particles as the ones that cause yield fallout, only smaller or positioned so as not to cause immediate failure. The reliability failure rate is proportional to yield loss.
Figure 6. Oxide Breakdown and Interconnect to Interconnect Shorting
Shorts and opens can also be caused by corrosion of copper metallization. Topside silicon nitride passivation is supposed to keep the moisture out, but it fails when it is scratched or where the passivation is opened at bond pads. A large % of Ephraim field returns were traced to scratches. A number of PV OEM returns appear to be due to the bond pad issue. As with all defects, severity can range from a single WL to total die fail. The amount of scratching can be reduced but not eliminated totally.
3. Failure Analysis (FA)
Failure analysis [3-5] has eight basic steps. These steps are shown in Figure 7.
Step 1: Request Evaluation and Acceptance – Initial Investigation of Failure
A failure analyst needs to know what the goal is for analyzing this failure. What is the background for the failure, meaning where, when, and how it failed, and why it failed based on the known information? What is the failure rate for this particular failure mode? Is it the first time such a failure has been seen? What are the hypotheses for root cause(s) and driving factors? Are the samples capable of answering the questions, or are critical samples missing? The analysis should not commence until the analyst has a clear picture of what needs to be done to accomplish the goal and has enough background information.
Figure 7. Failure Analysis Flow
Step 2: Electrical Failure Verification
During this step, electrical testing is performed to confirm if the failure can be detected under the outgoing production test program, internal quality assurance test program, or engineering evaluation test program. Each test program has a different guard-band for various test parameters, and can provide useful information on whether the device failed marginally, failed across all conditions, or was even a test escape.
Step 3: Electrical Fault Diagnostic and Isolation
This is the most difficult and time-consuming step of FA work, and requires a lot of expertise from the analyst to understand the electrical phenomena and, in turn, to translate them into the physical phenomena. Some of these failures are shown in Figs. 8-11. To give a more comprehensive understanding of electrical failures, some basic failure types, and typical causes for these failure types, are listed in Table 1:
Table 1. Cause and Effect of Electrical Failures
Typical Failure Types:
o Parametric Hard Failures
o Parametric Soft Failures
o Functional Hard Failures
o Functional Soft Failures
o Recoverable Failures
Typical failure modes for the first four failure types include:
- Pin opens, shorts and leakage
- High Icc, Iccsb under different test modes
- Pin to pin short
- Incorrect logic input/output
- Stuck-at fault
- Speed and timing issues
- Analog parameter drift
- Memory array
  • Whole array or partial array
  • Particular sector or a group of sectors
  • Particular column(s) or row(s)
Typical causes for such failures include:
- Package related:
  • Poor wire bonds, wire loops
  • Package cracking, die cracking
  • Package defects in molding compound
  • Damaged or corroded bond pads
  • Moisture trapping due to reliability stress tests
  • Improper laser marking damage
  • Package or die surface contamination
  • Damaged circuit due to sharp-edged fillers
  • Hairline die cracks or die surface scratches
  • Damaged or partially damaged package connections
- Device related:
  • Electrostatic Discharge (ESD)
  • Electrical Over Stress (EOS)
  • Charge trapping due to minor EOS/ESD or hot carrier injection
  • Fabrication process error or random defect
Figure 8. A random fab defect caused two adjacent metal lines
Figure 9. Optical and SEM photos showed some Cu trace
Step 4: Physical Failure Site Isolation
For physical failure site isolation, a two-step approach is recommended.
Non-destructive analysis step: In this step, the following analyses are typically performed:
• Package level visual inspection
• Package wire-bond conditions analysis
• Package interface delamination analysis
Destructive analysis step: In this step, the following analyses are typically performed:
• Package decapsulation for physical fault isolation
• Laser and FIB cuts for failure site isolation
• Delayering (deprocessing) for defect searching
• Mechanical cross-section through failure site
Physical fault isolation is carried out based on the electrical failure modes. If the failure mode indicates a possible package related failure, the tools and techniques presented in Table 2 can be used for fault isolation.
Table 2. Methods for Physical Failure Site Isolation
Optical inspection
Scanning Acoustic Microscopy (SAM)
Real time, 3D X-ray inspection
Time Domain Reflection (TDR) analysis
Physical/mechanical probing
Dye penetration analysis
Decapsulation and visual inspection
Mechanical cross-section
Figure 10. Real time X-ray showed a sagging
Step 5: Physical Deprocessing (or Defect Localization)
After electrical and physical fault isolation, a decision needs to be made on whether physical deprocessing is required. For failures that have similar failure modes and failure sites to previously known failures, a signature analysis should always be considered to conclude the analysis results without going through the full physical deprocessing. Signature analysis means that a new sample has the same set of failure modes and the same environmental conditions as other devices that received a comprehensive failure analysis. Using signature analysis helps shorten the turn-around time for analysis and avoids wasting FA resources on repeated failures. Suggested physical deprocessing methods (in chronological order) are presented in Table 3.
Table 3. Physical Deprocessing
Chemical/Mechanical parallel lapping down to the level right above the failing layer
Optical and SEM inspection at the suspected area
Using binary search or passive voltage contrast (PVC)
Perform FIB or TEM cross-section through the identified defective site for more details about the defects
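The signature analysis described in Step 5 amounts to comparing a new sample against previously analyzed failures. The sketch below is one possible way to express that comparison, not the procedure used in this study: the `Signature` record, the function name, and the example entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Signature:
    failure_modes: frozenset   # set of electrical failure modes observed
    conditions: frozenset      # environmental conditions under which it failed


def match_signature(sample: Signature, known: dict) -> Optional[str]:
    """Return the name of a known failure with an identical signature, else None."""
    for name, signature in known.items():
        if sample == signature:
            return name
    return None


# Hypothetical library of failures that already received a full analysis.
known_failures = {
    "case-A": Signature(frozenset({"pin to pin short", "high Icc"}),
                        frozenset({"post-cycling", "high temperature"})),
}

new_sample = Signature(frozenset({"high Icc", "pin to pin short"}),
                       frozenset({"high temperature", "post-cycling"}))
print(match_signature(new_sample, known_failures))  # matches case-A
```

A match suggests that full physical deprocessing may be skipped for the new sample; a result of `None` means the sample proceeds through the methods listed in Table 3.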
Figure 11. A Si defect was found under the failing cell with TEM cross-section analysis
Step 6: Defect Analysis and Characterization
Knowing the defect is just as important as finding the defect. Advanced technology has hundreds of process steps, and knowing the step at which the defect was introduced will lead the process team right to the core of the problem for corrective actions. TEM is a daily failure analysis tool for defect examination and characterization.
Step 7: Root Cause Identification
Finding a physical defect provides a critical piece of the puzzle of what the true root cause of the problem is. Along with failure background evaluation, a fabrication process history check and other critical information related to the failure, a root cause can be identified; corrective actions will then be taken, and preventive actions shall be implemented.
Step 8: Feedback the Results for Corrective Actions
The FA process is a closed loop. Providing information feedback for corrective actions is meant to stop the same kind of failures from being submitted to the FA lab for analysis again.
4. Reliability Demonstration Testing (RDT) and Results
Reliability testing was conducted on Enterprise drives. Enterprise drives have consistent performance and greater endurance with additional raw capacity. Enterprise drives require custom applications for specific needs and can be comprised of several SSDs. The Joint Electron Device Engineering Council (JEDEC) Solid State Technology Association, a semiconductor trade and engineering standardization organization, defines the JEDEC 218 and 219 standards for SSD reliability tests. Table 4 shows the test conditions for the client and enterprise drives.
Table 4. Classification of Client and Enterprise Drives
Endurance RDT is used to check on the NAND capability of the drive. The test was carried out for High Endurance NAND. The drives were short stroked and only 49% of the actual drive area was used to analyze the complete NAND. These results are further used to extrapolate to the full capacity of the NAND. The well defined standards along with SSDs of mixed density were used as the sample size for the Endurance RDT. The tests started by first defining the flow and the workload, i.e. the amount of Terabytes that needs to be written on the SSD, followed by prepping of the SSDs and a final stage of debugging the failures, if any, and taking corrective actions.
The reliability parameters [7-9] under study for checking on the failures were:
Moving Read Reference (MRR): an error-avoidance or error-recovery scheme that moves the read trip points in response to shifts in distributions.
Raw Bit Error Rate (RBER): Probability of a bit being erroneous without use of any error correction techniques.
Defects per million (DPM): Attributed to a single failure mechanism (for a given component) without a corrective action plan in place to resolve defects.
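Two of the parameters above reduce to simple ratios, which the following sketch computes. It assumes the usual estimators (observed raw bit errors divided by bits read, and defective units scaled to a population of one million); the function names and the sample counts in the usage lines are hypothetical and are not results from this study.

```python
def raw_bit_error_rate(bit_errors: int, bits_read: int) -> float:
    """Estimate RBER as raw bit errors (before error correction) per bit read."""
    if bits_read <= 0:
        raise ValueError("bits_read must be positive")
    return bit_errors / bits_read


def defects_per_million(defective_units: int, total_units: int) -> float:
    """Scale the defective fraction for one failure mechanism to one million units."""
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return defective_units / total_units * 1_000_000


# Hypothetical counts, for illustration only.
print(raw_bit_error_rate(bit_errors=5, bits_read=10**9))
print(defects_per_million(defective_units=3, total_units=1500))
```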
The Endurance RDT was divided into 3 legs: Read disturb at High and Room Temperature, and No read disturb at High Temperature. The flow started by defining the density