ch6 (ANN Filter Banks).ppt

2.5.4.1 Basics of Neural Networks
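[Figure: a single computational element with inputs $x_0, x_1, x_2, \ldots, x_{N-1}$ and output $y$]

$$y = f\left(\sum_{i=0}^{N-1} W_i x_i - \phi\right)$$

where $f$ is a nonlinearity and $\phi$ is an internal threshold (offset).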
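The computational element on this slide forms a weighted sum of its inputs, subtracts a threshold, and passes the result through a nonlinearity. A minimal sketch of that computation follows; the logistic sigmoid, the function name `neuron_output`, and the sample numbers are assumptions chosen for illustration, since the slide does not fix a particular nonlinearity.

```python
import numpy as np

def neuron_output(x, w, phi=0.0, f=None):
    """Return y = f(sum_i w[i] * x[i] - phi) for one computational element."""
    if f is None:
        # Assumption: a logistic sigmoid; the slide leaves f unspecified.
        f = lambda a: 1.0 / (1.0 + np.exp(-a))
    activation = np.dot(w, x) - phi
    return f(activation)

# Illustrative inputs and weights (hypothetical values).
x = np.array([1.0, 0.5, -0.5])
w = np.array([0.2, 0.4, 0.6])
y = neuron_output(x, w, phi=0.1)  # weighted sum equals the threshold, so y is about 0.5
```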
2.5.4.2 Neural Network Topologies
TDNN (Time-Delay Neural Network)
2.5.4.6 Neural Network Structures for Speech Recognition
3.1.1 Spectral Analysis Models
3.2 THE BANK-OF-FILTERS FRONT-END PROCESSOR
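3.2.1 Types of Filter Bank Used for Speech Recognition
Uniform filter bank:

$$f_i = \frac{F_s}{N}\, i, \qquad 1 \le i \le Q, \qquad Q \le N/2$$

$$b_i \ge \frac{F_s}{N}$$

where $f_i$ is the center frequency and $b_i$ the bandwidth of the $i$-th filter, $F_s$ is the sampling rate, and $Q$ is the number of filters actually used out of $N$ uniformly spaced positions.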
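In a uniform filter bank the center frequencies are spaced at multiples of the sampling rate divided by N, with at most N/2 channels in use. The sketch below lists those center frequencies and takes the channel bandwidth to be exactly one spacing, the value at which adjacent channels just touch. The function name and the values 8000 Hz, N = 32, and Q = 15 are illustrative assumptions.

```python
def uniform_filter_bank(fs, n, q=None):
    """Center frequencies f_i = i * fs / n for 1 <= i <= q, with q <= n / 2."""
    if q is None:
        q = n // 2
    if q > n / 2:
        raise ValueError("q must not exceed n / 2")
    spacing = fs / n
    centers = [i * spacing for i in range(1, q + 1)]
    bandwidth = spacing  # adjacent channels just touch at this bandwidth
    return centers, bandwidth

# Hypothetical configuration: 8 kHz sampling rate, N = 32, Q = 15 channels.
centers, bandwidth = uniform_filter_bank(fs=8000.0, n=32, q=15)
# centers run from 250.0 Hz to 3750.0 Hz in 250 Hz steps
```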
Nonuniform Filter Banks

$$b_1 = c$$

$$b_i = \alpha\, b_{i-1}, \qquad 2 \le i \le Q$$

$$f_i = f_1 + \sum_{j=1}^{i-1} b_j + \frac{b_i - b_1}{2}$$
Nonuniform Filter Banks: example
Filter 1: $f_1 = 300$ Hz, $b_1 = 200$ Hz
Filter 2: $f_2 = 600$ Hz, $b_2 = 400$ Hz
Filter 3: $f_3 = 1200$ Hz, $b_3 = 800$ Hz
Filter 4: $f_4 = 2400$ Hz, $b_4 = 1600$ Hz
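The four filters listed above follow a simple recursion: each bandwidth is a constant factor times the previous one, and each center frequency equals the first center frequency plus the sum of all preceding bandwidths plus half the bandwidth increase relative to the first filter. The sketch below reproduces the listed values under the assumption that the growth factor is 2 (inferred from the doubling bandwidths); the function and parameter names are hypothetical.

```python
def nonuniform_filter_bank(f1, c, alpha, q):
    """Bandwidths b_i = alpha * b_(i-1) with b_1 = c, and the matching center frequencies."""
    bandwidths = [c]
    for _ in range(1, q):
        bandwidths.append(alpha * bandwidths[-1])
    centers = []
    for i in range(q):  # i is zero-based, so this is filter number i + 1
        preceding = sum(bandwidths[:i])
        centers.append(f1 + preceding + (bandwidths[i] - bandwidths[0]) / 2)
    return centers, bandwidths

centers, bandwidths = nonuniform_filter_bank(f1=300.0, c=200.0, alpha=2.0, q=4)
# centers    -> [300.0, 600.0, 1200.0, 2400.0]
# bandwidths -> [200.0, 400.0, 800.0, 1600.0]
```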
3.2.2 Implementations of Filter Banks

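Instead of direct convolution with each channel filter, which is computationally expensive, we assume that each bandpass filter impulse response can be represented by:

$$h_i(n) = w(n)\, e^{j\omega_i n}$$

where $w(n)$ is a fixed lowpass filter and $\omega_i$ is the center frequency of the $i$-th channel.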
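Writing each bandpass filter as a fixed lowpass filter modulated to the channel's center frequency means that one lowpass filter can serve every channel: convolving with the modulated filter gives the same result as shifting the signal down by the center frequency, lowpass filtering, and shifting back up. The sketch below checks this numerically. The Hamming window, the 8 kHz sampling rate, the 1 kHz channel, and the random test signal are all assumptions made for the illustration.

```python
import numpy as np

def channel_direct(s, w, omega):
    """Convolve s(n) with the complex bandpass filter h(n) = w(n) * exp(j*omega*n)."""
    n = np.arange(len(w))
    h = w * np.exp(1j * omega * n)
    return np.convolve(s, h)

def channel_via_lowpass(s, w, omega):
    """Shift s(n) down by omega, lowpass filter with w(n), then shift back up."""
    m = np.arange(len(s))
    shifted = s * np.exp(-1j * omega * m)
    lowpassed = np.convolve(shifted, w)
    n = np.arange(len(lowpassed))
    return lowpassed * np.exp(1j * omega * n)

rng = np.random.default_rng(0)
s = rng.standard_normal(256)          # stand-in for a speech segment
w = np.hamming(32)                    # assumed lowpass window
omega = 2 * np.pi * 1000.0 / 8000.0   # 1 kHz channel at an assumed 8 kHz sampling rate

direct = channel_direct(s, w, omega)
via_lowpass = channel_via_lowpass(s, w, omega)
outputs_match = np.allclose(direct, via_lowpass)
```

Only the modulation frequency changes from channel to channel, which is what makes an FFT-based implementation of a uniform bank possible.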
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
Linear Filter Interpretation of the STFT
[Block diagram: $s(n)$ is multiplied by $e^{-j\omega_i n}$ to give $\tilde{s}(n)$, which is filtered by $w(n)$ to produce $S_n(e^{j\omega_i})$]
3.2.2.4 FFT Implementation of a Uniform Filter Bank
Direct implementation of an arbitrary filter bank
[Block diagram: $s(n)$ feeds $Q$ parallel filters $h_1(n), h_2(n), \ldots, h_Q(n)$, producing the outputs $X_1(n), X_2(n), \ldots, X_Q(n)$]
3.2.2.5 Nonuniform FIR Filter Bank Implementations
3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks
3.2.4 Practical Examples of Speech-Recognition Filter Banks
3.2.5 Generalizations of Filter-Bank Analyzer
Mel-Cepstrum Method
[Block diagram, in processing order:]
Time-domain signal
Framing
|FFT|^2
Mel-scaling
Logarithm
IDCT
Low-order coefficients -> Cepstra
Differentiator -> Delta & Delta-Delta Cepstra
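The processing chain on this slide (framing, power spectrum, mel-scaling, logarithm, cosine transform, low-order coefficients, then differencing for delta features) can be sketched in NumPy as follows. Everything not named on the slide is an assumption made for the illustration: the Hamming window, the frame length and step, the number of filters and coefficients, the common mel formula 2595 log10(1 + f/700), triangular filter shapes, a DCT-II in the role of the final cosine transform, and a simple central difference as the differentiator.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters with centers equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bin_freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    bank = np.zeros((n_filters, len(bin_freqs)))
    for k in range(n_filters):
        lo, mid, hi = edges_hz[k], edges_hz[k + 1], edges_hz[k + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        bank[k] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_cepstra(signal, fs, frame_len=256, step=128, n_filters=20, n_ceps=13):
    # Framing, with a Hamming window applied to each frame
    n_frames = 1 + (len(signal) - frame_len) // step
    idx = np.arange(frame_len)[None, :] + step * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # |FFT|^2
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Mel-scaling followed by the logarithm (small offset avoids log(0))
    log_mel = np.log(power @ mel_filterbank(n_filters, frame_len, fs).T + 1e-10)
    # Cosine transform of the log mel energies, keeping the low-order coefficients
    k = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (k + 0.5) / n_filters)
    return log_mel @ basis.T

def differentiate(features):
    """Central difference along the time axis (edge frames are repeated)."""
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440.0 * t)   # one second of a 440 Hz test tone
cepstra = mel_cepstra(signal, fs)
delta = differentiate(cepstra)
delta_delta = differentiate(delta)
features = np.hstack([cepstra, delta, delta_delta])
```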
Time-Frequency analysis

Short-term Fourier Transform

The standard approach to frequency analysis: decompose the incoming signal into its constituent frequency components.

w(n): windowing function
N: frame length
p: step size
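With a window of length N advanced by a step of p samples, the short-term Fourier transform can be computed frame by frame. The sketch below assumes the common form X(m, k) = sum over n of w(n) x(n + m p) exp(-j 2 pi k n / N); the slide's own equation is not reproduced in the text, so this form, the Hamming window, and the test tone are assumptions.

```python
import numpy as np

def stft(x, w, p):
    """X[m, k] = sum_n w[n] * x[n + m*p] * exp(-2j*pi*k*n/N), with N = len(w)."""
    N = len(w)
    n_frames = 1 + (len(x) - N) // p
    frames = np.stack([x[m * p : m * p + N] * w for m in range(n_frames)])
    return np.fft.fft(frames, axis=1)

fs = 8000                                              # assumed sampling rate
x = np.cos(2 * np.pi * 1000.0 * np.arange(1024) / fs)  # 1 kHz test tone
X = stft(x, w=np.hamming(256), p=128)

# The tone falls exactly on bin 1000 * 256 / 8000 = 32 of every frame.
peak_bin = int(np.argmax(np.abs(X[0, :128])))
```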
Critical band integration

Related to the masking phenomenon: the detection threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.

Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole.
Bark scale
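One widely used closed-form approximation of the Bark scale is the Zwicker-Terhardt formula, sketched below. The choice of this particular approximation is an assumption; the slide does not say which mapping from frequency to Bark it plots.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark (critical-band rate) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

# Bark values at a few illustrative frequencies; the mapping grows roughly
# linearly at low frequencies and roughly logarithmically at high frequencies.
bark_values = [hz_to_bark(f) for f in (100.0, 500.0, 1000.0, 4000.0)]
```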
Feature orthogonalization



Spectral values in adjacent frequency channels are highly correlated
Because of this correlation, a Gaussian model needs many parameters: every element of the covariance matrix has to be estimated (D(D+1)/2 values for a D-dimensional feature vector, versus D for a diagonal covariance)
Decorrelating the features reduces the number of parameters and so improves parameter estimation
Cepstrum

Computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal

For a real signal the log magnitude is real and symmetric -> the inverse transform is equivalent to a Discrete Cosine Transform
The resulting coefficients are approximately decorrelated
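The definition above can be checked numerically: for a real signal, the inverse FFT of the log magnitude spectrum has no imaginary part and matches a pure cosine sum. This is a minimal sketch; the random test signal and the small constant added before the logarithm are assumptions of the example.

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    """Inverse FFT of the log magnitude spectrum (returned as a complex array)."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + eps)
    return np.fft.ifft(log_mag)

def cosine_cepstrum(x, eps=1e-12):
    """Same quantity computed with an explicit cosine sum in place of the inverse FFT."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + eps)
    N = len(x)
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    return (np.cos(2 * np.pi * n * k / N) @ log_mag) / N

rng = np.random.default_rng(0)
x = rng.standard_normal(128)
c_fft = real_cepstrum(x)
c_cos = cosine_cepstrum(x)

imaginary_part_is_zero = np.max(np.abs(c_fft.imag)) < 1e-9
cosine_form_matches = np.allclose(c_fft.real, c_cos)
```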
Principal Component Analysis (PCA)





A mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PCs)
Finds an orthogonal basis such that the reconstruction error over the training set is minimized
This turns out to be equivalent to diagonalizing the sample autocovariance matrix
Complete decorrelation
Computes the principal dimensions of variability, but does not necessarily provide optimal discrimination among classes
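PCA (Cont.)

Algorithm
Input: $x$, an $N \times M$ matrix ($M$ vectors of dimension $N$)
Covariance matrix:

$$\mathrm{Cov} = \frac{1}{M-1}\sum_{i=1}^{M} (x_i - \bar{x})(x_i - \bar{x})^T$$

Eigenvalues and eigenvectors: $\mathrm{EigVal}_i$, $\mathrm{EigVec}_i$ ($N$-dim vectors), $i = 1 \ldots N$, ordered so that $\mathrm{EigVal}_1 \ge \mathrm{EigVal}_2 \ge \ldots$
Transform matrix (eigenvectors stacked as rows):

$$F = \begin{bmatrix} \mathrm{EigVec}_1 \\ \mathrm{EigVec}_2 \\ \vdots \\ \mathrm{EigVec}_N \end{bmatrix}$$

Apply transform: $y = F x$
Output: $y$, an $R \times M$ matrix ($R$-dim vectors, obtained from the $R$ leading eigenvectors)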
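The algorithm on this slide (covariance matrix, eigen-decomposition, eigenvalues sorted in decreasing order, projection onto the leading eigenvectors) can be sketched as follows. The function name, the synthetic data, and the choice to subtract the mean before projecting are assumptions of the example.

```python
import numpy as np

def pca_transform(x, r):
    """x is N x M (M column vectors of dimension N). Returns F (r x N), y (r x M), eigenvalues."""
    mean = x.mean(axis=1, keepdims=True)
    centered = x - mean
    cov = centered @ centered.T / (x.shape[1] - 1)
    eigval, eigvec = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1][:r]     # keep the r largest
    F = eigvec[:, order].T                   # eigenvectors stacked as rows
    y = F @ centered                         # assumption: project the mean-removed data
    return F, y, eigval[order]

rng = np.random.default_rng(0)
mixing = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.5, 0.5, 0.2]])
x = mixing @ rng.standard_normal((3, 500))   # correlated 3-dim data, 500 vectors
F, y, eigval = pca_transform(x, r=2)

# The transformed features are uncorrelated: their covariance is diagonal,
# with the retained eigenvalues on the diagonal.
output_cov = np.cov(y)
```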
PCA (Cont.)

PCA in speech recognition systems

Linear Discriminant Analysis (LDA)

Find a linear basis such that the ratio of the between-class variance to the within-class variance is maximized
This also turns out to be a generalized eigenvalue-eigenvector problem
Complete decorrelation
Provides optimal linear separability under quite restrictive assumptions
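The eigenvalue problem behind LDA can be sketched with NumPy alone by first whitening with the within-class scatter matrix and then solving an ordinary symmetric eigenproblem for the between-class scatter. The function name, the two synthetic Gaussian classes, and the whitening route to the solution are assumptions chosen for the illustration.

```python
import numpy as np

def lda_directions(x, labels):
    """x is M x N (one sample per row). Returns directions as columns, best ratio first."""
    overall_mean = x.mean(axis=0)
    n_dim = x.shape[1]
    s_w = np.zeros((n_dim, n_dim))   # within-class scatter
    s_b = np.zeros((n_dim, n_dim))   # between-class scatter
    for c in np.unique(labels):
        xc = x[labels == c]
        mean_c = xc.mean(axis=0)
        s_w += (xc - mean_c).T @ (xc - mean_c)
        diff = (mean_c - overall_mean)[:, None]
        s_b += len(xc) * (diff @ diff.T)
    # Whiten with S_w so that the problem S_b v = lambda S_w v becomes symmetric
    val_w, vec_w = np.linalg.eigh(s_w)
    whiten = vec_w / np.sqrt(val_w)
    val, vec = np.linalg.eigh(whiten.T @ s_b @ whiten)
    order = np.argsort(val)[::-1]
    return whiten @ vec[:, order], val[order]

# Two synthetic classes that differ only along the first axis.
rng = np.random.default_rng(0)
class_a = rng.standard_normal((200, 2))
class_b = rng.standard_normal((200, 2)) + np.array([3.0, 0.0])
x = np.vstack([class_a, class_b])
labels = np.array([0] * 200 + [1] * 200)

directions, ratios = lda_directions(x, labels)
w = directions[:, 0] / np.linalg.norm(directions[:, 0])  # leading direction, unit length
```

In this example the leading direction points almost entirely along the first axis, which is the only axis along which the two classes differ.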
PCA vs. LDA
Spectral smoothing


Formant information is crucial for recognition
The formant information can be enhanced and preserved by:
Truncating the number of cepstral coefficients
Linear prediction: peak-hugging property
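Truncating the cepstrum smooths the log spectrum: keeping only the low-order coefficients and transforming back removes the fine detail and retains the envelope. The sketch below shows this on a random test signal; the signal, the number of retained coefficients, and the function name are assumptions of the example.

```python
import numpy as np

def cepstrally_smoothed_log_spectrum(x, n_keep):
    """Return the log magnitude spectrum and its smoothed version from n_keep cepstral coefficients."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real
    lifter = np.zeros(len(x))
    lifter[:n_keep] = 1.0
    lifter[len(x) - n_keep + 1:] = 1.0   # mirror-image coefficients keep the result real
    smoothed = np.fft.fft(cepstrum * lifter).real
    return log_mag, smoothed

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
log_mag, smoothed = cepstrally_smoothed_log_spectrum(x, n_keep=13)

# Discarding coefficients can only reduce the variation around the mean level.
variation_reduced = np.var(smoothed) <= np.var(log_mag)
```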
Temporal processing

To capture the temporal features of the spectral envelope and to provide robustness:

Delta features: first- and second-order differences; regression
Cepstral mean subtraction: normalizes for channel effects and adjusts for spectral slope
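Both operations act along the time axis of a feature matrix. The sketch below computes delta features with the usual regression formula over a window of plus or minus `width` frames and removes the per-utterance cepstral mean. The window width, the edge handling, and the function names are assumptions of the example.

```python
import numpy as np

def delta(features, width=2):
    """Regression-based delta over +/- width frames; features has shape (frames, coeffs)."""
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    n_frames = len(features)
    numerator = sum(
        k * (padded[width + k : width + k + n_frames]
             - padded[width - k : width - k + n_frames])
        for k in range(1, width + 1)
    )
    return numerator / (2.0 * sum(k * k for k in range(1, width + 1)))

def cepstral_mean_subtraction(features):
    """Subtract the mean of each coefficient over time, removing a fixed channel offset."""
    return features - features.mean(axis=0, keepdims=True)

# A linear ramp has a constant slope, which the delta recovers away from the edges.
ramp = np.arange(10.0)[:, None] * np.array([1.0, 2.0])
d1 = delta(ramp)
d2 = delta(d1)                                   # second-order (delta-delta) features
normalized = cepstral_mean_subtraction(ramp + 5.0)
```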
RASTA (RelAtive SpecTral Analysis)

Filtering the temporal trajectories of some function of each spectral value, to provide more reliable spectral features

This is usually a bandpass filter that preserves the linguistically important modulations of the spectral envelope (1-16 Hz)
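A bandpass filter of this kind can be sketched as a short IIR filter applied to each band's trajectory over time. The coefficients below follow the commonly cited RASTA transfer function, 0.1 (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1), written in causal form. The pole value, the 100 frames-per-second rate, and the test trajectories are assumptions; the slide does not specify the filter.

```python
import numpy as np

def rasta_filter(trajectories, pole=0.98):
    """Bandpass filter each column (one spectral band over time) of a (frames, bands) array."""
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    x = np.asarray(trajectories, dtype=float)
    y = np.zeros_like(x)
    for t in range(len(x)):
        acc = np.zeros(x.shape[1])
        for k, b in enumerate(numer):
            if t - k >= 0:
                acc += b * x[t - k]       # FIR part: a smoothed time derivative
        if t > 0:
            acc += pole * y[t - 1]        # IIR part: leaky integration
        y[t] = acc
    return y

frame_rate = 100.0                        # assumed frames per second
t = np.arange(1500) / frame_rate
constant = np.full((1500, 1), 4.0)        # a fixed channel offset in the log spectrum
slow_modulation = np.sin(2 * np.pi * 4.0 * t)[:, None]   # 4 Hz, inside the 1-16 Hz band

offset_out = rasta_filter(constant)
modulation_out = rasta_filter(slow_modulation)
residual_offset = float(np.abs(offset_out[-1]).max())
passed_amplitude = float(np.abs(modulation_out[-200:]).max())
```

The constant offset decays to zero because the numerator coefficients sum to zero, while the 4 Hz modulation passes with a gain close to one.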
RASTA-PLP