Function Approximation
Fariba Sharifian
Somaye Kafi
Function Approximation
spring 2006
Contents
Introduction to Counterpropagation
Full Counterpropagation
  Architecture
  Algorithm
  Application
  Example
Forward-Only Counterpropagation
  Architecture
  Algorithm
  Application
  Example
Contents
Function Approximation Using Neural Networks
  Introduction
  Development of Neural Network Weight Equations
  Algebraic Training Algorithms
    A. Exact Matching of Function Input-Output Data
    B. Approximate Matching of Gradient Data in Algebraic Training
    C. Approximate Matching of Function Input-Output Data
    D. Exact Matching of Function Gradient Data
Introduction to Counterpropagation
Counterpropagation networks are multilayer networks built from a combination of input, clustering, and output layers.
They can be used to compress data, to approximate functions, or to associate patterns.
They approximate their training input vector pairs by adaptively constructing a lookup table.
Introduction to Counterpropagation (cont.)
Training has two stages:
  Clustering
  Output weight updating
There are two types of counterpropagation network:
  Full
  Forward-only
Full Counterpropagation
Produces an approximation x*:y* based on:
  input of an x vector only,
  input of a y vector only, or
  input of an x:y pair, possibly with some distorted or missing elements in either or both vectors.
Full Counterpropagation (cont.)
Phase 1
The units in the cluster layer compete. Only the winning cluster unit J is allowed to learn; its weights are updated by
$$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \qquad i = 1, 2, \dots, n$$
$$u_{kJ}^{new} = u_{kJ}^{old} + \beta\,(y_k - u_{kJ}^{old}), \qquad k = 1, 2, \dots, m$$
(This is standard Kohonen learning.)
Full Counterpropagation (cont.)
Phase 2
The weights from the winning cluster unit J to the output units are adjusted so that the vector of activations of the units in the Y* output layer, y*, is an approximation to the input vector y, and the vector of activations of the units in the X* output layer, x*, is an approximation to the input vector x. The weight updates for the units in the Y* and X* output layers are
$$v_{Jk}^{new} = v_{Jk}^{old} + a\,(y_k - v_{Jk}^{old}), \qquad k = 1, 2, \dots, m$$
$$t_{Ji}^{new} = t_{Ji}^{old} + b\,(x_i - t_{Ji}^{old}), \qquad i = 1, 2, \dots, n$$
(This is known as Grossberg learning.)
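Architecture of Full Counterpropagation
[Figure: input units X1 … Xi … Xn and Y1 … Yk … Ym feed the cluster (hidden) layer Z1 … Zj … Zp through weights w and u; the cluster layer feeds the output units Y1* … Yk* … Ym* through weights v and X1* … Xi* … Xn* through weights t.]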
Full Counterpropagation Algorithm
x: input training vector, $x = (x_1, \dots, x_i, \dots, x_n)$
y: target output corresponding to input x, $y = (y_1, \dots, y_k, \dots, y_m)$
$z_j$: activation of cluster layer unit $Z_j$
x*: computed approximation to vector x
y*: computed approximation to vector y
$w_{ij}$: weight from X input layer, unit $X_i$, to cluster layer, unit $Z_j$
$u_{kj}$: weight from Y input layer, unit $Y_k$, to cluster layer, unit $Z_j$
$v_{jk}$: weight from cluster layer, unit $Z_j$, to Y output layer, unit $Y_k^*$
$t_{ji}$: weight from cluster layer, unit $Z_j$, to X output layer, unit $X_i^*$
$\alpha, \beta$: learning rates for weights into the cluster layer (Kohonen learning)
$a, b$: learning rates for weights out from the cluster layer (Grossberg learning)
Full Counterpropagation Algorithm (phase 1)
Step 1. Initialize weights, learning rates, etc.
Step 2. While stopping condition for Phase 1 is false, do Steps 3-8.
Step 3. For each training input pair x:y, do Steps 4-6.
Step 4. Set X input layer activations to vector x;
set Y input layer activations to vector y.
Step 5. Find winning cluster unit; call its index J.
Step 6. Update weights for unit $Z_J$:
$$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \qquad i = 1, 2, \dots, n$$
$$u_{kJ}^{new} = u_{kJ}^{old} + \beta\,(y_k - u_{kJ}^{old}), \qquad k = 1, 2, \dots, m$$
Step 7. Reduce learning rates $\alpha$ and $\beta$.
Step 8. Test stopping condition for Phase 1 training.
Full Counterpropagation Algorithm (phase 2)
Step 9. While stopping condition for Phase 2 is false, do Steps 10-16.
(Note: $\alpha$ and $\beta$ are small, constant values during phase 2.)
Step 10. For each training input pair x:y, do Steps 11-14.
Step 11. Set X input layer activations to vector x;
set Y input layer activations to vector y.
Step 12. Find winning cluster unit; call its index J.
Step 13. Update weights for unit $Z_J$:
$$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \qquad i = 1, 2, \dots, n$$
$$u_{kJ}^{new} = u_{kJ}^{old} + \beta\,(y_k - u_{kJ}^{old}), \qquad k = 1, 2, \dots, m$$
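Full Counterpropagation Algorithm (phase 2) (cont.)
Step 14. Update weights from unit $Z_J$ to the output layers:
$$v_{Jk}^{new} = v_{Jk}^{old} + a\,(y_k - v_{Jk}^{old}), \qquad k = 1, 2, \dots, m$$
$$t_{Ji}^{new} = t_{Ji}^{old} + b\,(x_i - t_{Ji}^{old}), \qquad i = 1, 2, \dots, n$$
Step 15. Reduce learning rates a and b.
Step 16. Test stopping condition for Phase 2 training.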
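The two training phases can be sketched in Python with NumPy. This is a minimal sketch, not the authors' implementation: the helper name `train_full_cpn`, initializing the cluster weights from randomly chosen training pairs, the Euclidean winner rule, the geometric decay of the learning rates, and the fixed epoch counts that stand in for the stopping conditions are all assumptions.

```python
import numpy as np

def train_full_cpn(X, Y, p, alpha=0.5, beta=0.5, a=0.5, b=0.5,
                   alpha2=0.05, beta2=0.05, epochs1=50, epochs2=50,
                   decay=0.95, seed=0):
    """Hypothetical helper: full counterpropagation training, Steps 1-16.

    X: (N, n) training inputs, Y: (N, m) training targets, p: number of cluster units.
    Returns w (n, p), u (m, p), v (p, m), t (p, n) in the slide notation.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    m = Y.shape[1]
    # Step 1: cluster weights start at p randomly chosen training pairs (assumption)
    idx = rng.choice(N, size=p, replace=False)
    w = X[idx].T.astype(float)          # (n, p): X input -> cluster
    u = Y[idx].T.astype(float)          # (m, p): Y input -> cluster
    v = np.zeros((p, m))                # cluster -> Y* output
    t = np.zeros((p, n))                # cluster -> X* output

    def winner(x, y):
        # Euclidean-distance competition over both parts of the pair x:y
        D = ((x[:, None] - w) ** 2).sum(axis=0) + ((y[:, None] - u) ** 2).sum(axis=0)
        return int(np.argmin(D))

    # Phase 1 (Steps 2-8): Kohonen learning on the winning unit only
    for _ in range(epochs1):
        for x, y in zip(X, Y):
            J = winner(x, y)
            w[:, J] += alpha * (x - w[:, J])
            u[:, J] += beta * (y - u[:, J])
        alpha *= decay                  # Step 7 (geometric decay is an assumption)
        beta *= decay

    # Phase 2 (Steps 9-16): small constant alpha2, beta2 plus Grossberg learning
    for _ in range(epochs2):
        for x, y in zip(X, Y):
            J = winner(x, y)
            w[:, J] += alpha2 * (x - w[:, J])
            u[:, J] += beta2 * (y - u[:, J])
            v[J] += a * (y - v[J])      # Step 14: toward y
            t[J] += b * (x - t[J])      # Step 14: toward x
        a *= decay                      # Step 15
        b *= decay
    return w, u, v, t
```

Calling `train_full_cpn(X, Y, p=10)` with X sampled from an interval and Y = 1/X yields a table of cluster weights of the kind listed in the y = 1/x example; the actual values depend on the assumed schedule.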
Which cluster is the winner?
Dot product (find the cluster with the largest net input):
$$net_j = \sum_i x_i w_{ij} + \sum_k y_k u_{kj}$$
Euclidean distance (find the cluster with the smallest squared distance from the input):
$$D_j = \sum_i (x_i - w_{ij})^2 + \sum_k (y_k - u_{kj})^2$$
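Both winner rules fit in a few lines of NumPy. A minimal sketch; the function names are hypothetical, and the weights are stored as matrices w (n × p) and u (m × p) so that column j holds the weights into cluster unit $Z_j$.

```python
import numpy as np

def winner_dot(x, y, w, u):
    """Largest net input: net_j = sum_i x_i w_ij + sum_k y_k u_kj."""
    net = x @ w + y @ u
    return int(np.argmax(net))

def winner_euclid(x, y, w, u):
    """Smallest squared distance: D_j = sum_i (x_i - w_ij)^2 + sum_k (y_k - u_kj)^2."""
    D = ((x[:, None] - w) ** 2).sum(axis=0) + ((y[:, None] - u) ** 2).sum(axis=0)
    return int(np.argmin(D))

# Two cluster units with one x and one y component each (w: n x p, u: m x p)
w = np.array([[1.0, 3.0]])
u = np.array([[1.0, 3.0]])
x, y = np.array([1.0]), np.array([1.0])
print(winner_dot(x, y, w, u), winner_euclid(x, y, w, u))   # 1 0
```

In this made-up case the input pair coincides with the weights of the first unit (index 0), which wins under Euclidean distance, yet the dot product picks the unit with the larger weights. This is why the dot-product rule is generally used with normalized input and weight vectors.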
Full Counterpropagation Application
The application procedure for counterpropagation is as follows:
Step 0: Initialize weights.
Step 1: For each input pair x:y, do Steps 2-4.
Step 2: Set X input layer activations to vector x;
set Y input layer activations to vector y.
Full Counterpropagation Application (cont.)
Step 3: Find the cluster unit $Z_J$ that is closest to the input pair.
Step 4: Compute approximations to x and y:
$$x_i^* = t_{Ji}, \qquad y_k^* = v_{Jk}$$
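A sketch of the application procedure, assuming trained weights stored as w (n × p), u (m × p), v (p × m) and t (p × n); the helper name is hypothetical. Allowing either vector to be missing goes slightly beyond the listed steps and mirrors the y = 1/x example, where D is computed from x alone.

```python
import numpy as np

def full_cpn_recall(w, u, v, t, x=None, y=None):
    """Steps 2-4 of the application procedure; returns (x*, y*).

    Either vector may be None when it is unknown; D is then computed
    from the known vector only.
    """
    if x is None and y is None:
        raise ValueError("at least one of x, y must be given")
    D = np.zeros(w.shape[1])
    if x is not None:
        D += ((x[:, None] - w) ** 2).sum(axis=0)
    if y is not None:
        D += ((y[:, None] - u) ** 2).sum(axis=0)
    J = int(np.argmin(D))               # Step 3: closest cluster unit
    return t[J].copy(), v[J].copy()     # Step 4: x*_i = t_Ji, y*_k = v_Jk
```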
Full counterpropagation example
Function approximation of y = 1/x
After the training phase, the weights of the cluster units are:
Cluster unit   x-side weight (w, t)   y-side weight (u, v)
z1             0.11                   9.0
z2             0.14                   7.0
z3             0.20                   5.0
z4             0.30                   3.3
z5             0.60                   1.6
z6             1.60                   0.6
z7             3.30                   0.3
z8             5.00                   0.2
z9             7.00                   0.14
z10            9.00                   0.11
Full counterpropagation example (cont.)
[Figure: X1 and Y1 connect to cluster units Z1, Z2, …, Z10 with weights 0.11, 0.14, 0.2, … (from X1) and 9.0, 7.0, 5.0, … (from Y1); the cluster units connect to the outputs X1* and Y1* with the same values.]
Full counterpropagation example (cont.)
To approximate the value of y for x = 0.12: since nothing is known about y, D is computed from x alone:
$D_1 = (0.12 - 0.11)^2 = 0.0001$
$D_2 = 0.0004$
$D_3 = 0.0064$
$D_4 = 0.032$
$D_5 = 0.23$
$D_6 = 2.2$
$D_7 = 10.1$
$D_8 = 23.8$
$D_9 = 47.3$
$D_{10} = 78.9$
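The distance computation for x = 0.12 can be checked numerically. The arrays below are copied from the table of trained weights; reading y* from the winner's y-side weight follows Step 4 of the application procedure.

```python
import numpy as np

# Trained weights from the table: x-side and y-side weight of units z1 ... z10
x_w = np.array([0.11, 0.14, 0.20, 0.30, 0.60, 1.60, 3.30, 5.00, 7.00, 9.00])
y_w = np.array([9.0, 7.0, 5.0, 3.3, 1.6, 0.6, 0.3, 0.2, 0.14, 0.11])

x = 0.12
D = (x - x_w) ** 2        # y is unknown, so only the x term of D_j is used
J = int(np.argmin(D))     # 0-based index of the winning cluster unit
y_star = y_w[J]           # y* taken from the winner's y-side weight
print(np.round(D, 4))
print("winner: z%d, y* = %.1f" % (J + 1, y_star))   # winner: z1, y* = 9.0
```

Unit Z1 wins, so the network returns y* = 9.0, compared with the exact value 1/0.12 ≈ 8.33.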
Forward Only Counterpropagation
It is a simplified version of full counterpropagation.
It is intended to approximate a function y = f(x) that is not necessarily invertible.
It may be used if the mapping from x to y is well defined, but the mapping from y to x is not.
Forward Only Counterpropagation Architecture
[Figure: input layer X1 … Xi … Xn connected through weights w to the cluster layer Z1 … Zj … Zp, which is connected through weights u to the output layer Y1 … Yk … Ym.]
Forward Only Counterpropagation Algorithm
Step 1. Initialize weights, learning rates, etc.
Step 2. While stopping condition for Phase 1 is false, do Steps 3-8.
Step 3. For each training input x, do Steps 4-6.
Step 4. Set X input layer activations to vector x.
Step 5. Find winning cluster unit; call its index J.
Step 6. Update weights for unit $Z_J$:
$$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \qquad i = 1, 2, \dots, n$$
Step 7. Reduce learning rate $\alpha$.
Step 8. Test stopping condition for Phase 1 training.







Forward Only Counterpropagation Algorithm (phase 2)
Step 9. While stopping condition for Phase 2 is false, do Steps 10-16.
(Note: $\alpha$ is a small, constant value during phase 2.)
Step 10. For each training input pair x:y, do Steps 11-14.
Step 11. Set X input layer activations to vector x;
set Y input layer activations to vector y.
Step 12. Find winning cluster unit; call its index J.
Step 13. Update weights for unit $Z_J$ ($\alpha$ is small):
$$w_{iJ}^{new} = w_{iJ}^{old} + \alpha\,(x_i - w_{iJ}^{old}), \qquad i = 1, 2, \dots, n$$
Step 14. Update weights from unit $Z_J$ to the output layer:
$$u_{Jk}^{new} = u_{Jk}^{old} + a\,(y_k - u_{Jk}^{old}), \qquad k = 1, 2, \dots, m$$
Step 15. Reduce learning rate a.
Step 16. Test stopping condition for Phase 2 training.
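A minimal NumPy sketch of Steps 1-16 for the forward-only network. The helper name `train_forward_cpn`, initializing w from randomly chosen training inputs, the Euclidean winner rule, the geometric learning-rate decay, and the fixed epoch counts that stand in for the stopping conditions are assumptions, not part of the slides.

```python
import numpy as np

def train_forward_cpn(X, Y, p, alpha=0.5, alpha2=0.05, a=0.5,
                      epochs1=50, epochs2=50, decay=0.95, seed=0):
    """Hypothetical helper: forward-only counterpropagation, Steps 1-16.

    X: (N, n) inputs, Y: (N, m) targets. Returns w (n, p) and u (p, m).
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    m = Y.shape[1]
    w = X[rng.choice(N, size=p, replace=False)].T.astype(float)  # input -> cluster
    u = np.zeros((p, m))                                          # cluster -> output

    def winner(x):
        # the competition uses x only; y never enters the distance
        return int(np.argmin(((x[:, None] - w) ** 2).sum(axis=0)))

    # Phase 1 (Steps 2-8): cluster the x vectors
    for _ in range(epochs1):
        for x in X:
            J = winner(x)
            w[:, J] += alpha * (x - w[:, J])
        alpha *= decay                  # Step 7
    # Phase 2 (Steps 9-16): alpha2 small and constant; u_J moves toward y
    for _ in range(epochs2):
        for x, y in zip(X, Y):
            J = winner(x)
            w[:, J] += alpha2 * (x - w[:, J])
            u[J] += a * (y - u[J])
        a *= decay                      # Step 15
    return w, u
```

Because each update is a convex combination of the old weight and the target (for a ≤ 1), every trained output weight stays within the range spanned by 0 and the training targets.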
Forward Only Counterpropagation Application
Step 0: Initialize weights (by training as in the previous subsection).
Step 1: Present input vector x.
Step 2: Find unit J closest to vector x.
Step 3: Set activations of output units:
$$y_k = u_{Jk}$$
Forward only counterpropagation example
Function approximation of y = 1/x
After the training phase, the weights of the cluster units are:
Cluster unit   w      u
z1             0.5    5.5
z2             1.5    0.75
z3             2.5    0.4
...            ...    ...
z10            9.5    0.1
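The application procedure can be run on the weights listed in the table. Only the rows shown on the slide (z1, z2, z3 and z10) are used, so this sketch assumes the elided units z4 to z9 have w values between 2.5 and 9.5, as the listed rows suggest, and only queries inputs whose nearest unit is z1 or z2.

```python
import numpy as np

# Listed weights from the example: units z1, z2, z3 and z10 only (z4-z9 are elided)
w = np.array([0.5, 1.5, 2.5, 9.5])       # weights from input X to each cluster unit
u = np.array([5.5, 0.75, 0.4, 0.1])      # weights from each cluster unit to output Y

def forward_recall(x):
    J = int(np.argmin((x - w) ** 2))     # Step 2: cluster unit closest to x
    return float(u[J])                   # Step 3: y = u_J

for x in (0.12, 1.7):
    print(x, forward_recall(x), 1.0 / x)
```

For x = 0.12 this returns 5.5 (exact value 1/0.12 ≈ 8.33), and for x = 1.7 it returns 0.75 (exact value ≈ 0.59): the output is piecewise constant over each cluster unit's region.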
Function Approximation Using Neural Networks
Introduction
Function approximation: an analytical description for a set of data
Also referred to as data modeling or system identification
Standard tools
  Splines
  Wavelets
  Neural networks
Why Use Neural Networks?
Splines and wavelets do not generalize well to spaces of more than three dimensions.
Neural networks are universal approximators.
They have a parallel architecture.
They can be trained to map multidimensional nonlinear functions.
Why Use Neural Networks? (cont.)
Central to the solution of differential equations
  provide differentiable, closed-analytic-form solutions
  have very good generalization properties
  widely applicable
Translates into a set of nonlinear, transcendental weight equations
Cascade structure
  nonlinearity in the hidden nodes
  linear operations in the input and output layers
Function Approximation Using Neural Networks
Functions that are not known analytically
  a set of precise input-output samples is available
  the functions are modeled using an algebraic approach
Design objectives:
  exact matching
  approximate matching
Feedforward neural networks
Data:
  input
  output
  and/or gradient information
Objective
Exact solutions, given sufficient degrees of freedom
Retaining good generalization properties: synthesize a large data set with a parsimonious network
Input-to-node values
Basis of algebraic training: if all sigmoidal function inputs are known, the weight equations become algebraic.
Input-to-node values are the inputs to the sigmoidal functions.
They determine the saturation level of each sigmoid at a given data point.
Weight equations structure
Used to analyze and train a nonlinear neural network by means of
  linear algebra
  controlling the distribution
  controlling the saturation level of the active nodes
Development of Neural Network Weight Equations
Objective: approximate a smooth scalar function of q inputs using a feedforward sigmoidal network.
Derivative information
Can improve the network's generalization properties.
Partial derivatives with respect to the inputs can be incorporated in the training set.
Network Output
z: network output, computed as a nonlinear transformation of the input
w: input weights
p: input
d: input bias
b: output bias
v: output weights
σ: sigmoid functions
The inputs to the sigmoids are the input-to-node variables.
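The network-output equation itself was an image on the slide. Assuming the common one-hidden-layer form $z = v^T \sigma(W p + d) + b$ implied by the symbol list, it can be sketched as follows; the bipolar sigmoid $\sigma(n) = (e^n - 1)/(e^n + 1) = \tanh(n/2)$ and the function names are assumed choices.

```python
import numpy as np

def sigma(n):
    # Assumed bipolar sigmoid: (e^n - 1) / (e^n + 1), which equals tanh(n / 2)
    return np.tanh(n / 2.0)

def network_output(p, W, d, v, b):
    """Scalar output of a one-hidden-layer sigmoidal network (assumed form).

    p: (q,) input, W: (s, q) input weights, d: (s,) input bias,
    v: (s,) output weights, b: scalar output bias.
    """
    n = W @ p + d                  # input-to-node variables, one per hidden node
    return float(v @ sigma(n) + b)

W = np.array([[1.0], [-1.0]])
d = np.zeros(2)
v = np.array([1.0, 1.0])
print(network_output(np.array([0.3]), W, d, v, b=0.5))
```

Here the two nodes receive opposite input-to-node values, and since the sigmoid is odd their contributions cancel, leaving only the output bias b.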
Scalar Output of the Network
Exact Matching of the Function's Outputs
Output weight equations
Gradient Equations

derivative of the network output with respect
to its inputs
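Assuming the output form $z = v^T \sigma(W p + d) + b$, the chain rule gives $\partial z / \partial p_j = \sum_i v_i\, \sigma'(n_i)\, w_{ij}$, i.e. $\partial z / \partial p = W^T (v \circ \sigma'(n))$. The sketch below (assumed $\tanh(n/2)$ sigmoid, hypothetical helper names) computes this gradient and compares it with central finite differences.

```python
import numpy as np

def sigma(n):
    return np.tanh(n / 2.0)                       # assumed sigmoid, tanh(n/2)

def sigma_prime(n):
    return 0.5 * (1.0 - np.tanh(n / 2.0) ** 2)    # d/dn of tanh(n/2)

def output(p, W, d, v, b):
    return float(v @ sigma(W @ p + d) + b)

def output_gradient(p, W, d, v):
    """dz/dp_j = sum_i v_i * sigma'(n_i) * w_ij, i.e. W^T (v * sigma'(n))."""
    n = W @ p + d
    return W.T @ (v * sigma_prime(n))

# Check the analytic gradient against central finite differences
rng = np.random.default_rng(1)
W, d, v = rng.standard_normal((4, 3)), rng.standard_normal(4), rng.standard_normal(4)
p, b, h = rng.standard_normal(3), 0.2, 1e-6
fd = np.array([(output(p + h * e, W, d, v, b) - output(p - h * e, W, d, v, b)) / (2 * h)
               for e in np.eye(3)])
print(np.max(np.abs(fd - output_gradient(p, W, d, v))))
```

Note that the gradient is linear in v once the input-to-node values n are fixed, which is what lets the gradient weight equations be treated with linear algebra.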
Exact Matching of the Function’s
Derivatives

gradient weight equations
Input-to-node Weight Equations
Obtained by rewriting (12)
Four Algebraic Algorithms
A. Exact Matching of Function Input-Output Data
B. Approximate Matching of Gradient Data in Algebraic Training
C. Approximate Matching of Function Input-Output Data
D. Exact Matching of Function Gradient Data
A. Exact Matching of Function Input-Output Data
Input: S is a known p × s matrix.
Strategy for producing a well-conditioned S:
  input weights: random numbers drawn from N(0,1), multiplied by a scaling factor L
  L: a user-defined scalar chosen to give input-to-node values that do not saturate the sigmoids
Input bias
The input bias d is computed to center each sigmoid at one of the training pairs.

Finally, the linear system in (9) is solved for v by inverting S.

If (17) produced an ill-conditioned S, the computation is repeated.
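A sketch of the exact input-output algorithm with s = p nodes. Assumptions: the output equations take the form $S v = u - b\mathbf{1}$ with the output bias b fixed (here 0); "centering" sigmoid i at training input $p^i$ means choosing $d_i$ so that its input-to-node value is zero there; the sigmoid is $\tanh(n/2)$; and a condition-number threshold stands in for the test for an ill-conditioned S. The system is solved with `np.linalg.solve` rather than by forming $S^{-1}$ explicitly. Helper names are hypothetical.

```python
import numpy as np

def sigma(n):
    return np.tanh(n / 2.0)                       # assumed sigmoid

def exact_io_train(P, u, L=2.0, b=0.0, cond_max=1e8, max_tries=20, seed=0):
    """Sketch of exact input-output matching with s = p nodes.

    P: (p, q) training inputs, u: (p,) training outputs.
    """
    rng = np.random.default_rng(seed)
    p, q = P.shape
    for _ in range(max_tries):
        W = L * rng.standard_normal((p, q))       # input weights: L times N(0, 1) samples
        d = -np.sum(W * P, axis=1)                # d_i = -w_i . p^i centers node i at p^i
        S = sigma(P @ W.T + d)                    # S[k, i] = sigma(n_i^k), a p x p matrix
        if np.linalg.cond(S) < cond_max:          # otherwise draw new weights and repeat
            v = np.linalg.solve(S, u - b)         # output weight equation: S v = u - b
            return W, d, v, b
    raise RuntimeError("no well-conditioned S found")

def predict(P, W, d, v, b):
    return sigma(P @ W.T + d) @ v + b

P = np.linspace(0.0, np.pi, 5).reshape(-1, 1)
u = np.sin(P[:, 0])
W, d, v, b = exact_io_train(P, u)
print(np.max(np.abs(predict(P, W, d, v, b) - u)))
```

With as many nodes as training pairs, the network reproduces every training output up to round-off; what happens between the training points depends on the scaling factor L.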
Exact Input-Output-Based
Algebraic Algorithm
Fig. 2-a. Exact input-output-based algebraic algorithm.
Exact Input-Output-Based Algebraic Algorithm with Gradient Information
Fig. 2-b. Exact input-output-based algebraic algorithm with added p-steps for incorporating gradient information.
Then
Exact matching of input, output, and gradient information:
all are solved exactly and simultaneously for the neural parameters.
B. Approximate Matching of Gradient Data in Algebraic Training
Estimate
  output weights
  input-to-node values
First solution:
  use a randomized W
  all parameters are refined by a p-step node-by-node update algorithm
Approximate Matching of Gradient Data in Algebraic Training (cont.)
d and
can be computed solely from
Approximate Matching of Gradient Data in Algebraic Training (cont.)
The kth gradient equations are solved for the input weights associated with the ith node.
Approximate Matching of Gradient Data in Algebraic Training (cont.)
At the end of each step
  solve
  terminate when the user-specified gradient tolerance is met
Error enters through v and through the input weights; it is adjusted in later steps.
Basic idea: the ith node's input weights mainly contribute to the kth partial derivatives.
C. Approximate Matching of Function Input-Output Data
Algebraic approach
  approximate, parsimonious network
  exact solution with s < p if rank(S|u) = rank(S) = s
Example
  the linear system in (9) is not square (s < p)
  S cannot be inverted to obtain v from u
  (9) will be overdetermined
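When s < p, one simple way to obtain output weights for the overdetermined system is linear least squares. This is a swapped-in illustration, not the superposition technique described next; the choice of node centers, the $\tanh(n/2)$ sigmoid, and the helper name are assumptions. The returned flag checks the slide's condition rank(S|u) = rank(S) for an exact solution.

```python
import numpy as np

def sigma(n):
    return np.tanh(n / 2.0)                           # assumed sigmoid

def lsq_output_weights(P, u, centers, L=2.0, b=0.0, seed=0):
    """s = len(centers) < p nodes: output weights by linear least squares.

    Returns W, d, v and a flag telling whether rank(S|u) == rank(S),
    i.e. whether the overdetermined system has an exact solution.
    """
    rng = np.random.default_rng(seed)
    C = P[centers]                                    # (s, q) inputs where sigmoids are centered
    W = L * rng.standard_normal(C.shape)              # (s, q) input weights
    d = -np.sum(W * C, axis=1)                        # center node i at C[i]
    S = sigma(P @ W.T + d)                            # (p, s) with s < p
    r = u - b
    v, *_ = np.linalg.lstsq(S, r, rcond=None)         # minimizes ||S v - (u - b)||
    exact = np.linalg.matrix_rank(np.column_stack([S, r])) == np.linalg.matrix_rank(S)
    return W, d, v, bool(exact)

P = np.linspace(0.0, np.pi, 9).reshape(-1, 1)
u = np.sin(P[:, 0])
W, d, v, exact = lsq_output_weights(P, u, centers=[0, 4, 8])
S = sigma(P @ W.T + d)
print(exact, np.max(np.abs(S @ v - u)))
```

For nine samples of the sine function and three nodes the rank condition fails, so the least-squares weights leave a nonzero residual that is orthogonal to the columns of S.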
Approximate Matching of Function Input-Output Data (cont.)
Superposition technique
  networks that individually map the nonlinear function over portions of its input space
  training set covering the entire input space
  input space divided into m subsets
Approximate Matching of Function Input-Output Data (cont.)
Fig. 3. Superposition of …-node neural networks into one s-node network.
Approximate Matching of Function
Input-Output Data (cont)


the gth neural network approximates the
vector
by the estimate
Approximate Matching of Function Input-Output Data (cont.)
Full network
  matrix of input-to-node values, with the element in the ith column and kth row
Terms
  main diagonal terms: input-to-node value matrices for the m sub-networks
  off-diagonal terms: columnwise linearly dependent on the elements in
Approximate Matching of Function Input-Output Data (cont.)
Output weights
  S is constructed to be of rank s
  the rank of (S|u) is s or s + 1
  zero or small error is introduced during the superposition
  the error does not increase with m
Approximate Matching of Function Input-Output Data (cont.)
Key to developing algebraic training techniques:
  construct a matrix S, through N, that displays the desired characteristics
Desired characteristics:
  S must be of rank s
  s is kept small to produce a parsimonious network
D. Exact Matching of Function Gradient Data
Gradient-based training sets
  at every training point k, the gradient is known for e of the neural network inputs, denoted by x
  the remaining (q - e) inputs are denoted by a
Input-output information
Exact Matching of Function Gradient
Data (cont)

input weight

output weight

gradient weight

input-to-node weight equation
First Linear System (36)
Formed by reorganizing all input-to-node values into one vector
  with s = p, this is a known ps-dimensional column vector
(36) is then rewritten as a linear system in which
  A is a ps × (q - e + 1)s matrix
  computed from all input vectors
Second Linear System (34)
With the input-to-node values known, system (34) becomes linear
  it can always be solved for v, provided s = p and S is nonsingular
  v can then be treated as a constant
Third Linear System (35)
(35) becomes linear
  the unknowns consist of the x-input weights
  the gradients in the training set are known
  X is a known ep × es matrix
Exact Matching of Function Gradient Data (cont.)
Algorithm goals
  determine an effective distribution for the elements
  solve the weight equations in one step
  first solved
Strategy
  produce a well-conditioned S with probability 1
  consists of generating according to
Input-to-Output Values

Substituted in (38)
Input-to-Output Values (cont.)
The sigmoids are very nearly centered.
It is desirable for one sigmoid to be centered for a given input, to prevent ill-conditioning of S.
The same sigmoid should be close to saturation for any other known input.
This requires a factor: the absolute value of the largest element in
Exact Matching of Function Gradient
Data (cont)

Example: Neural Network Modeling of the Sine Function
A sigmoidal neural network is trained to approximate the sine function u = sin(y) over the domain 0 ≤ y ≤ π.
The training set comprises the gradient and output information shown in Table 1, $\{y_k, u_k, c_k\}$, k = 1, 2, 3, with q = e = 1.

It is shown that the data is matched exactly by a
network with two nodes

Suppose the input-to-node values and are chosen
such that

equations. In this example, is chosen to make the
above weight equations consistent and to meet the
assumptions in (57) and (60)–(61). It can be easily
shown that this corresponds to computing the
elements of ( and ) from the equation
Conclusion
Algebraic training compared with optimization-based techniques:
  faster execution speeds
  better generalization properties
  reduced computational complexity
  can be used to find a direct correlation between the number of network nodes needed to model a given data set and the desired accuracy of representation