
Probabilistic Neural Nets in Knowledge Intense Learning Tasks
Mieczysław A. Kłopotek
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
e-mail: klopotek@ipipan.waw.pl
Abstract: This paper describes an approach to modeling technical processes for the purpose of process optimization with a restricted amount of experimental data. It is based on tuning micro-models to reflect real-world data. Fast-learning probabilistic neural networks are used as a vehicle for turning the independent parameters of the micro-models into parameters that depend on the macro-statistics of the simulated process.
Keywords: Artificial Neural Networks, technical process design
1. Introduction
Many practical problems, e.g. in engineering, consist in searching for a model of a process, finding optimal process conditions in this model and/or optimal process control. For example, in chemical engineering, given laboratory experiments, optimal synthesis conditions are sought that yield maximal gain and selectivity while reducing negative side effects. The results then have to be transferred, in appropriate steps, to industrial-scale production, where the production process has to be controlled in such a way as to keep maximum productivity while avoiding dangerous or risky situations. Another example is optimal macro-control of social and economic processes [5].
1.1. Subintroduction
Usually, no explicit or implicit analytical model combining control with its effects is available. Under these conditions, mathematical experiment planning is a well-founded methodology for the search for an optimum. However, the high costs of planned experiments and the non-linearity of the process under consideration frequently make it impossible to find an optimum in this way.
Hence another type of model of the phenomenon under consideration has to be found that would
- allow for process optimization and
- require a restricted number of experiments.
2. General idea
Neural networks with hidden layers are frequently considered an effective method of modeling non-linear behavior [6,13]. On the one hand they are equivalent to some methods of statistical estimation; on the other hand they possess a convenient method of learning by presentation of input and expected output data of the model to be created. However, for the purposes of the applications considered here, most types of neural networks have severe disadvantages:
1. they require relatively large sample sizes, unacceptable due to high costs (e.g. of industrial-scale experiments) or unavailability of data (few countries with comparable economies),
2. they have long training times, which excludes applications with real-time learning,
3. the results of learning depend on the presentation sequence,
4. they have significant learning parameters (e.g. the number of hidden layers) that have no direct relation to the application problem.
Probabilistic neural networks (PNN) [12] seem to be an exception to this rule. They learn quickly, even with a small sample, and the number of net-specific parameters is limited (e.g. AINET [1] has only one such parameter).
[Fig. 1 diagram: data → PNN learning → PNN; new case → PNN application → results]
Fig. 1 Typical application of PNN
However, these networks are feed-forward ones, so optimization tasks cannot be carried out by them (see fig. 1). In particular, their own parameters also cannot be automatically optimized.
Therefore, an additional component for finding optima is needed. We suggest the use of evolution strategies (Evolutionsstrategien, ES) [10] for this purpose (see fig. 2). Subsequent sections explain PNN and ES in detail.
[Fig. 2 diagram: data → PNN learning → PNN; PNN and ES cooperate → optimal solution]
Fig. 2 ES cooperating with PNN for finding an optimal solution
Usually, the optimum will be relative only to the current model and therefore needs to be verified empirically, so an iterative process takes place in which the PNN model is enhanced based on the empirical data (see fig. 3).
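A minimal Python sketch of this loop may clarify the idea. It uses a simple Gaussian-kernel regressor standing in for the PNN, a (1+1) ES for the optimization step, and a toy experiment function standing in for the costly real-world experiment; all names, step sizes and data below are illustrative assumptions, not the AiNet implementation:

```python
import numpy as np

def pnn_predict(x, X_train, y_train, pc=1.0):
    """Kernel-regression prediction: similarity-weighted average of stored outputs."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distance to every stored case
    w = np.exp(-d2 / pc)                      # kernel activations (pc = smoothing parameter)
    return np.sum(w * y_train) / np.sum(w)

def es_maximize(f, x0, sigma=0.3, iters=200):
    """(1+1) evolution strategy: accept a Gaussian mutant only if it improves f."""
    x, fx = x0, f(x0)
    for _ in range(iters):
        cand = x + np.random.normal(0.0, sigma, size=x.shape)
        if f(cand) > fx:
            x, fx = cand, f(cand)
    return x

def experiment(x):
    """Toy stand-in for the costly real-world experiment (unknown yield surface)."""
    return float(-np.sum((x - 0.7) ** 2))

# the loop of Fig. 3: learn -> optimize on the model -> verify -> learn again
X = np.random.uniform(0.0, 1.0, size=(10, 2))             # initial experimental data
y = np.array([experiment(x) for x in X])
for _ in range(5):
    best = es_maximize(lambda z: pnn_predict(z, X, y), X[np.argmax(y)].copy())
    X = np.vstack([X, best])                               # verify the proposal empirically
    y = np.append(y, experiment(best))                     # enhance the PNN model
print("best conditions found so far:", X[np.argmax(y)])
```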
[Fig. 3 diagram: real data, PNN learning, PNN, PNN application, ES, optimal solution, real-world experiment]
Fig. 3 ES cooperating with PNN in an optimization loop
[Fig. 4 diagram: simulation experiment for model parameters, simulation experiment of the process, simulation data + real data, PNN learning, PNN, simulation parameters, ES, optimal solution]
Fig. 4 Exploitation of simulation models
However, frequently the costs of experiments are prohibitive. But there exist models of the process of interest, e.g. ChemCad [5] for chemical processes. Such models, being general in nature, are usually not prepared for simulating our particular case, especially if the chemical process under consideration is a new invention. Usually, the models have some parameters (e.g. the coefficients of synthesis speed) that need to be adjusted for a particular process. These parameters are in general micro-scale dynamic parameters, that is, they are not observable directly. Only the total input and output of the process can be traced. In this case the PNN can be exploited in the way described in fig. 4. First, some trial simulations with guessed micro-parameters are carried out and the resulting macro-effects are recorded. Then, with PNN learning, a mapping from macro-effects to micro-parameters is sought (an inversion of the simulation process). Then, using the available real-world data, the micro-parameters can be estimated. Usually repeated simulations with the acquired micro-parameters have to be carried out to achieve good agreement with empirical observations. Once the micro-parameters are tuned, a simulation study of the process under consideration may start, and optimal process conditions can be calculated as previously described using ES and repeated (real and/or simulated) experiments.
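The inversion step can be sketched in a few lines. The following Python fragment uses a toy simulate function standing in for the process simulator (not ChemCad) and a kernel regressor standing in for the PNN; all parameter values and the "measured" observation are made up for illustration:

```python
import numpy as np

def simulate(micro):
    """Toy stand-in for the process simulator: maps unobservable micro-scale
    parameters to observable macro-effects (total input/output of the process)."""
    return np.array([micro[0] + 0.5 * micro[1], micro[0] * micro[1]])

def pnn_invert(macro_obs, macro_trials, micro_trials, pc=0.5):
    """PNN-style kernel regression from macro-effects back to micro-parameters."""
    d2 = np.sum((macro_trials - macro_obs) ** 2, axis=1)
    w = np.exp(-d2 / pc)                                   # similarity to each trial simulation
    return (w[:, None] * micro_trials).sum(axis=0) / w.sum()

# 1. trial simulations with guessed micro-parameters
micro_trials = np.random.uniform(0.0, 2.0, size=(50, 2))
macro_trials = np.array([simulate(m) for m in micro_trials])

# 2. invert the simulation: estimate micro-parameters from the real observation
real_macro = np.array([1.4, 0.9])                          # measured totals of the real process
micro_est = pnn_invert(real_macro, macro_trials, micro_trials)

# 3. check the tuning by re-simulating with the estimated micro-parameters
print("estimated micro-parameters:", micro_est)
print("re-simulated macro-effects:", simulate(micro_est))
```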
Notice that a qualitative jump is achieved with the architectures presented above. Neural networks are usually associated with black boxes, where one tries to create a real-world model in the absence of theoretical knowledge. However, if neural networks are coupled with Evolutionsstrategien and with simulation models, they can in fact make use of the domain knowledge incorporated in the simulation model and in the constraints of the Evolutionsstrategien.
3. Probabilistic Neural Network
PNN, or "Probabilistic Neural Network", is Specht's [12] term for kernel discriminant analysis. (Kernels are also called "Parzen windows".) One can think of it as a normalized RBF (radial basis function) network in which there is a hidden unit centered at every training case. These RBF units are called "kernels" and are usually probability density functions such as the Gaussian. The hidden-to-output weights are usually 1 or 0; for each hidden unit, a weight of 1 is used for the connection going to the output that the case belongs to, while all other connections are given weights of 0. Alternatively, these weights can be adjusted for the prior probabilities of each class. So the only weights that need to be learned are the widths of the RBF units. These widths (often a single width is used) are called "smoothing parameters" or "bandwidths" and are usually chosen by cross-validation or by some other method. Gradient descent is not used.
Specht claims that a PNN trains 100,000 times faster than a backpropagation network. While PNNs are not iterative in the same sense as backpropagation, kernel methods require a priori estimation of the kernel bandwidth, and this requires accessing the data many times. Furthermore, computing a single output value with kernel methods requires either accessing the entire training data or clever programming, and either way is much slower than computing an output with a feed-forward net. PNN is simply faster when the amount of training data is small. This is precisely the case in which backpropagation usually fails, as in the applications considered here.
PNN is a universal approximator for smooth class-conditional densities, so it should be able to solve any smooth classification problem given enough data. The main drawback of PNN is that, like kernel methods in general, it suffers badly from the curse of dimensionality. PNN cannot ignore irrelevant inputs without major modifications to the basic algorithm. So PNN is not likely to be the top choice if there are more than 5 or 6 nonredundant inputs. However, 5-10 variables are in fact the maximum number of independent inputs in the technical applications under consideration. There also exist modified algorithms that deal with irrelevant inputs; see [7,8].
If all inputs are relevant, PNN has the very useful ability to tell you
whether a test case is similar (i.e. has a high density) to any of the training data.
Fig. 5 A model of a PNN
In Fig. 5 an example of a PNN (the so-called AiNet [1]) is shown. Notation:
- p — prediction vector,
- m — model vector,
- i — index of a neuron belonging to an input variable,
- o — index of a neuron belonging to an output variable,
- N — number of model vectors,
- M — number of input variables of the phenomenon,
- K — number of output variables of the phenomenon (K is equal to 1 in the presented case and is omitted),
- pc — penalty coefficient.
The weights on connections are either equal to one or equal to zero. The expression for weight adaptation can be written as
$$w_{ij} \leftarrow w_{ij}\,\delta_{kj},$$
where $w_{ij}$ is equal to 1.0 and $\delta_{ij}$ is defined as
$$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$$
The network works in prediction mode according to the following scheme:

Layer A:
value of the neuron: $X_i^A = p_i$,
transfer function: linear,
output value of the neuron: $Y_i^A = f(X_i^A) = X_i^A$.

Layer B:
value of the neuron: $X_{ij}^B = \sum_{k=1}^{M} \left(Y_k^A - m_{ij}\right)\delta_{kj}$,
output value of the neuron: $Y_{ij}^B = f\left(X_{ij}^B\right) = \left(X_{ij}^B\right)^2$.

Layer C:
value of the neurons of type d: $X_i^C = \sum_{j=1}^{M} Y_{ij}^B$,
output value of the neuron: $Y_i^C = f\left(X_i^C, pc\right) = e^{-X_i^C / pc}$;
value of the neurons of type mo: $\bar{X}_i^C = mo_i$,
transfer function: linear,
output value of the neuron: $\bar{Y}_i^C = f\left(\bar{X}_i^C\right) = \bar{X}_i^C = mo_i$.

Layer D:
value of the neuron: $X^D = \sum_{i=1}^{N} Y_i^C$, $\quad \bar{X}^D = \sum_{i=1}^{N} Y_i^C\,\bar{Y}_i^C$,
output value of the neuron: $po = Y^D = f\left(X^D, \bar{X}^D\right) = \bar{X}^D / X^D$.
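Read together, the four layers compute a distance-weighted average of the stored model outputs. A minimal Python sketch of this prediction pass, following the equations above (the model vectors and the penalty coefficient are made-up example values, not AiNet's actual code):

```python
import numpy as np

def ainet_predict(p, m, mo, pc):
    """Prediction pass of the PNN of Fig. 5 (layers A-D).
    p  : prediction vector (M input values),
    m  : N x M matrix of model-vector inputs,
    mo : vector of N model-vector outputs,
    pc : penalty coefficient."""
    YA = p                              # layer A: linear transfer of the inputs
    YB = (YA - m) ** 2                  # layer B: squared differences to every model vector
    XC = YB.sum(axis=1)                 # layer C, type d: summed distance per model vector
    YC = np.exp(-XC / pc)               # exponential transfer with penalty coefficient pc
    XD = YC.sum()                       # layer D: normalizing sum
    XD_bar = (YC * mo).sum()            # layer D: sum weighted by the model outputs mo_i
    return XD_bar / XD                  # po: predicted output value

# made-up model vectors: three stored cases, one output each
m = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
mo = np.array([0.0, 1.0, 4.0])
print(ainet_predict(np.array([1.2, 0.9]), m, mo, pc=0.5))
```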
4. Evolution Strategy (ES)
Evolutionsstrategien (evolution strategies) [10,11] were invented to solve technical optimization problems, such as constructing an optimal flashing nozzle, and until recently ES were predominantly used by civil engineers as an alternative to standard solutions. Usually no closed-form analytical objective function is available for technical optimization problems, and hence no optimization method is applicable other than the engineer's intuition.
In a two-membered or (1+1) ES, one parent generates one offspring per generation by applying normally distributed mutations, i.e. smaller steps occur more often than big ones, until a child performs better than its ancestor and takes its place. Because of this simple structure, theoretical results for step-size control and convergence velocity could be derived. The first algorithm, using mutation only, was then enhanced to a (μ+1) strategy which incorporated recombination, several (i.e. μ) parents being available. The mutation scheme and the exogenous step-size control were taken over unchanged from (1+1) ESs. Schwefel later generalized these strategies to the multimembered ES, now denoted by (μ+λ) and (μ,λ), which imitates the following basic principles of organic evolution: a population, leading to the possibility of recombination with random mating, mutation and selection. These strategies are termed the plus strategy and the comma strategy, respectively: in the plus case, the parental generation is taken into account during selection, while in the comma case only the offspring undergoes selection, and the parents die off.
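A minimal Python sketch of such a multimembered strategy follows; it uses a fixed step size and made-up population sizes on a toy objective, whereas Schwefel's actual algorithms additionally self-adapt the step sizes:

```python
import numpy as np

def es_minimize(f, dim, mu=5, lam=20, sigma=0.5, gens=100, plus=False):
    """Multimembered (mu,lambda) / (mu+lambda) ES sketch: random mating with
    averaging recombination, Gaussian mutation, truncation selection.
    plus=True keeps the parents in the selection pool (plus strategy);
    plus=False lets only the offspring undergo selection (comma strategy)."""
    pop = np.random.uniform(-5.0, 5.0, size=(mu, dim))
    for _ in range(gens):
        mates = pop[np.random.randint(mu, size=(lam, 2))]         # random mating
        children = mates.mean(axis=1)                             # recombination
        children += np.random.normal(0.0, sigma, children.shape)  # mutation
        pool = np.vstack([pop, children]) if plus else children   # plus or comma selection pool
        pop = pool[np.argsort([f(x) for x in pool])[:mu]]         # best mu survive
    return pop[0]

# toy objective: sphere function with minimum at the origin
print(es_minimize(lambda x: float(np.sum(x ** 2)), dim=3))
```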
Notice that evolutionary programs could also be used for such optimization problems [9].
5. Conclusions
In this paper an architecture for engaging neural networks in knowledge-intense learning tasks has been proposed. Neural networks are usually associated with black boxes, where one tries to create a real-world model in the absence of theoretical knowledge. However, if neural networks are coupled with Evolutionsstrategien and with simulation models, they can in fact make use of the domain knowledge incorporated in the simulation model and in the constraints of the Evolutionsstrategien.
The proposed architecture could be used, e.g.,
- for introductory analysis of the costs of implementing a technology [3],
- for evaluation of the usefulness of changes in an existing technology [2,4],
- for identification of simulation parameters of newly elaborated technologies,
- for optimal real-time control of technological processes.
References
1. AINET - documentation, URL http://www.ainet-sp.si/aiNetNN.htm
2. Adamska-Rutkowska D.: Modelowanie złożonych procesów chemicznych za pomocą sieci neuronowych, Przemysł Chemiczny 77/7 (1998), 247-250
3. Adamska-Rutkowska D., Rejewski P.: Szacowanie kosztów inwestycyjnych i eksploatacyjnych technologii chemicznych za pomocą sieci neuronowej na wstępie cyklu badawczo-wdrożeniowego, Przemysł Chemiczny 78/3 (1999), 83-86
4. Adamska-Rutkowska D.: Wykorzystanie sieci neuronowych do estymacji parametrów matematycznego modelu procesu chemicznego, Przemysł Chemiczny 77/12 (1998), 446-448
5. ChemCad by Chemstations, http://www.chemstations.net/
6. Gately E. (Ed.): Sieci neuronowe. Prognozowanie finansowe i projektowanie systemów transakcyjnych. Transl. from English. Warszawa 1999, WIG-Press
7. Lowe D.G.: Similarity metric learning for a variable-kernel classifier, Neural Computation 7 (1995), 72-85, http://www.cs.ubc.ca/spider/lowe/pubs.html
8. Masters T.: Advanced Algorithms for Neural Networks: A C++ Sourcebook, NY: John Wiley and Sons (1995)
9. Michalewicz Z.: Algorytmy genetyczne + struktury danych = programy ewolucyjne. Transl. from English. Warszawa 1996, WN-T
10. Rechenberg I.: Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Stuttgart: Frommann-Holzboog (1973)
11. Schwefel H.-P.: Numerische Optimierung von Computermodellen mittels der Evolutionsstrategie, Basel: Birkhäuser (1977)
12. Specht D.F.: Probabilistic neural networks, Neural Networks 3 (1990), 110-118
13. Tadeusiewicz R.: Sieci neuronowe. Warszawa 1993, Akad. Oficyna Wydawnicza