Probabilistic Neural Nets in Knowledge-Intense Learning Tasks

Mieczysław A. Kłopotek
Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
e-mail: klopotek@ipipan.waw.pl

Abstract: This paper describes an idea for modeling technical processes for the purpose of process optimization with a restricted amount of experimental data. It is based on tuning micro-models so that they reflect real-world data. Quickly learning probabilistic neural networks are used as a vehicle to invert the simulation, i.e. to express the free parameters of the micro-models as functions of the macro-statistics of the simulated process.

Keywords: Artificial Neural Networks, technical process design

1. Introduction

Many practical problems, e.g. in engineering, consist in finding a model of a process and then determining optimal process conditions and/or optimal process control within this model. For example, in chemical engineering, given laboratory experiments, optimal synthesis conditions are sought that yield maximal gain and selectivity while reducing negative side effects. The results then have to be transferred, in appropriate steps, to industrial-scale production, where the production process has to be controlled in such a way as to keep productivity at a maximum while avoiding dangerous or risky situations. Another example is optimal macro-control of social and economic processes [5].

1.1. Subintroduction

Usually, no explicit or implicit analytical model combining the control variables with their effects is available. Under these conditions, mathematical experiment planning is a well-founded methodology for the search for an optimum. However, the high cost of planned experiments and the non-linearity of the process under consideration frequently make it impossible to find an optimum in this way. Hence another type of model of the phenomenon under consideration has to be found that would
- allow for process optimization, and
- require a restricted number of experiments.

2. General idea

Neural networks with hidden layers are frequently considered an effective method of modeling non-linear behavior [6,13]. On the one hand they are equivalent to some methods of statistical estimation; on the other hand they offer a convenient method of learning by presentation of input and expected output data of the model to be created. However, for the purposes of the applications considered here, most types of neural networks have severe disadvantages:
1. they require relatively large sample sizes - unacceptable due to high costs (e.g. of industrial-scale experiments) or unavailability of data (few countries with comparable economies),
2. they have long training times - which excludes applications with real-time learning,
3. the results of learning depend on the presentation sequence,
4. they have significant learning parameters (e.g. the number of hidden layers) that have no direct relation to the application problem.

Probabilistic neural networks (PNN) [12] seem to be an exception to this rule. They learn quickly, even from a small sample, and the number of net-specific parameters is limited (e.g. AiNet [1] has only one such parameter).
[Fig. 1. Typical application of a PNN: data -> PNN learning -> trained PNN; new case -> PNN application -> results]

However, these networks are feed-forward ones, so optimization tasks cannot be carried out by them (see Fig. 1). In particular, their own parameters cannot be optimized automatically either. Therefore, an additional component for finding optima is needed. We suggest the use of Evolutionsstrategien (ES) [10] for this purpose (see Fig. 2). Subsequent sections explain PNN and ES in detail.

[Fig. 2. ES cooperating with a PNN for finding an optimal solution]

Usually, the optimum will be relative only to the current model and therefore needs to be verified empirically, so an iterative process takes place in which the PNN model is enhanced based on the empirical data (see Fig. 3).

[Fig. 3. ES cooperating with a PNN in an optimization loop]

[Fig. 4. Exploitation of simulation models: simulation experiments for model parameters and for the process, PNN learning on simulation data and real data, estimation of simulation parameters, ES search for an optimal solution, real-world experiment]

However, the costs of experiments are frequently prohibitive. On the other hand, there exist models of the process of interest, e.g. ChemCad [5] for chemical processes. Such models, being general in nature, are usually not prepared for simulating our particular case, especially if the chemical process under consideration is a new invention. Usually, the models have some parameters (e.g. the coefficients of synthesis speed) that need to be adjusted for a particular process. These parameters are in general micro-scale dynamic parameters, i.e. they are not directly observable. Only the total input and output of the process can be traced. In this case the PNN can be exploited in the way described in Fig. 4. First, some trial simulations with guessed micro-parameters are carried out, and the resulting macro-effects are recorded. Then, by PNN learning, a mapping from macro-effects to micro-parameters is sought (an inversion of the simulation process). Using the available real-world data, the micro-parameters can then be estimated. Usually, repeated simulations with the acquired micro-parameters have to be carried out to achieve good agreement with the empirical observations. Once the micro-parameters are tuned, a simulation study of the process under consideration may start, and optimal process conditions can be calculated as described above using ES and repeated (real and/or simulated) experiments.

Notice that a qualitative jump is achieved with the architectures presented above. Neural networks are usually associated with black boxes, where one tries to create a real-world model in the absence of theoretical knowledge. However, if neural networks are coupled with Evolutionsstrategien and with simulation models, they can in fact make use of the domain knowledge incorporated in the simulation model and in the constraints of the Evolutionsstrategien.
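To make the workflow of Fig. 4 concrete, the following Python sketch illustrates the inversion idea under simplifying assumptions: a toy, analytically invertible two-parameter function stands in for a real simulation model such as ChemCad, and a plain Gaussian-kernel regression stands in for the PNN. All names (simulate_process, pnn_predict), parameter ranges, and numbers are illustrative only and are not taken from the tools cited in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_process(micro):
    """Toy, invertible stand-in for a process simulator: micro-parameters -> macro-effects."""
    k1, k2 = micro
    return np.array([k1 + 0.5 * k2, np.log(k1) - k2])

def pnn_predict(x, X_train, Y_train, pc=1.0):
    """Kernel-regression prediction -- the generic idea behind a PNN/GRNN."""
    d2 = np.sum((X_train - x) ** 2, axis=1)   # squared distances to the stored cases
    w = np.exp(-pc * d2)                      # Gaussian kernel weights
    return (w @ Y_train) / np.sum(w)          # weighted average of stored micro-parameters

# 1. Trial simulations with guessed micro-parameters (the "simulation experiment").
micro_samples = rng.uniform(0.1, 2.0, size=(200, 2))
macro_effects = np.array([simulate_process(m) for m in micro_samples])

# 2. "Training" the PNN on the inverse mapping macro-effects -> micro-parameters
#    amounts to storing the simulated cases; no iterative weight fitting is needed.
X_train, Y_train = macro_effects, micro_samples

# 3. Estimate the micro-parameters that explain the observed real-world macro-effects.
real_macro = simulate_process(np.array([0.7, 1.3]))   # pretend plant measurement
estimated_micro = pnn_predict(real_macro, X_train, Y_train, pc=50.0)
print(estimated_micro)   # should lie near the hidden values [0.7, 1.3]
```

Note that, in line with the claim about AiNet above, the only quantity that has to be chosen here is the single smoothing parameter pc; the "learning" step is simply the storage of the simulated cases.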
3. Probabilistic Neural Network

PNN, or "Probabilistic Neural Network", is Specht's [12] term for kernel discriminant analysis. (Kernels are also called "Parzen windows".) One can think of it as a normalized RBF (radial basis function) network in which there is a hidden unit centered at every training case. These RBF units are called "kernels" and are usually probability density functions such as the Gaussian. The hidden-to-output weights are usually 1 or 0: for each hidden unit, a weight of 1 is used for the connection going to the output that the case belongs to, while all other connections are given weights of 0. Alternatively, these weights can be adjusted for the prior probabilities of each class. So the only weights that need to be learned are the widths of the RBF units. These widths (often a single width is used) are called "smoothing parameters" or "bandwidths" and are usually chosen by cross-validation or by some other method; gradient descent is not used.

Specht claims that a PNN trains 100,000 times faster than a backpropagation network. While kernel methods are not iterative in the same sense as backpropagation, they require estimating the kernel bandwidth beforehand, and this requires accessing the data many times. Furthermore, computing a single output value with kernel methods requires either access to the entire training data or clever programming, and either way is much slower than computing an output with a feed-forward net. PNN is faster mainly when the amount of training data is small - which is exactly the situation in which backpropagation usually fails, as in the applications considered here. PNN is a universal approximator for smooth class-conditional densities, so it should be able to solve any smooth classification problem given enough data.

The main drawback of PNN is that, like kernel methods in general, it suffers badly from the curse of dimensionality. PNN cannot ignore irrelevant inputs without major modifications to the basic algorithm, so it is not likely to be the top choice if there are more than 5 or 6 non-redundant inputs. However, 5-10 variables are in fact the maximum number of independent inputs in the technical applications under consideration. There also exist modified algorithms that deal with irrelevant inputs, see [7,8]. If all inputs are relevant, PNN has the very useful ability to tell whether a test case is similar to the training data (i.e. has a high density there).

[Fig. 5. A model of a PNN]

Fig. 5 shows an example of a PNN (the so-called AiNet [1]). Notation:
- $m$ - model vector,
- $p$ - prediction vector,
- subscript $i$ - indicates a neuron belonging to an input variable,
- subscript $o$ - indicates a neuron belonging to an output variable,
- $N$ - number of model vectors,
- $M$ - number of input variables of the phenomenon,
- $K$ - number of output variables of the phenomenon ($K = 1$ in the presented case and is omitted),
- $pc$ - penalty coefficient.

The weights on the connections are either equal to one or equal to zero. The expression for weight adaptation can be written as $w_{ij} = w \cdot \delta_{ij}$, where $w$ is equal to 1.0 and $\delta_{ij}$ is the Kronecker delta:
$$\delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases}$$

In prediction mode the network works according to the following scheme:

Layer A:
value of the neuron: $X_i^A = p_i$,
output value of the (linear) neuron: $Y_i^A = f(X_i^A) = X_i^A$.

Layer B:
value of the neuron: $X_{ij}^B = \sum_{k=1}^{M} Y_k^A \,\delta_{kj} - m_{ij} = Y_j^A - m_{ij}$,
output value of the neuron: $Y_{ij}^B = f(X_{ij}^B) = (X_{ij}^B)^2$.

Layer C:
value of the neurons of type d: $X_i^C = \sum_{j=1}^{M} Y_{ij}^B$,
output value of the neuron: $Y_i^{C(d)} = f(X_i^C, pc) = e^{-pc \, X_i^C}$;
value of the neurons of type mo: $X_i^C = mo_i$,
output value of the (linear) neuron: $Y_i^{C(mo)} = f(X_i^C) = X_i^C = mo_i$.

Layer D:
values of the neuron: $X_1^D = \sum_{i=1}^{N} Y_i^{C(d)} \, Y_i^{C(mo)}$ and $X_2^D = \sum_{i=1}^{N} Y_i^{C(d)}$,
output value of the (linear) neuron: $p_o = Y^D = f(X_1^D, X_2^D) = \dfrac{X_1^D}{X_2^D}$.
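For readers who prefer code to layered formulas, the following Python fragment transcribes the prediction scheme above (layers A-D) as reconstructed here. The function name ainet_predict and the toy numbers are illustrative only; the AiNet documentation [1] remains the authoritative description of the network.

```python
import numpy as np

def ainet_predict(p, m, mo, pc):
    """p: prediction (input) vector of length M; m: N x M matrix of model vectors;
    mo: length-N vector of stored output values; pc: penalty coefficient."""
    Y_A = p                              # layer A: X_i^A = p_i, linear output
    X_B = Y_A[np.newaxis, :] - m         # layer B: X_ij^B = Y_j^A - m_ij
    Y_B = X_B ** 2                       #          Y_ij^B = (X_ij^B)^2
    X_Cd = Y_B.sum(axis=1)               # layer C, type d: squared distance to each model vector
    Y_Cd = np.exp(-pc * X_Cd)            #          kernel value e^{-pc X_i^C}
    Y_Cmo = mo                           # layer C, type mo: stored output values
    X_D1 = np.sum(Y_Cd * Y_Cmo)          # layer D: kernel-weighted sum of stored outputs
    X_D2 = np.sum(Y_Cd)                  #          sum of kernel values
    return X_D1 / X_D2                   # prediction p_o

# toy usage: three stored model vectors with outputs, one query vector
m = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
mo = np.array([0.0, 1.0, 2.0])
print(ainet_predict(np.array([0.1, 0.9]), m, mo, pc=5.0))   # close to 2.0
```

The final ratio makes explicit that the network computes a kernel-weighted average of the stored outputs, with pc controlling how strongly the nearest model vectors dominate the prediction.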
4. Evolution Strategy (ES)

Evolutionsstrategien (evolution strategies, ES) [10,11] were invented to solve technical optimization problems such as constructing an optimal flashing nozzle, and until recently ES were predominantly used by civil engineers as an alternative to standard solutions. Usually, no closed-form analytical objective function is available for technical optimization problems, and hence no standard optimization method is applicable - only the engineer's intuition.

In a two-membered or (1+1) ES, one parent generates one offspring per generation by applying normally distributed mutations, i.e. smaller steps occur more often than big ones; as soon as a child performs better than its ancestor, it takes its place. Because of this simple structure, theoretical results for step-size control and convergence velocity could be derived. The first algorithm, using mutation only, was then enhanced to a (μ+1) strategy, which incorporated recombination since several (μ) parents are available. The mutation scheme and the exogenous step-size control were taken over unchanged from the (1+1) ES. Schwefel later generalized these strategies to the multimembered ES, now denoted by (μ+λ) and (μ,λ), which imitates the following basic principles of organic evolution: a population, leading to the possibility of recombination with random mating, mutation, and selection. These strategies are termed plus strategy and comma strategy, respectively: in the plus case, the parental generation is taken into account during selection, while in the comma case only the offspring undergo selection and the parents die off. Notice that evolutionary programs could also be used for such optimization problems [9].
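As a concrete illustration, the following sketch shows a minimal (1+1) ES with a simple 1/5-success-rule style step-size adaptation. The objective function, parameter values, and adaptation constants are illustrative assumptions, not taken from [10,11]; in the architecture of Fig. 2, the objective would be a call to the trained PNN model of the process.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_plus_one_es(objective, x0, sigma=0.5, iterations=200):
    """Minimal (1+1) evolution strategy minimizing `objective`,
    with a simple 1/5-success-rule style step-size adaptation."""
    parent, f_parent = np.array(x0, dtype=float), objective(x0)
    for _ in range(iterations):
        child = parent + sigma * rng.standard_normal(parent.shape)  # normally distributed mutation
        f_child = objective(child)
        if f_child < f_parent:       # selection: the child replaces its parent only if it is better
            parent, f_parent = child, f_child
            sigma *= 1.5             # successful step -> enlarge the step size
        else:
            sigma *= 0.9             # unsuccessful step -> shrink the step size
    return parent, f_parent

# toy objective standing in for a PNN-based process model (illustrative only)
def toy_objective(x):
    return (x[0] - 0.7) ** 2 + (x[1] - 1.3) ** 2

best_x, best_f = one_plus_one_es(toy_objective, x0=[0.0, 0.0])
print(best_x, best_f)   # best_x should approach [0.7, 1.3]
```

Since the strategy only needs objective values, not gradients, it can be wrapped around any black-box model, which is exactly why it fits the PNN-based architecture proposed here.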
5. Conclusions

In this paper an architecture for engaging neural networks in knowledge-intense learning tasks has been proposed. Neural networks are usually associated with black boxes, where one tries to create a real-world model in the absence of theoretical knowledge. However, if neural networks are coupled with Evolutionsstrategien and with simulation models, they can in fact make use of the domain knowledge incorporated in the simulation model and in the constraints of the Evolutionsstrategien. The proposed architecture could be used, e.g.:
- for introductory analysis of the costs of implementing a technology [3],
- for evaluation of the usefulness of changes in an existing technology [2,4],
- for identification of simulation parameters of newly elaborated technologies,
- for optimal real-time control of technological processes.

References

1. AiNet - documentation, http://www.ainet-sp.si/aiNetNN.htm
2. Adamska-Rutkowska D.: Modelowanie złożonych procesów chemicznych za pomocą sieci neuronowych, Przemysł Chemiczny 77/7 (1998), 247-250
3. Adamska-Rutkowska D., Rejewski P.: Szacowanie kosztów inwestycyjnych i eksploatacyjnych technologii chemicznych za pomocą sieci neuronowej na wstępie cyklu badawczo-wdrożeniowego, Przemysł Chemiczny 78/3 (1999), 83-86
4. Adamska-Rutkowska D.: Wykorzystanie sieci neuronowych do estymacji parametrów matematycznego modelu procesu chemicznego, Przemysł Chemiczny 77/12 (1998), 446-448
5. ChemCad by Chemstations, http://www.chemstations.net/
6. Gately E. (ed.): Sieci neuronowe. Prognozowanie finansowe i projektowanie systemów transakcyjnych (transl. from English), WIG-Press, Warszawa 1999
7. Lowe D.G.: Similarity metric learning for a variable-kernel classifier, Neural Computation 7 (1995), 72-85, http://www.cs.ubc.ca/spider/lowe/pubs.html
8. Masters T.: Advanced Algorithms for Neural Networks: A C++ Sourcebook, John Wiley and Sons, New York (1995)
9. Michalewicz Z.: Algorytmy genetyczne + struktury danych = programy ewolucyjne (transl. from English), WN-T, Warszawa 1996
10. Rechenberg I.: Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen Evolution, Frommann-Holzboog, Stuttgart (1973)
11. Schwefel H.-P.: Numerische Optimierung von Computermodellen mittels der Evolutionsstrategie, Birkhäuser, Basel (1977)
12. Specht D.F.: Probabilistic neural networks, Neural Networks 3 (1990), 110-118
13. Tadeusiewicz R.: Sieci neuronowe, Akademicka Oficyna Wydawnicza, Warszawa 1993