Frequently Asked Questions about Neural Networks.

plot [0:7] [0:3] "iris1" notitle, "iris2" notitle, "iris3" notitle, \ (-0.6*x + 2.0)/0.8 notitle, -0.5*x + 4.0 notitle sigmoid(x) = 1.0/(1.0 + exp(-x)) unit1(x,y) = sigmoid(0.95022*x + 1.2699*y - 2.7849) unit2(x,y) = sigmoid(-3.6089*x - 11.5877*y + 38.2507) out1(x,y) = sigmoid(-22.7877*unit1(x,y) + 0.26565*unit2(x,y) + 5.023) out2(x,y) = sigmoid(16.2471*unit1(x,y) + 7.9134*unit2(x,y) - 18.1983) out3(x,y) = sigmoid(16.196*unit1(x,y) - 37.7073*unit2(x,y) - 12.8456) splot [0:7] [0:3] unit1(x,y), unit2(x,y) splot [0:7] [0:3] out1(x,y), out2(x,y), out3(x,y) plot [-5:5] sigmoid(x)1.3.1. Approximation, the first four graphs were generated using the book's Matlab software, the fifth graph was generated using the gnuplot command:

    plot [-1:2] x**2 + 2*x**3 title "f", 5.2*x + 0.9 title "L2", \
        3.62*x + 0.94 title "L1", 7.0*x + 0.98 title "Linf"

I used Maple to do the integration and algebra needed to calculate the norms.
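As a cross-check (Maple was the actual tool), the L2 line can be recovered in Matlab by solving the normal equations numerically; a minimal sketch:

    % Best least-squares line c(2)*x + c(1) for f(x) = x^2 + 2*x^3 on [-1,2]:
    % build the normal equations from the inner products <1,1>, <1,x>, <x,x>.
    f = @(x) x.^2 + 2*x.^3;
    M = [integral(@(x) ones(size(x)), -1, 2), integral(@(x) x, -1, 2); ...
         integral(@(x) x, -1, 2), integral(@(x) x.^2, -1, 2)];
    b = [integral(f, -1, 2); integral(@(x) x.*f(x), -1, 2)];
    c = M \ b    % gives c = [0.9; 5.2], matching the "L2" line above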

To do the least-squares calculation, I started with the iris1, iris2, and
iris3 files. For each line of two inputs, x1 and x2, I generated a line of
six inputs and one output. The six inputs are the zero-, first-, and
second-order terms (1, x1, x2, x1*x1, x2*x2, x1*x2), and the output is
either 0 or 1. The resulting file was input to a Perl hack of mine for
calculating the least-squares coefficients (not to be used seriously).
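The same fit can be done with Matlab's backslash operator. A minimal sketch, assuming the reformatted data has been read into an n-by-2 matrix X and a 0/1 target vector t (names of my own choosing):

    % Design matrix holding the zero-, first-, and second-order terms.
    A = [ones(size(X,1),1), X(:,1), X(:,2), ...
         X(:,1).^2, X(:,2).^2, X(:,1).*X(:,2)];
    w = A \ t;    % least-squares coefficients, as used in h(x,y) below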
The last plot used those coefficients and these three files (iris11, iris22, iris33). Here are the gnuplot commands to create the plot:

    set contour base
    set view 0,0
    set cntrparam levels incremental 0,0.25,1
    h(x,y) = -1.30465212333162 + x*1.15333621204672 - y*0.0687323088406203 \
        - x*x*0.197952522130871 - y*y*0.744328027286469 + x*y*0.396016730830849
    splot [0:7] [0:3] "iris11" notitle, "iris22" notitle, \
        "iris33" notitle, h(x,y) notitle

1.3.2. Nonlinear Approximation

I describe how three of the figures were generated.

The noise in the 1 + 2*x function is from a uniform distribution between -1 and 1.
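In Matlab, such data can be generated along these lines (a sketch, not the exact commands used):

    x = (-4:4)';                          % the nine sample points used below
    y = 1 + 2*x + (2*rand(size(x)) - 1);  % line plus uniform noise on [-1, 1]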

Using gnuplot, the squared error on the data was calculated by:

    err(x,y,w1,w2) = (y - (w1 + w2*x))**2
    sumerr(w1,w2) = \
        err(-4,-7.6193314,w1,w2) + err(-3,-4.653893,w1,w2) + \
        err(-2,-3.201543,w1,w2) + err(-1,-0.22456038,w1,w2) + \
        err(0,0.4883707,w1,w2) + err(1,3.3397593,w1,w2) + \
        err(2,4.6062903,w1,w2) + err(3,7.4832516,w1,w2) + \
        err(4,9.154069,w1,w2)

The error surface was generated by:

    set nocontour
    set view 60,60
    set surface
    set xlabel "w0"
    set ylabel "w1"
    splot [-2:4] [0:4] sumerr(x,y) notitle lw 2

The error contours were generated by:

    set contour base
    set cntrparam levels discrete 5,10,20,50
    set view 0,0
    set nosurface
    set xlabel "w0"
    set ylabel "w1"
    splot [-2:4] [0:4] sumerr(x,y) lw 2

Creating a .eps file was done by:

set output "linerr.eps" set terminal postscript eps color "Timesroman" 24 replot set terminal x11I used xfig to convert .eps files to .pdf files.

1.4. Regression and Classification

The first graph was generated with the following gnuplot commands:

    sx = sqrt(2.1)
    sy = sqrt(1.2)
    sxy = -0.9
    p = sxy/(sx*sy)
    pi = 2.0*acos(0)
    P(x,y) = 1.0/(2*pi*sx*sy*sqrt(1-p*p))*exp(-0.5/(1-p*p)* \
        (((x-2.5)/sx)**2 - 2*p*(x-2.5)/sx*(y-1.5)/sy + ((y-1.5)/sy)**2))
    splot [0:4.5] [-1:4] P(x,y)

It was saved to a .eps file by these additional gnuplot commands:

set output "mndist.eps" set terminal postscript eps color "Timesroman" 24 replot set terminal x11I used xfig to create a .pdf file from the .eps file.

The contour plot was generated by still more gnuplot commands:

    set view 0,0
    set contour base
    set surface
    splot [0:4.5] [-1:4] P(x,y) lw 2, "multinorm.line" notitle with lines lw 2

The file multinorm.line contains points on the line that can be derived from the book's formulas.

The last graph was generated by the gnuplot commands:

plot "multinorm.data" using 2:3 notitle ps 2, \ 3.412 - 0.687*x notitle lw 2, \ 1.5 - 0.9*(x-2.5)/2.1 notitle lw 2The file multinorm.data are points generated randomly using matlab. In the matlab command window:

    Sigma = [2.1 -0.9; -0.9 1.2];
    mu = [2.5 1.5];
    mvnrnd(mu, Sigma)

will generate a single random point from a normal distribution with mean mu and covariance Sigma.
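To generate many points at once, as was presumably done for multinorm.data, mvnrnd takes an optional count:

    data = mvnrnd(mu, Sigma, 100);   % 100 rows, each one random point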

1.4.2. Classification with Normal Distributions

In gnuplot, I define the three probability distribution functions and the corresponding decision functions as follows:

    sa1 = sqrt(0.0301)
    sa2 = sqrt(0.0115)
    sa12 = 0.0057
    pa = sa12/(sa1*sa2)
    pi = 2.0*acos(0)
    PA(x1,x2) = 1.0/(2.0*pi*sa1*sa2*sqrt(1.0-pa*pa))*exp(-0.5/(1.0-pa*pa)* \
        (((x1-1.464)/sa1)**2 - 2*pa*((x1-1.464)/sa1)*((x2-0.244)/sa2) + \
        ((x2-0.244)/sa2)**2))
    DA(x1,x2) = -0.5*log(1.0-pa*pa) - 0.5/(1.0-pa*pa)* \
        (((x1-1.464)/sa1)**2 - 2*pa*((x1-1.464)/sa1)*((x2-0.244)/sa2) + \
        ((x2-0.244)/sa2)**2)

    sb1 = sqrt(0.2208)
    sb2 = sqrt(0.0731)
    sb12 = 0.0731
    pb = sb12/(sb1*sb2)
    pi = 2.0*acos(0)
    PB(x1,x2) = 1.0/(2.0*pi*sb1*sb2*sqrt(1.0-pb*pb))*exp(-0.5/(1.0-pb*pb)* \
        (((x1-4.260)/sb1)**2 - 2*pb*((x1-4.260)/sb1)*((x2-1.326)/sb2) + \
        ((x2-1.326)/sb2)**2))
    DB(x1,x2) = -0.5*log(1.0-pb*pb) - 0.5/(1.0-pb*pb)* \
        (((x1-4.260)/sb1)**2 - 2*pb*((x1-4.260)/sb1)*((x2-1.326)/sb2) + \
        ((x2-1.326)/sb2)**2)

    sc1 = sqrt(0.3046)
    sc2 = sqrt(0.0754)
    sc12 = 0.0488
    pc = sc12/(sc1*sc2)
    pi = 2.0*acos(0)
    PC(x1,x2) = 1.0/(2.0*pi*sc1*sc2*sqrt(1.0-pc*pc))*exp(-0.5/(1.0-pc*pc)* \
        (((x1-5.552)/sc1)**2 - 2*pc*((x1-5.552)/sc1)*((x2-2.026)/sc2) + \
        ((x2-2.026)/sc2)**2))
    DC(x1,x2) = -0.5*log(1.0-pc*pc) - 0.5/(1.0-pc*pc)* \
        (((x1-5.552)/sc1)**2 - 2*pc*((x1-5.552)/sc1)*((x2-2.026)/sc2) + \
        ((x2-2.026)/sc2)**2)

PA, PB, and PC are the probability distribution functions, and DA, DB, and DC are slightly simpler decision functions; each D is the logarithm of the corresponding P with its constant term dropped. The plot of the distributions was done by:

    splot [0:7] [0:3] [0:2] PA(x,y) notitle, PB(x,y) notitle, PC(x,y) notitle

and the decision boundaries by:

    set contour base
    set view 0,0
    set cntrparam levels discrete 0.0
    set cntrparam points 100
    splot [0:7] [0:3] "iris11" notitle ps 2, "iris22" notitle ps 2, \
        "iris33" notitle ps 2, \
        DA(x,y) - DB(x,y) notitle lw 2, DC(x,y) - DB(x,y) notitle lw 2

where iris11, iris22, and iris33 are the three iris data files used earlier.

The contour for the higher loss was done by:

    set cntrparam levels discrete log(100)
    replot

(With a loss ratio of 100, the boundary moves to where the difference of the decision functions equals log(100) rather than 0.)

The linear decision boundaries were graphed by changing some of the variables:

    sa1 = sqrt(0.1852)
    sa2 = sqrt(0.0420)
    sa12 = 0.0425
    sb1 = sqrt(0.1852)
    sb2 = sqrt(0.0420)
    sb12 = 0.0425
    sc1 = sqrt(0.1852)
    sc2 = sqrt(0.0420)
    sc12 = 0.0425

then re-entering the pa, pb, and pc assignments so they pick up the new values (gnuplot variables keep the value they had when assigned), and doing the same kind of plots as before. Since all three classes now use the same covariance values, the quadratic terms in the decision-function differences cancel and the boundaries are lines.

3.1 Perceptrons, Project Framework, Perceptron Convergence in Nonseparable Case, 3.2 Adaline, Adaline Convergence.

4.1 Multilayer Perceptrons, Universal Approximation, 4.3 Practical Aspects of Neural Networks.

Here is the original glass dataset and a description of its attributes. After scaling and reformatting, here is the glass dataset that I used to produce my notes.

Introduction to Support Vector Machines and Kernel Functions, A Simple Algorithm that Combines Examples, Statistical Learning Theory, Hyperplane Classification, Support Vector Classification, SMO Algorithm for SVMs, SVM Examples with a Gaussian Kernel, Support Vector Regression.

Here is a program for learning SVMs (zip file) (tar file). Read the README file. This has been replaced (11/14/03) with a new version with fewer bugs.

Error and Risk, Train and Test, Comparing Algorithms and an Example, Other Aspects of Evaluation.

    [irisp irist] = iris(0)

The 0 is a useless argument because I didn't take the time to figure out how to write a function with zero arguments.
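For what it's worth, Matlab does allow zero-argument functions; the declaration in iris.m could simply omit the argument (a sketch; the load calls and file names are made up):

    function [irisp, irist] = iris()
    % Zero-argument version: callers can then write [irisp irist] = iris()
    irisp = load('irisp.dat');   % hypothetical data files, for illustration
    irist = load('irist.dat');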

There is also Matlab software that goes with the book for doing some simple
experiments. Unzip this file in your Matlab directory, and change the name
of the new subdirectory from `LearnSCp` to `learnsc`. This will help you
follow the book's instructions on how to use the software.
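The unzip and rename can also be done from within Matlab (a sketch; the actual name of the zip file may differ):

    unzip('LearnSCp.zip')             % hypothetical archive name
    movefile('LearnSCp', 'learnsc')   % rename so the book's instructions apply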