1990

Probabilistic Interpretation of Feedforward Classification Network Outputs, With Relationship to Statistical Pattern Recognition

John S. Bridle

citations

Cite Score

54

AI summary

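This paper shows how to interpret the outputs of feed-forward non-linear networks as class probabilities, using probability scoring and a softmax output stage. Placing radial units immediately before the softmax yields posterior distributions based on Gaussian within-class distributions, and the cross-class training can discriminate between classes better than the usual within-class training unless the Gaussian assumptions actually hold.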

Main Contributions

  • Introduces probability scoring as an alternative to squared error minimization.
  • Presents a normalized exponential (softmax) as a multi-input generalization of the logistic non-linearity.
  • Proposes radial units (squared distance instead of dot product) immediately before the softmax output stage, giving a network that computes posterior distributions over class labels under assumptions of Gaussian within-class distributions with equal covariance matrices.
  • Argues that training with cross-class information can discriminate between classes better than the usual within-class training, unless the within-class distribution assumptions are actually correct.
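The link between radial units, softmax, and Gaussian posteriors listed above can be checked numerically. The following is a minimal sketch, not the paper's own code: it assumes the simplest equal-covariance case (a shared spherical covariance `variance * I`), and the function names, class means, priors, and test point are all hypothetical values chosen for illustration.

```python
import math

def softmax(scores):
    """Normalised exponential: exp(s_j) / sum_k exp(s_k)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def radial_scores(x, means, log_priors, variance):
    """Radial units: log prior minus scaled squared distance to each class mean."""
    return [
        lp - sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / (2.0 * variance)
        for mean, lp in zip(means, log_priors)
    ]

def gaussian_posterior_direct(x, means, priors, variance):
    """Bayes' rule with explicit spherical Gaussian densities, for comparison."""
    d = len(x)
    norm = (2.0 * math.pi * variance) ** (-d / 2.0)
    joint = [
        p * norm * math.exp(-sum((xi - mi) ** 2 for xi, mi in zip(x, mean)) / (2.0 * variance))
        for mean, p in zip(means, priors)
    ]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical three-class problem in two dimensions (illustrative values only).
means = [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
priors = [0.5, 0.3, 0.2]
variance = 1.5
x = [1.0, 0.5]

posterior_net = softmax(radial_scores(x, means, [math.log(p) for p in priors], variance))
posterior_bayes = gaussian_posterior_direct(x, means, priors, variance)
```

Under these assumptions the two posteriors agree, because the Gaussian normalising constant is the same for every class and cancels in the softmax. In a trained network the means and biases would be adapted by probability scoring rather than set from within-class statistics.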

Abstract

We are concerned with feed-forward non-linear networks (multi-layer perceptrons, or MLPs) with multiple outputs. We wish to treat the outputs of the network as probabilities of alternatives (e.g. pattern classes), conditioned on the inputs. We look for appropriate output non-linearities and for appropriate criteria for adaptation of the parameters of the network (e.g. weights). We explain two modifications: probability scoring, which is an alternative to squared error minimisation, and a normalised exponential (softmax) multi-input generalisation of the logistic non-linearity. The two modifications together result in quite simple arithmetic, and hardware implementation is not difficult either. The use of radial units (squared distance instead of dot product) immediately before the softmax output stage produces a network which computes posterior distributions over class labels based on an assumption of Gaussian within-class distributions. However, the training, which uses cross-class information, can result in better performance at class discrimination than the usual within-class training method, unless the within-class distribution assumptions are actually correct.
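The "quite simple arithmetic" that results from combining the two modifications can be illustrated with a short sketch. It assumes the probability score is written as the negative log of the probability assigned to the correct class (a sign convention chosen here, not taken from the abstract); with a softmax output, the derivative of that score with respect to each pre-softmax input is then the output minus the one-hot target. The function names and example values are hypothetical.

```python
import math

def softmax(v):
    """Normalised exponential over a list of pre-softmax inputs."""
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in v]
    total = sum(exps)
    return [e / total for e in exps]

def log_score(v, target):
    """Probability score: negative log probability assigned to the correct class."""
    return -math.log(softmax(v)[target])

def log_score_gradient(v, target):
    """Analytic gradient with respect to the pre-softmax inputs: output minus target."""
    q = softmax(v)
    return [qj - (1.0 if j == target else 0.0) for j, qj in enumerate(q)]

def numerical_gradient(v, target, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    grad = []
    for j in range(len(v)):
        up = list(v)
        up[j] += eps
        down = list(v)
        down[j] -= eps
        grad.append((log_score(up, target) - log_score(down, target)) / (2.0 * eps))
    return grad

# Hypothetical pre-softmax inputs for a three-class problem; class 2 is correct.
v = [1.0, -0.5, 2.0]
target = 2

analytic = log_score_gradient(v, target)
numeric = numerical_gradient(v, target)
```

The analytic and finite-difference gradients agree to within numerical error, and the gradient components sum to zero because the softmax outputs and the one-hot target each sum to one.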
