An active field over the past several decades in computer vision and machine learning has been pattern classification [271,295,711]. The general problem involves using a set of data to perform classifications. For example, in computer vision, the data correspond to information extracted from an image. These indicate observed features of an object that are used by a vision system to try to classify the object (e.g., ``I am looking at a bowl of Vietnamese noodle soup'').
The presentation here represents a highly idealized version of pattern classification. We will assume that all of the appropriate model details, including the required probability distributions, are available. In some contexts, these can be obtained by gathering statistics over large data sets. In many applications, however, such data are expensive to obtain or inaccessible, and classification techniques must be developed in the absence of good information. Some problems are even unsupervised, which means that the set of possible classes must also be discovered automatically. Due to issues such as these, pattern classification remains a challenging research field.
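The general model is that nature first determines the class, then observations are obtained regarding the class, and finally the robot action attempts to guess the correct class based on the observations. The problem fits under Formulation 9.5. Let $\Theta$ denote a finite set of classes. Since the robot must guess the class, $U = \Theta$. A simple cost function is defined to measure the mismatch between $u$ and $\theta$:
$$L(u,\theta) = \begin{cases} 0 & \text{if } u = \theta \text{ (correct classification)} \\ 1 & \text{if } u \neq \theta \text{ (incorrect classification).} \end{cases}$$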
The next part of the formulation considers information that is used to make the classification decision. Let $Y$ denote a feature space, in which each $y \in Y$ is called a feature or feature vector (often $y \in \mathbb{R}^n$). The feature in this context is just an observation, as given in Formulation 9.5. The best classifier or classification rule is a strategy $\pi : Y \rightarrow U$ that provides the smallest classification error in the worst case or expected case, depending on the model.
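To make the notion of a best classification rule concrete, the following is a minimal sketch for a finite feature space, assuming a cost of 0 for a correct guess and 1 for an incorrect one. The class names, feature names, and the tables `prior`, `likelihood`, and `possible` are hypothetical values chosen only for illustration; they are not taken from the text. The expected-case rule picks, for each feature, the class that minimizes expected cost; the worst-case rule considers only which classes could possibly have produced the feature.

```python
def zero_one_cost(u, theta):
    """Cost of guessing class u when nature chose class theta."""
    return 0 if u == theta else 1


def expected_case_rule(classes, features, prior, likelihood):
    """Map each feature y to the class with the smallest expected cost.

    prior[theta] is the probability of class theta and likelihood[theta][y]
    is the probability of observing y given theta (both assumed known).
    """
    rule = {}
    for y in features:
        def expected_cost(u):
            # Unnormalized expected cost of guessing u after observing y.
            return sum(zero_one_cost(u, t) * likelihood[t][y] * prior[t]
                       for t in classes)
        rule[y] = min(classes, key=expected_cost)
    return rule


def worst_case_rule(classes, features, possible):
    """Map each feature y to a class with the smallest worst-case cost.

    possible[theta] is the set of features that class theta can produce.
    Ties are broken by the order of `classes`.
    """
    rule = {}
    for y in features:
        consistent = [t for t in classes if y in possible[t]]
        def worst_cost(u):
            return max((zero_one_cost(u, t) for t in consistent), default=0)
        rule[y] = min(classes, key=worst_cost)
    return rule


def error_probability(rule, classes, features, prior, likelihood):
    """Probability that the rule guesses the wrong class."""
    return sum(zero_one_cost(rule[y], t) * likelihood[t][y] * prior[t]
               for y in features for t in classes)


# Hypothetical two-class, two-feature example.
classes = ["soup", "salad"]
features = ["steam", "no_steam"]
prior = {"soup": 0.3, "salad": 0.7}
likelihood = {
    "soup": {"steam": 0.9, "no_steam": 0.1},
    "salad": {"steam": 0.2, "no_steam": 0.8},
}
possible = {"soup": {"steam", "no_steam"}, "salad": {"no_steam"}}

rule = expected_case_rule(classes, features, prior, likelihood)
worst_rule = worst_case_rule(classes, features, possible)
p_error = error_probability(rule, classes, features, prior, likelihood)
```

With these assumed numbers, the expected-case rule guesses "soup" when steam is observed and "salad" otherwise. In the worst-case model, observing "no_steam" is consistent with both classes, so every guess can be wrong and the rule can only break the tie arbitrarily.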