Possibilities for Solving the Problem of Dimensionality in Classification
June 5, 2014
VisionDummy's overview of why indexing is hard, titled The Curse of Dimensionality in Classification, provides a surprisingly readable explanation using the example of sorting images of cats and dogs. The first step is to create features that assign values to the images (such as color or texture). From there, the article states,
“We now have 5 features that, in combination, could possibly be used by a classification algorithm to distinguish cats from dogs. To obtain an even more accurate classification, we could add more features, based on color or texture histograms, statistical moments, etc. Maybe we can obtain a perfect classification by carefully defining a few hundred of these features? The answer to this question might sound a bit counter-intuitive: no we can not!.”
Instead, simply adding more and more features (increasing the dimensionality) degrades the performance of the classifier. A graph shows performance descending sharply past a point labeled the "optimal number of features." At that point there exists a three-dimensional feature space in which the classes (still dogs and cats) can be fully separated. When more features are added past the optimal number, overfitting occurs and finding a general decision boundary without exceptions becomes difficult. The article goes on to suggest remedies such as cross-validation and feature extraction.
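The effect is easy to reproduce numerically. The sketch below is not from the VisionDummy article; it is a minimal, hypothetical illustration in which only the first feature actually separates the two classes and every additional feature is pure noise. A 1-nearest-neighbor classifier trained on a small sample tends to look worse on held-out data as the noise dimensions pile up:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_per_class, d):
    # Feature 0 carries the class signal (means at -1 and +1);
    # features 1..d-1 are pure noise for both classes.
    signal0 = rng.normal(-1.0, 1.0, size=(n_per_class, 1))
    signal1 = rng.normal(+1.0, 1.0, size=(n_per_class, 1))
    noise0 = rng.normal(0.0, 1.0, size=(n_per_class, d - 1))
    noise1 = rng.normal(0.0, 1.0, size=(n_per_class, d - 1))
    X = np.vstack([np.hstack([signal0, noise0]),
                   np.hstack([signal1, noise1])])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

def nn_accuracy(X_train, y_train, X_test, y_test):
    # 1-nearest-neighbor classification by Euclidean distance.
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    preds = y_train[np.argmin(dists, axis=1)]
    return float(np.mean(preds == y_test))

for d in (1, 5, 50, 500):
    X_tr, y_tr = make_data(10, d)    # tiny training set, as in the article's setup
    X_te, y_te = make_data(200, d)   # large held-out test set
    print(f"d={d:4d}  test accuracy={nn_accuracy(X_tr, y_tr, X_te, y_te):.2f}")
```

Training accuracy stays perfect throughout (each training point is its own nearest neighbor), which is exactly the overfitting pattern the article's graph describes: the model memorizes the sample while its generalization decays toward chance.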
Chelsea Kerwin, June 05, 2014
Sponsored by ArnoldIT.com, developer of Augmentext