Possibilities for Solving the Problem of Dimensionality in Classification

June 5, 2014

VisionDummy's overview of why indexing is hard is titled The Curse of Dimensionality in Classification. The article gives a surprisingly readable explanation, using the example of sorting images of cats and dogs. The first step is to create features that assign numeric values to the images (such as measures of color or texture). From there, the article states,

“We now have 5 features that, in combination, could possibly be used by a classification algorithm to distinguish cats from dogs. To obtain an even more accurate classification, we could add more features, based on color or texture histograms, statistical moments, etc. Maybe we can obtain a perfect classification by carefully defining a few hundred of these features? The answer to this question might sound a bit counter-intuitive: no we can not!.”

Simply adding more and more features, that is, increasing dimensionality, instead degrades the classifier's performance once a certain point is passed. The article provides a graph in which performance drops sharply after a point labeled the “optimal number of features.” In the article's example, a three-dimensional feature space makes it possible to fully separate the classes (still dogs and cats) in the training data. When features are added past the optimal number, overfitting occurs: the classifier learns the exceptions in the training data and generalizes poorly to new images. The article goes on to suggest some remedies, such as cross-validation and feature extraction.
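
The effect can be reproduced with a small synthetic experiment. The sketch below is an illustration under stated assumptions, not code from the VisionDummy article: the data are random numbers rather than cat and dog images, only the first three features carry any class signal (an assumption chosen to mirror the three-feature example), the classifier is a simple nearest-centroid rule, and every name in it is hypothetical. It compares accuracy on the training set, a five-fold cross-validation estimate, and accuracy on held-out data as more features are used.

```python
import random

N_INFORMATIVE = 3            # assumption: only the first 3 features carry signal
DIMS = [1, 2, 3, 10, 50, 500]  # numbers of features to try


def make_data(n_per_class, n_features, rng):
    """Two classes; informative features have mean -1 or +1, the rest are noise."""
    X, y = [], []
    for label in (0, 1):
        mean = 1.0 if label == 1 else -1.0
        for _ in range(n_per_class):
            X.append([rng.gauss(mean if j < N_INFORMATIVE else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y


def fit_centroids(X, y, d):
    """Per-class mean vector, using only the first d features."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    return centroids


def accuracy(centroids, X, y, d):
    """Fraction of samples whose nearest centroid matches their label."""
    correct = 0
    for x, t in zip(X, y):
        dist = {lab: sum((x[j] - c[j]) ** 2 for j in range(d))
                for lab, c in centroids.items()}
        correct += (min(dist, key=dist.get) == t)
    return correct / len(y)


def cross_val_accuracy(X, y, d, k=5):
    """k-fold cross-validation; strided folds keep both classes in every fold."""
    idx = range(len(y))
    scores = []
    for i in range(k):
        held = set(idx[i::k])
        train = [j for j in idx if j not in held]
        test = sorted(held)
        cents = fit_centroids([X[j] for j in train], [y[j] for j in train], d)
        scores.append(accuracy(cents, [X[j] for j in test], [y[j] for j in test], d))
    return sum(scores) / k


rng = random.Random(0)
X_train, y_train = make_data(10, max(DIMS), rng)   # small training set
X_test, y_test = make_data(250, max(DIMS), rng)    # larger held-out set

results = {}
for d in DIMS:
    cents = fit_centroids(X_train, y_train, d)
    results[d] = {
        "train": accuracy(cents, X_train, y_train, d),
        "cv": cross_val_accuracy(X_train, y_train, d),
        "test": accuracy(cents, X_test, y_test, d),
    }
    print(d, results[d])
```

With a setup like this, one would expect held-out accuracy to be best around the three informative features and to fall as noise features are added, while training accuracy stays high. That gap is the overfitting the article describes. The cross-validation column uses only the training data, which is why it can serve as a guide for choosing the number of features; with so few samples it is a noisy estimate, and exact figures depend on the random seed.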

Chelsea Kerwin, June 05, 2014


Written by Stephen E. Arnold · Filed Under News, Technology


