Faster Text Classification from Facebook, the Social Outfit

August 29, 2016

I read “Faster, Better Text Classification.” Facebook’s artificial intelligence team has made available some of its whizzy code. The software may be a bit of a challenge to the vendors of proprietary text classification software, but Facebook wants to help everyone. Think of the billion-plus Facebook users who need to train an artificially intelligent system with one billion words in 10 minutes. You may want to try this on your Chromebook, gentle reader.

I learned:

Automatic text processing forms a key part of the day-to-day interaction with your computer; it’s a critical component of everything from web search and content ranking to spam filtering, and when it works well, it’s completely invisible to you. With the growing amount of online data, there is a need for more flexible tools to better understand the content of very large datasets, in order to provide more accurate classification results. To address this need, the Facebook AI Research (FAIR) lab is open-sourcing fastText, a library designed to help build scalable solutions for text representation and classification.

What does the Facebook text classification code deliver as open sourciness? I learned:

FastText combines some of the most successful concepts introduced by the natural language processing and machine learning communities in the last few decades. These include representing sentences with bag of words and bag of n-grams, as well as using subword information, and sharing information across classes through a hidden representation. We also employ a hierarchical softmax that takes advantage of the unbalanced distribution of the classes to speed up computation. These different concepts are being used for two different tasks: efficient text classification and learning word vector representations.
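For the curious, here is a minimal sketch of two of the ideas named in that passage: bags of word n-grams and subword (character n-gram) features. It is plain Python, not fastText's code or API. The function names, the `<` and `>` word boundary markers, the n-gram sizes, and the "word"/"sub" labels are my own assumptions for illustration. The hidden representation and the hierarchical softmax are left out.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(word, min_n=3, max_n=4):
    """Return character n-grams of a word padded with boundary markers.

    The "<" and ">" markers and the default sizes are illustrative choices.
    """
    padded = f"<{word}>"
    return [
        padded[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(padded) - n + 1)
    ]


def bag_of_features(text, max_word_n=2):
    """Count word n-grams (up to max_word_n) and subword n-grams in a text.

    Keys are (kind, value) pairs so a subword such as "spam" never collides
    with the whole word "spam".
    """
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, max_word_n + 1):
        bag.update(("word", gram) for gram in word_ngrams(tokens, n))
    for token in tokens:
        bag.update(("sub", gram) for gram in char_ngrams(token))
    return bag


if __name__ == "__main__":
    features = bag_of_features("Spam spam filter")
    for key, count in sorted(features.items()):
        print(key, count)
```

Word order inside the text is discarded except for what the n-grams capture, which is why the approach is called a bag. A classifier would then learn weights over these counts.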

The write up details some of the benefits of the code; for example, its multilingual capabilities and its accuracy.

What will other do-gooders like Amazon, Google, and Microsoft do to respond to Facebook’s generosity? My thought is that more text processing software will find its way to open source green pastures.

What will the for-fee vendors peddling proprietary classification systems do? Here’s a short list of ideas I had:

Pivot to become predictive analytics companies and seek new rounds of financing
Pretend that open source options are available but not good enough for real-world tasks
Generate white papers and commission mid-tier consulting firms to extol the virtues of their innovative, unique, high-speed, smart software
Look for another line of work in search engine optimization or direct sales for a tool and die company, or check out Facebook

Stephen E Arnold, August 29, 2016


