Faster Text Classification from Facebook, the Social Outfit
August 29, 2016
I read “Faster, Better Text Classification.” Facebook’s artificial intelligence team has made available some of its whizzy code. The software may be a bit of a challenge to the vendors of proprietary text classification software, but Facebook wants to help everyone. Think of the billion-plus Facebook users who need to train an artificially intelligent system with one billion words in 10 minutes. You may want to try this on your Chromebook, gentle reader.
I learned:
Automatic text processing forms a key part of the day-to-day interaction with your computer; it’s a critical component of everything from web search and content ranking to spam filtering, and when it works well, it’s completely invisible to you. With the growing amount of online data, there is a need for more flexible tools to better understand the content of very large datasets, in order to provide more accurate classification results. To address this need, the Facebook AI Research (FAIR) lab is open-sourcing fastText, a library designed to help build scalable solutions for text representation and classification.
What does the Facebook text classification code deliver as open sourciness? I learned:
FastText combines some of the most successful concepts introduced by the natural language processing and machine learning communities in the last few decades. These include representing sentences with bag of words and bag of n-grams, as well as using subword information, and sharing information across classes through a hidden representation. We also employ a hierarchical softmax that takes advantage of the unbalanced distribution of the classes to speed up computation. These different concepts are being used for two different tasks: efficient text classification and learning word vector representations.
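For the gentle reader who wants to see what "bag of n-grams" and "subword information" mean in practice, here is a minimal Python sketch. It is not Facebook's code and does not call the fastText library. The function names (`word_ngrams`, `char_ngrams`, `featurize`) are hypothetical, and the `<` and `>` boundary markers and the 3-to-6 character range are assumptions borrowed from the way the fastText authors describe subword units in their published work.

```python
def word_ngrams(tokens, n):
    """Return contiguous word n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(word, min_n=3, max_n=6):
    """Return character n-grams of a word wrapped in boundary markers.

    The '<' and '>' markers and the default 3..6 range are assumptions
    for illustration, not settings taken from the article.
    """
    wrapped = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def featurize(text, max_word_n=2):
    """Build a bag of word n-grams (unigrams up to max_word_n) for a sentence."""
    tokens = text.lower().split()
    features = []
    for n in range(1, max_word_n + 1):
        features.extend(word_ngrams(tokens, n))
    return features


if __name__ == "__main__":
    print(featurize("Faster better text classification"))
    print(char_ngrams("where", 3, 3))
```

A classifier would then map each feature to a vector and average the vectors before predicting a class. That step is left out here because the article does not spell out how it is done.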
The write-up details some of the benefits of the code; for example, its multilingual capabilities and its accuracy.
What will other do-gooders like Amazon, Google, and Microsoft do to respond to Facebook’s generosity? My thought is that more text processing software will find its way to open source green pastures.
What will the for-fee vendors peddling proprietary classification systems do? Here’s a short list of ideas I had:
- Pivot to become predictive analytics companies and seek new rounds of financing
- Pretend that open source options are available but not good enough for real world tasks
- Generate white papers and commission mid-tier consulting firms to extol the virtues of their innovative, unique, high-speed, smart software
- Look for another line of work in search engine optimization, direct sales for a tool-and-die company, or check out Facebook
Stephen E Arnold, August 29, 2016