Machine Learning Going Through a Phase

May 10, 2017

People think that machine learning is an algorithmic magic wand. Someone writes the code, pops in the data, and the computer learns how to do a task. It is not that easy. The Bitext blog explains that machine learning needs assistance in the post, “How Phrase Structure Can Help Machine Learning For Text Analysis.”

Machine learning techniques used for text analysis are not that accurate. The post explains that instead of learning the meaning of words in a sentence according to its structure, all the words are tossed into a bag and analyzed individually. The context and meaning are lost. A real world example is Chinese and Japanese, which are written with logographic characters (hanzi in Chinese, kanji in Japanese). In both languages, a character’s meaning can change based on the context. The result is that both languages have a lot of puns and are a nightmare for text analytics.

As you can imagine, there are problems in Germanic and Latin-based languages too:

Ignoring the structure of a sentence can lead to various types of analysis problems. The most common one is incorrectly assigning similarity to two unrelated phrases such as “Social Security in the Media” and “Security in Social Media” just because they use the same words (although with a different structure).

Besides, this approach has stronger effects for certain types of “special” words like “not” or “if”. In a sentence like “I would recommend this phone if the screen was bigger”, we don’t have a recommendation for the phone, but this could be the output of many text analysis tools, given that we have the words “recommendation” and “phone”, and given that the connection between “if” and “recommend” is not detected.
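To make the two problems in the quoted passage concrete, here is a minimal Python sketch. It is an illustration only, not Bitext's method: the function names are hypothetical, adjacent word pairs (bigrams) stand in as a crude proxy for structure rather than real phrase structure analysis, and the keyword rule for spotting recommendations is an assumption invented for the example.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Count each word, discarding word order entirely."""
    return Counter(tokens(text))


def bag_of_bigrams(text):
    """Count adjacent word pairs, which keeps a little local order."""
    words = tokens(text)
    return Counter(zip(words, words[1:]))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def naive_recommends(text):
    """Hypothetical keyword rule: any 'recommend...' word counts as a recommendation."""
    return any(word.startswith("recommend") for word in tokens(text))


first = "Social Security in the Media"
second = "Security in Social Media"

# Same words, different structure: the word bags overlap almost completely.
word_sim = cosine(bag_of_words(first), bag_of_words(second))
# Word pairs overlap far less, because only "security in" appears in both.
bigram_sim = cosine(bag_of_bigrams(first), bag_of_bigrams(second))
print(f"bag of words similarity: {word_sim:.2f}")
print(f"bigram similarity:       {bigram_sim:.2f}")

# The keyword rule ignores "if", so a conditional complaint looks like praise.
review = "I would recommend this phone if the screen was bigger"
print("naive rule sees a recommendation:", naive_recommends(review))
```

The word bags score the two unrelated phrases as close to identical, while even the crude bigram comparison pulls them apart. The keyword rule flags the phone review as a recommendation because nothing connects “if” to “recommend.”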

If you rely solely on the “bag of words” approach for text analysis, the problems only get worse. That is why phrase structure is very important for text and sentiment analysis. Bitext incorporates phrase structure and other techniques in its analytics platform, which is used by a large search engine company and another tech company that likes fruit.

Whitney Grace, May 10, 2017

