Natural Language Processing with Ruby

October 18, 2013

For those who know the open-source programming language Ruby, natural language processing (NLP) is a script away. SitePoint shares some basic techniques in “Natural Language Processing with Ruby: N-Grams.” This first piece in a series begins at the beginning; developer Nathan Kleyn writes:

“Natural Language Processing (NLP for short) is the process of processing written dialect with a computer. The processing could be for anything – language modeling, sentiment analysis, question answering, relationship extraction, and much more. In this series, we’re going to look at methods for performing some basic and some more advanced NLP techniques on various forms of input data. One of the most basic techniques in NLP is n-gram analysis, which is what we’ll start with in this article!”

Kleyn explains his subject clearly, with plenty of code examples so we can see what’s going on. He goes into the following: what it means to split strings of characters into n-gram chunks; selecting a good data source (he sends readers to the comprehensive Brown Corpus); writing an n-gram class; extracting sentences from the Corpus; and, finally, n-gram analysis. The post includes links to the source code he uses in the article.
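To give a flavor of the n-gram splitting described above, here is a minimal sketch in Ruby. It is not Kleyn's class: the name `NGramSketch` and its `ngrams` method are made up for illustration, and it assumes word-level n-grams with plain whitespace tokenization.

```ruby
# Hypothetical helper (not the class from Kleyn's article):
# splits a string into word-level n-grams.
class NGramSketch
  def initialize(text)
    # Assumption: whitespace tokenization is good enough for a demo.
    @words = text.split
  end

  # Returns every run of n consecutive words, as an array of arrays.
  # Text shorter than n words yields an empty array.
  def ngrams(n)
    @words.each_cons(n).to_a
  end
end

bigrams = NGramSketch.new("the quick brown fox").ngrams(2)
bigrams.each { |pair| puts pair.join(" ") }
```

Ruby's built-in `Enumerable#each_cons` does the sliding-window work here, which is why the sketch stays so short.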

In the next installment, Kleyn intends to explore Markov chaining, which uses probability to approximate language and generate “pseudo-random” text. This series may be just the thing for folks getting into, or considering, the natural language processing field.
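As a rough idea of what Markov chaining involves, the sketch below builds a first-order, word-level chain and walks it to produce text. It is an illustration under stated assumptions, not code from Kleyn's series: the function names `build_chain` and `generate` are hypothetical, and real implementations typically use longer contexts and a much larger corpus.

```ruby
# Hypothetical sketch: map each word to the list of words seen right after it.
# Repeated followers stay in the list, so sampling reflects observed frequency.
def build_chain(text)
  chain = Hash.new { |hash, key| hash[key] = [] }
  text.split.each_cons(2) { |current, following| chain[current] << following }
  chain
end

# Walk the chain from a start word, picking each next word at random,
# until max_words is reached or the current word has no known follower.
def generate(chain, start, max_words, rng = Random.new)
  words = [start]
  while words.length < max_words
    candidates = chain.fetch(words.last, [])
    break if candidates.empty?
    words << candidates.sample(random: rng)
  end
  words.join(" ")
end

chain = build_chain("the cat sat on the mat and the cat ran")
puts generate(chain, "the", 8)
```

Because the next word is sampled from followers actually observed in the input, the output tends to look locally plausible while being random overall.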

Cynthia Murrell, October 18, 2013

