Meaning in Casual Wording

March 3, 2011

I love science.  Paired with my increasing passion for language and grammar, a sweeter cocktail could hardly be imagined.  “Do Casual Words Betray Warlike Intent?” was a fascinating read.

At the recent American Association for the Advancement of Science (AAAS) meeting, James Pennebaker, a University of Texas at Austin psychologist, spoke about the study that he and assorted colleagues, along with the Department of Homeland Security, have been conducting.  The research focuses on four similar Islamic groups and the relationship between the speech they employ and the actions that follow.  The collective hope is that the study’s findings can be used to forecast aggressive activity.

Isolating pronouns, determiners, adjectives and prepositions, the group mines them for what Pennebaker calls “linguistic shifts”.  To date they have determined that, of the four, the two groups that have committed acts of violence telegraphed said destructiveness with the use of “more personal pronouns, and words with social meaning or which convey positive or negative emotions.”  Aside from differentiating between various stylistic elements of expression, Pennebaker has also scrutinized statements made by warmongers from our past, including George W. Bush, with interesting results.
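For the curious, here is a rough sketch of what counting one such signal might look like in Python.  This is not Pennebaker’s method; the word list, the function names `pronoun_rate` and `pronoun_shift`, and the idea of comparing two texts by simple subtraction are all my own assumptions, used only to illustrate measuring a “shift” in personal pronoun use.

```python
import re

# Hypothetical, deliberately tiny word list for illustration only;
# it is not the category set used in the study.
PERSONAL_PRONOUNS = {
    "i", "me", "my", "we", "us", "our",
    "you", "your", "he", "she", "they", "them", "their",
}


def pronoun_rate(text):
    """Return the share of word tokens in `text` that are personal pronouns."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in PERSONAL_PRONOUNS)
    return hits / len(tokens)


def pronoun_shift(earlier, later):
    """Return the change in pronoun rate from an earlier text to a later one."""
    return pronoun_rate(later) - pronoun_rate(earlier)
```

A positive value from `pronoun_shift` would mean the later text leans more heavily on personal pronouns than the earlier one, which is the kind of change the researchers describe watching for.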

Skepticism has always fueled scientific endeavors, and we must continue to ask questions, especially those that breed discomfort.  This science deals with a very grey area and Pennebaker himself labels the results as only “modest probabilistic predictions”.  There is no question that this information must be used responsibly, but my aforementioned appreciation for the field keeps me from seeing this as a negative.

If one can discern an opponent’s intent in a fight or a game of cards by careful observation, why is it so strange to think the same could be done from listening to what they say?

Sarah Rogers, March 3, 2011

Freebie

NLP Gets a Full Monty

February 28, 2011

Natural Language Processing (NLP) is experiencing huge growth.  From handwriting recognition to foreign language translation to predictive text on your handheld, NLP is used in many different ways to help our technology recognize what we mean when we simply speak or write English (or whatever language you happen to use in life).  Natural Language Processing with Python is a book available as a PDF that gives a useful introduction to NLP built on the Python programming language, which has a shallow learning curve.

According to its own introduction:

“This book provides a highly accessible introduction to the field of NLP. It can be used for individual study or as the textbook for a course on natural language processing or computational linguistics, or as a supplement to courses in artificial intelligence, text mining, or corpus linguistics.”

The book is geared toward beginning and intermediate levels, so even if you are new, don’t be intimidated.  It is full of exercises, and the authors have used entertaining examples to lighten what might otherwise be a heavy subject.  Both the book and the Natural Language Toolkit, with its open source Python modules, are available for free download.  Whether your background is arts and humanities or science and engineering, this is a recommended read.

Alice Wasielewski, February 28, 2011

What Is a Fresh Start, Alex?

February 26, 2011

With the mounting anticipation of the man versus machine episode of Jeopardy! set to air on February 14, 15, and 16, 2011, it is hard to ignore the buzz over Watson.  If you’ve been locked in a closet for the last month, Watson is IBM’s supercomputing experiment in AI.  Recent articles in The Pittsburgh Post-Gazette and USA Today can bring you up to speed if necessary.  In previous weeks we covered Watson’s win in the game’s practice round.  But trivialities aside, what does Watson actually mean for IBM?

Well, Watson won.

The head gander in Harrod’s Creek maintains that IBM is pulling a PR stunt considering the company’s long history of work in the search field without ever impacting the market in a significant way.  Omnifind 9.1 for Lucene has not been met with much fanfare, at least not here at Beyond Search, in large part because of the convoluted web of features (or corresponding fixes) and lack of support.

Yes, it is all happening on a game show, so the possibility of rigging exists.  But how would introducing a fraud over national airwaves benefit IBM or what appears to be its quest to remind the public of its ingenuity?  Watson performs so well due to refined natural language processing (NLP) and QA technology, two facets of search that are likely to be important players in the future.  So if all goes according to plan, rather than typing a query into a search engine and waiting for a list of results out of which you must dig your own answer, the quandary will automatically be resolved.  That is the claim of Watson’s power: it accurately plucks answers out of information stores, and the range of applications is huge.  This could be the next step in search and IBM could once again be a great innovator in the foreground.  Even though IBM processors are in nearly every gadget on the market, it’s been a while since IBM has had any real recognition.  That is why it does not surprise me they chose to roll out their new tech on a prime-time television show, making advanced technology more palatable and memorable to the average consumer.

Perhaps I am being naïve or am too huge a fan of the science fiction novel, but I can’t help but be in Watson’s corner.  Hey, Arthur C. Clarke got satellite communication right; maybe HAL 9000 is on its way.

Now Watson is headed to health care. Stay tuned.

Sarah Rogers, February 26, 2011

Freebie

Invention Machine Embraces NLP

February 24, 2011

Natural Language Processing (NLP) is not a new science, but it has yet to be perfected. A wealth of corporate and common knowledge lies in unstructured text documents, but most NLP search retrieves piles of seemingly disconnected documents. Users need precise, relevant results, but NLP has yet to get there. Goldfire claims to know “How Natural Language Processing Can Solve the Knowledge Retrieval Problem.” The write-up notes:

“Goldfire’s Natural Language query interface enables the user to put a question in a free text format, which would be the same format as if the question were given to another person. And, once relevant knowledge has been retrieved, Goldfire presents the results in a way that makes their meaning readily apparent.”

Claiming to have found a way for computer-aided knowledge extraction to overcome the natural language obstacles of semantics, syntax, and context, Goldfire marries high-level concept extraction and problem-solving capabilities. Despite such improvements, the problem of how we structure and retrieve information is unlikely to be solved anytime soon.

Emily Rae Aldridge, February 24, 2011

SAP Embraces InQuira

February 21, 2011

SAP makes an interesting search move. TMC News announces “InQuira Platform Endorsed by SAP.” German software provider SAP has endorsed the use of the InQuira Platform with its software:

“According to InQuira it is complementary to SAP software offerings, developed in accordance with SAP development guidelines and provides additional choices and flexibility for businesses running SAP applications.”

The two companies also plan to share technology and product planning.

Cynthia Murrell, February 21, 2011

Freebie

Watson Reddited

February 18, 2011

Short honk: If you are “into” the brave new world of IBM as the leader in search, you will want to navigate to this Reddit thread. Enjoy.

Stephen E Arnold, February 18, 2011

For You Watson Fans: Better than SkyNet

February 14, 2011

Short honk: Want to read the upside of IBM’s most recent push into information processing? Navigate to “Why IBM’s Watson Is NOT Skynet (It’s Better).” Best quote for the true blue believers:

“But it is impressive. From Watson’s Jeopardy play so far, it’s apparent that Watson does two things extremely well: 1) looking at data from a host of sources, finding the answers it wants extremely fast, and 2) understanding the nuances of natural human speech and writing. The other things it does—learning as it goes along and pressing a buzzer quickly—are nice, too, but aren’t really that novel.”

Accuracy? About 85 percent. That’s an A in today’s world of fluid standards. Tattoo Charley in Louisville can slap an IBM logo on your guns while you wait.

I hear, “I’ll be back” now.

Stephen E Arnold, February 14, 2011

Freebie.

NLP from Southampton

February 3, 2011

IBM’s Jeopardy marketing play has prompted other companies to say, “We do that too.”

The Engineer announced that “Intelligent Machine Brain Understands Natural Language.” This machine brain is called the Sysbrain, and Prof. Sandor Veres of Southampton University headed its creation. The Sysbrain has huge marketability and can be used in a wide range of fields.

“‘Essentially we’ve developed a system to give some intelligence to machines, not as human intelligence but specific to particular tasks such as spatiotemporal awareness and avoiding dangerous situations,’ he said. ‘Think about what a spacecraft would need in a mission through the asteroid belt.’”

The natural language programming (NLP) is the Sysbrain’s most remarkable feature. It can understand everyday English and technical documents and is geared towards professionals not fluent in programming languages. I’m sure everyone is thinking about HAL from 2001: A Space Odyssey after reading this. All I can say is, “Daisy, Daisy, Daisy, give me your answer true!” and “Take a stress pill, Dave.” That big red glowing light was impressive but it did dim, didn’t it?

Whitney Grace, February 3, 2011

Freebie

Juru, Watson, I Say, Juru!

February 1, 2011

Quite a heated discussion at lunch today. One of the goslings was raving about Watson. The Jeopardy demo convinced the engineer that IBM had the next big thing in search. A person can ask a question and right away get the answer. Wow. I thought that type of computer system only worked under carefully controlled conditions, in demos, or in motion pictures.

That’s why the goslings were agitated when I said, “It is TV. TV does almost anything—well, anything—for money.” I pointed out that the game shows 21 and the $64,000 Question took some liberties to boost ratings. Have TV times changed that much? I said, “I don’t think so.”

I supported my argument by mentioning Juru. Do you remember that gem from IBM? Here’s what my Overflight system spit out.

Juru is / was a full text search “library” that would make short work of “small and mid-sized corpuses.” Of course, “small” and “mid-sized” are rarely defined either by IBM or other search researchers. The idea was that Java made it easy to run Juru on any platform. Of course, today, I don’t think Juru would work in the Android or iOS environment, but some day maybe.

Juru asserted that the system would:

  • Support different document types
  • Make use of links just like our ever-tweakable PageRank-type systems
  • Produce on-the-fly summaries of documents
  • Perform clustering
  • Offer nifty ways to keep the indexes small and, therefore, zippy.

You can get some info at this link. There is some additional color here.

I reminded the goslings that IBM rolls out search solutions as part of its global marketing efforts. More to the point, I asked the goslings which vendors’ search systems IBM resells. I did not hear the magic words Autonomy or Endeca. IBM once loved Fast ESP.

If you want search from IBM, what do you get today? A version of the open source search solution Lucene. Why? It works pretty well. Juru, Watson, WebFountain, et al? Well, make up your own mind with some head-to-head testing. I won the argument and still had to pay for lunch. Honk.

Stephen E Arnold, February 1, 2011

Freebie

Linguamatics Says, Keep Experimenting

January 24, 2011

Linguamatics, which produces natural language processing technology, has posted a blog entry titled “Trend Analysis- Can a Prediction be Made?” The answer depends on the mathematics and the definition of a “prediction.”

For its example, the blog compares the popularity of a couple of politicians during their debates, as recorded through Twitter, to their election results. Using their I2E text mining software to analyze the Tweets, Linguamatics found a strong correlation.

However, the blog is missing the details needed to definitively answer its own question. How did the company use its data to calculate probability? Furthermore, what other types of predictions could this process make, and how?

The company claims that:

“This case study shows how the power of using NLP with the I2E software platform can be used to gain quite powerful insights on what is likely to happen based on opinions expressed by people using social media platforms.”

I’m afraid I’d have to see more results before I can agree with that opinion.

To read more about the company on their website, go to www.linguamatics.com.

Cynthia Murrell, January 22, 2011

Freebie
