Language Computer: Why Now for Swingly and Extractiv

September 2, 2010

I did some fooling around on the Language Computer Corp. Web site. The PR blitz is on for Swingly, the question-answering service that was featured in blogs and on the quite remarkable podcast hosted by Jason Calacanis. I listened to the Swingly segment but exited once that interview concluded. Instead of wallowing in the “ask a question, get an answer” angle, just like Ask.com, Yahoo Answers, Mahalo, Quora, Aardvark, and others, I thought I would navigate to the Overflight archive and check out the Web site. The first thing I noted was that a click on the WebFerret button, now renamed “Ferret,” returned a 404 error. Okay. So much for that. I then punched the entity recognition demo, which I had also examined a while ago. More luck there, but I had to dismiss an “invalid security certificate” warning, which I suppose would be a deal breaker for the Steve Gibson types visiting the Language Computer Web site.

I uploaded one of my for-fee columns to CiceroLite ML. The system accepted the file, stripped out the Word craziness, and invited me to process it. I punched the “process” button. The system highlighted the different entities. What’s important is that Language Computer has, for at least eight or nine years, performed at or near the top of the heap on various US government tests of content processing systems. Here’s what the marked-up text looked like. Each color represents a different type of entity; for example, red is an organization, blue a person, and so on.

[Image: LCC CiceroLite entity extraction markup, each entity type highlighted in a different color]

In operational use, the tagged entities are written to a separate file, not embedded in the document. For demo purposes, though, the inline highlighting makes it easy to see that Language Computer did a pretty good job. Entity extraction is a big deal for some types of content activities. I find a tally of how many times an entity appears in a document quite useful. The big chunk of work, in my opinion, is mapping entities to synonyms and then to people and places. It’s great to know the entities in a document, but it is even better to have these items hooked together. I quite like the ability to click and see the entities in the source document. A minimal sketch of this standoff pattern appears below.
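
To make the standoff idea and the entity tally concrete, here is a toy Python sketch. The gazetteer, synonym map, and record format are invented for illustration; a system like CiceroLite identifies entities statistically, not from a lookup table, and has its own output format.

```python
# Toy sketch of standoff entity annotation: mentions are recorded as
# (type, start, end, surface) records in a separate structure (or file)
# rather than embedded in the source document. The gazetteer and
# synonym map below are hypothetical.
from collections import Counter

DOC = ("Stephen Arnold wrote about Language Computer Corporation. "
       "LCC, based in Richardson, was founded in 1995.")

# Hypothetical gazetteer: surface string -> entity type.
GAZETTEER = {
    "Stephen Arnold": "PERSON",
    "Language Computer Corporation": "ORGANIZATION",
    "LCC": "ORGANIZATION",
    "Richardson": "LOCATION",
}

# Hypothetical synonym map: variant -> canonical name. This is the
# "hooking together" step: resolving mentions to one entity.
CANONICAL = {"LCC": "Language Computer Corporation"}

def extract(doc):
    """Return standoff annotations as (type, start, end, surface)."""
    spans = []
    for surface, etype in GAZETTEER.items():
        start = doc.find(surface)
        while start != -1:
            spans.append((etype, start, start + len(surface), surface))
            start = doc.find(surface, start + 1)
    return sorted(spans, key=lambda span: span[1])

annotations = extract(DOC)

# The tally: resolve each mention to its canonical form, then count.
tally = Counter(CANONICAL.get(surface, surface)
                for _, _, _, surface in annotations)

for record in annotations:
    print(record)           # e.g. ('PERSON', 0, 14, 'Stephen Arnold')
print(tally.most_common())  # counts per canonical entity
```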

Language Computer Corporation has been around since 1995. It has an excellent reputation and, like other next-generation content processing vendors, has served specialists in quite specific niche markets. I won’t name these, but you can figure out what outfits are interested in:

  • Entity recognition
  • Event time stamping
  • Sentiment tracking
  • Document summarization

The plumbing for these industrial-strength applications is what makes Swingly.com work. Swingly.com is a demo of the Language Computer question-answering function. I am not likely to do much typing or speaking of questions into a search box or device. I type queries, and I shout into a phone, often with considerable enthusiasm. (I hate phones.)

If you want to explore the Language Computer function that turns Web content (heterogeneous and semi-structured) into structured data, navigate to www.extractiv.com. You will need to register. To use the service, you have to create a content job, step through the setup, and then know what the heck you are looking at. The system works.
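
For readers who want to picture the “content job” workflow, here is a hypothetical Python sketch. The host, endpoint paths, parameter names, and JSON shape below are all assumptions made for illustration, not the real Extractiv API; consult the actual documentation after you register.

```python
# Hypothetical sketch of a "content job" workflow against a REST-style
# extraction service such as Extractiv. Everything below the comments
# is assumed, not taken from the real API.
import json
import urllib.request

API = "https://api.example.com/v1"  # placeholder host, not the real one
KEY = "YOUR_API_KEY"

def create_job(urls, entity_types):
    """Submit a batch of URLs and the entity types to pull out."""
    payload = json.dumps({"api_key": KEY,
                          "urls": urls,
                          "extract": entity_types}).encode("utf-8")
    req = urllib.request.Request(API + "/jobs", data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["job_id"]

def fetch_results(job_id):
    """Poll for the structured records the job produced."""
    with urllib.request.urlopen(API + "/jobs/" + job_id + "/results") as resp:
        return json.load(resp)  # e.g. [{"url": ..., "entities": [...]}]

# Usage (requires a real endpoint and key):
# job_id = create_job(["http://www.example.com/page.html"],
#                     ["PERSON", "ORGANIZATION", "LOCATION"])
# print(fetch_results(job_id))
```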

The larger issue to consider is, “Why are companies like Language Computer, Fetch Technologies, JackBe, and others from the niche government markets suddenly bursting into the broader enterprise and consumer sectors?”

The pundits have not tackled this question. Most of the Swingly.com write-ups are content to beat on the Q&A drum. I don’t think question answering is a mass-market service except on devices that allow me to talk. In short, the Web angle is silly. So I am at odds with the azurini. I don’t care too much about English majors and journalists who are experts in search and content processing. Feel free to fall in love. Just brush up on your Shakespeare, because the plumbing in systems like Language Computer’s will mean zero to this crowd.

Back to the implications. I noted five for the purposes of this blog write up:

  1. The salad days of defense largesse are coming to an end. The global financial crisis and problems getting funding to keep the lights on in health care facilities are two good reasons. These outfits have to find a way to generate revenues in today’s fun-filled world.
  2. The understanding that keyword search sucks is pretty widespread. As a result, specialist vendors can make a real contribution where past search failures have illuminated the dark corners of the procurement team’s understanding of search.
  3. Machines are fast enough to crunch the data. In the past, machines cost too much, so the sophisticated methods became the Land of the Government Agencies. Now anyone can have a supercomputer.
  4. Engineers have figured out how to hook different methods together. Instead of the super-proprietary approach, more solutions move content through a processing pipeline. Some of the functions are deep secrets. Others are pretty widely known. The trick was hooking the pieces together to deliver useful output (see the sketch after this list).
  5. The content volume is making less sophisticated approaches look more and more irrelevant. A good case in point is a Google query. Who cares about a laundry list? Answer: I don’t. If you do, steer clear of me.
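
Here is the sketch promised in point 4: a minimal Python illustration of the pipeline idea, with trivial stand-in stages. The point is the composition pattern, not the deliberately naive logic inside each stage; real systems plug proprietary components behind the same interface.

```python
# Minimal sketch of a content processing pipeline: each stage is an
# independent function that accepts a document dict and returns it
# enriched. The stages here are toy stand-ins.

def clean(doc):
    doc["text"] = doc["raw"].strip()
    return doc

def tokenize(doc):
    doc["tokens"] = doc["text"].split()
    return doc

def tag_entities(doc):
    # Stand-in rule: treat capitalized tokens as candidate entities.
    doc["entities"] = [t for t in doc["tokens"] if t[:1].isupper()]
    return doc

def summarize(doc):
    doc["summary"] = " ".join(doc["tokens"][:8])
    return doc

PIPELINE = [clean, tokenize, tag_entities, summarize]

def run(raw):
    """Push one document through every stage, in order."""
    doc = {"raw": raw}
    for stage in PIPELINE:  # hooking the pieces together is just composition
        doc = stage(doc)
    return doc

print(run("  Swingly answers questions using Language Computer plumbing.  "))
```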

Bottom line: I think we will see more plays like the Language Computer Swingly and Extractiv services. That’s exciting.

I bet your friendly azure chip consultant can explain in detail why I am wrong. Of course, those folks dig customer support and business intelligence now, so maybe not. Anyone for SigIntNews.com?

Stephen E Arnold, September 2, 2010

Freebie
