Linguistic Agents: Smart Software from Israel

April 16, 2008

In my new study “Beyond Search”, I profile a number of non-US content processing companies. Several years ago I learned about Jerusalem-based Linguistic Agents. The company’s natural language processing system uses a technique that I found interesting.

The firm’s founder is Sasson Margaliot. In 1999, Mr. Margaliot wanted to convert linguistic theories into practical technologies. The goal was to enable computers to understand human language and context. Like other innovators in content processing, Mr. Margaliot had expertise in theoretical linguistics and application software development. He studied Linguistics at UCLA and Computer Science at Hebrew University of Jerusalem.

The company’s chief scientist is Alexander Demidov. Mr. Demidov was responsible for the development of Linguistic Grammars for the Company’s NanoSyntactic Parser, the precursor of today’s Streaming Logic engine. Previously, he worked for the Moscow Institute of Applied Mathematics and at Zehut, a company that developed advanced compression and protection algorithms for digital imaging.

Computerworld identified the company in the summer of 2007 as having one of the “cool cutting-edge technologies on the horizon”. Since that burst of publicity in the US, not much has been done to keep the company’s profile above the water line.

The company uses “nano syntax” to extract meaning from documents. On the surface, the approach seems to share some features with Attensity, the “deep extraction company” and the firm that I included in my new study as an exemplar of recursive analysis and linguistic processing for meaning.

The idea is that a series of parallelized processes converts a sentence into a representation that preserves its syntactical meaning. The technology can be applied to search as well as context-based advertising. The company asserts, “The technology can revolutionize how computers and people interact –computers will learn our language instead of vice versa.”


Teragram: SAS’s Search Launchpad

March 20, 2008

This week SAS announced that it purchased Teragram, a content processing company with deep roots in computer science and mathematics and a roster of blue-chip clients. If you poke around Teragram’s Web site, you learn that the company supports double-byte languages. If I read the Teragram information correctly, this little-known outfit not far from Harvard Yard has proprietary technology strongly suggestive of the super-sophisticated techniques in use at IBM, Google, Microsoft, and Yahoo.

The Teragram system can match other systems’ advanced functions, function for function. NLP (natural language processing)? Automatic summarization? No problem. Hosted services option? Check. Autonomy- and Recommind-type pattern matching? Done. Attensity- and Bitext-style linguistic analysis? Covered. Teragram has a warehouse chock full of search and content processing goodies.

Now SAS owns this “search tech” tool box.

Teragram, founded in 1997, was a privately-held content processing company in Cambridge, Massachusetts. Two wizards — both from Luxembourg — have applied their computer science and mathematical expertise to unstructured information for more than a decade. That’s a long time in the fast-moving search and text processing sector.

I learned about Teragram when someone told me that the company was a technology provider to Fast Search & Transfer SA. Fast Search’s Dr. John Lervik is a canny technologist, and he has a good nose for solid technology.


Rain on the Search Parade

March 14, 2008

The storm warnings flash across the sky. This morning (March 14, 2008) Bear Stearns is rumored to face a Carlyle-like liquidity crisis.

But so far no lightning has hit the search lightning rods. In fact, the unsettled financial weather has had no visible effects. The Google – DoubleClick deal is done. The Microsoft – Fast Search tie up is nearing port. Yahoo says that it is embracing the Semantic Web, whatever that means (semantically, of course). France funds a Google killer. Radar’s Twine spools out. Business as usual in the search sector. But still we have no “real” solution to the “problem” of Intranet search, what I call behind-the-firewall search. The marketing razzle dazzle can’t mask the pain begging for lidocaine.

The turmoil in the financial market, the degrading dollar, and the $1,000 per ounce gold price seem to have little impact on search and retrieval so far. Anyone who suggests that a problem looms or that an actual panic could occur is an alarmist. I don’t want to sound any alarms.

InfoWorld’s Web log contained a post that has to make search vendors pant with revenue lust. Jon Williams wrote on March 13, 2008:

Every system we build has a search function built into it, usually hand-crafted (proprietary). Why? … Search on the internet, whether it be google, youtube, facebook, amazon, ebay, or linkedin, is solved for me, I always find what I need. And I believe the same is true for most consumers. But why not in the enterprise? Seems like a solution waiting to happen.

Spot on, Mr. Williams. Spot on. This unanswered need is why you won’t hear gloom and doom from me. Search often sucks, and whoever solves this problem can make their investors happy in our down market.

An Entrepreneur’s Concern

At dinner yesterday evening (March 13, 2008) in Palo Alto’s noisy Fish Market, I showed the president of a hosted application my current list of 150 next-generation search and content processing companies. Most of the outfits on this list won’t resonate with you. Bitext operates from Madrid, Spain. Thetus has offices near Microsoft’s stomping grounds. PolySpot is tucked away in Paris, France. He had heard of none of these companies or most of the others on my list.

He said, “There are so many on this list unknown to me.” Not unusual. He then asked me, “How can these companies survive so much competition? I think the market downturn will make it very hard for these companies. Right?”

I said, “Yep, tough sector. But no one has the one right answer. Not Google. Not IBM. Not the seven score newcomers on my list.”

The search market remains a triathlon, one of those “iron” versions that require competitors to climb mountains, swim rapids, and bicycle from Burlingame to Boise. But there are some formidable hurdles search vendors must overcome; namely:

Oversupply. Without rehashing dear old Samuelson’s Economics (now in its 18th edition I think), you have an embarrassment of riches for search. You have high-profile, publicly-traded “brands” like Autonomy. You have market-leading companies like Endeca. You have up-and-coming vendors like Coveo, Exalead, ISYS Search Software, and Vivisimo. You have state-of-the-art deep extraction providers like Attensity and Exegy (bet you never heard of Exegy, right?). You have free search software such as Lucene and Flax. You have such super-platforms as IBM, Microsoft, Oracle, and SAP including search with every enterprise application licensed. You have specialists in entity extraction (Inxight / Business Objects), semantics (Siderean), and ANSI standard controlled terms (Access Innovations). You get the idea. Can the market support hundreds of vendors of search and content processing?

Confusion. You don’t want me to belabor this point. There’s a great deal of confusion about search, content processing, text mining, and related disciplines. The easiest way to illustrate this is to provide you with a handful of the buzz words that I have collected in the last two weeks. How many of these can you define? How many of these do you use in your discourse with colleagues? Here are the “Cs” through the “Ks” only:

Collective knowledge systems
Community portals
Composite applications
Conferencing
Context aware games
Context aware mobile search
Context aware search
Context search
Faceted search
Folksonomy
Formal language
Geospatial search
Glass boxes
Instant messaging
Intelligent agents
Knowledge base
Knowledge computing
Knowledge management
Knowledge spaces

Confused buyers often drag their heels as they try to decipher the nuances of search-speak.

Skepticism. Some vendors have told me that potential customers are skeptical about some search features and functions. For example, on a telephone call with a non-U.S. search system vendor, a principal in the company told me, “The nest has been fouled. Two prospects told me today that our two to five day deployment time was impossible. Their incumbent system took more than a month to get installed and another two months of effort before deployment.” As organizations get more behind-the-firewall search experience, those organizations’ employees know that some vendor claims may be a blend of wishful thinking and science fiction.

Overconfidence. I don’t have much to say about this human failing. Most chief technical officers overestimate what they know about search and retrieval. Most of the Intranet search problems have their roots anchored in the licensees’ assumptions about what their systems can do, their knowledge of search systems, and their ability to figure out software. I get my Greek myths mixed up, but there were, as I recall, quite a few stories about the nasty effects of pride. “Flame out” and Icarus resonate with me.

Loosey goosey pricing. In the course of the research for my new study Beyond Search, I encountered one vendor who refused to give me a starting price for its system. The president refused. I said, “Take your total revenue, divide it by the number of customers you have, and I will use that number as the average price.” He sputtered in anger. Let’s face it. Unless something is free, most search software comes with a price tag. Even a free system such as Lucene costs money because someone who gets a salary has to babysit the Lucene system. More and more vendors are tap dancing on the cost of their licenses, services, and support. I suspect that these vendors want to hold out to get the best possible price. Maybe these vendors don’t want other customers to know that a price is rising or falling?

Adam Smith’s “invisible hand” will reach out to strangle me. Economics in March 2008, however, continues to surprise the Wall Street set. Last time I checked the super-secret Carlyle Group did not expect fellow bankers to demand cash.

How untoward!

But if some of the best-known financial services companies are in the doo-doo, what will become of the more than 300 firms engaged in search and retrieval? Even the Teflon-coated Google has drawn criticism. Today (March 14, 2008) Google’s share price will open at $443, down from its 52-week high of $747. Microsoft will pay $1.2 billion for a chance at bat to hit a search home run. That’s a pricey swing methinks. In my conversations at conferences, I detect a note of concern about making numbers. Entrepreneurs are thoughtful.

Wrap Up

To wrap up, I believe the search landscape will be pockmarked with Entopia-like shut downs. I also anticipate more strident marketing. Sigh. There will be some buy outs, but there will be some firms that cannot sell out. One reader of this Web log wondered if Autonomy was an example of a company that many look at but none has carried over the threshold. Maybe the right suitor has not come forward? I believe that some countries will intervene in order to keep certain search firms in business. Anyone think that the French government has this as a motive for the funding of its Google killer? Other companies will give away search software and try to make money via services and consulting. And don’t forget the bundling option. Every time I buy an IBM server, I get Lotus Notes. Perhaps Microsoft and Oracle will use the same tactic to “lock in” customers.

The big concern I have is that search’s “bird flu” will land. The weaker firms will die after a tough fight. The stronger firms will capture a larger share of the market. Instead of the surfeit of choices we have today, we may end up with fewer choices, higher prices, and a stifling of innovation. What do you think? End or beginning for behind-the-firewall search?

Stephen Arnold, March 14, 2008

Is Search Approaching a Crisis in 2008?

February 26, 2008

In May 2008, I will be doing the end note talk at the Enterprise Search Summit 2008. This is a conference owned and managed by Information Today, Inc. This may be the third or fourth year that I have anchored the program. Last year, Sue Feldman, IDC’s well-known search wizard, and Robert Peck, Managing Director of Bear Stearns’ Internet unit, “debated” me. The idea is that I am known to be controversial, so representatives of received wisdom about “enterprise search”, a term I don’t like, are invited to respond. For May 2008, I’m not certain what Information Today has planned to counterbalance my contrarian views of behind-the-firewall search.

I worked yesterday to locate my remarks from 2007 and come up with observations based on my research since May 2007. I have two studies under my belt in the last 10 months: Google Version 2.0 and Beyond Search: What to Do When Your Search System Doesn’t Work. Google is an interesting company, and I will be talking about its impact on enterprise software at the AIIM Show in Boston on March 4, 2008. My research for Beyond Search unearthed a number of interesting facts and insights. I am inclined to lean heavily on that information for the Enterprise Search Summit 2008 “controversial” end note.

I want to outline my preliminary thinking for my May 2008 remarks and invite comments on my views. Accordingly, here’s the table I created yesterday:

| 2007 Crisis | 2008 Delta | Observation |
|---|---|---|
| Organizations’ info tech departments are in trouble | No change | Complexity continues to escalate. There’s a reason Salesforce.com, Amazon S3 and EC2, and NetSuite are getting hard looks. Blossom, Exalead, and Fast Search offer hosted solutions. |
| Costs are rising | Financial pressure is increasing | Buy outs, staff reductions, and repositioning are making it tough for potential buyers to know what search vendors have on offer. Examples: Autonomy and Zantaz; Inxight becoming part of Business Objects, then BO getting acquired by SAP, then SAP investing in Endeca. |
| Customers | More confident in their ability to select the right system than in 2007 | Arrogance, not common sense, on the rise. |
| Vendors | More despite buy outs and consolidation | Vendors are morphing quickly. Utilities become search engines. Search engines become platforms. Platforms become knowledge systems. Too many companies chasing too few customers. |
| Sea change | Greater uncertainty | “Stay the course” seems to be the mantra. |

As I reflect on these points, I see three characteristics of the 2008 search market that are not addressed. Let me summarize each:

  1. A naive dismissal of the Google Search Appliance, OneBox API, and Google Apps as not important to the major players in behind-the-firewall search. My data suggest that Google has about 8,500 licensees of the maligned GSA. Interest in Google Apps is climbing, often following the skyrocketing interest in Google Maps. Google is going to reshape the behind-the-firewall market for search and other applications.
  2. Growing importance of international vendors. I am continually surprised that many of the organizations with whom I speak about behind-the-firewall search are essentially ignorant of important North American vendors such as Attensity, Cognition Technologies, Siderean Software, or Thetus. But I am thunderstruck when these informed and bright people look baffled when I mention Bitext, Copper Eye, Polyspot, and Lingway. I haven’t mentioned the innovators in behind-the-firewall search in the Pacific Rim. Big changes are afoot, and few in the U.S. seem to care very much. There’s more curiosity about new Apple iPods than enterprise information systems, I surmise.
  3. Overconfidence in search expertise and knowledge. I have been amazed on several occasions in the last six months at the lack of knowledge about the “gotchas” in search and the incredible hubris of certain procurement teams. In addition to refusing to consider a hosted or managed solution, these folks have zero knowledge of viable solutions developed in far-off, mysterious places like France. Amazing. I meet many 25-year-olds who have “mastered” the intricacies of behind-the-firewall search. I conclude that it must be wonderful to be so smart so young. I’m still learning by plodding along. I’ve been at this more than 30 years and know I don’t know very much at all.

Let me close with an anecdote. One of my long-time friends and colleagues told me that her firm’s behind-the-firewall search system didn’t work. I think the word she used was sucked. Young people are quite colloquial.

I said, “Didn’t I try to flag you off that vendor?”

She replied, “Yes, but our VP of Information Technology made the decision. He knew what he wanted and made the deal happen.” (I think she made a sound like an annoyed ocelot, a grrrr sound.)

What’s interesting about this exchange is that the company with the search system that “sucked” conducts analyses of text mining, knowledge management, and “enterprise” search systems, for a fee.

I am struggling with how to communicate the need for those who want to procure a behind-the-firewall search system to make a decision based on understanding, facts, and specific, pragmatic requirements. I thought it was my generation who watched Star Trek and believed that technology would make it possible to issue voice commands to computers or say “Beam me up” to move from place to place. I learned in 2007 that recent graduates of prestigious computer science programs have absorbed Star Trek’s teachings.

Just one problem. Behind-the-firewall search remains a complex challenge. I document in Beyond Search 13 “disasters” and provide guidance on how to extricate oneself from the clutches of these problems. There’s no “beam me up” solution to the rats’ nest of issues that plague some behind-the-firewall search solutions — yet.

Stephen Arnold, February 26, 2008
