Ontos: a Text Processing Company, Not a Weapon

June 5, 2008

In a conference call yesterday (June 4, 2008), someone mentioned “Ontos”. Another person asked, “What’s an Ontos?” I answered, “An anti-tank vehicle.” What I remembered about the Ontos is that it was a tank loaded down with so many weapons that a turtle was speedier. Big laugh. Ontos is also a company engaged in text and content processing with a product called ObjectSpark. To fill in the void in my knowledge, I navigated to the GOOG, plugged in “Ontos” and found a link to a 2001 article in Intelligent Enterprise, a very good Web site now that the print magazine has been put out to pasture. You can read the description here.

The company’s English language Web site is at www.ontos.com. The product line up no longer relies on the ObjectSpark name. You can license:

  • OntosMiner, which “analyzes natural language text. It recognizes objects and their relations and adds them as annotations to the related text parts. The technology is based on semantic rules, i.e. NLP (Natural Language Processing). It uses ontologies to define the area of interest.”
  • LightOntos for Workgroups, which “helps to organize and search information and documents. It allows the user to process and annotate PDF, Word, RTF, Text or HTML files using OntosMiner.”
  • Ontos SOA, which “realizes the whole cycle of semantic-syntactical processing, management and analysis of unstructured information located in the Internet and large corporative data banks.”
  • TAIS Ontos, which is “created as an Application Package using ORACLE technologies and Java. The system uses a semantic designed for building and maintaining object oriented databases. Additional components are effective engines for the search of explicit and hidden relations between objects. A visualization environment (interface) supports the analysts when analyzing a domain of interest. The product is adapted for the segment of law enforcing structures and attributed to the class of anti-criminal analytical systems”
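To make the OntosMiner description above more concrete, here is a minimal Python sketch of rule-based annotation driven by a tiny ontology. Everything in it is an assumption made for illustration: the ONTOLOGY dictionary, the single RELATION_RULES pattern, and the annotate function are hypothetical and are not Ontos code or an Ontos API. The vendor's own description is limited to the quoted text above.

```python
import re

# Hypothetical mini-ontology: surface form -> object type.
ONTOLOGY = {
    "Ontos": "Company",
    "AviComp Services": "Company",
    "CEBIT 2008": "Event",
}

# Hypothetical semantic rule: "<subject> participated in <object>."
RELATION_RULES = [
    (re.compile(r"(?P<subj>.+?) participated in (?P<obj>.+?)[.]"), "participated_in"),
]


def annotate(text):
    """Return (objects, relations) found in text.

    Objects are (start, end, surface, type) tuples. Relations are
    (subject, relation, object) triples whose two ends are known objects.
    """
    objects = []
    for surface, obj_type in ONTOLOGY.items():
        for match in re.finditer(re.escape(surface), text):
            objects.append((match.start(), match.end(), surface, obj_type))
    objects.sort()

    known = {surface for _, _, surface, _ in objects}
    relations = []
    for pattern, name in RELATION_RULES:
        for match in pattern.finditer(text):
            subj = match.group("subj").strip()
            obj = match.group("obj").strip()
            # Keep the relation only when both ends are ontology objects.
            if subj in known and obj in known:
                relations.append((subj, name, obj))
    return objects, relations


objects, relations = annotate("AviComp Services participated in CEBIT 2008.")
print(objects)
print(relations)
```

A production system would use real linguistic analysis rather than literal string matches, but the division of labor is the same: the ontology says what counts as an object, and the rules say which relations between objects are worth recording.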

The display of tagged text uses color to identify specific elements. When I saw this display, it reminded me of the output from Inxight Software’s text processing system.

ontos mark up

The company’s Russian partner–ZAO AviComp Services–participated in the recent German technical extravaganza, CEBIT 2008.

You will find a handful of white papers on the Ontos Web site. I found “Ontos Solutions for the Semantic Web” quite interesting and informative. You can download it here.

I wasn’t able to locate any pricing or licensing information. If you have some of these data points, please, use the comment form below this essay to share the information with other readers. My email to the company went unanswered.

Based on my clicking through the Web site, you might want to take a look at this system. The white papers and technical descriptions use the buzz words that other vendors bandy about. The one drawback to a system that lacks a high profile in the US is this question, “Does the system meet US security guidelines?” My hunch is that the system is industrial strength; otherwise, the Brussels customer would not have signed a deal to use the Ontos technology.

Stephen Arnold, June 5, 2008

SAS: BI Giant Sends Mixed Signals

May 26, 2008

SAS Institute, one of the leaders in business analytics and the world’s largest privately-held software company, has taken several actions that signal changes at the Cary, North Carolina firm.

Prior to the company’s acquisition of Teragram, SAS moved forward with measured steps. Innovations made statisticians and business analysts giddy with excitement. The average employee dependent upon SAS reports and data noticed little if any significant change.

Now changes are coming at what appears to be a faster pace. First, the company announced that it was laying off employees in its educational division. I was introduced to SAS or “sass” as my professor pronounced it at university. Other schools indoctrinated their students with SPSS, SAS Institute’s Chicago-based competitor. If you took advanced statistics, you learned one or the other, and once learned, you stuck with that toolkit unless there was a boss who made you learn the other program.

Layoffs That Weren’t

When I saw the announcement of the layoffs, it struck me as odd. Then I saw this explanation by SAS in the Raleigh News & Observer here. Navigate to this story quickly. Traditional publishers pull articles or make them hard to find a day or two after these appear online.

If the terminations rumor were true, the shift seemed to mark a change in tactics in the company’s battle with SPSS. Tomorrow’s analysts use the tools learned in their university statistics and math classes. A pull back in education said to me, “We’re looking for better ways to market.”

No. SAS is adding staff, and it is not for sale. In my experience, everything is for sale, but I will take SAS at its word. The mix up in the local newspaper is peculiar. Experienced reporters don’t make these types of mistakes very often. The reporter heard something; otherwise, the story would not have found its way past the editor into the newspaper.

Teragram and Lucene

Then I came across information in CMSWire here saying that SAS’ Teragram text processing unit was picking “up some multilingual, natural language support with the recent integration of Teragram Linguistic Tools.”

Lucene, which I describe in some detail here, is the Apache open source search engine. Lucene forms the basis of the IBM Yahoo “free” search, and it is used by many companies looking for a low-cost alternative for such basics as key word search.

With the integration of Teragram’s tools, Lucene appears to get a steroid injection. Beefed up, Lucene should be able to give a Lucene adept a way to use taxonomies and classification schemes. These are essential if you want to provide users with “See Also” and “Use For” suggestions. The popular point-and-click interfaces such as the one I showed in my talk at the Enterprise Search Summit for the Oracle Technology Network are what users crave instead of a laundry list of search results. (In that Oracle demo I showed the Siderean Software technology implemented by Oracle here.)
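For readers who have not worked with a controlled vocabulary, a rough Python sketch of how “See Also” and “Use For” suggestions can be served from a taxonomy follows. The THESAURUS contents and the build_use_index and suggest functions are hypothetical, written only for this illustration. The sketch does not call Lucene or Teragram code, and it makes no claim about how either product stores its term lists.

```python
# Hypothetical controlled vocabulary. Each preferred term lists the
# non-preferred synonyms it is "used for" and its related "see also" terms.
THESAURUS = {
    "text mining": {
        "use_for": ["text analytics", "text data mining"],
        "see_also": ["entity extraction", "classification"],
    },
    "entity extraction": {
        "use_for": ["named entity recognition"],
        "see_also": ["text mining"],
    },
}


def build_use_index(thesaurus):
    """Map every non-preferred synonym to its preferred term."""
    index = {}
    for preferred, entry in thesaurus.items():
        for synonym in entry["use_for"]:
            index[synonym] = preferred
    return index


def suggest(query, thesaurus=THESAURUS):
    """Return the preferred term plus its suggestions, or None if unknown."""
    term = query.strip().lower()
    # A synonym is redirected to its preferred term; anything else is
    # looked up as typed.
    preferred = build_use_index(thesaurus).get(term, term)
    entry = thesaurus.get(preferred)
    if entry is None:
        return None
    return {
        "preferred": preferred,
        "use_for": list(entry["use_for"]),
        "see_also": list(entry["see_also"]),
    }


print(suggest("Text Analytics"))
```

A search interface would run the user's query through something like suggest first, search on the preferred term, and show the “see also” list as clickable links next to the results.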

If Teragram makes its bag of text processing tricks available, you will be able to provide some advanced features to sharpen the dull tool of key word search. Teragram provides a spell check feature, and it supports different languages.

In short, the SAS Teragram move may signal a shift in marketing for the Cary, North Carolina firm. SAS once meant proprietary. With the Lucene Teragram hook up, SAS is playing an open source card.

When I step back, these two unrelated events indicate to me that staid SAS is in the midst of change. There’s increasing competition in business analytics. See my HiQube and Infobright essays. The Teragram technology makes it possible for SAS to distance itself over time from the Inxight technology that SAS has integrated into its core platform. Inxight, as you may know, was a text processing tools vendor not unlike Teragram. Business Objects, a competitor to SAS, bought Inxight. Then Business Objects itself was acquired by SAP, further muddying the water for integrated business analytics.

The confusion over staff changes underscores a lack of coordination. SAS had previously done a good job managing its public face. A gaffe in the home town newspaper means that communication signals were crossed, possibly due to pressures within SAS created by marketing shifts.

My view is that I now have to pay closer attention to SAS. For decades, the company hummed along like a Singer sewing machine. Now, the jerks and starts indicate that some changes are taking place. We don’t know what’s happening. We do know that competitive pressures in the once quiet business and text analytics market niche are becoming evident. Agree? Disagree? Let me know.

Stephen Arnold, May 25, 2008

Government High-Tech Investments: IN-Q-TEL

May 26, 2008

I received an email from a colleague new to the Federal sector. Her email included comments and links about US government funding of high technology companies. I was surprised because I assumed that most people knew of the IN-Q-TEL organization. As US government urls go, IN-Q-TEL’s will baffle some people. First, the hyphens throw off some folks. Second, the group uses the Dot Org domain, not Dot Gov.

inqtel splash

In a nutshell, IN-Q-TEL makes clear what it does and why:

IN-Q-TEL identifies, adapts, and delivers innovative technology solutions to support the missions of the Central Intelligence Agency and the broader US intelligence community.

I’m not interested in whether IN-Q-TEL is doing a great job or a lousy job. I’m not concerned about its mission, its funding, or its management team.

What I find fascinating is the organization’s choice of companies in which to invest. I don’t know the budget range of IN-Q-TEL, but my sources tell me that the investments stick close to $1 million, sometimes more, sometimes less. You can read more about IN-Q-TEL at these links:

  • The Wikipedia entry, and I am not vouching for the accuracy of this entry
  • The CIA’s own description here
  • KMWorld’s write up here. (I am a paid columnist for KMWorld, but I did not contribute to this story.)

The purpose of this feature is to provide a snapshot of the companies in which IN-Q-TEL has invested. I’ve identified more than 70 companies. This is too many to put in one posting, so I will break up the list and cover the period 2000 to 2003 here and do each subsequent year in additional Beyond Search postings.

In the period from 2000 to 2003, IN-Q-TEL invested in 25 companies. Keep in mind that I may have overlooked some in my research. If you know of a company I missed, please, use the comment section of this Web log to update my information. These appear in the table below:

Read more

Xerox Factspotter: Thingfinder’s Second Cousin

April 20, 2008

A long time ago in a research park far, far away, Xerox PARC (Palo Alto Research Center) developed text processing systems. Xerox PARC spun out a bundle of this content processing technology as Inxight Software. For about a decade, Inxight chugged along, winning accolades from the spookeratti in Washington, DC’s intelligence community. Business Objects, a disrupter of the business intelligence space, bought Inxight Software. The deal rippled the fabric of the likes of SAS Institute, a company licensing Inxight’s technology for its data mining systems. Then SAP bought Business Objects. Along the way, start ups like PowerSet used some of Xerox’s technology to build a whizzy search start up.

Amidst this slow flowing river of deals, Xerox is back. This time “the document company” has Factspotter. Now Factspotter, like most search and text processing systems, is not newly-sprung from Xerox’s idea hatchery in Grenoble, France, where the research team at XRCE, a forgettable acronym for Xerox Research Centre Europe, works.

I learned about Factspotter in early 2007. I dug through my files and unearthed this description of the invention from the Xerox news release:

Unlike traditional enterprise search tools, FactSpotter looks not only for the keywords contained in a query but also the context of the document those words contain. For example, if searching for documents that reference Angelina Jolie, FactSpotter will also return results where the pronoun “she” is used instead of Jolie’s full name. The “smart” search engine can comb through almost any document regardless of the language, location, format or type; take advantage of the way humans think, speak and ask questions; and discriminate the results highlighting just a handful of relevant answers instead of returning thousands of unrelated responses.

I hadn’t been tracking Xerox’s “inventions” or its document processing business until the IBM InfoPrint entity popped into being in 2007. Then in January 2008, Hewlett Packard paid $1.2 billion for the Exstream Software operation in Lexington, Kentucky. When these document processing developments took place, I wondered what had happened to Xerox, “the document company”. After that thought, Xerox drifted off my radar–until today, April 12.

Someone emailed me a snippet of text from IT Reseller. The key points, which I have edited for easier readability, are:

[Factspotter’s] novel interface means users can express their queries naturally instead of forcing them to adapt their questions to the logic of computers. Traditional systems, on the other hand, split a query into isolated words and return only documents that contain exactly those words in exactly that order. And [Factspotter] takes into account the context of the entire document instead of just a cluster of nearby words. And [Factspotter] introduces the concept of “relation,” searching within and across sentences and paragraphs. It recognizes abstract concepts, like “people” or “building,” and will retrieve all the words that fit within that category.
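The contrast in the snippet above, literal word matching versus retrieval by abstract concept, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DOCS collection, the CATEGORIES word lists, and both search functions are hypothetical, and nothing here reflects how Xerox actually built Factspotter.

```python
import re

# Hypothetical three-document collection.
DOCS = {
    "d1": "Angelina Jolie visited the tower on Monday.",
    "d2": "The warehouse was sold last year.",
    "d3": "People asked about the building permit.",
}

# Hypothetical abstract concepts and the expressions that belong to them.
CATEGORIES = {
    "people": ["angelina jolie"],
    "building": ["tower", "warehouse"],
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_search(query, docs):
    """Return ids of documents containing every literal query word."""
    wanted = tokenize(query)
    return sorted(doc_id for doc_id, text in docs.items() if wanted <= tokenize(text))


def concept_search(category, docs, categories=CATEGORIES):
    """Return ids of documents mentioning any member of the category."""
    members = categories.get(category, [])
    hits = []
    for doc_id, text in docs.items():
        lowered = text.lower()
        if any(re.search(r"\b" + re.escape(m) + r"\b", lowered) for m in members):
            hits.append(doc_id)
    return sorted(hits)


print(keyword_search("building", DOCS))
print(concept_search("building", DOCS))
```

The keyword query finds only the document that literally contains the word “building”, while the concept query finds the documents that mention a tower or a warehouse. The hard part, which this sketch skips entirely, is building and maintaining the category lists and resolving references such as pronouns.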

Xerox’s marketing mavens were dead on in 2007. The only issue I have is that Factspotter is on the Xerox Web site, but not anywhere else. If you know the fate of Thingfinder’s second cousin, write me at seaky2000 @ yahoo.com.

Stephen Arnold, April 20, 2008

Text Mining: No-Cost Resources

April 19, 2008

Engineers without Fears has a post by Matt Moore that contains four useful links. If you are looking for a way to get up to speed on this “beyond search” function, navigate to this post.

None is without some constraints; each is useful. First, you can read a six-page paper comparing four systems: Leximancer, Megaputer, SAS Institute, and SPSS. Keep in mind that each of these approaches text mining from a very different angle of attack. Leximancer is a useful system that can become difficult to navigate in visualization mode. Megaputer, developed by wizards from a university in Russia, is robust but can be complex to operate. SAS has licensed technology from Inxight Software (now owned by SAP’s Business Objects) and is the recent buyer of text processing specialist Teragram. Expect some changes in the SAS approach in the near future. SPSS, a company best known for data mining, acquired LexiQuest and uses that company’s technologies in its systems. Nevertheless, you can pick up some helpful information in “An Evaluation of Unstructured Text Mining Software”. The link appears on Engineers without Fears.

The link to the National Centre for Text Mining is particularly helpful. The information available on the site ranges from traditional society boilerplate to the more useful comments about tools and research. You may find it useful to spider the entire site. Information can appear and disappear, so an archive is helpful if you plan on extending your research over a period of years.

The link to a lecture by Dr. Marti Hearst is a must read. Most vendors have sucked concepts, phrases, and data from Dr. Hearst’s work, often without giving her credit. This particular paper dates from late 2003, and a quick search of Google and the University of California – Berkeley Web site will point you to more current information. (You may want to narrow your query to computer science and allied disciplines. The site is sprawling, and it can be difficult to locate what you need. UC Berkeley obviously doesn’t pay much attention to Dr. Hearst’s expertise.)

The link to the 2003 New York Times article satisfies a researcher’s need to get the “gray lady’s” take on a technical topic. I don’t pay much attention to the information in newspapers, but you can decide for yourself. Engineering documents, patent applications, and technical articles often provide more useful information without the rhetorical over extension needed to convert an equation into a two word phrase or a metaphor.

If you have a budget, you will want to look at the profiles of text mining companies in Beyond Search, a 300-page review of text mining and its component parts. The study also includes a discussion of approaches to content processing that “wrap” text mining in more usable applications. More information about this resource is located here.

Stephen Arnold, April 19, 2008

Teragram: SAS’s Search Launchpad

March 20, 2008

This week SAS announced that it purchased Teragram, a content processing company with deep roots in computer science, mathematics, and blue-chip clients. If you poke around Teragram’s Web site, you learn that the company supports double byte languages. If I read the Teragram information correctly, this little-known outfit not far from Harvard Yard has proprietary technology strongly suggestive of the super-sophisticated techniques in use at IBM, Google, Microsoft, and Yahoo.

The Teragram system can match other systems’ advanced functions. NLP (natural language processing)? Automatic summarization? No problem. Hosted services option? Check. Autonomy – Recommind type pattern matching? Done. Attensity and Bitext style linguistic analysis? Covered. Teragram has a warehouse chock full of search and content processing goodies.

Now SAS owns this “search tech” tool box.

Teragram, founded in 1997, was a privately-held content processing company in Cambridge, Massachusetts. Two wizards — both from Luxembourg — have applied their computer science and mathematical expertise to unstructured information for more than a decade. That’s a long time in the fast-moving search and text processing sector.

I learned about Teragram when someone told me that the company was a technology provider to Fast Search & Transfer SA. Fast Search’s Dr. John Lervik is a canny technologist, and he has a good nose for solid technology.

Read more

SAS Buys Teragram Corporation

March 17, 2008

SAS Institute Inc. (Cary, North Carolina) announced today that it had acquired Teragram “to strengthen industry-leading text mining, analytics”. Teragram was founded in 1997 by two technology wizards from Luxembourg. I’m working to chase down more details of the deal. SAS (a tight-lipped, privately-owned company best known for its industrial-strength analytics) seems like an ideal owner of the low-profile, privately-held firm with offices in Cambridge, Massachusetts.

Among the capabilities SAS highlighted in its announcement today are these Teragram functions:

  • Natural language processing
  • Automatic categorization
  • Enterprise search
  • Mobile search.

When Inxight Software was gobbled up by Business Objects (another analytics outfit), I had a hunch SAS would rethink its use of the Inxight tools. SAS was in a difficult position because a competitor or semi-competitor was in a position to make life somewhat uncomfortable. Then SAP, the German software giant, bought Business Objects. SAS had to take action in order to increase its degrees of text analytics freedom. With Teragram, SAS has options and some interesting technology.

Look for a summary of Teragram’s technology. In Beyond Search, I decided not to include this company. Rumors about a change at Teragram surfaced earlier this year. I have learned that rewriting studies to adapt to the acquisitions and business failures is not much fun.

If you want a jump start on Teragram’s customers, click here. To check out Teragram’s explanation of its rules-based approach to content processing, click here. I will cover this particular aspect of Teragram’s technology in another essay.

More buy outs are looming. With the deepening financial morass in the US, I also believe some of the weaker search and content processing firms are going to turn off their lights. The cost of developing, innovating, and maintaining text processing technology is far greater than most people know.

SPSS — a direct competitor — acquired LexiQuest Inc., a linguistics-based text mining system developer. SPSS, therefore, took control of its text mining and analytics fate with this 2002 purchase. Licensing technology yields some significant benefits. When a technology provider goes belly up or is purchased by a competitor, the happy face can morph into a frown quickly.

Stay tuned. More Teragram information will appear in a day or two.

Stephen Arnold, March 17, 2008

Rain on the Search Parade

March 14, 2008

The storm warnings flash across the sky. This morning (March 14, 2008) Bear Stearns is rumored to face a Carlyle-like liquidity crisis.

But so far no lightning has hit the search lightning rods. In fact, the unsettled financial weather has had no visible effects. The Google – DoubleClick deal is done. The Microsoft – Fast Search tie up is nearing port. Yahoo says that it is embracing the Semantic Web whatever that means (semantically, of course). France funds a Google killer. Radar’s Twine spools out. Business as usual in the search sector. But still we have no “real” solution to the “problem” of Intranet search, what I call behind-the-firewall search. The marketing razzle dazzle can’t mask the pain begging for lidocaine.

The turmoil in the financial market, the degrading dollar, and the $1,000 per ounce gold price seem to have little impact on search and retrieval so far. Anyone who suggests that a problem looms or that an actual panic could occur is an alarmist. I don’t want to sound any alarms.

InfoWorld’s Web log contained a post that has to make search vendors pant with revenue lust. Jon Williams wrote here on March 13, 2008:

Every system we build has a search function built into it, usually hand-crafted (proprietary). Why? … Search on the internet, whether it be google, youtube, facebook, amazon, ebay, or linkedin, is solved for me, I always find what I need. And I believe the same is true for most consumers. But why not in the enterprise? Seems like a solution waiting to happen.

Spot on, Mr. Williams. Spot on. This unanswered need is why you won’t hear gloom and doom from me. Search often sucks, and whoever solves this problem can make their investors happy in our down market.

An Entrepreneur’s Concern

At dinner yesterday evening (March 13, 2008) in Palo Alto’s noisy Fish Market, I showed the president of a hosted application my current list of 150 next-generation search and content processing companies. Most of the outfits on this list won’t resonate with you. Bitext operates from Madrid, Spain. Thetus has offices near Microsoft’s stomping grounds. PolySpot is tucked away in Paris, France. He had heard of none of these companies or most of the others on my list.

He said, “There are so many on this list unknown to me.” Not unusual. He then asked me, “How can these companies survive so much competition? I think the market downturn will make it very hard for these companies. Right?”

I said, “Yep, tough sector. But no one has the one right answer. Not Google. Not IBM. Not the seven score newcomers on my list.”

The search market remains a triathlon, one of those “iron” versions that require competitors to climb mountains, swim rapids, and bicycle from Burlingame to Boise. But there are some formidable hurdles search vendors must overcome; namely:

Oversupply. Without rehashing dear old Samuelson’s Economics (now in its 18th edition I think), you have an embarrassment of riches for search. You have high-profile, publicly-traded “brands” like Autonomy. You have market-leading companies like Endeca. You have up-and-coming vendors like Coveo, Exalead, ISYS Search Software, and Vivisimo. You have state-of-the-art deep extraction providers like Attensity and Exegy (bet you never heard of Exegy, right?). You have free search software such as Lucene and Flax. You have such super-platforms as IBM, Microsoft, Oracle, and SAP including search with every enterprise application licensed. You have specialists in entity extraction (Inxight / Business Objects), semantics (Siderean), and ANSI standard controlled terms (Access Innovations). You get the idea. Can the market support hundreds of vendors of search and content processing?

Confusion. You don’t want me to belabor this point. There’s a great deal of confusion about search, content processing, text mining, and related disciplines. The easiest way to illustrate this is to provide you with a handful of the buzz words that I have collected in the last two weeks. How many of these can you define? How many of these do you use in your discourse with colleagues? Here are the “Cs” through the “Ks” only:

Collective knowledge systems
Community portals
Composite applications
Conferencing
Context aware games
Context aware mobile search
Context aware search
Context search
Faceted search
Folksonomy
Formal language
Geospatial search
Glass boxes
Instant messaging
Intelligent agents
Knowledge base
Knowledge computing
Knowledge management
Knowledge spaces

Confused buyers often drag their heels as they try to decipher the nuances of search-speak.

Skepticism. Some vendors have told me that potential customers are skeptical about some search features and functions. For example, on a telephone call with a non-U.S. search system vendor, a principal in the company told me, “The nest has been fouled. Two prospects told me today that our two to five day deployment time was impossible. Their incumbent system took more than a month to get installed and another two months of effort before deployment.” As organizations get more behind-the-firewall search experience, those organizations’ employees know that some vendor claims may be a blend of wishful thinking and science fiction.

Over confidence. I don’t have much to say about this human failing. Most chief technical officers over estimate what they know about search and retrieval. Most of the Intranet search problems have their roots anchored in the licensees’ assumptions about what their systems can do, their knowledge of search systems, and their ability to figure out software. I get my Greek myths mixed up, but there were, as I recall, quite a few stories about the nasty effects of pride. “Flame out” and Icarus resonate with me.

Loosey goosey pricing. In the course of the research for my new study Beyond Search, I encountered one vendor who refused to give me a starting price for its system. The president refused. I said, “Take your total revenue, divide it by the number of customers you have, and I will use that number as the average price.” He sputtered in anger. Let’s face it. Unless something is free, most search software comes with a price tag. Even a free system such as Lucene costs money because someone who gets a salary has to babysit the Lucene system. More and more vendors are tap dancing on the cost of their licenses, services, and support. I suspect that these vendors want to hold out to get the best possible price. Maybe these vendors don’t want other customers to know that a price is rising or falling?

Adam Smith’s “invisible hand” will reach out to strangle me. Economics in March 2008, however, continues to surprise the Wall Street set. Last time I checked the super-secret Carlyle Group did not expect fellow bankers to demand cash.

How untoward!

But if some of the best-known financial services companies are in the doo-doo, what will become of the more than 300 firms engaged in search and retrieval? Even the Teflon-coated Google has drawn criticism. Today (March 14, 2008) Google’s share price will open at $443, down from its 52-week high of $747. Microsoft will pay $1.2 billion for a chance at bat to hit a search home run. That’s a pricey swing methinks. In my conversations at conferences, I detect a note of concern about making numbers. Entrepreneurs are thoughtful.

Wrap Up

To wrap up, I believe the search landscape will be pockmarked with Entopia-like shut downs. I also anticipate more strident marketing. Sigh. There will be some buy outs, but there will be some firms that cannot sell out. One reader of this Web log wondered if Autonomy was an example of a company that many look at but none has carried over the threshold. Maybe the right suitor has not come forward? I believe that some countries will intervene in order to keep certain search firms in business. Anyone think that the French government has this as a motive for the funding of its Google killer? Other companies will give away search software and try to make money via services and consulting. And don’t forget the bundling option. Every time I buy an IBM server, I get Lotus Notes. Perhaps the same approach will be used by Microsoft and Oracle to “lock in” customers.

The big concern I have is that search’s “bird flu” will land. The weaker firms will die after a tough fight. The stronger firms will capture a larger share of the market. Instead of the surfeit of choices we have today, we may end up with fewer choices, higher prices, and a stifling of innovation. What do you think? End or beginning for behind-the-firewall search?

Stephen Arnold, March 14, 2008

SharePoint: Another “Free” Behind-the-Firewall Search System?

March 3, 2008

It’s 6 am in cheery Louisville International Airport, but the word “international” can be misleading. The news this morning is that Microsoft will roll out a “new” SharePoint search service. You can read the breathless InfoWorld story here. The announcement will be made, I believe, at one, maybe two, separate Microsoft conferences this week.

The “free” word is a powerful marketing tool for commercial firms. When it comes to behind-the-firewall search, “free” is a synonym for demonstration product. The set up, configuration, debug process, optimization, and operation of a search or content processing system come with some hefty costs. The license fee is, of course, the cost that the gullible seize upon. When you root around in the financial statements of publicly-traded companies in the search and retrieval business, you find that many are trying to follow in Verity’s pre-sell out footsteps. Specifically, vendors want to pump up consulting fees, making them carry the freight for earnings and growth. My recollection is foggy after seven consecutive days of travel, but Verity was generating more than half its revenues from non-license revenue. The number 65 percent pops in and out of my memory, but I’m going to have to dig through my files to verify this. As license revenues flat line (a common problem for some search vendors), cash can be generated by selling services. These are higher margin than a license fee with yearly maintenance fee add ons. Services can be open ended, and have a certain upside revenue charm for certain software vendors.

“Free” Search Systems: A Marketing Tactic

The idea is that you can install a working version of the program, get a sense of its basic features, and kick the tires. When we tested the “free” IBM – Yahoo OmniFind search system a few months ago, it worked quite well, but it had a document limit. My recollection is that most of the “free” systems have some type of governor on the system. The reason is that the “free” system is a way to qualify sales leads. When a user needs to process more content or perform some magic such as integrating the system into a third-party application, the vendor jumps with joy. A real sales lead has landed in her lap without booth duty, blogging, or hammer dialing.

Microsoft has jumped into the “free” fray with a beefed up search function for SharePoint. The SharePoint system has been in the forefront of the “knowledge management” revolution. The idea is that a Web-like interface makes it possible for a user to find, edit, share, and connect with colleagues, their documents, or related content. The word “portal” is sometimes used to describe this multi-function interface.

My sources tell me that SharePoint has more than 100 million users worldwide. This is a significant jump from the 65 million users I had learned of in the fourth quarter of 2007. Microsoft SharePoint is on a roll. When we install a robust content management system designed to work in a Microsoft-centric environment, SharePoint is a required “server”. In fact, to make these high-end CMS systems function, we typically install SQLServer, Windows Server, and IIS (Internet Information Services), among others. I may be wrong in how I perceive this server conga line, however.

Microsoft Search Systems

In my analyses of SharePoint search in the first three editions of the Enterprise Search Report, I summarized these separate search systems for SharePoint.

  • SharePoint search with a “blue” interface
  • SharePoint search with a “green” interface
  • SQLServer search
  • Microsoft tool bar search
  • Start button / Explorer search
  • Microsoft’s http://search.live.com

Without repeating that 40-page analysis and tromping over the rights I assigned to CMSWatch.com, I can’t go into much detail about what each of these different search systems does. But what I can tell you is that there is not “one” search system available when you implement a SharePoint search.

What’s New?

The “free” system is Search Server 2008 Express. Express was rolled out last year and includes metatag functions so results can be sorted. You can also click on a colleague’s name and see documents written by that person. Keep in mind that SharePoint is not breaking new ground here. SharePoint is adding features that have been available from Certified Gold Partners like Coveo and Mondosoft, among others, for a couple of years. What’s new is that anyone will be able to download Express and give it a whirl. My understanding was that only certain customers would be able to experiment with the Express system. I don’t have a download link, which I think will be available in the near future. You can also download a version of Silverlight to hook visualization into search results. Again, this is a feature that has been available from such vendors as Inxight Software (now part of Business Objects and owned by SAP) for more than a decade.

Observations

I am intrigued with this “free” version of Express. When I look at it in terms of Autonomy, I see a counter to Autonomy’s UltraSeek solution. UltraSeek, developed when Steve Kirsch was at InfoSeek, is a useful system acquired when Autonomy gobbled up Verity in December 2005. Autonomy, according to my sources, has had some success upselling UltraSeek users to more robust search and retrieval solutions.

When I compare the different “flavors” of SharePoint search with offerings from Microsoft Certified Gold Partners, I am somewhat uncertain about the Microsoft approach. For example, Interse, a company with a modest profile in Harrod’s Creek, Kentucky, offers software that manipulates the metadata available in SharePoint repositories. Also, Fast Search & Transfer coded an adapter for SharePoint. With this code widget, a SharePoint customer could use the functionality supported by the Fast ESP (enterprise search platform). In addition, there are a number of companies offering enhancements to SharePoint.

There are so many search, indexing, and content processing options for SharePoint for two reasons, in my opinion. First, Microsoft encouraged its partners to create these products. Second, the SharePoint search is not as easy to use for system administrators as it could be. (Forget “good” because most search and retrieval systems leave as many as two-thirds of their users griping.)

I will be interested to see how Microsoft handles the Certified Gold Partners who might feel a bit of competitive pressure. I’m also interested to see how the SharePoint platform will be mapped to the FAST enterprise search platform. (There are some areas of overlap and a few interesting technical issues to resolve.)

To wrap up, I urge you to download and install the Express search function. You are canny enough to know that you should check out these system vendors as well:

  1. Coveo (Canada)
  2. Exalead (France)
  3. ISYS Search Software
  4. Mondosoft (now part of SurfRay in Denmark)

You can get a copy of Enterprise Search Report (now in its 4th edition) or place an order for my Beyond Search study, which will be available in April 2008.

SharePoint is a useful system, and it isn’t going to be displaced by a competitive system anytime soon. Keep in mind that it’s complex. You know behind-the-firewall search is complex. So “free” doesn’t mean without cost. You will have to throw time, programmers, and effort at anyone’s “free” search system. That goes for anyone who offers a “free lunch” to you.

Stephen Arnold, March 3, 2008

Is Search Approaching a Crisis in 2008?

February 26, 2008

In May 2008, I will be doing the end note talk at the Enterprise Search Summit 2008. This is a conference owned and managed by Information Today, Inc. This may be the third or fourth year that I have anchored the program. Last year, Sue Feldman, IDC’s well-known search wizard, and Robert Peck, Managing Director of Bear Stearns’ Internet unit, “debated” me. The idea is that I am known to be controversial, so representatives of received wisdom about “enterprise search”, a term I don’t like, are invited to counter me. For May 2008, I’m not certain what Information Today has planned to counterbalance my contrarian views of behind-the-firewall search.

I worked yesterday to locate my remarks from 2007 here and come up with observations based on my research since May 2007. I have two studies under my belt in the last 10 months: Google Version 2.0 and Beyond Search: What to Do When Your Search System Doesn’t Work. Google is an interesting company, and I will be talking about its impact on enterprise software at the AIIM Show in Boston on March 4, 2008. My research for Beyond Search unearthed a number of interesting facts and insights. I am inclined to lean heavily on that information for the Enterprise Search Summit 2008 “controversial” end note.

I want to outline my preliminary thinking for my May 2008 remarks and invite comments on my views. Accordingly, here’s the table I created yesterday:

2007 Crisis | 2008 Delta | Observation
Organizations’ info tech departments are in trouble | No change | Complexity continues to escalate. There’s a reason Salesforce.com, Amazon S3 and EC2, and NetSuite are getting hard looks. Blossom, Exalead, and Fast Search offer hosted solutions.
Costs are rising | Financial pressure is increasing | Buy outs, staff reductions, and repositioning are making it tough for potential buyers to know what search vendors have on offer. Examples: Autonomy and Zantaz; Inxight becoming part of Business Objects, then BO getting acquired by SAP, then SAP investing in Endeca.
Customers | More confident in their ability to select the right system than in 2007 | Arrogance, not common sense, on the rise.
Vendors | More despite buy outs and consolidation | Vendors are morphing quickly. Utilities become search engines. Search engines become platforms. Platforms become knowledge systems. Too many companies chasing too few customers.
Sea change | Greater uncertainty | “Stay the course” seems to be the mantra.

As I reflect on these points, I see three characteristics of the 2008 search market that are not addressed. Let me summarize each:

  1. A naive dismissal of the Google Search Appliance, OneBox API, and Google Apps as not important to the major players in behind-the-firewall search. My data suggest that Google has about 8,500 licensees of the maligned GSA. Interest in Google Apps is climbing, often following the sky rocketing interest in Google Maps. Google is going to reshape the behind-the-firewall market for search and other applications.
  2. Growing importance of international vendors. I am continually surprised that many of the organizations with whom I speak about behind-the-firewall search are essentially ignorant of important North American vendors such as Attensity, Cognition Technologies, Siderean Software, or Thetus. But I am thunderstruck when these informed and bright people look baffled when I mention Bitext, Copper Eye, Polyspot, and Lingway. I haven’t mentioned the innovators in behind-the-firewall search in the Pacific Rim. Big changes are afoot, and few in the U.S. seem to care very much. There’s more curiosity about new Apple iPods than enterprise information systems, I surmise.
  3. Overconfidence in search expertise and knowledge. I have been amazed on several occasions in the last six months at the lack of knowledge about the “gotchas” in search and the incredible hubris of certain procurement teams. In addition to refusing to consider a hosted or managed solution, these folks have zero knowledge of viable solutions developed in far-off, mysterious places like far-off France. Amazing. I meet many 25-year-olds who have “mastered” the intricacies of behind-the-firewall search. I conclude that it must be wonderful to be so smart so young. I’m still learning by plodding along. I’ve been at this more than 30 years and know I don’t know very much at all.

Let me close with an anecdote. One of my long-time friends and colleagues told me that her firm’s behind-the-firewall search system didn’t work. I think the word she used was sucked. Young people are quite colloquial.

I said, “Didn’t I try to flag you off that vendor?”

She replied, “Yes, but our VP of Information Technology made the decision. He knew what he wanted and made the deal happen.” (I think she made a sound like an annoyed ocelot, a grrrr sound.)

What’s interesting about this exchange is that the company with the search system that “sucked” conducts analyses of text mining, knowledge management, and “enterprise” search systems — for a fee.

I am struggling with how to communicate the need for those who want to procure a behind-the-firewall search system to make a decision based on understanding, facts, and specific, pragmatic requirements. I thought it was my generation who watched Star Trek and believed that technology would make it possible to issue voice commands to computers or say “Beam me up” to move from place to place. I learned in 2007 that recent graduates of prestigious computer science programs have absorbed Star Trek’s teachings.

Just one problem. Behind-the-firewall search remains a complex challenge. I document in Beyond Search 13 “disasters” and provide guidance on how to extricate oneself from the clutches of these problems. There’s no “beam me up” solution to the rats’ nest of issues that plague some behind-the-firewall search solutions — yet.

Stephen Arnold, February 26, 2008
