Real Time Search Systems, Part 1

June 21, 2010

Editor’s note: For those in the New Orleans real time search lecture and the Madrid semantic search talk, I promised to make available some of the information I discussed. Attendees are often hungry to have a take away, and I want to offer a refrigerator magnet, not the cruise ship gift shop. This post will provide a summary of the real time information services I mentioned. The group focuses on content processed from such services as Facebook, Twitter, blogs, and other geysers of digital confetti. A subsequent blog post will present the basics of my draft taxonomy of real time search. I know that most readers will kick the candy bar wrapper into the gutter. If you are one of the folks who picks up the taxonomy, a credit line would make the addled goose feel less like a down pillow and more like a Marie Antoinette pond ornament.

What’s Real Time Search?

Ah, gentle reader, real time search is marketing baloney. Life has latency. You call me on the phone and days, maybe weeks go by, and I don’t return the call. In the digital world, you get an SMS and you think it was rocketed to you by the ever vigilant telecommunications companies. Not exactly. In most cases, unless you conduct a laboratory test between mobiles on different systems, capturing the transmit time, the receiving time, and other data points such as time of day, geolocation, etc., you don’t have a clue what the latency between sending and receiving is. Isn’t it easier to assume that the message was sent instantly? When you delve into other types of information, you may discover that what you thought was real time is something quite different. The “check is in the mail” applies to digital information, index updating, query processing, system response time, and double talk from organizations too cheap or too disorganized to do much of anything quickly. Thus, real time is a slippery fish.
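For readers who want to run the laboratory test instead of assuming, here is a minimal sketch in Python. The timestamps are invented for illustration, and the function name is my own; the only point is that latency is the gap between a recorded transmit time and a recorded receive time, and that one slow message changes the picture.

```python
from datetime import datetime, timezone
from statistics import median


def latency_seconds(sent_at, received_at):
    """Return the gap in seconds between a transmit and a receive timestamp."""
    return (received_at - sent_at).total_seconds()


# Hypothetical (sent, received) pairs captured on two handsets.
UTC = timezone.utc
samples = [
    (datetime(2010, 6, 21, 9, 0, 0, tzinfo=UTC), datetime(2010, 6, 21, 9, 0, 4, tzinfo=UTC)),
    (datetime(2010, 6, 21, 9, 5, 0, tzinfo=UTC), datetime(2010, 6, 21, 9, 6, 30, tzinfo=UTC)),
    (datetime(2010, 6, 21, 9, 10, 0, tzinfo=UTC), datetime(2010, 6, 21, 9, 10, 12, tzinfo=UTC)),
]

latencies = [latency_seconds(sent, received) for sent, received in samples]

# Report the spread, not just one number: "real time" hides in the tail.
print("min:", min(latencies), "median:", median(latencies), "max:", max(latencies))
```

With numbers like these, the same service looks instant or sluggish depending on which message you happened to time.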

Real Time Search Systems

Why do I use the phrase “real time”? I don’t have a better phrase at hand. Vendors yap about real time, and very, very few explain exactly what their use of the phrase means. One outfit that deserves a pat on the head is Exalead. The company explains that in an organization, most information is available to an authorized user no more than 15 minutes after the Exalead system becomes aware of the data. That’s fast, and it beats the gym shorts of many other vendors. I would love to pinpoint the turtles, but my legal eagle cautions me that this type of sportiness will get me a yellow card. Figure it out for yourself is the sad consequence.

Here’s the list of the systems I identified in my lectures. I don’t work for any of these outfits, and I use different services depending on my specific information needs. You are, therefore, invited to run sample queries on these services or turn to one of the “real” journalists for their take. If you have spare cash and found yourself in the lower quartile of your math class, you may find that an azure chip consultant is just what you need to make it in the crazy world of online information.

In my lectures I made four points about these types of real-time search services.

First, each of these services did at the time of my talks deliver more useful and comprehensive results than the “real time search” services from the Big Gals in the Web search game; namely, Google, Microsoft Bing, and Yahoo. Yahoo, I pointed out, doesn’t do real time search itself. Yahoo has a deal with the OneRiot.com outfit. The service is useful and I suppose I could stick it in the list above, but I am just cutting and pasting from the PowerPoint decks I used as crutches and dogs in my lecture.

Read more

Podcast Interview with Paul Doscher, Part 2: The Exalead Technology

June 21, 2010

Exalead’s Paul Doscher talks about Exalead’s technology on the June 21, 2010, ArnoldIT Beyond Search podcast. Exalead has been growing rapidly, landing blue-chip accounts with the largest technology company in North America, the French postal service, and Canada’s Urbanizer.com. In this podcast, Mr. Doscher talks about Exalead’s technical approach to content processing and the framework that makes search-based applications crack tough problems in information access. You can listen to the podcast on the ArnoldIT.com Web site. More information about Exalead is available from www.exalead.com. The ArnoldIT podcast series extends the Search Wizards Speak series of interviews beyond text into rich media. Watch this blog for announcements about other rich media programs from the professionals who move information retrieval beyond search.

Stephen E Arnold, June 21, 2010

Sponsored by Stephen E. Arnold

A Smidgen of Dataspace Seeps from the Google

June 18, 2010

My favorite dataspace guru and his colleagues have nailed a US patent. I am speaking of Dr. Alon Halevy and his clutch of merry Googlers, Jayant Madhavan and David Ko. I know the azure chip set is excited about Google’s ability to pinpoint gaps in media coverage and methods for sucking the filling out of Wi Fi content. I am not. Nope, I keep my eye on the ball. In this case, the ball is the system and method disclosed in US7,739,258, “Facilitating Searches through Content Which Is Accessible through Web-Based Forms.” These are not your mother’s social security forms. With some prior patents in the possession of sporty outfits like AT&T and Lucent, the Googlers have to tiptoe through the dataspace. I was going to write a lyric for Tiny Tim to sing: something along the line of “Tiptoe through the manifold with me.” But I won’t.

Here’s the crystal clear prose from the Google wizard and legal eagle team:

One embodiment of the present invention provides a system that facilitates crawling through web-based forms to gather information to facilitate subsequent searches through content which is accessible though the web-based forms. During operation, the system first obtains web-based forms to be searched. Note that the system can obtain these web-based forms from a number of sources. For example, the system can crawl through web sites to identify web-based forms, the system can receive manually provided web-based forms, or the system can find web-based forms through methods other than crawling. Next, the system creates database entries for the identified forms. This involves obtaining and storing metadata describing the identified forms into database entries and then storing these database entries in a form database to facilitate searches through content which is accessible through the identified forms. Note that this form database can include a web index and associated documents, which can be used to facilitate web search queries that return both ordinary documents and documents that result from form queries.
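To make the patent prose concrete, here is a minimal sketch in Python of the two steps described: find the forms on a page, then store metadata about each one in a form database. This is my illustration, not Google’s implementation. The class and function names, the sample page, and the choice of a plain dictionary as the “form database” are all assumptions.

```python
from html.parser import HTMLParser


class FormFinder(HTMLParser):
    """Collect metadata (action, method, field names) for each <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "method": (attrs.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag in ("input", "select", "textarea") and self._current is not None:
            # Only named controls are submitted, so only those are recorded.
            if attrs.get("name"):
                self._current["fields"].append(attrs["name"])

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def index_forms(page_url, html, form_db):
    """Store one entry per form, keyed by (page URL, form action)."""
    finder = FormFinder()
    finder.feed(html)
    for form in finder.forms:
        form_db[(page_url, form["action"])] = form
    return form_db


# Hypothetical page containing a single search form.
SAMPLE = """<html><body>
<form action="/search" method="POST">
  <input type="text" name="make"><select name="year"></select>
  <input type="submit" value="Go">
</form></body></html>"""

form_db = index_forms("http://example.com/cars", SAMPLE, {})
for key, metadata in form_db.items():
    print(key, metadata)
```

A real system would then submit queries through those fields and index what comes back; the sketch stops at the metadata step the patent text calls creating database entries.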

My view? Important.

Stephen E Arnold, June 18, 2010

Freebie

Attensity SAS Staff Shuffle

June 16, 2010

I learned recently that SAS lost Manya Mayes to Attensity. No big deal, but Ms. Mayes had been at SAS for 15 years. Attensity seems to be serious about its text analytics business. You can get more information in the write up “Attensity Group Appoints Manya Mayes as Director of Advanced Analytics.” Here’s what an Attensity officer said about the new hire:

“Her SAS expertise and customer and product experience will be a great asset to the Attensity team. Her addition will build on Attensity’s current analytic capabilities, bringing advanced analytics expertise to the team.”

A couple of thoughts. I wonder why Ms. Mayes is not an officer of Attensity. Second, will SAS push back and make some noise about competition? When Google hired an Endeca expert in eCommerce, I received email suggesting that any connection between the Endeca hire and Google’s aspirations in markets where Endeca has a presence was silly.

That’s a silly goose for you, I suppose.

Stephen E Arnold, June 16, 2010

Freebie

Exalead Acquired by Dassault

June 11, 2010

I have done some work for Exalead over the last five years, and I have gone down in history as one of the few people from Kentucky to talk my way into the Exalead offices in Paris without an appointment. L’horreur. I had a bucket of KY Fry in my hand and was guzzling a Coca Lite.

Out of that exciting moment in American courtesy, I met François Bourdoncle, a former AltaVista.com wizard. He watched in horror as I gobbled a crispy leg and asked him about the origins of Exalead, his work with then-Googler Louis Monier, and his vision for 64 bit computing. I wrote up some of the information in the first edition of the Enterprise Search Report, a publication now shaped into a quasi-New Age Cliff’s Notes for the under 30 crowd. I followed up with M. Bourdoncle in February 2008, and published that interview as part of the ArnoldIT.com Search Wizards Speak series. The last time I was in Paris, I dropped by the Exalead offices and had a nice chat. I even made a video. Several Exaleaders took me to dinner, pointing out that McDo was not an option. Rats.

So what’s with the sale of Exalead to Dassault Systèmes?

The azure chip crowd has weighed in, and I will ignore those observations. There is some spectacular baloney being converted into expensive consulting burgers, and I will leave you and them to your intellectual picnic.

Here’s my take:

Differentiator

There are lots of outfits asserting that their search and content processing system will work wonders. I don’t want to list these companies, but you can find them by navigating either to Google.com or Exalead.com/search and running a query for enterprise search. The problem is that most of these outfits come with what I call an “interesting history.” Examples range from natural language processing companies that have been created from the ashes of not-so-successful search vendors to Frankenstein companies created with “no cash mergers.” I know. Wild, right? Other companies have ongoing investigations snapping like cocker spaniels at their heels. A few are giant roll ups, in effect, 21st century Ling Temco Vought clones. A few are delivering solid value for specific applications. I can cite examples in XML search, eDiscovery, and enhancements for the Google constructs. (Okay, I will mention my son’s company, Adhere Solutions, a leader in this Google space.)

The point for me is that Exalead combined a number of working functions into a platform. The platform delivers search enabled applications; that is, the licensee has an information problem and doesn’t know how to cope with costs, data flows, and the need for continuous index updating. The Exalead technology makes it easy to suck in information and give different users access to the information they need to do their job. For some Exalead customers, the solution allows people to track packages and shipments. For other licensees, the Exalead technology sucks in information and generates reports in the form of restaurant reviews or competitive profiles. The terminology is less important than solving the problem.

That’s a key differentiator.

Technology

Google and Exalead were two outfits able to learn from the mistakes at AltaVista.com. Early on I learned that the founder of Exalead could have become a Googler. The reason Exalead exists is that M. Bourdoncle wanted to build a French company in France without the wackiness that goes along with tackling this mission in the US of A. Americans don’t fully understand the French, and I can’t do much more than remind you, gentle reader, that French waiters behave a certain way because of the “approach” many Americans make to the task of getting a jambon sandwich and a bottle of water.

I understood that M. Bourdoncle wanted to do the job his way, and he focused on coding for a 64 bit world when there were few 64 bit processors in the paws of enterprise information technology departments. He tackled a number of tough technical problems in order to make possible high performance, low cost scaling, and mostly painless tailoring of the system to information problems, not just search. Sure, search is part of the DNA, but Exalead has connectors, text to voice, image recognition, etc. And, happily, Exalead’s approach plays well with other enterprise systems. Exalead can add value with fewer engineering hassles than some of the firm’s competitors can. Implementation can be done in days or weeks, and sometimes months, not the years some vendors require.

So the plumbing is good.

That’s a high value asset.

Read more

Exclusive Interview with Seth Grimes, Alta Plana Now Available

June 9, 2010

In April 2010, I spoke with Seth Grimes, the founder of Alta Plana. Mr. Grimes is an analytics strategy consultant. He is founding chair of the Sentiment Analysis Symposium and of the Text Analytics Summit, contributing editor at Intelligent Enterprise magazine, and text analytics channel expert at the Business Intelligence Network. He founded Washington DC-based Alta Plana Corporation in 1997. Mr. Grimes consults, writes, and speaks on information-systems strategy, data management and analysis systems, industry trends, and emerging analytical technologies.

In the interview, he highlighted one of the challenges search and content processing systems face. He said:

I’ve in the past characterized search as evidence of a failure of design. If information were correctly and adequately categorized and organized and made accessible, we wouldn’t need search, would we?  I’ve retreated from that view as I’ve seen search evolve into information access, into technology that not only finds but also organizes results from sources the user likely-as-not didn’t know about.  Yet I’d call my statement still largely true when it comes to the enterprise’s own data holdings: Search is necessitated by a failure of design.  Do a better job organizing information as it’s created or acquired, and also, by the way, stop allowing application vendors to bring in siloed search applications, and the in-organization situation will improve.

To read the full interview, navigate to Search Wizards Speak and click on the Seth Grimes interview or click this link.

Stephen E Arnold, June 9, 2010

Unsponsored post.

Business Intelligence Firm Sells for $1.13 Billion

June 8, 2010

Kroll is not a household name. If your house includes an intelligence or police professional, you may have a Kroll T shirt somewhere. The company was part of Marsh McLennan, an outfit that looks like an insurance company. I am not going to sort out what Marsh’s business interests are or explain the Kroll set up. You can get some information in this interview with a Kroll executive. No, don’t ask how I know him.

Kroll is now part of another outfit you probably never heard about either: Altegrity. Read “Kroll-Altegrity: A Reunion of Sorts.” Why’s this important? I think other outfits in this market sector hope to be acquired. In my addled goose view, I don’t think the Marsh executives were sad to see Kroll say adios. There was the money, and the management effort required to stay in the Kroll line of work is demanding.

Who else is in the Kroll business? Sorry. Not for free.

Stephen E Arnold, June 8, 2010

Freebie.

Silobreaker to Roll Out Report Feature

June 4, 2010

Short honk: We learned from a reader in Europe that Silobreaker plans to roll out a report feature in the fall of 2010. Silobreaker delivers high value information in an easy-to-digest format. If you are not familiar with the company, navigate to http://www.silobreaker.com/. You can use the free service and upgrade to the industrial-strength version once you get a feel for the depth of the service. Silobreaker supports single- or multi-dimension queries. When you become a paying customer, you can configure a custom report on a topic of interest to you. You can specify daily, weekly, or monthly updates.

Stephen E Arnold, June 4, 2010

Freebie although I have been assured a fish treat the next time I track down a Silobreaker executive. Promises, promises. I would settle for an answer to my email queries.

Exalead CloudView Lets Fingers Do the Walking and Caring

June 4, 2010

Yellow Pages Group’s phone application, Urbanizer, selected Exalead CloudView to collect customer sentiment information. This innovative product is the first restaurant recommendation application that aligns with the emotional element of consumer decision making.

Sys-Con Media reports in “Urbanizer iPhone Application Uses Exalead CloudView to Collect Customer Sentiment Data” that this new phone application allows users to choose from a selection of pre-defined moods or use Urbanizer’s equalizer function to create a custom mood based on combinations of cuisine, ambiance, and service categories. Exalead’s CloudView search-based application platform is embedded into the Urbanizer application architecture and uses semantic extraction capabilities to distill sentiment from unstructured web data and from consumer comments posted to Urbanizer.

The advanced semantic technology that Exalead brings to the table seems to be reshaping the digital content landscape. CloudView collects data from virtually any source, in any format, and transforms it into structured business information that can be directly searched and queried.

Melody K. Smith, June 4, 2010

A freebie but maybe a Coca lite when I am next in Paris?

SAS Text Analytics and Teragram

May 28, 2010

I received a call about Teragram, the text processing company that SAS acquired a couple of years ago. I did a quick Overflight check and realized that I had not documented the absorption of Teragram into SAS. Teragram’s technology is alive and well, but the SAS positioning is for content processing to be a component of SAS Text Analytics. The product has its own subsite within SAS.com. You can locate the details at http://www.sas.com/text-analytics/.

Another important point is that SAS Text Analytics includes four components. The first is the SAS Enterprise Content Categorization function. The system parses content and identifies entities. Metadata are created along with category rules.

The second function is SAS Sentiment Analysis. A number of companies are competing in this sector. The SAS approach sucks in emails, tweets, and other documents. The system identifies various subjective shades in the source content.

SAS Text Miner now includes both text and data mining operations. The system is not one of those Web 2.0, “it is really easy” solutions. The system is easy to use, but to put “easy” in context, you will need programming and statistical savvy along with solid data set building skills.

The SAS Ontology Management solution provides a centralized method for keeping index terms and metatags consistent. Sounds easy, but this type of consistency is the difference between useful and useless information. SharePoint lacks this type of functionality. You have been given a gentle reminder about consistent tagging, dear SharePoint user.
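For the SharePoint user who wants to see what consistent tagging means in practice, here is a minimal sketch in Python. It is not the SAS product, and the vocabulary entries and function name are hypothetical. The idea is simply that every variant tag maps to one preferred index term, and anything outside the vocabulary gets flagged for a human instead of slipping into the index.

```python
# Hypothetical controlled vocabulary: variant tag -> preferred index term.
PREFERRED = {
    "e-discovery": "eDiscovery",
    "ediscovery": "eDiscovery",
    "text mining": "text analytics",
    "text analytics": "text analytics",
}


def normalize_tags(raw_tags, vocabulary=PREFERRED):
    """Map free-form tags to preferred terms; return (clean, unknown)."""
    clean, unknown = [], []
    for tag in raw_tags:
        # Lowercase and collapse whitespace so trivial variants match.
        key = " ".join(tag.lower().split())
        term = vocabulary.get(key)
        if term is None:
            unknown.append(tag)      # needs review by whoever owns the vocabulary
        elif term not in clean:
            clean.append(term)       # keep one copy of each preferred term
    return clean, unknown


clean, unknown = normalize_tags(["E-Discovery", "eDiscovery ", "Text  Mining", "sharepoint"])
print(clean)
print(unknown)
```

Three sloppy variants collapse to two preferred terms, and the stray tag is set aside rather than indexed as is.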

SAS has a blog focused on text analytics. You can read “The Text Frontier,” but the last time I checked, the blog’s most recent update was posted in March 2010.

Bottom line: Teragram is alive and well, just part of SAS Text Analytics.

Stephen E Arnold, May 28, 2010

Freebie
