Clarabridge: Cash Infusion

August 20, 2008

Following in the footsteps of Endeca and Vivisimo, Clarabridge nailed $12 million in financing. The additional infusion marks the company’s third round of funding. The money came from Grotech Ventures, Harbert Venture Partners, Boulder Ventures, and Intersouth Partners. You can read the company’s official news release here. VCAonline provides additional color here. The money will allow Clarabridge to increase its presence in a crowded market. The expectations for fast growth are often high, and pumping Clarabridge to $100 million in revenue in 24 to 36 months seems to be a relatively challenging job. At my age, it makes my heart palpitate just thinking of the work ahead.

The question that nags at me is, “Will these cash infusions in text and content processing pay off?” Despite the lousy market, smart money seems to say, “Absolutely.” The challenge will be to break through the glass ceiling that keeps most text crunching companies well below Autonomy’s lofty $300 million in revenue, achieved after a decade of hard work, and Endeca’s $110 million (another 10 years of effort as well). Agree? Disagree? Use the comments section to offer your views.

Stephen Arnold, August 20, 2008

Attensity Lassos Brands with BzzAgent Tie Up

August 20, 2008

Attensity, a text analytics and content processing company, applies its “deep extraction” methods to law enforcement and customer support tasks. The company has formed a partnership with BzzAgent. You can find out more about this firm here. This Boston-based firm specializes in the delightfully named art of WOM, shorthand for “word of mouth” marketing. The company’s secret sauce is more than 400,000 WOM volunteers. Attensity’s technology can process BzzAgent’s inputs and deliver useful brand cues. Helen Leggatt covered the deal in “Marketers to Get ‘Unrivaled Insights’ into WOM.” You can read this interesting article here. For me, the most interesting point in Ms. Leggatt’s article was:

Each month, BzzAgent’s volunteers submit around 100,000 reports. Attensity’s text analytics technology will analyze the data contained within these reports to identify “facts, sentiment, opinions, requests, trends, and trouble spots”.

Like other content processing companies, Attensity is looking for ways to expand into new markets with its extraction and analytic technology. Is this a sign of vitality, or is it a hint that content processing companies are beginning to experience a slowdown in other market sectors? Anyone have thoughts on this type of market friction?

Stephen Arnold, August 20, 2008

Facets ‘Lite’: Discovery Navigation for Thunderbird

August 18, 2008

In March 2008, David Huynh, a research scientist at MIT, posted a brief description of Seek 1.0. This software plug-in allows a user to locate information in Thunderbird email. In eCommerce and enterprise search, Endeca has been successful positioning itself as one of the leaders in point-and-click interfaces. The idea is that during content processing, the system identifies concepts, entities, and relationships. A user has the option of plugging a word into a search box or browsing categories or other objects displayed. The user can scan a list of hot links, click on one, and begin examining information. Key word search is useful, but if the user does not know the terms to use, the browse feature becomes a useful way to locate information.
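To make the browse idea concrete, here is a minimal Python sketch of faceted navigation over a handful of email records. It is not Seek's code or Endeca's method. The message fields, the sample data, and the function names facet_counts and refine are all assumptions made up for illustration. The sketch shows the two moves a facet interface depends on: counting the values available for a category, and narrowing the result set when the user clicks one.

```python
from collections import Counter

# Hypothetical sample messages; the field names are illustrative only.
MESSAGES = [
    {"sender": "alice@example.com", "year": 2008, "tags": ["budget", "search"]},
    {"sender": "bob@example.com", "year": 2008, "tags": ["search"]},
    {"sender": "alice@example.com", "year": 2007, "tags": ["travel"]},
]


def _values(msg, field):
    """Return the facet values of a message as a list, even for single values."""
    value = msg[field]
    return value if isinstance(value, list) else [value]


def facet_counts(messages, field):
    """Count how many messages carry each value of a facet field."""
    counts = Counter()
    for msg in messages:
        counts.update(_values(msg, field))
    return counts


def refine(messages, field, wanted):
    """Keep only the messages whose facet field contains the clicked value."""
    return [msg for msg in messages if wanted in _values(msg, field)]
```

A user who clicks the “search” tag and then the sender “alice@example.com” would trigger two refine calls in a row. The counts shown beside each link come from facet_counts run on whatever subset remains.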

The Seek 1.0 component, according to Dr. Huynh’s Web log here, is “an extension for Mozilla Thunderbird that provides faceted browsing features to let you search through your email more efficiently.” Commercial systems can be expensive. Dr. Huynh’s is available here. Endeca is most likely aware of Dr. Huynh’s activities, and Dr. Huynh lists one of Endeca’s research scientists in his “blogroll”.

Here’s a snippet of the interface:

image

After installing the component, navigate to the Thunderbird Tools menu and click on Seek. You are good to go.

Dr. Huynh says:

It is thus important that everyone be able to deal with data themselves: gather data, sift through data, integrate data, interpret data, make informed conclusions, and present their findings to their peers and to the world.

For me the importance of Seek is that the system is sufficiently lightweight to run on most notebook computers. Furthermore, the interface integrates well with Thunderbird, so users don’t have to understand metadata to make use of the system. Finally, for now, the system is making discovery interfaces available to a broader range of email users.

Is there a downside? The system does take some time to process content. I didn’t notice significant latency, but I have a fire breather and you may have an asthmatic gizmo. We have not subjected the component to crash recovery testing; that is, we do not yet know whether it is possible to restore indexes in the event of a problem. We will get to that in the days ahead. Finally, there are a number of commercial systems gearing up to enhance, improve, and search email. At this point it’s not clear whether these services will confuse users, which could create traction problems for interesting projects like Seek.

A happy quack to Dr. Huynh and the rest of the technical Jedi knights at the MIT Haystack Group. If you want to know more about Dr. Huynh, his CV is here.

Lexalytics and Infonic Go Beyond Sentiment and Get Hitched

August 14, 2008

I learned about Lexalytics when I was researching Fast Search & Transfer. When I was writing the Enterprise Search Report, Fast Search introduced a function that would report on the sentiment in documents or email. The idea is particularly important in customer support. A flow of email that turns sour can be identified by sentiment analysis software. Fast Search’s approach was interesting to me because it was able to use Fast Search’s alert feature.

Founded in 2000, Infonic here is a publicly traded company (previously named Corpora plc). Infonic is listed on the UK’s AIM exchange as LON:IFNC. The company offers geo-replication and document management solutions. The firm also develops text analytics and sentiment technology. The firm’s Geo-Replicator software uses data compression and synchronization technology to replicate data between servers and laptops and from server to server. The firm’s Document Manager software permits scanning, search, and retrieval of processed content. The company’s text analytics software product is called Sentiment.

At the end of July 2008, the two companies announced that the sentiment units would be merged. The new unit will be based in the UK and named Lexalytics Limited. I profiled the company in my new study for the Gilbane Group here. Lexalytics software performs entity extraction, sentiment analysis, document summarization and thematic extraction. Information about Lexalytics is here.

According to the two companies,

The rationale behind combining the businesses is to pool the expertise and complementary products of the parties in this specialist area and to drive joint growth in sales, utilizing Infonic’s global sales capabilities.

The new company has a value estimated at $40 million. Jeff Catlin, founder of Lexalytics, will be the managing director of the new company.

Sentiment analysis is moving to the mainstream. The addled goose wishes the new sentimental outfits good luck. Oh, one final point: watch for more consolidation in the text analytics space. The market is a frosty place for some search and content processing vendors at this time.

Stephen Arnold, August 14, 2008

Autonomy Lands Fresh Content Tuna

August 13, 2008

Northcliffe Media, a unit of the Daily Mail and General Trust, publishes a score of newspapers mostly in the UK. Circulation is nosing towards a million. Northcliffe also operates a little more than two dozen subscription and ad supported weekly regional tabloids and produces 60 ad supported free weekly shopper type publications. The company also cranks out directories, runs a printing business, and is in the newsstand business. The company whose tag line is “At the heart of all things local” has international interests as well. Despite the categorical affirmative “all”, I don’t think Northcliffe operates in Harrod’s Creek, Kentucky. Perhaps it should?

Autonomy, the Cambridge-based search vendor, and Okana (an Autonomy partner) have landed the Northcliffe Media search business; thus, a big content tuna. Okana describes big clients like Northcliffe as “platinum” not “tuna” but I wanted to keep my metaphor consistent and Okana rhymes with tuna.

Okana was Autonomy’s “Innovative Partner of the Year” in 2007. Okana says, “Based around proven architectures, Okana’s range of Autonomy appliance products provide instantly deployable and scaleable [sic] solutions for Information Discovery, Monitoring, and Investigation.Compliance.”

This description of Okana’s offering an “appliance” was new information for me. Also, my research suggests that many Autonomy IDOL installations benefit from training the IDOL system. If training is required, then I ask, “What if any trade off is required for an instant Autonomy IDOL deployment?” If anyone has details for me, please, use the comments section of this Web log.

Autonomy’s IDOL (Intelligent Data Operating Layer) will index an initial 40 million documents and then process new content. The new system will use Autonomy’s “meaning-based computing” for research, information retrieval and trend spotting.

You can read the complete Autonomy news release here. Once again Autonomy is making sales as word reaches me of competitors running into revenue problems. Autonomy’s a tuna catcher while others seem fated to return to port with empty holds.

Stephen Arnold, August 13, 2008

The Future of Search? It’s Here and Disappointing

August 13, 2008

AltSearchEngines.com–an excellent Web log and information retrieval news source–tapped the addled goose (me, Stephen E. Arnold) for some thoughts about the future of search. I’m no wizard, being a befuddled fowl, but I did offer several hundred words on the subject. I even contributed one of my semi-famous “layers” diagrams. These are important because each layer represents a slathering of computational crunching. The result is an incremental boost to the underlying search system’s precision, recall, interface outputs, and overall utility. The downside is that as the layers pile up so does complexity and its girl friend costs. You can read the full essay and look at the diagram here. A happy quack to the AltSearchEngines.com team for [a] asking me to contribute an essay and [b] having the moxie to publish the goose feathers I generate. The message in my essay is no laughing matter. The future of search is here and in many ways, it is deeply disappointing and increasingly troubling to me. For an example, click here.

Stephen Arnold, August 13, 2008

MarkLogic: The Army’s New Information Access Platform

August 13, 2008

You probably know that the US Army has nicknames for its elite units: Screaming Eagles, Big Red One, and my favorite, “Hell on Wheels.” Now some HUMINT, COMINT, and SIGINT brass may create a MarkLogic unit with its own flash. Based on the early reports I have, the MarkLogic system works.

Based in San Carlos (next to Google’s Postini unit, by the way), MarkLogic announced that the US Army Combined Arms Center or CAC in Ft. Leavenworth, Kansas, has embraced MarkLogic Server. BCKS, shorthand for the Army’s Battle Command Knowledge System, will use this next-generation content processing and intelligence system for the Warrior Knowledge Base. Believe me, when someone wants to do you and your team harm, access to the most timely, on point information is important. If Napoleon were based at Ft. Leavenworth today, he would have this unit report directly to him. Information, the famous general is reported to have said, is nine tenths of any battle.

Ft. Leavenworth plays a pivotal role in the US Army’s commitment to capture, analyze, share, and make available information from a range of sources. MarkLogic’s technology, which has the Department of Defense Good Housekeeping Seal of Approval, delivers search, content management, and collaborative functions.

img 813a

An unclassified sample display from the US Army’s BCKS system. Thanks to MarkLogic and the US Army for permission to use this image.

The system applies metadata based on the DOD Discovery Metadata Specification (DDMS). The content is managed automatically by applying metadata properties such as the ‘Valid Until’ date. The system uses the schema standard used by the DOD community. The MarkLogic Server manages the work flow until the file is transferred to archives or deleted by the content manager. MarkLogic points to savings in time and money. My sources tell me that the system can reduce the risk to service personnel. So, I’m going to editorialize and say, “The system saves lives.” More details about the BCKS are available here. Dot Mil content does move, so click today. I verified this link at 0719, August 13, 2008.
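For readers who want to picture how a ‘Valid Until’ date can drive content handling, here is a minimal Python sketch. It is not MarkLogic Server code and it does not use DDMS element names. The record layout, the valid_until key, and the disposition function are assumptions invented for this illustration. The sketch shows only the basic rule: a record stays active until its date passes, then it is flagged so a content manager can archive or delete it.

```python
from datetime import date


def disposition(record, today):
    """Classify a record by its hypothetical 'valid_until' metadata property.

    Returns 'active' while the record is still valid (or has no expiry date),
    and 'review' once the date has passed, so that a content manager can
    decide whether to archive or delete it.
    """
    valid_until = record.get("valid_until")
    if valid_until is None or today <= valid_until:
        return "active"
    return "review"


def due_for_review(records, today):
    """Return the titles of all records whose validity date has passed."""
    return [r["title"] for r in records if disposition(r, today) == "review"]
```

The decision to archive or delete stays with a person in this sketch, which matches the description above of the content manager making the final call.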


hakia’s Founder Riza Berkan on Search

August 12, 2008

Dr. Riza Berkan, founder of hakia, a company engaged in search and content processing, reveals the depth of engineering behind the firm’s semantic technology. Dr. Berkan said here:

If you want broad semantic search, you have to develop the platform to support it, as we have. You cannot simply use an index and convert it to semantic search.

With its unique engineering foundation, the hakia system goes through a learning process similar to that of the human brain. Dr. Berkan added:

We take the page and content, and create queries and answers that can be asked to that page, which are then ready before the query comes.

He emphasized that “there is a level of suffering and discontent with the current solutions”. He continued:

I think the next phase of the search will have credibility rankings. For example, for medical searches, first you will see government results – FDA, National Institutes of Health, National Science Foundation – then commercial – WebMD – then some doctor in Southern California – and then user contributed content. You give users such results with every search; for example, searching for Madonna, you first get her site, then her official fan site, and eventually fan Web logs.
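The ordering Dr. Berkan describes can be sketched in a few lines of Python. This is not hakia's method, which the interview does not spell out. The tier names, the TIER_ORDER table, and the score field are assumptions made up for the example. The sketch sorts results first by the credibility tier of the source and then by relevance score within a tier.

```python
# Hypothetical credibility tiers, most trusted first. A lower number sorts earlier.
TIER_ORDER = {"government": 0, "commercial": 1, "individual": 2, "user": 3}


def rank_by_credibility(results):
    """Order results by source tier, then by descending relevance within a tier.

    Results with an unknown tier are placed after all known tiers.
    """
    unknown = len(TIER_ORDER)
    return sorted(
        results,
        key=lambda r: (TIER_ORDER.get(r["tier"], unknown), -r["score"]),
    )
```

One consequence of this design is that a highly relevant user contributed page still appears below a weakly relevant government page. A production system would likely blend the two signals and not sort strictly by tier.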

You can read the full text of the interview with Dr. Riza Berkan on the ArnoldIT.com Web site in the Search Wizards Speak series. The interview was conducted by Avi Deitcher for ArnoldIT.com.

Stephen Arnold, August 12, 2008

Sprylogics’ CTO Zivkovic Talks about Cluuz.com

August 7, 2008

The popular media fawn over search and content processing companies with modest demos that work on limited result sets. Cluuz.com–a company I profiled several weeks ago here–offers more hearty fare. The company uses Yahoo’s index to showcase its technology. You can take Cluuz.com for a test drive here. I was quite interested in the company’s approach because it uses Fancy Dan technology in a way that was immediately useful for me. Cluuz.com is a demonstration of Toronto-based Sprylogics International Inc. The company is traded on the Toronto exchange under the symbol TSXV:SPY.

Sprylogics has roots in the intelligence community, so unlocking the company took some work. Once I established contact with Alex Zivkovic, I was impressed with his responsiveness and his candor.

You can read about the origins of the Cluuz.com service as well as some of the company’s other interesting content processing technology. The company offers a search system, but the real substance of the company is how it processes content, even the Yahoo search index, into a significantly more useful form.

The Cluuz.com system puts on display the firm’s proprietary semantic graph technology. You can see relationships for specific subsets of relevant content. I often use the system to locate information about a topic and then explore the identified experts and their relationships. This feature saves me hours of work trying to find a connection between two people. Cluuz.com makes this a trivial task.
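To show what finding a connection between two people involves, here is a minimal Python sketch. It is not Sprylogics’ proprietary semantic graph technology, whose details the company keeps private. The sketch makes two assumptions: that entities have already been extracted from each document, and that two entities are related when they appear in the same document. The function names build_graph and connection are invented for the example. The search itself is an ordinary breadth-first search, which returns a shortest chain of intermediaries.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(documents):
    """Link every pair of entities that appear together in one document.

    `documents` is a list of entity lists, one list per document.
    """
    graph = defaultdict(set)
    for entities in documents:
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connection(graph, start, goal):
    """Return a shortest chain of entities from start to goal, or None."""
    if start == goal:
        return [start]
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        # Sorting the neighbors makes the result deterministic.
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor in seen:
                continue
            if neighbor == goal:
                return path + [neighbor]
            seen.add(neighbor)
            queue.append(path + [neighbor])
    return None
```

With documents mentioning Ann and Bob together, and Bob and Cy together, connection returns the chain Ann, Bob, Cy. Doing this by hand means reading every document that mentions either person.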

Mr. Zivkovic told me:

So, we have clustering. We have entity extraction. We have a relationship analysis in a graph format. I want to point out that for enterprise applications, the Cluuz.com functions are significantly more rich. For example, a query can be run across internal content and external content. The user sees that the internal information is useful but not exactly on point. Our graph technology makes it easy for the user to spot useful information from an external source such as the Web in conjunction with the internal information. With a single click, the user can be looking into those information objects.

I probed into the “guts” of the system. Mr. Zivkovic revealed:

Our engineers have worked hard to perform multiple text processing operations in an optimized way. Our technology can, in most cases, process content and update the indexes in a minute or less. We keep the details of our method close to the vest. I can say that we use some of the techniques that you and others have identified as characteristic of high-speed systems like those at Google, for example.

You can read the full interview with Mr. Zivkovic in the Search Wizards Speak interview collection on the ArnoldIT.com Web site. The full text of this exclusive interview is here. A complete index of the interviews in this series is here.

Attivio: Active Intelligence Engine Version 1.2 Released

August 7, 2008

Attivio is on my radar. The company demonstrated its next-generation business intelligence system to me at the AIIM Show in April 2008. I liked what I saw. I interviewed the founder of Attivio, and you can read that transcript on the ArnoldIT.com Search Wizards Speak site here.

Now the company has released Version 1.2 of its Active Intelligence Engine, which means the gang in Wellesley, Massachusetts, is on the move. You can read the write up here.

Attivio–like some other newcomers–is not just a search or information access engine. Attivio has rethought the problem of getting information to employees who are under pressure or just in a hurry to get their kids from day care. I will dig into some of the new features later this month.

For now, let me highlight some of the new features of AIE 1.2. (I must admit when I pronounce AIE, I imagine the sound of a scream from competitors. Then after the scream dies down, I hear, “Why didn’t we implement those functions?”).

Four points about Version 1.2 caught my attention:

  1. AIE is positioned as a platform. The idea is that you can deploy quickly and build on top of the AIE system.
  2. Rich index that combines ease of use with the type of precision associated with structured query language statements. To me, this means I can get what I need without trying to get a programmer’s attention or spending time flipping through a SQL manual.
  3. Fast index updates and real time alerting.
  4. New connectors and support for 50 languages.

Attivio wants to deliver business intelligence without the hassles of most older business intelligence systems. The bottom line for me is that Attivio is focusing on basics: speed, ease of use, quick deployment, and platform extensibility. You can learn more about Attivio here.

Stephen Arnold, August 7, 2008
