Powerset as Antigen: Can Google Resist Microsoft’s New Threat?
August 20, 2008
I found the write ups about Satya Nadella’s observations about Microsoft’s use of the Powerset technology in WebProNews, Webware.com, and Business Week magnetizing. Each of these write ups converged on a single key idea; namely, Microsoft will use the Powerset / Xerox PARC technology to exploit Google’s inability to tailor search to deliver a better experience to a user. The media attention directed at a conference focused on generating traffic to a Web site without regard to the content on that site, its provenance, or its accuracy is downright remarkable. Add in the assertion that Powerset will hobble the Google, and I may have to extend my anti-baloney shields another 5,000 kilometers.
Let’s tackle some realities:
- To kill Google, a company has to jump over, leapfrog, or out-innovate Google. Using technology that dates from the 1990s, poses scaling challenges, and must be “hooked” into the existing Microsoft infrastructure is a way to narrow a gap, but it’s not enough to wound, impair, or kill Google. If you know something about the Xerox PARC technology that I’m missing, please, tell me. I profiled Inxight Software in one of my studies. Although different from the Xerox PARC technology used by Powerset, it was close enough to identify some strengths and weaknesses. One issue is the computational load the system imposes. Maybe I’m wrong, but scaling is a big deal when extending “context” to lots of users.
- Microsoft is slipping further behind Google. The company is paying users, and it is still losing market share. Read my short post on this subject here. Even if the data are off by an order of magnitude, Microsoft is not making headway in Web search market share.
- Cost is a big deal. Microsoft appears to have unlimited resources. I’m not so sure. If Google’s $1 of infrastructure investment buys 4X the performance that a Microsoft $1 does, Microsoft has an infrastructure challenge that could cost more than even Microsoft can afford.
So, there are computational load issues. There are cost issues. There are innovation issues. There are market issues. I must be the only person on the planet who is willing to assert that small scale search tweaks will not have the large scale effects Microsoft needs.
Forget the assertion that Business Week offers when it says that Google is moving forward. Google is not moving forward; Google is morphing into a different type of company. “Moving forward” only tells part of the story. I wonder if I should extend my shields of protection to include filtering baloney about search emanating from a conference focused on tricking algorithms into putting a lousy site at the top of a results list.
Agree? Disagree? I’m willing to learn if my opinions are scrambled.
Stephen Arnold, August 20, 2008
Clarabridge: Cash Infusion
August 20, 2008
Following in the footsteps of Endeca and Vivisimo, Clarabridge nailed $12 million in financing. The additional infusion marks the company’s third round of funding. The money came from Grotech Ventures, Harbert Venture Partners, Boulder Ventures, and Intersouth Partners. You can read the company’s official news release here. VCAonline provides additional color here. The money will allow Clarabridge to increase its presence in a crowded market. The expectations for fast growth are often high, and pumping Clarabridge to $100 million in revenue in 24 to 36 months seems to be a relatively challenging job. At my age, it makes my heart palpitate just thinking of the work ahead.
The question that nags at me is, “Will these cash infusions in text and content processing pay off?” Despite the lousy market, smart money seems to say, “Absolutely.” The challenge will be to break through the glass ceiling that keeps most text crunching companies well below Autonomy’s lofty $300 million in revenue, achieved after a decade of hard work, and Endeca’s $110 million (another 10 years of effort as well). Agree? Disagree? Use the comments section to offer your views.
Stephen Arnold, August 20, 2008
Attensity Lassos Brands with BzzAgent Tie Up
August 20, 2008
Attensity, a text analytics and content processing company, applies its “deep extraction” methods to law enforcement and customer support tasks. The company has formed a partnership with BzzAgent. You can find out more about this firm here. This Boston-based firm specializes in the delightfully named art of WOM, shorthand for “word of mouth” marketing. The company’s secret sauce is more than 400,000 WOM volunteers. Attensity’s technology can process BzzAgent’s inputs and deliver useful brand cues. Helen Leggatt covers the tie up in “Marketers to Get ‘Unrivaled Insights’ into WOM.” You can read this interesting article here. For me, the most interesting point in Ms. Leggatt’s article was:
Each month, BzzAgent’s volunteers submit around 100,000 reports. Attensity’s text analytics technology will analyze the data contained within these reports to identify “facts, sentiment, opinions, requests, trends, and trouble spots”.
Like other content processing companies, Attensity is looking for ways to expand into new markets with its extraction and analytic technology. Is this a sign of vitality, or is it a hint that content processing companies are beginning to experience a slow down in other market sectors? Anyone have thoughts on this type of market friction?
Stephen Arnold, August 20, 2008
Facets ‘Lite’: Discovery Navigation for Thunderbird
August 18, 2008
David Huynh, a research scientist at MIT, posted a brief description of Seek 1.0 in March 2008. This software plug in allows a user to locate information in Thunderbird email. In eCommerce and enterprise search, Endeca has been successful in positioning itself as one of the leaders in point-and-click interfaces. The idea is that during content processing, the system identifies concepts, entities, and relationships. A user has the option of plugging a word into a search box or browsing categories or other objects displayed. The user can scan a list of hot links, click on one, and begin examining information. Key word search is useful, but if the user does not know the terms to use, the browse feature becomes a useful way to locate information.
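For readers who want to see the browse idea in miniature, here is a rough Python sketch of faceted navigation over a handful of messages. It is an illustration only and not how Seek or Endeca works under the hood; the sample messages, the field names (`sender`, `year`, `tags`), and the functions `facet_counts` and `refine` are all hypothetical.

```python
from collections import Counter

# Hypothetical sample messages; the field names are illustrative assumptions.
MESSAGES = [
    {"sender": "alice@example.com", "year": 2008, "tags": ["search", "budget"]},
    {"sender": "bob@example.com", "year": 2008, "tags": ["search"]},
    {"sender": "alice@example.com", "year": 2007, "tags": ["travel"]},
]


def _values(message, field):
    """Return the field's values as a list, whether stored singly or as a list."""
    value = message[field]
    return value if isinstance(value, list) else [value]


def facet_counts(messages, field):
    """Count how many messages carry each value of a facet field."""
    counts = Counter()
    for message in messages:
        counts.update(_values(message, field))
    return counts


def refine(messages, field, wanted):
    """Keep only the messages whose facet field contains the clicked value."""
    return [m for m in messages if wanted in _values(m, field)]


if __name__ == "__main__":
    # The hot links a user would see for the "sender" facet.
    print(facet_counts(MESSAGES, "sender"))
    # Clicking "2008" narrows the set; the tag facet is then recounted.
    print(facet_counts(refine(MESSAGES, "year", 2008), "tags"))
```

The point of the sketch is that the user never types a key word: each click on a facet value narrows the set, and the remaining facets are recounted to show where to go next.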
The Seek 1.0 component, according to Dr. Huynh’s Web log here, is “an extension for Mozilla Thunderbird that provides faceted browsing features to let you search through your email more efficiently.” Commercial systems can be expensive. Dr. Huynh’s is available here. Endeca is most likely aware of Dr. Huynh’s activities, and Dr. Huynh lists one of Endeca’s research scientists in his “blogroll”.
Here’s a snippet of the interface:
After installing the component, navigate to the Thunderbird Tools menu and click on Seek. You are good to go.
Dr. Huynh says:
It is thus important that everyone be able to deal with data themselves: gather data, sift through data, integrate data, interpret data, make informed conclusions, and present their findings to their peers and to the world.
For me the importance of Seek is that the system is sufficiently lightweight to run on most notebook computers. Furthermore, the interface integrates well with Thunderbird, so users don’t have to understand metadata to make use of the system. Finally, for now, the system is making discovery interfaces available to a broader range of email users.
Is there a downside? The system does take some time to process content. I didn’t notice significant latency, but I have a fire breather and you may have an asthmatic gizmo. We have not subjected the component to crash recovery testing; that is, we have not checked whether it is possible to restore indexes in the event of a problem. We will get to that in the days ahead. Finally, there are a number of commercial systems gearing up to enhance, improve, and search email. At this point, it’s not clear whether these services will confuse users, which could create traction problems for interesting projects like Seek.
A happy quack to Dr. Huynh and the rest of the technical Jedi knights at the MIT Haystack Group. If you want to know more about Dr. Huynh, his CV is here.
Lexalytics and Infonic Go Beyond Sentiment and Get Hitched
August 14, 2008
I learned about Lexalytics when I was researching Fast Search & Transfer. When I was writing the Enterprise Search Report, Fast Search introduced a function that would report on the sentiment in documents or email. The idea is particularly important in customer support. A flow of email that turns sour can be identified by sentiment analysis software. Fast Search’s approach was interesting to me because it was able to use Fast Search’s alert feature.
Founded in 2000, Infonic here is a publicly traded company (previously named Corpora plc). Infonic is listed on the UK’s AiM Stock Exchange as LON:IFNC. The company offers geo-replication and document management solutions. The firm also develops text analytics and sentiment technology. The firm’s Geo-Replicator software uses data compression and synchronization technology to replicate data between servers and laptops and server to server. The firm’s Document Manager software permits scanning, search, and retrieval of processed content. The company’s text analytics software product is called Sentiment.
At the end of July 2008, the two companies announced that the sentiment units would be merged. The new unit will be based in the UK and named Lexalytics Limited. I profiled the company in my new study for the Gilbane Group here. Lexalytics software performs entity extraction, sentiment analysis, document summarization and thematic extraction. Information about Lexalytics is here.
According to the two companies,
The rationale behind combining the businesses is to pool the expertise and complementary products of the parties in this specialist area and to drive joint growth in sales, utilizing Infonic’s global sales capabilities.
The new company has a value estimated at $40 million. Jeff Catlin, founder of Lexalytics, will be the managing director of the new company.
Sentiment analysis is moving to the mainstream. The addled goose wishes the new sentimental outfits good luck. Oh, one final point: watch for more consolidation in the text analytics space. The market is a frosty place for some search and content processing vendors at this time.
Stephen Arnold, August 14, 2008
Autonomy Lands Fresh Content Tuna
August 13, 2008
Northcliffe Media, a unit of the Daily Mail and General Trust, publishes a score of newspapers mostly in the UK. Circulation is nosing towards a million. Northcliffe also operates a little more than two dozen subscription and ad supported weekly regional tabloids and produces 60 ad supported free weekly shopper type publications. The company also cranks out directories, runs a printing business, and is in the newsstand business. The company whose tag line is “At the heart of all things local” has international interests as well. Despite the categorical affirmative “all”, I don’t think Northcliffe operates in Harrod’s Creek, Kentucky. Perhaps it should?
Autonomy, the Cambridge-based search company, and Okana (an Autonomy partner) have landed the Northcliffe Media search business; thus, a big content tuna. Okana describes big clients like Northcliffe as “platinum” not “tuna”, but I wanted to keep my metaphor consistent and Okana rhymes with tuna.
Okana was Autonomy’s “Innovative Partner of the Year” in 2007. Okana says, “Based around proven architectures, Okana’s range of Autonomy appliance products provide instantly deployable and scaleable [sic] solutions for Information Discovery, Monitoring, and Investigation.Compliance.”
This description of Okana’s offering an “appliance” was new information for me. Also, my research suggests that many Autonomy IDOL installations benefit from training the IDOL system. If training is required, then I ask, “What if any trade off is required for an instant Autonomy IDOL deployment?” If anyone has details for me, please, use the comments section of this Web log.
Autonomy’s IDOL (Intelligent Data Operating Layer) will index an initial 40 million documents and then process new content. The new system will use Autonomy’s “meaning-based computing” for research, information retrieval and trend spotting.
You can read the complete Autonomy news release here. Once again Autonomy is making sales as word reaches me of competitors running into revenue problems. Autonomy’s a tuna catcher while others seem fated to return to port with empty holds.
Stephen Arnold, August 13, 2008
The Future of Search? It’s Here and Disappointing
August 13, 2008
AltSearchEngines.com–an excellent Web log and information retrieval news source–tapped the addled goose (me, Stephen E. Arnold) for some thoughts about the future of search. I’m no wizard, being a befuddled fowl, but I did offer several hundred words on the subject. I even contributed one of my semi-famous “layers” diagrams. These are important because each layer represents a slathering of computational crunching. The result is an incremental boost to the underlying search system’s precision, recall, interface outputs, and overall utility. The downside is that as the layers pile up so does complexity and its girl friend costs. You can read the full essay and look at the diagram here. A happy quack to the AltSearchEngines.com team for [a] asking me to contribute an essay and [b] having the moxie to publish the goose feathers I generate. The message in my essay is no laughing matter. The future of search is here and in many ways, it is deeply disappointing and increasingly troubling to me. For an example, click here.
Stephen Arnold, August 13, 2008
MarkLogic: The Army’s New Information Access Platform
August 13, 2008
You probably know that the US Army has nicknames for its elite units: Screaming Eagles, Big Red One, and my favorite, “Hell on Wheels.” Now some HUMINT, COMINT, and SIGINT brass may create a MarkLogic unit with its own flash. Based on the early reports I have, the MarkLogic system works.
Based in San Carlos (next to Google’s Postini unit, by the way), MarkLogic announced that the US Army Combined Arms Center or CAC in Ft. Leavenworth, Kansas, has embraced MarkLogic Server. BCKS, shorthand for the Army’s Battle Command Knowledge System, will use this next-generation content processing and intelligence system for the Warrior Knowledge Base. Believe me, when someone wants to do you and your team harm, access to the most timely, on point information is important. If Napoleon were based at Ft. Leavenworth today, he would have this unit report directly to him. Information, the famous general is reported to have said, is nine tenths of any battle.
Ft. Leavenworth plays a pivotal role in the US Army’s commitment to capture, analyze, share, and make available information from a range of sources. MarkLogic’s technology, which has the Department of Defense Good Housekeeping Seal of Approval, delivers search, content management, and collaborative functions.
An unclassified sample display from the US Army’s BCKS system. Thanks to MarkLogic and the US Army for permission to use this image.
The system applies metadata based on the DOD Metadata Specification (DDMS). The content is managed automatically by applying metadata properties such as the ‘Valid Until’ date. The system uses the schema standard used by the DOD community. The MarkLogic Server manages the work flow until the file is transferred to archives or deleted by the content manager. MarkLogic points to savings in time and money. My sources tell me that the system can reduce the risk to service personnel. So, I’m going to editorialize and say, “The system saves lives.” More details about the BCKS are available here. Dot Mil content does move, so click today. I verified this link at 0719, August 13, 2008.
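To make the ‘Valid Until’ idea concrete, here is a minimal Python sketch of metadata-driven housekeeping. It is an assumption-laden illustration, not MarkLogic Server code and not the DDMS schema: the record layout, the `valid_until` field name, and the functions `lifecycle_action` and `review_queue` are hypothetical.

```python
from datetime import date

# Hypothetical records; the field names are illustrative, not DDMS element names.
RECORDS = [
    {"title": "Convoy lessons learned", "valid_until": date(2008, 12, 31)},
    {"title": "Old route advisory", "valid_until": date(2008, 6, 30)},
    {"title": "Standing reference", "valid_until": None},
]


def lifecycle_action(record, today):
    """Return 'keep' while the record is still valid, otherwise 'review'.

    A record with no expiry date is assumed to stay valid indefinitely.
    """
    valid_until = record.get("valid_until")
    if valid_until is None or today <= valid_until:
        return "keep"
    return "review"


def review_queue(records, today):
    """List the titles a content manager should archive or delete."""
    return [r["title"] for r in records if lifecycle_action(r, today) == "review"]


if __name__ == "__main__":
    print(review_queue(RECORDS, date(2008, 8, 13)))
```

In this sketch the software only flags expired items; the decision to archive or delete stays with the content manager, which matches the work flow described above.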
hakia’s Founder Riza Berkan on Search
August 12, 2008
Dr. Riza Berkan, founder of hakia, a company engaged in search and content processing, reveals the depth of engineering behind the firm’s semantic technology. Dr. Berkan said here:
If you want broad semantic search, you have to develop the platform to support it, as we have. You cannot simply use an index and convert it to semantic search.
With its unique engineering foundation, the hakia system goes through a learning process similar to that of the human brain. Dr. Berkan added:
We take the page and content, and create queries and answers that can be asked to that page, which are then ready before the query comes.
He emphasized that “there is a level of suffering and discontent with the current solutions”. He continued:
I think the next phase of the search will have credibility rankings. For example, for medical searches, first you will see government results – FDA, National Institutes of Health, National Science Foundation – then commercial – WebMD – then some doctor in Southern California – and then user contributed content. You give users such results with every search; for example, searching for Madonna, you first get her site, then her official fan site, and eventually fan Web logs.
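The ordering Dr. Berkan describes can be sketched in a few lines of Python. This is my own illustration of a tiered sort and not hakia’s method; the tier labels, the `TIER_ORDER` table, the sample results, and the function `rank_by_credibility` are hypothetical.

```python
# Hypothetical credibility tiers, most trusted first; the labels are assumptions.
TIER_ORDER = {"government": 0, "commercial": 1, "professional": 2, "user": 3}

# Hypothetical results, listed in the order a relevance engine returned them.
RESULTS = [
    {"source": "Patient forum post", "tier": "user"},
    {"source": "WebMD", "tier": "commercial"},
    {"source": "FDA", "tier": "government"},
    {"source": "Doctor in Southern California", "tier": "professional"},
    {"source": "National Institutes of Health", "tier": "government"},
]


def rank_by_credibility(results):
    """Sort results by tier; unknown tiers sink to the bottom.

    Python's sort is stable, so results inside one tier keep the
    relevance order in which they arrived.
    """
    lowest = len(TIER_ORDER)
    return sorted(results, key=lambda r: TIER_ORDER.get(r["tier"], lowest))


if __name__ == "__main__":
    for result in rank_by_credibility(RESULTS):
        print(result["tier"], "-", result["source"])
```

Because the sort is stable, credibility acts as the first key and the engine’s own relevance order breaks ties within a tier.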
You can read the full text of the interview with Dr. Riza Berkan on the ArnoldIT.com Web site in the Search Wizards Speak series. The interview was conducted by Avi Deitcher for ArnoldIT.com.
Stephen Arnold, August 12, 2008
Sprylogics’ CTO Zivkovic Talks about Cluuz.com
August 7, 2008
The popular media fawn over search and content processing companies with modest demos that work on limited result sets. Cluuz.com–a company I profiled here several weeks ago–offers more hearty fare. The company uses Yahoo’s index to showcase its technology. You can take Cluuz.com for a test drive here. I was quite interested in the company’s approach because it uses Fancy Dan technology in a way that was immediately useful for me. Cluuz.com is a demonstration from Toronto-based Sprylogics International Inc. The company is traded on the Toronto exchange under the symbol TSXV:SPY.
With roots in the intelligence community, unlocking Sprylogics took some work. Once I established contact with Alex Zivkovic, I was impressed with his responsiveness and his candor.
You can read about the origins of the Cluuz.com service as well as some of the company’s other interesting content processing technology. The company offers a search system, but the real substance of the company is how it processes content, even the Yahoo search index, into a significantly more useful form.
The Cluuz.com system puts on display the firm’s proprietary semantic graph technology. You can see relationships for specific subsets of relevant content. I often use the system to locate information about a topic and then explore the identified experts and their relationships. This feature saves me hours of work trying to find a connection between two people. Cluuz.com makes this a trivial task.
Mr. Zivkovic told me:
So, we have clustering. We have entity extraction. We have a relationship analysis in a graph format. I want to point out that for enterprise applications, the Cluuz.com functions are significantly more rich. For example, a query can be run across internal content and external content. The user sees that the internal information is useful but not exactly on point. Our graph technology makes it easy for the user to spot useful information from an external source such as the Web in conjunction with the internal information. With a single click, the user can be looking into those information objects.
I probed into the “guts” of the system. Mr. Zivkovic revealed:
Our engineers have worked hard to perform multiple text processing operations in an optimized way. Our technology can, in most cases, process content and update the indexes in a minute or less. We keep the details of our method close to the vest. I can say that we use some of the techniques that you and others have identified as characteristic of high-speed systems like those at Google, for example.
You can read the full interview with Mr. Zivkovic in the Search Wizards Speak interview collection on the ArnoldIT.com Web site. The full text of this exclusive interview is here. A complete index of the interviews in this series is here.