Cognition’s Semantic Map
September 22, 2008
I profiled Cognition Technologies in my April 2008 “Beyond Search” report for the Gilbane Group here. I can’t reproduce the profile in my Web log, but you can find out about Cognition by reading the information on the company’s Web site. My take on the firm was that it was working to tame the semantic beast that is prowling around many procurement team meetings. The company has released a knowledge base that “teaches computers the meanings behind words.” You can read more about the semantic map in the RawStory.com article “Computers Figuring Out What Words Mean” here. Cognition has, according to RawStory, licensed the map to LexisNexis, one of the early entrants in online for-fee content access. If you are in the market for a semantic map, check out Cognition’s new offering.

My view of semantic technology is that Google seems to be ideally positioned to become the Semantic Web. I provided details behind this assertion in the 2007 report I did for Bear Stearns before it went down in flames earlier this year. Google has quite a few of its Googley souls laboring in the semantic vineyard. As a result, the semantic efforts of smaller companies and larger outfits like Microsoft have to make significant progress, and fast. Cognition’s Web site is here.
Stephen Arnold, September 22, 2008
Business Intelligence: Getting Smarter in a Class with Some Lousy Students
September 22, 2008
Business intelligence sounds more uptown than search. Analytics resonates with quantitative goodness. Most employees look back on their classes in mathematics with a combination of nostalgia, as in “I wish I had taken more math,” and horror, as in “I hated Miss Blackburn’s algebra class.” I did a job for a major university to answer the question, “Can we be number one in computer science?” The answer was, “No.” There were not many math majors who planned on working in the US once the sheepskins were handed out. It’s tough to rise to the top when your future endowment funding sources are working in Wuhan or Mumbai. Loyalties and money may go to the local high school where the math wizards’ genius was first recognized and cultivated.
I find it amusing that search vendors are rushing to become players in the business intelligence arena. Now established business intelligence companies are encouraging the running of the bull-oney. SPSS, SAS, Cognos, and Business Objects have learned to love text because their customers demanded that structured and unstructured data be mined for insights. Comments on warranty cards, in emails, and in voice calls to a help desk do yield useful information; ignoring them means losing it. Some companies learn what customers loathe and then don’t fix the problem. Called your mobile provider lately? How about your bank, assuming it’s still in business? See what I mean?
When I read a good analysis of how business intelligence vendors are getting smarter, I learn something about how the market perceives business intelligence. But I wonder why these analyses don’t dig into the deeper issues associated with vendors who reinvent themselves in order to make sales. I’m not sure the product innovation is of the same quality as the marketing collateral. In short, vendors talk a good game, but the delivery remains much the way it always has. Math and programming people have to be taught the system. The business intelligence system is then set up with rules spelled out. The biggest change is that the traditional method is too expensive, so companies want shortcuts to business intelligence goodness. Enter the search and content processing vendor. The idea is simple: index content and convert a user’s query to a form that generates a report. Will the report show the same concern for the niceties and nuances of hand-crafted statistical instructions operating on a well-formed data cube? Maybe. But the new approaches are a heck of a lot easier, faster, and cheaper. Licensees are asked to conclude, “You get all three with our new system.”
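To make that shortcut concrete, here is a minimal sketch, with invented data and names, of how a search-centric system can turn a query into a report: match records in an index, then aggregate the hits. This is an illustration of the idea, not any vendor’s actual implementation.

```python
# Hypothetical sketch of the "search as business intelligence" shortcut:
# index free-text records, then turn a user's query into a simple report.
# All records and field names are invented for illustration.
from collections import Counter

records = [
    {"text": "battery died after two weeks", "product": "phone"},
    {"text": "battery life is great", "product": "laptop"},
    {"text": "screen cracked, battery swelled", "product": "phone"},
]

def report(query: str) -> Counter:
    """Count matching records per product -- a crude stand-in for a
    hand-crafted OLAP query against a well-formed data cube."""
    hits = [r for r in records if query.lower() in r["text"].lower()]
    return Counter(r["product"] for r in hits)

print(report("battery"))  # Counter({'phone': 2, 'laptop': 1})
```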
Take a gander at the well-written “Business Intelligence Gets Smart,” published on September 5, 2008, by Intelligent Enterprise’s Doug Henschen here. You will have to put up with an annoying ad that flops over the text, but the content is worth the annoyance. The key point of the write up is that business intelligence “improves business performance.” Most search and content processing systems don’t generate a hard return on investment. Business intelligence, according to the InformationWeek Research Business Intelligence Survey cited by Mr. Henschen, does. That’s good news, and it encourages vendors with non-ROI systems to repackage these products as bottom-line-centric solutions. For me, the most important parts of this write up were the charts and graphs. Mr. Henschen does a good job of pulling together the numbers that help put business intelligence in context.
I would like to offer several observations and, of course, invite comment:
- Business intelligence remains a complicated area, and it does not lend itself to facile solutions.
- Most business intelligence systems require that content be transformed, then processed, and finally analyzed; a minimal sketch of that pipeline appears after this list. If the content processing goes off track, the fix can be time-consuming and expensive. BI systems, like search and content processing systems, can experience cost overruns because the assumptions about the source information were wrong or shallow.
- Business intelligence, even when implemented with some of the search-centric solutions on the market like Endeca’s Latitude, requires a math or programming wizard to configure the systems.
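Here is the pipeline sketch promised above: a toy transform-process-analyze flow, with invented field names, showing where a shallow assumption about source data quietly erodes the analysis.

```python
# Minimal sketch of the transform -> process -> analyze pipeline,
# with a guard for the shallow-assumption trap. Field names invented.
import csv, io, statistics

raw = "region,revenue\nEast,100\nWest,n/a\nSouth,250\n"

def transform(blob: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(blob)))

def process(rows: list[dict]) -> list[float]:
    values = []
    for row in rows:
        try:
            values.append(float(row["revenue"]))
        except ValueError:
            # The assumption "revenue is always numeric" was shallow;
            # in a real system this is where cost overruns begin.
            pass
    return values

print(statistics.mean(process(transform(raw))))  # 175.0
```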
Quite a few search and text analytics companies are asserting that “we do business intelligence.” The statement is both true and false. To stay on the true side of it, vendors should avoid shortcuts. Implementing business intelligence is similar to Miss Blackburn’s algebra class. It’s demanding, a great deal of work, and usually disliked by those without the appetite or the aptitude for the tasks.
Stephen Arnold, September 22, 2008
Microsoft’s SharePoint in a Post Chrome World
September 17, 2008
CNet ran an interesting story on September 9, 2008, with the fetching title “Microsoft’s Response to Chrome. SharePoint.” The author was Matt Asay, a fellow whose viewpoint I enjoy. For me, the key point of this article, which you can read here, was:
Microsoft, then, has not been sitting still, waiting to be run over by Google. It has been quietly spreading SharePoint throughout enterprises. SharePoint opens up enterprise data to Microsoft services, running in Microsoft’s browser. Unlike Google, however, Microsoft already has an impressive beachhead in the enterprise. It’s called Office, and most enterprises are addicted to it. In sum, if Google is aiming for Windows, it’s going to lose, because the table stakes are much higher. For Microsoft, the game is SharePoint. For the rest of the industry, including Google, the response needs to be content standardization.
The battle between Google and Microsoft pivots on content. SharePoint is Microsoft’s content standardization play. I think this argument is interesting, but a handful of modest issues nagged at me when I read the article:
- SharePoint is a complicated collection of “stuff”. You can check out the SharePoint placemat here. Complexity may be the major weakness of SharePoint.
- SharePoint search is a work in progress. Even when content is standardized, I find the native SharePoint search function pretty awful when there is a lot of it. I find it even more awful when I have to configure it, chase down aberrant security settings, and mud wrestle SQL Server performance. I think this is an iceberg issue for Microsoft. The marketing shows the top; the tech folks see what’s hidden. It’s not pretty.
- Google’s approach to content standardization is different from the SharePoint approach Mr. Asay describes. The GOOG wants software to transform and manipulate content. An organization can do what it wants to create information. Googzilla can handle it, make it searchable, and even repurpose it with one of its “publishing” inventions disclosed in patent documents.
I hear Mr. Asay. I just don’t think SharePoint is the “shields up” that Microsoft needs to deal with Google in the enterprise. Agree? Disagree? Help me learn, please.
Stephen Arnold, September 10, 2008
How Smart Is Google’s Software?
September 17, 2008
When you read this, I will have completed my “Meet the Guru” session in Utrecht for Eric Hartmann. More information is here. My “guru” talk is not worthy of its name. What I want to discuss is the relationship between two components of Google’s online infrastructure. This venue will mark the first public reference to a topic I have been tracking and researching for several years: computational intelligence. Some background information appears in the Ignorance Is Futile Web log here.
I am going to reference my analysis of Google’s innovation method. I described this in my study The Google Legacy, and I want to mention one Google patent document; specifically, US20070198481, which is about fact extraction. I chose this particular document because it references research that began a couple of years before the filing and the 2007 publication of the application. It’s important in my opinion because it reveals some information about Google’s intelligent agents, which Google references as “janitors” in the patent application. Another reason I want to highlight it is that it includes a representation of a Google results list as a report or dossier.
Each time I show a screen shot of the dossier, any Googlers in the audience tell me that I have Photoshopped the Google image, revealing their ignorance of Google’s public patent documents and of the lousy graphical representations that Google routinely places in its patent filings. The quality of the images and the cute language like “janitors” are intended to make it difficult to figure out what Google engineers are doing in the Google cubicles. Any Googlers curious about this image (reproduced below) should look at Google’s own public documents before accusing me of spoofing Googzilla. This now happens frequently enough to annoy me, so, Googlers, prove you are the world’s smartest people by reading your own patent documents. That’s what I do to find revealing glimpses such as this one, displayed for a search of the bound phrase “Michael Jackson”:
The highlight boxes and callouts are mine. What this diagram shows is a fielded (structured) report or dossier about Michael Jackson. The red vertical box identifies the field names of the data, and the blue rectangle directs your attention to the various names by which Michael Jackson is known; for example, Wacko Jacko.
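For readers who want the idea in concrete form, here is a hypothetical sketch of the kind of fielded dossier that fact-extraction “janitors” could assemble. The triples and the code are my invention for illustration, not Google’s output or method.

```python
# Hypothetical illustration of folding extracted facts into a
# field-structured dossier. Structure and values are invented,
# not Google's actual fact repository.
facts = [
    ("Michael Jackson", "also known as", "Wacko Jacko"),
    ("Michael Jackson", "also known as", "King of Pop"),
    ("Michael Jackson", "date of birth", "August 29, 1958"),
]

def dossier(entity: str) -> dict:
    """Fold (subject, attribute, value) triples into a report."""
    out: dict = {}
    for subj, attr, value in facts:
        if subj == entity:
            out.setdefault(attr, []).append(value)
    return out

print(dossier("Michael Jackson"))
# {'also known as': ['Wacko Jacko', 'King of Pop'],
#  'date of birth': ['August 29, 1958']}
```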
Now this is a result that most people have never seen. Googlers react to it in shock and disbelief because only a handful of Google’s more than 19,000 employees have substantive data about what the firm’s top scientists are doing at their jobs. I’ve learned that 18,500 Googlers “run the game plan”, a Google phrase that means “Do what MOMA tells you”. Google patent documents are important because Google has hundreds of US patent applications and patents, not thousands like IBM and Microsoft. Consequently, there is intent behind funding research, paying attorneys, and dealing with the chaotic baloney that is the specialty of the USPTO.

Stephen Arnold, September 17, 2008
Attensity and BzzAgent: What’s the Angle?
September 14, 2008
Attensity made a splash in the US intelligence community after 2001. A quick review of Attensity’s news releases suggests that the company began shifting its marketing emphasis from In-Q-Tel related entities to the enterprise in 2004-2005. By 2006, the company was sharpening its focus on customer support. Now Attensity is offering a wider range of technologies to organizations that want a better handle on their customers.
In August 2008, the company announced that it had teamed up with the oddly named BzzAgent, a specialist in word-of-mouth media, to provide insights into consumer conversations. You can learn more about WOM–that is, word-of-mouth marketing–at the company’s Web site here.
The Attensity technology makes it possible for BzzAgent to squeeze meaning out of email or any other text. With the outputs of the Attensity system, BzzAgent can figure out whether a product is getting marketing lift or downdraft. Other functionality provides beefier metrics to buttress BzzAgent’s technology.
The purpose of this post is to ask a broader question about content processing and text analytics. To close, I want to offer a comment about the need to find places to sell rocket-science information technology.
Why Chase Customer Support?
The big question is, “Why chase customer support?” Call centers, self-service Web sites, and online bulletin board systems have replaced people in many organizations. In an effort to slash the cost of support, organizations have outsourced help to countries with lower wages than the organization’s home country. In an interesting twist of fate, Indian software outsourcing firms are sending some programming and technical work back to the US. Atlanta has been a beneficiary of this reverse outsourcing, according to my source in the Peach State.
Attensity’s technology performs what the company once described as “deep extraction.” The idea is to iterate through source documents. The process outputs metadata, entities, and a wide range of data that one can slice, dice, chart, and analyze. Attensity’s technology is quite advanced, and it can be tricky to tune for the best performance on a particular domain of content.
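As a rough illustration of the iterate-and-extract idea, consider this toy sketch. The patterns are crude stand-ins invented for the example; Attensity’s “deep extraction” uses far more sophisticated linguistics.

```python
# Rough sketch of the iterate-and-extract pattern: walk source
# documents, pull out entities and metadata for later slicing and
# dicing. Documents and patterns are invented for illustration.
import re

docs = ["Call from J. Smith on 2008-09-12: refund request",
        "Email from A. Jones on 2008-09-13: shipping complaint"]

def extract(doc: str) -> dict:
    date = re.search(r"\d{4}-\d{2}-\d{2}", doc)       # ISO-style date
    name = re.search(r"[A-Z]\. [A-Z][a-z]+", doc)      # initial + surname
    return {"date": date.group() if date else None,
            "name": name.group() if name else None,
            "topic": doc.split(":")[-1].strip()}

for record in map(extract, docs):
    print(record)
```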
Customer support appears to be a niche that attracts vendors the way a hamburger attracts a hungry fly buzzing around tailgaters at the college football game. Despite vendors’ efforts to reduce costs and keep customers happy, customer support has embraced every conceivable technology. There are the “live chat” telepresence services. These work fine until the company realizes that customers may be in time zones where the company is not open for business. There are the smart systems like the one Yahoo deployed using InQuira’s technology. To see how this works, navigate to Yahoo help central, type the question “How do I cancel premium email?”, and check out the answers. There are even more sophisticated systems deployed using tools from such companies as RightNow. This firm provides workflow tools and consulting to improve customer support services and operations.
The reason is simple: customer support remains a problem, or as the marketers say, “an opportunity.” I know that I avoid customer support whenever possible. Here’s a typical example. Verizon sent me a flier that told me I could reduce my monthly wireless broadband bill from $80 to $60. It took a Web site visit and six telephone calls to find out that the lower price came with a five-gigabyte bandwidth cap. Not only was I stressed by the bum customer support experience, I was annoyed at what I perceived, rightly or wrongly, as the duplicity of the promotion. Software vendors jump at the chance to license Verizon a better mousetrap. So far, costs may have come down for Verizon, but this mouse remains far away from the mousetrap.
The new spin on customer support rotates around one idea: find out stuff *before* the customer calls, visits the Web site, or fires up a telepresence session.
That’s where Attensity narrows its beam. Attensity’s rocket-science technology can support zippy new angles on customer support; for example, BzzAgent’s early warning system.
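One plausible shape for such an early warning system, sketched with invented vocabulary and thresholds: score incoming consumer mentions against a negative-term list and raise a flag when the rate spikes. This is my guess at the mechanics, not BzzAgent’s actual system.

```python
# Hypothetical early-warning sketch: flag trouble before customers
# call. The term list, threshold, and feed are invented.
NEGATIVE = {"broken", "refund", "cancel", "slow", "angry"}

def alarm(mentions: list[str], threshold: float = 0.4) -> bool:
    """True when the share of negative mentions crosses the threshold."""
    bad = sum(1 for m in mentions
              if NEGATIVE & set(m.lower().split()))
    return bad / max(len(mentions), 1) >= threshold

feed = ["love the new model", "screen broken again",
        "want a refund now", "works fine for me"]
print(alarm(feed))  # True: 2 of 4 mentions trip the negative list
```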
What’s This Mean for Search and Content Processing?
For me that is the $64 question. Here’s what I think:
- Companies like Attensity are working hard to find niches where their text analytics tools can make a difference. By signing licensing deals with third parties like BzzAgent, Attensity gets some revenue and shifts the cost of sales to BzzAgent’s team.
- Embedding its technology in BzzAgent’s systems deemphasizes, or possibly eliminates, the brand “Attensity” on the customers’ radar. Licensing deals deliver revenue with a concomitant loss of identity. Either way, text analytics moves from center stage to a supporting role.
- The key to success in Attensity’s marketing shift is getting to the new customers first. A stampede of other search and content processing vendors is building to follow a very similar strategy. Saturation will lower prices, which will make the customer support sector less attractive to text processing companies than it is now. ClearForest was an early entrant, but now the herd is arriving.
The net net for me is that Attensity has been nimble. What will the arrival of other competitors in the customer support and call center space mean for this niche? My hunch is that search and content processing is quickly becoming a commodity. Companies just discovering the customer support market will have to displace established vendors such as InQuira and Attensity.
Search and content processing certainly appear to be headed rapidly toward commoditization unless the vendor can come up with a magnetic value-add.
Stephen Arnold, September 14, 2008
Search: A Failure to Communicate
September 12, 2008
At lunch today, the ArnoldIT.com team embraced a law librarian. For Mongolian beef, this information professional agreed to talk about indexing. The conversation turned to the grousing that lawyers do when looking for information. I remembered seeing a cartoon that captured the problem we shelled, boiled, and deviled during our Chinese meal.
Source: http://www.i-heart-god.com/images/failure%20to%20communicate.jpg
Our lunch analysis identified three constituencies in a professional services organization. We agreed that narrowing our focus to consultants, lawyers, financial mavens, and accountants was an easy way to put egg rolls in one basket.
First, we have the people who understand information. Think indexing, consistent tagging for XML documents, consistent bibliographic data, the credibility of the source, and other nuances that escape my 86-year-old father when he searches for “Chicago Cubs”.
Second, we have the information technology people. The “information” in their title is a bit of misdirection that leads to a stir fry of trouble. IT pros understand databases and file types. Once data are structured and normalized, the job is complete. Algorithms can handle the indexing and the metadata. When a system needs to go faster, the fix is to buy hardware. If it breaks, the IT pros tinker a bit and then call in an authorized service provider.
Third, we have the professionals. These are the ladies and gentlemen who have trained to master a specific professional skill; for example, legal eagle or bean counter. These folks are trapped within their training. Their notions of information are shaped by their deadlines, crazed clients, and crushing billability.
Here’s where the search system or content processing system begins its rapid slide to the greasy bottom of the organization’s wok.
- No one listens to or understands the other players’ definition of “information”.
- The three players, unable to get their points across, clam up and work to implement their own visions of information.
- The vendors, hungry for the licensing deal, steer clear of this internal collision of ignorant, often supremely confident souls.
- The system is a clunker, doing nothing particularly well.
Enter the senior manager or the CFO. Users are unhappy. Maybe the system is broken and a big deal is lost or a legal matter goes against the organization. The senior manager wants a fix. The problem is that unless the three constituents go back to the definition of information and carry that common understanding through requirements, to procurement, to deployment, not much will change.
Like the old joke says, “Get me some new numbers or I will get a new numbers guy.” So, heads may roll. The problem remains the same. The search and content processing system annoys a majority of its users. Now, a question for you two or three readers: “How do we fix this problem in professional services organizations?”
Stephen Arnold, September 12, 2008
eDiscovery: Speed Bumps Annoy Billing Attorneys
September 12, 2008
A happy quack to my Australian reader who called “eDiscovery Performance Still a Worry” to my attention. The article by Greg McNevin appeared on the IDM.net.au Web site on September 10, 2008. The main point of the write up is that 60 percent of those polled about their organization’s eDiscovery litigation support system said, “Dog slow.” The more felicitous wording chosen by Mr. McNevin was:
The survey also found that despite 80 percent of organisations claiming to have made an investment in IT to address discovery challenges, 60 percent of respondents think their IT department is not always able to deliver information quickly enough for them to do their legal job efficiently.
The survey was conducted by Dynamic Markets, which polled 300 in-house legal eagles in the UK, Germany, and the Netherlands. My hunch is that the 60 percent figure may well apply in North America as well. My own research unearthed the fact that two-thirds of the users of enterprise search systems were dissatisfied with those systems. The 60 percent score matches up well.
In my view, the larger implication of this CommVault study is that when it comes to text and content processing, more than half the users go away annoyed or use the system whilst grumbling and complaining.
What are vendors doing? There’s quite a bit of activity in the eDiscovery arena. More gladiators arrive to take the place of those who fall on their swords, get bought as trophies, or die at the hands of another gladiator. Sadly, the activity does not address the issue of speed. In this context, “speed” is not three-millisecond response time. “Speed” means transforming content, updating indexes, and generating the reports needed to figure out what information is where in the discovered information.
Many vendors are counting on Intel to solve the “speed” problem. I don’t think faster chips will do much, however. The “speed” problem is that eDiscovery relies on a great many processes. Lawyers, in general, care only about what’s required to meet a deadline. There’s little reason for them to trouble their keen legal minds with such details as content throughput, malformed XML, flawed metatagging, and trashed indexes after an index update.
eDiscovery’s dissatisfaction score mirrors the larger problems with search and content processing. There’s no fix coming that will convert a grim black and white image to a Kodachrome version of reality.
Stephen Arnold, September 12, 2008
First Search Mini-Profile: Stratify
September 9, 2008
Beyond Search has started its search and content processing mini-profile series.
The first profile is about Stratify, and you can read it here.
The goal is to publish each week a brief snapshot of selected search and content processing vendors. The format of each profile will be a short essay that covers the background of the system, its principal features, strengths, weaknesses, and an observation. The idea inspiring each profile is to create a basic summary. Each vendor is invited to post additional information, links, and updates. On a schedule yet to be determined, each mini-profile will be updated and the comments providing new information deleted. The system allows a reasonable trade-off between editorial control and vendor supplements. We will try to adhere to the weekly schedule. Our “Search Wizards Speak” series has been well received, and we will add interviews, but the interest in profiles has been good.

Remember: you don’t need to write me “off the record” or, even worse, call me to provide insights, updates, and emendations. Please use the comments section for each profile. I have other work to do. I enjoy meeting new people via email and the phone, but the volume of messages to me is rising rapidly. Enjoy the Stratify post. You will find the profiles under the “Profile” tab on the splash page for the Web log. I will post a short news item when a new profile becomes available. Each profile will be indexed with the key word “profile”.
Stephen Arnold, September 9, 2008
Oracle Teams with ekiwi
September 8, 2008
ekiwi, based in Provo, Utah, has formed a relationship with Oracle. Founded in 2002, the company focuses on Web-based data extraction. The firm’s Screen-Scraper technology is, the news release asserts, “platform-independent and designed to integrate with virtually any existing information technology system.”
The company describes Screen-Scraper this way here:
It consists of a proxy server that allows the contents of HTTP and HTTPS requests to be viewed, and an engine that can be configured to extract information from Web sites using special patterns and regular expressions. It handles authentication, redirects, and cookies, and contains an embedded scripting engine that allows extracted data to be manipulated, written out to a file, or inserted into a database. It can be used with PHP, .NET, ColdFusion, Java, or any COM-friendly language such as Visual Basic or Active Server Pages.
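For the curious, here is a generic sketch of the pattern-plus-regular-expression extraction the release describes. It uses only the Python standard library; it is not ekiwi’s proxy server, engine, or API, and the example URL and pattern are invented.

```python
# Generic sketch of regex-driven screen scraping: fetch a page and
# pull out every match for a pattern. Not ekiwi's actual technology.
import re
import urllib.request

def scrape(url: str, pattern: str) -> list[str]:
    """Fetch a page and return all matches for the given pattern."""
    html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
    return re.findall(pattern, html)

# e.g., harvest prices from a hypothetical product listing page:
# prices = scrape("http://example.com/phones", r"\$\d+\.\d{2}")
```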
Oracle’s revenues are in the $18 to $20 billion range. ekiwi’s revenues may be more modest. Oracle, however, has turned to ekiwi for screen-scraping technology to enhance the content acquisition capabilities of Oracle’s flagship enterprise search system, Secure Enterprise Search 10g, or SES10g. In May 2008, one of Oracle’s senior executives told me that SES10g was a key player in the enterprise search arena and that SES10g sold because it was secure. Security, I recall being told, was the key differentiator.
This deal suggests that Oracle has to turn to up-and-coming screen-scraping vendors to expand the capabilities of SES10g. I’m still puzzling over this deal, but that’s clearly my inability to understand the sophisticated management thinking that propels SES10g to its lofty position among the search and content processing vendors.
The news release makes it clear that ekiwi can access content from the “deep Web”. This buzzword means to me dynamic, database-driven sites. Google has its own “deep Web” technologies, which may be described in part in its five Programmable Search Engine patent applications, published by the USPTO in February 2007.
ekiwi, which offers a very useful Web log here, is:
…a member of the Oracle PartnerNetwork, has worked with Oracle to develop an adaptor that integrates ekiwi’s Screen Scraper with Oracle Secure Enterprise Search to help significantly expand the amount of enterprise content that can be searched while maintaining existing information access and authorization policies. The Oracle Secure Enterprise Search product provides a secure, easy-to-use enterprise search platform that connects to a broad range of enterprise applications and data sources.
The release continues:
The two technologies have already been coupled in a number of cases that demonstrate their ability to work together. In one instance cell phones from many of the major providers were crawled by Screen-Scraper and indexed by Oracle Secure Enterprise Search. A user shopping for cell phones is then able to search, filter, and browse from a single location the various cell phone models by attributes such as price, form factor, and manufacturer. In yet another case, Screen-Scraper was used to extract forum postings from various photography aficionado web sites. This information was then made available through Oracle Secure Enterprise Search, which made it easy to conduct internal marketing analysis on recently released cameras.
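The search-filter-browse step the release describes can be illustrated with a toy faceted filter over invented records; the real Oracle Secure Enterprise Search facility is, of course, far richer.

```python
# Hypothetical faceted-browsing sketch: narrow scraped records by
# attribute values. Records and attributes are invented.
phones = [
    {"model": "A100", "maker": "Nokia", "price": 99, "form": "candybar"},
    {"model": "B200", "maker": "Motorola", "price": 149, "form": "flip"},
    {"model": "C300", "maker": "Nokia", "price": 199, "form": "slider"},
]

def facet(records: list[dict], **criteria) -> list[dict]:
    """Keep records whose attributes match every supplied facet value."""
    return [r for r in records
            if all(r.get(k) == v for k, v in criteria.items())]

print(facet(phones, maker="Nokia"))                 # two Nokia handsets
print(facet(phones, maker="Nokia", form="slider"))  # just the C300
```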
I did some poking around and came up short after a quick look at my files and a couple of Web searches. Information is located, according to the news story about the deal, here. The URL given is http//:www.screen-scraper.com/ss4ses/. The link redirected for me to http://www.w3.org/Protocols/. The company’s Web site is at http://www.screen-scraper.com, and it looks like this on September 7, 2008, at 8 pm Eastern:
I am delighted that SES10g can acquire Web-based content from dynamic systems. I remain confused about the functions included with SES10g. My understanding was that SES10g was easily extensible and compatible with Oracle Applications, Fusion, and other Oracle technologies. If this were true, pulling content from database-driven services should be trivial for the firm’s engineering team. I was hoping for an upgrade to SES10g, but that seems not to be in the cards at this time. Scraping Web pages seems to be a higher priority than getting a new release out the door. What’s your understanding of Oracle’s enterprise search strategy? I’m confused. Help me out, please.
Stephen Arnold, September 8, 2008
New Beyond Search White Paper: Coveo G2B for Mobile Email Search
September 8, 2008
The Beyond Search research team prepared a white paper about Coveo’s new G2B for Email product. You can download a copy from us here or from Coveo here. Coveo’s system works across different mobile devices, requires no third-party viewers, delivers low-latency access when searching, exhibits no rendering issues, and provides access to contacts and attachments as well as the text in an email. When compared to email search solutions from Google, Microsoft, and Yahoo, Coveo’s new service proved more robust and functional. Beyond Search identified 13 features that set G2B apart. These include a graphical administrative interface, comprehensive usage reports, and real-time indexing of email. The Beyond Search research team (Stephen Arnold, Stuart Schram, Jessica Bratcher, and Anthony Safina) concluded that Coveo established a new benchmark for mobile email search. For more information about Coveo, navigate to www.coveo.com. Pricing information is available from Coveo.
Stephen Arnold, September 5, 2008