IBM Watson Has a Tough Question to Answer

November 6, 2014

In a sense, the erosion of a well-known company is interesting to watch. Some might use the word “bittersweet.” IBM has been struggling. Its “profits” come from stock buybacks, reductions in force, cost cutting, and divestitures. Coincident with the company’s quarterly financial reports, I heard two messages.

  1. We are not going to hit the 2015 targets we said we were going to hit
  2. IBM paid another company money to “acquire” one of IBM’s semiconductor units.

I may have these facts wrong, but what’s important is that the messaging about IBM’s strategic health sends signals that I find troubling. IBM is a big company, and it will take time for its ultimate trajectory to become discernible. But from my vantage point in rural Kentucky, IBM has its work cut out for its thousands of professionals.

I read “Does Watson Know the Answer to IBM’s Woes?” Compared to other Technology Review write-ups about IBM’s projected $10 billion revenue juggernaut, this article finally suggests that IBM’s Watson may not be the unit that produces billions in new revenue.

Here’s a passage I highlighted with my trusty yellow marker:

Watson is still a work in progress. Some companies and researchers testing Watson systems have reported difficulties in adapting the technology to work with their data sets. IBM’s CEO, Virginia Rometty, said in October last year that she expects Watson to bring in $10 billion in annual revenue in 10 years, even though that figure then stood at around $100 million.

Let’s consider this $100 million number. If it is accurate, Watson is now one-eighth the size of Autonomy when HP paid $11 billion for the company. It took Autonomy more than 14 years to hit this figure. In order to produce $800 million in revenue, Autonomy had to invest, license, and acquire technology and businesses. In total, Autonomy was more like an information processing holding company, not a company built on a one-trick pony like Google’s search and advertising technology. Autonomy’s revenue was diversified for one good reason: it has been very difficult to build multibillion-dollar businesses on basic search and retrieval. Google hit $60 billion because it hooked search to advertising. Autonomy generated seven times more revenue than Endeca because it was diversified. Endeca never broke out of three main product lines: ecommerce, search, and business intelligence. And Endeca never generated more than an estimated $160 million in revenue per year at the time of its sale to Oracle. Even Google’s search appliance fell short of Autonomy’s revenues. Now IBM wants to generate more money from search than Autonomy did, and in one third the time. Perhaps IBM could emulate Mike Lynch’s business approach, but even then this seems like a bridge too far. (This is a gentler way of saying, “Not possible in 60 months.”)

It is very difficult to generate billions of dollars from search without some amazing luck and an angle.

If IBM has $100 million in revenue, how will the company generate $1 billion and then an additional $9 billion? The PR razzle dazzle that has involved TV game shows, recipes with tamarind, and an all-out assault on mainstream media about Watson has been impressive. In search, $100 million is a pretty good achievement. But $100 million does not beget $1 billion without some significant breakthroughs in marketing, technology, must-have applications, and outstanding management.
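To put the leap from $100 million to $10 billion in numbers, here is a rough Python sketch of the compound annual growth rate it implies. It assumes, purely for illustration, smooth year-over-year growth across the ten years in Rometty’s forecast; real revenue rarely behaves that way.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate needed to grow from start to end in `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures from the Technology Review passage: about $100 million now,
# $10 billion in annual revenue ten years out.
rate = implied_cagr(100e6, 10e9, 10)
print(f"Required growth: {rate:.1%} per year")  # Required growth: 58.5% per year

# Year-by-year path under the (hypothetical) smooth compounding assumption.
revenue = 100e6
for year in range(1, 11):
    revenue *= 1 + rate
    print(year, f"${revenue / 1e9:.2f}B")
```

On that assumption, Watson would need roughly $1 billion in revenue by year five, then keep growing at about 58 percent a year for five more years.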

From my point of view, Technology Review and other high profile “real” news outfits have parroted the IBM story about Watson, artificial intelligence, and curing cancer. To IBM’s credit, it has refrained from trying to cure death. Google has this task in hand.

The story includes a modest but refreshing statement about the improbability of Watson’s financial goal:

“It’s not taking off as quickly as they would like,” says Robert Austin, a professor of management at Copenhagen Business School who has studied IBM’s strategy over the years. “This is one of those areas where turning demos into real business value depends on the devils in the details. I think there’s a bold new world coming, but not as fast as some people think.”

As the story points out, “Watson is still a work in progress.”

Hey, no kidding?

Stephen E Arnold, November 6, 2014

Attivio Highlights Content Intake Issues

November 4, 2014

I read “Digesting Ingestion.” The write-up is important because it illustrates how vendors like Attivio, with roots in traditional information retrieval, are responding to changing market demands.

The article talks about the software required to hook a source like a Web page or a dynamic information source to a content processing and search system. Most vendors provide a number of software widgets to handle frequently encountered file types; for example, Microsoft Word content, HTML Web pages, and Adobe PDF documents. However, less common content types may call for a specialized software widget.

Attivio states:

There are a number of multiplicative factors to consider from the perspective of trying to provide a high-quality connector that works across all versions of a source:

  • The source software version, including patches, optional modules, and configuration
  • Embedded or required 3rd party software (such as a relational database), including version, patches, optional modules and configuration
  • Hardware and operating system version, including patches, optional modules, and configuration
  • Throughput/capacity of the repository APIs
  • Throughput/capacity and ability to operate in parallel.
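Attivio’s word “multiplicative” is the key point: each factor multiplies the number of configurations a connector may meet in the field. The Python sketch below enumerates such a test matrix. The version names and counts are hypothetical, chosen only to illustrate the arithmetic, and patches and optional modules would multiply the total further.

```python
from itertools import product
from math import prod

# Hypothetical variants for each factor in Attivio's list (illustrative only).
factors = {
    "source_version": ["8.0", "8.1", "9.0"],
    "third_party_db": ["Oracle 11g", "SQL Server 2012"],
    "os": ["Windows Server 2008", "Windows Server 2012", "RHEL 6"],
    "repository_api_throughput": ["low", "high"],
    "parallelism": ["serial", "parallel"],
}

# Every combination is one configuration a connector might have to handle.
matrix = list(product(*factors.values()))
print(len(matrix))  # 72 = 3 * 2 * 3 * 2 * 2
assert len(matrix) == prod(len(v) for v in factors.values())
```

Adding a single new operating system version to this hypothetical list raises the count from 72 to 96, which is why “works across all versions of a source” is a hard promise.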

This is useful information. In a real-world example, Attivio reports that a number of other factors can come into play. These range from a lack of adequate computing resources, to corrupt data that connectors shunt to the exception folder, to my favorite, Big Data.
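The exception folder idea can be sketched in a few lines of Python. This is a generic pattern, not Attivio’s code: the `parse` function and the sample documents are hypothetical, and a failed UTF-8 decode stands in for corrupt content.

```python
from dataclasses import dataclass, field


@dataclass
class IngestResult:
    indexed: list = field(default_factory=list)
    exceptions: list = field(default_factory=list)  # stands in for the "exception folder"


def parse(raw: bytes) -> str:
    # Hypothetical parser: bytes that are not valid UTF-8 count as corrupt.
    return raw.decode("utf-8")


def ingest(documents: dict[str, bytes]) -> IngestResult:
    result = IngestResult()
    for doc_id, raw in documents.items():
        try:
            text = parse(raw)
        except UnicodeDecodeError as err:
            # Do not halt the whole crawl; set the item aside for later review.
            result.exceptions.append((doc_id, str(err)))
            continue
        result.indexed.append((doc_id, text))
    return result


docs = {"memo.txt": b"Quarterly results", "scan.bin": b"\xff\xfe\xfa"}
r = ingest(docs)
print([d for d, _ in r.indexed], [d for d, _ in r.exceptions])  # ['memo.txt'] ['scan.bin']
```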

Attivio is to be credited for identifying these issues. Search-centric vendors have to provide solutions to these challenges. I would point out that a number of companies have leapfrogged search-centric approaches to high-volume content intake.

These new players, not the well-known companies providing search solutions, are the next generation in information access solutions. Watch for more information about automated collection and analysis of Internet-accessible information and the firms redefining information access.

Stephen E Arnold, November 4, 2014

Connotate: Automated Data Extraction That Seems to Work Like a Traditional Alert

November 4, 2014

I found the write-up “Using Automated Data Extraction to Find Out Who Makes How Much and Where They Make It” suggestive of what search systems will have to do to survive. The blog post presents information about Connotate’s automation functions.

I learned that Connotate has a client interested in gathering information about salaries. The write up reported:

They’re [the client] trying to scale up and found they could look into salaries and titles only in downtimes, and that wasn’t very often. In fact, they’ve been able to go to only a couple of websites and get information for just two job titles in two countries. But their plans call for learning about hundreds, if not thousands, of job titles across 75 countries. Since they were doing this manually, and only when time permitted, getting to where they needed to be was almost impossible.

The shift to automation as a key feature of information access is important. However, note that the client had a known problem and knew what information was required. Connotate then performed a standing query on accessible content and provided outputs to the client.
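A standing query works like a traditional alert: the same saved query runs against each new batch of content, and only fresh matches are reported. The Python sketch below is a generic illustration, not Connotate’s implementation; the job-title pattern, URLs, and page text are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class StandingQuery:
    """A saved query re-run against each new batch of content, alert-style."""
    pattern: re.Pattern
    seen: set = field(default_factory=set)

    def run(self, pages: dict[str, str]) -> list[str]:
        """Return URLs that match and were not reported on an earlier run."""
        hits = []
        for url, text in pages.items():
            if url not in self.seen and self.pattern.search(text):
                hits.append(url)
                self.seen.add(url)
        return hits


# Hypothetical query: a known job title followed, within 80 characters, by a salary figure.
q = StandingQuery(re.compile(
    r"(data analyst|software engineer).{0,80}?[$€£]\s?\d[\d,]*",
    re.IGNORECASE | re.DOTALL,
))

first = q.run({"jobs.example/1": "Software Engineer, Berlin. Salary: €65,000",
               "jobs.example/2": "Office party photos"})
second = q.run({"jobs.example/1": "Software Engineer, Berlin. Salary: €65,000",
                "jobs.example/3": "Data Analyst (Toronto) $72,000"})
print(first, second)  # ['jobs.example/1'] ['jobs.example/3']
```

Note that the pattern encodes what the client already knows to look for, which is exactly the limitation raised in the next paragraph.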

However, what about clients who do not know what information is germane to their business? How can automation that mimics knowing what to look for assist with pinpointing unknowns?

Search vendors will have to shift into a different development mode in order to provide services that deal with high volatility and unknowns in today’s business climate.

Stephen Arnold, November 4, 2014

Overreliance on Search Engines?

November 3, 2014

The article titled “Google Makes Us All Dumber: The Neuroscience of Search Engines” probes the ever-increasing reliance on search engines and finds that the way we use them is problematic. This is due to the way our brains respond to this simplified question-and-answer process. The article argues that the harder we work for knowledge, the more likely we are to store it. When it is as simple as typing in a search query and reading a simple answer, we will forget the answer as easily as we found it. The article explains,

“It’s not that the Internet is making us stupid or incurious. Only we can do that. It’s that we will only realize the potential of technology and humans working together when each is focused on its strengths — and that means we need to consciously cultivate effortful curiosity. Smart machines are taking over more and more of the tasks assumed to be the preserve of humans. But no machine, however sophisticated, can yet be said to be curious.”

We are all guilty of using Google as a shortcut to end a fight over a fact or using IMDB for a quick reminder of that actor what’s-her-name in that movie whatdya-callit. But the article points out that the more we lean on search engines in this fashion, the weaker our recall becomes and the less able we are to ask interesting questions.

Chelsea Kerwin, November 03, 2014

Sponsored by, developer of Augmentext

Predictive Analytics: An Older Survey with Implications for 2015

November 2, 2014

In my files I had a copy of the 2009 Predictive Analytics World survey about, not surprisingly, predictive analytics. When I first reviewed the data in the report, I noted that neither “information retrieval” nor “search” appeared. Before the bandwagon began to roll for predictive analytics, search technology was not in the game, if I interpret the survey data correctly.

The factoid I marked was revealed in this table:


The planned use of predictive analytics was for fraud detection. It appears that 64 percent of the sample planned to adopt predictive analytics for criminal or terrorist detection. The method requires filtering various types of public information, including text.

Are vendors of enterprise search and content processing systems leaders in this sector in 2014? Based on my research, content processing vendors provide the equivalent of add-in utilities to the popular systems. The hypothesis I have formulated is that traditional information retrieval companies find themselves relegated to a supporting role.

Looking forward to 2015, I see growing dominance by leaders in the cyber OSINT market. Irrelevancy awaits the traditional search vendor unable to identify and then deliver high value solutions to a high demand, high growth market sector.

Has IDC or Dave Schubmehl tracked this sector? I don’t think so. As I produce more information about this market, I anticipate some me-too activity, however.

Stephen E Arnold, November 2, 2014

Whither Bing?

October 31, 2014

I learned in “Microsoft’s Advertising Unit Shuts Down Global Agency, Creative Team in Latest Layoffs” that the latest round of cutbacks strikes at Bing ad sales. Other announcements revealed that one of the Microsoft cheerleaders for the search engine optimization crowd has been given an opportunity to find his future elsewhere. (See “Bing Severs Ties with Webmasters by Firing Duane Forrester.”)

What’s up? Well, maybe Microsoft has come to the conclusion that Google is going to get most of the money from online search.

From my vantage point in Harrod’s Creek, the shift makes it clear that what Bing was doing wasn’t making certain folks in the executive suite giddy with joy.

So, let me ask the interesting question, “Has Google claimed another Web search victim?” If that is the case, you will want to read my forthcoming article in Information Today about the outlook for Qwant, the French Google killer built on Pertimm technology and singled out by Eric Schmidt as a threat to all things Googley.

I know it is not popular to suggest that the Google is a monopoly, but if Microsoft is not committed to pumping money into Bing, who will challenge the balloon-crazed fighters of death in Mountain View?

How often do you use Jike, iSeek, Ixquick, or Yandex—or Bing?

Stephen E Arnold, October 31, 2014

New Look for Internet Archive

October 29, 2014

The Internet Archive has a new look. You may have seen the change, but I don’t visit the site very often. I struggle with its search system.

The new look features many postage stamp graphics and some text. Click on a graphic and one is sent to the appropriate Archive page.

Here’s a screenshot of the content available to you.


How does one search this content? The search box returns a list of hits with an icon indicating the content type. Have the cheerleaders for unified search cracked the information access challenge of using a single search box to access mixed content types? I am still a fan of one-at-a-time searching. It is inefficient, but I get a sense of the collection’s scope and the idiosyncrasies of the indexed information.
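The trade-off between one search box and one-collection-at-a-time searching can be shown with a toy index. The Python sketch below uses a hypothetical handful of items tagged by media type; it does not use the Internet Archive’s actual search system or API.

```python
from collections import Counter

# Hypothetical mini-index; each item carries a media type, like the Archive's result icons.
INDEX = [
    {"title": "Moon landing newsreel", "type": "movies"},
    {"title": "Moon landing press kit", "type": "texts"},
    {"title": "Moon landing radio broadcast", "type": "audio"},
    {"title": "Lunar lander game", "type": "software"},
]


def search(query, collection=None):
    """Match all query terms in the title; optionally restrict to one collection."""
    terms = query.lower().split()
    return [item for item in INDEX
            if all(t in item["title"].lower() for t in terms)
            and (collection is None or item["type"] == collection)]


# One box over everything: mixed content types in a single hit list.
mixed = search("moon landing")
print(Counter(item["type"] for item in mixed))

# One collection at a time: narrower, but the searcher sees that collection's scope.
print([item["title"] for item in search("moon landing", collection="texts")])
```

The unified list is convenient, but it hides how much of each collection a query actually touched; searching each collection separately makes that visible.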

Searching today is more difficult than it was in 1980, in my opinion. The method requires knowing what is in a collection before one queries it.

How does one know what’s in each of these collections? Well, unfortunately you can no longer ask a librarian in many organizations.

You are on your own, pilgrim.

Stephen E Arnold, October 29, 2014

EasyAsk Adds Glitter to Oya Costumes

October 29, 2014

I learned that Oya Costumes has tapped EasyAsk to provide the search function for its Web site. You can read the news release here. I clicked around using drop downs and facets. I did run a query to locate a suitable Harrod’s Creek Halloween costume. I searched for Darth Vader. The results were mostly on point. There was one anomaly, an inflatable purple suit. Perhaps Darth has a side few know about.

Here’s the result page for my query:


Here’s a close up of the purple outfit mapped to the query “Darth Vader.”


I quite like the inflatable purple suit. I assume it is semantically related to Mr. Vader.

Stephen E Arnold, October 29, 2014

The Cheating Search Engine

October 29, 2014

Spokeo is a people search engine: enter a name, email address, or username, and it returns personal information. The results usually include an address, phone number, other email addresses, or aliases. People searches are the digital equivalent of a phone book’s white pages, except that the information is easier to manipulate for the desired output. The results, however, are limited by how much you are willing to pay.

Most of the time people search engine advertisements are banal, but Spokeo has tried a new approach. When you visit the Web site, you are greeted by the following caption: “Is He Cheating On You?” This warning follows it:

“CAUTION: This information is potentially shocking. Spokeo uses proprietary deep web technology to search over 70 social networks for status updates, photos, relationships, and profiles. Please prepare yourself for the unexpected.”

A picture of a couple caught in a scandalous position flanks the search box. Spokeo is trying to appeal to an entirely new clientele, perhaps the kind who click on advertisements about learning a new language in a week or melting away the pounds with a new, exclusive diet pill offer. The search results include dating profiles, social media accounts, aliases, hidden photos, etc. The type of information you

While Spokeo was never the first choice among people search engines, it does provide basic information about individuals. This new advertising campaign, however, pushes it into the lowbrow Internet and makes its content questionable. Why the sudden change in marketing? Is Spokeo seeing a revenue drop, or has it seen a spike in profits from being used as a cheating search engine?

Whitney Grace, October 29, 2014
Sponsored by, developer of Augmentext

Beyond Intranet Search

October 28, 2014

Apparently, there is a difference between search and knowledge management; I guess you learn something new every day. CMS Wire asks, “Intranet Search: Where Documents Go to Die or KM Enabler?” Writer Jed Cawthorne uses Coveo’s platform to illustrate ways a company can go beyond the “baked in” search functionality in an intranet content management system. He writes:

“You don’t need to stick with the ‘built in solution’ if search is important to your KM / Enterprise Information Management strategies. There are alternatives beyond the ever more standard SharePoint (even though building FAST technology into core SharePoint 2013 has improved it) or the really big (and expensive) heavy hitters like HP’s IDOL platform.

“With the growing rate at which our mountains of internal content grow ever bigger, search capabilities are a fundamental element of an intranet, and of the broader digital workplace. If you want to apply long tail principles to mountains of social content, such as discussion forums, news feeds and updates, a search engine with concept search capabilities would be a good idea, unless you have a work force which is truly at one with tagging absolutely everything with appropriate and valuable metadata … (what, you work in the Library of the Jedi Temple? Cool!).”

Cawthorne spoke to Coveo’s Diane Berry about her company’s knowledge management options. She emphasizes broad content-source connectivity, metadata enrichment through text analytics (for companies lacking Jedi librarians), and building taxonomies through entity extraction. A user-interface based on users’ needs is also key, she notes, and mobile interfaces are a part of that. So is making it easy to adjust search and analysis parameters. See the write-up for more details and some screenshots that illustrate these points.
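Metadata enrichment through entity extraction can be illustrated with the simplest possible approach: a dictionary (gazetteer) lookup that tags each document with taxonomy facets. This Python sketch is a generic assumption, not Coveo’s method; commercial systems typically rely on statistical extractors, and the gazetteer entries and sample post are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical gazetteer: entity names grouped under taxonomy nodes.
GAZETTEER = {
    "Product": ["SharePoint", "IDOL"],
    "Company": ["Coveo", "HP", "Microsoft"],
}


def enrich(doc: dict) -> dict:
    """Return a copy of doc with extracted entities attached as facet metadata."""
    facets = defaultdict(set)
    for category, names in GAZETTEER.items():
        for name in names:
            # Whole-word match so "HP" does not fire inside longer tokens.
            if re.search(rf"\b{re.escape(name)}\b", doc["body"]):
                facets[category].add(name)
    return {**doc, "facets": {k: sorted(v) for k, v in facets.items()}}


post = {"id": 1, "body": "Moving off SharePoint search to Coveo; IDOL from HP was too costly."}
print(enrich(post)["facets"])
```

The facets produced this way can feed a taxonomy without asking employees to tag every forum post by hand, which is the gap the Jedi librarian joke points at.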

Cynthia Murrell, October 28, 2014

Sponsored by, developer of Augmentext
