Disney and Search

November 10, 2014

I won’t bore you with the Disney InfoSeek adventure. Sigh. If you want to know how Disney is approaching Web search, read “Disney Fights Piracy With New Search Patent.” The system and method are intended to filter out content Disney has not licensed. The write up’s headline suggests that a system and method in the form of a patent will “fight piracy.” Interesting notion, but I think the idea is that Disney has built or will build a search system that shows only “official” content.
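
If the approach is what it appears to be, the mechanics reduce to filtering search results against a registry of licensed sources. A minimal Python sketch of that idea follows; the domain names and data structures are invented for illustration, not taken from the patent:

    from urllib.parse import urlparse

    # Hypothetical registry of licensed domains (illustrative, not from the patent).
    LICENSED_DOMAINS = {"disney.com", "go.com", "licensed-partner.example"}

    def is_licensed(url):
        """Return True if the result's host falls under a licensed domain."""
        host = urlparse(url).hostname or ""
        return any(host == d or host.endswith("." + d) for d in LICENSED_DOMAINS)

    def filter_results(results):
        """Keep only 'official' results; drop anything outside the registry."""
        return [r for r in results if is_licensed(r["url"])]

    results = [
        {"title": "Official trailer", "url": "https://disney.com/trailer"},
        {"title": "Free full movie!", "url": "https://pirate.example/movie"},
    ]
    print(filter_results(results))  # only the disney.com result survives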

The notion of building a specialist Web search site is an interesting one. The reality may be that traffic will be very hard to come by. The most recent evidence is Axel Springer’s capitulation to the Google. Axel Springer owns a chunk of Qwant. Again, a good idea, but it does not deliver traffic.

If you build a search engine, who will use it? Answer: Not too many people if the data available to me are correct.

Stephen E Arnold, November 10, 2014

POTUS, Fear, and Google

November 8, 2014

I have zero clue whether this article—“Movie Chief: Obama Is Scared to Push Google, ISPs on Piracy”—is accurate. Let’s for the moment assume that the write up by Andy is right as rain.

Here’s a statement I noted:

“It’s sad because if we had a good president that cared about the film industry he would pass a very simple law, an anti-piracy law, but they don’t want to stop it because they are scared of Google, and he’s scared of all the ISPs,” Lerner says. Google’s power and money not only scares off the President but Congress too, Lerner adds. Furthermore, plenty of that revenue is coming from piracy-related sources, so the company has no incentive to stop it.

Let’s look at the entities in this article.

  • The president of the United States or POTUS
  • Nu Image CEO and founder Avi Lerner
  • The GOOG.

As I understand it, Google, which worked out a friendly deal with Axel Springer the other day, is just as cuddly as a child-chewed Harrods teddy bear. The POTUS is able to send troops, issue Executive Orders, and disrupt traffic when he ventures out into the amber waves of grain. (Is there “grain” in LA?) Mr. Lerner is a movie mogul. I am not sure what a movie mogul does. I think it involves creating high value intellectual property which puts Shakespeare and Milton in a state of inferiority.

The point is that movie moguls and POTUS are not as powerful as Google.

From Google’s point of view, that’s the way life is supposed to work. Problems with that, pilgrim? Well, you can always take your queries to Yahoo or, better yet, Qwanza OR Qwanta, whatever. (Try typing that name rapidly on your iPhone.)

Keep in mind that the source write up may not be spot on. It is entertaining, though.

Stephen E Arnold, November 8, 2014

Attensity: Downplaying Automated Collection and Analysis

November 7, 2014

I read “Do What I Mean, Not What I Say: The Text Analytics Paradox.” The write up made a comment which I found interesting; to wit:

Now, before you start worrying about robots replacing humans (relax—that’s at least a couple of years away), understand this: context and disambiguation within these billions of daily social posts, tweets, comments, and online surveys is the key to viable, business-relevant data. The way humans use language is replete with nuance, idiomatic expressions, slang, typos, and of course, context. This underscores the magnitude of surfacing actionable intelligence in data for any industry.
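
The disambiguation point is easy to state and hard to do at scale. As a toy illustration only (this is not Attensity’s method), here is how even a crude context-window score can separate two senses of one word:

    # Toy word-sense disambiguation: score each sense by counting cue words
    # that co-occur with the ambiguous term. Senses and cues are invented.
    SENSES = {
        "fruit": {"eat", "pie", "orchard", "juice"},
        "company": {"iphone", "stock", "shares", "ceo"},
    }

    def disambiguate(text, term="apple"):
        words = set(text.lower().split())
        scores = {sense: len(words & cues) for sense, cues in SENSES.items()}
        return max(scores, key=scores.get)

    print(disambiguate("apple stock jumped after the iphone event"))  # company
    print(disambiguate("grandma baked an apple pie"))                 # fruit

Real systems bring in grammar, slang dictionaries, and far larger context windows; the sketch only shows why bare keyword matching is not enough.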

Based on information my research team has collected, threat detection via automated collection and analysis of Internet-accessible information is quite advanced. In fact, some of the technology has been undergoing continuous refinement since the late 1990s. Rutgers University has been one of the academic outfits in the forefront of this approach to the paradox puzzling Attensity.

The more recent entrants in this important branch (perhaps a new redwood in the search forest) of information access are keeping a low profile. There is a promising venture-funded company in Baltimore as well as a China-based firm operating from offices in Hong Kong. Neither of these companies has captured the imagination of traditional content processing vendors for three reasons:

First, the approach does not derive from traditional information retrieval methodologies.

Second, the companies generate most of their revenue from organizations demanding “quiet service.” (This means that when there is no marketing hoo hah, the most interesting companies are simply not visible to the casual, MBA-inspired analyst.)

Third, the outputs are of stunning utility. Information about quite particular subjects is presented without recourse to traditional human-intermediated fiddling.

I want to float an idea: The next generation firms delivering state-of-the-art solutions have yet to hit the wall that requires the type of marketing that now characterizes some content processing efforts.

I am trying to figure out how to present these important but little known players. I will write about one in my next Info Today article. The challenge is that there are two dozen firms pushing “search” in a new and productive direction.

Stephen E Arnold, November 7, 2014

Insights from Search Pro Dave Hawking

November 7, 2014

Search-technology expert Dave Hawking is now working with Microsoft to improve Bing. Our own Stephen Arnold spoke to Mr. Hawking when he was still helping propel Funnelback to great heights. Now, IDM Magazine interviews the search wizard about his new gig, some search history, and challenges currently facing enterprise search in, “To Bing and Beyond.”

Anyone interested in the future of Bing, Microsoft, or enterprise search, or in Australian computer-science history, should check out the article. I was interested in this bit Hawking had to say about ways that tangled repository access can affect enterprise search:

“Access controls for particular repositories are often out of date, inappropriate, and inconsistent, and deployment of enterprise search exposes these problems. They can arise from organisational restructuring, staff changes or knee-jerk responses to unauthorised accesses. As there are usually a large number of repositories, rationalising access controls to ensure that search results respect policies is a lot of work.

“Organisations vary widely in their approach to security: some want security enforced with early binding (recording permissions at indexing time), others want late binding, where current permissions are applied when query results are displayed, or a hybrid of the two.

“This choice has a major impact on performance. Another option is ‘translucency’, where users may see the title of a document but not its content, or receive an indication that documents matching the query exist but that they need to request permission to access them. As well as these security model variations, organisations vary in their requirements for customization, integration and presentation, and how results from multiple repositories should be prioritized, tending to make enterprise search projects quite complex.”
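
To make the early versus late binding trade-off concrete, here is a minimal Python sketch; the documents, users, and access control list are invented:

    # Early binding: permissions are stamped into the index at indexing time.
    # Late binding: permissions are checked against the live ACL at query time.
    ACL = {"doc1": {"alice", "bob"}, "doc2": {"alice"}}  # live access control list

    index = []
    for doc_id, text in [("doc1", "quarterly results"), ("doc2", "merger plan")]:
        # Early binding stores a snapshot of who may see the document.
        index.append({"id": doc_id, "text": text, "allowed": set(ACL[doc_id])})

    def search_early(query, user):
        """Fast, but stale if the ACL changed after indexing."""
        return [e["id"] for e in index if query in e["text"] and user in e["allowed"]]

    def search_late(query, user):
        """Always current, but pays an ACL lookup per result at query time."""
        return [e["id"] for e in index if query in e["text"] and user in ACL[e["id"]]]

    ACL["doc2"].discard("alice")  # permission revoked after indexing
    print(search_early("merger", "alice"))  # ['doc2'] -- stale snapshot leaks the hit
    print(search_late("merger", "alice"))   # []       -- live check respects revocation

The performance point Hawking makes falls out of the sketch: early binding answers from the index alone, while late binding buys correctness at the cost of a permission check on every candidate result.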

Eventually, standards and best practices may spread that will reduce these complexities. Then again, perhaps technology now changes too fast for such guidelines to take root. For now, at least, experts who can skillfully navigate this obstacle-strewn field will continue to command a pretty penny.

Cynthia Murrell, November 07, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

IBM Watson Has a Tough Question to Answer

November 6, 2014

In a sense, the erosion of a well-known company is interesting to watch. Some might use the word “bittersweet.” IBM has been struggling. Its “profits” come from stock buybacks, reductions in force, cost cutting, and divestitures. Coincident with the company’s quarterly financial reports, I heard two messages.

  1. We are not going to hit the 2015 targets we said we were going to hit.
  2. IBM paid another company money to “acquire” one of IBM’s semiconductor units.

I may have these facts wrong, but what’s important is that the messaging about IBM’s strategic health sends signals which I find troubling. IBM is a big company, and it will take time for its ultimate trajectory to be discernible. But from my vantage point in rural Kentucky, IBM has its work cut out for its thousands of professionals.

I read “Does Watson Know the Answer to IBM’s Woes?” Compared to other Technology Review write ups about IBM’s projected $10 billion revenue juggernaut, the article finally suggests that IBM’s Watson may not be the unit that produces billions in new revenue.

Here’s a passage I highlighted with my trusty yellow marker:

Watson is still a work in progress. Some companies and researchers testing Watson systems have reported difficulties in adapting the technology to work with their data sets. IBM’s CEO, Virginia Rometty, said in October last year that she expects Watson to bring in $10 billion in annual revenue in 10 years, even though that figure then stood at around $100 million.

Let’s consider this $100 million number. If it is accurate, IBM’s Watson operation is now one eighth the size of Autonomy when HP paid $11 billion for the company. It took Autonomy more than 14 years to hit that figure. In order to produce $800 million in revenue, Autonomy had to invest, license, and acquire technology and businesses. In total, Autonomy was more like an information processing holding company, not a company built on a one-trick pony like Google’s search and advertising technology. Autonomy’s revenue was diversified for one good reason: it has been very difficult to build multi-billion dollar businesses on basic search and retrieval. Google hit $60 billion because it hooked search to advertising. Autonomy generated seven times more revenue than Endeca because it was diversified. Endeca never broke out of three main product lines: ecommerce, search, and business intelligence. And Endeca never generated more than an estimated $160 million in revenue per year at the time of its sale to Oracle. Even Google’s search appliance fell short of Autonomy’s revenues. Now IBM wants to generate more money from search than Autonomy did, and in one third the time. Perhaps IBM could emulate Mike Lynch’s business approach, but even then this seems like a bridge too far. (This is a more gentle way of saying, “Not possible in 60 months.”)
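
A back-of-the-envelope calculation shows what the target demands. Growing $100 million into $10 billion over ten years requires a compound annual growth rate of

    \left(\frac{\$10\ \text{billion}}{\$100\ \text{million}}\right)^{1/10} - 1 = 100^{1/10} - 1 \approx 0.585,

roughly 58.5 percent growth sustained every year for a decade. Compressing the same 100-fold increase into 60 months pushes the requirement to 100^{1/5} - 1 ≈ 1.51, about 151 percent per year. Nothing in the history of search and content processing, Autonomy included, comes close to either figure.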

It is very difficult to generate billions of dollars from search without some amazing luck and an angle.

If IBM has $100 million in revenue, how will the company generate $1 billion and then an additional $9 billion? The PR razzle dazzle that has involved TV game shows, recipes with tamarind, and an all-out assault on mainstream media about Watson has been impressive. In search, $100 million is a pretty good achievement. But $100 million does not beget $1 billion without some significant breakthroughs in marketing, technology, must-have applications, and outstanding management.

From my point of view, Technology Review and other high profile “real” news outfits have parroted the IBM story about Watson, artificial intelligence, and curing cancer. To IBM’s credit, it has refrained from trying to cure death. Google has this task in hand.

The story includes a modest but refreshing statement about the improbability of Watson’s financial goal:

“It’s not taking off as quickly as they would like,” says Robert Austin, a professor of management at Copenhagen Business School who has studied IBM’s strategy over the years. “This is one of those areas where turning demos into real business value depends on the devils in the details. I think there’s a bold new world coming, but not as fast as some people think.”

As the story points out, “Watson is still a work in progress.”

Hey, no kidding?

Stephen E Arnold, November 6, 2014

Attivio Highlights Content Intake Issues

November 4, 2014

I read “Digesting Ingestion.” The write up is important because it illustrates how vendors with roots in traditional information retrieval like Attivio are responding to changing market demands.

The article talks about the software required to hook a source like a Web page or a dynamic information source to a content processing and search system. Most vendors provide a number of software widgets to handle frequently encountered file types; for example, Microsoft Word content, HTML Web pages, and Adobe PDF documents. However, when less frequently encountered content types turn up, a specialized software widget may be required.
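
The plumbing behind these widgets is typically a dispatch table keyed on content type. A bare-bones Python sketch of the pattern (the handler names are hypothetical, not Attivio’s API):

    # Minimal connector registry: map a MIME type to a handler that turns
    # raw bytes into indexable text. The handlers here are illustrative stubs.
    HANDLERS = {}

    def connector(mime):
        """Register a handler function for one content type."""
        def register(fn):
            HANDLERS[mime] = fn
            return fn
        return register

    @connector("text/html")
    def handle_html(raw):
        return raw.decode("utf-8", errors="replace")  # real code would strip tags

    @connector("application/pdf")
    def handle_pdf(raw):
        return "<extracted PDF text>"  # stand-in for a real PDF extractor

    def ingest(mime, raw):
        if mime not in HANDLERS:
            # Unhandled types end up in an exception queue in production systems.
            raise ValueError("no connector for " + mime)
        return HANDLERS[mime](raw)

    print(ingest("text/html", b"<p>hello</p>"))

Each version, patch level, and configuration of a source multiplies the cases a production handler must cover, which is the point Attivio makes below.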

Attivio states:

There are a number of multiplicative factors to consider from the perspective of trying to provide a high-quality connector that works across all versions of a source:

  • The source software version, including patches, optional modules, and configuration
  • Embedded or required 3rd party software (such as a relational database), including version, patches, optional modules and configuration
  • Hardware and operating system version, including patches, optional modules, and configuration
  • Throughput/capacity of the repository APIs
  • Throughput/capacity and ability to operate in parallel.

This is useful information. In a real world example, Attivio reports that a number of other factors can come into play. These range from lacking appropriate computing resources to corrupt data that connectors send to the exception folder and, my favorite, Big Data.

Attivio is to be credited for identifying these issues. Search-centric vendors have to provide solutions to these challenges. I would point out that there are a number of companies that have leapfrogged search-centric approaches to high volume content intake.

These new players, not the well known companies providing search solutions, are the next generation in information access solutions. Watch for more information about automated collection and analysis of Internet-accessible information and the firms redefining information access.

Stephen E Arnold, November 4, 2014

Connotate: Automated Data Extraction That Seems to Work Like a Traditional Alert

November 4, 2014

I found the write up “Using Automated Data Extraction to Find Out Who Makes How Much and Where They Make It” suggestive of what search systems will have to do to survive. The blog post presents information about Connotate’s automation functions.

I learned that Connotate has a client interested in gathering information about salaries. The write up reported:

They’re [the client] trying to scale up and found they could look into salaries and titles only in downtimes, and that wasn’t very often. In fact, they’ve been able to go to only a couple of websites and get information for just two job titles in two countries. But their plans call for learning about hundreds, if not thousands, of job titles across 75 countries. Since they were doing this manually, and only when time permitted, getting to where they needed to be was almost impossible.

The shift to automation as a key feature of information access is important. However, note that the client had a known problem and knew what information was required. Connotate then performed a standing query on accessible content and provided outputs to the client.
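
In other words, Connotate appears to run a standing query: fetch on a schedule, extract the fields of interest, and report what changed. A skeletal Python version with an invented URL and a deliberately crude parse (this is not Connotate’s actual system):

    import time
    import urllib.request

    WATCHED_URL = "https://jobs.example.com/listings"  # invented source
    last_snapshot = set()

    def fetch_titles(url):
        """Pull the page and pick out job-title lines (crude, illustrative parse)."""
        with urllib.request.urlopen(url) as resp:
            page = resp.read().decode("utf-8", errors="replace")
        return {line.strip() for line in page.splitlines() if "Title:" in line}

    def run_standing_query(interval_seconds=3600):
        """Re-run the query on a schedule and alert only on new items."""
        global last_snapshot
        while True:
            current = fetch_titles(WATCHED_URL)
            for new_item in sorted(current - last_snapshot):
                print("ALERT: new listing -> " + new_item)
            last_snapshot = current
            time.sleep(interval_seconds)

The hard part, as the next paragraph notes, is not this loop; it is knowing which fields to watch in the first place.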

However, what about clients who do not know what information is germane to their business? How can automation that mimics knowing what to look for assist with pinpointing unknowns?

Search vendors will have to shift into a different development mode in order to provide services that deal with high volatility and unknowns in today’s business climate.

Stephen Arnold, November 4, 2014

Overreliance on Search Engines?

November 3, 2014

The Salon.com article titled “Google Makes Us All Dumber: The Neuroscience of Search Engines” probes the ever-increasing reliance on search engines and finds that the way we use them is problematic. The problem lies in the way our brains respond to this simplified question and answer process. The article stipulates that the harder we work for knowledge, the more likely we are to retain it. When getting an answer is as simple as typing in a search query and reading a simple answer, we will forget the answer as easily as we found it. The article explains,

“It’s not that the Internet is making us stupid or incurious. Only we can do that. It’s that we will only realize the potential of technology and humans working together when each is focused on its strengths — and that means we need to consciously cultivate effortful curiosity. Smart machines are taking over more and more of the tasks assumed to be the preserve of humans. But no machine, however sophisticated, can yet be said to be curious.”

We are all guilty of using Google as a shortcut to end a fight over a fact or using IMDB to quickly be reminded of that actor what’s-her-name in that movie whatdya-callit. But the article points out that leaning on search engines in this fashion will only shorten our recall and diminish our ability to ask interesting questions.

Chelsea Kerwin, November 03, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Predictive Analytics: An Older Survey with Implications for 2015

November 2, 2014

In my files I had a copy of the 2009 Predictive Analytics World survey about, not surprisingly, predictive analytics. When I first reviewed the data in the report, I noted that neither “information retrieval” nor “search” was to be found. Before the bandwagon began to roll for predictive analytics, search technology was not in the game, if I interpret the survey data correctly.

The factoid I marked was revealed in this table:

[Table from the 2009 Predictive Analytics World survey showing planned uses of predictive analytics]

The planned use of predictive analytics was for fraud detection. It appears that 64 percent of the sample planned to adopt predictive analytics for criminal or terrorist detection. The method requires filtering various types of public information, including text.

Are vendors of enterprise search and content processing systems leaders in this sector in 2014? Based on my research, content processing vendors provide the equivalent of add-in utilities to the popular systems. The hypothesis I have formulated is that traditional information retrieval companies find themselves relegated to a supporting role.

Looking forward to 2015, I see growing dominance by leaders in the cyber OSINT market. Irrelevancy awaits the traditional search vendor unable to identify and then deliver high value solutions to a high demand, high growth market sector.

Has IDC or Dave Schubmehl tracked this sector? I don’t think so. As I produce more information about this market, I anticipate some me-too activity, however.

Stephen E Arnold, November 2, 2014

Whither Bing?

October 31, 2014

I learned in “Microsoft’s Advertising Unit Shuts Down Global Agency, Creative Team in Latest Layoffs” that the latest round of cutbacks strikes at Bing ad sales. Other announcements revealed that one of the Microsoft cheerleaders for the search engine optimization crowd has been given an opportunity to find his future elsewhere. (See “Bing Severs Ties with Webmasters by Firing Duane Forrester.”)

What’s up? Well, maybe Microsoft has come to the conclusion that Google is going to get most of the money from online search.

From my vantage point in Harrod’s Creek, the shift makes it clear that what Bing was doing wasn’t making certain folks in the executive suite giddy with joy.

So, let me ask the interesting question, “Has Google claimed another Web search victim?” If that is the case, you will want to read my forthcoming article in Information Today about the outlook for Qwant, the French Google killer built on Pertimm technology and singled out by Eric Schmidt as a threat to all things Googley.

I know it is not popular to suggest that the Google is a monopoly, but if Microsoft is not committed to pumping money into Bing, who will challenge the balloon-crazed fighters of death in Mountain View?

How often do you use Jike, iSeek, Ixquick, or Yandex—or Bing?

Stephen E Arnold, October 31, 2014
