January 6, 2014
I follow two or three LinkedIn groups. Believe me. The process is painful. On the plus side, LinkedIn’s discussions of “enterprise search” reveal the broken ribs in the body of information retrieval. On the surface, enterprise search and content processing appear to be fit and trim. The LinkedIn discussion X-ray reveals some painful and potentially life-threatening injuries. Whether it is marketing professionals at search vendors or individuals with zero background in information retrieval, the discussions often give me a piercing headache.
The eruption of digital information posed a challenge to UK firms in Autonomy’s “Information Black Holes” report. © Autonomy, 1999
One of the “gaps” in the enterprise search sector is a lack of historical perspective. Moderators and participants see only the “now” of their search work. When looking down the information highway, the LinkedIn search group participants strain to see bright white lines. Anyone who has driven on the roads in Kentucky knows that lines are neither bright nor white. Most are faded, mere suggestions of where the traffic should flow.
In 1999, I picked up a printed document called “Information Black Holes.” The subtitle was this question, “Will the Evolution of EIPs Save British Business £17 Billion per Year?” The author of the report was an azure chip consulting firm doing business as “Continental Research.” The company sponsoring the research was Autonomy. Autonomy as a concept relates to “automatic,” “automation,” and “autonomous.” This connotation is a powerful one. Think “automation” and the mind accepts an initial investment followed by significant cost reductions. Autonomy had a name and brand advantage from its inception. Who remembers Cambridge Neurodynamics? Not many of the 20-somethings flogging search and content processing systems in 2014, I would wager.
As you may know, Hewlett Packard purchased Autonomy in 2011. I doubt that HP has a copy of this document, and I know that most of the LinkedIn enterprise search group members have not read the report. I understand because 15-year-old marketing collateral (unlike Kentucky bourbon) does not often improve with age. But “Information Black Holes” is an important document. Unwittingly, today’s enterprise search vendors are addressing many of the topics set forth in the 1999 Autonomy publication.
December 20, 2013
One of the ArnoldIT goslings called to my attention a 2011 PDF white paper with the title (I kid you not):
Human inFormation (sic): Cloud, pan enterprise search, automation, video search, audio search, discovery, infrastructure platform, Big Data, business process management, mobile search, OEMs, and advanced analytics.
I checked on December 19, 2013, and this PDF was available at http://bit.ly/19Vwkqg.
That covers a lot of ground even for HP with or without Autonomy. The analysis includes some “factoids”; for example:
- Unstructured data represents 85% of all information, but structured information is growing at 22% CAGR.
- Unstructured information is growing at 62% CAGR.
- Users upload 35 hours of video every minute.
- Unstructured data will grow to over 35 zettabytes by 2020.
- Videos on YouTube were viewed 2 billion times per day, 20 times more than in 2006.
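Several of these figures are compound annual growth rates, and it is worth pausing over how quickly such numbers compound. Here is a minimal sketch; the starting volume is an arbitrary placeholder of mine, not a figure from the HP paper.

```python
def project_cagr(start_volume: float, cagr: float, years: int) -> float:
    """Project a quantity forward at a compound annual growth rate."""
    return start_volume * (1 + cagr) ** years

# Illustrative run: 100 units of data growing at the 62% CAGR cited for
# unstructured information roughly doubles every 18 months, so five
# years multiplies the volume by more than a factor of ten.
print(round(project_cagr(100, 0.62, 5), 1))
```

Plug in the 22% figure for structured information and five years yields less than a threefold increase, which is the gap the white paper is trading on.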
You get the idea. With lots of data, information is a problem. I need to pause a moment and catch my breath.
Well, “it’s not just about search.” Again, I must pause. One Mississippi, two Mississippi, and three Mississippi. Okay.
Fundamentally, the ability to understand meaning and automatically process information is all about distance, probabilities, relativeness (sic), definitions, slang, and more. It is an overwhelming and continually growing problem that requires advanced technology to solve.
One technique is to use structured data methods to solve the unstructured problem. (Wasn’t this the approach taken by Fulcrum Technologies, what, 25 or 30 years ago?) I just read a profile of Fulcrum that suggested Fulcrum did this first and continues chugging along within the OpenText product lineup, which competes directly with HP in information archiving.
HP points out, “People are Lazy.” More interesting is this observation, “People are stupid.” I thought about HP’s write off of billions after owning a company for a couple of years, but I assume that HP means “other people” are stupid, not HP people.
December 15, 2013
If you are interested in “artificial intelligence” or “artificial general intelligence”, you will want to read “Creative Blocks: The Very Laws of Physics Imply That Artificial Intelligence Must Be Possible. What’s Holding Us Up?” Artificial General Intelligence is a discipline that seeks to replicate the human brain in a computing device.
Dr. Deutsch asserts:
I cannot think of any other significant field of knowledge in which the prevailing wisdom, not only in society at large but also among experts, is so beset with entrenched, overlapping, fundamental errors. Yet it has also been one of the most self-confident fields in prophesying that it will soon achieve the ultimate breakthrough.
Debates about making a machine’s brain work like a human’s have, says Dr. Deutsch:
split the intellectual world into two camps, one insisting that AGI was none the less impossible, and the other that it was imminent. Both were mistaken. The first, initially predominant, camp cited a plethora of reasons ranging from the supernatural to the incoherent. All shared the basic mistake that they did not understand what computational universality implies about the physical world, and about human brains in particular. But it is the other camp’s basic mistake that is responsible for the lack of progress. It was a failure to recognize that what distinguishes human brains from all other physical systems is qualitatively different from all other functionalities, and cannot be specified in the way that all other attributes of computer programs can be. It cannot be programmed by any of the techniques that suffice for writing any other type of program. Nor can it be achieved merely by improving their performance at tasks that they currently do perform, no matter by how much.
One of the examples Dr. Deutsch invokes is IBM’s game show “winning” computer Watson. He explains:
Nowadays, an accelerating stream of marvelous and useful functionalities for computers are coming into use, some of them sooner than had been foreseen even quite recently. But what is neither marvelous nor useful is the argument that often greets these developments, that they are reaching the frontiers of AGI. An especially severe outbreak of this occurred recently when a search engine called Watson, developed by IBM, defeated the best human player of a word-association database-searching game called Jeopardy. ‘Smartest machine on Earth’, the PBS documentary series Nova called it, and characterized its function as ‘mimicking the human thought process with software.’ But that is precisely what it does not do. The thing is, playing Jeopardy — like every one of the computational functionalities at which we rightly marvel today — is firmly among the functionalities that can be specified in the standard, behaviorist way that I discussed above. No Jeopardy answer will ever be published in a journal of new discoveries. The fact that humans perform that task less well by using creativity to generate the underlying guesses is not a sign that the program has near-human cognitive abilities. The exact opposite is true, for the two methods are utterly different from the ground up.
IBM surfaces again with regard to playing chess, a trick IBM demonstrated years ago:
Likewise, when a computer program beats a grandmaster at chess, the two are not using even remotely similar algorithms. The grandmaster can explain why it seemed worth sacrificing the knight for strategic advantage and can write an exciting book on the subject. The program can only prove that the sacrifice does not force a checkmate, and cannot write a book because it has no clue even what the objective of a chess game is. Programming AGI is not the same sort of problem as programming Jeopardy or chess.
After I read Dr. Deutsch’s essay, I refreshed my memory about Dr. Ray Kurzweil’s view. You can find an interesting essay by this now-Googler in “The Real Reasons We Don’t Have AGI Yet.” The key assertions are:
The real reasons we don’t have AGI yet, I believe, have nothing to do with Popperian philosophy, and everything to do with:
- The weakness of current computer hardware (rapidly being remedied via exponential technological growth!)
- The relatively minimal funding allocated to AGI research (which, I agree with Deutsch, should be distinguished from “narrow AI” research on highly purpose-specific AI systems like IBM’s Jeopardy!-playing AI or Google’s self-driving cars).
- The integration bottleneck: the difficulty of integrating multiple complex components together to make a complex dynamical software system, in cases where the behavior of the integrated system depends sensitively on every one of the components.
Dr. Kurzweil concludes:
The difference between Deutsch’s perspective and my own is not a purely abstract matter; it does have practical consequence. If Deutsch’s perspective is correct, the best way for society to work toward AGI would be to give lots of funding to philosophers of mind. If my view is correct, on the other hand, most AGI funding should go to folks designing and building large-scale integrated AGI systems.
These discussions are going to be quite important in 2014. As search systems do more thinking for the human user, disagreements that appear to be theoretical will have a significant impact on what information is displayed for a user.
Do users know that search results are shaped by algorithms that “think” they are smarter than humans? Good question.
Stephen E Arnold, December 15, 2013
December 13, 2013
I have been working through some of the archives in my personal file about search vendors. I came across a wonderfully amusing article from DMReview: “The Problem with Unstructured Data.”
Here’s the part I circled in 2003, a decade ago, about the next big thing:
Content intelligence is maturing into an essential enterprise technology, comparable to the relational database. The technology comes in several flavors, namely: search, classification and discovery. In most cases, however, enterprises will want to integrate this technology with one or more of their existing enterprise systems to derive greater value from the embedded unstructured data. Many organizations have identified high-value, content intelligence-centric applications that can now be constructed using platforms from leading vendors. What will make content intelligence the next big trend is how this not-so-new set of technologies will be used to uncover new issues and trends and to answer specific business questions, akin to business intelligence. When this happens, unstructured data will be a source of actionable, time-critical business intelligence.
I can see this paragraph appearing without much of a change in any one of a number of today’s vendors’ marketing collateral.
I just finished an article about the lack of innovation in search and content processing. My focus in that essay was from 2007 to the present. I will keep my eyes open for examples of jargon and high-flying buzzwords that reach even deeper into the forgotten past of search and retrieval.
The chit chat on LinkedIn about “best” search system is a little disappointing but almost as amusing as this quote from DM Review. Yep, “content intelligence” was the next big thing a decade ago. I suppose that “maturing” process is like the one used for Kentucky bourbon. No oak barrels, just hyperbole, for the search mavens.
Stephen E Arnold, December 13, 2013
December 12, 2013
Short honk. I came across an interesting marketing concept in “Diffbot and Semantria Join to Find and Parse the Important Text on the ‘Net (Exclusive).”
Semantria (a company that offers sentiment analysis as a service) participated in a hackathon in San Francisco. The article explains:
To make the Semantria service work quickly, even for text-mining novices, Rogynskyy’s team decided to build a plugin for Microsoft’s popular Excel spreadsheet program. The data in a spreadsheet goes to the cloud for processing, and Semantria sends back analysis in Excel format.
Semantria sponsored a prize for the best app. Diffbot won:
A Diffbot developer built a simple plugin for Google’s Chrome browser that changes the background color of messages on Facebook and Twitter based on sentiment — red for negative, green for positive. The concept won a prize from Semantria, Rogynskyy said. A Diffbot executive was on hand at the hackathon, and Rogynskyy started talking with him about how the two companies could work together.
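The coloring trick described in that passage is simple to sketch. The function below is my own illustration of the idea, not Diffbot’s actual plugin code, and the score thresholds are assumptions.

```python
def sentiment_color(score: float) -> str:
    """Map a sentiment score in [-1.0, 1.0] to a background color,
    mimicking the red-for-negative, green-for-positive scheme the
    article describes. The +/-0.1 neutral band is an assumption."""
    if score <= -0.1:
        return "red"
    if score >= 0.1:
        return "green"
    return "white"  # neutral: leave the message uncolored

print(sentiment_color(0.8))   # a clearly positive message -> green
print(sentiment_color(-0.5))  # a clearly negative message -> red
```

The real work, of course, is producing the score in the first place, which is where a service like Semantria comes in.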
I like the “sponsor,” “winner,” and “team up” approach. The payoff, according to the article, is: “While Semantria and Diffbot technologies continue to be available separately, they can now be used together.”
Sentiment analysis is one of the search submarkets that caught fire and then, based on the churning at some firms like Attensity, may be losing some momentum. Marketing innovation may be a goal for other firms offering this functionality in 2014.
Stephen E Arnold, December 12, 2013
December 12, 2013
The article titled Three Steps for Crushing Multi-Location Search on Search Engine Land offers tips on the “local market opportunity,” aka multi-location businesses taking advantage of local coverage in all of the areas they service. The first tip is to know your local market coverage: identify all of the areas you might be missing out on, compile search volume data as well as average order value, and do some fancy mathematical footwork to understand more clearly where you stand to gain the most in terms of first-page coverage on search engines. The second tip is to optimize your business listings.
The article states:
“Beef up your listings with as much data as you can provide — directions, payments accepted, localized description, categories, images, local coupons, photos, social network links and links to individual store pages can really make your listing stand out. I call it good data fidelity. This data — when accurate, current and consistent across locations — helps search engines deliver optimum results to user queries. And search engines live or die by delivering a good user experience through accurate results.”
The third and final suggestion is to keep the bulk and manual feeds for local maps through Google Plus, Bing Business, and Yahoo up to date and accurate. This is all sound advice, but it was surprising to see that the author left out a major tip: buy Google ads.
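The “fancy mathematical footwork” in the first tip usually amounts to ranking locations by expected revenue from captured search traffic. Here is one plausible version of that arithmetic; every input value and the formula itself are my assumptions for illustration, not anything from the Search Engine Land piece.

```python
def local_opportunity(monthly_searches: int, click_share_gain: float,
                      conversion_rate: float, avg_order_value: float) -> float:
    """Estimate monthly revenue upside for one location: searches you
    could newly capture, times conversion rate, times order value."""
    return monthly_searches * click_share_gain * conversion_rate * avg_order_value

# Hypothetical market: 5,000 searches/month, 20% more clicks from
# first-page coverage, 3% conversion, $40 average order value.
print(round(local_opportunity(5000, 0.20, 0.03, 40.0), 2))
```

Running this per location and sorting by the result is one way to see “where you stand to gain the most.”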
Chelsea Kerwin, December 12, 2013
December 4, 2013
I suppose the notion of a biased sample is of little interest to the search and content processing mavens, poobahs, and vendors who want inputs. I just received from SearchBlox a request to provide information via a survey. I have seen survey requests from LinkedIn people, unknown PR types, and search vendors.
Here’s what the SearchBlox email displayed for me:
Are these surveys useful? In my experience, no. The dry stuff presented in Statistics 101 about sample selection is taught for a reason. Results from “please, respond” surveys are easily spoofed, distorted, or plain wrong.
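The Statistics 101 point is easy to demonstrate: when the chance of responding correlates with the opinion being measured, the sample mean drifts away from the population mean. The toy numbers below are mine, purely to illustrate the mechanism of self-selection.

```python
# Population satisfaction scores, 1 (unhappy) through 10 (delighted),
# one person per score, so the true mean is 5.5.
scores = list(range(1, 11))
population_mean = sum(scores) / len(scores)

# Self-selection: assume the probability of answering a "please respond"
# survey is proportional to satisfaction (happy users love to respond).
# The expected respondent mean is then a score-weighted mean.
respondent_mean = sum(s * s for s in scores) / sum(scores)

print(population_mean)   # 5.5
print(respondent_mean)   # 7.0 -- the survey overstates satisfaction
```

No amount of respondents fixes this; the bias is baked into who answers, which is why sample selection gets a whole lecture in Statistics 101.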
SearchBlox offers a cloud search solution. Last I heard it was based on ElasticSearch’s technology. If you want to know more about a search vendor with no recollection of sampling methodology, navigate to www.searchblox.com.
Stephen E Arnold, December 4, 2013
December 4, 2013
I read “You Are the Query: Yahoo’s Bold Quest to Reinvent Search.” The write up explains that “search” is important to Yahoo. The buzzwords personalization and categorization make an appearance. There is no definition of “search.” So the story suggests that the new direction may be a “feed”, a stream of information. The passage I noted is:
So what is Yahoo building? To wit, the company is working on a new “personalization platform,” according to the LinkedIn profile of one Yahoo senior director. Cris Luiz Pierry, the director who headed up Yahoo’s now-shuttered Flipboard clone Livestand, writes that he is heading up a “stealth project,” and that he is “building the best content discovery and recommendation engine on the Web, across all of our regions.” Pierry also has an in-the-weeds search background, with experience in core Web search, ranking algorithms, and e-commerce software — which may come in handy when dealing with monetization.
A stealth search project. Didn’t Fulcrum Technologies operate in this way between 1983 and its run up to a much needed initial public offering in the early 1990s? Wasn’t the newcomer SRCH2 in stealth mode earlier in 2013?
The hook to the new approach may be nestled within this comment in the article:
That search experience would likely be layered on top of another company’s Web crawler, like Microsoft’s Bing, which took over those operations for Yahoo in 2010, as part of a 10-year deal. (More on that later.) Beginning in 2008 …
Indexing the Web is an expensive proposition. No commercial publisher can afford it. Google is able to pull it off via its Yahoo-inspired ad model. Yandex is struggling to find monetization methods that allow it to keep its indexes fresh. But other Web indexers have had to cut back on coverage. Exalead’s Web index is thin gruel. Blekko has lost its usefulness for me. In fact, looking for information is now more difficult than it has been for a number of years.
Another interesting comment in the article jumped off the screen for me; to wit:
“We firmly believe that the Search Product of tomorrow will not be anything alike [sic] the product that we are used to today,” says the job description for the search architect. The posting also name-checks Search Direct, Yahoo’s version of Google Instant, as the “first step” in changing the landscape of search. After testing out a few queries on Yahoo’s home page, the feature, which looks up queries without requiring the user to hit “search,” looks to be dormant.
The write up concludes with this speculative paragraph:
Some theories: The company could be planning a Bing exit strategy for 2015 or earlier, and look to partner with another Web crawler, aka Google. Some reports have said Mayer has been cozying up to her former company on that front. Or Yahoo could be rebuilding its own core search capabilities, though that’s the unlikeliest of scenarios because that would be a nightmare for the company’s margins. Or Yahoo could even be beefing up its team just enough to gain more authority within the Bing partnership, in case it wanted to advise Bing on what to do on the back end.
What I find interesting is that the term “search” is not really defined in this write up or in most of the information I see that addresses findability. I am not sure what “search” means for Yahoo. The company has a history of listing sites by categories. Then the company indexed Web sites. Then the company used other vendors’ results. What’s next? I am not sure.
Observations? I have a few:
First, anyone looking for specific information has a tough job on their hands today. In a conversation with two experts in information retrieval, both mentioned that finding historical information via Web search systems was getting more difficult.
Second, queries run by different researchers return different results. The notion of comparative searching is tricky.
Third, with library funding shrinking, access to commercial databases is dwindling. For example, in Kentucky, patrons cannot locate a company news release from the 1980s using public library services.
The article about Yahoo is less about search and more about public relations. Is Yahoo or any vendor able to do something “new” in search? Without defining the term “search,” does it matter to the current generation of experts?
Personally I don’t want to influence a query. I want to locate information that is germane to a query that I craft and submit to an information retrieval system. Then I want to review results lists for relevant content and I want to read that information, analyze the high value information, synthesize it, and move on about my business.
I want to control the query. I don’t want personalization, feeds, or predictive analytics clouding the process. Does “search” mean thinking or taking what a company wants to provide to advance its own agenda?
Stephen E Arnold, December 4, 2013
November 27, 2013
The subtitle is the keeper, however: “No, I’m not insane.” The insane person is Wim Nijmeijer or Nicky Singh. Interesting semantic connection to either entity, I believe. I learned this “insanity” stuff in a candidate chunk of possible PR ersatz at http://goo.gl/ogVgIe. Since the publication of the New York Times’ story about Vocus and its PR spam, I have started a collection of search vendor messaging that may be a trifle light in the protein department.
Here’s the passage I noted:
Today Coveo announced that it will lead a session at Search Solutions 2013 on Wednesday, November 27 in London, UK.
No problem except that Coveo itself announces that its staff will explain the nuances behind “No, I’m not insane.” A third party “voice” might help.
There were some supporting “facts”. Here’s an example of a fact:
The reality is that many enterprise search implementations are far from simple, and often match the complexity of the systems they need to interface with. Coveo understands the complexity and challenges of enterprise search. Our revolutionary Search & Relevance Technology securely connects with all of an organization’s systems, and harnesses big, fragmented data from any combination of cloud, social and on-premise systems — without complex integrations.
Okay. Okay, well, “facts” may be too strong a word. I think the “revolutionary” and the “all” are going to be tough for me to accept. In a large organization, figuring out what not to make available can be time consuming in my experience. Toss in the information that will cause the company to feel a bit of heat, and you have some heavy lifting.
For instance, is “all” possible in today’s regulated environment? What about employee medical records, documents related to secret contracts and research work, salary information, clinical trial data, information related to a legal matter, and “any combination of cloud, social, and on premises systems”? Insane? Okay.
Well, maybe Coveo can deliver?
- On a conference call with an enterprise search vendor, I pointed out that marketing enterprise solutions has changed. Hyperbole and cheerleading have replaced the more mundane information that answers such questions as, “Will this system work?” There continues to be skepticism in some circles about the claims of search vendors.
- Sending messages about oneself is interesting, but even Paris Hilton and Lady Gaga employ publicists. Sure, Lady Gaga uses a drone dress to get media coverage, but she doesn’t issue a news release that says, “No, I’m not insane.”
- Enterprise search groups on LinkedIn are struggling with the question, “Why do vendors get fired?” The reason goes back to the days of Verity. That company charted the course that many vendors wittingly or unwittingly followed; that is, promise absolutely anything to get the job. The legacy of Verity’s mind-boggling complexity is marketing assertions that enterprise search works and can be up and running in a day.
Not even Google can make that eight hour assertion stick for the new Google Search Appliance with 100 percent confidence in my experience.
Anyway, by the time you read this, the lecture “No, I’m not insane” by a Coveo expert will be over. I suppose I can catch the summary in the Guardian. Stop the presses.
Stephen E Arnold, November 27, 2013
November 11, 2013
The article on ClickZ titled Alice Through the Looking Glass: Augmented Reality in the Real World introduces a new discipline that involves capturing vision behavior. The author cites both Google Glass and Qualcomm Vuforia as technologies capable of Augmented Reality (AR): they capture the user’s vision and, as a result, improve his or her engagement. The article explains:
“Like Alice Through The Looking Glass, we become visitors navigating through the real AR world, which is not unlike charting visitor conversion paths in a website from the home page to the checkout confirmation page. The basic idea of augmented reality is to superimpose graphics, audio and other sensory enhancements over a real-world environment in real-time.”
How is this useful in business? The author explains his testing and research with a thorough example, following a user of AR through a store, seeing what they spend time looking at (the longer they look, the higher their engagement level), and then perhaps offering a discount at checkout for sharing an image of the product they are purchasing. The research has been effective in real-world usage of AR, citing sporting goods purchases, movie tickets sold, and games purchased. It is easy to see how this technology might be a very powerful resource for marketing through the customer, but what has yet to be explained is how one might search the data being compiled.
Chelsea Kerwin, November 11, 2013