AI’s Newest Hurdle Happens When the Machines Hallucinate

November 27, 2017

Artificial Intelligence has long been thought of as an answer to airport security and other areas. The idea of intelligent machines finding the bad guys is a good one in theory. But what if the machines aren’t as clever as we think? A stunning new article in The Verge, “Google’s AI Thinks This Turtle is a Gun and That’s a Problem,” made us sit up and take notice.

As you can guess by the title, Google’s AI made a huge flub recently:

This 3D-printed turtle is an example of what’s known as an “adversarial image.” In the AI world, these are pictures engineered to trick machine vision software, incorporating special patterns that make AI systems flip out. Think of them as optical illusions for computers. You can make adversarial glasses that trick facial recognition systems into thinking you’re someone else, or can apply an adversarial pattern to a picture as a layer of near-invisible static. Humans won’t spot the difference, but to an AI it means that panda has suddenly turned into a pickup truck.

This adversarial image news is especially concerning when you consider how quickly airports are implementing this technology. Dubai International airport is already using self-driving carts for luggage. It’s only a matter of time until security screening goes the same way. You’d best hope they iron out adversarial image issues before we do.

Patrick Roland, November 27, 2017

Free Services: What Happens When They Are Killed Off?

November 3, 2017

In the salad days of online, one paid for “time” (the online connection) and one paid for the “content” (the citations, data, full text). Today data are free. Hooray.*

For users of the Google flight information, the news that Google was likely to shut down its flight data feed is bad news. Even worse, those nifty MBA inspired spreadsheets which happily omitted the cost of flight data are going to have to be re-imagined.

And Oath (remember Yahoo?) is, it seems, going to cut off the finance, if the story in Hacker News is accurate. The write up states:

Yahoo Finance has apparently killed is API. Zero warning. Lots of apps probably use this. Before, you could get stock information by using Now, you get the following message: It has come to our attention that this service is being used in violation of the Yahoo Terms of Service. As such, the service is being discontinued. For all future markets and equities data research, please refer to What violation of TOS? People have been using this for years without any issues. If you are going to cut this off, how about a warning and heads up? Guess that’s what we should expect from OATH / Verizon.

The comments are interesting.

Net net: The online model from the 1969 to 1995 phase of online may be poking its nose from a Rip Van Winkle snooze.

And those spreadsheets? MBAs are crafty. The numbers will work out—at least in Excel. In real life? Hmmm. Good question.

Stephen E Arnold, November 3, 2017

* Editor’s update: Heads up. Last night (November 3, 2017) I received an impassioned and mom-like communication from a person who wanted confidentiality about the information he was about to impart via Gmail email. (Isn’t that type of email parsed by smart software for the purpose of collecting ad revenue and data?) The alleged former Googler (aka Xoogler) was unaware that I was at dinner with my wife enjoying a grilled squirrel burger with the cheese on the bottom in the approved Google manner. But this write up was an urgent matter in the mind of the agitated Xoogler eager to share confidential information with me. Lucky me! The email included numbers and a statement that I had to rewrite this article because I was, as I have noted on numerous occasions in the course of this 10 year old Beyond Search blog, an “addled goose”. The email made clear that killing Google services and products does no harm, and I was wrong, incorrect, off base, and a Bambi brained deer. Please, check out the source story from Marketwatch. Make up your own mind, gentle reader, because I try to present my opinion whilst separating the giblets from the goosefeathers. My view is that abrupt, unilateral modifications of services are a good thing for some developers and users. But I do enjoy confidential communications about the inner workings of my favorite search engine as I munch my burger with cheese on the bottom in the Sundar Pichai approved manner. Plus, I enjoy recalling the Google Reader, Google Talk, Google Health, Knol, Google Buzz, and my favorite and the fave of some Brazilians, Orkut. You don’t? Well, you, unlike me, are not trying to be Googley. To refresh your memory, check out the Google Graveyard. Do you have a problem with terminated services? In my opinion, termination with extreme prejudice is in your best interests. Now put the cheese on the bottom of the meat patty.

Enterprise Search Revisionism: Can One Change What Happened

March 9, 2016

I read “The Search Continues: A History of Search’s Unsatisfactory Progress.” I noted some points which, in my opinion, underscore why enterprise search has been problematic and why the menagerie of experts and marketers have put search and retrieval on the path to enterprise irrelevance. The word that came to mind when I read the article was “revisionism” for the millennials among us.

The write up ignores the fact that enterprise search dates back to the early 1970s. One can argue that IBM’s Storage and Information Retrieval System (STAIRS) was the first significant enterprise search system. The point is that enterprise search as a productized service has a history of over promising and under delivering of more than 40 years.

Enterprise search with a touch of Stalinist revisionism.

Customers said they wanted to “find” information. What those individuals meant was having access to information that provided the relevant facts, documents, and data needed to deal with a problem.

Because providing on point information was and remains a very, very difficult problem, the vendors interpreted “find” to mean a list of indexed documents that contained the users’ search terms. But there was a problem. Users were not skilled in crafting queries which were essentially computer instructions between words the index actually contained.
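To make the point about queries as computer instructions concrete, here is a minimal Python sketch. It is not STAIRS or any vendor’s code. The three-document corpus and the helper names build_index and term are invented for illustration. It shows why a Boolean query pays off only when the user guesses words the index actually contains.

```python
from collections import defaultdict

# Hypothetical three-document corpus, invented for illustration.
DOCS = {
    1: "quarterly sales report for the widget division",
    2: "widget recall notice and safety report",
    3: "holiday party planning memo",
}


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


INDEX = build_index(DOCS)


def term(word):
    """Return ids of documents containing the exact word (empty set if not indexed)."""
    return INDEX.get(word.lower(), set())


# Boolean operators are just set operations between index terms.
report_and_widget = term("report") & term("widget")  # AND
report_not_recall = term("report") - term("recall")  # NOT
revenue = term("revenue")  # a sensible word that the index does not contain
```

The user who types “revenue” gets nothing back even though document 1 is about sales. The index matches words, not intent.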

After STAIRS came other systems, many other systems which have been documented reasonably well in Bourne and Bellardo-Hahn’s A History of Online Information Services 1963-1976. (The period prior to 1970 describes for-fee research centric online systems. STAIRS was among the most well known early enterprise information retrieval systems.) I provided some history in the first three editions of the Enterprise Search Report, published from 2003 to 2007. I have continued to document enterprise search in the Xenky profiles and in this blog.

The history makes painful reading for those who invested in many search and retrieval companies and for the executives who experienced the crushing of their dreams and sometimes career under the buzz saw of reality.

In a nutshell, enterprise search vendors heard what prospects, workers overwhelmed with digital and print information, and unhappy users of those early systems were saying.

The disconnect was that enterprise search vendors parroted back marketing pitches that assured enterprise procurement teams of these functions:

  • Easy to use
  • “All” information instantly available
  • Answers to business questions
  • Faster decision making
  • Access to the organization’s knowledge.

The result was a steady stream of enterprise search product launches. Some of these were funded by US government money like Verity. Sure, the company struggled with the cost of infrastructure the Verity system required. The work arounds were okay as long as the infrastructure could keep pace with the new and changed word-centric documents. Tossing in other types of digital information, making the system perform ever faster indexing, and keeping the Verity system responding quickly was another kettle of fish.

Research oriented information retrieval experts looked at the Verity type system and concluded, “We can do more. We can use better algorithms. We can use smart software to eliminate some of the costs and indexing delays. We can [ fill in the blank ].”

The cycle of describing what an enterprise search system could actually deliver was disconnected from the promises the vendors made. As one moves through the decades from 1973 to the present, the failures of search vendors made it clear that:

  1. Companies and government agencies would buy a system, discover it did not do the job users needed, and buy another system.
  2. New search vendors picked up the methods taught at Cornell, Stanford, and other search-centric research centers and wrapped on additional functions like semantics. The core of most modern enterprise search systems is unchanged from what STAIRS implemented.
  3. Search vendors like Convera came, failed, and went away. Some hit revenue ceilings and sold to larger companies looking for a search utility. The acquisitions hit a high water mark with the sale of Autonomy (a 1990s system) to HP for $11 billion.

What about Oracle, as a representative outfit? Oracle database has included search as a core system function since the day Larry Ellison envisioned becoming a big dog in enterprise software. The search language was Oracle’s version of the structured query language. But people found that difficult to use. Oracle purchased Artificial Linguistics in order to make finding information more intuitive. Oracle continued to try to crack the find information problem through the acquisitions of Triple Hop, its in-house Secure Enterprise Search, and some other odds and ends until it bought in rapid succession InQuira (a company formed from the failure of two search vendors), RightNow (technology from a Dutch outfit RightNow acquired), and Endeca. Where is search at Oracle today? Essentially search is a utility and it is available in Oracle applications: customer support, ecommerce, and business intelligence. In short, search has shifted from the “solution” to a component used to get started with an application that allows the user to find the answer to business questions.

I mention the Oracle story because it illustrates the consistent pattern of companies which are actually trying to deliver information that the user of a search system needs to answer a business or technical question.

I don’t want to highlight the inaccuracies of “The Search Continues.” Instead I want to point out the problem buzzwords create when trying to understand why search has consistently been a problem and why today’s most promising solutions may relegate search to a permanent role of necessary evil.

The write up concludes with the notions of answering questions, analytics, federation (that is, running a single query across multiple collections of content and file types), the cloud, and system performance.


The use of open source search systems means that good enough is the foundation of many modern systems. Palantir-type outfits, essentially enterprise search vendors describing themselves as “intelligence” providing systems, use open source technology in order to reduce costs and shift bug chasing to a community. The good enough core is wrapped with subsystems that deal with the pesky problems of video, audio, data streams from sensors or similar sources. Attivio, formed by professionals who worked at the infamous Fast Search & Transfer company, delivers active intelligence but uses open source to handle the STAIRS-type functions. These companies have figured out that open source search is a good foundation. Available resources can be invested in visualizations, generating reports instead of results lists, and graphical interfaces which involve the user in performing tasks smart software at this time cannot perform.

For a low cost enterprise search system, one can download Lucene, Solr, SphinxSearch, or any one of a number of open source systems. There are low cost (keep in mind that costs of search can be tricky to nail down) appliances from vendors like Maxxcat and Thunderstone. One can make do with the craziness of the search included with Microsoft SharePoint.

For a serious application, enterprises have many choices. Some of these are highly specialized like BAE NetReveal and Palantir Metropolitan. Others are more generic like the Elastic offering. Some are free like the Effective File Search system.

The point is that enterprise search is not what users wanted in the 1970s when IBM pitched the mainframe centric STAIRS system, in the 1980s when Verity pitched its system, in the 1990s when Excalibur (later Convera) sold its system, in the 2000s when Fast Search shifted from Web search to enterprise search and put the company on the road to improper financial behavior, or in the efflorescence of search sell offs (Dassault bought Exalead, IBM bought iPhrase and other search vendors, and Lexmark bought Brainware and ISYS Search Software).

Where are we today?

Users still want on point information. The solutions on offer today are application and use case centric, not the silly one-size-fits-all approach of the period from 2001 to 2011 when Autonomy sold to HP.

Open source search has helped create an opportunity for vendors to deliver information access in interesting ways. There are cloud solutions. There are open source solutions. There are small company solutions. There are more ways to find information than at any other time in the history of search as I know it.

Unfortunately, the same problems remain. These are:

  1. As the volume of digital information goes up, so does the cost of indexing and accessing the sources in the corpus
  2. Multimedia remains a significant challenge for which there is no particularly good solution
  3. Federation of content requires considerable investment in data grooming and normalizing
  4. Multi-lingual corpuses require humans to deal with certain synonyms and entity names
  5. Graphical interfaces still are stupid and need more intelligence behind the icons and links
  6. Visualizations have to be “accurate” because a bad decision can have significant real world consequences
  7. Intelligent systems are creeping forward but crazy Watson-like marketing raises expectations and undermines the credibility of enterprise search’s capabilities.

I am okay with history. I am not okay with analyses that ignore some very real and painful lessons. I sure would like some of the experts today to know a bit more about the facts behind the implosions of Convera, Delphis, Entopia, and many other companies.

I also would like investors in search start ups to know a bit more about the risks associated with search and content processing.

In short, for a history of search, one needs more than 900 words mixing up what happened with what is.

Stephen E Arnold, March 9, 2016

Want to Know What Happens Online Every 60 Seconds?

December 4, 2015

I thought I knew. Time wasting, distractive behavior, and non productive behavior.

Wrong again. I read “What Happens Online Every Minute?” The document is an infographic which reveals a number of factoids. (Who knows if these are accurate or a 20 something daydream.)

  • Every minute Vine users play 1,041,666 videos. I like the precision of this number. The happenstance of the sign of the devil is a delight. Remember? 666.
  • In 60 seconds Alphabet Googlers who can probably spell “video” nine out of ten times upload 300 hours of new video. The idea is that in one minute, you have the opportunity to fritter away 300 hours of couch potato time whether in a Google self driving car, in your own car, or standing on a line to buy a slice in Manhattan.
  • In 1/60th of an hour, Twitter users send 347,222 tweets. How many of these are from marketers? No info. But again the precision of the number is outstanding. I like the 222 number which connotes faith. I have faith in Twitter. Also, 222 is a strobogrammatic number. Nifty, eh?
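The precision of those per-minute numbers can be checked with a little arithmetic. The Python sketch below multiplies the infographic’s figures by the 1,440 minutes in a day and also divides two round daily totals by the same number. The idea that the infographic started from round daily totals is an assumption. The source does not say how the figures were derived.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# Per-minute figures quoted in the infographic.
vine_plays_per_minute = 1_041_666
tweets_per_minute = 347_222

# Scale the per-minute figures up to a full day.
vine_plays_per_day = vine_plays_per_minute * MINUTES_PER_DAY
tweets_per_day = tweets_per_minute * MINUTES_PER_DAY

# Assumed round daily totals (not stated in the source), divided back down.
assumed_vine_daily_total = 1_500_000_000
assumed_tweet_daily_total = 500_000_000
vine_from_round_total = assumed_vine_daily_total // MINUTES_PER_DAY
tweets_from_round_total = assumed_tweet_daily_total // MINUTES_PER_DAY

print(f"{vine_plays_per_day:,} Vine plays per day")
print(f"{tweets_per_day:,} tweets per day")
```

The daily figures come out to 1,499,999,040 plays and 499,999,680 tweets, a whisker under 1.5 billion and 500 million. Dividing those round totals by 1,440 and dropping the fraction gives back 1,041,666 and 347,222. If the assumption holds, the sign of the devil is a rounding artifact.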

View the original. There will be a factoid to make your day or at least a few seconds so you can get back to viewing the video goodness.

Stephen E Arnold, December 4, 2015

Whatever Happened to Social Search?

January 7, 2015

Social search was supposed to integrate social media and regular semantic search to create a seamless flow of information. This was one of the major search points for a while, yet it has not come to fruition. So what happened? TechCrunch reports that it is “Good Riddance To Social Search” and with good reason, because the combination only cluttered up search results.

TechCrunch explains that Google tried Social Search back in 2009, using its regular search engine and Google+. Now the search engine mogul is not putting forth much effort in promoting social search. Bing tried something by adding more social media features, but it is not present in most of its search results today.

Why did this endeavor fail?

“I think one of the reasons social search failed is because our social media “friendships” don’t actually represent our real-life tastes all that well. Just because we follow people on Twitter or are friends with old high school classmates on Facebook doesn’t mean we like the same restaurants they do or share the politics they do. At the end of the day, I’m more likely to trust an overall score on Yelp, for example, than a single person’s recommendation.”

It makes sense considering how many people feel their social media feeds are filled with too much noise. Having search results free of the noise makes them more accurate and helpful to users.

Whitney Grace, January 07, 2015
Sponsored by, developer of Augmentext

Appen Uses Humans to Improve Non-English Search Relevance

March 21, 2014

The Appen explanation titled Query Relevance delves into the work that the language, search and social technology company has done recently to improve natural language search. Linguist PhD Julie Vonwiller founded the company in 1996 with her engineer husband Chris Vonwiller. In 2010, Appen merged with Butler Hill Group and began making strides in language resources, search, and text. The article explores the issues at hand when it comes to natural language search:

“Even a query as seemingly simple as the word “blue” could be looking for any of the following: a description or picture of the color, a television show, a credit card, a misspelling of an electronic cigarette brand, or a rap artist. By analyzing what the most likely user intent is and returning valid and appropriate results in the correct order of relevance, we encourage a relationship whereby the user will return again and again to our client’s search engine.”

Appen has established a “global network” of locals who are trained experts in the language and local culture. This team allows for the most accurate interpretations of queries from regional users. The company is continually working to improve their processes, both through collaboration with users and advances in the program to provide the best possible results.

Chelsea Kerwin, March 21, 2014

Sponsored by, developer of Augmentext

What is Happening with Natural Language Processing?

May 29, 2013

Why Are We Still Waiting for Natural Language Processing, an article in The Chronicle of Higher Education, explores the failure of the 21st century to produce Natural Language Processing, or NLP. This would mean the ability of computers to process natural human language. The steps required are explained in the article:

“In the 1980s I was convinced that computers would soon be able to simulate the basics of what (I hope) you are doing right now: processing sentences and determining their meanings.

To do this, computers would have to master three things. First, enough syntax to uniquely identify the sentence; second, enough semantics to extract its literal meaning; and third, enough pragmatics to infer the intent behind the utterance, and thus discerning what should be done or assumed given that it was uttered.”

Currently, typing a question into Google can result in exactly the opposite information from what you are seeking. This is because it is unable to infer, since natural conversation is full of gaps and assumptions that we are all trained to leap through without failure. According to the article, the one company that seemed to be coming close to this technology was Powerset in 2008. After making a deal with Microsoft, however, their site now only redirects to Bing, a Google clone. Maybe NLP like Big Data, business intelligence, and predictive analytics is just a buzzword with marketing value.

Chelsea Kerwin, May 29, 2013

Sponsored by, developer of Augmentext

A Whatever Happened To… HP and TeraText

May 3, 2013

My Overflight for search vendors generated an odd “recent” update. The item originated from Chrlettestuvv’s Blog. The story pointed to an item called “SAIC’s TeraText Solutions Signs Strategic Alliance Agreement with HP.” The source was an article from Software Industry Report, August 1, 2005.

HP apparently needed something more than TeraText, which shared some similarities with the now forgotten iPhrase and anticipated features in MarkLogic Server today. I find these search- and content-processing related tie ups interesting.

Each time I recall one or some glitch in the Internet surfaces a partner factoid, I am more confident that search vendors and some growth hungry large corporations move from speed dating to speed dating activity. Do the engagements lead to marriages? Sometimes I suppose. Other times the companies, like boy friends and girl friends in high school, just drift apart.

Search, however, remains mostly unchanged.

Stephen E Arnold, May 3, 2013

Sponsored by Augmentext

What Happened to Google? Nothing.

October 18, 2012

I have been amusing myself with the various analyses of the Google missteps. I just got off the phone with one of my clients who asked me, “What has happened to Google?” My answer, “Nothing.”

For context, check out “Google Reports Profit, Sales That Miss Analysts’ Estimates.” The estimable Bloomberg said:

The company earlier this year spent $12.4 billion on Motorola Mobility Holdings, pushing it further into the hardware market and stepping up its rivalry with Apple Inc. (AAPL) Third-quarter total revenue, including the acquisition of Motorola Mobility, rose 45 percent from a year earlier, while expenses rose 71 percent over the same time period. Motorola Mobility contributed sales of $2.58 billion for the period. Net income declined to $2.18 billion, or $6.53 a share, from $2.73 billion, or $8.33, a year earlier.

Everything looks pretty good considering the dilution of ad precision, the lousy economy, and the lost voice of Larry Page.

Let me highlight two other points. First, the Motorola deal is going to be exciting and expensive. Second, Google operates via controlled chaos. The approach may work like a champ among rocket scientists. Among lesser souls, management is a bit more tricky. Glue together without a clamp Motorola Mobility and controlled chaos, and I think we have a pivot point for the happy crowd in Mountain View.

One more thing: those pesky regulators are not going quietly into the good night.

Stephen E Arnold, October 18, 2012

What Happens When One NOTs Out Web Sites?

April 30, 2012

We learned about Million Short. The idea is that a user can NOT out a block of Web sites. The use case is that a query is passed to the system, and the system filters out the top 1,000 or 100,000 Web sites. We think you will want to check it out. Once you have run some sample queries, consider these questions:

  1. When a handful of Web sites attract the most traffic, is popularity the road to comprehensive information retrieval?
  2. When sites are NOTted out, what do you gain? What do you lose?
  3. How do you know what is filtered from any Web search index? Do you run test queries and examine the results?
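For readers who want to experiment with the NOT-out idea on their own result lists, here is a minimal Python sketch. It is not Million Short’s implementation, which is not described here. The popularity table SITE_RANK, the function name not_out, and the example domains are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical popularity ranking (1 = most visited), invented for illustration.
SITE_RANK = {
    "bigportal.example": 1,
    "megashop.example": 2,
    "nichewiki.example": 250_000,
}


def not_out(results, top_n, site_rank=SITE_RANK):
    """Drop results whose host ranks within the top_n most popular sites.

    Hosts missing from the ranking table are treated as unpopular and kept.
    """
    kept = []
    for url in results:
        host = urlparse(url).hostname
        rank = site_rank.get(host)
        if rank is None or rank > top_n:
            kept.append(url)
    return kept


results = [
    "https://bigportal.example/widgets",
    "https://nichewiki.example/widget-history",
    "https://megashop.example/buy-widgets",
    "https://tinyblog.example/my-widget-collection",
]

filtered = not_out(results, top_n=1_000)
```

With top_n set to 1,000 the two big sites drop out and the niche wiki and the unranked blog remain. Comparing results with filtered speaks to the second question above: what is gained and what is lost.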

Enjoy. If you know a librarian, why not involve that person in your test queries?

Stephen E Arnold, May 1, 2012

Sponsored by PolySpot
