How to End Google’s Search Monopoly if You Want To

August 29, 2014

The MakeUseOf article “Help End Google’s Search Monopoly: Use Something Else” implores Internet users to consider alternatives for search on the basis of a very simple concept: monopolies are bad. Without a doubt, Google is a monopoly, with the Chinese Baidu in a lagging second place. Interestingly, the main target of the article is the amount of power this gives Google, not Google itself. The article states,

“The ball is always in Google’s court – they control the search game. This breeds a culture of tailoring content to what Google wants, with the problem being that nobody really knows what this is. Most “SEO experts” will tell you they know how to get your site ranking highly, but really they have no greater insight into what goes on behind the scenes than you do.

We’re not bitter, that’s not the point of this article.”

They are referring to Panda, Google’s 2011 filter that removed lower quality content websites from searches. This benefited some sites, but it also had far-reaching negative implications for many others. This is why monopolies are bad: not because Google is inherently evil, but because it is making decisions that can affect huge numbers of people and businesses. It may be too late to recommend alternatives like DuckDuckGo, since Google is so ingrained in its users as the only option for search.

Chelsea Kerwin, August 29, 2014

Sponsored by, developer of Augmentext

Short Honk: Surveillance Database Report

August 26, 2014

I wanted to document a report that ICREACH exists. For information, see The Intercept’s report. No further comment from Beyond Search.

Stephen E Arnold, August 26, 2014

Endeca Wins Over Beauty Retailer

August 26, 2014

To overhaul the customer experience on their site, ULTA Beauty turned to Endeca. We learn of the move from Integrated Solutions for Retailers in, “Thanx Media’s Oracle Endeca and ULTA Beauty Take Customer Experience to the Next Level.” Thanx Media is ULTA’s integrated-search-solutions provider. The press release tells us:

“Oracle Endeca has replaced a third party search solution, now tightly integrating the browse and search navigation, resulting in a consistent guest experience with minimal maintenance. The previous lack of integration with the third party search solution caused discrepancies in product data (such as pricing and inventory levels between search and browse) resulting in product listing pages that didn’t always match and a process that lacked the flexibility required by the e-commerce business team.”

Those are indeed serious problems for a retail site. How did the switch pan out? The write-up makes it clear that the reseller is very, very happy. Less clear is how, exactly, the system paid off for ULTA. Aside from a tangential reference to “positive Q4 results,” we are given no details. Oh, well. At least the middleman is pleased.

Cynthia Murrell, August 26, 2014

Questioning How To Search New Sound Files

August 25, 2014

Sound is an underrated science, but it is a fascinating topic to study. MIT News reports a striking experiment in “Extracting Audio From Visual Information.” The article explains that Adobe, Microsoft, and MIT researchers developed an algorithm that can reconstruct an audio signal by analyzing minute vibrations of objects depicted in video. The team has been able to recover audible signals from the leaves of a potted plant, the surface of a glass of water, aluminum foil, and a potato-chip bag.

The sound files can be used by law enforcement organizations, but MIT graduate student Abe Davis says it creates a “new kind of imaging.”

“ ‘We’re recovering sounds from objects,’ [Davis] says. ‘That gives us a lot of information about the sound that’s going on around the object, but it also gives us a lot of information about the object itself, because different objects are going to respond to sound in different ways.’”

The team speculates that the technology community will embrace the research and amazing applications will be developed from it. The new sound technology will also create a new slew of content. How will we search the new content? A specific and exact ontology will be needed to distinguish sound files. Will a search application smart enough to read the sound data be developed to identify the user’s information need? Oh wait, enterprise search systems index “all information” so it already exists.

Whitney Grace, August 25, 2014

Launching and Scaling Elasticsearch

August 21, 2014

Elasticsearch is widely hailed as an alternative to SharePoint and to many open source search options, but it is not without its problems. Ben Hundley from StackSearch offers his input on the software in his Qbox article, “Thoughts on Launching and Scaling Elasticsearch.”

Hundley begins:

“Qbox is a dedicated hosting service for Elasticsearch.  The project began internally to find a more economical solution to Amazon’s Cloudsearch, but it evolved as we became enamored by the flexibility and power of Elasticsearch.  Nearly a year later, we’ve adopted the product as our main priority.  Admittedly, our initial attempt took the wrong approach to scale.  Our assumption was that scaling clusters for all customers could be handled in a generalized manner, and behind the scenes.”

Hundley walks the reader through several considerations that affect an implementation: knowing your application’s needs, deciding on hardware, monitoring, tuning, and knowing when to scale. These are all decisions that must be made up front, allowing for more effective customization. The upside of an open source solution like Elasticsearch is greater customization, more control, and less rigidity. Of course, for a small organization that could also be the downside: time and staffing are more limited, and an out-of-the-box solution like SharePoint is more likely to be chosen.

Emily Rae Aldridge, August 21, 2014

Google Search Has Been Improved. A Lot.

August 20, 2014

I do a lecture for the police and intelligence community. The focus is on the techniques helpful in finding information that answers a query. If a person types a query into Google, the results are ads, popular hits that others found useful, and search engine optimized content.

Consider looking for a “shotgun suppressor”. Ignore the quotes. Here are the results from August 20, 2014:


Pictures. Not too many ads. A video.

Where does one buy a shotgun suppressor? Run the query “purchase shotgun suppressor”.

The results are:


More pictures. Ads. And a couple of companies mentioned several times.

So it is easy to get information about a shotgun suppressor and buy one. Now, do some clicking and you will find that the links include auto mufflers and some other results that are off point.

To nail the real deal, a military grade suppressor, some additional work is required.

When I read “Google Made 890 Improvements To Search Over The Past Year”, I just sighed. The write up is a rah rah for Google. Here’s a passage that I highlighted:

In a Google+ post from Google head of search Amit Singhal, Google shares they have made “more than 890 improvements to Google Search last year alone.” In 2009, Google told us they made between 350 to 400 changes to search and in 2010, they said they made 550 improvements to search in the past year. Google’s Matt Cutts said in a video in 2010 they make one change per day to their core search algorithm. We also know Google tests hundreds of changes in a day but only some of them make the light of day.

Okay, run some queries. Has Google improved search, or has Google improved its methods for diffusing ads into results? My experience is that Google is great for information about Dr Dre and pizza. For other types of information, considerable effort is required to unearth useful, on point information.

By the way, the key to finding the shotgun suppressor is to use synonyms like moderator and to approach the problem using another Google service. The content is findable but I am not feeling lucky anymore.
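The synonym trick can be sketched in a few lines of Python. This is only an illustration of generating query variants by hand: the `SYNONYMS` table and the `expand_query` helper are hypothetical names, “moderator” comes from the paragraph above, and “silencer” is an added assumption. It does not call any Google service.

```python
from itertools import product

# Hypothetical synonym table. "moderator" is the synonym mentioned in the
# post; "silencer" is an illustrative assumption.
SYNONYMS = {
    "suppressor": ["suppressor", "moderator", "silencer"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return every variant of `query` with known terms swapped for synonyms."""
    # Each word maps to its list of alternatives, or to itself if unknown.
    options = [synonyms.get(term.lower(), [term]) for term in query.split()]
    # The Cartesian product yields one query string per combination.
    return [" ".join(combo) for combo in product(*options)]


variants = expand_query("shotgun suppressor")
for variant in variants:
    print(variant)
```

Each variant can then be run as its own query, which is the manual legwork the “improved” search does not do for the user.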

Since everyone is now an “expert” in search, which of the top 10 changes to Google in the last decade rings your bell? How about “universal search”? Ever wonder why books, blogs, non US content are not included in a universal search? Think about it, please.

Stephen E Arnold, August 20, 2014

Search Vendors Learn From Comcast Sales Rep

August 19, 2014

Ryan Block experienced horrible customer service while trying to disconnect his Internet service with Comcast. Comcast already had a horrible reputation for customer service, and Block’s cancellation attempt brings to light an ongoing problem within the company. TechDirt comments on the situation in the article “Behind The Veil: Comcast Techs Detail How Customer Service Is Really All Just ‘Sales.’”

The article reposts stories from The Verge where current and former employees confess their customer service stories. Their accounts amount to call center nightmares, stress, and Comcast’s drive to sell, sell, and sell! Comcast is definitely going to have future troubles.

“The question that arises with this kind of thing, particularly with Comcast operating a multi-tiered group of call centers, some outsourced, some not, is whether the company has become too unwieldy to actually meet customer requests. It’s fine for a company to work to retain customers, but that’s typically done by providing great service, not irritating the shit out of anyone who doesn’t think your company’s poop doesn’t stink. Far from too big to fail, Comcast, recently in massive merger discussions, may be getting too big to succeed.”

We’ll leave comments on illegal monopolies for another article, but this brings to mind what search vendors can take from this situation. Poor customer service equals poor client retention and fewer sales. It does not take long for customer complaints to go viral on the Internet, making reputation even more important. Search vendors offer numerous solutions to help with customer support and their products can improve a customer’s experience. Now would be a good time for search vendors to market their customer service products.

Whitney Grace, August 19, 2014

One User Finds Some Flaws in Elasticsearch

August 18, 2014

We are jazzed about Elasticsearch. Our own search expert Stephen E. Arnold, who has been yearning for some real innovation in search for years now, recently declared, “I will be telling those who attend my lectures to go with Elasticsearch. That’s where the developers and the money are.” Personally, I’m inclined to go with the search expert here (though I admit I may be a bit biased). This declaration is just to preface my reaction to a post at Sammaye’s Blog, “Things I Have Learnt in the First 5 Minutes of Using Elastic Search.” Apparently, how to spell the name correctly was not one of those things.

Still, it looks like programmer Sam Millman (aka Sammaye) may have some good points. For example, he describes the querying as “the most verbose in the universe,” balks at the requirement to define indexes client-side, and claims Lucene is a bad platform on which to base search in the first place. He also calls the documentation terrible, and bad documentation happens to be a pet peeve of mine. (I’ve written documentation. If you must supply it, you might as well make it comprehensive, organized, and well-written. It’s not that difficult.) Millman explains:

“Its documentation is great at explaining the API, no doubt about it but if you want to actually find out how something works and why something is then you have to constantly ask StackOverflow. It just describes what parameters to put in and then leaves the rest up to you thinking that you don’t want to bother yourself with those details. We do though, we are not bandwagoning your product, we want to know how sharding and replication works, how indexes work and how to manage the product and more. Even when looking at the API the documentation can sometimes be…unhelpful. Mainly due to its huge font-size, yet tiny middle centered layout, English language problems and disorganisation. Overall I came out less than impressed about Elastic Searches documentation. I actually Google search everything first so I don’t have to navigate that mess.”
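To give a sense of the verbosity Millman complains about, here is a minimal sketch of an Elasticsearch search request body built as a plain Python dictionary. The field names `title` and `status` and the `build_search_body` helper are hypothetical. The bool/match/term shape follows the Query DSL of recent Elasticsearch releases, which may differ from the version Millman tried, and no Elasticsearch client is called here.

```python
import json


def build_search_body(text, status):
    """Build a request body: full-text match on one field, exact filter on another."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause on the hypothetical "title" field.
                "must": [{"match": {"title": text}}],
                # Unscored exact-value clause on the hypothetical "status" field.
                "filter": [{"term": {"status": status}}],
            }
        },
        # Return at most ten hits.
        "size": 10,
    }


body = build_search_body("shotgun suppressor", "published")
print(json.dumps(body, indent=2))
```

Five levels of nesting for “find these words in documents with this status” is roughly the kind of thing a newcomer would need the documentation to explain.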

So, perhaps Elasticsearch is not perfect. See the article for Millman’s full roster of complaints. However, if Arnold is correct and this is “where the developers and the money are” right now, vexing problems should be fixed in short order. It would be a mistake to not take Elasticsearch seriously. Formed in 2012, the company is based in Amsterdam with offices in the U.S., the U.K., France, Germany, and Switzerland. They are also hiring as of this writing, in case anyone here wants to help them iron out some wrinkles.

Cynthia Murrell, August 18, 2014

The Guardian Explores HP Autonomy

August 16, 2014

I read “Hewlett-Packard Allegations: Autonomy Founder Mike Lynch Tries to Clear Name.” The British “real” newspaper focuses on Mike Lynch, the founder of Autonomy. I am convinced that Autonomy pitched the value of its company to a number of firms. I know that Hewlett Packard bought Autonomy. I assume that spending $11 billion was not a K Mart blue light special impulse purchase. I know that HP has had what the MBAs call “governance challenges.” These range from allegations of getting frisky with folks to management churn. I know that for me, the HP of electronic devices yielded to the HP of the ink cartridges.

Here’s a point I highlighted in the Guardian’s write up:

Meanwhile, lawyers on all sides are using legal privilege to sling mud. Lynch says it is not only his name that has been stained, but that of the British technology industry. Autonomy’s accounting and marketing methods had attracted criticism before the HP acquisition, but Lynch was also a poster child for the achievements of Cambridge’s Silicon Fen. The Autonomy affair casts a shadow, and a conclusion from the SFO is overdue.

I have a slightly different view of the dust up. Folks want to believe that information retrieval will generate another Google. Because of those expectations, executives whose expertise in search extends to running a Google search on a mobile device assume they know about content processing.

When buyers get excited about a purchase, some people buy Bugatti Veyrons and spring for gold iPhones. Others snap up search companies and expect the money to roll in like the oohs and aahs at the golf club when the Veyron rolls up.

Wrong. The dust up between HP and Autonomy is an illustration of what happens when folks without too much understanding of content processing’s complexities covet a home run. The impact does affect Mike Lynch, a Cambridge PhD and real live inventor.

The collateral damage is on the buyers of search companies who toss millions at a sector without understanding how difficult it is to create a search company that is not selling ads or living exclusively on Department of Defense largesse.

HP bought a company with a strong brand, customers, and technology that when properly resourced works. HP did not buy a Google scale money stream, a Palantir clinging to the US government, or a break even metasearch system.

The impact on the reputation of Autonomy professionals is significant. What does this dispute do to other search and content processing companies? Search is tough enough without having a megaton dispute played out in the datasphere.

HP did not have to buy Autonomy. Microsoft passed. Oracle passed. HP bought. HP had time and resources to dig through Autonomy. If it did not, then HP created its own problem. If it did, HP created its own problem. Autonomy, with 15 years of history, was looking for a buyer. My hunch is that HP was looking for a Google and bought a different business because HP convinced itself it could generate more money than Autonomy could. HP found out that it could not match Autonomy’s revenues. Whom does any self respecting MBA or lawyer blame? The other guy.

This hassle says much about HP. Sadly it affects other search and content processing companies as well.

Stephen E Arnold, August 16, 2014

Venture Outcome: The Search and Content Processing Angle

August 14, 2014

I suggest you read “Venture Outcomes Are Even More Skewed Than You Think.” The write up contains several factoids. I highlighted one and added a couple of exclamation points. I suggest you print out the article, grab a writing instrument, and do your own filtering.

The main point of the write up is buried in the paragraph that begins “This really underscores the challenge of creating a venture portfolio that produces reasonable returns.” The factoid I honored with exclamation points is:

In my hypothetical $100M fund with 20 investments, the total number of financings producing a return above 5x was 0.8 – producing almost $100M of proceeds. My theoretical fund actually didn’t find their purple unicorn, they found 4/5ths of that company. If they had missed it, they would have failed to return capital after fees.  Even if we doubled the number of portfolio companies in the hypothetical portfolio, a full quarter of the fund’s return comes from the roughly ½ of a company they invested in that generated 10x or above. Had they missed it, they would have produced a return that roughly approximated investing in bonds – not the kind of risk adjusted return they or their investors were looking for.

I know this is a hypothetical. Assume that the analysis is off by plus or minus 10 percent. What do we get? Lousy returns; that is, returns comparable to dumping cash into bonds. I think about the banking and venture firm meetings in which I have participated. I cannot recall any of the smiling MBAs considering that their best ideas could perform on a par with bonds. My hunch is that the people who pushed money into venture funds and bank VP-inspired investments are not thinking bond-type yield.

If the number is accurate, I wonder if those folks who have pumped tens of millions of dollars into outfits promising a money ball from search and content processing will get their money back. Forget an upside. Break even may be tough. Search and content processing makes headlines like this one every day:


To get similar results, navigate to Google News and enter the query Autonomy HP or Autonomy CFO.

The second item I circled with my pink marker was a diagram:


The important part is the small number of “winners” graphically embodied in the minuscule 0.4% column. This is a broad swath of investments. For search and content processing, the payoffs have to be measured in what money flows via revenues or a sell off like Fast Search to Microsoft, Exalead to Dassault, or Autonomy to HP. The number of folks who made big bucks and are really happy may be modest. In fact, judging from the legal hassles with regard to Fast Search and the recent HP Autonomy headlines, even those who were MBA winners may have headaches. Information retrieval seems to deliver a number of headaches for stakeholders.

The third item is the factoid that makes clear the failure rate of start ups. Search and content processing poses similar challenges. There is a twist. Once a search and content processing company sells to a larger firm, how many have become major money pumps for the acquiring companies? The question is very difficult to answer. The absence of information tells me that there are not too many feel good stories to tell. The pleas on LinkedIn enterprise search discussion threads for positive case studies about search are easy to ignore. Good news with regard to search and content processing is not sloshing around the Big Data bucket in which we exist.

How long will companies that have been in business for many years promising a money ball from search be able to survive? How long will the old soft shoe about search and content processing open checkbooks? How many years will it take some information retrieval companies to replace red ink with the black ink of hefty after tax profits? How long will it take those seeking answers to information retrieval problems to wake up to the fact that consultant saucisson, Star Trek fantasies, and marketing hyperbole are unlikely to deliver a Disneyland-like “win”?

The data set for the Seth Levine write up is large enough to warrant a tentative answer, “Probably never.” Search and content processing are different. The algorithms and methods are decades old. Talk does not change what can be accomplished with affordable computational resources. Pumping money into search, therefore, may be painful when the actual financial data are reviewed by investors and stakeholders.

Why aren’t there abundant “good news” cases for search and content processing? There just aren’t that many. Think a power curve of implementation successes. There are more examples of search going off the rails than home runs. This is surprising when so many profess to be experts in search and so much money has been injected into information retrieval start ups. The business strategy of search and content processing companies may be raising money. Any other work may be of little interest.

Stephen E Arnold, August 14, 2014
