Useful Tip for Elasticsearch Admins

December 1, 2014

Short honk: Elasticsearch continues to outpace the other open source search vendors. I know that some of the companies with venture funding folks breathing down their necks say otherwise. Keep in mind that there is a difference between performing and saying one is able to perform. Elasticsearch delivers functionality that we find valuable. Also, from the information flowing through my Overflight system, Elasticsearch works. Really!

A useful security configuration article offers helpful tips. Navigate to “Elasticsearch: Dealing with Complex Permissions.” The short article provides some code snippets that you will find instructive.

Stephen E Arnold, December 1, 2014

Finding Books: Not Much Has Changed

December 1, 2014

Three or four years ago I described what I called “the book findability” problem. The audience was a group of confident executives trying to squeeze money from an old school commercial database model. Here’s how the commercial databases worked in 1979.

  1. Take content from published sources
  2. Create a bibliographic record, write or edit the abstract included with the source document
  3. Index it with no more than three to six index terms
  4. Digitize the result
  5. Charge a commercial information utility to make it available
  6. Get a share of the revenues.

That worked well until the first Web browser showed up and individuals and institutions began making information available online. There are a number of companies that still use variations of this old school business model. Examples range from newspapers that charge a Web browser user for access to content to outfits like LexisNexis, Ebsco, and Cambridge Scientific Abstracts.


As libraries and individuals resist online fees, many of the old school outfits are going to have to come up with new business models. But adaptation will not be easy. Amazon is in the content business. Why buy a Cliff’s Notes-type summary when there are Amazon reviews? Why pay for news when a bit of sleuthing will turn up useful content from outfits like the United Nations or off the radar outfits like World News? Tech information is going through a bit of an author revolt. While not on the level of protests in Hong Kong, a lot of information that used to be available in research libraries or from old school database providers is available online. At some point, peer reviewed journals and their charge-the-author business models will have to reinvent themselves. Even recruitment services like LinkedIn offer useful business information.

One black hole concerns finding out what books are available online. A former intelligence officer with darned good research skills was not able to locate a copy of my The New Landscape of Search. You can find it here for free.

I read “Location, Location: GPS in the Medieval Library.” The use of coordinates to locate a book on a shelf or hanging from a wall anchored by a chain is not new to those who have fooled around with medieval manuscripts. Remember that I used to index medieval sermons in Latin as early as 1963.

What the write up triggered was a reminder of the complete and utter failure of indexing services to make an attempt to locate, index, and provide a pointer to books regardless of form. The baloney about indexing “all” information is shown to be a toothless dragon. The failure of the Google method and the flaws of the Amazon, Library of Congress, and commercial database providers are evident.

Now back to the group of somewhat plump, red-faced, confident wizards of commercial database expertise. The group found my suggestion laughable. No big deal. I try to avoid the Titanic type operations. I collected my check and hit the road.

There are domains of content that warrant better indexing. Books, unfortunately, are one set of content that makes me long for the approach that put knowledge in one place with a system that at least worked and could be supplemented by walking around and looking.

No such luck today.

Stephen E Arnold, December 1, 2014

Poor Search Equals Poor E-Sales

November 27, 2014

Logically, this statement makes sense, and if you have been paying attention to the facts, you already knew it:

“A recent study by the Baymard Institute, an independent web research institute with a focus on e-commerce usability and optimization, found that many of the top 50 U.S. e-commerce sites are lacking essential e-commerce search capabilities which is hindering current online sales.”

Please feel free to insert your favorite exasperation for pointing out the obvious. This is something that even an experienced online retail shopper could tell you. Digital Journal covers the story in “Baymard Institute Study Finds Major Problems With Search On Leading E-Commerce Sites.”

Baymard found that most users don’t like browsing through categories. The study also found that the search functions on these big e-retailers lack a spell check feature, do not support thematic or product searches, and require specific jargon.

EasyAsk responded to the Baymard study with a white paper detailing how e-commerce Web sites can improve their search features to boost sales. One way is supporting natural language search. The white paper is available for free download.

Whitney Grace, November 27, 2014
Sponsored by, developer of Augmentext

Disappearing Content: You Cannot Search for It if It Is Not There

November 25, 2014

The issue of shaped and filtered content is becoming more and more of a mainstream topic. I read “Uber Removed Blog Post from Data Science Team That Examined Link between Prostitution and Rides.” The world’s oldest profession meets the world’s newest ride service. I noted this passage in the write up:

“The company examined its rider data, sorting it for anyone who took an Uber between 10 p.m. and 4 a.m. on a Friday or Saturday night. Then it looked at how many of those same people took another ride about four to six hours later – from at or near the previous nights’ drop-off point. Yes, Uber can and does track one-night stands. Consider it the Uber equivalent of the walk of shame.”

How will this corporate approach to content play out? My hunch is that content has been getting removed for a long time. I recall looking for information about the Spyglass browser decades ago and finding a 404 error.

At least today there are copies of the Web, caches on public systems, and people who store content on their drives. Nevertheless, most people cannot search for content that is not “there.” Has anyone looked for CMS information about the original MIC, RAC, and ZPIC contractors? My hunch is that more attention should be paid to content that goes missing, not because of its prurient nature, but because the disappearance of content provides very useful information about the behavior of people, systems, and organizations.

Stephen E Arnold, November 25, 2014

Elasticsearch Ups the Pressure on LucidWorks (Really?)

November 23, 2014

I am not too keen on videos. I prefer reading hard copies. I did find the video referenced in “Elasticsearch Uses Power of Community for Open Source Analytics” useful. My team and I are putting the finishing touches on a report that points out how enterprise search vendors have been leapfrogged by vendors rarely considered by mid tier consultants and the self appointed experts in search. The video drives home a simple point: Combining open source technologies delivers information access functions that are more useful to users than laundry lists, odd ball point and click suggested content, and confusing mash ups of information presented without context.

Why the reference to Lucid? One of the firm’s presidents had been involved with Jaspersoft, an open source analytics outfit. Despite this “inside track”, Elasticsearch has powered past Lucid, leaving that open source vendor struggling to reach parity with Elasticsearch. Elasticsearch itself faces challenges, but that’s the name of the game when keyword search is the keystone of a service. For now, Elasticsearch leaves competitors rushing to close the gap. By the way, this subject was the focal point of one of Dave Schubmehl’s IDC reports that surfed on my name. The juicy part about the “gap” was edited from my original write up. Nevertheless, the facts remain valid. Kudos to Elasticsearch.

Stephen E Arnold, November 23, 2014

Want to Lose Weight? Google Results Could Make You Fat

November 18, 2014

I never know what to make of articles that report about research results. Mistakes that would not fly in a Stats 101 class are the norm. I did work through “Poor-Quality Weight Loss Advice Often Appears First in an Online Search.”

Here’s the passage that I highlighted with my new bright pink marker:

The study reveals that the first page of results, using a search engine like Google, is likely to display less reliable sites instead of more comprehensive, high-quality sites, and includes sponsored content that makes unrealistic weight loss promises.

I would not be surprised if there were a Federal grant boosting this ground breaking, never before thought of, issue.

I find the results presented by advertising supported search engines incredibly useful, relevant, and on point. The notion that one might have to use a system other than Bing or Google to get accurate information is a new thought.

I liked this bit about the timeliness and rigor of the research too:

In 2012, the researchers accessed 103 websites for queries specific to weight loss and scored the content on its adherence to available evidence-based guidelines for weight loss. Medical, government and university sites ranked highest, along with blogs.

Yes, blogs and governmental entities are fonts of accurate information. With data from a mind boggling 103 Web sites to evaluate, I am amazed at the speed with which the information found its way to an online publication.

Stephen E Arnold, November 18, 2014

Sinequa Launches U.S. Subsidiary

November 13, 2014

Another French company is setting up shop in America. Business Wire reports, “Sinequa to Open New York Subsidiary Following Continued Success in U.S. Market.” The search and analytics company has found so much prosperity here they’ve decided to make it official. The write-up tells us:

“The creation of the U.S. Corporation and the New York office is a necessary next step as Sinequa has recently announced major client wins in the U.S including AstraZeneca, Biogen Idec, Mercer, and several ongoing RFPs. Sinequa is projecting that the largest part of the company’s future growth will come from the U.S., and they are also expecting to extend their presence there as well.

“The first employee of Sinequa Corp. is Nicolas Brel, who brings his deep experience with Sinequa to New York as the company’s Solutions Architect. The hiring process for a new sales manager is well advanced, and Xavier Pornain, VP Sales & Alliances, will soon relocate to the U.S. and assume the role of country manager and head of Sinequa Corp.”

Will Sinequa outperform Dassault (Exalead), Antidot, or PolySpot? The French market appears to be problematic. The U.S. market, on the other hand, seems to be a slam dunk compared to demand for proprietary search in the healthy European countries. Interesting.

Launched in 2002, Sinequa serves hundreds of organizations around the world. Naturally, the company boasts strong business analytics across multiple data sources and types. Sinequa also emphasizes an intuitive interface and strong visualization tools. Based in Paris, the firm maintains offices in Frankfurt and London as well as their shiny new location in New York City.

Cynthia Murrell, November 13, 2014

Sponsored by, developer of Augmentext

Will Two Xooglers Burnish Yahoo?

November 12, 2014

I read an exclusive story. Know how I know the story is “exclusive”? Here’s the title:

Exclusive: Some Unhappy Yahoo Investors Asking AOL for Rescue

Obviously you have to read the foundation’s exclusive. I want to focus on a different question: Can two former Google executives repair Yahoo’s revenues? I am less than optimistic. I used an illustration in one of the briefings I did during the era of Terry Semel. The picture featured a sinking ship with Mr. Semel’s face Photoshopped into a captain’s uniform.

As I pointed out years ago, once an Internet portal service loses its momentum, flat-lining is the upside. The downside is a slow, gentle drift into irrelevance. So the answer to the question, in my opinion, is, “Long shot.”

I like to recall Yahoo’s former chief technology officer railing at me on a conference call about Yahoo’s super-advanced search technology. How is that working out?

Stephen E Arnold, November 12, 2014

Disney and Search

November 10, 2014

I won’t bore you with the Disney InfoSeek adventure. Sigh. If you want to know how Disney is approaching Web search, read “Disney Fights Piracy With New Search Patent.” The system and method is intended to filter out content not licensed by means known to Disney. The write up’s headline suggests that a system and method in the form of a patent will “fight piracy.” Interesting notion, but I think the idea is that Disney has built or will build a system that shows only “official” content.

The notion of building a specialist Web site is an interesting idea. The reality may be that traffic will be very hard to come by. The most recent evidence is Axel Springer’s capitulation to the Google. Axel Springer owns a chunk of Qwant. Again a good idea, but it does not deliver traffic.

If you build a search engine, who will use it? Answer: Not too many people if the data available to me are correct.

Stephen E Arnold, November 10, 2014

POTUS, Fear, and Google

November 8, 2014

I have zero clue if this article, “Movie Chief: Obama Is Scared to Push Google, ISPs on Piracy,” is accurate. Let’s for the moment assume that the write up by Andy is right as rain.

Here’s a statement I noted:

“It’s sad because if we had a good president that cared about the film industry he would pass a very simple law, an anti-piracy law, but they don’t want to stop it because they are scared of Google, and he’s scared of all the ISPs,” Lerner says. Google’s power and money not only scares off the President but Congress too, Lerner adds. Furthermore, plenty of that revenue is coming from piracy-related sources, so the company has no incentive to stop it.

Let’s look at the entities in this article.

  • The president of the United States or POTUS
  • Nu Image CEO and founder Avi Lerner
  • The GOOG.

As I understand it, Google, which worked out a friendly deal with Axel Springer the other day, is just as cuddly as a child-chewed Harrods teddy bear. The POTUS is able to send troops, issue Executive Orders, and disrupt traffic when he ventures out into the amber waves of grain. (Is there “grain” in LA?) Mr. Lerner is a movie mogul. I am not sure what a movie mogul does. I think it involves creating high value intellectual property which puts Shakespeare and Milton in a state of inferiority.

The point is that movie moguls and POTUS are not as powerful as Google.

From Google’s point of view, that’s the way life is supposed to work. Problems with that, pilgrim? Well, you can always take your queries to Yahoo or, better yet, Qwanza OR Qwanta, whatever. (Try typing that name rapidly on your iPhone.)

Keep in mind that the source write up may not be spot on. It is entertaining, though.

Stephen E Arnold, November 8, 2014
