Launching and Scaling Elasticsearch

August 21, 2014

Elasticsearch is widely hailed as an alternative to SharePoint and to many open source search options, but it is not without its problems. Ben Hundley from StackSearch offers his input on the software in his Qbox article, “Thoughts on Launching and Scaling Elasticsearch.”

Hundley begins:

“Qbox is a dedicated hosting service for Elasticsearch.  The project began internally to find a more economical solution to Amazon’s Cloudsearch, but it evolved as we became enamored by the flexibility and power of Elasticsearch.  Nearly a year later, we’ve adopted the product as our main priority.  Admittedly, our initial attempt took the wrong approach to scale.  Our assumption was that scaling clusters for all customers could be handled in a generalized manner, and behind the scenes.”

Hundley walks the reader through several considerations that affect their own implementation: knowing your application’s needs, deciding on hardware, monitoring, tuning, and knowing when to scale. These are all decisions that must be made up front, allowing for more effective customization. The upside of an open source solution like Elasticsearch is greater customization, more control, and less rigidity. Of course, for a small organization that could also be the downside: time and staffing are more limited, and an out-of-the-box solution like SharePoint is more likely to be chosen.

Emily Rae Aldridge, August 21, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Google Search Has Been Improved. A Lot.

August 20, 2014

I do a lecture for the police and intelligence community. The focus is on the techniques helpful in finding information that answers a query. If a person types a query into Google, the results are ads, popular hits that others found useful, and search engine optimized content.

Consider looking for a “shotgun suppressor”. Ignore the quotes. Here are the results from Google.com on August 20, 2014:

image

Pictures. Not too many ads. A video.

Where does one buy a shotgun suppressor? Run the query “purchase shotgun suppressor”.

The results are:

image

More pictures. Ads. And a couple of companies mentioned several times.

So it is easy to get information about a shotgun suppressor and buy one. Now, do some clicking and you will find that the links include auto mufflers from 2WheelPartsSupply.com and some other results that are off point.

In order to nail the real deal, military grade suppressor, some additional work is required.

When I read “Google Made 890 Improvements To Search Over The Past Year”, I just sighed. The write up is a rah rah for Google. Here’s a passage that I highlighted:

In a Google+ post from Google head of search Amit Singhal, Google shares they have made “more than 890 improvements to Google Search last year alone.” In 2009, Google told us they made between 350 to 400 changes to search and in 2010, they said they made 550 improvements to search in the past year. Google’s Matt Cutts said in a video in 2010 they make one change per day to their core search algorithm. We also know Google tests hundreds of changes in a day but only some of them make the light of day.

Okay, run some queries. Has Google improved search, or has Google improved its methods for diffusing ads into results? My experience is that Google is great for information about Dr Dre and pizza. For other types of information, considerable effort is required to unearth useful, on point information.

By the way, the key to finding the shotgun suppressor is to use synonyms like moderator and to approach the problem using another Google service. The content is findable but I am not feeling lucky anymore.

Since everyone is now an “expert” in search, which of the top 10 changes to Google in the last decade rings your bell? How about “universal search”? Ever wonder why books, blogs, and non US content are not included in a universal search? Think about it, please.

Stephen E Arnold, August 20, 2014

Search Vendors Learn From Comcast Sales Rep

August 19, 2014

Ryan Block experienced horrible customer service while trying to disconnect his Internet with Comcast. Prior to his experience, Comcast had a horrible reputation when it came to customer service and Block’s cancelation attempt brings to light an ongoing problem within the company. TechDirt comments on the situation in the article: “Behind The Veil: Comcast Techs Detail How Customer Service Is Really All Just ‘Sales.’”

The article reposts stories from The Verge where current and former employees confess their customer service stories. Their accounts amount to call center nightmares, stress, and Comcast’s drive to sell, sell, and sell! Comcast is definitely going to have future troubles.

“The question that arises with this kind of thing, particularly with Comcast operating a multi-tiered group of call centers, some outsourced, some not, is whether the company has become too unwieldy to actually meet customer requests. It’s fine for a company to work to retain customers, but that’s typically done by providing great service, not irritating the shit out of anyone who doesn’t think your company’s poop doesn’t stink. Far from too big to fail, Comcast, recently in massive merger discussions, may be getting too big to succeed.”

We’ll leave comments on illegal monopolies for another article, but this brings to mind what search vendors can take from this situation. Poor customer service equals poor client retention and fewer sales. It does not take long for customer complaints to go viral on the Internet, making reputation even more important. Search vendors offer numerous solutions to help with customer support and their products can improve a customer’s experience. Now would be a good time for search vendors to market their customer service products.

Whitney Grace, August 19, 2014

One User Finds Some Flaws in Elasticsearch

August 18, 2014

We are jazzed about Elasticsearch. Our own search expert Stephen E. Arnold, who has been yearning for some real innovation in search for years now, recently declared, “I will be telling those who attend my lectures to go with Elasticsearch. That’s where the developers and the money are.” Personally, I’m inclined to go with the search expert here (though I admit I may be a bit biased). This declaration is just to preface my reaction to a post at Sammaye’s Blog, “Things I Have Learnt in the First 5 Minutes of Using Elastic Search.” Apparently, how to spell the name correctly was not one of those things.

Still, it looks like programmer Sam Millman (aka Sammaye) may have some good points. For example, he describes the querying as “the most verbose in the universe,” balks at the requirement to define indexes client-side, and claims Lucene is a bad platform on which to base search in the first place. He also calls the documentation terrible, and bad documentation happens to be a pet peeve of mine. (I’ve written documentation. If you must supply it, you might as well make it comprehensive, organized, and well-written. It’s not that difficult.) Millman explains:

“Its documentation is great at explaining the API, no doubt about it but if you want to actually find out how something works and why something is then you have to constantly ask StackOverflow. It just describes what parameters to put in and then leaves the rest up to you thinking that you don’t want to bother yourself with those details. We do though, we are not bandwagoning your product, we want to know how sharding and replication works, how indexes work and how to manage the product and more. Even when looking at the API the documentation can sometimes be…unhelpful. Mainly due to its huge font-size, yet tiny middle centered layout, English language problems and disorganisation. Overall I came out less than impressed about Elastic Searches documentation. I actually Google search everything first so I don’t have to navigate that mess.”

So, perhaps Elasticsearch is not perfect. See the article for Millman’s full roster of complaints. However, if Arnold is correct and this is “where the developers and the money are” right now, vexing problems should be fixed in short order. It would be a mistake to not take Elasticsearch seriously. Formed in 2012, the company is based in Amsterdam with offices in the U.S., the U.K., France, Germany, and Switzerland. They are also hiring as of this writing, in case anyone here wants to help them iron out some wrinkles.

Cynthia Murrell, August 18, 2014

The Guardian Explores HP Autonomy

August 16, 2014

I read “Hewlett-Packard Allegations: Autonomy Founder Mike Lynch Tries to Clear Name.” The British “real” newspaper focuses on Mike Lynch, the founder of Autonomy. I am convinced that Autonomy pitched the value of its company to a number of firms. I know that Hewlett Packard bought Autonomy. I assume that spending $11 billion was not a K Mart blue light special impulse purchase. I know that HP has had what the MBAs call “governance challenges.” These range from allegations of getting frisky with folks to management churn. I know that for me, the HP of electronic devices yielded to the HP of the ink cartridges.

Here’s a point I highlighted in the Guardian’s write up:

Meanwhile, lawyers on all sides are using legal privilege to sling mud. Lynch says it is not only his name that has been stained, but that of the British technology industry. Autonomy’s accounting and marketing methods had attracted criticism before the HP acquisition, but Lynch was also a poster child for the achievements of Cambridge’s Silicon Fen. The Autonomy affair casts a shadow, and a conclusion from the SFO is overdue.

I have a slightly different view of the dust up. Folks want to believe that information retrieval will generate another Google. Because of those expectations, executives whose expertise in search extends to running a Google search on a mobile device assume they know about content processing.

When buyers get excited about a purchase, some people buy Bugatti Veyrons and spring for gold iPhones. Others snap up search companies and expect the money to roll in like the oohs and aahs at the golf club when the Veyron rolls up.

Wrong. The dust up between HP and Autonomy is an illustration of what happens when folks without too much understanding of content processing’s complexities covet a home run. The impact does affect Mike Lynch, a Cambridge PhD and real live inventor.

The collateral damage is on the buyers of search companies who toss millions at a sector without understanding how difficult it is to create a search company that is not selling ads or living exclusively on Department of Defense largesse.

HP bought a company with a strong brand, customers, and technology that when properly resourced works. HP did not buy a Google scale money stream, a Palantir clinging to the US government, or a break even metasearch system.

The impact on the reputation of Autonomy professionals is significant. What does this dispute do to other search and content processing companies? Search is tough enough without having a megaton dispute played out in the datasphere.

HP did not have to buy Autonomy. Microsoft passed. Oracle passed. HP bought. HP had time and resources to dig through Autonomy. If it did not, then HP created its own problem. If it did, HP created its own problem. Autonomy, with 15 years of history, was looking for a buyer. My hunch is that HP was looking for a Google and bought a different business because HP convinced itself it could generate more money than Autonomy could. HP found out that it could not match Autonomy’s revenues. Whom does any self respecting MBA or lawyer blame? The other guy.

This hassle says much about HP. Sadly it affects other search and content processing companies as well.

Stephen E Arnold, August 16, 2014

Venture Outcome: The Search and Content Processing Angle

August 14, 2014

I suggest you read “Venture Outcomes Are Even More Skewed Than You Think.” The write up contains several factoids. I highlighted one and added a couple of exclamation points. I suggest you print out the article, grab a writing instrument, and do your own filtering.

The main point of the write up is buried in the paragraph that begins “This really underscores the challenge of creating a venture portfolio that produces reasonable returns.” The factoid I honored with exclamation points is:

In my hypothetical $100M fund with 20 investments, the total number of financings producing a return above 5x was 0.8 – producing almost $100M of proceeds. My theoretical fund actually didn’t find their purple unicorn, they found 4/5ths of that company. If they had missed it, they would have failed to return capital after fees.  Even if we doubled the number of portfolio companies in the hypothetical portfolio, a full quarter of the fund’s return comes from the roughly ½ of a company they invested in that generated 10x or above. Had they missed it, they would have produced a return that roughly approximated investing in bonds – not the kind of risk adjusted return they or their investors were looking for.

I know this is a hypothetical. Assume that the analysis is off by plus or minus 10 percent. What do we get? Lousy returns; that is, returns comparable to dumping cash into bonds. I think about the banking and venture firm meetings in which I have participated. I cannot recall any of the smiling MBAs considering that their best ideas could perform on a par with bonds. My hunch is that the people who pushed money into venture funds and bank VP-inspired investments are not thinking bond-type yield.
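For readers who want to check the arithmetic, here is a minimal sketch of the plus or minus 10 percent test. The 4 percent hit rate is not stated in the write up; it is inferred from the quoted figures (0.8 winners out of 20 investments), and the function names are hypothetical.

```python
# Sketch of the arithmetic behind the quoted hypothetical fund.
# ASSUMPTION: the 4% hit rate is inferred from the quote
# (0.8 financings above 5x out of 20 investments); the article
# does not state this rate directly.

PORTFOLIO_SIZE = 20
IMPLIED_HIT_RATE = 0.8 / PORTFOLIO_SIZE


def expected_winners(portfolio_size: int, hit_rate: float) -> float:
    """Expected number of investments returning above 5x."""
    return portfolio_size * hit_rate


def sensitivity(portfolio_size: int, hit_rate: float, error: float = 0.10):
    """Expected winners if the hit rate is off by +/- `error` (10% by default)."""
    low = expected_winners(portfolio_size, hit_rate * (1 - error))
    high = expected_winners(portfolio_size, hit_rate * (1 + error))
    return low, high


if __name__ == "__main__":
    base = expected_winners(PORTFOLIO_SIZE, IMPLIED_HIT_RATE)
    low, high = sensitivity(PORTFOLIO_SIZE, IMPLIED_HIT_RATE)
    print(f"base: {base:.2f} winners, range: {low:.2f} to {high:.2f}")
```

Under these assumptions the expected number of big winners stays below one whole company even at the optimistic end of the range, which is the point of the factoid.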

If the number is accurate, I wonder if those folks who have pumped tens of millions of dollars into outfits promising a money ball from search and content processing will get their money back. Forget an upside. Break even may be tough. Search and content processing makes headlines like this one every day:

image

To get similar results, navigate to Google News and enter the query Autonomy HP or Autonomy CFO.

The second item I circled with my pink marker was a diagram:

image

The important part is the small number of “winners” graphically embodied in the minuscule 0.4% column. This is a broad swath of investments. For search and content processing, the payoffs have to be measured in what money flows via revenues or a sell off like Fast Search to Microsoft, Exalead to Dassault, or Autonomy to HP. The number of folks who made big bucks and are really happy may be modest. In fact, judging from the legal hassles with regard to Fast Search and the recent HP Autonomy headlines, even those who were MBA winners may have headaches. Information retrieval seems to deliver a number of headaches for stakeholders.

The third item is the factoid that makes clear the failure rate of start ups. Search and content processing poses similar challenges. There is a twist. Once a search and content processing company sells to a larger firm, how many have become major money pumps to the acquiring companies? The question is very difficult to answer. The absence of information tells me that there are not too many feel good stories to tell. The pleas on LinkedIn enterprise search discussion threads for positive case studies about search are easy to ignore. Good news with regard to search and content processing is not sloshing around the Big Data bucket in which we exist.

How long will companies that have been in business for many years promising a money ball from search be able to survive? How long will the old soft shoe about search and content processing open checkbooks? How many years will it take some information retrieval companies to replace red ink with the black ink of hefty after tax profits? How long will it take those seeking answers to information retrieval problems to wake up to the fact that consultant saucisson, Star Trek fantasies, and marketing hyperbole are unlikely to deliver a Disneyland-like “win”?

The data set for the Seth Levine write up is large enough to warrant a tentative answer, “Probably never.” Search and content processing are different. The algorithms and methods are decades old. Talk does not change what can be accomplished with affordable computational resources. Pumping money into search, therefore, may be painful when the actual financial data are reviewed by investors and stakeholders.

Why aren’t there abundant “good news” cases for search and content processing? There just aren’t that many. Think a power curve of implementation successes. There are more examples of search going off the rails than home runs. This is surprising when so many profess to be experts in search and so much money has been injected into information retrieval start ups. The business strategy of search and content processing companies may be raising money. Any other work may be of little interest.

Stephen E Arnold, August 14, 2014

A Case Study for Search from OpenText

August 13, 2014

The Customer Story about Distell on OpenText tells of the successful South African beverage company. The “article” might provide a search case study. OpenText is an information management software company that offers guidance in content management, archiving, web content management, and a myriad of other pursuits under the umbrella of “unleashing the power of information.” The article provides a list of bullet points about the company and an About section that states,

“Distell is Africa’s leading producer and marketer of spirits, fine wines, ciders, and ready-to-drinks (RTDs). It employs nearly 5,000 people and has an annual turnover in excess of R12,3 billion. When Distell was formed in 2000 it had 1,700 information workers but due to mainly organic growth and the acquisitions of Bisquit, a French cognac company, and Burn Stewart Distillers, a Scottish whisky producer, that has now grown to 3,000 users spread across over 80 offices, mainly in Southern Africa, but also in eight international locations.”

Otherwise it has a movie and lots of dot points. Substantive cost overrun info? Nope. Of course there is also a link to the full story, a three page PDF that provides detailed information about the company and its prospects. But the dot points are a lot more appealing.

Chelsea Kerwin, August 13, 2014

Flurry in Stock Market Listings Coincides with SLI Systems Downward Spiral

August 12, 2014

The article titled SLI Systems Plunges to Lowest Since Listing on TVNZ discusses the recent burst of listings. SLI Systems is a company that provides site search, navigation and “user-generated SEO.” SLI’s share price shows the pressure findability vendors are facing in today’s marketplace. The stock fell over seven percent and remains just above its initial public offer price of $1.15. The article states,

“The local stock market is experiencing a flurry of listings which is spoiling investors for choice after it got a shot in the arm from the government’s partial privatisation last year, and the recent listings of software developers Gentrack Group and Serko have only added to tech investments available. Next week, IkeGPS Group, which sells a range of portable measuring devices, plans to list while Vista Entertainment, the cinema software and data analytics company, is due in August…”

Paul Harrison of Salt Funds Management believes that the flood of listings is not the only culprit for falling prices. Instead, he suggests that certain stocks were simply priced too highly and the current downward trend is a “hangover” following the initial “frenzy.” Other affected companies mentioned include Xero, the accounting software firm, the biotech company Pacific Edge which was unchanged, and Diligent, which also fell in price.

Chelsea Kerwin, August 12, 2014

OnlyBoth Launches “Niche Finding” Data Search

August 12, 2014

An article on the Library Journal Infodocket is titled Co-Founder of Vivisimo Launches “OnlyBoth” and It’s Super Cool! The article continues in this entirely unbiased vein. OnlyBoth, it explains, was created by Raul Valdes-Perez and Andre Lessa. It offers an automated process of finding data and delivering it to the user in perfect English. The article states,

“What does OnlyBoth do? Actions speak louder than words so go take a look but in a nutshell, OnlyBoth can mine a dataset, discover insights, and then write what it finds in grammatically correct sentences. The entire process is automated. At launch, OnlyBoth offers an application providing insights on 3,122 U.S. colleges and universities described by 190 attributes. Entries also include a list of similar and neighboring institutions. More applications are forthcoming.”

The article suggests that this technology will easily lend itself to more applications; for now it is limited to presenting the facts about colleges and baseball in perfect English. The idea is called “niche finding,” which Valdes-Perez developed in the early 2000s and never finished. The technology focuses on factual data that requires some reasoning. For example, the OnlyBoth website suggests that the insight “If California were a country, it would be the tenth biggest in the world” is a more complicated piece of information than just a simple fact like the population of California. OnlyBoth promises that more applications are forthcoming.

Chelsea Kerwin, August 12, 2014

A Google Savior for US Government Web Sites

August 11, 2014

I know that Googlers and Xooglers are absolutely the best. I read “Ex-Google Engineer to Lead Fix-It Team for Government Websites.” I am confident that the Xoogler will bring high magic to the problematic Web sites from numerous Federal entities and quasi-government entities. In year 2000, there were 36,000 of these puppies. I don’t recall how many were not working the way the developers intended.

I don’t know how many US government Web sites there are today because the nifty free tools I used in 2000 and 2001 do not work the way they did a decade ago.

How long will it take to address the backend issues of HealthCare.gov or get the other sites with glitches working “just like Google”? I think USA.gov might warrant a quick look too. I suppose one could check out the performance metrics for America Online or Yahoo, two outfits run by Xooglers. There may be some data that help in predicting the fix time.

Stephen E Arnold, August 11, 2014
