Elasticsearch Optimization Tips

September 25, 2014

One of the Elasticsearch experts at Found shares some of his wisdom in “Optimizing Elasticsearch Searches.” Writer and open source enthusiast Alex Brasetvik emphasizes that Elasticsearch often offers several ways to approach a problem, and that his suggestions can lead to improved performance. The post begins with a look at the way the platform’s filters work:

“Understanding how filters work is essential to making searches faster. A lot of search optimization is really about how to use filters, where to place them and when to (not) cache them….

“This is the key property of filters: the result will be the same for all searches, hence the result of a filter can be cached and reused for subsequent searches. Caching them is quite cheap, as you can store them as a compact bitmap. When you search with filters that have been cached, you are essentially manipulating in-memory bitmaps – which is just about as fast as it can possibly get.

“A rule of thumb is to use filters when you can and queries when you must: when you need the actual scoring from the queries.”
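The bitmap idea in the quoted passage can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Elasticsearch is implemented: the sample documents, the field names, and the helper names (cached_filter, matching_ids) are all hypothetical. Each filter result is stored as an integer whose bits mark matching documents, so combining two cached filters is a single bitwise AND.

```python
from functools import lru_cache

# Hypothetical toy index: the position in the tuple is the document id.
DOCS = (
    {"lang": "en", "year": 2014},
    {"lang": "no", "year": 2014},
    {"lang": "en", "year": 2013},
    {"lang": "en", "year": 2014},
)

@lru_cache(maxsize=None)
def cached_filter(field, value):
    """Return a bitmap (an int) with bit i set when document i matches.

    The result does not depend on the rest of the search, so it is
    safe to cache and reuse across searches.
    """
    bitmap = 0
    for doc_id, doc in enumerate(DOCS):
        if doc[field] == value:
            bitmap |= 1 << doc_id
    return bitmap

def matching_ids(bitmap):
    """Expand a bitmap back into a list of document ids."""
    return [i for i in range(len(DOCS)) if (bitmap >> i) & 1]

english = cached_filter("lang", "en")   # computed once, then served from cache
recent = cached_filter("year", 2014)
both = english & recent                  # combining filters is one bitwise AND
print(matching_ids(both))
```

Scoring is deliberately absent from the sketch: a filter only answers yes or no per document, which is why its result can be reused unchanged.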

Brasetvik goes on to elaborate on points such as effective filter usage, combining filters, acceleration filters, aggregation issues, scoring, and important Things to Avoid. The helpful post concludes with a list of further resources.

Cynthia Murrell, September 25, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Watson and Its API

September 24, 2014

Short honk: Attention, Watson fans. Check out the documentation “Example Post for Answers with Evidence.” Put your code hat on.

Stephen E Arnold, September 25, 2014

Lucid Works: Pando Daily Sets the Record Straight

September 23, 2014

On LinkedIn I learned about this Pando Daily write up: “How Disgruntled Ex-Employees and Bad Reporting Hung LucidWorks Out to Dry.” I noted the Venture Beat analysis of Lucid Works in my post on September 6, 2014. My focus was the wild and crazy information from an “expert” about various factoids. You can read my reaction to the “Trouble at LucidWorks” story here.

The Pando Daily story comes at the issue in a different way. I was delighted to see that Pando found the “expert’s” comments a bit wobbly. There was an interesting run down about Lucid Works that seems to have come from a different point of view. In a way, the two stories—Venture Beat’s and Pando Daily’s—are a bit like the he said, she said information provided to police investigating a married couple’s disturbing the peace incident. I am no cop, so I can’t figure out who is correct and who is incorrect.

Pando takes this tack:

More accurately: It’s [Lucid Works] a startup, and this shit is hard.

I understand that search is hard, but is an eight year old company a start up? That time span baffled me. Coveo asserts that it too is a start up. Other search vendors dating from the implosion of the Big Five in 2006 also use the start up moniker.

The article points out that there are happy employees and positive investors. More money is likely to be needed. Pando Daily quotes a backer as saying:

We won’t start looking for an expansion round until early next year.

ElasticSearch has amassed about $90 million in funding. So LucidWorks may be thinking it needs the same scale of investment to take wing.

With regard to management, Pando Daily reports that the new top dog is the type of CEO who can deliver revenues. The new president, Will Hayes, is described in this context:

On this point, VentureBeat seems oddly hung up on the idea that Hayes is a first-time CEO, perhaps failing to realize that Silicon Valley was (and continues to be) literally built on the success of first-time CEOs. Not to over egg the point, but Mark Zuckerberg and Steve Jobs were first-time CEOs.

Pando Daily added:

As an early member of the Splunk team, Hayes is certainly more qualified for this job than 99 percent of the candidates out there, and more importantly, given that he didn’t found the company, he appears excited about the category.

Pando Daily reminded me that good start ups fire people. I understand the difference between the Silicon Valley approach to management and that practiced at Halliburton and Booz, Allen & Hamilton where I worked for many years. The idea of stability is not always congruent with the needs of a fast moving, pivoting technology company.

Pando Daily also takes issue with Venture Beat’s report that Lucid Works fumbled deals with some real big companies. Pando Daily asserted:

These accounts may or may not have any basis in reality, but they hardly indicate a failing company. The very nature of sales and business development is that deals fall apart all the time. Sometimes those are big deals, sometimes not. The facts are that LucidWorks counts Apple, Sears, Verizon, ADP, Raytheon, Zappos, Qualcomm, Ford, eHarmony, Cisco, and others among current customers.

My reaction to this is okay, but won’t naming these firms give ElasticSearch and other firms a target at which to shoot? Some content processing vendors like Palantir and Recorded Future don’t provide too much information about their customers.

On the all important revenue front, Pando Daily quoted the new top dog at Lucid Works as saying:

“$12 million in services revenue isn’t worth shit,” Hayes says. “But $12 million in product sales on subscription? That’s a $100 million business.”

I agree. Unless the subscriber terminates the subscription. As the competition among content processing vendors heats up, some firms will be quite aggressive in their attempts to take away business. Amazon, for example, seems to be struggling with search, but it could get its act together and offer a good enough solution at very competitive prices. Amazon is not the only sharp toothed outfit in the pond.

Pando Daily tracked down its own search wizard. That poobah said:

Not everyone agrees that enterprise search is quite this sexy. One enterprise analyst, speaking to Pando on the condition of anonymity, describes it as “not that big of an end market.” But at the same time, it’s one that’s still out there for the taking. “There isn’t really a single company or set of companies that have dominant products in the space,” this analyst says. Google and Microsoft have entered the market (the latter via acquisition) with low-cost offerings that would seem to make the competitive environment more challenging for LucidWorks and other upstarts. But according to the company’s supporters, these products are targeting different, less big data-centric applications and are thus not a valid comparison.

If you have ever listened to opposing expert witnesses in a legal dispute, the same factoid gets very different treatment by each expert. That’s what makes subjective expertise difficult to interpret. My view is that enterprise search is struggling for credibility. Some of the value for information retrieval has been exhausted by vendors now out of business. These include Convera, Delphes, Entopia, Siderean, and others. Some credibility has been eroded as a result of the Fast Search & Transfer matter. The CEO was hit with a jail term and a ban on working in search for a couple of years. Then there is the on going dispute between Hewlett Packard and Autonomy. IDOL is an aging technology like Endeca. But the mud slinging about search and content processing does not improve the image of those working in this sector.

Consequently information retrieval companies are working overtime to explain their solutions in terms that do not invoke memories of Convera or Fast Search. Palantir is a data mining company. Recorded Future does predictive analytics. Coveo is eDiscovery and customer support. Search vendors are using a wide range of jargon to describe findability. Lucid Works is brave in using enterprise search with a dash of Big Data in its marketing.

Pando Daily said:

Journalism is tough, particularly in the technology sector. Reporters in this industry asked to cover complex and rapidly evolving companies that often take on hordes of venture cash and set outrageous performance expectations. Unseemly as it may be, stories of failure and calamity make for good scoops, and in these cases ex-employees and competitors often make the best sources. Unfortunately, they also can be the most biased sources and are often are in the best position to credibly lead a journalist astray. LucidWorks certainly has its warts and its scars. But that doesn’t make it trouble, that only makes it a startup.

One question remains: When does a company cease to be a start up and start to be a viable company? Is it one year, four years, or eight years? I just don’t know, but I think that companies that have been in business for almost a decade may not be start ups. Management with a start up mentality may not want to face the cold realities expected of established, stable firms. With Lucid’s technology originating with a community, management may be the issue to watch at Lucid Works. Good management can produce revenue, happy employees, and contented customers. Its absence is often evidenced by a lack of harmony.

Stephen E Arnold, September 23, 2014

Search Exposes Hackers

September 22, 2014

Hackers get their boldness from their anonymity, and it encourages them to commit malicious acts. Engadget has an interesting article that will strike fear into hackers: “Search Engine Turns The Tables On Hackers Exposing Their Info.” Indexeus is a search engine that shares hackers’ information in the same kind of data breaches they create. The search engine’s original purpose was to force the hackers to pay one dollar for every record they wanted to purge from the engine’s index. It is funny, because they had to pay safety money.

Indexeus was accused of extortion, so they had to waive that rule. The new law in the EU might mean something new for the hackers:

“Indexeus founder Jason Relinquo tells security guru Brian Krebs that blacklisting is now free due to the EU’s “right to be forgotten;” he can’t charge for a service that’s supposed to be gratis. That purported desire to obey the law is rather odd when the indexed content is illegal by nature. Look at it this way, though — if any targeted hackers are having second thoughts about their paths in life, this may be the excuse they need to make a clean break.”

Get a clean record? It could work, but it can also be used to cover their tracks. It still is wonderful that search is being used for the powers of good.

Whitney Grace, September 22, 2014

Lucid Works: Really?

September 21, 2014

Editor’s Note: This amusing open letter to Chrissy Lee at Launchsquad Public Relations points out some of the challenges Lucid Imagination (now Lucid Works) faces. Significant competition exists from numerous findability vendors. The market leader in open source search is, in Beyond Search’s view, ElasticSearch.

Dear Ms. Lee,

I sent you an email on September 18, 2014, referring you to my response to Stacy Wechsler at Hired Gun public relations. I told you I would create a prize for the news release you sent me. I am retired, but I don’t have too much time to write for PR “professionals” who send me spam, fail to do some research about my background, and fail to understand the topic addressed in your email.

Some history: I recall the first contact I had from Lucid Imagination in 2008. A fellow named Anil Uberoi sent me an email. He and I had a mutual connection, Mark Krellenstein who was the CTO for Northern Light when it was a search vendor.

I wrote a for fee report for Mr. Uberoi, who shortly thereafter left Lucid for an outfit called Kitana. His replacement was a fellow named David. He left and migrated to another company as well. Then a person named Nancy took over marketing and quickly left for another outfit. My recollection is that in a span of 24 months, Lucid Imagination churned through technical professionals, marketers, and presidents. Open source search, it seemed, was beyond the management expertise of the professionals at Lucid.

When co founder Mark Krellenstein cut his ties with the firm, I wondered how Mr. Krellenstein could deliver the innovative folders function for Northern Light and flop at Lucid. Odd.

Recently I have been the recipient of several emails sent to my two major email accounts. For me, this is an indication of spam. I knew about the appointment of another president. I read “Trouble at Lucid Works: Lawsuits, Lost Deals, and Layoffs Plague the Search Startup Despite Funding.” Like other pundit-fueled articles, there is probably some truth, some exaggeration, and some errors in the article. The overall impression left on me by the write up is that Lucid Works seems to be struggling.

Your emails to me indicate that you perceive me as a “real” journalist. Call me quirky, but I do not like it when a chipper young person writes me, uses my first name, and then shovels baloney at me. You are the purveyor of search silliness for your employer Launchsquad, which seems to be Lucid Works’ biggest fan and current content marketing agent. Not surprisingly, the new Lucid Fusion product is the Popeil Pocket Fisherman of search. Fusion slices, dices, chops, and grates. Here’s what Lucid Works allegedly delivers via Lucene/Solr and proprietary code:

  • Modular integration. Sorry, Ms. Lee, I don’t know what this means.
  • Big Data Discovery Engine. Ms. Lee, Lucid has a search and retrieval system, not a Cybertap, Palantir, or Recorded Future type system.
  • Connector Framework. Ms. Lee, licensees want connectors included. Salesforce bought Entropy Soft to meet this need. Oracle bought Outside In for the same reason. Even Microsoft includes some connectors with the quite fragile Delve system for Office 365.
  • Intelligent Search Services. Ms. Lee, I suggest you read my forthcoming article in KMWorld about smart software. Today, most search services are using the word intelligent when the technology in use has been available for decades.
  • Signals Processing. Ms. Lee, I suggest you provide some facts for signals processing. I think in terms of SIGINT, not crude click log file data.
  • Advanced Analytics. Ms. Lee, I lecture at several intelligence and law enforcement conferences about “analytics.” The notion of “advanced” analytics is at odds with the standard numerical recipes that most vendors use. The reason “advanced” is not a good word is that there are mathematical methods that can deliver significant return. Unfortunately today’s computer systems cannot get around the computational barriers that bring x86 architectures to their knees.
  • Natural Language Search. Ms. Lee, I have been hearing about NLP for many years. Perhaps you have not experimented with the voice search functions on Apple and Android devices? You should. Software does a miserable job of figuring out what a human “means.”

So what?

Frankly I am not confident that Lucid Works can close the gap between your client and ElasticSearch’s. Furthermore, I don’t think Lucid Works can deliver the type of performance available from Searchdaimon or ElasticSearch. The indexing and query processing gap between Lucid Works and Blossom Software is orders of magnitude. How do I know? Well, my team tested Lucid Works’ performance against these systems. Why don’t you know this when you write directly to the person who ran the tests? I sent a copy of the test results to one of Lucid Works’ many presidents.

Do I care about Ms. Lee, the new management team, the investors, or the “new” Lucid?

Nope.

The sun has begun to set on vendors and their agents who employ meaningless jargon to generate interest from potential licensees.

What’s my recommendation? I suggest a person interested in Lucid navigate to my Search Wizards Speak series and read the Lucid Imagination and Lucid Works interviews. Notice how the story drifts. You can find these interviews at www.arnoldit.com/search-wizards-speak.

Why does Lucid illustrate “pivoting”? It is easy to sit around and dream about what software could do. It is another task to deliver software that matches products and services from industry leaders and consistent innovators.

For open source search, I suggest you pay attention to www.Flax.co.uk, www.Searchdaimon.com, www.sphinxsearch.com, and www.elasticsearch.com for starters. Keep in mind that other competitors like IBM and Attivio use open source search technology too.

You will never have the opportunity to work directly for me. I can offer one small piece of advice: Do your homework before writing about search to me.

Your pal,

Stephen E Arnold, September 21, 2014

Russian Content: Tough to Search If Russia Is Not on the Internet

September 20, 2014

Forget running queries on Yandex.ru if Russia disconnects from the Internet. Sure, there may be workarounds, but these might invite some additional scrutiny. Why am I suggesting that some Russian content becomes unsearchable? Well, I believed the story “Russia to Be Disconnected from the Internet.” Isn’t Pravda a go to source for accurate, objective information?

The story asserts:

This is not a question of disconnecting Russia from the international network, yet, Russian operators will need to set up their equipment in a way to be able to disconnect the Russian Internet from the global network quickly in case of emergency, the newspaper wrote. As for the state of emergency, it goes about both military actions and large-scale riots in the country. In addition, the government reportedly discusses a possibility to empower the state with the function to administer domains. Currently this is a function of a public organization – the Coordination Center for the National Domain of the Internet. The purpose of the possible measure is not to isolate Russia from the outside world, but to protect the country, should the USA, for example, decide to disconnect Russia from the system of IP-addresses. It will be possible to avoid this threat, if Russia has a local regulator to distribute IP-addresses inside the country, rather than the ICANN, controlled by the United States government. This requires operators to set up “mirrors” that will be able to receive user requests and forward them to specific domain names.

Interesting. Who is being kept in the information closet? I suppose it depends on one’s point of view. Need an update for Sphinx Search? There will be a solution because some folks will plan ahead.

Stephen E Arnold, September 20, 2014

Hakia Down

September 18, 2014

We ran a check on the search and content processing vendors in our file. The Hakia.com site appears to be down.

Hakia was a developer of semantic search and offered several demonstrations of its technology. To learn about the company, read the interview with Riza C. Berkan in this Search Wizards Speak issue.

Stephen E Arnold, September 18, 2014

BA Insight New Hire Likes His Job

September 17, 2014

Navigate to “My BA Insight Enterprise Search Adventure Begins.” The enthusiasm, confidence, and Super Bowl winning attitude rip off my screen. With a new executive and venture funding, BA Insight seems to be a go to solution. But is the company too closely allied with Microsoft and the aging SharePoint product? Will the forthcoming Delve (a variation on the vision for Fast Search & Transfer revealed during a talk at CERN in 2007) put pressure on the SharePoint centric outfits? I just don’t know.

Here’s the passage I find interesting. I did not have one of the goslings “fix up” the capitalization errors or add links.

As I’ve been ramping up I’ve been learning a lot about their products and solutions.  BA Insight use to be known as the connector company.  The BA Insight Longitude Connectors can connect Microsoft SharePoint to more than 30 enterprise systems for information access and cross-platform search.  They have so many connectors that allow SharePoint 2013, 2010, FAST and previous versions of SharePoint connect to a huge variety of backend systems.  Here are a few examples:  Documentum, eRooms, Websphere, Hummingbird, LiveLink, SAP, Siebel, Notes, Autonomy, FileNet, Connections, Opentext, SalesForce, Netdocs, SQL,  Docushare, and a bunch of different legal systems… I heard they recently setup a connector for Jive and are open to building a connector for companies that need one to other systems not listed.  Even with all of that, I find they don’t want to be known as simply a connector company since they really have a platform for enterprise search.  The autoclassify stuff is brilliant.  It helps set properties on your content based on your managed metadata and with a set of rules for both content already in SharePoint and for the content that will stay in these other systems.  You really need to have good metadata so you can drill down and filter your search results quickly and easily and that’s where their rich search UI comes in providing search parts that give you the ability to drill in without needing to know boolean search.   At that point it’s the smart previews that save you time.  On top of the Office Web Apps in SharePoint 2013, you get previews for PDF, ZIP, and a huge variety of other formats including the old office formats that you’d otherwise miss including to all of those systems I mentioned.  There’s even more, but I think this is a good start for understanding a few of the top products.  As an example they’ve been doing some really innovative work on hybrid search and real federation where the results are in one stream.

My question is, “Why would anyone use SharePoint when BA Insight can fill the bill as ‘enterprise search experts’?” I think Fast Search had a good sense of what it had to do to address the limitations of its technology. The question is, “Will Microsoft want partners to siphon off revenue from the mother ship?”

Stephen E Arnold, September 17, 2014

Sir Thomas Bayes Does Art. Versatile Guy.

September 17, 2014

Navigate to FindMeLike. Click on “Try this demo.” You will have access to a Bayesian-centric visual search tool. The idea is that you click on an image you like. The system then locates similar images.

The click narrows the result set. Each poster is available for sale. But I could not figure out how to move to the shopping cart.

How well does a Bayesian-centric system work? Try it and use the comments section of this blog to share your opinion.

Stephen E Arnold, September 17, 2014

Microsoft Azure Search is the Search of the Future

September 17, 2014

For a preview of Azure Search, visit Microsoft Azure. The article promises that Azure Search is a breakthrough in “search-as-a-service for web and mobile app development.” For fast search, the future is Azure Search, the cloud platform that allows for the building, deployment and management of applications. Developers will be pleased at the ability to incorporate search without the infrastructure to worry about. The Azure client libraries are open source and available through GitHub. The article includes this information:

“Azure Search boosts development speed thanks to support for familiar tools and a consistent global cloud platform. Quickly provision search and start populating the index to get up and running quickly. Like other Azure services, Search uses familiar REST API calls. The worldwide network of Azure datacenters means reduced search latency no matter where your application is located.”

Pricing details are also available here. The pricing details include this information,

“Azure Search is sold in combinable “search units” that have a defined queries-per-second (QPS) benchmark and document count (index storage) benchmark associated with each unit.”

By combining units, users can achieve higher QPS and/or higher document count. Currently Microsoft is offering a month-long free trial, which should be enough time for anyone to ensure that it is worth the investment.
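The unit arithmetic can be sketched roughly as follows. This is only an illustration under loud assumptions: the per-unit benchmark figures (15 QPS and 1,000,000 documents) are made up for the example and are not Microsoft's published numbers, the function name units_needed is hypothetical, and the sketch assumes that query capacity and index capacity scale independently so that the total is the product of the two counts.

```python
import math

def units_needed(target_qps, doc_count, qps_per_unit, docs_per_unit):
    """Estimate how many search units cover a workload.

    Assumption: units scale query throughput and index size
    independently, so the total is the product of the two counts.
    """
    for_queries = max(1, math.ceil(target_qps / qps_per_unit))
    for_storage = max(1, math.ceil(doc_count / docs_per_unit))
    return for_queries, for_storage, for_queries * for_storage

# Hypothetical workload and hypothetical per-unit benchmarks.
for_queries, for_storage, total = units_needed(
    target_qps=40,
    doc_count=2_500_000,
    qps_per_unit=15,
    docs_per_unit=1_000_000,
)
print(for_queries, for_storage, total)
```

Running the same function against the real benchmark figures during the free trial would give a first estimate of the monthly bill.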

Chelsea Kerwin, September 17, 2014

Sponsored by ArnoldIT.com, developer of Augmentext
