An ElasticSearch Feature Comparison: Where Is the Beef?

November 15, 2012

There is an interesting but somewhat incomplete “feature comparison” between Solr and ElasticSearch. ElasticSearch, as you may know, is the new $10 million darling of the search world. Well, maybe Attivio with $42 million or Palantir with $150 million is “darlinger”?

You can find the write up at “Apache Solr vs ElasticSearch.” I want to point out that the comments to the basic information are quite useful. Among the points included in the comments which I found helpful were:

  • The notion of dynamic fields, field copying via multi-fields, and alternative query parsers
  • A reference to DataStax, Cassandra, and Solr
  • A suggestion that an eZ Publish reference be added.

However, I want to point out that in our analysis of ElasticSearch, there is one big factor not embraced by a feature list. Organizations want a system which is easy to install, maintain, and extend. Cost is a big deal, but when one factors in the costs associated with start up companies, there may be less predictability than with more established open source vendors such as Attivio, IBM, LucidWorks, and others.

As a side note, for the publisher of the first three editions of the Enterprise Search Report, which I wrote, I had to produce nearly 20 feature charts. Guess what? Most of the feature charts were identical on the main points. The differences were of great technical importance to developers at the vendors’ firms. However, to the companies licensing software, the decisive factors were usually based on business considerations; for example:

  • Customer live demos and references from these customers
  • Pricing including support and training
  • Business stability
  • Engineering depth of the vendor
  • Financial performance over time
  • Management experience.

The fact that one vendor’s approach to k-means was “faster”, the metatagging system was “self learning”, or that another vendor’s system could index 10 gigabytes of content in X time slices was often irrelevant at decision time. Maybe open source search will be different, but right now, the open source world is on a vector that leads to the same business models which the traditional proprietary software vendors used with varying degrees of success.

In my view, a company in growth mode is juggling many balls at once and riding a unicycle. Consequently, the marketing and developer hyperbole may distract from the pure business considerations which garnered Attivio four times the funding that ElasticSearch obtained. The downside is that Attivio has to generate sufficient revenue to hit financial targets. Some financial types want five, 10, or 17 times the investment. I am too old and frail for that type of pressure. Even a $10 million cash infusion works out to $50, $100, or $170 million in revenues.

Only a handful of the 50 search vendors I track have revenues in shouting distance of $50 million. On a call with some MBA types last week, I learned that blowing past the revenues of Autonomy, Endeca, and Fast Search before their sale or implosion, was a “no brainer.”

I am not so sure. Building and sustaining revenue is more than a feature punch list. The real challenge is building and sustaining a business. Look at the present situation for HP Autonomy. Fast Search is, in my opinion, an end of life product. Endeca is an “all things to all people” solution. Endeca is darned good at eCommerce and processing certain types of data sets.

Open source software is important. Open source search is important too. What is more important is the constellation of factors that make “free” software into a viable commercial product which delivers a return to its funding sources. Will the open source community cheerlead when the VCs force the innovators who took those millions to produce a hefty profit? More than marketing and feature lists are needed. Just my opinion.

You can purchase the ElasticSearch analysis at this link for $3,500. Why so much? IDC has to generate revenue and return a profit. My hunch is that this is a fact of economic life that some open source code surfers do not yet hug and cuddle every hour or two.

Stephen E Arnold, November 14, 2012

Enterprise Architect Roles Shifting in Big Data Developments

November 15, 2012

IBM PureSystems is developing new systems to deal with Big Data challenges and emphasizes high-performance data services for local and/or cloud storage. The systems facilitate more rapid implementation and full integration, according to the article “IBM PureSystems Takes on Big Data” on ComputerWorld, and are challenging the traditional role of enterprise architects.

The article informs us about the changes:

“The traditional job of an enterprise architect is ‘to produce a huge document saying ‘this is how we do it’ – a document that everyone ignores, because it takes more effort to read and follow it than it does to ignore it,’ says IBM ‘distinguished engineer’ Jason McGee.

‘With PureSystems kind of technology, you can turn the document into actionable patterns that live in the system. That shifts the inertia and makes it easier to do things the right way. Enterprise architects will think ‘at last I can influence the way things develop’.’”

IBM PureSystems’ attack on Big Data is obviously shifting the enterprise architect’s job to a new phase of expertise. Working with selected certified integrators such as Intrafind can ease that shift while managing data effectively with rich tagging and secure search.

Andrea Hayden, November 15, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Disruptive Software Solutions Aim to Increase Productivity

November 15, 2012

Ben Werther became head of the products division when EMC bought Greenplum in 2010, which Forbes describes as the first step in founding Platfora. The company plans to disrupt traditional warehousing and business intelligence and is the subject of the recent article, “Platfora Raises $20 Million To Get Real with Big Data.” The main asset Platfora has is its use of Hadoop.

Platfora is still in the early phases; there are ten beta customers and more than 70 that are waiting.

The article describes how Platfora increases the value of Hadoop:

Hadoop is not easy to work with. Keep in mind that it’s been mostly the domain of data scientists at companies like Yahoo! and Facebook. But with Platfora, it’s now possible for any company to get tangible business value from Big Data. This is through common sense queries and helpful visualizations. Pulling this off has taken about a year and intense engineering. ‘I’ve never seen better execution from a team,’ said Scott Weiss, who is a partner at Andreessen Horowitz.

There are a multitude of companies that have presented the business intelligence market with potentially disruptive technologies. PolySpot software solutions fit that bill, but more importantly they help deliver information across the enterprise. Increasing productivity is why these technologies matter after all.

Megan Feil, November 15, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Leading Austrian IT News Portal Adopts Mindbreeze InSite Solution

November 15, 2012

Monitor.at is Austria’s leading IT news portal for small and medium businesses. The organization recently added the Mindbreeze InSite search solution to their Web site. This integrated cloud solution helps site visitors quickly and efficiently find important and relevant facts. Details of the InSite adoption can be read in the article, “Monitor.at with New Site Search.” The author includes this comment from Monitor’s editor:

’With over 13,500 products to monitor.at it is for visitors not always easy to find the desired information. Integrating Mindbreeze InSite, we are offering our visitors a convenience feature to quickly and easily find the desired information. Addition be easier for us Mindbreeze InSite work. Messages to the top topics automatically, appears seeking based,’ explains Ing Markus Klaus Eder, editor monitor.

Monitor.at is a good example of a major Web site that has incorporated a powerful search solution for improved site experience. Increasing Web traffic and retaining site visitors are becoming a major avenue for business success, as a Web site is often the first customer interaction with a business. The power of semantic search in addition to relevant content is necessary for gaining and retaining an audience. A powerful search system that can make connections among vast amounts of data can also help deliver a better search experience for the user. InSite is capable of searching a wide variety of specific documents, including PDFs, Excel sheets, and Word documents, as well as searching social media sites and Web sites. Consider the free trial to see if the Mindbreeze solution works for you.

Philip West, November 15, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Big Data Enters the Political Arena

November 15, 2012

Big Data is making headlines as it helps organizations and companies make sense out of massive amounts of unstructured data. Interest heightens when companies can take Big Data and make profits. However, Big Data now has another potential arena for success – politics. Attivio has ventured into the 2012 election data. Read a full report in the MarketWatch article, “Attivio and Tableau Analyze Presidential Election News, Social Media and Polling Data.”

The author explains:

“‘As a Tableau partner in the Big Data space, Attivio has taken the election data to a new level with their ability to unify social media, unstructured content and structured data,’ said Ellie Fields, Sr. Director of Product Marketing at Tableau. ‘Together, Attivio and Tableau provide a fully integrated and correlated visualization of unstructured content, structured data and business intelligence giving customers easily digestible, sharable and actionable insight.’”

Users who may be interested in tapping into Big Data for their own organizations could look to a trusted company like LucidWorks. Their LucidWorks Big Data has a major advantage in that it is ready out-of-the-box. Read more from the LucidWorks Web site.

“Designed to be ready out of the box, the LucidWorks Big Data platform includes all of the necessary open source components pre-integrated and certified.  Within a few hours, a customer instance is provisioned and hosted in the cloud – and supported by LucidWorks.”

Big Data will continue to make a bigger and bigger impact in the enterprise. See what a Big Data solution can do for your organization.

Emily Rae Aldridge, November 15, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

SLI Systems Helps Stanfords Increase Conversion

November 15, 2012

SLI Systems has generated a conversion improvement, we learn from their press release, “Stanfords Creates 3.5X Improvement in Conversion Rate and 3X Higher Per-Visit Value with SLI Systems Site Search.” The write up tells us:

“Stanfords, the UK’s leading specialist retailer of maps, travel books, and travel accessories, is seeing a conversion rate for site search users that is 3.5 times the rate for non-site search users after implementing Learning Search from SLI Systems. In addition, per-visit value for visitors who use site search is three times higher than per-visit values for visitors who don’t use search. Stanfords chose SLI’s customizable refinements and learning-based approach to replace the site search built into its e-commerce platform from Exact Abacus.”

Interesting metric. Could there be something about users who don’t use site search that predisposes them to not buy?

Stanfords’ e-commerce manager Joanna Lawton explained that the recent expansion into travel-related products prompted the move. She is happy with the increased relevance of her company’s results pages, as well as with the system’s intuitive user tools, she said.

SLI Systems supplies tools for site search, navigation, merchandising, and search engine optimization. They boast that their technology ‘learns’ from the behavior of visitors over time, resulting in more relevant results. The privately held company has offices in the US, the UK, Australia, and New Zealand.

Cynthia Murrell, November 15, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

ZyLAB on Mixing Social Media with Business

November 15, 2012

EDiscovery and social media: another tricky content issue. The CodeZED blog offers, “Compliance in the Cloud: How to Deal with Social Media in the Workplace.” Writer Brenda Mahedy succinctly lays out the problem. For example, she writes:

“The combination of business sensitive information and a mass broadcast capacity keeps legal departments awake at night! Have you thought of some of the legal challenges facing the use of social media at the workplace?

  • At this moment there are no specific laws and regulations for the governance of social media.
  • There is no generally recognized right to privacy in social media postings.
  • The effects of publicly available information stretch out beyond the recruitment process and are entering courts.
  • More and more legal cases insist on the production of social media e-discovery as evidence…
  • Social media information lives on servers that are not in the enterprise’s direct custody or control.”

All important points to consider. Despite the title, the article itself offers no actual solutions for dealing with social media in the workplace. Instead, the ZyLAB blog suggests a ZyLAB white paper (registration required), titled “Compliance in the Cloud: How to Deal with Social Media in the Workplace” by Annelore van der Lint. It is probably a good idea to check out the paper if you use, or plan to use, social media in connection with your business. As Mahedy reminds us, research firm Gartner has predicted that, “by the end of 2013 half of all companies will have been asked to produce material from social media websites for e-Discovery.”

ZyLAB was founded in 1983, with its release of the first full-text retrieval software for the PC. Its current flagship product, the eDiscovery and data management solution ZyLAB Information Management Platform, was released in 2010. The company maintains headquarters in the Netherlands and the United States, and has offices around the world.

Cynthia Murrell, November 15, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Google and Its Preserving the Lumber Room Contents

November 14, 2012

I don’t know if this write up is accurate. Navigate to “Privacy Issue: Google Docs Seems to Not Delete but Only Hide Documents When the Trash Is Emptied.” The main point of the write up is that content which a user may have wanted to make go away has not gone away. In database deletions, a similar issue exists until the database is spiffed up to make the space hogging deletions go the way of the dodo. Even then, it is possible to roll back a database or just restore it to a previous state. So what’s gone may not be gone.

Here’s a passage I noted:

The good thing is that Google Docs is still in Beta and things can change until it goes into release mode. But chances are higher that something will happen when we bring our privacy concerns to the attention of Google and also to the attention of all others that are offering to us either free or paid services on the Web. It is our responsibility. Let us choose wisely what and what not we are using as the core of our personal information infrastructure.

I admire optimism. What surprises me is that someone finds this non deletion anything other than standard operating procedure. The original Norton Utilities removed those pesky “?”s so that deleted files were suddenly not deleted. Magic.

If I had the energy, I would ask questions about the deployment of link analysis and intercept tools across deleted data. But, I am 68 and it is late in the afternoon. I assume that nothing untoward will be done with deleted user data. The world is just getting better with each passing day. Oh, I have to limp to the TV. More information about the email buzz and consequences concerning a certain former government official, a writer who can do more pushups than I can, and a wild and crazy family in Florida. Now that’s a state I admire with or without email shenanigans.

Stephen E Arnold, November 14, 2012

Google Related Quote to Note: The Year of Transparency

November 14, 2012

I read “Google Opens Up on Seven Years of Its Data Center History.” The write up appeared in the “real” journalism publication GigaOm.com. I found the story interesting. The timeline was fascinating, but the gem was this statement about Google:

This year’s theme is transparency.

There seems to be a bit of a public relations push about Google’s technology. I noticed that Jeff Dean, one of Google’s super wizards, published “Large Scale Distributed Deep Networks.” You may want to snag it before it disappears. I had to hunt around for the document. Pretty interesting. Google also blew its horn about processing three days of video in the blink of an eye. Forbes, another “real journalism” outfit, hopped on this story at “YouTube Turns Seven today, Now Uploads 72 Hours of Video per Minute.”

Maybe transparency means PR? I wonder how those folks toiling at the FTC and in law offices focusing on Google related matters define “transparency.” I side with Google. PR is plenty transparent.

Stephen E Arnold, November 14, 2012

Retail Giants Make Transition to Big Data Analytics

November 14, 2012

I came across an interesting article on InformationWeek titled “Why Sears Is Going All-In On Hadoop,” which tells about how some “old-school” companies are making the transition to big data services to access their customer bases. Sears admits personalization and customer loyalty were big draws to implementing big data analytics. To go beyond just the surface of available data, the retail giant turned to Hadoop.

The article tells us about the company’s choice of platforms and the benefits of the transition:

“Enter Hadoop, an open source data processing platform gaining adoption on the strength of two promises: ultra-high scalability and low cost compared with conventional relational databases. Hadoop systems at 200 terabytes cost about one-third of 200-TB relational platforms, and the differential grows as scale increases into the petabytes, according to Sears. With Hadoop’s massively parallel processing power, Sears sees little more than one minute’s difference between processing 100 million records and 2 billion records.”

This emerging drive toward IT services shows the basic needs of the enterprise and the reliance upon open source technology as businesses shift to big data services. The article admits there are issues with Hadoop: it is an immature platform and there is a lack of talent and experts in the program. Open source is a viable option for building solutions and experts are needed; enterprise search solution Intrafind does this well.

Andrea Hayden, November 14, 2012

Sponsored by ArnoldIT.com, developer of Augmentext
