Elasticsearch’s Facebook Happiness and Search Competition in 2014

January 21, 2014

A holiday leftover we’ve found at the Elasticsearch Blog has us contemplating the open-source search race for 2014. The site shares video from a recent event in which a Facebook representative discusses his company’s use of their product in, “Facebook & Elasticsearch: for Your Holiday Viewing Pleasure.” Is Elasticsearch surpassing Silicon Valley-based LucidWorks?

The post introduces the video:

“So, without further ado, we bring you this video from the inaugural Elasticsearch Silicon Valley meetup, in which you’ll learn more about Facebook’s use of Elasticsearch, including:

  • Facebook’s migration from Apache Solr to Elasticsearch
  • The company’s use of Elasticsearch to power internal search for developer tool sets and libraries
  • How Elasticsearch powers Facebook’s Community Help Site
  • And much, much more on their use case.”

The video is over an hour long, and full of good technical information, if that’s your thing. But the first two minutes summarize why Facebook prefers Elasticsearch over the competition. (The company had previously tried using Google Enterprise Search and Apache Solr and found each lacking.) Below the video, the post links to a webinar on getting started with their product. Formed in 2012, Elasticsearch is based in Amsterdam with offices in the U.S., the U.K., France, Germany, and Switzerland. They are also hiring as of this writing; that’s usually a good sign.

We heard that OpenSearchServer, another open-source search vendor, snagged the Le Monde account from Sinequa. If true, there seems to be competition between open-source search vendors and non-open-source search systems as well as among open-source search vendors.
Contention and competition. The year 2014 will be fascinating.

Cynthia Murrell, January 21, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Search Application Perspectives For 2014

January 20, 2014

With 2014 well under way, search experts are trying to predict what will happen in enterprise search. Search Appliance World has an article that looks at the past and future of enterprise search called, “The New Search Appliance Landscape: Reflections And Predictions With MaxxCAT.” The basic search commands that come in out-of-the-box systems are old school and do not provide the robust solution enterprise systems need.

Search appliances became enterprise users’ favorite toys and everyone had to have the Google Mini Search Appliance, but those days are gone. Other search developers, such as MaxxCAT, stepped up to the plate.

The article states:

“ ‘In 2013, we saw a lot of the fallout from that as customers realized they couldn’t replace their Google Mini appliance and went looking for viable alternatives that weren’t $30K. For us, this lead to a huge boost in sales of our entry level appliances and even some additional sales of our enterprise series appliances,’ MaxxCAT Director of Marketing & Sales Chris Whissen told Search Appliance World.”

The MaxxCAT developers were interested in exploring new markets their search appliance could expand into. The company is also big on customer service and ensuring that clients know they are valued. The biggest endeavor being made, though, is offering MaxxCAT’s clients an efficient solution to solve their search problems and to encourage more competition in the search application market. Google is no longer the small player, but some of its solutions have grown too expensive for its former clients. New companies like MaxxCAT keep the market fresh and offer up new ideas.

Whitney Grace, January 20, 2014

A User Friendly Search Tip from MarkLogic

January 20, 2014

Open source developers rely on code libraries their fellow developers have already written to complete their projects. This makes search engines and open source communities invaluable tools. Blogs are another place to locate helpful code, and the Developer Notes blog details one developer’s notes on XML, MARKLOGIC, XQUERY, XFORMS, XSLT, SCHEMA, JAVA, HTML, CSS, JAVASCRIPT, XSSI AND OTHERS.

By searching through the blog’s archive, we found this old post entitled: “MarkLogic: Techniques For Querying In-Memory Fragments Using CTS:Contains.”

Here is what you will find in the post:

“This snippet demonstrates the use of cts:contains and cts:element-attribute-word-query on an in-memory fragment (something that has been stored in the Expanded Tree Cache using a let statement).”

Like any kind developer who pulls from open source, the author posts the code for anyone to use in their project. We occasionally find neat little tricks like this tucked away on the Internet. Sadly, many of them can get lost and are left in the hidden Web, which is why we rely on deep Web crawlers and content wranglers. Developers need a robust search engine to find good code. Sometimes the big guns like Google do not work.

Whitney Grace, January 20, 2014

Basic Search Tips for SharePoint

January 16, 2014

The article titled Not Getting The Search Results You’re Looking For In SharePoint? on Microsoft Office divulges tips to help users who are having trouble with SharePoint, the content management platform from Microsoft. The basic issues it mentions relate to too many or too few results. For those receiving too few results, tips include generalizing search terms and ensuring your settings aren’t blocking the results. For too many results, the article suggests using Advanced Search.

This is what the article suggests for no results regarding syntax use:

“If an error message tells you to make sure you’re using the proper syntax, the search system interprets your search as a KQL query, but finds that there’s something wrong with the syntax. Check that you’re using the right syntax, and particularly the right number of parentheses and double quotation marks. If you want to search for a phrase that contains a parenthesis or double quotation mark, make sure that you enclose the entire phrase… in double quotation marks.”

You can also consult the Keyword Query Language (KQL) Syntax Reference for querying in KQL. However, we have a hunch that this advice is not too useful if the content is not in the index or some other system-level issue is the problem. Consider the article more of a basic troubleshooting guide, not a comprehensive directory.
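The parenthesis and quotation mark check described in the quoted tip can be sketched in a few lines of Python. This is only an illustration of the idea, not SharePoint’s actual parser: the function name check_query_syntax and the sample queries are our own assumptions, and real KQL has many more rules than this covers.

```python
def check_query_syntax(query):
    """Return a list of problems with parentheses and double quotes.

    Hypothetical helper for illustration; it is not part of SharePoint
    and does not validate KQL operators or property names.
    """
    problems = []
    depth = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue  # parentheses inside a quoted phrase are literal text
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("closing parenthesis without an opening one")
                depth = 0
    if in_quotes:
        problems.append("unmatched double quotation mark")
    if depth > 0:
        problems.append("unclosed opening parenthesis")
    return problems


# A phrase containing parentheses passes once it is fully quoted.
print(check_query_syntax('"results (2013)"'))
print(check_query_syntax('(budget OR forecast'))
```

An empty list means the query passed this narrow check. It says nothing about whether the content is actually in the index.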

Chelsea Kerwin, January 16, 2014

Google Search Appliance VS SearchBlox Price and Indexing Limit Comparison

January 16, 2014

The article on SearchBlox titled Google Search Appliance Price Comparison with SearchBlox explains the slippery pricing data available to consumers on Google Search appliances. The article states that for a more limited document storage space Google charges $30,000 while SearchBlox, for unlimited storage, charges only $5,000 (but these numbers are only approximations). SearchBlox also offers more constant support and maintenance than Google, making it a very appealing option in the world of intranet or Web site search.

The article explains:

“SearchBlox provides the option of seamlessly moving away from the Google Search Appliance without skipping a beat. In addition to the cost-savings and feature comparison, scalability of the solution is something to consider given the explosion of content. SearchBlox scales both vertically (by adding more CPU/RAM to the existing setup) and horizontally (by adding more search servers that can be run in a cluster) without disrupting your architecture.”

SearchBlox even allows for Google administrators using XLS with a “faceted search plugin” that promises not to disturb the infrastructure. Allowing users to index unlimited documents certainly beats Google’s 500K indexing limit. A quick check of the GSA Advantage site shows that the Google Search Appliance is a significantly more expensive alternative to the open source based SearchBlox solution.

Chelsea Kerwin, January 16, 2014

A Sales Pitch for HP IDOL

January 12, 2014

Conceptual search allows users to search by concepts and ideas within information rather than basic keywords and phrases. Great idea, except that the idea of conceptual search has been around since 1999. HP is touting it as an entirely new idea in the article, “Analytics For Human Information: Optimize Information Categorization With HP IDOL” posted on its own Web site. Rather than break directly into the “new” conceptual search, we are given the even better glittery term “categorization.” HP IDOL, using ExploreCloud, a SaaS solution for analytics and insights, offers an auto-categorization feature marketed as a time saver and productivity tool.

HP describes it as a magic tool:

“Powered by HP IDOL, ExploreCloud helps you uncover insights across all channels: web, mobile, social media, email, contact center, database, and storefront, so that you can organize and quantify content in a consistent, objective manner, resulting in data that is more accessible and consistent. And you can maintain existing legacy taxonomies and/or enrich them with contextual understanding. When you go beyond the limitations of what keywords can help you do, your whole world opens up. You can also discover the “unknown unknowns,” or topics you did not know to look for in the first place.”

The article notes that regular keyword searching is far from abandoned, but it stresses its limitations. Keyword search’s weaknesses are addressed to the point of stating the obvious, and then the article turns into a sales pitch for HP IDOL. Little is said about what exactly HP IDOL can do, other than organize data. HP, please tell us something we do not know.

Whitney Grace, January 12, 2014

More Watson Nuggets

January 10, 2014

Fast Company published “IBM’s Watson For Business: The $1 Billion Siri Slayer.” The write up offers some nuggets of information that convert Watson from search system into the next Apple or Google. Frankly I find this notion somewhat amusing.

The story reports this interesting assertion, “IBM wants to transform Watson into a Siri for business.” Quite an analogy.

I also noted these items:

  • Stephen Gold is the vice president of IBM Watson Solutions
  • Watson Discovery Advisor will be a product/service for publishing, education, and health care
  • Watson Analytics Advisor appears to be an interactive analytics solution
  • An ecosystem will be built around the Watson Application Programming Interface and “the Watson headquarters will also include space for a tech incubator for startups building Watson-based apps”
  • Watson will be deployed on Softlayer, an IBM cloud computing service. Apparently some eager Watson prospects have an appetite for Softlayer’s delivering Watson.

I marked a quote to note from Mr. Gold and Fast Company:

Watson for Business is “one of the top innovations in IBM’s history” and it could even be the biggest IBM innovation since the IBM PC.

IBM seems to have made a different executive available to the Wall Street Journal and the New York Times. My hunch is that the cheerleading will continue for a while.

Meanwhile where’s the online demonstration of Watson’s functionality? I want to see how the system compares to Hewlett Packard’s Autonomy technology, check out the visualizations to see if they are different from IBM i2’s, and figure out if the analytics are recycled SPSS functions or something different.

Stephen E Arnold, January 10, 2014

The IBM Watson PR Blitz Continues

January 9, 2014

Content marketing is alive and well at IBM. I read two Watson related stories this morning. Let’s look at each and see if there are hints about how IBM will generate $10 billion in revenue from the game show winning Watson information system.

The New York Times

“IBM Is Betting Watson Can Earn Its Keep” appears on page B 9 of the hard copy which arrives in Harrod’s Creek most days. A digital instance of this Quentin Hardy write up may be online at http://nyti.ms/1krYgfx. If not, contact a Google Penguin for guidance.

The write up contains a quote to note:

Virginia M. Rometty, CEO of IBM: Watson does more than find the needle in the haystack. It understands the haystack. It understands concepts.

The best haystack quote I have heard came from Matt Koll, student of Gerard Salton and founder of Personal Library Software. Dr. Koll pointed out that haystacks involve needles, multiple haystacks and multiple needles, and other nuances that make clear how difficult locating information can be.

The quote attributed to Ms. Rometty also nods to Autonomy’s marketing. Autonomy, since 1996, emphasized that one of the core functions of the Bayesian-Shannon-Laplace-Volterra method was identifying concepts automatically. Are IBM and arch rival Hewlett Packard using the same 18 year old marketing lingo? If so, I wonder how that will play out against the real-life struggles HP seems to be experiencing in the information retrieval sector.

There are several other interesting points in the content marketing-style article:

  1. IBM is “giving Watson $1 billion and a nice office.” I wonder if the nuance of “giving” is better than “investing.”
  2. $100 million will be allocated “for venture investments related to Watson’s so-called data analysis and recommendation technology.” One hopes that IBM’s future acquisitions deliver value. IBM already owns iPhrase, a “smart search system,” some of Dr. Ramanathan Guha’s semantic technology, Vivisimo, and the text processing component of SPSS called Clementine. That’s a lot of in hand technology, but IBM wants to buy more. What are the costs of integration?
  3. IBM has to figure out how to “cohere” with other IBM initiatives. Is Cognos now part of Watson? What happens to the IBM Almaden research flowing from Web Fountain and similar initiatives? What is the role of Lucene, which I heard is the plumbing of Watson?

The IBM write up will get wide pick up, but the article strikes me as raising some serious questions about the Watson initiative. There may be 750 eager developers wanting to write applications for Watson. I am waiting for an Internet accessible demonstration against a live data set.

The Wall Street Journal, Round 2, January 9, 2014

The day after running “IBM Struggles to turn Watson into Big Business”, the real news outfit ran a second story called “IBM Set to Expand Watson’s Reach.” I saw this on page B2 of the hard copy that arrived in Harrod’s Creek this morning. Progress. There was no WSJ delivery on January 6 and January 7 because it was too cold. You may be able to locate a digital version of the story at http://on.wsj.com/1ikQa3X. (Same Penguin advice applies if the article is not available online.)

This January 9, 2014, story includes a quote to note:

Michael Rhodin, IBM senior vice president, Watson unit: We are now moving into more of a rapid expansion phase. We’ve made incredible progress. There is lots more to do. We would not be pursuing it if we did not think it had big commercial potential.

We then learn that by 2018, Watson will generate $1 billion per year. Autonomy was founded in 1996 and at the time of its purchase by Hewlett Packard, the company reported revenue in the $800 million range. IBM wants to generate more revenue from search in less time than Autonomy. No other enterprise search and content processing vendor has been able to match Autonomy’s performance. In fact, Autonomy’s rapid growth after 2004 was due in part to acquisitions. Autonomy paid about $500 million for Verity and IBM’s $100 million for investments may not buy much in a search sector that has consolidated. Oracle paid about $1 billion for Endeca which generated about $130 million a year in 2011.

Net Net

Watson has better PR than most of the search and content processing companies I track. How many people at the Watson unit pay attention to SRCH2, Open Search Server, Sphinx Search, SearchDaimon, the Dassault Cloud 360 system, and the dozens and dozens of other companies pitching information retrieval solutions?

I would wager that the goals for Watson are unachievable in the time frame outlined. The ability of a large company to blast past Autonomy’s revenue benchmark will require agility, flexibility, price wizardry, and a product that delivers verifiable value.

As the second Wall Street Journal story points out, “IBM is looking to revive growth after six straight quarters of revenue declines.”

IBM may be better at content marketing than hitting the revenue targets for Watson at the same time Hewlett Packard is trying to generate massive revenues from the Autonomy technology. Will Google sit on its hands as IBM and HP scoop up the enterprise deals? What about Amazon? Its search system is a so-so offering, but it can offer some sugar treats to organizations looking to kick tires with reduced risk.

Many organizations are downloading open source search and data management systems. These are good enough when smart software is still a work in progress. With 2,000 people working on Watson, the trajectory of this solution will be interesting to follow.

Stephen E Arnold, January 9, 2014

Distraction Addiction: Welcoming Predictive Search Systems

January 9, 2014

The article on Business Insider titled Here’s How Many Times People Switch Devices In a Single Hour provides insight into the studies being undertaken by both Google and Facebook into following users from device to device. They need to demonstrate to advertisers that the ad one user saw on his laptop at work later caused him to make a purchase from his smartphone. The article states:

“A new study from the British unit of advertising buyer OMD shows just how massively important this cross-device tracking has become to monitoring a given consumer’s behavior.

In looking at the behavior of 200 Brits during one evening, OMD found that the average person shifted his attention between his smartphone, tablet, and laptop a staggering 21 times in one hour.”

This study’s findings may not come as a huge surprise. An article on Salon titled How Baby Boomers Screwed Their Kids and Created Millennial Impatience argues that Generation Y is the most distracted and impatient batch of people yet. The article contends,

“According to a study at Northwestern University, the number of children and young people diagnosed with attention deficit hyperactivity disorder (ADHD) shot up 66 percent between 2000 and 2010. Why the sudden and huge spike in a frontal lobe dysfunction over the course of a decade… What I believe is likely happening, however, is that more young people are developing an addiction to distraction. An entire generation has become addicted to the dopamine-producing effects of text messages, e-mails and other online activities.”

This “addiction to distraction” is often held up by Gen Y’ers as an ability to “multi-task”. But what does it mean to be someone unable to focus? In Buddhism there is the belief that if you are doing more than one focused task, you are not truly alive.

With telework, the workplace is now the world.

We have all succumbed at one time or another to the call of checking our e-mail, Facebook, or Twitter account, but when we are doing it so often that it takes over our concentration, what have our lives become? There is a wide gap between flitting from these exciting distractions and actually gaining some foothold of understanding. And the more we do jump back and forth between tasks, the less likely it becomes that any knowledge is created or stored. The Salon article paints a bleak picture, starting off with the dark Philip Larkin poem “This Be the Verse” (it is hardly “High Windows”) and including this dreary image of the future,

Read more

A Full Text Engine Blooms in Life

January 9, 2014

Basic search for static Web sites stinks. It is just generic code that takes a one-size-fits-all approach to search, and as we all know, that never works. Stavros Korokithakis recognized this problem and decided that he wanted to create a full-text search engine that was accurate. In his article, “Writing A Full-Text Search Engine Using Bloom Filters,” Korokithakis details how he wrote his own search using an inverted index and bloom filters. An inverted index works by mapping every word in a document to the ID of the document. As one can imagine, that list grows very big, and the basic search engine for a static Web site returns every hit. A search plug-in limits itself to titles, tags, and key words. How do you get the same results for a static search?
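For reference, the inverted index idea mentioned above can be sketched in Python as follows. This is our own minimal illustration, not Korokithakis’s code: the function names build_inverted_index and lookup, the whitespace tokenization, and the sample documents are all assumptions.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased word to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, query):
    """Return IDs of documents that contain every word in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data for illustration only.
docs = {1: "static sites need search", 2: "search engines index sites"}
index = build_inverted_index(docs)
print(lookup(index, "static search"))
```

Every distinct word becomes a key, which is why the structure grows quickly and is awkward to ship to a browser for a static site.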

A bloom filter is the answer. A bloom filter is a data structure that stores elements in a fixed number of bits and, when queried, tells users whether it has seen those elements before (false positives are possible, false negatives are not). It is also apparently easy to implement a bloom filter:

  • “Create one filter per document and add all the words in that document in the filter.
  • Serialize the (fixed-size) filter in some sort of string and send it to the client.
  • When the client needs to search, iterate through all the filters, looking for ones that match all the terms, and return the document names.
  • Profit!”
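The steps quoted above can be sketched in Python. This is a minimal illustration under our own assumptions, not the implementation from Korokithakis’s article: the class and function names, the 1024-bit filter size, the three salted SHA-256 probes, and the hex serialization are all choices made here for the example.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array with several hash probes per element."""

    def __init__(self, size_bits=1024, num_hashes=3):
        # Both parameters are illustrative defaults, not tuned values.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, word):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits |= 1 << pos

    def __contains__(self, word):
        # True means "probably present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(word))

    def serialize(self):
        # Fixed-length hex string that could be shipped to a client.
        return format(self.bits, f"0{self.size_bits // 4}x")


def build_filters(documents):
    """One filter per document, holding every lowercased word in it."""
    filters = {}
    for name, text in documents.items():
        bloom = BloomFilter()
        for word in text.lower().split():
            bloom.add(word)
        filters[name] = bloom
    return filters


def search(filters, query):
    """Return names of documents whose filter matches all query terms."""
    terms = query.lower().split()
    return sorted(name for name, bloom in filters.items()
                  if all(term in bloom for term in terms))


# Hypothetical pages for illustration only.
pages = {
    "a.html": "bloom filters are compact",
    "b.html": "inverted index grows large",
}
filters = build_filters(pages)
print(search(filters, "bloom compact"))
```

A document that contains all the query terms is always returned. A document that does not may occasionally slip through as a false positive, and that is the price paid for a filter whose size does not grow with the text.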

He even has a quick implementation guide in Python. It sounds like a wonderful way to improve static Web site search, but could not the same problem be solved with a simple plug-in as described above? Many people rely on pre-made Web site platforms such as WordPress and Tumblr, which come with built-in plug-ins. Is this for the bigger Web sites people deploy?

Whitney Grace, January 09, 2014
