Amazon CloudSearch

April 12, 2012

Well, some of the folks working to bolt a search and retrieval system onto “big data”, mobile apps, and cloud vendors’ systems are trying to figure out what to do about Jeff Bezos. The head of Amazon has taken time from his space flight activities to disrupt the world of cloud-based search and retrieval.

The announcements were handled in Amazon’s typical mode. Those who were privy to the new service, which is based on A9 with what looks like some open source goodness inside, had to keep quiet. Then Amazon published a chunk of Web pages about the service. You can find most of the basics in this CloudSearch documentation collection.

There are two general-interest blog posts. You may want to check out Dr. Werner Vogels’ “Expanding the Cloud—Introducing Amazon CloudSearch” and the AWS Blog story “Amazon CloudSearch—Start Searching in One Hour for Less Than $100 / Month.”

The system is the “old” A9 search service, which received some early life support from Udi Manber, now a Googler. But the features and functions referenced in the documentation suggest that additional work has been done to make facets, snippets, highlighting, and graphic features take advantage of some open source goodness. However, Amazon takes some care to make sure that the provider of the open source goodness is tough to pin down. The best example of this method is Amazon’s handling of the Android operating system for the Kindle Fire. Beneath the sluggish interface of the Kindle Fire beats the heart of Android 2.x. Even the Amazon app store offers certain Android apps, not all of them. The approach works and keeps many of Amazon’s secrets from turning up in Gawker or trendy Silicon Valley blogs. Amazon secrecy is not quite Apple grade, but Amazon is familiar with the orchard.

According to “Expanding the Cloud – Introducing Amazon CloudSearch”:

Developers set up a Search Domain — a set of resources in AWS that will serve as the home for one collection of data. Developers then access their domain through two HTTP-based endpoints: a document upload endpoint and a query endpoint. As developers send documents to the upload endpoint they are quickly incorporated into the searchable index and become searchable.

Developers can upload data either through the AWS console, from the command-line tools, or by sending their own HTTP POST requests to the upload endpoint.
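Stripped of the HTTP plumbing, the description comes down to two operations on a search domain: add documents, then query them. The toy sketch below models that idea with an in-memory inverted index. The class and method names (`SearchDomain`, `upload`, `query`) are hypothetical illustrations, not the CloudSearch API, which is reached through the upload and query endpoints described above.

```python
from collections import defaultdict


class SearchDomain:
    """Toy stand-in for a search domain: one collection of documents.

    Hypothetical names for illustration only; this is not the CloudSearch API.
    """

    def __init__(self):
        self.docs = {}                 # doc id -> field dict
        self.index = defaultdict(set)  # term -> set of doc ids

    def upload(self, doc_id, fields):
        # Index every whitespace-separated term of every text field right away,
        # so the document is searchable as soon as upload() returns.
        # (Toy simplification: re-uploading an id does not remove old terms.)
        self.docs[doc_id] = fields
        for value in fields.values():
            if isinstance(value, str):
                for term in value.lower().split():
                    self.index[term].add(doc_id)

    def query(self, text):
        # Boolean AND: a document matches only if it contains every term.
        terms = text.lower().split()
        if not terms:
            return set()
        matches = set(self.index.get(terms[0], set()))
        for term in terms[1:]:
            matches &= self.index.get(term, set())
        return matches


domain = SearchDomain()
domain.upload("1", {"title": "red umbrella", "color": "red", "price": 8})
domain.upload("2", {"title": "blue umbrella", "color": "blue", "price": 15})
print(sorted(domain.query("umbrella")))      # ['1', '2']
print(sorted(domain.query("red umbrella")))  # ['1']
```

A production service adds tokenization, stemming, sharding, and replication on top of this basic shape.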

There are three features that make it easy to configure and customize the search results to meet exactly the needs of the application.

Filtering: Conceptually, this is using a match in a document field to restrict the match set. For example, if documents have a “color” field, you can filter the matches for the color “red”.
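Filtering narrows a match set that already exists; it does not change what the query text matched. A minimal sketch, assuming documents are plain Python dicts and the match set is a set of document ids (`filter_matches` is a hypothetical helper, not part of any Amazon SDK):

```python
def filter_matches(docs, match_ids, field, value):
    """Keep only matched documents whose `field` equals `value`."""
    return {doc_id for doc_id in match_ids if docs[doc_id].get(field) == value}


docs = {
    "1": {"title": "red umbrella", "color": "red"},
    "2": {"title": "blue umbrella", "color": "blue"},
    "3": {"title": "red raincoat", "color": "red"},
}
match_set = {"1", "2"}  # e.g. the documents that matched the query "umbrella"
print(filter_matches(docs, match_set, "color", "red"))  # {'1'}
```

Document "3" is red but never matched "umbrella", so the filter cannot bring it back.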

Ranking: Search has at least two major phases: matching and ranking. The query specifies which documents match, generating a match set. After that, scores are computed (or direct sort criterion is applied) for each of the matching documents to rank them best to worst. Amazon CloudSearch provides the ability to have customized ranking functions to fine tune the search results.
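The two phases can be sketched as: take the match set from phase one, compute a score per document, and sort. The scoring function below (rating times 10 minus half the price) is a made-up example of a “customized ranking function”, not CloudSearch’s own expression syntax; passing a plain field value shows the “direct sort criterion” case.

```python
def rank(docs, match_ids, score):
    """Phase 2: score each matching document and sort best to worst."""
    # Higher score ranks first; ties are broken by doc id for a stable order.
    return sorted(match_ids, key=lambda doc_id: (-score(docs[doc_id]), doc_id))


docs = {
    "1": {"title": "red umbrella", "price": 8, "rating": 4.0},
    "2": {"title": "blue umbrella", "price": 15, "rating": 4.5},
    "3": {"title": "golf umbrella", "price": 30, "rating": 3.5},
}
match_set = {"1", "2", "3"}  # phase 1: all three matched "umbrella"


def custom_score(doc):
    # Hypothetical custom ranking: reward high ratings, penalize high prices.
    return doc["rating"] * 10 - doc["price"] * 0.5


print(rank(docs, match_set, custom_score))               # ['2', '1', '3']
print(rank(docs, match_set, lambda doc: -doc["price"]))  # ['1', '2', '3'], cheapest first
```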

Faceting: Faceting allows you to categorize your search results into refinements on which the user can further search. For example, a user might search for ‘umbrellas’, and facets allow you to group the results by price, such as $0-$10, $10-$20, $20-$40, etc. Amazon CloudSearch also allows for result counts to be included in facets, so that each refinement has a count of the number of documents in that group. The example could then be: $0-$10 (4 items), $10-$20 (123 items), $20-$40 (57 items), etc.
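Faceting with counts is a group-and-count pass over the match set. The sketch below uses the umbrella price buckets. Treating each bucket as a half-open range [low, high) is an assumption, since the text does not say which bucket a $10.00 item lands in; the prices are invented sample data.

```python
from collections import Counter

# Price buckets from the umbrella example, treated as half-open ranges
# [low, high). That boundary rule is an assumption for this sketch.
BUCKETS = [(0, 10), (10, 20), (20, 40)]


def price_facets(prices, buckets=BUCKETS):
    """Count how many matching documents fall into each price bucket."""
    counts = Counter()
    for price in prices:
        for low, high in buckets:
            if low <= price < high:
                counts[(low, high)] += 1
                break  # prices outside every bucket are simply not counted
    # Report every bucket, including empty ones, in bucket order.
    return [(f"${low}-${high}", counts[(low, high)]) for low, high in buckets]


matched_prices = [4.99, 9.5, 12.0, 19.99, 24.0, 35.0, 10.0]
for label, count in price_facets(matched_prices):
    print(f"{label} ({count} items)")
```

With these sample prices the output is `$0-$10 (2 items)`, `$10-$20 (3 items)`, and `$20-$40 (2 items)`, the same shape as the example in the quoted text.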

For more information on the different configuration possibilities visit the Amazon CloudSearch detail page.

Automatic Scaling: Amazon CloudSearch is itself built on AWS, which enables it to handle scale.

Okay, automatic. This sounds like the standard line from every cloud vendor with knowledge of sharding, distributed computing, and work allocation. We noted that the system supports Boolean logic and math operations. That’s good news and long overdue from Amazon.

Our take on Amazon CloudSearch is that Amazon has introduced a service which will allow developers to get out of the business of figuring out how to bolt a third-party search solution to their Amazon content. For organizations looking for a silver bullet to kill their on-premises search systems, Amazon has taken a quick step into the search disco.

Will Amazon’s CloudSearch become a viable alternative to on-premises search? Will Amazon’s new service put additional pressure on big enterprise companies like Hewlett-Packard and Oracle? Both of these outfits have spent big money buying ageing findability solutions. What about Microsoft with its ubiquitous search solutions included with SharePoint? What happens to mid-tier vendors like Lexmark Isys or startups like DataStax and its Enterprise 2.0 service?

We don’t know. What we do know is that Amazon, unlike Google and Facebook, has found a way to enter a service space without looking much like a head-on competitor to any other company. Google has not moved too far from its on-premises Google Search Appliance. Facebook continues to dither when it comes to full-on search. Amazon’s challenge will be getting its costs under control and finding a way to placate the Wall Street MBAs. Search on Amazon is, in our opinion, a service which is in dire need of improvement.

Perhaps CloudSearch will improve the way Amazon.com’s book search works? I am still struggling to find a way to NOT out (exclude) books which are not yet available. I find the way the Kindle reading app on the iPad 3 handles titles almost unusable.

Can Amazon do better? Yes. Will CloudSearch be that important leap forward? I don’t know. But I am watching, and I have a hunch that other search vendors, partners, and integrators are checking out this most recent blast from Bezos Land.

Stephen E Arnold, April 12, 2012

Comments

2 Responses to “Amazon CloudSearch”

Iain Fletcher on April 13th, 2012 4:58 am

Check out the new Wikipedia Search Lab based on Amazon CloudSearch at http://wikipedia.searchtechnologies.com

Amazon CloudSearch-Information Retrieval as a Service | Researcher's Blog on April 13th, 2012 8:51 am

[…] http://arnoldit.com/wordpress/2012/04/12/amazon-cloudsearch/?amp&amp […]

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.