Is Google Attacking Endeca with Killer Prices?

June 23, 2010

I am not sure if this eWeek story is on the money, but I want to capture the alleged Google pricing for its eCommerce service. Judge for yourself by navigating to “Google Commerce Search 2.0 Gets Refinements, $25K Price Point.” The big dogs in eCommerce include / have included Dieselpoint, EasyAsk, Endeca, SLI Systems, Omniture Mercado, and a handful of other outfits. Price points range from $25,000 right on up to millions, depending on what the customers’ specifications are perceived to be. Keep in mind that scaling and tuning may add significantly to the cost of an ecommerce system.

For me the key paragraph was:

The search engine also added a new price point for Commerce Search. The original entry level price was $50,000 per year for an indexing of 100,000 items and up to 10 million queries. Google has cut that virtually in half to appeal to smaller businesses, or businesses with smaller needs. Businesses may now license Commerce Search for $25,000 per year, which is good for 50,000 products and 3 million queries. Customers will pay more as they scale.

So what? This price point is a bargain until one considers the sentence “Customers will pay more as they scale.” Budget that, grasshopper.
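
Do the arithmetic on the two tiers quoted above and the picture sharpens: the per-item rate is identical, and the per-query rate is actually higher at the cheaper tier, which is why that closing sentence matters. A quick back-of-the-envelope sketch (the figures come from the eWeek numbers; the comparison is mine):

```python
# Compare the two Google Commerce Search tiers quoted in the eWeek story.
tiers = {
    "original": {"price": 50_000, "items": 100_000, "queries": 10_000_000},
    "new":      {"price": 25_000, "items": 50_000,  "queries": 3_000_000},
}

for name, t in tiers.items():
    per_item = t["price"] / t["items"]
    per_1k_queries = t["price"] / (t["queries"] / 1_000)
    print(f"{name}: ${per_item:.2f} per item, ${per_1k_queries:.2f} per 1,000 queries")

# original: $0.50 per item, $5.00 per 1,000 queries
# new:      $0.50 per item, $8.33 per 1,000 queries
```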

Stephen E Arnold, June 21, 2010

Freebie

Elsevier Buys Collexis

June 22, 2010

Elsevier continues to add to its search and content processing arsenal. With the cost of human indexing gushing like the BP oil spill, Elsevier is looking for magic to use for publishing scientific, technical, and medical information products and services. Elsevier is the giant company behind journals like The Lancet and reference works like the Mosby medical titles. In terms of indexing, sci-tech content is easier to machine index than chatty Twitter tweets. To bolster the firm’s multiple methods, Elsevier acquired Collexis Holdings, a semantic technology and software developer. The plan is that the Collexis technology will give Elsevier the ability to help researchers and institutions take advantage of more avenues for finding data and publishing results, creating a better ROI. Is it a good plan? Yahoo has been a practitioner of this acquire-the-technology approach for years. Perhaps Elsevier can craft a success from this Yahoo-style approach. The Collexis assets will have to be fine tuned and installed before the company or its clients start seeing benefits. But kudos to Elsevier for taking a positive step.

Jessica West Bratcher, June 22, 2010

Freebie

BA Insight Announces Longitude V4

June 15, 2010

I get quite a bit of information about snap-in search and content processing systems designed specifically for Microsoft SharePoint. Many organizations find SharePoint and its components, add-ins, and third-party enhancements exactly what is needed to crack tough information management problems.


Make your SharePoint search accelerate as quickly as a Bugatti Veyron.

BA Insight – along with Fabasoft Mindbreeze, SurfRay, Coveo, Exalead, and other vendors – offers a search solution for SharePoint licensees. You can read about the “state-of-the-art search features” in “BA-Insight Announces Next-Generation Search Technology for SharePoint and FAST Search 2010 at Microsoft TechEd 2010 Conference. BA-Insight’s Longitude Version 4 Provides Automatic Optimization of Microsoft’s 2010 Enterprise Search Products.”

Among the state-of-the-art features are, according to the write up:

  • Highly scalable performance, superior to Flash/Java in speed of rendition
  • More efficient engine for rendering complex pages and 3D animation
  • Linking of structured and unstructured data
  • Text recognition within an image format, where OCR is executed on the fly
  • Translation from foreign languages
  • Strong .Net integration – customer ability to embed existing custom .Net extensions into the Silverlight viewer
  • Full use of all existing Longitude Search Connectors
  • Indexing of email including attachments
  • Parametric search.

The description of this product might bring tears to the eyes of BA Insight’s competitors and smiles of joy to SharePoint licensees who struggle to get a distributed SharePoint system humming like a Bugatti Veyron.
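
As an aside, the “parametric search” item in the feature list above is essentially faceted filtering over structured fields combined with a free-text match. A minimal sketch of the idea, using an invented product list (nothing here is BA Insight’s API):

```python
# Minimal illustration of parametric (faceted) search: filter structured
# records on field constraints, then keep free-text matching for the rest.
products = [
    {"name": "road bike", "price": 899, "color": "red",  "text": "lightweight aluminum frame"},
    {"name": "city bike", "price": 450, "color": "blue", "text": "comfortable upright ride"},
    {"name": "kids bike", "price": 120, "color": "red",  "text": "training wheels included"},
]

def parametric_search(items, filters, keyword=None):
    """Return items matching every field predicate in `filters`
    and, optionally, containing `keyword` in their text."""
    hits = [i for i in items if all(pred(i[f]) for f, pred in filters.items())]
    if keyword:
        hits = [i for i in hits if keyword.lower() in i["text"].lower()]
    return hits

# Example: red items under $500 that mention "wheels"
print(parametric_search(products,
                        {"color": lambda c: c == "red",
                         "price": lambda p: p < 500},
                        keyword="wheels"))
```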

You can get more information about the BA Insight “state of the art” system at www.ba-insight.com. Each time I read about a search solution for SharePoint I wonder what creates such a thriving business in SharePoint search now that Microsoft owns the Fast Search & Transfer technology.

Stephen E Arnold, June 15, 2010

Google Addresses Index Staleness

June 10, 2010

Next week, I am giving two lectures about what is now one of the touchstones of 2010: real time. I will put up some extracts from these lectures in the next week or so. What I want to do this morning is call your attention to a post from Google called “Our New Search Index: Caffeine.” I think the nod to the fizzy drinks that give club goers and sluggish 20 somethings a boost is interesting.

Most users of a search and retrieval system have zero clue about when the index was updated or assembled. The 20 something wizards assume that if an index is available from an electronic device, that index is up to the minute or even more current.

Most online system users have zero clue about when data were created, when those data were processed, when those index pointers were updated, or what other factors may have slammed on the search system’s air brakes. Ever hear this in an organization: “I know my version of the PowerPoint should be in the system but I can’t find it.” I do. Frequently.

The Google write up makes clear, in a Googley sort of way, that the company wants to try to cope with streams of information from Twitter and Facebook. Traffic from social sites has either reached parity with search traffic or surpassed it. I have some information in Overflight, and I will post one or two items that document this usage shift. Users seem to prefer what looks to most people like “real time”. A traditional indexing system does not do real time with Mikhail Nikolaevich Baryshnikov’s agility.

Here’s what the Googlers said:

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles. We’ve built Caffeine with the future in mind. Not only is it fresher, it’s a robust foundation that makes it possible for us to build an even faster and more comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.

The idea is that Google, which has numerous ways of skinning the content processing cat, has grabbed the digital Red Bull.
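
For what it is worth, the storage comparison in the quote checks out. A quick sanity check, assuming the “largest iPod” of 2010 is the 160 GB iPod Classic at roughly 4.1 inches tall (my assumption, not Google’s):

```python
# Sanity-check the storage comparison in Google's Caffeine post.
# Assumptions (not from the post): largest 2010 iPod = 160 GB iPod Classic,
# about 4.1 inches tall.
index_size_gb = 100_000_000          # "nearly 100 million gigabytes"
ipod_capacity_gb = 160
ipod_height_inches = 4.1

ipods_needed = index_size_gb / ipod_capacity_gb
stack_miles = ipods_needed * ipod_height_inches / 12 / 5280

print(f"{ipods_needed:,.0f} iPods")   # 625,000 iPods
print(f"{stack_miles:.1f} miles")     # roughly 40 miles end to end
```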


© Google 2010.

I have no doubt that the freshness of certain types of content is going to benefit. However, I am not sure that Google will be able to handle its vast content processing needs with the balletic grace the nuclear logo in the blog post suggests. Furthermore, I don’t think that most users understand that whatever Google does to process content more quickly and update its indexes does not address some of the thorny underlying issues. I address these in my lecture, but the user is unlikely to know about latency elsewhere in the content ecosystem.

The notion of “real time” is slippery. The notion of an index’s “freshness” is slippery. The problem is a complex one. Why do you think that financial institutions pay really big bucks for products from Exegy and Thomson Reuters to deal with freshness? The reason? Speed that can be documented from the moment of information creation through acquisition, processing, and availability. For freshness, be prepared to spend big money.
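
“Documented speed” simply means timestamping every stage of the pipeline and measuring the gaps. A minimal sketch of that measurement, with the stage names taken from the sentence above and the timestamps invented:

```python
from datetime import datetime

# Hypothetical timestamps for one item moving through a content pipeline.
stages = {
    "created":   datetime(2010, 6, 10, 9, 0, 0),
    "acquired":  datetime(2010, 6, 10, 9, 0, 12),
    "processed": datetime(2010, 6, 10, 9, 1, 30),
    "available": datetime(2010, 6, 10, 9, 4, 5),
}

names = list(stages)
for earlier, later in zip(names, names[1:]):
    gap = stages[later] - stages[earlier]
    print(f"{earlier} -> {later}: {gap.total_seconds():.0f} s")

end_to_end = stages["available"] - stages["created"]
print(f"end-to-end freshness lag: {end_to_end.total_seconds():.0f} s")
```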

For a temporary pick-me-up, guzzle the caffeine-laced beverages from the 7-11. I might just recommend that you turn to http://search.twitter.com and look for a tip on where to buy a Jolt at a discount. Just my opinion.

Stephen E Arnold, June 10, 2010

A freebie. No coupons for a complimentary can of Jolt.

Spindex Revealed

June 2, 2010

Microsoft’s Spindex is likely to add some useful functions to social media access. “Microsoft launches its Impossible Project: Spindex” provides a good description of smart software performing “personal indexing.” The idea is that a user’s social information allows Microsoft software to filter information. Only information pertinent to the user is available. When the source is a stream of Twitter messages, the Spindex system converts the noise in tweets to information related to a user’s interests. For me, the most interesting passage in the write up was:

Spindex is a way of surfacing the most shared or popular items that come through your personal news feeds on social networks like Twitter. Microsoft’s project is part of a wave of similar projects like The Cadmus, Feedera and Knowmore that try and synthesize trends and news streams from personal social networks. “Most people don’t really care about what’s trending on Twitter. They care about what’s trending in your own personal index. They want something that’s private, but that you can possibly make public and share with friends,” said Lili Cheng, who is general manager of the lab.
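
The mechanics behind “what’s trending in your own personal index” are not exotic: count what the accounts in your own feeds are sharing and rank it. A toy sketch of that idea (invented data, not Microsoft’s implementation):

```python
from collections import Counter

# Items shared by accounts in one user's own feeds (hypothetical sample).
feed_items = [
    ("@colleague", "http://example.com/sharepoint-tips"),
    ("@friend",    "http://example.com/sharepoint-tips"),
    ("@vendor",    "http://example.com/product-launch"),
    ("@colleague", "http://example.com/search-engines"),
    ("@friend",    "http://example.com/sharepoint-tips"),
]

def personal_trends(items, top_n=3):
    """Rank links by how many times people in *your* feeds shared them."""
    counts = Counter(url for _sender, url in items)
    return counts.most_common(top_n)

print(personal_trends(feed_items))
# [('http://example.com/sharepoint-tips', 3), ...]
```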

Objectively, the service appears to be useful. Subjectively, Microsoft will have to make certain that privacy-centric users feel comfortable with the system.

Stephen E Arnold, June 2, 2010

Freebie

Apps Versus Browsers for Content

June 1, 2010

Fred Wilson’s A VC post “I Prefer Safari to Content Apps On The iPad” triggered some thoughts about search and findability. The main point of the write up is that some content is better when consumed through a browser. The write up identifies a number of reasons, including:

  • Content as images
  • Link issues
  • A page at a time.

There are other reasons and you will want to read them in the original document.

I agree with most of these points, but there is a larger and I think more significant issue standing out of the spotlight. Those who create content as Apps may be making it difficult for a person looking for information to “find” the content. With the surge of interest in charging for “real” journalism or “real” essays, will search engines be able to index the content locked in Apps? The easy answer is, “Sure, you silly goose.”

But what if the publishers balk at playing ball with a Web indexing company? The outfit could be big and threatening like you-know-who in Mountain View or small and just getting its feet wet like Duck Duck Go.

Locked up content creates problems for researchers and restarts the cycle of having to have a bunch of accounts or waiting until an appropriate meta-index becomes available.

Stephen E Arnold, June 1, 2010

Freebie

Property Mappings or Why Microsoft Enterprise Search Is a Consultants’ Treasure Chest

May 31, 2010

First, navigate to “Creating Enterprise Search Metadata Property Mappings with PowerShell.” Notice that you may have difficulty reading the story because the Microsoft ad’s close button auto positions itself so you can’t get rid of the ad. Pretty annoying on some netbooks, including my Toshiba NB305.

Second, the author of the article is annoyed, yet he apparently finds his workaround spot on and somehow germane to open source search. Frankly, I don’t get the link between manual scripting to perform a common function and open source search. Well, that’s what comes from getting old and becoming less tolerant of stuff that simply does not work unless there is a generous amount of time to fix a commercial product.

What’s broken? Here’s the problem:

One of the things that drove me absolutely nuts about Enterprise Search in MOSS 2007 was that there was no built-in way to export your managed property mappings and install them on a new server.  A third party utility on CodePlex helped, but it was still less than ideal.  With SharePoint 2010, well you still really can’t export your property mappings to a file, but you do get a lot of flexibility using PowerShell.

And the fix?

You use the baker’s dozen lines of code in the write up, substitute your own variable names, and presto, you can get access to that hard-won metadata. Here’s the author’s key point:

It seems like a lot but it really isn’t.  I create two managed properties (TestProperty1 and TestProperty2).  In the case of TestProperty2, I actually map two crawled properties to it.

In my opinion, this type of manual solution is great for those with time to burn and money to pay advisors. Flip the problem. Why aren’t basic functions included in Microsoft’s enterprise search solutions? Oh, and what about that short cut for reindexing? Bet that works like a champ for some users. Little wonder that third party search solutions for SharePoint are thriving. And the open source angle? Beats me.
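
For context, the “export to a file” capability the author misses is conceptually simple: serialize the managed-property-to-crawled-property mappings, then replay them on the target server. Here is a hypothetical sketch of the idea in Python with an invented data shape; it is not SharePoint’s object model or the PowerShell cmdlets the article describes:

```python
import json

# Hypothetical representation of managed property mappings (the names echo
# the quoted example; the structure is invented for illustration only).
mappings = {
    "TestProperty1": ["ows_CrawledPropA"],
    "TestProperty2": ["ows_CrawledPropB", "ows_CrawledPropC"],
}

# "Export": dump the mappings to a file on the source server.
with open("property_mappings.json", "w") as fh:
    json.dump(mappings, fh, indent=2)

# "Import": read the file on the target server and re-create each mapping.
with open("property_mappings.json") as fh:
    for managed, crawled_props in json.load(fh).items():
        for crawled in crawled_props:
            # A real script would call the search administration API here
            # (e.g., the PowerShell cmdlets the article describes).
            print(f"map crawled property {crawled} -> managed property {managed}")
```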

Stephen E Arnold, May 31, 2010

Freebie

DataparkSearch, Free Full-Featured Web Search Engine

May 24, 2010

Newslookup.com is quite the feat of news-search engineering. It is the first search engine to arrange search results by media type (television, radio, Internet, etc.) and category, display separate document parts, and effectively use metadata to crawl the Internet to provide a “snapshot look of news websites throughout the world.” The site is powered by a free, open source search system called DataparkSearch, whose origins go all the way back to 1998 and Russian programmer Maxim Zakharov.

Now in version 4, DataparkSearch boasts an impressive set of features, including indexing of all (x)html file types as well as MP3 and GIF files; support for http(s) and ftp URL schemes; vast language support; authentication and cookie support with session IDs in URLs; and a wide array of sorting, categorizing, and relevancy models to return specific results quickly. All of this runs on top of various SQL database back ends, accessed natively or through ODBC.

Sochi’s Internet, a portal and search engine for the Russian city hosting the 2014 Winter Olympics, uses the DataparkSearch engine to deliver hotel, job, and real estate data for the city and surrounding area. The CGI front end seen on the site presents the data collected by the “indexer,” described as a mechanism that “walks over hypertext references and stores found words and new references into the database.” The same mechanism allows for “fuzzy search,” correcting for spelling errors and different word forms.
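
That “indexer” description is the classic crawl-and-index loop: fetch a page, store its words, queue the links it contains, repeat. A bare-bones sketch of the mechanism using only the Python standard library (not DataparkSearch’s actual code, which is written in C):

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkAndTextParser(HTMLParser):
    """Collect href targets and visible text from one HTML page."""
    def __init__(self):
        super().__init__()
        self.links, self.words = [], []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href" and v]
    def handle_data(self, data):
        self.words += data.lower().split()

def crawl(seed, max_pages=5):
    """Walk hypertext references, storing found words and new references."""
    index, queue, seen = {}, deque([seed]), {seed}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
        except OSError:
            continue
        parser = LinkAndTextParser()
        parser.feed(html)
        index[url] = set(parser.words)             # "stores found words"
        for link in parser.links:                  # "and new references"
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index

# index = crawl("http://www.example.com/")
```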

DataparkSearch is available through its own Web site or via Google Code, where it has quite a busy activity log. Coded in C, the software is supported on a plethora of UNIX-like operating systems including FreeBSD and Red Hat. Frequency dictionaries, synonym lists, and other helpful files can be found in multiple languages on the Web site as well. Support for the search engine is available through the project’s wiki, forum, and Google Group.

Samuel Hartman, May 20, 2010

Freebie.

Exalead and Dassault Tie Up, Users Benefit

May 24, 2010

A happy quack to the reader who alerted us to another win by Exalead.

Dassault Systèmes (DS) (Euronext Paris: #13065, DSY.PA), one of the world leaders in 3D and Product Lifecycle Management (PLM) solutions, announced an OEM agreement with Exalead, a global software provider in the enterprise and Web search market. As a result of this partnership, Dassault will deliver discovery and advanced PLM enterprise search capabilities within the Dassault ENOVIA V6 solutions.

The Exalead CloudView OEM edition is dedicated to ISVs and integrators who want to differentiate their solutions with high-performing and highly scalable embedded search capabilities. Built on an open, modular architecture, Exalead CloudView uses minimal hardware but provides high scalability, which helps reduce overall costs. Additionally, Exalead’s CloudView uses advanced semantic technologies to analyze, categorize, enhance and align data automatically. Users benefit from more accurate, precise and relevant search results.

This partnership with Exalead demonstrates the unique capabilities of ENOVIA’s V6 PLM solutions to serve as an open federation, indexing and data warehouse platform for process and user data, for customers across multiple industries. Dassault Systèmes PLM users will benefit from its Exalead-empowered ENOVIA V6 solutions to handle large data volumes thus enabling PLM enterprise data to be easily discovered, indexed and instantaneously available for real-time search and intelligent navigation. Non-experts will have the opportunity to access PLM know-how and knowledge with the simplicity and the performance of the Web in scalable online collaborative environments. Moreover, PLM creators and collaborators will be able to instantly find IP from any generic, business, product and social content and turn it into actionable intelligence.

Stephen E Arnold, May 22, 2010

Freebie.

Social Networks, Testosterone, and Facebook

May 13, 2010

In my Information Today column, which will run in the next hard copy issue, I talk about the advantage social networks have in identifying sites members perceive as useful. Examples are Delicious.com (owned by Yahoo) and StumbleUpon.com (once owned by eBay and now back in private hands).

The idea is based in economics. Indexing the entire Web and then keeping up with changes is very expensive. With most queries answered by indexing a subset of the total Web universe, only a handful of organizations can tackle this problem. If I put on my gloom hat, the list of companies indexing as many Web pages as possible shrinks to one: Google. If I put on my happy hat, I can name a couple of other outfits. One implication is that Google may find itself spending lots of money to index content while its search traffic starts to go to Facebook. Yikes. Crisis time in Mountain View?


It costs a lot more when the lone person or company has to figure everything out alone than when many people identify the important sites. Image source: http://lensaunders.com/habit/img/peerpressuresmall.jpg

The idea is that when members recommend a Web site as useful, the company receiving that URL can index the site’s content. Over time, a body of indexed content becomes useful. I routinely run specialized queries on Delicious.com and StumbleUpon.com, among others. I don’t run these queries on Google because the results lists require too much work to process. One nagging problem is Google’s failure to make it possible to sort results by time. I can get a better “time sense” from other systems.
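
In other words, member recommendations become the crawl frontier: the engine only indexes what real people have vouched for. A toy sketch of that selective approach (invented data; not how Delicious.com or StumbleUpon.com work internally):

```python
from collections import Counter

# Hypothetical bookmark events: (member, recommended URL).
recommendations = [
    ("alice", "http://example.org/search-tutorial"),
    ("bob",   "http://example.org/search-tutorial"),
    ("carol", "http://example.net/obscure-page"),
    ("dave",  "http://example.org/search-tutorial"),
]

def build_crawl_queue(events, min_votes=2):
    """Only sites enough members vouch for earn a place in the index."""
    votes = Counter(url for _member, url in events)
    return [url for url, n in votes.most_common() if n >= min_votes]

for url in build_crawl_queue(recommendations):
    print("worth indexing:", url)   # the social signal prunes the Web for free
```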

When I read “The Big Game, Zuckerberg and Overplaying your Hand”, I interpreted these observations in the context of the information cost advantage. The write up makes the point via some interesting rhetorical touches that Facebook is off the reservation. The idea is that Facebook’s managers are seizing opportunities and creating some real problems for themselves and other companies. The round up of urls in the article is worth reviewing, and I will leave that work to you.

First, it is clear that social networks are traffic magnets because users see benefits. In fact, despite Facebook’s actions and the backlash about privacy, the Facebook system keeps on chugging along. In a sense, Facebook is operating like the captain of an ice breaker in the arctic. Rev the engines and blast forward. Hit a penguin? Well, that’s what happens when a big ship meets a penguin. If – note, the “if” – the Facebook user community continues to grow, the behavior of the firm’s management will be encouraged. This means more ice breaker actions. In a sense, this is how Google, Microsoft, and Yahoo either operate now or operated in their youth. The motto is, “It is better to beg for forgiveness than ask for permission.”

