Google Annoys Some in the UK

April 30, 2011

The BBC reported that Google is defending a new algorithm in its search system that has relegated some very popular Web sites to the bottom of the pile in search results. In “Google Denies Panda Hit on Rival,” Google calls allegations that it rigged results “almost absurd.” Funny that it didn’t deem the allegations “outright absurd.”

Google claims it is only trying to weed out content farms, Web sites that copy content from other sites to gain prominence. When Panda was introduced in mid-April, it was supposed to reduce the rankings of low-quality Web sites. Ciao, a shopping site owned by Microsoft, was one of the hardest hit by the new algorithm. While it is not unusual for shopping sites to drop in ranking because of their repetitive reviews and comments, the 94 percent drop Ciao experienced was undeniably harsh.

I’d have to agree that it’s not outrageous to think Google was perhaps trying to dole out a little payback after Ciao filed suit and initiated an investigation against the conglomerate in late 2010. Though it would certainly be hard to rig the results because of the sheer number of sites and searches, it’s probably not impossible, considering that algorithms that let a computer learn from past mistakes are already in the works. Still, I can’t figure out why some of these companies are complaining: their business didn’t see an immediate impact, because the search terms eliminated weren’t being clicked anyway.

Leslie Radcliff, April 30, 2011


Watson and Its Methods

April 30, 2011

According to a Fast Company article by Ariel Schwartz, IBM is partnering with Caltrans and the University of California at Berkeley to create a “personalized commuter forecast” for individuals living in large cities and high-traffic areas. This p.c.f. will be dubbed “Watson” because everyone needs a trusty sidekick.

Schwartz’s article “IBM Will Go All Watson On Your Commute, Keep You Out Of Traffic” explains that the program, which is still in the prototype stage, will use the GPS on your phone to analyze traffic on your daily route and suggest the route that will get you to work the fastest. (According to IBM, you’re still S.O.L. if you live in an area with few or no alternate routes. Go figure.)

Instead of slogging through the traffic, your phone recommends that you drive halfway to work, park in the BART parking lot, and take the subway system the rest of the way. If you leave now, you’ll make your way through traffic just in time to catch the next train to work.

I feel like that’s a little too good to be true, though IBM’s willingness to use existing technologies, such as the road sensors used by Berkeley and Caltrans, is admirable (no wonder they’re partners).
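The article does not say how the recommendation is computed. As a rough illustration only, the comparison in the BART scenario can be sketched in Python, assuming the system already has predicted drive times and a train timetable. The `Route` class, the station name, and every number below are hypothetical, not IBM’s design.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Route:
    """A candidate commute (hypothetical model, not IBM's).

    All times are in minutes; train departures are measured from "now".
    """
    name: str
    drive_minutes: float                  # predicted driving time under current traffic
    station: Optional[str] = None         # park-and-ride station, if transit is used
    walk_to_platform: float = 0.0         # parking lot to platform
    train_ride: float = 0.0               # time on the train
    train_departures: Tuple[float, ...] = ()


def arrival_time(route: Route) -> Optional[float]:
    """Minutes until arrival at work, or None if no listed train can be caught."""
    if route.station is None:
        return route.drive_minutes        # drive the whole way
    at_platform = route.drive_minutes + route.walk_to_platform
    for departure in sorted(route.train_departures):
        if departure >= at_platform:      # first train we can still catch
            return departure + route.train_ride
    return None


def recommend(routes: Sequence[Route]) -> Tuple[float, Route]:
    """Return (minutes, route) for the fastest feasible route."""
    feasible = [(t, r) for r in routes if (t := arrival_time(r)) is not None]
    if not feasible:
        raise ValueError("no feasible route")
    return min(feasible, key=lambda pair: pair[0])


# Placeholder data echoing the "drive halfway, then take BART" scenario.
example_routes = [
    Route("Drive all the way", drive_minutes=55),
    Route("Drive + BART", drive_minutes=20, station="Hypothetical Station",
          walk_to_platform=5, train_ride=18, train_departures=(22, 26, 37)),
]

if __name__ == "__main__":
    minutes, best = recommend(example_routes)
    print(f"{best.name}: {minutes} minutes")  # Drive + BART: 44 minutes
```

In this made-up example the driver reaches the platform at minute 25, misses the 22-minute train, and just catches the 26-minute one. A real system would rerun the comparison as traffic and timetables change; the sketch captures only the "earliest arrival wins" step.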

Let’s face it: it all boils down to money. IBM is hoping to generate cash from sales to different transportation entities, from merchants who would build along newly used transit systems, and from advertising, in exchange for IBM knowing your every location. IBM is not the government. Some people may take the position, “What’s one more conglomerate tracking user behavior?”

Leslie Radcliff, April 30, 2011


TNR Global and Its Take on Enterprise Search

April 30, 2011

I was poking around with my Overflight system and came across four “white papers” about enterprise search. These were produced by TNR Global, a firm which offers scalable Web and search solutions. The company asserts:

TNR Global (TNR) is a systems design and integration company focused on enterprise search and cloud computing solutions. We develop scalable web-based search solutions built on the open source LAMP stack. We have over 10 years of hands-on experience in web systems and enterprise search implementations, both proprietary and open source. We specialize in FAST ESP and Lucene Solr search applications.

The company says that it has three specialties; namely:

  • Amazon Elastic Compute Cloud (EC2) platform for deploying applications into the cloud
  • FAST Enterprise Search Platform (ESP) and Lucene Solr implementations for data intensive web sites
  • Web based system administration and application management

The “white papers” are available online. The topics covered are enterprise search basics, government, law firms, and life sciences.

I took a quick look and found the approach interesting. In the basics white paper, TNR explains that Web search is not enterprise search. To give some substance to the definition of enterprise search, TNR identifies four ways to apply it: eCommerce, market management, online media and publishing, and risk management and eDiscovery. The Enterprise Search and Government “white paper” is one page long. The Enterprise Search for Law Firms white paper explains eDiscovery and identifies such functions as faceted navigation, role-based search, clustering, relevance ranking, etc. The Life Sciences white paper also draws on eDiscovery, referencing the rules for legal procedures.

The most recent news on the firm’s Web site is dated November 2010. The company has a Web log, “Enterprise Search, System Administration, and Cloud Computing.”

In my forthcoming landscape of search book, I list some of the resellers and integrators known to be working with the search systems I profile. I will try to capture basic information about other niche search consultants as I come across it.

Stephen E Arnold, April 30, 2011


Libraries Like the Snow Leopard May Be Endangered

April 29, 2011

We should have known this day would come. Peter Cochrane blogs the question: is it “Time Libraries Were Shelved?” He asserts:

“Does it matter anyway? The debate goes on but I must admit that I cannot remember the last time I visited a physical library. I give away far more books than I read.”


His questions were prompted by cuts to public libraries in the U.K. That story is already in progress here in the U.S. Are we about to become an illiterate society?

Budget woes pushed the trend, of course, but perhaps it was inevitable. Many feel that books are simply an outdated technology. I see their point, but at the risk of sounding outdated myself, there’s just no substitute for a real book in my real hands.

Sure, I can curl up in my comfy chair with an eReader, but it’s just not the same. I enjoy the different weights of different books, the feel of turning a real page, even the smell of ink and paper. And those sensations are part of what enticed me to become a reader in the first place! I can’t be the only one.

Besides, without libraries, how will folks get free access to knowledge? Ben Franklin would be very disappointed. Online resources are useful, but they do not answer *every* question a researcher may have.

Cynthia Murrell, April 29, 2011


dtSearch: Unusual Marketing Angle

April 29, 2011

Though search sites such as Google provide users with a wealth of information, it is important to pay careful attention to search results. According to the Artikel Indonesia Kirim Tulis Submit article “DtSearch Desktop 7.54.7670 OEM Low Cost,” the dtSearch product line is available for download for $15. The Web site said:

“The dtSearch product line can instantly search terabytes of text across a desktop, network, Internet or Intranet site. dtSearch products also serve as tools for publishing, with instant text searching, large document collections to Web sites or CD/DVDs.”

The site claims that users can instantly buy and download the program. However, dtSearch has its own product Web site, which gives a detailed account of each product and offers customers a free trial. A Google search for the product is likely to return both sites, but the $15 price seems questionable and may confuse some users, which is one of the problems associated with Google searches. In the Internet world, some things are too good to be true.

April Holmes, April 29, 2011


Dieselpoint: Described in a Fuzzy Manner

April 29, 2011

A quote from the MartinButler Research “fact with opinion” piece “Dieselpoint” states:

“Dieselpoint is something of a Porsche in the Enterprise Search space. It is very fast, well-engineered, doesn’t carry much excess weight, and its text based searching technology can be made to satisfy almost any search requirement.”

Though the Porsche reference is a somewhat unconventional comparison, to most it suggests that this company deserves a closer look. At first glance the Dieselpoint Web site seems routine, but on closer inspection one can’t help noticing that it lists no current information or events from the last several years, even though the company claims to be a leader in its field. This article says some great things about Dieselpoint, but it ultimately leaves more questions than answers, such as “What type of system does Dieselpoint offer?” and “What moderately priced options does it offer?” With more questions than answers, this “Porsche” may be parked on the shoulder of the information superhighway.

Check out our Overflight profile of Dieselpoint. Quiet seems it.

Stephen E Arnold, April 29, 2011


Access Innovations Merges Data Harmony and Microsoft SharePoint 2010

April 29, 2011

According to the article “Access Innovation Integrates Data Harmony with Microsoft SharePoint 2010,” Access Innovations hopes that integrating its Data Harmony suite with Microsoft SharePoint 2010 will give clients even more valuable options. The Data Harmony suite provides users with a content-rich thesaurus and management tools to help them organize their information resources. “Data Harmony can be used to provide semantic capabilities to SharePoint to help users take full advantage of their metadata through auto classification, enterprise taxonomy management, entity extraction and enhanced search.”

The new MAIstro program offers users a new level of automation services. The software automatically indexes any SharePoint content using a combination of taxonomy and thesaurus database tools. According to the article, the indexing results “can be more than 90 percent accurate.” Users can search a specific subject and even find additional information through related terms. It sounds like the Data Harmony and SharePoint integration could be the beginning of a beautiful relationship.
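The article does not spell out how MAIstro’s indexing rules work. The general idea of thesaurus-driven indexing, where a preferred term is assigned when it or one of its synonyms appears in a document and related terms widen a search, can be sketched in Python as follows. The thesaurus entries and the `index_document` and `expand_query` helpers are hypothetical and are not Data Harmony’s API.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical thesaurus: preferred term -> synonyms (non-preferred terms)
# and related terms. Real vocabularies are far larger.
THESAURUS: Dict[str, Dict[str, List[str]]] = {
    "enterprise search": {
        "synonyms": ["behind-the-firewall search", "intranet search"],
        "related": ["taxonomy management", "faceted navigation"],
    },
    "taxonomy management": {
        "synonyms": ["controlled vocabulary management"],
        "related": ["enterprise search", "metadata"],
    },
    "metadata": {
        "synonyms": ["document properties"],
        "related": ["taxonomy management"],
    },
}


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so phrase matching is simpler."""
    return re.sub(r"\s+", " ", text.lower())


def index_document(text: str, thesaurus=THESAURUS, min_hits: int = 1) -> List[str]:
    """Assign every preferred term whose label or synonyms occur in the text."""
    body = _normalize(text)
    hits: Counter = Counter()
    for preferred, entry in thesaurus.items():
        for variant in [preferred, *entry["synonyms"]]:
            pattern = r"\b" + re.escape(variant) + r"\b"  # whole-phrase match
            hits[preferred] += len(re.findall(pattern, body))
    return sorted(term for term, count in hits.items() if count >= min_hits)


def expand_query(term: str, thesaurus=THESAURUS) -> List[str]:
    """Return the term plus its related terms, for 'see also' style searching."""
    entry = thesaurus.get(term, {"related": []})
    return [term, *entry["related"]]


if __name__ == "__main__":
    doc = "Our intranet search uses document properties and a taxonomy."
    print(index_document(doc))  # ['enterprise search', 'metadata']
```

Mapping synonyms to one preferred term is what lets a search for “enterprise search” find a document that only says “intranet search”; a production system would add weighting, context rules, and human review on top of simple phrase matching.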

April Holmes, April 29, 2011

Freebie but I have been promised a Mexican burrito

Protected: SharePoint Generates Search Suggestions

April 29, 2011


Nuxeo and the Google Search Appliance

April 28, 2011

I saw a brief news item about the integration of the Nuxeo open source content management system with the Google Search Appliance. Nuxeo already hooks into Lotus Notes and a number of other enterprise applications. The cheery “Great News…Nuxeo Integration with Google Search Appliance” points out:

Nuxeo’s recently announced Google Search Appliance (GSA) connector is an important component for any enterprise indexing and search strategy. Nuxeo content is actively indexed and can be searched using the familiar Google search page. Of course, to access Nuxeo content you still [need] to login and you must have appropriate rights. And because the Nuxeo connector is open source, it can always be customized to meet your specific requirements!

My reaction to this announcement was a question about the cost of scaling a GSA search solution. I covered some of Google’s publicly posted pricing data for its GB-7007 and GB-9009 devices in an article that appeared in ETM. (This was a for-fee column, so you will have to chase down the hard copy of the publication.) In that column, I made a couple of comments about the cost of the GSA, particularly when an organization has to upgrade to handle tens of millions of documents.

My reaction is that organizations considering the GSA will want to confirm their document count and then get written price quotations for the appropriate GSA AND for the cost of scaling that Google Search Appliance as the volume of content increases.

The savings from an open source CMS could be consumed by a GSA upgrade unless the licensee does his or her homework.
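To make that homework concrete, here is a rough Python sketch that projects document growth and flags the year a larger appliance tier would be needed. The tier names, capacities, prices, and growth rate are placeholders, not Google’s published figures; substitute the numbers from your own written quotations. The sketch also assumes each upgrade is a new purchase with no trade-in credit.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    max_documents: int   # capacity stated in the quotation
    price: float         # quoted price (placeholder values below)


def projected_counts(start_docs: int, annual_growth: float, years: int) -> List[int]:
    """Document count at the start of each year, assuming steady compound growth."""
    return [round(start_docs * (1 + annual_growth) ** y) for y in range(years)]


def cheapest_tier(tiers: Sequence[Tier], docs: int) -> Tier:
    """Least expensive quoted tier that can hold `docs` documents."""
    fitting = [t for t in tiers if t.max_documents >= docs]
    if not fitting:
        raise ValueError(f"no quoted tier handles {docs:,} documents")
    return min(fitting, key=lambda t: t.price)


def upgrade_path(tiers: Sequence[Tier], start_docs: int, annual_growth: float,
                 years: int) -> List[Tuple[int, Tier]]:
    """(year, tier) for the initial purchase and each forced upgrade."""
    path: List[Tuple[int, Tier]] = []
    current: Optional[Tier] = None
    for year, docs in enumerate(projected_counts(start_docs, annual_growth, years)):
        if current is not None and current.max_documents >= docs:
            continue                      # the current appliance still fits
        current = cheapest_tier(tiers, docs)
        path.append((year, current))
    return path


# Placeholder tiers only; replace with figures from written quotations.
example_tiers = [
    Tier("Small", 1_000_000, 30_000),
    Tier("Medium", 10_000_000, 100_000),
    Tier("Large", 50_000_000, 400_000),
]

if __name__ == "__main__":
    path = upgrade_path(example_tiers, start_docs=800_000, annual_growth=1.0, years=5)
    for year, tier in path:
        print(year, tier.name)            # 0 Small / 1 Medium / 4 Large
    total = sum(tier.price for _, tier in path)
    print(f"Total: {total:,}")            # Total: 530,000
```

With content doubling each year in this made-up scenario, the starting appliance is outgrown in year one and its replacement in year four, which is the kind of surprise a written quotation for scaling is meant to surface.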

Stephen E Arnold, April 28, 2011

Freebie unlike the GSA

Amazon: Insight into Search, Engineering, and Cloud Computing

April 28, 2011

In order to locate data, one must be able to search for it. If search does not work, data are lost. It seems obvious, but one of the consequences of the Amazon cloud outage was that I had to think about the online big box store again. To me, Amazon is a convenient way to get books and buy a gift or a replacement BlackBerry battery. Even when the A9 service was a priority, Amazon’s ability to make information findable was hit and miss.

Even today, I have a tough time thinking of Amazon as a giant, reliable, low cost information utility. I have difficulty finding lists of books “about” a subject. Sometimes I stumble upon this user-created content; other times, I have no idea how to find this useful information. When I want a book, I don’t know how to use a Boolean NOT to separate books that are available from those that will be published in the future. I cannot find information about the credits I “earn” when I buy Kindle books or products using my Amazon credit card. The snail mail coupons I used to get have disappeared, and I don’t have a clue about “finding” this information.

Several years ago, we took a close look at how Amazon handled glitches. What we found was not that different from what we had seen at other companies. However, one approach was interesting. When an outage took place, a small team was assembled to figure out what happened and to fix it. This approach has its upside, such as speed and fluid problem solving. The downside, in my opinion, was that solutions could be ad hoc. In my view, the approach I probed three years ago meant that the next time a problem cropped up, the new problem-solving team had to figure out what the previous team had done. That is no big deal until figuring everything out consumes lots of time.

We are not using Amazon Web Services. Call me old-fashioned, but I prefer to have data stored on local devices with appropriate backups on media in an off-site location.

For another, unrelated project, we ran a series of tests in 2010 on the uptake of the phrase “cloud computing.” What we learned was that the actual traffic generated by the phrase “cloud computing” was far less than our client anticipated.

After a six-month test, we concluded:

  • There was a large amount of information about cloud computing from a bewildering range of vendors big and small
  • The interest in cloud computing was less than in some other words and bound phrases we tested
  • The information about cloud computing was a cloud of semantic fuzziness; that is, it was difficult to pin down specifics within the documents written about cloud computing.

What happens when you combine a retail store with a cloud computing service? You get an anchor point. Amazon becomes associated with certain words and phrases, but these may not have much meaning. Examples include acronyms such as S3 and EC2.

What happens when a company which has associated itself with this difficult to define subject has an outage? The problems of Amazon immediately diffuse across other products and services available in the cloud.

You can see an example of this semantic drift in “Amazon: Some Data Won’t Be Recovered after Cloud Outage.” The article points out that the Amazon “outage” has resulted in data that “won’t be recovered.” The problem is now one that Amazon and its customers must resolve.

Amazon’s close association with cloud computing has made the Amazon incident the defining case for the risks of cloud computing. Even worse, unrecoverable data cannot be found. Search and retrieval does little good if the data no longer exist. Services which depend on their customers locating information are effectively stranded. Those affected include “Quora, Sencha, Reddit, and FourSquare.”

So what?

This problem at Amazon provides some insight into the firm’s engineering approach. In a larger arena, the close association of Amazon with cloud computing has had a somewhat negative impact on the concept of cloud computing. To sum up:

  • You can’t find information if it is not “there”
  • Amazon’s engineering methods are interesting and may give some companies some additional analysis to perform
  • The impact of the outage has created some pushback for other cloud computing vendors.

Will this be a defining moment for Amazon? Probably not, but it is an interesting moment. Non-recoverable is a disturbing notion to those who have to find a fact, entity, or a concept. Amazon has figured out some aspects of eCommerce. Other areas warrant additional investment, which may be why Amazon’s costs are skyrocketing.

Stephen E Arnold, April 28, 2011

