February 13, 2011
So Google can be fooled. It’s not nice to fool Mother Google. The converse, however, does not hold: Mother Google can take some liberties. Any indexing system can. Objectivity is in the eye of the beholder, or of the person who pays for the results.
Judging from the torrent of posts from “experts,” the big guns of search are saying, “We told you so.” The trigger for this outburst of criticism is the New York Times’s write-up about JC Penney. You can try this link, but I expect that it and its SEO-crunchy headline will go dark shortly. (Yep, the NYT is in the SEO game too.)
I am not sure how many years ago I wrote the “search sucks” article for Searcher Magazine. My position was clear long before the JC Penney affair and the slowly growing awareness that search is anything BUT objective.
In the good old days, database bias was set forth in the editorial policies for online files. You could disagree with what we selected for ABI/INFORM, but we made an effort to explain what we selected, why we selected certain items for the file, and how the decision affected assignment of index terms and classification codes. The point was that we were explaining the mechanism for making a database which we hoped would be useful. We were successful, and we tried to avoid the silliness of claiming comprehensive coverage. We had an editorial policy, and we shaped our work to that policy. Most people in 1980 did not know much about online. I am willing to risk this statement: I don’t think too many people in 2011 know about online and Web indexing. In the absence of knowledge, some remarkable actions occur.
You don’t know what you don’t know or the unknown unknowns. Source: http://dealbreaker.com/donald-rumsfeld/
Flash forward to the Web. Most users incorrectly assume that a search engine is objective. Baloney. Just as we set an editorial policy for ABI/INFORM, each crawler and content processing system has similar decisions beneath it.
The difference is that at ABI/INFORM we explained our bias. The modern Web and enterprise search engines don’t. If a system tries to explain what it does, most of the failed Web masters, English majors working as consultants, and unemployed lawyers turned search experts just don’t care.
Search and content processing are complicated businesses, and the gory details of certain issues hold zero interest for most professionals. Here’s a quick list of “decisions” that must be made for a basic search engine:
- How deep will we crawl? Most engines set a limit. No one, not even Google, has the time or money to follow every link.
- How frequently will we update? Most search engines have to allocate resources in order to get a reasonable index refresh. Sites that get zero traffic don’t get updated too often. Sites that are sprawling and deep may get three or four levels of indexing. The rest? Forget it.
- What will we index? Most people perceive the various Web search systems as indexing the entire Web. Baloney. Bing.com makes decisions about what to index and when, and I find that it favors certain verticals and trendy topics. Google does a bit better, but there are bluebirds, canaries, and sparrows. Bluebirds get indexed thoroughly and frequently. See Google News for an example. For Google’s Uncle Sam, a different schedule applies. In between, there are lots of sites and lots of factors at play, not the least of which is money.
- What is on the stop list? Yep, a list can kill index pointers, making the site invisible.
- When will we revisit a site with slow response time?
- What actions do we take when a site is owned by a key stakeholder?
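These decisions amount to a policy table the crawler consults before every fetch. Here is a minimal sketch of how such a policy might look; the site names, tiers, and thresholds are my own invented illustrations, not any engine’s actual rules:

```python
# Sketch of the crawl-policy decisions a basic search engine must make.
# Site names and thresholds below are hypothetical illustrations.
from dataclasses import dataclass

@dataclass
class CrawlPolicy:
    max_depth: int        # how deep we follow links
    revisit_days: int     # how often we come back
    stop_listed: bool     # on the stop list -> never indexed

# "Bluebird" sites get deep, frequent crawls; "sparrows" get scraps.
POLICIES = {
    "news.example.com":   CrawlPolicy(max_depth=10, revisit_days=1,  stop_listed=False),
    "static.example.org": CrawlPolicy(max_depth=3,  revisit_days=30, stop_listed=False),
    "spam.example.net":   CrawlPolicy(max_depth=0,  revisit_days=0,  stop_listed=True),
}

def should_fetch(site: str, depth: int, days_since_visit: int) -> bool:
    policy = POLICIES.get(site, CrawlPolicy(max_depth=2, revisit_days=60, stop_listed=False))
    if policy.stop_listed:
        return False          # the stop list kills index pointers
    if depth > policy.max_depth:
        return False          # no one, not even Google, follows every link
    return days_since_visit >= policy.revisit_days
```

The point of the sketch: the “objectivity” of the index is set before the first page is fetched, in exactly these kinds of tables.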
February 7, 2011
We learned from one of our readers that Kartoo has turned out its lights. According to Wikipedia, the company shut down after a nine-year run. Kartoo relied on Flash to display search results. Novel? Yes. Useful? For some types of queries, yes.
If you are interested in visual search, you can check out Yometa.com. This is a federating search system which taps results from Bing, Google, and Yahoo. A query for “Stephen E Arnold” returned this display.
Yometa displays the most relevant search results by combining the three engines’ rankings with its own algorithm.
The company developed its approach from research showing that 97 percent of the results returned by the three search engines (Google, Yahoo, and Bing) differ; only three percent overlap. The visual interface lets users see the results of Google, Bing, and Yahoo individually and in various combinations on a single screen. The results are displayed as a Venn diagram: the closer a result sits to the middle, the more relevant it is.
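The overlap claim and the Venn-style display reduce to simple set arithmetic over each engine’s result list. A sketch of the idea, with invented URLs standing in for real results (this is my illustration, not Yometa’s algorithm):

```python
# Sketch of Venn-style result grouping: a result returned by more
# engines sits closer to the middle of the diagram. URLs are invented.
google = {"a.com", "b.com", "c.com", "d.com"}
yahoo  = {"b.com", "c.com", "e.com", "f.com"}
bing   = {"c.com", "g.com", "h.com", "b.com"}

def venn_rank(url: str) -> int:
    """Count how many engines returned the URL; 3 = center of the diagram."""
    return sum(url in engine for engine in (google, yahoo, bing))

all_urls = google | yahoo | bing
# Sort toward the middle: the most-agreed-upon results come first.
ranked = sorted(all_urls, key=venn_rank, reverse=True)

overlap = google & yahoo & bing                 # the three-way intersection
overlap_pct = 100 * len(overlap) / len(all_urls)
```

With real top-ten lists the three-way intersection is tiny, which is the 97-percent-different finding in miniature.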
For more information navigate to www.yometa.com/about/ .
Stephen E Arnold, February 7, 2011
August 11, 2010
Yippy, Inc. has good reason to rejoice. In “Yippy Releases Family Friendly Search For Nintendo Wii” http://www.tmcnet.com/usubmit/2010/07/28/4925824.htm VP Emily Parker says “the Yippy Wii search has been optimized for use with Nintendo Wii game controls and features Yippy content-blocking protocols.” The report also tells of a soon-to-be-released Yippy Wii Browser with a cloud-based content management platform.
Let’s not get ahead of ourselves. A family friendly search was the focus of The Point (Top 5% of the Internet), developed by Beyond Search’s Stephen E. Arnold, his son, Erik S. Arnold, and business partner Chris Kitze in 1993. The Point service sold to Lycos in 1996, and, alas, Lycos lost its way. Now a 17-year-old idea is back, proving The Point was right on target almost two decades ago.
Brett Quinn, August 11, 2010
February 14, 2010
Abe Lederman (one of the founders of Verity) alerted me this morning that his company, Deep Web Technology, signed a deal and partnership agreement with SWETS. This Netherlands-based company is one of the world’s leading subscription services. SWETS helps government agencies and companies with subscriptions and related services. The firm has clients in over 160 countries and describes itself as “a long-talk powerhouse.”
Deep Web Technology provides the software and systems that fuel Science.gov, a US government search and retrieval project. Science.gov taps into a wide range of data and information related to science and technology. The invention of the Deep Web method was an outgrowth of Dr. Lederman’s experience in providing a user with access to a broad range of structured and unstructured data. In my various reports on enterprise and special purpose search, I have given Dr. Lederman’s method high marks, and I even let him buy me a taco in a restaurant in Santa Fe, after I finished a lecture at Los Alamos. Dr. Lederman contributed at Los Alamos prior to founding Deep Web as I recall.
The deal brings Dr. Lederman’s federation technology to the SwetsWise Searcher. This service will be powered by Deep Web Technology. SwetsWise is designed to help librarians and their users meet the challenge of searching and finding relevant results from the ever-increasing catalog of content available online. The search system simplifies access to an organization’s diverse and valuable resources, along with the open Web content users are accustomed to searching. SWETS will deliver search results through the Deep Web ranking engine, providing incremental results for fast response times, scalability, and flexibility. SwetsWise Searcher performs a rapid parallel search of all available sources, or selected sources, in real time, ensuring that information is fresh and that documents can be retrieved the minute they are published into a collection’s database. A simple search box covering all sources can be integrated into any Web page, blog, or intranet homepage.
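The “rapid parallel search with incremental results” pattern is straightforward to sketch: fan the query out to every source at once and hand results to the user as each source answers, rather than waiting for the slowest connector. The source names and canned results below are my own stand-ins, not Deep Web Technology’s actual connectors:

```python
# Sketch of parallel federated search with incremental result delivery,
# the pattern described for SwetsWise Searcher. Sources are simulated.
from concurrent.futures import ThreadPoolExecutor, as_completed

def search_source(source: str, query: str) -> list[str]:
    # Stand-in for a real connector; a production system would call
    # each publisher's search API here.
    canned = {
        "catalog":  [f"{query}: monograph record"],
        "journals": [f"{query}: article record"],
        "open_web": [f"{query}: web page"],
    }
    return canned.get(source, [])

def federated_search(query: str, sources: list[str]) -> list[str]:
    results = []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(search_source, s, query): s for s in sources}
        # Incremental delivery: process each source as it completes
        # instead of blocking on the slowest one.
        for future in as_completed(futures):
            results.extend(future.result())
    return results
```

The design choice matters for perceived speed: the first screen of results can render while the long-tail sources are still responding.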
A happy quack to Deep Web Technology. No more tacos in Santa Fe. I want a nuked burrito, a nod to our friends up the road.
Stephen E Arnold, February 14, 2010
No one paid me to write this. I do have a promise of a taco in Santa Fe, which I have just rejected. I will report this to the Food & Drug Administration.
October 13, 2009
I have found the Kartoo.com service useful and innovative. I learned today that the company has rolled out a new interface and links that make it easier to locate the company’s other content processing technology. The new interface provides thumbnails of the top hits. You can explore other results by clicking on the links on the page. The default interface for the query “text mining” appears below:
Other new features include:
- E-reputation tools
- Metasearch functions
- Support for anonymous search
- Support for the French, English, and Dutch languages.
If you have not explored the Kartoo service, give it a whirl.
Stephen Arnold, October 13, 2009, published because I like the French
October 8, 2009
Vivisimo, http://www.vivisimo.com, a company that works with email archiving, eDiscovery, and information management solutions, just released a revved-up version of the Velocity Enterprise Search Platform for building search-centric programs. The platform focuses on extensibility, scalability, and performance; Vivisimo is using it to accelerate into OEM and reseller markets. Those programs are designed to add value to existing applications and develop new solutions for sorting information assets; for example, the platform supports searching one billion emails on a single server. Vivisimo also says, “With Velocity 7.5, new traceable accuracy metrics can accurately prove and defend that all data has been crawled and identify any documents that were not indexed due to corrupt file types.” This can be a big plus for companies dealing with growing regulation. A happy quack for Vivisimo (tagline: “Search Done Right!”). Any progress that helps enterprise businesses advance search and make sense of unstructured data is a good thing.
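The “traceable accuracy” claim boils down to reconciling the crawl manifest against the index and recording why any document fell out. A minimal sketch of that audit, with function names and data that are my own invention, not Vivisimo’s API:

```python
# Sketch of a crawl-versus-index audit like the "traceable accuracy"
# feature described for Velocity 7.5. Names and data are hypothetical.
def audit_index(crawled: dict[str, str], indexed: set[str]) -> dict:
    """crawled maps doc id -> crawl status; indexed is the set of ids in the index."""
    missing = {doc for doc in crawled if doc not in indexed}
    return {
        "crawled": len(crawled),
        "indexed": len(indexed & crawled.keys()),
        # For eDiscovery, the *reason* a document is absent matters.
        "corrupt": sorted(d for d in missing if crawled[d] == "corrupt"),
        "unexplained": sorted(d for d in missing if crawled[d] != "corrupt"),
    }

crawled = {"msg001": "ok", "msg002": "corrupt", "msg003": "ok"}
indexed = {"msg001", "msg003"}
report = audit_index(crawled, indexed)
# Every crawled document is now either indexed or explicitly accounted for.
```

An empty “unexplained” bucket is what lets a compliance officer defend the completeness of the collection.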
Jessica Bratcher, October 8, 2009
October 1, 2009
I have been using Coveo’s products for years. I remember the first time I fired up the original desktop search program. I found the interface intuitive and the features in line with how I looked for information. I learned from the company yesterday (September 30, 2009) that a new version of the product is now available. I noticed that the company has added several new features to its Enterprise Desktop Search application; for example:
- Search of content on my netbook, my Outlook mail store, and other applications running in my Harrod’s Creek data center.
- A centralized index of all enterprise information, including the formerly risky and elusive, cross-enterprise PC and laptop content, which is useful when I am in a meeting and need a coding gosling to locate a particular item of information that I tucked away without telling anyone its location
- Enhanced monitoring functions.
After installing the application, you will want to check out the built-in connectors, the faceted “point and click” search function, and the support for access from a BlackBerry device. Nifty indeed, because RIM’s search function is not too useful in my opinion.
The president and founder, Laurent Simoneau, who was COO of Copernic prior to founding Coveo in 2005, told me:
With our roots dating to the early days of Copernic, a global leader in consumer desktop search, we were committed to build the cross-enterprise capability to index and provide unified access for employees to their desktop content, including their email. What we’ve done is elevate that access to a higher level, with unified search of not only their individual PCs and laptops, but of contextually relevant knowledge and information residing in any enterprise system, based on IT permissions. In so doing, we’ve placed control over cross-enterprise desktop content indexing, with complete security and access permissions, in the hands of IT.
The benefits of the new system struck me as reducing the time spent hunting for email. Larger organizations will be able to reduce costs and risks as well.
The Coveo Enterprise Desktop Search application is powered by the Coveo Enterprise Search 6.0 platform, which is scalable from hundreds of thousands to billions of documents, and requires approximately 20 percent of the server footprint of legacy enterprise search solutions. Our tests show that Coveo is one of the more modular and scalable enterprise search solutions. It ranks as one of the easiest to install and configure search solutions we have tested. Worth a look. Fill out the form and give it a spin.
Stephen Arnold, October 1, 2009
September 28, 2009
I thought I made Google’s intent clear in Google Version 2.0. The company provides a user with access to content within the Google index. The inventions reviewed briefly in The Google Legacy and in greater detail in Google Version 2.0 explain that information within the Google data management system can be sliced, diced, remixed, and output as new information objects. The analogy is similar to what an MBA does at Booz, McKinsey, or any other rental firm for semi-wizards. Intakes become high value outputs. I was delighted to read Erick Schonfeld’s “With Google Places, Concerns Rise that Google Just Wants to Link to Its Own Content.” The story makes clear that folks are now beginning to see that Google is a digital Gutenberg and is a different type of information company. Mr. Schonfeld wrote:
The concerns arise, however, back on Google’s main search page, where Google is indexing these Places pages. Since Google controls its own search index, it can push Google Places more prominently if it so desires. There isn’t a heck of a lot of evidence that Google is doing this yet, but the mere fact that Google is indexing these Places pages has the SEO world in a tizzy. And Google is indexing them, despite assurances to the contrary. If you do a search for the Burdick Chocolate Cafe in Boston, for instance, the Google Places page is the sixth result, above results from Yelp, Yahoo Travel, and New York Times Travel. This wouldn’t be so bad if Google wasn’t already linking to itself in the top “one Box” result, which shows a detail from Google Maps. So within the top ten results, two of them link back to Google content.
Directories are variants of vertical search. Google is much more than rich directory listings.
Let me give one example, and you are welcome to snag a copy of my three Google monographs for more examples.
Consider a deal between Google and a mobile telephone company. The users of the mobile telco’s service run a query. The deal makes it possible for the telco to use the content in the Google system. No query goes into the “world beyond Google”. The reason is that Google and the telco gain control over latency, content, and advertising. This makes sense. Let’s assume that this is a deal that Google crafts with an outfit like T Mobile. Remember: this is a hypothetical example. When I use my T Mobile device to get access to the T Mobile Internet service, the content comes from Google with its caches, distributed data centers, and proprietary methods for speeding results to a device. In this example, as a user, I just want fast access to content that is pretty routine; for example, traffic, weather, flight schedules. I don’t do much heavy lifting from my flakey BlackBerry or old person hostile iPhone / iTouch device. Google uses its magical ability to predict, slice, and dice to put what I want in my personal queue so it is ready before I know I need the info. Think “I am feeling doubly lucky”, a “real” patent application by the way. T Mobile wins. The user wins. The Google wins. The stuff not in the Google system loses.
Interesting? I think so. But the system goes well beyond directory listings. I have been writing about Dr. Guha, Simon Tong, Jeff Dean, and the Halevy team for a while. The inventions, systems and methods from this group have revolutionized information access in ways that reach well beyond local directory listings.
The Google has been pecking away for 11 years, and I am pleased that some influential journalists and analysts are beginning to see the shape of the world’s first transnational information access company. Google is the digital Gutenberg, well into the process of moving info and data into a hyper state. Google is becoming the Internet. If one is not “in” Google, one may not exist for a certain sector of the Google user community. Googleo ergo sum.
Stephen Arnold, September 28, 2009
September 23, 2009
TechFlash reported an interesting article called “Windows Live Lost $560 Million in FY2009”. With revenues of $520 million, the loss chewed through about $64,000 an hour, or roughly $1,065 a minute, 24×7 for 365 days. With Microsoft’s revenue in the $58 billion range, a $560 million loss is not such a big deal. In my opinion, profligate spending might work in the short term, but I wonder if the tactic will work over a longer haul on the information highway.
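The burn-rate figures are simple division over a year of hours and minutes; a quick check:

```python
# Check the burn-rate arithmetic: a $560 million annual loss
# spread evenly across the year.
annual_loss = 560_000_000
per_hour = annual_loss / (365 * 24)         # hours in a year
per_minute = annual_loss / (365 * 24 * 60)  # minutes in a year

print(f"${per_hour:,.0f} an hour")    # about $64,000 an hour
print(f"${per_minute:,.0f} a minute") # about $1,065 a minute
```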
Stephen Arnold, September 23, 2009
September 7, 2009
When a company offers multiple software products to perform a similar function, I get confused. For example, I have a difficult time explaining to my 88-year-old father the differences among Notepad, WordPad, Microsoft Works’ word processing, Microsoft Word word processing, and the Microsoft Live Writer he watched me use to create this Web log post. I think it is an approach like the one the genius at Ragu spaghetti sauce used to boost sales of that condiment. When my wife sends me to the store to get a jar of Ragu spaghetti sauce, I have to invest many minutes figuring out what the heck is the one I need. Am I the only male who cannot differentiate between Sweet Tomato Basil and Margherita? I think Microsoft has taken a different angle of attack because when I acquired a Toshiba netbook, the machine had Notepad, WordPad, and Microsoft Works installed. I added a version of Office and also the Live Writer blog tool. Some of these were “free,” and other products came with my MSDN subscription.
Now the same problem has surfaced with basic search. I read “FAST ESP versus MOSS 2007 / Microsoft Search Server” with interest. Frankly I could not recall if I had read this material before, but quite a bit seemed repetitive. I suppose when trying to explain the differences among word processors, the listener hears a lot of redundant information as well.
The write up begins:
It took me some time but i figured out some differences between Microsoft Search Server / MOSS 2007 and Microsoft FAST ESP. These differences are not coming from Microsoft or the FAST company. But it came to my notice that Microsoft and FAST will announce a complete and correct list with these differences between the two products at the conference in Las Vegas next week. These differences will help me and you to make the right decisions at our customers for implementing search and are based on business requirements.
Ah, what’s different is that this is a preview of the “real” list of differences. Given the fact that the search systems available for SharePoint choke and gasp when the magic number of 50 million documents is reached, I hope that the Fast ESP system can handle the volume of information objects that many organizations have on their systems at this time.
The list in the Bloggix post numbers 14. Three interested me:
- Scalability
- Faceted navigation
- Advanced federation.
First, scalability is an issue with most search systems. Some companies have made significant technical breakthroughs to make adding gizmos painless and reasonably economical. Other companies have made the process expensive, time consuming, and impossible for the average IT manager to perform. I heard about EMC’s purchase of Kazeon. I thought I heard that someone familiar with the matter pointed to problems with the Fast ESP architecture as one challenge for EMC. In order to address the issue, EMC bought Kazeon. I hope the words about “scalability” are backed up with the plumbing required to deliver. Scaling search is a tough problem, and throwing hardware at hot spots is, at best, a very costly dab of Neosporin.
Second, faceted navigation exists within existing MOSS implementations. I think I included screenshots of faceted navigation in the last edition of the Enterprise Search Report I wrote in 2006 and 2007. There was a blue interface and a green interface. Both made it possible to slice and dice results by clicking on an “expert,” identified by counting the number of documents a person wrote containing a certain word. There were other facets available as well, although most were more sophisticated than the “expert” function. I hope that the “new” Fast ESP implements a more useful approach for users of Fast ESP. Of course, identifying, tagging, and linking facets across processed content requires appropriate computing resources. That brings us back to scaling, doesn’t it? Sorry.
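Counting documents per “expert” to build a facet is a simple aggregation at heart. A sketch with invented documents and author names (the scale problem comes from running this across millions of processed documents, not from the logic itself):

```python
# Sketch of the "expert" facet described above: count how many documents
# each author wrote containing the query term. Data is invented.
from collections import Counter

documents = [
    {"author": "Lee",   "text": "federation and search plumbing"},
    {"author": "Patel", "text": "search scalability notes"},
    {"author": "Lee",   "text": "faceted search interfaces"},
]

def expert_facet(docs: list[dict], term: str) -> list[tuple[str, int]]:
    counts = Counter(d["author"] for d in docs if term in d["text"])
    # Most-prolific "experts" first, ready to render as clickable facets.
    return counts.most_common()

facets = expert_facet(documents, "search")
```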
Third, federation is a buzzword that means many different things because vendors define the term in quite distinctive ways. For example, Vivisimo federates, and it is, or was at one time, a metasearch system. The query went to different indexing services, brought back the results, deduplicated them, put the results in folders on the fly, and generated a results list. Another type of federation surfaces in the descriptions of business intelligence systems offered by SAP. That system blends structured and unstructured data within the SAP “environment.” Others are floating around as well, including the repository solutions from TeraText, which federates disparate content into one XML repository. What I find interesting is that Microsoft is not delivering plain “federation,” a term it leaves undefined. Microsoft is, according to the Bloggix post, on the trail of “advanced federation.” What the heck does that mean? The explanation is:
FAST ESP supports advanced federation including sending queries to various web search APIs, mixing results, and shallow navigation. MOSS only supports federation without mixing of results from different sources and navigation components, but showing them separately.
Okay, Vivisimo and SAP style for Fast ESP; basic tagging for MOSS. Hmm.
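The distinction in the quoted passage is whether federated results are interleaved into one deduplicated list or simply shown side by side. A sketch of both behaviors, with invented source names and results:

```python
# Sketch of "advanced" federation (mixed, deduplicated list) versus
# basic federation (results kept per source). Data is hypothetical.
from itertools import zip_longest

source_results = {
    "web_api_1": ["a.com", "b.com", "c.com"],
    "web_api_2": ["b.com", "d.com"],
}

def basic_federation(results: dict) -> dict:
    # MOSS-style: each source's list is displayed separately, no mixing.
    return results

def advanced_federation(results: dict) -> list[str]:
    # Fast ESP-style: round-robin interleave across sources, then
    # deduplicate while preserving each hit's first appearance.
    mixed, seen = [], set()
    for row in zip_longest(*results.values()):
        for hit in row:
            if hit is not None and hit not in seen:
                seen.add(hit)
                mixed.append(hit)
    return mixed
```

The user-visible difference: one blended relevance-ordered list versus parallel columns the user must scan separately.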
To close, I think that the Fast ESP product is going to add a dose of complexity to the SharePoint environment. Despite Google’s clumsy marketing, the Google Search Appliance continues to gain traction in many organizations. Google’s solution is not cheap. People want it. I think Fast ESP is going to find itself in a tough battle for three reasons:
- Google is a hot brand, even within SharePoint shops
- Microsoft certified search solutions are better than Fast ESP based on my testing of search systems over the past decade
- The cost savings pitch is only going to go so far. CFOs eventually will see the bills for staff time, consulting services, upgrades, and search related scaling. In a lousy financial environment, money will be a weak point.
I look forward to the official announcement about Fast ESP; the $1.2 billion Microsoft spent for this company now has to deliver. I find it unfortunate that the police investigation of alleged impropriety at Fast Search & Transfer has not been resolved. If a product was as good as Fast ESP was advertised to be, what went wrong with the company, its technology, and its customer relations prior to the Microsoft buyout? I guess I have to wait for more information on these matters. When a vendor has many different products with overlapping and similar services, the message I get is the Ragu marketing model, not the solving of customer problems in a clear, straightforward way. Sigh. Marketing, not technology, fuels enterprise search these days, I fear.
Stephen Arnold, September 7, 2009