A Traditional Newspaper Tries to Get Hip

November 16, 2008

I read in CIO Magazine here that the UK newspaper the Telegraph is a content provider for Google Android. The story is “Telegraph Newspaper Is First Google Android Content Provider.” Leo King does a good job of explaining that the Telegraph Media Group has an Android application that will, according to Mr. King:

provide users with news and sports feeds, as well as travel and motoring information, and can be downloaded from the Android Marketplace.

In August 2008, the same Telegraph unit offered content to iPhone users. I am encouraged by this innovation. My question is, “What were these folks doing for the last decade?” Google has been chugging happily along, offering opportunities for companies to hook up with Googzilla for years. The fact is that these actions, although admirable, are in my opinion too late to save the game. Traditional newspapers are facing demographic challenges, rising costs, and advertisers who have to decide between food and running adverts. More innovation is needed, but the good news is that the Telegraph is making an effort.

Stephen Arnold, November 16, 2008

Yasni: People Search

November 14, 2008

yasni, a people search engine, just launched in the U.S. If you’re on the web, yasni supposedly will find you. But the search is on first and last names, and there are lots of “Jessica Bratcher”s out there. My yasni search returned 30 results, including hits on amazon.com, Facebook, MySpace, Google News and Blogs, Technorati, even criminal searches. But for more listings, they’ll send me an e-mail list within 24 hours.

People search has been and remains very important. Zoom Info, LinkedIn, and other sites provide useful information. I have found Cluuz.com useful as well. Cluuz.com displays relationship charts. I did some ego surfing to test yasni and I ran the same queries on Cluuz.com. On Cluuz.com, I found an interview I did in 2005. Cluuz.com also surfaced several articles about newspaper awards I’ve received. On my test queries, I did not find yasni as useful. But it is early in the game for yasni. I will check back in a month or so to see how the service develops. I do recommend that you give it a whirl.

Jessica Bratcher, November 14, 2008

ZDNet Identifies Google’s Fatal Flaw

November 14, 2008

Sure, the stock is at 2005 levels. The company took it on the snoot with the Yahoo deal, the possible loss of the Verizon account, and grousing about employee options that are underwater. Dana Blankenhorn’s “Google Fatal Flaw Revealed” takes a view of Google that I have not previously considered. As a result, the ZDNet article is a must read. I don’t want to spoil the “fatal flaw” for you. Click here and you will see the “fatal flaw” revealed in the first subhead. In my two Google books, whose titles I will not retype today, I identified a number of Google vulnerabilities. I admit I did not hit upon the weakness Dana Blankenhorn’s research unearthed. For me, aside from the fatal flaw, the most interesting comment in the article was:

Yet whether I’m covering the efforts at Chrome, at Android, or at Google Health, what I see are Google employees working on a Google Island, depending only on fellow Googlers and Google-made code in their efforts.

The idea is that Google is an island. I know that the company buys technology. The purchase of Transformic, Inc. is an example. Most people don’t recognize the name of this acquisition. Google doesn’t talk about some of its more interesting activities. My own research about Google suggests that when it buys a company, it gets new people. In the case of Transformic, the gurus running that shop attracted more fresh talent to Google. As a result, Google has a number of engineers and scientists who are steeped in the type of systems Transformic was developing. I think that Google may be operating as an island, but it is an island with regular shuttle service to the mainland, abundant bandwidth, and a very dynamic presence at certain technology venues. See if you agree with the ZDNet article and share your comments.

Stephen Arnold, November 14, 2008

Google and Novel Content

November 13, 2008

On November 11, 2008, Google received a patent for the invention “Detecting Novel Content”, US7451120. In my opinion this is an important Google invention. The system and method make it possible for Google to identify a segment of a document that contains interesting information. “Novel” is a code word for distinctive information. The abstract for the invention is:

A system determines an ordered sequence of documents and determines an amount of novel content contained in each document of the ordered sequence of documents. The system assigns a novelty score to each document based on the determined amount of novel content.

Let’s assume that Google uses this invention. What can the method deliver? My thought was a compilation of novel content on a user-specified subject. Traditional publishers cut and paste to create anthologies. In the 15th and 16th centuries books that were collections of snippets were used to teach students Latin and Greek. Another possible use of the method would be to snip content from one document and place that snippet and its metadata into a dataspace.
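To make the abstract more concrete, here is a minimal sketch of one way a novelty score could be computed over an ordered sequence of documents. This is an illustration only, not the method claimed in the patent: the word-triple "shingle" technique, the function names, and the sample documents are all assumptions made for the example. Each document is scored by the share of its shingles that did not appear in any earlier document.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def novelty_scores(ordered_docs, size=3):
    """Score each document by the share of its shingles not seen earlier.

    Hypothetical scoring rule for illustration; the patent abstract does
    not say how the amount of novel content is measured.
    """
    seen = set()
    scores = []
    for text in ordered_docs:
        current = shingles(text, size)
        fresh = current - seen
        scores.append(len(fresh) / len(current) if current else 0.0)
        seen |= current
    return scores


# Made-up sample documents, in the order they are assumed to arrive.
docs = [
    "Google received a patent for detecting novel content",
    "Google received a patent for detecting novel content",
    "Microsoft and Verizon are said to be near a mobile search deal",
]
print(novelty_scores(docs))  # [1.0, 0.0, 1.0]
```

In this toy run the exact repeat scores zero and the unrelated third document scores as fully novel. A compilation tool of the kind imagined above could keep only the high-scoring documents or segments.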

Stephen Arnold, November 13, 2008

Microsoft Verizon Deal Said to Be Near

November 12, 2008

Google learned that Sun Microsystems cut a deal with its former enemy Microsoft. Now Google may have to adapt to Microsoft’s winning the contract to provide mobile search to Verizon. Steven Musil’s “Microsoft Said Closer to Verizon Search Deal” provides a good summary of what appears to be happening. For me, the most interesting passage in Mr. Musil’s story is this one:

Google’s preoccupation with regulators over the Yahoo deal reportedly helped create the opening for Microsoft with Verizon, the sources told the newspaper. The move comes as the two companies ramp up their efforts in the mobile arena. The first phone based on Google’s Android mobile operating system–a challenger to Microsoft’s Windows Mobile–recently went on sale.

This comment made clear to me that Google, despite its brilliant record of successes, is not immune to big company disease. When a deal like this slips away, I am inclined to think that the management team is rushing from meeting to meeting, email to email, and decision to decision without appropriate management oversight. Chaos is often useful when cutting and pasting code widgets together to create new applications and services. However, business deals have a different composition. Deals can slip away when details get overlooked because everyone is too busy.

Is this deal set in concrete? I don’t know. What I do know is that the messages sent about the misfire with Yahoo, the Sun shift to Microsoft, and now this alleged tie up between Microsoft and Verizon resonate with me. Has Google underestimated Microsoft? I am eager to see what tomorrow brings.

Stephen Arnold, November 12, 2008

GT&T: Ma Google Expands Her Comms Service

November 12, 2008

I saw a Reuters story on the South African Web site MoneyWeb.com. The story was “Google’s Gmail Takes on Skype, Adds Video and Voice to Chat.” You can read the story here. Sam Diaz posted “Gmail Expands to Include Voice and Video” here. Both stories explained the new communication features available in Gmail. For me, the most interesting comment appeared in Sam Diaz’s story:

Google launched video and voice chat for Gmail – not necessarily a ground-breaking feature but somewhat different from other models because the feature is built into the Gmail inbox window, instead of a separate application.

The original Google blog post about this development appeared in the Gmail Blog here. Uptake on this story seems rapid. The most interesting comment for me was this remark on the Official Google Blog here:

Video chatting from Gmail is as easy as sending an instant message. With our team spread out across Google offices in Sweden and the U.S., it’s been really handy in helping us work together.

Google seems to be “dog fooding”. This is a term used at places such as Microsoft to describe products used by employees prior to their release.

When I logged into Skype today, there were 14 million users online. Gmail’s new voice feature has a fraction of this user base–for now. My hunch is that Google is continuing its slow, steady march to global telephony. When I read these two stories, I mentally flashed forward several years. Google is becoming the Global Telephone & Telegraph Co., a 21st century version of the pre-break up AT&T. Instead of Ma Bell, we now have Ma Google.

Stephen Arnold, November 12, 2008

Interview with Martin White, Intranet Focus Ltd.

November 11, 2008

Martin White, co-author of Successful Enterprise Search Management (Galatea, November 28, 2008), spoke with Stephen Arnold (his co-author) about the new Galatea study about search management. The interview touches upon the challenges that organizations face with information access, including search. The new study, which becomes available on November 28, 2008, tackles subjects that have not been discussed in terms of management, return on investment, and problem solving. Mr. White said:

Very rarely is poor search a result solely of poor technology. It is all about effective management of the entire search procurement, installation and implementation processes.

On the subject of business intelligence–what some pundits are calling “smart search” or “active intelligence”–Mr. White said:

Look at all those financial institutions with their BI applications. Did it stop them making a fool of themselves and us over sub-prime loans. BI is only as good as the way in which the correlations are set up and usually that is poorly. To me the new search is when search is all but invisible – embedded in a workflow process.

You can read the full text of this interview, conducted on November 10, 2008 here.

About the Study

The scope of the new study Successful Enterprise Search Management is unusual. Most studies of search are little more than profiles of vendors. After more than one year of work, Mr. White and his co-author address the management aspect of search, information access, and content processing by putting a conceptual foundation in place, reviewing the technology of search, discussing the vendor selection process, exploring the implementation stage, including pre-launch testing of the system, and offering a series of suggestions called “action this day.”

The book includes case studies, references to specific vendors’ systems, and practical guidance from Mr. White and his co-author. Specific topics addressed include text mining and advanced content processing, information governance, and the challenges language itself presents. The book consists of five major sections and 17 chapters. The book is illustrated with screenshots from referenced systems and diagrams that highlight the management tasks addressed.

You can learn more about the book and place a pre-publication order on the Galatea Web site here.

Sun Microsoft: Tit for Tat

November 11, 2008

When Google pulled Open Office from its Google Pack, I wondered how long it would take for more information about a growing rift between Sun Microsystems and Google to surface. On November 10, 2008, ZDNet UK published “Microsoft, Sun Agree Web Search Deal.” You can read the story here. Google at one time had quite a few employees who had work experience at Sun. Eric Schmidt, Google’s top Googler, was the chief technical officer of Sun before he left to head up Novell. The interesting twist in the ZDNet story from Reuters is that when a user downloads Java, Microsoft’s toolbar comes along. Not long ago, Microsoft and Sun were exchanging legal hand grenades about Microsoft’s implementation of Java. In my opinion, the disruptive power of Google is becoming more evident with this deal. With Java on about 800 million PCs, Microsoft should get an immediate footprint boost. Its share of the Web search market should rise, and I think the effect will be visible as soon as the January 2009 data become available. What will Google do? In my opinion, Google sees Sun as a company that is past its prime. With rumors of Sun Microsystems being an acquisition target, Google may be correct. Microsoft benefits from this deal with users and the public relations coup it delivers to Googzilla’s nose.

Stephen Arnold, November 11, 2008

Micro Mart’s Surprising Web Search Findings: Google Is an Also Ran

November 11, 2008

The trusty newsreader served up a link to a three-part article by Peter Hayes. He wrote a feature “The Secret Life of Search Engines” for Micromart.com. I have conflicting date information for this article. It may have been written yesterday or a year ago.

You can find the first part here. The second part here. And the third part here. The Micromart.com site search engine leaves a bit to be desired because its index does not contain a pointer to the first part of this article. Sigh. My own tools ferreted out the three parts, and I think you will find Mr. Hayes’ analysis surprising. The key point for me is that when a journalist runs benchmark queries across search systems, the gulf between those who understand what readers find interesting and those who build search engines becomes evident. In fact, if Mr. Hayes’ analysis were used as the definitive guide for finding information on the public Web, there would be considerable consternation at a number of high profile firms and cause for joy among a group of search engines that are going nowhere in terms of usage. I want to consider this point at the end of my Beyond Search post. Let’s look at the key points in each of the three parts of this analysis, shall we?

Part One: Outline Politics

Straight off let me say I don’t know what ‘outline politics’ means. I don’t think it matters much beyond privacy and the ambivalent nature of an index’s utility. I did not get the impression that the phrase is particularly significant in the flow of his argument. The series begins with the notion that you can make money offering a product people use every day. The idea is flawless when it comes to a fungible product, but I am not sure it applies to the somewhat more slippery world of information. Nevertheless, the point is that traffic is good. Furthermore, the Internet is changing. Content is tricky. Mr. Hayes introduces the notion of official content and unofficial content. That’s a useful distinction, but it did not resonate with me. Mr. Hayes then asserts that search engines have, and I quote:

two major functions. One is to teach, the other is to search. While both have a large positive side we shouldn’t pretend that there isn’t a downside to any tool. Any tool used for good can also be used for bad.

He is now in full stride and hitting a hot button almost guaranteed to whip up interest among European Web users–privacy. He then heads for the end of Part One with this comment:

My final thought is that search engines are only passengers on the Internet train and not the train itself. The growth of the Internet gives them the prospect of a healthy and prosperous future – but at the same time it is reliant on the safekeeping and update of the Internet to keep up with demand and to protect it from vandals. As our newspaper headlines tell us, the world is not totally a safe and law abiding place.

I must admit that I am not quite sure of the logic of this first section, but let’s move on to Part Two.

Part Two: Tools

Mr. Hayes dives in with location searching and touches upon Boolean logic, promising to tackle this topic elsewhere in his series. His first injunction is to keep a search simple. Web indexes are divided into systems dependent on software and systems dependent on humans. Mr. Hayes does not provide a context for the disparity in usage between these two types of systems, a distinction that will return to haunt him in Part Three of his series. He points out that search systems are not “born equal”. The promised analysis of Boolean arrives and I learn:

Boolean (which consists of the three words AND, OR, NOT, remember) is best explained by example. Some engines don’t allow it and some only use the NOT part. This follows the general rule that nothing to do with the Internet is ever totally straightforward! Typing NOT will take out examples that don’t fit the bill (‘Arsenal NOT soccer’, for example), but this is hard word to use and control. In Yahoo, double meanings are automatically divided out. Also the engine can easily come up with word connections that you would never think of in a million years – including simple names.

I think I understand even though Mr. Hayes’ own examples use symbols for AND, and he does not provide an example of a successful NOT search statement. NOT for Mr. Hayes is a “hard word to control”. I imagine that for him NOT may be troublesome. He points out that:

AND is the least useful of all because most of time, it is taken as read on all known engines that work via keywords. Type ‘Peter Hayes Writing Genius’ it will give the same result as ‘Peter+Hayes+Writing+Genius’ or ‘Peter AND Hayes AND Writing AND Genius’.

The statement confirms my suspicions that Mr. Hayes has taken a very different view of Boolean logic, its complexities, and the way in which logical operators work in his world. I quite like AND, NOT, OR, and even NAND in some systems. You may find AND and NOT useful as well.
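For readers who want to see why the operators are not interchangeable, here is a minimal sketch of AND, OR, and NOT applied to a tiny inverted index. The three sample documents and the helper names are made up for the illustration; no real search engine’s syntax or internals are implied. Each operator is simply a set operation on the lists of matching documents.

```python
def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def term(index, word):
    """Return the ids of documents that contain word."""
    return index.get(word.lower(), set())


# Hypothetical three-document collection.
docs = {
    1: "arsenal soccer match report",
    2: "arsenal of weapons found in warehouse",
    3: "soccer scores from the weekend",
}
index = build_index(docs)

both = term(index, "arsenal") & term(index, "soccer")     # arsenal AND soccer
either = term(index, "arsenal") | term(index, "soccer")   # arsenal OR soccer
without = term(index, "arsenal") - term(index, "soccer")  # arsenal NOT soccer

print(sorted(both))     # [1]
print(sorted(either))   # [1, 2, 3]
print(sorted(without))  # [2]
```

In this toy collection the ‘Arsenal NOT soccer’ query is a successful NOT statement: it keeps the weapons story and drops the football one.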

I am not certain what the sub section “Getting It Right” means. The resonance of AND and NOT inutility echoes in my mind. Part Two ends with an observation about how much of the Internet is indexed. That’s a good question, and I now turn to Part Three, where the intellectual rigor of Mr. Hayes meets the Information Superhighway, if I may indulge in a bit of metaphorical whimsy.

Part Three: The Best UK Web Search Engines

I knew I was in for a delightful few minutes after the first two parts of Mr. Hayes’ feature. In Part Three he lays out 10 test queries. I can’t reproduce the full list, but I can highlight two of his queries:

  • Bring me the site of the best selling newspaper in the UK (The Sun)
  • Find a local newspaper covering the Shetlands

I noted that each query is expressed as a string of text. Some vendors would rush to point out that Mr. Hayes is using natural language queries. Not many systems support natural language queries in particularly sophisticated ways. Some, for instance, create a Boolean query from whatever the user enters in the search box. Other systems consult a look up table of what’s been a satisfactory result for the query recently and deliver that result from cache. Others dump stop words and go with the meaningful words joined by an implicit AND or OR Boolean operator. Others look at what’s available from an advertiser and dump those results directly to the user. Others predict what a user will prefer based on that user’s profile or the user’s usage history. This list is not exhaustive by any means.
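The stop word approach can be sketched in a few lines. This is a simplified illustration, not a description of how any of the tested engines works; the stop word list and the function name are assumptions chosen for the example.

```python
# Illustrative stop word list; real systems use much longer ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "me", "find", "bring"}


def to_boolean_query(text, operator="AND"):
    """Drop stop words and join the remaining words with one operator."""
    words = [w.strip(".,?!()").lower() for w in text.split()]
    keywords = [w for w in words if w and w not in STOP_WORDS]
    return f" {operator} ".join(keywords)


print(to_boolean_query("Find a local newspaper covering the Shetlands"))
# local AND newspaper AND covering AND shetlands
```

Passing operator="OR" produces the looser variant. Which operator is applied by default is one reason the same plain English query can return very different results on different systems.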

What did Mr. Hayes learn from his analysis of the 10 queries sent to the UK sites for Lycos, AltaVista, Dogpile, Excite, HotBot, Metacrawler, MSN, Yahoo, Ask, and Google? I have converted Mr. Hayes’ findings into the summary table below. Keep in mind that these are his data in a slightly different form. These are not my or my team’s findings:

Rank  Engine       Hayes’ Take
1     Lycos        Answered questions well
2     AltaVista    Useful but obscure results
3     Dogpile      Surprised it didn’t do better
4     Excite       Respectable performer
5     HotBot       Good all round performer; Mr. Hayes’ favorite
6     Metacrawler  Biggest surprise of the lot
7     MSN          Slick and impressive performer
8     Yahoo        Handpicked and categorized results a plus
9     Ask          Plain English queries
10    Google       Did not outperform the opposition

Mr. Hayes includes “scores” for each engine. The top rated engine Lycos received a Hayes number of 83%; the lowest rated engine Google received a Hayes number of 78%.

Observations

I came away from my reading of this three part series in a semi stunned state. I had a number of major and minor quibbles gallivanting around my cranial cavity. Let me highlight three points and move on:

  1. This article made it clear to me that people don’t know what they don’t know about Web search, its technology, and its nuances. Google is probably correct in sticking with its very simple interface and its behind the scenes functions to answer most of the users’ questions with “good enough” information. If Mr. Hayes is an informed user of Web search systems and finds the HotBot results more useful to him than other systems’ results, that’s well and good. The idea of using one system to conduct research of any type is anathema to me. Overlap, freshness, scope of index–these are essential factors for each Web indexing system. Insensitivity to these issues makes me downright nervous. I thought, “If Mr. Hayes can’t figure out the important parts, what about a less informed online user?”
  2. The queries Mr. Hayes formulated reveal why natural language systems are not understood. Forget semantic methods. I am not sure how to remediate Mr. Hayes’ test queries. The approach is foreign to me as is Mr. Hayes’ failure to differentiate each of the test systems with more precision. There is a big difference between a system that is federating results, one that indexes only frequently accessed pages, and one that operates with orphaned code on a shoestring.
  3. The failure to point out that Google serves about 70 percent of the queries in North America and more in Denmark, Germany, and the UK is an oversight. The giant gets the lowest score, which doesn’t make sense to me. Mr. Hayes uses subjective criteria to generate his Hayes numbers and provides zero detail about the method used to calculate a score. I think scoring the engines on freshness, features, relevance as measured by the number of on-target hits in the first 10,000 results in a result set, and similar criteria would suggest that Lycos, AltaVista, and HotBot aren’t competitive in today’s market. Microsoft’s Live.com and Yahoo search are in some ways easier to benchmark against the Google. The other vendors are non starters in my mind because none has the technical or financial resources to index at the Google, Microsoft Live.com, and Yahoo levels.
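Two of the factors named in these points, overlap between systems and relevance measured by on-target hits, can be made measurable with very little code. The sketch below is an assumption-laden illustration, not Mr. Hayes’ method or mine: the result lists, the relevance judgments, and the function names are all made up.

```python
def overlap(results_a, results_b):
    """Jaccard overlap: shared results divided by all distinct results."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def precision_at_k(results, relevant, k=10):
    """Fraction of the first k results that a judge marked on target."""
    top = results[:k]
    if not top:
        return 0.0
    return sum(1 for url in top if url in relevant) / len(top)


# Hypothetical result lists from two engines and hypothetical judgments.
engine_a = ["u1", "u2", "u3", "u4"]
engine_b = ["u3", "u4", "u5", "u6"]
on_target = {"u1", "u3", "u5"}

print(overlap(engine_a, engine_b))                # 2 shared of 6 distinct
print(precision_at_k(engine_a, on_target, k=4))   # 0.5
```

A published score built from numbers like these, with the queries and judgments disclosed, would at least let a reader check how an 83% or a 78% was reached.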

Mr. Hayes omitted a Web search engine that I think is better than eight or nine of those on this list; namely, Exalead. I am well pleased with the results I obtain from Exalead.com here. In general, the French make me nervous with their math skills and sense of style, but Exalead is the functional equivalent of Google, operated by Europeans, and a country mile better on my relevance tests than the orphans AltaVista, Excite, and HotBot.

Keep in mind I am stating my opinion. I am an addled goose. I am sure the experts who organize search conferences will be delighted to feature Mr. Hayes as a keynote speaker. The conference organizers and Mr. Hayes’ understanding of search may be well matched.

Stephen Arnold, November 11, 2008

Fast Search at PDC 2008

November 10, 2008

A happy quack to the reader who alerted me to a Web log post by Philippe Sentenac here. The 45 minute session was to cover a number of topics. Mr. Sentenac’s opinion was that most of these topics had been covered elsewhere. However, he did snag several interesting points about Fast Search, a unit of Microsoft that is the subject of a police investigation. First, he notes that Fast ESP integration is underway by Fast engineers. Fast ESP will be focused on indexing Web sites. Prior to the Microsoft acquisition, ESP was a system used to index enterprise content. A version of ESP for SharePoint will be developed to handle SharePoint installations with more than 50 million documents. Here’s the screen shot reproduced by Mr. Sentenac showing this two-part approach to Fast Search’s technology:

image

Source: http://translate.google.com/translate?u=http%3A%2F%2Fblogs.developpeur.org%2Fphil%2Farchive%2F2008%2F10%2F30%2Fpdc-2008-fast-building-search-driven-portals-with-moss-and-silverlight.aspx&hl=en&ie=UTF-8&sl=fr&tl=en

I don’t have a back up source for this information. My recommendation is that you view this as an interesting possibility, not a formal program. If I were working at the Microsoft Fast office in Oslo, Norway, I would be thinking about the police investigation, not the integration of ESP with SharePoint. But that’s just my opinion. I am sure that none of the Microsoft Fast executives or engineers are troubled by the possibly unwarranted investigation allegedly related to financial dealings.

Stephen Arnold, November 10, 2008
