The Fast Follies: A Math Error… Maybe

May 29, 2008

I’ve been in Canada with lousy email access. I received two emails earlier today that I wished I had read when I was waiting for a late flight.

One came from a colleague in Oslo, Norway. The second arrived from Copenhagen, Denmark. Both writers wanted to know if I had seen this article. My Norwegian language skills are non-existent, but with Google Translate, it seems that Fast Search & Transfer has blipped the radar of Norwegian authorities. I have worked for the police in a number of countries, and I enjoy the camaraderie, the intellectual challenge, and the thrill of the hunt. I received from a US colleague the well-composed article by Liz Gunnison at Portfolio.com titled “Microsoft Stuck With a Norwegian Herring?” (I really like that headline, but my “fast follies” is okay too.) The key line for me in her essay is:

Økokrim last week concurred that the nature of the irregularities and the amount by which Fast Search apparently inflated its accounts were serious matters warranting prosecution. But the agency said it was too busy to open a criminal investigation. Rather than let the matter rest, the market supervisor turned it over to the Oslo police for investigation. Aftenposten, a Norwegian newspaper, characterized Kredittilsynet’s decision to involve the police as an unprecedented step in that country.

Fast Search & Transfer is not likely to engender much enthusiasm among the investigative team given the job of figuring out how a company losing money since 2006 could suck in $1.2 billion from Microsoft. The Redmond giant has about 80,000 employees and many of these are certified wizards. I must be missing something. Paying $1.2 billion for a company with several years of losses is pretty interesting.

Take a quick look at the restated financials. These are tough to locate, but I found a link to them here. These are public documents, but I have a hunch that the files will become even more difficult to locate as this drama unfolds. I did some looking for Fast Search’s management presentations. These have been removed from the Fast Search Web site. Even the vaunted Fast Search search system was unable to direct me to these documents. The Fast Search site map is more useful than its own site search system. You can access the site map here.

There are some technical descriptions of Fast Search’s technology available via Google. Just navigate to Google’s advanced search page, enter the phrase “Fast Search & Transfer” and specify the file type as PowerPoint (PPT). Reading these decks with knowledge of the restated financials left me thinking I was caught in Rod Serling’s Twilight Zone, a TV series where reality isn’t what the protagonists think. Three years of losses and a $1.2 billion sale. It’s almost too far out for TV.
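If you prefer to skip the advanced search form, the same query can be typed directly using Google's filetype: operator. Here is a minimal sketch that builds such a query URL; the function name build_filetype_query is my own, hypothetical label, and the sketch only assembles the URL. It does not fetch or parse any results.

```python
from urllib.parse import urlencode


def build_filetype_query(phrase: str, filetype: str) -> str:
    """Build a Google search URL for an exact phrase limited to one file type.

    build_filetype_query is a hypothetical helper name, not part of any library.
    """
    # Double quotes request an exact phrase match; filetype: restricts the format.
    query = f'"{phrase}" filetype:{filetype}'
    # urlencode escapes the quotes, spaces, ampersand, and colon for the URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


url = build_filetype_query("Fast Search & Transfer", "ppt")
print(url)
```

Paste the printed URL into a browser and you get the same result list the advanced search page would produce.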

The Web site also includes videos of Fast Search management explaining how their system is the one for other vendors to beat. You can learn much from them here. What I did was to note the folks who raved about Fast Search. I don’t want to drink too much of these analysts’ Kool-Aid or get caught in a “search wave” and get crushed by a reality ignored by researchers. You can also listen to darned amazing podcasts here. I was able to grab a few snatches in between phone calls about the aforementioned police investigation. I think the one I listened to was science fiction. You can still get a flavor of the “old” Fast Search by reading the entries on the Fast Forward Web log, a chronicle of Fast Search’s user meetings. Access this information here.

If you have some difficulty navigating the site, you’re not alone. The redesign makes it difficult to locate information. Nevertheless, you can do some data archeology using the Google cache, the Wayback Machine, and other search engines; for example, Exalead. AllTheWeb.com, operated by Fast Search for Yahoo, is not too useful for finding information about Fast. I tracked down the Norwegian news stories via EUfeeds.eu. I’m not sure why there was no pick up of this story when I looked at AllTheWeb.com’s index of 5,000 news sources. I may not have sharp enough search skills to locate this information.

In the morning, I will print out the restated financials and post a short essay about these data. Maybe there’s another mistake in the math. I find it amazing that so many smart people at two high-tech companies make errors in addition and subtraction. Ah, these youngsters.

If you have any information and insights into this interesting development, let me know.  Watch for my run down of the restated financials on May 30th.

Stephen Arnold, May 29, 2008

Knewco: Community Tags

May 29, 2008

Peter Suber offers a clear, detailed post about a new approach to community tags. You can read his post “Combining OA, Wikis, Community Annotation, Semantic Processing, and Text Mining” here. Mr. Suber includes a link to a discussion of the idea in Genome Biology here.

What’s interesting to me is the specialist nature of the effort. Although anyone can tag, the focus is STM (scientific, technical, and medical). The idea is to create rich indexing for technical information. I think this is a good idea. I think there will be challenges because a small number of people do most of the work. Nevertheless, these types of projects are sorely needed.

The company responsible for the technology is Knewco, founded by several academics. You can learn more about the firm here. Knewco has developed some tag options that are interesting. I think the value will come from POW or “plain old words”.

Why do I care about this, and what does the wiki variant have to do with search? Well, a lot. First, technical information has long been in the hands of a small number of multi-national firms. If you want to search engineering or chemical information, you have to use specialist files and sometimes pay big, big online access charges. This type of project is one more example of the research community feeling its oats. Good for researchers and potentially threatening to the oligopolies in the STM information business.

Second, I like the idea that information innovation is coming from thinkers outside the traditional IR (information retrieval) community. When I go to conferences, there are 20-somethings who have an opportunity to lecture me on their major insight: “Use For” references. Okay, been there. Done that. Fresh thinking is important, and I am delighted that Knewco is trying pop ups, colors, and other bells and whistles that may point to some new directions in tagging.

Finally, the larger the body of publicly accessible tags, the better the next-generation systems will be. Google, as I point out in my new study due out in September 2008, is focused on making its software smarter. Humans play a role, but the GOOG knows the value of indexing, taxonomies, tags, and their brethren.

On the downside, I don’t like the company name “Knewco”. In fact, Knewco uses coinages for its different functions; for example, a “knowlet”. I hate having to memorize a neologism for something I call a cross reference. But that’s a personal preference. Check the company’s Web technology here.

Stephen Arnold, May 29, 2008

X1 Technologies Dives into the SharePoint Search Channel

May 29, 2008

X1 Technologies blipped my radar when a source in Mountain View, California, told me that Yahoo inked a deal with X1 for search. You can learn more about X1 and its patent search technology here. The company’s tagline is “a single interface for secure business search”. I’ve been pleased with my X1 search experiences, and in my discussion of X1 as an option for IBM systems, I identified its technology as one well worth a close look.

The Yahoo Connection

Troubled Yahoo–despite its lousy ad system, Panama–has some sharp search and information retrieval wizards. When a point solution for search is needed, these wizards can pinpoint a vendor who can provide a quick fix for a findability ailment. Yahoo, for example, licensed the InQuira system to power the company’s customer support system. You are getting natural language help from InQuira, not Yahoo’s own search system. When Google aced Yahoo with email search, Yahoo’s engineers poked around and licensed Stata Labs’ technology. Yahoo can identify good technology, but that’s now a core weakness. Instead of an integrated search platform, Yahoo uses the Baskin-Robbins’ approach–many different flavors. Some flavors change without warning. The X1 solution deployed by Yahoo in its toolbar offered some useful features; namely, fast indexing and on-the-fly document display.

I took a look at X1 Technologies and learned that its engine indexed quickly. I found the interface geared to an email user, not a dinosaur like me. All in all, I liked the performance and the ability to filter results. Over the years, I tested different versions of the system and concluded that it was worth a look, particularly if the user community wanted an Outlook-type interface and zippy indexing.

X1: Signing Up with MSFT

I learned on May 27, 2008, that X1 made the jump into the Microsoft channel and its fast-moving currents. As you know, a company can sync up with Microsoft, send an engineer or two to Microsoft’s training courses, and demonstrate that its software doesn’t foul up SharePoint or some other “core” Microsoft product. In my experience, third-party software is often more stable than Microsoft’s “core” technology. A “hot fix” can produce some exciting SharePoint moments. I also enjoy SQL Server back ups that appear to complete and then, upon testing, demonstrate a less-than-charming ability to rebuild the data set. Sigh.

X1 offered a desktop search system, free from Yahoo at one time and a modest charge if you bought the commercial version of the product. Now the company offers its X1 Enterprise Search Suite. The technical dope is here. The features of this Microsoft-certified system include:

  • Ability to search the contents of Microsoft servers, including Exchange and SharePoint servers
  • Federated results; that is, obtaining documents from different servers and displaying a single results list with duplicates removed
  • Support for Microsoft’s security model, Microsoft clustering, etc.
  • Connectors for more than 400 file types, including the Symantec Enterprise Vault.
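The federated results bullet above can be illustrated with a minimal sketch. This is not X1's implementation, which I have not seen; every name here (Hit, fingerprint, federate) is hypothetical, and the sketch assumes that relevance scores from different servers are directly comparable, which is often not true in practice.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    source: str   # hypothetical server label, e.g. "exchange" or "sharepoint"
    title: str
    body: str
    score: float  # assumed to be comparable across servers


def fingerprint(hit: Hit) -> str:
    """Hash the body after lowercasing and collapsing whitespace."""
    normalized = " ".join(hit.body.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def federate(*result_lists):
    """Merge result lists, keep the best-scoring copy of each duplicate."""
    best = {}
    for results in result_lists:
        for hit in results:
            key = fingerprint(hit)
            if key not in best or hit.score > best[key].score:
                best[key] = hit
    # One list for the user, highest score first.
    return sorted(best.values(), key=lambda h: h.score, reverse=True)


exchange = [
    Hit("exchange", "Q2 budget", "Budget  for Q2 attached.", 0.9),
    Hit("exchange", "Lunch", "Pizza on Friday.", 0.4),
]
sharepoint = [
    Hit("sharepoint", "Q2 budget (copy)", "budget for q2 attached.", 0.7),
]

merged = federate(exchange, sharepoint)
for hit in merged:
    print(hit.source, hit.title, hit.score)
```

The two budget documents differ only in case and spacing, so they collapse into one entry and the merged list holds two hits. A production system would need near-duplicate detection and score normalization, both of which this sketch skips.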

x1 interface

With more than 12,000 SharePoint licensees and a rumored 65 million users–an estimate which I doubt–of SharePoint search, X1 joins a number of other prominent enterprise search vendors as Certified Gold partners.

Vivisimo Dives into eDiscovery

May 29, 2008

I’ve always liked the Carnegie-Mellon spawned Vivisimo. I think my early experience with Lycos and its writing a check for Point (Top 5% of the Internet) made me look favorably on CMU technologies. Vivisimo announced that its Velocity 6.0 platform now does “social search” and eDiscovery. These are hot topics in search and information retrieval.

You can read Vivisimo’s own announcement of these functions here, or you can take a gander at eWeek’s write up here. The idea is that the system includes a “Discovery Module”. A licensee can tag large groups of search results with custom index terms. In a legal matter, the ability to add a custom tag to a group of documents can be a useful feature.

The issue with email is that there is a lot of it. Vivisimo’s ability to scale will be under the microscope in this hotly-contested sector. There’s Autonomy and dozens of other players in this space. Plus more vendors are jumping on this bandwagon every day. You can find a list of other vendors here.

Stephen Arnold, May 29, 2008

Microsoft: Plan to Hobble the Google

May 28, 2008

A contented quack to Henry Blodget who wrote a thoughtful essay “Microsoft’s Secret Plan to Kill Google Explained”.

Not only is this a great headline, the analysis is first rate. You can read the Silicon Alley Insider post here. This Web log has a useful search engine, so you can track this article down if you don’t get to it today. (Another happy quack for Mr. Blodget.)

The arguments are dense and hard to summarize. I don’t want to rehash Mr. Blodget’s points. Instead, let me identify for you the sentence that jumped out and hit me between the eyes:

Good search results are necessary, but even if Microsoft’s results are widely agreed to be better than Google’s–which today’s certainly aren’t–this won’t help. Why not? Because most consumers won’t notice or care. Google works, and Google is synonymous with search. And of course “great results” are an advantage only if the great results are better than Google’s, and we’ll believe that when we see it.

Powerful and direct.

You may not agree, of course. Let me offer several observations sparked by Mr. Blodget’s humdinger of an essay.

First, Google can’t be caught. Microsoft must leapfrog Google. Google can fall on its own talons. Lawyers can ensnare it. But the GOOG is lumbering forward, not sitting on a park bench smelling the flowers. There are technologies and mathematical procedures that are newer than the ones Google uses. These have to be assembled and crafted into a system that leaves Google in second place. Incremental, tactical, and me-too moves won’t do it. IBM could not match Microsoft and Microsoft can’t match Google. Today, IBM has a consulting business model that blends hardware and software. It works, and Microsoft needs that type of business innovation first, then the technical expertise to deliver.

Second, user behavior in searching is changing quickly. At the Enterprise Search Summit I showed a screen dump from a “test” Google service disclosed in a little-known patent application. A Googler–fancy Dan Hollywood-type too–told me I made up the screen shot. Wrong.

Google’s whiz kids in engineering are pushing far beyond search results and keeping the secrets from sales types and most of its competitors. (IBM insists it knows what Google is doing, so IBM has decided it won’t be caught in a rewind of the Microsoft MS-DOS fiasco with the GOOG.) So, Google is moving forward more quickly than most people know or realize. Let me tell you. The output shows aliases of a person along with a hot link to the individual’s location and image. Pretty tasty stuff to a researcher or a person with an inquisitive mind.

Third, search is a problem. The news is that the Fast Search & Transfer group has become a new project for Norwegian law enforcement professionals. If you read Norwegian, you can get the details here. Keep in mind that this news story may be addled, and an overworked police force might be too busy with other work to fiddle with engineers who can’t add. Fast Search’s Web indexing and search system is online and available here. If the Fast Search problem moves from the backburner of Norwegian gossip to the microwave of a legal action, more stormy seas may lie ahead. The quick action to swap Live.com search out and plug in the Fast Search technology could go nowhere. A delay allows the GOOG to widen its Grand Canyon scale lead over Yahoo and the lagging, third place Microsoft. You can pay advertisers, retailers, and users. Heck, pay me. But folks are voting with their clicks, and the results are in.

To wrap up, the Henry Blodget fellow has earned an invitation to the Beyond Search goose nerve center. We’ll even cook up a batch of burgoo and not require him to bring a couple of dead squirrels or whatever critters are in the Silicon Alley ecosystem.

Stephen Arnold, May 28, 2008

Gates: Reports of Retirement Greatly Exaggerated

May 28, 2008

Dan Farber, a sharp fellow whose articles I track, has a good discussion here of Microsoft’s Bill Gates’s decision to spend 20 percent of his retirement time working on Microsoft. Mr. Farber brought a smile to my face when he wrote:

Gates, who will remain chairman of Microsoft, said he will spend two to three days at Microsoft, where he will have an office, and two to three times that amount of time writing, thinking and working on a variety of pet projects, including the next generation Microsoft Office, natural interfaces (such voice and handwriting) and search.

Now that’s good writing!

The point that jumped out at me–aside from the two or three days at Microsoft translating to 20 percent of Mr. Gates’s time–is the direct reference to search. Man, that’s a subject that Microsoft cannot put to rest. When I read Mr. Farber’s story and scanned some of the other comments about the news, several thoughts went through my mind. Keep in mind that Mr. Farber helped me get in the mood to think these thoughts; he’s not party to my thoughts:

  1. Maybe search is the reason that Mr. Gates can’t retire. The Fast Search & Transfer acquisition, the dismal performance relative to Google of MSN and Live.com search, and the land-office business companies selling an alternative to SharePoint search put a hitch in the get-along.
  2. Could this “20 percent” be the first signal that the leadership of Mr. Ballmer does not have the desired batting average this year? Not only is there the turmoil in search, Microsoft has ankle-biting Europeans grousing about certain business practices. There’s the wacky Yahoo deal. There’s the Vista excitement.
  3. Ah, the Google. Wall Street sees the GOOG as an ad company that does something called search. Somewhat more flexible thinkers see Google as a threat to the Microsoft business model. If true, Microsoft can deliver shareholder value when broken into three or more parts. An $85 billion company that is a value stock with a management team unable to respond to a bunch of mathematicians in Mountain View, California, is not going to deliver the much-needed cash investment bankers and shareholders demand.

You can find a very good run down of other views of this “development” on the path to the retirement home on Techmeme and Megite. Every day Microsoft takes an action that helps keep me smiling inwardly. I called this situation in my 2005 study The Google Legacy. It does feel good to know that the work I did in 2003 and 2004 was dead on. No kicking back ahead for Mr. Gates. I also believe life will become more exciting for the Microsoft senior management team. I can hear everyone saying, “Great news. Bill’s back.”

Stephen Arnold, May 28, 2008

Good Enough Means Trouble for Commercial Database Publishers

May 28, 2008

I began work on my new Google monograph. (I’m loath to reveal any details because I just yesterday started work on this project.) I will be looking at Google’s data management inventions in an attempt to understand how Google is increasing its lead over search rivals like Microsoft and Yahoo while edging ever closer to providing data services to organizations choking on their digital information.

As part of that research, I came across several open source patent documents that explain how Google uses the outputs of several different models to determine a particular value. Last week a Googler saw my presentation which featured a Google illustrative output from a patent application and, in a Googley way, accused me of creating the graphic in Photoshop.

Sorry, chipper Googler, open source means that you can find this document yourself in Google if you know how to search. Google’s system is pretty useful for finding out information about Google even if Googlers don’t know how to use their own search system.

How does Google make it possible for my 86-year-old father to find information about the town in Brazil where we used to live and allow me to surface some of Google’s most closely-guarded secrets? These are questions worth considering. Most people focus on ad revenues and call it a day. Google’s a pretty slick operation, and ads are just part of the secret sauce’s ingredients.

Running Scenarios

In my experience, it’s far more common for a team to use a single model and then run a range of scenarios. The high and low scenario outputs are discarded, and the results are averaged. While not perfect, the approach yields a value which can be used as is or refined as more data become available to the system. Google’s twist is that different models generate an answer.
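The single-model scenario approach described above (discard the high and low outputs, average the rest) is easy to sketch. This is a minimal illustration with made-up numbers; the function name trimmed_scenario_average is hypothetical.

```python
def trimmed_scenario_average(outputs):
    """Drop the single highest and lowest scenario outputs, average the rest."""
    if len(outputs) < 3:
        raise ValueError("need at least three scenario outputs")
    ordered = sorted(outputs)
    kept = ordered[1:-1]  # discard the low and the high scenario
    return sum(kept) / len(kept)


# Made-up outputs from five runs of one model under different assumptions.
scenarios = [120.0, 95.0, 101.0, 99.0, 40.0]
estimate = trimmed_scenario_average(scenarios)
print(round(estimate, 2))
```

With these inputs the 40.0 and 120.0 runs are thrown away and the estimate is the mean of 95.0, 99.0, and 101.0.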

Incremental improvements pay off over time

This diagram shows how Google’s policy of incremental “learnings” allows one or more algorithms to become more intelligent over time.

The output of each model is mathematically combined with other models’ outputs. As I read the Google engineers’ explanations, it appears that using multiple models generates “good enough” results, and it is possible, according to the patent document I am now analyzing, to replace models whose data are out of bounds.
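To make the multi-model idea concrete, here is a minimal sketch under stated assumptions. The patent document's actual combination method is not spelled out here, so a weighted average stands in for “mathematically combined”, and a simple range check stands in for “out of bounds”. The function name, model names, weights, and bounds are all hypothetical.

```python
def combine_models(outputs, weights, lower, upper):
    """Weighted average of in-bounds model outputs.

    outputs and weights are dicts keyed by model name. Models whose output
    falls outside [lower, upper] are excluded and reported so they can be
    replaced. The weighted average is an assumption, not Google's method.
    """
    in_bounds = {name: value for name, value in outputs.items()
                 if lower <= value <= upper}
    rejected = sorted(set(outputs) - set(in_bounds))
    if not in_bounds:
        raise ValueError("every model is out of bounds")
    total_weight = sum(weights[name] for name in in_bounds)
    combined = sum(weights[name] * value
                   for name, value in in_bounds.items()) / total_weight
    return combined, rejected


# Made-up example: three models estimate a value expected to lie in [0, 1].
outputs = {"model_a": 0.62, "model_b": 0.58, "model_c": 7.5}
weights = {"model_a": 2.0, "model_b": 1.0, "model_c": 1.0}

value, flagged = combine_models(outputs, weights, 0.0, 1.0)
print(round(value, 4), flagged)
```

Here model_c is flagged as a candidate for replacement, and the combined value comes from the two remaining models only.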

Kevin Turner, Microsoft’s COO: Search Is Transformative

May 28, 2008

Eileen Yu, “Microsoft COO: Standing Firm on Vista”, May 27, 2008, offers a very insightful interview with Kevin Turner, a former Wal*Mart executive, now Microsoft’s chief operating officer. Please, navigate to that link and read the full interview. Don’t delay because some ZDNet postings, particularly those on ZD sites outside the US, make it tough to locate specific stories.

The key question for me that Ms. Yu asked was:

During a conference in July 2006, you said: “Enterprise search is our business, it’s our house and Google is not going to take that business.” It’s almost two years since then — is this still the core market you see Microsoft and Google compete in? And where do you think Google stands now in this market?

Here’s Mr. Turner’s answer:

The IT industry has historically been defined by a series of periodic, transformative shifts in the way people think about computing, and we’re in the midst of another: a services transformation. With any of these transformative shifts, the prediction is that change will happen swiftly and completely. But the reality is quite the opposite. Customers want choice, and that is the foundation of our software-plus-services strategy. We are giving customers the choice of what they want to do with their data and infrastructure. If they want to run it on-site, we will support that. If customers want a partner to host a solution, we will support that as well. Finally, if a customer wants Microsoft to host a solution, as many have asked us to do, we can do that. Again, it is about giving customers a choice, which is a very different strategy than our competitors.

After thinking about this transformative comment and the fact that Google is now a decade old, I am curious about the many “flavors” of search at Microsoft. With Fast Search & Transfer providing the AllTheWeb.com index, I wonder if Live.com’s service will be enhanced with Fast’s Web search technology. The “hard” boundary in SharePoint is generating money for Certified Gold Partners’ snap-in enterprise search technology. If anything, the vendors supporting SharePoint are experiencing strong interest in their products. On Monday, May 26, I tried to locate an email in an Outlook Express archive. The search system wasn’t bad; it just couldn’t find the email. The solution was easy. I had someone go to my office, use the ISYS Search Software I used on that machine, and email me the document.

I’m looking forward to a Wal*Mart executive’s touch. But come to think about it: I have a tough time locating what I need at my local Wal*Mart. There’s no directory in my Sam’s Club or “super” Wal*Mart. My recollection is that my local greeter is chipper but clueless about where products are. The rare Wal*Mart employee, when I can attract one’s attention, knows a product category or two. The rest require the Wal*Mart professional to wander up and down aisles. Not quite as much of a hassle as sending someone to my office on a holiday but close enough for horseshoes.

Stephen Arnold, May 28, 2008

Attivio: Moving Beyond Search Aggressively

May 27, 2008

An exclusive interview with Attivio’s founder Ali Riaz is now available in ArnoldIT.com’s Search Wizards Speak series.

Mr. Riaz revealed that traditional search engines add some extra work for the user. He told ArnoldIT.com:

Legacy search thinking suggests that the search box is the answer for everything. At the simplest level, people look for things in two ways: “I know what I’m looking for, just tell me where it is” (the search box), and “I don’t know what I’m looking for specifically, but I know how to describe it so let me navigate and explore; let me ‘disambiguate’ my way to the answer.” We think the latter approach is the more powerful of the two and what we’re hearing from prospects, especially in the enterprise. In many ways, you can think of the search box as the user interface of last resort. It has its place, but search is best used when it is woven throughout an application’s capabilities rather than offered as an isolated, off-to-the-side interface on its own. Search needs to feel natural to be truly effective. Of course, when all else fails, then type something in a search box, but a good search engine prevents this happening wherever it can.

With dissatisfaction emerging as one of the themes of the Enterprise Search Summit in New York City in May 2008, Mr. Riaz told ArnoldIT.com:

… Internal information technology units are overwhelmed. We believe they are overwhelmed by complexity and incompatibility. Today’s legacy search engines are simply too complex to manage, and needlessly, I might add. As well, their pricing models in most cases are a disincentive to grow the technology throughout the organization because they commonly charge per usage, either index size – number of documents or disk space – or query capacity, or per user. We offer a much simpler, more one-size-fits-all model. By the way, we also federate across to the legacy search systems so you don’t have to rip and replace. This is a request from a number of our prospects.

Incompatibility exists most prevalently between the two basic information silos that exist in every organization: the business intelligence-data warehousing stack for structured data and the search-content management system stack for unstructured content. Our long term strategy is to merge these two.

You can read the complete interview here. More information about Attivio is here.

Stephen Arnold, May 27, 2008

Lawyers: Mixed Opinions about Law and Online Giants

May 26, 2008

I’m no attorney (thank goodness). I don’t understand lawyers, lawmakers, or the pundits who explain what legal eagles do, don’t do, and won’t do.

Two news items caught my attention. Despite my feeling lousy, I decided to urge you to read both. The CNet story explains that Viacom is suing Google for one billion dollars. Google argues that it complies with applicable copyright laws. The old media company and the new media company are going to meet in court. (Someone told me that 95 percent of litigation is resolved before going to trial.) You can read this clear write up here.

The other story is from ZDNet Australia about a judge in that country who opines that “Google, Yahoo Make Lawmakers Impotent”. You can read it here.

The Australian judge offers the opinion that technology is changing too fast for the courts. Technology allows some companies to “beat the legal system”.

Lawyers, based on my limited experience, are not good technologists. My sample is small, but the attorneys whom I have known also have trouble with math. Suggest that a Riemann zeta function is a useful procedure, and I get a nervous chuckle. The sidekicks of blind justice were not sure if I was kidding or serious.

In my first Google study “The Google Legacy” and in my second “Google Version 2.0” I argued that lawyers could kill Google. Both are available from Infonortics Ltd. in Tetbury, Glou.

I’m not sure if “kill” is the correct word. A legal process can suck money, management attention, and public perception at prodigious rates. A sufficiently bad run of luck in the courts could slap a weight jacket on the GOOG.

On the other hand, lawyers, if the Australian judge’s observation is somewhat accurate, might be their own bear trap. A lawyer trying to explain how algorithms and teenagers undermine a traditional media giant could confuse matters in an interesting way.

My view is that technology is not just outpacing the legal system. Technology is in the process of redefining some of the principles that are codified in many countries’ laws. The problem is analogous to the wrenching of the Roman legal system before Julius Caesar and the wild and crazy mess that followed his brief term in office. Roman law never adapted. One might point out that Italy’s present legal system is still pretty wacky. Nevertheless, the Italian technologists in Modena, Bologna, and Rome seem to be innovating without much friction from Italian courts.

Yahoo could be taken out by the courts. The company is in “transition”. Google, on the other hand, may have the resources to deal with lawyers who want to put technology in its place, snap a shock collar on Google, and keep the clueless traditional giants paying those fat, fat fees. In law, attorneys’ math is good enough to get those bills in the mail.

Stephen Arnold, May 27, 2008
