Search: Habits vs Environments

June 2, 2008

In 1980, when you logged on to the Dialog Information Service, the system dumped you into a database about education. From that starting point, you entered a file number. Experienced searchers memorized file numbers: type b 15 and you would be “in” the ABI/INFORM business information file; type b 16 and you could search PROMT, a collection of business data. Dialog never saw bulletin board systems or the Internet coming.

People fortunate enough to have the money and technical savvy could become online searchers. The technology was sufficiently clumsy and the entire process so unfamiliar to most people as to make online searching an arcane art. Searching in those early days was handled by an intermediary. When I first learned about online databases at Booz, Allen & Hamilton in 1976, the intermediary was the New York office’s boss. I would call the intermediary, explain what I needed, provide a project number, and pick up the outputs on weird thermal paper later that day. As clumsy and expensive as the process was, it was more efficient than doing research with paper journals, printed books, and the horrific microfilm.

By 1983, Dialog had found a market for its mainframe-based search system: librarians. Librarians had two characteristics that MBAs, lawyers, and folks trained in brochure making lacked. First, librarians chose a discipline that required an ability to think in categories. Librarians also understood the importance of having a standard way to identify authors, titles, and subjects.

Second, librarians had a budget to meet the needs of people described as “end users”. Some of my Booz, Allen colleagues would rush into our corporate library and demand, “Give me everything on ECCS!”

System Development Corporation (SDC Orbit), BRS (Bibliographic Retrieval Service), DataStar, and the handful of other online vendors monetized their systems in clever ways. First, a company paid to sign up and receive a password. Second, the company sent its librarian to training programs. Most programs were free and taught tips and tricks to tame the naked command line. No graphical user interface.

You had to memorize command strings like this one: SS UD=9999 AND CC=76?. The system then spat out the most recent records about marketing. The key point is not the complexity. The point is that you had to form specific habits to make the system work. Make an error, and the system would deliver nothing useful. Search and retrieval was part puzzle, part programming, and part memorization. At the time, I believed that these habits would be difficult to break. I think the vendors saw their users as hooked on online the way a lifelong smoker is hooked on nicotine.
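
To give the flavor, here is a stylized fragment of a session built from the commands mentioned above. The t (type) command is reconstructed from memory, so treat the details as approximate.

```
b 15                     begin a session in file 15, ABI/INFORM
ss UD=9999 AND CC=76?    select the newest records carrying marketing category codes
t 1/5/1-10               type records 1 through 10 from set 1 in full format
```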

The vendors were wrong. The “habit” was not a habit. The systems were confining, hellishly expensive, and so complicated that change was hard for the vendors. Change for the people who knew how to search was easy. The automatic behavior that worked so well in 1980 began to erode when PCs became available. When the first browser arrived, the old solid-gold revenue streams started to slip. The intermediaries who controlled online were disintermediated. The stage was set for the Internet, lowest-common-denominator searching, and graphical interfaces.

The Internet offered useful information for free. I have dozens of examples of online budgets slashed or eliminated because neither the vendor nor the information professional could explain the value of online information. A visible, direct cost with no proof of payback crippled the original online industry. Many of the companies hang on today, but these firms are running a debilitating race. Weaker companies in the commercial database business will find survival more and more difficult.

The notion of online habits persists. There’s a view that once a user has learned one way to perform an online or digital task, it’s game over for competitors. That’s not true. New customer constituencies come into being, and the people skilled in complex, specialized systems can carve out a niche. But hockey-stick growth and fat margins are increasingly unlikely for traditional information companies.

Read more

Google Tells Everyone: We Are Human

June 2, 2008

Techmeme has a link to the New York Times’ story “The Human Hands behind the Google Money Machine”. There’s also a link to useful commentary by Henry Blodget of Silicon Valley Insider. By the time you read this, the comments and analyses of Google’s summer openness will be one of the day’s key stories.

Last week brought interviews and postings about the Google I/O conference for developers. The best summaries I’ve seen are by CNet’s Stephen Shankland: “We’re All Guinea Pigs in Google’s Search Experiment” and “Google Spotlights Data Center Inner Workings.” Anand Rajaraman provided a technically significant scoop about Google’s reluctance to rely exclusively on autonomous software. My post is here. The Datawocky piece is here. (I’ve heard that some Googlers call the Google infrastructure “the borg”.)

The flow of information is useful. As I thought about the stream of information, I forced myself to step back and ask, “Why now?” Google has never been particularly forthcoming, and its public-facing representatives “run the game plan”. If you haven’t heard that phrase, it means, “Stick to the script.” At conferences, I’ve watched Googlers thrust into a presentation at the last minute struggle through the script.

Here are my thoughts about this new direction:

  1. The Google sees an opportunity to position itself as a thoughtful leader. The emphasis on people shifts the discussion from monitoring clicks and algorithms to people who think about the implications of technology and market needs.
  2. The messages focus on what Google is doing. The examples say to me, “Hey, guys, we’re doing these things now.” For a competitor, the positioning of activities as actions based on what’s in place may be chilling. It raises the question, “What’s next?”
  3. Google is maturing, and its management is confident that messages for users, developers, advertisers, and competitors will increase Google’s presence in the market.

What do you think is behind this new transparency? It’s visible in Eric Schmidt’s remarks about mobile advertising, reported by Seeking Alpha, and in his earlier comment in the U.K. Telegraph that Google’s founders have grown up. You can read this story here and enjoy its now-obligatory picture of Messrs. Brin and Page lounging on some of Google’s signature fluffy furniture.

My take is that Google’s management is not behaving in a spontaneous manner. Just as a series of steps makes an algorithm work, this flood of information has my radar oscilloscope flickering. I think the mathematical logic so prized at Google is at work. I’m watching for signs of a big event in the Googlesphere. Semantic Web? Data management? Major buyout close to completion? Maybe.

Controlled transparency is a signal, not an end in itself.

Stephen Arnold, June 2, 2008

IBM: Watching Cloud Patterns

June 1, 2008

Last week, IBM announced a cloud-based, software-as-a-service initiative. IBM has partners in this venture, which appears to focus on the insurance niche. The announcement appeared in a news release, and you can read it here.

IBM has teamed with Millbrook, Inc., whose core business is software integration for the insurance industry. Another party to the deal is Sapiens America Corp. Sapiens (whose corporate family tree is pretty complicated) is another specialist, with a core competency in property and casualty insurance.

IBM will use its Cognos 8 Business Intelligence system and the Sapiens Insight software. Both systems will make use of the Millbrook property and casualty model.

The idea is that small- and mid-sized insurance agencies will have access to industrial-strength business intelligence systems without any on-premises software. The three companies said in their release:

Business intelligence and predictive analytics tools are becoming the strategic mainstay of how service enterprises in general, and insurance carriers in particular, conduct their daily business. Companies that have near real-time ability to analyze the entirety of their captured business data and extract key performance indicators and accurate answers to “what-if” scenarios can be more responsive to a rapidly evolving business environment and can competitively maximize profitable operations while moving away from risky propositions.

The announcement struck me as a significant step for IBM. IBM has been a player in online and cloud-based services for quite a while. In the late 1990s, the IBM Global Network ramped up as an Internet service provider; IBM eventually sold that business to AT&T. IBM made some noise several years ago about its grid computing capability. Its alphaWorks initiative has pushed cloud computing as well. Now IBM is testing the water for niche-focused SaaS, or software as a service. IBM and its new pal Google are working cooperatively on an educational project to stimulate the flow of programmers with expertise in writing programs for distributed systems.

My thought is that this SaaS initiative warrants observation. On paper and in whiteboard “what if” sessions, IBM could deploy a number of its software systems as cloud-based services. The question is, “What’s next in online services for IBM?” Will IBM, like Google, sit on the sidelines and watch Amazon.com, Salesforce.com, and other companies push this market forward?

Stephen Arnold, June 1, 2008

Related story from InfoWorld here.

IN-Q-TEL Investments: 2006 to April 2008

June 1, 2008

This table brings the summary of IN-Q-TEL investments through April 2008. You can access the investments from 2000 to 2003 here. The investments from 2004 and 2005 are here.

Read more

IN-Q-TEL Investments: 2004-2005

May 31, 2008

I’m delighted with the response to my table and links covering IN-Q-TEL’s investments from 2000 to 2003. If you want to review this information, click here. In this essay, I want to provide the list of companies receiving funding in the two-year period from 2004 to 2005. As one of the people reviewing my list pointed out, there are some companies associated with IN-Q-TEL that do not appear in my table. My source is the publicly accessible information on the IN-Q-TEL Web site. If you know of an investment that I have omitted, please, use the comments section of this Web log to share your information. I appreciate the numerous suggestions to make the list more useful. There is a limit to what we have time to assemble for a no-cost information resource. Please, tell me what you think would improve the utility of the list. If it’s lightweight, then I will consider altering the basic information in the table. The table appears after the jump.

Read more

Fast Financials: Three-Day-Old Fish Should Be Discounted

May 30, 2008

You may want to download the revised financials that became available today (May 30, 2008) on the Fast Search & Transfer Web site here. Information that I recall seeing on various Web sites is either no longer available, or I lack the skills to locate the data. Mary Jo Foley, in her All about Microsoft Web log, wrote a useful description of the implications of the deal when it was first announced. You can read this story here.

Some Fast Search corporate and general business information has been deleted because it was old or because it was deemed no longer of interest. Fortunately, I have a habit of downloading interesting documents when I first see them. Fast Search information is tough to locate using public Web sites for some reason. You can get these PDF documents directly from http://www.newsweb.no/index.jsp?messageId=209172. The Fast Search Web site’s pointer to the documents is here: http://www.fastsearch.com/news.aspx?m=329. Note: I am reluctant to post these documents because I am not certain of the Norwegian guidelines for this type of information.

A screen shot of the restated FY2007 data. I used this information plus the data in the FY2006 restated financials to make the table of numbers below.

A Walk Through

Fast Search’s top line revenues for the period from 2004 to 2007 are now reported as increasing from $66.4 million in 2004 to $143.0 million in 2007. That’s a jump of 115.4 percent. In the search engine game, the increase is good, but it does not match Google’s performance with its Google Search Appliance in the same period. Google went from zero revenue in 2004 to an estimated $400 million in 2007. (Note: Google reported $188 million for its enterprise unit, but I have added in monies from its educational initiative, maps, and partner contributions in the form of sign-up fees, among other enterprise revenue flows.) A quick check of the arithmetic appears after the table.

Year   Revenue (Restated)   Original Revenue Statement
2004   $66,374,000          $66,300,000
2005   $98,069,000          $100,300,000
2006   $133,741,000         $162,200,000
2007   $142,979,000         n.a.
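
For those who like to verify the numbers, here is a small Python sketch that checks the growth percentage and the size of each restatement. The dollar figures come from the table; everything else is my arithmetic.

```python
# A quick check of the arithmetic in the restated figures.
# Dollar amounts come from the table above; the calculations are mine.

restated = {2004: 66_374_000, 2005: 98_069_000, 2006: 133_741_000, 2007: 142_979_000}
original = {2004: 66_300_000, 2005: 100_300_000, 2006: 162_200_000}

# Growth from 2004 to 2007 on the restated numbers.
growth = (restated[2007] - restated[2004]) / restated[2004] * 100
print(f"2004-2007 restated growth: {growth:.1f}%")  # -> 115.4%

# How much each year's restatement moved revenue versus the original filing.
for year, value in original.items():
    delta = restated[year] - value
    print(f"{year}: restatement changed revenue by ${delta:,}")
# FY2006 is the big one: revenue drops by roughly $28.5 million.
```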

Nothing too dramatic in this run-down except the sharp decrease in the FY2006 numbers. But what’s $30 million in today’s loosey-goosey financial world? However, when I looked at the Fast Search restatements alongside the revenue figures, I found the losses interesting.

A Warning Signal from Fast Search

I have a copy of the Fast Search & Transfer Mid Quarter Presentation by Joseph J. Lacson, dated December 2006. That document has some optimistic comments about Fast Search’s opportunities. The presentation is no longer available on the Fast Search Web site, but I have made a couple of screen shots from the presentation to give you a sense of what caught my attention. (Since the document is no longer available on the Web, you may want to skip my discussion of this information. I wish I could provide a link to the full document, but I don’t have permission to do that. I wrote Fast Search’s PR department, but I haven’t heard anything from them.)

Read more

Knewco: Community Tags

May 29, 2008

Peter Suber offers a clear, detailed post about a new approach to community tags. You can read his post “Combining OA, Wikis, Community Annotation, Semantic Processing, and Text Mining” here. Mr. Suber includes a link to a discussion of the idea in Genome Biology here.

What’s interesting to me is the specialist nature of the effort. Although anyone can tag, the focus is STM (scientific, technical, and medical). The idea is to create rich indexing for technical information. I think this is a good idea. I think there will be challenges because a small number of people do most of the work. Nevertheless, these types of projects are sorely needed.

The company responsible for the technology is Knewco, founded by several academics. You can learn more about the firm here. Knewco has developed some tag options that are interesting. I think the value will come from POW or “plain old words”.

Why do I care about this, and what does the wiki variant have to do with search? Well, a lot. First, technical information has long been in the hands of a small number of multinational firms. If you want to search engineering or chemical information, you have to use specialist files and sometimes pay big, big online access charges. This type of project is one more example of the research community feeling its oats. Good for researchers and potentially threatening to the oligopolies in the STM information business.

Second, I like the idea that information innovation is coming from thinkers outside the traditional IR (information retrieval) community. When I go to conferences, there are 20-somethings who seize the opportunity to lecture me on their major insight: “use for” references. Okay, been there. Done that. Fresh thinking is important, and I am delighted that Knewco is trying pop-ups, colors, and other bells and whistles that may point to some new directions in tagging.
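
For readers who never sat through a cataloging class, here is a minimal Python sketch of what a “use for” reference does. The vocabulary is invented for illustration and has nothing to do with Knewco’s actual knowlet technology.

```python
# A minimal sketch of "use for" cross references in a controlled vocabulary.
# Each preferred indexing term lists the variants it is "used for"; a lookup
# on any variant resolves to the preferred term. Terms are invented examples.

use_for = {
    "myocardial infarction": ["heart attack", "MI", "cardiac infarction"],
    "neoplasms": ["tumors", "cancers"],
}

# Invert the thesaurus so each variant points to its preferred term.
variant_to_preferred = {
    variant.lower(): preferred
    for preferred, variants in use_for.items()
    for variant in variants
}

def resolve(term: str) -> str:
    """Return the preferred indexing term for a query term."""
    return variant_to_preferred.get(term.lower(), term.lower())

print(resolve("Heart Attack"))  # -> myocardial infarction
print(resolve("knowlets"))      # unknown terms pass through unchanged
```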

Finally, the larger the body of publicly accessible tags, the better the next-generation systems will be. Google, as I point out in my new study due out in September 2008, is focused on making its software smarter. Humans play a role, but the GOOG knows the value of indexing, taxonomies, tags, and their brethren.

On the downside, I don’t like the company name “Knewco”. In fact, Knewco uses coinages for its different functions; for example, a “knowlet”. I hate having to memorize a neologism for something I call a cross reference. But that’s a personal preference. Check the company’s Web technology here.

Stephen Arnold, May 29, 2008

Good Enough Means Trouble for Commercial Database Publishers

May 28, 2008

I began work on my new Google monograph. (I’m loath to reveal any details because I started work on this project just yesterday.) I will be looking at Google’s data management inventions in an attempt to understand how Google is increasing its lead over search rivals like Microsoft and Yahoo while edging ever closer to providing data services to organizations choking on their digital information.

As part of that research, I came across several open source patent documents that explain how Google uses the outputs of several different models to determine a particular value. Last week a Googler saw my presentation, which featured an illustrative output from a Google patent application, and, in a Googley way, accused me of creating the graphic in Photoshop.

Sorry, chipper Googler, open source means that you can find this document yourself in Google if you know how to search. Google’s system is pretty useful for finding out information about Google even if Googlers don’t know how to use their own search system.

How does Google make it possible for my 86-year-old father to find information about the town in Brazil where we used to live and allow me to surface some of Google’s most closely-guarded secrets? These are questions worth considering. Most people focus on ad revenues and call it a day. Google’s a pretty slick operation, and ads are just part of the secret sauce’s ingredients.

Running Scenarios

In my experience, it’s far more common for a team to use a single model and then run a range of scenarios. The high and low scenario outputs are discarded, and the remaining results are averaged. While not perfect, the approach yields a value which can be used as is or refined as more data become available to the system. Google’s twist is that different models generate an answer.
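
Here is a minimal Python sketch of that single-model, many-scenarios approach; the toy model and the scenario values are my own invention for illustration.

```python
# A minimal sketch of the single-model approach: run one model across a
# range of scenarios, throw away the high and low outputs, and average
# the rest. The toy model and scenario values are invented.

def scenario_estimate(model, scenarios):
    """Run one model over many scenarios; drop the extremes; average the rest."""
    outputs = sorted(model(s) for s in scenarios)
    trimmed = outputs[1:-1]  # discard the single highest and lowest outputs
    return sum(trimmed) / len(trimmed)

# Example: a toy revenue model driven by a growth-rate scenario.
toy_model = lambda growth_rate: 100.0 * (1.0 + growth_rate)
print(scenario_estimate(toy_model, [0.02, 0.05, 0.08, 0.11, 0.20]))  # -> 108.0
```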

Incremental improvements pay off over time

This diagram shows how Google’s policy of incremental “learnings” allows one or more algorithms to become more intelligent over time.

The outputs of each model are mathematically combined with the other models’ outputs. As I read the Google engineers’ explanations, it appears that using multiple models generates “good enough” results, and it is possible, according to the patent document I am now analyzing, to replace models whose data are out of bounds.
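
As a contrast with the scenario sketch above, here is a sketch of the multi-model idea as I read it: several models produce answers, an out-of-bounds model is swapped out, and the outputs are combined. The bounds, the fallback, and the simple averaging are my assumptions, not the patent’s actual math.

```python
# A sketch of combining several models and replacing any model whose output
# falls out of bounds. Bounds, fallback, and averaging are my assumptions.

def combined_estimate(models, x, low, high, fallback):
    """Average several models' outputs, substituting a fallback model
    for any model whose output lands outside [low, high]."""
    outputs = []
    for model in models:
        y = model(x)
        if not (low <= y <= high):  # this model's data are out of bounds
            y = fallback(x)
        outputs.append(y)
    return sum(outputs) / len(outputs)

# Three toy models; the third misbehaves and gets replaced.
models = [lambda x: 1.1 * x, lambda x: 0.9 * x, lambda x: 50.0 * x]
print(combined_estimate(models, 10.0, low=5.0, high=15.0, fallback=lambda x: x))  # -> 10.0
```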

Read more

Lawyers: Mixed Opinions about Law and Online Giants

May 26, 2008

I’m no attorney (thank goodness). I don’t understand lawyers, lawmakers, or the pundits who explain what legal eagles do, don’t do, and won’t do.

Two news items caught my attention. Despite my feeling lousy, I decided to urge you to read both. The CNet story explains that Viacom is suing Google for one billion dollars. Google argues that it complies with applicable copyright laws. The old media company and the new media company are going to meet in court. (Someone told me that 95 percent of litigation is resolved before going to trial.) You can read this clear write up here.

The other story is from ZDNet Australia about a judge in that country whose opinion prompted the headline “Google, Yahoo Make Lawmakers Impotent”. You can read it here.

The Australian judge offers the opinion that technology is changing too fast for the courts. Technology allows some companies to “beat the legal system”.

Lawyers, based on my limited experience, are not good technologists. My sample is small, but the attorneys whom I have known also have trouble with math. I once suggested that a Riemann zeta function was a useful procedure, and I got a nervous chuckle. The sidekicks of blind justice were not sure whether I was kidding or serious.

In my first Google study, “The Google Legacy”, and in my second, “Google Version 2.0”, I argued that lawyers could kill Google. Both are available from Infonortics Ltd. in Tetbury, Glos.

I’m not sure if “kill” is the correct word. A legal process can suck money, management attention, and public perception at prodigious rates. A sufficiently bad run of luck in the courts could slap a weighted jacket on the GOOG.

On the other hand, if the Australian judge’s observation is somewhat accurate, lawyers might be caught in their own bear trap. A lawyer trying to explain how algorithms and teenagers undermine a traditional media giant could confuse matters in an interesting way.

My view is that technology is not just outpacing the legal system. Technology is in the process of redefining some of the principles that are codified in many countries’ laws. The problem is analogous to the wrenching of the Roman legal system before Julius Caesar and the wild and crazy mess that followed his brief term in office. Roman law never adapted. One might point out that Italy’s present legal system is still pretty wacky. Nevertheless, the Italian technologists in Modena, Bologna, and Rome seem to be innovating without much friction from Italian courts.

Yahoo could be taken out by the courts. The company is in “transition”. Google, on the other hand, may have the resources to deal with lawyers who want to put technology in its place, snap a shock collar on Google, and keep the clueless traditional giants paying those fat, fat fees. In law, attorneys’ math is good enough to get those bills in the mail.

Stephen Arnold, May 27, 2008

Government High-Tech Investments: IN-Q-TEL

May 26, 2008

I received an email from a colleague new to the Federal sector. Her email included comments and links about US government funding of high-technology companies. I was surprised because I assumed that most people knew of the IN-Q-TEL organization. As US government URLs go, IN-Q-TEL’s will baffle some people. First, the hyphens throw off some folks. Then there is the group’s use of the Dot Org domain.

In a nutshell, IN-Q-TEL makes clear what it does and why:

IN-Q-TEL identifies, adapts, and delivers innovative technology solutions to support the missions of the Central Intelligence Agency and the broader US intelligence community.

I’m not interested in whether IN-Q-TEL is doing a great job or a lousy job. I’m not concerned about its mission, its funding, or its management team.

What I find fascinating is the organization’s choice of companies in which to invest. I don’t know the budget range of IN-Q-TEL, but my sources tell me that the investments stick close to $1 million, sometimes more, sometimes less. You can read more about IN-Q-TEL at these links:

  • The Wikipedia entry, and I am not vouching for the accuracy of this entry
  • The CIA’s own description here
  • KMWorld’s write up here. (I am a paid columnist for KMWorld, but I did not contribute to this story.)

The purpose of this feature is to provide a snapshot of the companies in which IN-Q-TEL has invested. I’ve identified more than 70 companies. This is too many to put in one posting, so I will break up the list, covering the period 2000 to 2003 here and each subsequent year in additional Beyond Search postings.

In the period from 2000 to 2003, IN-Q-TEL invested in 25 companies. Keep in mind that I may have overlooked some in my research. If you know of a company I missed, please, use the comment section of this Web log to update my information. These appear in the table below:

Read more
