Commercial Online at Crunch Time

August 17, 2009

Einstein would be confused about the meaning of “time” in the search and content processing sector.

In the early days of online, commercial database producers controlled information that was accessible online. The impetus for electronic information was the US government. Some of the giants of the early online world were beneficiaries of government contracts and other government support for the technology that promised to make information findable.

I recall hearing when my father worked in Washington, DC in the 1950s that there was “government time.” The idea, as I recall, was that when a government entity issued a contract or provided support, the timelines in that deal stated start and stop dates, but not how fast the work had to be completed. I learned when but a youth that “government time” could be worked so that the contract could be extended. As a result, government time had a notional dimension known to insiders. Outsiders would have another view of time.

Source: http://focus.aps.org/files/focus/v23/st18/time_tunnel_big.jpg

When the first commercial online systems became available, time gained another nuance. Added to the idea of “government time” was the idea that computing infrastructure required time to process information. Programmers needed time to write code and debug programs. Systems engineers needed time to figure out how to expand a system. More time was needed to procure the equipment, and still more to get hardware like DASDs (direct access storage devices) delivered and online.

One word, “time,” was used to refer to these many different nuances and notions of time. Again, the outsider was essentially clueless when it came to understanding the meaning of “time” when applied to any activity related to electronic information.

Fast forward to 1993 and the availability of the graphical browser, which made the Internet usable to average folks. The idea that a click could display a page in front of the user in very little time was compelling. The user received information quickly and formed an impression that the time required to access information via the Internet was different from the time required to schlep to a library to get information. Time became distorted with another load of meaning: work processes.

Now think about the meaning of “time” today. Vendors are no longer content with describing a system as fast and responsive. The word “time” has been turbocharged with the addition of the adjectival phrase “real time”.

What is real time? What is real time search? If you think about the meaning of time itself in the online world, you may conclude as I have that when an online vendor says “time”, you don’t have a firm understanding of what the heck the vendor means. When a vendor says “real time” or “near real time”, we are further into the fog.

Let me identify some of the distortions that exist with the word “time” and online information. Keep in mind that this is a preliminary list and I may add, delete, or otherwise modify these notes at some point in the future:

Time in the sense of currency or freshness of information. The vast majority of online information is not fresh; that is, information may be a week, a month, a year, or a decade “old”. The information may have value, but it may be stale, even incorrect. Most content lacks a date-created identifier. The file system dates, often used by default by search and content processing systems, may not correlate with the date on which the document was created or updated. Most users are blissfully unaware of document “time”.
Time in the sense of index updates. Online systems typically do not update indexes the moment new information is processed by the system. Index updates are tricky in many search systems and are scheduled so that other operations are not impaired by the process. As a result, a Google News page or a NewsNow.co.uk page may be refreshed in X minutes, but the pages may contain information that is less new than information available elsewhere and from other services. Therefore, an index may seem new, current, and fresh, but that index may be hours, days, or weeks old.

Vendors use time so customers may not know how to locate the meaning in an information context. Image source: http://obligement.free.fr/images/effet_escher_convexconcave.jpg
Time in the sense of system performance. Most search systems use a number of methods (tricks) to deliver results to the user in a snappy manner. The more quickly a query is processed and results returned to the user, the happier the user. Fast response with lousy results can be more satisfying to a user than sluggish response with more relevant results. Users get impatient with the “time” required to get needed information. If perceived as too long, the user abandons the search or the service.
Time in the sense of getting “here and now” information in search results. The user expects a news feed or similar service to provide the items from a flow of content in the datasphere. In reality, RSS and other types of feeds have latency. This is the “time” required for the system to acquire, process, and distribute information. The delays can range from a minute or less to as long as several days.
Time in the sense of “real” real time. This meaning refers to low latency access to information as it happens. Examples include processing of financial news. This “real” real time can be expensive because traditional hardware and software cannot cope with the engineering required to chop “wait” out of the content and data processing steps. Example: Exegy’s real time content processing devices.
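
The distinctions in this list can be made concrete with a little date arithmetic. What follows is a minimal sketch in Python, not a description of how any vendor's system works. The DocumentTimes record, its field names, the helper functions, and the sample timestamps are all invented for illustration. The point is only that one document can carry several clocks, and the answer to “how fresh is this?” depends on which clock a system reads.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DocumentTimes:
    """Hypothetical record of the different clocks attached to one document."""
    created: Optional[datetime]  # date stated in the content; often missing
    file_modified: datetime      # file system date; the usual default
    indexed: datetime            # when the search system processed the document


def effective_date(doc: DocumentTimes) -> datetime:
    """Prefer the stated creation date; fall back to the file system date."""
    return doc.created if doc.created is not None else doc.file_modified


def apparent_age(doc: DocumentTimes, now: datetime) -> timedelta:
    """Age as seen by a system that trusts the file system date."""
    return now - doc.file_modified


def actual_age(doc: DocumentTimes, now: datetime) -> timedelta:
    """Age measured from the best available creation date."""
    return now - effective_date(doc)


def index_lag(doc: DocumentTimes) -> timedelta:
    """Delay between the document's effective date and its indexing."""
    return doc.indexed - effective_date(doc)


# Invented sample: a week-old document copied to a new server this morning.
now = datetime(2009, 8, 17, 12, 5, tzinfo=timezone.utc)
doc = DocumentTimes(
    created=datetime(2009, 8, 10, 9, 0, tzinfo=timezone.utc),
    file_modified=datetime(2009, 8, 17, 8, 0, tzinfo=timezone.utc),
    indexed=datetime(2009, 8, 17, 12, 0, tzinfo=timezone.utc),
)

print("apparent age:", apparent_age(doc, now))
print("actual age:  ", actual_age(doc, now))
print("index lag:   ", index_lag(doc))
```

In this made-up case, the file system date makes the document look about four hours old, while the stated creation date puts it at more than seven days. A system that reports only the first number is not lying, but it is not telling the user what the user thinks it is.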

As you may conclude from this list, the idea of “time” provides many opportunities for imparting a perception to a customer or a user. The reality may be very different.

So what’s this mean?

In short, when any vendor talks about time, get the term defined. A failure to understand how a vendor or a system is using the term “time” may set the stage for an information surprise. I don’t like information surprises. Your mileage may vary.

Stephen Arnold, August 17, 2009

Written by Stephen E. Arnold · Filed Under Business strategy, Feature, Online (general), Technology, Text processing

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.