Context: Popular Term, Difficult Technical Challenge

February 13, 2008

In April 2008, I’m giving a talk at Information Today’s Buying & Selling Econtent conference.

When I am designated as a keynote speaker, I want to be thought provoking and well prepared. So I try to start thinking about the topic a month or more before the event. As I was ruminating about my topic, I was popping in and out of email. I was doing what some students of human behavior might call context shifting.

The idea is that I was doing one thing (thinking about a speech) and then turning my attention to email or a telephone call. When I worked at Booz, Allen, my boss described this behavior as multi-tasking, but I don’t think I was doing two or three things at once. He was, like Einstein, not really human. I’m just a guy from a small town in Illinois, trying to do one thing and not screw it up. So I was doing one thing at a time, just jumping from one work context to another. Normal behavior for me, but I know from observation my 86-year-old father doesn’t handle this type of function as easily as I do. I also know that my son is more adept at context shifting than I am. Obviously it’s a skill that can deteriorate as one’s mental acuity declines.

What struck me this morning was that in the space of a half hour, one email, one telephone call, and one face-to-face meeting each used the word “context”. Perhaps the Nokia announcement and its use of the word context allowed me to group these different events. I think that may be a type of meta tagging, but more about that notion in a moment.

Context seemed to be a high-frequency term in the last 24 hours. I don’t need a Markov procedure to flag the term. The Google Trends report seems to suggest that context has been in a slow decline since the fourth quarter of 2004. Maybe so, but “context” was le mot du jour for me.

What’s Context in Search?

In my insular world, most of the buzzwords I hear pertain to search and retrieval, text processing, and online. After thinking about the word context, I jotted down the different meanings the word had in each of the communications I noticed.

The first use of context referenced the term as I defined it in my 2007 contributions to Bear Stearns’ analyst note, “Google and the Semantic Web.” I can’t provide a link to this document. You will have to chase down your local Bear Stearns’ broker to get a copy. This report describes the inventions of Ramanathan Guha. The PSE, or Programmable Search Engine, discerns and captures context for a user’s query, the information satisfying that query, and other data that provide clues to interpret a particular situation.

The second use of context was a synonym for personalization. The idea was that a user profile would provide useful information about the meaning of a query. Suppose a user looks for consumer information about gasoline mileage. When the system “knows” this fact, a subsequent query for “green fuel” is processed in the context of an automobile. In this case, “green” means environmentally friendly. Context makes it possible to predict a user’s likely meaning based on search history and implicit or explicit personalization.
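Here is a minimal Python sketch of the “green fuel” idea, not a description of any vendor’s system. The sense list, the cue words, and the function name pick_sense are all invented for illustration. The point is only that past queries can tip an ambiguous word toward one meaning.

```python
from collections import Counter

# Hypothetical sense inventory: each sense of an ambiguous term lists cue words.
SENSES = {
    "green": {
        "environmentally friendly": {"mileage", "gasoline", "fuel", "emissions", "hybrid"},
        "color": {"paint", "shade", "palette", "dye"},
    }
}


def pick_sense(term, history):
    """Return the sense of `term` whose cue words occur most often in past queries."""
    seen = Counter(word for query in history for word in query.lower().split())
    scores = {
        sense: sum(seen[cue] for cue in cues)
        for sense, cues in SENSES[term].items()
    }
    return max(scores, key=scores.get)


# A user who has been reading about gasoline mileage.
history = ["gasoline mileage ratings", "best hybrid mileage"]
print(pick_sense("green", history))
```

A real system would weigh many more signals than word overlap, but the shape is the same: the profile supplies evidence, and the evidence selects a meaning.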

The third use of context came up in a discussion about key word search. My colleague made the point that most search engines are “pretty dumb.” “The key words entered in a search box have no context,” he opined. The search engine, therefore, has to deliver the most likely match based on whatever data are available to the query processor. A Web search engine gives you a popular result for many queries. Type Spears into Google and you get pop star hits and few manufacturing and weapon hits.

When a search engine “knows” something about a user — for example, search history, factual information provided when the user registered for a free service, or the implicit or explicit information a search system gathers from users — search results can be made more on point. The idea is that the relevance of the hits matches the user’s needs. The more the system knows about a user and his context, the more relevant the results can be.

Sometimes the word context, when used in reference to search and retrieval, means “popping up a level” in order to understand the bigger picture for the user. Context, therefore, makes it possible to “know” that a user is moving toward the airport (geo spatial input), has a history of looking at flight departure information (user search history), and is making numerous data entry errors (implicit monitoring of user misspellings or query restarts). These items of information can be used to shape a results set. In a more extreme application, these context data can be used to launch a query and “push” the information to the user’s mobile device. This is the “search without search” function I discussed in my May 2007 iBreakfast briefing, which, alas, is not available online at this time.
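The airport scenario can be sketched in a few lines of Python. This is a toy under stated assumptions: the Context fields, the thresholds, and the rule that two of three signals are enough are all made up for illustration. No vendor has told me how it combines such signals.

```python
from dataclasses import dataclass


@dataclass
class Context:
    near_airport: bool    # geo spatial input
    flight_lookups: int   # past flight departure queries (search history)
    query_restarts: int   # misspellings or restarted queries (implicit monitoring)


def should_push_flight_info(ctx, min_lookups=3, min_restarts=2, needed=2):
    """Return True when enough context signals agree to push results unasked."""
    signals = [
        ctx.near_airport,
        ctx.flight_lookups >= min_lookups,
        ctx.query_restarts >= min_restarts,
    ]
    return sum(signals) >= needed


# Near the airport with a history of flight lookups: push without a typed query.
print(should_push_flight_info(Context(near_airport=True, flight_lookups=5, query_restarts=0)))
```

No query is typed in this sketch. The context data alone trigger the push, which is the “search without search” idea in miniature.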

Is Context Functionality Ubiquitous Today?

Yes, there are many online services that make use of context functions, systems, and methods today.

Even though context systems and methods add extra computational cycles, many companies are knee deep in context and its use. I think the low profile of context functions may be, in part, due to privacy issues becoming the target of a media blitz. In my experience, most users accept implicit monitoring if the user has a perception that their identity is neither tracked nor used. The more fuzzification — that is, statistical blurring — of a single user’s identity, the less the user’s anxiety about implicit tracking in order to use context data as a way to make results more relevant. Other vendors have not figured out how to add additional computational loads to their systems without introducing unacceptable latency, and these vendors offer dribs and drabs of context functionality. As their infrastructure becomes more robust, look for more context services.

The company making good use of personalization-centric context is Yahoo. Its personalized MyYahoo service delivers news and information selected by the user. Yahoo’s forthcoming OneConnect was announced this week at the telco conference in Barcelona, Spain. Based on the news reports I have seen, Yahoo wants to extend its personalization services to mobile devices.

Although Yahoo doesn’t talk about context, a user who logs in with a Yahoo ID will be “known” to some degree by Yahoo. The user’s mobile experience, therefore, has more context than a user not “known” to Yahoo. Yahoo’s OneConnect is a single example of context that helps an online service customize information services. Viewed from a privacy advocate’s point of view, this type of context is an intrusion, perhaps unwelcome. However, from the vantage point of a mobile device user rushing to the airport, Yahoo’s ability to “know” more about the user’s context can allow more customized information displays. Flight departure information, parking lot availability, or weather information can be “pushed” to the Yahoo user’s mobile device without the user having to push buttons or make finger gestures.

Context, when used in conjunction with search, refers to additional information about [a] a particular user or group of users identified as belonging to a cluster of users, [b] information and data in the system, [c] data about system processes, or [d] information available to Yahoo though not residing on its servers.

Yahoo and T-Mobile are not alone in their interest in this type of context sensitive search. Geo spatial functions are potential enablers of news services and targeted advertising revenue. Google and Nokia seem to be moving on a similar vector. Microsoft has a keen awareness of context and its usefulness in search, personalization, and advertising.

Context has become a key part of reducing what I call the “shackles of the search box.” Thumb typing is okay but it’s much more useful to have a device that anticipates, personalizes, and contextualizes information and services. If I’m on my way to the airport, the mobile device should be able to “know” what I will need. I know that I am a creature of habit as you probably are with regard to certain behaviors.

Context allows disambiguation. Disambiguation means figuring out which of two or more possibilities is the “right” one. A good example comes up dozens of times a day. You are in line to buy a bagel. The clerk asks you, “What kind of bagel?” with a very heavy accent, speaking rapidly and softly. You know you want a plain bagel. Without hesitation, you are able to disambiguate what the clerk uttered and reply, “Plain, please.”

Humans disambiguate in most social settings, when reading, when watching the boob tube, or just figuring out weird road signs glimpsed at 60 miles per hour. Software doesn’t have the wetware humans have. Disambiguation in search and retrieval systems is a much more complex problem than looking up string matches in an index.

Context is one of the keys to figuring out what a person means or wants. If you know a certain person looks at news about Kolmogorov axioms, next-generation search systems should know that if the user types “Plank”, that user wants information about Max Planck, even though the intrepid user mistyped the name. Google seems to be pushing forward to use this type of context information to minimize the thumb typing that plagues many mobile device users today.
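A rough Python sketch of the “Plank” case follows. I am not claiming this is how Google does it. The sketch assumes a small list of terms drawn from the user’s reading history and uses fuzzy string matching from Python’s standard difflib module. The function name and the 0.8 cutoff are my own choices.

```python
import difflib


def correct_with_profile(term, profile_terms, cutoff=0.8):
    """Return the closest term from the user's profile, or the query term unchanged."""
    by_lower = {t.lower(): t for t in profile_terms}
    matches = difflib.get_close_matches(term.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else term


# Terms a system might have gathered from a reader of mathematics and physics news.
physics_reader = ["Kolmogorov", "axioms", "Planck"]
print(correct_with_profile("Plank", physics_reader))

# A user with no related history gets the query back untouched.
print(correct_with_profile("Plank", ["lumber", "plywood"]))
```

The same keystrokes yield different queries for different users. That is what context buys: the correction depends on who is typing, not just on what was typed.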

These types of context awareness seem within reach. Though the problem is complex, many companies have technologies, systems, and methods to deliver what I call basic context metadata. Let me note that context aware services are in wide use, but rarely labeled as “context” functions. The problem with naming is endemic in search, but you can explore some of these services at their sites. You may have to register and provide some information to take advantage of the features:

Google ig (Individualized Google) — Personalized start page, automatic identification of possibly relevant information based on your search history, and tools for you to customize the information
Yahoo MyYahoo — content customization, email previews, and likely integration with the forthcoming OneConnect service
MyWay — IAC’s personalized start page. One can argue that IAC’s implementation is easier to use than Yahoo’s and more graphically adept than Google’s ig service.

If you are younger than I or young at heart, you will be familiar with the legions of Web 2.0 personalization services. These range from RSS (really simple syndication) feeds that you set up to NetVibes, among hundreds of other mashy, nifty, sticky services. You can explore the most interesting of these services at Tech Crunch. It’s useful to click through the Tech Crunch Top 40 here. I have set up a custom profile on Daily Rotation, a very useful service for people in the information technology market.

An Even Tougher Context Challenge

As interesting and useful as voice disambiguation and automatic adjustment of search results are, I think there is a more significant context issue. At this time, only a handful of researchers are working on this problem. It probably won’t surprise you that my research has identified Google as the leader in what I call “meta-context systems and methods.”

The term meta refers to “information about” a person, process, datum, or other information. The term has drifted a long way from its Latin meaning of a turn in a hippodrome; for example, meta prima was the first turn. Mathematicians and scientists use the term to mean related to or based upon. When a vendor talks about indexing, the term metadata is used to mean those tags or terms assigned to an information object by an automated indexing system or a human subject matter expert who assigns index terms.

The term is also stretched to reference higher levels in nested sets. So, when an index term applies to other index terms, that broader index term performs a meta-index function. For example, if you have an index of documents on your hard drive, you can index groups of documents about a new proposal as “USDA Proposal.” The term does not appear in any of the documents on your hard drive. You have created a meta-index term to refer to a grouping of information. You can create meta-indexes automatically. Most people don’t apply a term to creating a folder name or new directory. Software that performs automatic indexing can assign these meta-index terms. Automatic classification systems can perform this function. I discuss the different approaches in Beyond Search, and I won’t rehash that information in this essay.
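To make the meta-index idea concrete, here is a small Python sketch. The file names and their contents are invented, and the grouping rule (every document that mentions “survey”) stands in for whatever an automatic classifier would actually do.

```python
# Invented documents on a hard drive; none contains the phrase "USDA Proposal".
docs = {
    "budget.txt": "cost estimate for soil survey",
    "schedule.txt": "milestones for soil survey fieldwork",
    "recipe.txt": "bread with rye flour",
}


def build_index(documents):
    """Key word index: map each word to the set of documents containing it."""
    index = {}
    for name, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(name)
    return index


index = build_index(docs)

# Meta-index: a label applied to a group of documents, not taken from their text.
meta_index = {"USDA Proposal": index["survey"]}


def lookup(term):
    """Check the meta-index first, then fall back to the key word index."""
    if term in meta_index:
        return meta_index[term]
    return index.get(term.lower(), set())


print(sorted(lookup("USDA Proposal")))
```

A key word search for “USDA” finds nothing here, yet the label retrieves both proposal documents. The label lives one level above the words in the documents.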

The “real context challenge” then is to create a meta context for available context data. Recognize that context data is itself a higher level of abstraction than a key word index. So we are now talking about taking multiple contexts, probably from multiple systems, and creating a way to use these abstractions in an informed way.

You, like me, get a headache when thinking about these Russian doll structures. Matryoshka (матрёшка) are made of wood or plastic. When you open one doll, you see another inside. You open each doll and find increasingly small dolls inside the largest doll. The Russian doll metaphor is a useful one. Each meta-context refers to the larger doll containing smaller dolls. The type of meta context challenge I perceive is finding a way to deal with multiple matryoshkas, each containing smaller dolls. What we need, then, is a digital basket into which we can put our matryoshka. A single item of context data is useful, but having access to multiple items and multiple context containers opens up some interesting possibilities.
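For readers who think better in code than in dolls, here is one way the basket might look in Python. Everything in it is hypothetical: the ContextDoll class, the field names, and the sample data are mine, and the sketch says nothing about how any company actually stores context.

```python
from dataclasses import dataclass, field


@dataclass
class ContextDoll:
    name: str
    data: dict = field(default_factory=dict)
    inner: list = field(default_factory=list)  # smaller dolls nested inside


def unpack(doll, path=()):
    """Yield (path, key, value) for each item in a doll and in the dolls inside it."""
    here = path + (doll.name,)
    for key, value in doll.data.items():
        yield here, key, value
    for child in doll.inner:
        yield from unpack(child, here)


def basket(dolls):
    """Collect the contents of several matryoshkas into one flat list."""
    return [item for doll in dolls for item in unpack(doll)]


mobile = ContextDoll(
    "mobile",
    {"location": "airport road"},
    [ContextDoll("history", {"last_query": "flight departures"})],
)
profile = ContextDoll("profile", {"language": "en"})

for path, key, value in basket([mobile, profile]):
    print("/".join(path), key, value)
```

The path kept with each item records which doll it came from, so a consumer of the basket can still tell a mobile context from a profile context after everything is pooled.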

In Beyond Search, I describe one interesting initiative at Google. In 2006, Google acquired a small company that specialized in systems and methods for manipulating these types of information context abstractions. There is interesting research into this meta context challenge underway at the University of Wisconsin — Madison as well as at other universities in the U.S. and elsewhere.

Progress in context is taking place at three levels. At the lowest level, commercial services are starting to implement context functions into their products and services. Mobile telephony is one obvious application, and I think the musical chairs underway with Google, Yahoo, and their respective mobile partners is an indication that jockeying is underway. Also at this lowest level are the Web 2.0 and various personalization services that are widely available on Web sites or in commercial software bundles. In the middle, there is not much high-profile activity, but that will change as entrepreneurs sniff the big payoffs in context tools, applications, and services. At the top level, the most intense activity is taking place out of sight of most journalists and analysts. Google, one of the leaders in this technology space, provides almost zero information about its activities. Even researchers at major universities have a low profile.

That’s going to change. Context systems and methods may open new types of information utility. In my April 2008 talk, I will provide more information about context and its potential for igniting new products, services, features, and functions for information-centric activities.

Stephen Arnold, February 13, 2008

Written by Stephen E. Arnold · Filed Under Database, Enterprise, Online (general), Search

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.