Cuil Founder Lands Another Google Invention

April 22, 2010

I have been reluctant to beat up on the alleged weaknesses of the Cuil.com system for one good reason. Dr. Anna Patterson is a very sharp computer scientist. She developed a quite ingenious system called Xift which she sold to the AltaVista.com crowd. After more engineering and family work, she joined Google and invented some fascinating technology which I discuss in Google Version 2.0. Even though she and her equally smart companion founded Cuil.com, the Patterson impact on Google continues. One example is the April 20, 2010 patent granted for her invention “Information Retrieval System for Archiving Multiple Document Versions.” You can read in my studies The Google Legacy and Google Version 2.0 about the importance of this technique to some Google “time” centric processes. A moment’s reflection will reveal that this ability to traverse deltas has some interesting applications. There are other benefits as well, but the invention is meritorious in my opinion and worth reading in US 7,702,618. Here’s the fine Google/lawyer explanation in the patent’s abstract:

An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are the indexed according to their included phrases. Index data for multiple versions or instances of documents is also maintained. Each document instance is associated with a date range and relevance data derived from the document for the date range.
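To make the abstract's last two sentences concrete, here is a minimal sketch of an index that keeps every version of a document and ties each version to a date range. It is an illustration only, not the method claimed in US 7,702,618. The names `DocumentInstance`, `VersionedPhraseIndex`, `add_version`, and `search` are hypothetical, and the sample page and dates are made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DocumentInstance:
    """One archived version of a document, valid for a date range (hypothetical)."""
    doc_id: str
    start: date          # first day this version was current
    end: date            # last day this version was current (inclusive)
    phrases: frozenset   # phrases found in this version


class VersionedPhraseIndex:
    """Hypothetical phrase index that keeps every document version."""

    def __init__(self):
        self._postings = {}  # phrase -> list of DocumentInstance

    def add_version(self, doc_id, start, end, phrases):
        instance = DocumentInstance(doc_id, start, end, frozenset(phrases))
        for phrase in instance.phrases:
            self._postings.setdefault(phrase, []).append(instance)

    def search(self, phrase, on_date):
        """Return ids of documents whose version current on on_date contains phrase."""
        return sorted(
            inst.doc_id
            for inst in self._postings.get(phrase, [])
            if inst.start <= on_date <= inst.end
        )


# Made-up sample data: one page whose wording changed at the start of 2010.
index = VersionedPhraseIndex()
index.add_version("page-1", date(2009, 1, 1), date(2009, 12, 31), ["search engine"])
index.add_version("page-1", date(2010, 1, 1), date(2010, 4, 20), ["operational intelligence"])

print(index.search("search engine", date(2009, 6, 1)))   # ['page-1']
print(index.search("search engine", date(2010, 3, 1)))   # []
```

The same query returns different results depending on the date supplied, which is the "time" centric behavior the abstract points to.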

Dr. Patterson has tallied more than a half dozen inventions for the Google. I pay attention to her work and I discount much of the criticism aimed at her most recent activities. In my experience, the systems reveal significant insights into the trajectory of search. Care to disagree? Just bring some facts and your list of inventions and your record of innovation in search. Dr. Patterson may find the dust up amusing. I will.

Stephen E Arnold, April 22, 2010

Unsponsored post. Dr. Patterson let me pet one of her dogs once. Does that count as a payoff?

The Seven Forms of Mass Media

April 21, 2010

Last evening on a pleasant boat ride on the Adriatic, a number of young computer scientists to be were asking about my Google lecture. A few challenged me, but most seemed to agree with my assertion that Google has a large number of balls in the air. A talented juggler, of course, can deal with five or six balls. The average juggler may struggle to keep two or three in sync.

One of the students shifted the subject to search and “findability.” As you know, I floated the idea that search and content processing is morphing into operational intelligence, preferably real-time operational intelligence, not the somewhat stuffy method of banging two or three words into a search box and taking the most likely hit as the answer.

The question put to me was, “Search has not kept up with printed text, which has been around since the 1500s, maybe earlier. What are we going to do about mobile media?”

The idea is that we still have a difficult time locating the precise segment of text or datum. With mobile devices placing restraints on interface, fostering new types of content like short text messages, and producing an increasing flow of pictures and video, finding is harder, not easier.

I remembered reading “Cell Phones: The Seventh Mass Media” and had a copy of this document on my laptop. I did not raise the assertion that mobile devices are a mass medium, but I thought the insight had relevance. Mobile information comes with some interesting characteristics. These include:

  • The potential for metadata derived from the user’s mobile number, location, call history, etc
  • The index terms in content, if the system can parse information objects or unwrap text in an image or video such as converting an image to ASCII and then indexing the name of a restaurant or other message in an object
  • Contextual information, if available, related to content, identified entities, recipients of messages, etc.
  • Log file processing for any other cues about the user, recipient(s), and information objects.

What this line of thinking indicates is that a shift to mobile devices has the potential for increasing the amount of metadata about information objects. A “tweet”, for instance, may be brief, but one could, given the right processing system, impart considerable richness to the information object in the form of metadata of one sort or another.
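A rough sketch of what attaching metadata to a short message might look like follows. It is my illustration under stated assumptions, not a description of any real system: the `InformationObject` class, the `enrich` function, the metadata field names, and the sample message are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class InformationObject:
    """A short message plus whatever metadata a system can attach (hypothetical)."""
    text: str
    metadata: dict = field(default_factory=dict)


def enrich(text, sender, location, recipients):
    """Attach user-derived and content-derived metadata to a brief message."""
    obj = InformationObject(text)
    # Metadata derived from the user and the delivery context.
    obj.metadata["sender"] = sender
    obj.metadata["location"] = location
    obj.metadata["recipients"] = list(recipients)
    # Metadata derived from the content itself.
    obj.metadata["index_terms"] = sorted(set(re.findall(r"[a-z]+", text.lower())))
    obj.metadata["hashtags"] = re.findall(r"#(\w+)", text)
    return obj


# Made-up sample message and values.
tweet = enrich(
    "Lunch at Cafe Central #portoroz",
    sender="user-123",
    location="Portoroz",
    recipients=["user-456"],
)
print(tweet.metadata["index_terms"])  # ['at', 'cafe', 'central', 'lunch', 'portoroz']
print(tweet.metadata["hashtags"])     # ['portoroz']
```

Even a five-word message ends up carrying more metadata than text, which is the point of the paragraph above.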

The previous six forms of media—[I] print (books, magazines, and newspapers), [II] recordings; [III] cinema; [IV] radio; [V] television; and [VI] Internet—fit neatly under the umbrella of [VII] mobile. The idea is mobile embraces the other six. This type of reasoning is quite useful because it gathers some disparate items and adds some handles and knobs to the otherwise unwieldy assortment in the collection.

In the write up referenced above, I found this passage interesting: “Mobile is as different from the Internet as TV is from the radio.”

The challenge that is kicked to the side of the information highway is, “How does one find needed information in this seventh mass media?” Not very well in my experience. In fact, finding and accessing information is clumsy for textual information. After 500 years, the basic approach of hunting, Easter egg style, has been facilitated by information retrieval systems. But I think most people who look for information can point out some obvious deficiencies. For example, most retrieval systems ignore content in various languages. Real time information is more of a marketing ploy than a useful means of figuring out the pulse count for a particular concept. A comprehensive search remains a job for a specialist who would be recognized by an archivist who worked in Ephesus’ library 2500 years ago.

barokas video

Are you able to locate this video on Ustream or any other video search system? I could not, but I know the video exists. Here is a screen capture. Finding mobile content can be next to impossible in my opinion.

When I toss in the radio and other rich media content, finding and accessing pose enormous challenges to a researcher and a casual user alike. In my keynote speech on April 15, 2010, I referenced some Google patent documents. The clutch of disclosures provides some evidence that Google wants to apply smart software to the editorial job of creating personalized rich media program guides. The approach strikes me as an extension of other personalization approaches, and I am not convinced that explicit personalization is a method that will crack the problem of finding information in the seventh medium or any other for that matter.

Here’s my reasoning:

  • Search and retrieval methods for text don’t solve problems. Processing more information means longer results lists and more work to figure out where the answer is.
  • Smart systems like Google’s or the Cuil Cpedia project are in their infancy. An expert may find fault with smart software that is actually quite stupid from the informed user’s point of view.
  • Making use of context is a challenging problem for research scientists but asking one’s “friends” may be the simplest, most economical, and widely used method. Facebook’s utility as a finding system or Twitter’s vibrating mesh may be the killer app for finding content from mobile devices.
  • As impressive as Google’s achievements have been in the last 11 years, the approach remains largely a modernization of search systems from the 1970s. A new direction may be needed.

The bright young PhDs have the job of figuring out if mobile is indeed the seventh medium. The group with which I was talking or similar engineers elsewhere have the job of cracking the findability problem for the seventh medium. My hope is that on the road to solving the problem of the new seventh medium’s search challenge, a solution to finding information in the other six is discovered as well.

The interest in my use of the phrase “operational intelligence” tells me one thing. Search is a devalued and somewhat tired bit of jargon. Unfortunately, substituting operational intelligence for the word search does not address the problem of delivering the right information when it is needed in a form that the user can easily apprehend and use.

There’s work to be done. A lot of work in my opinion.

Stephen E Arnold, April 20, 2010

No sponsor for this post, gentle reader.

Explaining Artificial Intelligence to Everyone

April 18, 2010

Science Daily ran a story on April 1, 2010. I was not sure if this story was a joke or whether it was serious. I will let you decide. The title was “Grand Unified Theory of AI: New Approach Unites Two Prevailing but Often Opposed Strains in Artificial-Intelligence Research.” The write up explains the Math Club approach; that is, the use of numerical methods, which are now popular. The article describes the rules based approach, which requires a human to write the rules. The core of the story is a pitch for the “Church system”. Science Daily explains:

“With probabilistic reasoning, you get all that structure for free,” Goodman says. A Church program that has never encountered a flightless bird might, initially, set the probability that any bird can fly at 99.99 percent. But as it learns more about cassowaries — and penguins, and caged and broken-winged robins — it revises its probabilities accordingly. Ultimately, the probabilities represent all the conceptual distinctions that early AI researchers would have had to code by hand. But the system learns those distinctions itself, over time — much the way humans learn new concepts and revise old ones. “What’s brilliant about this is that it allows you to build a cognitive model in a fantastically much more straightforward and transparent way than you could do before,” says Nick Chater, a professor of cognitive and decision sciences at University College London. “You can imagine all the things that a human knows, and trying to list those would just be an endless task, and it might even be an infinite task. But the magic trick is saying, ‘No, no, just tell me a few things,’ and then the brain — or in this case the Church system, hopefully somewhat analogous to the way the mind does it — can churn out, using its probabilistic calculation, all the consequences and inferences. And also, when you give the system new information, it can figure out the consequences of that.”
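The revise-as-you-learn idea in the quoted passage can be illustrated with a few lines of ordinary Python. This is not Church and not the researchers' code. It is a minimal sketch that assumes a simple Beta-Bernoulli model, and the `FlightBelief` class and its `prior_strength` parameter are my own hypothetical choices.

```python
class FlightBelief:
    """Beta-Bernoulli belief about the chance that a bird can fly (hypothetical)."""

    def __init__(self, prior_probability=0.9999, prior_strength=100.0):
        # Treat the prior as pseudo-counts of flying and non-flying birds.
        self.can_fly = prior_probability * prior_strength
        self.cannot_fly = (1.0 - prior_probability) * prior_strength

    def observe(self, flies):
        """Update the counts with one observed bird."""
        if flies:
            self.can_fly += 1
        else:
            self.cannot_fly += 1

    @property
    def probability(self):
        """Current estimate that an arbitrary bird can fly."""
        return self.can_fly / (self.can_fly + self.cannot_fly)


belief = FlightBelief()
print(round(belief.probability, 4))  # starts at the 99.99 percent prior

# Each flightless bird encountered pulls the estimate down.
for bird in ["cassowary", "penguin", "caged robin"]:
    belief.observe(flies=False)
print(round(belief.probability, 4))
```

No rule about cassowaries or penguins is written by hand. The estimate simply moves as the observations arrive, which is the contrast with rules based systems that the article draws.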

We talked about this write up at lunch and decided that we would invite readers to read the article and draw a conclusion about a “unified theory of artificial intelligence.”

Stephen E Arnold, April 19, 2010

A freebie.

A-Life NLP Renew Medical Automation Deal

April 17, 2010

A-Life Medical, Inc., a leading provider of computer-assisted coding (CAC) products and services to the healthcare industry, announced today the renewal of an extensive contract with Associated Billing Services, Inc. The announcement is titled “Associated Billing Services Renews Extensive Agreement with A-Life Medical”. LifeCode®, the computerized coding and workflow management product that leverages A-Life’s proprietary and patented technology, appears to be the source of the cost savings and efficiencies key to a successful business.

According to Associated Billing Services’ vice president, Matthew Frick:

“We have built a long-standing relationship with A-Life based on the benefits of the company’s patented NLP technology. Its accuracy rate and ability to appropriately code quickly, seamlessly and efficiently, has helped us to significantly reduce turnaround time, labor costs and accounts receivable days of services outstanding.”

Using NLP technology, A-Life processes electronically transcribed patient encounters, which arrive at its data center via the Internet, and then codes them appropriately for reimbursement purposes.

Melody K. Smith, April 17, 2010

Note: Post was not sponsored.

Blogs May Be Training Input for AI Systems

April 16, 2010

The Montréal Gazette ran an interesting story, “’Mundane’ Blogs Could Help Train Artificial-Intelligence Computers: Researcher.” I think of blogs as marketing vehicles, not instructional material. That goes to show how little I know. For me, the key passage in the write up was:

For Andrew Gordon, there’s no such thing as a boring blog — even if it chronicles making breakfast or walking to work. A research scientist at the University of Southern California’s Institute for Creative Technologies, he’s heading a new project with the ambitious aim of archiving every English-language blog entry posted online — a million of them a day — in hopes of using this vast database to teach artificial-intelligence computers about real life. “People write about the mundane aspects of their daily life, and for me, personally, I find it incredibly interesting,” he says.

This line of research falls within what has been called “a formalization of common sense.”

Stephen E Arnold, April 16, 2010

No one paid for this post.

Google and Disruption: Will It Work Tomorrow?

April 15, 2010

Editor’s Note: The text in this article is derived from the notes prepared for Stephen E Arnold’s keynote talk on April 15, 2010. He delivered this speech as part of Slovenian Information Days in Portoroz, Slovenia.

Thank you, Mr. Chairman. I am most grateful for the opportunity to address this group and offer some observations about Google and its disruptive tactics.

I started tracking Google’s technical inventions in 2002. A client, now out of business, asked me to indicate if “Google really had something solid.”

My analysis showed a platform diagram and a list of markets that Google was likely to disrupt. I captured three ideas in my 2005 monograph “The Google Legacy”, which is still timely and available from Infonortics Ltd. in Tetbury, Glos.

The three ideas were:

First, Google had figured out how to add computing capacity, including storage, using mostly commodity hardware. I estimated the cost in 2002 dollars as about one-third what companies like Excite, Lycos, Microsoft, and Yahoo were paying.

Second, Google had solved the problem of text search for content on Web pages. Google’s engineers were using that infrastructure to deliver other types of services. In 2002, there were rumors that Google was experimenting with services that ranged from email to an online community / messaging system. One person, whose name I have forgotten, pointed out that Google’s internal network MOMA was the test bed for this type of service.

Third, Google was not an invention company. Google was an applied research company. The firm’s engineers, some of whom came from Sun Microsystems and AltaVista.com, were adept at plucking discoveries from university research computing tests and hooking them into systems that were improvements on what most companies used for their applications. The genius was focus and selection and integration.

Google is an information factory, a digital Rouge River construct. Raw materials enter at one end and higher value information products and services come out at the other end of the process.

In my second Google monograph, funded in part by another client, I built upon my research into technology and summarized Google’s patent activities between 2004 and mid 2007. Google Version 2.0: The Calculating Predator, also published by Infonortics Ltd., disclosed several interesting facts about the company.

Operational Intelligence, the New Enterprise Search

April 14, 2010

Worlds are colliding. Business intelligence, search, analytics, and business process are hurtling toward one another. No collider is needed. The impetus comes from managers who are struggling to keep their firms above water. Make no mistake about it. The economic climate may be improving based on government data and the self serving reports from global financial powerhouses. But just look at the number of empty buildings, the fraying infrastructure, and the desperation in the eyes of most employees in North America.

For those lucky enough to be thriving in a world gone mad for sending ads to individuals, life may be good. For people who are in more traditional jobs, the notion of finding information is an everyday struggle. Without the right information at the moment it is needed, organizations can make costly mistakes. These are not errors of judgment like magazine publishers who see the iPad as the font of new revenue or the dewy eyed MBA looking for a job with a third string consulting firm. Nope. These visages reflect the person who cannot explain to a customer why an order was lost or an automobile was delivered with a faulty electronic gizmo. In fact, I see the effects of downsizing, the need to squeeze extra money from every transaction, and crazy decisions made by committees everywhere I look, regardless of the country.

What’s the answer? According to a sponsored white paper from the consulting outfit IDC, Teradata has the fix. Now you may not think that even bigger piles of data will help your business. I admit that I don’t believe the premise either. You can get the story in “Real-Time Operational Intelligence Gains Momentum in Europe: Teradata-sponsored business survey shows adoption details for ‘Active Data Warehousing’” and make up your own mind. Big data means big costs in my experience.

What I liked about this write up was the phrase “real time operational intelligence”. True, the acronym RTOI is a bit clumsy, but I think the phrase points to an important shift in search and content processing. RTOI delivers what many of the people with whom I speak perceive enterprise search delivering. The idea is that the information in an organization is available when needed to help people answer questions and make decisions. Hopefully the decision makers did well in school and have a modicum of common sense.

After thinking about this phrase and the acronym RTOI, I had several thoughts:

  • Vendors of enterprise search may want to make this phrase their own. It is a heck of a lot more compelling than “putting information at your fingertips” or “dashboard”.
  • Search, in this phrase’s embrace, becomes an enabler. Search becomes like butter in a recipe. Without the ingredient the dish does not work. Many vendors of search see themselves as the fish, vegetables, and spices in the meal. RTOI makes search an essential but supporting ingredient.
  • The conceptual outcome of RTOI may be consolidation of what now are marketed as separate systems. For RTOI to work, an organization needs an integrated approach. Data are not enough. The various features and functions of analytics, retrieval, report generation, and business processes must be woven together into one coherent, affordable system.

Is RTOI the future? I am willing to float a tentative, “Yes.” Fragmented information centric systems are now a cost and resource challenge for many organizations. The time is ripe for a new approach. Maybe it will be fueled by open source software like Lucene? Maybe it will be the use of a system like Google’s? Maybe it will be a roll up following the trajectory of Autonomy or OpenText?

The status quo is not delivering and change may be coming. Teradata may not be the winner, but it has contributed a useful catch phrase in my opinion. The phrase “enterprise search” could be put to rest, which would be a step forward.

Stephen E Arnold, April 14, 2010

Unsponsored post.

OneRiot Identifies Challenges to Monetizing Real Time Info

April 13, 2010

OneRiot’s Kimbal Musk has identified the three challenges to monetizing real time information. The reasons appear in “Monetizing the Realtime Web” in the company’s blog. I agree that there is interest in real time information. Even SAS, the analytics giant, wants to hop on this fast moving content train. Law enforcement has long had an interest in knowing what’s going on, particularly in certain fast moving situations when mobile devices are used to pass messages. The challenges are, however, formidable. Mr. Musk identifies these hurdles:

  1. “Real time targeting”; that is, knowing what message goes to whom at a particular point in time. Advertisers want to fire info rifle shots, not shotgun blasts in my experience. However, real time targeting can be computationally expensive.
  2. “Data is everything”; that is, individual messages must be processed and converted into meaningful information. Google has had this challenge gripped in its teeth for more than a decade. Many organizations are struggling with this issue. There are costs and precision issues in addition to technical challenges to resolve. Better metadata are needed to make some real time information useful to an advertiser.
  3. Advertisers have some learning to do. Missionary marketing is important and some old expectations and habits can be difficult to change.

Mr. Musk provides some color about OneRiot’s successful approach, which makes for a useful case.

The challenge is not just OneRiot’s. Google continues to tweak its presentation of real time results. I noted that our research suggests that users skip over the real time results. Some topics don’t have real time results; others do. Traditional searchers, therefore, don’t see information consistently in result sets. Consistency is important.

The larger issue, in my opinion, is that some real time results lack context. Additional information may be needed to make sense of some real time results. Injected content wrappers can provide the user with the information needed to make sense of an otherwise cryptic or out of context item of information. If a user runs a query on a current event such as updates to the PGA tournament, that user presumably has context. But even these messages may need framing.

At this time, injection and wrapper technology is available, based on our research, just not deployed. Real time information is likely to benefit when more than the terse message is presented. Smart software may be able to shoulder the burden, converting isolated items into mini news stories.

Whoever cracks this problem will have an edge in monetization because the machine generated wrappers can have ads attached which may offer more advertising hooks.

Stephen E Arnold, April 13, 2010

Unsponsored post.

Google Snags Programmable Search Engine Patent

April 11, 2010

Short honk: The programmable search engine invention has been granted a US patent. Filed in August 2005 and published in February 2007, the PSE provides a glimpse of the Google’s systems and methods for performing sophisticated content processing. Dr. Ramanathan Guha, inventor of the PSE, has a deep interest in data management, the semantic Web and context tagging. You can download a copy of US7693830 from the USPTO. There were four other PSE patent applications published on the same day in February 2007, which is a testament to Dr. Guha’s ability to invent and write complex patent applications in a remarkable period of time. The PSE is quite important with elements of the invention visible in today’s Google shopping service, among others.

Stephen E Arnold, April 9, 2010

Unsponsored post.

Boston Search Engine Meeting and Exalead

April 9, 2010

The Evvie Award recognizes outstanding work in the field of search and content processing. Ev Brenner, one of the original founders of the Boston Search Engine Meeting, emphasized the need to acknowledge original research and innovative thinking. After Mr. Brenner died, the Boston Search Engine Meeting, then owned by a company in the UK, instituted the Evvie award. This year, Exalead, one of the leaders in search-based applications, and ArnoldIT.com are sponsoring the award. In addition to a cash recognition of $1,000, the recipient receives the Evvie shown below.

evvie small

For more information about the premier search and content processing conference, navigate to the Search Engine Meeting Web site. You can review the program and pre conference activities.

For more information about Exalead, navigate to the Exalead Web site. You can see a demonstration of the Exalead system on the ArnoldIT.com site here and you can explore next generation search and content processing innovations at Exalead’s “labs” site.

For more information about the award, click here.

Stephen E Arnold, April 9, 2010

This post is sponsored by ArnoldIT.com, Exalead, and Information Today, Inc.
