Google and Domain Dictionaries
February 27, 2009
On February 26, 2009, the USPTO published US20090055381, “Domain Dictionary Creation”, an invention by a herd of Googlers, most of whom work in the firm’s Beijing offices. A domain dictionary is a word list for a specific field, such as one for legal eagles or one for medical information. The patent document said:
Methods, systems, and apparatus, including computer program products, to identify topic words in a document corpus that includes topic documents related to a topic are disclosed. A reference topic word divergence value based on the document corpus and the topic document corpus is determined. A candidate topic word divergence value for a candidate topic word is determined based on the document corpus and the topic document corpus. The candidate topic word is determined to be a topic word if the candidate topic word divergence value is greater than the reference topic word divergence value.
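To make the abstract concrete, here is a minimal Python sketch of the thresholding idea. It is an illustration under stated assumptions, not the method in the patent application. The abstract does not say how a divergence value is computed, so the sketch assumes a smoothed log ratio of a word's relative frequency in the topic documents to its relative frequency in the whole corpus. It also assumes the reference value is the mean divergence of a few seed words already accepted as domain terms. The function names, the seed word, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def count_words(docs):
    """Return (word counts, total token count) for a list of documents."""
    counts = Counter(word for doc in docs for word in doc.lower().split())
    return counts, sum(counts.values())


def find_topic_words(corpus_docs, topic_docs, reference_words):
    """Return words whose divergence exceeds the reference divergence.

    ASSUMPTION: divergence is an add-one smoothed log ratio of topic
    frequency to corpus frequency. The patent abstract does not specify
    the formula; this is one plausible stand-in.
    ASSUMPTION: the reference value is the mean divergence of the
    seed words in reference_words.
    """
    corpus_counts, corpus_total = count_words(corpus_docs)
    topic_counts, topic_total = count_words(topic_docs)
    vocab_size = len(set(corpus_counts) | set(topic_counts))

    def divergence(word):
        p_topic = (topic_counts[word] + 1) / (topic_total + vocab_size)
        p_corpus = (corpus_counts[word] + 1) / (corpus_total + vocab_size)
        return math.log(p_topic / p_corpus)

    reference = sum(divergence(w) for w in reference_words) / len(reference_words)
    # A candidate is kept only if it diverges more than the reference value.
    return sorted(w for w in topic_counts if divergence(w) > reference)


# Hypothetical toy data: two legal documents inside a small general corpus.
topic_docs = [
    "the court granted the plaintiff motion",
    "the plaintiff filed a tort claim in court",
]
general_docs = [
    "the weather is nice today",
    "the dog chased a ball in the park",
    "the tennis court is in the park",
]
corpus_docs = topic_docs + general_docs  # the corpus includes the topic documents

topic_words = find_topic_words(corpus_docs, topic_docs, reference_words=["court"])
print(topic_words)
```

In this toy run the seed word "court" also turns up in a sentence about tennis, so it sets a modest bar. Words that appear only in the legal documents, such as "plaintiff" and "tort", clear it, while everyday words such as "the" and "a" do not. A real system would need far larger corpora and a better-founded divergence measure.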
What’s the implication of this invention? Think domain-specific collections of content similar to those available on the floundering Dialog Information Services or LexisNexis services. Will Google push into these areas? You can take the position that patent applications are the domain of the idle and are meaningless. You can join the crowd who says, “Who knows?” Or you can be one of a small group that assumes the effort, cost, and time involved in a patent document point to an area of interest and intent. I’m not sure to which group the addled goose belongs. I would opine that management of traditional database companies may want to read the document and do some noodling. On the other hand, it may be too late, so the time might be better spent on more productive pursuits.
Stephen Arnold, February 27, 2009