Searching Google Patent Documents with ISYS Version 9

October 13, 2008

After my two lectures at the Enterprise Search Summit in San Jose, California, in mid-September 2008, two people wrote to me about my method for figuring out Google patent documents. Please appreciate that I can’t reveal the tools my team has developed. These are my secret sauce, but I can describe the broad approach and provide some detail about what Constance, Don, Stuart, and Tony do when I have to cut through the “fog of transparency” and lava lamp light emanating from Google.

Background

Google generates a large volume of technical information and comparatively modest amounts of patent-related documents. The starting point, therefore, is a fact that catches my attention.  One client sent two people to “watch” me investigate a technical topic. After five days of taking notes, snapping digital photos, and reviewing the information that I have flowing into my Harrod’s Creek, Kentucky, offices, the pair gave up. The procedure was easily flow charted, but the identification of an important and interesting item was a consequence of effort and years of grunting through technical material. Knowing what to research, it seems, is a matter of experience, judgment, and instinct.

The two “watchers” looked at the dozens of search, text mining, and content utilities I had on my machines. The two even fiddled with the systems’ ability to pattern match using n-gram technology, entity extraction using 12-year-old methods that some companies still find cutting edge, and various search systems from companies still in business as well as those long since bought out or simply shut down.

Here’s the big picture:

  1. Spider and collect information via various push methods. The data may be in XML, PDF, or other formats. The key point is that everything I process is open source. This means that I rely on search engines, university Web sites, government agencies with search systems that are prone to time outs, and postings of Web logs. I use exactly the same data that you can use when you run a query on any of the more than 500 systems listed here. This list is one of the keys to our work because none of the well known search systems index “everything”. The popular search engines don’t even come close. In fact, most don’t go more than two or three links deep for certain Web sites. Do some exploring on the US Department of Energy Web site, and you will see what I mean. The key is to run the query across multiple systems and filter out duplicates. Software and humans do this work, just as humans process information at national security operations in many countries. (If you read my Web log, you will know that I have a close familiarity with systems developed by former intelligence professionals.)
  2. Take the filtered subset and process it with a search engine. The bulk of this Web log post describes the ISYS Search Software system. We have been using this system for several years, and we find that it is a quick indexer, so we can process new information quickly.
  3. Subset analysis. Once we have a cut from the content we are processing, then we move the subset into our proprietary tools. One of these tools runs stored queries or what some people call saved searches against the subset looking for specific people and things. My team looks at these outputs.
  4. I review the winnowed subset, and, as time allows, I involve myself in the preceding steps. Once the subset is on my machine, I have to do what anyone reviewing patents and technical documents must do. I read these materials. No, I don’t like to do it, but I have found that doing consistently the dog work that most people prefer to dismiss as irrelevant is what makes it possible for me to “connect the dots”.
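The filter-and-winnow steps above can be sketched roughly in Python. This is only an illustration of the general idea, not the proprietary tools mentioned in the post, which are not described. The record fields (`title`, `text`), the normalization rule used to detect duplicates, and the saved-query format are all assumptions made for the example.

```python
import hashlib
import re


def fingerprint(text):
    """Hash of case- and whitespace-normalized text, used to spot duplicates.

    Assumption: two hits are "the same document" when their normalized
    text is identical. Real collections usually need fuzzier matching.
    """
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def merge_results(*result_lists):
    """Combine hits from several search systems, keeping the first copy of each."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            key = fingerprint(doc["text"])
            if key not in seen:
                seen.add(key)
                merged.append(doc)
    return merged


def run_saved_queries(docs, saved_queries):
    """Return {query name: [titles]} for docs that contain every term of a query.

    saved_queries is a hypothetical format: {name: [term, term, ...]}.
    """
    hits = {}
    for name, terms in saved_queries.items():
        hits[name] = [
            doc["title"]
            for doc in docs
            if all(term.lower() in doc["text"].lower() for term in terms)
        ]
    return hits
```

A human still reviews whatever the stored queries flag; the code only narrows the pile.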

Searching

There’s not much to say about running queries and collecting information that comes via RSS or other push technologies. We get “stuff” from open sources, and we filter out the spam, duplicates, and uninteresting material. Let’s assume that we have information regarding new Google patent documents. We get this information pushed to us, and these are easy to flag. You can navigate to the USPTO Web site and see what we get. You can pay commercial services to send you alerts when new Google documents are filed or published. You can poke around on the Web and find a number of free patent services. If you want to use Google to track Google, then you can use Google’s own patent service. I don’t find it particularly helpful, but Google may improve it at some point in the future. Right now, it’s on my list, but it’s like a dull but well meaning student. I let the student attend my lectures, but I don’t pay much attention to the outputs. If you want some basic information about patent documents, click here.

[Image: datacenterresults]

Narrowed result set for a Google hardware invention related to cooling. This is an image generated using ISYS Version 9, which is now available.

Before Running Queries

You can’t search patent documents and technical materials shooting from the hip. When I look for information about Google or Microsoft, for instance, I have to get smart with regard to terminology. Let me illustrate. If you want to find out how Microsoft is building data centers to compete with Google, you will get zero useful information with this type of query on any system: “Microsoft” and “data centers”. My actual queries are more complex and use nesting, but this test query is one you can use on Microsoft’s Live.com search. Now run the same query for “Microsoft Monsoon”. You will see what you need to know here. If you don’t know the code word “Monsoon”, you will never find the information. It’s that simple.
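One way to keep the “magic” words from getting lost is to store them in a small glossary and generate the query variants from it. The Python sketch below is a hypothetical illustration: the function name, the glossary layout, and the `AND` query syntax are assumptions, and each search system has its own operators.

```python
def expand_query(company, topic, code_words):
    """Build the plain topic query plus one query per known code word.

    code_words is an assumed glossary: {(company, topic): [code word, ...]}.
    """
    queries = ['"{}" AND "{}"'.format(company, topic)]
    for word in code_words.get((company, topic), []):
        queries.append('"{} {}"'.format(company, word))
    return queries


# The one glossary entry here comes from the example in the text.
glossary = {("Microsoft", "data centers"): ["Monsoon"]}
for query in expand_query("Microsoft", "data centers", glossary):
    print(query)
```

The glossary still has to be built by reading; the code only makes sure a known code word gets used every time.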

Because of the potentially high stakes in an IP matter and the problem of knowing the “magic” word like Monsoon, even in large firms awash in paralegals, it can be difficult to get up to speed quickly. With patent applications and patent-related matters roiling the business and technical communities, the secret to efficient searching of complex content has three ingredients:

First, I want to be able to get up to speed quickly. Specifically, I want a way to look at a particular collection of patent applications and patents. A good example is a firm such as Jarg, a company involved in data management. Jarg has claimed that Google infringed on a Jarg invention. Like most people working in search, I have a working knowledge of computers, but I don’t know the specifics of these two companies’ positions. Speed is important because an inquiry must be acted upon, squeezed in, if you will, amidst other work.

Second, I need a way to capture the information. One of my challenges, which technology never seems to address, is, “Where did I see that fact?” Whatever I do to get smart fast, I have to have a way to document what I did and an easy way to recreate the research. This becomes more important if the inquiry becomes an engagement.

Third, the cost in time and mental effort must be low, which means the work has to be efficient. When it comes to IP, I know that I can spend anywhere from $125 to $400 per query using commercial services. In my opinion, commercial services such as Derwent or Questel are useful, but I still have to manually process individual synopses of patent applications and patents. It’s neither easy nor economical to use these services to get a quick overview, form a big picture, and then try to learn on the fly. I avoid that approach like the plague.

The Text Mining and Search

Text mining is the use of a software tool that includes a system and method for analyzing and discovering facts about text documents. We tested a number of systems and kept them online and available. But in terms of popularity with my team, ISYS Search Software gets the lion’s share of the use.

Here’s what I learned from my team:

1. On my garden-variety desktop computer, the ISYS software processed about 500 patent applications and patents in three minutes. This means that I can update a corpus and trigger a reindex. ISYS’s auto update feature works fine, but I want to know that I have reindexed the corpus so that I can search for a specific new document and use ISYS’s features to locate documents with the same inventor, documents referenced in the new patent document, and other narrow queries essential to my analyses.

2. The ISYS system allowed us to take a domain of patent applications and patents and run a free text query, “data management”, for example. The result set showed me documents in which the term appeared. But the most important part of the display was the identification of the inventors associated with these documents. In my legal experience, having the name of a person associated with an invention is one of the most important facts. (See the illustration above.)

3. The ISYS system allows my team to jump from topic to topic, saving interesting result sets so we don’t have to go back and rerun queries developed interactively in a dialog with the ISYS system.

4. My team can search claims, either singly, by key word, or by copying a claim from one document and using that for a second query against my collection. (I read claims first, but you may have another approach.) ISYS makes it possible to slice and dice patent application and patent queries with virtually zero delay.

In less than 30 minutes my team or I are able to get a solid feel for the specific documents germane to my single point of inquiry, “data management” or “hardware”. We also generate a list of individuals in order of importance associated with that topic. I use the ISYS export feature to create output for further analysis with the ArnoldIT.com proprietary tools.
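For readers without ISYS, the two operations described above, listing the people attached to a result set and reusing a claim as a query, can be approximated in a few lines of Python. This sketch does not reflect how ISYS works internally. The `inventors` field is an assumed record layout, “importance” is assumed to mean the number of documents a person appears on, and word n-gram overlap is just one simple stand-in for claim matching.

```python
from collections import Counter


def rank_inventors(docs):
    """Count how many documents in the result set each inventor appears on.

    Returns (name, count) pairs, most frequent first.
    """
    counts = Counter()
    for doc in docs:
        # set() so a name repeated within one document is counted once
        counts.update(set(doc["inventors"]))
    return counts.most_common()


def word_ngrams(text, n=3):
    """Set of lowercase word n-grams in a piece of text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def claim_similarity(claim_a, claim_b, n=3):
    """Jaccard overlap of word n-grams between two claims, from 0.0 to 1.0."""
    a, b = word_ngrams(claim_a, n), word_ngrams(claim_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
```

To use a claim as a query, score it against every claim in the collection with `claim_similarity` and read the highest-scoring documents first.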

Most of our probes into Google’s patent documents and technical information take less than 30 minutes.

Probably the most important feature of the ISYS system was that we did this without consulting a manual, looking for an FAQ (frequently asked question) on the Internet, or emailing the ISYS technical support team.

The Diagrams

What about the drawings? I find these the second most important part of a patent document.

We maintain two versions of each patent document and most technical papers. One is an ASCII version; the other is a PDF of the document. I prefer to print out only the diagrams and use the text of the patent document in electronic form so I can mark up the hard copy of the patent drawing. (ISYS can process PDF files, but I prefer to keep the PDF collection separate and use it to generate images and hard copies of patent documents.)

[Figures: 2003 Google Cooling (isys-drive-cooling-baffle.jpg); 2008 Google Cooling (isys-water-based-data-center.jpg)]

These two diagrams make clear the technical gulf between an earlier Google cooling invention and the water cooled data center that made news in September 2008.

Quite a bit of news swirled around Google’s “invention” of a floating data center. A representative story about this invention is here. I wrote about this invention in the context of Google’s tie up with General Electric. The connection I made was that GE is an ideal partner for Google. GE and Google are interested in green power, and GE has the engineering and manufacturing know how to contribute to Google’s floating data centers. The Google invention US20080209234 sure looks to me as though it has been inspired by the cooling systems in use at nuclear power generation stations.

The key part of our investigation was not an exhaustive look at Google’s hardware inventions. The important point was to benchmark the new invention against an earlier cooling invention, US6906920 (filed in September 2003). The ISYS system made it easy for us to pinpoint this baby cooling invention from 2003. US6906920 is a scooter on steel ball bearings. US20080209234 is an electric scooter like one of these gizmos.

Wrap Up

I’ve learned that there are many systems that can process patent applications and patent documents. What’s important to me is the combination of fast indexing and point-and-click discovery functions. We can add ASCII documents to my patent collection at any time and then reindex the entire set. Alternatively we can create a new collection for a specific legal matter. In short, we can process content as a whole or we can slice content into other components as well. You can download a trial of the ISYS:desktop 9 here. More information about ISYS:desktop 9 is here. An interview with the founder of ISYS is here.

I am not a fan of most content processing systems. I find ISYS useful to me, and the researchers who assist me use the system. So, ISYS provides me with functionality that works for me. Check it out if you are not familiar with the company’s technology. I have been using ISYS since version 3.x or 4.x.

Stephen Arnold, October 9, 2008

Comments

5 Responses to “Searching Google Patent Documents with ISYS Version 9”

  1. ISYS Enterprise Search Insights » Blog Archive » Analysts Point to ISYS » ISYS Search Software Blog on October 13th, 2008 12:02 pm

    […] finally, the venerable Steve Arnold was gracious enough to relay how he uses ISYS as one of his key tools for Google patent analysis.  Many folks know the fine work Steve has done […]

  2. Otis Gospodnetic on October 17th, 2008 2:44 pm

    Impressive workflow.

  3. Stephen E. Arnold on October 17th, 2008 2:51 pm

    Otis Gospodnetic

    I am an addled goose. The acronym for this Web log is BS.

    Stephen Arnold, October 17, 2008 from London

  4. ISYS:web 9 Now Available : Beyond Search on November 11th, 2008 11:32 am

    […] ran several queries on the new system. You can read about my tests and examine a sample screen shot here. In my April 2008 study for the Gilbane Group, I identified ISYS Search Software as a […]

  5. johnny l wilson on January 12th, 2009 7:18 pm

    vertical wind turbine from 2003 to 2008
