Circumvallation: Reed Elsevier and Thomson as Vercingetorix

November 27, 2009

Google Scholar Gets Smart in Legal Information

One turkey received a presidential pardon. Other turkeys may not be so lucky on November 26, 2009, when the US celebrates Thanksgiving. I am befuddled about this holiday. There are not too many farmers in Harrod’s Creek. The fields contain the abandoned foundations of McMansions that the present economic meltdown has left like Shelley’s statue of Ozymandias. The “half buried in the sand” becomes half built homes in the horse farm.

As Kentuckians in my hollow give thanks for a day off from job hunting, I am sitting by the goose pond trying to remember what I read in my copy of Caesar’s De Bello Gallico. I know Caesar did not write this memoir, but his PR bunnies did a pretty good job. I awoke this morning thinking about the connection between the battle of Alesia and what is now happening to the publishing giants Reed Elsevier and Thomson Reuters. The trigger for this mental exercise was Google’s announcement that it had added legal content to Google Scholar.

What’s Vercingetorix got to do with Google, Lexis, and Westlaw? Think military strategy. Starvation, death, surrender, and ritual killing. Just what today’s business giants relish.

Google has added the full text of US federal cases and state cases. The coverage of the federal cases, district and appellate, is from 1924 to the present. US state cases cover 1950 to the present. Additional content will be added; for example, I have one source that suggested that the Commonwealth of Virginia Supreme Court will provide Google with CD-ROMs of cases back to 1924. Google, according to this source, is talking with other sources of US legal information and may provide access to additional legal information as well. What are these sources? Possibly Public.Resource.Org and possibly Justia.org, among others.

The present service includes:

  • The full text of the legal document
  • Footnotes in the legal document
  • Page numbers in the legal document
  • Page breaks in the legal document
  • Hyperlinks in the legal document to cases
  • A tab to show how the case was cited in other documents
  • Links to non-legal documents that cite a case
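The "how cited" tab in that list boils down to flipping citations around. Here is a minimal sketch of the idea, not Google's actual method, which I have not seen. The function name `build_cited_by` and the sample documents are my own hypothetical stand-ins.

```python
from collections import defaultdict


def build_cited_by(citations):
    """Invert a 'document -> cases it cites' map into 'case -> documents citing it'."""
    cited_by = defaultdict(set)
    for doc, cited_cases in citations.items():
        for case in cited_cases:
            cited_by[case].add(doc)
    # Sort each list so the output is stable from run to run.
    return {case: sorted(docs) for case, docs in cited_by.items()}


# Hypothetical documents and the cases each one cites (assumed data).
citations = {
    "Case B": ["Case A"],
    "Case C": ["Case A", "Case B"],
    "Law review article": ["Case A"],
}

cited_by = build_cited_by(citations)
print(cited_by["Case A"])
```

The legal and non-legal citing documents land in the same list, which is roughly what the last two bullets describe. The hard part in real life is spotting the citations in the raw text, and this sketch skips that entirely.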

You can read various pundits, mavens, and azure-chip consultants’ comments on this Google action at this link.

You may want to listen to a podcast called TWIL, specifically the November 23, 2009, show on which Google Scholar was discussed for about a half hour. You can find that discussion on iTunes. Just search for TWIL and download the program “Social Lubricants and Frictions.”

On the surface, the Google push into legal information is a modest amount of data in terms of Google’s daily petabyte flows. The service is easy to use, but the engineering required to provide access to the content strikes me as non-trivial. Content transformation is an expensive proposition, and the cost of fiddling with legal information is one of the primary reasons commercial online services have had to charge hefty fees to look at what amounts to taxpayer supported, public information.

The good news is that the information is free and easily accessible, even from an iPhone or other mobile device. The Google service does the standard Google animal tricks of linking, displaying content with minimal latency, and updating the index within a minute or so of new content becoming available to Google’s software Dyson vacuum cleaner.

So what?

This service is similar to others I have written about in my three Google monographs. Be aware. My studies are not Sergey-and-Larry-eat-pizza books. I look at the Google open source technical and business information. I ignore most of what Google’s wizards “say” in public. These folks are “running the game plan” and add little useful information for my line of work. Your mileage may differ. If so, stop reading this blog post and hunt down a cheerful non-fiction Google book by a real live journalist. That’s not my game. I am an addled goose.

Now let me answer the “so what”.

First, the Google legal content is an incremental effort for the Google. This means that Google’s existing infrastructure, staff, and software can handle the content transformation, parsing, indexing, and serving. No additional big-buck investment is needed. In fact, I have heard that the legal content project, like Google News, was accomplished in the free time for play that Google makes available to its full time professionals. A bit of thought should make clear to you that commercial outfits who have to invest to handle legal content in a Google manner have a cost problem right out of the starting blocks.

Second, Google is doing content processing that should be the responsibility of the US government. I know. I know. The US government wants to create information and not compete with commercial outfits. But the outfits manipulating legal information have priced it so that most everyday Trents and Whitneys cannot afford to use these commercial services. Even some law firms cannot afford these services. Pro bono attorneys don’t have enough money to buy yellow pads to help their clients. Even kind hearted attorneys have to eat before they pay a couple of hundred bucks to run a query on the commercial online services from publicly traded companies out to make their shareholders have a great big financial payday. Google is operating like a government when it processes legal information and makes it available without direct charge to the user. The monetization takes place but on a different business model foundation. That also spells T-R-O-U-B-L-E for the commercial online services like Lexis and Westlaw.

Third, the linking function makes it much easier to root around in the content domain or dataspace and knit together Document A with Document B. Using Google makes it possible for some computer literate legal eagles to develop a different work flow. For the attorney with lots of clients, the work flow change means more income because work can be done quickly and with confidence. Instead of charging by the hour, the savvy attorney can slap a fixed fee on certain types of work and let Google do the grunt work. Payday for the savvy lawyer. A predictable legal fee means a big smile on the client’s face. For the big buck online service selling the *same* content with a clunky work flow, that happy face is flipped upside down. Google’s legal content will just happen.

My hunch is that as the Google Scholar gains traction, courts will go to Google. Google won’t have to put feet on the street or hire lawyers from the bottom quartile of a graduating class to create the type of legal information that has allowed billions of dollars in revenue to flow to the Lexis type outfits.

So, the world has changed. And quickly. Sad Thanksgiving in some of the for fee legal online services today I surmise.

How Commercial Legal Information Works

I want to capture a few thoughts on how traditional online has operated since the late 1970s when Lockheed Dialog, SDC, and a handful of other companies like Dialcom got rolling. I have covered most of this information in my “Mysteries of Online” series which you can find by searching this Web log.

The basic way to make dough in commercial online in 1979 was simple. Define a content domain; for example, publications carrying news about pharmaceutical products and the executives working on those products. Then get the hard copies of the publications with this information. Next, hire people who knew something about the domain to read the stories and identify the who, what, when, and where. The information was typed into a mainframe’s green screen input form. Each record representing a single news story was written to a DASD or other storage device. On a schedule, usually every month in the early days, the records of each indexed and processed story were sent to an online service in a format strictly defined by the online service.

This meant that the QA and QC process had to be performed by another human looking at a record on a green screen. This was tedious and there was lots of to and fro of the source documents, particularly when weird made up pharma type words were in the source article.

The online company would “load” the big reel of tape on their mainframes, run some checks to make sure the data were lined up correctly, and then the content would be added to the existing index of that particular database. The index update was a time scheduled affair. There was no “real time” anything. When the new content became available, users would search for that new information using a command like UD=9999 and standing profiles would be generated, for a fee, of course, and sent to users of that database.

The timesharing company took the bulk of the revenue although some database producers could get a 50 / 50 revenue deal. The giant timesharing companies were constantly trying to get “exclusives” for revenue producing databases. The giant timesharing companies wanted to be monopolies because then their revenues would be completely under the monopoly’s control. A database producer was like a soy bean farmer in Illinois just without the government subsidies. The customer had to pay whatever the timesharing company charged, usually by such interesting methods as:

  • A specific fee to look at a bibliographic record. This meant that the researcher had to have access to a hard copy or microfilm containing the source document.
  • A specific fee for a carriage return. This was a nifty trick at Mead Paper when it owned Lexis. Don Wilson, one of the founders of Lexis, explained that the carriage return’s $0.02 charge generated tons of money and sold lots of the custom printed red Lexis terminal output paper. Great idea, right? (Google in my opinion looks positively altruistic in comparison.)
  • A fee to receive training. Remember in the late 1970s, computers were not the iPhone type of gizmo. There was quite a bit to learn whenever a computer processing textual information was in the work process. The commercial companies would charge directly or indirectly to teach people to use these systems. By the late 1980s, the “training” was free, but as the systems became easier to use, the barrier was lowered. I miss those mainframe days and JCL by the way.
  • A fee to get a custom report from the timesharing company about user behavior. Commercial online companies define the user of the system as the timesharing company’s customer. The commercial database producer was a soy bean farmer and did not need customer data. To get customer data, the commercial database producer had to pay the timesharing company and then wait as much as a month for the report to be delivered on hard copy green bar paper. Some of you reading this blog post don’t know what green bar paper is. Pity.
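To see how those meters add up, here is a back-of-the-envelope sketch. The only number taken from the list above is the $0.02 carriage return charge. The per-record fee, the session counts, and the function name `session_charge_cents` are hypothetical, made up purely for illustration.

```python
# The $0.02 carriage return charge comes from the Lexis anecdote above.
CARRIAGE_RETURN_FEE_CENTS = 2


def session_charge_cents(carriage_returns, records_viewed, record_fee_cents):
    """Total one search session's bill in whole cents (integers avoid float rounding)."""
    return (carriage_returns * CARRIAGE_RETURN_FEE_CENTS
            + records_viewed * record_fee_cents)


# Hypothetical session: 500 carriage returns, 20 records at an assumed 75 cents each.
charge = session_charge_cents(carriage_returns=500, records_viewed=20, record_fee_cents=75)
print(f"${charge / 100:.2f}")  # prints $25.00
```

In this made-up session the two cent carriage returns account for $10 of the $25. Pennies add up when every keystroke is on the meter.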

To sum up, everything involved in commercial online content production was labor intensive, complicated, and time intensive. As a result, searching online was an expensive proposition. Where we are today is that the giants of commercial online legal information still operate with many of the precepts, business methods, and technology approaches that are now more than 30 years old. These outfits have been trying to change, bless their hearts. As you may know, traditional information companies are having a bit of a tough go at the moment. Sad in a way, but these outfits are the equivalent of the Han dynasty’s giant wooden war ships.

image

Big targets like this ancient Chinese war ship are easy to attack with modern technology. Caesar would have understood this vulnerability and like Google would attack weak points with the best available technology. Source: http://www.chinahistoryforum.com/index.php?/topic/2462-chinese-warships/

A bass fishing boat with a single M79 40mm grenade launcher could neutralize one of these slow moving, ponderous behemoths in a nonce. That’s what’s going to happen with some commercial information outfits. Google Scholar is a bit more than an M79, but you get the idea.

Let’s Think about Alesia

The basic idea in this interesting battle is circumvallation.  The diagram below shows the core of the idea:

image

Source: http://en.wikipedia.org/wiki/Battle_of_Alesia

You surround the enemy with a wall, essentially a big jail. When the enemy attempts to flee, the enemy is trapped inside the circumvallation. Julius Caesar defeated Vercingetorix. Caesar was a calculating predator. (I wonder where he would work if he were alive today?) Vercingetorix was no wimp, but he underestimated the Romans. When Vercingetorix sent the women and children out of Alesia he assumed that Caesar would let them go free. Wrong. Caesar either let them die or killed them. Check out the meaning of “predator” or read more about Caesar’s management style in De Bello Gallico.

Mark Antony, quoted in Fighting Techniques of the Ancient World by Simon Anglim, et al.: [We must attack] the walls and bodies of the enemy, which they will yield to bravery, to the sword to despair….this very day must decide for us either a complete victory or death. [Emphasis added]

Google, in my opinion, is using the same tactic in legal information. There is one important difference. Google cannot lose with its approach. More about why I hold this opinion in a moment. The commercial legal information companies can, and even a partial victory may not be enough to save their businesses as they exist today.

When I step back, I see Google using this tactic in other content domains in the future as well. Unless the Alesians came up with a method to beat Caesar at his own game, their fate was sealed. Unless the legal publishers come up with a method to beat Google at its own game, their fate will be sealed in my opinion.

Now let’s think about why Caesar’s strategy is applicable to the coming dust up between Google and the commercial online legal information services.

The Financial Impact of Google Scholar’s Legal Content

The commercial online legal vendors have a very difficult cost problem. Companies like Reed Elsevier and Thomson Reuters have their roots in traditional business methods, processes, and business models. The giant legal publishing units have been slow to change for one reason: inertia. Change is expensive in human and material terms. The customers themselves are in part to blame. Once a law firm finds a way to billing heaven, why should the partners embrace a different, unproven approach? The bonus in the hand each year is far better than a bonus in a digital bush.

Digging deeper into the traditional legal information business, the business methods themselves provide even more significant indications of the firms’ weaknesses. The idea that software can perform the work of humans is not really believed. Sure, there are some in the information technology department who know that certain processes can be enhanced with software, but commercial, off-the-shelf products are expensive and require “time” to tune. When the SharePoint servers go down, there are too few cycles to tackle smart software. Tests are easier, and these provide management with a sense that the legal publisher is making significant progress. The reality is that the methods in use today at the big legal publishing companies would be recognizable to the employee from 1981 and probably to Gutenberg’s assistants if one were available for this personnel bake off.

Another issue is that the US federal government has been mired in red tape that Napoleon Bonaparte would find amusing. The legal publishers have, over time, formed relationships that allow these firms to gather content, get clarification about murky issues, and operate with the government agencies as their customers. Little wonder that change has been slow to come where these giant legal publishers and their source generating entities interact. The notion that the US government should make legal documents available is not a new one. It arose in the planning for the original index of the US government’s citizen facing content. Where did that idea go in the last nine years? Exactly nowhere. So the cozy world of the giant legal publishing companies and the entities producing public legal documents worked to prevent these documents from being available to most people. Those with a tame legal eagle or just lots of cash faced no barriers. The average person running a tire store might as well think about joining the next Shuttle launch. Both were effectively out of reach.

Finally, the business environment itself ossified around traditional for fee online services. I learned about online when I was at Booz, Allen & Hamilton. Ellen Shedlarz, then manager of the New York BAH library, took the time to explain online to me and I was hooked. I ended up getting hired by one of the world’s leading business database producers because of my interest in online. The way online worked into the 1980s was that a trained intermediary provided access to the for fee information. The great chain of being in the information world existed. Over the years, the commercial legal information companies and most other for fee information vendors fought to keep tight control over information access. To lose that control meant a loss of revenues. Little wonder that online has been more like the factions in Serbo Croatia. Balkanization was the basic food in the traditional commercial information terrarium. The increasing access to digital information is like snow falling on dinosaurs. The easy snacks are going away. Extinction is now a real possibility.

To summarize:

  1. Commercial online companies will have to adapt or die. Legal action against Google may buy some time, but the Google has a cost advantage.
  2. The customers moving from high school to college and college to the law schools and the law schools to the unemployment line are going to create new business methods. The notion of paying for legal information will hold less charm than using services like Google’s. As I argued in The Google Legacy, even if Google is shut down tomorrow, one of these youngsters will just reinvent Google. Like a video game, the enemies just keep on coming at the traditional legal information companies.
  3. Courts, as choked and inefficient as they are, will lack the money, guts, and ability to do much more than just dump content on their servers. The future is going to carry a big shoulder shrug and a resigned sigh that says, “Let Google do it.” In effect, legal content will be dumped on a server. The folks with the most sophisticated spidering and content transformation methods will make this content useful with embedded tags, value added metadata, and other types of enrichment. For details, buy my Google trilogy. I am tired of explaining stuff that Google developed six or seven years ago.

Looking Forward

The impact of Google Scholar’s legal content will be to bleed some of the revenue flowing to Lexis and Westlaw. The amount could be as much as a 20 percent decline in quarterly revenues by March 31, 2010. Where will that money go? Probably lots of places, just not to Lexis and Westlaw. The must-have content at Lexis and Westlaw will have to make up the shortfall. This means higher prices which will force deep pocket law firms to look even more closely at Google-type services. The multi billion dollar legal information companies won’t die overnight, but they will be forced to change. The future, therefore, is uncertain. If I were an investment fund manager, I think I would take a close look at what Reed Elsevier’s and Thomson Reuters’ top line revenues reveal six months into the Google legal push. I know I will ask my investment advisor about these companies’ future prospects.

I think the legal work processes will begin to change more quickly. I don’t think 60 year old partners’ day-to-day life will change much. I do think the change will be fast, significant, and unstoppable in other legal sectors. The notion of looking stuff up by hand and writing notes on a yellow pad will be given short shrift by some legal eagles.

I see the for fee legal publishers following in the footsteps of the magazine publishers who want to create an iTunes for magazine content. This is a modern day version of Vercingetorix’s response to Caesar. Google is potentially more powerful than Caesar when it comes to digital content battles. The traditional legal information companies have a difficult struggle in front of them. The end may be death or surrender with a few finding a way to survive in a strange new world with different business methods and business models. I can envision courts themselves using Google, not the commercial services. Not even a Type A judge wants to do grunt work when a Google box exists to get his or her honor to the golf course early in my opinion.

Finally, what will be vulnerable to Google disruption will be difficult to use, expensive, and incomplete services. Maybe Reed Elsevier, Thomson Reuters, and Wolters Kluwer should merge. That will give the present crop of senior managers time to cash out. I don’t see an easy, quick, inexpensive, or painless way to prevent the lessons of Alesia being writ large in tomorrow’s digital headlines.

Just my opinion. Have another sweet potato. Just don’t eat goose for Thanksgiving. And remember a Googler described the addition of legal content as a “happy little event.” Yep, happy.

Stephen Arnold, November 26, 2009 but posted on November 27, 2009.

Oyez, oyez, Department of Justice. No one paid me to explain how a Web search company is going to eat the legal information vendors’ lunch. True, the lunch will be mostly leftovers. But unlike this essay, that lunch for those vendors will not be free. Nope, that lunch will be expensive, probably in the $300 million range pretty darned quick.

Comments

4 Responses to “Circumvallation: Reed Elsevier and Thomson as Vercingetorix”

  1. Circumvallation « François Schiettecatte’s Blog on November 28th, 2009 12:06 pm

    […] it was very interesting to read his post about how Google is challenging Reed Elsevier and Thomson by indexing legal texts: Google has added the full text of US federal cases and state cases. The […]

  2. Social Media and Google Scholar: The flash is not just in the pan Law Firms Beware « Answer Maven on November 30th, 2009 7:51 am

    […] their lower tier customers and cut into their revenue streams.  Stephen Arnold has taken a close look at this development and I plan on spending some more time thinking about this in terms of Fastcase […]

  4. dbv on December 17th, 2009 8:45 am

    Agree. History repeats itself. About 20 years ago, the encyclopedia publishers began hurting with the advent of Windows/Mac and CDs and were finally pummeled with the combination of the Internet, www and Wikipedia.

    About 10 years ago, the newspaper and magazine publishers started hurting with the advent of the consumer web, search engines and the so called web 2.0 with its instant publishing capabilities.

    Today, Government data is beginning to be set free as it should be as it was originally paid for by taxpayers. The likes of ThomsonReuters and LexisNexis have built large profitable businesses on the back of taxpayer paid content. This is now coming to an end.

    Looking 10 years out, every legal case will be on Wikipedia with added value provided by law professors, students, lawyers and others. This will happen faster than people realize because there aren’t that many precedent setting legal cases … literally, the numbers in the US are in the 1 million range.
