The Alleged Received Wisdom about Predictive Coding

June 19, 2012

Let’s start off with a recommendation. Snag a copy of the Wall Street Journal and read the hard copy front page story in the Marketplace section, “Computers Carry Water of Pretrial Legal Work.” In theory, you can read the story online if you don’t have Sections A-1, A-10 of the June 18, 2012, newspaper. A variant of the story appears as “Why Hire a Lawyer? Computers Are Cheaper.”

Now let me offer a possibly shocking observation: The costs of litigation are not going down for certain legal matters. Neither bargain basement human attorneys nor Fancy Dan content processing systems make the legal bills smaller. Your mileage may vary, but for those snared in some legal traffic jams, costs are tough to control. In fact, search and content processing can impact costs, just not in the way some of the licensees of next generation systems expect. That is one of the mysteries of online that few can penetrate.

The main idea of the Wall Street Journal story is that “predictive coding” can do work that human lawyers do for a higher cost but sometimes with much less precision. That’s the hint about costs in my opinion. But the article is traditional journalistic gold. Coming from the Murdoch organization, what did I expect? i2 Group has been chugging along with relationship maps for case analyses of important matters since 1990. Big alert: i2 Ltd. was a client of mine. Let’s see: it was more than a couple of weeks ago that basic discovery functions became available.

The write up quotes published analyses which indicate that when humans review documents, those humans get tired and do a lousy job. The article cites “experts” from Thomson Reuters, a firm steeped in legal and digital expertise, who point out that predictive coding is going to be an even bigger business. Here’s the passage I underlined: “Greg McPolin, an executive at the legal outsourcing firm Pangea3 which is owned by Thomson Reuters Corp., says about one third of the company’s clients are considering using predictive coding in their matters.” This factoid is likely to spawn a swarm of azure chip consultants who will explain how big the market for predictive coding will be. Good news for the firms engaged in this content processing activity.

What goes up faster: the costs of a legal matter, or the costs of a legal matter that requires automation and trained attorneys? Why do companies embrace automation plus human attorneys? Risk certainly is a turbo charger.

The article also explains how predictive coding works, offers some cost estimates for various actions related to a document, and adds some cautionary points about predictive coding proving itself in court. In short, we have a touchstone document about this niche in search and content processing.

My thoughts about predictive coding are related to the broader trends in the use of systems and methods to figure out what is in a corpus and what a document is about.

First, the driver for most content processing is related to two quite human needs. One is cost: coping with large volumes of information is expensive, and the expense is going up fast. The other is the need to reduce risk. Most professionals find quips about orange jump suits, sharing a cell with Mr. Madoff, and the iconic “perp walk” downright depressing. When a legal matter surfaces, the need to know what’s in a collection of content like corporate email is high. The need for speed is driven by executive urgency. The cost factor clicks in when the chief financial officer has to figure out the costs of determining what’s in those documents. Predictive coding to the rescue. One firm used the phrase “rocket docket” to communicate speed. Other firms promise optimized statistical routines. The big idea is that automation is faster and cheaper than having lots of attorneys sifting through documents in printed or digital form. The Wall Street Journal is right. Automated content processing is going to be a big business. I just hit the two key drivers. Why dance around what is fueling this sector?

Second, I think quite a few search and content processing firms are chasing this business already. The problem is that legal matters are different from niches like customer support. Let me be clear. Indexing help desk content is less risky than indexing email. An angry customer can file a lawsuit, so there is some risk. But missing a key document is already part of a legal carousel. Companies jumping into the legal market without understanding the nature of the search and content processing risk make the marketing job a little more difficult. Then, once the system is in place, the licensees may find themselves forced to throw humans at the project as well. How many help desk workers double check to make sure that the information provided to a customer with a dead toaster is correct? Not many based on my experience. In the legal world, at some point human lawyers have to grind through the content. The result is not a reduction in costs, but a shifting of cost expectations. The automated systems can reduce costs for initial document processing. But the old fashioned human costs will get applied once the real legal fun begins. Search and content processing works well in some applications. In others, it is a strut, a support, a pair of Jobst compression hosiery.

Third, most of the marketers pitching eDiscovery, predictive coding, and automated document processing have been hitting the big law firms and Fortune 1000 companies hard for the last three years. The LegalTech trade show features talks, hosted events, and exhibits from a number of companies which have been fixtures of the search and content processing sector for years. The sponsors include Kroll OnTrack, Recommind, and others. HP Autonomy is in the game in a big way. Content Analyst is one of the leading providers of technology which other firms integrated into their systems including many that provide predictive coding. Brainware and ISYS Search, both owned by Lexmark, play in this market as well. What surprised me about the Wall Street Journal story is that the history of search and content processing in this sector was not mentioned. It is as if a high school student takes an American history class and emerges without knowledge of George Washington or the Civil War. Revisionism or just a desire to create a benchmark article which sets up a series of follow up articles about the latest and greatest stuff a public relations firm can present to the Murdoch reporters? I don’t know. I just like history. A smidge will do.

I will continue to surface some thoughts and observations about predictive coding, which is a tiny branch of the broader numeric systems which have been in use for decades. In search and content processing, “new” means that someone just discovered something previously unknown to that person. The reality is that manipulating digital representations of content to determine relationships, aboutness, and relevance to another entity is not new, not without benefits, not without flaws, and not without a history.

But history in search and content processing is boring. I wish I had the fresh eyes of a newcomer who learned about predictive coding. I have been around, however. “New” is in the eye of the beholder and given some added appeal when a legal document is delivered to a corporate big wig. That’s news and once it sold newspapers.

Stephen E Arnold, June 18, 2012


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.