Another View of TAR

August 21, 2012

One judge’s endorsement of Technology Assisted Review (TAR) has set a precedent and stirred up the eDiscovery community. The eDiscovery and Information Management blog tackles the topic in “Technology Assisted Review, Concept Search and Predictive Coding: The Limitations and Risks.”

TAR also goes by several other names: Machine Assisted Review, Computer Assisted Review, Predictive Coding, Concept Search, and Meaning-Based Computing. It seems that US federal judge Andrew J. Peck ordered the parties in a recent case to adopt an eDiscovery protocol that includes the use of TAR as practiced by Recommind’s Axcelerate. The other side filed a complaint, and now the debate rages on.

The blog post aims to bring some perspective to the issue. While it praises text mining and machine learning, the author warns that folks should understand what predictive coding can and cannot do. The write-up notes that AI techniques:

“. . . are based on solid mathematical and statistical frameworks in combination with common-sense or biology-inspired heuristics. In the case of text-mining, there is an extra complication: the content of textual documents has to be translated, so to speak, into numbers (probabilities, mathematical notions such as vectors, etc.) that machine learning algorithms can interpret. The choices that are made during this translation can highly influence the results of the machine learning algorithms.

“For instance, the ‘bag-of-words’ approach used by some products has several limitations that may result in having completely different documents ending up in the exact same vector for machine learning and having documents with the same meaning ending up as completely different vectors.”
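
To make the quoted point concrete, here is a minimal sketch in Python of a plain word-count "bag-of-words" representation. It is not taken from the post or from any eDiscovery product; the sample sentences and the helper names (build_vocabulary, bag_of_words) are invented for illustration, and real tools typically add weighting, stemming, and other refinements. Under those assumptions, it shows two sentences with opposite meanings collapsing into the same vector, and two sentences with roughly the same meaning sharing nothing but the word "the".

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(docs):
    # Hypothetical helper: the sorted set of every word seen in the documents.
    return sorted({word for doc in docs for word in tokenize(doc)})

def bag_of_words(text, vocabulary):
    # Hypothetical helper: one count per vocabulary word; word order is discarded.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

# Invented sample sentences, not drawn from any real case.
docs = {
    "a": "The vendor sued the client",
    "b": "The client sued the vendor",
    "c": "The attorney terminated the agreement",
    "d": "The lawyer cancelled the contract",
}

vocab = build_vocabulary(docs.values())
vectors = {name: bag_of_words(text, vocab) for name, text in docs.items()}

# Different meaning, identical vector: only the word order differs.
same_vector = vectors["a"] == vectors["b"]

# Similar meaning, almost no overlap: list the words both documents contain.
shared = [w for w, x, y in zip(vocab, vectors["c"], vectors["d"]) if x and y]

print("a and b map to the same vector:", same_vector)
print("words shared by c and d:", shared)
```

A classifier fed only these vectors has no way to tell document "a" from document "b", and little reason to group "c" with "d", which is the limitation the quoted passage describes.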

The post points to additional complications. For example, multi-lingual documents can cause difficulties. Also, different documents may use different language to describe the same things, or their language can be ambiguous. Furthermore, the process of setting up classifiers can be time-consuming and challenging; if it is not implemented conscientiously, the results will not be defensible in court.

See the article for more details. The post ends by noting there are other ways to automatically classify documents, and that in many cases those options will produce results that are more defensible and more manageable than those produced by TAR.

Cynthia Murrell, August 21, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Written by Stephen E. Arnold · Filed Under EDiscovery, News, Technology

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.