Recommind Publishes Predictive Coding Guide

October 8, 2012

Darned amazing. It is like rocket science for dummies. The Wall Street Journal’s MarketWatch reports, “Recommind Announces ‘Predictive Coding for Dummies’.” The publication, part of the “for Dummies” series of manuals, aims to help document reviewers speed up and automate their process. The press release explains:

“This guide is a definitive text covering the challenges of document review in eDiscovery, what makes it vital to legal cases, and what to look for in an eDiscovery solution. ‘Predictive Coding for Dummies’ also outlines real-world cost savings through Predictive Coding solutions like Axcelerate Review & Analysis, Recommind’s leading end-to-end eDiscovery product. . . .

“Through hundreds of implementations, Recommind understands firsthand the high cost associated with using old approaches to document review and the benefits an eDiscovery solution provides. Recommind’s eDiscovery solution is designed to address the specific context of today’s law firms and legal departments, including the ever-increasing volume of information.”

Though it sounds like the guide may amount to an info-advertisement for Recommind’s products, you may be able to glean some useful nuggets from it. Chapter titles include “Information Explosion and Electronic Discovery”; “Putting Predictive Coding to Work”; and “The Top Benefits of Predictive Coding.”

Axcelerate eDiscovery is Recommind’s flagship product, based on its CORE platform. The company was formed in 2000. It is headquartered in San Francisco and maintains offices around the world.

Cynthia Murrell, October 8, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Another View of TAR

August 21, 2012

One judge’s endorsement of Technology Assisted Review (TAR) has set a precedent and stirred up the eDiscovery community. The eDiscovery and Information Management blog tackles the topic in “Technology Assisted Review, Concept Search and Predictive Coding: The Limitations and Risks.”

TAR also goes by Machine Assisted Review, Computer Assisted Review, Predictive Coding, Concept Search, and meaning-based computing. It seems that US federal judge Andrew J. Peck ordered parties in a recent case to adopt an eDiscovery protocol, including the use of TAR as practiced by Recommind’s Axcelerate. The other side objected, and now the debate rages on.

The blog post aims to bring some perspective to the issue. While it praises text mining and machine learning, the author warns that folks should understand what predictive coding can and cannot do. The write up notes that AI techniques:

“. . . are based on solid mathematical and statistical frameworks in combination with common-sense or biology-inspired heuristics. In the case of text-mining, there is an extra complication: the content of textual documents has to be translated, so to speak, into numbers (probabilities, mathematical notions such as vectors, etc.) that machine learning algorithms can interpret. The choices that are made during this translation can highly influence the results of the machine learning algorithms.

“For instance, the ‘bag-of-words’ approach used by some products has several limitations that may result in having completely different documents ending up in the exact same vector for machine learning and having documents with the same meaning ending up as completely different vectors.”
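
To make the bag-of-words point concrete, here is a minimal sketch (ours, not the blog post’s or any vendor’s) using scikit-learn’s CountVectorizer. The two example sentences are invented; the point is that once word order is discarded, documents with opposite meanings map to identical vectors.

```python
# Minimal sketch: two sentences with opposite meanings collapse to the
# same bag-of-words vector once word order is thrown away.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the plaintiff paid the defendant",
    "the defendant paid the plaintiff",
]

vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(docs).toarray()

print(vectorizer.get_feature_names_out())  # ['defendant' 'paid' 'plaintiff' 'the']
print(vectors[0])  # [1 1 1 2]
print(vectors[1])  # [1 1 1 2] -- identical, so a learner cannot separate them
```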

The post points to additional complications. For example, multi-lingual documents can cause difficulties. Also, different documents may use different language to describe the same things, or their language can be ambiguous. Furthermore, the process of setting up classifiers can be time-consuming and challenging; if not implemented conscientiously, the results will not be defensible in court.

See the article for more details. The post ends by noting there are other ways to automatically classify documents, and that in many cases those options will produce results that are more defensible and more manageable than those produced by TAR.

Cynthia Murrell, August 21, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Open Source Solutions Continue to Gain Popularity

August 13, 2012

The H Open recently reported on some new developments for the open source search, discovery and analytics company Lucid Imagination in the article, “Lucid Imagination Becomes LucidWorks.”

According to the article, after continually having customers confuse the name of the company with its flagship product, Lucid Imagination decided to go along with customers’ perceptions and change its name to LucidWorks to avoid further complicating its branding efforts.

In addition to its two product lines, LucidWorks Search and LucidWorks Big Data, both of which draw from open source projects, the company has additional plans on the horizon:

“LucidWorks has also announced that, in September, it will be setting up a community site called SearchHub.org, which will be oriented at developers. It is planned that this will include a blog from Lucene/Solr committers; of the 35 committers on the project, nine work for LucidWorks. Other planned features include video tutorials, podcasts, a community forum, up-to-date information on Lucene/Solr, and a calendar of enterprise search related hackathons and meetups.”

LucidWorks is an example of a company that has created an enterprise-grade embedded search development solution built on the power of the Apache Lucene/Solr open source search project. As technology continues to advance, companies that utilize open source technology are going to have an edge over their competition.
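
For readers curious what “built on Lucene/Solr” looks like in practice, here is a minimal sketch of querying a Solr core over its standard HTTP /select endpoint. The host, core name (“documents”), and field are placeholder assumptions; this is generic Solr usage, not anything LucidWorks-specific.

```python
# Minimal sketch: query a (placeholder) local Solr core's /select
# endpoint and print a few hits. Requires a running Solr instance.
import json
import urllib.request

url = ("http://localhost:8983/solr/documents/select"
       "?q=title%3Adiscovery&wt=json&rows=5")
with urllib.request.urlopen(url) as resp:
    results = json.load(resp)

for doc in results["response"]["docs"]:
    print(doc.get("id"), doc.get("title"))
```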

Jasmine Ashton, August 13, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

A Bold Assertion: eDiscovery Promises to Reduce Some Costs

August 1, 2012

Virtual Strategy Magazine recently wrote an article about “Quantum Discovery’s Predictive Coding Expertise Saves Client Over $1 Million in Attorney Review Fees.” Anyone who saw the headline would immediately click on it, wondering how they could save money with their attorney. Quantum Discovery, a top eDiscovery and forensic technology provider, was hired by a Top 100 law firm to analyze a 415GB dataset containing two million records. After preliminary work, Quantum Discovery realized the job was too big for the client’s budget, so it proposed an alternative:

“Quantum Discovery suggested to the Litigation Support Manager and Joe Eaton, Partner and Vice Chairperson of the Litigation Department, that the case team should consider utilizing Lateral Data’s Viewpoint TAR (Technology Assisted Review). Due to recent court rulings and articles associated with the benefits and advantages of predictive coding, Mr. Eaton decided to proceed using TAR. ‘We were looking for ways to save our client money while providing an accurate and reliable document review in a short time frame and thought this new technology being utilized by Quantum may provide a favorable vehicle to do so,’ said Mr. Eaton.”

Teamed with Lateral Data, Quantum Discovery’s software provided the vehicle for a cost-effective solution. The client was not charged an exorbitant amount of money, and the case was resolved in a timely manner, relatively unheard of in the court system. On the surface, eDiscovery appears to be a money saver. In today’s economic climate, any cost cutting assertion is likely to get broad consideration. Everyone needs to believe in the tooth fairy.

Stephen E Arnold, August 1, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Lotus Notes Expands into eDiscovery

July 31, 2012

Lotus Notes is a collaborative content management platform that warehouses an organization’s content for multiple users. It does not come as a surprise, as MarketWatch tells us, that “Daegis Expands Cross-Matter Management Capabilities with Support for Lotus Notes Integration in eDiscovery Platform.” It was only a matter of time before an eDiscovery provider integrated its services with a content platform. Daegis adds Lotus Notes to its eDiscovery platform, joining the other data sources the platform already supports for litigation purposes. Daegis already has an industry-noted trademark, its Cross-Matter Management methodology, which enables data to be preserved and re-purposed in a single secure environment, spanning various collaboration platforms and file types.

According to the announcement, the Lotus Notes integration will lower costs:

“These expanded Lotus Notes capabilities continue to build on Daegis’ data-driven eDiscovery model. Using Daegis’ Cross-Matter Management methodology and the Master Repository that enables it, Daegis helps clients curb the rising cost of eDiscovery by leveraging the analysis already done in prior matters and applying that work product to multiple subsequent matters. All data, regardless from which collaboration platform it originates, can be processed, reviewed, and stored in a single instance for use in multiple matters, driving down costs and improving consistency both in search results and attorney review calls regardless of the platform that was originally used to create the data.”
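
As an illustration of the single-instance, cross-matter idea (our sketch, not Daegis’ actual architecture), documents can be stored once, keyed by a content hash, so review calls made in one matter are visible when the same document surfaces in another:

```python
# Minimal sketch: a master repository keyed by content hash, so a review
# call made in one matter travels with the document into later matters.
import hashlib

repository: dict[str, dict] = {}

def ingest(matter: str, text: str) -> dict:
    """Store a document once; record every matter that references it."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    record = repository.setdefault(
        key, {"text": text, "matters": set(), "review_calls": {}}
    )
    record["matters"].add(matter)
    return record

doc = ingest("matter-A", "Board minutes, Q3 2011")
doc["review_calls"]["matter-A"] = "privileged"  # attorney's call in matter A

# The identical document arriving in matter B reuses the prior work product.
again = ingest("matter-B", "Board minutes, Q3 2011")
print(again["review_calls"])  # {'matter-A': 'privileged'}
```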

After adding content management platforms, eDiscovery developers should concentrate on mobile eDiscovery solutions. IBM has a number of initiatives in content processing. How will the company displace the eDiscovery incumbents? Maybe predictive magic will do the trick.

Stephen E Arnold, July 31, 2012

Sponsored by Polyspot


More Explanation of Predictive Coding

July 24, 2012

Predictive coding is the best thing to happen to eDiscovery since its inception, but it has been hard to find an article that goes into real detail about how it works. Mondaq finally answered the call in “Predicting the Future of Predictive Coding.” It first offers the requisite paragraph about what predictive coding is, uses an example of sifting through paper by hand, and explains the cost savings.

Then it gets into the meat:

“A recent study by Rand Corp., which includes 57 case studies from eight large corporations, shows that the cost of e-discovery can be grouped into three main categories: collection, processing and review. Amazingly, the review phase accounted for 73 percent of the costs incurred during e-discovery. Predictive coding works to drastically reduce the number of documents that are manually reviewed by lawyers.”

The process typically works in this way: lawyers review a small sample of documents and code them according to subject matter or relevance. The litigation software then learns from the sample and applies the coding to a larger document set. Lawyers perform quality control checks to make sure the correct documents are pulled up, drastically reducing manual searching and increasing accuracy. “Predictive” as an adjective is now doing more work than the previous favorite, “big data.” My hyperbole radar is humming.
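
A minimal sketch of that seed-and-classify loop, assuming TF-IDF features and logistic regression (a common choice, not necessarily what any eDiscovery vendor ships); documents, labels, and thresholds here are invented for illustration:

```python
# Minimal sketch of the seed-set workflow: lawyers code a small sample,
# a model learns from it, scores the rest, and borderline documents go
# back to humans for the quality-control pass.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs = [
    "merger negotiation notes with outside counsel",
    "office picnic signup sheet",
    "draft term sheet for the acquisition",
    "parking garage maintenance notice",
]
seed_labels = [1, 0, 1, 0]  # 1 = responsive, as coded by the lawyers

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(seed_docs), seed_labels)

corpus = ["revised acquisition agreement", "cafeteria menu for June"]
scores = model.predict_proba(vectorizer.transform(corpus))[:, 1]

for doc, score in zip(corpus, scores):
    queue = "QC review" if 0.3 < score < 0.7 else "auto-coded"
    print(f"{score:.2f}  {queue:10}  {doc}")
```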

Stephen E Arnold, July 24, 2012

Sponsored by HighGainBlog

Cloud Data Protection Offerings for eDiscovery

July 24, 2012

I would have thought that eDiscovery vendors would have stayed away from cloud-based storage due to cybercriminals ramping up their attacks, but if you can make the cloud secure enough to protect client data, it could work. MarketWatch reports “Lighthouse eDiscovery Deploys Riverbed Whitewater to Improve Cloud-Based Data Protection.” Lighthouse eDiscovery has selected Riverbed Technology’s Whitewater cloud storage gateway to replace its aging tape-based backup and disaster recovery system. The change will allow Lighthouse to improve its eDiscovery practices, securely accelerate backup and recovery in the public cloud, and strengthen data protection.

As I guessed, Lighthouse was a little wary of the cloud until it could verify its security:

“`While cloud storage was an attractive option for us due to its immediate availability, Riverbed was a critical component of making disaster recovery in the cloud a reality. We were able to configure the Whitewater gateway in about an hour without having to replace our existing backup tool, Symantec Backup Exec,’ said Marc Larkin, System Administrator at Lighthouse. ‘Our data protection strategy is much more reliable than with tape and our backups feel local!’”

Secure data protection is one of the most important aspects to research when evaluating eDiscovery and litigation support software. If data are not protected and backed up, clients’ rights could be violated and evidence could be lost. The fancy math and predictive outputs may be for naught if the source is compromised.

Stephen E Arnold, July 24, 2012

Sponsored by Ikanow 


The TREC 2011 Results and Predictive Whatevers

July 20, 2012

Law.com reports in “Technology-Assisted Review Boosted in TREC 2011 Results” that technology-assisted review may be ready to claim predictive coding’s title. TREC Legal Track is an annual, government-sponsored project (the 2012 edition was canceled) that examines document review methods. The 2011 results favor technology-assisted review, but the approach may still have a way to go:

“As such, ‘There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that ‘enough is enough’ and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies,’ the report states.”
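
To show what “intra-review recall estimation” can involve, here is a hedged sketch of one common approach, estimating recall from a manually coded random control sample. Everything here is simulated, and this is our illustration, not TREC’s protocol:

```python
# Minimal sketch: estimate review recall from a random, manually coded
# control sample. Real matters lack the ground truth this toy invents.
import random

random.seed(0)
corpus = {i: random.random() < 0.1 for i in range(100_000)}  # id -> responsive?
# Pretend the review found 80% of the truly responsive documents.
retrieved = {i for i, resp in corpus.items() if resp and random.random() < 0.8}

sample = random.sample(sorted(corpus), 1_000)   # control sample to hand-code
sample_responsive = [i for i in sample if corpus[i]]

found = sum(1 for i in sample_responsive if i in retrieved)
print(f"Estimated recall: {found / len(sample_responsive):.1%} "
      f"from {len(sample_responsive)} responsive sample docs")
```

When the estimated recall crosses an agreed threshold, the team has statistical support for declaring that “enough is enough” and closing the review.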

The 2011 TREC asked participants to test three document review requests, but unlike past years, the rules imposed more specific requirements: participants had to rank documents as well as identify which were the most responsive. The extra requirement meant that researchers were able to test hypothetical situations, but there were some downsides:

“TREC 2011 had its share of controversy. ‘Some participants may have conducted an all-out effort to achieve the best possible results, while others may have conducted experiments to illuminate selected aspects of document review technology. … Efficacy must be interpreted in light of effort,’ the report authors wrote. They noted that six teams devoted 10 or fewer hours for document review during individual rounds, two took 20 hours, one used 48 hours, and one, Recommind, invested 150 hours in one round and 500 in another.”

We noticed this passage in the write up as well:

“`It is inappropriate –- and forbidden by the TREC participation agreement –- to claim that the results presented here show that one participant’s system or approach is generally better than another’s. It is also inappropriate to compare the results of TREC 2011 with the results of past TREC Legal Track exercises, as the test conditions as well as the particular techniques and tools employed by the participating teams are not directly comparable. One TREC 2011 Legal Track participant was barred from future participation in TREC for advertising such invalid comparisons,’ the report states.”

TREC is sensitive to participants who use the data for commercial purposes. We wonder which vendor allegedly stepped over the line. We also wonder if TREC is breaking out of the slump which traditional indexing seems to have relaxed into. Is “predictive” the future of search? We are not sure about the TREC results. We do have an opinion, however. Predictive works in certain situations. For others, there are other, more reliable tools. We also believe that there is a role for humans, particularly when there is a risk of an algorithm going crazy. A goof in placing an ad on a Web page is one thing. An error predicting more significant events? Well, we are more cautious. Marketers are afoot. We prefer the more pragmatic approach of outfits like Ikanow, and we avoid the high fliers whom we will not name.

Stephen E Arnold, July 20, 2012

Sponsored by Polyspot


Predictive Analytics and Big Data with Higher Costs to Boot

July 19, 2012

Predictive analytics and big data are two of the biggest buzzwords in the legal and IT professions at the moment. Both deal with the beneficial power of analytics, but soon the two concepts will meet and combine. Kroll Ontrack wrote “Predictive Coding Helps Tackle Big Data” explaining what will happen when the two meet. The article explains that as big data becomes more widespread, it will make the e-disclosure process more expensive.

Predictive coding could make big data more cost-effective, just as it lowers attorney review fees:

“However, having a mushrooming quantity of data means that when an e-disclosure request is issued, it takes even longer to trawl through information, identify relevant documents and compare duplicates. With the increasing time it takes, legal costs can skyrocket, a worrying trend for businesses in the current climate where margins are already stretched thin. For this reason the introduction of predictive coding is likely to be popular as it leaves the legwork to a sophisticated algorithm, finding relevant documents which can then be reviewed more closely.”
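
One concrete piece of that trawl, comparing duplicates, can be as simple as hashing normalized text so exact copies are reviewed once. A minimal sketch (our illustration, with invented documents, not Kroll Ontrack’s method):

```python
# Minimal sketch: normalize text, hash it, and route exact duplicates
# past the (expensive) human review queue.
import hashlib

docs = {
    "msg-001": "Please review the attached contract.",
    "msg-002": "please  review the attached contract.",  # same text, new case/spacing
    "msg-003": "Lunch on Friday?",
}

seen: dict[str, str] = {}
for doc_id, text in docs.items():
    digest = hashlib.sha256(" ".join(text.lower().split()).encode()).hexdigest()
    if digest in seen:
        print(f"{doc_id} duplicates {seen[digest]}; skip re-review")
    else:
        seen[digest] = doc_id
        print(f"{doc_id} queued for review")
```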

Can you take some of the marketing assertions about predictive methods and win at the race track or the stock market? I know that I would not invest my retirement savings in systems which purport to tell the future. Software can provide some guidance, but decision making requires human effort. Cost cutting and dreams of sugar plums may be behind some of the bold assertions about the magic of predictive methods. Run a query for “predictive analytics” on Google. You will have an opportunity to work through the assertions directly. Doing one’s homework reduces some of the risks associated with embracing methods which are often a blend of math and marketing. Expensive? We agree. Possibly higher costs and, we would suggest, greater risk in some situations.

Stephen E Arnold, July 19, 2012

Sponsored by Polyspot

Document Management Is Ripe For eDiscovery

July 18, 2012

If you work in any aspect of the legal community, you should be aware that eDiscovery generates a great deal of chatter. As with most search and information retrieval functions, progress is erratic.

While eDiscovery, according to the marketers who flock to Legal Tech and other conferences, will save clients and attorneys millions of dollars in the long run, there will still be some costs associated with it. Fees do not magically disappear, and eDiscovery will accrue its own costs, even if they may be a tad lower than the regular attorney’s time sheets.

One way to keep costs down is to create a document management policy, so that if you are ever taken to court, you reduce the time and money spent in the litigation process. We have mixed feelings about document management. The systems are often problematic because the management guidance and support are inadequate. Software cannot “fix” this type of issue. Marketers, however, suggest software may be up to the task.

JD Supra discusses the importance of a document management plan in “eDiscovery and Document Management.” The legal firm of Warner, Norcross, and Judd wrote a basic strategy guide for JD Supra for people to get started on a document management plan. A plan’s importance is immeasurable:

“With proper document management, you’ll have control over your systems and records when a litigation hold is issued and the eDiscovery process begins, resulting in reduced risk and lower eDiscovery costs. This is imperative because discovery involving electronically stored data — including e-mail, voicemail, calendars, text messages and metadata — is among the most time-consuming and costly phases of any dispute. Ultimately, an effective document management policy is likely to contribute to the best possible outcome of litigation or an investigation.”

The best way to start working on a plan is to outline your purpose and scope—know what you need and want the plan to do. Also specify who will be responsible for each part of the plan—not designating proper authority can leave the entire plan in limbo. Never forget a records retention policy—you are legally required to keep most data for seven years or permanently, but some data can be deleted. Do not pay for data you do not have to keep. Most important of all, provide specific direction for individual tasks, such as scanning, word management, destruction schedules, and observing litigation holds. One last thing: never underestimate the importance of employee training and audit schedules; the latter will sneak up on you before you know it.
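
To make the retention point concrete, here is a hedged sketch of how a destruction job might consult a retention schedule before deleting anything; the record types, periods, and hold logic are invented placeholders, not legal guidance:

```python
# Minimal sketch: a destruction job checks the retention schedule and
# any litigation holds before deleting anything. Periods are invented.
from datetime import date, timedelta

RETENTION_YEARS = {"email": 7, "contract": None, "calendar": 7}  # None = keep forever
LITIGATION_HOLDS = {"contract"}  # record types frozen by an active hold

def may_destroy(record_type: str, created: date, today: date) -> bool:
    years = RETENTION_YEARS.get(record_type)
    if years is None or record_type in LITIGATION_HOLDS:
        return False
    # Approximate the retention window; a real policy would be exact.
    return today >= created + timedelta(days=365 * years)

print(may_destroy("email", date(2004, 6, 1), date(2012, 7, 18)))     # True
print(may_destroy("contract", date(1990, 1, 1), date(2012, 7, 18)))  # False
```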

If, however, you are still hesitant, consider that the lack of a plan can carry some hefty consequences:

  • “Outdated and possibly harmful documents might be available and subject to discovery.
  • “Failure to produce documents in a timely fashion might result in fines and jail time: one large corporation was charged with misleading regulators and not producing evidence in a timely manner and was fined $10 million.
  • Destroying documents in violation of federal statutes and regulations may result in fines and jail time: one provision of the Sarbanes-Oxley Act specifies a prison sentence of up to 20 years for someone who knowingly destroys documents with the intent to obstruct a government investigation.”

A document management plan is a tool meant to guide organizations in managing their data, outlining the tasks associated with it, and preparing for eventual audits and litigation. Having a document management plan in place will make the eDiscovery process go more quickly, but another way to make the process even faster and more accurate is to use litigation support technology and predictive coding, such as that provided by Polyspot.

Here at Beyond Search we have a healthy skepticism for automated content processing. Some systems perform quite well in quite specific circumstances. Examples include Digital Reasoning and Ikanow. Other systems are disappointing. Very disappointing. Who are the disappointing vendors? Not in this free blog. Sign up for Honk!, our no-holds-barred newsletter, and get that opt-in, limited distribution information today.

Whitney Grace, July 18, 2012

Sponsored by Polyspot
