DuPont v Kolon Industries Shines Light on Keyword Search and Spoliation and Ignores Predictive Coding
June 27, 2013
The article on e-discovery 2.0 titled The eDiscovery Trinity: Spoliation Sanctions, Keywords and Predictive Coding explores the three issues most relevant to clients and counsel. One case cited is DuPont v. Kolon Industries, an intellectual property lawsuit in which Kolon complained that DuPont’s forensic experts failed to execute an efficient keyword search: of the nearly 18,000 hits, only about ten percent were relevant. The article explains,
“Kolon then asserted that the “reckless inefficiency” of the search methodology was “fairly attributable to the fact that DuPont ran insipid keywords like ‘other,’ ‘news,’ and ‘mail.’” The court observed how important search terms had become in discovery: “… in the current world of litigation, where so many documents are stored and, hence, produced, electronically, the selection of search terms is an important decision because it, in turn, drives the subsequent document discovery, production and review.”
Ultimately the court favored DuPont, calling its efforts reasonable. The article mentions that although spoliation and keywords were taken into consideration in this particular case, it did not address predictive coding. What would have happened if DuPont had utilized predictive coding is entirely hypothetical, but some do argue that it could have minimized the cost and produced the same group of relevant documents. The article, though it offers an evocative metaphor for eDiscovery, is certainly not the end of the debate.
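The complaint about "insipid keywords" comes down to precision: the fraction of retrieved documents that are actually relevant. A minimal sketch, using invented documents and terms (the ~10 percent figure from the case is the motivation, not these numbers):

```python
# Hypothetical sketch of why generic keywords inflate review costs.
# Documents and terms below are invented for illustration.

def keyword_hits(documents, terms):
    """Return documents containing any of the given search terms."""
    terms = [t.lower() for t in terms]
    return [d for d in documents if any(t in d.lower() for t in terms)]

def precision(hits, relevant):
    """Fraction of retrieved documents that are actually relevant."""
    if not hits:
        return 0.0
    return sum(1 for d in hits if d in relevant) / len(hits)

docs = [
    "Mail the quarterly news summary to the other division.",
    "Confidential mail: Kevlar fiber process parameters attached.",
    "Other news from marketing about the mail room.",
]
relevant = {docs[1]}

broad = keyword_hits(docs, ["other", "news", "mail"])  # generic terms
narrow = keyword_hits(docs, ["kevlar", "fiber"])       # targeted terms

print(precision(broad, relevant))   # low: generic words hit everything
print(precision(narrow, relevant))  # high: targeted terms hit the right documents
```

Every document a broad term drags in must still be reviewed by a human, which is where the cost complaint originates.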
Chelsea Kerwin, June 27, 2013
June 10, 2013
Algorithms that mine our data to predict what we want or need are getting more sophisticated. The MIT Technology Review reports, “With Personal Data, Predictive Apps Stay a Step Ahead.” Recently, Google Now (part of the latest Android version and now included in the Google search app for the iPhone) has captured some attention. That app pulls information from a user’s Gmail, Google Calendar, and Google Web searches to spontaneously present timely, relevant (ideally) information, like traffic conditions between office and home as one is wrapping up the workday.
The next stage of this predictive ability is on its way. Reporter Tom Simonite tells us:
“Engineers at Google, Osito, and elsewhere seek to wring more insights from the data they collect about their users. Osito’s engineers are working to learn more from a person’s past location traces to refine predictions of future activity, says [Osito's Bill]Ferrell. Google Now recently began showing the weather in places it believes you’re headed to soon. It can also notify you of nearby properties for sale if you have recently done a Web search suggesting you’re looking for a new home.
“Machine learning experts at Grokr, a predictive app for the iPhone, have found they can divine the ethnicity, gender, and age of their users to a high degree of accuracy, says CEO Srivats Sampath. ‘That can help us predict places you might like to go better,’ he says. The information will be used to fine-tune the recommendations Grokr offers for restaurants and music events.”
Is the trend creepy or helpful? A bit of both, perhaps. See the article for more on the current state of this “predictive intelligence.”
My apprehension goes beyond privacy and past any discomfort with increasingly sophisticated AI. I am concerned that we are giving more fuel to the already raging confirmation-bias fire. If our devices serve up only information and entertainment we are predisposed to, how likely are we to learn anything new? More broadly, the chances of conversing intelligently with someone on the other side of any professional, cultural, or political divide will continue to dwindle, since each party is relying on a different set of “facts.”
Ah, well, there is no going backward. Perhaps someone could design an app that deliberately suggests bits of content we would otherwise avoid as a way to combat our own prejudices. I would use it, and I suspect other independent thinkers would, too. Any developers out there feel like taking on a socially beneficial project?
Cynthia Murrell, June 10, 2013
December 20, 2012
I am confused, but what’s new? The whole “predictive analytics” rah-rah causes me to reach for my NRR 33 dB bell-shaped foam ear plugs.
Look. If predictive methods worked, there would be headlines in the Daily Racing Form, in the Wall Street Journal, and in the Las Vegas sports books. The cheerleaders for predictive wizardry are pitching breakthrough technology in places where accountability is a little fuzzier than a horse race, stock picking, and betting on football games.
The godfather of cost cutting for legal document analysis. Reverend Thomas Bayes, 1701 to 1761. I heard he said, “Praise be, the math doth work when I flip the numbers and perform the old inverse probability trick. Perhaps I shall apply this to legal disputes when lawyers believe technology will transform their profession.” Yep, partial belief. Just the ticket for attorneys. See http://goo.gl/S5VSR.
I understand that there is PREDICTION which generates tons of money to the person who has an algorithm which divines which nag wins the Derby, which stock is going to soar, and which football team will win a particular game. Skip the fuzzifiers like 51 percent chance of rain. It either rains or it does not rain. In the harsh world of Harrod’s Creek, capital letter PREDICTION is not too reliable.
The lower case prediction is far safer. The assumptions, the unexamined data, the thresholds hardwired into the off-the-shelf algorithms, or the fiddling with Bayesian relaxation factors is aimed at those looking to cut corners, trim costs, or figure out which way to point the hit-and-miss medical research team.
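The “inverse probability trick” attributed to Bayes above is just Bayes’ theorem: it turns P(evidence given relevance) into P(relevance given evidence). A minimal sketch with invented probabilities shows why a keyword hit is a weaker signal than it looks:

```python
# Bayes' theorem sketch. All probabilities are invented for illustration.

def bayes_posterior(prior, likelihood, false_positive_rate):
    """P(relevant | hit), given:
    prior               = P(relevant)
    likelihood          = P(hit | relevant)
    false_positive_rate = P(hit | not relevant)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Suppose 10% of a collection is relevant, and a keyword fires on 80% of
# relevant documents but also on 30% of irrelevant ones.
posterior = bayes_posterior(prior=0.10, likelihood=0.80, false_positive_rate=0.30)
print(round(posterior, 3))  # ≈ 0.229: most hits are still irrelevant
```

This is lower case prediction in action: a defensible shift in partial belief, not an upper case PREDICTION of which nag wins the Derby.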
Which is it: PREDICTION or prediction?
I submit that it is lower case prediction dressed up with upper case MARKETING wordsmithing.
I read “The Amazing Forensic Tech behind the Next Apple, Samsung Legal Dust Up (and How to Hack It).” Now that is a headline. Skip the “amazing”, “Apple”, “Samsung,” and “Hack.” I think the message is that Fast Company has discovered predictive text analysis. I could be wrong here, but I think Fast Company might have been helped along by some friendly public relations type.
Let’s look at the write up.
First, the high profile Apple-Samsung trial becomes the hook for “amazing” technology. The idea is that smart software can grind through the text spit out by the discovery process. In an era of ballooning digital data, it is really expensive to pay humans (even those working at a discount in India or the Philippines) to read the emails, reports, and transcripts.
Let a smart machine do the work. It is cheaper, faster, and better. (Shouldn’t one have to pick two of these attributes?)
Fast Company asserts:
“A couple good things are happening now,” Looby says. “Courts are beginning to endorse predictive coding, and training a machine to do the information retrieval is a lot quicker than doing it manually.” The process of “Information retrieval” (or IR) is the first part of the “discovery” phase of a lawsuit, dubbed “e-discovery” when computers are involved. Normally, a small team of lawyers would have to comb through documents and manually search for pertinent patterns. With predictive coding, they can manually review a small portion, and use the sample to teach the computer to analyze the rest. (A variety of machine learning technologies were used in the Madoff investigation, says Looby, but he can’t specify which.)
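The workflow Looby describes can be sketched with a toy classifier: lawyers label a small sample, a model learns from it, and the machine scores the rest. The naive Bayes classifier and tiny “documents” below are invented for illustration; the quote does not say which machine learning technologies were used, and real systems are far richer:

```python
# Toy predictive-coding sketch: train on a small labeled sample, then
# classify the unreviewed remainder. Invented documents and labels.
from collections import Counter
import math

def train(labeled):
    """labeled: list of (text, label). Returns word counts per label."""
    counts = {"relevant": Counter(), "irrelevant": Counter()}
    for text, label in labeled:
        counts[label].update(text.lower().split())
    return counts

def classify(text, counts):
    """Pick the label whose words best explain the document (add-one smoothing)."""
    vocab = set(counts["relevant"]) | set(counts["irrelevant"])
    best_label, best_score = None, float("-inf")
    for label, words in counts.items():
        total = sum(words.values()) + len(vocab)
        score = sum(math.log((words[w] + 1) / total) for w in text.lower().split())
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# The small, manually reviewed sample...
sample = [
    ("patent claim for fiber process", "relevant"),
    ("fiber manufacturing trade secret", "relevant"),
    ("lunch menu for friday", "irrelevant"),
    ("office party friday schedule", "irrelevant"),
]
model = train(sample)

# ...teaches the machine to score the unreviewed documents.
print(classify("trade secret fiber claim", model))   # relevant
print(classify("friday lunch schedule", model))      # irrelevant
```

The economics follow directly: humans read the sample, the machine reads the other couple million.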
December 14, 2012
England and Wales residents are soon to elect local cop chiefs, and IBM is already trying to help the new force with a little advice regarding predictive model tech. According to the article “IBM Begs Britain’s New Top Cops: C’mon, Set Up Pre-Crime Units” on The Register, the UK already uses IBM’s SPSS statistics module and i2 Analyst’s Notebook, but apparently not to the full potential of the software. Instead of crime prevention, the software is being used for “beancounting” and basic statistical analysis.
The article comments on the potential of the predictive content:
“IBM believe British forces should hit the beat on crime prevention by employing content analysis and predictive modeling using unstructured data – something that comprises 95 per cent of the data police handle in the form of video, written statements, crime reports, media, Tweets – along with the structured stuff. Also, police should be able to draw on data from sources outside of day-to-day policing – groups involved in housing and education.”
The article states that one US police department cooperating with IBM has reduced crime by 30 percent by predicting where crimes would happen.
Seems like IBM is a big motion picture fan. First, we note Watson is eerily similar to HAL, the smart computer in 2001: A Space Odyssey. Now Minority Report is moving the company toward PreCrime if this report is accurate. Next up: Disney’s Episode VII of Star Wars? We will be waiting with our popcorn.
Andrea Hayden, December 14, 2012
October 30, 2012
I heard the cheerleading over the news broadcasts about the terrible storm. I urge you to read “Google Now: Behind the Predictive Future of Search.” But the “real” story from the “real” journalist is the subtitle: “How Google Learned to Un-Fragment Itself and Create the Next Big Thing.” Faint praise. No. Bold assertions about the “un-fragmented” Google.
The guts of the story pivot on Google’s new service Google Now. The idea is that “now” information is what defines the modern mobile user. I use my mobile as a phone and to check email. Therefore, I struggle with the “predictive future” thing.
The idea is that
… your phone is more “Personal Assistant” than “Bar bet settler.” The difference is that the former actually understands what you need while the latter is a blunt search instrument.
Universal appeal is assumed. The secret ingredient for the predictive search magic is Android 4.2.
Here’s the write up’s digest of the “big thing”:
It’s essentially an app that combines two important functions: voice search and “cards” that bubble up relevant information on a contextual basis. Actually, Google Now technically only refers to the ambient information part of the equation, a branding kerfuffle that distinguishes it from Apple’s Siri product yet still causes confusion. Those cards might contain local restaurants, the traffic on your commute home, or when your flight is about to take off. They appear automatically as Google tries to guess the information you’ll need at any given moment. While it seems like a relatively simple service, it’s only really possible because of the massive amount of computational power Google can leverage alongside the massive amount of data Google knows about you thanks to your searches.
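The “cards” mechanism the write up describes can be sketched as rules mapping the user’s current context to pieces of ambient information. The contexts, rules, and card text below are invented; Google Now actually infers context from search history, location, email, and large-scale machine learning:

```python
# Minimal sketch of contextual "cards": rules match the current context
# and surface information before the user asks. All rules are invented.

def cards_for(context, rules):
    """Return the cards whose conditions match the current context."""
    return [card for condition, card in rules if condition(context)]

rules = [
    (lambda c: c.get("hour", 0) >= 17 and c.get("at") == "office",
     "Traffic on your commute home: 25 minutes"),
    (lambda c: c.get("upcoming_flight", False),
     "Your flight departs in 3 hours"),
    (lambda c: c.get("searched_recently") == "restaurants",
     "Nearby restaurants open now"),
]

context = {"hour": 18, "at": "office", "upcoming_flight": False,
           "searched_recently": "restaurants"}
print(cards_for(context, rules))
# Two cards match: the commute card and the restaurants card.
```

The hard part, per the article, is not the rule table but the “massive amount of data Google knows about you” that fills in the context.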
The predictive search functionality has been part of Google Web search since August 2012. The key point is:
These new cards are actually similar to a feature that Google added to its web search results this past August, both in content and in style. That’s probably not an accident — if you assume Google has already won the battle for search, the next battle is giving you information before you even search for it. When it comes to deciding which data to give you, Barra tells us that Google has “a pipeline [...], possibly in the hundreds of cards” from its many engineering teams. Rather than flood users with all of those new cards, Google is taking a slow and steady approach to adding those new features — if only because right now it can only add those cards with a software update.
The numerical recipes behind the Now service include neural networks (what I call smart software) and knowledge graphs (entity relationships). Both of these methods have been in development and use for years. Google itself owns a chunk of a company which has quite sophisticated predictive technology. There is more to come from Google, including hot visualizations and improved voice interaction with mobile devices.
If you want to see a write up that puts the Dallas Cowboy cheerleaders to shame, check out this story. Like the cheerleaders, there will be changes in the line up with each update cycle. For now, the magic is in the eye of the True Believer.
I just make voice calls and check email.
Stephen E Arnold, October 31, 2012
October 8, 2012
Darned amazing. It is like rocket science for dummies. The Wall Street Journal’s Market Watch reports, “Recommind Announces ‘Predictive Coding for Dummies’.” The publication, part of the “for Dummies” series of manuals, aims to help document reviewers speed and automate their process. The press release explains:
“This guide is a definitive text covering the challenges of document review in eDiscovery, what makes it vital to legal cases, and what to look for in an eDiscovery solution. ‘Predictive Coding for Dummies’ also outlines real-world cost savings through Predictive Coding solutions like Axcelerate Review & Analysis, Recommind’s leading end-to-end eDiscovery product. . . .
“Through hundreds of implementations, Recommind understands firsthand the high cost associated with using old approaches to document review and the benefits an eDiscovery solution provides. Recommind’s eDiscovery solution is designed to address the specific context of today’s law firms and legal departments, including the ever-increasing volume of information.”
Though it sounds like the guide may amount to an info-advertisement for Recommind’s products, you may be able to glean some useful nuggets from it. Chapter titles include “Information Explosion and Electronic Discovery”; “Putting Predictive Coding to Work”; and “The Top Benefits of Predictive Coding.”
Cynthia Murrell, October 8, 2012
September 10, 2012
You may have heard of the deep extraction company Attensity. There is another company in a similar business with the name inTTENSITY. Note the playful misspelling of the common word “intensity.” What does a person looking for the company inTTENSITY get when he or she runs a query on Google? Look at what Google’s autocomplete suggestions recommend when I type intten:
The company’s spelling appears along with the less helpful “interstate ten”, “internet explorer ten”, and “internet icon top ten.” If I enter “inten”, I don’t get the company name. No surprise.
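The behavior is what plain prefix matching predicts. A minimal sketch, with an invented suggestion list and popularity scores (Google’s real system blends query logs, location, personalization, and more):

```python
# Minimal autocomplete sketch: rank suggestions sharing the typed prefix.
# The suggestion list and popularity scores are invented for illustration.

def autocomplete(prefix, suggestions, limit=4):
    """Return the most popular suggestions starting with the prefix."""
    prefix = prefix.lower()
    matches = [(s, pop) for s, pop in suggestions.items()
               if s.lower().startswith(prefix)]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [s for s, _ in matches[:limit]]

popularity = {
    "inttensity": 10,
    "intense workout": 900,
    "intensity definition": 700,
    "intention": 500,
}
print(autocomplete("intten", popularity))  # only the unusual spelling matches
print(autocomplete("inten", popularity))   # popular queries crowd the company out
```

The doubled “t” makes “intten” a prefix no popular query shares, so the company surfaces; drop a “t” and it vanishes behind higher-volume queries.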
Is Google’s autocomplete a help or hindrance? The answer, in my opinion, is that it depends on the user and what he or she is seeking.
I just read “Germany’s Former First Lady Sues Google For Defamation Over Autocomplete Suggestions.” According to the write up:
When you search for “Bettina Wulff” on Google, the search engine will happily autocomplete this search with terms like “escort” and “prostitute.” That’s obviously not something you would like to be associated with your name, so the wife of former German president Christian Wulff has now, according to Germany’s Süddeutschen Zeitung, decided to sue Google for defamation. The reason why these terms appear in Google’s autocomplete is that there have been persistent rumors that Wulff worked for an escort service before she met her husband. Wulff categorically denies that this is true.
The article explains that autocomplete has been the target of criticism before. The concluding statement struck me as interesting:
In Japan, a man recently filed a suit against Google after the autocomplete feature started linking his names with a number of crimes he says he wasn’t involved in. A court in Japan then ordered Google to delete these terms from autocomplete. Google also lost a similar suit in Italy in 2011.
I have commented about the interesting situations predictive algorithms can create. I assume that Google’s numerical recipes chug along like a digital and intent-free robot.
September 4, 2012
I know there are quite a few companies that depend upon, integrate with, and otherwise cheerlead for Microsoft SharePoint. Heck, there are consultants a-plenty who tout their “expertise” with SharePoint. The problem is that some folks are not taking advantage of SharePoint’s glories. There are also some, if the data in “Most Popular Content Management Systems by Country” are accurate, who may never embrace SharePoint’s wonderfulness.
The write up appeared in W3Tech and makes clear that the top dog in content management is WordPress, followed by Joomla. Both of these are open source systems. The article asserts:
WordPress, as the most popular CMS overall, also dominates this picture. It is the number one system in most countries in North and South America, Europe and Oceania, many countries in Asia including Russia and India, and surprisingly few countries in Africa. Joomla dominates a fair number of countries in Africa, for example Algeria, Morocco and Nigeria, several countries in Central and South America, such as Venezuela, Ecuador and Cuba, two countries in Europe, Greece and Bosnia, as well as Afghanistan and a number of other countries in Asia.
Are SharePoint centric vendors ignoring the market shifts in content management and search?
So where is SharePoint popular? Where do companies like BA Insight, Concept Searching, dtSearch, Recommind, SurfRay, and dozens upon dozens of other SharePoint supporters need to focus their sales efforts? According to W3Techs:
SharePoint is the number one system in Saudi Arabia, Egypt, Qatar and Lebanon as well as on .mil sites, which again don’t show up as separate country in our chart.
And China? Bad news. W3Tech says:
Discuz is a Chinese system that dominates its home market with 49% market share, but is not so much used outside China.
Thank goodness for Skype and Webex. A sales call and conference visit in these countries can whack an expense budget.
Many stakeholders in search and content processing companies believe that SharePoint as a market will keep on growing and growing. That may indeed happen. However, SharePoint centric vendors are likely to find themselves forced to pivot. At this time, a couple of search and content processing vendors have begun the process. Many have not, and I think that as the cost of sales and marketing rises, investors will want to learn about “life after SharePoint.”
How quickly will this message disseminate? Paddling around in Harrod’s Creek, I think that some companies will continue to ride the SharePoint bandwagon. That’s okay, but the “sudden pivot” which Vivisimo is trying to pull off with its “big data” positioning can leave some people confused.
SharePoint has been a money machine for third parties and consultants for a long time. The history of SharePoint is rarely discussed. The focus is on making the system work. That approach was a money maker when there was strong cash flow and liberal credit. As organizations look for ways to cut costs, open source content management systems seem to be taking hold. We are now tracking these important market shifts in our new service Text Radar.
If the W3Tech data are incorrect, the SharePoint vendors with their assertions about smart algorithms and seamless integration will blast past Autonomy’s record search revenues of almost $1 billion. But most search vendors are not Autonomy and are likely to be mired in the $3 to $15 million range where the vast majority of search and content processing vendors dwell.
Could the future be open source and high value, for fee add ons that deliver a solid punch for licensees? We have analyzed the open source search and content processing sector for IDC, and open source as an alternative to SharePoint content management, content processing, and search may have some wind in its sales. How many SharePoint centric vendors will generate a hundred million in revenue this year? Perhaps zero?
Stephen E Arnold, September 4, 2012
Sponsored by Augmentext
June 18, 2012
We’ve heard before how data analysis will change how we view and use information, and it is already having a huge impact on the legal system. The Pittsburgh Post-Gazette has the following headline, “Pittsburgh Lawyer Wins Landmark Case Involving Use of Predictive Coding In Discovery Process.” Thomas Gricks III, a partner at Schnader Harrison Segal & Lewis, filed to have predictive coding deemed usable in circuit court for ten suits his firm represented. Gricks had more than 2 million documents to sift through, and he used predictive coding to characterize the files so he would only have to review a smaller portion. His strategy worked and has set a precedent for the legal system.
Here’s a prediction for the future:
“Rather than keyword searching of documents, predictive coding uses analytic searching that looks for concepts, said Peter Mansmann, chief executive of Precise Inc., a Downtown-based firm that provides trial consulting, e-discovery and document retention services. ‘It’s kind of new to the legal industry, and though it’s statistically shown to be accurate, people are afraid to use it because they’re not sure if it’s admissible in court,’ Mr. Mansmann said. ‘The importance of this case is that now you have a judge who said, ‘Yes. I’m going to accept this as a reasonable approach to handling discovery. It will open the door for this to become a more accepted method. And it’s a huge cost savings. Typically, attorney review is the most expensive part of the process.’ ”
Had Gricks not used predictive coding, he would still have several interns and paralegals sifting through two million documents to find evidence for his clients. He would have had to pay and train those people, but predictive coding takes out man-hours and lowers litigation costs. Content Analyst offers predictive coding technology and solutions as one of the many services it offers its clients. The Content Analyst technology is available for licensing. Worth a look.
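A back-of-the-envelope sketch of the savings makes the point. The only figure from the article is the two million documents; the review speed, hourly cost, and training-sample size below are invented assumptions:

```python
# Hypothetical cost arithmetic for manual review vs. predictive coding.
# Only DOCS comes from the article; the other numbers are assumptions.

DOCS = 2_000_000
DOCS_PER_HOUR = 50    # assumed manual review speed per reviewer
RATE = 60             # assumed cost per reviewer hour, USD
SAMPLE = 20_000       # assumed manually coded training sample

manual_cost = DOCS / DOCS_PER_HOUR * RATE
predictive_cost = SAMPLE / DOCS_PER_HOUR * RATE  # ignoring software fees

print(f"${manual_cost:,.0f}")      # $2,400,000
print(f"${predictive_cost:,.0f}")  # $24,000
```

Even after adding licensing fees and quality-control review on top of the sample, the gap is why "attorney review is the most expensive part of the process" is the line that matters.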
Whitney Grace, June 18, 2012
Sponsored by Content Analyst
June 15, 2012