IBM Uses Watson Analytics Freebie Academic Program to Lure in Student Data Scientists

May 6, 2016

The article on eWeek titled IBM Expands Watson Analytics Program, Creates Citizen Data Scientists zooms in on the expansion of the IBM Watson Analytics academic program, which began last year at 400 global universities. The next phase, according to Watson Analytics public sector manager Randy Messina, is to get Watson Analytics into the hands of students beyond computer science or technical courses. The article explains,

“Other examples of universities using Watson Analytics include the University of Connecticut, which is incorporating Watson Analytics into several of its MBA courses. Northwestern University is building Watson Analytics into the curriculum of its Predictive Analytics, Marketing Mix Models and Entertainment Marketing classes. And at the University of Memphis Fogelman College of Business and Economics, undergraduate students are using Watson Analytics as part of their initial introduction to business analytics.”

Urban planning, marketing, and health care disciplines have also ushered in Watson Analytics for classroom use. Great, so students and professors get to use and learn through this advanced and intuitive platform. But that is where it gets a little shady. IBM is also interested in winning over these students and leading them into the data analytics field. Nothing wrong with that given the shortage of data scientists, but considering the free program and the creepy language IBM uses like “capturing mindshare among young people,” one gets the urge to warn these students to run away from the strange Watson guy, or at least proceed with caution into his lair.

Chelsea Kerwin, May 6, 2016

Sponsored by, publisher of the CyberOSINT monograph


Open Source Boundaries

July 3, 2015

Now here is an interesting metaphor to explain how open source is sustainable.  On, Bryan Behrenshausen posted the article, “Making Collaboration Sustainable” that references the famous scene from Tom Sawyer, where the title character is forced to whitewash a fence by his Aunt Polly.  He does not want to do it, but is able to persuade his friends that whitewashing is fun and has them pay him for the privilege.

Jim Whitehurst refers to it as the “Tom Sawyer” model, where organizations treat communities as gullible chumps who will work without proper compensation.  It is a type of crowdsourcing, where the organizations benefit from the communities’ resources to further their own goals.  Whitehurst continues that this is not a sustainable approach to crowdsourcing.  It could even backfire at some point.

He goes on to say that open source requires a different mindset, one built on commitment from its contributors, in which everyone is equal and must be respected for their efforts.

“Treating internal and external communities as equals, really listening to and understanding their shared goals, and locating ways to genuinely enhance those goals—that’s the key to successfully open sourcing a project. Crowdsourcing takes what it can; it turns people and their ideas into a resource. Open sourcing reciprocates where it can; it channels people and their ideas into a productive community.”

The entire goal of open source is to work with a community that coalesces around shared beliefs and passions. Behrenshausen concludes that an organization might find itself totally changed by engaging with an open source community, and that the change could be for the better. Is that a good thing or a bad thing? It is, however, concerning for enterprise search solutions.

Whitney Grace, July 3, 2015

Sponsored by, publisher of the CyberOSINT monograph


Kroll Ontrack Enjoys Predictive Coding Award

October 20, 2014

What happened to Recommind and ZyLAB? We thought they were eDiscovery frontrunners, but now BusinessWire tells us, “Kroll Ontrack Voted Best Predictive Coding Solution in New York Law Journal Survey.” The 2014 survey tallied votes in 90 categories from readers of the law journal ALM. The press release quotes Kroll Ontrack’s VP of product management, John Grancarich:

“We are honored to be chosen as the leading predictive coding technology in the industry by New York Law Journal readers. With a focus on amplifying the power of your best reviewers, this award demonstrates the impact Review predictive coding technology has in driving increased speed, consistency and accuracy in document review.”

The strength of the predictive coding platform, we are told, comes from three parts that work together: workflow technology, “smart training” technology, and quality control/sampling technology. The write-up emphasizes:

“Given the innovative volume control mechanisms of Review, the award-winning power of Kroll Ontrack’s predictive coding is available throughout the entire culling, filtering, early data assessment and review experience. For more information about Kroll Ontrack predictive coding technology, visit or watch a demo at”

Headquartered in Eden Prairie, Minnesota, Kroll Ontrack launched as a software firm in 1985. The company’s work with damaged hard drives led to a focus on data recovery. Now, Kroll Ontrack supplies a wealth of data-related solutions to customers in the legal, corporate, and government arenas.

Cynthia Murrell, October 20, 2014

Sponsored by, developer of Augmentext

Google Searches, Prediction, and Fabulous Stock Market Returns?

July 28, 2014

I read “Google Searches Hold Key to Future Market Crashes.” The main idea in my opinion is:

Moat [female big thinker at Warwick Business School] continued, “Our results are in line with the hypothesis that increases in searches relating to both politics and business could be a sign of concern about the state of the economy, which may lead to decreased confidence in the value of stocks, resulting in transactions at lower prices.”

So will the Warwick team cash in on the stock market?

Well, there is a cautionary item as well:

“Our results provide evidence of a relationship between the search behavior of Google users and stock market movements,” said Tobias Preis, Associate Professor of Behavioral Science and Finance at Warwick Business School. “However, our analysis found that the strength of this relationship, using this very simple weekly trading strategy, has diminished in recent years. This potentially reflects the increasing incorporation of Internet data into automated trading strategies, and highlights that more advanced strategies are now needed to fully exploit online data in financial trading.”

Rats. Quants are already on this it seems.

What’s fascinating to me is that the Warwick experts overlooked a couple of points; namely:

  1. Google is using its own predictive methods to determine what users see when they get a search result based on the behavior of others. Recursion, anyone?
  2. Google provides more searches with each passing day to those using mobile devices. By their nature, traditional desktop queries are not exactly the same as mobile device searches. As a workaround, Google uses clusters and other methods to give users what Google thinks the user really wants. Advertising, anyone?
  3. The stock pickers that are the cat’s pajamas at the B school have to demonstrate their acumen on the trading floor. Does insider trading play a role? Does working at a Goldman Sachs-type of firm help a bit?

Like perpetual motion, folks will keep looking for a way to get an edge. Why are large international banks paying some hefty fines? Humans, I believe, not algorithms.

Stephen E Arnold, July 28, 2014

ZyLAB’s Mary Mack Urges Caution with Predictive Coding

July 9, 2014

An article titled ZyLAB’s Mary Mack on Predictive Coding Myths and Traps for the Unwary on The eDisclosure Information Project offers some insight into the trend of viewing predictive coding as some form of “magic.” The article quickly brushes this idea aside and returns predictive coding to the realm of statistics and technology. It quotes Mary Mack of ZyLAB,

“Machine learning and artificial intelligence for legal applications is our future. It’s a wonderful advance that the judiciary is embracing machine-assisted review in the form of predictive coding. While we steadily move into the second and much less risky generation of predictive coding, there are still traps and pitfalls that are better considered early for mitigation. This session and the session on eDiscovery taboos will expose a few concerns to consider when evaluating predictive coding for specific or portfolio litigation.”

In this article ZyLAB offers a counterpoint to Recommind, which asserted in a recent article that predictive coding is to eDiscovery what a GPS is to driving cross-country. ZyLAB prefers a much more cautious approach to the innovative technology. The article stresses that an objective, fact-based discussion of the merits and pitfalls of predictive coding is a necessary step in its growth.

Chelsea Kerwin, July 09, 2014

Sponsored by, developer of Augmentext

Predictive Coding for eDiscovery Users in a Hurry

July 9, 2014

The article on Recommind titled Why eDiscovery Needs GPS (And a Soundtrack) whimsically applies the basic tenets of GPS to the eDiscovery process with the aid of song titles. If you can get through the song titles bit, there is some meat to the article, though not much. The author suggests several areas where predictive coding might make eDiscovery easier and more efficient. He explains his thinking,

“A good eDiscovery navigator will help you take a reliable Estimation Sample… early on to determine the statistically likely number of responsive documents for any issue in your matter.  It will then plot that destination clearly, along with the appropriate margin of error, and show your status toward it at every point along The Long and Winding Road. It should also clearly display the responsiveness levels you’re experiencing with each iteration as you review the machine-suggested document batches.”
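The “Estimation Sample” idea in the quote can be sketched in a few lines. What follows is a minimal illustration, not Recommind’s method: it assumes a simple random sample and the textbook normal approximation for a proportion, and the function name and every number in it are hypothetical.

```python
import math

def estimate_responsive(sample_size, responsive_in_sample, collection_size, z=1.96):
    """Estimate the number of responsive documents from a simple random sample.

    Assumes the normal approximation to the binomial; z=1.96 corresponds to
    roughly 95 percent confidence. Returns (estimate, low, high) as counts.
    """
    p = responsive_in_sample / sample_size
    # Margin of error on the sampled proportion.
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - margin)
    high = min(1.0, p + margin)
    return (
        round(p * collection_size),
        round(low * collection_size),
        round(high * collection_size),
    )

# Hypothetical matter: 1,500 documents sampled, 180 coded responsive,
# 1,000,000 documents in the full collection.
estimate, low, high = estimate_responsive(1500, 180, 1_000_000)
print(f"Estimated responsive documents: {estimate:,} (between {low:,} and {high:,})")
```

A larger sample narrows the range between the low and high figures, which is the “margin of error” the author wants plotted alongside the destination.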

The type of guidance and efficiency that predictive coding offers is already being utilized by companies conducting internal investigations and “reviewing data already seized by a regulatory agency.” The author conditions the usefulness of predictive coding on its being flexible and able to recalculate based on any change in direction. When speed and effectiveness are of paramount importance, a GPS for eDiscovery might be the best possible tool.

Chelsea Kerwin, July 09, 2014

Sponsored by, developer of Augmentext

DuPont v Kolon Industries Shines Light on Keyword Search and Spoliation and Ignores Predictive Coding

June 27, 2013

The article on e-discovery 2.0 titled The eDiscovery Trinity: Spoliation Sanctions, Keywords and Predictive Coding explores the three issues most relevant to clients and counsel. One case cited is DuPont v. Kolon Industries, an intellectual property lawsuit in which Kolon’s complaint was that DuPont’s forensic experts failed to exercise an efficient keyword search, meaning that of the nearly 18,000 hits only about ten percent were relevant. The article explains,

“Kolon then asserted that the “reckless inefficiency” of the search methodology was “fairly attributable to the fact that DuPont ran insipid keywords like ‘other,’ ‘news,’ and ‘mail.’” The court observed how important search terms had become in discovery: “… in the current world of litigation, where so many documents are stored and, hence, produced, electronically, the selection of search terms is an important decision because it, in turn, drives the subsequent document discovery, production and review.”
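Kolon’s complaint is about what search people call precision: the share of hits that are actually relevant. The toy sketch below is only an illustration of that measure. The documents, the relevance labels, and both keyword lists are invented, and the helper functions are hypothetical; nothing here comes from the case record.

```python
def keyword_hits(documents, keywords):
    """Return ids of documents containing any keyword (case-insensitive, whole words)."""
    wanted = {k.lower() for k in keywords}
    return {
        doc_id
        for doc_id, text in documents.items()
        if wanted & set(text.lower().split())
    }

def precision(hits, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(hits & relevant) / len(hits) if hits else 0.0

# Invented collection; documents 1 and 2 are treated as the relevant ones.
documents = {
    1: "trade secret formula for the spinning process",
    2: "confidential formula notes",
    3: "company news and other mail",
    4: "lunch menu news",
    5: "other mail about parking",
}
relevant = {1, 2}

# Generic terms sweep in everyday chatter; specific terms do not.
broad = keyword_hits(documents, ["other", "news", "mail", "formula"])
narrow = keyword_hits(documents, ["formula", "confidential", "secret"])

print("broad precision:", precision(broad, relevant))
print("narrow precision:", precision(narrow, relevant))
```

Both lists find the two relevant documents here; the generic list simply buries them under unrelated hits that a reviewer still has to read.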

Ultimately the court favored DuPont, calling its efforts reasonable. The article mentions that although spoliation and keywords were taken into consideration in this particular case, the case did not address predictive coding. What would have happened if DuPont had utilized predictive coding is entirely hypothetical, but some do argue that it could have minimized the cost and produced the same group of relevant documents. The article, though an evocative metaphor for eDiscovery, is certainly not the end of the debate.

Chelsea Kerwin, June 27, 2013

Sponsored by, developer of Augmentext

Predictive Apps Continue to Evolve

June 10, 2013

Algorithms that mine our data to predict what we want or need are getting more sophisticated. The MIT Technology Review reports, “With Personal Data, Predictive Apps Stay a Step Ahead.” Recently, Google Now (part of the latest Android version and now included in the Google search app for the iPhone) has captured some attention. That app pulls information from a user’s Gmail, Google Calendar, and Google Web searches to spontaneously present timely, relevant (ideally) information, like traffic conditions between office and home as one is wrapping up the workday.

The next stage of this predictive ability is on its way. Reporter Tom Simonite tells us:

“Engineers at Google, Osito, and elsewhere seek to wring more insights from the data they collect about their users. Osito’s engineers are working to learn more from a person’s past location traces to refine predictions of future activity, says [Osito’s Bill] Ferrell. Google Now recently began showing the weather in places it believes you’re headed to soon. It can also notify you of nearby properties for sale if you have recently done a Web search suggesting you’re looking for a new home.

“Machine learning experts at Grokr, a predictive app for the iPhone, have found they can divine the ethnicity, gender, and age of their users to a high degree of accuracy, says CEO Srivats Sampath. ‘That can help us predict places you might like to go better,’ he says. The information will be used to fine-tune the recommendations Grokr offers for restaurants and music events.”

Is the trend creepy or helpful? A bit of both, perhaps. See the article for more on the current state of this “predictive intelligence.”

My apprehension goes beyond privacy and past any discomfort with increasingly sophisticated AI. I am concerned that we are giving more fuel to the already raging confirmation-bias fire. If our devices serve up only information and entertainment we are predisposed to, how likely are we to learn anything new? More broadly, the chances of conversing intelligently with someone on the other side of any professional, cultural, or political divide will continue to dwindle, since each party is relying on a different set of “facts.”

Ah, well, there is no going backward. Perhaps someone could design an app that deliberately suggests bits of content we would otherwise avoid as a way to combat our own prejudices. I would use it, and I suspect other independent thinkers would, too. Any developers out there feel like taking on a socially beneficial project?

Cynthia Murrell, June 10, 2013

Sponsored by, developer of Augmentext

Predictive Coding: Who Is on First? What Is the Betting Game?

December 20, 2012

I am confused, but what’s new? The whole “predictive analytics” rah rah causes me to reach for my NRR 33 dB bell shaped foam ear plugs.

Look. If predictive methods worked, there would be headlines in the Daily Racing Form, in the Wall Street Journal, and in the Las Vegas sports books. The cheerleaders for predictive wizardry are pitching breakthrough technology in places where accountability is a little fuzzier than a horse race, stock picking, and betting on football games.


The godfather of cost cutting for legal document analysis. Reverend Thomas Bayes, 1701 to 1761. I heard he said, “Praise be, the math doth work when I flip the numbers and perform the old inverse probability trick. Perhaps I shall apply this to legal disputes when lawyers believe technology will transform their profession.” Yep, partial belief. Just the ticket for attorneys. See

I understand that there is PREDICTION which generates tons of money to the person who has an algorithm which divines which nag wins the Derby, which stock is going to soar, and which football team will win a particular game. Skip the fuzzifiers like 51 percent chance of rain. It either rains or it does not rain. In the harsh world of Harrod’s Creek, capital letter PREDICTION is not too reliable.

The lower case prediction is far safer. The assumptions, the unexamined data, the thresholds hardwired into the off-the-shelf algorithms, or the fiddling with Bayesian relaxation factors is aimed at those looking to cut corners, trim costs, or figure out which way to point the hit-and-miss medical research team.

Which is it? PREDICTION or prediction?

I submit that it is lower case prediction with an upper case MARKETING wordsmithing.

Here’s why:

I read “The Amazing Forensic Tech behind the Next Apple, Samsung Legal Dust Up (and How to Hack It).” Now that is a headline. Skip the “amazing”, “Apple”, “Samsung,” and “Hack.” I think the message is that Fast Company has discovered predictive text analysis. I could be wrong here, but I think Fast Company might have been helped along by some friendly public relations type.

Let’s look at the write up.

First, the high profile Apple Samsung trial became the hook for “amazing” technology. The idea is that smart software can grind through the text spit out from a discovery process. In the era of ballooning digital data, it is really expensive to pay humans (even those working at a discount in India or the Philippines) to read the emails, reports, and transcripts.

Let a smart machine do the work. It is cheaper, faster, and better. (Shouldn’t one have to pick two of these attributes?)

Fast Company asserts:

“A couple good things are happening now,” Looby says. “Courts are beginning to endorse predictive coding, and training a machine to do the information retrieval is a lot quicker than doing it manually.” The process of “Information retrieval” (or IR) is the first part of the “discovery” phase of a lawsuit, dubbed “e-discovery” when computers are involved. Normally, a small team of lawyers would have to comb through documents and manually search for pertinent patterns. With predictive coding, they can manually review a small portion, and use the sample to teach the computer to analyze the rest. (A variety of machine learning technologies were used in the Madoff investigation, says Looby, but he can’t specify which.)
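The “review a small portion, teach the computer the rest” loop in that quote can be sketched with a tiny text classifier. This is a toy under loud assumptions: the write-up does not say which algorithms any vendor uses, so a plain naive Bayes model stands in (a nod to the Reverend above), and the documents, labels, and function names are all invented.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(labeled_docs):
    """Build word counts per label from (text, label) pairs coded by reviewers."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of reviewed documents
    vocab = set()
    for text, label in labeled_docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the most probable label for an unreviewed document."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Start from the log prior: how common the label was in the sample.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training carry no evidence
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented sample that human reviewers have already coded.
reviewed = [
    ("patent license royalty dispute", "responsive"),
    ("design patent infringement claim", "responsive"),
    ("team lunch friday schedule", "not responsive"),
    ("holiday party schedule reminder", "not responsive"),
]
model = train(reviewed)

# The machine suggests codes for the rest of the pile.
unreviewed = ["royalty terms for the patent license", "friday party reminder"]
for doc in unreviewed:
    print(doc, "->", classify(doc, model))
```

Four training documents prove nothing, of course. The point is the shape of the workflow: the quality of the human-coded sample sets the ceiling on what the machine can do with everything else.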


IBM Asks Britain to Discover Full Potential of Crime Analysis Software

December 14, 2012

England and Wales residents are soon to elect local cop chiefs, and IBM is already trying to help the new force with a little advice regarding predictive model tech. According to the article “IBM Begs Britain’s New Top Cops: C’mon, Set Up Pre-Crime Units” on The Register, the UK already uses IBM’s SPSS statistics module and i2 Analyst’s Notebook, but apparently not to the full potential of the software. Instead of crime prevention, the software is being used for “beancounting” and basic statistical analysis.

The article comments on the potential of the predictive content:

“IBM believe British forces should hit the beat on crime prevention by employing content analysis and predictive modeling using unstructured data – something that comprises 95 per cent of the data police handle in the form of video, written statements, crime reports, media, Tweets – along with the structured stuff. Also, police should be able to draw on data from sources outside of day-to-day policing – groups involved in housing and education.”

The article states that IBM has joined forces with US police, and that one cooperating department has reduced crime by 30 percent by predicting where crimes would happen.

Seems like IBM is a big motion picture fan. First, we note Watson is eerily similar to 2001’s smart computer HAL. Now Minority Report is moving the company toward PreCrime if this report is accurate. Next up: Disney’s Episode VII of Star Wars? We will be waiting with our popcorn.

Andrea Hayden, December 14, 2012

Sponsored by, developer of Augmentext
