Email Analysis
July 5, 2008
This summer I have been asked about email analysis on two different occasions. In order to respond to these requests, I had to grind through my archive of email-related information. I wrote about Clearwell Systems and its approach earlier this year. You can read this essay here.
I cannot reproduce the information my paying customers received. I can take a representative company–in this case, Stratify, a unit of Iron Mountain–and show you two different screen shots. These layouts and representations are the property of Stratify, and I am including them in this essay for two reasons:
- Stratify has been one of the early players in text analytics. First as Purple Yogi and then as Stratify, the company was engaged in the difficult missionary marketing needed to make nonbelievers into believers.
- The company has gained some traction in the legal market, which, in the US, is a booming sector. The problems of the economy translate into a harvest of riches for some legal firms. Email is a big deal in discovery, and few have the resources to get a human to read all the baloney that zooms around an organization involved in a legal matter.
The Problem
You know the problem. Email was once ASCII shot between two people on Arpanet. Today email is the bane of the knowledge worker. The volume is high. The storage systems antiquated. The attachments madden the sane. The people using email forget that the messages live on different servers and can, in the process of discovery, be copied to a storage device and delivered to the attorney or attorneys who have to find something germane to the legal matter in the terabytes of digital data.
To summarize the challenges:
- Email volume (lots of it, maybe a billion messages in a mid-sized organization every year)
- Email attachments (tough to find the “right” one)
- Email crashes (restores don’t always work, which you probably know first hand)
- Email sent as if it were a one-time, secret communication
- Email with recipients who, by definition, have some relationship.
For a lawyer, email is good and bad. It’s good if one finds a smoking gun or better yet a gun in the act of shooting. It’s bad if the bullets are coming at the opposing side’s legal eagles, worse if the bullet shoots a legal eagle out of the sky with a slug through the brain.
Ergo: email is a big, big deal in the information world of litigation.
The Solution
The fix is obvious–search. Actually to be precise, the conundrums of email invite text processing, text analytics, link analysis, relationship extraction, entity extraction, and other nifty methods.
The basics of email analysis are simple on the surface, more complicated under the hood and out of sight of non-technical types like lawyers: [a] copy the email to a fast storage device, [b] tell the email analysis program to index the email, [c] run key word searches or browse the outputs, [d] make notes, print out email, and read individual documents of interest, [e] repeat, taking care to bill for the time. (That’s the best part of email analysis. It’s quicker than manual methods, but the systems have to have a baby sitter. Those operating these systems can bill without working up too much of a mental headache. Automated processes do make some legal thinking less painful. The best part is billing for this less stressful time.)
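Steps [b] and [c] can be sketched in a few lines. What follows is a minimal illustration, not how Stratify or any other vendor does it: it assumes plain-text, single-part messages, uses only the Python standard library, and the names build_index, search, and SAMPLE are made up for this example. Attachments, multipart messages, and deduplication are ignored.

```python
import re
from collections import defaultdict
from email import message_from_string


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(raw_messages):
    """Map each term to the set of message numbers that contain it."""
    index = defaultdict(set)
    for msg_id, raw in enumerate(raw_messages):
        msg = message_from_string(raw)
        # Assumption: only single-part, plain-text bodies are indexed.
        body = "" if msg.is_multipart() else msg.get_payload()
        text = f"{msg.get('Subject', '')} {body}"
        for term in tokenize(text):
            index[term].add(msg_id)
    return index


def search(index, query):
    """Return the messages that contain every term in the query (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data, invented for this sketch.
SAMPLE = [
    "From: alice@example.com\nTo: bob@example.com\n"
    "Subject: Pricing memo\n\nPlease review the rebate schedule.",
    "From: bob@example.com\nTo: carol@example.com\n"
    "Subject: Lunch\n\nRebate talk can wait until Friday.",
]

index = build_index(SAMPLE)
hits = search(index, "rebate schedule")
```

A real system layers stemming, phrase handling, attachment extraction, and access controls on top of this. The inverted index is only the floor.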
What do these systems show the user? The illustration below shows a Stratify search screen. Since I obtained this screen shot, Stratify has probably updated the interface. The main features are what interest us here. Take a look at what the Stratify system user sees when analyzing processed email:
Stratify’s email visualization
The principal features of this display are:
- Simplicity. You don’t want to confuse attorneys
- A picture showing people and their relationships as discerned by the system. Remember, an email can be sent to a person unrelated to a subject either by accident or for some other reason such as a “this is what I am doing” courtesy
- Links on the right hand panel to make it easy for the user to poke around by sender, topic, etc.
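The raw material for a people-and-relationships picture is the message headers. Here is a rough sketch, assuming nothing about Stratify's actual method: count how often each sender writes to each recipient, and treat the counts as weighted edges in a graph. The function name relationship_counts and the sample messages are hypothetical.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def relationship_counts(raw_messages):
    """Count (sender, recipient) pairs across a set of raw messages."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr.lower()
                   for _, addr in getaddresses(msg.get_all("From", []))]
        recipients = [addr.lower()
                      for _, addr in getaddresses(
                          msg.get_all("To", []) + msg.get_all("Cc", []))]
        for sender in senders:
            for recipient in recipients:
                # Skip empty addresses and messages people send to themselves.
                if sender and recipient and sender != recipient:
                    edges[(sender, recipient)] += 1
    return edges


# Hypothetical sample data, invented for this sketch.
MAIL = [
    "From: alice@example.com\nTo: bob@example.com, carol@example.com\n"
    "Subject: a\n\nx",
    "From: alice@example.com\nTo: bob@example.com\nCc: dave@example.com\n"
    "Subject: b\n\ny",
    "From: bob@example.com\nTo: alice@example.com\nSubject: c\n\nz",
]

edges = relationship_counts(MAIL)
```

A single courtesy copy shows up as an edge with a count of one, which is why a count threshold or a human reviewer is needed before anyone calls it a relationship.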
Let’s assume that the email is one part of a discovered collection of information. Stratify provides a richer interface. This one includes the bells and whistles that warrant the Stratify system price tag, which is in six figures in case you want to license the system.
Here’s this display:
Stratify’s concept discovery interface
In this display you can see concepts in the left hand panel. The concepts are “discovered” using a combination of the Stratify knowledge base (controlled term index) and entities and topics the Stratify methods generate. The system is easy to use with a mouse. The system has more ways to navigate the content underneath the concept folders (note the support for non-English documents); for example, Tasks, Smart Folders, and Work Folders. There’s also a button for search.
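To make the controlled term index idea concrete, here is a toy sketch. It is my own illustration, not Stratify's knowledge base or algorithm: a hand-built dictionary maps terms to concept folders, and each document is filed under every concept whose terms it mentions. The terms, concept names, and documents are all invented.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: term -> concept folder.
CONTROLLED_TERMS = {
    "rebate": "Pricing",
    "discount": "Pricing",
    "subpoena": "Legal",
    "deposition": "Legal",
}


def assign_concepts(documents, controlled_terms=CONTROLLED_TERMS):
    """File each document under every concept whose terms appear in it."""
    folders = defaultdict(list)
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        concepts = {controlled_terms[w] for w in words if w in controlled_terms}
        for concept in sorted(concepts):
            folders[concept].append(doc_id)
    return dict(folders)


# Hypothetical sample documents, invented for this sketch.
docs = {
    "m1": "The rebate and discount terms changed.",
    "m2": "Deposition is scheduled; bring the rebate memo.",
    "m3": "Lunch on Friday?",
}

folders = assign_concepts(docs)
```

The commercial systems go further by generating entities and topics that are not in any hand-built list. A dictionary lookup like this one covers only the controlled term half of the combination.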
The idea is that the user can flag documents and build a sub set for other attorneys to review or to use when preparing a legal document.
This type of content interaction is what most people mean when they use the word “search”. Remember enterprise search is dead because most systems don’t do what Stratify does. Enterprises don’t want huge chunks of email accessible to most employees. Most organizations have pretty lousy information controls, so health, personal, and confidential information about the soccer club fund raising campaign will be discernable among the lab reports for the top secret drug trial. Few procurement teams or information technology departments understand the problems for management inherent in certain types of content processing systems. The lawyers do, and enterprise attorneys often opt out of enterprise search initiatives.
What You Do with the Outputs
I can’t give a full inventory of the uses of this type of system, but I can identify three typical applications:
- Find information to win a legal contest. Nothing works quite so well as a print out of emails that support a specific assertion.
- Find relationships among people who may not have informed anyone that those relationships exist. A good example is the sort of discovery that emerged in the Intel – AMD legal matter in which email played a part.
- Sell more work. Ambiguities identified allow the user to say, “I need to investigate this topic more.”
A little thought about your own email will reveal other interesting uses of this type of system.
Observations
After getting burned with lousy key word search systems, organizations want different ways to access information. The push back is now widely known, so vendors are starting to sell systems that perform this type of content analysis. Some of the systems are pretty good; for example, I have had great success using specialized systems from outfits like Attensity and more modern, generalized systems from Coveo, Exalead, ISYS Search Software, Siderean Software, and a handful of others. I have had less success with some better known systems because you can’t put lipstick on a pig and pass it off to me as the author of The Faerie Queene. Agree? Disagree?
This is a pretty big topic, so I may do some more essays about it. I have some unvalidated intelligence about a new email service from a well-known search vendor. I will poke around.
Stephen Arnold, July 5, 2008