eDiscovery: A Source of Thrills and Reduced Costs?
February 2, 2013
When I hear the phrase “eDiscovery”, I don’t get chills. I suppose some folks do. I read after dinner last night (February 1, 2013) “Letter From LegalTech: The Thrills of E-Discovery.” The author addresses the use of search and content processing technology to figure out which documents are most germane to a legal matter. Once the subset has been identified, eDiscovery provides outputs which “real” attorneys (whether in Bangalore or Binghamton) can use to develop their “logical” arguments.
One interesting factoid bumps into my rather sharp assessment of the “size” of the enterprise search market generated by an azure chip outfit. The number was about $1.5 billion. In the eDiscovery write up, the author says:
Nobody seems to know how large the e-discovery market is — estimates range from 1.2 to 2.8 billion dollars — but everyone agrees it’s not going anywhere. We’re never going back to sorting through those boxes of documents in that proverbial warehouse.
I like the categorical affirmative “nobody.” The point is that sizing any of the search and content processing markets is pretty much like asking Bernie Madoff type professionals, “How much in liquid assets do you have?” The answer is situational, enhanced by marketing, and believed without a moment’s hesitation.
I know the eDiscovery market is out there because I get lots of PR spam about various breakthroughs, revolutions, and inventions which promise to revolutionize figuring out which email will help a legal eagle win a case with his or her “logical” argument. I wanted to use the word “rational” in the manner of John Ralston Saul, but the rational attorneys are leaving the field and looking for work as novelists, bloggers, and fast food workers.
One company—an outfit called Catalyst Repository Systems—flooded me with PR email spam about its products. I called the company on January 31, 2013. I was treated in an offhand, suspicious manner by a tense, somewhat defensive young man named Mark, Monk, Matt, or Mump. At age 69, I have a tough time figuring out Denver accents. Mark, Monk, Matt, or Mump took my name and phone number. He assured me that his boss would call me back to answer my questions about PR spam and the product which struck me as a “me too.” I did learn that he had six years of marketing experience and that he just “pushed the send button.” When I suggested that he might want to know to whom he is sending messages multiple times, he said, “You are being too aggressive.” I pointed out that I was asking a question just like the lawyers who, one presumes, gobble up the Catalyst products. He took my name, did not ask how to spell it, wrote down my direct line without bothering to repeat it back to me, and left me with the impression that I was out of bounds and annoying. That was amusing because I was trying hard to be a regular type caller.
A happy quack to Bitter Lawyer which has information about the pressures upon some in the legal profession. See http://www.bitterlawyer.com/i%E2%80%99m-unemployed-and-feel-ripped-off-by-my-ttt-law-school/
Mark, Monk, Matt, or Mump may have delivered the message and the Catalyst top dog was too busy to give me a jingle. Another possibility is that Mark, Monk, Matt, or Mump never took the note. He just wanted to get a person complaining about PR spam off the phone. Either way, Catalyst qualifies as an interesting example of what’s happening in eDiscovery. Desperation marketing has infected other subsectors of the information retrieval market. Maybe this is an attempt to hit, in reality, revenues of $1.5 billion?
ArnoldIT to Roll Out Electronic Medical Record Publication
January 31, 2013
In the next three weeks, ArnoldIT will make available a new information service focusing on a specific aspect of electronic medical records. The publication will be similar to Beyond Search and combine elements of our other information services. We plan to tailor the content and approach to the needs of the professional who wants to extract maximum value and minimize the risks often associated with EMRs or electronic medical records. An important part of the EMR team is Jeannene Manning. You can get a detailed profile at http://goo.gl/CGa3e.
Jeannene Manning, a former colleague of Mr. Arnold at the Courier Journal, has worked for more than 20 years in various health and medical related disciplines.
Manning has seen healthcare marketing from both sides of the street. She was a top executive at Humana, the health insurance giant with interests in Medicare, Medicaid and commercial insurance businesses, as well as at Caretenders, a pioneer in home health services. Then she left the corporate side to be the senior vice president at Finelight, an advertising agency with clients in both health insurance and health services, to work in strategic planning, business development, branding and product management targeting both business-to-business and direct-to-consumer.
She said:

Very few marketers have learned how to cut through the acronyms and government jargon to deliver understandable messages to consumers on complex topics. You have to know your audience and how it’s changing.
An interview with her appears in HighGainBlog, an ArnoldIT information service which reports on business innovation. The full text of her interview is at http://goo.gl/F9smG.
Stuart Schram, January 31, 2013
Sponsored by Verdenoce, the gourmet craft spirit
Information Confusion: Search Gone South
January 26, 2013
I read “We Are Supposed to Be Truth Tellers.” I think the publication is owned by a large media firm. The point of the write up is that “real news” has a higher aspiration and may deal with facts with a smidgen of opinion.
I am not a journalist. I am a semi-retired guy who lives in rural Kentucky. I am not a big fan of downloading and watching television programs. The idea that I would want to record multiple shows, skip commercials, and then feel smarter and more informed as a direct result of those activities baffles me.
Here’s what I understand:
A large company clamped down on a subsidiary’s giving a recording oriented outfit a prize for coming up with a product that allows the couch potato to skip commercials. The fallout from this corporate decision caused a journalist to quit and triggered some internal grousing.
The article addresses these issues, which I admit, are foreign to me. Here’s one of the passages which caught my attention:
CNET reporters need to either be resigning or be reporting this story, or both. On CNET. If someone higher up removes their content then they should republish it on their personal blogs. If they are then fired for that they should sue the company. And either way, other tech sites, including this one, would be more than happy to make them job offers.
I agree I suppose. But what baffles me are these questions:
- In today’s uncertain financial climate, does anyone expect senior management to do more than take steps to minimize risk, reduce costs, and try to keep their jobs? I don’t. The notion that senior management of a media company embraces the feel good methods of Whole Earth or the Dalai Lama is out of whack with reality in my opinion.
- In the era of “weaponized information,” pay to play search traffic, and sponsored content from organizations like good old ArnoldIT—what is accurate? What is the reality? What is given spin? I find that when I run a query for “gourmet craft spirit” I get some darned interesting results. Try it. Who are these “gourmet craft spirit” people? Interesting stuff, but what’s news, what’s fact, and what’s marketing? If I cannot tell, how about the average Web surfer who lets online systems predict what the user needs before the user enters a query?
- At a time when people use online systems to find pizza and paradise, can users discern when a system is sending false content? More importantly, can today’s Fancy Dan intelligence systems from Palantir-like and i2 Group-like vendors discern “fake” information from “real” information? My experience is that with sufficient resources, these advanced systems can output results which are shaped by crafty humans. Not exactly what the licensees want or know about.
Net net: I am confused about the “facts” of any content object available today and skeptical of smart systems’ outputs. These can be, gentle reader, manipulated. I see articles in the Wall Street Journal which report on wire tapping. Interesting, but did not the owner of the newspaper find itself tangled in a wire tapping legal matter? I read about industry trends from consulting firms which highlight the companies that pay to be given the high intensity beam and the rah rah assessments. Is this Big Data baloney sponsored content, a marketing trend, or just the next big thing to generate cash in a time of desperation? I see conference programs which feature firms that pay for platinum sponsorships and then get the keynote, a couple of panels, and a product talk. Heck, after one talk, I get the message about sentiment analysis. Do I need to hear from this sponsor four or five more times? Ah, “real” information? So what’s real?
In today’s digital world, there are many opportunities for humans to exercise self interest. The dust up over the CBS intervention is not surprising to me. The high profile resignation of a real journalist is a heck of a way to get visibility for “ethical” behavior. The subsequent buzz on the Internet, including this blog post, are part of the information game today.
Thank goodness I am old and in a geographic location without running water, but I have an Internet connection. Such is progress. The ethics stuff, the assumptions of “real” journalists, and the notion of objective, fair information don’t cause much of a stir around the wood burning stove at the local grocery.
“Weaponized information” has arrived in some observers’ consciousness. That is a step forward. That insight is coming after the train left the station. Blog posts may not be effective in getting the train to stop, back up, and let the late arrivals board.
Stephen E Arnold, January 26, 2013
Social Search: Don Quixote Is Alive and Well
January 18, 2013
Here I float in Harrod’s Creek, Kentucky, an addled goose. I am interested in other geese in rural Kentucky. I log into Facebook, using a faux human alias (easier than one would imagine) and run a natural language query (human language, of course). I peck with my beak on my iPad using an app, “Geese hook up 40027.” What do I get? Nothing. Zip, zilch, nada.

Intrigued, I query, “modern American drama.” What do I get? Nothing. Zip, zilch, nada.
I give up. Social search just does not work under my quite “normal” conditions.
First, I am a goose spoofing the world as a human. Not too many folks like this on Facebook, so my interests and my social graph are useless.
Second, the key words in my natural language query do not match the Facebook patterns, crafted by former Googlers and 20 somethings to deliver hook up heaven and links to the semi infamous Actor’s Theater or the Kentucky Center.
Social search is not search. Social search is group centric. Social search is an outstanding system for monitoring and surveillance. For information retrieval, social search is a subset of information retrieval. How do semantic methods improve the validity of the information retrieved? I am not exactly sure. Perhaps the vendors will explain and provide documented examples?
Third, without context, my natural language queries shoot through the holes in the Swiss Cheese of the Facebook database.
After I read “The Future of Social Search,” I assumed that information was available at the peck of my beak. How misguided was I? Well, one more “next big thing” in search demonstrated that baloney production is surging in an ailing economy. Optimism is good. Crazy predictions about search are not so good. Look at the sad state of enterprise search, Web search, and email search. Nothing works exactly as I hope. The dust up between Hewlett Packard and Autonomy suggests that “meaning based computing” is a point of contention.
If social search does not work for an addled goose, for whom does it work? According to the wild and crazy write up:
Are social networks (or information networks) the new search engine? Or, as Steve Jobs would argue, is the mobile app the new search engine? Or, is the question-and-answer formula of Quora the real search 2.0? The answer is most likely all of the above, because search is being redefined by all of these factors. Because search is changing, so too is the still maturing notion of social search, and we should certainly think about it as something much grander than socially-enhanced search results.
Yep, Search 2.0.
But the bit of plastic floating in my pond is semantic search. Here’s what the Search 2.0 social crowd asserts:
Let’s embrace the notion that social search should be effortless on the part of the user and exist within a familiar experience — mobile, social or search. What this foretells is a future in which semantic analysis, machine learning, natural language processing and artificial intelligence will digest our every web action and organically spit out a social search experience. This social search future is already unfolding before our very eyes. Foursquare now taps its massive check in database to churn out recommendations personalized by relationships and activities. My6sense prioritizes tweets, RSS feeds and Facebook updates, and it’s working to personalize the web through semantic analysis. Even Flipboard offers a fresh form of social search and helps the user find content through their social relationships. Of course, there’s the obvious implementations of Facebook Instant Personalization: Rotten Tomatoes, Clicker and Yelp offer Facebook-personalized experiences, essentially using your social graph to return better “search” results.
Semantics. Better search results. How does that work on Facebook images and Twitter messages?
My view is that when one looks for information, there are some old fashioned yardsticks; for example, precision, recall, editorial policy, corpus provenance, etc.
When a clueless person asks about pop culture, I am not sure that traditional reference sources will provide an answer. But as information access is trivialized, the need for knowledge about the accuracy and comprehensiveness of content, the metrics of precision and recall, and the editorial policy or degree of manipulation baked into the system decreases.
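The old fashioned yardsticks are simple to compute. Here is a minimal sketch; the function name, document identifiers, and values are mine, invented for illustration, not drawn from any vendor's system:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for a single query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# A system returns four documents; three of the five relevant ones are among them.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d3", "d5", "d6"],
)
print(p, r)  # 0.75 0.6
```

A system can score perfectly on one yardstick and miserably on the other, which is why editorial policy and corpus provenance still matter.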
See Advantech.com for details of a surveillance system.
Search has not become better. Search has become subject to self referential mechanisms. That’s why my goose queries disappoint. If I were looking for pizza or Lady Gaga information, I would have hit pay dirt with a social search system. When I look for information based on an idiosyncratic social fingerprint or when I look for hard information to answer difficult questions related to client work, social search is not going to deliver the input which keeps this goose happy.
What is interesting is that so many are embracing a surveillance based system as the next big thing in search. I am glad I am old. I am delighted my old fashioned approach to obtaining information is working just fine without the special advantages a social graph delivers.
Will today’s social search users understand the old fashioned methods of obtaining information? In my opinion, nope. Does it matter? Not to me. I hope some of these social searchers do more than run a Facebook query to study for their electrical engineering certification or to pass board certification for brain surgery.
Stephen E Arnold, January 18, 2013
Dr. Jerry Lucas: Exclusive Interview with TeleStrategies ISS Founder
January 14, 2013
Dr. Jerry Lucas, founder of TeleStrategies, is an expert in digital information and founder of the ISS World series of conferences. “ISS” is shorthand for “intelligence support systems.” The scope of Dr. Lucas’ interests ranges from the technical innards of modern communications systems to the exploding sectors for real time content processing. Analytics, fancy math, and online underpin Dr. Lucas’ expertise and form the backbone of the company’s training and conference activities.
What makes Dr. Lucas’ viewpoint of particular value is his deep experience in “lawful interception, criminal investigations, and intelligence gathering.” The perspective of an individual with Dr. Lucas’ professional career offers an important and refreshing alternative to the baloney promulgated by many of the consulting firms explaining online systems.
Dr. Lucas offered a more “internationalized” view of the Big Data trend which is exercising many US marketers’ and sales professionals’ activities. He said:
“Big Data” is an eye catching buzzword that works in the US. But as you go east across the globe, “Big Data” as a buzzword doesn’t get traction in the Middle East, Africa and Asia Pacific Regions if you remove Russia and China. One interesting note is that Russian and Chinese government agencies only buy from vendors based in their countries. The US Intelligence Community (IC) has big data problems because of the obvious massive amount of data gathered that’s now being measured in zettabytes. The data gathered and stored by the US Intelligence Community is growing beyond what typical database software products can handle as well as the tools to capture, store, manage and analyze the data. For the US, Western Europe, Russia and China, “Big Data” is a real problem and not a hyped up buzzword.
Western vendors have been caught in the boundaries between different countries’ requirements. Dr. Lucas observed:
A number of western vendors made a decision, because of the negative press attention, to abandon the global intelligence gathering market. In the US Congress, Representative Chris Smith (R, NJ) sponsored a bill, which went nowhere, to ban the export of intelligence gathering products, period. In France, a Bull Group subsidiary, Amesys, legally sold intelligence gathering systems to Libya but received a lot of bad press during the Arab Spring. Since Amesys represented only a few percent of Bull Group’s annual revenues, Bull just sold the division. Amesys is now a UAE company, Advanced Middle East Systems (Ames). My take away here is that governments, particularly in the Middle East, Africa and Asia, have concerns about the long term regional presence of western intelligence gathering vendors who desire to keep a low public profile. For example, choosing not to exhibit at ISS World Programs. The next step by these vendors could be abandoning the regional marketplace and product support.
The desire for federated information access is, based on the vendors’ marketing efforts, high. Dr. Lucas made this comment about the existence of information silos:
Consider the US where you have 16 federal organizations collecting intelligence data plus the oversight of the Office of the Director of National Intelligence (ODNI). In addition there are nearly 30,000 local and state police organizations collecting intelligence data as well. Data sharing has been a well identified problem since 9/11. Congress established the ODNI in 2004 and funded the Department of Homeland Security to set up State and Local Data Fusion Centers. To date Congress has not been impressed. DNI James Clapper has come under intelligence gathering fire over Benghazi, and the DHS has been criticized in an October Senate report alleging that the $1 billion spent by DHS on 70 state and local data fusion centers has been a waste of money. The information silo or information stovepipe problem will not go away quickly in the US for many reasons: data cannot be shared because one agency doesn’t have the proper security clearances; job security, which means “as long as I control access to the data, I have a job”; and privacy issues, among others.
Stephen E Arnold interviewed Dr. Lucas on January 10, 2013. The full text of the exclusive interview is at http://www.arnoldit.com/search-wizards-speak/telestrategies-2.html on the ArnoldIT.com subsite “Search Wizards Speak.” The full text of the 2011 interview with Dr. Lucas is at this link.
Worth reading.
Donald Anderson, January 14, 2013
Biotechnology News Reports on Vital Natural Language Processing Developments
January 14, 2013
Several biotechnology companies raced to release new products at the end of 2012, and Bio IT World filled us in on these releases in its recent summary. A few important briefings related to the industry were also described in December Product and News Briefs.

In addition to reporting industry news from big players like IBM and announcing job opportunities, the majority of attention has been placed on new products from Linguamatics, PerkinElmer, Titan Software, SoftGenetics, and Optibrium. These were all launched in the final month of 2012; most notably, Linguamatics has rolled out version 4.0 of its text mining software platform, I2E.
The article discusses how natural language text mining will be opened up to a variety of users with various needs. On this topic, the author states:
“Regardless of how many disparate data sources need to be mined, I2E now has the power to analyze and extract information and knowledge from all of them simultaneously. Linguamatics I2E can now deal with recognition of novel compounds, which will give informaticians, researchers and patent analysts the power to investigate uncharted areas of innovation.”
Overall, this website offered a nice summary of some new products with Linguamatics offering some developments worth noting in the land of natural language processing. We shall see what 2013 holds.
Megan Feil, January 14, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Semantria Goes Pentalingual
January 1, 2013
Semantria is a text analytics and sentiment analysis solutions company. In order to reach a new clientele as well as work with companies with an international base, “Semantria Announces Content Classification and Categorization Functionality in 5 Languages.” Semantria now speaks English, Spanish, French, German, and Portuguese.
To power its categorization functionality, Semantria uses the Concept Matrix, a large thesaurus built from Wikipedia during its beta phase. After digesting Wikipedia, the Concept Matrix created lexical connections between every concept within it. Semantria developed the technology with Lexalytics, and the Lexalytics Salience 5 engine powers the Concept Matrix. The Concept Matrix is a one-of-a-kind tool that organizes and classifies information:
“Seth Redmore, VP Product Management and Marketing at Lexalytics, explains; ‘Text categorization requires an understanding of how things are alike. Before the Concept Matrix, you’d have to use a massive amount of training data to “teach” your engine, i.e. ‘documents about food’.’ And, he continues, ‘With the Concept Matrix, the training’s already done, and by providing Semantria a few keywords, it drops your content into the correct categories.’ ”
A piece of software that does all the organizing for you: how amazing is that? If it “ate” Wikipedia and made lexical connections, what could it do with Google, Bing, or the entire Internet?
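For what it is worth, the idea of dropping content into categories from a few keywords can be illustrated with a toy sketch. This is not the Concept Matrix or the Salience engine, which are proprietary; the category names and seed words below are invented for the example:

```python
# Toy illustration of keyword-seeded categorization. The real
# Lexalytics/Semantria engine uses lexical connections mined from Wikipedia;
# here plain word overlap stands in for that machinery.
CATEGORIES = {
    "food": {"restaurant", "recipe", "flavor", "menu", "chef"},
    "finance": {"market", "stock", "revenue", "investor", "bank"},
}

def categorize(text):
    """Return the category whose seed words overlap the text most, or None."""
    words = set(text.lower().split())
    scores = {cat: len(words & seeds) for cat, seeds in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(categorize("the chef updated the menu with a new recipe"))  # food
```

The point of the Concept Matrix claim is that the vendor supplies the lexical connections up front, so a customer provides only the handful of seed keywords.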
Whitney Grace, January 01, 2013
Sponsored by ArnoldIT.com, developer of Augmentext
Gannett Wants Better Content Management
December 31, 2012
Gannett is a media and marketing company that represents USA Today, Shop Local, and Deal Chicken. One can imagine that such a prolific company has a lot of data that needs to be organized and made workable. Marketing and media companies are at the forefront of the public eye, and if they do not get their clients’ names out in the open, it means fewer dollars in the bank for them. One way this could happen is if they do not centralize a plan for information governance. The good news is “Gannett Chooses ITM for Centralized Management of Reference Vocabularies,” as reported via the Mondeca news Web site. Mondeca is a company that specializes in knowledge management with a variety of products that structure knowledge in many possible ways. Its ITM system was built to handle knowledge structures from conception to usage and the maintenance process afterward. ITM helps organize knowledge, access data across multiple platforms, improve search and navigation, and align and merge taxonomies and ontologies.
Gannett selected Mondeca for these very purposes:
“Gannett needed software to centrally manage, synchronize, and distribute its reference vocabularies across a variety of systems, such as text analytics, search engines, and CMS. They also wanted to create vocabularies and enrich them using external sources, with the help of MEI. Gannett selected ITM as the best match for the job. At the end of the project, Gannett intends to achieve stronger semantic integration across its content delivery workflow.”
Gannett is sure to discover that Mondeca’s ITM software will provide it with better control over its data, not to mention new insights into its knowledge base. Data organization and proper search techniques are the master key to any organization’s success.
Whitney Grace, December 31, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
New Offering from Attensity Poised to Blow Up ROI
December 21, 2012
Analytics tools from social-minded vendors are now using text analytics technology to report on market perception and consumer preferences before the product launch. BtoB reported on this new offering in the article, “Attensity Releases Analytics Tools for Product Introductions.”
Now, businesses will be able to monitor product introductions with this new tool from Attensity. It is only a matter of time before we start seeing specific technology solutions to evaluate and analyze every specific phase of the product development cycle.
New Product Introduction will make possible both new insights for further development and opportunities to avoid risk.
The article states:
“The tool uses text analytics technology to report on market perception and preferences before roll out, uncovering areas of risk and opportunity, according to the company. It then tracks customer reception upon and after the launch to determine the impact of initial marketing efforts. Attensity said the New Product Introduction tool is one in a series of planned social text-analytics applications devoted to customer care, branding, and campaign and competitive analytics.”
Many organizations will be chomping at the bit to utilize this technology since it offers an easy way to improve ROI.
Megan Feil, December 21, 2012
Sponsored by ArnoldIT.com, developer of Augmentext
Predictive Coding: Who Is on First? What Is the Betting Game?
December 20, 2012
I am confused, but what’s new? The whole “predictive analytics” rah rah causes me to reach for my NRR 33 dB bell shaped foam ear plugs.
Look. If predictive methods worked, there would be headlines in the Daily Racing Form, in the Wall Street Journal, and in the Las Vegas sports books. The cheerleaders for predictive wizardry are pitching breakthrough technology in places where accountability is a little fuzzier than a horse race, stock picking, and betting on football games.
The godfather of cost cutting for legal document analysis: Reverend Thomas Bayes, 1701 to 1761. I heard he said, “Praise be, the math doth work when I flip the numbers and perform the old inverse probability trick. Perhaps I shall apply this to legal disputes when lawyers believe technology will transform their profession.” Yep, partial belief. Just the ticket for attorneys. See http://goo.gl/S5VSR.
I understand that there is PREDICTION which generates tons of money for the person who has an algorithm which divines which nag wins the Derby, which stock is going to soar, and which football team will win a particular game. Skip the fuzzifiers like a 51 percent chance of rain. It either rains or it does not rain. In the harsh world of Harrod’s Creek, capital letter PREDICTION is not too reliable.
The lower case prediction is far safer. The assumptions, the unexamined data, the thresholds hardwired into the off-the-shelf algorithms, or the fiddling with Bayesian relaxation factors is aimed at those looking to cut corners, trim costs, or figure out which way to point the hit-and-miss medical research team.
Which is it? PREDICTION or prediction?

I submit that it is lower case prediction with upper case MARKETING wordsmithing.
Here’s why:
I read “The Amazing Forensic Tech behind the Next Apple, Samsung Legal Dust Up (and How to Hack It).” Now that is a headline. Skip the “amazing,” “Apple,” “Samsung,” and “hack.” I think the message is that Fast Company has discovered predictive text analysis. I could be wrong here, but I think Fast Company might have been helped along by some friendly public relations type.
Let’s look at the write up.
First, the high profile Apple Samsung trial becomes the hook for “amazing” technology. The idea is that smart software can grind through the text spit out from a discovery process. In the era of ballooning digital data, it is really expensive to pay humans (even those working at a discount in India or the Philippines) to read the emails, reports, and transcripts.
Let a smart machine do the work. It is cheaper, faster, and better. (Shouldn’t one have to pick two of these attributes?)
Fast Company asserts:
“A couple good things are happening now,” Looby says. “Courts are beginning to endorse predictive coding, and training a machine to do the information retrieval is a lot quicker than doing it manually.” The process of “Information retrieval” (or IR) is the first part of the “discovery” phase of a lawsuit, dubbed “e-discovery” when computers are involved. Normally, a small team of lawyers would have to comb through documents and manually search for pertinent patterns. With predictive coding, they can manually review a small portion, and use the sample to teach the computer to analyze the rest. (A variety of machine learning technologies were used in the Madoff investigation, says Looby, but he can’t specify which.)