SharePoint and Document Management
November 20, 2008
If you are in the midst of a discovery process, you will find some surprising information in the article “MOSS 2007 Document Management Services — Document Centralization” here. This Web log post appeared on Mastering SharePoint Community on November 19, 2008. The author was Bob Mixon. The write up covers a number of SharePoint document management topics, but for me the most important point was in this comment:
I don’t believe you will find anyone (or I at least hope not) at Microsoft recommending the use of a single Document Library to store all of your organizations documents.
What this means is that Microsoft is opening the door to third party vendors who can build a single collection of documents, put them in one place, and provide access control tools so the documents in the repository cannot be changed. The fancy word for changing or destroying such documents is spoliation. SharePoint, the Swiss Army knife for content, ships with a broken knife blade and some rust on the moving parts. You may find the many collections approach useful. I don’t think some senior managers who are facing litigation will be too thrilled to learn that special purpose systems will be needed because Microsoft doesn’t recommend a single SharePoint repository. If you have licked this problem, let me know.
Stephen Arnold, November 20, 2008
Autonomy Upgrades Investigative System
November 15, 2008
Autonomy, based in Cambridge, England, continues to be one of the most agile of the information access and services companies. The firm has updated its Intelligent Investigator & Early Case Assessment software. You can read about the story here or visit the Autonomy Web site for more details. Autonomy asserts that its software can understand the meaning of large volumes of data collected in an investigation or similar procedure. Once the structured and unstructured data are processed, an investigator can use the Autonomy system:
to reconstruct what occurred, develop informed case strategies and sweep aside non-responsive data. A seamless link with Autonomy Legal Hold software automatically provides a legally defensible preservation and collection process.
Features of the investigative system include:
- A case centric view of the data. The idea is that an investigator can get a bird’s eye view of information, events, persons of interest, and time in a matter
- A new feature to analyze data where it resides and provide answers to queries without building a collection and performing some of the manual tasks other systems require
- A risk component
- Enhanced entity extraction and alias identification
Other companies offer case management and investigative tools. Autonomy’s broad sweep of software and systems allows the company to provide a solution that can mesh with almost any organizational or legal requirement. Will Autonomy sweep the field in this market? I know the company will try. The challenge will be to convince investigative units and lawyers to try new methods. Investigators and lawyers can be like my grandmother–set in her ways. A number of search and content processing companies are looking closely at these specialized markets. When the economy goes south, legal activity goes north. Autonomy has demonstrated it knows which way the compass is spinning.
Stephen Arnold, November 15, 2008
Data Management: A New Search Driver
November 4, 2008
Earlier today I reread “The Claremont Report on Database Research.” I had a few minutes, and I recalled reading the document earlier this year, and I wanted to see if I had missed some of its key points. This report is a committee written document prepared as part of an invitation only conference focusing on databases. I follow the work of several of the people listed as authors of the report; for example, Michael Stonebraker and Hector Garcia-Molina, among others.
One passage struck me as important on this reading of the document. On page 6, the report said:
The second challenge is to develop methods for effectively querying and deriving insight from the resulting sea of heterogeneous data…. keyword queries are just one entry point into data exploration, and there is a need for techniques that lead users into the most appropriate querying mechanism. Unlike previous work on information integration, the challenges here are that we do not assume we have semantic mappings for the data sources and we cannot assume that the domain of the query or the data sources is known. We need to develop algorithms for providing best-effort services on loosely integrated data. The system should provide some meaningful answers to queries with no need for any manual integration, and improve over time in a “pay-as-you-go” fashion as semantic relationships are discovered and refined. Developing index structures to support querying hybrid data is also a significant challenge. More generally, we need to develop new notions of correctness and consistency in order to provide metrics and to enable users or system designers to make cost/quality tradeoffs. We also need to develop the appropriate systems concepts around which to tie these functionalities.
Several thoughts crossed my mind as I thought about this passage; namely:
- The efforts by some vendors to make search a front end or interface for database queries are bringing this function to enterprise customers. The demonstrations by different vendors of business intelligence systems such as Microsoft Fast’s Active Warehouse or Attivio’s Active Intelligence Engine make it clear that search has morphed from key words to answers.
- The notion of “pay as you go” translates to smart software; that is, no humans needed. If a human is needed, that involvement is as a system developer. Once the software begins to run, it educates itself. So, pay as you go becomes a colloquial way to describe what some might have labeled “artificial intelligence” in the past. With data volume increasing, the notion of humans getting paid to touch the content recedes.
- Database quality in the commercial database sector could be measured by consistency and completeness. The idea that zip codes were consistent was more important than a zip code being accurate. With statistical procedures, the value in a cell may be filled, and it will include a score that shows the probability that the zip code is correct. Similarly, if one looks for the salary or mobile number of an individual, these probability scores become important guides to the user.
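The probability score idea in the last point can be sketched in a few lines. This is a minimal illustration, not a description of any vendor's method: the function name `fill_zip`, the record layout, and the sample data are all hypothetical, and the "statistical procedure" is simply the share of matching records that agree on a value.

```python
from collections import Counter

def fill_zip(records, city):
    """Guess a missing zip code for `city` and score the guess.

    Hypothetical helper: the score is the fraction of known records
    for the same city that carry the most common zip code.
    """
    counts = Counter(
        r["zip"] for r in records if r["city"] == city and r.get("zip")
    )
    total = sum(counts.values())
    if total == 0:
        # No evidence at all: return no guess and a zero score.
        return None, 0.0
    best, n = counts.most_common(1)[0]
    return best, n / total

# Made-up sample data, for illustration only.
records = [
    {"city": "Louisville", "zip": "40202"},
    {"city": "Louisville", "zip": "40202"},
    {"city": "Louisville", "zip": "40203"},
    {"city": "Lexington", "zip": "40507"},
]

guess, score = fill_zip(records, "Louisville")
```

The user sees both the filled value and how much to trust it. A production system would presumably draw on many more signals than a simple frequency count.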
“Pay as you go” computing means that the most expensive functions in a data management method have costs reduced because humans are no longer needed to do “knowledge work” required to winnow and select documents, facts, and information. The company able to implement “pay as you go” computing on a large scale will destabilize the existing database business sector. My research has identified Google as an organization employing research scientists who use the phrase “pay as you go” computing. Is this a coincidence or an indication that Google wants to leap frog traditional database vendors in the enterprise?
In the last month, a number of companies have been kind enough to show me demonstrations of next generation systems that take a query and generate a report. One system allows me to look at a sample screen, click a few options, and then begin my investigation by scanning a “trial report”. I located a sample Google report in a patent application that generates a dossier when the query is for an individual. That output goes an extra step and includes aliases used by the individual who is the subject of the query and a hot link to a map showing geolocations associated with that individual.
The number of companies offering products or advanced demonstrations of these functions means that the word search is going to be stretched even further than assisted navigation or alerts. The vendors who describe search as an interface for business intelligence are moving well beyond key word queries and the seemingly sophisticated interfaces widely available today.
Despite the economic pressures on organizations today, vendors pushing into data management for the purpose of delivering business intelligence will find customers. The problem will be finding a language in which to discuss these new functions and features. The word search may not be up to the task. The phrase business intelligence is similarly devalued for many applications. An interesting problem now confronts buyers, analysts, and vendors, “How can we describe our systems so people will understand that a revolution is taking place?”
The turgid writing in the Claremont Report is designed to keep the secret for the in crowd. My hunch is that certain large organizations–possibly Google–are quite far along in this data management deployment. One risk is that some companies will be better at marketing than at deploying industrial strength next generation data management systems. The nest might be fouled by great marketing not supported by equally robust technology. If this happens, the company that says little about its next generation data management system might deploy the system, allow users to discover it, and thus carry the field without any significant sales and marketing effort.
Does anyone have an opinion on whether the “winner” in data management will be a start up like Aster Data, a market leader like Oracle, or a Web search outfit like Google? Let me know.
Stephen Arnold, November 4, 2008
Another View of the Search Market
November 3, 2008
I missed this September 27, 2008, analysis in Intelligent Enterprise. I am not surprised that my trusty correspondents did not forward the link to me. You must read “Enterprise Search: Microsoft, Google, Specialized Players Vie for Supremacy” by Andrew Conry-Murray here. The article was interesting because it comes at a subject near and dear to my heart in a way that I would not have anticipated. This is a five part opus, so plan to spend some time analyzing the write up’s structure and assertions.
The thesis makes one key assumption; namely, enterprise search is alive and kicking and that it is a viable business sector for the hundreds of companies touting their search systems. First, Mr. Conry-Murray uses a segmentation developed by Information Week. That’s okay, but I am not certain it is 100 percent in line with my analysis of this complicated, confused, conflicted sector. Second, the article pops from finding stuff to finding stuff under the umbrella of eDiscovery. The leap doesn’t resonate with me, and it does not make much sense. eDiscovery can exist along with multiple search systems, and it involves some different issues than searching for stuff without threat of a fine or jail time. Think spoliation. Third, I was exposed to “the 17 databases problem”. Now, next generation data management systems can cope with heterogeneous types of structured and unstructured data. I quite like Google’s dataspace approach and the Exalead system works like a champ as well. Mark Logic and others are in this horse race as well. I could list more vendors but I don’t want to rehash my profiles in my Beyond Search study published by The Gilbane Group in April 2008. Finally, I learned about expert search.
I am not going to be able to recycle much of this article. Nor will I reference it in my lectures next week. What I learned is that a person who “reads up” about search and talks to some people can identify some of the issues. What’s missing is context. I do quite like the frequency with which the “beyond” preposition is turning up. There’s a “beyond Google” seminar. Even Attivio uses the “beyond” word in its newest white paper.
Here’s an interesting exercise. Navigate to Google. Run a query for “beyond search”. Start there.
Stephen Arnold, November 3, 2008
Herding SharePoint Content Sheep
November 2, 2008
Microsoft may be pushing Fast Search’s ESP into large SharePoint installations, but certified gold partners continue to find opportunities to make money from the 100 million SharePoint installations. Autonomy and Open Text recently rolled out systems that make it easier to keep control of SharePoint content. Why control SharePoint documents? You will learn quickly enough when you get caught in a legal matter and have to figure out which version of each document is the one that is the “right” one. SharePoint offers primitive and clunky controls for herding SharePoint content sheep; that is, the many bits and pieces of modular documents, emails with attachments, and PowerPoint decks with some relevant information but perhaps not the best and final version of the deck. When you get the invoice for collecting documents manually, you will understand why you have to have robust tools for governing information.
Let’s look briefly at two products that herd content sheep:
First, Autonomy has rolled out Controlpoint. You can read more about the product on the Autonomy Web site here and in the Marketwatch write up here. In a nutshell, you install Controlpoint, and you get policy-driven control of all SharePoint content. Autonomy includes a number of content processing functions with Controlpoint; for example, classification of documents.
Next, Open Text dubs its governance solution the sveltely named Open Text Content Lifecycle Management Services for Microsoft SharePoint, eDOCS Edition. The acronym is OTCLMSMSE. Between you and me Autonomy does a better job naming products. The Open Text solution delivers life cycle management, policies, and archiving functions. A licensee can hook SharePoint into Open Text’s other enterprise content management services as well. You can read CMSWire’s write up here.
Let’s step back and think about SharePoint. Autonomy and Open Text have identified a glaring weakness in the Frankenstein SharePoint. SharePoint is, according to some, Microsoft’s next generation operating system. I think that’s pretty wacky. SharePoint is a product that changes with each release. First it was content management. Then it was collaboration. Now it is knowledge management. The Microsoft sales pitch makes SharePoint seem easy, cheap, wonderful, and the cure for what ails a modern organization. In reality, SharePoint lacks the chops to deal with content when a lawyer shows up to collect information as part of the discovery process. SharePoint doesn’t do any single function particularly well. What it does is deliver stub functions that work okay when you have two or three people using a small number of documents. Increase the number of documents and the number of users, and you have a multimillion-dollar investment to get the system stable and running with acceptable performance.
I am willing to go out on a limb and say that Microsoft will introduce enhanced policy and information governance features. The Microsoft certified professionals will install these extensions, and the company will find that liberal injections of money and technical resources will be needed to get the amalgamation working in an acceptable way.
In the meantime, Autonomy and Open Text should be able to make sales. When Microsoft rolls out its own governance solutions, engineers at these companies will develop a fix for another void in SharePoint.
Stephen Arnold, November 2, 2008
Brainware: Nailing a Big Deal
October 31, 2008
Brainware won’t reveal the name of its client, but the “diversified energy company” bought $2.6 million worth of Brainware technology in October 2007. Brainware explains that this is a “follow on contract”. The size of this deal moves Brainware into Autonomy and Endeca territory, two vendors noted for their ability to land large search and content processing deals. Brainware describes itself as a vendor of “intelligent data capture and enterprise search solutions.” The firm offers what are called “end to end” solutions. Like OpenText and ZyLab, Brainware can put in place scanning and content capture and conversion systems, indexing, content processing, and information access tools. The idea is that paper or digital information goes in at one end of the system and the user can access that content at the other. You can read an interview with one of the Brainware executives here. I profiled the company in Beyond Search, published by the Gilbane Group. More information about that study is here.
According to the Brainware news release here:
This private sector oil company is deploying Brainware Distiller as a front-end data capture solution in conjunction with its global ERP rollout. The Brainware Distiller solution will process millions of invoice pages per year from more than a hundred different countries through their shared services centers located across Europe and the U.S.
A happy quack to the Brainware team. Now, what’s next for the Ashburn, Virginia, company? If you haven’t sold Halliburton, maybe you should aim for Shell or BP next? I found the announcement interesting because Brainware is associated with eDiscovery, not enterprise search, in my mind. Like Recommind, Brainware seems to be making an effort to penetrate new markets with its patented technology for information processing.
Stephen Arnold, October 30, 2008
Silobreaker: Two New Services Coming
October 24, 2008
I rarely come across real news. In London, England, last week I uncovered some information about Silobreaker’s new services. I have written about Silobreaker before here and interviewed one of the company’s founders, Mats Bjore, here. In the course of my chatting with some of the people I know in London, I garnered two useful pieces of intelligence. Keep in mind that the actual details of these forthcoming services may vary, but I am 99% certain that Silobreaker will introduce:
Contextualized Ad Retrieval in Silobreaker.com.
The idea is that Silobreaker’s “smart software” called a “contextualization engine” will be applied to advertising. The method understands concepts and topics, not just keywords. I expect to see Silobreaker offering this system to licensees and partners. What’s the implication of this technology? Obviously, for licensees, the system makes it possible to deliver context-based ads. Another use is for a governmental organization to blend a pool of content with a stream of news. In effect, when certain events occur in a news or content stream, an appropriate message or reminder can be displayed for the user. I can think of numerous police and intelligence applications for this blend of static and dynamic content in operational situations.
Enterprise Media Monitoring & Analysis Service
The other new service I learned about is a fully customizable online service that delivers a simple and effective way for enterprise customers to handle the entire work flow around their media monitoring and analysis needs. While today’s media monitoring and news clipping efforts remain resource intensive, Silobreaker Enterprise will be a subscription-based service that will automate much of the heavy lifting that either internal or external analysts must perform by hand. The Silobreaker approach is to blend–a key concept in the Silobreaker technical approach–in a single intuitive user interface disparate yet related information. The enterprise customers will be able to define monitoring targets, trigger content aggregation, perform analyses, and display results in a customized web-service. A single mouse click allows a user to generate a report or receive an auto-generated PDF report in response to an event of interest. Silobreaker has also teamed up with a partner company to add sentiment analysis to its already comprehensive suite of analytics. The service is currently in its final testing phase with large multinational corporate test users and is due to be released at the end of 2008 or in early 2009.
Silobreaker is a leader in search enabled intelligence applications. Check out the company at www.silobreaker.com. A happy quack to the reader who tipped me on these Silobreaker developments.
Stephen Arnold, October 23, 2008
Google: A Powerful Mental Eraser
October 23, 2008
Earlier today I learned that a person who listened to my 20 minute talk at a small conference in London, England, heard one thing only–Google. I won’t mention the name of this person, who has an advanced degree and is sufficiently motivated to attend a technical conference.
What amazed me were these points:
- The attendee thought I was selling Google’s eDiscovery services
- I did not explain that organizations require predictive services, not historical search services
- I failed to mention other products in my talk.
I looked at the PowerPoint deck I used to check my memory. At age 64, I have a tough time remembering where I parked my car. Here’s what I learned from my slide deck.
Mention Google and some people in the audience lose the ability to listen and “erase” any recollection of other companies mentioned or any suggestion that Google is not flawless. Source: http://i265.photobucket.com/albums/ii215/Katieluvr01/eraser-2.jpg.
First, I began with a chart created by an SAS Institute professional. I told the audience the source of the chart and pointed out the bright red portion of the chart. This segment of the chart identifies the emergence of the predictive analytics era. Yep, that’s the era we are now entering.
Second, I reviewed the excellent search enabled eDiscovery system from Clearwell Systems. I showed six screen shots of the service and its outputs. I pointed out that attorneys pay big sums for the Clearwell System because it creates an audit trail so queries can be rerun at any time. It generates an email thread so an attorney can see who wrote whom when and what was said. It creates outputs that can be submitted to a court without requiring a human to rekey data. In short, I gave Clearwell a grade of “A” and urged the audience to look at this system for competitive intelligence, not just eDiscovery. Oh, I pointed out that email comprises a larger percentage of content in eDiscovery than it has in previous years.
Recommind: Grabs Legal Hold
October 20, 2008
Recommind released its Insite Legal Hold solution today. This product bridges the gap between enterprise search and analytics.
Recommind’s Craig Carpenter states that Insite maps well to the company’s current customer base of financial and professional service firms, which are heavily regulated, knowledge-intensive organizations that are subject to mass litigation.
The release of this product during these financially strained times is viewed as a growth opportunity backed by a recent infusion of $7.5 million in private-equity funding.
So what makes Insite Legal Hold worth an investment in your company? First, it is an integrated solution – early risk assessment (ERA), preservation, hold/collection and processing. Second, you can reduce your litigation related costs and risks to some degree. Third, you can collect only what is needed and leave the rest to current company retention policy. Finally, you can proactively address retention and spoliation risks; that is, having an email changed.
Perhaps the most intriguing part of this product is the automated updates to current holds, though Mr. Carpenter said that in response to customer feedback, Recommind also included less sexy but still important features including filtering, deduping, near-duping, and e-mail-thread processing.
A few other benefits of Insite Legal Hold include:
- Collective selection based upon keyword, Boolean, and concept matching. This collective selection is more defensible than previous legal hold releases because the applied intelligence normalizes for related concepts and produces documents that yield more relevant data, going above and beyond what is reasonable as required by the Federal Rules of Civil Procedure.
- Explore in Place Technology allows the indexing and return of light results in HTML for a sampling review, which can then be used to apply concept searches and more to the fuller data sets.
- Multi-platform flexibility: allows enterprises with a legacy review platform to enhance data analytics yet still use its current system for production
- Built-in processing: filter, dedupe, near-dupe, and thread documents, thereby saving 70-80% of processing and review costs.
- Manages Multiple Holds.
- Reduces IT costs by providing a forensically sound copy of perceived relevant data and holds it in a separate data store.
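For readers unfamiliar with the built-in processing steps named above, here is a minimal sketch of deduping and near-duping. It is a generic illustration and not Recommind's implementation: the function names, the three-word shingle size, and the 0.8 similarity threshold are all assumptions chosen for the example. Exact duplicates are caught with a hash of the normalized text; near-duplicates are caught by comparing overlapping word sequences.

```python
import hashlib

def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}

def jaccard(a, b):
    """Share of shingles two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def dedupe(docs, near_threshold=0.8):
    """Keep the first copy of each document; drop exact and near duplicates."""
    seen_hashes = set()
    kept = []  # (text, shingle set) pairs for documents kept so far
    for text in docs:
        digest = hashlib.sha256(text.strip().lower().encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # exact duplicate after normalization
        s = shingles(text)
        if any(jaccard(s, other) >= near_threshold for _, other in kept):
            continue  # near duplicate of a document already kept
        seen_hashes.add(digest)
        kept.append((text, s))
    return [t for t, _ in kept]

# Made-up sample messages, for illustration only.
docs = [
    "Please review the attached invoice before Friday",
    "please review the attached invoice before friday ",
    "Please review the attached invoice before Friday afternoon",
    "Lunch menu for the cafeteria this week",
]
unique = dedupe(docs)
```

Every document dropped at this stage is one fewer document a reviewer has to read, which is where the cost savings come from. Comparing each document against all kept documents is fine for a sketch; a large collection would need an indexing technique such as MinHash.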
When asked about pricing, Mr. Carpenter provided an overview of Recommind’s three-tiered licensing model.
- Annual license fee based upon the number of custodians
- System sold outright to customers with existing infrastructures
- A La Carte for those customers who don’t have a huge litigation load but need to manage 1 or 2 cases per year.
Insite Legal Hold has a huge potential to reduce the costs and risks involved in e-discovery endeavors. The pain points of high costs at the collection and review stage make the automation of updates and concept-based and near-concept-based selection an attractive solution.
Recommind’s investment of private equity funds to get the word out about its solution at a time when more potential customers are struggling with the fall-out from a global financial crisis bodes well for the profit stream of this company. What is apparent with this solution is that the developers are starting to pay attention to the less-sexy parts of e-discovery work and spending time and money to provide solutions that help reduce costs at the collection and production stages of the e-discovery cycle.
Constance Ard, Answer Maven for Beyond Search, October 20, 2008
Google Yahoo: Boo Hoo for GooHoo
October 16, 2008
I don’t pay much attention to online advertising or the search engine optimization games. Ads are part of the furniture of living in the Wild West of capitalism. SEO is little more than people without the ability to create compelling content working hard to outsmart indexing robots. Dull, uninteresting, and crass work from my point of view. The headline in The AM Law Daily here “Five Firms Push for Google Yahoo Antitrust Settlement” by Nate Raymond caught my attention. For me the key point was not the litigation that Google faces. I noted the legal eagles enlisted by GooHoo to make the objections go away: Cleary Gottlieb Steen & Hamilton, Wilson Sonsini Goodrich & Rosati, Skadden, Arps, Slate, Meagher & Flom, and Hunton & Williams. The price tag for the eDiscovery related to this matter is going to be a blockbuster just like the legal fees. No BS regarding the costs of this adventure. We have an aleph of eagles circling this GooHoo matter. Google and Yahoo do seem to care about ads and SEO-related activities for traffic. In my opinion utilities are natural monopolies and will form one way or another. What’s your view?
Stephen Arnold, October 16, 2008
