Metadata Now Fair Game
November 2, 2009
The US legal system has spoken. I saw the ZDNet UK story “Watch Out, Your Metadata Is Showing” and chuckled. Not long ago in goose years, legal eagles realized that the Word fast save function preserved text that was once in a document. Sending the document with fast save activated could allow the curious to see the bits and pieces of the document that were believed to have been deleted. Exciting stuff. Now the Arizona Supreme Court, according to Simon Bisson and Mary Branscombe, “has decided that the metadata of a document is governed by the same rules as the document.” With value-added indexing coming to most SharePoint systems, there will be some interesting discussions about which metadata are the document’s metadata and which metadata are part of another, broader system. If you read vendors’ enthusiastic descriptions of what their smart software will assign to documents, users, and system processes, you will enter into an interesting world. How exciting will it be? Consider a document that has metadata such as date of creation, file format, and the name of the author. Now consider a document that has metadata pertaining to the “aboutness” of a document, who looked at the document, who made which change and when, and who opened the document and for how long. Interesting stuff in my opinion. The courts will be entering data space soon, and I think that journey will be difficult. Next up? A metadata specialist at your local Top 10 law firm. Get your checkbook ready.
Stephen Arnold, November 2, 2009
I say, no pay.
Stratify Presses the Accelerator
October 7, 2009
Stratify (formerly Purple Yogi) rolled out an early case assessment tool called eVantage. The idea is that by processing quickly available information, those engaged in a legal matter can get a flash view of the information germane to a particular issue. I found the article “Stratify Sweeps left [of EDRM] with eVantage” a useful summary of this new service. For me, the most important comment in the article was:
For Stratify, the news is that they will deliver a box on site that can: process data (over 300 file and archival formats, including Encase, FTK, Exchange); remove system files and detect duplicate and near-duplicate data; search multilingual data (Chinese, Japanese, Korean, etc.) using complex Boolean and faceted search strategies where facets equal domains, custodians, e-mail senders and receivers, etc.; and offers intelligent ways to conduct first-level review and data analysis by e-mail threads and identified concepts and groups.
The appliance solution allows a quick deployment. The advantage of an eDiscovery “toaster” is that content can be processed quickly. Speed of deployment has been an irritant to Type A attorneys who want to begin the information review process quickly.
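The quoted feature list mentions detecting duplicate and near-duplicate data. Stratify does not say how its box does this, so what follows is only a minimal sketch of one common approach, assuming exact duplicates are caught with a hash of normalized text and near-duplicates with Jaccard similarity over word shingles. The function names, the three-word shingle size, and the 0.5 threshold are my own hypothetical choices, not anything from Stratify.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def fingerprint(text):
    """Exact-duplicate check: identical normalized text gives an identical hash."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = normalize(text).split()
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify(doc_a, doc_b, threshold=0.5):
    """Label a pair of documents; the threshold is an illustrative assumption."""
    if fingerprint(doc_a) == fingerprint(doc_b):
        return "duplicate"
    if jaccard(shingles(doc_a), shingles(doc_b)) >= threshold:
        return "near-duplicate"
    return "distinct"
```

Two emails that differ only in the last word of a sentence share most of their shingles and land in the near-duplicate bucket, which is the sort of grouping a first-level reviewer wants before reading the same message twenty times. A production system processing millions of items would need something faster than pairwise comparison.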
Will the Stratify solution pay off for Iron Mountain? One thing is certain. Other eDiscovery appliance vendors will have to respond to the Stratify appliance.
Stephen Arnold, October 7, 2009
eDiscovery 1994
October 7, 2009
A happy quack to the reader who sent me a link to the blog post “Old ZyLAB Promotional Video from 1994”. Christopher Spizzirri wrote in the Delaware eDiscovery Report:
This is apparently a 15-year-old ZyLAB promotional video recently posted on YouTube. The video covers some eDiscovery related technologies, including OCR, fuzzy searching, and automatic bates numbering.
The video makes clear that significant progress in content processing has been made in the last 15 years. Fun to watch.
Stephen Arnold, October 7, 2009
A New Gartner Hobby Horse
October 5, 2009
I found “Gartner Humming the Proactive eDiscovery Tune with Five Step Process for Better Use of Enterprise Search in eDiscovery” notable for the length of the title and the revelation that search and retrieval is important in eDiscovery. Yep. When preparing to work on a legal matter, search is pretty darned important. Finding a single email in several million does require some search and retrieval functionality even though attorneys really like to print out emails, put them in three ring binders, and haul them around. The most interesting statement in the blog post about Gartner’s newest hobby horse was, in my opinion:
If you read my Blog with any consistency, you know that I believe that eDiscovery is part of a bigger issue called Governance, Risk and Compliance (GRC). And, if addressed from a proactive standpoint with ESI archiving and leading edge enterprise search and analysis from forward thinking technology organizations such as Orcatec and ContentAnalyst, eDiscovery will eventually become a commodity process.
I found this statement more informative than Gartner’s identifying the obvious.
Stephen Arnold, October 5, 2009
European Search Vendor Round Up
September 16, 2009
Updated at 8:29 am, September 17, 2009, to 23 vendors
I received a call from a very energetic, quite important investment wizard from a “big” financial firm yesterday. Based in Europe, the caller was having a bad hair day, and he seemed pushy, almost angry. I couldn’t figure out why he was out of sorts and why he was calling me. I asked him. He said, “I read your Web log and you annoy me with your poor coverage of European search vendors.”
I had to admit that I was baffled. I mentioned the companies that I tracked. But he wanted me to do more. I pointed out that the Web log is a marketing vehicle and he can pay me to cover his favorite investment in search. That really set him off. He wanted me to be a journalist (whatever that meant) and provide more detailed information about European vendors. And for free.
Right.
After the call, I took a moment and went through my files to see which European vendors I have mentioned and the general impression I have of each of these companies. The table below summarizes the companies I have either profiled in my for fee studies or mentioned in this diary / marketing Web log. You may disagree with my opinions. I know that the azure chip consultants at Gartner, Ovum, Forrester, and others certainly do. But that’s understandable. The addled geese here in Harrod’s Creek actually install systems and test them, a step that most of the azure chip crowd just don’t have time for because of their exciting work to generate enough revenue to keep the lights on, advise clients, and conduct social network marketing events. Just my opinion, folks. I am entitled to my opinions despite the widespread belief that I should be in the Happy Geese Retirement Home.
Vendor | Function | Opinion |
Autonomy | Search and eDiscovery | One of the key players in content processing; good marketing |
Bitext | Semantic components | Impressive technology |
Brox | Open source semantic tools | Energetic, marketing centric open source play |
Empolis GmbH | Information management and business intel | No cash tie with Attensity |
Exalead | Next generation application platform | The leader in search and content processing technology |
Expert System | Semantic toolkit | Works; can be tricky to get working the way the goslings want |
Fast ESP | Enterprise search, business intelligence, and everything else | Legacy of a police investigation hangs over the core technology |
InfoFinder | Full featured enterprise search system | My contact in Europe reports that this is a European technology. Listed customers are mostly in Norway. |
Interse Scan Jour | SharePoint enterprise search alternative | Based in Copenhagen, the Interse system adds useful access functions to SharePoint; sold in Dec 2008 |
Intellisearch | Enterprise search; closed US office | Basic search positioned as a one size fits all system |
Lemur Consulting | Flax is a robust enterprise search system | I have written positively about this system. Continues to improve with each release of the open source engine. |
Lexalytics | Sentiment analysis tools | A no cash merger with a US company and UK based Infonics |
Linguamatics | Content processing focused on pharma | Insists that it does not have a price list |
Living-e AG | Information management | No cash tie with Attensity |
Mindbreeze | Another SharePoint snap in for search | Trying hard; interface confusing to some goslings |
Neofonie | Vertical search | Founded in the late 1990s, created Fireball.de |
Ontoprise GmbH | Semantic search | The firm’s semantic Web infrastructure product, OntoBroker, is at Version 5.3 |
Pertimm | Enterprise search | Now positioned as information management |
PolySpot | Enterprise search with workflow | Now at Version 4.8, search, work flow, and faceted navigation |
SAP Trex | Search tool in NetWeaver; works with R/3 content | Works; getting long in the tooth |
Sinequa | Enterprise search with workflow | Now at Version 7, the system includes linguistic tools |
Sowsoft | High speed desktop search | Excellent, lightweight desktop search |
SurfRay | Now focused on SharePoint | Uncertain; emerging from some business uncertainties |
Temis | Content processing and discovery | Original code and integrated components |
Tesuji | Lucene enterprise search | Highly usable and speedy; recommended for open source installations |
Updated at 8:29 am Eastern, September 17, 2009
Coveo Adds Muscle to Its Email Search System
September 9, 2009
Short honk: My Overflight service alerted me to Coveo’s most recent boost to its enterprise email search system. You can read the full text of the article in “Coveo Launches New Email Search Paradigm with Cross Enterprise Email Search”. For me the most interesting comment in the write up was:
Traditional email search enables only the employee to access his or her desktop email, reducing knowledge sharing and creating roadblocks to information access. Coveo’s industry-first Enterprise Email Search package provides organizations with the cross-organizational ability to index, tag, categorize and securely access all enterprise email content to reduce risk and conduct investigations and provide completely unified access to email and attachments regardless of where they are stored, offline within PST files on desktops or laptops, or online within active Exchange or email archiving platforms such as Symantec Enterprise Vault and Quest Archive Manager.
For a full discussion of the new system, navigate to the Coveo Web site.
Stephen Arnold, September 9, 2009
Quintura: Relationships with Hoops to Jump
September 9, 2009
A reader sent me a link to Quintura. I had looked at this system and turned my attention to more enterprise-centric vendors. I took another look at it this morning (September 7, 2009). I ran a query on the publicly accessible search system. This appears at the top of the Quintura interface. My earlier test delivered a result in the form of a relationship map. The father of the hyperbolic relationship map in my mind is Ramana Rao, the former Xerox PARC wizard. His map has been influential: it surfaces in SAS’s “discovery” interface and has pointed Cluuz.com toward its display of connections. I ran a test query for “Microsoft Bing”. The result I saw was:
The results were useful. I then ran a more difficult query. I tried “Microsoft Eugene Agichtein”. Dr. Agichtein is working in a little known but quite significant area related to next generation data management. Here is the Quintura display for this query:
Source: http://www.quintura.com. No link to the results appears in the navigation bar in this Quintura page display.
None of the mapped items pertained to database, dataspace, or data management. I got proper nouns, and I know that the pointer to the people will with some further research eventually lead to useful information. I found this set of discovered tags not too useful for my needs. I kept getting proper nouns, and I needed other words and phrases for this test query. Maybe a consumer would find the tags useful. I found them not particularly useful for the type of research I do.
I then ran the same query on Cluuz.com. Here is a portion of the Cluuz.com output which uses the Yahoo search index:
Bingo. The Cluuz.com system pointed me to a relationship at the University of Washington, people, and a concept. I was off to the research races.
I then tried the query on one of Google’s “in the wild” search demonstrations. I don’t recall how I got the system to generate this type of output, so I cannot provide much of a how-to in this write up. Here’s what Google delivered for the query “Microsoft Eugene Agichtein”:
Source: http://www.google.com/search?hl=en&tbo=1&tbs=ww%3A1&q=Microsoft+Eugene+Agichtein&btnG=Search&tbo=1
Okay, better than Quintura.com’s output in my opinion but not as good as the Cluuz.com output.
What’s my take?
First, I don’t think most users find these types of relationship maps easy to use upon first encountering them. A more skilled researcher will be able to make sense out of them. If the maps are too simple like the Quintura and Google implementations, I think that a list of suggestions may be more useful.
Second, in terms of what the systems found “related” to Dr. Eugene Agichtein, the Yahoo index processed by Cluuz.com was more useful. This tells me that Yahoo has a useful index and a lousy way of making the pointers available via Yahoo. The Canadian crowd at Cluuz.com makes Yahoo a more useful service. Too bad Yahoo has not signed a deal with Cluuz.com. Hopefully Cluuz.com will stay in business because I like their tools.
Third, the “regular” Google index has the information I wanted. I did have to use the Advanced Search panel to dig it out of the trillion item Google index (give or take a hundred billion, of course). Google has a major accessibility problem right now. The weird thing that looks like a lawn sprinkler does not deliver for me.
So, if you want relationship maps, I suggest you use Cluuz.com and skip the flashy displays on other systems until these outfits crack the code successfully.
Stephen Arnold, September 8, 2009
Microsoft Fast ESP Architecture
September 4, 2009
Short honk: I was riffling through the Overflight inputs and I came across “EMC ‘Stitching’ Its Stack With Kazeon”. Kazeon is a vendor of eDiscovery systems. What interested me was this statement, attributed to an unnamed software engineer at EMC, so take the comment with a dash of hoisin sauce:
EMC sees e-discovery as a strategic tool allowing it to sharply differentiate itself from Microsoft. (Microsoft acquired erstwhile EMC search partner FAST Search and Transfer in early 2008. A software engineer at EMC told me that FAST has a “fundamentally flawed architecture. Microsoft is welcome to have it.”) Use our stack, EMC is saying, and we will provide you with tools allowing you to find and manage all the information in your organization, not just for litigation purposes, but to help stimulate innovation.
What struck me was the statement “fundamentally flawed architecture.” Harsh words. Could Microsoft have spent $1.2 billion on a company with “fundamentally flawed architecture”? I don’t know the answer, but I put the statement in my “Quotes” folder. When the “new” Fast ESP becomes available, more information will be available. Perhaps some of this information will prove or disprove this statement from Bnet.com’s story.
Stephen Arnold, September 3, 2009
LexisNexis and Recommind Tie Up
August 26, 2009
LexisNexis continues to look for ways to boost its revenues. The efforts are interesting because the company continues to retrace its steps in an effort to crack the code. LexisNexis, like Westlaw, faces a mini revolt. Some law firm clients are asking the firms to do legal work for fixed prices, to offer reduced rates with ceilings on costs, and to be more creative in reducing the costs of certain legal work. This is bad news for the commercial legal information companies. The reason? The companies delivering US legal information depend to some degree on taxi meter pricing. The idea is that the legal researcher pays for time and other variables for system access. Not surprisingly, for a patent matter, a legal researcher can run up a four or five figure bill. In the good old days, the law firms’ clients would pay up. Today, some clients are balking. Once again the traditional business model runs headlong into the new realities of business.
What’s the fix? LexisNexis has tried a tie up with Microsoft to put research in Microsoft Word. Did you activate the feature? I didn’t. LexisNexis has tried to diversify into fraud, content analysis, and risk. Do you think of LexisNexis when I say these words? I didn’t think so. LexisNexis has tried different angles of attack on search, law firm software services, and Web access.
The financial pressure continues to mount.
I just learned that knowledge management is the next revenue Petri dish. “LexisNexis and Recommind to Deliver New Knowledge Management Capabilities for Law Firms” reported this new venture. The story reported:
The new offering integrates Lexis Search Advantage content and services accessed through lexis.com with MindServer Search, Recommind’s enterprise search platform. It provides a one-stop destination combining access to documents and information from both a firm’s internal sources as well as trusted LexisNexis® content, delivering search results that are more complete, efficient and actionable.
The run down of benefits is pretty much what one would expect: information integration, better research, etc.
The proof of the pudding will be revenue. LexisNexis is straying from its core competency of delivering commercial grade legal information. Will knowledge management generate enough cash to put LexisNexis back on the fast growth track? In my opinion, the company is in a race. Some government entities are making more legal related information available online. Attorneys looking for ways to cut costs are likely to flock to these free services. Another challenge is the interest in lower cost professional information services like FastCase.
Recommind shifted from a legal niche strategy to an enterprise search strategy. Does this tie up mean that Recommind is returning to its legal niche or diversifying courtesy of LexisNexis?
Interesting moves by both companies. Each firm has search technology. Now search and content have morphed into knowledge management. Does anyone know what knowledge means? Does anyone know what management means? The phrase strikes me as “old school”.
Stephen Arnold, August 26, 2009
Silobreaker Update
August 25, 2009
I was exploring usage patterns via Alexa. I wanted to see how Silobreaker, a service developed by some savvy Scandinavians, was performing against the brand name business intelligence companies. Silobreaker is one of the next generation information services that processes a range of content, automatically indexing and filtering the stream, and making the information available in “dossiers”. A number of companies have attempted to deliver usable “at a glance” services. Silobreaker has been one of the systems I have relied upon for a number of client engagements.
I compared the daily reach of LexisNexis (a unit of the Anglo Dutch outfit Reed Elsevier), Factiva (originally a Reuters Dow Jones “joint” effort in content and value added indexing now rolled back into the Dow Jones mothership), Ebsco (the online arm of the EB Stephens Co. subscription agency), and Dialog (a unit of the privately held database roll up company Cambridge Scientific Abstracts / ProQuest and some investors). Keep in mind that Silobreaker is a next generation system and I was comparing it to the online equivalent of the Smithsonian’s computer exhibit with the Univac and IBM key punch machine sitting side by side:
Silobreaker is the blue line which is chugging right along despite the challenging financial climate. I ran the same query on Compete.com, and that data showed LexisNexis showing a growth uptick and more traffic in June 2009. Your mileage may vary. These types of traffic estimates are indicative, not definitive. But Silobreaker is performing and growing. One could ask, “Why aren’t the big names showing stronger buzz?”
A better question may be, “Why haven’t the museum pieces performed?” I think there are three reasons. First, the commercial online services have not been able to bridge the gap between their older technical roots and the new technologies. When I poked under the hood in Silobreaker’s UK facility, I was impressed with the company’s use of next generation Web services technology. I challenged the R&D team regarding performance, and I was shown a clever architecture that delivers better performance than the museum piece services against which Silobreaker competes. I am quick to admit that performance and scaling remain problems for most online content processing companies, but I came away convinced that Silobreaker’s engineering was among the best I had examined in the real time content sector.
Second, I think the museum pieces – I could mention any of the services against which I compared Silobreaker – have yet to figure out how to deal with the gap between the old business model for online and the newer business models that exist. My hunch is that the museum pieces are reluctant to move quickly to embrace some new approaches because of the fear of [a] cannibalization of their for fee revenues from a handful of deep pocket customers like law firms and government agencies and [b] looking silly when their next generation efforts are compared to newer, slicker services from Yfrog.com, Collecta.com, Surchur.com, and, of course, Silobreaker.com.
Third, I think the established content processing companies are not in step with what users want. For example, when I visit the Dialog Web site here, I don’t have a way to get a relationship map. I like nifty methods of providing me with an overview of information. Who has the time or patience to handcraft a Boolean query and then pay money whether the dataset contains useful information or not? I just won’t play that “pay us to learn there is a null set” game any more. Here’s the Dialog splash page. Not too useful to me because it is brochureware, almost a 1998 approach to an online service. The search function only returns hits from the site itself. There is no compelling reason for me to dig deeper into this service. I don’t want a dialog; I want answers. What’s a ProQuest? Even the name leaves me puzzled.
I wanted to make sure that I was not too harsh on the established “players” in the commercial content processing sector. I tracked down Mats Bjore, one of the founders of Silobreaker. I interviewed him as part of my Search Wizards Speak series in 2008, and you may find that information helpful in understanding the new concepts in the Silobreaker service.
What are some of the changes that have taken place since we spoke in June 2008?
Mats Bjore: There are several new things and plenty more in the pipeline. The layout and design of Silobreaker.com have been redesigned to improve usability; we have added an Energy section to provide a more vertically focused service around both fossil fuels and alternative energy; we have released Widgets and an API that enable anyone to embed Silobreaker functionality in their own web sites; and we have improved our enterprise software to offer corporate and government customers “local” customizable Silobreaker installations, as well as a technical platform for publishers who’d like to “silobreak” their existing or new offerings with our technology. Industry-wise, the recent statements by media moguls like Rupert Murdoch make it clear that the big guys want to monetize their information. The problem is that charging for information does not solve the problem of a professional already drowning in information. This is like trying to charge a man who has fallen overboard for water instead of offering a life jacket. Wrong solution. The marginal loss of losing a few news sources is really minimal for the reader, as there are thousands to choose from anyway, so unless you are a “must-have” publication, I think you’ll find out very quickly that reader loyalty can be fickle or short-lived or both. Add to that the fact that news reporting itself has changed dramatically. Blogs and other types of social media are already favoured over many newspapers, and we saw Twitter’s role during the election demonstrations in Iran. Citizen journalism of that kind, immediate, straight from the action, and free, is extremely powerful. But whether old or new media, Silobreaker remains focused on providing sense-making tools.
What is it going to be, free information or for fee information?
Mats Bjore: I think there will be free, for fee, and blended information just like Starbucks coffee. The differentiators will be “smart software” like Silobreaker and some of the Google technology I have heard you describe. However, the future is not just lots of results. The services that generate value for the user will have multiple ways to make money. License fees, customization, and special processing services, to name just three, will differentiate what I can find on your Web log and what I can get from a Silobreaker “report”.
What can the museum pieces like Dialog and Ebsco do to get out of their present financial swamp?
Mats Bjore: That is a tough question. I also run a management consultancy, so let me put on my consultant hat for a moment. If I were Reed Elsevier, Dow Jones/Factiva, Dialog, Ebsco or owned a large publishing house, I would have to realize that I need to think outside the box. It is clear that these organizations define technology in a way that is different from many of the hot new information companies. Big information companies still define technology in terms of printing, publishing or other traditional processes. The newer companies define technology in terms of solving a user’s problem. The quick fix, therefore, ought to be to start working with new technology firms and see how they can add value for these big dragons today, not tomorrow.
What does Silobreaker offer a museum piece company?
Mats Bjore: The Silobreaker platform delivers access and answers without traditional searching. Users can spot what is hot and relevant. I would seriously look at solutions such as Silobreaker as a front end to create a better reach to new customers, capture revenues from the ad-sponsored free service, and reach a wider audience with click-through to premium content. (Most of us are unaware of the premium content that is out there, since the legacy contractual types only reach big companies and organizations.) I am surprised that Google, Microsoft, and Yahoo have not moved more aggressively to deliver more than a laundry list of results with some pictures.
Is the US intelligence community moving more purposefully with access and analysis?
Mats Bjore: The interest in open source is rising. However, there is quite a bit of inertia when it comes to having one set of smart software pull information from multiple sources. I think there is a significant opportunity to improve the use of information with smart software like Silobreaker’s.
Stephen Arnold, August 25, 2009