Iron Mountain Snags Mimosa
February 24, 2010
I read “Iron Mountain Acquires Mimosa” in a Microsoft-centric publication. Iron Mountain began to grow when records management took off. The company has been riding high on the digital data glut. According to the write up:
Mimosa is the rapidly growing provider of premises-based e-mail and file-based content archival solutions called NearPoint. The Microsoft Gold Certified Partner, popular among Exchange and SharePoint shops, last month announced its 1,000th customer. Though it was founded in 2003, more than 300 of its NearPoint systems were sold last year. By acquiring Mimosa, Iron Mountain can offer premises-based archival and e-discovery it has not been able to offer before.
More information about Mimosa is at http://www.mimosasystems.com.
Iron Mountain also owns Stratify, formerly Purple Yogi. Now with all that digital data, how will Iron Mountain customers search, retrieve, process, and manipulate those information objects? When I know the answer, I will let you know. My recollection is that Mimosa has a basic search system, but it is not the fire-breathing dragon that some vendors have in their menagerie.
Stephen E Arnold, February 24, 2010
No one paid me to write this. Because I reference a mineral, iron, I think I have to report free work to the USGS.
A Free Pass for Open Source Search?
February 11, 2010
Dateline: Harrod’s Creek, February 11, 2010
I read Gavin Clarke’s “Microsoft Drops Open Source Birthday Gift with Fast Lucidly Imaginative?” I think the point of the story, “a free pass” for “open source search providers like Lucid Imagination,” is interesting. However, I am not willing to accept “free pass”, a variant of the “free lunch” in my opinion.
Here’s my view from the pleasant clime of snowy Harrod’s Creek.
First, in my opinion, most of the Fast Search & Transfer licensees bought into the “one size fits all” approach to search: facets, reports, access to structured and unstructured data, etc. As many of these licensees discovered, the cost of making Fast’s search technology deliver on the marketing PowerPoints was high. Furthermore, some, like me, learned how difficult it was for certain licensees to get the moving parts in sync quickly. Fast ESP consisted, prior to the Microsoft buyout, of keyword search, semantics from a team in Germany, third-party magic from companies like Lexalytics, home brew code from Norwegian wizards, and outright acquisitions for publishing and content management functionality. Wisely, many search vendors have learned to steer clear of the path that Fast Search & Transfer chopped through the sales wilderness. This means that orphaned Fast Search licensees may be looking at procurements that narrow the scope of search and content processing systems. In fact, there are only a handful of vendors who are now pitching the “kitchen sink” approach to search.
Source: http://www.graceforlife.com/uploaded_images/no_free_lunch-772769.jpg
Second, open source search solutions are not created equal. Some are tool kits; others are ready-to-run systems. Lucid Imagination has a good public relations presence in certain places; for example, San Francisco. For those who monitor the search space, there are some other open source vendors that may provide some options. I particularly like the open source version of Lucene available from Tesuji.eu. Ah, never heard of the outfit, right? I also find the FLAX system available from Lemur Consulting useful. I think the issues with Fast Search & Transfer are not going to be resolved by ringing up a single vendor and saying, “We’re ready to go with your open source solution.” The more prudent approach is going to be understanding what the differences among various open source search solutions are and then determining if an organization’s specific requirements match up to one of these firms’ service offerings. Open source, therefore, requires some work, and I don’t think a knee-jerk reaction or a sweeping statement that the Microsoft announcement will deliver a “free pass” is accurate.
Autonomy Pops Up an Email Archiving Toaster
January 31, 2010
Autonomy is in the appliance business. You can get what The Orange Rag called “the Autonomy eDiscovery Appliance.” The idea is that the features of a Clearwell-type solution are combined with Autonomy’s smart software and connectors. The solution, according to The Orange Rag, “delivers a broad set of unique capabilities” and “meaning based computing”. Among the features embedded in the appliance are search, connectors to various content types, visualization, scalability, and reports. The appliance that has captured some loyal fans is the one from Clearwell Systems, with its “rocket docket” service. Clearwell now has a formidable competitor, and I wonder if the value-added software that allows a report to be generated that can be slapped in the hand of opposing counsel and a nifty audit trail feature will be enough to deal with the steroid infused marketing of Autonomy. Should be interesting because Recommind has tried to broaden beyond the legal market in a bid to become an enterprise search vendor. Stratify has morphed several times in its eDiscovery journey. EMC bought Kazeon and may be getting ready to attack the legal eagles from the storage angle. I suppose this is what the azure chip crowd calls “search specialization”. I thought it was savvy product packaging, but what do I know. I am not young and inclined to perceive myself as infallible. I am an addled goose who forgets when he puts his pin feathers.
Stephen E Arnold, January 31, 2010
A freebie. I will report this unpleasant fact to the director of the US Postal Museum where old information methods are on display.
Search Vendors Working the Content Food Chain
January 13, 2010
In the last six months, I have noticed that three companies are making an effort to respond to ZyLAB’s success in the end-to-end content processing sector. There has been some uninformed and misleading discussion of search and content processing companies’ shift to vertical market solutions. I think this view distorts what some vendors are doing; namely, when one company finds a way to make sales, the other vendors pile into the Volkswagen. This is not so much “imitation as flattery”. What is happening is that sales are tough to make. When a company finds an angle, the stampede is on. In a short period of time, an underserved sector in search and content processing has more people stomping around than Lady Gaga.
Let’s go back in history, a subject that most of the poobahs, azure chip consultants, and self appointed experts avoid. The idea that certain actions have surfaced before is no fun. Identifying a “new” trend is easier, particularly when the trend spotter’s “history” extends to his / her last Google query.
The Mobius strip is non-orientable, just like search solutions that provide end-to-end solutions. A path on a Mobius strip can be twice as long as the original strip of paper. That’s a good way for me to think about end-to-end search and content processing systems. Costs follow a similar trajectory as well.
In the dim mists of time, one of the first outfits to offer an end-to-end solution to content acquisition, indexing, and search was, believe it or not, Excalibur. The first demonstration I received of the Excalibur RetrievalWare technology included scanning, conversion of the scanned image’s text to ASCII, indexing of the ASCII for an image, and search. The information processed in that demonstration was a competitor’s marketing collateral. There were online search systems, but these were mostly small scale systems due to the brutal costs of indexing large domains of HTML. A number of companies were pushing forward with the idea of integrated scanning systems. Sure, in the 1990s you could buy a high end scanner and software. But in order to build a system that minimized the fiddly human touch, you had to build the missing components yourself. Excalibur hooked up with resellers of high end scanners from companies like Bell+Howell, Fujitsu, and others. The notion of taking a scanned image, performing optical character recognition on the page image via in-memory processing, and then indexing that ASCII was a relatively new method. UMI (a unit of Bell+Howell) had a sophisticated production process to do this work. Big outfits like Thomson were interested in this type of process because lots of information in the early 1990s was still in hard copy form. To make a long story short, the Excalibur engineers were among the first to create a commercial product that mostly worked, well, sort of. The indexing was an issue. Excalibur embarked on a journey that required enhancing the RetrievalWare product and generating ready-to-use controlled vocabularies for specific business sectors like defense and banking. As you may know, Excalibur’s original vision did not work so the company morphed into a search and content processing company with a focus on business intelligence. The firm renamed itself as Convera.
The origins of the company were mostly ignored as the Convera package of services chased government work, commercial accounts like Intel and the National Basketball Association (data center SaaS functions for the former and video searching for the hoopsters). When those changes did not work out too well, Convera refocused to become a for fee version of the free Google custom search engine. That did not work out too well either, and the company has been semi-dissolved.
Why’s this important?
First, the history shows that end-to-end processing is not new. As with many of the hot search innovations, I find the discoveries of the azure chip crowd a “been there, done that” experience. Processing paper and making it searchable is a basic way to approach certain persistent problems.
Second, the synopsis of the Excalibur trajectory makes clear that senior managers of search and content processing companies scramble, following well worn paths. The constant repositioning and restating of what a technology allegedly does is a characteristic of search and content processing.
Third, the shifts and jolts in the path of the Excalibur / Convera entity are predictable. The template is:
- Start with a problem
- Integrate
- Sell
- Engineer fixes on the fly
- Fail
- Identify a new problem
- Rinse, repeat.
What has popped out of my Overflight intel system is that law firms are now looking for a solution to a persistent information problem; that is, when a legal matter fires up, most search systems work just fine with content in electronic form. The hitch is that a great deal of paper is produced. If something exists in digital form and one law firm must provide that information to another law firm, some law firms convert the digital information to paper, slap on a code, and have FedEx deliver boxes of paper. The law firm receiving this paper no longer has the luxury of paying minions to grind through the paper. The new spin on the problem is that the law firm’s information technology people want to buy a hardware-software combination that allows a box of paper to be put in one end, with the steps between the hard copy and the searchable, electronic instance of the documents magically completed.
Well, that’s the idea. Some of the arabesques that vendors slap on this quite difficult problem include:
- Audit records so a law firm knows who looked at what when and for how long
- A billing method. Law firms want to do invoices, of course
- A single point solution so there is “one throat to choke”.
What the companies want is what Excalibur asserted it had almost 20 years ago.
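The paper-in, search-out idea with an audit record can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's method: the `PaperIndex` class, its method names, and the sample documents are hypothetical, the OCR step is assumed to have already produced plain text, and the log records only who searched for what and when, not how long anyone looked at a document.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone


class PaperIndex:
    """Toy inverted index over OCR output, with a simple search audit log."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids
        self.audit_log = []               # (timestamp, user, query) tuples

    def add_document(self, doc_id, ocr_text):
        # Assumption: ocr_text is the plain text an OCR engine already produced.
        for term in re.findall(r"[a-z0-9]+", ocr_text.lower()):
            self.postings[term].add(doc_id)

    def search(self, user, query):
        # Record who ran which query and when, before returning hits.
        self.audit_log.append((datetime.now(timezone.utc), user, query))
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return set()
        # AND semantics: a document must contain every query term.
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return hits


index = PaperIndex()
index.add_document("box1-p1", "Deposition of John Smith, 12 March")
index.add_document("box1-p2", "Invoice for John Doe")
smith_hits = index.search("paralegal", "john smith")
john_hits = index.search("associate", "John")
```

Even this toy version hints at where the costs come from: the index is only as good as the OCR text fed into it, which is the part the sketch assumes away.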
ZyLAB, under the firm hand of Johann Scholtes (a former Dutch naval officer), has made inroads in this market sector. You can read an interview with him in the Search Wizards Speak series, so I won’t recycle that information in this write up.
Autonomy was quick to move to build out its end-to-end solutions for law firms and other clients with a paper and digital content problem. In fact, Autonomy just received an award for its end-to-end eDiscovery platform.
Brainware offers a similar system. That company, a couple of years ago, told me that it had to add staff to handle the demand for its scanning and search solution. Among the firm’s largest customers were law firms and, not surprisingly, the Federal government. You can read an interview with a Brainware executive (who is an attorney) in the Search Wizards Speak series.
I learned that Recommind has inked a deal with Daeja Image Systems for its various document processing software components. The idea is to be able to provide an end-to-end solution to law firms, government agencies, and other outfits that need a system that provides access to paper based content and digital content.
Let’s step back.
What this addled goose sees in these recent announcements is that the “new” is little more than a rediscovery that law firms have not yet cracked the back of the paper to digital job and been able to get a search system that provides access to the source material. Sure, there were solutions 20 years ago, but those solutions don’t meet a continuing need. Notice that this problem has been around for a long time, and I don’t think the present crop of solutions will solve the problem fully.
Microsoft and Its Entity Cube
December 30, 2009
Entity extraction has been around for a long, long time. Microsoft Research has revivified the discipline with its EntityCube. Here’s the description of EntityCube in the Microsoft EntityCube team’s words:
EntityCube is a research prototype for exploring object-level search technologies, which automatically summarizes the Web for entities (such as people, locations and organizations) with a modest web presence. The Chinese-language version is called Renlifang. The need for collecting and understanding Web information about a real-world entity (such as a person or a product) is mostly collated manually through search engines. However, information about a single entity might appear in thousands of Web pages. Even if a search engine could find all the relevant Web pages about an entity, the user would need to sift through all these pages to get a complete view of the entity. EntityCube generates summaries of Web entities from billions of public Web pages that contain information about people, locations, and organizations, and allows for exploration of their relationships. For example, users can use EntityCube to find an automatically generated biography page and social-network graph for a person, and use it to discover a relationship path between two people.
Microsoft Research points out that this is a test, eschewing the Google term “beta”. There are some issues, which include entity extraction, name disambiguation, entity ranking, and relationship extraction, among others. Softpedia’s “Introducing Microsoft’s EntityCube” is a useful overview.
I ran the query “dataspace” on Microsoft Academic, which has some EntityCube features enabled. I got a results list and as shown in the screenshot below, a list of entities on the left side of the results screen. I reviewed the hits and the ranking was somewhat unexpected. Experts whom I expected to appear toward the top of the results list were buried.
There was a more general purpose version of the system available at http://entitycube.research.microsoft.com/, but I did not want to install Silverlight, the bane of Major League Baseball, on this underpowered netbook. If you want quotes and more bells and whistles you can walk down this path.
According to Geek in Disguise, EntityCube offers some interesting features when you search for information about a person. Here’s what Geek in Disguise said, “Specifically, EntityCube automatically generates:
- A biography page for a person.
- A social-network graph for a person.
- A shortest-relationship path between two people.
- All titles of a person that are found on the Web.”
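The “shortest-relationship path” item in the list above is, in graph terms, a shortest-path search. The sketch below uses a plain breadth-first search over a hand-made graph; it is an assumption about how such a feature could work, not a description of Microsoft's implementation, and the names and relationships are invented for illustration.

```python
from collections import deque


def shortest_relationship_path(graph, start, goal):
    """Return the shortest chain of people linking start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # person -> person we reached them from
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = person
            if neighbor == goal:
                # Walk the back-pointers to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical relationship graph; a real system would extract it from Web pages.
relationships = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Ann", "Dee"],
    "Cat": ["Ann", "Dee"],
    "Dee": ["Bob", "Cat", "Eve"],
    "Eve": ["Dee"],
    "Fay": [],
}
path = shortest_relationship_path(relationships, "Ann", "Eve")
```

The hard part in practice is building the graph, which depends on the entity extraction and name disambiguation issues mentioned earlier; the path search itself is the easy bit.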
On10.net reported that EntityCube:
builds a dynamic Wikipedia page for the entity or person you search for. The types of information you’ll find include biographies, a social-network graph, relationships between people (mouse over the link to see how they are connected), and titles of people.
I checked my files and found a note to myself that a similar technology was in use to pluck product names from content for Microsoft’s Live.com product catalog service. You can compare the Microsoft technology with that of Cluuz.com. I find Cluuz.com’s approach more useful for my research. The set of content for Cluuz.com comes from the Yahoo.com index. Results from the Microsoft demo seem sparse. Cluuz.com seems to have addressed the problems identified with the Microsoft service.
Stephen E. Arnold, December 30, 2009
A freebie. If anyone were working at a watchdog agency in Washington, I would report this sad state of money-free writing.
Circumvallation: Reed Elsevier and Thomson as Vercingetorix
November 27, 2009
Google Scholar Gets Smart in Legal Information
One turkey received a presidential pardon. Other turkeys may not be so lucky on November 26, 2009, when the US celebrates Thanksgiving. I am befuddled about this holiday. There are not too many farmers in Harrod’s Creek. The fields contain the abandoned foundations of McMansions that the present economic meltdown has left like Shelley’s statue of Ozymandias. The “half buried in the sand” becomes half built homes in the horse farm.
As Kentuckians in my hollow give thanks for a day off from job hunting, I am sitting by the goose pond trying to remember what I read in my copy of Caesar’s De Bello Gallico. I know Caesar did not write this memoir, but his PR bunnies did a pretty good job. I awoke this morning thinking about the connection between the battle of Alesia and what is now happening to the publishing giants Reed-Elsevier and Thomson Reuters. The trigger for this mental exercise was Google’s announcement that it had added legal content to Google Scholar.
What’s Vercingetorix got to do with Google, Lexis, and Westlaw? Think military strategy. Starvation, death, surrender, and ritual killing. Just what today’s business giants relish.
Google has added the full text of US federal cases and state cases. The coverage of the federal cases, district and appellate, is from 1924 to the present. US state cases cover 1950 to the present. Additional content will be added; for example, I have one source that suggested that the Commonwealth of Virginia Supreme Court will provide Google with CD ROMs of cases back to 1924. Google, according to this source, is talking with other sources of US legal information and may provide access to additional legal information as well. What are these sources? Possibly Public.Resource.Org and possibly Justia.org, among others.
The present service includes:
- The full text of the legal document
- Footnotes in the legal document
- Page numbers in the legal document
- Page breaks in the legal document
- Hyperlinks in the legal document to cases
- A tab to show how the case was cited in other documents
- Links to non legal documents that cite a case.
You can read various pundits, mavens, and azure chip consultants’ comments on this Google action at this link.
You may want to listen to a podcast called TWIL, specifically the November 23, 2009, show, on which Google Scholar was discussed for about a half hour. You can find that discussion on iTunes. Just search for TWIL and download the program “Social Lubricants and Frictions.”
On the surface, the Google push into legal information is a modest amount of data in terms of Google’s daily petabyte flows. The service is easy to use, but the engineering required to provide access to the content strikes me as non-trivial. Content transformation is an expensive proposition, and the cost of fiddling with legal information is one of the primary reasons commercial online services have had to charge hefty fees to look at what amounts to taxpayer supported, public information.
The good news is that the information is free, easily accessible even from an iPhone or other mobile device. The Google service does the standard Google animal tricks of linking, displaying content with minimal latency, and updating new content within a minute or so of that content becoming available to Google’s software Dyson vacuum cleaner.
So what?
This service is similar to others I have written about in my three Google monographs. Be aware. My studies are not Sergey-and-Larry-eat-pizza books. I look at the Google open source technical and business information. I ignore most of what Google’s wizards “say” in public. These folks are “running the game plan” and add little useful information for my line of work. Your mileage may differ. If so, stop reading this blog post and hunt down a cheerful non-fiction Google book by a real live journalist. That’s not my game. I am an addled goose.
Now let me answer the “so what”.
First, the Google legal content is an incremental effort for the Google. This means that Google’s existing infrastructure, staff, and software can handle the content transformation, parsing, indexing, and serving. No additional big-buck investment is needed. In fact, I have heard that the legal content project, like Google News, was accomplished in the free time for play that Google makes available to its full time professionals. A bit of thought should make clear to you that commercial outfits who have to invest to handle legal content in a Google manner have a cost problem right out of the starting blocks.
Second, Google is doing content processing that should be the responsibility of the US government. I know. I know. The US government wants to create information and not compete with commercial outfits. But the outfits manipulating legal information have priced it so that most everyday Trents and Whitneys cannot afford to use these commercial services. Even some law firms cannot afford these services. Pro bono attorneys don’t have enough money to buy yellow pads to help their clients. Even kind hearted attorneys have to eat before they pay a couple of hundred bucks to run a query on the commercial online services from publicly traded companies out to make their shareholders have a great big financial payday. Google is operating like a government when it processes legal information and makes it available without direct charge to the user. The monetization takes place but on a different business model foundation. That also spells T-R-O-U-B-L-E for the commercial online services like Lexis and Westlaw.
MailArchiva: An Open Source Email Archiving Tool
November 20, 2009
A happy quack to the reader who sent me a link to Junauza.com’s article “Open Source Email Archiving Software”. I was not aware of this software. With lawsuits all the rage, you may want to download this package and keep it handy. One never knows. The passage below provides the necessary links:
MailArchiva actually comes in two editions: the Open Source Edition (OSE) and Enterprise Edition (EE). See HERE to compare their features. If you want to download MailArchiva, you will have to sign-up HERE first.
I am downloading now. And if you have a corrupt email file, you may want to take a look at DiskGetor Data Recovery. There is a trial version.
Stephen Arnold, November 20, 2009
I want to disclose to the US Post Office that I was not paid to write about email which is rendering said institution somewhat out of step. You can’t lick email.
Coveo Expresso Breaks New Ground in Information Access
November 9, 2009
Coveo, a leading provider of enterprise search technology and information access solutions, recently unveiled a free, entry-level enterprise search solution, Coveo Expresso™ Beta. Coveo’s new solution places the power of enterprise information access in the hands of employees everywhere, at no cost, for up to 50 users. The free version of the Expresso content processing system can index one million desktop files and email items as well as 100,000 Intranet documents. Licenses can be expanded at minimal cost to as many as 250 users, five million desktop files and email items, and one million SharePoint and File share documents, just by typing a new access code. Administrators simply add new email accounts and SharePoint or file share documents within the intuitive administrative interface. Coveo Expresso is available for immediate download at www.coveo.com/expresso.
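To make the license limits concrete, here is a small Python sketch that checks which of the two stated tiers a deployment would fit. The limits come from the paragraph above; the `Tier` class, the tier names “free” and “expanded”, and the decision to treat the free tier's Intranet documents and the expanded tier's SharePoint and file share documents as one category are my assumptions, not Coveo's terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_users: int
    max_desktop_email_items: int
    max_shared_documents: int  # assumption: Intranet/SharePoint/file share lumped together


# Limits as stated in the announcement; tier names are illustrative.
TIERS = [
    Tier("free", 50, 1_000_000, 100_000),
    Tier("expanded", 250, 5_000_000, 1_000_000),
]


def smallest_fitting_tier(users, desktop_email_items, shared_documents):
    """Return the name of the first tier whose limits cover the deployment, or None."""
    for tier in TIERS:
        if (
            users <= tier.max_users
            and desktop_email_items <= tier.max_desktop_email_items
            and shared_documents <= tier.max_shared_documents
        ):
            return tier.name
    return None


small_office = smallest_fitting_tier(40, 800_000, 90_000)
growing_office = smallest_fitting_tier(60, 800_000, 90_000)
large_office = smallest_fitting_tier(300, 800_000, 90_000)
```

A deployment that exceeds any one limit falls through to the next tier, so a 60-user office with modest content volumes still needs the expanded license.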
Laurent Simoneau, President and CEO, told Beyond Search:
Although enterprise search solutions have been available for nearly a decade, most are built on legacy systems that are difficult to implement and have not lived up to the promise of intuitive, secure and comprehensive information access across information silos. We want to re-educate businesses about the ease and simplicity with which enterprise search should work, as our customers can attest. Coveo Expresso does that—and takes enterprise search one step further with ubiquitous access interfaces such as the Coveo Outlook Sidebar or the desktop floating search bar, which provide guided, faceted search where employees ‘live’—in their email interface or on their PC/laptop. We’ve been testing this feature for a number of months with our current customers and have found it to be one of the biggest boosts to productivity for all employees, regardless of their roles.
Features
The free download features a number of Coveo innovations, including:
- Cross-enterprise Email Search, for 50 email accounts, including PST files and attachments, on desktops and in servers for up to 1 million total items.
- The Coveo Outlook Sidebar, the industry’s first true enterprise search Outlook plug-in, which provides sophisticated features such as conversation folding, related conversations, related people, related attachments, and the ability to search any indexed content without leaving Outlook, as well as the ability to launch advanced search with guided navigation through search facets.
- The Coveo Desktop Floating Search bar, enabling guided searches without leaving the program in which the user is working.
- Enterprise Desktop Search, including always-on indexing for 50 PCs/laptops.
- Mobile access via BlackBerries for 50 users.
The Expresso Interface
Search results appear in a clean, well-organized panel display.
ZyLAB Integrates Google Maps
November 8, 2009
According to Documanager.de, ZyLAB has integrated Google Maps with its ZyIMAGE Information Access Platform. Users now have the ability to identify the location of documents in a hit list. ZyLAB says that coordinates found in the contents of a document can also be displayed on a Google Map. The function requires no additional work on the part of the user.
Uses of the functionality range from law enforcement to eDiscovery. A user runs a query, and each pin represents a document or a set of documents; additional metadata is displayed when you hover the mouse over a pin.
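The “one pin per document or set of documents” behavior can be approximated by grouping hits whose coordinates fall close together. The Python sketch below rounds latitude and longitude to a fixed number of decimal places; this is an assumed, simplified grouping rule, not ZyLAB's method, and the function name, field names, and sample documents are hypothetical.

```python
from collections import defaultdict


def group_into_pins(documents, precision=3):
    """Group documents into map pins keyed by rounded (lat, lon)."""
    pins = defaultdict(list)
    for doc in documents:
        # Three decimal places is roughly a city block; adjust to taste.
        key = (round(doc["lat"], precision), round(doc["lon"], precision))
        pins[key].append(doc["title"])
    return dict(pins)


# Hypothetical hit list with coordinates already extracted from each document.
hits = [
    {"title": "Report A", "lat": 52.3702, "lon": 4.8952},
    {"title": "Report B", "lat": 52.3704, "lon": 4.8949},
    {"title": "Site plan", "lat": 51.9244, "lon": 4.4777},
]
pins = group_into_pins(hits)
```

The list of titles attached to each pin is the sort of metadata a map front end could show on hover.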
ZyLAB’s Vincent Rijnbeek said:
This new functionality adds to our visualization tools and ensures more transparency in the information jungle. If, for example, a document in a criminal investigation includes the coordinates of a crime scene, a pin appears at exactly that spot in Google Maps. The new integration is also useful in the building and construction sector; for example, it presents location information from complex construction plans quickly and clearly.
The use of visualization tools solves a major problem of the usual file structures: These traditional structures typically do not allow users to view an item that is not currently displayed on the screen. Large document sets pose a particular challenge. A collapsible folder structure is unwieldy, especially if users have to follow several nested folders. The constant scrolling, as is required in table structures, is cumbersome and not conducive to efficient and accurate data investigations.
More information is available from ZyLAB at http://www.zylab.com.
Stephen Arnold, November 8, 2009
No joy, no payment. Report this charitable act to the Red Cross.
Recommind and the On Premises versus Hosted Services Options
November 3, 2009
I read “One Quarter of UK Organizations Unnecessarily Outsource All eDisclosure Needs, Recommind Comments” as I was updating my briefing about content processing in 2010. The write up puzzled me. Recommind has positioned itself as a player in enterprise search. In this article, the company apparently conveyed the impression that Recommind is a leader in “information risk”. I also found interesting this assertion:
Recommind believes that organizations should refrain from outsourcing all information management responsibility when it comes to eDisclosure…
Outsourcing is an important part of eDiscovery options offered by Autonomy, Brainware, and dozens of other firms. In fact, hybrid solutions are often needed because the challenges of eDiscovery can be demanding due to time constraints or the volume of information that must be processed in a specific period of time. So “all” strikes me as one of those silly categorical affirmatives that appear to give a service an advantage but serve to point up the logical weakness in some marketing talk.
The one part of the write up I found useful was the link to an eDiscovery survey that contains some useful data. You can find the Fulbright & Jaworski report here.
Stephen Arnold, November 3, 2009
I wish to report to the General Services Administration that this was a freebie.