Content Grooming: An Opportunity for Tamr

June 20, 2015

Think back. Vivisimo asserted that it deduplicated and presented federated search results. There are folks at Oracle who have pointed to Outside In and other file conversion products available from the database company as a way to deal with different types of data. There are specialist vendors, which I will not name, who are today touting their software’s ability to turn a basket of data types into well-behaved rows and columns complete with metatags.

Well, not so fast.

Unifying structured and unstructured information is a time-consuming, expensive process. That is the reason for the obese exception files, where objects which cannot be processed go to live out their short, brutish lives.

I read “Tamr Snaps Up $25.2 Million to Unify Enterprise Data.” The stakeholders know, as do I, that unifying disparate types of data is an elephant in any indexing or content analytics conference room. Only the naive believe that software whips heterogeneous data into Napoleonic War parade formations. Today’s software processing tools cannot get undercover police officers to look shipshape for the mayor.

Ergo, an outfit with an aversion to the vowel “e” plans to capture the flag on top of the money pile available for data normalization and information polishing. The write up states:

Tamr can create a central catalogue of all these data sources (and spreadsheets and logs) spread out across the company and give greater visibility into what exactly a company has. This has value on so many levels, but especially on a security level in light of all the recent high-profile breaches. If you do lose something, at least you have a sense of what you lost (unlike with so many breaches).

Tamr is correct. Organizations don’t know what data they have. I could mention a US government agency which does not know what data reside on the server next to another server managed by the same system administrator. But I shall not. The problem is common and it is not confined to bureaucratic blenders in government entities.

Tamr, despite the oddball spelling, has Michael Stonebraker, a true wizard, on the task. The write up mentions as a customer an outfit that might be politely described as a “database challenge.” If Thomson Reuters cannot figure out data after decades of effort and millions upon millions of investment, believe me when I point out that Tamr may be on to something.

Stephen E Arnold, June 20, 2015
