Make Metadata Useful. But What If the Tags Are Lousy?

July 28, 2011

I must be too old and too dense to understand why the noise about metadata gives me a headache. I came across a post or story on the CNBC.com Web site that was halfway between a commercial and a rough draft of an automated indexing vendor’s temp file stuffed with drafts created by a clever intern. The post hauled around this weighty title: “EMA and ASG Webinar: 7 Best Practices For Making Metadata Useful”. The first thing I did was look up EMA and ASG because I was unfamiliar with the acronyms.

I learned that EMA represents a firm called Enterprise Management Associates. The company does information technology and data management research, industry analysis, and consulting. Fair enough. I have done some of the fuzzy wuzzy work for a couple of reasonably competent outfits, including the once stellar Booz, Allen & Hamilton and a handful of large, allegedly successful companies.

ASG is an acronym for ASG Software Solutions. The parent company grows via acquisitions just like Progress Software and, more recently, Google. The focus of the company seems to be “the cloud in your hand.” I am okay with a metaphorical description.

I am confused about metadata. Source: http://www.thebusyfool.com/wp-content/uploads/2011/05/Decisions_clipart.jpg

What caught my attention is the focus on metadata, which, in my little world, is the domain of people with degrees in library and information science, years of experience in building ANSI standard controlled term lists, and hands-on time with automated and human-centric indexing, content processing, and related systems. An ANSI standard controlled term list is not management research, industry analysis, consulting, or the “cloud in your hand.” Building controlled term lists that make life bearable for a person seeking information is quite difficult work, combining the vision of an architect and the nitty gritty stamina of a Roman legionnaire building a road through Gaul.

Here’s the passage that caught my attention and earned a place in my “quotes to note” folder:

As data grows horizontally across the enterprise, businesses are faced with the urgent need to better define data and create an accurate, transparent and accessible view of their metadata. Metadata management and business glossary are foundational technologies that can help companies achieve this goal. EMA developed seven best practices that guide companies to get the most of their data management. All attendees receive the complimentary White Paper Managing Metadata for Accessibility, Transparency and Accountability authored by Shawn Rogers.

I am not sure what some of these words and phrases mean. For example, “better define data”. My question: “What data?” Next I struggled with “create an accurate, transparent and accessible view of their metadata.” Now there are commercial systems which allow “views” of controlled term lists. One such vendor is Access Innovations, an outfit which visited me in rural Kentucky to talk about new approaches to indexing certain types of problematic content that is proliferating in organizations. Think in terms of social content without much context other than a “handle”, date, and time, even within a buttoned up company.

What do users know? Image source: http://www.computersunplugged.com.au/images/angry-man.gif

Another phrase that caught my limited attention was “metadata management and business glossary are foundational”. Okay, but before one manages, one must do a modest amount of work. Even automated systems benefit from smart algorithms helped with a friendly human crafted training document set or direct intervention by a professional information scientist. Some organizations use commercial controlled term lists to seed the automatic content tagging system. I am all for management, but I don’t think I want to jump from the hard work to “management” without going to the controlled vocabulary gym and doing some push ups. “Business glossary” baffled me and I was not annoyed by what seems to be a high school grammar misstep. Nope. The “business glossary” is a good thing, but it must be constructed to match the language of the users, the corpus, and the accepted terminology. Indexing a document with the term “terminal” is not too helpful unless there is a “field code” that pegs the terminal as one where I find airplanes, trains, death, or computer stuff. A “business glossary” does not appear from thin air, although a “cloud” outfit may have that notion. I know better.
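
To make the “terminal” point concrete, here is a minimal Python sketch of a term list in which every entry carries a field code. Everything in it is my assumption for illustration: the three field codes, the preferred labels, and the `tag` function are hypothetical and come from no ANSI standard and no vendor’s product.

```python
# Hypothetical controlled term list. Each key pairs a term with a field code
# (a qualifier), so one word can point to several distinct concepts.
CONTROLLED_TERMS = {
    ("terminal", "transport"): "Terminal (airports and rail stations)",
    ("terminal", "medicine"): "Terminal (end-stage illness)",
    ("terminal", "computing"): "Terminal (computer hardware and software)",
}


def tag(term, field_code=None):
    """Return the preferred label for a term.

    Raises KeyError if the term (or term plus field code) is not in the list,
    and ValueError if the term is ambiguous and no field code was supplied.
    """
    key_term = term.strip().lower()

    if field_code is not None:
        key = (key_term, field_code.strip().lower())
        if key not in CONTROLLED_TERMS:
            raise KeyError(f"{term!r} with field code {field_code!r} is not in the term list")
        return CONTROLLED_TERMS[key]

    # No field code given: collect every field code under which the term appears.
    candidates = sorted(code for (t, code) in CONTROLLED_TERMS if t == key_term)
    if not candidates:
        raise KeyError(f"{term!r} is not in the term list")
    if len(candidates) > 1:
        raise ValueError(f"{term!r} is ambiguous; pick a field code from {candidates}")
    return CONTROLLED_TERMS[(key_term, candidates[0])]


if __name__ == "__main__":
    print(tag("Terminal", "computing"))
    try:
        tag("terminal")
    except ValueError as err:
        print(err)
```

Calling `tag("terminal")` with no field code raises an error instead of guessing. That is the whole idea: the indexer, human or software, has to say which terminal is meant before the tag does a user any good.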

I did a quick Google search for “Shawn Rogers,” author of the white paper. Note: I don’t know what a white paper is. The first hit is to a document which is on what I think is a pay-to-play information service called “b-eye”. The second hit points to a LinkedIn profile. I don’t know if this is “the” Shawn Rogers whom I seek. I learned that he is:

[a professional who] has more than 19 years of hands-on IT experience with a focus on Internet-enabled technology. In 2004 he cofounded the BeyeNETWORK and held the position of Executive Vice President and Editorial Director. Shawn guided the company’s international growth strategy and helped the BeyeNETWORK grow to 18 web sites around the world making it the largest and most read community covering the business intelligence, data warehousing, performance management and data integration space. The BeyeNETWORK was sold to TechTarget in April 2010.

I concluded this was “the” Mr. Rogers I sought and that he or his organization is darned good at search engine optimization type work.

What clicked in my mind was a triple tap of hypotheses:

  1. A couple of services firms have teamed up to cash in on the taxonomy and metadata craze. I thought metadata had come and gone, but obviously these firms are, to use Google’s metaphor, putting more wood behind the metadata thing. So, this is marketing in order to sell services. As I said, I am okay with that.
  2. These firms have found a way to address the core problem of indexing by people who do not have the faintest idea of what’s involved in metatagging that helps users. One hopes.
  3. The two companies are not sure what the outcome of the webinar and the white paper distribution will be. In short, this is a fishing trip or an exploration of the paths on an island owned by a cruise company. There’s not much at risk.

Okay, enough.

Here’s my view on metadata.

First, most organizations have zero editorial policy and zero willingness to do the hard work required to dedupe, normalize, and tag content in a way that allows a user to find a particular item without sticky notes, phone calls, or clicking and scanning stuff for the needed items. I think vendors promise the sun and moon and deliver gravel. Don’t agree? Use the comments section, please. Don’t call me.

Second, most of the vendors who offer industrial strength indexing and content processing systems know what needs to be done to make content findable. But the licensees often want a silver bullet. So the vendors remain silent on certain key points such as the Roman legionnaire working in the snow part. The cost part is often pushed to the margin as well.

Third, the information technology professionals “know” best. Not surprisingly, most content access in organizations is a pretty lukewarm activity. I received an email last week chastising me for pointing out that more than half of an organization’s search system users were dissatisfied with whatever system the company made available. Hey, I just report the facts. I know how to find information in my organization.

Fourth, no one pays real attention to the user of a system. The top brass, the IT experts, and the vendors talk about the users. The users don’t know anything and whatever input those folks provide is not germane to the smarties. Little wonder that in some organizations systems are just worked around. Tells range from a Google search appliance in marketing to sticky notes on monitors.

Will I attend the webinar? Nah. I don’t do webinars. Do I want to change the world and make every organization have a super duper controlled term list and findable content? Nah. Don’t care. Do I want outfits like CNBC to do a tiny bit of content curation before posting unusual write ups with possible grammatical errors? You bet. What if those metadata and other tags are uncontrolled, improperly applied, and mismatched to the lingo? Status quo, I assert.

Enjoy the webinar. Good luck with your metadata and the “cloud in your hand” approach. Back to the goose pond. Honk.

Stephen E Arnold, July 29, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search, which is not a white paper and it is not free. But at $20, such a deal.

Comments

One Response to “Make Metadata Useful. But What If the Tags Are Lousy?”

  1. Marjorie Hlava on July 28th, 2011 12:57 pm

    It seems that yet another company has decided to test the waters. What they are actually doing is making them muddy. Without learning much about the industry, how metadata and controlled vocabularies work, and why they should be applied, they listen to the buzz and decide instead to try to see if there is a “market” that they can fit into. Another carpetbagger, no background but a good slick sales pitch, enters the business with a couple of days of training at seminars and reading a technical report. Historically this was done in our business by a bunch of eager young PhDs who developed a cool algorithm, got the sheepskin, and then worked with the university sponsors to line up investment capital or government funding to create a business. The algorithm, however, was never tried on a large data set, never tested with a user community, and after three years the business fails. In the meantime the promises made muddy the waters. Customers feel like the silver bullet was nearly theirs.
