Mondeca Updates Linked Open Vocabularies

June 11, 2012

Mondeca has updated its Linked Open Vocabularies (LOV). LOV’s goal is to help Web vocabulary users and managers access the broad ecosystem of linked open vocabularies in the Linked Data Cloud. The site’s About page explains:

“The vocabularies we are about are the many dialects (RDFS and OWL ontologies) used in the growing linked data Web. . . . Not only does linked data leverage a growing set of vocabularies, but vocabularies themselves rely more and more on each other through reusing, refining or extending, stating equivalences, declaring metadata.

“LOV objective is to provide easy access methods to this ecosystem of vocabularies, and in particular by making explicit the ways they link to each other and providing metrics on how they are used in the linked data cloud, help to improve their understanding, visibility and usability, and overall quality.”

A vocabulary is worthy of inclusion in the LOV dataset if it is expressed in one of the Semantic Web ontology languages (RDFS or some species of OWL); is published and freely available on the Web; is retrievable by content negotiation from its namespace URI; and is small enough to be easily integrated and re-used, in part or as a whole, by other vocabularies. See this page for more on the LOV dataset and features.
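
For readers who want to see what “retrievable by content negotiation” can look like in practice, here is a minimal sketch in Python. It is not LOV's own checking code. The Accept header value and both function names are assumptions made for illustration, and real servers differ in which RDF serializations they offer.

```python
from urllib.request import Request, urlopen

# Assumed preference order of RDF serializations; servers vary in what they return.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"


def build_vocab_request(namespace_uri):
    """Build a GET request that asks the namespace URI for an RDF serialization."""
    return Request(namespace_uri, headers={"Accept": RDF_ACCEPT})


def fetch_vocab(namespace_uri, timeout=10):
    """Send the request and return (content type, body bytes). Needs network access."""
    with urlopen(build_vocab_request(namespace_uri), timeout=timeout) as response:
        return response.headers.get("Content-Type"), response.read()
```

Calling fetch_vocab with a vocabulary namespace such as http://xmlns.com/foaf/0.1/ (used here only as an example) and inspecting the returned content type is one rough way to see whether a server answers with RDF rather than an HTML page.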

Mondeca is a leading provider of solutions for the management of advanced knowledge structures: ontologies, thesauri, taxonomies, terminologies, metadata repositories, knowledge bases, and linked open data. Their products and services help clients in Europe and North America boost their information retrieval, analysis, and usability. The firm was founded in 1999 and is based in Paris, France.

Cynthia Murrell, June 11, 2012

Sponsored by PolySpot

Autonomy Offers Automatic Classification and Taxonomy Generation

May 7, 2012

Conceptualizing the processes and methods behind the storage and organization of data in our current age ruled by unstructured content and meta-tags can prove overwhelming. We found a great source of information from Autonomy, which explains their offering of Automatic Classification and Taxonomy Generation.

With their eye on functionality, IDOL’s classification solutions help users to circumvent issues that have arisen in a time of exponential data growth.

In addition to Taxonomy Libraries and Automatic Categorization and Channels, the Autonomy Collaborative Classifier is included. Their website clearly delineates how these elements work.

The website states the following information regarding Taxonomy Libraries:

“Built by experienced knowledge engineers using best practices learned through hundreds of consulting engagements, Autonomy taxonomies let organizations rapidly deploy industry-standard taxonomies that can be combined with your corporate taxonomies or easily customized to meet company and industry-specific requirements. Each Autonomy taxonomy is based on industry standards, and built using IDOL’s conceptual analysis that provides the highest level of accuracy.”

IDOL offers a variety of taxonomies ranging from biotechnology to financial services: a comprehensive solution, indeed. Overall, IDOL seems equipped to eliminate the time-consuming manual intervention required in the past. But open source alternatives exist and should be considered by procurement teams.

Megan Feil, May 9, 2012

Sponsored by Ikanow

Exclusive Interview: Paul Doscher, President of Lucid Imagination

April 16, 2012

The Search Wizards Speak features Paul Doscher, the new president of Lucid Imagination. Mr. Doscher joined Lucid Imagination in December 2011. He had been president of Dassault Exalead USA prior to assuming the top spot at fast-growing, customer- and community-centric Lucid Imagination.

I spoke with Mr. Doscher when he was working for the Dassault Exalead organization. When he shifted to Lucid Imagination, I spoke with him about his views of open source search. After that brief initial conversation, I met again with Mr. Doscher and probed into his views about the impact open source search is having on traditional for-fee, proprietary search systems.

When I asked about the shift from proprietary search systems to open source search, he told me:

Today organizations need the flexibility to adapt and make changes. A proprietary solution may not permit the licensee to make enhancements. If a change is made, the proprietary search vendor may “own” the fix and will add that innovation to its core product. The licensee who created the fix gets nothing and may have had to pay for the right to innovate. As corporate information technology struggles to keep up with escalating business information demands and an ever increasing mountain of growing content of all types, open source search provides a cost effective and efficient way to develop applications to address the challenges and opportunities in today’s enterprise.

Mr. Doscher has strong views about how licensees of enterprise search systems have learned about costs, the time required to deploy a system, and the effort needed to keep a search system up and running. I asked him about Lucid Imagination’s approach to a search engagement. He said:

Our approach to an engagement is to listen to what our customers need, prepare an action plan, and then deliver. In a sense, our approach is the type of involvement that many software companies have stepped away from. We have an enthusiastic group of engineers and professionals who work with clients to meet their needs.

The full text of the interview appears on the Web site. For more information about Lucid Imagination’s open source search system, you will want to explore the company’s Web site and its blog. In addition, an interview with one of the founders of Lucid Imagination, Marc Krellenstein, and with Eric Gries, a former executive at Lucid Imagination, is available in the Beyond Search archives.

Stephen E Arnold, April 16, 2012

Taxonomy for Tax Fraud

April 14, 2012

As the buzzword craziness shifts from taxonomy to big data, there is an interesting spin on a taxonomy. Navigate to “Book Cooking Guide”. (The headline alone will alert the MBAs that this is a write up from the Economist newspaper which sure looks like a magazine to me.) The write up presents some tips on fudging the books for the purpose of tax fraud, snookering stakeholders, and other MBA style activities. Here’s the passage I noted:

The IMF has a helpful laundry list of ways to keep sneaky politicians in check. Accounting measures should follow the movement of economic value, not cash, so that delaying pay packets until next year (or retirement) has no effect. Governments should publish net worth, which encompasses assets and liabilities, so taking over pension schemes is less appealing. Budgets should forecast up to 50 years out, so the full effects of policy are clearly seen.

Are economic methods satisfying, even reassuring?

Stephen E Arnold, April 14, 2012

Iowa Government Gets a Digital Dictionary Provided By Access

April 7, 2012

How did we get by without the invention of the quick search to look up information?  We used to use dictionaries, encyclopedias, and a place called the library.  Access Innovations, Inc. has brought the Iowa Legislature General Assembly into the twenty-first century.

The write-up “Access Innovations, Inc. Creates Taxonomy for Iowa Code, Administrative Code and Acts” tells us the data management industry leader has built a thesaurus that allows the Legislature to search its library of proposed laws, bills, acts, and regulations.  Users can also add their unstructured data to the thesaurus.  Access used their Data Harmony software to provide subscription-based delivery and they built the thesaurus on MAIstro.

“The project differed from typical index and thesaurus creation because the Iowa Legislative Services Agency needed to maintain its existing codes from each back-of-the-book index, rather than starting from scratch and creating new codes.  One reference alone, the Blue Index, included 2,300 index terms.  To create the thesaurus, Access looked at different methods to apply to each term according to the existing references, tied preferred terms to the existing codes, and added related terms to the preferred terms.   The codes covered previous legislation dating as far back as 1953 to legislation through 2010.  Also, the custom taxonomy was built with only four levels in order to meet Iowa Legislative Services’ navigation requirements.  Typically, thesauri are not limited by a specified number of levels.”
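
The structure described in that passage (preferred terms tied to existing codes, related terms attached to preferred terms, and a hierarchy capped at four levels) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, field names, and any labels or codes passed in are hypothetical, and nothing here reflects how Data Harmony or MAIstro actually store a thesaurus.

```python
from dataclasses import dataclass, field

MAX_LEVELS = 4  # the four-level limit mentioned in the write-up


@dataclass
class Term:
    label: str                                    # preferred term
    code: str                                     # existing index code tied to the term
    level: int = 1                                # 1 = top of the hierarchy
    related: set = field(default_factory=set)     # related-term labels
    narrower: list = field(default_factory=list)  # child Term objects

    def add_narrower(self, label, code):
        """Attach a child term, refusing to go deeper than MAX_LEVELS."""
        if self.level >= MAX_LEVELS:
            raise ValueError(f"'{self.label}' is already at level {MAX_LEVELS}")
        child = Term(label, code, level=self.level + 1)
        self.narrower.append(child)
        return child
```

Putting the depth check inside add_narrower means the four-level rule is enforced at the moment a term is added, rather than discovered later during an audit of the finished hierarchy.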

The new legal thesaurus makes it much easier to find new laws and their changes instead of having to browse through pages of a book. Access Innovations hopes their project for the Iowa Legislature General Assembly will encourage other government bodies to turn their libraries over to them for indexing. Not only would that make it easier for politicians and their staff to conduct research, maybe it could improve the political situation in the US. Making part of a job easier tends to make people happy.

Whitney Grace, April 7, 2012

SoSlang Crowdsources a Dictionary

March 20, 2012

Here’s a surprising and interesting approach to dictionaries: have users build their own. SoSlang allows anyone to add a slang term and its definition. Beware, though, this site is not for everyone. Entries can be salty. R-rated, even. You’ve been warned.


The site’s About page presents this description:


“So Slang is an un-complicated online slang dictionary which is contributed and edited by thousands of people online just like you. Unlike formal dictionaries, you can add your own meaning to millions of words.

“With more than 6 million definitions, So Slang is the biggest hub for street definitions of each and every word in the dictionary. These definitions are added by people all over the world wide web. If you’d like to add a definition, click here.”


Easy-to-understand definitions and plenty of examples are emphasized. As users add definitions, though, the old ones are not removed; this means some entries have a long list of conflicting definitions. I suppose that’s the nature of slang, though.


If you are even somewhat easily offended, stay away. However, if you’re boggled by a slang expression you overheard, this may be the place to turn.


Stephen E. Arnold, March 20, 2012


Ontoprise GmbH: Multiple Issues Says Wikipedia

March 3, 2012

Now Wikipedia is a go-to resource for Google. I heard from one of my colleagues that Wikipedia turns up as the top hit on a surprising number of queries. I don’t trust Wikipedia, but I don’t trust any encyclopedia produced by volunteers. Volunteers often participate in a spoofing fiesta.

[Image: SEO danger symbol]

Note: I will be using this symbol when I write about subjects which trigger associations in my mind about use of words, bound phrases, and links to affect how results may be returned from,, and, among other modern Web indexing services either supported by government entities or commercial organizations.

I was updating my list of Overflight companies. We have added five companies to a new Overflight service called, quite imaginatively, Taxonomy Overflight, and we are going through the process of figuring out if the outfits are in business or putting on a vaudeville act for paying customers.

The first five companies are:

  1. Millenium
  2. Mondeca
  3. Nuance
  4. Synaptica
  5. Visual Mining
  6. Wand

We will be adding to the Taxonomy Overflight another group of companies on March 4, 2012. I have not yet decided how to “score” each vendor. For enterprise search Overflight, I use a goose method. Click here for an example: Overflight about Autonomy. Three ducks. Darned good.

I wanted to mention one quite interesting finding. We came across a company doing business as Ontoprise. The firm’s Web site is We are checking to see which companies have legitimate Web sites, no matter how sparse.

We noted that the Wikipedia entry for Ontoprise carried this somewhat interesting “warning”:


The gist of this warning is to give me a sense of caution, if not wariness, with regard to this company which offers products which deliver “ontologies.” The company’s research is called “Ontorule”, which has a faintly ominous sound to me. If I look at the naming of products from such firms as Convera before it experienced financial stress, Convera’s product naming was like science fiction but less dogmatic than Ontoprise’s language choice. So I cannot correlate Convera and Ontoprise on other than my personal “semantic” baloney detector. But Convera went south in a rather unexpected business action.

Read more

Working Towards a Budget Friendly Military

March 2, 2012

There is no shortage of military supporters or of articles that applaud these men and women for their sacrifices to protect the United States and its interests. However, according to the WAND Action Center article “WAND and the Military Budget – What We Are Up Against,” the United States military desperately needs a budget overhaul. “There is only one way to get the changes WAND believes are necessary: an informed citizenry. U.S. citizens are deeply disturbed about our economic problems, rising inequalities, and the perception that our country is falling behind, yet haven’t made the link between that and the devastating costs of our military.”

WAND believes that being a major military power and protecting other nations is of little importance if we cannot handle our own problems at home. This is a somewhat unusual yet interesting view of the military and politics and the battle lines that some groups have drawn. It seems that some believe even the US Military needs an allowance.

Interesting approach to marketing taxonomies.

April Holmes, March 2, 2012

Exogenous Complexity 1: Search

January 31, 2012

I am now using the phrase “exogenous complexity” to describe systems, methods, processes, and procedures which are likely to fail due to outside factors. This initial post focuses on indexing, but I will extend the concept to other content centric applications in the future. Disagree with me? Use the comments section of this blog, please.

What is an outside factor?

Let’s think about value adding indexing, content enrichment, or metatagging. The idea is that unstructured text contains entities, facts, bound phrases, and other identifiable elements. A key word search system is mostly blind to the meaning of a number in the form nnn nn nnnn, which in the United States is the pattern for a Social Security Number. There are similar patterns in Federal Express, financial, and other types of sequences. The idea is that a system will recognize these strings and tag them appropriately; for example:

nnn nn nnnn Social Security Number

Thus, a query for Social Security Numbers will return a string of digits matching the pattern. The same logic can be applied to certain entities. With the help of a knowledge base, Bayesian numerical recipes, and other techniques such as synonym expansion, a system can determine that a query for Obama residence should return White House and that a query for the White House should return links to the Obama residence.
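
To make the idea concrete, here is a minimal sketch in Python of pattern tagging and synonym expansion. It is not any vendor's method. The regular expression, the function names, and the tiny hand-built synonym table are all assumptions made for illustration; a real system would lean on a knowledge base and statistical methods rather than a two-entry dictionary.

```python
import re

# Assumed pattern: three digits, two digits, four digits, separated by a space or hyphen.
SSN_PATTERN = re.compile(r"\b\d{3}[ -]\d{2}[ -]\d{4}\b")

# Hypothetical, hand-built synonym table standing in for a knowledge base.
SYNONYMS = {
    "obama residence": {"white house"},
    "white house": {"obama residence"},
}


def tag_entities(text):
    """Return (matched string, tag) pairs for every SSN-shaped sequence in text."""
    return [(m.group(0), "Social Security Number") for m in SSN_PATTERN.finditer(text)]


def expand_query(query):
    """Return the lowercased query plus any synonyms listed in the table."""
    key = query.strip().lower()
    return {key} | SYNONYMS.get(key, set())
```

Note that tag_entities labels any digit run laid out this way, whether or not it really is a Social Security Number. That is exactly the kind of false drop that makes automated tagging fragile.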

One wishes that value added indexing systems were as predictable as a kabuki drama. What vendors of next generation content processing systems participate in is a kabuki which leads to failure two thirds of the time. A tragedy? It depends on whom one asks.

The problem is that companies offering automated solutions to value adding indexing, content enrichment, or metatagging are likely to fail for three reasons:

First, there is the issue of humans who use language in unexpected or what some poets call “fresh” or “metaphoric” methods. English is synthetic in that any string of sounds can be used in quite unexpected ways. Whether it is the use of the name of the fruit “mango” as a code name for software or the conversion of a noun like information into a verb like informationize, which appears in Japanese government English language documents, the automated system may miss the boat. When the boat is missed, continued iterations try to arrive at the correct linkage, but as anyone who has used fully automated systems or who paid attention in math class knows, the recovery from an initial error can be time consuming and sometimes difficult. Therefore, an automated system, no matter how clever, may find itself fooled by the stream of content flowing through its content processing work flow. The user pays the price because false drops mean more work and suggestions which are not just off the mark but also difficult for a human to figure out. You can get the inside dope on why poor suggestions are an issue in Thinking, Fast and Slow.

Read more

Taxonomy Meetings: Change in 2011 or a Realization?

January 26, 2012

Editor’s Note: Please see the full version of this article at Marjorie Hlava’s Taxodiary blog.

Where should a taxonomist go to learn about the latest implementations of controlled vocabulary strategies? The meetings we have attended for years are dying on the vine. The SLA Expo was sparse, the Information Today meetings are smaller, Online Information (formerly International Online) was nearly empty, and NFAIS remains the same size each year.

The Internet has made many things possible. We can convene a meeting electronically in a very short time. People have turned increasingly to webinars and web searching. We follow blogs to read opinions and discussions. If we go to a meeting, we are expecting something else. We want to find community.

Selling of the speaking slots has had a deleterious effect on the quality of the meetings. The costs have reached a point where they no longer provide a good return on investment. But more than that, the challenge remains: how do you get a sense of community?

There are several budding online communities, which seem to be flourishing. Taxonomy Community of Practice is one; the Taxonomy Division of SLA is another. The rest are in user groups. Access Innovations’ Data Harmony User Group meeting will be held in Albuquerque February 7-9, 2012. Come join the community!

Marjorie Hlava, January 26, 2012
