
Taxonomy Turmoil: Good Enough May Be Too Much

February 28, 2015

For years, I have maintained a public Overflight on indexing vendors. You can examine the selected outputs at this Overflight link. (My non-public system is more robust, but the public service is a useful temperature gauge for a slice of the content processing sector.)

When it comes to indexing, most vendors provide keyword, concept tagging, and entity extraction. But are these tags spot on? No, most are good enough.


A happy quack to Jackson Taylor for this “good enough” cartoon. The salesman makes it clear that good enough is indeed good enough in today’s marketing enabled world.

I chose about 50 companies that asserted their systems performed some type of indexing or taxonomy function. I learned that the taxonomy business is “about to explode.” I find that to be either an interesting investment tip or a statement that is characteristic of content processing optimists.

Like search and retrieval, plugging in “concepts” or other index terms is a utility function. If one indexes each word in an article appearing in this blog, the resulting terms may not capture what the article is actually about. In this post, for example, I am talking about Overflight, but the real topic is the broader use of metadata in information retrieval systems. I could assign the term “faceted navigation” to this article as a way to mark it as germane to point-and-click navigation systems.
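The gap between indexing every word and assigning a concept can be shown with a minimal Python sketch. It assumes a one-sentence toy document and a hypothetical cue-word rule table; this is not Overflight's method and is far cruder than commercial concept taggers. The word index has no entry for “faceted,” yet the rules still attach the concept “faceted navigation.”

```python
import re
from collections import defaultdict

def build_word_index(docs):
    """Word-level inverted index: each lowercase word maps to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(doc_id)
    return dict(index)

# Hypothetical concept rules: a concept is assigned when any of its cue words appears.
CONCEPT_RULES = {
    "faceted navigation": {"metadata", "facets", "filters"},
    "information retrieval": {"search", "retrieval", "indexing"},
}

def assign_concepts(text):
    """Assign controlled-vocabulary concepts whose cue words occur in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {concept for concept, cues in CONCEPT_RULES.items() if words & cues}

docs = {"post-1": "Overflight tracks indexing vendors and the metadata they extract."}
index = build_word_index(docs)
print("faceted" in index)                       # False: the word never appears
print(sorted(assign_concepts(docs["post-1"])))  # ['faceted navigation', 'information retrieval']
```

Real systems replace the cue-word table with statistical or rule-based classifiers, but the split is the same: the word index reflects the surface text, while the assigned concept reflects a judgment about the topic.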

If you examine the “reports” Overflight outputs for each of the companies, you will discover several interesting things, as I did on February 28, 2015, when I assembled this short article.

  1. Mergers and purchases of failed vendors at fire-sale prices are taking place. Examples include Lucidea’s purchase of Cuadra and InMagic. Both of these firms are anchored in traditional indexing methods and seemed to be within a revenue envelope until their sellout. Business Objects acquired Inxight and then SAP acquired Business Objects. Bouvet acquired Ontopia. Teradata acquired Revelytix.
  2. Moving indexing into open source. Thomson Reuters acquired ClearForest and made most of the technology available as OpenCalais. OpenText, a rollup outfit, acquired Nstein. SAS acquired Teragram. Smartlogic acquired Schemalogic. (A free report about Schemalogic is available at
  3. A number of companies just failed, shut down, or went quiet. These include Active Classification, Arikus, Arity, Forth ICA, MaxThink, Millennium Engineering, Navigo, Progris, Protege, Questans, Quiver, Reuse Company, Sandpiper,
  4. The indexing sector includes a number of companies my non-public system monitors; for example, the little-known Data Harmony, with six-figure revenues after decades of selling really hard to traditional publishers. Conclusion: Indexing is a tough business to keep afloat.

There are numerous vendors who assert their systems perform indexing, entity, and metadata extraction. More than 18 of these companies are profiled in CyberOSINT, my new monograph. Oracle owns Triple Hop, RightNow, and Endeca. Each of these acquired companies performs indexing and metadata operations. Even the mashed potatoes search solution from Microsoft includes indexing tools. The proprietary XML data management vendor MarkLogic asserts that it performs indexing operations on content stored in its repository. Conclusion: More cyber-oriented firms are likely to capture the juicy deals.

So what’s going on in the world of taxonomies? Several observations strike me as warranted:

First, none of the taxonomy vendors is a huge outfit. I suppose one could argue that IBM’s Lucene-based system is a billion-dollar baby, but that’s marketing peyote, not reality. Perhaps MarkLogic, which is struggling toward $100 million in revenue, is the largest of this group. But the majority of the companies in the indexing business are small. Think in terms of annual revenue ranging from a few hundred thousand dollars to $10 million, with generous accounting assumptions.

What’s clear to me is that indexing, like search, is a utility function. If a good enough search system delivers good enough indexing, then why pay humans to slog through the content and make human judgments? Why not let Google-funded Recorded Future identify entities, assign geo codes, and extract meaningful signals? Why not rely on Haystax, RedOwl, or any one of the more agile firms to deliver higher-value operations?

I would assert that taxonomies and indexing are important to those who desire the accuracy of a human indexed system. This assumes that the humans are subject matter specialists, the humans are not fatigued, and the humans can keep pace with the flow of changed and new content.

The reality is that companies focused on delivering old school solutions to today’s problems are likely to lose contracts to companies that deliver what the customer perceives as a higher-value content processing solution.

What can a taxonomy company do to ignite its engines of growth? Based on the research we performed for CyberOSINT, the future belongs to those who embrace automated collection, analysis, and output methods. Users may, if they choose, provide guidance to the system. But the days of yore, when monks with varying degrees of accuracy created catalog sheets for the scriptoria, have been washed to the margin of the data stream by today’s content flows.

What does this mean for the folks who continue to pump money into taxonomy-centric companies? Unless they heed the cyber OSINT drumbeat, the failure rate of the Overflight sample should serve as a wake-up call.

Buying Apple bonds might be a more prudent financial choice. On the other hand, there is an opportunity for taxonomy executives to become “experts” in content processing.

Stephen E Arnold, February 28, 2015

DBpedia Makes Wikipedia Part Of The Semantic Web

November 21, 2014

An article called “Retrieving And Using Taxonomy Data From DBpedia” has an interesting introduction. It explains that DBpedia is a crowd-sourced Internet community whose goal is to extract structured information from Wikipedia and share it. The introduction adds that DBpedia already holds over three billion facts in the W3C-standard RDF data model, ready for use in applications.

DBpedia expresses its category taxonomy with SKOS, a W3C-standard vocabulary that the New York Times, the Library of Congress, and other organizations also use for their own taxonomies and subject headings. Users can extract the data and use it in their own RDF applications to give their own data more value.

DBpedia does users a real service: they do not have to rely on proprietary software to get rich taxonomies. The taxonomies can be retrieved under the community’s open terms and used to enrich content right away. There is one caveat:

“Remember that, for better or worse, the data is based on Wikipedia data. If you extend the structure of the query above to retrieve lower, more specific levels of horror film categories, you’d probably find the work of film scholars who’ve done serious research as well as the work of nutty people who are a little too into their favorite subgenres.”

Remember that Wikipedia is a good reference tool for gaining an understanding of a topic, but you still need to check more verifiable sources for hard facts.
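Retrieving such a taxonomy is a short SPARQL exercise. The Python sketch below is one way to do it, not the query from the article: it assumes the public endpoint at https://dbpedia.org/sparql is reachable, that DBpedia links categories with `skos:broader`, and that `Category:Horror_films` is a reasonable starting point. The function names are illustrative, and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Assumption: DBpedia's public SPARQL endpoint and category namespace.
ENDPOINT = "https://dbpedia.org/sparql"
CATEGORY_NS = "http://dbpedia.org/resource/Category:"

def narrower_query(category):
    """SPARQL for categories whose skos:broader is the given category."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?narrower WHERE {\n"
        f"  ?narrower skos:broader <{CATEGORY_NS}{category}> .\n"
        "}"
    )

def parse_bindings(results, var="narrower"):
    """Pull one variable's values out of a SPARQL 1.1 JSON results document."""
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]

def fetch_narrower(category):
    """Send the query as a GET request; requires network access."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": narrower_query(category)})
    request = urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(json.load(response))

# Example (needs network): print(fetch_narrower("Horror_films")[:10])
```

Calling `fetch_narrower` again on each returned category walks down to the lower, more specific levels, which is exactly where the Wikipedia caveat bites hardest.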

Whitney Grace, November 21, 2014
Sponsored by, developer of Augmentext

Wave Your WAND for a New Taxonomy Portal

May 2, 2014

If a library needs a taxonomy, most of the time all it has to do is wave a magic wand and its taxonomy wish is granted. Actually, it becomes a client of WAND, Inc., which bills itself as the world’s leading taxonomy provider. According to the WAND Inc. blog, the company has launched a new endeavor: “WAND Announces Launch Of New Taxonomy Portal.” The WAND Taxonomy Library Portal helps companies develop a taxonomy strategy, which WAND presents as integral to enterprise information management.

“According to Mark Leher, WAND’s COO, ‘The amount of unstructured information and data inside organizations continues to explode.  Companies need a taxonomy strategy to organize information and make it easily accessible to enterprise information workers.  The WAND Taxonomy Library Portal is a valuable resource that provides the foundation for a corporate taxonomy strategy.’ “

WAND Taxonomy Library Portal subscribers receive access to all of WAND’s taxonomies. They cover a range of topics, including insurance, medical equipment and supplies, travel, personal care, human resources, and many more. The portal is designed to help companies get the highest return on investment from their information management applications.

“Leher continued, ‘What most people don’t realize is that there are more than 150 common enterprise information management applications that are designed to leverage taxonomy.  We estimate that most large organizations have already invested in 10-20 of those applications.  At WAND, our goal is to provide taxonomies that make those applications more effective and increase the return on investment.’ “

Taxonomies are lists of terms. It is hard to imagine term lists as an integral part of using management applications, but they are important for identifying content and building a reference framework.

Whitney Grace, May 02, 2014
Sponsored by, developer of Augmentext

Taxonomy Round-Up Includes Variety of Taxonomy Discussions

January 27, 2014

The article on Synaptica Central titled “End of 2013 Round Up of Taxonomy Blogs, Part 1” is exactly what it sounds like: an end-of-year look at taxonomy through articles, blog posts, videos, and more, gathered from corners of the Internet as diverse as Pinterest, Twitter, StackExchange, and YouTube. Topics range from “Taxonomy as it relates to Drupal” to posts on augmenting a taxonomy.

The author explains:

“It’s that time of year, folks, when it seems like everyone is publishing some kind of year end list or “round up” of the years news highlights, blogs and blog posts, photos, videos, and more. So why not do the same here at Synaptica? To keep things manageable, listed here are some 2013 blogs and blog entries, culled from almost a thousand, that address taxonomy in one way or another.”

Overall, the article presents an interesting list of important taxonomy blog entries. Many touch on Bloom’s Taxonomy, such as the articles How to Use Pinterest with Bloom’s Taxonomy Infographic and Revised Bloom’s Taxonomy and the Need for Higher Order Thinking. Another helpful highlight is Twitter Aligned with Bloom’s Taxonomy for Your Students. Whether you are looking for guidance in implementing or augmenting a taxonomy, or are just interested in a dialogue on the subject, this list is a good starting point and should direct you toward discussions and communities galore.

Chelsea Kerwin, January 27, 2014

Sponsored by, developer of Augmentext

Another Content Management Company Another Day

August 12, 2013

Content management companies are springing up and gaining attention thanks to the Big Data boom. One of the companies our content wranglers pulled out of an Internet search is Applied Relevance. It specializes in several aspects of the content management spectrum, but its Web site prominently promotes its taxonomy services. Applied Relevance offers the AR-Classifier tagging engine, which can run on a variety of platforms. AR-Semantics is its flagship organization and categorization software, AR-Taxonomy is the tool for editing and managing taxonomies, and AR-Navigator is available for searching them.

All this talk about Applied Relevance’s taxonomy software is informative, but what is interesting is the company’s description on the main page:

“Applied Relevance produces software and services to help enterprise users find the information they need. Our solutions augment traditional search engines by providing context for the search results. The AR toolset and our partners provide cost effective technology for the full spectrum of enterprise content management and search applications. With our tools, a search term and a few clicks, users can zero-in past ambiguities and come up with the right answer in the right context. Applied Relevance is located on the west coast of the east coast of North America.”

Descriptive, but not a word on taxonomy or what exactly the company specifically does. The tagline at the end about Applied Relevance’s location is even more ambiguous.

Whitney Grace, August 12, 2013

Sponsored by, developer of Beyond Search

Spotlight on Access Innovations Living Up to Name

May 14, 2013

New York-based print and digital educational content company Triumph Learning has struck up a partnership with taxonomy development leader Access Innovations, Inc. Together, they will create a new taxonomy designed to align standards-based instructional content for the K-12 education market. The news release, “Triumph Learning Partners with Access Innovations on Common Core Standards-Integrated Taxonomy,” explains more.

Content management can be a difficult challenge for companies like Triumph Learning, but Access Innovations facilitated a more efficient management system by building a taxonomy from a structured vocabulary for math and English.

We learned about how the Common Core State Standards apply:

The Common Core State Standards provide concepts and terminology that Triumph Learning writers and editors can use to link pieces of content such as instruction and practice activities, as well as other supplemental material, to corresponding grade-level standards. ‘By using Access Innovations expertise we will be able to properly align our content for both teachers and students,’ said Aoife Dempsey, Chief Technology Officer at Triumph Learning.

For a company that has been around since 1978, Access Innovations truly lives up to its name. Its database and taxonomy creation capabilities and semantic integration technology stand out, and it looks like its spotlight will continue to shine, especially now that it is involved in bolstering educational reform on a national level.

Megan Feil, May 14, 2013

Sponsored by, developer of Beyond Search

Taxonomy or Ontology

April 1, 2013

The WAND blog article “Common Taxonomy Questions: What is the difference between a Taxonomy and an Ontology?” attempts to clear up the confusion between the two terms. The article notes that both are common words in the information management world, but many people do not truly understand the difference.

“Taxonomy is a collection of terms that are connected by broader term, narrower term, related term, and synonym relationships.”
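The relationship types in this definition map naturally onto a small data structure. The Python sketch below is a hypothetical illustration, not WAND's data model: each term records a broader term, related terms, and synonyms, and two separate taxonomies act together as facet filters over a few made-up catalog records.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Term:
    label: str                                   # preferred term
    broader: Optional[str] = None                # broader term (parent); None for a top term
    related: set = field(default_factory=set)    # related terms (non-hierarchical)
    synonyms: set = field(default_factory=set)   # entry terms that map to this label

class Taxonomy:
    def __init__(self, terms):
        self.terms = {t.label: t for t in terms}
        # Every preferred label and synonym resolves to the preferred label.
        self.preferred = {t.label.lower(): t.label for t in terms}
        for t in terms:
            for s in t.synonyms:
                self.preferred[s.lower()] = t.label

    def resolve(self, text):
        """Map a user's word to the preferred term, or None if unknown."""
        return self.preferred.get(text.lower())

    def narrower(self, label):
        """Narrower terms are those whose broader term is `label`."""
        return {t.label for t in self.terms.values() if t.broader == label}

    def descendants(self, label):
        """The term itself plus everything below it in the tree."""
        found = {label}
        for child in self.narrower(label):
            found |= self.descendants(child)
        return found

# Two hypothetical taxonomies used together as facets.
product_type = Taxonomy([
    Term("Footwear"),
    Term("Boots", broader="Footwear", synonyms={"booties"}),
    Term("Sneakers", broader="Footwear", synonyms={"trainers"}, related={"Athletic apparel"}),
])
material = Taxonomy([Term("Material"), Term("Leather", broader="Material"), Term("Canvas", broader="Material")])

items = [  # made-up catalog records
    {"name": "Hiking boot", "type": "Boots", "material": "Leather"},
    {"name": "Court shoe", "type": "Sneakers", "material": "Canvas"},
    {"name": "Runner", "type": "Sneakers", "material": "Leather"},
]

def facet_filter(records, type_label, material_label):
    """Keep records that fall under BOTH selected facet values, including narrower terms."""
    types = product_type.descendants(type_label)
    materials = material.descendants(material_label)
    return [r["name"] for r in records if r["type"] in types and r["material"] in materials]

print(facet_filter(items, "Footwear", "Leather"))  # ['Hiking boot', 'Runner']
print(product_type.resolve("Trainers"))            # Sneakers
```

Selecting a broad facet value such as “Footwear” automatically includes its narrower terms, and synonyms let a shopper's word (“trainers”) land on the preferred term.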

The article offers a useful comparison. A taxonomy is a tree of parent/child relationships among terms, and it usually covers a specific subject area. Taxonomies can be a valuable tool for adding structure to unstructured information, which makes the information easier to search. Multiple taxonomies can be used together as filters to make the search experience more powerful and precise; popular sites such as Amazon and Costco use this approach. Ontologies, the author explains, are different:

“Ontologies can be thought of more like a web, with many different types of relationships between all concepts. Ontologies can have infinite number of relationships between concepts and it is easier to create relationships between concepts across different subject domains.”
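By contrast, an ontology can be sketched as a set of subject, relationship, object triples in which the relationships are not limited to broader and narrower. The example below is a hypothetical toy with invented concepts and relationship names, showing how one concept can link across subject domains.

```python
from collections import defaultdict, deque

# Hypothetical triples (subject, relationship, object). Relationship types are
# open-ended and can link concepts from different subject domains.
TRIPLES = [
    ("Sneakers", "is_a", "Footwear"),
    ("Sneakers", "made_of", "Canvas"),
    ("Canvas", "is_a", "Textile"),
    ("Sneakers", "used_for", "Running"),
    ("Running", "is_a", "Sport"),
]

def build_graph(triples):
    """Index triples by subject: subject -> list of (relationship, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

def neighbors(graph, concept, relation=None):
    """Objects linked from `concept`, optionally limited to one relationship type."""
    return [obj for rel, obj in graph.get(concept, []) if relation is None or rel == relation]

def reachable(graph, start):
    """Every concept reachable from `start` over any relationship (breadth-first)."""
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        for _, obj in graph.get(current, []):
            if obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return seen

graph = build_graph(TRIPLES)
print(neighbors(graph, "Sneakers", "made_of"))  # ['Canvas']
```

Nothing stops `is_a` from playing the role of a broader-term link, so a taxonomy can live inside such a graph; that may be why a single system can be described as both.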

Ontologies are handy for those who want a more sophisticated information model, which could be valuable for advanced natural language processing or text analytics. Though the system is named WAND Product and Service Taxonomy, believe it or not, it is also an ontology. The blog draws a good distinction between ontology and taxonomy but then says the WAND system is actually both, which makes one wonder how to really distinguish the two. Looks like more questions than answers. Here we go again.

April Holmes, April 01, 2013

Sponsored by, developer of Augmentext

A Cause for Celebration

April 1, 2013

The best way to mark the successful completion of a project is with a party, and no party is complete without a cake. Synaptica definitely knows how to throw one. According to the Synaptica Central piece “Elsevier Celebrates New Installation,” Synaptica and Elsevier recently celebrated the successful completion of their software development project with a tasty cake.

“It is a pleasure when one of our customers has a specially decorated cake made to celebrate the successful deployment of their customized Synaptica taxonomy management software. The project, completed this month, was a collaboration between Synaptica and the content management team at Elsevier, Netherlands.”

Elsevier got its start in journal and book publishing but is also known for providing scientific, technical, and medical information as well as various other products. Synaptica was started in 1995 and is owned by Trish Yancey and Dave Clarke. It is an industry leader in taxonomy management and ontology software. Its software gives users several key benefits, such as increased relevance thanks to a synonym-rich indexing vocabulary and the ability to visualize taxonomies in a variety of textual and graphical formats. Synaptica software works in the enterprise and has been integrated with several third-party applications. In addition, Synaptica is user friendly and can be set up in a matter of minutes. A variety of organizations use Synaptica taxonomy software for their metadata management and information access applications. The company even received the “100 Companies that Matter” award. Looks like they definitely have a reason to celebrate.


April Holmes, April 01, 2013

Sponsored by, developer of Augmentext

DataFacet Video

February 15, 2013

DataFacet’s stream of news slowed in late 2012. The outfit seems to be quiet; what’s going on over there? While we wait for their next move, check out the interesting video on the DataFacet Web site, which effectively introduces their product. It begins with a good explanation of “taxonomy,” which might be useful to bookmark in case you need to define the term for someone unfamiliar with the field. The video goes on to show someone using parts of the DataFacet system, which gives a much better idea of what it does than any text explanation could. It’s set to a catchy tune, too.

The product description surrounding the video specifies:

DataFacet provides a taxonomy based data model for your enterprise’s unstructured information along with a sophisticated, yet easy to use, set of tools for applying the data model to your content.

It’s an easy three step process:

  1. Choose your foundation taxonomies from the DataFacet library of over 500 topic domains
  2. Customize your taxonomy with DataFacet Taxonomy Manager
  3. Tag your content with DataFacet Taxonomy Server
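The three steps can be mimicked in a few lines. This is a hypothetical sketch, not DataFacet's API: a small invented “foundation” term list stands in for the library, customization adds a local term with synonyms, and tagging matches preferred labels and synonyms as whole words in the text.

```python
import re

# Step 1 (hypothetical): a foundation taxonomy, preferred label -> synonyms.
FOUNDATION = {
    "Insurance claims": {"claim", "claims"},
    "Human resources": {"hr", "personnel"},
}

def customize(taxonomy, label, synonyms):
    """Step 2: return a copy of the taxonomy with a local term added or extended."""
    custom = {k: set(v) for k, v in taxonomy.items()}
    custom.setdefault(label, set()).update(synonyms)
    return custom

def tag(text, taxonomy):
    """Step 3: tag text with each preferred label whose label or synonym appears as a whole phrase."""
    lowered = text.lower()
    tags = set()
    for label, synonyms in taxonomy.items():
        for phrase in {label.lower(), *(s.lower() for s in synonyms)}:
            if re.search(rf"\b{re.escape(phrase)}\b", lowered):
                tags.add(label)
                break
    return tags

taxonomy = customize(FOUNDATION, "Travel", {"itinerary", "airfare"})
print(sorted(tag("HR approved the airfare for the claims adjuster.", taxonomy)))
# ['Human resources', 'Insurance claims', 'Travel']
```

`customize` copies rather than edits the foundation list, so a shared library can stay unchanged while each customer keeps its own additions.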

DataFacet is already available for the following search and content environments:

DataFacet is actually a joint project, built by taxonomists from WAND and Applied Relevance. Based in Denver, Colorado, WAND has been developing structured multi-lingual vocabularies since 1998. Their taxonomies have been put to good use in online search systems, ad-matching engines, B2B directories, product searches, and within enterprise search engines.

Applied Relevance offers automated tagging to help organizations contextualize their unstructured data. It has designed its user interface using cross-platform JavaScript and HTML5, which gives the application the flexibility to run in a browser, be embedded in a Web page, or be hosted in an Adobe AIR desktop application.

Cynthia Murrell, February 15, 2013

Sponsored by, developer of Augmentext

WAND Partnership with Concept Searching Unveiled

January 2, 2013

A new partnership was recently revealed for WAND, Inc., a developer of structured multi-lingual vocabularies. Digital Journal covered the strategic partnership with Concept Searching in its article, “Concept Searching Selected as Founding Strategic Partner in the WAND Within Partnership Program.”

Concept Searching offers automatic semantic metadata generation, auto-classification, and taxonomy management software. Because it has met the requirements of the new WAND Within program, it has been named one of the founding partners.

Additionally, smartStructures has emerged as a marketing collaboration between Concept Searching’s advanced technology platform, WAND Foundation Taxonomies, and industry expertise. These vertically aligned solutions will be available only from Concept Searching directly or from a set of certified partners.

The article offers more insight into the history of WAND:

“The WAND taxonomies have been used for the last fourteen years by organizations that want to benefit from industry and business function specific Foundation Taxonomies, to accelerate taxonomy development and management. The WAND Within™ partnership program is designed specifically for industry leading vendors in the taxonomy market, who add superior technology value to provide clients with powerful solutions to manage unstructured content.”

This is an interesting tie-up, and one we will keep an eye on, since the two companies are positioning themselves in a way that could prove meaningful.

Megan Feil, January 02, 2013

Sponsored by, developer of Augmentext
