The Semantic Web as it Stands

April 16, 2011

Semantic search for the enterprise is here, but the semantic web remains the elusive holy grail. “Semantic Web: Tools you can use” gives an overview of the current state of semantic technology and of what it still needs to get off the ground as a true semantic web.

Tim Berners-Lee was the first to articulate what the semantic web would be like, and his vision of federated search is still sorely missing from reality. Federated search queries several disparate resources simultaneously, as when you search several library databases at once. Windows 7 supports federated search, but it is still not common throughout the web. The W3C (World Wide Web Consortium) has developed standards to support semantic web infrastructure, including SPARQL, RDF, and OWL, and Google, Yahoo, and Bing are starting to use semantic metadata and to support W3C standards like RDF.
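
To make the idea of federated search concrete, here is a minimal sketch in Python. It is not taken from the article: the three sources and their titles are invented stand-ins, and a real system would query remote endpoints rather than in-memory lists. It only shows the core pattern of sending one query to several sources at once and merging the hits.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources with invented titles; real sources would be remote services.
SOURCES = {
    "library_catalog": ["Intro to semantic search", "Cataloguing basics"],
    "journal_index": ["Semantic metadata in practice", "Ontology matching"],
    "news_archive": ["Search engines adopt semantic markup"],
}

def search_source(name, query):
    """Return (source, title) pairs whose title contains the query, ignoring case."""
    needle = query.lower()
    return [(name, title) for title in SOURCES[name] if needle in title.lower()]

def federated_search(query):
    """Send the same query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor() as pool:
        per_source = pool.map(lambda name: search_source(name, query), SOURCES)
        merged = [hit for hits in per_source for hit in hits]
    return sorted(merged)

results = federated_search("semantic")
```

The hard part that this sketch skips is exactly what the standards are for: each real source describes its records differently, so merging results meaningfully requires a shared vocabulary.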

Semantic software can analyze and describe the meaning of data objects and their interrelationships while resolving language ambiguities such as homonyms and synonyms, as long as standards are followed. This has practical applications in areas like comparison shopping. If standards are followed and the merchants themselves provide the semantic metadata, online shoppers can compare products without the inaccuracies and out-of-date information that currently plague third-party shopping comparison sites.

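
As a rough illustration of the shopping example, the sketch below assumes each merchant publishes its offers as simple subject-predicate-object statements, loosely in the spirit of RDF triples. The merchants, predicate names, prices, and synonym table are all made up for this example and do not come from any real vocabulary.

```python
# Invented synonym table: maps merchant wording to one shared product type.
SYNONYMS = {"notebook": "laptop", "laptop computer": "laptop"}

# Invented merchant data as (subject, predicate, object) statements.
MERCHANT_TRIPLES = [
    ("shopA:offer1", "productType", "Notebook"),
    ("shopA:offer1", "price", 899.00),
    ("shopB:offer7", "productType", "Laptop computer"),
    ("shopB:offer7", "price", 849.50),
    ("shopC:offer3", "productType", "Paper notebook"),  # a different kind of notebook
    ("shopC:offer3", "price", 4.25),
]

def normalize(term):
    """Lower-case a product type and map known synonyms to the shared term."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)

def offers_for(product_type, triples):
    """Return (price, offer) pairs for one product type, cheapest first."""
    wanted = normalize(product_type)
    offers = {}
    for subject, predicate, obj in triples:
        offers.setdefault(subject, {})[predicate] = obj
    matches = [
        (props["price"], subject)
        for subject, props in offers.items()
        if "price" in props and normalize(props.get("productType", "")) == wanted
    ]
    return sorted(matches)

cheapest_first = offers_for("laptop", MERCHANT_TRIPLES)
```

Because the data comes straight from the merchants, the comparison is only as current as what they publish, and it only works if everyone agrees on what terms like the product type mean.
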
Some tools, platforms, prewritten components, and services are currently available to make semantic deployment easier and somewhat less expensive. Jena is an open-source Java framework for building semantic web applications, and Sesame is an open-source framework for storing, inferencing, and querying RDF data. Lexalytics produces a semantic platform that contains general ontologies, which service provider partners can then fine-tune for specific business domains and applications. Revelytix sells a knowledge-modeling tool, a wiki-based framework that helps many types of users collaboratively develop a semantic vocabulary for domain-specific information residing on different web sites. Sinequa’s semantic platform, Context Engine, provides semantic infrastructure that includes a generic semantic dictionary, which can translate between various languages and can also be customized with business-specific terms. Thomson Reuters provides Machine Readable News, which collects, analyzes, and scores online news for sentiment (public opinion), relevance, and novelty, and OpenCalais, which creates open metadata for submitted content.

Despite all these advances in semantic technology for the enterprise, general, widespread use of the semantic web remains elusive, and no one can predict exactly when that will change:

“In a 2010 Pew Research survey of about 895 semantic technology experts and stakeholders, 47% of the respondents agreed that Berners-Lee’s vision of a semantic Web won’t be realized or make a significant difference to end users by the year 2020. On the other hand, 41% of those polled predicted that it would. The remainder did not answer that query.”

Semantic technology for the enterprise is not only here today, but is growing by about 20% a year according to IDC. That kind of semantic technology is a much smaller beast to tame. When it comes to the World Wide Web, there is still no widespread support for W3C standards and common vocabularies, which is why more people said no than yes in the survey mentioned above. Generalized web searches are difficult because each site has its own largely proprietary ontology instead of a shared and open taxonomy. Sometimes even within a single enterprise it is difficult to reconcile the differences between sectors of the same business.

However, certain industries are starting to come under pressure from customers or from within the industry and have responded by creating standardized ontologies. GoodRelations is one such e-commerce ontology, used by Google among others. This kind of technique has not become widespread because of the costs and slow payoff involved. It is a catch-22: businesses don’t want to jump on the bandwagon because there is no critical mass yet, but the real benefits won’t start until a large number of businesses participate. Things like product categories are often unique to a business, and getting some kind of universal standardization is akin to a nightmare, but there still needs to be consensus on using some type of W3C standard of categorization to satisfy customers. And with more and more bogus information proliferating on the web, semantics becomes not only convenient but essential for finding the right information.

I think the fundamental question that this article leaves us with is whether we have the standards we need or whether the current standards are the stepping-off point to something new. SGML was fine in its day, but it didn’t get very far. HTML cherry-picked some of the basic ideas of SGML and added linking, and the World Wide Web was born. Now HTML 5 is reintroducing some of the ideas of SGML that were lost. Maybe HTML can continue to evolve, or maybe someone will cherry-pick its best ideas and create something (almost) entirely new. Another issue is all the work that it takes to create the metadata, no matter what the standards. Flickr and Facebook have made user tagging into a fun activity, but for the semantic web to really function, machines need to do most of the work. Will this all be figured out by 2020? Survey says no, but who knows?

Alice Wasielewski
