
Semantics and the Web: The Bacon Has Been Delivered

November 12, 2015

I read and viewed “What Happened to the Semantic Web.” For one thing, the search engine optimization crowd has snagged the idea in order to build interest in search result rankings. The other thing I know is that most people are blissfully unaware of what semantics are supposed to be and how semantics affect their lives. Many folks are thrilled when their mobile phone points them to a pizza joint or out of an unfamiliar part of town.

The write up explains that for the last 15 years there has been quite a bit of the old rah rah for semantics on the Web. Well, the semantics are there. The big boys like Google and Microsoft are making this happen. If you are interested in triples, POST, and RDF, you can work through the acronyms and get to the main points of the article.

The bulk of the write up is a series of comparative screenshots. I looked at these and tried to replicate a couple of them. I was not able to derive the same level of thrill which the article expresses. Your mileage may vary.

Here’s the passage I highlighted in a definitely pale shade of green:

As you can see, there is no question that the Web already has a population of HTML documents that include semantically-enriched islands of structured data. This new generation of documents creates a new Web dimension in which links are no longer seen solely as document addresses, but can function as unambiguous names for anything, while also enabling the construction of controlled natural language sentences for encoding and decoding information [data in context] — comprehensible by both humans and machines (bots). The fundamental goal of the Semantic Web Project has already been achieved. Like the initial introduction of the Web, there wasn’t an official release date — it just happened!
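For the curious, here is a minimal sketch of what one of these “semantically-enriched islands of structured data” can look like in practice. It builds a schema.org description in Python and wraps it in the JSON-LD script element a publisher would drop into an HTML page. The author name and date are invented placeholders; only the headline comes from the article discussed above.

```python
import json

# A minimal schema.org description of an article, expressed as JSON-LD.
# Author and date values are made-up placeholders for illustration.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "What Happened to the Semantic Web?",
    "author": {"@type": "Person", "name": "Example Author"},
    "datePublished": "2015-11-12",
}

# Publishers embed the description in the page as a script element;
# crawlers read this "island" without parsing the visible prose.
island = '<script type="application/ld+json">\n{}\n</script>'.format(
    json.dumps(article, indent=2)
)
print(island)
```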

I surmise this is the semantic heaven described by Ramanathan Guha and his series of inventions, now almost a decade old. What’s left out is a small point: The semantic technology allows Google and some other folks to create a very interesting suite of databases. Good or bad? I will leave it to you to revel in this semantic fait accompli.

Stephen E Arnold, November 12, 2015

Another Semantic Search Play

November 6, 2015

The University of Washington has been search central for a number of years. Some interesting methods have emerged. From Jeff Dean to Alon Halevy, the UW crowd has been having an impact.

Now another outfit with ties to UW wants to make waves with a semantic search engine. Navigate to “Artificial-Intelligence Institute Launches Free Science Search Engine.” The wizard behind the system is Dr. Oren Etzioni. The money comes from Paul Allen, a co-founder of Microsoft.

Dr. Etzioni has been tending vines in the search vineyard for many years. His semantic approach is described this way:

But a search engine unveiled on 2 November by the non-profit Allen Institute for Artificial Intelligence (AI2) in Seattle, Washington, is working towards providing something different for its users: an understanding of a paper’s content. “We’re trying to get deep into the papers and be fast and clean and usable,” says Oren Etzioni, chief executive officer of AI2.

Sound familiar? Understanding what a sci-tech paper means.

According to the write up:

Semantic Scholar offers a few innovative features, including picking out the most important keywords and phrases from the text without relying on an author or publisher to key them in. “It’s surprisingly difficult for a system to do this,” says Etzioni. The search engine uses similar ‘machine reading’ techniques to determine which papers are overviews of a topic. The system can also identify which of a paper’s cited references were truly influential, rather than being included incidentally for background or as a comparison.
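AI2 does not disclose its extraction pipeline, so the sketch below only illustrates the general idea of pulling salient terms from a paper’s text without author-supplied keywords. It uses plain TF-IDF from scikit-learn (1.0 or later), which is far cruder than whatever “machine reading” Semantic Scholar actually applies, and the sample abstracts are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for paper abstracts; the texts are invented.
abstracts = [
    "We present a neural method for extracting key phrases from scientific papers.",
    "A survey of citation analysis techniques for measuring research influence.",
    "Semantic search over scholarly literature using entity linking and ranking.",
]

# Score unigrams and bigrams by TF-IDF and keep the top terms per document.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
matrix = vectorizer.fit_transform(abstracts)
terms = vectorizer.get_feature_names_out()  # requires scikit-learn 1.0+

for row in matrix.toarray():
    top = sorted(zip(row, terms), reverse=True)[:3]
    print([term for score, term in top if score > 0])
```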

Does anyone remember Gene Garfield? I did not think so. There is a nod to Expert System, an outfit which has been flogging semantic technology in an often baffling suite of software since 1989. (Yep, that works out to more than a quarter of a century.) Hey, few doubt that semantic hoohah has been a go-to buzzword for decades.

There are references to the Microsoft specialist search and some general hand-waving. The fact that different search systems must be used for different types of content should raise some questions about the “tuning” required to deliver what the vendor can describe as relevant results. Does anyone remember what Gene Garfield said when he accepted the lifetime achievement award in online? Right, I did not think so. The gist was that citation analysis worked. Additional bells and whistles could be helpful. But humans referencing substantive sci-tech antecedents was a very useful indicator of the importance of a paper.

I interpreted Dr. Garfield’s comment as suggesting that semantics could add value if the computational time and costs could be constrained. But in an era of proliferating sci-tech publications, bells and whistles were like chrome trim on a ’59 Oldsmobile 98. Lots of flash. Little substance.
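Garfield’s point, as I read it, is that simple citation counting carries most of the signal. Here is a minimal sketch of that idea in Python, using an invented toy citation graph rather than any real data.

```python
from collections import Counter

# Invented toy citation graph: (citing paper, cited paper) pairs.
citations = [
    ("paper_a", "paper_c"),
    ("paper_b", "paper_c"),
    ("paper_b", "paper_d"),
    ("paper_e", "paper_c"),
]

# Garfield-style indicator: how often each paper is cited by others.
cited_counts = Counter(cited for _, cited in citations)
for paper, count in cited_counts.most_common():
    print(paper, count)
```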

My view is that Paul Allen dabbled in semantics with Evri. How did that work out? Ask someone from the Washington Post who was involved with the system.

It is worth testing the system in comparative searches against Compendex, ChemAbs, and similar high value commercial databases.

Stephen E Arnold, November 5, 2015

Data Lake and Semantics: Swimming in Waste Water?

November 6, 2015

I read a darned fascinating write up called “Use Semantics to Keep Your Data Lake Clear.” There is a touch of fantasy in the idea of importing heterogeneous “data” into a giant data lake. The result is, in my experience, more like waste water in a pre-treatment plant in Saranda, Albania. Trust me. Distasteful.


The write up invokes a mid-tier consultant and then tosses in the fuzzy term “governance.” We are now on semi-solid ground, right? I do like the image of a data swamp, which contrasts nicely with the images from On Golden Pond.

I noted this passage:

Using a semantic data model, you represent the meaning of a data string as binary objects – typically in triplicates made up of two objects and an action. For example, to describe a dog that is playing with a ball, your objects are DOG and BALL, and their relationship is PLAY. In order for the data tool to understand what is happening between these three bits of information, the data model is organized in a linear fashion, with the active object first – in this case, DOG. If the data were structured as BALL, DOG, and PLAY, the assumption would be that the ball was playing with the dog. This simple structure can express very complex ideas and makes it easy to organize information in a data lake and then integrate additional large data stores.
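Here is what that dog-and-ball example looks like when expressed as an actual RDF triple, in a small Python sketch using the rdflib library and an invented example.org namespace. Swap the subject and the object and you get the ball playing with the dog, which is the ordering point the passage makes.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")  # invented namespace for illustration

g = Graph()
# Subject, predicate, object: the dog plays with the ball.
g.add((EX.dog, EX.plays, EX.ball))
# Reversing subject and object changes the meaning: the ball plays with the dog.
# g.add((EX.ball, EX.plays, EX.dog))

# rdflib 6+ returns a string here; older versions return bytes.
print(g.serialize(format="turtle"))
```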


Next I circled:

A semantic data lake is incredibly agile. The architecture quickly adapts to changing business needs, as well as to the frequent addition of new and continually changing data sets. No schemas, lengthy data preparation, or curating is required before analytics work can begin. Data is ingested once and is then usable by any and all analytic applications. Best of all, analysis isn’t impeded by the limitations of pre-selected data sets or pre-formulated questions, which frees users to follow the data trail wherever it may lead them.

Yep, makes perfect sense. But there is one tiny problem: garbage in, garbage out. Not even modern jargon can solve this decades-old computer challenge.

Fantasy is much better than reality.

Stephen E Arnold, November 6, 2015

Whitepaper: Plan for Holiday Sales Now

October 16, 2015

Marketing pros and retailers take note: semantic tech firm ntent offers a free whitepaper to help you make the most of the upcoming holiday season, titled “Step-By-Step Guide to Holiday Campaign Planning.” All they want in return are your web address, contact info, and the chance to offer you a subscription to their newsletter, blog, and updates. (That checkbox is kindly deselected by default.) The whitepaper’s description states:

“Halloween candy and costumes are already overflowing on retail stores shelves. You know what that means, don’t you? It’s time for savvy marketers to get serious about their online retail planning for the impending holidays, if they haven’t already started. Why is it so important to take the time to coordinate a solid holiday campaign? Because according to the National Retail Federation [PDF] the holiday season can account for more than 20–40% of a retailer’s annual sales. And if that alone isn’t enough to motivate you, Internet Retailer reported that online retail sales this year are predicted to reach $349.06 billion a 14.2% YoY increase—start planning now to get your piece of the pie! Position your business for online success, more sales and more joy as you head into 2016 using these easy-to-follow, actionable tips!”

The paper includes descriptions of tactics and best practices, as well as a monthly to-do list and a planning worksheet. Founded in 2010, ntent leverages their unique semantic search technology to help clients quickly find the information they need. The company currently has several positions open at their Carlsbad, California, office.

Cynthia Murrell, October 16, 2015


Quote to Note: Halevy after 10 Years Before the Ads

September 23, 2015

If you track innovations at the Alphabet Google thing, you will know that a number of wizards make the outfit hum. One of the big wizards is Dr. Alon Halevy. He is a database guru, a patent holder, and now an essayist.

Navigate to “A Decade at Google.” The write up does not reference the ad model which makes research possible. Legal dust-ups are sidestepped. The management approach and the reorganization are not part of the write up.

I did note an interesting passage, which I flagged as a quote to note:

It is common wisdom that you should not choose a project that a product team is likely to be embarking on in the short term (e.g., up to a year). By the time you’ll get any results, they will have done it already. They might not do it as well as or as elegantly as you can, but that won’t matter at that point.

I interpreted this to underscore the Alphabet Google thing’s “good enough” approach to its technology. If you have time, think about the confluence of Dr. Halevy’s research and Dr. Guha’s. The semantic search engine optimization crowd may have a field day.

Stephen E Arnold, September 23, 2015

The Semantic Web Has Arrived

September 20, 2015

Short honk: If you want evidence of the impact of the semantic Web, you will find “What Happened to the Semantic Web?” useful. The author captures 10 examples of the semantic Web in action. I highlighted this passage in the narrative accompanying the screenshots:

there is no question that the Web already has a population of HTML documents that include semantically-enriched islands of structured data. This new generation of documents creates a new Web dimension in which links are no longer seen solely as document addresses, but can function as unambiguous names for anything, while also enabling the construction of controlled natural language sentences for encoding and decoding information [data in context] — comprehensible by both humans and machines (bots).

Structured data will probably play a large part in the new walled gardens now under construction.
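For readers who want to see a machine actually consume one of these structured data islands, the sketch below parses a small JSON-LD snippet into triples with rdflib (version 6 or later, where JSON-LD support is built in). The snippet itself is an invented example, not one of the article’s demonstrations.

```python
from rdflib import Graph

# Invented JSON-LD island of the kind the article's screenshots show.
# An inline context is used so the example parses without network access.
jsonld = """
{
  "@context": {"schema": "http://schema.org/"},
  "@id": "http://example.org/org/1",
  "@type": "schema:Organization",
  "schema:name": "Example Corp"
}
"""

g = Graph()
g.parse(data=jsonld, format="json-ld")  # requires rdflib 6+ for built-in JSON-LD

# Each statement comes back as an unambiguous subject-predicate-object triple.
for subject, predicate, obj in g:
    print(subject, predicate, obj)
```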

The conclusion will thrill the search engine optimization folks who want to decide what is relevant to a user’s query; to wit:

A final note — The live demonstrations in this post demonstrate a fundamental fact: the addition of semantically-rich structured data islands to documents already being published on the Web is what modern SEO (Search Engine Optimization) is all about. Resistance is futile, so just get with the program — fast!

Be happy.

Stephen E Arnold, September 20, 2015

Mondeca Has a Sandbox

September 15, 2015

French semantic tech firm Mondeca has their own research arm, Mondeca Labs. Their website seems to be going for a playful, curiosity-fueled vibe. The intro states:

“Mondeca Labs is our sandbox: we try things out to illustrate the potential of Semantic Web technologies and get feedback from the Semantic Web community. Our credibility in the Semantic Web space is built on our contribution to international standards. Here we are always looking for new challenges.”

The page links to details on several interesting projects. One entry we noticed right away is for an inference engine; they say it is “coming soon,” but a mouse click reveals that no info is available past that hopeful declaration. The site does supply specifics about other projects; some notable examples include linked open vocabularies, a SKOS reader, and a temporal search engine. See their home page for more.
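For readers who have not bumped into SKOS, the data a “SKOS reader” consumes is just more triples. Here is a minimal, hypothetical concept scheme built in Python with rdflib’s bundled SKOS namespace; the concepts and URIs are invented and have nothing to do with Mondeca’s actual projects.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/concepts/")  # invented URIs for illustration

g = Graph()
# A tiny two-concept taxonomy: "semantic search" is narrower than "search".
g.add((EX.search, RDF.type, SKOS.Concept))
g.add((EX.search, SKOS.prefLabel, Literal("search", lang="en")))
g.add((EX.semanticSearch, RDF.type, SKOS.Concept))
g.add((EX.semanticSearch, SKOS.prefLabel, Literal("semantic search", lang="en")))
g.add((EX.semanticSearch, SKOS.broader, EX.search))

# rdflib 6+ returns a string here; older versions return bytes.
print(g.serialize(format="turtle"))
```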

Established in 1999, Mondeca has delivered pragmatic semantic solutions to clients in Europe and North America for over 15 years. The firm is based in Paris, France.

Cynthia Murrell, September 15, 2015


Smartlogic Chops at the Gordian Knot of the Semantic Web

September 5, 2015

This semantic Web thing just won’t take a nap. The cheerleaders for the Big Data and analytics revolutions are probably as annoyed as I am. Let’s face it. Semantic was a good buzzword years ago. The problem remains that anything to do with indexing, taxonomies, ontologies, and linguistics lacks sizzle.

If you want analytics, you definitely want predictive analytics. (I agree. Who wants those tired Statistics 101 methods when Kolmogorov-Arnold methods are available? Not me. That Arnold is one of my relatives. I am the dumb Arnold.)

If you want data, you want Big Data. The notion of having large volumes of zeros and ones to process in real time is more exciting than extracting a subset which meets requirements for validity and then doing historical analyses. The real time thing is where it is at.

I read “The Promise of the Semantic Web, Truth or Fiction?” hoping for an epiphany. Failing that high water mark of intellectual insight, I would have been satisfied with a fresh spin on an old idea. No joy.

I read:

Semantic technologies have the capacity to extract meaning from unstructured information found within an enterprise and make them available for processing. Our new Semaphore 4 platform combines the power of semantic technologies with our ontology management, auto classification, and semantic enhancement server to help organizations identify, classify and tag their content in order to use the intelligence within it to manage their business.
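Semaphore is proprietary, so I cannot show how it actually tags content. As a generic stand-in for the auto-classification idea, here is a hedged sketch using a plain scikit-learn Naive Bayes text classifier trained on a handful of invented snippets; real ontology-driven tagging is considerably more involved.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training snippets with hand-assigned category tags.
documents = [
    ("Quarterly revenue and operating margin improved.", "finance"),
    ("The court granted the motion to dismiss the complaint.", "legal"),
    ("The new API supports OAuth tokens and JSON payloads.", "technology"),
    ("Auditors flagged irregularities in the expense reports.", "finance"),
]
texts, labels = zip(*documents)

# A simple bag-of-words classifier standing in for "auto classification".
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(texts, labels)

print(classifier.predict(["The appeal was filed in federal court."]))
```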

The information strikes me as a bit of the old rah rah for a specific product. The system is proprietary. The licensee must perform some work to allow the “platform” to deliver optimum outputs.

What about the answer to the question of whether the promise is truth or fiction?

The answer is to license a proprietary product. I am okay with that, but when the title of the write up purports to tackle an issue of substance and deflects substantive analysis with a sales pitch, I realize that I am out of step with the modern methods.

Here’s my take on the question about the semantic Web.

Folks, the semantic Web thing is a reality. A number of outfits have been employing semantic methods for years. The semantics, however, are plumbing and out of sight. The companies pitching RDF, OWL, and other conventions are following a wave which built, formed, and crashed on the shore years ago.

At this time, next generation information access vendors incorporate linguistic and semantic methods in their plumbing. The particular pipe and joint are not elevated to be the solution. The subsystems and their components are well understood, readily available methods.

As a result, one gets semantics with systems from Diffeo, Recorded Future, and other innovators.

The danger with asking a tough question and then answering it with somewhat stale information is that someone may come along and say, “There are vendors who are advancing the state of the art with innovative solutions.”

That is the main reason that MIT and Google have funded these NGIA (next generation information access) outfits. Innovation is more than asking a question, not answering it, and delivering a sales pitch for a component. Not too useful to me, gentle reader.

I would suggest that the Gordian knot is in the mind of the semantic solution marketer, not the mind of a prospect with a real time content problem for which modern technology enables effective solutions.

Stephen E Arnold, September 5, 2015

The Four Vs Arrive for Semantic Search

September 1, 2015

When I first encountered the four Vs, I thought someone was recycling the mnemonic trick I was taught when I was a wee pre-retirement person.

I associate the four Vs with IBM and Vivisimo. The hook up is probably a consequence of my flawed thought processes. I have a slide in my files showing an illustration of Volume, Velocity, Variety, and Veracity artfully presented as a cartoon.

One of the goslings showed me this image, but I am not sure it is the diagram of which I speak. Here it is, and you can figure out how the four balls, the plugs, and the tough to read cyan type explain Big Data.

These have migrated to semantic search. Now that’s as good a home for these buzzwords as any of the suburban developments in Jargonville.


As applied to semantic search, the four Vs appear to guide the would-be enemies of relevance to write a lot, make lots of changes, post content in many forms, and provide “accurate” information.

Good advice.

I assume that somewhat revenue-thirsty search engine optimization experts will be raking in the dollars and euros explaining these concepts to their clients.

I am still baffled about the connection between IBM, Vivisimo, and Big Data. I will leave semantic search to the SEO mavens and the mid-tier consultants whom I associate with hard-to-read azure colors. I much prefer the hard-edged tones of the blue-chip folks.

Stephen E Arnold, September 1, 2015

Forbes Bitten by Sci-Fi Bug

September 1, 2015

The article titled “Semantic Technology: Building the HAL 9000 Computer” on Forbes runs with the gossip from this year’s Smart Data Conference. Namely, that semantic technology has finally landed. The article examines several leaders in the field, including Maana, Loop AI Labs, and Blazegraph. It mentions:

“Computers still can’t truly understand human language, but they can make sense out of certain aspects of textual content. For example, Lexalytics is able to perform sentiment analysis, entity extraction, and ambiguity resolution. Sentiment analysis can determine whether some text – a tweet, say – expresses a positive or negative opinion, and how strong that opinion is. Entity extraction identifies what a paragraph is actually talking about, while ambiguity resolution solves problems like the Paris Hilton one above.”

(The “Paris Hilton problem” referred to is distinguishing between the hotel and the person in semantic search.) In spite of the excitable tone of the article’s title, its conclusion is much more measured. HAL, the sentient computer from 2001: A Space Odyssey, remains in our imaginations. In spite of the exciting work being done, the article reminds us that even Watson, IBM’s supercomputer, is still without the “curiosity or reasoning skills of any two-year-old human.” For the more paranoid among us, this might be good news.
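The write up does not explain how Lexalytics resolves the ambiguity, so the following Python toy only illustrates the general idea: score the surrounding context against hand-picked cue words for each sense. Real ambiguity resolution uses far richer models; the cue words and sample sentences here are invented.

```python
# Toy word-sense disambiguation for the "Paris Hilton problem":
# is the text about the hotel or the person? Cue words are invented.
CUES = {
    "hotel": {"room", "suite", "booking", "stay", "lobby"},
    "person": {"celebrity", "reality", "interview", "album", "paparazzi"},
}

def disambiguate(text: str) -> str:
    """Return the sense whose cue words overlap most with the text."""
    words = set(text.lower().split())
    scores = {sense: len(words & cues) for sense, cues in CUES.items()}
    return max(scores, key=scores.get)

print(disambiguate("Paris Hilton booking rates for a deluxe suite and lobby bar"))
print(disambiguate("Paris Hilton gave an interview about her reality show"))
```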

Chelsea Kerwin, September 1, 2015

