
Quote to Note: Halevy after 10 Years Before the Ads

September 23, 2015

If you track innovations at the Alphabet Google thing, you will know that a number of wizards make the outfit hum. One of the big wizards is Dr. Alon Halevy. He is a database guru, has patents, and is now an essayist.

Navigate to “A Decade at Google.” The write up does not reference the ad model which makes research possible. Legal dust ups are sidestepped. The management approach and the reorganization are not part of the write up.

I did note an interesting passage, which I flagged as a quote to note:

It is common wisdom that you should not choose a project that a product team is likely to be embarking on in the short term (e.g., up to a year). By the time you’ll get any results, they will have done it already. They might not do it as well as or as elegantly as you can, but that won’t matter at that point.

I interpreted this to underscore the Alphabet Google thing’s “good enough” approach to its technology. If you have time, think about the confluence of Dr. Halevy’s research and Dr. Guha’s. The semantic search engine optimization crowd may have a field day.

Stephen E Arnold, September 23, 2015

The Semantic Web Has Arrived

September 20, 2015

Short honk: If you want evidence of the impact of the semantic Web, you will find “What Happened to the Semantic Web?” useful. The author captures 10 examples of the semantic Web in action. I highlighted this passage in the narrative accompanying the screenshots:

there is no question that the Web already has a population of HTML documents that include semantically-enriched islands of structured data. This new generation of documents creates a new Web dimension in which links are no longer seen solely as document addresses, but can function as unambiguous names for anything, while also enabling the construction of controlled natural language sentences for encoding and decoding information [data in context] — comprehensible by both humans and machines (bots).

Structured data will probably play a large part in the new walled gardens now under construction.

The conclusion will thrill the search engine optimization folks who want to decide what is relevant to a user’s query; to wit:

A final note — The live demonstrations in this post demonstrate a fundamental fact: the addition of semantically-rich structured data islands to documents already being published on the Web is what modern SEO (Search Engine Optimization) is all about. Resistance is futile, so just get with the program — fast!

Be happy.

Stephen E Arnold, September 20, 2015

Mondeca Has a Sandbox

September 15, 2015

French semantic tech firm Mondeca has its own research arm, Mondeca Labs. Its website seems to be going for a playful, curiosity-fueled vibe. The intro states:

“Mondeca Labs is our sandbox: we try things out to illustrate the potential of Semantic Web technologies and get feedback from the Semantic Web community. Our credibility in the Semantic Web space is built on our contribution to international standards. Here we are always looking for new challenges.”

The page links to details on several interesting projects. One entry we noticed right away is for an inference engine; the site says it is “coming soon,” but a mouse click reveals no information beyond that hopeful declaration. The site does supply specifics about other projects; notable examples include linked open vocabularies, a SKOS reader, and a temporal search engine. See the Mondeca Labs site for more.

Established in 1999, Mondeca has delivered pragmatic semantic solutions to clients in Europe and North America for over 15 years. The firm is based in Paris, France.

Cynthia Murrell, September 15, 2015

Sponsored by, publisher of the CyberOSINT monograph

Smartlogic Chops at the Gordian Knot of the Semantic Web

September 5, 2015

This semantic Web thing just won’t take a nap. The cheerleaders for the Big Data and analytics revolutions are probably as annoyed as I am. Let’s face it. Semantic was a good buzzword years ago. The problem remains that anything to do with indexing, taxonomies, ontologies, and linguistics lacks sizzle.

If you want analytics, you definitely want predictive analytics. (I agree. Who wants those tired Statistics 101 methods when Kolmogorov-Arnold methods are available? Not me; that’s one of my relatives. I am the dumb Arnold.)

If you want data, you want Big Data. The notion of having large volumes of zeros and ones to process in real time is more exciting than extracting a subset which meets requirements for validity and then doing historical analyses. The real time thing is where it is at.

I read “The Promise of the Semantic Web, Truth or Fiction?” hoping for an epiphany. Failing that high water mark of intellectual insight, I would have been satisfied with a fresh spin on an old idea. No joy.

I read:

Semantic technologies have the capacity to extract meaning from unstructured information found within an enterprise and make them available for processing. Our new Semaphore 4 platform combines the power of semantic technologies with our ontology management, auto classification, and semantic enhancement server to help organizations identify, classify and tag their content in order to use the intelligence within it to manage their business.

The information strikes me as a bit of the old rah rah for a specific product. The system is proprietary. The licensee must perform some work to allow the “platform” to deliver optimum outputs.

What about the answer to the question: is the promise truth or fiction?

The answer is to license a proprietary product. I am okay with that, but when the title of the write up purports to tackle an issue of substance and deflects substantive analysis with a sales pitch, I realize that I am out of step with the modern methods.

Here’s my take on the question about the semantic Web.

Folks, the semantic Web thing is a reality. A number of outfits have been employing semantic methods for years. The semantics, however, are plumbing and out of sight. The companies pitching RDF, OWL, and other conventions are following a wave which built, formed, and crashed on the shore years ago.

At this time, next generation information access vendors incorporate linguistic and semantic methods in their plumbing. The particular pipe and joint are not elevated to be the solution. The subsystems and their components are well understood, readily available methods.

As a result, one gets semantics with systems from Diffeo, Recorded Future, and other innovators.

The danger with asking a tough question and then answering it with somewhat stale information is that someone may come along and say, “There are vendors who are advancing the state of the art with innovative solutions.”

That is the main reason that MIT and Google have funded these NGIA (next generation information access) outfits. Innovation is more than asking a question, not answering it, and delivering a sales pitch for a component. Not too useful to me, gentle reader.

I would suggest that the Gordian knot is in the mind of the semantic solution marketer, not the mind of a prospect with a real time content problem for which modern technology enables effective solutions.

Stephen E Arnold, September 5, 2015

The Four Vs Arrive for Semantic Search

September 1, 2015

When I first encountered the four Vs, I thought someone was recycling the mnemonic trick I was taught when I was a wee pre-retirement person.

I associate the four Vs with IBM and Vivisimo. The hook up is probably a consequence of my flawed thought processes. I have a slide in my files showing an illustration of Volume, Velocity, Variety, and Veracity artfully presented as a cartoon.

One of the goslings showed me this image, but I am not sure it is the diagram of which I speak. Here it is, and you can figure out how the four balls, the plugs, and the tough-to-read cyan type explain Big Data.

These have migrated to semantic search. Now that’s as good a home for these buzzwords as any of the suburban developments in Jargonville.


As applied to semantic search, the four Vs appear to guide the would-be enemies of relevance to write a lot, make lots of changes, post content in many forms, and provide “accurate” information.

Good advice.

I assume that somewhat revenue-thirsty search engine optimization experts will be raking in the dollars and euros explaining these concepts to their clients.

I am still baffled about the connection between IBM, Vivisimo, and Big Data. I will leave semantic search to the SEO mavens and the mid-tier consultants whom I associate with hard-to-read azure colors. I much prefer the hard-edged tones of the blue chip folks.

Stephen E Arnold, September 1, 2015

Forbes Bitten by Sci-Fi Bug

September 1, 2015

The Forbes article titled “Semantic Technology: Building the HAL 9000 Computer” runs with the gossip from this year’s Smart Data Conference, namely that semantic technology has finally landed. The article examines several leaders of the field, including Maana, Loop AI Labs, and Blazegraph. The article states:

“Computers still can’t truly understand human language, but they can make sense out of certain aspects of textual content. For example, Lexalytics is able to perform sentiment analysis, entity extraction, and ambiguity resolution. Sentiment analysis can determine whether some text – a tweet, say, expresses a positive or negative opinion, and how strong that opinion is. Entity extraction identifies what a paragraph is actually talking about, while ambiguity resolution solves problems like the Paris Hilton one above.”

(The “Paris Hilton problem” refers to distinguishing between the hotel and the person in semantic search.) Despite the excitable tone of the article’s title, its conclusion is much more measured. HAL, the sentient computer from 2001: A Space Odyssey, remains a creature of our imaginations. For all the exciting work being done, the article reminds us that even Watson, IBM’s supercomputer, still lacks the “curiosity or reasoning skills of any two-year-old human.” For the more paranoid among us, this might be good news.

Chelsea Kerwin, September 1, 2015


Jargon Overload: MoSCoW

August 28, 2015

Vladimir Putin is probably confused. My hunch is that when he hears “Moscow” uttered, he thinks about a lovely city, its courteous drivers, its delightful social groupings with idiosyncratic tattoos, and outstanding Moskva-style borshch.

Gentle reader, Mr. Putin would be off base.

MoSCoW, according to “Bats, Dolphins, and Semantic Search,” means Must, Should, Could, and Would. These parental verb structures are applied to search engine optimization.


Please, take out the garbage and straighten your room, kiddies. MoSCoW now.

No, I don’t understand this, but you may want to check out the presentation. You may need to register for LinkedIn/Slideshare. I am never sure what I do to access the knowledge jewelry on this site. Here’s the link to try.

I am not into the parental thing. Click if you want. If not, no biggie.

Stephen E Arnold, August 28, 2015

Yahoo: Semantic Search Is the Future

August 16, 2015

I love it when Yahoo explains the future of search. The Xoogler has done the revisionism thing and shifted from Yahoo as a directory built by silly humanoids to a leader in search. Please, do not remember that Yahoo bought Inktomi in 2002 and then rolled out a wild and crazy search system in cahoots with IBM in 2006. (By the way, that search solution brought my IBM multi-CPU, DASD-equipped, RAM-stuffed server to its knees. At least the “free” software installed.)


Now to business: I read “The Future of Search Relies on Semantic Technologies.” For me, semantic technologies have been part of search for many years. But never mind reality. Let’s get to the Reddi-wip in the Yahoo confection.

Yahoo asserts:

Search companies are thus investing in information extraction and data fusion, as well as more and more advanced question-answering capabilities on top of the collected information. The need for these technologies is only increasing with mobile search, where providing results as ten blue links leads to a very poor user experience.

I would point out that as lousy as blue links are, these links produce about $60 billion a year for the Alphabet Google thing and enough zeros for the Microsoft wizards to hang on to the company’s online advertising business even as Microsoft loses enthusiasm for other aspects of the Bing thing.

Yahoo adds:

We are a consumer internet company, so for us there is little difference between our internal and external representations.

My comment is a simple question, “What the heck is Yahoo saying?”

I also highlighted this semantic gem:

At Yahoo Labs, we work in advancing the sciences that underlie these approaches, i.e. Natural Language Processing, Information Retrieval and the Semantic Web.

I like the notion of Yahoo advancing science. I wonder if these advances will lead to advances in top line revenue, stabilizing management, and producing search results that are sort of related to the query.


Semantic Search Word Play: Nail and Hospital Edition

August 15, 2015

I find the semantic search hoo-hah fascinating. Not long ago, I reported that Yebol, which some semantic wizard was promoting, bit the dust in 2010. No matter. The semantic search boomlet continues to echo. I don’t hear it in my neck of the woods, but apparently some folks are tuned to this semantic razzle dazzle.

The write up which caught my attention this morning is “Semantic Search: Is It Time for a Think Building Campaign?”

My answer is, “No.”

Let’s look at the argument because I am often wrong, off base, and addled. What do you expect from a goose living in rural Kentucky, where 300 baud is a speedy network connection?

The write up points out that a traffic-hungry person could sign up for directories. Anyone remember those? Yahoo was one, but the Xoogler has anchored Yahoo’s revisionist history in “search.” I don’t expect much in the history department. Sorry.

The article leaps to this point:

The biggest reason to re-evaluate the power of moderated local directories, and related resources, has to do with Google’s shift towards semantic search. If you aren’t aware of this transition, the premise is simple: instead of simply matching keywords to pages that exist in the search engine’s databases, Google’s engineers are trying to get better about understanding the context of the search, and the intent of the searcher.

Sounds great, right? The problem is that Google engineers (not the Alphabet crowd) are trying to find ways to pump up the advertising revenue. I am not sure “semantics” is going to help as much as other types of content processing activities.

Nevertheless, the write up then makes this interesting statement:

This sounds technical but it’s conceptually straightforward. Imagine for a moment that I pick up my iPhone and tell Google’s app that I’ve “driven a nail through my leg”. Matching that exact search phrase isn’t important to me in interpreting results – what matters is that I want “hospital” instead of a “hardware store.” That’s the essence of semantic search.

Now, hold one’s mules, please. A person who pounds a nail through a leg may not need a hospital. If the nail misses the femur and threads around (not through) the popliteal, posterior tibial, anterior tibial, peroneal, plantar, and dorsalis pedis arteries, one might pull out the nail.

Here in Kentucky, the person who performs this act of self-mutilation or willful or unintentional abuse might want a link to this health care facility:


Disagree? That’s what makes horse races.

The write up points out that one can purchase “reputation.” The article points to Whitespark and Moz Local.

The conclusion to the write up certainly is upbeat:

Taking advantage of citations and directories can still help you improve your findability – on search engines and elsewhere in the real world – but only if you’re focused on providing valuable information for potential customers, instead of trying to beat those ever-changing algorithms. In many ways semantic search takes us back to the golden days of the Web, when in terms of working online anything was possible as long as you had passion, belief in yourself, and energy to work at it.

Yep, the golden days. The issue I have with the write up is that using semantic search as a way to distort Google’s already flakey relevance algorithms is an example of SEO adaptation. The carnival has arrived. The SEO snake oil salesperson will cure your site’s pancreatic cancer and maybe help a customer avoid pounding nails into his or her body parts.

Stephen E Arnold, August 15, 2015

Oracle: The Ostrich Syndrome

August 14, 2015

I read “Oracle’s Chief Security Officer Mary Ann Davidson Just Made a Rookie Mistake.” No, it has nothing to do with trying to breathe life into Oracle Secure Enterprise Search or increasing the content processing speed of Endeca. Those might be really difficult tasks.

According to the write up:

Oracle Chief Security Officer Mary Ann Davidson was forced to remove a blog post after she made a mistake that made her sound out of touch with the security space. In her online post, she claimed that security researchers who point out flaws in Oracle software may be in violation of the company’s license agreement. She said reverse engineering is not allowed under the company’s own TOS.

Quite a good idea if one is struggling with the Java thing, open source database annoyances, and pushback about certain licensing policies and fees.

I read this and thought of the creature which buries its head in the sand.

To make the issue more interesting, Oracle removed the post which allegedly said:

“If we determine as part of our analysis that scan results could only have come from reverse engineering, we send a letter to the sinning customer, and a different letter to the sinning consultant-acting-on-customer’s behalf – reminding them of the terms of the Oracle license agreement that preclude reverse engineering, So Please Stop It Already”

I love the “already.” There is a robust market sector which identifies and provides information about vulnerabilities to those who are not into the ostrich approach to information.

Isn’t this disappearing, revisionist information trend fascinating? What you do not know cannot possibly harm you. Ignorance is bliss. Be happy.

Stephen E Arnold, August 14, 2015
