Curious about Semantic Search the SEO Way?

November 12, 2019

DarkCyber is frequently curious about search: Semantic, enterprise, meta, multi-lingual, Boolean, and the laundry list of buzzwords marshaled to allow a person to find an answer.

If you want to get a Zithromax Z-PAK of semantic search talk, navigate to “Semantic Search Guide.” One has to look closely at the URL to discern that this “objective” write up is about search engine optimization or SEO. DarkCyber affectionately describes SEO as the “relevance” killer, but that’s just our old-fashioned self refusing to adapt to the whizzy new world.

The link will point to a page with a number of links. These include:

  • Target audience and contributions
  • The knowledge graph explained
  • The evolution of search
  • Using Google’s entity search tool
  • Getting a Wikipedia listing

DarkCyber took a look at the “Evolution of Search” segment. We found it quirky but interesting. For example, we noted this passage:

Now we turn to the heart of full-text search. SEOs tend to dwell on the indexing part of search or the retrieval part of the search, called the Search Engine Results Pages (SERPs, for short). I believe they do this because they can see these parts of the search. They can tell if their pages have been crawled, or if they appear. What they tend to ignore is the black box in the middle. The part where a search engine takes all those gazillion words and puts them in an index in a way that allows for instant retrieval. At the same time, they are able to blend text results with videos, images and other types of data in a process known as “Universal Search”. This is the heart of the matter and whilst this book will not attempt to cover all of this complex subject, we will go into a number of the algorithms that search engines use. I hope these explanations of sometimes complex, but mostly iterative algorithms appeal to the marketer inside you and do not challenge your maths skills too much. If you would like to take these ideas in in video form, I highly recommend a video by Peter Norvig from Google in 2011:
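The “black box in the middle” that takes a pile of words and puts them in an index allowing instant retrieval is, at its core, an inverted index. A minimal Python sketch (a toy illustration, not how Google actually builds its indexes):

```python
from collections import defaultdict

# Toy corpus: document id -> text
docs = {
    1: "semantic search ranks pages by meaning",
    2: "keyword search ranks pages by term frequency",
    3: "semantic search uses a knowledge graph",
}

# Build the inverted index: each word maps to the set of documents containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

def search(*terms):
    """Return ids of documents containing every query term (Boolean AND)."""
    sets = [index.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

print(sorted(search("semantic", "search")))  # -> [1, 3]
```

Retrieval is a set intersection rather than a scan of every document, which is why lookups stay fast no matter how many “gazillion words” are indexed.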

Oh, well. This is one way to look at universal search. But Google maintains silos of indexes. After 20-plus years, the system does not federate results across them. Semantic search? Yeah, right. Search each index, scan results, cut and paste, and then try to figure out the dates and times. Semantic search does not handle time particularly well.

Important. Not to the SEO. Search babble may be more compelling.

If this approach is your cup of tea, inLinks has the hot water you need to understand why finding information is not what it seems.

Stephen E Arnold, November 12, 2019

Knowledge Graphs: Getting Hot

July 4, 2019

Artificial intelligence, semantics, and machine learning may lose their pride of place in the techno-jargon whiz bang marketing world. I read “A Common Sense View of Knowledge Graphs,” and noted this graph:


This is a good, old-fashioned, Eugene Garfield (remember him, gentle reader?) citation analysis. The idea is that one can “see” how frequently an author, or in this case a concept, has been cited in the “literature.” Publishers may be dropping like flies and publishing bunk; nevertheless, one can see that the phrase “knowledge graph” is gaining popularity within the sample of “literature” parsed for this graph. (No, I don’t recommend trying to perform citation analysis in Bing, Facebook, or Google. The reasons would just depress me and you, gentle reader.)

The section of the write up I found useful and worthy of my “research” file is the collection of references to documents defining “knowledge graph.” This is helpful research.

The write up also includes a diagram which may be one of the first representations of a graph-centric triple. I thought this was something cooked up by Drs. Bray, Guha, and others in the tsunami of semantic excitement.

One final point: The list of endnotes is also useful. In short, a good write up. The downside is that if the article gets wider distribution, a feeding frenzy among money-desperate consultants, advisers, and analysts will be ignited like a Fourth of July fountain of flame.

Stephen E Arnold, July 4, 2019

Google: SEO Like a True Google Human Actor

April 18, 2019

We know Google’s search algorithm comprehends text, at least enough to produce relevant search results (though, alas, apparently not enough to detect improper comments in kiddie videos on YouTube). The mechanisms, though, remain murky. Yoast ponders, “How Does Google Understand Text?” Writer Jesse van de Hulsbeek observes Google keeps the particulars close to the vest, but points to some clues, like patents Google has filed. “Word embeddings,” or assessing closely related words, and related entities are two examples. Writing for his SEO audience, van de Hulsbeek advises:

If Google understands context in some way or another, it’s likely to assess and judge context as well. The better your copy matches Google’s notion of the context, the better its chances. So thin copy with limited scope is going to be at a disadvantage. You’ll need to cover your topics exhaustively. And on a larger scale, covering related concepts and presenting a full body of work on your site will reinforce your authority on the topic you specialize in.

We also noted:

Easier texts which clearly reflect relationships between concepts don’t just benefit your readers, they help Google as well. Difficult, inconsistent and poorly structured writing is more difficult to understand for both humans and machines. You can help the search engine understand your texts by focusing on: Good readability (that is to say, making your text as easy-to-read as possible without compromising your message)…Good structure (that is to say, adding clear subheadings and transitions)…Good context (that is to say, adding clear explanations that show how what you’re saying relates to what is already known about a topic).

The article does point out that including key phrases is still important. Google is trying to read more like a human, we’re reminded, so text that is good for humans is good for the SEO ranking. Relevance? Not so much.
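The “word embeddings” idea van de Hulsbeek points to can be illustrated with toy vectors. The numbers below are invented for illustration; real embeddings have hundreds of dimensions and come from models trained on large corpora:

```python
import math

# Toy three-dimensional "embeddings" -- values invented for illustration.
embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 means same direction, near 0.0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Closely related words sit near each other in the vector space.
print(cosine(embeddings["king"], embeddings["queen"]) >
      cosine(embeddings["king"], embeddings["apple"]))  # True
```

A system with vectors like these can judge whether a page’s vocabulary sits in the same neighborhood as a query, which is one way “context” gets assessed without exact keyword matches.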

Cynthia Murrell, April 18, 2019

Elsevier: Raising Prices Easier Than Implementing Security?

March 19, 2019

Elsevier is a professional publishing company. The firm has a reputation for raising prices for its peer reviewed journals and online services. The challenge is that many subscribers are libraries and libraries are not rolling in cash. Raising prices is easy. One calls a meeting, examines models of money in vs subscribers out, and emails the price hike. No problem.

Security, however, works a bit differently. Elsevier may have learned this if the information in “Education and Science Giant Elsevier Left Users’ Passwords Exposed Online” is accurate. The write up asserts:

Elsevier, the company behind scientific journals such as The Lancet, left a server open to the public internet, exposing user email addresses and passwords. The impacted users include people from universities and educational institutions from across the world.

The article reports that Elsevier fixed the problem. The password security issue, not the burden on libraries.

Stephen E Arnold, March 19, 2019

Elastic Teams With Startup for Semantic Search

August 10, 2018

We’ve learned that a search company we’ve been following with some interest, Elastic, is pairing with a Palo Alto-based startup to develop and integrate semantic search tools. Computer Weekly shares some details in “Elastic Puts ‘Semantic Code Search’ Into Stack With Insight.io.” Writer Adrian Bridgwater tells us:

“Known for its Elasticsearch and Elastic Stack products, Elastic insists that Insight.io’s technology is ‘highly complementary’ to other Elastic use cases and solutions—indeed, Insight.io is built on the Elastic Stack. Insight.io provides an interface to search and navigate the source code that is said to ‘go beyond’ simple free text search. Current programming language support includes C/C++, Java, Scala, Ruby, Python, and PHP. This ‘beyond text search’ function gives developers the ability to search for code pertaining to specific application functionality and dependencies. Essentially it provides IDE-like code intelligence features such as cross-reference, class hierarchy and semantic understanding. The impact of such functionality should stretch beyond exploratory question-and-answer utility, for example, enabling more efficient onboarding for new team members and reducing duplication of work for existing teams as they scale.”

According to Elastic’s CEO, integration of the technology will be familiar to anyone who observed how the company handled past acquisitions, like Opbeat and Prelert. We’re also assured that all of Insight.io’s workers are being welcomed into Elastic’s development fold. Bridgwater notes that, with the startup’s Beijing-based engineering team, Elastic now has its first “formal” dev team located in China. Founded in 2012, Elastic is now based in Mountain View, California.

Cynthia Murrell, August 10, 2018

Mondeca: Another Semantic Search Option

April 9, 2018

Mondeca, based in France, has long been focused on indexing and taxonomy. Now they offer a search platform named, simply enough, Semantic Search. Here’s their description:

“Semantic search systems consider various points including context of search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching and natural language queries to provide relevant search results. Augment your SolR or ElasticSearch capabilities; understand the intent, contextualize search results; search using business terms instead of keywords.”

A few details from the product page caught my eye. Let’s begin with the Search functionality; the page succinctly describes:

“Navigational search – quickly locate specific content or resource. Informational search – learn more about a specific subject. Compound term processing, concept search, fuzzy search, simple but smart search, controlled terms, full text or metadata, relevancy scoring. Takes care of language, spelling, accents, case. Boolean expressions, auto complete, suggestions. Disambiguated queries, suggests alternatives to the original query. Relevance feedback: modify the original query with additional terms. Contextualize by user profile, location, search activity and more.”

The software includes a GUI for visualizing the semantic data, and features word-processing tools like auto complete and a thesaurus. Results are annotated, with key terms highlighted, and filters provide significant refinement, complete with suggestions. Results can also be clustered by either statistics or semantic tags. A personalized dashboard and several options for sharing and publishing round out my list. See the product page for more details.
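Several items on Mondeca’s list, fuzzy search, spelling tolerance, accent handling, and suggestions, boil down to approximate string matching. A rough sketch of the idea using Python’s standard library (an illustration of the technique, not Mondeca’s actual implementation):

```python
import difflib
import unicodedata

controlled_terms = ["taxonomy", "ontology", "thesaurus", "semantic search"]

def normalize(text):
    """Fold case and strip accents, as a search engine might before matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

def suggest(query, terms, cutoff=0.6):
    """Return controlled terms that approximately match a misspelled query."""
    normalized = {normalize(t): t for t in terms}
    hits = difflib.get_close_matches(normalize(query), normalized, cutoff=cutoff)
    return [normalized[h] for h in hits]

print(suggest("Thésaurs", controlled_terms))  # -> ['thesaurus']
```

Normalizing first is what lets the matcher “take care of language, spelling, accents, case” before any similarity scoring happens.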

Established in 1999, Mondeca delivers pragmatic semantic solutions to clients in Europe and North America, and is proud to have developed their own, successful semantic methodology. The firm is based in Paris. Perhaps the next time our beloved leader, Stephen E Arnold, visits Paris, the company will make time to speak with him. Previous attempts to set up a meeting were for naught. Ah, France.

Cynthia Murrell, April 9, 2018

IBM Socrates Wins 2017 Semantic Web Challenge

January 10, 2018

We learn from the press release “Elsevier Announces the Winner of the 2017 Semantic Web Challenge,” posted at PRNewswire, that IBM has taken the top prize in the 2017 Semantic Web Challenge world cup with its AI project, Socrates. The outfit sponsoring the competition is the number one sci-tech publisher, Elsevier. We assume IBM will be happy with another Jeopardy-type win.

Knowledge graphs were the focus of this year’s challenge, and a baseline representing current progress in the field was established. The judges found that Socrates skillfully wielded natural language processing and deep learning to find and check information across multiple web sources. About this particular challenge, the write-up specifies:

This year, the SWC adjusted the annual format in order to measure and evaluate targeted and sustainable progress in this field. In 2017, competing teams were asked to perform two important knowledge engineering tasks on the web: fact extraction (knowledge graph population) [and] fact checking (knowledge graph validation). Teams were free to use any arbitrary web sources as input, and an open set of training data was provided for them to learn from. A closed dataset of facts, unknown to the teams, served as the ground truth to benchmark how well they did. The evaluation and benchmarking platform for the 2017 SWC is based on the GERBIL framework and powered by the HOBBIT project. Teams were measured on a very clear definition of precision and recall, and their performance on both tasks was tracked on a leader board. All data and systems were shared according to the FAIR principles (Findable, Accessible, Interoperable, Reusable).
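The “very clear definition of precision and recall” used to score the fact-extraction task can be sketched with toy facts (the triples below are invented for illustration; the real challenge used a closed ground-truth dataset):

```python
# Toy ground-truth facts versus facts a system extracted (invented examples).
ground_truth = {("IBM", "headquartered_in", "Armonk"),
                ("Elsevier", "publishes", "The Lancet"),
                ("Socrates", "built_by", "IBM")}
extracted = {("IBM", "headquartered_in", "Armonk"),
             ("Socrates", "built_by", "IBM"),
             ("Elsevier", "publishes", "Nature")}  # one wrong fact

true_positives = len(ground_truth & extracted)
precision = true_positives / len(extracted)      # share of extracted facts that are right
recall = true_positives / len(ground_truth)      # share of true facts that were found
f1 = 2 * precision * recall / (precision + recall)

print(round(precision, 3), round(recall, 3), round(f1, 3))
```

A leader board built on these two numbers rewards systems that are both accurate (precision) and thorough (recall), which is presumably why the organizers chose them.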

The Semantic Web Challenge has been going on since 2003, organized in cooperation with the Semantic Web Science Association.

Cynthia Murrell, January 10, 2018

Neural Network Revamps Search for Research

December 7, 2017

Research is a pain, especially when you have to slog through millions of results to find the specific, accurate ones.  It takes time and a lot of reading, but neural networks could cut down on the investigation phase.  The Economist wrote about how AI will benefit research in “A Better Way To Search Through Scientific Papers.”

The Allen Institute for Artificial Intelligence developed Semantic Scholar to aid scientific research.  Semantic Scholar’s purpose is to discover the scientific papers most relevant to a particular problem.  How does Semantic Scholar work?

Instead of relying on citations in other papers, or the frequency of recurring phrases to rank the relevance of papers, as it once did and rivals such as Google Scholar still do, the new version of Semantic Scholar applies AI to try to understand the context of those phrases, and thus achieve better results.

Semantic Scholar relies on a neural network, a system loosely modeled on biological neural networks that learns by trial and error.  To build it, the Allen Institute team annotated ten and sixty-seven abstracts by hand.  From this sample, they found 7,000 medical terms, of which 2,000 could be paired.  The information was fed into Semantic Scholar’s neural network, which then found more relationships in the data.  Through trial and error, the network learns additional patterns.

The Allen Institute added 26 million biomedical research papers to the 12 million already in the database.  The plan is to make scientific and medical research more readily available not only to professionals but also to regular people.

Whitney Grace, December 7, 2017

Semantic Scholar Expanding with Biomedical Lit

November 29, 2017

Academic publishing is the black hole of the publishing world.  While it is a prestigious honor to have your work published by a scholarly press or journal, it will not have a high circulation.  One reason is that academic material is locked behind expensive paywalls; another is that papers are not indexed well.  TechCrunch has some good news for researchers: “Allen Institute For AI’s Semantic Scholar Adds Biomedical Papers To Its AI-Sorted Corpus.”

The Allen Institute for AI started Semantic Scholar as an effort to index scientific literature with NLP and other AI algorithms.  Semantic Scholar will now include biomedical texts in its index.  There is far too much content available for individuals to read and catalog by hand.  AI helps catalog papers and generate keywords by scanning the full text, pulling out key themes, and filing each paper under the right topic.
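The keyword pulling described above can be approximated, crudely, with frequency counting; Semantic Scholar uses far more sophisticated NLP, but the shape of the task looks like this sketch (the sample abstract is invented for illustration):

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "on", "to", "is",
             "for", "with", "by", "was"}

def key_terms(text, n=3):
    """Crude keyword extraction: most frequent non-stopword tokens."""
    words = [w.strip(".,").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [term for term, _ in counts.most_common(n)]

abstract = ("Norepinephrine production is altered by the drug byproduct. "
            "The byproduct effect on norepinephrine was a secondary finding.")
print(key_terms(abstract))  # 'norepinephrine' and 'byproduct' rank first
```

Even this toy version surfaces the themes a human indexer would flag; the real system layers entity recognition and context on top of this basic counting.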

There’s so much literature being published now, and it stretches back so far, that it’s practically impossible for a single researcher or even a team to adequately review it. What if a paper from six years ago happened to note a slight effect of a drug byproduct on norepinephrine production, but it wasn’t a main finding, or was in a journal from a different discipline?

Scientific studies are being called into question, especially when the tests are funded by corporate entities.  It is important to separate truth from false information as we consume more and more each day.  Tools like Semantic Scholar are key to uncovering the truth.  It is too bad it does not receive more attention.

Whitney Grace, November 29, 2017


Veteran Web Researcher Speaks on Bias and Misinformation

October 10, 2017

The CTO of semantic search firm Ntent, Dr. Ricardo Baeza-Yates, has been studying the Web since its inception. In their post, “Fake News and the Power of Algorithms: Dr. Ricardo Baeza-Yates Weighs In With Futurezone at the Vienna Gödel Lecture,” Ntent shares his take on biases online by reproducing an interview Baeza-Yates gave Futurezone at the Vienna Gödel Lecture 2017, where he was the featured speaker. When asked about the consequences of false information spread far and wide, the esteemed CTO cited two pivotal events from 2016, Brexit and the US presidential election.

These were manipulated by social media. I do not mean by hackers – which cannot be excluded – but by social biases. The politicians and the media are in the game together. For example, a non-Muslim attack may be less likely to make the front page or earn high viewing ratings. How can we minimize the amount of biased information that appears? It is a problem that affects us all.

One might try to make sure people get a more balanced presentation of information. Currently, it’s often the media and politicians that cry out loudest for truth. But could there be truth in this context at all? Truth should be the basis but there is usually more than one definition of truth. If 80 percent of people see yellow as blue, should we change the term? When it comes to media and politics the majority can create facts. Hence, humans are sometimes like lemmings. Universal values could be a possible common basis, but they are increasingly under pressure from politics, as Theresa May recently stated in her attempt to change the Magna Carta in the name of security. As history already tells us, politicians can be dangerous.

Indeed. The biases that concern Baeza-Yates go beyond those that spread fake news, though. He begins by describing presentation bias—the fact that one’s choices are limited to that which suppliers have, for their own reasons, made available. Online, “filter bubbles” compound this issue. Of course, Web search engines magnify any biases—their top results provide journalists with research fodder, the perceived relevance of which is compounded when that journalist’s work is published; results that appear later in the list get ignored, which pushes them yet further from common consideration.

Ntent is working on ways to bring folks with different viewpoints together on topics on which they do agree; Baeza-Yates admits the approach has its limitations, especially on the big issues. What we really need, he asserts, is journalism that is bias-neutral instead of polarized. How we get there from here, even Baeza-Yates can only speculate.

Cynthia Murrell, October 10, 2017
