Mondeca: Tweaking Its Market Position

February 22, 2017

One of the Beyond Search goslings noticed a repositioning of the taxonomy capabilities of Mondeca. Instead of pitching indexing, the company has embraced Elasticsearch (based on Lucene) and Solr. The pitch is that if an organization is using either of these systems for search and retrieval, Mondeca can provide “augmented” indexing. Keywords are not enough, so Mondeca indexes the content using concepts.
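To make the keyword versus concept distinction concrete, here is a minimal Python sketch. It is not Mondeca's method. The tiny taxonomy, the `tag_concepts` and `augment` functions, and the `concepts` field name are all hypothetical, made up for illustration. The only point is that a document handed to a Lucene-based index can carry concept labels alongside its raw text.

```python
import re

# Hypothetical taxonomy: each concept lists surface terms that signal it.
TAXONOMY = {
    "Enterprise Search": ["elasticsearch", "solr", "lucene"],
    "Knowledge Organization": ["taxonomy", "thesaurus", "ontology"],
}


def tag_concepts(text, taxonomy=TAXONOMY):
    """Return the sorted concepts whose surface terms occur in the text."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(
        concept
        for concept, terms in taxonomy.items()
        if tokens.intersection(terms)
    )


def augment(doc):
    """Copy a document and add a 'concepts' field next to its raw text."""
    enriched = dict(doc)
    enriched["concepts"] = tag_concepts(doc["body"])
    return enriched


doc = {"id": 1, "body": "We moved our catalog from Solr to a managed taxonomy."}
enriched = augment(doc)
print(enriched["concepts"])
```

A query against the `concepts` field for "Enterprise Search" would then match this document even though that phrase never appears in the body.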

Of course, the approach is semantic, permits exploration, and enables content discovery. Mondeca’s Web site describes search as “find” and explains:

Initial results are refined, annotated and easy to explore. Sorted by relevancy, important terms are highlighted: easy to decide which one are relevant. Sophisticated facet based filters. Refining results set: more like this, this one, statistical and semantic methods, more like these: graph based activation ranking. Suggestions to help refine results set: new queries based on inferred or combined tags. Related searches and queries.

This is a similar marketing move to the one that Intrafind, a German search vendor, implemented several years ago. Mondeca continues to offer its taxonomy management system. Human subject matter experts do have a role in the world of indexing. Like other taxonomy systems and services vendors, the hook is that content indexed with concepts is smart. I love it when indexing makes content intelligent.

The buzzword is used by outfits ranging from MarkLogic’s merry band of XML and XQuery professionals to the library-centric outfits like Smartlogic. Isn’t smart logic better than logic?

Stephen E Arnold, February 22, 2017

The Pros and Cons of Human Developed Rules for Indexing Metadata

February 15, 2017

The article on Smartlogic titled The Future Is Happening Now puts forth the Semaphore platform as the technology filling the gap between NLP and AI when it comes to conversation. The article posits that in spite of the great strides in AI in the past 20 years, human speech is one area where AI still falls short. The article explains,

The reason for this, according to the article, is that “words often have meaning based on context and the appearance of the letters and words.” It’s not enough to be able to identify a concept represented by a bunch of letters strung together. There are many rules that need to be put in place that affect the meaning of the word; from its placement in a sentence, to grammar and to the words around – all of these things are important.

Advocating human developed rules for indexing is certainly interesting, and the author compares this logic to the process of raising her children to be multi-lingual. Semaphore is a model-driven, rules-based platform that allows us to auto-generate usage rules in order to expand the guidelines for a machine as it learns. The issue here is cost. Indexing large amounts of data is extremely cost-prohibitive, and that is before the maintenance of the rules even becomes part of the equation. In sum, this is a very old school approach to AI that may make many people uncomfortable.

Chelsea Kerwin, February 15, 2017

Semantics: Biting the Semantic Apple in the Garden of Search Subsystems

February 8, 2017

I love the phoenix-like behavior of search and content processing subsystems. Consider semantics, or figuring out what something is about and assigning an index term to that aboutness. Semantics is not new, and it is not an end in itself. Semantic functions are one of the many Lego blocks which make up a functioning and hopefully semi-accurate content processing and information accessing system.

I read “With Better Scaling, Semantic Technology Knocks on Enterprise’s Door.” The headline encapsulates decades of frustration for the champions of semantic solutions. The early bird vendors fouled the nest for later arrivals. As a result, nifty semantic technology makes a sales call and finds that those who bother to attend the presentation are [a] skeptical, [b] indifferent, [c] clueless, [d] unwilling to spend money for another career killer. Pick your answer.

For decades, yes, decades, enterprise search and content processing vendors have said whatever was necessary to close a deal. The operative concept was that the vendor could whip up a solution and everything would come up roses. Well, fly that idea by those who licensed Convera for video search, Fast Search for an intelligent system, or any of the other train wrecks that lie along the information railroad tracks.

This write up happily ignores the past and bets that “better” technology will make semantic functions accurate, easy, low cost, and just plain wonderful. Yep, the Garden of Semantics exists as long as the licensee has the knowledge, money, time, and personnel to deliver the farm fresh produce.

I noted this passage:

… semantics standards came out 15 or more years ago, but scalability has been an inhibitor. Now, the graph technology has taken off. Most of what people have been looking at it for is [online transactional processing]. Our focus has been on [online analytical processing] — using graph technology for analytics. What held graph technology back from doing analytics was the scaling problem. There was promise and hype over those years, but, at every turn, the scale just wasn’t there. You could see amazing things in miniature, but enterprises couldn’t see them at scale. In effect, we have taken our query technology and applied MPP technology to it. Now, we are seeing tremendous scales of data.

Yep, how much does it cost to shove Big Data through a computationally intensive semantic system? Ask a company licensing one of the industrial strength systems like Gotham or NetReveal.

Make sure you have a checkbook with a SPARQL enhanced cover and a matching pen with which to write checks appropriate to semantic processing of large flows of content. Some outfits can do this work and do it well. In my experience, most outfits cannot afford to tackle the job.

That’s why semantic chatter is interesting but often disappointing to those who chomp the semantic apple from the hot house of great search stuff. Don’t forget to gobble some cognitive chilies too.

Stephen E Arnold, February 8, 2017

More Semantic Search Cheerleading: My Ears Hurt

February 8, 2017

I read “Semantic Search. The Present and Future of Search Engine Optimization.” Let’s be clear. The point of this write up has zero to do with precision and recall. The goal strikes me as generating traffic. Period. Wrapping the blunt truth in semantic tinsel does not change the fact that providing on point information is not on the radar.

I noted this statement and circled it in wild and crazy pink:

SEO in the current times involves user intent to provide apt results which can help you to improve your online presence. Improvement is possible by emphasizing on various key psychological principles to attract readers; rank well and eventually expand business.

When I look for information, my intent is pretty clear to me. I have learned over the last 50 years that software is not able to assist me. May I give you an example from yesterday, gentle reader. I wanted information about Autonomy Kenjin, which became available in the late 1990s. It disappeared. Online was useless, and the search systems I used pointed me to board games, rock music, or Japanese culture. My intent is pretty clear to me. Today’s search systems suck at intent when it comes to my queries.

The write up points out that semantics will help out with “customer personality guiding SEO.” Maybe for Lady Gaga queries. For specialized, highly variable search histories, not a chance. Systems struggle to recognize the intent of highly idiosyncratic queries. Systems do best with big statistical globs. College students like pizza. This user belongs to a cluster of users labeled college students. Therefore, anyone in this cluster gets… pizza ads. Great stuff. Double cheese with two slices of baloney. Then there are keywords. Create a cluster, relate terms to it. Bingo. Job done. Close enough for today’s good enough approach to indexing.

The real gems of the write up consist of admonitions to write about a relevant topic. Relevant to whom, gentle reader. The author, the reader, the advertiser? Include concepts. No problem. A concept to you might be a lousy word to describe something to me; for example, games and kenjin. And, of course, use keywords. Right, double talk and babble.

Semantic SEO. Great stuff. Cancel that baloney pizza order. I don’t feel well.

Stephen E Arnold, February 8, 2017

Learning the Aboutness of a Web Site or Other Online Text Object

February 7, 2017

Quite by accident the Beyond Search goslings came across a company offering a free semantic profile of online text objects. The idea is to plug in a URL. The Leiki system will generate a snapshot of the concepts and topics the content object manifests. We ran the Beyond Search blog through the system. Here’s what we learned:

The system identified that the blog covers Beyond Search. We learned that our coverage of IBM is more intense than our coverage of the Google. But if one combines the Leiki category “Google Search” with the category “Google,” our love of the GOOG is manifest. We ran several other blogs through the Leiki system and learned about some content fixations that were not previously known to us.


We suggest you give the system a whirl.

The developer of the system provides a range of indexing, consulting, and semantic services. More information about the firm is available on the Leiki Web site.

Stephen E Arnold, February 7, 2017

Visualizing a Web of Sites

February 6, 2017

While the World Wide Web is clearly a web, it has not traditionally been presented visually as such. Digital Trends published an article centered on a new visualization of Wikipedia, “Race through the Wikiverse for your next internet search.” This web-based interactive 3D visualization of the open source encyclopedia was created by Owen Cornec, a Harvard data visualization engineer. It pulls about 250,000 articles from Wikipedia and makes connections between articles based on overlapping content. The write-up tells us,

Of course it would be unreasonable to expect all of Wikipedia’s articles to be on Wikiverse, but Cornec made sure to include top categories, super-domains, and the top 25 articles of the week.

Upon a visit to the site, users are greeted with three options, each of course having different CPU and load-time implications for your computer: “Light,” with 50,000 articles, 1 percent of Wikipedia, “Medium,” 100,000 articles, 2 percent of Wikipedia, and “Complete,” 250,000 articles, 5 percent of Wikipedia.
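A quick arithmetic check on the quoted tiers, as a small Python sketch. The `TIERS` list and the `implied_total` function are names made up here. Dividing each tier by its stated share gives the total article count that tier implies, and all three tiers imply the same figure of five million, so the percentages are at least consistent with one another.

```python
# Tiers as quoted: (label, article count, stated percent of Wikipedia).
TIERS = [
    ("Light", 50_000, 1),
    ("Medium", 100_000, 2),
    ("Complete", 250_000, 5),
]


def implied_total(articles, percent):
    """Total article count implied by a tier's size and its stated share."""
    return articles * 100 // percent


totals = {label: implied_total(count, pct) for label, count, pct in TIERS}
print(totals)
```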

Will this pave the way for web-visualized search? Or, as the article suggests, become an even more exciting playing field for The Wikipedia Game? Regardless, this advance makes clear the importance of semantic search. Oh, right, perhaps this would be a better link to locate semantic search (it made the 1 percent “Light” cut).

Megan Feil, February 6, 2017

Alleged Google-Killer Omnity Is Now Free

January 31, 2017

Omnity is a search engine designed to deliver more useful results than one obtains from outfits like Google. According to “Omnity Is a Semantic Mapping Search Engine That’s Now Offered for Free,”

…sometimes there’s a need for another kind of search, namely to locate documents that aren’t explicitly linked or otherwise referenced between each other but where each contains the same rare terms. In those cases, a method called “semantic mapping” becomes valuable, and there’s now a free option that does just that…
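The quoted idea, linking documents that never cite each other but share the same rare terms, can be sketched in a few lines of Python. This is an illustration of the general notion only, not Omnity's algorithm, which the write up does not describe. The `rare_term_links` function, the `max_doc_freq` cutoff, and the three toy documents are all assumptions made up for the example.

```python
from collections import defaultdict
from itertools import combinations
import re


def rare_term_links(docs, max_doc_freq=2):
    """Map each pair of document ids to the rare terms they share.

    A term counts as rare when it appears in at most max_doc_freq documents.
    """
    postings = defaultdict(set)  # term -> ids of documents containing it
    for doc_id, text in docs.items():
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            postings[term].add(doc_id)

    links = defaultdict(set)  # (id_a, id_b) -> shared rare terms
    for term, ids in postings.items():
        # A term must occur in at least two documents to link anything.
        if 2 <= len(ids) <= max_doc_freq:
            for pair in combinations(sorted(ids), 2):
                links[pair].add(term)
    return dict(links)


docs = {
    "a": "the patent covers a perovskite solar cell",
    "b": "the paper measures perovskite film stability",
    "c": "the report sizes the solar market",
}
links = rare_term_links(docs)
print(links)
```

In this toy corpus "the" appears in every document and links nothing, while "perovskite" and "solar" each tie one pair of documents together. A real system would need a far larger corpus before document frequency says much about rarity.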

My query for “Omnity” returned these results:


When I checked the links in the central display and scanned the snippet in the left hand sidebar, I did not locate many relevant results. I noted a number of NASA related hits. A bit of checking allowed me to conclude that a company called Elumenati once offered a product called Omnity.

If you want to experiment with the system, point your browser thing at the Omnity site. You will have to register. Once you verify via email, you are good to go.

We don’t have an opinion yet because we don’t know the scope of the index nor the method of determining relevance for an entity. The “semantic” jargon doesn’t resonate, but that may be our ignorance, ineptitude, or some simple interaction of our wetware.

Omnity may have some work to do before creating fear at the GOOG.

Stephen E Arnold, January 31, 2017

A New Search Engine Targeting Scientific Researchers Touts AI

January 27, 2017

The article titled How a New AI Powered Search Engine Is Changing How Neuroscientists Do Research on Search Engine Watch discusses the new search engine geared towards scientific researchers. It is called Semantic Scholar, and it uses AI to provide a comprehensive resource to scientists. The article explains,

This new search engine is actually able to think and analyze a study’s worth. GeekWire notes that, “Semantic Scholar uses data mining, natural language processing, and computer vision to identify and present key elements from research papers.” The engine is able to understand when a paper is referencing its own study or results from another source. Semantic Scholar can then identify important details, pull figures, and compare one study to thousands of other articles within one field.

This ability to rank and sort papers by relevance is tremendously valuable given the vast number of academic papers online. Google Scholar, by comparison, might lead a researcher in the right direction with its index of over 200 million articles, but it simply does not have the same level of access to the metadata that researchers need, such as how often a paper or author has been cited. The creators of Semantic Scholar are not interested in competing with Google, but in providing a niche search engine tailored to meet the needs of the scientific community.

Chelsea Kerwin, January 27, 2017

Some Things Change, Others Do Not: Google and Content

January 20, 2017

Reading Search Engine Journal’s “The Evolution Of Semantic Search And Why Content Is Still King” brings to mind how RankBrain is changing the way Google ranks search relevancy. The article was written in 2014, but it stresses the importance of semantic search and SEO. With RankBrain, semantic search is more of a daily occurrence than something to strive for.

RankBrain also demonstrates how far search technology has come in three years.  When people search, they no longer want to fish out the keywords from their query; instead they enter an entire question and expect the search engine to understand.

This brings up the question: is content still king?  Back in 2014, the answer was yes and the answer is a giant YES now.  With RankBrain learning the context behind queries, well-written content is what will drive search engine ranking:

What it boils to is search engines and their complex algorithms are trying to recognize quality over fluff. Sure, search engine optimization will make you more visible, but content is what will keep people coming back for more. You can safely say content will become a company asset because a company’s primary goal is to give value to their audience.

The article ends with something about natural language and how people want their content to reflect it.  The article does not provide anything new, but does restate the value of content over fluff.  What will happen when computers learn how to create semantic content, however?

Whitney Grace, January 20, 2017

How Google Used Machine Learning and Loved It

January 16, 2017

If you use any search engine other than Google, except for DuckDuckGo, people cringe and doubt your Internet savvy. Google has a reputation for being the most popular, reliable, and accurate search engine in the US. It has earned this reputation, because, in many ways, it is the truth. Google apparently has one upped itself, however, says Econsultancy in the article, “How Machine Learning Has Made Google Search Results More Relevant.”

In 2015, Google launched RankBrain to improve search relevancy in its results. Searchmetrics conducted a study and discovered that it worked. RankBrain is an AI that uses machine learning to understand the context behind people’s searches. RankBrain learns the more it is used, similar to how a person learns to read. A person learning to read might not know a word, but can work out what it means based on context.

This increases Google’s semantic understanding, but so has the number of words in a search query. People are reverting to their natural wordiness and are not using as many keywords. At the same time, back linking is not as important anymore, and content quality is becoming more valuable for higher page rankings. Bounce rates are increasing in the top twenty results, meaning that users are led to a more relevant result than pages with higher optimization.

RankBrain also shows Google’s growing reliance on AI:

With the introduction of RankBrain, there’s no doubt that Google is taking AI and machine learning more seriously.  According to CEO, Sundar Pichai, it is just the start. He recently commented that ‘be it search, ads, YouTube, or Play, you will see us — in a systematic way — apply machine learning in all these areas.’  Undoubtedly, it could shape more than just search in 2017.

While the search results are improving their relevancy, it spells bad news for marketers and SEO experts as their attempts to gain rankings are less effective.

Whitney Grace, January 16, 2017
