Veteran Web Researcher Speaks on Bias and Misinformation

October 10, 2017

The CTO of semantic search firm Ntent, Dr. Ricardo Baeza-Yates, has been studying the Web since its inception. In its post, “Fake News and the Power of Algorithms: Dr. Ricardo Baeza-Yates Weighs In With Futurezone at the Vienna Gödel Lecture,” Ntent shares his take on online biases by reproducing an interview Baeza-Yates gave Futurezone at the Vienna Gödel Lecture 2017, where he was the featured speaker. When asked about the consequences of false information spread far and wide, the esteemed CTO cited two pivotal events from 2016: Brexit and the US presidential election.

These were manipulated by social media. I do not mean by hackers – which cannot be excluded – but by social biases. The politicians and the media are in the game together. For example, a non-Muslim attack may be less likely to make the front page or earn high viewing ratings. How can we minimize the amount of biased information that appears? It is a problem that affects us all.

One might try to make sure people get a more balanced presentation of information. Currently, it’s often the media and politicians that cry out loudest for truth. But could there be truth in this context at all? Truth should be the basis but there is usually more than one definition of truth. If 80 percent of people see yellow as blue, should we change the term? When it comes to media and politics the majority can create facts. Hence, humans are sometimes like lemmings. Universal values could be a possible common basis, but they are increasingly under pressure from politics, as Theresa May recently stated in her attempt to change the Magna Carta in the name of security. As history already tells us, politicians can be dangerous.

Indeed. The biases that concern Baeza-Yates go beyond those that spread fake news, though. He begins by describing presentation bias—the fact that one’s choices are limited to that which suppliers have, for their own reasons, made available. Online, “filter bubbles” compound this issue. Of course, Web search engines magnify any biases—their top results provide journalists with research fodder, the perceived relevance of which is compounded when that journalist’s work is published; results that appear later in the list get ignored, which pushes them yet further from common consideration.

Ntent is working on ways to bring folks with different viewpoints together on topics on which they do agree; Baeza-Yates admits the approach has its limitations, especially on the big issues. What we really need, he asserts, is journalism that is bias-neutral instead of polarized. How we get there from here, even Baeza-Yates can only speculate.

Cynthia Murrell, October 10, 2017

European Tweets Analyzed for Brexit Sentiment

September 28, 2017

The folks at Expert System demonstrate their semantic intelligence chops with an analysis of sentiments regarding Brexit, as expressed through tweets. The company shares their results in their press release, “The European Union on Twitter, One Year After Brexit.” What are Europeans feeling about that major decision by the UK? The short answer—fear. The write-up tells us:

One year since the historical referendum vote that sanctioned Britain’s exit from the European Union (Brexit, June 23, 2016), Expert System has conducted an analysis to verify emotions and moods prevalent in thoughts expressed online by citizens. The analysis was conducted on Twitter using the cognitive Cogito technology to analyze a sample of approximately 160,000 tweets in English, Italian, French, German and Spanish related to Europe (more than 65,000 tweets for #EU, #Europe…) and Brexit (more than 95,000 tweets for #brexit…) posted between May 21 – June 21, 2017. Regarding the emotional sphere of the people, the prevailing sentiment was fear followed by desire as a mood for intensely seeking something, but without a definitive negative or positive connotation. The analysis revealed a need for more energy (action), and, in an atmosphere that seems to be dominated by a general sense of stress, the tweets also showed many contrasts: modernism and traditionalism, hope and remorse, hatred and love.

The piece goes on to parse responses by language, tying priorities to certain countries. For example, those tweeting in Italian often mentioned “citizenship,” while tweets in German focused largely on “dignity” and “solidarity.” The project also evaluates sentiment regarding several EU leaders. Expert System was founded back in 1989, and its Cogito office is located in London.
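Cogito itself is proprietary, but the general shape of the pipeline the press release describes (filter tweets by hashtag, bucket them by language, tally emotion labels) is easy to sketch. Here is a minimal Python illustration; the classify_emotion stub and the sample tweets are invented stand-ins, not Expert System’s method:

```python
from collections import Counter, defaultdict

TRACKED_TAGS = {"#brexit", "#eu", "#europe"}

def classify_emotion(text):
    """Hypothetical stand-in for a real multilingual emotion
    classifier; Expert System used its proprietary Cogito engine."""
    lowered = text.lower()
    if any(w in lowered for w in ("worried", "afraid", "angst", "peur")):
        return "fear"
    if any(w in lowered for w in ("hope", "wish", "speranza", "espoir")):
        return "desire"
    return "neutral"

def tally(tweets):
    """Bucket hashtag-matching tweets by language and count emotions."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        tags = {t.lower() for t in tweet["hashtags"]}
        if tags & TRACKED_TAGS:
            counts[tweet["lang"]][classify_emotion(tweet["text"])] += 1
    return counts

sample = [  # invented tweets for illustration
    {"lang": "en", "hashtags": ["#brexit"], "text": "Worried about trade"},
    {"lang": "de", "hashtags": ["#EU"], "text": "Angst um die Zukunft"},
    {"lang": "it", "hashtags": ["#Europe"], "text": "C'e' ancora speranza"},
]
for lang, emotions in sorted(tally(sample).items()):
    print(lang, dict(emotions))
```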

Cynthia Murrell, September 28, 2017

Bitext and MarkLogic Join in a Strategic Partnership

June 13, 2017

Strategic partnerships are one of the best ways for companies to grow, and diamond-in-the-rough Bitext has formed a brilliant one. A recent press release, “Bitext Announces Technology Partnership With MarkLogic, Bringing Leading-Edge Text Analysis To The Database Industry,” follows a number of key license deals for the company. Bitext’s ability to process multi-lingual content with its deep linguistics analysis platform reduces costs and increases the speed with which machine learning systems can deliver more accurate results.

Both Bitext and MarkLogic are helping enterprise companies drive better outcomes and create better customer experiences. By combining their respective technologies, the pair hopes to reduce textual ambiguity in data and produce high-quality data assets for semantic search, chatbots, and machine learning systems. Bitext’s founder and CEO, Dr. Antonio Valderrabanos, said:

“With Bitext’s breakthrough technology built-in, MarkLogic 9 can index and search massive volumes of multi-language data accurately and efficiently while maintaining the highest level of data availability and security. Our leading-edge text analysis technology helps MarkLogic 9 customers to reveal business-critical relationships between data.”

Bitext is capable of conquering the most difficult language problems and creating solutions for consumer engagement, training, and sentiment analysis. Its flagship product is the Deep Linguistics Analysis Platform, which Kantar, GfK, Intel, and Accenture favor. MarkLogic used to be one of Bitext’s clients; now the two are partners and are bound to invent even more breakthrough technology. Bitext takes another step to cement its role as the operating system for machine intelligence.

Whitney Grace, June 13, 2017

Quote to Note: Hate That Semantic Web Stuff

June 8, 2017

I read “JSON-LD and Why I Hate the Semantic Web.”

Here’s the quote I noted:

I hate the narrative of the Semantic Web because the focus has been on the wrong set of things for a long time. That community, who I have been consciously distancing myself from for a few years now, is schizophrenic in its direction. Precious time is spent in groups discussing how we can query all this Big Data that is sure to be published via RDF instead of figuring out a way of making it easy to publish that data on the Web by leveraging common practices in use today. Too much time is spent assuming a future that’s not going to unfold in the way that we expect it to. That’s not to say that TURTLE, SPARQL, and Quad stores don’t have their place, but I always struggle to point to a typical startup that has decided to base their product line on that technology (versus ones that choose MongoDB and JSON on a regular basis).

There you go.
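For readers who have not put the two syntaxes side by side, the contrast the author draws is easy to see in code. The following is a minimal sketch, assuming rdflib 6 or later (which bundles a JSON-LD serializer); the example triples are invented:

```python
from rdflib import Graph  # pip install rdflib (6.x includes JSON-LD)

# A tiny graph in TURTLE, the classic Semantic Web syntax.
turtle = """
@prefix schema: <http://schema.org/> .
<http://example.org/post/1> a schema:BlogPosting ;
    schema:headline "I Hate the Semantic Web" ;
    schema:datePublished "2017-06-08" .
"""

g = Graph()
g.parse(data=turtle, format="turtle")

# The same graph serialized as JSON-LD.
print(g.serialize(format="json-ld", indent=2))
```

The JSON-LD output is ordinary JSON, which is precisely the author’s point: developers who already live in the MongoDB-and-JSON world can publish linked data without touching TURTLE or SPARQL.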

Stephen E Arnold, June 8, 2017

Deep Diving into HTML Employing Semantics

May 31, 2017

HTML, the markup language on which websites are built, can employ semantics to make content easier to search and to understand, especially for those who use assistive technologies.

Web Dev Studios, in an in-depth article titled “Accessibility of Semantics: How Writing Semantic HTML Can Help Accessibility,” says:

Writing HTML is about more than simply “having stuff appear on the page.” Each element you use has a meaning and conveys information to your visitors, especially to those that use assistive technologies to help interpret that meaning for them.

Assistive technologies are used by people who have limited vision or other impairments that prohibit them from accessing the Web efficiently. According to the author, if semantic markup is employed, people with impairments can access all the features of the Web as readily as anyone else.

The author goes on to explain how different HTML tags can be used effectively to help people with visual impairments.
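The contrast is simple to demonstrate. The short Python sketch below, which uses only the standard library, parses a “div soup” fragment and a semantic version of the same fragment, then lists the elements a screen reader can actually navigate by. The markup strings are invented examples, not taken from the article:

```python
from html.parser import HTMLParser

# Invented before/after markup: both render much the same, but only
# the semantic version tells assistive technology what anything *is*.
DIV_SOUP = """<div class="nav">...</div>
<div class="content"><div class="btn">Save</div></div>"""

SEMANTIC = """<nav aria-label="Primary">...</nav>
<main><button type="button">Save</button></main>"""

LANDMARKS = {"header", "nav", "main", "footer", "aside",
             "article", "section", "button", "h1", "h2"}

class LandmarkCounter(HTMLParser):
    """Collects the elements a screen reader can navigate by."""
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            self.found.append(tag)

for label, markup in (("div soup", DIV_SOUP), ("semantic", SEMANTIC)):
    parser = LandmarkCounter()
    parser.feed(markup)
    print(f"{label}: {parser.found}")
# div soup: []        -- nothing but anonymous boxes
# semantic: ['nav', 'main', 'button']
```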

The Web and related technologies keep evolving, but the Web can be called truly inclusive only when people with all types of disabilities are able to use it with equal ease.

Vishal Ingole, May 31, 2017

Semantic Platform Aggregates Scientific Information

May 1, 2017

A new scientific repository is now available from a prominent publisher, we learn from “GraphDB, Leading Semantic Database from Ontotext, Powers Springer Nature’s New Linked Open Data Platform” at PRWeb. (We note the word “leading” in the title; who verifies this assertion? Just curious.) The platform, dubbed SciGraph, aggregates data from Springer Nature and its academic partners. The press release specifies:

Thanks to semantic technologies, Linked Open Data and the GraphDB semantic database, all these data are connected in a way which semantically describes and visualizes how the information is interlinked. GraphDB’s capability to seamlessly integrate disparate data silos allows Springer Nature SciGraph to comprise metadata from journals and articles, books and chapters, organizations, institutions, funders, research grants, patents, clinical trials, substances, conference series, events, citations and reference networks, Altmetrics, and links to research datasets.

The dataset is released under a certain international Creative Commons license and can be downloaded by someone with the appropriate technical knowledge.
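For the technically inclined: querying a linked open data platform of this kind typically means SPARQL over HTTP. Below is a minimal sketch using the SPARQLWrapper Python library; the endpoint URL and the class and property names are placeholders, not Springer Nature’s actual schema:

```python
from SPARQLWrapper import SPARQLWrapper, JSON  # pip install sparqlwrapper

# Hypothetical endpoint URL; substitute the platform's real one.
sparql = SPARQLWrapper("http://example.org/scigraph/sparql")
sparql.setReturnFormat(JSON)

# Placeholder class and property names for illustration only.
sparql.setQuery("""
    SELECT ?article ?grant WHERE {
        ?article a <http://example.org/Article> ;
                 <http://example.org/fundedByGrant> ?grant .
    }
    LIMIT 10
""")

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["article"]["value"], row["grant"]["value"])
```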

An early explorer of semantic technology, Ontotext was founded in 2000. Based in Bulgaria, the company keeps its North American office in New Jersey. Ontotext’s client roster includes big names in publishing, government agencies, and cultural institutions.

Cynthia Murrell, May 1, 2017

Keyword Search vs. Semantic Search for Patent Seekers

April 26, 2017

The article on BIP Counsels titled “An Introduction to Patent Search, Keyword Search, and Semantic Searches” offers a brief overview of the differences between keyword and semantic search. It is geared toward inventors and technologists in the early stages of filing a patent application. The article states,

If an inventor proceeds with the patent filing process without performing an exhaustive prior art search, it may hamper the patent application at a later point, such as in the prosecution process. Hence, a thorough search involving all possible relevant techniques is always advisable… Search tools such as ‘semantic search assistant’ help the user find similar patent families based on freely entered text.  The search method is ideal for concept based search.

Ultimately, the article fails to go beyond the superficial when it comes to keyword and semantic search. One almost suspects that the author (BananaIP patent attorneys) wants to send potential DIY patent researchers running into their office for help. Yes, terminology plays a key role in keyword searches. Yes, semantic search can help narrow the focus and improve the relevancy of the results. If you want more information than that, you may want to visit a patent attorney. But probably not the one who wrote this article.

Chelsea Kerwin, April 26, 2017

Mondeca: Tweaking Its Market Position

February 22, 2017

One of the Beyond Search goslings noticed a repositioning of the taxonomy capabilities of Mondeca. Instead of pitching indexing, the company has embraced Elasticsearch (based on Lucene) and Solr. The idea is that if an organization is using either of these systems for search and retrieval, Mondeca can provide “augmented” indexing: keywords are not enough, so Mondeca indexes the content using concepts.

Of course, the approach is semantic, permits exploration, and enables content discovery. Mondeca’s Web site describes search as “find” and explains:

Initial results are refined, annotated and easy to explore. Sorted by relevancy, important terms are highlighted: easy to decide which one are relevant. Sophisticated facet based filters. Refining results set: more like this, this one, statistical and semantic methods, more like these: graph based activation ranking. Suggestions to help refine results set: new queries based on inferred or combined tags. Related searches and queries.
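Mechanically, “augmented” indexing of this sort usually means storing taxonomy-derived concept tags alongside the full text so queries can filter on concepts rather than raw keywords. Here is a minimal sketch against Elasticsearch using the 8.x Python client; the index name, field names, and concept labels are invented, and this illustrates the general pattern, not Mondeca’s implementation:

```python
from elasticsearch import Elasticsearch  # Python client 8.x assumed

es = Elasticsearch("http://localhost:9200")  # assumed local node

# One document: full text plus taxonomy-assigned concept tags.
es.index(index="articles", id="1", refresh=True, document={
    "title": "Taxonomies in the Enterprise",
    "body": "Keyword matching alone misses synonyms and broader terms.",
    "concepts": ["semantic-enrichment", "taxonomy-management"],
})

# A keyword match on the text, filtered by a concept from the taxonomy.
# Default dynamic mapping exposes the unanalyzed tag as concepts.keyword.
hits = es.search(index="articles", query={
    "bool": {
        "must": {"match": {"body": "keyword"}},
        "filter": {"term": {"concepts.keyword": "semantic-enrichment"}},
    },
})
print(hits["hits"]["hits"])
```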

This is a marketing move similar to the one that Intrafind, a German search vendor, made several years ago. Mondeca continues to offer its taxonomy management system; human subject matter experts do have a role in the world of indexing. As with other taxonomy systems and services vendors, the hook is that content indexed with concepts is smart. I love it when indexing makes content intelligent.

The “smart” buzzword is used by outfits ranging from MarkLogic’s merry band of XML and XQuery professionals to library-centric outfits like Smartlogic. Isn’t smart logic better than logic?

Stephen E Arnold, February 22, 2017

The Pros and Cons of Human Developed Rules for Indexing Metadata

February 15, 2017

The article on Smartlogic titled “The Future Is Happening Now” puts forth the Semaphore platform as the technology filling the gap between NLP and AI when it comes to conversation. The article posits that, in spite of the great strides AI has made in the past 20 years, human speech is one area where it still falls short.

The reason for this, according to the article, is that “words often have meaning based on context and the appearance of the letters and words.” It is not enough to identify a concept represented by a string of letters. Many rules must be put in place that affect the meaning of a word: its placement in a sentence, the grammar, and the words around it are all important.

Advocating human-developed rules for indexing is certainly interesting, and the author compares the logic to the process of raising her children to be multilingual. Semaphore is a model-driven, rules-based platform that auto-generates usage rules to expand the guidelines for a machine as it learns. The issue here is cost. Indexing large amounts of data is extremely expensive, and that is before maintenance of the rules even becomes part of the equation. In sum, this is a very old-school approach to AI that may make many people uncomfortable.
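To make the human-developed-rules idea concrete, here is a toy Python sketch of a context-sensitive disambiguation rule of the kind such platforms encode. The vocabulary and rules are invented for illustration; Semaphore’s actual rule language is proprietary:

```python
import re

# Toy disambiguation rules: a term's index tag depends on context,
# the kind of human-authored rule the article is describing.
RULES = [
    # (term, context words that must also appear, tag to assign)
    ("apple", {"iphone", "mac", "cupertino"}, "Apple Inc."),
    ("apple", {"orchard", "pie", "fruit"}, "apple (fruit)"),
]

def tag(text):
    """Assign index tags to text using context-sensitive rules."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    tags = []
    for term, context, label in RULES:
        if term in words and words & context:
            tags.append(label)
    return tags

print(tag("Apple unveiled a new iPhone in Cupertino"))  # ['Apple Inc.']
print(tag("Grandma's apple pie recipe"))                # ['apple (fruit)']
```

Every rule like this must be written, tested, and maintained by a person, which is exactly where the cost objection comes from.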

Chelsea Kerwin, February 15, 2017

Semantics: Biting the Semantic Apple in the Garden of Search Subsystems

February 8, 2017

I love the Phoenix-like behavior of search and content processing subsystems. Consider semantics: figuring out what something is about and assigning an index term to that aboutness. Semantics is not new, and it is not an end in itself. Semantic functions are one of the many Lego blocks which make up a functioning and, one hopes, semi-accurate content processing and information access system.

I read “With Better Scaling, Semantic Technology Knocks on Enterprise’s Door.” The headline encapsulates decades of frustration for the champions of semantic solutions. The early bird vendors fouled the nest for later arrivals. As a result, nifty semantic technology makes a sales call and finds that those who bother to attend the presentation are [a] skeptical, [b] indifferent, [c] clueless, [d] unwilling to spend money for another career killer. Pick your answer.

For decades, yes, decades, enterprise search and content processing vendors have said whatever was necessary to close a deal. The operative concept was that the vendor could whip up a solution and everything would come up roses. Well, fly that idea by those who licensed Convera for video search, Fast Search for an intelligent system, or any of the other train wrecks that lie along the information railroad tracks.

This write-up happily ignores the past and bets that “better” technology will make semantic functions accurate, easy, low-cost, and just plain wonderful. Yep, the Garden of Semantics exists as long as the licensee has the knowledge, money, time, and personnel to deliver the farm-fresh produce.

I noted this passage:

… semantics standards came out 15 or more years ago, but scalability has been an inhibitor. Now, the graph technology has taken off. Most of what people have been looking at it for is [online transactional processing]. Our focus has been on [online analytical processing] — using graph technology for analytics. What held graph technology back from doing analytics was the scaling problem. There was promise and hype over those years, but, at every turn, the scale just wasn’t there. You could see amazing things in miniature, but enterprises couldn’t see them at scale. In effect, we have taken our query technology and applied MPP technology to it. Now, we are seeing tremendous scales of data.

Yep, how much does it cost to shove Big Data through a computationally intensive semantic system? Ask a company licensing one of the industrial strength systems like Gotham or NetReveal.

Make sure you have a checkbook with a SPARQL enhanced cover and a matching pen with which to write checks appropriate to semantic processing of large flows of content. Some outfits can do this work and do it well. In my experience, most outfits cannot afford to tackle the job.

That’s why semantic chatter is interesting but often disappointing to those who chomp the semantic apple from the hot house of great search stuff. Don’t forget to gobble some cognitive chilies too.

Stephen E Arnold, February 8, 2017
