Google Search and Hot News: Sensitivity and Relevance

November 10, 2017

I read “Google Is Surfacing Texas Shooter Misinformation in Search Results — Thanks Also to Twitter.” What struck me was the headline; specifically, its implication that Google is not merely responding to user queries but actively “surfacing,” that is, fetching and displaying, information about the event. Twitter is also involved. I don’t think of Twitter as much more than a party line. One can look up keywords or watch a stream of content containing a keyword or, to use Twitter speak, a “hashtag.”

The write up explains:

Users of Google’s search engine who conduct internet searches for queries such as “who is Devin Patrick Kelley?” — or just do a simple search for his name — can be exposed to tweets claiming the shooter was a Muslim convert; or a member of Antifa; or a Democrat supporter…

I think I understand. A user inputs a term and Google’s system matches the user’s query to the content in the Google index. Google maintains many indexes, despite its assertion that it is a “universal search engine.” One has to search across different Google services and their indexes to build up a mosaic of what Google has indexed about a topic; for example, blogs, news, the general index, maps, finance, etc.

Developing a composite view of what Google has indexed takes time and patience. The results may vary depending on whether the user is logged in, is searching from a particular geographic location, or has enabled or disabled certain behind-the-scenes functions of the Google system.
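To make the mosaic idea concrete, here is a toy sketch. It is definitely not Google’s code; the index names, documents, and the location boost are my inventions. The sketch shows one query running against several vertical indexes, with a context signal nudging the order of what comes back:

```python
from typing import Dict, List, Tuple

# Hypothetical vertical indexes; a real system would have many more.
INDEXES: Dict[str, List[dict]] = {
    "news":  [{"title": "Texas shooting coverage", "geo": "us"}],
    "blogs": [{"title": "Texas BBQ ranking", "geo": "us"}],
    "maps":  [{"title": "Texas road map", "geo": "us"}],
}

def composite_search(query: str, user_geo: str) -> List[Tuple[float, str, str]]:
    """Run one query across several vertical indexes and merge the hits."""
    results = []
    for index_name, docs in INDEXES.items():
        for doc in docs:
            if query.lower() in doc["title"].lower():
                score = 1.0
                if doc["geo"] == user_geo:   # context (location, login state)
                    score += 0.5             # can reorder what comes back
                results.append((score, index_name, doc["title"]))
    return sorted(results, reverse=True)

for hit in composite_search("texas", user_geo="us"):
    print(hit)
```

The point of the exercise: each vertical index answers on its own, and the searcher (or the researcher) has to stitch the answers together.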

The write up contains this statement:

Safe to say, the algorithmic architecture that underpins so much of the content internet users are exposed to via tech giants’ mega platforms continues to enable lies to run far faster than truth online by favoring flaming nonsense (and/or flagrant calumny) over more robustly sourced information.

From my point of view, the ability to figure out what influences Google’s search results requires significant effort, numerous test queries, and recognition that Google search now balances on two pogo sticks. One “pogo stick” is blunt-force keyword search. When content is indexed, terms are plucked from source documents. The system may or may not assign additional index terms to the document; for example, geographic or time stamps.

The other “pogo stick” is discovery and assignment of metadata. I have explained some of the optional tags which Google may or may not include when processing a content object; for example, see the work of Dr. Alon Halevy and Dr. Ramanathan Guha.
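A minimal sketch of the two pogo sticks working together appears below. The field names and tagging scheme are my invention, not Google’s; the point is that plucked terms and assigned metadata both land in the same inverted index:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class ContentObject:
    doc_id: str
    text: str
    geo: Optional[str] = None        # metadata the system may (or may not) assign
    timestamp: Optional[str] = None

def build_index(docs: List[ContentObject]) -> Dict[str, Set[str]]:
    postings: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        # Pogo stick 1: pluck terms straight from the source document.
        for term in doc.text.lower().split():
            postings[term].add(doc.doc_id)
        # Pogo stick 2: fold optionally assigned metadata into the index.
        if doc.geo:
            postings[f"geo:{doc.geo}"].add(doc.doc_id)
        if doc.timestamp:
            postings[f"date:{doc.timestamp}"].add(doc.doc_id)
    return postings

index = build_index([
    ContentObject("d1", "church shooting in Texas", geo="us-tx", timestamp="2017-11-05"),
    ContentObject("d2", "Texas weather report", geo="us-tx"),
])
print(index["texas"])      # {'d1', 'd2'} via plucked terms
print(index["geo:us-tx"])  # both documents via assigned metadata
```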

But Google, like other smart content processing systems today, has a certain sensitivity. The streams of content it processes may contain certain keywords which act as triggers for the system.

When a “news” event takes place, the flood of content allows smart indexing systems to identify a “hot topic.” The test queries we ran for my monographs “The Google Legacy” and “Google Version 2.0” suggest that Google is sensitive to certain “triggers” in content. Feedback can be useful; it can also cause smart software to wobble a bit.
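For illustration only, here is a toy hot-topic detector. The thresholds are arbitrary and real systems are far more elaborate, but the sketch shows how a burst of content containing a trigger term gets flagged, and why a comparatively small input can move the needle:

```python
from collections import Counter
from typing import Iterable, List

def hot_topics(current_window: Iterable[str],
               baseline: Counter,
               spike_factor: float = 5.0,
               min_count: int = 3) -> List[str]:
    """Flag terms whose frequency spikes far above their historical baseline."""
    counts = Counter(current_window)
    trending = []
    for term, count in counts.items():
        expected = baseline.get(term, 0.5)  # smooth terms never seen before
        if count >= min_count and count / expected >= spike_factor:
            trending.append(term)
    return trending

baseline = Counter({"texas": 2, "weather": 40})
stream = ["texas", "shooter", "texas", "shooter", "shooter", "weather"]
print(hot_topics(stream, baseline))  # ['shooter']; a small input moves the needle
```

Note that a rare term needs only a handful of occurrences to trip the detector. That is the sensitivity; it is also the wobble.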

[Image: a T shirt bearing the slogan “the impossible takes a little longer”]

T shirts are easy; search is hard.

I believe that the challenge Google faces is similar to the problem Bing and Yandex are wrestling with as well; that is, certain numerical recipes can overreact to certain inputs. These overreactions may increase the difficulty of determining which content object is “correct,” “factual,” or “verifiable.”

Expecting a free search system, regardless of its owner, to know what’s true and what’s false is understandable. In my opinion, making this type of determination with today’s technology, system limitations, and content analysis methods is impossible.

In short, the burden of figuring out what’s right and what’s not falls on the user, not exclusively on the search engine. Users, on the other hand, may not want “objective” reality. Search vendors want traffic and want to generate revenue. Algorithms want nothing.

Mix these three elements and one takes a step closer to understanding that search and retrieval is not the slam dunk some folks would have me believe. In fact, the sensitivity of content processing systems to comparatively small inputs requires more discussion. Perhaps that type of information will come out of discussions about how best to deal with fake news and related topics in the context of today’s information retrieval environment.

Free search? Think about that too.

Stephen E Arnold, November 10, 2017

A Clever Take on Google and Fake News

November 8, 2017

I noted this story in the UK online publication The Register: “Google on Flooding the Internet with Fake News: Leave Us Alone. We’re Trying Really Hard. Sob.” The write up points out:

Google has responded in greater depth after it actively promoted fake news about Sunday’s Texas murder-suicide gunman by… behaving like a spoilt kid.

The Google response, as presented in the write up, warranted a yellow circle from my trusty highlighter. The Register said:

Having had time to reflect on the issue, the Silicon Valley monster’s “public liaison for search” and former Search Engine Land blog editor Danny Sullivan gave a more, um, considered response in a series of tweets. “Bottom line: we want to show authoritative information. Much internal talk yesterday on how to improve tweets in search; more will happen,” he promised, before noting that the completely bogus information had only appeared “briefly.”


The Register story includes other gems from the search engine optimization expert who seems to thrive on precision and relevance for content unrelated to a user’s query; for example, the article presents some “quotes” from Mr. Sullivan, the expert in charge of explaining the hows and whys of fake news:

  • “Early changes put in place after Las Vegas shootings seemed to help with Texas. Incorrect rumors about some suspects didn’t get in…”
  • “Right now, we haven’t made any immediate decisions. We’ll be taking some time to test changes and have more discussions.”
  • “Not just talk. Google made changes to Top Stories and is still improving those. We’ll do same with tweets. We want to get this right.”

Yep, Google wants to do better. Now Google wants to get “this” right. Okay. After 20 years, dealing with fake content, spoofs, and algorithmic vulnerability is on the to-do list. That’s encouraging.

For more Google explanations, check out the Register’s story and follow the logic of the SEO wizard who now has to explain fake news creeping—well, more like flowing—into Google’s search and news content.

Does an inability to deal with fake news hint at truthiness challenges at Googzilla’s money machine? Interesting question from my point of view.

Stephen E Arnold, November 8, 2017

SEO Benefits Take Time to Realize

October 30, 2017

In many (most?) fields today, it is considered essential for companies to position themselves as close to the top of potential customers’ Web search results as possible. However, search engine optimization (SEO) efforts take time. Business 2 Community explains “Why It Takes Six Months to Improve Search Rankings.”  Marketers must accept that, unless they luck out with content that goes viral, they will just have to be patient for results. Writer Kent Campbell explains five reasons this is the case, and my favorite is number one—search systems were not built to aid marketers in the first place! In fact, in some ways, quite the opposite. Campbell writes:

Bing and Google Serve Their Searchers, Not You.

A search provider’s primary concern is its users, not you or any other business that’s fighting for a spot on the first page. The search engine’s goal is to provide the best user experience to its searchers; that means displaying the most relevant and high quality results for every search query. Both Bing and Google watch how people react to content before they decide how visible that content should be in results. Even when content has had a lot of SEO therapy, the content itself has to be spot-on. This is why Google evaluates every piece of content on more than 200 ranking factors and ensures that only the best quality pages make it to the top 10. The best way to make it to the first page is by aligning yourself with Google’s objective, which is to serve its users.
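To see why searcher reactions take time to move rankings, consider a cartoon version of the process: a ranking score computed as a weighted sum of many signals. Google’s actual factors and weights are not public; the names and numbers below are invented for illustration. Engagement signals start near zero for new content and climb only as users interact with it, which is one reason rankings improve slowly:

```python
# Invented signals and weights; Google's real ~200 factors are not public.
SIGNAL_WEIGHTS = {
    "query_term_match": 3.0,
    "inbound_links":    2.0,
    "content_quality":  2.5,
    "user_engagement":  1.5,   # how searchers react to the result over time
    "page_speed":       0.5,
}

def rank_score(signals: dict) -> float:
    """Combine per-page signal values (0.0 to 1.0) into a single score."""
    return sum(SIGNAL_WEIGHTS[name] * signals.get(name, 0.0)
               for name in SIGNAL_WEIGHTS)

# A new page may match the query well yet lack links and engagement history.
new_page = {"query_term_match": 0.9, "content_quality": 0.8}
aged_page = {"query_term_match": 0.7, "inbound_links": 0.9,
             "content_quality": 0.6, "user_engagement": 0.8}
print(rank_score(new_page))   # 4.7
print(rank_score(aged_page))  # 6.6
```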

A company might be seeing slow results because it hesitated (Early Movers Have an Advantage is the second reason Campbell gives). On the other hand, at number three, we find that Creating Quality Content Takes Time. Then there is the fact that Link Building Is Not as Simple as Before. Finally, there is a more recent complication: Social Media Also Impacts Rankings these days. See the article for Campbell’s explanation of each point. He concludes with a little advice: companies would do well to consider their SEO efforts an ongoing cost of doing business, rather than an extraordinary item.

Cynthia Murrell, October 30, 2017

Facebook and Publishing

October 23, 2017

Print publishing has slowly been circling the drain as revenue drops (the rate depends on what type of publishing you are in). Some publishers have tried going totally digital, hoping that online subscriptions and ads would pay the bills, but Google and Facebook are siphoning off that revenue source. The Next Web shares more about how publishers are struggling in the article, “Publishers Need To Learn From Mega Platforms Like Facebook.”

Like many smart companies, publishers have joined social media platforms and hoped to build their brand image on them. Publishers, however, have learned that Facebook and other social media platforms keep changing their requirements. The article compares the situation to Darwinian survival of the fittest. The publishing companies with deep pockets are surviving through investment and smart digital upgrades.

Jeff Bezos is used as an example because he has turned video streaming into one of Amazon’s main profit generators. The suggestion is that publishers follow suit with video and then live video streams. The comments sections in these videos create an ongoing dialogue with viewers (while at the same time giving trolls a venue). It turns out that commoditized content on social media is not the way to go.

Apparently, publishers need to concentrate instead on building their own platforms:

This is the perfect time for publishers to take control of their platforms and the video streams that will drive the next phase of the digital content revolution. With advances in live video programming and the speed with which original content can be created, publishers can greatly enhance what they already do and know, and monetize it through changes in advertising models that fuel online media platforms as well as live-streaming video platforms.

The Internet is more than video, however. Podcasts and articles are still viable content too. It might be time to rethink your career if you are a social media manager.

Whitney Grace, October 23, 2017

Brief Configuration Error by Google Triggers Japanese Investigation

October 12, 2017

When a tech giant makes even a small mistake, consequences can be significant. A brief write-up from the BBC, “Google Error Disrupts Corporate Japan’s Web Traffic,”  highlights this lamentable fact. We learn:

Google has admitted that wide-spread connectivity issues in Japan were the result of a mistake by the tech giant. Web traffic intended for Japanese internet service providers was being sent to Google instead.

Online banking, railway payment systems as well as gaming sites were among those affected.

A spokesman said a ‘network configuration error’ only lasted for eight minutes on Friday but it took hours for some services to resume. Nintendo was among the companies who reported poor connectivity, according to the Japan Times, as well as the East Japan Railway Company.

All of that content—financial transactions included—was gone for good, since Google cannot transmit to third-party networks, according to an industry expert cited in the post. Essentially, it seems that for those few minutes, Google accidentally hijacked all traffic to NTT Communications Corp, which boasts over 50 million customers in Japan. The country’s Ministry of Internal Affairs and Communications is investigating the incident.

Cynthia Murrell, October 12, 2017

Google-Publishers Partnership Chases True News

September 22, 2017

It appears as though Google is taking the issue of false information, and perhaps even their role in its perpetuation, seriously; The Drum reveals, “Google Says it Wants to Fund the News, Not Fake It.” Reporters Jessica Goodfellow and Ronan Shields spoke with Google’s Madhav Chinnappa to discuss the Digital News Initiative (DNI), which was established in 2015. The initiative, a project on which Google is working with European news publishers, aims to leverage technology in support of good journalism. As it turns out, Wikipedia’s process suggests an approach; having discussed the “collaborative content” model with Chinnappa, the journalists write:

To this point, he also discusses DNI’s support of Wikitribune, asserting that it and Wikipedia are ‘absolutely incredible and misunderstood,’ pointing out the diligence that goes into its editing and review process, despite its decentralized means of doing so. The Wikitribune project tries to take some of this spirit of Wikipedia and apply this to news, adds Chinnappa. He further explains that [Wikipedia & Wikitribune] founder Jimmy Wales’ opinion is that the mainstream model of professional online publishing, whereby the ‘journalist writes the article and you’ve got a comment section at the bottom and it’s filled with crazy people saying crazy things,’ is flawed. He [Wales] believes that’s not a healthy model. ‘What Wikitribune wants to do is actually have a more rounded model where you have the professional journalist and then you have people contributing as well and there’s a more open and even dialogue around that,’ he adds. ‘If it succeeds? I don’t know. But I think it’s about enabling experimentation and I think that’s going to be a really interesting one.’

Yes, experimentation is important to the DNI’s approach. Chinnappa believes technical tools will be key to verifying content accuracy. He also sees a reason to be hopeful about the future of journalism—amid fears that technology will eventually replace reporters, he suggests such tools, instead, will free journalists from the time-consuming task of checking facts. Perhaps; but will they work to stem the tide of false propaganda?

Cynthia Murrell, September 22, 2017

Twitch Incorporates ClipMine Discovery Tools

September 18, 2017

Gameplay-streaming site Twitch has adapted the platform of its acquisition ClipMine, originally developed for adding annotations to online videos, into a metadata generator for its users. (Twitch is owned by Amazon.) TechCrunch reports the development in, “Twitch Acquired Video Indexing Platform ClipMine to Power New Discovery Features.” Writer Sarah Perez tells us:

The startup’s technology is now being put to use to translate visual information in videos – like objects, text, logos and scenes – into metadata that can help people more easily find the streams they want to watch. Launched back in 2015, ClipMine had originally introduced a platform designed for crowdsourced tagging and annotations. The idea then was to offer a technology that could sit over top videos on the web – like those on YouTube, Vimeo or DailyMotion – that allowed users to add their own annotations. This, in turn, would help other viewers find the part of the video they wanted to watch, while also helping video publishers learn more about which sections were getting clicked on the most.

Based in Palo Alto, ClipMine went on to make indexing tools for the e-sports field and to incorporate computer vision and machine learning into its work. The platform’s ability to identify content within videos caught Twitch’s eye; Perez explains:

Traditionally, online video content is indexed much like the web – using metadata like titles, tags, descriptions, and captions. But Twitch’s streams are live, and don’t have as much metadata to index. That’s where a technology like ClipMine can help. Streamers don’t have to do anything differently than usual to have their videos indexed, instead, ClipMine will analyze and categorize the content in real-time.
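A rough sketch of the idea Perez describes follows; it is not ClipMine’s actual code. The computer-vision detector is stubbed out with canned labels, but the flow is the same: labels detected in frames become index terms that point back to the stream and the moment they occurred:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def detect_labels(frame_id: int) -> List[str]:
    """Stand-in for computer-vision models run on one video frame."""
    fake_detections = {
        0: ["overwatch", "hero-select"],
        1: ["overwatch", "team-fight"],
    }
    return fake_detections.get(frame_id, [])

def index_stream(stream_id: str,
                 frame_ids: List[int],
                 index: Dict[str, Set[Tuple[str, int]]]) -> None:
    """Turn per-frame detections into index terms; the streamer does nothing extra."""
    for frame_id in frame_ids:
        for label in detect_labels(frame_id):
            # Each label points back to the stream and the moment it occurred,
            # so viewers can find the part of the video they want to watch.
            index[label].add((stream_id, frame_id))

index: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
index_stream("streamer42", [0, 1], index)
print(index["team-fight"])  # {('streamer42', 1)}
```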

ClipMine’s technology has already been incorporated into stream-discovery tools for two games from Blizzard Entertainment, “Overwatch” and “Hearthstone;” see the article for more specifics on how and why. Through its blog, Twitch indicates that more innovations are on the way.

Cynthia Murrell, September 18, 2017

A New and Improved Content Delivery System

September 7, 2017

Personalized content and delivery is the name of the game in PRWEB’s “Flatirons Solutions Launches XML DITA Dynamic Content Delivery Solutions.” Flatirons Solutions, a leading XML-based publishing and content management company, recently released its Dynamic Content Delivery Solution. The solution uses XML-based technology to deliver more personalized content to enterprises, and it is advertised as reducing publishing and support costs. The new solution is built with the MarkLogic Server.

By partnering with Mark Logic and incorporating their industry-leading XML content server, the solution conducts powerful queries, indexing, and personalization against large collections of DITA topics. For our clients, this provides immediate access to relevant information, while producing cost savings in technical support, and in content production, maintenance, review and publishing. So whether they are producing sales, marketing, technical, training or help documentation, clients can step up to a new level of content delivery while simultaneously improving their bottom line.

The Dynamic Content Delivery Solution is designed for government agencies and enterprises that publish XML content to various platforms and formats. MarkLogic is touted as a powerful tool to pool content from different sources, repurpose it, and deliver it to different channels.
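As a hedged sketch of what “personalization against large collections of DITA topics” can mean in practice (this is illustrative, not Flatirons’ or MarkLogic’s implementation), one can filter topics by the audience metadata DITA already supports and assemble only the matching ones for delivery:

```python
import xml.etree.ElementTree as ET

# A minimal stand-in for a real DITA topic file.
TOPIC_XML = """
<topic id="install-widget">
  <title>Installing the Widget</title>
  <prolog><metadata><audience type="administrator"/></metadata></prolog>
  <body><p>Run the installer as an administrator.</p></body>
</topic>
"""

def audience_of(topic: ET.Element) -> str:
    """Read the DITA audience tag; untagged topics count as general."""
    node = topic.find("./prolog/metadata/audience")
    return node.get("type", "general") if node is not None else "general"

def select_topics(topics, user_audience: str):
    """Keep only topics aimed at this user's audience (or untagged ones)."""
    return [t for t in topics
            if audience_of(t) in (user_audience, "general")]

topics = [ET.fromstring(TOPIC_XML)]
for topic in select_topics(topics, "administrator"):
    print(topic.findtext("title"))  # Installing the Widget
```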

MarkLogic finds success in its core use case: slicing and dicing for publishing.  It is back to the basics for them.

Whitney Grace, September 7, 2017


Factoids about Toutiao: Smart News Filtering Service

August 28, 2017

The filtering service Toutiao is operated by Bytedance. The company attracted attention because it is generating money (allegedly) and has lots of users, or “daily active users,” in the 120 million range. (If you are acronym minded, the daily active user count is a DAU. Holy Dau!)

Forget Google’s “translate this page” for Toutiao; the service is blind to Toutiao’s content. A workaround is to cut and paste snippets into FreeTranslations.org or to ask someone who reads Chinese to explain what’s on Toutiao’s pages.

Other items of interest include these factoids. (Oh, the notes in parentheses point to the source of each factoid.)

    • $900 million in revenue (allegedly) (Wall Street Journal, August 28, 2017, with a paywall for your delectation)
    • Funding of $3 billion (Crunchbase)
    • Valuation of $20 billion or more (Reuters)
    • Toutiao means “headlines” (Wikipedia)
    • What it does, from Wikipedia:

Toutiao uses algorithms to select different quality content for individual users. It has created algorithmic models that understand information (text, images, videos, comments, etc.) in depth, and developed large-scale machine learning systems for personalized recommendation that surface content users have not necessarily signaled preference for yet. Using Natural Language Processing and Computer Vision technologies in AI, Toutiao extracts hundreds of entities and keywords as features from each piece of content. When a user first opens the app, Toutiao makes a preliminary recommendation based on the operating system of his mobile device, his location, and other factors. With users’ interactions with the app, Toutiao fine-tunes its models and makes better recommendations.

  • Founded by Zhang Yiming, age 34, in 2012 (Reuters)
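A toy version of the recommendation loop the Wikipedia passage describes might look like the following; all features, weights, and numbers are invented for illustration. The cold start leans on coarse signals like device and location, and each interaction nudges the profile:

```python
from collections import defaultdict
from typing import Dict, List

def cold_start_profile(os_name: str, location: str) -> Dict[str, float]:
    """Preliminary interest guesses before the user has done anything."""
    profile: Dict[str, float] = defaultdict(float)
    profile[f"os:{os_name}"] = 0.2
    profile[f"loc:{location}"] = 0.3
    return profile

def update_profile(profile: Dict[str, float],
                   item_features: List[str],
                   clicked: bool,
                   lr: float = 0.1) -> None:
    """Fine-tune: nudge weights toward features of items the user engages with."""
    delta = lr if clicked else -lr
    for feature in item_features:
        profile[feature] = profile.get(feature, 0.0) + delta

def score(profile: Dict[str, float], item_features: List[str]) -> float:
    """Rank a candidate item by summing the user's feature weights."""
    return sum(profile.get(f, 0.0) for f in item_features)

profile = cold_start_profile("android", "beijing")
update_profile(profile, ["topic:basketball", "format:video"], clicked=True)
update_profile(profile, ["topic:finance"], clicked=False)
print(score(profile, ["topic:basketball", "format:video"]))  # 0.2
```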

Technode’s “Why Is Toutiao, a News App, Setting Off Alarm Bells for China’s Giants?” suggests that Toutiao may be the next big Chinese online success. The reason is that the service aggregates “news” from disparate content sources; for example, text, video, images, and data.

Toutiao may be the next big thing in algorithmic, mobile-centric information access solutions. The company generates revenue from online ads. The company’s secret sauce includes smart software plus some extra ingredients:

  • Social functions
  • Search
  • Video
  • User generated “original” content
  • Global plans.

Net net: Worth watching.

Stephen E Arnold, August 28, 2017

Smartlogic: A Buzzword Blizzard

August 2, 2017

I read “Semantic Enhancement Server.” Interesting stuff. The technology struck me as a cross between indexing, good old enterprise search, and assorted technologies. Individuals who are shopping for an automatic indexing system (either one with expensive, time-consuming hand-coded rules or a more Autonomy-like automatic approach) will want to kick the tires of the Smartlogic system. In addition to the echoes of the SchemaLogic approach, I noted a Thompson submachine gun firing buzzwords; for example:

best bets (I’m feeling lucky?)
dynamic summaries (like Island Software’s approach in the 1990s)
faceted search (hello, Endeca?)
model
navigator (like the Siderean “navigator”?)
real time
related topics (clustering like Vivisimo’s)
semantic (of course)
taxonomy
topic maps
topic pages (a Google report as described in US29970198481)
topic path browser (aka breadcrumbs?)
visualization

What struck me after I compiled this list about a system that “drives exceptional user search experiences” was that Smartlogic is repeating the marketing approach of traditional vendors of enterprise search. The marketing lingo and “one size fits all” triggered thoughts of Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others.

I asked myself:

Is it possible for one company’s software to perform such a remarkable array of functions in a way that is easy to implement, affordable, and scalable? There are industrial strength systems which perform many of these functions. Examples range from BAE’s intelligence system to the Palantir Gotham platform.

My hypothesis is that Smartlogic might struggle to process a real-time flow of WhatsApp messages, YouTube content, and intercepted mobile phone voice calls. Toss in the multi-language content which is becoming increasingly important to enterprises, and the notional balloon I am floating says, “Generating buzzwords and associated over-inflated expectations is really easy. Delivering high-accuracy, affordable, and scalable content processing is a bit more difficult.”

Perhaps Smartlogic has cracked the content processing equivalent of the Voynich manuscript.


Will buzzwords crack the Voynich manuscript’s inscrutable text? What if Voynich is a fake? How will modern content processing systems deal with this type of content? Running some content processing tests might provide some insight into systems which possess Watson-esque capabilities.

What happened to those vendors like Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others? (Free profiles of these companies are available at www.xenky.com/vendor-profiles.) Oh, that’s right. The reality of the marketplace did not match the companies’ assertions about technology. Investors and licensees of some of these systems were able to survive the buzzword blizzard. Some became the digital equivalent of Ötzi, the 5,300-year-old iceman.

Stephen E Arnold, August 2, 2017
