Google Search and Hot News: Sensitivity and Relevance

November 10, 2017

I read “Google Is Surfacing Texas Shooter Misinformation in Search Results — Thanks Also to Twitter.” What struck me was the headline; specifically, its implication that Google is not merely responding to user queries but actively “surfacing,” that is, fetching and displaying, information about the event. Twitter is also involved. I don’t think of Twitter as much more than a party line. One can look up keywords or watch a stream of content containing a keyword or, to use Twitter speak, a “hashtag.”

The write up explains:

Users of Google’s search engine who conduct internet searches for queries such as “who is Devin Patrick Kelley?” — or just do a simple search for his name — can be exposed to tweets claiming the shooter was a Muslim convert; or a member of Antifa; or a Democrat supporter…

I think I understand. A user inputs a term and Google’s system matches the user’s query to the content in the Google index. Google maintains many indexes, despite its assertion that it is a “universal search engine.” One has to search across different Google services and their indexes to build up a mosaic of what Google has indexed about a topic; for example, blogs, news, the general index, maps, finance, etc.

Developing a composite view of what Google has indexed takes time and patience. The results may vary depending on whether the user is logged in, is searching from a particular geographic location, or has enabled or disabled certain behind-the-scenes functions of the Google system.
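To make the mosaic idea concrete, here is a toy sketch. It is definitely not Google’s code; the index names, documents, and the location boost are my inventions. The sketch shows one query running against several vertical indexes, with a context signal nudging the order of what comes back:

```python
from typing import Dict, List, Tuple

# Hypothetical vertical indexes; a real system would have many more.
INDEXES: Dict[str, List[dict]] = {
    "news":  [{"title": "Texas shooting coverage", "geo": "us"}],
    "blogs": [{"title": "Texas BBQ ranking", "geo": "us"}],
    "maps":  [{"title": "Texas road map", "geo": "us"}],
}

def composite_search(query: str, user_geo: str) -> List[Tuple[float, str, str]]:
    """Run one query across several vertical indexes and merge the hits."""
    results = []
    for index_name, docs in INDEXES.items():
        for doc in docs:
            if query.lower() in doc["title"].lower():
                score = 1.0
                if doc["geo"] == user_geo:   # context (location, login state)
                    score += 0.5             # can reorder what comes back
                results.append((score, index_name, doc["title"]))
    return sorted(results, reverse=True)

for hit in composite_search("texas", user_geo="us"):
    print(hit)
```

The point of the exercise: each vertical index answers on its own, and the searcher (or the researcher) has to stitch the answers together.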

The write up contains this statement:

Safe to say, the algorithmic architecture that underpins so much of the content internet users are exposed to via tech giants’ mega platforms continues to enable lies to run far faster than truth online by favoring flaming nonsense (and/or flagrant calumny) over more robustly sourced information.

From my point of view, the ability to figure out what influences Google’s search results requires significant effort, numerous test queries, and recognition that Google search now balances on two pogo sticks. One “pogo stick” is blunt-force keyword search. When content is indexed, terms are plucked from source documents. The system may or may not assign additional index terms to the document; for example, geographic or time stamps.

The other “pogo stick” is discovery and assignment of metadata. I have explained some of the optional tags which Google may or may not include when processing a content object; for example, see the work of Dr. Alon Halevy and Dr. Ramanathan Guha.
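A minimal sketch of the two pogo sticks working together appears below. The field names and tagging scheme are my invention, not Google’s; the point is that plucked terms and assigned metadata both land in the same inverted index:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

@dataclass
class ContentObject:
    doc_id: str
    text: str
    geo: Optional[str] = None        # metadata the system may (or may not) assign
    timestamp: Optional[str] = None

def build_index(docs: List[ContentObject]) -> Dict[str, Set[str]]:
    postings: Dict[str, Set[str]] = defaultdict(set)
    for doc in docs:
        # Pogo stick 1: pluck terms straight from the source document.
        for term in doc.text.lower().split():
            postings[term].add(doc.doc_id)
        # Pogo stick 2: fold optionally assigned metadata into the index.
        if doc.geo:
            postings[f"geo:{doc.geo}"].add(doc.doc_id)
        if doc.timestamp:
            postings[f"date:{doc.timestamp}"].add(doc.doc_id)
    return postings

index = build_index([
    ContentObject("d1", "church shooting in Texas", geo="us-tx", timestamp="2017-11-05"),
    ContentObject("d2", "Texas weather report", geo="us-tx"),
])
print(index["texas"])      # {'d1', 'd2'} via plucked terms
print(index["geo:us-tx"])  # both documents via assigned metadata
```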

But Google, like other smart content processing systems today, has a certain sensitivity. The streams of content it processes may contain certain keywords which act as triggers for the system.

When a “news” event takes place, the flood of content allows smart indexing systems to identify a “hot topic.” The test queries we ran for my monographs “The Google Legacy” and “Google Version 2.0” suggest that Google is sensitive to certain “triggers” in content. Feedback can be useful; it can also cause smart software to wobble a bit.
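For illustration only, here is a toy hot-topic detector. The thresholds are arbitrary and real systems are far more elaborate, but the sketch shows how a burst of content containing a trigger term gets flagged, and why a comparatively small input can move the needle:

```python
from collections import Counter
from typing import Iterable, List

def hot_topics(current_window: Iterable[str],
               baseline: Counter,
               spike_factor: float = 5.0,
               min_count: int = 3) -> List[str]:
    """Flag terms whose frequency spikes far above their historical baseline."""
    counts = Counter(current_window)
    trending = []
    for term, count in counts.items():
        expected = baseline.get(term, 0.5)  # smooth terms never seen before
        if count >= min_count and count / expected >= spike_factor:
            trending.append(term)
    return trending

baseline = Counter({"texas": 2, "weather": 40})
stream = ["texas", "shooter", "texas", "shooter", "shooter", "weather"]
print(hot_topics(stream, baseline))  # ['shooter']; a small input moves the needle
```

Note that a rare term needs only a handful of occurrences to trip the detector. That is the sensitivity; it is also the wobble.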

[Image: a T shirt bearing the slogan “the impossible takes a little longer”]

T shirts are easy; search is hard.

I believe that the challenge Google faces is similar to the problem Bing and Yandex are wrestling with as well; that is, certain numerical recipes can overreact to certain inputs. These overreactions may increase the difficulty of determining which content object is “correct,” “factual,” or “verifiable.”

Expecting a free search system, regardless of its owner, to know what’s true and what’s false is understandable. In my opinion, making this type of determination with today’s technology, system limitations, and content analysis methods is impossible.

In short, the burden of figuring out what’s right and what’s not falls on the user, not exclusively on the search engine. Users, on the other hand, may not want “objective” reality. Search vendors want traffic and want to generate revenue. Algorithms want nothing.

Mix these three elements and one takes a step closer to understanding that search and retrieval is not the slam dunk some folks would have me believe. In fact, the sensitivity of content processing systems to comparatively small inputs requires more discussion. Perhaps that type of information will come out of discussions about how best to deal with fake news and related topics in the context of today’s information retrieval environment.

Free search? Think about that too.

Stephen E Arnold, November 10, 2017

A Clever Take on Google and Fake News

November 8, 2017

I noted this story in the UK online publication The Register: “Google on Flooding the Internet with Fake News: Leave Us Alone. We’re Trying Really Hard. Sob.” The write up points out:

Google has responded in greater depth after it actively promoted fake news about Sunday’s Texas murder-suicide gunman by… behaving like a spoilt kid.

The Google response, as presented in the write up, warranted a yellow circle from my trusty highlighter. The Register said:

Having had time to reflect on the issue, the Silicon Valley monster’s “public liaison for search” and former Search Engine Land blog editor Danny Sullivan gave a more, um, considered response in a series of tweets. “Bottom line: we want to show authoritative information. Much internal talk yesterday on how to improve tweets in search; more will happen,” he promised, before noting that the completely bogus information had only appeared “briefly.”


The Register story includes other gems from the search engine optimization expert who seems to thrive on precision and relevance for content unrelated to a user’s query; for example, the article presents some “quotes” from Mr. Sullivan, the expert in charge of explaining the hows and whys of fake news:

  • “Early changes put in place after Las Vegas shootings seemed to help with Texas. Incorrect rumors about some suspects didn’t get in…”
  • “Right now, we haven’t made any immediate decisions. We’ll be taking some time to test changes and have more discussions.”
  • “Not just talk. Google made changes to Top Stories and is still improving those. We’ll do same with tweets. We want to get this right.”

Yep, Google wants to do better. Now Google wants to get “this” right. Okay. After 20 years, dealing with fake content, spoofs, and algorithmic vulnerability is on the to-do list. That’s encouraging.

For more Google explanations, check out the Register’s story and follow the logic of the SEO wizard who now has to explain fake news creeping—well, more like flowing—into Google’s search and news content.

Does an inability to deal with fake news hint at truthiness challenges at Googzilla’s money machine? Interesting question from my point of view.

Stephen E Arnold, November 8, 2017

SEO Benefits Take Time to Realize

October 30, 2017

In many (most?) fields today, it is considered essential for companies to position themselves as close to the top of potential customers’ Web search results as possible. However, search engine optimization (SEO) efforts take time. Business 2 Community explains “Why It Takes Six Months to Improve Search Rankings.”  Marketers must accept that, unless they luck out with content that goes viral, they will just have to be patient for results. Writer Kent Campbell explains five reasons this is the case, and my favorite is number one—search systems were not built to aid marketers in the first place! In fact, in some ways, quite the opposite. Campbell writes:

Bing and Google Serve Their Searchers, Not You.

A search provider’s primary concern is its users, not you or any other business that’s fighting for a spot on the first page. The search engine’s goal is to provide the best user experience to its searchers; that means displaying the most relevant and high quality results for every search query. Both Bing and Google watch how people react to content before they decide how visible that content should be in results. Even when content has had a lot of SEO therapy, the content itself has to be spot-on. This is why Google evaluates every piece of content on more than 200 ranking factors and ensures that only the best quality pages make it to the top 10. The best way to make it to the first page is by aligning yourself with Google’s objective, which is to serve its users.
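To see why searcher reactions take time to move rankings, consider a cartoon version of the process: a ranking score computed as a weighted sum of many signals. Google’s actual factors and weights are not public; the names and numbers below are invented for illustration. Engagement signals start near zero for new content and climb only as users interact with it, which is one reason rankings improve slowly:

```python
# Invented signals and weights; Google's real ~200 factors are not public.
SIGNAL_WEIGHTS = {
    "query_term_match": 3.0,
    "inbound_links":    2.0,
    "content_quality":  2.5,
    "user_engagement":  1.5,   # how searchers react to the result over time
    "page_speed":       0.5,
}

def rank_score(signals: dict) -> float:
    """Combine per-page signal values (0.0 to 1.0) into a single score."""
    return sum(SIGNAL_WEIGHTS[name] * signals.get(name, 0.0)
               for name in SIGNAL_WEIGHTS)

# A new page may match the query well yet lack links and engagement history.
new_page = {"query_term_match": 0.9, "content_quality": 0.8}
aged_page = {"query_term_match": 0.7, "inbound_links": 0.9,
             "content_quality": 0.6, "user_engagement": 0.8}
print(rank_score(new_page))   # 4.7
print(rank_score(aged_page))  # 6.6
```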

A company might be seeing slow results because it hesitated (Early Movers Have an Advantage is the second reason Campbell gives). On the other hand, at number three, we find that Creating Quality Content Takes Time. Then there is the fact that Link Building Is Not as Simple as Before. Finally, there is a more recent complication: Social Media Also Impacts Rankings these days. See the article for Campbell’s explanation of each point. He concludes with a little advice: companies would do well to consider their SEO efforts an ongoing cost of doing business, rather than an extraordinary item.

Cynthia Murrell, October 30, 2017

Facebook and Publishing

October 23, 2017

Print publishing has slowly been circling the drain as revenue drops (the rate depends on what type of publishing you are in). Some publishers have tried going totally digital, hoping that online subscriptions and ads would pay the bills, but Google and Facebook are siphoning off that revenue source. The Next Web shares more about how publishers are struggling in the article, “Publishers Need To Learn From Mega Platforms Like Facebook.”

Like many smart companies, publishers have joined social media platforms and hoped to build their brand image on them. Publishers, however, have learned that Facebook and other social media platforms keep changing their requirements. The article compares the situation to Darwinian survival of the fittest. The publishing companies with deep pockets are surviving through investment and smart digital upgrades.

Jeff Bezos is used as an example because he has turned video streaming into one of Amazon’s main profit generators. The suggestion is that publishers follow suit with video and then live video streams. The comments sections in these videos create an ongoing dialogue with viewers (while at the same time giving trolls a venue). It turns out that commoditized content on social media is not the way to go.

Apparently, publishers need to concentrate instead on building their own platforms:

This is the perfect time for publishers to take control of their platforms and the video streams that will drive the next phase of the digital content revolution. With advances in live video programming and the speed with which original content can be created, publishers can greatly enhance what they already do and know, and monetize it through changes in advertising models that fuel online media platforms as well as live-streaming video platforms.

The Internet is more than video, however. Podcasts and articles are still viable content too. It might be time to rethink your career if you are a social media manager.

Whitney Grace, October 23, 2017

Brief Configuration Error by Google Triggers Japanese Investigation

October 12, 2017

When a tech giant makes even a small mistake, consequences can be significant. A brief write-up from the BBC, “Google Error Disrupts Corporate Japan’s Web Traffic,”  highlights this lamentable fact. We learn:

Google has admitted that wide-spread connectivity issues in Japan were the result of a mistake by the tech giant. Web traffic intended for Japanese internet service providers was being sent to Google instead.

Online banking, railway payment systems as well as gaming sites were among those affected.

A spokesman said a ‘network configuration error’ only lasted for eight minutes on Friday but it took hours for some services to resume. Nintendo was among the companies who reported poor connectivity, according to the Japan Times, as well as the East Japan Railway Company.

All of that content—financial transactions included—was gone for good, since Google cannot transmit to third-party networks, according to an industry expert cited in the post. Essentially, it seems that for those few minutes, Google accidentally hijacked all traffic to NTT Communications Corp, which boasts over 50 million customers in Japan. The country’s Ministry of Internal Affairs and Communications is investigating the incident.

Cynthia Murrell, October 12, 2017

Google-Publishers Partnership Chases True News

September 22, 2017

It appears as though Google is taking the issue of false information, and perhaps even their role in its perpetuation, seriously; The Drum reveals, “Google Says it Wants to Fund the News, Not Fake It.” Reporters Jessica Goodfellow and Ronan Shields spoke with Google’s Madhav Chinnappa to discuss the Digital News Initiative (DNI), which was established in 2015. The initiative, a project on which Google is working with European news publishers, aims to leverage technology in support of good journalism. As it turns out, Wikipedia’s process suggests an approach; having discussed the “collaborative content” model with Chinnappa, the journalists write:

To this point, he also discusses DNI’s support of Wikitribune, asserting that it and Wikipedia are ‘absolutely incredible and misunderstood,’ pointing out the diligence that goes into its editing and review process, despite its decentralized means of doing so. The Wikitribune project tries to take some of this spirit of Wikipedia and apply this to news, adds Chinnappa. He further explains that [Wikipedia & Wikitribune] founder Jimmy Wales’ opinion is that the mainstream model of professional online publishing, whereby the ‘journalist writes the article and you’ve got a comment section at the bottom and it’s filled with crazy people saying crazy things,’ is flawed. He [Wales] believes that’s not a healthy model. ‘What Wikitribune wants to do is actually have a more rounded model where you have the professional journalist and then you have people contributing as well and there’s a more open and even dialogue around that,’ he adds. ‘If it succeeds? I don’t know. But I think it’s about enabling experimentation and I think that’s going to be a really interesting one.’

Yes, experimentation is important to the DNI’s approach. Chinnappa believes technical tools will be key to verifying content accuracy. He also sees a reason to be hopeful about the future of journalism—amid fears that technology will eventually replace reporters, he suggests such tools, instead, will free journalists from the time-consuming task of checking facts. Perhaps; but will they work to stem the tide of false propaganda?

Cynthia Murrell, September 22, 2017

Twitch Incorporates ClipMine Discovery Tools

September 18, 2017

Gameplay-streaming site Twitch has adapted the platform of its acquisition ClipMine, originally developed for adding annotations to online videos, into a metadata generator for its users. (Twitch is owned by Amazon.) TechCrunch reports the development in, “Twitch Acquired Video Indexing Platform ClipMine to Power New Discovery Features.” Writer Sarah Perez tells us:

The startup’s technology is now being put to use to translate visual information in videos – like objects, text, logos and scenes – into metadata that can help people more easily find the streams they want to watch. Launched back in 2015, ClipMine had originally introduced a platform designed for crowdsourced tagging and annotations. The idea then was to offer a technology that could sit over top videos on the web – like those on YouTube, Vimeo or DailyMotion – that allowed users to add their own annotations. This, in turn, would help other viewers find the part of the video they wanted to watch, while also helping video publishers learn more about which sections were getting clicked on the most.

Based in Palo Alto, ClipMine went on to make indexing tools for the e-sports field and to incorporate computer vision and machine learning into its work. The platform’s ability to identify content within videos caught Twitch’s eye; Perez explains:

Traditionally, online video content is indexed much like the web – using metadata like titles, tags, descriptions, and captions. But Twitch’s streams are live, and don’t have as much metadata to index. That’s where a technology like ClipMine can help. Streamers don’t have to do anything differently than usual to have their videos indexed, instead, ClipMine will analyze and categorize the content in real-time.
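A rough sketch of the idea Perez describes follows; it is not ClipMine’s actual code. The computer-vision detector is stubbed out with canned labels, but the flow is the same: labels detected in frames become index terms that point back to the stream and the moment they occurred:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def detect_labels(frame_id: int) -> List[str]:
    """Stand-in for computer-vision models run on one video frame."""
    fake_detections = {
        0: ["overwatch", "hero-select"],
        1: ["overwatch", "team-fight"],
    }
    return fake_detections.get(frame_id, [])

def index_stream(stream_id: str,
                 frame_ids: List[int],
                 index: Dict[str, Set[Tuple[str, int]]]) -> None:
    """Turn per-frame detections into index terms; the streamer does nothing extra."""
    for frame_id in frame_ids:
        for label in detect_labels(frame_id):
            # Each label points back to the stream and the moment it occurred,
            # so viewers can find the part of the video they want to watch.
            index[label].add((stream_id, frame_id))

index: Dict[str, Set[Tuple[str, int]]] = defaultdict(set)
index_stream("streamer42", [0, 1], index)
print(index["team-fight"])  # {('streamer42', 1)}
```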

ClipMine’s technology has already been incorporated into stream-discovery tools for two games from Blizzard Entertainment, “Overwatch” and “Hearthstone;” see the article for more specifics on how and why. Through its blog, Twitch indicates that more innovations are on the way.

Cynthia Murrell, September 18, 2017

A New and Improved Content Delivery System

September 7, 2017

Personalized content and delivery is the name of the game in PRWEB’s “Flatirons Solutions Launches XML DITA Dynamic Content Delivery Solutions.” Flatirons Solutions, a leading XML-based publishing and content management company, recently released its Dynamic Content Delivery Solution. The solution uses XML-based technology to deliver more personalized content to enterprises, and it is advertised as reducing publishing and support costs. The new solution is built with the MarkLogic Server.

By partnering with Mark Logic and incorporating their industry-leading XML content server, the solution conducts powerful queries, indexing, and personalization against large collections of DITA topics. For our clients, this provides immediate access to relevant information, while producing cost savings in technical support, and in content production, maintenance, review and publishing. So whether they are producing sales, marketing, technical, training or help documentation, clients can step up to a new level of content delivery while simultaneously improving their bottom line.

The Dynamic Content Delivery Solution is designed for government agencies and enterprises that publish XML content to various platforms and formats. MarkLogic is touted as a powerful tool to pool content from different sources, repurpose it, and deliver it to different channels.
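As a hedged sketch of what “personalization against large collections of DITA topics” can mean in practice (this is illustrative, not Flatirons’ or MarkLogic’s implementation), one can filter topics by the audience metadata DITA already supports and assemble only the matching ones for delivery:

```python
import xml.etree.ElementTree as ET

# A minimal stand-in for a real DITA topic file.
TOPIC_XML = """
<topic id="install-widget">
  <title>Installing the Widget</title>
  <prolog><metadata><audience type="administrator"/></metadata></prolog>
  <body><p>Run the installer as an administrator.</p></body>
</topic>
"""

def audience_of(topic: ET.Element) -> str:
    """Read the DITA audience tag; untagged topics count as general."""
    node = topic.find("./prolog/metadata/audience")
    return node.get("type", "general") if node is not None else "general"

def select_topics(topics, user_audience: str):
    """Keep only topics aimed at this user's audience (or untagged ones)."""
    return [t for t in topics
            if audience_of(t) in (user_audience, "general")]

topics = [ET.fromstring(TOPIC_XML)]
for topic in select_topics(topics, "administrator"):
    print(topic.findtext("title"))  # Installing the Widget
```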

MarkLogic finds success in its core use case: slicing and dicing for publishing.  It is back to the basics for them.

Whitney Grace, September 7, 2017


Factoids about Toutiao: Smart News Filtering Service

August 28, 2017

The filtering service Toutiao is operated by Bytedance. The company attracted attention because it is generating money (allegedly) and has lots of users, or “daily active users,” in the 120 million range. (If you are acronym minded, the daily active user count is a DAU. Holy Dau!)

Forget Google’s “translate this page” for Toutiao; the service is blind to Toutiao’s content. A workaround is to cut and paste snippets into FreeTranslations.org or to ask someone who reads Chinese to explain what’s on Toutiao’s pages.

Other items of interest include these factoids. (Oh, the notes in parentheses point to the source of each factoid.)

    • $900 million in revenue (allegedly) (Wall Street Journal, August 28, 2017, with a paywall for your delectation)
    • Funding of $3 billion (Crunchbase)
    • Valuation of $20 billion or more (Reuters)
    • Toutiao means “headlines” (Wikipedia)
    • What it does, from Wikipedia:

Toutiao uses algorithms to select different quality content for individual users. It has created algorithmic models that understand information (text, images, videos, comments, etc.) in depth, and developed large-scale machine learning systems for personalized recommendation that surface content users have not necessarily signaled preference for yet. Using Natural Language Processing and Computer Vision technologies in AI, Toutiao extracts hundreds of entities and keywords as features from each piece of content. When a user first opens the app, Toutiao makes a preliminary recommendation based on the operating system of his mobile device, his location, and other factors. With users’ interactions with the app, Toutiao fine-tunes its models and makes better recommendations.

  • Founded by Zhang Yiming, age 34, in 2012 (Reuters)
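A toy version of the recommendation loop the Wikipedia passage describes might look like the following; all features, weights, and numbers are invented for illustration. The cold start leans on coarse signals like device and location, and each interaction nudges the profile:

```python
from collections import defaultdict
from typing import Dict, List

def cold_start_profile(os_name: str, location: str) -> Dict[str, float]:
    """Preliminary interest guesses before the user has done anything."""
    profile: Dict[str, float] = defaultdict(float)
    profile[f"os:{os_name}"] = 0.2
    profile[f"loc:{location}"] = 0.3
    return profile

def update_profile(profile: Dict[str, float],
                   item_features: List[str],
                   clicked: bool,
                   lr: float = 0.1) -> None:
    """Fine-tune: nudge weights toward features of items the user engages with."""
    delta = lr if clicked else -lr
    for feature in item_features:
        profile[feature] = profile.get(feature, 0.0) + delta

def score(profile: Dict[str, float], item_features: List[str]) -> float:
    """Rank a candidate item by summing the user's feature weights."""
    return sum(profile.get(f, 0.0) for f in item_features)

profile = cold_start_profile("android", "beijing")
update_profile(profile, ["topic:basketball", "format:video"], clicked=True)
update_profile(profile, ["topic:finance"], clicked=False)
print(score(profile, ["topic:basketball", "format:video"]))  # 0.2
```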

Technode’s “Why Is Toutiao, a News App, Setting Off Alarm Bells for China’s Giants?” suggests that Toutiao may be the next big Chinese online success. The reason is that the service aggregates “news” from disparate content sources; for example, text, video, images, and data.

Toutiao may be the next big thing in algorithmic, mobile-centric information access solutions. The company generates revenue from online ads. The company’s secret sauce includes smart software plus some extra ingredients:

  • Social functions
  • Search
  • Video
  • User generated “original” content
  • Global plans.

Net net: Worth watching.

Stephen E Arnold, August 28, 2017

Smartlogic: A Buzzword Blizzard

August 2, 2017

I read “Semantic Enhancement Server.” Interesting stuff. The technology struck me as a cross between indexing, good old enterprise search, and assorted technologies. Individuals who are shopping for an automatic indexing system (either one with expensive, time-consuming hand-coded rules or a more Autonomy-like automatic approach) will want to kick the tires of the Smartlogic system. In addition to the echoes of the SchemaLogic approach, I noted a Thompson submachine gun firing buzzwords; for example:

best bets (I’m feeling lucky?)
dynamic summaries (like Island Software’s approach in the 1990s)
faceted search (hello, Endeca?)
model
navigator (like the Siderean “navigator”?)
real time
related topics (clustering like Vivisimo’s)
semantic (of course)
taxonomy
topic maps
topic pages (a Google report as described in US29970198481)
topic path browser (aka breadcrumbs?)
visualization

What struck me after I compiled this list about a system that “drives exceptional user search experiences” was that Smartlogic is repeating the marketing approach of traditional vendors of enterprise search. The marketing lingo and “one size fits all” triggered thoughts of Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others.

I asked myself:

Is it possible for one company’s software to perform such a remarkable array of functions in a way that is easy to implement, affordable, and scalable? There are industrial strength systems which perform many of these functions. Examples range from BAE’s intelligence system to the Palantir Gotham platform.

My hypothesis is that Smartlogic might struggle to process a real-time flow of WhatsApp messages, YouTube content, and intercepted mobile phone voice calls. Toss in the multi-language content which is becoming increasingly important to enterprises, and the notional balloon I am floating says, “Generating buzzwords and associated over-inflated expectations is really easy. Delivering high-accuracy, affordable, and scalable content processing is a bit more difficult.”

Perhaps Smartlogic has cracked the content processing equivalent of the Voynich manuscript.


Will buzzwords crack the Voynich manuscript’s inscrutable text? What if Voynich is a fake? How will modern content processing systems deal with this type of content? Running some content processing tests might provide some insight into systems which possess Watson-esque capabilities.

What happened to those vendors like Convera, Delphes, Entopia, Fast Search & Transfer, and Siderean Software, among others? (Free profiles of these companies are available at www.xenky.com/vendor-profiles.) Oh, that’s right. The reality of the marketplace did not match the companies’ assertions about technology. Investors and licensees of some of these systems were able to survive the buzzword blizzard. Some became the digital equivalent of Ötzi, the 5,300-year-old iceman.

Stephen E Arnold, August 2, 2017
