Facebook: The Fallacy of Rules in an Open Ended Datasphere

December 29, 2018

I read “Inside Facebook’s Secret Rulebook for Global Political Speech.” Yogi Berra time: “It’s déjà vu all over again.”

Some history, gentle reader.

Years ago I met with a text analytics company. The purpose of the meeting was to discuss how to identify problematic content; for example, falsified reports related to a warfighting location.

I listened as the 20-somethings and a couple of MBA types bandied about ideas for creating a set of rules that would identify the ways in which information is falsified. There was the inevitable knowledgebase, a taxonomy of terms and jargon, and rules. “If-then” stuff.

The big idea was to filter the content with a front end of old-school lookups and then feed the outputs into the company’s smart system. I listened and suggested that the cost and time of fiddling with rules would consume the available manpower, time, and money.
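The pipeline the whiz kids had in mind can be sketched in a few lines. This is a minimal illustration, assuming a keyword-lookup front end and a downstream machine-learning back end; the rule terms and the stubbed-out model call are my own hypothetical examples, not the vendor's actual system.

```python
# Hypothetical terms an analyst might flag; not from any real rulebook.
PROBLEM_TERMS = {"unverified source", "casualty figures"}

RULES = [
    # (condition, label) pairs: the old-school "if-then" lookups.
    (lambda text: any(term in text.lower() for term in PROBLEM_TERMS), "flag"),
    (lambda text: len(text.split()) < 20, "too_short_to_judge"),
]

def prefilter(text: str) -> str:
    """Run the cheap lookup rules in order; return the first label that fires."""
    for condition, label in RULES:
        if condition(text):
            return label
    return "pass"

def route(documents: list[str]) -> list[str]:
    """Send only rule-flagged documents downstream to the 'smart' system."""
    flagged = [d for d in documents if prefilter(d) == "flag"]
    # classify_with_model(flagged)  # hypothetical ML back end
    return flagged
```

The maintenance problem is visible even at this scale: every new way of faking a report means another term or lambda, and every term added is another source of false positives to tune.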

Ho, ho, ho was the response. Listen to the old goose from rural Kentucky.

Yeah, that was in 2005, and where is that system now? It’s being used as a utility for IBM’s staggering mountain of smart software and for finding items of interest for a handful of financial clients.

Ho, ho, ho. The joke is on the whiz kids and the investors, who are going to run out of patience when the light bulb goes on and says:

“Yo, folks, figuring out what’s fake, shaped, disinformationized, or reformationized content is what makes search difficult.”

I read a scoop from the New York Times. Yep, that’s the print newspaper which delivers to my door each day information that is two or three days old. I see most of the stories online in one form or another. Tip: 85 percent of news is triggered by AP or Reuters feeds.

The article reveals that Facebook’s really smart people cannot figure out how to deal with various types of speech: political and other types. The child porn content on WhatsApp is a challenge as well, I would add.

The write up says:

An examination of the files revealed numerous gaps, biases and outright errors. As Facebook employees grope for the right answers, they have allowed extremist language to flourish in some countries while censoring mainstream speech in others.

Yep, a scoop.

Facebook’s hubris, like the text processing company which dragged me into a series of bull sessions, allows the company to demonstrate that it cannot cope with filtering within a datasphere in which controls are going to be tough to enforce.

The fix is to create a for-fee country club. If a person does not meet the criteria, no membership for you. Then each member gets the equivalent of a US Social Security number which is linked to the verified identity, the payment mechanism, and other data the system can link.

Amazon has this type of system available, but I am not sure the Facebookers are going to pay Amazon to use its policeware to create a clean, well lit place. (Sorry, Ernest, not “lighted”.)
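The country-club idea above amounts to a gatekeeping function plus a linked member record. Here is a minimal sketch under my own assumptions; the field names and admission criteria are hypothetical illustrations, not an Amazon or Facebook schema.

```python
from dataclasses import dataclass, field
import uuid

@dataclass
class Member:
    """One verified, paying member. member_id plays the role of the
    Social-Security-number-style key described in the post."""
    verified_identity: str          # e.g., a government ID reference
    payment_token: str              # the billing mechanism on file
    linked_accounts: list = field(default_factory=list)
    member_id: str = field(default_factory=lambda: uuid.uuid4().hex)

def admit(identity: str, payment_token: str, meets_criteria: bool):
    """No membership for applicants who fail the club's criteria."""
    if not meets_criteria:
        return None
    return Member(identity, payment_token)
```

The point of the sketch is that everything hangs off one durable key: identity, payment, and linked activity all resolve to the same member_id, which is what makes such a system attractive to operators and uncomfortable for privacy advocates.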

As a final point, may I suggest that rules-based systems, where big data floweth, are going to be tough to create, update, and pay for.

On the other hand, why not hire the New York Times to set up an old-school editorial board to do the work? News is not ringing the financial bell at the NYT, so maybe becoming the Facebook solution is a path forward. The cost may put Facebook in the dog house with investors, but the NYT regains its position as the arbiter of what’s in and what’s out.

Power again!

Stephen E Arnold, December 29, 2018

Happy Holidays: Google News May Be Mortally Wounded

December 25, 2018

I read “Google Says EU Rules Will Force It to Cut News Services.” My first reaction was, “There goes traffic to the news Web sites.” Then I thought, “What traffic?”

The write up reports:

Google has claimed it will be forced to slash the range of news thrown up by its search engine if European rules to protect copyright owners come into force.

Those copyright rules were, in part, triggered by Google itself. The click-loving newspapers took a middle-of-the-road approach: not good, not bad.

Now the EC has cranked out a copyright regulation with Article 11. The lingo refers to “neighboring rights.” The idea is that Google has surfed on hard working journalists’ work. I assume the fraudulent stories in Der Spiegel are not included. (Yikes, a back link. Trouble looms for the Beyond Search goose.)

If the GOOG sticks in a link, the GOOG has to pay the publisher. It gives me a headache to think about the “who”. Many newspapers are pastiches of content from a wide range of sources. The copyright sensitive Associated Press is not going to be happy if one of its syndicated stories is not handled in a way that makes the AP’s legal eagle happy.

To sum up: The Google News death watch has begun. Will the GOOG survive or will it succumb to the EC’s immune system?

Stephen E Arnold, December 25, 2018

Publisher Morphs into Software Vendor

December 12, 2018

Just when we thought content management systems were dead, a publisher is jumping in with feet of Clay. That is the publishing platform Fast Company refers to in its headline, “Why New York Magazine is Selling Its Own Technology to Other Media Companies.” It has been some time since the magazine built its in-house platform, Clay, but recently it partnered with Slate to redesign that site’s CMS. That went so well that two other sites are following suit—Golf.com uses the tech for its large collection of data on U.S. golf courses, and it has been built into Radio.com, which streams music while users peruse the site. Writer Cale Guthrie Weissman reports:

“These features require their own bespoke functionalities—which Clay powers—but they weren’t part of an existing template, like the kind you would find in a conventional CMS. In fact, Clay goes against the model of template-based CMSs, and instead allows developers to use its code base and tools to build their own unique features. ‘What I think makes our CMS and model unique is that [clients are] not buying what we have,’ Hallac says. ‘The way Clay works is that licensees are part of a closed, open-sourced network.’ In a sense, all the customers are part of a consortium building their own things. The Clay codebase is shared among all of them, but they fork it and then build whatever they want atop it themselves.”

And search? Alas, it is not mentioned.

Weissman goes on to observe that selling such in-house software is becoming a trend, giving these examples:

“Vox Media, for instance, has been pushing its Chorus software—which offers both a CMS and an ad network. (New York Media is not selling any sort of advertising network products besides the ability to put ads on a site.) The Washington Post, too, has its own platform, called Arc, which is being licensed to newspapers around the world. Differences aside, the idea and price model is generally the same. Media companies find licensees who shell out monthly fees.”

This is indeed an interesting direction for the publishing industry. As for the Clay platform, Weissman suspects this timing may be an effort to make its parent company, New York Media, LLC, look tasty to potential buyers. The company is reported to have already fielded a few offers.

Cynthia Murrell, December 12, 2018

Thomson Reuters: Content Slicing and Dicing Chops People

December 6, 2018

A certain database company raked in the dough by pitching XML slicing and dicing as the way to piles of cash, happy customers, and reduced editorial costs. Thomson Reuters was “into” XML inspired slicing and dicing. Now the chopping up has moved from disparate content to staff.

According to a real news organization, the article “Thomson Reuters to cut 3,200 jobs by 2020, offer fewer products” states:

Thomson Reuters said it plans to cut its workforce by 12 percent, or 3,200 positions, by 2020 as part of a push to reduce spending.

Capital outlays as a share of revenue will be down about 30 percent by 2020, Thomson Reuters said Tuesday in a presentation for investors. By that year, Thomson Reuters expects to have about 11 percent fewer products and pare its number of locations by 30 percent. The pullback underscores efforts to exert cost discipline after third-quarter revenue came in 2.3 percent less than analysts had expected.

TR revenues have been less than exciting. Despite management’s heroic efforts, the company has not been able to shake the money tree with the vigor some stakeholders expect.

Thus, the slicing and dicing of staff and products is underway. Nothing like a hefty reduction in force, or RIF, to brighten the outlook of the individuals who can now look forward to finding their futures elsewhere.

The larger question is, “What will TR do if the staff reductions and new points of focus do not generate revenue?” The accountant-, lawyer-, and MBA-infused senior management may have to look for different sources of inspiration; for example:

  1. Seeking to pull the company into new markets with must-have products and services. Not easy, I know, but TR will have to do more than follow the well-worn grooves in the business models, which are like the streets of Pompeii.
  2. Selling itself to another large professional publishing outfit. What about a Thomson Elsevier or (perish the thought) an Ebsco Thomson?
  3. Selling the bits and pieces to investment banks or small companies eager to capitalize on TR’s missed penalty kicks. What would Bloomberg pay for the terminal business and maybe the Palantir inspired services? Perhaps Factset would toss a soccer boot on the pitch?
  4. Modifying its executive compensation methods so that TR unit managers actually cooperate on certain opportunities and initiatives.

There are, of course, other options, but many of these have been tried before; for instance, new units, new senior managers, new acquisitions, and new technologies.

Net net: TR may have to start thinking about life as a smaller, leaner, less profitable operation. Lord Thomson of Fleet may not be able to return and infuse the company. He’s needed in my opinion.

Stephen E Arnold, December 6, 2018

SciTech Journals Resist Smart Software

December 1, 2018

The scientific community is in the throes of a battle that might sound familiar even to folks who have nothing to do with science. It is trying to overcome a glut of fake research, and it is turning the forgers’ own weapons against them to do so. We discovered more in a recent Analytics India article, “How AI is Tackling Fake Academic Research That is Plaguing Scientific Community.”

According to the story:

“A decade ago, researchers Jeremy Stribling, Dan Aguayo and Max Krohna of MIT’s Computer Science and Artificial Intelligence Lab (CSAIL) built a computer science paper generator that could stitch together nonsense papers with impressive graphs and such. The papers drummed up by SCIgen software were accepted at major conferences and even by reputed journals.”

The article reports that the AI technology used to make fake articles is now being utilized to debunk them as well. According to Nature, these tools are running the gamut for academic journals, from choosing which peer reviewers to work with, to verifying statistics, and even summarizing complex articles. This is a neat tool that proves the only way to fight fire is with fire. We can only hope that we are able to keep ahead of the frauds.

Patrick Roland, December 1, 2018

Thomson Reuters on a Privacy International Beat

November 26, 2018

I know that commercial database publishers can be profitable operations. But in order to keep pace with erosion of some traditional revenue streams, some professional publishers have been working to generate new databases which can be licensed to certain government agencies. In most cases, a researcher or librarian will not have these electronic files in their toolkit.

Privacy International published “Who Supplies the Data, Analysis, and Tech Infrastructure to US Immigration Authorities?” The report is available without charge, but I suggest that you download it promptly. Certain reports about some topics can go offline without notice.

I don’t want to dig through the references to references to Palantir. The information about that company is not particularly fresh. However, Privacy International has gathered some useful examples of Thomson Reuters’ products and services to law enforcement and other government agencies.

Privacy International seems unaware that many LE and intel entities routinely outsource work to third parties, license a wide range of numeric and factual data, and tap into the talent pools at third-party firms.

The Privacy International report does not provide much information about Thomson Reuters’ use of the Palantir technology. That might be an interesting topic for some young researcher to explore. We will do a short item about some of the Privacy International information in the DarkCyber for December 11, 2018.

Stephen E Arnold, November 26, 2018

The Journal Subscription Doomsday Is Upon Us

November 11, 2018

China might have the great firewall when it comes to blocking access, but another infamous digital wall is the great academic paywall, a.k.a. the subscription paywalls that block access to scientific and academic journals. That could all be changing. Researchers everywhere are shocked and gasping at the thought of having free access to expensive research materials. Could this be the sound of celebration? Nature fills us in on the details in the story, “Radical Open-Access Plan Could Spell End To Journal Subscriptions.”

The celebrating would only take place in Europe, however, as European research funders have banded together for a radical open-access initiative that could revolutionize science publishing. The publishers are already angry over this. The European funders are eleven agencies that spend over $8.8 billion annually in research grants. From 2020 onward, they want all research papers resulting from their grants to be available under a liberal publishing license. The idea is that science should not be kept behind paywalls. The endeavor is called Plan S.

Plan S would change the current publishing model:

“As written, Plan S would bar researchers from publishing in 85% of journals, including influential titles such as Nature and Science. According to a December 2017 analysis, only around 15% of journals publish work immediately as open access (see ‘Publishing models’) — financed by charging per-article fees to authors or their funders, negotiating general open-publishing contracts with funders, or through other means. More than one-third of journals still publish papers behind a paywall, and typically permit online release of free-to-read versions only after a delay of at least six months — in compliance with the policies of influential funders such as the US National Institutes of Health (NIH).”

We also noted:

“And just less than half have adopted a ‘hybrid’ model of publishing, whereby they make papers immediately free to read for a fee if a scientist wishes, but keep most studies behind paywalls. Under Plan S, however, scientists wouldn’t be allowed to publish in these hybrid journals, except during a “transition period that should be as short as possible,” the preamble says.”

While eleven scientific organizations support Plan S, other European science organizations are still on the fence, unsure of how open access would alter their funding and affect their research. The publishers are even more concerned, because the plan disrupts their entire business model. While they support increasing access to journals, they do not want to get rid of hybrid journals, and they think it is better if they all act as one large conglomerate, instead of smaller groups, so their goals align. They argue that moving to entirely open access would diminish the quality, peer review, and research of papers.

Plan S would mean the end to subscription paywalls and allow more access to scientific research. The bigger question is who will pay the bill and will research suffer in quality if it becomes “free”?

Whitney Grace, November 11, 2018

Sensational Development from Real Publishers

October 5, 2018

You thought I was going to offer a comment about the Bloomberg report about Supermicro motherboards. Wrong. Frankly, when one purchases hardware from sources which operate in far-off lands, one often may not know exactly what functions and features those semi-magical devices harbor. That is one reason why some law enforcement and intelligence organizations use Faraday cages and approach hardware with a bit of skepticism. How true are the Bloomberg allegations and the subsequent verbal arabesques like this one?

The hair-on-fire reaction to the allegedly accurate and subsequently disputed information suggests that some people are concerned.

Nope, the big news from the world of real publishers and real publishing is different. Navigate to this Chicago Tribune report. The write up explains that the Tronc organization is changing its name back to Tribune Publishing.

Personally I liked the word Tronc. The judgment of the real publishing management professionals was on display.

Sadly Tronc has been retired. The article reported (accurately, I assume):

“We are excited about the company rebranding to Tribune Publishing,” spokeswoman Marisa Kollias said in a statement. “It’s a nod to our roots, and a reinforcement of the journalistic foundation on which all of our news brands stand.”

Yep, a journalistic foundation which Tronc did not suggest.

Stephen E Arnold, October 5, 2018

Manipulating the Google: A Reputation Management Method Revealed

October 1, 2018

I don’t want to go through the procedure described in “Data from the Lumen Database Highlights How Companies Use Fake Websites and Backdated Articles to Censor Google’s Search Results.” The article does a good job of explaining how Google’s weak time and date function makes it possible to neutralize certain content objects. The lever is the DMCA takedown notice.

Works most of the time in our experience with Augmentext and some related methods.

I thought it would be useful to highlight what Lumen is.

Straightaway, it is a project of the Berkman Klein Center for Internet & Society at Harvard University. The group, however, is an independent third-party research “project.” The third parties collect and analyze requests to remove material from the Web.

These data are gathered in a database and analyzed.

Who works on these objective investigations?

There is the EFF and law school clinics. Help for the unit was provided by Harvard, Berkeley, Stanford, University of San Francisco, University of Maine, George Washington School of Law, and Santa Clara University School of Law.

What’s interesting is that Lumen is supported by “gifts from Google.” Others kick in, of course. There are no promised deliverables. The pursuit of knowledge is the goal.

More info is here.

How surprised will Google, reputation management firms, and those who want certain content objects disassociated from their names be?

Pretty surprised was the consensus around the cast iron stove here in Harrod’s Creek. We just burn magazines, books, and journals hereabouts.

Stephen E Arnold, October 1, 2018

Tracking Facebook: The Job of a Real Journalist Is Stressful, Alarming

September 30, 2018

Want to know what the life of a “real” journalist is like? Navigate to “Exposing Cambridge Analytica: ‘It’s Been Exhausting, Exhilarating, and Slightly Terrifying.” Here in Harrod’s Creek we believe everything we read online, whether from Facebook, the GOOG, or the Guardian.

The write up is unusual because on one hand, the virtues of being curious and asking questions leads to “terrifying” experiences. On the other hand, the Guardian is just a tiny bit proud that it made the information available.

I learned:

Cadwalladr’s reporting led to the downfall of Cambridge Analytica and a public apology from Facebook’s Mark Zuckerberg, who was forced to testify before Congress. Facebook has since lost $120 billion from its share price.

That’s nosing into Elon Musk Tweet territory.

I knew social media was a force, but these are big numbers. Perhaps newspaper advertising will reach these heights with “stressful, alarming” assignments for the “real” journalists?

I learned:

It’s got easier every time I’ve published – sunlight is the best disinfectant etc.

Interesting idea in a world which seems to be emulating the fiction of 1984.

I learned what lubricant allowed the “real” journalist to move forward:

I have to say that the support of readers was absolutely crucial and was one of the things that enabled me to carry on. Not just because it helped give me the confidence to keep going, but also because it helped give the organization confidence. It takes a huge amount of resources and resolve for a news organization to keep publishing in the face of the kind of threats we were facing, and the support of the readers for the story and what we were trying to do really did help give my editors confidence, I think. And I’m really grateful for that.

Does this mean that the “real” newspaper was the motive force?

If so, then “real” newspapers are positive forces in today’s world and not conduits for popular culture, sports, and informed opinion.

My thought was, “I wonder if the Babylonian clay tablet brigade voiced similar sentiments when writing on sheepskin became the rage.”

Probably not.

Rah rah for the “real” journalist. Rah rah for the newspaper.

Any rah rahs for Facebook? Nah. Bro culture. Security laughing stock. Sillycon Valley.

But Cambridge Analytica? Yeah, British with a lifeline from some interesting Americans.

Stephen E Arnold, September 30, 2018
