Elsevier: An Open Source Flag Carrier?

July 17, 2018

According to this article at the Guardian, the European Union is to be applauded for its goal of open access to all scientific publications by 2020. However, writer and Open Science advocate Jon Tennant condemns one key decision in “Elsevier Are Corrupting Open Science in Europe.” He tells us:

“However, a cursory glance at the methodological note reveals something rather odd. The subcontractor for the monitor is Elsevier, the publisher and data analytics provider. Within scholarly communications, Elsevier has perhaps the single worst reputation. With profit margins around 37%, larger than Apple and big oil companies, Elsevier dominate the publishing landscape by selling research back to the same institutes that carried out the work. It gets worse too. Throughout the methods, you can see that there is an overwhelming bias towards Elsevier products and services, such as Scopus, Mendeley, and Plum Analytics. These services provide metrics for researchers such as citation counts and social media shares, as well as data-sharing and networking platforms. There are now dozens of comments in the note pointing out the clear bias towards Elsevier and the overlooking of alternatives. It is worth highlighting some of the key issues here that the Commission seems to have ignored in subcontracting to Elsevier.”

One such issue is Elsevier’s alleged track record of working against openness in order to protect its own financial interests. Also, many throughout the EU, including prominent research institutes, have turned against the publisher in distrust. Last but not least, handing the Open Science Monitor work to an entity that stands to benefit from the results is an obvious conflict of interest, Tennant declares with understandable incredulity. See the article for details on each of these points. The author is clearly aghast that the appointment was allowed in the first place, and recommends the European Commission remove Elsevier from the position posthaste.

Worth watching via open source information, of course.

Cynthia Murrell, July 17, 2018

Cambridge Analytica: A Few More Alleged Factoids

July 17, 2018

It is 2018, and the 2016 US presidential election remains news. Nature published an interesting article that digs into the data used to target Facebook users: “The Scant Science Behind Cambridge Analytica’s Controversial Marketing Techniques.” It was revealed in March that Cambridge Analytica collected Facebook user data without consent, and that the data were later used to send false news to voters. The approach involves something called psychographic targeting.

Psychographic targeting, on which psychographic marketing is based, uses people’s personality traits to send them targeted information, such as ads. The scary thing is that psychographic targeting actually works, at least when it comes to shopping. Voting is a different matter:

“But these effects were small in absolute terms, points out Brendan Nyhan, a political researcher at Dartmouth College in Hanover, New Hampshire. And what works in consumer purchasing might not apply to voting, he says. “It’s surely possible to leverage personality information for political persuasion in some way, but, as far as I know, such effects are not proven or known to be of a substantively meaningful magnitude,” Nyhan adds. He points to other studies3,4,5 that suggest that political ‘microtargeting’ — sending specific kinds of messages to specific voters — has limited effectiveness.”

Cambridge Analytica might have built a model from the Facebook data, but no one is sure. In fact, no one is exactly sure how to even copy Cambridge Analytica’s methods. Some scientists are trying to reverse engineer the firm’s methods and a journalist and academic group is trying to get Cambridge Analytica to share its data.
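
Since Cambridge Analytica’s actual model is not public, any concrete illustration has to be hypothetical. Here is a toy sketch of what trait-based targeting means in practice: score a user on the Big Five personality traits, then serve the message variant written for the dominant trait. Every trait score and line of ad copy below is invented.

```python
# Illustrative only: this is not Cambridge Analytica's method, which is
# not public. A toy version of psychographic targeting: pick the ad
# variant written for a user's dominant Big Five ("OCEAN") trait.

TRAIT_ADS = {
    "openness":          "Imagine a different future. Read the plan.",
    "conscientiousness": "The numbers behind the plan, point by point.",
    "extraversion":      "Join thousands at the rally this weekend.",
    "agreeableness":     "A plan that protects families like yours.",
    "neuroticism":       "What happens to your savings if nothing changes?",
}

def pick_ad(trait_scores: dict) -> str:
    """Return the ad variant matching the user's highest-scoring trait."""
    dominant = max(trait_scores, key=trait_scores.get)
    return TRAIT_ADS[dominant]

# A hypothetical profile inferred from, say, Facebook likes.
user = {"openness": 0.42, "conscientiousness": 0.71, "extraversion": 0.30,
        "agreeableness": 0.55, "neuroticism": 0.62}
print(pick_ad(user))  # -> the "conscientiousness" variant
```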

Whitney Grace, July 17, 2018

DarkCyber for July 17, 2018, Now Available

July 17, 2018

DarkCyber for July 17, 2018, is now available. You may view the nine-minute news program about the Dark Web and lesser known Internet services at www.arnoldit.com/wordpress or on Vimeo at this link.

This week’s program covers four stories.

The first story reviews the enhanced capabilities of Webhose.io’s Dark Web and Surface Web monitoring service. Tor Version 3 is supported. The content collection system can now access content on Dark Web and i2p services. Plus, Webhose’s system now scans compressed attachments and can access obfuscated sites protected by Captchas and user name and password requirements.

The second story reports that NSO, an Israeli intelligence services firm, suffered an insider breach. NSO’s Pegasus platform can extract email, text messages, SIM card and cell network information, GPS location data, keychain passwords, including Wi-Fi and router, and voice and image data. The NSO Pegasus system was advertised on the Dark Web. The insider was identified and arrested.

The third story takes a look at Dark Web money laundering services. Mixers, tumblers, and flip concepts are explained. These services are becoming more popular and are coming under closer scrutiny by law enforcement.

The fourth story explains Diffeo’s approach to next generation information access. Diffeo was one of the technology vendors for the Defense Advanced Research Projects Agency’s Memex Dark Web indexing program. The commercial version of Diffeo’s analytic tool is in use at major financial institutions and the US Department of Defense.

Enjoy.

Kenny Toth, July 17, 2018

An Algorithm for Fairness and Bias Checking

July 16, 2018

I like the idea of a meta algorithm. This particular meta algorithm is described in “New Algorithm Limits Bias in Machine Learning.” The write up explains to those not working with smart software what practitioners have known for—what is it?—decades? A century? Here’s the explanation of what happens when algorithms are slapped together:

But researchers have found that machine learning can produce unfair determinations in certain contexts, such as hiring someone for a job. For example, if the data plugged into the algorithm suggest men are more productive than women, the machine is likely to “learn” that difference and favor male candidates over female ones, missing the bias of the input. And managers may fail to detect the machine’s discrimination, thinking that an automated decision is an inherently neutral one, resulting in unfair hiring practices.

If you want to see how bias works, just run a query for “papa john pizza.” Google dutifully reports, via its smart algorithm, hits about Papa John’s founder getting evicted from his office, Papa John’s non admission of racial bias, and colleges cutting ties with Papa John’s founder. Google also provides store locations and a link to a Twitter account. The result displayed for me this morning (July 16, 2018) at 9:40 am US Eastern was:

[Screenshot: Google results for the query “papa john pizza”]

The only problem with my query “papa john pizza” is that I wanted the copycat recipe at this link. Google’s algorithm made certain that I would know about the alleged dust up within the pizza empire and that I could navigate to a store in Louisville. The smart software made it quite difficult for me to locate the knock off information. Sure, I could have given Google more clues to what I wanted: Six Sisters, the word “copycat”, the word “recipe”, and the word “ingredient.” But that is the sort of query crafting smart software is supposed to render obsolete. Boolean has no role in what algorithms expose to users. That is why results are often interesting, and why smart software delivers off kilter results. The intent is to be useful. Often smart software is anything but.

Are the Google results biased? If I were Papa John, I might take umbrage at the three headlines about bias.

Algorithms, if the write up is correct, will ameliorate this type of smart software dysfunctionality.

The article explains:

In a new paper published in the Proceedings of the 35th Conference on Machine Learning, SFI Postdoctoral Fellow Hajime Shimao and Junpei Komiyama, a research associate at the University of Tokyo, offer a way to ensure fairness in machine learning. They’ve devised an algorithm that imposes a fairness constraint that prevents bias.

One of the developers is quoted as saying:

“So say the credit card approval rate of black and white [customers] cannot differ more than 20 percent. With this kind of constraint, our algorithm can take that and give the best prediction of satisfying the constraint,” Shimao says. “If you want the difference of 20 percent, tell that to our machine, and our machine can satisfy that constraint.”
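
The paper presumably spells out the real algorithm; what follows is only a minimal sketch of the constraint Shimao describes: verify that approval rates for two groups differ by no more than a chosen bound. The decision data and the 20 percent bound below are invented for illustration.

```python
# A sketch of the fairness constraint quoted above, not the
# Shimao-Komiyama algorithm itself. All decision data is invented.

def approval_rate(decisions):
    """Fraction of approvals, where 1 = approved and 0 = denied."""
    return sum(decisions) / len(decisions)

def satisfies_parity(group_a, group_b, max_gap=0.20):
    """True if the groups' approval rates differ by at most max_gap."""
    return abs(approval_rate(group_a) - approval_rate(group_b)) <= max_gap

group_a = [1, 1, 0, 1, 1, 0, 1, 1]   # 75.0% approved
group_b = [1, 0, 0, 1, 0, 1, 0, 0]   # 37.5% approved

print(satisfies_parity(group_a, group_b))  # False: the gap is 37.5 points
```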

Just one question: What if a system incorporates two or more fairness algorithms?

Perhaps a meta fairness algorithm will herd the wandering sheep? Georg Cantor was troubled by this infinity-of-infinities type of issue.

Fairness may be in the eye of the beholder. The statue of justice wears a blindfold, not an old person’s magnifiers. Algorithms? You decide. Why not order a pizza, or make your own clone of a Papa John pizza if you can find the recipe? Pizza and algorithms to verify algorithms. Sounds tasty.

If I think about algorithms identifying fake news, I may need to order maximum strength Pepcid and receive many, many smart advertisements from Amazon.

Stephen E Arnold, July 16, 2018

Enterprise Search: Long Documents Work, Short Documents, Not So Much

July 16, 2018

Enterprise search goals are notoriously wordy and complex. Is this just a symptom of a complicated system that cannot be explained any other way? Probably not, and it’s all one venture capitalist’s idea, according to Business Insider’s recent story: “One Simple Management Trick to Improve Performance, According to John Doerr.”

The story, which is about Doerr’s book “Measure What Matters,” explains:

“[It] explains the thinking behind the Objectives and Key Results (OKR) goal-setting process famously used by companies like Google, MyFitness Pal, and Intel…. The theory explains that hard goals ‘drive performance more effectively than easy goals,’ and that specific hard goals ‘produce a higher level of output’ than vaguely worded ones.”

According to Skyword, there are things you and your vendors can do to reverse this trend (Some may not want to reverse. Hey, it’s your world.). Mainly, it deals with understanding your audience and giving them what they crave.

However, short documents often make sense in context; that is, with metadata: information about the sender / author and the reader / person looking for information, category tags, and other useful signals. Enterprise search systems, despite the wide availability of low cost or no cost solutions, struggle to make sense of short messages like:

“Doesn’t work.”
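
To make the point concrete, here is a hypothetical sketch of what “context” could look like: the two-word trouble report becomes findable once it is wrapped in metadata before indexing. The field names are invented, and no particular search engine’s API is implied.

```python
# Hypothetical illustration: a short message is nearly useless to a
# search engine on its own, but becomes findable once bundled with
# context. Field names are invented; no specific product is implied.

def enrich(message: str, sender: str, recipient: str,
           thread_subject: str, tags: list) -> dict:
    """Wrap a short message in the metadata a search index needs."""
    return {
        "body": message,                   # the short message itself
        "sender": sender,                  # who reported the problem
        "recipient": recipient,            # who was asked to fix it
        "thread_subject": thread_subject,  # what "it" refers to
        "tags": tags,                      # category tags for filtering
    }

doc = enrich("Doesn't work.", "jsmith", "helpdesk",
             "VPN client upgrade 4.2", ["vpn", "incident"])
# A query for "VPN" can now retrieve a message that never mentions VPN.
print(doc["thread_subject"])
```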

Videos, encrypted messages, audio, compound documents: with these, enterprise search systems struggle and often fail. More OPUD.

Patrick Roland, July 16, 2018

Social Media: Mass Behavior Modification and Revenue

July 16, 2018

File this one under “Things That Are Not a Big Shock,” but experts have recently been taking a closer look at the trends that tend to ruin social media platforms. Big surprise: the fact that money is to be made is usually the root of a lot of bad behavior, as we discovered from this cheeky but somewhat useful Guardian story, “Six Reasons Why Social Media Is a Bummer.”

While it lists its gripes against social media in alphabetical order, the points are pretty good, like:

“The mass behavior modification machine is rented out to make money. The manipulations are not perfect, but they are powerful enough that it becomes suicidal for brands, politicians, and other competitive entities to forgo payments to Bummer enterprises. Universal cognitive blackmail ensues, resulting in a rising global spend on Bummer.”

It is difficult to imagine that social media will change its tune or suddenly go non-profit, given that these platforms are custom made to attract the eyes and wallets of millions of users. Look at the news on nearly any day and you will see a controversy that bubbled up on Twitter. We cannot see the pattern changing when so much money is to be made from so much attention.

Patrick Roland, July 16, 2018

Journalists: Smart Software Is Learning How to Be a Real Journalist

July 15, 2018

I read “Why Bots Taking Over (Some) Journalism Could Be a Good Thing.” I love optimists who lack a good understanding of how numerical recipes work. The notion of “artificial intelligence” is just cool, like something out of science fiction such as “Ralph 124C 41+”, wrong headed predictions and all. In my 50 year work career, I have observed that technologies are not revolutions. Technologies appear, die, reform, and then interact, often in surprising ways. Then one day, a clever person identifies a “paradigm shift” or “a big thing.”

The problem with smart software, which seems obvious to me, boils down to:

  • The selection of numerical recipes to use
  • The threshold settings or the Bayesian best guesses that inform the system
  • The order in which the processes are implemented within the system.

There are other issues, but these provide a reasonable checklist. What goes on under the kimono is quite important.
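
The second item on the checklist is the easiest to demonstrate. Here is a toy example, with invented relevance scores and cutoffs, of how identical data produces different “smart” output depending on a threshold somebody chose:

```python
# Invented scores and cutoffs: the point is that the output of a "smart"
# system depends on a threshold a human picked, not just on the data.

stories = {"mayor budget vote": 0.82, "school board recall": 0.74,
           "bridge inspection": 0.61, "county fair dates": 0.35}

def newsworthy(scores: dict, threshold: float) -> list:
    """Return the items whose score meets the chosen cutoff."""
    return [item for item, score in scores.items() if score >= threshold]

print(newsworthy(stories, 0.6))  # three stories surface
print(newsworthy(stories, 0.8))  # only one survives the stricter cutoff
```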

The write up states:

If robots can take over the grunt work, which in many cases they can, then that has the potential to lower media organizations’ costs and enable them to spend a greater proportion of their advertising income on more serious material. That’s terrible news for anybody whose current job is to trawl Twitter for slightly smutty tweets by reality TV show contestants, but great news for organizations funding the likes of Guardian journalist Carole Cadwalladr, who broke the Facebook / Cambridge Analytica scandal. Isn’t it?

Good question. I also learned:

Technology can help with a lot of basic reporting. For example, the UK Press Association’s Radar project (Reporters And Data And Robots) aims to automate a lot of local news reporting by pulling information from government agencies, local authorities and the police. It’ll still be overseen by “skilled human journalists”, at least for the foreseeable future, but the actual writing will be automated: it uses a technology called Natural Language Generation, or NLG for short. Think Siri, Alexa or the recent Google Duplex demos that mimic human speech, but dedicated to writing rather than speaking.

I recall reading this idea to steal:

In fact, human reporters will continue to play a vital role in the process, and Rogers doesn’t see this changing anytime soon. It’s humans that make the decision on which datasets to analyze. Humans also “define” the story templates – for example, by deciding that if a certain variable in one region is above a particular threshold, then that’s a strong indicator that the data will make a good news story.
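
The Radar project’s code is not public, but the template-plus-threshold idea the quote describes can be sketched in a few lines. The region, the figures, and the trigger value below are all invented:

```python
# A sketch of template-driven story generation as described in the
# quote, not Radar's actual system. Data and trigger are invented.

TEMPLATE = "Unemployment in {region} rose to {rate:.1f}% in {month}."

def generate_story(row: dict, trigger: float = 10.0):
    """Emit a templated story if the value crosses the human-set trigger."""
    if row["rate"] > trigger:
        return TEMPLATE.format(**row)
    return None  # below the threshold, no story

row = {"region": "Midshire", "rate": 11.3, "month": "June"}
print(generate_story(row))  # -> "Unemployment in Midshire rose to 11.3% in June."
```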

Now back to the points in the checklist. In the mad rush to reduce costs, provide more and better news, and cover certain stories more effectively, who is questioning the prioritization of content from an available stream, the selection of items from that stream, and the evaluation of the data pulled from it for automatic story generation?

My thought is that it will be the developers who are deciding what to do in one of those whiteboard meetings lubricated with latte and fizzy water.

The business models which once sustained “real” journalism rested on media battles, yellow journalism, interesting advertising deals, and localized monopolies. (I once worked for such an outfit.)

With technology concentration a natural consequence of online information services, I would not get too excited about NLG and NLP (natural language generation and natural language processing) services. These capabilities will arrive, but I think the functionality will arrive in dribs and drabs. One day, an MBA or an electrical engineer turned business school professor will explain what happened.

What’s lost? Typing, hanging out in the newspaper lunch room, gossip, and hitting the bar a block from the office. Judgment? Did I leave out judgment? Probably not important. What’s important that I almost forgot? Getting rid of staff, health coverage, pensions, vacations, and sick leave. Software doesn’t get sick, even though it may arrive in a questionable condition.

Stephen E Arnold, July 15, 2018

An Amazon Statistic and the Word All

July 14, 2018

I read “Amazon’s Share of the US E Commerce Market Is Now 49% or 5% of All Retail Spend.” The idea of “all” is reassuring. One does not have to worry about precision. All means all, right? Cantor struggled with his view of “all.” That worry does not trouble the expert writing this article.
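
Taking the headline figures at face value, the arithmetic is worth a moment: if Amazon’s 49 percent of US e commerce equals 5 percent of all retail spend, then e commerce as a whole is only about 0.05 ÷ 0.49 ≈ 10 percent of US retail. The “all,” in other words, is still mostly offline.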

Setting aside word choice, the factoid is semi interesting. Amazon has, according to this source, about half of the e commerce market in the US. But any survey of online buying necessarily counts only people who own a computing device, are able to get online, and have some method of making a digital payment. If one considers what percentage of the US population checks those boxes, the “all” becomes a subset of the US population. Check cashing services exist because many individuals have no banking relationship, either directly or via the widely available prepaid credit cards. Using those prepaid cards can be interesting.

Let’s assume that Amazon does have a big chunk of the US e commerce market. The write up suggests that Amazon is heading toward a tipping point. The idea, I think, is that the “all” really will mean every nickel and dime spent online for “retail products.” The notion that Amazon’s growth is surprising strikes me as interesting. The key metric is the rate of change between each major financial milestone. At one time, Google was smoking along. Has Amazon’s growth been chugging along with a nifty slope between time and financial data (remember those?)?

An outfit called eMarketer provides data which illustrates how Amazon is making revenue in clothing, beauty items, and groceries.

The only problem I have is that Amazon’s online success is old news. Not far from our log cabin in rural Kentucky, Wal-Mart closed three of its retail outlets. I think that Amazon’s success in e commerce was a contributing factor. In some demographic segments, Amazon’s share of the US retail market is nosing toward 80 percent. Even in rural Kentucky, our rescue French bulldog can be run over by one of the six or seven Amazon deliveries each day unless we are on our toes.

So what?

  • Amazon’s tipping point was reached a couple of years ago? Amazon is now just running plays from its 20 year old playbook. We’re into déjà vu territory.
  • Amazon’s e commerce system is part of a slightly more sophisticated online store. I think it may be helpful for some whiz kid analysts to think about Oracle’s data marketplace and ask, “What does that have in common with Amazon’s retail business?”
  • The notion of “all” is not a helpful way to explain what Amazon has achieved. eMarketer, like many other analysts, thinks in terms of consumer products. Big market indeed. There are other ways to look at Amazon’s platform.

Why are these questions important? If Amazon is going to double or triple its revenue, it will have to do more than sell T shirts, avocados, and vinyl records.

Wal-Mart’s “Amazon disease” is now spreading to other markets. The “all” misleads when used without a more informed context.

Stephen E Arnold, July 14, 2018

AI: Useful to Major Major Major of In and Out Fame

July 13, 2018

While the capacity for work and the accuracy of artificial intelligence are pretty hard to argue with, the expense of building a machine learning program from the ground up is pretty easy to argue about. In some cases, it is more expensive to teach a machine to act like a human than to actually hire a human, we discovered after reading a recent Guardian story, “The Rise of Pseudo-AI: How Tech Firms Quietly Use Humans to Do Bots’ Work.”

One example the article gives:

“In the case of the San Jose-based company Edison Software, artificial intelligence engineers went through the personal email messages of hundreds of users – with their identities redacted – to improve a “smart replies” feature. The company did not mention that humans would view users’ emails in its privacy policy.”

How do you get around this modern day Catch-22? Some experts think the answer lies in blocks; specifically, that blockchain technology could drastically reduce AI development costs. One of the great costs of AI is data management and sorting; if blockchain simplifies that process, the reasoning goes, the cost of the program goes down. Finally, we can relieve those poor humans from doing a machine’s job.

When it’s in, it’s out. When it’s out, it’s in. Poetic, no?

Patrick Roland, July 13, 2018

Facebook: A Fan of Infowars

July 13, 2018

I don’t know much about Infowars. I do know that the host has an interesting verbal style. The stories, however, don’t make much sense to me. I just ignore the host and the program.

However, if the information in “Facebook Proves It Isn’t Ready To Handle Fake News” is accurate, Facebook is okay with the host and the Infowars’ approach to information.

The write up reports a Facebook news expert as saying:

“I guess just for being false that doesn’t violate the community standards.” I think part of the fundamental thing here is that we created Facebook to be a place where different people can have a voice. And different publishers have very different points of view.

The Buzzfeed story makes this statement:

Despite investing considerable money into national ad campaigns and expensive mini documentaries, Facebook is not yet up to the challenge of vanquishing misinformation from its platform. As its videos and reporter Q&As take pains to note, Facebook knows the truth is messy and hard, but it’s still not clear if the company is ready to make the difficult choices to protect it.

Hey, it’s difficult for some people to deal with responsibility. Ease off. Facebook is trying hard to be better. Every day. Better.

Stephen E Arnold, July 13, 2018
