Who Wrote What? Will an Algorithm Catch Name Surfers?

August 17, 2014

I read “New Algorithm Gives Credit Where Credit Is Due.” The write up sparked a number of thoughts. Let me highlight a couple of passages that made it into my research file.

The focus of the paper, in my opinion, is documents intended for peer reviewed publications and conferences. The write up did not include a sample of the type of “authorship” labeling that takes place. I dug through my files and located a representative example:

image

This is a paper about stuffing electronics on a contact lens. Microsoft was in this game. Google hired Babak Parviz (aka Babak Amir Parviz, Babak Amirparviz, and Babak Parvis). The paper has four authors:

  • H. Yao
  • A. Afanasiev
  • I. Lahdesmaki
  • B. A. Parviz

The idea is that the numerical recipe devised at the Center for Complex Network Research will figure out who did most of the work. I think this is a good idea because my research suggests that the guys doing the heavy lifting in the lab, with Excel, and writing were Yao, Afanasiev, and Lahdesmaki. The guru for the work was Parviz. I could be wrong, so an algorithm to help me out is of interest.

One of the points I highlighted in the write up was:

Using the algorithm, which Shen [math whiz] developed, the team revealed a new credit allocation system based on how often the paper is co-cited with the other papers published by the paper’s co-authors, capturing the authors’ additional contributions to the field.

Okay, my take on this is that this is a variation of Eugene Garfield’s citation analysis work. That is useful, but it does not dig very deeply into the context for the paper, the patent applications afoot, or the controls placed on the writers by their employers or their conscience. In short, I need some concrete examples or, better yet, access to the software so I can run some tests. Yep, just the sort of tests that mid tier consulting firms (what I call azure chip consultants) do not do. (For reference, see the Netscout legal document or my saucisson write up.)
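For anyone who wants something concrete to poke at, here is a minimal Python sketch of the co-citation idea as the quoted passage describes it. It is not Shen's software, which I have not seen. The function name `allocate_credit`, the paper IDs, the author labels, and the rule that each co-cited paper's weight is split evenly among its authors are all my assumptions, and the data is a made-up toy example.

```python
from collections import Counter, defaultdict


def allocate_credit(target, authors_of, citing_refs):
    """Split credit for `target` among its authors using co-citation counts.

    target      -- ID of the paper whose credit is being divided (hypothetical)
    authors_of  -- dict: paper ID -> list of author names
    citing_refs -- list of reference lists, one per paper that cites `target`
    """
    target_authors = set(authors_of[target])

    # Co-citation strength: number of citing papers that list a given paper
    # alongside the target. The target itself is counted too, so its strength
    # equals its citation count.
    strength = Counter()
    for refs in citing_refs:
        refs = set(refs)
        if target not in refs:
            continue  # only papers that actually cite the target count
        for paper in refs:
            strength[paper] += 1

    if not strength:
        raise ValueError("target has no citing papers; nothing to allocate")

    # Assumption: each co-cited paper passes its strength to its authors in
    # equal parts, and only authors of the target collect credit.
    credit = defaultdict(float)
    for paper, s in strength.items():
        paper_authors = authors_of.get(paper, [])
        for author in paper_authors:
            if author in target_authors:
                credit[author] += s / len(paper_authors)

    total = sum(credit.values())
    return {a: credit[a] / total for a in authors_of[target]}


# Toy data: P0 is the paper of interest, written by authors "A" and "B".
authors_of = {
    "P0": ["A", "B"],
    "P1": ["A"],
    "P2": ["A", "C"],
    "P3": ["B"],
    "P4": ["D"],
}
citing_refs = [
    ["P0", "P1", "P2"],
    ["P0", "P1"],
    ["P0", "P3", "P4"],
    ["P0", "P1"],
]

shares = allocate_credit("P0", authors_of, citing_refs)
for author, share in shares.items():
    print(f"{author}: {share:.3f}")
```

In this toy case author A ends up with the larger share because A's other papers are cited alongside P0 more often than B's. Notice that nothing in the inputs says who did the lab work or who wrote the text, which is the limitation I flagged above.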

The second point is that the sample strikes me as small. I know the rule of thumb that one well regarded researcher used was 50 in the sample, but there are hundreds of thousands of technical papers. Many are available as open source from services like PLOS One. Here’s the point I noted:

the team looked at 63 prize-winning papers using the algorithm. In another finding, the algorithm showed physicist Tom Kibble, who in 1964 wrote a research paper on the Higgs boson theory, should receive the same amount of credit as Nobel prize winners Peter Higgs and François Englert.

I think the work is interesting, but it is in my opinion not ready for prime time.

I know that one content processing firm almost totally dependent on the US Army for funding has been working to identify misinformation, disinformation, and reformation. So far, the effort has yielded no commercial product. Other companies purport to have the ability to “understand” content. Presumably this includes the entities identified in the content object. Progress has stalled. Smart software is easier to write about in a marketing slide deck or a proposal than to actually deliver.

That’s why authorship remains something a human has to chase down. Let me give you an example. I provided research to IDC, a mid tier consulting firm, in 2012. From August 2012 to July 17, 2014, IDC marketed reports that carried my name, two of my research assistants’ names, and an IDC “expert’s” name. Dave Schubmehl, the IDC “expert” in search, is listed as the “author.”

Now is he?

I am confident that in his mind and in IDC’s corporate wisdom he is the man. The person who justifies surfing on another’s name illustrates a core problem in authorship. You can see examples of Dave Schubmehl’s name surfing at this link. The sale of one of these documents on Amazon was an interesting attempt to gain traction for Dave Schubmehl in the high traffic eBook store. See “Amazon May Be Disintermediating Publishers: Maybe Good News for Authors.” I include a screen shot of the Amazon “hit.” My legal eagle successfully got the document removed from Amazon. I am not an Amazon author and don’t want to be.

Hopefully the algorithm to identify the “real” author of a series of $3,500 reports will become a commercial reality. I am interested to learn if there are any other mid tier consulting firms that have used others’ content without getting appropriate permissions. How many “experts” follow the IDC path of expediency?

For now, name surfers have to be tracked one by one. Schubmehl and Arnold are now linked. Arnold is the surfboard; Schubmehl is the surfer. Catch a wave is the motto of many surfers.

Stephen E Arnold, August 17, 2014

