Professional Publishing and Academic Standards: A Low Water Mark?

January 20, 2022

Ah, professional publishing in action. Retraction Watch reports, “‘This is Really Ridiculous’: An Author Admitted Plagiarism. His Supervisor Asked for a Retraction. The Publisher said, ‘nah.’” We wish we were surprised by an academic journal’s disinterest in veracity. The write-up largely consists of excerpts from emails between the submitting author, his supervising professor, the co-authors he admitted to plagiarizing from, and editors at the journal (IEEE Access). In setting up those quotations, the article explains:

“Behrouz Pourghebleh is perplexed. And also exasperated. Pourghebleh, of the Young Researchers and Elite Club at the Urmia branch of Islamic Azad University in Iran, noticed a paper published on December 15, 2020 in an IEEE journal that overlapped 80 percent with an article he’d co-authored the year before. Pourghebleh wrote to Zakirul Alam Bhuiyan, the associate editor who had handled the paper, on December 31, 2020, expressing concern. Bhuiyan responded the same day, saying the paper hadn’t been flagged in a similarity check, and that he would contact the authors for a response. The first author, Karim Alinani, wrote to Pourghebleh less than two weeks later, admitting the plagiarism but citing personal circumstances.”

Those personal circumstances are heartbreaking, to be sure, and the consequences editor Bhuiyan notes can befall those called out for plagiarism are indeed ruinous. Given the potential aftermath, Bhuiyan pleaded with Pourghebleh, can’t we just let this one slide? (That is a succinct paraphrase.) Both authors of the plagiarized paper strongly disagreed, but were willing to pursue a less disastrous route to retraction by appealing to Alinani’s postdoctoral supervisor. Even at the professor’s request, though, retraction was a no-go for the publication. The curious can navigate to the write-up for the details in that trail of email excerpts.

Despite our sympathy for Alinani, we think the time to consider consequences is before submitting a paper for publication. Or at least it should be. We agree with Pourghebleh when he called the journal’s outright refusal to retract the paper “really ridiculous.” Retraction Watch notes that the problematic paper has been cited at least once. We doubt that will be the last time.

Cynthia Murrell, January 20, 2022

ShadowDragon Profiled by Esteemed Tech Expert Kim Komando

January 13, 2022

This is an interesting turn of events. Policeware vendor ShadowDragon has been profiled by computer guru-ette Kim Komando on her Tech Refresh podcast episode, “Software Tracking Everything You Do, New iPhone, Alexa on Wheels.” The video’s description reads:

“Have you heard of ShadowDragon? It collects data from 120 major sites going back a decade. Yes, 10 years of info about YOU. Plus, the iPhone 13 and iOS 15 are here, along with Amazon’s new smart home gear, including Astro, the Echo on wheels.”

Yes, we have heard of ShadowDragon. The security company mines data from more than 120 social-media websites, archives results for a decade, and shares the information with its law-enforcement clients around the world. ShadowDragon boasts its software can take an investigation down “from months to minutes.” The podcast starts discussing the company at timestamp 13:05, warning one would have to refrain from social media altogether to avoid its reach. The inclusion seems to support our prediction that reporters are becoming more aware of, and reporting more on, such specialized service vendors. This will make it harder for such firms to keep their generally preferred low profiles. Based in Cheyenne, Wyoming, ShadowDragon was founded in 2015.

For those curious, that podcast episode also discussed the newest iPhones, covered some weird news stories, and reviewed smart floodlights, among other wide-ranging topics. Their coverage of Amazon’s Astro home robot caught the attention of this Alexa-wary writer—apparently the device is so thirsty to identify folks with facial recognition it will (if left in “patrol” mode) follow guests around until it can identify them. It also, according to Motherboard, tracks everything owners do.

Cynthia Murrell, January 13, 2022

Amazon: A Decision Imposed and A Practice Challenged

January 12, 2022

Alexa.com, purportedly named for legendary bastion of knowledge the Library at Alexandria, has been a go-to tool for traffic-based web rankings, APIs, and other website information for 25 years. Now, however, Amazon is pulling the plug on the subsidiary. Bleeping Computer announces, “Amazon Is Shutting Down Web Ranking Site Alexa.com.” Perhaps Alexa the AI assistant wanted the name all to itself. New subscriptions have been halted, but existing subscribers will have access to Amazon data and SEO tools until May 1, 2022. Amazon APIs will be retired on December 8, 2022. Writer Mayank Parmar reports:

“In addition to the global website ranking system, Amazon’s Alexa.com also offers a full suite of SEO and competitor analysis tools with its paid subscriptions. In a new support document, Amazon says that it will be discontinuing the Alexa.com platform in May 2022 and no new monthly stats will be released going forward. ‘Twenty-five years ago, we founded Alexa Internet. After two decades of helping you find, reach, and convert your digital audience, we’ve made the difficult decision to retire Alexa.com on May 1, 2022. Thank you for making us your go-to resource for content research, competitive analysis, keyword research, and so much more,’ the company stated.”

Meanwhile, Reuters tells us good old Italy is trying to fight back against the Amazon behemoth in “Italy Fines Amazon Record $1.3 Bln for Abuse of Market Dominance.” Reporters Elvira Pollina and Maria Pia Quaglia write:

“Italy’s watchdog said in a statement that Amazon had leveraged its dominant position in the Italian market for intermediation services on marketplaces to favor the adoption of its own logistics service – Fulfillment by Amazon (FBA) – by sellers active on Amazon.it. The authority said Amazon tied to the use of FBA access to a set of exclusive benefits, including the Prime label, that help increase visibility and boost sales on Amazon.it. … The antitrust authority also said it would impose corrective steps that will be subject to review by a monitoring trustee.”

This comes as the EU Commission is pursuing two of its own investigations into Amazon. One involves the use of sensitive data from independent retailers. The other considers whether the company elevated its own retail offers and those of sellers that use its logistics and delivery services over offers from other vendors. The €1.13 billion fine is one of the largest to be levied on a US tech company by a European entity, but will it be enough to give Amazon pause? Along with its compatriots/rivals Google and Facebook, the company has a history of shrugging off what seem to most like large fees and carrying on with business as usual.

Cynthia Murrell, January 12, 2022

How about That Smart Software?

January 3, 2022

In the short cut world of training smart software, minor glitches are to be expected. When an OCR program delivers 95 percent accuracy, that works out to five mistakes in every 100 words. When Alexa tells a child to put a metal object into a home electrical outlet, what do you expect? This is close enough for horse shoes.

Now what about the Google Maps of today, a maps solution which I find almost unusable? “Google Maps May Have Led Tahoe Travelers Astray During Snowstorm” quoted a Tweet from a person who is obviously unaware of the role probabilities play in the magical world of Google. Here’s the Tweet:

This is an abject failure. You are sending people up a poorly maintained forest road to their death in a severe blizzard. Hire people who can address winter storms in your code (or maybe get some of your engineers who are stuck in Tahoe right now on it).

Big deal? Of course not. Amazon and Google are focused on the efficiencies of machine-centric methods for identifying relevant, on point information. The probability is that most of the Amazon and Google outputs will be on the money. Google Maps rarely misses on pizza or the location of March Madness basketball games.

Severely injured children? Well, that probably won’t happen. Individuals lost in a snow storm? Well, that probably won’t happen.

From these giant firms’ point of view, their methods, flaws and all, are correct in the majority of cases. A terminated humanoid or a driver wondering if a friendly forest ranger will come along the logging road? Not a big deal.

What happens when these smart systems output decisions which have ever larger consequences? Autonomous weapons, anyone?

Stephen E Arnold, January 3, 2022

Thoughts about AI Bias: Are Data Non-Objective?

December 10, 2021

I read “Breaking Bias — Ensuring Fairness in Artificial Intelligence.” The substance of the write up is an interview with Alix Melchy, VP of AI at Jumio. Okay.

I did note a couple of interesting statements in the interview.

First, Mr. Melchy takes aim at Snorkel-type systems and methods. These are efficient and do away with most of the expensive human intensive training data set work. Here’s his statement:

… fairness bias …enters into AI systems through training data that contains skewed human decisions or represents historical or social prejudices.

Data sets which are not woke are, it seems, going to be biased.

Second, Mr. Melchy says:

bias can be damaging to the credibility of AI as a whole,

Do the AI methods manifested by big tech care? Nope, not as long as the money flows into the appropriate bank account, in my opinion.

Third, Mr. Melchy notes:

… companies that don’t build an AI system with bias considerations from the start are never going to catch up to an industry-standard level of accuracy.

Okay, Google. Alexa, are you listening?

Stephen E Arnold, December 10, 2021

Recognition (People and Things) Not 100 Percent Yet

November 24, 2021

It may sound like a good idea—use technology to find illegal images, like those of child sexual abuse, and report the criminals who perpetuate them. Apple, for example, proposed placing such a tool on all its personal devices but postponed the plan due to privacy concerns. And some law enforcement agencies are reportedly considering using the technology. However, researchers at the Imperial College London have found “Proposed Illegal Image Detectors on Devices Are ‘Easily Fooled’.” Reporter Caroline Brogan writes:

“Researchers who tested the robustness of five similar algorithms found that altering an ‘illegal’ image’s unique ‘signature’ on a device meant it would fly under the algorithm’s radar 99.9 per cent of the time. The scientists behind the peer-reviewed study say their testing demonstrates that in its current form, so-called perceptual hashing based client-side scanning (PH-CSS) algorithms will not be a ‘magic bullet’ for detecting illegal content like CSAM [Child Sexual Abuse Material] on personal devices. It also raises serious questions about how effective, and therefore proportional, current plans to tackle illegal material through on-device scanning really are. The findings are published as part of the USENIX Security Conference in Boston, USA. Senior author Dr Yves-Alexandre de Montjoye, of Imperial’s Department of Computing and Data Science Institute, said: ‘By simply applying a specifically designed filter mostly imperceptible to the human eye, we misled the algorithm into thinking that two near-identical images were different. Importantly, our algorithm is able to generate a large number of diverse filters, making the development of countermeasures difficult. Our findings raise serious questions about the robustness of such invasive approaches.’”

The write-up includes several examples of (innocuous) images before and after such cloaking filters were applied. They are less crisp, to be sure, but still clear as day to the human eye. The research team has wisely decided not to make their filtering technique public lest bad actors use it to fool PH-CSS algorithms. Their results do make one wonder if the use of these detection tools is worth the privacy trade-off. Perhaps not, at least until the algorithms learn to interpret filtered photos.
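For readers wondering what an image “signature” looks like in practice, here is a minimal sketch of an average hash, one simple kind of perceptual hash. It is an assumption-laden toy: it is not one of the five algorithms the researchers tested, the tweak shown is not their unpublished filter, and the eight pixel values are invented for illustration.

```python
# Toy illustration only: an "average hash", one simple kind of perceptual hash.
# It is NOT one of the algorithms tested in the study, and the tweak below is
# NOT the researchers' (unpublished) filter. Pixel values are made up.

def average_hash(pixels):
    """Return a bit string with 1 for each pixel brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)


def hamming_distance(a, b):
    """Count the positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


# A tiny, low-contrast grayscale "image" (mean brightness is exactly 120).
original = [
    [118, 122, 117, 123],
    [121, 119, 124, 116],
]

# Benign change: brighten every pixel by 5. The hash stays the same.
brightened = [[p + 5 for p in row] for row in original]

# Targeted change: also at most 5 per pixel, but chosen to push each pixel
# across the mean, so every bit of the hash flips.
tweaked = [[p + 5 if p < 120 else p - 5 for p in row] for row in original]

h_original = average_hash(original)
print(h_original)
print(hamming_distance(h_original, average_hash(brightened)))
print(hamming_distance(h_original, average_hash(tweaked)))
```

The same small per-pixel budget leaves the signature untouched in one case and scrambles it completely in the other, which is the general weakness the study points at. Real detectors use larger, more robust hashes, so this shows the principle only.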

Cynthia Murrell, November 23, 2021

The Final Disintermediation: Are Libraries Marked for Death?

November 9, 2021

Brewster Kahle founded the Internet Archive, but according to the Time article “I Set Out To Build The Next Library Of Alexandria. Now I Wonder: Will There Be Libraries In 25 Years?” he is pondering whether he did the right thing. Kahle wanted the Internet Archive to preserve Web sites and television as well as digitize books. Out of necessity, libraries have become more digital.

While digital information has multiple benefits, there is an extreme downside tied to corporate control:

“But just as the Web increased people’s access to information exponentially, an opposite trend has evolved. Global media corporations—emboldened by the expansive copyright laws they helped craft and the emerging technology that reaches right into our reading devices—are exerting absolute control over digital information. These two conflicting forces—towards unfettered availability and completely walled access to information—have defined the last 25 years of the Internet. How we handle this ongoing clash will define our civic discourse in the next 25 years. If we fail to forge the right path, publishers’ business models could eliminate one of the great tools for democratizing society: our independent libraries.”

The problem is the larger book publishers, not the small presses. The larger publishers limit the number of digital copies available to public libraries. Publishers are extorting money from public and academic libraries over every small thing related to books. It hinders the freedom and dissemination of information.

The Internet Archive doubles as a lending library. It lends out digitized books one user at a time and works with independent publishers to ensure their rights are respected. This is the proper way to manage “controlled digital lending.”

This happened in 2020:

“Last year, four of the biggest commercial publishers in the world sued the Internet Archive to stop this longstanding library practice of controlled lending of scanned books. The publishers filed their lawsuit early in the pandemic, when public and school libraries were closed. In March 2020, more than one hundred shuttered libraries signed a statement of support asking that the Internet Archive do something to meet the extraordinary circumstances of the moment. We responded as any library would: making our digitized books available, without waitlists, to help teachers, parents, and students stranded without books. This emergency measure ended two weeks before the intended 14-week period.”

The publishers’ lawsuit demands that the Internet Archive delete all the digital copies of books it acquired legally. Many states have reacted against the publishers’ demand as harmful to libraries. The publishers counter that it is unconstitutional.

Kahle believes libraries will still exist in twenty-five years if the current argument between publishers and libraries is handled well. He is right, but he is also discounting that libraries are technology media centers, provide free Internet, have free community programs, are meeting centers, and do much more than check out books.

Will libraries be disintermediated? Good question.

Whitney Grace, November 9, 2021

Clarivate Buys ProQuest

May 18, 2021

I don’t want to go into the history of commercial database producers. (Those readings about Oliver Cromwell in my British history class were orders of magnitude more exciting.)

“ProQuest Bought by Clarivate in $5.3bn Deal” reports:

London-based Clarivate said the acquisition would establish it as “a premier provider of end-to-end research intelligence solutions” and significantly expand its content and data offerings.

Clarivate describes itself this way:

Together, we can create a better tomorrow.

The firm uses these phrases to communicate its business:

Every drop of potential needs to be squeezed from your IP

Make critical decisions with speed and certainty

Innovation in focus

Human ingenuity can change the world and improve our future

Accelerating innovation with actionable information and insights

If you are still unsure what the firm does, you will need to check the About page on the company’s Web site. Oh, sorry. There is no “About” page for Clarivate. A profile of the firm, which is assumed to be a household word, is available at this link.

ProQuest warrants its own Wikipedia entry which explains that

ProQuest LLC is an Ann Arbor, Michigan-based global information-content and technology company, founded in 1938 as University Microfilms by Eugene B. Power. ProQuest provides applications and products for libraries. ProQuest started as a producer of microfilm products, then became an electronic publisher, and later grew through acquisitions. Today, the company provides tools for discovery and citation management, and platforms that allow library users to search, manage, use, and share research.

Net net: For fee online information access appears to mesh with the increased interest in subscription services. Challenges exist. Examples include individuals like Sci Hub’s founder Alexandra Elbakyan, university professionals who can go off the reservation and present content outside of the peer reviewed journals, and Dark Web archives. Customers mindful of the cost associated with an online for fee search may also look for relevant information on Medium or Substack type services. My view is that this is a sale by ProQuest’s owner Cambridge Scientific Abstracts comparable to Bill Ziff’s legendary deals.

Stephen E Arnold, May 18, 2021

More Search Explaining: Will It Help an Employee Locate an Errant PowerPoint?

May 13, 2021

“Semantics, Ambiguity, and the role of Probability in NLU” is a search-and-retrieval explainer. After half a century of search explaining, one would think that the technology required to enter a keyword and get a list of documents in which the key word appears would be nailed down. Wrong.

“Search” in 2021 embraces many sub disciplines. These range from explicit index terms like the date of a document to more elusive tags like “sentiment” and “aboutness.” Boolean has been kicked to the curb. Users want to talk to search, at least to Alexa and smartphones. Users want smart software to deliver results without the user having to enter a query. When I worked at Booz, Allen & Hamilton, one of my colleagues (I think his name was Harvey Poppel, the smart person who coined the phrase “paperless office”) suggested that someday a smart system would know when a manager walked into his or her office. The smart software would display what the person needed to know for that day. The idea, I think, was that whilst drinking herbal tea, the smart person would read the smart outputs and be more smart when meeting with a client. That was in the late 1970s, and where are we? On Zooms and looking at smartphones. Search is an exercise in frustration, and I think that is why venture firms continue to pour money into ideas, methods, concepts, and demos which have been recycled many times.

I once reproduced a chunk of Autonomy’s marketing collateral in a slide in one of my presentations. I asked those in the audience to guess at what company wrote the text snippet. There were many suggestions, but none was Autonomy. I doubt that today’s search experts are familiar with the lingo of search vendors like Endeca, Verity, InQuire, et al. That’s too bad because the prose used to describe those systems could be recycled with little or no editing for today’s search system prospects.

The write up in question is serious. The author penned the report late last year, but Medium emailed me a link to it a day ago along with a “begging for dollars” plea. Ah, modern online blogs. Works of art indeed.

The article covers these topics as part of the “search” explainer:

  • Ambiguity
  • Understanding
  • Probability

Ambiguity is interesting. One example is a search for the word “terminal.” Does the person submitting the query want information about a computer terminal, a bus terminal, or some other type of terminal; for instance the post terminal on the transformer to my model train set circa 1951? Smart software struggles with this type of ambiguity. I want to point out that a subject matter expert can assign a “field code” to the term and eliminate the ambiguity, but SMEs are expensive and they lose their index precision capability as the work day progresses.
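The field-code idea can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical three-document index in which a subject matter expert has already assigned a `field` tag to each item; the documents, tag names, and `search` function are invented for the example and do not describe any particular search system.

```python
# Hypothetical mini-index: each document carries a human-assigned field code.
documents = [
    {"id": 1, "text": "Configure the serial terminal for the mainframe",
     "field": "computing"},
    {"id": 2, "text": "The bus terminal closes at midnight",
     "field": "transport"},
    {"id": 3, "text": "Attach the wire to the post terminal on the transformer",
     "field": "electrical"},
]


def search(term, field=None):
    """Return ids of documents containing `term`, optionally limited to a field."""
    term = term.lower()
    return [
        doc["id"]
        for doc in documents
        if term in doc["text"].lower().split()
        and (field is None or doc["field"] == field)
    ]


# A bare keyword query matches every sense of the word.
print(search("terminal"))

# Adding the field code narrows the result to the sense the user meant.
print(search("terminal", field="transport"))
```

The keyword alone returns all three documents; the field code cuts the list to one. The catch, as noted above, is that someone has to assign those codes accurately.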

To deal with the “terminal” example, the modern system has to understand [a] what the user wants and [b] what the content objects are about. Yep, aboutness. Today’s smart software does an okay job with technical text because jargon like Octanitrocubane allows relatively on point identification of a document relevant to a chemist in Columbus, Ohio. Toss in a chemical structure diagram, and the precision of the aboutness ticks up a notch. However, if you search for a word replete with social justice meaning, smart software often has a difficult time figuring out the aboutness. One example is a reference to Skokie, Illinois. Is that a radical right wing code word or a town loved for Potawatomi linguistic heritage?

Probability is a bit more specific, usually. The idea in search is that numbers can illuminate some of the dark corners of text’s meaning. Examples are plentiful. Curious about Miley Cyrus on SNL and then at the after party? The search engine will display the most probable content based on whatever data is sluiced through the query matcher and stored in a cache. If others looked at specific articles, then, by golly, a query about Miley is likely or highly probable to be just what the searcher wanted. The difference between ambiguity, understanding, and probability is, in my opinion, part of the problem search vendors face. No one can explain why, after 50 years of SMART, Personal Library Software, STAIRS, et al., finding on point information remains frustrating, expensive, and ineffective.
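A rough sketch of the “others looked at it, so you probably want it” logic follows. It assumes a hypothetical click log and made-up article names, and it simply ranks the candidates by their share of past clicks; production ranking systems blend many more signals than this.

```python
from collections import Counter

# Hypothetical log of which articles earlier searchers opened.
click_log = [
    "snl-recap", "after-party", "snl-recap",
    "tour-dates", "snl-recap", "after-party",
]


def rank_by_click_probability(candidates, log):
    """Order candidates by their share of past clicks, highest first."""
    counts = Counter(item for item in log if item in candidates)
    total = sum(counts.values())
    if total == 0:
        # No click history at all: every candidate gets probability zero.
        return [(name, 0.0) for name in candidates]
    scored = [(name, counts[name] / total) for name in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


matches = ["after-party", "snl-recap", "tour-dates"]
for name, probability in rank_by_click_probability(matches, click_log):
    print(name, round(probability, 2))
```

Note what the numbers measure: what the crowd clicked, not what this particular searcher meant.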

The write up states:

ambiguity was not invented to create uncertainty — it was invented as a genius compression technique for effective communication. And it works like magic, because on the receiving end of the message, there is a genius decoding and decompression technique/algorithm to uncover all that was not said to get at the intended thought behind the message. Now we know very well how we compress our thoughts into a message using a genius encoding scheme, let us now concentrate on finding that genius decoding scheme — a task that we all call now ‘natural language understanding’.

Sounds great. Now try this test. You have a recollection of viewing a PowerPoint a couple of weeks ago at an offsite. You know who the speaker was and you want the slide with the number of instant messages sent per day on WhatsApp. How do you find that data?

[a] Run a query on your Fabasoft, SearchUnify, or Yext system?

[b] Run a query on Google in the hopes that the GOOG will point you to Statista, a company you believe will have the data?

[c] Send an email to the speaker?

[d] All of the above.

I would just send the speaker a text message and hope for an answer. If today’s search systems were smart, wouldn’t the single PowerPoint slide be in my email anyway? Sure, someday.

Stephen E Arnold, May 13, 2021

Great Moments in Censorship: Beethoven Bust

May 3, 2021

I noted a YouTube video called “Five Years on YouTube.” Well, not any longer. A highly suspect individual who has the gall to teach piano was deemed unacceptable. Was this a bikini haul by YouTube influencer/sensation Soph Mosca, who recently pointed out that reading a book was so, well, you know, ummm hard? Was it a pointer to stolen software, like the contributions of those outstanding creators who seem little troubled by YouTube’s smart software monitoring system?

Nope, the obviously questionable piano teacher with 29,000 people who want to improve their piano skills is a copyright criminal type.

Watch the video. Notice the shifty eyes. Notice the prison complexion. Notice the poor grammar, enunciation, and use of bad actor argot.

Did you hear these vile words:

  • Beethoven
  • APRA_CS, ECAD_CS, SOCAN, VCPMC_CS
  • Upsetting.

And the music? I think Beethoven is active on Facebook, Instagram, Twitter, and other social media channels. He is protected by the stalwarts at Amazon, Apple, and Google. Did he really tweet: “Persecute piano teachers”?

What’s he have to say about this nefarious person’s use of notes from the Moonlight Sonata?

Asking Beethoven is similar to asking Alexa or Siri something. The truth will be acted upon.

I think smart software makes perfect decisions even though accuracy ranges from 30 percent to 90 percent for most well crafted and fiddled models.

Close enough for horse shoes. And piano teachers! Ban them. Lock them up. Destroy their pianos.

Furthermore, the perpetrator of this crime against humanity is marina@thepianokeys.com. If you want to help her, please, contact her. Beyond Search remembers piano teachers, an evil brood. Ban them all, including Tiffany Poon and that equally despicable Dame Mitsuko Uchida who has brazenly performed Mozart’s Piano Concerto K. 271.

Cleanse the world of these spawn of Naamah.

Stephen E Arnold, May 3, 2021
