Science: Just Delete It

September 10, 2020

The information in “Dozens of Scientific Journals Have Vanished from the Internet, and No One Preserved Them” may remind some people that the “world’s information” and the “Internet archives” are marketing sizzle. The steak is the source document. The FBI has used the phrase “going dark” as shorthand for not being able to access certain information. The thrill of not having potentially useful information is one that most researchers prefer to reserve for thrill rides at Legoland.

The write up states:

Eighty-four online-only, open-access (OA) journals in the sciences, and nearly 100 more in the social sciences and humanities, have disappeared from the internet over the past 2 decades as publishers stopped maintaining them, potentially depriving scholars of useful research findings, a study has found. An additional 900 journals published only online also may be at risk of vanishing because they are inactive, says a preprint posted on 3 September on the arXiv server. The number of OA journals tripled from 2009 to 2019, and on average the vanished titles operated for nearly 10 years before going dark, which “might imply that a large number … is yet to vanish…

Flat earthers and those who believe that “just being” is a substitute for academic rigor are probably going to have a “thank goodness, these documents are gone” party. I won’t be attending.

Anti-intellectualism is really exciting. Plus, it makes life a lot easier for those in the top one percent of intellectual capability. Why? Extensive reading can fill in some blanks. Who wants to be comprehensive? Oh, I know: “Those who consume TikTok videos and devour Instagram while checking WhatsApp messages.”

Stephen E Arnold, September 10, 2020

A Librarian Looks at Google Dorking

August 24, 2020

In order to find solutions for their jobs, many people simply conduct a Google search. Teachers, executives, and software developers alike Google for answers. Software developers spend an inordinate amount of their time searching for code libraries and language tutorials. One developer named Alec had the brilliant idea to create a guide to “dorking.” What is dorking?

“Use advanced Google Search to find any webpage, emails, info, or secrets

cost: $0

time: 2 minutes

Software engineers have long joked about how much of their job is simply Googling things

Now you can do the same, but for free”

Dorking is free! That is great! How does it work? Dorking is a set of tips for using Boolean operators and other Google advanced search options to locate information. Dorking does, however, require a bit of coding knowledge to understand how it works.
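To make the operator idea concrete, here is a minimal Python sketch that glues common Google operators into one query string. The helper names `build_dork` and `search_url` are hypothetical; the operators themselves (quotes, `site:`, `intitle:`, `filetype:`, a leading minus) are standard Google syntax, although Google does not always honor them strictly. The code only builds the query and URL; it sends no request.

```python
from urllib.parse import urlencode

def build_dork(phrase=None, site=None, intitle=None, filetype=None, exclude=()):
    """Join common Google operators into one query string (hypothetical helper)."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')            # exact-phrase match
    if site:
        parts.append(f"site:{site}")           # limit results to one domain
    if intitle:
        parts.append(f'intitle:"{intitle}"')   # words must appear in the page title
    if filetype:
        parts.append(f"filetype:{filetype}")   # e.g. pdf or xls
    for word in exclude:
        parts.append(f"-{word}")               # drop pages containing this word
    return " ".join(parts)

def search_url(query):
    """Build a Google results URL for the query; nothing is fetched."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = build_dork(phrase="open access", site="arxiv.org", filetype="pdf", exclude=["preprint"])
print(query)              # "open access" site:arxiv.org filetype:pdf -preprint
print(search_url(query))
```

The same query can simply be pasted into the Google search box; the URL form is handy for bookmarking a dork.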

Some of these tips can be plugged straight into a Google search box, such as finding similar sites and finding specific pages that must include a phrase in the title text. Others need that coding knowledge to make them work. For example, finding every email on a Web page requires this:


Yep, dorking for everyone.
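Pulling every email address out of a page is, at bottom, a pattern-matching job. A minimal Python sketch, assuming the page’s HTML is already in hand as a string; `extract_emails` is a hypothetical helper, and the regular expression is a deliberate simplification rather than a complete email grammar.

```python
import re

# Simplified pattern: local part, "@", then a dotted domain ending in 2+ letters.
# It is not a full RFC 5322 email grammar and will miss some valid addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(html):
    """Return unique email-like strings in order of first appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(html)))

page = ('<a href="mailto:editor@example.org">Editor</a> '
        'or sales@example.com; editor@example.org again')
print(extract_emails(page))  # ['editor@example.org', 'sales@example.com']
```

`dict.fromkeys` removes duplicates while keeping the order in which addresses first appear.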

After a few practice trials, these dorking tips are sure to work for even the most novice of Googlers. They will also make anyone, not just software developers, appear to be an expert. As a librarian, I ask: why not assign field types and codes, bring back Boolean logic, and respect existing Google operators? Putting a word in quotes and then getting a result without the word is, how should I frame it? I know: dorky.

Whitney Grace, MLS, August 24, 2020

Kaggle ArXiv Dataset

August 7, 2020

“Leveraging ML to Fuel New Discoveries with the ArXiv Dataset” announces that more than 1.7 million journal-type papers are available without charge on Kaggle. DarkCyber learned:

To help make the ArXiv more accessible, we present a free, open pipeline on Kaggle to the machine-readable ArXiv dataset: a repository of 1.7 million articles, with relevant features such as article titles, authors, categories, abstracts, full text PDFs, and more.

What’s Kaggle? The article explains:

Kaggle is a destination for data scientists and machine learning engineers seeking interesting datasets, public notebooks, and competitions. Researchers can utilize Kaggle’s extensive data exploration tools and easily share their relevant scripts and output with others.

The ArXiv dataset contains metadata for each processed paper (document), including these fields:

  • ID: ArXiv ID (can be used to access the paper, see below)
  • Submitter: Who submitted the paper
  • Authors: Authors of the paper
  • Title: Title of the paper
  • Comments: Additional info, such as number of pages and figures
  • Journal-ref: Information about the journal the paper was published in
  • DOI: Digital Object Identifier
  • Abstract: The abstract of the paper
  • Categories: Categories / tags in the ArXiv system
  • Versions: A version history

Details about the data and their location appear at this link. You can use the ArXiv ID to download a paper.
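A minimal Python sketch of reading that metadata and turning an ArXiv ID into a PDF link. It assumes the download stores one JSON record per line and uses the field names listed above; `read_records` and `pdf_url` are hypothetical helpers, and the sample record is made up for illustration.

```python
import json

def read_records(lines):
    """Yield one metadata dict per non-empty line (JSON Lines layout assumed)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

def pdf_url(arxiv_id, version=None):
    """Build an ArXiv PDF link from an ID such as '0704.0001' or 'hep-th/9901001'."""
    return f"https://arxiv.org/pdf/{arxiv_id}{version or ''}"

# Made-up sample record using the documented field names.
sample = [
    '{"id": "0704.0001", "title": "Example title", "categories": "hep-ph", '
    '"versions": [{"version": "v1"}, {"version": "v2"}]}',
]
for rec in read_records(sample):
    latest = rec["versions"][-1]["version"]   # last entry in the version history
    print(rec["id"], rec["categories"].split(), pdf_url(rec["id"], latest))
# 0704.0001 ['hep-ph'] https://arxiv.org/pdf/0704.0001v2
```

For the real file, opening it and passing the file object to `read_records` streams the records without loading everything into memory; check the file name and layout on the Kaggle page first.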

What if you want to search the collection? You may want to download the terabyte-plus file and index the JSON using your favorite search utility. There’s a search system available from ArXiv, and you can use the site: operator on Bing or Google to see if one of those ad-supported services will point you to the document set you need.
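For the do-it-yourself route, a toy inverted index in Python shows the basic idea. It assumes each record is a dict with `id`, `title`, and `abstract` keys; the helper names and the three sample records are hypothetical, and for 1.7 million papers a real search engine would be the practical choice.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase word tokens; a crude stand-in for a real text analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(records):
    """Map each token to the set of IDs whose title or abstract contains it."""
    index = defaultdict(set)
    for rec in records:
        for token in tokenize(rec.get("title", "") + " " + rec.get("abstract", "")):
            index[token].add(rec["id"])
    return index

def search_all(index, query):
    """Return IDs containing every query term (Boolean AND), sorted."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

docs = [
    {"id": "a1", "title": "Quantum tunneling", "abstract": "Barrier penetration in a quartic potential."},
    {"id": "a2", "title": "Graph search", "abstract": "Indexing large corpora."},
    {"id": "a3", "title": "Quantum search", "abstract": "Grover style speedups."},
]
idx = build_index(docs)
print(search_all(idx, "quantum search"))  # ['a3']
```

Each term’s posting set is intersected with the others, so only records matching all terms survive.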

DarkCyber wants to suggest that you download the corpus now (datasets can go missing) and use your favorite search and retrieval system or content processing system to locate and make sense of the ArXiv content objects.

Stephen E Arnold, August 7, 2020

French Computer Terminology

August 1, 2020

This is a helpful resource. However, the term for “spreadsheet” is not included. If you want that spreadsheet holding a summary of your electricity bills, be sure to know the word “tableur.” You can find the collection of terms at this link. The compilation is not une faute passible d’un coup franc (a foul worthy of a free kick), but let’s check with the video assistant referee to be sure.

Stephen E Arnold, August 1, 2020

Online Books

June 16, 2020

The Internet Archive has pulled in its digital tentacles. Are there collections of online books that will not attract lawsuits from increasingly stressed “real” publishers?

The answer is, “Sort of.”

For a listing of “over three million free books on the Web”, point your Mother Hen browser at “The Online Books Page.” Some exploration is needed. The categories are not exactly easy to use, but what online index is these days?

The “Search Our Listings” feature lets a user search by author’s last name and by title. The problem, as many grade school students know, is that an author’s name can return many listings. To see what I mean, plug in “Plato.” There you go: a list of books that will dissuade some from locating the old guy who argued with Socrates (not the football-playing medical doctor from Brazil).

You can also access a feature called “Exclude extended shelves.” Despite the name, the NOT function delivers the goods. Why make Boolean into something that makes little sense?
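Underneath, an “exclude” checkbox is just AND NOT applied to the result list. A minimal Python sketch of that idea; the `source` field and the sample listings are hypothetical and do not reflect how The Online Books Page actually stores its data.

```python
# Hypothetical listings; "source" marks which shelf a record came from.
listings = [
    {"title": "The Republic", "author": "Plato", "source": "curated"},
    {"title": "Dialogues", "author": "Plato", "source": "extended"},
    {"title": "Phaedo", "author": "Plato", "source": "curated"},
]

def search_listings(listings, author, exclude_source=None):
    """author match, optionally AND NOT source == exclude_source."""
    hits = [b for b in listings if b["author"].lower() == author.lower()]
    if exclude_source is not None:
        hits = [b for b in hits if b["source"] != exclude_source]  # the NOT step
    return [b["title"] for b in hits]

print(search_listings(listings, "Plato"))
# ['The Republic', 'Dialogues', 'Phaedo']
print(search_listings(listings, "plato", exclude_source="extended"))
# ['The Republic', 'Phaedo']
```

Labeling the control as plain Boolean NOT would tell users exactly what it does.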

The new listings option delivers an earthworm result. If you like to browse, this is your Disneyland. Want magazines? Just click “Serials.” This page leads to more pages listing magazines. Some of the journals in the link to the Electronic Journals Library are not free. Well, free is relative, I suppose.

The effort to gather the information is admirable. Polishing, editorial control, and consistent presentation may arrive in the future.

It is worth checking for an author with whom one is familiar. Browsing can be interesting. Years ago I told a former client that no firm had a comprehensive index of electronic books. That company’s young and confident managers did not believe me. Flash forward to 2020: the problem still exists. There you go.

Stephen E Arnold, June 16, 2020

Bookmarks and the Dynamic Web: Yes, Still a Problem

June 3, 2020

Apparently, bookmarks are a thing. Again. Memex is an open source browser extension that allows users to annotate, search, and organize online information locally. The offline functionality supports both privacy and data ownership. It is available for Chrome, Firefox, and Brave browsers, and now offers a mobile app called Memex Go. The product page lists these features:

Full Text History Search: Automatically indexes websites you visit. Instantly recover anything you’ve seen without upfront work.

Highlights & Annotations: Keep your thoughts organized with their original context.

Tags, Lists & Bookmarks: Quickly organize content via the sidebar or keyboard shortcuts.

Quickly save & organize content on the go: Encrypted sync between your computer, iOS and Android devices.

Your Data and Attention are yours: Memex is offline first & introduced a cap on investor returns so we don’t exploit your attention and data to maximize investor profits.

The page illustrates each feature with a dynamic screen shot, so check it out for more details. You can also click here to learn more about the company’s financial philosophy. The Basic version of Memex is free, while the Pro version costs €2 per month or €20 per year (after the 14-day free trial). The developer hopes its software will contribute to a “well-informed and less polarized global society.” Based in Berlin, the company was founded in 2017.

Cynthia Murrell, June 3, 2020

DeepDyve Offers Viable Alternative To Academic Paywalls

March 30, 2020

Academic paywalls are the bane of researchers, even in the midst of the current health crisis. Why? Unless you are affiliated with a university or learning institution, you do not have immediate access to credible academic databases. Sure, there are public libraries, but their database resources are limited. There might be an alternative that is actually viable and affordable: DeepDyve.

What is DeepDyve?

“DeepDyve offers an affordable monthly subscription service that gives unlimited full-text access to an amazing collection of premium academic publications.”

Users have access to over eighteen million articles, including full-text pieces from over 15,000 peer-reviewed journals. The great thing about DeepDyve is that it is free to create an account, save searches, curate content, and export citations. The free version of DeepDyve is limited to articles from Google Scholar (a notoriously low quality database), PubMed, and abstracts from all other publications. DeepDyve has a Pro account option for $49/month or $360/year that gives users access to all content.
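Using the Pro prices quoted in the post ($49 a month, $360 a year), a quick Python check of what the annual plan saves. `annual_savings` is an illustrative helper, and current prices may differ.

```python
def annual_savings(monthly_price, annual_price):
    """Compare twelve monthly payments with one annual payment."""
    twelve_months = monthly_price * 12
    saved = twelve_months - annual_price
    return {
        "twelve_monthly": twelve_months,
        "saved": saved,
        "saved_pct": round(100 * saved / twelve_months, 1),
        "effective_monthly": round(annual_price / 12, 2),
    }

print(annual_savings(49, 360))
# {'twelve_monthly': 588, 'saved': 228, 'saved_pct': 38.8, 'effective_monthly': 30.0}
```

Paid yearly, the Pro plan works out to $30 a month.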

That is much cheaper than signing up for academic databases on an individual basis, and it lets users research from their own homes without an academic institution affiliation. However, does the lower price buy decent research materials?

DeepDyve does not appear to be hiding anything, because it lists all the different resources users can access with a subscription fee. Users can explore resources by research topic and see what a DeepDyve subscription offers.

DeepDyve could be a newer model for academic database and journal access. The big academic publishers still hold tons of power, but companies like DeepDyve could turn the publishing tide.

Whitney Grace, March 30, 2020

JSTOR: Some Free Info

March 23, 2020

Navigate to this link. Enter a query like “Kolmogorov Arnold” and you will see:


No registration, no begging for dollars. Why? Building goodwill?

What’s JSTOR? Wikipedia says:

JSTOR originally was conceived as a solution to one of the problems faced by libraries, especially research and university libraries, due to the increasing number of academic journals in existence. Most libraries found it prohibitively expensive in terms of cost and space to maintain a comprehensive collection of journals. By digitizing many journal titles, JSTOR allowed libraries to outsource the storage of journals with the confidence that they would remain available long-term. Online access and full-text search ability improved access dramatically.

JSTOR has an interesting history. DarkCyber will leave that research up to you, gentle reader. You have JSTOR to use for the research. Tip: The good stuff about JSTOR is not available from JSTOR.

The article about my relative’s math is available to you. “Quantum analogue of the Kolmogorov-Arnold-Moser Transition in Field Induced Barrier Penetration in a Quartic Potential” is much more interesting than battles with STM publishers, Aaron Swartz, and outflanking Ebsco.

Stephen E Arnold, March 23, 2020

British Maps Online: Finding a Map Is Challenging

February 26, 2020

The British Royal Collection recently made a new addition to its online collection. The blog Ian Visits explores it in the post “Huge Archive Of Old Military Maps Published.” The post explains that over three thousand maps that King George III collected have been digitized. Dr. Yolande Hodson headed the project and spent ten years cataloging George III’s collection. This is the first time in history that these documents have been available free to the public.

Scholars are amazed at the breadth and wealth of information in the maps, and casual users will enjoy them for their age and detail. The collection contains items from the sixteenth to eighteenth centuries, including maps drawn in the field, uniform depictions, fortification plans, and presentation maps of sieges, battles, and marches. King George III loved maps:

“Maps were an important part of George’s early life and education, and he built up a huge collection of more than 55,000 topographical, maritime and military prints, drawings, maps and charts. Upon the King’s death, his son, George IV, gave his father’s collections of topographical views and maritime charts to the British Museum (now in the British Library), but retained the military plans due to their strategic value and his own keen interest in the tactics of warfare.”

These maps offer a window to the past. They show how common soldiers and ordinary people dealt with their daily lives. They are not photographs, but they offer more detail than many a picture can.

Keep in mind that browsing may be needed to locate a particular map.

Whitney Grace, February 26, 2020

Are Media Worthless? Matt Taibbi Says Yes

January 3, 2020

Robert Steele, a former US spy whom I know, and also the top reviewer for non-fiction books in English, has published Review: Hate Inc. Why Today’s Media Makes Us Despise One Another by Matt Taibbi and given the book five stars, calling it “totally brilliant.”

I was drawn to this statement in Steele’s review:

There will come a time, guaranteed, when Americans pine for a powerful neither-party-aligned news network, to help make sense of things.

Steele’s review appears to provide a concise summary of the book that those who worry about accuracy, data integrity, ethics, and the concept of social value should find interesting. Steele concludes the review by noting:

The same is true of the intelligence community, and the academy, of non-profits and governments. Keep the money moving, never mind the facts.

Facts? Are facts irrelevant? Steele and Taibbi appear to agree that facts remain important. Dissenters: Possibly the “media?”

Stephen E Arnold, January 3, 2020
