OSINT Analysts Alert: Biases Distilled to a One Page Cheat Sheet

March 20, 2023

“Toward Parsimony in Bias Research: A Proposed Common Framework of Belief-Consistent Information Processing for a Set of Biases” is an academic write up. Usually I ignore these for two reasons: [a] the documents are content marketing designed to get a grant or further a career and [b] the results are not reproducible.

The write up, despite my skepticism of real researchers, contains one page which I think is a useful checklist of the pitfalls which some people may happily [a] tumble into, [b] live in, and [c] actively seek.

I know this image is unreadable, but I wanted to provide it with a hyperlink so you can snag the image and the full document:


Excellent work.

Stephen E Arnold, March 20, 2023

How about This Intelligence Blindspot: Poisoned Data for Smart Software

February 23, 2023

One of the authors is a Googler. I think this is important because the Google is into synthetic data; that is, machine generated information for training large language models or what I cynically refer to as “smart software.”

The article / maybe reproducible research is “Poisoning Web Scale Datasets Is Practical.” Nine authors, of whom four are Googlers, have concluded that a bad actor, government, rich outfit, or crafty students in Computer Science 301 can inject information into content destined to be used for training. How can this be accomplished? The answer is either by humans, ChatGPT outputs from an engineered query, or a combination. Why would someone want to “poison” Web accessible or thinly veiled commercial datasets? Gee, I don’t know. Oh, wait, how about control information and framing of issues? Nah, who would want to do that?

The paper’s authors conclude with more than one-third of that Google goodness. No, wait. There are no conclusions. Also, there are no end notes. What there is is a road map explaining the mechanism for poisoning.

One key point for me is the question, “How is poisoning related to the use of synthetic data?”

My hunch is that synthetic data are more easily manipulated than going through the hoops to poison publicly accessible data. That’s time and resource intensive. The synthetic data angle makes it more difficult to identify the type of manipulations in the generation of a synthetic data set which could be mingled with “live” or allegedly-real data.

Net net: Open source information and intelligence may have a blindspot because it is not easy to determine what’s right, accurate, appropriate, correct, or factual. Are there implications for smart machine analysis of digital information? Yep, in my opinion already flawed systems will be less reliable and the users may not know why.

Stephen E Arnold, February 23, 2023

What Happens When Misinformation Is Sucked Up by Smart Software? Maybe Nothing?

February 22, 2023

I noted an article called “New Research Finds Rampant Misinformation Spreading on WhatsApp within Diasporic Communities.” The source is the Daily Targum. I mention this because the news source is the Rutgers University Campus news service. The article provides some information about a study of misinformation on that lovable Facebook property WhatsApp.

Several points in the article caught my attention:

  1. Misinformation on WhatsApp caused people to be killed; Twitter did its part too
  2. There is an absence of fact checking
  3. There are no controls to stop the spread of misinformation

What is interesting about studies conducted by prestigious universities is that often the findings are neither novel nor surprising. In fact, nothing about social media companies’ reluctance to spend money or launch ethical methods is new.

What are the consequences? Nothing much: Abusive behavior, social disruption, and, oh, one more thing, deaths.

Stephen E Arnold, February 22, 2023

Secrets Patterns Database

February 15, 2023

One of my researchers called my attention to “Secrets Patterns Database.” For those interested in finding “secrets”, you may want to take a look. The data and scripts are available on GitHub… for now. Among its features are:

  • “Over 1600 regular expressions for detecting secrets, passwords, API keys, tokens, and more.
  • Format agnostic. A Single format that supports secret detection tools, including Trufflehog and Gitleaks.
  • Tested and reviewed Regular expressions.
  • Categorized by confidence levels of each pattern.
  • All regular expressions are tested against ReDos attacks.”

Links to the author’s Web site and LinkedIn profile appear in the GitHub notes.
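
For readers who have not used pattern-based secret detection, here is a minimal sketch of the idea in Python. The two regular expressions are illustrative assumptions written for this example; they are not copied from the Secrets Patterns Database, and the function name `find_secrets` is hypothetical.

```python
import re

# Illustrative patterns only (assumptions for this sketch);
# they are NOT taken from the Secrets Patterns Database.
PATTERNS = {
    # Widely documented shape of an AWS access key ID: "AKIA" plus 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A loose "api_key = <long token>" assignment.
    "generic_api_key": re.compile(
        r"(?i)\bapi[_-]?key\s*[:=]\s*['\"]?([A-Za-z0-9_\-]{16,})"
    ),
}


def find_secrets(text):
    """Return (line_number, pattern_name) pairs for every line that matches."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((line_no, name))
    return hits
```

A real scanner such as Trufflehog or Gitleaks layers much more on top of this (entropy checks, verification, allow lists), so treat the sketch as a way to understand what the patterns are for, not as a replacement.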

Stephen E Arnold, February 20, 2023

Datasette: Useful Tool for Crime Analysts

February 15, 2023

If you want to explore data sets, you may want to take a look at the “open source multi-tool for exploring and publishing data.” The Datasette Swiss Army knife “is a tool for exploring and publishing data.”

The company says,

It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API. Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of 42 tools and 110 plugins dedicated to making working with structured data as productive as possible.

A handful of demos are available. Worth a look.
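
If you want to try the tool on your own data, one possible starting point is to load rows into a SQLite file, since Datasette works with SQLite databases. The sketch below uses only Python's standard library. The table name `incidents`, the column names, and the sample rows are placeholders invented for illustration, not real data.

```python
import sqlite3


def build_database(path, rows):
    """Write (case_id, category, city) rows into a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS incidents ("
            "case_id INTEGER PRIMARY KEY, category TEXT, city TEXT)"
        )
        # INSERT OR REPLACE keeps reruns from duplicating rows.
        conn.executemany(
            "INSERT OR REPLACE INTO incidents (case_id, category, city) "
            "VALUES (?, ?, ?)",
            rows,
        )
        conn.commit()
    finally:
        conn.close()


def count_rows(path):
    """Return the number of rows in the incidents table."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM incidents").fetchone()[0]
    finally:
        conn.close()


# Placeholder rows invented for this example; not real data.
SAMPLE_ROWS = [
    (1, "burglary", "Springfield"),
    (2, "fraud", "Shelbyville"),
    (3, "burglary", "Springfield"),
]
```

After calling `build_database("incidents.db", SAMPLE_ROWS)`, running `datasette incidents.db` from a shell should let you browse the table, assuming Datasette has been installed with `pip install datasette`.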

Stephen E Arnold, February 15, 2023

Modern Research Integrity: Stunning Indeed

February 13, 2023

I read “The Rise and Fall of Peer Review.” The essay addresses what happens when a researcher submits a research paper to a research journal. Many “research” journals are owned by big professional publishing companies. If you are not familiar with that sector, think about a publishing club which markets to libraries and “research” institutions. No articles in “research” publications, no promotion. The method for determining accuracy is to ask experts to read submitted papers, make comments, and send a signal about value of the “research.” I served on the peer review panel for a year and quit. I am no academic, but I know doo doo when it is on my shoe.

Now I want to focus on one passage. Consider this statement:

Why don’t reviewers catch basic errors and blatant fraud? One reason is that they almost never look at the data behind the papers they review, which is exactly where the errors and fraud are most likely to be. In fact, most journals don’t require you to make your data public at all. You’re supposed to provide them “on request,” but most people don’t. That’s how we’ve ended up in sitcom-esque situations like ~20% of genetics papers having totally useless data because Excel autocorrected the names of genes into months and years. (When one editor started asking authors to add their raw data after they submitted a paper to his journal, half of them declined and retracted their submissions. This suggests, in the editor’s words, “a possibility that the raw data did not exist from the beginning.”)


  1. There is exactly one commercial database which added corrections to its entries. Why? Accuracy is expensive and most publishers are not into corrections. I think the feature of that database has been in the trash heap for many, many years. The outfit which bought the database is not into excellence in anything but revenue and profit.
  2. I found it impossible to get [a] access to the author to whom I wanted to address a question directly; that is, on the telephone, or [b] the data on which the crazy statistical hoops were based. Hey, math is not the key differentiator for many researchers; getting tenure and grants are the prime movers. A peer reviewer with pointed questions? Sorry, no way.
  3. The professional publishers want to follow a process which shifts responsibility for publishing error-filled articles to the “procedure”, the peer reviewers, the editors, and probably the stray dog outside their headquarters. Everyone is responsible for mistakes except them.

Net net: Perhaps the notion of open source accuracy needs to be expanded beyond tweets and Facebook posts?

Stephen E Arnold, February 14, 2023

Easy Monitoring for Web Site Deltas

February 9, 2023

We have been monitoring selected Clear Web pages for a research project. We looked at a number of solutions and turned to VisualPing.io. The system is easy to use. Enter a URL of the Web page for which you want a notification of a delta (change). Enter an email, and the system will provide you with a notification. The service is free if you want to monitor five Web pages per day. The company has a pricing FAQ which explains the cost of more notifications. The Visual Ping service assumes a user wants to monitor the same Web site or sites on a continuous basis. In order to kill monitoring for one site, a bit of effort is required. Our approach was to have a different team member sign up and enter the revised monitor list. There may be an easier way, but without an explicit method, a direct solution worked for us.
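
For those who prefer to roll their own, the basic idea of delta detection can be sketched in a few lines of Python. This is an assumption-laden illustration, not a description of how VisualPing works: it simply fingerprints page text with a hash and compares snapshots. The function names `fingerprint` and `detect_changes` are hypothetical, and fetching the pages is left out.

```python
import hashlib
import re


def fingerprint(html):
    """Return a SHA-256 hex digest of the page text with whitespace collapsed."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare two {url: fingerprint} snapshots and return the URLs that changed.

    URLs that appear in only one snapshot are ignored, so dropping a site
    from the watch list does not raise a false alarm.
    """
    return sorted(
        url
        for url, digest in current.items()
        if url in previous and previous[url] != digest
    )
```

Store the snapshot from each run, pass the old and new snapshots to `detect_changes`, and send yourself an email for whatever comes back. Pages with rotating ads or timestamps will trigger on every run, so some filtering of the page text is usually needed before hashing.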

Stephen E Arnold, February 9, 2023

Interesting Search Tool: Tumbex

December 13, 2022

Interest in Open Source Intelligence has crossed what I call the Murdoch Wall Street Journal threshold. My MWSJ threshold is the point reached after a topic, person, or idea has bubbled along for a period of time, in this instance, decades. OSINT was a concept discussed by a number of people in the 1980s. In fact, one advocate (a former Marine Corps officer and government professional) organized open source intelligence conferences decades ago. That’s dinobaby history, and I know that few “real news” people remember Robert David Steele or his concepts about open source in general or OSINT in particular. (If you are curious about the history, email the Beyond Search team at benkent2020 @ yahoo dot com. Why? I participated in Mr. Steele’s conferences for many years, and we worked on a number of open source projects for a range of clients until shortly before his death in August 2021.) Yep, history. Sometimes knowing about events can be helpful.

Let’s talk about online information; specifically, an OSINT tool available since 2014 if my memory is working this morning. The tool is called Tumbex. With it, one can search Tumblr content.


Here’s what the Web site says:

Tumbex indexes only tumblr posts which have caption or tags. We analyse the content and define if tumblr or posts are nsfw/adult. If your tumblr was detected as nsfw by mistake, you can request a review and we will manually check your tumblr.

This is interesting. However, with a bit of query testing one can find some quite sporty content on the service.

The service, which allegedly became available in 2014, is hosted by the French outfit OVH. According to StatShow, Tumbex has experienced a jump in traffic. The site is not particularly low profile because it has a user base of an estimated one million humans or bots. (Please, keep in mind that click data are often highly suspect regardless of source.) FYI: StatShow can be a useful OSINT resource as well.

If you are interested in some of the OSINT resources my team relies upon, navigate to www.osintfix.com. Click the image and a new window will open with an OSINT resource displayed. No ads, no trackers, no editorial. Just an old fashioned 1994 Web site which can be used to fill an idle moment.

Now that the MWSJ threshold has been crossed, OSINT is a thing, an almost-overnight success with some youthful experts emphasizing that the US government has been asleep at the switch. I am not sure that assessment is one I can fully support.

Stephen E Arnold, December 13, 2022

OSINT: HandleFinder

November 22, 2022

If you are looking for a way to identify a user “handle” on various social networks, you may want to take a look at HandleFinder. The service appears to be offered without a fee. The developer does provide a “Buy Me a Coffee” link, so you can support the service. The service accepts a user name. We used our favorite ageing teen screen name ibabyrainbow or babyrainbow on some lower profile services. HandleFinder returned 31 results on our first query. (We ran the query a second time, and the system returned 30 results. We found this interesting.)

The services scanned included Patreon, TikTok, and YouTube, among others. The service did not scan the StreamGun video on demand service or NewRecs.

In order to examine the results, one clicks on the service name which is underlined. Note that once one clicks the link, the result set is lost. We found that the link should be opened in a separate tab or window to eliminate the need to rerun the query after each click. That’s how one of my team discovered the count variance.

When there is no result, the link in HandleFinder does not make this evident. Links to ibabyrainbow on Instagram returned “Page not found.” The result for Linktr.ee returned the Linktr.ee page of links, which means more clicking.
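
A rough Python sketch shows the two chores described above: building candidate profile links for a handle and spotting which service dropped out between two runs. The URL templates are assumptions based on each site's public profile address format, not anything taken from HandleFinder, and the function names are hypothetical. No network requests are made.

```python
from urllib.parse import quote

# Assumed public profile URL formats; verify each before relying on it.
PROFILE_TEMPLATES = {
    "Instagram": "https://www.instagram.com/{handle}/",
    "Linktree": "https://linktr.ee/{handle}",
    "Patreon": "https://www.patreon.com/{handle}",
    "TikTok": "https://www.tiktok.com/@{handle}",
    "YouTube": "https://www.youtube.com/@{handle}",
}


def candidate_urls(handle):
    """Return {service: url} for a handle, dropping whitespace and a leading '@'."""
    safe = quote(handle.strip().lstrip("@"), safe="")
    return {
        service: template.format(handle=safe)
        for service, template in PROFILE_TEMPLATES.items()
    }


def diff_runs(first, second):
    """Return the services that appear in one result list but not the other."""
    return sorted(set(first) ^ set(second))
```

Keep in mind the lesson from the Instagram result: a link that loads is not proof that an account exists, so each candidate still needs a human look.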

If you are interested in chasing down social media handles, you may want to check out this service. It is promising and hopefully will be refined.

Stephen E Arnold, November 22, 2022

Free and Useful OSINT Resource

November 8, 2022

For anyone interested in OSINT resources, here is a free eBook from low-profile intelware vendor Babel Street: Cybersecurity Insiders hosts “Open Source Intelligence (OSINT) Use Cases.” As the name implies, the volume describes practical applications of OSINT tools. The description reads:

“Businesses must be continually ready to mitigate all types of commercial and corporate risks. Some risks are known and easy to spot, but many are unknown and constantly evolving – your organization must be prepared to manage all of them or face serious consequences. A robust open-source intelligence (OSINT) platform is the answer. It combines publicly available information (PAI) data sources, with curated data streams, and filters to generate the actionable intelligence required to enhance protection or take action. This eBook includes twelve use cases exploring how OSINT tools help generate insights needed to drive improvements across:

  • Cyber risk management
  • Brand risk management
  • Operational risk management
  • Due diligence”

The book opens with a summary of how OSINT works and how it can be used. One notable use case is the very physical Event and Venue Protection under the otherwise BI-centered Brand Risk Management. One must provide some basic information before downloading the free resource, including name, company, email address, and phone number. Once registered with Cybersecurity Insiders, though, one has access to the site’s considerable roster of free cybersecurity resources. This includes another volume from Babel Street, “Best Practices for Using Publicly Available Information in Global Risk Management.” The firm’s clients include both government agencies and private enterprises. Not coincidentally, its AI analytics platform can assist organizations with the use cases described in the book. Based outside DC in Reston, Virginia, Babel Street was founded in 2009.

Cynthia Murrell, November 8, 2022
