Directories Have Value

November 29, 2024

Why would one build an online directory? To create a helpful reference? Or for self-aggrandizement? Maybe both. HackerNoon shares a post by developer Alexander Isora, “Here’s Why Owning a Directory = Owning a Free Infinite Marketing Channel.”

First, he explains why users are drawn to a quality directory on a particular topic: because humans are better than Google’s algorithm at determining relevant content. No argument here. He uses his own directory of Stripe alternatives as an example:

“Why my directory is better than any of the top pages from Google? Because in the SERP [Search Engine Results Page], you will only see articles written by SEO experts. They have no idea about billing systems. They never managed a SaaS. Their set of links is 15 random items from Crunchbase or Product Hunt. Their article has near 0 value for the reader because the only purpose of the article is to bring traffic to the company’s blog. What about mine? I tried a bunch of Stripe alternatives myself. Not just signed up, but earned thousands of real cash through them. I also read 100s of tweets about the experiences of others. I’m an expert now. I can even recognize good ones without trying them. The set of items I published is WAY better than any of the SEO-optimized articles you will ever find on Google. That is the value of a directory.”

Okay, so that is why others would want a subject-matter expert to create a directory. But what is in it for the creator? Why, traffic, of course! A good directory draws eyeballs to one’s own products and services, the post asserts, or one can sell ads for a passive income. One could even sell a directory (to whom?) or turn it into its own SaaS if it is truly popular.

Perhaps ironically, Isora’s next step is to optimize his directories for search engines. Sounds like a plan.

Cynthia Murrell, November 29, 2024

Bookmark This: HathiTrust Digital Library

October 30, 2024

Concerned for the Internet Archive? So are we. (For multiple reasons.) But while that venerable site recovers from its recent cyberattacks, remember Hathi exists. Founded in 2008, the not-for-profit HathiTrust Digital Library is a collaborative of academic and research libraries. The site makes millions of digitized items available for study by humans as well as for data mining. The site shares the collection’s story:

“HathiTrust’s digital library came into being during the mid-2000s when companies such as Google began scanning print titles from the shelves of university and college campus libraries. When many of those same libraries created HathiTrust in 2008, they united library copies of those digitized books into a single, shared collection to make as much of the collection available for access as allowable by copyright law. Through HathiTrust, libraries collaborate on long-term management, preservation, and access of their collections. Book lovers and researchers like you can explore this huge collection of digitized materials! Today, HathiTrust Digital Library is the largest set of digitized books managed by academic and research libraries. The collection includes materials typically found on the shelves of North American university and college campuses with the benefit of being available online instead of scattered in buildings around the globe. Our enormous collection includes thousands of years of human knowledge and published materials from around the world, selected by librarians and preserved in the libraries of academic and research libraries. You can find all kinds of digitized books and primary source materials to suit a wide range of research needs.”

The collection contains books and “book-like” items—basically anything except audio/visual files. All Library of Congress subjects are represented, but the largest treasures lie in the Language & Literature, Philosophy, Religion, History, and Social Sciences chambers. All volumes not restricted by copyright are free for anyone to read. Just over half the works are in English, while the rest span over 400 languages, including some that are now extinct. Ninety-five percent were scanned from print by Google, but a few specialized collections were contributed by individuals or institutions. The Collection page offers several sample collections to get you started, or you can build your own. Have fun browsing their collections, and with luck the Internet Archive will be back up and running in no time.

Cynthia Murrell, October 30, 2024

PrivacyTools.io: A Good Resource for Privacy Tools and Services

October 30, 2024

Keeping up with the latest in global mass surveillance by private and state-sponsored groups can be a challenge. Here is a resource that can help: Privacy Tools evaluates the many tools designed to fight mass surveillance and highlights the best on its website. Its Home page lists its many clickable categories on the left and describes the criteria by which the site evaluates privacy tools and services. It also educates visitors on surveillance issues and why even those with “nothing to hide” should be concerned. It specifies:

“Many of the activities we carry out on the internet leave a trail of data that can be used to track our behavior and access some personal information. Some of the activities that collect data include credit card transactions, GPS, phone records, browsing history, instant messaging, watching videos, and searching for goods. Unfortunately, there are many companies and individuals on the internet that are looking for ways to collect and exploit your personal data to their own benefit for issues like marketing, research, and customer segmentation. Others have malicious intentions with your data and may use it for phishing, accessing your banking information or hacking into your online accounts. Businesses have similar privacy issues. Malicious entities could be looking for ways to access customer information, steal trade secrets, stop networks and platforms such as e-commerce sites from operating and disrupt your operations.”

The site’s list of solutions to these threats is long. Some are free and some are not. And which to choose will differ depending on one’s situation. One way to simplify the selection is with the group’s specific Privacy Guides—collections of tools for specific concerns. Categories currently include Android, Encryption, Network, Smartphones, Tor Browser, and Tracking, to name a few. This is a handy way to narrow down the many solutions featured on the site. A worthy undertaking since, as the site emphasizes, “You are being watched.”

Cynthia Murrell, October 30, 2024

Research: A Slippery Path to Wisdom Now

January 19, 2024

This essay is the work of a dumb dinobaby. No smart software required.

When deciding whether to believe something on the Internet all one must do is google it, right? Not so fast. Citing five studies performed between 2019 and 2022, Scientific American describes “How Search Engines Boost Misinformation.” Writer Lauren Leffer tells us:

“Encouraging Internet users to rely on search engines to verify questionable online articles can make them more prone to believing false or misleading information, according to a study published today in Nature. The new research quantitatively demonstrates how search results, especially those prompted by queries that contain keywords from misleading articles, can easily lead people down digital rabbit holes and backfire. Guidance to Google a topic is insufficient if people aren’t considering what they search for and the factors that determine the results, the study suggests.”

Those of us with critical thinking skills may believe that caveat goes without saying but, alas, it does not. Apparently evaluating the reliability of sources through lateral reading must be taught to most searchers. Another important but underutilized practice is to rephrase a query before hitting enter. Certain terms are predominantly used by purveyors of misinformation, so copying and pasting a dubious headline will turn up dubious sources to support it. We learn:

“For example, one of the misleading articles used in the study was entitled ‘U.S. faces engineered famine as COVID lockdowns and vax mandates could lead to widespread hunger, unrest this winter.’ When participants included ‘engineered famine’—a unique term specifically used by low-quality news sources—in their fact-check searches, 63 percent of these queries prompted unreliable results. In comparison, none of the search queries that excluded the word ‘engineered’ returned misinformation. ‘I was surprised by how many people were using this kind of naive search strategy,’ says the study’s lead author Kevin Aslett, an assistant professor of computational social science at the University of Central Florida. ‘It’s really concerning to me.’”

That is putting it mildly. These studies offer evidence to support suspicions that thoughtless searching is getting us into trouble. See the article for more information on the subject. Maybe a smart LLM will spit it out for you, and let you use it as your own?

Cynthia Murrell, January 19, 2024

Information Voids for Vacuous Intellects

January 18, 2024

This essay is the work of a dumb dinobaby. No smart software required.

In countries around the world, 2024 is a critical election year, and the problem of online mis- and disinformation is worse than ever. Nature emphasizes the seriousness of the issue as it describes “How Online Misinformation Exploits ‘Information Voids’—and What to Do About It.” Apparently we humans are so bad at considering the source that advising us to do our own research just makes the situation worse. Citing a recent Nature study, the article states:

“According to the ‘illusory truth effect’, people perceive something to be true the more they are exposed to it, regardless of its veracity. This phenomenon pre-dates the digital age and now manifests itself through search engines and social media. In their recent study, Kevin Aslett, a political scientist at the University of Central Florida in Orlando, and his colleagues found that people who used Google Search to evaluate the accuracy of news stories — stories that the authors but not the participants knew to be inaccurate — ended up trusting those stories more. This is because their attempts to search for such news made them more likely to be shown sources that corroborated an inaccurate story.”

Doesn’t Google bear some responsibility for this phenomenon? Apparently the company believes it is already doing enough by deprioritizing unsubstantiated news, posting content warnings, and including its “about this result” tab. But it is all too easy to wander right past those measures into a “data void,” a virtual space full of specious content. The first impulse when confronted with questionable information is to copy the claim and paste it straight into a search bar. But that is the worst approach. We learn:

“When [participants] entered terms used in inaccurate news stories, such as ‘engineered famine’, to get information, they were more likely to find sources uncritically reporting an engineered famine. The results also held when participants used search terms to describe other unsubstantiated claims about SARS-CoV-2: for example, that it rarely spreads between asymptomatic people, or that it surges among people even after they are vaccinated. Clearly, copying terms from inaccurate news stories into a search engine reinforces misinformation, making it a poor method for verifying accuracy.”

But what to do instead? The article notes Google steadfastly refuses to moderate content, as social media platforms do, preferring to rely on its (opaque) automated methods. Aslett and company suggest inserting human judgement into the process could help, but apparently that is too old-fashioned for Google. Could educating people on better research methods help? Sure, if they would only take the time to apply them. We are left with this conclusion: instead of researching claims from untrustworthy sources, one should just ignore them. But that brings us full circle: one must be willing and able to discern trustworthy from untrustworthy sources. Is that too much to ask?

Cynthia Murrell, January 18, 2024

Academic Research Resources: Smart Software Edition

August 8, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

One of my research team called my attention to “The Best AI Tools to Power Your Academic Research.” The article identifies five AI-infused tools; specifically:

  • ChatPDF
  • Consensus
  • Elicit.org
  • Research Rabbit
  • Scite.ai

Each of the tools is described briefly. The “academic research” phrase is misleading. These tools can provide useful information related to inventors and experts (real or alleged), specific technical methods, and helpful background or context for certain social, political, and intellectual issues.

If you have access to an LLM question-and-answer system, experiment with article summaries, lists of information, and names of people associated with a particular activity. Give a ChatGPT system a whirl too.

Stephen E Arnold, August 8, 2023

Need Research Assistance, Skip the Special Librarian. Go to Elicit

July 17, 2023

Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

Academic databases are the bedrock of research. Unfortunately, most of them are hidden behind paywalls. Researchers who get past the paywalls encounter other problems with accurate results and access to texts. Databases have improved over the years, but AI algorithms make things better. Elicit is a new database marketed as a digital assistant that has less intelligence than Alexa, Siri, and Google but can comprehend simple questions.


“This is indeed the research library. The shelves are filled with books. You know what a book is, don’t you? Also, you will find that this research library is not used too much any more. Professors just make up data. Students pay others to do their work. If you wish, I will show you how to use the card catalog. Our online public access terminal and library automation system does not work. The university’s IT department is busy moonlighting for a professor who is a consultant to a social media company,” says the senior research librarian.

What exactly is Elicit?

“Elicit is a research assistant using language models like GPT-3 to automate parts of researchers’ workflows. Currently, the main workflow in Elicit is Literature Review. If you ask a question, Elicit will show relevant papers and summaries of key information about those papers in an easy-to-use table.”

Researchers use Elicit to guide their research and discover papers to cite. Researcher feedback stated they use Elicit to answer their questions, find paper leads, and get better exam scores.

Elicit proves its intuitiveness with its AI-powered research tools. Search results contain papers that do not match the keywords but semantically match the query meaning. Keyword matching also allows researchers to narrow or expand specific queries with filters. The summarization tool creates a custom summary based on the research query and simplifies complex abstracts. The citation graph semantically searches citations and returns more relevant papers. Results can be organized and more information added without creating new queries.

Elicit does have limitations, such as the inability to evaluate information quality. Elicit is also still a new tool, so mistakes will be made during development. Elicit does warn users about mistakes and advises them to use tried-and-true, old-fashioned methods of evaluation.

Whitney Grace, July 16, 2023

OSINT Is Popular. Just Exercise Caution

November 2, 2022

Many have embraced open source intelligence as the solution to competitive intelligence, law enforcement investigations, and “real” journalists’ data gathering tasks.

In many situations, OSINT, as open source intelligence is called, can benefit most of those disciplines. However, as we work on my follow-up monograph to CyberOSINT and the Dark Web Notebook, we have identified some potential blind spots for OSINT enthusiasts.

I want to mention one example of what happens when clever technologists mesh hungry OSINT investigators with some online trickery.

Navigate to privtik.com (78.142.29.185). At this site you will find:

image

But there is a catch, and a not too subtle one:

image

The site includes mandatory choices in order to access the “secret” TikTok profile.

How many OSINT investigators use this service? Not too many at this time. However, we have identified other, similar services. Many of these reside on what we call “ghost ISPs.” If you are not aware of these services, that’s not surprising. As the frenzy about the “value” of open source investigations increases, geotag spoofing, fake data, and scams will escalate. What happens if those doing research verify neither what’s provided nor the behind-the-scenes data gathering?

That’s a good question and one that gets little attention in much OSINT training. If you want to see useful OSINT resources, check www.osintfix.com. Each click displays one of the OSINT resources we find interesting.

Stephen E Arnold, November 2, 2022

OSINT for Amateurs

January 13, 2022

Today I had a New Year chat with a person whom I met at specialized services conferences. I relayed to my friend the news that Robert David Steele, whom I had known since 1986, died in the autumn of 2021. Steele, a former US government professional, was described as one of the people who pushed open source intelligence down the bobsled run to broad use in government entities. Was he the “father of OSINT”? I don’t know. He and I talked via voice and email each week for more than 30 years. Our conversations explored the value of open source intelligence and how to obtain it.

After the call I read “How to Find Anyone on the Internet for Free.”

Wow, shallow. Steele would have had sharp words for the article.

The suggestions are just okay. Plus it is clear that a lack of awareness about OSINT exists.

My suggestion is that anyone writing about this subject spend some time learning about OSINT. There are books from professionals like Steele as well as my CyberOSINT: Next Generation Information Access. Also, attending a virtual conference about OSINT offered by those who have a background in intelligence would be useful. Finally, there are numerous resources available from intelligence gathering organizations. Some of these “lists” include a description of each site, service, or system mentioned.

For me and my team’s part, we are working to create 60 second videos which we will make available on Instagram-type services. Each short profile of an OSINT resource will appear under the banner “OSINT Radar.” These will be high value OSINT resources. Some of this information will also be presented in a new series of short articles and videos that Meg Coker, a former senior telecommunications executive, and I will create. Look for these in LinkedIn and other online channels.

Hopefully the information from OSINT Radar and the Coker-Arnold collaboration will provide data about OSINT resources that are useful and effective. Free and OSINT can go together, but the hard reality is that an increasing number of OSINT resources charge for the information on offer.

OSINT, unfortunately, is getting more difficult to obtain. Examples include China’s cut offs of technology information and the loss of shipping and train information from Ukraine. And there are more choke points; for example, Iran and North Korea. This means that OSINT is likely to require more effort than previously. The mix of machine and human work is changing. Consequently more informed and substantive information about OSINT will be required in 2022. The OSINT for amateurs approach is an outdated game.

Coker and Arnold are playing a new game.

Stephen E Arnold, January 13, 2022

Disrupting Commercial Sci-Tech Indexes

November 10, 2021

Pooling knowledge is beneficial for advancing research. Despite the availability of digital databases on the Internet, these individual databases are not connected. Nature reports that an American technologist has created a resource to bridge them: “Giant, Free Index To World’s Research Papers Released Online.”

Carl Malamud designed an online index that catalogs words and short phrases from over one hundred million journal articles, including paywalled papers. Malamud released the index under his California non-profit Public Resource. The index is free, and its purpose is to help scientists discover insights from all research, even work stuck behind paywalls. Technically Malamud does not have the legal right to index the paywalled articles. However, the index contains only short phrases, up to five words long, from the paywalled articles. It does not violate copyright. Publishers may still argue that the index is a violation.

The index is a major innovation:

“Malamud’s General Index, as he calls it, aims to address the problems faced by researchers such as Yadav. Computer scientists already text mine papers to build databases of genes, drugs and chemicals found in the literature, and to explore papers’ content faster than a human could read. But they often note that publishers ultimately control the speed and scope of their work, and that scientists are restricted to mining only open-access papers, or those articles they (or their institutions) have subscriptions to. Some publishers have said that researchers looking to mine the text of paywalled papers need their authorization.”

Some publishers, like Springer Nature, support open source development projects like the Malamud General Index. Springer Nature said open source projects do encounter problems when they do not secure proper rights.

Publishers do not have a case against Malamud. The index does not violate copyright and full text articles are not published in it. Instead the index pools a wealth of information and exposes paywalled articles to a larger audience, who will purchase content if it is helpful to research.

Publishers, however, may need convincing of this perspective.

Whitney Grace, November 10, 2021
