GoAccess: A Log Analyzer

June 4, 2020

We are updating the tools section of an upcoming National Crime Conference lecture. If you have access to a Web server and a log, you may want to take a look at GoAccess. The software was designed to be a fast, terminal-based log analyzer. Its core idea is to quickly analyze and view web server statistics in real time without needing to use your browser.

“Analytics without Google” provides additional information about the software and includes helpful pointers. The article states:

What I further liked about GoAccess is I could run it on a separate machine, transferring logs from multiple servers into one place, then creating my necessary dashboards; this isn’t a specific feature of GoAccess, but a feature of the Unix philosophy. This flexibility works well with my seemingly ephemeral Digital Ocean Droplets, which don’t go kaboom on their own, but rather suffer from my own tendencies to erase and start from scratch. GoAccess reminded me how beautiful composable tools are. Its feature set is minimal and it plays nicely with the tools already available to us on a *nix platform. Do one thing and do it well — words of wisdom.
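The workflow the quotation describes, pulling logs from several servers into one place and then summarizing them, can be sketched in a few lines of Python. This is an illustrative stand-in, not GoAccess and not how GoAccess is implemented. The sample log lines, the function names `parse` and `summarize`, and the assumption that the servers write the common/combined access log format are all hypothetical.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed format: host ident user [time] "request" status size
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def parse(line):
    """Return (timestamp, path, status), or None for lines that do not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return ts, m.group("path"), int(m.group("status"))

def summarize(*server_logs):
    """Merge lines from several servers, order them by time, tally paths and statuses."""
    entries = [e for log in server_logs for e in map(parse, log) if e is not None]
    entries.sort(key=lambda e: e[0])
    paths = Counter(path for _, path, _ in entries)
    statuses = Counter(status for _, _, status in entries)
    return entries, paths, statuses

# Hypothetical sample lines standing in for logs copied from two servers.
server_a = [
    '203.0.113.5 - - [04/Jun/2020:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512',
    '203.0.113.9 - - [04/Jun/2020:10:00:07 +0000] "GET /missing HTTP/1.1" 404 128',
]
server_b = [
    '198.51.100.7 - - [04/Jun/2020:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 512',
]

entries, paths, statuses = summarize(server_a, server_b)
print(paths.most_common(1))   # most requested path across both servers
print(dict(statuses))         # hits per HTTP status code
```

In practice the lists would be replaced by lines read from the collected log files. The point is only that merging and counting are small, composable steps.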

Worth a look.

Stephen E Arnold, June 4, 2020

Bookmarks and the Dynamic Web: Yes, Still a Problem

June 3, 2020

Apparently, bookmarks are a thing. Again. Memex from WorldBrain.io is an open source browser extension that allows users to annotate, search, and organize online information locally. The offline functionality supports both privacy and data ownership. It is available for Chrome, Firefox, and Brave browsers, and now offers a mobile app called Memex Go. The product page lists these features:

Full Text History Search: Automatically indexes websites you visit. Instantly recover anything you’ve seen without upfront work.

Highlights & Annotations: Keep your thoughts organized with their original context.

Tags, Lists & Bookmarks: Quickly organize content via the sidebar or keyboard shortcuts.

Quickly save & organize content on the go: Encrypted sync between your computer, iOS and Android devices.

Your Data and Attention are yours: Memex is offline first & WorldBrain.io introduced a cap on investor returns so we don’t exploit your attention and data to maximize investor profits.

The page illustrates each feature with a dynamic screen shot, so check it out for more details. You can also click here to learn more about their financial philosophy. The Basic version of Memex is free, while the Pro version costs € 2 per month or € 20 per year (after the 14-day free trial). WorldBrain.io hopes its software will contribute to a “well-informed and less polarized global society.” Based in Berlin, the company was founded in 2017.

Cynthia Murrell, June 3, 2020

The Good Old Internet Archive Attracts Some Legal Eagle Action

June 2, 2020

Who really owns the Internet Archive? Does the Internet Archive still bundle up tweets and provide them to the august Library of Congress? Are those the caterpillar tracks of the Bezos bulldozer in the roadway near the Internet Archive’s headquarters? DarkCyber does not know the answers to these questions.

What is clear is that the Association of American Publishers (yes, there are still American publishers) is not happy with the Internet Archive. “Publishers File Suit Against Internet Archive for Systematic Mass Scanning and Distribution of Literary Works. Ask Court to Enjoin and Deter Willful Infringement” is reasonably well written, probably because there is a Vassar literature major in the PR chain. The write up states:

… member companies of the Association of American Publishers (AAP) filed a copyright infringement lawsuit against Internet Archive (“IA”) in the United States District Court for the Southern District of New York. The suit asks the Court to enjoin IA’s mass scanning, public display, and distribution of entire literary works, which it offers to the public at large through global-facing businesses coined “Open Library” and “National Emergency Library,” accessible at both openlibrary.org and archive.org. IA has brazenly reproduced some 1.3 million bootleg scans of print books, including recent works, commercial fiction and non-fiction, thrillers, and children’s books.

The AAP, like DarkCyber, finds the self aggrandizing, virtue signaling about pandemics, unemployment, and other contentious social issues tiresome. Amazon and Google are busy waving their hands after recent social turmoil. That helps if one is seeking clicks and some positive corporate CxO stroking.

The AAP’s statement continues:

Despite the self-serving library branding of its operations, IA’s conduct bears little resemblance to the trusted role that thousands of American libraries play within their communities and as participants in the lawful copyright marketplace. IA scans books from cover to cover, posts complete digital files to its website, and solicits users to access them for free by signing up for Internet Archive Accounts. The sheer scale of IA’s infringement described in the complaint—and its stated objective to enlarge its illegal trove with abandon—appear to make it one of the largest known book pirate sites in the world. IA publicly reports millions of dollars in revenue each year, including financial schemes that support its infringement design.

You can read the AAP statement via the link above.

At a time when civility is in short supply, the AAP approaches its legal foe this way:

The lawsuit reflects widespread anger among publishers, authors, and the entire creative community regarding IA’s actions and its response to objections. In an open letter to IA and its Board of Directors, the Authors Guild observed,  “You cloak your illegal scanning and distribution of books behind the pretense of magnanimously giving people access to them. But giving away what is not yours is simply stealing, and there is nothing magnanimous about that. Authors and publishers—the rights owners who legally can give their books away—are already working to provide electronic access to books to libraries and the people who need them. We do not need Internet Archive to give our works away for us.”

Yikes. The news release should have carried a trigger warning. Where is that Vassar-powered red pencil when one needs it? After the Google-centric headline, why not add “This content may be disturbing”?

Will publishers succeed in this effort? The flapping of the legal eagles over the Google Books project is less noisy than it was. (When will those angry with Google realize that projects die at Google because staff lose interest?)

Internet Archive may be different. One bright spot: The search and retrieval mechanism for Internet Archive content is darned interesting. Try to find a content object. Great stuff. When a content object cannot be found, does it exist?

The lawsuit is unlikely to consider this question.

Stephen E Arnold, June 2, 2020

List of Online Libraries

June 2, 2020

One of the DarkCyber researchers spotted a list of online libraries. The source is an unlikely one: Voat and a contributor named Auchtung. The links point to Web sites which provide access to collections. The “collections” are often duplicates; that is, there is redundancy in the list. DarkCyber believes that if one library is taken down, another one can be located. If you are curious about lists of books offered without charge, navigate to this link. Registration may be required. Also, it is possible that some of the information offered on these Web pages is protected by copyright. Just a heads up.

Stephen E Arnold, June 2, 2020

Survey Says, Make the Content Go Away, Please

May 19, 2020

TechRadar states the obvious—“Want to Remove Information About Yourself Online? You’re Not Alone.” The write-up cites a recent Kaspersky survey of over 15,000 respondents. It confirms people are finally taking notice that their personal data has been making its way across the Web. The findings show a high percentage of Internet users have tried to erase personal information online, and for good reason, but many have met with little success. Writer Mike Moore reports:

“Four in five people (82 percent) surveyed in a major study by Kaspersky said they had tried to remove private information which had been publicly available, either from websites or social media channels, recently. However a third (37 percent) of those surveyed had no idea of how to remove details about themselves online. … [The survey] found that over a third (34 percent) of consumers have faced incidents where their private information was accessed by someone who did not have their consent. Of these incidents, over a quarter (29 percent) resulted in financial losses and emotional distress, and more than a third (35 percent) saw someone able to gain access to personal devices without permission. This rises to 39 percent among those aged between 25 and 34, despite younger internet users often being expected to have higher levels of technological literacy. Overall, one in five people say they are concerned about the personal data that organizations are collecting about them and their loved ones.”

The standard recommendation to protect privacy in the first place has been to use a VPN, but even that may be inadequate. A study performed by TechRadar Pro found that nearly half of all VPN services are based in countries that are part of the Fourteen Eyes international surveillance alliance. Looking for alternatives? Moore shares this link to a TechRadar article on what they say are the most secure VPN providers.

Cynthia Murrell, May 19, 2020

A Food Program Online: Shortages? What Shortages?

May 15, 2020

Retailers have adapted their sales model to serve customers during the COVID-19 pandemic. Humanity’s best qualities have also surfaced as we help each other during the bleak present. Combining charity and shopping in one, Walmart and Nextdoor launched a new assistance program. TechCrunch shares the story: “Nextdoor And Walmart Partner On A New Neighborly Assistance Program.”

Nextdoor is a neighborhood social network and the new “Neighbors Helping Neighbors” endeavor will allow vulnerable people to request shopping assistance from their neighbors who are already going to Walmart. Nextdoor users will post assistance requests in local groups via the Web site or app. Users can work out details through private messages or a message board.

The “Neighbors Helping Neighbors” program is a low-cost alternative to grocery delivery services and will help vulnerable people on fixed incomes.

The program is voluntary and Walmart is not monitoring the program. Walmart only partnered with Nextdoor to facilitate the “Neighbors Helping Neighbors” program.

“‘We’re inspired everyday by the kindness of people around the world who are stepping up and helping out. In recent weeks, we’ve been blown away by the number of members who have raised their hand to run an errand, go to the grocery store, or pick up a prescription for a neighbor,’ said Sarah Friar, Nextdoor CEO, about the feature. ‘We’re grateful for Walmart’s partnership to make this important connection between neighbors around vital services, and we’re proud to come together to ensure everyone has a neighborhood to rely on,’ she said.”

Helping vulnerable people during the COVID-19 pandemic is important, but the people who would use the “Neighbors Helping Neighbors” program could also be made targets via the same service. Nextdoor has useful information about crime which is not comprehensively indexed right now. The data about shopping and the “address / location” of those in need is a potential problem for users and an opportunity for bad actors to know whom to target.

And if there are food shortages? Virtue signals will light up.

Whitney Grace, May 15, 2020

Semantic SEO: Solution or Runway for Google Ads, Formerly AdWords?

May 14, 2020

I participated in a conversation with Robert David Steele, a former CIA professional, and a former Google software engineer named Zach Vorhies. One of the topics touched upon was Google’s relaxing of its relevance thresholds. A video of extracts from the conversation contains some interesting information; for example, the location of a repository of Google company documents Mr. Vorhies publicly released.

My contribution to the discussion focused on how valuable “relaxed” relevance is. The approach allows Google to display more ads per query. The “relaxed” query means that an ad inventory can be worked through more quickly than it would be IF old fashioned Boolean search were the norm for users. Advertisers’ eyes cross when Boolean search and the “relaxing” of a semantic method have to be explained.
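The difference between strict Boolean matching and “relaxed” matching can be shown with a toy model. The sketch below is not Google’s method. The ad texts, the query, and the rule that “relaxed” means “shares at least one term” are assumptions chosen only to show why loosening the match makes more inventory eligible for a single query.

```python
def boolean_and(query, docs):
    """Strict Boolean AND: keep only documents containing every query term."""
    terms = set(query.lower().split())
    return [d for d in docs if terms <= set(d.lower().split())]

def relaxed(query, docs):
    """Relaxed matching (assumed rule): keep documents sharing at least one term."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.lower().split())]

# Hypothetical ad inventory; the texts are invented for illustration.
ads = [
    "inconel alloy chemical properties data sheet",
    "pizza delivery near you",
    "chemical cleaning supplies",
    "alloy wheels sale",
]

query = "inconel alloy chemical properties"
strict_hits = boolean_and(query, ads)
loose_hits = relaxed(query, ads)

# The relaxed rule makes more of the inventory eligible for the same query.
print(len(strict_hits), "strict match(es);", len(loose_hits), "relaxed match(es)")
```

In this toy inventory the strict query surfaces only the on-point item, while the relaxed rule also pulls in cleaning supplies and wheels.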

DarkCyber’s research team prefers Boolean. None of the researchers needs training wheels. Mother Google (which seems to emulate Elsa Krebs of James Bond fame) and WFH Googlers bonding with their mobile phones like fuzzier, semantic, Tommy Bahama methods.

The team spotted “The Newbie’s Information to Semantic Search: Examples and Instruments.” Our interpretation of “newbies” is that the collective noun refers to desperate marketers who have to find a way to boost traffic to a Web site BEFORE going to his or her millennial leader and saying, “Um, err, you know, I think we have to start buying Google Ads.”

Yes, there is a link between the SEO rah rah and the Google online advertising system. The idea is simple. When SEO fails, the owner of the Web page has to buy Google Ads (formerly Google AdWords). In a future post, someone on the team will write about this interesting business process. Just not in this post, thank you.

The article triggering this essay includes what looks like simplified semi-technical diagrams. Plus, there are screenshots featuring Yo Yo Ma. And SEOish jargon; for example:

Knowledge as in “knowledge of any Web page.” DarkCyber finds categorical affirmatives a crime against logicians living and semantically dead.
Mapping as in “semantic mapping”

Plus, the write up seems to be an advertorial, a weaponized content object for a product called Optimizer. DarkCyber concluded that the system is a word look up tool, sort of a dumbed down thesaurus for hustlers, unemployed business administration junior college drop outs, and earnest art history majors working in the honorable discipline of SEO.

What does the semantic analysis convey to a reader unfamiliar with the concepts of “semantic,” “mark up,” and “knowledge”? The answer, in the view of the DarkCyber team, is less and less useful search results. Mr. Vorhies makes this point in the video cited above. In fact, he wants to go back to the “old Google.” Why? Today’s Google outputs frustratingly off point results.

The article’s main points, based on the DarkCyber interpretation of the article, are:

First, statements like this: “…don’t actually recognize how troublesome it’s to elucidate what’s being communicated with out the assistance of all “beyond-words” indicators.” Yeah, what? DarkCyber thinks the tortured words imply that smart software and data can light up the dark spaces of a user’s query. Stated another way: Search results should answer the user’s question with on point results. Yes, that sounds good. A tiny percentage of people using Google want to conduct an internal reference interview to identify what’s needed, select the online indexes to search, formulate the terms required for a query, and then run the query on multiple systems. Very few users of online search systems want to scan results, analyze the most useful content, dedupe and verify data, and then capture facts with appropriate bibliographic information. Many times, this type of process is little more than input for a more refined query. Who has time for a systematic, thorough informationizing process? Why? Saying the word “pizza” to a mobile phone is the way to go. If it works for pizza, the simple query will work for Inconel 235 chemical properties, right? This easy approach is called semantic. In reality it is a canned search with results shaped by advertisers who want clicks.

Second, a person desperately seeking traffic to a Web site must index content on a Web page. Today, “index” is a not-so-useful term. Today one “tags” a page with user assigned terms. Controlled vocabularies play almost no role in modern Web search systems. Just make up a term, then do a TikTok video and become a millionaire. Easy, right? To make tags more useful, one must use synonyms. If a page is about pizza, then a semantic tag is one that might offer the tag “vegetarian.” At least one of the DarkCyber team is old enough to remember being taught how to use a thesaurus and a dictionary. Today, one needs smart software to help the art major navigate the many words available in the English language.

Third, to make the best use of related words, the desperate marketer must embrace “semantic mapping.” The idea is to “visualize relationships between ideas and entities.” (The term “entity” is not defined, which the DarkCyber team thinks is perfectly okay for newbies who need help with indexing.) The idea of a semantic map is a Google generated search page, actually a report of allegedly related data, created by Google’s smart software. In grade school decades ago, students were taken to the library and taught about the “catalog.” Then students would gather information from “sources.” The discovered information was then winnowed and assembled into an essay or a report. If something looked or seemed funny, there was a reference librarian or a teacher to inform the student about the method for verifying facts. Now? Just trust Google. To make the idea vivid, the article provides another Google output. Instead of Yo Yo Ma, the topic is “pizza.” There you go.

The write up reminds the reader to use the third party application Text Optimizer for best results. And the bad news is that “semantic codes” must be attached to these semantically related index terms. One example is the command for deleted text. Indeed, helpful. Another tag is to indicate a direct quotation. No link to a source is suggested. Another useful method for the practicing hustler.

Let’s step back.

The article is all too typical of search engine optimization expertise. The intent is wrapped in the wool of jargon. The main point is to sell a third party software which provides training wheels to the thrashing SEO hungry individual. Plus, the content is not designed to help the user who needs specific information.

The focus of SEO is to add fluff to content. When the SEO words don’t do the job, what does the SEO marketer do?

Buy Google Ads. This is “pay to play”, and it is the one thing that Google relies upon for revenue.

Stephen E Arnold, May 14, 2020

New Warning System For False Information

April 24, 2020

False information, fake news, and disinformation have been popular words in the American vocabulary since Trump’s 2016 presidential win. Back in 2016 and to the current day, disinformation spreads faster than wildfire due to social media platforms, bots, and people determined to spread lies. Science Magazine explores one way to fight false information in the article, “Researchers Develop Early Warning System To Fight Disinformation Online.”

A University of Notre Dame research team developed an early warning system using AI designed to identify edited images, fake videos, and other false information online. The project’s goal is to catch social media campaigns that are meant to trigger violence and ruin democratic elections. The project is headed by personnel from Notre Dame’s Department of Computer Science and Engineering.

The team collected over two million images and other content about the Indonesian 2019 general election from Instagram and Twitter. They discovered that there were spontaneous and coordinated campaigns on social media started to ignite violence and influence the election.

These campaigns use classic propaganda techniques and are dangerous:

“Those campaigns consisted of manipulated images exhibiting false claims and misrepresentation of incidents, logos belonging to legitimate news sources being used on fabricated news stories and memes created with the intent to provoke citizens and supporters of both parties. While the ramifications of such campaigns were evident in the case of the Indonesian general election, the threat to democratic elections in the West already exists. The research team at Notre Dame, comprised of digital forensics experts and specialists in peace studies, said they are developing the system to flag manipulated content to prevent violence, and to warn journalists or election monitors of potential threats in real time.”

The disinformation detecting system is built to be scalable so users can configure it to monitor different content. Current problems the research team is experiencing are figuring out how to optimize scalability for ingestion and processing to deliver fast results.

The newest decade in the twenty-first century might be dubbed the “disinformation age,” because of the false information circulating the Web. Some of it is harmless, but anyone who deals with trolls knows that it does not take much to ignite mob mentality on the Internet.

Whitney Grace, April 24, 2020

The Online Cohorts: A Potential Blind Spot

April 15, 2020

In a conversation last week, a teacher told me, “We are not prepared to teach classes online.” I sympathized. What appears trivial to a person who routinely uses a range of technology, a person accustomed to automatic teller machines, a mobile phone, and an Alexa device, may befuddle others. Added to the sense of having to learn new procedures is the challenge of adapting in-person skills to instructing students via a different method; for example, Google Hangouts, Zoom, and other video conferencing services. How is that shift going? There are anecdotal reports that the shift is not going smoothly.

That’s understandable. More data will become available as researchers and hopefully some teachers report the efficacy of the great shift from a high touch classroom to a no touch digital setting.

I noted “Students Often Do Not Question Online Information.” The article provides a summary of research that suggests:

students struggle to critically assess information from the Internet and are often influenced by unreliable sources.

Again, understandable.

The article points out a related issue:

“Having a critical attitude alone is not enough. Instead, Internet users need skills that enable them to distinguish reliable from incorrect and manipulative information. It is therefore particularly important for students to question and critically examine online information so they can build their own knowledge and expertise on reliable information,” stated Zlatkin-Troitschanskaia. [Professor Olga Zlatkin-Troitschanskaia from JGU. The study was carried out as part of the Rhine-Main Universities (RMU) alliance.]

Online is a catalyst. The original compound is traditional classroom teaching methodologies. The new element is online. The result appears to raise the possibility of a loss of certain thinking skills.

Net net: A long period of adaptation may be ahead. The problem of humans who cannot do math or think in a manner that allows certain statements to be classified as bunk and others as not bunk is likely to have a number of downstream consequences.

In short, certain types of thinking and critical analysis may become quite rare. Informed decisions may not be helpful if the information upon which a choice is based operates from a different type of fact base.

Maybe not so good?

Stephen E Arnold, April 15, 2020

Petrucci Music Library: Refreshing and Mostly Free

April 15, 2020

One of the most important things video content creators need is music. Music licensing fees are expensive and creators on a budget usually cannot afford them. One answer is public domain music, but that is more difficult to find than you think. A good source is the Wikipedia equivalent for public domain music: IMSLP. About the organization:

“IMSLP, also known as the International Music Score Library Project or Petrucci Music Library, was started in 2006. The logo on the main page is a capital letter A. It was taken from the beginning of the very first printed book of music, the Harmonice Musices Odhecaton. It was published in Venice in 1501 by Ottaviano Petrucci, the library’s namesake. The IMSLP/Petrucci Music Library is currently owned and run by Project Petrucci LLC, a company created with the sole purpose of managing this site.”

Using the IMSLP requires a small subscription fee of $3/month or $28.00/year. For the fee, the library offers curated content, including audio files and scores, with no download waits and no ads.

Users can also upload their music to IMSLP under a creative commons license and have their work heard all over the world.

Searching for public domain music is risky for anything newer than the 1920s. Music can easily be labeled as “public domain,” but it is the Internet and you cannot trust anything unless you do your research. If you pay the subscription fee, IMSLP’s content is all public domain and you do not need to worry about copyright infringements.

Whitney Grace, April 13, 2020
