Wonderful Statement about Baked In Search Bias

October 12, 2022

I was scanning the comments related to the Hacker News post for this article: “Google’s Million’s of Search Results Are Not Being Served in the Later Pages Search Results.”

Sailfast made this comment at this link:

Yeah – as someone that has run production search clusters before on technologies like Elastic / open search, deep pagination is rarely used and an extremely annoying edge case that takes your cluster memory to zero. I found it best to optimize for whatever is a reasonable but useful for users while also preventing any really seriously resource intensive but low value queries (mostly bots / folks trying to mess with your site) to some number that will work with your server main node memory limits.

The comment outlines a facet of search which is not often discussed.

First, the search plumbing imposes certain constraints. The idea of “all” information is one that many carry around like a trusted portmanteau. What are the constraints of the actual search system available or in use?
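One such constraint, the deep pagination problem Sailfast describes, can be sketched with back-of-envelope arithmetic. The snippet below is a simplified model, not a description of any particular engine's internals: it assumes a scatter-gather design in which every shard returns its own top (offset + page size) hits to a coordinating node. The function names and the shard count are hypothetical. The 10,000 default mirrors Elasticsearch's `index.max_result_window` setting, which rejects requests where `from + size` exceeds that number.

```python
from __future__ import annotations


def hits_buffered(offset: int, page_size: int, shards: int) -> int:
    """Rough count of hits the coordinating node must gather and sort.

    Assumption: no shard knows which of its hits will land on the global
    page, so each one returns its own top (offset + page_size) hits.
    """
    return shards * (offset + page_size)


def clamp_page(offset: int, page_size: int, max_window: int = 10_000) -> tuple[int, int]:
    """Trim a request so that offset + page_size never exceeds max_window."""
    page_size = max(0, min(page_size, max_window))
    offset = max(0, min(offset, max_window - page_size))
    return offset, page_size


if __name__ == "__main__":
    shards = 5  # hypothetical cluster size
    print(hits_buffered(0, 10, shards))        # first page of results
    print(hits_buffered(100_000, 10, shards))  # a page deep in the result set
    print(clamp_page(100_000, 10))             # the same request after capping
```

Under these assumptions the first page costs the coordinating node 50 buffered hits, while a page starting at offset 100,000 costs more than 500,000, all to display ten results. A cap like `clamp_page` is one way an engineer's "optimization" quietly decides what a user will never see.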

Second, optimization is a fancy word that translates to one or more engineers deciding what to do; for example, change a Bayesian prior assumption, trim content based on server latency, filter results by domain, etc.

Third, manipulation of the search system itself by software scripts or “bots” forces engineers to figure out what signals are okay and which are not okay. It is possible to inject poisoned numerical strings or phrases into a content stream and manipulate the search system. (Hey, thank you, search engine optimization researchers and information warfare professionals. Great work.)

When I meet a younger person who says, “I am a search expert”, I just shake my head. Even open source intelligence experts display that they live in a cloud of unknowing about search. Most of these professionals are unaware that their “research” comes from Google search and maps.

Net net: Search and retrieval systems manifest bias, from the engineers, from the content itself, from the algorithms, and from user interfaces themselves. That’s why I say in my lectures, “Life is easier if one just believes everything one encounters online.” Thinking in a different way is difficult. It requires specialist knowledge and a willingness to verify… everything.

Stephen E Arnold, October 12, 2022

US Federally Funded Research: Open Access, Folks

September 7, 2022

In a surprise announcement, reports Ars Technica, “US Government to Make All Research it Funds Open Access on Publication.” The new policy was issued by the Office of Science and Technology Policy (OSTP) at the end of August. We expect this will be a windfall for researchers—in and outside the US. Though the US government is believed to be the world’s largest funder of scientific research, only those paying for subscriptions to academic journals have had access to many (most?) publicly funded studies. Writer John Timmer notes this constraint has loosened in recent years as a result of increased open-access journals and, especially with COVID-19 research, a trend toward preprints. We learn:

“Some people involved in scientific publishing worried that these trends would undercut the finances of the entire publishing industry, while others hoped to push them to open up all scientific publishing. This tension played out in the halls of Congress, where competing legislation would mandate or block open access to federal research. A truce of sorts was reached during the Obama administration. For federally funded research, publishers had two choices: either make the publication open access from the start or have subscription-only access for a year before opening things up. Government-sponsored repositories were opened to host copies of papers that weren’t made open access on the publisher’s site. In the intervening time, there has been a lot of growth in open access journals, and many subscription journals allowed authors to pay a fee to immediately open published papers. Most subscription journals also offered COVID-related papers as open access without any additional fees. OSTP has apparently decided that these adjustments have prepared the industry to survive even greater access levels.”

One provision requires a digital identifier, like a DOI, for all data and documentation. The policy memorandum argues the benefits of open access became apparent during the pandemic, when it accelerated researchers’ understanding of the virus and the development of a vaccine. Acting head of the OSTP Alondra Nelson expects the change will lead to gains across society. She stated:

“When research is widely available to other researchers and the public, it can save lives, provide policymakers with the tools to make critical decisions, and drive more equitable outcomes across every sector of society.”

Publishers have some time to pivot—the policy goes fully into effect in 2026. The article notes they could still make a buck from these papers by creating versions with added features like integrated graphics / videos or cross-references to other studies. Will that be enough to soothe ruffled feathers?

Cynthia Murrell, September 7, 2022

Deep Fakes: Alarming Predictions Made Real

August 25, 2022

Here is one disturbing way deep fakes can provide bad actors with a new money-making opportunity. Reporting on a growing, Google Play-enabled problem, Rest of World tells us “Mexican Scam Loan Apps Will Edit Your Face onto X-Rated Photos and Send Them to Your Family.” Yes, at least one victim had such faked images sent to her contacts, including her minor daughter, along with the claim she had turned to prostitution to pay off her loan. In their odious efforts to collect more funds than they lent in the first place, these outfits will also dox victims and harass them with ceaseless threatening phone calls. Reporter Erika Lilian Contreras writes:

“Rest of World identified 94 loan apps listed as possibly related to doxxing activities across various Mexican cyber police departments; 35 of these are available on Google’s Play store, the app store on Android devices, which account for 80.87% of mobile internet traffic in Mexico. According to victims, government officials, digital rights activists, and platforms, gaps in Mexican law allow these lenders to continue to push scams on app stores and leave victims with no clear avenue to restitution or justice. Consumer education and criminal investigations that come after a crime has occurred are the only ways to currently combat this new type of digital financial fraud, but none of these efforts has stopped the apps from appearing in app stores, according to digital security experts who spoke to Rest of World.”

Mexico’s financial services regulators say it is not their problem since these loan apps are not registered financial institutions. This leaves the national and local cybersecurity agents who, so far, have been unable to keep the problem from growing. Bad actors know a good opportunity when they see one. The article notes:

“While the Mexican government can do little and the tech companies platforming scam loan apps won’t take responsibility, the onus has fallen onto civil society. … However, most activism is limited to educating Mexicans about the risks posed by scam loan apps.”

It would be nice if Google would supply even an ounce of prevention here as it finally did in India. That, however, was only after the Reserve Bank of India forcefully drew about 600 scam-credit apps to its attention.

We wonder: where will deep fake technology be weaponized next? We think we know one knock-on effect: Open source intelligence will be eroded or poisoned.

Cynthia Murrell, August 25, 2022

TikTok: Is Joe Rogan the Person to Blow the Whistle on Chinese Surveillance?

August 3, 2022

TikTok has been around since 2015 as A.me and Douyin. If you want to scrape below the shiny surface of the TikTok rags-to-riches story, there is something called Musical.ly which surfaced in 2014. In 2018, the Musical.ly management team decided that selling to ByteDance was a super great idea. Then TikTok was created to entertain and log data. Few talk about the link to certain entities in the Chinese political structure. Even fewer think that short videos were bad. Sure, there were allegations of self harm, addiction, erosion of self worth, and students who preferred watching vids pumped at them by a magical algorithm. Nobody, including some Silicon Valley real news people with an inflated view of their intellectual capabilities, said, “Yo, TikTok is a weaponized content delivery and surveillance system.” Nope. Just cute videos. What’s the problemo?

Who is now concerned about TikTok? The NSA? The CIA? The badge-and-gun entities in the US Federal government? Well, maybe. But the big voice is now a semi-real sports event announcer. “Joe Rogan Warns Americans about TikTok: China Knows Every … Thing You type.” Hey, Joe, don’t forget psychographic profiling to identify future insider operators, please.

The article reports:

Rogan listed the other data being collected by the popular platform. “‘User agent, mobile carrier, time zone settings, identifiers for advertising purpose, model of your device, the device system, network type, device IDs, your screen resolution and operating system, app and file names and types,’” he said. “So all your apps and all your file names, all the things you have filed away on your phone, they have access to that.” He continued: “‘File names and types, keystroke patterns or rhythms.’”

Hot intel, Mr. Rogan.

Where did this major news originate? From Mr. Rogan’s wellness infused research?

Nope. He read the terms of service.

The estimable newspaper pointed out:

… the tech news site Gizmodo reported that leaked internal documents from TikTok showed the extent to which the app sought to “downplay the China association.” The documents, labeled “TikTok Master Messaging” and “TikTok Key Messages,” detail the social media giant’s public relations strategy during a period of mounting scrutiny from regulators and lawmakers over its parent company ByteDance and its ties to the Chinese Communist Party.

Gizmodo? Is this Silicon Valley type “real news” outlet emulating Cryptome.org?

According to the cited New York Post story:

TikTok has pledged to “publish insights about the covert influence operations we identify and remove from our platform globally to show how seriously we take attempts to mislead our community.”

That sounds good just like a cyber security firm’s PowerPoint deck. Talk, however, is not action.

Maybe Mr. Rogan can use his ring announcer voice to catch people’s attention? I am not sure some of the TikTok lovers will listen or believe what Mr. Rogan discovered in the super stealthy terms of service for TikTok.

That’s real open source intel. Put Mr. Rogan on a panel at the next OSINT conference, please. I mean TikTok has a 10 year history and it seems to be quite new to some folks.

Stephen E Arnold, August 3, 2022

UK Organization to Harness Open Source Intelligence

June 30, 2022

Technical innovations over the last decade or so have empowered civilians with tools and information once the strict purview of government agencies. Now the war in Ukraine has prompted a new effort to harness that tech, we learn from the BBC article, “New UK Centre Will Help Fight Information War.” Those behind The Centre for Emerging Technology and Security (CETaS), based at the Alan Turing Institute, have noticed the efforts of Open Source Intelligence (OSINT) enthusiasts are proving effective against Russia’s disinformation campaign (outside of Russia anyway). The new center hopes to develop and channel this expertise. Reporter Gordon Corera writes:

“US and UK governments have been active in using open-source information to be able to talk publicly about what their secret sources are indicating. But this type of information is most powerfully used by those outside government to reveal what is really happening on the ground. On the evening of 23 February, graduate students in Monterey, California, who had been using publicly available satellite imagery to watch Russian tanks on the border with Ukraine, saw Google Maps showing a traffic jam inching towards the Ukrainian border. They tweeted that a war seemed to have started, long before any official announcement.”

Since the invasion, others have used OSINT to illuminate possible war crimes and counter Russian propaganda. For several years, many have considered Russia to be ahead of the tech information game with its weaponization of social media and hacking prowess. According to a pair of anonymous UK officials, however, the balance has shifted since the war began thanks to the skilled use of OSINT resources. Imagine what could be achieved if only such efforts were focused by a dedicated organization. The article continues:

“Harnessing new technology to maintain an edge is part of the new center’s mission. This could include fields like automated recognition of military vehicles from satellite imagery or social media, allowing human experts to spend their time on trickier problems. Tools are already allowing greater translation and interpretation of foreign language material. Artificial Intelligence can also be used to reveal patterns in behavior or language that indicate the presence of an organized disinformation network on social media. Dealing with these challenges at speed is one of the ambitions for the center which aims to build a community that can keep pace with the growing amount of data and tools to exploit it.”

But will the center be able to overcome the barriers? Intelligence agencies face not only regulatory and technical restrictions on what data they can use, but also a bias against information from beyond their institutions. We wonder whether the trend toward pay-to-play OSINT resources will help or hurt the cause.

Cynthia Murrell, June 30, 2022

Google: Is The Ad Giant Consistently Inconsistent?

June 21, 2022

Not long ago, the super bright smart software management team decided that Dr. Timnit Gebru’s criticism of the anti-bias efficacy of the company’s smart software was not in sync with the company’s party line. The fix? Create an opportunity for Dr. Gebru to find her future elsewhere. The idea that a Googler would go against the wishes of the high school science club donut selection was unacceptable. Therefore, there’s the open window. Jump on through.

I recall reading about Google’s self declared achievement of quantum supremacy. This was an output deemed worthy of publicizing. Those articulating this wild and crazy idea in the midst of other wild and crazy ideas met the checklist criteria for academic excellence, brilliant engineering, and just amazing results. Pick out a new work cube and soldier on, admirable Googler.

I know that the UK’s Daily Mail newspaper is one of the gems of online trustworthiness. Therefore, I read “Google Engineer Warns the Firm’s AI Is Sentient: Suspended Employee Claims Computer Programme Acts Like a 7 or 8-Year-Old and Reveals It Told Him Shutting It Off Would Be Exactly Like Death for Me. It Would Scare Me a Lot.” (Now that’s a Googley headline! A bit overdone, but SEO, you know.)

The write up states:

A senior software engineer at Google who signed up to test Google’s artificial intelligence tool called LaMDA (Language Model for Dialog Applications), has claimed that the AI robot is in fact sentient and has thoughts and feelings.

No silence of the lambda in this example.

The write up adds:

Lemoine worked with a collaborator in order to present the evidence he had collected to Google but vice president Blaise Aguera y Arcas and Jen Gennai, head of Responsible Innovation at the company dismissed his claims. He was placed on paid administrative leave by Google on Monday [June 6, 2022 presumably] for violating its confidentiality policy.

What do these three examples suggest to me this fine morning on June 12, 2022?

  1. Get shown the door for saying Google’s smart software is biased and may not work as advertised and get fired for saying the smart software works really super because it is now alive. Outstanding control of corporate governance and messaging!
  2. The Google people management policies are interesting? MBA students, this is a case example to research. Get the “right” answer, and you too can work at Google. Get the wrong answer, and you will not understand the “value” of calculating pi to lots of decimal places!
  3. Is the objective of Google’s smart software to make search “work” or burn through advertising inventory? If I were a Googler, I sure wouldn’t write a paper on this topic.

Ah, the Google.

Stephen E Arnold, June 21, 2022

Open Source: Dietary Insights

May 5, 2022

One of the more benign news briefs about Russia these days concerns the eating habits of the country’s secret police. The Verge explains how delivery apps revealed Russian law enforcement’s food preferences: “Data Leak From Russian Delivery App Shows Dining Habits Of The Secret Police.” A massive data leak from Yandex Food, a large food delivery service in Russia, contained names, addresses, phone numbers, and delivery instructions related to the secret police.

Yandex Food is a subsidiary of the Russian search engine of the same name. The data leak occurred on March 1 and Yandex blamed it on the bad actions of one of its employees. The leak did not include users’ login information. The Roskomnadzor, the Russian government agency responsible for mass media, threatened Yandex with a 100,000 ruble fine and it also blocked a map containing citizen and secret police data.

Bellingcat researchers were investigating leads on the poisoning of Alexey Navalny, the Russian opposition leader. They searched the Yandex Food database collected from a prior investigation and discovered a person who was in contact with Russia’s Federal Security Service (FSB) to plan Navalny’s poisoning. The individual used his work email to register with Yandex Food. They also searched for phone numbers linked to Russia’s Main Intelligence Directorate (GRU). Bellingcat found interesting information in the leak:

“Bellingcat uncovered some valuable information by searching the database for specific addresses as well. When researchers looked for the GRU headquarters in Moscow, they found just four results — a potential sign that workers just don’t use the delivery app, or opt to order from restaurants within walking distance instead. When Bellingcat searched for FSB’s Special Operation Center in a Moscow suburb, however, it yielded 20 results. Several results contained interesting delivery instructions, warning drivers that the delivery location is a military base. One user told their driver ‘Go up to the three boom barriers near the blue booth and call. After the stop for bus 110 up to the end,’ while another said ‘Closed territory. Go up to the checkpoint. Call [number] ten minutes before you arrive!’”

The most scandalous information leaked from the Yandex Food breach was information about Putin’s former mistress and their “suspected daughter.”

While it is hilarious to read about Russian law enforcement’s eating habits, it is alarming when the situation is applied to the United States. Imagine all of the information DoorDash, Grubhub, Uber Eats, and other delivery services collect on customers. There was a DoorDash data leak in 2019 that affected 4.9 million people and it was much larger than the Yandex Food leak.

Whitney Grace, May 5, 2022

Simple, Fair Digital Markets: Saddle Up, Don Quixote

March 25, 2022

Who knew that I would continue to reference the very long, very weird book I had to read in the seventh grade? Yet here I am: Don Quixote, slayer of windmills, a trusted sidekick, and a sturdy horse.

“Europe Agrees New Law to Curb Big Tech Dominance” explains that the proud animal and adept rider are ambling from the barn after decades of training. Tally ho! The write up says:

Under the Digital Markets Act (DMA), giants such as Google and Apple will be forced to open up their services and platforms to other businesses. Major technology firms have long faced criticism that they use their market dominance to squeeze out competition.

Now that certain US technology outfits are dominant, what’s the fix? I suppose one could dismount and paint the windmills a different color. Where would one locate a color? How about Googling? Alternatively one might consult a Facebook group. And there is the ever objective Amazon, complete with fake reviews and oddball videos showing a functioning product? Amazing.

Outfitted like the elegant Don, the trusted source of information reports:

The EU wants to give users more choice over how people send messages. The new rules would require that technology make their messaging services interoperable with smaller competitors.

As the rider, cohorts, and snorting animals charge at their targets, will the companies be fungible? Might they prove to be chimeras?

At least one of the evil entities is Googzilla? Despite its age, the creature still has teeth, lots of teeth, and lawyers, lots of lawyers.

Stephen E Arnold, March 25, 2022

Beyond Search and Dark Cyber Changes

February 14, 2022

Okay, I will be 78 in 2022. I have to be pragmatic about the content I have generated and posted without ads, commercial support, or compensation of any type since 2008. If you are a fan of Beyond Search, you will notice that we have removed the images, charts, graphs, and other visual accoutrements which we included in some blog posts. Why? I worked in online databases and publishing for many years before I retired. I operated within the boundaries of my understanding of fair use. I am now receiving machine generated allegations that I have not followed the definition of fair use now in play. Because I am creeping up in years, I don’t want to leave content online which can spawn assorted claims. Accordingly, we will be removing content. There are more than 12,000 posts in Beyond Search. Some of these contain obscure information about online search and retrieval. The illustrations in these were created by me. Nevertheless, these illustrations are goners as well.

And what about Dark Cyber? We have removed the videos posted as Honkin’ News and Dark Cyber from public access. If you want to view a video, you will have to go through a process which I have to determine. You can always ask about a video by writing benkent2020 at yahoo dot com.

Since I retired and stopped running around, giving lectures, and talking to people intrigued by my contrarian approach, traffic and viewership have slowly decreased. Now with the advent of artificially intelligent systems which proactively seek opportunities to assert that an entity has knowingly operated outside the boundaries of fair use, I am making these changes.

I will produce a new video series called “Stephen E Arnold’s OSINT Radar.” The illustrations in that series will come from the open source Web sites I talk about. In theory, this type of content will be within the boundaries of the fair use concept. If not, well, I am not sure what a person of my age can do. Die, for sure. Stop creating free, unsponsored, unbiased information, maybe.

One problem: With the online information I created over the years, those who are misinformed about certain aspects of search and the behavior of online information will never know how off base some of their systems, methods, and concepts are.

That’s the normal trajectory of the US democracy. As Alexis de Tocqueville observed, average is just average.

Stephen E Arnold, February 14, 2022

Open Source How To: Hook Teens to Social Media

January 19, 2022

I read “Internal Facebook Note: Here Is A “Psychological Trick” To Target Teens.” Interesting stuff. One of the insightful items in the write up is that Facebook shut down the TBH operation. Well, that’s an assertion which a prudent person may want to verify. The write up also contains one of the Cambridge Analytica-type insights, a mini step by step guide to hooking a target sector.

Here’s the how to:

TBH noticed that teens often list their high school in their Instagram bio. So, using a private Instagram account of its own, the company would visit a school’s location page and follow all accounts that included the school’s name. TBH made sure its private account featured a mysterious call to action — something like “You’ve been invited to the new RHS app — stay tuned!” The startup would make one private account for each high school it wanted to target. The company found teens were naturally curious and would follow the private account back.

Helpful, particularly to bad actors without access to a pool of psychological tricks.

Stephen E Arnold, January 19, 2022
