The Only Dataset Search Tool: What Does That Tell Us about Google?

April 11, 2024

This essay is the work of a dumb dinobaby. No smart software required.

If you like semi-jazzy, academic write ups, you will revel in “Discovering Datasets on the Web Scale: Challenges and Recommendations for Google Dataset Search.” The write up appears in a publication associated with Jeffrey Epstein’s favorite university. It may be worth noting that MIT and Google have teamed up to offer a free course in artificial intelligence. That is the next big thing, which does hallucinate at times while creating considerable marketing angst among the techno-giants jousting to emerge as the go-to source of the technology.

Back to the write up. Google created a search tool to allow a user to locate datasets accessible via the Internet. There are more than 700 data brokers in the US. These outfits will sell data to most people who can pony up the cash. Examples range from six figure fees for the Twitter stream to a few hundred bucks for boat license holders in states without much water.

The write up says:

Our team at Google developed Dataset Search, which differs from existing dataset search tools because of its scope and openness: potentially any dataset on the web is in scope.


A very large, money oriented creature enjoins a worker to gather data. If someone asks, “Why?”, the monster says, “Make up something.” Thanks MSFT Copilot. How is your security today? Oh, that’s too bad.

The write up does the academic thing of citing articles which talk about data on the Web. There is even a table which organizes the types of data discovery tools. The categorization of general and specific is brilliant. Who would have thought there were two categories of vertical search engine focused on Web-accessible data? I thought there was just one category; namely, gettable. The idea is that if the data are exposed, take them. Asking permission just costs time and money. The idea is that one can apologize and keep the data.

The article includes a Googley graphic. The French portal, the Italian “special” portal, and the Harvard “dataverse” are identified. Were there other Web accessible collections? My hunch is that Google’s spiders suck down, as one famous Googler said, “all” the world’s information. I will leave it to your imagination to fill in other sources for the dataset pages. (I want to point out that Google has some interesting technology related to converting data sets into normalized data structures. If you are curious about the patents, just write benkent2020 at yahoo dot com, and one of my researchers will send along a couple of US patent numbers. Impressive system and method.)

The section “Making Sense of Heterogeneous Datasets” is peculiar. First, the Googlers discovered the basic fact of data from different sources: the data structures vary. Think in terms of grapes and deer droppings. Second, the data cannot be “trusted.” The team writing the paper offers no fix for this issue. Third, the authors appear to be unaware of the patents I mentioned, particularly the useful example about gathering and normalizing data about digital cameras. The method applies to other types of processed data as well.

I want to jump to the “beyond metadata” idea. This is the mental equivalent of “popping” up a perceptual level. Metadata are quite important and useful. (Isn’t it odd that Google strips high value metadata from its search results; for example, time and date?) The authors of the paper work hard to explain that the Google approach to dataset search adds value by grouping, sorting, and tagging with information not in any one data set. This is common sense, but the Googley spin on this is to build “trust.” Remember: This is an alleged monopolist engaged in online advertising and co-opting certain Web services.

Several observations:

  1. This is another of Google’s high-class PR moves. Hooking up with MIT and delivering razz-ma-tazz about identifying spiderable content collections in the name of the greater good is part of the 2024 Code Red playbook, it seems. From humble brags about smart software to crazy assertions like quantum supremacy, today’s Google is a remarkable entity.
  2. The work on this “project” is divorced from time. I checked my file of Google-related information, and I found no information about the start date of a vertical search engine project focused on spidering and indexing data sets. My hunch is that it has been in the works for a while, although I can pinpoint 2006 as a year in which Google’s technology wizards began to talk about building master data sets. Why no time specifics?
  3. I found the absence of AI talk notable. Perhaps Google does not think a reader will ask, “What’s with the use of these data?” I can’t use this tool, so why spend the time, effort, and money to index information from a country like France, which is not one of Google’s biggest fans? (Paris was, however, the roll out choice for the answer to Microsoft and ChatGPT’s smart software announcement. Plus that presentation featured incorrect information, as I recall.)

Net net: I think this write up, with its quasi-academic blessing, is a bit of advance information to use in the coming wave of litigation about Google’s use of content to train its AI systems. This is just a hunch, but there are too many weirdnesses in the academic write up to write off as intern work or careless research writing, which is more difficult to get away with in the wake of the stochastic monkey dust up.

Stephen E Arnold, April 11, 2024

Backpressure: A Bit of a Problem in Enterprise Search in 2024

March 27, 2024


I have noticed numerous references to search and retrieval in the last few months. Most of these articles and podcasts focus on making an organization’s data accessible. That’s the same old story told since the days of STAIRS III and other dinobaby artifacts. The gist of the flow of search-related articles is that information is locked up or silo-ized. Using a combination of “artificial intelligence,” “open source” software, and powerful computing resources — problem solved.


A modern enterprise search content processing system struggles to keep pace with the changes to already processed content (the deltas) and the flow of new content in a wide range of file types and formats. Thanks, MSFT Copilot. You have learned from your experience with Fast Search & Transfer file indexing it seems.

The 2019 essay “Backpressure Explained — The Resisted Flow of Data Through Software” is pertinent in 2024. The essay, written by Jay Phelps, states:

The purpose of software is to take input data and turn it into some desired output data. That output data might be JSON from an API, it might be HTML for a webpage, or the pixels displayed on your monitor. Backpressure is when the progress of turning that input to output is resisted in some way. In most cases that resistance is computational speed — trouble computing the output as fast as the input comes in — so that’s by far the easiest way to look at it.

Mr. Phelps identifies several types of backpressure. These are:

  1. More info to be processed than a system can handle
  2. Reading and writing file speeds are not up to the demand for reading and writing
  3. Communication “pipes” between and among servers are too small, slow, or unstable
  4. A group of hardware and software components cannot move data where it is needed fast enough.
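
The first type — a producer outrunning a consumer — can be sketched with a bounded queue. This is a minimal illustration, not something from Mr. Phelps’s essay; the names and the artificial delay are invented for the example. When the queue fills, the producer blocks, which is backpressure made visible:

```python
import queue
import threading
import time

# A bounded queue: when full, put() blocks, pushing backpressure onto the producer.
jobs = queue.Queue(maxsize=5)
processed = []

def producer():
    for doc_id in range(20):
        jobs.put(doc_id)   # blocks whenever the consumer falls behind
    jobs.put(None)         # sentinel: no more work

def consumer():
    while True:
        doc_id = jobs.get()
        if doc_id is None:
            break
        time.sleep(0.01)   # simulate slow content processing
        processed.append(doc_id)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start()
t2.start()
t1.join()
t2.join()
print(len(processed))  # 20: every item processed, none dropped
```

The alternative to blocking is dropping or buffering without bound, which is how indexing pipelines fall behind or fall over.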

I have simplified his more elegantly expressed points. Please, consult the original 2019 document for the information I have hip hopped over.

My point is that in the chatter about enterprise search and retrieval, there are a number of situations (use cases to those non-dinobabies) which create some interesting issues. Let me highlight these and then wrap up this short essay.

In an enterprise, the following situations exist and are often ignored or dismissed as irrelevant. When people pooh-pooh my observations, it is clear to me that these people have [a] never been subject to a legal discovery process associated with enterprise search fraud and [b] are entitled whiz kids who don’t do too much in the quite dirty, messy, “real” world. (I do like the variety in T shirts and lumberjack shirts, however.)

First, in an enterprise, content changes. These “deltas” are a giant problem. None of the systems I have examined, tested, installed, or advised about has a procedure to identify, in anything close to real time, a change made to a PowerPoint, presented to a client, and converted to an email confirming a deal, price, or technical feature. In fact, no one may know until the president’s laptop is examined by an investigator who discovers the “forgotten” information. Even more exciting is when the opposing legal team’s review of a laptop dump as part of a discovery process “finds” the sequence of messages and connects the dots. Exciting, right? But “deltas” pose another problem. These modified content objects proliferate like gerbils. One can talk about information governance, but it is just that — talk, meaningless jabber.
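
A minimal way to notice a delta, assuming nothing about any vendor’s pipeline, is to keep a content fingerprint per document and compare it on each crawl. The file names and contents below are hypothetical; the point is that without stored state from the previous pass, a changed PowerPoint looks identical to an unchanged one:

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Stable fingerprint of a document's bytes."""
    return hashlib.sha256(content).hexdigest()

# State from the previous indexing pass: path -> fingerprint
previous = {
    "deck.pptx": fingerprint(b"original slide deck"),
    "terms.docx": fingerprint(b"contract terms v1"),
}

# What the crawler sees on the current pass
current = {
    "deck.pptx": fingerprint(b"original slide deck"),       # unchanged
    "terms.docx": fingerprint(b"contract terms v2"),        # modified: a delta
    "confirm.eml": fingerprint(b"email confirming a deal"), # new object
}

deltas = [p for p in current if p in previous and current[p] != previous[p]]
new_items = [p for p in current if p not in previous]
print(deltas)     # ['terms.docx']
print(new_items)  # ['confirm.eml']
```

Even this toy version shows the cost: the system must re-fetch and re-hash everything, every pass, just to learn what changed — which is why “near real time” delta detection is talk, not practice.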

Second, the content which an employee needs to answer a business question in a timely manner can reside on an employee’s laptop or a mobile phone, in a digital notebook, in a Vimeo video or one of those nifty “private” YouTube videos, behind the locked doors and specialized security systems loved by some pharma company’s research units, in a Word document in something other than English, etc. Now the content is changed. The enterprise search fast talkers ignore identifying and indexing these documents with metadata that pinpoints the time of the change and who made it. Is this important? Some contract issues require this level of information access. Who asks for this stuff? How about a COTR for a billion dollar government contract?

Third, I have heard and read that modern enterprise search systems “use,” “apply,” or “operate within” industry standard authentication systems. Sure they do, within very narrowly defined situations. If the authorization system does not work, then quite problematic things happen. Examples range from an employee failing to find the information needed and making a really bad decision to an employee going on an Easter egg hunt which may or may not work; if the egg found is good enough, then that’s what gets used. What happens? Bad things can happen. Have you ridden in an old Pinto? Access control is a tough problem, and it costs money to solve. Enterprise search solutions, even the whiz bang cloud centric distributed systems, implement something, which is often not the “right” thing.
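
The access control point can be illustrated with a toy “security trimming” pass: filter the hit list against an access control list before the user ever sees it. The documents and group names here are invented for the sketch; real systems must do this against a live directory service, which is exactly where they break:

```python
# Toy security trimming: drop search hits the user is not entitled to see.
# ACL: document -> set of groups allowed to read it (hypothetical data).
acl = {
    "q3-forecast.xlsx": {"finance"},
    "press-release.txt": {"finance", "marketing", "engineering"},
    "pay-bands.pdf": {"hr"},
}

def trim(hits, user_groups):
    """Return only the hits whose ACL intersects the user's groups."""
    return [doc for doc in hits if acl.get(doc, set()) & user_groups]

hits = ["q3-forecast.xlsx", "press-release.txt", "pay-bands.pdf"]
print(trim(hits, {"marketing"}))             # ['press-release.txt']
print(trim(hits, {"finance", "marketing"}))  # ['q3-forecast.xlsx', 'press-release.txt']
```

Note the default in `acl.get(doc, set())`: a document with no ACL entry is hidden, not shown. Many shipped systems make the opposite choice, which is how the Easter egg hunts begin.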

Fourth, and I am going to stop here, there is the problem of end-to-end encrypted messaging systems. If you think employees do not use these, I suggest you do a bit of Easter egg hunting. What about the content in those systems? You can tell me, “Our company does not use these.” I say, “Fine. I am a dinobaby, and I don’t have time to talk with you because you are so much more informed than I am.”

Why did I romp through this rather unpleasant issue in enterprise search and retrieval? The answer is, “Enterprise search remains a problematic concept.” I believe there is some litigation underway about how the problem of search can morph into a fantasy of a huge business because “we have a solution.”

Sorry. Not yet. Marketing and closing deals are different from solving findability issues in an enterprise.

Stephen E Arnold, March 27, 2024

A Look at Web Search: Useful for Some OSINT Work

February 22, 2024


I read “A Look at Search Engines with Their Own Indexes.” For me, the most useful part of the 6,000 word article is the identified search systems. The author, a person with the identity Seirdy, has gathered in one location a reasonably complete list of Web search systems. Pulling such a list together takes time and reflects well on Seirdy’s attention to a difficult task. There are some omissions; for example, the iSeek education search service (recently repositioned), and Biznar.com, developed by one of the founders of Verity. I am not identifying problems; I just want to underscore that tracking down, verifying, and describing Web search tools is a difficult task. For a person involved in OSINT, the list may surface a number of search services which could prove useful; for example, the Chinese and Vietnamese systems.


A new search vendor explains the advantages of a used convertible driven by an elderly person to take a French bulldog to the park once a day. The clueless fellow behind the wheel wants to buy a snazzy set of wheels. The son in the yellow shirt loves the vehicle. What does that car sales professional do? Some might suggest that certain marketers lie, sell useless add ons, patch up problems, and fiddle the interest rate financing. Could this be similar to search engine cheerleaders and the experts who explain them? Thanks ImageFX. A good enough illustration with just a touch of bias.

I do want to offer several observations:

  1. Google dominates Web search. There is an important distinction not usually discussed when some experts analyze Google; that is, Google delivers “search without search.” The idea is simple. A person uses a Google service, of which there are many. Take, for example, Google Maps. Google runs queries when users take non-search actions; for example, clicking on another part of a map. That’s a search for restaurants, fuel services, etc. Sure, much of the data are cached, but this is an invisible search. Competitors and would-be competitors often forget that Google search is not limited to the Google.com search box. That’s why Google’s reach is going to be difficult to erode quickly. Google has other search tricks up its very high-tech ski jacket’s sleeve. Think about search-enabled applications.
  2. There is an important difference between building one’s own index of Web content and sending queries to other services. The original Web indexers have become like rhinos and white tigers. It is faster, easier, and cheaper to create a search engine which just uses other people’s indexes. This is called metasearch. I have followed the confusion between search and metasearch for many years. Most people do not understand or care about the difference in approaches. This list illustrates how Web search is perceived by many people.
  3. Web search is expensive. Years ago, when I was an advisor to Bear Stearns (an estimable outfit indeed), my client and I were on a conference call with Prabhakar Raghavan (then a Yahoo senior “search” wizard). He told me and my client, “Indexing the Web costs only $300,000 US.” Sorry, Dr. Raghavan (now the Googler who made the absolutely stellar Google Bard presentation in France after MSFT and OpenAI caught Googzilla with its gym shorts around its ankles in early 2023), you were wrong. That’s why most “new” search systems look for shortcuts. These range from recycling open source indexes to ignoring pesky robots.txt files to paying some money to use assorted also-ran indexes.
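
The “pesky robots.txt files” are machine readable, and honoring them costs a crawler almost nothing. Python’s standard library can check whether a bot is even allowed to fetch a page; the rules and the bot name below are made up for the example:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical site's robots.txt (normally fetched from /robots.txt).
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A polite crawler checks before fetching; ignoring this is the "shortcut."
print(rp.can_fetch("MyBot", "https://example.com/private/data.csv"))  # False
print(rp.can_fetch("MyBot", "https://example.com/public/page.html"))  # True
print(rp.crawl_delay("MyBot"))  # 10 seconds between requests
```

The convention is voluntary, which is the point: the systems that skip the check are choosing to, not failing to.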

Net net: Web search is a complex, fast-moving, and little-understood business. People who know now do other things. The Google means overt search, embedded search, and AI-centric search. Why? That is a darned good question which I have tried to answer in my different writings. No one cares. Just Google it.

PS. Download the article. It is a useful reference point.

Stephen E Arnold, February 22, 2024

The Next Big Thing in Search: A Directory of Web Sites

February 12, 2024


In the early 1990s, an entrepreneur with whom I had worked in the 1980s convinced me to work on a directory of Web sites. Yahoo was popular at the time, but my colleague had a better idea. The good news is that our idea worked, and the online service we built became part of the CMGI empire. Our service was absorbed by one of the leading finding services at the time. Remember Lycos? My partner and I do. Now the Web directory is back, decades after those original Yahooligans and our team provided a useful way to locate a Web site.

“Search Chatbots? Pah, This Startup’s Trying on Yahoo’s Old Outfit of Web Directories” presents information about the utility of a directory of Web sites and captures some interesting observations about the findability service El Toco.


The innovator driving the directory concept is Thomas Chopping, a “UK based economist.” He made several observations in a recent article published by the British outfit The Register; for example:

“During the decades since it launched, we’ve been watching Google steadily trying to make search more predictive, by adding things like autocomplete and eventually instant answers,” Chopping told The Register. “This has the byproduct of increasing the amount of time users spend on their site, at the expense of visiting the underlying sources of the data.”

The founder of El Toco also notes:

It’s impossible to browse with conversational-style search tools, which are entirely focused on answering questions. “Right now, this is playing into the hands of Meta and TikTok, because it takes so much effort to find good quality websites via search engines that people stopped bothering.”

El Toco wants to facilitate browsing, and the model is a directory listing. The user can browse and click. The system displays a Web site for the user to scan, read, or bookmark.

Another El Toco principle is:

“We don’t need the user’s personal data to work out which results to show, because the user can express this on their own. We don’t need AI to turn the search into a conversation, because this can be done with a few clicks of the user interface.”

The economist-turned-entrepreneur points out:

“Charging users for Web search is a model which clearly doesn’t work, thanks to Neeva for demonstrating that, so we allow adverts but if the users care they can go into a menu and simply switch them off.”

Will El Toco gain traction? My team and I have been involved in information retrieval for decades, from indexing information about nuclear facilities to providing some advice to an AI search start up a few months ago. I have learned that predicting what will become the next big thing in findability is quite difficult.

A number of interesting Web search solutions are available. Some are niche-focused like Biznar. Others are next-generation “phinding” services like Phind.com. Others are metasearch solutions like iSeek. Some are less crazy Google-style systems like Swisscows. And there are more coming every day.

Why? Let me share several observations or “learnings” from a half century of working in the information retrieval sector:

  1. People have different information needs and a one-size-fits-all search system is fraught with problems. One person wants to search for “pizza near me”. Another wants information about Dark Web secure chat services.
  2. Almost everyone considers themselves a good or great online searcher. Nothing could be further from the truth. Just ask the OSINT professionals at any intelligence conference.
  3. Search companies with some success often give in to budgeting for a minimally viable system, selling traffic or user data, and to dark patterns in pursuit of greater revenue.
  4. Finding information requires effort. Convenience, however, is the key feature of most finding systems. Microfilm is not convenient; therefore, it sucks. Looking at research data takes time and expertise; therefore, old-fashioned work sucks. Library work involving books is not for everyone; therefore, library research sucks. Only a tiny percentage of online users want to exert significant effort finding, validating, and making sense of information. Most people prefer to doom scroll or watch dance videos on a mobile device.

Net net: El Toco is worth a close look. I hope that editorial policies, human curation, and frequent updating become the new normal. I am just going to remain open minded. Information is an extremely potent tool. If I tell you human teeth can explode, do you ask for a citation? Do you dismiss the idea because of your lack of knowledge? Do you begin to investigate the effect of high voltage on the body of a person who works around a 133 kV transmission line? Do you dismiss my statement because I am obviously making up a fact, since everyone knows that electricity is 115 to 125 volts?

Unfortunately only subject matter experts operating within an editorial policy and given adequate time can figure out if a scientific paper contains valid data or made-up stuff like that allegedly crafted by the former presidents of Harvard and Stanford University and probably faculty at the university closest to your home.

Our 1992 service had a simple premise. We selected Web sites which contained valid and useful information. We did not list porn sites, stolen software repositories, and similar potentially illegal or harmful purveyors of information. We provided the sites our editors selected with an image file that was our version of the old Good Housekeeping Seal of Approval.

Point (Top 5% of the Internet)

The idea was that in the early days of the Internet and Web sites, a parent or teacher could use our service without too much worry about setting off a porn storm or a parent storm. It worked, we sold, and we made some money.

Will the formula work today? Sure, but excellence and selectivity have been key attributes for decades. Give El Toco a look.

Stephen E Arnold, February 12, 2024

The American Way: Loose the Legal Eagles! AI, Gray Lady, AI.

December 29, 2023


With the demands of the holidays, I have been remiss in commenting upon the festering legal sores plaguing the “real” news outfits. Advertising is tough to sell. Readers want some stories, not every story. Subscribers churn. The dead tree version of “real” news turns yellow in the windows of the shrinking number of bodegas, delis, and coffee shops interested in losing floor space to “real” news displays.


A youthful senior manager enters Dante’s fifth circle of Hades, the Flaming Legal Eagles Nest. Beelzebub wishes the “real” news professional good luck. Thanks, MSFT Copilot, I encountered no warnings when I used the word “Dante.” Good enough.

Google may be coming out of the dog training school with some slightly improved behavior. The leash does not connect to a shock collar, but maybe the courts will curtail some of the firm’s more interesting behaviors. The Zuckbook and X.com are news shy. But the smart software outfits are ripping the heart out of “real” news. That hurts, and someone is going to pay.

Enter the legal eagles. The target is AI or smart software companies. The legal eagles say, “AI, gray lady, AI.”

How do I know? Navigate to “New York Times Sues OpenAI, Microsoft over Millions of Articles Used to Train ChatGPT.” The write up reports:

The New York Times has sued Microsoft and OpenAI, claiming the duo infringed the newspaper’s copyright by using its articles without permission to build ChatGPT and similar models. It is the first major American media outfit to drag the tech pair to court over the use of stories in training data.

The article points out:

However, to drive traffic to its site, the NYT also permits search engines to access and index its content. "Inherent in this value exchange is the idea that the search engines will direct users to The Times’s own websites and mobile applications, rather than exploit The Times’s content to keep users within their own search ecosystem." The Times added it has never permitted anyone – including Microsoft and OpenAI – to use its content for generative AI purposes. And therein lies the rub. According to the paper, it contacted Microsoft and OpenAI in April 2023 to deal with the issue amicably. It stated bluntly: "These efforts have not produced a resolution."

I think this means that the NYT used online search services to generate visibility, access, and revenue. However, it did not expect, understand, or consider that when a system indexes content, that content is used for other search services. Am I right? A doorway works two ways. The NYT wants it to work one way only. I may be off base, but the NYT is aggrieved because it did not understand the direction of AI research which has been chugging along for 50 years.

What do smart systems require? Information. Where do companies get content? From online sources accessible via a crawler. How long has this practice been chugging along? Since the early 1990s, even earlier if one considers text and command line only systems. Plus, the NYT tried its own online service and failed. Then it hooked up with LexisNexis, only to pull out of the deal because the “real” news was worth more than LexisNexis would pay. Then the NYT spun up its own indexing service. Next the NYT dabbled in another online service. Plus the outfit acquired About.com. (Where did those writers get that content? I know the answer, but does the Gray Lady remember?)

Now, with the success of another generation of software which the Gray Lady overlooked, did not understand, or blew off because it was dealing with high school management methods in its newsroom, the Gray Lady has let loose the legal eagles.

What do I make of the NYT and online? Here are the conclusions I reached working on the Business Dateline database and then as an advisor to one of the NYT’s efforts to distribute the “real” news to hotels and steam ships via facsimile:

  1. Newspapers are not very good at software. Hey, those Linotype machines were killers, but the XyWrite software and subsequent online efforts have demonstrated remarkable ways to spend money and progress slowly.
  2. The smart software crowd is not in touch with the thought processes of those in senior management positions in publishing. When the groups try to find common ground, arguments over who pays for lunch are more common than a deal.
  3. Legal disputes are expensive. Many of those engaged reach some type of deal before letting a judge or a jury decide which side is the winner. Perhaps the NYT is confident that a jury of its peers will find the evil AI outfits guilty of a range of heinous crimes. But maybe not? Is the NYT a risk taker? Who knows. But the NYT will pay some hefty legal bills as it rushes to do battle.

Net net: I find the NYT’s efforts following a basic game plan. Ask for money. Learn that the money offered is less than the value the NYT slaps on its “real” news. The smart software outfit does what it has been doing. The NYT takes legal action. The lawyers engage. As the fees stack up, the idea that a deal is needed makes sense.

The NYT will do a deal, declare victory, and go back to creating “real” news. Sigh. Why? Microsoft has more money and can tie up the matter in court until Hell freezes over in my opinion. If the Gray Lady prevails, chalk up a win. But the losers can just up their cash offer, and the Gray Lady will smile a happy smile.

Stephen E Arnold, December 29, 2023

Google: Rock Solid Arguments or Fanciful Confections?

November 17, 2023


I read some “real” news from a “real” newspaper. My belief is that a “real” journalist, an editor, and probably some supervisory body reviewed the write up. Therefore, by golly, the article is objective, clear, and actual factual. What does “What Google Argued to Defend Itself in Landmark Antitrust Trial” say?


“I say that my worthy opponent’s assertions are — ahem, harrumph — totally incorrect. I do, I say, I do offer that comment with the greatest respect. My competitors are intellectual giants compared to the regulators who struggle to use Google Maps on an iPhone,” opines a legal eagle who supports Google. Thanks, Microsoft Bing. You have the “chubby attorney” concept firmly in your digital grasp.

First, the write up says zero about the secrecy in which the case is wrapped. Second, it does not offer any comment about the amount the Google paid to be the default search engine other than offering the allegedly consumer-sensitive, routine, and completely logical fees Google paid. Hey, buying traffic is important, particularly for outfits accused of operating in a way that requires a US government action. Third, the support structure for the Google arguments is not evident. I could not discern the logical thread that linked the components presented in such lucid prose.

The pillars of the logical structure are:

  1. Appropriate payments for traffic; that is, the Google became the default search engine. Do users change defaults? Well, sure they do. If true, then why pay to be the default in the first place? What are the choices? A Russian search engine, a Chinese search engine, a shadow of Google (Bing, I think), or a metasearch engine (little or no original indexing, just Vivisimo-inspired mash up results)? But pay the “appropriate” amount Google did.
  2. Google is not the only game in town. Nice terse statement of questionable accuracy. That’s my opinion which I articulated in the three monographs I wrote about Google.
  3. Google fosters competition. Okay, it sure does. Look at the many choices one has: Swisscows.com, Qwant.com, and the estimable Mojeek, among others.
  4. Google spends lots of money on research to make “its product great.”
  5. Google’s innovations have helped people around the world?
  6. Google’s actions have been anticompetitive, but not too anticompetitive.

Well, I believe each of these assertions. Would a high school debater buy into the arguments? I know for a fact that my debate partner and I would not.

Stephen E Arnold, November 17, 2023

By Golly, the Gray Lady Will Not Miss This AI Tech Revolution!

November 2, 2023


The technology beacon of the “real” newspaper is shining brightly. Flash, the New York Times Online. Flash, terminating the exclusive with LexisNexis. Flash. The shift to a — wait for it — a Web site. Flash. The in-house indexing system. Flash. Buying About.com. Flash. Doing podcasts. My goodness, the flashes have impaired my vision. And where are we today after labor strife, newsroom craziness, and a list of bestsellers that gets data from…? I don’t really know, and I just haven’t bothered to do some online poking around.


A real journalist of today uses smart software to write listicles for Buzzfeed, essays for high school students, and feature stories for certain high profile newspapers. Thanks for the drawing Microsoft Bing. Trite but okay.

I thought about the technology flashes from the Gray Lady’s beacon high atop its building sort of close to Times Square. Nice branding. I wonder if mobile phone users know why the tourist destination is called Times Square. Since I no longer work in New York, I have forgotten. I do remember the high intensity pinks and greens of a certain type of retail establishment. In fact, I used to know the fellow who created this design motif. Ah, you don’t remember. My hunch is that there are other factoids you and I won’t remember.

For example, what’s the byline on a New York Times’s story? I thought it was the name or names of the many people who worked long hours, made phone calls, visited specific locations, and sometimes visited the morgue (no, the newspaper morgue, not the “real” morgue where the bodies of compromised sources ended up).

If the information in that estimable source Showbiz411.com is accurate, the Gray Lady may cite zeros and ones. The article is “The New York Times Help Wanted: Looking for an AI Editor to Start Publishing Stories. Six Figure Salary.” Now that’s an interesting assertion. A person like me might ask, “Why not let a recent college graduate crank out machine generated stories?” My assumption is that most people trying to meet a deadline and in sync with Taylor Swift will know about machine-generated information. But, if the story is true, here’s what’s up:

… it looks like the Times is going let bots do their journalism. They’re looking for “a senior editor to lead the newsroom’s efforts to ambitiously and responsibly make use of generative artificial intelligence.” I’m not kidding. How the mighty have fallen. It’s on their job listings.

The Showbiz411.com story allegedly quotes the Gray Lady’s help wanted ad as saying:

“This editor will be responsible for ensuring that The Times is a leader in GenAI innovation and its applications for journalism. They will lead our efforts to use GenAI tools in reader-facing ways as well as internally in the newsroom. To do so, they will shape the vision for how we approach this technology and will serve as the newsroom’s leading voice on its opportunity as well as its limits and risks. “

There are a bunch of requirements for this job. My instinct is that a few high school students could jump into this role. What’s the difference between a ChatGPT output about crossing the Delaware and writing a “real” news article about fashion trends seen at Otto’s Shrunken Head?

Several observations:

  • What does this ominous development mean to the accountants who will calculate the cost of “real” journalists versus a license to smart software? My thought is that the general reaction will be positive. Imagine: No vacays, no sick days, and no humanoid protests. The Promised Land has arrived.
  • How will the Gray Lady’s management team explain this cuddling up to smart software? Perhaps it is just one of those newsroom romances? On the other hand, what if something serious develops and the smart software moves in? Yipes.
  • What will “informed” readers think of stories crafted by the intellectual engine behind a high school student’s essay about great moments in American history? Perhaps the “informed” readers won’t care?

Exciting stuff in the world of real journalism down the street from Times Square and the furries, pickpockets, and gawkers from Ames, Iowa. I wonder if the hallucinating smart software will be as clever as the journalist who fabricates a story? Probably not. “Real” journalists do not shape, weaponize, or filter the actual factual. Is John Wiley & Sons ready to take the leap?

Stephen E Arnold, November 2, 2023


Is Google Setting a Trap for Its AI Competition?

October 6, 2023

Vea4_thumb_thumb_thumb_thumb_thumb_tNote: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

The litigation about the use of Web content to train smart generative software is ramping up. Outfits like OpenAI, Microsoft, and Amazon and its new best friend will be snagged in the US legal system.

But what big outfit will be ready to offer those hungry to use smart software without legal risk? The answer is the Google.

How is this going to work?

Simple. Google is beavering away with its synthetic data. Some real data are used to train sophisticated stacks of numerical recipes. The idea is that these algorithms will be “good enough”; thus, the need for “real” information is obviated. And Google has another trick up its sleeve. The company has coveys of coders working on trimmed-down systems and methods. The idea is that using less information will produce more and better results than the crazy idea of indexing content from wherever in real time. The small data can be licensed when the competitors are spending their days with lawyers.

How do I know this? I don’t, but Google is providing tantalizing clues in marketing collateral like “Researchers from the University of Washington and Google have Developed Distilling Step-by-Step Technology to Train a Dedicated Small Machine Learning Model with Less Data.” The author is a student who provides sources for the information about the “less is more” approach to smart software training.

And, may the Googlers sing her praises, she cites Google technical papers. In fact, one of the papers is described by the fledgling Googler as “groundbreaking.” Okay.

What’s really being broken is the approach of some of Google’s most formidable competition.
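Strip away the marketing gloss and the “less is more” trick is knowledge distillation: a small “student” model is trained to match the softened output distribution of a big “teacher,” so a modest pile of data goes further. Here is a minimal sketch of the core loss computation in plain Python. The logits, temperature, and function names are illustrative assumptions for this essay, not anything lifted from Google’s paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits to a probability distribution, optionally softened."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    A higher temperature exposes the teacher's so-called dark knowledge:
    the relative probabilities it assigns to the wrong classes."""
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student that matches the teacher incurs zero loss; a divergent one does not.
print(round(distillation_loss([3.0, 1.0, 0.2], [3.0, 1.0, 0.2]), 6))  # 0.0
print(distillation_loss([3.0, 1.0, 0.2], [0.2, 1.0, 3.0]) > 0)        # True
```

Minimizing that loss, rather than grinding through billions of labeled examples, is the whole economy of the small-model pitch.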

When will the Google spring its trap? It won’t. But as the competitors get stuck in legal mud, the Google will be an increasingly attractive alternative.

The last line of the Google marketing piece says:

Check out the Paper and Google AI Article. All Credit For This Research Goes To the Researchers on This Project. Also, don’t forget to join our 30k+ ML SubReddit, 40k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

Get that young marketer a Google mouse pad.

Stephen E Arnold, October 6, 2023

HP Autonomy: A Modest Disagreement Escalates

May 15, 2023

Vea4_thumb_thumb_thumb_thumb_thumb_tNote: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

About 12 years ago, Hewlett Packard acquired Autonomy. As I understand the deal, HP wanted to snap up Autonomy to make a move in the enterprise services business. Autonomy was one of the major providers of search and some related content processing services in 2010. Autonomy’s revenues were nosing toward $800 million, a level no other search and retrieval software company had previously achieved.

However, as Qatalyst Partners reported in an Autonomy profile, the share price was not exactly hitting home runs each quarter:


Source: Autonomy Trading and Financial Statistics, 2011 by Qatalyst Partners

After some HP executive turmoil, the deal was done. After a year or so, HP analysts determined that the Silicon Valley company paid too much for Autonomy. The result was high-profile litigation. One Autonomy executive lost in court and suffered the embarrassment of jail time.

“Autonomy Founder Mike Lynch Flown to US for HPE Fraud Trial” reports:

Autonomy founder Mike Lynch has been extradited to the US under criminal charges that he defrauded HP when he sold his software business to them for $11 billion in 2011. The 57-year-old is facing allegations that he inflated the books at Autonomy to generate a higher sale price for the business, the value of which HP subsequently wrote down by billions of dollars.

Although I did some consulting work for Autonomy, I have no unique information about the company, the HP allegations, or the legal process which will unspool in the US.

In a recent conversation with a person who had first hand knowledge of the deal, I learned that HP was disappointed with the Autonomy approach to business. I pushed back and pointed out three things to a person who was quite agitated that I did not share his outrage. My points, as I recall, were:

  1. A number of search-and-retrieval companies failed to generate revenue sufficient to meet their investors’ expectations. These included outfits like Convera (formerly Excalibur Technologies), Entopia, and numerous other firms. Some were sold and were operated as reasonably successful businesses; for example, Dassault Systèmes and Exalead. Others were folded into a larger business; for example, Microsoft’s purchase of Fast Search & Transfer and Oracle’s acquisition of Endeca. The period from 2008 to 2013 was particularly difficult for vendors of enterprise search and content processing systems. I documented these issues in The Enterprise Search Report and a couple of other books I wrote.
  2. Enterprise search vendors and some hybrid outfits which developed search-related products and services used bundling as a way to make sales. The idea was not new. IBM refined the approach. Buy a mainframe and get support free for a period of time. Then the customer could pay a license fee for the software and upgrades and pay for services. IBM charged me $850 to roll a specialist to look at my three out-of-warranty PC 704 servers. (That was the end of my reliance on IBM equipment and its marvelous ServeRAID technology.) Libraries, for example, could acquire hardware. The “soft” components had a different budget cycle. The solution? Split up the deal. I think Autonomy emulated this approach and added some unique features. Nevertheless, the market for search and content related services was and is a difficult one. Fast Search & Transfer had its own approach. That landed the company in hot water and the founder on the pages of newspapers across Scandinavia.
  3. Sales professionals could generate interest in search and content processing systems by describing the benefits of finding information buried in a company’s file cabinets, tucked into PowerPoint presentations, and sleeping peacefully in email. Like the current buzz about OpenAI and ChatGPT, expectations are loftier than the reality of some implementations. Enterprise search vendors like Autonomy had to deal with angry licensees who could not find information, heated objections to the cost of reindexing content to make it possible for employees to find the file saved yesterday (an expensive and difficult task even today), and howls of outrage because certain functions had to be coded to meet the specific content requirements of a particular licensee. Remember that a large company does not need just one search and retrieval system; it needs several. There are many, quite specific requirements. These range from engineering drawings in the R&D center to the super sensitive employee compensation data, from the legal department’s need to process discovery information to the mandated classified documents associated with a government contract.

These issues remain today. Autonomy is now back in the spotlight. The British government, as I understand the situation, is not chasing Dr. Lynch for his methods. HP and the US legal system are.

The person with whom I spoke was not interested in my three points. He has a Harvard education and I am a geriatric. I will survive his anger toward Autonomy and his obvious affection for the estimable HP, its eavesdropping Board and its executive revolving door.

What few recall is that Autonomy was one of the first vendors of search to use smart software. The implementation was described as Bayesian inference. Like today’s smart software, the functioning of the Autonomy core technology was a black box. I assume the litigation will expose this Autonomy black box. Is there a message for the ChatGPT-type outfits blossoming at a prodigious rate?

Yes, the enterprise search sector is about to undergo a rebirth. Organizations have information. Findability remains difficult. The fix? Merge ChatGPT-type methods with an organization’s content. What do you get? A party which faded away in 2010 is coming back. The Beatles and Elvis vibe will be live, on stage. Act fast.
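The “merge ChatGPT-type methods with an organization’s content” fix is what the trade now calls retrieval-augmented generation: retrieve the relevant internal documents first, then hand them to the language model as context. A bare-bones sketch of the retrieval half in plain Python follows. Keyword overlap stands in for the embedding similarity search a production system would use, and every document, identifier, and query here is invented for illustration:

```python
# Toy in-house document store; a real enterprise system would index far more,
# with access controls per department (HR, engineering, legal, and so on).
documents = {
    "hr-001": "Employee compensation bands are reviewed each April.",
    "eng-014": "Engineering drawings are stored in the R&D vault.",
    "leg-007": "Discovery documents must be processed by the legal team.",
}

def tokenize(text):
    """Lowercase and split on whitespace, stripping common punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}

def retrieve(query, k=2):
    """Rank documents by keyword overlap with the query.

    This is a dime-store stand-in for the vector similarity search
    a real retrieval-augmented pipeline would run."""
    q = tokenize(query)
    scored = sorted(
        documents.items(),
        key=lambda item: len(q & tokenize(item[1])),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

def build_prompt(query):
    """Assemble the context block a generative model would receive."""
    context = "\n".join(documents[d] for d in retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(retrieve("Where are engineering drawings stored?", k=1))  # ['eng-014']
```

Bolt a generative model onto `build_prompt` and you have the rebirth in miniature: the old findability problem wearing a new ChatGPT-style costume.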

Stephen E Arnold, May 15, 2023

A Googley Rah Rah for Synthetic Data

April 27, 2023

Vea4_thumb_thumb_thumbNote: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.

I want to keep this short. I know from experience that most people don’t think too much about synthetic data. The idea is important, but other concepts are important and no one really cares too much. When was the last time Euler’s Number came up at lunch?

A gaggle of Googlers extolls the virtues of synthetic data in a 19-page arXiv document called “Synthetic Data from Diffusion Models Improves ImageNet Classification.” The main idea is that data derived from “real” data are an expedient way to improve some indexing tasks.

I am not sure that a quote from the paper will do much to elucidate this facet of the generative model world. The paper includes charts, graphs, references to math, footnotes, a few email addresses, some pictures, wonky jargon, and this conclusion:

And we have shown improvements to ImageNet classification accuracy extend to large amounts of generated data, across a range of ResNet and Transformer-based models.

The specific portion of this quote which is quite important in my experience is the segment “across a range of ResNet and Transformer-based models.” Translating to Harrod’s Creek lingo, I think the wizards are saying, “Synthetic data is really good for text too.”

What’s bubbling beneath the surface of this archly-written paper? Here are my answers to this question:

  1. Synthetic data are a heck of a lot cheaper to generate for model training; therefore, embrace “good enough” and move forward. (Think profits and bonuses.)
  2. Synthetic data can be produced and updated more easily than fooling around with “real” data. Assembling training sets, tests, deploying and reprocessing are time sucks. (There is more work to do than humanoids to do it when it comes to training, which is needed frequently for some applications.)
  3. Synthetic datasets can be smaller. Even baby Satan aka Sam Altman is down with synthetic data. Why? Elon could only buy so many Nvidia processing units. Thus, finding a way to train models with synthetic data works around a supply bottleneck.
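The cheaper-faster pitch in the list above can be shown with a toy augmentation loop: take a handful of “real” samples and fabricate synthetic variants to fatten the training set. In the sketch below, random jitter stands in for a diffusion model (which it emphatically is not), and all of the data and names are invented for illustration:

```python
import random

random.seed(7)  # deterministic for the sake of the example

# A tiny "real" dataset: (feature_vector, label) pairs.
real_data = [
    ([1.0, 1.1], "cat"),
    ([0.9, 1.0], "cat"),
    ([5.0, 5.2], "dog"),
    ([5.1, 4.9], "dog"),
]

def make_synthetic(samples, copies=10, noise=0.3):
    """Fabricate synthetic samples by jittering real ones.

    A diffusion model would generate far richer variants; bounded
    random noise is the cheap stand-in for this sketch."""
    fake = []
    for features, label in samples:
        for _ in range(copies):
            jittered = [x + random.uniform(-noise, noise) for x in features]
            fake.append((jittered, label))
    return fake

def nearest_centroid_classify(train, point):
    """Classify a point by distance to the per-label centroids of the training set."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    best_label, best_dist = None, float("inf")
    for label, rows in groups.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, point))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

# Four real samples balloon into a 44-sample training set.
train = real_data + make_synthetic(real_data)
print(len(train))                                    # 44
print(nearest_centroid_classify(train, [1.0, 1.0]))  # cat
print(nearest_centroid_classify(train, [5.0, 5.0]))  # dog
```

Four hand-labeled samples become forty-four training samples at zero labeling cost, which is the whole “better, faster, cheaper” argument in miniature; the open question, as with Google’s paper, is whether the fakes are faithful enough.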

My summary of the Googlers’ article is briefer than the original: Better, faster, cheaper.

You don’t have to pick one. Just believe the Google. Who does not trust the Google? Why not buy synthetic data and ready-to-deploy models for your next AutoGPT product? Google’s approach worked like a champ for online ads. Therefore, Google’s approach will work for your smart software. Trust Google.

Stephen E Arnold, April 27, 2023
