Google’s Government Indexing Desire

December 13, 2008

I laughed when I read the Washington Post article “Firms Push for a More Searchable Federal Web” by Peter Whoriskey. I wiped away my tears and pondered the revelation that Google wants to index the US government’s information. The leap from Eric Schmidt to the Smithsonian to search engine optimizer par excellence was almost too much for me. I assume that Google’s man in Washington did not recall the procurements in which Google participated, procurements, I might add, that Google did not win. The company lost out to Inktomi, Fast Search, Microsoft, and Vivisimo. Now it seems Google wants to get back in the game.

Google is in the game. The company has an index of some US government information here. The service is called Google US Government Search, and it even has an American flag to remind you that Google is indexing some of the US government’s public-facing content. When I compare the coverage of Microsoft Vivisimo’s index here with that of Google’s US government index, I think Google delivers more on-point information. Furthermore, the clutter-free Google search pages let me concentrate on search.

The question that does not occur to Mr. Whoriskey is, “Why doesn’t the US government use Google for the USA.gov service?” I don’t know the answer to this question, but I have a hunch that Google did not put much emphasis on competing for a government-wide indexing contract. Now I think Google wants that contract, and it is making its interest known. With cheerful yellow Google Search Appliances sprouting like daisies in government agencies, the GOOG wants a bigger piece of the action.

How good is Google’s existing index of US government information? I ran a few test queries. Here is a summary of my results comparing Google’s service with the USA.gov service provided by Microsoft Vivisimo. The first query is “nuclear eccs”. I wanted to see on the first results page pointers to emergency core cooling system information available from the Nuclear Regulatory Commission and a couple of other places that carry US government nuclear-related information. Well, Google nailed my preferred source with a direct link to the NRC Revision of Appendix K, which is about ECCS. USA.gov provided a pointer to a GE document, but the second link was to my preferred document. Close enough for horseshoes.

My second query was for “Laura Bush charity”. Google delivered a useful hit at item 3 “$2 Million Grant for Literacy Programs.” USA.gov nailed the result at the number one position in its hit list.

My third query was for “companycommand.com”. Google presented the link to CompanyCommand.com at the top of the results list and displayed a breakdown of the main sections of the Web site. USA.gov delivered the hit at the top of the results list.

My test, as unscientific as it was, revealed to me that neither Google nor Microsoft Vivisimo performs clearly better than the other. Neither service sucks the content from the depths of the Department of Commerce. Where are those consulting reports prepared for small businesses?

What’s the difference between Google’s government index and the Microsoft Vivisimo government index?

The answer is simple: “Buzz.”

No one in my circle of contacts in the US government gets jazzed about Microsoft Vivisimo. But mention Google, and you have the Tang of search. Google strikes me as responding to requests from departments to buy Google Maps or the Google Search Appliance.

If Google wants to nail larger sales in the US government, the company will need the support of partners who can deliver the support and understanding government executives expect and warrant. Messages refracted through a newspaper, with help from wizards in the business of getting a public Web site to appear at the top of a results list, won’t do the job.

In my opinion, both Google and the Washington Post have to be more precise in their communications. Search is a complicated beastie even though many parvenu consultants love the words “Easy,” “Simple” and “Awesome performance.” That’s the sizzle, not the steak. Getting content to the user is almost as messy as converting Elsie the cow into hamburgers.

Google’s the search system with buzz. The incumbent Microsoft Vivisimo has the government contract. If Google wants that deal and others of that scale, Google will want to find partners who can deliver, not partners who are in Google’s social circle. Google will want to leverage its buzz and build into its messaging that search engine optimization is not exactly what is needed for the Google Search Appliance to deliver a solution to a government agency. Finally, the Washington Post may want to dig a bit deeper when writing about search in order to enhance clarity and precision.

Stephen Arnold, December 12, 2008

Tribune Says: Google’s Automated Indexing Not Good

September 11, 2008

I have been a critic of Sam Zell’s Tribune since I tangled with the site for my 86-year-old father. You can read my negative views of the site’s usability, its indexing, and its method of displaying content here.

Now on with my comments on this MarketWatch story titled “Tribune Blames Google for Damaging News Story” by John Letzing, a good journalist in my book. Mr. Letzing reports that Google’s automated crawler and indexing system could not figure out that a story from 2002 was old. As a result, the “old” story appeared in Google News, and the stock of United Airlines took a hit. The Tribune, according to the story, blames Google.

Hold your horses. This problem is identical to the one created by folks who say, “Index my servers. The information on them is what we want indexed.” As soon as the index goes live, these same folks complain that the search engine has processed ripped-off music, software from mysterious sources, Cub Scout fund-raising materials, and some content I don’t want to mention in a Web log. How do I know? I have heard this type of rationalization many times. Malformed XML, duplicate content, and other problems mean content mismanagement, not bad indexing by a search system.

Most people don’t have a clue what’s on their public-facing servers. The content management system may be at fault. The users might be careless. Management may not have policies or may not create an environment in which those policies are observed. Most people don’t know that “dates” are assigned and may not correlate with the “date” embedded in a document. In fact, some documents contain many dates. Entity extraction can discover a date, but when there are multiple dates, which date is the “right one”? What’s a search system supposed to do? Well, search systems process what’s exposed on a public-facing server or a source identified in the administrative controls for the content acquisition system.
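The multiple-date problem is easy to demonstrate. The sketch below is my own illustration, not any vendor's pipeline: it pulls every date-like string from a snippet of text and shows why an automated indexer has to pick one of several equally plausible candidates.

```python
import re
from datetime import datetime

# A document often carries several dates: publication, revision, and dates
# mentioned in the body text. Which one should the indexer treat as "the" date?
document = """
United Airlines files for bankruptcy protection.
Published: December 9, 2002. Page last updated: September 7, 2008.
"""

# Naive pattern for "Month DD, YYYY"; real extractors are far richer.
pattern = (r"(January|February|March|April|May|June|July|August|"
           r"September|October|November|December) \d{1,2}, \d{4}")

candidates = [datetime.strptime(m.group(0), "%B %d, %Y")
              for m in re.finditer(pattern, document)]

print("Candidates:", [d.date().isoformat() for d in candidates])
# Picking the newest date makes the story look current; picking the oldest
# makes it look stale. Neither choice is "wrong" from the software's view.
print("Newest:", max(candidates).date(), "Oldest:", min(candidates).date())
```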

Blaming a software system for lousy content management is a flashing yellow sign that says to me “Uninformed ahead. Detour around problem.”

Based on my experience with indexing content managed by people who were too busy to know what was on their machines, I think blaming Google is typical of the level of understanding in traditional media about how automated or semi-automated systems work. Furthermore, when I examined the Tribune’s for-fee service referenced in the description linked above, it was clear that the level of expertise brought to bear on this service was, in my opinion, rudimentary.

Traditional media is eager to find fault with Google. Yet some of these outfits use automated systems to index content and cut headcount. The indexing generated by these systems is acceptable, but there are errors. Some traditional publishers not only index in a casual manner; they also charge for each query. A user may have to experiment in order to find relevant documents. Each search puts money in the publisher’s pocket. The Tribune charges for an online service that is essentially unusable by my 86-year-old father.

If a Tribune company does not know what’s on its servers and exposes those servers on the Internet, the problem is not Google’s. The problem is the Tribune’s.

Stephen Arnold, September 11, 2008

Indexing Dynamic Databased Content

April 20, 2008

In the last week, there’s been considerable discussion of what is now called “deep Web” content. The idea is that some content requires the user to enter a query. The system processes the query and generates a search result from a database. This function is easier to illustrate than explain in words.

Look at the screen shot below. I have navigated to the Southwest Airlines Web page and entered a query for flights from Louisville, Kentucky, to Baltimore, Maryland.

[Screen shot: Southwest Airlines flight search form]

Here’s what the system shows me:

[Screen shot: Southwest Airlines search result]

If you do a search on Google, Live.com, or Yahoo, you won’t see the specific listing of flights shown below:

[Screen shot: Southwest Airlines flight listing]
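The reason a general-purpose crawler misses this listing is that the flight data exists only as the response to a form submission. The sketch below is a generic illustration, not Southwest's actual site or API; the endpoint and field names are invented to show the difference between fetching a static page and posting a query that a database answers.

```python
import requests

# A crawler can GET a static page and index whatever HTML comes back.
static_page = requests.get("https://www.example.com/flights")
print(len(static_page.text), "bytes of static HTML available to a spider")

# The flight listing is generated only when a query is POSTed to a form
# handler. The endpoint and field names here are hypothetical.
form_data = {
    "origin": "SDF",         # Louisville
    "destination": "BWI",    # Baltimore
    "depart_date": "2008-05-01",
}
dynamic_result = requests.post("https://www.example.com/flight-search", data=form_data)

# Unless the spider knows which fields to fill in and which values make sense,
# this result set never exists for it to index. That is "deep Web" content:
# reachable by a person with a form, invisible to a naive crawler.
print("Form response status:", dynamic_result.status_code)
```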


Indexing Hot Spots

February 29, 2008

Introduction

This is the third in a series of essays about cost hot spots in behind-the-firewall search. This essay does not duplicate the information in Beyond Search, my new study for the Gilbane Group. This document is designed to highlight several functions or operations in an indexing subsystem that can cause system slowdowns or bottlenecks. No specific vendors’ systems are referenced in this essay. I see no value in finger pointing because no indexing subsystem is without potential for performance degradation in a real-world installation. – Stephen Arnold, February 29, 2008

Indexing: Often a Mysterious Series of Multiple Operations

One of the most misunderstood parts of a behind-the-firewall search system is indexing. The term indexing itself is the problem. For most people, an index is the key word listing that appears at the back of a book. For those hip to the ways of online, indexing means metatagging, usually in the form of a series of words or phrases assigned to a Web page or an element in a document; for example, an image and its associated text. The actual index in your search system may not be one data table. The index may be multiple tables or numeric values that “float” mysteriously within the larger search system. The “index” may not even be in one system. Parts of the index may be in different places, updated in a series of processes that cannot be easily recreated after a crash, software glitch, or other corruption. This CartoonStock.com image makes clear the impact of a search system crash.
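To make the point concrete, here is a toy inverted index, the data structure most search engines build behind the scenes. This is a minimal sketch of my own, not any vendor's implementation; production systems split this structure across many files and servers, which is one reason a partial corruption is so painful to repair.

```python
from collections import defaultdict

documents = {
    1: "emergency core cooling system revision",
    2: "core cooling appendix revision",
    3: "literacy grant program",
}

# The inverted index maps each term to the documents (and positions) where it
# occurs. A real system also stores frequencies, fields, and ranking weights.
inverted_index = defaultdict(list)
for doc_id, text in documents.items():
    for position, term in enumerate(text.split()):
        inverted_index[term].append((doc_id, position))

def search(term_a, term_b):
    # A query is answered by intersecting posting lists, not by rereading documents.
    docs_a = {doc for doc, _ in inverted_index.get(term_a, [])}
    docs_b = {doc for doc, _ in inverted_index.get(term_b, [])}
    return sorted(docs_a & docs_b)

print(search("core", "revision"))   # -> [1, 2]
```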

Centuries ago, people lucky enough to have a “book” learned that some sort of system was needed to find a scroll, stored in a leather or clay tube, sometimes chained to the wall to keep the source document from wandering off. In the so-called Dark Ages, information was not free, nor did it flow freely. Information was something special and of high value. Today, we talk about information as a flood, a tidal wave, a problem. It is ubiquitous, without provenance, and digital. Information wants to be free, fluid, moving around, unstable, dynamic. For indexing to work, you have a specific object at a point in time to process; otherwise, the index is useless.

The index must also be “fresh”. Fresh means that the most recent information is in the system and therefore available to users. With lots of new and changed information, you have to determine how fresh is fresh enough. Real-time data also poses a challenge. If your system can index 100 megabytes a minute and must keep up with larger volumes of new and changed data, something’s got to give. You may have to prioritize what you index. You handle high-priority documents first, then shift to lower-priority documents until new higher-priority documents arrive. This triage affects the freshness of the index. The alternative is to throw more hardware at your system, increasing capital investment and operational cost.

Index freshness is important. A person in a professional setting cannot do “work” unless the digital information can be located. Once located, the information must be the “right” information. Freshness matters, but there are also issues of versions of documents. These are indexing challenges and can require considerable intellectual effort to resolve. You have to get freshness right for a search system to be useful to your colleagues. In general, the more involved your indexing, the more important the architecture and engineering of the “moving parts” in your search system’s indexing subsystem become.

Why is indexing a cost hot spot? Let’s look at some hot spots I have encountered in the last nine months.
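Before turning to the hot spots themselves, the triage described above can be pictured as a priority queue feeding the indexing subsystem. The sketch below is a simplified illustration under the assumption that each incoming document carries a priority tier; it shows why a steady stream of high-priority content pushes lower-priority documents further and further from “fresh.”

```python
import heapq

# (priority, arrival_order, document): lower priority number = indexed sooner.
# The arrival_order field breaks ties so the queue stays stable.
arrivals = [
    (1, "press-release.html"),        # high priority: index on the next pass
    (3, "archived-report-1998.pdf"),  # low priority: index when there is slack
    (1, "updated-policy.doc"),
    (2, "department-newsletter.pdf"),
]

queue = []
for order, (priority, doc) in enumerate(arrivals):
    heapq.heappush(queue, (priority, order, doc))

# The subsystem drains the queue in priority order. If high-priority items keep
# arriving, the 1998 report may wait a very long time; that delay is the
# freshness gap users eventually notice.
while queue:
    priority, _, doc = heapq.heappop(queue)
    print(f"indexing priority-{priority}: {doc}")
```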

Remediating Indiscriminate Indexing

When you deploy your behind-the-firewall search or content processing system, you have to tell your system how to process the content. You can operate an advanced system in default mode, but you may want to select certain features, set the level of stringency, and make sure that you are familiar with the various controls available to you. Investing time in testing prior to deployment can pay off when troubleshooting. The first cost hot spot is encountering disc thrashing or long indexing times. You come in one morning, check the logs, and learn no content was processed. In Beyond Search I talk about some steps you can take to troubleshoot this condition. If you can’t remediate the situation by rebooting the indexing subsystem, then you will have to work through the vendor’s technical support group, restore the system to a known good state, or, in some cases, reinstall the system. When you reinstall, some systems cannot use the backup index files. If you find that your backups won’t work or deliver erratic results on test queries, then you may have to rebuild the index. In a small two-person business, the time and cost are trivial. In an organization with hundreds of servers, the process can consume significant resources.
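One inexpensive safeguard against the bad-backup scenario described above is a smoke test you run after every restore: a handful of known queries, each with a document you expect to see. The sketch below is illustrative only; run_query is a placeholder for whatever query API your vendor actually exposes, and the queries and document names are invented.

```python
# Known-good queries and a document each must return. Build this list while the
# system is healthy, long before you need to restore from a backup.
smoke_tests = {
    "nuclear eccs": "nrc-appendix-k.pdf",
    "literacy grant": "laura-bush-grant.html",
}

def run_query(query):
    # Stand-in for your search system's query API; replace with the real call.
    canned = {
        "nuclear eccs": ["nrc-appendix-k.pdf", "ge-eccs-report.pdf"],
        "literacy grant": ["laura-bush-grant.html"],
    }
    return canned.get(query, [])

def verify_restore():
    failures = [(q, doc) for q, doc in smoke_tests.items()
                if doc not in run_query(q)[:10]]
    if failures:
        print("Restore looks suspect; consider a rebuild:", failures)
    else:
        print("All smoke tests passed.")

verify_restore()
```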

Updating the Index or Indexes

Your search or content processing system allows you to specify how frequently the index updates. When your system has robust resources, you can specify indexing to occur as soon as content becomes available. Some vendors talk about their systems as “real time” indexing engines. If you find that your indexing engine starts to slow down, you may have encountered a “big document” problem. Indexing systems make short work of HTML pages, short PDFs, and emails. But when document size grows, the indexing subsystem needs more “time” to process long documents. I have encountered situations in which a Word document includes objects that are large. The Word document requires the indexing subsystem to grind away on this monster file. If you hit a patch characterized by a large number of big documents, the indexing subsystem will appear to be busy, but indexing subsystem output falls sharply.

Let’s assume you build your roll-out index based on a thorough document analysis. You have verified security and access controls so the “right” people see the information to which they have appropriate access. You know that the majority of the documents your system processes are in the 600 kilobyte range over the first three months of indexing subsystem operation. Suddenly the document size leaps to six megabytes, and the number of big documents becomes more than 20 percent of the document throughput. You may learn that the set up of your indexing subsystem or the resources available are hot spots.

Another situation concerns different versions of documents. Some search and content processing systems identify duplicates using date and time stamps. Other systems include algorithms to identify duplicate content and remove it or tag it so the duplicates may or may not be displayed under certain conditions. A surge in duplicates may occur when an organization is preparing for a trade show. Emails with different versions of a PowerPoint may proliferate rapidly. Obviously indexing every six megabyte PowerPoint makes sense if each PowerPoint is different. How your indexing subsystem handles duplicates is important. A hot spot occurs when a surge of files with the same name and different date and time stamps is fed into the indexing system. The hot spot may be remediated by identifying the problem files and deleting them manually or via your system’s administrative controls. Versions of documents can become an issue under certain circumstances, such as a legal matter. Unexpected indexing subsystem behavior may be related to a duplicate file situation.

Depending on your system, you will have some fiddling to do in order to handle different versions of documents in a way that makes sense to your users. You also have to set up a de-duplication process in order to make it easy for your users to find the specific version of the document needed to perform a work task. These administrative interventions are not difficult when you know where to look for the problem. If you are not able to pinpoint a specific problem, the hunt for the hot spot can become time consuming.
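De-duplication is often nothing more exotic than fingerprinting each document's content rather than trusting its file name or time stamp. A minimal sketch, assuming the text has already been extracted from its container format:

```python
import hashlib

# Three copies of "the same" PowerPoint circulating before a trade show: same
# name, different time stamps, and only one with actual changes.
documents = [
    ("booth_deck.ppt", "2008-02-10 09:14", "Q1 booth graphics and talking points"),
    ("booth_deck.ppt", "2008-02-11 16:02", "Q1 booth graphics and talking points"),
    ("booth_deck.ppt", "2008-02-12 08:45", "Q1 booth graphics, revised pricing slide"),
]

seen = {}
for name, timestamp, content in documents:
    fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if fingerprint in seen:
        print(f"{name} ({timestamp}): duplicate of the {seen[fingerprint]} copy, tag or skip")
    else:
        seen[fingerprint] = timestamp
        print(f"{name} ({timestamp}): new content, index it")
```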

Common Operations Become a Problem

Once an index has been constructed (a process often called indexation), incremental updates are generally trouble free. Notice that I said generally. Let’s look at some situations that can arise, albeit infrequently.

Index Rebuild

You have a crash. The restore operation fails. You have to reindex the content. Why is this expensive? You have to plan reindexing and then babysit the update. For reindexing you will need the resources required when you performed the first indexation of your content. In addition, you have to work through the normal verifications for access, duplicates, and content processing each time you update. Whatever caused the index restore operation to fail must be remediated, a backup created when reindexing is completed, and then a test run to make sure the new backup restores correctly.

Indexing New or Changed Content

Let’s assume that you have a system, and you have been performing incremental indexes for six months with no observable problems and no red flags from users. Then users with no prior history of complaining about the search system complain that certain new documents are not in the system. Depending on your search system’s configuration, you may have a hot spot in the incremental indexing update process. The cause may be related to volume, configuration, or an unexpected software glitch. You need to identify the problem and figure out a fix. Some systems maintain separate indexes based on a maximum index size. When the index grows beyond a certain size, the system creates or allows the system administrator to create a second index. Parallelization makes it possible to query index components with no appreciable increase in system response time. A hot spot can result when a configuration error causes an index to exceed its maximum size, halting the system or corrupting the index itself, although other symptoms may be observable. Again, the key to resolving this hot spot is often configuration and infrastructure.

Value-Added Content Processing

New search and content processing systems incorporate more sophisticated procedures, systems, and methods than systems did a few years ago. Fortunately, faster processors, 64-bit chips, and plummeting prices for memory and storage devices allow indexing systems to pile on the operations and maintain good indexing throughput, easily several megabytes a minute to five gigabytes of content per hour or more. If you experience slowdowns in index updating, you face some stark choices when you saturate your machine capacity or storage. In my experience, these are:

  • Reduce the number of documents processed
  • Expand the indexing infrastructure; that is, throw hardware at the problem
  • Turn off certain resource-intensive indexing operations, in effect eliminating some of the processes that use statistical, linguistic, or syntactic functions (a minimal sketch of this option follows the list).
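The third option is usually a configuration change rather than a hardware purchase. Here is a minimal sketch of the trade-off; the stage names and per-document costs are invented for illustration and do not describe any particular vendor's pipeline.

```python
# Hypothetical pipeline stages with rough relative costs per document.
pipeline = [
    ("tokenize",          1, True),
    ("stemming",          1, True),
    ("entity_extraction", 6, True),   # statistical / linguistic, expensive
    ("sentiment_scoring", 4, True),   # value-added, also expensive
]

def docs_per_minute(stages, budget_units_per_minute=12_000):
    # Each enabled stage consumes its cost in processing units per document.
    cost_per_doc = sum(cost for _, cost, enabled in stages if enabled)
    return budget_units_per_minute // cost_per_doc

print("All stages on:  ", docs_per_minute(pipeline), "docs/minute")

# Disable the two value-added stages when the queue backs up.
trimmed = [(name, cost, enabled and name in ("tokenize", "stemming"))
           for name, cost, enabled in pipeline]
print("Value-added off:", docs_per_minute(trimmed), "docs/minute")
```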

One of the questions that comes up frequently is, “Why are value-added processing systems more prone to slowdowns?” The answer is that when the number of documents processed goes up or the size of documents rises, the infrastructure cannot handle the load. Indexing subsystems require constant monitoring and routine hardware upgrades. Iterative systems cycle through processes two or more times. Some iterative functions depend on other processes; for example, until the linguistic processes complete, another component such as entity extraction cannot run. Many current indexing systems are parallelized. But situations can arise in which indexing slows to a crawl because a software glitch fails to keep the internal pipelines flowing smoothly. If process A slows down, the lack of available data to process means process B waits. Log analysis can be useful in resolving this hot spot.

Crashes: Still Occur

Many modern indexing systems can hiccup and corrupt an index. The way to fix a corrupt index is to have two systems. When one fails, the other system continues to function. But many organizations can’t afford tandem operation and hot failovers. When an index corruption occurs, some organizations restore the index to a prior state. A gap may exist between the point of the backup and the index state at the time of the failure. Most systems can determine which content must be processed to “catch up”. Checking the rebuilt indexes is a useful step to take when a crash has taken place and the index has been restored and rebuilt. Keep in mind that backups are not foolproof. Test your system’s backup and restore procedures to make sure you can survive a crash and have the system operational again.
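The “catch up” step after a restore can be as simple as comparing the backup's timestamp with each document's last-modified time. A minimal sketch of the idea, with an invented inventory standing in for whatever your content sources actually report:

```python
from datetime import datetime

# Hypothetical document inventory: (doc_id, last_modified). In practice this
# comes from a file system walk or the content management system.
inventory = [
    ("policy-2007.pdf",    datetime(2008, 1, 5, 10, 0)),
    ("newsletter-02.html", datetime(2008, 2, 20, 9, 30)),
    ("booth_deck.ppt",     datetime(2008, 2, 27, 16, 45)),
]

# The backup you restored was taken at this point in time.
restore_point = datetime(2008, 2, 15, 0, 0)

# Anything touched after the restore point is missing from the rebuilt index
# and must be queued for reprocessing to close the gap.
catch_up_queue = [doc for doc, modified in inventory if modified > restore_point]
print("Documents to reprocess:", catch_up_queue)
```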

Wrap Up

Let’s step back. The hot spots for indexing fall into three categories. First, you have to have adequate infrastructure. Ideally your infrastructure will be engineered to permit pipelined functions to operate rapidly and without latency. Second, you will want to have specific throughput targets so you can handle new and changed content whether your vendor requires one index or multiple indexes. Third, you will want to understand how to recover from a failure and have procedures in place to restore an index or “roll back” to a known good state and then process content to ensure nothing is lost.

In general, the more value-added content processing you use, the greater your potential for hot spots. Search used to be simpler from an operational point of view. Keyword indexing is very straightforward compared to some of the advanced content processing systems in use today. The performance of any system fluctuates to some extent. As sophisticated as today’s systems are, there is room for innovation in the design, architecture, and administration of indexing subsystems. Keep in mind that more specific information appears in Beyond Search, due out in April 2008.

Stephen Arnold, February 29, 2008

Google Places a Big Bet, and It May Not Pay Off

June 10, 2025

Just a dinobaby and no AI: How horrible an approach?

Each day brings more AI news. I have playing in the background a video called “The AI Math That Left Number Theorists Speechless.” That word “speechless” does not apply because the interlocutor and the math whiz are chatty Cathies. The video runs a little less than two hours. Speechless? No. When it comes to smart software, some people become verbose and excited. I like to be verbose. I don’t like to get excited about artificial intelligence. I am a dinobaby, remember?

I clicked on the first item in my trusty Overflight service and this write up greeted me: “Google Is Burying the Web Alive.” How does one “bury” a digital service? I assumed or inferred that the idea is that the alleged multi-monopoly Google was going to create another monopoly for itself anchored in AI.

The write up says:

[AI Overviews are] Google’s “most powerful AI search, with more advanced reasoning and multimodality, and the ability to go deeper through follow-up questions and helpful links to the web,” the company says, “breaking down your question into subtopics and issuing a multitude of queries simultaneously on your behalf.” It’s available to everyone. It’s a lot like using AI-first chatbots that have search functions, like those from OpenAI, Anthropic, and Perplexity, and Google says it’s destined for greater things than a small tab. “As we get feedback, we’ll graduate many features and capabilities from AI Mode right into the core Search experience,” the company says.

Let’s slow down the buggy. A completely new product or service has some baggage on board. Consider “New Coke”: quite a few people liked “old Coke.” The company figured it out, innovated, and finally just started buying beverage outfits that were pulling in new customers. Then there is the old chestnut by the buggy stand which says, “Most start-ups fail.” Finally, there is the shadow of impatient stakeholders. Fail to keep those numbers up, and consequences manifest themselves.

The write up gallops forward:

From the very first use, however, AI Mode crystallized something about Google’s priorities and in particular its relationship to the web from which the company has drawn, and returned, many hundreds of billions of dollars of value. AI Overviews demoted links, quite literally pushing content from the web down on the page, and summarizing its contents for digestion without clicking…

Those clicks make Google’s money flow. It does not matter if the user clicks to view a YouTube short or a click to view a Web page about a vacation rental. Clicks equal revenue. Fewer clicks may translate to less revenue. If this is true, then what happens?

The write up suggests an answer: The good old Web is marginalized. Kaput. Dead as a doornail:

… (of course, Google is already working on ads for both Overviews and AI Mode). In its drive to embrace AI, Google is further concealing the raw material that fuels it, demoting links as it continues to ingest them for abstraction. Google may still retain plenty of attention to monetize and perhaps keep even more of it for itself, now that it doesn’t need to send people elsewhere; in the process, however, it really is starving the web that supplies it with data on which to train and from which to draw up-to-date details. (Or, one might say, putting it out of its misery.)

As a dinobaby, I quite like the old Web. Again we have a giant company doing something “new” and “different.” How will those bold innovations work out? That’s the $64 question (a rigged game show, my mother told me).

The article concludes:

In any case, the signals from Google — despite its unconvincing suggestions to the contrary — are clear: It’ll do anything to win the AI race. If that means burying the web, then so be it.

Whoa, Nellie!

Let’s think about what the Google is allegedly doing. First, the Google is spending money to index the “Web.” My team tells me that Google is indexing less thoroughly than it was 10 years ago. Google indexes where the traffic is, and quite a bit of that traffic is to Google itself. The losers have been grousing about a lack of traffic for years. I have worked with a consumer Web site since 1993, and the traffic cratered about seven years ago. Why? Google selected sites to boost because of the link between advertiser appetite and clicks. The owner of this consumer Web site cooked up a bit of jargon for what Google was doing; he called it “steering.” The idea is that Google shaped its crawls and “relevance” in order to maximize revenue from known big ad spenders.

Google is not burying anything. The company is selecting to maximize financial benefits. My experience suggests that when Google strays too far from what stakeholders want, the company will be whipped until it gets the horses under control. Second, the AI revolution poses a significant challenge for a number of reasons. Among these is the users’ desire for the information equivalent of a “dumb” mobile phone. The cacophony of digital information is too much and creates a “why bother” need. Google wants to respond in the hope that it can come up with a product or service that produces as much money as the old Yahoo Overture GoTo model. Hope, however, is not reality.

As a dinobaby, I think Google has a reasonably good chance of stratifying its “users”. Some will pay. Some will consume the ad-sponsored AI output. Some will find a way to get the restaurant address surrounded by advertisements.

What about AI?

I am not sure that anyone knows. Both Google and Microsoft have to find a way to produce significant and sustainable revenue from the large language model method which has come to be synonymous with smart software. The costs are massive. The use cases usually focus on firing people for cost savings until the AI doesn’t work. Then the AI supporters just hire people again. That’s the Klarna call to think clearly again.

Net net: The Google is making a big bet that it can increase its revenues with smart software. How probable is it that the “new” Google will turn out like the “New Coke”? How much of the AI hype is just l’entreprise parle dans le vide (the company talking into the void)? The hype may be the inverse of reality. Something will be buried, and it may not be the “Web.”

Stephen E Arnold, June 10, 2025

Ten Directories of AI Tools

May 26, 2025

Just the dinobaby operating without Copilot or its ilk.

I scan DailyHunt, an India-based news summarizer powered by AI, I think. The link I followed landed me on a story titled “Best 10 AI Directories to Promote.” I looked for a primary source, an author, and links to each service. Zippo. Therefore, I assembled the list, provided links, and generated with my dinobaby paws and claws the list below. Enjoy or ignore. I am weary of AI, but many others are not. I am not sure why, but that is our current reality, replete with alternative facts, cheating college professors, and oodles of crypto activity. Remember: the list is not my “best of”; I am simply presenting incomplete information in a slightly more useful format.

AIxploria https://www.aixploria.com/en/ [Another actual directory. Its promotional language says “largest list”. Yeah, I believe that]

AllAITool.ai at https://allaitool.ai/

FamousAITools.ai https://famousaitools.ai/ [Another marketing outfit sucking up AI tool submissions]

Futurepedia.io https://www.futurepedia.io/ 

TheMangoAI.co https://themangoai.co/ [Not a directory, an advertisement of sorts for an AI-powered marketing firm]

NeonRev https://www.neonrev.com/ [Another actual directory. It looks like a number of Telegram bot directories]

Spiff Store https://spiff.store/ [Another directory. I have no idea how many tools are included]

StackViv https://stackviv.ai/ [An actual directory with 10,000 tools. No I did not count them. Are you kidding me?]

TheresanAIforThat https://theresanaiforthat.com/ [You have to register to look at the listings. A turn off for me]

Toolify.ai https://www.toolify.ai/ [An actual listing of more than 25,000 AI tools organized into categories probably by AI, not a professional indexing specialist]

When I looked at each of these “directories”, it was clear that marketing is something the AI crowd finds important. A bit more effort in the naming of some of these services might help. Just a thought. Enjoy.

Stephen E Arnold, May 26, 2025

Bing Goes AI: Metacrawler Outfits Are Toast

May 15, 2025

No AI, just the dinobaby expressing his opinions to Zillennials.

The Softies are going to win in the AI-centric search wars. In every war, there will be casualties. One of the casualties will be metasearch companies. What’s metasearch? These are outfits that really don’t crawl the Web. That is expensive and requires constant fiddling to keep pace with the weird technical “innovations” purveyors of Web content present to the user. The metasearch companies provide an interface and then return results from cooperating and cheap primary Web search services. Most users don’t know the difference and have demonstrated over the years total indifference to the distinction. Search means Google. Microsoft wants to win at search and become the one true search service.
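For readers who have not looked under the hood, a metasearch engine is little more than a fan-out and a merge. The sketch below is a generic illustration with made-up backend endpoints, not any particular vendor's code; the commercial question is whose API sits behind those URLs and what it costs to call.

```python
import requests
from concurrent.futures import ThreadPoolExecutor

# Hypothetical backend search APIs a metasearch engine might pay to query.
BACKENDS = {
    "backend_a": "https://api.example-search-a.com/search",
    "backend_b": "https://api.example-search-b.com/search",
}

def query_backend(name_and_url, query):
    name, url = name_and_url
    try:
        response = requests.get(url, params={"q": query}, timeout=5)
        return name, response.json().get("results", [])
    except (requests.RequestException, ValueError):
        return name, []

def metasearch(query):
    # Fan the query out to every backend in parallel, then merge and
    # de-duplicate by URL. No crawling or indexing of its own.
    with ThreadPoolExecutor() as pool:
        responses = pool.map(lambda item: query_backend(item, query), BACKENDS.items())
    merged, seen = [], set()
    for _, results in responses:
        for hit in results:
            if hit.get("url") not in seen:
                seen.add(hit.get("url"))
                merged.append(hit)
    return merged

print(metasearch("behind the firewall search"))
```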

The most recent fix? Kill off the Microsoft Bing application programming interface. Those metasearch outfits will have to learn to love Qwant, SwissCows, and their ilk or face some survive-or-die decisions. Do these outfits use YaCy, OpenSearch, Mwmbl, or some other source of Web indexing?


Bob Softie has just tipped over the metasearch lemonade stand. The metasearch sellers are not happy with Bob. Bob seems quite thrilled with his bold move. Thanks, ChatGPT. Although I have not been able to access your wonder 4.1 service, the cartoon is good enough.

The news of this interesting move appears in “Retirement: Bing Search APIs on August 11, 2025.” The Softies say:

Bing Search APIs will be retired on August 11, 2025. Any existing instances of Bing Search APIs will be decommissioned completely, and the product will no longer be available for usage or new customer signup. Note that this retirement will apply to partners who are using the F1 and S1 through S9 resources of Bing Search, or the F0 and S1 through S4 resources of Bing Custom Search. Customers may want to consider Grounding with Bing Search as part of Azure AI Agents. Grounding with Bing Search allows Azure AI Agents to incorporate real-time public web data when generating responses with an LLM. If you have questions, contact support by emailing Bing Search API’s Partner Support. Learn more about service retirements that may impact your resources in the Azure Retirement Workbook. Please note that retirements may not be visible in the workbook for up to two weeks after being announced. 

Several observations:

  1. The DuckDuckGo metasearch system is exempted. I suppose its super-secure approach to presenting other outfits’ search results is so darned wonderful.
  2. The feisty Kagi may have to spend to get new access deals or pay low-profile crawlers like Dassault Exalead to provide some content. (Let’s hope it is timely and comprehensive.)
  3. The beneficiaries may be Web search systems not too popular with some in North America; for example, Yandex.com. I have found that Yandex.com and Yandex.ru are presenting more useful results since the re-juggling of the company’s operations took place.

Why is Microsoft taking this action? My hunch is paranoia. The AI search “thing” is going to have to work if Microsoft hopes to cope with Google’s push into what the Softies have long considered their territory. Those enterprise, cloud, and partnership set-ups need to have an advantage. Binging it with AI may be viewed as the winning move at this time.

My view is that Microsoft may be edging close to another Bob moment. This is worth watching because the metasearch disruption will flip over some rocks. Who knows if Yandex or another non-Google or non-Bing search repackager surges to the fore? Web search is getting slightly more interesting and not because of the increasing chaos of AI-infused search results.

Stephen E Arnold, May 15, 2025

Google, Its AI Search, and Web Site Traffic

May 12, 2025

No AI. Just a dinobaby sharing an observation about younger managers and their innocence.

I read “Google’s AI Search Switch Leaves Indie Websites Unmoored.” I think this is a Gen Y way of saying, “No traffic for you, bozos.” Of course, as a dinobaby, I am probably wrong.

Let’s look at the write up. It says:

many publishers said they either need to shut down or revamp their distribution strategy. Experts [said] this effort could ultimately reduce the quality of information Google can access for its search results and AI answers.

Okay, but this is just one way to look at Google’s delicious decision.

May I share some of my personal thoughts about what this traffic downshift means for those blue-chip consultant Googlers in charge:

First, in the good old days before the decline began in 2006, Google indexed bluebirds (sites that had to be checked for new content or “deltas” on an accelerated heartbeat). Examples were whitehouse.gov (no, not the whitehouse.com porn site). Then there were sparrows. These plentiful Web sites could be checked on a relaxed schedule. I mean, how often do you visit the US government’s National Railway Retirement Web site, if it is still maintained and online? Yep, the correct answer is, “Never.” Then there were canaries. These were sites which might signal a surge in popularity. They were checked on a heartbeat that ensured the Google wouldn’t miss a trend and fail to sell advertising to those lucky ad buyers.

So, bluebirds, canaries, and sparrows.
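My bluebird, canary, and sparrow labels amount to a revisit schedule keyed to a site's tier. The sketch below is purely illustrative of that idea; the tiers, intervals, and example hosts are my invention, not Google's actual crawl policy.

```python
from datetime import datetime, timedelta

# Illustrative revisit intervals per tier.
REVISIT = {
    "bluebird": timedelta(minutes=15),  # high-value sites, checked constantly
    "canary":   timedelta(hours=6),     # watched for a surge in popularity
    "sparrow":  timedelta(days=30),     # grandma's quilt shop, rarely if ever
}

sites = [
    ("whitehouse.gov",          "bluebird", datetime(2025, 5, 12, 8, 0)),
    ("trend-candidate.example", "canary",   datetime(2025, 5, 12, 3, 0)),
    ("quilts.example",          "sparrow",  datetime(2025, 4, 20, 0, 0)),
]

now = datetime(2025, 5, 12, 9, 0)
for host, tier, last_crawl in sites:
    due = last_crawl + REVISIT[tier]
    status = "crawl now" if due <= now else f"wait until {due}"
    print(f"{host:28} [{tier:8}] {status}")
```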

This shift means that Google can reduce costs by focusing on bluebirds and canaries. The sparrows (the site operated by someone’s grandmother to sell homemade quilts) won’t get traffic unless the site operator buys advertising. It’s pay to play. If a site is not in the Google index, it just may not exist. Sure, there are alternative Web search systems, but none, as far as I know, are close to the scope of the “old” Google in 2006.

Second, dropping sparrows, or pinging them once in a blue moon, will reduce the costs of crawling, indexing, and doing the behind-the-scenes work that consumes Google cash at an astonishing rate. Therefore, the myth of indexing the “Web” is going to persist, but the content of the index is not going to be “fresh.” Freshness is the concept that some sites like whitehouse.gov have important information that must be in search results. Non-priority sites just disappear or fade. Eventually the users won’t know something is missing, which is assisted by the decline in education for some Google users. The top one percent will know when information is bad or missing. The other 99 percent? Well, good luck.

Third, the change means that publishers will have some options. [a] They can block Google’s spider and chase the options. How’s Yandex.ru sound? [b] They can buy advertising and move forward. I suggest these publishers ask a Google advertising representative what the minimum spend is to get traffic. [c] Publishers can join together and try to come up with a joint effort to resist the increasingly aggressive business actions of Google. Do you have a Google button on your remote? Well, you will. [d] Be innovative. Yeah, no comment.

Net net: This item about the impact of AI Overviews is important. Just consider what Google gains and the pickle publishers and other Web sites now find themselves enjoying.

Stephen E Arnold, May 12, 2025

US Brain Drain Droplet May Presage a Beefier Outflow

May 8, 2025

Believe it or not, no smart software. Just a dumb and skeptical dinobaby.

When I was working on my PhD at the University of Illinois, I noticed that the number of foreign students on campus seemed to go up each year. One year in the luxurious Florida Avenue Residence Hall, most of the students were from farms. The next year, FAR was a mini-United Nations. I did not pay any attention because I was on my way to an actual “real” job at Halliburton Nuclear in Washington, DC.

I heard the phrase “brain drain” over the years. The idea was that people who wanted to work in technical fields would come to the US, get degrees, and then stay to work in US universities or dolphin-loving, humanity-centric outfits like the nuclear industry. The idea was that the US was a magnet: Good schools, many opportunities to work or start a company.

I am not sure that golden age exists any longer. I read about universities becoming research labs for giant companies. I see podcasts with foaming-at-the-mouth academics complaining about [a] the quality of the students, [b] squabbles between different ideological groups, and [c] the lack of tenure opportunities which once seemed to be a sinecure for life just like the US government’s senior executive service.

Now the world works in ever more mysterious ways. As a confused dinobaby, I read news items (unverified, of course) with headlines like this:

Top US Scientist leaves Department Of Energy To Join Sichuan University Amid Rising China Tensions.

The write up reports some “real” news:

Amid escalating US-China tensions, senior scientist Yi Shouliang, formerly with the US Department of Energy, has left the U.S. to assume a new academic role at Sichuan University in China…. Shouliang served as a principal scientist and project leader at the DOE’s National Energy Technology Laboratory (NETL), where he focused on the Water-Energy Program.

Let’s assume that this academic who had some business interests just missed his family. No big deal.

But what if a certain “home” country was starting to contact certain people and explaining that their future was back in the good old homeland? Could that country systematically explain the facts of life in a way that made the “home” country look more appealing than a big house in Squirrel Hill?

For a few months, I have been writing “China smart, US dumb” blog posts when I spot some news about how wonderfully bright many young Chinese men and women are.

As a dinobaby, my first thought is that China wants its smart people back in the Middle Kingdom. Hopefully more information about this 2025 brain drain from the US to other countries will become publicly available. Plus, one isolated person going against the “You can’t go home again” idea means nothing. Or does it mean something is afoot?

PS. No, I never went back to Chambana to turn in my thesis. I liked working at Halliburton Nuclear more than I liked indexing poetry for the now departed Dr. William Gillis. Sorry, Dr. Gillis, the truth is now out.

Stephen E Arnold, May 8, 2025

The 10X Engineer? More Trouble Than They Are Worth

April 25, 2025

Dinobaby here. No smart software involved, unlike some outfits. I did use Sam AI-Man’s art system to produce the illustration in the blog post.

I like it when I spot a dinobaby fellow traveler. That happened this morning (March 28, 2025) when I saw the headline “In Praise of Normal Engineers: A Software Engineer Argues Against the Myth of the 10x Engineer.”

The IEEE Spectrum article states:

I don’t have a problem with the idea that there are engineers who are 10 times as productive as other engineers. The problems I do have are twofold.


Everyone is amazed that the 10X engineer does amazing things. Does the fellow become the model for other engineers in the office? Not for the other engineers. But the boss loves this super performer. Thanks, OpenAI, good enough.

The two “problems” (note the word “problems”) are:

  1. “Measuring productivity.” That is an understatement, not a problem. With “engineers” working from home or, in my case, in a far-off foreign country, a hospital waiting room, or playing video games six feet from me, productivity is a slippery business.
  2. “Teams own software.” Alas, that is indeed true. In 1962, I used IBM manuals to “create” a way to index. The professor who paid me $3 / hour was thrilled. I kept doing this indexing thing until the fellow died when I started graduate school. Since then, whipping up software confections required “teams.” Why? I figured out that my indexing trick was pure good fortune. After that, I made darned sure there were other eyes and minds chugging along by my side.

The write up says:

A truly great engineering organization is one where perfectly normal, workaday software engineers, with decent skills and an ordinary amount of expertise, can consistently move fast, ship code, respond to users, understand the systems they’ve built, and move the business forward a little bit more, day by day, week by week.

I like this statement. And here’s another from the article:

The best engineering orgs are not the ones with the smartest, most experienced people in the world. They’re the ones where normal software engineers can consistently make progress, deliver value to users, and move the business forward. Places where engineers can have a large impact are a magnet for top performers. Nothing makes engineers happier than building things, solving problems, and making progress.

Happy workers are magnets.

Now  let’s come back to the 10X idea. I used to work at a company which provided nuclear engineering services to the US government and a handful of commercial firms engaged in the nuclear industry. We had a real live 10X type. He could crank out “stuff” with little effort. Among the 600 nuclear engineers employed at this organization, he was the 10X person. Everyone liked him, but he did not have much to say. In fact, his accent made what he said almost impenetrable. He just showed up every day in a plaid coat, doodled on a yellow pad, and handed dot points, a flow chart, or a calculation to another nuclear engineer and went back to doodling.

Absolutely no one at the nuclear engineering firm wanted to be a 10X engineer. From my years of working at this firm, he was a bit of a one-off. When suits visited, a small parade would troop up to his office on the second floor. He shared that office with my close friend, Dr. James Terwilliger. Everyone would smile and look at the green board. Then they would troop out and off to lunch.

I think the presence of this 10X person was a plus for the company. The idea of trying to find another individual who could do the nuclear “stuff” like this fellow was laughable. For some reason, the 10X person liked me, and I got the informal job of accompanying him to certain engagements. I left that outfit after several years to hook up with a blue chip consulting firm. I lost track of the 10X person, but I had the learnings necessary to recognize possible 10X types. That was a useful addition to my bag of survival tips as a minus 3 thinker.

Net net: The presence of a 10X is a plus. Ignoring the other 599 engineers is a grave mistake. The errors of this 10X approach are quite evident today: Unchecked privacy violations, monopolistic behaviors enabled by people who cannot set up a new mobile phone, and a distortion of what it means to be responsible, ethical, and moral.

The 10X concept is little more than a way to make the top one percent the reason for success. Their presence is a positive, but building an organization to rely on 10X anything is one of the main contributing factors to the slow degradation of computer services, ease of use, and, in my opinion, social cohesion.

Engineers are important. The unicorn engineers are important. Balance is important. Without balance, “stuff” goes off the rails. And that’s where we are.

Stephen E Arnold, April xx, 2025
