The Consequences of an Echo Chamber for Google Search

January 19, 2018

I read “Google Memory Loss.” The author is a fellow who created a text search engine, helped found OpenText, did some time at the GOOG, and swam in the Semantic Web pond.

The write up provides useful information to anyone wondering why a Google query for a company name goes off the rails or why the Google suggestions have zero relevance to the user’s query.

There were some important points in the write up; for example:

  1. Search is “crushingly expensive”. This means that when Google needs to cut costs and maximize revenue, the company will make business decisions. The decisions may favor advertising revenues. Maybe.
  2. Archival information is not popular. The reasoning may be, “Why index this stuff or revisit the archive to figure out if there is ‘new information’ in the old archive?” If old information is not important, what about unpopular sites like the National Railway Retirement Board Web content?
  3. Google is into the timely, not the research-centric type of query.
  4. Dr. Bray uses Google but supplements the look up by using very un-Googley search systems.

Here in Harrod’s Creek, we love the Google. Filtered, ad-tailored results are perfect for looking up KY Fry or the NCAA rules committee’s favorite team, the Louisville Cardinals.

A search for Cardinals returns this results page this morning:


Lots of Googlers love March Madness. Too bad if a 7th grader has to look up information about cardinals with feathers.

Stephen E Arnold, January 19, 2018

We Are Without a Paddle on Growing Data Lakes

January 18, 2018

The pooling of big data is commonly known as a “data lake.” While this technique was first met with excitement, it is beginning to look like a problem, as we learned in a recent InfoWorld story, “Use the Cloud to Create Open, Connected Data Lakes for AI, Not Data Swamps.”

According to the story:

A data scientist will quickly tell you that the data lake approach is a recipe for a data swamp, and there are a few reasons why. First, a good amount of data is often hastily stored, without a consistent strategy in place around how to organize, govern and maintain it. Think of your junk drawer at home: Various items get thrown in at random over time, until it’s often impossible to find something you’re looking for in the drawer, as it’s gotten buried.

This disorganization leads to the second problem: users are often not able to find the dataset once ingested into the data lake.

So, how does one take aggregate data from a stagnant swamp to a lake one can traverse? According to Scientific Computing, the secret lies in separating the search function into two pieces, finding and searching. When you combine this thinking with InfoWorld’s logic of using the cloud, suddenly these massive swamps are drained.

Patrick Roland, January 18, 2018



Some Think Google Is No Longer the King of Search

January 18, 2018

Google is much more than a search engine; it’s a verb. As with Xerox and Kleenex before it, that says something about where the company sits in the hierarchy of its business. However, some are claiming it’s time for alternatives (in search, not in copy-making or nose blowing). That is the claim of a recent Eyerys story, “Searching Beyond Google: When The Internet is Too Big for a Single Search Engine.”

According to the story:

[T]he information you need might be hidden from the tools you use. Either because the webmasters wanted that to happen by blocking search engines’ access, or inaccessible by search engine because they are behind paywalls or login forms, or lies inside the deep web.

To access them, you need more specific tools other than search engines, and look at the right place, with the right privilege.

If and only if you still can’t find the information you’re looking for, it’s either not available on the internet, or doesn’t exist in the first place.

Or, they could be hidden inside database, encrypted, lies deeper and accessible to only using certain IPs, classified methods or privilege. In this case, it’s not publicly available though it is there. You need to be a hacker to get yourself into that, and that is certainly illegal by any means.

While the story has its heart in the right place, recommending alternative engines, like DuckDuckGo, and giving tips on using social media for search, it is not very believable. For one, humans are creatures of habit, and they are stuck on the single search engine method. The story is wishful thinking; it makes sense in places, but we can’t see it happening.

Patrick Roland, January 18, 2018

Qwant Goes to China

January 17, 2018

The roots of Qwant stretch back to Pertimm, an interesting search system which pre-dated today’s Qwant. Information in my files about Qwant reminded me that Qwant is a metasearch system which combines its own crawling of French sources. The key feature of Qwant is that it is not retaining data about users’ queries. It is important to keep in mind that legal intercepts can capture Internet data and may be able to map user actions to particular Web sites or topics.

According to the article “Not Just a Horse: Macron Also Brings Privacy-Based Browser on Trip to China,” the French delegation’s visit to Chinese officials is, in part, designed to promote the use of Qwant.

I noted this statement in the article. One of the founders of Qwant allegedly stated:

Yes, we need a lot of data but we don’t need to know that it’s you or me. The whole idea of Qwant is to make AI and IoT without the data of the users. In our case, based on the fact that we are a privacy-based search engine, we don’t need people’s data. So maybe we’ll have some technology that we can use more easily in China than some of our competitors.

My perception is that China is quite interested in who searches what, particularly within the Middle Kingdom. Qwant will follow “local regulations.”

My recollection is that Google has not achieved in China the same level of dominance that it has in Europe, home of Qwant.

Since the demise of Quaero and Muscat, Yandex has become one of the European alternatives to Google. The Exalead Web search system is still online, but it does not attract much attention. I find it useful because Google results are thin when I search for older content. You can locate the Exalead search system at this link. Dassault Systèmes uses Exalead for its product component search, and I am surprised that the company does not push the Web search capability more aggressively.

If you have not tried Qwant, you can try it at Compare the results with the Exalead system and the Russian Yandex system.

In my tests, I find it necessary to use multiple search systems, including the low profile and system. It is more difficult than ever to locate certain types of information in general purpose Web search systems. This applies to metasearch systems like Ixquick (now, Unbubble, Izito, and other systems which try to offer researchers an alternative to Google.

Google works well for pizza. Looking for other types of information? Qwant and other low profile systems have to be used. The process of locating something as basic as the address of a company in Madrid can require quite vigorous hoop jumping.

But China? Interesting.

Stephen E Arnold, January 17, 2018

What Is Wrong with Web Search? Question Answered

January 15, 2018

I read “How People Search: Classifying & Understanding User Intent.” The article is, I believe, an extract from a new book oriented to those interested in search engine optimization. I will confess. I am not a fan of search engine optimization.

The write up is important, however. The author makes clear why today’s search returns off point, irrelevant, and ad-related content more often than not.

Quick example: I was running a query for information about a company founded in Madrid, Spain. The company has an unusual name consisting of a single digit and two letters. I assumed that the company name would be unique; otherwise, why would a firm choose a sequence of letters and a number which generated false hits? I also theorized that the company’s location in Madrid, Spain, would narrow the result set.

I ran the query on Bing, Google, and Yandex. None of these systems returned the information I wanted. Bing pointed to some biographies in LinkedIn, Google expanded the query to intelligence quotient or IQ, and Yandex just didn’t have much of anything. I don’t fool with metasearch engines; these just send queries to Web indexes with which they are in cahoots.

What to do?

The solution was not easy.

First, I set up a Spain proxy so I could run my query in Spanish against Google’s index for Spain. One can no longer point to a country’s Google search system. A bit of effort is, therefore, required. Who would want to search outside the United States? Stupid, no?

Second, I turned to my directory of specialist search engines. The one which delivered useful results was I know you probably use this system everyday, gentle reader.

As a result, I was able to obtain the information I needed.

The reason I had to go to such lengths is one the SEO oriented article makes clear: search means delivering what most people want.

You want Minnesota Vikings? Well, you are going to get sports. Forget an easy path to those brave warriors who made life miserable for my relatives in the UK.

Here are some highlights from the article which help explain how advertising and appealing to the majority follow what the author of Democracy in America pointed out as a path toward mediocrity:

  1. Engineers look at data and shape the system to match the numbers
  2. Quality is conformance to what sells ads and keeps most users happy
  3. Disambiguation is resolved by looking at what numbers suggest is the “correct” or “intended” meaning
  4. You really want to buy something; therefore, pizza is a slam dunk when running a query from a mobile device
  5. Voice search means “I want information”.

If these observations ring your chimes, you are one of the helpful people who have contributed to the death of relevance and to the increasing difficulty of locating on point research and of using tools to obtain specific, highly relevant information. Good job.

Stephen E Arnold, January 15, 2018

Online: Welcome to 1981 and 2018

January 8, 2018

I have been thinking about online. I met with a long-time friend and owner of a consumer-centric Web site. For many years (since 1993, in fact), the site grew and generated a solid stream of revenue.

At lunch, the site owner told me that in the last three years, the revenue was falling. As I listened to this sharp businessperson, I realized that his site had shifted from ads which he and his partners sold to ads provided by automated systems.

The shift from direct control to the ease of automated ad provision created the current predicament: Falling revenue. At the same time, the mechanisms for selling ads directly evolved as well. The shift from many industry events to a handful of large business sector conferences took place. There were more potential customers at these shows, but the attendance shifted from hands-on marketers to people who wanted to make use of online automated sales and marketing systems. These people began to dominate.


He said, “In the good old days of 1996, I could go to a trade show and meet people who made advertising and marketing decisions based on experience with print and TV advertising, dealer promotions, and ideas.”

“Now,” he continued, “I meet smart people who want to use methods which rely on automated advertising. When I talk about buying an ad on our site or sponsoring a section of our content, the new generation look at me like I’m crazy. What’s that?”

I listened. What could I say?

The good, old days maybe never existed.

I read “Facebook and Google Are Free. They Shouldn’t Be.” The write up has a simple premise: Users should pay for information.

I am not certain if the write up realizes that charging users for online information was the only way to generate revenue from digital content in the past. I know that partners in law firms realize that running queries on LexisNexis and Westlaw means allocating cash to pay for the digital information about laws, decisions, and cases. For the technical information in Chemical Abstracts, researchers and chemists have to pay too. Financial data for traders costs money as well.


Give Bing a Chance

January 5, 2018

Google is still the most popular web search engine by far, but should we be giving Bing a closer look? Editor Anmol at the admittedly Microsoft-centric blog MSPowerUser explains, “Why I Prefer Bing Over Google (And You Should Too).” He begins with a little history:

Formerly called as MSN Search, Windows Live Search or Live Search, Bing was unveiled by former CEO of Microsoft, Steve Ballmer on May 28th, 2009 and went live on June 3rd. 2009.  Since then, Microsoft is showing its commitment to Bing as an Internet Search Engine rivalling the dominant giant Google. With Windows 8.1, Bing was deeply integrated with the OS with what was called ‘Smart Search’ and this was accessible from the Start Screen. But now a Search Engine is not used ‘just as a search engine.’ Now we use these services to find coffee places around us, book cabs, book movie tickets and more.

True. So why does the author think Bing is best? First, Bing integrates with the very useful Cortana, Microsoft’s digital assistant and, second, it is available across operating systems. Though others might disagree, Anmol feels Bing’s actual search results are as good as Google’s and, besides, it makes some good predictions. Here are the other strengths Anmol cites: a more appealing home page, the Microsoft rewards program, integration with Facebook Messenger, strong local search, package tracking, a capable image-search function, and its advanced math skills. Bing even seems to understand the needs of developers better than Google does. See the write-up for elaboration, including screenshots, on each of these points.

Anmol concludes:

Above are all that I think made me switch to Bing and are keeps me staying. All these features are brought together to life with advanced machine learning algorithms and years of research and hard work. As Microsoft is a productivity-focused Software giant, Bing is something that drives a large part of its revenue by conquering a large amount of market share. Because of their success already I can only see Microsoft offering even tougher competition to its largest rival Google.

Cynthia Murrell, January 5, 2018


Who Helps Trash Relevance in Search? INC Has the Answer

December 30, 2017

I read a story in Inc. magazine. The write up’s title is “9 SEO Experts To Follow In 2018.” First, Google is not a person. I think the idea is that a person who wants to buy traffic should pay attention to the GOOG. But I am not sure Google is an expert like the other eight names on the list.

Now my view of search engine optimization is a bit different from that of “experts” in search engine optimization. I think SEO is part of a carnival trick to get people to buy Adwords.

I explain some of the mechanisms in The Google Legacy and Google Version 2: The Calculating Predator. (Alas, out of print, but I sell a rough draft in PDF form. Write benkent2020 at yahoo dot com if you are interested.)

The idea is that people fix up their Web pages to meet Google guidelines. Changes which pass muster produce a boost in traffic. Then usually after a month or so, the changes don’t deliver the traffic. Traffic erodes.

Check with the Google. What’s the fix? More SEO? Nah, just buy Adwords.

When the advertiser grouses that leads aren’t as wonderful as they were perceived to be, what’s the fix?

Give up?

Buy Adwords.

The loop is a nifty one. Lots of SEO “experts” bill clients for changes which may or may not have substantive impact. When whatever impact fizzles, Google is able to suggest Adwords.


My take on the pay for traffic game is that it is evidence of the death of relevance.

Therefore, the eight “experts” are accessories to the termination with extreme prejudice of the notion of entering a query and getting results which directly relate to that query.

Call me old fashioned but SEO experts are in cahoots with Google type outfits in the pay for traffic game.

Give me Boolean, precision, and recall.

Sounds crazy, right? Just ask an SEO expert. Most will agree. Who cares about relevance and stupid precision and recall?

Well, I do.

Stephen E Arnold, December 30, 2017

You Cannot Search for Info If the Info Is Not Indexed: The Middle Kingdom Approach

December 26, 2017

I noted two items this morning as I geared up to video the next Dark Cyber program. (Dark Cyber is a new series of HonkinNews programs from the creator of this blog, Beyond Search.)

Item one’s title is “China Shuts Down Thousands of Websites in Internet Network Crackdown.” As I understand the article, Chinese authorities remove information to reduce the likelihood that problems will arise from unfettered information access, exchange, and communication. The article quotes one source as saying, “These moves have a powerful deterrent effect.” That’s true to some degree; however, squeezing the toothpaste tube of online content may result in forcing that information into channels which may be more difficult to constrain. Nevertheless, I find the action suggestive that the Wild West days of the Internet are drawing to a close in the Middle Kingdom.

Item two’s title is “China Sentences Man to Five Years in Jail for Running VPN Service.” The main idea is that the virtual private network approach to obfuscating one’s online activities is under scrutiny in China. Apple, as you may recall, removed VPN apps to comply with Chinese guidelines. I noted this passage in the source document:

Wu’s [the fellow who gets to sojourn 60 months in a prison] VPN service reportedly had 8,000 foreign clients and 5,000 businesses. However, he had failed to apply for a state permit. While his isn’t the first sentence since another person was sent to jail for nine months on similar charges, this is the first time that such a dramatic sentence has been approved, raising concerns about the government’s growing interest in controlling information that comes into the country.

What happens if one adds one plus two? The answer is, “You can’t search for information if it is not indexed.” What information in the US accessible indexes is not online?

This weekend I was looking for a story about a Norwich, UK, man who was sentenced to prison and placed on the UK register of sex offenders. The story was not in Google News. I located the story in Bing’s news index. I found this interesting, and you can get the gist of the arrest in the January 2, 2018, HonkinNews “Dark Cyber” program.

Stephen E Arnold, December 26, 2017

Data Analysis Startup Primer Already Well-Positioned

December 22, 2017

A new startup believes it has something unique to add to the AI data-processing scene, we learn from VentureBeat’s article, “Primer Uses AI to Understand and Summarize Mountains of Text.” The company’s software automatically summarizes (what it considers to be) the most important information from huge collections of documents. Filters then allow users to drill into the analyzed data. Of course, the goal is to reduce or eliminate the need for human analysts to produce such a report; whether Primer can soar where others have fallen short on this tricky task remains to be seen. Reporter Blair Hanley Frank observes:

Primer isn’t the first company to offer a natural language understanding tool, but the company’s strength comes from its ability to collate a massive number of documents with seemingly minimal human intervention and to deliver a single, easily navigable report that includes human-readable summaries of content. It’s this combination of scale and human readability that could give the company an edge over larger tech powerhouses like Google or Palantir. In addition, the company’s product can run inside private data centers, something that’s critical for dealing with classified information or working with customers who don’t want to lock themselves into a particular cloud provider.

Primer is sitting pretty with $14.7 million in funding (from the likes of Data Collective, In-Q-Tel, Lux Capital, and Amplify Partners) and, perhaps more importantly, a contract with In-Q-Tel that connects them with the U.S. Intelligence community. We’re told the software is being used by several agencies, but that Primer knows not which ones. On the commercial side, retail giant Walmart is now a customer. Primer emphasizes they are working to enable more complex reports, like automatically generated maps that pinpoint locations of important events. The company is based in San Francisco and is hiring for several prominent positions as of this writing.

Cynthia Murrell, December 22, 2017
