Qwant Goes to China

January 17, 2018

The roots of Qwant stretch back to Pertimm, an interesting search system which pre-dated today’s Qwant. Information in my files reminded me that Qwant is a metasearch system which combines results from other engines’ indexes with its own crawling of French sources. The key feature of Qwant is that it does not retain data about users’ queries. It is important to keep in mind, however, that legal intercepts can capture Internet data and may be able to map user actions to particular Web sites or topics.

According to the article “Not Just a Horse: Macron Also Brings Privacy-Based Browser on Trip to China,” the French delegation’s visit to Chinese officials is designed, in part, to promote the use of Qwant.

I noted this statement in the article, in which one of the founders of Qwant allegedly stated:

Yes, we need a lot of data but we don’t need to know that it’s you or me. The whole idea of Qwant is to make AI and IoT without the data of the users. In our case, based on the fact that we are a privacy-based search engine, we don’t need people’s data. So maybe we‘ll have some technology that we can use more easily in China than some of our competitors.

My perception is that China is quite interested in who searches what, particularly within the Middle Kingdom. Qwant will follow “local regulations.”

My recollection is that, in China, Google has not achieved the level of dominance it enjoys in Europe, home of Qwant.

Since the demise of Quaero and Muscat, Yandex has become one of the European alternatives to Google. The Exalead Web search system is also still online, but it does not attract much attention. I find it useful because Google results are thin when I search for older content. Dassault Systèmes uses Exalead for its product component search, and I am surprised that the company does not push the Web search capability more aggressively.

If you have not tried Qwant, you can try it at www.qwant.com. Compare the results with the Exalead system and the Russian Yandex system.

In my tests, I find it necessary to use multiple search systems, including the low-profile iseek.com and Searx.me. It is more difficult than ever to locate certain types of information in general purpose Web search systems. The same is true of metasearch systems like Ixquick (now Startpage.com), Unbubble, Izito, and other systems which try to offer researchers an alternative to Google.

Google works well for pizza. Looking for other types of information? Qwant and other low-profile systems have to be used. The process of locating something as basic as the address of a company in Madrid can require quite vigorous hoop jumping.

But China? Interesting.

Stephen E Arnold, January 17, 2018

What Is Wrong with Web Search? Question Answered

January 15, 2018

I read “How People Search: Classifying & Understanding User Intent.” The article is, I believe, an extract from a new book aimed at those interested in search engine optimization. I will confess. I am not a fan of search engine optimization.

The write up is important, however. The author makes clear why today’s search returns off-point, irrelevant, and ad-related content more often than not.

Quick example: I was running a query for information about a company founded in Madrid, Spain. The company has an unusual name consisting of a single digit and two letters. I assumed that the company name would be unique; otherwise, why would a firm choose a sequence of letters and a number which generates false hits? I also theorized that the company’s location in Madrid, Spain, would narrow the result set.

I ran the query on Bing, Google, and Yandex. None of these systems returned the information I wanted. Bing pointed to some biographies in LinkedIn, Google expanded the query to intelligence quotient or IQ, and Yandex just didn’t have much of anything. I don’t fool with metasearch engines; these just send queries to Web indexes with which they are in cahoots.

What to do?

The solution was not easy.

First, I set up a Spain proxy so I could run my query in Spanish against Google’s index for Spain. One can no longer point to a country-specific Google search system. A bit of effort is, therefore, required. Who would want to search outside the United States? Stupid, no?

Second, I turned to my directory of specialist search engines. The one which delivered useful results was iseek.com. I know you probably use this system every day, gentle reader.

As a result, I was able to obtain the information I needed.
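For anyone who wants to replicate step one, here is a minimal sketch in Python. It assumes you can reach a Spain-located HTTP proxy (the proxy address below is a made-up placeholder), it leans on Google’s long-standing hl, gl, and cr query parameters to ask for Spanish-language, Spain-restricted results, and it uses a placeholder query string because the company is not named in this post. Google may answer scripted requests with a consent page or a captcha, so treat this as illustrative rather than production-ready.

    import requests

    # Made-up placeholder for a Spain-located HTTP proxy; substitute your own.
    PROXIES = {"https": "http://user:password@proxy-madrid.example.net:3128"}

    params = {
        "q": "NOMBRE_DE_LA_EMPRESA Madrid",  # placeholder query string
        "hl": "es",          # interface language: Spanish
        "gl": "es",          # bias results toward Spain
        "cr": "countryES",   # restrict results to Spanish sites
        "num": 20,
    }

    response = requests.get(
        "https://www.google.com/search",
        params=params,
        proxies=PROXIES,
        headers={"User-Agent": "Mozilla/5.0"},
        timeout=30,
    )

    # Google may hand scripted traffic a consent or captcha page, so look at
    # the raw HTML before trying to pull result links out of it.
    print(response.status_code)
    print(response.text[:1000])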

The reason I had to go to such lengths is the one the SEO-oriented article makes clear: for the big engines, search means delivering what most people want.

You want Minnesota Vikings? Well, you are going to get sports. Forget an easy path to those brave warriors who made life miserable for my relatives in the UK.

Here are some highlights from the article which help explain why advertising and catering to the majority put search on what the author of Democracy in America identified as a path toward mediocrity:

  1. Engineers look at data and shape the system to match the numbers
  2. Quality is conformance to what sells ads and keeps most users happy
  3. Disambiguation is resolved by looking at what numbers suggest is the “correct” or “intended” meaning
  4. You really want to buy something; therefore, pizza is a slam dunk when running a query from a mobile device
  5. Voice search means “I want information”.

If these observations ring your chimes, you are one of the helpful people who have contributed to the death of relevance, to the increasing difficulty of locating on-point research, and to the need to use special tools to obtain specific, highly relevant information. Good job.

Stephen E Arnold, January 15, 2018

Online: Welcome to 1981 and 2018

January 8, 2018

I have been thinking about online. I met with a long-time friend and owner of a consumer-centric Web site. For many years (since 1993, in fact), the site grew and generated a solid stream of revenue.

At lunch, the site owner told me that in the last three years revenue had been falling. As I listened to this sharp businessperson, I realized that his site had shifted from ads which he and his partners sold to ads provided by automated systems.

The move from direct control to the ease of automated ad provision created the current predicament: falling revenue. At the same time, the mechanisms for selling ads directly evolved as well. Many small industry events gave way to a handful of large business sector conferences. There were more potential customers at these shows, but attendance shifted from hands-on marketers to people who wanted to use online, automated sales and marketing systems.


He said, “In the good old days of 1996, I could go to a trade show and meet people who made advertising and marketing decisions based on experience with print and TV advertising, dealer promotions, and ideas.”

“Now,” he continued, “I meet smart people who want to use methods which rely on automated advertising. When I talk about buying an ad on our site or sponsoring a section of our content, the new generation look at me like I’m crazy. What’s that?”

I listened. What could I say?

The good old days maybe never existed.

I read “Facebook and Google Are Free. They Shouldn’t Be.” The write up has a simple premise: Users should pay for information.

I am not certain if the write up realizes that paying for online information was, in the past, the only way to generate revenue from digital content. I know that partners in law firms who run queries on LexisNexis and Westlaw have to allocate cash to pay for the digital information about laws, decisions, and cases. For the technical information in Chemical Abstracts, researchers and chemists have to pay as well. Financial data for traders costs money too.


Give Bing a Chance

January 5, 2018

Google is still the most popular web search engine by far, but should we be giving Bing a closer look? Editor Anmol at the admittedly Microsoft-centric blog MSPowerUser explains, “Why I Prefer Bing Over Google (And You Should Too).” He begins with a little history:

Formerly called as MSN Search, Windows Live Search or Live Search, Bing was unveiled by former CEO of Microsoft, Steve Ballmer on May 28th, 2009 and went live on June 3rd. 2009.  Since then, Microsoft is showing its commitment to Bing as an Internet Search Engine rivalling the dominant giant Google. With Windows 8.1, Bing was deeply integrated with the OS with what was called ‘Smart Search’ and this was accessible from the Start Screen. But now a Search Engine is not used ‘just as a search engine.’ Now we use these services to find coffee places around us, book cabs, book movie tickets and more.

True. So why does the author think Bing is best? First, Bing integrates with the very useful Cortana, Microsoft’s digital assistant and, second, it is available across operating systems. Though others might disagree, Anmol feels Bing’s actual search results are as good as Google’s and, besides, it makes some good predictions. Here are the other strengths Anmol cites: a more appealing home page, the Microsoft rewards program, integration with Facebook Messenger, strong local search, package tracking, a capable image-search function, and its advanced math skills. Bing even seems to understand the needs of developers better than Google does. See the write-up for elaboration, including screenshots, on each of these points.

Anmol concludes:

Above are all that I think made me switch to Bing and are keeps me staying. All these features are brought together to life with advanced machine learning algorithms and years of research and hard work. As Microsoft is a productivity-focused Software giant, Bing is something that drives a large part of its revenue by conquering a large amount of market share. Because of their success already I can only see Microsoft offering even tougher competition to its largest rival Google.

Cynthia Murrell, January 5, 2018

 

Who Helps Trash Relevance in Search? INC Has the Answer

December 30, 2017

I read a story in Inc. magazine. The write up’s title is “9 SEO Experts To Follow In 2018.” First, Google is not a person. I think the idea is that a person who wants to buy traffic should pay attention to the GOOG. But I am not sure Google is an expert like the other eight names on the list.

Now my view of search engine optimization is a bit different from that of “experts” in search engine optimization. I think SEO is part of a carnival trick to get people to buy Adwords.

I explain some of the mechanisms in The Google Legacy and Google Version 2: The Calculating Predator. (Alas, out of print, but I sell a rough draft in PDF form. Write benkent2020 at yahoo dot com if you are interested.)

The idea is that people fix up their Web pages to meet Google guidelines. Changes which pass muster produce a boost in traffic. Then, usually after a month or so, the changes no longer deliver the traffic. Traffic erodes.

Check with the Google. What’s the fix? More SEO? Nah, just buy Adwords.

When the advertiser grouses that leads aren’t as wonderful as they were perceived to be, what’s the fix?

Give up?

Buy Adwords.

The loop is a nifty one. Lots of SEO “experts” bill clients for changes which may or may not have substantive impact. When whatever impact fizzles, Google is able to suggest Adwords.

Nifty.

My take on the pay for traffic game is that it is evidence of the death of relevance.

Therefore, the eight “experts” are accessories to the termination, with extreme prejudice, of the notion of entering a query and getting results which directly relate to that query.

Call me old fashioned but SEO experts are in cahoots with Google type outfits in the pay for traffic game.

Give me Boolean, precision, and recall.

Sounds crazy, right? Just ask an SEO expert. Most will agree. Who cares about relevance and stupid precision and recall?

Well, I do.
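For readers who arrived after the Boolean era, a quick refresher on what those two measures mean. Precision is the share of retrieved results that are actually relevant; recall is the share of the relevant material that the system managed to retrieve:

\[
\text{precision} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{retrieved}|},
\qquad
\text{recall} = \frac{|\text{relevant} \cap \text{retrieved}|}{|\text{relevant}|}
\]

An engine tuned to keep most users happy and clicking on ads can post weak numbers on both and still count the quarter a success.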

Stephen E Arnold, December 30, 2017

You Cannot Search for Info If the Info Is Not Indexed: The Middle Kingdom Approach

December 26, 2017

I noted two items this morning as I geared up to video the next Dark Cyber program. (Dark Cyber is a new series of HonkinNews programs from the creator of this blog, Beyond Search.)

Item one’s title is “China Shuts Down Thousands of Websites in Internet Network Crackdown.” As I understand the article, Chinese authorities remove information to reduce the likelihood that problems will arise from unfettered information access, exchange, and communication. The article quotes one source as saying, “These moves have a powerful deterrent effect.” That’s true to some degree; however, squeezing the toothpaste tube of online content may result in forcing that information into channels which may be more difficult to constrain. Nevertheless, I find the action suggestive that the Wild West days of the Internet are drawing to a close in the Middle Kingdom.

Item two’s title is “China Sentences Man to Five Years in Jail for Running VPN Service.” The main idea is that the virtual private network approach to obfuscating one’s online activities is under scrutiny in China. Apple, as you may recall, removed VPN apps to comply with Chinese guidelines. I noted this passage in the source document:

Wu’s [the fellow who gets to sojourn 60 months in a prison] VPN service reportedly had 8,000 foreign clients and 5,000 businesses. However, he had failed to apply for a state permit. While his isn’t the first sentence since another person was sent to jail for nine months on similar charges, this is the first time that such a dramatic sentence has been approved, raising concerns about the government’s growing interest in controlling information that comes into the country.

What happens if one adds item one to item two? The answer is, “You can’t search for information if it is not indexed.” Which raises the question: what information is missing from the indexes accessible in the US?

This weekend I was looking for a story about a Norwich, UK, man who was sentenced to prison and placed on the UK register of sex offenders. The story was not in Google News. I located the story in Bing’s news index. I found this interesting, and you can get the gist of the arrest in the January 2, 2018, HonkinNews “Dark Cyber” program.

Stephen E Arnold, December 26, 2017

Data Analysis Startup Primer Already Well-Positioned

December 22, 2017

A new startup believes it has something unique to add to the AI data-processing scene, we learn from VentureBeat’s article, “Primer Uses AI to Understand and Summarize Mountains of Text.” The company’s software automatically summarizes (what it considers to be) the most important information from huge collections of documents. Filters then allow users to drill into the analyzed data. Of course, the goal is to reduce or eliminate the need for human analysts to produce such a report; whether Primer can soar where others have fallen short on this tricky task remains to be seen. Reporter Blair Hanley Frank observes:

Primer isn’t the first company to offer a natural language understanding tool, but the company’s strength comes from its ability to collate a massive number of documents with seemingly minimal human intervention and to deliver a single, easily navigable report that includes human-readable summaries of content. It’s this combination of scale and human readability that could give the company an edge over larger tech powerhouses like Google or Palantir. In addition, the company’s product can run inside private data centers, something that’s critical for dealing with classified information or working with customers who don’t want to lock themselves into a particular cloud provider.

Primer is sitting pretty with $14.7 million in funding (from the likes of Data Collective, In-Q-Tel, Lux Capital, and Amplify Partners) and, perhaps more importantly, a contract with In-Q-Tel that connects them with the U.S. Intelligence community. We’re told the software is being used by several agencies, but that Primer knows not which ones. On the commercial side, retail giant Walmart is now a customer. Primer emphasizes they are working to enable more complex reports, like automatically generated maps that pinpoint locations of important events. The company is based in San Francisco and is hiring for several prominent positions as of this writing.

Cynthia Murrell, December 22, 2017

Search System from UAEU Simplifies Life Science Research

December 21, 2017

Help is on hand, in the form of a new platform called Biocarian, for scientific researchers tired of being bogged down in databases. The Middle East’s ITP.net reports, “UAEU Develops New Search Engine for Life Sciences.” Semantic search is the key to the more efficient and user-friendly process. Writer Mark Sutton reports:

The UAEU [United Arab Emirates University] team said that Biocarian was developed to address the problem of large and complex data bases for healthcare and life science, which can result in researchers spending more than a third of their time searching for data. The new search engine uses Semantic Web technology, so that researchers can easily create targeted searches to find the data they need in a more efficient fashion. … It allows complex queries to be constructed and entered, and offers additional features such as the capacity to enter ‘facet values’ according to specific criteria. These allow users to explore collated information by applying a range of filters, helping them to find what they are looking for quicker.
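The article does not show what one of these semantic, facet-style queries looks like, and Biocarian’s own endpoint and schema are not public here, so the following is a rough sketch only. It uses Python’s SPARQLWrapper library against the public UniProt SPARQL endpoint, which is a stand-in, not Biocarian; the extra triple patterns play the role of the ‘facet values’ mentioned in the quote.

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Public UniProt endpoint used purely as a stand-in for a life-science
    # semantic database; Biocarian's own endpoint is not published in the article.
    sparql = SPARQLWrapper("https://sparql.uniprot.org/sparql")
    sparql.setReturnFormat(JSON)

    # Facet-style narrowing: reviewed (Swiss-Prot) human proteins plus their names.
    sparql.setQuery("""
    PREFIX up:    <http://purl.uniprot.org/core/>
    PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
    SELECT ?protein ?name
    WHERE {
      ?protein a up:Protein ;
               up:organism taxon:9606 ;               # facet: organism = human
               up:reviewed true ;                     # facet: reviewed entries only
               up:recommendedName/up:fullName ?name .
    }
    LIMIT 10
    """)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["protein"]["value"], "-", row["name"]["value"])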

Project lead Nazar Zaki expects that simplifying the search process will open up this data to many talented researchers (who don’t happen to also be computer-science experts), leading to significant advances in medicine and healthcare. See the article for more on the Biocarian platform.

Cynthia Murrell, December 21, 2017

If You Want Search Engines to Eliminate Fake News, Cautiously Watch Russia

December 21, 2017

There is a growing rallying cry for social media and search to better police fake news. This is an admirable plan, because nobody should be misled by false information and propaganda. However, as history has told us, those in charge of misinformation and propaganda can often use changes like this to their advantage. Take, for example, the recent Motherboard story, “How Russia Polices Yandex, Its Most Popular Search Engine,” which detailed how Russia aimed to get rid of its “fake news” but really only encouraged more of it.

The story says,

This year, the “news aggregator law” came into effect in Russia. It requires websites that publish links to news stories with over one million daily users (Yandex.News has over six million daily users) to be responsible for all the content on their platform, which is an enormous responsibility.

 

‘Our Yandex.News team has been actively working to retain a high quality service for our users following new regulations that impacted our service this past year,’ Yandex told Motherboard in a statement, adding that to comply with new regulations, it reduced the number of sources that it aggregated from 7,000 to 1,000, which have official media licenses.

In short, since the government oversees part of Yandex, the government can make it harder to publish stories that are not favorable to itself. It’s food for thought, especially for the Mark Zuckerbergs of the world calling for more government oversight in social media. You might not get exactly what you hoped for when a third party starts calling the shots.

Patrick Roland, December 21, 2017

Analyze the JFK Files to Your Heart’s Content

December 20, 2017

History buffs, especially those interested in the JFK assassination, may want to check this out—“Research the JFK Files for Free with Logikcull.” Since the National Archives’ release of previously classified documents on the matter, eDiscovery firm Logikcull has uploaded them to their platform. They invite anyone interested to delve into the data and help make sense of it, using their software. It is a crowd-sourced project around a matter of great public interest that happens to expose potential users to their platform’s abilities—well-played. The post specifies:

The files are, of course, a mess. They are disorganized, incomplete, voluminous, and cobbled together from dozens of different sources. That is, they’re just like the files you’d find in any other document-intensive investigation. And, thankfully, we have eDiscovery software that is designed to help you make order and insight out of just such a mess. … To help researchers, journalists, JFK enthusiasts, concerned members of the public, and the like, we’ve uploaded the documents from the JFK Files into Logikcull, allowing you to apply Logikcull’s state-of-the-art discovery technology to the nearly 3,000 records released by the government. You can use Logikcull to cull through the junk and focus in on the documents that most interest you, to build complex, powerful searches with ease, and flag documents with tags of your choosing. There’s no need to flip through the documents declassified page by declassified page.

To get in on the sleuthing, readers are told to send an email to marketing@logikcull.com with the subject, “JFK Research Account,” and to specify their name, title, and company. It will be interesting to see what connections and conclusions this project turns up.

Founded in 2004 and based in San Francisco, Logikcull serves organizations from the US Government to Fortune 500 companies. They also happen to be hiring as of this writing.

Cynthia Murrell, December 20, 2017
