Zabasearch: Not Too Useful for Targeting Me

May 26, 2020

A reader wanted to know, “Have you used Zabasearch?” The answer was, “No.” I navigated to the Web site and learned:

People Search. Honestly Free! Search by Name.
Find People in the USA. Free People Finder.

Free. Okay! I tested the system by plugging my name into the search box. This once was called “ego surfing.” Here’s what the system revealed:

image

According to Zabasearch, I am in an industrial park between two computer stores. I feel safe because I am not at that location. I have driven by that location.

I did a Zabasearch using the site’s reverse phone look up. It found the name of the person whose number I plugged in. Once again, instead of living 500 feet from my office (not in the middle of a parking lot, thank you), the individual resides at the Edge Full Service Salon somewhere in West Louisville.

Close enough for free? Sure.

The system reports an incorrect telephone number for me too. I called it and I was invited to leave a message at Entré Computer Center. I checked the full profile and discovered that I am related to “Jeff Arnold.” Nope, sorry.

Net net: Zabasearch is not likely to become my go-to person locator or source of phone numbers.

Stephen E Arnold, May 25, 2020

Microsoft: Rationalizing Is a Synonym for Good Enough Search

May 25, 2020

On May 16, 2020, Microsoft — the JEDI champions and the target of amusement for Google’s Action Blocks — updated its “Rationalizing Semantic and Keyword Search on Microsoft Academic” page. One notable change is references to everyone’s favorite pandemic and bandwagon for virtue signaling: Covid 19.

What’s Microsoft saying about its Microsoft Academic Search?

The write up points out that the four year old method for delivering “results that best matched semantically coherent interpretations of user queries, informed by the Microsoft Academic Graph (MAG)” is fixed up. I assume this means the fixing up which Longhorn required before it became semi ready for prime time.

Microsoft points out (mostly in a mist of misinformation) that the competitors just do keyword matches. I won’t repeat what I have written in my three Google monographs, the New Landscape of Search, and numerous columns and blog posts.

Well, Microsoft does allow some stupid, old fashioned, and hopelessly archaic keyword searching. The new search will avoid returning pages with null results or zero hits. FYI, gentle reader, learning there are “no hits” is high value information for many queries. Just ask someone running scientific, technical, engineering, and medical queries. Those quite specific searches with no hits are informationized payloads.

Keyword matching is now “rudimentary.” And what’s better? Okay, Boolean lovers who know how to formulate specific queries created after a reference interview by the light of an oil lamp in a damp cave in Eastern Europe:

To put it simply, we’ve changed our semantic search implementation from a strict form where all terms must be understood to a looser form where as many terms as possible are understood.

What’s this mean? Irrelevant, or at best tangential information. But without the explicit mechanisms of a faceted based search system. (Endeca, Endeca, why did you beat up on those who wanted to perform “guided navigation?” Are you wizards to blame?)

The write up presents some before and after queries. Guess what? You get more results, more to scan and review, and more time burned because the search system is being helpful.
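The difference between the strict and the looser form can be sketched in a few lines. This is a toy illustration only, not Microsoft's implementation: documents are modeled as bare sets of terms, and `DOCS`, `strict_search`, and `loose_search` are hypothetical names invented for the example.

```python
# Toy corpus: each document is just a set of index terms (an assumption
# made for illustration; real systems index far richer structures).
DOCS = {
    "d1": {"graph", "neural", "search"},
    "d2": {"graph", "search"},
    "d3": {"neural", "ranking"},
    "d4": {"biology"},
}


def strict_search(terms, docs=DOCS):
    """Strict form: a document qualifies only if it contains every query term."""
    wanted = set(terms)
    return sorted(doc_id for doc_id, words in docs.items() if wanted <= words)


def loose_search(terms, docs=DOCS):
    """Looser form: rank documents by how many query terms they contain,
    keeping any document that matches at least one term."""
    wanted = set(terms)
    scored = [(len(wanted & words), doc_id) for doc_id, words in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for hits, doc_id in ranked if hits > 0]


if __name__ == "__main__":
    query = ["graph", "quantum"]
    print(strict_search(query))  # no document holds both terms
    print(loose_search(query))   # partial matches are returned instead
```

In this sketch a query containing a term no document holds produces an empty list under the strict form and a list of partial matches under the looser one. The "no hits" signal disappears, and the result list grows.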

Ah, no, thank you.

There is zero search system of which I know capable of “knowing” how to relax a query to provide the specific information for which I am looking. I prefer to formulate a query, scan, reformulate the query, scan, and hone my attention to the content object in which, in my judgment, a useful nugget of information can be found.

Microsoft presents data and “distance” as evidence their new and improved system works. Better than sliced bread? For Microsoft search experts, the answer is a chorus of “yes indeeds.”

The result is another modern system which makes a person less skilled in retrieving “academic” information get a “good enough” answer.

Remember. This Microsoft outfit is going to be in the warfighting game. How does “good enough” information retrieval intentionally displaying content not directly related to the query meet the needs of an analyst in one of the more academic units of the Pentagon?

Oh, I bet this new system is not intended for that PhD. That individual uses a next generation information retrieval system which provides specific tools to locate on point information.

Microsoft wants to be the search champion. Too bad it is emulating the king of irrelevant results and doing it without the payoff of massive advertising revenue.

Need academic information? Gentle reader, try iSeek, Qwant or Swisscows or your library’s online commercial databases. Include Microsoft’s offering, but supplement, analyze, and aggregate. You know like do research, not accept what the JEDI crowd offers up.

Stephen E Arnold, May 25, 2020

Crazy Enterprise Search Market Report for May 25, 2020

May 25, 2020

Another crazy enterprise market report is now available. This one skips when the report was written, falling back on the vague word “recent.” In fact, my hunch is that this is one dicey report marketed under different aliases in order to gin up sales.

The title? “Enterprise Search Market Dynamics, Comprehensive Analysis, Business Growth, Revealing Key Drivers, Prospects and Opportunities 2025”

What’s in this gem from Market Study Report? The write up about the report promises:

The recent document on the Enterprise Search market involves breakdown of this industry as well as division of this vertical. As per the report, the Enterprise Search market is subjected to grow and gain returns over the predicted time period with an outstanding growth rate y-o-y over the predicted period.

Yep, outstanding. Obviously the global economic downturn has not had an impact on the half century young enterprise search software sector.

Enterprise search solutions are hot items. Forget hand sanitizer and surgical masks, enterprise search solutions are the barn burners. Are there lines of eager customers queuing outside of Algolia, Coveo, Elastic, IBM Omnifind’s office, Lucidworks, and Microsoft’s search facility in Beijing? Sure, sure, long lines. No social distancing either. Jostling and crowding is what happens when a sizzler is on offer.

The report presents information “with regards to the geographical landscape.” Yep, but how many languages do enterprise search systems support?

What’s interesting is the list of companies analyzed in the report. Here you go:

Attivio Inc

Concept Searching Limited

Coveo Corp

Dassault Systemes

Expert System Inc

Google

Hyland

IBM Corp

Lucid Work (It would be helpful if the report authors spelled the name of the company correctly, wouldn’t it?)

Marklogic Inc

Micro Focus

Oracle

SAP AG

Microsoft

X1 Technologies

There are some notable omissions, but I won’t provide these names. Obviously I am not hip to where enterprise search is at this moment.

In the three editions of the Enterprise Search Report I wrote, it never crossed my mind to include a manufacturing cost structure analysis. Poor stupid me.

What seems clear is that whoever is marketing this report recycles the content under different names, hoping for a sale.

The data in the report, one hopes, is more polished than the promotional material.

Stephen E Arnold, May 25, 2020

Search: Contentious and Increasingly Horrible

May 25, 2020

I dropped enterprise search, commercial search, and vertical search to the bottom of my “Favorite Topics” list years ago.

Why?

The individuals popping up and off at conferences were disconnected from the realities of looking for information under stressful circumstances.

image

Hey, big rocks, how did you move from that quarry kilometers away and get yourselves smoothed down? Just like modern online search systems, you won’t get an answer. Finding information relevant to a query is as difficult as getting megalithic stones to become Chatty Kathies.

The thumb typing crowd, some are now in their mid forties, ASSUME that search has to think for the stupid user.

The techniques include smart software which skews results in ways an experienced researcher finds stupid. For those search experts concerned with making their information or their name appear number one on a results list, good search was anything that produced a top spot in a result list even if that result was stupid, irrelevant, or shameless ego jockeying. Then there are the chipper, super confident experts who emerged from an educational system which awarded those who showed up and sort of behaved a blue ribbon. Yep, everything that group does is just wonderful. Yeah, right.

You can see the consequences of two forces colliding when you read Science Magazine’s “They Redesigned PubMed, a Beloved Website. It Hasn’t Gone Over Well.”

You can work through the examples in the source article. The pain points range from appearance to search functionality.

Why did this happen?

The change is a result of people who do not have the experience of performing search under stressful conditions. No, I don’t mean locating the Cuba Libre restaurant in Washington, DC, on a Google Map. I mean looking up technical information to complete a lab test, perform a diagnosis, locate a procedure, or some similar action. There is a pandemic going on, isn’t there?

The complaints indicate that the “new” PubMed is not perceived as a home run.

Go read the original.

I want to offer several observations:

  1. Those who do research with intent need predictability; that is, when a Boolean query is entered, the results should reflect that logic. Modern systems think Boolean is stupid. There you go, a value judgment from those with “Also Participated” ribbons in high school.
  2. Interfaces should allow the user to select an approach. There are some users who like a blinking dot or a question mark. Enter the commands and get a text output. Others like the Endeca style training wheels, although I doubt if any of the modern “helper” interfaces know what Endeca offered. Others may want some other type of interface like a PhD approach; that is, push here, dummy. The point is: Why not allow the user to select the interface?
  3. Change is introduced for dark purposes. Catalina has many points of friction so that Apple can extend its span of control. Annoying? Sure is. Why doesn’t Apple tell the truth about these friction points? What? Tell the truth? Are you crazy? Apple, like Facebook and Google, is doing what it can to protect its hegemony, and the user is the victim. Tough. The same logic applies to PubMed. Dollars to donuts there is a “reason” for the change, and it may be due to whimsy, money, or the need to demonstrate the team is actually doing something instead of just having meetings with contractors.

Net net: As I wrote for Barbara Quint in the now departed magazine Searcher, search is dead. Each day the hope for a better, more appropriate way to locate online information becomes lost in the mists of time. Getting relevant information from PubMed or any modern system is like trying to get the stone of Ollantaytambo to explain how the rocks moved eons ago.

Finding information today is more difficult than at any other time in my professional career. That’s a big problem.

Stephen E Arnold, May 24, 2020

Microsoft and Its Latest Search Innovation: Moving Past Fast? Nope

May 22, 2020

I read “Microsoft Search: Search Your Document Like You Search the Web.” Perhaps Microsoft did not get the reports about the demise of the Google Search Appliance. That “invention” made clear that searching a corporate content collection like you search the Web was not exactly the greatest thing since sliced bread. There were a number of reasons for the failure of the GSA. It was a black box. You know that mere mortals could not tune the relevance component. You know that it produced results that left employees wondering, “Where is the document I wrote yesterday?” You know that the corpus of Web content is different from the fruit cake of corporate content. Web search returns something because the system is rigged to find a way to display ads to the hapless searcher.

Contrast this with documents in the cloud, in different systems like that old AS/400 Ironsides application used by the warehouse supervisors, and content tucked away on employees’ USB drives, mobile phones, the oldest kid’s iPad, and on services a go to sales professional uses to store PowerPoints for “special” customers. Then there are the documents in the corporate legal office and the consultants’ reports scanned and stored on the Marketing Department’s computer kept for interns.

Nevertheless, the article explains:

We’re utilizing well-established web search technologies, such as query and document understanding, and adding deep learning based natural language models. This allows us to handle a much broader set of search queries beyond “exact match.”

Okay, query expansion, synonym look up, and Fast Search’s concept feature. But there’s more:

With the recent breakthroughs in deep learning techniques, you can now go beyond the common search term-based queries. The result is answers to your questions based on the document content. This opens a whole new way of finding knowledge. When you’re looking at a water quality report, you can answer questions like “where does the city water originate from? How to reduce the amount of lead in water?”

May I suggest that Microsoft and dozens of other enterprise search vendors have promised magical retrieval?

May I point out that the following content types are usually outside the ken of the latest and greatest enterprise search confection; for example:

  • Quality control data on parts stored in an Autodesk engineering document
  • Real time data flowing into an organization from sensors
  • Video content, audio content, and rich media like photographs
  • Classified content or content restricted by certain constraints. (Access controls are often best implemented by specialized systems unknown to the greedy enterprise search indexing system.)
  • Documents obtained through an eDiscovery process for legal matters.

Has Microsoft solved these problems? Sure, if everything (note the logically impossible categorical affirmative) is in an Azure repository, it is conceivable that a user query could return a particular content object.

But that’s Microsoft fantasy land, and it is about as likely as Mr. Nadella arriving at work on the back of a unicorn.

Microsoft feels compelled to reinvent search every year or two. The longest journey begins with a single step. It is just that Microsoft took those steps decades ago and still has not reached the now rubblized Fred Harvey’s.

Stephen E Arnold, May 22, 2020

Lucidworks: Buzzwording in the Pandemic

May 19, 2020

Lucid Imagination (the outfit which contributed some Lucene/Solr talent to Amazon search) renamed itself Lucidworks. The company then embarked on becoming a West Coast version of Fast Search & Transfer, a Splunk like outfit, and now a customer support provider.

That’s a remarkable trajectory for a company built on open source software with more than $200 million in funding since 2007.

One of the DarkCyber researchers spotted “Lucidworks Develops Deep Learning Solution to Make Chatbots Smarter.” The story appeared in a New Zealand online publication. That’s interesting, but more intriguing is that Lucidworks is following in the marketing footsteps of Attivio, Coveo, and other vendors of search and retrieval. The destination: customer service. Who doesn’t love automated customer support chat robots, self serve Web sites with smart software, and the general extinction of individuals who actually know a company’s software or hardware products?

The write up states:

Deep learning is essential for automated chatbots to understand natural language questions and to provide the right answers, which is something that AI-powered search firm Lucidworks has taken on board.

And why?

According to Lucidworks, companies rely on digital portals to provide information to users, whether digital commerce customers looking for product information before purchase, employees hunting for an HR document, or someone looking for an airline’s updated cancellation policies. Information is often scattered across disparate silos and is impossible for a user to locate using natural language questions.

But smart software is available from Amazon with a credit card and some free training courses. Outfits from Algolia to Voyager Search offer the service.

What is interesting is the buzzword salad tossed into this reheated plastic container of mapo tofu:

  • AI (artificial intelligence)
  • Automated
  • Chatbots
  • Conversational
  • Deep learning
  • Digital portals
  • Engagement
  • Experiences
  • Fusion
  • Natural language
  • Satisfaction
  • User intent
  • Virtual assistants

Quite a vocabulary and what seems an exercise in content marketing. Plus, eager customers in New Zealand will have an opportunity to help the company repay its investors the $200 million plus interest. That works out to 13 years in the enterprise search wilderness before arriving at chatbots.

Options abound and many of them are open source and well documented.

Stephen E Arnold, May 19, 2020

Boolean Is Better but Maybe Google Must Motor Through Ad Inventory by Relaxing Queries…a Lot?

May 17, 2020

A brief exchange on StackExchange demonstrates some common sense. One user, moseisley.2015, asks the community, “Should Default Search Behavior be ‘This AND That,’ or ‘This OR That’?” They elaborate:

“I have web application that shows lists of various data types … employees, customers, inventory items, orders, and so on. There’s one simple search field for doing a ‘global’ search … . Question is, when a user enters multi-word text in the field should the default search behavior be (1) this OR that or (2) this AND that? What default behavior do you think average users would expect?”

Their example lists four records: John Smith, John Jones, Michael Smith, and Betty Taylor-Smith. Would users expect the query “John Smith” to return just the first record (AND) or all four (OR)? As any online researcher from the ‘70s and ‘80s would tell you, the Boolean AND is the better default. The first respondent, SNag, sensibly writes:

“As a user, the more I type in, the more specific I’m expecting the results to get, and this is what happens with AND. With OR, your results would explode! If my search for popular Google Doodle games gave me everything that was popular, everything Google, everything Doodle and every game out there, I’d be lost! If you’re expecting your user to fetch all matching either John or Smith results, consider supporting syntax like John|Smith (where | is the logical symbol for OR) and placing a hint ? icon next to the search box to showcase the various supported syntaxes. You could also consider quotes in the search syntax for exact matches, where “Smith” wouldn’t match Taylor-Smith, but Smith would. “John”|”Smith” would then match all John and all Smith but not Betty Taylor-Smith.”

We concur. The second respondent, Big_Chair, adds a good observation—users without any programming background are probably unfamiliar with the | character and may need a more explicit cue that their query is about to return results based on OR rather than AND.
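The behavior SNag describes can be sketched with the four records from the question. This is a minimal sketch under stated assumptions, not code from the StackExchange thread: `search` and `term_matches` are hypothetical names, unquoted terms are treated as case-insensitive substrings, quoted terms must equal a whole space-separated name part, and straight double quotes stand in for the curly quotes shown in the post.

```python
RECORDS = ["John Smith", "John Jones", "Michael Smith", "Betty Taylor-Smith"]


def term_matches(term, record):
    """Quoted term: must equal a whole space-separated word of the record.
    Unquoted term: case-insensitive substring match."""
    if len(term) >= 2 and term[0] == '"' and term[-1] == '"':
        return term[1:-1].lower() in record.lower().split()
    return term.lower() in record.lower()


def search(query, records=RECORDS):
    """Default AND across whitespace-separated groups; '|' inside a group means OR."""
    groups = [[t for t in g.split("|") if t] for g in query.split()]
    groups = [g for g in groups if g]  # ignore groups that were only '|'
    return [
        r for r in records
        if all(any(term_matches(t, r) for t in g) for g in groups)
    ]


if __name__ == "__main__":
    print(search("John Smith"))      # AND default: both terms required
    print(search("John|Smith"))      # explicit OR
    print(search('"John"|"Smith"'))  # OR with exact-match quoting
```

With AND as the default, typing more narrows the list. The explicit `|` widens it only when the user asks for that, which is the predictability a Boolean searcher expects.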

Cynthia Murrell, May 17, 2020

Google: Regular Search Not Up to Covid19 Queries. Who Knew?

May 15, 2020

Google has launched a new semantic search tool designed to help researchers fight this pandemic. The Google AI Blog reveals “An NLU-Powered Tool to Explore COVID-19 Scientific Literature.” As one might expect, researchers around the world have been turning out an enormous number of papers on the disease and how we might fight it. Why does this call for a special tool? Google researcher Keith Hall writes:

“Traditional search engines can be excellent resources for finding real-time information on general COVID-19 questions like ‘How many COVID-19 cases are there in the United States?’, but can struggle with understanding the meaning behind research-driven queries. Furthermore, searching through the existing corpus of COVID-19 scientific literature with traditional keyword-based approaches can make it difficult to pinpoint relevant evidence for complex queries. To help address this problem, we are launching the COVID-19 Research Explorer, a semantic search interface on top of the COVID-19 Open Research Dataset (CORD-19), which includes more than 50,000 journal articles and preprints.”

Based on the BERT technology recently injected into the general Google Search, this bespoke semantic AI has been trained on biomedical literature. The team chose to build a hybrid term-neural retrieval model for this platform—a combination of keyword search and neural retrieval; see the article for the technical details. Here’s how the search functions:

“When the user asks an initial question, the tool not only returns a set of papers (like in a traditional search) but also highlights snippets from the paper that are possible answers to the question. The user can review the snippets and quickly make a decision on whether or not that paper is worth further reading. If the user is satisfied with the initial set of papers and snippets, we have added functionality to pose follow-up questions, which act as new queries for the original set of retrieved articles.”
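For readers wondering what a hybrid term-neural model might look like in outline, here is a generic sketch. It is not Google's method, which the post leaves to the source article. The weighted sum, the `alpha` parameter, and every function name are assumptions made for illustration, and the vectors are hand-written stand-ins for what a trained neural encoder would produce.

```python
import math


def keyword_score(query, doc_text):
    """Fraction of query words that appear in the document (term side)."""
    q = set(query.lower().split())
    d = set(doc_text.lower().split())
    return len(q & d) / len(q) if q else 0.0


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (neural side)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_score(query, query_vec, doc_text, doc_vec, alpha=0.5):
    """Weighted blend of the two signals; alpha is a hypothetical tuning knob."""
    return alpha * keyword_score(query, doc_text) + (1 - alpha) * cosine(query_vec, doc_vec)


def rank(query, query_vec, docs, alpha=0.5):
    """docs maps doc_id -> (text, vector); returns ids from best to worst score."""
    scored = {
        doc_id: hybrid_score(query, query_vec, text, vec, alpha)
        for doc_id, (text, vec) in docs.items()
    }
    return sorted(scored, key=lambda doc_id: (-scored[doc_id], doc_id))
```

The point of blending is that a paper sharing no words with the query can still surface if its vector is close, while exact term hits keep their weight.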

The open-alpha platform is available for free to the research community, and Google plans to continue refining the system over the next few months. May this tool help scientists find solutions that much faster.

Cynthia Murrell, May 15, 2020

Deindexing: Does It Officially Exist?

May 14, 2020

DarkCyber noted “LinkedIn Temporarily Deindexed from Google.” The rock solid, hard news service stated:

LinkedIn found itself deindexed from Google search results on Wednesday, which may or may not have occurred due to an error on their part. The telltale sign of an entire domain being deindexed from Google is performing a “site:” search and seeing zero results.

Mysterious.

DarkCyber has fielded two reports of deindexing from Google in the last three days. In one case a site providing automobile data was disappeared. In another, a site focused on the politics of the intelligence sector was pushed from page one to the depths of page three.

Why?

No explanation, of course.

LinkedIn is owned by Microsoft. Is that a reason? Did LinkedIn’s engineers ignore a warning about a problem in AMP?

Google does not make errors. If a problem arises, the cause is the vaunted Google smart software.

DarkCyber’s view is that Google is taking stepped up action to filter certain types of content. We have documented that one Google office has access to controls that can selectively block certain content from appearing in the public facing Web search system. The content is indeed indexed and available to those with certain types of access.

What’s up? Here are our theories:

  1. Google is trying to deal with problematic content in a more timely manner by relaxing constraints on search engineers working in Google “virtual offices” around the world. Human judgments will affect some Web sites. (Contacting Google is as difficult as it has been for the last 20 years.)
  2. Google wants to make sure that ads do not appear next to content that might cause a big spender to pull away. Google needs the cash. The thought is that Amazon and Facebook are starting to put a shunt in the money pipeline.
  3. Google is struggling to control costs. Slowing indexing, removing sites from a crawl, and pushing content that is rarely viewed to the side of the Information Superhighway reduces some of the costs associated with serving more than 95 percent of the queries launched by humans each day.

Regardless of the real reason or the theoretical ones, Google’s control over findable content can have interesting consequences. For example, more investigations are ramping up in Europe about the firm’s practices (either human or software centric).

Interesting. Too bad others affected by Google actions are not of the girth and heft of LinkedIn. Oh, well, the one percent are at the top for a reason.

Stephen E Arnold, May 14, 2020

New Arnold-Steele Discussion: Findability Is Terrible

May 7, 2020

Robert David Steele, a former CIA professional, stored a video of our recent discussion about finding open source information. The main point is that findability has degraded to the point that results are generally useless. Bing, Google, and other ad-supported systems have abandoned precision and relevance. Search results are a dog’s breakfast. To view the findability discussion, navigate to this link. The video was produced by Mr. Steele.

Stephen E Arnold, May 7, 2020
