Now These Are Numbers You Can Bank On

June 1, 2020

In the midst of the pandemic, DarkCyber noted “How Semantic Search Helps Users Help Themselves.” The write up is from Lucidworks, a company reselling open source engineering support, proprietary software, and other jazzed-up solutions. In the write up was a reference to an IBM document. Do the IBM data make a case for buying IBM? Of course not. The data support the contention that semantic search is like training wheels on a toddler’s bicycle.

What are these magical data? First, the data come from an IBM blog post dated October 17, 2017. That was a couple of years ago. Change does happen, doesn’t it?

Check out these numbers:

  • Businesses spend $1.3 trillion on 265 billion customer service calls each year
  • Phone interactions cost around $35-$50 each
  • Text chat costs about $8-$10 per session
  • It is realistic to deflect between 40% and 80% of common customer service inquiries to automated frameworks
  • Per-query costs drop from $15-$200 (human agents) to $1 (virtual agents)
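Taken at face value, the IBM figures invite some back-of-the-envelope arithmetic. This is our arithmetic, not IBM’s, and note that the totals and the per-interaction costs do not obviously reconcile:

```python
# Back-of-the-envelope check of the IBM figures (our arithmetic, not IBM's).
total_spend = 1.3e12   # $1.3 trillion spent on customer service per year
total_calls = 265e9    # 265 billion customer service calls per year

implied_cost_per_call = total_spend / total_calls
print(f"Implied average cost per call: ${implied_cost_per_call:.2f}")
# About $4.91 -- well below the quoted $35-$50 per phone interaction.

# Savings if 40% of calls move from a $15 human agent to a $1 virtual agent
deflected = 0.40 * total_calls
savings = deflected * (15 - 1)
print(f"Claimed annual savings: ${savings / 1e12:.2f} trillion")
# About $1.48 trillion -- more than the entire $1.3 trillion spend.
```

Run the numbers yourself before banking on them.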

What’s the connection to the Solr-centric Lucidworks? The company wants to convince prospects that it has the solution known as chatbots. Clever phrase for what is a cost reduction play. Do chatbots work? That depends on whom one asks.

The good thing about chatbots is that they don’t create Rona hot spots. The bad thing is that most of the chatbots don’t work particularly well.

The IBM data, even though old and not in step with the Rona business climate, suggest that the ongoing cost of helping a “customer” deal with a product and service is brutal. Combine these here-and-now costs with the technical debt of informationized products and services and what do you get?

The short answer is that one has to have quite a bit of money to keep the good ship technology afloat.

Even Google-type companies, faced with skyrocketing costs and a dicey economic environment, are having to make money-saving changes.

Net net: The happy talk about super duper technologies often creates cost black holes. What about IBM? Layoffs and ultra hedgey forecasts. What about Lucidworks type outfits? Wow. Much sales work ahead.

One suggestion? Watch those assertions and one’s cost accounting. Can one “help oneself”? Absolutely, maybe.

Stephen E Arnold, June 1, 2020

Crazy Enterprise Search Report: Sketchy Astounding Info PLUS a Free Consultation

June 1, 2020

This week’s crazy enterprise search report is titled “Enterprise Search Market: Global Industry Analysis 2020-2026 by Types, Applications and Key Players.” The content seems to be a rehash, reprint, or repositioning of the weird Covid and enterprise search market report. The DarkCyber team did a little poking around, and it appears the “author” of this report is using free news release services. As we have noted in our previous crazy ESR market stories, the companies covered are a fruit salad. Elastic is left out; Concept Searching is included. Also-rans like Expert System, IBM, and SAP are included. The others? Well, each company uses “enterprise search” in its marketing material. That is close enough for horseshoes for this report.

But the real plus is that after buying the multi-thousand-dollar report, the buyer gets “free consulting.” From whom? Not revealed. On what? Not disclosed. How good? Not addressed.

Some people must buy these reports. Google believes these news releases are “real news.” Well, that’s a plus. If one is not in Google, one does not exist, right? That’s a bit like the market for enterprise search when Elasticsearch is a click away. The data in the report? Maybe a Hopf fibration calculation gone awry? Maybe Dr. Hopf (were he alive) would award an “A” for effort?

Stephen E Arnold, June 1, 2020

Wiby Search

May 29, 2020

DarkCyber noted the existence of Wiby, a throwback Web search system, in 2017. The idea for the service is to process queries and match them to shorter, old-fashioned Web pages. Let’s take a look at queries run on May 28, 2020, for some current hot topics. Wiby may be a precursor of the small Web movement. More details about this type of thinking appear in “Rediscovering the Small Web.”

Here’s the query for “Inca stone quarry” and the results:


The results are not directly related to the Inca or quarries. The system did return an off-color headline “Results of Suck Off Between Eminem, Rolling Stone, and the Grammy’s.” DarkCyber doubts the relevance methodology used by Wiby.

A less arcane query “Ryzen 3950x” retrieved these results:


One similarity between the two result sets is the appearance of the morpheme “suck.” DarkCyber finds this interesting. The results are off point.

There is also some basic information about the service on the unlinked About page. We learned:

Search engines like Google are indispensable, able to find answers to all of your technical questions; but along the way, the fun of web surfing was lost. In the early days of the web, pages were made primarily by hobbyists, academics, and computer savvy people about subjects they were interested in. Later on, the web became saturated with commercial pages that overcrowded everything else. All the personalized websites are hidden among a pile of commercial pages. Google isn’t great at finding those gems, its focus is on finding answers to technical questions, and it works well; but finding things you didn’t know you wanted to know, which was the real joy of web surfing, no longer happens. In addition, many pages today are created using bloated scripts that add slick cosmetic features in order to mask the lack of content available on them. Those pages contribute to the blandness of today’s web. The Wiby search engine is building a web of pages as it was in the earlier days of the internet. In addition, Wiby helps vintage computers to continue browsing the web, as page results are more suitable for their performance.

What’s the upside of Wiby? The system does generate some surprising results. No query is needed. Wiby offers a link which says “surprise me.”

Wiby also offers an old-fashioned “submit a url” form. I entered one of my Web sites. Nothing happened, but maybe there is an editorial review process which struggles with law enforcement- and intelligence-related content? You can find the “submit a url” page at this link.

When one has an idle moment, a click on “surprise me” can be interesting.

Stephen E Arnold, May 29, 2020

Zabasearch: Not Too Useful for Targeting Me

May 26, 2020

A reader wanted to know, “Have you used Zabasearch?” The answer was, “No.” I navigated to the Web site and learned:

People Search. Honestly Free! Search by Name.
Find People in the USA. Free People Finder.

Free. Okay! I tested the system by plugging my name into the search box. This once was called “ego surfing.” Here’s what the system revealed:


According to Zabasearch, I am in an industrial park between two computer stores. I feel safe because I am not at that location. I have driven by that location.

I did a Zabasearch using the site’s reverse phone lookup. It found the name of the person whose number I plugged in. Once again, instead of living 500 feet from my office (not in the middle of a parking lot, thank you), the individual resides at the Edge Full Service Salon somewhere in West Louisville.

Close enough for free? Sure.

The system reports an incorrect telephone number for me too. I called it and I was invited to leave a message at Entré Computer Center. I checked the full profile and discovered that I am related to “Jeff Arnold.” Nope, sorry.

Net net: Zabasearch is not likely to become my go-to person locator or source of phone numbers.

Stephen E Arnold, May 26, 2020

Microsoft: Rationalizing Is a Synonym for Good Enough Search

May 25, 2020

On May 16, 2020, Microsoft — the JEDI champions and the target of amusement for Google’s Action Blocks — updated its “Rationalizing Semantic and Keyword Search on Microsoft Academic” page. One notable change is references to everyone’s favorite pandemic and bandwagon for virtue signaling: Covid-19.

What’s Microsoft saying about its Microsoft Academic Search?

The write up points out that the four-year-old method for delivering “results that best matched semantically coherent interpretations of user queries, informed by the Microsoft Academic Graph (MAG)” has been fixed up. I assume this means the kind of fixing up which Longhorn required before it became semi-ready for prime time.

Microsoft points out (mostly in a mist of misinformation) that the competitors just do keyword matches. I won’t repeat what I have written in my three Google monographs, the New Landscape of Search, and numerous columns and blog posts.

Well, Microsoft does allow some stupid, old-fashioned, and hopelessly archaic keyword searching. The new search will avoid returning pages with null results or zero hits. FYI, gentle reader, learning there are “no hits” is high-value information for many queries. Just ask someone running scientific, technical, engineering, and medical queries. Those quite specific searches with no hits are informationized payloads.

Keyword matching is now “rudimentary.” And what’s better? Okay, Boolean lovers who know how to formulate specific queries created after a reference interview by the light of an oil lamp in a damp cave in Eastern Europe:

To put it simply, we’ve changed our semantic search implementation from a strict form where all terms must be understood to a looser form where as many terms as possible are understood.
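Microsoft does not publish the implementation, but the strict-versus-loose distinction is easy to sketch. This is a toy illustration, not Microsoft’s code; the documents and queries are invented:

```python
# Toy illustration of strict vs. relaxed term matching (not Microsoft's code).
docs = [
    "semantic search on the academic graph",
    "keyword search basics",
    "academic graph construction",
]

def strict_match(query, docs):
    """Return docs containing every query term (the old, strict behavior)."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.split() for t in terms)]

def relaxed_match(query, docs):
    """Rank docs by how many query terms they contain (the new, loose behavior)."""
    terms = query.lower().split()
    scored = [(sum(t in d.split() for t in terms), d) for d in docs]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]

print(strict_match("semantic academic graph", docs))   # one on-point hit
print(relaxed_match("semantic academic graph", docs))  # partial matches included
```

The relaxed version happily returns a document missing the term “semantic,” which is precisely how tangential results sneak into a result list.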

What’s this mean? Irrelevant, or at best tangential, information. But without the explicit mechanisms of a facet-based search system. (Endeca, Endeca, why did you beat up on those who wanted to perform “guided navigation”? Are you wizards to blame?)

The write up presents some before and after queries. Guess what? You get more results, more to scan and review, and more time burned because the search system is being helpful.

Ah, no, thank you.

I know of no search system capable of “knowing” how to relax a query to provide the specific information for which I am looking. I prefer to formulate a query, scan, reformulate the query, scan, and home in on the content object in which, in my judgment, a useful nugget of information can be found.

Microsoft presents data and “distance” as evidence that their new and improved system works. Better than sliced bread? For Microsoft search experts, the answer is a chorus of “yes indeeds.”

The result is another modern system which makes a person less skilled in retrieving “academic” information get a “good enough” answer.

Remember. This Microsoft outfit is going to be in the warfighting game. How does “good enough” information retrieval intentionally displaying content not directly related to the query meet the needs of an analyst in one of the more academic units of the Pentagon?

Oh, I bet this new system is not intended for that PhD. That individual uses a next-generation information retrieval system which provides specific tools to locate on-point information.

Microsoft wants to be the search champion. Too bad it is emulating the king of irrelevant results and doing it without the payoff of massive advertising revenue.

Need academic information? Gentle reader, try iSeek, Qwant, or Swisscows, or your library’s online commercial databases. Include Microsoft’s offering, but supplement, analyze, and aggregate. You know, like doing research, not accepting what the JEDI crowd offers up.

Stephen E Arnold, May 25, 2020

Crazy Enterprise Search Market Report for May 25, 2020

May 25, 2020

Another crazy enterprise search market report is now available. This one skips stating when the report was written, falling back on the vague word “recent.” In fact, my hunch is that this is one dicey report marketed under different aliases in order to gin up sales.

The title? “Enterprise Search Market Dynamics, Comprehensive Analysis, Business Growth, Revealing Key Drivers, Prospects and Opportunities 2025”

What’s in this gem from Market Study Report? The write up about the report promises:

The recent document on the Enterprise Search market involves breakdown of this industry as well as division of this vertical. As per the report, the Enterprise Search market is subjected to grow and gain returns over the predicted time period with an outstanding growth rate y-o-y over the predicted period.

Yep, outstanding. Obviously the global economic downturn has not had an impact on the half-century-young enterprise search software sector.

Enterprise search solutions are hot items. Forget hand sanitizer and surgical masks; enterprise search solutions are the barn burners. Are there lines of eager customers queuing outside of Algolia, Coveo, Elastic, IBM Omnifind’s office, Lucidworks, and Microsoft’s search facility in Beijing? Sure, sure, long lines. No social distancing either. Jostling and crowding are what happen when a sizzler is on offer.

The report presents information “with regards to the geographical landscape.” Yep, but how many languages do enterprise search systems support?

What’s interesting is the list of companies analyzed in the report. Here you go:

  • Attivio Inc
  • Concept Searching Limited
  • Coveo Corp
  • Dassault Systemes
  • Expert System Inc
  • IBM Corp
  • Lucid Work (It would be helpful if the report authors spelled the company’s name correctly, wouldn’t it?)
  • Marklogic Inc
  • Micro Focus
  • X1 Technologies
There are some notable omissions, but I won’t provide these names. Obviously I am not hip to where enterprise search is at the moment.

In the three editions of the Enterprise Search Report I wrote, it never crossed my mind to include a manufacturing cost structure analysis. Poor stupid me.

What seems clear is that whoever is marketing this report recycles the content under different names, hoping for a sale.

The data in the report, one hopes, is more polished than the promotional material.

Stephen E Arnold, May 25, 2020

Search: Contentious and Increasingly Horrible

May 25, 2020

I dropped enterprise search, commercial search, and vertical search to the bottom of my “Favorite Topics” list years ago.


The individuals popping up and off at conferences were disconnected from the realities of looking for information under stressful circumstances.


Hey, big rocks, how did you move from that quarry kilometers away and get yourselves smoothed down? Just like modern online search systems, you won’t get an answer. Finding information relevant to a query is as difficult as getting megalithic stones to become Chatty Cathys.

The thumb typing crowd, some now in their mid-forties, ASSUME that search has to think for the stupid user.

The techniques range from smart software which skews results in ways that are, to an experienced researcher, stupid. For those search experts concerned with making their information or their name appear number one on a results list, good search is anything that produces a top spot in a result list, even if that result is stupid, irrelevant, or shameless ego jockeying. Then there are the chipper, super confident experts who emerged from an educational system which awarded a blue ribbon to those who showed up and sort of behaved. Yep, everything that group does is just wonderful. Yeah, right.

You can see the consequences of two forces colliding when you read Science Magazine’s “They Redesigned PubMed, a Beloved Website. It Hasn’t Gone Over Well.”

You can work through the examples in the source article. The pain points range from appearance to search functionality.

Why did this happen?

The change is a result of people who do not have the experience of performing search under stressful conditions. No, I don’t mean locating the Cuba Libre restaurant in Washington, DC, on a Google Map. I mean looking up technical information to complete a lab test, perform a diagnosis, locate a procedure, or some similar action. There is a pandemic going on, isn’t there?

The complaints indicate that the “new” PubMed is not perceived as a home run.

Go read the original.

I want to offer several observations:

  1. Those who do research with intent need predictability; that is, when a Boolean query is entered, the results should reflect that logic. Modern systems think Boolean is stupid. There you go, a value judgment from those with “Also Participated” ribbons in high school.
  2. Interfaces should allow the user to select an approach. There are some users who like a blinking dot or a question mark. Enter the commands and get a text output. Others like the Endeca-style training wheels, although I doubt if any of the modern “helper” interfaces know what Endeca offered. Still others may want some other type of interface, like a PhD approach; that is, push here, dummy. The point is: Why not allow the user to select the interface?
  3. Change is introduced for dark purposes. Catalina has many points of friction so that Apple can extend its span of control. Annoying? Sure is. Why doesn’t Apple tell the truth about these friction points? What? Tell the truth, are you crazy? Apple, like Facebook and Google, is doing what it can to protect its hegemony, and the user is the victim. Tough. The same logic applies to PubMed. Dollars to donuts there is a “reason” for the change, and it may be due to whimsy, money, or the need to demonstrate the team is actually doing something instead of just having meetings with contractors.

Net net: As I wrote for Barbara Quint in the now departed magazine Searcher, search is dead. Each day the hope for a better, more appropriate way to locate online information becomes lost in the mists of time. Getting relevant information from PubMed or any modern system is like trying to get the stones of Ollantaytambo to explain how they moved eons ago.

Finding information today is more difficult than at any other time in my professional career. That’s a big problem.

Stephen E Arnold, May 24, 2020

Microsoft and Its Latest Search Innovation: Moving Past Fast? Nope

May 22, 2020

I read “Microsoft Search: Search Your Document Like You Search the Web.” Perhaps Microsoft did not get the reports about the demise of the Google Search Appliance. That “invention” made clear that searching a corporate content collection like you search the Web was not exactly the greatest thing since sliced bread. There were a number of reasons for the failure of the GSA. It was a black box. You know that mere mortals could not tune the relevance component. You know that it produced results that left employees wondering, “Where is the document I wrote yesterday?” You know that the corpus of Web content is different from the fruit cake of corporate content. Web search returns something because the system is rigged to find a way to display ads to the hapless searcher.

Contrast this with documents in the cloud; in different systems, like that old AS/400 Ironsides application used by the warehouse supervisors; and content tucked away on employees’ USB drives, mobile phones, the oldest kid’s iPad, and on services a go-to sales professional uses to store PowerPoints for “special” customers. Then there are the documents in the corporate legal office and the consultants’ reports scanned and stored on the Marketing Department’s computer kept for interns.

Nevertheless, the article explains:

We’re utilizing well-established web search technologies, such as query and document understanding, and adding deep learning based natural language models. This allows us to handle a much broader set of search queries beyond “exact match.”

Okay, query expansion, synonym look up, and Fast Search’s concept feature. But there’s more:

With the recent breakthroughs in deep learning techniques, you can now go beyond the common search term-based queries. The result is answers to your questions based on the document content. This opens a whole new way of finding knowledge. When you’re looking at a water quality report, you can answer questions like “where does the city water originate from? How to reduce the amount of lead in water?”

May I suggest that Microsoft and dozens of other enterprise search vendors have promised magical retrieval?

May I point out that the following content types are usually outside the ken of the latest and greatest enterprise search confection; for example:

  • Quality control data on parts stored in an Autodesk engineering document
  • Real time data flowing into an organization from sensors
  • Video content, audio content, and rich media like photographs
  • Classified or content restricted by certain constraints. (Access controls are often best implemented by specialized systems unknown to the greedy enterprise search indexing system.)
  • Documents obtained through an eDiscovery process for legal matters.

Has Microsoft solved these problems? Sure, if everything (note the logically impossible categorical affirmative) is in an Azure repository, it is conceivable that a user query could return a particular content object.

But that’s Microsoft fantasy land, and it is about as likely as Mr. Nadella arriving at work on the back of a unicorn.

Microsoft feels compelled to reinvent search every year or two. The longest journey begins with a single step. It is just that Microsoft took those steps decades ago and still has not reached the now rubbleized Fred Harvey’s.

Stephen E Arnold, May 22, 2020

Lucidworks: Buzzwording in the Pandemic

May 19, 2020

Lucid Imagination (the outfit which contributed some Lucene/Solr talent to Amazon search) renamed itself Lucidworks. The company then embarked on becoming a West Coast version of Fast Search & Transfer, a Splunk-like outfit, and now a customer support provider.

That’s a remarkable trajectory for a company built on open source software with more than $200 million in funding since 2007.

One of the DarkCyber researchers spotted “Lucidworks Develops Deep Learning Solution to Make Chatbots Smarter.” The story appeared in a New Zealand online publication. That’s interesting, but more intriguing is that Lucidworks is following in the marketing footsteps of Attivio, Coveo, and other vendors of search and retrieval. The destination: customer service. Who doesn’t love automated customer support chat robots, self-serve Web sites with smart software, and the general extinction of individuals who actually know a company’s software or hardware products?

The write up states:

Deep learning is essential for automated chatbots to understand natural language questions and to provide the right answers, which is something that AI-powered search firm Lucidworks has taken on board.

And why?

According to Lucidworks, companies rely on digital portals to provide information to users, whether digital commerce customers looking for product information before purchase, employees hunting for an HR document, or someone looking for an airline’s updated cancellation policies. Information is often scattered across disparate silos and is impossible for a user to locate using natural language questions.

But smart software is available from Amazon with a credit card and some free training courses. Outfits from Algolia to Voyager Search offer the service.

What is interesting is the buzzword salad tossed into this reheated plastic container of mapo tofu:

  • AI (artificial intelligence)
  • Automated
  • Chatbots
  • Conversational
  • Deep learning
  • Digital portals
  • Engagement
  • Experiences
  • Fusion
  • Natural language
  • Satisfaction
  • User intent
  • Virtual assistants

Quite a vocabulary, and what seems an exercise in content marketing. Plus, eager customers in New Zealand will have an opportunity to help the company repay its investors the $200 million plus interest. That works out to 13 years in the enterprise search wilderness before arriving at chatbots.

Options abound and many of them are open source and well documented.

Stephen E Arnold, May 19, 2020

Boolean Is Better but Maybe Google Must Motor Through Ad Inventory by Relaxing Queries…a Lot?

May 17, 2020

A brief exchange on StackExchange demonstrates some common sense. One user, moseisley.2015, asks the community, “Should Default Search Behavior be ‘This AND That,’ or ‘This OR That’?” They elaborate:

“I have web application that shows lists of various data types … employees, customers, inventory items, orders, and so on. There’s one simple search field for doing a ‘global’ search … . Question is, when a user enters multi-word text in the field should the default search behavior be (1) this OR that or (2) this AND that? What default behavior do you think average users would expect?”

Their example lists four records: John Smith, John Jones, Michael Smith, and Betty Taylor-Smith. Would users expect the query “John Smith” to return just the first record (AND) or all four (OR)? As any online researcher from the ‘70s and ‘80s would tell you, the Boolean AND is the better default. The first respondent, SNag, sensibly writes:

“As a user, the more I type in, the more specific I’m expecting the results to get, and this is what happens with AND. With OR, your results would explode! If my search for popular Google Doodle games gave me everything that was popular, everything Google, everything Doodle and every game out there, I’d be lost! If you’re expecting your user to fetch all matching either John or Smith results, consider supporting syntax like John|Smith (where | is the logical symbol for OR) and placing a hint ? icon next to the search box to showcase the various supported syntaxes. You could also consider quotes in the search syntax for exact matches, where “Smith” wouldn’t match Taylor-Smith, but Smith would. “John”|”Smith” would then match all John and all Smith but not Betty Taylor-Smith.”

We concur. The second respondent, Big_Chair, adds a good observation—users without any programming background are probably unfamiliar with the | character and may need a more explicit cue that their query is about to return results based on OR rather than AND.
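The tradeoff is easy to demonstrate with the four records above. A toy sketch, not the asker’s application:

```python
# Toy demonstration of AND vs. OR as the default for a multi-word search.
records = ["John Smith", "John Jones", "Michael Smith", "Betty Taylor-Smith"]

def search(query, records, mode="AND"):
    """Case-insensitive substring search; AND requires every term, OR any term."""
    terms = query.lower().split()
    if mode == "AND":
        return [r for r in records if all(t in r.lower() for t in terms)]
    return [r for r in records if any(t in r.lower() for t in terms)]

print(search("John Smith", records, mode="AND"))  # ['John Smith']
print(search("John Smith", records, mode="OR"))   # all four records
```

Note that substring matching makes “Smith” match “Taylor-Smith,” which is exactly why SNag suggests supporting quoted exact matches on top of the AND default.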

Cynthia Murrell, May 17, 2020
