Autonomy: An Interesting Legal Document

August 4, 2021

Years ago I did some work for Autonomy. I have followed the dispute between Hewlett Packard and Autonomy. Enterprise search has long been an interest of mine, and Autonomy had emerged as one of the most visible and widely known vendors of search and retrieval systems.

Today (August 3, 2021) I read “Hard Drives at Autonomy Offices Were Destroyed the Same Month CEO Lynch Quit, Extradition Trial Was Told.” The write up contains information with which I was not familiar.

In the write up is a link to “In the City of Westminster Magistrates’ Court The Government of the United States of America V Michael Richard Lynch Findings of Fact and Reasons.” That 62 page document contains a useful summary of the HP – Autonomy deal.

Several observations:

  1. Generating sustainable revenue for an enterprise search system and ancillary technology is difficult. This is an important fact for anyone engaged in search and retrieval.
  2. The actions summarized in the document provide a road map of what Autonomy did to maintain its story of success in what has been for decades a quite treacherous market niche. Search is particularly difficult, and vendors have found marketing a heck of a lot easier than delivering a system that meets users’ expectations.
  3. The information in the document suggests that the American judicial system may find this case a “bridge” between how corporate entities respond to the Wall Street demands for revenue and growth.

As at Fast Search & Transfer, executives found themselves making decisions which made search and retrieval a swamp. Flash forward to the present: Google search is shot through with adaptations to online advertising.

Perhaps the problem is that people expect software to deliver immediate, relevant results. Well, it is clear that most of the search and retrieval systems seeking sustainable revenues have learned that search can deliver good enough results. Good enough is not good enough, however.

Stephen E Arnold, August 4, 2021

Search Atlas Demonstrates Google Search Bias by Location

July 28, 2021

An article at Wired reminds us that Google Search is not the objective source of information it appears to be to many users. We learn that “A New Tool Shows How Google Results Vary Around the World.” Researchers and PhD students Rodrigo Ochigame of MIT and Katherine Ye of Carnegie Mellon University created Search Atlas, an experimental Google Search interface. The tool displays three different sets of results to the same query based on location and language, illustrating both cultural differences and government preferences. “Information borders,” they call it.

The first example involves image searches for “Tiananmen Square.” Users in the UK and Singapore are shown pictures of the government’s crackdown on student protests in 1989. Those in China, or elsewhere using the Chinese language setting, see pretty photos of a popular tourist destination. Google says the difference has nothing to do with censorship—they officially stopped cooperating with the Chinese government on that in 2010, after all. It is just a matter of localized results for those deemed likely to be planning a trip. Sure. Writer Tom Simonite describes more of the tool’s results:

“The Search Atlas collaborators also built maps and visualizations showing how search results can differ around the globe. One shows how searching for images of ‘God’ yields bearded Christian imagery in Europe and the Americas, images of Buddha in some Asian countries, and Arabic script for Allah in the Persian Gulf and northeast Africa. The Google spokesperson said the results reflect how its translation service converts the English term ‘God’ into words with more specific meanings for some languages, such as Allah in Arabic. Other information borders charted by the researchers don’t map straightforwardly onto national or language boundaries. Results for ‘how to combat climate change’ tend to divide island nations and countries on continents. In European countries such as Germany, the most common words in Google’s results related to policy measures such as energy conservation and international accords; for islands such as Mauritius and the Philippines, results were more likely to cite the enormity and immediacy of the threat of a changing climate, or harms such as sea level rise.”

Search Atlas is not yet widely available, but the researchers are examining ways to make it so. They presented it at last month’s Designing Interactive Systems conference and are testing a private beta. Of course, the tool cannot reveal the inner workings of Google’s closely held algorithms. It does, however, illustrate the outsized power the company has over who can access what information. As co-creator Ye observes:

“People ask search engines things they would never ask a person, and the things they happen to see in Google’s results can change their lives. It could be ‘How do I get an abortion?’ restaurants near you, or how you vote, or get a vaccine.”

The researchers point to Safiya Noble’s 2018 book “Algorithms of Oppression” as an inspiration for their work. They hope their project will bring the biased nature of search algorithms to the attention of a broader audience.
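For readers who want a feel for what an “information border” might look like in data, here is a minimal sketch. It is not how Search Atlas works; the write up does not describe the researchers’ method. The function names, the 0.5 threshold, and the use of Jaccard overlap between result lists are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(results_a, results_b):
    """Share of results two locales have in common (1.0 = identical sets)."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def information_borders(results_by_locale, threshold=0.5):
    """Return locale pairs whose result overlap falls below the threshold.

    results_by_locale maps a locale label to the list of result
    identifiers returned for one query. The threshold is an arbitrary,
    hypothetical cutoff, not a value taken from the Search Atlas project.
    """
    borders = []
    for (loc_a, res_a), (loc_b, res_b) in combinations(
        sorted(results_by_locale.items()), 2
    ):
        overlap = jaccard(res_a, res_b)
        if overlap < threshold:
            borders.append((loc_a, loc_b, overlap))
    return borders
```

Feed it the same query’s results from three locales and it lists the pairs that barely overlap. Real result pages would need URL normalization and some attention to ranking order, which this sketch ignores.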

Cynthia Murrell, July 28, 2021

A Xoogler Wants to Do Search: Channeling Exalead and Smoking InfinitySearch?

July 16, 2021

Remember Exalead? This was a search engine created by a person who was asked to join the Google. The system was very good: 64 bit architecture, timely indexing of new and previously indexed sites, and novel features like searching via text for a specific point in an Exalead processed video. Now the system is part of Dassault Systèmes because senior management grew frustrated with one of the aggressively marketed “smart systems” available in the mid 2000s.

Now a Xoogler realizes that Google search is just an artifact of the Backrub search and retrieval system. What was “clever” in 1998 is now generally a version of MySpace.com. Maybe anigifs are in the fridge waiting to become the next big thing at the GOOG.

Now there’s a new Google called Neeva, a subscription-based, allegedly non-tracking, ad-free alternative to Google. Plus, Neeva is out of beta. Let the marketing begin! Fast Company explores the new search engine and its developers in depth in “Inside Neeva, the Ad-Free, Privacy-First Search Engine from Ex-Googlers.” (Keep in mind that InfinitySearch.co is a new search engine with an almost identical subscription business model. Haven’t heard of InfinitySearch? Hmmm. What about Okeano? Oh, not that system either? Hmmm.)

Co-founders Sridhar Ramaswamy and Vivek Raghunathan, who both used to work at Google, had front-row seats to the dominant search engine’s evolution. They were unhappy to see advertising become more and more intrusive over the years. They are betting many users are ready to pay $4.95 a month to access what Google could have been if it were not in hot pursuit of the almighty ad dollar. Anyone who has been googling for years has watched ads migrate from a relatively unobtrusive position on the right of the page to the top of search results. For a while after that shift they were delineated by a shaded box, but now they suspiciously blend into the organic results. Google also started pushing links to its own services to the top, even when a competitor might better serve the searcher’s needs. The Fast Company write up states:

“Then there’s the fact that Google builds profiles of its users based on their online activity, the better to precisely target them with advertising not only at its own sites but all the other ones across the web whose ads are powered by Google. With no ads to serve up, Neeva shouldn’t leave privacy-conscious types feeling like they’re being monitored for ulterior purposes. (By default, Neeva does hold onto your searches for 90 days to improve the quality of features such as autosuggestions, but you can erase this log or tell the service you don’t want it to keep it in the first place.) In another break from search-engine tradition, Neeva says that it will turn at least 20 percent of its top-line revenue over to publishing partners, including the first two it’s announced, Quora and Medium. Though the details of where this could lead remain vague, it’s another attempt to set Neeva apart from Google, which has often been accused of benefiting from media outlets’ content without adequate compensation, a long-simmering dispute that has led to lawsuits and legislation.”

The founders hired on several other ex-Googlers. The team worked to create a platform that is close enough to their former employer’s to feel familiar while nixing all the advertising misery. To do this, Neeva blends its own indexing with results from Apple, Bing, Yelp, Intrinio, Weather.com, Xignite, and even Google Maps. McCracken reports the platform performs well for most tasks, falling short only on local searches. There is also the small inconvenience that, as of this writing, Chrome is the only browser that lets one set Neeva as the default search platform. Is this an acquisition-friendly move? See the Fast Company article for more on Neeva’s features as well as details on Ramaswamy’s and Raghunathan’s experiences that led them down the path to this adventure.

And you can check out Exalead search at this link. Yep, still online. May I suggest the Web, video, and forums search be expanded and enhanced. As I said, it was quite good.

Cynthia Murrell, July 16, 2021

Another New Web Search Engine: Zajjle

July 15, 2021

The DarkCyber research team tries to keep up with the Web search engines. Rarely does one come along with an index of content not generally included in the Bing, Google, Yandex systems or the new-kids-on-the-block like Metager or Neeva. According to “Arabic Search Engine plus Webmail and Data Analytics is Now Available at Zajjle”:

Zajjle is an Arabic search engine with many advantages. Besides offering the primary function as a search engine in the Arabic language, it also has additional features like webmail and website statistics & data analytics. People in Middle East countries can access the website in the Arabic language for searching the current news, website address, videos, and photos.

The write up states:

Ahmad A Najar, the Zajjle founder, is an entrepreneur who established CatchFood and Mazra3a.net. CatchFood is successful web-based restaurant management and food delivery platform. It has served countries like the United Arab Emirates, Jordan, Palestine, Iraq, Syria, and Lebanon, America, and Canada. Mazra3a.net is a popular Arabic agricultural platform connecting people interested in the agriculture industry to change ideas, increase knowledge, and communicate with professionals in the agriculture industry.

Will Zajjle index content deep in the US Department of Energy’s public facing Web sites? Will it snag content in Streamgun? What about content censored by mainstream systems?

You can explore the site at this link: http://www.zajjle.com/

Stephen E Arnold, July 15, 2021

Brave Privacy-Centric Search Arrives

July 5, 2021

Several online services that emphasize privacy have emerged in recent years, including the Brave browser. The San Francisco-based company is not stopping there. We learn from TechCrunch that “Brave’s Nontracking Search Engine is Now in Beta.” Earlier this year, Brave acquired Cliqz, a company which had developed an anti-tracking browser with its own search platform. That system’s Tailcat technology will underpin Brave Search. The company also offers Brave Ads, a way to make money while preserving users’ privacy. Brave is different from other non-tracking Google alternatives like DuckDuckGo because it is using its own, independent index. Reporter Natasha Lomas writes:

“Brave touts its eponymous search offering as having a number of differentiating features versus rivals (including smaller rivals) — such as its own index, which it also says gives it independence from other search providers. Why is having an independent index important? We put that question to Josep M. Pujol, chief of search at Brave, who told us: ‘… More choices will entail more freedom and also get back to real competition, with checks and balances. Choice can only be achieved by being independent, as if we do not have our own index, then we are just a layer of paint on top of Google and Bing, unable to change much or anything in the results for users’ queries. Not having your own index, as with certain search engines, gives the impression of choice, but in reality such engine “skins” are the same players as the big-two. Only by building our own index, which is a costly proposition, will we be in a position to offer true choice to the users for the benefit of all, whether they are Brave Search users or not.’”

It should be noted that for certain functions, like image searches, Brave currently relies on other search providers to ensure relevant results. For now. Otherwise it turns to anonymized community contributions to refine its index’s results and will soon provide “community-curated open ranking models” in an effort to combat censorship and AI bias. The company plans to offer both a free option supported by ads and a paid, ad-free version we are told will be “affordable.”

We are running test queries, and the results are promising. There are other services becoming available too. I like Swisscows.ch. But I like cows.

Let us hope Brave’s efforts result in an index that is better than what has gone before. The increasing number of search options is a signal that Google search has failed in its basic mission. The problem is that millions don’t know what they are missing. Undisclosed omissions and obfuscated distortions are worse than guessing or asking friends.

Cynthia Murrell, July 5, 2021

Google and Unreliable Results: Like the Jack Benny One Liner, I Am Thinking, I Am Thinking

June 25, 2021

I read a “real” news story called “Google Is Starting to Warn Users When It Doesn’t Have a Reliable Answer.” (No, I will not ask, What’s reliable mean.)

Here’s the statement which snagged my attention in the write up:

“When anybody does a search on Google, we’re trying to show you the most relevant, reliable information we can,” said Danny Sullivan, a public liaison for Google Search. “But we get a lot of things that are entirely new.” Sullivan said the notice isn’t saying that what you’re seeing in search results is right or wrong — but that it’s a changing situation, and more information may come out later.

I think Mr. Sullivan, a former search engine optimization guru and conference organizer, is the “new” Matt Cutts, a Google professional helping to point the way to the digital future at the US government. Is key word packing the path to more patents than China?

I loved this statement which I know is pretty Tasmanian devil like: “Most relevant, reliable information we can.” I did a query for garage floor epoxy coating in Louisville. I gathered the roughly 20 businesses displayed on the first two pages of Google search results. Two companies were in this business. Others were out of business. One “company” called me back and said, “My loser son has been gone for two years.”

I have other examples as well of search either being out of date, spoofed, or just weird.

Let’s look at some of the reasons why Google made a statement about “reliable answers.”

First, I think the difficulty of providing real-time indexing is beyond the Google’s capabilities: Outfits with real time content won’t play ball with Google unless Google pays up and works out a mechanism to move the content to a Google indexing queue. (Yep, queue as in long line at the McDonald’s drive through.)

Second, Google is not set up to do real time. I think the notion of having a short list of “must ping frequently sites” may be a hold over from the distant past. The reason? As the cost of indexing, updating, and making the Google indexes “consistent” has climbed, some of the practices no longer fit the current iteration of “relevant” and “reliable.” Google is not Twitter, and it is not Facebook. Therefore, the pipelines for real time content simply don’t exist. Googlers tried but seemed to be better at selling ads than dealing with new content types.

Third, hot info appears in non text form on Instagram, TikTok, and even places like DailyMotion and Vimeo sometimes days before the content plops into YouTube. Ever try to locate a video using the creator assigned index terms? That’s an exercise in futility. Ads, gentle reader, not relevant and reliable information.

From my vantage point on the porch overlooking a mine drainage pond, I have some hypotheses:

  1. Google is under financial pressure, competitive pressure from Amazon and Facebook, and legal pressure. Almost any nation state with an appetite to drag the Google into court is in gear.
  2. Google is just not able to handle the real time flows of content, either textual or imagery. Too bad, but that’s the excitement of Hegel’s these, antithese, synthese which “real” Googlers learn along with search engine optimization marketing methods.
  3. Google’s propagandistic and jingoistic assurances that it returns relevant and reliable results are more and more widely seen as key word spam.
  4. Google’s management methods are not tuned for the current business environment. I may be alone in noticing that high school science club thinking and management from assumed superiority is out of favor. (If Sergey Brin were to ride a Russian rocket into space, would he attract more signatures than Jeff Bezos? The quasi referendum did not want Mr. Bezos to return to earth. Mr. Brin’s ride did not materialize, so I won’t know who “won” the most votes.)

Net net: Relevant and reliable. That’s a line worthy of Jack Benny when he is asked about Fred Allen. I give up: “What does ‘reliable’ mean, Googlers?” My suggestion is marketing hoo haa with metatags.

Stephen E Arnold, June 25, 2021

X1 Embraces Social Media and Collections Search for eDiscovery

June 22, 2021

It looks like eDiscovery firm X1 is moving beyond enterprise search into collection-centric search. They have sent around a memo announcing their “Defensible Social Media and Web Collections On-Demand.” The notice explains:

“Taking on the process of capturing evidence in a forensically sound manner can be challenging, time consuming and sometimes impossible with ever-increasing workloads. Why not outsource the collection portion of the process by letting our team of experts perform the job for you? With X1 Social Discovery On-Demand, X1’s forensic experts capture the data you need in a legally defensible manner, alleviating any headaches or worry about ESI collection.

  • Social Media and Web Capture Collection – save critical time by leveraging the expertise of X1 for efficient and accurate collections
  • Defensible Metadata Collection – unlike the ‘print screen’ approach, with X1 key metadata is included with all captures and deliverables with chain of custody preserved throughout
  • Experienced Service Support – X1 works with you to understand your collection scope and deliverable requirements up front bringing timely, authenticated results
  • Choose from Several Different Export Options – Concordance Load File, CSV, PDF, HTML for clear and accurate output
  • Get Started Right Away – engage with an X1 Solutions Consultant and start the collection process same-day”

Information on the X1 Social Discovery and the X1E Remote Collection On-Demand can be found on the company’s products page. At the time of this writing, new customers can save 50% off their first collection for up to 10 accounts. We do not know how long the “limited time” offer will last. We also cannot speak for or against the solutions since we have not tried them ourselves. We find the development interesting, though. Founded in 2003, the evolving small business is based in Los Angeles.

Search is becoming policeware.

Cynthia Murrell, June 22, 2021

Ninfex Is a New Take on Internet Search

June 10, 2021

The creator of “experimental” search site Ninfex is trying a different approach that uses neither crawlers nor AI. The site’s Hello page explains:

“We rely on 2 proxies for search relevance. First: URL score (user votes). Second: Links to discussions on major forums. All submissions on Ninfex are votable by users. When you submit a link, you can also submit up to 5 supporting links (to discussions about that link) from external forums. Among the current submissions you are most likely to see forum links to reddit, hackernews, lobsters, stackexchange & tildes because those are the forums that I frequent most often.”

Yes, the young site still leans heavily toward material based on its maker’s interests, but that could change as its usership grows. The founder, who goes by the name traindreams, writes:

“I am the maker of Ninfex and right now I’m actively pushing to build the index around my personal wiki / research notes / bookmarks. That is why, the home feed mostly contains topics of my interest. The following is a list of those topics.”

See the page to explore that list of diverse and interesting topics, from Art to Psychology to Startups. Perhaps you will be inspired to vote or add a link. Traindreams has already made some changes based on user feedback, like cleaning up the UI, and promises more to come as the index grows. It looks like the idea is quality over quantity; we are curious to see if the enterprise takes off.
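The two proxies described on the Hello page are easy to picture in code. What follows is a minimal sketch, not Ninfex’s implementation. The site does not say how votes and discussion links are combined, so the weighted sum, the weights, the title matching, and every name below are assumptions; only the cap of five supporting links comes from the quoted text.

```python
from dataclasses import dataclass, field

# The Hello page says a submission may carry up to 5 supporting links.
MAX_SUPPORTING_LINKS = 5


@dataclass
class Submission:
    url: str
    title: str
    votes: int = 0  # proxy 1: URL score from user votes
    discussion_links: list = field(default_factory=list)  # proxy 2

    def __post_init__(self):
        if len(self.discussion_links) > MAX_SUPPORTING_LINKS:
            raise ValueError("at most 5 supporting discussion links allowed")


def relevance(sub, vote_weight=1.0, discussion_weight=2.0):
    """Combine the two proxies. The weights are hypothetical."""
    return vote_weight * sub.votes + discussion_weight * len(sub.discussion_links)


def search(query, index):
    """Naive title match, ranked by the combined score. No crawler, no AI."""
    q = query.lower()
    hits = [s for s in index if q in s.title.lower()]
    return sorted(hits, key=relevance, reverse=True)
```

With these made-up weights, a link with one vote and two forum threads outranks a link with three votes and no discussion. Whether the real site tilts that way is anyone’s guess.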

Cynthia Murrell, June 10, 2021

More Microsoft Finger Pointing: Not 1,000 Programmers, Just One

June 9, 2021

I got a kick out of “Microsoft Blames Human Error Amid Suspicion It Censored Bing Results for Tiananmen Square Tank Man.” The tank man reference refers to an individual who stood in front of a tank. Generally this is not a good idea because visibility within tanks is similar to that from a Honda CR-Z. Hold that. The tank has better visibility. Said tank continued forward, probably without noticing a slight impediment.

The story talks not about visibility; its focus is on Microsoft (yep, the SolarWinds’ and new Windows’ outfit). I read:

Throughout Friday afternoon, using the image search function on Microsoft-operated Bing using the words “Tank Man” returned the message, “There are no results for tank man / Check your spelling or try different keywords.” (According to Motherboard, the same is true in other countries outside the U.S., including France and Switzerland.)

DuckDuckGo and Yahoo search presented a similar no results message. These are metasearch systems eager to portray themselves as much, much more.

Why? The article reports:

Microsoft has done business in China for decades, and Bing is accessible there. Like competitors such as Apple, the company has long complied with the whims of Chinese censors to maintain access to the country’s massive market, and it purges Bing results within China of information its government deems sensitive. However, the company said that blocking image results for “Tank Man” in the U.S. was not intentional and the issue was being addressed. “This is due to an accidental human error and we are actively working to resolve this…”

Could a similar error have been responsible for recent security lapses at the Redmond Defender office?

And no smart software, no rules-based instruction, and no filters involved in this curious search result?

Nope. I believe everything I read online about Microsoft. Call me silly.

Stephen E Arnold, June 9, 2021

Technical Debt: Paying Down Despite Disaster Waiting in the Wings

June 7, 2021

Some interesting ideas appear in “10 Ways to Prevent and Manage Technical Debt—Tips from Developers.” The listicle is not particularized on a specific application or service. Let me convert a few of the points in the article to the challenges that vendors of information retrieval software have to meet in a successful manner.

I am not tracking innovations in search the way I did when I wrote the first three editions of the Enterprise Search Report many years ago. Search technology, despite the hooting of marketers and “innovators” who don’t know much about the 50 year plus history of finding, has not made much progress. In fact, if I were still giving talks at search-related events, I would present data showing that “findability” has regressed. Now to the matter at hand.

I am not sure most people understand what technical debt is in general and even fewer apply the concept to search and retrieval. To keep it simple, technical debt is not repairing and servicing your auto. You do just enough to keep the Nash Rambler on the road. Then it dies. You find that parts are tough to find and expensive to get. If you want to do the job “right,” you will find that specialists are on hand to make that hunk of junk gleam. Get out your checkbook and write small. You will be filling in some big numbers. Search is that Nash Rambler but you have a couple of Metropolitans and a junker of a 1951 Nash Ambassador sitting in your data center. You can get stuff from A to B, but each trip becomes more agonizing. Then you have to spend.

Technical debt is the amount you have to spend to get back up and running plus the lost revenue or estimated opportunity cost. These numbers are the cost of the hardware, software, knick knacks, and humans who sort of know what to do.
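The arithmetic is simple enough to put in a few lines. This is a back-of-the-envelope sketch of the definition above, nothing more. The function name, the parameter names, and the dollar figures in the usage note are made up for illustration.

```python
def technical_debt(hardware, software, knick_knacks, humans, lost_revenue):
    """Cost to get back up and running plus the revenue lost along the way.

    All arguments are amounts in the same currency. lost_revenue can also
    be an estimated opportunity cost, as the definition above allows.
    """
    remediation = hardware + software + knick_knacks + humans
    return remediation + lost_revenue
```

Plug in hypothetical numbers, say 40,000 for hardware, 25,000 for software, 5,000 for knick knacks, 120,000 for the humans, and 60,000 in lost revenue, and the bill comes to 250,000. The hard part is not the addition. It is getting anyone to estimate the inputs honestly.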

What about search? Let’s take three of the items identified in the article and consider them in terms of what is often incorrectly described as “enterprise search.” My work over the years has documented the fact that there is no enterprise search. Shocking? Think about it. Employees cannot find the video of that Zoom meeting or the transcript automagically prepared this morning. And that sales presentation with the new pricing? Oh, right, that’s on the VP’s laptop and it won’t connect to the cloud archiving system because the wizard executive has trouble opening a hotel room with the keycard. Like I said, “Wizard.”

Item number 2 in the article is “Embed technical debt management into the company culture.” Ho ho ho. The present state of play is to get something up and running, dump on features, and generate revenue, some revenue, any revenue. In many organizations, the pressure to move the needle trumps any weird ideas to go back and fix the plumbing. How often is the core of Google’s search and retrieval reworked? Yeah, not often and every year the job becomes less and less desirable. The legions of Xooglers who worked on the system are unlikely to return to the digital Disneyland to do this work even for dump trucks filled with Ethereum.

Item number 5 is “Make technical debt a priority in open source culture.” Okay, let’s think about open source search. Have you thought about Sphinx recently? What about Xapian? The big dogs are under intense pressure from the real champions of open source like Amazon and everyone’s favorite security company Microsoft. The individuals who do the bulk of the work struggle to make the darned thing work on the latest and greatest platforms and operating systems. The more outfits like Amazon pressure Elastic, the less likely the humans who work on Lucene and Solr will be able to fend off complete commercialization. Hey, there’s always consulting work or a job at IBM, another cheerleader for open source. So priority? Right.

Now item number 6 in the article. It is “Choose a flexible architecture.” What does this mean for search and retrieval? Most search and search centric applications like policeware and intelware are mashups of open source, legacy code left over from another project, and intern-infused scripts. The “architecture” is whatever was easiest and most financially acceptable. Once those initial decisions are made or simply allowed to happen because someone knew someone, the systems are unlikely to change. Fixing up something that sort of works is similar to the stars of VanWives repairing their ageing vehicle while driving in the rain. Ain’t gonna happen.

Net net: Technical debt for most organizations is what will bring down the company. Innovation slows to a crawl and becomes a series of add ons, wrappers, and strapping tape patches. Then boom. A competitor has blown the doors off the incumbent, customers just cancel contracts for enterprise search systems, or the once valued function becomes a feature for a more important application. Technical debt, like a college grad’s student loan, is a stress inducer. Stress can shorten one’s life and kill. The enterprise search market is littered with the corpses of outfits terminated from technical debt denial syndrome.

Stephen E Arnold, June 7, 2021
