Autonomy CFO: Sentenced

May 14, 2019

I read “Autonomy’s Former CFO Sushovan Hussain Sentenced to Five Years in Jail.” The article reported that Sushovan Hussain will be incarcerated for 60 months and then be subject to “a further three years ‘supervised release’.” In addition to the sentence, Mr. Hussain has been fined $4 million and ordered to pay another $6.1 million described as a “forfeiture payment.” This $6.1 million is the money Mr. Hussain allegedly received as a result of the sale of Autonomy to Hewlett Packard. HP bought Autonomy for about $11 billion in 2011. (HP news release is here.)

The write up states:

In summing up, Breyer stated that Hussain had been involved in a “methodological long-term pattern” of making false statements and added that Hussain believed that in a high-growth business, such as Autonomy, future growth would effectively cover-up any false statements. Breyer also argued that Hussain had used his position to corrupt “a number of innocent people”, chivvying them into becoming a part of the fraud.

If you are unfamiliar with the technical details and some of Autonomy’s background, you will find a profile I wrote years ago in the Xenky archive. This is a version of my final report, and it has not been updated, but it provides some context for the interest Autonomy generated in its search, retrieval, and content processing systems.

The Register, a UK publication, provides periodic updates about the trial currently underway in England. Use the search function on The Register’s site to locate the stories.

Some History of Enterprise Search

This sentence and fine were more aggressive than the judgment against the former Fast Search & Transfer founder, John Lervik, who, after a series of legal processes, was cleared of wrongdoing in 2016. Microsoft purchased Fast Search & Transfer in 2008.

Autonomy and Fast Search were the two enterprise search vendors whose information access systems were the most widely licensed in the period 2005 to 2010, when appetite for proprietary search began to decline. The acquisition of Vivisimo by IBM and the purchase of Exalead by Dassault did not lead to litigation. Other search vendors sold out or simply tried to reinvent themselves in a somewhat challenging search for revenues. Today, the most widely used enterprise search system is Elasticsearch, which is available as open source software. Endeca has been absorbed into Oracle. Delphis and Entopia went out of business. OpenText rolled up a number of search companies, which are now largely forgotten; for example, Fulcrum and BRS. There are a number of interesting case studies waiting to be written; for example, the trajectory of Convera from “inventor” to consulting business, the fate of Verity and IBM’s Stairs, as well as other companies helping to expand search’s version of the tulip craze centuries ago.

Stephen E Arnold, May 14, 2019

Google: What Does Relevance Mean?

May 11, 2019

Here’s the question for you: “What’s relevance?” The answer — if I understand the allegedly true information in “Google Creates ‘Dedicated Placement’ in Search results for AMP Stories, Starting with Travel Category” — is what Google decides you may see.

Forget the AMP thing because it is a content tiering play. No AMP, no display in a special section of results. Simple. Easy to understand, right?

Why is this important?

  1. Most users (searchers) accept what Google delivers, and Google delivers what generates revenue.
  2. The majority of users want convenience and will not want to spend time “looking for information.” (When one does not exert data energy, what one gets is good enough. Try to explain this information issue: the fish only know water. The world of gaseous oxygen is a tough concept.)
  3. Users do not perceive the scope of the machinations which content producers and advertisers eager for clicks and eyeballs undertake in order to appear in the special AMP listing. Few care or have the knowledge foundation to discern the machinery grinding away.

Google pulls the strings. Relevance is what generates revenues or helps Google meet its objectives.

Who controls relevance for a particular person looking for information?

Does this redefinition of relevance impact me and my DarkCyber researchers? No. The reason is that we know that search results on Google are skewed. We know content disappears from the index. We know that to track down a particular citation or document we have to resort to old fashioned methods. Phone calls, use of niche search tools, and even visits to libraries with information on microfilm are not unusual for us.

The problem is that for a majority of people looking for information online, those skills and the knowledge which lubricates their functioning are either gone or quickly eroding.

Try to find the US Army’s updated guideline for software procurement via Google. Try to locate information about Threatgrid and its connections to other security firms. Try to locate documents germane to the CMS MIC program which backs up and sometimes replaces FBI personnel’s investigations of health care fraud. Try to find English language content about Moonwalk, a video service of considerable interest to some people.

For years, I have retained some interesting content because I know that content may not be findable the next time I use the “AMP’ed” up Google or the other aggressively filtering Web indexing systems. Sometimes you can hear my team’s teeth gnashing over the whine of our local storage systems.

I call this the findability crisis. Someone has public information, but others cannot find it. Therefore, that information is effectively unfindable or “gone.” “Hasta la vista.” And there’s no “I’ll be back” for these content objects.

With shallower indexing and deletion of “old” content (which some call either history or evidence), the world of free, ad supported Web search and retrieval is going medieval. To get information, one has to be one of the top one percent of information professionals.

Interesting? Only if one knows what’s happening, gentle reader.

Relevance? Yep, new definition. New world of information. Knowledge is not power. Knowledge is danger maybe?

Stephen E Arnold, May 11, 2019

Facebook Search: Fun for Some?

April 19, 2019

Ah, Facebook. The news about millions of exposed passwords was almost lost in the buzz sparked by the now infamous “Report.” Every week, it seems, there is a Facebook goodie to delight.

Despite its modest flaws, Facebook might be becoming a fave social media network for some. Mashable reports on the firm’s search function in “Facebook’s Search Feature Has Some Pretty Creepy Suggestions.”

Allegedly the Facebook search function allowed users to search for photos of women, but not men. Inti De Ceukelaire, a Belgian security researcher, discovered that when he typed in “photos of my female friends,” he got the desired results. However, doing the opposite with “photos of my male friends” yielded memes. Risqué search phrases were also automatically suggested:

“That discrepancy is troubling enough, but it gets worse. While testing out these searches, the first automatically suggested query was ‘photos of my female friends in bikinis,’ which returned photos of women in bikinis, as well as one image of a topless woman, which would appear to violate Facebook’s rules against nudity. Facebook removed the image following Mashable’s inquiry. Separately, ‘photos of my female friends at the beach’ was also suggested.”

Mashable continued to test the bug and discovered more questionable searches that contained what might be thought of as a “creep” factor. Searches with “male” in the search phrase, though, were more innocuous. Facebook reports that suggested search phrases are not based on an individual user’s history, but on searches across all of Facebook.

Who coded this search function? Maybe some men? Men just having fun?

Whitney Grace, April 18, 2019

Quantum Search: Consultants, Rev Your Engines

April 18, 2019

Search is a utility function. A number of companies have tried to make it into a platform upon which a business or a government agency’s mission rests. Nope.

In fact, for a decade I published “Beyond Search” and just got tired of repeating myself. Search works if one has a bounded domain, controlled vocabularies, consistent indexing, and technology which embraces precision and recall.
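For readers who want the two measures pinned down, here is a minimal Python sketch. The function name, the document identifiers, and the relevance judgments are invented for illustration; nothing here comes from a real system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result list and hypothetical relevance judgments.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Both numbers depend on someone knowing which documents are relevant, which is why a bounded domain and consistent indexing matter.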

Today, not so much. People talk about search and lose their grip on the accuracy, relevance, and verifiability of the information retrieved. It’s not just wonky psycho-economic studies which cannot be replicated. Just try running the same query on two different mobile phones owned by two different people.

Against this background, please, read “How the Quantum Search Algorithm Works.” The paper contains some interesting ideas; for example:

It’s incredible that you need only examine an $N$-item search space on the order of $\sqrt{N}$ times in order to find what you’re looking for. And, from a practical point of view, we so often use brute search algorithms that it’s exciting we can get this quadratic speedup. It seems almost like a free lunch. Of course, quantum computers still being theoretical, it’s not quite a free lunch – more like a multi-billion dollar, multi-decade lunch!

Yes, incredible.
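The square-root claim can be checked on an ordinary laptop by simulating the amplitudes classically. What follows is a minimal sketch, not the paper’s code: the function name `grover_search`, the 1,024-item search space, and the position of the marked item are my assumptions. The simulation does work proportional to N on every iteration, so it illustrates the iteration count, not an actual speedup.

```python
import math


def grover_search(n_items, marked):
    """Classically simulate Grover iterations over n_items amplitudes.

    Returns (iterations, probability of measuring the marked item).
    """
    # Start in the uniform superposition: every item equally likely.
    amps = [1.0 / math.sqrt(n_items)] * n_items

    # Roughly (pi / 4) * sqrt(N) iterations is the standard choice.
    iterations = math.floor(math.pi / 4 * math.sqrt(n_items))

    for _ in range(iterations):
        # Oracle: flip the sign of the marked item's amplitude.
        amps[marked] = -amps[marked]
        # Diffusion: reflect every amplitude about the mean.
        mean = sum(amps) / n_items
        amps = [2 * mean - a for a in amps]

    return iterations, amps[marked] ** 2


# Hypothetical search space of 1,024 items with item 7 marked.
iterations, success = grover_search(1024, marked=7)
print(f"{iterations} iterations, success probability {success:.4f}")
```

With 1,024 items the loop runs 25 times, versus about 512 lookups on average for a brute force scan, and the marked item ends up with a success probability above 0.99.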

However, the real impact of this quantum search write up will be upon the search engine optimization crowd. How quickly will methods for undermining relevance be found?

Net net: Quantum or not, search seems destined to repeat its 50 year history in a more technically sophisticated computational environment. Consultants, abandon your tired explanations of federated search. Forget mere geo-tagging. Drill right into the heart of quantum possibilities. I am eagerly awaiting a Forrester wave report on quantum search and a Gartner magic quadrant, filled with subjective possibilities.

Stephen E Arnold, April 18, 2019

Expert System: Interesting Financials

April 6, 2019

Expert System SpA is a firm providing semantic software that extracts knowledge from text by replicating human processes. I noticed information on the company’s Web site which informed me:

  • The company had sales revenues of 28.7 million euros for 2018
  • The company’s growth was 343 percent compared to 2017
  • The net financial position was 12.4 million euros up from 8.8 million euros in March 2017.

Remarkable financial performance.

Out of curiosity I navigated to Google Finance and plugged in Expert System SpA to see what data the GOOG could offer.

Here’s the chart displayed on April 6, 2019:


The firm’s stock does not seem to be responding as we enter the second quarter of 2019.

Netwrix Buys Concept Searching

April 5, 2019

Late last year we learned that Concept Searching was selling itself to Netwrix. I don’t pay much attention to “finding” solutions. I thought of Concept Searching in the context of the delay in awarding the JEDI contract. Concept Searching might be a nifty add on if Microsoft gets the $10 billion deal.

Concept Searching had positioned itself as an indexing outfit and taxonomy management tool. The company struck me as having a Microsoft-centric focus; it dabbled in enterprise search and jousted with Smartlogic.

According to the company’s founder Martin Garland:

Concept Searching is excited about becoming a part of Netwrix. Merging our unique technology with its exceptional Netwrix Auditor product delivers a new level of protection to organizations concerned about data security, with the ability to identify and remediate personal or organizationally defined sensitive information, regardless of where it is stored or how it was ingested. The expanded team will enable us to be even more agile, increasingly responsive to our clients’ needs, and to deliver a platform for growth to both client bases and ensure we maintain our leadership position in delivering world-class metadata-driven solutions.

Netwrix is a software company focused exclusively on providing IT security and operations teams with pervasive visibility into user behavior, system configurations and data sensitivity across hybrid IT infrastructures to protect data regardless of its location. The company has 10,000 customers.

DarkCyber believes that like Exalead’s acquisition by Dassault or OpenText’s purchase of assorted search and retrieval systems, it will be interesting to watch how this acquisition works out.

Stephen E Arnold, April 5, 2019

Thoughts about Search: A Word That Means Almost Anything

April 3, 2019

We have long been frustrated that search technology has not come very far since its early days. Sure, Google has made tweaks over the years, but even many of those incremental changes are designed to maximize that company’s ad revenue. Now, though, AI technology may fundamentally change how we find information online. Forbes asks, “Might AI Spell the Death of Search?” Writer Michael Ashley observes:

“‘This is the first time since 1994 when the search paradigm has changed,’ says David Seuss, CEO of Northern Light, a Boston-based strategic research portal provider I consult with that offers a cloud-based SaaS to global enterprises. ‘In 1994, you went to a search box, filled in a query, hit the search button, and received a list of documents. You manually reviewed these, picking the most relevant item to download. Fast forward to 2019 and it’s still the same thing. Find me one other part of the tech landscape that has not changed since the ’90s, whether it be broadband, wireless, mobile cloud computing, artificial intelligence—everything has changed. Everything except search.’”

The man has a point. He also claims it is Millennials that are pushing for change. Older users were just so happy to search from their desks instead of in the library stacks, he posits, that most of them remain satisfied with 90s-style online search. The younger generation, though, find manually reviewing search results inefficient, and they recognize that a lot of good information tends to get buried later in the search results—especially as paid listings claim the top spots. Ashley writes:

“With the help of A.I., tasks once relegated to flesh and blood researchers can be now accomplished by computers. Drawing on the latter’s pattern-forming and predictive abilities, it can observe users’ actions, discerning their interests based on what they download, share, comment on or bookmark. Informed by this knowledge, an A.I. can proactively—and without manual prompting—recommend relevant content to users. Disrupting the traditional search model to its page ranking core, content can seek out the user instead of the other way around.”

Ah, Northern Light sails again with the AI flag whipping in the marketing breeze with help from puffs of insight in “Why Are So Many People Wasting Their Time with Web Search?” Trim the jib!

Cynthia Murrell, April 3, 2019

Echosec: Dark Web Search for Those Who Qualify

April 2, 2019

A Canadian company has devised a way to search the Dark Web without the hassle of the Tor browser or proxy servers. HotHardware reports: “Beacon, a Dark Web Search Engine Can Be Your Eyes in the Internet Underworld.” The catch—one must prove to the company behind Beacon, Echosec, that they have a legitimate reason to use the “Google of the Dark Web.” The intention, we’re told, is for organizations to monitor whether any of their sensitive data has made it onto a Dark Web marketplace. Reporter Rod Scher writes:

“This could include stolen corporate emails, company documents, personal info, or other such data that could be detrimental to a company, its brand, or its customers. After all, if your data has been compromised, it’s always better to know than not to know. …”

We noted this statement:

“While [CTO Mike] Raypold notes that it is possible to misuse Beacon, since the tool makes it easier for users to locate data they might otherwise have difficulty finding, he says that the company has taken steps to mitigate that danger. ‘First, every Echosec customer must go through a use-case approval process to determine how the customer is using the application and to make sure they are in compliance with the vendors from whom the data is sourced,’ says Raypold. ‘If a potential customer cannot pass the use-case approval process, they do not get access to the system.’ Second, the company has built automated tools and manual processes into its platform and into the company workflows to notify the Echosec team if users attempt to run searches that are in violation of their approved use case.”

Not only will Echosec know if a user violates their agreement, certain queries simply cannot be run through Beacon. The company shares their acceptable-use policy here, and it is thorough. Founded in 2013, Echosec is based in Vancouver, British Columbia. If you want to see selected screenshots of the system’s output, check out the Dark Cyber video for March 26, 2019, at this link.

Stephen E Arnold, February 27, 2019

Audio Search: Google Gets with the Program

March 27, 2019

Searching audio files has been difficult. Exalead, before Dassault bought the company, dabbled in audio search. One could key in a key word and jump to the segment of a file which contained the word or phrase. That was in 2006, maybe 2007. That was, despite my advanced age and inability to recall the innovations from search and retrieval wizards, more than a decade ago.

I read “Google Podcast in Episode Search Is Coming, Shows Now Being Fully Transcribed.” The write up reports:

Google Podcasts is now automatically generating transcripts of episodes and is using them as metadata to help listeners search for shows, even if they don’t know the title or when it was published.
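The mechanics of searching inside an episode are simple once a transcript with timestamps exists. A minimal sketch follows, assuming a transcript arrives as (start second, text) segments. The segment data, the function names, and the index layout are hypothetical and say nothing about how Google or Exalead implemented the feature.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(segments):
    """Map each lowercased word to the start times of segments containing it."""
    index = defaultdict(list)
    for start, text in segments:
        for word in set(WORD.findall(text.lower())):
            index[word].append(start)
    return index


def find_offsets(index, query):
    """Return start times of segments containing every query word.

    This is an all-words match within a segment, not an exact phrase match.
    """
    words = WORD.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], []))
    for word in words[1:]:
        hits &= set(index.get(word, []))
    return sorted(hits)


# Hypothetical transcript: (start time in seconds, transcribed text).
segments = [
    (0, "Welcome to the show about enterprise search"),
    (42, "Audio search was demonstrated more than a decade ago"),
    (95, "Transcripts make audio search practical"),
]

index = build_index(segments)
offsets = find_offsets(index, "audio search")
print(offsets)  # start times a player could jump to
```

The hard part is not the lookup. It is the quality of the transcript feeding the index: a misheard word never makes it into the index at all.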

I spoke with a person who translates audio recordings from one language into English. Here are some highlights from that chat:

  • “Even though I am a native speaker and fluent in English, it is very, very difficult to make out what some people are saying. I slow down the recording. I listen several times. I fiddle with the sound.”
  • “Accents pose a problem. For example, if a person is speaking one language but learned that language by osmosis, the pronunciation is often strange. In some cases, I have no idea what the person speaking is trying to communicate. Some people do not articulate or put the stresses where a native speaker puts them.”
  • “Muddled sounds pose big challenges. I am not sure why but even modern recording equipment drops sounds. In some cases, rustling or tapping fuzzes what the person is saying.”

Net net: How accurate will the transcripts be? The answer is going to be like the accuracy scores for facial recognition: maybe 50 percent to 75 percent accurate out of the gate. But better than nothing, when one wants to sell ads which match the translated key words, right? Will Steve Gibson stop creating transcripts of Security Now? Probably not.

Stephen E Arnold, March 27, 2019

Apple News: Another Search Fail?

March 26, 2019

Apple is a bit of a mystery to me. Example: Navigate to the Apple app store. Type in a word like “disc recovery”? What do you get? Which app does what? I need to recover now, not conduct a day long click, read, and compare. Now try: “bootable iso”? Helpful, right? A suggestion or a link to Apple help might be useful? Next what app is best for a particular task like hotel reservation? Give up yet? Now go to Garageband and enter in the help or search box, “no audio for the microphone”? Get any help from the help system? Tip: Look for audio HDMI in utilities on a Mac laptop. The volume sliders get reset. How? Who knows? What about finding a book in Apple iTunes’ audiobook section? Type in an author’s name but misspell it by omitting a letter? Learn to spell, gentle reader.

I thought of these examples when I read “Apple News Plus Is a Fine Way to Read Magazines, but a Disappointment to Anyone Wishing for a Real Boost for the News Business.” I noted this statement in the article:

It’s actually a little hard to even find L.A. Times and Journal content in Apple News Plus because they don’t fit into the magazine UX it’s dependent on. Tap “Browse the Catalog” in Plus and you can scroll all day, but you’ll never find either paper, because they’re not contained in “issues.”

Findability. Stated another way, Apple is not particularly good at search and retrieval. Gloss and PR are covered. Finding information? Not on the radar in my opinion.

Stephen E Arnold, March 26, 2019
