Google: Can Semantic Relaxing Display More Ads?

June 10, 2019

For some reason, vendors of search systems shudder if a user’s query returns a null set. The idea is that a user sends a query to a system or, more correctly, an index. The terms in the query do not match entries in the database. The system displays a message which says, “No results match your query.”

For some individuals, that null set response is high value information. One can bump into null sets when running queries on a Web site; for example, send the anti fungicide query to the Arnold Information Technology blog at this link. Here’s the result:

image

From this response, one knows that there is no content containing the search phrase. That’s valuable for some people.

To address this problem, modern systems “relax” the query. The idea is that the user did not want what he or she typed in the search box. The search system then changes the query and displays those results to the stupid user. Other systems take action and display results which the system determines are related to the query. You can see these relaxed results when you enter the query shadowdragon into Google. Here are the results:

image

Google ignored my spelling and displays information about a video game, not the little known company Shadowdragon. At least Google told me what it did and offers a way to rerun the query using the word I actually entered. But the point is that the search was “relaxed.”
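For readers who want to see the mechanics, here is a minimal sketch of query relaxation. It assumes a toy in-memory document list and whitespace tokenization; the function name and the fallback rule (strict "all terms" first, then "any term") are illustrative assumptions, not a description of how Google or any vendor implements relaxation.

```python
def search(query, documents):
    """Return (results, relaxed); relaxed is True when the fallback was used."""
    terms = query.lower().split()
    if not terms:
        return [], False
    tokenized = [(doc, set(doc.lower().split())) for doc in documents]

    # Strict pass: every query term must appear in the document.
    strict = [doc for doc, words in tokenized if all(t in words for t in terms)]
    if strict:
        return strict, False

    # Relaxed pass: any query term is enough; rank by number of matching terms.
    scored = [(sum(t in words for t in terms), doc) for doc, words in tokenized]
    ranked = sorted(scored, key=lambda pair: -pair[0])
    relaxed = [doc for score, doc in ranked if score > 0]
    return relaxed, bool(relaxed)


docs = ["shadow dragon video game", "anti fungal cream", "enterprise search"]
print(search("anti fungicide", docs))
print(search("video game", docs))
```

The `relaxed` flag matters: it is what lets a system say what it did and offer the original query back, instead of silently swapping in different results. A strict system would simply return the empty list, which is the high value null set.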

The purpose of semantic expansion is a variation of Endeca’s facets. The idea is that a key word belongs to a category. If a system can identify a category, then the user can get more results by selecting the category and maybe finding something useful. Endeca’s wine demonstration makes this function and its value clear.
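The facet idea can be sketched in a few lines. The records, field names, and helper functions below are hypothetical and chosen only to echo the wine example; this is not Endeca's implementation.

```python
from collections import Counter

# Hypothetical catalog records; names and fields are assumptions for illustration.
WINES = [
    {"name": "Wine A", "type": "red", "region": "Napa"},
    {"name": "Wine B", "type": "red", "region": "Bordeaux"},
    {"name": "Wine C", "type": "white", "region": "Napa"},
]


def facet_counts(records, field):
    """Count how many records fall under each value of a category field."""
    return Counter(r[field] for r in records if field in r)


def refine(records, field, value):
    """Keep only the records whose category field has the selected value."""
    return [r for r in records if r.get(field) == value]


print(facet_counts(WINES, "type"))
print([w["name"] for w in refine(WINES, "region", "Napa")])
```

The counts are what a facet panel displays next to each category; selecting a category is just a call to `refine`, so the user never lands on a null set.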

Read more

Google Makes Search, Mmmmm, Better

June 7, 2019

“First AR Objects Launch in Google Search with 3D Animals” reports that Google makes search better again. Search for an animal on a supported device while you are doing the Google Lens thing and you will see a three dimensional animal. I would be thrilled if a query returned relevant results. Plus, I am okay with relevant links directing me to a relevant document which may or may not contain an illustration. Ah, progress. What happens if Google reconnects with a robot company so that as one looks at an AR rendering of a tiger, a robot tiger comes to the user’s location and snarls? Relevant? Heck, yes.

Stephen E Arnold, June 7, 2019

Autonomy CFO: Sentenced

May 14, 2019

I read “Autonomy’s Former CFO Sushovan Hussain Sentenced to Five Years in Jail.” The article reported that Sushovan Hussain will be incarcerated for 60 months and then be subject to “a further three years ‘supervised release’.” In addition to the sentence, Mr. Hussain has been fined $4 million and another $6.1 million described as a “forfeiture payment.” This $6.1 million is the money Mr. Hussain allegedly received as a result of the sale of Autonomy to Hewlett Packard. HP bought Autonomy for about $11 billion in 2011. (HP news release is here.)

The write up states:

In summing up, Breyer stated that Hussain had been involved in a “methodological long-term pattern” of making false statements and added that Hussain believed that in a high-growth business, such as Autonomy, future growth would effectively cover-up any false statements. Breyer also argued that Hussain had used his position to corrupt “a number of innocent people”, chivvying them into becoming a part of the fraud.

If you are unfamiliar with the technical details and some of Autonomy’s background, you will find a profile I wrote years ago in the Xenky archive. This is a version of my final report, and it has not been updated, but it provides some context for the interest Autonomy generated in its search, retrieval, and content processing systems.

The Register, a UK publication, provides periodic updates about the trial currently underway in England. You can locate these reports at www.theregister.co.uk. Use the search function to locate the stories.

Some History of Enterprise Search

This sentence and fine were more aggressive than the judgment against the former Fast Search & Transfer founder, John Lervik, who, after a series of legal processes, was cleared of wrongdoing in 2016. Microsoft purchased Fast Search & Transfer in 2008.

Autonomy and Fast Search were the two vendors of enterprise search which were the most widely licensed information access systems in the period 2005 to 2010 when appetite for proprietary search began to decline. The acquisition of Vivisimo by IBM and the purchase of Exalead by Dassault did not lead to litigation. Other search vendors sold out or simply tried to reinvent themselves in a somewhat challenging search for revenues. Today, the most widely used enterprise search system is Elasticsearch, which is available as open source software. Endeca has been absorbed into Oracle. Delphis and Entopia went out of business. OpenText rolled up a number of search companies, which are now largely forgotten; for example, Fulcrum and BRS. There are a number of interesting case studies waiting to be written; for example, the trajectory of Convera from “inventor” to consulting business, the fate of Verity and IBM’s Stairs as well as other companies helping to expand search’s version of the tulip craze centuries ago.

Stephen E Arnold, May 14, 2019

Google: What Does Relevance Mean?

May 11, 2019

Here’s the question for you: “What’s relevance?” The answer — if I understand the allegedly true information in “Google Creates ‘Dedicated Placement’ in Search results for AMP Stories, Starting with Travel Category” — is what Google decides you may see.

Forget the AMP thing because it is a content tiering play. No AMP, no display in a special section of results. Simple. Easy to understand, right?

Why is this important?

  1. Most users (searchers) accept what Google delivers, and Google delivers what generates revenue.
  2. The majority of users want convenience and will not want to spend time “looking for information”. (When one does not exert data energy, what one gets is good enough. Try to explain this information issue: the fish only know water. The world of gaseous oxygen is a tough concept.)
  3. Users do not perceive the scope of the machinations which content producers and advertisers eager for clicks and eyeballs undertake in order to appear in the special AMP listing. Few care or have the knowledge foundation to discern the machinery grinding away.

Google pulls the strings. Relevance is what generates revenues or helps Google meet its objectives.

image

Who controls relevance for a particular person looking for information?

Does this redefinition of relevance impact me and my DarkCyber researchers? No. The reason is that we know that search results on Google are skewed. We know content disappears from the index. We know that to track down a particular citation or document we have to resort to old fashioned methods. Phone calls, use of niche search tools, and even visits to libraries with information on microfilm are not unusual for us.

The problem is that for a majority of people looking for information online, those skills and the knowledge which lubricates their functioning is either gone or quickly eroding.

Try to find the US Army’s updated guideline for software procurement via Google. Try to locate information about Threatgrid and its connections to other security firms. Try to locate documents germane to the CMS MIC program which backs up and sometimes replaces FBI personnel’s investigations of health care fraud. Try to find English language content about Moonwalk, a video service of considerable interest to some people.

For years, I have retained some interesting content because I know that content may not be findable the next time I use the “AMP’ed” up Google or the other aggressively filtering Web indexing systems. Sometimes you can hear my team’s teeth gnashing over the whine of our local storage systems.

I call this the findability crisis. Someone has public information, but others cannot find it. Therefore, that information is effectively unfindable or “gone.” “Hasta la vista.” And there’s no “I’ll be back” for these content objects.

With shallower indexing and deletion of “old” content (which some call either history or evidence), the world of free, ad supported Web search and retrieval is going medieval. To get information, one has to be one of the top one percent of information professionals.

Interesting? Only if one knows what’s happening, gentle reader.

Relevance? Yep, new definition. New world of information. Knowledge is not power. Knowledge is danger maybe?

Stephen E Arnold, May 11, 2019

Facebook Search: Fun for Some?

April 19, 2019

Ah, Facebook. The news about millions of exposed passwords was almost lost in the buzz sparked by the now infamous “Report.” Every week, it seems, there is a Facebook goodie to delight.

Despite its modest flaws, Facebook might be a social media network becoming a fave. Mashable reports in “Facebook’s Search Feature Has Some Pretty Creepy Suggestions” about the firm’s search function.

Allegedly the Facebook search function allowed users to search for photos of women, but not men. Inti De Ceukelaire, a Belgian security researcher, discovered that when he typed in “photos of my female friends,” he got the desired results. However, doing the opposite with “photos of my male friends” yielded memes. Risqué search phrases were also automatically suggested:

“That discrepancy is troubling enough, but it gets worse. While testing out these searches, the first automatically suggested query was “photos of my female friends in bikinis,” which returned photos of women in bikinis, as well as one image of a topless woman, which would appear to violate Facebook’s rules against nudity. Facebook removed the image following Mashable’s inquiry. Separately, “photos of my female friends at the beach” was also suggested.”

Mashable continued to test the bug and discovered more questionable searches that contained what might be thought of as a “creep” factor. Searches with male in the search phrase, though, were more innocuous. Facebook reports that suggested search phrases are not based on an individual user’s history, but all of Facebook. In other words,

Who coded this search function? Maybe some men? Men just having fun?

Whitney Grace, April 18, 2019

Quantum Search: Consultants, Rev Your Engines

April 18, 2019

Search is a utility function. A number of companies have tried to make it into a platform upon which a business or a government agency’s mission rests. Nope.

In fact, for a decade I published “Beyond Search” and just got tired of repeating myself. Search works if one has a bounded domain, controlled vocabularies, consistent indexing, and technology which embraces precision and recall.

Today, not so much. People talk about search and lose their grip on the accuracy, relevance, and verifiability of the information retrieved. It’s not just wonky psycho-economic studies which cannot be replicated. Just try running the same query on two different mobile phones owned by two different people.

Against this background, please, read “How the Quantum Search Algorithm Works.” The paper contains some interesting ideas; for example:

It’s incredible that you need only examine an N-item search space on the order of √N times in order to find what you’re looking for. And, from a practical point of view, we so often use brute search algorithms that it’s exciting we can get this quadratic speedup. It seems almost like a free lunch. Of course, quantum computers still being theoretical, it’s not quite a free lunch – more like a multi-billion dollar, multi-decade lunch!

Yes, incredible.
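To put numbers on that quadratic speedup, here is a small arithmetic sketch. It assumes a single marked item and uses the standard textbook figure of roughly (π/4)·√N Grover iterations; the function names are mine, and nothing here simulates a quantum computer.

```python
import math


def classical_expected_lookups(n):
    """Average checks to find one marked item by testing items one by one."""
    return (n + 1) / 2


def grover_iterations(n):
    """Approximate Grover iterations for one marked item: floor((pi/4)*sqrt(N))."""
    return math.floor(math.pi / 4 * math.sqrt(n))


n = 1_000_000
print(classical_expected_lookups(n))  # 500000.5
print(grover_iterations(n))           # 785
```

For a million items, that is about half a million classical lookups on average versus fewer than 800 quantum iterations. The lunch is still multi-billion dollar and multi-decade.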

However, the real impact of this quantum search write up will be upon the search engine optimization crowd. How quickly will methods for undermining relevance be found?

Net net: Quantum or not, search seems destined to repeat its 50 year history in a more technically sophisticated computational environment. Consultants, abandon your tired explanations of federated search. Forget mere geo-tagging. Drill right into the heart of quantum possibilities. I am eagerly awaiting a Forrester wave report on quantum search and a Gartner magic quadrant, filled with subjective possibilities.

Stephen E Arnold, April 18, 2019

Expert System: Interesting Financials

April 6, 2019

Expert System SpA is a firm providing semantic software that extracts knowledge from text by replicating human processes. I noticed information on the company’s Web site which informed me:

  • The company had sales revenues of 28.7 million euros for 2018
  • The company’s growth was 343 percent compared to 2017
  • The net financial position was 12.4 million euros up from 8.8 million euros in March 2017.

Remarkable financial performance.

Out of curiosity I navigated to Google Finance and plugged in Expert System SpA to see what data the GOOG could offer.

Here’s the chart displayed on April 6, 2019:

image

The firm’s stock does not seem to be responding as we enter the second quarter of 2019.

Read more

Netwrix Buys Concept Searching

April 5, 2019

Late last year we learned that Concept Searching was selling itself to Netwrix. I don’t pay much attention to “finding” solutions. I thought of Concept Searching in the context of the delay in awarding the JEDI contract. Concept Searching might be a nifty add on if Microsoft gets the $10 billion deal.

Concept Searching had positioned itself as an indexing outfit and taxonomy management tool. The company struck me as having a Microsoft-centric focus and dabbled in enterprise search and jousted with Smartlogic.

According to the company’s founder Martin Garland:

Concept Searching is excited about becoming a part of Netwrix. Merging our unique technology with its exceptional Netwrix Auditor product delivers a new level of protection to organizations concerned about data security, with the ability to identify and remediate personal or organizationally defined sensitive information, regardless of where it is stored or how it was ingested. The expanded team will enable us to be even more agile, increasingly responsive to our clients’ needs, and to deliver a platform for growth to both client bases and ensure we maintain our leadership position in delivering world-class metadata-driven solutions.

Netwrix is a software company focused exclusively on providing IT security and operations teams with pervasive visibility into user behavior, system configurations and data sensitivity across hybrid IT infrastructures to protect data regardless of its location. The company has 10,000 customers.

DarkCyber believes that like Exalead’s acquisition by Dassault or OpenText’s purchase of assorted search and retrieval systems, it will be interesting to watch how this acquisition works out.

Stephen E Arnold, April 5, 2019

Thoughts about Search: A Word That Means Almost Anything

April 3, 2019

We have long been frustrated that search technology has not come very far since its early days. Sure, Google has made tweaks over the years, but even many of those incremental changes are designed to maximize that company’s ad revenue. Now, though, AI technology may fundamentally change how we find information online. Forbes asks, “Might AI Spell the Death of Search?” Writer Michael Ashley observes:

“‘This is the first time since 1994 when the search paradigm has changed,’ says David Seuss, CEO of Northern Light, a Boston-based strategic research portal provider I consult with that offers a cloud-based SaaS to global enterprises. ‘In 1994, you went to a search box, filled in a query, hit the search button, and received a list of documents. You manually reviewed these, picking the most relevant item to download. Fast forward to 2019 and it’s still the same thing. Find me one other part of the tech landscape that has not changed since the ’90s, whether it be broadband, wireless, mobile cloud computing, artificial intelligence—everything has changed. Everything except search.’”

The man has a point. He also claims it is Millennials that are pushing for change. Older users were just so happy to search from their desks instead of in the library stacks, he posits, that most of them remain satisfied with 90s-style online search. The younger generation, though, find manually reviewing search results inefficient, and they recognize that a lot of good information tends to get buried later in the search results—especially as paid listings claim the top spots. Ashley writes:

“With the help of A.I., tasks once relegated to flesh and blood researchers can be now accomplished by computers. Drawing on the latter’s pattern-forming and predictive abilities, it can observe users’ actions, discerning their interests based on what they download, share, comment on or bookmark. Informed by this knowledge, an A.I. can proactively—and without manual prompting—recommend relevant content to users. Disrupting the traditional search model to its page ranking core, content can seek out the user instead of the other way around.”

Ah, Northern Light sails again with the AI flag whipping in the marketing breeze with help from puffs of insight in “Why Are So Many People Wasting Their Time with Web Search?” Trim the jib!

Cynthia Murrell, April 3, 2019

Echosec: Dark Web Search for Those Who Qualify

April 2, 2019

A Canadian company has devised a way to search the Dark Web without the hassle of the Tor browser or proxy servers. HotHardware reports: “Beacon, a Dark Web Search Engine Can Be Your Eyes in the Internet Underworld.” The catch—one must prove to the company behind Beacon, Echosec, that they have a legitimate reason to use the “Google of the Dark Web.” The intention, we’re told, is for organizations to monitor whether any of their sensitive data has made it onto a Dark Web marketplace. Reporter Rod Scher writes:

“This could include stolen corporate emails, company documents, personal info, or other such data that could be detrimental to a company, its brand, or its customers. After all, if your data has been compromised, it’s always better to know than not to know. …”

We noted this statement:

“While [CTO Mike] Raypold notes that it is possible to misuse Beacon, since the tool makes it easier for users to locate data they might otherwise have difficulty finding, he says that the company has taken steps to mitigate that danger. ‘First, every Echosec customer must go through a use-case approval process to determine how the customer is using the application and to make sure they are in compliance with the vendors from whom the data is sourced,’ says Raypold. ‘If a potential customer cannot pass the use-case approval process, they do not get access to the system.’ Second, the company has built automated tools and manual processes into its platform and into the company workflows to notify the Echosec team if users attempt to run searches that are in violation of their approved use case.”

Not only will Echosec know if a user violates their agreement, certain queries simply cannot be run through Beacon. The company shares their acceptable-use policy here, and it is thorough. Founded in 2013, Echosec is based in Vancouver, British Columbia. If you want to see selected screenshots of the system’s output, check out the Dark Cyber video for March 26, 2019, at this link.

Stephen E Arnold, February 27, 2019
