Search: Useless Results Finally Recognized?

August 22, 2019

I cannot remember how many years ago it was that I wrote “Search Sucks” for Barbara Quint, the late editor of Searcher. I recall her comment to me, “Finally, someone in the industry speaks out.”

Flash forward a decade. I can now repeat her comment to me with some minor updating: “Finally someone recognized by the capitalist tool, Forbes Magazine, recognizes that search sucks.”

The death of search was precipitated by several factors. Mentioning these after a decade of ignoring Web search still makes me angry. The failure of assorted commercial search vendors, the glacial movement of key trade associations, and the ineffectuality of search “experts” still make me angry.

There are other factors contributing to the sorry state of Web search today. Note: I am narrowing my focus to the “free” Web search systems. If I have the energy, I may focus on the remarkable performance of “enterprise search.” But not today.

Here are the reasons Web search fell to laughable levels of utility:

  1. Google adopted the GoTo / Overture / Yahoo approach to determining relevance. This is the pay-to-play model.
  2. Search engine optimization “experts” figured out that Google allowed some fiddling with how it determined “relevance.” Google and other ad supported search systems then suggested that those listings might decay. The fix? Buy ads.
  3. Users who were born with mobile phones and flexible fingers styled themselves “search experts” along with any other individual who obtains information by looking for “answers” in a “free” Web search system.
  4. The willful abandonment of editorial policies, yardsticks like precision and recall, and human indexing guaranteed that smart software would put the nails in the coffin of relevance. Note: artificial intelligence and super duper automated indexing systems are right about 80 percent of the time when hammering scientific, technical, and engineering information. Toss in blog posts, tweets, and Web content created by people who skipped high school English and the accuracy plummets. Way down, folks. Just like facial recognition systems.
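
For readers who have not met the yardsticks named in item 4, here is a minimal sketch of how precision and recall are usually computed for a single query. The document IDs are invented for illustration, and nothing here reflects how any particular search vendor scores its own results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs: what the system returned vs. what a human
# indexer judged relevant for the same query.
retrieved_docs = ["d1", "d2", "d3", "d4", "d5"]
relevant_docs = ["d1", "d2", "d3", "d4", "d9"]

p, r = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")
```

Both numbers require someone to decide which documents are relevant in the first place, which is the editorial work the list above says was abandoned.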

The information presented in “As Search Engines Increasingly Turn To AI They Are Harming Search” is astounding. Not because it is new, but because it is a reflection of what I call the Web search mentality.

Here’s an example:

Yet over the past few years, search engines of all kinds have increasingly turned to deep learning-powered categorization and recommendation algorithms to augment and slowly replace the traditional keyword search. Behavioral and interest-based personalization has further eroded the impact of keyword searches, meaning that if ten people all search for the same thing, they may all get different results. As search engines depreciate traditional raw “search” in favor of AI-assisted navigation, the concept of informational access is being harmed and our digital world is being redefined by the limitations of today’s AI.

The problem is not artificial intelligence.

New Jargon: Consultants, Start Your Engines

July 13, 2019

I read “What Is ‘Cognitive Linguistics’?” The article appeared in Psychology Today. Disclaimer: I did some work for this outfit a long time ago. Anybody remember Charles Tillinghast, “CRM” when it referred to people, not a baloney discipline for a Rolodex filled with sales leads, and the use of Psychology Today as a text in a couple of universities? Yeah, I thought not. The Ziff connection is probably lost in the smudges of thumb typing too.

Onward: The write up explains a new spin on psychology, linguistics, and digital interaction. The jargon for this discipline or practice, if you will, is:

Cognitive Linguistics

I must assume that the editorial processes at today’s Psychology Today are genetically linked to the procedures in use in — what was it, 1972? — but who knows.

Here’s the definition:

The cognitive linguistics enterprise is characterized by two key commitments. These are:
i) the Generalization Commitment: a commitment to the characterization of general principles that are responsible for all aspects of human language, and
ii) the Cognitive Commitment: a commitment to providing a characterization of general principles for language that accords with what is known about the mind and brain from other disciplines. As these commitments are what imbue cognitive linguistics with its distinctive character, and differentiate it from formal linguistics.

If you are into psychology and figuring out how to manipulate people or a Google ranking, perhaps this is the intellectual gold worth more than stolen treasure from Montezuma.

Several observations:

  1. I eagerly await an estimate from IDC for the size of the cognitive linguistics market, and I am panting with anticipation for a Gartner magic quadrant which positions companies as leaders, followers, outfits which did not pay for coverage, and names found with a Google search at Starbucks south of the old PanAm Building. Cognitive linguistics will have to wait until the two giants of expertise figure out how to define “personal computer market”, however.
  2. A series of posts from Dave Amerland and assorted wizards at SEO blogs which explain how to use the magic of cognitive linguistics to make a blog page — regardless of content, value, and coherence — number one for a Google query.
  3. A how to book from Wiley publishing called “Cognitive Linguistics for Dummies” with online reference material which may or may not actually be available via the link in the printed book.
  4. A series of conferences run by assorted “instant conference” organizers with titles like “The Cognitive Linguistics Summit” or “Cognitive Linguistics: Global Impact”.

So many opportunities. Be still, my heart.

Cognitive linguistics: its time has come. Not a minute too soon for a couple of floundering enterprise search vendors to snag the buzzword and pivot to implementing cognitive linguistics for solving “all your information needs.” Which search company will embrace this technology: Coveo, IBM Watson, Sinequa?

DarkCyber is excited.

Stephen E Arnold, July 13, 2019

Amazon and YouTube: The Hong Kong Protests Mark the Day that Twitch.tv Made Clear the Limitations of YouTube

June 16, 2019

I heard there was a small protest underway in Hong Kong. The time is now 6:30 am US Eastern time. I navigated to YouTube, entered the query “Hong Kong protest”, and I saw links to videos from a day ago (today is June 16, 2019). I navigated to the YouTube “Live” page which provides a limited selection of streaming videos on YouTube. If you have not seen that somewhat incomplete index, navigate to https://www.youtube.com/live. No live stream of the Hong Kong protest.

If it’s not on YouTube, then it doesn’t exist, goes some old time catchphrase.

Well, not quite.

Navigate to Amazon’s Twitch.tv. Run a query for Hong Kong. Here’s what I saw before I clicked on the live stream of Unable to Breath.

image

Amazon Twitch.tv search result. The Unable to Breath stream is not one but an aggregate of eight separate feeds from Hong Kong.

Front and center was a link to Unable to Breath, which presents this streaming image:

image

This is a screen shot of a single screen which is eight different feeds showing different views of the handful of people who are participating in the event. Note: Handful means more than one million.

Notice that there are eight live streams of this modest protest. This is one live stream with eight separate views of the modest demonstration in Hong Kong. Eight in one stream! No registration required. No in stream pop up ads. Just high value intelligence in pretty good streaming video quality.

Google: Can Semantic Relaxing Display More Ads?

June 10, 2019

For some reason, vendors of search systems have shuddered if a user’s query returns a null set. The idea is that a user sends a query to a system or more correctly an index. The terms in the query do not match entries in the database. The system displays a message which says, “No results match your query.”

For some individuals, that null set response is high value information. One can bump into null sets when running queries on a Web site; for example, send the anti fungicide query to the Arnold Information Technology blog at this link. Here’s the result:

image

From this response, one knows that there is no content containing the search phrase. That’s valuable for some people.

To address this problem, modern systems “relax” the query. The idea is that the user did not want what he or she typed in the search box. The search system then changes the query and displays those results to the stupid user. Other systems take action and display results which the system determines are related to the query. You can see these relaxed results when you enter the query shadowdragon into Google. Here are the results:

image

Google ignored my spelling and displays information about a video game, not the little known company Shadowdragon. At least Google told me what it did and offers a way to rerun the query using the word I actually entered. But the point is that the search was “relaxed.”
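
The difference between an honest null set and a “relaxed” result can be sketched in a few lines. This is a toy illustration under loud assumptions: the index, the document names, and the 0.8 similarity cutoff are all invented, and the spelling fallback uses Python’s standard difflib module, not whatever Google or any other vendor actually runs.

```python
import difflib

# Toy inverted index: term -> set of document IDs (all invented).
INDEX = {
    "fungicide": {"garden-tips"},
    "garden": {"garden-tips", "lawn-care"},
    "search": {"search-sucks"},
}


def strict_search(query):
    """Return documents containing every query term; may be an empty set."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [INDEX.get(term, set()) for term in terms]
    return set.intersection(*postings)


def relaxed_search(query):
    """Try the strict match first; if it is empty, rewrite the query.

    Unknown terms are swapped for a similar-looking index term when one
    exists, otherwise dropped. Returns (results, notes) so the user can
    see exactly what the system changed.
    """
    exact = strict_search(query)
    if exact:
        return exact, []
    results, notes = set(), []
    for term in query.lower().split():
        if term in INDEX:
            results |= INDEX[term]
            continue
        close = difflib.get_close_matches(term, list(INDEX), n=1, cutoff=0.8)
        if close:
            notes.append(f"{term} -> {close[0]}")
            results |= INDEX[close[0]]
        else:
            notes.append(f"{term} dropped")
    return results, notes


print(strict_search("anti fungicide"))   # the null set, reported as such
print(relaxed_search("anti fungicide"))  # the system's guess, with notes
print(relaxed_search("fungiside"))       # a spelling rewrite, with notes
```

Returning the notes alongside the results is the design choice that matters here: the searcher can tell a rewritten query from the one he or she typed.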

The purpose of semantic expansion is a variation of Endeca’s facets. The idea is that a key word belongs to a category. If a system can identify a category, then the user can get more results by selecting the category and maybe finding something useful. Endeca’s wine demonstration makes this function and its value clear.
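
A rough sketch of the facet idea, assuming a handful of made-up wine records. The field names and values are hypothetical and are not taken from Endeca’s demonstration; the point is only that each record carries category values which can be counted and then used to narrow the result list.

```python
from collections import Counter

# Hypothetical records, invented for illustration.
WINES = [
    {"name": "Wine A", "type": "red", "region": "Napa"},
    {"name": "Wine B", "type": "red", "region": "Bordeaux"},
    {"name": "Wine C", "type": "white", "region": "Napa"},
    {"name": "Wine D", "type": "sparkling", "region": "Champagne"},
]


def facet_counts(records, field):
    """Count how many records fall under each value of one category field."""
    return Counter(record[field] for record in records)


def refine(records, field, value):
    """Keep only the records matching the facet value the user selected."""
    return [record for record in records if record[field] == value]


print(facet_counts(WINES, "type"))
napa = refine(WINES, "region", "Napa")
print([wine["name"] for wine in napa])
```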

Nosing Beyond the Machine Learning from Human Curated Data Sets: Autonomy 1996 to Smart Software 2019

April 24, 2019

How does one teach a smart indexing system like Autonomy’s 1996 “neurodynamic” system?* Subject matter experts (SMEs) assembled a training collection of textual information. The articles and other content would replicate the characteristics of the content which the Autonomy system would process; that is, index and make searchable or analyzable. The work was important. Get the training data wrong and the indexing system would assign metadata or “index terms” and “category names” which could cause a query to generate results the user could perceive as incorrect.

image

How would a licensee adjust the Autonomy “black box”? (Think of my reference to Autonomy and search as a way of approaching “smart software” and “artificial intelligence.”)

The method was to perform re-training. The approach was practical and for most content domains, the re-training worked. It was an iterative process. Because the words in the corpus fed into the “black box” included new words, concepts, bound phrases, entities, and key sequences, there were several functions integrated into the basic Autonomy system as it matured. Examples ranged from support for term lists (controlled vocabularies) to dictionaries.

The combination of re-training and external content available to the system allowed Autonomy to deliver useful outputs.
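
To make the train, re-train, and term list cycle concrete, here is a minimal sketch using a multinomial naive Bayes classifier with add-one smoothing. That technique is swapped in because it is the simplest Bayesian text classifier to show; it is not Autonomy’s method, and the class name, training texts, categories, and term list are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


class TinyBayes:
    """Multinomial naive Bayes with add-one smoothing (illustration only)."""

    def __init__(self, term_list=None):
        # term_list maps variant words to a preferred index term,
        # standing in for a controlled vocabulary.
        self.term_list = term_list or {}
        self.word_counts = defaultdict(Counter)  # category -> word counts
        self.doc_counts = Counter()              # category -> examples seen

    def _tokens(self, text):
        return [self.term_list.get(word, word) for word in text.lower().split()]

    def train(self, text, category):
        """Add one labeled example. Calling this later is the re-training step."""
        self.word_counts[category].update(self._tokens(text))
        self.doc_counts[category] += 1

    def classify(self, text):
        """Return the category with the highest log-probability score."""
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, counts in self.word_counts.items():
            score = math.log(self.doc_counts[category] / total_docs)
            total_words = sum(counts.values())
            for word in self._tokens(text):
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


model = TinyBayes(term_list={"elasticsearch": "search"})
model.train("enterprise search index query", "search")
model.train("stock revenue euros growth", "finance")

# New words arrive. The model can only pick a category it already knows.
before = model.classify("blockchain ledger token")

# Re-training: an SME supplies a labeled example for the new concept.
model.train("blockchain ledger token coin", "crypto")
after = model.classify("blockchain ledger token")
print(before, "->", after)
```

Before re-training, the model has no choice but to file the new content under an existing category. The labeled example and the term list are the human labor the following paragraphs describe.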

Why the real world results departed from the optimal results usually boiled down to several factors, often working in concert. First, licensees did not want to pay for re-training. Second, maintenance of the external dictionaries was necessary because new entities arrive with reasonable frequency. Third, testing and organizing the fresh training sets and doing the editorial work required to keep dictionaries shipshape were too expensive, time consuming, and tedious.

Not surprisingly, some licensees grew unhappy with their Autonomy IDOL (integrated data operating layer) system. That, in my opinion, was not Autonomy’s fault. Autonomy explained in the presentations I heard what was required to get a system up and running and outputting results that could easily hit 80 percent or higher on precision and recall tests.

The Autonomy approach is widely used. In fact, wherever there is a Bayesian system in use, there is the training, re-training, external knowledge base demand. I just took a look at Haystax Constellation. It’s Bayesian and Haystax makes it clear that the “model” has to be trained. So what’s changed between 1996 and 2019 with regard to Bayesian methods?

Nothing. Zip. Zero.

Expert System: Interesting Financials

April 6, 2019

Expert System SpA is a firm providing semantic software that extracts knowledge from text by replicating human processes. I noticed information on the company’s Web site which informed me:

  • The company had sales revenues of 28.7 million euros for 2018
  • The company’s growth was 343 percent compared to 2017
  • The net financial position was 12.4 million euros up from 8.8 million euros in March 2017.

Remarkable financial performance.

Out of curiosity I navigated to Google Finance and plugged in Expert System SpA to see what data the GOOG could offer.

Here’s the chart displayed on April 6, 2019:

image

The firm’s stock does not seem to be responding as we enter the second quarter of 2019.

Facebook: Ripples of Confusion, Denial, and Revisionism

March 18, 2019

Facebook contributed to an interesting headline about the video upload issue related to the bad actor in New Zealand. Here’s the headline I noted as it appeared on Techmeme’s Web page:

image

The Reuters’ story ran a different headline:

image

What caught my attention is the statement “blocked at upload.” If a video were blocked at upload, were those videos removed? If blocked, then the number of videos drops to 300,000.

This type of information is typical of the coverage of Facebook, a company which has become the embodiment of social media.

There were two other interesting Facebook stories in my news feed this morning.

The first concerns a high profile Silicon Valley investor, Marc Andreessen. The write up reports and updates a story whose main point is:

Facebook Board Member May Have Met Cambridge Analytica Whistleblower in 2016.

The Search Wars: When Open Starts to Close

March 12, 2019

Compass Search. The precursor. The result? Elasticsearch. No proprietary code. Free and open source. The world of enterprise search shifted.

As a result of Shay Banon’s efforts, an alternative to proprietary search and interesting financial maneuvers, an individual or organization could download code and set up a functional enterprise search system.

There are proprietary search systems available like Coveo. But most of the offerings are sort of open sourcey. It is a marketing ploy. The forward leaning companies do not use the word search to market their products because zippier functionality is what brings tire kickers and some buyers.

The landscape of search seems to be doing its Hawaii volcano act. No real eruption, but shakes, hot gas, and cracks have begun to appear. The lava flows will come soon enough.

The path is clear to the intrepid developer.

The tip off is Amazon’s announcement that it now offers an open distro for Elasticsearch. Why is Amazon taking this step? The company explains:

Elasticsearch has become an essential technology for log analytics and search, fueled by the freedom open source provides to developers and organizations. Our goal is to ensure that open source innovation continues to thrive by providing a fully featured, 100% open source, community-driven distribution that makes it easy for everyone to use, collaborate, and contribute.

DarkCyber’s briefings about Amazon’s policeware initiative suggest that the online bookstore is adding another component to its robust intelligence system and services.

The move involves or will involve:

  • Entrepreneurs who will see Amazon as creating low friction for new products and services
  • Partners because implementing search can be a consulting gold mine
  • Users
  • Developers who will use an Amazon “off the shelf” solution
  • Competitors who may find the “other open source” Elasticsearch lagging behind the Amazon “house brand”.

The move is not much of a surprise. Amazon seeks to implement its version of IBM’s 1960s style vendor lock in. Open source is open source, isn’t it? The Amazon version of the popular Elasticsearch system has utility in everything from commercial products to add ons which help make log files more mine-able. Plus search snaps into the DNA of the Amazon jungle of services, functions, and features. Where there is confusion, there are opportunities to make money.

Adding a house brand to its ecosystem is a basic tactic in the Amazon playbook. Those T shirts with the great price are Amazon’s, not the expensive stuff with a fancy brand name. T shirts and search? Who cares?

What’s the play mean for over extended proprietary search systems which may never generate a pay day for investors? A lot of explaining seems likely.

What’s the play mean for Elastic, the company which now operates the son of Compass Search? Some long off site meetings may be ahead and maybe some chats with legal eagles.

What’s the play mean for vendors using Amazon as back end plumbing for their enterprise or policeware services? A swap out of the Elasticsearch system for the Amazon version could be in the cards. Amazon Elasticsearch will probably deliver fewer headaches and lost weekends than using the Banon-Elastic version. Who wants headaches in an already complex, expensive implementation?

The Register quotes an evangelist from AWS as saying:

“We will continue to send our contributions and patches upstream to advance these projects.”

DarkCyber interprets this action and Amazon’s explanations from the perspective and context of a high school football coach:

“Front line, listen up, fork that QB. I want that guy put down. Hard. Let’s go.”

Amazon. The best defense is a good offense, right?

The coach shouts:

“Let’s hit those Sheep hard. Arrrgh.”

Stephen E Arnold, March 12, 2019

IBM Debate Contest: Human Judges Are Unintelligent

February 12, 2019

I was a high school debater. I was a college debater. I did extemp. I did an event called readings. I won many cheesy medals and trophies. Also, I have a number of recollections about judges who shafted me and my team mate or just hapless, young me.

I learned:

Human judges mean human biases.

When I learned that the audience voted a human the victor over the Jeopardy-winning, subject matter expert sucking, and recipe writing IBM Watson, I knew the human penchant for distortion, prejudice, and foul play made an objective, scientific assessment impossible.

Humans may not be qualified to judge state of the art artificial intelligence from sophisticated organizations like IBM.

The rundown and the video of the 25 minute travesty are on display via Engadget with a non argumentative explanation in words in the write up “IBM AI Fails to Beat Human Debating Champion.” The real news report asserts:

The face-off was the latest event in IBM’s “grand challenge” series pitting humans against its intelligent machines. In 1996, its computer system beat chess grandmaster Garry Kasparov, though the Russian later accused the IBM team of cheating, something that the company denies to this day — he later retracted some of his allegations. Then, in 2011, its Watson supercomputer trounced two record-winning Jeopardy! contestants.

Yes, past victories.

Now what about the debate and human judges?

My thought is that the dust up should have been judged by a panel of digital devastators; specifically:

  • Google DeepMind. DeepMind trashed a human Go player and understands the problems humanoids have being smart and proud
  • Amazon SageMaker. This is a system tuned with work for a certain three letter agency and, therefore, has a Deep Lens eye to spot the truth
  • Microsoft Brainwave (remember that?). This is a system which was the first hardware accelerated model to make Clippy the most intelligent “bot” on the planet. Clippy, come back.

Here’s how this judging should have worked.

  1. Each system “learns” what it takes to win a debate, including voice tone, rapport with the judges and audience, and physical gestures (presence)
  2. Each system processes the video, audio, and sentiment expressed when the people in attendance clap, whistle, laugh, sub vocalize “What a load of horse feathers,” etc.
  3. Each system generates a score with 0.000001 the low and 0.999999 the high
  4. The final tally would be calculated by Facebook FAIR (Facebook AI Research). The reason? Facebook is among the most trusted, socially responsible smart software companies.

The notion of a human judging a machine is what I call “deep stupid.” I am working on a short post about this important idea.

A human judged by humans is neither just nor impartial. Not Facebook FAIR.

An also participated award goes to IBM marketing.

IBM snagged an also participated medal. Well done.

Stephen E Arnold, February 13, 2019

Deloitte and NLP: Is the Analysis On Time and Off Target?

January 18, 2019

I read “Using AI to Unleash the Power of Unstructured Government Data.” I was surprised because I thought that US government agencies were using smart software (NLP, smart ETL, digital notebooks, etc.). My recollection is that use of these types of tools began in the mid 1990s, maybe a few years earlier. i2 Ltd., a firm for which I did a few minor projects, rolled out its Analyst’s Notebook in the mid 1990s, and it gained traction in a number of government agencies a couple of years after British government units began using the software.

The write up states:

DoD’s Defense Advanced Research Projects Agency (DARPA) recently created the Deep Exploration and Filtering of Text (DEFT) program, which uses natural language processing (NLP), a form of artificial intelligence, to automatically extract relevant information and help analysts derive actionable insights from it.

My recollection is that DEFT fired up in 2010 or 2011. Once funding became available, activity picked up in 2012. That was six years ago.

However, DEFT is essentially a follow on from other initiatives which reach back to Purple Yogi (Stratify) and DR-LINK, among others.

The capabilities of NLP are presented as closely linked technical activities; for example:

  • Name entity resolution
  • Relationship extraction
  • Sentiment analysis
  • Topic modeling
  • Text categorization
  • Text clustering
  • Information extraction

The collection of buzzwords is interesting. I would annotate each of these items to place them in the context of my research into content processing, intelware, and related topics:
