Search: Useless Results Finally Recognized?

August 22, 2019

I cannot remember how many years ago I wrote “Search Sucks” for Barbara Quint, the late editor of Searcher. I recall her comment to me, “Finally, someone in the industry speaks out.”

Flash forward a decade. I can now repeat her comment with some minor updating: “Finally, someone writing in the capitalist tool, Forbes Magazine, recognizes that search sucks.”

The death of search was precipitated by several factors. Even after a decade of ignoring Web search, mentioning them still makes me angry: the failure of assorted commercial search vendors, the glacial movement of key trade associations, and the ineffectuality of search “experts.”


There are other factors contributing to the sorry state of Web search today. Note: I am narrowing my focus to the “free” Web search systems. If I have the energy, I may focus on the remarkable performance of “enterprise search.” But not today.

Here are the reasons Web search fell to laughable levels of utility:

  1. Google adopted the GoTo / Overture / Yahoo approach to determining relevance. This is the pay-to-play model.
  2. Search engine optimization “experts” figured out that Google allowed some fiddling with how it determined “relevance.” Google and other ad-supported search systems then suggested that those listings might decay. The fix? Buy ads.
  3. Users who were born with mobile phones and flexible fingers styled themselves “search experts” along with any other individual who obtains information by looking for “answers” in a “free” Web search system.
  4. The willful abandonment of editorial policies, yardsticks like precision and recall, and human indexing guaranteed that smart software would put the nails in the coffin of relevance. Note: artificial intelligence and super duped automated indexing systems are right about 80 percent of the time when hammering scientific, technical, and engineering information. Toss in blog posts, tweets, and Web content created by people who skipped high school English, and the accuracy plummets. Way down, folks. Just like facial recognition systems.
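Precision and recall, two of the yardsticks mentioned in item 4, are simple to compute once a human has judged which documents actually answer a query. A minimal Python sketch, assuming a hypothetical results page and a hand-built set of relevance judgments (the document IDs are made up for illustration):

```python
"""Minimal sketch of precision and recall for a single query.

Assumption: relevance judgments come from a human indexer or editor;
the document IDs below are hypothetical, for illustration only.
"""


def precision_recall(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one result list."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    # Precision: what fraction of what the engine returned is relevant?
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: what fraction of the relevant documents did the engine find?
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical results page: two ads and an SEO page crowd out relevant items.
    results = ["ad-1", "doc-a", "seo-page", "ad-2", "doc-b"]
    judged_relevant = {"doc-a", "doc-b", "doc-c", "doc-d"}
    p, r = precision_recall(results, judged_relevant)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

Both numbers require a human judgment of relevance as input, which is exactly the editorial work item 4 says was abandoned.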

The information presented in “As Search Engines Increasingly Turn To AI They Are Harming Search” is astounding. Not because it is new, but because it is a reflection of what I call the Web search mentality.

Here’s an example:

Yet over the past few years, search engines of all kinds have increasingly turned to deep learning-powered categorization and recommendation algorithms to augment and slowly replace the traditional keyword search. Behavioral and interest-based personalization has further eroded the impact of keyword searches, meaning that if ten people all search for the same thing, they may all get different results. As search engines depreciate traditional raw “search” in favor of AI-assisted navigation, the concept of informational access is being harmed and our digital world is being redefined by the limitations of today’s AI.

The problem is not artificial intelligence.
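The personalization effect in the quoted Forbes passage, where ten people typing the same query may see ten different result lists, comes from blending a query-match score with per-user signals. A minimal Python sketch, assuming a hypothetical linear blend, made-up documents, and made-up interest weights (no real engine publishes its formula):

```python
"""Minimal sketch of how personalization reorders one keyword query.

Assumptions: the documents, topics, interest weights, and the simple
linear blend below are hypothetical; real engines do not publish their
ranking formulas.
"""

# Keyword relevance of each document to the same query (hypothetical scores).
KEYWORD_SCORE = {"doc-news": 0.80, "doc-shop": 0.75, "doc-blog": 0.70}
DOC_TOPIC = {"doc-news": "politics", "doc-shop": "shopping", "doc-blog": "hobbies"}


def rank(user_interests: dict[str, float], weight: float = 0.5) -> list[str]:
    """Order documents by keyword score plus a per-user interest boost."""
    def score(doc: str) -> float:
        boost = user_interests.get(DOC_TOPIC[doc], 0.0)
        return KEYWORD_SCORE[doc] + weight * boost
    return sorted(KEYWORD_SCORE, key=score, reverse=True)


if __name__ == "__main__":
    # Same query, three users, three different result orders.
    print(rank({}))                 # pure keyword order
    print(rank({"shopping": 0.4}))  # the shopper sees the store first
    print(rank({"hobbies": 0.6}))   # the hobbyist sees the blog first
```

The keyword scores never change; only the user profile does, and that alone is enough to put a different document at the top for each person.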


Audio Data Set: Start Your AI Engines

August 16, 2019

Machine learning projects have a new source of training data. BoingBoing announces the new “Open Archive of 240,000 Hours’ Worth of Talk Radio, Including 2.8 Billion Words of Machine-Transcription.” A project of MIT Media Lab, Radiotalk holds a wealth of machine-generated transcriptions of talk radio broadcasts between October 2018 and March 2019. Naturally, the text is all tagged with machine-readable metadata. The team hopes their work will enrich research in natural language processing, conversational analysis, and social sciences. Writer Cory Doctorow comments:

“I’m mostly interested in the social science implications here: talk radio is incredibly important to the US political discourse, but because it is ephemeral and because recorded speech is hard to data-mine, we have very little quantitative analysis of this body of work. As Gretchen McCulloch points out in her new book on internet-era language, Because Internet, research on human speech has historically relied on expensive human transcription, leading to very small and corpuses covering a very small fraction of human communication. This corpus is part of a shift that allows social scientists, linguists and political scientists to study a massive core-sample of spoken language in our public discourse.”

The metadata attached to these transcripts includes information about geographical location, speaker turn boundaries, gender, and radio program information. Curious readers can access the researchers’ paper here (PDF).
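The metadata categories listed above suggest the kind of record a researcher would work with. A minimal Python sketch, assuming hypothetical field names (they are not Radiotalk's actual schema) and made-up sample snippets. It also checks the headline totals: 2.8 billion words over 240,000 hours averages out to roughly 194 transcribed words per minute of airtime.

```python
"""Sketch of a record type for machine-transcribed talk radio snippets.

Assumption: the field names are hypothetical; they mirror the metadata
categories the post lists (location, speaker turns, gender, program),
not the actual Radiotalk file format.
"""
from collections import Counter
from dataclasses import dataclass


@dataclass
class Snippet:
    program: str        # radio program name
    location: str       # geographic location of the station
    speaker_turn: int   # index of the speaker turn within the broadcast
    gender: str         # inferred speaker gender label
    text: str           # machine transcription


def words_by_gender(snippets: list[Snippet]) -> Counter:
    """Count transcribed words per gender label, a typical social-science tally."""
    counts: Counter = Counter()
    for s in snippets:
        counts[s.gender] += len(s.text.split())
    return counts


def words_per_minute(total_words: float, hours: float) -> float:
    """Average speaking rate implied by a corpus's word and hour totals."""
    return total_words / (hours * 60)


if __name__ == "__main__":
    # Totals from the post: 2.8 billion words over 240,000 hours.
    print(round(words_per_minute(2.8e9, 240_000)))  # 194
```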

Cynthia Murrell, August 16, 2019

US AI Standards Document

August 15, 2019

DarkCyber learned about “U.S. LEADERSHIP IN AI: A Plan for Federal Engagement in Developing Technical Standards and Related Tools.” The research team poked about and located the report. You can, as of August 15, 2019, download the 52-page document at this link. There are some interesting “targets” in the report; for example, the need for documented case examples. A related document has been prepared by AI insiders. You can download this 112-page report at this link. A bit of Binging and Googling (coupled with patience) reveals similar position papers.

DarkCyber’s view is that the authors of these documents are not upfront about:

  • The technical balance tipping from the US to other countries’ efforts
  • The spectrum of “flavors” of artificial intelligence
  • The lag between what specific companies are doing and what the bold vision of a smart future will deliver.

How can these issues be addressed? DarkCyber has been noodling about the gaps, the spectrum, and the Borges-like reality and fantasy dichotomy for years.

Based on the information presented in these documents, other issues are of greater concern to those wrestling with AI in a decidedly American venue, with US athletes, and a US definition of “world champion.”

In short, a question: What if Jorge Luis Borges’s observation is correct:

Reality is not always probable, or likely.

DarkCyber assumes one could ask IBM Watson or Amazon SageMaker.

Stephen E Arnold, August 15, 2019

Google Pumps Cash into DeepMind: A Cost Black Hole Contains Sour Grapes

August 8, 2019

DarkCyber believes that some of the major London newspapers are not wearing happy face buttons when talking about Google. The reasons boil down to money. Google has it in truckloads courtesy of advertising. London newspapers don’t because advertisers love print less these days.

I read “DeepMind Losses Mount as Google Spends Heavily to Win AI Arms Race.” The write up is a good example of bad decisions the now ageing whiz kids are making. Sour grapes? More like sour grapes journalism.

Straight away, smart software is going to migrate through many human-performed activities. Getting software to work, not send deliveries to the wrong house, pick out the exact person of interest from a sea of faces, and make decisions which are slightly more reliable than what the LIBOR folks delivered: this is the future.

The future is expensive unless one gets really lucky. Right, that’s like the “I’m feeling lucky” thing Google provides courtesy of advertisers’ spending.

Back to the bitter vintage write up: The London newspaper states:

Its annual accounts from Companies House show losses of more than £470m in 2018, up from £302m the year before, and its expenses rose from £334m to £568m. Of the £1.03bn due for repayment this year, £883m is owed to parent company Alphabet.

Okay, investments (losses). This is not news. What is news is the tiny hint that there may be some value in looking at the repayments issue. Well, why not look into the tax implications of such inside debts?
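Supplying a bit of the missing context takes only a few lines of arithmetic. This sketch uses only the figures in the quoted passage (in £ millions), treats “more than £470m” as a lower bound, and takes £1.03bn as £1,030m, so the results are approximate:

```python
"""Put the DeepMind figures from the quoted passage into ratio form.

Inputs are the numbers in the quote (GBP millions). The 2018 loss is
"more than £470m", so 470 is a lower bound; £1.03bn is taken as 1,030.
"""


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100


if __name__ == "__main__":
    loss_growth = pct_change(302, 470)     # at least ~55.6%
    expense_growth = pct_change(334, 568)  # ~70.1%
    share_to_alphabet = 883 / 1030 * 100   # ~85.7% of the debt due this year
    print(f"loss growth >= {loss_growth:.1f}%")
    print(f"expense growth = {expense_growth:.1f}%")
    print(f"owed to Alphabet = {share_to_alphabet:.1f}%")
```

Expenses grew faster than losses, and roughly six of every seven pounds due go back to the parent company, which is the inside-debt angle the write up leaves unexplored.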

Another non-news factoid: It costs money to hire people who can make AI work. What about the future of AI if a company does not have smart people? There are some case examples about this type of misstep in non-Googley businesses. What are the differences? Similarities? How about a smidgen of research and analysis?

Recycling numbers without context is — to be frank — like a commercial database summarizing an article from a linguistics journal published a year ago. Great for some, but for most, nothing substantive or useful.

Poor Google. The company is investing in a city and country whose newspapers have the distinction of grousing incessantly about a company that’s been around for 20 or so years.

Will Google deploy its technology to report the news? Perhaps that would make an interesting write up. Recycling public financial data with a couple of ounces of lousy whine is not satisfying to those in Harrod’s Creek, Kentucky.

Stephen E Arnold, August 8, 2019

More on Biases in Smart Software

August 7, 2019

Bias in machine learning strikes again. Citing a study performed by Facebook AI Research, The Verge reports, “AI Is Worse at Identifying Household Items from Lower-Income Countries.” Researchers studied the accuracy of five top object-recognition algorithms (Microsoft Azure, Clarifai, Google Cloud Vision, Amazon Rekognition, and IBM Watson) using this dataset of objects from around the world. Writer James Vincent tells us:

“The researchers found that the object recognition algorithms made around 10 percent more errors when asked to identify items from a household with a $50 monthly income compared to those from a household making more than $3,500. The absolute difference in accuracy was even greater: the algorithms were 15 to 20 percent better at identifying items from the US compared to items from Somalia and Burkina Faso.”

Not surprisingly, researchers point to the usual suspect—the similar backgrounds and financial brackets of most engineers who create algorithms and datasets. Vincent continues:

“In the case of object recognition algorithms, the authors of this study say that there are a few likely causes for the errors: first, the training data used to create the systems is geographically constrained, and second, they fail to recognize cultural differences. Training data for vision algorithms, write the authors, is taken largely from Europe and North America and ‘severely under sample[s] visual scenes in a range of geographical regions with large populations, in particular, in Africa, India, China, and South-East Asia.’ Similarly, most image datasets use English nouns as their starting point and collect data accordingly. This might mean entire categories of items are missing or that the same items simply look different in different countries.”

Why does this matter? For one thing, it means object recognition performs better for certain audiences than others in systems as benign as photo storage services, as serious as security cameras, and as crucial as self-driving cars. Not only that, we’re told, the biases found here may be passed into other types of AI that will not receive similar scrutiny down the line. As AI products pick up speed throughout society, developers must pay more attention to the data on which they train their impressionable algorithms.
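The disparity the study reports comes from a simple kind of audit: score the same classifier separately for each group and compare. A minimal Python sketch, assuming made-up toy predictions grouped by income band (these are not the study's data or results):

```python
"""Minimal sketch of a per-group accuracy audit for an image classifier.

Assumptions: the predictions and income bands below are made-up toy data,
not the study's results; grouping by household income band follows the
comparison described in the quoted study.
"""
from collections import defaultdict


def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in records:
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


if __name__ == "__main__":
    toy = [
        ("high-income", "soap", "soap"),
        ("high-income", "stove", "stove"),
        ("high-income", "sofa", "sofa"),
        ("high-income", "toothbrush", "comb"),
        ("low-income", "soap", "food"),
        ("low-income", "stove", "stove"),
        ("low-income", "sofa", "bed"),
        ("low-income", "toothbrush", "toothbrush"),
    ]
    acc = accuracy_by_group(toy)
    gap = acc["high-income"] - acc["low-income"]
    print(acc, f"gap={gap:.2f}")  # high 0.75, low 0.50, gap 0.25
```

A single overall accuracy figure (0.625 on this toy data) would hide the gap entirely, which is why the grouping matters.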

Cynthia Murrell, August 7, 2019

Flawed Data In, Bias Out

August 3, 2019

Artificial intelligence is biased. AI algorithms are biased against non-white people as well as females. The reason is that the programmers are usually white males, and they often overlook adding the data that would make their AI algorithms diverse. Silicon Republic shares a brand new way that AI is biased, this time against poorer individuals: “Biased AI Reportedly Struggles To Identify Objects From Poorer Households.”

The biggest biased AI culprits are visual recognition algorithms built to identify people and objects. The main cause of their bias is a lack of diverse data. The article explains how Facebook’s AI research lab found biased data in internationally used visual object recognition systems. The algorithms from Microsoft Azure, Google Cloud Vision, Amazon Rekognition, Clarifai, and IBM Watson were tasked with identifying common household items from a global dataset. Information in the dataset included:

“The dataset covers 117 categories of different household items and documents the average monthly income of households from various countries across the world, ranging from $27 in Burundi to $10,098 in China. When the algorithms were shown the same product but from different parts of the world, the researchers found that there was a 10pc increase in chance they would fail to identify items from a household earning less than $50 versus one making more than $3,500 a month.”

This raises interesting questions about how the AI systems are taught to identify objects. One example is soap on different surfaces. In richer countries, the systems identified soap in a pump dispenser on a tiled counter; in poorer countries, the soap was a bar on a dirty surface. The AI was 20% more likely to identify objects in richer countries than in poor ones. For living rooms, the gap grew to a 40% accuracy difference, attributed to the lack of items in poorer homes. The programmers believe the bias arises because most of the data comes from wealthier countries and little comes from poorer ones.
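One common, partial response to the data imbalance described above is to reweight training samples so that under-represented regions count as much, in total, as over-represented ones. This is a generic technique, not the researchers' stated remedy, and the region labels and counts below are hypothetical:

```python
"""Sketch of inverse-frequency reweighting for an unbalanced image dataset.

A generic, partial mitigation, not the study authors' remedy. The region
labels and counts below are hypothetical.
"""
from collections import Counter


def inverse_frequency_weights(regions: list[str]) -> list[float]:
    """Give each sample a weight so every region contributes equally in total."""
    counts = Counter(regions)
    n_regions = len(counts)
    n = len(regions)
    # Each region's weights sum to n / n_regions, regardless of its size.
    return [n / (n_regions * counts[r]) for r in regions]


if __name__ == "__main__":
    sample_regions = ["europe"] * 6 + ["north-america"] * 3 + ["africa"] * 1
    weights = inverse_frequency_weights(sample_regions)
    # europe samples get ~0.56 each; africa's single sample gets ~3.33
    print(sorted(set(round(w, 2) for w in weights)))  # [0.56, 1.11, 3.33]
```

Reweighting only rebalances photographs already collected; it cannot supply the bar soap or sparse living rooms a dataset never captured.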

Is this another finding from Captain Obvious’ research lab? Is it possible to generate more representative datasets? Obviously not.

Whitney Grace, August 3, 2019

Smart Software: About Those Methods?

July 23, 2019

An interesting paper germane to machine learning and smart software is available online. The title? “Are We Really Making Much Progress? A Worrying Analysis of Recent Neural Recommendation Approaches.”

The punch line for this academic document is, in the view of DarkCyber:

No way.

Your view may be different, but you will have to read the document, check out the diagrams, and scan the supporting information available on GitHub at this link.

The main idea is:

In this work, we report the results of a systematic analysis of algorithmic proposals for top-n recommendation tasks. Specifically, we considered 18 algorithms that were presented at top-level research conferences in the last years. Only 7 of them could be reproduced with reasonable effort. For these methods, it however turned out that 6 of them can often be outperformed with comparably simple heuristic methods, e.g., based on nearest-neighbor or graph-based techniques. The remaining one clearly outperformed the baselines but did not consistently outperform a well-tuned non-neural linear ranking method. Overall, our work sheds light on a number of potential problems in today’s machine learning scholarship and calls for improved scientific practices in this area.
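The “comparably simple heuristic methods” in the abstract include nearest-neighbor techniques. Here is a minimal item-based nearest-neighbor recommender, written for illustration under stated assumptions (binary likes, cosine similarity over the sets of users who liked each item, toy data). It is not the paper's own code:

```python
"""Minimal item-based nearest-neighbor recommender for top-n lists.

An illustration of the kind of simple baseline the abstract mentions, not
the paper's implementation. The interaction data below is hypothetical.
"""
import math


def cosine(a: set[str], b: set[str]) -> float:
    """Cosine similarity between two items' sets of users (binary feedback)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def recommend(user: str, likes: dict[str, set[str]], n: int = 2) -> list[str]:
    """likes maps user -> items; returns the user's top-n unseen items."""
    # Invert to item -> users so each item becomes a vector of users.
    users_of: dict[str, set[str]] = {}
    for u, items in likes.items():
        for item in items:
            users_of.setdefault(item, set()).add(u)
    seen = likes[user]
    scores = {}
    for candidate in users_of:
        if candidate in seen:
            continue
        # Score = summed similarity to every item the user already liked.
        scores[candidate] = sum(cosine(users_of[candidate], users_of[s]) for s in seen)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


if __name__ == "__main__":
    data = {
        "ann": {"a", "b"},
        "bob": {"a", "b", "c"},
        "cat": {"a", "c"},
        "dan": {"b", "d"},
    }
    print(recommend("ann", data))  # ['c', 'd']
```

There is no training loop and no GPU; the paper's point is that baselines of roughly this complexity, properly tuned, often beat the published neural methods.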

So back to my summary, “No way.”

Here’s an “oh, how interesting” chart. Note the spikes:


Several observations:

  1. In an effort to get something to work, those who think in terms of algorithms take shortcuts; that is, operate in a clever way to produce something that’s good enough. “Good enough” is pretty much a C grade or “passing.”
  2. Math whiz hand-waving and MBA / lawyer ignorance of the human judgments that operate within an algorithmic process guarantee that “good enough” becomes “Let’s see if this makes money.” You can substitute “reduce costs” if you wish. No big difference.
  3. Users accept whatever outputs a smart system delivers. Most people believe that “computers are right.” There’s nothing DarkCyber can do to make people more aware.
  4. Algorithms can be fiddled in the following ways: [a] let these numerical recipes and the idiosyncrasies of calculation just do their thing; for example, drift off in a weird direction or produce the equivalent of white noise; [b] get skewed because of the data flowing into the system automagically (very risky) or via human subject matter experts (also very risky); [c] the programmers implementing the algorithm focus on the code, speed, and deadline, not how the outputs flow; for example, k-means can be really mean and Bayesian methods can bay at the moon.
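The quip about k-means in item [c] has a concrete basis: Lloyd's algorithm, the standard k-means procedure, can settle into different local optima depending on where it starts. A minimal one-dimensional Python sketch; the data points and the two starting positions are hypothetical, chosen to make the effect easy to see:

```python
"""Sketch: Lloyd's k-means on 1-D data converges to different answers
depending on the starting centroids.

The data points and the two initializations are hypothetical, chosen to
make the effect easy to see.
"""


def kmeans_1d(points: list[float], centroids: list[float], iters: int = 100) -> list[float]:
    """Run Lloyd's algorithm from the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iters):
        clusters: list[list[float]] = [[] for _ in centroids]
        # Assignment step: each point joins its nearest centroid.
        for p in points:
            nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if empty).
        new = [sum(c) / len(c) if c else centroids[k] for k, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def inertia(points: list[float], centroids: list[float]) -> float:
    """Sum of squared distances to the nearest centroid (lower is better)."""
    return sum(min((p - c) ** 2 for c in centroids) for p in points)


if __name__ == "__main__":
    data = [0, 1, 10, 11, 20, 21]
    a = kmeans_1d(data, [0, 21])
    b = kmeans_1d(data, [0, 1])
    print(a, inertia(data, a))  # about [3.67, 17.33], inertia about 121.3
    print(b, inertia(data, b))  # [0.5, 15.5], inertia 101.5
```

Same data, same algorithm, two different "answers." Libraries typically mitigate this by running several random starts and keeping the lowest-inertia result, but someone has to know to ask for that.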

Net net: Worth reading this analysis.

Stephen E Arnold, July 23, 2019

Sockpuppet Image Source

July 23, 2019

I read “Turn Selfies into Classical Portraits with the AI That Fuels Deepfakes.” I gave the system a spin. I uploaded a picture from this week’s DarkCyber. The system generated a wonderful image usable by anyone with access to a source of images; for example, Bing Images or Facebook. Here’s the result:


Working well. Cloud-centric or a laptop? I loved the explanation: “Huge traffic.” Back to those scaling lectures.

Stephen E Arnold, July 23, 2019

Machine Learning: Whom Does One Believe?

June 28, 2019

Ah, another day begins with mixed messages. Just what the relaxed, unstressed modern decider needs.

First, navigate to “Reasons Why Machine Learning can Prove Beneficial for Your Organization.” The reasons include:

  • Segment customer coverage. No, I don’t know what this means either.
  • Accurate business forecasts. No, machine learning systems cannot predict horse races or how a business will do. How about the impact of tariffs or a Fed interest rate change?
  • Improved customer experience. No, experiences are not improving. How do I know? Ask a cashier to make change. Try to get an Amazon professional to explain how to connect a Mac laptop to an Audible account WITHOUT asking, “May I take control of your computer with our software?”
  • Make decisions confidently. Yep, that’s what a decider does in the stable, positive, uplifting work environment of an electronic exchange when a bug costs millions in two milliseconds.
  • Automate your routine tasks. Absolutely. Automation works well. Ask the families of those killed by “intelligence stoked” automobiles or smart systems on a 737 Max.

But there’s a flip side to these cheery “beneficial” outcomes. Navigate to “Machine Learning Systems Are Stuck in a Rut.” We noted these statements. First, a quote from a technical paper:

In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year old benchmarks, but gradually making it harder to explore innovative machine learning research ideas.

Next, this comment by the author of the “Machine Learning Systems Are Stuck in a Rut” article:

The thrust of the argument is that there’s a chain of inter-linked assumptions / dependencies from the hardware all the way to the programming model, and any time you step outside of the mainstream it’s sufficiently hard to get acceptable performance that researchers are discouraged from doing so.

Which is better? Which is correct?

Be a decider either using a black box or the stuff between your ears.

Stephen E Arnold, June 28, 2019

Handy List of Smart Software Leaders

June 27, 2019

As the field of AI grows, it can be difficult to keep track of the significant players. Datamation shares a useful list in “Top 45 Artificial Intelligence Companies.” If you skim the lineup, keep in mind that entries are not ranked in any way; they are simply listed in alphabetical order. Writer Andy Patrizio begins with some observations about the industry:

“AI is driving significant investment from venture capitalist firms, giant firms like Microsoft and Google, academic research, and job openings across a multitude of sectors. All of this is documented in the AI Index, produced by Stanford University’s Human-Centered AI Institute. …”

We noted:

“Consulting giant Accenture believes AI has the potential to boost rates of profitability by an average of 38 percentage points and could lead to an economic boost of US$14 trillion in additional gross value added (GVA) by 2035. In Truth, artificial intelligence holds a plethora of possibilities—and risks. ‘It will have a huge economic impact but also change society, and it’s hard to make strong predictions, but clearly job markets will be affected,’ said Yoshua Bengio, a professor at the University of Montreal, and head of the Montreal Institute for Learning Algorithms.”

For their selections, Datamation chose companies of particular note and those that have invested heavily in AI. Many names are ones you would expect to see, like Amazon, Google, IBM, and Microsoft. Others are more specialized—robotics platforms Anki and CloudMinds, for example, or iCarbonX, Tempus, and Zebra Medical Vision for healthcare. Several entries are open source. Check out the article for more.

Cynthia Murrell, June 24, 2019
