Search for Shopping: Still Room for Improvement

July 7, 2020

Targeted advertising is not the only way retailers can leverage all that personal data users have been forking over. Retail Times reports, “Findologic Announces the Launch of AI-Powered Virtual Shopping Assistant, Lisa.” Lisa, huh? I guess Findologic pays no heed to concerns around “female” virtual assistants. That tangent aside, the AI-powered tool is meant to reduce frustration for online shoppers and, in turn, facilitate more completed sales. Writer Fiona Briggs tells us:

“Lisa returns on-site search based on an individual shopper’s buying intent signals in the context of a broad set of learnt user behaviors. This allows the solution to personalize results for each shopper, delivering more accurate search returns that connect customers to a desired product faster, moving them along the sales funnel and increasing conversion rates. By intelligently applying understanding to on-site search, Lisa helps shoppers better navigate product category or brand searches, which means that, rather than returning hundreds of options, the solution uses skills to refine results to bring shoppers to the exact product they are looking for quicker. Lisa also incorporates machine learning capabilities which allow it to learn and understand a shopper’s preferences and apply them to search, offering up personalized recommendations, which ranks the products the shopper is most likely to choose at the top of the list of results. Lisa also offers up intelligent ways to refine searches for generic keywords, using the application of a skill that then asks the user a set of questions to progress their search based on their individual requirements.”

Findologic’s UK director emphasizes that companies that put effort into getting shoppers to their websites in the first place are let down by traditional, keyword-based search systems, which frustrate some 41% of potential customers. The company is betting that an AI able to understand “intent” will change that. Based in Salzburg, Austria, Findologic was founded in 2008.

Cynthia Murrell, July 7, 2020

Quote to Note: A Father of the Internet and a Googler to Boot

July 7, 2020

DarkCyber spotted this quote from Vint Cerf. I once introduced him at a conference, and he displayed a T-shirt with the message “I TCP on Everything!”

Here’s the Cerf quote from Diginomica:

When you see a phenomenon like the Internet, which is rich in its evolution, new ideas, new applications, it is a very open architecture and invites people to invent new ways of using it. But this introduces new kinds of governance concerns: what we do about misinformation, about malware which is propagating through the network, about someone in one country who is harmed by someone in another. For anyone who is interested in governance, there is simply a wide open space here for hard work and for international agreements, in order to manage this very complex and very rich environment that we call the Internet, and the World Wide Web.

Interesting phrasing. We noted the words misinformation, malware, and governance.

Governance is particularly interesting; for example, what does governance mean in this sentence:

But this introduces new kinds of governance concerns.

Yes, that is true if the quote is accurate. If any company knows anything about governance, I would submit it is the Google.

Stephen E Arnold, July 7, 2020

Will Insurance Companies Tie Rates to Rage?

July 7, 2020

The community-driven navigation app Waze, owned by Google, has refreshed its design. The company changed up the color scheme, logos, icons, and typeface—the sort of tweaks one would expect to keep users engaged. One particular change, however, is more intriguing. Engadget reveals, “Waze Lets Drivers Display their Moods in the App.” That could prove to be very useful information for some advertisers, individuals, and government entities. Writer Christine Fisher reports:

“Waze is also adding something called Moods, a feature that will ‘capture users’ emotions.’ ‘Celebrating the passion and authenticity of its users, Waze hopes that the update will harness the “humanness” that can often be lost within inhumane traffic conditions,’ the company wrote in a press release. It’s unclear if Moods will be shared with nearby Waze users. Letting other drivers know how you feel doesn’t necessarily sound like a great idea, but for the most part the Mood icons look too cute to induce serious road rage. ‘Hopefully our new look reminds users of the magic of our community and the way we work together for better,’ said Jake Shaw, head of creative at Waze.”

The icons are indeed very cute, we’ll give them that, and touting the “magic of community” sounds delightful. But giving away even more personal data seems like a bad idea to those of us who understand how various entities can use seemingly benign personal details. Founded in 2007, Waze is based in the San Francisco Bay area. Google bought the company for $966 million in 2013.

Cynthia Murrell, July 7, 2020

The Cost of Training Smart Software: Is It Rising or Falling?

July 6, 2020

I read “The Cost of AI Training is Improving at 50x the Speed of Moore’s Law: Why It’s Still Early Days for AI.” The article’s main point is that “training” — that is, the cost of making machine learning smart — is declining.

That seems to make sense. First, there are cloud services. Some of these are cheaper than others, but, in general, relying on cloud compute eliminates the capital costs and the “ramp up” costs for creating one’s own infrastructure to train machine learning systems.

Second, use of a machine learning “utility” like Amazon AWS SageMaker or the similar services available from IBM and Google provides two economic benefits:

  1. Tools are available to reduce engineering lift-off and launch time.
  2. Components like SageMaker’s off-the-shelf data bundles eliminate the often-tedious process of finding additional data to use for training.

Third, assumptions about smart software’s efficacy appear to support generalizations about the training, use, and deployment of smart software.

I want to note that there are some research groups who believe that software can learn by itself. If my memory is working this morning, the jazzy way to state this is “sui generis.” Turn the system on, let it operate, and it learns by processing. For smart software, the crude parallel is learning the way humans learn: What’s in the environment becomes the raw material for learning.

The article correctly points out that the number of training models has increased. That is indeed accurate. A model is a numerical recipe set up to produce an output that meets the modeler’s goal. Thus, training a model involves providing data to the numerical recipe, observing the outputs, and then making adjustments. These “tweaks” can be simple and easy; for example, changing a threshold governing a decision (see the sketch below). More complex fixes include, but are not limited to, selecting a different sequence for the individual processes, concatenating models so that multiple outputs inform a decision, and substituting one mathematical component for another. To get a sense of the range of components available to a modeler, take a quick look at Algorithms. This collection is what I would call “ready to run.”
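
Here is a minimal, hypothetical sketch of that simplest tweak: a scoring model whose decision threshold is moved after the modeler observes the outputs. The scores, labels, and threshold values are invented for illustration only.

```python
# Hypothetical illustration: the simplest model "tweak" is moving a
# decision threshold after observing outputs against desired labels.
scores = [0.91, 0.42, 0.77, 0.13, 0.55]  # model outputs for five items
labels = [1, 0, 1, 0, 1]                 # outcomes the modeler wants matched

def accuracy(threshold: float) -> float:
    """Fraction of items classified correctly at a given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Observe the outputs at the current threshold, then adjust.
print(accuracy(0.60))  # 0.8 (the item scoring 0.55 is misclassified)
print(accuracy(0.50))  # 1.0 (lowering the threshold fixes it)
```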

The article includes a number of charts. Each of these presents data supporting the argument that it is getting less costly to train smart software.

I am not certain I agree, although the charts seem to support the argument.

I want to point out that there are some additional costs to consider. A few of these can be “deal breakers” for financial and technical reasons.

Here’s my list of smart software costs. As far as I know, none of these has been the subject of an analyst’s examination and some may be unquantified because those in the business of smart software are not set up to capture them:

  1. Retraining. Anyone with model experience knows that retraining is required. There are numerous reasons, but retraining is often more expensive than the first round of training activities.
  2. Gathering current or more on-point training data. The assumption about training data is that it is useful. We live in the era of so-called big data. Unfortunately, locating on-point data relevant to the retraining task is time consuming and can be complicated, often requiring subject matter experts.
  3. Data normalization. There is a perception that if data are digital, those data can be provided “as is” to a content processing system. That is not entirely accurate. The normalization processes can easily consume as much as 60 percent of available subject matter experts’ and data analysts’ time. (See the sketch after this list.)
  4. Data validation. The era of big data makes possible this generalization: “The volume of data will smooth out any anomalies.” Maybe, but in my experience, the “anomalies” — if not addressed — can easily skew one of the ingredients in the numerical recipe so that the outputs are not reliable. The output may “look” accurate. In real life, the output is not what’s desired. I would refer the reader to the stories about Detroit’s facial recognition system, which is incorrect 96 percent of the time. For reference, see this Ars Technica article.
  5. Downstream costs. Let’s use the Detroit police facial recognition system to illustrate this cost. Answer this question, please: “What are the fully loaded costs for the consequences of the misidentification of a US citizen?”
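
As promised above, here is a small, hypothetical sketch of one tiny normalization chore: the “same” date arrives in four spellings and must be reconciled before training. The formats and values are invented for illustration.

```python
from datetime import datetime

# Hypothetical illustration of "data normalization": digital data rarely
# arrive ready "as is"; even a trivial field like a date needs reconciling.
raw_dates = ["2020-07-06", "07/06/2020", "6 July 2020", "20200706"]

FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%Y%m%d"]

def normalize_date(value: str) -> str:
    """Coerce assorted date spellings into one canonical ISO form."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")

print([normalize_date(d) for d in raw_dates])
# -> ['2020-07-06', '2020-07-06', '2020-07-06', '2020-07-06']
```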

In my view, taking a narrow look at the costs of training smart software is not in the interests of the analyst who benefits from handling investors’ money. Nor are the companies involved in smart software eager to monitor the direct and indirect costs associated with training the models. Finally, it is in no one’s interest to consider the downstream costs of a system which may generate inaccurate outputs.

Net net: In today’s economic environment, ignoring the broader cost picture is a distortion of what it takes to train and retrain smart software.

Stephen E Arnold, July 6, 2020

Context Collapse Masks a Deeper, More Problematic Factor

July 6, 2020

“From Context Collapse to Content Collapse” appeared in January 2020. The author is the high-profile pundit Nicholas Carr. Wikipedia tells me Mr. Carr “originally came to prominence with the 2003 Harvard Business Review article ‘IT Doesn’t Matter.’”

The write up was in my files, and I looked it up after someone asked me if technology was changing the human brain. As it turned out, the person with whom I was speaking was an avid consumer of “real” news and information via the Vox publications, a zippy Silicon Valley type of information engine.

The blog post from January 2020 asserted:

Context collapse remains an important conceptual lens, but what’s becoming clear now is that a very different kind of collapse — content collapse — will be the more consequential legacy of social media. Content collapse, as I define it, is the tendency of social media to blur traditional distinctions among once distinct types of information — distinctions of form, register, sense, and importance. As social media becomes the main conduit for information of all sorts — personal correspondence, news and opinion, entertainment, art, instruction, and on and on — it homogenizes that information as well as our responses to it.

I agree. However, I think there is an important aspect of digital information which is — forgive me, please — presented without context.

Specifically, when digital information flows, it operates in a manner akin to sand in a sandstorm. The abrasive nature of sand erodes and in some cases blasts surfaces. In other cases, a sandstorm in Saudi Arabia can lower the air quality in rural Kentucky.

The points, which are important to my work, are:

  1. Digital information is inherently corrosive; that is, digital information flows do not “build up”; digital information flows wear down. That’s where the Carr phrasing kicks in. The loss of context is a consequence of the nature of digital information flows.
  2. Content is not necessary for digital information to act as an abrasive. The Googley phrase “data exhaust” may be as or in some cases more important than the Instagram posts or TikTok videos. The “exhaust” provides the raw material for information manipulation, disinformation, misinformation, etc.
  3. Eroded structures can change their form and function. They can fall down like the collapse of the middle managers in an “informationized” organization. They can themselves become abrasive particles, a distinction I like to make when thinking about the comments of Facebook’s founder, the data Facebook gathers, and the behaviors regarding access to Facebook data.

For the now long-gone US Office of Technology Assessment, I wrote “The Information Factory.” That monograph looked at Japan’s ambitious plans to become a leader in computing, databases, and other nifty technologies. I think we did the research in the early 1990s.

The point is that in the course of that research, the Japanese thinkers coined some words that I found more useful than some of the Japanese information investments; for instance:

Informationize. This word was used by MITI thinkers to describe what today is called “going digital,” when a company uses new information technology to make its business more efficient.

Making the abstract noun “information” into the verb “to informationize” captured a mind set as well as the technical processes required to achieve the goal.

Mr. Carr’s insight and the question I was asked illustrate that it has taken more than 30 years to come to grips with the deeper implications of the “digital revolution.”

Collapse and loss of context are the visible consequences of flowing digital information. The underlying factor is, therefore, easily overlooked.

That underlying factor means that the train has left the station, and if and when it returns, it will be changed in fundamental ways.

All aboard for the new normal. When the train pulls in if it ever does, the arrival will spark many TikTok videos.

Stephen E Arnold, July 6, 2020

Mathiness: Better Than Hunan Chicken?

July 6, 2020

I am thrilled when one of my math-oriented posts elicits clicks and feedback. Some still care about mathematics. Yippy do.

I read “Why China’s Race for AI Dominance Depends on Math.” The article comes from one of those high-toned online publications of mystical origins and more mythy financial resources.

The main point of the article is that China may care more about numbers than Hunan chicken. I noted this statement:

Dozens of think tank projects and government reports won’t mean anything if Americans can’t maintain mastery over the fundamental mathematics that underpin AI.

The write up disputes the truism “it’s all about the data.” The article stated:

Yet without the right type of math, and those who can creatively develop it, all the data in the world will only take you so far.

Now that’s an observation which undercuts what some might call “collect it all” thinking. The idea is that the nugget is in “there” somewhere. And at some point in time, systems and software will “discover” or “reveal” what a particular person needs to complete a task. That task may be answering a question ranging from “What stock can I buy cheap today to make a lot of money tomorrow?” to “Who helped Robert Maxwell’s extremely interesting daughter hide in New Hampshire?”

Years ago I was on the advisory panel for a company called NuTech Solutions. The founder and a couple of his relatives focused on applying a philosophical concept to predictive methods. The company developed a search system, a method for solving traveling salesperson-type problems, and a number of other common computational chestnuts. The methods ranged from smart software to old-fashioned statistical procedures applied in novel ways.

Tough sell as it turned out. On one call in which I participated, I remember this exchange:

Prospective Customer: Would you tell us how your system works?

President of NuTech: Now I think we will not make a sale.

Prospective Customer: Why is that?

President of NuTech: I have to write down equations, and we need to talk about them.

Yep, for some, math is not about equations. Math is buzzwords. A college medical analytics professor once asked me what I was working on. I replied, “I have been thinking about Hopf fibration.”

Crickets. He changed the subject.

The write up (somewhat gleefully, it seemed to me) stated:

American secondary school and university students are not mastering the fundamental math that prepares them to move into the type of advanced fields, such as statistical theory and differential geometry, that makes AI possible. American fifteen-year-olds scored thirty-fifth in math on the OECD’s 2018 Program for International Student Assessment tests—well below the OECD average. Even at the college level, not having mastered the basics needed for rigorous training in abstract problem solving, American students are often mostly taught to memorize algorithms and insert them when needed.

If true (and I have only anecdotal evidence obtained by watching young people try to make change at Walgreen’s), the “memorize and insert” approach is going to create some crazier stuff than Google selling ads for fast food next to a video about losing weight.

My team and I did a job for the University of Michigan before I retired. The project was to provide an outsider’s view of what could be done to make the university rank higher in math, computer science, and related disciplines. We gathered data; we interviewed; and we did on-site observations. We did many things. One fact jumped out: there were not too many Americans in the advanced classes. Plus, the very best students in the advanced programs did not stick around lovely Michigan. Instead of setting up a business near the university, these folks headed to better weather and a more favorable venture capital climate. Yikes. These are tough problems for a university to fix easily, and maybe impossible to remediate in a significant way. Good news? Yep, I got paid.

The essay grinds forward with its analysis and ends with this statement:

Winning the AI competition begins by acknowledging how poorly we do in attracting and training Americans in math at all levels. Without getting serious about the remedy, the AI race may be lost as clearly as two plus two equals four.

Now think about this article’s message in the context of no code or low code programming, one click output of predictive reports based on real time data flows, or deciding what numerical recipe to plug into a business dashboard for real deciders.

Outstanding work. Those railroad cars in Texas. Just a glitch in the system. The “glitch” may be a poor calculation. Guessing might yield better results in some circumstances. Why? Yikes, the answer requires equations and that’s a deal breaker in some situations. Just use a buzzword.

Stephen E Arnold, July 6, 2020

Smart Software and an Intentional Method to Increase Revenue

July 6, 2020

There is an excellent write up titled “How Researchers Analyzed Allstate’s Car Insurance Algorithm.” My suggestion? Read it.

The “how to” information is detailed and instructive. The article reveals the thought process and logical thinking that allows a giant company with “good hands” to manipulate its revenues.

Here’s the most important statement in the article:

In other words, it appears that Allstate’s algorithm built a “suckers list” that would simply charge the big spenders even higher rates.

The information in the article illustrates how difficult it may be for outsiders to figure out how some smart numerical procedures are assembled into “intentional machines.”

The idea is that data allow the implementation of quite simple big ideas in a slick, automated, obfuscated way.
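
To make that concrete, here is a toy sketch (the rule and numbers are invented for illustration, not Allstate’s actual model) of how a “charge the big spenders more” idea can hide inside an innocuous-looking, automated rate-adjustment function:

```python
# Toy sketch (invented rule and numbers, not the actual Allstate model):
# a "simple big idea" of raising rates most for customers already paying
# the most, wrapped in an ordinary-looking adjustment function.
def adjusted_premium(current_premium: float) -> float:
    """Return next year's premium under a hidden 'suckers list' rule."""
    if current_premium >= 2000:      # big spenders absorb big increases
        return current_premium * 1.10
    elif current_premium >= 1000:    # mid-tier customers get a nudge
        return current_premium * 1.03
    else:                            # price-sensitive customers see almost none
        return current_premium * 1.005

for premium in (600, 1500, 3200):
    print(premium, "->", round(adjusted_premium(premium), 2))
# 600 -> 603.0, 1500 -> 1545.0, 3200 -> 3520.0
```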

As my cranky grandfather observed, “It all comes down to money.”

Stephen E Arnold, July 6, 2020

Stupid Enterprise Search Promotions

July 6, 2020

Check out these incredibly silly pitches for the same market study about enterprise search:

[Image: a series of near-identical promotional pitches for the same enterprise search market study]

This is an example of search engine optimization gaming the Google Alert system. A ridiculous SEO play and a ridiculous report.

The offending company appears to be:

Advance Market Analytics

Shameful.

Stephen E Arnold, July 6, 2020

Math and Smart Software Ethicality

July 5, 2020

I noted “Mathematical Principle Could Help Unearth Unethical Choices of AI.” The idea is that a numerical recipe runs checks as the smart software developer trains the model or lattice of models. The paper states:

Our suggested ‘Unethical Optimization Principle’ can be used to help regulators, compliance staff, and others to find problematic strategies that might be hidden in large strategy space. Optimization can be expected to choose disproportionately many unethical strategies, an inspection of which should show where problems are likely to arise and thus suggest how the AI search algorithm should be modified to avoid them in the future. The Principle also suggests that it may be necessary to re-think the way AI operates in very large strategy spaces, so that unethical outcomes are explicitly rejected in the optimization/learning process.
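
To illustrate the gist (this is a toy sketch with invented strategies and payoffs, not the paper’s mathematics): an optimizer over a small strategy space simply picks the highest-payoff strategy, so if unethical strategies tend to pay more, unconstrained optimization selects them unless they are explicitly rejected during the search.

```python
# Toy sketch of the quoted principle (invented strategies and payoffs):
# an optimizer picks the highest-payoff strategy. If unethical strategies
# tend to pay more, unconstrained optimization chooses them, unless the
# search explicitly rejects them.
strategies = {
    "honest_pricing":       {"payoff": 1.00, "unethical": False},
    "opaque_fees":          {"payoff": 1.35, "unethical": True},
    "loyalty_discount":     {"payoff": 1.10, "unethical": False},
    "suckers_list_pricing": {"payoff": 1.60, "unethical": True},
}

def optimize(reject_unethical: bool) -> str:
    """Return the highest-payoff strategy, optionally filtering unethical ones."""
    pool = {
        name: s for name, s in strategies.items()
        if not (reject_unethical and s["unethical"])
    }
    return max(pool, key=lambda name: pool[name]["payoff"])

print(optimize(reject_unethical=False))  # suckers_list_pricing
print(optimize(reject_unethical=True))   # loyalty_discount
```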

Several observations:

First, does the method “work” in the murky world of smart software? Some smart software is designed specifically to generate revenue, and the “training” increases the likelihood that the software will deliver those results; for example, increased ad revenue.

Second, what happens if the developers and subject matter experts ignore the proposed numerical recipe? Answer: The algorithm will perform based on the training it receives. The purpose of the smart algorithm is to deliver what may be, to some, an unethical result.

Third, what if the proposed numerical recipe itself identifies an “ethical” action as “unethical”?

To sum up, interesting idea. Some work may be needed before the cheerleading commences.

Stephen E Arnold, July 5, 2020

Misunderstanding the Google Hidden URL Play

July 4, 2020

I read “Where Am I?” The write up addresses the void in the browser’s address bar. The point is that Google hides urls.

The author addresses the “problem” this way:

Based on the contents of the page, I’m clearly on a NYTimes property, but based on the address bar I’m clearly on google.com. If I click in the address bar I see https://www.google.com/amp/s/www.nytimes.com/2020/05/22/technology/google-antitrust.amp.html.

The write up points out that Google wants the user to click on the “address bar” and then try to figure out who owns the Web page displayed.
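
For the curious, here is a minimal sketch, assuming only the /amp/s/ path convention visible in the quoted URL, that recovers the publisher’s address from a Google AMP cache URL:

```python
from urllib.parse import urlparse

def amp_cache_origin(url: str) -> str:
    """Recover the publisher URL embedded in a Google AMP cache URL.

    Assumes the /amp/s/<publisher-path> convention visible in the quoted
    example; returns the URL unchanged if it does not match.
    """
    parsed = urlparse(url)
    if parsed.netloc == "www.google.com" and parsed.path.startswith("/amp/s/"):
        # The "s/" segment signals HTTPS; the remainder is the publisher's URL.
        return "https://" + parsed.path[len("/amp/s/"):]
    return url

print(amp_cache_origin(
    "https://www.google.com/amp/s/www.nytimes.com/2020/05/22/"
    "technology/google-antitrust.amp.html"
))
# -> https://www.nytimes.com/2020/05/22/technology/google-antitrust.amp.html
```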

Phishing is a popular sport, and it seems that Google’s blank or modified address bar is a giant opaque lake for bad actors.

The author of the write up points out:

Google serves NYTimes’ controlled content on a Google domain.

The write up adds:

In work security trainings and guides on the Internet we are trained to look at the URL bar to help make a decision on whether to trust a site, but the Google AMP Cache requires contradictory assumptions.

Here’s a diagram of Google as the Internet. What’s “in” Google becomes the Internet:

[Image: diagram of Google as the Internet]

Source: Stephen E Arnold, The Google Legacy and Google Version 2, both published by Infonortics (now defunct, like many publishing houses). Users, partners, advertisers, and developers only “know” what Google decides to provide. Blank urls are an overt indication of Google’s “ownership” of the “Internet.” The diagram was first created for an Arnold lecture about Google in 2003.

Several observations:

  1. Google’s apparent objective is to become the gateway to the Internet. This is a variation of its walled garden approach. What you “receive” and “see” is the Internet. Obfuscating urls is one step toward this goal.
  2. The way to “find” certain content is to buy ads. Scrubbing urls for PDFs means that if someone wants content found, there is a road. That road is Google Advertising.
  3. Confusion in a Google service is understood by the happy Googlers. The confusion increases dependence on Google to locate information.

This is what some might characterize as “just business.” DarkCyber’s view is that the Google is creating opportunities for bad actors by making phishing easier than ever.

Hey, how hard is it to create a spoofed page, SEO that puppy, and display it to one of my neighbors’ bridge partners?

Easy, gentle reader. Without ethical control or meaningful guidelines, the Google — in case you have not figured it out — is the Internet.

A blank address bar is just the beginning, too. Think of this control as a form of “independence.” Life is simpler when it is controlled.

Stephen E Arnold, July 4, 2020
