Alphabet Google Falls on Its Algorithms

March 24, 2017

Here in Harrod’s Creek, advertising is mostly hand-painted signs nailed to telephone poles in front of trailer parks.

Real Advertising in Big Cities Does This

In the LED-illuminated big cities, people advertise by:

  1. Cooking up some keywords that are used to locate products and services like mesothelioma or cheap tickets
  2. Paying money to the “do no evil” outfit Alphabet Google to put those ads in front of people who are searching (sometimes cluelessly) for a topic related to lung disease or flying to the land of milk and honey for a couple of hundred bucks
  3. Alphabet Google putting the ads in front of humans (or software robots as the case may be) who will click on the displayed message, banner, or video snippet
  4. The GOOG collects the money
  5. The advertiser gets leads
  6. Repeat the process.
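The six steps above can be sketched as a toy keyword auction. This is a minimal illustration under loud assumptions: exact keyword matching, the highest bid wins, and the advertiser pays its bid on each click. The names `Ad`, `run_query`, and `record_click`, and the bid amounts, are hypothetical; this is not how Google's ad system actually works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str
    keyword: str          # step 1: the keyword the advertiser cooked up
    bid_per_click: float  # step 2: what the advertiser pays per click (hypothetical)


def run_query(query: str, ads: list[Ad]) -> Ad | None:
    """Step 3: show the highest-bidding ad whose keyword appears in the query."""
    terms = set(query.lower().split())
    candidates = [ad for ad in ads if ad.keyword in terms]
    return max(candidates, key=lambda ad: ad.bid_per_click, default=None)


def record_click(ad: Ad, ledger: dict[str, float]) -> None:
    """Steps 4 and 5: the platform books the bid; the advertiser gets a lead."""
    ledger[ad.advertiser] = ledger.get(ad.advertiser, 0.0) + ad.bid_per_click


ads = [
    Ad("LawFirmA", "mesothelioma", 40.0),
    Ad("LawFirmB", "mesothelioma", 55.0),
    Ad("TravelCo", "tickets", 1.5),
]
ledger: dict[str, float] = {}

# Step 6 would simply repeat this for every incoming query.
shown = run_query("mesothelioma lawyer", ads)
if shown is not None:  # assume the human (or robot) clicks
    record_click(shown, ledger)
print(shown.advertiser, ledger)  # LawFirmB {'LawFirmB': 55.0}
```

Everything in this loop rests on step 3 picking a relevant ad; the rest is bookkeeping.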

The notion, like digital currencies, is based on trust. Advertisers trust or “believe” that the GOOG’s smart software will recognize that a search for Madrid will require an airplane ticket and maybe a hotel. The GOOG’s smart software consults the ads germane to travel and displays a relevant ad in front of the human (or software robot as the case may be).


What happens when the GOOG’s smart software does everything except the relevance part?

The reaction in the non Sillycon Valley business world is easy to spot. Here are some examples of what happens when the reality of what the GOOG does collides with the faith, trust, and hope of advertisers and other true believers in the gospel of Google:

I could list more stories about this sudden discovery that matching ads to queries is not exactly what some people have believed.


Google, Query Relaxation, and Advertisers

March 23, 2017

Most folks don’t know what a query relaxation process does. Think of a noose around your neck. If someone pulls the noose tight, you elicit a very specific result. If I remove the noose, you can frolic on your mobile device. Now substitute strict Boolean queries for a free text search. The Boolean search pulls the result set tight; that is, you get results in which the indexed words match the Boolean query. If a vendor tosses in semantic expansion which drags in concepts, synonyms, and inputs from other users’ queries, the noose is relaxed. You can breathe again.
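The noose metaphor can be made concrete with a tiny index. The sketch compares a strict Boolean AND query with a relaxed one that expands each term and accepts any overlap. The documents and the expansion table are invented; the table stands in for the synonyms, concepts, and other users' queries a real vendor would use.

```python
from __future__ import annotations

# Hypothetical mini-index: document id -> text
DOCS = {
    "d1": "radial tires for a kia soul",
    "d2": "winter tires on sale",
    "d3": "radial engine history",
    "d4": "wheel and rubber recycling",
}

# Hypothetical expansion table (synonyms, concepts, other users' queries)
EXPANSIONS = {"tires": {"tire", "wheel", "rubber"}, "radial": {"radials"}}


def words(text: str) -> set[str]:
    return set(text.split())


def boolean_and(terms: list[str]) -> list[str]:
    """Noose tight: a document must contain every query term."""
    return sorted(d for d, text in DOCS.items() if set(terms) <= words(text))


def relaxed(terms: list[str]) -> list[str]:
    """Noose loose: expand the terms, then accept any document sharing one word."""
    expanded = set(terms)
    for term in terms:
        expanded |= EXPANSIONS.get(term, set())
    return sorted(d for d, text in DOCS.items() if expanded & words(text))


query = ["radial", "tires"]
print(boolean_and(query))  # ['d1']
print(relaxed(query))      # ['d1', 'd2', 'd3', 'd4']
```

The strict query returns one on-target document; the relaxed query returns everything, including the history of radial aircraft engines.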

Search vendors dependent on advertising control the scope of the result set. Yandex, we noted, is relaxing its queries. The reason? Relaxed queries give an ad matching system more leeway. The idea is that if I search for “Kia Soul 2011 P22545R18” tire, an outfit like Google has to match my query with ads whose buyers told its system they want the keyword “Kia” or “Soul.”

But if the query is relaxed and expansion methods are in play, “Kia” becomes “car”, “vehicle,” “SUV” and “Soul” becomes “auto parts” and maybe “religion.”

Instantly, the ad matching system can go to the advertising pool and start putting more ads into the search results. Some of the ads may be helpful; for example, “auto parts.” Others for a Zen weekend might not be germane to a person looking for a set of radials.
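Here is the Kia Soul example as a minimal sketch. The ad pool and the `match_ads` function are hypothetical; the expansion table simply copies the expansions described above. With relaxation off, only the ad that bought “Kia” or “Soul” matches. With relaxation on, the auto parts ad arrives, and so does the Zen weekend.

```python
from __future__ import annotations

# Hypothetical ad pool: each ad lists the keywords its buyer asked for.
AD_POOL = {
    "Kia dealer service special": {"kia", "soul"},
    "Discount auto parts": {"auto parts", "car"},
    "SUV lease deals": {"suv", "vehicle"},
    "Zen weekend retreat": {"religion"},
    "Cheap tickets to Madrid": {"travel"},
}

# Expansions taken from the example in the text; a real system would
# derive these from synonyms, concepts, and other users' queries.
EXPANSIONS = {
    "kia": {"car", "vehicle", "suv"},
    "soul": {"auto parts", "religion"},
}


def match_ads(query_terms: list[str], relax: bool) -> list[str]:
    terms = {t.lower() for t in query_terms}
    if relax:
        for t in list(terms):  # iterate over a copy while growing the set
            terms |= EXPANSIONS.get(t, set())
    return sorted(ad for ad, wanted in AD_POOL.items() if wanted & terms)


query = ["Kia", "Soul", "2011", "P22545R18", "tire"]
print(match_ads(query, relax=False))
# ['Kia dealer service special']
print(match_ads(query, relax=True))
# ['Discount auto parts', 'Kia dealer service special', 'SUV lease deals', 'Zen weekend retreat']
```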

Pretty boring stuff, right? The problem is that as the number of queries sent from old school desktop computers goes down, the opportunity to use ads goes down too. The fix?

Query expansion. Looser queries, more opportunities to display less and less relevant ads. Who is going to notice? Well, that’s a good question.

Now navigate to “AT&T, Other U.S. Advertisers Quit Google, YouTube over Extremist Videos.” The write up points out:

AT&T, Verizon, Johnson & Johnson and other major U.S. advertisers are pulling hundreds of millions of dollars in business from Google and its video service YouTube despite the Internet giant’s pledge this week to keep offensive and extremist content away from ads. AT&T said that it is halting all ad spending on Google except for search ads. That means AT&T ads will not run on YouTube or two million websites that take part in Google’s ad network.

On the surface, the allegations suggest that Google’s smart software is not smart enough to prevent an ad for a mobile phone company from appearing as a sponsor of a video the advertiser finds offensive. From my point of view, this is an example of what happens when revenue drives query relaxation. With relaxed queries, the advertiser’s message is “close enough” to the results list. Bingo. Google books revenue and the advertiser’s message is displayed.

In the good old days before mobile devices decimated the GoTo.com/Overture.com model, less relaxed queries and ad matching worked reasonably well. Today, relaxed queries are an easy way to generate revenue.

The counter argument is that relaxed queries are what “usage data say searchers want.” Right, that assurance and a dime will buy me what? Not much.

Net net: Buy ads and make sales is a mantra from a time past. Today’s world of search is filled with relaxed queries, less relevant result sets, and less relevant, less context aware ads.

Google will have to figure something out. Relaxed queries and ad matching are now big news and are costing my favorite free online search outfit a lot of money. My suggestion to Google: Relax less. Embrace relevance, precision, and recall.
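Precision and recall have simple definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch computes both for a strict and a relaxed list of ads. The ad names and the relevance judgments are invented for illustration; the point is the trade-off, not the numbers.

```python
from __future__ import annotations


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|"""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: only two tire ads are relevant to the radials query.
relevant = {"tire_ad_1", "tire_ad_2"}
strict = {"tire_ad_1"}
relaxed = {"tire_ad_1", "tire_ad_2", "suv_lease", "zen_retreat",
           "parts_ad", "car_wash", "madrid_tickets", "hotel_deal"}

print(precision_recall(strict, relevant))   # (1.0, 0.5)
print(precision_recall(relaxed, relevant))  # (0.25, 1.0)
```

Relaxing the query buys recall and pays for it in precision: three of every four ads shown are noise.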

Users want an answer to their question. Advertisers want to make sales. Google wants money. Dare I say, “Pick two.”

Stephen E Arnold, March 23, 2017

Cambridge Analytica: Buzz, Buzz, Buzz

March 9, 2017

The idea that software can make sense of information is a powerful one. Many companies tout the capabilities of their business processes, analytical tools, and staff to look at data and get a sense of the future. The vast majority of these firms have tools and methods which provide useful information.

What happens when a person who did not take a course in analytics learns about the strengths and limitations of these systems?

Answer: You get some excitement.

I read “Big Data’s Power Is Terrifying. That Could Be Good News for Democracy.” The main idea, that companies with nifty analytic systems and methods can control life, is magnetic. Lots of folks want to believe that a company’s analyses can have a significant impact on elections, public opinion, and maybe the stock market.

The write up asserts:

Online information already lends itself to manipulation and political abuse, and the age of big data has scarcely begun. In combination with advances in cognitive linguistics and neuroscience, this data could become a powerful tool for changing the electoral decisions we make. Our capacity to resist manipulation is limited.

My view is that one must not confuse the explanations from marketing mavens, alarmists, and those who want to believe that Star Trek is “real” with what today’s systems can do. Firms like Cambridge Analytica and others generate reports. In fact, companies have been using software to figure out what’s what for many years.

What’s interesting is that folks learn about these systems and pick up the worn ball and carry it down field while screaming, “Touchdown.”

Sorry. The systems don’t warrant that type of excitement. Reality is less exciting. Probabilities are useful; they are not reality. But why not carry the ball? It is easier than learning what analytics firms do.

Stephen E Arnold, March 9, 2017

Enterprise Search in the Cloud: Which Service Provider?

March 9, 2017

In the wake of Amazon’s glitch, a number of publications rushed to report on the who, what, where, and why. ZDNet took a different approach in “Which Cloud Will Give You the Biggest Bang for the Buck?” The write up recycled in the best tradition of “real” journalism a report from a vendor named Cloud Spectator. I won’t ask too many questions about sample size, methodology, the meaning assigned to “value,” and statistical validity. I will assume that the information is not Facebook news.

The guts of the write up is this chart, which is impossible to read in this blog post, but the original is reasonably legible:


What this chart reveals about hosting is that the 1&1 system is the big dog. I would point out that the naming of the service is “1+1” in the chart; the “real” name of the company is “1&1”, a real joy to search using free Web search systems.
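Why is “1&1” a joy to search? One plausible reason, offered as an assumption rather than a description of any particular engine: many tokenizers treat punctuation such as “&” and “+” as word separators, so the symbol vanishes before indexing. A naive tokenizer of that kind shows the effect.

```python
import re


def naive_tokenize(text: str) -> list[str]:
    """Keep runs of letters and digits; everything else, including & and +, separates words."""
    return re.findall(r"[a-z0-9]+", text.lower())


for name in ("1&1", "1+1", "1 1", "ANo1"):
    print(f"{name!r} -> {naive_tokenize(name)}")
# '1&1' -> ['1', '1']
# '1+1' -> ['1', '1']
# '1 1' -> ['1', '1']
# 'ANo1' -> ['ano1']
```

Once “1&1” and “1+1” collapse to the same two tokens, the index cannot tell the company’s name from a sum or from any page containing two stray ones.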

Okay, 1&1 was on my radar as a very low cost provider of Web page hosting and other services. Now the company remains a low cost provider and has added a range of new services. Cloud Spectator finds the company A Number One. I was tempted to type ANo1, another keen string to plug into a Web search system.

What interested me was the cluster of outfits which the Cloud Spectator survey pegged as small dogs; for example, Amazon Web Services, the very same outfit that nuked some major Web sites. (Send in a two pizza team, Mr. Bezos.)

Close to Amazon’s lower third ranking was Microsoft Azure. Somehow that seems par for the new Microsoft. Google and the financially challenged Rackspace were in the middle of the pack. (What happened to Rackspace’s love affair with Robert Scoble, recently removed from the Gilmore Gang?)

But the major news for me was that IBM, yep, the owner of the famed and much admired Watson thing, was darn near last. IBM nosed out Dimension Data for the “Also Participated” badge.

Net net: Maybe 1&1 should get more attention. Perhaps the company will change its name to minimize the likelihood of misspellings. Alternatively 1&1 can hire Recode to endlessly repeat that one spells embarrassed with two r’s and two esses.

When it comes to search in the cloud, the question becomes, “How does one deploy enterprise class search and content processing on the 1&1 system?” Good question.

Stephen E Arnold, March 9, 2017

Short Honks: 8 March 2017

March 8, 2017

We have a number of items which reveal great thought and actions in our world of digital information and other whizzy technological doings.

An Imponderable

First up, a quote to note from the New York Times, an outfit which is reinventing itself to be digital. I wonder if anyone recalls Jeff Pemberton and his Times Online notion from the 1980s. My hunch: Nah. Here’s the quote from the March 7, 2017 dead tree edition, ScienceTimes section, page D3 under the heading “Activists Rush to Find Dark Data under Threat”:

If they [the US government] are going to delete something, how will we even know it is deleted if we did not know it was there?

Yes, another expert in step with the antics of FirstGov.gov now USA.gov. Where are those data? The online version of the story at this link may charge you to view the content. Yes, the digital Gray Lady.

Hewlett Packard: Into Commodities

I don’t know too much about the ins and outs of a big time outfit like Hewlett Packard. I did note that Hewlett Packard is going to buy Nimble Storage for $1 billion. The write up states:

Some analysts, however, wonder if HPE overpaid. “This take-out price seems a little stretched for an asset that was not turning a profit,” Barclays analyst Mark Moskowitz wrote in a note to clients. “Plus, Nimble had been losing competitive momentum as the storage incumbents caught up on flash- and hybrid-based solutions.”

Yep, buying a money losing business which is pitching a commoditized storage method. What happens if HPE pairs its pricey Autonomy technology with Nimble Storage? Interesting hybrid to analyze with HPE’s predictive analytics tools.

Farewell Socl (Pronounced Social)

I don’t think there is a counseling service for disappointed Socl users. You know and use Socl, don’t you? Microsoft has decided to kill its social community. Microsoft is sure its “supportive community of like minded people” will forgive the Softies for this anti-social action. We noted this comment in the Verge:

[Microsoft] launched its own social network more than four years ago.

How time flies when you are fighting Facebook.

Google Buys a Community

The Google is going to be social. I noted that the containerizing outfit has purchased Kaggle. The write up in TechCrunch reported:

[Kaggle] is basically the de facto home for running data science — and machine learning — competitions.

Get those talented coders early and be social about it. You know. Friendly, courteous, team oriented. Take that, Facebook.

Stephen E Arnold, March 8, 2017

ScyllaDB Version 1.3 Available

March 8, 2017

According to Scylla, its latest release is currently the fastest NoSQL database. We learn about the update from SiliconAngle’s article, “ScyllaDB Revamps NoSQL Database in 1.3 Release.” To support its claim, the company points to a performance benchmark test executed by the Yahoo Cloud Serving Benchmark project. That group compared ScyllaDB to the open source Cassandra database, and found Scylla to be 4.6 times faster than a standard Cassandra cluster.

Writer Mike Wheatley elaborates on the product:

ScyllaDB’s biggest differentiator is that it’s compatible with the Apache Cassandra database APIs. As such, the creators claim that ScyllaDB can be used as a drop-in replacement for Cassandra itself, offering users the benefit of improved performance and scale that comes from the integration with a light key/value store.

The company says the new release is geared towards development teams that have struggled with Big Data projects, and claims a number of performance advantages over more traditional development approaches, including:

*10X throughput of baseline Cassandra – more than 1,000,000 CQL operations per second per node

*Sub 1msec 99% latency

*10X per-node storage capacity over Cassandra

*Self-tuning database: zero configuration needed to max out hardware

*Unparalleled high availability, native multi-datacenter awareness

*Drop-in replacement for Cassandra – no additional scripts or code required
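The “Sub 1msec 99% latency” claim most likely refers to 99th percentile latency: 99 percent of requests finish in under a millisecond, while the slowest 1 percent may take longer. The claim is Scylla’s; the sketch only shows how such a percentile is computed, using the nearest-rank method and invented timings.

```python
import math


def percentile(samples_ms: list, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical timings in milliseconds: 990 fast requests and 10 slow ones
samples = [0.4] * 990 + [5.0] * 10
print(percentile(samples, 99))    # 0.4
print(percentile(samples, 99.9))  # 5.0
```

A p99 under one millisecond says nothing about the slowest 1 percent, which is where users tend to notice delays.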

Wheatley cites Scylla’s CTO when he points to better integration with graph databases and improved support for Thrift, Date Tiered Compaction Strategy, Large Partitions, Docker, and CQL tracing. I notice the company is hiring as of this writing. Don’t let the Tel Aviv location of Scylla’s headquarters stop you from applying if you don’t happen to live nearby; they note that their developers can work from anywhere in the world.

Cynthia Murrell, March 8, 2017

Bad Big Data? Get More Data Then

March 2, 2017

I like the idea that more is better. The idea is particularly magnetic when a company cannot figure out what its own, in house, proprietary data mean. Think of the legions of consultants from McKinsey and BCG telling executives what their own data “means.” Toss in the notion of Big Data in a giant “data lake,” and you have decision makers who cannot use the information they already have.

Well, how does one fix that problem? Easy. Get more data. That sounds like a plan, particularly when the professionals struggling are in charge of figuring out if sales and marketing investments sort of pay for themselves.

I learned that I need more data by reading “Deepening The Data Lake: How Second-Party Data Increases AI For Enterprises.” The headline introduces the amazing data lake concept along with two giant lake front developments: More data and artificial intelligence.

Buzzwords? Heck no. Just solid post millennial reasoning; for example:

there are many marketers with surprisingly sparse data, like the food marketer who does not get many website visitors or authenticated customers downloading coupons. Today, those marketers face a situation where they want to use data science to do user scoring and modeling but, because they only have enough of their own data to fill a shallow lake, they have trouble justifying the costs of scaling the approach in a way that moves the sales needle.

I like that sales needle phrase. Marketers have to justify themselves and many have only “sparse” data. I would suggest that marketers have often useless data like the number of unique clicks, but that’s only polluting the data lake.

The fix is interesting. I learned:

we can think of the marketer’s first-party data – media exposure data, email marketing data, website analytics data, etc. – being the water that fills a data lake. That data is pumped into a data management platform (pictured here as a hydroelectric dam), pumped like electricity through ad tech pipes (demand-side platforms, supply-side platforms and ad servers) and finally delivered to places where it is activated (in the town, where people live)… this infrastructure can exist with even a tiny bit of water but, at the end of the cycle, not enough electricity will be generated to create decent outcomes and sustain a data-driven approach to marketing. This is a long way of saying that the data itself, both in quality and quantity, is needed in ever-larger amounts to create the potential for better targeting and analytics.

Yep, more data.

And what about making sense of the additional data? I learned:

The data is also of extremely high provenance, and I would also be able to use that data in my own environment, where I could model it against my first-party data, such as site visitors or mobile IDs I gathered when I sponsored free Wi-Fi at the last Country Music Awards. The ability to gather and license those specific data sets and use them for modeling in a data lake is going to create massive outcomes in my addressable campaigns and give me an edge I cannot get using traditional ad network approaches with third-party segments. Moreover, the flexibility around data capture enables marketers to use highly disparate data sets, combine and normalize them with metadata – and not have to worry about mapping them to a predefined schema. The associative work happens after the query takes place. That means I don’t need a predefined schema in place for that data to become valuable – a way of saying that the inherent observational bias in traditional approaches (“country music fans love mainstream beer, so I’d better capture that”) never hinders the ability to activate against unforeseen insights.
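The passage describes what is usually called schema-on-read: records from disparate sources land in the lake as-is, and fields are reconciled only when a query runs. A minimal sketch, assuming invented sources, field names, and device IDs; nothing here reflects a real data management platform.

```python
from __future__ import annotations

# Raw records from different sources; no common schema is enforced when they are stored.
LAKE = [
    {"source": "site_visits", "mobile_id": "m1", "page": "/coupons"},
    {"source": "wifi_signup", "device": "m1", "event": "free Wi-Fi"},
    {"source": "wifi_signup", "device": "m2", "event": "free Wi-Fi"},
    {"source": "second_party", "maid": "m2", "segment": "grocery shopper"},
]

# Hypothetical metadata: which field holds the device ID in each source.
ID_FIELD = {"site_visits": "mobile_id", "wifi_signup": "device", "second_party": "maid"}


def sources_by_device(records: list[dict]) -> dict[str, set[str]]:
    """Normalize IDs at query time, then group record sources by device."""
    joined: dict[str, set[str]] = {}
    for rec in records:
        field = ID_FIELD.get(rec["source"])
        if field is None or field not in rec:
            continue  # unknown source or missing ID: skip rather than fail
        joined.setdefault(rec[field], set()).add(rec["source"])
    return joined


joined = sources_by_device(LAKE)
print(sorted(joined["m2"]))  # ['second_party', 'wifi_signup']
```

Adding a source means adding one mapping entry rather than migrating stored data. Someone still has to write and maintain that mapping.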

Okay, I think I understand. No wonder companies hire outfits like blue chip consulting firms to figure out what is going on in their companies. Stated another way, insiders live in the swamp. Outsiders can put the swamp into a context and maybe implement some pollution control systems.

Stephen E Arnold, March 2, 2017

Search Like Star Trek: The Next Frontier

February 28, 2017

I enjoy the “next frontier”-type article about search and retrieval. Consider “The Next Frontier of Internet and Search,” a write up in the estimable “real” journalism site Huffington Post. As I read the article, I heard “Scotty, give me more power.” I thought I heard 20 somethings shouting, “Aye, aye, captain.”

The write up told me, “Search is an everyday part of our lives.” Yeah, maybe in some demographics and geo-political areas. In others, search is associated with finding food and water. But I get the idea. The author, Gianpiero Lotito of FacilityLive, is talking about people with computing devices, an interest in information like finding a pizza, and the wherewithal to pay the fees for zip zip connectivity.

And the future? I learned:

The future of search appears to be in the algorithms behind the technology.

I understand algorithms applied to search and content processing. Since humans are expensive beasties, numerical recipes are definitely the go to way to perform many tasks. For textual information, algorithms now handle much of the fact checking, curating, and indexing that humans once did. The math does not work the way some expect when algorithms are applied to images and other rich media. Hey, sorry about that false drop in the face recognition program used by Interpol.

I loved this explanation of keyword search:

The difference among the search types is that: the keyword search only picks out the words that it thinks are relevant; the natural language search is closer to how the human brain processes information; the human language search that we practice is the exact matching between questions and answers as it happens in interactions between human beings.

This is as fascinating as the fake information about Boolean being a probabilistic method. What happened to string matching and good old truncation? The truism about people asking questions is intriguing as well. I wonder how many mobile users ask questions like, “Do manifolds apply to information spaces?” or “What is the chemistry allowing multi-layer ion deposition to take place?”
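For readers who missed the old ways: string matching finds an exact word, and right truncation (often written with an asterisk, as in “comput*”) finds every word beginning with a stem. A minimal sketch with Python regular expressions and a made-up document list; the asterisk convention is common in online search systems but is an assumption here, not any one vendor’s syntax.

```python
import re

DOCS = [
    "computer indexing",
    "computational linguistics",
    "compute clusters",
    "community gardens",
]


def truncation_search(pattern: str, docs: list) -> list:
    """'stem*' matches words starting with stem; anything else is an exact word match."""
    if pattern.endswith("*"):
        regex = re.compile(r"\b" + re.escape(pattern[:-1]) + r"\w*\b")
    else:
        regex = re.compile(r"\b" + re.escape(pattern) + r"\b")
    return [d for d in docs if regex.search(d)]


print(truncation_search("comput*", DOCS))
# ['computer indexing', 'computational linguistics', 'compute clusters']
print(truncation_search("compute", DOCS))
# ['compute clusters']
```

No brain emulation required, and the searcher knows exactly why each hit came back.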

Yeah, right.

The write up drags in the Internet of Things. Talk to one’s Alexa or one’s thermostat via Google Home. That’s sort of natural language; for example, “Alexa, play Elvis.”

Here’s the paragraph I highlighted in NLP crazy red:

Ultimately, what the future holds is unknown, as the amount of time that we spend online increases, and technology becomes an innate part of our lives. It is expected that the desktop versions of search engines that we have become accustomed to will start to copy their mobile counterparts by embracing new methods and techniques like the human language search approach, thus providing accurate results. Fortunately these shifts are already being witnessed within the business sphere, and we can expect to see them being offered to the rest of society within a number of years, if not sooner.

Okay. No one knows the future. But we do know the past. There is little indication that desktop search will “copy” mobile search. Desktop search is a bit like digging in an archeological pit on Cyprus: Fun, particularly for the students and maybe a professor or two. For the locals, there often is a different perception of the diggers.

There are shifts in “the business sphere.” Those shifts are toward monopolistic, choice limited solutions. Users of these search systems are unaware of content filtering and lack the training to work around the advertising centric systems.

I will just sit here in Harrod’s Creek and let the future arrive courtesy of a company like FacilityLive, an outfit engaged in changing Internet searching so I can find exactly what I need. Yeah, right.

Stephen E Arnold, February 28, 2017

The Pros and Cons of Human Developed Rules for Indexing Metadata

February 15, 2017

The article on Smartlogic titled “The Future Is Happening Now” puts forth the Semaphore platform as the technology filling the gap between NLP and AI when it comes to conversation. The article posits that in spite of the great strides in AI in the past 20 years, human speech is one area where AI still falls short. The article explains:

The reason for this, according to the article, is that “words often have meaning based on context and the appearance of the letters and words.” It’s not enough to be able to identify a concept represented by a bunch of letters strung together. There are many rules that need to be put in place that affect the meaning of the word; from its placement in a sentence, to grammar and to the words around – all of these things are important.

Advocating human developed rules for indexing is certainly interesting, and the author compares this logic to the process of raising her children to be multi-lingual. Semaphore is a model-driven, rules-based platform that allows users to auto-generate usage rules in order to expand the guidelines for a machine as it learns. The issue here is cost. Indexing large amounts of data is extremely cost-prohibitive, and that is before the maintenance of the rules even becomes part of the equation. In sum, this is a very old school approach to AI that may make many people uncomfortable.
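What does a human developed context rule look like? A minimal sketch, assuming a made-up rule format of (trigger word, nearby cue words, concept); it is not Semaphore’s rule language. It shows the article’s point that placement and neighboring words decide meaning, and it also shows the cost problem: every new sense of every word needs another hand-written rule.

```python
# Hypothetical rules: (trigger word, context cue words, concept to assign)
RULES = [
    ("java", {"code", "compiler", "class"}, "Programming language"),
    ("java", {"island", "indonesia", "jakarta"}, "Geographic place"),
    ("java", {"coffee", "espresso", "brew"}, "Beverage"),
]


def tag(sentence: str, window: int = 3) -> list:
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    concepts = []
    for i, word in enumerate(words):
        # Only nearby words count as context, since placement affects meaning.
        context = set(words[max(0, i - window): i + window + 1])
        for trigger, cues, concept in RULES:
            if word == trigger and cues & context:
                concepts.append(concept)
    return concepts


print(tag("The Java compiler rejected the class."))  # ['Programming language']
print(tag("We flew to Jakarta, on Java."))            # ['Geographic place']
print(tag("Java is a word."))                         # []
```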

Chelsea Kerwin, February 15, 2017

HonkinNews for 14 February 2017 Now Available

February 14, 2017

Want some tax love? HonkinNews explains that you can visit an H&R Block store front and “touch” IBM Watson. Sounds inviting, doesn’t it? You will also learn about the fate of Lexmark’s search and content businesses under the firm’s new ownership. Denmark has appointed an ambassador to Sillycon Valley. Perhaps Apple, Facebook, and Google really are nation states? Google’s cloud wizard has some job advice for the newly terminated. Perhaps dog training collars are a breakthrough for those eager to acquire new skills. Lucid Imagination became Lucidworks. Now the company has positioned itself to deliver Exalead style search based applications. The play did not work too well for Exalead, which wrote the book about SBAs. Will Lucidworks make the me-too strategy pay off for the company’s backers and their tens of millions of dollars? We also catalog the many ways to search using the Pixel phone. Whatever happened to universal search? We reveal where to live if you want easy access to old fashioned book stores. No, it is not Harrod’s Creek, Kentucky. You can view the video at this link.

Kenny Toth, February 14, 2017

