Do Not Gamble. Own the Casino. The Google Way?

August 3, 2020

I read “Google’s Top Search Result?” What a surprise? No, not the fact that Google presents Google-centric results at the top of mobile search results. The surprise is that until July 28, 2020, no one knew that Google’s magical algorithmic, math-is-objective, super duper relevance scooper got more Google goodies than any other “content producer.” Amazing.

In the good old days of big desktop anchor computers and monitors, there was screen real estate. Google filled the screen with objective results and, of course, some advertisements.

That was then; this is now. Mobile screens are mostly squint-generators. In order to be seen and generate clicks, the Google has to work overtime.

The challenges include:

  • Traffic, eyeballs, and individuals who will go ga-ga over that which is Googley.
  • Sizzle that will burn the greedy fingertips of competitors who want to be placed front and center.
  • Useful information for consumers. Yep, what Google displays eliminates the need to think. Advertisers who want to be listed on a Google Map? Something can be worked out.

A number of organizations have groused about Google’s magical algorithmic, math-is-objective, super duper relevance scooper.

What’s fascinating is that it has taken two decades for some people to understand the wisdom embedded in the observation, “Own the casino.”

Pretty good advice and someone at the GOOG took it.

Stephen E Arnold, August 3, 2020

Search and Predicting Behavior

August 3, 2020

DarkCyber is interested in predictive analytics. Bayesian and other “statistical methods” are a go-to technique, and they find their way into many of the smart software systems. Developers rarely explain that systems share many features and functions. Marketers, usually kept in the dark like mushrooms, are free to formulate an interesting assertion or two.

I read “Google Searches During Pandemic Hint at Future Increase in Suicide,” and I was not sure about the methodology. Nevertheless, the write up provides some insight into what can be wiggled from Google search data.

Specifically, Columbia University experts have concluded that financial distress is “strongly linked to suicide.”


I learned:

The researchers used an algorithm to analyze Google trends data from March 3, 2019, to April 18, 2020, and identify proportional changes over time in searches for 18 terms related to suicide and known suicide risk factors.

What algorithm?

The method is described this way:

The proportion of queries related to depression was slightly higher than the pre-pandemic period, and moderately higher for panic attack.

Perhaps the researchers looked at the number of searches and noted the increase? So comparing raw numbers? Tenure tracks and grants await! Because that leap between search and future behavior…
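Since the write up does not say what the algorithm is, here is a minimal sketch of one plausible reading: compare each term's share of all queries in a baseline period with its share in a later period. The function name, the terms, and every count are invented for illustration. This is not the researchers' method, and the numbers are not Google Trends data.

```python
def proportional_change(baseline, later):
    """Return the relative change in each term's share of total queries.

    baseline, later: dicts mapping a search term to a query count.
    Terms with no baseline queries are skipped to avoid dividing by zero.
    """
    base_total = sum(baseline.values())
    later_total = sum(later.values())
    changes = {}
    for term, base_count in baseline.items():
        if base_count == 0:
            continue
        base_share = base_count / base_total
        later_share = later.get(term, 0) / later_total
        changes[term] = (later_share - base_share) / base_share
    return changes


# Hypothetical counts, made up for this example.
baseline_counts = {"depression": 500, "panic attack": 200, "other": 9300}
pandemic_counts = {"depression": 550, "panic attack": 300, "other": 9150}

changes = proportional_change(baseline_counts, pandemic_counts)
for term, change in changes.items():
    print(f"{term}: {change:+.1%}")
```

Note what the sketch does not do: nothing in it connects a change in search share to what anyone does next. That leap needs a separate argument.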

Stephen E Arnold, August 3, 2020

Untangling Streaming: Responses to a Huge Web Search Fail

July 22, 2020

More and more users rely on a patchwork of internet streaming services for their video entertainment. Anyone who subscribes to several of these knows the time-wasting tedium of combing through different menus, each with a different UI, just to find something to watch. With even more proprietary streaming services on the horizon, it seems that problem is poised to grow. However, there are at least two apps that provide viable solutions—Reelgood and JustWatch. “These Two Underdog Apps Have Solved Streaming TV’s Biggest Headache,” Fast Company observes. Writer Jared Newman reports:

“Instead of making you bounce between disparate apps, both services can tell you what’s available on practically any streaming service. You can then add movies and shows to a watch list, get more suggestions based on your viewing habits, and even load their apps on your television to use as a centralized streaming menu. Compared to the app overload of most streaming devices, the universal guides offered by JustWatch and Reelgood seem like the ideal way to watch TV in the streaming era.”

Sounds helpful. But why does it take “underdog” apps to do what common sense suggests devices like Roku and Amazon Fire TV should already offer? There are several business reasons, we’re told, like Netflix’s resistance to the aggregation of its content or the fact that streaming services pay for placement on those platforms. As for Reelgood and JustWatch, they each have their own business models. It comes as no surprise that each involves user data. Newman writes:

“JustWatch says that … about 70% of its revenue comes from targeting users with movie trailers based on their viewing habits. For every movie or TV show users click on, JustWatch builds up a taste profile, then separates users into anonymized groups based on what they might like. Movie studios such as Universal and Paramount then give JustWatch a budget to target users with relevant video trailers on sites like Facebook and YouTube. … Reelgood, meanwhile, started from more of a Silicon Valley mindset of building up the product first and finding ways to monetize it later. Sanderson, a former ad product manager at Facebook, initially thought that would take the shape of recommendation-style targeted ads within the service, but lately the company’s been leaning more into selling access to its data.”

See the write-up for more on the business considerations and plans for each of these entities, big and small. There are other notable players in this arena, including TV Time, Simkl, Watchworthy, Wander, and VUniverse. It will be interesting to see where the market, and the technology, go from here.

Cynthia Murrell, July 22, 2020

Google Alerts: Lost in Cyber Space?

July 16, 2020

Check out these headlines from my Google Alert for the phrase “enterprise search”.


The Covid angle is back. Who publishes this type of news? An outfit called Daily Research Chronicles. An outstanding SEO outfit? Maybe?

And how about these high relevance links to my enterprise search alert?


Silicon steel, analog cameras, and dental film.

Sure, the alerts are a free service. Sure, an item every week or three points to something relevant.

But the spoofiness of the service from outfits like Daily Research Chronicles prompts me to ask:

What about those quality and relevance algorithms, dearest Google?

Stephen E Arnold, July 16, 2020

Visual Search Engines Provide Different POV Than the Google

July 15, 2020

Google image search is the standard visual search tool people use. It does not, however, provide the extra kick needed for deeper dives, especially with all the Pinterest results. Tech Funnel explains how visual search engines give businesses an advantage and points out nine great ones in “Popular 9 Visual Search Engines To Know.”

There are many benefits to using visual search. For one, it connects with younger generations, who are used to engaging with images in social media and apps. They are far more likely to purchase an item through these platforms than through a Web site. Visual search also allows people to connect emotionally with a brand more than standard text does, and it is expected to boost revenue as it becomes, along with voice search, the next way people search for items.

Popular visual search engines include Pinterest Lens, which allows users to take photos of items and then find, save, or shop for them. Fashion retailers are already using it, so Pinterest users can find the clothing their models wear. Google Lens is similar to Pinterest Lens, except its applications are more diverse. It can be used for translation and for searching for items, places, people, etc.

Amazon Rekognition, Instagram Shopping, Snapchat Camera Search, and eBay (powered by the Cassini search engine) have visual search engines dedicated to searching and locating items from photos. They each have different aspects, but all perform the same function. Bing appears to be different:

“From the viewpoint of a user, the experience gotten from Bing Visual Search is similar to other various visual search platforms. However, its feature of an extensive developer platform makes it preferable by a lot of developers.

With Bing Visual Search, developers are enabled to instruct the search engine on the particular data people can get from a specific photo. This means that if Bing Visual Search directs an individual to a certain product on your website, the developer has the ability to determine what information should be provided to the visitor.”

CamFind and EasyJet are the most original engines, because they are associated with neither shopping nor Google. CamFind is the first successful mobile visual engine that uses image detection. EasyJet allows people to book flights based on photos, so now you can finally discover where your screen wallpaper is located.

Whitney Grace, July 15, 2020

Search History: Mostly Forgotten and Definitely of Zero Interest to the Smart Software Crowd

July 10, 2020

There’s an interesting, if selective, write up about online information search and retrieval. Navigate to “The Bourne Collection: Online Search Is Older Than You Think.”

An interesting statement appears in the write up:

Founder Roger Summit had been part of Lockheed Missiles and Space Corporation’s mid-1960s Information Sciences Laboratory (1964). He had built his ideas about iterative search—a “dialog” between the user and the computer—into a separate online search division for Lockheed. (This was very different from the “take your best shot” approach of modern search engines, where you generally need to run a new search to refine irrelevant results). Dialog licensed access to leading databases in a variety of fields, which you could search with its powerful tools. While the overall amount of information was far smaller than on the modern web, it was far, far more relevant and better organized.

For the modern online experts, such a quaint, irrelevant, and inefficient concept.

Stephen E Arnold, July 10, 2020

The Myth of Data Federation: Not a New Problem, Not One Easily Solved

July 8, 2020

I read “A Plan to Make Police Data Open Source Started on Reddit.” The main point of this particular article is:

The Police Data Accessibility Project aims to request, download, clean, and standardize public records that right now are overly difficult to find.

Interesting, but I interpreted the Silicon Valley centric write up differently. If you are a marketer of systems which purport to normalize disparate types of data, aggregate them, federate indexes, and make the data accessible, analyzable, retrievable, and bang on dead simple — stop reading now. I don’t want to deal with squeals from vendors about their superior systems.

For the individual reading this sentence, a word of advice. Fasten your seat belt.

Some points to consider when reading the article cited above, listening to a Vimeo “insider” sales pitch, or just doing techno babble with your Spin class pals:

  1. Dealing with disparate data requires time and money as well as NOT ONE but multiple software tools.
  2. Even with a well resourced and technologically adept staff, exceptions require attention. A failure to deal with the stuff in the Exceptions folder can skew the outputs of some Fancy Dan analytic systems. Example: How about that Detroit facial recognition system? Nifty, eh?
  3. The flows of real time data are a big problem — are you ready for this — a challenge to the Facebooks, Googles, and Microsofts of the world. The reason is that the volume of data and CHANGES TO THOSE ALREADY PROCESSED ITEMS OF INFORMATION is a very, very tough problem. No, faster processors, bigger pipes, and zippy SSDs won’t do the job. The trouble lies within, the intradevice and intra software module flow. The fix is to sample, and sampling increases the risk of inaccuracies. Example: Remember Detroit’s facial recognition accuracy. The arrested individual may share some impressions with you.
  4. The baloney about “all” data or “any” type is crazy talk. When one deals with more than 18,000 police forces in the US, outputs from surveillance devices from different vendors, and the geodumps of individuals and their ad tracking beacons, all of this is going to be mashed up and made usable? Noble idea. There are many noble ideas.
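Points 1 and 2 can be made concrete with a toy sketch. It assumes two imaginary agencies that use different field names and date formats; every name, format, and record below is invented. Records that fail normalization go to an exceptions list instead of being dropped without notice.

```python
from datetime import datetime

# Hypothetical field names for two imaginary agencies.
FIELD_MAPS = {
    "agency_a": {"incident_date": "date", "offense": "offense"},
    "agency_b": {"dt": "date", "charge_desc": "offense"},
}
# Hypothetical date formats seen so far.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format; raise ValueError if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize(source, record):
    """Map one raw record onto the shared schema."""
    mapping = FIELD_MAPS[source]  # KeyError for an unknown source
    out = {target: record[raw] for raw, target in mapping.items()}
    out["date"] = parse_date(out["date"])
    out["source"] = source
    return out


def process(batch):
    """Return (clean, exceptions); every input lands in one of the two."""
    clean, exceptions = [], []
    for source, record in batch:
        try:
            clean.append(normalize(source, record))
        except (KeyError, ValueError) as err:
            exceptions.append(
                {"source": source, "record": record, "error": str(err)}
            )
    return clean, exceptions


raw = [
    ("agency_a", {"incident_date": "2020-07-01", "offense": "burglary"}),
    ("agency_b", {"dt": "07/02/2020", "charge_desc": "theft"}),
    ("agency_b", {"dt": "2 July 2020", "charge_desc": "theft"}),  # odd date
    ("agency_c", {"when": "2020-07-03", "what": "fraud"}),  # unknown source
]
clean, exceptions = process(raw)
print(len(clean), "normalized;", len(exceptions), "in the Exceptions folder")
```

Half of this tiny batch ends up in the exceptions list. Someone has to look at those records, and that is where the time and money go.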

Why am I taking the time to repeat what anyone with experience in large scale data normalization and analysis knows?

Baloney can be thinly sliced, smeared with gochujang, and served on Delft plates. Know what? Still baloney.

Gobble this:

Still, data is an important piece of understanding what law enforcement looks like in the US now, and what it could look like in the future. And making that information more accessible, and the stories people tell about policing more transparent, is a first step.

But the killer assumption is that the humans involved don’t make errors, systems remain online, and file formats are forever.

That baloney. It really is incredible. Just not what you think.

Stephen E Arnold, July 8, 2020

Search for Shopping: Still Room for Improvement

July 7, 2020

Targeted advertising is not the only way retailers can leverage all that personal data users have been forking over. Retail Times reports, “Findlogic Announces the Launch of AI-Powered Virtual Shopping Assistant, Lisa.” Lisa, huh? I guess Findlogic pays no heed to concerns around “female” virtual assistants. That tangent aside, the AI-powered tool is meant to reduce frustration for online shoppers and, in turn, facilitate more completed sales. Writer Fiona Briggs tells us:

“Lisa returns on-site search based on an individual shopper’s buying intent signals in the context of a broad set of learnt user behaviors. This allows the solution to personalize results for each shopper, delivering more accurate search returns that connect customers to a desired product faster, moving them along the sales funnel and increasing conversion rates. By intelligently applying understanding to on-site search, Lisa helps shoppers better navigate product category or brand searches, which means that, rather than returning hundreds of options, the solution uses skills to refine results to bring shoppers to the exact product they are looking for quicker. Lisa also incorporates machine learning capabilities which allow it to learn and understand a shopper’s preferences and apply them to search, offering up personalized recommendations, which ranks the products the shopper is most likely to choose at the top of the list of results. Lisa also offers up intelligent ways to refine searches for generic keywords, using the application of a skill that then asks the user a set of questions to progress their search based on their individual requirements.”

Findlogic’s UK director emphasizes that companies that put effort into getting shoppers to their websites in the first place are let down by traditional, keyword-based search systems that frustrate some 41% of potential customers. The company is betting this AI that can understand “intent” will change that. Based in Salzburg, Austria, Findlogic was founded in 2008.

Cynthia Murrell, July 7, 2020

Stupid Enterprise Search Promotions

July 6, 2020

Check out these incredibly silly pitches for the same market study about enterprise search:


This is an example of search engine optimization gaming the Google Alert system. Ridiculous SEO play and a ridiculous report.

The offending company appears to be:

Advance Market Analytics


Stephen E Arnold, July 6, 2020

Algolia Pricing

July 3, 2020

Years ago I listened to a wizard from Verity explain that a query should cost the user per cell. Now that struck me as a really stupid idea. Data sets were getting larger. The larger the data set, the more cells even extremely well crafted narrow queries would “touch.” In a world of real time queries and stream processing, the result of the per cell model would be more than just interesting, it would be a deal breaker.

Pricing digital anything has been difficult. In the good old days of the late 1970s and early 1980s, one paid in many different ways — within the same system. The best example of this was the AT&T/British Telecom approach to online data.

Here’s what was involved. I am 77 and working from memory:

  1. Installation, set up, or preparation fee. This was dependent on factors such as location, distance from a node, etc.
  2. Base rate; that is, what one paid simply to be connected. This could be an upfront fee or calculated on some measurement which was intentionally almost impossible to audit or verify.
  3. Service required. Today this would be called bandwidth or connect time. The definition was slippery, but it was a way for the telcos of that era to add a fee.

If a connection went to a data center housing data, then other fees would kick in; for example:

  1. Hourly fee billed fractionally for the connect time to the database
  2. Per item fee when extracting data from the database
  3. A “print” or “type” fee which applied to the format of the data extracted
  4. A “report” fee because reports required cost recovery for the pre-coded template, query time, formatting, and outputting.

There were other fees, but the most fascinating one was the “threshold fee.” The idea is that one paid for 60 minutes of connect time. When the 61st minute was required, the threshold was crossed, and the billing could go up, often by factors of 2X or more. No warning, of course. And the mechanism for calculating threshold fees was not disclosed to the normal customer. (After I became a contractor to Bell Communications Research, I learned that the threshold fees were determined based on “outside” or exogenous factors. In Bell Head speak this seemed to mean, “This is where we make even more money.”)
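A minimal sketch of one reading of the threshold fee: connect time is billed fractionally, and minutes past the threshold are billed at a multiple of the normal rate. The function name, the hourly rate, and the 2X multiplier are assumptions for illustration. The real mechanism was, as noted, not disclosed.

```python
def connect_charge(minutes, hourly_rate, included_minutes=60,
                   threshold_multiplier=2.0):
    """Charge for connect time, billed fractionally per minute.

    Minutes past `included_minutes` are billed at the normal per-minute
    rate times `threshold_multiplier`. Both defaults are invented.
    """
    per_minute = hourly_rate / 60
    normal_minutes = min(minutes, included_minutes)
    over_minutes = max(minutes - included_minutes, 0)
    return (normal_minutes * per_minute
            + over_minutes * per_minute * threshold_multiplier)


# Hypothetical rate of 90.00 per hour.
print(connect_charge(60, 90.0))  # inside the threshold
print(connect_charge(61, 90.0))  # one minute past it
```

Under this reading only the extra minutes cost more. A harsher reading, in which the whole session is re-rated once the threshold is crossed, would fit the description just as well.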

To sum up, online pricing was a remarkable swamp. Little wonder that outsiders would be baffled at the online invoices generated by the online providers. Exciting, yes. Happy customers, nah. No one at the AT&T/British Telecom type outfits cared about non Bell Heads. No Young Pioneer T shirt? Ho, ho, ho. Pay your bill or we kill your account. Ho ho ho.

Algolia announced a new pricing plan. You can read about it here. The idea is to reduce confusion and be more “customer friendly.” What’s interesting to me is the string of comments on the Hacker News site. You can read these comments at this link.

There’s some back and forth with Algolia participating.

Some of the comments underscore the type of “surprise” that certain types of pricing models spark; for example, from alooPotato:

We (Streak) are in the same boat. Looks like we’d be paying approx half a million dollars a month on their new pricing which would be ~100x more than we are paying now. Haven’t heard from our enterprise rep but starting to get nervous… Sounds like the new pricing is for their ecommerce customers given how much value they provide them, doesn’t seem to make sense anymore for SaaS use cases.

ysavir takes a balanced view; that is, some good, some bad:

Not the GP, but I figure their point is as follows: If I’m running an e-commerce website, I don’t mind pay-per-search since those searches may turn into sales, so the cost is justified. My income scales with search count, and the Algolia price is part of user acquisition costs. If I’m running a SaaS business, the search is a feature for customers who have already paid, so I don’t see any further returns from the search being used. The more a client uses search, the less I’m profiting from having them as a client. They could potentially even cost me money to service them!
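The arithmetic behind ysavir's point can be sketched in a few lines. Every number here is hypothetical (the per-search price, the conversion rate, the subscription fee), and none of it is Algolia's actual pricing. The only thing the sketch shows is the shape of the two cases.

```python
def search_cost(searches, price_per_1000):
    """Pay-per-search bill for a given number of searches."""
    return searches / 1000 * price_per_1000


def ecommerce_margin(searches, price_per_1000, conversion_rate,
                     margin_per_order):
    """Income scales with search count, so cost and revenue grow together."""
    revenue = searches * conversion_rate * margin_per_order
    return revenue - search_cost(searches, price_per_1000)


def saas_margin(searches, price_per_1000, subscription_fee):
    """Income is fixed per customer, so every extra search cuts the margin."""
    return subscription_fee - search_cost(searches, price_per_1000)


# Hypothetical figures for one month.
PRICE = 1.0  # per 1,000 searches
for searches in (10_000, 100_000):
    shop = ecommerce_margin(searches, PRICE, conversion_rate=0.02,
                            margin_per_order=5.0)
    saas = saas_margin(searches, PRICE, subscription_fee=50.0)
    print(f"{searches:>7} searches: shop {shop:>8.2f}, SaaS customer {saas:>7.2f}")
```

With these made-up figures the shop's margin grows with search volume, while the SaaS customer who searches heavily costs more to serve than the subscription brings in.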

The point is that any pricing model — whether the AT&T/British Telecom type pricing “simplification” or a made-up, wacko approach like the IBM J1, J2, J3, etc. approach — is not going to meet the requirements of every customer.

The modern approach to pricing is to obfuscate and generate opaque variable prices. You can see this model in action by navigating to Amazon and running a query for “mens golf shirt” and then zipping over to AWS and checking out the prices for Sagemaker models to drive Athena. Got the difference, gentle reader?

The nifty world of enterprise search has been a wonderland of pricing methods. I flipped through the pricing data files for the three editions of the Enterprise Search Report which I began writing in 2002. Here are some highlights:

  • Base fee plus engineering services. Upgrades priced individually.
  • Base fee plus fixed price over a period of time.
  • Variable elements like the crazy “per cell” idea from the guy who is now the head of Google Search (Oh, yeah!)
  • Free if the customer (the US government) licensed other software
  • One time charge. Upgrades are easy. Buy another license.
  • Free. The vendor is in the business of selling engineering support, training, and custom widgets to make the search system sort of work.
  • Whatever can be billed. This is extremely popular because the negotiation process reveals the allocated funds and the search system vendor angles to get as much of the allocated cash as humanly possible.
  • Free for the first budget cycle. Then when funds become available, prices are negotiated.
  • Custom quote only. NDA required.

Today, life is easier. One can download a free and open source search system, hit the local university for some “interns”, and let ‘er rip. Another alternative is to look for a hosted search service. Maybe?

Net net: Pricing has one goal: Generate revenue and lock in for the vendor. That’s one reason why vendors of what I call search centric services are so darned lovable.

Stephen E Arnold, July 3, 2020
