Algolia Pricing
July 3, 2020
Years ago I listened to a wizard from Verity explain that a query should cost the user per cell. Now that struck me as a really stupid idea. Data sets were getting larger. The larger the data set, the more cells even an extremely well crafted narrow query would “touch.” In a world of real time queries and stream processing, the result of the per cell model would be more than just interesting; it would be a deal breaker.
Pricing digital anything has been difficult. In the good old days of the late 1970s and early 1980s, one paid in many different ways — within the same system. The best example of this was the AT&T/British Telecom approach to online data.
Here’s what was involved. I am 77 and working from memory:
- Installation, set up, or preparation fee. This was dependent on factors such as location, distance from a node, etc.
- Base rate; that is, what one paid simply to be connected. This could be an upfront fee or calculated on some measurement which was intentionally almost impossible to audit or verify.
- Service required. Today this would be called bandwidth or connect time. The definition was slippery, but it was a way for the telcos of that era to add a fee.
If a connection went to a data center housing data, then other fees would kick in; for example:
- Hourly fee billed fractionally for the connect time to the database
- Per item fee when extracting data from the database
- A “print” or “type” fee which applied to the format of the data extracted
- A “report” fee because reports required cost recovery for the pre-coded template, query time, formatting, and outputting.
There were other fees, but the most fascinating one was the “threshold fee.” The idea is that one paid for 60 minutes of connect time. When the 61st minute was required, the threshold was crossed, and the billing could go up, often by factors of 2X or more. No warning, of course. And the mechanism for calculating threshold fees was not disclosed to the normal customer. (After I became a contractor to Bell Communications Research, I learned that the threshold fees were determined based on “outside” or exogenous factors. In Bell Head speak this seemed to mean, “This is where we make even more money.”)
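For readers who never saw one of these invoices, here is a minimal sketch in Python of how a threshold fee might have behaved. Every number in it is an assumption made up for illustration: the rate per minute, the per item fee, the 60 minute allotment, and the choice to apply the 2X multiplier only to minutes past the threshold. The real formulas were, as noted, not disclosed, and the function names are hypothetical.

```python
# All figures are invented for illustration; the real tariffs were never disclosed.
RATE_PER_MINUTE_CENTS = 100   # assumed connect rate, billed per minute
PER_ITEM_FEE_CENTS = 25       # assumed fee per record extracted
INCLUDED_MINUTES = 60         # connect time billed at the normal rate
THRESHOLD_MULTIPLIER = 2      # assumed "2X" applied to minutes past the threshold


def connect_charge(minutes: int) -> int:
    """Connect-time charge in cents; minutes past the allotment cost more."""
    normal_minutes = min(minutes, INCLUDED_MINUTES)
    overage_minutes = max(minutes - INCLUDED_MINUTES, 0)
    return (normal_minutes * RATE_PER_MINUTE_CENTS
            + overage_minutes * RATE_PER_MINUTE_CENTS * THRESHOLD_MULTIPLIER)


def invoice(minutes: int, items: int) -> int:
    """Total in cents for one session: connect time plus per-item fees."""
    return connect_charge(minutes) + items * PER_ITEM_FEE_CENTS


if __name__ == "__main__":
    # Compare sessions on either side of the threshold.
    for minutes in (59, 60, 61, 90):
        print(f"{minutes} minutes, 20 items: {invoice(minutes, 20) / 100:.2f} dollars")
```

Under these assumptions the 61st minute costs twice what the 60th did, and the customer learns about it when the invoice arrives. A harsher variant would apply the multiplier to the whole session once the threshold is crossed.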
To sum up, online pricing was a remarkable swamp. Little wonder that outsiders would be baffled at the online invoices generated by the online providers. Exciting, yes. Happy customers, nah. No one at the AT&T/British Telecom type outfits cared about non Bell Heads. No Young Pioneer T shirt? Ho, ho, ho. Pay your bill or we kill your account. Ho ho ho.
Algolia announced a new pricing plan. You can read about it here. The idea is to reduce confusion and be more “customer friendly.” What’s interesting to me is the string of comments on the Hacker News site. You can read these comments at this link.
There’s some back and forth with Algolia participating.
Some of the comments underscore the type of “surprise” that certain types of pricing models spark; for example, from alooPotato:
We (Streak) are in the same boat. Looks like we’d be paying approx half a million dollars a month on their new pricing which would be ~100x more than we are paying now. Haven’t heard from our enterprise rep but starting to get nervous… Sounds like the new pricing is for their ecommerce customers given how much value they provide them, doesn’t seem to make sense anymore for SaaS use cases.
ysavir takes a balanced view; that is, some good, some bad:
Not the GP, but I figure their point is as follows: If I’m running an e-commerce website, I don’t mind pay-per-search since those searches may turn into sales, so the cost is justified. My income scales with search count, and the Algolia price is part of user acquisition costs. If I’m running a SaaS business, the search is a feature for customers who have already paid, so I don’t see any further returns from the search being used. The more a client uses search, the less I’m profiting from having them as a client. They could potentially even cost me money to service them!
The point is that any pricing model — whether the AT&T/British Telecom type pricing “simplification” or a made-up, wacko approach like the IBM J1, J2, J3, etc. approach — is not going to meet the requirements of every customer.
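The contrast ysavir draws can be put in rough numbers. The sketch below, in Python, is a back-of-the-envelope model, not Algolia's actual price list: the price per 1,000 searches, the sales per 1,000 searches, the profit per sale, and the flat subscription are all made-up assumptions, and the function names are hypothetical.

```python
def ecommerce_margin(searches: int, price_per_1k: float,
                     sales_per_1k: float, profit_per_sale: float) -> float:
    """Profit after search fees when searches convert into sales."""
    thousands = searches / 1000
    return thousands * (sales_per_1k * profit_per_sale - price_per_1k)


def saas_margin(searches: int, price_per_1k: float,
                flat_subscription: float) -> float:
    """Profit after search fees when the customer pays one flat fee."""
    return flat_subscription - (searches / 1000) * price_per_1k


def saas_breakeven_searches(price_per_1k: float, flat_subscription: float) -> float:
    """Search volume at which the flat subscription is fully eaten by search fees."""
    return flat_subscription / price_per_1k * 1000


if __name__ == "__main__":
    # Hypothetical figures: 1 dollar per 1,000 searches, a 49 dollar subscription.
    for searches in (10_000, 50_000, 100_000):
        print(searches,
              ecommerce_margin(searches, 1.0, 20, 5.0),
              saas_margin(searches, 1.0, 49.0))
    print("SaaS breakeven:", saas_breakeven_searches(1.0, 49.0))
```

With these invented figures the e-commerce margin grows with every search, while the SaaS margin shrinks and goes negative once a customer passes the breakeven volume. That is the "could potentially even cost me money" case in the comment.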
The modern approach to pricing is to obfuscate and generate opaque variable prices. You can see this model in action by navigating to Amazon and running a query for “mens golf shirt” and then zipping over to AWS and checking out the prices for SageMaker models to drive Athena. Got the difference, gentle reader?
The nifty world of enterprise search has been a wonderland of pricing methods. I flipped through the pricing data files for the three editions of the Enterprise Search Report which I began writing in 2002. Here are some highlights:
- Base fee plus engineering services. Upgrades priced individually.
- Base fee plus fixed price over a period of time.
- Variable elements like the crazy “per cell” idea from the guy who is now the head of Google Search (Oh, yeah!)
- Free if the customer (the US government) licensed other software
- One time charge. Upgrades are easy. Buy another license.
- Free. The vendor is in the business of selling engineering support, training, and custom widgets to make the search system sort of work.
- Whatever can be billed. This is extremely popular because the negotiation process reveals the allocated funds and the search system vendor angles to get as much of the allocated cash as humanly possible.
- Free for the first budget cycle. Then when funds become available, prices are negotiated.
- Custom quote only. NDA required.
Today, life is easier. One can download a free and open source search system, hit the local university for some “interns”, and let ‘er rip. Another alternative is to look for a hosted search service. Blossom.com maybe?
Net net: Pricing has one goal: Generate revenue and lock in for the vendor. That’s one reason why vendors of what I call search centric services are so darned lovable.
Stephen E Arnold, July 3, 2020
Oh, Oh, Somebody Has Blown the Whistle on the Machine Learning Fouls
July 3, 2020
Wonder why smart software is often and quite spectacularly stupid? You can get a partial answer in “On Moving from Statistics to Machine Learning, the Final Stage of Grief.” There’s some mathiness in the write up. However, the author, who tries to stand up to heteroskedastic errors, offers some useful explanations and good descriptions of the short cuts some of the zippy machine learning systems take.
Here’s a passage I found interesting:
As you can imagine, machine learning doesn’t let you side-step the dirty work of specifying your data and models (a.k.a. “feature engineering,” according to data scientists), but it makes it a lot easier to just run things without thinking too hard about how to set it up. In statistics, bad results can be wrong, and being right for bad reasons isn’t acceptable. In machine learning, bad results are wrong if they catastrophically fail to predict the future, and nobody cares much how your crystal ball works, they only care that it works.
Also this statement:
I like showing ridge regression as an example of machine learning because it’s very similar to OLS, but is totally and unabashedly modified for predictive purposes, instead of inferential purposes.
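For those who want to see the OLS versus ridge difference without the mathiness, here is a minimal sketch in plain Python for a single predictor. It is not the code from the cited write up. It assumes the textbook one-feature case with an unpenalized intercept, where the ridge slope is the OLS slope with a penalty added to the denominator. The data points and the penalty value are invented, and the function name is hypothetical.

```python
def fit_line(xs, ys, penalty=0.0):
    """Fit y = intercept + slope * x for one feature.

    penalty == 0 is ordinary least squares. penalty > 0 is ridge regression
    with an unpenalized intercept: the penalty is added to the denominator,
    which shrinks the slope toward zero.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + penalty)
    intercept = y_mean - slope * x_mean
    return intercept, slope


if __name__ == "__main__":
    # Made-up, slightly noisy data that roughly follows y = 2x.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    print("OLS:  ", fit_line(xs, ys))
    print("Ridge:", fit_line(xs, ys, penalty=10.0))
```

The ridge slope comes out smaller than the OLS slope. The estimate is deliberately biased in exchange for steadier predictions, which is the "modified for predictive purposes" point in the quotation.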
One problem is that those individuals who most need to understand why smart software is stupid are likely to struggle to understand this quite helpful explanation.
Math understanding is the problem. That lack of mathiness is why smart software is likely to remain like a very large, eager wet Newfoundland water dog shaking in the kitchen. Yep, the hairy beast is an outlier heteroskedastically speaking, of course.
Stephen E Arnold, July 3, 2020
Another Dust Up: A Consequence of Swisherism?
July 3, 2020
I associate Silicon Valley journalism with the dynamic duo of Swisher and Mossberg. The Walt has retired from the field of battle—almost. Kara Swisher sallies forth. The analytic approach taken by the “I” journalist has had a significant impact on others who want to reveal the gears, levers, and machine oil keeping the Silicon Valley factories running the way their owners and bankers intended.
Hence, Swisherism which I define as:
A critical look at Silicon Valley as a metaphor for the foibles of individuals who perceive themselves as smarter than anyone else, including those not in the room.
A good example of Swisherism’s consequences appears in “Silicon Valley Elite Discuss Journalists Having Too Much Power in Private App.” The write up is like a techno anime fueled with Jolt Cola.
For example:
During a conversation held Wednesday night on the invite-only Clubhouse app—an audio social network popular with venture capitalists and celebrities—entrepreneur Balaji Srinivasan, several Andreessen Horowitz venture capitalists, and, for some reason, television personality Roland Martin spent at least an hour talking about how journalists have too much power to “cancel” people and wondering what they, the titans of Silicon Valley, could do about it.
This is inside baseball given a dramatic twist. Big names (for some I suppose). A country-club app for insiders. An us versus them plot line worthy of Homer. The specter of retribution.
Yikes.
Even more interesting is that the article references a “recording” of what may have been perceived as a private conversation.
There’s nothing to inspire confidence like leaked recordings, right?
There is a sprinkling of foul language. A journalist becomes the target of interest. There is loaded language like “has been harassed and impersonated” to make sure that the reader understands the badness of the situation.
Swisherisms? Sort of, but the spirit is there. The underdog needs some support. Pitch in. Let’s make attitudes “better.” Rah rah.
I particularly like the use of Twitter as a weapon of myth destruction:
Lorenz’s tweet was immediately tweeted about by several Silicon Valley venture capitalists, most notably Srinivasan, who eventually made a seven-tweet thread in which he suggested Lorenz, and journalists like her, are “sociopaths.” That same day, a self-described Taylor Lorenz “parody” Twitter account started retweeting Srinivasan and other tech investors and executives critical of her work. The account’s bio also links to a website, also self-described as parody, which is dedicated to harassing Lorenz. (Twitter told Motherboard it deleted another account for impersonating Lorenz.)
“Lorenz” is the journalist who became the windmill toward which the Silicon Valley elite turned their digital lances.
Net net: Darned exciting. New type of “real” journalism. That’s the Swisherism in bright regalia. Snarkiness, insults, crude talk, and the other oddments of Silicon Valley excitement. No one likes constructive criticism, it seems. Politics, invective, overt and latent hostility, and a “you should do better” leitmotif. Sturm und drang to follow? Absolutely.
Stephen E Arnold, July 3, 2020
IBM Donates Projects to the Cause of Responsible AI
July 3, 2020
The first question arising was, “Was the marketing of Watson responsible?” But why rain on a virtue signaling parade? It is almost the 4th of July in IBM land?
The LF AI Foundation was formed to support open source innovation in artificial intelligence, machine learning, and deep learning. Now IBM has climbed on board, we learn from “IBM Donates ‘Trusted AI’ Projects to Linux Foundation AI” at ZDNet. In a blog post, the company promises these donations will help ensure AI deployments are fair, secure, and trustworthy. They will also facilitate the creation of such software by the open source community under the direction of the Linux Foundation. Journalist Stephanie Condon writes:
“Specifically, IBM is contributing the AI Fairness 360 Toolkit, the Adversarial Robustness 360 Toolbox and the AI Explainability 360 Toolkit. The AI Fairness 360 Toolkit allows developers and data scientists to detect and mitigate unwanted bias in machine learning models and datasets. Along with other resources, it provides around 70 metrics to test for biases and 11 algorithms to mitigate bias in datasets and models. The Adversarial Robustness 360 Toolbox is an open-source library that helps researchers and developers defend deep neural networks from adversarial attacks. Meanwhile, the AI Explainability 360 Toolkit provides a set of algorithms, code, guides, tutorials, and demos to support the interpretability and explainability of machine learning models. The LFAI’s Technical Advisory Committee voted earlier this month to host and incubate the project, and IBM is currently working with them to formally move them under the foundation. IBM joined the LFAI last year and helped established its Trusted AI Committee, which is working towards defining and implementing principles of trust in AI deployments.”
Plus a foundation can deal with any political or legal issues, perhaps? The article notes that governments are taking a serious interest in AI governance. The EU released a white paper on the topic in February, and 14 countries and the EU are teaming up in the Global Partnership on Artificial Intelligence (GPAI). It is about time governing bodies woke up to the effects unchecked AI can have on our communities. Now about the Watson Covid, the avocado festival, and the game show?
Cynthia Murrell, July 3, 2020
Techno-Grousing: A New Analytic Method?
July 3, 2020
Two items snagged my attention as my team and I were finishing the pre-recorded lecture about Amazon policeware for the upcoming National Cyber Crime Conference.
The first is a mostly context free item from a Silicon Valley type “real” news outfit. The article’s title is:
Hany Farid Says a Reckoning Is Coming for Toxic Social Media
The item comes from one of the technology emission centers in the San Francisco / Silicon Valley region: A professor at the University of California, Berkeley.
What’s interesting is that Hany Farid is activating a klaxon that hoots:
In five years, I expect us to have long since reached the boiling point that leads to reining in an almost entirely unregulated technology sector to contend with how technology has been weaponized against individuals, society, and democracy.
Insight? Prediction? Anticipatory avoidance?
After decades of supporting, advocating, and cheerleading technology, now, this moment, is the time to be aware that change is coming. Who is responsible? The candidates are the media, people who disseminate misinformation, and bad actors.
Sounds good. What about educators? Well, not mentioned.
The other item comes from the Jakarta Post. You can find the story at this link. I have learned that mentioning the entity the story discusses results in my blog post being skipped by certain indexing systems. Hey, that’s a surprise, right?
The point of the write up is that a certain social media site is now struggling with increased feistiness among otherwise PR influenced users.
What’s interesting is that suddenly, like the insight du jour from the Berkeley professor, nastiness is determined to be undesirable.
The fix for the social media outfit is simple: Get out of line and you will be blocked from the service. There’s nothing so comforting as hitting the big red cancel button.
Turning battleships quickly can have interesting consequences. The question is, “What if the battleship’s turn has unforeseen consequences?”
Stephen E Arnold, July 3, 2020
MIT and Being Smart
July 3, 2020
When I hear “MIT”, I think Jeffrey Epstein. Sorry. Imprinting at work. I read “MIT Apologizes, Permanently Pulls Offline Huge Dataset That Taught AI Systems to Use Racist, Misogynistic Slurs.” Yep, that’s the MIT which trains smart people today.
The write up reports:
Vinay Prabhu, chief scientist at UnifyID, a privacy startup in Silicon Valley, and Abeba Birhane, a PhD candidate at University College Dublin in Ireland, pored over the MIT database and discovered thousands of images labeled with racist slurs for Black and Asian people, and derogatory terms used to describe women. They revealed their findings in a paper undergoing peer review for the 2021 Workshop on Applications of Computer Vision conference.
Presumably the demise of Mr. Epstein prevented him from scrutinizing the dataset for appropriate candidates.
Error corrected. Apology emitted. Another outstanding example of academic excellence engraved in digital history.
Stephen E Arnold, July 3, 2020
Google and the EU: Bureaucracy Versus Clicks
July 2, 2020
The Google is providing a “free” Web search system. The European Union seems unwilling or unable to understand the logic of providing a “free” service.
“EU Throws New Rule Book at Google, Tech Giants in Competition Search” explains:
Driven in large part by a conclusion that multiple antitrust actions against Google have been ineffectual, the EU’s new strategy aims to lay down ground rules for data-sharing and how digital marketplaces operate.
What’s the EU going to do?
So as US antitrust enforcers prepare yet another possible case against Google, the EU’s Digital Services Act (DSA) could instead force big tech firms to offer smaller rivals access to data on reasonable, standardized and non-discriminatory terms.
Sounds good. The problem may be that Google, like other US technology centric monopolies, operates in a digital environment.
Regulatory authorities operate in a bureaucratic environment. As with the Great Firewall of China, digital information seeps through barriers.
Maybe the regulators should consider other options? Meetings, fines, and white papers are ideal complements to levying fines which appear to have minimal impact.
Like advertisers boycotting Facebook, the digital monopolies continue to accrue clicks and revenue.
After two decades of consistent digital behavior, regulatory methods seem to be consistently ineffective.
Stephen E Arnold, July 2, 2020
Facebook Ad Boycott Risk: The Mark of El Zucko
July 2, 2020
I have a general rule: Those with power are likely to stomp on little people like me. What happens when companies that need access to Facebook users get cute with El Zucko?
Mr. Zuckerberg may not have a sword like El Zorro’s, but he has a digital cattle prod, and he can crank up the voltage.
Moral: A big advertiser better be a heck of a lot bigger than El Zucko, or the advertiser will end up with some memorable Facebook moments. Not all of these love taps with the cattle prod will be “likes.”
The trust outfit published “Facebook Frustrates Advertisers As Boycott over Hate Speech Kicks Off.” The message I carried away from the trust outfit’s “real” news story was that Facebook keeps on being Facebook.
Let’s consider the advertisers’ options:
First, advertisers can route their digital advertising to services which disseminate content on AdF.ly type networks. If you are not familiar with this fine option, check it out. If AdF.ly is a bit too avant garde, there is lovable Alphabet Google YouTube. Ads can appear in interesting contexts. Because the AGY systems are dynamic, one may not know where ads appear. Not to worry, right?
Second, advertisers can run into the arms of those lovable Amazonians. Pitching consulting services on Amazon is tricky, but it is not impossible. Options range from zippy videos for the Twitch.tv consumers to teaming up with a vendor of something and packaging one’s consulting service with the tangible product as an after purchase “training” or “support” option.
Third, advertisers can hunt down the ad sales professionals at print publications. These individuals are easy to spot. Their schedules are vacant like their eyes. Well, maybe that is a haunted look related to fear. Just buy space in ever popular publications like the local newspaper. Alternatively, why not buy double truck ads in the Wall Street Journal and the New York Times? Those must work. IBM ran its “we are in a yellow submarine” ad a few days ago.
Fourth, advertisers can pay search engine optimization experts to pump their message hither and yon using every conceivable type of digital channel available. Everyone loves irrelevant content and links to big company Web sites where emails can be provided and money spent.
Fifth, hang it up. Emulate the businesses which are closing. Blame it on the pandemic, the surge, or whatever.
Net net: Facebook for the foreseeable future has considerable power. El Zucko can keep on doing what he does best; that is, whatever he wants. When he decides to raise ad rates and change the rules of his game, he will. There are ways to implement differential pricing and other types of hair shirt freebies for certain advertisers.
The mark of El Zucko may be a painful burn and a giant Z on an expanse of advertiser skin in the game.
Stephen E Arnold, July 2, 2020
Neeva: To the Rescue?
July 2, 2020
After the 2017 scandal involving YouTube ads, Google’s head of advertising left the company. However, Sridhar Ramaswamy was not finished with search; he promised then to find another way that did not depend on ads. Now we learn from Ars Technica’s article, “Search Engine Startup Asks Users to Be the Customer, not the Product,” that subscription service Neeva is that promised approach. Not only does paying to search through Neeva allow one to avoid ads, the platform vows to respect user privacy, as well.
There are just a couple of fundamental problems. First, will enough users actually pay to search when they are used to Googling for free? Critics suspect most users will opt to accept ads over paying a fee. As for the privacy promise, we already have (ad-supported) privacy-centric search platforms DuckDuckGo and Startpage. Besides, though Neeva’s “Digital Bill of Rights” that dominates the company’s About page sounds nice, the official Privacy Policy linked in the site’s footer prompts doubt. Reporter Jim Salter writes:
“Neeva opens that section by saying it does not share, disclose, or sell your personal information with third parties ‘outside of the necessary cases below’—but those necessary cases include ‘Affiliates,’ with the very brusque statement that Neeva ‘may share personal information with our affiliated companies.’ Although the subsections on both Service Providers and Advertising Partners are hedged with usage limitations, there are no such limits given for data shared with ‘Affiliates.’ The document also provides no concrete definition of who the term ‘Affiliates’ might refer to, or in what context.”
We noted:
“More security-conscious users should also be aware of Neeva’s Data Retention policy, which simply states ‘we store the personal information we receive as described in this Privacy Policy for as long as you use our Services or as necessary to fulfill the purposes for which it was collected… [including pursuit of] legitimate business purposes.’ Given that the data collection may include direct connection to a user’s primary Google or Microsoft email account, this might amount to a truly unsettling volume of personal data—data that is now vulnerable to compromise of Neeva’s services, as well as use or sale (particularly in the case of acquisition or merger) by Neeva itself.”
Neeva is currently in beta testing, but anyone still interested can sign up to be an early tester on the waitlist at the bottom of this blog post. Though Neeva has yet to set a price for its subscription, we’re told it should be under $10 per month.
Cynthia Murrell, July 2, 2020
Google and Winston: Confusing Relationship for Sure
July 2, 2020
Computer glitches happen, even at large companies like Google. The timing of this one, though, looks a little suspicious. The Belfast Telegraph reports, “Google Says Churchill Image Missing Because of Bug in System.” The problem occurred just as the former prime minister’s statue was being walled away to protect it from protesters. Writer Martyn Landi explains:
“Winston Churchill’s image briefly disappeared from Google search results because it was being updated to be more representative of the former prime minister, the tech giant has said. However, that update had been delayed by a bug in Google’s system, the firm said in a statement. It comes after some users complained that Churchill’s image was not appearing in search results for UK prime ministers, although his name was still listed. Culture Secretary Oliver Dowden was among those to express ‘concern’ and said he had spoken to the tech giant over the incident, which occurred during the ongoing debate about Churchill’s statue in Parliament Square, which was boarded up last week.”
Despite the timing, Google insists the snafu had nothing to do with the statue, protestors, or the former prime minister’s alleged racism. The search platform’s Knowledge Graph had been pulling a picture of Churchill from his younger days, which is not the iconic image most of us are familiar with. Googley humans blocked that image from the algorithm, forcing it to choose another one. Between those steps, however, the mysterious “bug” halted the update. Users searching for Churchill received only portrait-free text descriptions. The company stated:
“As a result, Churchill’s entry lacked an image from late April until this weekend, when the issue was brought to our attention and resolved soon after. We apologize again for concerns caused by this issue with Sir Winston Churchill’s Knowledge Graph image. We will be working to address the underlying cause to avoid this type of issue in the future.”
Just what this bug entailed is not revealed. Sounds like a “dog ate my homework” response to us.
Cynthia Murrell, July 2, 2020