Algolia and Its View of the History of Search: Everyone Has an Opinion

August 11, 2021

Search is similar to love, patriotism, and ethical behavior. Everyone has a different view of the nuances of meaning of a specific utterance. Agree? Let’s assume you cannot define one of these words in a way that satisfies a professor from a mid-tier university teaching a class to 20 college sophomores who signed up for something to do with Western philosophy: Post Existentialism. Imagine your definition. I took such a class, and I truly did not care. I wrote down the craziness the brown-clad PhD provided, got my A, and never gave that stuff a thought. And you, gentle reader, are you prepared to figure out what an icon in an ibabyrainbow chat stream “means”? We captured a stream for one of my lectures to law enforcement in which the streamer says, “I love you.” Yeah, right.

Now we come to “Evolution of Search Engines Architecture – Algolia New Search Architecture Part 1.” The write up explains finding information and its methods through the lens of Algolia, a publicly traded firm. Search, which is not defined, characterizes the level of discourse about findability. The write up explains an early method which permitted a user to query by key words. This worked like a champ as long as the person doing the search knew what words to use, like “nuclear effects modeling.”

The big leap was faster computers and clever post-Verity methods of getting distributed indexes to mostly work. I want to mention that Exalead (which may have had an informing role to play in Algolia’s technical trajectory) was a benchmark system. But, alas, key words are not enough. The Endeca facets were needed. Because humans had to do the facet identification, the race was on to get smart software to do a “good enough” job so old-school commercial database methods could be consigned to a small room in the back of a real search engine outfit.
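For readers who have never poked at an Endeca-style system, the facet idea can be sketched in a few lines. This is my own toy illustration, not any vendor’s code; the documents, field names, and helper functions are all hypothetical. Each document carries metadata fields; the engine counts values per field over the current result set so a user can narrow by, say, format or year.

```python
from collections import Counter

# Toy corpus: each document carries metadata fields used as facets.
docs = [
    {"title": "Reactor shielding", "format": "pdf", "year": 2001},
    {"title": "Blast modeling", "format": "pdf", "year": 2003},
    {"title": "Fallout survey", "format": "html", "year": 2001},
]

def facet_counts(results, field):
    """Count the values of one metadata field across a result set."""
    return Counter(doc[field] for doc in results)

def narrow(results, field, value):
    """Apply a facet selection: keep only documents matching the value."""
    return [doc for doc in results if doc[field] == value]

print(facet_counts(docs, "format"))   # Counter({'pdf': 2, 'html': 1})
print(len(narrow(docs, "year", 2001)))  # 2
```

The expensive part, as the paragraph above notes, is not this counting; it is assigning the metadata in the first place, which humans once did by hand.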

Algolia includes a diagram of the post-Alta Vista, post-Google world. The next big leap was scaling the post-Google world. What’s interesting is that in my experience, most search problems involve processing smaller collections of information containing disparate content types. What’s this mean? When were you able to use a free Web search system or an enterprise search system like Elastic or Yext to retrieve text, audio, video, engineering drawings and their associated parts data, metadata from surveilled employee E2EE messages, and TikTok video résumés or the wildly entertaining puff stuff on LinkedIn? The answer is, and probably will be for the foreseeable future, “No.” And what about real-time data, the content on a sales person’s laptop with the changed product features and customer-specific pricing? Oh, right. Some people forget about that. Remember: I am talking about a “small” content set, not the wild and crazy Internet indexes. Where are those changed files on the Department of Energy Web site? Hmmm.

The fourth part of the “evolution” leaps to keeping cloud-centric, third-party hosted systems chugging along. Have you noticed the latency when using the OpenText cloud system? What about the display of thumbnails on YouTube? What about retrieving a document from a content management system before lunch, only to find that the system reports, “Document not found”? Yeah, but. Okay, yeah but nothing.

The final section of the write up struck me as a knee slapper. Algolia addresses the “current challenges of search.” Okay, and what are these from the Algolia point of view? The main points have to do with using a cloud system to keep the system up and running without trashing response time. That’s okay, but without a definition of search, the fixes like separating search and indexing may not be the architectural solution. One example is processing streams of heterogeneous data in real time. This is a big thing in some circles, and highly specialized systems are needed to “make sense” of what’s rushing into a system. Now means now, not a latency-centric method which has remained largely unchanged for, what, maybe 50 years?
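The “separate search from indexing” fix mentioned above can be illustrated with a toy inverted index in which the write path (indexing) and the read path (querying) never touch the same code. To be clear, this is a minimal sketch of the general idea under my own assumptions, not Algolia’s architecture; every name here is made up.

```python
class TinyIndex:
    def __init__(self):
        self.postings = {}  # term -> set of document ids

    def index(self, doc_id, text):
        """Write path: tokenize and update the postings lists."""
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def search(self, query):
        """Read path: read-only intersection of postings lists."""
        sets = [self.postings.get(t, set()) for t in query.lower().split()]
        return set.intersection(*sets) if sets else set()

idx = TinyIndex()
idx.index(1, "nuclear effects modeling")
idx.index(2, "effects of latency")
print(idx.search("effects modeling"))  # {1}
print(idx.search("effects"))           # {1, 2}
```

Because the two paths are independent, a production system can scale them separately: more query replicas without more indexers, or vice versa. Whether that separation is the right architectural answer is exactly the question the write up leaves open.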

What is my view of “search”? (If you are a believer that today’s search systems work, stop reading.) Here you go:

  1. One must define search; for example, chemical structure search, code search, HTML content search, video search, and so on. Without a definition, explanations are without context and chock full of generalizations.
  2. Search works when the content domain is “small” and clearly defined. A one size fits all content is pretty much craziness, regardless of how much money an IPO’ed or SPAC’ed outfit generates.
  3. The characteristic of the search engines my team and I have tested over the last, what is it now, 40 or 45 years is that whatever system one uses is “good enough.” The academic calculations mean zero when an employee cannot locate the specific item of information needed to deal with a business issue or a student cannot locate a source for a statement about voter fraud. Good enough is state of the art.
  4. The technology of search is like a 1962 Corvette. It is nice to look at but terrible to drive.

Net net: Everyone is a search expert now. Yeah, right. Remember: The name of the game is sustainable revenue, not precision and recall, high value results, or the wild and crazy promise that Google made for “universal search”. Ho ho ho.

Stephen E Arnold, August 11, 2021

Explaining Me-Too, Copycatting, and What Might be Theft

August 10, 2021

Consider TikTok. LinkedIn has found itself behind the video résumé eight ball. The Google, not to be left out of the short video game, is rolling out YouTube Shorts. Despite the hoo-ha output on a podcast with two alpha wizards, the Shorts thing is a lot like TikTok. Instead of China’s watchful eye, the Google just wants to keep advertisers happy, really, really happy. Which is better? Wow, that’s an interesting question when one defines “better.” I don’t know what better means, but you may.

I read “Is Iterated Amplification Really More Powerful Than Imitation?” For me, the write up is a logical differentiation within the adage “If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.” Yep, that smart software has identified actor A and a match for a known bad actor. Works great. Sort of.

However, the duck analogy is of little utility in the work from home, thumb typing world. Thus, enter the phrase iterated amplification. The euphony is notable, but I am a duck kind of person.

The write up explains in a way which evokes memories of my one and only college course in semantics. Who taught this gem? A PhD from somewhere east of Russia who aspired to make the phoneme –er his life’s work. The –er which he was planning to leverage into academic fame was its usage in words like hamburger. Okay, I will have a duckburger with pickles and a dash of mustard. So it is not me-too; it is imitative, iterated amplification. (And the logical end point is similar, if not almost identical, systems. A bit like ducks, maybe?)

I learned in the essay:

iterated amplification is not necessarily the most efficient way to create a more powerful AI, and a human mimic would have the flexibility to choose other techniques. Current AI researchers don’t usually try to increase AI capabilities by iterated amplification, but instead by coming up with new algorithms.

“New algorithms?” I thought today’s smart software recycled code chunks from open sources, friendly smart cloud outfits like Amazon, Google, IBM (excited am I), Microsoft, and others.

The innovation might be interpreted as playing with thresholds and fiddling with the knobs and dials on procedures explained in class or by textbook writers like Peter Norvig. Lectures on YouTube can be helpful too. Maybe what works best is a few smart interns who are given the message: Adjust until we get something we can use.

Duck analogy: Google’s DeepMind has been me-too’ed by other outfits. These outfits have mostly forgotten where the method originated. Upon inspection, the method may have been the outcome of a classroom assignment.

That’s why facial recognition systems and other applications of smart software often generate me too, me too outputs; that is, misidentification is more reliable than identification. With recognition ranging from 30 percent confidence to 90 percent confidence, there’s some room for error. Actually, there’s room for darned crazy errors and fascinating secondary consequences.
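The 30-to-90-percent confidence range mentioned above is where the “room for error” lives: the flagging threshold an operator picks trades completeness against false positives. A toy illustration, with entirely made-up scores and names, shows the mechanics:

```python
# Hypothetical recognizer output: (candidate, confidence score, is actually a match)
scores = [
    ("a", 0.92, True),
    ("b", 0.55, False),
    ("c", 0.34, False),
    ("d", 0.88, True),
    ("e", 0.41, False),
]

def flagged(threshold):
    """Candidates the system would flag at a given confidence threshold."""
    return [name for name, s, _ in scores if s >= threshold]

def false_positives(threshold):
    """Flagged candidates who are not actually matches: misidentifications."""
    return [name for name, s, match in scores if s >= threshold and not match]

print(flagged(0.30))          # ['a', 'b', 'c', 'd', 'e'] -- everyone
print(false_positives(0.30))  # ['b', 'c', 'e'] -- mostly misidentifications
print(false_positives(0.90))  # [] -- but now some real matches are missed too
```

At a 30 percent threshold, misidentifications outnumber identifications in this toy set, which is the point of the paragraph above.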

Just admit that “innovation” is not much different from a duck. And imitation is less costly than doing the original thinking work. Revenue, not bright ideas, is more reliable than cooking up a power-from-the-air scheme.

Stephen E Arnold, August 10, 2021

Another Perturbation of the Intelware Market: Apple Cores Forbidden Fruit

August 6, 2021

It may be tempting for some to view Apple’s decision as the implementation of a classic man-in-the-middle process. If the information in “Apple Plans to Scan US iPhones for Child Abuse Imagery” is correct, the maker of the iPhone has encroached on the intelware service firms’ bailiwick. The paywalled newspaper reports:

Apple intends to install software on American iPhones to scan for child abuse imagery

The approach — dubbed ‘neuralMatch’ — is on the iPhone device, thus providing functionality substantially similar to other intelware vendors’ methods for obtaining data about a user’s actions.

The article concludes:

According to people briefed on the plans, every photo uploaded to iCloud in the US will be given a “safety voucher” saying whether it is suspect or not. Once a certain number of photos are marked as suspect, Apple will enable all the suspect photos to be decrypted and, if apparently illegal, passed on to the relevant authorities.
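The quoted scheme, per-photo “safety vouchers” plus a decryption threshold, can be sketched abstractly. Everything below is hypothetical: the hash values, the threshold number, and the function names are mine, and the real system’s perceptual hashing and cryptography are far more involved than a set lookup.

```python
SUSPECT_HASHES = {"h1", "h9"}   # hypothetical known-bad image hashes
THRESHOLD = 3                   # suspect count required before review unlocks

def voucher(photo_hash):
    """Attach a 'safety voucher' marking the photo suspect or not."""
    return {"hash": photo_hash, "suspect": photo_hash in SUSPECT_HASHES}

def review_triggered(vouchers):
    """Only when enough vouchers are suspect does decryption/review unlock."""
    return sum(v["suspect"] for v in vouchers) >= THRESHOLD

uploads = [voucher(h) for h in ["h1", "h2", "h9", "h1", "h3"]]
print(review_triggered(uploads))  # True: three suspect vouchers meet the threshold
```

The threshold is what keeps a single mismatch from exposing an account; it is also, as the observations below suggest, a knob whose setting is invisible to the phone’s owner.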

Observations:

  1. The idea allows Apple to provide a function likely to be of interest to law enforcement and intelligence professionals; for example, requesting a report about a phone with filtered and flagged data and metadata
  2. Specialized software companies may have an opportunity to refine existing intelware or develop a new category of specialized services to make sense of data about on-phone actions
  3. The proposal, if implemented, would create a PR opportunity for either Apple or its critics to try to leverage
  4. Legal issues about the on-phone filtering and metadata (if any) would add friction to some legal matters.

One question: How similar is this proposed Apple service to the operation of intelware like that allegedly available from the Hacking Team, NSO Group, and other vendors? Another question: Is this monitoring a trial balloon or has the system and method been implemented in test locations; for example, China or an Eastern European country?

Stephen E Arnold, August 6, 2021

Deteching: Not Possible, Muchachos

August 6, 2021

“Don’t become an Enterprise/IT Architect…” contains a small truth and a Brobdingnagian one.

The small truth is, according to the article:

there are two speeds in IT: change is slow, growth is fast(-ish). Even if upper management (and many others, but the focus of this post is directed at the gap between ‘top’ and ‘bottom’) thinks they understand the complexity and effects, in reality, most of the time they have no clue as to the actual scale of the problem…

The idea is that there is a permanent break in the cable linking the suits with the people who have desks littered with USB keys, scraps of paper, and technical flotsam and jetsam.

Now for the Big Boy truth:

The frustration is that it will become harder to explain the ‘top’ what is going on and it will be particularly difficult to convince. This is especially true if that top has no interest in actually paying attention, because then it will be even harder as the first difficult step is to get them to hear you out.

What’s this mean for little problems like the SolarWinds’ misstep? What’s this mean for making informed decisions about cloud versus on premises or hybrid versus cloud, etc.? What’s this mean for making deteriorating systems actually work; for example, monopoly provided services which experience continuous and apparently unfixable flaws?

Big and small appear to be forcing a shift to a detech world; that is, one in which users (people or entities) have no choice but to go back to the methods which can be understood and which work. A good example is a paper calendar, not a zippy do, automated kitchen sink solution which is useless when one of the niggling issues causes problems.

As I said, SolarWinds: A misstep. Cyber security solutions that don’t secure anything. Printing modules which don’t print.

Detech. No choice, muchachos.

Stephen E Arnold, August 6, 2021

Facial Recognition: More Than Faces

July 29, 2021

Facial recognition software is not just for law enforcement anymore. Israel-based firm AnyVision’s clients include retail stores, hospitals, casinos, sports stadiums, and banks. Even schools are using the software to track minors with, it appears, nary a concern for their privacy. We learn this and more from, “This Manual for a Popular Facial Recognition Tool Shows Just How Much the Software Tracks People” at The Markup. Writer Alfred Ng reports that AnyVision’s 2019 user guide reveals the software logs and analyzes all faces that appear on camera, not only those belonging to persons of interest. A representative boasted that, during a week-long pilot program at the Santa Fe Independent School District in Texas, the software logged over 164,000 detections and picked up one student 1100 times.

There are a couple of privacy features built in, but they are not turned on by default. “Privacy Mode” only logs faces of those on a watch list, and “GDPR Mode” blurs non-watch-listed faces on playbacks and downloads. (Of course, what is blurred can be unblurred.) Whether a client uses those options depends on its use case and, importantly, local privacy regulations. Ng observes:

“The growth of facial recognition has raised privacy and civil liberties concerns over the technology’s ability to constantly monitor people and track their movements. In June, the European Data Protection Board and the European Data Protection Supervisor called for a facial recognition ban in public spaces, warning that ‘deploying remote biometric identification in publicly accessible spaces means the end of anonymity in those places.’ Lawmakers, privacy advocates, and civil rights organizations have also pushed against facial recognition because of error rates that disproportionately hurt people of color. A 2018 research paper from Joy Buolamwini and Timnit Gebru highlighted how facial recognition technology from companies like Microsoft and IBM is consistently less accurate in identifying people of color and women. In December 2019, the National Institute of Standards and Technology also found that the majority of facial recognition algorithms exhibit more false positives against people of color. There have been at least three cases of a wrongful arrest of a Black man based on facial recognition.”

Schools that have implemented facial recognition software say it is an effort to prevent school shootings, a laudable goal. However, once in place it is tempting to use it for less urgent matters. Ng reports the Texas City Independent School District has used it to identify one student who was licking a security camera and to have another removed from his sister’s graduation because he had been expelled. As Georgetown University’s Clare Garvie points out:

“The mission creep issue is a real concern when you initially build out a system to find that one person who’s been suspended and is incredibly dangerous, and all of a sudden you’ve enrolled all student photos and can track them wherever they go. You’ve built a system that’s essentially like putting an ankle monitor on all your kids.”

Is this what we really want as a society? Never mind, it is probably a bit late for that discussion.

Cynthia Murrell, July 29, 2021

Google: API Promise, Circa 2021

July 27, 2021

If you are not familiar with “Google Data APIs Client Library (1.41.1),” it is worth a look. You will notice that there is a table of contents of APIs, a number of them now deprecated.

An industrious online search wizard can locate other APIs consigned to the Google bit bin; for example, Transformics’ contributions and the much-loved Orkut (loved at least by some innovating individuals in Brazil and a handful of lawyers).

Fresh from this walk down API Memory Lane, navigate to “How Google Cloud Plans to Kill Its ‘Killed By Google’ Reputation.” The write up reports:

Under the new Google Enterprise APIs policy, the company is making a promise that its services will remain available and stable far into the future….The announcement is clear recognition of widespread feedback from Google Cloud customers and outright derision in several corners of the internet regarding Google’s historic reputation for ending support for its APIs without sufficient notice or foresight. The canonical example was probably the company’s decision to shutter Google Reader in 2013 with just a couple of months’ notice, which led to a torrent of criticism that persists today.

Google doesn’t want to leave any customer behind. How did that type of assertion work out for “No Child Left Behind”?

The “new” Google wants to be the “real” Google. That’s going to be a hill to climb with the Bezos bulldozer reworking the cloud landscape and Microsoft (the all time champion of great security) leveraging the brilliant individuals trying to use Excel and Word.

Yasmine El Rashidi allegedly said:

When you have a dream and someone makes promises they keep breaking, it is hard to recover. You lose hope.

What’s this mean for the ad-supported Google? Will you promise to give an honest answer and provide factual backup? Yikes, your proof was in disappeared photos on a deprecated Google service. Thus, whatever you wish to say is meaningless at this time. Maybe there is a copy on Google’s never-forget subsystem?

Stephen E Arnold, July 27, 2021

NSO Group: The Rip in the Fabric of Intelware

July 22, 2021

A contentious relationship with the “real news” organizations can be risky. I have worked at a major newspaper and a major publisher. The tenacity of some of my former colleagues is comparable to the grit one associates with an Army Ranger or Navy Seal, just with a slightly more sensitive wrapper. Journalists favored semi-with-it clothes, not bushy beards. The editorial team was more comfortable with laptops than an FN SCAR.

Communications associated with NSO Group — the headline magnet among the dozens of Israel-based specialized software companies (a very close-in group, by the way) — may have torn the fabric shrouding the relationship among former colleagues in the military, government agencies, their customers, and their targets.

Who’s to blame? The media? Maybe. I don’t have a dog in this particular season of fights. The action promises to be interesting and potentially devastating to some comfortable business models. NSO Group is just one of many firms working to capture the money associated with cyber intelligence and cyber security. The spat between the likes of journalists at the Guardian and the Washington Post and NSO Group appears to be diffusing like spilled ink on a camouflage jacket.

I noted “Pegasus Spyware Seller: Blame Our Customers Not Us for Hacking.” The main point seems to be that NSO Group allegedly suggests that those entities licensing the NSO Group specialized software are responsible for their use of the software. The write up reports:

But a company spokesman told BBC News: “Firstly, we don’t have servers in Cyprus.

“And secondly, we don’t have any data of our customers in our possession.

“And more than that, the customers are not related to each other, as each customer is separate.

“So there should not be a list like this at all anywhere.”

And the number of potential targets did not reflect the way Pegasus worked.

“It’s an insane number,” the spokesman said.

“Our customers have an average of 100 targets a year.

“Since the beginning of the company, we didn’t have 50,000 targets total.”

For me, the question becomes, “What controls exist within the Pegasus system to manage the usage of the surveillance system?” If there are controls, why are these not monitored by an appropriate entity; for example, an oversight agency within Israel? If there are no controls, has Pegasus become an “on premises” install set up so that a licensee has a locked down, air tight version of the NSO Group tools?

The second item I noticed was “NSO Says ‘Enough Is Enough,’ Will No Longer Talk to the Press About Damning Reports.” At first glance, I assumed that an inquiry was made by the online news service and the call was not returned. That happens to me several times a day. I am an advocate of my version of cancel culture. I just never call the entity again and move on. I am too old to fiddle with the ego of a younger person who believes that a divine entity has given that individual special privileges. Nope, delete.

But not NSO Group. According to the write up:

“Enough is enough!” a company spokesperson wrote in a statement emailed to news organizations. “In light of the recent planned and well-orchestrated media campaign lead by Forbidden Stories and pushed by special interest groups, and due to the complete disregard of the facts, NSO is announcing it will no longer be responding to media inquiries on this matter and it will not play along with the vicious and slanderous campaign.” NSO has not responded to Motherboard’s repeated requests for comment and for an interview.

Okay, the enough is enough message is allegedly in “writing.” That’s better than a fake message disseminated via TikTok. However, the “real journalists” are likely to become more persistent. Despite a lack of familiarity with the specialized software sector, a large number of history majors and liberal arts grads can do what “real” intelligence analysts do. Believe me, there’s quite a bit of open source information about the cozy relationship within and among Israel’s specialized software sector, the interaction of these firms with certain government entities, and public messages parked in unlikely open source Web sites to keep the “real” journalists learning, writing, and probing.

In my opinion, allowing specialized software services to become public; that is, actually talking about the capabilities of surveillance and intercept systems, was a very, very bad idea. But money is money and sales are sales. Incentive schemes for the owners of specialized software companies guarantee that I can spend eight hours a day watching free webinars that explain the ins and outs of specialized software systems. I won’t, but some of the now-ignited flames of “real” journalism will. They will learn almost exactly what is presented in classified settings. Why? Capabilities, when explained in public and secret forums, use almost the same slide decks, the same words, and the same case examples, which vary in level of detail presented. This is how marketing works, in my opinion.

Observations:

  1. A PR disaster is, it appears, becoming a significant political issue. This may pose some interesting challenges within the Israel-centric specialized software sector. NSO Group’s system ran on cloud services like Amazon’s until AWS allegedly pushed Pegasus out of the Bezos stable.
  2. A breaker of the specialized software business model of selling to governments and companies. The cost of developing, enhancing, and operating most specialized software systems keeps companies on the knife edge of solvency. The push into commercial use of the tools by companies or consumerizing the reports means government contracts will become more important if the non-governmental work is cut off. Does the world need several dozen Dark Web indexing outfits and smart timeline and entity tools? Nope.
  3. A boost to bad actors. The reporting in the last week or so has provided a detailed road map to bad actors in some countries about [a] what can be done, [b] how systems like Pegasus operate, [c] the inherent lack of security in systems and devices charmingly labeled “insecure by design” by a certain big software company, and [d] specific pointers to the existence of zero-day opportunities in blast-door-protected devices. That’s a hoot at the “Console.”

Net net: The NSO Group “matter” is a very significant milestone in the journey of specialized software companies. The reports from the front lines will be fascinating. I anticipate excitement in Belgium, France, Germany, Israel, the United Kingdom, and a number of other countries. Maybe a specialized software Covid Delta?

Stephen E Arnold, July 22, 2021

Online Anonymity: Maybe a Less Than Stellar Idea

July 20, 2021

On one hand, there is a veritable industrial revolution in identifying, tracking, and pinpointing online users. On the other hand, there is the confection of online anonymity. The idea is that by obfuscation, using a fake name, or hijacking an account set up for one’s 75-year-old spinster aunt, a person can be anonymous. And what fun some can have when their online actions are obfuscated by cleverness, Tor cartwheels, or more sophisticated methods using free email and “trial” cloud accounts. I am not a big fan of online anonymity for three reasons:

  1. Online makes it easy for a person to listen to one’s internal demons’ chatter and do incredibly inappropriate things. Anonymity and online, in my opinion, are a bit like reverting to 11-year-old thinking, often with an adult’s suppressed perceptions and assumptions about what’s okay and what’s not okay.
  2. Having a verified identity linked to an online action imposes social constraints. The method may not be the same as a small town watching the actions of frisky teens and intervening or telling a parent at the grocery that their progeny was making life tough for the small kid with glasses who was studying Lepidoptera.
  3. Individuals doing inappropriate things are often exposed, discovered, or revealed by friends, spouses angry about a failure to take out the garbage, or a small investigative team trying to figure out who spray painted the doors of a religious institution.

When I read “Abolishing Online Anonymity Won’t Tackle the Underlying Problems of Racist Abuse,” I agreed. The write up states:

There is an argument that by forcing people to reveal themselves publicly, or giving the platforms access to their identities, they will be “held accountable” for what they write and say on the internet. Though the intentions behind this are understandable, I believe that ID verification proposals are shortsighted. They will give more power to tech companies who already don’t do enough to enforce their existing community guidelines to protect vulnerable users, and, crucially, do little to address the underlying issues that render racial harassment and abuse so ubiquitous.

The observation is on the money.

I would push back a little. Limiting online use to those who verify their identity may curtail some of the crazier behaviors online. At this time, fractious behavior is the norm. Continuous erosion of cultural norms, common courtesies, and routine interactions is destructive.

My thought is that changing the anonymity to real identity might curtail some of the behavior online systems enable.

Stephen E Arnold, July 20, 2021

Amazon: Its True Classiness Shines Bright

July 16, 2021

Richard Branson, the UK winner from the land of those who cannot kick straight, flew to the edge of space. If Mr. Branson had gotten into real space, I assume he could have swapped the Hubble telescope’s computer and made the darned thing work. He did not get into “real” space, and ever-classy Amazon apparently suggested that Mr. Branson’s well-publicized ride “didn’t go high enough to make it to what most people consider space.” Hey, this is not my quote. I pulled it from the Metro UK online information service, a super high quality source of “real” news. For the original diss, click here.

The classy Amazon-linked observation points out:

[Amazon] claims that its own spacecraft, New Shepard, makes it above a well-defined boundary between Earth and outer space known as the Kármán Line.

I prefer the Hubble fix idea myself.

Equally helpful as this space flight observation is the information in “Amazon Gets Waiver from FCC to Monitor Sleep with a Radar Sensor.” First, I like the idea of a waiver because it makes clear that rules are made to be Elastic, just like Amazon’s cloud in India. (Bummer that it went down for a couple of hours on or about July 12, 2021, US time.) Second, the new sleep tracking device may allow some sysadmins to monitor Mr. Bezos’ sleep (maybe snapping some pix, capturing vids, and recording some spoken words) in the post-Branson era. Third, this is a classy idea: more home surveillance, but for a really good reason. On the other hand, maybe Amazon just wants to fatten up the data gathered by its different surveillance devices helpfully connected to Sidewalk. Who doesn’t love Sidewalks?

My take away from these two news items is that rich people and big companies are like medieval kings. Different rules apply. But the outcome is the same. True classiness and the ability to do pretty much what the billionaires want. As I said, classy.

Stephen E Arnold, July 16, 2021

News Flash! Security Measures Only Work if Actually Implemented

July 14, 2021

Best practices are there for a reason, but it seems many companies are not following them. According to TechRadar, “Ransomware Is Not Out of Control, Security Teams Are.” Reporter Mayank Sharma interviewed Optiv Security VP and former FBI information and technology official James Turgal, who puts the blame for recent ransomware attacks squarely on organizations themselves. In answer to a question on the most common missteps that pave the way for ransomware attacks, Turgal answered:

“Every business is different. Some older and more established organizations have networks and infrastructure that have evolved through the years without security being a priority, and IT shops have traditionally just bolted on new technology without properly configuring it and/or decommissioning the old tech. Even startups who begin their lives in the cloud still have some local technology servers or infrastructure that need constant care and feeding. Some of the themes I see, and the most common mistakes made by companies, are:

1. No patch strategy or a strategy that is driven more by concerns over network unavailability and less on actual information assurance and security posture.

2. Not understanding what normal traffic looks like on their networks and/or relying on software tools. Usually too many of them overlap and are misconfigured. The network architecture is the company’s pathway to security or vulnerability with misconfigured tools.

3. Relying too much on backups, and believing that a backup is enough to protect you. Backups that were not segmented from the network, were only designed to provide a method of restoring a point in time, and were never designed to be protected from an attacker. Backups need to be tested regularly to ensure the data is complete and not corrupted.”

Another mistake is focusing so narrowly on new projects, like a move to cloud storage, that vulnerabilities in older equipment are neglected. See the article for more of Turgal’s observations and advice. Surely he would like readers to consider his company’s services, and for some businesses outsourcing cybersecurity to experienced professionals (there or elsewhere) might be a wise choice. Whatever the approach, organizations must keep on top of implementing the most up-to-date security best practices in order to stem the tide of attacks. Better to spend the money now than pay out in Bitcoin later.
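Turgal’s third point, that backups must be tested rather than merely taken, can be illustrated with a minimal checksum verification sketch. This is a restore test in miniature under my own assumptions: real backup verification also covers segmentation, retention, and actual restore drills, and every file name here is a stand-in.

```python
import hashlib
import tempfile

def sha256_of(path):
    """Hash a file in chunks so large backup files don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source, backup):
    """The backup must match the source bit for bit, or it is not a backup."""
    return sha256_of(source) == sha256_of(backup)

# Demo with temporary files standing in for a real backup set.
src = tempfile.NamedTemporaryFile(delete=False); src.write(b"payroll data"); src.close()
good = tempfile.NamedTemporaryFile(delete=False); good.write(b"payroll data"); good.close()
bad = tempfile.NamedTemporaryFile(delete=False); bad.write(b"payroll dat\x00"); bad.close()

print(verify_backup(src.name, good.name))  # True
print(verify_backup(src.name, bad.name))   # False: silent corruption caught
```

The point is the schedule, not the hash function: a check like this run regularly catches the corrupted or incomplete backup before the ransomware does.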

Cynthia Murrell, July 14, 2021
