Backpressure: A Bit of a Problem in Enterprise Search in 2024

March 27, 2024

This essay is the work of a dumb dinobaby. No smart software required.

I have noticed numerous references to search and retrieval in the last few months. Most of these articles and podcasts focus on making an organization’s data accessible. That’s the same old story told since the days of STAIRS III and other dinobaby artifacts. The gist of these search-related articles is that information is locked up or silo-ized. Combine “artificial intelligence,” “open source” software, and powerful computing resources, and the problem is solved.


A modern enterprise search content processing system struggles to keep pace with the changes to already processed content (the deltas) and the flow of new content in a wide range of file types and formats. Thanks, MSFT Copilot. It seems you have learned from your experience with Fast Search & Transfer file indexing.

The 2019 essay “Backpressure Explained — The Resisted Flow of Data Through Software” is pertinent in 2024. The essay, written by Jay Phelps, states:

The purpose of software is to take input data and turn it into some desired output data. That output data might be JSON from an API, it might be HTML for a webpage, or the pixels displayed on your monitor. Backpressure is when the progress of turning that input to output is resisted in some way. In most cases that resistance is computational speed — trouble computing the output as fast as the input comes in — so that’s by far the easiest way to look at it.

Mr. Phelps identifies several types of backpressure. These are:

  1. More info to be processed than a system can handle
  2. File read and write speeds cannot keep up with the demand for reading and writing
  3. Communication “pipes” between and among servers are too small, slow, or unstable
  4. A group of hardware and software components cannot move data where it is needed fast enough

I have simplified his more elegantly expressed points. Please consult the original 2019 document for the information I have hip-hopped over.
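The first type of backpressure is easy to see in a few lines of code. What follows is a minimal Python sketch, not anything taken from Mr. Phelps’s essay. It assumes one fast producer (think crawler) feeding one slow consumer (think indexer) through a bounded buffer. The function name `run_pipeline` and its parameters are hypothetical; `consume_delay` is only a stand-in for slow parsing and indexing.

```python
import queue
import threading
import time


def run_pipeline(num_items, max_buffer, consume_delay=0.001):
    """Push num_items from a fast producer to a slow consumer via a bounded buffer.

    queue.Queue(maxsize=max_buffer) makes put() block while the buffer is full,
    so the producer is held to the consumer's pace instead of piling items up
    in memory. That blocking is the backpressure.
    """
    buffer = queue.Queue(maxsize=max_buffer)
    processed = []
    done = object()  # sentinel that tells the consumer to stop

    def consumer():
        while True:
            item = buffer.get()
            if item is done:
                break
            time.sleep(consume_delay)  # hypothetical stand-in for slow indexing work
            processed.append(item)

    worker = threading.Thread(target=consumer)
    worker.start()
    peak = 0
    for i in range(num_items):
        buffer.put(i)  # blocks while max_buffer items are already waiting
        peak = max(peak, buffer.qsize())
    buffer.put(done)
    worker.join()
    return processed, peak


if __name__ == "__main__":
    processed, peak = run_pipeline(50, max_buffer=5)
    print(len(processed), "items processed; peak backlog", peak)
```

With `max_buffer=5`, all 50 items get through and the backlog never exceeds five. Remove the `maxsize` bound and the producer never waits; the backlog simply grows until the consumer catches up or memory runs out. A real content pipeline also has to choose what to do when waiting is not acceptable, such as dropping or spilling work to disk.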

My point is that, amid the chatter about enterprise search and retrieval, a number of situations (use cases to the non-dinobabies) create some interesting issues. Let me highlight these and then wrap up this short essay.

In an enterprise, the following situations exist and are often ignored or dismissed as irrelevant. When people pooh-pooh my observations, it is clear to me that these people [a] have never been subject to a legal discovery process associated with enterprise search fraud and [b] are entitled whiz kids who don’t do much in the quite dirty, messy, “real” world. (I do like the variety in T-shirts and lumberjack shirts, however.)

First, in an enterprise, content changes. These “deltas” are a giant problem. Among the systems I have examined, tested, installed, or advised on, I know of none that has a procedure to identify, in anything close to real time, a change made to a PowerPoint that was presented to a client and then converted into an email confirming a deal, price, or technical feature. In fact, no one may know until an investigator examines the president’s laptop and discovers the “forgotten” information. Even more exciting is when the opposing legal team, reviewing a laptop dump as part of a discovery process, “finds” the sequence of messages and connects the dots. Exciting, right? But “deltas” pose another problem. These modified content objects proliferate like gerbils. One can talk about information governance, but it is just that: talk, meaningless jabber.

Second, the content an employee needs to answer a business question in a timely manner can reside on an employee’s laptop or mobile phone, in a digital notebook, in a Vimeo video or one of those nifty “private” YouTube videos, behind the locked doors and specialized security systems loved by some pharma companies’ research units, in a Word document written in something other than English, etc. Now suppose that content changes. The enterprise search fast talkers skip identifying these documents and indexing them with metadata that pinpoints when the change was made and who made it. Is this important? Some contract issues require this level of information access. Who asks for this stuff? How about a COTR (contracting officer’s technical representative) for a billion-dollar government contract?
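The “deltas” in the first and second points can be made concrete. Below is a minimal Python sketch, assuming a crawler that can re-read each document and a source system that reports who last modified it. `DeltaTracker`, `ChangeRecord`, and the parameter names are hypothetical. The tracker keeps a SHA-256 fingerprint per document and, whenever the fingerprint changes, logs when the change happened and who made it.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ChangeRecord:
    doc_id: str
    old_hash: Optional[str]  # None the first time a document is seen
    new_hash: str
    changed_at: datetime
    changed_by: str  # assumed to come from the source system's "modified by" field


class DeltaTracker:
    """Keeps one content fingerprint per document and logs each observed change."""

    def __init__(self) -> None:
        self._fingerprints: dict[str, str] = {}
        self.log: list[ChangeRecord] = []

    def observe(self, doc_id: str, content: bytes, modified_by: str,
                modified_at: Optional[datetime] = None) -> bool:
        """Return True, and log a ChangeRecord, if content differs from the last version seen."""
        new_hash = hashlib.sha256(content).hexdigest()
        old_hash = self._fingerprints.get(doc_id)
        if old_hash == new_hash:
            return False  # unchanged, so nothing to re-index
        self._fingerprints[doc_id] = new_hash
        self.log.append(ChangeRecord(
            doc_id=doc_id,
            old_hash=old_hash,
            new_hash=new_hash,
            # fall back to crawl time when the source supplies no timestamp
            changed_at=modified_at or datetime.now(timezone.utc),
            changed_by=modified_by,
        ))
        return True


tracker = DeltaTracker()
tracker.observe("deal-deck.pptx", b"Price: $100", modified_by="alice")  # True: first sighting
tracker.observe("deal-deck.pptx", b"Price: $100", modified_by="alice")  # False: no change
tracker.observe("deal-deck.pptx", b"Price: $90", modified_by="bob")     # True: a delta
```

Even this toy exposes the gap. A fingerprint says that a document changed, not what changed, and the tracker sees only content a crawler can reach. The file on the president’s laptop, the message on a phone, and the “private” video never show up in the log at all.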

Third, I have heard and read that modern enterprise search systems “use,” “apply,” or “operate within” industry-standard authentication systems. Sure they do, within very narrowly defined situations. If the authorization system does not work, then quite problematic things happen. At one end of the range, an employee fails to find the information needed and makes a really bad decision. Alternatively, the employee goes on an Easter egg hunt which may or may not work, but if the egg found is good enough, then that’s what gets used. What happens? Bad things can happen. Have you ridden in an old Pinto? Access control is a tough problem, and it costs money to solve. Enterprise search solutions, even the whiz-bang, cloud-centric distributed systems, implement something, and that something is often not the “right” thing.
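One common way search systems apply access control is query-time filtering, sometimes called security trimming: each indexed document carries a copy of its access control list, and hits the user may not see are dropped before results are shown. The Python below is a minimal sketch under that assumption. `Doc`, `search`, and the group names are hypothetical, and the keyword match is deliberately naive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    text: str
    allowed_groups: frozenset  # ACL copied from the source system at index time


def search(index, query, user_groups):
    """Naive keyword search followed by query-time security trimming."""
    terms = query.lower().split()
    hits = [d for d in index if all(t in d.text.lower() for t in terms)]
    # Keep a hit only if the user shares at least one group with the document's ACL.
    # A document with an empty ACL matches no one, so the filter fails closed.
    return [d for d in hits if d.allowed_groups & set(user_groups)]


index = [
    Doc("a", "Q3 pricing deck", frozenset({"sales"})),
    Doc("b", "pricing research notes", frozenset({"pharma-rnd"})),
    Doc("c", "pricing memo", frozenset()),
]
print([d.doc_id for d in search(index, "pricing", {"sales"})])  # ['a']
```

The filter is only as good as the copied ACL. If permissions change in the source system after indexing, the index enforces stale rules: the employee either misses the document and makes a bad decision or sees something he or she should not.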

Fourth, and I am going to stop here, there is the problem of end-to-end encrypted messaging systems. If you think employees do not use these, I suggest you do a bit of Easter egg hunting. What about the content in those systems? You can tell me, “Our company does not use these.” I say, “Fine. I am a dinobaby, and I don’t have time to talk with you because you are so much more informed than I am.”

Why did I romp through this rather unpleasant issue in enterprise search and retrieval? The answer is, “Enterprise search remains a problematic concept.” I believe there is some litigation underway about how the problem of search can morph into the fantasy of a huge business because “we have a solution.”

Sorry. Not yet. Marketing and closing deals are different from solving findability issues in an enterprise.

Stephen E Arnold, March 27, 2024
