What Happens When Wild West Innovation Operates without Barbed Wire to Corral the Dogies?

June 1, 2022

I have seen quite a few comments about a facial recognition company called PimEyes. My hunch is that ClearView.ai is happy to see that another magnet for commentary has surfaced. The write up I found interesting was “The Facial Recognition Search Engine Apocalypse Is Coming.” The cited article points to the New York Times’ exposé of a high tech outfit the NYT professionals did not know much about. Dispersing this cloud of unknowing allows for comments about the negatives associated with facial recognition. Security benefits related to entering a building with restricted access? Well, not a factor.

Here’s the comment I found in the cited article that edges closer to a substantive issue:

I’d never heard of PimEyes before, but suspect we’ll be hearing about it more going forward. I also suspect this is a losing game of whack-a-mole. Machine learning is clearly already good enough to do this with spooky accuracy, and it’s only going to get better. Should we try passing legislation to strictly regulate facial recognition search? Sure. But I suspect it’s futile, particularly given the global nature of the internet.

With few consequences and no government-generated guidelines, what will high school science club members do when they leave behind rocket motors and Raspberry Pi spy cameras? These clever lads and lassies do pretty much whatever they want.

Is that a good thing or a bad thing? My hunch is that guidelines, like the wonky sonnet form Willie Shakespeare followed, can improve innovation and possibly spill over into adulting thought about a particular capability. Right now, stampede those cows, partner! It may not be an apocalypse, but those hooves can do some damage to humanoids who are in the path of progress.

Stephen E Arnold, June 1, 2022

France and French: The Language of Diplomacy Says “Non, Non” to Gamer Lingo

May 31, 2022

I like France. Years ago I shipped my son to Paris to learn French. He learned other things. So, as a good daddy, I shipped him off to a language immersion school in Poitiers. He learned other things. Logically, as a good shepherd of my only son, I shipped him to Jarnac to work for a cognac outfit. He learned other things. Finally, I shipped him to Montpellier. How was his French? Coming along, I think.

He knew many slang terms.

Most of these were unknown to my wife (a French teacher) and me (a dolt from central Illinois). We bought a book of French slang, and it was useless. The French language zips right along: Words and phrases from French speaking Swiss people (mon dieu). Words and phrases from North Africans (what’s the term for head butt?). Words and phrases from the Middle East popular among certain fringe groups.

Over the decades, French has become Franglish. But the rock of Gibraltar (which should be a French rock, according to some French historians) is the Académie française and its mission (a tiny snippet follows, but there is a lot more at this link).

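La mission confiée à l’Académie est claire : « La principale fonction de l’Académie sera de travailler, avec tout le soin et toute la diligence possibles, à donner des règles certaines à notre langue et à la rendre pure, éloquente et capable de traiter les arts et les sciences. » [In English: The mission entrusted to the Académie is clear: “The principal function of the Académie will be to work, with all possible care and diligence, to give definite rules to our language and to render it pure, eloquent, and capable of treating the arts and sciences.”]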

Who cares? The French culture ministry (do we have one in the US other than Disneyland?)

“France Bans English Gaming Tech Jargon in Push to Preserve Language Purity” explains:

Among several terms to be given official French alternatives were “cloud gaming”, which becomes “jeu video en nuage”, and “eSports”, which will now be translated as “jeu video de competition”. The ministry said experts had searched video game websites and magazines to see if French terms already existed. The overall idea, said the ministry, was to allow the population to communicate more easily.

Will those French “joueur-animateur en direct” abandon the word “streamer”?

Sure, and France will once again dominate Europe, parts of Africa, and the beaver-rich lands in North America. And Gibraltar? Sure, why not?

Stephen E Arnold, May 30, 2022

Autonomy Business Details: Are These Relevant to Search- and Content Processing Type Outfits Today?

May 31, 2022

I read “Judge Details Lynch’s $700k Signoff via iPhone Text in Full Autonomy Judgement.” The main idea is that Autonomy — an early entrant in the smart software for search and content processing — engaged in some business practices which a British judge finds suggestive. How suggestive? I am not sure, but the idea of using resellers and transactions to amp up revenues is interesting.

Another search and content processing outfit called Fast Search & Transfer (which Microsoft acquired more than a decade ago) found itself subject to some scrutiny for financial fancy dancing. One of the firm’s founders was found guilty and may have spent some time in the custody of a government. Maybe the fellow was cross country skiing and shooting a rifle at snow bunnies.

The relevance of the cited story and the reference to skis and weapons reminds me that the financial reports of high-flying search and content processing companies have to be scrutinized. I mention this because some of the more interesting search and content processing centric companies are publicly traded. Palantir Technologies comes to mind because I have seen a couple of semi-optimistic write ups about the company.

If I were a more youthful 77 year old, I would muster the energy to:

  1. Investigate the US government and UK government contracts for term, sunset dates, and contracting officers (what’s the background of these individuals)
  2. Research the question, “What’s bundled into the basic commercial and the basic government deal?”
  3. Explore the question, “How is cost of sales reacting to the economic climate since Palantir went public?”
  4. Try to determine answers to these questions: “What’s the ratio of sales people to programmers? The ratio of full time equivalents to contractors? How has the ratio changed since the firm went public?”
  5. Interview some people at LE and intel conferences to get a sense of the chatter related to these questions: “Is Palantir bundling Amazon cloud services or does the licensee have a choice?” and “Has there been talk of Palantir providing a ‘system in a box’ to licensees with this requirement?”

Why think about these types of questions? Oh, I am just curious about search and content processing outfits.

Stephen E Arnold, May 31, 2022

The Business Intelligence Blind Spot: Everyone Needs These Systems

May 30, 2022

I recall that a book called “Business Blind Spots” identified a number of behaviors which contribute to business missteps. Staff, preconceived notions, market receptivity, etc. were among the points I recall.

I want to toss one more blind spot into the raging fire of burned cash, torched reputations, and incinerated opportunities. I call this blind spot, “Everybody needs these systems.” Plug in your own “systems”; for example, software that manages several cloud accounts which are guaranteed to blow through budget assumptions with no easy way to control the rising expenses.

I read “Palantir Stock: Getting Desperate.” I think the write up has been riding the well-worn fire trail to a burning coal mine.

Palantir Technologies, when the charities, the razzle dazzle, and the jargon are stripped away, is a search and retrieval company. The idea is that a person looking for information about a bad actor, for instance, can plug in the name and see results.

Now this seems like a function which is readily available from many vendors. The twist for Palantir is that it positioned its search as one that would meet the needs of intelligence officers. The US government entity embracing Palantir’s software influenced the add-ons; for example, the ability to ingest certain types of content that only government agencies could acquire.

In order to make sales, the marketing engine of Palantir came up with the same type of “latest and greatest” verbiage that characterizes intelware (that’s software built around the specific needs of intelligence analysts). One example is importing proprietary file types. Another is keeping track of where a dataset came from, who fiddled with it, and what an authorized user did with the data when in search mode.
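The second example, keeping track of where a dataset came from and who did what to it, is easy to picture in code. What follows is a minimal sketch, not how Palantir or any other vendor does it. The names `TrackedDataset`, `AuditEvent`, and the action labels are hypothetical, invented here for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditEvent:
    """One thing a user did to or with a dataset (hypothetical structure)."""
    user: str
    action: str   # illustrative labels: "ingest", "edit", "search"
    detail: str
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class TrackedDataset:
    """A dataset plus its origin and an append-only activity log."""
    name: str
    source: str   # where the dataset came from
    events: list = field(default_factory=list)

    def record(self, user, action, detail=""):
        # Append only; existing events are never rewritten.
        self.events.append(AuditEvent(user, action, detail))

    def history_for(self, user):
        # Everything one user did, in the order it happened.
        return [e for e in self.events if e.user == user]

    def editors(self):
        # Who fiddled with the data, de-duplicated and sorted.
        return sorted({e.user for e in self.events if e.action == "edit"})
```

A caller would create a `TrackedDataset`, call `record()` each time someone touches the data or runs a query, and later ask `editors()` or `history_for("some_user")` when an auditor comes knocking.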

Over time, companies which serve government agencies have to choose one of three paths:

  1. Path 1 is to just do commercial work. Forget the intelligence market. A company which has moved in this direction is one you may not know anything about. It is LifeRaft. Look them up. Now the company does market and ad intelligence for commercial companies, ad agencies, and probably some non profit outfits.
  2. Path 2 is to just focus on government sales. An example of this type of outfit is BAE Systems which has software able to do Palantir type functions.  I am not sure BAE Systems returns phone calls from a bank or real estate agency wanting some Detica goodness.
  3. Path 3 is to do both. The best example of this is Voyager Labs which does the LifeRaft type work and the intelligence and law enforcement work of outfits like Palantir.

Which is the right path?

From my point of view, a company selling intelware should stick to government clients, maintain a low profile, and keep systems and methods secret. LifeRaft told me, “Don’t even mention our firm at the 2022 National Cyber Crime Conference.” Why? Doing work for certain government agencies gives some commercial firms and their go-go decision makers the heebie jeebies. The fear is that folks who interact with investigators, intelligence operatives, and analysts could say something that will create big time thunderstorms for the commercial company. Some businesses are not exactly paragons of behavior. This means that the purchase cycle is drawn out, excuses are made, concerns about confidentiality are raised, and weirdness surfaces about the amount of training, customizing, and optimizing the intelware system requires. The result? Some pretty crazy attempts to sell the product to the commercial sector and the inevitable gap between promises and reality. This type of “gap” has created some interesting situations in the last decade or so.

What about government sales? Unless a company is selling hardware, software, spare parts, training, and services, governments are fickle. Sure, an intelware outfit like Palantir will get initial contracts. But the government agencies have roving eyes and will keep licensing, looking for the perfect solution to intel needs. What happens is that the software-only vendor runs out of customers. Once a number of big agencies sign up, the US General Services Administration or the Defense Services Administration will start angling for a deal. Cut the fees or lose the contracts. This is bad news because expensive software takes time to sell to government customers who want a demo or a year of free or discounted use in order to figure out if the system actually works. The problem is that there are not that many government agencies in the free world to support the intelware companies hungry for allocated budget dollars. Stated another way, the intelware company has to get some contracts, make the software work, and forget about the hockey stick financial projections. The intelware vendors chase US allies, but there are vendors in those countries, and it may make more sense to license Trendalyze or Verint, not the Silicon Valley type outfit. Bad financial news? Yep.

Path three is to sell to anyone who wants the system. This is very, very difficult because the intelware system has to be fiddled with in order to meet the specific requirements of an organization. Chasing bad actors is one thing; figuring out what type of beverage a college student wants is another thing. Hanging over the commercial sales call is the concern about the government work, the government customers, and the government processes, which — once started — are tough to turn off.

This means that companies crafted for intelware users find that government sales slow down, commercial sales cycles take a long time and often end up at a dead end, and non government organizations don’t want or can’t pay big bucks for what is search software.

The market itself is changing. If you want to analyze tweets, hire a marketing agency and get rid of them once they have completed a project. Clean, tidy, easy. If a client has some Google grade programmers, download Maltego, license the $100 Hunchly, and spend some time looking at tools on GitHub. (Thank you, Microsoft, but do you know what’s on that service? I thought so.)

The cited article makes this point:

…the company must expand internationally. What better way to get new sales than to start fires and be the person to sell the smoke detectors? That is what Palantir’s software does, assess and analyze data for threats. It is a loose analogy but fitting. But why is Palantir in such desperate need of expansion to new governments and industries? It is because the only thing keeping the stock going is the revenue growth rate which has been so strong. The company has incurred losses every year of operation. It expects operating expenses to increase.

And what about international sales? Three points:

  1. There are vendors offering comparable or better systems so buying non-US may make economic and political sense
  2. The cost of closing deals internationally is — the last time I checked — two to three times the cost of selling from Chicago to US based customers
  3. The number of purchasers is not as large as one thinks. The US is the living embodiment of Parkinson’s Law and the Peter Principle. Other countries are not much better and they have less disposable cash.

Net net: The word desperate may be appropriate for Palantir Technologies. I don’t have a good set of options for the company: Too much hype, too much development cost, too much customizing and tuning and training, and too much nuke talk. Not helpful.

Stephen E Arnold, May 30, 2022

Cybersecurity: Are the Gloves Off?

May 26, 2022

Cybersecurity has been a magnet for investments. Threats are everywhere! Threats are increasing! Ransomware destroys businesses and yours will be next? One thousand bad actors attack in the SolarWinds’ misstep, right? The sky is falling!

Frightened yet?

Changes are evident. Let me offer two examples:

Lacework

The cybersecurity outfit Lacework has just allowed about 20 percent of their workforce to find their future elsewhere. Uber, perhaps? Piece work via Fiverr.com? A for-fee blog on Substack, the blog platform with real journalists, experts, pundits, wizards, etc.?

“Cloud Security Firm Lacework Lays Off 20% of Staff” reports:

A well-funded startup in the cybersecurity industry, Lacework, has become the latest tech firm to disclose a major round of layoffs amid fears of a broader economic slowdown. In a statement provided to Protocol, Lacework confirmed that the layoffs impacted 20% of its employees, in connection with what it called a “decision to restructure our business.”

Is the number of future hunters let loose in the datasphere accurate? The article points out that Lacework used the outstanding Twitter to say, 20 percent was a “significant overestimate.” Whom does one believe? In today’s world, I have to hold two contradictory statements in my mind because I sure as heck don’t know why a hot sector with a well funded company is making more parking available and reducing demand for the ping pong table.

Cybersecurity Does Not Work

The second example: I noted an advertisement in my dead tree version of the Wall Street Journal. Here’s the ad from the May 26, 2022, publication:

[Image: Tanium advertisement]

The text of the Tanium advertisement declares that cybersecurity systems fail their customers. The idea is that there are many cybersecurity vendors, and each offers pretty good barriers to a couple of threats. The customers of these firms’ products have to buy multiple solutions. The fix? License Tanium, a “best place to work.”

Stepping Back

The first example provides a hint that certain companies in the cybersecurity market are taking steps to reduce costs. Nothing works quite as well as winnowing the herd. My hunch is that Lacework is like a priest in ancient Greece poking at a sacrificial lamb and declaring, “Prepare for the pestilence and the coming famine. Have a good day.”

The second example may signal that the policy of cybersecurity vendors not criticizing one another is over. Tanium is criticizing a pride of cyber lions. My hunch is that the gloves will be coming off. Saying that no other vendor can deal with cyber threats in the Wall Street Journal is a couple of levels above making snarky comments in a security trade show booth.

Net Net

Bad actors can add some of the Lacework castoffs to their virtual crimeware teams hiding behind the benign monikers of front companies in Greece and Italy, among other respected countries. The Tanium ad copy offers proof that existing cyber defense may have some gaps. The information will encourage bad actors to keep chipping away at juicy online targets. Change has arrived.

Stephen E Arnold, May 26, 2022

Google: Embrace, Control, and Sell Advertising?

May 25, 2022

Google claims to support open source technology and contributes some of its code libraries and projects, except for the black box search algorithm, to the public. The Verge shares Google’s new open source initiative in the article, “Google Will Start Distributing A Security-Vetted Collection Of Open-Source Software Libraries.” Google wants to add its branding and stamp of approval to open source software.

Google wants to control its portion of the open source community by curating and distributing security-vetted software to Google Cloud customers. The new initiative is called Assured Open Source Software. Andy Chang is a Google Cloud Group Product Manager for Security and Privacy, and he said there were challenges in securing open source software:

“ ‘There has been an increasing awareness in the developer community, enterprises, and governments of software supply chain risks,’ Chang wrote, citing last year’s major log4j vulnerability as an example. ‘Google continues to be one of the largest maintainers, contributors, and users of open source and is deeply involved in helping make the open source software ecosystem more secure.’”

Assured Open Source Software will allow Google Cloud customers to use the same software auditing process as Alphabet Inc. The open source packages are the same ones the company uses and are subject to regular scanning and vulnerability analysis. Currently, Google monitors 550 libraries on GitHub, and these can be downloaded independently of Google. These same libraries will be available via Google Cloud later in 2022.

Google’s Assured Open Source Software is part of an industry-wide push to secure the open source software supply chain. The Biden administration supports the endeavor.

Open source does need to be secure, but is putting a tech giant, notorious for collecting and selling user data, in charge the right way to go? Sure it is. It is Google approved!

Whitney Grace, May 25, 2022

Synthetic Data: Cheap, Like Fast Food

May 25, 2022

Fabricated data may well solve some of the privacy issues around healthcare-related machine learning, but what new problems might it create? The Wall Street Journal examines the technology in, “Anthem Looks to Fuel AI Efforts with Petabytes of Synthetic Data.” Reporter Isabelle Bousquette informs us Anthem CIO Anil Bhatt has teamed up with Google Cloud to build the synthetic data platform. Interesting choice, considering the health insurance company has been using AWS since 2017.

The article points out synthetic data can refer to either anonymized personal information or entirely fabricated data. Anthem’s effort involves the second type. Bousquette cites Bhatt as well as AI and automation expert Ritu Jyoti as she writes:

“Anthem said the synthetic data will be used to validate and train AI algorithms that identify things like fraudulent claims or abnormalities in a person’s health records, and those AI algorithms will then be able to run on real-world member data. Anthem already uses AI algorithms to search for fraud and abuse in insurance claims, but the new synthetic data platform will allow it to scale. Personalizing care for members and running AI algorithms that identify when they may require medical intervention is a more long-term goal, said Mr. Bhatt. In addition to alleviating privacy concerns, Ms. Jyoti said another advantage of synthetic data is that it can reduce biases that exist in real-world data sets. That said, she added, you can also end up with data sets that are worse than real-world ones. ‘The variation of the data is going to be very, very important,’ said Mr. Bhatt, adding that he believes the variation in the synthetic data will ultimately be better than the company’s real-world data sets.”

The article notes the use of synthetic data is on the rise. Increasing privacy and reducing bias both sound great, but that bit about potentially worse data sets is concerning. Bhatt’s assurance is pleasant enough, but how will we know whether his confidence pans out? Big corporations are not exactly known for their transparency.

Cynthia Murrell, May 25, 2022

Controlled Term Lists Morph into Data Catalogs That Are Better, Faster, and Cheaper to Generate

May 24, 2022

Indexing and classifying content is boring. A human subject matter expert asked to extract index terms and assign classification codes works great. But the humanoid SME gets tired and begins assigning general terms from memory. Plus humanoids want health care, retirement benefits, and time to go fishing in the Ozarks. (Yes, the beautiful sunny Ozarks!)

With off-the-shelf smart software available on GitHub or at a bargain price from the ever-secure Microsoft or the warehouse-subleasing Amazon, innovators can use machines to handle the indexing. To turn the basic task into a glam one, slap on a new bit of jargon, and you are ready to create a data catalog.

“16 Top Data Catalog Software Tools to Consider Using in 2022” is a listing of automated indexing and classifying products and services. No humanoids or not too many humanoids needed. The software delivers lower costs and none of the humanoid deterioration after a few hours of indexing. Those software systems are really something: No vacations, no benefits, no health care, and no breaks during which unionization can be discussed.

What’s interesting about the list is that it includes the allegedly quasi monopolistic outfits like Amazon, Google, IBM, Informatica, and Oracle. The write up does not answer the question, “Are the terms and other metadata the trade secret of the customer?” The reason I am curious is that rolling up terms from numerous organizations and indexing each term as originating at a particular company provides a useful data set to analyze for trends, entities, and date and time on the document from which the terms were derived. But no alleged monopoly would look at a cloud customer’s data? Inconceivable.
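To make the roll-up concern concrete, here is a minimal sketch of how index terms from several customers could be pooled and tagged by origin. It is an illustration only; I have no evidence any vendor does this. The function names and the `(customer, term, date)` record shape are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def roll_up_terms(records):
    """Pool index terms across customers.

    records: iterable of (customer, term, iso_date) tuples (hypothetical shape).
    Returns (totals, by_term): how often each term appears overall, and
    which customers each term originated from.
    """
    totals = Counter()
    by_term = defaultdict(set)
    for customer, term, _date in records:
        key = term.strip().lower()   # normalize so "Ransomware" == "ransomware"
        totals[key] += 1
        by_term[key].add(customer)
    return totals, by_term


def shared_terms(by_term, min_customers=2):
    """Terms that show up at several customers: a crude trend signal."""
    return sorted(t for t, who in by_term.items() if len(who) >= min_customers)
```

Feed it a few thousand records, and `shared_terms()` hands back the vocabulary several organizations have in common. That is the sort of data set an analyst, or a cloud provider, might find interesting.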

The list of vendors also includes some names which are not yet among the titans of content processing; for example:

Alation

Alex

Ataccama

Atlan

Boomi

Collibra

Data.world

Erwin

Lumada.

There are some other vendors in the indexing business. You can identify these players by joining NFAIS, now the National Federation of Advanced Information Services. The outfit discarded the now out of favor terminology of abstracting and indexing. My hunch is that some NFAIS members can point out some of the potential downsides of using smart software to process business and customer information. New terms and jazzy company names can cause digital consternation. But smart software just gets smarter even as it mis-labels, mis-indexes, and mis-understands. No problem: Cheaper, faster, and better. A trifecta. Who needs SMEs to look at an exception file, correct errors, and tune the system? No one!

Stephen E Arnold, May 24, 2022

Google, Smart Software, and Prime Mover for Hyperbole

May 17, 2022

In my experience, the cost of training smart software is a very big problem. The bigness does not become evident until the licensee of a smart system realizes that training the smart software must take place on a regular schedule. Why is this a big problem? The reason is that the effort required to assemble valid training sets is significant. Language, data types, and info peculiarities change over time; for example, new content is fed into a smart system, and the system cannot cope with the differences between the training set that was used and the info flowing into the system now. A gap grows, and the fix is to assemble new training data, reindex the content, and get ready to do it again. A failure to keep the smart software in sync with what is processed is a tiny bit of knowledge not explained in sales pitches.
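The gap between the training set and the content flowing in now can be estimated, at least crudely. Below is a minimal sketch that assumes plain-text content and uses the share of incoming words never seen in training as a stand-in for drift. Real systems would use something richer. The function names and the 0.2 threshold are assumptions for illustration, not anyone's product.

```python
import re

WORD = re.compile(r"[a-z0-9']+")


def vocabulary(texts):
    """Every distinct lowercase word in a collection of texts."""
    words = set()
    for text in texts:
        words.update(WORD.findall(text.lower()))
    return words


def unseen_rate(training_texts, incoming_texts):
    """Fraction of incoming word occurrences absent from the training set."""
    known = vocabulary(training_texts)
    tokens = [w for text in incoming_texts for w in WORD.findall(text.lower())]
    if not tokens:
        return 0.0
    unseen = sum(1 for w in tokens if w not in known)
    return unseen / len(tokens)


def needs_retraining(training_texts, incoming_texts, threshold=0.2):
    """Flag the system once the gap passes an arbitrary, tunable threshold."""
    return unseen_rate(training_texts, incoming_texts) > threshold
```

Run a check like this on a schedule, and the accountant at least gets a warning before the invoices for new training sets arrive.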

Accountants figure out that money must be spent on a cost not in the original price data. Search systems return increasingly lousy results. Intelligence software outputs data which make zero sense to a person working out a surveillance plan. An art history major working on a PowerPoint presentation cannot locate the version used by the president of the company for last week’s pitch to potential investors.

The accountant wants to understand overruns associated with smart software, looks into the invoices and time sheets, and discovers something new: Smart software subject matter experts, indexing professionals, interns buying third-party content from an online vendor called Elsevier. These are not what CPAs confront unless there are smart software systems chugging along.

The big problem is handled in this way: Those selling the system don’t talk too much about how training is a recurring cost which increases over time. Yep, reindexing is a greedy pig and those training sets have to be tested to see if the smart software gets smarter.

The fix? Do PR about super duper even smarter methods of training. Think Snorkel. Think synthetic data. Think PowerPoint decks filled with jargon that causes clueless MBAs to do high fives because the approach is a slam dunk. Yes! Winner!

I read “DeepMind’s Astounding New ‘Gato’ AI Makes Me Fear Humans Will Never Achieve AGI” and realized that the cloud of unknowing has not yet yielded to blue skies. The article states:

Just like it took some time between the discovery of fire and the invention of the internal combustion engine, figuring out how to go from deep learning to AGI won’t happen overnight.

No kidding. There are gotchas beyond training, however. I have a presentation in hand which I delivered in 1997 at an online conference. Training cost is one dot point; there are five others. Can you name them? Here’s a hint for another big issue: An output that kills a patient. The accountant understands the costs of litigation when that smart AI makes a close enough for horseshoes output for a harried medical professional. Yeah, go catscan, go.

Stephen E Arnold, May 17, 2022

Does Google Have Search Fear?

May 16, 2022

I can hear the Googlers at a search engine optimization conference saying this:

Our recent investments in search are designed to provide a better experience for our users. Our engineers are always seeking interesting, new, and useful ways to make the world’s information more accessible.

What these code words mean to me is:

Yep, the ancient Larry and Sergey thing. Not working. Oh, my goodness. What are we going to do? Buy Neeva, Kagi, Seekr, and Wecript? Let’s let Alphabet invest and we can learn and maybe earn before more people figure out our results are not as good as Bing and DuckDuckGo’s.

Even Slashdot is running items which make clear that Google and search do not warrant the title of “search giant.”

[Image: Slashdot item]

Source: Slashdot at https://bit.ly/3PkBOGt

I crafted this imaginary dialog when I read “This Germany-based AI Startup is Developing the Next Enterprise Search Engine Fueled by NLP and Open-Source.” That write up said:

Deepset, a German startup, is working to add to Natural Language Processing by integrating a language awareness layer into the business tech stack, allowing users to access and interact with data using language. Its flagship product, Haystack, is an open-source NLP framework that enables developers to create pipelines for a variety of search use-cases.

But here’s the snappy part of the article:

The Haystack-based NLP is typically implemented over a text database like Elasticsearch or Amazon’s OpenSearch branch and then connects directly with the end-user application through a REST API. It already has thousands of users and over 100 contributors. It uses transformer models to let developers create a variety of applications, such as production-ready question answering (QA), semantic document search, and summarization. The company has also introduced Deepset Cloud, an end-to-end platform for integrating customized and high-performing NLP-powered search systems into your application.

In theory, this is an open source, cloud centric super app, a meta play, a roll up of what’s needed to make finding information sort of work.
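For readers who want to see what “retrieve, then answer” means in practice, here is a toy sketch. It does not use Haystack, Elasticsearch, or any transformer model, and it is not Deepset's API. It swaps in plain keyword overlap so the shape of the pipeline is visible. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase word set; a crude stand-in for real language processing."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, top_k=1):
    """Step 1: rank documents by how many query words they share."""
    q = tokenize(query)
    scored = [(len(q & tokenize(doc)), i) for i, doc in enumerate(documents)]
    scored = [(score, i) for score, i in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score, then input order
    return [documents[i] for _, i in scored[:top_k]]


def answer(query, documents):
    """Step 2: return the best-matching sentence from the top document."""
    hits = retrieve(query, documents)
    if not hits:
        return None
    sentences = re.split(r"(?<=[.!?])\s+", hits[0])
    q = tokenize(query)
    return max(sentences, key=lambda s: len(q & tokenize(s)))
```

A production system would replace the overlap scoring with a text database and a trained model, and would wrap `answer()` in a REST endpoint. The two-step structure stays the same.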

The kicker in the story is this statement:

The Berlin-based company has raised $14M in Series A funding led by GV, Alphabet’s venture capital arm.

Yep, the Google is investing. Why? Check that which applies:

(  ) Its own innovation engines are the equivalent of a Ford Pinto racing a Tesla Model S Plaid? Google search is no longer the world’s largest Web site?

(  ) Amazon gets more product searches than Google does?

( ) Users are starting to complain about how Google ignores what users key in the search box?

( ) Large sites are not being spidered in a comprehensive or timely manner?

( ) All of the above.

Stephen E Arnold, May 16, 2022
