Controlled Term Lists Morph into Data Catalogs That Are Better, Faster, and Cheaper to Generate
May 24, 2022
Indexing and classifying content is boring. A human subject matter expert asked to extract index terms and assign classification codes works great. But the humanoid SME gets tired and begins assigning general terms from memory. Plus humanoids want health care, retirement benefits, and time to go fishing in the Ozarks. (Yes, the beautiful sunny Ozarks!)
With off-the-shelf smart software available on GitHub or at a bargain price from the ever-secure Microsoft or the warehouse-subleasing Amazon, innovators can use machines to handle the indexing. To turn the basic into a glam task, slap on a new bit of jargon, and you are ready to create a data catalog.
“16 Top Data Catalog Software Tools to Consider Using in 2022” is a listing of automated indexing and classifying products and services. No humanoids or not too many humanoids needed. The software delivers lower costs and none of the humanoid deterioration after a few hours of indexing. Those software systems are really something: No vacations, no benefits, no health care, and no breaks during which unionization can be discussed.
What’s interesting about the list is that it includes the allegedly quasi-monopolistic outfits like Amazon, Google, IBM, Informatica, and Oracle. The write up does not answer the question, “Are the terms and other metadata the trade secret of the customer?” The reason I am curious is that rolling up terms from numerous organizations and indexing each term as originating at a particular company provides a useful data set to analyze for trends, entities, and the date and time of the document from which the terms were derived. But no alleged monopoly would look at a cloud customer’s data? Inconceivable.
The list of vendors also includes some names which are not yet among the titans of content processing; for example:
Alation
Alex
Ataccama
Atlan
Boomi
Collibra
Data.world
Erwin
Lumada
There are some other vendors in the indexing business. You can identify these players by joining NFAIS, now the National Federation of Advanced Information Services. The outfit discarded the now out of favor terminology of abstracting and indexing. My hunch is that some NFAIS members can point out some of the potential downsides of using smart software to process business and customer information. New terms and jazzy company names can cause digital consternation. But smart software just gets smarter even as it mis-labels, mis-indexes, and mis-understands. No problem: Cheaper, faster, and better. A trifecta. Who needs SMEs to look at an exception file, correct errors, and tune the system? No one!
Stephen E Arnold, May 24, 2022
Google Et Al: A Small Matter Perhaps?
May 24, 2022
In India, the Lok Sabha is a bit like the US Congress. Like its US equivalent, the group of distinguished individuals can be frisky, intellectually speaking, of course. “Standing Committee On Finance To Discuss Big Tech Firms’ Practices” reports:
…the parliamentary panel will be hearing views of hospitality, restaurants and travel agents associations on the subject ‘Anti-Competitive Practices by Big-tech companies…
This type of investigation and questioning is chugging along in the US, the EU, and India. In Russia, the government has seized the Google’s assets, and it is not clear what the future will hold for other US Big Tech firms.
I noted this statement in the source article:
Representatives of Google, Amazon, Facebook, Twitter and others too were summoned by the panel.
I anticipate that the answers to the interlocutors’ questions will be along the lines of, “Thank you for the question. I will collect the information and provide it to your office.”
However, this sentence suggests that India may be considering adding some teeth to its approach to the alleged monopolistic and anti-competitive behavior of the Amazon, Facebook, Google, and Twitter outfits:
The CCI Act was initiated in 2002 and last amended in 2007. A bill to amend the Act is also under consideration wherein provisions are likely to be introduced to deal with anti-competition practices of tech giants.
Worth watching? India? Is that a significant market? Yep.
Stephen E Arnold, May 24, 2022
TikTok Shell Shock: In App Gaming
May 24, 2022
I think that outfits like Facebook, Twitter, and YouTube are less interesting than TikTok. Facebook or Zuckbook has the layoffs, the cutbacks, and the Zuck. Twitter has, for better or worse, the Tesla person. YouTube has its outstanding human resource management system. But the TikTok has a small test which may not amount to much. On the other hand, the test in Vietnam may have some upside.
“TikTok Tests In-App Gaming Feature in Vietnam” reports:
TikTok is testing a new feature to let users play games within the short-video app in Vietnam…
Facebook and Netflix are sniffing around this application of game mania as well.
The article continues:
In addition to gaming, TikTok has also been expanding its ecommerce efforts, recently rolling out its online shopping platform TikTok Shop in Malaysia. The efforts come as the company’s ecommerce arm entered Thailand and Vietnam in February, where it has been hiring local teams.
My take is:
- TikTok’s testing in a non-US piece of real estate is interesting
- The TikTok monitoring technology may open the door to dynamic personalization of eGames
- The TikTok app may become a portal to a TikTok metaverse.
Net net: TikTok may be poking around the super app space. Me-too time for some US tech outfits? Yep.
Stephen E Arnold, May 24, 2022
Librarians and Book Bans: Why Not Burn the Books?
May 24, 2022
Every couple of years the nation deals with an unprecedented surge in book banning. Once upon a time Mark Twain’s Tom Sawyer and Huckleberry Finn were the top banned contenders and they remain on the list. Other challenged titles include JK Rowling’s Harry Potter series, The Autobiography of Malcolm X, The Diary of Anne Frank, The Perks of Being a Wallflower, and The Bluest Eye. Banned books usually include content people deem blasphemous, lewd, racist, pornographic, and contrary to decent society.
Most books currently being banned discuss racism, trans people, homosexuality, sex, and alternative views of history. Educators are not the only professionals dealing with demands to ban books; school and public librarians are fighting the angry mobs too. Salon explores the current book banning storm that is sweeping the country in the article, “‘Heartbreaking’: Frightened Librarians Face ‘Hostile’ Harassment Trying To Navigate Book Bans.”
Texas is a conservative, Republican, Christian state, and its librarians are facing a book banning crisis. Many librarians are quitting their jobs or are being fired because they refuse to cave to the mobs. Librarians are not only facing hostility at work, but they are also dealing with it on social media:
“In Keller, local Facebook group pages and Twitter accounts have included pointed comments about librarians being “heretical” and portrayed them as pedophile “groomers” who order pornographic books. After a particular book challenge failed, one commenter included the phrase “pass the millstones,” a biblical reference to execution by drowning.”
Librarians who continue to work under the book banning blitz are hesitant to make any move that could be misinterpreted. They are afraid to order new books (and probably new copies of past banned books) because they do not want to deal with the backlash. They are playing it safe, but they wonder at what cost to themselves, to students, and to the future of education.
Most of the attacks are coming from parents who are unaware of what the questionable books contain. Many of the challengers do not read the books, stating that they do not need to read them to recognize “smut” or “inappropriate” material. Book banning is an ignorant act and violates the First Amendment, the right to freedom of speech. Book banning is a propaganda tool that the Nazis, the Soviets, communist China, North Korea, various South American dictatorships, cults, and fundamentalist zealots have used to control, indoctrinate, and brainwash people.
Book access should be managed for kids, but not everything deemed inappropriate fits the bill. The Bible contains references to genocide, sex, incest, rape, magic, cannibalism, evolution, war, and many more inappropriate subjects. Are librarians to be equipped with matches?
Whitney Grace, May 21, 2022
Facebook: Maybe Thinking about Superapps?
May 23, 2022
The idea of popping up a level is a good one. Examples range from companies offering ways to manage multiple APIs to services hooking consumers with individual providers, regardless of where the providers call home.
“Wikipedia Over WhatsApp” explains:
If the wifi is letting WhatsApp messages through, what if we used WhatsApp as a vehicle for the information we really care about? Much like we encapsulate the rest of our networking objects in higher-level objects, we could encapsulate web pages inside of WhatsApp messages.
Okay, who is really excited about reading Wikipedia’s entry about my relative Vladimir Igorevich Arnold, a mathematician, which is an exciting profession to be sure? Not too many people.
The idea of using WhatsApp as a mechanism for other services is a good one. Is it Facebook’s attempt to become a superapp or to allow others to use WhatsApp as one?
Some end-to-end encrypted messaging services include a number of useful functions now. But what if almost any traditional browser-based function could be supported within a messaging app on one’s mobile phone? Apple uses an “up a level” method with its requirement that browser developers honor and respect the wonderful WebKit thing.
Interesting if true.
Stephen E Arnold, May 23, 2022
Apple Disdain: The Right to Repair? Absolutely, Well, Sort Of…
May 23, 2022
I read “Apple Shipped Me a 79-Pound iPhone Repair Kit to Fix a 1.1 Ounce Battery.” The allegedly true write up reports:
Apple has been lobbying to suppress right-to-repair policies around the country, with the company accused of doing everything it can to keep customers from repairing their own phones.
Now Apple wants to be helpful.
What’s needed to insert a battery in a current iPhone?
The article states:
I expected Apple would send me a small box of screwdrivers, spudgers, and pliers; I own a mini iPhone, after all. Instead, I found two giant Pelican cases — 79 pounds of tools — on my front porch. I couldn’t believe just how big and heavy they were considering Apple’s paying to ship them both ways.
The repair kit strikes me (and this is my opinion) as what some lower-class real journalist might describe as a bright digital finger for anyone who thinks he/she/it/them can repair an Apple device. Doesn’t the vaunted Apple manufacturing method utilize robots or individuals richly compensated in OSHA and EPA approved facilities to make the gizmos for thick fingered humanoids? In my experience, humans are less than zero when it comes to precision assembly of gadgets and gizmos. What about those rows of happy workers I recall seeing in unverified write ups about worker abuse, child labor, and happy re-education campers? My view is that those “people” are M2 chip powered robots manufactured by other robots to look a bit like moms, dads, college students, and others looking for truly rewarding, intellectually engaging work.
Therefore, when a mere real humanoid customer buys and breaks a device, the non-Deep Fake customer is 100% responsible for trotting to the Apple store, assuming it is open, given assorted medical and Genius considerations. The sausage-fingered real humanoid who wants to do an iPhone repair on his/her/its/thems kitchen table may work with light from the flickering middle finger. I hope there is an Apple logo tattooed on the hand itself. That’s exceptional design grammar, is it not?
How customer centric is the Apple approach to the right to repair an iPhone? The write up concludes:
It would be an understatement to say that Apple has a history of resisting right-to-repair efforts.
Now what purpose does that really big digital middle finger serve? The answer may appear in “I Can’t Imagine a Day without Proctology,” available from Amazon for less than $7.
Stephen E Arnold, May 23, 2022
An Analyst Wrestles with the Palantir Realities
May 23, 2022
Palantir Technologies in my world view is a services and software company positioned as a provider of intelware. Intelware means software and services which allow users to extract high-value information from text, numeric, and possibly image and video data.
Palantir, founded in 2003, has been influenced from its inception by precursor software like the original i2 Ltd. Analyst Notebook and BAE Systems Detica. Both of these systems allowed users to ingest “content,” enter the names of people or things, and display the outputs so that the higher-value facts were presented in a useful way; for example, a chart or a relationship graph.
The US government works to learn about new and potentially useful software and systems. Not surprisingly, a government agency showed interest in Palantir’s software when the entrepreneurs involved in the company started describing the Palantir features and functions. Appreciate that in its early years almost two decades ago, the presentations and demonstrations captured what I call “to be” systems; that is, at some point in the future, Palantir’s system and software would be everything that Analyst Notebook, Detica, and the other intelware vendors could offer. The pitch is compelling.
Palantir, now almost two decades old, is a publicly traded company, and it is working overtime to move beyond sales to governments in the US and elsewhere. One of the characteristics of selling intelware to non-governmental organizations is that the capabilities of the system and its use by government clients are often disconcerting to a financial institution, a big hospital chain, or a consulting firm focused on real estate.
Furthermore, intelware systems require data. Some data can be easily imported into a system like Palantir’s; for example, plain ASCII text and Excel spreadsheets. Other data are in a format which must be transformed so that Palantir can import the information. Other data present challenges like converting an image with a date and time stamp into an indexed content object. That indexing, to be helpful and to reduce the likelihood of errors, has to be accurate. Some non-text data must be enriched. French content processing experts refer to this enrichment as “fertilization.”
The write up “Palantir: Complete Disaster” includes this statement:
We think there are three possible courses of action in the disaster that has been Palantir, all of which are correct.
Here are the three “courses of action”:
- Don’t buy shares in Palantir.
- Buy shares, maybe short the stock.
- Buy shares and ride out the downturn.
Each of these options ignores two issues. The first is why Palantir is not closing deals and showing a profit. The second is why an intelware company is not able to amp up its sales to government agencies in the US, Western Europe, and selected government agencies elsewhere.
My view is that Palantir is a tough sell for these reasons:
- To land a deal, the prospect has to know what the payoff from using the Gotham / Foundry system is. “Intelligence” is a hot concept, but it is a tough sell unless there is a “champion” inside the prospect’s organization to grease the skids.
- Competitors offer comparable products for as little as $5,000 per month and some of these competitors bundle third party data which can be fused with the licensee’s data with minimal fiddling with filters and file conversions.
- Newer systems are easier to use and include automated workflows which speed the work of analysts, investigators, and researchers.
The slow sales of Palantir follow the same type of curve that sales of Autonomy, Fast Search & Transfer, and many other “information” or “intelligence” focused products followed. The initial sales are from government agencies which want better mouse traps. When the intelware does not deliver markedly significant payoffs, the licensees keep looking for better, faster, and cheaper options.
Will Palantir be able to generate a profit and deliver organic growth?
If the trajectory of precursor companies is the path Palantir is on, the answer is, “No.”
Stephen E Arnold, May 23, 2022
Scraping By: A Winner Business Model
May 23, 2022
Will Microsoft-owned LinkedIn try, try, try again? The platform’s latest attempt to protect its users’ data from being ransacked has been thwarted, TechCrunch reveals in, “Web Scraping Is Legal, US Appeals Court Reaffirms.” The case reached the Supreme Court last year, but SCOTUS sent it back down to the Ninth Circuit Court of Appeals for a re-review. That court reaffirmed its original finding: scraping publicly accessible data is not a violation of the decades-old Computer Fraud and Abuse Act (CFAA). It is a decision to celebrate or to lament, depending on one’s perspective. The practice threatens the privacy of those who use social media and other online services, yet it is integral to many who preserve, analyze, and report information. Writer Zack Whittaker explains:
“The Ninth Circuit’s decision is a major win for archivists, academics, researchers and journalists who use tools to mass collect, or scrape, information that is publicly accessible on the internet. Without a ruling in place, long-running projects to archive websites no longer online and using publicly accessible data for academic and research studies have been left in legal limbo. But there have been egregious cases of web scraping that have sparked privacy and security concerns. Facial recognition startup Clearview AI claims to have scraped billions of social media profile photos, prompting several tech giants to file lawsuits against the startup. Several companies, including Facebook, Instagram, Parler, Venmo and Clubhouse have all had users’ data scraped over the years. The case before the Ninth Circuit was originally brought by LinkedIn against Hiq Labs, a company that uses public data to analyze employee attrition. LinkedIn said Hiq’s mass web scraping of LinkedIn user profiles was against its terms of service, amounted to hacking and was therefore a violation of the CFAA.”
The Ninth Circuit disagreed. Twice. In the latest decision, the court pointed to last year’s Supreme Court ruling which narrowed the scope of the CFAA to those who “gain unauthorized access to a computer system,” as opposed to those who simply exceed their authorization. A LinkedIn spokesperson expressed disappointment, stating the platform will “continue to fight” for its users’ rights over their data. Stay tuned.
Cynthia Murrell, May 23, 2022
Make Sales, Bill Time: Is There More to Real Work?
May 20, 2022
A partial answer to this question can be found in “Many Software Companies Are a Joke.” I circled this in bright green (that’s the money paid for not-too-helpful outputs):
The sad thing is that you get used to being busy but not productive, and when I say busy I mean pretending to be working hard when being watched. In other words, you will master the art of “eye service”.
Can one detect signals about “busy but not productive” in sectors other than software development? Let’s give this a whirl.
- Microsoft Teams and its monitoring functions and the parallel development of software that spoofs such monitoring.
- Meetings (in person and virtual) about inconsequential details when core functions do not meet customer needs.
- Decisions to replace informed humans with chatbots so employees do not have to deal with customers who complain about incorrect orders, non-functioning components, or bill mistakes.
I do like the idea of perfecting “eye service.” Perhaps this can be converted into a for-fee training program called “How to Look Busy While Doomscrolling”?
If one does no work, then one is not responsible for problems.
Stephen E Arnold, May 20, 2022
TikTok Sells Books. Who Knew? Amazon?
May 20, 2022
And some were concerned social media had made books obsolete. To the contrary, reports BBC News, “TikTok Helps UK Book Sales Hit Record Levels, Publishers Association Says.” The UK’s Publishers Association was pleased to see sales of printed books in that country rise by 5% last year. It is especially impressive, notes the organization’s chief executive Stephen Lotinga, given that bookstores were still closed for the first quarter of 2021. He credits a TikTok trend with at least part of the increase. The article reports:
“The organization said four of the top five young adult bestsellers in 2021 had been driven by the BookTok trend. … Publishers Association chief executive Stephen Lotinga said viral videos on platforms like TikTok and YouTube had been ‘really significant’ in encouraging readers to discover books. ‘Anecdotally, we’ve had lots of individual booksellers talking about the fact that they’re having lots of young people coming into their book stores, talking about books that they have heard about on TikTok and asking for them,’ he said. ‘It is having an impact on the number of books sold, but the shape of what’s being sold is changing as well. Throughout the pandemic period, we saw people increasingly buying what we call backlist books, which are books that have been published in the past.’ Many of the titles that have taken off on TikTok are several years old rather than brand new releases.”
Rather than favoring new releases, we learn, BookTok favors books with unexpected or dramatic endings. At least that appears to be the current trend. The bump in print-book sales was accompanied by a 1% dip in digital sales. Interestingly, audiobook downloads beat out both with an increase of 14%.
Cynthia Murrell, May 20, 2022