Exposing Big Data: A Movie Person Explains Fancy Math

April 16, 2021

I am not “into” movies. Some people are. I knew a couple of Hollywood types, but I was dumbfounded by their thought processes. One of these professionals dreamed of crafting a motion picture about riding a boat powered by the wind. I think I understand because I skimmed one novel by Herman Melville, who grew up with servants in the house. Yep, in touch with the real world of fish and storms at sea.

However, perhaps an exception is necessary. A movie type offered some interesting ideas in the BBC “real” news story “Documentary Filmmaker Adam Curtis on the Myth of Big Data’s Predictive Power: It’s a Modern Ghost Story.” Note: This article is behind a paywall designed to compensate content innovators for their highly creative work. You have been warned.

Here are several statements I circled in bright True Blue marker ink:

  • “The best metaphor for it is that Amazon slogan, which is: ‘If you like that, then you’ll like this,’” said [Adam] Curtis [the documentary filmmaker].
  • [Adam Curtis] pointed to the US National Security Agency’s failure to intercept a single terrorist attack, despite monitoring the communications of millions of Americans for the better part of two decades.
  • “[Big data and online advertising is] a bit like sending someone with a flyer advertising pizzas to the lobby of a pizza restaurant,” said Curtis. “You give each person one of those flyers as they come into the restaurant and they walk out with a pizza. It looks like it’s one of your flyers that’s done it. But it wasn’t – it’s a pizza restaurant.”

Maybe I should pay more attention to the filmic mind. These observations strike me as accurate.

Predictive analytics, fancy math, and smart software? Ghosts.

But what if ghosts are real?

Stephen E Arnold, April 16, 2021

MIT Deconstructs Language

April 14, 2021

I got a chuckle from the MIT Technology Review write up “Big Tech’s Guide to Talking about AI Ethics.” The core of the write up is a list of token words like “framework”, “transparency”, “by design”, “progress”, and “trustworthy.” The idea is that instead of explaining the craziness of smart software with phrases like “yeah, the intern who set up the thresholds is now studying Zen in Denver” or “the lady in charge of that project left in weird circumstances but I don’t follow that human stuff,” the big tech outfits, which have a generous dollop of grads from outfits like MIT, string together token words to explain what 85 percent confidence means. Yeah, think about it when you ask your pediatrician if the antidote given your child will work. Here’s the answer most parents want to hear: “Ashton will be just fine.” Parents don’t want to hear, “probably 15 out of every 100 kids getting this drug will die. Close enough for horseshoes.”

The hoot is that I took a look at MIT’s statements about Jeffrey Epstein and the hoo-hah about the money this estimable person contributed to the MIT outfit. Here are some phrases I selected plus their source.

  • a thorough review of MIT’s engagements with Jeffrey Epstein (Link to source)
  • no role in approving MIT’s acceptance of the donations. (Link to source)
  • gifts to the Institute were approved under an informal framework (Link to source)
  • for all of us who love MIT and are dedicated to its mission (Link to source)
  • this situation demands openness and transparency (Link to source).

Yep, “framework”, “openness,” and “transparency.” Reassuring words like “thorough” and passive voice. Excellent.

Word tokens are worth what exactly?

Stephen E Arnold, April 14, 2021

Palantir Fourth Quarter Results Surprise One Financial Pundit

February 22, 2021

I read “Palantir Stock Slides As It Posts a Surprise Loss in Fourth Quarter.” The pundit noted:

Palantir stock has been very volatile this year. It is among the stocks that were been pumped by the Reddit group WallStreetBets. Palantir stock had a 52-week high of $45 amid frenzied buying. However, as has been the case with other meme stocks, it is down sharply from its recent highs. Based on yesterday’s closing prices, Palantir stock has lost almost 30% from its 52-week highs. The drawdown is much lower than what we’ve seen in stocks like GameStop and AMC Theatres. But then, the rise in Palantir stock was also not comparable to the massive gains that we saw in these companies.

Yikes. Worse than GameStop? Quite a comparison.

The pundit pointed out:

Palantir has been diversifying itself away from government business that currently accounts for the bulk of its revenues. This year, it has signed many deals that would help it diversify its revenues. Earlier this month, Palantir announced that it has extended its partnership with energy giant BP for five more years.

Who knew that a company founded in 2003 would have difficulty meeting Wall Street expectations? Maybe that IBM deal and the new US president’s administration can help Palantir Technologies meet financial experts’ expectations?

Search and content processing companies have been worn down by long sales cycles, lower cost competitors, and the friction of customization, training, and fiddling with content intake.

Palantir might be an exception. Stakeholders are discomfited by shocks.

Stephen E Arnold, February 22, 2021

Where Did You Say “Put the Semantic Layer”?

February 10, 2021

Eager to add value to their pricey cloud data-warehouses, cloud vendors are making a case for processing analytics right on their platforms. Providers of independent analytics platforms note such an approach falls short for the many companies that have data in multiple places. VentureBeat reports, “Contest for Control Over the Semantic Layer for Analytics Begins in Earnest.” Writer Michael Vizard tells us:

“Naturally, providers of analytics and business intelligence (BI) applications are treating data warehouses as another source from which to pull data. Snowflake, however, is making a case for processing analytics in its data warehouse. For example, in addition to processing data locally within its in-memory server, Alteryx is now allowing end users to process data directly in the Snowflake cloud. At the same time, however, startups that enable end users to process data using a semantic layer that spans multiple clouds are emerging. A case in point is Kyligence, a provider of an analytics platform for Big Data based on open source Apache Kylin software.”

Alteryx itself acknowledges the limitations of data-analysis solutions that reside on one cloudy platform. The write-up reports:

“Alteryx remains committed to a hybrid cloud strategy, chief marketing officer Sharmila Mulligan said. Most organizations will have data that resides both in multiple clouds and on-premises for years to come. The idea that all of an organization’s data will reside in a single data warehouse in the cloud is fanciful, Mulligan said. ‘Data is always going to exist in multiple platforms,’ she said. ‘Most organizations are going to wind up with multiple data warehouses.’”

Kyligence is one firm working to capitalize on that decentralization. Its analytics platform pulls data from multiple platforms in an online analytical processing database. The company has raised nearly $50 million, and is releasing an enterprise edition of Apache Kylin that will run on AWS and Azure. It remains to be seen whether data warehouses can convince companies to process data on their platforms, but the push is clearly part of the current trend—the pursuit of a never-ending flow of data.

Cynthia Murrell, February 10, 2021

Infodemic: Another Facet of Good Old 2020

November 12, 2020

It is difficult to locate non political, non Covid, and non frightening information. I read “Misinformation in the New Normal” in a technology publication. The essay is descriptive; that is, it does not solve a problem or spell out a fix. It’s like a florid passage in James Fenimore Cooper’s novels. There were some factoids in the essay; for example:

According to one piece of research, websites spreading misinformation about the pandemic received nearly half a billion views via Facebook in April alone…

Source? Not stated.

I also noted this statement in the write up:

As defensive measures evolve, so do the attacks, and the further development of deep fake technology is a worrying growth area for misinformation campaigns. Like fake domains, these altered recordings aim to create a veneer of trust in order to seed bad or dangerous information – but deep fakes are now around five years ahead, in technological development terms, of our ability to defend against them.

Five years? That’s another interesting number: 2025. And the lingo like infodemic? Snappy.

I have added the word “infodemic” to my list of interesting neologisms which contain gems like these: neurosymbolic AI, perception hacks, digital detox, and dissonance score.

But the article “Can the Law Stop Internet Bots from Undressing You?” raises another viewpoint about online data; specifically:

For women and men over the age of 18, the production of a sexual pseudo-image of a person is not in itself illegal under international law or in the UK, even if it is produced and distributed without the consent of the person portrayed in the image.

Have government regulators failed? Have educators been unable to impart ethical values to students? Have clever people embraced the methods of some Silicon Valley-type wizards?

Problem solved in 2025?

Stephen E Arnold, November 12, 2020

ThoughtTrace Launches AI Document Comprehension and Management Combo

November 5, 2020

Great idea—Will it work? “ThoughtTrace Unveils the First All-in-One A.I. Document Understanding and Management Platform,” we learn at PR Newswire. The press release explains:

“Today, ThoughtTrace, Inc., the leader in contract and document analytics for asset intensive industries since 2017, announced the official release of their new Document Understanding platform. The new platform combines self-organizing document management with contract analytics and powerful contextual search to discover critical contract data in seconds, condensing weeks of work down to minutes. ‘ThoughtTrace was built to be fundamentally different from both traditional document management and ‘train your own A.I.’ style contract analytics,’ said Nick Vandivere, Chief Executive Officer at ThoughtTrace. ‘With the new platform we are able to completely disrupt traditional approaches to document review that rely on very structured document organization and workflow, and replace that with the ability for the software to actually understand the meaning of the documents being managed. Rather than rigid processes where several different people need to review a document to understand what it says, just ask ThoughtTrace the appropriate question, and it will surface the appropriate results – even for industry specific language, and across thousands to millions of documents.’”

The “appropriate question,” he says. That may be the sticking point for many users. If one can find the magic wording, ThoughtTrace promises to greatly simplify the process of making difficult decisions. We’re told the platform runs on machine learning models tailored to each industry, so no tweaking is required to get started. It can, however, be customized to automate business processes that involve other applications. ThoughtTrace was founded in 1999 and is based in Houston, Texas.

Cynthia Murrell, November 5, 2020

Linear Math Textbook: For Class Room Use or Individual Study

October 30, 2020

Jim Hefferon’s Linear Algebra is a math textbook. You can get it for free by navigating to this page. From Mr. Hefferon’s Web page for the book, you can download a copy and access a range of supplementary materials. These include:

  • Classroom slides
  • Exercise sets
  • A “lab” manual which requires Sage
  • Video.

The book is designed for students who have completed one semester of calculus. Remember: Linear algebra is useful for poking around in search or neutralizing drones. Zaap. Highly recommended.

Stephen E Arnold, October 30, 2020

Content Management: A New Spin

October 27, 2020

What do you get when a young wizard reinvents information management? First, there was records management. Do you know what that was supposed to do? Yep, manage records and know when to destroy them according to applicable guidelines. Next, there was content management. In the era of the Internet, newly minted experts declared that content destined for a Web site had to be managed. There were some exciting solutions which made some consultants lots of money; for example, Broadvision/Aurea. Excellent solution. Then there was document management exemplified by companies like Exstream Software (which still lives at OpenText as a happy 22 year old solution). These “disciplines” generated much jargon and handwaving, but most of the chatter sank into data lakes and drowned. Once in a while, like Nessie, an XML/JSON monster emerges and roars, “Success. All your content belong to us.” On the shore of the data lake, eDiscovery vendors shiver in fear. Information management is a scary place.

Someone who knows my interest in crazy mid tier consulting speak sent me a link, so I read this article: “The Problem with Books of Record and How an EMS Could Help Solve That Problem.” Now here’s the subtitle: “Execution management systems are a new category of software that unlocks value in the hairball of enterprise IT landscapes. Here’s how.”

The acronym EMS means “execution management systems.” Okay. EMS is similar to CMS (content management systems) but with a difference. Execution has an actionable edge. Execution. Get something done. Terminate with extreme prejudice.

Another clarification appears in the write up:

To be a book of record, the data would be in one place, always current and complete. Today’s business systems often have data stored, redundantly, in many places, with many elements incomplete and possibly out of date.

Okay, a book of record and the reference to the existing content chaos which exists in most of these “management” systems.

I am now into new territory. The filing cabinet has yielded to the data lake which suggests dumping everything in one big pool and relying on keywords, Fancy Dan solutions like natural language processing, and artificial intelligence to deliver what the person looking for information needs. (The craziness of this approach can be relived by reading about the Google Search Appliance or using an enterprise search system to locate a tweet by a crazed marketer who decided to criticize a competitor after a two hour Zoom meeting followed by a couple of cans of Mountain Dew.)

The write up explains:

Solutions like Celonis’ EMS (execution management) exist because few vendors have focused on all these information handshakes. To create a really efficient business environment, the devil is in the nooks, crannies, handoffs, manual steps, integrations, systems changes, queues, and more. Execution management is about documenting, understanding, integrating, streamlining, optimizing and reengineering how work gets done.  Put simply, Celonis’ tools, in short, document processes, mine what’s happening from the underlying systems to see what kinds of tortured paths are being followed to get work done and then, via benchmarks, best practices and smart automation capabilities, straighten out the flow.

Is this a sales pitch for a company called Celonis?


The firm, according to its Web site, is the number one in the execution management system space. I believe everything I read on the Internet.

Several observations:

  • Automation is a hot topic. Hooking information to workflow makes sense.
  • The word choice or attempt at creating awareness with the EMS moniker could be confusing to some. For me, EMS means emergency management solutions.
  • Founded in 2011, Celonis has ingested (according to Crunchbase) more than $300 million in funding. Investors are optimistic and know that the trajectories of FileNet and FatWire are in their future.

The information management revolution continues. At some point, the problem with information in an organization will be solved. On the other hand, it may be one of those approaching infinity thing-a-ma-bobs. You can’t get there from here.

Some corporate executives experience stress when dealing with content and information challenges: Legal discovery, emails with long forgotten data, and references to documents which no longer “exist.”

Net net: Stress can lead to heart attacks. That’s when the real EMS is needed.

Stephen E Arnold, October 27, 2020

Buzzwords and Baloney: Insecurity Signals? No Way. Do You Like My Hair?

October 22, 2020

People like to sound smart and impressive. The belief is if they appear smart and impressive they will rub shoulders with the best of the best. The Next Web says otherwise in the article: “Using Jargon To Sound Smart? Science Says You’re Just Insecure.”

Apparently people who use too much jargon are insecure. Relying on a specialized vocabulary momentarily inflates their ego. This long known truth was proven by the study “Compensatory Conspicuous Communication: Low Status Increases Jargon Use.” The study found that professionals low on the corporate ladder used more acronyms in their written communication and relied on jargon when interacting with higher ranks.

All industries have their jargon, but it is alienating to people outside the specific industry. It is even more alienating to others within the industry, because if they are unfamiliar with the term they will not admit it.

Does this mean people on every corporate ladder rung are insecure? Yup.

Unfortunately you cannot beat jargon users so it is better to join the herd:

“As much as it’s annoying and superfluous, jargon is unlikely to go away. So you literally have two choices: you can embrace it or ignore it. I’m of the opinion that if you can’t beat them, you join them. How? By using a technology bullsh*t generator — yes, you’ve read that correctly. This tool won’t change your life but you’ll definitely have some fun.”

Another fun thing to do with jargon enthusiasts is make up words. It takes practice, but if you speak confidently enough you will soon be “proclaving” [sic] people. Cloudify too.

Whitney Grace, October 23, 2020

Text Analytics: Are These Really the Companies to Watch in the Next 12 Weeks?

October 16, 2020

DarkCyber spotted “Top 10 Text Analytics Companies to Watch in 2020.” Let’s take a quick look at some basic details about each firm:

Alkymi, founded in 2017, makes an email indexing system. The system, according to the company’s Web site, “understands documents using deep learning and visual analysis paired with your human in-the-loop expertise.” Interesting but text analytics appears to be a component of a much larger system. What’s interesting is that the business relies in some degree upon Amazon Web Services. The company’s Web site is https://alkymi.io/.

Aylien Ltd., based in Ireland, appears to be a company with text analysis technology. However, the company’s system is used to create intelligence reports for analysts; for example, government intelligence officers, business analysts, and media outlets. Founded in 2010, the company’s Web site is https://aylien.com.

Hewlett Packard Enterprise. The inclusion of HPE was a bit of a surprise. This outfit once owned the Autonomy technology, but divested itself of the software and services. To replace Autonomy, the company developed “Advanced Text Analysis” which appears to be an enterprise search centric system. The service is available as a Microsoft Azure function and offers 60 APIs (which seems particularly generous) “that deliver deep learning analytics on a wide range of data.” The company’s Web site is https://www.hpe.com/in/en/home.html. One product name jumped out: Ezmeral which may be a made up word.

InData Labs lists data science, AI, AI driven mobile app development, computer vision, machine learning, data capture and optical character recognition, and big data solutions as its services. Its products include face recognition and natural language processing. Perhaps it is the NLP product which equates to text analytics? The firm’s Web site is https://indatalabs.com/. The company was founded in 2014 and operates from Belarus and has a San Francisco presence.

Kapiche, founded in 2016, focuses on “customer insights”. Customer feedback yields insight with “no set up, no manual coding, and results you can trust,” according to the company. The text analytics snaps into services like Survey Monkey and Google Forms, among others. Clients include Target and Toyota. The company is based in Australia with an office in Denver, Colorado. The firm’s Web site is https://www.kapiche.com. The firm offers applied text analytics.

Lexalytics, founded in 2003, was one of the first standalone text analytics vendors. The company’s system allows customers to “tell powerful stories from complex text data.” DarkCyber prefers to learn “stories” from the data, however. In the last 17 years, the company has not gone public nor been acquired. The firm’s Web site is https://www.lexalytics.com/.

MindGap. The MindGap identified in the article is in the business of providing “AI for business.” The company appears to be a mash up of artificial intelligence and “top tier strategy consulting.” That may be true, but we did not spot text analytics among the core competencies. The firm’s clients include Mail.ru, Gazprom, Yandex, and Huawei. The firm’s Web site is https://www.mindgap.dev/. The firm lists two employees on LinkedIn.

Primer has ingested about $60 million in venture funding since it was founded in 2015. The company ingests text and outputs reports. The company was founded by the individual who set up Quid, another analytics company. Government and business analysts consume the outputs of the Primer system. The company’s Web site is https://primer.ai.

Semeon Analytics, now a unit of Datametrex, provides “custom language and sentiment ontology” services. Indexing and entity extraction, among other NLP modules, allow the system to deliver “insight analysis, rapid insights, and sentiment of the highest precision on the market today.” The Semeon Web site is still online at https://semeon.com.

ThoughtTrace appears to focus on analysis of text in contracts. The firm’s Web site says that its software can “find critical contract facts and opportunities.” Text analytics? Possibly, but the wording suggests search and retrieval. The company has a focus on oil and gas and other verticals. The firm’s Web site is https://www.thoughttrace.com/. (Note that the design of the Web site creates some challenges for a person looking for information.) The company, according to Crunchbase, was founded in 1999, and has three employees.

Three companies are what DarkCyber would consider text analytics firms: Aylien, Lexalytics, and Primer. The other firms mash up artificial intelligence, machine learning, and text analytics to deliver solutions which are essentially indexing and workflow tools.

Other observations include:

  1. The list is not a reliable place to locate flagship vendors; specifically, only three of the 10 companies cited in the article could be considered contenders in this sector.
  2. The text analytics capabilities and applications are scattered. A person looking for a system which is designed to handle email would have to examine the 10 listings and work from a single pointer, Alkymi.
  3. The selection of vendors confuses technical disciplines; for example, AI, machine learning, NLP, etc.

The list appears to have been generated in a short Zoom meeting, not via a rigorous selection and analysis process. Perhaps one of the vendors’ text analytics systems could have been used. Primer’s system comes to mind as one possibility. But that, of course, is work for a real journalist today.

Stephen E Arnold, October 16, 2020
