Buzzwords and Baloney: Insecurity Signals? No Way. Do You Like My Hair?
October 22, 2020
People like to sound smart and impressive. The belief is that if they appear smart and impressive, they will rub shoulders with the best of the best. The Next Web says otherwise in the article: “Using Jargon To Sound Smart? Science Says You’re Just Insecure.”
Apparently people who use too much jargon are insecure. Relying on a specialized vocabulary momentarily inflates their ego. This long known truth was confirmed by the study “Compensatory Conspicuous Communication: Low Status Increases Jargon Use.” The study found that professionals low on the corporate ladder used more acronyms in their written communication and relied on jargon when interacting with higher ranks.
All industries have their jargon, but it is alienating to people outside the specific industry. It is even more alienating to others within the industry, because if they are unfamiliar with the term they will not admit it.
Does this mean people on every corporate ladder rung have insecurities? Yup.
Unfortunately you cannot beat jargon users so it is better to join the herd:
“As much as it’s annoying and superfluous, jargon is unlikely to go away. So you literally have two choices: you can embrace it or ignore it. I’m of the opinion that if you can’t beat them, you join them. How? By using a technology bullsh*t generator — yes, you’ve read that correctly. This tool won’t change your life but you’ll definitely have some fun.”
Another fun thing to do with jargon enthusiasts is make up words. It takes practice, but if you speak confidently enough you will soon be “proclaving” [sic] people. Cloudify too.
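The generator the quote mentions works by bolting random jargon together, and a toy version is easy to sketch. Every word list below is invented for illustration; a real generator draws on much larger vocabularies:

```python
import random

# Invented word lists for illustration only.
ADVERBS = ["synergistically", "holistically", "proactively", "seamlessly"]
VERBS = ["leverage", "operationalize", "incentivize", "cloudify"]
ADJECTIVES = ["scalable", "mission-critical", "next-generation", "frictionless"]
NOUNS = ["paradigms", "ecosystems", "deliverables", "synergies"]

def buzz(rng=random):
    """Return one jargon-laden phrase assembled at random."""
    return " ".join(
        [rng.choice(ADVERBS), rng.choice(VERBS),
         rng.choice(ADJECTIVES), rng.choice(NOUNS)]
    )

print(buzz())  # e.g. "proactively leverage scalable synergies"
```

Speak the output confidently in a meeting and see who nods.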
Whitney Grace, October 23, 2020
Text Analytics: Are These Really the Companies to Watch in the Next 12 Weeks?
October 16, 2020
DarkCyber spotted “Top 10 Text Analytics Companies to Watch in 2020.” Let’s take a quick look at some basic details about each firm:
Alkymi, founded in 2017, makes an email indexing system. The system, according to the company’s Web site, “understands documents using deep learning and visual analysis paired with your human in-the-loop expertise.” Interesting, but text analytics appears to be a component of a much larger system. Also notable: the business relies to some degree upon Amazon Web Services. The company’s Web site is https://alkymi.io/.
Aylien Ltd., based in Ireland, appears to be a company with text analysis technology. However, the company’s system is used to create intelligence reports for analysts; for example, government intelligence officers, business analysts, and media outlets. Founded in 2010, the company’s Web site is https://aylien.com.
Hewlett Packard Enterprise. The inclusion of HPE was a bit of a surprise. This outfit once owned the Autonomy technology, but divested itself of the software and services. To replace Autonomy, the company developed “Advanced Text Analysis,” which appears to be an enterprise search centric system. The service is available as a Microsoft Azure function and offers 60 APIs (which seems particularly generous) “that deliver deep learning analytics on a wide range of data.” The company’s Web site is https://www.hpe.com/in/en/home.html. One product name jumped out: Ezmeral, which may be a made-up word.
InData Labs lists data science, AI, AI driven mobile app development, computer vision, machine learning, data capture and optical character recognition, and big data solutions as its services. Its products include face recognition and natural language processing. Perhaps it is the NLP product which equates to text analytics? The firm’s Web site is https://indatalabs.com/. The company was founded in 2014 and operates from Belarus with a San Francisco presence.
Kapiche, founded in 2016, focuses on “customer insights”. Customer feedback yields insight with “no set up, no manual coding, and results you can trust,” according to the company. The text analytics snaps into services like Survey Monkey and Google Forms, among others. Clients include Target and Toyota. The company is based in Australia with an office in Denver, Colorado. The firm’s Web site is https://www.kapiche.com. The firm offers applied text analytics.
Lexalytics, founded in 2003, was one of the first standalone text analytics vendors. The company’s system allows customers to “tell powerful stories from complex text data.” DarkCyber prefers to learn “stories” from the data, however. In the last 17 years, the company has not gone public nor been acquired. The firm’s Web site is https://www.lexalytics.com/.
MindGap. The MindGap identified in the article is in the business of providing “AI for business.” The company appears to be a mash up of artificial intelligence and “top tier strategy consulting.” That may be true, but we did not spot text analytics among the core competencies. The firm’s clients include Mail.ru, Gazprom, Yandex, and Huawei. The firm’s Web site is https://www.mindgap.dev/. The firm lists two employees on LinkedIn.
Primer has ingested about $60 million in venture funding since it was founded in 2015. The company ingests text and outputs reports. The company was founded by the individual who set up Quid, another analytics company. Government and business analysts consume the outputs of the Primer system. The company’s Web site is https://primer.ai.
Semeon Analytics, now a unit of Datametrex, provides “custom language and sentiment ontology” services. Indexing and entity extraction, among other NLP modules, allows the system to deliver “insight analysis, rapid insights, and sentiment of the highest precision on the market today.” The Semeon Web site is still online at https://semeon.com.
ThoughtTrace appears to focus on analysis of text in contracts. The firm’s Web site says that its software can “find critical contract facts and opportunities.” Text analytics? Possibly, but the wording suggests search and retrieval. The company has a focus on oil and gas and other verticals. The firm’s Web site is https://www.thoughttrace.com/. (Note that the design of the Web site creates some challenges for a person looking for information.) The company, according to Crunchbase, was founded in 1999, and has three employees.
Three companies are what DarkCyber would consider text analytics firms: Aylien, Lexalytics, and Primer. The other firms mash up artificial intelligence, machine learning, and text analytics to deliver solutions which are essentially indexing and workflow tools.
Other observations include:
- The list is not a reliable place to locate flagship vendors; specifically, only three of the 10 companies cited in the article could be considered contenders in this sector.
- The text analytics capabilities and applications are scattered. A person looking for a system which is designed to handle email would have to examine the 10 listings and work from a single pointer, Alkymi.
- The selection of vendors confuses technical disciplines; for example, AI, machine learning, NLP, etc.
The list appears to have been generated in a short Zoom meeting, not via a rigorous selection and analysis process. Perhaps one of the vendors’ text analytics systems could have been used. Primer’s system comes to mind as one possibility. But that, of course, is work for a real journalist today.
Stephen E Arnold, October 16, 2020
Sentiment Analysis with Feeling
September 25, 2020
As AI technology progresses, so too does the field of sentiment analysis. What could go wrong? Sinapticas explores “How Algorithms Discern Our Mood from What We Write Online.” Reporter Dana Mackenzie begins with an example we can truly relate to right now:
“Many people have declared 2020 the worst year ever. While such a description may seem hopelessly subjective, according to one measure, it’s true. That yardstick is the Hedonometer, a computerized way of assessing both our happiness and our despair. It runs day in and day out on computers at the University of Vermont (UVM), where it scrapes some 50 million tweets per day off Twitter and then gives a quick-and-dirty read of the public’s mood. According to the Hedonometer, 2020 has been by far the most horrible year since it began keeping track in 2008. The Hedonometer is a relatively recent incarnation of a task computer scientists have been working on for more than 50 years: using computers to assess words’ emotional tone. To build the Hedonometer, UVM computer scientist Chris Danforth had to teach a machine to understand the emotions behind those tweets — no human could possibly read them all. This process, called sentiment analysis, has made major advances in recent years and is finding more and more uses.”
The accompanying “Average Happiness for Twitter” graph is worth a gander, and well illustrates the concept (and the ride that has been 2020 thus far). The article is a good introduction to sentiment analysis. It contrasts lexicon-based with the more complex neural network approaches. We learn neural networks may never completely eclipse lexicon-based systems because of the immense computing power the neural approaches require. The Hedonometer, for example, uses a lexicon.
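The lexicon approach is simple enough to sketch in a few lines: the Hedonometer rates words on a one-to-nine happiness scale and averages the scores of the words it recognizes. The tiny lexicon and its scores below are invented stand-ins for the instrument’s much larger word list:

```python
# Minimal lexicon-based scorer in the spirit of the Hedonometer.
# Scores run from 1 (sad) to 9 (happy); these entries are invented
# for illustration, not taken from the real word list.
LEXICON = {
    "love": 8.4, "happy": 8.3, "good": 7.9,
    "virus": 2.9, "terrible": 2.0, "worst": 2.1,
}

def happiness(text):
    """Average the lexicon scores of recognized words; None if no match."""
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else None

print(happiness("the worst virus year"))  # (2.1 + 2.9) / 2 = 2.5
```

Scale the dictionary to thousands of words and the text to 50 million tweets a day, and you have the quick-and-dirty public mood read the article describes.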
Mackenzie also describes several applications of sentiment analysis, like predicting mental health, assessing prevailing attitudes on issues of the day, and, of course, supplying business intelligence. And the hedonism of the hedonometer, of course.
Cynthia Murrell, September 25, 2020
Fixing Language: No Problem
August 7, 2020
Many years ago I studied with a fellow who was the world’s expert on the morpheme _burger. Yep, hamburger, cheeseburger, dumbburger, nothingburger, and so on. His name was Dr. Lev Sudek (I think that was his last name, but after 50 years former teachers blur in my mind like a smidgen of mustard on a stupidburger). I do recall his lecture on Indo-European languages, the importance of Sanskrit, and the complexity of Lithuanian nouns. (Why Lithuanian? Many, many inflections.) Those languages evolving or de-volving from Sanskrit or ur-Sanskrit differentiated among male, female, neuter, singular, plural, and other forms. I am thinking 16 for nouns, but again I am blurring the Sriracha on the Incredible burger.
This morning, as I wandered past the Memoryburger Restaurant, I spotted “These Are the Most Gender-Biased Languages in the World (Hint: English Has a Problem).” The write up points out that Carnegie Mellon analyzed languages and created a list of biased languages. What are the languages with an implicit problem regarding bias? Here is a list of the top 10 gender abusing, sexist pig languages:
- Danish
- German
- Norwegian
- Dutch
- Romanian
- English
- Hebrew
- Swedish
- Mandarin
- Persian
English is number 6, and if I understand Fast Company’s headline, English has a problem. Apparently Chinese and Persian do too, but the write up tiptoes around these linguistic land mines. Go after the Covid-ridden, socially unstable, and financially stressed English speakers. Yes, ignore the Danes, the Germans, the Norwegians, the Dutch, and the Romanians.
So what’s the fix for the offensive English speakers? The write up dodges this question, narrowing to algorithmic bias. I learned:
The implications are profound: This may partially explain where some early stereotypes about gender and work come from. Children as young as 2 exercise these biases, which cannot be explained by kids’ lived experiences (such as their own parents’ jobs, or seeing, say, many female nurses). The results could also be useful in combating algorithmic bias.
Profound indeed. But the French have a simple, logical, and “c’est top” solution. The Académie Française. This outfit is the reason why an American draws a sneer when asking where the computer store is in Nimes. The Académie Française does not want anyone trying to speak French to use a disgraced term like computer.
How’s that working out? Hashtag and Franglish are chugging right along. That means that legislating language is not getting much traction. You can read a 290 page dissertation about the dust up. Check out “The Non Sexist Language Debate in French and English.” A real thriller.
The likelihood of enforcing specific language and usage changes on the 10 worst offenders strikes me as slim. Language changes, and I am not sure the morpheme –burger expert understood decades ago how politicallycorrectburgers could fit into an intellectual menu.
Stephen E Arnold, August 7, 2020
Tick Tock Becomes Tit for Tat: The Apple and Xiao-i Issue
August 5, 2020
Okay, let’s get the company names out of the way:
- Shanghai Zhizhen Network Technology Company is known as Zhizhen
- Zhizhen is also known as Xiao-i
- Apple is the outfit with the virtual assistant Siri.
Zhizhen owns a patent for a virtual assistant. In 2013, Apple was sued for violating a Chinese patent. Apple let loose a flock of legal eagles to demonstrate that its patents were in force and that a Chinese voice recognition patent was invalid. The Chinese court denied Apple’s argument.
Tick tock tick tock went the clock. Then the alarm sounded. Xiao-i owns the Chinese patent, and that entity is suing Apple.
“Apple Faces $1.4B Suit from Chinese AI Company” reports:
Shanghai Zhizhen Network Technology Co. said in a statement on Monday it was suing Apple for an estimated 10 billion yuan ($1.43 billion) in damages in a Shanghai court, alleging the iPhone and iPad maker’s products violated a patent the Chinese company owns for a virtual assistant whose technical architecture is similar to Siri. Siri, a voice-activated function in Apple’s smartphones and laptops, allows users to dictate text messages or set alarms on their devices.
But more than the money, the Xiao-i outfit “asked Apple to stop sales, production, and the use of products flouting such a patent.”
Coincidence? Maybe. The US wants to curtail TikTok, and now Xiao-i wants to put a crimp in Apple’s China revenues.
Several observations:
- More trade related issues are likely
- Intellectual property disputes will become more frequent. China will use its patents to inhibit American business. This is a glimpse of a future in which the loss of American knowledge value adds friction to US activities
- Downstream consequences are likely to ripple through non-Chinese suppliers of components and services to Apple. China is using Apple to make a point about the value of Chinese intellectual property and the influence of today’s China.
Just as China has asserted its cyber capabilities, the Apple patent dispute, regardless of its outcome, is another example of China understanding American tactics, modifying them, and using them to try to gain increased economic, technical, and financial advantage.
Stephen E Arnold, August 3, 2020
Natural Language Processing: Useful Papers Selected by an Informed Human
July 28, 2020
Nope, no artificial intelligence involved in this curated list of papers from a recent natural language conference. Ten papers are available with a mouse click. Quick takeaway: Adversarial methods seem to be a hot ticket. Navigate to “The Ten Must Read NLP/NLU Papers from the ICLR 2020 Conference.” Useful editorial effort and a clear, adult presentation of the bibliographic information. Kudos to jakubczakon.
Stephen E Arnold, July 27, 2020
Optical Character Recognition for Less
July 10, 2020
Optical character recognition software has been priced high, low, and in between. Sure, the software mostly worked, if you like fixing four or five errors per scanned page of 100 words; that is roughly 95 percent word accuracy. Oh, you use small sized type? That’s eight to 10 errors per scanned page. Good enough, I suppose.
You may want to check out EasyOCR, now available via Github. The information page says:
Ready-to-use OCR with 40+ languages supported including Chinese, Japanese, Korean and Thai.
Worth a look.
Stephen E Arnold, July 10, 2020
CFO Surprises: Making Smart Software Smarter
April 27, 2020
“The Cost of Training NLP Models” is a useful summary. However, the write up leaves out some significant costs.
The focus of the paper is to:
review the cost of training large-scale language models, and the drivers of these costs.
The cost factors discussed include:
- The paradox of compute costs going down while the cost of processing data goes up, a lot. The reason is that more data are needed and more data can be crunched more quickly. Zoom go the costs.
- The unknown unknowns associated with processing the appropriate amount of data to make the models work as well as they can
- The wide use of statistical models which have a voracious appetite for training data.
These are valid points. However, the costs of training include other factors, and these are significant as well; for example:
- The direct and indirect costs associated with creating training sets
- The personnel costs required to assess and define retraining and the information assembly required for that retraining
- The costs of normalizing training corpora.
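A back-of-the-envelope sketch shows how the compute drivers alone compound. The roughly six floating point operations per parameter per token is a common rule of thumb, not a figure from the paper, and every number in the example run below is an assumption for illustration:

```python
def training_cost_usd(tokens, params, price_per_gpu_hour,
                      gpu_flops, utilization=0.3):
    """Rough compute-only estimate: ~6 FLOPs per parameter per token,
    divided by sustained GPU throughput, priced by the hour.
    Ignores the labeling, retraining, and normalization costs
    that also belong in the total."""
    flops = 6 * params * tokens
    gpu_hours = flops / (gpu_flops * utilization) / 3600
    return gpu_hours * price_per_gpu_hour

# Hypothetical run: a 1-billion-parameter model on 10 billion tokens,
# $3 per GPU-hour, 1e14 FLOP/s peak per GPU, 30% utilization.
cost = training_cost_usd(tokens=1e10, params=1e9,
                         price_per_gpu_hour=3.0, gpu_flops=1e14)
print(f"~${cost:,.0f}")
```

Double the token count or the parameter count and the bill doubles with it, which is the paradox in miniature: cheaper FLOPs, bigger appetites, higher totals.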
More research into the costs of smart software training and tuning is required.
Stephen E Arnold, April 28, 2020
Acquisdata: High Value Intelligence for Financial and Intelligence Analysts
March 31, 2020
Are venture capitalists, investment analysts, and other financial professionals like intelligence officers? The answer, according to James Harker-Mortlock, is, “Yes.”
The reasons, as DarkCyber understands them, are:
- Financial professionals, to be successful, have to be data omnivores; that is, they must ingest masses of data, of different types, from continuously flowing inputs
- The need for near real time or real time data streams can make the difference between a profit and a loss
- The impact of changing work patterns on the trading floor is forcing both boutique investment firms and global giants to rely upon smart software for a competitive edge. These smart systems require data for training machine learning modules.
James Harker-Mortlock, founder of Acquisdata, told DarkCyber:
“The need for high-value data from multiple sources in formats easily imported into analytic engines is growing rapidly. Our Acquisdata service provides what the financial analysts and their smart software require. We have numerous quant driven hedge funds downloading all our data every week to assist them in maintaining a comprehensive picture of their target companies and industries.”
According to the company’s Web site:
Acquisdata is a fast growing digital financial publishing company. Established in 2010, we have quickly become a provider to the world’s leading financial news companies, including Thomson Reuters/Refinitiv, Bloomberg, Factset, IHS Markit, and Standard and Poor’s Capital IQ, part of McGraw Hill Financial, and ISI Emerging Markets. We also provide content to a range of global academic and business database providers, including EBSCO, ProQuest, OCLC, Research & Markets, CNKI and Thomson Reuters West. We know and understand the electronic publishing business well. Our management has experience in the electronic publishing industry going back 40 years. We aim to provide comprehensive and timely information for investors and others interested in the drivers of the global economy, primarily through our core products, the Industry SnapShot, Company SnapShot and Executive SnapShot products. Our units provide the annual and interim reports of public companies around the world and fundamental research on companies in emerging markets sectors, and aggregated data from third-party sources. In a world where electronic publishing is quickly changing the way we consume news and information, Acquisdata is at the very forefront of providing digital news and content solutions.
DarkCyber was able to obtain one of the firm’s proprietary Acquisdata Industry Snapshots. “United States Armaments, 16 March 2020” provides a digest of information about the US weapons industry. The contents of the 66-page report include news and commentary, selected news releases, research data, industry sector data, and company-specific information.
Obtaining these types of information from many commercial sources poses a problem for a financial professional: some reports are in Word files; some are in Excel; some are in Adobe PDF image format; and some are in formats proprietary to a data aggregator. Acquisdata delivers its data in XML, which can be easily imported into an analytic system; for example, Palantir Metropolis or a similar analytical tool. PDF versions of the more than 100 weekly reports are available.
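The XML delivery is the practical point: a standard-library parser can flatten such a feed into rows for an analytic engine with no per-format wrangling. The tags and figures below are hypothetical, not Acquisdata’s actual schema:

```python
import xml.etree.ElementTree as ET

# Hypothetical snapshot fragment; the vendor's real schema may differ.
SNAPSHOT = """
<snapshot industry="US Armaments" date="2020-03-16">
  <company><name>Example Corp</name><revenue>1200</revenue></company>
  <company><name>Sample Industries</name><revenue>800</revenue></company>
</snapshot>
"""

def to_rows(xml_text):
    """Flatten <company> elements into (name, revenue) tuples."""
    root = ET.fromstring(xml_text)
    return [(c.findtext("name"), float(c.findtext("revenue")))
            for c in root.iter("company")]

print(to_rows(SNAPSHOT))
# [('Example Corp', 1200.0), ('Sample Industries', 800.0)]
```

Rows in this shape drop straight into a dataframe, a spreadsheet, or an analytics platform; a Word file or an image-only PDF does not.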
DarkCyber’s reaction to these intelligence “briefs” was positive. The approach is similar to the briefing documents prepared for the White House.
Net net: The service is of high value and warrants a close look for professionals who need current, multi-type data about a range of company and industry investment opportunities.
You can get more information about Acquisdata at www.acquisdata.com.
Stephen E Arnold, March 31, 2020
Content for Deep Learning: The Lionbridge View
March 17, 2020
Here is a handy resource. Lionbridge AI shares “The Best 25 Datasets for Natural Language Processing.” The list is designed as a starting point for those just delving into NLP. Writer Meiryum Ali begins:
“Natural language processing is a massive field of research. With so many areas to explore, it can sometimes be difficult to know where to begin – let alone start searching for data. With this in mind, we’ve combed the web to create the ultimate collection of free online datasets for NLP. Although it’s impossible to cover every field of interest, we’ve done our best to compile datasets for a broad range of NLP research areas, from sentiment analysis to audio and voice recognition projects. Use it as a starting point for your experiments, or check out our specialized collections of datasets if you already have a project in mind.”
The suggestions are divided by purpose. For use in sentiment analysis, Ali notes one needs to train machine learning models on large, specialized datasets like the Multidomain Sentiment Analysis Dataset or the Stanford Sentiment Treebank. Some text datasets she suggests for natural language processing tasks like voice recognition or chatbots include 20 Newsgroups, the Reuters News Dataset, and Princeton University’s WordNet. Audio speech datasets that made the list include the audiobooks of LibriSpeech, the Spoken Wikipedia Corpora, and the Free Spoken Digit Dataset. The collection concludes with some more general-purpose datasets, like Amazon Reviews, the Blogger Corpus, the Gutenberg eBooks List, and a set of questions and answers from Jeopardy. See the write-up for more on each of these entries as well as the rest of Ali’s suggestions in each category.
This being a post from Lionbridge, an AI training data firm, it naturally concludes with an invitation to contact them when ready to move beyond these pre-made datasets to one customized for you. Based in Waltham, Massachusetts, the company was founded in 1996 and acquired by H.I.G. Capital in 2017.
Cynthia Murrell, March 17, 2020