IBM: A Leader in Following?
March 16, 2020
DarkCyber spotted “IBM Prepares To Advance Watson’s Language Ability.” The story appeared in Capital FM, an online publication in Nairobi. That’s okay. What’s interesting is that IBM has announced “the first commercialization of key Natural Language Processing (NLP) capabilities to come from IBM Research’s Project Debater, the only AI system capable of debating humans on complex topics.”
What’s new, aside from the Kenya coverage? Here’s a sampling of the technologies that will allegedly make Watson a superhero: natural language processing. Watson will understand sentiment, which can “identify and analyze idioms and colloquialisms for the first time.” [Emphasis added]
Plus:
IBM is bringing technology from IBM Research for understanding business documents, such as PDF’s and contracts, to also add to their AI models.
Where does the technology originate? Project Debater. There’s also “deep learning based classification which can learn from as few as several hundred samples to do new classifications quickly and easily. It will be added to Watson Discovery later this year.”
Also, there’s another innovation:
It will also exploit natural language through Clustering or Advanced Topic Clustering. Building on insights gained from Project Debater, new topic clustering techniques will enable users to “cluster” incoming data to create meaningful “topics” of related information, which can then be analyzed.
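To make the clustering notion concrete, here is a minimal sketch — DarkCyber’s own toy example using off-the-shelf scikit-learn, not IBM’s Project Debater code — of grouping incoming text into topics. The sample documents and the cluster count are invented for illustration.

```python
# Minimal topic-clustering sketch (illustrative only, not IBM's method).
# Requires scikit-learn: pip install scikit-learn
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "quarterly earnings beat analyst expectations",
    "revenue guidance raised after strong earnings",
    "new smartphone model announced at trade show",
    "hands-on review of the latest smartphone camera",
]

# Turn each document into a TF-IDF vector.
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)

# Group the vectors into two "topics" (cluster count chosen arbitrarily here).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for doc, label in zip(documents, labels):
    print(label, doc)
```

The analyst-facing “meaningful topics” would then be labels or keywords attached to each cluster; this toy version just prints cluster assignments.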
Okay, let’s step back. NLP, quick deep learning, clustering, and the other technologies. My recollection is:
- IBM’s Dharmendra Modha was writing about text clustering in “Large Scale Parallel Data Mining,” which appeared about a decade after the Endeca crowd fired up their functional facets for “Guided Navigation.” Now this clustering is coming to IBM Watson. What?
- In 2003 IBM researchers filed a patent application for “US7130777, Method to hierarchical pooling of opinions from multiple sources.” Now Watson is doing what commercial vendors have been offering for many years; for example, Lexalytics in 2003. Not exactly a textbook case of using homegrown technology or emulating a competitor, is it?
- And NLP dates back to 1993 and the work of Vincent Stanford, Ora Williamson, Elton Sherwin, and Frank Castellucci. See US5615296. These are IBM professionals. And 1993 was more than a quarter century ago.
Net net: Kenya, Watson, and technologies that have been around for decades are part of IBM’s preparations to add functions to Watson. “Prepares”? Yeah, pretty speedy.
Watson? What are you doing? Maybe DarkCyber should ask Alexa?
Stephen E Arnold, March 16, 2020
Medical Surveillance: Numerous Applications for Government Entities and Entrepreneurs
March 16, 2020
With the coronavirus capturing headlines and disrupting routines, how can smart software that monitors data help with the current problem?
DarkCyber assumes that government health professionals would want to make use of technology that reduces coronavirus disruption. Enforcement professionals would understand that monitoring, alerting, and identifying functions could assist in spotting issues; for example, in a particular region.
What’s interesting is that the application of intelware systems and methods to health issues is likely to become a robust business. However effectively these established techniques are applied, identifying signals in a stream of data is an extension of innovations reaching back to i2 Analyst Notebook and other sensemaking systems in wide use in many countries’ enforcement and intelligence agencies.
What’s different is the keen attention these monitoring, alerting, and identifying systems are attracting.
Let’s take one example: BlueDot, a company operating from Canada and founded by an infectious disease physician, Dr. Kamran Khan. This company was one of the first firms to highlight the threat posed by the coronavirus. According to Diginomica, BlueDot “alerted its private sector and government clients about a cluster of unusual pneumonia cases happening around a market in Wuhan, China.”
BlueDot, founded in 2013, combined expertise in infectious disease, artificial intelligence, analytics, and flows of open source and specialized information. “How Canadian AI start-up BlueDot Spotted Coronavirus before Anyone Else Had a Clue” explains what the company did to sound the alarm:
The BlueDot engine gathers data on over 150 diseases and syndromes around the world searching every 15 minutes, 24 hours a day. This includes official data from organizations like the Center for Disease Control or the World Health Organization. But, the system also counts on less structured information. Much of BlueDot’s predictive ability comes from data it collects outside official health care sources including, for example, the worldwide movements of more than four billion travelers on commercial flights every year; human, animal and insect population data; climate data from satellites; and local information from journalists and healthcare workers, pouring through 100,000 online articles each day spanning 65 languages. BlueDot’s specialists manually classified the data, developed a taxonomy so relevant keywords could be scanned efficiently, and then applied machine learning and natural language processing to train the system. As a result, it says, only a handful of cases are flagged for human experts to analyze. BlueDot sends out regular alerts to health care, government, business, and public health clients. The alerts provide brief synopses of anomalous disease outbreaks that its AI engine has discovered and the risks they may pose.
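The quoted description boils down to a filter, classify, and escalate pipeline. The sketch below is DarkCyber’s own simplification, not BlueDot’s code: a keyword taxonomy narrows the article stream, and anything that matches gets flagged for a human analyst. The taxonomy entries and sample articles are invented.

```python
# Simplified filter-and-flag pipeline (illustrative only, not BlueDot's system).

# A tiny keyword taxonomy standing in for a full disease/syndrome taxonomy.
TAXONOMY = {
    "respiratory": ["pneumonia", "respiratory illness", "cough cluster"],
    "vector-borne": ["dengue", "malaria outbreak"],
}

def scan_article(text: str) -> list[str]:
    """Return the taxonomy topics whose keywords appear in the article text."""
    lowered = text.lower()
    return [
        topic
        for topic, keywords in TAXONOMY.items()
        if any(keyword in lowered for keyword in keywords)
    ]

def triage(articles: list[str]) -> list[tuple[str, list[str]]]:
    """Keep only articles that hit the taxonomy; these go to human analysts."""
    flagged = []
    for article in articles:
        topics = scan_article(article)
        if topics:
            flagged.append((article, topics))
    return flagged

if __name__ == "__main__":
    stream = [
        "Local journalists report a cluster of unusual pneumonia cases near a market.",
        "City council debates new parking rules downtown.",
    ]
    for article, topics in triage(stream):
        print("FLAG", topics, "->", article)
```

The real system layers machine learning and natural language processing on top of the keyword pass, as the quote notes; the point here is simply how a taxonomy shrinks 100,000 daily articles to a handful of items worth an expert’s time.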
DarkCyber interprets BlueDot’s pinpointing of the coronavirus as an important achievement. More importantly, DarkCyber sees BlueDot’s system as an example of innovators replicating the systems, methods, procedures, and outputs from intelware and policeware systems.
Independent thinkers arrive at a practical workflow to convert raw data into high-value insights. BlueDot is a company that points the way to the future of deriving actionable information from a range of content.
Some vendors of specialized software work hard to keep their systems and methods confidential and in some cases secret. Now a person interested in how some specialized software and service providers assist government agencies, intelligence professionals, and security experts can read about BlueDot in open source articles like the one cited in this blog post or work through the information on the BlueDot Web site. The company wants to hire a surveillance analyst. Click here for information.
Net net: BlueDot provides a template for innovators wanting to apply systems and methods that once were classified or confidential to commercial problems. Business intelligence may become more like traditional intelligence more quickly than some anticipated.
Stephen E Arnold, March 16, 2020
Smart Software: Biases Are Encoded
March 16, 2020
Big Think published “Busting the Myth of Neutral AI”. There’s an annoying autoplay version of the story, but – thank heaven for small favors – there is a transcript of the lecture.
The main points, despite the wonky packaging, are semi-important; for example:
- “Technology can really amplify biases because we create technologies based on who we are.” The amplification point is related to the reactions caused by social networks or the feedback loops set up in YouTube recommendations.
- Bias is part of the territory: “technologies built upon biases are learning from data sets that are out there and they’re learning from an unequal world. Because our world, we still have to try to perfect our union. We have to think about artificial intelligence in aspirational ways rather than this myth that it’s somehow neutral or scientific or it’s just technology.”
- People are clueless with regard to smart software and its ubiquity: “…All the time we’re interacting with AI systems it’s not disclosed to us; we don’t know what those systems know about us, we don’t know what are the values that guide their decisions, we don’t know how that might shape our lives, we don’t know what alternatives we might provide. All of that is a black box and all of that should be opened up.”
Useful observations.
Stephen E Arnold, March 16, 2020
Click Money from Google: A Digital Dodo?
March 15, 2020
At the beginning of 2020, Google released its 2019 year-end financial report, and some amazing surprises were revealed. ZDNet has the details in the article “The Mysterious Disappearance Of Google’s Click Metric.” For the first time since acquiring YouTube, Google shared revenue for YouTube and its cloud IT business, but it removed information about how much money the company makes from clicks, namely the cost-per-click (CPC) metric and its growth.
What does this mean for Google? Even more puzzling, Wall Street analysts did not question the missing information. The truth is something Google might not want to admit: the key to its revenue is dying, and the company is not happy.
“Google has a rapidly deflating advertising product, sometimes 29% less revenue per click, every quarter, year-on-year, year after year…. Every three months Google has to find faster ways of expanding the total number of paid clicks by as much as 66%. How is this a sustainable business model? There is an upper limit to how much more expansion in paid links can be found especially with the shift to mobile platforms and the constraints of the display. And what does this say about the effectiveness of Google’s ads? They aren’t very good and their value is declining at an astounding and unstoppable pace.”
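The arithmetic behind that passage is worth making explicit. Here is a quick back-of-the-envelope calculation — DarkCyber’s own, taking the 29% and 66% figures from the quote at face value:

```python
# Back-of-the-envelope arithmetic using the figures quoted above.
cpc_drop = 0.29          # revenue per click falls 29%
click_growth = 0.66      # paid clicks expand by as much as 66%

# Click growth needed just to hold revenue flat when each click pays 29% less:
breakeven_growth = 1 / (1 - cpc_drop) - 1
print(f"Click growth needed to stand still: {breakeven_growth:.0%}")  # ~41%

# Net revenue change if clicks grow 66% while CPC falls 29%:
net_change = (1 + click_growth) * (1 - cpc_drop) - 1
print(f"Net revenue change: {net_change:.0%}")                        # ~18%
```

In other words, roughly the first 41 points of that 66% click expansion go to treading water; the rest is the growth Wall Street sees.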
Google might start placing more ads on its search results and other services. It sounds, however, as if Google will place more ineffective ads in more places. The effectiveness of Google’s ads has eroded for years, and there is the question of whether more bots and fewer humans are clicking them. Clicks do not create brands, and most people ignore ads. Don’t you love ads?
Whitney Grace, March 15, 2020
Deep Learning Startups May Encounter a Gotcha
March 14, 2020
Though fragmented, the deep learning AI market is growing rapidly. Anyone wishing to launch (or invest in) such a firm may want to check out Analytics India Magazine’s article, “Common Pitfalls that the Deep Learning Startups Fail to Recognise.” Writer Sameer Balanganur describes prevalent missteps under these headings: Not Investing Enough in Data and Powerful Processors, Not Accounting for the Cloud Charges, Expensive Data Cleansing, The Edge Cases, and Hiring the Right People.
The part that struck me was this description under Expensive Data Cleansing, as it illustrates something many fail to understand:
“Training the model nowadays to achieve the state-of-the-art results [still] involves a lot of manual cleaning and labelling of large datasets. And the process of manual cleaning and labelling is expensive and is one of the largest barriers the deep learning startups face. … Although as time passes, the AI systems are moving towards complete automation, which will significantly reduce the cost. However, these AI-based automation applications still need human intervention for years to come. Even if there is full automation achieved, it’s not clear how much the margin of cost and efficiency will improve, so this becomes a matter of whether one should invest towards processes like drift learning and active learning to enhance the ability.”
We noted:
“Not only expensive, the human intervention sometimes hinders the system’s creativity, but they might also do it by selecting what is essential for an algorithm to process or not using deep learning for a problem it can easily solve. Many times, deep learning is seen as overkill for many problems. The costs incurred by human intervention and cloud are interdependent. Reducing one means an increase in another.”
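The first quoted passage name-checks active learning as one way to trim labeling costs. For readers who want the flavor of the technique, here is a minimal uncertainty-sampling sketch — a generic illustration with invented data, not any particular startup’s pipeline:

```python
# Minimal uncertainty-sampling active learning loop (illustrative sketch).
# Requires numpy and scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: 200 points, 2 features, label depends on the first feature.
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(int)

# Seed the labeled pool with five examples of each class.
labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
unlabeled = [i for i in range(len(X)) if i not in labeled]

for round_ in range(5):
    model = LogisticRegression().fit(X[labeled], y[labeled])

    # Pick the unlabeled point the model is least sure about (probability ~0.5).
    probs = model.predict_proba(X[unlabeled])[:, 1]
    pick = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]

    # "Ask the human" for the label; here we simply reveal the stored label.
    labeled.append(pick)
    unlabeled.remove(pick)

    print(f"round {round_}: labeled pool = {len(labeled)}, "
          f"accuracy on all data = {model.score(X, y):.2f}")
```

The economics follow directly: the humans label only the handful of examples the model cannot decide on, instead of the whole dataset.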
AI investment could be quite profitable, if one considers carefully. As always, look before you leap. See the write-up for more details.
Cynthia Murrell, March 14, 2020
Amazon Versus Microsoft: A Jedi Fight Development
March 13, 2020
DarkCyber spotted this story on the BBC Web site: “Pentagon to Reconsider Jedi $10bn Cloud Contract.” Since we are in rural Kentucky, the intrepid team does not know if the information in the Beeb’s write up is accurate. The factoids are definitely interesting. The story asserts:
The US Department of Defense is to “reconsider” its decision to award a multi-billion dollar cloud contract to Microsoft over Amazon.
The story points out that Microsoft is confident that its Azure system will prevail. Amazon, on the other hand, is allegedly pleased.
What’s at stake?
- Money
- A hunting license for other government contracts
- Implicit endorsement of either AWS or Azure
- Happy resellers, integrators, and consultants
- Ego (maybe?)
When will JEDI be resolved? Possibly in the summer of 2020.
Stephen E Arnold, March 13, 2020
Google and Amazon: Two Dominant Dogs Snap and Snarl at One Another
March 13, 2020
DarkCyber read “How Google Kneecapped Amazon’s Smart TV Efforts.” The uptake on criminal lingo continues. For those not hip to the argot of some technology savvy professionals, the Urban Dictionary defines the concept this way:
The act of permanently destroying someone’s kneecaps. Often done with a firearm (as popularized in film and television), a baseball bat or lead pipe or other blunt instrument, or a power drill (often used in conjunction with a countersunk drill bit and popular with the IRA).
Yes, the elegance of business competition requires these metaphors, it seems. DarkCyber thinks the article is “about” the collision of cleverness and rapaciousness. But enough of our philosophical wanderings. What did Google do to Amazon, assuming online services have joints which keep bone and joint doctors busy?
The write up states:
Any company that licenses Google’s Android TV operating system for some of its smart TVs or even uses Android as a mobile operating system has to agree to terms that prevent it from also building devices using forked versions of Android like Amazon’s Fire TV operating system, according to multiple sources. If a company were to break those terms, it could lose access to the Play Store and Google’s apps for all of its devices.
Ah, ha! The kneecapping is not physical; those making devices sign a contract.
Plus, there’s another Googley twist of the 6 mm drill bit, a metaphor for kneecapping explained above:
At the center of Google’s efforts to block Amazon’s smart TV ambitions is the Android Compatibility Commitment — a confidential set of policies formerly known as the Anti-Fragmentation Agreement — that manufacturers of Android devices have to agree to in order to get access to Google’s Play Store. Google has been developing Android as an open-source operating system, while at the same time keeping much tighter control of what device manufacturers can do if they want access to the Play Store as well as the company’s suite of apps. For Android TV, Google’s apps include a highly customized launcher, or home screen, optimized for big-screen environments, as well as a TV version of its Play Store. Google policies are meant to set a baseline for compatible Android devices and guarantee that apps developed for one Android device also work on another. The company also gives developers some latitude, allowing them to build their own versions of Android based on the operating system’s open source code, as long as they follow Google’s compatibility requirements.
Interesting.
How will the issue be resolved? Legal eagles will flap and squawk. Customers can vote with their purchases. But TVs cost very little because “advertising” and data are often useful sources of revenue. Regulators can regulate, just as they have since Google and Amazon discovered the benefits of their interesting business activities.
Regardless of the outcome between the assailant and the victim, the article reveals some of the more charming facets of two “must have” businesses. How can a person advance his or her understanding of the kneecapping allegation?
DarkCyber will run a Google query for business ethics and purchase a copy of Business Ethics: Best Practices for Designing and Managing Ethical Organizations from Amazon. You have to find your own way through the labyrinths of the underworld, you gangster, no mercy, no malice, as the pundit, scholar, entrepreneur, and media phenomenon Scott Galloway has said.
Stephen E Arnold, March 13, 2020
Banjo: A How To for Procedures Once Kept Secret
March 13, 2020
DarkCyber wrote about BlueDot and how the company makes reasonably clear the steps it takes to derive actionable intelligence from open source and some other types of data. Ten years ago, the processes implemented by BlueDot would have been shrouded in secrecy.
From Secrets to Commercial Systems
Secret and classified information seems to find its way into social media and the mainstream media. DarkCyber noted another example of a company utilizing some interesting methods written up in a free online publication.
DarkCyber can visualize old-school companies depending on sales to law enforcement and the intelligence community asking themselves, “What’s going on? How are commercial firms getting this know how? Why are how to and do it yourself travel guides to intelligence methods becoming so darned public?”
It puzzles DarkCyber as well.
Let’s take a look at the revelations in “Surveillance Firm Banjo Used a Secret Company and Fake Apps to Scrape Social Media.” The write up explains:
- A company called Pink Unicorn Labs created apps which obtained information from users. Users did not know their data were gathered, filtered, and cross-correlated.
- Banjo, an artificial intelligence firm that works with police, used a shadow company to create an array of Android and iOS apps that looked innocuous but were specifically designed to secretly scrape social media. The developer of the apps was Pink Unicorn. Banjo CEO Damien Patton created Pink Unicorn.
- Why create apps that seemed to do one thing while performing data inhalation? Dataminr received an investment from Twitter and has access to the Twitter fire hose. Banjo, the write up says, “did not have that sort of data access.” The fix? Create apps that sucked data.
- The apps obtained information from Facebook, Twitter, Instagram, Russian social media app VK, FourSquare, Google Plus, and Chinese social network Sina Weibo.
- The article points out: “Once users logged into the innocent looking apps via a social network OAuth provider, Banjo saved the login credentials, according to two former employees and an expert analysis of the apps performed by Kasra Rahjerdi, who has been an Android developer since the original Android project was launched. Banjo then scraped social media content.”
- The write up explains that Banjo, via a deal with Utah, has access to the “state’s traffic, CCTV, and public safety cameras. Banjo promises to combine that input with a range of other data such as satellites and social media posts to create a system that it claims alerts law enforcement of crimes or events in real-time.”
Discussion
Why social media? On the surface and to most parents and casual users of Facebook, Twitter, and YouTube, there are quite a few cat posts. But via the magic of math, an analyst or a script can look for data which fill in missing information. The idea is to create a record of a person, leave blanks where desirable information is not yet plugged in, and then rely on software to spot the missing item.

How is this accomplished? One known fact appears in the profile, and that fact also appears in another, unrelated item of content. The correlated item of content is scanned by a script, and any information missing from the profile is plugged in. Using this method and content from different sources, a clever system can compile a dossier on an entity. Open source information yields numerous gems; for example, a cute name applied to a boyfriend might become part of a person of interest’s Dark Web handle. Phone numbers, geographic information, friends, and links to other interesting content surface. Scripts work through available data, which can be obtained in many ways. The methods are those which were shrouded in secrecy before the Internet started publishing essays revealing what some have called “tradecraft.”
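To make the fill-in-the-blanks idea concrete, here is a minimal sketch — DarkCyber’s own toy example with invented names, handles, and sources, not any vendor’s system — of joining items of content on a known fact and copying values into a profile’s empty fields:

```python
# Illustrative sketch of the "fill in the missing fields" idea described above.
# All names, handles, and sources are invented.

profile = {
    "name": "J. Doe",
    "nickname": "skyhawk77",
    "phone": None,
    "city": None,
}

# Items of content scraped from unrelated open sources.
open_source_items = [
    {"source": "forum", "nickname": "skyhawk77", "city": "Springfield"},
    {"source": "marketplace", "nickname": "skyhawk77", "phone": "555-0100"},
    {"source": "blog", "nickname": "otheruser", "city": "Shelbyville"},
]

def enrich(profile: dict, items: list[dict], join_key: str = "nickname") -> dict:
    """Copy fields from items that share the join key into the profile's blanks."""
    for item in items:
        if item.get(join_key) != profile.get(join_key):
            continue  # not the same entity, skip
        for field, value in item.items():
            if field in profile and profile[field] is None:
                profile[field] = value
    return profile

print(enrich(profile, open_source_items))
# -> phone and city are now filled in from the matching items
```

Scale the same loop across billions of items and dozens of sources, and the dossier writes itself.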
Net Net
Banjo troubles DarkCyber on a number of levels:
- Secrecy has significant benefits. Secrets, once let loose, have interesting consequences.
- Users are unaware of the risks apps pose. Cluelessness is in some cases problematic.
- The “now” world looks more like an intelligence agency than a social construct.
Stephen E Arnold, March 13, 2020
In the UK, Brexit Leads to Taxit for Techs
March 13, 2020
US technology companies are likely to face a rocky 2020. The coronavirus is creating some problems. If the information in “US Tech Companies Will Be Hit with New UK Tax in Just Three Weeks” is accurate, those juicy margins may be trimmed. The write up states:
The UK government said Wednesday [March 11, 2020] that it’s moving ahead with a 2% tax on revenue from digital services such as search and advertising starting on April 1. The levy will apply to firms with global sales of more than £500 million ($648 million), with at least £25 million ($32.4 million) coming from UK users.
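For illustration only, here is how the quoted thresholds interact in a simple calculation. The sketch assumes the 2% applies to the full UK-derived revenue once both thresholds are crossed, which glosses over allowances and other details in the actual legislation; the example firm is hypothetical.

```python
# Rough illustration of the thresholds quoted above (not tax advice; the real
# rules include allowances this sketch ignores).
GLOBAL_THRESHOLD = 500_000_000   # £500 million in global digital services sales
UK_THRESHOLD = 25_000_000        # £25 million from UK users
RATE = 0.02                      # 2% levy

def uk_digital_services_tax(global_revenue: float, uk_revenue: float) -> float:
    """Return the levy under the simplifying assumption stated above."""
    if global_revenue > GLOBAL_THRESHOLD and uk_revenue > UK_THRESHOLD:
        return RATE * uk_revenue
    return 0.0

# Hypothetical firm: £2 billion global, £300 million from UK users.
print(uk_digital_services_tax(2_000_000_000, 300_000_000))  # 6,000,000.0
```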
Is the tax discriminatory? Yep.
What happens if the US technology companies pay up?
That’s easy. There are a number of European entities eager to implement a taxation model that generates revenue.
What happens if the US retaliates?
There will be collateral damage.
How likely are countries to escalate if the tax fails? Some may implement a simple but draconian solution: throttling or blocking, maybe?
Monopolies are good for those who obtain money from the firms in the catbird seat. Some European countries may not share the same view.
Stephen E Arnold, March 13, 2020
Google Creates a Podcast about Marketing
March 13, 2020
Just a quick note. Google now outputs “Think with Google Podcast.” You can listen to show #2 at this link. The subject is “Captivating Creative.” Not much in terms of technical information, but the Mad Ave types may go ga-ga over the breezy style and fluffy content. One amusing aspect of the show is that Google wants to know more about you. Listeners are enjoined to take a survey about the show. The appeal takes place before the show. Imagine. Google wants to know more about you. What a surprise. Now how about a search engine for podcasts? Oh, right. Google has one. It’s super too.
Stephen E Arnold, March 13, 2020