Is Deep Learning About to Hit a Wall?
August 13, 2020
Given the complexity of deep learning computations, it should be no surprise the technology uses an abundance of computing power. After analyzing over a thousand arXiv research papers and other sources, a team of researchers has determined just how much power all this image classification, object detection, question answering, named entity recognition, and machine translation takes now and, in theory, will take in the future. Researchers from MIT, the MIT-IBM Watson AI Lab, Underwood International College, and the University of Brasilia contributed to the research. Interesting Engineering summarizes the results in “Deep Learning Reaching Computational Limits, Warns New MIT Study.” Reporter Loukia Papadopoulos writes:
“They did so by conducting two separate analyses of computational requirements: (1) Computation per network pass (the number of floating-point operations required for a single pass in the network), and (2) Hardware burden (the computational capability of the hardware used to train the model). The researchers found that just three years of algorithmic improvement was equivalent to a 10 times increase in computing power. They concluded that if progress continues along the same lines, deep learning’s computational requirements will quickly become technically, economically, and environmentally prohibitive. However, all is not lost.
“‘Despite this, we find that the actual computational burden of deep learning models is scaling more rapidly than (known) lower bounds from theory, suggesting that substantial improvements might be possible,’ wrote the coauthors. The researchers found that there are deep learning improvements at the algorithmic level taking place all the time. Some of these include hardware accelerators, field-programmable gate arrays (FPGAs), and application-specific integrated circuits (ASICs). Time will tell whether deep learning will become more efficient or be replaced altogether.”
We wonder, if deep learning is replaced by something more efficient, what would that something look like? More marketing?
Cynthia Murrell, August 13, 2020
Quantexa: Awash in Cash
August 13, 2020
As the COVID-19 pandemic continues to spread, crime has not stopped. Instead of illegal activities taking place in person, bad actors have moved their activities online. Cybersecurity experts discovered that the pandemic has also made bad actors more desperate and more willing to take risks online. Inventiva explains that, with the rise of risky cyber crimes, cybersecurity companies are seeing huge investments, such as the one reported in “Quantexa Raises $64.7M To Bring Big Data Intelligence To Risk Analysis And Investigations.”
Quantexa is a UK-based company that designed a Contextual Decision Intelligence platform, a machine learning system that analyzes data points to track criminal activity and build better profiles of companies’ customer bases. Quantexa recently raised $64.7 million in Series C fundraising. The funds will be used to develop further tools for cybersecurity and expand Quantexa into other continents.
Quantexa has done work for banks and other businesses in the financial industry. The company hopes the fundraising infusion will set it up with work in the government/public sector and with insurance companies.
Quantexa founder and CEO Vishal Marria said he created the company because he encountered many challenges with investigations while he was an Ernst & Young executive director. He noticed that when potential bad actors were investigated, only small pieces of information were used. Marria thought of a better way, so he designed AI algorithms and used big data to find the bigger picture:
“As an example, typically, an investigation needs to do significantly more than just track the activity of one individual or one shell company, and you need to seek out the most unlikely connections between a number of actions in order to build up an accurate picture. When you think about it, trying to identify, track, shut down and catch a large money launderer (a typical use case for Quantexa’s software) is a classic big data problem.”
This sector of cybersecurity continues to grow, and companies similar to Quantexa are also raising funds from investors.
Pieces of information always point to a larger puzzle. It raises the question of how bad actors were caught in the past.
Whitney Grace, August 13, 2020
Data Federation: K2View Seizes Lance, Mounts Horse, and Sallies Forth
August 13, 2020
DarkCyber noted “K2View Raises $28 million to Automate Enterprise Data Unification.”
Here’s the write up’s explanation of K2View:
K2View’s “micro-database” Fabric technology connects virtually to sources (e.g., internet of things devices, big data warehouses and data lakes, web services, and cloud apps) to organize data around segments like customers, stores, transactions, and products while storing it in secure servers and exposing it to devices, apps, and services. A graphical interface and auto-discovery feature facilitate the creation of two-way connections between app data sources and databases via microservices, or loosely coupled software systems. K2View says it leverages in-memory technology to perform transformations and continually keep target databases up to date.
The write up contains a block diagram.
Observations:
- It is difficult to determine how much manual (human) work will be required to deal with content objects not recognized by the K2View system
- What happens if the Internet connection to a data source goes down?
- What is the fall back when a microservice is not available or removed from service?
Many organizations offer solutions to disparate types of data scattered across many systems. Perhaps K2View will slay the digital windmills of silos, different types of data, and unstable connections? Silos have been part of the data landscape as long as Don Quixote has been spearing windmills.
Stephen E Arnold, August 13, 2020
TikTok: Exploiting, Exploited, or Exploiter?
August 12, 2020
I read “TikTok Tracked Users’ Data with a Tactic Google Banned.” [Note: You will have to pay to view this article. Hey, The Murdoch outfit has to have a flow of money to offset its losses from some interesting properties, right?]
The write up reveals that TikTok, the baffler for those over 50, tricked users. Those lucky consumers of 30-second videos allegedly had one of their mobile device’s ID numbers sucked into the happy outfit’s data maw. Those ID numbers, unlike the other codes in mobile devices, cannot be changed. (At least, that’s the theory.)
What can one do with a permanent ID number? Let us count some of the things:
- Track a user
- Track a user
- Track a user
- Obtain information to pressure a susceptible person into taking an action otherwise not considered by that person?
I think that covers the use cases.
The write up states with non-phone tap seriousness, a business practice of one of the Murdoch progeny:
The identifiers collected by TikTok, called MAC address, are most commonly used for advertising purposes.
Whoa, Nellie. This here is real journalism. MAC is shorthand for “media access control.” I think of the MAC address as a number tattooed on a person’s forehead. Sure, it can be removed… mostly. But once a user watches 30-second videos and chases around for “real” information on a network, that unique number can be used to hook together otherwise disparate items of information. The MAC is similar to one of those hash codes which allow fast access to data in a relational structure or maybe an interest graph. One can answer the question, “What are the sites with this MAC address in log files?” The answer can be helpful to some individuals.
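For those who want to see the “hook together” idea in miniature, here is a rough sketch in Python. Everything in it is an assumption made for illustration: the log records, the site names, the addresses, and the helper names (`normalize_mac`, `build_index`, `sites_for_mac`) are invented. It is not a description of how TikTok, Google, or any vendor actually processes data. It only shows that a stable identifier works as a join key across otherwise unrelated logs.

```python
from collections import defaultdict

# Hypothetical log records: (source, mac_address, event).
# Every value below is invented for illustration.
LOG_RECORDS = [
    ("video-app.example", "00:1A:2B:3C:4D:5E", "watched clip"),
    ("news-site.example", "00:1A:2B:3C:4D:5E", "read article"),
    ("ad-network.example", "66:77:88:99:AA:BB", "ad impression"),
    ("shop.example", "00:1a:2b:3c:4d:5e", "viewed product"),
]


def normalize_mac(mac: str) -> str:
    """Lowercase and unify separators so the same address always matches."""
    return mac.strip().lower().replace("-", ":")


def build_index(records):
    """Map each MAC address to the set of sources whose logs contain it."""
    index = defaultdict(set)
    for source, mac, _event in records:
        index[normalize_mac(mac)].add(source)
    return index


def sites_for_mac(index, mac: str):
    """Answer: which sources have this MAC address in their log files?"""
    return sorted(index.get(normalize_mac(mac), set()))


index = build_index(LOG_RECORDS)
print(sites_for_mac(index, "00-1A-2B-3C-4D-5E"))
# prints ['news-site.example', 'shop.example', 'video-app.example']
```

The lookup is a dictionary access, which is the “hash code” point: once the identifier is permanent, stitching three unrelated logs into one profile takes a few lines, not a research project.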
There are some issues bubbling beneath the nice surface of the Murdoch article; for example:
- Why did Google prohibit access to a MAC address, yet leave a method to access the MAC address available to those in the know? (Those in the know include certain specialized services supporting US government agencies, ByteDance, and just maybe Google. You know Google. That is the outfit which wants to create a global seismic system using every Android device whose owner gives permission to monitor earthquakes. Yep, is that permission really needed? Ho, ho, ho.)
- What vendors are providing MAC address correlations across mobile app content and advertising data? The WSJ is chasing some small fish who have visited these secret data chambers, but are there larger, more richly robust outfits in the game? (Yikes, that’s actually going to take more effort than calling a university professor who runs a company about advertising as a side gig. Effort? Yes, not too popular among some “real” Murdoch reporters.)
- What are the use cases for interest graphs based on MAC address data? In this week’s DarkCyber video available on Facebook at this link, you can learn about one interesting application: Targeting an individual who is susceptible to outside influence to take an action that individual otherwise would not take. Sounds impossible, no? Sorry, possible, yes.
To summarize, interesting superficial coverage but deeper research was needed to steer the writing into useful territory and away from the WSJ’s tendency to drift closer to News of the World-type information. Bad TikTok, okay. Bad Google? Hmmmm.
Stephen E Arnold, August 12, 2020
SlideShare: Some Work to Do
August 12, 2020
DarkCyber noted “Scribd Acquires Presentation Sharing Service SlideShare from LinkedIn.” In 2004, one could locate presentations on Google by searching for the extension ppt and its variants. In 2006, SlideShare became available. Then something happened. PowerPoints became more difficult to locate. When an online search pointed to a PowerPoint deck, the content was:
- Marketing fluff
- Incorrectly rendered with weird typography and wonky graphics
- Corrupted files.
What about today? DarkCyber’s most recent foray into the slide deck content wilderness produced zero; for example, SlideShare search produced identical pages of search results. The query retrieved slide decks on unrelated topics. Even worse, a query would result in SlideShare’s sending email upon email pointing to other slide decks. The one characteristic of these related slide decks was/is that they were unrelated to the information we sought.
There are online presentation services. There are free presentation tools like SoftMaker’s. There is the venerable Keynote, which never quite converts a PowerPoint file correctly.
Is there a future in a searchable collection of slide decks? In theory, yes. In reality, the cost of finding, indexing, and making searchable presentations faces some big hurdles; for example:
- Many organizations — for example, DARPA — standardize on PDF file formats. These are okay, but indexing these can be an interesting challenge
- Some presenters put their talks in the cloud, hoping that an Internet connection will allow their slides to display
- The Zoom world puts PowerPoints and other presentation materials on the user’s computer, never to make it into a more publicly accessible repository.
Like the dream of collecting conferences, presentations, and poster sessions, some content remains beyond the reach of researchers and analysts. The desire to get anyone looking for a slide deck to subscribe to a service gives operators of this service a chance to engage in spreadsheet fever. Here’s how this works: If there are X researchers, and we get Y percent of them, we can charge Z per year. By substituting guesstimates for the variables, the service becomes a winner.
The reality is that finding information in slide decks is more difficult today than it was in 2004. Access to information is becoming more difficult. DarkCyber would like to experience a SlideShare with useful content, more effective search and retrieval, and far fewer one-page duplicates of ads for books.
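Spreadsheet fever is easy to reproduce. The sketch below is a hedged illustration in Python: the function name `projected_revenue` and every number in the scenarios are made-up guesstimates, not figures from Scribd, SlideShare, or anyone else. The only point is that revenue is X times Y percent times Z, so nudging one guess changes the “business” by an order of magnitude.

```python
def projected_revenue(researchers: int, capture_percent: float, price_per_year: float) -> float:
    """Revenue = X researchers * Y percent captured * Z dollars per year."""
    return researchers * capture_percent * price_per_year / 100


# Hypothetical guesstimates: (X, Y, Z). None of these come from real data.
SCENARIOS = {
    "feverish": (1_000_000, 5.0, 120.0),
    "sober": (1_000_000, 0.5, 120.0),
}

for name, (x, y, z) in SCENARIOS.items():
    print(f"{name}: ${projected_revenue(x, y, z):,.0f} per year")
```

With these invented inputs the feverish row comes to $6,000,000 per year and the sober row to $600,000. The spreadsheet has no opinion about which Y is real.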
Someday. Maybe?
Stephen E Arnold, August 12, 2020
Chinese Clouds Move In: The Data May Be Wonky But the Message Is Clear
August 12, 2020
I am not a big fan of data generated by mid tier consulting firms. I call these outfits azure-chip consulting firms. Overall these firms are not able to attract and retain the type of individual who works at Bain, BCG, Booz Allen, and McKinsey. That’s a generalization, but it is one with which I am comfortable.
I read “Amazon Continues to Dominate Global IaaS Market: Report.” The data come from the estimable outfit Gartner, and the report involves real numbers. Some of the Gartner reports are what I would call subjective, but this report includes percentages.
The main point is that Amazon is the Big Dog for infrastructure as a service. The idea is that instead of having hardware in a closet and a couple of pizza eaters on the payroll to maintain the gear, one uses the cloud. It is magical and, best of all, does not involve capital expenditures and those pizza lovers.
The Gartner study explains that Alibaba, Amazon, Google, Microsoft, and Tencent dominate the market. That means Hewlett Packard and IBM are doing what they know how to do: Disappoint in fast growing market sectors.
The write up states:
These top five providers accounted for 77% of the total market in 2018, and in 2019 this number swelled to 80%. Throughout all IaaS providers in the market, 75% saw growth in 2018.
Okay, but to DarkCyber the main point is that two Chinese companies are now nosing into territory once dominated by US firms. As in the artificial intelligence market, the increased presence and success of these outfits from the Middle Kingdom is the big story. Even with squishy numbers, the change is evident.
Stephen E Arnold, August 12, 2020
Celebrity Net Worth: Misunderstanding the Google?
August 12, 2020
Navigate to this link to view the PDF of testimony from the founder of Celebrity Net Worth. Don’t you love it when Google hides urls? Try to find the document whose title appears in the next paragraph. Give up. It makes life so much easier for bad actors and people wondering “Where did that document come from?” and “Who puts this document online?” Helpful as ever. DarkCyber loves Google. Who needs to know provenance type information? Losers, that’s who!
But I digress. Click here and read “Written Statement for the Record by Brian Warner for a hearing before The House Judiciary Subcommittee on Antitrust, Commercial and Administrative Law titled Online Platforms and Market Power, Part 2: Innovation and Entrepreneurship,” July 16, 2019.
The write up makes clear that Celebrity Net Worth misunderstood Google. Mr. Warner and others involved with Celebrity Net Worth believed one of the founders of Google who said in an S-1 document filed with the SEC in 2004:
“We want you to come to Google and quickly find what you want. Then we’re happy to send you to the other sites. In fact, that’s the point. The portal strategy tries to own all of the information … Most portals show their own content above content elsewhere on the web. We feel that’s a conflict of interest, analogous to taking money for search results … We want to get you out of Google and to the right place as fast as possible. It’s a very different model.” — Larry Page, co-founder of Google
The write up documents how Google scraped content from Celebrity Net Worth and displayed it on search results pages. Usage of Celebrity Net Worth dropped “20 percent.” By 2016, Celebrity Net Worth was no longer at the top of a search for celebrity net worth. Traffic dropped another “50 percent.” By 2019, traffic to Celebrity Net Worth was 80 percent lower than in 2014.
The write up includes these data:
In June 2019, search engine analyst Rand Fishkin put together a report about Google using data from web analytics firm Jumpshot. The data show that today an estimated 48.96% of all Google searches end with the searcher NOT clicking through to a website. The same report estimates that 7% of all search clicks go to a paid ad result and 12% go to properties owned by Google’s parent company Alphabet. Moreover, those stats do not even show the full extent of the problem because the data largely relied upon desktop devices and could not track searches that took users to a Google-owned app like the YouTube or Google Maps.
These data are highly suggestive. However, Google has to generate revenue, and it — like many other information finding services — does what’s best for itself. This self-interest is explained in terms of “user experience,” not in terms of making money, increasing information control, and ensuring that clicks benefit Google.
The misunderstanding is that individuals with good ideas assumed that Mother Google wanted to be supportive, be friends, and do what some people would assume to be appropriate.
How’s that working out? That’s why DarkCyber loves Google. Just take the information provided and don’t think. Cuba Libre does not exist if it is not on a Google Map. The Auto Channel car reviews don’t exist if not in the Google search results. And Foundem? Who the heck runs that site? DarkCyber absolutely loves Google but sometimes one must embrace a Swiss Cow?
Stephen E Arnold, August 12, 2020
Defeating Facial Recognition: Chasing a Ghost
August 12, 2020
The article hedges. Check the title: “This Tool could Protect Your Photos from Facial Recognition.” Notice the “could”. The main idea is that people do not want their photos analyzed and indexed with the name, location, state of mind, and other index terms. I am not so sure, but the write up explains with that “could” coloring the information:
The software is not intended to be just a one-off tool for privacy-loving individuals. If deployed across millions of images, it would be a broadside against facial recognition systems, poisoning the accuracy of the data sets they gather from the Web.
So facial recognition = bad. Screwing up facial recognition = good.
There’s more:
“Our goal is to make Clearview go away,” said Dr Ben Zhao, a professor of computer science at the University of Chicago.
Okay, a company is a target.
How does this work?
Fawkes converts an image — or “cloaks” it, in the researchers’ parlance — by subtly altering some of the features that facial recognition systems depend on when they construct a person’s face print.
Several observations:
- In the event of a problem like the explosion in Lebanon, maybe facial recognition can identify some of those killed.
- Law enforcement may find that narrowing a pool of suspects to a smaller group enhances an investigative process.
- Unidentified individuals who are successfully identified “could” add precision to Covid contact tracing.
- Applying the technology to differentiate “false” positives from “true” positives in some medical imaging activities may be helpful in some medical diagnoses.
My concern is that technical write ups are often little more than social polemics. Examining the upside and downside of an innovation is important. Converting a technical process into a quest to “kill” a company, a concept, or an application of technical processes is not helpful in DarkCyber’s view.
Stephen E Arnold, August 12, 2020
Polynomial—Deep Learning for the Enterprise
August 12, 2020
A new company, backed by CoantumLeap Tech Ventures, promises remarkable results. Inventiva introduces us to “Polynomial.ai—Making Cutting Edge Deep-Learning Accessible to Enterprises.” Writer Priyadharshini Varadharajan claims most companies, startups and established tech firms alike, fail to deliver effective automation and intelligent workflows because they lack “real” AI. She tells us:
“The team of developers and researchers Polynomial are working on platforms that can turn any enterprise workflow like sales, customer service, product fulfillment and new customer onboarding into an adaptive AI-powered workflow that is delivered to end-users via conversational interfaces (chatbots, voice, enterprise, and consumer channels). Polynomial uses Machine learning in turning unstructured and dispersed data into insights and builds upon conversational tech (NLP and NLG) to communicate with intelligence. Polynomial claims to have the world’s first massively parallel processing multi-brain deep learning architecture that is enterprise-ready. The tech stack is being leveraged by clients across the globe in areas as diverse as marketing automation, fintech, education tech & shared economy platforms. The tech stack has been validated in an enterprise context for over a billion transactions from an instance of a single app. Live instances in execution effectively use tens of thousands of Convolution Neural Network layers and over a hundred thousand nodes equivalent of processing.”
So far, three platforms make up Polynomial’s offerings, each available in both PaaS and SaaS models. CoCo converts enterprise workflows into chatbots that, we’re told, could pass for humans. These bots come pre-trained for specific domains and can be further trained with specific vocabulary. They also provide business intelligence from the data they process. SAM is a personalized collaboration platform that combines video, voice, and online environments. Finally, LENS processes unstructured data to reveal patterns in text, images, and video.
The article goes on to give a brief bio of each of Polynomial’s founders: Puneet Gupta, Pramod Bharadwaj, and Manish Bhai, so navigate there if curious. Founded in March 2020, the company is based in Singapore.
Cynthia Murrell, August 12, 2020
Grousing about PDFs: Gently But More People Need to Object
August 11, 2020
I learned about a file format that could be viewed or printed exactly like a composed page in the late 1980s. I think the technology was called Trapeze. A large New York outfit published books and magazines. These could be printed on paper using a series of manual and computer assisted work processes. The publishing outfit was on the lookout for a way to create electronic replicas that were visually faithful to the printed version of a document, magazine, or book. The technology emerged after years of “stealth” trials and demonstrations as the Adobe Portable Document Format. Let’s assume that my recollection of Adobe Trapeze is correct. That makes the lovable Portable Document Format language, its original counter dongle thing, and its wonky limitations more than 30 years young. I thought about PDFs, which are now ubiquitous, when I read “PDF: Still Unfit for Human Consumption, 20 Years Later.” Yeah, the date is off, but the main idea of the article seems valid. PDFs suck.
According to the write up:
PDFs are typically large masses of text and images. The format is intended and optimized for print. It’s inherently inaccessible, unpleasant to read, and cumbersome to navigate online.
An important point in the write up seems to be:
Burying information in PDFs means that most people won’t read it. Participants in several of our recent usability studies on corporate websites and intranets did not appreciate PDFs and skipped right over them. They complained woefully whenever they encountered PDF files and many who opened PDFs quickly abandoned them.
What’s the fix? The write up suggests HTML Gateway pages. That’s an interesting idea, but it may be difficult to automate without some groundswell of support for this approach: A link, a summary, and then the PDF. So, think nonstarter.
I don’t want to dwell on why a PDF is the way it is. The important point is that Adobe made the PDF an open standard. Why? Can you imagine trying to rehab a Trapeze artist who has minimal balance, a fear of heights, and greasy fingers?
Acrobat is anchored in the centuries-old tradition of print. PostScript solved some typesetting problems. Acrobat fell off the high wire. Today, the beloved but addled performer is popular, but his tricks are no longer capable of evoking applause.
Adobe now sells subscriptions, not Acrobat dongles. Adobe does not have a cost effective way of keeping its software from delivering malware into the soft innards of a user’s computing device. Adobe has made the PDF open so it can leave the innovation to the community to use.
When you want to make life difficult for a researcher, create an unindexed PDF. Let the user figure out how to OCR the page replica and then search the text. Nifty. Want a PDF to open a specific number of times and then block additional accesses? Well, figure it out yourself. Want a page replica of a patent filing that can be navigated in a seamless way? Get serious, folks.
Adobe sells subscriptions, not usability. Hopefully someone with an “affinity” for producing documents will come up with a solution. Neither the community nor Adobe has cracked the code.
Stephen E Arnold, August 11, 2020