The Expanding PR Challenge for Cyber Threat Intelligence Outfits
August 10, 2022
Companies engaged in providing specialized services to law enforcement and intelligence entities have to find a way to surf on the building wave of NSO Group backlash.
What do I mean?
The interest real journalists have in specialized software and services has brought more scrutiny from journalists, financial analysts, and outfits like Citizen Lab.
The most recent example is the article which appeared in an online publication focused on gadgets. The write up is “These Companies Know When You’re Pregnant—And They’re Not Keeping It Secret. Gizmodo Identified 32 Brokers Selling Data on 2.9 Billion Profiles of U.S. Residents Pegged as Actively Pregnant or Shopping for Maternity Products.” The write up reports:
A Gizmodo investigation into some of the nation’s biggest data brokers found more than two dozen promoting access to datasets containing digital information on millions of pregnant and potentially pregnant people across the country. At least one of those companies also offered a large catalogue of people who were using the same sorts of birth control that’s being targeted by more restrictive states right now. In total, Gizmodo identified 32 different brokers across the U.S. selling access to the unique mobile IDs from some 2.9 billion profiles of people pegged as “actively pregnant” or “shopping for maternity products.” Also on the market: data on 478 million customer profiles labeled “interested in pregnancy” or “intending to become pregnant.”
To add some zest to the write up, the “real news” outfit provided a link to 32 companies allegedly engaged in such data aggregation, normalization, and provision. Here are the companies available from the gadget blog’s link. Note: “sic” means this is the actual company name, and “trendy” means very hip marketing.
123Push
Adprime Health
Adstra
Alike Audience
Anteriad (180byTwo)
Cross Pixel
Datastream Group
Dstillery (sic and trendy)
Epsilon
Experian
Eyeota (sic and trendy)
FieldTest
Fluent
Fyllo (sic)
LBDigital
Lighthouse (Ameribase Digital)
PurpleLab
Quotient
Reklaim (sic)
ShareThis
Skydeo
Stirista (Crosswalk) (sic)
TrueData
Valassis Digital
Weborama Inc
Ziff Davis
ZoomInfo (Clickagy)
How many of these do you recognize? Perhaps Experian, usually associated with pristine security practices and credit checks? What about Ziff Davis, the outfit which publishes blogs which reveal the inner workings of Microsoft and other “insider” information? Or ZoomInfo, an outfit once focused on executive information and now apparently identified as a source of information to make a pregnant teen fear the “parent talk”?
But the others? Most people won’t have a clue. Now keep in mind these are companies in the consumer information database business. Are there other firms with more imaginative sources of personal data than outfits poking around open source datasets, marketing companies with helpful log file data, and blossoming data scientists gathering information from retail outlets?
The answer is, “Yes, there are.”
That brings me to the building wave of NSO Group backlash. How does one bridge the gap between a government agency using NSO Group type tools and data?
The answer is that specialized software and services firms themselves are the building blocks, engineer-constructors, and architect-engineers of these important bridges.
So what’s the PR problem?
Each week interesting items of information surface. For example, cyber threat firms report new digital exploits. I read this morning about Cerebrate’s Redeemer. What’s interesting is that cyber threat firms provide software and services to block such malware, right? So the new threat appears to evade existing defense mechanisms. Isn’t this a circular proposition: Buy more cyber security. Learn about new threats. Ignore the fact that existing systems do not prevent the malware from scoring a home run? Iterate… iterate… iterate.
At some point, a “real news” outfit will identify the low profile engineers engaged in what might be called “flawed bridge engineering.”
Another PR problem is latent. People like the Kardashians are grousing about Instagram. What happens when influencers and maybe some intrepid “real journalists” push back against the firms collecting personal information very few people think of as enormously revelatory? Example: Who has purchased a “weapon” within a certain geofence? Or who has outfitted an RV with a mobile Internet rig? Or who has signed up for a Dark Web forum and accessed it with a made up user name?
Who provides these interesting data types?
The gadget blog is fixated on pregnancy because of the current news magnetism. Unfortunately the pursuit of clicks with what seems really significant does not provide much insight into the third party data businesses in the US, Israel, and other countries.
That’s the looming PR problem. Someone is going to step back and take a look at companies which do not want to become the subject of a gadget blog write up with a 30 plus word headline. In my opinion, that will happen, and that’s the reason certain third party data providers and specialized software and services firms face a crisis. These organizations have to sell to survive, except for a handful supported by their countries’ governments. If that marketing becomes too visible, then the gadget bloggers will out them.
What’s it mean when a cyber threat company hires a former mainstream media personality to bolster the company’s marketing efforts? I have some thoughts. Mine are colored by great sensitivity to the NSO Group and the allegations about its Pegasus specialized software. If these allegations are true, what better way to get personal data than suck it directly from a single target’s or group of targets’ mobile devices in real time?
Here are the chemical compounds in the data lab: The NSO Group-type technology which is increasingly understood and replicated. Gadget bloggers poking around data aggregators chasing ad and marketing service firms. Cyber threat companies trying to market themselves without being too visible.
The building wave is on the horizon, just moving slowly.
Stephen E Arnold, August 10, 2022
Oracle: Marketing Experience or MX = Zero?
August 10, 2022
How does one solve the problem MX = 0? One way is to set M to zero and X to zero and bingo! You have zero. If the information in the super select, restricted, juicy article called “Oracle Insiders Describe the Complete Chaos from Layoffs and Restructuring While Employees Brace for More” is accurate, the financially lucrative Oracle database system is unhappy with the firm’s marketing. Not just the snappy PowerPoint decks or the obedient database administrator documentation. Nope. Everything is apparently a bit of indigestion.
The write up, which as I have mentioned is super select, restricted, and juicy, is a bit jumbled. Nevertheless, I noted several observations I found interesting. Let me summarize the 1,100 word report this way: Lots of people from marketing and customer experience (whatever that is) have been fired. Okay. Now let’s look at the comments that struck me as significant. Keep in mind that I love Oracle. Yep, clients just pay those who can make the sleek, efficient, tightly integrated components hum like an electric motor on a fully functioning Ford F 150 Lightning. Here we go. (My comments appear in parentheses after each bullet.)
- “The common verb to describe ACX is that they were obliterated,” said a person who works at Oracle. (I quite liked the use of the word “obliterated.” Was Oracle using a Predator launched flying ginsu management bomb or just an email or maybe a Zoom call?)
- “There’s no marketing anymore…” (My question is, “Was there ever any marketing at Oracle?” Bombast, yes. Rah rah conferences. Jet flights after curfew at the San Jose airport. But marketing? In my opinion, no.)
- “There’s a sense among many at Oracle of impending doom…” (Yep, upbeat stuff.)
- “We’ve been kind of working like zombies the last couple of weeks because there’s just this sense of ‘What am I doing here?’” (The outfit on the former Sea World exit excels at management. Well, maybe it doesn’t? How does the Oracle hit above its weight? That’s a good question. Let’s ask Cerner about the electronic medical record business and its seamless functioning with the Oracle database, shall I? No I shall not.)
- “…Oracle’s code base is so complicated that it can take years before engineers are fully up to speed with how everything works, and workers with over a decade of experience were cut…” (Ah, ha, Oracle is weeding out the dinobabies. Useless deadwood. A 20 something engineer can figure out where an entire database is hiding.)
Net net: I hate to suggest this, but perhaps some database types think using AWS, the GOOG, or the super secure MSFT data management systems is better, faster, and cheaper. Pick two.
Stephen E Arnold, August 10, 2022
TikTok: Allegations of Data Sharing with China! Why?
June 21, 2022
If one takes a long view about an operation, some planners find information about the behavior of children or older, yet immature, creatures potentially useful. What if a teenager puts up a TikTok video presenting allegedly “real” illegal actions? Might that teen in three or four years be a target for soft persuasion? Leaking the video to an employer? No, of course not. Who would take such an action?
I read “Leaked Audio from 80 Internal TikTok Meetings Shows That US User Data Has Been Repeatedly Accessed from China.” Let’s assume that this allegation has a tiny shred of credibility. The financially-challenged Buzzfeed might be angling for clicks. Nevertheless, I noted this passage:
…according to leaked audio from more than 80 internal TikTok meetings, China-based employees of ByteDance have repeatedly accessed nonpublic data about US TikTok users…
Is the audio deeply faked? Could the audio be edited by a budding sound engineer?
Sure.
And what’s with the TikTok “connection” to Oracle? Probably just a coincidence like one of Oracle’s investment units participating in Board meetings for Voyager Labs. A China-linked firm was on the Board for a while. No big deal. Voyager Labs? What does that outfit do? Perhaps it is the Manchester Square office and the delightful restaurants close at hand?
The write up refers to data brokers too. That’s interesting. If a nation state wants app generated data, why not license it? No one pays much attention to “marketing services” which acquire and normalize user data, right?
Buzzfeed tried to reach a wizard at Booz, Allen. That did not work out. Why not drive to Tyson’s Corner and hang out in the Ritz Carlton at lunch time? Get a Booz, Allen expert in the wild.
Yep, China. No problem. Take a longer-term view for creating something interesting like an insider who provides a user name and password. Happens every day and will into the future. Plan ahead I assume.
Real news? Good question.
Stephen E Arnold, June 21, 2022
Near: A Complement to ClearView AI?
May 26, 2022
“Data Intelligence Startup Near, with 1.6B anonymized User IDs, Lists on NASDAQ via SPAC at a $1B Market Cap; Raises $100M” is an interesting story. On one hand, in the midst of some financial headwinds, the outfit Near is a unicorn. That’s exciting for some. The most significant part of the short item is this passage: Near offers
anonymised, location-based profiles of users based on a trove of information that Near sources and then merges from phones, data partners, carriers and its customers. It claims the database has been built “with privacy by design.”
The word merging as in “merging data from different sources” is not jargony enough. The Near write up uses the term “stitching” as in “threads which hold the parts of a football together.” I prefer the term “federating” as in “federating data.”
The idea is a good one. Take information from different sources, index it (assign tags today, of course) and group information about a person under that entity’s “name.” This is a useful workflow, and my hunch is that the system works best for individuals leaving digital footprints and crumbs of ones and zeros behind as these “entities” go about their business.
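Here is a minimal sketch of that federating workflow, in Python. It is not Near’s method; the sources, field names, and values are all invented for illustration, and the sketch assumes every source already shares a clean join key (a hypothetical `device_id`), which real data rarely do.

```python
from collections import defaultdict

# Hypothetical records from three sources; every field and value is invented.
phone_app = [
    {"device_id": "d-1", "home_zip": "40201"},
    {"device_id": "d-2", "home_zip": "10001"},
]
data_partner = [{"device_id": "d-1", "interest": "rv camping"}]
carrier = [
    {"device_id": "d-2", "plan": "prepaid"},
    {"device_id": "d-1", "plan": "family"},
]


def federate(sources, key="device_id"):
    """Group attributes from every source under one entity key."""
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            entity = record[key]
            for field, value in record.items():
                if field != key:
                    # Tag each attribute with the source it came from.
                    profiles[entity][f"{source_name}.{field}"] = value
    return dict(profiles)


profiles = federate(
    {"app": phone_app, "partner": data_partner, "carrier": carrier}
)
print(profiles["d-1"])
```

The hard part in practice is the join key. When sources disagree about who is who, the “stitching” becomes probabilistic matching, and the tidy dictionary above gets messy fast.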
The successful merging and profiling will give Near a competitive advantage. As ClearView and many other companies have shown, scraping and licensing commercial datasets can produce a valuable data asset.
On the downside, as ClearView has learned as it explained its business to legal eagles, some concerns for privacy can arise. Assurances of privacy have created some issues for firms performing similar work for government agencies. Law enforcement and intelligence professionals are likely to show some interest in Near’s products and services.
Successfully navigating marketing to commercial outfits and selling to government agencies is like sailing into an unfamiliar port with a very large boat.
Kudos to Near for its funding. Now it will be interesting to watch the firm’s management walk the marketing tightrope over the Niagara Falls of cash flow as legal eagles circle.
Stephen E Arnold, May 26, 2022
Synthetic Data: Cheap, Like Fast Food
May 25, 2022
Fabricated data may well solve some of the privacy issues around healthcare-related machine learning, but what new problems might it create? The Wall Street Journal examines the technology in, “Anthem Looks to Fuel AI Efforts with Petabytes of Synthetic Data.” Reporter Isabelle Bousquette informs us Anthem CIO Anil Bhatt has teamed up with Google Cloud to build the synthetic data platform. Interesting choice, considering the health insurance company has been using AWS since 2017.
The article points out synthetic data can refer to either anonymized personal information or entirely fabricated data. Anthem’s effort involves the second type. Bousquette cites Bhatt as well as AI and automation expert Ritu Jyoti as she writes:
“Anthem said the synthetic data will be used to validate and train AI algorithms that identify things like fraudulent claims or abnormalities in a person’s health records, and those AI algorithms will then be able to run on real-world member data. Anthem already uses AI algorithms to search for fraud and abuse in insurance claims, but the new synthetic data platform will allow it to scale. Personalizing care for members and running AI algorithms that identify when they may require medical intervention is a more long-term goal, said Mr. Bhatt. In addition to alleviating privacy concerns, Ms. Jyoti said another advantage of synthetic data is that it can reduce biases that exist in real-world data sets. That said, she added, you can also end up with data sets that are worse than real-world ones. ‘The variation of the data is going to be very, very important,’ said Mr. Bhatt, adding that he believes the variation in the synthetic data will ultimately be better than the company’s real-world data sets.”
The article notes the use of synthetic data is on the rise. Increasing privacy and reducing bias both sound great, but that bit about potentially worse data sets is concerning. Bhatt’s assurance is pleasant enough, but how will we know whether his confidence pans out? Big corporations are not exactly known for their transparency.
Cynthia Murrell, May 25, 2022
Data Federation? Loser. Go with a Data Lake House
February 8, 2022
I have been seeing the phrase “data lake house” or “datalake house.” I noted some bold claims about a new data lake house approach in “Managed Data Lakehouse Startup Onehouse Launches with $8M in Funding.” The write up states:
One of the flagship features of Onehouse’s lakehouse service is a technology called incremental processing. It allows companies to start analyzing their data soon after it’s generated, which is difficult when using traditional technologies.
The write up adds:
The company’s lakehouse service automatically optimizes customers’ data ingestion workflows to improve performance, the startup says. Because the service is delivered via the cloud on a fully managed basis, customers don’t have to manage the underlying infrastructure.
The idea of course is that traditional methods of handling data are [a] slow, [b] expensive, and [c] difficult to implement.
The premise is that the data lake house delivers more efficient use of data and a way to “future proof the data architected for machine learning / data science down the line.”
When I read this I thought of Vivisimo’s explanation of its federating method. IBM bought Vivisimo, and I assume that it is one of the ingredients in IBM’s secret big data sauce. MarkLogic also suggested in one presentation I sat through that its system would ingest data and the MarkLogic system (once eyed by the Google as a possible acquisition) would allow near real time access to the data. One person in the audience was affiliated with the US Library of Congress, and that individual seemed quite enthused about MarkLogic. And there are companies which facilitate data manipulation; for example, Kofax and its data connectors.
From my point of view, the challenge is that today large volumes of data are available. These data have to be moved from point A to point B. Ideally data do not require transformation. At some point in the flow, data in motion can be processed. There are firms which offer real time or near real time data analytics; for example, Trendalyze.com.
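To make the “process data in motion” idea concrete, here is a small Python sketch of incremental processing in the generic sense: keep a checkpoint, and on each pass touch only the records that arrived since the last pass. This is not Onehouse’s implementation or anyone else’s product; the `IncrementalCounter` class, the `seq` field, and the event log are hypothetical.

```python
class IncrementalCounter:
    """Keeps a running count per key plus a checkpoint of work already done."""

    def __init__(self):
        self.counts = {}
        self.checkpoint = 0  # highest sequence number already processed

    def process(self, events):
        """Process only events newer than the checkpoint; return how many."""
        new_events = [e for e in events if e["seq"] > self.checkpoint]
        for event in new_events:
            self.counts[event["key"]] = self.counts.get(event["key"], 0) + 1
            self.checkpoint = max(self.checkpoint, event["seq"])
        return len(new_events)


# Hypothetical event log; "seq" is an assumed, monotonically increasing number.
log = [{"seq": 1, "key": "click"}, {"seq": 2, "key": "view"}]

counter = IncrementalCounter()
first_pass = counter.process(log)   # both events are new

log.append({"seq": 3, "key": "click"})
second_pass = counter.process(log)  # only the appended event is new

print(first_pass, second_pass, counter.counts)
```

The alternative is to recompute the totals from the whole log on every pass, which is the slow and expensive behavior the lake house pitch is aimed at.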
Conversion, moving, saving, and then doing something “more” with the data remain challenges. Maybe Onehouse has the answer?
Stephen E Arnold, February 8, 2022
Coalesce: Tackling the Bottleneck Few Talk About
February 1, 2022
Coalesce went stealth, the fancier and more modern techno slang for “going dark,” to work on projects in secret. The company has returned to the light, says Crowd Fund Insider, with a robust business plan and product, plus loads of funding: “Coalesce Debuts From Stealth, Attracts $5.92M For Analytics Platform.”
Coalesce is run by a former Oracle employee and it develops products and services similar to Oracle, but with a MarkLogic spin. That is one way to interpret how Coalesce announced its big return with its Coalesce Data Transformation platform that offers modeling, cleansing, governance, and documentation of data with analytical efficiency and flexibility. Do not forget that 11.2 Capital and GreatPoint Ventures raised $5.92 million in seed funding for the new data platform. Coalesce plans to use the funding for engineering functions, developing marketing strategy, and expanding sales.
Coalesce noticed that there is a weak link between organizations’ cloud analytics and actively making use of data:
“ ‘The largest bottleneck in the data analytics supply chain today is transformations. As more companies move to the cloud, the weaknesses in their data transformation layer are becoming apparent,’ said Armon Petrossian, the co-founder and CEO of Coalesce. ‘Data teams are struggling to keep up with the demands from the business, and this problem has only continued to grow with the volumes and complexity of data combined with the shortage of skilled people. We are on a mission to radically improve the analytics landscape by making enterprise-scale data transformations as efficient and flexible as possible.’”
Coalesce might be duplicating Oracle and MarkLogic, but if they have discovered a niche market in cloud analytics then they are about to rocket from their stealth. Hopefully the company will solve the transformation problem instead of issuing marketing statements as many other firms do.
Whitney Grace, February 1, 2022
Fuzzifying Data: Yeah, Sure
January 19, 2022
Data are often alleged to be anonymous, but they may not be. Expert companies such as LexisNexis, Acxiom, and mobile phone providers argue that as long as personal identifiers, including names, addresses, etc., are removed from data, it is rendered harmless. Unfortunately data can be de-anonymized without too much trouble. Wired posted Justin Sherman’s article, “Big Data May Not Know Your Name. But It Knows Everything Else.”
Despite humans having similar habits, there is some truth in the phrase “everyone is unique.” With a few white hat or black hat tactics, user data can be traced back to the originator. Data are not only individualized based on a user’s unique identity; there are also minute ways to gather personal information, ranging from Internet search history to GPS logs and IP addresses. Companies that want to sell you goods and services purchase the data, and so do governments and law enforcement agencies.
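A short Python sketch shows why stripping names is thin protection. It illustrates a linkage attack in general terms and is not taken from the Wired article: the two datasets, the people, and the choice of quasi-identifiers (`zip`, `birth_year`, `device`) are all invented for the example.

```python
# "Anonymized" records: names removed, quasi-identifiers kept. All data invented.
anonymized = [
    {"zip": "40201", "birth_year": 1984, "device": "Pixel 6", "searches": "maternity"},
    {"zip": "40201", "birth_year": 1990, "device": "iPhone 13", "searches": "rv internet"},
    {"zip": "10001", "birth_year": 1984, "device": "Pixel 6", "searches": "loans"},
]

# A second, hypothetical dataset that still carries names.
named = [
    {"name": "Person A", "zip": "40201", "birth_year": 1984, "device": "Pixel 6"},
    {"name": "Person B", "zip": "10001", "birth_year": 1984, "device": "Pixel 6"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "device")


def reidentify(anon_rows, named_rows, keys=QUASI_IDENTIFIERS):
    """Link nameless rows to names when the quasi-identifiers match uniquely."""
    index = {}
    for row in named_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = {}
    for row in anon_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # a unique match re-attaches a name
            matches[candidates[0]] = row["searches"]
    return matches


matches = reidentify(anonymized, named)
print(matches)
```

No name was ever stored in the “anonymized” file. The combination of a few ordinary attributes does the work once a second dataset is available to join against.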
There are stringent privacy regulations in place, but in the face of the almighty dollar and governments bypassing their own laws, it is like spitting in the wind. The scariest fact is that nothing is secret anymore:
“The irony that data brokers claim that their “anonymized” data is risk-free is absurd: Their entire business model and marketing pitch rests on the premise that they can intimately and highly selectively track, understand, and micro target individual people.
This argument isn’t just flawed; it’s also a distraction. Not only do these companies usually know your name anyway, but data simply does not need to have a name or social security number attached to cause harm. Predatory loan companies and health insurance providers can buy access to advertising networks and exploit vulnerable populations without first needing those people’s names. Foreign governments can run disinformation and propaganda campaigns on social media platforms, leveraging those companies’ intimate data on their users, without needing to see who those individuals are.”
Companies and organizations need to regulate themselves, while governments need to pass laws that protect their citizens from bad actors. Self-regulation in the face of dollar signs is like asking a person with a sweet tooth to stop eating sugar. However, if governments concentrated on regulating specific types of data and specific types of data collection and sharing, rather than on a blanket solution, they could protect users.
Let’s think about the implications. No, let’s not.
Whitney Grace, January 19, 2022
What Is Better Than One Logic? Two Logics?
December 22, 2021
Search, database, intelligence, data management and analytics firm MarkLogic continues to evolve and grow. Business Wire reveals, “MarkLogic Acquires Leading Metadata Management Provider Smartlogic.” Good choice—we have found Smartlogic to be innovative, reliable, and responsive. We expect MarkLogic will be able to preserve these characteristics, considering Smartlogic’s top brass will be sticking around. The press release tells us:
“As part of the transaction, Smartlogic’s founder and Chief Executive Officer, Jeremy Bentley, as well as other members of the senior management team, will join the MarkLogic executive team. Financial terms of the transaction were not disclosed. Founded in 2006, Smartlogic has deciphered, filtered, and connected data for many of the world’s largest organizations to help solve their complex data problems. Global organizations in the energy, healthcare, life sciences, financial services, government and intelligence, media and publishing, and high-tech manufacturing industries rely on Smartlogic’s metadata and AI platform every day to enrich enterprise information with context and meaning, as well as extract critical facts, entities, and relationships to power their businesses. For the past four years, Smartlogic has been recognized as a leader by Gartner’s Magic Quadrant for Metadata Management Solutions and by Info-Tech as the preeminent leader of the Data Quadrant for Metadata Management (May 2021).”
Based in San Carlos, California, MarkLogic was founded in 2001 and gained steam in 2012 when it picked up former Oracle database division leader Gary Bloom. Smartlogic is headquartered in San Jose, less than 30 miles away. Perhaps MarkLogic’s XML with taxonomy management will triumph in more markets and bring the Oracle outfit to its knees? Perhaps index term management is the killer app?
Cynthia Murrell, December 22, 2021
What Google Knows about the Honest You
December 10, 2021
I read this quote in a Kleenex story about Google’s lists of popular searches:
“You’re never as honest as you are with your search engine. You get a sense of what people genuinely care about and genuinely want to know — and not just how they’re presenting themselves to the rest of the world.”
The alleged Googler crafting this statement is a data editor. You can read more about the highly selective and unverified Google search trends in “What Google’s Trending Searches Say about America in 2021.”
For me, the statement allows several observations:
- A person acting in an unguarded way reveals information not usually disseminated in “guarded” settings; for example, a job interview
- The word “honest” implies an unvarnished look at the psycho-social factors within a single person
- A collection of data points about the psycho-social aspects of a single person makes it possible to tag, classify, and relate that individual to others. Numerical procedures allow a person or system with access to those data to predict certain behaviors, predispositions, or actions.
Thus, the collection of searches, clicks, and items created by an individual using Google services such as Gmail and YouTube creates a palette of color from which a data maestro can paint a picture.
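One of the simplest numerical procedures for relating one individual to others is a set-overlap score. The Python sketch below uses Jaccard similarity on search terms; it is an illustration only, not a description of how Google or anyone else does it, and the users and queries are made up.

```python
def jaccard(a, b):
    """Share of distinct terms two users have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical search histories; every user and query is invented.
searches = {
    "user_1": ["rv solar", "mobile internet rig", "campgrounds"],
    "user_2": ["mobile internet rig", "campgrounds", "diesel prices"],
    "user_3": ["crib reviews", "prenatal vitamins"],
}


def most_similar(target, profiles):
    """Return the other user whose search terms overlap most with the target's."""
    scored = [
        (jaccard(profiles[target], terms), name)
        for name, terms in profiles.items()
        if name != target
    ]
    return max(scored)[1]


nearest = most_similar("user_1", searches)
print(nearest, jaccard(searches["user_1"], searches[nearest]))
```

Once users can be scored against one another, tagging and classifying follow: whatever is known about the nearest neighbors becomes a guess about the target.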
Predestination has never been easier, more automatable, or cheaper to convert into an actionable knowledgebase for smart software. Yep, just simple queries. Useful indeed.
Stephen E Arnold, December 10, 2021