One Tiny Point about Oracle
September 5, 2022
When Silicon Valley-type real news outfits “correct” one another, we tend to wonder why. In this case, it appears Gizmodo writer Matt Novak feels readers should know one key bit of information omitted by a recent Vox article: the fact that “Larry Ellison’s Oracle Started As a CIA Project.” He writes:
“Vox simply says that Oracle was founded in ‘the late 1970s’ and ‘sells a line of software products that help large and medium-sized companies manage their operations.’ All of which is true! But as the article continues, it somehow ignores the fact that Oracle has always been a significant player in the national security industry. And that its founder would not have made his billions without helping to build the tools of our modern surveillance state.”
One of those tools, of course, being the sort of database Oracle specializes in. The write-up emphasizes Ellison’s longstanding belief in a large federal database, asserting the attacks of 9/11 gave the tech tycoon the chance to push his vision. Novak quotes:
“‘The single greatest step we Americans could take to make life tougher for terrorists would be to ensure that all the information in myriad government databases was copied into a single, comprehensive national security database,’ Larry Ellison wrote in the New York Times in January of 2002. ‘Creating such a database is technically simple. All we have to do is copy information from the hundreds of separate law enforcement databases into a single database. A national security database could be built in a few months,’ Ellison explained. ‘A national security database combined with biometrics, thumb prints, hand prints, iris scans or whatever is best can be used to detect people with false identities.'”
We are not sure whether Novak is suggesting Vox deliberately downplayed Oracle’s role in facilitating a surveillance state infrastructure. He certainly wants us to know the company’s fortunes rose after that fateful day in September 2001, with federal government contracts making up 23 percent of its licensing revenue in 2003 to the tune of $2.5 billion. We are reminded Oracle’s David Carney stated in 2002, while trying but failing to avoid sounding callous, that 9/11 had been good for business. Perhaps Vox did not believe this facet of Oracle’s history to be relevant, but Gizmodo can consider us, dear readers, duly informed.
Cynthia Murrell, September 5, 2022
Data: A Disappointing Ride Down Zero Lane to Cell One
August 26, 2022
Projects meant to glean business insights through the analysis of vast troves of data still tend to disappoint. On its blog, British data-project management firm Brijj lists “5 Reasons Why 80% of Data and Insight Projects Fail.” The write-up tells us:
“In the UK alone, we spend £24bn on data projects every year. According to recent studies, however, organizational leadership has been dissatisfied with the value they get from data. In fact, they consider 80% of all data projects a failure. That equates to £19bn of waste. And why? Because so many don’t do the basics well. They never stood a chance.”
Not surprisingly, writer and Brijj founder/CEO Adrian Mitchell suggests consulting outside data experts from the start to make sure one’s project delivers those sweet, sweet insights:
“The bottom line is that both data creators and their business customers need to be involved in the data & insight project from the initial question through to the outcome and work closely together for it to provide actionable insights and urge action. Currently, there are many gaps between the two groups, resulting in disconnect, frustrations, time and financial losses, and no real-world outcomes. Organizations need to close these to truly harness the power of data and maximize its value.”
The list Mitchell offers looks awfully familiar; we think we have heard some of these “reasons” before. We are told the biggest problem is asking the wrong questions in the first place. Then there is, as mentioned above, a lack of collaboration between data analysts and their clients. If one has managed to gather useful bits of knowledge, they must be both communicated to the right people and made easy to find. Finally, standardized systems (like Brijj’s, we presume) should be put in place to make the whole process easier for the technically disinclined.
Perhaps Mitchell is right and these measures can help some companies make the most of the data they were persuaded to accumulate? It is worth keeping in mind, though, that any concepts derived by software have limitations… just like a blind data.
Cynthia Murrell, August 26, 2022
Quality Defined: Just Two Ways?
August 25, 2022
I am not sure what to make of “The Two Types of Quality.” A number of years ago I was in Osaka and Tokyo to deliver several lectures about commercial databases. The topic had to be narrowed, so I focused on the differences between a commercially successful database like those produced by the Courier Journal & the Louisville Times, the Petroleum Institute, and ERIC, among a handful of other must-have professional-operated databases. I explained that database quality could be defined by technical requirements; for example, timeliness of record updating, assigning a specific number of index terms from a subject matter expert developed and maintained controlled vocabulary (term) list, accurate spelling, abstracts conforming to the database publishers’ editorial guidelines, etc. The other type of quality was determined by the user; for example, was the information provided by the database timely, accurate, and in line with the scope of the database. Neither definition of quality was particularly good. I made this point in my lectures. Quality is like any abstract noun. Context defines it. Today quality means, as I was told after a lecture in Germany, “good enough.” I thought the serious young person was joking. He was not. This professional, who worked for an important European Union department, embraced “good enough” as the definition of quality.
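The first, technical definition of quality can be written down as checks on a record. What follows is a minimal sketch, not any database publisher's actual editorial system: the `Record` fields, the tiny `CONTROLLED_VOCABULARY`, and every threshold are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical controlled vocabulary; a real one is built and maintained
# by subject matter experts and is far larger.
CONTROLLED_VOCABULARY = {"databases", "indexing", "petroleum", "education"}


@dataclass
class Record:
    title: str
    abstract: str
    index_terms: list[str]
    published: date
    indexed: date


def quality_problems(record, max_lag_days=30, min_terms=2, max_abstract_words=150):
    """Return the editorial-rule violations found in one record (thresholds are assumed)."""
    problems = []
    # Timeliness: how long after publication was the record added?
    lag = (record.indexed - record.published).days
    if lag > max_lag_days:
        problems.append(f"stale: indexed {lag} days after publication")
    # Index terms must come from the controlled vocabulary.
    unknown = [t for t in record.index_terms if t not in CONTROLLED_VOCABULARY]
    if unknown:
        problems.append(f"terms not in controlled vocabulary: {unknown}")
    if len(record.index_terms) - len(unknown) < min_terms:
        problems.append("too few controlled index terms")
    # Abstracts must conform to the (assumed) editorial length guideline.
    if len(record.abstract.split()) > max_abstract_words:
        problems.append("abstract exceeds editorial length limit")
    return problems
```

Checks like these capture only the technical definition. The second, user-determined definition (was the information timely and accurate for the person's purpose) does not reduce to a rule list, which is the point of the paragraph above.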
The cited essay explains that there are two types of quality. The first is “purely functional.” I think that’s close to my definition of quality for old-fashioned databases. These were expensive to produce, difficult to index in a useful, systematic way even with our human plus smart software systems, and quite difficult to explain to a person unfamiliar with the difference between looking up something using Google and looking up a chemical structure in Chemical Abstracts. When I was working full time, I had a tough time explaining that Google was neither comprehensive nor focused on consistency. Google wanted to sell ads. Popularity determined quality, but that’s not what “quality” means to a person researching a case in a commercial database of legal information.
The second is “quality that fascinates.” I must admit that this is related to my notion of context, but I am not sure that “fascination” is exactly what I mean by context. A ball of string can fascinate the cat owner as well as the cat. Is this quality? Not in my book.
Several observations:
- Quality cannot be defined. I do believe that a company, its products, and an individual can produce objects or services that serve a purpose and do not anger the user. Well, maybe not anger. How about annoy, frustrate, stress, or force a product or service change? It is also my perception that quality is becoming a lost art like chipping stone arrowheads.
- The word “quality” can be defined in terms of cost cutting. I use products and services that are not without big flaws. Whether it is getting Microsoft Windows to print or preventing a Tesla from exploding into flames, short cuts seem to create issues. These folks are not angels in my opinion.
- The marketers (many of those I met were former journalists or art history majors) explain quality and other abstract terms in a way which obfuscates the verifiable behavior of a product or service. These folks are mendacious in my opinion.
Net net: Quality now means good enough.
That’s why nothing seems to work: Airport luggage handling, medical treatments of a president, electric vehicles, contacting a local government agency about a deer killed by a rolling smoke pickup truck driver, etc.
Quality products and services exist. Is it possible to find these using Bing, Google, or Yandex?
Nope.
Stephen E Arnold, August 25, 2022
TikTok: Allegations of Keylogging
August 22, 2022
I am not a TikTok person; therefore, I exist in a trend free zone. Others are sucking down short videos with alacrity. I admire a company, possibly linked to China’s government, which has pioneered a next generation video editor and caused the Alphabet Google YouTube DeepMind thing to innovate via its signature “me too” method of innovation.
Now TikTok has another feature, which is an interesting allegation. “TikTok’s In-App Browser Can Monitor Your Every Click and Keystroke” asserts:
When Krause [a security researcher] dug a little deeper into what these apps’ in-app browsers really do, he’d found that TikTok does some bad things, including monitoring all of users’ keyboard inputs and taps. So, if you open a web page inside of TikTok’s app, and enter your credit card details there, TikTok can access all of those details. TikTok is also the only app, out of all the apps Krause has looked into, that doesn’t even offer an option to open the link in the device’s default browser, forcing you to go through its own in-app browser.
Let’s assume this finding is spot on. First question: Does anyone care? Second question: So what?
I don’t have answers to either question. I do, however, have several observations:
- Oracle, for some reason, seems to care. The estimable database company is making an effort to find information that suggests TikTok data are kept in a cupboard. Only grandma can check out who will be an easy target for psychological manipulation. No results yet, but if TikTok is a neutral service, why’s Oracle involved?
- A number of Silicon Valley pundits have pointed out that TikTok is no big deal. That encapsulates the “so what” issue. “Put that head in the sand and opine forward” is the rule of thumb for these insightful folks.
- Keyloggers are a fave of certain actors. TikTok may have found them useful for benign purposes.
Quite an allegation.
Stephen E Arnold, August 22, 2022
The Expanding PR Challenge for Cyber Threat Intelligence Outfits
August 10, 2022
Companies engaged in providing specialized services to law enforcement and intelligence entities have to find a way to surf on the building wave of NSO Group backlash.
What do I mean?
With the interest real journalists have in specialized software and services has come more scrutiny from journalists, financial analysts, and outfits like Citizen Lab.
The most recent example is the article which appeared in an online publication focused on gadgets. The write up is “These Companies Know When You’re Pregnant—And They’re Not Keeping It Secret. Gizmodo Identified 32 Brokers Selling Data on 2.9 Billion Profiles of U.S. Residents Pegged as Actively Pregnant or Shopping for Maternity Products.” The write up reports:
A Gizmodo investigation into some of the nation’s biggest data brokers found more than two dozen promoting access to datasets containing digital information on millions of pregnant and potentially pregnant people across the country. At least one of those companies also offered a large catalogue of people who were using the same sorts of birth control that’s being targeted by more restrictive states right now. In total, Gizmodo identified 32 different brokers across the U.S. selling access to the unique mobile IDs from some 2.9 billion profiles of people pegged as “actively pregnant” or “shopping for maternity products.” Also on the market: data on 478 million customer profiles labeled “interested in pregnancy” or “intending to become pregnant.”
To add some zest to the write up, the “real news” outfit provided a link to 32 companies allegedly engaged in such data aggregation, normalization, and provision. Here are the companies listed below from the gadget blog’s link. Note that “sic” means this is the actual company name, and “trendy” means very hip marketing.
123Push
Adprime Health
Adstra
Alike Audience
Anteriad (180byTwo)
Cross Pixel
Datastream Group
Dstillery (sic and trendy)
Epsilon
Experian
Eyeota (sic and trendy)
FieldTest
Fluent
Fyllo (sic)
LBDigital
Lighthouse (Ameribase Digital)
PurpleLab
Quotient
Reklaim (sic)
ShareThis
Skydeo
Stirista (Crosswalk) (sic)
TrueData
Valassis Digital
Weborama Inc
Ziff Davis
ZoomInfo (Clickagy)
How many of these do you recognize? Perhaps Experian, usually associated with pristine security practices and credit checks? What about Ziff Davis, the outfit which publishes blogs which reveal the inner workings of Microsoft and other “insider” information? Or ZoomInfo, an outfit once focused on executive information and now apparently identified as a source of information to make a pregnant teen fear the “parent talk”?
But the others? Most people won’t have a clue. Now keep in mind these are companies in the consumer information database business. Are there other firms with more imaginative sources of personal data than outfits poking around open source datasets, marketing companies with helpful log file data, and blossoming data scientists gathering information from retail outlets?
The answer is, “Yes, there are.”
That brings me to the building wave of NSO Group backlash. How does one bridge the gap between a government agency using NSO Group type tools and data?
The answer is that specialized software and services firms themselves are the building blocks, engineer-constructors, and architect-engineers of these important bridges.
So what’s the PR problem?
Each week interesting items of information surface. For example, cyber threat firms report new digital exploits. I read this morning about Cerebrate’s Redeemer. What’s interesting is that cyber threat firms provide software and services to block such malware, right? So the new threat appears to evade existing defense mechanisms. Isn’t this a circular proposition: Buy more cyber security. Learn about new threats. Ignore the fact that existing systems do not prevent the malware from scoring a home run? Iterate… iterate… iterate.
At some point, a “real news” outfit will identify the low profile engineers engaged in what might be called “flawed bridge engineering.”
Another PR problem is latent. People like the Kardashians are grousing about Instagram. What happens when influencers and maybe some intrepid “real journalists” push back against the firms collecting personal information very few people think of as enormously revelatory? Example: Who has purchased a “weapon” within a certain geofence? Or who has outfitted an RV with a mobile Internet rig? Or who has signed up for a Dark Web forum and accessed it with a made up user name?
Who provides these interesting data types?
The gadget blog is fixated on pregnancy because of the current news magnetism. Unfortunately the pursuit of clicks with what seems really significant does not provide much insight into the third party data businesses in the US, Israel, and other countries.
That’s the looming PR problem. Someone is going to step back and take a look at companies which do not want to become the subject of a gadget blog write up with a 30 plus word headline. In my opinion, that will happen, and that’s the reason certain third party data providers and specialized software and services firms face a crisis. These organizations have to sell to survive, except for a handful supported by their countries’ governments. If that marketing becomes too visible, then the gadget bloggers will out them.
What’s it mean when a cyber threat company hires a former mainstream media personality to bolster the company’s marketing efforts? I have some thoughts. Mine are colored by great sensitivity to the NSO Group and the allegations about its Pegasus specialized software. If these allegations are true, what better way to get personal data than suck it directly from a single target’s or group of targets’ mobile devices in real time?
Here are the chemical compounds in the data lab: The NSO Group-type technology which is increasingly understood and replicated. Gadget bloggers poking around data aggregators chasing ad and marketing service firms. Cyber threat companies trying to market themselves without being too visible.
The building wave is on the horizon, just moving slowly.
Stephen E Arnold, August 10, 2022
Oracle: Marketing Experience or MX = Zero?
August 10, 2022
How does one solve the problem MX = 0? One way is to set M to zero and X to zero and bingo! You have zero. If the information in the super select, restricted, juicy article called “Oracle Insiders Describe the Complete Chaos from Layoffs and Restructuring While Employees Brace for More” is accurate, the financially lucrative Oracle database system is unhappy with the firm’s marketing. Not just the snappy PowerPoint decks or the obedient database administrator documentation. Nope. Everything is apparently a bit of indigestion.
The write up, which as I have mentioned is super select, restricted, and juicy, is a bit jumbled. Nevertheless, I noted several observations I found interesting. Let me summarize the 1,100 word report this way: Lots of people from marketing and customer experience (whatever that is) have been fired. Okay. Now let’s look at the comments that struck me as significant. Keep in mind that I love Oracle. Yep, clients just pay those who can make the sleek, efficient, tightly integrated components hum like an electric motor on a fully functioning Ford F 150 Lightning. Here we go. (My comments appear in parentheses after each bullet.)
- “The common verb to describe ACX is that they were obliterated,” said a person who works at Oracle. (I quite liked the use of the word “obliterated.” Was Oracle using a Predator launched flying ginsu management bomb or just an email or maybe a Zoom call?)
- “There’s no marketing anymore…” (My question is, “Was there ever any marketing at Oracle?” Bombast, yes. Rah rah conferences. Jet flights after curfew at the San Jose airport. But marketing? In my opinion, no.)
- “There’s a sense among many at Oracle of impending doom…” (Yep, upbeat stuff.)
- “We’ve been kind of working like zombies the last couple of weeks because there’s just this sense of ‘What am I doing here?’” (The outfit on the former Sea World exit excels at management. Well, maybe it doesn’t? How does the Oracle hit above its weight? That’s a good question. Let’s ask Cerner about the electronic medical record business and its seamless functioning with the Oracle database, shall I? No I shall not.)
- “…Oracle’s code base is so complicated that it can take years before engineers are fully up to speed with how everything works, and workers with over a decade of experience were cut…” (Ah, ha, Oracle is weeding out the dinobabies. Useless deadwood. A 20 something engineer can figure out where an entire database is hiding.)
Net net: I hate to suggest this, but perhaps some database types think using AWS, the GOOG, or the super secure MSFT data management systems is better, faster, and cheaper. Pick two.
Stephen E Arnold, August 10, 2022
TikTok: Allegations of Data Sharing with China! Why?
June 21, 2022
If one takes a long view about an operation, some planners find information about the behavior of children or older, yet immature, creatures potentially useful. What if a teenager puts up a TikTok video presenting allegedly “real” illegal actions? Might that teen in three or four years be a target for soft persuasion? Leaking the video to an employer? No, of course not. Who would take such an action?
I read “Leaked Audio from 80 Internal TikTok Meetings Shows That US User Data Has Been Repeatedly Accessed from China.” Let’s assume that this allegation has a tiny shred of credibility. The financially-challenged Buzzfeed might be angling for clicks. Nevertheless, I noted this passage:
…according to leaked audio from more than 80 internal TikTok meetings, China-based employees of ByteDance have repeatedly accessed nonpublic data about US TikTok users…
Is the audio deeply faked? Could the audio be edited by a budding sound engineer?
Sure.
And what’s with the TikTok “connection” to Oracle? Probably just a coincidence like one of Oracle’s investment units participating in Board meetings for Voyager Labs. A China-linked firm was on the Board for a while. No big deal. Voyager Labs? What does that outfit do? Perhaps it is the Manchester Square office and the delightful restaurants close at hand?
The write up refers to data brokers too. That’s interesting. If a nation state wants app generated data, why not license it? No one pays much attention to “marketing services” which acquire and normalize user data, right?
Buzzfeed tried to reach a wizard at Booz, Allen. That did not work out. Why not drive to Tyson’s Corner and hang out in the Ritz Carlton at lunch time? Get a Booz, Allen expert in the wild.
Yep, China. No problem. Take a longer-term view for creating something interesting like an insider who provides a user name and password. Happens every day and will into the future. Plan ahead I assume.
Real news? Good question.
Stephen E Arnold, June 21, 2022
Near: A Complement to ClearView AI?
May 26, 2022
“Data Intelligence Startup Near, with 1.6B anonymized User IDs, Lists on NASDAQ via SPAC at a $1B Market Cap; Raises $100M” is an interesting story. On one hand, in the midst of some financial headwinds, the outfit Near is a unicorn. That’s exciting for some. The most significant part of the short item is this passage: Near offers
anonymised, location-based profiles of users based on a trove of information that Near sources and then merges from phones, data partners, carriers and its customers. It claims the database has been built “with privacy by design.”
The word merging as in “merging data from different sources” is not jargony enough. The Near write up uses the term “stitching” as in “threads which hold the parts of a football together.” I prefer the term “federating” as in “federating data.”
The idea is a good one. Take information from different sources, index it (assign tags today, of course) and group information about a person under that entity’s “name.” This is a useful workflow, and my hunch is that the system works best for individuals leaving digital footprints and crumbs of ones and zeros behind as these “entities” go about their business.
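The workflow described above (take records from several sources, tag them, group them under one entity) can be sketched in a few lines. This is an illustration only, not Near's method: the source names, the `entity_id` and `tags` fields, and the sample records are all hypothetical, and the sketch assumes every source already shares a common identifier, which is the hard part in practice.

```python
from collections import defaultdict


def federate(*sources):
    """Merge records from several named sources into one profile per entity ID."""
    profiles = defaultdict(lambda: {"tags": set(), "sources": set()})
    for source_name, records in sources:
        for rec in records:
            profile = profiles[rec["entity_id"]]
            profile["sources"].add(source_name)       # remember where the data came from
            profile["tags"].update(rec.get("tags", []))  # accumulate index tags
    return dict(profiles)


# Hypothetical inputs: each source is a (name, records) pair.
phone_data = ("phones", [{"entity_id": "u1", "tags": ["commuter"]}])
partner_data = ("partners", [
    {"entity_id": "u1", "tags": ["coffee"]},
    {"entity_id": "u2", "tags": ["gym"]},
])

profiles = federate(phone_data, partner_data)
```

Here the profile for `u1` ends up with tags from both sources. The more digital crumbs an entity leaves across sources, the richer its merged profile becomes.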
The successful merging and profiling will give Near a competitive advantage. As ClearView and many other companies have found, scraping and licensing commercial datasets can produce a valuable data asset.
On the downside, as ClearView has learned as it explained its business to legal eagles, some concerns for privacy can arise. Assurances of privacy have created some issues for firms performing similar work for government agencies. Law enforcement and intelligence professionals are likely to show some interest in Near’s products and services.
Successfully navigating marketing to commercial outfits and selling to government agencies is like sailing into an unfamiliar port with a very large boat.
Kudos to Near for its funding. Now it will be interesting to watch the firm’s management walk the marketing tightrope over the Niagara Falls of cash flow as legal eagles circle.
Stephen E Arnold, May 26, 2022
Synthetic Data: Cheap, Like Fast Food
May 25, 2022
Fabricated data may well solve some of the privacy issues around healthcare-related machine learning, but what new problems might it create? The Wall Street Journal examines the technology in, “Anthem Looks to Fuel AI Efforts with Petabytes of Synthetic Data.” Reporter Isabelle Bousquette informs us Anthem CIO Anil Bhatt has teamed up with Google Cloud to build the synthetic data platform. Interesting choice, considering the health insurance company has been using AWS since 2017.
The article points out synthetic data can refer to either anonymized personal information or entirely fabricated data. Anthem’s effort involves the second type. Bousquette cites Bhatt as well as AI and automation expert Ritu Jyoti as she writes:
“Anthem said the synthetic data will be used to validate and train AI algorithms that identify things like fraudulent claims or abnormalities in a person’s health records, and those AI algorithms will then be able to run on real-world member data. Anthem already uses AI algorithms to search for fraud and abuse in insurance claims, but the new synthetic data platform will allow it to scale. Personalizing care for members and running AI algorithms that identify when they may require medical intervention is a more long-term goal, said Mr. Bhatt. In addition to alleviating privacy concerns, Ms. Jyoti said another advantage of synthetic data is that it can reduce biases that exist in real-world data sets. That said, she added, you can also end up with data sets that are worse than real-world ones. ‘The variation of the data is going to be very, very important,’ said Mr. Bhatt, adding that he believes the variation in the synthetic data will ultimately be better than the company’s real-world data sets.”
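To make “entirely fabricated data” concrete, here is a minimal sketch of generating made-up insurance claims for testing a fraud detector. It is not Anthem’s or Google Cloud’s platform: the field names, the fraud rate, and the amount distributions are assumptions chosen for illustration.

```python
import random


def synthetic_claims(n, fraud_rate=0.05, seed=0):
    """Generate n fabricated claim records; no real member data is involved."""
    rng = random.Random(seed)  # seeded so the same data set can be regenerated
    claims = []
    for i in range(n):
        is_fraud = rng.random() < fraud_rate
        # Assumption for illustration only: fraudulent claims skew larger.
        mean = 9000 if is_fraud else 1200
        amount = round(max(1.0, rng.gauss(mean, mean * 0.3)), 2)
        claims.append({
            "claim_id": f"SYN-{i:06d}",
            "amount": amount,
            "is_fraud": is_fraud,
        })
    return claims
```

The sketch also shows where the “worse than real-world” risk comes from: every pattern in the output is one the generator’s author chose to put there, so a model trained on it learns those assumptions, not reality.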
The article notes the use of synthetic data is on the rise. Increasing privacy and reducing bias both sound great, but that bit about potentially worse data sets is concerning. Bhatt’s assurance is pleasant enough, but how will we know whether his confidence pans out? Big corporations are not exactly known for their transparency.
Cynthia Murrell, May 25, 2022
Data Federation? Loser. Go with a Data Lake House
February 8, 2022
I have been seeing the phrase “data lake house” or “datalake house.” I noted some bold claims about a new data lake house approach in “Managed Data Lakehouse Startup Onehouse Launches with $8M in Funding.” The write up states:
One of the flagship features of Onehouse’s lakehouse service is a technology called incremental processing. It allows companies to start analyzing their data soon after it’s generated, which is difficult when using traditional technologies.
The write up adds:
The company’s lakehouse service automatically optimizes customers’ data ingestion workflows to improve performance, the startup says. Because the service is delivered via the cloud on a fully managed basis, customers don’t have to manage the underlying infrastructure.
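The general idea behind incremental processing can be sketched as follows. This is a generic illustration, not Onehouse’s implementation: the event fields (`offset`, `key`, `value`) and the `state` dictionary are hypothetical. Each run folds in only the events that arrived since the last run instead of recomputing everything from scratch.

```python
def process_incrementally(events, state):
    """Fold only events newer than the saved watermark into running totals."""
    last_seen = state.get("watermark", -1)   # highest offset handled by earlier runs
    new_mark = last_seen
    totals = state.setdefault("totals", {})
    for event in events:
        if event["offset"] <= last_seen:
            continue  # already processed in an earlier run
        totals[event["key"]] = totals.get(event["key"], 0) + event["value"]
        new_mark = max(new_mark, event["offset"])
    state["watermark"] = new_mark
    return state


# Hypothetical usage: the second run touches only the newly appended event.
state = {}
log = [
    {"offset": 0, "key": "a", "value": 1},
    {"offset": 1, "key": "b", "value": 2},
]
process_incrementally(log, state)
log.append({"offset": 2, "key": "a", "value": 5})
process_incrementally(log, state)
```

Because the work per run is proportional to the new data, results can be refreshed soon after the data are generated, which is the benefit the quoted passage claims.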
The idea of course is that traditional methods of handling data are [a] slow, [b] expensive, and [c] difficult to implement.
The premise is that the data lake house delivers more efficient use of data and a way to “future proof the data architected for machine learning / data science down the line.”
When I read this I thought of Vivisimo’s explanation of its federating method. IBM bought Vivisimo, and I assume that it is one of the ingredients in IBM’s secret big data sauce. MarkLogic also suggested in one presentation I sat through that its system would ingest data and the MarkLogic system (once eyed by the Google as a possible acquisition) would allow near real time access to the data. One person in the audience was affiliated with the US Library of Congress, and that individual seemed quite enthused about MarkLogic. And there are companies which facilitate data manipulation; for example, Kofax and its data connectors.
From my point of view, the challenge is that today large volumes of data are available. These data have to be moved from point A to point B. Ideally data do not require transformation. At some point in the flow, data in motion can be processed. There are firms which offer real time or near real time data analytics; for example, Trendalyze.com.
Conversion, moving, saving, and then doing something “more” with the data remain challenges. Maybe Onehouse has the answer?
Stephen E Arnold, February 8, 2022