Data Mesh: An Innovation or a Catchphrase?
October 18, 2023
Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
Have you ever heard of data mesh? It’s a concept that has been around the tech industry for a while but is gaining more traction through media outlets. Most of the hubbub comes from press releases, such as TechCrunch’s: “Nextdata Is Building Data Mesh For Enterprise.”
Data mesh can be construed as a data platform architecture that allows users to access information where it is. No transferring of the information to a data lake or data warehouse is required. A data lake is a centralized, scaled data storage repository, while a data warehouse is a traditional enterprise system that analyzes data from different sources which may be local or remote.
Nextdata is a data mesh startup founded by Zhamak Dehghani. Nextdata is a “data-mesh-native” platform to design, share, create, and apply data products for analytics. Nextdata is directly inspired by Dehghani’s work at Thoughtworks. Instead of building, storing, and using data/metadata in a single container, Dehghani built a mesh system. How does the Nextdata system work?
“Every Nextdata data product container has data governance policies ‘embedded as code.’ These controls are applied from build to run time, Dehghani says, and at every point at which the data product is stored, accessed or read. ‘Nextdata does for data what containers and web APIs do for software,’ she added. ‘The platform provides APIs to give organizations an open standard to access data products across technologies and trust boundaries to run analytical and machine-learning workloads ‘distributedly.’ (sic) Instead of requiring data consumers to copy data for reprocessing, Nextdata APIs bring processing to data, cutting down on busy work and reducing data bloat.’’
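To make “governance policies embedded as code” and “bring processing to data” a bit more concrete, here is a minimal Python sketch. It is not Nextdata’s API. The `DataProduct` class, the `run` method, the policy function, and the sample rows are all hypothetical, invented only to show the general shape of the idea: the data set carries its own access rules, and the consumer ships a job to the data instead of copying the data out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; none of this is Nextdata's actual API.
Row = Dict[str, object]
Policy = Callable[[str, Row], bool]  # (consumer_role, row) -> is access allowed?


@dataclass
class DataProduct:
    """A data set that carries its own access policies."""
    name: str
    rows: List[Row]
    policies: List[Policy] = field(default_factory=list)

    def run(self, consumer_role: str, job: Callable[[List[Row]], object]) -> object:
        """Run `job` where the data lives, only on rows every policy allows."""
        visible = [
            row for row in self.rows
            if all(policy(consumer_role, row) for policy in self.policies)
        ]
        return job(visible)


def eu_rows_need_eu_role(role: str, row: Row) -> bool:
    # Assumed rule: rows tagged "EU" are visible only to the "eu_analyst" role.
    return row["region"] != "EU" or role == "eu_analyst"


# Made-up sample data.
orders = DataProduct(
    name="orders",
    rows=[
        {"region": "EU", "amount": 40},
        {"region": "US", "amount": 60},
    ],
    policies=[eu_rows_need_eu_role],
)


def total_amount(rows: List[Row]) -> int:
    """The consumer's job: sum the amounts it is allowed to see."""
    return sum(int(row["amount"]) for row in rows)


us_total = orders.run("us_analyst", total_amount)  # policy hides the EU row
eu_total = orders.run("eu_analyst", total_amount)  # policy allows both rows
```

In this toy version the policy check runs on every read, which is one plausible reading of “applied from build to run time.” It also hints at where the latency and security-level questions could come from: each access pays for the policy evaluation.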
Nextdata received $12 million in seed investment to develop its system’s tooling and hire more people for the product, engineering, and marketing teams. Congratulations on the funding. It is not clear at this time whether the approach will add latency to operations or present security issues related to disparate users’ security levels.
Whitney Grace, October 18, 2023
Key Words: Useful Things
October 7, 2021
In the middle of nowhere in the American southwest, lunch time conversation turned to surveillance. I mentioned a couple of characteristics of modern smartphones, but people put down their sandwiches. I changed the subject. Later, when a wispy LTE signal permitted, I read “Google Is Giving Data to Police Based on Search Keywords, Court Docs Show.” This is an example of information which I don’t think should be made public.
The write up states:
Court documents showed that Google provided the IP addresses of people who searched for the arson victim’s address, which investigators tied to a phone number belonging to Williams. Police then used the phone number records to pinpoint the location of Williams’ device near the arson, according to court documents.
I want to point out that any string could contain actionable information; to wit:
- The name or abbreviation of a chemical substance
- An address of an entity
- A slang term for a controlled substance
- A specific geographic area or a latitude and longitude designation on a Google map.
With data federation and cross correlation, some specialized software systems can knit together disparate items of information in a useful manner.
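A toy sketch of that kind of cross correlation, in Python, may help. Every record below is invented, the field names are assumptions, and real systems are far more elaborate. The point is only that three unrelated tables become one useful answer once they share a key.

```python
# All records are invented for illustration; field names are assumptions.
search_log = [
    {"ip": "192.0.2.10", "query": "123 example street"},
    {"ip": "198.51.100.7", "query": "weather tomorrow"},
    {"ip": "203.0.113.5", "query": "123 Example Street directions"},
]

# Hypothetical lookup tables held by different parties.
subscriber_by_ip = {
    "192.0.2.10": "555-0100",
    "203.0.113.5": "555-0199",
}
location_by_phone = {
    "555-0100": "cell sector A",
    "555-0199": "cell sector B",
}


def correlate(keyword):
    """Chain keyword -> IP address -> phone number -> device location."""
    keyword = keyword.lower()
    hits = []
    for entry in search_log:
        if keyword not in entry["query"].lower():
            continue  # this search did not mention the keyword
        phone = subscriber_by_ip.get(entry["ip"])
        location = location_by_phone.get(phone) if phone else None
        hits.append({"ip": entry["ip"], "phone": phone, "location": location})
    return hits


matches = correlate("123 example street")
```

Each table is fairly harmless on its own. The join is what produces something actionable.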
The data and the analytic tools are essential for some government activities. Careless release of such sensitive information has unanticipated downstream consequences. Old fashioned secrecy has some upsides in my opinion.
Stephen E Arnold, October 7, 2021
EU Wants Google to Promise It Will Not Use Fitbit Data to Enhance Search
July 27, 2020
We noted “Europe Wants Google to Pledge That Fitbit Data Won’t Further Enhance Search.” Let’s see what “pledge” means:
Your Dictionary says: “The definition of a pledge is something held as security on a contract, a promise, or a person who is in a trial period before joining an organization. An example of a pledge is a cash down payment on a car. An example of a pledge is a promise that you’ll buy a person’s car.”
Dictionary.com says: “A solemn promise or agreement to do or refrain from doing something: a pledge of aid; a pledge not to wage war. Something delivered as security for the payment of a debt or fulfillment of a promise, and subject to forfeiture on failure to pay or fulfill the promise.”
Wordsense.eu says: “From Middle English plege, from Anglo-Norman plege, from Old French plege (Modern French pleige) from Medieval Latin plevium, plebium, from Medieval Latin plebiō (“I pledge”), from Frankish *plegan (“to pledge; to support; to guarantee”), from Proto-Germanic *plehaną (“to care about, be concerned with”). Akin to Old High German pflegan (“to take care of, be accustomed to”), Old Saxon plegan (“to vouch for”), Old English plēon (“to risk, endanger”).”
The write up says:
EU regulators are asking Google to pledge that Fitbit information will not be used to “further enhance its search advantage.” Another demand involves letting third-parties have “equal” access to that data.
DarkCyber’s comment: Ho, ho, ho. Guarantee? Data are ingested and processed. Ho, ho, ho. No humans involved. Ho, ho, ho. It’s an artificial intelligence system. Ho, ho, ho. Let the lawyers figure it out. Ho, ho, ho. Fitbit users buy products, and Google wants to sell like Amazon. Ho, ho, ho.
Stephen E Arnold, July 27, 2020
The Myth of Data Federation: Not a New Problem, Not One Easily Solved
July 8, 2020
I read “A Plan to Make Police Data Open Source Started on Reddit.” The main point of this particular article is:
The Police Data Accessibility Project aims to request, download, clean, and standardize public records that right now are overly difficult to find.
Interesting, but I interpreted the Silicon Valley centric write up differently. If you are a marketer of systems which purport to normalize disparate types of data, aggregate them, federate indexes, and make the data accessible, analyzable, retrievable, and bang on dead simple — stop reading now. I don’t want to deal with squeals from vendors about their superior systems.
For the individual reading this sentence, a word of advice. Fasten your seat belt.
Some points to consider when reading the article cited above, listening to a Vimeo “insider” sales pitch, or just doing techno babble with your Spin class pals:
- Dealing with disparate data requires time and money as well as NOT ONE but multiple software tools.
- Even with a well resourced and technologically adept staff, exceptions require attention. A failure to deal with the stuff in the Exceptions folder can skew the outputs of some Fancy Dan analytic systems. Example: How about that Detroit facial recognition system? Nifty, eh?
- The flows of real time data are a big problem — are you ready for this — a challenge to the Facebooks, Googles, and Microsofts of the world. The reason is that the volume of data and CHANGES TO THOSE ALREADY PROCESSED ITEMS OF INFORMATION is a very, very tough problem. No, faster processors, bigger pipes, and zippy SSDs won’t do the job. The trouble lies within, the intradevice and intra software module flow. The fix is to sample, and sampling increases the risk of inaccuracies. Example: Remember Detroit’s facial recognition accuracy. The arrested individual may share some impressions with you.
- The baloney about “all” data or “any” type is crazy talk. When one deals with more than 18,000 police forces in the US, outputs from surveillance devices from different vendors, and the geodumps of individuals and their ad tracking beacons — this is going to be mashed up and made usable. Noble idea. There are many noble ideas.
Why am I taking the time to repeat what anyone with experience in large scale data normalization and analysis knows?
Baloney can be thinly sliced, smeared with gochujang, and served on Delft plates. Know what? Still baloney.
Gobble this:
Still, data is an important piece of understanding what law enforcement looks like in the US now, and what it could look like in the future. And making that information more accessible, and the stories people tell about policing more transparent, is a first step.
But the killer assumption is that the humans involved don’t make errors, systems remain online, and file formats are forever.
That baloney. It really is incredible. Just not what you think.
Stephen E Arnold, July 8, 2020
Looking for News Like the Hawaii Volcano Eruption?
June 28, 2018
With the problem of fake news online, the news itself has often made headlines of late. We’ve noticed a couple different news-related moves from big players: TechCrunch reports on Google’s recent project in, “Google Experiments in Local News with an App Called Bulletin.” We learn that Apple, meanwhile, plans to integrate its recent acquisition, magazine aggregator Texture, into Apple News and an upcoming subscription service in, “Apple Said to Plan a ‘Netflix for News’ in Latest Push” at the Daily Herald.
Google’s Bulletin is a place for members of a community to post local news and event notices. TechCrunch’s Sarah Perez suspects it’s also another attempt by Google to squeeze into the Social Media space. She observes:
“The move to delve into local news would have Google competing with other services where people already share news about what’s happening locally. Specifically, people tend to tweet or live stream when news is breaking …. Meanwhile, if they’re trying to promote a local event … it’s likely that they’ll post that to the business’s Facebook Page, where it can then be discovered through the Page’s fans and surfaced in Facebook’s Local app. And if Google aims to more directly compete with local news resources like small-town print or online publishers or Patch, it could have a tougher road. Hyperlocal news has been difficult to monetize, and those who have made it work aren’t likely interested in shifting their limited time and energy elsewhere.”
Over at the Daily Herald, reporters Mark Gurman and Gerry Smith of Bloomberg note that Apple cut 20 Texture workers shortly after acquiring the company, but we’re cautioned against reading too much into that. The article notes:
“An upgraded Apple News app with the subscription offering is expected to launch within the next year, and a slice of the subscription revenue will go to magazine publishers that are part of the program, [sources] said. … A new, simplified subscription service covering multiple publications could spur Apple News usage and generate new revenue in a similar manner to the $9.99 per month Apple Music offering.”
Will enough folks pay per month for news, like they do for (other) online entertainment? Perhaps now, when it is prudent to be skeptical, people are willing to pay up to 10 bucks a month for a trusted name. We shall see.
What’s clear is that when one looks for “news” about the Hawaii volcano, few of the online news services are useful. In order to keep up to date, old fashioned search, review, and read processes are the norm. Want current videos about the eruption on YouTube? Good luck with that too. Comprehensiveness may be impossible with free or low cost services, but chronological tags and spam content filtering could be helpful.
For slow moving lava, what’s the rush?
Cynthia Murrell, June 28, 2018
Filtered Content: Tactical Differences between Dow Jones and Thomson Reuters
December 5, 2017
You may know that Dow Jones has an online search company. The firm is called Factiva, and it is an old-school approach to finding information. The company recently announced a deal with an outfit called Curation. Founded by a former newspaper professional, Curation uses mostly humans to assemble reports on hot topics. Factiva is reselling these services, and advertising for customers in the Wall Street Journal. Key point: This is mostly a manual method. The approach was more in line with the types of “reports” available from blue chip consulting firms.
You may also know that Thomson Reuters has been rolling out machine curated reports. These have many different product names. Thomson Reuters has a large number of companies and brands. Not surprisingly, Thomson’s approach has to apply to many companies managed by executives who compete with regular competitors like Dow Jones but also among themselves. Darwin would have loved Thomson Reuters. The point is that Thomson Reuters’ approach relies on “smart” software.
You can read about Dow Jones’ play here.
You can read about Thomson Reuters’ play here.
My take is that these two different approaches reflect the painful fact that there is no clear path forward for professional publishing companies. In order to make money from electronic information, two of the major players are still experimenting. The digital revolution began, what?, about 40 years ago.
One would have thought that leading companies like Dow Jones and Thomson Reuters would have moved beyond the experimental stage and into cash cow land.
Not yet, it seems. The reason for my pointing out these two different approaches is that there are more innovative methods available. For snapshots of companies which move beyond the Factiva and Thomson methods, watch Dark Cyber. A new program is available every Tuesday via YouTube at this link.
Stephen E Arnold, December 5, 2017
Newsreaders through Time
December 30, 2015
Those chart mavens at CBInsights have produced another timeline for wild and crazy Internet services. “The Rise and Fall of Venture Backed News Readers” makes clear the long odds traditional news producers face when trying to find a business model. The chart is a shopping list of case studies for MBA programs. The idea of providing “news” to the hungry minds with mobile devices and sci fi laptops seems to be a bit of a challenge. For investors, these services trigger opportunities to explain why their investments did not perform particularly well. The chart, intentionally or unintentionally, causes Flipboard to stand out from the crowd. It may be the red logo and bold faced type. Alternatively, Flipboard has managed to attract money over the last five years. The chart makes clear why an average millennial may want to take a vacation instead of investing in a newsreader start up.
Stephen E Arnold, December 30, 2015
Link Mischief at Feedly
January 8, 2014
Here is some content excitement of interest to journalists and bloggers everywhere. MakeUseOf informs us that “Feedly Was Stealing Your Content—Here’s the Story, and Their Code.” Apparently, the aggregation site was directing shared links to copies on their own site instead of to original articles, essentially stealing traffic. Writer James Bruce, eager to delve deeper into the code, makes it clear that he is following up on a discovery originally revealed by The Digital Reader.
For example, the article notes that Feedly is now sending links to the proper sites, but by way of JavaScript code instead of in the usual, server-level way. Bruce also noticed that, in its attempt to improve functionality, Feedly was stripping embedded items from content. Advertising, tracking, share buttons, even “donate” buttons—gone.
Bruce writes:
“Not only were Feedly scraping the content from your site, they were then stripping any original social buttons and rewriting the meta-data. This means that when someone subsequently shared the item, they would in fact be sharing the Feedly link and not the original post. Anyone clicking on that link would go straight to Feedly.
So what, you might ask? When a post goes viral, it can be of huge benefit to the site in question — raising page views and ad revenues, and expanding their audience. Feedly was outright stealing that specific benefit away from the site to expand its own user base. The Feedly code included checks for mobile devices that would direct the users to the relevant appstore page.
It wasn’t ‘just making the article easier to view’ — it was stealing traffic, plain and simple. That’s really not cool.”
The write-up goes on to detail the ways Feedly has responded to discoveries, where the issue stands now, and “what we have learnt”: Feedly made some bad choices in the pursuit of a streamlined reading experience. As a parting shot, Bruce cites another example of a bad call by the company—it briefly required a Google+ account to log in. He has a point there.
Cynthia Murrell, January 08, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Give A Hand For Old Fashioned Journalistic Bribery
March 26, 2013
A top news story used to make or break a reporter. It still can today, but the old channels are mostly closed and monitored by the Internet beat. Reporters used to have to bribe sources, and the best information continues to come from the source. In a throwback to the old days, The Guardian says, “Wall Street Journal Blames Beijing Troublemaking For US Bribery Probe.” The accusation is that the Journal’s China office bribed government officials with expensive gifts in exchange for information. The US Justice Department was already investigating the Journal’s parent company, News Corporation, under the Foreign Corrupt Practices Act.
News Corporation believes that someone only wants to make trouble for the Journal and they are upset over the allegations. They also believe a Chinese government agent tipped off authorities. In an internal investigation, News Corporation did not find anything wrong.
How did this happen?
“The newspaper believes the bribery allegation came in relation to the Journal’s reporting of events in Chongqing, the province in which disgraced Chinese official Bo Xilai once had a power base.”
and:
“The report also comes in the wake of claims that China has hacked into the systems of US newspapers – allegations that are denied by Beijing.”
The proper authorities are conducting further investigation, while the US, England, and China argue back and forth, name-calling and the like. The new Chinese premier Li Keqiang even made a statement that everyone should forget this event and concentrate on preventing further cyber attacks. Only in a perfect world or if something bigger comes along, like North Korea gaining an atom bomb.
Whitney Grace, March 26, 2013
Sponsored by ArnoldIT.com, developer of Beyond Search
Agile Solutions for Big Data Make Exabytes Less Overwhelming
November 14, 2012
Many narratives follow the phrase big data, and ZDNet discusses the story that EMC tells about it. Kathrin Winkler, vice president of corporate sustainability for EMC, explained the elusive concept in layman’s terms at the Verge @ Greenbuild summit. The main reason this concept needs to be brought down to a lower level is that big data is affecting everyone’s lives. It has effectively “escaped the data center.”
Back in 2000, two exabytes of new information were created in the world. By 2011, Winkler said, the world was creating more than two exabytes of new information every day.
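A quick back-of-the-envelope check, sketched in Python, shows what those two figures imply. It assumes a 365-day year and treats “more than two exabytes” as exactly two, so the result is a lower bound rather than a measurement.

```python
# Figures as reported in the paragraph above; the 2011 rate is a floor.
EXABYTES_IN_2000 = 2        # new information for the whole year 2000
EXABYTES_PER_DAY_2011 = 2   # "more than two exabytes" per day in 2011
DAYS_PER_YEAR = 365         # assumption: non-leap year

# Annualize the 2011 daily rate, then compare it with the 2000 total.
exabytes_in_2011 = EXABYTES_PER_DAY_2011 * DAYS_PER_YEAR
growth_factor = exabytes_in_2011 / EXABYTES_IN_2000

print(f"2011: at least {exabytes_in_2011} exabytes, "
      f"about {growth_factor:.0f} times the 2000 total")
```

Put plainly, what took a year in 2000 took about a day in 2011.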
In the article, “EMC Explains Making Big Data More Concrete to General Public” we learned about EMC’s strategy:
Winkler briefly outlined EMC’s overall strategy, dubbed “The Human Face of Big Data,” which is designed to make big data more comprehensible for everyday Internet users. That strategy includes a book of the same name being published later this month, which features images from more than 150 photojournalists worldwide, demonstrating that basically every moment of our lives can now be chronicled in the cloud.
The possibilities with big data may seem overwhelming at times. Inherently, the opportunities are endless. However, these insights and information can only be delivered to decision-makers with the proper infrastructure technologies in place. We have had our eyes on PolySpot for their agile solutions in this department.
Megan Feil, November 14, 2012