Hazy Promises of AI Data Magic

December 11, 2020

Forbes has posted an article that sounds full of promise, “How to Understand All of Your Data to Transform Your Business.” Unfortunately, the piece is full of logical flaws. We note that writer Daniel Fallmann’s company, Mindbreeze, is part of Fallsoft in Austria, and is Microsoft centric. When he speaks of “all” your data, he seems to be talking about the inclusion of unstructured data. That is the holy grail that data management vendors have been chasing for years, with less success than once hoped. Fallmann states what is now the obvious:

“Almost everybody hates filling out forms. That’s why you write a note instead. You send an email or text. You record an audio message. You create a video. You communicate in an unstructured, humanized way. Unlike metadata in forms, which are structured, these other methods of communication are unstructured. Unstructured data lacks metadata, and semi-structured information has limited metadata. The real value of unstructured data like an email, for example, is in the body of that email. You and I can often make sense of an email and other semi-structured and unstructured information. However, for a company, and for search, understanding the essence of a message is not that easy. This is problematic because when you can’t get to the essence of a message, you miss out on opportunities. You find it difficult — if not impossible — to connect the dots of your enterprise data. As a result, a wealth of knowledge that already exists in your enterprise goes to waste. That’s a lot of waste considering that unstructured data represents more than 80% of enterprise data.”

All true. But being able to define the problem does not mean one has the solution. The piece goes on to assert that machine learning can be used to connect the dots between structured and unstructured data, to criticize mindless silo migrations, and to stress the value of removing outdated or incorrect data from one’s database. So far so good. But Fallmann’s generic claims that new technology is “changing everything” lack substance. He fails to provide any factual backup for his assertions about AI or any definition of knowledge management or content management systems.

Doesn’t this company license enterprise search?

Cynthia Murrell, December 11, 2020

How to Be a Data Scientist

December 9, 2020

Do you want to be a data scientist without [a] going to a university, [b] watching YouTube videos, and [c] relying on persistence? If you answer “yes” to any of these questions, “You Don’t Need a Ph.D. in Data Science, but…” offers a road map. One tip: Figure out how to do a regression in Excel. Okay.
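The Excel tip boils down to simple linear regression, which is what Excel’s SLOPE and INTERCEPT functions compute. Here is a minimal sketch of the arithmetic, assuming ordinary least squares with a single predictor. The function name `fit_line` and the sample numbers are hypothetical and do not come from the write up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept (one predictor)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]              # hypothetical inputs
    ys = [2.1, 3.9, 6.2, 8.0, 9.8]    # hypothetical observations
    slope, intercept = fit_line(xs, ys)
    print(f"y = {slope:.3f} * x + {intercept:.3f}")
```

Typing the same two columns into a spreadsheet and calling SLOPE and INTERCEPT on them should give matching numbers, which makes a handy self-check.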

The write up offers a number of suggestions, including:

  • Kaggle notebooks
  • Free books
  • Free courses from universities
  • Why Python, R, and SQL should be on your radar
  • The value of math and statistics
  • How to get a job

Interesting summary. But imagine: math and statistics at the tail end of the list. Perhaps those disciplines should have been identified at the top. Just a thought.

Stephen E Arnold, December 9, 2020

Exclusive: Interview with DataWalk’s Chief Analytics Officer Chris Westphal, Who Guides an Analytics Rocket Ship

October 21, 2020

I spoke with Chris Westphal, Chief Analytics Officer for DataWalk, about the company’s string of recent contract “wins.” These range from commercial engagements to heavy lifting for the US Department of Justice.

Chris Westphal, founder of Visual Analytics (acquired by Raytheon), brings his one-click approach to advanced analytics.

The firm provides what I have described as an intelware solution. DataWalk ingests data and outputs actionable reports. The company has leap-frogged a number of investigative solutions, including IBM’s Analyst’s Notebook and the much-hyped Palantir Technologies’ Gotham products. This interview took place in a Covid compliant way. In my previous Chris Westphal interviews, we met at intelligence or law enforcement conferences. Now the experience is virtual, but it was as interesting and informative as our July 2019 conversation. In my most recent interview with Mr. Westphal, I sought to get more information on what’s causing DataWalk to make some competitors take notice of the company and its use of smart software to deliver what customers want: Results, not PowerPoint presentations and promises. We spoke on October 8, 2020.

DataWalk is an advanced analytics tool with several important innovations. On one hand, the company’s information processing system performs IBM i2 Analyst’s Notebook and Palantir Gotham type functions — just with a more sophisticated and intuitive interface. On the other hand, Westphal’s vision for advanced analytics has moved past what he accomplished with his previous venture Visual Analytics. Raytheon bought that company in 2013. Mr. Westphal has turned his attention to DataWalk. The full text of our conversation appears below.


Another Data Marketplace: Amazon, Microsoft, Oracle, or Other Provider for This Construct?

August 31, 2020

The European Union is making a sharp U-turn on data privacy, we learn from MIT Technology Review’s article, “The EU Is Launching a Market for Personal Data. Here’s What That Means for Privacy.” The EU has historically protected its citizens’ online privacy with vigor, fighting tooth and nail against the commercial exploitation of private information. As of February, though, the European Commission has decided on a completely different data strategy (PDF). Reporter Anna Artyushina writes:

“The Trusts Project, the first initiative put forth by the new EU policies, will be implemented by 2022. With a €7 million [8.3 million USD] budget, it will set up a pan-European pool of personal and nonpersonal information that should become a one-stop shop for businesses and governments looking to access citizens’ information. Global technology companies will not be allowed to store or move Europeans’ data. Instead, they will be required to access it via the trusts. Citizens will collect ‘data dividends,’ which haven’t been clearly defined but could include monetary or nonmonetary payments from companies that use their personal data. With the EU’s roughly 500 million citizens poised to become data sources, the trusts will create the world’s largest data market. For citizens, this means the data created by them and about them will be held in public servers and managed by data trusts. The European Commission envisions the trusts as a way to help European businesses and governments reuse and extract value from the massive amounts of data produced across the region, and to help European citizens benefit from their information.”

It seems shifty that they have yet to determine just how citizens will benefit from this data exploitation, I mean, value-extraction. There is no guarantee people will have any control over their information, and there is currently no way to opt out. This change is likely to ripple around the world, as the way the EU approaches data regulation has long served as an example to other countries.

The concept of data trusts has been around since 2018, when Sir Tim Berners-Lee proposed it. Such a trust could be for-profit, for a charitable cause, or simply for data storage and protection. As Artyushina notes, whether this particular trust actually protects citizens depends on the wording of its charter and the composition of its board of directors. See the article for examples of other trusts gone wrong, as well as possible solutions. Let us hope this project is set up and managed in a way that puts citizens first.

Cynthia Murrell, August 31, 2020

Amazon and Toyota: Tacoma Connects to AWS

August 20, 2020

This is just a very minor story. For most people, the information reported in “Toyota, Amazon Web Services Partner On Cloud-Connected Vehicle Data” will be irrelevant. The value of the data collected by the respective firms and their partners is trivial and will not have much impact. Furthermore, any data processed within Amazon’s streaming data marketplace and made available to some of the firm’s customers will be of questionable value. That’s why I am not immediately updating my Amazon reports to include the Toyota and insurance connection.

Now to the minor announcement:

Toyota will use AWS’ services to process and analyze data “to help Toyota engineers develop, deploy, and manage the next generation of data-driven mobility services for driver and passenger safety, security, comfort, and convenience in Toyota’s cloud-connected vehicles. The MSPF and its application programming interfaces (API) will enable Toyota to use connected vehicle data to improve vehicle design and development, as well as offer new services such as rideshare, full-service lease, proactive vehicle maintenance notifications and driving behavior-based insurance.”

Are there possible implications from this link up? Sure, but few people care about Amazon’s commercial, financial, and governmental services, so why think about issues like:

  • Value of the data to the AWS streaming data marketplace
  • Link analytics related to high risk individuals or fleet owners
  • Significance of the real time data to predictive analytics, maybe to insurance carriers and others?

Nope, not much of a big deal at all. Who cares? Just mash that Buy Now button and move on. Curious about how Amazon ensures data integrity in such a system? If you are, you can purchase our 50-page report about Amazon’s advanced data security services. Just write darkcyber333 at yandex dot com.

But I know firsthand, after two years of commentary, that shopping is more fun than thinking about Amazon from a different viewshed.

Stephen E Arnold, August 20, 2020

Google and Global Surveillance

August 14, 2020

DarkCyber noted “Android Users Could Detect Earthquakes Soon As Google Is Planning to Turn Them into Seismometers.” The write up describes a global system to note perturbations in the earth’s crust. Yep, earthquake warning on a global scale. The write up states:

the internet giant plans on using the built-in accelerometers of Android devices to turn them into a network of makeshift seismometers, and while they won’t be able to predict these quakes, the long-term goal is that Android users affected by a tremor will receive push notifications as soon as it happens. Based on the report, Android users would have to opt-into the new system for it to work. Additionally, the phone would have to be plugged in and motionless to detect a nearby quake and send an alert to the user.

Useful data for the Google? That’s a good question. If one assumes the data are valid, what can these seismic data reveal? Ads for products needed in the aftermath of a natural disaster? Hints about investment opportunities? Fine grained surveillance of mobile phone users’ behavior when disaster strikes?

What use cases are possible? What about the location of a mobile device in an area in which looting is occurring? Any others come to mind?

Stephen E Arnold, August 14, 2020

After 20 Plus Years, Whoa! Surveillance by Big Tech

August 10, 2020

DarkCyber has noted a flurry of write ups expressing surprise, rage, indignation, and blusterification at the idea of a commercial company collecting data. Hello, services are free for a basic reason: Making money. Part of making money is to have something that other companies and organizations will purchase. A good example is personal information about users of free services. The way big companies work is that there is constant pressure to find new ways to generate money. Thus, there are data sucking apps; there are advertisements and more advertisements; there are subscriptions which lock in revenue while delivering the Amazon-style advantage of knowing a lot about those who shop on Amazon; and there are many ornaments on these methods.

I got a kick out of “Silicon Valley’s Vast Data Collection Should Worry You More Than TikTok.” We know the story well. Commercial firms in the US gather data and license it, often to marketing firms and to other organizations. After two decades of blissful ignorance, a devoted band of “real” journalists are now probing the core business model of many technology centric companies.

Give me a break. We are talking decades of business processes designed to generate useful reports from flows of actions by individuals. In some countries, the government performs this task. In others, commercial enterprises do the work and license the normalized data to governments.

This passage from the write up tickled my funny bone:

And none of this is unreasonable. We should be worried about private companies and governments potentially collecting data on millions of unsuspecting people and censoring content they don’t like. But those based in China represent just a sliver of that threat.

Yep, the old “woulda, coulda, shoulda” ploy. May I remind you, gentle reader, that we are decades into the automation of data about the actions of individuals. These are the happy and often ignorant humanoids who download apps, run queries, click on videos, and send personal messages while leaving a data trail a foot deep and a mile wide.

And now the need for something?

And data collection is not a technical and economic issue. Nope. Data collection is politics; for example:

TikTok’s critics might point to the increasingly scary behavior of China’s government as to why Chinese control of information is particularly alarming. They’re right about the behavior, but they curiously ignore the fact that the United States itself is currently governed by a far-right demagogue with his own concentration camps and authoritarian repression, and that the party behind him, which aligns entirely with his politics, reliably cycles into power at least once every eight years.

What’s the fix? Well, “oppose it all.”

Where were the regulators, the users, and the competitors 20 years ago? Probably in grade school, blissfully unaware that those handheld gadgets would become more important than other activities. Okay, adult thumbtypers, your outrage is interesting. Step back, and perhaps you can see why the howls of outrage, the references to evil forms of government, and the framing of toting around a device that usually provides real time documentation of one’s actions as a bad thing arrive rather late.

But after 20 years, is it surprising that personal data actions are captured, analyzed, and used to provide more data “stuff” to consume? As I said, it’s been 20 years with no lessening of the processes. Complain to your parents. Maybe they dropped the ball? Commercial enterprises and governments are like beavers. And beavers do what beavers do.

Stephen E Arnold, August 10, 2020

Why Is MiningLamp Getting Ink?

December 3, 2019

The question “Why is MiningLamp getting ink?” is an interesting one to some people. The firm was founded in 2014. The company was a product of bunsha practiced by Miaozhen Systems, a company engaged in advertising “analysis.” The company is funded by Tencent, China Renaissance, and Sequoia Capital China. The firm may have revenues in the hundreds of millions of dollars. Data about the influence of the Chinese government is not available to the DarkCyber team at this time. MiningLamp may have received as much as $290 million from its backers.


Companies want publicity to get sales leads, attract investors, create buzz to lure new hires, and become known to procurement professionals in government agencies.


We noted talk about MiningLamp at a couple of law enforcement and intelligence conferences. The company provides policeware and intelware to customers in China and elsewhere. You can read about the firm on its Web site at this link. (Be patient. The service seems to provide a high latency experience.) Product pages also seem to be missing in action.

Nevertheless, “Chinese Data Mining Firm MiningLamp, Now a National AI Champion, Began by Helping Police Solve Crimes” does not talk about a dearth of public information. The write up states that “MiningLamp’s business analytics tools are used by more than 200 companies in the Fortune 200.” That’s a lot of big companies embracing investigative software. Judging from the attendees at law enforcement and intelligence conferences, these big companies are finding out about a Chinese company somehow.

The news story states that “Like Palantir, this Chinese start up uses AI to help corporate clients convert huge volumes of data into actionable information.” Palantir is a big ticket item. Perhaps price is a factor, or Fortune 200 companies want to rely on a business intelligence system operated by a company located outside the span of control of some government authorities.

The company has been named a Chinese champion. The article reveals:

Although not as well known as US equivalent Palantir Technologies, which reportedly contributed to America’s success in hunting down Osama bin Laden, MiningLamp’s data mining software is used to spot crime patterns, track drug dealers and prevent human trafficking.

DarkCyber thinks that any company which has 200 Fortune listed companies as customers is reasonably well known.

We learned:

“Cases are being resolved on our platforms every day” in more than 60 cities and regions in China, said founder and CEO Wu Minghui. “We can run fast analysis on potential drug dealers or major suspects, improving the overall case-solving efficiency several hundred times.”


Federating Data: Easy, Hard, or Poorly Understood Until One Tries It at Scale?

March 8, 2019

I read two articles this morning.

One article explained that there’s a new way to deal with data federation. Always optimistic, I took a look at “Data-Driven Decision-Making Made Possible using a Modern Data Stack.” The revolution is to load data and then aggregate. The old way is to transform, aggregate, and model. Here’s a diagram from DAS42. A larger version is available at this link.

Hard to read. Yep, New Millennial colors. Is this a breakthrough?

I don’t know.
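To make the “old way” versus “load, then aggregate” contrast concrete, here is a minimal sketch using Python’s built-in `sqlite3` with an in-memory database. It is not DAS42’s stack. The table names, the view, and the sample rows are hypothetical, and SQLite stands in for whatever warehouse a real deployment would use.

```python
import sqlite3

# Hypothetical raw records: (day, source system, amount as messy text).
RAW_EVENTS = [
    ("2019-03-01", "alpha", "12.50"),
    ("2019-03-01", "beta", "7.25"),
    ("2019-03-02", "alpha", " 3.00 "),
]


def etl(rows):
    """Old way: transform and aggregate in application code, then load the result."""
    totals = {}
    for day, _source, amount in rows:
        totals[day] = totals.get(day, 0.0) + float(amount.strip())
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_totals (day TEXT PRIMARY KEY, total REAL)")
    conn.executemany("INSERT INTO daily_totals VALUES (?, ?)", sorted(totals.items()))
    return conn


def elt(rows):
    """'Modern stack' way: load raw rows untouched, then aggregate inside the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (day TEXT, source TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", rows)
    # The cleanup (TRIM, CAST) and the aggregation now live in SQL.
    conn.execute(
        "CREATE VIEW daily_totals AS "
        "SELECT day, SUM(CAST(TRIM(amount) AS REAL)) AS total "
        "FROM raw_events GROUP BY day"
    )
    return conn


if __name__ == "__main__":
    for build in (etl, elt):
        conn = build(RAW_EVENTS)
        rows = conn.execute("SELECT day, total FROM daily_totals ORDER BY day").fetchall()
        print(build.__name__, rows)
```

Both paths yield the same daily totals. The difference is where the cleanup happens and whether the raw rows stay around for re-aggregation. Neither path removes the need to clean, index, and secure the data in the first place.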

When I read “2 Reasons a Federated Database Isn’t Such a Slam-Dunk”, it seems that DAS42 and the InfoWorld expert are not in sync.

There are two reasons. Count ‘em.

One: performance.

Two: security.

Yeah, okay.

Some may suggest that there are a handful of other challenges. These range from deciding how to index audio, video, and images to figuring out what to do with different languages in the content to determining what data are “good” for the task at hand and what data are less “useful.” Date, time, and geocode metadata are needed, but adding them introduces the not-so-easy-to-solve indexing problem.

So where are we with the “federation thing”?

Exactly the same place we were years ago…start ups and experts notwithstanding. But then one has to wrangle a lot of data. That’s cost, gentle reader. Big money.

Stephen E Arnold, March 8, 2019

Data Science Gets Political

November 20, 2018

With the near ubiquitous use of big data science in every industry short of rock hunting, it was inevitable that there would be blowback. Recently, many tech companies began to feel some political heat due to their involvement with immigration agencies. We learned more from a recent Mercury News story, “Bay Area Cities May Boycott Tech Giants Contracting With ICE.”

According to the story:

“The policy comes as the local immigration debate shifts toward several prominent tech companies — including Palo Alto’s Palantir Technologies, Vigilant Solutions in Livermore and Amazon, which have been criticized for contracting with federal immigration agencies. Last week, advocates descended on Salesforce’s annual conference in San Francisco with a 14-foot-tall cage symbolizing ICE detention to protest the company’s contract with Customs and Border Protection.”

If this sounds a little farfetched or even unlikely, pay close attention to similar actions in Europe. There, when people pushed back against the intersection of politics and big data, it began to impact finances. And when pocketbooks begin to suffer, you can guarantee companies take notice. We don’t yet know if the same will happen in America, but we have a hunch this issue won’t vanish quietly.

Patrick Roland, November 20, 2018
