Google and Global Surveillance

August 14, 2020

DarkCyber noted “Android Users Could Detect Earthquakes Soon As Google Is Planning to Turn Them into Seismometers.” The write up describes a global system to note perturbations in the earth’s crust. Yep, earthquake warning on a global scale. The write up states:

the internet giant plans on using the built-in accelerators of Android devices to turn them into a network of makeshift seismometers, and while they won’t be able to predict these quakes, the long-term goal is that Android users affected by a tremor will receive push notifications as soon as it happens. Based on the report, Android users would have to opt-into the new system for it to work. Additionally, the phone would have to be plugged in and motionless to detect a nearby quake and send an alert to the user.

Useful data for the Google? That’s a good question. If one assumes the data are valid, what can these seismic data reveal? Ads for products needed in the aftermath of a natural disaster? Hints about investment opportunities? Fine grained surveillance of mobile phone users’ behavior when disaster strikes?

What use cases are possible? What about the location of a mobile device in an area in which looting is occurring? Any others come to mind?

Stephen E Arnold, August 14, 2020

After 20 Plus Years, Whoa! Surveillance by Big Tech

August 10, 2020

DarkCyber has noted a flurry of write ups expressing surprise, rage, indignation, and blusterification at the idea of a commercial company collecting data. Hello, services are free for a basic reason: Making money. Part of making money is to have something that other companies and organizations will purchase. A good example is personal information about users of free services. The way big companies work is that there is a constant pressure to find new ways to generate money. Thus, there are data sucking apps; there are advertisements and more advertisements; there are subscriptions which lock in revenue while providing an Amazon-style we know a lot about those who shop on Amazon; and there are many ornaments on these methods.

I got a kick out of “Silicon Valley’s Vast Data Collection Should Worry You More Than TikTok.” We know the story well. Commercial firms in the US gather data and license it, often to marketing firms and to other organizations. After two decades of blissful ignorance a devoted band of “real” journalists are now probing the core business model of many technology centric companies.

Give me a break. We are talking decades of business processes designed to generate useful reports from flows of actions by individuals. In some countries, the government performs this task. In others, commercial enterprises do the work and license the normalized data to governments.

This passage from the write up tickled my funny bone:

And none of this is unreasonable. We should be worried about private companies and governments potentially collecting data on millions of unsuspecting people and censoring content they don’t like. But those based in China represent just a sliver of that threat.

Yep, the old “woulda, coulda, shoulda” ploy. May I remind you, gentle reader, that we are decades into the automation of data about the actions of individuals. These are the happy and often ignorant humanoids who download apps, run queries, click on videos, and send personal message while leaving a data trail a foot deep and a mile wide.

And now the need for something?

And data collection is not a technical and economic issue. Nope. Data collection is politics; for example:

TikTok’s critics might point to the increasingly scary behavior of China’s government as to why Chinese control of information is particularly alarming. They’re right about the behavior, but they curiously ignore the fact that the United States itself is currently governed by a far-right demagogue with his own concentration camps and authoritarian repression, and that the party behind him, which aligns entirely with his politics, reliably cycles into power at least once every eight years.

What’s the fix? Well, “oppose it all.”

Where were the regulators, the users, and the competitors 20 years ago? Probably in grade school, blissfully unaware that those handheld gadgets would become more important than other activities. Okay, adult thumbtypers, your outrage is interesting. Step back, and perhaps you can see why the howls of outrage, the references to evil forms of government, and the horrors of toting around a device that usually provides real time documentation of one’s actions as a bad thing.

But after 20 years, is it surprising that personal data actions are captured, analyzed, and used to provide more data “stuff” to consume? As I said, its been 20 years with no lessening of the processes. Complain to your parents. Maybe they dropped the ball? Commercial enterprises and governments are like beavers. And beavers do what beavers do.

Stephen E Arnold, August 10, 2020

Why Is MiningLamp Getting Ink?

December 3, 2019

The question “Why is MiningLamp getting ink?” is an interesting one to some people. The firm was founded in 2014. The company was a product of bunsha practiced by Miaozhen Systems, a company engaged in advertising “analysis.” The company is funded by Tencent, China Renaissance, and Sequoia Capital China. The firm may have revenues in the hundreds of millions of dollars. Data about the influence of the Chinese government is not available to the DarkCyber team at this time. MiningLamp may have received as much as $290 million from its backers.

image

Companies want publicity to get sales leads, attract investors, create buzz to lure new hires, and become known to procurement professionals in government agencies.

image

We noted talk about MiningLamp at a couple of law enforcement and intelligence conferences. The company provides policeware and intelware to customers in China and elsewhere. You can read about the firm on its Web site at this link. (Be patient. The service seems to provide a high latency experience.) Product pages also seem to be missing in action.

Nevertheless, “Chinese Data Mining Firm MiningLamp, Now a National AI Champion, Began by Helping Police Solve Crimes” does not talk about a dearth of public information. The write up states that “MiningLamp’s business analytics tools are used by more than 200 companies in the Fortune 200.” That’s a lot of big companies embracing investigative software. Judging from the attendees at law enforcement and intelligence conference, these big companies are finding out about a Chinese company somehow.

The news story states that “Like Palantir, this Chinese start up uses AI to help corporate clients convert huge volumes of data into actionable information.” Palantir is a big ticket item. Perhaps price is a factor or Fortune 200 companies want to rely on a business intelligence system operated by a company located outside the span of control of some government authorities.

The company has been named a Chinese champion. The article reveals:

Although not as well known as US equivalent Palantir Technologies, which reportedly contributed to America’s success in hunting down Osama bin Laden, MiningLamp’s data mining software is used to spot crime patterns, track drug dealers and prevent human trafficking.

DarkCyber thinks that any company which has 200 Fortune listed companies as customers is reasonably well known.

We learned:

“Cases are being resolved on our platforms every day” in more than 60 cities and regions in China, said founder and CEO Wu Minghui. “We can run fast analysis on potential drug dealers or major suspects, improving the overall case-solving efficiency several hundred times.”

Read more

Federating Data: Easy, Hard, or Poorly Understood Until One Tries It at Scale?

March 8, 2019

I read two articles this morning.

One article explained that there’s a new way to deal with data federation. Always optimistic, I took a look at “Data-Driven Decision-Making Made Possible using a Modern Data Stack.” The revolution is to load data and then aggregate. The old way is to transform, aggregate, and model. Here’s a diagram from DAS43. A larger version is available at this link.das42 diagram

Hard to read. Yep, New Millennial colors. Is this a breakthrough?

I don’t know.

When I read “2 Reasons a Federated Database Isn’t Such a Slam-Dunk”, it seems that the solution outlined by DAS42 and the InfoWorld expert are not in sync.

There are two reasons. Count ‘em.

One: performance

Two: security.

Yeah, okay.

Some may suggest that there are a handful of other challenges. These range from deciding how to index audio, video, and images to figuring out what to do with different languages in the content to determining what data are “good” for the task at hand and what data are less “useful.” Date, time, and geocodes metadata are needed, but that introduces the not so easy to solve indexing problem.

So where are we with the “federation thing”?

Exactly the same place we were years ago…start ups and experts notwithstanding. But then one has to wrangle a lot of data. That’s cost, gentle reader. Big money.

Stephen E Arnold, March 8, 2019

Data Science Gets Political

November 20, 2018

With the near ubiquitous use of big data science in every industry short of rock hunting, it was inevitable that there would be blowback. Recently, many tech companies began to feel some political heat due to their involvement with immigration agencies. We learned more from a recent Mercury News story, “Bay Area Cities May Boycott Tech Giants Contracting With ICE.”

According to the story:

“The policy comes as the local immigration debate shifts toward several prominent tech companies — including Palo Alto’s Palantir Technologies, Vigilant Solutions in Livermore and Amazon, which have been criticized for contracting with federal immigration agencies. Last week, advocates descended on Salesforce’s annual conference in San Francisco with an 14-foot-tall cage symbolizing ICE detention to protest the company’s contract with Customs and Border Protection.”

If this sounds a little farfetched or even unlikely, pay close attention to similar actions in Europe. There, when people pushed back against the intersection of politics and big data, it began to impact finances. And when pocketbooks begin to suffer, you can guarantee companies take notice. We don’t yet know if the same will happen in America, but we have a hunch this issue won’t vanish quietly.

Patrick Roland, November 20, 2018

Amazon Intelligence Gets a New Data Stream

June 28, 2018

I read “Amazon’s New Blue Crew.” The idea is that Amazon can disintermediate FedEx, UPS (the outfit with the double parking brown trucks), and the US Postal Service.

On the surface, the idea makes sense. Push down delivery to small outfits. Subsidize them indirectly and directly. Reduce costs and eliminate intermediaries not directly linked to Amazon.

FedEx, UPS, and the USPS are not the most nimble outfits around. I used to get FedEx envelopes every day or two. I haven’t seen one of those for months. Shipping vis UPS is a hassle. I fill out forms and have to manage odd slips of paper with arcane codes on them. The US Postal Services works well for letters, but I have noticed some returns for “addresses not found.” One was an address in the city in which I live. I put the letter in the recipient’s mailbox. That worked.

The write up reports:

The new program lets anyone run their own package delivery fleet of up to 40 vehicles with up to 100 employees. Amazon works with the entrepreneurs — referred to as “Delivery Service Partners” — and pays them to deliver packages while providing discounts on vehicles, uniforms, fuel, insurance, and more. They operate their own businesses and hire their own employees, though Amazon requires them to offer health care, paid time off, and competitive wages. Amazon said entrepreneurs can get started with as low as $10,000 and earn up to $300,000 annually in profit.

Now what’s the connection to Amazon streaming data services and the company’s intelligence efforts? Several hypotheses come to mind:

  • Amazon obtains fine grained detail about purchases and delivery locations. These are data which no longer can be captured in a non Amazon delivery service system
  • The data can be cross correlated; for example, purchasers of a Kindle title with the delivery of a particular product; for example, hydrogen peroxide
  • Amazon’s delivery data make it possible to capture metadata about delivery time, whether a person accepted the package or if the package was left at the door and other location details such as a blocked entrance, for instance.

A few people dropping off packages is not particularly useful. Scale up the service across Amazon operations in the continental states or a broader swatch of territory and the delivery service becomes a useful source of high value information.

FedEx and UPS are ripe for disruption. But so is the streaming intelligence sector. Worth monitoring this ostensible common sense delivery play.

Stephen E Arnold, June 28, 2018

UK Surveillance Backlash

February 9, 2018

Recently, the UK attempted to fight a variety of criminal activity by developing a mass data unit that used analytics and AI to fight crime. If it sounds like science fiction, that’s because it doesn’t really exist. The task force was ruled illegal recently, we discovered in a Guardian story, “UK Mass Digital Surveillance Regime Ruled Illegal.”

According to the story Security minister Ben Wallace responded to the ruling saying:

“Communications data is used in the vast majority of serious and organized crime prosecutions and has been used in every major security service counter-terrorism investigation over the last decade. It is often the only way to identify pedophiles involved in online child abuse as it can be used to find where and when these horrendous crimes have taken place.”

While the British police are crying for more freedom, they are not the only ones being restricted. A better solution, in our mind, comes from Crime Report, who are advocating for a balanced system of big data policing. According to their report, “acceptable boundaries” must be set in order to protect citizen privacy, but also increase the police’s ability to do their job through technology. It’s likely to be a debate that rages on for a while, but we are hoping for an acceptable solution.

Patrick Roland, February 9, 2018

MarkLogic Aims to Take on Oracle in Enterprise Class Data Hub Frameworks

October 10, 2017

MarkLogic is trying to give Oracle a run for its money in the world of enterprise-class data hubs. According to a recent press release on ITWire, “MarkLogic Releases New Enterprise Class Data Hub Framework to Enhance Agility and Speed Digital Transformations.”

How does this Australian legend plan on doing this? According to the release:

Traditionally, integrating data from silos has been very costly and time consuming for large organizations looking to make faster and better decisions based on their data assets. The Data Hub Framework simplifies and speeds the process of building a MarkLogic solution by providing a framework around how to data model, load data, harmonize data, and iterate with new data and compliance requirements.

But is that enough to unseat Oracle, who has long had a seat at the head of the table? Especially, since they have their own new framework hitting the market. That is still up for debate, but MarkLogic is confident in their ability to compete. According to the piece:

Unlike other databases, NoSQL was specifically designed to ingest and integrate all types of disparate data to find relationships among data, and drive searches and analytics—within seconds.

This battle is just beginning and we have no indication of who has the edge, but you can bet it will be an interesting fight in the marketplace between these two titans.

Patrick Roland, October 10, 2017

Yet Another Digital Divide

September 8, 2017

Recommind sums up what happened at a recent technology convention in the article, “Why Discovery & ECM Haven’t, Must Come Together (CIGO Summit 2017 Recap).” Author Hal Marcus first discusses that he was a staunch challenge to anyone who said they could provide a complete information governance solution. He recently spoke at CIGO Summit 2017 about how to make information governance a feasible goal for organizations.

The problem with information governance is that there is no one simple solution and projects tend to be self-contained with only one goal: data collection, data reduction, etc. When he spoke he explained that there are five main reasons for there is not one comprehensive solution. They are that it takes a while to complete the project to define its parameters, data can come from multiple streams, mass-scale indexing is challenging, analytics will only help if there are humans to interpret the data, risk, and cost all put a damper on projects.

Yet we are closer to a solution:

Corporations seem to be dedicating more resources for data reduction and remediation projects, triggered largely by high profile data security breaches.

Multinationals are increasingly scrutinizing their data sharing and retention practices, spurred by the impending May 2018 GDPR deadline.

ECA for data culling is becoming more flexible and mature, supported by the growing availability and scalability of computing resources.

Discovery analytics are being offered at lower, all-you-can-eat rates, facilitating a range of corporate use cases like investigations, due diligence, and contract analysis

Tighter, more seamless and secure integration of ECM and discovery technology is advancing and seeing adoption in corporations, to great effect.

And it always seems farther away.

Whitney Grace, September 8, 2017

Big Data Too Is Prone to Human Bug

August 2, 2017

Conventional wisdom says Big Data being a realm of machines is immune from human behavioral traits like discrimination. Insights from data scientists, however, are different.

According to an article published by PHYS.ORG titled Discrimination, Lack of Diversity, and Societal Risks of Data Mining Highlighted in Big Data, the author says:

Despite the dramatic growth in big data affecting many areas of research, industry, and society, there are risks associated with the design and use of data-driven systems. Among these are issues of discrimination, diversity, and bias.

The crux of the problem is the way data is mined, processed and decisions made. At every step, humans need to be involved in order to tell machines how each of these processes are executed. If the person guiding the system is biased, these biases are bound to seep into the subsequent processes in some way.

Apart from decisions like granting credit, human resources which also is being automated may have diversity issues. The fundamental remains the same in this case too.

Big Data was touted as the next big thing and may turn out to be so, but most companies are yet to figure out how to utilize it. Streamlining the processes and making them efficient would be the next step.

Vishal Ingole, August 2, 2017

« Previous PageNext Page »

  • Archives

  • Recent Posts

  • Meta