Data and Analytics: Do Good, Not Bad

April 1, 2019

Nope, not an April Fool’s spoof. “Using Data and Analytics for Good” is an attempt to make a case for monitoring and intercept technology to make the world a better place. No, the write up does not use China’s social credit score as an example of “doing good.”

I noted this statement from Cindi Howson, a Gartner fellow traveler, in the write up:

Howson said the mission is a personal one for her that started when she was a college student. She was working two jobs to pay her own way, and after she wrote that big tuition check she had only $2 left to buy hotdogs and a box of macaroni to last a week. She knew that financially there wasn’t much separating her from the homeless people she had passed on the streets of New York City every night.

Was this the plight of the students whose parents paid hundreds of thousands of dollars so that their progeny could enter “prestigious schools”?

Will Gartner convert data for good into revenue? Stakeholders may be crossing their fingers.

The “doing good” thing does not get much coverage in The Age of Surveillance Capitalism. That’s no April Fool’s joke.

Stephen E Arnold, April 1, 2019

Who Is Assisting China in Its Technology Push?

March 20, 2019

I read “U.S. Firms Are Helping Build China’s Orwellian State.” The write up is interesting because it identifies companies which allegedly provide technology to the Middle Kingdom. The article also uses an interesting phrase; that is, “tech partnerships.” Please, read the original article for the names of the US companies allegedly cooperating with China.

I want to tell a story.

Several years ago, my team was asked to prepare a report for a major US university. Our task was to try and answer what I thought was a simple question when I accepted the engagement, “Why isn’t this university’s computer science program ranked in the top ten in the US?”

The answer, my team and I learned, had zero to do with faculty, courses, or the intelligence of students. The primary reason was that the university’s graduates were returning to their “home countries.” These included China, Russia, and India, among others. In one advanced course, there was no US born, US educated student.

We documented that for over a seven year period, when the undergraduate, the graduate students, and post doctoral students completed their work, they had little incentive to start up companies in proximity to the university, donate to the school’s fund raising, and provide the rah rah that happy graduates often do. To see the rah rah in action, may I suggest you visit a “get together” of graduates near Stanford or an eatery in Boston or on NCAA elimination week end in Las Vegas.

How could my client fix this problem? We were not able to offer a quick fix or even an easy fix. The university had institutionalized revenue from non US students and was, when we did the research, dependent on non US students. These students were very, very capable and they came to the US to learn, form friendships, and sharpen their business and technical “soft” skills. These, I assume, were skills put to use to reach out to firms where a “soft” contact could be easily initiated and brought to fruition.

Follow the threads and the money.

China has been a country eager to learn in and from the US. The identification of some US firms which work with China should not be a surprise.

However, I would suggest that Foreign Policy or another investigative entity consider a slightly different approach to the topic of China’s technical capabilities. Let me offer one example. Consider this question:

What Israeli companies provide technology to China and other countries which may have some antipathy to the US?

This line of inquiry might lead to some interesting items of information; for example, a major US company which meets on a regular basis with a counterpart with what I would characterize as “close links” to the Chinese government. One colloquial way to describe the situation is as a conduit. Digging in this field of inquiry, one can learn how the Israeli company “flows” US intelligence-related technology from the US and elsewhere through an intermediary so that certain surveillance systems in China can benefit directly from what looks like technology developed in Israel.

Net net: If one wants to understand how US technology moves from the US, the subject must be examined in terms of academic programs, admissions, policies, and connections as well as from the point of view of US company investments in technologies which received funding from Chinese sources routed through entities based in Israel. Looking at a couple of firms does not do the topic justice and indeed suggests a small scale operation.

Uighur monitoring is one thread to follow. But just one.

Stephen E Arnold, March 20, 2019

US Government Slow In Adopting Big Data?

March 13, 2019

We are not sure if this is good news or bad news. But the United States may be slow in adopting new technology and policies. The IRS is one government branch that is leveraging big data with actual results. Mondaq shares the IRS’s data analysis in the article, “United States: States Follow The IRS In Joining The Big Data Revolution.”

The IRS has used data analysis since the 1960s to select tax returns to audit. As the technology advanced over the years, it has caught more errors and corrected them without any human involvement. The IRS created a new data analysis project dubbed the Nationally Coordinated Investigation Unit (NCIU). NCIU will focus on using external and IRS data to select criminal investigations. The IRS also signed a $99 million deal with Palantir. With Palantir’s technology, the IRS will analyze and search terabytes of data from internal and external sources on a single platform. The IRS is not only data mining for criminal activities. Big data is also being used for civil audits and to predict outcomes on cases referred to the IRS Office of Appeals.

State governments have followed the IRS and implemented their own tax data analysis projects. Many of them have already caught fraudulent returns and so far state governments have saved sizable chunks of cash. These data analysis implementations are great, but there are still limitations. We learned:

“Like the IRS, many state departments of revenue have faced significant budgetary pressure in recent years, as governments have tried to cut down the size and cost of government, and have turned to technology to fill the gap. As powerful as data analytics are, however, there is a limit to the extent they can replace human investigators. In 2016, for example, the Arizona Department of Revenue began to lay off dozens of auditors and tax collectors, citing budget cuts. The result was a catastrophe, as audit collections dropped nearly 47 percent—$82 million—in 2017. The IRS itself has taken a markedly different approach: IRS CI has recently announced a hiring blitz, in the course of which it will hire 250 special agents, a number of data scientists, and over 100 professional staff.”

Big data analysis will become a significant tool in the future for the IRS and local tax offices. Good or bad? Excellent question.

Whitney Grace, March 13, 2019

Good News about Big Data and AI: Not Likely

February 25, 2019

I read a write up which was a bit of a downer. The story appeared in Analytics India and was titled “10 Challenges That Data Science Industry Still Faces.” Oh, oh. Maybe not good news?

My first thought was, “Only 10?”

The write up explains that the number one challenge is humans. The idea was that smart software would solve these types of problems: sluggish workers at fast food restaurants, fascinating decisions made by entry level workers in some government bureaus, and the often remarkable statements offered by talking heads on US cable TV “real news” programs, among others.

Nope. The number one challenge is finding humans who can do data science work.

What’s number two after this somewhat thorny problem? The answer is finding the “right data” and then getting a chunk of data one can actually process.

So one and two are what I would call bedrock issues: Expertise and information.

What about the other eight challenges? Here are three of them. I urge you to read the original article for the other five issues.

  • Informing people why data science and its related operations are good for you. Is this similar to convincing a three year old that lima beans are just super?
  • Storytelling. I think this means, “These data mean…” One hopes the humans (who are in short supply) draw the correct inferences. One hopes.
  • Models. This is a shorthand way of saying, “What’s assembled will work.” Hopefully the answer is, “Sure, our models are great.”

Analytics India has taken a risk with their write up. None of the data science acolytes want to hear “bad news.”

Let’s federate and analyze that with great data we can select to generate a useful output. Maybe 80 percent “accuracy” on a good day?

Stephen E Arnold, February 25, 2019

Big Data: Cost Control May Be a Challenge

December 24, 2018

I read “AI’s Dark Secret? A Desire for Data.” The write up states:

The AI revolution is hungry for personal data.

Those data come with a catch.

To ensure that AI algorithms work properly and to get the bugs out, they need to be fed a consistent stream of data. The data needs to be reliable, accurate, and objective, and that costs a lot of money. Venture Beat shares how data has a downside in the article, “Could Data Costs Kill Your AI Startup?”

AI startups that discover their funds are chipped away by data costs should consider moving that cost from the research and development line to the cost of goods sold column. The article explains it is a golden opportunity to scale up your company and drive costs down so that margins will increase.

Startups use data in three basic ways: acquiring, storing, and annotating the data to train the algorithm model. All these steps cost money and can tack on more expenses based on what resources and services you offer. There are different ways to scale down costs at each of the steps, but how and what depends on your individual project. The best way is to figure out how to optimize not only your costs, but also all of your tools:

“The first successful AI businesses came to market offering AI-free workflow tools to capture data that eventually trained AI models and enhanced the tools’ value. These startups were able to achieve software margins early on, since the data and AI were secondary to the startup’s value proposition. As we move to more specialized applications of AI, however, the next wave of AI startups will face higher startup costs and will require more human labor to provide initial value to their customers, making them resemble lower-margin services businesses.”

The only fact you can be sure of with your AI startup is that costs will continue to rise. In order to maintain your relevancy and sell your product, figure out how you can make the most of everything available to you.

Whitney Grace, December 24, 2018

Making Sense of Big Data: What Is Needed Now

October 29, 2018

Pictures, images, and visualizations will chop Big Data down to size. SaveDelete explained this idea in depth in its recent story: “The Next Big Phase of Big Data: Simplification.”

According to the article:

“Data visualization is a growing trend, and that momentum isn’t likely to decline anytime soon. Visuals make everything simpler; complex relationships between data points can be seen at a glance, reporting is reduced to a handful of pages, and the esoteric mathematics and statistics behind variable relationships disappear when you’re communicating with someone inexperienced.”

Other ways to deal with making sense of Big Data include:

  • “Approachable” software. I think this means easy to use, maybe?
  • Gathering the right data. Yep, if one wants to understand terrorist attacks one does not need too much data about hamburger sales in downtown Louisville.
  • Reducing insights. This is a tough one. I think the idea is similar to Admiral Craig Hosmer’s statement to me in 1973: “If you can’t get it on a 4×6 note card, I don’t want to see it.”
  • Make everything simple. Homer Simpson would be proud.

Useful for math and statistics majors.

Stephen E Arnold, October 29, 2018

Free Data Sources

October 19, 2018

We were plowing through our research folder for Beyond Search. We overlooked the article “685 Outstanding Free Data Sources For 2017.” If you need a range of data sources related to such topics as government data, machine learning, and algorithms, you might want to bookmark this listing.

Stephen E Arnold, October 19, 2018

FOIA Suit Seeks Details of Palantir’s Work with ICE

March 21, 2018

Well, this should be interesting. The Electronic Privacy Information Center (Epic.org) has announced, “EPIC FOIA- EPIC Sues for Details of Palantir’s Government Systems.” The brief write-up reports the watchdog’s complaint requesting information on the relationship between data-analysis firm Palantir and the Immigration and Customs Enforcement agency (ICE). The announcement specifies:

The federal agency contracted with the Peter Thiel company to establish vast databases of personal information, and to make secret determinations about the opportunities for employment, travel, and also who is subject to criminal investigations. EPIC is seeking the government contracts with Palantir, as well as assessments and other related documents. The ICE Investigative Case Management System and the FALCON system pull together vast troves of personal data from across the federal government. EPIC wrote in the complaint, ‘Palantir’s “big data” systems raise far-reaching privacy and civil liberties risks.’

Palantir’s role in creating “risk assessment” scores for travelers (US citizens and non-citizens alike) was revealed through an earlier FOIA lawsuit from EPIC. It would be interesting to see what information the organization is able to shake loose.

Cynthia Murrell, March 21, 2018

Big Data and Smart Software: A Volatile Mixture?

March 19, 2018

For several years big data and artificial intelligence have been running on parallel tracks. Once in a while they cross over, but mostly they have stayed independent of one another. But that is poised to change, as we saw from a recent press release from Lucidworks, “Lucidworks Launches Fusion 4 With Operationalized AI and Portable Applications.”

According to the piece, their AI and big data are coming together because:

“Our customers are global organizations who demand a reactive and flexible platform that lets them adapt to hybrid run-time environments including on premise, private cloud, and public cloud infrastructures. With Fusion 4, we’ve brought that portability to application development so customers can create and run apps that best fit their security and operational constraints.”

This is picking up steam in a major way. Multiple companies are offering mashups of big data and AI and the results run the gamut. Forbes recently ran a list of 30 free sources for such tools. Not only are they becoming more available, there is a lot of evidence that this combo is disrupting the normal patterns of business and life. Expect more from this pairing because our world is waiting for a new explosion of AI and data.

Beyond Search wants to point out that the use of data from services like Facebook by third parties can have unexpected consequences. Those facilitating volatile compounds may find themselves walking a knife edge. Will that “work”?

Patrick Roland, March 19, 2018

Bigquery Equals Big Data Transfers for Google

March 16, 2018

Google provides hundreds of services for its users; these include YouTube, AdWords, DoubleClick Campaign Manager, and more. Google, however, is mainly used as a search engine and all of the content on its other services is fed into the search algorithm so it can be queried. In order for all of the content to be searchable, it needs to be dumped and mined. That requires a lot of push power, so what does Google use? Smart Data Collective offers an answer in “Big Query Service: Next Big Thing Unveiled By Google On Big Data.”

Google and big data have not been in the news together for a while, but the BigQuery Data Transfer Service shows how it is moving away from SaaS. How exactly does this work?

According to a Google’s blog post, the new service automates the migration of data from these apps in BigQuery in a scheduled and managed manner. So good so far, the service will support data transfers from AdWords, DoubleClick Campaign Manager, DoubleClick for Publishers, and YouTube Content and Channel Owner Reports and so forth. As soon as the data gets to BigQuery, users can begin querying on the immediate basis. With the help of Google Cloud Dataprep, users cannot only clean and prep the data for that analysis but also further think of analyzing other data alongside that information kept in BigQuery.

The data moves from the apps within 24 hours and BigQuery customers can schedule their own data deliveries so they occur regularly. Customers who already use BigQuery are Trivago and Zenith.

The article turns into a press release for other services Google provides related to machine learning and explains how it is the leading company in the industry. It is simply an advertisement for cloud migration and yet another Google service.

Whitney Grace, March 16, 2018
