An Upside to Fake Data

February 2, 2018

We never know if “data” are made up or actual factual. Nevertheless, we read “How Fake Data Can Help the Pentagon Track Rogue Weapons.” The main idea from our point of view is predictive analytics that can adapt to what has not yet happened. We circled this statement from the company holding the US government contract to make “fake” data useful:

IvySys Founder and Chief Executive Officer James DeBardelaben compared the process to repeatedly finding a needle in a haystack, but making both the needle and haystack look different every time. Using real-world data, agencies can only train algorithms to spot threats that already exist, he said, but constantly evolving synthetic datasets can train tools to spot patterns that have yet to occur.

Worth monitoring IvySys at

Stephen E Arnold, February 2, 2018

Averaging Information Is Not Cutting It Anymore

January 16, 2018

Here is something interesting that comes on the heels of headlines like “People From Around The Globe Met For The First Flat Earth Conference” and the belief that white supremacists are gaining more power.  Frontiers Media shares “Rescuing Collective Wisdom When The Average Group Opinion Is Wrong,” an article that pokes fun at the fanaticism running rampant in the news.  Beyond the fanaticism, there is a real concern with averaging in data science and other fields that rely heavily on data.

The article breaks down the different ways averaging is used and the different theorems that are developed from it.  The introduction is a bit wordy but it sets the tone:

The total knowledge contained within a collective supersedes the knowledge of even its most intelligent member. Yet the collective knowledge will remain inaccessible to us unless we are able to find efficient knowledge aggregation methods that produce reliable decisions based on the behavior or opinions of the collective’s members. It is often stated that simple averaging of a pool of opinions is a good and in many cases the optimal way to extract knowledge from a crowd. The method of averaging has been applied to analysis of decision-making in very different fields, such as forecasting, collective animal behavior, individual psychology, and machine learning. Two mathematical theorems, Condorcet’s theorem and Jensen’s inequality, provide a general theoretical justification for the averaging procedure. Yet the necessary conditions which guarantee the applicability of these theorems are often not met in practice. Under such circumstances, averaging can lead to suboptimal and sometimes very poor performance. Practitioners in many different fields have independently developed procedures to counteract the failures of averaging. We review such knowledge aggregation procedures and interpret the methods in the light of a statistical decision theory framework to explain when their application is justified. Our analysis indicates that in the ideal case, there should be a matching between the aggregation procedure and the nature of the knowledge distribution, correlations, and associated error costs.
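
For the curious, the Condorcet point in that passage can be sketched in a few lines of Python. This is our own minimal illustration, not code from the article: the function name `majority_accuracy` and the accuracy values 0.6 and 0.4 are made up for the example, and it assumes every voter is independent and equally accurate, which is exactly the condition the authors say often fails in practice.

```python
from math import comb


def majority_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a simple majority of independent voters is right.

    Assumes each voter is correct with the same probability p_correct
    and that votes are independent (the Condorcet setting).
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of votes that forms a majority
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Hypothetical accuracies: slightly better than a coin flip, slightly worse.
for p in (0.6, 0.4):
    for n in (1, 11, 101):
        print(f"p={p}, n={n}: majority correct with probability {majority_accuracy(n, p):.3f}")
```

When individual voters are better than a coin flip, a bigger crowd makes the majority more reliable. When they are worse, a bigger crowd makes the majority more reliably wrong, which is one way simple aggregation can backfire.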

Understanding how data can be corrupted is half the battle of figuring out how to correct the problem.  This is one of the complications related to artificial intelligence and machine learning.  One example is trying to build sentiment analysis engines.  These require terabytes of data and the Internet provides an endless supply, but the usual result is that the sentiment analysis engines end up racist, misogynist, and all-around trolls.  It might lead to giggles but does not produce very accurate results.

Whitney Grace, January 17, 2018

Apple’s Orchard of AI Talent

January 1, 2018

Here’s an analysis that will be of interest to competitive artificial intelligence professionals. Fast Company reports on its own research in the piece, “Where Apple Recruits Its AI Talent, According to LinkedIn.” Writer Jared Newman begins:

Apple appears to have doubled its headcount in artificial intelligence and related fields since 2014–and more than tripled its number of PhD holders in the sector–as tech companies race to build a generation of smarter products. That’s one conclusion from an analysis of more than 600 Apple employees who specialize in machine learning, computer vision, natural language processing, and other disciplines related to AI. To help us understand where Apple is getting its AI talent, Fast Company created a database from publicly available LinkedIn profiles, searching for employees who either defined their jobs as “scientist” or “researcher” or listed AI-related skills in their resumes. This analysis certainly does have some limitations: It won’t account for employees who have defined their jobs in vague terms on their profiles, self-reported inaccurately or incompletely, or have avoided sharing their employment information on LinkedIn entirely. Apple has reportedly discouraged employees from announcing their AI jobs on LinkedIn in the past, so blind spots in our study are inevitable. Still, this analysis provides a broad snapshot of Apple’s response to a growing AI arms race in the tech industry.

The article goes on to share several graphs representing Apple AI hiring trends, like the proportion of Ph.D. to non-Ph.D. hires by year, or the percentages of employees obtained from acquisitions, universities or government organizations, and other businesses. We can also see from which businesses and universities Apple has hired the most, and which acquisitions brought the company the most AI talent. See the article for all the details.

Cynthia Murrell, January 1, 2018

AI Has Become Better at Predicting Your Actions Than You Are

January 1, 2018

It’s official: AI has become smarter than us. Well, maybe. It certainly is more sophisticated about human patterns than we ourselves are. We learned just how advanced in a recent article, “Can Math Predict What You’ll Do Next?”

According to the piece:

When making predictions, scientists have historically been limited by a lack of complete data, relying instead on small samples to infer characteristics of a wider population.

But in recent years, computational power and methods of collecting data have advanced to the point of creating a new field: big data. Thanks to the huge availability of collected data, scientists can examine empirical relationships between a wide variety of variables to decipher the signal from the noise.

For example, Amazon uses predictive analytics to guess which books we may like based on our prior browsing or purchase history. Similarly, automated online advertisement campaigns tell us which vehicles we may be interested in based on vehicles sought out the day before.

Not convinced? Consider this story about how Carnegie Mellon’s AI recently won a Texas Hold ’em tournament. Poker, of course, is based on subtle human cues, bluffing, and psychology. So, if an AI system is on target there, imagine what it could do if its attention were focused on us.

Patrick Roland, January 1, 2018

Alexa AI Could Drastically Change Your Shopping Experience

December 25, 2017

Amazon’s Alexa, the voice assistant behind the company’s Wi-Fi-enabled Echo speakers, has become less of a novelty and more of a way of life for millions of owners. With that in mind, the company is aiming to utilize this exposure for analytic purposes. But many are not so excited, as we learned from a Wired piece, “Alexa Wants You To Talk to Your Ads.”

According to the story,

These early interactions won’t necessarily provide additional revenue, but for forward-thinking brands they do hold value. No matter how basic the interaction, connecting with a customer through voice provides a trove of data on how consumers are interacting with a product. Collecting information on how Alexa is used will provide a base of knowledge to position brands to build the more sophisticated tech still to come. Once that “killer experience” is discovered and the confusion clears, these early advertising settlers will be set up to succeed.

They are angling this as a great thing for customers, too. But we are a little skeptical. There is a real fear that Amazon is overstepping boundaries in the name of AI and analytics. Recently, it has come to light that Alexa is always listening and possibly transmitting that data to a warehouse. Even more intimidating is a recent report that Alexa can be easily hacked and used as an eavesdropping tool. This might not be the ideal time for Amazon to encourage this level of interaction with Alexa.

Patrick Roland, December 25, 2017

Big Data Used to Confirm Bad Science

November 30, 2017

I had thought we had moved beyond harnessing big data and were now focusing on AI and machine learning, but Forbes has some possible new insights in, “Big Data: Insights Or Illusions?”

Big data is a tool that can generate new business insights or it can reinforce a company’s negative aspects.  The article consists of an interview with Christian Madsbjerg of ReD Associates.  It opens with how Madsbjerg and his colleagues studied credit card fraud by living like a fraudster for a while.  They learned some tricks and called their experience contextual analytics.  This leads to an important discussion topic:

Dryburgh: This is really interesting, because it seems to me that big data could be a very two-edged sword. On the one hand you can use it in the way that you’ve described to validate hypotheses that you’ve arrived at by very subjective, qualitative means. I guess the other alternative is that you can use it simply to provide confirmation for what you already think.

Madsbjerg: Which is what’s happening, and with the ethos that we’ve got a truth machine that you can’t challenge because it’s big data. So you’ll cement and intensify the toxic assumptions you have in the company if you don’t use it to challenge and explore, rather than to confirm things you already know.

This topic is not new.  We are seeing unverified news stories reach the airwaves and circulate on the Internet for the pure sake of generating views and profit.  Corporate entities do the same when they would rather churn more money into their coffers than think of their workers or their actual customers.  It is also like Hollywood executives making superhero movies based on comic heroes when they have no idea about the medium’s integrity.

In other words, do not forget context and the human factor!

Whitney Grace, November 30, 2017

Analytics Tips on a Budget

November 23, 2017

Self-service analytics is another way to say “analytics on a budget.”  Many organizations, especially non-profits, do not have the funds to invest in a big data plan and technology, so they decide to take the task on themselves.  With the right person behind the project, self-service analytics is a great way to save a few bucks.  IT Pro Portal shares some ways to improve an analytics project in “Three Rules For Adopting Self-Service Analytics.”  Another benefit of self-service analytics is that, theoretically, anyone in the organization can make use of the data and find some creative outlet for it.  The tips come with a warning label:

Any adoption of new technology requires a careful planning, consultation, and setup process to be successful: it must be comprehensive without being too time-consuming, and designed to meet the specific goals of your business end-users. Accordingly, there’s no one-size-fits-all approach: each business will need to consider its specific technological, operational and commercial requirements before they begin.

What are the three tips?

  1. Define your business requirements
  2. Collaborate and integrate
  3. Create and implement a data governance policy

All I can say to this is, duh!  These are standard tips that apply not only to self-service analytics but also to BI plans and any IT plan.  Maybe there are a few tips geared directly at the analytics field, but give me fewer listicles and more practical handbooks.  Was this a refined form of clickbait?

Whitney Grace, November 23, 2017

Healthcare Analytics Projected to Explode

November 21, 2017

There are many factors influencing the growing demand for healthcare analytics: pressure to lower healthcare costs, demand for more personalized treatment, the emergence of advanced analytic technology, and the impact of social media.  PR Newswire takes a look at how the market is expected to explode in the article, “Healthcare Analytics Market To Grow At 25.3% CAGR From 2013 To 2024: Million Insights.”  Other important factors that influence healthcare costs are errors in medical products, workflow shortcomings, and, possibly the biggest, the need for cost-effective measures that do not compromise care.

Analytics are supposed to be able to help and/or influence all of these issues:

Based on the component, the global healthcare analytics market is segmented into services, software, and hardware. Services segment held a lucrative share in 2016 and is anticipated to grow steady rate during the forecast period. The service segment was dominated by the outsourcing of data services. Outsourcing of big data services saves time and is cost effective. Moreover, Outsourcing also enables access to skilled staff thereby eliminating the requirement of training of staff.

Cloud-based delivery is anticipated to grow and become the most widespread analytics platform for healthcare.  It allows remote access, avoids complicated infrastructures, and has real-time data tracking.  Adopting analytics platforms helps curb the rising problems, from cost to workforce to treatment, that the healthcare industry faces now and will face in the future.  While these systems are being implemented, the harder part is determining how readily workers can be trained to use them correctly.
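
To put the headline’s 25.3% CAGR in perspective, here is a small Python sketch of what that rate compounds to. It is only our back-of-the-envelope arithmetic, not a figure from the report: the function name `growth_multiple` is ours, and we assume 2013 to 2024 means 11 annual compounding periods.

```python
def growth_multiple(cagr: float, periods: int) -> float:
    """Total growth factor implied by compounding `cagr` over `periods` years."""
    return (1 + cagr) ** periods


cagr = 0.253            # rate quoted in the headline
periods = 2024 - 2013   # assumption: 11 annual compounding periods

multiple = growth_multiple(cagr, periods)
print(f"{cagr:.1%} a year for {periods} years multiplies the starting market by about {multiple:.1f}x")
```

The report’s dollar figures are not quoted here, so the sketch stops at the multiple rather than guessing at a market size.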

Whitney Grace, November 21, 2017

Need Better Charts and Graphs?

October 27, 2017

If you want to move beyond the vanilla charts and graphs in Excel and PowerPoint, you will want to read “The 15 Best Data Visualization Tools.” Don’t forget to make sure the data you present are accurate, timely, and germane to the point your snappy graphic will make. (Keep in mind that some folks are happy with snazzy visuals. Close enough for horseshoes.)

Stephen E Arnold, October 27, 2017

Uber vs DC Subway: Fancy Math but No Fires

October 23, 2017

I know I am supposed to focus on search and online content processing. But when I read “Metrorail vs Uber: Travel Time and Cost,” I decided to highlight this example of local government fancy math. The write up explains when it makes sense to take the DC subway, usually called “the metro” by those who live in Washington, DC, and when to take Uber.

The analysis uses graphs and logic to prove that the DC subway is the better bet for commuting. I noted this passage:

It is unclear how long Uber prices will remain this low. Several news outlets have reported that Uber subsidizes its rides with money from investors, meaning current fares might not reflect the full cost of a ride.

My take is that when prices go up, the DC subway is the better choice when moving around the throbbing heart of government.

But there are the fires, the breakdowns, and the complexity of the transfer bus system to delight the visitor from out of town and the long suffering Red Line riders trying to get from Shady Grove to Pentagon City.

Nifty illustration of what one can do with spare time and a somewhat superficial analysis. Now about those dead elevators or what I call the hassle factor? For added entertainment, watch a person from another country try to buy a ticket to ride the DC subway. Great fun!

Stephen E Arnold, October 23, 2017
