We Are Without a Paddle on Growing Data Lakes

January 18, 2018

The pooling of big data is commonly known as a “data lake.” While this technique was first met with excitement, it is beginning to look like a problem, as we learned in a recent InfoWorld story, “Use the Cloud to Create Open, Connected Data Lakes for AI, Not Data Swamps.”

According to the story:

A data scientist will quickly tell you that the data lake approach is a recipe for a data swamp, and there are a few reasons why. First, a good amount of data is often hastily stored, without a consistent strategy in place around how to organize, govern and maintain it. Think of your junk drawer at home: Various items get thrown in at random over time, until it’s often impossible to find something you’re looking for in the drawer, as it’s gotten buried.

This disorganization leads to the second problem: users are often not able to find the dataset once ingested into the data lake.

So, how does one turn aggregate data from a stagnant swamp into a lake one can traverse? According to Scientific Computing, the secret lies in separating the search function into two pieces, finding and searching. When you combine this thinking with InfoWorld’s logic of using the cloud, suddenly these massive swamps are drained.

Patrick Roland, January 18, 2018



Google Tries Like Crazy to End Extreme Content Controversy

January 16, 2018

Google is having a tough time lately. When it purchased YouTube, few thought extremist videos and wonky children’s programming would be its most concerning headaches. But its solutions remain strained, as we discovered in a recent Verge story, “YouTube Has Removed Thousands of Videos from Extremist Cleric Anwar Al-Awlaki.”

Google removed hundreds of al-Awlaki’s videos in 2010 which directly advocated violence, following the conviction of Roshonara Choudhry, a radicalized follower who stabbed British MP Stephen Timms earlier that year. At the time, a YouTube spokesperson cited the site’s guidelines against inciting violence. But al-Awlaki posted tens of thousands of other videos, and in subsequent years, was cited as an influence in other notable terrorist attacks at Fort Hood, the Boston Marathon, San Bernardino, and Orlando, Florida.

This comes on the heels of another Verge story with a similar issue, “YouTube Says it Will Crack Down on Bizarre Videos Targeting Children.”

“We’re in the process of implementing a new policy that age restricts this content in the YouTube main app when flagged,” said Juniper Downs, YouTube’s director of policy. “Age-restricted content is automatically not allowed in YouTube Kids.” YouTube says that it’s been formulating this new policy for a while, and that it’s not rolling it out in direct response to the recent coverage.

Google is trying to do better, but it seems like it is fighting off an avalanche with a snow shovel. Luckily, as the Washington Post points out, the United States leads the world in terms of big data. One can hope that a solution lies in there somewhere, but good luck predicting what it will be.

Patrick Roland, January 17, 2018

One of Big Data’s Giants Accused of Big Time Fraud

January 15, 2018

Palantir, one of the biggest names in big data, has been praised for its innovative solutions since it began in 2004. However, it has been getting attention for all the wrong reasons lately, as we saw in a recent Deal Street Asia story, “Palantir Holder Says Company Sabotaged Stock Sale to Chinese.”

One of Palantir Technologies Inc.’s early investors accused the data-mining startup of sabotaging his attempt to sell his $60 million stakes to a Chinese company so directors and executives could enrich themselves by selling their stock instead.

Marc Abramowitz, a 63-year-old lawyer and investor, contends that when Palantir executives got wind of his offer to sell his stock to Chinese private equity firm CDH Investments Fund Management Co., they sunk the deal by offering to sell their shares to CDH instead, according to a lawsuit filed Thursday in Delaware. Palantir’s campaign to spoil Abramowitz’s sale demonstrates the Silicon Valley company’s “willingness to intentionally interfere with shareholder transactions in an effort…”

It may be tough to prove this in court, however. Palantir is famous for its secrecy, though that may become a thing of the past when it goes public. Either way, this is an interesting look at the cutthroat world of big data and the lengths people may go to in order to stay on top.

Patrick Roland, January 15, 2018

Big Data Logic Turning Government on Its Ear

January 3, 2018

Can the same startup spirit that powers so many big data companies disrupt the way the government operates? According to a lot of experts, that’s exactly what is happening. We discovered more in a recent Nextgov article, “This Company is Trying to Turn Federal Agencies into Startups.”

According to the story:

BMNT Partners, a Palo Alto-based company, is walking various government agencies through the process of identifying pressing problems and then creating teams that compete against each other to design the best solution. The best of those products might warrant future investments from the agency.

The process begins when an agency presents BMNT with an array of problems it faces internally; BMNT staff helps them narrow down the problem scope, conduct market research to identify the problems that could pique interest from commercial companies, and then track down experts within the agency who can evaluate the solutions. BMNT also helps agencies create various teams of three or four employees who can start building minimum viable products. Newell explained those employees often are selected from the pool within the chief information officer’s or chief technology officers’ staffs.

This seems like a very plausible avenue. Federal agencies are already embracing machine learning and AI, so why not move a little further in this direction? We are looking forward to seeing how this pans out, but chances are this is something the government cannot ignore.

Patrick Roland, January 3, 2018

AI Has Become Better at Predicting Your Actions Than You Are

January 1, 2018

It’s official: AI has become smarter than us. Well, maybe. It certainly is more sophisticated about human patterns than we ourselves are. We learned just how advanced in a recent Phys.org article, “Can Math Predict What You’ll Do Next?”

According to the piece:

When making predictions, scientists have historically been limited by a lack of complete data, relying instead on small samples to infer characteristics of a wider population.

But in recent years, computational power and methods of collecting data have advanced to the point of creating a new field: big data. Thanks to the huge availability of collected data, scientists can examine empirical relationships between a wide variety of variables to decipher the signal from the noise.

For example, Amazon uses predictive analytics to guess which books we may like based on our prior browsing or purchase history. Similarly, automated online advertisement campaigns tell us which vehicles we may be interested in based on vehicles sought out the day before.

Not convinced? Consider this story about how Carnegie Mellon’s AI recently won a Texas Hold ’em tournament. Poker, of course, is based on subtle human cues, bluffing, and psychology. So, if an AI system is on target there, imagine what it could do if its attention were focused on us.

Patrick Roland, January 1, 2018

Watson and CDC Research Blockchain

December 29, 2017

Oh, Watson!  What will IBM have you do next?  Apparently, you will team up with the Centers for Disease Control and Prevention to research blockchain benefits.  Watson’s newest career is detailed in Fast Company’s article, “IBM Watson Health Team With the CDC To Research Blockchain.”  Teaming up with the CDC is an extension of the work IBM Watson is already doing with the Food and Drug Administration by exploring owner-mediated data exchange with blockchain.

IBM chief science officer Shahram Ebadollahi explained that the research with the CDC and FDA will lead to blockchain adoption at the federal government level.  By using blockchain, the CDC hopes to discover new ways to use data and expedite federal reactions to health threats.

Blockchain is a very new technology developed to handle sensitive data and cryptocurrency transactions.  It is used for applications that require high levels of security.  Ebadollahi said:

 ‘Blockchain is very useful when there are so many actors in the system,’ Ebadollahi said. ‘It enables the ecosystem of data in healthcare to have more fluidity, and AI allows us to extract insights from the data. Everybody talks about Big Data in healthcare but I think the more important thing is Long Data.’

One possible result is that consumers will purchase a personal health care system like a home security system.  Blockchain could potentially offer a new level of security that everyone from patients to physicians is comfortable with.

Blockchain is basically big data, except it is a more specific data type.  The applications are the same and it will revolutionize the world just like big data.

Whitney Grace, December 29, 2017

Turning to AI for Better Data Hygiene

December 28, 2017

Most big data is flawed in some way, because humans are imperfect beings. That is the premise behind ZDNet’s article, “The Great Data Science Hope: Machine Learning Can Cure Your Terrible Data Hygiene.” Editor-in-Chief Larry Dignan explains:

The reality is enterprises haven’t been creating data dictionaries, meta data and clean information for years. Sure, this data hygiene effort may have improved a bit, but let’s get real: Humans aren’t up for the job and never have been. ZDNet’s Andrew Brust put it succinctly: Humans aren’t meticulous enough. And without clean data, a data scientist can’t create algorithms or a model for analytics.


Luckily, technology vendors have a magic elixir to sell you…again. The latest concept is to create an abstraction layer that can manage your data, bring analytics to the masses and use machine learning to make predictions and create business value. And the grand setup for this analytics nirvana is to use machine learning to do all the work that enterprises have neglected.

I know you’ve heard this before. The last magic box was the data lake where you’d throw in all of your information–structured and unstructured–and then use a Hadoop cluster and a few other technologies to make sense of it all. Before big data, the data warehouse was going to give you insights and solve all your problems along with business intelligence and enterprise resource planning. But without data hygiene in the first place enterprises replicated a familiar, but failed strategy: Poop in. Poop out.

What the observation lacks in eloquence it makes up for in insight: the whole data-lake concept was flawed from the start since it did not give adequate attention to data preparation. Dignan cites IBM’s Watson Data Platform as an example of the new machine-learning-based cleanup tools, and points to other noteworthy vendors investigating similar ideas, namely Alation, Io-Tahoe, Cloudera, and Hortonworks. Which cleaning tool will perform best remains to be seen, but Dignan seems sure of one thing: the data that enterprises have been diligently collecting for the last several years is as dirty as a dustbin lid.

Cynthia Murrell, December 28, 2017

Big Data Used to Confirm Bad Science

November 30, 2017

I had thought we had moved beyond harnessing big data and were now focusing on AI and machine learning, but Forbes has some possible new insights in, “Big Data: Insights Or Illusions?”

Big data is a tool that can generate new business insights or it can reinforce a company’s negative aspects.  The article consists of an interview with Christian Madsbjerg of ReD Associates.  It opens with how Madsbjerg and his colleagues studied credit card fraud by living like a fraudster for a while.  They learned some tricks and called their experience contextual analytics.  This leads to an important discussion topic:

Dryburgh: This is really interesting, because it seems to me that big data could be a very two-edged sword. On the one hand you can use it in the way that you’ve described to validate hypotheses that you’ve arrived at by very subjective, qualitative means. I guess the other alternative is that you can use it simply to provide confirmation for what you already think.

Madsbjerg: Which is what’s happening, and with the ethos that we’ve got a truth machine that you can’t challenge because it’s big data. So you’ll cement and intensify the toxic assumptions you have in the company if you don’t use it to challenge and explore, rather than to confirm things you already know.

This topic is not new.  We are seeing unverified news stories reach airwaves and circulate the Internet for the pure sake of generating views and profit.  Corporate entities do the same when they care more about funneling money into their coffers than about their workers or their actual customers.  It is also like Hollywood executives making superhero movies based on comic heroes when they have no idea about the medium’s integrity.

In other words, do not forget context and the human factor!

Whitney Grace, November 30, 2017

The Thing Holding AI Back Is the Thing It Needs Most, Data

November 30, 2017

Here’s an interesting problem: for artificial intelligence and machine learning to thrive, they need a massive amount of information. However, they need so much data that it causes hiccups in the system. Google has a really interesting solution to this problem, as we learned in the Reuters article, “Google’s Hinton Outlines New AI Advance That Requires Less Data.”

The bundling of neurons working together to determine both whether a feature is present and its characteristics also means the system should require less data to make its predictions.


The leader of Google Brain said, “The hope is that maybe we might require less data to learn good classifiers of objects, because they have this ability of generalizing to unseen perspectives or configurations of images.”

Less data for big data? It’s just crazy enough to work. In fact, some of the brightest minds in the business are trying to, as ComputerWorld said, “do less with more.” The piece focuses on Fuzzy LogiX and their attempts to do exactly what Google is hypothetically saying. It will be interesting to see what happens, but we are betting on technology cracking this nut.

Patrick Roland, November 30, 2017


The World’s Wealthiest People Should Fear Big Data

November 24, 2017

One of the strengths that the planet’s elite and wealthy have is secrecy. In most cases, average folks and media don’t know where big money is stored or how it is acquired. However, that recently changed for the Queen of England, several Trump cabinet members, and other powerful men and women. And they should be afraid of what big data and search can do with their info, as we learned in the Guardian’s piece, “Paradise Papers Leak Reveals Secrets of the World’s Elite Hidden Wealth.”

The story found a lot of fishy dealings involving political donors and those in power, including Queen Elizabeth having tax-free money in the Caymans. According to the story:

At the centre of the leak is Appleby, a law firm with outposts in Bermuda, the Cayman Islands, the British Virgin Islands, the Isle of Man, Jersey and Guernsey. In contrast to Mossack Fonseca, the discredited firm at the centre of last year’s Panama Papers investigation, Appleby prides itself on being a leading member of the “magic circle” of top-ranking offshore service providers.


Appleby says it has investigated all the allegations, and found “there is no evidence of any wrongdoing, either on the part of ourselves or our clients”, adding: “We are a law firm which advises clients on legitimate and lawful ways to conduct their business. We do not tolerate illegal behaviour.”

It makes you wonder what would happen if some of the brightest minds in search and big data got ahold of this information. We suspect a lot of the financial knots this money ties to keep itself concealed would untangle. In an age of increasing transparency, we wouldn’t be shocked to see that happen.

Patrick Roland, November 24, 2017
