February 8, 2017
Data visualization may be hitting at just the right time. Data Floq shared an article highlighting the latest thinking, “Data Visualisation Can Change How We Think About The World.” As the article mentions, we are primed for it biologically: the human eye and brain comfortably process 10 to 12 separate images per second. On the output side, visualization provides the ability to rapidly incorporate new data sets, strip away metadata, and improve performance. Data visualization is not without challenges, though. The article explains,
Perhaps the biggest challenge for data visualisation is understanding how to abstract and represent abstraction without compromising one of the two in the process. This challenge is deep rooted in the inherent simplicity of descriptive visual tools, which significantly clashes with the inherent complexity that defines predictive analytics. For the moment, this is a major issue in communicating data; The Chartered Management Institute found that 86% of 2,000 financiers surveyed late 2013, were still struggling to turn volumes of data into valuable insights. There is a need, for people to understand what led to the visualisation, each stage of the process that led to its design. But, as we increasingly adopt more and more data this is becoming increasingly difficult.
Is data visualization changing how we think about the world, or is the existence of big data the culprit? We would argue data visualization is simply a tool to present data; it is a product rather than an impetus for a paradigm shift. This piece is right, however, in bringing attention to the conflict between detail and accessibility of information. We can’t help but think the meaning likely lies in balancing both.
Megan Feil, February 8, 2017
January 30, 2017
Apparently, money laundering has become a very complicated endeavor, with tools like Bitcoin “washers” available via the Dark Web. Other methods include trading money for gaming or other virtual currencies and “carding.” ZDNet discusses law enforcement’s efforts to keep up in, “How Machine Learning Can Stop Terrorists from Money Laundering.”
It will not surprise our readers to learn authorities are turning to machine learning to cope with new money laundering methods. Reporter Charlie Osborne cites the CEO of cybersecurity firm ThetaRay, Mark Gazit, when she writes:
By taking advantage of Big Data, machine learning systems can process and analyze vast streams of information in a fraction of the time it would take human operators. When you have millions of financial transactions taking place every day, ML provides a means for automated pattern detection and potentially a higher chance of discovering suspicious activity and blocking it quickly. Gazit believes that through 2017 and beyond, we will begin to rely more on information and analytics technologies which utilize machine learning to monitor transactions and report crime in real time, which is increasingly important if criminals are going to earn less from fraud, and terrorism groups may also feel the pinch as ML cracks down on money laundering.
Of course, criminals will not stop improving their money-laundering game, and authorities will continue to develop tools to thwart them. Just one facet of the cybersecurity arms race.
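As a rough illustration of the automated pattern detection described above, consider flagging transactions whose amounts deviate sharply from the norm. This is a minimal sketch of the general idea, not ThetaRay’s actual method; real anti-money-laundering systems learn from many features, not just amount:

```python
import statistics

def flag_suspicious(amounts, threshold=2.5):
    """Return indices of transactions whose amount is a statistical outlier.

    A crude z-score test stands in for machine-learned pattern detection;
    production systems would score many features (counterparties, timing,
    geography), and the threshold here is an illustrative choice.
    """
    mean = statistics.mean(amounts)
    stdev = statistics.stdev(amounts)
    if stdev == 0:
        return []
    return [i for i, a in enumerate(amounts)
            if abs(a - mean) / stdev > threshold]

# A stream of routine payments with one anomalous transfer:
transactions = [100, 120, 95, 110, 105, 98, 102, 115, 9000, 101]
print(flag_suspicious(transactions))  # the 9000 stands out
```

The appeal of even this toy version is speed: it scores an arbitrary stream of transactions without a human reading any of them.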
Cynthia Murrell, January 30, 2017
January 18, 2017
Big Data and Cloud Computing were supposed to make it easier for the C-suite to make billion-dollar decisions. But it seems things have started to fall apart.
In an article published by Forbes titled “The Data Warehouse Has Failed, Will Cloud Computing Die Next?”, the author says:
A company that sells software tools designed to put intelligence controls into data warehousing environments says that traditional data warehousing approaches are flaky. Is this just a platform to spin WhereScape wares, or does Whitehead have a point?
WhereScape, a key player in data warehousing, is admitting that the buzzwords of the IT industry are fizzling out. Big Data is being generated in abundance, but companies are still unsure what to do with the enormous amounts of data they produce.
Large corporations that have already invested heavily in Big Data have yet to see any ROI. As the author points out:
Data led organizations have no idea how good their data is. CEOs have no idea where the data they get actually comes from, who is responsible for it etc. yet they make multi million pound decisions based on it. Big data is making the situation worse not better.
It looks as though, after 3D printing, Big Data and Cloud Computing will be the tech world’s next fizzled-out buzzwords.
Vishal Ingole, January 18, 2017
January 17, 2017
Have you ever visited an awesome Web site and been curious about how the organization manages its Web presence? While we know the answer is some type of software, we usually are not given a specific name. Venture Beat reports that it is possible to figure out the software in the article, “SimilarTech’s Profiler Tells You All Of The Technologies That Web Companies Are Using.”
SimilarTech is a tool designed to crawl the Internet and analyze which technologies, including software, Web site operators use. SimilarTech is also used to detect which online payment tools are the most popular. It comes as no surprise that PayPal is the most widely used, with PayPal Subscribe and Alipay in second and third place.
Tracking what technology and software companies utilize for the Web is a boon for salespeople, recruiters, and business development professionals who want a competitive edge as well as:
Overall, SimilarTech provides big data insights about technology adoption and usage analytics for the entire internet, providing access to data that simply wasn’t available before. The insights are used by marketing and sales professionals for website profiling, lead generation, competitive analysis, and business intelligence.
SimilarTech can also locate contact information for the personnel responsible for Web operations; in other words, potential new clients.
This tool is somewhat like the mailing houses of the past. Mailing houses held data about people, places, organizations, and the like, and could generate contact lists of specific clientele for companies. SimilarTech offers the contact information, but goes one better by identifying the technologies people use to run their Web sites.
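The underlying approach can be sketched simply: scan a page’s markup for known fingerprints. The signatures below are illustrative assumptions on our part; SimilarTech’s actual detection rules and coverage are proprietary:

```python
import re

# Hypothetical fingerprints; a real profiler maintains thousands of
# signatures and also inspects HTTP headers, cookies, and script URLs.
FINGERPRINTS = {
    "WordPress": r'name="generator" content="WordPress',
    "Google Analytics": r"google-analytics\.com",
    "PayPal": r"paypal(objects)?\.com",
}

def detect_technologies(html):
    """Return the sorted names of technologies whose signature appears."""
    return sorted(name for name, pattern in FINGERPRINTS.items()
                  if re.search(pattern, html))

page = ('<meta name="generator" content="WordPress 4.7">'
        '<script src="https://www.google-analytics.com/analytics.js"></script>')
print(detect_technologies(page))  # ['Google Analytics', 'WordPress']
```

Run at Web scale, even this simple matching yields the adoption statistics the article describes.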
Whitney Grace, January 17, 2017
January 9, 2017
Imagine that: Big Data may not have a direct impact on business strategy.
I read “Why Big Data and Algorithms Won’t Improve Business Strategy.” I learned that Big Data learns by playing algorithmic chess. The “moves” can be converted to patterns. The problem is that no one knows what the game is.
The write up points out:
White’s control panel is just a shadow of the landscape and the sequence of presses lacks any positional information or consistent understanding of movement on the board. When faced with a player who does understand the environment then no amount of large scale data analysis on combinations of sequences of presses through the control panel or application of artificial intelligence or algorithms that is going to help you.
The idea is that a disconnect occurs.
Data does not equal strategy for the game of “real” chess.
The write up includes an analysis of a famous battle. An accurate map may be more useful than a situationally ignorant MBA analysis. Okay, I understand.
The write up points out:
In the game of Chess above, yes you can use large scale data analytics, AI and algorithms to discover new patterns in the sequences of presses and certainly this will help you against equally blind competitors. Such techniques will also help you in business improve your supply chain or understand user behavior or marketing or loyalty programs or operational performance or any number of areas in which we have some understanding of the environment.
The author adds:
But this won’t help you in strategy against the player with better situational awareness. Most business strategy itself operates in a near vacuum of situational awareness. For the vast majority then I’ve yet to see any real evidence to suggest that big data is going to improve this. There are a few and rare exceptions but in general, the key is first to understand the landscape and that a landscape exists.
The write up leaves me with an opportunity to hire the author. What’s clear is that content marketing and business strategy do connect. That’s reassuring. No analysis needed. No map either.
Stephen E Arnold, January 9, 2017
December 25, 2016
I read “Don’t Blame Big Data for Pollsters’ Failings.” The news about the polls predicting a victory for Hillary Clinton reached me in Harrod’s Creek five days after the election. Hey, Beyond Search is in rural Kentucky. It looks from the news reports and the New York Times’s odd letter about doing “real” journalism that the pundits predicted that the mare would win the US derby.
The write up explains that Big Data did not fail. The reason? The pollsters were not using Big Data. The sample sizes were about 1,000 people. Check your statistics book. In the back you will find sample sizes for given populations. If you have an older statistics book, you have to work the formula yourself.
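The formula in question is presumably the standard sample-size calculation for estimating a proportion (an assumption on our part, since the original shows only the reference): n = z²·p(1−p)/e². A quick check shows why polls settle on roughly 1,000 respondents:

```python
import math

def sample_size(z=1.96, p=0.5, e=0.03):
    """Minimum sample size for estimating a proportion.

    z: z-score for the confidence level (1.96 for 95 percent)
    p: expected proportion (0.5 is the conservative worst case)
    e: desired margin of error
    """
    return math.ceil(z * z * p * (1 - p) / (e * e))

print(sample_size())  # about 1,068 respondents for 95% confidence, +/-3%
```

In other words, a well-drawn sample of about a thousand is statistically sound; the catch, as the write up argues, is whether the sample is actually representative.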
Big Data doesn’t fool around with formulas. Big Data just uses “big data.” Is the idea that the bigger the data, the better the output?
The write up states that the problem was the sample itself: the actual humans.
The write up quotes a mid-tier consultant from an outfit called Ovum, which reminds me of eggs. I circled this statement:
“When you have data sets that are large enough, you can find signals for just about anything,” says Tony Baer, a big data analyst at Ovum. “So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets.”
The write up tosses in social media. Facebook takes the position that its information had minimal effect on the election. Nifty assertion that.
The solution, as I understand the write up, is to use a more real-time system, different types of data, and math. The conclusion is:
With significant economic consequences attached to political outcomes, it is clear that those companies with sufficient depth of real-time behavioral data will likely increase in value.
My view is that hope and other distinctly human behaviors certainly threw an egg at reality. It is great to know that there is a fix and that Big Data will emerge as the path forward. More work ahead for the consultants who often determine sample sizes by looking at Web sites like SurveySystem and get their samples from lists of contributors, a 20-something’s mobile phone contact list, or lists available from friends.
If you use Big Data, tap into real-time streams of information, and do the social media mining, you will be able to predict the future. Sound logical? Now, about that next Kentucky Derby winner? Happy or unhappy holiday?
Stephen E Arnold, December 25, 2016
December 16, 2016
Big Data touches every part of our lives, and we are often unaware of it. Have you ever noticed, when you listen to the news, read an article, or watch a YouTube video, that people say things such as “experts claim,” “science says,” etc.? In the past, these statements relied on less-than-trustworthy sources, but now speakers can use Big Data to back up their claims. However, popular opinion and puff pieces still need to back up their big data with hard fact. Nature.com says that transparency is a big deal for Big Data, and that algorithm designers need to work on it, in the article “More Accountability For Big-Data Algorithms.”
One hope is that big data will be used to bridge the divide between one bias and another, except that the opposite can happen. In other words, Big Data algorithms can be designed with a bias:
There are many sources of bias in algorithms. One is the hard-coding of rules and use of data sets that already reflect common societal spin. Put bias in and get bias out. Spurious or dubious correlations are another pitfall. A widely cited example is the way in which hiring algorithms can give a person with a longer commute time a negative score, because data suggest that long commutes correlate with high staff turnover.
Even worse, people and organizations can design an algorithm to support “science” or “facts” they want to pass off as the truth. There is a growing demand, mostly in academia, for “algorithm accountability.” The demand is that the data sets fed into algorithms be made public. There are also plans to build algorithms that monitor other algorithms for bias.
Big Data is here to stay, but relying too much on algorithms can distort the facts. This is why the human element is still needed to distinguish between fact and fiction. Minority Report is closer to being our present than ever before.
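The commute-time example above can be made concrete with a toy scorer. All feature names and weights here are invented for illustration; no real hiring system is depicted. Put bias in, get bias out:

```python
def hiring_score(candidate):
    """Toy hiring score whose rules encode bias from historical data.

    The commute penalty mirrors the article's example: past turnover
    correlated with long commutes, so the rule quietly downgrades anyone
    who lives far away, a proxy that can disadvantage whole neighborhoods.
    """
    score = 50
    score += 5 * candidate["years_experience"]
    score -= candidate["commute_minutes"] // 10  # the baked-in bias
    return score

near = {"years_experience": 5, "commute_minutes": 10}
far = {"years_experience": 5, "commute_minutes": 90}
print(hiring_score(near), hiring_score(far))  # identical skills, unequal scores
```

Publishing the training data and rules, as the accountability advocates propose, is exactly what would expose a penalty like this one.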
Whitney Grace, December 16, 2016
December 5, 2016
Many know that law enforcement often turns to social media for clues, but you may not be aware how far such efforts have gotten. LittleSis, a group that maps and publishes relationships between the world’s most powerful entities, shares what it has learned about the field of social-media spying in, “You Are Being Followed: The Business of Social Media Surveillance.”
LittleSis worked with MuckRock, a platform that shares a trove of original government documents online. The team identified eight companies now vending social-media-surveillance software to law enforcement agencies across the nation; see the article for the list, complete with links to more information on each company. Writer Aaron Cantú describes the project:
We not only dug into the corporate profiles of some of the companies police contract to snoop on your Tweets and Facebook rants, we also filed freedom of information requests to twenty police departments across the country to find out how, when, and why they monitor social media. …
One particularly well-connected firm that we believe is worth highlighting here is ZeroFOX, which actively monitored prominent Black Lives Matter protesters in Baltimore and labeled some of them, including former Baltimore mayoral candidate DeRay McKesson, ‘threat actors.’ The company reached out to Baltimore officials first, offering it services pro-bono, which ZeroFOX executives painted as a selfless gesture of civic responsibility. But city officials may have been especially receptive to ZeroFOX’s pitch because of the powerful names standing behind it.
Behind ZeroFOX are weighty names indeed, like Mike McConnell, former director of the NSA, and Robert Rodriguez, who is tied to Homeland Security, the Secret Service, and a prominent security firm. Another company worth highlighting is Geofeedia, because its name appears in all the police-department records the project has received so far. The article details how each of these departments has worked with that company, from purchase orders to contract specifications. According to its CEO, Geofeedia grew sevenfold in just the last two years.
Before closing with a call for readers to join the investigation through MuckRock, Cantú makes this key observation:
Because social media incites within us a compulsion to share our thoughts, even potentially illegal ones, law enforcement sees it as a tool to preempt behavior that appears threatening to the status quo. We caught a glimpse of where this road could take us in Michigan, where the local news recently reported that a man calling for civil unrest on Facebook because of the Flint water crisis was nearly the target of a criminal investigation. At its worst, social media monitoring could create classes of ‘pre-criminals’ apprehended before they commit crimes if police and prosecutors are able to argue that social media postings forecast intent. This is the predictive business model to which Geofeedia CEO Phil Harris aspires. [The link goes to a 23-minute interview with Harris at YouTube.]
“Postings forecast intent,” because no one ever says anything online they don’t really mean, right? There is a reason the pre-crime-arrest concept is fodder for tales of dystopian futures. Where do details like civilian oversight and the protection of civil rights come in?
Cynthia Murrell, December 5, 2016
December 5, 2016
An analytics company that collects crime-related data from local law enforcement agencies plans to help reduce crime rates using Big Data.
CrimeReports.com, in its FAQ, says:
The data on CrimeReports is sent on an hourly, daily, or weekly basis from more than 1000 participating agencies to the CrimeReports map. Each agency controls their data flow to CrimeReports, including how often they send data, which incidents are included.
Very little is known about the service provider. A WhoIs lookup indicates that though the domain was registered back in 1999, it was updated recently, on November 25, 2016, and is valid until November 2, 2017.
CrimeReports is linked to local law enforcement agencies that selectively share their crime data with the analytics firm. After some number crunching, the service provider sends the data to its subscribers via email. According to the firm:
Although no formal, third-party study has been commissioned, there is anecdotal evidence to suggest that public-facing crime mapping—by keeping citizens informed about crime in their area—helps them be more vigilant and implement crime prevention efforts in their homes, workplaces, and communities. In addition, there is anecdotal evidence to suggest that public-facing crime mapping fosters more trust in local law enforcement by members of the community.
To maintain data integrity, the data is collected only through official channels. The crime details are not comprehensive; rather, they are redacted to protect the privacy of victims and criminals. As of now, CrimeReports is paid by law enforcement agencies. Certainly, this is something new and probably never tried before.
Vishal Ingole, December 5, 2016
December 1, 2016
“Let’s index everything” or “Let’s process all the digital data.” Ever hear these statements or something similar? I have. In fact, I hear this type of misinformed blather almost every day. The author of “Big Data Coming in Faster Than Biomedical Researchers Can Process It” seems to have figured out that yapping about capture and crunch spits out partial truths. (What’s new in the trendy world of fake news?)
The write up points out in a somewhat surprised way:
“It’s not just that any one data repository is growing exponentially, the number of data repositories is growing exponentially,” said Dr. Atul Butte, who leads the Institute for Computational Health Sciences at the University of California, San Francisco.
Now the kicker:
Prospecting for hints about health and disease isn’t going to be easy. The raw data aren’t very robust and reliable. Electronic medical records are often kept in databases that aren’t compatible with one another, at least without a struggle. Some of the potentially revealing details are also kept as free-form notes, which can be hard to extract and interpret. Errors commonly creep into these records. And data culled from scientific studies aren’t entirely trustworthy, either.
Net net: Lots of data. Inadequate resources. Inability to filter for relevance. Failure to hook “data” to actual humans. The yap about curing cancer or whatever disease generates a news release indicates an opportunity. But there’s no easy solution.
The resources to “make sense” of large quantities of historical and real-time data are not available. But marketing is easy. Dealing with real-world data is a bit more difficult. Keep that in mind if you develop a nifty disease and expect Big Data and analytics to keep the cookies from burning. Sure, the “data” about making a blue-ribbon batch of chocolate chip cookies is available. Putting the right information into context at the appropriate time is a bit more difficult, even for the cognitive, smart software, text analytics cheerleaders.
Wait. I have a better idea. Why not just let a search system find and discover exactly what you need? Let me know how that works out for you.
Stephen E Arnold, December 1, 2016