October 8, 2014
I read “Belief, Bias, and Bayes.” The write up appeared in the open source friendly Guardian newspaper. Bayes and his methods are more popular than ever. Instead of meeting the good churchman in a statistics class, there he is near the editorial page, on the Web, and in the blogosphere.
This particular write up is surprisingly gentle toward the bane of many university students. Here’s the explanation of the method in the article:
I find it easier to be concrete. So imagine I have a bag containing three stones; two blue and one red. Without looking, and in random order, you and I pick, and keep, one stone each. What are the chances I have blue and you have red? I could work them out two ways. If you have the only red stone (which you have a one-in-three chance of having got, without knowing anything about my choice) then I must have a blue (one-in-one). The probability is ⅓ × 1 = ⅓, a third. On the other hand, if we know I have a blue stone (probability two-in-three) then there is a 50:50 chance you have a red stone. The probability is ⅔ × ½ = ⅓ again. The answers had to come out the same, since both ways of working it out describe the same result. The “probability of me having blue if you have red, multiplied by the probability of you having red”, has to be the same as “the probability of you having red if I have blue multiplied by the probability of me having blue”. Abstracted, that’s Bayes’ Theorem.
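For readers who want to check the stone arithmetic, here is a minimal Python sketch. It is not from the Guardian article; the variable names and the brute-force enumeration are my own assumptions. It lists every equally likely way two people can each keep one of the three stones and confirms that both routes in the quotation give one third.

```python
from fractions import Fraction
from itertools import permutations

# The bag from the quotation: two blue stones and one red stone.
stones = ["blue", "blue", "red"]

# Every ordered way for "me" (first index) and "you" (second index)
# to end up holding two different stones: 6 equally likely outcomes.
draws = list(permutations(range(len(stones)), 2))


def prob(event):
    """Probability of an event, counted over the equally likely draws."""
    hits = sum(1 for me, you in draws if event(stones[me], stones[you]))
    return Fraction(hits, len(draws))


p_you_red = prob(lambda me, you: you == "red")
p_me_blue = prob(lambda me, you: me == "blue")
p_joint = prob(lambda me, you: me == "blue" and you == "red")

# Conditional probabilities, from the definition P(A|B) = P(A and B) / P(B).
p_me_blue_given_you_red = p_joint / p_you_red
p_you_red_given_me_blue = p_joint / p_me_blue

# Both routes from the quotation arrive at the same joint probability.
route_one = p_me_blue_given_you_red * p_you_red
route_two = p_you_red_given_me_blue * p_me_blue
print(route_one, route_two)
```

Setting the two products equal and dividing by one of the unconditional probabilities gives the familiar statement of Bayes' Theorem.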
There you go. There was a particularly useful quotation in the article; to wit:
One of the things that gets people fired up is that Bayesian statistics can introduce a level of subjectivity into the scientific process that some scientists see as unacceptable.
Spot on. I recall one failed webmaster who publishes “expert opinions” who fulminated against this “flaw” in the method. I made a brief effort to explain the benefits of the method, but he would have none of it. The biases baked into his “expert” brain were more correct than any mathematical reasoning. That’s what makes this person a “real” expert and, of course, a failed webmaster.
The Guardian article comes at being resistant to a procedure this way:
I guess Bayesian statistics provides a mathematical definition of a closed mind. Anyone with a prior of zero about something can never learn from any amount of evidence, because anything multiplied by zero is still zero.
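The closed-mind point can be seen in a few lines of Python. This is a sketch under assumptions of my own: the function names are hypothetical, and the likelihoods (0.99 and 0.01) are made-up illustration values, not figures from the article.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """One Bayesian update: P(claim | evidence) via Bayes' Theorem."""
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1 - prior)
    return numerator / denominator


def update_repeatedly(prior, rounds, p_true=0.99, p_false=0.01):
    """Apply the same strong piece of evidence several times in a row."""
    belief = prior
    for _ in range(rounds):
        belief = posterior(belief, p_true, p_false)
    return belief


# Illustrative priors (assumptions): a closed mind and a merely skeptical one.
closed_mind = update_repeatedly(prior=0.0, rounds=5)
skeptic = update_repeatedly(prior=0.01, rounds=5)

print(closed_mind)  # stays at zero no matter how much evidence arrives
print(skeptic)      # climbs toward 1 as the evidence accumulates
```

A prior of zero zeroes out the numerator on every pass, so no number of rounds moves the belief. Any prior above zero, however small, can be moved by evidence.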
I think this means one is stupid. Perhaps this resistance to a method is behind much of the fulminating about Autonomy’s digital reasoning engine and its integrated data operating layer?
Stephen E Arnold, October 8, 2014
October 7, 2014
I am amused when a company can roll out a product that people do not like. A good example is the Windows 8 version of the popular operating system. I think of Vista and Windows ME. I wonder how a company cannot “predict” how its own customers will react to a series of very expensive operating system changes.
The answer is that Microsoft’s ability to predict is not particularly good in my opinion. I won’t mention Windows Phone. I would point out that Apple’s iPhone 6 moved millions of units over a weekend. Did Microsoft predict that its phone would perform at a comparable level? Probably.
I read “A New Kind of Data-Driven Predictive Methodology.” The article is one of a flurry of fancy math stories that are choking my Overflight intelligence system.
The article explains that Microsoft predicted the Scottish independence vote and:
Microsoft…correctly predicted the winners of all 15 World Cup knockout games earlier this year and got the Obama vs. Romney outcome right in 50 of 51 jurisdictions (the states plus the District of Columbia) in the 2012 U.S. presidential election.
Pretty impressive until I think about Microsoft’s dismal track record with its own products’ acceptance by its own customers.
If you want to get more insight into a system that seems to perform well for non Microsoft questions, dig in. Microsoft is into social, reinventing survey research, and analysis of data that “must be accurate.”
Yep, accurate data help. How did those predictions about the Fast Search & Transfer acquisition work out? I will try to “Delve” into that question.
Stephen E Arnold, October 7, 2014
October 6, 2014
The article on Geekzone titled IBM Introduces Powerful Analytics For Everyone discusses the recent announcement from IBM. The promise of natural language-based analytics that is easier to get and to use is meant to cater to the modern businessperson. Three major advances in analytics technology have been applied to Watson Analytics, including a streamlined “single business analytics experience”, a “guided predictive analytics” that brings relevant material to the surface, and finally a “natural language dialogue” in familiar business jargon. Senior Vice President Bob Picciano explains in the article,
“Watson Analytics is designed to help all business people – from sales reps on the road to company CEOs – see patterns, pursue ideas and improve all types of decisions. We have eliminated the barrier between the answers they seek, the analytics they want and the data in the form they need. The combination of Watson-fueled analytics to magnify human cognition, the vast potential of big data, and cloud-scale delivery to PCs, smart phones and other devices is transformational.”
IBM has kept the actual businessperson in mind, the businessperson who spends half of their time collecting accurate data. By automating many of the steps in analysis, Watson Analytics aims to aid in the efficiency and relevance of the analysis at hand.
Chelsea Kerwin, October 06, 2014
October 3, 2014
Big Data. Biggish Data. Now dark data. The idea plays on the silliness of the dark Web; that is, it is information that is “there”, but you don’t know about it. Well, get with it, pilgrim. Datameer used this term in “Shine Light on Dark Data.”
Here’s the definition:
At every organization neglected data sits overlooked in log files and archives accumulating digital dust and incurring costs. But as more organizations look for ways to become better, stronger and faster, they’re digging into this “dark” data and uncovering a gold mine of business intelligence.
Now how do you shine light on dark data? Great question. I will not probe the logical aspects of this concept. There are, according to the article, five steps to take. These are—unsurprisingly—the same steps a prudent and informed manager takes to figure out just plain old data.
Words to marketers make all the difference. I am not sure data has an opinion.
Stephen E Arnold, October 3, 2014
October 1, 2014
Short honk: It might be almost 50 years old, but if you are into math, Carl Boyer’s History of Mathematics is worth a look. You can access the full text in image form at http://bit.ly/1oueUIL.
Stephen E Arnold, October 1, 2014
September 30, 2014
The hyperbole artists have painted themselves into a corner. I am not sure too many folks know this. The idea that one can crank out killer analyses with a couple of swipes or a mouse click is raising expectations. Like so much in content processing, reality is just a little bit different.
You know that the slips twixt cup and lip must be cropping up in numerous organizations. The Harvard Business Review does not write too much science fiction compared to MIT’s Technology Review across the river.
“Beware the Analytics Bottleneck” adopts the same MBA tone that makes Wall Street bankers and lawyers so beloved by the common man and delivers what might be a downside.
The write up states:
“Don’t be overwhelmed. Start slower to go faster.” I think that runs counter to the baloney in the Eric Schmidt Google tome.
Next the HBR wants to keep life simple for the busy one percenters:
Technology doesn’t have to be exposed. Keep the complexity behind the curtain. Definitely good advice if one does not know whether the data are valid and the numerical recipes are configured in an appropriate manner.
Then the golden piece of advice for the go go MBA looking for a payday so he or she can pursue his or her dream of helping people or just spending money:
Make faster decisions for faster rewards.
That’s a sure fire way to break through bottlenecks. Use the outputs to support really fast decisions. Forget that pondering stuff. Just guess.
What’s scary is that when some folks have a tiny bit of knowledge, their deliberations can yield disastrous decisions. Need some examples? Well, do some thinking. How about GM and ignition switches? What about IRS actions and email mysteries? Or multi billion dollar acquisitions that lead to multi billion dollar write offs shortly after handing over the dump trucks filled with cash?
My take on this write up is that the “expert” did not focus on the bottlenecks that Big Data often produce like sex crazed hamsters:
- The time and cost to normalize and validate data
- The complexity of updating indexes so that reports reflect the most recent data, not stale data
- Dealing with the configuration decisions that generate outputs that are just plain wrong
- The money spent to get a system back online when it crashes, whether the crash is an old fashioned on premises flame out or a failure of one of the nifty new cloud systems that are virtual and allegedly fool proof
In short, Big Data and analytics pose some very significant challenges for vendors, licensees, and those who use the systems. The good news is that guessing will probably produce better results than reasoning through a decision based on flawed information. The bad news is that fancy content processing systems are likely to gobble budgets and increase certain operational costs.
The HBR obviously does not agree. Well, the fellows around the cast iron stove in Harrod’s Creek, Kentucky, find my observations directly on point.
Stephen E Arnold, September 30, 2014
September 28, 2014
The yada yada about Big Data drones on. I found “Billboard Crunches Facebook Data to Chart Music Tastes of NFL Team Fan Bases.” Analytics can be useful. Forget Ebola, the data about ISIS/ISIL, and precision/recall scores for a Google search.
The article explains a dramatic application for number crunching. I learned:
Billboard asked Facebook’s data crunchers to tally the music pages “liked” by fans of each NFL team and figured out who likes to blast what from their stereos.
Here are three insights generated by the intrepid data scientists. How many of these outputs resonate with you?
- Oakland Raiders’ fans like Snoop Dogg.
- Buffalo Bills’ fans like The Beatles.
- Chicago Bears’ fans like Michael Jackson.
And my favorite. Arizona Cardinals’ fans prefer Pitbull. (Who? A bird and a dog type?)
Essential insights for decision makers. Yeah, yeah, yeah.
Stephen E Arnold, September 28, 2014
September 27, 2014
Short honk: If you want a copy of National Security Agency 2014 Technology Catalog: Technology Transfer Program, you can download it for now from this link. I found pages 26 to 40 fascinating. Will IDC issue its own version of this document, using its surfing technique demonstrated by Dave Schubmehl with my content? I will keep my eye open.
Stephen E Arnold, September 27, 2014
September 25, 2014
I find visualization endlessly entertaining. I read “A Visual Data Mining Framework for Convenient Identification of Useful Knowledge.” The authors of the paper are Kaidi Zhao and Bing Liu (University of Illinois, Chicago) and Thomas M. Tirpak and Weimin Xiao (Motorola Labs). The paper illustrates the effort going into making sense of available data. What struck me was the illustration on page 6 of the PDF.
I hope that progress is rapid with their approach. Useful knowledge is helpful particularly when the visualization method is crystal clear. When I look closely at a magnified view of page 6’s diagram, I can spot “values.” In Column C, Row 6, there are three unexplained values.
Stephen E Arnold, September 25, 2014
September 22, 2014
I read “How IBM’s Watson Could Do for Analytics What Search Did for Google.” I urge you to flip through a math book like Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus. Although an older book, some of its methods are now creeping into the artificial intelligence revolution that seems to be the next big thing. Then read the Datamation write up.
IBM is rolling out a “freemium model to move Watson, their [sic] English language AI interface for analytics, into the market more aggressively.” What could be more aggressive than university contests, recipes for Bon Appétit, and curing cancer?
The article points out that the only competitor to Watson is Google. Well, that’s an interesting assertion.
Google put an interface on search, I learned. The rest is Google’s dominance. Now IBM wants to put an interface on analytics, and (I assume it follows to the thinkers at IBM) IBM’s dominance will tag along.
The article asserts:
We often talk about analytics needing data scientists who have a unique skill set, allowing them to get out the answers needed from highly complex data repositories. Since the results of the analysis are supposed to lead to better executive decisions the ideal skill set would have been an MBA Data Scientist, yet I’ve actually never seen one of those. Folks who are good at deep analysis and folks that are good at business tend to be very different folks, and data scientists are in very short supply at the moment.
Well, someone has to:
- Select numerical recipes
- Set thresholds
- Select process sequences
- Select data and ensure that they are valid
- Set up outputs, making decisions about what to show and what not to show
- Modify when the outputs do not match reality. (I realize that this step is of little interest to some analytics users.)
The article concludes:
The Freemium model has similar advantages. So if you wrap a product that line executives should prefer with an economic model that removes most of the financial barriers, you should end up with a solution that does for IBM what Search did for Google. And that could do some interesting things to the analytics market, creating a similar set of conditions to those that put IBM on top of technology in the last century.
What’s a freemium model? What’s the purpose of the analysis? What’s the method to validate results? What controls does a clueless user have over the Watson system?
Oh, wait. Watson is a search system. Google is a search system that people use. Watson is a search system that few use. Also, IBM still sells mainframes. This is a useful factoid to keep in mind.
Stephen E Arnold, September 22, 2014