Big Data Gets Emotional
December 15, 2015
Christmas is the biggest shopping time of the year, and retailers spend months studying consumer data. They want to understand consumer buying habits, popular trends in clothing, toys, and other products, physical versus online retail, and especially what the competition will be doing on sales to entice more customers to buy more. Smart Data Collective recently wrote about the science of shopping in “Using Big Data To Track And Measure Emotion.”
Customer experience professionals study three things related to customer spending habits: ease, effectiveness, and emotion. Emotion is the biggest of the three and the strongest driver of customer loyalty. If data specialists could figure out the perfect way to measure emotion, the science of shopping as we know it would change.
“While it is impossible to ask customers how do they feel at every stage of their journey, there is a largely untapped source of data that can provide a hefty chunk of that information. Every day, enterprise servers store thousands of minutes of phone calls, during which customers are voicing their opinions, wishes and complaints about the brand, product or service, and sharing their feelings in their purest form.”
The article describes some of the ways emotional data is gathered: phone recordings, surveys, and, the biggest source, analysis of vocal speech layers. Analytic platforms examine those speech layers, measuring relationships between words and phrases to gauge sentiment. Emotions are rated on a five-point scale, ranging from positive to negative, to discover the patterns that trigger reactions.
Customer experience input is a data analyst’s dream as well as a nightmare, given all of the data constantly coming in.
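The write up does not say how the platforms turn words into a score. A toy sketch, assuming the calls have already been transcribed to text and using a hand-built word list with hypothetical weights plus a crude negation rule (far simpler than the phrase-level analysis the article describes), shows how a raw sentiment score might be mapped onto a five-point scale:

```python
# Hypothetical word weights; the article does not disclose any platform's lexicon.
LEXICON = {
    "great": 2, "love": 2, "thanks": 1, "fine": 1,
    "slow": -1, "problem": -1, "broken": -2, "terrible": -2,
}
NEGATORS = {"not", "never", "no"}


def score_utterance(text: str) -> int:
    """Sum word weights, flipping the sign of a word that follows a negator."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    total = 0
    for i, word in enumerate(words):
        weight = LEXICON.get(word, 0)
        if i > 0 and words[i - 1] in NEGATORS:
            weight = -weight
        total += weight
    return total


def to_five_point(raw: int) -> int:
    """Map a raw score onto 1 (very negative) .. 5 (very positive), clamping extremes."""
    return max(1, min(5, 3 + raw))


for line in ["The new app is great, thanks!",
             "My order arrived broken and support was slow.",
             "It was fine."]:
    print(to_five_point(score_utterance(line)), line)
```

The first line lands at 5, the second at 1, and the third at 4. Real systems weigh context across whole phrases, which is where the “relationships between words” the article mentions come in.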
Whitney Grace, December 15, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Big Data and Math: Puzzlers Abound
December 14, 2015
Interested in math? Navigate to “Big Data’s Mathematical Mysteries: Machine Learning Works Spectacularly Well, but Mathematicians Aren’t Quite Sure Why.” Yep, I know the certainties of high school math are annoyances when dealing with more sophisticated procedures. Dabble in C* algebras, and you will realize why home economics and general business were appealing.
The point of the write up is that numerical recipes can do their thing on existing and incoming data. If the recipes were trained correctly by a human, some of the niftier systems can learn. Now this is not the same as housebreaking your new Great Dane, but the analogy is close enough for would-be mathematicians.
If a system does not require humans to supervise it, methods exist to explore hidden structures. Think patterns a human cannot perceive.
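The write up does not name a specific method. k-means clustering is one well-known unsupervised technique, chosen here only for illustration. The sketch below receives no labels at all and still separates two groups of points; the data and starting centroids are made up:

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], centroids: list[Point], iterations: int = 10) -> list[int]:
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points. No labels are supplied."""
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster's centroid in place
        centroids = new_centroids
    return labels


data = [(1.0, 1.1), (0.9, 0.8), (1.2, 1.0), (8.0, 8.2), (7.9, 7.7), (8.3, 8.1)]
print(kmeans(data, centroids=[data[0], data[3]]))
```

With these starting centroids the three points near (1, 1) end up in one cluster and the three near (8, 8) in the other. The “hidden structure” here is obvious to a human. The appeal of such methods is in dimensions a person cannot eyeball.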
Here’s the passage I highlighted:
These methods are already leading to interesting and useful results, but many more techniques will be needed. Applied mathematicians have plenty of work to do. And in the face of such challenges, they trust that many of their “purer” colleagues will keep an open mind, follow what is going on, and help discover connections with other existing mathematical frameworks. Or perhaps even build new ones.
The idea is that good enough mathematicians can use numerical procedures and get pretty useful outputs. There you go. No need to fool around with Hilbert spaces.
Stephen E Arnold, December 11, 2015
More Big Data Analytics Market Size Numbers
December 13, 2015
I read “IBM, SAS, and SAP Dominate Big Data Analytics Market, but Challengers Remain.” The write up contained one of those wild and wooly market estimates. I love these confections. Many gobble down the figures because, like cupcakes, the sweet looks so darned tasty.
Here’s the passage I highlighted:
Big Data analytics software revenues will experience strong growth in the coming years, doubling its current global 2015 revenue of $US36.20 billion to $US 73.77 billion by 2021 and reaching $US81 billion by 2022, a compound annual growth rate of 12 per cent in the next seven years.
Here’s another number:
Drilling down into industry specifics, findings also claim that the Big Data analytics healthcare vertical segment will grow from $US7.964 billion in software revenue in 2015 to $US17.031 billion in 2022, a CAGR of 11 percent worldwide.
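The write up offers no method, but the quoted figures can at least be checked against each other. A quick sketch, using the standard compound annual growth rate formula and treating 2015 to 2022 as seven years of compounding (an assumption that matches the “next seven years” wording):

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


# Figures quoted in the write up, in billions of US dollars.
overall = cagr(36.20, 81.0, 2022 - 2015)
healthcare = cagr(7.964, 17.031, 2022 - 2015)
ratio_2021 = 73.77 / 36.20  # the claimed "doubling" by 2021

print(f"Overall 2015-2022 CAGR: {overall:.1%}")
print(f"Healthcare 2015-2022 CAGR: {healthcare:.1%}")
print(f"2021 / 2015 revenue ratio: {ratio_2021:.2f}")
```

The overall rate comes out near 12.2 percent and the healthcare rate near 11.5 percent, so the numbers agree with one another. Internal consistency says nothing about where the starting figures came from.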
Seven years. Okay. There is no explanation about the method used to cook up the data. That’s fine. Who reads the label on a box of Cap’n Crunch?
The write up identified some companies posing a challenge to the Big Data analytics leaders IBM, SAS, and SAP. I found this list fascinating:
- Bosch
- Cisco
- Dell
- General Electric
- Intel
- Microsoft
- Oracle
Life is tidy when the pool of players consists of publicly traded companies. What about the minnows not on the list? Presumably the big folks need not worry about upstarts.
Stephen E Arnold, December 13, 2015
Metanautix: Big Data Search
December 9, 2015
I read “Ex-Google, Facebook Duo Aim to Simplify Big Data Search.” The idea is that people with Big Data cannot find what is needed to answer a question. The fix may be developed by Metanautix.
Sound familiar?
I have heard this user requirement for, what is it now, 25 years? 30 years? More?
According to the write up:
When a company wants to analyze data, typically it first has to input all of that information all into some type of database. Then an engine can be built to bring about answers to any inquires. What Metanautix does, however, is build-in search capabilities for an existing database.
I thought that a number of other firms had developed solutions for Big Data search; for example, Lucidworks. If the article is correct, the fine folks at Lucidworks will have to contend with a competitor that does more than put out marketing assertions.
Stephen E Arnold, December 9, 2015
Computers Pose Barriers to Scientific Reproducibility
December 9, 2015
These days, it is hard to imagine performing scientific research without the help of computers. Phys.org details the problem that poses in its thorough article, “How Computers Broke Science—And What We Can Do to Fix It.” Many of us learned in school that reliable scientific conclusions rest on a foundation of reproducibility. That is, if an experiment’s results can be reproduced by other scientists following the same steps, the results can be trusted. However, many of those steps are now hidden on researchers’ hard drives, making the test of reproducibility difficult or impossible to apply. Writer Ben Marwick points out:
“Stanford statisticians Jonathan Buckheit and David Donoho [PDF] described this issue as early as 1995, when the personal computer was still a fairly new idea.
‘An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.’
“They make a radical claim. It means all those private files on our personal computers, and the private analysis tasks we do as we work toward preparing for publication should be made public along with the journal article.
This would be a huge change in the way scientists work. We’d need to prepare from the start for everything we do on the computer to eventually be made available for others to see. For many researchers, that’s an overwhelming thought. Victoria Stodden has found the biggest objection to sharing files is the time it takes to prepare them by writing documentation and cleaning them up. The second biggest concern is the risk of not receiving credit for the files if someone else uses them.”
So, do we give up on the test of reproducibility, or do we find a way to address those concerns? Well, this is the scientific community we’re talking about. There are already many researchers in several fields devising solutions. Poetically, those solutions tend to be software-based. For example, some are turning to executable scripts instead of the harder-to-record series of mouse clicks. There are also suggestions for standardized file formats and organizational structures. See the article for more details on these efforts.
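A minimal sketch of the executable-script idea, assuming Python and a simulated data set standing in for real measurements (the function names and seed value are hypothetical). Every analysis step lives in code, the random seed is fixed, and the software environment is recorded next to the results:

```python
"""Hypothetical analysis script: each step that once lived in mouse clicks is
written down, so rerunning this file regenerates the same numbers."""
import json
import platform
import random
import statistics
import sys

SEED = 20151209  # fixed seed so the simulated sample is identical on every run


def load_sample(seed: int, n: int = 100) -> list[float]:
    # Stand-in for reading a raw data file; simulated so the sketch is self-contained.
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(n)]


def analyze(values: list[float]) -> dict:
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 4),
        "stdev": round(statistics.stdev(values), 4),
    }


def run() -> dict:
    results = analyze(load_sample(SEED))
    # Record the environment alongside the results so others can match it.
    results["python"] = sys.version.split()[0]
    results["platform"] = platform.platform()
    results["seed"] = SEED
    return results


if __name__ == "__main__":
    print(json.dumps(run(), indent=2))
```

Running the file twice on the same Python version prints identical output, a property a sequence of mouse clicks cannot promise.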
A final caveat: Marwick notes that computers are not the only problem with reproducibility today. He also cites “poor experimental design, inappropriate statistical methods, a highly competitive research environment and the high value placed on novelty and publication in high-profile journals” as contributing factors. Now we know at least one issue is being addressed.
Cynthia Murrell, December 9, 2015
Big Data? Nope, Bad Data
December 3, 2015
I have been skeptical of surveys generated by mid-tier consulting firms in search of customers. I shudder when I read about customer surveys conducted using online Web data collection forms tossed open to anyone who stumbles upon the survey URL. The results are usually a source of amusement. I am sorely tempted to highlight some of these Fred Allen radio show scripts, but I don’t need letters from legal eagles threatening my continued existence here in rural Kentucky.
I did read “This Isn’t ‘Big Data.’ It’s Just Bad Data,” and it seems a handful of other folks share my concern about bogus studies. The write up appeared in a Bloomberg service. I am starting to think that the Bloomberg outfit is as skeptical as I am about the Big Data revolution. In today’s economic environment, a friendly convenience store selling essentials like baloney is a welcome sight to storm-tossed managers.
The write up says:
With response rates that have declined to under 10 percent, public opinion polls are increasingly unreliable. Perhaps even more concerning, though, is that the same phenomenon is hindering surveys used for official government statistics, including the Current Population Survey, the Survey of Income and Program Participation and the American Community Survey. Those data are used for a wide array of economic statistics — for example, the numbers you read in newspapers on unemployment, health insurance coverage, inflation and poverty.
The key point in my opinion is unreliable. Academics are concerned as well. Hey, these folks have their own challenge with the reproducibility of results issue. Oh, well.
The article points out that some Federal survey funds may be allocated elsewhere. Yikes.
My view is that the bad data thing is a growing problem. As self-service systems, such as using Cortana to get business intelligence, become more widely available, fewer and fewer folks will worry about the validity of the data upon which the “intelligence” is based.
Is this a problem? Yep. Will the feisty Big Data cheerleaders take action? Nah. Revenue, baby.
Stephen E Arnold, December 3, 2015
Google Drastically Slows Acquisition Spending
December 3, 2015
As Google becomes Alphabet, the company seems to be taking a new approach to its investments. Business Insider declares, “Google Slammed the Brakes on its Acquisition Machine, with the Lowest Deal-Making Since 2009.” The article references Google’s 10-Q quarterly earnings report and compares that quarter’s acquisition total of $250 million to the company’s spending sprees of years past; see the post for details. Writer Alexei Oreskovic observes:
“The M&A slowdown comes as Google has transformed itself into the Alphabet holding company, which separates various Google projects, such as fiber-based internet access, and Nest into separate companies. It also comes as new CFO Ruth Porat has taken steps to make Google more disciplined about its spending, and to return some cash to shareholders through buybacks. Stock buybacks and slowing M&A — perhaps this is the new Google. Or perhaps Google is just taking a breather on its acquisitions to digest all the companies it has swallowed up over the years. Asked about the slowing M&A, a Google representative responded by email: ‘Acquisitions by their nature are inherently lumpy and don’t follow neat 9 month patterns.’”
Well, that’s true, I suppose, as far as it goes. We hope this turn to fiscal discipline does not portend trouble for Google/Alphabet. What is the plan? We are curious to see where the company goes from here.
Cynthia Murrell, December 3, 2015
Big Data: Some Obvious Issues
November 30, 2015
Imagine this. Information Week writes an article which does not mix up marketing jargon, cheerleading, and wild generalizations. No. It’s true. Es verdad.
Navigate to “Big Data & The Law Of Diminishing Returns.” The write up recycles comments from an Ivory Tower type at Harvard University. Not enough real world experience? Never fear, the poobah is Cathy O’Neil, who worked in the private sector.
Here are her observations as presented by the estimable Information Week real journalists. Note: my observation appears in italics after the words of wisdom.
“The [Big Data] technology is encouraging people to use algorithms they don’t understand.” My question: How many information professionals got an A in math?
“Know what you don’t know. It’s hard.” My question: How does one know oneself if the self is trying to hit one’s numbers and work with concepts about which one’s information is skewed by Google-boosted assumptions about one’s intelligence?
The write up includes this bummer of a statement to the point and click analytics wizards:
“I’d rather have five orthogonal modest data sets than one ginormous data set along a single axis…That is where the law of diminishing returns kicks in.” This is attributed to Caribou Hoenig, a venture capital firm. My question: What is ginormous?
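One reason more data along a single axis yields diminishing returns can be sketched with the standard error of a mean, σ/√n. Assuming the extra rows are independent draws of the same measurement (an illustrative assumption, not Hoenig’s stated reasoning), each tenfold increase in n buys a smaller absolute reduction in sampling error:

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 1.0  # assumed population standard deviation, for illustration only
previous = None
for n in (1_000, 10_000, 100_000, 1_000_000):
    se = standard_error(sigma, n)
    gain = "" if previous is None else f"  (improvement: {previous - se:.5f})"
    print(f"n={n:>9,}  standard error={se:.5f}{gain}")
    previous = se
```

The improvement shrinks from about 0.0216 to 0.0068 to 0.0022 as the data set grows a thousandfold. More rows along one axis reduce only sampling noise. They do nothing about what that axis fails to measure, which is the case for several orthogonal data sets.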
The write up also reveals, without much in the way of questioning the analytic method, that IDC has calculated that the size of the Big Data market will be “$58.6 billion by the end of the year, and it would grow to $101.9 billion by 2019.”
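The write up does not say how IDC got there, but the growth rate implied by the two figures is easy to back out. Assuming $58.6 billion is the end-of-2015 level and 2019 is four years of compounding later:

```latex
\text{CAGR} = \left(\frac{101.9}{58.6}\right)^{1/4} - 1 \approx 1.7389^{0.25} - 1 \approx 0.148
```

That is roughly 14.8 percent a year, sustained for four years, without a word on the method behind either number.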
Perhaps clear thinking about data begins with some thinking about where numbers come from, the validity of the data set, and the methods used to figure out the future.
Oh, right. That’s the point of the article. Too bad the write up ignores its own advice. I like that ginormous number in 2019. Yep, clear thinking about data abounds.
Stephen E Arnold, November 30, 2015
Crazy Numbers Department: Big Data Spending in 2019
November 26, 2015
It is almost 2016. IDC, an outfit owned by an optimistic outfit, has taken a tiny step forward. The IDC wizards answered this question, “How big will Big Data spending be in 2019?” Yep, that is 36 months in the future. There might be more money in predicting Super Bowl winners, what stock to pick, and the steps to take to minimize risk at a restaurant. But no.
According to the true believers in the Content Loop, “IDC Says Big Data Spending to Hit 48.6 Billion in 2019.” I like that point six, which seems to suggest that real data were analyzed exhaustively.
The write up reports:
The market for big data technology and services will grow at a compound annual growth rate (CAGR) of 23 percent through 2019, according to a forecast issued by research firm International Data Corp. (IDC) on Monday. IDC predicts annual spending will reach $48.6 billion in 2019. IDC divides the big data market into three major submarkets: infrastructure, software and services. The research firm expects all three submarkets to grow over the next five years, with software — information management, discovery and analytics and applications software — leading the charge with a CAGR of 26 percent.
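The quoted passage gives the 2019 endpoint and the growth rate but not the starting figure. Assuming the 23 percent CAGR compounds over five years ending in 2019, the implied starting level S is:

```latex
S = \frac{48.6}{1.23^{5}} \approx \frac{48.6}{2.815} \approx 17.3 \ \text{(billions of US dollars)}
```

If IDC instead counts four years of compounding from a 2015 base, the same arithmetic gives 48.6 / 1.23^4, or about $21.2 billion. Either way, the point six at the far end rests on a base the write up does not disclose.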
I will go out on a limb. I predict that IDC will offer for sale three reports, maybe more. I hope the company communicates with its researchers to avoid the mess created when IDC wizard Dave Schubmehl tried to pitch eight pages of wonderfulness based on my research for a mere $3,500 without my permission. Ooops. Those IDC folks are too busy to do the contract thing, I assume.
A Schubmehl-type IDC wizard offered this observation with only a soupçon of jargon:
The ever-increasing appetite of businesses to embrace emerging big data-related software and infrastructure technologies while keeping the implementation costs low has led to the creation of a rich ecosystem of new and incumbent suppliers…. At the same time, the market opportunity is spurring new investments and M&A activity as incumbent suppliers seek to maintain their relevance by developing comprehensive solutions and new go-to-market paths.– Ashish Nadkarni, program director, Enterprise Servers and Storage, IDC
Yes, the ever-increasing, go-to-market spirit. Will the concept apply to IDC’s revenues? Those thrilled with the Big Numbers are the venture folks pumping money into Big Data companies with the same enthusiastic good cheer that Russian ground support troops send along with the Backfires, Bears, and Blackjacks bound for Syria.
Thinking about international tension, my hunch is that the global economy seems a bit dicey, maybe unstable, at this time. I am not too excited at the notion of predicting what will happen in all things digital in the next few days. Years. No way, gentle reader.
Thinking about three years in the future strikes me as a little too bold. I wonder if the IDC predictive methods have been applied to DraftKings and FanDuel games?
Stephen E Arnold, November 26, 2015
Interview with Informatica CEO
November 26, 2015
Blogger and Datameer CEO Stefan Groschupf interviews Anil Chakravarthy, acting CEO of Informatica, in a series of posts on his blog, Big Data & Brews. The two executives discuss security in the cloud, data infrastructure, schemas, and the future of data. There are four installments as of this writing, but it was an exchange in the second installment, “Big Data Brews: Part II on Data Security with Informatica,” that captured our attention. Here’s Chakravarthy’s summary of the challenge now facing his company:
Stefan: From your perspective, where’s the biggest growth opportunity for your company?
Anil: We look at it as the intersection of what’s happening with the cloud and big data. Not only the movement of data between our premise and cloud and within cloud to cloud but also just the sheer growth of data in the cloud. This is a big opportunity. And if you look at the big data world, I think a lot of what happens in the big data world from our perspective, the value, especially for enterprise customers, the value of big data comes from when they can derive insights by combining data that they have from their own systems, etc., with either third-party data, customer-generated data, machine data that they can put together. So, that intersection is good for, and we are a data infrastructure provider, so those are the two big areas where we see opportunity.
It looks like Informatica is poised to make the most of the changes prompted by cloud technology. To check out the interview from the beginning, navigate to the first installment, “Big Data & Brews: Informatica Talks Security.”
Informatica offers a range of data-management and integration tools. Though the company has offices around the world, it maintains its headquarters in Redwood City, California. It is also hiring as of this writing.
Cynthia Murrell, November 26, 2015