March 18, 2014
It is the data equivalent of a distortion-free sound system: Karmasphere blogs about what it calls “Full-Fidelity Analytics.” Karmasphere founder Martin Hall explains what the analytics-for-Hadoop company means by the repurposed term:
“Ensuring Full-Fidelity Analytics means not compromising the data available to us in Hadoop in order to analyze it. There are three principles of Full-Fidelity Analytics:
1. Use the original data. Don’t pre-process or abstract it so it loses the richness that is Hadoop
2. Keep the data open. Don’t make it proprietary which undermines the benefits of Hadoop open standards
3. Process data on-cluster without replication. Replication and off-cluster processing increases complexity and costs of hardware and managing the environment.
“By adhering to these principals during analytics, the data remains rich and standard empowering deep insights faster for companies in the era of Big Data.”
The post goes on to list several advantages to the unadulterated-data policy; Hall declares that it reduces complexity, lowers the total cost of ownership, and avoids vendor lock-in, to name a few benefits. The write-up also discusses the characteristics of a full-fidelity analytics system. For example, it uses the standard Hadoop metastore, processes analytics on-cluster, and, above all, avoids replication and sampling. See the post for more details about this concept. Founded in 2010, Karmasphere is headquartered in Cupertino, California.
Cynthia Murrell, March 18, 2014
March 17, 2014
The article titled HP Autonomy Unlocks Value of Clinical Data with HP Healthcare Analytics from Market Watch explores HP’s announcement of a new analytics platform that healthcare providers can use to make sense of clinical data, both structured and unstructured. The new platform was created in a partnership between HP and Stanford Children’s Health and Lucile Packard Children’s Hospital. It is powered by HP Idol. The article states,
“The initial results have already yielded valuable insights, and have the potential to improve quality of care and reduce waste and inefficiency.
Though the core mission of the Information Services Analytics team at Lucile Packard Children’s Hospital Stanford is to enable operational insights from structured clinical and administrative data, innovation projects are also a key strategic initiative of the group… The healthcare industry faces the enormous challenges of reducing cost, increasing operational efficiency and elevating the quality of patient care.”
Costs have gotten out of control, and the hope of this collaboration is that analytics might be the key. A huge part of the problem is the unstructured data that is overlooked: text in a patient’s records, notes from the doctor, or emails between the doctor and patient. HP Idol’s ability to understand and categorize such information is expected to make early diagnosis and early detection much more feasible. For more information visit www.autonomy.com/healthcare.
Chelsea Kerwin, March 17, 2014
March 15, 2014
Run a query for Google Flu Trends on Google. The results point to the Google Flu Trends Web site at http://bit.ly/1ny9j58. The graphs and charts seem authoritative. I find the colors and legends difficult to figure out, but Google knows best. Or does it?
A spate of stories has appeared in New Scientist, Smithsonian, and Time that pick up the thread that Google Flu Trends does not work particularly well. The Science Magazine podcast presents a quite interesting interview with David Lazer, one of the authors of “The Parable of Google Flu: Traps in Big Data Analysis.”
The point of the Lazer article and the greedy recycling of the analysis is that algorithms can be incorrect. What is interesting is the surprise that creeps into the reports of Google’s infallible system being dead wrong.
For example, Smithsonian Magazine’s “Why Google Flu Trends Can’t Track the Flu (Yet)” states, “The vaunted big data project falls victim to periodic tweaks in Google’s own search algorithms.” The writer continues:
A huge proportion of the search terms that correlate with CDC data on flu rates, it turns out, are caused not by people getting the flu, but by a third factor that affects both searching patterns and flu transmission: winter. In fact, the developers of Google Flu Trends reported coming across particular terms—those related to high school basketball, for instance—that were correlated with flu rates over time but clearly had nothing to do with the virus. Over time, Google engineers manually removed many terms that correlate with flu searches but have nothing to do with flu, but their model was clearly still too dependent on non-flu seasonal search trends—part of the reason why Google Flu Trends failed to reflect the 2009 epidemic of H1N1, which happened during summer. Especially in its earlier versions, Google Flu Trends was “part flu detector, part winter detector.”
Oh, oh. Feedback loops, thresholds, human bias: quite a surprise, apparently.
Time Magazine’s “Google’s Flu Project Shows the Failings of Big Data” realizes:
GFT and other big data methods can be useful, but only if they’re paired with what the Science researchers call “small data”—traditional forms of information collection. Put the two together, and you can get an excellent model of the world as it actually is. Of course, if big data is really just one tool of many, not an all-purpose path to omniscience, that would puncture the hype just a bit. You won’t get a SXSW panel with that kind of modesty.
Scientific American’s “Why Big Data Isn’t Necessarily Better Data” points out:
Google itself concluded in a study last October that its algorithm for flu (as well as for its more recently launched Google Dengue Trends) were “susceptible to heightened media coverage” during the 2012-2013 U.S. flu season. “We review the Flu Trends model each year to determine how we can improve—our last update was made in October 2013 in advance of the 2013-2014 flu season,” according to a Google spokesperson. “We welcome feedback on how we can continue to refine Flu Trends to help estimate flu levels.”
The word “hubris” turns up in a number of articles about this “surprising” suggestion that algorithms drift.
Forget Google and its innocuous and possibly ineffectual flu data. The coverage of the problems with the Google Big Data demonstration has significance for those who bet big money that predictive systems can tame big data. For companies licensing Autonomy- or Recommind-type search and retrieval systems, the flap over flu trends makes clear that algorithmic methods require babysitting; that is, humans have to be involved, and that involvement may introduce outputs that wander off track. If you have used a predictive search system, you probably have encountered off-center, irrelevant results. The question “Why did the system display this document?” is one indication that predictive search may deliver a load of fresh bagels when you wanted a load of mulch.
For systems that do “pre crime” or predictive analyses related to sensitive matters, uninformed “end users” can accept what a system outputs and take action. This is the modern version of “Ready, Fire, Aim.” Some of these actions are not quite as innocuous as over-estimating flu outbreaks. Uninformed humans without knowledge of context and biases in the data and numerical recipes can find themselves mired in a swamp, not parked at the local Starbucks.
And what about Google? The flu analyses illustrate one thing: Google can fool itself in its effort to sell ads. Accuracy is not the point of Google or many other online information retrieval services.
Painful? Well, taking two aspirins won’t cure this particular problem. My suggestion? Come to grips with rigorous data analysis, algorithm behaviors, and old fashioned fact checking. Big Data and fancy graphics are not, by themselves, solutions to the clouds of unknowing that swirl through marketing hyperbole. There is a free lunch if one wants to eat from trash bins.
Stephen E Arnold, March 15, 2014
March 13, 2014
Microsoft partners are responsible for SharePoint add-ons that increase usability and efficiency for users. Webtrends is one such partner that offers an Analytics for SharePoint solution. Broadway World covers their latest announcement in the article, “Employee Adoption for SharePoint Soars With Webtrends Analytics.”
The article begins:
“Webtrends, a Microsoft-preferred partner for SharePoint analytics, today announced a 64% year-over-year increase in customer bookings for its Analytics for SharePoint business . . . Leveraging deep analytics expertise and use cases from customers like BrightStarr and Siemens, Webtrends highlights key insights and successes, including a preview of an analytics for Yammer solution, during the SharePoint Conference in Las Vegas, NV on March 3-6.”
Stephen E. Arnold has a lot to say about SharePoint from his platform, ArnoldIT.com. As a longtime search expert, Arnold knows that SharePoint’s success hinges on customization and add-ons, which allow an organization to take this overwhelming solution and make it work for them.
Emily Rae Aldridge, March 13, 2014
March 12, 2014
The article titled IBM and Thiess Collaborate on Predictive Analytics and Modeling Technologies on Mining-Technology.com explores the partnership of IBM and Thiess, an Australian construction, mining and service provider. The collaboration centers on predictive analytics for maintenance and replacement information and on early detection of malfunctions. The article states,
“Thiess Australian mining executive general manager Michael Wright said the analytics and modeling can offer great opportunities to improve business of the company. ‘Working with IBM to build a platform that feeds the models with the data we collect and then presents decision support information to our team in the field will allow us to increase machine reliability, lower energy costs and emissions, and improve the overall efficiency and effectiveness of our business,’ Wright said.”
This is another big IBM bet. The collaboration will start with Thiess’s mining haul trucks and excavators. Models will be constructed around such information as inspection history of the equipment, weather conditions and payload size. These models will then be used to help make more informed decisions about operational performance, and will allow for early detection of anomalies as well as predictions about when a piece of equipment will require a replacement part. This will in turn allow Thiess to plan production more accurately around the predicted health of a given machine.
Chelsea Kerwin, March 12, 2014
March 12, 2014
Investment site TheStreet is very enthused about Tableau Software, which went public less than a year ago. In fact, it goes so far as to announce that “Tableau’s Building the ‘Google for Data’.” In this piece, writer Andrea Tse interviews Tableau CEO Christian Chabot. In her introduction, Tse notes that nearly a third of the company’s staff is in R&D, a good sign for future growth. She also sees the direction of Tableau’s research as wise. The article explains:
“The research and development team has been heavily focused on developing technology that’s free of skillset constraints, utilizable by everyone. This direction has been driven by the broad, corporate cultural shift to employee-centric, online-accessible data analytics, from the more traditional, hierarchical or top-down approach toward data analysis and dissemination.
“Tableau 9 and Tableau 10 that are in the product pipeline and soon-to-be-shipped Tableau 8.2 are designed to highlight ‘storytelling’ or visually striking data presentation.
“Well-positioned to ride the big data wave, Tableau shares, as of Tuesday’s [February 11] intraday high of $95, are now trading over 206% above its initial public offering price of $31 set on May 16.”
In the interview, Chabot shares his company’s research philosophy, touches on some recent large deals, and takes a gander at what is ahead. For example, his developers are currently working hard on a user-friendly mobile platform. See the article for details. Founded in 2003 and located in Seattle, Tableau Software grew from a project begun at Stanford University. Its priority is to help ordinary people use data to solve problems quickly and easily.
Cynthia Murrell, March 12, 2014
March 11, 2014
Attensity has been a quiet sentiment, analytics, text processing vendor for some months. The company has now released a new version of its flagship product, Analyze, now at version 6.3. The headline feature is “enhanced analytics.”
According to a company news release, Attensity is “the leading provider of integrated, real-time solutions that blend multi-channel Voice of the Customer analytics and social engagement for enterprise listening needs.” Okay.
The new version of Analyze delivers real-time information about what is trending to licensees. The system provides “multi dimensional visualization that immediately identifies performance outliers in the business that can impact the brand both positively and negatively.” Okay.
The system processes over 150 million blogs and forums, Facebook, and Twitter. Okay.
As memorable as these features are, here’s the passage that I noted:
Attensity 6.3 is powered by the Attensity Semantic Annotation Server (ASAS) and patented natural language processing (NLP) technology. Attensity’s unique ASAS platform provides unmatched deep sentiment analysis, entity identification, statistical assignment and exhaustive extraction, enabling organizations to define relationships between people, places and things without using pre-defined keywords or queries. It’s this proprietary technology that allows Attensity to make the unknown known.
“To make the unknown known” is a bold assertion. Okay.
I have heard that sentiment analysis companies are running into some friction. The expectations of some licensees have been a bit high. Perhaps Analyze 6.3 will suck up customers of other systems who are dissatisfied with their sentiment, semantic, analytics systems. Making the “unknown known” should cause the world to beat a path to Attensity’s door. Okay.
Stephen E Arnold, March 11, 2014
March 11, 2014
Butler Analytics has compiled a list, “20+ Text Analytics Platforms,” that surveys the text analytics platforms available and their capabilities. According to the list, text analytics has not yet reached full maturity. There are three main divisions in the area: natural language processing, text mining, and machine learning. Each is distinct, and each company has its own approach to using these processes:
“Some suppliers have applied text analytics to very specific business problems, usually centering on customer data and sentiment analysis. This is an evolving field and the next few years should see significant progress. Other suppliers provide NLP based technologies so that documents can be categorized and meaning extracted from them. Text mining platforms are a more recent phenomenon and provide a mechanism to discover patterns that might be used in operational activities. Text is used to generate extra features which might be added to structured data for more accurate pattern discovery. There is of course overlap and most suppliers provide a mixture of capabilities. Finally we should not forget information retrieval, more often branded as enterprise search technology, where the aim is simply to provide a means of discovering and accessing data that are relevant to a particular query. This is a separate topic to a large extent, although again there is overlap.”
Reading through the list shows the variety of options users have when it comes to text analytics. There does not appear to be a right or wrong way, but will the diverse offerings eventually funnel down to a few fully capable platforms?
March 7, 2014
Infogistics calls itself a leading company in text analysis, document retrieval, and text extraction for various industries. One would not think so after visiting its Web site, which has not been updated since 2005. The company does, however, have a vested interest in DaXtra Technologies, its new endeavor to provide content processing solutions for personnel and human resources applications.
Here is an official description from the Web site:
“For almost a decade we’ve been at the forefront of technology and solutions within our marketplace, giving our customers the competitive edge in their challenge to source the best available jobseekers, and find them quickly. Over 500 organizations, spanning all continents, use our resume analysis, matching and search products – from the world’s largest staffing companies to boutique recruiters, corporate recruitment departments, job boards and software vendors. This global reach is made possible via our multilingual CV technology which can automatically parse in over 25 different languages.”
DaXtra’s products include DaXtra Capture (recruitment management software), DaXtra Search, DaXtra Parser (which turns raw data into structured XML), DaXtra Components (to manage Web services), and DaXtra Analytics, due in 2014. The company appears to make top-of-the-line personnel software that clears up the confusion in HR departments. Even better, the DaXtra Web site is kept up to date.
March 4, 2014
Bayes’s Theorem is the foundation of predictive analytics. A Gigaom article explains how the theorem is used in predictive analytics by way of a familiar puzzle: “How the Solution To the Monty Hall Problem Is Also The Key To Predictive Analytics.”
The Monty Hall Problem is named after the Let’s Make a Deal host. Here is how it works:
“The show used what came to be known as the Monty Hall Problem, a probability puzzle named after the original host. It works like this: You choose between three doors. Behind one is a car and the other two are Zonks. You pick a door – say, door number one – and the host, who knows where the prize is, opens another door – say, door number three – which has a goat. He then asks if you want to switch doors. Most contestants assume that since they have two equivalent options, they have a 50/50 shot of winning, and it doesn’t matter whether or not they switch doors. Makes sense, right?”
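If a data scientist had been on the show, he would have used Bayes’s Theorem to improve his odds of winning the prize. The solution is to switch doors: the first pick is right only one time in three, and the host’s reveal does not change that, so switching wins two times in three.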
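For readers who want to check the switch-doors answer rather than take it on faith, here is a minimal Python sketch. It is not from the Gigaom article; the function names, the trial count, and the seed are illustrative assumptions. It assumes the standard rules: the host knows where the car is and always opens a losing door the contestant did not pick. The first two functions estimate the win rate by simulation, and the last applies Bayes’s Theorem to the case in the quote (contestant picks door one, host opens door three).

```python
import random
from fractions import Fraction


def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumed rule: the host knows where the car is and opens a door
    # that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the chance of winning when always switching or always staying."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(trials))
    return wins / trials


def posterior_car_behind_other_door() -> Fraction:
    """Bayes's Theorem for: contestant picks door 1, host opens door 3."""
    prior = Fraction(1, 3)  # the car is equally likely to be behind any door
    # Probability that the host opens door 3, given where the car is.
    likelihood = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}
    evidence = sum(prior * p for p in likelihood.values())
    # Posterior probability that the car is behind door 2, the switch target.
    return prior * likelihood[2] / evidence


if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
    print("Bayes: ", posterior_car_behind_other_door())
```

The simulated rates should land near one third for staying and two thirds for switching, and the Bayes calculation gives exactly 2/3 for the unopened door. The key input is the likelihood table: the host’s choice of door depends on where the car is, which is the information the 50/50 intuition ignores.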
The Monty Hall Problem is used in business, but Bayes’s Theorem is becoming more widespread. It is used to link big data and cloud computing, which also powers predictive analytics. The article goes on to explain the theorem’s importance and impact on business, which is not new. It ends by encouraging people to rely on Bayes over Monty Hall.
What will the next metaphor comparison be?