Big Thoughts on Big Data
May 26, 2012
CorrelSense recently reported on one of the hottest IT trends to date in the article, “Big Data is Truly Transforming the Enterprise.”
According to MIT principal research scientist Andrew McAfee, Big Data can be likened to the invention of the microscope: it exposes information that we could not have found before, just as the microscope lets you view things that previously could not be seen.
The article states:
“As IT Pros, you are going to have to learn to process this big data and find tools for the non-technical experts and suits in the C-Suite to mix and match the data. The big difference between this and traditional business intelligence is that with BI you were looking back where you were at a given point in time, whereas with Big Data, you can analyze data in real time and begin to make more intelligent decisions about where to put your resources at any given moment.”
Rather than reducing jobs, as many people fear technological progress will do, Big Data will create them. We’re obviously going to need more people to sift through and decipher this growing pile of data.
Jasmine Ashton, May 26, 2012
Sponsored by PolySpot
SAP Big Blue Rides Hana
May 25, 2012
The University of Kentucky’s business intelligence team has had to make some adjustments after the school implemented SAP’s HANA system. ComputerWorld declares, “For Univ. of Kentucky, SAP’s HANA is ‘Disruptive’.” Writer Patrick Thibodeau, punning on the term “disruptive technology,” notes that the University is (purposely) using HANA to restructure its BI system to better analyze student retention.
The new in-memory systems like HANA pull data from RAM instead of from hard disks. Speed and relative simplicity are the advantages, but these systems do require a hardware investment. In this case, Dell provided the hardware and developed the school’s student retention data models.
HANA is only a year old, and questions about its longevity are still in the air. Part of the issue is the hardware question: should organizations deploy on tried-and-true x86 systems or go with an engineered system, like IBM’s new PureSystems? Thibodeau writes:
“Engineered systems offer performance gains, meaning faster time to realize value and ‘less cumbersome’ management, said Alys Woodward, a research director at IDC. On the other hand, ‘software on commodity hardware reduces vendor lock-in and enables the use of cheaper components,’ said Woodward.
“How SAP HANA ‘will play in the broader marketplace — outside SAP’s core install base — against Oracle Exadata and IBM engineered systems, depends to some extent on how these two opposing concepts will play out,’ said Woodward.”
So, x86 or engineered, take your pick. If you are considering HANA, though, the write up notes that you should make sure it will do what you want before buying the pricey software. It will not, for example, make up for poor data quality. It is also more likely to justify its cost and effort where business requirements change frequently than in an organization with a more static environment.
Cynthia Murrell, May 25, 2012
Sponsored by PolySpot
Talend Updates Open Studio Applications
May 19, 2012
Talend’s Open Studio platform offers more business intelligence and big data with an enhancement: master data management. The H Open describes the updates found in the most recent version in “Talend Updates Data Tools to 5.1.0.”
Based on open source Eclipse, the Open Studio environment hosts Talend’s Data Integration, Big Data, Data Quality, Master Data Management, and Enterprise Service Bus (ESB). A user-friendly GUI allows users to define processes. The write up specifies that the updates give Open Studio:
“. . .enhanced XML mapping and support for XML documents in its SOAP, JMS, File and Mom components. A new component has also been added to help manage Kerberos security. Open Studio for Data Quality has been enhanced with new ways to apply an analysis on multiple files, and the ability to drill down through business rules to see the invalid, as well as valid, records selected by the rules.
“ESB and Open Studio for ESB appear to be the most revised of the products, with the release notes documenting improvements to the REST and SOAP services, an improved route builder, and improvements to the runtime system . . . . Open Studio for Master Data Management has seen enhancements in the development environment, with searching and filtering available as ways to view an entity, and in the web user interface with improvements in visual cues, easier image storage and resizable sliding panels.”
Talend ESB and Big Data are under the Apache 2.0 License. Open Studio for ESB, Data Integration, Data Quality, and MDM are under the GPLv2.
Talend is a leading open source vendor, providing middleware for both data management and application integration. The company was already a leader in open source data management when its 2010 acquisition of Sopera boosted its standing in the open source middleware market. The company takes pride in providing powerful and flexible open solutions for all sorts of organizations, great and small.
Cynthia Murrell, May 19, 2012
Sponsored by PolySpot
MapReduce: A Summary
May 19, 2012
Want to know about MapReduce? Here you go:
Remember. Think batch processing.
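For readers who want something concrete, here is a minimal single-machine sketch of the MapReduce pattern in Python. It is a toy word count, not Hadoop or any vendor's implementation. The function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample sentences are our own inventions for illustration. The point is the shape of the job: the whole batch is mapped, grouped by key, and then reduced.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Run the whole batch: map every record, group by key, reduce each group.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input batch, invented for this example.
    batch = ["big data is big", "data is everywhere"]
    print(word_count(batch))
```

Nothing comes back until the entire batch has passed through all three phases, which is why batch processing is the right mental model.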
Stephen E Arnold, May 19, 2012
Sponsored by Ikanow
MarkLogic: The Door Revolves
May 17, 2012
MarkLogic hit $55 or $60 million. Not good enough. Exit one CEO; enter an Autodesk exec. Hit $100 million. Not good enough. Enter a new CEO. Navigate to “Former senior Oracle exec Gary Bloom named CEO of Mark Logic.” The new CEO is either going to grow the outfit or get it sold if I understand the write up. Here’s a passage which caught my attention:
Gary Bloom has been named CEO of Mark Logic, which returns him to his database roots.
According to MarkLogic’s Web page, the company is:
an enterprise software company powering over 500 of the world’s most critical Big Data Applications with the first operational database technology capable of handling any data, at any volume, in any structure.
However, I can download a search road map. Hmmm. I thought search was dead. Well, big data search is where the action is. MarkLogic is pushing forward with its XML data management system.
Stephen E Arnold, May 17, 2012
Sponsored by HighGainBlog
IBM Asserts Its i Technology Can Handle XML
May 9, 2012
IBM asserts that DB2 can do big data, including XML, in IBM Systems Magazine’s “i Can Use XML in a Relational World.” Blogger and IBM employee Nick Lawrence writes:
“In this most recent round of announcements, IBM has included support for the XMLTABLE table function in SQL. XMLTABLE is designed to convert an XML document into a relational result set (rows and columns) using popular XPath expressions. This function has been referred to as the Swiss army knife for working with XML because it can help solve a wide variety of XML related problems.”
Lawrence recommends a good XMLTABLE tutorial, located in the SQL XML Reference in IBM’s Info Center. He also identifies and elaborates upon areas that he says could use some more clarification. One example is a way to create an XML response document that involves creating the document “inside out.” I guess that’s a technical term?
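To give a feel for what turning an XML document into rows and columns means, here is a rough Python analogy built on the standard library's xml.etree.ElementTree. It does not use DB2 or the actual XMLTABLE function, and ElementTree supports only a limited subset of XPath. The sample document, the helper name xml_to_rows, and the column mapping are all hypothetical, made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are invented for illustration.
ORDERS_XML = """
<orders>
  <order id="1"><customer>Acme</customer><total>250.00</total></order>
  <order id="2"><customer>Globex</customer><total>99.50</total></order>
</orders>
"""


def xml_to_rows(xml_text, row_path, columns):
    """Return one dict per element matched by row_path.

    columns maps a column name to a path relative to the row element.
    A path starting with '@' reads an attribute instead of child text.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for node in root.findall(row_path):
        row = {}
        for name, path in columns.items():
            if path.startswith("@"):
                row[name] = node.get(path[1:])
            else:
                row[name] = node.findtext(path)
        rows.append(row)
    return rows


if __name__ == "__main__":
    columns = {"id": "@id", "customer": "customer", "total": "total"}
    for row in xml_to_rows(ORDERS_XML, "./order", columns):
        print(row)
```

Each order element becomes one row, and each path in the mapping becomes one column. All values come back as strings; a real relational result set would also carry column types.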
It’s a helpful piece if that’s the route you want to travel. However, it involves lots of code and lots of fiddling. A bit like mining asteroids, we think.
Our question: Why not use a NoSQL data management system? After all, big data is what those do best.
Cynthia Murrell, May 9, 2012
Sponsored by PolySpot
Oracle and SAP: The Milagro Database War
May 3, 2012
I received an email inducing me to read “Hana and Exalytics: SAP’s Hype Versus Oracle’s FUD.” The write up takes a serious, or at least semi-serious, look at the Milagro database war. If you are not familiar with the Milagro Beanfield War, you might find the write up a loose allegory of what’s happening between traditional data management companies and the NoSQL farmers.

The Information Week write up does not talk about the real story, however. What we get is two giants of traditional enterprise software squabbling over which traditional data management system is most likely to keep the Fortune 1000, government agencies, and big educational institutions within the traditional enterprise software corral.
With regard to Oracle, the write up asserts:
Oracle’s Larry Ellison and Safra Catz have missed few opportunities to discredit Hana in recent months. But executive VP Thomas Kurian took the slams a level deeper on Friday with a one-hour Webinar clearly intended to sow seeds of fear, uncertainty and doubt in the minds of would-be Hana customers. The session was billed as an Exalytics seminar, but each point set up a contrast with Hana. Kurian claimed, among other things, that SAP’s product costs five times to 50 times more than Exalytics and that it doesn’t support SQL (relational) or MDX (multidimensional) query languages, requiring apps to be rewritten to run on the new database.
The Information Week write up reports:
SAP’s hype about these apps is getting a little ahead of deployed market reality. Both Hana and Oracle Exalytics can point to dramatic before-and-after differences in query speeds. (Even SAP grants that Exalytics can accelerate queries.) SAP says the real payoff from Hana will be in transforming business processes, not just accelerating queries. But we haven’t seen enough solid, real-world customer examples documenting transformed business competitiveness.
Datameer Has a New Analytics Toy
April 5, 2012
According to Marketwatch.com, Datameer, Inc, a provider of Apache Hadoop-based end user analytics solutions, announced the release of Datameer 1.4 in “Datameer Releases a Major New Version of Analytics Platform.” Datameer 1.4 improves functionality in data management and in user and data security, and it expands support for data source adaptors, Hadoop, Cloudera, and IBM. We learned:
“The new features in Datameer 1.4 demonstrate that Datameer is committed to delivering what customers want with an emphasis on quality and ease of use,” stated David Cornell, Software Development Manager at SophosLabs. “We are particularly excited to see support for partitioning which will dramatically enhance report generation performance.”
Datameer 1.4 was released to meet the growing demands of the company’s clients. As the only Apache Hadoop analytics solution, Datameer offers businesses linear scalability and cost-effectiveness for analyzing, integrating, and visualizing structured and unstructured data. Datameer is a company that relies on open source software and is working hard to make a name for itself in the business world.
The hook for this new release may be performance. Speed is becoming more important than fancy analytics.
Whitney Grace, April 5, 2012
Sponsored by Pandia.com
Publishers Pose Threats to Text Mining Expansion
March 26, 2012
Text mining software is all the rage these days due to its ability to make significant connections by quickly scanning through thousands of documents. This software can recognize, extract and index scientific information from vast amounts of plain text, allowing computers to read and organize a body of knowledge that is expanding too fast for any human to keep up with. However, Nature.com recently reported on some issues that have developed in this growing industry in the article “Trouble at the Text Mine.”
According to the article, text mining programmers Max Haeussler and Casey Bergman have run into trouble trying to get science publishers to agree to let them mine their content.
The article asserts:
Many publishers say that they will allow their subscribers to text-mine, subject to contract and the text-miners’ intentions, and point to a number of successful agreements. But like many early advocates of the technology, Haeussler and Bergman complain that publishers are failing to cope with requests, and so are holding up the progress of research. What is more, they point out, as text-mining expands, it will be impractical for individual academic teams to spend years each working out bilateral agreements with every publisher.
While some publishers are getting on board the text mining train, many are still trying to work out how to take advantage of the commercial value before signing on. Too bad it takes more than a degree in English to make text mining deliver useful results. Bummer.
Jasmine Ashton, March 26, 2012
Sponsored by Pandia.com
Big Data, Small Talent Pool
March 24, 2012
It may be big data’s biggest issue. Government Computer News asks, “Big Data’s Big Question: Where Are the Data Scientists?” Writer Rutrell Yasin explains:
“Even as organizations are trying to define the role of those tasked with analyzing and managing the new phenomenon of big data, people capable of that job are already projected to be in short supply.
“The move from a network-centric to a data-rich environment requires a different skill set, John Marshall, CTO of the Directorate of Intelligence J2 with the Joint Chiefs of Staff, said March 6 during a forum on big data. . . .
“A recent study reported that shortages of qualified workers who understand the power of big data is estimated to be between 140,000 and 190,000 people by 2018, Marshall said.”
Students are beginning to exit college with data analytics and data mining skills, but there may not be enough to fill the gap, especially in the public sector. There are professionals who have developed the required subject matter, math, and programming skills, but most of them are content to retain their lucrative jobs in Silicon Valley or New York.
The article does note that the broad term “data scientist” is akin to “doctor,” in that there are specialists within the field. Michael Lazar, a former intelligence community member who is now a senior solutions architect with VMware, recommends that public sector organizations internally train their people to meet their unique data analysis and management needs.
Though the article focuses on government organizations, it is a relevant read for anyone interested in big data. Also, it suggests a potentially lucrative field for young people looking to build a career in a difficult economy.
Stephen E. Arnold, March 24, 2012
Sponsored by Pandia.com


