Commonsense Conclusions from Azure Consultant
June 26, 2014
An azure chip consultant explains that enterprise search is just what the doctor ordered for big data; Gartner declares that “Enterprise Search Can Bring Big Data Within Reach.” Um, it seems that search is kind of implied in the term. What good are data and data analysis if there is no way to find and use the information? Actually, research director Darin Stewart is talking about gaining the benefits of big data without the cost and turmoil of “overhauling the data center,” which, apparently, some folks don’t understand is an option. He writes:
“Providing big data functionality without overhauling the data center is achievable with a two-pronged approach that combines augmented enterprise search and distributed computing. Search is very good at finding interesting things that are buried under mounds of information. Enterprise search can provide near-real-time access across a wide variety of content types that are managed in many otherwise siloed systems. It can also provide a flexible and intuitive interface for exploring that information. The weakness of a search-driven approach is the fact that a search application is only as good as the indexes upon which it is built….
“Distributed computing frameworks provide the environment necessary to create these indexes. They are particularly well-suited to efficiently collect extremely large volumes of unprocessed, individually low-value pieces of information and apply the complex analytics and operations that are necessary to transform them into a coherent and cohesive high-value collection. The ability to process numerous source records and apply multiple transformations in parallel dramatically reduces the time that is required to produce augmented indexes across large pools of information.”
The article goes on to point out that cloud-friendly open-source tools to support such a framework are readily available. Stewart shares a link to his Gartner document on the topic (registration required), and is scheduled to speak about it at Gartner’s Catalyst Europe conference, held in London in mid-June.
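For readers who want to see the two prongs in miniature, here is a rough Python sketch. It is not anything Stewart or Gartner published. A local thread pool stands in for the distributed computing framework, and the three records and the names `DOCUMENTS`, `map_document`, `build_index`, and `search` are all invented for illustration. The parallel step builds the index; the search step is only as good as what that index contains.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import re

# Hypothetical records pulled from otherwise siloed systems (invented data).
DOCUMENTS = {
    "crm-17": "Customer reports late shipment of pump parts",
    "wiki-4": "Pump maintenance schedule and parts list",
    "mail-92": "Shipment delayed at customs, customer notified",
}


def map_document(item):
    """Map step: turn one raw record into (term, doc_id) pairs."""
    doc_id, text = item
    terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [(term, doc_id) for term in terms]


def build_index(documents):
    """Run the map step in parallel, then reduce the pairs into an inverted index."""
    index = defaultdict(set)
    with ThreadPoolExecutor() as pool:
        for pairs in pool.map(map_document, documents.items()):
            for term, doc_id in pairs:  # reduce step: group doc ids by term
                index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


index = build_index(DOCUMENTS)
print(sorted(search(index, "pump parts")))
```

In a real deployment the thread pool would give way to a cluster framework and the dictionary to a search engine's index, but the division of labor is the same.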
Cynthia Murrell, June 26, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
Big Data for Enterprise Logistics
June 26, 2014
The complex field of logistics and transport management is one that can surely benefit from data analysis. Inbound Logistics brings the benefits to the attention of its readers in, “Big Data Tools Enable Predictive and Prescriptive Analytics.” Writer Shannon Vaillancourt advises that, since the cost of implementing data systems has decreased, now is the time for companies to leverage these tools to understand and adjust their transportation patterns. He writes:
“By leveraging the big data tools that are becoming more prevalent, companies can quickly spot trends that would otherwise have gone unnoticed. Many people are under the impression that big data only refers to a large amount of data. The second definition of big data is that the dataset is too difficult to process using traditional data processing applications. When it comes to supply chain operations, many large companies are still dependent on using a spreadsheet to manage a very complex global part of the business.
“With big data tools, shippers can move past the business intelligence side of measuring and diagnosing, and move into the predictive and prescriptive side. A big data tool will allow transportation teams to have fewer experienced supply chain staff members, because the data will be more actionable.”
Vaillancourt seems to acknowledge the shortfalls of current prescriptive algorithms; he reassures readers that the prescriptive side will be more useful as the technology evolves. Right now, we know it as the algorithm that tells us to buy more stuff at Amazon. Someday soon, though, it might accurately tell a manager which means of transport will most efficiently get a certain shipment to its destination.
It is interesting to watch as the big data trend spreads into different industries. As the hype fades, more of the truly useful applications will become clear.
Cynthia Murrell, June 26, 2014
Visage Dubbed the Latest in Visualization Software
June 11, 2014
The big data boom has made it vital for many companies to quickly and easily translate data points into pretty pictures. Now, FastCompany reports on the latest tool to help with that in, “A Tool for Building Beautiful Data Visualizations.” Of course, there are many programs that supply custom graphics templates for data visualization, and writer Margaret Rhodes links to an article that describes 30 of them. What makes this latest tool, called Visage, different enough to entice designers at prominent sites like Mashable, MSNBC, and A&E? It’s all about the flexibility, especially in branding. Rhodes explains:
That on-brand bit is where Visage shines. “There’s a spectrum of how people define ‘on brand,’” [Visage co-founder Jake] Burkett tells Co.Design. “What’s sufficient for most people are fonts and color palettes.” Visage’s tool has what Burkett calls “canned logos and color palettes,” but they’ll also produce templates in customized color selections–free of charge. “Give us your brand guidelines; it just takes us a little time,” Burkett says.
For taller orders, the Visage team will build a more sophisticated set of tools to channel a company’s visual language. They’ll be available only for the company in question, and even be made as open-source templates, so designers on staff can tweak them on the go. “Everything is designed to be dynamic,” Burkett says.
The Visage platform was launched in 2012 by Column Five Media. Founded in 2009, the design and branding company is based in Newport Beach, California, and maintains an office in Brooklyn, New York City. They also happen to be hiring as of this writing.
Cynthia Murrell, June 11, 2014
Palantir Advises More Abstraction for Less Frustration
June 10, 2014
At this year’s Gigaom Structure Data conference, Palantir’s Ari Gesher offered an apt parallel for the data field’s current growing pains: using computers before the dawn of operating systems. Gigaom summarizes his explanation in, “Palantir: Big Data Needs to Get Even More Abstract(ions).” Writer Tom Krazit tells us:
“Gesher took attendees on a bit of a computer history lesson, recalling how computers once required their users to manually reconfigure the machine each time they wanted to run a new program. This took a fair amount of time and effort: ‘if you wanted to use a computer to solve a problem, most of the effort went into organizing the pieces of hardware instead of doing what you wanted to do.’
“Operating systems brought abstraction, or a way to separate the busy work from the higher-level duties assigned to the computer. This is the foundation of modern computing, but it’s not widely used in the practice of data science.
“In other words, the current state of data science is like ‘yak shaving,’ a techie meme for a situation in which a bunch of tedious tasks that appear pointless actually solve a greater problem. ‘We need operating system abstractions for data problems,’ Gesher said.”
An operating system for data analysis? That’s one way to look at it, I suppose. The article invites us to click through to a video of the session, but as of this writing it is not functioning. Perhaps they will heed the request of one commenter and fix it soon.
Based in Palo Alto, California, Palantir focuses on improving the methods their customers use to analyze data. The company was founded in 2004 by some folks from PayPal and from Stanford University. The write-up makes a point of noting that Palantir is “notoriously secretive” and that part(s) of the U.S. government can be found among its clients. I’m not exactly sure, though, how that ties into Gesher’s observations. Does Krazit suspect it is the federal government calling for better organization and a simplified user experience? Now, that would be interesting.
Cynthia Murrell, June 10, 2014
Software AG Happy About JackBe
May 30, 2014
Business Wire via Sys Con has some great news: “Software AG’s Acquisition Of JackBe Recognized As Strategic M&A Deal Of The Year.” Software AG is a big data, integration, and business process technologies firm driven to help companies achieve their desired outcomes. JackBe, a provider of real-time visual analytics and intelligence software, will be the foundation for Software AG’s new Intelligent Business Operations Platform. The acquisition even garnered attention from the Association for Corporate Growth and was recognized as the Strategic M&A deal of the year in the $100 million category.
JackBe will allow Software AG to offer its clients a broader range of enterprise functions in real time, especially in areas related to the Internet of Things and customer experience management.
“The real-time analysis and visualization of massive amounts of data is increasingly becoming the basis for fast and intelligent business decisions. With the capabilities of JackBe integrated in its Intelligent Business Operations platform, Software AG has been able to provide customers with a comprehensive 360-degree view of operational processes by combining live, historical and transactional data with machine-to-machine communications.”
Purchasing JackBe was one of the largest big data deals in 2013 and it also proves that technology used by the US government can be turned into a viable commercial industry.
Software AG definitely has big plans for 2014. Will they continue to make headlines this year?
Whitney Grace, May 30, 2014
Centrifuge Says It Offers More Insights
May 29, 2014
According to a press release from Virtual Strategy, Centrifuge Systems, a company that develops big data software, has created four new data connectors within its visual link analysis software. “Centrifuge Expands Their Big Data Discovery Integration Footprint” explains that, with the additional connectors, users of the software will be able to make better business decisions.
“ ‘Without the ability to connect disparate data – the potential for meaningful insight and actionable business decisions is limited,’ says Stan Dushko, Chief Product Officer at Centrifuge Systems. ‘It’s like driving your car with a blindfold on. We all take the same route to the office every day, but wouldn’t it be nice to know that today there was an accident and we had the option to consider an alternate path.’ ”
The new connectors offer real-time access to ANX file structure, JSON, LDAP, and Apache Hadoop with Cloudera Impala. Centrifuge’s entire goal is to add more data points that give users a broader and more detailed perspective of their data. Centrifuge likes to think of itself as the business intelligence tool of the future. Other companies, though, offer similar functions with their software. What makes Centrifuge different from the competition?
Whitney Grace, May 29, 2014
MapR Integrates Elasticsearch into Platform
May 7, 2014
Writer Christopher Tozzi opens his Var Guy article, “MapR, Elasticsearch Partner on Open Source Big Data Search,” with a good question: With so many Hadoop distributions out there, what makes one stand out? MapR hopes an integration with Elasticsearch will help them with that. The move brings to MapR, as the companies put it, “a scalable, distributed architecture to quickly perform search and discovery across tremendous amounts of information.” They report that several high-profile clients are already using the integrated platform.
Tozzi concludes with an interesting observation:
“From the channel perspective, the most important part of this story is about the open source Hadoop Big Data world becoming an even more diverse ecosystem where solutions depend on collaboration between a variety of independent parties. Companies such as MapR have been repackaging the core Hadoop code and distributing it in value-added, enterprise-ready form for some time, but Elasticsearch integration into MapR is a sign that Hadoop distributions also need to incorporate other open source Big Data technologies, which they do not build themselves, to maximize usability for the enterprise.”
It will be interesting to see how that need plays out throughout the field. MapR is headquartered in San Jose, California, and was launched in 2009. Formed in 2012, Elasticsearch is based in Amsterdam. Both Hadoop-happy companies maintain offices around the world, and each proudly counts some hefty organizations among its customers.
Cynthia Murrell, May 07, 2014
Big Data: Can the Latest Trend Deliver?
April 25, 2014
If you track Big Data, you will want to read “Why Big Data Is Stillborn (for Now).” The write up hits the highlights of the flickering hyperbole machine that sells fancy math to the government and organizations desperate for a Silver Bullet.
The article asserts:
Most “big data” has to be moved in physical containers. Most data centers do not have excess capacity to handle petabyte level simultaneous search and pattern discovery.
Believe in real time and high speed access? Consider this statement:
Bandwidth, throughput, and how “real time” is defined all come down to the weak link in the chain and we have many ***very weak*** links across the chain and especially in Washington, D.C. The bottom line is always “who benefits?” The FCC decision to destroy net neutrality is in error. The citizen, not the corporation, is “root” in a Smart Nation.
If you wonder why your Big Data investments have yet to deliver a golden goose pumping out 24-karat eggs every day, check out this write up. Worth reading.
Stephen E Arnold, April 25, 2014
Caution Advised on Big Data
April 25, 2014
Someone is once again raining on the big data parade, urging us to consider carefully before jumping on the bandwagon. FT Magazine warns, “Big Data: Are We Making a Big Mistake?” Writer Tim Harford points to Google’s much-lauded Google Flu Trends as an emblematic example in the field. That project notes an increase in certain search terms, like “flu symptoms” or “pharmacies near me”, by point of origin. With those data points, its algorithm extrapolates the spread of the disease. In fact, it does so with only one day’s delay, compared to a week or more for the CDC’s analysis based on doctors’ reports.
The thing is, this successful project is also an example of the blind faith many are putting into the results of data analysis. The scientists behind it aren’t afraid to admit they don’t know which search terms are most fruitful or how, exactly, its algorithm is constructing its correlations—it’s all about the results. Correlation over causation, as Harford puts it. However, Google Flu Trends hit a speed bump in 2012: it greatly over-estimated the flu’s spread, unnecessarily alarming the public. Correlation is much, much easier to determine than causation, but we must not let ourselves believe it is just as good.
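To make the correlation-versus-causation worry concrete, here is a toy Python sketch. It is emphatically not Google's algorithm, which the article says even its builders cannot fully explain. The weekly figures are invented, the names `pearson` and `fit_line` are hypothetical, and the "scare week" is an assumed scenario in which searches jump for reasons unrelated to actual illness.

```python
from math import sqrt

# Invented weekly figures: searches for flu-related terms and reported cases.
searches = [120, 150, 200, 260, 310]
cases = [14, 15, 22, 25, 32]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


r = pearson(searches, cases)
slope, intercept = fit_line(searches, cases)

# Assumed scenario: searches spike in one week while true cases barely move.
scare_week_searches = 900
actual_cases = 35  # invented for illustration
predicted_cases = slope * scare_week_searches + intercept

print(f"correlation on past weeks: {r:.2f}")
print(f"predicted cases: {predicted_cases:.0f}, actual cases: {actual_cases}")
```

The past weeks correlate strongly, so the fitted line looks trustworthy right up until the relationship between searching and sickness changes. Nothing in the correlation itself warns that this has happened.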
The article cautions:
“Cheerleaders for big data have made four exciting claims, each one reflected in the success of Google Flu Trends: that data analysis produces uncannily accurate results; that every single data point can be captured, making old statistical sampling techniques obsolete; that it is passé to fret about what causes what, because statistical correlation tells us what we need to know; and that scientific or statistical models aren’t needed because, to quote ‘The End of Theory’, a provocative essay published in Wired in 2008, ‘with enough data, the numbers speak for themselves’.
“Unfortunately, these four articles of faith are at best optimistic oversimplifications. At worst, according to David Spiegelhalter, Winton Professor of the Public Understanding of Risk at Cambridge university, they can be ‘complete bollocks. Absolute nonsense.’”
Another quote from Spiegelhalter summarizes the problem with letting ourselves be seduced by big data’s promise of certainty: “There are a lot of small data problems that occur in big data. They don’t disappear because you’ve got lots of the stuff. They get worse.” The article goes on to discuss in detail the statistical flaws behind big data’s promises. It is an important read for anyone facing the alluring shimmer of the big data trend.
Cynthia Murrell, April 25, 2014
Hadoop Bridges Gap between IT and Business
April 24, 2014
The IT side of the coin and the business side of the coin never really seem to be looking in the same direction, do they? Which is a shame, because so much productivity is lost over this battle at work. Thankfully, we are not the only ones thinking so. Some even have solutions, as we found in a recent Information Week story, “Big Data Forces IT and Business to get Synced.”
The answer, according to the story:
Hadoop, the foundation of HGST’s BDP, is particularly well suited to breaking through data silos. Traditional relational databases store their data in well-defined table structures, and therefore require detailed data modeling before a single row of data can be loaded. Hadoop, on the other hand, simply stores its data as files on its distributed file system, greatly streamlining the data loading process.
Hadoop does seem to be a solid answer in our book, too. The Information Week story is one piece of evidence; another is a recent InfoWorld story about Lucidworks teaming with Hadoop to bridge that gap even more. We like what we are seeing and have no doubt the business and IT worlds will mesh if they keep thinking like this.
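The contrast the story draws between modeling first and loading first can be sketched in a few lines of Python. This is an analogy only and uses no Hadoop at all: an in-memory SQLite table plays the relational database, an in-memory text buffer of JSON lines plays the files on a distributed file system, and the drive readings are invented.

```python
import io
import json
import sqlite3

# Invented sensor records; the second one carries a field nobody planned for.
records = [
    {"drive_id": "A1", "temp_c": 41},
    {"drive_id": "B7", "temp_c": 39, "vibration": 0.02},
]

# Relational approach: the table structure must be defined before any row loads.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE readings (drive_id TEXT, temp_c INTEGER)")
db.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(rec["drive_id"], rec["temp_c"]) for rec in records],
)
# The unplanned "vibration" field is dropped, since the table has no column for it.
table_columns = [row[1] for row in db.execute("PRAGMA table_info(readings)")]

# File-based approach: store each record as-is and decide on structure when reading.
raw_store = io.StringIO()  # stand-in for a file on a distributed file system
for rec in records:
    raw_store.write(json.dumps(rec) + "\n")

raw_store.seek(0)
vibrations = [
    rec["vibration"] for rec in map(json.loads, raw_store) if "vibration" in rec
]

print(table_columns)
print(vibrations)
```

The file-based path keeps the unexpected field available for later analysis, at the cost of pushing the interpretation work onto whoever reads the data.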
Patrick Roland, April 24, 2014