Hadoop Officially a Big Deal for Big Data

June 26, 2012

Hadoop, our favorite batch-processing data management system, is now more important than ever, InfoWorld reveals in “Hadoop Becomes Critical Cog in the Big Data Machine.” The current version of Apache’s Hadoop has been adopted by more and more organizations with vast swaths of data to manage. Many users develop their own technologies to complement the Hadoop stack.

Writer Paul Krill details ways NASA, Twitter, Netflix, and Tagged use Hadoop technology, as well as challenges each has faced with the software. Recommended reading for anyone with Hadoop in their lives.

Regarding the upcoming version, the article cites Eric Baldeschwieler, CTO of Hortonworks, a company that has contributed to Hadoop. The write-up tells us:

“Hadoop 2.0 focuses on scale and innovation, with Yarn (next-generation MapReduce) and federation capabilities. Yarn will let users add their own compute models so that they do not have to stick to MapReduce. ‘We’re really looking forward to the community inventing many new ways of using Hadoop,’ Baldeschwieler says. Expected uses include real-time applications and machine-learning algorithms. Scalable, pluggable storage is planned also. Always-on capabilities in Version 2.0 will enable clusters with no downtime.”

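Notice that the next-generation MapReduce layer goes by the name Yarn; the entire layer has been rewritten so that MapReduce becomes just one of the compute models that can run on it. Expect Hadoop 2.0 to be generally available within the year.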

Cynthia Murrell, June 26, 2012

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.