Is IBM Delivering a Speedier File System?
May 27, 2012
IBM’s General Parallel File System (GPFS) just got a makeover, but speed and synchronization took a hit in the process. The article “IBM Parks Parallel File System on Big Data’s Lawn” breaks down the pros and cons of GPFS 3.5.
On a positive note, this is a big-data-friendly system. GPFS has a multi-cluster synchronous replication feature that lets a central site be mirrored to remote sites, giving users continuous file access at the mirrored sites.
Clients lose some control with the new GPFS. Data access is available only at the lesser local network speed instead of at high speed. Users also can’t control the amount of data they take in from mirrored sites.
GPFS also places additional requirements on users, as the article notes:
“IBM expected GPFS customers to use flash storage with de-clustered RAID to hold its specific metadata.”
“GPFS is pretty much independent of what goes on below the physical storage.”
“GPFS 3.5 can also be run in a shared-nothing, Hadoop-style cluster and is POSIX-compliant, unlike Hadoop’s HFS. GPFS 3.5 is big-data capable and can deliver big insights from a big insight cluster. This release of GPFS does not, however, have any HFS import facility.”
One might view the overall convenience as a counterweight to these issues. However, when speed and synchronicity are necessary, GPFS’s efficiency is called into question. We like the parallel file system, but we have to wonder whether synchronization is a concern.
Jennifer Shockley, May 27, 2012
Comments
One Response to “Is IBM Delivering a Speedier File System?”
GPFS is one of IBM’s hidden crown jewels. Having worked in this area a while back, I can say it has battle-tested performance and scalability that have withstood many challengers from inside and outside IBM. Getting it into more hands because big data is trendy is a good thing.