Aster Data
An Interview with Quentin Gallivan
In 2005, when I wrote The Google Legacy, I worked through an interesting Google paper about what is now known as MapReduce. That paper signaled greater awareness of what I described as "next-generation data management systems." By the time I finished Google: The Digital Gutenberg in 2009, a number of companies had rolled out products that used technology to push the boundaries of data management, analysis, and user interaction. Google was not the only trigger, but the visibility of the company has put it in a position of being perceived as a trigger. I learned about Aster Data when my Overflight information systems kicked out references to the company. I knew little about it when I first encountered the name of the company in 2005, just as I was wrapping up work on The Google Legacy. At the time, I plugged the company name into Overflight and checked on the information collected every quarter or so.
By 2009, Aster Data was getting traction. The firm rolled out its Massively Parallel Data Application Server and captured some interesting clients. Among those I noted were LinkedIn and Akamai. I also learned that its backers included Sequoia Capital, Institutional Venture Partners, and a handful of other highly regarded entities. I describe investors like Aster Data's as "smart money".
Earlier this week, I was able to meet up with Quentin Gallivan, Aster Data's new CEO, for a fruit juice at the Jamba Juice in San Carlos, Calif. The full text of my interview with Mr. Gallivan appears below:
Aster Data continues to disrupt the somewhat staid world of data management. In fact, the World Economic Forum singled out Aster Data as a "pioneer". What's the cause of this attention and "magnetism"?
It has definitely been a staid market for some time now with not much having changed over the last three decades since the emergence of the early relational databases. However, with the continuing explosion of data across organizations, the pre-existing $20 billion market is being uprooted.
Data in enterprise organizations is growing at a rate of 60% per year. Internet organizations are experiencing 100% data growth per year. One terabyte used to be considered a lot five years ago. Now 60 percent of companies have over 10 terabytes of data to manage and many companies have hundreds of terabytes to petabytes.
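As a rough illustration of what those growth rates imply, the short Python sketch below compounds a starting data volume at 60 percent and 100 percent per year. The 10-terabyte starting point and the five-year horizon are assumptions chosen for the example, not figures from Aster Data.

```python
def projected_terabytes(start_tb: float, annual_growth: float, years: int) -> float:
    """Compound a starting volume at a fixed annual growth rate."""
    return start_tb * (1.0 + annual_growth) ** years


START_TB = 10.0  # hypothetical starting volume, in terabytes
YEARS = 5        # hypothetical planning horizon

for label, rate in (("enterprise, 60% per year", 0.60),
                    ("internet, 100% per year", 1.00)):
    sizes = [round(projected_terabytes(START_TB, rate, y), 1)
             for y in range(YEARS + 1)]
    print(label, sizes)
```

Under these assumptions, a 10-terabyte store grows to roughly 105 terabytes in five years at 60 percent annual growth, and to 320 terabytes at 100 percent.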
So where does Aster Data fit?
Aster Data is breaking new ground in big data management and processing, as recognized by the World Economic Forum and others. We’ve brought to market a new platform for big data management and advanced processing of data that provides a fundamental shift in how data will be stored and processed over the next decade. We tap into the existing $20 billion market opportunity and open a new $7 billion market opportunity. Take these numbers together and solutions like ours draw a lot of attention.
I know you recently joined Aster Data as its chief executive officer. What experience do you bring with you?
Before Aster Data I was CEO of Pivotlink, a SaaS BI provider, and prior to that CEO of Postini, which was acquired by Google in 2007. Both were in the business of mining a lot of data and were high-growth companies. Managing fast growth, scaling quickly, and working with solutions that center on ‘data’ has been my focus and passion for the last six years. I bring this to Aster Data as we accelerate our momentum in this new multi-billion dollar market. I am looking forward to expanding our market reach as we scale out.
What is the core of the technology that Aster Data uses in its products/services?
That's one of my favorite topics. Aster Data's core technology is a massively parallel database that runs on commodity hardware and uniquely integrates within it a complete analytics engine. The system is called Aster Data nCluster. The market shift I mentioned earlier has created an opportunity from the explosive data volumes companies are struggling with. Big data requires moving ‘processing’ to the ‘data’ versus moving data to a separate processing tier.
Our technology does this in a unique way. Now a client can house all of its data and deep analytical processing in a single system. The result is very cost-effective storage of massive data volumes and very fast, rich processing of all your data. This has become critical to organizations as they rely heavily on deep data insights and information to improve their business.
Does Aster Data create an island or a silo of data?
Oh, no. We interoperate with other data solutions where data flows between those systems and our platform. However, the bulk of the data is increasingly being stored in the Aster Data platform and all the processing and analytics takes place in our system.
We refer to the platform as a ‘Data-Analytics Server’ – it serves frontline analytic applications. A great example is comScore...
That's the big Web analytics outfit, right?
Right, right. comScore stores hundreds of terabytes in Aster and runs their critical analytic applications on the Aster Data system. Their dozens of business analysts use the system daily to provide deep insights to more than 1,300 clients who buy its data about what people are looking at on the Web. Its flagship product is Media Metrix, which is used by web publishers and agencies to optimize their media buying and planning strategies according to the size and composition of Web site audiences.
How does Aster Data's product/service approach differentiate the firm from its competitors in the data management space?
Aster Data’s solution is unique in that it allows complete processing of analytic applications ‘inside’ the Aster Data MPP database. This means you can now store all your data inside of Aster Data’s MPP database that runs on commodity hardware and deliver richer analytic applications that are core to improving business insights and providing more intelligence on your business. To enable richer analytic applications we offer both SQL and MapReduce.
I think you know that MapReduce was first created by Google and provides a rich parallel processing framework. We run MapReduce in-database but expose it to analysts via a SQL-MapReduce interface. The combination of our MPP DBMS and in-database MapReduce makes it possible to analyze and process massive volumes of data very fast.
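For readers who have not worked with MapReduce, here is a minimal single-process Python sketch of the programming model: a map step emits key/value pairs, the pairs are grouped by key, and a reduce step aggregates each group. This is not Aster Data's SQL-MapReduce interface or Google's implementation; the function names and the sample click records are hypothetical, and a real system would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (user, site) -- hypothetical click record


def map_clicks(record: Record) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (site, 1) pair per click record."""
    _user, site = record
    yield site, 1


def reduce_counts(site: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce step: total the counts emitted for one site."""
    return site, sum(counts)


def run_mapreduce(records: Iterable[Record],
                  mapper: Callable, reducer: Callable) -> dict:
    """Run map, group by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())


clicks = [("ann", "news.example"), ("bob", "shop.example"), ("cat", "news.example")]
page_views = run_mapreduce(clicks, map_clicks, reduce_counts)
print(page_views)
```

Because each map call touches only one record and each reduce call touches only one key, the work can be split across machines, which is the property that makes the model attractive for large data volumes.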
Some of Google's insights are now open source. What's special about your method and system?
Aster Data’s solution is unique in several areas. First, we scale linearly on commodity hardware to keep hardware costs low as you store more data in the system and scale from tens to hundreds of terabytes and on to petabytes.
Second, the integration of an analytics engine inside the system allows all analytics processing and procedural code to be co-located with the data, which makes it very effective and fast to analyze tens to hundreds of terabytes of data.
And third is the technique Aster Data uses to process data, which is via a technology first created by Google called MapReduce. Aster Data has implemented the MapReduce processing framework popularized by Google inside the nCluster system which allows for very rich analytics.
But did this deliver what you wanted?
Well, in 2008 we uniquely coupled MapReduce with SQL to introduce the patent-pending ‘SQL-MapReduce’ framework that makes it easy for any organization to leverage the power of MapReduce. All they need is their existing SQL skills, and they can get started building rich analytic applications without having to learn the ins and outs of MapReduce or parallel programming.
Everyone from business intelligence giants like IBM Cognos SPSS to upstarts like Megaputer in Bloomington, Indiana, are touting their analytics. What's your approach?
Aster Data’s analytics is powered by the MapReduce processing framework, which was first introduced by Google in 2004. Aster Data, however, took this one step further by coupling SQL with MapReduce and brought to market the patent-pending SQL-MapReduce analytics technology and in-database MapReduce, so any organization with SQL skill sets can now leverage the power of MapReduce.
With our solution for big data analytics, customers can now run rich ad hoc queries, do deep data exploration on massive data volumes, extend predictive analytics, and uncover new insights in their data. Essentially, they can build a new class of interactive analytic applications. The class of analytics enabled by the Aster Data solution spans customer intelligence and customer behavior applications, click stream analysis, graph analysis, event-triggered analysis such as analyzing a fraud event or marketing promotional event, systems analysis, and a whole range of other analytics that require analyzing massive data sets to better understand the business, customer behavior patterns, events, and future trends.
Will you give me a use case and explain the upside that your firm enabled for your client?
I am sure you understand that I cannot reveal too many details about our clients' use of our technology. I can say, however, that Aster Data gives clients like Barnes & Noble the ability to elegantly scale and handle more sophisticated analysis of larger data volumes than they could get with the “big boys”.
You mean "big boys" like IBM, Microsoft, and Oracle?
Well, you named these companies. Let me get back to my example. Barnes & Noble has consolidated multiple Oracle warehouses into the Aster Data platform, and is using our in-database MapReduce analytics techniques and SQL-MapReduce to better understand cross-channel buying patterns.
Aster Data nCluster has given Barnes & Noble not only the ability to easily scale to handle dozens of terabytes of data, but to get deeper behavioral insights into fine-grained, cross-channel data which gives them a better sense of marketing attribution (what promotions through various channels led to a desired outcome, such as a sale).
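To make the idea of marketing attribution concrete, the following Python sketch applies a simple last-touch rule: each sale is credited to the most recent promotion the same customer saw beforehand, on any channel. The interview does not say which attribution method Barnes & Noble uses; last-touch is swapped in here only as a familiar example, and the event records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cross-channel events:
# (customer_id, timestamp, channel, event_type, label)
events = [
    ("c1", 1, "email", "promotion", "spring-coupon"),
    ("c1", 2, "web", "promotion", "banner-ad"),
    ("c1", 3, "store", "sale", "paperback"),
    ("c2", 1, "web", "promotion", "banner-ad"),
    ("c2", 2, "web", "sale", "e-book"),
    ("c3", 1, "store", "sale", "magazine"),
]


def last_touch_attribution(events):
    """Credit each sale to the customer's most recent prior promotion."""
    # Partition the events by customer, as a parallel system would.
    by_customer = defaultdict(list)
    for event in events:
        by_customer[event[0]].append(event)

    credit = defaultdict(int)
    for history in by_customer.values():
        last_promo = None
        # Walk each customer's events in time order.
        for _cust, _ts, channel, kind, label in sorted(history, key=lambda e: e[1]):
            if kind == "promotion":
                last_promo = (channel, label)
            elif kind == "sale":
                credit[last_promo or ("none", "unattributed")] += 1
                last_promo = None  # a promotion is credited at most once
    return dict(credit)


attribution = last_touch_attribution(events)
print(attribution)
```

The calculation only works when web, store, and e-mail events for the same customer sit in one place, which is the point of consolidating the separate warehouses.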
Using Aster Data’s SQL-MapReduce framework they are able to run sophisticated analytic algorithms fully in-database where they execute in an MPP fashion right where the data resides, increasing the speed at which they can gain insights from the data, as well as offering an additional depth of analysis and business insight. A recent article in InformationWeek details the business impact Aster Data will have for Barnes & Noble.
You can see what Barnes & Noble says.
- “Bookseller Barnes & Noble has ‘dozens and dozens’ of terabytes, says Marc Parrish, VP of retention and loyalty marketing, and until recently that data was spread across nine Oracle data warehouses. One warehouse handled point-of-sale data from 730 retail stores. Another handled 630 college bookstores. Another handled the Web site.”
And so on.
Yet one of the single most important insights Barnes & Noble needs, as e-books and e-readers take off, is how readers interact across those channels. It's no accident that its new CEO, William Lynch, ran the retailer's Web site before taking the helm.
Does your technology fit into the Nook e-reader service?
Yes. Shortly after Barnes & Noble entered the e-reader market last year, with the Nook device and iPhone and Android e-readers, its executives backed an investment in a consolidated enterprise data warehouse. The company finished migrating to an Aster Data nCluster database, running on commodity MPP hardware, this spring and started using it for analysis within the last month. Parrish says Barnes & Noble already is doing a better job of cross-channel analysis, which was next to impossible with silos. Look at this passage of the InformationWeek article:
- 'Before, when somebody visited us online, we only knew about their online purchases,' he says. 'Now that all our data is in one place, we can understand their interactions across our entire ecosystem.' Barnes & Noble gets better understanding of customer reading interests, as well as insight into the dynamics among e-reading, online activity, use of in-store cafes, and store purchases.
I think Google is shifting from MapReduce. What technological innovations does Aster Data have to keep your clients in the forefront of technology and deal with such challenges as performance, big data, and the demand for near real time updates?
Google’s recent opinion on MapReduce came from the realization that native MapReduce as originally specified had some shortcomings when it comes to very fast, interactive or iterative analytics. Aster Data addressed exactly this issue when we first implemented MapReduce. We recognized from day one that MapReduce can be augmented with rich functionality of a relational database that creates a unique implementation to allow for fast, interactive processing of data.
What we did was leverage the MapReduce fundamentals but implement them in a unique way by doing two things. First, we implemented a MapReduce data processing engine in-database, inside the Aster MPP DBMS. Second, we coupled MapReduce with SQL to bring SQL-MapReduce to market and ease adoption of MapReduce within organizations. It is important to note that MapReduce is both a framework/language and an implementation. To get the full power of MapReduce and parallel processing, a system requires a unique implementation of MapReduce, which is what Aster Data’s solution offers. With this innovation we enable a new class of interactive, high-performance analytics, including near real-time processing of data, that is essential for a variety of situations like fraud analysis.
In the last six months, data management has been getting what I call the Madison Avenue treatment. I hear more about IBM, Oracle, SAS, and SAP Business Objects than I have for years. What broad market forces are turning up the noise without improving the quality of the sound if I may use an audio metaphor?
Companies such as IBM, Oracle, SAS, Business Objects, SAP and others all realize the importance of the big data market and the opportunity it presents. Their solutions and footprint in an account expand substantially as the volume of data living in the Aster Data platform, or any big data platform, grows. These vendors, especially the business intelligence solutions and packaged analytic application providers, benefit from an underlying DBMS platform that can store and process larger volumes of data. So integrating with a platform such as Aster Data’s becomes critically important. Simply put, the more data in Aster Data’s solution, the more license sales for the BI and analytic application providers. The big data management phenomenon and new platforms such as Aster Data’s are creating more market opportunity for them.
As you look down the road, what are the three major challenges you see for vendors who keep trying to solve big data and other "now" problems with old tools?
Old tools and traditional architectures cannot scale effectively to handle massive data volumes that reach hundreds of terabytes, nor can they process large data volumes in a high-performance manner. Further, they are restricted to what SQL querying allows. The three challenges I have noted are:
First, performance, specifically, poor performance on large data volumes and heavy workloads: The pre-existing systems rely on storing data in a traditional DBMS or data warehouse and then extracting a sample of data to a separate processing tier. This greatly restricts data insights and analytics as only a sample of data is analyzed and understood. As more data is stored in these systems they suffer from performance degradation as more users try to access the system concurrently. Additionally moving masses of data out of the traditional DBMS to a separate processing tier adds latency and slows down analytics and response times. This pre-existing architecture greatly limits performance especially as data sizes grow.
Second, limited analytics: Pre-existing systems rely mostly on SQL for data querying and analysis. SQL poses several limitations and is not suited for ad hoc querying, deep data exploration and a range of other analytics. MapReduce overcomes the limitations of SQL and SQL-MapReduce in particular opens up a new class of analytics that cannot be achieved with SQL alone.
And, third, limitations of types of data that can be stored and analyzed: Traditional systems are not designed for non-relational or unstructured data. New solutions such as Aster Data’s are designed from the ground up to handle both relational and non-relational data. Organizations want to store and process a range of data types and do this in a single platform. New solutions allow for different data types to be handled in a single platform whereas pre-existing architectures and solutions are specialized around a single data type or format – this restricts the diversity of analytics that can be performed on these systems.
What new features have you added to the most recent Aster Data product/service offerings?
I am glad you asked me. Right before we sat down today [September 15, 2010], we announced Aster Data nCluster 4.6, which includes a column data store, making Aster Data nCluster the first platform with a unified SQL-MapReduce analytic framework on a hybrid row and column massively parallel processing (MPP) database management system (DBMS). The unified SQL-MapReduce analytic framework, together with our suite of more than 1,000 MapReduce-ready analytic functions, delivers a substantial breakthrough in richer, high-performance analytics on large data volumes where data can be stored in either a row or column format.
I thought I saw a reference to a free download. Is my memory serving me correctly?
Yes, it is. We recently announced Aster Data Developer Express, a free, downloadable MapReduce development environment which makes it easy for companies to develop rich analytic applications using SQL-MapReduce in less than one hour. In mid-August we pointed out that much of the MapReduce coding, including programming concepts like parallelization and distributed data analysis, is addressed by the IDE without the developer or analyst needing to have expertise in these areas.
This simplification makes it much easier for developers to be successful quickly and eliminates the need for them to have any deep knowledge of the MapReduce parallel processing framework. Google first published MapReduce in 2004 for parallel processing of big data sets. Aster Data has coupled SQL with MapReduce and brought SQL-MapReduce to market, making it significantly easier for any organization to leverage the power of MapReduce.
The Aster Developer Express IDE simplifies application development even further with an intuitive point-and-click development environment that speeds development of rich analytic applications. Applications can be validated locally on the desktop or ultimately within Aster Data nCluster, a massively parallel processing (MPP) database with a fully integrated analytics engine that is powered by MapReduce, known as a data-analytics server.
There is a lot of talk about commercial software companies putting licensees in handcuffs. What is your firm's approach to giving licensees the freedom each needs to solve particular business problems?
Our licensing model is very simple and liberating. We charge by dollars per terabyte ($/TB) of usable data in a system, not by storage capacity (“spinning disk”), number of users, or CPU cores like some of our competitors.
Our approach lets our customers easily scale the number of servers in their cluster to match their needs without being handcuffed into complex licensing schemes. Secondly, because we run on various lines of commodity hardware such as those from Dell and HP in addition to cloud platforms like Amazon Web Services, we give our customers more choice in how they can deploy Aster Data for their big data analytical needs.
What is Aster Data's product/service line up today? What do you anticipate offering or enhancing in 2011?
Today Aster Data offers the MPP database with an integrated analytics engine powered by SQL and MapReduce: the nCluster solution. Additionally, to accelerate analytics, Aster Data offers a visual development environment for rapid development of analytics applications (Aster Developer Express, which is downloadable from www.asterdata.com) and a suite of pre-packaged analytic functions that give analysts and developers a fast path to building rich analytic applications. Aster Data will continue to add more innovations to the nCluster platform and continue to deliver more pre-packaged MapReduce-ready analytics functions. Today Aster Data offers 1000+ MapReduce-ready functions and will continue to add to this suite to enable faster, richer development of analytic applications.
Without giving away any secrets, what do you see as the three major trends in data management in the next 12 months?
Another difficult question. I would have to say the three are: management of diverse data types in a single platform; different data formats in a single platform (for example, row and column stores); and richer analytical capabilities enabled by innovations such as MapReduce and other analytic breakthroughs. Exceptional performance on massive data volumes and high volumes of concurrent queries will also continue to be a big trend; this will be brought about by breakthroughs in hardware, technologies like solid-state disks, and innovations in the software infrastructure itself.
How does a person get more information about Aster Data?
Anyone can navigate to www.asterdata.com for detailed information on Aster Data’s solution and download the product.
ArnoldIT Comment
Aster Data is one of the next-generation data management firms that is putting pressure on traditional relational database vendors and traditional business intelligence vendors. This one-two punch positions Aster Data to capture some significant accounts. The economies possible from an Aster System range from a reduction in capital expenditures to eliminating certain enterprise systems. In today's financial climate, Aster Data is an example of a company that adds new capabilities and permits certain efficiencies. Aster Data is a company to watch.
Stephen E. Arnold, September 22, 2010