Intel: What Business Is It In?

August 9, 2008

Intel’s push in cloud computing strikes me as a “me too” response to a customer rebellion that is brewing. Maintaining servers, struggling with heat and power consumption costs, and the mind-numbing wackiness of enterprise software fuel the shift. Intel, in search of more revenue, is looking for a two-fer.

Intel wants to grow its revenue, particularly in its semiconductor business, and it wants a bigger piece of the action in cloud computing. Can Intel perform this trick? The question is difficult to answer. Meanwhile, Intel seems to be probing other markets as well.

On August 8, 2008, Intel surprised me with the release of its Summary Statistics Library. You can read the Web log post by Dmitry Kabaev here. You can download the library here. An installation guide is also available from the download page. You can choose either the Linux or the Windows library. There are two low-key requests for your email, but as far as I could tell, I was able to suck down the libraries without registering. If you want to participate in Intel forums, you will have to cough up some information, but I register my dog, who seems quite happy to ignore his email.

The stats pack is part of Intel’s initiative. Intel wants to be a good open source citizen, and the library is an excellent way to let developers start mud wrestling with programming for massively parallel systems. Intel is upfront about this point, describing the library as “a set of algorithms for parallel processing of multi-dimensional datasets. It contains functions for initial analysis of raw data which allow investigating structure of datasets and get their basic characteristics, estimates, and internal dependencies.”

You can whack on data sets with:

  • Basic statistics. Algebraic and central moments up to 4th order, skewness, kurtosis, variation coefficient, quantiles and order statistics.
  • Estimation of Dependencies. Variance-covariance/correlation matrix, partial variance-covariance/correlation matrix, pooled/group variance-covariance/correlation matrix.
  • Data with Outliers. The Intel® Summary Statistics Library contains a tool for detection of outliers in a dataset. The library also allows computing robust estimates of the covariance matrix and mean in the presence of outliers.
  • Missing Values. Data which contains missing values can be effectively processed using modern algorithms implemented in the package.
  • Out-of-Memory Datasets. Many algorithms of the library support data that cannot fit into physical memory by processing huge data arrays in portions. Specifically, variance-covariance matrix estimators, algebraic and central moments, skewness, kurtosis, and variation coefficient can process a dataset in portions.
  • Various Data Storage Formats. The Intel Summary Statistics Library supports in-rows and in-columns storage formats for datasets, full and packed format for variance-covariance matrix.

The libraries support C and Fortran90/95.
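To make the “process a dataset in portions” idea concrete, here is a minimal Python sketch of the general technique: keep a running count, mean, and central-moment sums (M2, M3, M4), fold in one chunk at a time, and derive variance, skewness, kurtosis, and variation coefficient at the end. This is not Intel’s C or Fortran API; the names `RunningMoments` and `read_in_portions` are hypothetical, and the sketch assumes population (divide-by-n) definitions and excess kurtosis, which may differ from the conventions Intel’s library uses.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class RunningMoments:
    """Hypothetical streaming state: count, mean, and central-moment sums.

    m2 = sum((x - mean)**2), m3 = sum((x - mean)**3), m4 = sum((x - mean)**4).
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0
    m3: float = 0.0
    m4: float = 0.0

    def add_portion(self, portion: Iterable[float]) -> None:
        """Fold one portion (chunk) of observations into the running state."""
        for x in portion:
            n1 = self.n
            self.n += 1
            n = self.n
            delta = x - self.mean
            dn = delta / n
            dn2 = dn * dn
            term1 = delta * dn * n1
            self.mean += dn
            # Update m4 and m3 before m2: m4 needs the old m2 and m3, m3 the old m2.
            self.m4 += term1 * dn2 * (n * n - 3 * n + 3) + 6 * dn2 * self.m2 - 4 * dn * self.m3
            self.m3 += term1 * dn * (n - 2) - 3 * dn * self.m2
            self.m2 += term1

    def variance(self) -> float:
        """Population variance (assumption: divides by n, not n - 1)."""
        return self.m2 / self.n

    def skewness(self) -> float:
        """Population skewness g1 = sqrt(n) * M3 / M2**1.5."""
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def excess_kurtosis(self) -> float:
        """Excess kurtosis g2 = n * M4 / M2**2 - 3."""
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0

    def variation_coefficient(self) -> float:
        """Population standard deviation divided by the mean."""
        return math.sqrt(self.variance()) / self.mean


def read_in_portions(data: List[float], size: int) -> Iterator[List[float]]:
    """Stand-in for reading a large file chunk by chunk."""
    for start in range(0, len(data), size):
        yield data[start:start + size]


stats = RunningMoments()
for chunk in read_in_portions([2, 4, 4, 4, 5, 5, 7, 9], size=3):
    stats.add_portion(chunk)

print(stats.mean, stats.variance(), stats.skewness(),
      stats.excess_kurtosis(), stats.variation_coefficient())
```

Only the current chunk has to be in memory, which is the point of the out-of-memory feature. For this sample the exact values are mean 5, population variance 4, skewness 0.65625, excess kurtosis -0.21875, and variation coefficient 0.4. Combining states computed on separate cores would need an additional pairwise merge formula, which this sketch omits.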

Intel has invested in Endeca, and I don’t think this is a casual greenfield seeding. Endeca’s technology performs some interesting processes on structured and unstructured content. I see no overt evidence that Intel is moving into information retrieval. I am tracking announcements like this stats pack as part of my research effort to figure out how Endeca figures in Intel’s plans.

While I root around for information, download the statistics libraries. My quick look revealed some useful work by Intel’s engineers, who merit a happy quack.

Stephen Arnold, August 9, 2008

