Infobright: The Warsaw Connection to Rough Sets

May 19, 2008

Infobright is an interesting company. The wizard founders have some keen math skills. The company’s management appears to be just as good at marketing and sales. The combination means that Infobright is a company to watch.

The firm’s core business is selling a data management system that helps break the well-known and increasingly problematic bottlenecks that traditional databases put in front of business analysts. The relational database with its familiar rows and columns requires serious engineering to make work in our world of petabyte data.

In a nutshell (and I am glossing over significant technical details), Infobright pulls data into its system. Then, using some fancy math involving rough sets and other cutting-edge techniques, it builds a data warehouse along with metadata that describe the stored data. When you need to run a query, the Infobright system doesn’t run to the data table. The metadata allow the Infobright system to ignore data that are not germane and snag only that which is appropriate. The Infobright metadata method builds an index (not a very good word for these “views”, “set abstractions”, and probability matrices) that, for many queries, can answer the question. No hitting of the data in the warehouse necessary, thank you.
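Here is a toy sketch in Python of the metadata idea described above. It is not Infobright's implementation, which I have not seen; the block layout, the `PackStats` fields, and every function name are assumptions made up for illustration. Each block of a column carries its minimum, maximum, and row count. A query such as "how many values exceed a threshold" consults those summaries first and reads raw data only when the summary cannot settle the matter.

```python
from dataclasses import dataclass


@dataclass
class PackStats:
    """Summary metadata for one block of a numeric column (hypothetical layout)."""
    lo: int
    hi: int
    count: int


def load(values, block_size):
    """Split a column into blocks and compute the metadata once, at load time."""
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        blocks.append((PackStats(min(chunk), max(chunk), len(chunk)), chunk))
    return blocks


def classify(stats, threshold):
    """Judge a block for the predicate `value > threshold` from metadata alone."""
    if stats.hi <= threshold:
        return "irrelevant"  # no row in the block can match
    if stats.lo > threshold:
        return "relevant"    # every row in the block matches
    return "suspect"         # undecided; the raw block has to be read


def count_greater(blocks, threshold):
    """Count matching rows, touching raw data only for suspect blocks."""
    total, blocks_read = 0, 0
    for stats, chunk in blocks:
        kind = classify(stats, threshold)
        if kind == "relevant":
            total += stats.count  # answered from metadata
        elif kind == "suspect":
            blocks_read += 1
            total += sum(1 for v in chunk if v > threshold)
    return total, blocks_read


column = [1, 2, 3, 4, 10, 11, 12, 13, 3, 9, 20, 1]
print(count_greater(load(column, 4), 5))
```

With this sample column and a threshold of 5, the first block is skipped, the second is counted from its metadata, and only the third is actually read. The call returns `(6, 1)`: six matching rows, one block fetched out of three.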

Infobright’s system interests me because years ago I did a small job for a Polish math wizard who set up shop in North Carolina. Several of the engineers used rough set and related math to create a search engine called Inferno. The metaphor of the “inferno” was intended to communicate the swarming math techniques that “discovered” relationships. Although Inferno is not directly analogous to what Infobright is doing, I learned about these methods from that company’s founder, Dr. Zbigniew Michalewicz.

In my conversations with Dr. Michalewicz, he communicated enough of the significant potential of rough sets, mereology, and evolutionary computation to force me back to the math books. Infobright appears to be tapping into this mathematical mother lode.

Its founders have some ties to Warsaw, one of the places where these mathematics are valued and made part of the curriculum for students who can deal with the notion of sparse tables, fuzzy sets, and recursive ant equations.

Infobright’s speed has caught the attention of organizations looking for ways to perform analyses more quickly. You will want to navigate to the Infobright Web site and read the clear but economical documentation available. However, to understand the sophistication of the application, you can think in terms of column-oriented data management systems.

The best-known column system is Google’s BigTable. But there are significant differences between Infobright’s approach and Google’s. The key point is that both Infobright and Google use some sophisticated math. Neither company is particularly forthcoming about these methods.

From my research, which may be incomplete, the key point is that the use of “Warsaw math”, a term I coined to refer to rough set theory and related methods, allows many queries to be satisfied without having to fetch data from the data store.
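For readers who want to see what a rough set looks like, here is a minimal Python sketch of the textbook definition: the lower and upper approximations of a target set. This is the general math, not Infobright's method; the sample rows, the attribute names, and the `approximations` function are hypothetical. Rows that share the same values for the chosen attributes are indistinguishable and form one class. The lower approximation holds classes that lie entirely inside the target; the upper approximation holds classes that overlap it at all. The difference between the two is the uncertain boundary, the only part that would need a closer look at the underlying data.

```python
from collections import defaultdict


def approximations(rows, attributes, target):
    """Return (lower, upper) approximations of `target`.

    rows:       dict mapping row id -> dict of attribute values
    attributes: attribute names used to decide which rows look alike
    target:     set of row ids we want to describe
    """
    # Group rows that cannot be told apart on the chosen attributes.
    classes = defaultdict(set)
    for row_id, row in rows.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(row_id)

    lower, upper = set(), set()
    for members in classes.values():
        if members <= target:   # class lies entirely inside the target
            lower |= members
        if members & target:    # class overlaps the target
            upper |= members
    return lower, upper


# Hypothetical sample data, invented for illustration.
rows = {
    1: {"region": "EU", "tier": "gold"},
    2: {"region": "EU", "tier": "gold"},
    3: {"region": "US", "tier": "gold"},
    4: {"region": "US", "tier": "silver"},
    5: {"region": "US", "tier": "silver"},
}
target = {1, 2, 4}

lower, upper = approximations(rows, ["region", "tier"], target)
print(sorted(lower), sorted(upper), sorted(upper - lower))
```

In this sample, rows 1 and 2 certainly belong to the target, row 3 certainly does not, and rows 4 and 5 form the boundary because they look identical yet only row 4 is in the target. The script prints `[1, 2] [1, 2, 4, 5] [4, 5]`.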

Infobright has implemented a fast-cycle data loader. The architecture of Infobright “sits outside” of the database system. In effect, the Infobright store is refreshed when new data are pumped into the Infobright system.

Google, on the other hand, updates its data store using its “relaxed write” method. So Infobright is a classic data warehouse / data analysis setup. Google is an online operation.

The key point is that some of the mathematical underpinnings are similar, at least to my aging eyeballs.

If you want to know more about the founders of Infobright, you can peruse very sparse biographical information here. The Warsaw connection jumps right out even in a thumbnail sketch. Details about rough sets may be found here. You can also run a Google query and follow the links on the first two pages of results. Most are useful. Information about mereology appears in Wikipedia. Though uneven, the entry is useful and contains additional links to follow. Information about BigTable appears in my 2005 study, The Google Legacy, which is available from Infonortics, Ltd. in Tetbury, Glou.

Stephen Arnold, May 20, 2008

Comments

7 Responses to “Infobright: The Warsaw Connection to Rough Sets”

  1. Full Table Scan on May 23rd, 2008 10:08 am

    Links for the Week…

    • A blog post by an admittedly-biased KickFire employee who makes some interesting points about power consumption.

    • DATAllegro’s take on why they’re better than Netezza. We don’t get to see open and public assaults like this nearly of…

  2. HiQube: Another Business Intelligence System : Beyond Search on May 25th, 2008 3:56 pm

    […] from specialist firms often based outside the United States. I wrote about the Canadian outfit Infobright last week, now I want to talk about the Italian company HiQube (formerly […]

  3. Joe Harris on May 27th, 2008 4:52 am

    As an amateur statistician (aka BI professional) I’ve often wondered why statistics and advanced math don’t play a bigger role in databases and data management.

    Currently everything is boolean logic over b-trees. When the DB doesn’t know where your data is it has to check every single row. The system doesn’t even attempt to start looking in a specific location.

    I think the row vs column debate may well prove to be irrelevant. The real question will be who knows their math. Clearly Infobright does.

    {Please consider switching to full text feeds. I hate having to click through and I can guarantee that I’ll only read about 1 in 10 of your posts from the current feed. 🙂 If you want to have advertising in the feed try Feedburner.}

  4. Stephen E. Arnold on May 27th, 2008 5:20 am

    Hi, Joe, thanks for posting. My thoughts about advanced math include these points: [a] Math is difficult for many folks, so the issues cannot be discussed, an approach agreed upon, and meaningful adjustments made. [b] Doing math requires that a company invest money in people who may not mesh with a sales-oriented organization. Many competent people never make it out of a preliminary interview. Such interviews are word oriented, and math folks like logic, equations, and people who “get it”. [c] Slapping heavy math on an existing computer infrastructure can slow the puppy down. The idea of spending money to crunch numbers doesn’t make sense in many organizations or to overworked IT guys who want a lightweight, no-brainer approach to analysis.

    With regard to click throughs, I will look into this problem. I don’t want to create work for you. The ad angle is irrelevant to me. I slap Google AdSense on the Web log so I can see how Google works.

    If you have any other comments, or if you want to write a short item about math, let me know. I’m doing a work up on a hot Australian company but need some help with a short item about transformations (Riemann maybe?).

    Stephen Arnold, May 27, 2008, 6 20 am Eastern

  5. Stephen E. Arnold on May 27th, 2008 10:58 am

    Joe, quick note about the full text feeds: Although the Subscribe link leads to a partial-content feed, you can get full text RSS2 or atom feeds for the blog at the following links:

    http://arnoldit.com/wordpress/feed/rss2
    http://arnoldit.com/wordpress/feed/atom

    Hope this helps,
    Stephen Arnold

  6. Joe Harris on May 28th, 2008 5:45 am

    Thanks for the full articles feed Stephen. I really like your articles to date. Keep up the good work.

    BTW I particularly agree with you about [b]. BI / Analytics companies are rightly sales focused and their sales and product teams just can’t make the leap from technical innovation to real world problems.

    The relatively slow uptake of KXEN is a great example. I sat in on a pitch for this a few years ago and the salesperson kept referring to it as “snowstorm math” and saying that “it does the modeling for you”. But it’s a brilliant product!

    Maybe the next wave of innovation will come from pushing the math into graphics processors broken into thousands of parallel threads. The newest chips can perform super-computer worthy floating point ops. I wonder if this is what Kickfire are doing inside their product, although they refer to custom chips.

    Joe

  7. Infobright: Sun Sees the Light : Beyond Search on September 15th, 2011 3:49 pm

    […] I wrote about Infobright in May 2008. Predictably mainstream trade publications and most technical Web logs ignored the story. The idea of figuring out rough sets and applying their mathematics to data storage is less exciting than writing about Google and Microsoft. You can read about the Warsaw connection here. […]
