
SQL Does Too Scale

March 16, 2010

The Dennis Forbes on Software and Technology blog published “Getting Real about NoSQL and the SQL-Isn’t-Scalable Lie”. The article caught my attention because it argues that SQL does scale. I found the write up interesting, and I want to highlight several of its arguments.

First, the article points out that bashing SQL is an increasingly popular sport. Mr. Forbes writes:

In the case of the NoSQL hype, it isn’t generally the inventors over-stating its relevance — most of them are quite brilliant, pragmatic devs — but instead it is loads and loads of terrible-at-SQL developers who hope this movement invalidates their weakness.

Second, he makes clear that SQL does scale. He offers:

Such a solution — even on a stodgy old RDBMS — is scalable far beyond any real world need because you’ve built a system for a large corporation, deployed in your own datacenter, with few constraints beyond the limits of technology and the platform. Your solution will cost hundreds of thousands of dollars (if not millions) to deploy, but that isn’t a critical blocking point for most enterprises. This sort of scaling that is at the heart of virtually every bank, trading system, energy platform, retailing system, and so on. To claim that SQL systems don’t scale, in defiance of such obvious and  overwhelming evidence, defies all reason.

Third, he points out that progress is being made:

Scalability noise based upon the limitations of a cloud vendor’s offerings needs to be put into context: They don’t apply to most of the users of relational databases. MySQL isn’t the vanguard of the RDBMS world. Issues and concerns with it on high load sites have remarkably little relevance to other database systems. And of course the SQL/RDBMS world is changing (side note: Few love SQL, but I’ve yet to see a viable replacement). Wouldn’t it be a grand world where every desktop (platforms that spend about 99% of their time completely idle) in a corporation was a part of the corporate cloud, all seamlessly acting as a part of the corporate information system in a reliable, redundant way? A simple SQL statement silently and transparently fulfilled by hundreds of distributed systems?

But the really interesting part of the write up is the comment section of the Web log. Some comments are clever, and others, like Alex Popescu’s, are thought provoking. Excellent write up.

Stephen E Arnold, March 17, 2010

No one paid me to write this. Because the article is about relational databases, I will report non payment to DHS, an outfit with quite fascinating RDBMS challenges. Those folks and their consultants get paid I believe.

Database Skirmishes: Relational versus Non-Relational

March 15, 2010

A happy quack to the reader who sent me a link to “SCALE 8x: Relational vs. Non-relational.” I was able to access the link, but the page was marked “subscribers only.” The main point of the write up was to explain some of the performance differences between Codd databases like SQL Server and Oracle and the non relational data management systems like those in use at Google or Digg.com. You can download the slides upon which the article was based at http://www.pgexperts.com/document.html?id=40. The article also includes a useful write up about the issue at http://ossdbsurvey.org/.

Stephen E Arnold, March 15, 2010

No one paid me to write this news item. I will report this sad fact to Health & Human Services because I used a variant of the word “relation”.

Endeca Files OfficeFurniture.com under Sold

March 8, 2010

I learned that OfficeFurniture.com has rolled out its Web site powered by the Endeca system. You can read the story in “OfficeFurniture.com Introduces Endeca Search and Navigation Technology with Newly Redesigned Website.”

The site features point and click refinement, recommended products, and sort options. My test queries rendered within a couple of seconds. When I selected “View all matching file cabinets”, the system generated a single long page with pictures and links to additional information. I had a Bing Image moment when I first encountered this feature. The long pages are a bit of a hassle on my netbook, which has a wimpy graphics card and minimal RAM. That’s not Endeca’s issue, however. Overall performance was good.

For me the most interesting comment in the write up was:

“OfficeFurniture.com are experts in their market and have a deep understanding of the factors most important to each customer as they seek the right furniture for their specific office environment,” said Rob Swint, global lead, B2B eCommerce and distribution at Endeca. “By continuing their advancements in overall web presence, they are allowing customers unprecedented ability to search and evaluate their product lines, while improving sales effectiveness by better matching customers to the right product for their needs. The result is a better user experience for both customer and business.”

The positioning of Endeca struck me as squarely in the eCommerce sector. Congratulations to the Endeca team on this big win.

Stephen E Arnold, March 6, 2010

No one paid me to write this news item. I will report non payment to the Department of Commerce, an agency on top of all things commercial.

The Google PSE circa 2007 Becomes News

March 4, 2010

Yep, another big surprise for the Google mavens, pundits, and azure chip crowd. You can get a good snapshot of the “discovery” at “Google Index to Go Real Time.” The big idea is that a Web publisher can “automatically submit new content to Google.” The news is a bit stale in my opinion. If you take a peek at the five patent documents submitted by Google in February 2007, you can get the full scoop, see code examples, and learn that this “method” has some interesting plumbing; namely, the context server. The inventor of this “new” method is a bright fellow in the Google engineering den. For the detail about this news, which dates from late 2005 or early 2006, check out US200700386616. The four related patent documents (filed on the same day by the way) and the team’s post PSE filings provide more color. The real question is, “What’s next?” I discuss this question in my monograph, Google Version 2.0, published in mid 2007 by Infonortics Ltd. in Tetbury, Glos. In my opinion, reading about a fait accompli is probably not the best way to stay abreast of Google’s technology trajectory. The patent documents make clear how the method works. Let’s see. This is 2010, a bit more than three years since the patent documents appeared. This interval is a typical Google “deployment” interval. Check out the context server and ask, “What’s with this semantic Web stuff?”

Stephen E Arnold, March 4, 2010

No one paid me to write this post. When I get royalties, my publisher sometimes pays me. So I suppose this is a self funded post.

IBM: Database or Public Relations Wizardry?

March 4, 2010

I cannot figure out if IBM has revealed a breakthrough in technology or in public relations. You will have to make up your own mind. Navigate to “Putting the Web in a Spreadsheet”. The write up explains that IBM has used Hadoop and its own code called BigSheets to help make sense of Web information. According to the write up:

BigSheets uses Hadoop to crawl through Web pages, parsing them to extract key terms and other useful data. BigSheets organizes this information in a very large spreadsheet, where users can analyze it using the sort of tools and macros found in desktop spreadsheet software. Unlike ordinary spreadsheet software, however, there’s no limit to the size of a spreadsheet created through BigSheets.

The example in the article is the British Library’s use of the technology as part of an archive project. The article said:

The first test for BigSheets came at the British Library, which has been working since 2004 to create an archive of the roughly eight million UK websites. At regular intervals, the Library takes snapshots of Web pages, converts them to an archival file format, and stores them. But searching and analyzing this data is another challenge, and that’s where BigSheets came in.

IBM, according to the article, will use this technology in future products. I will reserve judgment. I did write about the British Library taking months to create an archive of Web sites, noting that the project seemed to be moving slowly. The disconnect in my mind remains because this Web in a Spreadsheet write up suggests that the British Library has an archive of eight million Web sites, not a few thousand. More information is needed.

I don’t know if this is technology or PR.

Stephen E Arnold, March 3, 2010

No one paid me to write this. Since I mention IBM, recipient of a large US government integration project, I will report the fact that I wrote for no dough to IBM Federal Systems, a unit which does work for dough.

IBM to Upend Server Market

March 3, 2010

My feedreader disgorged a link to an IBM news release that trumped the mainframe baloney and the weird IBM SEO expertise news items I have reported on in this goosely blog. Now between you and me, I no longer buy branded hardware. I go with the commodity stuff. I paid 1 800 GotJunk to haul away my last two NetFinity 5500s which I replaced with a single white box dual quad core machine plugged into a Drobo sporting three one terabyte drives. My energy consumption dropped from 3000 watts to 750 watts and the 300 pounds of IBM hardware with all sorts of IBM stickers, Serve RAID goodies, and two EXP chassis became one box about the size of a briefcase. Don’t get me wrong. In the good old days of easy money and CFOs who would rubber stamp any information technology purchase order, the branded hardware made sense. Today. Not so much. My two NetFinities rang in years ago in the $10,000 range. So two cost me $20,000. The white box replacement was about $700. This goose prefers to keep his money, not give it to IBM with its $750 tech roll fees and crazy prices for FRUs.
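For readers who want to check the goose's arithmetic, here is a minimal sketch in Python. The wattage and price figures come from the paragraph above. The round-the-clock duty cycle is my own assumption, so treat the annual energy figure as a rough upper bound, not a measurement.

```python
# Figures quoted in the post; the 24x7 duty cycle is an assumption.
OLD_WATTS = 3000            # two NetFinity 5500s plus EXP chassis
NEW_WATTS = 750             # one white box machine plus Drobo
OLD_COST_USD = 2 * 10_000   # two servers at roughly $10,000 each
NEW_COST_USD = 700          # approximate price of the white box

HOURS_PER_YEAR = 24 * 365   # assumes the hardware never powers down


def percent_reduction(before: float, after: float) -> float:
    """Return the drop from before to after as a percentage of before."""
    return (before - after) / before * 100


def annual_kwh(watts: float, hours: float = HOURS_PER_YEAR) -> float:
    """Convert a constant draw in watts to kilowatt-hours per year."""
    return watts * hours / 1000


power_cut = percent_reduction(OLD_WATTS, NEW_WATTS)
cost_cut = percent_reduction(OLD_COST_USD, NEW_COST_USD)
kwh_saved = annual_kwh(OLD_WATTS) - annual_kwh(NEW_WATTS)

print(f"Power draw reduced by {power_cut:.0f}%")
print(f"Purchase price reduced by {cost_cut:.1f}%")
print(f"Energy saved per year: {kwh_saved:,.0f} kWh")
```

Under those assumptions the power draw falls by three quarters and the purchase price by more than 96 percent. Multiply the kilowatt-hours by your own utility rate to put a dollar figure on the energy saving.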

What does the IBM news release assert? First you have to read this gem yourself. Navigate to “IBM Unveils Industry’s First Systems that Rewrite Economics of ‘Industry-Standard’ Computing”. You can watch a Flash video at the IBM x86 Web page. Second, note these points:

  • I can plug in memory as needed.
  • I can increase my virtualization capacity.
  • I can select different form factors.

In short, I can pretty much do what I am now doing. The difference is that if I buy IBM, I “upend” my ROI as IBM “upends” the branded server market. Here’s an example of IBM’s math:

IBM eXFlash technology would eliminate the need for a client to purchase two entry-level servers and 80 JBODs to support a 240,000 IOPs database environment, saving $670,000 in server and storage acquisition costs.

To me, JBOD means “just a bunch of disks”, which is trendier than direct access storage devices or DASDs.

Will this address the performance problems of IBM’s implementation of search on its Web site? Nope. The problems have little to do with hardware. In my opinion, throwing hardware at an architecture problem is not the route to follow. Your mileage may vary. Have at it. I hope that I am not doing the IV&V on this type of solution to a problem such as IBM’s own Web site search system running these zippy new servers with their acronyms and assertions.

Stephen E Arnold, March 3, 2010

No one paid me to write this. Because IBM Federal Systems is helping to fix the US government’s computing infrastructure, I will report non payment to the folks in Gaithersburg. Didn’t IBM help design much of the infrastructure that IBM is now rearchitecting? Hmmm. Interesting.

Is IBM Annoyed with EMC? Seems So

February 27, 2010

A happy quack to the reader who sent me a link to the article “NetApp Slammed over Tiering Is Dead Comments, EMC Savaged by IBM and Pillar, Named in Disclosure Scandal.”

IBM is hopping from search engine optimization to analysis of competitors. Here’s the passage in the write up from URL4.eu that was fascinating:

IBM’s Tony Pearson finds little to like in EMC’s enhanced Atmos. Storagebod also takes issue with the company on several issues.

What issues? A quick look at Mr. Pearson’s comments was enlightening. First, I noted this passage:

Tony Pearson is a Master Inventor and Senior IT Storage Consultant for the IBM System Storage product line at the IBM Executive Briefing Center in Tucson Arizona…

Okay, poobah.

Next I noted this statement by Mr. Pearson:

Is EMC positioning ATMOS as “Storage for Terrorists”? I can certainly appreciate the value of being able to protect 6PB of data with only 9PB of storage capacity, instead of keeping two copies of 6PB each, the trade-off means that you will be accessing the majority of your data across your intranet, which could impact performance. But, if you are in an illicit or illegal business that could have a third of your facilities “seized by the government”, then perhaps you shouldn’t house your data centers there in the first place. Having two copies of 6PB each, in two “friendly nations”, might make more sense.

The word choice is definitely interesting. The “T” word will light up some filters, in my opinion.
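Setting the vocabulary aside, the capacity arithmetic in Mr. Pearson’s passage is easy to check. The Python sketch below uses only the 6PB and 9PB figures from the quote. The three-site layout with raw capacity spread evenly across sites is my assumption for illustration; the quoted passage does not describe how Atmos actually distributes data.

```python
from fractions import Fraction

DATA_PB = 6                      # usable data, from the quoted passage
PROTECTED_RAW_PB = 9             # raw capacity in the quoted 9PB scheme
MIRRORED_RAW_PB = 2 * DATA_PB    # two full copies of 6PB each


def overhead(raw_pb: int, data_pb: int) -> Fraction:
    """Raw capacity consumed per unit of usable data."""
    return Fraction(raw_pb, data_pb)


protected_ratio = overhead(PROTECTED_RAW_PB, DATA_PB)   # 3/2
mirrored_ratio = overhead(MIRRORED_RAW_PB, DATA_PB)     # 2
raw_saved_pb = MIRRORED_RAW_PB - PROTECTED_RAW_PB       # 3 PB less raw disk

# Assumption: raw capacity is split evenly over three sites and one
# site (a third of the facilities) becomes unavailable.
SITES = 3
surviving_raw_pb = PROTECTED_RAW_PB * Fraction(SITES - 1, SITES)

print(f"Protected scheme overhead: {float(protected_ratio):.1f}x")
print(f"Mirrored scheme overhead:  {float(mirrored_ratio):.1f}x")
print(f"Raw capacity saved:        {raw_saved_pb} PB")
print(f"Raw capacity after losing one site: {surviving_raw_pb} PB")
```

Under that assumption, the capacity left after losing one site equals the size of the data, which is the bare minimum any scheme would need to rebuild it. The trade-off Mr. Pearson describes, more reads crossing the network, is the price of saving 3PB of raw disk.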

What’s with the contentiousness? EMC is in storage, has the fine Documentum system that IBM once supported, and some eDiscovery tools. IBM has everything, including consulting services and search engine optimization.

My hunch? IBM is feeling the pressure of companies like EMC in some key clients.

Remind me not to sell anything to an IBM client. EMC can withstand the alleged criticism. The addled goose has difficulty spelling words like maroon and its variants.

Stephen E Arnold, February 27, 2010

No one paid me to write this. I will report non payment to the IBM Federal Systems folks who are reinventing the GSA’s computer infrastructure. IBM will pass the statement of non payment along I assume.

SQL at 40: Ready for Retirement?

February 26, 2010

Darned interesting write up in the Kellogg (formerly the Web log of the CEO of Mark Logic Corporation). The title caught my attention: “The Database Tea Party: The NoSQL Movement.” If you are struggling with your favorite 40-year-old database technology, you will want to read Mr. Kellogg’s article. This comment sums up Kellogg’s position:

If you’re struggling with an RDBMS on a given application problem you shouldn’t say:  we need an open source, NoSQL type thing.  You should say:  we need to look at relational database alternatives.  Those alternatives include a open source database projects (e.g., MongoDB, CouchDB) and key-value stores (e.g., Hadoop), but they also include commercial software offerings such as specialized DBMSs like Streambase (for real-time streams), Aster (for analytics on big data), and MarkLogic (for semi-structured data).  Don’t throw out the commercial-software-benefits baby with the RDBMS bathwater.

I have written about the challenges SQL poses. I want to point out that even firms with non-RDBMS solutions can use SQL for certain tasks. I heard one Googler several years ago mention that MySQL was a useful tool. That may have changed now, but I have a couple of RDBMS files that work just fine. The “fine” is the key word because I am not pushing beyond the capabilities of the 40-year-old invention of Dr. Codd.

You don’t see too many 40-year-old athletes in the Olympics or professional sports. Why not take the same pragmatic approach to data management?

Stephen E Arnold, February 25, 2010

The addled goose has been paid by Mark Logic Corporation to give talks at the firm’s user meetings. I was not paid to write this news item, however. Next time I am in San Francisco I will try to get a taco out of this company’s engineering department.

EasyAsk: NLP, SQL, and MDX

February 25, 2010

A reader sent me a link to DataPrix’s write up about EasyAsk as a business intelligence tool. Now owned by Progress Software, EasyAsk has dropped off my radar. My recollection is that the system supported search, facets, and eCommerce. I did wonder why Endeca did not crack down on EasyAsk’s use of the phrase “guided navigation”, but I may have mixed up which marketer cooked up which phrase. I had pegged it as a system for searching structured data. DataPrix’s article positions EasyAsk as a business intelligence tool. The screen shots show the system generating the type of output that I associate with companies such as Megaputer, based on math magic from Moscow-based wizards.

[Image from the DataPrix write up]

Source: http://www.dataprix.com/node/1068

EasyAsk supports reporting, analysis, and scorecards (dashboards). The user formulates a query and the EasyAsk system retrieves the data and generates an answer. The query can be expressed in natural language, so the EasyAsk approach obviates the need for a goose like me to create a well formed query against a properly formed cube.

The DataPrix write up suggests that EasyAsk can be used to perform search and retrieval plus the more sophisticated queries against large structured data sets. EasyAsk added support for mainstream databases years ago. EasyAsk can also generate a user’s query in MDX (multi dimensional expressions), a favorite of end users who majored in business and then jumped into sales.

DataPrix provided a link to a demonstration of the system with which I was not familiar. You can check it out at http://www.easyask.com/demo.html.

How does EasyAsk stack up against SAS and Megaputer? I don’t know the answer to that question. I quite like Megaputer. Those Russian math dudes are in the flow in my opinion.

Stephen E Arnold, February 25, 2010

Nope, nope. Not paid. I will report a failure to receive money for this article and its frisky reference to undergrads with degrees in business and extensive sales training. The agency to which I am reporting is the Census Bureau. No explanation needed.

Perfect Video Search?

February 22, 2010

I wrangled a free meal from two Perfect Search engineers. I learned that Perfect Search was providing the technology for i.TV, pronounced “i dot TV.”

I have written about Perfect Search’s robust, high-performance search and content processing system previously. You may know that the company was founded in 2007 by veterans of the search industry. Perfect Search has achieved significant, game-changing, patent-protected innovation in the core processes of search. The Perfect Search system can chop down the number of servers needed to manipulate petabytes of data by an order of magnitude. The result is faster indexing and query speeds, higher throughput, and dramatically lower infrastructure costs. Perfect Search products include a Database Search Appliance for Oracle, a OneBox Extender for the Google Search Appliance, and search for Backup and Storage solutions.

I am not “into video” so I was not familiar with i.TV. The company offers an application for the iPhone and iPod touch that helps people discover, share and consume media. With i.TV, users can browse hundreds of thousands of up-to-date local TV and movie listings, as well as a catalog of hundreds of thousands of TV and movie titles available for download and DVD rental. i.TV also includes community features and allows people to write reviews, rate shows and recommend shows to followers and friends on Twitter and Facebook. i.TV enables users to watch movie trailers and television previews, purchase movie tickets, manage their Netflix queues, and use their iPhone or iPod touch as a remote control.

My host, Ken Ebert, one of Perfect Search’s senior technologists, told me:

We have been able to replace the native search functionality of the MySQL application and integrate the Perfect Search engine into the i.TV application and have high-throughput functionality for indexing of new data and querying of the multiple MySQL databases that i.TV maintains. Companies that have multiple relational databases struggle to index and search these content repositories in a timely, cost-effective manner, especially when the query involves complex database joins. We are able to search over a billion records on a single Database Search Appliance. We are excited to be able to be involved with a company that has such a great product and that is poised to have significant growth.

Mr. Ebert explained that the Perfect Search team was delighted to be installed as part of one of the top downloaded iPhone applications.

We downloaded the app and were able to locate specific shows quickly and easily. When I travel, I will be able to catch my History Channel favorite, “Engineering an Empire.” The i.TV app is available at the Apple Store and in the iTunes store and is a top download. Perfect Search brings order to the untidy world of programming databases. From the iPhone there is snappy performance for basic and advanced search.

Besides matching up geo-codes to determine the customer location, Perfect Search is handling some complex database joins, allowing i.TV customers to search by TV Network, actor name, or TV Show title with blistering response times. Perfect Search is also providing queries of several TV and movie listing databases.
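To make “complex database joins” concrete, here is a minimal sketch in Python using the standard sqlite3 module. The tables, column names, and sample rows are hypothetical; I have no information about the actual i.TV schema or about how Perfect Search indexes it. The point is only to show the kind of multi-table query that a search by network, actor name, or show title implies when it runs against a plain relational database.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE networks (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shows (
    id INTEGER PRIMARY KEY,
    title TEXT,
    network_id INTEGER REFERENCES networks(id)
);
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE appearances (
    show_id INTEGER REFERENCES shows(id),
    actor_id INTEGER REFERENCES actors(id)
);
INSERT INTO networks VALUES (1, 'History Channel'), (2, 'Sample Network');
INSERT INTO shows VALUES (1, 'Engineering an Empire', 1),
                         (2, 'Sample Cooking Hour', 2);
INSERT INTO actors VALUES (1, 'Example Host'), (2, 'Sample Chef');
INSERT INTO appearances VALUES (1, 1), (2, 2);
""")


def search(term: str) -> list:
    """Find shows whose title, network, or cast matches the term."""
    like = f"%{term}%"
    return conn.execute(
        """
        SELECT DISTINCT s.title, n.name
        FROM shows AS s
        JOIN networks AS n ON n.id = s.network_id
        LEFT JOIN appearances AS a ON a.show_id = s.id
        LEFT JOIN actors AS p ON p.id = a.actor_id
        WHERE s.title LIKE ? OR n.name LIKE ? OR p.name LIKE ?
        ORDER BY s.title
        """,
        (like, like, like),
    ).fetchall()


print(search("history"))
print(search("chef"))
```

A LIKE pattern with a leading wildcard generally forces the database to scan rows rather than use an ordinary index, and the joins multiply the work. That cost is one reason a separate search index in front of the database can be attractive once the tables get large.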

Stephen E Arnold, February 22, 2010

I did get a free meal, but I was not otherwise compensated for this write up. I will report good food and fine company as a payoff to the Economic Research Service, a unit of the Department of Agriculture. Adhere Solutions is working with Perfect Search. My son is a smart lad in my opinion.
