Publishers: Bet the Farm

April 16, 2009

Uncertain about the vagaries of digital products, publishers are doing what we call in Harrod’s Creek, “betting the farm.” The idea is simple. The person who plays poker looks at her hand. She decides that the cards are a sure bet. Instead of raising a dollar or two, she goes whole hog (another farm colloquialism) and bets the farm; that is, she puts the deed to her two acres of rocky ground on the pile of money and says, “Call.” If you are a gambler with a math background, you may suggest that she’s nuts. If you are a Kentucky lottery customer, you say, “That’s smart, lady.”

These down home images waddled through my goose brain when I read Lynn Neary’s “Publishers Gamble on Blockbuster Book Deals” on the NPR.org Web site. You can read her story here. (There’s probably a link to the audio somewhere on the page, but these tired eyes can’t spot gray links, sorry.) The idea is that publishers are opening their checkbooks to get their ink stained paws on “sure winner” authors. What’s a sure winner? How about books by female humorists such as Tina Fey and Sarah Silverman, among others?

The strengths and weaknesses of this approach to media were documented in a memorable book called Carnival Culture by James Twitchell. I am certain the economists from Stanford University who read this Web log will point to earlier and probably less readable analyses of this aspect of the media business. If you want to take my recommendation, grab a copy from Amazon here. The strength of the “blockbuster” is that when a media company gets one, money rains in torrents. The problem is that when the blockbuster flops, the advance and the effort go to the remainder bin or direct to video. A publisher whose blockbuster bet coincides with a textbook adoption cycle is in a pickle. The risk of losing the text adoption for an economics or psych book can plunge the publishing company into a sea of red ink. The narrowing margins mean that the costs of updating become more burdensome over time. Like I said, pickle.

Ms. Neary wrote:

Publishing has always been a gamble… the fact that no one really knows what will take off is part of the fun. But he thinks these days the stakes are getting too high, with the publishers taking all the risks and writers getting paid whether the book sells or not.

My conclusion: publishing companies’ business strategy is pretty similar to my Harrod’s Creek neighbor who bets the farm on the flip of a card. I don’t have the nerves for this type of business model. I prefer to float in my mine-runoff-choked pond and watch the wizards of traditional media deal with the information opportunities from afar. My network connection feeds my addled goose brain. I am not sure what sustains the publishing companies with the blockbuster tactic.

Stephen Arnold, April 16, 2009

True Knowledge: Semantic Search System

April 16, 2009

A happy quack to the readers who sent me a link to this ZDNet Web log post called “True Knowledge API Lies at the Heart of Real Business Model” here. I had heard about True Knowledge — The Internet Answer Engine — a while back, but I tucked away the information until a live system became available. I had heard that the computer scientist spark plug of True Knowledge (William Tunstall-Pedoe) had been working on the technology for about 10 years. The company’s Web site is www.trueknowledge.com, and it contains some useful information. You can sign up for a beta account, read Web log posts, and get some basic information about the system.

About one year ago, the Financial Times’s Web log here reported:

Another Semantic Web company looking for cash: William Tunstall-Pedoe of True Knowledge says he needs $10m in venture capital to back the next stage of his Cambridge (UK)-based company, which is trying to build a sort of “universal database” on the Web.

In April 2009, the company is raising its profile with an API that allows developers to make Web sites smarter.

Interface. © True Knowledge

The company said:

True Knowledge is a pioneer in a new class of Internet search technology that’s aimed at dramatically improving the experience of finding known facts on the Web. Our first service – the True Knowledge Answer Engine – is a major step toward fulfilling a longstanding Internet industry goal: providing consumers with instant answers to complex questions, with a single click.

The company’s proprietary technology allows a user to ask questions and get an answer. Quite a few companies have embraced the “semantic” approach to content processing. The reason is that traditional search engines require that the person with the question find the magic combination of terms that delivers what’s needed. The research done by Martin White and my team, among others, makes clear that about two-thirds of the users of a keyword search system come away empty-handed, annoyed, or both. True Knowledge and other semantic-centric vendors see significant opportunities to improve search and generate revenue.

Architecture block diagram. © True Knowledge

Paul Miller, the author of the ZDNet article, wrote:

True Knowledge is certainly interesting, and frequently impressive. It remains to be seen whether a Platform proposition will set them firmly on the road to riches, or if they’ll end up finding more success following the same route as Powerset and getting acquired by an existing (enterprise?) search provider.

ZDNet wrote a similar article in July 2007 here. Venture Beat mentioned True Knowledge here in July 2008 in a story that referenced Cuil.com (former Googlers) and Powerset (now part of Microsoft’s search cornucopia). Hakia.com was not mentioned even though at that time in 2008, Hakia.com was ramping up its PR efforts. Venture Beat mentioned Metaweb, another semantic start up that obtained $42 million in 2008, roughly eight times the funding of True Knowledge. (Metaweb’s product is Freebase, an open, shared database of the world’s information. More here.) You will want to read Venture Beat’s April 13, 2009, follow up story about True Knowledge here. This article contains an interesting influence diagram.

I don’t know enough about the appetite of investors for semantic search systems to offer an opinion. What I found interesting was:

  • The company has roots in Cambridge University where computational approaches are much in favor. With Autonomy and Lemur Consulting working in the search sector, Cambridge is emerging as one of the hot spots in search
  • The language and word choice used to describe the system here reminded me of some Google research papers and the work of Jennifer Widom at Stanford University. If there are some similarities, True Knowledge may be more than a question answering system
  • The company received an infusion of $4.0 million in a second round of funding completed in mid 2008. Octopus Ventures provided an earlier injection of $1.2 million in 2007.
  • The present push is to make the technology available to developers so that the semantic system can be “baked in” to other applications. The notion is a variant of that used in the early days of Verity’s OEM and developer push in the late 1980s. The API account is offered without charge.
  • There’s a True Knowledge Facebook page here.

I recall seeing references to a private beta of the system. I can’t locate my notes from my 2007 trips to the UK, but I think that may have been the first time I heard about the system. I did locate a link to a demo video here, dated late 2007. That video explains that the information is represented in a way “that computers can understand”. I made a note to myself about this because this type of function in 2007 was embodied in the Guha inventions for the Google Programmable Search Engine.

The API allows systems to ask questions. The developer can formulate a query and see the result. Once the developer has the query refined, the True Knowledge system makes it easy to include the service in another application. The idea, I noted, was to make enterprise software systems smarter. The system performs reasoning and inference. The system generates answers and a reading list. The system can handle short queries, performing accurate disambiguation; that is, figuring out what the user meant. The system also makes it possible for a user to provide information to the system, in effect a Wikipedia type of function. The approach is a clever way for the user to teach the True Knowledge system.
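To make that developer workflow concrete, here is a minimal sketch of calling a question answering service over HTTP. The endpoint, parameter names, and response fields below are invented for illustration; the actual True Knowledge API is documented on the company’s developer pages.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint and field names -- placeholders, not the actual
# True Knowledge API, which is documented on the company's developer site.
API_ENDPOINT = "https://api.example.com/answer"
API_KEY = "your-api-key"

def ask(question: str) -> dict:
    """Send a natural language question and return the parsed answer object."""
    params = urllib.parse.urlencode({"api_key": API_KEY, "question": question})
    with urllib.request.urlopen(f"{API_ENDPOINT}?{params}") as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    # A semantic answer engine disambiguates the question, then returns a
    # direct answer plus supporting facts rather than a list of links.
    result = ask("How old is the President of the United States?")
    print(result.get("answer"))
    print(result.get("supporting_facts"))
```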

Monetizing Online: A Keeper for Newspaper CFOs

April 15, 2009

Please, click here to read “One Paper’s Online-Only Move Had Little Effect on Web Traffic, Study Says”, an article that appeared in the Wall Street Journal’s Web log. The article describes the impact of shifting to online from traditional print for a sample of one. You can read a summary of the case example by an academic here. The study reveals that the newspaper suffered a decline in traffic. The assumption of the dead tree crowd was that online traffic would increase because readers of print would become readers of online news. Wrongo. For me the most interesting comment in the WSJ article was this statement:

“It doesn’t make for very pleasant reading,” he [WSJ source] acknowledges. “The Web is a fundamentally different medium, and you have to completely revise your expectations of how your audience is going to use your content if you’re publishing exclusively online.”

For more analysis of online, run a query on this Beyond Search Web log for “mysteries of online”. I posted a series of write ups that bring together observations and findings I have compiled over the last 30 years based on my experience in online. Hint: demographics and habit come into play. MBA thinking often goes off the rails, but my, oh my, are the mavens confident. A zippy search system doesn’t work without traffic. Come to think of it, neither does online advertising.

Stephen Arnold, April 15, 2009

Kickfire: Crunching Data Quickly

April 15, 2009

Reuters published “Kickfire Launches First Analytic Appliance for the Data Warehousing Mass Market” here. On the surface, Kickfire seems to be another player in the data management enabling business. Digging into the company’s Web site here reveals another angle–Kickfire is consumerizing high-end methods. The Reuters article said:

Kickfire’s revolutionary appliance delivers the industry’s highest performance per dollar spent. Starting at just $32,000, it makes the high-end capabilities of commercial database systems available to every organization. Combining the industry’s first SQL chip… and an advanced column-store engine with full ACID compliance, it achieves the industry’s best price/performance based on rigorous benchmark tests and ensures complete data integrity.

In my opinion, the Kickfire approach blends several innovations. First, the company uses proprietary silicon packaged in an appliance. Cloud based consumer business analytics are slowly gaining traction. The Kickfire appliance is a here-and-now solution. Second, the appliance eliminates some (not all) of the headaches associated with scaling for industry standard number crunching methods. The performance of the Kickfire appliance becomes available without choking other enterprise systems. Finally, Kickfire implements some data structure methods that breathe life into traditional Codd tables.
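As a rough illustration of that last point, the sketch below shows the shape of query a column-store analytic engine is built to accelerate: scan one or two columns of a wide fact table and aggregate. The table, the data, and the use of Python’s built-in sqlite3 module as a stand-in for the warehouse connection are all invented for illustration; Kickfire’s own SQL chip and engine are proprietary.

```python
import sqlite3

# Invented example data; sqlite3 stands in for a warehouse connection only so
# the sketch runs anywhere. A column store would read just the 'region' and
# 'amount' columns for this query, which is why it outruns a row store on
# wide tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL, sold_on TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [("east", "widget", 120.0, "2009-03-01"),
     ("west", "widget", 75.5, "2009-03-02"),
     ("east", "gadget", 210.0, "2009-03-02")],
)

for region, total in conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
):
    print(region, total)
```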

Kickfire, a privately-held firm, is backed by blue-chip venture capital firms: Accel Partners, Greylock Partners, The Mayfield Fund and Pinnacle Ventures.

Stephen Arnold, April 15, 2009

The Google: Scores a Big Win

April 15, 2009

The goslings and I have been quite busy at the goose pond today. A happy quack to the reader in the UK who alerted me to ZDNet.co.uk’s story “Virgin to Migrate Customers onto Google Mail.” You can read the story here.

Colin Barker wrote:

The company said the rollout will be one of the largest deployments to date of Google Partner Edition Apps, which lets businesses and individual customers use Google’s communication and collaboration applications under their own domain names.

I think this announcement is a big deal. First, Virgin is a high profile company, and the executive at the top of Virgin gets quite a bit of attention in major companies. Second, this deal makes clear that it makes financial and technical sense for organizations to get out of the email business. Email has become complex and costly. Organizations like Virgin looked at the facts and made a decision to go with Googzilla. Smart choice. If litigation becomes necessary, the GOOG is in the archiving business too. The company doesn’t call much attention to its Postini-centric solution, but it is there and promises to slash the cost of some discovery actions.

What the Gmail deal means to this addled goose is that the Google Apps initiative is going to find increasingly attractive opportunities. Will Virgin stop at email? My hunch is that Virgin will be an interesting Google customer to watch. I give more detail about what can be done with the Google App Engine in my next column in KMWorld.

So, this is a big deal.

Stephen Arnold, April 15, 2009

Google: The Tapeworm

April 15, 2009

I enjoy the New York Times. I find the write ups a good mix of big city hipness and “we know more than you” information about major trends in the world. The editorials are fun too. Having worked at a daily paper and a big magazine publisher, I know how some of the ideas take shape. Outside contributions are useful as well. Getting a chunk of column inches can do wonders for a career or an idea.

I liked “Dinosaur at the Gate” here. The author is Maureen Dowd. She summarizes big media’s view of the GOOG. The image of “tapeworm” was fresh and amusing. I never thought of math as having tapeworm qualities, but I am an addled goose.

The bottom line is that this write up will probably have quite a bit of influence in shaping the dialog about Googzilla, a term I coined when on a panel with a Googler in 2005. The Googler laughed at my image of a smiling Googzilla as did the audience. I used the term affectionately in 2005. Then Googzilla was at the gate. Today Googzilla is in the city, kicking back at TGIF, sipping a mojito.

More about its influence within the core of the information world appears in Google: The Digital Gutenberg here. By the way, Google definitely has some characteristics of middleware, but it is more. Much, much more. I think Google is a positive in today’s information world, and I urge readers to consider “surfing on Google”. If this phrase doesn’t make sense, check out my two Google monographs, dating from 2005 here.

Stephen Arnold, April 15, 2009

Exclusive Interview with MaxxCat

April 15, 2009

I spoke with Jim Jackson on April 14, 2009. MaxxCat is a search and content processing vendor delivering appliance solutions. The full text of the interview appears below:

Why another appliance to address a search and content processing problem?

At MaxxCat, we believe that from the performance and cost perspectives, appliance based computing provides the best overall value. The GSA and Google Mini are the market leaders, but provide only moderate performance at an expensive price point. We believe that by continuously obsessing about performance in the three major dimensions of search (volume of data, speed of retrieval, and crawl/indexing times), our appliances will continue to improve. Software-only solutions cannot match the performance of our appliances. Nor can software-only or general purpose hardware approaches provide the scaling, high availability, or ease of use of a gray-box appliance. From an overall cost perspective, even free software such as Lucene may end up being more expensive than our drop-in and use appliance.

Jim Jackson, MaxxCat

A second factor that is growing more important is the ease of integration of the appliance. Many of our customers have found unique and unexpected uses for our appliances that would have been very difficult to implement with black box architectures like Google’s. Our entry level appliance can be set up in 3 minutes, comes with a quick start guide that is only 12 pages long, and can be administered from two simple browser pages. That’s it! Conversely, software such as Lucene has to be downloaded, configured, installed, understood, and matched with suitable hardware. This is typically followed by a steep learning curve and consulting fees from experts who are involved in getting a working solution, which sometimes doesn’t work, or won’t scale.

But just because the appliance features easy integration, this does not mean that complex tasks cannot be accomplished with it.  To aid our customers in integrating our appliances with their computing environments, we expose most of the features of the appliance to a web API.  The appliance can be started, stopped, backed up, queried, pointed at content, SNMP monitored, and reported upon by external applications.   This greatly eases the burden on developers who wish to customize the output, crawl behavior and integration points with our appliance.  Of course this level of customization is available with open source software solutions, but at what price?  And most other hardware appliances do not expose the hardware and operating system to manipulation.

Throughput becomes an issue eventually. What are the scaling options you offer?

Throughput is our major concern. Even our entry level appliance offers impressive performance using, for the most part, general purpose hardware.  We have developed a micro-kernel architecture that scales from our SB-250 appliance all the way through our 6 enterprise models.  Our clustering technology has been built to deliver performance over a wide range of the three dimensions that I mentioned before.  Some customers have huge volumes of data that are updated and queried relatively infrequently.  Our EX-5700 appliance runs the MaxxCAT kernel in a horizontal, front-facing cluster mode sitting on top of our proprietary SAN; storage heavy, adequate performance for retrieval.  Other customers may have very high search volumes on relatively smaller data sets (< 1 Exabyte). In this case, the MaxxCAT kernel runs the nodes in a stacked cluster for maximum parallelism of retrieval.  Same operating system, same search hardware, same query language, same configuration files etc, but two very different applications. Both heavy usage cases, but heavy in different dimensions.  So I guess the point I am trying to make is that you can say a system scales, but does it scale well in all dimensions, or can you just throw storage on it?  The MaxxCAT is the only appliance that we know of that offers multiple clustering paradigms from a single kernel.  And by the way, with the simple flick of a switch on one of the two administration screens I mentioned before, the clusters can be converted to H/A, with symmetric load balancing, automatic fault detection, recovery and fail over.

Where did the idea for the MaxxCat solution originate?

MaxxCat was inspired by the growing frustration with the intrinsic limitations of the GSA and Google Mini. We were hearing lamentations in the marketplace with respect to pricing, licensing, uptime, performance and integration. So…we seized the opportunity to build a very fast, inexpensive enterprise search capability that was much more open, and easier to integrate using the latest web technologies and general purpose hardware. Originally, we had conceived it as a single stand alone appliance, but as we moved from alpha development to beta we realized that our core search kernel and algorithms would scale to much more complex computing topologies. This is why we began work on the clustering, H/A and SAN interfaces that have resulted in the EX-5000 series of appliances.

What’s a use case for your system?

I am going to answer your question twice, for the same price. One of our customers had an application in which they had to continuously scan literally hundreds of millions of documents for certain phrases as part of a service that they were providing to their customers, and marry that data with a structured database. The solution they had in place before working with us was a cobbled together mishmash of SQL databases, expensive server platforms and proprietary software. They were using MS SQL Server to do full text searching, which is a performance disaster. They had queries that were running on very high end Dell quad core servers maxed out with memory that were taking 22 hours to process. Our entry level enterprise appliance is now doing those same queries in under 10 minutes, but the excitement doesn’t stop there. Because our architecture is so open, they were able to structure the output of the MaxxCAT into SQL statements that were fed back into their application and eliminate 6 pieces of hardware and two databases. And now, for the free, second answer. We are working with a consortium of publishers who all have very large volumes of data, but in widely varying formats, locations and platforms. By using a MaxxCAT cluster, we are able to provide these customers, not customers from different divisions of the same company, but different companies, with unified access to their pooled data. So the benefits in both of these cases are performance, economy, time to market, and ease of implementation.

Where did the name “MaxxCat” come from?

There are three (at least) versions of the story, and I do not feel empowered to arbitrate between the factions. The acronym comes from Content Addressable Technology, an old CS/EE term. Most computer memories work by presenting the memory with an address, and the memory retrieves the content. Our system works in reverse: the system is presented with content, and the addresses are found. A rival group, consisting primarily of Guitar Hero players, claims that the name evokes a double x fast version of the Unix ‘cat’ command (wouldn’t MaxxGrep have been more appropriate?). And the final faction, consisting primarily of our low level programmers, claims that the name came from a very fast female cat, named Max, who sometimes shows up at our offices. I would make as many friends as enemies if I were to reveal my own leanings. Meow.

What’s the product line up today?

Our entry level appliance is the SB-250, and starts at a price point of $1,995.  It can handle up to several million web pages or documents, depending upon size.  None of our appliances have artificial license restrictions based upon silly things like document counts.  We then have 6 models of our EX-5000 enterprise appliances that are configured in ever increasing numbers of nodes, storage, and throughput.  We really try to understand a customer’s application before making a recommendation, and prefer to do proofs of concept with the customer’s actual data, because, as any good search practitioner can tell you, the devil is in the data.

What is the technical approach of your search and content processing system?

We are most concerned with performance, scalability and ease of use. First of all, we try to keep things as simple as possible, and if complexity is necessary, we try to bury it in the appliance, rather than making the customer deal with it. A note on performance: our approach has been to start with general purpose hardware and a basic Linux configuration. We then threw out most of Linux, and built our own operating system that attempts to take advantage of every small detail we know about search. A general purpose Linux machine has been designed to run databases, run graphics applications, handle network routing, sharing and interface to a wide range of devices and so forth. It is sort of good at all of them, but not built from the ground up for any one of them. This fact is part of the beauty of building a hardware appliance dedicated to one function — we can throw out most of the operating system that does things like network routing, process scheduling, user accounting and so forth, and make the hardware scream through only the things that are pertinent to search. We are also obsessive about what may seem to be picayune details to most other software developers. We have meetings where each line of code is reviewed and developers are berated for using one more byte or one more cycle than necessary. If you watch the picoseconds, the microseconds will take care of themselves.

A lot of our development methodology would be anathema to other software firms. We could not care less about portability or platform independence. Object oriented is a wonderful idea, unless it costs one extra byte or cycle. We literally have search algorithms that are so obscure, they take advantage of the endianness of the platform. When we want to do something fast, we go back to Knuth, Salton and Hartmanis, rather than reading about the latest greatest on the net. We are very focused on keeping things small, fast, and tight. If we have a choice between adding a feature or taking one out, it is nearly unanimous to take it out. We are all infected with the joy of making code fast and small. You might ask, “Isn’t that what optimizing compilers do?” You would be laughed out of our building. Optimizing compilers are not aware of the meta algorithms, the operating system threading, the file system structure and the like. We consider an assembler a high level programming tool, sort of. Unlike Microsoft operating systems, which keep getting bigger and slower, we are on a quest to make ours smaller and faster. We are not satisfied yet, and maybe we won’t ever get there. Hardware is changing really fast too, so the opportunities continue.

How has the Google Search Appliance affected the market for your firm’s appliance?

I think that the marketing and demand generation done by Google for the GSA is helping to create demand and awareness for enterprise search, which helps us.  Usually, especially on the higher end of the spectrum, people who are considering a GSA will shop a little, or when they come back with the price tag, their boss will tell them “What??? Shop This!”. They are very happy when they find out about us.  What we share with Google is a belief in box based search (they advocate a totally closed black box, we have a gray box philosophy where we hide what you don’t need to know about, but expose what you do).  Both of our companies have realized the benefits of dedicating hardware to a special task using low cost, mass produced components to build a platform.   Google offers massive brand awareness and a giant company (dare I say bureaucracy).  We offer our customers a higher performing, lower cost, extensible platform that makes it very easy to do things that are very difficult with the Google Mini or GSA.

What hooks / services does your API offer?

Every function that is available from the browser based user interface is exported through the API. In fact, our front end runs on top of the API, so customers who are so inclined could rewrite or re-organize the management console. Using the API, detailed machine status can be obtained. Things such as core temperature, queries per minute, available disk space, current crawl stats, errors and console logs are all at the user’s fingertips. Furthermore, collections can be added, dropped, scheduled and downloaded through the API. Our configuration and query languages are simple, text based protocols, and users can use text editors or software to generate and manipulate the control structures. Don’t like how fast the MaxxCAT is crawling your intranet, or when? Control it with external scheduling software. We don’t want to build that and make you learn how to use it. Use Unix cron for that if that’s what you like and are used to. For security reasons, do you want to suspend query processing during non-business hours? No problem. Do it from a browser or do it from a mainframe.
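To show what “do it from a browser or do it from a mainframe” can look like in practice, here is a minimal sketch of scripting an appliance over HTTP. The host name, paths, and JSON field names are invented for illustration; MaxxCat’s own API documentation is the authority on the real calls.

```python
import json
import urllib.request

# Hypothetical appliance address and endpoints -- invented for illustration,
# not MaxxCat's documented API.
APPLIANCE = "http://search-appliance.example.local"

def get_status() -> dict:
    """Fetch machine status: temperature, queries per minute, disk, crawl stats."""
    with urllib.request.urlopen(f"{APPLIANCE}/api/status") as resp:
        return json.loads(resp.read().decode("utf-8"))

def suspend_queries() -> None:
    """Suspend query processing, e.g. from a cron job outside business hours."""
    request = urllib.request.Request(f"{APPLIANCE}/api/queries/suspend", method="POST")
    urllib.request.urlopen(request)

if __name__ == "__main__":
    status = get_status()
    print("queries/min:", status.get("queries_per_minute"))
    print("core temp:", status.get("core_temperature"))
```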

We also offer a number of protocol connectors to talk to external systems — HTTP, HTTPS, NFS, FTP, ODBC. And we can import the most common document formats, and provide a mechanism for customers to integrate additional format connectors. We have licensed a very slick technology for indexing ODBC databases. A template can be created to generate pages from the database, and the template can be included in the MaxxCAT control file. When it is time to update, say, the invoice collection, the MaxxCAT can talk directly to the legacy system and pull the required records (or those that have changed or any other SQL selectable parameters), and format them as actual documents prior to indexing. This takes a lot of work off of the integration team. Databases are traditionally tricky to index, but we really like this solution.

With respect to customizing output, we emit a standard JSON object that contains the result and provide a simple templating language to format those results.  If users want to integrate the results with SSIs or external systems, it is very straightforward to pass this data around, and to manipulate it.  This is one area where we excel against Google, which only provides a very clunky XML output format that is server based, and hard to work with.  Our appliance can literally become a sub-routine in somebody else’s system.
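A brief sketch of what consuming such a JSON result object looks like from the integrator’s side. The field names below are invented; the point is only that a JSON payload can be reshaped for a downstream system in a few lines, which is what turning the appliance into “a sub-routine in somebody else’s system” means in practice.

```python
import json

# Invented result payload with made-up field names, standing in for the JSON
# object a search appliance might emit for a query.
raw = """
{
  "query": "invoice 2009 acme",
  "total": 2,
  "results": [
    {"title": "Invoice 10234", "url": "http://intranet/inv/10234", "score": 0.97},
    {"title": "Invoice 10117", "url": "http://intranet/inv/10117", "score": 0.85}
  ]
}
"""

payload = json.loads(raw)
for hit in payload["results"]:
    # Reformat each hit for a downstream system -- a report, an SQL insert,
    # or a server-side include -- without touching the appliance itself.
    print(f"{hit['score']:.2f}  {hit['title']}  {hit['url']}")
```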

What are new features and functions added since the last point release of your product?

Our 3.2 OS (not yet released) will provide improved indexing performance, a handful of new API methods, and most exciting for us, a template based ODBC extractor that should make pulling data out of SQL databases a breeze for our customers.   We also have scheduled toggle-switch H/A, but that may take a little more time to make it completely transparent to the users.

Consolidation and vendors going out of business like SurfRay seem to be a feature of the search sector. How will these business conditions affect your company?

Another strange thing about MaxxCAT, in addition to our iconoclastic development methods, is our capital structure. Unlike most technology companies, especially young ones, we live off of revenue, not equity infusions. And we carry no debt. So we are somewhat insulated from the current downturn in the capital markets, and intend to survive on customers, not investors. Our major focus is to make our appliances better and faster. Although we like to be involved in the evaluation process with our customers, in all but the most difficult of cases, we prefer to hand off the implementation to partners who are familiar with our capabilities and who can bring in-depth enterprise search know how into the mix.

Where do I go to get more information?

Visit www.maxxcat.com or email sales@maxxcat.com

Stephen Arnold, April 15, 2009

Google and Its Red Ink Geyser

April 15, 2009

Internet Evolution’s David Silversmith wrote “Google Losing up to $1.65M a Day on YouTube”. You can read it here. I would have slapped on the title “So You Want to Be a Video Search Service?” I am not sure if the numbers are spot on. Talk about Google losing $400 million a year or more has been floating around for quite a while. The point is that it is expensive to acquire video, host it, index it, and serve it. Not even Googzilla can deal with these costs. Hence, the new love birds: Googzilla and Universal.

Stephen Arnold, April 15, 2009

The Data Management Mismatch

April 15, 2009

I used to play table tennis in tournaments. Because table tennis is not America’s game, I found myself in matches with folks from other countries. I recall one evening in FAR on the Chambana campus I faced a fit Chinese fellow. We decided to hit a few and then play a match. In about 10 seconds, I realized that fellow was a squash player, and he had zero chance against me. There are gross similarities between squash and table tennis, but the context of each game is very different.

That’s the problem with describing one thing (ping pong) in terms of another (squash). The words look similar, and to the naive, the words may mean the same thing.

Now the data management mismatch. You can read a summary of a “controversial” report that pits the aging Codd database against the somewhat more modern MapReduce system. I describe the differences in my 2005 study The Google Legacy, and I won’t repeat them here.

Eric Lai’s “Researchers: Databases still beat Google’s MapReduce” here provides a good summary of this alleged face off. I am not going to dig into the entrails of this study nor the analysis by Mr. Lai. I do want to highlight this passage which caught my attention:

Databases “were significantly faster and required less code to implement each task, but took longer to tune and load the data,” the researchers write. Database clusters were between 3.1 and 6.5 times faster on a “variety of analytic tasks.” MapReduce also requires developers to write features or perform tasks manually that can be done automatically by most SQL databases, they wrote.

The paragraph makes clear that according to the wizards who ran this study, the Codd style database has some mileage left in its engine. I agree. In fact, I think some of the gurus at Google would agree as well.

What’s going on here is that the MapReduce system works really well for Google-scale, Google-type data operations for search and closely allied functions. When Googlers want to crunch on a result set, they fire up a Codd database, MySQL for example, and do their thing.

Codd style databases can jump through hoops as well. But the point is that MapReduce makes certain types of large dataset tasks somewhat less costly to kit out.
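To make the trade-off concrete, here is a toy word count written MapReduce style in plain Python. It is not Google’s MapReduce code, just a sketch of the division of labor: the developer writes the map and reduce steps by hand, where a Codd style database would express the same aggregation declaratively, something like SELECT word, COUNT(*) FROM docs GROUP BY word.

```python
from collections import defaultdict

# Toy documents; the classic word-count example used to explain MapReduce.
documents = ["google crunches data", "databases crunch data", "data wins"]

def map_phase(doc: str):
    """Mapper: emit a (word, 1) pair for every word in a document."""
    for word in doc.split():
        yield word, 1

def reduce_phase(pairs):
    """Reducer: sum the counts for each distinct word."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# In a real cluster the pairs would be shuffled across machines between the
# two phases; here they simply flow through a generator.
pairs = (pair for doc in documents for pair in map_phase(doc))
print(reduce_phase(pairs))  # e.g. {'google': 1, 'crunches': 1, 'data': 3, ...}
```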

I don’t think this is an either or. My research suggests that there is a growing interest in different types of data management systems. There are clever systems emerging from a number of companies. I have written about InfoBright, for instance.

I wrote a white paper with Sue Feldman which focused on a low profile Google project to tame dataspace. The notion is a step beyond Codd and MapReduce, yet dataspace has roots and shoots in both of these systems.

What we have is a mismatch. The capabilities of the systems are different. If I were to play the Chinese table tennis star in my basement, I would probably win. He would knock himself out on the hot water pipe that dips exactly where he steps to hit a forehand.

The context of the data management problem and the meaning of the words make a difference. Use the system that solves the problem.

Stephen Arnold, April 15, 2009

Lou Rosenfeld on Content Architecture

April 15, 2009

Editor’s Note: The Boye 09 Conference in Philadelphia takes place the first week in May 2009, May 5 to May 7, 2009, to be precise. Attendees can choose from a number of special interest tracks. These include strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. You can get more information about this conference here. One of the featured speakers is Lou Rosenfeld. You can get more information here. Janus Boye spoke with Mr. Rosenfeld on April 14, 2009. The full text of the interview appears below.

Why is it so hard for organizations to get a grip on user experience design?

Because UX is an interdisciplinary pursuit. In most organizations, the people who need to work together to develop good experiences–designers, developers, content authors, customer service personnel, business analysts, product managers, and more–currently work in separate silos. Bad idea. Worse, these people already have a hard time working together because they don’t speak the same language.

Once you get them all in the same place and help them to communicate better, they’ll figure out the rest.

Why is web analytics relevant when talking about user experience?

Web sites exist to achieve goals of some sort. UX people, for various reasons, rely on qualitative research methods to ensure their designs meet those goals. Conversely, Web analytics people rely on quantitative methods. Both are incomplete without the other – one helps you figure out what’s going on, the other why. UX and WA folks are two more groups that need help communicating; I’m hoping my talk in some small way helps them see how they fit together.

Is your book “Information Architecture for the World Wide Web” still relevant 11 years later?

Nah, not the first edition from 1998. It was geared toward developing sites–and information architectures–from scratch. But the second edition, which came out in 2002, was almost a completely new book, much longer and geared toward tuning existing sites that were groaning under the weight of lots of content: good and bad, old and new. The third edition–which was more of a light update–came out in 2006. I don’t imagine information architecture will ever lose relevance as long as there’s content. In any case, O’Reilly has sold about 130,000 copies, so apparently they think our book is relevant.

Does Facebook actually offer a better user experience after the redesign?

I really don’t know. I used to find Facebook an excellent platform for playing Scrabble, but thanks to Hasbro’s legal department, the Facebook version of Scrabble has gone the way of all flesh. Actually, I think it’s back now, but I’ve gotten too busy to fall again to its temptation.

Sorry, that’s something of an underhanded swipe at Facebook. But now, as before, I find it too difficult to figure out. I have a hard time finding (and installing) applications that should be at my fingertips. I’m overwhelmed – and, sometimes, troubled–by all the notifications which seem to be at the core of the new design. I’d far prefer to keep up with people via Twitter (I’m @louisrosenfeld), which actually integrates quite elegantly with the other tools I already use to communicate, like my blog (http://louisrosenfeld.com) and email. But I’m the wrong person to ask. I’m not likely Facebook’s target audience. And frankly, my opinion here is worth what you paid for it. Much better to do even a lightweight user study to answer your question.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Because they asked so nicely. And because I hope that someday they’ll bring me to their Danish event, so I can take my daughter to the original Legoland.

Janus Boye, April 15, 2009
