IBM and Standards

September 24, 2008

The headline “IBM May Quit Technology Standards Body”, if accurate, marks an important change in direction at IBM. The article here cites the Wall Street Journal, which asserts:

IBM has become frustrated by what it considers opaque processes and poor decision-making at some of the hundreds of bodies that set technical standards for everything from data-storage systems to programming languages…

In my opinion, IBM’s embrace of open source was a useful endorsement of the movement. The Eclipse Foundation owes IBM a debt which it may not be able to repay. Now, IBM and other super platforms may be shifting back to the good, old, and lucrative days of walled gardens. In a world distorted by Google’s gravitational pull, companies like IBM have to protect their assets. Standards may be a problem, not a solution. The losers? I think it will be small fish like me.

Stephen Arnold, September 24, 2008

Google and Sparse Tables

September 24, 2008

Google received a patent for an invention crafted by some of the firm’s most wizardly wizards: Jeff Dean, Sanjay Ghemawat, and Andrew Fikes, among others. US7,428,524 is a plumbing invention with the helpful title “Large Scale Data Storage in Sparse Tables”. The abstract states:

Each of a plurality of data items is stored in a table data structure. A row identifier and column identifier are associated with each respective data item, and each respective item is stored at a logical location in the table data structure specified by its row identifier and column identifier. A plurality of data items is stored in a cell of the table data structure, and a timestamp is associated with each of the plurality of data items stored in the cell. Each of the data items stored in the cell has the same row identifier, the same column identifier, and a distinct timestamp. In some embodiments, each row identifier is a string of arbitrary length and arbitrary value. Similarly, in some embodiments each column identifier is a string of arbitrary length and arbitrary value.

Don’t let the fuzzy legalese put you to sleep. This is a key infrastructure invention, one more paving stone in the Roman road Google is building straight to the heart of enterprise data management and high performance consumer facing services. You can obtain a copy from the USPTO’s Web site here.
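
The mechanics are easier to see in code than in legalese. Here is a minimal Python sketch of the cell model the abstract describes: arbitrary string row and column identifiers, and multiple timestamped values stored in a single (row, column) cell. The class and method names are mine, purely for illustration; this is not Google’s code.

```python
from collections import defaultdict
import time

class SparseTable:
    """Toy model of the patent's cell structure: each (row, column) cell
    can hold multiple values, one per distinct timestamp."""

    def __init__(self):
        # cells[(row_id, column_id)] maps timestamp -> value
        self.cells = defaultdict(dict)

    def put(self, row_id, column_id, value, timestamp=None):
        # Row and column identifiers are arbitrary strings, per the abstract.
        ts = timestamp if timestamp is not None else time.time()
        self.cells[(row_id, column_id)][ts] = value

    def get(self, row_id, column_id, timestamp=None):
        versions = self.cells.get((row_id, column_id), {})
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]          # newest version
        eligible = [ts for ts in versions if ts <= timestamp]
        return versions[max(eligible)] if eligible else None

# Usage: two versions of the same cell; a plain read returns the newest one.
table = SparseTable()
table.put("com.example.www", "contents:html", "<html>v1</html>", timestamp=1.0)
table.put("com.example.www", "contents:html", "<html>v2</html>", timestamp=2.0)
print(table.get("com.example.www", "contents:html"))   # -> "<html>v2</html>"
```

In this sketch a read without a timestamp returns the newest version, and a read with a timestamp returns the newest version at or before that time; the versioned-cell idea is what the abstract is describing, even if Google’s read semantics differ in detail.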

 

Stephen Arnold, September 24, 2008

SharePoint Thesaurus Joy

September 24, 2008

I heard more about SharePoint than Google today at the Enterprise Search Summit. Like it or not, SharePoint plays a prominent role in the world of enterprise information management. The Microsoft Enterprise Search Web log added some joy to my otherwise dreary day. The article “How to Customize the Thesaurus in SharePoint Search and Search Server” is a useful read. You can access the essay, published on September 23, 2008, here. The article includes an explanation, a code sample, and useful notes. Highly recommended.

Stephen Arnold, September 24, 2008

A Head in the Clouds

September 23, 2008

Disclaimer: this is a live, on the fly post written during a talk. I may edit it later.

I wormed my way into Werner Vogels’ keynote at the Streaming Media conference in San Jose, California. The title of this Web log post is not precisely what Dr. Vogels typed on his title slide. He offered “Ahead in the Clouds”, and the idea is that Amazon is leaving Google, Salesforce.com, and others like Apple behind. My version of the title makes clear my skepticism about some of the cloud initiatives for people my age. I know that those under 20 in body and mind see the cloud as freedom from the era of clunky PCs, weird laptops, and other assorted access devices. I don’t want to be free of my computing infrastructure, but I want to learn. I’m perched on a metal chair with an open mind. I want to capture two or three ideas from Dr. Vogels and then offer my own comments. If you want a complete summary of his remarks, look for Web log postings from “real” journalists; I’m the addled goose, not a human tape recorder.

First, the subtitle of the talk is “The Power of Infrastructure as a Service”. I think I understand, but I wonder what happens if I have lousy bandwidth and the service crashes. As I await the lecture, I remind myself that uptime and stability are often a work in progress, even at Amazon. In the back of my mind is the hunch that getting customers to rent the infrastructure Amazon needs anyway to deliver its ecommerce services is a financial angle first and a substantive revenue generator second. I wonder how far Amazon Web Services’ revenue is from Amazon’s retail revenue. If I remember, I will try to find this number.

Second, it looks to me as if this keynote is outpulling the other two going on at the same time. Amazon is a much bigger “name brand” than the consultants and software vendors competing for an audience. I estimate the crowd at about 150. You can buy an audio version of Dr. Vogels’ talk at www.streamingmedia.com. No information about the cost or who can buy the talks.

Third, a video is running with quite a few Microsoft centric folks in the images. The site referenced is Animoto. It is not clear if this is an Amazon-allied enterprise. Animoto is a music matching type site, and it runs on Amazon Web Services. I wonder if Amazon is defraying some of the fees for a share in the company. Animoto is using all of Amazon’s Web services, so it’s a smart start up. Note: the sound system is making it tough for me to parse Dr. Vogels’ speech. Animoto, if I heard correctly, delivers its users instant audio gratification.

Fourth, a slide of instance usage shows a steady rise over time. I can’t make out the y axis or the x axis. The slide shows Animoto’s usage over time. The company can handle 35,000 customers per hour. Amazon made additional resources available. At start up, Animoto used 50 servers; now the work is spread over 5,000 servers. The scaling is automatic, and Animoto is happy. The pay off is that Animoto’s capital expense is minimized.
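
To make the “scaling is automatic” claim concrete, here is a minimal sketch of elastic scaling expressed with today’s boto3 SDK for EC2 Auto Scaling. This tooling did not exist in this form in 2008, and every name and number below is my own illustration, not Animoto’s or Amazon’s actual configuration.

```python
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

# Create a group that can grow from a handful of servers to thousands.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="video-render-fleet",        # hypothetical name
    LaunchConfigurationName="video-render-config",    # assumed to exist already
    MinSize=50,
    MaxSize=5000,
    AvailabilityZones=["us-east-1a", "us-east-1b"],
)

# Scale out and in automatically around a CPU utilization target,
# so capacity (and cost) tracks actual demand instead of peak provisioning.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="video-render-fleet",
    PolicyName="track-cpu-60-percent",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 60.0,
    },
)
```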

Fifth, most of the people in the room are Amazon customers. Now the meat of the talk–the technology side of Amazon. Amazon is a technology company. Technology is at the heart of Amazon. “We just happen to do retail,” says Dr. Vogels. Another graph shows the bandwidth demand over time. This is a hockey stick graph. Amazon Web Services is consuming more bandwidth than “regular retail” Amazon. I wonder how the telecommunications costs work out. The current slide shows the growth in Amazon developers. Now the company has 400,000 developers. The next graph shows a diagram that looks like a picture of an atom exploding. I am not sure what the graph depicts. There are no data and no labels on the chart. “It took us 10 to 12 years to get this Amazon architecture right.” My recollection is that Dr. Vogels joined the company more recently. I will have to look up his date of joining.

Sixth, Dr. Vogels is showing a list of the costly “heavy lifting” that Amazon has done. The idea is that AWS is a “shared services platform”. The infrastructure services scale up and down and are “highly reliable.” I wonder if uptime data will be available in this talk or on the Amazon Web site. The last time I looked, I could not find hard data to support these assertions about uptime.

Seventh, the services are now available as a content delivery network. This will be a “pay as you go” service. One benefit is scaling up and scaling down. The down scaling takes place in a matter of minutes. Amazon has “spent billions of dollars over ten years to create the infrastructure.” No data provided on total investment.

Eighth, the AWS story is the core of this presentation because it holds down production costs and it is a distribution medium. Companies in the media business want to hold down costs and get distribution. New services can be enabled. The idea is that it is easier and cheaper to build a successful business using AWS.

Ninth, the four stages of a media business are produce, encode, distribute, and archive, and AWS can play a role in each stage. Dr. Vogels is going through companies using AWS to deliver their media services. The Web site names are unfamiliar to me, and there is no labeling of the sites on the PowerPoint slides. These AWS customers get “extremely high reliability services”. One site is RenderRocket.com. AWS provides capacity to this company. Vimeo.com uses AWS; the site is a social video site. The Indy Racing League uses AWS. IRL shows videos, delivers commentary, and offers community services. IRL reported a 50 percent savings using AWS. No figures provided. Another video example: this site allows the user to view the scene by selecting different cameras. Panda Video is an open source community video service. The video sites are hard for me to differentiate. The message is clear: lots of buyers, reliable service, and more economical than other options. No data on the specific charges for services and bandwidth.

Tenth, the “billions of objects in Amazon S3”. This slide shows growth, but there is no definition of an object, so the slide is floating without concrete back up. I now want more substance, not just a run through of small sites using AWS. I guess I am showing my age.
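
For readers who have not touched S3, the “object” on that slide is just a blob of bytes stored under a bucket name and a key. Here is a minimal sketch using today’s boto3 SDK; the bucket and key names are mine, purely for illustration.

```python
import boto3

s3 = boto3.client("s3")

# Store an object: a bucket name plus an arbitrary key identifies the blob.
s3.put_object(
    Bucket="example-media-bucket",                    # hypothetical bucket
    Key="videos/2008/keynote-clip.flv",               # hypothetical key
    Body=b"example bytes standing in for a video",
)

# Retrieve it later, from anywhere, with the same bucket and key.
response = s3.get_object(
    Bucket="example-media-bucket",
    Key="videos/2008/keynote-clip.flv",
)
data = response["Body"].read()
print(f"Fetched {len(data)} bytes")
```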

Stephen Arnold, September 23, 2008

Information Overload Is a Filter Problem

September 23, 2008

I just clicked through some of the hundreds of posts about the Google Android phone. I was startled by the redundancy in the posts. There were some useful items buried in the flood of messages, but none of these added to my understanding of this Google initiative.

Bored and underwhelmed, I turned my attention to other information snagged by my newsreader. My eye was hooked by the video of Clay Shirky’s keynote at Web 2.0 Expo during the week of September 15. You can watch the talk here. The snippet that caught my attention was this remark:

Privacy is a way of managing information flow. The inefficiency of information flow wasn’t a bug, it was a feature.

I found this comment somewhat disturbing even though I agree with most of Mr. Shirky’s comments. Here’s what troubled me:

  1. Information flows at today’s volumes are largely unexpected and not fully understood. As a result, most organizations and experts don’t know how to address the issues of data flow scale in a helpful manner. Social “voting”, old fashioned key word filters, and zingy visualizations of hot spots can help, but they can also give users a false sense of confidence, as the small sketch after this list suggests. That’s risky.
  2. Enterprise and consumer systems are mostly toys in that only a handful of services can operate at petabyte scale. Even mid sized businesses are struggling with terabyte flows, and most tools are not very good, economical, or easy to use. I am concerned about the assumption that these systems deliver good enough solutions. I don’t think they do. Example: the financial crashes caused by flawed models’ inability to pinpoint the significant data fed into them. A trillion dollar mistake strikes me as a relatively big problem.
  3. Social media is one tool, and it is [a] not understood, [b] immature, and [c] chock full of potential weaknesses. Many of these issues–such as security–will be addressed over time. For now, I think the risks in regulated companies may outweigh the benefits. Another silver bullet shifts the focus from problem solving to a quick fix.
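
To illustrate the false confidence point above, here is a toy Python sketch of an old fashioned keyword filter. The headlines and watch list are invented; the point is that whatever does not match the filter silently disappears, whether or not it mattered.

```python
# Toy keyword filter: anything that does not match the watch list is dropped.
headlines = [
    "Android phone announced by Google and T-Mobile",
    "Regional bank quietly restates its mortgage exposure",   # the item that matters
    "Google Android SDK updated",
    "Android launch party photos",
]

watch_list = {"android", "google"}

def keep(headline: str) -> bool:
    words = {word.strip(",.").lower() for word in headline.split()}
    return bool(words & watch_list)

kept = [h for h in headlines if keep(h)]
dropped = [h for h in headlines if not keep(h)]

print("Kept:", kept)        # three redundant Android items
print("Dropped:", dropped)  # the one headline that signaled real risk
```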

The final issue I have is that I don’t have an answer to this question: “When I don’t know what I need to answer my question, what do I filter in and out?” Information does not behave like some other human constructs. For example, a doctor who misdiagnoses a problem, prescribes the wrong treatment, and assumes her solution is the right one can injure, maybe kill, a patient. The doctor filtered information, but the decision was not optimal.

I am not yet convinced that these “social” trends in information will do much to alleviate the severe information problems that face most organizations. I am certainly not trendy, and I need to see tangible evidence that the payoffs are substantive, not just another wagon load of baloney sold to pump cash into vendors’ threadbare pockets.

Stephen Arnold, September 23, 2008

Knol Understanding

September 23, 2008

Farhad Manjoo’s Slate essay “Why Google’s Online Encyclopedia Will Never Be as Good as Wikipedia” takes a somewhat frosty stance toward Knol. You can read his interesting essay here. For me the most significant point was this one:

Knol is a wasteland of such articles: text copied from elsewhere, outdated entries abandoned by their creators, self-promotion, spam, and a great many old college papers that people have dug up from their files. Part of Knol’s problem is its novelty. Google opened the system for public contribution just a couple months ago, so it’s unreasonable to expect too much of it at the moment; Wikipedia took years to attract the sort of contributors and editors who’ve made it the amazing resource it is now.

I agree. Knol is one of those Google products that appear and then seem to have little or no overt support. I would like to make three comments:

  1. Knol may be a way for Google to get content first for itself and only secondarily for its users. Google wants information, and Knol is a different mechanism for information acquisition. Assuming that it is a Wikipedia clone may be only partially correct.
  2. Knol, like many other Google services, does not appear to have a champion. As a result, Knol evolves slowly or not at all. Knol may be another way for Google to determine interest, learn about authors who are alleged experts, and determine if submitted content validates or invalidates other data known to Google.
  3. Knol may be part of a larger grid or data ecosystem. As a result, looking at it out of context and comparing it to a product with which it may not be designed to compete might be a partially informed approach.

Based on my analysis of the Google JotSpot acquisition and the still youthful Knol service, I’m not prepared to label Knol or describe it as either a success or a failure. In my opinion, Knol is a multi purpose beta. Its principal value may be in the enterprise, not the consumer space. But for me, I have too little data and an incomplete understanding of how the JotSpot “plumbing” is implemented; therefore, I am neutral. What’s your view?

Stephen Arnold, September 23, 2008

Amazon Oracle in Cloud Services Play

September 23, 2008

Amazon, the company run by the world’s smartest man, has aced Google again. Amazon’s information technology budget is a fraction of Google’s, yet over the last three years Amazon has beaten Google to the punch when it comes to cloud computing. Based on this article on the Amazon Web Services Web log, Amazon is now offering Oracle database services on the AWS platform. Jeff Bezos has either a sixth sense or a heck of a Google technology watching operation in place. Amazon has moved more quickly than Google to deliver cloud services that Google * could * have delivered but did not. For example, the work to worker service called MT or Mechanical Turk aced the GOOG. The Amazon storage service beat the GOOG to the market. The elastic cloud service was first out of the gate. Now Google, despite its far greater technical horsepower and information technology budget, must watch and learn from Amazon’s Oracle deal. I recall reading somewhere that at the core of Amazon beats the aging but reliable Oracle database. I don’t know if this is true any longer, but I was not expecting this type of deal. Amazon has been making noise with Linux and open source plus some stealth graduate students from European universities. Oracle was a bolt from the blue for me.

Will Oracle prove to be cloudable? Probably, but I anticipate some latency issues. Developers who assume that Oracle’s tricks can be learned on the fly are likely to create some problems. Most of these will be worked out in time.
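
As a crude way to see the latency question for yourself, here is a sketch that times round trips to an Oracle instance running on EC2 using cx_Oracle, one common Python driver for Oracle. The host name, credentials, and service name are hypothetical, and this is my illustration, not anything Amazon or Oracle ships.

```python
import time
import cx_Oracle

# Hypothetical Oracle instance running on an EC2 host.
conn = cx_Oracle.connect(
    user="app_user",
    password="app_password",
    dsn="ec2-203-0-113-10.compute-1.amazonaws.com:1521/ORCL",
)
cursor = conn.cursor()

# Time a handful of trivial queries to get a feel for network round trip latency.
samples = []
for _ in range(10):
    start = time.perf_counter()
    cursor.execute("SELECT 1 FROM dual")
    cursor.fetchall()
    samples.append(time.perf_counter() - start)

print(f"median round trip: {sorted(samples)[len(samples) // 2] * 1000:.1f} ms")

cursor.close()
conn.close()
```

Chatty applications that make many small queries will feel this per-query latency far more than applications that pull data in large batches, which is one reason “lift and shift” Oracle workloads can disappoint in the cloud.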

The larger question is, “What will Google do?” My research turned up some data, unfortunately not definitive, suggesting that Google could offer a cloud based enterprise data management service. Google has the plumbing. Its patent documents reveal nifty technology to allow an enterprise to “hook” into the Google infrastructure and use Google services to crunch data. Google has the next generation data management tools that many organizations need at a time when data volume threatens to choke existing database systems. Will Google act? Frankly, I’m not sure.

Here are my thoughts about this surprising Amazon move:

  1. Google either has to take action to position itself against Amazon, a company defining the cloud service space for some developers, or be content to be a follower. Google, in fact, may be acquiring some of Microsoft’s market methods, which may be both good and bad.
  2. Amazon has to make the Oracle service work. Cooking up an S3 or EC2 is one thing. Delivering Oracle services is another. Amazon has a spotty record with regard to stability and uptime. A flop might open the door for a competitor to supplant Amazon. Google could exploit such an Amazon stumble, but the company seems to have a fuzzier view of the enterprise market and may not be able to act quickly with regard to Amazon.
  3. The Amazon aggressiveness might force Google to buy Salesforce.com, deal with the programming issues, and use Salesforce.com’s marketing position as a launch pad in an attempt to wrest momentum from Amazon.

You can read a different take on this Amazon development in Larry Dignan’s “Amazon Adds Oracle Support to EC2” here.

What’s clear to me is that Amazon has raised the stakes for Google in cloud computing services.

Stephen Arnold, September 23, 2008

SharePoint: Improving Performance

September 23, 2008

In my opinion, SharePoint is a slow poke. Among the reasons:

  • SQL Server bottlenecks
  • My old pal IIS
  • Churning on complex pages, which experience latency because the needed data are scattered far and wide across the SharePoint landscape.

In what has to be the most amazing description of sluggish performance, Microsoft has released “SharePoint Performance Optimization: How Microsoft IT Increases Availability and Decreases Rendering Time of SharePoint Sites”. This is a 27 page Word document, which I was able to download here.

I scanned the white paper. I did not dig through it. The good stuff appears after the boilerplate about how to find out what part of the SharePoint system is the problem. In my experience, it’s not “one part”. Performance issues arise when there are lots of users, complex “sites”, and when some of the other required servers are tossed into the stew.
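
Before digging into the white paper’s counters and tooling, a quick external measurement can tell you whether page rendering is the bottleneck at all. Below is a minimal probe sketch; the URLs are hypothetical, authentication is omitted, and this is not the methodology Microsoft describes in the document.

```python
import statistics
import requests

# Hypothetical SharePoint pages to probe; add whatever auth your farm requires.
pages = [
    "http://sharepoint.example.com/sites/sales/default.aspx",
    "http://sharepoint.example.com/sites/engineering/default.aspx",
]

for url in pages:
    timings = []
    for _ in range(5):
        response = requests.get(url, timeout=30)
        # elapsed measures time until the response headers arrive,
        # a rough proxy for server-side render time.
        timings.append(response.elapsed.total_seconds())
    print(f"{url}: median {statistics.median(timings):.2f}s "
          f"worst {max(timings):.2f}s status {response.status_code}")
```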

A happy quack to Nick MacKechnie who pointed to this Microsoft white paper in his Web log here.

Stephen Arnold, September 23, 2008

VideoSurf: Video Metasearch

September 23, 2008

I received an invitation to preview VideoSurf, a video metasearch provider, based in San Mateo, California. I tested the system whilst recovering from my wonderful Northwest Airlines flight from Europe to the US of A. When I fired up my laptop with the high speed Verizon service, I couldn’t get the video to run. When I switched to a high speed connection in my office, the search results were snappy and the videos I viewed ran without a hitch. Nice high speed network, Verizon.

The system offers a number of useful features:

  • When I misspelled Google, the system offered a “did you mean” prompt to fix up my lousy typing.
  • A handy checkbox in the left hand column allowed me to exclude certain video sites from the query. I noticed that the “world’s largest video search engine” Blinkx was not included.
  • There’s a porn/no porn filter, which you can use to turn porn on or off. However, when I ran my test query “teen dancing” on the non porn setting, I got some pretty exciting videos in my result set. I watched only a few seconds of gyrations, but that was enough to conclude that the non porn filter needs some fine tuning.

VideoSurf analyzes the contents of video. Most video search engines work with metadata and closed caption information. Googzilla, not surprisingly, has introduced its own technology to index the audio content of files. For now, I thought VideoSurf was useful for general purpose queries. I did not find it as helpful for locating Google lectures at universities or for pinpointing presentations given at various Microsoft events. But it’s early days for the service.
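
VideoSurf’s computer vision is proprietary, so I can only gesture at the general idea of indexing what is in the frames rather than the metadata. Here is a toy sketch using OpenCV: sample frames from a clip and compute a crude perceptual hash for each, which an index could then match against other clips. The file name is invented, and this is in no way VideoSurf’s method.

```python
import cv2  # OpenCV

def frame_hashes(path, every_n_frames=30):
    """Sample frames and return a crude 64-bit average hash per sampled frame."""
    capture = cv2.VideoCapture(path)
    hashes, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % every_n_frames == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            small = cv2.resize(gray, (8, 8))          # 64 pixels
            threshold = small.mean()
            bits = 0
            for pixel in small.flatten():
                bits = (bits << 1) | int(pixel > threshold)
            hashes.append(bits)
        index += 1
    capture.release()
    return hashes

# Visually similar scenes tend to produce hashes with small Hamming distance.
print(frame_hashes("sample_clip.mp4")[:5])  # hypothetical local file
```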

[Screenshot: VideoSurf results screen for the test query “Bill Gates”]

This is what I saw when I ran my test query “Bill Gates”.

The company says here:

VideoSurf has created a better way for users to search, discover and watch online videos. Using a unique combination of new computer vision and fast computation methods, VideoSurf has taught computers to “see” inside videos to find content in a fast, efficient, and scalable way. Basing its search on visual identification, rather than text only, VideoSurf’s computer vision video search engine provides more relevant results and a better experience to let users find and discover the videos they really want to watch. With over 10 billion (and rapidly growing!) visual moments indexed from videos found across the web, VideoSurf allows consumers to visually navigate through their results to easily find the specific scenes, people or moments they most want to see. Users can now spend less time searching and more time being entertained! VideoSurf was founded in 2006 by leading experts in search, computer vision and fast computation technology and aims to become the destination for users looking to find, discover and watch online videos. The company is based in San Mateo, California.

The company was founded by Lior Delgo of FareChase.com fame. The technical honcho is Achi Brandt, who is a certified math whiz. The rest of the company’s management team is here.

The service merits a closer look.

Stephen Arnold, September 23, 2008

Autonomy: Compliance Initiative

September 23, 2008

Autonomy bought Zantaz in July 2007 for $375 million. The company continues to enrich its compliance line of services. For example, Autonomy has been quick to roll out services for markets that need information management, search, and content processing. Examples include the firm’s Zantaz bundle described here in April 2008, and its recent support for compliance with the UK’s FSA Conduct of Business Sourcebook (COBS) requirements. Competitors in the search, content processing, and records management markets will want to pay close attention to what Autonomy is doing. I’ve been convinced for several years that Autonomy is one of the quickest reacting search vendors. New opportunities appear in Autonomy’s marketing collateral and news releases with greater precision than in the mid range consultants’ reports about industry trends. Autonomy has a nose for trends and beats many of its competitors to these markets.

As I was thinking about Autonomy, I recalled an article that appeared in Silicon Valley Watcher in April 2008. I was able to locate a copy of that article here. Written by Tom Foremski, the write up had the zippy title “A Policeman Inside Your Computer and Inside Your Corporate Blog. Autonomy Releases Software that Flags Illegal Communications and Other Corporate Content.” For me, the most interesting comment in the article was:

There are some good and bad aspects to this software. The bad is a big brother type use for it…It could be used to restrict blogging. A lot of people tell me that large corporations are scared of blogs violating a regulation and so every corporate blog entry has to be run through lawyers– it has to be “lawyered.” This can take time, days, even weeks. Paradoxically, I think AIG could be used to clear a blog post in real-time and could thus increase the amount of good, legal information that company workers can share in public. Either way, it automates some of the tasks of a lawyer…. Less lawyering, means lower operating costs, which maximize share holder value, and that’s what corporate officers are required to do.

With the great concern about Google I heard in my various meetings in Europe last week, I was surprised that most of those Google critics were blissfully ignorant of vendors such as Autonomy, which have robust monitoring tools available and in use. I suppose the difference is that an organization can monitor in order to comply with regulations. In the next month or so, I want to profile some of the companies with content monitoring systems. I will pick a handful of representative companies. Google’s not the only game in town, not by a long shot.
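
To show what “flagging” means at its simplest, here is a toy sketch of a rule-based content monitor. The patterns and messages are invented, and this bears no resemblance to Autonomy’s actual technology; it only illustrates the category of tool those Google critics seemed unaware of.

```python
import re

# Invented policy rules: pattern -> reason for flagging.
policy_rules = {
    r"\bguarantee(d)? returns\b": "possible misleading financial promotion",
    r"\binsider\b.*\bearnings\b": "possible disclosure of material non-public information",
    r"\b(ssn|social security number)\b": "possible personal data leakage",
}

def flag(message: str):
    """Return the list of policy reasons a message trips, if any."""
    hits = []
    for pattern, reason in policy_rules.items():
        if re.search(pattern, message, flags=re.IGNORECASE):
            hits.append(reason)
    return hits

# Usage with invented messages.
for text in [
    "Our fund offers guaranteed returns of 12 percent.",
    "Lunch at noon?",
]:
    reasons = flag(text)
    print("FLAGGED" if reasons else "clear", "-", text, reasons)
```

Real compliance systems add content classification, archiving, and case management on top of this kind of rule matching, which is where vendors like Autonomy earn their fees.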

Stephen Arnold, September 22, 2008
