SharePoint’s Bottlenecks: Databases Table Makes It Clear Now

August 12, 2008

Tucked away on the Microsoft MSDN Web site is this document, “Databases Table”. Sadly, no author is credited, but I want to congratulate the person for what must have taken days to compile. This lengthy document gathers together and explains, to some degree, the database tables upon which the SharePoint edifice stands.

As useful as this enumeration is, for me the best information in the MSDN Web log entry is this reassuring statement:

Modifying the database schema or database structures is not supported. Changes that you make to the database contents may be overwritten when you install updates or service packs for Windows SharePoint Services, or when you upgrade an installation to the next product version.

I like to fiddle. I am officially on notice to keep my hex editor away from SharePoint’s database tables. Not only will I lose my table changes, I can also blow away my data. Nice to know.

Stephen Arnold, August 12, 2008

Googzilla Stumbles Then Apologizes (Gasp!)

August 12, 2008

I recall an earnest Googler in May 2008 accusing me of creating a “fake” screen shot from a Google patent document. The Googler, bright smile, direct gaze, was supremely confident that as a Googley person he was right. Well, he was wrong. The school-mother who crafted his supreme self-confidence and the fraternity brothers who fawned over his brilliance have left his head in his patootie. I followed up with an email–not Gmail, thank my lucky stars. I provided the patent document number, and I invited him to give me a buzz to talk about what I called Google’s “profiling” service. The “invention” disclosed in a publicly available document converts a query such as “Michael Jackson” or some other proper noun into a dossier. Yep, just like the type of intelligence reports that wizards at McKinsey & Co. generate for their well-groomed clients.

Every time Google–an outfit I have dubbed Googzilla in honor of the giant, dangerous, but rubber suited monster from the Japanese horror films I loved in my youth–stumbles, I think of Googlers. I recall a situation a couple of years ago when a Googler showed up for a talk at the International Online Show in London. On the panel was a fellow who had worked at the original AltaVista.com. The Googler ran through a canned PowerPoint which was unfamiliar to him. His bright smile and earnest gaze did little to disguise his failure to look at the program, the title of his talk, or how the program would be orchestrated by yours truly. Well, the AltaVista.com wizard gutted and roasted the Googler to the delight of the crowd. At that time, I had a financial incentive to get the Googler off the roasting spit and back into his seat. I failed. The AltaVista.com wizard enjoyed the bar-b-que.

I have reserved comment about the increasing friability of Google operations that must read and write data. In my 2005 study The Google Legacy, I tried in my non-Googley way to explain that the engineers responsible for Google had focused on really fast read speeds. In fact, the company built upon the learnings of AltaVista.com and other information retrieval scientists to use commodity drives to deliver lightning fast performance, applying a wide range of engineering insights, clever techniques, and elbow grease to resolve known bottlenecks in serving queries. My publisher reports a spurt of interest in my Google studies. We hypothesized that now, almost four years after the first analysis appeared, some folks are figuring out that Google has designs on more than online advertising. In those studies I also worked through some of Google’s vulnerabilities, such as a digital Achilles’ heel.


My hunch is that my analysis of Google’s weaknesses is now of interest because Googzilla appears to have feet of clay when read-write functions overwhelm the allocated resources.

Why am I reminding people of Google’s focus on reads and Google’s somewhat arrogant attitude toward competitors, customers, and, of course, addled me?

Easy.

Here are some links to bring you up to date on Google’s most recent online outage:

  • Google’s very own apology. Click here to read the sincere “We Feel Your Pain, and We’re Sorry”. Yep, I believe this.
  • TechCrunch’s report, including useful updates that pinpoint when Gmail went south and when the Googlers figured out what went wrong. Access this good write up here. I love the whale illustration, but I would have used my Googzilla art. TechCrunch also includes the error message with another “We’re sorry but your Gmail account is currently experiencing errors” line. The wording is remarkable: an account “experiences errors” because the system is not working.
  • Rafe Needleman’s post which was a subtle reminder that some youthful thinkers rely on a communications medium more immediate (but almost as reliable) than Gmail. You can read his post here.

News aggregators have hundreds of stories about this failure, and I will leave it to you to click through the links on Daily Rotation, PopURLs, and Megite.

My take on this problem is that Google’s architecture does some things well (serve result sets, track user behavior, sell ads) and others not so well (Gmail, Android, Knol). The decreasing interval between failure and visible loss of service is encouraging to some competitors, I surmise. Microsoft, despite its slow start, has delivered a reasonably solid Olympics service. So, Microsoft stays online; Google doesn’t. That should bring some smiles to the Microsoft faces.

Stephen Arnold, August 12, 2008

Virtualization for MOSS (SharePoint 2007): Sort of. Maybe. Some Day

August 12, 2008

Ketaanhs wrote “What Is the Support for Virtualization for MOSS (SharePoint 2007)?” You can read the full post on MSDN Web logs here. First, the good news. Mr. Ketaanhs provides links to hard-to-find Microsoft documentation about virtualization; for example, KB897615, which explains a gotcha for everyone but premier support buyers. He has also scoured these fine expository documents for crucial SharePoint virtualization information. So, if you are a lucky MOSS licensee and want to use virtualization to maximize your “scale up and scale out” investments, jump to MSDN and download these files.

Second, the bad news. There is no solid information about support of virtualization in SharePoint 2007. There’s speculation, and Mr. Ketaanhs writes:

So currently as of Aug 08th 2008 we are awaiting an Official statement to come out in few weeks, until then I *assume* Microsoft Support may provide Commercial Reasonable Support.

Virtualization is one of the hot trends in server rooms. With upwards of 65 million SharePoint users, many IT managers would like to virtualize, squeeze more mileage from their hardware, and increase the performance of SharePoint when it processes documents, performs queries, and generates those tasty Web 2.0-style interfaces.

In my opinion, Microsoft continues with some of its pre-Ozzie code synchronization policies. Microsoft marketers hype virtualization and cook up zippy new product names like Hyper-V. Licensees, on the other hand, don’t have what they need to do substantive quarterly planning. Not good.

Stephen Arnold, August 12, 2008

hakia’s Founder Riza Berkan on Search

August 12, 2008

Dr. Riza Berkan, founder of hakia, a company engaged in search and content processing, reveals the depth of engineering behind the firm’s semantic technology. Dr. Berkan said here:

If you want broad semantic search, you have to develop the platform to support it, as we have. You cannot simply use an index and convert it to semantic search.

With its unique engineering foundation, the hakia system goes through a learning process similar to that of the human brain. Dr. Berkan added:

We take the page and content, and create queries and answers that can be asked to that page, which are then ready before the query comes.

He emphasized that “there is a level of suffering and discontent with the current solutions”. He continued:

I think the next phase of the search will have credibility rankings. For example, for medical searches, first you will see government results – FDA, National Institutes of Health, National Science Foundation. – then commercial – WebMD – then some doctor in Southern California – and then user contributed content. You give users such results with every search; for example, searching for Madonna, you first get her site, then her official fan site, and eventually fan Web logs.
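Dr. Berkan’s tiered idea is easy to picture in code. The sketch below orders results by a hypothetical credibility tier (government, commercial, individual, user contributed) before relevance; the tier labels, field names, and sample URLs are my illustrative assumptions, not hakia’s implementation.

```python
# Toy credibility-tier ranking: results are grouped by source tier first,
# then by relevance score within each tier. Tier assignments here are
# invented for illustration.

CREDIBILITY_TIERS = ["government", "commercial", "individual", "user"]

def rank_by_credibility(results):
    """Order results by credibility tier, then by relevance within a tier."""
    tier_rank = {tier: i for i, tier in enumerate(CREDIBILITY_TIERS)}
    return sorted(results, key=lambda r: (tier_rank[r["tier"]], -r["relevance"]))

hits = [
    {"url": "webmd.com/flu", "tier": "commercial", "relevance": 0.9},
    {"url": "someblog.net/flu", "tier": "user", "relevance": 0.95},
    {"url": "nih.gov/flu", "tier": "government", "relevance": 0.8},
]
ranked = rank_by_credibility(hits)
print([r["url"] for r in ranked])
# → ['nih.gov/flu', 'webmd.com/flu', 'someblog.net/flu']
```

Note that the government page wins even though the fan-blog style result scored higher on raw relevance, which is exactly the reordering Dr. Berkan describes.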

You can read the full text of the interview with Dr. Riza Berkan on the ArnoldIT.com Web site in the Search Wizards Speak series. The interview was conducted by Avi Deitcher for ArnoldIT.com.

Stephen Arnold, August 12, 2008

Data Centers: Part of the Cost Puzzle

August 11, 2008

The “commentary” is “Servers: Why Thrifty Isn’t Nifty”, which appears here. The “commentary” is by a wizard, Kenneth G. Brill, and he takes a strong stand on the topic of data center costs. The “commentary” is sponsored by SAP, an outfit that exercises servers to the max. Mr. Brill is the executive director of the highly regarded Uptime Institute in Santa Fe, New Mexico. Santa Fe is a high-tech haven. The Santa Fe Institute and numerous think tanks populate this city, a reasonable drive from LANL (Los Alamos National Laboratory). LANL is world famous for its security, as you may know. With chaos theory and technical Jedis in every nook and cranny of the city except the art galleries, I am most respectful of ideas from that fair city’s intelligentsia.

The hook for the “commentary” is a report called Revolutionizing Data Center Efficiency. The guts of the report are recommendations to chief information officers about data centers. With the shift to cloud computing, data centers are hotter than a Project Runway winner’s little black dress. For me the most interesting part of this “commentary” was this statement:

One of these recommendations is to dramatically improve cost knowledge within IT…The facility investment required to merely plug-in the blades was an unplanned $54 million. An additional unplanned $30 million was required to run the blades over three years. So what appeared to be a $22 million decision was really an enterprise decision of over $106 million.

The “commentary” includes a table with data that backs up his analysis. The data are useful but, as you will learn at the foot of this essay, offer only a partial glimpse of a more significant cost issue. You may want to read my modest essay about cost here.

What baffles me is the headline “Servers: Why Thrifty Isn’t Nifty”. Forbes’s editors are more in the know about language than I. I’m not sure about the use of the word “thrifty” because the “commentary” uses servers as an example of the cost analysis problem facing organizations when folks make assumptions without experience, without adequate accounting methods, and with a rat pack of 25-year-old MBAs calculating costs.

Let me make this simple: cost estimations usually have little connection to the actual expenditures required to make a data center work. This applies to the data centers themselves, the applications, or the add-ons that organizations layer on their information technology infrastructure.

Poor cost analysis can sink the ship.

Mr. Brill has done a fine job of pointing out one cost hockey stick curve. There are others. Until folks like the sponsors of Mr. Brill’s “commentary” spell out what’s needed to run bloated and inefficient enterprise applications, cost overruns will remain standard operating procedure in organizations.

Before I close this encomium to Santa Fe thinking, may I point out:

  • Engineering data centers is not trivial
  • Traditional methods don’t work particularly well or economically in the world of multi-core servers and peta-scale storage devices stuffed into poorly engineered facilities
  • Buying high end equipment increases costs because when one of those exotic gizmos dies, it is often tough to get a replacement or a fix quickly. The better approach is to view hardware like disposable napkins.

Which is better?

[a] Dirt cheap hardware that delivers 4X to 15X the performance of exotic name brand servers or [b] really expensive hardware that both fails and runs slowly at an extremely high price? If you picked the disposable napkin approach, you are on the right track. Better engineering can do more than reduce the need for expensive, high end data center gear. By moving routine tasks to the operating system, other savings can be found. Re-engineering cooling mechanisms can extend drive and power supply life and reduce power demands. There are other engineering options to exercise. Throwing money at a problem works if the money is “smart”. Stupid money just creates more overruns.
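To make the arithmetic concrete, here is a toy three-year cost comparison in Python. Every number below (prices, failure rates, power costs, the throughput multiplier) is an assumption invented for the sketch, not data from Mr. Brill’s report; the point is only that cost per unit of delivered work, not sticker price, decides the question.

```python
# Illustrative three-year cost comparison of commodity ("disposable napkin")
# servers versus exotic high-end gear, normalized by delivered throughput.

def three_year_cost(unit_price, units, annual_failure_rate, power_per_unit_year):
    """Purchase price plus expected replacements and power over three years."""
    purchase = unit_price * units
    replacements = units * annual_failure_rate * 3 * unit_price
    power = units * power_per_unit_year * 3
    return purchase + replacements + power

# Commodity fleet: cheap boxes, higher failure rate, trivial to swap out.
commodity_cost = three_year_cost(2_000, units=100,
                                 annual_failure_rate=0.10,
                                 power_per_unit_year=400)
commodity_throughput = 100 * 1.0   # relative units of work

# Exotic fleet: fewer, pricier boxes that fail rarely but cost dearly to fix.
exotic_cost = three_year_cost(25_000, units=10,
                              annual_failure_rate=0.02,
                              power_per_unit_year=1_200)
exotic_throughput = 10 * 5.0       # assume each exotic box is 5x faster

commodity_per_unit = commodity_cost / commodity_throughput
exotic_per_unit = exotic_cost / exotic_throughput
print(commodity_per_unit, exotic_per_unit)
# → 3800.0 6020.0
```

With these assumed numbers the exotic fleet looks cheaper in total dollars, yet costs more per unit of work delivered, which is the trap the “commentary” warns about.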

Mr. Brill’s “commentary” provides one view of data center costs, but I trust that he has the brand name versus generic costing in the report he references. If not, there’s always an opportunity in Santa Fe for opening an art gallery or joining LANL’s security team.

Stephen Arnold, August 11, 2008

Microsoft SharePoint Olympic Watch

August 11, 2008

Microsoft’s plan to get Silverlight on millions of personal computers is now underway. It’s too soon to determine if it wins the gold for software downloads. One of my sources reports that the Mojave Web site runs on Flash. Hmmm. If this image is real and not Photoshopped, I guess most attendees know what this translucent blue screen shot is: a BSOD (blue screen of death) at the Chinese Olympics. You can see the ghostly image here, courtesy of PowerApple.com. In case the image 404s, here’s what I saw.

[image: BSOD projected at the Olympics venue, via PowerApple.com]

If you have any additional information about this “image”, please, let me know.

Stephen Arnold, August 11, 2008

Hot News: Google Is Interested in Content

August 11, 2008

That wild and wonderful New York Times has a rear view mirror article that you must read. It’s here and called “Is Google a Media Company?” by Miguel Helft, a really good writer. For me, the key point in the article is this statement:

Google has long insisted that it has no plans to own or create content, and that it is a friend, not a foe, of media companies. The Google search engine sends huge numbers of users to the digital doorsteps of thousands of media companies, many of which also rely on Google to place ads on their sites.

This is, of course, Google’s standard verbiage, its “game plan” talk.

Mr. Helft quotes a range of experts who offer a contrary view. A Harvard professor, David B. Yoffie, who surely is in the know, is quoted as saying:

‘If I am a content provider and I depend upon Google as a mechanism to drive traffic to me, should I fear that they may compete with me in the future?’ Professor Yoffie asked. ‘The answer is absolutely, positively yes.’

I talk a bit–I recall I devoted 20 or 25 pages–to Google’s publishing and content acquisition / distribution inventions in my August 2007 study Google Version 2.0. If you are curious, there’s more information here. Outsell, a nifty consulting outfit in Burlingame, California, recycled some of my research late last year. There is a bit of dissonance between what my research suggested and the tasty sound bites in the New York Times article.

The key point is that Google’s been beavering away in “publishing” for quite a while. Actually, publishing, even the word media, is too narrow. Google has somewhat wider vistas in mind if I understand its patent documents and technical papers.

It’s exciting to know that now the paper of record has made it official. Google has some media thoughts in its Googzilla brain.

Stephen Arnold, August 11, 2008

New Era in Visualization Emerging

August 11, 2008

Traditional tools can’t deal with petabyte and larger data sets. The Department of Homeland Security and the National Science Foundation have tapped Georgia Tech “as the lead academic research institution for all national Foundations of Data and Visual Analytics (FODAVA) research efforts. Seven other FODAVA Partnership Awards will be announced later this year, all working in conjunction with eleven Georgia Tech investigators to advance the field.” News of an initial grant of $3 million was reported by the university earlier this month. You can read one version of the announcement here in PhysOrg.com’s article “New Grant Supports Emerging Field of Massive Data Analysis and Visual Analytics.”

I think this is important because US government funding does have an impact on information-related innovation. Data mining, text mining, and search have blossomed with government support. A recipient of US government money is asked to look for ways to push the innovations into the commercial channel. The idea is for government funds to “give back” to citizens.

My view of visualization is mixed. Most of the startling visualizations, such as Australian National University’s three dimensional rock, are interesting but not practical. Last week I marveled at a collection of wild and crazy visualizations. The problem is that most visualizations get in the way of my understanding the data. A good example is Indiana University’s visualization of movies. I still have a heck of a time figuring out what the visualization depicts. For me, using it is out of the question. You can see this visualization here.

My hunch is that visualization will be in my face in the months and years ahead.

Stephen Arnold, August 11, 2008


Google and Hosted Telephony

August 11, 2008

Network World’s Matthew Nickasch wrote an interesting article “Will Google Consider Hosted Telephony?”. You will want to read it in its entirety. The story is here. The premise of the story is that Google may offer a range of wireless services from the cloud. Mr. Nickasch asserts:

While no official plans, or even rumors have been released, a Google-hosted VoIP environment may be incredibly popular for organizations that utilize Google Apps for all other collaboration needs. We’ve seen our fair share of free hosted VoIP environments, like Skype, Free World Dialup, etc, but Google has yet to venture into such a market.

My own research into Google’s telephony activities suggested to me that:

  1. Google started working on mobile and other telephony services as early as 1999
  2. Telephony, based on my analysis of Google patent documents, has been one of the areas of intense activity for almost a decade.
  3. Google’s innovations extend deeper than hosted applications; for example, Google has a clever invention for routing calls in a distributed mesh environment.
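Point three can be illustrated, in a generic way, with fewest-hop routing over a mesh of nodes. The breadth-first search below is a textbook sketch, not the mechanism in Google’s patent documents; the node names and topology are invented for the example.

```python
# Minimum-hop call routing over a mesh, sketched with breadth-first search.
from collections import deque

def route_call(mesh, src, dst):
    """Return a minimum-hop path from src to dst, or None if unreachable."""
    queue = deque([[src]])
    visited = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for neighbor in mesh.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

# A tiny directed mesh: two routes from A reach E through D.
mesh = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(route_call(mesh, "A", "E"))
# → ['A', 'B', 'D', 'E']
```

A real telephony mesh would add link quality, load, and failover to the path choice, but hop-count routing is the skeleton of the idea.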

Mr. Nickasch ends his article with several questions. What’s your take? Has Google lost its chance to become a telco, or does Google have a different game underway? In Google Version 2.0, I discuss options for Google’s “other game”. Hosted services are already here, and I think Googzilla is watching and learning.

Stephen Arnold, August 11, 2008

QlikTech: More Business Intelligence from Sweden

August 11, 2008

QlikTech is one of the fastest growing business intelligence companies in the world. The company’s Web site here asserts that it has more than 7,300 customers. Based in Lund, QlikTech has morphed from consulting to software. Its core technology is software that makes exploring data a point-and-click affair. Most graphical interfaces require that the user know what specific statistical processes can do and how they work. QlikTech’s approach exposes options. When a user clicks on an option, excluded or inappropriate options are grayed out. A typical manager can point and click her way through an analysis of a Web site’s traffic or explore cash flow.

The company offers a number of demos here. One caution. Not all of the demos work. In my tests, latency played a part in the Java demos I tried. The Ajax demos were for the most part acceptable, but several rendered empty browser screens. You will need to explore these on your computer.

Sybase has inked a deal with QlikTech for the company’s analytics system. You can read the article from the WebNewsWire here. Sybase will use QlikTech to provide “dashboards” that give Sybase licensees point-and-click interfaces and graphical displays showing important data at a glance.

Sybase offers its own analytic tools (Sybase IQ), but a typical user needs training and technical expertise that most managers cannot acquire quickly. So, QlikTech to the rescue. QlikView operates in-memory, thus eliminating the hassle of building cubes and the delays associated with traditional queries. The QlikView system automatically associates related data as a user clicks on options in the interface. With the in-memory approach, a user can whip through data in a more fluid manner.
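The associative behavior can be sketched in a few lines of Python. This toy version recomputes, after each selection, which values in every other field remain reachable, so a front end could gray out the rest; it illustrates the idea only and is not QlikTech’s engine, and the sample fields and rows are invented.

```python
# Toy in-memory associative selection: clicking a value in one field
# narrows every other field to the values still compatible with it.

rows = [
    {"region": "EMEA", "product": "Widget", "year": 2007},
    {"region": "EMEA", "product": "Gadget", "year": 2008},
    {"region": "APAC", "product": "Widget", "year": 2008},
]

def associate(rows, selections):
    """Return, for each field, the set of values compatible with selections."""
    matching = [r for r in rows
                if all(r[f] == v for f, v in selections.items())]
    fields = rows[0].keys()
    return {f: {r[f] for r in matching} for f in fields}

# "Click" EMEA in the region field; see what survives elsewhere.
state = associate(rows, {"region": "EMEA"})
print(sorted(state["product"]))
# → ['Gadget', 'Widget']
```

Because the whole data set stays in memory, each click is just a filter pass over the rows; there is no cube to rebuild, which is the fluidity the paragraph above describes.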

Business intelligence is becoming the new “search”. QlikView’s technology can manipulate most structured data.

Stephen Arnold, August 11, 2008
