For SharePoint and Dot Net Fans: The London Stock Exchange Case

September 13, 2008

Cyber cynic Steven J. Vaughan-Nichols wrote “London Stock Exchange Suffers Dot Net Crash”. You should click here and read this well-written post. Do it now, gentle readers. The gist of the story is that the LSE, with the help of Accenture and Microsoft, built a near-real-time system running on lots of upscale Hewlett-Packard servers, Windows Server 2003, and my old pal, SQL Server 2000. The architecture was designed to run really fast, a feat my team has never achieved with Windows Server or SQL Server without lots of tricks and lots of scale-up and scale-out work. The LSE crashed. For me the most significant statement in the write up was:

Sorry, Microsoft, .NET Framework is simply incapable of performing this kind of work, and SQL Server 2000, or any version of SQL Server really, can’t possibly handle the world’s number three stock exchange’s transaction load on a consistent basis. I’d been hearing from friends who trade on the LSE for ages about how slow the system could get. Now, I know why.

Why did I find this interesting? Three reasons:

  1. There’s a lot of cheerleading for Microsoft SharePoint. This LSE meltdown is a reminder that even with experts and resources, the Dot Net / Windows Server / SQL Server triumvirate gets along about as well as Pompey, Crassus, and Caesar did. Pretty exciting interactions with this group.
  2. Microsoft is pushing hard on cloud computing. If the LSE can’t stay up, what does that suggest for mission-critical enterprise applications running in Microsoft’s brand-new data centers on similar hardware and the same triumvirate of software?
  3. Speed and Dot Net are not like peanut butter and jelly or ham and eggs. Making Microsoft software go fast requires significant engineering work and sophisticated hardware. The speed-ups don’t come from software, file systems, or data management methods. Think really expensive engineering year in and year out.

I know there are quite a few Dot Net fans out there. We have it running on one of our servers. Are your experiences like mine, generally good? Or are your experiences like the LSE’s, less than stellar? Oh, Mr. Vaughan-Nichols asserts that the LSE is starting to use Linux on its hardware.

Stephen Arnold, September 13, 2008

Google and Content: Way Back in 1999

September 13, 2008

Nine years ago Google was a search engine, right? If you said, “Yes,” you were not wrong but not correct either. Google was worrying about static Web pages and ways to inject content into those Web pages. The idea was to get “some arbitrary input” from the user, and Google would take it from there. In 1999, Google’s wizards were working on ways to respond to user actions, create methods to assemble pertinent content likely to match the user’s need, and generate a Web page with that disparate information.
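To make the idea concrete, here is a tiny sketch in Python of how such a system might work in principle. This is my illustration, not Google’s patented method; the content sources, the matching rule, and the page layout are all invented for the example.

```python
# A minimal, hypothetical sketch: take some arbitrary user input, gather
# related snippets from several sources, and emit one assembled Web page.
# The sources, scoring, and layout are invented for illustration only.
from html import escape

CONTENT_SOURCES = {
    "news": ["Exchange outage update", "New search patent granted"],
    "reference": ["History of stock exchanges", "How Web crawlers work"],
    "shopping": ["Books about search engines", "Server hardware deals"],
}

def select_content(user_input):
    """Pick items from each source that share a word with the user input."""
    terms = set(user_input.lower().split())
    picks = {}
    for source, items in CONTENT_SOURCES.items():
        matches = [item for item in items if terms & set(item.lower().split())]
        if matches:
            picks[source] = matches
    return picks

def assemble_page(user_input):
    """Generate a simple Web page from the disparate selections."""
    sections = [f"<h1>Results for: {escape(user_input)}</h1>"]
    for source, items in select_content(user_input).items():
        sections.append(f"<h2>{escape(source)}</h2>")
        sections.extend(f"<p>{escape(item)}</p>" for item in items)
    return "\n".join(sections)

if __name__ == "__main__":
    print(assemble_page("search patent"))
```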

Why do I care about Google in 1999? Two reasons:

  1. Google was thinking “publishing” type thoughts a long time ago
  2. A patent with information on the specific system and method just popped out of the USPTO’s extremely efficient system.

The patent in question is US 7,424,478 B2. Google filed the application in 2000, received patent 6,728,705, and now this most recent incarnation has been issued. The title is “System and Method for Selecting Content for Displaying over the Internet Based upon Some User Input.” With the recent release of Chrome, the notion of assembling and publishing content from disparate sources is somewhat analogous to what Ziff Communications Co. used to do when it was publishing magazines or what its database units did when generating job opportunities in its General Business File product.

With Google scanning books and newspapers, it seems logical that it would take user input and assemble a Web page that goes beyond a laundry list. For me, the importance of this invention is that the GOOG was thinking these thoughts before it had much search traffic or money. Postscript: the mock screen shots are fun as well. You can see the sites that were catching Google’s attention almost a decade ago. Anyone remember Go.com?

Stephen Arnold, September 13, 2008

eDiscovery: Speed Bumps Annoy Billing Attorneys

September 12, 2008

A happy quack to my Australian reader who called “eDiscovery Performance Still a Worry” to my attention. The article by Greg McNevin appeared on the IDM.net.au Web site on September 10, 2008. The main point of the write up is that 60 percent of those polled about their organization’s eDiscovery litigation support system said, “Dog slow.” The more felicitous wording chosen by Mr. McNevin was:

The survey also found that despite 80 percent of organisations claiming to have made an investment in IT to address discovery challenges, 60 percent of respondents think their IT department is not always able to deliver information quickly enough for them to do their legal job efficiently.

The survey was conducted by Dynamic Markets, which polled 300 in-house legal eagles in the UK, Germany, and the Netherlands. My hunch is that the 60 percent figure may well apply in North America as well. My own research unearthed the fact that two-thirds of the users of enterprise search systems were dissatisfied with those systems. The 60 percent score matches up well.

In my view, the larger implication of this CommVault study is that when it comes to text and content processing, more than half the users go away annoyed or use the system whilst grumbling and complaining.

What are vendors doing? There’s quite a bit of activity in the eDiscovery arena. More gladiators arrive to take the place of those who fall on their swords, get bought as trophies, or die at the hands of another gladiator. Sadly, the activity does not address the issue of speed. In this context, “speed” is not three-millisecond response time. “Speed” means transforming content, updating indexes, and generating the reports needed to figure out what information is where in the discovered information.

Many vendors are counting on Intel to solve the “speed” problem. I don’t think faster chips will do much, however. The “speed” problem is that eDiscovery relies on a great many processes. Lawyers, in general, care only about what’s required to meet a deadline. There’s little reason for them to trouble their keen legal minds with such details as content throughput, malformed XML, flawed metatagging, and trashed indexes after an index update.
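To show what I mean by these details, here is a small, hypothetical pre-index audit in Python. The file names, required metadata fields, and XML layout are invented for the sketch; no vendor’s product works exactly this way.

```python
# Illustrative only: flag malformed XML and missing metadata before an
# index update. File names, fields, and XML layout are invented.
import xml.etree.ElementTree as ET

REQUIRED_FIELDS = {"custodian", "date", "docid"}

def audit_document(xml_text):
    """Return a list of problems found in one document's XML."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["malformed XML: %s" % err]
    present = {child.tag for child in root}
    return ["missing metadata field: %s" % field
            for field in sorted(REQUIRED_FIELDS - present)]

if __name__ == "__main__":
    batch = {
        "doc1.xml": "<doc><custodian>Smith</custodian><date>2008-09-01</date>"
                    "<docid>1</docid></doc>",
        "doc2.xml": "<doc><custodian>Jones</custodian>",  # truncated, malformed
        "doc3.xml": "<doc><docid>3</docid></doc>",         # metadata missing
    }
    for name, text in batch.items():
        for problem in audit_document(text):
            print(name, "->", problem)
```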

eDiscovery’s dissatisfaction score mirrors the larger problems with search and content processing. There’s no fix coming that will convert a grim black and white image to a Kodachrome version of reality.

Stephen Arnold, September 12, 2008

Convera Ties Up with Fast Search

September 12, 2008

I have to admit that I was surprised by this headline: “Convera and Fast, a Microsoft Subsidiary Announce Extended Business Relationship.” You can read the story from MarketWatch here. (Note: this is a wacky URL, and I have a hunch the source will 404 unless you click in a timely manner.) After Fast released its Web Part for SharePoint, I was expecting a tapering off in Fast Search & Transfer innovation. Was I surprised? Yes, a tie up between Convera (formerly Excalibur Technologies) and Microsoft’s enterprise search unit is news. Convera, like Fast Search, seems to have gone through a rough patch on the information superhighway. There was the marriage to and divorce from Intel and the NBA several years ago. Then there was a decline in revenues. Finally, Convera exited the enterprise search business to focus on building vertical search engines for publishers. My recollection is that the company’s revenues continue to dog paddle. Fast, as you may know, is alleged to have made a math error in its financial reports in the 24 months prior to its sell out to Microsoft in April 2008 for $1.2 billion. I thought opposites attracted, and these companies seem to have some interesting similarities. I recall that Autonomy bought a chunk of Convera. Then Fast Search jumped in and bought another piece of Convera. There’s a history between Fast Search and Convera, and obviously the management teams of both companies get along swimmingly.

According to the write up that appeared on September 11, 2008:

Convera will integrate FAST ESP(TM) search capabilities into its hosted vertical search solution for publishers. Publishers will be able to configure and customize an integrated site search and vertical search solution using Convera’s Publisher Control Panel (PCP).  Additionally, Convera will extend FAST AdMomentum across the Convera search platform for publishers. Customers of the Convera professional network would be able to leverage targeted search and contextual ad inventory supplied by Microsoft and distributed from the Convera platform.

I certainly hope this deal becomes a 1 + 1 = 3 for the respective stakeholders. Fast Search has faded into the background at Microsoft from my vantage point. Convera had, in all honesty, dropped completely off my radar. The Intel and NBA blow-ups waved me off the company years ago. My take on this deal is that I am not sure it will generate significant revenue unless Convera has found a way to out-Google Google. This seems unlikely to me. Agree? Disagree? Help me learn.

Stephen Arnold, September 12, 2008

Search: Google’s 10 Percent Problem

September 11, 2008

I love it when Google explains the future of search. Since Google equals search for more than 70 percent of the users in North America and even more outside the US, the future of search means Google. And what does Google’s helpful Web log here tell us:

So what’s our straightforward definition of the ideal search engine? Your best friend with instant access to all the world’s facts and a photographic memory of everything you’ve seen and know. That search engine could tailor answers to you based on your preferences, your existing knowledge and the best available information; it could ask for clarification and present the answers in whatever setting or media worked best. That ideal search engine could have easily and elegantly quenched my withdrawal and fueled my addiction on Saturday.

The “universal search” play announced at the hastily conceived Searchology news conference–anyone remember that?–has fallen by the wayside. I have wondered if the Bear Stearns publication of the Google Programmable Search Engine report and the suggestion that Google may be angling to become the Semantic Web spawned that Searchology program.

I don’t think search is a 10 percent problem for Google. The problem is bandwidth, regulations, traffic, and the market. After digging through Google’s technical papers and patent documents, I have reached the conclusion that the GOOG has the basics in place for next-generation search; for example:

  • Search without search
  • Dossier generation
  • Predictive content assembly
  • Integration of multiple functions because “search” is simply a way station on the path to solving a problem.

Most of the search pundits getting regular paychecks for now from mid-level consulting firms assert that we are at the first step or Day One of a long journey with regard to search. Sorry, mid-range MBAs. Search–key word variety–has been nailed. Meeting the needs of the herd searcher–nailed. Personalization of results–nailed.

What’s next are these next-generation search solutions. The reason that vendors are chasing niches like eDiscovery and call center support is simple. These are problems that can be addressed in part by information access.

Meanwhile the GOOG sits in its lair and ponders when and how to release to maximum advantage the PSE, dataspaces, “I’m feeling doubly lucky” and dozens of other next-generation search goodies, including social. Keep in mind that the notion of clicks is a social function. Google’s been social since the early days of BackRub.

There you have it. Google has a 10 percent challenge. In my opinion, that last 10 percent will be tough. Lawyers and other statistically messy non-algorithmic operations may now govern Googzilla’s future. If you want links to these Google references, you can find them here. My rescue boxer Tess needs special medical attention, so you have to buy my studies for the details. Sorry. Rescue boxers come before free Web log readers. Such is life. Sigh.

Stephen Arnold, September 11, 2008

Tribune Says: Google’s Automated Indexing Not Good

September 11, 2008

I have been a critic of Sam Zell’s Tribune since I tangled with the site for my 86-year-old father. You can read my negative views of the site’s usability, its indexing, and its method of displaying content here.

Now on with my comments on this MarketWatch story titled “Tribune Blames Google for Damaging News Story” by John Letzing, a good journalist in my book. Mr. Letzing reports that Google’s automated crawler and indexing system could not figure out that a story from 2002 was old. As a result, the “old” story appeared in Google News and the stock of United Airlines took a hit. The Tribune, according to the story, blames Google.

Hold your horses. This problem is identical to the folks who say, “Index my servers. The information on them is what we want indexed.” As soon as the index goes live, these same folks complain that the search engine has processed ripped-off music, software from mysterious sources, Cub Scout fund raising materials, and some content I don’t want to mention in a Web log. How do I know? I have heard this type of rationalization many times. Malformed XML, duplicate content, and other problems mean content mismanagement, not bad indexing by a search system.

Most people don’t have a clue what’s on their public-facing servers. The content management system may be at fault. The users might be careless. Management may not have policies or may not create an environment in which those policies are observed. Most people don’t know that “dates” are assigned and may not correlate with the “date” embedded in a document. In fact, some documents contain many dates. Entity extraction can discover a date, but when there are multiple dates, which date is the “right one”? What’s a search system supposed to do? Well, search systems process what’s exposed on a public-facing server or a source identified in the administrative controls for the content acquisition system.
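A toy example shows the ambiguity. In the sketch below (my illustration; the document text, date pattern, and selection “policies” are invented), an extractor finds three plausible dates in one document. Whichever one the indexer picks by policy, someone downstream can insist it chose the wrong one.

```python
# Illustrative only: one document, several candidate dates, and two
# different "pick one" policies. Text, pattern, and policies are invented.
import re
from datetime import date

DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def extract_dates(text):
    """Return every ISO-style date mentioned in the text."""
    return [date(int(y), int(m), int(d)) for y, m, d in DATE_PATTERN.findall(text)]

document = (
    "Filed 2002-12-09. Template updated 2008-09-07. "
    "Refers to events of 2001-06-15."
)

found = extract_dates(document)
print("dates found:", found)
print("policy 'earliest':", min(found))  # treats the story as very old
print("policy 'latest':  ", max(found))  # treats the story as fresh news
```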

Blaming a software system for lousy content management is a flashing yellow sign that says to me “Uninformed ahead. Detour around problem.”

Based on my experience with indexing content managed by people who were too busy to know what was on their machines, I think blaming Google is typical of the level of understanding in traditional media about how automated or semi-automated systems work. Furthermore, when I examined the Tribune’s for-fee service referenced above, it was clear that the level of expertise brought to bear on this service was in my opinion rudimentary.

Traditional media is eager to find fault with Google. Yet some of these outfits use automated systems to index content and cut headcount. The indexing generated by these systems is acceptable, but there are errors. Some traditional publishers not only index in a casual manner; they also charge for each query. A user may have to experiment in order to find relevant documents. Each search puts money in the publisher’s pocket. The Tribune charges for an online service that is essentially unusable by my 86-year-old father.

If a Tribune company does not know what’s on its servers and exposes those servers on the Internet, the problem is not Google’s. The problem is the Tribune’s.

Stephen Arnold, September 11, 2008

tyBit: Zero Click Fraud

September 11, 2008

I’m getting my bobbin loaded and squirting the treadle on my sewing machine. It’s almost time for Project Runway, my favorite television show. I put down my can of 3-in-1 oil and scanned my newsreader for gewgaws. What did I see? A story in the prestigious Forbes Magazine about a new search engine called tyBit. I put down my bobbin and picked up my mouse. The paragraph in the Business Wire story on the lofty Forbes.com Web site said here:

tyBit is the only Internet search solution that eliminates click fraud for its advertisers and provides itemized billing for all advertising dollars spent. It is also a no-cost private label search engine for traditional media so they can win back their advertisers, subscribers and revenue.

I navigated to the tyBit Web site, which was new to me, and saw this splash page complete with my tyBit “man”.

[Image: tyBit splash page]

I ran my favorite query “ArnoldIT Google” and received this list of results:

[Image: results for the “ArnoldIT Google” query]

I was happy. The first hit pointed to something I had written.

I then ran an image search on the query “arnoldit” and saw this:

[Image: ArnoldIT image search results]

There I was in bunny rabbit ears in 1981 and in 2007 with my lifetime achievement award for ineptitude. Happy again.

But I clicked on the ad with the label “Get free advertising now.” I don’t do advertising. No one hires me anyway. I clicked on the ad, hit back, and then clicked again. What do you know? Click fraud; that is, the click with no intent to buy. In fact, I did it seven or eight times until I decided that the zero-click-fraud assertion did not apply to house ads on queries about “ArnoldIT Google.”

The site indexes images, video, news, music, “local” and “shop”. I found a link to sign up for tyBit mail. I did not click on each of these links. Project Runway awaits. The Forbes.com write up provides some metrics about the company:

  • More than 6,000 advertisers are testing the click-fraud technology
  • The site averages 2.1 million searches per day and handled 50 million searches in August 2008
  • One advertiser got more than 40 leads.

Sounds good. My suggestion is to read the Forbes.com write up, explore the tyBit site here, and make up your own mind. Google’s dominance does not seem to intimidate Clarence Briggs, CEO of tyBit. I have added this company to my watch list. Lots of search innovation out there right now there is, there is.

Stephen Arnold, September 11, 2008

The 451 Group’s SharePoint Data

September 10, 2008

A happy quack to the reader who called “Old News Department: Continued Growth for SharePoint” to my attention. “Too Much Information” is a Web log published by the 451 Group. (The number 451 echoes Ray Bradbury’s science fiction novel and reminds us of the temperature at which paper ignites on earth under “normal” conditions. A book burning is, therefore, a 451 event.) You can read the original 451 “take” here. I quite like the “old news” angle. I’m a specialist in a number of “old” postings. The reader wanted my comment on this piece of data picked up from Microsoft via a news release cited in the “Old News Department: Continued Growth for SharePoint” article; to wit:

Microsoft claimed $800m in SharePoint revenue (in a press release) last year for fiscal 2007, so 30% growth puts 2008 revenue at $1.04 billion, 35% growth puts it at $1.08 billion.  The company also made a rather vague announcement in March the SharePoint Conference and via a press release that it had surpassed the $1 billion revenue mark.  At that point, we dug into it to find the $1 billion number was for the rolling twelve-month period.

The 451 Group pointed out that the numbers were mushy. In my experience, most numbers related to software companies’ revenues and customers are indeed soft. I still don’t know the final numbers on the Bear Stearns fiasco, the Enron scam, or the US government’s budget for software licenses at the General Services Administration. Therefore, it’s a safe bet that SharePoint numbers will be squishy too.

Let’s assume, however, that SharePoint is a multi billion dollar product. Further, let’s accept the idea that there are upwards of 100 million SharePoint licenses in the wild. And, let’s embrace the notion that SharePoint is Microsoft’s next generation operating system. If these assumptions are correct within a range of plus or minus 20 percent, here’s my take on the growth of SharePoint:

  1. The incredibly wild and wacky world of content management is going to face a nuclear winter. Already discredited in many organizations, content management systems, like key word enterprise search systems, don’t work as advertised, disappoint their users, and are incredibly expensive to operate. SharePoint may not be the best cookie in the batch, but Microsoft is making it easy and economical to get SharePoint and “do” content management. Interwoven, Documentum, Ektron, and the rest of the CMS crowd will have to do some fancy dancing to keep their revenues flowing and stakeholders wearing happy faces.
  2. SharePoint itself is going to be a big consulting business. For the most part, SharePoint works when one doesn’t ask too much of the system. Two or three people can share and collaborate. The search function is pretty awful, but that can be fixed with a quick phone call to ISYS Search Software, an outfit whose software we just tested. Watch for this write up as a feature on September 15, 2008.
  3. The Microsoft ecosystem is going to follow the trajectory of the mainframe ecosystem or the Oracle database ecosystem. The environment will change but the micro climates will persist within organizations for a long time. Certified Professionals will fight tooth and nail to keep SharePoint and their jobs.

The net net on SharePoint for me is that software is following the consolidation route traveled by the auto companies. Chrysler, Ford, and GM are not competitive. These giants are suffering financial emphysema. Death can be postponed, but none of these companies will be doing much more than walking slowly to the local convenience store to buy a microwaved burrito.

Users are going to be the losers. SharePoint is a complex system. It hogs resources. The scale up and out model becomes too costly for most organizations. I think of SharePoint as the digital equivalent of train travel in the US. Yes, one can do it, but the journey is filled with uncertainties. When the train breaks down, the passenger has little choice but wait until the repairs are made and the journey can resume. When the trip is over, passengers step off the Amtrak thankful to have arrived and eager to put the experience behind them.

And CMS? It won’t survive in its present form. Most CMS vendors will struggle to survive on the margin of the SharePoint ecosystem and have to fight off predators hungry for the customers CMS companies have been able to retain. The phrase “nasty, brutish, and short” comes to mind. Squishy numbers or not, SharePoint is the cat’s pajamas.

Stephen Arnold, September 10, 2008

Google: Employing Lawyers by the Score

September 10, 2008

A Walt Disney wizard is on the job for the US government. The issue, well stated by Jeff Jarvis, is “We Hate Success.” You can read his take on the latest and most threatening legal challenge to Google in its 10-year history here. The hook for his write up is the increasing heat directed at Google’s feet about “its growing dominance in advertising.” I agree with Mr. Jarvis when he wrote:

I’ve long argued that we do, indeed, need competition in the ad market but it’s not going to come from regulation. It’s going to come from getting off our asses and creating those competitors. I said that we need an open-source ad marketplace. Nobody’s heeded that advice.

I would like to add several comments about my perception, and Mr. Jarvis’s, regarding Google.

First, I do not believe that advertisers, telcos, and publishers understand what Google has built, what it is doing, or how the physics of Google operates in the cloud-chamber business environments these sectors try to enforce. Without understanding, there’s precious little hope of developing an effective, appropriate response to Google. These industries are watching Google’s digital beams punch holes in their semi-closed business environments. Left to its own devices, Google will vaporize the “walls”. Then what?

Second, the competitors have watched Google through search-colored glasses with some frou-frou trim called online advertising. Competitors have assumed that with traffic, ad dollars would flow to them. So, the perception of what was needed to respond to Google was wrong in 1998, and it is wrong today–a decade later. That’s a pretty long time to misdiagnose a problem. At this point, there is no single company with sufficient resources to leap frog Google. I make this point in my 2007 study Google Version 2.0. That’s the reason the billions spent by Microsoft haven’t delivered. The company is not investing enough. Meanwhile, Google keeps on widening its lead in technology, customer traffic, and advertising. You will have to read my study to find out who and what can get past the GOOG.

Third, local regulation won’t do the job. Where is Google selling advertising? In what country is the money booked? What is Google selling?

I am not sure that an auction, operated from servers somewhere in the cloud and arguably not in the US, and delivering a message on behalf of a person or company to an unknown user who happens to have an interest in a subject is going to be easy to limit. Google is not a local newspaper selling a fungible product with a guaranteed circulation, a physical product, and specific customers who are betting that the newspaper can catch the interest of a person wanting to buy a used car.

In short, lawyers will make a great deal of money chasing Google and its monopoly. The problem is that Google is a supra-national entity dealing in zeros and ones. I was on the fringes of the AT&T breakup in the late 1970s and early 1980s. That was trivial compared to dealing with the actions of individuals who do what ad agencies used to do by themselves, on servers routing the work hither and yon.

The definition of terms will generate enough billable hours to keep a legion of attorneys gainfully employed for a long time. When a decision is reached, the question becomes, “Will Google be the same at that point in time as it was when the legal engine started running?” I don’t think so.

Stephen Arnold, September 10, 2008

Redshift: With Google It Depends From Where You Observe

September 10, 2008

My research suggests that opportunities, money, and customers are rushing toward Google. Competitors–like publishers–are trying to rush away, but the “gravitational pull” is too great. Traditional publishers don’t have the escape velocity to break away. What is this, a redshift or a blueshift?

Dr. Greg Papadopoulos, Sun Microsystems wizard, gave a talk at the 2007 Analyst Summit (summit is an overused word in the conference universe in my opinion) called “Redshift: The Explosion of Massive Scale Systems.” I think much of the analysis is right on, but the notion of a “redshift” (not a misspelling) applies to rushing away from something, not rushing toward something. You can download a copy of this interesting presentation here. (Verified on September 9, 2008.)

Dr. Papadopoulos referenced Google in this lecture in 2007. For the purposes of this post, I will treat his remarks as concerning Google. I’m a captive of my own narrow research. I think that’s why this presentation nagged at my mind for a year. Today, reading about hadron colliders and string theory, I realized that it depends on where one stands when observing Doppler effects. From my vantage point, I don’t think Google was a redshift. You can brush up on this notion by scanning the Wikipedia entry, which seems okay to me, but I am no theoretical physicist. I did work at a nuclear engineering firm, but I worked on goose feathers, not gluons and colors. From what I recall, when the object speeds away from the observer, you get the redshift. When the object rushes toward the observer, you get the blueshift. Redshift means the universe is expanding when one observes certain phenomena from earth. Blueshift means something is coming at you. Google is pretty darn blue to my eyes.
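For readers who want the textbook version rather than my goose-feather recollection, the standard dimensionless shift is written like this (a stock physics formula, not something from the Papadopoulos deck):

```latex
% z > 0: wavelength stretched, source receding from the observer (redshift)
% z < 0: wavelength compressed, source approaching the observer (blueshift)
z = \frac{\lambda_{\mathrm{observed}} - \lambda_{\mathrm{emitted}}}{\lambda_{\mathrm{emitted}}}
```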

The Papadopoulos presentation contains a wealth of interesting and useful data. I am fighting the urge to cut, paste, borrow, and recycle. But there are three points that warrant a comment.

Read more
