A Useful SharePoint Functional Diagram

September 12, 2008

Eric Johnson, enterprise architect, invested some time in creating a SharePoint functional breakout. Most of the SharePoint diagrams I have seen are too complicated for my goose-sized brain. You can look at his “Moss Functional Breakout” here and snag a copy of this useful diagram. His Web log post identifies a couple of individuals who have “borrowed” his diagram without permission. I don’t want to make this error, nor do I want you to use it without appropriate attribution. Here’s a low resolution version of this schematic. The credit line is Eric Johnson, formerly Microsoft, now enterprise architect. You can reach him here.

[Image: SharePoint functional breakout diagram]

A happy quack to Mr. Johnson and a little goose present on the windshield of those who used this diagram without permission or attribution.

Stephen Arnold, September 12, 2008

eDiscovery: Speed Bumps Annoy Billing Attorneys

September 12, 2008

A happy quack to my Australian reader who called “eDiscovery Performance Still a Worry” to my attention. The article by Greg McNevin appeared on the IDM.net.au Web site on September 10, 2008. The main point of the write up is that 60 percent of those polled about their organization’s eDiscovery litigation support system said, “Dog slow.” The more felicitous wording chosen by Mr. McNevin was:

The survey also found that despite 80 percent of organisations claiming to have made an investment in IT to address discovery challenges, 60 percent of respondents think their IT department is not always able to deliver information quickly enough for them to do their legal job efficiently.

The survey was conducted by Dynamic Markets, which polled 300 in-house legal eagles in the UK, Germany, and the Netherlands. My hunch is that the 60 percent figure may well apply in North America as well. My own research unearthed the fact that two thirds of the users of enterprise search systems were dissatisfied with those systems. The 60 percent score matches up well.

In my view, the larger implication of this CommVault study is that when it comes to text and content processing, more than half the users go away annoyed or use the system whilst grumbling and complaining.

What are vendors doing? There’s quite a bit of activity in the eDiscovery arena. More gladiators arrive to take the place of those who fall on their swords, get bought as trophies, or die at the hands of another gladiator. Sadly, the activity does not address the issue of speed. In this context, “speed” is not three-millisecond response time. “Speed” means transforming content, updating indexes, and generating the reports needed to figure out what information is where in the discovered information.

Many vendors are counting on Intel to solve the “speed” problem. I don’t think faster chips will do much, however. The “speed” problem is that eDiscovery relies on a great many processes. Lawyers, in general, care only about what’s required to meet a deadline. There’s little reason for them to trouble their keen legal minds with such details as content throughput, malformed XML, flawed metatagging, and trashed indexes after an index update.

eDiscovery’s dissatisfaction score mirrors the larger problems with search and content processing. There’s no fix coming that will convert a grim black and white image to a Kodachrome version of reality.

Stephen Arnold, September 12, 2008

Convera Ties Up with Fast Search

September 12, 2008

I have to admit that I was surprised by this headline “Convera and Fast, a Microsoft Subsidiary Announce Extended Business Relationship.” You can read the story from MarketWatch here. (Note: this is a wacky URL and I have a hunch the source will 404 unless you click in a timely manner.) After releasing its Web Part for SharePoint, I was expecting a tapering off in Fast Search & Transfer innovation. Was I surprised? Yes, a tie up between Convera (formerly Excalibur Technologies) and Microsoft’s enterprise search unit is news. Convera, like Fast Search, seems to have gone through a rough patch on the information superhighway. There was the marriage and divorce from Intel and the NBA several years ago. Then there was a decline in revenues. Finally, Convera exited the enterprise search business to focus on building vertical search engines for publishers. My recollection is that the company’s revenues continue to dog paddle. Fast, as you may know, is alleged to have made a math error with its financial reports in the 24 months prior to its sell out to Microsoft in April 2008 for $1.2 billion. I thought opposites attracted, and these companies seem to have some interesting similarities. I recall that Autonomy bought a chunk of Convera. Then Fast Search jumped in and bought another piece of Convera. There’s a history between Fast Search and Convera, and obviously the management teams of both companies get along swimmingly.

According to the write up that appeared on September 11, 2008:

Convera will integrate FAST ESP(TM) search capabilities into its hosted vertical search solution for publishers. Publishers will be able to configure and customize an integrated site search and vertical search solution using Convera’s Publisher Control Panel (PCP).  Additionally, Convera will extend FAST AdMomentum across the Convera search platform for publishers. Customers of the Convera professional network would be able to leverage targeted search and contextual ad inventory supplied by Microsoft and distributed from the Convera platform.

I certainly hope this deal becomes a 1 + 1 = 3 for their respective stakeholders. Fast Search has faded to the background at Microsoft from my vantage point. Convera had, in all honesty, dropped completely off my radar. The Intel – NBA blow ups waved me off the company years ago. My take on this deal is that I am not sure it will generate significant revenue unless Convera has found a way to out Google Google. This seems unlikely to me. Agree? Disagree? Help me learn.

Stephen Arnold, September 12, 2008

Search: Google’s 10 Percent Problem

September 11, 2008

I love it when Google explains the future of search. Since Google equals search for more than 70 percent of the users in North America and even more outside the US, the future of search means Google. And what does Google’s helpful Google Web log here tell us:

So what’s our straightforward definition of the ideal search engine? Your best friend with instant access to all the world’s facts and a photographic memory of everything you’ve seen and know. That search engine could tailor answers to you based on your preferences, your existing knowledge and the best available information; it could ask for clarification and present the answers in whatever setting or media worked best. That ideal search engine could have easily and elegantly quenched my withdrawal and fueled my addiction on Saturday.

The “universal search” play announced at the hastily conceived Searchology news conference–anyone remember that?–has fallen by the wayside. I have wondered if the BearStearns’ publication of the Google Programmable Search Engine report and the suggestion that Google may be angling to become the Semantic Web spawned that Searchology program.

I don’t think search is a 10 percent problem for Google. The problem is bandwidth, regulations, traffic, and the market. After digging through Google’s technical papers and patent documents, I have reached the conclusion that the GOOG has the basics in place for next-generation search; for example:

  • Search without search
  • Dossier generation
  • Predictive content assembly
  • Integration of multiple functions because “search” is simply a way station on the path to solving a problem.

Most of the search pundits getting regular paychecks for now from mid level consulting firms assert that we are at the first step or Day One of a long journey with regard to search. Sorry, mid range MBAs. Search–key word variety–has been nailed. Meeting the needs of the herd searcher–nailed. Personalization of results–nailed.

What’s next are search-based solutions. The reason that vendors are chasing niches like eDiscovery and call center support is simple: these are problems that can be addressed in part by information access.

Meanwhile the GOOG sits in its lair and ponders when and how to release to maximum advantage the PSE, dataspaces, “I’m feeling doubly lucky” and dozens of other next generation search goodies, including social. Keep in mind that the notion of clicks is a social function. Google’s been social since the early days of BackRub.

There you have it. Google has a 10 percent challenge. In my opinion, that last 10 percent will be tough. Lawyers and other statistically messy non-algorithmic operations may now govern Googzilla’s future. If you want links to these Google references, you can find them here. My rescue boxer Tess needs special medical attention, so you have to buy my studies for the details. Sorry. Rescue boxers come before free Web log readers. Such is life. Sigh.

Stephen Arnold, September 11, 2008

Tribune Says: Google’s Automated Indexing Not Good

September 11, 2008

I have been a critic of Sam Zell’s Tribune since I tangled with the site for my 86-year-old father. You can read my negative views of the site’s usability, its indexing, and its method of displaying content here.

Now on with my comments on this MarketWatch story titled “Tribune Blames Google for Damaging News Story” by John Letzing, a good journalist in my book. Mr. Letzing reports that Google’s automated crawler and indexing system could not figure out that a story from 2002 was old. As a result, the “old” story appeared in Google News and the stock of United Airlines took a hit. The Tribune, according to the story, blames Google.

Hold your horses. This problem is identical to the folks who say, “Index my servers. The information on them is what we want indexed.” As soon as the index goes live, these same folks complain that the search engine has processed ripped off music, software from mysterious sources, Cub Scout fund raising materials, and some content I don’t want to mention in a Web log. How do I know? I have heard this type of rationalization many times. Malformed XML, duplicate content, and other problems mean content mismanagement, not bad indexing by a search system.

Most people don’t have a clue what’s on their public facing servers. The content management system may be at fault. The users might be careless. Management may not have policies or may fail to create an environment in which those policies are observed. Most people don’t know that “dates” are assigned and may not correlate with the “date” embedded in a document. In fact, some documents contain many dates. Entity extraction can discover a date, but when there are multiple dates, which date is the “right one”? What’s a search system supposed to do? Well, search systems process what’s exposed on a public facing server or a source identified in the administrative controls for the content acquisition system.
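A toy extractor makes the multiple-dates problem concrete. The document text and the tie-breaking heuristics below are my own hypothetical illustration, not how Google News actually works: several defensible “dates” come out of one document, and a crawler must pick one.

```python
import re
from datetime import datetime

# A single document often exposes several plausible "dates": a copyright
# year, a publication date, and dates mentioned in the body. (This sample
# text is hypothetical.)
doc = ("Copyright 2000. Published December 10, 2002. "
       "The merger, first reported on March 3, 1998, collapsed later.")

MONTHS = ("January|February|March|April|May|June|"
          "July|August|September|October|November|December")
pattern = rf"(?:{MONTHS}) \d{{1,2}}, \d{{4}}"

found = re.findall(pattern, doc)
dates = [datetime.strptime(d, "%B %d, %Y") for d in found]

# Three defensible heuristics, three potentially different answers:
first_mentioned = dates[0]   # order-of-appearance
most_recent = max(dates)     # "freshest" date wins
oldest = min(dates)          # earliest date wins

print(found)         # ['December 10, 2002', 'March 3, 1998']
print(oldest.year)   # 1998
```

No heuristic is “right” in general, which is why an automated system can surface a 2002 story as new without anyone at the publisher noticing.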

Blaming a software system for lousy content management is a flashing yellow sign that says to me “Uninformed ahead. Detour around problem.”

Based on my experience with indexing content managed by people who were too busy to know what was on their machines, I think blaming Google is typical of the level of understanding in traditional media about how automated or semi-automated systems work. Furthermore, when I examined the Tribune’s for-fee service referenced above, it was clear that the level of expertise brought to bear on this service was, in my opinion, rudimentary.

Traditional media is eager to find fault with Google. Yet some of these outfits use automated systems to index content and cut headcount. The indexing generated by these systems is acceptable, but there are errors. Some traditional publishers not only index in a casual manner; they also charge for each query. A user may have to experiment in order to find relevant documents. Each search puts money in the publisher’s pocket. The Tribune charges for an online service that is essentially unusable by my 86-year-old father.

If a Tribune company does not know what’s on its servers and exposes those servers on the Internet, the problem is not Google’s. The problem is the Tribune’s.

Stephen Arnold, September 11, 2008

tyBit: Zero Click Fraud

September 11, 2008

I’m getting my bobbin loaded and squirting the trestle on my sewing machine. It’s almost time for Project Runway, my favorite television show. I put down my can of 3-in-1 oil and scanned my newsreader for gewgaws. What did I see? A story in the prestigious Forbes Magazine about a new search engine called tyBit. I put down my bobbin and picked up my mouse. The paragraph in the Business Wire story on the lofty Forbes.com Web site said here:

tyBit is the only Internet search solution that eliminates click fraud for its advertisers and provides itemized billing for all advertising dollars spent. It is also a no-cost private label search engine for traditional media so they can win back their advertisers, subscribers and revenue.

I navigated to the tyBit Web site, which was new to me, and saw this splash page complete with my tyBit “man”.

[Image: tyBit splash page]

I ran my favorite query “ArnoldIT Google” and received this list of results:

[Image: results for the query “ArnoldIT Google”]

I was happy. The first hit pointed to something I had written.

I then ran an image search on the query “arnoldit” and saw this:

[Image: ArnoldIT image search results]

There I was in bunny rabbit ears in 1981 and in 2007 with my lifetime achievement award for ineptitude. Happy again.

But I clicked on the ad labeled “Get free advertising now.” I don’t do advertising. No one hires me anyway. I clicked on the ad, hit back, and then clicked again. What do you know? Click fraud; that is, the click with no intent to buy. In fact, I did it seven or eight times until I decided that the zero click fraud assertion did not apply to house ads on queries about “ArnoldIT Google.”
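The little experiment above hints at why “eliminating” click fraud is a bold claim. Even the crudest filter, a minimal sketch of my own with hypothetical thresholds and log format, not anything tyBit has published, should catch a visitor hammering the same ad seven times:

```python
from collections import defaultdict

WINDOW = 60      # seconds; hypothetical threshold
MAX_CLICKS = 2   # clicks allowed per (visitor, ad) pair inside the window

def flag_fraud(clicks):
    """clicks: list of (timestamp, visitor_id, ad_id) tuples.
    Returns the set of (visitor_id, ad_id) pairs that exceeded the limit."""
    history = defaultdict(list)
    flagged = set()
    for ts, visitor, ad in sorted(clicks):
        key = (visitor, ad)
        # Keep only clicks inside the sliding window, then record this one.
        history[key] = [t for t in history[key] if ts - t <= WINDOW]
        history[key].append(ts)
        if len(history[key]) > MAX_CLICKS:
            flagged.add(key)
    return flagged

# Seven rapid clicks on one house ad, as in the anecdote above:
log = [(i, "arnoldit", "free-ads") for i in range(7)]
print(flag_fraud(log))   # {('arnoldit', 'free-ads')}
```

Real fraud involves botnets, rotating IPs, and plausible pacing, which is exactly why a blanket “zero click fraud” guarantee invites testing with a mouse and a back button.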

The site indexes images, video, news, music, “local” and “shop”. I found a link to sign up for tyBit mail. I did not click on each of these links. Project Runway awaits. The Forbes.com write up provides some metrics about the company:

  • More than 6,000 advertisers test the click fraud technology
  • The site averages 2.1 million searches per day and handled 50 million searches in August 2008
  • One advertiser got more than 40 leads.

Sounds good. My suggestion is read the Forbes.com write up, explore the tyBit site here, and make up your mind. Google’s dominance does not seem to intimidate Clarence Briggs, CEO of tyBit. I have added this company to my watch list. Lots of search innovation out there right now there is, there is.

Stephen Arnold, September 11, 2008

Google: EULA Loop

September 11, 2008

The Australian PC World ran a news story I found interesting. “Google Claims License to User Content in Multiple Products.” You can read Grant Gross’s article here. The news peg is that the “ownership” language in the Chrome license agreement appears in other Google EULAs as well. Mr. Gross mentions these products:

  • Picasa
  • Blogger
  • Google Docs
  • Google Groups

You and I need to verify Mr. Gross’s assertions. If true, my observation that this type of language is not accidental may not be wide of the mark. Mr. Gross reports that some folks find these EULAs sufficient cause to work around Google services. Google, on the other hand, seems to suggest that we’re really great guys and won’t take anyone’s content.

For me the most interesting comment in the write up was:

…the copyright terms that still exist in Picasa, Blogger and other Google applications would allow the company to use its customers’ content to promote the Google service. That could allow Google to use the content in live product demonstrations, for example, or in some promotional materials…

If true, I need to do some hard thinking about what and what not to do via Google services. If false, it’s Google all the way. Right now, I’m on the fence. I think this is a safe place to be until this EULA issue becomes clearer to me.

Stephen Arnold, September 11, 2008

Simplexo: Another Open Source Enterprise Search Platform

September 11, 2008

A satisfied reader alerted me to the Simplexo open source search announcement today. Simplexo offers its search system on the open source plan. If you use the engine, you can pay Simplexo to customize, support, and tune the system. The business model strikes me as quite similar to Lemur Consulting’s approach described here.

The article “Simplexo Launches Open Source Enterprise Search Platform” by Steve Evans appeared in CBROnline.com here. Simplexo offers a Butler Group (a Datamonitor Company) “audit” for download here.

According to Mr. Evans’ write up:

The software is capable of searching through unstructured data such as email, word processor documents, images, text files and spreadsheets, as well as structured data including databases, payroll, HR systems, and SAP.

Mr. Evans identifies one interesting method used by Simplexo. He observed:

Simplexo Enterprise uses the indexing capabilities of databases and other legacy software and therefore does not need to index this data. It only indexes unstructured data, reducing the amount of resources taken up by search indexes.
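The design Mr. Evans describes is essentially federated search: build your own inverted index only for unstructured text, and push structured queries down to the database’s existing indexes at query time. Here is a minimal sketch of that idea under my own assumptions (toy schema, toy tokenizer); Simplexo’s actual implementation is not public.

```python
import sqlite3

# Structured side: rely on the database's own index rather than re-indexing.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employees (name TEXT, dept TEXT)")
db.execute("CREATE INDEX idx_name ON employees(name)")
db.executemany("INSERT INTO employees VALUES (?, ?)",
               [("Ada Lovelace", "R&D"), ("Grace Hopper", "Systems")])

# Unstructured side: a tiny inverted index over documents we must index ourselves.
docs = {1: "quarterly payroll report draft", 2: "meeting notes on HR systems"}
inverted = {}
for doc_id, text in docs.items():
    for term in text.lower().split():
        inverted.setdefault(term, set()).add(doc_id)

def search(term):
    # Federate the query: hit the database index and the inverted index,
    # then return both result sets.
    rows = db.execute(
        "SELECT name FROM employees WHERE name LIKE ?", (f"%{term}%",)
    ).fetchall()
    doc_hits = inverted.get(term.lower(), set())
    return [r[0] for r in rows], sorted(doc_hits)

print(search("Grace"))    # (['Grace Hopper'], [])
print(search("payroll"))  # ([], [1])
```

The payoff is the one Mr. Evans names: the search system’s own index covers only the unstructured content, so it stays small, while structured lookups ride on indexes the database already maintains.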

The Butler “audit” noted that Simplexo opened for business in September 2008 and is a start up. The company generates revenue by supporting and customizing its open source search system. The product analysis seemed a bit sketchy, which is not surprising. The Butler “auditor” reported that the system can index two terabytes of data in about five hours. I urge you to download and read the Butler “audit”. I don’t want to recycle that firm’s information for a new company whose technology is unfamiliar to me. You can absorb the “audit” yourself and decide if the system is right for you.

More information about the company is available at www.simplexo.com or click here.

My view on open source search engines is that this is becoming a sector with a number of options for the organization interested in this approach. I mentioned Lemur Consulting. I have also written about Tesuji here. You can also take a look at my write up about Lucene here. These open source options are selective, not comprehensive.

I am neutral on open source search solutions. If you have the technical resources, open source can deliver excellent results. If you are not comfortable with open source, then you may be better served by running a try-before-you-buy analysis and then a bake off. Let your data collection guide you.

Stephen Arnold, September 11, 2008

The 451 Group’s SharePoint Data

September 10, 2008

A happy quack to the reader who called “Old News Department: Continued Growth for SharePoint” to my attention. “Too Much Information” is a Web log published by the 451 Group. (The number 451 echoes the science fiction story and reminds us of the temperature at which paper ignites on earth under “normal” conditions. A book burning is, therefore, a 451 event.) You can read the original 451 “take” here. I quite like the “old news” angle. I’m a specialist in a number of “old” postings. The reader wanted my comment on this piece of data picked up from Microsoft via a news release cited in the “Old News Department: Continued Growth for SharePoint” article; to wit:

Microsoft claimed $800m in SharePoint revenue (in a press release) last year for fiscal 2007, so 30% growth puts 2008 revenue at $1.04 billion, 35% growth puts it at $1.08 billion. The company also made a rather vague announcement in March at the SharePoint Conference and via a press release that it had surpassed the $1 billion revenue mark. At that point, we dug into it to find the $1 billion number was for the rolling twelve-month period.

The 451 Group pointed out that the numbers were mushy. In my experience, most numbers related to software companies’ revenues and customers are indeed soft. I still don’t know the final numbers on the BearStearns’ fiasco, the Enron scam, or the US government’s budget for software licenses at the General Services Administration. Therefore, it’s a safe bet that SharePoint numbers will be squishy too.

Let’s assume, however, that SharePoint is a multi billion dollar product. Further, let’s accept the idea that there are upwards of 100 million SharePoint licenses in the wild. And, let’s embrace the notion that SharePoint is Microsoft’s next generation operating system. If these assumptions are correct within a range of plus or minus 20 percent, here’s my take on the growth of SharePoint:

  1. The incredibly wild and wacky world of content management is going to face a nuclear winter. Already discredited in many organizations, content management systems, like key word enterprise search systems, don’t work, disappoint their users, and are incredibly expensive to operate. SharePoint may not be the best cookie in the batch, but Microsoft is making it easy and economical to get SharePoint and “do” content management. Interwoven, Documentum, Ektron, and the rest of the CMS crowd will have to do some fancy dancing to keep their revenues flowing and stakeholders wearing happy faces.
  2. SharePoint itself is going to be a big consulting business. For the most part, SharePoint works when one doesn’t ask too much of the system. Two or three people can share and collaborate. The search function is pretty awful, but that can be fixed with a quick phone call to ISYS Search Software, an outfit whose software we just tested. Watch for this write up as a feature on September 15, 2008.
  3. The Microsoft ecosystem is going to follow the trajectory of the mainframe ecosystem or the Oracle database ecosystem. The environment will change but the micro climates will persist within organizations for a long time. Certified Professionals will fight tooth and nail to keep SharePoint and their jobs.

The net net on SharePoint for me is that software is following the consolidation route traveled by the auto companies. Chrysler, Ford, and GM are not competitive. These giants are suffering financial emphysema. Death can be postponed, but none of these companies will be doing much more than walking slowly to the local convenience store to buy a microwaved burrito.

Users are going to be the losers. SharePoint is a complex system. It hogs resources. The scale up and out model becomes too costly for most organizations. I think of SharePoint as the digital equivalent of train travel in the US. Yes, one can do it, but the journey is filled with uncertainties. When the train breaks down, the passenger has little choice but wait until the repairs are made and the journey can resume. When the trip is over, passengers step off the Amtrak thankful to have arrived and eager to put the experience behind them.

And CMS? It won’t survive in its present form. Most CMS vendors will struggle to survive on the margin of the SharePoint ecosystem and have to fight off predators hungry for the customers CMS companies have been able to retain. The phrase “nasty, brutish, and short” comes to mind. Squishy numbers or not, SharePoint is the cat’s pajamas.

Stephen Arnold, September 10, 2008

Google: Employing Lawyers by the Score

September 10, 2008

A Walt Disney wizard is on the job for the US government. The issue, well stated by Jeff Jarvis, is “We Hate Success.” You can read his take on the latest and more threatening legal challenge to Google in its 10 year history here. The hook for his write up is the increasing heat directed at Google’s feet about “its growing dominance in advertising.” I agree with Mr. Jarvis when he wrote:

I’ve long argued that we do, indeed, need competition in the ad market but it’s not going to come from regulation. It’s going to come from getting off our asses and creating those competitors. I said that we need an open-source ad marketplace. Nobody’s heeded that advice.

I would like to add several comments about my perception and Mr. Jarvis’ regarding Google.

First, I do not believe that advertisers, telcos, and publishers understand what Google has built, what it is doing, or how the physics of Google operates in the cloud chamber businesses these business sectors try to enforce. Without understanding, there’s precious little hope of developing an effective, appropriate response to Google. These industries are watching Google’s digital beams punch holes in their semi closed business environments. Left to its own devices, Google will vaporize the “walls”. Then what?

Second, the competitors have watched Google through search-colored glasses with some frou-frou trim called online advertising. Competitors have assumed that with traffic, ad dollars would flow to them. So, the perception of what was needed to respond to Google was wrong in 1998 and it is wrong today–a decade later. That’s a pretty long time to misdiagnose a problem. At this point, there is no single company with sufficient resources to leapfrog Google. I make this point in my 2007 study Google Version 2.0. That’s the reason that the billions spent by Microsoft haven’t delivered. The company is not investing enough. Meanwhile, Google keeps on widening its lead in technology, customer traffic, and advertising. You will have to read my study to find out who and what can get past the GOOG.

Third, local regulation won’t do the job. Where is Google selling advertising? In what country is the money booked? What is Google selling?

I am not sure that an auction, operated from servers somewhere in the cloud and arguably not in the US, and delivering a message on behalf of a person or company to an unknown user who happens to have an interest in a subject is going to be easy to limit. Google is not a local newspaper selling a fungible product with a guaranteed circulation, a physical product, and specific customers who are betting that the newspaper can catch the interest of a person wanting to buy a used car.

In short, lawyers will make a great deal of money chasing Google and its monopoly. The problem is that Google is a supra national entity dealing in zeros and ones. I was on the fringes of the AT&T break up in the late 1970s and early 1980s. That was trivial compared to dealing with the actions of individuals who do what ad agencies used to do by themselves on servers routing the work hither and yon.

The definition of terms will generate enough billable hours to keep a legion of attorneys gainfully employed for a long time. When a decision is reached, the question becomes, “Will Google be the same at that point in time as it was when the legal engine started running?” I don’t think so.

Stephen Arnold, September 10, 2008
