SharePoint Sunday: Microsoft Fast Tuning Document
January 11, 2010
Microsoft has published in the XPS format “Optimize Search Relevance with Microsoft FAST Search Server 2010 for SharePoint (Beta)”. If you don’t have the XPS viewer installed, you will have to download that Adobe Acrobat inspired program from Microsoft.com/downloads. If you can’t locate the file, you can use Google to bing you to the appropriate link. (I had to fiddle a bit to get the XPS file to render because the download for XP would not run. My Windows 7 machine was more compliant.)
What interested me about this document is that it addresses relevance performance issues for the new search system. Even more interesting is that Fast Search & Transfer has been licensing software to the enterprise since 2004. The performance challenges of ESP have become familiar friends to some administrators. The fact that a 23-page technical white paper is required * before * the actual product ships furrowed my brow.
I know that many Certified Microsoft Professionals salivate when these types of opportunities arise. I don’t think that many Chief Financial Officers will be as gleeful. Money is required to address certain search performance issues. The Fast ESP system, like other complex, older search architectures, poses some significant hurdles for the CFO who wants engineers to sprint to finish a search optimization job and the related bits of craft necessary to tune content processing, index refreshes, and the burden of certain content that must be crunched as soon as the info arrives.
The document reminds me that I cannot get too frisky when thinking about or using the information in the white paper. I will do my best to keep my friskiness under control. I think you should read the document and let it speak for itself.
Let me highlight some of the tips that caught my attention. First, the white paper makes clear that one may want to “prevent indexing irrelevant SharePoint content.” That’s interesting advice. The user does not know what he or she needs. A search that is selective or winnows down what is in the index therefore means that a user may not find what he or she seeks, may not have a complete set of pertinent information, or will have to supplement an online search with the old fashioned, real expensive approach: the Easter egg hunt. The trend in my experience is to index as much information as is available. Exclusion of information is tricky, and I for one don’t want to explain that the irrelevant content I did not process is exactly what was needed to close a big deal or get a fact verified. I can apply the same concern to the statement “Encourage archiving and deleting old content.” I know that old content may not be frequently accessed. But if the needed information is “old”, should the user be denied knowledge that germane content is indeed available?
The paper then shifts to running the Fast Search connector. A connector is a code shim that hooks content to the content processing system. My son’s company, Adhere Software, is in the connector business, and my understanding is that multiple connectors are required because organizations have diverse content types. But if there is one Fast Search connector, one has to tune it. The settings strike to the heart of the performance of a content processing system. The idea is not to crawl for new or updated content frequently. This is at odds with some companies’ desire to have the most timely information in the search system. If you get the crawl wrong, in my experience the users email, asking, “Where is that document?” One of the flaws in enterprise search is that the basic content acquisition method is at odds with the expectations of the users. Exalead, for example, delivers content within a 12 to 15 minute window. The other vendors struggle to match this timeliness. I don’t think the “new” version of Fast Search will be much different. Exalead and a couple of other outfits are the present speed champs in the enterprise. The code base in Exalead is “newer” than that in Fast Search, which dates from the late 1990s.
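To make the crawl frequency trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the white paper or from any Fast Search API. The function names and the simple additive model (wait for the next crawl, wait for the crawler to reach the document, wait for content processing) are assumptions made for illustration only.

```python
from typing import Optional


def worst_case_staleness(crawl_interval_min: float,
                         crawl_duration_min: float,
                         processing_latency_min: float) -> float:
    """Worst-case minutes between a document change and its appearance in results.

    Assumed model (hypothetical, not from the white paper): a document changed
    just after the crawler passed it waits up to one full interval for the next
    crawl to start, up to one full crawl for the crawler to reach it, and then
    the content processing latency.
    """
    return crawl_interval_min + crawl_duration_min + processing_latency_min


def max_interval_for_target(target_min: float,
                            crawl_duration_min: float,
                            processing_latency_min: float) -> Optional[float]:
    """Longest crawl interval that still meets a freshness target, or None.

    None means the target cannot be met under this model, because crawling and
    processing alone already take longer than the target.
    """
    slack = target_min - crawl_duration_min - processing_latency_min
    return slack if slack > 0 else None


if __name__ == "__main__":
    # Illustrative numbers only: a crawl every four hours that takes 30 minutes,
    # plus 10 minutes of content processing.
    print(worst_case_staleness(240, 30, 10))
    # Against a 15 minute freshness target, the same crawl cannot keep up.
    print(max_interval_for_target(15, 30, 10))
```

Under these made-up numbers, a user could wait more than four hours for a new document to become findable, which is the kind of gap that produces the “Where is that document?” email.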
Relevance is a tricky topic. In order to generate results that are useful to a user, enterprise systems require more care and feeding than some vendors reveal during the run up to contract signing. The section “Tune relevance in Fast Search for SharePoint” makes clear that term lists and manual promotion or demotion of certain information are needed. Hit boosting is important and today’s boosted document may be tomorrow’s demoted document. The hands on part of a search system like Fast Search is often a matter of trial and error. An unexpected result can create quite a bit of excitement. This type of tuning is expensive. The dependencies within the Fast Search system often create a need to revise the changes. The tuning segment is several pages long, and I suggest you read it, considering the cost implications of the recommendations that you as the search manager will have to carry out yourself or hire specialists to perform. Plan to spend quite a bit of time with the staging server implementation of Fast Search. Making relevancy changes while the plane is in the air can be exciting. I find drill level setting particularly enervating.
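For readers who have not done this kind of tuning, the general idea of manual promotion and demotion can be sketched in a few lines of Python. This is a toy illustration, not the Fast Search mechanism. The `Hit` class, the `PROMOTED` and `DEMOTED` tables, the multipliers, and the document paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    url: str
    base_score: float  # score from the engine before any manual tuning


# Hypothetical tuning tables: query term -> {document URL: score multiplier}.
PROMOTED: Dict[str, Dict[str, float]] = {
    "expenses": {"/finance/expense-policy.docx": 2.0},
}
DEMOTED: Dict[str, Dict[str, float]] = {
    "expenses": {"/archive/expense-policy-2004.docx": 0.5},
}


def rerank(query: str, hits: List[Hit]) -> List[Hit]:
    """Apply manual boost and demote multipliers, then sort by adjusted score."""
    term = query.strip().lower()
    boosts = PROMOTED.get(term, {})
    demotes = DEMOTED.get(term, {})

    def adjusted(hit: Hit) -> float:
        # Documents without an entry keep their original score (multiplier 1.0).
        return hit.base_score * boosts.get(hit.url, 1.0) * demotes.get(hit.url, 1.0)

    return sorted(hits, key=adjusted, reverse=True)


if __name__ == "__main__":
    hits = [
        Hit("/archive/expense-policy-2004.docx", 1.0),
        Hit("/hr/travel.docx", 0.7),
        Hit("/finance/expense-policy.docx", 0.6),
    ]
    for hit in rerank("Expenses", hits):
        print(hit.url)
```

Even in this toy, every entry in the two tables is a decision someone has to make, test on a staging server, and revisit when the content changes. Multiply that by the number of queries that matter to the business and the cost implications become easier to see.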
The linguistic relevance tuning is an important exercise. The idea is that some of the nifty features like suggested content and forgiveness for poor spelling require some manual experimentation and adjustment. I don’t know too many SharePoint administrators who have a deep linguistic and semantic background. I know I don’t, and we do this stuff for a living. I rely on specialists, but these can be tough to find. A fiddly mistake can wreak havoc with the search system itself.
The white paper devotes a page and a half to custom search applications. As you might expect, quite a bit of useful detail was excluded to make the custom search application fit in a page and a half. The inclusion of the topic revealed to me that Microsoft is making an effort to minimize the complexity of creating a useful search enabled application. I know from many years in the search field that few SharePoint administrators possess the expertise to handle the challenges a search based application presents. Some SharePoint administrators have confidence in their ability. This confidence can undergo some alteration when a simple job becomes a death march that seems to have no end. If one does not know what one does not know, the search based application will send that person to boot camp quickly.
Several observations:
- Microsoft is going to give SharePoint licensees the idea that tuning Fast Search is not a big deal. I think that Fast Search tuning will become a big deal and very quickly. The complexity of Fast Search cannot be minimized or swept under the rug no matter how many azure chip consultants assert that life is simple. Not in Fast Search land I opine.
- The white paper is a shopping list of tasks. The code samples are helpful, but the guidance is not deep. This means that a SharePoint administrator following the steps outlined in this white paper will find himself or herself with quite a technical challenge. What makes life exciting is that the dependencies within and among the Fast Search “control knobs” are not documented. Fast Search is not an iPhone app that one can master in a matter of minutes. Tuning Fast Search is challenging technical work.
- The omission of references to third party experts who will be needed to handle certain tasks such as the controlled terms operations sets the stage for a big surprise. Linguistic components are not intuitive, and the work requires experts who know how to set up term lists that deliver meaningful results to the user.
My view is that Microsoft is beginning to understand what a challenge Fast Search will be to the average SharePoint administrator. Vendors of enterprise search appliances and purpose built search systems will see the broad deployment of Fast Search as the best marketing for their competitive products.
Pretty exciting stuff.
Stephen E. Arnold, January 11, 2010
No one paid me to write this, but I believe that if I were actually a salesperson I could have found a couple of outfits eager to have me slap their SharePoint search solution on the blog page displaying this essay. Also, no dough. I will report this to the administrator of Walter Reed Hospital. My idea is that those under stress may seek remediation at that fine facility.