Extending SharePoint Search
September 15, 2008
Microsoft SharePoint is a widely used content management and collaboration system that ships with a workable search system, which I’ll refer to as ESS, for Enterprise Search System. But for program expansion and customization, you’ll want to look to third-party systems for help.
SharePoint has reduced the time and complexity of customizing result pages, handling content on Microsoft Exchange servers, and accessing most standard file types. In our tests of SharePoint, ESS does a good job and offers some bells and whistles, such as identifying individuals whose content suggests they are knowledgeable about a specific topic. Managing crawls and standard index cycles is point and click, SharePoint is security aware, and customization is easy. But licensees will hit a “glass ceiling” when indexing upwards of 30 million documents. To provide a solution, Microsoft purchased Fast Search & Transfer. Microsoft has released a Fast Search Web part to make integration of the FAST Enterprise Search Platform, or ESP, easier. The SharePoint FAST ESP Web part is located on Microsoft’s CodePlex web site and the documentation can be obtained here.
But licensing Fast ESP can easily soar above $250,000, excluding customization and integration service fees, making it a major investment to deliver acceptable search-and-retrieval functionality for large, disparate document collections. So what can a SharePoint licensee do for less money?
The good news is that there are numerous solutions available. These range from open source options such as Lucene and FLAX to the industrial-strength Autonomy IDOL (intelligent data operating layer), which can cost $300,000 or more before support and maintenance fees are tacked on.
Third-party systems can reduce the time required to index new and changed documents. One of the major reasons for shifting from the ESS to a third-party system is a need to provide certain features for your users. Among the most-requested functions are deduplication of result sets, parametric searching/browsing, entity extraction and on-the-fly classification, and options for merging different types of content in the SharePoint environment. The good news is that there are more than 300 vendors with enterprise search systems that to a greater or lesser degree support SharePoint. The bad news is that you have to select a system.
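To make "deduplication of result sets" concrete, here is a minimal Python sketch. It is not how any particular vendor does it. It assumes each hit is a dictionary with hypothetical `url` and `text` fields, and it treats two hits as duplicates when their text, after lowercasing and collapsing whitespace, hashes to the same value.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def dedupe_results(hits):
    """Return hits in their original rank order, keeping the first copy of each text.

    `hits` is a list of dicts with hypothetical 'url' and 'text' keys.
    """
    seen = set()
    unique = []
    for hit in hits:
        digest = hashlib.sha1(normalize(hit["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(hit)
    return unique
```

Exact-hash matching only catches identical copies. Commercial systems typically also catch near duplicates, which takes more than this sketch shows.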
Switching Methodology
Each IT professional with Microsoft certification knows how to set up, configure, and maintain SharePoint and other “core” Microsoft server systems. Let’s look at a methodology for replacing SharePoint’s built-in search with ISYS Search Software’s ISYS:web. ISYS is one of a half-dozen vendors offering so-called “SharePoint Search” capabilities.
Here’s a rundown of a procedure that minimizes pitfalls:
- Set up a development server with SharePoint running. You don’t need to activate the search services. This can be on a computer running Windows Server 2003 or 2008. Microsoft recommends at a minimum a server with dual CPUs, each running at least 3 GHz, and 2 GB of memory. Also necessary for installation are Internet Information Services (IIS, along with its WWW, SMTP, and Common Files components), version 3.0 or greater of the .NET Framework, and ASP.NET 2.0. A more detailed look at these requirements can be found here.
- Set up a single machine with several folders containing documents and content representative of what you will be indexing.
- Install ISYS:web 8 on the machine running SharePoint.
- Work through the configuration screens, noting the information required to add additional content repositories to index. An intuitive ISYS Utilities program will let you configure SharePoint indexes.
- Launch the ISYS indexing component. Note the time indexing begins and ends. You will need these data in order to determine the index build time when you bring the system up for production.
- Run test queries on the indexed content. If the results are not what you expect, make a return visit to the ISYS setup screens, verify your choices, delete the index, and reindex the content collection. Be sure to check that entities are appearing in the ISYS display.
- Open the ISYS results template so you can familiarize yourself with the style sheet and the behind-display controls.
- Once you are satisfied that the basics are working, verify that ISYS is using security flags from Active Directory.
At this point, you can install ISYS on the production server and begin the process of generating the master index. Image files for the ISYS installation are available from ISYS. These include screen shots illustrating how to set up the ISYS index.
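The start and end times noted during the test indexing run can be turned into a rough production estimate. The Python sketch below is a back-of-the-envelope calculation, not an ISYS tool. It assumes indexing time scales linearly with document count, which will not hold exactly on different hardware or content, and the document counts and times are made-up illustrative numbers.

```python
from datetime import datetime


def estimate_build_hours(start, end, docs_indexed, production_docs):
    """Linearly extrapolate the production index build time, in hours."""
    elapsed_seconds = (end - start).total_seconds()
    if elapsed_seconds <= 0 or docs_indexed <= 0:
        raise ValueError("need a positive elapsed time and document count")
    docs_per_second = docs_indexed / elapsed_seconds
    return production_docs / docs_per_second / 3600.0


# Illustrative numbers only: a 30-minute test run over 50,000 documents,
# extrapolated to a 2,000,000-document production collection.
start = datetime(2008, 9, 15, 9, 0, 0)
end = datetime(2008, 9, 15, 9, 30, 0)
hours = estimate_build_hours(start, end, 50_000, 2_000_000)
print(f"Estimated production build: {hours:.1f} hours")
```

Treat the result as a planning figure and rerun the measurement once the production server is available.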
Some Gotchas to Avoid
First, when documents change, the search system must recognize that change, copy or crawl the document, and make the changed document available to the indexing subsystem. The new index entries must be added to the main index. When a slowdown occurs, check the resources available.
Second, keep in mind that new documents must be indexed and changed documents have to be reindexed. Setting the index update at too aggressive a level can slow down query processing. Clustering can speed up search systems, but you will need to allocate additional time to configure and optimize the systems.
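The change-detection step described in the first two gotchas can be illustrated with a short Python sketch. It does not show how ISYS or SharePoint detects changes. It is a generic approach that assumes the content sits in a file system folder, and it compares content hashes against a snapshot saved from the previous run. The function names are hypothetical.

```python
import hashlib
import os


def file_digest(path):
    """Hash a file's bytes in chunks so large documents don't exhaust memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_changes(root, previous):
    """Compare files under `root` with a previous {path: digest} snapshot.

    Returns (new, changed, deleted, current_snapshot).
    """
    current = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            current[path] = file_digest(path)
    new = sorted(p for p in current if p not in previous)
    changed = sorted(
        p for p in current if p in previous and current[p] != previous[p]
    )
    deleted = sorted(p for p in previous if p not in current)
    return new, changed, deleted, current
```

Only the new and changed paths need to go to the indexer, and how often this check runs is the "index update" setting that trades freshness against query speed.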
Third, additional text processing features such as deduplication, entity extraction, clustering, and generating suggestions or See Also hints for users suck computing resources. Fancy extras can contribute to sluggish performance. Finally, trim the graphical bells and whistles. Eye candy can get in the way of a user’s getting the information required quickly.
To sum up, SharePoint ships with a usable search-and-retrieval system. When you want to break through the current document barrier or add features quickly, you will want to consider a third-party solution. Regardless of the system you select, set up a development server and run shakedowns to make sure the system will deliver the results the users need.
Stephen Arnold, September 15, 2008
Comments
6 Responses to “Extending SharePoint Search”
[…] Read the rest of this great post here […]
You should have a look at Coveo and their Sharepoint connector. Cheap and efficient technology when compared with Fast and Autonomy.
JF Martin,
I don’t know too much about Coveo. I hear good things about the company’s technology and customer support. I have learned that Canadian engineers are often quite good. The Waterloo computer science program is among the world’s best. I will continue to monitor the company. I know about its new cloud based email service. We did a small research project for the firm recently. The company received high marks from my engineers. Keep me informed, please.
Stephen Arnold, September 16, 2008
[…] Arnold has a good post on the topic of SharePoint search. Over the years, Microsoft has been routinely criticized for its poor search, and much of the […]
I second JF’s mention of Coveo. We bought it recently and have performed extensive tests and are extremely impressed with two key points: security and live index updates. Support is good too, a consideration people may not give (if you considered Google by the way)…
The discussion is about search….but by far the most common and effective method of finding information is browsing a structured classification system (taxonomy). By default SharePoint has one taxonomy only: the site structure and navigation. You can enhance this using site categories and content types.
If you want to categorize single items or documents directly with centrally managed cross-site tree-style categories you can use taxonomy extensions, that are offered by several vendors, e.g.
http://www.sharepartxxl.com/products/taxonomy/default.aspx
Content can be categorized directly in the edit view of an item or document, guided by rule-based suggestions or completely automatically. Once categorized, the content can be found through additional navigation aids such as a category tree, A-Z index, tag clouds, or related item links shown directly in the item’s detail view, all using default SharePoint search procedures in the background.
Categories can offer different views of your content, independent of the item’s storage location, e.g. site, list, or item type. These views can be based on the organizational structure, on the products and solutions your company offers, or on target groups, or they can be ordered geographically by language or subsidiary.
Category-based navigations can greatly improve knowledge bases, product or media libraries or intranets based on SharePoint. Using categories your SharePoint portal really can become a place to share knowledge as well as content.