Fast to Integrate with SharePoint

November 2, 2009

Overflight’s SharePoint search container has been an empty can for weeks. Today I saw a link to the Microsoft Enterprise Search Blog’s “Fast Meets SharePoint – What’s Coming in Search for SharePoint 2010”. Microsoft has owned Fast Search & Transfer since April 2008. Since that time, there’s been a Web part and lots of speculation. The “legacy” Fast ESP customers seem to have the impression that Microsoft will support those systems for years. I heard that the commitment was for a decade. The mystery has been the union of Fast ESP with SharePoint. In my opinion, both software systems are bundles of subsystems, and the complexities of each separate product are familiar to me and my goslings here in Harrod’s Creek, Kentucky.

The October 28, 2009, write up contains a number of interesting points. Let me run down those that struck me as significant to my work. Your mileage may vary because the points in the Microsoft blog post are a feature run down, not substantive information about the “new” Fast ESP for SharePoint.

First, the SharePoint conference was sold out. This reminds me that training people to use Microsoft products is a big business. Obviously this is an important point because it focuses on what matters, which seems to me to be conference attendance, not search.

Second, there was a flashback to a conference held in February 2009. That tells me that there’s not much new in the way of SharePoint / Fast ESP news.

Third, there is a reminder to me that SharePoint is a work in progress. I think someone told me that it is the next operating system from Microsoft. The Fast component will be called Fast Search Server 2010 for SharePoint. Now the story gets interesting. Here’s what’s coming:

  • A content processing pipeline. Most enterprise search systems use intake, content processing, query processing, and administrative controls. Frankly I don’t know how this “pipeline” will differ from other search systems crafted in the late 1990s, when the core of Fast Search was built.
  • Metadata extraction. The idea of identifying concepts to help a user find documents or other content objects “about” a topic is not new. The hitch in the existing systems is that metadata extraction imposes a performance hit on a system. Some metadata systems do “deep extraction”, which takes significant computational time. Some vendors use the notion of metadata and facets interchangeably. I will be interested to see if Fast ESP will bring the product into line with the systems available from such companies as Coveo, Exalead, and Ontolica, among others. (Some of the terms used to describe the approach remind me of Endeca’s current approach to extending its system.)
  • Structured data search. I find interesting the notion of merging unstructured text with data from database tables or third party applications that use an RDBMS to house data. Attivio, Clarabridge, and other firms are working in this niche now. There are significant challenges related to data transformation. In fact, data transformation can chew up significant resources *before* the search system can begin intake and processing.
  • Visual search. This is the Bing 3D interface. I am not sure how that will fly because it does not seem to be a component of text processing, which remains the thorn in the side of the SharePoint user.
  • Advanced linguistics. My recollection is that Fast Search & Transfer had a shop in Germany that worked on certain linguistic functions. Some linguistic manipulations require set up and testing. Out of the box linguistic functions are available from some vendors like Basis Tech, but there is a lot of work that must be done to get these systems working so that the outputs match the needs of some users.
  • Best bets. This is a variation of what I think of as Google’s “I’m feeling lucky”. These best bets work when there are sufficient data to make the recommendations useful. If the Fast ESP system uses a simple metric such as the number of documents “about” a topic written by a SharePoint user, I don’t think the best bet will be particularly helpful. More sophisticated methods require big data to generate useful results. Sophisticated methods operating on too small a data flow return bad bets in my experience.
  • Development platform. The original Fast ESP system was coded in a range of languages. Each time an issue was discovered in the installation of Fast on my watch, new scripts had to be written and inserted into the Fast ESP system. My recollection is that Fast was not a homogeneous system. My guess is that the development system will be Microsoft’s own programming tools, which include its scripting language and the Visual Studio line of products. It’s easy to talk about a development platform, but the reality of complex systems like Fast ESP may require considerable creativity to achieve certain objectives.
  • Customization. User profiles have been around a long time. The SharePoint version of Fast will make use of available flags as well as information about a user or the groups to which a user belongs in order to present a Google “ig” type of interface. (“IG” means “individualized Google”, which is a newer version of the MyYahoo feature.)
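
For readers who have not looked inside one of these systems, here is a minimal sketch of the generic pattern I described in the first bullet: intake, content processing (with a trivial stand-in for metadata extraction), indexing, and query processing. This is not Fast ESP code and says nothing about how Microsoft will build its pipeline. Every function name, the stopword list, and the "capitalized word equals entity" rule are my own assumptions for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "for", "is"}


def intake(raw_docs):
    """Intake: wrap each raw text item in a record with an id."""
    return [{"id": i, "text": text} for i, text in enumerate(raw_docs)]


def process(doc):
    """Content processing: tokenize, drop stopwords, and pull out
    capitalized words as a crude stand-in for metadata extraction."""
    words = re.findall(r"[A-Za-z]+", doc["text"])
    doc["tokens"] = [w.lower() for w in words if w.lower() not in STOPWORDS]
    doc["entities"] = sorted(set(re.findall(r"\b[A-Z]\w+", doc["text"])))
    return doc


def build_index(docs):
    """Indexing: map each token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in doc["tokens"]:
            index[token].add(doc["id"])
    return index


def query(index, text):
    """Query processing: return ids of documents containing every query term."""
    terms = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    terms = [t for t in terms if t not in STOPWORDS]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```

To try it, run `docs = [process(d) for d in intake(["SharePoint search is slow"])]`, then `query(build_index(docs), "SharePoint search")`. Note that all of the extraction work happens in `process`, before anything is searchable. That is where the performance hit I mentioned lands: swap the one-line entity rule for "deep extraction" and the cost per document climbs accordingly.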

The last point in the write up is the one that took my breath away. Earlier today I wrote about Autonomy’s assertion about “infinite scalability”. Now I read “extreme scale and performance” for the SharePoint / Fast search system. Keep in mind that “extreme” implies some non standard behavior. With Microsoft’s up and out approach to scaling, this phrase “extreme scale and performance” means that lots and lots of hardware is going to be needed to handle large content flows. In fact only a handful of systems we have tested this year can deal with petascale content flows. I want to mention the little known Perfect Search as one example of a company that has nailed big data with a very modest hardware footprint. I will have to test the new Fast system before I can accept this “extreme” assertion.

Three thoughts crossed my mind as I worked through this SharePoint / Fast blog post from Microsoft:

First, why wasn’t the Fast ESP for SharePoint rolled out at the SharePoint conference? A delay suggests that something was not in sync. I wondered, “Is this not an interesting business strategy for a search vendor to implement?”

Second, these Microsoft blog posts recycle the same old information without adding any substantive new data. Where’s the block diagram? Where are the sample / default interfaces? Where’s the list of methods?

Third, I keep thinking about the interactions between two complex systems. Who is going to have the time, money, patience, and management support to get these beasties to cuddle in a sleeping bag? With an alleged 100 million SharePoint licenses, those Microsoft Certified Professionals will be eager to give the work a try.

Billing for consulting services ahead for some search experts!

Stephen Arnold, November 2, 2009

Microsoft routed a question to me last week, but no money was forthcoming. After this article, I am a gone goose.
