Search Vendors and Source Code
January 21, 2008
A reader of this Web log wrote and asked, “Why is software source code (e.g. programs, JCL, Shell scripts, etc.) not included with the ‘enterprise search’ [system]?”
In my own work, I keep the source code because: [a] it’s a miracle (sometimes) that the system really works, and I don’t want youngsters to realize my weaknesses, [b] I don’t want to lose control of my intellectual property such as it is, [c] I am not certain what might happen; for example, a client might intentionally or unintentionally use my work for a purpose with which I am not comfortable, or [d] I might earn more money if I am asked to customize the system.
No search engine vendor with whom I have worked has provided source code to the licensee unless specific contractual requirements were met. In some U.S. Federal procurements, the vendor may be asked to place a copy of a specific version of the software in escrow. The purpose of placing source code in escrow is to provide an insurance policy and peace of mind. If the vendor goes out of business — so the reasoning goes — then the government agency or consultants acting on the agency’s behalf can keep the system running.
Most of the vendors whose search systems are involved in certain types of government work do place their systems’ source code in escrow. Some commercial agreements with which I am familiar have also required that the source code be placed in escrow. In my experience, the requirement is discussed thoroughly and considerable attention is given to the language regarding this provision.
I can’t speak for the hundreds of vendors who develop search and content processing systems, but I can speculate that the senior management of these firms has reasons similar to [a], [b], [c], and [d] above.
Based on my conversations with vendors and developers, there may be other factors operating as well. Let me highlight these but remember, your mileage may vary:
First, some vendors don’t develop their own search systems and, therefore, don’t have source code or at least complete source code. For example, when search and content processing companies come into being, the “system” may be a mixture of original code, open source, and licensed components. At start up, the “system” may be positioned in terms of features, not the underlying technology. As a result, no one gives much thought to the source code other than keeping it close to the vest for competitive, legal, or contractual reasons. This is a “repackaging” situation where the marketing paints one picture, and the technical reality is behind the overlay.
Second, some vendors have very complicated deals for their systems technology. One example is the vendor that may enjoy a significant market share. Some companies are early adopters of certain technology. In some cases, the expertise may be highly specialized. In the development of commercial products some firms find themselves in interesting licensing arrangements; for example, an entrepreneur may rely on a professor or classmate for some technology. Sometimes, over time, these antecedents are merged with other technology. As a result, these companies do not make their source code available. One result is that some engineers, in the search vendor’s company and at its customer locations, may have to research the solution (which can take time) or perform workarounds to meet their customers’ needs (which can increase the fees for customer service).
Third, some search vendors find themselves with orphaned technology. The search vendor licensed a component from another person or company. That person or company quit business, and the source code disappeared or is mired in complex legal proceedings. As a result, the search vendor doesn’t have the source code itself. Few licensees are willing to foot the bill for Easter egg hunts or resolving legal issues. In my experience, this situation does occur, though not often.
Keep in mind that search and content processing research funded by U.S. government money may be publicly available. The process required to get access to this research work and possibly source code is tricky. Some people don’t realize that the patent for PageRank (US6285999) is held by the Stanford University Board of Trustees, not Google. Federal funding and the Federal “strings” may be partly responsible. My inquiries to Google on this matter have proven ineffectual.
Several companies, including IBM, use Lucene or pieces of Lucene as a search engine. The Lucene engine is available from Apache. You can download code, documentation, and widgets developed by the open source community. One company, Tesuji in Hungary, licenses a version of Lucene plus Lucene support services. So, if you have a Lucene-based search system, you can use the Apache version of the program to understand how the system works.
To summarize, there are many motives for keeping search system source code out of circulation. Whether it’s fear of the competition or a legal consideration, I don’t think search and content processing vendors will change their policies any time soon. When my team has had access to source code during due diligence for a client of mine, I recall my engineers recoiling in horror or laughing in an unflattering manner. The reasons are part programmer snobbishness and part the numerous short cuts that some search system vendors have taken. I chastise my engineers, but I know only too well how time and resource constraints exact harsh penalties. I myself have embraced the policy of “starting with something” instead of “starting from scratch.” That’s why I live in rural Kentucky, burning wood for heat, and eating squirrels for dinner. I am at the opposite end of the intellectual spectrum from the wizards at Google and Microsoft, among other illustrious firms.
Bottom line: some vendors adopt the policy of keeping the source code to themselves. The approach allows the vendors to focus on making the customer happy and has the advantage of keeping the provenance of some technology in the background. You can always ask a vendor to provide source code. Who knows, you may get lucky.
Stephen Arnold, January 21, 2008