Blossom Software
An Interview with Dr. Alan Feuer
The first time I met Dr. Feuer, I was impressed with his knowledge of search. When I asked him about his other interests, he said that he had a strong interest in art, entertainment, and cinema. He looked at me and said, "I've also tried my hand as a beer maker." Dr. Feuer worked at Bell Labs in its salad days. I knew that many of the Bell wizards had broad, often eclectic interests. Since our first conversation, I have watched Blossom Software become one of the dominant vendors, if not the dominant vendor, of search technology to municipalities. At the same time, the company has signed up clients across most industry sectors.
Unlike some vendors of search technology, Dr. Feuer keeps his hands in the plumbing. In fact, we used the Blossom search system for an open source intelligence Web site called the Threat Open Source Information Gateway. Users told us they liked the speed and flexibility of the system. I spoke with Dr. Feuer in his office in Boston, Massachusetts. My questions are in bold.
What's your background?
I have an MS degree from UC Berkeley and a PhD from Northeastern, both in computer science. I spent seven years as a researcher at Bell Labs during the Unix years. Since then I have developed commercial software, taught in industry and on campus, been the technical editor for Addison Wesley's Advanced Windows Series, and written a couple of books. Currently I spend most of my development time finding interesting things to do to Blossom's search system. You know about my hobbies, and I'm sorry. I forgot to bring my home brew today. You wanted to sample it as I recall.
Yes. I did. Tell me, why did you tackle search and retrieval? It's a fiercely competitive space, and there are more than 150 vendors hunting for business.
Intellectually I find search very stimulating as it mixes engineering topics like algorithm design with more subjective fields like interface aesthetics. I enjoy building tools that help people every day, and I particularly like the freedom and flexibility of working for a small technology-centric company.
I've been thinking about features that will help my users. For the past year I have been working on a feature that I call "phrasal query suggestions". When a query generates too few results, the search engine suggests broader queries that are guaranteed to generate more. Similarly, if a query generates too many results, the engine suggests narrower queries guaranteed to generate fewer results.
How do you implement this?
Because Blossom uses proximity search, dropping terms from a phrase broadens it. To generate suggestions, the engine automatically performs a parallel search of all possible subphrases of a query, then suggests those that contain the most terms and have the most hits.
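The broadening step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Blossom's implementation: "subphrase" is taken to mean a contiguous run of the query terms, a "hit" is a document containing that run word for word (a toy stand-in for real proximity search), and the names `subphrases`, `count_hits`, and `broader_suggestions` are hypothetical.

```python
def subphrases(terms):
    """Return every contiguous subphrase shorter than the full query (assumed meaning of 'subphrase')."""
    n = len(terms)
    found = {tuple(terms[i:j]) for i in range(n) for j in range(i + 1, n + 1) if j - i < n}
    return sorted(found)


def count_hits(phrase, docs):
    """Count documents that contain the phrase as a contiguous run of words (toy proximity match)."""
    k = len(phrase)
    hits = 0
    for doc in docs:
        words = doc.lower().split()
        if any(tuple(words[i:i + k]) == phrase for i in range(len(words) - k + 1)):
            hits += 1
    return hits


def broader_suggestions(query, docs, limit=3):
    """Suggest subphrases with strictly more hits than the query, most terms first, then most hits."""
    terms = query.lower().split()
    if not terms:
        return []
    original = count_hits(tuple(terms), docs)
    scored = []
    for phrase in subphrases(terms):
        hits = count_hits(phrase, docs)
        if hits > original:  # keep only suggestions that really broaden the query
            scored.append((len(phrase), hits, phrase))
    # Longest subphrase first, then highest hit count, then alphabetical for a stable order.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [(" ".join(phrase), hits) for _, hits, phrase in scored[:limit]]


# Hypothetical sample collection, invented for this example.
DOCS = [
    "city council meeting minutes",
    "city council budget hearing",
    "council meeting agenda",
    "parks department meeting minutes",
]

for suggestion, hits in broader_suggestions("city council meeting agenda", DOCS):
    print(f"{suggestion}: {hits} hit(s)")
```

The sketch searches the subphrases one after another; the interview says the engine runs these searches in parallel, which the example does not attempt to reproduce.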
If a query generates more than 14 hits, the search engine suggests narrower queries that, again, are guaranteed to produce fewer results. For a proximity search, adding terms narrows a query. To know what terms to add, at indexing time we create an index of all the phrases in all of the documents. To generate suggestions, the engine performs a search against the phrase index, ordering the phrases by length and number of hits.
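A rough Python sketch of the narrowing step follows. It rests on several assumptions that the interview does not confirm: the phrase index is modeled as a dictionary from word n-grams (up to a hypothetical `max_len`) to document ids, a narrower phrase is any indexed phrase that contains the query as a contiguous run, and shorter phrases with more hits are listed first. Only the 14-hit threshold comes from the interview; the function names are invented.

```python
from collections import defaultdict

HIT_THRESHOLD = 14  # from the interview: narrower queries are suggested above 14 hits


def build_phrase_index(docs, max_len=4):
    """Map every word n-gram (1..max_len words) to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        words = doc.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                index[tuple(words[i:i + n])].add(doc_id)
    return index


def contains(phrase, sub):
    """True if `sub` appears inside `phrase` as a contiguous run of terms."""
    k = len(sub)
    return any(phrase[i:i + k] == sub for i in range(len(phrase) - k + 1))


def narrower_suggestions(query, index, threshold=HIT_THRESHOLD, limit=3):
    """Suggest longer indexed phrases that contain the query and have strictly fewer hits."""
    terms = tuple(query.lower().split())
    original = len(index.get(terms, ()))
    if original <= threshold:
        return []  # the query is already specific enough
    candidates = [
        (len(phrase), -len(ids), phrase)
        for phrase, ids in index.items()
        if len(phrase) > len(terms) and contains(phrase, terms) and len(ids) < original
    ]
    # Assumed ordering: shortest phrase first, then most hits.
    candidates.sort()
    return [(" ".join(phrase), -neg_hits) for _, neg_hits, phrase in candidates[:limit]]


# Hypothetical sample collection; the threshold is lowered so the tiny corpus triggers suggestions.
DOCS = [
    "council meeting minutes",
    "council meeting agenda",
    "council meeting minutes archive",
    "council budget",
]
INDEX = build_phrase_index(DOCS)

for suggestion, hits in narrower_suggestions("council", INDEX, threshold=2):
    print(f"{suggestion}: {hits} hit(s)")
```

Because every document that contains the longer phrase also contains the query, a suggestion can never return more results than the original query; the explicit `len(ids) < original` filter drops the cases where the count would stay the same.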
In a small company, you lack the resources of a Bell Labs. What do you do to get hard data?
That's a good question. As you know, I keep in touch with my former colleagues and I'm a member of ACM's SIGIR. Also, our technique has been the focus of a research study carried out at Northeastern University. The study found that adding phrasal suggestions significantly improves recall. We have also found that phrasal suggestions are a popular search engine feature. After adding them to our commercial service, over 50 percent of follow-up queries utilize a suggestion. You can find full details about the study in a paper delivered at the 2007 Conference on Information and Knowledge Management titled "Evaluation of Phrasal Query Suggestions".
What motivated you to develop phrasal suggestion?
Query guidance addresses the primary shortcoming of proximity search, its low recall. Guidance also addresses the well-known problem that search engine users usually enter very short queries that often match many documents.
You do code every day? Is that right?
Yes, I continue to write code, and as you know, I answer the phone, give talks, and wash the dishes. Blossom is a small company and there's interesting new work every day.
Some of the search executives don't have their hands in the technical plumbing. What else makes Blossom different?
"Degree of magic" is a telling scale for classifying search engines. At one end are search engines that take queries very literally; at the other are systems that try to be your intimate personal assistant. Systems high on the magic scale make hidden assumptions that influence the search results. High magic usually implies low transparency. Blossom works very hard to get the user results without throwing too much pixie dust in anyone's eyes.
Can you elaborate on the "magic" metaphor?
Sure. As search systems get more sophisticated, they tend to climb the magic scale. At Blossom we attempt to add power without using magic, providing features that enhance search but don't reduce transparency. Blossom's use of proximity search and prefix matching are good examples. Proximity search has the highest precision of common search operators; it usually gives you exactly what you ask for. The alternatives sacrifice precision for recall, delivering on average less relevant results.
Prefix matching is an alternative to stemming. Prefix matching requires no inference on the part of the engine -- it works just like search in a word processor. Stemming reduces verbs and nouns to a common form, triggering surprising matches that reduce transparency.
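The contrast between prefix matching and stemming can be illustrated with a small Python sketch. Everything here is an assumption made for the example: `prefix_match` is a plain starts-with test, and `toy_stem` is a deliberately crude suffix stripper invented for illustration, not the stemmer used by any particular engine.

```python
def prefix_match(term, text):
    """Return the words in `text` that literally start with `term`."""
    term = term.lower()
    return [word for word in text.lower().split() if word.startswith(term)]


def toy_stem(word):
    """Crude illustrative stemmer: strip one common suffix if at least three letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_match(term, text):
    """Return the words in `text` whose stem equals the stem of `term`."""
    target = toy_stem(term.lower())
    return [word for word in text.lower().split() if toy_stem(word) == target]


# Hypothetical sample text, invented for this example.
TEXT = "the news covered new parks and the park permit process"

print(prefix_match("park", TEXT))  # every match visibly begins with what was typed
print(prefix_match("news", TEXT))  # only the literal word
print(stem_match("news", TEXT))    # the toy stemmer also pulls in "new"
```

With the toy stemmer, a search for "news" also returns "new", because both words reduce to the same stem. That is the kind of hidden inference the answer above describes, whereas each prefix match can be checked by eye against the query.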
What's Blossom's machine architecture?
Blossom servers are dedicated Linux machines running on dual processor X86 boxes. They are geographically distributed at leased facilities in the usual places -- California, Texas, and Virginia.
What's the code base?
Most of the Blossom software is proprietary C or C++, plus some scripts in Python. We pay quite a bit of attention to algorithmic efficiency. Blossom is pretty fast, and I keep thinking about ways to make efficiency improvements while coming up with innovations.
I agreed not to bring up your avocation, but I do want you to play fortune teller for a moment. What's ahead in the next 12 to 24 months?
I think there will be some fallout from the Microsoft -- Fast Search acquisition. Also, I think there will be more effort put into understanding what users want. Personalization seems to be an interest at Google, so I anticipate more activity in that area as well. Mobile is of growing importance. I think these are trends to watch.
I won't let you off the hook. What's coming in 2008 for Blossom?
Our focus will remain on enhancing search. We are looking particularly at applying ideas from question answering systems.
The news this week is Microsoft's purchase of Fast Search & Transfer. Is Blossom for sale?
We are not seeking a buyer, but are always looking for partners. I'm pleased with the growth of our company, and I enjoy working with our customers. I'm having too much fun. I want to grow Blossom and continue my research.
ArnoldIT Comment
Blossom Software provides behind-the-firewall search as an on-premises or hosted solution. The company can index Web sites, combine internal and external content, or index information behind an organization's firewall. For more information, contact Blossom Software.
Stephen E. Arnold, February 18, 2008