New Idea Engineering
An Interview with Miles Kehoe
New Idea Engineering is a combination of entrepreneurial spirit, rocket science, and an enterprise search system repair shop. The company's founder is Miles Kehoe, a former Verity wizard. Mr. Kehoe has surrounded himself with colleagues who have deep experience with enterprise search systems. The company supports, debugs, fixes, and customizes commercial systems from the likes of Endeca and Google. In addition, New Idea works on innovative services, some of which are available without charge and others for which the company charges.
I met Mr. Kehoe and his co-founder Mark Bennett, a Jedi knight skilled in the dark arts of programming, technical analysis, and system integration, at Birk's Restaurant in Santa Clara. I grilled Messrs. Kehoe and Bennett for more than an hour as we waited for the chef to grill our entrées. Messrs. Kehoe and Bennett have a sixth sense. In the course of the discussion, each completed the other's sentences. I have taken the liberty of converting the text of the interview into statements attributed to Mr. Kehoe; otherwise, I would have been splitting comments in such a way that readability was compromised. The full text of the interview appears below:
What is your background in search?
I started my career in search at Verity in 1989 where I was the first Tech Support person they hired to support the beta customer base - mostly large banks and three-letter agencies. I moved to Fulcrum after a couple of years, but returned to Verity to manage the consulting group and sales systems engineers a year or so before the IPO in 1995.
We – Mark Bennett and I – left Verity in 1996 to start New Idea Engineering. We knew we could help companies implement Verity until we figured out what we wanted to do. We soon found out that all of the major search technologies needed technical experts to make them work, and we had found our niche. We’ve expanded our offerings since then to include strategy and roadmap consulting, but we’re still focused on corporate and customer-facing search technologies.
What did you learn from your work at Verity?
Well, besides a really good understanding of regex, I'd say I learned how useful a well-implemented search application can be in helping people find content. I also had close contact with customers, and came to understand the kinds of things they need to accomplish using search.
Back then we were searching using a proprietary character-mode screen, but Verity had hyperlinks, image links, and really accurate stored queries or 'agents'. I remember one of our test profiling systems notified us about the start of Desert Storm a full 30 minutes before CNN was on the air with the story, and it was awesome to see really great agent technology in use.
What was the origin of SearchButton?
One of our customers was interested in minimizing travel cost; we built the prototype of their search application at our facility where they could monitor our progress over the Internet. Over dinner one night, we wondered to ourselves why companies bought Verity software when they could actually rent it a lot easier. We put together a prototype, and found a friend of ours who was willing to test it on their public-facing site, quote.com.
After a week, he told us they were really happy with the capability. But he also asked what search terms people were using - and of course, given search technology of the day, we had no idea. That sort of information was left to the customer. But we realized that a really useful hosted search application had to provide reporting as well, so we built in full reporting prior to our release at Demo 99.
After a sale, most executives cut back. What is the goal of New Idea Engineering?
After we sold SearchButton.com to Mondosoft, we knew we had some great ideas we had not been able to implement yet. Since Mondosoft had founders, we took the ideas that were only in the concept stage and turned them into products, some of which we still market today.
Why is there an opportunity for a firm with your expertise and skills?
Great search - whether you call it enterprise search, or 'behind the firewall' search – is still more an art than science. And it's not a skill set that is highly portable. Over the years we've built up a skilled team of people who not only know how to implement search technology from all of the leading vendors – they have years of experience designing search that works.
For example, if you are a developer in a large company's IT department, and your manager comes to you and says "I want to send you to Oracle training", you know that the skill will be great to have on your resume. On the other hand, if your manager comes to you and says "I want you to learn the SuperFind Search API", and you’ve never seen any job postings calling for those skills, you might wonder if you should be looking around - it feels like a dead-end skill set.
To make matters worse, the real skill in implementing great search is more methodology than technology. Look at the Gartner Magic Quadrant or the Forrester Wave - chances are good that you are already using a technology in the far upper right corner and it's not working out. Why do companies think that by switching to another technology also in the upper right, their search will magically improve? Success in search means understanding technology, but also understanding business goals, process, content, and user interface design.
What's your take on the proliferation of search appliances from Exegy, Index Engines, Google, Thunderstone, and others?
The search appliance is supposed to resolve complexity. Is this a fairy tale, or is search getting easier?
There has always been a push to improve the 'out of the box' experience, and the search appliances are the extreme towards OOB ease-of-use. But that ease comes at a price.
The appliance vendors - notably Google, but the others as well - have made assumptions about your content to make it easy to index and search. If your content and requirements match their generic model, then you've probably got a pretty decent, easy-to-use solution.
But if your content is a little different than the model, you may have just as much work ahead of you to get your search appliance to work on your content. And you actually have to give up some of the capabilities that many companies have come to expect - predictable indexing schedule, query tuning, and even document tuning are just some of these.
What are the most common challenges a company faces when deploying a search system?
I’d say even before deployment begins, getting the right technology for your environment is critical. Understand your content sources and security. What sources will you be allowed to federate, and which will not be part of your index? What do your documents look like? Is there a standard metadata vocabulary?
Make sure you understand who has responsibility for each of the tasks around making search great: interface design; indexing; content promotion; thesaurus terms. How will you measure success? Do you have a taxonomy in place, or do you want to build one based on search activity?
Companies that are successful with search understand it is not a “fire and forget” technology. You need to monitor search periodically – monthly for a new deployment, perhaps quarterly for a mature site. What are your top queries? What vocabulary do your searchers use, and does it match the terms your content owners use? Are there documents that come back in every search? They may be too vague to be useful. Are there documents that never come back in any query? Review them to see if they are still relevant.
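As a rough illustration of the periodic review described above, the following Python sketch tallies top queries and flags documents that come back in every search or in none. The log format (a list of query and result-list pairs), the function name, and the sample document ids are all hypothetical; a real search engine will expose its query logs in its own format.

```python
from collections import Counter


def review_search_log(log, all_doc_ids, top_n=3):
    """Summarize a query log for a periodic search review.

    log: list of (query, [doc_ids returned]) tuples (hypothetical format)
    all_doc_ids: every document id known to be in the index
    """
    # Normalize case and whitespace so "Retirement Plan" and
    # "retirement plan" count as the same query.
    query_counts = Counter(q.strip().lower() for q, _ in log)

    # Count how many searches each document appeared in.
    doc_hits = Counter()
    for _, results in log:
        doc_hits.update(set(results))

    total = len(log)
    always = sorted(d for d, n in doc_hits.items() if total and n == total)
    never = sorted(set(all_doc_ids) - set(doc_hits))

    return {
        "top_queries": query_counts.most_common(top_n),
        "always_returned": always,  # possibly too vague to be useful
        "never_returned": never,    # review for continued relevance
    }


# Made-up sample data for illustration only.
sample_log = [
    ("retirement plan", ["hr-001", "home"]),
    ("Retirement Plan", ["hr-001", "home"]),
    ("expense report", ["fin-007", "home"]),
]
index = {"hr-001", "fin-007", "home", "old-memo"}
report = review_search_log(sample_log, index)
print(report)
```

In this made-up sample, "home" comes back for every query and "old-memo" for none, which are the two kinds of documents the answer above suggests reviewing.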
How do you complement the internal IT staff?
One of the main things we do is teach IT and Business owners how they can best organize for search. They don't recognize the wide range of skills that go into making an enterprise-class search application. It's not just a search button, it's the search interface, the indexing and data preparation, the project management, the content auditing - not just a simple Java-based UI. And you need to understand what users are actually doing with search.
We encourage companies to create a 'Search Center of Excellence' (SCOE), whether actually or virtually staffed - simply having recognized responsibilities can make the whole process work better.
We also bring our experience with the methodology of great search that probably is not anywhere within the corporation. One thing that often surprises our customers is that once we transfer that knowledge to IT, they are better able to implement subsequent search projects at lower cost and with less outside assistance.
Finally, helping companies map their requirements to the vendors’ features is important. As Mark says, if you’re looking for magic search beans, there are many companies out there that will sell them to you – but chances are the magic beans really won’t work.
When you need to integrate search with other systems, what's the New Idea method?
First, of course, we need to identify those systems and meet with the folks who own them. And we often need to educate IT staff and the site owners about issues around search: bandwidth, security, and content quality.
We want to understand the content that these other systems will provide, and understand how users will view that content. For example, indexing database content is relatively straightforward; but when a user asks to view the data, how will we display it? After all, a row of raw data is not very useful. Some vendors provide hooks for controlling how the document looks, while others leave the exercise to the customer. And if the database record is from a call tracking system, for example, you may want to view the document in that application – how do we launch it, and respect its security?
We’ve also seen situations where some data from the external content management source was missing. In this case it wasn’t entire documents – it was just some of the metadata: some documents had no abstract; some, no title. The search indexer failed to bring the content over – but without giving any hint that content was missing. This is one reason we strongly encourage companies to perform a “data audit” every now and then to make sure nothing critical is missing. If you ever need to produce content for a subpoena, explaining that your search engine missed it is not a great defense.
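A "data audit" of the kind mentioned above can start very simply. The Python sketch below assumes, purely for illustration, that indexed documents can be exported as dictionaries with "id", "title", and "abstract" keys; those field names and the function name are assumptions, not part of any vendor's API.

```python
def audit_metadata(records, required_fields=("title", "abstract")):
    """Return {doc_id: [missing fields]} for records lacking metadata.

    records: iterable of dicts exported from the index (hypothetical format)
    """
    problems = {}
    for rec in records:
        # A field counts as missing if it is absent, None, or blank.
        missing = [
            field for field in required_fields
            if not str(rec.get(field) or "").strip()
        ]
        if missing:
            problems[rec.get("id", "<no id>")] = missing
    return problems


# Made-up sample export for illustration only.
indexed = [
    {"id": "doc-1", "title": "Install guide", "abstract": "How to install."},
    {"id": "doc-2", "title": "", "abstract": "Release notes."},
    {"id": "doc-3", "title": "FAQ"},
]
missing_report = audit_metadata(indexed)
print(missing_report)
```

Running the same check against both the source system and the search index, and comparing the two reports, would show whether metadata was lost during indexing or was never present in the source.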
I find it difficult to locate open source, commercial, and freeware tools to help me manipulate text. Is this a common problem?
There are very few places to find good tools for search. Part of the problem is that when you go looking for these kinds of tools on Google, you get millions and millions of hits because the terms you use are so common. It’s not always easy to find what you are looking for.
We've actually just launched a new site, www.searchcomponentsonoline.com. It’s a community-driven directory of the most useful open source, low cost, and commercial tools on the web. We encourage users to provide feedback on the tools we point to, as well as on tools we should be including. Hopefully, the site will make it a bit easier to find those handy utilities, widgets and web parts to solve problems people have with search.
What programming languages and tools do you use?
We use a variety of languages and tools depending on the task, the search technology, and on the customer’s standard platform.
Java is our primary development language, although we use Perl for scripting. Python is often part of what we do as well. Our guys started using it at SearchButton to make sure I didn’t do any coding, but it’s interesting how many search technology companies rely on Python – Ultraseek, FAST, and Google come to mind.
As far as tools, we have a suite of libraries that we bring to engagements that lets us accomplish common tasks quickly without needing to reinvent the wheel.
Our developers use Eclipse as their development environment; I use vi, the editor of the future.
Can you describe a typical engagement in which you fix a troubled system and add a custom function?
Sure. We have a customer who had just acquired new search technology and wanted to bring it up in their customer support environment.
Their detailed specification described five different sets of content, two of which contained secure data behind SiteMinder. Once we began work, we discovered one of their ‘unsecured’ data sources showed different results based on whether you were authenticated or not: their web server would redact parts of each document that should not be seen by their clients.
Of course, this didn’t seem a problem for them, but as you know the search engine is normally going to index everything it finds – a spider doesn’t know about redacted content.
We explained the issue, and developed a workaround that involved scripting content for two unique collections on-the-fly during indexing, along with some heavy duty parsing scripts. Along the way, we discovered their HTML was not as pure as they thought, which led to even more scripting.
We got the solution working, but it certainly made the engagement much more interesting and intense than it might have been.
Do you provide price estimates?
Yes. Our price varies based on the mixture of skills, the duration of the project, and of course the task to be done.
Speaking of price, let me add that I always worry when a prospect is more interested in price than the quality of the implementation or the speed with which the task can be accomplished. For well understood tasks with commodity tools – Java application development for example – I understand that companies want to find the best price that will meet their requirements. And the project has to be incredibly well defined if you are going to outsource the work.
Search is not a commodity project, even when the vendor provides a search appliance. And the issues and architectures that can impact search are not generally well understood by corporate IT departments – they just don’t know what landmines are waiting.
We encourage customers to bring us in for a short planning phase – 3 to 5 days – to create a detailed statement of work. We view their environment with our understanding of search; we often can suggest small changes that can cut development time or significantly improve the quality of the search results; we avoid ‘fire drills’ during the engagement; and our customers are generally more satisfied with the outcome. In reality, most customers want a proposal up front, before we can really analyze their environment; so we build a brief “Planning Phase” into projects, so we can at least identify any potential problems before we are deep in a project.
Most search vendors are unable to support and maintain their products. Why are these firms finding it difficult to have sufficient engineering resources on their full time staff?
It seems that every major search vendor has this kind of problem, but I think the causes vary. Some companies are growing so fast they cannot keep up with the need. Others design products that need expert help to install - and then cannot grow their internal consulting team fast enough.
Another issue is the skill set. I always looked for systems engineers at Verity who had skills such that they could work as a developer, but who liked to get out in front of customers. And let me tell you, it's tough. One of the reasons that our team is so good is that the folks we have working for us worked for us in customer-facing roles at Verity, SearchButton.com, or started out here at New Idea Engineering.
And, of course, I'd have to go back to the methodology versus technology issue I mentioned earlier. Colleges and universities don't teach 'Methodology of Search', so you either have to hire a good person from a competitor, or you need to spend time bringing your own folk up to speed. I’ve seen vendors send a new consultant on-site – at a premium rate – only to have the consultant spend the entire engagement on the phone to someone who knows how the product works.
Finally, founders of many of today's largest search vendor companies will tell you their product is completely unique in the universe - that you don't need to do anything to get just the right answer automagically. If you believe it's that easy, you don't staff up your field technical organization.
When a search system causes problems, what are the lessons you have learned about common causes of search system failure?
I'd have to say that the biggest problem we see in failed implementations is that the technology the customer picked is just not the right one for their environment.
Corporate IT managers have to remember that a great demo is indistinguishable from product, but sometimes they seem willing to accept the vendor's demo as a suitable substitute for their environment.
There is also a mindset in many IT departments that search is either not critical - it's still often a "check-box item" - or that it must be terribly easy, because Google delivers such great user experience on the Internet. They don't see how much more complex their corporate intranet is than the Internet: security, one 'best' document, metadata, taxonomies, and so on.
There are more than 300 enterprise search vendors. Most large companies have five or more so-called enterprise systems. What problems do these multi-system environments pose?
This multi-vendor environment causes a huge headache in companies, and there doesn't seem to be much hope for a peaceful solution.
First, vendors still subscribe to what we call the "monolithic search" concept.
Every vendor tells you that they can do everything and that their technology can index all of your content - and as you know, Steve, that just isn't so. A pharma researcher doesn't search research content in the same way that an executive searches financials. And if you want to federate results from multiple sources - and the vendor has not been able to convince you to go with the monolithic approach - they assure you that *their* federation product is the only one you want to consider.
Yet companies go on purchasing search technology for specific verticals across the corporation. Sometimes it's just a vertical search – perhaps eDiscovery. Other times it's just easier for the Illinois division to buy from a local vendor and buy it "under the limit" for corporate oversight.
Once all of these technologies are in and running, the problem starts.
Which, if any, should be the monolithic search? Who should federate other sources? What kind of search activity is happening on each of these sites?
On the mega-search that spans many countries, which "retirement plan" page gets promoted to the top position?
New Idea has from Day One tried to make our products and tools cross-vendor, but none of the major vendors has any incentive to do so until customers start objecting.
ArnoldIT Comment
New Idea is one of a small number of firms able to provide management and technical guidance plus hands-on coding. The firm's experience complements its range of software tools. An organization mired in an "enterprise search swamp" may want to seek the advice of New Idea Engineering. The company's unique blend of business and technical savvy makes it a resource to consider involving in a project--whether that project has gone right and requires additional functionality or gone wrong and needs a house call from the "search doctors" at New Idea.
Stephen E. Arnold, July 21, 2008