Lucid Imagination: Open Source Search Reaches for Big Data

September 30, 2011

We are wrapping up a report about the challenges “big data” pose to organizations. Perhaps the most interesting outcome of our research is that there are very few search and content processing systems which can cope with the digital information required by some organizations. Three examples merit listing before I comment on open source search and “big data”.

The first example is the challenge of filtering the information an organization requires that is produced within the organization and by the organization’s staff, contractors, and advisors. We learned in the course of our investigation that the promises of processing updates to Web pages, price lists, contracts, sales and marketing collateral, and other routine information are largely unmet. One of the problems is that the disparate content types have different update and change cycles. The most widely used content management system, based on our research results, is SharePoint, and SharePoint is not able to deliver a comprehensive listing of content without significant latency. Fixes are available, but these are engineering tasks which consume resources. Cloud solutions do not fare much better, once again due to latency. The bottom line is that for information produced within an organization, employees are mostly unable to locate information without a manual double check. Latency is the problem. We did identify one system which delivered documented latency across disparate content types of 10 to 15 minutes. The solution is available from Exalead, but the other vendors’ systems were not able to match this performance in putting fresh, timely information produced within an organization in front of system users. Shocked? We were.

Reducing latency in search and content processing systems is a major challenge. Vendors often lack the resources required to solve a “hard problem” so “easy problems” are positioned as the key to improving information access. Is latency a popular topic? A few vendors do address the issue; for example, Digital Reasoning and Exalead.
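To make the latency point concrete, here is a minimal sketch, assuming one can log when a document changed and when the index picked up the change. The record layout, the content type names, the sample timestamps, and the `latency_by_type` function are illustrative assumptions, not part of any vendor's product.

```python
from collections import defaultdict
from datetime import datetime


def latency_by_type(records):
    """Return the worst observed change-to-index delay, in minutes, per content type.

    Each record is a hypothetical (content_type, modified_at, indexed_at) tuple.
    """
    worst = defaultdict(float)
    for content_type, modified_at, indexed_at in records:
        delay_minutes = (indexed_at - modified_at).total_seconds() / 60.0
        worst[content_type] = max(worst[content_type], delay_minutes)
    return dict(worst)


# Hypothetical sample data: two price list updates and one contract update.
records = [
    ("price_list", datetime(2011, 9, 30, 9, 0), datetime(2011, 9, 30, 9, 12)),
    ("contract", datetime(2011, 9, 30, 9, 0), datetime(2011, 9, 30, 13, 30)),
    ("price_list", datetime(2011, 9, 30, 10, 0), datetime(2011, 9, 30, 10, 15)),
]

print(latency_by_type(records))
```

With numbers like these in hand, it becomes obvious which content types meet a 10 to 15 minute target and which leave users working from stale information.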

Second, when organizations tap into content produced by third parties, the latency problem becomes more severe. There is the issue of the inefficiency and scaling of frequent index updates. But the larger problem is that once an organization “goes outside” for information, additional variables are introduced. In order to process the broad range of content available from publicly accessible Web sites or the specialized file types used by certain third party content producers, connectors become a factor. Most search vendors obtain connectors from third parties. These work pretty much as advertised for common file types such as Lotus Notes. However, when one of the targeted Web sites such as a commercial news service or a third-party research firm makes a change, the content acquisition system cannot acquire content until the connectors are “fixed”. No problem as long as the company needing the information is prepared to wait. In my experience, broken connectors mean another variable. Again, no problem unless critical information needed to close a deal is overlooked.

The Governance Air Craft Carrier: Too Big to Sail?

August 31, 2011

In a few days, I disappear into the wilds of a far off land. In theory, a government will pay me, but I am increasingly doubtful of promises made 3,000 miles from Harrod’s Creek. As part of the run up to my departure, we held a mini webinar/consultation on Tuesday, August 30, 2011, with a particularly energetic company engaged in “governance.” (SharePoint Semantics has dozens of articles about governance. One example is “A Useful Guide to SharePoint Success from Symon Garfield”.) The format of the call was basic. The people on the call asked me questions, and I provided the perspective that only three score years and as many online failures can provide. (I will mention SharePoint but my observations apply to other systems as well; for instance, Documentum, Interwoven, FileNet, etc.)

What I want to do in this short write up is identify a subject that we did not tackle directly in that call, which concerned a government project. However, after the call, I realized that what I call an “air craft carrier” problem was germane to the discussion of automated indexing and entity extraction. An air craft carrier today is a modular construction. The idea is that the flight deck is made by one or more vendors, moved to the assembly point, and bolted down. The same approach is taken with cabins, electronics, and weapon systems.

The basic naval engineering best practice is to figure out how to get the design nailed down. Who wants to have propeller assemblies arrive that do not match the hull clearance specification?

What’s an air craft carrier problem? An air craft carrier is a big ship. It is, according to my colleague Rick Fiust, a former naval officer, a “really big ship.” Unlike a rich person’s yacht or a cruise ship, an air craft carrier does more than surprise with its size. Air craft carriers pack a wallop. In grade school I remember learning the phrase “gun boat diplomacy.” The idea was that a couple of gun boats send a powerful message.

image

What every content centric system aspires to be. Some information technology professionals will tell their bosses or clients, “You have a state of the art search and content processing system. Everything works.” Unlikely in my experience.

Governance or what I like to think of as “editorial policy” is an air craft carrier. The connotation of governance is broad, involves many different functions, and sends a powerful message. The problem is that when content in an organization becomes unmanageable, the air craft carrier runs aground and the crew is not exactly sure what to do about the problem.

Consider this real life example. A well meaning information technology manager installs SharePoint to allow the professionals in marketing to share their documents, price lists, and snippets from a Web site. Then the company acquires another firm, which runs SharePoint as well as a handful of enterprise applications. On the surface, the situation looks straightforward. However, the task of getting the two organizations’ systems to work smoothly is a bit tricky. There are the standard challenges of permissions and access as well as somewhat more exotic ones of coping with intra-unit indexing and index refreshes. Then a third company is acquired, and it runs SharePoint. Unlike the first two installations which were “by the book”, the third company’s information technology unit used SharePoint as a blank canvas, created specialized features and services, plugged in third party components, and added some home grown code.

Now the content issue arises. What content is available, when, to whom, and under what circumstances? Because the SharePoint installations were built as separate modules over time, will these fit together? Nope. There was no equivalent of the naval engineering best practice.

Governance, in my opinion, is the buzz word slapped on content centric systems of which SharePoint is but one example. The same governance problem surfaces when multiple content centric systems are joined.

Will after the fact governance solve the content problems in a SharePoint or other content centric environment? In my experience, the answer is, “Unlikely.” There are four reasons:

Cost. Reworking three systems built on the same platform should be trivial. It is not. The work is difficult, and in some situations, scrapping the original three systems and starting over may be a more cost effective solution. Who knows what interdependencies lurk within the three systems which are supposed to work as one? Open ended engineering projects are likely to encounter funding problems, and the systems must be used “as is” or fixed one problem at a time.

Interview with John Steinhauer, Search Technologies

August 29, 2011

Search Technologies Corp., a privately-held firm, continues to widen its lead as the premier enterprise search consulting and engineering services firm. Founded six years ago, the company has grown rapidly. The firm’s dozens of engineers offer clients deep experience in Microsoft (SharePoint and Fast), Lucene/Solr, Google Search Appliances, and Autonomy systems, among others. Another factor that sets Search Technologies apart is that the company is profitable and debt-free, and its business continues to grow at 20 percent or more each year. It is headquartered in Herndon, VA.

John Steinhauer, vice president of technology, Search Technologies

On August 8, I spoke with John Steinhauer, vice president of technology of Search Technologies. Before joining Search Technologies, Mr. Steinhauer was the director of product management at Convera. He attended Boston University and the University of Chicago. At Search Technologies, Mr. Steinhauer is responsible for the day-to-day direction of all technical and customer delivery operations. He manages a growing team of more than 75 engineers and project managers. Mr. Steinhauer is one of the most experienced project directors in the enterprise search space, having been involved with hundreds of sophisticated search implementations for commercial and government clients. The full text of the interview appears below.

What’s your role at Search Technologies?

Search Technologies is an IT services provider focused on search engines. Working with search engines is essentially all we do. We’re technology independent and work with most of the leading vendors, and with open source. The things we do with search engines cover a broad spectrum – from helping companies in need of some expert resources to deliver a project on time, to fully inclusive development projects where we analyze, architect, develop and implement a new search-based solution for a customer, and then provide a fully managed service to administer and maintain the application. If required, we can also host it for the customer, at one of our hosting facilities or in the cloud.

My title is VP, Technology and I am one of the three original founders of the company and have been in the search engine business full-time since 1997. I am responsible for the technical organization, comprised of 70+ people, including Professional Services, Engineering, and Technical Support.

From your point of view, what do customers value most about your services?

We bring hard-won experience to customer projects and a deep knowledge of what works and where the difficult issues lie. Our partners, the major search vendors, sometimes find it difficult to be pragmatic, even where they have their own implementation departments, because their primary focus is their software licensing business. That’s not a criticism. As with most enterprise software sectors, license fees pay for all of the valuable research & development that the vendors put in to keep the industry moving forward. But it does mean that in a typical services engagement, less emphasis is put on the need for implementation planning, and ongoing processes to maintain and fine-tune the search application. We focus only on those elements, and this benefits both customers, who get more from their investment, and search engine partners who end up with happier customers.

In your role as VP of Technology, what achievements are you most proud of?

I’m proud that we have built a company with happy customers, happy employees, and good profits. I’m also proud that we’ve delivered some massively complex projects on time and on budget, even after others have tried and failed. It is gratifying that we have ongoing, multi-year relationships with household names such as the US Government Printing Office, Library of Congress, Comcast, the BBC, and Yellowpages.com.

But our primary achievement is probably the level of expertise of our personnel, along with the methodologies and best practices they use that are now embedded into our company culture. When we engage with customers, we bring experience and proven methodologies with us. That mitigates risks and saves money for customers.

Do you recommend search engines to customers?

Occasionally, but only after conducting what we call an “Assessment.” We start from first principles and understand the customer’s circumstances: business needs, data sets, user requirements, infrastructure, existing licensing arrangements, etc. Based on a full knowledge of those issues, we offer independent advice and product recommendations including, where appropriate, open source alternatives.

So you also work with customers who have already chosen a search engine?

This is our primary business. Often, our initial engagement with a customer is to solve a problem; they’ve acquired a software license, spent significant time and money on implementation and are having technical problems and/or trouble meeting their deadlines and budgets. Problems include poor relevancy, performance and scaling issues, security issues, data complexity issues, etc. Probably 70% of our customers first engaged with us by asking us to look at a narrow problem and solve it. Once they discover what we can do and how cost effective we are, they typically expand the scope into implementation of the full solution. We help people to implement best practices to reduce complexity and ownership cost, while dramatically improving the quality of the search service.

So, what’s your secret sauce?

With search projects, usually the secret sauce is that there is no secret sauce. Success is down to hard work and execution at the detail level.

What makes Search Technologies unique?

Sure. If there is any secret to building great search applications, it is usually in showing greater respect for the data and how best to process and enhance it to enable sophisticated search features to work effectively through the front end. That and just experience from hundreds of search application development projects. When a customer hires a Search Technologies Engineer to participate in their project, they are not just getting a well-trained, hard working and hugely experienced individual who writes good code, they are getting access to 80+ technical colleagues in the background with more than 40,000 person-days experience on search projects. We’re great at sharing experiences and best practices – we’ve worked hard at that since the beginning. Also, our staff turnover is really low. People who like working with search engines like it here, and they tend to stick around. That huge body of experience is our differentiation.

So you’re pure services, no software of your own?

In customer engagements we’re pure services. That’s our business. But as a company of largely technical people, of course we’ve developed software along the way. But we do so for the purposes of making our implementation services more efficient, and our support and maintenance services more reliable and sustainable.

Where is the search engine industry heading?

There are now two 800 pound gorillas in the market, called Microsoft and Google. That’s a big difference from the somewhat fractious market that existed 10 years ago. That will certainly make it harder for smaller vendors to find oxygen. But at the same time, these very large companies have their own agendas for what features and platforms matter for them and their customers. They will not attempt to be all things to all prospective customers in the same way that smaller hungrier vendors have. In theory this should leave gaps for either products or services companies to fill where specific and relatively sophisticated capabilities are required. We see those requirements all over the place.

Open source (primarily SOLR/Lucene) is making major inroads too. We are seeing a lot of large companies move in this direction.

So is innovation dead?

Not at all. Actually we see lots of companies doing really cool and innovative things with search. Many people have been operating on the assumption that search software would reach a sort of commodity state. Analysts have predicted this for years, that once all the hard problems had been solved, then all search engines would have equivalent capabilities and compete on price. What we’re seeing is very different from that. People are realizing that these problems can’t just be solved and then packaged into an off the shelf solution.

Instead the software vendors are putting a ring fence around the core search functionality and then letting integrators and smart customers go from there. With search, there are now some firmly established basics: Platforms need good indexing pipelines, relevancy algorithms that can be tweaked to suit the audience, navigation options based on metadata, readable, insightful results summaries. But that’s just the starting point for great search.

Here’s an example we’ve been involved with recently. Auto-completion functions have been around for years. You start the search clue, the system suggests what you’re looking for, to help you complete it more quickly. We’ve recently implemented some innovative new ways of doing this, working with a customer who has a specific business need. This includes relevancy ranking and tweaking of auto-completion suggestions, and the inclusion of industry jargon. Influencing search behavior in this way not only helps the customer to provide a very efficient search service, it also supports business goals by promoting particular products and services in context. Think of it as a form of relevancy tuning, but applicable to the search clue and not just the results. These are small tweaks that can have a big impact on the customer’s bottom line.
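As an illustration only (this is not Search Technologies’ code, and it does not appear in the interview), the idea of ranking and tweaking auto-completion suggestions might be sketched as follows. The function name `suggest`, the query counts, and the boost values are all hypothetical; a production system would more likely rely on the suggester built into the search platform.

```python
def suggest(prefix, candidates, boosts=None, limit=5):
    """Return up to `limit` completions for `prefix`, best first.

    candidates: dict mapping a phrase to how often users searched for it.
    boosts: optional dict mapping a phrase to a business-defined multiplier.
    """
    boosts = boosts or {}
    prefix = prefix.lower()
    scored = [
        (phrase, count * boosts.get(phrase, 1.0))
        for phrase, count in candidates.items()
        if phrase.lower().startswith(prefix)
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [phrase for phrase, _ in scored[:limit]]


# Hypothetical query log counts, including a piece of industry jargon.
candidates = {
    "brake pads": 120,
    "brake fluid": 300,
    "brake caliper kit": 40,
    "battery": 500,
}
# Hypothetical business rule: promote the caliper kit in context.
boosts = {"brake caliper kit": 10.0}

print(suggest("bra", candidates))
print(suggest("bra", candidates, boosts))
```

Without boosts the suggestions follow raw popularity; with the boost, the promoted phrase moves to the top, which is the “relevancy tuning applied to the search clue” idea in miniature.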

Another big innovation is SaaS models for search applications. This has also been talked about for years, but is really just now coming into focus in practical ways that customers can leverage.

I understand that your business is growing. Where are you heading and what might Search Technologies look like in a couple of years?

Perhaps the most pleasing thing of all for me personally, is that a lot of our growth, which is averaging 20%+ year on year, comes from perpetuating existing relationships with customers. This speaks well for customer satisfaction levels. We’ve just renewed our Microsoft GOLD partner status, and as a part of that, we conduct a customer satisfaction survey and share the results with Microsoft. The returns this year have been really great. So one of the places we are heading is to build ever longer, deeper relationships with companies for whom search is a critical application. We initially engaged with all of our largest customers by providing a few consultant-days of search expertise and implementation services. Today, we provide these same customers with turnkey design and implementation, hosting services, and “hands-off” managed services where all the customer does is use the search application and focus on their core business. This model works really well. Through our experience and focus on search we can run search systems very efficiently and provide a consistently excellent search experience to the customer’s user community. In the future we’ll do a lot more of this.

Finally, tell me something about yourself

I grew up in Michigan, have lived in Chicago, Boston, DC, London and now in San Diego. The best thing about that is I can ride my bike to work most mornings year round. I have two boys (4 years old and 6 months old), neither of whom have the slightest clue what a Michigan winter entails. I expect that will continue for the foreseeable future.

Don C Anderson, August 29, 2011

Sponsored by Search Technologies

Going Fast and Missing a Curve: Collision or Near Miss?

July 23, 2011

Last week we heard a number of rumors about layoffs and other organizational shifts at the Microsoft Fast Search units. We are not sure whether the news reported at Enterprise Search: The Business and Technology of Corporate Search was accurate. We don’t want to speculate.

We, like you, read:

[We] just learned that most of the FAST people we work with here in California and across the country have been laid off by Microsoft, apparently effective immediately. This is the team that was responsible for selling the FAST ESP products – FSIS and FSIA – as well as working with the Microsoft sales teams on Fast Search for SharePoint (FS4SP). Funny, I was just drafting a blog post today on ‘the future of FAST’ and I’m glad I hadn’t finished; I never would have guessed this at all.

Let’s assume that the rumor is false. The Microsoft consultants don’t make any changes. SharePoint generates significant consulting opportunity just the way it is.

Let’s assume the rumor is true. There are many firms ready, willing, and able to provide the technical support you need for your current SharePoint and Fast search installation. For most licensees, Microsoft’s shifting staff or reorganizing is almost a business-as-usual management method in Redmond.

Let’s assume there is just more uncertainty about the Fast search technology. My view is that deep experience in search is more important than speculating about what a very large company is doing to manage its products and services for its clients. I explain some of the issues associated with Microsoft’s approach to search in my new monograph The New Landscape of Enterprise Search. Check it out. (Sorry. I don’t provide the juicy details in this free blog.)

So, let’s put aside the issue of a single shift in a product. The focus at most SharePoint focused service firms will be on helping clients solve their technical problems. What is likely to happen is that some SharePoint licensees will look for search solutions which have traction in the marketplace and proven staying power. For that reason, you may want to check out the Exalead approach.

Stephen E Arnold, July 23, 2011

Sponsored by Article One Partners, your source for patent research.

Search: An Information Retrieval Fukushima?

May 18, 2011

Information about the scale of the horrific nuclear disaster in Japan at the Fukushima Daiichi nuclear complex is now becoming more widely known.

Expertise and Smoothing

My interest in the event is the engineering of a necklace of old-style reactors and the problems the LOCA (loss of coolant accident) triggered. The nagging thought I had was that today’s nuclear engineers understood the issues with the reactor design, the placement of the spent fuel pool, and the risks posed by an earthquake. After my years in the nuclear industry, I am quite confident that engineers articulated these issues. However, the technical information gets “smoothed” and simplified. The complexities of nuclear power generation are well known at least in engineering schools. The nuclear engineers are often viewed as odd ducks by the civil engineers and mechanical engineers. A nuclear engineer has to do the regular engineering stuff of calculating loads and looking up data in hefty tomes. But the nukes need grounding in chemistry, physics, and math, lots of math. Then the engineer who wants to become a certified, professional nuclear engineer has some other hoops to jump through. I won’t bore you with the details, but the end result of the process produces people who can explain clearly a particular process and its impacts.

image

Does your search experience emit signs of troubles within?

The problem is that art history majors, journalists, failed Web masters, and even Harvard and Wharton MBAs get bored quickly. The details of a particular nuclear process make zero sense to someone more comfortable commenting about the color of Mona Lisa’s gown. So “smoothing” takes place. The ridges and outcrops of scientific and statistical knowledge get simplified. Once a complex situation has been smoothed, the need for hard expertise is diminished. With these simplifications, the liberal arts crowd can “reason” about risks, costs, upsides, and downsides.

image

A nuclear fall out map. The effect of a search meltdown extends far beyond the boundaries of a single user’s actions. Flawed search and retrieval has major consequences, many of which cannot be predicted with high confidence.

Everything works in an acceptable or okay manner until there is a LOCA or some other problem like a stuck valve or a crack in a pipe in a radioactive area of the reactor. Quickly the complexities, risks, and costs of the “smoothed problem” reveal the fissures and crags of reality.

Web search and enterprise search are now experiencing what I call a Fukushima event. After years of contentment with finding information, suddenly the dashboards are blinking yellow and red. Users are unable to find the information needed to do their job or something as basic as locate a colleague’s telephone number or office location. I have separated Web search and enterprise search in my professional work.

I want to depart for a moment and consider the two “species” of search as a single process before the ideas slip away from me. I know that Web search processes publicly accessible content, has the luxury of ignoring servers with high latency, and filters content to create an index that meets the vendors’ needs, not the users’ needs. I know that enterprise search must handle diverse content types, must cope with security and access controls, and perform more functions than one of those two inch wide Swiss Army knives on sale at the airport in Geneva. I understand. My concern is broader in this write up. Please, bear with me.

BA-Insight Sees Opportunity through Azure Colored Glasses

May 9, 2011

It seems that BA Insight is embracing the media marketing trend as they showcase their new technology on Microsoft Channel 9. The interview and article “Building On Azure: BA Insight” which are located on the Microsoft Channel 9 Web Site provide some interesting details about the new search technology. BA Insight integrated its new search technology into FAST and SharePoint 2010. A passage that caught my attention was:

BA Insight’s advanced user interface, which, among other things, removes the burden of having to download content to assess relevance. Using this technology, individual pages, slides, or worksheets can be previewed without downloading the entirety of any one file.

Cloud computing through Microsoft Office 365 and the Windows Azure Platform allows BA Insight to handle heavy workloads efficiently. The cloud is still a relatively new technology, but it could provide Microsoft customers with notable options. The cloud computing problems that have struck the very popular Amazon do raise doubts, but maybe Azure can prove that there is light at the end of the tunnel.

Is the cloud the future of computing? It seems to make sense for organizations struggling to contain computing costs and cope with staffing challenges. However, the assumption is that organizations can afford the bandwidth and the risk of losing a connection when a big deal is in the balance. Google is cheerleading for cloud computing as well.

What happens when a cloud based search system is unavailable? Employees will have to scramble. The big deal may be saved but at what cost? Will senior managers and CFOs listen and act? Sure, until there is an Amazon event. Everything works on paper and in PowerPoint presentations. The real world often behaves in unexpected ways.

Alice Holmes, May 9, 2011

Freebie

Study of Enterprise Search

March 12, 2011

Research vendors, magazines owned by consulting firms, and dozens of “experts” just keep explaining why search is an issue. I find these reports fascinating because each purports to explain what enterprise search is and to provide profiles of six, 12, or in this case more than 30 vendors’ products. The information involves opinion, surveys, and rehashes of previous reports. I am old enough (66) and jaded from more than three decades of laboring in the online vineyards to view these reports with a curious frame of mind and amusement.

You can get a synopsis of a longer report in the Information Week story “Go Rogue with Enterprise Search.” What? “Go Rogue?” Before I read the four part article I wondered how a key function like finding an electronic document or other information object is “rogue.” My understanding of rogue is “a deceitful or unreliable scoundrel” or the Australian horror film about tourists who are pursued by a giant crocodile.

image

Source: Graph Jam, where consultants often get their graphs. http://graphjam.memebase.com/upcoming/page/2531/

Search or finding needed information is too important to be slapped with the “rogue” moniker. But that is my opinion, and you may well find that “rogue” is the perfect description for what enterprise search has become in today’s marketing-centric world. Like other enterprise applications, the software system may be difficult to put under a simple, clear explanation of what happens upon installation.

Please, read the Information Week story and sign up for the full report.

Here’s my view of three key points in the write up.

First, here’s a factoid that I don’t understand.

Despite more than a decade of product development aimed at helping companies find information across their networks, a paltry 22% of the 433 business technology professionals polled in InformationWeek Analytics’ Search 2011 survey have purchased the technology. That’s down from 24% in our 2008 survey.

Concept Searching Finds Map Gold

February 18, 2011

The Microsoft Partner Program consists of three levels, the highest represented by the Gold label.  Given the diverse array of companies that partner with Microsoft, it has added Competencies to its plan.  These must be earned and showcase a level of ability and specialization within the structure of the course.  Attaining this grade of certification has long been an arduous process, and it appears the requirements have been elevated. These revisions include employing Microsoft certified individuals, exams for the staff, and customer references.

Of course, the benefits have been sweetened too, one of which is the integrity and distinction the logo seems to carry, something numerous businesses find attractive.  Additional perks: software packages, bundled technical support and training resources.  The list continues.

Concept Searching, a U.K. based metadata generation and classification software company founded in 2002, recently announced its attainment of the gold level.  The firm’s freshly gilded conceptClassifier for SharePoint is “the only statistical based classification and taxonomy solution to use concept extraction and conceptual metadata generation to achieve the optimal approach to manage unstructured content”.  The company was asked to join the Microsoft Managed ISV Program in 2009, an offer reportedly reserved for 1% of the partners worldwide.  The press release goes on to state that the company’s technology is being implemented in the search and management fields to solve a broad set of problems.

Sarah Rogers, February 18, 2011

Freebie

Repositioning 2011: The Mad Scramble

December 15, 2010

Yep, the new year fast approaches. Time to turn one’s thoughts to vendors of search, content processing, data fusion, text mining, and—who could forget?—knowledge management. In the last two weeks, I have done several live-and-in-person briefings about ArnoldIT.com’s views on enterprise search and related disciplines.

Today enterprise search has become what I call an elastic concept. It is stretched over a baker’s dozen of quite divergent information retrieval concepts. Examples range from the old bugaboo of many companies, customer support, to the effervescence of knowledge management. The rest fall in between the hard realities of the costs of supporting actual customers and the frothy topping of “knowledge”.

Several trends are pushing through the fractured landscape of information retrieval. Like earthquakes, the effects can vary significantly depending on one’s position at the time of the event.

Image source: http://www.sportsnet.ca/gallery/2009/12/30/scramble_gal_640.jpg

Search can be looked at in different ways. One can focus on a particular problem; for example, content management system repositories. The challenge is to find information in these systems. One would think that after years of making Web pages, the problem would be solved. Apparently not. CMS with embedded search stubs trigger some grousing in most of the organizations with which I am familiar. Search works, just not exactly as the users expect. A vendor of search technology can position the search solution as one that makes it easy for users to locate information in a CMS. This is, of course, the pitch of numerous Microsoft Certified Gold resellers of various types of search solutions, utilities, and workarounds. This is an example of a search market defined by the type of enterprise system that creates a retrieval problem.

Other problems for search crop up when specific rules and regulations mandate a particular type of information processing. One example is the eDiscovery market. Anyone can be sued, and eDiscovery systems have to make content findable, but the users of an eDiscovery system have quite particular needs. One example is bookkeeping so that the time and search process can be documented and provided upon request under certain conditions.

Social media has created a new type of problem. One can take a specific industry sector such as the Madison Avenue crowd and apply information technology to the social media problem. The idea is for a search system to “harvest” data from social content sources like Facebook or Twitter, process the text which can be ambiguous, and generate information about how the people creating Facebook messages or tweets perceive a product, person, ad, or some other activity for the advertising team. The idea is that search unlocks hidden information. The Mad Ave crowd thinks in terms of nuggets of information that will allow the ad team to upsell the advertiser. Search is doing search work but the object of the exercise is to make sense out of content streams that are too voluminous for a single person to read. This type of search market—which may not be classic search and retrieval at all—is closer to what various intelligence agencies want software to do to transcribed phone calls, email, and general information from a range of sources.

Let’s stop with the examples of information access problems already. There are more information access problems than at any other time, and I want to move on to the impact of these quite diverse problems upon vendors in 2011.

Now let’s take a vendor that has a search system that can index Word documents, email, and content found in most office environments. Nothing tricky like product specifications, chemical structures, or the data in the R&D department’s lab notebooks. For mainstream search, here is the problem:

Commoditization

Right now (no pun on the vendor of customer support solutions, by the way) anyone can download an open source search solution. It helps if the person downloading Lucene, Solr, or one of the other open source solutions has a technical bent. If not, a local university’s computer science department can provide a student to do the installation and get the system up and running. If the part time contracting approach won’t work, you can hire a company specializing in open source to do the work. There are dozens of these outfits bouncing around.

Exclusive Interview: Brian Pinkerton

December 15, 2010

Introduction

At a recent conference, there was much buzz about consulting firms’ opinions about enterprise search. I spoke with several people who expressed surprise at the “rankings”. For example, one high-profile firm pronounced Vivisimo as the top vendor in enterprise search. Vivisimo positions itself as an “information optimization” company. I am not sure what that means, but it is clear that “enterprise search” is not the company’s main focus. Nevertheless, Vivisimo is number one.

Okay, but Vivisimo started life as a company with on-the-fly clustering. Then Vivisimo morphed into a vendor of federated search. Next Vivisimo dabbled in government contracts. After an executive shake up and an infusion of venture capital, Vivisimo emerged as an “information optimization” company. The phrase is as confusing as Google’s “contextual discovery.”

What are these marketers talking about? The answer is making sales and no-calorie marketing jargon. The consulting firms know a sales opportunity exists when user dissatisfaction with enterprise search is chugging along in the 50 to 70 percent range. Yes, most users of an enterprise “findability” system are unhappy. Procurement teams are, therefore, busy because most companies are looking for a search silver bullet.

To cater to those looking for a quick, simple way to solve an enterprise information access problem, consultants and advisors offer impressionistic write ups. Madison Avenue works fine when selling toothpaste. Apply that method to the very tough problem of information retrieval, and you end up with confusion, rising costs, and unhappy users.

Let me give you another example that surfaced in my conversations with vendors in London at the December International Online Conference. I learned that one consulting firm named Endeca as the top dog in enterprise search. I am okay with that assertion as long as there are some data to back up the claim. When I hear the name “Endeca”, I think of eCommerce as the core strength. The system can be applied to other information problems, but when I recall Endeca’s patent applications, I think about eCommerce, not discovery and data fusion.

Perhaps some search firms are more adept at social engineering than software engineering? Are some search advisors doing Madison Avenue-type thinking, not engineering analyses?

I don’t have any quibble with consulting firms who peg Autonomy as Number One. The revenue alone makes the difference between Autonomy and other information access vendors evident. Last time I saw Andrew Kanter, the chief operating officer for the vendor of meaning-based computing solutions, I asked him, “When will Autonomy break the $1.0 billion revenue barrier?” He told me and an audience of about 175 people that Autonomy “was only $900 million.” Yep, $900 million, which is orders of magnitude greater than the revenue of most of the 300 vendors whose information retrieval technology I track. IBM, Google, Microsoft, and Oracle do not provide search revenue detail in their financial reports. So on revenue Autonomy has a valid claim to the Number One position in enterprise search.

Consulting Firms Want to Sell Work, Not Expose Warts

Consulting firms—particularly those confined to the mid-tier below the McKinseys, the Bains and the Booz Allens and above the independent experts—have to feed their firms’ revenue hunger. Consulting is an expensive business because full time employees have to be kept billable. Making sales, therefore, is more important than objectivity in my experience.

What mid tier consulting firm sales professional wants to irritate an IBM, Google, Microsoft, or Oracle? Big companies, therefore, are often graded on the curve. Is it not easier to rubber stamp search systems from these Big Four vendors? Get along, go along is perhaps the motto in certain situations.

One consequence of the pressure to make sales is that consulting firms have to back certain horses. The idea is to focus on commercial vendors who are likely to have an appetite for buying and paying for the services of the consulting firm.

Somewhat surprisingly, most of the consulting firms’ search analyses fumble the ball when it comes to open source search; namely, Lucene/Solr, FLAX, Tesuji, and others. The fact is that organizations like Cisco Systems, eHarmony, LinkedIn, MTV, and Twitter, among others, are relying on open source “findability” solutions, in particular Lucene/Solr. Open source search is now a viable option for many organizations, and the deprecation of Lucene/Solr is surprising to me.

The bottom line is that most search vendor league tables are suspect. Unfortunately, these league tables are viewed as fact.

On December 10, 2010, I wanted to get an open source technologist to talk about open source search and how that option is perceived by marketing organizations masquerading as independent analysts.

The Interview

I spoke with Dr. Brian Pinkerton, Lucid Imagination’s vice president of product development. Brian has a Ph.D. in Computer Science & Engineering and started his work career as a senior software engineer at NeXT. He then developed WebCrawler, the Web’s first comprehensive search engine.

Brian Pinkerton, VP Product Development, Lucid Imagination

Since then he was Technical Architect at AOL (which acquired WebCrawler), VP of Engineering and Chief Scientist at Excite, Principal Architect at A9, Director of Search at Technorati and co-founder/President of Minimal Loop, whose technology was acquired by Scout Labs and where Brian was VP of Engineering.

Today (December 15, 2010) Lucid Imagination is announcing the general availability of its LucidWorks Enterprise product, which is available for free download. The product is described as a search solution development platform built on open source Apache Lucene/Solr.

The full text of my interview with Brian appears below:

Several consulting firms have issued analyses of the enterprise search market. I noted that open source search in general and Lucid Imagination in particular were not highlighted as top candidates for the enterprise. Why is open source search put on the bench?

Economics, primarily.  Because customers spend huge amounts of money on commercial packages, a small industry has grown up to support and encourage such decisions.  This process is naturally set up to ignore disruptive technologies, especially ones that are price-disruptive.  The consulting firms don’t work for free: getting prominent placement in a report usually costs money.  Who’s paying that fee for open source?   Another important reason is the market: developers, not IT managers, are the main adopters of open source solutions, while IT execs are the main consumers of the fancy reports.

Large organizations rely on consultants’ reports. In your opinion are these reports accurate?

It’s hard to comment on these reports because the methods are not always transparent. These consultants spend a lot of time talking to vendors and customers, and draw some conclusions based on that. Many of them have been at it for a while, and they survive by providing useful insights.  One useful thing to note, though, is that their conclusions are biased by those they talk to and their target audience: the IT exec.  If you’re one of those, I’m sure you like the reports.  If you’re a developer, you might not.

How is Lucid Imagination productizing open source search?

We have released a product, LucidWorks Enterprise, that extends Lucene/Solr with features commonly needed by commercial customers. Our focus is providing technology that will make open source Lucene/Solr more accessible to more people. For instance, user interfaces that simplify getting started, or APIs that are specifically targeted to the way enterprises build and integrate applications today.

For example, we extend Solr with RESTful interfaces for configuration; that provides developers with the ability to integrate it more easily. We also simplify functions that could be built from open source, but are more convenient to take as ready-made features.  Finally, we add features that 99.9% of software developers probably can’t create easily from scratch, such as our Click Scoring framework, which boosts search results selected most often by users.

Furthermore, open source projects are really good at broad innovation, transparency, and easy access.  But the communities around open source projects are not support organizations, so many vendors help companies adopting open source with timely expert support. That’s another one of the things we do at Lucid.
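The idea behind boosting results that users select most often can be illustrated with a small sketch. This is not Lucid Imagination’s Click Scoring framework, whose internals are not described in the interview; the function name `click_boosted_ranking`, the `weight` parameter, and the log-damped formula are all assumptions chosen for illustration.

```python
import math

def click_boosted_ranking(results, click_counts, weight=0.5):
    """Re-rank (doc_id, base_score) pairs with a log-damped click boost.

    Hypothetical illustration only: the formula and parameter names are
    assumptions, not the LucidWorks Enterprise implementation.
    """
    rescored = []
    for doc_id, base_score in results:
        clicks = click_counts.get(doc_id, 0)
        # log1p keeps heavily clicked documents from swamping relevance
        boost = 1.0 + weight * math.log1p(clicks)
        rescored.append((doc_id, base_score * boost))
    # Highest boosted score first
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Made-up relevance scores and click counts for three documents
results = [("doc-a", 2.0), ("doc-b", 1.8), ("doc-c", 1.5)]
click_counts = {"doc-b": 40, "doc-c": 3}
ranking = click_boosted_ranking(results, click_counts)
```

With these made-up numbers, the frequently clicked doc-b moves ahead of doc-a even though its base relevance score is lower. A production system would also have to deal with click spam and position bias, which this sketch ignores.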

What steps have you taken to ensure the stability of the open source search product you offer?

We take the latest, most stable innovations from the open source development tree (known as ‘trunk’) and provide rigorous integration testing, as well as regular, stable releases driven by customer opportunities. We follow strict software engineering principles and use a quality-driven release process to build LucidWorks Enterprise. And we provide maintenance fixes and releases for our product in a timely fashion to customers.

Proprietary search vendors emphasize that their approach ensures that licensees get timely bug fixes and updates. Is this a valid statement? What does Lucid Imagination provide a customer who wants timely bug fixes and updates?

I think both open-source vendors and commercial software suppliers provide timely bug fixes and updates. On the open-source side, it’s an interesting challenge because some bugs are fixed nearly instantly by the open source community, but they are not packaged in a way that a production customer can easily consume. Production customers want bug-fix-only branches of the software, not bug fixes accompanied by the latest feature innovations that happened to be committed at the same time. We insulate our customers from the open-source volatility by releasing stable, bug-fix-only branches for our production customers.

Search technology has fragmented into a mind numbing number of implementations such as an appliance, cloud or hosted search, on premises search, and combinations of methods. How does Lucid Imagination’s search product fit into this fragmented solutions landscape?

LucidWorks Enterprise is a product that spans the range from software appliance to developer toolkit.  Customers new to search can deploy it in a turnkey fashion, while more sophisticated customers can dive under the hood and build a complex application around it.  A key secret to great search is how well it fits the business it is meant to serve — in fact, this is true of any application, particularly custom built apps. We believe that anyone who needs better than ‘adequate’ search results will want to build their search solution, and we created LucidWorks Enterprise to provide the best, lowest cost, most scalable platform for building that search solution.

Microsoft SharePoint provides a search solution. Microsoft offers the Fast technology for a more robust solution. What does Lucid Imagination provide to a SharePoint licensee wanting an enhanced search solution?

We will release a robust SharePoint solution in the first two quarters of 2011 and enable anyone to use LucidWorks Enterprise to search their SharePoint data alongside data from other common sources. One of the open questions about the new SharePoint solution is how long Microsoft will support Fast’s integration with anything but SharePoint.

Many search vendors offer faceted search; that is, the system generates hot links to related or supporting content. What is Lucid Imagination’s approach to faceted search?

Both LucidWorks Enterprise and Solr provide faceting support on every query that enables users to refine their results.   Faceting is most obviously useful in eCommerce, though a wide variety of applications also take advantage of the feature.  LucidWorks Enterprise and Solr support efficient and scalable faceting on any field, providing human-readable labels and accurate facet counts for the top facets.  One of the important considerations for large collections is the degree to which faceting works in a distributed configuration.  In LucidWorks Enterprise and Solr, faceting is supported seamlessly in distributed situations, offering the full performance at scale.
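For readers who have not used faceting, a minimal sketch of a faceted Solr request may help. The `facet`, `facet.field`, `facet.limit`, and `facet.mincount` parameters are standard Solr query parameters; the host, the field name `category`, the helper function names, and the sample response below are assumptions made up for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

def build_facet_query(base_url, query, facet_field, limit=10):
    """Build a Solr select URL that asks for facet counts on one field."""
    params = {
        "q": query,
        "wt": "json",               # ask for a JSON response
        "facet": "true",            # turn faceting on
        "facet.field": facet_field,
        "facet.limit": limit,       # only the top facet values
        "facet.mincount": 1,        # skip values with zero matches
    }
    return f"{base_url}/select?{urlencode(params)}"

def parse_facet_counts(response, facet_field):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][facet_field]
    return list(zip(flat[0::2], flat[1::2]))

# Hypothetical host and field name, for illustration only
url = build_facet_query("http://localhost:8983/solr", "camera", "category")

# Hand-written sample in the shape of Solr's default JSON facet output
sample_response = {
    "facet_counts": {"facet_fields": {"category": ["digital", 12, "film", 3]}}
}
facets = parse_facet_counts(sample_response, "category")
```

Each (value, count) pair in `facets` can be rendered as a clickable refinement link next to the result list.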

Would you describe a customer support use case for Lucid Imagination search?  What are some common themes?

Because we have a diverse base of customers, we see a wide range of search applications.  One common theme is relevance tuning: for instance, customers who need help tying certain results to certain queries, or just better optimizing the algorithms built with Solr & Lucene to deliver the right results.  Another common theme, and one that I personally enjoy helping customers with, is performance.  We had one customer who replaced a commercial search engine with Solr, reducing their median query response time from 30 seconds to about four seconds without our help. We then helped them reduce that by another factor of eight, to a median query response time of under half a second.
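The response-time figures in this answer can be checked with a few lines of arithmetic. The sketch below uses the rounded numbers from the interview (“about four seconds”, “a factor of eight”), so the results are approximate; the variable names are illustrative.

```python
# Rounded figures quoted in the interview
median_before = 30.0          # seconds, commercial engine
median_after_solr = 4.0       # seconds, after the customer's own move to Solr
tuning_factor = 8             # further reduction achieved with Lucid's help

median_after_tuning = median_after_solr / tuning_factor
solr_speedup = median_before / median_after_solr
overall_speedup = median_before / median_after_tuning

print(f"After tuning: {median_after_tuning} s")
print(f"Speedup from switching to Solr: {solr_speedup}x")
print(f"Overall speedup: {overall_speedup}x")
```

On these rounded inputs the switch to Solr alone is a 7.5x improvement, and the tuning brings the total to roughly 60x, consistent with a median near half a second.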

With open source search gaining acceptance within large companies like Cisco and high demand Web applications like Twitter, why are the consulting firms giving open source and Lucid so little attention?

One reason is that it’s coming up really, really fast — and they may not see it coming.  Also, open source adoption is often driven by a broad, diffuse population of developers.  The developers don’t generally put much stock in what the analysts say, if they’re even aware of the reports to begin with.  And on the flip side, the analysts are paying attention to their own customers, CIOs and vendor salespeople, who may not know how the work is really getting done.

What do you suggest a procurement team do to evaluate fully an open source search solution such as the one Lucid Imagination offers?

I think they need to make sure their company is comfortable with creating their own applications; it’s not a passive technology, but one that can be actively used to drive competitive advantage.  In looking at vendors, find one that can offer a solution that grows as their needs and skills grow: from something simple in the beginning to something fully customizable as they become more sophisticated consumers.  And most importantly, they should look for a company with the depth and expertise to provide training, support, and consulting to help them harness the full scope of search innovation.  Finally, they should do the math compared to what they might pay for a comparable implementation with a commercial enterprise search vendor. In many cases, they’re already spending many times what it would cost them to buy an open source-based solution. Sometimes they’ll pay more just for the annual maintenance — excluding consulting and license fees — than for a complete subscription for LucidWorks Enterprise.

In several of the recent analyses of enterprise search systems I have reviewed, I learned about such companies as Sinequa, Fabasoft and Expert System, all examples of firms that have zero profile in many organizations. In your opinion, why are these types of search vendors given so much attention in the search market?

I can imagine that the marketing guys at such organizations are always happy to talk to industry analysts. I spend my time mainly talking to customers and developers.

How can one get more information about Lucid Imagination and its open source enterprise search solution?

Our Web site  www.lucidimagination.com  is full of information about our product, LucidWorks Enterprise, and other information about the open source technologies, Lucene and Solr.  We also have case studies that show how customers are building applications and products with Solr, Lucene, and LucidWorks Enterprise. And I always recommend downloading our product, now available free to developers, and taking it for a spin.

ArnoldIT Comment

My view about consulting firms’ analyses of search and content processing vendors has evolved over the last two years. The economic impact has put pressure on most of the companies that sell technical advice. Since the 2008 financial storm roiled commercial waters, certain advisory firms have shifted from independent analyses to what generates revenue for the consulting firms.

Many of the consulting firms’ reports are white papers or marketing material. The problem is that search is a particularly difficult technical field. Selecting a search system is often a difficult challenge for a procurement team. There are numerous, complex factors to consider.

Consulting firms offer “advice” about which system or systems are the “best” at a particular function. The problem is that writing about search is different from implementing search. It is easier to describe what a search vendor asserts in a demo. It is harder to take that solution and solve a real-world problem in a Microsoft SharePoint environment or in a setting where numerous mission critical applications operate in a stand alone manner.

If you are looking for a search solution, you will need to develop a “tight spec” and then investigate the options that match specific requirements. Few organizations have the time or resources to test multiple systems before making a decision about what search system to license.

The need for information about search creates an opportunity for independent firms to provide information, often at a hefty fee. In my experience, selecting a search system requires an approach close to the one that Martin White and I set forth in our 2009 book Successful Enterprise Search Management, published by Galatea in the UK.

We suggest that procurement teams become familiar with the available literature about search. Then a methodical process of assessment and evaluation can be followed. The short cut often leads to the all-too-common complaints about a search system. Users cannot locate needed information and user satisfaction plummets.

Stephen E Arnold, December 15, 2010

Sponsored
