Interview with Rahul Agarwalla, Uchida Spectrum

April 28, 2011

Introduction

After breaking my foot and cancelling my trips for April and May 2011, I spoke with Rahul Agarwalla, one of the individuals whom I wanted to meet at the upcoming Lucene Revolution Conference. For more information about this important open source search event, click the banner at the top of the blog page or navigate to www.lucenerevolution.com.


Rahul Agarwalla, Uchida Spectrum Inc.

Rahul Agarwalla heads international business for Uchida Spectrum Inc., Japan. Previously he built and exited two content/technology ventures, including Matrix Information, the pioneer of digital content syndication in India. He has over 14 years of experience with various search technologies such as Verity, Fast ESP (enterprise search platform) and Solr/Lucene.

Mr. Agarwalla is the founder of Uchida Spectrum, a key player in Japan’s search and content processing market. USI provides SMART InSight, a search application used by many Fortune 500 companies for specialized industry applications such as R&D, quality assurance for manufacturing, and claims and customer management.

Originally, SMART InSight was based on Microsoft Fast Search technology. SMART InSight is a search application that integrates and analyzes enterprise information. The solution is used by such organizations as Canon and Moody’s. Uchida Spectrum is working with Lucid Imagination across Asia as the Strategic Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support services to clients.

Mr. Agarwalla’s firm has become something of a specialist in moving organizations from the ageing Fast Search & Transfer search platform to the newer open source solutions available today. I spoke with him on April 27, 2011. The full text of the interview appears below.

The Interview

What’s the principal business focus of Uchida Spectrum?

Uchida Spectrum is one of the leaders in the Japan search market. It all started in 2002, when we saw opportunity at the intersection of software and information. That was the inspiration to launch the search business. Our product, SMART InSight, is a search application that integrates information from across the enterprise in easy-to-navigate cross-department information chains, and adds visual summaries that add value through contextual metadata and analytics.

Instead of focusing on enterprise search as a horizontal solution, we found companies placed great value on Line of Business applications. SMART InSight is used for specific applications; for example, quality assurance, research and development, product development, claims and customer management. Those interested in our solutions are large multi-nationals across various sectors, such as electronics, automotive, chemicals, finance and engineering, as well as smaller organizations.

When did you become interested in text and content processing?

During my MBA, I won a competition for my paper on the power of information and the Internet. One of the judges, S. Swaminathan, pushed me towards the emerging internet/information sector. I went on to set up India’s first digital content syndication business, where we had to deal with massive amounts of content from our more than 200 content partners. Doing this manually was just not scalable. Therefore, we embraced content and text processing technologies. We were partly inspired by the Dialog Information Service, which was once part of Lockheed. In 1998, Dialog had more content than all of the World Wide Web. I think the number I heard was 12 terabytes of data. The massive growth in information flows since then means that today the challenges of extracting, normalizing and adding metadata are common to all businesses.

Uchida has been investing in search and content processing for several years, and has recently moved from embedding Fast Search’s technology to Open Source Solr and Lucene. What’s been the big payoff from your work with Lucid Imagination?

The tie up has been very positive.

Our product, SMART InSight, uses search to integrate and retrieve information — so scalability and reliability, at reasonable cost, are critical factors. Lucene/Solr has delivered this in spades. The amount of data we can index on a server and the ability to scale in a linear fashion are unmatched. For instance, in one project we found a 10x improvement due to lower cost of ownership combined with higher performance.

As the quantity of data grows unabated, customers are extremely concerned about cost of future expansion. Working with Lucid Imagination we are able to meet such technical and business challenges and build a future proof foundation.

Are you doing research and development too?

Absolutely.

Our research and development efforts are focused on dealing with massive disparate data. MNCs [multinational corporations] have to deal with multiple languages, complex security rules, and different data formats and structures.

Integrating this data in a user-friendly manner involves much more than conventional content normalization. We need to understand the meaning of the information and its context. Presenting various types of information in an intuitive interface and quickly is our second challenge. Here we are looking at constantly improving our mash up and RIA [Rich Internet Application] widgets technology.

Many vendors argue that mash ups and data fusion are “the way information retrieval will work going forward”. What’s your view of the structured – unstructured data approach?

Information fusion is fundamental to cognition. For example, if you have a stuffy nose, the food you eat can be tasteless. We are better off because we use all our senses together. Information retrieval needs to be adapted to our way of working and not the other way around.

Islands of information have come into being due to the historical approach of information technology. Search gives us a paradigm to overcome this. For instance, customer surveys, warranty claims, quality testing should all have a relationship with product and part data to analyze defects and impact. The massive automobile recalls in 2009 are symptomatic of not connecting the dots. Companies today are much better equipped for product improvement and quick failure detection to the extent they have integrated car performance data (from the car’s onboard systems) with other data sources. The same holds true when we look at voice of customer data or 360 degree customer management.

Based on the experience of our customers, we strongly believe search has more value when you add more data sources. The key is to enable users to understand and explore the inter-relationships between different datasets in an intuitive manner.

Without divulging your firm’s methods, will you characterize a typical use case for your firm’s content processing, tagging, and search and retrieval capabilities?

SMART InSight enables customers to integrate multiple data sets from one or more departments, to easily navigate across these datasets and to analyze massive data sets using charts and tables driven by search. The application interface is determined by the data and by what kind of discovery and analysis the customer needs, somewhat similar to a BI system.

Can you give me an example?

Sure. I want to use an automotive industry use case.

We fed data from the American agency NHTSA (National Highway Traffic Safety Administration) into SMART InSight. We shared this prototype with some of the large Japanese auto firms. Their analysts found trends and common issues using the standard charts and other features.

We then integrated customers’ datasets with the NHTSA data to build a powerful analysis application. One of its key features is what we call a “Data Chain”.

What’s a “data chain”?

A data chain uses fields with similar meaning, for example component category or VIN [Vehicle Identification Number], to create data driven inter-linkages. These linkages allow you to navigate dynamically across the data sets. So users starting from part failure were able to ‘data chain’ to performance data for the affected cars, related claims, NHTSA reports etc. Engineers were thus able to comprehend the problem in a comprehensive way, what I call a 360 degree view. Analysts were then able with a mouse click to drill down to causes and solutions. The analysts could also comment on and tag the data to create a shared context and highlight trends and issues for their co-workers. All the tagging and other usage and sharing information is part of an automated learning loop, which constantly improves the search relevancy and makes users more productive.
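Editor's sketch: the linking idea described above can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not Uchida Spectrum's implementation. The dataset names, record ids, field names (`vin`, `component`) and the helper functions `build_links` and `data_chain` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records; dataset names, ids and field values are illustrative only.
DATASETS = {
    "part_failures": [
        {"id": "F1", "vin": "VIN001", "component": "brakes"},
        {"id": "F2", "vin": "VIN002", "component": "airbag"},
    ],
    "claims": [
        {"id": "C1", "vin": "VIN001", "component": "brakes"},
        {"id": "C2", "vin": "VIN003", "component": "brakes"},
    ],
    "nhtsa_reports": [
        {"id": "N1", "component": "brakes"},
        {"id": "N2", "component": "airbag"},
    ],
}


def build_links(datasets, field):
    """Index every record in every dataset by its value for a shared field."""
    index = defaultdict(list)
    for name, records in datasets.items():
        for record in records:
            if field in record:
                index[record[field]].append((name, record["id"]))
    return index


def data_chain(datasets, start, fields):
    """Return (dataset, id) pairs that share a value with the start record
    on any of the listed fields, excluding the start record itself."""
    start_dataset, start_record = start
    related = set()
    for field in fields:
        if field not in start_record:
            continue
        links = build_links(datasets, field)
        for name, rec_id in links[start_record[field]]:
            if (name, rec_id) != (start_dataset, start_record["id"]):
                related.add((name, rec_id))
    return sorted(related)


# Start from one part failure and follow the chain via VIN and component.
start = ("part_failures", DATASETS["part_failures"][0])
chain = data_chain(DATASETS, start, ["vin", "component"])
print(chain)
```

Starting from the brake failure `F1`, the chain reaches the claim on the same vehicle, another brake claim on a different vehicle, and the brake-related report. A production system would presumably resolve these links through index queries rather than in-memory scans.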

Services are often among the highest margin offerings an organization can offer. Is the need to sell consulting altering the simplicity of the installation, configuration, tuning, and customization of search applications?

Information, even if accessed on an iPad, is a complex challenge.

While we provide services, our business model is built around providing the product and related services like maintenance & support. We want our customers and partners to be able to manage the technology, as they best know how to maximize the value. We advise on best practices, help them overcome technical hurdles and provide support to ensure risk is minimized.

What’s the upside?

The key benefits depend upon each offering. First, we have a product which delivers upon installation a rich feature set in a reliable and scalable product. This enables customers to build solutions that address their use cases by focusing on the business logic without worrying about the technology.

Next our approach includes maintenance and support. We know that customers want support in order to reduce risk and ensure a successful experience during the life of the solution.

Finally, we help our clients create an internal team, which can manage and expand the search solution in tight synchronization with evolving business requirements.

How does an information retrieval engagement move through its life cycle?

It usually starts with customers asking us to help them understand how information retrieval can play a part in meeting their business challenges. We then get the customer’s sample data and wrap SMART InSight around it.

The approach involves some data analysis to integrate the information and building a few sample screens using our Ajax portal interface. Once users play with the data using the sample screens, they can imagine how best to analyze the information and what kind of application UI is required. We recommend this approach, as customers do not need to first create requirements specifications. Customers and users find it is much easier to change and improve the interface working from this kind of prototype than it is to start from a blank page.

The final implementation focuses on helping customers tune our widget library and pages to build the required application UI. Once this is underway, we then map the data to configure the content processing and index schema correctly.

In our projects, ground up development is minimal as our feature set includes content ingestion, search, portal and collaboration features. Post implementation moves to training the customer team and helping them maintain and enhance the solution through support services.

One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content AND the rate of change in existing content objects. What does your firm provide to customers to help them deal with the volume (scaling) challenge?

We serve two market segments: An Intranet within the licensee’s enterprise, and Internet or outside the firewall information.

In the enterprise market, update frequency is relatively lower except while dealing with transactional databases. We have implemented customer solutions with over 100 million records from 10 data sources. There were no latency issues.

Things get more challenging in the Internet segment. We are currently dealing with a project in China where not only does the data have over 300 facets, its volume and update frequency are both amongst the largest in the world. Having the expertise brought by Lucid Imagination becomes critical in such situations. Together, Lucid Imagination and Uchida Spectrum are helping this customer architect a large scale system by optimizing queries and the schema with multiple indexing and search nodes.

Another challenge, particularly in professional intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content “push”, report generation, and personalization within a workflow?

That’s a great question.

My colleagues and I believe that the “Right information, right person, right time” is the critical need of many of our customers. SMART InSight offers sophisticated alerting to achieve this. Multi-parameter rule driven alerts can be sent out in real time. We also offer daily or weekly digests for other information needs. The value of alerting increases as we add more data sources into the index. Users are then able to monitor and track all relevant information flows.
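Editor's sketch: a multi-parameter, rule driven alert can be pictured as a set of per-field conditions that must all hold for a newly indexed record. The Python below is a minimal sketch of that idea only. The `AlertRule` class, the `alerts_for` function, the recipients and the field names are hypothetical and do not describe SMART InSight's actual alerting engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AlertRule:
    """A hypothetical alert rule: every condition must hold for the rule to fire."""
    name: str
    recipient: str
    conditions: Dict[str, Callable[[object], bool]]

    def matches(self, doc):
        # A document matches only if it has every field and passes every test.
        return all(
            field in doc and test(doc[field])
            for field, test in self.conditions.items()
        )


def alerts_for(doc, rules):
    """Return (recipient, rule name) pairs for every rule the document triggers."""
    return [(rule.recipient, rule.name) for rule in rules if rule.matches(doc)]


# Illustrative rules; names, recipients and thresholds are assumptions.
RULES = [
    AlertRule(
        name="brake claims over 1000",
        recipient="qa-team",
        conditions={
            "component": lambda value: value == "brakes",
            "cost": lambda value: value > 1000,
        },
    ),
    AlertRule(
        name="any airbag report",
        recipient="safety-team",
        conditions={"component": lambda value: value == "airbag"},
    ),
]

new_doc = {"component": "brakes", "cost": 2500}
alerts = alerts_for(new_doc, RULES)
print(alerts)
```

Evaluating the rules at indexing time corresponds to the real time alerts mentioned above; a daily or weekly digest could run the same rules over a batch of records instead.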

There has been a surge in interest in putting “everything” in a repository and then manipulating the indexes to the information in the repository. On the surface, this seems to be gaining traction because network resident information can “disappear” or become unavailable. What’s your view of the repository versus non-repository approach to content processing?

As you know, Solr/Lucene creates an index of the information and does not store the actual information. With this approach, one of the significant advantages we experience is flexibility. Typically, repository solutions are developed following strict waterfall methodology with stable requirement specifications. We think this approach may be a bit out of step with today’s rapidly evolving information climate. By comparison we can be far more flexible, for example, by using dynamic fields in Solr/Lucene and readily changing the ranking algorithm.
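Editor's sketch: dynamic fields let a schema accept field names it has never seen, as long as they match a wildcard pattern such as `*_s` or `*_i`. The Python below mimics that lookup to show why new data sources can be added without a schema rewrite. It is an illustration only, not Solr's own code. The field names, the type names and the `resolve_type` helper are assumptions, and the patterns are chosen so that they never overlap.

```python
from fnmatch import fnmatchcase

# Hypothetical schema: a few explicit fields plus suffix-based dynamic patterns.
EXPLICIT_FIELDS = {"id": "string", "title": "text"}
DYNAMIC_FIELDS = {"*_s": "string", "*_i": "int", "*_dt": "date"}


def resolve_type(field_name):
    """Return the type for a field name, or None if nothing in the schema matches.

    Explicitly declared fields are checked first, then the wildcard patterns.
    """
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    for pattern, field_type in DYNAMIC_FIELDS.items():
        if fnmatchcase(field_name, pattern):
            return field_type
    return None


# A record from a new data source whose fields were never declared up front.
record = {"id": "N1", "supplier_s": "Acme", "recall_year_i": 2009, "notes": "free text"}
resolved = {name: resolve_type(name) for name in record}
print(resolved)
```

Here `supplier_s` and `recall_year_i` are accepted through the patterns, while `notes` matches nothing and would need either a rename or a new pattern.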

We use connectors bundled with LucidWorks Enterprise to pull the data from databases and other content repositories. In some cases, we or our system integration partners may build a custom connector. The LucidWorks Enterprise connector framework we get from Lucid Imagination makes this much easier.

Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What’s your firm’s approach to presenting “outputs” for end user reuse or for mobile access? Is there native support in Solr/Lucene or via Lucid Imagination for results formats?

What and how much information to put on the screen is always a challenge; SMART InSight resolves the clutter problem in two ways.

First, visualization, when used correctly, is extremely powerful. For this reason, our solution implementation focuses on designing the right application UI. We have built up a great deal of experience over multiple projects and are able to guide customers to design screens for experts and different ones for the simple user.

Second, we also enable users to build their own UI by selecting widgets much like iGoogle or My Yahoo. Thus a user who prefers graphs can add chart widgets and manipulate what should be the X and Y axes. We use LucidWorks Enterprise features like faceting and scoring to build accurate charts. Control over the widgets and what content fields users would like to see enables fully personalized information consumption.
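Editor's sketch: faceting returns a count of matching records per field value, which maps naturally onto a bar chart where the facet values form the X axis and the counts form the Y axis. The Python below is a minimal sketch of that mapping using in-memory records. The records, the field names and the `facet_counts` and `chart_series` helpers are hypothetical; a real deployment would presumably take the counts from the search engine's facet response.

```python
from collections import Counter

# Hypothetical search results; field names and values are illustrative only.
RESULTS = [
    {"component": "brakes", "year": 2009},
    {"component": "brakes", "year": 2010},
    {"component": "airbag", "year": 2009},
    {"component": "steering", "year": 2010},
    {"component": "brakes", "year": 2010},
    {"component": "airbag", "year": 2010},
]


def facet_counts(docs, field):
    """Count how many documents carry each value of the chosen field."""
    return Counter(doc[field] for doc in docs if field in doc)


def chart_series(docs, x_field):
    """Turn facet counts into parallel X (labels) and Y (counts) lists,
    largest count first, ties broken by label."""
    counts = facet_counts(docs, x_field)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], str(item[0])))
    x_values = [label for label, _ in ordered]
    y_values = [count for _, count in ordered]
    return x_values, y_values


# The user picks which field drives the X axis; the Y axis is the facet count.
x_axis, y_axis = chart_series(RESULTS, "component")
for label, count in zip(x_axis, y_axis):
    print(f"{label:<10} {'#' * count}")
```

Switching the X axis to another field is a one-argument change, for example `chart_series(RESULTS, "year")`, which is the kind of control the chart widgets give the user.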

I am on the fence about the merging of retrieval within other applications. What’s your take on the “new” method that some people describe as “search enabled applications”? Autonomy and Endeca each have work flow components as part of their search platforms. What’s Uchida Spectrum’s capability in workflow or similar enterprise embedding of search?

SMART InSight is both “search enabled” and “database enabled”. I wonder if any vendor uses the term “database enabled application”. The point of the “search enabled” jargon is that search is a relatively newer technology than databases. As technology becomes embedded into our lives it is no longer noticed.

Search is much more than a search box and a set of results. I think some of the work being done here by Autonomy and Endeca is commendable. The question is whether they can deliver value at a reasonable price point and thus cater to more customer segments. In this context, we are using the Lucene/Solr open technology as the foundation because we are able to deliver high return on investment with a flexible and scalable solution.

We believe this will expand the market for search and thus, hopefully, make the phrase “search enabled application” redundant.

I see you will be speaking at the forthcoming Lucene Revolution conference. What are the key trends you expect to see materialize there?

One of the key debates is search versus database. Lucene Revolution will inform this debate by showcasing how more and more large firms are choosing search. This impacts the perception of search as an enterprise ready technology. As a snowball effect, I see search augmenting databases in many applications. Companies will then need to build search expertise much the same way they have database architects and developers. I believe Lucid Imagination will play a central role in making this happen.

Lucene Revolution brings higher cohesiveness to the Lucene/Solr movement and makes visible its size. Its disruptive innovation and open source model pose a strong challenge for the established commercial vendors. The mainstreaming of the interest in Lucene/Solr means these players need to fashion a cogent strategic response. Might this trigger realignment within the search industry – mergers, diversification or focus on niche markets?

Our priority is expanding the search story in relatively under-penetrated markets like China & India. The large IT pool especially in India offers an opportunity to expand the Lucene/Solr movement. Today these engineers have developed the habit of only using databases in their solution architecture – and as the adage goes “if you have a hammer, everything looks like a nail”. We need to train them on search so it is a default part of their solution toolkit. This becomes imperative as China and India will be at the center of the Internet due to the size of their fast growing online populations and rising income levels.

ArnoldIT Comment

If you are seeking a resource to assist your organization in moving from Fast Search’s ageing technology to the Lucene/Solr platform, you will want to speak with Uchida Spectrum. You can get more information about Uchida Spectrum at the Lucene Revolution Conference and from the firm’s Web site at http://www.spectrum.co.jp/.

Stephen E Arnold, April 28, 2011

Interview courtesy of Lucid Imagination

Google Squeezes SEO Experts: The Panda Choke Hold

April 26, 2011

Introduction

In late March 2011, I gave a 15 minute talk at the iBreakfast Meeting in Manhattan. A few days earlier, I spoke at an Incisive conference in Hong Kong, delivering essentially the same message. In a nutshell, I pointed out that Google’s algorithm changes were only the tip of the iceberg regarding relevance improvement in search results. Search engine optimization or SEO has gamed the free Web indexes so that relevance is decreasing. The fix, I said, is to focus on content. My name for this approach is “content with intent.” The idea, I told the two groups, is to create high value content and follow the basic rules of providing facts, sources, and useful information. SEO methods talk about content and then fall back on techniques that try to deliver something for nothing. When you run a query and get pages with no information, the click benefits the owner of the page and does absolutely nothing for the user when the page is without substance.

In Hong Kong, the audience reacted positively. The idea of publishing detailed information, providing sources for the information, and injecting original ideas was energizing. In New York, the opposite was true. I received a couple of emails with harsh, New York style comments. If I were 22, I suppose my feelings would have been hurt. At age 66, my reaction was, “Man, these people don’t understand the change that is upon them.”

I did get a couple of positive follow ups. One person was a stealth type financial analyst and we have talked via telephone. The other was feedback from Peter Niemi. I poked around and learned that Mr. Niemi was the founder of GHG Interactive, the digital marketing arm of the Grey Healthcare Group, in 1995. He was the senior executive until 2000. After more than a decade at Grey he joined Torre Lazur McCann and led the team that designed and executed the Paxil Web presence for GlaxoSmithKline, which remained one of the top ten pharmaceutical sites in the world until the Paxil patent expiration.


In 2001 Peter co-founded a hybrid technology-advertising agency called Hyphen, an Omnicom company. Hyphen built a software package for managing clinical trial data as well as an online experience guiding consumers through the complex details surrounding the fertility treatment process. Peter currently uses his digital marketing expertise to craft the launch strategies for early stage ventures. Peter earned his MBA at Columbia Business School and also holds a BA in English Literature from Columbia.

I spoke with Mr. Niemi on April 18, 2011. The full text of my interview with him appears below:

What drew you into search and content processing? It seems far afield from English Literature and business school.

I spent my career in the ad agency world as the field of digital advertising evolved. This experience led to my fascination with applying technology tools to the challenges of branding.

Search proved very useful when I was working on my MBA at Columbia Business School. I suppose it was that experience that hooked me on online and various other tools of high finance. I applied many of the modeling principles I learned to our world of digital marketing, and the result is the approach my colleagues and I use today in our work. It’s a quantitative methodology for approaching what is traditionally considered a qualitative challenge.

And the poems?

Still useful, particularly the line from George Bernard Shaw: “If all economists were laid end to end, they would not reach a conclusion.” In my work today, we have a small group with an entrepreneurial streak that motivates us to apply our methods to early stage and growth companies. Our mission is to deliver effective (and affordable!) digital marketing results for the next generation of great companies.

We also focus on results, skipping the economic theory stuff.

You reacted to my discussion of “content with intent” in a positive manner. Most of the search engine optimization professionals wanted to tar and feather me. What’s behind your interest in using content to generate impact?

Yep, the SEO experts are reeling from Google’s crackdown on gaming the Google relevance system. Some SEO professionals react poorly to evidence that they may not be the smartest guys in the room, as we saw.

There are two shortcomings to search marketing, persistence and commoditization. There is no doubt that SEO is a necessary part of any digital marketer’s toolkit, a critical element for digital marketing success. The majority of Web sessions commence with search. That’s where the eyeballs start, but not where they end. That’s the first problem, the lack of persistence. How do you get a customer if the customer never comes back or forgets you in a second or two? A successful approach includes search but must go further than that, surrounding a target audience with messaging at every touch point.

The second problem is that SEO is a commodity. Everyone is doing it to some degree, from the smallest blog to the biggest consumer brand site. SEO requires constant managing to achieve consistent success. In the last couple of years, more and more effort seems to be needed to keep one’s head above water. Market forces, competition, and changing technology require marketing professionals to revisit our campaigns more and more often. Search media agencies charge nice monthly fees to perpetuate what I call a “search arms race.” Google makes $28 billion a year off search engine marketing. In my experience, neither Google nor the marketers are motivated to challenge the status quo. Like investment banks, they make a good living off the status quo and change is not in their best interests.

Why is Google making changes?

There is growing concern about the relevance of search results. There is mixed information about how big social search services are becoming. In the Facebook, LinkedIn, and Twitter world, SEO methods don’t work. SEO is like a mechanic who tries to fix a Ford with parts from a washing machine.

In the ad and marketing world, what’s the perception of SEO?

Agency people are wired to focus on the creative product as an end in itself, not the results it produces. There is some appreciation for the value of targeted SEO, but little awareness of the overwhelming power that a properly executed campaign can wield. Search is often an afterthought in the traditional marketing world, and there is not nearly enough understanding of how effective the tools of direct response marketers can be when applied to branded messaging to create explosive mindshare growth.

In our practice we care about nothing but the results. In fact, we have long held that creative advertising awards such as the Addy Awards should be abolished. Creative achievement runs contrary to the best interests of the marketer. We are salesmen, not artists, and must be measured quantitatively, not aesthetically. If you want to paint, paint. If you want to sell, sell. There is no ambiguity.

Give me an example?

Sure, our creative director is enormously talented, truly a genius. But he does not lead the conversation about what we are going to do. The audience does that. The data do that. Once we’ve done the analysis then the rest of the team leaves him alone to work his magic. He creates content that works; content with intent. If he wins any creative awards he loses his key to the executive washroom.

But today, traffic generation techniques like SEO should be a marketer’s first concern. Conversion effectiveness should be the second. All content should be tailored to maximize those factors. Artistic quality doesn’t even enter the conversation.

I am delighted to discuss Shakespeare’s unique literary genius, just not while I am working. There were no content algorithms back then, so as optimized language goes, his plays perform terribly and his sonnets are even worse.

What’s the impact of this new world?

Any digital marketing effort that is solely reliant on search for traffic is at the whim of the search engines, as we saw during Google’s recent formula changes. Web site owners too reliant on SEO find themselves under tremendous pressure to adapt to rule changes that can wipe out carefully built site traffic overnight. Just like investing, reliance on one property for too much of your returns is eventually going to let you down hard. The solution is to diversify. Successful digital marketers use a portfolio of techniques to minimize risk from any one source of revenue. The most powerful of these tools is semantic content crafted to deliver results that last. The problem is few know how to use this tool properly.

SEM and online advertising can certainly help to kick start traffic to a new site. Paid media can help to plug traffic holes created by changing conditions. Nothing, however, can substitute for the long term effectiveness of a thoughtful semantic campaign with content, creative, and search all working together to blanket a market with a message. Our approach will drive any idea to the top of every conversation, online and offline. This is true in any field. Branding. Politics. Entertainment. Direct sales.

Is there room for a different approach? The reaction to my talk in Hong Kong was positive. In Manhattan, it was mixed.

Aggressive marketers are always looking for competitive advantages. The pioneers recognize that the current state of search makes any gains temporary until the other guys figure out what you did and copy it. Next generation techniques are of great interest to these people out on the leading edge. Current methods find out where traffic is and then compete to capture it. Semantic techniques go where the traffic is going to be and own it. Ad agencies are figuring out what the early adopters of semantic advertising–our clients–are doing, but the entrepreneurs are a few steps ahead right now. Your content with intent method involves words, tactics, and technology. That’s a different bundle to shoulder.

In Hong Kong, the mobile device is the primary means of accessing digital content. What’s your view of the mobile revolution?

The practice of building web sites as marketing tools, online brochures, is dwindling in significance and has been for some time. This is not a short term situation for traditional communication methods. It is a long term trend that is increasing in momentum like a snowball rolling down a hill. Soon enough more than half of web sessions will be conducted from a mobile device, so the traditional web site is only part of the picture.

Marketers do recognize that and the standard response can be seen in the rush to roll out social media strategies and mobile apps. These are largely versions of marketing content repurposed for different platforms, hardly an innovative or disruptive approach. It’s also unsustainable, as platforms come and go, rise and fall in significance.

What is required is an approach that looks ahead, not back. An approach built for the future, not based on the past. That’s what we have created and employ for our clients. That’s the beauty of our semantic content and proprietary distribution engine.

Let’s jump back to Google. What’s your take on the company’s recent efforts to improve relevance for its search users?

Good question and a tough one.

Google’s recent moves put the pressure on Web marketers to adapt and improve their offerings to continue to see the results which they are accustomed to achieving. Hopefully it also puts pressure on them to diversify, a prescription we have been advocating for some time now. Google is dominant in search (and with YouTube, online video) and so remains a pivotal factor in any successful digital marketing portfolio.

For the most part, Google does not, however, create any content. For that they need the rest of the online world: advertisers, publishers, and sellers. Content creators. Without them there is no Google–at least not with a market cap of $185 billion.

Despite their lofty ideology, Google is a business like any other. To quote the company’s annual report:

We generate revenue primarily by delivering relevant, cost-effective online advertising.

By “primarily” Google’s wordsmiths seem to mean that more than 95% of their income comes from paid advertising through the AdWords and AdSense networks. That’s where they make their revenue and as a public company they have a fiduciary obligation to their shareholders to maximize that revenue.

The recent changes were in service of driving revenue by increasing the quality and effectiveness, and thus the cost, of that advertising. Plain and simple.

Google does not have any issue with so-called “content farms” except that they drive down advertising prices through redundant and low quality content. After we recognize that we can move on to addressing the recent shifts, which are significant for all of us, content farmers or not.

Asia Technical Services

April 20, 2011

An Interview with Patrick and Jean Garez

In Hong Kong in late March 2011, I met with one of the senior officers of Asia Tech. The company’s official name is “Asia Technical Services Pte Ltd.” I learned about the company from Dassault Exalead. For eight years Asia Tech has been the partner for Exalead in Asia and has become the “go to” resource for the Dassault Systèmes team covering South Asia regarding Exalead after the acquisition. Based in Singapore, Asia Tech is hours away from Dassault clients in Thailand, China, and Viet-Nam, among other countries whose thirst for Dassault technology continues to increase. In my initial conversation with Jean Garez, the person who appears to be the heir apparent to the firm his father founded, I learned that Asia Tech is now responding to a surge of inquiries about Exalead’s search based applications.

Patrick (founder) and Jean Garez (senior manager), Asia Technical Services Pte Ltd.

Upon my return to the US, I followed up with Mr. Garez via Skype for a lengthier discussion. On the call, Patrick Garez joined the interview. For convenience, I have merged the comments from both Garezes into one stream. The full text of that interview appears below:

What’s the history of Asia Tech?

Asia Technical Services Pte Ltd was first conceived in Hong Kong in 1974 by our founder, and my father, Patrick Garez. The original business was the marketing and after-sales support of products, engineering services and asset management solutions to the commercial aviation industry. My father was a pioneer because he was among the first to predict the growth potential of commercial aviation in the Asia Pacific region and to identify Singapore as the future hub for South East Asia and beyond.

Along the way ATS tackled some industry-specific software solutions supporting various maintenance data management, engineering processes and workflows, but it wasn’t until 2003 that ATS officially began distributing software solutions as a dedicated part of our business.

What triggered the shift?

Client demand. ATS has prided itself on responding to the needs of its clients across this region. Once we started doing work in a different area, word of mouth sent additional projects our way.

ATS focuses on finding leading edge innovative and cost effective ISV solutions from Europe and the US and offering them a platform to enter into the Asia Pacific market with a limited investment.

And your activity in search?

Same path.

In the mid-2000s up until probably 2009, the search market in Singapore and the region was dominated by legacy platforms built with a 1980s approach to key word indexing and information retrieval. There was some interest in the SPSS and SAS approach to structured data, of course.

However, in response to a client project, we came across a technologically-advanced company in Paris, France. The founder was a member of the original Digital Equipment AltaVista.com search team and was making significant progress with technology that was scalable and very, very speedy. In addition, Exalead was deploying a lighter, automated semantic engine that did the thinking for the user by automatically categorizing and providing structure to unstructured data. We tapped them for our client project. From then on, we knew we were going to see great things from them. We continued to follow and participate in the growth of this company from its incubation phase until its acquisition in 2010 by Dassault Systèmes. ATS remains its partner for the region.

Recorded Future in the Spotlight: An Interview with Christopher Ahlberg

April 5, 2011

It is big news when In-Q-Tel, the investment arm of the US intelligence community, funds a company. It is really big news when Google funds a company. But when both of these tech-savvy organizations fund a company, Beyond Search has to take notice.

After some floundering around, ArnoldIT was able to secure a one-on-one interview with the founder of Recorded Future. The company is one of the next-generation cloud-centric analytics firms. What sets the company apart technically is, of course, the magnetism that pulled In-Q-Tel and Google to the Boston-based firm.

Mr. Ahlberg, one of the founders of Spotfire which was acquired by the hyper-smart TIBCO organization, has turned his attention to Web content and predictions. Using sophisticated numerical recipes, Recorded Future can make observations about trends. This is not fortune telling, but mathematics talking.

In my interview with Mr. Ahlberg, he said:

We set out to organize unstructured information at very large scale by events and time. A query might return a link to a document that says something like “Hu Jintao will tomorrow land in Paris for talks with Sarkozy” or “Apple will next week hold a product launch event in San Francisco”. We wanted to take this information and make insights available through stunning user experiences and application programming interfaces. Our idea was that an API would allow others to tap into the richness and potential of Internet content in a new way.

When I probed for an example, he told me:

What we do is to tag information very, very carefully. For example, we add metatags that make explicit when we locate an item of data. We tag when that datum was published. We tag when we analyzed that datum. We also tag the actual time point (past, present, future) to which the datum refers. The time precision is quite important. Time makes it possible for end users and modelers to deal with this important attribute. At this stage in our technology’s capabilities, we’re not trying to claim that we can beat someone like Reuters or Bloomberg at delivering a piece of news the fastest. But if you’re interested in monitoring, for example, the co-incidence of an insider trade with a product recall we can probably beat most at that.
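
The kind of time tagging Mr. Ahlberg describes can be pictured as a small record with one timestamp per tag. The sketch below is only an illustration under stated assumptions: the class name, the field names, and the sample dates are hypothetical and are not taken from Recorded Future’s system or API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class TaggedDatum:
    """One item of text plus the time tags named in the quote (hypothetical schema)."""
    text: str
    found_at: datetime                    # when the item was located
    published_at: datetime                # when the source published it
    analyzed_at: datetime                 # when the item was analyzed
    refers_to: Optional[datetime] = None  # the time point the text talks about

    def tense(self) -> str:
        """Classify the referenced time point relative to the publication time."""
        if self.refers_to is None:
            return "unknown"
        if self.refers_to > self.published_at:
            return "future"
        if self.refers_to < self.published_at:
            return "past"
        return "present"


# Sample values are invented for illustration only.
datum = TaggedDatum(
    text="Apple will next week hold a product launch event in San Francisco",
    found_at=datetime(2011, 4, 1, 9, 5),
    published_at=datetime(2011, 4, 1, 8, 0),
    analyzed_at=datetime(2011, 4, 1, 9, 6),
    refers_to=datetime(2011, 4, 8),
)
print(datum.tense())  # future
```

Keeping the four timestamps separate is what would let a modeler ask a question such as “which statements published last week refer to events next month?” without re-reading the text.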

To read the full text of the interview with Mr. Ahlberg click here. The interview is part of the Search Wizards Speak collection of first person narratives about search and content processing. Available without charge on the ArnoldIT.com Web site, the more than 50 interviews comprise the largest repository of first hand explanations of “findability” available.

If you want your search or content processing company featured in this interview series, write seaky2000 at yahoo dot com.

Stephen E Arnold, April 5, 2011

Freebie

Exclusive Interview with Kamran Khan

March 15, 2011

Enterprise search vendors are changing their market positioning more quickly than at any other time. The vendors’ technology gets new features and functions. With an already complex system, a licensee often needs the help of specialists to get the system up and running. Other companies may have a search system, find it unsuitable, and need help preparing a business case for a new procurement. In short, almost any facet of an enterprise search project may need specialized expertise.

Search Technologies Corp., a privately held firm, has experienced steady, rapid growth over the last five years. The economic downturn had little effect on the company which now has offices across the US and in the United Kingdom. I was able to talk with the founder of this professional engineering services firm in order to get some insight into why Search Technologies has been unaffected by the economic storms that ripple across the business landscape.

Kamran Khan, the founder of Search Technologies, told me in response to my question, “What’s the secret of Search Technologies’ success?”

The founders and the management team are all veterans of the enterprise search industry with between 15 and 22 years experience. I entered the industry in the early 1990s. We all used to work for major search engine vendors. It seemed to us that most search engines contained great technology but were poorly implemented, and we thought that forming a company focused on helping people to implement search software made a lot of sense. Today, we have more than 80 staff, but implementing search solutions is still all we do. Word of mouth about our competence is another important factor in our success.

(You can read the full text of the interview at this link.)

Many companies assert their expertise in dealing with the search and content processing systems of such companies as Microsoft, Google, Autonomy and others. But customers want more than key words. One hot trend is data fusion, or a mash up of disparate information, instead of a list of search results. I asked Mr. Khan about this demand. He told me:

Absolutely. Many customers are demanding more than a laundry list of results. Mash ups, data fusion and other sophisticated approaches to information presentation will no doubt proliferate. In our experience, the importance of data structure creation is often under-estimated though. People get excited by cool new features but don’t follow through and plan properly to create the necessary data structures to support the cool features, or put processes in place to maintain data structure quality through time as their data set evolves. Data sets have an annoying habit of evolving, just when you thought you’d nailed the search engine implementation. So a substantial part of our business involves helping customers with existing search systems to address challenges, such as relevancy issues. A lack of attention to detail in preparing the data set for search is often the root cause. This has become a significant part of our business and we’ve established a specific services practice around something we call Document Preparation Methodology for Search. Maintaining data structure requires proven ongoing processes, and not just technology.

To read the full text of my interview with one of the leaders in the search engineering and services sector, navigate to Search Wizards Speak on the ArnoldIT.com Web site. For more information about Search Technologies, visit www.searchtechnologies.com.

Stephen E Arnold, March 15, 2011

Exclusive Interview: Abe Music, Digital Reasoning

February 16, 2011

Digital Reasoning, based in Franklin, Tennessee, is one of a handful of companies breaking a path through the content jungle. The firm’s approach processes a wide range of “big data”. The system’s proprietary methods make it easy to discern trends, identify high-value items of data, and see the relationships among people, places, and things otherwise lost in the “noise” of digital information.

In addition to a number of high-profile customers in the defense and intelligence communities, the company is attracting interest from healthcare and financial institutions. Also, professionals engaged in eDiscovery and practitioners in competitive intelligence are expressing interest in the company’s approach to “big data”. The idea of “big data”, meaning large volumes of structured and unstructured content such as Twitter messages, Web logs, reports, email messages, blog data and system generated numerical outputs, is increasingly important. The problem is that the content arrives continuously and in ever increasing volume.

Digital Reasoning has created a system and an interface that converts a nearly impossible reading task into reports, displays, and graphics that eliminate the drudgery and the normal process of looking at only a part of a very large collection of content. Their flagship product, Synthesys®, essentially converts “big data” into the underlying facts, connections and associations, making it possible to understand large scale data by examining facts instead of reading first.

I spoke with senior software engineer Abe Music about Digital Reasoning’s approach and the firm’s activities in the open source community. Like some other next-generation analytics companies, Digital Reasoning makes use of open source software in order to reduce development time and introduce a standards-based approach into the firm’s innovative technology.

The full text of my interview with Abe Music appears below.

When did you first start following open source software?

I originally began learning about open-source software while in college. At Western Kentucky University we had a very prominent Linux users group that advocated open-source software wherever possible. This continued throughout my college career in any project that would allow it and after, where in my first job out of school, Python was the language of choice.

How does Digital Reasoning contribute to the open source community through github?

Currently, PyStratus is the only contribution through github although more contributions are underway.

What is github?

Good question. github is a Web-based hosting service for open source software projects that use a revision control system. github offers both commercial plans and free accounts for open source projects, and it is a key community resource for the open source developers.

What is PyStratus?

Here at Digital Reasoning, we were using a set of Python scripts from Cloudera’s Hadoop distribution to manage our Hadoop clusters in the cloud.

Soon after, we had the need to easily manage our Cassandra clusters as well. We decided to leverage the work Cloudera had already done by converting the Cloudera Distribution of Hadoop or CDH scripts into an all-in-one solution for managing Hadoop, Cassandra and hybrid Hadoop/Cassandra clusters.

For us, we did a complete refactoring of the CDH scripts into an easily extensible Python framework for managing our services in the cloud.

What’s “refactoring”?

“Refactoring” to me is the process of changing a computer program’s source code without modifying its external functional behavior. Here at Digital Reasoning, when we refactor we are improving some of the attributes of the software such as performance or resource consumption, etc.
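
To make that definition concrete, here is a minimal sketch of a refactoring in Python. It is my own illustration, not code from PyStratus or Digital Reasoning; the function names and sample data are hypothetical. Both versions return the same result for the same input, but the second does less work, and it assumes the items are hashable.

```python
def find_duplicates_before(items):
    """Original version: compares every pair, so work grows quadratically."""
    duplicates = []
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j] and items[i] not in duplicates:
                duplicates.append(items[i])
    return sorted(duplicates)


def find_duplicates_after(items):
    """Refactored version: a single pass with sets, same inputs and outputs."""
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        seen.add(item)
    return sorted(duplicates)


sample = ["solr", "hadoop", "solr", "cassandra", "hadoop", "solr"]
# External behavior is unchanged: both versions agree on the result.
assert find_duplicates_before(sample) == find_duplicates_after(sample)
print(find_duplicates_after(sample))  # ['hadoop', 'solr']
```

The assertion is the important part: a refactoring is only safe when a check like this confirms that callers see no difference.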

Thank you. Why are some firms supporting open source software?

I personally don’t see any downside to open-source software, but, of course, I am quite biased.

I can see, from the business side, a reason to stay closed if you had developed your business around some intellectual property that you wanted to control.

But I believe that open-source software really fills a void in the tech community because it allows anyone to take the software and extend it to fit their individual requirements without having to reinvent the wheel.

I also think it is important to use open-source software as a reference to learn some new technology or algorithm.

Personally I think that working with open source software is a great way to learn and I would recommend anyone writing code to consider using open source as a way to add to their personal coding knowledge base.

What are the advantages of tapping into the open source software trend that seems to be building?

One of the major advantages I see from using open-source software is that it makes possible taking some outstanding work from a community of developers. With open source software, I can put software to work immediately without much effort.

As a developer leveraging that technology, and not developing it yourself, you get the added benefit of very minimal maintenance on that piece of your software. If there is a bug, the community taps the collective pool of expertise. When someone adds to a project, everyone can take advantage of that innovation. The advantages of this approach range from greater reliability to a more rapid pace for innovation.

And I would definitely recommend giving back to the community wherever possible.

When you want to use open source software, what is your process for testing and determining what you can do with a particular library or component?

That’s a very good question. This is my favorite part actually.

Because there are so many great open-source technologies out there I get to play with all of them when considering which component(s) to use. I don’t have a particular process that I use to evaluate the software. I have a clear idea of what I need out of the component before I begin the evaluation. If there are similar components I will try to match each of them up to one another and determine which one fits my requirements the best.

Is this work or play? You seem quite enthusiastic about what strikes me as very complicated technical work?

To be candid, I find exploring, learning, and building enjoyable. I can’t speak for the other technologists at Digital Reasoning, but I find this type of problem-solving and analytical work both fun and rewarding. Maybe “play” is not the right word, but I like the challenge of this type of engineering.

Quite a few companies are supporting open source, including IBM. In your view, will more companies be developing with open source in mind?

Yes, I definitely believe that more and more companies will begin supporting the open-source community simply because of the vast amount of benefits they can gain.

As a strategic move to support open-source a company could easily reduce development costs by “outsourcing” development to a particular piece of community-supported technology rather than developing it themselves.

The use of open source means that an organization not only gets access to a piece of software that is not completely developed by them, but they also get to interface with some potential candidates for employment, contribute to fostering new ideas, and work within a community that is very passionate about what they are contributing to.

What’s next for Digital Reasoning and open source?

Our commitment to open source is strong. We have a number of ideas about projects. Look for further announcements in the future.

How can a person get more information about Digital Reasoning?

Our Web site is www.digitalreasoning.com. I know that you have interviewed our founder, Tim Estes, on two separate occasions, and there is a great deal of detailed information in those interviews as well. We have also recently announced Synthesys® Platform as a beta program allowing API access to our “big data” analytics with your data where we take complete responsibility for managing the cloud resources. More information about this new program can be found at http://dev.digitalreasoning.com.

Beyond Search Comment

A number of companies have embraced open source software. In an era of big data, Digital Reasoning has identified open source technology that helps cope with the challenges of peta-scale flows of structured and unstructured content. The firm’s new version of its flagship Synthesys service delivers blistering performance and easy-to-understand outputs in near-real time. Open source software has influenced Digital Reasoning and Digital Reasoning’s contribution to the open source community helps make useful technical innovations available to other developers.

Our view is that Digital Reasoning is taking a solid engineering approach to service its customers.

Stephen E Arnold, January 12, 2011

Exclusive Interview: The Kochs of Pandia.com

February 1, 2011

Many professionals in search and content processing read the information produced by Pandia.com. I talked with Susanne Koch at the International Online Show in London in December 2010. I followed up last month. You can read the full text of my conversation with Per and Susanne Koch in the Search Wizards Speak feature about Pandia.

Pandia’s Search Central contains a wealth of information. The company also provides a round up of search engine news and a number of useful search tools. I was curious about the wide range of content available on the Pandia.com Web site. Per Koch said:

My background is from the humanities. Therefore the social transformative power of search interests me. Technology is often seen as something different than social processes. For me technical change and social/cultural change are two sides of the same coin. Google is an end product of a social revolution, including – for instance – the hippie inspired gospel of open access, as well as new technological possibilities grown out of ICT. A particularly interesting side to using web search in marketing is the social aspect of it all. There was a time when search engine marketers believed they could find the recipe for successful online marketing by reverse engineering the search engine algorithm. Now you really need to develop a feeling for useful communication, relevant content generation and the social and cultural rules of social media.

Susanne added:

We do follow enterprise search to a certain extent, but we readily admit that our main focus has been on web search. Pandia is a two person part time exercise, and there are limits to what we have been able to do. Enterprise search is very interesting, though. These companies face challenges that the web search companies do not, including – for instance – access to a limited amount of data and interlinkages. Enterprise search companies may also bring out new innovations that can enrich web searching.

Over the course of our conversations, ArnoldIT and Pandia discovered three areas of interest:

First, the world of search is touching many disciplines, software applications, and many different social activities.

Second, we learned that although we don’t agree on every point in the volatile world of search and content processing, we enjoy discussing, dissecting ideas, and learning new things.

Third, we believe we can collaborate to bring to our respective audiences a new type of “landscape” view of search and content processing. We now have a new monograph in the works, and we will be announcing the details of that writing project, what the monograph will cover, and how you can reserve a copy in the next two or three weeks.

To learn more about Pandia, point your browser to www.pandia.com. We visit the site on a regular basis and we strongly suggest that you give it a look as well.

Stephen E Arnold, February 1, 2011

Freebie but the Oslo-Kentucky connection will produce more than digital herring.

Exclusive Interview with the Founder of Xyggy

January 25, 2011

Last year, I had an email exchange with Dinesh Vadhia, founder of the Xyggy search and content processing company. I did some poking around as did one of my colleagues. We were able to engage Mr. Vadhia in a lengthy conversation on January 20, 2011. In the course of that discussion he said:

Xyggy’s item-search is a new framework for IR based on how people learn concepts and generalize to new items. For instance, shown one or two apples for the first time you will thereafter be able to point to apples every time one crosses your path. The apple may appear as the fruit or in an image and yet we have the remarkable ability to absorb a small amount of information and generalize to new instances. The ability to learn concepts from examples and to generalize to new items is one of the cornerstones of intelligence….Xyggy’s item-search method is a new IR tool for solving the ‘findability’ problem. Without a new tool you only have conventional and well travelled paths to address the problem.

We found his approach and insights refreshing. You can read the full text of the interview with Mr. Vadhia on the ArnoldIT.com Web site in the Search Wizards Speak sub-site. SWS is the largest collection of first-person statements about search and content processing available without charge. Why pay crazy amounts for recycled pablum? Read what search developers themselves say about their methods and systems.

Stephen E Arnold, January 25, 2011

Freebie

Exclusive Interview: Francois Bourdoncle, Dassault Exalead

January 4, 2011

Before the New Year break, I interviewed François Bourdoncle, who is the co-founder and Chief Strategist of Exalead, a global leader of information access software for businesses and the Web. You can see the complete interview on Vimeo.

François Bourdoncle, Chief Strategist of Dassault Exalead

As you may know, Dassault Systèmes acquired Exalead in June 2010. In my analyses of search and content processing systems, Exalead has scored at the top of the league table I have maintained for many years. Since I first tested the firm’s technology in 2003, Exalead has moved up the league table as other vendors stalled or drifted downwards. Not Exalead, however.

I have been invited to Exalead several times to explain my views about Exalead’s search-based applications approach and my particular method of analyzing content processing. In late 2010, I was able to interview François Bourdoncle after my presentation.

This interview provides insight into how Mr. Bourdoncle’s passion for understanding Internet technologies, and search technologies in particular, can be used as the foundation of next-generation business applications that are faster to develop, less expensive to deploy and operate, and more intuitive to use.

Exalead, asserted Mr. Bourdoncle, takes a platform approach. The platform angle is one key to Exalead’s success. Most vendors repackage traditional key word methods. Exalead has an infrastructure approach which has some characteristics of mainstream database vendors. Exalead supports cloud methods, delivers a holistic view of digital information, and provides a new way of performing many enterprise tasks, including business intelligence, customer support, and finding the item of information necessary to close a deal.

In the interview Mr. Bourdoncle said:

Key word search does not work…. The question is now how do we help people find what they want.

I asked Mr. Bourdoncle about the benefits of joining the Dassault organization. He emphasized, “Visibility and the company as a whole.”

The interview runs about eight minutes, and you can access it at this link. For 2011, my suggestion is that organizations wanting to improve information functions as diverse as customer support and business intelligence check out Exalead. You can also read the Search Wizards Speak interview with Mr. Bourdoncle at this link. If you have not added Exalead’s excellent Web search function to your bookmarks, navigate to http://www.exalead.com/search/. I use this service as my primary Web search system because the results are not distorted by ad-driven hit boosting.

Stephen E Arnold, January 4, 2011

Freebie

Exclusive Interview: Brian Pinkerton

December 15, 2010

Introduction

At a recent conference, there was much buzz about consulting firms’ opinions about enterprise search. I spoke with several people who expressed surprise at the “rankings”. For example, one high-profile firm pronounced Vivisimo as the top vendor in enterprise search. Vivisimo positions itself as an “information optimization” company. I am not sure what that means, but it is clear that “enterprise search” is not the company’s main focus. Nevertheless, Vivisimo is number one.

Okay, but Vivisimo started life as a company with on-the-fly clustering. Then Vivisimo morphed into a vendor of federated search. Next Vivisimo dabbled in government contracts. After an executive shake up and an infusion of venture capital, Vivisimo emerged as an “information optimization” company. The phrase is as confusing as Google’s “contextual discovery.”

What are these marketers talking about? The answer is making sales and no-calorie marketing jargon. The consulting firms know a sales opportunity exists when user dissatisfaction with enterprise search is chugging along in the 50 to 70 percent range. Yes, most users of an enterprise “findability” system are unhappy. Procurement teams are, therefore, busy because most companies are looking for a search silver bullet.

To cater to those looking for a quick, simple way to solve an enterprise information access problem, consultants and advisors offer impressionistic write ups. Madison Avenue works fine when selling toothpaste. Apply that method to the very tough problem of information retrieval, and you end up with confusion, rising costs, and unhappy users.

Let me give you another example that surfaced in my conversations with vendors in London at the December International Online Conference. I learned that one consulting firm named Endeca as the top dog in enterprise search. I am okay with that assertion as long as there are some data to back up the claim. When I hear the name “Endeca”, I think of eCommerce as the core strength. The system can be applied to other information problems, but when I recall Endeca’s patent applications, I think about eCommerce, not discovery and data fusion.

Perhaps some search firms are more adept at social engineering than software engineering? Are some search advisors doing Madison Avenue-type thinking, not engineering analyses?

I don’t have any quibble with consulting firms who peg Autonomy as Number One. The revenue alone makes the difference between Autonomy and other information access vendors evident. Last time I saw Andrew Kanter, the chief operating officer for the vendor of meaning-based computing solutions, I asked him, “When will Autonomy break the $1.0 billion in revenue barrier?” He told me and an audience of about 175 people that Autonomy “was only $900 million.” Yep, $900 million, which is orders of magnitude greater than most of the 300 vendors whose information retrieval technology I track. IBM, Google, Microsoft, and Oracle do not provide search revenue detail in their financial reports. So on revenue Autonomy has a valid claim to the Number One position in enterprise search.

Consulting Firms Want to Sell Work, Not Expose Warts

Consulting firms—particularly those confined to the mid-tier below the McKinseys, the Bains and the Booz Allens and above the independent experts—have to feed their firms’ revenue hunger. Consulting is an expensive business because full time employees have to be kept billable. Making sales, therefore, is more important than objectivity in my experience.

What mid tier consulting firm sales professional wants to irritate an IBM, Google, Microsoft, or Oracle? Big companies, therefore, are often graded on the curve. Is it not easier to rubber stamp search systems from these Big Four vendors? Get along, go along is perhaps the motto in certain situations.

One consequence of the pressure to make sales is that consulting firms have to back certain horses. The idea is to focus on commercial vendors who are likely to have an appetite for buying and paying for the services of the consulting firm.

Somewhat surprisingly, most of the consulting firms’ search analyses fumble the ball when it comes to open source search; namely, Lucene/Solr, FLAX, Tesuji, and others. The fact is that organizations like Cisco Systems, eHarmony, LinkedIn, MTV, and Twitter, among others are relying on open source “findability” solutions, in particular Lucene/Solr. Open source search is now a viable option for many organizations, and the deprecation of Lucene/Solr is surprising to me.

The bottom line is that most search vendor league tables are suspect. Unfortunately, these league tables are viewed as fact.

On December 10, 2010, I wanted to get an open source technologist to talk about open source search and how that option is perceived by marketing organizations masquerading as independent analysts.

The Interview

I spoke with Dr. Brian Pinkerton, Lucid Imagination’s vice president of product development. Brian has a Ph.D. in Computer Science & Engineering and started his work career as a senior software engineer at NeXT. He then developed WebCrawler, the Web’s first comprehensive search engine.

Brian Pinkerton, VP Product Development, Lucid Imagination

Since then he was Technical Architect at AOL (which acquired WebCrawler), VP of Engineering and Chief Scientist at Excite, Principal Architect at A9, Director of Search at Technorati and co-founder/President of Minimal Loop, whose technology was acquired by Scout Labs and where Brian was VP of Engineering.

Today (December 15, 2010) Lucid Imagination is announcing the general availability of its LucidWorks Enterprise product, which is available for free download. The product is described as a search solution development platform built on open source Apache Lucene/Solr.

The full text of my interview with Brian appears below:

Several consulting firms have issued analyses of the enterprise search market. I noted that open source search in general and Lucid Imagination in particular were not highlighted as top candidates for the enterprise. Why is open source search put on the bench?

Economics, primarily.  Because customers spend huge amounts of money on commercial packages, a small industry has grown up to support and encourage such decisions.  This process is naturally set up to ignore disruptive technologies, especially ones that are price-disruptive.  The consulting firms don’t work for free: getting prominent placement in a report usually costs money.  Who’s paying that fee for open source?   Another important reason is the market: developers, not IT managers, are the main adopters of open source solutions, while IT execs are the main consumers of the fancy reports.

Large organizations rely on consultants’ reports. In your opinion are these reports accurate?

It’s hard to comment on these reports because the methods are not always transparent. These consultants spend a lot of time talking to vendors and customers, and draw some conclusions based on that. Many of them have been at it for a while, and they survive by providing useful insights.  One useful thing to note, though, is that their conclusions are biased by those they talk to and their target audience: the IT exec.  If you’re one of those, I’m sure you like the reports.  If you’re a developer, you might not.

How is Lucid Imagination productizing open source search?

We have released a product, LucidWorks Enterprise, that extends Lucene/Solr with features commonly needed by commercial customers. We focus on providing technology that will make open source Lucene/Solr more accessible to more people. For instance, user interfaces that simplify getting started, or APIs that are specifically targeted to the way enterprises build and integrate applications today.

For example, we extend Solr with RESTful interfaces for configuration; that provides developers with the ability to integrate it more easily. We also simplify functions that could be built from open source, but are more convenient to take as ready-made features.  Finally, we add features that 99.9% of software developers probably can’t create easily from scratch, such as our Click Scoring framework, which boosts search results selected most often by users.

Furthermore, open source projects are really good at broad innovation, transparency, and easy access.  But the communities around open source projects are not support organizations, so many vendors help companies adopting open source with timely expert support. That’s another one of the things we do at Lucid.
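The Click Scoring idea Brian mentions above, boosting results that users select most often, can be illustrated with a toy sketch. The interview does not describe how Lucid's framework works, so everything here is an assumption for illustration only: the function name `click_boosted`, the logarithmic damping, and the `weight` parameter are hypothetical and are not LucidWorks Enterprise APIs.

```python
import math


def click_boosted(results, clicks, weight=0.5):
    """Re-rank results by multiplying each base score by a click-based boost.

    results: list of (doc_id, base_score) pairs from the search engine.
    clicks:  dict mapping doc_id to how often users selected that result.
    weight:  hypothetical tuning knob controlling how strongly clicks count.
    """
    rescored = [
        # log1p dampens the boost so a few very popular documents
        # do not completely swamp the engine's own relevance score.
        (doc_id, score * (1.0 + weight * math.log1p(clicks.get(doc_id, 0))))
        for doc_id, score in results
    ]
    # Highest boosted score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

With no click data the original ordering is preserved; a frequently selected document with a slightly lower base score can move ahead of a rarely selected one.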

What steps have you taken to ensure the stability of the open source search product you offer?

We take the latest, most stable innovations from the open source development tree (known as ‘trunk’) and provide rigorous integration testing, as well as regular, stable releases driven by customer opportunities. We follow strict software engineering principles and use a quality-driven release process to build LucidWorks Enterprise. And we provide maintenance fixes and releases for our product in a timely fashion to customers.

Proprietary search vendors emphasize that their approach ensures that licensees get timely bug fixes and updates. Is this a valid statement? What does Lucid Imagination provide a customer who wants timely bug fixes and updates?

I think both open-source vendors and commercial software suppliers provide timely bug fixes and updates. On the open-source side, it’s an interesting challenge because some bugs are fixed nearly instantly by the open source community, but they are not packaged in a way that a production customer can easily consume. Production customers want bug-fix-only branches of the software, not bug fixes accompanied by the latest feature innovations that happened to be committed at the same time. We insulate our customers from the open-source volatility by releasing stable, bug-fix-only branches for our production customers.

Search technology has fragmented into a mind numbing number of implementations such as an appliance, cloud or hosted search, on premises search, and combinations of methods. How does Lucid Imagination’s search product fit into this fragmented solutions landscape?

LucidWorks Enterprise is a product that spans the range from software appliance to developer toolkit.  Customers new to search can deploy it in a turnkey fashion, while more sophisticated customers can dive under the hood and build a complex application around it.  A key secret to great search is how well it fits the business it is meant to serve — in fact, this is true of any application, particularly custom built apps. We believe that anyone who needs better than ‘adequate’ search results will want to build their search solution, and we created LucidWorks Enterprise to provide the best, lowest cost, most scalable platform for building that search solution.

Microsoft SharePoint provides a search solution. Microsoft offers the Fast technology for a more robust solution. What does Lucid Imagination provide to a SharePoint licensee wanting an enhanced search solution?

We will release a robust SharePoint solution in the first two quarters of 2011 and enable anyone to use LucidWorks Enterprise to search their SharePoint data alongside data from other common sources. One of the open questions about the new SharePoint solution is how long Microsoft will support Fast’s integration with anything but SharePoint.

Many search vendors offer faceted search; that is, the system generates hot links to related or supporting content. What is Lucid Imagination’s approach to faceted search?

Both LucidWorks Enterprise and Solr provide faceting support on every query that enables users to refine their results.   Faceting is most obviously useful in eCommerce, though a wide variety of applications also take advantage of the feature.  LucidWorks Enterprise and Solr support efficient and scalable faceting on any field, providing human-readable labels and accurate facet counts for the top facets.  One of the important considerations for large collections is the degree to which faceting works in a distributed configuration.  In LucidWorks Enterprise and Solr, faceting is supported seamlessly in distributed situations, offering the full performance at scale.
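For readers who have not used faceting, here is a minimal sketch of what a faceted Solr request looks like. The request parameters (`facet`, `facet.field`, `facet.limit`, `facet.mincount`, `wt`) are standard Solr query parameters; the host, the `/solr/select` path, and the field names `category` and `brand` are assumptions chosen for illustration, and the helper function names are hypothetical.

```python
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields, limit=10):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [
        ("q", query),
        ("wt", "json"),              # ask for a JSON response
        ("facet", "true"),           # turn faceting on
        ("facet.limit", str(limit)), # top N values per facet field
        ("facet.mincount", "1"),     # hide values with zero matches
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


def pair_facet_counts(flat):
    """Turn Solr's default flat JSON list [value, count, value, count, ...]
    into a list of (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))
```

A call such as `build_facet_query("http://localhost:8983/solr/select", "digital camera", ["category", "brand"])` yields a URL whose response includes counts per category and brand, which an application can render as the refinement links users click.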

Would you describe a customer support use case for Lucid Imagination search?  What are some common themes?

Because we have a diverse base of customers, we see a wide range of search applications.  One common theme is relevance tuning: for instance, customers who need help tying certain results to certain queries, or just better optimizing the algorithms built with Solr & Lucene to deliver the right results.  Another common theme, and one that I personally enjoy helping customers with, is performance.  We had one customer who replaced a commercial search engine with Solr, reducing their median query response time from 30 seconds to about four seconds without our help. We then helped them reduce that by another factor of eight, to a median query response time of under half a second.

With open source search gaining acceptance within large companies like Cisco and high demand Web applications like Twitter, why are the consulting firms giving open source and Lucid so little attention?

One reason is that it’s coming up really, really fast — and they may not see it coming.  Also, open source adoption is often driven by a broad, diffuse population of developers.  The developers don’t generally put much stock in what the analysts say, if they’re even aware of the reports to begin with.  And on the flip side, the analysts are paying attention to their own customers, CIOs and vendor salespeople, who may not know how the work is really getting done.

What do you suggest a procurement team do to evaluate fully an open source search solution such as the one Lucid Imagination offers?

I think they need to make sure their company is comfortable with creating their own applications; it’s not a passive technology, but one that can be actively used to drive competitive advantage.  In looking at vendors, find one that can offer a solution that grows as their needs and skills grow: from something simple in the beginning to something fully customizable as they become more sophisticated consumers.  And most importantly, they should look for a company with the depth and expertise to provide training, support, and consulting to help them harness the full scope of search innovation.  Finally, they should do the math compared to what they might pay for a comparable implementation with a commercial enterprise search vendor. In many cases, they’re already spending many times what it would cost them to buy an open source-based solution. Sometimes they’ll pay more just for the annual maintenance — excluding consulting and license fees — than for a complete subscription for LucidWorks Enterprise.

In several of the recent analyses of enterprise search systems I have reviewed, I learned about such companies as Sinequa, Fabasoft and Expert System, all examples of firms that have zero profile in many organizations. In your opinion, why are these types of search vendors given so much attention in the search market?

I can imagine that the marketing guys at such organizations are always happy to talk to industry analysts. I spend my time mainly talking to customers and developers.

How can one get more information about Lucid Imagination and its open source enterprise search solution?

Our Web site, www.lucidimagination.com, is full of information about our product, LucidWorks Enterprise, and other information about the open source technologies, Lucene and Solr. We also have case studies that show how customers are building applications and products with Solr, Lucene, and LucidWorks Enterprise. And I always recommend downloading our product, now available free to developers, and taking it for a spin.

ArnoldIT Comment

My view about consulting firms’ analyses of search and content processing vendors has evolved over the last two years. The economic impact has put pressure on most of the companies that sell technical advice. Since the 2008 financial storm roiled commercial waters, certain advisory firms have shifted from independent analyses to what generates revenue for the consulting firms.

Many of the consulting firms’ reports are white papers or marketing material. The problem is that search is a particularly difficult technical field. Selecting a search system is often a difficult challenge for a procurement team. There are numerous, complex factors to consider.

Consulting firms offer “advice” about which system or systems are the “best” at a particular function. The problem is that writing about search is different from implementing search. It is easier to describe what a search vendor asserts in a demo. It is harder to take that solution and solve a real-world problem in a Microsoft SharePoint environment or in a setting where numerous mission critical applications operate in a standalone manner.

If you are looking for a search solution, you will need to develop a “tight spec” and then investigate the options that match specific requirements. Few organizations have the time or resources to test multiple systems before making a decision about what search system to license.

The need for information about search creates an opportunity for independent firms to provide information, often at a hefty fee. In my experience, selecting a search system requires an approach close to the one that Martin White and I set forth in our 2009 book Successful Enterprise Search Management, published by Galatea in the UK.

We suggest that procurement teams become familiar with the available literature about search. Then a methodical process of assessment and evaluation can be followed. The short cut often leads to the all-too-common complaints about a search system. Users cannot locate needed information and user satisfaction plummets.

Stephen E Arnold, December 15, 2010
