Interview with Rahul Agarwalla, Uchida Spectrum

April 28, 2011

Introduction

After breaking my foot and cancelling my trips for April and May 2011, I spoke with Rahul Agarwalla, one of the individuals whom I wanted to meet at the upcoming Lucene Revolution Conference. For more information about this important open source search event, click the banner at the top of the blog page or navigate to www.lucenerevolution.com.

Rahul Agarwalla, Uchida Spectrum Inc.

Rahul Agarwalla heads international business for Uchida Spectrum Inc., Japan. Previously he built and exited two content/technology ventures, including Matrix Information, the pioneer of digital content syndication in India. He has over 14 years of experience with various search technologies such as Verity, Fast ESP (enterprise search platform) and Solr/Lucene.

Mr. Agarwalla is the founder of Uchida Spectrum, a key player in Japan’s search and content processing market. USI provides SMART InSight, a search application used by many Fortune 500 companies for specialized industry applications such as R&D and quality assurance for manufacturing, and claims and customer management.

Originally SMART InSight was based on Microsoft Fast Search technology. SMART InSight is a search application that integrates and analyzes enterprise information. The solution is used by such organizations as Canon and Moody’s. Uchida Spectrum is working with Lucid Imagination across Asia as the Strategic Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support services to clients.

Mr. Agarwalla’s firm has become something of a specialist in moving organizations from the ageing Fast Search & Transfer search platform to the newer open source solutions available today. I spoke with him on April 27, 2011. The full text of the interview appears below.

The Interview

What’s the principal business focus of Uchida Spectrum?

Uchida Spectrum is one of the leaders in the Japan search market. It all started in 2002, when we saw opportunity at the intersection of software and information. That was the inspiration to launch the search business. Our product, SMART InSight, is a search application that integrates information from across the enterprise in easy-to-navigate cross-department information chains, and adds visual summaries that add value through contextual metadata and analytics.

Instead of focusing on enterprise search as a horizontal solution, we found companies placed great value on Line of Business applications. SMART InSight is used for specific applications; for example, quality assurance, research and development, product development, claims and customer management. Those interested in our solutions are large multi-nationals across various sectors, such as electronics, automotive, chemicals, finance and engineering, as well as smaller organizations.

When did you become interested in text and content processing?

During my MBA, I won a competition for my paper on the power of information and the Internet. One of the judges, S. Swaminathan, pushed me towards the emerging internet/information sector. I went on to set up India’s first digital content syndication business where we had to deal with massive amounts of content from our more than 200 content partners. Doing this manually was just not scalable. Therefore, we embraced content and text processing technologies. We were partly inspired by the Dialog Information Service, which was once part of Lockheed. In 1998, Dialog had more content than all of the World Wide Web. I think the number I heard was 12 terabytes of data. The massive growth in information flows since then means that today the challenges of extracting, normalizing and adding metadata are common to all businesses.

Uchida has been investing in search and content processing for several years, and has recently moved from embedding Fast Search’s technology to Open Source Solr and Lucene. What’s been the big payoff from your work with Lucid Imagination?

The tie-up has been very positive.

Our product, SMART InSight, uses search to integrate and retrieve information — so scalability and reliability, at reasonable cost, are critical factors. Lucene/Solr has delivered this in spades. The amount of data we can index on a server and the ability to scale in a linear fashion are unmatched. For instance, in one project we found a 10x improvement due to lower cost of ownership combined with higher performance.

As the quantity of data grows unabated, customers are extremely concerned about the cost of future expansion. Working with Lucid Imagination, we are able to meet such technical and business challenges and build a future-proof foundation.

Are you doing research and development too?

Absolutely.

Our research and development efforts are focused on dealing with massive disparate data. MNCs [multinational corporations] have to deal with multiple languages, complex security rules, and different data formats and structures.

Integrating this data in a user-friendly manner involves much more than conventional content normalization. We need to understand the meaning of the information and its context. Presenting various types of information quickly and in an intuitive interface is our second challenge. Here we are looking at constantly improving our mash up and RIA [Rich Internet Application] widgets technology.

Many vendors argue that mash ups and data fusion are “the way information retrieval will work going forward”. What’s your view of the structured – unstructured data approach?

Information fusion is fundamental to cognition. For example, if you have a stuffy nose, the food you eat can be tasteless. We are better off because we use all our senses together. Information retrieval needs to be adapted to our way of working and not the other way around.

Islands of information have come into being due to the historical approach of information technology. Search gives us a paradigm to overcome this. For instance, customer surveys, warranty claims and quality testing should all have a relationship with product and part data to analyze defects and impact. The massive automobile recalls in 2009 are symptomatic of not connecting the dots. Companies today are much better equipped for product improvement and quick failure detection to the extent they have integrated car performance data (from the car’s onboard systems) with other data sources. The same holds true when we look at voice of customer data or 360 degree customer management.

Based on the experience of our customers, we strongly believe search has more value when you add more data sources. The key is to enable users to understand and explore the inter-relationships between different datasets in an intuitive manner.

Without divulging your firm’s methods, will you characterize a typical use case for your firm’s content processing, tagging, and search and retrieval capabilities?

SMART InSight enables customers to integrate multiple data sets from one or more departments, to easily navigate across these data sets and to analyze massive data sets using charts and tables driven by search. The application interface is determined by the data and by what kind of discovery and analysis the customer needs, somewhat similar to a BI system.

Can you give me an example?

Sure. I want to use an automotive industry use case.

We fed data from the American agency NHTSA (National Highway Traffic Safety Administration) into SMART InSight. We shared this prototype with some of the large Japanese auto firms. Their analysts found trends and common issues using the standard charts and other features.

We then integrated the customer’s datasets with the NHTSA data to build a powerful analysis application. One of its key features is what we call a “Data Chain”.

What’s a “data chain”?

A data chain uses fields with similar meaning, for example component category or VIN [Vehicle Identification Number], to create data-driven inter-linkages. These linkages allow you to navigate dynamically across the data sets. So users starting from a part failure were able to ‘data chain’ to performance data for the affected cars, related claims, NHTSA reports etc. Engineers were thus able to understand the problem in a comprehensive way, what I call a 360 degree view. Analysts were then able, with a mouse click, to drill down to causes and solutions. The analysts could also comment on and tag the data to create a shared context and highlight trends and issues for their co-workers. All the tagging and other usage and sharing information is part of an automated learning loop, which constantly improves the search relevancy and makes users more productive.
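The linking idea described in this answer can be sketched in a few lines of Python. This is only an illustration of the concept, not Uchida Spectrum's implementation: the data set names, the field names (`vin`, `component`) and the sample records are all hypothetical, and a real system would run the lookups against a search index rather than in-memory dictionaries.

```python
from collections import defaultdict

# Hypothetical sample records; names and values are illustrative only.
DATASETS = {
    "part_failures": [
        {"id": "F1", "vin": "VIN001", "component": "brakes"},
    ],
    "claims": [
        {"id": "C1", "vin": "VIN001", "component": "brakes"},
        {"id": "C2", "vin": "VIN002", "component": "airbag"},
    ],
    "nhtsa_reports": [
        {"id": "N1", "vin": "VIN003", "component": "brakes"},
    ],
}

# Fields assumed to carry the same meaning in every data set.
LINK_FIELDS = ("vin", "component")


def build_link_index(datasets, link_fields):
    """Map each (field, value) pair to the records that contain it."""
    index = defaultdict(list)
    for name, records in datasets.items():
        for record in records:
            for field in link_fields:
                if field in record:
                    index[(field, record[field])].append((name, record))
    return index


def data_chain(record, source, index, link_fields):
    """Return records from other data sets that share a link-field value."""
    related = defaultdict(list)
    for field in link_fields:
        for name, other in index.get((field, record.get(field)), []):
            if name != source and other not in related[name]:
                related[name].append(other)
    return dict(related)


index = build_link_index(DATASETS, LINK_FIELDS)
failure = DATASETS["part_failures"][0]
linked = data_chain(failure, "part_failures", index, LINK_FIELDS)
```

Starting from the brake failure record, `linked` holds the claim that shares its VIN and the NHTSA report that shares its component category, which is the kind of hop between data sets the answer describes.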

Services are often among the highest margin offerings an organization can offer. Is the need to sell consulting altering the simplicity of the installation, configuration, tuning, and customization of search applications?

Information, even if accessed on an iPad, is a complex challenge.

While we provide services, our business model is built around providing the product and related services like maintenance & support. We want our customers and partners to be able to manage the technology, as they best know how to maximize the value. We advise on best practices, help them overcome technical hurdles and provide support to ensure risk is minimized.

What’s the upside?

The key benefits depend upon each offering. First, we have a product which delivers upon installation a rich feature set in a reliable and scalable product. This enables customers to build solutions that address their use cases by focusing on the business logic without worrying about the technology.

Next, our approach includes maintenance and support. We know that customers want support in order to reduce risk and ensure a successful experience during the life of the solution.

Finally, we help our clients create an internal team, which can manage and expand the search solution in tight synchronization with evolving business requirements.

How does an information retrieval engagement move through its life cycle?

It usually starts with customers asking us to help them understand how information retrieval can play a part in meeting their business challenges. We then get the customer’s sample data and wrap SMART InSight around it.

The approach involves some data analysis to integrate the information and building a few sample screens using our Ajax portal interface. Once users play with the data using the sample screens, they can imagine how best to analyze the information and what kind of application UI is required. We recommend this approach, as customers do not need to first create requirements specifications. Customers and users find it is much easier to change and improve the interface working from this kind of prototype than it is to start from a blank page.

The final implementation focuses on helping customers tune our widget library and pages to build the required application UI. Once this is underway, we then map the data to correctly configure the content processing and index schema.

In our projects, ground-up development is minimal as our feature set includes content ingestion, search, portal and collaboration features. After implementation, the work moves to training the customer team and helping them maintain and enhance the solution through support services.

One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content AND the rate of change in existing content objects. What does your firm provide to customers to help them deal with the volume (scaling) challenge?

We serve two market segments: the intranet within the licensee’s enterprise, and Internet or outside-the-firewall information.

In the enterprise market, update frequency is relatively lower except while dealing with transactional databases. We have implemented customer solutions with over 100 million records from 10 data sources. There were no latency issues.

Things get more challenging in the Internet segment. We are currently dealing with a project in China where not only does the data have over 300 facets, its volume and update frequency are both amongst the largest in the world. Having the expertise brought by Lucid Imagination becomes critical in such situations. Together, Lucid Imagination and Uchida Spectrum are helping this customer architect a large scale system by optimizing queries and the schema with multiple indexing and search nodes.

Another challenge, particularly in professional intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content “push”, report generation, and personalization within a workflow?

That’s a great question.

My colleagues and I believe that the “Right information, right person, right time” is the critical need of many of our customers. SMART InSight offers sophisticated alerting to achieve this. Multi-parameter rule driven alerts can be sent out in real time. We also offer daily or weekly digests for other information needs. The value of alerting increases as we add more data sources into the index. Users are then able to monitor and track all relevant information flows.
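A multi-parameter alert rule of the kind mentioned here can be pictured as a set of per-field conditions that must all hold for an incoming record. The Python sketch below is a hypothetical illustration only; the `AlertRule` class, the field names and the e-mail address are assumptions and do not describe SMART InSight's actual alerting engine.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class AlertRule:
    """A named rule: every field condition must be true for the rule to fire."""
    name: str
    recipient: str
    conditions: Dict[str, Callable[[Any], bool]]

    def matches(self, doc: Dict[str, Any]) -> bool:
        return all(
            field in doc and test(doc[field])
            for field, test in self.conditions.items()
        )


def alerts_for(doc: Dict[str, Any], rules: List[AlertRule]) -> List[Tuple[str, str]]:
    """Return (recipient, rule name) pairs for every rule the document triggers."""
    return [(rule.recipient, rule.name) for rule in rules if rule.matches(doc)]


# Hypothetical rule: notify the QA team about large brake-related claims.
RULES = [
    AlertRule(
        name="large brake claim",
        recipient="qa-team@example.com",
        conditions={
            "component": lambda value: value == "brakes",
            "claim_amount": lambda value: value > 1000,
        },
    ),
]

hits = alerts_for({"component": "brakes", "claim_amount": 2500}, RULES)
misses = alerts_for({"component": "brakes", "claim_amount": 10}, RULES)
```

Running each newly indexed record through `alerts_for` would give the real-time case; a daily or weekly digest could apply the same rules to a batch of records collected over the period.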

There has been a surge in interest in putting “everything” in a repository and then manipulating the indexes to the information in the repository. On the surface, this seems to be gaining traction because network resident information can “disappear” or become unavailable. What’s your view of the repository versus non-repository approach to content processing?

As you know, Solr/Lucene creates an index of the information and does not store the actual information. With this approach, one of the significant advantages we experience is flexibility. Typically, repository solutions are developed following a strict waterfall methodology with stable requirement specifications. We think this approach may be a bit out of step with today’s rapidly evolving information climate. By comparison, we can be far more flexible, for example by using dynamic fields in Solr/Lucene and readily changing the ranking algorithm.
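For readers unfamiliar with dynamic fields: a Solr schema can declare wildcard patterns such as `*_s` or `*_i`, so that a new field whose name matches a pattern is accepted without a schema change. The Python sketch below only mimics that name-matching idea. The patterns and type names follow the conventions of Solr's example schema, while the `resolve_field_type` helper and the longest-pattern-first ordering are assumptions of this sketch, not Solr code.

```python
from fnmatch import fnmatchcase

# Explicitly declared fields take priority over wildcard patterns.
EXPLICIT_FIELDS = {"id": "string"}

# Wildcard patterns in the style of Solr's example schema (illustrative).
DYNAMIC_FIELDS = {"*_s": "string", "*_i": "int", "*_dt": "date"}


def resolve_field_type(name):
    """Return the type for a field name, or None if nothing matches."""
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    # Try longer (more specific) patterns first; an assumption of this sketch.
    for pattern in sorted(DYNAMIC_FIELDS, key=len, reverse=True):
        if fnmatchcase(name, pattern):
            return DYNAMIC_FIELDS[pattern]
    return None


supplier_type = resolve_field_type("supplier_s")
recall_type = resolve_field_type("recall_dt")
unknown_type = resolve_field_type("notes")
```

A new data source can then introduce a field such as `supplier_s` or `recall_dt` and have it indexed under an existing pattern, which is the flexibility contrasted here with fixed repository specifications.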

We use connectors bundled with LucidWorks Enterprise to pull the data from databases and other content repositories. In some cases, our system integration partners or we may build a custom connector. The LucidWorks Enterprise connector framework we get from Lucid Imagination makes this much easier.

Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What’s your firm’s approach to presenting “outputs” for end user reuse or for mobile access? Is there native support in Solr/Lucene or via Lucid Imagination for results formats?

What and how much information to put on the screen is always a challenge; SMART InSight resolves the clutter problem in two ways.

First, visualization, when used correctly, is extremely powerful. For this reason, our solution implementation focuses on designing the right application UI. We have built up a great deal of experience over multiple projects and are able to guide customers to design screens for experts and different ones for the simple user.

Second, we also enable users to build their own UI by selecting widgets, much like iGoogle or My Yahoo. Thus a user who prefers graphs can add chart widgets and choose what should be on the X and Y axes. We use LucidWorks Enterprise features like faceting and scoring to build accurate charts. Control over the widgets and over which content fields users would like to see enables fully personalized information consumption.
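As a rough illustration of how faceting can drive a chart: in Solr's default JSON response, the counts for a facet field come back as a flat list that alternates value and count. The Python sketch below turns such a list into labels for the X axis and counts for the Y axis. The sample response and the `component` field are made up for the example, and the `facet_to_series` helper is hypothetical rather than part of any product.

```python
def facet_to_series(response, field):
    """Split Solr's flat [value, count, value, count, ...] list into two series."""
    flat = response["facet_counts"]["facet_fields"][field]
    labels = flat[0::2]   # facet values, used as X axis categories
    counts = flat[1::2]   # document counts, used as Y axis heights
    return labels, counts


# Made-up response fragment in the shape Solr returns for facet.field=component.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "component": ["brakes", 12, "airbag", 5, "steering", 3],
        }
    }
}

labels, counts = facet_to_series(sample_response, "component")
```

The two lists can be handed to any charting widget; changing the facet field changes what appears on the X axis without touching the underlying index.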

I am on the fence about the merging of retrieval within other applications. What’s your take on the “new” method that some people describe as “search enabled applications”? Autonomy and Endeca each have workflow components as part of their search platforms. What’s Uchida Spectrum’s capability in workflow or similar enterprise embedding of search?

SMART InSight is both “search enabled” and “database enabled”. I wonder if any vendor uses the term “database enabled application”. The point of the “search enabled” jargon is that search is a relatively newer technology than databases. As technology becomes embedded into our lives it is no longer noticed.

Search is much more than a search box and a set of results. I think some of the work being done here by Autonomy and Endeca is commendable. The question is whether they can deliver value at a reasonable price point and thus cater to more customer segments. In this context, we are using the Lucene/Solr open technology as the foundation because we are able to deliver high return on investment with a flexible and scalable solution.

We believe this will expand the market for search and thus, hopefully, make the phrase “search enabled application” redundant.

I see you will be speaking at the forthcoming Lucene Revolution conference. What are the key trends you expect to see materialize there?

One of the key debates is search versus database. Lucene Revolution will inform this debate by showcasing how more and more large firms are choosing search. This impacts the perception of search as an enterprise ready technology. As a snowball effect, I see search augmenting databases in many applications. Companies will then need to build search expertise much the same way they have database architects and developers. I believe Lucid Imagination will play a central role in making this happen.

Lucene Revolution brings greater cohesiveness to the Lucene/Solr movement and makes its size visible. Its disruptive innovation and open source model pose a strong challenge for the established commercial vendors. The mainstreaming of the interest in Lucene/Solr means these players need to fashion a cogent strategic response. Might this trigger realignment within the search industry – mergers, diversification or focus on niche markets?

Our priority is expanding the search story in relatively under-penetrated markets like China & India. The large IT pool especially in India offers an opportunity to expand the Lucene/Solr movement. Today these engineers have developed the habit of only using databases in their solution architecture – and as the adage goes “if you have a hammer, everything looks like a nail”. We need to train them on search so it is a default part of their solution toolkit. This becomes imperative as China and India will be at the center of the Internet due to the size of their fast growing online populations and rising income levels.

ArnoldIT Comment

If you are seeking a resource to assist your organization in moving from Fast Search’s ageing technology to the Lucene/Solr platform, you will want to speak with Uchida Spectrum. You can get more information about Uchida Spectrum at the Lucene Revolution Conference and from the firm’s Web site at http://www.spectrum.co.jp/.

Stephen E Arnold, April 28, 2011

Interview courtesy of Lucid Imagination

Written by Stephen E. Arnold · Filed Under Business strategy, Enterprise, Enterprise search, Interview, Open source, Technology



Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.