New Landscape of Enterprise Search Details Available

May 18, 2011

Stephen E Arnold’s new report about enterprise search will be shipping in two weeks. The New Landscape of Enterprise Search: A Critical Review of the Market and Search Systems provides a fresh perspective on a fascinating enterprise application.

The centerpiece of the report is a set of new analyses of search and retrieval systems offered by Autonomy, Endeca, Exalead, Google, Microsoft (Fast Search), and Vivisimo.

Unlike the “pay to play” analyses from industry consultants and self-appointed “experts,” Mr. Arnold’s approach is based on his work developing search systems and on research conducted to support specific inquiries into systems’ performance and features.

The report focuses on the broad changes which have roiled the enterprise search and content processing market. Unlike his first “encyclopedia” of search systems and his study of value added indexing systems, this new report takes an unvarnished look at the business and financial factors that make enterprise search a challenge. Then he uses a historical base to analyze the upsides and downsides of six vendors’ search solutions. He puts each firm’s particular technical characteristics in sharp relief. A reader gains a richer understanding of what makes a particular vendor’s system best suited for specific information access applications.

Other features of the report include:

  • Diagrams of system architecture and screen shots of exemplary implementations
  • Lists of resellers and partners of the profiled vendors
  • A comprehensive glossary which attempts to cut through the jargon and marketing baloney that impede communication about search and retrieval
  • A ready-reference table for more than 20 vendors’ enterprise search solutions
  • An “outlook” section which offers candid observations about the attrition and financial health of the hundreds of companies offering search solutions.

More information about the report is available at http://goo.gl/0vSql. You may reserve your copy by writing seaky2000 @ yahoo dot com. Full ordering information and pricing will be available in the near future.

Donald C Anderson, May 18, 2011

Post paid for by Stephen E Arnold

Topix and Blekko Search

May 17, 2011

I noticed a change in the Topix.com search system over the weekend. Results come from Blekko.com. As you may know, Blekko.com is a good option to have in hand when the results from Google or Bing miss the mark. For some of my queries, I use Blekko.com before I wade through the Google result lists.

In my tests of the new search system, I did notice one somewhat annoying characteristic. To see what created extra work for me, navigate to www.topix.com. You will see a page of “Recent US News News & Discussions.” Ignore the “News News” weirdness and look at the search box. Enter a query for a popular topic. I tried “sharepoint” and “dancing with the stars.”

Notice that the search results appear with a site limiter command; specifically, site:topix.com shown in the screenshot below:

[Screenshot: Blekko-powered Topix results showing the site:topix.com limiter]

The implementation is a hosted service much like the Blossom.com search solution I use for the Beyond Search blog. The problem is that the index is stale. In the “sharepoint” query, I got a hit to Houston Newswire but the story was no longer available: http://www.topix.com/wire/houston/p5.

What’s happening is that the new content in Topix is not being indexed quickly by Blekko. Now what I mean by quickly is that I need to be able to locate news stories with my query terms over the last two or three days. I also need to locate today’s stories. Topix is a “news news” site, and I need the search system to refresh its index quickly.

What’s the fix? Easy: copy the query string from the Blekko Topix results page (in my case, site:topix.com sharepoint), navigate to Google.com, and paste the query string into the Google search box. You get more current hits on the topic in which you are interested.
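The manual workaround above is easy to script. Here is a minimal sketch using only Python’s standard library; the URL is Google’s public search endpoint, and the site and terms are the ones from my query:

```python
from urllib.parse import urlencode

def google_site_query(site: str, terms: str) -> str:
    """Build a Google URL that restricts results to one site,
    mirroring the manual site: workaround described above."""
    query = f"site:{site} {terms}"
    return "https://www.google.com/search?" + urlencode({"q": query})

# Recreate the Topix query against Google's fresher index.
print(google_site_query("topix.com", "sharepoint"))
# https://www.google.com/search?q=site%3Atopix.com+sharepoint
```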

Can Blekko speed up its indexing cycle? Sure, and I think the company will. Google and other search systems create priority lists for content acquisition. Blekko needs a similar high priority operation to make Topix search useful to those who look for “news news,” to use Topix’s own weird lingo.

How much faster is the Google cycle? In my spot tests, Google was lagging by hours; Blekko was lagging by days. Even more disturbing, the Blekko Topix results did not present news content. For my sharepoint query I was getting a list of jobs, shown below.

[Screenshot: job listings returned for the sharepoint query]

This is a really bad relevance issue, which, when combined with the index staleness, is causing me to use Google.com to search Topix.

Hopefully this issue will be remediated soon.

Stephen E Arnold, May 17, 2011

Freebie

Landscape

May 17, 2011

The New Landscape of Enterprise Search: A Critical Review of the Market and Search Systems will be available in a few weeks.

To get a free copy, just sign up for our monthly newsletter. Write thehonk at yandex.com.

This 125 page monograph was published by Pandia.com. Pandia has since closed its publishing operation. The 2011 report provides an overview of the enterprise search market at a time when many vendors walk a knife edge of profitability and other vendors have either failed, like Convera, or, like Delphes, are in a somewhat frantic quest for additional funding.

In a time of considerable financial duress, an enterprise search system is an important part of many organizations’ operations. However, search vendors are using a diverse, often Madison Avenue approach to explaining information retrieval. To make the landscape more interesting, there are hundreds of companies offering broad solutions and an equal number selling eDiscovery, customer support systems, business intelligence systems, and sentiment analysis solutions, among others.

[Report cover image]

Scrape away the marketing jargon, and these systems are often quite similar. Dig more deeply and you will discover that some solutions use open source software wrapped in proprietary code. Other vendors license third party tools from specialists and essentially “package up” solutions which are pitched as a cohesive whole. Little wonder most enterprise search systems generate user dissatisfaction levels of 50 percent, 65 percent, and even higher.

The principal chapters of the report are:

  1. A preface. This explains why my team at ArnoldIT.com and I wrote another book about search. In the last half dozen years we have generated multiple editions of the now defunct Enterprise Search Report, which ballooned to a massive 600 pages when printed out, the Beyond Search report about value-added indexing for the Gilbane Group, Successful Enterprise Search Management for Galatea in the UK, and our third analysis of Google technology in Google: The Digital Gutenberg. The reason rests with the type of information that is now circulating about major search systems and enterprise search. We wanted to try to provide an anchor point for today’s procurement-minded professionals.
  2. An introduction. We have pulled information from our annual review of the search sector which we prepare for our clients each year and additional current information about the market, hot sectors, and the problem “big data” poses to organizations regardless of their revenues or number of employees.
  3. Autonomy. We review the guts of Autonomy’s Integrated Data Operating Layer and provide facts about why the company is able to sustain solid growth and deliver search technology to more than 20,000 customers.
  4. Endeca. We talk about the “under the covers” aspect of Endeca’s Guided Navigation. We explore how Endeca has penetrated eCommerce, search, and business intelligence. Unlike Autonomy, Endeca is a privately held company and has been the victim of a “glass ceiling” in certain aspects of its business.
  5. Exalead. Like Google, Exalead based its revolutionary approach on experiences its founders gleaned working on other search and retrieval methods. After its purchase by Dassault Systèmes in 2010, Exalead exploded into a market niche described as “search based applications.” The chapter dissects the “plumbing” of Exalead and identifies how its next generation technology is pushing the company toward new types of information integration, including augmented reality.
  6. Google. The information in this chapter departs from the pure technical dissection of Google in my three Google monographs. There is a strong technical component but we present pricing and a frank discussion about the commitment Google has to make to the Google Search Appliance to make it a cost effective option for organizations. The information about Google’s cloud-based search initiative and the 2011 search appliance pricing provides a view different from what is offered by Google’s public relations.
  7. Microsoft. The focus is on the Fast Search & Transfer SA system which is the carrier-class search solution for SharePoint licensees. We look at what Microsoft Fast Search Server is and we document what is different from the old, pre-implosion Fast Search. We gathered information that explains why Fast Search was beginning a complete rewrite of the core Fast Search system prior to the acquisition by Microsoft. What happened to that project? We reveal that in this chapter.
  8. Vivisimo. The company has a new management team and is now pushing aggressively into enterprise search. Unlike some vendors, Vivisimo has kept a focus on search and added features to make Vivisimo useful in customer support and eDiscovery applications. Is Vivisimo a solid search solution or a clever utility packaged like many other vendors’ technology as a Swiss Army knife?
  9. Outlook. In this chapter we provide a glimpse of the search landscape as tomorrow’s sun breaks the horizon. Search as a stand alone solution is not casting a long shadow. What will the future hold for today’s leaders and the hundreds of companies chasing the search brass ring? We try to answer the question. Our view may surprise but not shock you.

The volume also contains a listing of resellers and partners of the profiled vendors. This information is often needed when a problem arises or a new feature or function is required. The listing also provides stark evidence of the “footprint” each of the vendors has in specific market sectors. To our knowledge, such data have not previously been collected.

We have also prepared a table listing another two dozen vendors of enterprise search. For each vendor we describe its core positioning and provide essential facts such as the firm’s URL. In a sense, this table provides a summary of the key points in my other analyses of key vendors and their systems.

Finally, we took our various glossaries, updated them, and compiled a fresh list of terms and definitions. The jargon of search is one of the signals that vendors are struggling to make sales. The glossary provides short explanations of important terms. Our approach is not academic. We intended to craft explanations that will allow a person who is not an expert in information retrieval to translate the explanations some vendors provide.

Are we confident that this report is the last word about search?

No, of course not.  Search is among the most complex information challenges organizations, developers, and researchers face. In fact, the weakness of the report comes from the decision to focus on six vendors and emphasize the processing of unstructured textual information. We do touch upon the challenge of rich media, but that is an aspect of search that looms as a significant technical hurdle for a number of companies. Only Autonomy and Exalead have developed mature solutions to rich media processing. The other vendors lag behind these two engineering-centric firms.

To get a free copy, write thehonk@yandex.com. Note. When you request a free book, you will be opting into our new email, restricted distribution weekly newsletter about search and content processing.

Updated, February 3, 2012

Endeca and Cyber Situational Awareness

May 16, 2011

Wow, that’s a fresh spin on eCommerce, database technology and search. “Cyber situational awareness” is a semantic angle from Endeca that is fresher than sentiment analysis or lame old search and retrieval.

Bob Gourley acquaints us with “Endeca’s Cyber Situational Awareness.” Endeca revamped their indexing technology in ’09, and it has several features to crow about. However, the most interesting to us is its “Cyber Situational Awareness.” The article asserts:

Many streams of data constantly pour into the [Security Operations Center]: log analysis, incident reports, network analysis, threat intelligence, and more. When a significant incident occurs, the urgent question is not only ‘how do we handle the incident’ but ‘what’s the impact to current missions and readiness?’ Endeca lets the SOC answer that question with search/discovery tools, by interactively tracing the dependency relationships that start with the compromised asset or exfiltrated data. All the key data is ingested into a common operating picture, inside which analysts can search, drill and pivot through lists and visualizations of each cyber data source.

Now that’s how to go beyond search: probe cyber situational awareness. It will be interesting to see where this leads. I wonder if there will be a YouTube.com series called CSA with intrepid search experts cracking tough problems with next generation technology?
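The dependency tracing the article describes, starting from a compromised asset and walking outward to affected missions, can be illustrated with a toy graph walk. This is not Endeca’s implementation; the graph, asset names, and traversal below are invented for illustration:

```python
from collections import deque

# Toy dependency graph: asset -> things that depend on it.
# All names here are hypothetical examples, not real SOC data.
DEPENDENTS = {
    "db-server-3": ["billing-app", "reporting-app"],
    "billing-app": ["invoice-mission"],
    "reporting-app": ["readiness-dashboard"],
}

def impacted(asset: str) -> list[str]:
    """Breadth-first walk of the dependency chain starting at a
    compromised asset, answering 'what's the impact?'"""
    seen, queue, order = {asset}, deque([asset]), []
    while queue:
        node = queue.popleft()
        for dep in DEPENDENTS.get(node, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order

print(impacted("db-server-3"))
# ['billing-app', 'reporting-app', 'invoice-mission', 'readiness-dashboard']
```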

Cynthia Murrell, May 16, 2011

Freebie

Visualization Components

May 15, 2011

David Galles, of the Computer Science Department at the University of San Francisco, gives us a useful collection of visualization components in his “Data Structure Visualizations” list. The structures and algorithms addressed include the Basics, Indexing, Sorting, Heap-like Data Structures, Graph Algorithms, Dynamic Programming, and “Others.”

In his page discussing visualizations, Galles explains,

The best way to understand complex data structures is to see them in action. We’ve developed interactive animations for a variety of data structures and algorithms. Our visualization tool is written in JavaScript using the HTML5 canvas element, and run in just about any modern browser — including iOS devices like the iPhone and iPad, and even the web browser in the Kindle! (The frame rate is low enough in the Kindle that the visualizations aren’t terribly useful, but the tree-based visualizations — BSTs and AVL Trees — seem to work well enough).

Galles also provides a tutorial for creating one’s own visualizations. Check it out if you’re wrestling with your own complex data structures. As search vendors thrash and flail, business intelligence looks like a promising market sector. Nothing sells business intelligence like hot graphics. Just ask Palantir.

Cynthia Murrell, May 15, 2011

Google: Search to Knowledge

May 6, 2011

Short honk: Knowledge is a sticky wicket. When I was working on a PhD, one of my advisors was the fellow for whom I was indexing Latin sermons. His name was Arthur Barker. In his spare time, he cranked out bibliographies and studies of topics best not discussed among search engine optimization experts, poobahs, and technology satraps. I recall one conversation in which he complained about the problem of knowledge. My recollection is that he suggested knowledge was a slippery fish. Epistemology, hermeneutics, and other hot topics left “knowledge” poorly defined. What did I know? I was indexing medieval Latin. Who needed knowledge? What I was doing in 1968 was pretty much irrelevant to everyone in the world. Flash forward to my reading “Why Google Renamed Its Search Group ‘Knowledge’”. Google has put one of its HP hires in charge of search, which is now called knowledge. Check out the original article. You will undoubtedly understand it. Forget Fichte. Don’t say Kant. Google’s got knowledge nailed or Knolled.

Stephen E Arnold, May 6, 2011

Freebie

The Arnold Columns: May 2011

May 3, 2011

It is that time again. Four columns this month and for cash money. Every time I get a check I think of PT Barnum. The topics I tackled this month required research, thought, and some wordsmanship. This blog, on the other hand, is a record of the items that strike me as interesting. I have help converting my snips into write ups. If you want to know who works on this Beyond Search blog, check out the new Author tab available from the Beyond Search splash page.

So what did real publishers instruct me to cover or, in some cases, allow me to explore? Here’s the line up. Keep in mind that you will have to either get a hard copy of the publishers’ outputs or find my work on the publishers’ Web site. In one case, that could take you a day or two. Search is really easy when folks responsible for search don’t use their own search system. Such is life.

  • ETM (Enterprise Technology Management, published by ISIGlobal.com), “Google’s Management Change and the Enterprise”. The idea is that Google is making significant management changes and, either intentionally or unintentionally, sending signals that indicate the enterprise unit is not part of Larry Page’s inner circle. I hope I am wrong, but if enterprise were the key to the firm’s future, I think the management shake up would have added an olive and a dash of bitters to the enterprise group. What I saw was several squirts of cold water.
  • Information Today, which is technically a newspaper, “When Key Words Fail, Will Predictive Search Deliver?”. The write up uses Recorded Future, funded by the CIA and Google, as a case example. The main idea is that semantic technology has to step up because the volume of data facing a worker and the worker’s diminished appetite for research require software to be smarter.
  • KMWorld, “SharePoint Governance: Is Semantic Technology the Answer?”. My team has been immersed in things semantic. What our work revealed is that the baloney word governance really means indexing and editorial policies. The article provides some links to useful resources and then reminds the reader that putting the information horse back in the barn when the barn is on fire can be tough.
  • Online Magazine, “Rob ROI: Open Source and Technology Costs.” I apologize for the literary license, my assumption that the readers will know about Sir Walter, the Waverley novels, and Rob Roy. The thrust of the write up is that open source software reduces some costs but not every cost. As a result, poor budgeting for open source software can yield the same ROI killing overruns that plague commercial software. Don’t agree with me? Sigh.
  • Smart Business Network, a series of city business magazines and a Web site, “Coupon Monsoon: Downpours of Digital Deals.” The focus of the write up is the deluge of deals, coupons, and discounts. The problem with most of these services is building an audience and delivering offers that make sense to customers and merchants. I answer the question, “Should your business use coupons?”

Every two or three years I gather up these for-fee outputs and slap them in the ArnoldIT.com archive. However, you cannot rely on me to be much of an information professional. I can barely write these outputs. Organizing and archiving—beyond my skill set. Subscribe to these publications. The information in my for-fee columns is different from the Web log’s.

Stephen E Arnold, May 3, 2011

Not free. I am paid for columns so this write up is a shameless commercial promotion.

Access Innovation Merges Data Harmony and Microsoft SharePoint 2010

April 29, 2011

According to the EContentmag.com article “Access Innovation Integrates Data Harmony with Microsoft SharePoint 2010” Access Innovation hopes its Data Harmony and Microsoft SharePoint 2010 integration will provide clients with even more valuable options. The Data Harmony suite provides users with a content rich thesaurus and management tools to help them organize their information resources. “Data Harmony can be used to provide semantic capabilities to SharePoint to help users take full advantage of their metadata through auto classification, enterprise taxonomy management, entity extraction and enhanced search.”

The new MAIstro program offers users a whole new level of automation services. The software program will automatically index any SharePoint content using a combination of taxonomy and thesaurus database tools. The indexing results obtained “can be more than 90 percent accurate.” Individuals can search a specific subject and even find additional information using related terms. Sounds like the Data Harmony Microsoft SharePoint merger could be the beginning of a beautiful relationship.
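The taxonomy and thesaurus driven indexing described above can be sketched in a few lines. The thesaurus entries and matching rule below are invented for illustration; Data Harmony’s actual classification pipeline is of course more sophisticated:

```python
# Minimal sketch of thesaurus-driven auto classification.
# Preferred terms map to synonym lists; a document is tagged with
# every preferred term whose synonyms appear in its text.
THESAURUS = {
    "content management": ["sharepoint", "cms", "document library"],
    "taxonomy": ["thesaurus", "controlled vocabulary", "indexing"],
}

def auto_tag(text: str) -> list[str]:
    """Return preferred terms whose synonyms appear in the text."""
    lowered = text.lower()
    return sorted(
        term for term, synonyms in THESAURUS.items()
        if any(s in lowered for s in synonyms)
    )

print(auto_tag("Our SharePoint document library needs a controlled vocabulary"))
# ['content management', 'taxonomy']
```

Related-term search works the same way in reverse: expand a user’s query with the synonyms stored under the matching preferred term.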

April Holmes, April 29, 2011

Freebie but I have been promised a Mexican burrito

Nuxeo and the Google Search Appliance

April 28, 2011

I saw a brief news item about the integration of the Nuxeo open source content management system with the Google Search Appliance. Nuxeo already hooks into Lotus Notes and a number of other enterprise applications. The cheery “Great News…Nuxeo Integration with Google Search Appliance” points out:

Nuxeo’s recently announced Google Search Appliance (GSA) connector is an important component for any enterprise indexing and search strategy. Nuxeo content is actively indexed and can be searched using the familiar Google search page. Of course, to access Nuxeo content you still need to login and you must have appropriate rights. And because the Nuxeo connector is open source, it can always be customized to meet your specific requirements!

My reaction to this announcement was a question about the cost of scaling a GSA search solution. I covered some of Google’s publicly posted pricing data for its GB 7007 and GB 9009 devices. The article appeared in ETM, a publication of ISIGlobal.com. (This was a for fee column, so you will have to chase down the hard copy of the publication or contact ISIGlobal.com.) I had a couple of comments about the cost of the GSA, particularly when an organization has to upgrade to handle tens of millions of documents.

My reaction is that organizations considering the GSA will want to make certain about the document count and then get written price quotations for the appropriate GSA AND the cost of scaling that Google Search Appliance as the volume of content increases.

The savings from an open source CMS could be consumed by a GSA upgrade unless the licensee does his or her homework.
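That homework reduces to simple arithmetic once you have the quotes in hand. Here is a sketch of tiered license budgeting; the tier limits and prices below are placeholders, not Google’s actual GSA pricing, so substitute the figures from your written quotation:

```python
# Hypothetical license tiers: (max documents, license cost in dollars).
# These numbers are illustrative placeholders only.
TIERS = [
    (500_000, 30_000),
    (10_000_000, 250_000),
    (30_000_000, 600_000),
]

def license_cost(doc_count: int) -> int:
    """Return the cost of the smallest tier covering doc_count."""
    for max_docs, cost in TIERS:
        if doc_count <= max_docs:
            return cost
    raise ValueError("document count exceeds the largest quoted tier")

# Doubling the collection can jump you into a much pricier tier.
print(license_cost(400_000))  # 30000
print(license_cost(800_000))  # 250000
```

The point of the exercise: crossing a tier boundary, not the raw document growth, is what consumes the open source CMS savings.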

Stephen E Arnold, April 28, 2011

Freebie unlike the GSA

Interview with Rahul Agarwalla, Uchida Spectrum

April 28, 2011

Introduction

After breaking my foot and cancelling my trips for April and May 2011, I spoke with Rahul Agarwalla, one of the individuals whom I wanted to meet at the upcoming Lucene Revolution Conference. For more information about this important open source search event, click the banner at the top of the blog page or navigate to www.lucenerevolution.com.

[Photo: Rahul Agarwalla]

Rahul Agarwalla, Uchida Spectrum Inc.

Rahul Agarwalla heads international business for Uchida Spectrum Inc, Japan. Previously he has built and exited two content/technology ventures including Matrix Information, the pioneer of digital content syndication in India. He has over 14 years of experience with various search technologies like Verity, Fast ESP (enterprise search platform) and Solr/Lucene.

Mr. Agarwalla is the founder of Uchida Spectrum, a key player in Japan’s search and content processing market. USI provides SMART InSight, a search application used by many Fortune 500 companies for specialized industry applications like R&D, quality assurance for manufacturing, and claims and customer management.

Originally SMART InSight was based on Microsoft Fast Search technology. SMART InSight is a search application that integrates and analyzes enterprise information. The solution is used by such organizations as Canon and Moody’s. Uchida Spectrum is working with Lucid Imagination across Asia as the Strategic Alliance Partner to integrate LucidWorks Enterprise into its products and offer Lucene/Solr support services to clients.

Mr. Agarwalla’s firm has become something of a specialist in moving organizations from the ageing Fast Search & Transfer search platform to the newer open source solutions available today. I spoke with him on April 27, 2011. The full text of the interview appears below.

The Interview

What’s the principal business focus of Uchida Spectrum?

Uchida Spectrum is one of the leaders in the Japan search market. It all started in 2002, when we saw opportunity at the intersection of software and information. That was the inspiration to launch the search business. Our product, SMART InSight, is a search application that integrates information from across the enterprise in easy-to-navigate cross department information chains, and adds visual summaries that add value through contextual metadata and analytics.

Instead of focusing on enterprise search as a horizontal solution, we found companies placed great value on Line of Business applications. SMART InSight is used for specific applications; for example, quality assurance, research and development, product development, claims and customer management. Those interested in our solutions are large multi-nationals across various sectors, such as electronics, automotive, chemicals, finance and engineering, as well as smaller organizations.

When did you become interested in text and content processing?

During my MBA, I won a competition for my paper on the power of information and the Internet. One of the judges, S. Swaminathan, pushed me towards the emerging internet/information sector. I went on to set up India’s first digital content syndication business where we had to deal with massive amounts of content from our more than 200 content partners. Doing this manually was just not scalable. Therefore, we embraced content and text processing technologies. We were partly inspired by the Dialog Information Service, which was once part of Lockheed. In 1998, Dialog had more content than all of the World Wide Web. I think the number I heard was 12 terabytes of data. The massive growth in information flows since then means today the challenges of extracting, normalizing and adding metadata, are common to all businesses.

Uchida has been investing in search and content processing for several years, and has recently moved from embedding Fast Search’s technology to Open Source Solr and Lucene. What’s been the big payoff from your work with Lucid Imagination?

The tie up has been very positive.

Our product, SMART InSight, uses search to integrate and retrieve information — so scalability and reliability, at reasonable cost, are critical factors. Lucene/Solr has delivered this in spades. The amount of data we can index on a server and the ability to scale in a linear fashion are unmatched. For instance, in one project we found a 10x improvement due to lower cost of ownership combined with higher performance.
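The linear scaling Mr. Agarwalla describes boils down to simple capacity arithmetic. A sketch, assuming a hypothetical fixed per-server document capacity; real Solr deployments vary with document size, schema, and query load:

```python
import math

# Placeholder capacity per index node; an invented figure, not a
# Solr benchmark. Linear scale-out means capacity grows by adding
# servers rather than by buying bigger tiers.
DOCS_PER_SERVER = 50_000_000

def servers_needed(total_docs: int) -> int:
    """Number of index servers, assuming linear scale-out."""
    return max(1, math.ceil(total_docs / DOCS_PER_SERVER))

for docs in (40_000_000, 200_000_000, 1_000_000_000):
    print(docs, "->", servers_needed(docs))
# 40000000 -> 1
# 200000000 -> 4
# 1000000000 -> 20
```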

As the quantity of data grows unabated, customers are extremely concerned about cost of future expansion. Working with Lucid Imagination we are able to meet such technical and business challenges and build a future proof foundation.

Are you doing research and development too?

Absolutely.

Our research and development efforts are focused on dealing with massive disparate data. MNC [Multi-National Corporations] have to deal with multiple languages, complex security rules, and different data formats and structures.

Integrating this data in a user-friendly manner involves much more than conventional content normalization. We need to understand the meaning of the information and its context. Presenting various types of information in an intuitive interface and quickly is our second challenge. Here we are looking at constantly improving our mash up and RIA [Rich Internet Application] widgets technology.

Many vendors argue that mash ups and data fusion are “the way information retrieval will work going forward”. What’s your view of the structured – unstructured data approach?

Information fusion is fundamental to cognition. For example, if you have a stuffy nose, the food you eat can be tasteless. We are better off because we use all our senses together. Information retrieval needs to be adapted to our way of working and not the other way around.

Islands of information have come into being due to the historical approach of information technology. Search gives us a paradigm to overcome this. For instance, customer surveys, warranty claims, and quality testing should all have a relationship with product and part data to analyze defects and impact. The massive automobile recalls in 2009 are symptomatic of not connecting the dots. Companies today are much better equipped for product improvement and quick failure detection to the extent they have integrated car performance data (from the car’s onboard systems) with other data sources. The same holds true when we look at voice of customer data or 360 degree customer management.

Based on the experience of our customers, we strongly believe search has more value when you add more data sources. The key is to enable users to understand and explore the inter-relationships between different datasets in an intuitive manner.

Without divulging your firm’s methods, will you characterize a typical use case for your firm’s content processing, tagging, and search and retrieval capabilities?

SMART InSight enables customers to integrate multiple data sets from one or more departments, to easily navigate across these datasets, and to analyze massive data sets using charts and tables driven by search. The application interface is determined by the data and by what kind of discovery and analysis the customer needs – somewhat similar to a BI system.

Can you give me an example?

Sure. I want to use an automotive industry use case.

We fed data from the American agency NHTSA (National Highway Traffic Safety Administration) into SMART InSight. We shared this prototype with some of the large Japanese auto firms. Their analysts discerned trends and common issues using the standard charts and other features.

We then integrated the customer’s datasets with the NHTSA data to build a powerful analysis application. One of its key features is what we call the “Data Chain”.

What’s a “data chain”?

A data chain uses fields with similar meaning—for example, component category or VIN [Vehicle Identification Number]—to create data driven inter-linkages. These linkages allow you to navigate dynamically across the data sets. So users starting from part failure were able to ‘data chain’ to performance data for the affected cars, related claims, NHTSA reports etc. Engineers were thus able to comprehend the problem in a comprehensive way, what I call a 360 degree view. Analysts were then able with a mouse click to drill down to causes and solutions. The analysts could also comment on and tag the data to create a shared context and highlight trends and issues for their co-workers. All the tagging and other usage and sharing information is part of an automated learning loop, which constantly improves the search relevancy and makes users more productive.
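[An editorial aside: the data chain idea can be sketched as following a shared field across otherwise separate datasets. The field names and rows below are invented, not SMART InSight’s actual schema.]

```python
# Toy datasets sharing one linking field, the VIN. All rows invented.
part_failures = [{"vin": "1HGCM82633A", "part": "brake actuator"}]
performance = [{"vin": "1HGCM82633A", "metric": "stopping distance +18%"}]
claims = [{"vin": "1HGCM82633A", "claim_id": "C-1041"}]

def data_chain(vin: str) -> dict:
    """Follow the VIN across datasets to build a 360 degree view."""
    def pick(rows):
        return [r for r in rows if r["vin"] == vin]
    return {
        "failures": pick(part_failures),
        "performance": pick(performance),
        "claims": pick(claims),
    }

view = data_chain("1HGCM82633A")
print(sorted(view))  # ['claims', 'failures', 'performance']
```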

Services are often among the highest margin offerings an organization can offer. Is the need to sell consulting altering the simplicity of the installation, configuration, tuning, and customization of search applications?

Information, even if accessed on an iPad, is a complex challenge.

While we provide services, our business model is built around providing the product and related services like maintenance & support. We want our customers and partners to be able to manage the technology, as they best know how to maximize the value. We advise on best practices, help them overcome technical hurdles and provide support to ensure risk is minimized.

What’s the upside?

The key benefits depend upon each offering. First, we have a product which delivers upon installation a rich feature set in a reliable and scalable product. This enables customers to build solutions that address their use cases by focusing on the business logic without worrying about the technology.

Next, our approach includes maintenance and support. We know that customers want support to reduce risk and ensure a successful experience over the life of the solution.

Finally, we help our clients create an internal team, which can manage and expand the search solution in tight synchronization with evolving business requirements.

How does an information retrieval engagement move through its life cycle?

It usually starts with customers asking us to help them understand how information retrieval can play a part in meeting their business challenges. We then get the customer’s sample data and wrap SMART InSight around it.

The approach involves some data analysis to integrate the information and building a few sample screens using our Ajax portal interface. Once users play with the data using the sample screens, they can imagine how best to analyze the information and what kind of application UI is required. We recommend this approach, as customers do not need to first create requirements specifications. Customers and users find it is much easier to change and improve the interface working from this kind of prototype than it is to start from a blank page.

The final implementation focuses on helping customers tune our widget library and pages to build the required application UI. Once this is underway, we map the data to correctly configure the content processing and index schema.

In our projects, ground up development is minimal as our feature set includes content ingestion, search, portal and collaboration features. Post implementation moves to training the customer team and helping them maintain and enhance the solution through support services.

One challenge to those involved with squeezing useful elements from large volumes of content is the volume of content AND the rate of change in existing content objects. What does your firm provide to customers to help them deal with the volume (scaling) challenge?

We serve two market segments: intranet information inside the licensee’s firewall, and Internet information outside it.

In the enterprise market, update frequency is relatively low, except when dealing with transactional databases. We have implemented customer solutions with over 100 million records from 10 data sources, with no latency issues.

Things get more challenging in the Internet segment. We are currently working on a project in China where not only does the data have over 300 facets, but its volume and update frequency are also among the highest in the world. The expertise brought by Lucid Imagination becomes critical in such situations. Together, Lucid Imagination and Uchida Spectrum are helping this customer architect a large-scale system by optimizing queries and the schema across multiple indexing and search nodes.

Another challenge, particularly in professional intelligence operations, is moving data from point A to point B; that is, information enters a system but it must be made available to an individual who needs that information or at least must know about the information. What does your firm offer licensees to address this issue of content “push”, report generation, and personalization within a workflow?

That’s a great question.

My colleagues and I believe that “right information, right person, right time” is the critical need of many of our customers. SMART InSight offers sophisticated alerting to achieve this. Multi-parameter, rule-driven alerts can be sent out in real time. We also offer daily or weekly digests for other information needs. The value of alerting increases as we add more data sources into the index. Users are then able to monitor and track all relevant information flows.
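Multi-parameter, rule-driven alerting can be sketched as a set of per-field predicates that must all hold for an incoming document to fire an alert. This is a minimal conceptual sketch with hypothetical fields and thresholds, not the SMART InSight rule engine:

```python
# Minimal sketch of multi-parameter, rule-driven alerting. A rule maps
# field names to predicates; a document triggers an alert only when
# every predicate passes. Fields and values here are hypothetical.

def matches(rule, document):
    """Return True if the document satisfies every predicate in the rule."""
    return all(pred(document.get(field)) for field, pred in rule.items())

rule = {
    "source": lambda v: v == "NHTSA",
    "severity": lambda v: v is not None and v >= 3,
}

incoming = [
    {"source": "NHTSA", "severity": 4, "title": "Brake recall"},
    {"source": "claims", "severity": 5, "title": "New claim"},
]

alerts = [doc for doc in incoming if matches(rule, doc)]
```

In a real deployment the same rules could drive either real-time pushes or the daily and weekly digests mentioned above.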

There has been a surge in interest in putting “everything” in a repository and then manipulating the indexes to the information in the repository. On the surface, this seems to be gaining traction because network resident information can “disappear” or become unavailable. What’s your view of the repository versus non-repository approach to content processing?

As you know, Solr/Lucene creates an index of the information and does not store the actual information. One of the significant advantages of this approach is flexibility. Typically, repository solutions are developed following a strict waterfall methodology with stable requirements specifications. We think this approach may be a bit out of step with today’s rapidly evolving information climate. By comparison, we can be far more flexible, for example by using dynamic fields in Solr/Lucene and readily changing the ranking algorithm.
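The dynamic-field mechanism referred to here can be sketched conceptually in Python: Solr accepts any field whose name matches a declared wildcard pattern, so new fields need no schema change. The suffix patterns below mirror common Solr conventions, but the resolver itself is an illustration of the matching behavior, not Solr’s actual code:

```python
# Conceptual sketch of Solr-style dynamic fields: a field name matching
# a wildcard pattern is accepted at index time without editing the
# schema. Patterns mirror common Solr suffix conventions.
import fnmatch

dynamic_fields = {
    "*_s": "string",
    "*_i": "int",
    "*_dt": "date",
}

def resolve_field_type(field_name):
    """Return the type a dynamic-field pattern assigns, or None if no match."""
    for pattern, field_type in dynamic_fields.items():
        if fnmatch.fnmatch(field_name, pattern):
            return field_type
    return None

# A new field such as "recall_count_i" needs no schema edit: the "*_i"
# pattern resolves it to an integer field automatically.
new_field_type = resolve_field_type("recall_count_i")
```

This is what makes the schema flexible enough to evolve with requirements instead of being frozen up front.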

We use connectors bundled with LucidWorks Enterprise to pull data from databases and other content repositories. In some cases, we or our system integration partners may build a custom connector. The LucidWorks Enterprise connector framework from Lucid Imagination makes this much easier.

Visualization has been a great addition to briefings. On the other hand, visualization and other graphic eye candy can be a problem to those in stressful operational situations. What’s your firm’s approach to presenting “outputs” for end user reuse or for mobile access? Is there native support in Solr/Lucene or via Lucid Imagination for results formats?

What and how much information to put on the screen is always a challenge; SMART InSight resolves the clutter problem in two ways.

First, visualization, when used correctly, is extremely powerful. For this reason, our solution implementation focuses on designing the right application UI. We have built up a great deal of experience over multiple projects and are able to guide customers to design screens for expert users and different ones for casual users.

Second, we also enable users to build their own UI by selecting widgets, much like iGoogle or My Yahoo. Thus a user who prefers graphs can add chart widgets and choose what the X and Y axes should be. We use LucidWorks Enterprise features like faceting and scoring to build accurate charts. Control over the widgets, and over which content fields to display, enables fully personalized information consumption.
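How facet counts can back a chart widget can be sketched as follows. The query field names are hypothetical, but the request parameters and the flat value/count facet response shape are standard Solr faceting:

```python
# Sketch: building Solr facet parameters to back a chart widget. Each
# facet bucket becomes an X-axis category and its count the Y value.
# Field names ("component", "model_year") are illustrative.

params = {
    "q": "component:brake",
    "rows": 0,                    # only the facet counts are needed
    "facet": "true",
    "facet.field": "model_year",  # X axis: one bucket per model year
    "facet.mincount": 1,
}

def to_chart_points(facet_counts):
    """Fold Solr's flat [value, count, value, count, ...] facet list
    into (x, y) pairs for a chart widget."""
    return list(zip(facet_counts[::2], facet_counts[1::2]))

# Example facet list as Solr returns it in facet_fields:
points = to_chart_points(["2009", 12, "2010", 7])
```

With `rows=0`, the query returns only aggregate counts, so the chart widget stays fast even against a large index.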

I am on the fence about merging retrieval into other applications. What’s your take on the “new” method some people describe as “search enabled applications”? Autonomy and Endeca each have workflow components as part of their search platforms. What’s Uchida Spectrum’s capability in workflow or similar enterprise embedding of search?

SMART InSight is both “search enabled” and “database enabled”. I wonder if any vendor uses the term “database enabled application”. The point of the “search enabled” jargon is that search is a newer technology than databases. As a technology becomes embedded in our lives, it is no longer noticed.

Search is much more than a search box and a set of results. I think some of the work being done here by Autonomy and Endeca is commendable. The question is whether they can deliver value at a reasonable price point and thus cater to more customer segments. In this context, we are using the Lucene/Solr open technology as the foundation because we are able to deliver high return on investment with a flexible and scalable solution.

We believe this will expand the market for search and thus, hopefully, make the phrase “search enabled application” redundant.

I see you will be speaking at the forthcoming Lucene Revolution conference. What are the key trends you expect to see materialize there?

One of the key debates is search versus database. Lucene Revolution will inform this debate by showcasing how more and more large firms are choosing search, which strengthens the perception of search as an enterprise-ready technology. As a snowball effect, I see search augmenting databases in many applications. Companies will then need to build search expertise much as they have database architects and developers today. I believe Lucid Imagination will play a central role in making this happen.

Lucene Revolution brings greater cohesiveness to the Lucene/Solr movement and makes its size visible. Its disruptive innovation and open source model pose a strong challenge to the established commercial vendors. The mainstreaming of interest in Lucene/Solr means these players need to fashion a cogent strategic response. Might this trigger realignment within the search industry: mergers, diversification, or a focus on niche markets?

Our priority is expanding the search story in relatively under-penetrated markets like China and India. The large IT talent pool, especially in India, offers an opportunity to expand the Lucene/Solr movement. Today these engineers have developed the habit of using only databases in their solution architectures; as the adage goes, “if you have a hammer, everything looks like a nail.” We need to train them on search so it becomes a default part of their solution toolkit. This is imperative, as China and India will be at the center of the Internet due to the size of their fast-growing online populations and rising income levels.

ArnoldIT Comment

If you are seeking a resource to assist your organization in moving from Fast Search’s ageing technology to the Lucene/Solr platform, you will want to speak with Uchida Spectrum. You can get more information about Uchida Spectrum at the Lucene Revolution Conference and from the firm’s Web site at http://www.spectrum.co.jp/.

Stephen E Arnold, April 28, 2011

Interview courtesy of Lucid Imagination
