An Interview with Iqbal and Zubair Talib
About seven years ago, a fellow from MIT told me about a lightning-fast search and content processing system. My contact, an MIT engineer working in on-the-fly Chinese translation of speech, was a technologist whose judgment I could trust. I tracked down the company, which was working from offices in the Virginia countryside, about 30 minutes from the White House. I remember the demo because it was live data running a yellow page service for a major South African telecommunications company. Compared to the USDEX service, which I had analyzed before USWest spiraled into a financial swamp, this system was downright amazing.
My host, an engineer responsible for system implementation, introduced me to a father and son team, Iqbal and Zubair Talib. Today, the company has expanded into larger offices in the heart of the Ashburn, Virginia technology corridor. The firm's customer base has expanded across multiple continents and includes the largest directory publisher in South America, Dun & Bradstreet, and dozens of companies who have abandoned the fiendishly-complex and technically-needy systems from much larger, much higher profile vendors. Word of mouth works.
The company is Intelligenx, which you pronounce as "intelligence". The original founder, Iqbal Talib, brought his son into the business several years ago. The blend of experienced technology innovator and brilliant son has powered Intelligenx to double digit growth.
I spoke to the father and son team on May 9, 2008, in the company's offices in the suburban Washington countryside. The full text of my interview with them appears below:
You fellows are a well-kept secret in the US, yet you have a very high profile in South Africa, Colombia, Brazil, and India. Is this intentional?
No, I think it is an accident. We are getting referrals from our clients, and we are following these. Dun & Bradstreet is important to us, but it seems that our international clients are actively talking about our technology at international conferences. We can always do a better job of marketing, but we put our customers first. Sales occur because people come to us and say, "We want to license your system".
What was the origin of Intelligenx? Why did you tackle search, which is a very challenging market?
It is true that search is a very challenging marketplace at the moment but we are only now at the beginning of what is to come. The sheer amount of information continues to grow exponentially each and every day and consequently, search becomes more and more important as time goes by.
In a prior life, Intelligenx's founder Iqbal Talib owned and operated a hardware and software consulting business that sourced parts from all over the world to assemble large-scale PC node systems. At the time (late 80's) there were no good mechanisms to find parts and suppliers other than the printed version of the Thomas Register.
The idea of an intelligent and human-friendly search that would guide users and help them find the right vendor of parts from among tens, hundreds or thousands of qualified vendors appeared to be a great emerging opportunity. The onset of the Web in the early 90's gave us the perfect platform to distribute and make such data available in a highly functional and advantageous way.
In prototyping the desired search functionalities, including full-text search with categorizations, we discovered that no existing technology (full-text search engines or RDBMS software that was available) could achieve the performance and scalability that we knew would be required by companies who would have terabytes and petabytes of data to query.
My father and a small number of experienced engineers built a prototype on our own. We found that the technology itself attracted attention as a very useful tool to slice and dice data in real-time.
I learned about you from an MIT PhD specializing in text-to-speech translation. Is that a typical referral?
Yes, we maintained certain relationships among an elite group of scientists and engineers. We never signed up to give marketing talks at the marketing-oriented venues. Our success comes because certain people understand our technology and recognize that it delivers scale, speed, performance, data management today. Our technology is our marketing I think.
From the day we started working on search more than seven years ago, we have invested heavily in R&D and in the technology itself to ensure that others who would like to use the technology can do so in a scalable, high performance manner. Our product performs many content-related functions in a way that eliminates the complex, expensive, and time-consuming technical work that characterizes many of the high-profile search systems.
What sets Intelligenx apart from such competitors as Dieselpoint and MarkLogic, for example?
Intelligenx was first to market with technology that offered a true full-text search with what many people call faceted or assisted search results. To achieve this functionality, performance under heavy loads is the prevailing challenge and simply put, our Discovery Engine® solves the problem in what we think is a most elegant fashion.
"Facets" or "guided navigation" are not just a “checkbox” on a feature matrix but an underlying central philosophy in our technology, the company, and in the development of our system.
We believe that search and discovery is fundamental: aiding users to find what they need through the education and discovery process that arises from exposing structure in data. Discovery Engine's feature set and its underlying engine have now processed billions of queries for our customers.
It has evolved to include new functionalities that seek to understand a query and help to anticipate users' needs. Intelligenx is not confined to the traditional "software in a box" model.
Discovery Engine is often sold with professional services, pre-built plug-ins and modules that leverage years of experience and best practices. Moreover, additional core technologies have emerged over the years to support those end-goals - for example, crawling business websites to enhance content, mining log data to improve search experience. For example, with a single click, a user or a licensee can see a report that answers a question, not a long list of results.
While the technology works well within the enterprise on intranets, Intelligenx has typically catered to larger scale public-facing Web sites like the yellow page service offered in Brazil. These types of implementations must accommodate extreme traffic and complex business and relevancy rules. Performance and scalability are, therefore, paramount and the tools and features associated with scaling the technology are very robust and highly refined. The result is a solid end-to-end system that is geared to the end-user.
Is your system engineered like Google's?
We think that Google and Intelligenx share some common design philosophies. Like Google, we have focused on design principles that allow fast, economical scaling; data management techniques that avoid the known bottlenecks that plague most competitors' systems; and a "snap in" architecture that makes customizing a system much easier for our customers.
What's your approach to search? For example, do you discover entities or perform other advanced content processing functions? What's the architecture of the core engine?
At the broadest level the system performs full-text search and returns results according to a very flexible relevance ranking scheme and according to the most specific facets or attributes that describe the resultant set. Employing the in-built entity extraction can further refine the searchable data reducing the recall but increasing the precision. Additionally categories can be dynamically clustered or “determined” on the fly through a variety of cluster-flow techniques.
What do you mean "cluster flow"?
Let me simplify. We process in parallel a stream of information. Using various algorithms, supplemented by a knowledgebase if the licensee wants to exercise this option, the system can add additional metadata to a document or a part of a document. These objects are classified in a precise way.
What about the components of your Discovery Engine?
The core engine architecture comprises an index engine and the search engine. The index engine governs the data path which allows for virtual aggregation of data from a variety of sources – documents (txt, xml, html, pdf, ms office documents, etc.), databases (oracle, ms sql, mysql, etc.), or any other text accessible data format or repository. The Discovery Engine created index is segmented for easy incremental updates.
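To picture what a segmented index buys, here is a minimal Python sketch under stated assumptions. It is not the Discovery Engine's actual design: the class name `SegmentedIndex` and its methods are hypothetical. Each batch of new records becomes its own small inverted-index segment, so an incremental update never rewrites earlier segments, and a search simply unions the hits from every segment.

```python
from collections import defaultdict

# Hypothetical illustration; not the Discovery Engine API.
class SegmentedIndex:
    def __init__(self):
        # Each segment maps a term to the set of document ids containing it.
        self.segments = []

    def add_batch(self, docs):
        """Index a batch of {doc_id: text} as a new segment (incremental update)."""
        segment = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                segment[term].add(doc_id)
        self.segments.append(dict(segment))

    def search(self, term):
        """Union the postings for one term across all segments."""
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term.lower(), set())
        return sorted(hits)


idx = SegmentedIndex()
idx.add_batch({1: "Pizza oven", 2: "brick oven"})   # initial load
idx.add_batch({3: "pizza delivery"})                # later incremental update
print(idx.search("pizza"))
```

A production system would also merge small segments in the background and handle deletions; both are omitted here.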
The Search Engine accesses the Index to return the appropriate categorized results. Custom search plug-ins are used to customize the “Search Flow” governing, for example, which linguistic tools to use at what time, how to incorporate the usage of category matches, sort order business rules, failed search handling, and more. The query processing, the search logic, and ranking facilities are all highly customizable in order to accommodate even the most complex search logic and business rules.
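One way to picture a customizable search flow is as an ordered chain of plug-ins, each of which may rewrite the query or the result list. The Python sketch below is an illustration only, not Intelligenx's plug-in interface: `SearchContext`, `run_flow`, the individual plug-in functions, and the sample documents are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical illustration; none of these names come from the Discovery Engine.
DOCS = ["pizza restaurant", "plumbing supplies", "pizza delivery"]

@dataclass
class SearchContext:
    query: str
    results: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)

Plugin = Callable[[SearchContext], SearchContext]

def normalize_query(ctx):
    # Stand-in for a linguistic tool: trim and lowercase the query.
    ctx.query = ctx.query.strip().lower()
    return ctx

def full_text_match(ctx):
    # Keep documents that contain every query term.
    terms = ctx.query.split()
    ctx.results = [d for d in DOCS if all(t in d.split() for t in terms)]
    return ctx

def handle_failed_search(ctx):
    # Failed search handling: if nothing matched, accept any single term.
    if not ctx.results:
        terms = ctx.query.split()
        ctx.results = [d for d in DOCS if any(t in d.split() for t in terms)]
        ctx.notes.append("relaxed to any-term match")
    return ctx

def sort_by_business_rule(ctx):
    # Placeholder sort-order rule: alphabetical.
    ctx.results.sort()
    return ctx

def run_flow(query, plugins):
    ctx = SearchContext(query)
    for plugin in plugins:
        ctx = plugin(ctx)
    return ctx

FLOW = [normalize_query, full_text_match, handle_failed_search, sort_by_business_rule]
print(run_flow("  Pizza ", FLOW).results)
```

Reordering, adding, or removing entries in `FLOW` changes the behavior without touching the other plug-ins, which is the point of the design.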
From a scalability perspective the system is highly granular and has been particularly engineered to ensure that it can be scaled out across multiple processors and multiple servers. Complete with its own multi-server clustering communication – repositories can be split up across multiple servers into “slices” to handle very large data volumes. Distribution and aggregation of results is seamless to the application developer. As a fully stateless system, servers can be expanded horizontally for infinite scalability.
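The slice idea described above is commonly implemented as a scatter-gather pattern: the same query goes to every slice in parallel, and the partial results are merged into one ranked list. The following Python sketch assumes that pattern; the data, the term-frequency scoring, and the function names `search_slice` and `search_all` are hypothetical, and threads stand in for separate servers.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each slice is the list of (doc_id, text) held by one server.
SLICES = [
    [(1, "pizza restaurant herndon"), (2, "plumber reston")],
    [(3, "pizza pizza delivery ashburn"), (4, "bakery herndon")],
]

def search_slice(slice_docs, term):
    """Return (score, doc_id) for each match in one slice; score = term frequency."""
    hits = []
    for doc_id, text in slice_docs:
        score = text.split().count(term)
        if score:
            hits.append((score, doc_id))
    return hits

def search_all(term, k=10):
    """Scatter the query to every slice, then gather and merge the top k hits."""
    with ThreadPoolExecutor(max_workers=len(SLICES)) as pool:
        partials = list(pool.map(lambda s: search_slice(s, term), SLICES))
    merged = [hit for part in partials for hit in part]
    # Highest score first; ties broken by lowest doc_id.
    return heapq.nsmallest(k, merged, key=lambda h: (-h[0], h[1]))

print(search_all("pizza"))
```

Because no slice keeps per-user state between queries, adding a slice only lengthens the `SLICES` list; the caller of `search_all` sees the same interface.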
What's a typical use of your product by a customer?
You know that a number of large yellow page publishers use our system, right?
Okay, let's talk about a typical installation in South America, what we call a "local search provider".
Local search providers like yellow page publishers and companies like ilocal.nl (a pure-play local search engine in the Netherlands) have a unique set of technical and business requirements that cannot be solved using conventional search technology.
For this reason, many local search-related projects run drastically over time and over budget, and desperately short of the expectations of the user community. Consequently, these companies often end up with highly complex and inflexible search platforms that have been developed on top of incompatible search technology. These systems often suffer performance issues and are difficult to manage because necessary changes, big or small, are costly and require substantial development each time.
This company heard about our system at a major telecommunications conference I think. We demonstrated our system and explained how our Discovery Engine combines the best aspects of a full-text search engine with real-time multi-faceted browsing.
This company saw that our search system was intuitive and that the retrieval on its structured data was ideal for their local search. For such an application, the company told us that our Discovery Engine provided fast and scalable search and indexing. From the outset, the company saw how our system allowed ilocal.nl to exploit the value of its directory information.
I recall that ilocal.nl liked our way of dynamically presenting search results organized by multiple categorizations (i.e., locations, headings, and normalized keywords). These categorized results provide users with a context and scope about the data and their search result set. This core search functionality shifts the paradigm for local search providers and brings new flexibility and vast new capabilities for product development, usability, distribution and a new paradigm in search-based advertising.
We now know that for companies to take full advantage of the emerging opportunities in local search, it is necessary to have a deep understanding of the business and an appreciation of what’s possible.
I think our hands-on experience and understanding of the directory business combined with our direct and continuing involvement in the R&D of our search technology enables Intelligenx to bring a unique perspective and distinct competencies. We continue to push ahead in R&D to constantly provide our customers with innovations like the single search bar for IYP, advanced fuzzy search and failed search resolution, and the new Intelligenx AdOptimizer™.
Almost every vendor today says that each system handles both structured and unstructured data. What are some of the challenges organizations face when dealing with structured and unstructured data? How does Intelligenx handle these two types of data?
The challenges with unstructured data are the same that have always been there; for example, how to extract meaningful information to search on. Metadata associated with unstructured data as well as entity extraction are a few useful techniques. The in-built search flow and hierarchical search logic are also helpful - so examining the "highest" quality structured data first and then drilling into the lower-quality data.
For structured data sets, some of the value of the metadata can be exploited using conventional RDBMS software (allowing users to select from multiple pull-down menus) or conventional full-text search (having the metadata of each record be searchable). However, neither conventional method fully exploits the value of the structure in data. As a result, much of the worth of the structure remains hidden and untapped.
We use extremely high performance indexes of structured data to allow each query to maximize the amount of information that can be gleaned from the structure itself and returned to the user to inform his next search. At the same time, we integrate a robust full text search of unstructured data into the same search operation. The trick is to seamlessly integrate the search experience so that the end user is not aware of the underlying structure or lack of structure in the data. We use intelligent, customizable query processors to extract the structural elements and full text query terms. We also routinely apply our content enhancement technologies to unstructured data to bring out hidden structure in the form of ontologies and taxonomies.
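A query processor of the kind described here can be approximated by checking each query token against known structured values and treating the remainder as full-text terms. The Python sketch below is a simplified illustration under that assumption; the lookup tables and the function `parse_query` are hypothetical and far cruder than a production processor.

```python
# Hypothetical lookup tables standing in for structured fields of a directory.
KNOWN_CITIES = {"herndon", "ashburn", "reston"}
KNOWN_HEADINGS = {"restaurants", "plumbers", "dentists"}
STOPWORDS = {"in", "near", "the", "a"}

def parse_query(raw):
    """Split a raw query into structured filters and free-text terms."""
    structured = {}
    free_text = []
    for token in raw.lower().split():
        if token in STOPWORDS:
            continue
        if token in KNOWN_CITIES and "city" not in structured:
            structured["city"] = token
        elif token in KNOWN_HEADINGS and "heading" not in structured:
            structured["heading"] = token
        else:
            free_text.append(token)
    return structured, free_text

print(parse_query("Ethiopian restaurants in Herndon"))
```

The user types one string into one box; the split into filters and full-text terms happens behind the scenes.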
Structure is rampant in most data sets even when it is not readily apparent. For example, Web sites are either .coms, .orgs, .edu’s, or some other domains. In a local search, not only is every business tagged with business attributes, but each business also has a physical location that can be classified into a hierarchical structure (for example, state, county, neighborhood, city, street). Discovery Engine helps to fully exploit the value of structure in data by presenting search results organized by multiple categorizations with counts. Users can find (or discover) what they are looking for by refining (or expanding) search results. The structure in data can also be used to reduce the number of failed searches (e.g., there are no Ethiopian restaurants in Herndon, Virginia, but there are these types of other ethnic restaurants). Finally, Discovery Engine allows the use of structure in data to drive better relevance and ordering of search results.
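The two ideas in this answer, categorized results with counts and rescuing a failed search by relaxing one constraint, can be sketched in a few lines of Python. This is an illustrative toy, not the Discovery Engine's implementation: the listings are invented sample data, and the names `search`, `facet_counts`, and `search_with_fallback` are hypothetical.

```python
from collections import Counter

# Invented sample data for illustration only.
LISTINGS = [
    {"name": "Taj Palace", "cuisine": "indian", "city": "herndon"},
    {"name": "Thai Basil", "cuisine": "thai", "city": "herndon"},
    {"name": "Curry House", "cuisine": "indian", "city": "herndon"},
    {"name": "Addis Cafe", "cuisine": "ethiopian", "city": "ashburn"},
]
FACET_FIELDS = ["cuisine", "city"]

def search(filters):
    """Return listings matching every field=value filter."""
    return [r for r in LISTINGS if all(r.get(f) == v for f, v in filters.items())]

def facet_counts(records, fields):
    """Count how many records fall under each value of each facet field."""
    return {f: Counter(r[f] for r in records) for f in fields}

def search_with_fallback(filters, relax_field):
    """Search; if nothing matches, drop one filter and report what exists instead."""
    hits = search(filters)
    relaxed = False
    if not hits:
        remaining = {f: v for f, v in filters.items() if f != relax_field}
        hits = search(remaining)
        relaxed = True
    return hits, facet_counts(hits, FACET_FIELDS), relaxed

hits, facets, relaxed = search_with_fallback(
    {"cuisine": "ethiopian", "city": "herndon"}, relax_field="cuisine"
)
print(relaxed, dict(facets["cuisine"]))
```

In this toy data the strict query finds nothing, so the cuisine filter is dropped and the user is shown the counts of other cuisines available in the same city.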
One of the problems with some of the current search and content processing systems is their administrative overhead and complexity. What does Intelligenx do to reduce these two major problems?
Intelligenx has a simple API and an administrative tool that ships with the software. Intelligenx does not "lead" with simple administration. Serious and useful search applications are not meant to be plugged in and left alone - like any high performance system they typically perform best when crafted to the situation. So while Intelligenx does have tools to ease the administration burden - typically a well trained team is the best solution to minimize complexity.
Throughput or the amount of data processed every minute is an issue in some vendors' systems. What's the processing capability of a typical Intelligenx system?
High performance, high availability search server implementations are at the core of our customer base. Therefore, we pay very close attention to optimizing query processing performance.
Performance varies substantially based on the application, data requirements and the hardware utilized but the following are good guides.
A simple search across 10 million records takes 50ms on commodity class processors. The search performance scales with the number of CPUs available for query processing. This means that a dual processor server running at 100 percent capacity can serve 50 or more queries per second.
The system scales very easily such that additional servers and processing power can be added to increase the throughput. As additional data is added, additional servers and processing power can be added to keep latency low.
In terms of Indexing speeds – performance is quite fast and can reach five million records per hour using commodity class servers. The indexing process is also scalable and can be distributed out amongst multiple servers to handle larger data sizes at faster processing times.
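The figures quoted in this answer lend themselves to a back-of-the-envelope capacity model. The Python sketch below assumes the simplest possible case: each worker (a core or hardware thread) handles one query at a time, and both query throughput and indexing speed scale linearly with added hardware. The function names are hypothetical, and real systems rarely scale perfectly linearly.

```python
import math

def max_qps(latency_ms, workers):
    """Upper bound on queries per second if each worker serves one query at a time."""
    return workers * 1000.0 / latency_ms

def workers_needed(target_qps, latency_ms):
    """Smallest number of concurrent workers that can sustain target_qps."""
    return math.ceil(target_qps * latency_ms / 1000.0)

def indexing_hours(records, records_per_hour=5_000_000, servers=1):
    """Hours to index `records`, assuming linear scaling across servers."""
    return records / (records_per_hour * servers)

print(max_qps(50, 1))                        # one worker at 50 ms per query
print(workers_needed(50, 50))                # workers for 50 queries per second
print(indexing_hours(10_000_000))            # 10 million records, one server
print(indexing_hours(10_000_000, servers=2))
```

Under this model a 50 ms query time caps each worker at 20 queries per second, so sustaining 50 queries per second takes three concurrent workers; how workers map onto a "dual processor server" is not stated in the interview. At five million records per hour, 10 million records index in two hours on one server, or one hour on two.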
Many vendors are jumping on the analytics and report bandwagon. What type of reports does your system support? Are there reports for users? Are there reports for system administrators?
We use the Discovery Engine to provide a comprehensive business intelligence and an analytics engine. Our product, InsightX, provides not only the ability for site or system administrators to look at snapshot reports, but also to drill into arbitrary level of detail to explore trends or anomalies and they can also query the logs to find information that would otherwise be almost completely invisible. Information about user behaviors that is captured in log files is critical for most of our customers. InsightX provides access to log information in an exceedingly empowering way so that marketing professionals and product managers have the information and insight they need to effectively perform their functions. We are also working towards providing analytics on user behavior and site performance using data mining and knowledge management technologies.
What are the major new features in the current version of Intelligenx?
That's a good question. I will try to remember some of these because we continue to add new features. We have made improvements to our ability to acquire content and perform entity extraction. We have tweaked our query processing to handle foreign grammars and we offer a single search box or federated search that can go across multiple collections of information. I think I mentioned the clustering function. Oh, we also have included a category ranking feature, so a user can see which category may be most relevant to the query before diving into specific records. We added some Web services features and support for XML Serializable Objects. Also important is our adding aggregation of document fields using statistical functions (MODE, AVERAGE, MIN, MAX, etc). I mentioned count flows. That's what I remember off the top of my head.
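Aggregating a document field with statistical functions, as mentioned above, amounts to collecting that field's values across a result set and summarizing them. A minimal Python sketch follows, using the standard `statistics` module; the function `aggregate_field` and the sample records are hypothetical and do not reflect how the product exposes the feature.

```python
from statistics import mean, mode

def aggregate_field(records, field_name):
    """Summarize one numeric field across a result set; skip records lacking it."""
    values = [r[field_name] for r in records if field_name in r]
    if not values:
        return None
    return {
        "MODE": mode(values),
        "AVERAGE": mean(values),
        "MIN": min(values),
        "MAX": max(values),
    }

# Invented sample result set.
results = [{"rating": 4}, {"rating": 5}, {"rating": 4}, {"name": "no rating here"}]
summary = aggregate_field(results, "rating")
print(summary["MODE"], summary["MIN"], summary["MAX"])
```

Combined with facets, the same summary can be computed per category, for example the average rating within each business heading.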
There's quite a buzz about the acquisition of Fast by Microsoft. What do you think the impact of this will be on the search sector in general and on Intelligenx in particular?
Not too much in the short term. Fast is still trying to sell "search" as an enterprise software like RDBMS database software is sold. The reality is that search is sold and consumed in much different ways.
For Intelligenx, while we have competed against Fast in the past, we have always prevailed in a head-to-head challenge simply because we pursue opportunities where the prospect actually requires a faceted or assisted search rather than an industrial strength full-text search like Fast ESP.
We do not see that the acquisition of Fast will pull the company to become any more specialized. Rather, Microsoft is more likely to further generalize the ESP technology from Fast and adopt that technology for applications in their existing product lines like MS Sharepoint Server, which was an $800 million business for Microsoft in 2007.
What are some of the challenges search vendors face?
I think that search is an open-ended problem. Different applications have substantially different needs that keep evolving over time.
In search, there are no meaningful standards like the SQL for databases. A developer cannot easily move from one technology to another and evaluate different search solutions side by side. There are non-trivial nuances to dealing with particular types of data and industries as well as in dealing with specific languages. Finally, there is the tough issue of human learning and human input. The success of the Mahalo search system is an example.
What are three major trends you see coming in search in the next 12 months?
We think about this a great deal and talk with our customers. I'm not sure about the order, but there are three or four ideas we think are important.
First, this "social search" concept. This means a better use of user behavior, user communities, user activities and user-generated content (for instance, reviews and user-provided ratings) into search results and the interface. This is somewhat like personalization but less granular and more macro level trending.
Second, the one-size-fits-all approach is not what most users want. A few years ago, the predictability of the search results was very appealing. Now we think it is quite limiting. Google's universal search is a good mainstream example of where certain queries are treated "differently". In the enterprise in particular, certain queries or certain users should be given different results and different interfaces.
Third, users want richer interfaces. The laundry list of results is quite limiting. Folders like those used a while ago by Northern Light and now by Vivisimo are better but still limiting.
How do you meaningfully and intuitively educate a user about what’s available without the user having to read too much? How do you employ graphics, rich interaction, intuitive navigation controls for broadening and narrowing so the user does not have to read so much text?
Search is very challenging. At Intelligenx, we will keep working on these challenges.
Several companies are following in Intelligenx's footsteps. Intelligenx continues to innovate and capture new customers. The speed of the system is remarkable. The company showed me a one-click function that allows a licensee's sales person to look at what hits are related to ads displayed with a list of results. The licensee's sales person can contact a company whose ad would have displayed had the company purchased the service. Intelligenx's user-facing features are impressive, but the company offers a number of useful functions that permit a licensee to use data generated by the Intelligenx system in ways that make a direct, immediate contribution to sales and system design. Before making a decision to license a system from a vendor specializing in structured and unstructured data, invest a few moments to learn more about Intelligenx's system.
Stephen Arnold, May 12, 2008