Attivio
An Interview with Ali Riaz
I'm now sitting in Ariadne, within spitting distance of the Mass Pike. I arrived late, but the affable founder of Attivio is gracious. "No problem," he says. I explain that I had zero clue where Newtonville, Massachusetts, was. I add that Google Maps sent me to a beer depot one exit west. I get my iced tea (part of the new health kick my doctor has me follow), and I turn my attention to Ali Riaz, entrepreneur and wizard executive at Fast Search & Transfer and other high-tech operations.
The full text of the interview appears below:
What's an Attivio? I have a tough time spelling the word.
“Attivo” is Italian for “active”. We added the extra “i” for intelligence (and it makes it easier to say in English).
That's the type of detail I associate with Steve Jobs. Is this attention to detail a characteristic of the company?
I don't know whether to be happy you compared me to Steve Jobs or whether I should be concerned. He's much more exciting than I am. I just try to make decisions that help people with information problems have a way to make a positive association. Like I said, active and intelligent information. I really hope we make Attivio into a Pixar or an Apple. People are really interested in what we have. The market is, no pun intended, active for us right now.
You're a newcomer in search and content processing, right?
We're new, but not that new. I've been involved with search, content processing, and business intelligence for a long time. Plus we have been very careful in selecting people for the Attivio team.
For example, we have people from a number of search vendors. We have a number of Fast Search & Transfer alums too, plus people from companies that sell products in the structured world.
My memory may fail me, but we have experienced professionals from Northern Light, AltaVista, LingoMotors, Dragon Systems, OpenPages, BasisTech, Inso, Sybase, Ab-Initio, Netezza, Attunity, Thomson Reuters, Fidelity Investments, and Palladium.
Hiring technical talent is a problem for many companies. What have you done to attract these professionals?
Relationships. It is fair to say that a majority of us have worked together in our past lives. The opportunity to work together is one of the most attractive features for us at Attivio.
What are you doing to create an active and intelligent information system? There must be 200, maybe 300, companies in the US alone pursuing this market sector.
From all our accumulated experiences in the industry, we realized that search is simply not enough to solve the problems that search has been trying to solve. We realized that today’s search platforms – specifically enterprise search – have become legacy technologies. The market needed a fresh approach, and that’s why we created Attivio. You wrote a study about this shift.
Yes, but what's the difference between your approach and efforts underway at IBM, Google, and companies that are also in this business?
That's a good question. Let's go to the basics. We started with some core beliefs that have given us a competitive advantage.
The first belief is that a successful software company must build a strong framework around acquiring clients and partners and making them both successful. This is the main driver behind the makeup of our team.
Second, search results from unstructured content such as email, documents and Web pages, or query results from structured databases alone always sub-optimize the business process and polarize the information retrieval capabilities of enterprise applications.
Excuse me, what do you mean by sub-optimize?
What I mean is that the business procedures in organizations are often secondary to the needs of the search system. The proper focus, we think, is that the company's business process must be the focal point, not the needs of the software.
Thanks, what's the third tenet?
The third belief is that answers have to get to the right person. So Attivio can deliver answers based on corporate information to an individual. We can also deliver the answer to another piece of software. Our focus on getting the information to the right person shifts from finding to using. One of the problems many enterprise search systems have is that the entire system focuses on finding. The other functions are less robust.
But most of the enterprise search systems are intentionally complicated. Don't the vendors want to sell consulting, essentially following in Verity's footsteps?
That's a good point. Our fourth belief is that sophisticated software should be easy to install and use for all companies. Innovation should focus on introducing breakthrough technologies for sure, but it should also help people manage and improve the technologies.
Usability?
Yes, usability, but each of the puzzle pieces has to be easy to fit together. So our fifth belief is the system has to deliver a universal index.
What's that?
Okay, this is really important for us at Attivio. You know that many of the big search systems bend the search engine to accept structured data or force a database to understand unstructured content. And you know that both of these compromises have proven to be failed approaches, right?
Yes.
Just putting a search box and a bar graph together in the same business intelligence “corporate dashboard” is not solving the problem either.
The key is to devise a universal index that manages both structured data and unstructured content without losing the integrity of either. This type of index must be built from the ground up with both data and content in mind in order to succeed; a true “mash up” of structured and unstructured information to allow for a true “mash up” of search and BI functionality.
You aren't really delivering key word search. Are you pushing into business intelligence?
Yes, but more the information management behind the BI rather than just the graphics. What we are doing at Attivio is focusing on bringing new solutions to an emerging market that is variously referred to by industry analysts as “information infrastructure”, “information fabric”, or “unified information access”.
We offer a suite of products for different applications, for example discovery, information portals, and content transformation, that are all based on a common underlying platform we call the AIE.
What does AIE mean?
The Active Intelligence Engine, or AIE. Our AIE enables enterprises to blend their structured data and unstructured content without compromising the richness of either, offering the precision of SQL and the fuzziness of search by “mashing up” search and business intelligence data warehouse technologies.
The push back about key word search systems in general is getting more forceful. Do you think the naked search box, Boolean, and typing one word and hitting enter is reaching the end of its useful life?
No, I do not, but your statement hits on some search sore spots.
Legacy search thinking suggests that the search box is the answer for everything. At the simplest level, people look for things in two ways: “I know what I’m looking for, just tell me where it is” (the search box), and “I don’t know what I’m looking for specifically, but I know how to describe it so let me navigate and explore; let me ‘disambiguate’ my way to the answer.”
We think the latter approach is the more powerful of the two, and it is what we’re hearing from prospects, especially in the enterprise. In many ways, you can think of the search box as the user interface of last resort. It has its place, but search is best used when it is woven throughout an application’s capabilities rather than offered as an isolated, off-to-the-side interface on its own.
Search needs to feel natural to be truly effective. Of course, when all else fails, then type something in a search box, but a good search engine prevents this happening wherever it can.
What do you mean by navigation? How is navigation related to the user interface?
I use “navigation” here to mean a user interface that lets the user click on items to drill down, up or out through patterns in the content to refine the body of information presented on the screen. The user interface could be a list of terms – the most common approach – or alternative interfaces like a tag cloud or some other graphic representation.
The key here is where the navigators come from. Navigators or links can come from a traditional taxonomy, and in markets where there is an accepted, widely-used or static model, this makes sense. Medical information services use the medical subject headings. In other sectors, there is no single term list.
But the more intriguing sources for navigators are from the content itself.
At Attivio, we ask, "What is the content telling you?" Facets, Use For terms, and See Also links are popular because they nicely exploit the inherent structure of the content. A common example is navigating through product properties from a product catalog for an eCommerce application. Another source, popular with tag clouds, is the exposure of concepts derived from the content. Another is extracting entities such as names, places, dates, etc.
As for Boolean logic, the average search query is what, 2.2 words long? That says it right there. A syntactically-rich Boolean-based advanced search language is very important for a specific class of people – librarians and engineers come to mind – but it is using technology to solve a human problem. I know some very successful, smart people who happen to not be technology-savvy, and they don’t know how to construct Boolean search queries. Why should they have to learn?
What are you offering that takes a user "beyond search" or gets the user free of the search box trap?
I’m assuming with the way you worded this question that you refer to “search” and the “search box” as essentially the same thing; the way Google might think of it.
But Google hates me. Go on.
Okay, then the navigation capabilities we just discussed offer the most common alternative to the search box. But most legacy search engines support navigation in one form or another; it is hardly new. The real value, though, comes from the ability to generate on-the-fly navigators for each query in real time.
This is not possible for a taxonomy for obvious reasons, but it is for the other navigator sources, especially facets. Think of it as a recommendation engine. The system takes the handcuffs off the user and says, "Here are the facets we think best fit the query and the order we think they should be listed." No need to define or declare the facets up front, although you could if you wanted to. As far as we can tell, we are the only vendor who can do this, and every time we show it to prospects, we get an immediate, positive reaction.
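To make the idea of per-query facet recommendation concrete, here is a minimal Python sketch. It is not Attivio's implementation, which the interview does not describe. The function name `recommend_facets`, the entropy-times-coverage score, and the sample records are all assumptions chosen for illustration: the sketch looks at whatever fields the results for one query happen to carry and ranks them by how evenly they would split that result set.

```python
from collections import Counter
from math import log2

def recommend_facets(results, max_facets=3):
    """Rank the fields found in one result set by how usefully they split it.

    Hypothetical scoring: Shannon entropy of the value distribution,
    weighted by the share of results that carry the field.
    """
    scores = {}
    for field in {name for doc in results for name in doc}:
        counts = Counter(doc[field] for doc in results if field in doc)
        total = sum(counts.values())
        # Skip fields that cannot narrow the results: a single value,
        # or a unique value per document (identifiers such as a SKU).
        if len(counts) < 2 or len(counts) == total:
            continue
        entropy = -sum((c / total) * log2(c / total) for c in counts.values())
        coverage = total / len(results)
        scores[field] = (entropy * coverage, counts.most_common())
    ranked = sorted(scores.items(), key=lambda item: (-item[1][0], item[0]))
    return [(field, values) for field, (_, values) in ranked[:max_facets]]

# Invented product-catalog results for a single query.
results = [
    {"brand": "Acme", "color": "red", "sku": "a1"},
    {"brand": "Acme", "color": "blue", "sku": "a2"},
    {"brand": "Bolt", "color": "red", "sku": "b1"},
    {"brand": "Bolt", "color": "green", "sku": "b2"},
]
facets = recommend_facets(results)
for field, values in facets:
    print(field, values)
```

Nothing is declared up front: the facets and their order come from the result set, so a different query over different content would surface different navigators.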
Does Attivio do alerts? Push results to a user?
Yes, and alerting brings a whole new dimension to the engine. Rather than asking a question and getting an answer, you can ask once and have the engine tell you, perhaps through email or to your cell phone, every time new relevant content appears.
This is the simplest form of alert, of course, and most of the legacy search engines support it.
What does Attivio do?
So what if the alert could post an event to an application to tell it to launch a process, or post a message in a queue, or write data to a database, or ftp a file?
These features are built into the framework of AIE. We currently support a significant number of communication protocols to handle the movement of information in and out of the engine at any point in the ingestion and query processes.
In fact, we can function as a completely data-streamed process. We also support low-latency properties that require rapid updating. The point is to get your information to do something without having to explicitly do it yourself. Not only is the search box not used in this case, the engine could be operating entirely lights-out.
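The extended alerting described here, where a standing query triggers an action rather than an email, can be sketched in a few lines of Python. This is an illustration only: the `AlertEngine` class, its `register` and `ingest` methods, and the sample documents are hypothetical, and an in-memory queue and a list stand in for a real message queue and database.

```python
import queue

class AlertEngine:
    """Hypothetical standing-query engine: every ingested document is
    checked against the registered alerts, and matches fire an action."""

    def __init__(self):
        self._alerts = []  # list of (predicate, action) pairs

    def register(self, predicate, action):
        self._alerts.append((predicate, action))

    def ingest(self, doc):
        fired = 0
        for predicate, action in self._alerts:
            if predicate(doc):
                action(doc)  # e.g. enqueue a message or write a row
                fired += 1
        return fired

work_queue = queue.Queue()  # stand-in for a message queue
audit_rows = []             # stand-in for a database table

engine = AlertEngine()
engine.register(lambda d: "recall" in d["text"].lower(), work_queue.put)
engine.register(lambda d: d.get("source") == "press",
                lambda d: audit_rows.append((d["id"], d["source"])))

first = engine.ingest({"id": 1, "source": "press", "text": "Product recall announced"})
second = engine.ingest({"id": 2, "source": "blog", "text": "Quarterly roundup"})
print(first, second, work_queue.qsize(), audit_rows)
```

No search box is involved: the queries are registered once, and the actions run as content arrives.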
That sort of technology is computationally intensive. How can you perform these advanced functions without imposing hefty hardware costs on your customers?
AIE was designed from the beginning with scale and performance in mind. It can index over 100 million documents per server, where many legacy search engines cap out at around 40 million. It can ingest content at a rate of 1,000 documents per second – about 30 gigabytes per hour – possibly the fastest in the industry.
Are you confident in these performance figures?
Yes, these are real-world numbers all achieved on the same server, a standard production box typical for search index storage and a document size typical of enterprise content. The system was not tuned separately for each test.
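As a quick check on how the two ingestion figures relate, the short calculation below derives the average document size they imply. It assumes decimal gigabytes (10^9 bytes), which the interview does not specify.

```python
DOCS_PER_SECOND = 1_000      # ingestion rate quoted in the interview
GIGABYTES_PER_HOUR = 30      # data rate quoted in the interview
BYTES_PER_GIGABYTE = 10**9   # assumption: decimal gigabytes

docs_per_hour = DOCS_PER_SECOND * 3_600
avg_doc_bytes = GIGABYTES_PER_HOUR * BYTES_PER_GIGABYTE / docs_per_hour
print(f"{docs_per_hour:,} documents per hour, about {avg_doc_bytes / 1000:.1f} KB each")
```

Under that assumption the two numbers are consistent with documents averaging roughly 8 KB.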
Also, the index scales linearly like many systems do, but with AIE you can add the hardware when needed without impacting the running system because you do not have to re-index to expand the index capacity. This means lower setup cost for the customer. You can even physically partition data in separate data silos. To show our confidence in our scale and performance, we will soon be posting the statistics on our Web site: hardware configuration, corpus, and performance statistics. You can review them. Please, let me know what you think.
Okay, I see. What will you do to help a customer reduce the administrative complexity of a system like yours?
Well, first it starts with creating a simple image for your engine with few “parts”. If your engine consists of, say, 100 different services or components, in multiple programming languages and environments, each having to be installed and configured on their own, then you have complexity right from the start and there is no way around it.
The AIE engine core occupies less than 15 megabytes of disk space and is a single, 100-percent Java process. You can install the entire system right from scratch, get it up and running, including admin console and complete demo application in about 12 to 15 minutes (we tested this extensively).
We make the life of the implementer much easier. We provide a single API, designed by engineers for engineers, that covers all ingestion, query, and result processing, and supports Java and .NET interfaces. We provide a simple set of semantically consistent configuration files. We automatically roll over our log files. We promote a high-quality product: we have attained 80 percent JUnit code coverage, way above the industry standard. Even the default demo is integrated into the development environment: you can copy the snippets of code right out of the portal screen for each of its functional components, for example, managing a tag cloud or a frame of facets.
What makes your approach different from everyone else?
A number of pivotal distinctions. Let me run through the principal ones.
We combined some truly innovative ideas with a good dose of pragmatism and then reviewed them with potential customers. The pragmatism was obvious but not easy: we took all the things that annoyed our customers and partners in our past lives and fixed those first. Some of them may not be very glamorous, but they mean a lot to the legacy search engine user, for example, platform footprint, ease of integration, true dynamic scaling, getting it to work well. Something as mundane as packaging your engine as a single image instead of many, many separate components makes a huge difference. In short, we challenged the assumption that in order to have sophisticated technology you must sacrifice simplicity and ease of use.
We can summarize our differentiation around five main capabilities. The first three we already discussed: the dynamic facet recommendation, the extended alerting functions, and our scale and performance, specifically our ability to let you add hardware to the architecture when you need it without having to re-index.
The fourth is our workflow layer. AIE gives you complete control for indexing content, processing queries, and returning results by passing them through multiple processing stages before they reach their destinations. These stages are organized into workflows that support branching, conditional logic, and parallel processing. As an example, zip files and emails with attachments have historically been problematic with the legacy search engines, but with AIE they are processed automatically through a simple looping workflow that indexes the container first and then the contained items. Our workflows for video and audio files process the meta-data first, making it visible for searching immediately. In parallel, they spawn a separate task to generate the voice-to-text transcription that when completed is added to the meta-data in the index at a later time. AIE provides many of these workflows out of the box but you can also create your own.
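The looping workflow for containers, which indexes the container first and then the contained items, might look like the following Python sketch. The function `index_document`, the helper `make_zip`, and the list standing in for an index are hypothetical. Only the standard `zipfile` module is used, and nested archives are handled by feeding each contained item back through the same stage.

```python
import io
import zipfile

def index_document(name, data, index):
    """Index the item itself, then loop over anything it contains."""
    index.append({"name": name, "size": len(data)})  # container first
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue  # directory entries carry no content
                # Contained items re-enter the same workflow stage.
                index_document(f"{name}/{member}", archive.read(member), index)

def make_zip(files):
    """Build an in-memory zip archive from a {name: bytes} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for member, data in files.items():
            archive.writestr(member, data)
    return buffer.getvalue()

inner = make_zip({"notes.txt": b"inner notes"})
outer = make_zip({"report.txt": b"quarterly report", "attachments.zip": inner})

index = []
index_document("mail.zip", outer, index)
names = [entry["name"] for entry in index]
print(names)
```

A real workflow engine would add the branching and parallel stages mentioned above; the sketch shows only the container-then-contents loop.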
The last is our most innovative component: how we handle data from a database and how we relate it to unstructured content. You likely are familiar with the SQL JOIN statement. It defines the cross-section of results among two or more database tables. Imagine extending the JOIN to unstructured content like documents and email. AIE’s JOIN feature understands how to JOIN any two objects that conceptually share a common property. The property could be a field in a database, a tag in a document, or an entity extracted from the content of the text itself. A query like, “give me all blog and press information about our 100 best-selling products in the last quarter” is now doable.
How legacy search engines tackle databases in general is a broad problem. The conventional practice is an a priori SQL query that retrieves a single “shape” of the data at index time. This means the index only ever has one, static set of database results. To execute a different query requires reconfiguring the index and re-indexing the content. AIE extracts all the data from every table in the database at index time, but performs the JOINs at query time, within the AIE engine, using various techniques like MapReduce. Aside from greater flexibility in data response, the system performs much, much faster.
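A query-time JOIN between database rows and documents can be illustrated with a small Python sketch. The interview mentions MapReduce among the techniques used; the sketch below swaps in a plain in-memory hash join instead, and the function `join_on`, the field names, and the sample data are all invented for illustration. The shared property is a product name that appears as a column in the sales rows and as an extracted entity in the documents.

```python
from collections import defaultdict

def join_on(left, right, left_key, right_key):
    """Hash join: pair each left record with every right record that
    shares a property value. Right-side values may be single or lists."""
    buckets = defaultdict(list)
    for record in right:
        values = record[right_key]
        if not isinstance(values, (list, set, tuple)):
            values = [values]
        for value in values:
            buckets[value].append(record)
    return [(l, r) for l in left for r in buckets.get(l[left_key], [])]

# Structured side: rows as they might come from a sales table.
sales = [
    {"product": "Widget", "units": 900},
    {"product": "Gadget", "units": 450},
    {"product": "Doohickey", "units": 20},
]
# Unstructured side: documents with entities extracted from their text.
documents = [
    {"type": "blog", "title": "Widget review", "entities": ["Widget"]},
    {"type": "press", "title": "Gadget and Widget launch", "entities": ["Gadget", "Widget"]},
    {"type": "email", "title": "Doohickey returns", "entities": ["Doohickey"]},
]

# "Blog and press information about our best-selling products",
# with the shape of the query decided at query time.
top_sellers = sorted(sales, key=lambda row: -row["units"])[:2]
coverage = [doc for doc in documents if doc["type"] in {"blog", "press"}]
hits = [(row["product"], doc["title"])
        for row, doc in join_on(top_sellers, coverage, "product", "entities")]
print(hits)
```

Because the join runs when the query is asked, changing "top 2" to "top 100" or swapping the document types needs no re-indexing.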
Most internal information technology units are overwhelmed. Is there appetite in the market for YAIPS, yet another information processing system?
Of course, for precisely the reasons we've been talking about: internal information technology units are overwhelmed. We believe they are overwhelmed by complexity and incompatibility.
Today’s legacy search engines are simply too complex to manage, and needlessly, I might add. As well, their pricing models in most cases are a disincentive to grow the technology throughout the organization because they commonly charge per usage, either index size – number of documents or disk space – or query capacity, or per user. We offer a much simpler, more one-size-fits-all model. By the way, we also federate across to the legacy search systems so you don’t have to rip and replace. This is a request from a number of our prospects.
Incompatibility exists most prevalently between the two basic information silos that exist in every organization: the business intelligence-data warehousing stack for structured data and the search-content management system stack for unstructured content. Our long term strategy is to merge these two.
What is the impact of the failure of Entopia and the buyouts like Autonomy snapping up Verity, then Zantaz on your company?
Don't forget Microsoft's buyout of Fast Search & Transfer. Really, these deals are a justification of our belief that the enterprise search industry is in a classic watershed and wants something new.
The bottom line is that we see the consolidation as an opportunity for us. As I mentioned earlier, we know search is not enough. Perhaps the industry is now figuring that out. As for Autonomy, we believe they have shifted their business from selling search to selling solutions such as Zantaz and etalk, and frankly, I think this is a smart move on their part.
When will your new product become widely available? What is the product line up?
We released our first version along with our first client on January 22, 2008. That client is now live and doing quite well. AIE Version 1.1 just came out a few weeks ago, and Version 1.2 will be available at the end of June. Through the remainder of the year you will see releases of “AIE for Portals”, “AIE for Discovery”, and “AIE for Site Search”.
What are some of the new features and functions that your system will deliver to users?
Our 1.2 release will include all the innovative functionality we’ve talked about. And, please, don't forget the basics that are already in the current product, things like a complete advanced search query language with the usual capabilities like Boolean logic, proximity operators, fielded search, etc.
We also include spelling suggestion using dictionaries or indexed content, advanced stemming, synonyms, stop words, phrasing, hit highlighting, a statistical relevancy model with multiple boosting capabilities, plug-in support for other interface languages like SQL, SPARQL, and XQuery (which are currently under consideration), full multi-field sorting, the usual 300-odd document format types, a pluggable morphology with optional dictionaries, and a complete security model that ingests ACLs for parsing at query time.
As for the future, it’s our policy not to publicly announce our roadmap, but you can expect us to push the industry envelope in a number of areas, for example introducing a new, state-of-the-art conceptual search.
ArnoldIT Comment
Attivio has attracted considerable interest in the Fortune 1000, and a number of traditional search vendors have expressed interest in what the company is doing. What's clear is that Attivio has developed a line up of products and services that speak to healing the painful wounds inflicted by some vendors of enterprise search systems. The company's technology has been designed to focus on supporting business operations, not throwing them off their well-worn rails. ArnoldIT.com thinks Attivio is worth a close look.
Stephen E. Arnold, May 26, 2008