An Interview with Emeka Akaezuwa
I met Emeka Akaezuwa in the student center at Drexel University. I was giving a talk, and he offered to meet with me to demonstrate his Universal Search system. I purchased the desktop version of his system and found that it had several useful features. What impressed me was that I could easily point the system at an old Outlook archive and quickly search for emails and attachments in those bloated, proprietary PST gym bags of data.
What surprised me was that Dr. Akaezuwa had not been on my radar despite his tenure at Dow Jones & Co. and later at Elsevier. We had a number of acquaintances in common, and I found myself eagerly arguing with him about the needs of users and the future of search.
The full text of our conversation about his Universal Search system and the future of search and retrieval appears below.
What's your background?
I have a Ph.D. in Information Systems & Structures from Rutgers University. I worked at Dow Jones & Company, where I was project manager for the re-engineering of the Dow Jones News Retrieval System. Later, I became advanced technology manager, responsible for search. I taught management information systems at the school of business at Rider University in New Jersey. Finally, I worked for Elsevier, where I was Director of Digital Libraries. In that capacity, I was responsible for building search systems for government, academic and corporate clients in the Americas. So my background is in designing and building large-scale information retrieval systems.
What's a Universal SearchOS?
Universal Search Operating System (SearchOS for short) is a next-generation search technology that runs on any device capable of storing digital data.
What do you mean any device? How about a USB memory stick?
Yes, even a USB stick. Here's a watch with the Universal Search system installed. I made this to show law enforcement professionals how a small device could be used for certain types of quick, forensic searches.
Okay, go on, please.
Think of SearchOS as a universal remote control for your data. SearchOS brings search interoperability across devices, computer networks, operating systems and dataspaces to the market. Gaviri Technologies (soon to be renamed Universal SearchOS Inc.) is the world’s first and only Universal Search Operating System company. SearchOS ushers in the era of what I call "many devices, many OSes, many data spaces." I am convinced that this "One Search" approach, the Universal Search Operating System, puts us in a class of our own.
What was the trigger in your career that made search and retrieval a focal point? Weren't there other, easier opportunities for you to use your technical training and expertise? Why not work for McKinsey & Co. or the US government?
The trigger was the realization, in the early ’90s, of just how complex search technology is. I was intrigued by the challenging engineering needed to solve the proverbial "needle in a haystack" problem, which, in my view, is the problem search technology aims to solve.
At the same time, I was frustrated by the many, incompatible computer interfaces and search silos that made finding this "needle" – the right data for a particular user – in a haystack of information difficult.
Today, the problem is even more complicated because we are no longer talking about finding a needle in a haystack; we are talking about finding a needle in many haystacks (think distributed data).
There were easier opportunities for me for sure. I could have continued as a technical director, managing people. I could also have decided to take the safer, easier, well-travelled path – the legacy-think, silos approach to search.
But for me none of these would have been mentally challenging. I enjoy the challenge of solving problems that no one else is solving. I like to do something different, something beyond-search. There’s no intellectual challenge in imitating what others have already done.
What problems aren’t other people solving?
True search interoperability – the ability to make search work independent of devices, data storage frameworks or computing environments. This is what matters most to users today. You would think by now we would have really smart search that understands users’ contexts and morphs accordingly to find data, regardless of the devices or the dataspaces holding the data.
But legacy search solutions are clueless to the ways of the modern searcher.
That's a word I like to use: "clueless."
Yes, these systems don’t understand the user and user contexts. Most search systems don’t understand portable data devices, don’t understand wired and wireless networks, don’t understand distributed data and are oblivious to social networks. Search generally still looks for the needle in a haystack when there are many haystacks. And these systems are imperial: they force the user to conform to their narrow definition of search.
What are these barriers?
So, from the point of view of the modern user, legacy solutions are a hindrance because they create artificial barriers to findability. Consider this: in a world where users employ a panoply of mega-storage, portable and network-enabled devices to create and store data, desktop search engines know nothing about these devices.
Most mobile search systems are oblivious to the data on desktops. Web search cares little about portable, desktop or mobile devices. Behind-the-firewall (enterprise) search inexplicably treats the most important internal dataspaces, the corporate desktops that are at the center of information creation and productivity, as second-class search citizens. Clearly, we need a better search framework. Search needs to be brought into the 21st century.
How do you see your Universal Search technology getting around or through these roadblocks?
I like to talk about our approach as a Universal Search OS framework. Universal SearchOS was designed from the ground up to address how search should work in a new era. In designing Universal SearchOS, I set out to redefine search as a user would want it to work today.
The vision was a world where search is no longer bound to specific devices, operating systems, frameworks, locations or data types, but by the needs of the user. I call this User-Centered Search. This vision is what informs the architectural and operational capabilities of SearchOS. This is what sets us apart. We’re blazing a new trail, one informed by the future rather than one tied to search’s archaic roots.
To me, it is important to build the conceptual foundation and operational capabilities of 21st-century search on user profiles, interoperability, connectivity, universal data access, morphing technology, embeddable search, and security and privacy, rather than to continue with the silos concept. Concept extraction, automatic summarization techniques, probabilistic methods, semantic search, Bayesian algorithms and natural language processing won’t have much bite if they can’t reach most of the data. If you searched only 40% of the possible datasources (because of legacy restrictions, not because the other sources were irrelevant), have you found the most relevant documents? What is the impact of silo restrictions on recall and precision in the age of distributed data? How do a lack of search interoperability and limited data reach affect decision engines, concept extraction, automatic summarization, probabilistic methods, semantic search, Bayesian algorithms and natural language processing? We’ve got to rethink these in light of changes in user behavior and the across-the-board technological advances of the past 20 years.
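Dr. Akaezuwa's 40% question can be made concrete with a back-of-the-envelope calculation. The sketch below is ours, not Gaviri's, and the source names and counts are hypothetical. It assumes relevant documents are spread evenly across data sources and shows that a system able to reach only 40% of those sources caps recall at 40%, however good its ranking is.

```python
def recall_ceiling(relevant_per_source: dict[str, int], searched: set[str]) -> float:
    """Upper bound on recall when only the `searched` sources are reachable.

    Recall = relevant documents retrieved / all relevant documents.
    Even a perfect ranker cannot return documents from sources it never sees.
    """
    total = sum(relevant_per_source.values())
    if total == 0:
        return 1.0  # nothing relevant exists, so nothing can be missed
    reachable = sum(n for src, n in relevant_per_source.items() if src in searched)
    return reachable / total


# Hypothetical: ten sources, five relevant documents each, four sources reachable.
sources = {f"source-{i}": 5 for i in range(10)}
print(recall_ceiling(sources, {"source-0", "source-1", "source-2", "source-3"}))
```

If relevant documents cluster in the unreachable sources, the ceiling drops further; if they cluster in the reachable ones, it rises. The even spread is only an assumption.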
What are some of the features of Universal SearchOS?
One of the core features of SearchOS is that it is a universal operating system for search. It hides all the complexities of device connectivity, operating systems and network access and lets the user simply find what they are looking for. Specifically, the core features can be summarized as search interoperability; automatic e-discovery; universal file management across devices, networks and dataspaces; and portability. Portability allows SearchOS to run from any device capable of storing digital data (external drives, flash drives, wristwatches, digital cameras, iPods, picture frames, cars, trucks, airplanes, wearable devices, etc.) with the same industrial-strength search features and functionality. There are many more useful user-centered features, but I’ll stop at these.
Many beyond-search capabilities set Universal SearchOS apart. We have already touched on its search operating system behavior, search interoperability, native e-discovery, universal file management and portability. These in and of themselves are important, but there are many more unique capabilities. I’ll start with Search Morphing Technology (morphy): SearchOS morphs based on the user’s context. The software is mobile search when the user is mobile, network search when a network is present, desktop search when only a desktop is available and universal search when one or more devices are connected. It is behind-the-firewall search when the user walks into the office and Digital Home Search when the user is at home. SearchOS adapts (morphs) on the fly, with no user or IT intervention needed and no widgets or additional software to download. Morphy introduces the concept of software that adapts to the user’s context, rather than the current approach of forcing users to adapt to inflexible software.
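The morphing behavior described above amounts to choosing search scopes from the user's context. Here is a minimal sketch of that idea. It assumes three hypothetical context signals (location, network availability, attached devices) and illustrates the concept only. It is not how SearchOS actually detects context.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    # Hypothetical context signals; SearchOS's real inputs are not documented here.
    location: str = "unknown"          # "office", "home", "mobile", or "unknown"
    network_available: bool = False
    attached_devices: list[str] = field(default_factory=list)


def search_modes(ctx: Context) -> list[str]:
    """Pick the search scopes to activate for the current context."""
    # Base scope: the device the user is working on.
    modes = ["mobile"] if ctx.location == "mobile" else ["desktop"]
    if ctx.attached_devices:
        modes.append("universal")            # external drives, cameras, iPods, etc.
    if ctx.network_available:
        modes.append("network")
    if ctx.location == "office":
        modes.append("behind-the-firewall")
    elif ctx.location == "home":
        modes.append("digital-home")
    return modes


print(search_modes(Context(location="office", network_available=True)))
# ['desktop', 'network', 'behind-the-firewall']
```

The modes are cumulative rather than exclusive. A laptop in the office with a USB stick attached searches all three scopes at once, which matches the "many devices, many data spaces" framing.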
Two, sandbox indexing and searching techniques (or index hiding). Sandbox indexing provides a base layer of built-in data privacy. Three, User-Centered Universal Data Access. SearchOS automatically makes available all datasources that the user has access to, based on the user’s login credentials. Additional datasources can be added to search on the fly, with no programming, scripting or downloads required. Four, Spheres Of Search Influence (SOSI), for defining user and administrator spheres of influence. SOSI inverts the search flow so that search is initiated from the user sphere rather than from a server or administrator sphere of influence. Five, Zero-Deployment Search (ZD-Search). Because of its User-Centered foundation and Spheres Of Search Influence design, SearchOS is a zero-deployment behind-the-firewall search solution. Six, Search-Device Integration (SDI). SDI integrates device knowledge into search, so search is ‘aware’ of a user’s portable data devices. Seven, Modular, Dynamic Search Logic (MDSL). Eight, Embeddable Search (ESD). ESD allows SearchOS to be embedded in firmware on a device or microprocessor. Nine, Trusted Search Technology. SearchOS provides trusted search by way of trusted devices (portable device registry technology), user profiles, sandbox indexing and spheres of search influence techniques. Finally, Plug and Play. SearchOS is plug and play; no configuration or additional work is needed to get this beyond-search technology to work.
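As described, User-Centered Universal Data Access means the searchable set is derived from the user's credentials and can grow without reconfiguration. The sketch below assumes a simple group-based access rule. The class names, source names and group names are hypothetical, and this is not the SearchOS access model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    required_group: str  # hypothetical access rule: members of this group may search it


class DataSourceRegistry:
    """Tracks known data sources and exposes only those a user may search."""

    def __init__(self) -> None:
        self._sources: list[DataSource] = []

    def add(self, source: DataSource) -> None:
        # A newly added source is searchable immediately; nothing else to configure.
        self._sources.append(source)

    def visible_to(self, user_groups: set[str]) -> list[str]:
        # Filter by the groups carried in the user's login credentials.
        return [s.name for s in self._sources if s.required_group in user_groups]


registry = DataSourceRegistry()
registry.add(DataSource("my-laptop", "staff"))
registry.add(DataSource("hr-share", "hr"))
print(registry.visible_to({"staff"}))   # ['my-laptop']
registry.add(DataSource("team-wiki", "staff"))
print(registry.visible_to({"staff"}))   # ['my-laptop', 'team-wiki']
```

Because filtering happens at query time from the user's own credentials, search starts in the user's sphere. That is the inversion the SOSI description points to, though the real mechanism may differ.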
In addition to these, we have advanced search queries, search-as-you-type, the ability to index and read email even if Outlook is not installed, Boolean logic, proximity operators, fielded search, advanced stemming, synonyms, stop words, phrase search, hit highlighting, folder-specific search, support for ID3 tags and full multi-field sorting.
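Boolean logic and proximity operators are standard features, and the usual way to support both is a positional inverted index. The sketch below shows that textbook technique, not Gaviri's implementation. The tokenizer is deliberately crude, with no stemming, synonyms or stop words.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric runs; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    def __init__(self) -> None:
        # term -> {doc_id -> [token positions]}
        self.postings: dict[str, dict[int, list[int]]] = defaultdict(dict)

    def add(self, doc_id: int, text: str) -> None:
        for pos, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def boolean_and(self, *terms: str) -> set[int]:
        """Documents containing every term (Boolean AND)."""
        doc_sets = [set(self.postings.get(t.lower(), {})) for t in terms]
        return set.intersection(*doc_sets) if doc_sets else set()

    def near(self, a: str, b: str, k: int) -> set[int]:
        """Documents where a and b occur within k tokens of each other, in either order."""
        pa = self.postings.get(a.lower(), {})
        pb = self.postings.get(b.lower(), {})
        return {
            doc for doc in pa.keys() & pb.keys()
            if any(abs(i - j) <= k for i in pa[doc] for j in pb[doc])
        }


idx = PositionalIndex()
idx.add(1, "Quarterly report attached to the email")
idx.add(2, "The email mentions a report from last year about quarterly numbers")
print(sorted(idx.boolean_and("email", "report")))   # [1, 2]
print(sorted(idx.near("quarterly", "report", 1)))   # [1]
```

Storing positions costs extra space compared with a plain document list. The same postings, however, answer AND queries, proximity queries and (with adjacent positions) phrase queries.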
Our entire approach is beyond search, I think. As you can see, we do not think in terms of desktop, behind-the-firewall (enterprise), Web or mobile search. We use these terms only as familiar frames of reference for our customers and for licensing.
What's the floor licensing fee?
For the Digital Home solution, the licensing fee is $50. For this amount, the user may run SearchOS on up to fifteen devices: home servers, network drives, digital TVs, picture frames, home PCs, laptops and gaming consoles. The Professional version is $40. This license allows the user to run SearchOS on up to eight devices. Licensing for enterprises and OEMs varies, depending on the number of users and devices. SearchOS brings unrivalled value to behind-the-firewall search. For one, it is orders of magnitude less expensive than a legacy solution, and it improves productivity more quickly and by more than 50%. It is zero-deployment, and even though it can index billions of documents, it does not require the purchase of additional servers.
How can you index so many documents without additional servers? How many documents can the system index? What's the content processing throughput on a dual-core, dual-processor Dell 3950 with SATA drives and eight gigabytes of RAM?
SearchOS can index so many documents without additional servers because of its Sandbox, Distributed Indexing Architecture. Let us take a behind-the-firewall setting with one thousand users and maybe six servers. Each user indexes all the documents within their sphere of search influence (desktops, portable storage devices, etc.) using their PC or laptop. If we assume that each user has three million documents, we would index three billion documents. And if each of the six servers has fifty million documents, we would index 300 million documents on the servers, for a total of 3.3 billion. Indexing is distributed: each user’s machine indexes its own documents.
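The arithmetic in that example is easy to check. This small helper is ours, and its parameter names are hypothetical. It reproduces the figures and makes the design point visible: if each machine indexes only its own documents, no single index holds more than 50 million documents, even though the deployment as a whole covers 3.3 billion.

```python
def distributed_index_size(users: int, docs_per_user: int,
                           servers: int, docs_per_server: int) -> dict[str, int]:
    """Totals for the client-plus-server scenario described in the interview."""
    client_side = users * docs_per_user        # indexed on each user's own PC
    server_side = servers * docs_per_server    # indexed on the shared servers
    return {
        "client": client_side,
        "server": server_side,
        "total": client_side + server_side,
        # Assumes each machine indexes only its own documents.
        "largest_single_index": max(docs_per_user, docs_per_server),
    }


print(distributed_index_size(1_000, 3_000_000, 6, 50_000_000))
# {'client': 3000000000, 'server': 300000000, 'total': 3300000000, 'largest_single_index': 50000000}
```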
By design, SearchOS can index as many documents as are available on a device. We did not test on the specific hardware configuration you mentioned, but SearchOS’ processing throughput on a dual-core processor is about 3,000 documents a minute. Keep in mind that SearchOS does full-text, not partial-text, indexing, so the number may be lower if a system has many large documents (over 500 MB). The software has a CPU utilization throttle that allows a user or a sysadmin to accelerate or decelerate content processing to match available system resources. SearchOS not only morphs to user contexts but can also be scaled to a device’s content processing capabilities. Device scaling, which I did not mention among the capabilities that set us apart, is necessary given the range of systems SearchOS must run on, from resource-challenged PDAs to high-powered servers.
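The quoted rate of about 3,000 documents a minute gives a rough estimate of how long an initial index takes. The sketch assumes throughput scales linearly with the CPU throttle setting. That is our simplification, not documented SearchOS behavior, and real rates will be lower for collections with many large files.

```python
def indexing_hours(num_docs: int, docs_per_minute: float = 3_000,
                   throttle: float = 1.0) -> float:
    """Estimated hours to index num_docs at a throttled fraction of full speed."""
    if not 0 < throttle <= 1:
        raise ValueError("throttle must be in the range (0, 1]")
    effective_rate = docs_per_minute * throttle   # assumed linear in throttle
    return num_docs / effective_rate / 60


# One user's three million documents at full speed, then at half speed.
print(round(indexing_hours(3_000_000), 1))                # 16.7
print(round(indexing_hours(3_000_000, throttle=0.5), 1))  # 33.3
```

On this estimate, a user's three million documents would take roughly an overnight-plus run at full speed. That is where a throttle earns its keep, because indexing can continue in the background without monopolizing the machine.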
The number of new companies entering the search and content processing "space" is increasing. Are there too many competitors in this market sector given today's economic situation?
To some extent, there will be a shakeout in the search and content processing industry. The companies that can provide real search value will survive. But I think there are lots of search opportunities. As you well know, there is an explosion of user-generated content – no one really has a handle on this yet. There are also lots of vertical content-processing possibilities – medical records, real-time search, facial recognition search, financial data search, location-based search, voice-activated search and what I call Digital Home content processing opportunities. In my view, there are chocolate chip cookies in each of these verticals. The successful companies will be the ones who know where the cookie jar is and are able to get there faster.
What are the functions that you want to deliver to your customers?
We want to continue to enhance the search interoperability, universal file management and search portability functions which we have pioneered. There’s a lot more we can do for our customers in these areas. We really have barely scratched the surface.
What are two or three of the key features you are implementing or will implement?
The two features we are considering relate to what I call smart sets and enhancements to cloud computing. How about if we just leave it at that?
There's a push to create mashups – that is, search results that deliver answers or reports. What's your view of this trend?
Mashups are a good idea. But mashups will be much more potent if they are based on a Universal Data Access foundation as opposed to a Data Silo foundation. If the silo foundation underpinning the mashup prevents the decision engine from accessing all the data needed to formulate the right answer or generate the right report, then the ability of mashups to deliver accurate answers and reports will be undermined. As you may already have noticed, universal data access is a big issue in search as far as I am concerned.
Are you supporting other vendors' systems or are you a stand-alone solution? What differentiates your approach from the systems available? Can you be more specific? What are some of the key differentiators?
SearchOS is designed to work with other systems. Its inverted search flow approach allows the user to run search from any device and connect to other systems. SearchOS may use connectors, federated search techniques or SQL to talk to other systems. The key differentiator is the ease with which SearchOS integrates with other systems.
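Of the three integration routes mentioned, federated search is the easiest to illustrate. One common approach, among several, sends the query to every connector and normalizes each engine's scores before merging, because raw scores from different engines are not comparable. The connectors below are hypothetical stand-ins, not SearchOS interfaces, and min-max normalization is our choice for the sketch.

```python
from typing import Callable

# A connector takes a query and returns (document id, engine-specific score) pairs.
Connector = Callable[[str], list[tuple[str, float]]]


def federated_search(query: str, connectors: dict[str, Connector],
                     limit: int = 10) -> list[tuple[str, str, float]]:
    """Query every connector and merge results on a common 0..1 score scale."""
    merged: list[tuple[str, str, float]] = []
    for source, search in connectors.items():
        results = search(query)
        if not results:
            continue
        scores = [score for _, score in results]
        lo, hi = min(scores), max(scores)
        for doc_id, score in results:
            # Min-max normalization so scores from different engines are comparable.
            norm = 1.0 if hi == lo else (score - lo) / (hi - lo)
            merged.append((source, doc_id, norm))
    merged.sort(key=lambda r: r[2], reverse=True)  # stable: ties keep source order
    return merged[:limit]


# Hypothetical connectors standing in for a SQL database and a file share.
connectors = {
    "sql": lambda q: [("row-17", 12.0), ("row-4", 3.0)],
    "fileshare": lambda q: [("a.docx", 0.9), ("b.pdf", 0.5), ("c.txt", 0.1)],
}
for hit in federated_search("budget", connectors, limit=3):
    print(hit)
```

Min-max normalization is simple but treats every source's best hit as equally good. Production systems often weight sources or use rank-based fusion instead.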
Semantic systems have been getting quite a bit of coverage, yet the Powerset technology and other semantic players like Hakia.com have been slow out of the gate. What's your view on semantics and natural language processing? Are these technologies ready for prime time?
Semantics and natural language processing (NLP) systems are great search solutions if the technologies can be made to work correctly. But this is a tall order because language is complex even for humans. Getting semantics and NLP to understand what the user is looking for will not be an easy task, although I will say these technologies have made significant progress in extracting meaning from queries. There are several obstacles. First, as you know, semantics and NLP require an enormous amount of CPU processing. Second, today’s searchers use one or two search terms, and semantics and NLP don’t work well with so few terms because extracting meaning from one or two words is difficult. Third, semantics and NLP processing work well with small collections, but the reality is that we are facing a data explosion. How quickly and how well semantics and NLP solve these problems will be critical. I think these systems will be ready for prime time soon because big brains are working on the problems.
A number of vendors have shown me very fancy interfaces. The interfaces take center stage and the information within the interface gets pushed to the background. Are we entering an era of eye candy instead of results that are relevant to the user?
Users won’t be fooled by eye candy. My view is that we are not entering an era where nice interfaces will rule over relevant results. Vendors are always looking for differentiators, so some vendors may think sleeker interfaces are it. But success in the search industry has always been predicated on the ability to return relevant results. People search to find relevant answers, not to marvel at fancy interfaces. I don’t see anything on the horizon, as far as user behavior and expectations are concerned, that suggests eye candy is the way to go.
What is it that you think people are looking for from semantic technology? What is that discontent that you describe? Is it richer interface, better content? What are people really looking for?
People are looking for relevant answers to their queries, and they want the answers with minimal effort and a simple interface. Richer interfaces are nice but worthless in and of themselves. Better content is always a good thing, but relevant content is what was, and still is, important. Semantic technology is a great idea, but it still faces serious problems with regard to ‘understanding’ users’ queries in context and returning relevant results. The technology is improving, but it still has a way to go.
What are the hot trends in search for the next 12 to 24 months? How will you take advantage of them: for example, by going public, partnering, or selling to a larger firm?
Universal Search and embedded search are the hot trends in my view. As pioneers of universal search and embeddable search, we already have products that address these trends in a unique way. We foresaw these trends a long time ago. When we started working on the idea of universal search, we had difficulty convincing VCs and search experts that a Universal SearchOS was a possibility. We were told the idea was ‘pie in the sky’ and ‘a solution looking for a problem,’ and we were, in some instances, laughed out the door. I am happy we stuck to the vision and now have a viable product in the nick of time. We intend to take advantage of our first-to-market position by partnering with other forward-looking companies.
Where can people get more information?
Please ask people to navigate to our Web site at www.gaviri.com.
After the interview, I learned that Dr. Akaezuwa has an office in South Africa. In addition, he works with a number of organizations to make technology a part of their learning experience. We use the desktop version of Dr. Akaezuwa's system to deal with email archives and use the system for certain types of information projects where the contents of the collections are not known in advance.
As customers, the ArnoldIT.com team has found the system to be speedy, stable, and adept at returning relevant results. We recommend a close look by individuals and organizations interested in a next-generation search system with considerable value for forensic search and retrieval.
Stephen E. Arnold, September 8, 2009