Interview Exclusive: Exalead’s New US Chief Executive Officer
January 5, 2009
On January 2, 2009, I spoke with Paul Doscher, the newly appointed chief executive officer for Exalead, the Paris-based information access company. I received a preview of Exalead technology in November 2008, and I will summarize some of my impressions in a short white paper on my ArnoldIT.com Web site in the next few days.
The full text of my interview with Mr. Doscher appears below:
Why are you expanding in the US market? What’s your background?
Exalead has seen tremendous growth in Europe over the past few years and unlike some of our competitors, our clients are with us for the long haul. We enjoy 100% customer referenceability in Europe. The US represents a significant growth engine for Exalead and we believe we are in a unique position not just to grow our US business – but to help redefine the information access industry.
I have been in the computer software space for 30 years, starting in sales and sales management and eventually leading to my most recent role as CEO. I have worked in companies such as Oracle, Business Objects and VMware. Before becoming CEO of Exalead, Inc., I was CEO of JasperSoft, the leading open source business intelligence company.
What is the major content processing problem your system solves?
This is a new era in information access. In business, valuable information is increasingly stored in silos – dozens of various locations and data formats – that are hard to retrieve in a way that provides necessary context to the end user. Exalead CloudView has been designed to make sense of the structured and unstructured data found both internally behind the firewall and from external sources. Exalead offers quick-to-implement information access solutions that help workers, partners and customers make better, faster and more accurate business decisions.
What is the basis of your firm’s technical approach?
Exalead provides a highly scalable and manageable information access platform built on open standards. Exalead transforms raw data, whatever its nature, into actionable intelligence through best of breed indexing, extraction and classification technologies.
Can you give me an example of your system in action? You don’t have to mention a company name, but I am interested in what the problem was and what your system delivered to the customer?
Exalead is moving beyond what people generally think of when they think about enterprise search. I’ll give you two examples – one that discusses an innovative use case of searching structured data. The second discusses unstructured data.
First is an example of our dealing with structured data. GEFCO, a €3.5 billion company, ranks among Europe’s leading transport and logistics firms. They are using Exalead to track their vehicles. GEFCO’s new “Track and Trace” application is built upon Exalead’s flagship platform that offers powerful search functionality and can provide up-to-the-minute information from an extremely large data set. Integrated into GEFCO’s Internet portal Gefconet, Track and Trace allows GEFCO staff, partners and customers to locate the exact position of vehicles, track their progress and optimize transport schedules in real time.
Second is a project where we search and make sense of unstructured data. Our engineers at Exalead built an unreleased project called Restminer – a site aimed at helping find restaurants in a large city like New York City. What we do here is interesting. Restminer gives the user useful, structured information extracted from the unstructured web including dedicated press, blog posts, restaurant reviews, directories – with relevant tips coming from different sources.
Exalead is a French-owned company. What’s the customer footprint? As you look forward what is your goal for the footprint in 2009?
At the end of 2008, we had around 190 customers across multiple vertical markets including on-line media/publishing, social networking, the public sector, on-line directories, financial services and telecommunications. We are looking for 50% growth in our customer base in 2009.
The Exalead software was quite solid? What are the benefits your system delivers to a typical enterprise customer? Is it search or another type of solution?
Exalead provides information access and search solutions in basically three market segments: OEM, B2C and B2B.
In the OEM [original equipment manufacturing] market, software companies have realized what a powerful embedded search platform can bring to their own solution. ISVs [independent software vendors] enrich their functional capabilities by introducing new sources of content and more powerful access retrieval into their core applications.
In the B2C space, consumer web sites such as our customer RightMove in the UK (www.rightmove.co.uk) are finding that a highly scalable information access solution can save on hardware costs and make their visitors’ experience much better. Globally, we are seeing sites use our cutting edge semantic mash-up technologies to bring together search results from video, audio and text, such as http://virgilio.alice.it/ in Italy.
For our B2B customers, we are seeing companies implement real-time search across multiple data repositories. Any search platform tied to mission critical business applications has to be flexible, scalable and fast. Exalead’s product is used in various mission critical implementations, including tracking and tracing trucks, operational reporting and large scale document searches.
I recall hearing that your firm has patented technology? Can you provide me with a snapshot of this invention? What’s the patent application number? How many patents does your firm have? What are the key features of the Exalead CloudView system?
Exalead has a significant number of patents granted and pending both in the US and EU relating to the areas of intelligent searching, indexing, keyword extraction and other aspects of the search technology. For example, US Patent 7152064 was issued to Exalead in 2006, providing for improved unified search results – allowing for end users to more easily navigate and refine complex search results.
Our explosive growth continues to drive innovation and functionality into our products – we continue to submit for new patents as our product expands.
In the OEM sector, Autonomy seems to be the giant with its OEM deals with BEA and the Verity OEM deals. Some of the Verity deals date from the late 1980s. How do you see Exalead fitting into this sector?
There is always a place for innovation. We are confident in our capabilities and how they can meet the growing demands of OEMs.
We are beginning to see customers move away from our competitors’ legacy OEM solutions. We provide an easy to implement, scalable and manageable solution. Also, we see growing demand for our simpler licensing model – which makes life much easier for our customers.
Exalead OEM has the same rich features as our other product platforms such as Enterprise Search Edition and the 360 Edition. No matter how huge the volume of information processed by the OEM application, Exalead CloudView provides an easy to implement SOA architecture. OEM customers build applications that search their own system’s content – as well as from any kind of other sources that can be relevant. OEMs can dramatically increase their product functionality and differentiation by adding search of external Web sites, external knowledge bases and building in new hybrid services using our developer kit.
There’s quite a bit of turmoil in search. In fact, the last few weeks Alexa (an Amazon company) closed its web search unit and Lycos Europe (which purchased software from my partner and me in the mid 1990s) said it would close up shop. What’s that mean for Exalead going forward?
Our web search engine is available at www.exalead.com/search. Based on CloudView, it provides Internet users with an innovative way of discovering results and content from the Web’s 8 billion+ pages. Web search has always been a real world lab to test our technologies and user features – some of which, like facial recognition, have been implemented on Exalead well in advance of their use on other major search sites. But, more than this, we consider the Web as a key source of information – competitive intelligence, partner information, customer information, legal documents, external database providers, blogs, etc. There is more and more key information on the web that enterprises need to manage effectively. Exalead Web search is key in the overall Exalead strategy – and the functionality on our Internet search site will continue to drive innovation in our information access platform.
One trend in enterprise content processing is the shift from results lists to answers. Among the companies in this sector are Relegence (a Time Warner company), Connotate (privately held but backed by Goldman Sachs), and Attivio (a company describing itself as delivering active intelligence). Each of these firms is really in the search business but positioning search as “intelligence”. What’s your take on the changing face of search in an organization?
If making information instantly available for decisions is intelligence, we definitely are working in the information intelligence business. Our approach is driven by customer demand for TCO and ROI – we bring real value to businesses looking to make better, faster decisions. For example, at our customer GEFCO, structured data is available in real time for staff and customers so transportation cycles can be adjusted in real time – significantly improving their bottom line.
As the economic crisis deepens, we continue to see our partners such as Capgemini, Logica, and Sogeti come up with new, exciting solutions for Exalead CloudView for their customers.
Google has been a disruptive force in search. In one major US government agency, different Google resellers have placed search appliances, often at $400,000 a unit. No single person realized that there were more than $6 million worth of devices. As a result, the project to “fix” search means that Google is the default search system. What are the challenges and opportunities Google presents to Exalead? What about the challenges and opportunities Microsoft presents with its strong grip on the desktop and a growing presence in servers?
Ironically, former Google and Microsoft customers fuel much of our sales funnel – so we appreciate and benefit from everyone’s niche in this marketplace.
Google raised end-user expectations about what web search can achieve – it brought a new level of simplicity, relevancy and interactivity. But as we’ve seen as more Google Enterprise Search customers move to Exalead – bringing this functionality to enterprises is a different matter altogether.
Google Enterprise Search has technical and functional limits in terms of scalability, security compliance, the ability to search structured and unstructured data and the ability to provide all the necessary context to make a search relevant. Enterprises know that information access means more than a flat list of results – which is driving more companies to look at Exalead.
Microsoft and its acquisition of FAST Search & Transfer brought many opportunities to us as well. For example, we’ve seen a growing number of companies who use Linux or other non-Microsoft operating systems look for a new partner instead of Microsoft.
Mobile search is slowly making headway. Some of the push has been because of the iPhone and Google’s report that queries on an iPhone are higher than from users with other brands of smart phones. What does Exalead provide for mobile search?
Exalead is actively working with mobile companies and telcos in a number of ways. We launched an iPhone search www.exalead.com/iphone in Europe. We are also working with mobile companies to help connect mobile devices to PCs and help accelerate access to mobile content. We will announce more of this functionality in 2009.
The economic climate is quite weak. How is Exalead adjusting to this global problem? I have heard that you have built out a US office with more than two dozen people? Is that correct?
We met all of our aggressive sales numbers in 2008 – in large part because our technologies provide our customers a high return on their investment. We unleash new levels of information access and allow better, faster decision-making. So far, it appears the appetite for our offerings is growing in this economic climate.
What are the three major trends you see with regards to search and content processing in 2009?
The biggest trend we see in 2009 is that search will become a development platform. Open product platforms like Exalead will become a platform for new, unexpected solutions by 3rd party vendors.
Other big trends in 2009 will be a continuation of what we’ve seen over the past few years: smarter context around search results and better searching of rich content including audio and video.
Can you hint at what’s coming in 2009 in terms of features in the CloudView system?
The launch of Exalead CloudView 360 later this year will be a game changer for the industry. Exalead CloudView 360 will have functionality that will transform heterogeneous corporate data into contextualized building blocks of business information that can be directly searched and queried – and allow for an explosion of new applications to be built on top of the platform.
Stephen Arnold, January 5, 2009
Natural Search: SEO Boffin Changes His Spots
January 2, 2009
The crackle of gun fire echoed through the hollow this morning. I am not sure if my neighbors are celebrating the new year or just getting some squirrel for a mid day burgoo. As I scanned the goodies in my newsreader, I learned about a type of search that had eluded me. I want to capture this notion before it dribbles off my slippery memory. Media Post reported in “Search Insider: The Inside Line on Search Marketing” that 2009 is ripe for “natural search”. The phrase appears in Rob Garner’s “Measuring Natural Search Marketing Success” here. The notion (I think) is that content helps a Web site come up in a results list. I had to sit down and preen my feathers. I was so excited by this insight I was ruffled. For me the most important comment was:
For starters, think of an investment in natural search as a protection for what you are currently getting from natural search engines across the board. Good natural search advice costs are a drop in the bucket compared to returns from natural search, and the risk of doing harm only once can far exceed your costs, and even do irreparable damage. I see clients with returns coming from natural search at over one half-billion to one billion dollars a year or more, and one simple slip could cost millions.
I must admit that I have to interpolate to ferret the meaning from this passage. What I concluded (your mileage may differ) is that if you don’t have content, you may not appear in a Google, Microsoft, or Yahoo results list.
What happened to the phrase “organic search”? I thought it evoked a digital Euell Gibbons moving from Web site to Web site, planting content seeds. “Natural search” has for me a murkier connotation. I think of Tom’s toothpaste, the Natural Products Association, and Mold Cleaner Molderizer.
My hunch is that Google’s tweaks to its PageRank algorithm place a heavy load on the shoulders of the SEO consultants. I have heard that some of the higher profile firms (which I will not name) are charging five figure fees and delivering spotty results. As a result, the SEO mavens are looking for a less risky way to get a Web site to appear in the Google rankings.
Mr. Garner is one of the first in 2009 to suggest that original content offering useful information to a site visitor is an “insurance policy”. I don’t agree. Content is the life support system of a Web site. You buy insurance for your automobile and home.
Stephen Arnold, January 1, 2009
Duplicates and Deduplication
December 29, 2008
In 1962, I was in Dr. Daphne Swartz’s Biology 103 class. I still don’t recall how I ended up amidst the future doctors and pharmacists, but there I was sitting next to my nemesis Camille Berg. She and I competed to get the top grades in every class we shared. I recall that Miss Berg knew that there were five variations of twinning: three dizygotic and two monozygotic. I had just turned 17 and knew about the Doublemint Twins. I had some catching up to do.
Duplicates continue to appear in data just as the five types of twins did in Bio 103. I find it amusing to hear and read about software that performs deduplication; that is, the machine process of determining which item is identical to another. The simplest type of deduplication is to take a list of numbers and eliminate any that are identical. You probably encountered this type of task in your first programming class. Life gets a bit more tricky when the values are expressed in different ways; for example, a mixed list with binary, hexadecimal, and real numbers plus a few more interesting versions tossed in for good measure. Deduplication becomes a bit more complicated.
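The mixed-notation case can be sketched in a few lines of Python. This is only an illustration, not a production method: the function names `normalize` and `dedupe` are made up for this example, and it assumes binary values carry a `0b` prefix, hexadecimal values a `0x` prefix, and everything else is a decimal or fraction string. Each value is reduced to one canonical form before comparison, and the first spelling seen is the one kept.

```python
from fractions import Fraction


def normalize(token: str) -> Fraction:
    """Reduce a numeric string to one canonical value.

    Assumes "0b..." means binary and "0x..." means hexadecimal;
    anything else is parsed as a decimal or fraction string.
    """
    text = token.strip().lower()
    if text.startswith(("0b", "0x")):
        # Base 0 tells int() to honor the prefix.
        return Fraction(int(text, 0))
    return Fraction(text)


def dedupe(tokens):
    """Keep the first spelling of each distinct value, in input order."""
    seen = set()
    unique = []
    for token in tokens:
        key = normalize(token)
        if key not in seen:
            seen.add(key)
            unique.append(token)
    return unique


mixed = ["10", "0b1010", "0xA", "10.0", "3.5", "7/2", "11"]
print(dedupe(mixed))  # ['10', '3.5', '11']
```

The hard part is the normalization rule, not the comparison. Once every value is in one form, removing identical items is the first-programming-class exercise again.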
At the other end of the scale, consider the challenge of examining two collections of electronic mail seized from a person of interest’s computers. There is the email from her laptop. And there is the email that resides on her desktop computer. Your job is to determine which emails are identical, prepare a single deduplicated list of those emails, generate a file of emails and attachments, and place the merged and deduplicated list on a system that will be used for eDiscovery.
Here are some of the challenges that you will face once you answer this question, “What’s a duplicate?” You have two allegedly identical emails and their attachments. One email is dated January 2, 2008; the other is dated January 3, 2008. You examine each email and find that the difference between the two emails is a single slide included in one of the two PowerPoint decks. What do you conclude?
- The two emails are not identical, so include both emails and both attachments.
- The earlier email is the accurate one, so exclude the later email.
- The later email is the accurate one, so exclude the earlier email.
Now consider that you have 10 million emails to process. We have to go back to our definition of a duplicate and apply the rules for that duplicate to the collection of emails. If we get this wrong, there could be legal consequences. A system developer who generates a file of emails by letting a mathematical process decide that a record is different may be using a method too crude for the problem in the context of eDiscovery. Math helps, but it is not likely to handle the onerous task of determining near matches and the reasoning required to determine which email is “the” email.
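To make the gap between exact and near matches concrete, here is a minimal Python sketch using only the standard library. It is an assumption-laden illustration, not how any eDiscovery product works: the names `fingerprint` and `classify` are hypothetical, the 0.9 threshold is arbitrary, and only message bodies are compared (no headers, dates, or attachments). A hash catches identical text; a similarity ratio can only flag a near match for a person to review.

```python
import hashlib
from difflib import SequenceMatcher


def fingerprint(body: str) -> str:
    """Hash the body after collapsing whitespace and lowercasing."""
    normalized = " ".join(body.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def classify(a: str, b: str, near_threshold: float = 0.9) -> str:
    """Label a pair of bodies as 'exact', 'near', or 'distinct'.

    The threshold is an arbitrary value chosen for illustration.
    """
    if fingerprint(a) == fingerprint(b):
        return "exact"
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return "near" if ratio >= near_threshold else "distinct"


jan2 = "Please review the attached deck before the Friday meeting."
jan3 = "Please review the attached deck before the Monday meeting."
print(classify(jan2, jan2.upper()))        # exact
print(classify(jan2, jan3))                # near
print(classify(jan2, "Lunch on Tuesday?")) # distinct
```

A "near" result answers none of the three questions in the list above. It only tells you that a rule, or a human, must decide which email is “the” email.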
Which is Jill? Which is Jane? Parents keep both. Does data work like this? Source: http://celebritybabies.typepad.com/photos/uncategorized/2008/04/02/natalie_grant_twins.jpg
Here’s another situation. You are merging two files of credit card transactions. You have data from an IBM DB2 system and you have data from an Oracle system. The company wants to transform these data, deduplicate them, normalize them, and merge them to produce one master “clean” data table. No, you can’t Google for an offshore service bureau; you have to perform this task yourself. In my experience, the job is going to be tricky. Let me give you one example. You identify two records which agree in field name and data for a single row in Table A and Table B. But you notice that the telephone number varies by a single digit. Which is the correct telephone number? You do a quick spot check and find that half of the entries from Table B have this variant, or you can flip the analysis around and say that half of the entries in Table A vary from Table B. How do you determine which records are duplicates?
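One way to start on the telephone number puzzle is to profile the variance before deciding anything. The Python sketch below is a hypothetical illustration: the function names and the sample numbers are invented, and it assumes the rows from Table A and Table B have already been paired on their other fields. It strips punctuation, finds pairs whose numbers differ in exactly one digit, and counts which digit position differs. If the differences pile up at one position, a systematic cause such as a conversion or export quirk is more plausible than random keying errors.

```python
import re
from collections import Counter


def digits(phone: str) -> str:
    """Drop everything except digits so formatting differences vanish."""
    return re.sub(r"\D", "", phone)


def single_digit_diff(a: str, b: str):
    """Return the index of the one differing digit, or None.

    None covers identical numbers, numbers of different length,
    and numbers that differ in more than one place.
    """
    da, db = digits(a), digits(b)
    if len(da) != len(db):
        return None
    diffs = [i for i, (x, y) in enumerate(zip(da, db)) if x != y]
    return diffs[0] if len(diffs) == 1 else None


def profile_variants(pairs):
    """Count single-digit differences by digit position."""
    positions = Counter()
    for a, b in pairs:
        pos = single_digit_diff(a, b)
        if pos is not None:
            positions[pos] += 1
    return positions


# Invented sample data: (Table A phone, Table B phone) for matched rows.
matched = [
    ("502-555-0101", "502-555-0111"),
    ("(502) 555-0147", "502 555 0157"),
    ("502-555-0199", "502-555-0199"),
    ("502-555-0123", "502-555-0123"),
]
print(profile_variants(matched))  # Counter({8: 2})
```

The count does not say which table is right. It tells you where to look, and whether one rule might settle half the file at once.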
Moore’s Law: Not Enough for Google
December 29, 2008
I made good progress on my Google and Publishing report for Infonortics over the last three days. I sat down this morning and riffed through my Google technical document collection to find a number. The number is interesting because it appears in a Google patent document and provides a rough estimate of the links that Google would have to process when it runs its loopy text generation system. Here’s the number as it is expressed in the Google patent document:
50 million million billion links
Google’s engineers included an exclamation point in US7231393. The number is pretty big even by Googley standards. And who cares? Few pay much attention to Google’s PhD-like technical documents. Google is a search company that sells advertising and until the forthcoming book about Google’s other business interests comes out, I don’t think many people realize that Moore’s law is not going to help Google when it processes lots of links–50 million million billion give or take a few million million.
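For a sense of scale, the figure can be written out with a few lines of Python. This is back-of-the-envelope arithmetic under loud assumptions: it takes “billion” in the US sense of 10^9, and the processing rate and machine count are invented round numbers for illustration, not figures from Google or the patent document.

```python
MILLION = 10**6
BILLION = 10**9  # assumption: US (short-scale) billion

# "50 million million billion" multiplied out.
links = 50 * MILLION * MILLION * BILLION
print(f"{links:.1e}")  # 5.0e+22

# Hypothetical throughput, chosen only to illustrate the scale:
# one million machines, each handling one billion links per second.
links_per_second = MILLION * BILLION
seconds = links // links_per_second
years = seconds / (365 * 24 * 3600)
print(seconds, round(years, 2))
```

Even with those generous made-up numbers, a single pass takes more than a year, which is why faster chips alone do not settle the matter.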
When I scanned “Sustaining Moore’s Law – 10 Years of the CPU” by Vincent Chang here, I realized that Google has little choice but to use fast CPUs and math together. In fact, the faster and more capable the CPU, the more math Google can use. Name another company worrying about Kolmogorov’s procedures?
Take a look at Mr. Chang’s article. The graph shows that the number of transistors appears to keep doubling. The problem is that information keeps growing and the type of analysis Google wants to do to use various probabilistic methods is rising even faster.
The idea that building more data centers allows Google to do more is only half the story. The other half is math. Competitors who focus on building data centers, therefore, may be addressing only part of the job when trying to catch up with Google. Leapfrogging Google seems difficult if my understanding of the issue is correct.
Getting Doored by Search
December 28, 2008
Have you been in Manhattan and watched a bike messenger surprised by a car door opening? The bike messenger loses these battles, which typically destroy the front wheel of the bike. When this occurs, the messenger has been doored. You can experience a similar surprise with enterprise search.
What happens when you get doored. Source: http://citynoise.org/author/ken_rosatio
The first situation is one that will be increasingly common in 2009. As the economy tanks, litigation is likely to increase. This means that you will need to provide information as part of the legal discovery process. You will get doored if you try to use your existing search system for this function. No go. You will need specialized systems and you will have to be able to provide assurance that spoliation will not occur. “Spoliation” refers to changing an email. Autonomy offers a solution, to cite one example.
The second situation occurs when you implement one of the social systems; for example, a Web log or a wiki. You will find that most enterprise search systems may lack filters to handle the content in blogs. Some vendors–for example, Blossom Search–can index Web log content. Exalead has a connector to index information within Blogger.com and other systems. However, your search system may lack the connector. You will be doored because you will have to code or buy a connector. Ouch.
The third situation arises when you need to make email searchable from a mobile device. To pull this off, you need to find a way to preserve security, prevent a user from deleting mail from her desktop or the mail server, and deliver results without latency. When you try this trick with most enterprise search systems, you will be doored. The fix is to tap a vendor like Coveo and use that company’s email search system.
There’s a small consulting outfit prancing around like a holiday elf saying, “Search is simple. Search is easy. Search is transparent.” Like elves, this assertion is a weird mix of silliness, fairy dust, and ignorance. If this outfit helps you deal with a “simple” search, prepare to get doored. It may not be the search system; it may be your colleagues.
Stephen Arnold, December 28, 2008
Google Translation Nudges Forward
December 27, 2008
I recall a chipper 20 something telling me what she learned in her first class in engineering; to wit, “Patent applications are not products.” As a trophy generation member, flush with entitlement, she is generally correct, but patent applications are not accidental. They are instrumental. If you are working on translation software, you may want to check out Google’s December 25, 2008, “Machine Translation for Query Expansion.” You can find this document by searching the wonderful USPTO system for US20080319962. Once you have that document in front of you, you will learn that Google asserts that it can snag a query, generate synonyms from its statistical machine translation system, and pull back a collection. There are some other methods in the patent application. When I read it, my thought was, “Run a query in English, get back documents in other languages that match the query, and punch the Google Translate button and see the source document in English.” Your interpretation may vary. I was amused that the document appeared on December 25, 2008, when most of the US government was on holiday. I guess the USPTO is working hard to win the favor of the incoming administration.
Stephen Arnold, December 27, 2008
The Future of EasyAsk: Depends on Progress
December 18, 2008
EasyAsk is a search system that works quite well. You can read EasyAsk Facts here. The company is now a unit of Progress Software. Progress began with a core of original code and over the years has acquired a number of companies. I think of the firm as a boutique, which is not what the Progress public relations people want me to keep in my tiny goose brain. I saw a news item about Progress Software’s most recent financial report. You can read a summary of the numbers here. If you want more detail, navigate to Google Finance here. The story is simple: earnings are down to $8.5 million from $15.8 million in the fourth quarter of 2007. With the economic climate in deep chill mode, Progress will have to retool its sales and marketing. If the downdraft continues, the company will have to make some tough decisions about which of its many products to hook up to life support. EasyAsk is like other search systems a complicated beastie. Search systems gobble up money, and the sales cycle is often long even when the MBAs are running at full throttle. When the MBAs are home worrying about their mortgage payments, the search business is likely to suffer. One warning sign: EasyAsk was not mentioned in the news release I read. This goose is accustomed to watching the weather for signs of a storm. My thought is that one might be building and heading the EasyAsk way. What’s your take? No PR people need reply, thanks.
Stephen Arnold, December
Leximancer Satmetrix Tie Up
December 18, 2008
Leximancer has partnered with Satmetrix so that company can utilize Leximancer’s Customer Insight Portal. Satmetrix provides software applications and consulting services to improve customer loyalty. Using “intuitive concept discovery” — semantic analysis — Leximancer develops responses on customer attitudes. Leximancer will provide customer analytics and unstructured text mining for Satmetrix’s Net Promoter, which automatically sifts and categorizes data from blogs, Web sites, social media, e-mails, service notes and survey feedback to increase companies’ customer loyalty, retention and growth. The focus on analyzing positive and negative trends in text entries from customers is key to speed and response for customer service-oriented companies. Satmetrix serves a wide spread of markets including telecommunications firms like Verizon and business services like Careerbuilder.
Jessica Bratcher, December 17, 2008
SharePoint: ChooseChicago
December 18, 2008
I scanned the MSDN Web log postings and saw this headline: “SharePoint Web Sites in Government.” My first reaction was that the author Jamesbr had compiled a list of public facing Web sites running on Microsoft’s fascinating SharePoint content management, collaboration, search, and Swiss Army Knife software. No joy. Mr. Jamesbr pointed to another person’s list which was a trifle thin. You can check out this official WSS tally here. Don’t let the WSS fool you. The sites are SharePoint, and there are 432 of them as of December 16, 2008. I navigated to the featured site, ChooseChicago.com. My broadband connection was having a bad hair day. It took 10 seconds for the base page to render and I had to hit the escape key after 30 seconds to stop the page from trying to locate a missing resource. Sigh. Because this was a featured site that impressed Jamesbr, I did some exploring. First, I navigated to the ChooseChicago.com site and saw this on December 16, 2008:
The search box is located at the top right hand corner of the page and also at the bottom right hand corner. But the search system was a tad sluggish. After entering my query “Chinese”, the system cranked for 20 seconds before returning the results list:
Expert System’s Luca Scagliarini
December 18, 2008
ArnoldIT.com’s Search Wizards Speak series has landed another exclusive. Hard on the heels of the interview with Autonomy’s chief operating officer, Luca Scagliarini, one of the senior executives at Expert System in Modena, Italy, explains the company’s technology and strategy for 2009. Mr. Scagliarini is a technologist’s technologist and a recognized leader in next generation search systems. The company’s COGITO technology has cut a wide swath through European markets and is now available in North America. Mr. Scagliarini told ArnoldIT.com’s Beyond Search:
A major mobile handheld manufacturer uses our technology to address the issue of supporting new users in learning how to use the device. The objective was to reduce the return rate of the device AND to reduce the customer support costs. This natural language-based solution leverages our semantic technology to provide their customers with a simple and effective tool to answer questions and how-to queries with consistency and high precision. As of today the system has answered, in only 5 months, more than 4 million questions with more than 87% precision.
Search is no longer key word matching and long lists of results. Mr. Scagliarini said:
To deliver an effective question and answer system that works on more than a small set of FAQ, it is very important to have a deep understanding of the text. This is possible only through deep semantic analysis. We have several implementations of our natural language Q&A product recently renamed COGITO Answer. In the next 12 months, we will be investing to expand our footprint worldwide–especially in the U.S. and in the Persian Gulf region to replicate our European success there. In the U.S., we are now supporting customer service operations with natural language Q&A for a government unit of the Department of the Interior and we are one of only 5 semantic partners actively promoted by Oracle.
You can read the complete interview with Mr. Scagliarini on the ArnoldIT.com Web site or you can click here. More information about the company and its technology may be found on the firm’s Web site http://www.expertsystem.net or click here.