Sinequa SA
An Interview with Jean Ferré
In December 2007, I visited the exhibits at the International Online Meeting in London. I snagged a technical paper titled "A New Generation Enterprise Search Solution". The analysis was the work of Sinequa SA, a company I was not familiar with. A 10-minute jaunt through the Byzantine hard-copy conference program told me that Sinequa had an exhibit. I made a note that when I was next in Paris, I would visit Sinequa and find out more about its flagship product, Sinequa CS. I met the urbane Jean Ferré at Le Café de la Paix, not far from the must-visit Opéra on the Boulevard des Capucines.
There were no monks present, but I saw some capuchine pigeons pecking in Paris's late afternoon rain. Very picturesque. Very Parisian. I spoke with the firm's managing director. The text of that interview appears below:
Thanks for meeting me here. Are there any monks nearby?
No, just tourists and pigeons--even in the rain as you see.
What's the origin of Sinequa? The Latin phrase sine qua means condition, doesn't it?
Sinequa grew out of an earlier technology company, CORA. CORA was founded in 1984 and from the beginning focused on automatic processing of language applied to information management. CORA hosted research projects and delivered services to several publishing companies and to some government agencies.
Over time, CORA evolved into a software vendor. In 2000, the name Sinequa was chosen to suggest analysis. The company also continued to develop its expertise and intellectual property in information access, of which search is a part. Even though the name changed, the company kept its publishing customers and expanded into government projects, industry, and eCommerce.
In 2004, Alexandre Bilger and I did a leveraged buyout. Alexandre is the chief architect, and I am the managing director. This means I get to do a little bit of everything.
Since then, Sinequa has focused on packaging a turnkey solution for enterprise search needs.
We are a little surprised at how much appetite the market has for our approach to information and search. We have tripled in size, measured in terms of financial turnover. If we stay on track, we will grow even faster in 2008. In 2006, we received some $6 million in backing from a venture capital firm. I am happy to say we are profitable.
Where do you fit in this very complex "search" market sector? Are you a search and retrieval system, a business intelligence system, a content processing system for business intelligence?
We are a search-and-retrieval system focused on the enterprise, built around our “Connect to Knowledge™” approach. What's different is that our technology is a self-contained package delivered in two formats:
First, we offer a flagship solution called Sinequa CS. I'm delighted to say that our sales doubled in 2007. Sinequa CS is a full-fledged packaged platform including connectivity, navigation, and of course the core engine. It is deployed in a large number of enterprises such as Bouygues, Arkema, MBDA, the French Army, EADS, Eurocopter, LCF Rothschild, the French Police, and others.
Second, we have what we call the OEM offer (original equipment manufacturer license). Another software company licenses our technology and uses it in its own enterprise system. Some OEMs embed our technology in enterprise applications, Web sites, or intranets.
Are there public Internet sites where I can use Sinequa?
Yes, when you get a chance, navigate to Wiko, LesEchos.fr, Radio France Internationale, LeMonde.fr, Agefi.fr, Europages, or Pixmania.com.
Are you able to mention some OEM customers?
Of course, but you cannot log into these systems. We have Instranet, BlueKiwi, and SGT as OEM partners deploying their packaged solutions worldwide. For example, Instranet is deployed in Russia, the US, and Western Europe, mainly in the telco and financial sectors.
Will you describe the technical concept that makes your company different from Autonomy, Endeca, or Google?
That's a good question, and we get asked about this quite often. Let me identify the main differences.
First, connectivity "out of the box" to more than 30 applications (ECM, CRM, file systems, RDBMS, ERP, e-mail, etc.), including security, extensive data and metadata indexing, and the ability to extend to legacy and homegrown enterprise applications via a connector SDK.
This means that Sinequa can be deployed without the manual work required by some of the systems you mention. Our customers don't want to build a system. Our customers want a solution. Here's an example of what Sinequa delivers.
It typically takes two hours to connect to a given application with a packaged Sinequa connector and typically five days to build a new connector. The benefits are a dramatic reduction in cost and complexity, and a real capacity for our customers to maintain a search solution that keeps pace with the information perimeter of their company.
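The interview does not document Sinequa's connector SDK, so the following is only a minimal Python sketch of what a connector contract for a homegrown application might look like. Every name here (`IndexedDocument`, `Connector`, `crawl`, `InMemorySourceConnector`, `index_source`) is hypothetical. The idea it illustrates is the one Ferré describes: a connector enumerates a source's documents together with their metadata and access rights, and the indexer does the rest.

```python
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class IndexedDocument:
    """A document as a (hypothetical) connector hands it to the indexer."""
    doc_id: str
    text: str
    metadata: dict[str, str] = field(default_factory=dict)
    # Groups allowed to read the document, copied from the source system.
    allowed_groups: frozenset[str] = frozenset()


class Connector(Protocol):
    """Hypothetical connector contract: enumerate every document in a source."""

    def crawl(self) -> Iterator[IndexedDocument]: ...


class InMemorySourceConnector:
    """Toy connector for a 'homegrown' application that keeps records in a list."""

    def __init__(self, records: list[dict]) -> None:
        self.records = records

    def crawl(self) -> Iterator[IndexedDocument]:
        for rec in self.records:
            yield IndexedDocument(
                doc_id=f"homegrown:{rec['id']}",
                text=rec["body"],
                metadata={"author": rec["author"], "modified": rec["modified"]},
                allowed_groups=frozenset(rec["acl"]),
            )


def index_source(connector: Connector) -> dict[str, IndexedDocument]:
    """Pull every document from a connector into a simple id -> document map."""
    return {doc.doc_id: doc for doc in connector.crawl()}
```

Keeping the contract this narrow is one plausible reason a new connector can be a days-long job rather than a months-long one: the connector author only maps source records to documents, metadata, and ACLs.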
Second, our relevancy. "Out of the box," our system requires minimal tuning. The system is ready to use. Our approach uses a unique combination of statistical, structured, linguistic, and semantic indexes. The technology benefits from more than 20 years of R&D and our many years of experience with enterprise information access. Those years allowed us to enrich the dictionaries (across numerous languages), the rule sets, and the algorithms.
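Ferré does not say how the four indexes are combined into one relevance score. As an illustration only, here is a minimal Python sketch that assumes each index yields a score in [0, 1] for a document and that the scores are merged by a weighted linear combination. The weights and the `IndexScores` type are made up for the example and are not Sinequa's.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexScores:
    """Per-document scores from four hypothetical indexes, each scaled to [0, 1]."""
    statistical: float
    structured: float
    linguistic: float
    semantic: float


# Illustrative weights only; the interview does not say how Sinequa weights them.
DEFAULT_WEIGHTS = {"statistical": 0.3, "structured": 0.2, "linguistic": 0.2, "semantic": 0.3}


def combined_score(scores: IndexScores, weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted linear combination of the index scores, normalized by total weight."""
    total_weight = sum(weights.values())
    weighted = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted / total_weight


def rank(candidates: dict[str, IndexScores]) -> list[tuple[str, float]]:
    """Return (doc_id, combined score) pairs, best first."""
    return sorted(
        ((doc_id, combined_score(s)) for doc_id, s in candidates.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

A linear blend is the simplest fusion scheme; a document that scores moderately on every index can outrank one that scores highly on only one, which is the kind of behavior "minimal tuning" would rely on.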
Thanks to those components, once Sinequa is installed it can be quickly customized to meet a customer's specific needs. No specific development is required other than importing company- or business-specific dictionaries, terms, and synonyms.
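Importing customer-specific synonyms is described only at a high level. The following minimal Python sketch assumes a simple two-column CSV of synonym pairs (a made-up format) and shows how such a list could be loaded and used to expand a query, so that each query word becomes an OR group of the word and its synonyms. `load_synonyms` and `expand_query` are hypothetical names.

```python
import csv
import io


def load_synonyms(csv_text: str) -> dict[str, set[str]]:
    """Read 'term,synonym' rows into a symmetric synonym map (lowercased)."""
    synonyms: dict[str, set[str]] = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        a, b = (cell.strip().lower() for cell in row)
        synonyms.setdefault(a, set()).add(b)
        synonyms.setdefault(b, set()).add(a)
    return synonyms


def expand_query(query: str, synonyms: dict[str, set[str]]) -> list[set[str]]:
    """Turn each query word into the set {word} plus its direct synonyms (an OR group)."""
    return [{w} | synonyms.get(w, set()) for w in query.lower().split()]
```

This sketch expands only one hop, so `paracetamol` picks up `acetaminophen` but not synonyms of `acetaminophen`; a production system would decide deliberately whether synonym chains should be followed.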
Our technology also performs entity extraction and meaning extraction (tag generation, to use the fashionable term). This is an important part of the “Connect to Knowledge™” approach because it helps transform information into knowledge, which Sinequa defines as information with relevant context and links. Our customers can use whatever jargon is current, and our system handles that type of language use.
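To make "entity extraction" concrete, here is a minimal Python sketch that combines the two mechanisms mentioned in the interview: dictionary lookup and pattern rules. The dictionary entries and the French-style phone-number rule are assumptions chosen for the example, not Sinequa's actual dictionaries or rules. Each match becomes an `(entity_type, surface_form)` tag.

```python
import re

# Hypothetical entity dictionary a customer might import.
ENTITY_DICTIONARY = {
    "eads": "company",
    "arkema": "company",
    "jean ferré": "person",
}

# Hypothetical rule: French-style phone numbers written as five pairs of digits.
PHONE_RULE = re.compile(r"\b0\d(?:[ .]?\d{2}){4}\b")


def extract_entities(text: str) -> set[tuple[str, str]]:
    """Return (entity_type, surface_form) tags found by dictionary lookup and rules."""
    lowered = text.lower()
    tags: set[tuple[str, str]] = set()
    for term, kind in ENTITY_DICTIONARY.items():
        # Word-boundary match so 'eads' does not fire inside 'leads'.
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            tags.add((kind, term))
    for match in PHONE_RULE.finditer(text):
        tags.add(("phone", match.group()))
    return tags
```

For example, `extract_entities("Jean Ferré met EADS buyers; call 01 23 45 67 89.")` yields a person tag, a company tag, and a phone tag, which could then drive navigation and filtering.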
Third, and finally, we deliver strong "out of the box" security. I implied this when I spoke about our connectivity and structured index, but it is a major, highly regarded advantage: no additional work is required either on the sources (a non-intrusive approach) or on the search environment (no additional security layer that is hard and expensive to maintain).
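The interview does not explain how the security works internally. One common technique consistent with "no additional work on the sources" is early-binding security trimming: connectors copy each document's access-control list into the index, and the engine filters hits against the user's groups at query time. The following is a minimal Python sketch of that technique under that assumption. `Hit` and `trim_results` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float
    # ACL captured from the source system at indexing time (hypothetical field).
    allowed_groups: frozenset


def trim_results(hits: list[Hit], user_groups: set[str]) -> list[Hit]:
    """Early-binding security: keep only hits that share a group with the user.

    The check runs inside the search engine against ACLs stored in the index,
    so the source systems are not queried again at search time.
    """
    return [h for h in hits if h.allowed_groups & user_groups]
```

The trade-off is freshness: if permissions change in the source, the index reflects the change only after the connector re-crawls the ACLs.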
But I think the key differentiators come from our continuous involvement in research projects. Research consumes about half of our R&D. We work with leading labs such as LIMSI (Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur) and INRIA (Institut National de Recherche en Informatique et en Automatique), and with major customers and partners such as EADS, DGA (Direction Générale de l’Armement), EDF (Électricité de France), OverBlog, Vecsys, and AFNOR (Association Française de Normalisation).
Enterprise search, or what I call behind-the-firewall search, is a very general idea. Can you provide some depth about what your customers are doing with your firm's system? Is it customer support? Litigation support? General search and retrieval?
The answer is, "It completely depends on the industry." People are using Sinequa for many different functions, solving many different problems.
We have a customer, Arkema, that uses Sinequa on a high-traffic internal Web site for legal, human resources, and information technology professionals. Arkema uses Sinequa for some content-management-type functions; for example, to respond to questions during a financial audit when R&D projects are funded by third parties and Arkema must prove the amount of work done on each project. Arkema also uses our system to prove the anteriority of an invention when patents are questioned or when competitors claim anteriority. When a new research project is funded, our system makes it possible to steer it based on prior projects' results.
Another example is the French police. Investigators use our system to reduce the time required to conduct certain inquiries. We have been told that our system may double the close rate for certain cases. This is an interesting item because if this rate holds up, it means that the police can do much more work with existing staff.
We also know that LCF Rothschild, a large financial institution, has been using our system to manage information about its investments. A typical fund manager receives a large volume of economic research from brokers plus the usual 400 to 800 e-mails a day. Needless to say, an analyst can't get much value out of that maze when making investment decisions. Sinequa allows the bank's analysts to extract what's important from the sources (Internet, intranet, applications) to help them make educated decisions faster.
At the other end of the spectrum, Bouygues Construction uses Sinequa to move information from more seasoned managers to younger professionals in the firm. Sinequa is providing knowledge transfer in a modern construction setting.
How much customization and tuning must a customer do before your system can be deployed? Some systems like Autonomy, Endeca, and Fast Search require quite a bit of work before deployment. Others like Google and Thunderstone are ready to deploy in a matter of a day or two. Where does Sinequa fall on this spectrum?
I think Sinequa falls in between a "search toaster" and a box of technical parts you assemble. We resolve the complexity of exhaustive, secure connectivity, profile-based interfaces, and best-in-class relevancy, yet we deliver much faster and at much lower cost and complexity.
Simply put, Sinequa wins in about 80 percent of the information access competitions after a proof of concept because of the speed with which the system can be deployed and the small amount of time required for tuning and customizing the system.
We are now offering a turnkey deployment for enterprise content. If the client wants to search and process information in file systems, relational databases, Microsoft SharePoint, the Web (via crawling), RSS, and enterprise content management systems--no problem. We can have the company up and running in four days. For example, one of the largest global IT integrators recently chose us to replace Autonomy for its worldwide internal search. We had to compete with what the IT director wanted--Google. We won this important contract, and we will be providing more information later this year.
What APIs are available for your product?
You can find details about the application programming interface on our Web site.
In a nutshell, we support a number of platforms and languages; for example, Java, .NET, C++, Active Server Pages, Web services, and scripting languages like PHP and Perl.
How does a licensee integrate Sinequa into a third-party enterprise application?
Again, we've tried to cover the most widely used techniques. We support drivers such as ODBC, JDBC, ADO, and ADO.NET, among others. Sinequa CS can accept incoming queries in almost any flavor of SQL, including PL/SQL.
How does this work?
We have created some extensions. Give me your napkin. [Writes on the napkin this code]:
SELECT * FROM lagardere WHERE (Text in '/Film/title' contains 'Shrek' and modified = '2007-06-06') and globalrelevance > 40 SKIP 0 COUNT 20
We also support SOAP and REST, as I mentioned a moment ago.
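The napkin query uses Sinequa's SQL extensions, which the interview does not specify. To make its apparent meaning concrete, here is a minimal Python sketch that reproduces one reading of it over in-memory rows. This interpretation is an assumption: `contains` is treated as a case-insensitive substring match on the field at `/Film/title`, `globalrelevance` as a 0 to 100 engine score, results as sorted best-first, and `SKIP`/`COUNT` as offset/limit paging. `Row` and `run_napkin_query` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Row:
    fields: dict[str, str]   # e.g. {"/Film/title": "Shrek the Third"}
    modified: str            # ISO date string, e.g. "2007-06-06"
    globalrelevance: int     # assumed 0-100 relevance computed by the engine


def run_napkin_query(rows: list[Row], path: str, phrase: str, modified: str,
                     min_relevance: int, skip: int = 0, count: int = 20) -> list[Row]:
    """Mimic: WHERE (text in path contains phrase and modified = date)
    and globalrelevance > min_relevance SKIP skip COUNT count."""
    matches = [
        r for r in rows
        if phrase.lower() in r.fields.get(path, "").lower()
        and r.modified == modified
        and r.globalrelevance > min_relevance
    ]
    # Assume results are ordered best first before SKIP/COUNT paging applies.
    matches.sort(key=lambda r: r.globalrelevance, reverse=True)
    return matches[skip:skip + count]
```

Read this way, `SKIP 0 COUNT 20` returns the first page of 20 results, and a client fetching page two would send `SKIP 20 COUNT 20`.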
Vendors often say they have an adaptor, and then customers find out that the vendor charges for an adaptor because it wasn't really ready for use in a live environment. What adaptors do you include in your Sinequa CS?
When a customer installs Sinequa CS, it has connectors for accessing file systems, a Web crawler, an RSS connector, the database connectors I mentioned, and support for comma-separated value (CSV) data. We also provide external XML-specific adaptors and adaptors for Microsoft Office, including the 2007 DOCX format.
But you didn't mention any applications?
Okay, we also have these application connectors. I'm doing this from memory, so I may miss one or two. Here goes. We have SharePoint 2001, 2003, and 2007 connectors. We also have WSS 1.0 to WSS 3.0 connectors [WSS is an acronym for Windows SharePoint Services] and connectors for the ubiquitous Microsoft Exchange and Outlook.
We also have connectors today in our product for EMC Documentum and the eRoom application, IBM's FileNet 5.4 and P8 versions, Lotus Notes, Alfresco, Nuxeo, Oracle BEA Plumtree Corporate Portal, Xerox DocuShare, Sesis Poseidon, OpenText's Hummingbird DM and LiveLink, WebDAV, and, of course, LDAP.
SharePoint is a moving target. What do you do with content in SharePoint?
There are easy-to-implement choices. For example, you can use Sinequa as a portlet within SharePoint. It looks and feels like SharePoint, but the user gets speedier, more robust information access methods. You can also use Sinequa as a replacement for the search built into SharePoint. We are fully compliant with the OpenSearch specification, so we integrate with the Microsoft federated search framework.
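OpenSearch compliance means a federating client can call the engine through a URL template published in an OpenSearch description document. As an illustration, here is a minimal Python sketch of how a client fills such a template, following the OpenSearch 1.1 convention that `{name}` is a required parameter, `{name?}` is optional, and a client with no value for an optional parameter substitutes an empty string. The host `search.example.com` and the function name are placeholders, not a real Sinequa endpoint.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; group 2 is "?" for optional parameters.
TEMPLATE_PARAM = re.compile(r"\{([^}?]+)(\?)?\}")


def fill_opensearch_template(template: str, values: dict[str, str]) -> str:
    """Substitute OpenSearch template parameters, URL-encoding each value."""

    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # no value for an optional parameter: use an empty string
        raise KeyError(f"required OpenSearch parameter missing: {name}")

    return TEMPLATE_PARAM.sub(substitute, template)
```

For example, filling `https://search.example.com/os?q={searchTerms}&start={startIndex?}&n={count?}` with a `searchTerms` of `stock exchange` and a `count` of `20` produces `https://search.example.com/os?q=stock%20exchange&start=&n=20`.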
Will this be affected by the Fast Search acquisition?
Who knows. We think it will take some time for Microsoft to figure out what it will be doing with Fast technology. In the meantime, customers can't wait.
Do you support the more than 200 file formats that vendors tell me are built into their systems?
No, we support more than 300 formats, so maybe that's a plus? We don't talk about that too much because the more sophisticated connectors, like the FileNet and EMC components, have more value than reading everyday word processing files, which is a pretty easy task.
When your system processes content, what's happening under the hood? What are the indexes for your system?
Again, this is hard to describe in one or two sentences. We have a statistical index, which ranks a word's importance based on frequency. We have a structured index, which contains the metadata extracted by the connectors for each specific source recognized. So, if metadata are in EMC Documentum, we use that.
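Ferré says only that the statistical index ranks a word's importance by frequency. TF-IDF is the textbook frequency-based scheme, so here is a minimal Python sketch of it as a stand-in. Whether Sinequa uses TF-IDF or some other weighting is not stated. The smoothed IDF formula below is one common variant, chosen for illustration.

```python
import math
from collections import Counter


def build_statistical_index(docs: dict[str, str]) -> tuple[dict[str, Counter], dict[str, float]]:
    """Term frequencies per document plus a smoothed inverse document frequency per term."""
    tf = {doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()}
    df = Counter(term for counts in tf.values() for term in counts)  # documents per term
    n = len(docs)
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return tf, idf


def score(query: str, tf: dict[str, Counter], idf: dict[str, float]) -> list[tuple[str, float]]:
    """Rank documents by summed tf * idf over the query terms; drop zero scores."""
    terms = query.lower().split()
    results = []
    for doc_id, counts in tf.items():
        s = sum(counts[t] * idf.get(t, 0.0) for t in terms)
        if s > 0:
            results.append((doc_id, s))
    return sorted(results, key=lambda p: p[1], reverse=True)
```

A word that appears often in one document but rarely across the collection (here, `patent` in the usage below) pushes that document up, while words found everywhere contribute little.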
We also have a linguistic index.
Can you describe this to me?
Sure, the approach we take is POS (part-of-speech) analysis based on large dictionaries and linguistic rules. We auto-detect stop words. We stem; for example, horses = horse. We identify composed or compound phrases such as "stock exchange" as one word. We also recognize agglutinated words like skateboard as skate + board. And we extract entities from the dictionaries. Rules can be set to add new words and to perform filtering and navigation by a person's name, company name, license plate, telephone number, and so on. If a customer has a specific vocabulary for a chemical or pharmaceutical business, we can import those specific word lists using the graphical administrative interface.
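The steps Ferré lists (stop words, stemming, compound phrases, splitting agglutinated words) can be illustrated with a deliberately tiny Python pipeline. This is a sketch under strong simplifications: the stop list, compound table, and lexicon are hypothetical and hard-coded, whereas Sinequa auto-detects stop words and relies on large dictionaries and part-of-speech rules. The stemmer only strips a plural `s` when the result is a known word.

```python
# Hypothetical, hand-made resources for illustration only.
STOP_WORDS = {"the", "a", "of", "on", "and"}
COMPOUNDS = {("stock", "exchange"): "stock_exchange"}
LEXICON = {"skate", "board", "horse", "stock", "exchange", "price"}


def stem(word: str) -> str:
    """Naive plural stripping, e.g. 'horses' -> 'horse', only if the result is known."""
    if word.endswith("s") and word[:-1] in LEXICON:
        return word[:-1]
    return word


def split_agglutinated(word: str) -> list[str]:
    """Split a word into two lexicon entries if possible: 'skateboard' -> ['skate', 'board']."""
    if word in LEXICON:
        return [word]
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in LEXICON and tail in LEXICON:
            return [head, tail]
    return [word]


def analyze(text: str) -> list[str]:
    """Drop stop words, stem, merge known compounds, then split agglutinated words."""
    tokens = [stem(t) for t in text.lower().split() if t not in STOP_WORDS]
    out: list[str] = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in COMPOUNDS:
            out.append(COMPOUNDS[pair])  # treat the phrase as a single term
            i += 2
        else:
            out.extend(split_agglutinated(tokens[i]))
            i += 1
    return out
```

Indexing `stock_exchange` as one term keeps a query for the phrase from matching documents that merely mention stock and exchange separately, while splitting `skateboard` lets a query for `board` still reach it.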
Do you use any semantic techniques?
Yes, we have a patented technology. We associate each document and query with a mathematical vector in a space with 800 dimensions.
Is that a reflection of the complexity of French culture?
No, it's a mathematical technique French engineers developed, but it has great use for people from other countries too.
The benefit of our patented approach is that search does not depend on the exact words in the user's query. A term mentioned in a query might not appear in a retrieved document. Our technique identifies the match because the indexed document has a similar meaning, so we are able to rank that document as relevant. Other systems can miss these documents because many of them are still, at their core, based only on key words.
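The patented method itself is not described in the interview, so the following is not Sinequa's algorithm. It is a generic Python sketch of vector-space semantic matching: every term maps to a dense vector (here 4 made-up dimensions instead of 800), a text's vector is the sum of its term vectors, and documents are ranked by cosine similarity. It shows how a query for `car` can match a document that only says `automobile`.

```python
import math

# Hypothetical 4-dimensional term vectors. The interview mentions 800 dimensions
# and a patented construction that it does not describe.
TERM_VECTORS = {
    "car":        [0.9, 0.1, 0.0, 0.0],
    "automobile": [0.85, 0.15, 0.0, 0.0],
    "engine":     [0.6, 0.4, 0.0, 0.0],
    "recipe":     [0.0, 0.0, 0.9, 0.1],
    "flour":      [0.0, 0.0, 0.8, 0.2],
}
DIMS = 4


def text_vector(text: str) -> list[float]:
    """Sum the vectors of known terms; unknown terms contribute nothing."""
    vec = [0.0] * DIMS
    for term in text.lower().split():
        for i, x in enumerate(TERM_VECTORS.get(term, [0.0] * DIMS)):
            vec[i] += x
    return vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity between query and document vectors."""
    q = text_vector(query)
    scored = [(doc_id, cosine(q, text_vector(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)
```

A pure key word engine would return nothing for `car` against the documents `"automobile engine"` and `"flour recipe"`. The vector sketch ranks the first one highly because its terms point in nearly the same direction.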
Does Sinequa support parametric search on values?
Yes, you can use our counting engine. For example, you can display the number of elements associated with a given piece of metadata. In the Location tab, if five relevant documents contain or are related to the UK, the interface will show the value five next to UK. Counts can also be used for ranking in our interface.
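The counting behavior described above is what is usually called faceted navigation. Here is a minimal Python sketch, assuming each result carries its metadata as lists of values (a made-up shape, not Sinequa's data model). It counts how many relevant documents carry each value of a facet, so the UK example would show 5.

```python
from collections import Counter


def facet_counts(results: list[dict], facet: str) -> Counter:
    """Count how many relevant documents carry each value of a metadata facet."""
    counts: Counter = Counter()
    for doc in results:
        # A document may list a value more than once; count each document once per value.
        counts.update(set(doc.get(facet, [])))
    return counts
```

Because `Counter.most_common()` orders values by count, the same structure can sort the facet list, in line with Ferré's remark that counts can be used for ranking.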
Do you have an office in the US? Many vendors are opening offices because the US market seems hungry for systems that offer "beyond search" functions.
At this time, no, we don't have a US office. We do have some OEM partners. For example, ATOS Origin is using our system as its enterprise search system. We are looking for more partners to help us in the US. If you know of some people who might be interested, let me know, okay?
Sure. Let's jump back to the patents. Are these pending or granted?
It's granted. Our semantic technology (based on 800-dimension vectors) is patented in the US as "Process for Storing Text and Procedure for Searching Stored Texts for Those Pertinent to a Question". We received it on July 12, 2005. I don't recall the number, however.
How many employees do you have?
I think we have about 50, maybe a few more. Most people are in Paris, but we have a number of projects underway with various research labs and some universities. As I said, we do fundamental research. Productization is a separate function for us.
What are some of the trends you see in behind-the-firewall search in the next nine to 18 months?
Hmmm. Okay, I think one big trend is that organizations will become more adamant in demanding systems that transform information access into directly usable knowledge. I think users will expect the system to know them via a profile. Beyond simple key word boxes, interfaces that guide users or show them information will become a "must have".
What is Sinequa doing to respond to this trend?
We're working on interfaces that allow the user to navigate in a virtual graphical space. We are exploring what we call a "center of interest", which applies a “Second Life”-style virtual space to search.
We are also exploring what we call "blogoscopy". Our researchers want to manage reputation and analyze emotional reactions to content and context in order to anticipate crises, evaluate satisfaction, and react accordingly.
We're finding that organizations are starting to pay attention to multimedia information. In large organizations, this also means handling content in different languages.
Microsoft has made its DOCX format into a somewhat controversial standard. What's the impact of this?
We are very supportive of standards. For example, we participate in LIRICS, the Linguistic Infrastructure for Interoperable Resources and Systems, as a member of the LIRICS Industry Advisory Group (IAG). This spring, we have about 10 projects related to standards.
ArnoldIT Comment
Sinequa is an interesting company because it jumps over search into a more sophisticated approach to information access. The company's fast-cycle deployment and large number of connectors give it an advantage over certain well-known behind-the-firewall search systems (enterprise search systems) that are essentially a "box of parts". Sinequa's customization requires less hassle and makes its competitors' three- to 12-month implementation schedules look like dinosaurs in St. Tropez--creatures from another time in the wrong place. Give Sinequa a close look. Keep in mind that the company is very French. You get to have a good lunch, and you will have an opportunity to polish your French. More information about Sinequa is available on the company's Web site.
Stephen E. Arnold, April 21, 2008