Sophia Search
An Interview with David Patterson
Sophia, based in Belfast, Northern Ireland, has entered the intensely competitive enterprise search market. I met with David Patterson, one of the founders of the company, in February 2011. His business partner is Dr. Vladimir Dobrynin. The system blends traditional methods with cutting-edge technology. In our discussion I learned that Sophia “uses the related disciplines of semiotics and intertextuality to increase the findability of information – as opposed to other approaches that rely on taxonomies, basic mathematics or pattern matching.” Semiotics focuses on signs and symbols as indicators of meaning. As implemented in Sophia, the approach enables “Sophia to understand and interpret the meaning and context of information within documents.”
Other vendors may also be trying to deploy next-generation methods to squeeze meaning from content, but Sophia has uniquely tuned its approach to understand meaning at two levels – the document level and the corpus level. Therefore, the “meaning and relevancy of a document depends on both the user's query and importantly all the other documents within the organization.”
I found the approach quite suggestive. Dr. David Patterson is the CEO and co-founder of Sophia Search Ltd. Prior to founding the company, he was the director of an artificial intelligence research and development laboratory at the University of Ulster. Dr. Dobrynin heads the St. Petersburg-based office of the company.
The full text of the interview appears below:
What's your background?
Prior to founding Sophia, I was Director of Research at an Artificial Intelligence Research Lab based at the University of Ulster in Northern Ireland. We did a lot of “geeky” work in data mining, machine learning and information retrieval.
I was collaborating with a colleague at St. Petersburg State University on information retrieval, and that is where the technology behind the Sophia search and discovery tool was born.
That was Dr. Dobrynin?
Yes.
And you also have a degree in biochemistry from Queen's University, right?
Yes, that blend of academic experiences has helped me think about the problems of information retrieval from different angles. The freshness of Sophia’s approach is a result of our team’s broad academic foundation.
So, what's Sophia?
It is a lot of things. To describe it as a search engine isn’t telling the full story. I prefer to call Sophia a “contextual discovery engine.” Sophia can automatically disambiguate the different meanings of words based on their context within a document.
In short, Sophia searches by the meaning of what the user is looking for as opposed to just the key words they use in their query. Sophia enables users to discover contextually relevant information they were previously unaware of, and it increases the users’ understanding of their content.
One of the benefits of our technical approach is that Sophia operates without human guidance or training, and it does not require taxonomies, ontologies or thesauri.
What is it that you think people are looking for from information access technology?
Simply, people want answers to their problems. They want painless access to the right information that will provide them with those answers.
Won't free software capture a certain segment of the market and then lose out in big accounts where the buyer wants a known brand like IBM or Microsoft?
It’s an interesting point you raise about people ultimately migrating to the more tried and trusted vendors. But I suppose it’s about solving the real problems that people have. It’s back to my view that there is no such thing as the ultimate search tool for every occasion.
Ultimately, a major factor driving customer decisions should be: “Can the vendor solve my problems?”
At Sophia, we have put time into creating a list of questions designed to help users understand if Sophia is a contender for them. Just because the vendor is a Google or a Microsoft doesn’t mean they have all the answers in the field of search.
I don’t think even the giant search companies would try to claim that they do. Other factors to consider are price point, flexibility and compatibility. So if the customer has needs beyond a big brand’s capability, then they need to consider other options.
What was the trigger in your career that made search and retrieval a focal point?
That’s a good question. I didn’t intentionally set out to develop a commercial information retrieval system.
What motivated me was solving the problems that faced the world of search from a research perspective. I was aware of the limitations of some commercial systems through my own research and experience.
The more I thought about the problems of locating the information I needed, the more I began to question the basic assumptions that conventional search vendors make.
What do you mean?
One big assumption that many search vendors make is that the user knows what he or she is looking for. That’s just not what I experienced or what our research said.
Another is that a user can form good quality queries to express their information needs. Again, experience and research revealed that users often find creating a quality query a challenge.
I agree, but most vendors put a search box on a page and move on. What did you do with your insights into user behavior?
We believed that these basic assumptions were wrong and wanted to develop a search capability that enabled users to find what they needed even with poorly formed queries.
We also believed that search shouldn’t just be about the REcovery of information the user knows about and expects to find. Early on, we realized that an effective system also had to be about the DIScovery of new information, that is, information the user was unaware of.
We realized that the conventional search tools and systems don’t address the discovery component of search. How can the user query for information they don't know exists?
Finally, we were fascinated by solving what we call “the context problem”. Most systems simply do not understand the context of information. Therefore, most search and retrieval systems provide a lot of irrelevant hits to the user.
Sophia is all about context and providing users with relevant information in the right context. It is about understanding the meaning of what the user is looking for, not simply returning lists of documents just because they contain the user’s query terms.
So yes, commercially speaking, there may have been easier nuts to crack, but commercializing the technology was not an initial consideration.
After we worked out our approach, we began to realize just how well we had solved these issues. When we demonstrated Sophia to colleagues and a handful of search experts, we realized how useful the resulting tool was to those performing research within organizations.
What are the core features of Sophia's system?
There are a number of key features. Let me run down those that I find most important, but I cannot list every innovation our team has implemented.
No problem. Run down your list, please.
First, effectiveness: Sophia automatically discovers themes from content and organizes documents into the themes they are most contextually relevant to.
When users pose a query they are presented with the contexts that most closely meet their needs. Users learn a lot about their data in this way and discover new knowledge that they were unaware of as a result.
So, discovery is a major part of the Sophia experience.
This sounds a bit like Endeca’s approach or the faceted methods used by many other vendors.
There may be some superficial similarities. But most of the commercial systems rely on a combination of controlled term lists, taxonomies, and basic statistical methods to determine the clusters. We don't rely on any “background knowledge” – we discover structure automatically from the content itself using our patented semiotic methods. We are not constrained by what is already known within a domain, but empower the discovery of new contexts and knowledge organically.
Okay, what’s another key feature?
Sophia is efficient. Because our method organizes and presents information contextually, users spend less time sifting through irrelevant information and can focus on information that they know is of value.
Can you elaborate on what you mean by context?
Sure, let's take a very simple example – the query ‘java’. Using a conventional search tool you will typically get results that mix different contexts from one returned document to the next. For example, the first document may be about programming, the next about coffee, the next about Indonesia, then maybe another programming context, and so on. The user has to spend their time sifting through long lists, scanning for information that relates to their particular information needs at that time. Once they find a useful document, they then need to continue scanning the list to find other documents of value. This is a very time-consuming process.
Because Sophia automatically organizes information into thematic folders, it drastically reduces the time needed to locate relevant information. In the example screen grab I have provided (based on a dataset provided by the New York Times), you can see Sophia discovers a folder on Microsoft software, a folder on horse racing (did you know Java Gold was a racehorse? I didn’t), a folder on Indonesian politics, coffee, JavaScript, prehistoric man, etc. This makes it really easy for a user to focus on a context of interest and grab lots of relevant information quickly without having to monotonously sift through topically diverse lists. This is a massive time saver for companies.
Okay, now we have the context, what’s the payoff for the user?
Right, with the context for the query and the retrieved information, our approach yields increased understanding and clarity. One of the unsatisfactory features of today’s search tools is the poor way in which they summarize documents when returning results to users.
Often these summaries are based around the user’s query terms and do not convey the true meaning of the returned documents.
Our research and testing showed that users find this approach misleading. Users click on documents only to find out they are not as relevant as the summaries had led them to believe.
Sophia is unique in that it presents summaries that focus on the core topic of the document, not just a user’s query term. In this way, users understand that a document will be useful before they click on it.
Sophia’s approach reduces the time for locating useful information, empowering users to do more of what they are employed to do – spend time acting on the information they have found and making better decisions.
That’s three. What’s the fourth key feature?
Semiotics. Uniquely, Sophia is based on semiotics, a model of linguistics concerned with how we as humans understand the meaning of information in context. This is the power behind the technology that drives our Discovery Engine and our ability to improve the findability of information.
The number of new companies entering the search and content processing “space” is increasing. What’s your view on a fiercely competitive and crowded market?
Good question. Competition is healthy. It's what drives us to innovate to be better than anyone else.
Today’s search marketplace is great for customers in terms of choice, but there is no such thing as the perfect search tool for every occasion, and anyone who claims that one company’s search system is the best is not being completely open.
What does the best mean?
It’s very hard to define. Which tool is best depends on a number of factors, including the user’s search goals, experience, and expertise within the domain. One tool may suit a user who just wants to retrieve one or two documents that meet their needs, while another user may want all documents relating to a specific topic.
We have found that it is unlikely they will find the same tool to be best in both these search scenarios.
So what we do at Sophia is discuss with the customer, right at the start, where the strengths of Sophia lie, as we don’t want to waste their time if we don’t believe it is the best tool for their needs.
Would you give me an example?
Sure. If you are interested in queries like “Where is Joe’s office?” or “What is his phone number?”, deploying Sophia is overkill. In that instance we don’t add any further value over other tools such as the Google Search Appliance or other basic key word systems.
But if you are interested in discovering what topics exist in your data, or unearthing new information related to your query that you didn’t know existed, or deciphering how documents are semantically linked to one another within a particular context or understanding the meaning of your information at a glance, then Sophia is a tool that is worth spending time evaluating.
There's a push to create mashups, that is, search results that deliver answers or reports. Does your system come to grips with meaning? Language is slippery.
Yes. The Sophia technology is all about semantics and understanding meaning, just not in the traditional sense, which uses background knowledge structures such as taxonomies or a knowledge base, the way IBM Watson did on the game show Jeopardy!
Sophia discovers meaning from content automatically. It also creates customized search reports that can be exported as PDF or XML to enable ease of integration with other analytical tools within an organization.
Sophia enables and encourages employees to share results, which reduces the time people spend re-executing queries others have already run. It can also automatically watch the corpus for new information indexed after a result set has been returned to the user.
If you think about it, your results are only valid at the precise moment you receive them. They immediately become outdated as new information could have arrived that wasn’t part of your old search results.
Sophia addresses this by alerting the user when new content of contextual relevance to them is published. The key point is being “contextually relevant”.
It’s not just a matter of alerting the user to the fact that another document has been indexed containing their search term. This approach is basically spam to the user. For example, if I search using “Java” as my query and my interest is in the island, I don’t want to know if a new document has arrived focusing on Web programming.
Are you supporting other vendors' systems or are you a stand-alone solution?
The answer to this is both!
We don’t necessarily see ourselves as directly competitive with other search vendors, because our technology is so fundamentally different and provides search capabilities many other tools don’t have.
We have purposefully developed Sophia with an open architecture to enable ease of integration through our Java APIs and RESTful web services. In this way, we have made it easy to augment other search tools with Sophia’s contextual capabilities and to build additional applications based on third-party products.
Would you give me an example?
Yes. Our APIs have been used to augment existing content management and e-Discovery solutions, adding semantic search capabilities to previously limited Boolean search functionality. We are a very open company and are interested in finding additional partners to help build out the number of applications utilizing Sophia technology.
A number of vendors have shown me very fancy interfaces. The interfaces take center stage and the information within the interface gets pushed to the background. Are we entering an era of eye candy instead of results that are relevant to the user?
I hope not. I am not denying that interfaces are important in search, and it’s an area where Sophia could be improved to better leverage the full potential of the knowledge we discover. But our view is that it’s the quality of the results that drives innovation and decision making in organizations.
A cool interface isn’t going to cut it on its own; you need the quality information to back it up. Given the choice between a cool interface with an average set of results and a usable interface with a great set of results, I know which I would prefer to use.
What are the hot trends in search for the next 12 to 24 months? How will you take advantage of them, for example, by going public, partnering, or selling to a larger firm?
Ahh – the future of search? I think there will be continued consolidation in the market as competition continues to heat up and companies strive to keep ahead of the curve so they can offer their customers the best service. Going public isn’t really an option for many companies right now. At Sophia, partnering is a strong focus and part of our philosophy. We believe our technology is not just a stand-alone product but is also complementary to many existing solutions and can be used to enhance their capabilities.
Give me a 15 second summary of Sophia and tell me where I can get more information about Sophia.
I'll do it in two! We have built a tool to help people understand their content and find the information they need without bombarding them with noise and irrelevant information.
Our Web site has lots of further information, including videos, white papers, and technical data. That’s the best place to look.
ArnoldIT Comment
The interest in next-generation search systems is rising. When we use the phrase “next-generation search” at ArnoldIT.com, we are referring to the use of systems and methods that identify the “meaning” of content and the user’s query. The methods may be numerical, rules-based, automatic, or blended.
We think that Sophia’s approach is interesting, and its emphasis on semiotics marks a departure from key word matching. Our view is that an organization looking for a search solution will definitely want to navigate to the Sophia Web site, review the technical information, and get a demonstration or evaluation license.
Stephen E. Arnold, March 1, 2011