Semantico

An Interview with Richard Padley

Richard Padley of Semantico

At the online show in London, England, in December 2008, there was quite a bit of talk about semantic search and semantic processes. Semantic systems make an attempt to discern what a document is "about". Quite a few consultants and academics were talking about "aboutness", a neologism that I find awkward. One refreshing demonstration was the technology developed by Semantico, based in Brighton, East Sussex. The company works with a number of traditional publishers, and I found the firm's directness and its technology interesting. In January 2009, I caught up with the engaging founder of the company, Richard Padley. The full text of my interview with him appears below.

What is the problem your company’s software solves?

Semantico exists, as a company, to help publishers and information providers make their information more discoverable, more useful and more valuable. The technical challenge is how to deliver and monetize that information online in a way that’s sensitive to brand and content. Aggregators (which we are not) provide a valuable service, but they can’t deliver that level of sensitivity to the needs of the content itself. The history of publishing is one of diversity. Published content, intellectual output, never falls into the same size and shape: the challenge is how to let users interact with that content in the new ways afforded by online. That’s the problem we have designed our software toolset to solve, and the solution’s different for every client – because every client has slightly different needs within that broad area of need.

What’s different, I think, about the way we design and configure publishing platforms is our user-centered design process. We will always ask questions about how a user interacts with the content – how it sits within their workflow. This is the central issue for us: publishers are having to come to grips with having a direct relationship with every one of their users, and an ‘off-the-shelf’ solution is just not going to get them to the place they need to be in making that relationship work for both parties.

What's your background?

In the mid-1990s I was working at Macmillan on production systems for The New Grove Dictionary of Music. I built an SGML content management system, with editorial workflow and a management information and tracking system. We had a project called ‘Electronic Grove’. This was pre-web. They knew that electronic publishing was coming, but nobody knew exactly what it was, or what form it might take. They just knew that it was going to be big. Then Richard Charkin came to Macmillan and gave everyone a mandate to put stuff on the web quickly. I had built the editorial and content management and workflow systems for them, and then it was, ‘let’s start thinking about user interfaces, prototyping’ – putting it in front of people who might actually buy it. It was an exciting time. I realized very quickly that there was going to be a lot of this sort of work, and that what I was doing for Macmillan was fairly pioneering stuff – Encyclopaedia Britannica did it first, but apart from that there wasn’t much else out there. That’s where the idea for Semantico came from.

What is the major technical innovation in your system?

Innovative interfaces and innovative products for our clients.

Because of the sort of company we are, it’s less about technology invention for us than about adaptation and selection. Open source is very important to us, for instance, and we’re strong believers in using the best tool for the job. In the past 18 months we’ve started using the Mark Logic database to build the publishing platforms we deliver – which is a very innovative platform. It allows us to put different kinds of content together and query them in ways that allow publishers to build new kinds of products, while still respecting the different sizes and shapes that content comes in. ‘Content in context’ is a phrase I hate, but it does provide a way of getting a handle on this for the layman: it’s about providing small snippets of contextualized information within the workflow. Say you’re a vet – we can build a platform that can integrate reference information about veterinary medicine with available drugs and your own patient records, and so on.

As you can see, the solutions are quite specific. We’re very driven in our approach to innovation by client need. We don’t design from a blank sheet, and it’s not about second-guessing the user, like a straight product company. Innovation is a three-sided conversation for us, at least – often there are other stakeholders as well, such as institutional users. So in developing our toolset we have to have a very clear focus on what the upcoming client need is going to be. However, our strong feeling at the moment is that it’s end-users who are driving the need, in the way they adopt. So it’s very important for us to look forward at what’s coming up over the hill, from the point of view of what users are doing – which new delivery platforms and devices are really proving to be a hit with users, for instance.

What is the basis of your firm's technical approach?

Open source technologies

Don’t let the tech drive the solution
Pick the right tools for the job
Work in an iterative manner
We’re keen on agile methodologies
We’re keen on the Java toolset and use components like Solr and frameworks like Spring MVC

Can you give me an example of your system in action? You don't have to mention a company name, but I am interested in what the problem was and what your system delivered to the customer?

CABI had a well-respected bibliographic abstracts database of around eight million records, built on a technology platform that had reached the end of its useful life. We took a fresh approach by deploying user-centered design processes and user testing to build a user interface that allowed their customers to explore the content in new ways. We also allowed CABI themselves to repackage that functionality on other products by building open web service APIs.

As you look forward, what is your goal for the footprint in 2009?

We see two opposing forces dominating 2009 – the market will continue to expand, but economic conditions will mean that inevitably projects get pulled back and deferred. The industry-wide prediction is for gains to be made in Semantic Web technologies, particularly where they can be applied in a focused manner to subjects like tagging and blogging. Cloud computing will continue to be important, and tech providers need to step up to the challenge of taking advantage of multi-core CPUs.

I recall hearing that your firm has semantic technology. What are the key features of the Semantico system?

We’ve used systems like Protégé to build taxonomies and ontologies for clients like Wiley Blackwell. We’ve integrated those into the search results offered by their platform to enable faceted navigation. We’re currently experimenting with named entity recognition within the Mark Logic database – so watch this space!
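
For readers unfamiliar with faceted navigation, the idea can be sketched in a few lines of Python. This is an illustration only, not Semantico's or Wiley Blackwell's implementation: the taxonomy, the sample records, and the function names `facet_counts` and `filter_by_facet` are all hypothetical. Each search result carries a subject term, the taxonomy rolls that term up to a broader category, and the interface shows a count per category that the user can click to narrow the results.

```python
from collections import Counter

# Hypothetical mini-taxonomy: each narrow term maps to its broader term.
TAXONOMY = {
    "cardiology": "medicine",
    "oncology": "medicine",
    "soil science": "agriculture",
}

# Hypothetical search results, each tagged with one taxonomy term.
RESULTS = [
    {"title": "Heart valve repair", "subject": "cardiology"},
    {"title": "Tumour markers", "subject": "oncology"},
    {"title": "Clay soils and drainage", "subject": "soil science"},
    {"title": "Arrhythmia in dogs", "subject": "cardiology"},
]


def facet_counts(results, taxonomy):
    """Count how many results fall under each broader taxonomy term."""
    return Counter(taxonomy[r["subject"]] for r in results)


def filter_by_facet(results, taxonomy, facet):
    """Keep only the results whose subject rolls up to the chosen facet."""
    return [r for r in results if taxonomy[r["subject"]] == facet]


if __name__ == "__main__":
    # The counts are what a sidebar of facets would display.
    print(facet_counts(RESULTS, TAXONOMY))
    # Choosing a facet narrows the result list.
    for record in filter_by_facet(RESULTS, TAXONOMY, "medicine"):
        print(record["title"])
```

A production system would normally let the search engine compute these counts rather than doing it in application code; the sketch only shows how a taxonomy turns flat search results into navigable categories.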

There's quite a bit of turmoil in search. In fact, the last few weeks Alexa (an Amazon company) closed its Web search unit and Lycos Europe (which purchased software from my partner and me in the mid 1990s) said it would close up shop. What's that mean for Semantico going forward?

The turmoil in the search market doesn’t really have a direct effect on us, but it has an indirect one in that it puts more power into the hands of Google and Microsoft – which in turn increases the pressure on publishers, frankly.

One trend in enterprise content processing is the shift from results lists to answers. Among the companies in this sector are Relegence (a Time Warner company), Connotate (privately held but backed by Goldman Sachs), and Attivio (a company describing itself as delivering active intelligence). Each of these firms is really in the search business but positioning search as "intelligence". What's your take on the changing face of search in an organization?

We come into touch with some of these issues in that the highly specialized content that we’re working with (e.g. dictionaries) doesn’t fit well into the standard search results model. We’re not about organizational search; we’re more about specialized content types and how to build the best solutions for end-users that respect the individual shape and character of that very specific content.

Google has been a disruptive force in search. In one major US government agency, different Google resellers have placed search appliances, often at $400,000 a unit. No single person realized that there were more than $6 million worth of devices. As a result, the project to "fix" search means that Google is the default search system. What are the challenges and opportunities Google presents to Semantico? What about the challenges and opportunities Microsoft presents with its strong grip on the desktop and a growing presence in servers?

The challenge for us in this is how to get publishers to engage with Google and Microsoft in ways that allow them to minimize their risk whilst exploring the new business models available.

Mobile search is slowly making headway. Some of the push has come from the iPhone and from Google's report that queries from iPhone users are higher than those from users of other smartphone brands. What does Semantico provide for mobile search? Mobile content systems?

I’ve made a big case of being fairly vendor-agnostic in this interview so far, but I’ll declare an interest here: I’ve been an iPhone fan since you could first get hold of them – in fact I bought one on the first day they were available. We’re currently in early discussions with publishers about how to provide online dictionaries and online delivery of medical content, which is about all I can safely say, as it’s all sub rosa. The iPhone makes that kind of development a real snap. So while it’s early days for many clients with this one, I’d expect progress to be pretty rapid in the next couple of years – again driven by end-user adoption.

The economic climate is quite weak. How is Semantico adjusting to this global problem?

We’re having to be careful with our cost base – looking at everything we can do to make sure that, on the one hand, we are bringing hosting costs down by using virtualization, and on the other, choosing technologies that enable us to get product out to market more quickly. Hard times don’t stop innovation necessarily – they just force you to be more creative. Which can be a good thing.

Can you hint at what's coming in 2009 in terms of features in the Semantico system?

That one’s simple to answer: more usage of RDF-based Semantic Web technologies. I’ve made no bones about the fact that we see the way ahead to be opening up our systems so that they can link into as many other systems as possible. The only way to do that is to embrace the Semantic Web.

ArnoldIT Comment

Semantico works with traditional publishers to deliver cost-effective solutions that mesh with the business processes with which content-generating companies are familiar. Semantico's approach is to use a range of tools--some open source and some from commercial software vendors--to streamline and enhance a publisher's content operations. Semantics play an important part, but the Semantico approach is to tackle broader issues. Technology becomes a tool, not an end in itself. If you are engaged in commercial content production, take a look at the Semantico services and systems. Their web site is at www.semantico.com.

Stephen Arnold, January 19, 2009
