Ask.com’s Search Technology Advances

January 12, 2009

Ask.com keeps trying. On January 8, 2009, the company announced “Semantic Search Technology Advances from Ask.com.” You can read the company’s statement here. The company asserts:

In October last year we introduced our proprietary DADS (Direct Answers from Databases), DAFS (Direct Answers from Search), and AnswerFarm technologies, which are breaking new ground in the areas of semantic, web text, and answer farm search technologies. Specifically, the increasing availability of structured data in the form of databases and XML feeds has fueled advances in our proprietary DADS technology.  With DADS, we no longer rely on text-matching simple keywords, but rather we parse users’ queries and then we form database queries which return answers from the structured data in real time.  Front and center. Our aspiration is to instantly deliver the correct answer no matter how you phrased your query.

The idea is that a user–assuming there is enough traffic to make the site viable in 2009–can enter a query any way he or she wishes. The Ask.com system will figure out the query and provide a Direct Answer. Let’s check out the system.
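
Ask.com has not published the internals of DADS, so the following is only a minimal sketch of the general pattern the company describes: parse a free form query into an intent, then run a parameterized query against structured data. Everything in the snippet (the patterns, the table, the sample row) is a hypothetical stand-in, not Ask.com’s actual code.

```python
import re
import sqlite3

# Hypothetical toy knowledge base standing in for Ask.com's structured feeds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shows (title TEXT, host TEXT, network TEXT)")
conn.execute("INSERT INTO shows VALUES ('The Daily Show', 'Jon Stewart', 'Comedy Central')")

# Rough stand-ins for query parsing: map several free-form phrasings to one intent.
PATTERNS = [
    (re.compile(r"^(what's|what is|tell me about)\s+(?P<title>.+?)\??$", re.I), "lookup_show"),
    (re.compile(r"^who hosts\s+(?P<title>.+?)\??$", re.I), "lookup_host"),
]

def direct_answer(query: str):
    """Parse a natural-language query and answer it from structured data."""
    for pattern, intent in PATTERNS:
        match = pattern.match(query.strip())
        if not match:
            continue
        title = match.group("title")
        row = conn.execute(
            "SELECT title, host FROM shows WHERE title LIKE ?", (f"%{title}%",)
        ).fetchone()
        if row:
            if intent == "lookup_host":
                return f"{row[0]} is hosted by {row[1]}."
            return f"Top result: {row[0]} with {row[1]}."
    return None  # fall back to ordinary keyword search

print(direct_answer("What's the daily show?"))
print(direct_answer("Who hosts the daily show?"))
```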

My first query was, “What’s the daily show?” The system responded with the top result “The Daily Show with Jon Stewart.” Good. My second query was, “What is a dataspace’s application?” The system responded by asking me the question, “What is a data spaces application?” The first result was a link to Sourceforge’s information about EQUIP2. Sorry, the correct answer was, in my mind, a link to the ACM papers about dataspaces. My third query was, “What is an information manifold?” This is no trick question because there is a technical paper with a title that contains the bound phrase “information manifold.” The Ask.com system asked me, “What is an information mannford?” I don’t know what a “mannford” is.

For the types of questions a middle school student might ask, the new system will work pretty well. For popular culture topics, the system will probably be better than some I have examined this week. For the types of queries I have about technologies that address the known weaknesses of traditional semantic processing, Ask.com won’t help me too much. That’s good. Knowing what questions to ask allows me to feed my goslings. Ask.com won’t put me out of a job this year. One final point: I clicked on “mannford”. It’s a city in Oklahoma. No dataspaces among that state’s wide open spaces. Look west, young search, look to Mountain View, California.

Stephen Arnold, January 12, 2009

Xsearch CEO Norbert Weitkämper Interviewed

January 12, 2009

Weitkämper Technology–based in Staffelsee, Germany–is a search and content processing vendor with a low profile in North America. The firm offers a multi-source search suite that incorporates proprietary technology to deliver fast content and query processing. The company’s XSEARCH package is customizable to focus on the client’s specific need. It offers nine components: Clustering Engine, Suggest, DidYouMean, Summarizer, Linguistic Engine, Federated Search, Facet Navigator, Entity Extractor and Intelligent Classifier.

Norbert Weitkämper, an industrial engineer, was dissatisfied with the search results available from commercial products and developed Xsearch after working in electronic publishing. He told Search Wizards Speak:

As we are specialized on search for more than a decade our package is very well tuned; not only for speed but also for content for example. We will combine our new HitEngine with our established technologies like Linguistic, Did-You-Mean, clustering, synonyms and ontologies, or our personal ranking mechanisms. They are already released, we just have to melt them together.

He added:

For the complex roman languages our linguistic engine with its morphologic analysis is a big advantage, because algorithmic approaches like Bayesian or Porter, which are doing a good job for English, are a miserable failure.

On the subject of semantic analysis, Mr. Weitkämper said:

Semantic analysis is much more difficult for European languages than for English. We are already able to integrate thesauri or ontologies. I have not seen any system yet which meets the requirements for semantic analysis – at least when you have a closer look into the system. But storing information in a quick and accessible way is even more important for this approach, as you have to consider much more than only keywords and positions. So I can imagine that our optimized index structure may help also in this field to achieve adequate results in an acceptable amount of time.

More information about the company is available at its Web site, http://www.weitkamper.com. The full text of the interview with Mr. Weitkämper is at http://www.arnoldit.com/search-wizards-speak/xsearch.html.

Stephen Arnold, January 12, 2009

Google Semantics Surfacing

January 8, 2009

ReadWriteWeb.com (January 6, 2009) ran an interesting article that tiptoes around Google’s semantic activities. You will want to read “Did Google Just Expose Semantic Data in Search Results?” Google won’t answer the question, of course. But the addled goose will, “Yep, where have you been since early 2007?” Let me point out that Marshall Kirkpatrick has done a good job of tracking down “in the wild” examples of Google’s machine-based semantic methods. These examples (and others in Google open source documents) make it clear that the semantic activities are chugging along and maturing nicely. “Semantics” as used in this write up means “figuring out what something is about.” Once one knows the “about” part of an information object, then other methods can hook these “about” metadata together.

If you want to get a sense of the scope of the Google semantic system, click here. I have a checking copy of the report I wrote for Bear Stearns before that outfit went up in flames or down the drain. (Pick your metaphor.) My write up here does not include the detail that is in the full discussion in Google Version 2.0 here. But this draft provides some meat for the “in the wild” examples found in Mr. Kirkpatrick’s good article. How significant is the investment in semantics at Google? You can find some color on the sweep of Google’s semantic activities in the dataspace white paper Sue Feldman and I wrote (September 2008). You can get this report from IDC; it is report number 213562.

Let me close with three observations:

  1. Google is deeply involved in semantics, but with a Googley twist. Watching for examples in the wild is a very useful activity, especially for competitors.
  2. The notion of semantics is sufficiently broad to embrace metadata generation and new types of metadata so that new types of data constructs can be automatically generated by Google. Think publishing new constructs for money.
  3. The competitors chasing Google face the increasingly likely prospect that Google has jumped over its own present position and will land even farther ahead of the likes of IBM, Microsoft, Oracle, and SAP. Yahoo. Forget them. The smart Yahooligans are either at Google or surfing on Google.

Now I expect some push back from the anti-Google crowd. Have at it. Just make sure you have internalized Google’s technical papers, Google “talks”, and the patent documentation. This goose is not too interested in uninformed challenges. You can read more about Google semantics in my forthcoming Google and Publishing study from my trusty publisher Infonortics Ltd., located near Oxford, England, in the spring.

Stephen Arnold, January 8, 2009

New Conference Pushes beyond Search

January 5, 2009

After watching some of the traditional search and content processing conferences fall on their swords, muffins, and self-assurance in 2008, I have rejiggled my conference plans for 2009. One new venue that caught my attention is The Rockley Group’s event in Palm Springs, California, January 29-30, 2009. You can get more information about the program here. The event organizer is Ann Rockley, who is one of the people emphasizing the importance of intelligent content.


Ann Rockley, The Rockley Group

I spoke with Ms. Rockley on January 2, 2009. The text of that conversation appears below:

Why is another conference needed?

Admittedly there are a lot of conferences around for people to attend, but not one that focuses specifically on the topic of Intelligent Content. My background is content management, structured content and XML. There are lots of conferences that focus mainly on the technology, others that focus on the content vehicle or channel (e.g., web) and others that focus on XML. The technology oriented conferences often lose sight of the content; who it’s for, how can we most effectively create it and most importantly how can we optimize it for our customers. The content channel oriented conferences e.g. Web, focus only on the vehicle and forget that content is not just about the way we distribute it; content should be optimized for each channel yet at the same time it should be possible to repurpose and reconfigure the content for multiple channels. And XML conferences tend to be highly technical, focusing on the code and the applications and not on how we can optimize our content using XML so that we can manipulate it and transform it much the way we do data. So this conference is all about the CONTENT! Identifying how we can most effectively create it so that we can manipulate it, transform it and deliver it in a multitude of ways personalized for a particular audience is an area of focus sadly lacking in many conferences.

With topics like Web 2.0 and Social Search I am at a loss to know what will be covered. What are the issues your conference will address?

Web 2.0 is about social networking and sharing of content and media and it has had a tremendous influence on content. Organizations have huge volumes of content stuck in static web pages or files and they have a growing volume of content stuck, and sometimes lost in the masses of content being accumulated in wikis, blogs, etc. How can organizations integrate their content, share their content and make it most useful to their customers and readers without a lot of additional work? How do we combine the best of Web 2.0 with the best of traditional content practices? Organizations don’t have the time, resources or budget to do all the things we need and want to do for our customers, but if we create our content intelligently in the first place (structure it, tag it, store it) we can increase our ability to do so much more and increase our ability to effectively meet our customers’ needs. This conference was specifically designed to answer those questions.

Intelligent Content provides a venue for sharing information on such topics as:

  • Personalization (structured content, metadata and XQuery)
  • Intelligent publishing (dynamic multichannel delivery)
  • Hybrid content strategies (integrating Web 2.0 content with traditional customer content)
  • Dynamic messaging/personalized marketing
  • Increasing findability
  • Content/Information Management

Most attendees complain about two things: The quality of the presentations and the need for better networking with other attendees. How are you addressing these issues?

We are doing things a little differently. All the speakers have been assigned a mentor for review of their outline, drafts and final materials. We are working with them closely to ensure that the presentations are top notch and we have asked them all to summarize their information in Best Practices and Tips. In addition, Intelligent Content was designed to be a small intimate conference with lots of opportunities to network. We will have a luncheon with tables focused on special interests and we are arranging “Birds of a Feather” dinners where like-minded people can get together over a great meal and chat, have fun and network. We also have a number of panels which are designed to work interactively with the audience. And to increase the feeling of intimacy we have not chosen to hold the conference in a traditional “big box” hotel, rather we have chosen a boutique hotel, the Parker Palm Springs (http://www.starwoodhotels.com/lemeridien/property/overview/index.html?propertyID=1911), a hotel favored by Hollywood stars from the 1930s. It is a very cool hotel with lots of character that encourages people to have fun while they interact and network.

What will you offer attendees?

The two day conference includes 16 sessions, 2 panels, breakfast, lunch and snacks. It also includes organized networking sessions both for lunch and dinner, and opportunities to ask the Experts key questions. And the conference isn’t over when it is over, we are setting up a Community of Practice including a blog, discussion forum, and webinars to continue to share and network so that every attendee will have an instant ongoing network.

I enjoy small group sessions and simple things like going to dinner with a group of people whom I don’t know. Will you include activities to help attendees make connections?

Absolutely. We deliberately designed the conference to be a small intimate learning experience so people weren’t lost in the crowd and we have specifically created a number of luncheon and dinner networking experiences.

How can I register to attend? What is the URL for more information?

The conference information can be found at www.intelligentcontent2009.com. Contact info@intelligentcontent2009.com if you have questions. Note that the conference hotel is really a vacation destination so we can only hold the rooms at the special rate for a limited time and that expires January 12th so act quickly. And we’ve extended the early bird registration to Jan. 12 as well. If you have any other questions you can contact us at moreinfo@rockley.com.

Stephen Arnold, January 5, 2009

Interview Exclusive: Exalead’s New US Chief Executive Officer

January 5, 2009

On January 2, 2009, I spoke with Paul Doscher, the newly appointed chief executive officer of Exalead, the Paris-based information access company. I received a preview of Exalead technology in November 2008, and I will summarize some of my impressions in a short white paper on my ArnoldIT.com Web site in the next few days.

The full text of my interview with Mr. Doscher appears below:

Why are you expanding in the US market? What’s your background?

Exalead has seen tremendous growth in Europe over the past few years and unlike some of our competitors, our clients are with us for the long haul. We enjoy 100% customer referenceability in Europe. The US represents a significant growth engine for Exalead and we believe we are in a unique position not just to grow our US business – but to help redefine the information access industry.

I have been in the computer software space for 30 years starting in sales and sales management eventually leading to my most recent role as CEO. I have worked in companies such as Oracle, Business Objects and VMware. Before becoming CEO of Exalead, Inc I was CEO of JasperSoft, the leading open source business intelligence company.

What is the major content processing problem your system solves?

This is a new era in information access. In business, valuable information is increasingly stored in silos – dozens of various locations and data formats – that are hard to retrieve in a way that provides necessary context to the end user. Exalead CloudView has been designed to make sense of the structured and unstructured data found both internally behind the firewall and from external sources. Exalead offers quick-to-implement information access solutions that help workers, partners and customers make better, faster and more accurate business decisions.

What is the basis of your firm’s technical approach?

Exalead provides a highly scalable and manageable information access platform built on open standards. Exalead transforms raw data, whatever its nature, into actionable intelligence through best of breed indexing, extraction and classification technologies.

Can you give me an example of your system in action? You don’t have to mention a company name, but I am interested in what the problem was and what your system delivered to the customer?

Exalead is moving beyond what people generally think of when they think about enterprise search. I’ll give you two examples – one that discusses an innovative use case of searching structured data. The second discusses unstructured data.

First is an example of our dealing with structured data. GEFCO, €3.5 billion company, ranks among Europe’s leading transport and logistics firms. They are using Exalead to track their vehicles. GEFCO’s new “Track and Trace” application is built upon Exalead’s flagship platform that offers powerful search functionality and can provide up-to-the-minute information from an extremely large data set. Integrated into GEFCO’s Internet portal Gefconet, Track and Trace allows GEFCO staff, partners and customers to locate the exact position of vehicles, track their progress and optimize transport schedules in real time.

Second is a project where we search and make sense of unstructured data. Our engineers at Exalead built an unreleased project called Restminer – a site aimed at helping find restaurants in a large city like New York City. What we do here is interesting. Restminer gives the user useful, structured information extracted from the unstructured web including dedicated press, blog posts, restaurant reviews, directories – with relevant tips coming from different sources.

Exalead is French owned company. What’s the customer footprint? As you look forward what is your goal for the footprint in 2009?

At the end of 2008, we have around 190 customers across multiple vertical markets including on-line media/publishing, social networking, the public sector, on-line directories, financial services and telecommunications. We are looking for 50% growth in our customer base in 2009.

The Exalead software struck me as quite solid when I saw it. What are the benefits your system delivers to a typical enterprise customer? Is it search or another type of solution?

Exalead provides information access and search solutions in basically three market segments: OEM, B2C and B2B.

In the OEM [original equipment manufacturing] market, software companies have realized what a powerful embedded search platform can bring to their own solution. ISVs [independent software vendors] enrich their functional capabilities by introducing new sources of content and more powerful access retrieval into their core applications.

In the B2C space, consumer web sites such as our customer RightMove in the UK are finding that a highly scalable information access solution can save on hardware costs and make their visitor’s experience much better (for www.rightmove.co.uk). Globally, we are seeing sites use our cutting edge semantic mash-up technologies to bring search result from video, audio and text, such as http://virgilio.alice.it/ in Italy.

For our B2B customers, we are seeing companies implement real-time search across multiple data repositories. Any search platform tied to mission critical business applications has to be flexible, scalable and fast. Exalead’s product is used in various mission critical implementations, including tracking and tracing trucks, operational reporting and large scale document searches.

I recall hearing that your firm has patented technology? Can you provide me with a snapshot of this invention? What’s the patent application number? How many patents does your firm have? What are the key features of the Exalead CloudView system?

Exalead has a significant number of patents granted and pending both in the US and EU relating to the areas of intelligent searching, indexing, keyword extraction and other aspects of the search technology. For example, US Patent 7152064 was issued to Exalead in 2006, providing for improved unified search results – allowing for end users to more easily navigate and refine complex search results.

Our explosive growth continues to drive innovation and functionality into our products – we continue to submit for new patents as our product expands.

In the OEM sector, Autonomy seems to be the giant with its OEM deals with BEA and the Verity OEM deals. Some of the Verity deals date from the late 1980s. How do you see Exalead fitting into this sector?

There is always a place for innovation. We are confident in our capabilities and how they can meet the growing demands of OEMs.

We are beginning to see customers move away from our competitor’s legacy OEM solutions. We provide an easy to implement, scalable and manageable solution. Also, we see growing demand for our simpler licensing model – which makes life much easier for our customers.

Exalead OEM has all the rich features as our other product platforms such as Enterprise Search Edition and the 360 Edition. No matter how huge the volume of information processed by the OEM application, Exalead CloudView provides an easy to implement SOA architecture. OEM customers build applications that search their own system’s content – as well as from any kind of other sources that can be relevant. OEMs can dramatically increase their product functionality and differentiation by adding search of external Web sites, external knowledge bases and building in new hybrid services using our developer kit.

There’s quite a bit of turmoil in search. In fact, in the last few weeks Alexa (an Amazon company) closed its web search unit and Lycos Europe (which purchased software from my partner and me in the mid 1990s) said it would close up shop. What’s that mean for Exalead going forward?

Our web search engine is available at www.exalead.com/search. Based on CloudView, it provides Internet users with an innovative way of discovering results and content from the Web’s 8 billion+ pages. Web search has always been a real world lab to test our technologies and user features – some of which, like facial recognition, have been implemented on Exalead well in advance of their use on other major search sites. But, more than this, we consider the Web as a key source of information – competitive intelligence, partner information, customer information, legal documents, external database providers, blogs, etc. There is more and more key information on the web that enterprises need to manage effectively. Exalead Web search is key in the overall Exalead strategy – and the functionality on our Internet search site will continue to drive innovation in our information access platform.

One trend in enterprise content processing is the shift from results lists to answers. Among the companies in this sector are Relegence (a Time Warner company), Connotate (privately held but backed by Goldman Sachs), and Attivio (a company describing itself as delivering active intelligence). Each of these firms is really in the search business but positioning search as “intelligence”. What’s your take on the changing face of search in an organization?

If making information instantly available for decisions is intelligence, we definitely are working in the information intelligence business. Our approach is driven by customer demand for TCO and ROI – we bring real value to businesses looking to make better, faster decisions. For example, at our customer GEFCO, structured data is available in real time for staff and customers so transportation cycles can be adjusted in real time – significantly improving their bottom line.

As the economic crisis deepens, we continue to see our partners such as Capgemini, Logica, and Sogeti come up with new, exciting solutions for Exalead CloudView for their customers.

Google has been a disruptive force in search. In one major US government agency, different Google resellers have placed search appliances, often at $400,000 a unit. No single person realized that there were more than $6 million worth of devices. As a result, the project to “fix” search means that Google is the default search system. What are the challenges and opportunities Google presents to Exalead? What about the challenges and opportunities Microsoft presents with its strong grip on the desktop and a growing presence in servers?

Ironically, former Google and Microsoft customers fuel much of our sales funnel – so we appreciate and benefit from everyone’s niche in this marketplace.

Google raised end-user expectations about what web search can achieve – it brought a new level of simplicity, relevancy and interactivity. But as we’ve seen as more Google Enterprise Search customers move to Exalead – bringing this functionality to enterprises is a different matter all together.

Google Enterprise Search has technical and functional limits in terms of scalability, security compliance, the ability to search structure and unstructured data and the ability to provide all the necessary context to make a search relevant. Enterprises know that information access means more than a flat list of results – which is driving more companies to look at Exalead.

Microsoft and its acquisition of FAST Search & Transfer brought many opportunities to us as well. For example, we’ve seen a growing number of companies who use Linux or other non-Microsoft operating systems look for a new partner instead of Microsoft.

Mobile search is slowly making headway. Some of the push has come from the iPhone and Google’s report that query volume from iPhones is higher than from other brands of smart phones. What does Exalead provide for mobile search?

Exalead is actively working with mobile companies and telcos in a number of ways. We launched an iPhone search www.exalead.com/iphone in Europe. We are also working with mobile companies to help connect mobile devices to PCs and help accelerate access to mobile content. We will announce more of this functionality in 2009.

The economic climate is quite weak. How is Exalead adjusting to this global problem? I have heard that you have built out a US office with more than two dozen people. Is that correct?

We met all of our aggressive sales numbers in 2008 – in large part because our technologies provide our customers a high return on their investment. We unleash new levels of information access and allow better, faster decision-making. So far, it appears the appetite for our offerings is growing in this economic climate.

What are the three major trends you see with regards to search and content processing in 2009?

The biggest trend we see in 2009 is that search will become a development platform. Open product platforms like Exalead will become a platform for new, unexpected solutions by 3rd party vendors.

Other big trends in 2009 will be continuation of what we’ve seen over past few years: smarter context around search results and better searching of rich content including audio and video.

Can you hint at what’s coming in 2009 in terms of features in the CloudView system?

The launch of Exalead CloudView 360 later this year will be a game changer for the industry. Exalead CloudView 360 will have functionality that will transform heterogeneous corporate data into contextualized building blocks of business information that can be directly searched and queried – and allow for an explosion of new applications to be built on top of the platform.

Stephen Arnold, January 5, 2009

Google Translation Nudges Forward

December 27, 2008

I recall a chipper 20-something telling me what she learned in her first class in engineering; to wit, “Patent applications are not products.” As a trophy generation member, flush with entitlement, she is generally correct, but patent applications are not accidental. They are instrumental. If you are working on translation software, you may want to check out Google’s December 25, 2008, “Machine Translation for Query Expansion.” You can find this document by searching the wonderful USPTO system for US20080319962. Once you have that document in front of you, you will learn that Google asserts that it can snag a query, generate synonyms from its statistical machine translation system, and pull back a collection. There are some other methods in the patent application. When I read it, my thought was, “Run a query in English, get back documents in other languages that match the query, and punch the Google Translate button and see the source document in English.” Your interpretation may vary. I was amused that the document appeared on December 25, 2008, when most of the US government was on holiday. I guess the USPTO is working hard to win the favor of the incoming administration.
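
The patent application is not a product, as the chipper engineer reminded me, so treat the following as nothing more than a minimal sketch of the query expansion idea as I read it: expand the user’s terms with translations and synonyms, then retrieve across languages. The phrase table and documents are toy stand-ins for Google’s statistical machine translation system.

```python
# Minimal sketch of query expansion via translation, assuming a toy
# translation table in place of a trained statistical MT system.
from collections import defaultdict

# Hypothetical phrase table: English term -> candidate translations and synonyms.
PHRASE_TABLE = {
    "car": ["car", "automobile", "voiture", "auto"],
    "insurance": ["insurance", "assurance", "versicherung"],
}

DOCUMENTS = {
    1: "assurance voiture au meilleur prix",
    2: "cheap automobile insurance quotes",
    3: "recipe for apple pie",
}

def expand_query(query: str) -> set:
    """Expand each query term with its translations and synonyms."""
    expanded = set()
    for term in query.lower().split():
        expanded.update(PHRASE_TABLE.get(term, [term]))
    return expanded

def retrieve(query: str):
    """Return documents matching any expanded term, best matches first."""
    terms = expand_query(query)
    scores = defaultdict(int)
    for doc_id, text in DOCUMENTS.items():
        for token in text.lower().split():
            if token in terms:
                scores[doc_id] += 1
    return sorted((doc_id for doc_id in scores), key=scores.get, reverse=True)

print(retrieve("car insurance"))  # matches the French document as well as the English one
```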

Stephen Arnold, December 27, 2008

Semantics Where None Had Gone Before

December 20, 2008

My view of semantic technology is that it is plumbing. Users have other tasks to complete so making time to add tags is limited. Technology Review, the nerdy corollary to the Harvard Business Review, published a remarkable article here. “Semantic Sense for the Desktop” by Erica Naone reports that the Nepomuk Project will deliver to me a semantic desktop. Oh, goodie. The idea is that

the software adds a lot of semantic information automatically and encourages users to add more by making annotated data more useful. It also provides an easy way to share tagged information with others.

No less a luminary than Nova Spivack says to Ms. Naone:

“This might be the semantic desktop that actually survives,” says Nova Spivack, CEO and founder of Radar Networks, the company behind Twine, a semantic bookmarking and social-networking service. “There’s a lot of potential to build on what they’ve done.” Spivack notes that other efforts to bring semantic technology to the desktop haven’t succeeded in reaching end users. “Nepomuk is designed for real people and developers…”

Google’s approach, if I read Ramanathan Guha’s five patent documents accurately, is that if users, software, and combinations of the two can’t do the semantic tagging job, Google will. The Google does not say too much about semantics, but my inclination is that a system that keeps semantics away from the user may be the one that succeeds. In a race between Nepomuk and Googzilla, on whom will you wager?
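
Neither the Nepomuk article nor the Guha patents spell out code, but the difference comes down to who produces the tags. Here is a minimal sketch of the “system does the tagging” approach; the gazetteer and the crude keyword heuristic are hypothetical stand-ins for the far richer methods these systems actually use.

```python
import re

# Hypothetical gazetteer: surface strings the system recognizes -> semantic type.
GAZETTEER = {
    "jon stewart": "Person",
    "mountain view": "Place",
    "comedy central": "Organization",
}

def auto_tag(text: str) -> dict:
    """Attach semantic tags to an information object without asking the user."""
    tags = {"entities": [], "keywords": []}
    lowered = text.lower()
    for surface, entity_type in GAZETTEER.items():
        if surface in lowered:
            tags["entities"].append({"text": surface, "type": entity_type})
    # Crude "aboutness": the most frequent non-trivial tokens.
    tokens = re.findall(r"[a-z]{5,}", lowered)
    tags["keywords"] = sorted(set(tokens), key=tokens.count, reverse=True)[:3]
    return tags

print(auto_tag("Jon Stewart taped the show for Comedy Central near Mountain View."))
```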

Stephen Arnold, December 20, 2008

Leximancer Satmetrix Tie Up

December 18, 2008

Leximancer has partnered with Satmetrix so that Satmetrix can utilize Leximancer’s Customer Insight Portal. Satmetrix provides software applications and consulting services to improve customer loyalty. Using “intuitive concept discovery” — semantic analysis — Leximancer develops insights into customer attitudes. Leximancer will provide customer analytics and unstructured text mining for Satmetrix’s Net Promoter, which automatically sifts and categorizes data from blogs, Web sites, social media, e-mails, service notes and survey feedback to increase companies’ customer loyalty, retention and growth. The focus on analyzing positive and negative trends in customers’ text entries is key for customer service-oriented companies that need to respond quickly. Satmetrix serves a wide range of markets, including telecommunications firms like Verizon and business services like CareerBuilder.
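
Neither company discloses its method, so the snippet below is only a toy illustration of the general idea of sorting free text feedback into positive and negative buckets; the word lists and comments are made up and bear no relation to Leximancer’s semantic analysis.

```python
# Toy illustration of sorting customer comments into positive / negative trends,
# assuming hand-built word lists in place of real semantic analysis.
POSITIVE = {"great", "fast", "helpful", "love"}
NEGATIVE = {"slow", "broken", "rude", "cancel"}

FEEDBACK = [
    "Support was fast and helpful",
    "The portal is slow and the billing is broken",
    "Love the new dashboard",
]

def classify(comment: str) -> str:
    """Bucket a comment by counting positive versus negative words."""
    words = set(comment.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

for comment in FEEDBACK:
    print(classify(comment), "-", comment)
```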

Jessica Bratcher, December 17, 2008

Expert System’s Luca Scagliarini

December 18, 2008

ArnoldIT.com’s Search Wizards Speak series has landed another exclusive. Hard on the heels of the interview with Autonomy’s chief operating officer comes a conversation with Luca Scagliarini, one of the senior executives at Expert System in Modena, Italy, who explains the company’s technology and strategy for 2009. Mr. Scagliarini is a technologist’s technologist and a recognized leader in next generation search systems. The company’s COGITO technology has cut a wide swath through European markets and is now available in North America. Mr. Scagliarini told ArnoldIT.com’s Beyond Search:

A major mobile handheld manufacturer uses our technology to address the issue of supporting new users in learning how to use the device. The objective was to reduce the return rate of the device AND to reduce the customer support costs. This natural language-based solution leverages our semantic technology to provide their customers with a simple and effective tool to answer questions and how-to queries with consistency and high precision. As of today the system has answered, in only 5 months, more than 4 million questions with more than 87% precision.

Search is no longer keyword matching and long lists of results. Mr. Scagliarini said:

To deliver an effective question and answer system that works on more than a small set of FAQ, it is very important to have a deep understanding of the text. This is possible only through deep semantic analysis. We have several implementations of our natural language Q&A product recently renamed COGITO Answer. In the next 12 months, we will be investing to expand our footprint worldwide–especially in the U.S. and in the Persian Gulf region to replicate our European success there. In the U.S, we are now supporting customer service operations with natural language Q&A for a government unit of the Department of the Interior and we are one of only 5 semantic partners actively promoted by Oracle.

You can read the complete interview with Mr. Scagliarini on the ArnoldIT.com Web site or you can click here. More information about the company and its technology may be found on the firm’s Web site http://www.expertsystem.net or click here.

Semantic Search Laid Bare

December 17, 2008

Yahoo’s Search Blog here has an interesting interview with Dr. Rudi Studer. The focus is semantic search technologies, which are all the rage in enterprise search and Web search circles. Dr. Studer, according to Yahoo:

is no stranger to the world of semantic search. A full professor in Applied Informatics at University of Karlsruhe, Dr. Studer is also director of the Karlsruhe Service Research Institute, an interdisciplinary center designed to spur new concepts and technologies for a services-based economy. His areas of research include ontology management, semantic web services, and knowledge management. He has been a past president of the Semantic Web Science Association and has served as Editor-in-Chief of the journal Web Semantics.

If you are interested in semantics, you will want to read and save the full text of this interview. I want to highlight three points that caught my attention and then–in my goosely manner–offer several observations.

First, Dr. Studer suggests that “lightweight semantic technologies” have a role to play. He said:

In the context of combining Web 2.0 and Semantic Web technologies, we see that the Web is the central point. In terms of short term impact, Web 2.0 has clearly passed the Semantic Web, but in the long run there is a lot that Semantic Web technologies can contribute. We see especially promising advancements in developing and deploying lightweight semantic approaches.

The key idea is lightweight, not giant semantic engines grinding in a lights out data center.

Second, Dr. Studer asserts:

Once search engines index Semantic Web data, the benefits will be even more obvious and immediate to the end user. Yahoo!’s SearchMonkey is a good example of this. In turn, if there is a benefit for the end user, content providers will make their data available using Semantic Web standards.

The idea is that in this chicken and egg problem, it will be the Web page creators’ job to make use of semantic tags.
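
For readers who want a concrete picture of the indexing side of that bargain, here is a minimal sketch of how a crawler might pull embedded semantic annotations out of a page, assuming a hypothetical RDFa-style “property” attribute; real systems handle far more vocabulary and markup than this.

```python
from html.parser import HTMLParser

# Hypothetical sample page with RDFa-style annotations added by the page author.
SAMPLE_PAGE = """
<div>
  <span property="name">Karlsruhe Service Research Institute</span>
  <span property="topic">semantic web services</span>
</div>
"""

class SemanticMarkupExtractor(HTMLParser):
    """Collect text from elements that carry a 'property' attribute."""

    def __init__(self):
        super().__init__()
        self.current_property = None
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        self.current_property = dict(attrs).get("property")

    def handle_data(self, data):
        if self.current_property and data.strip():
            self.pairs.append((self.current_property, data.strip()))
            self.current_property = None

extractor = SemanticMarkupExtractor()
extractor.feed(SAMPLE_PAGE)
print(extractor.pairs)
# A search engine would store these (property, value) pairs alongside the page's text.
```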

Finally, Dr. Studer identifies tools as an issue. He said:

One problem in the early days was that the tool support was not as mature as for other technologies. This has changed over the years as we now have stable tooling infrastructure available. This also becomes apparent when looking at this year’s Semantic Web Challenge. Another aspect is the complexity of some of the technologies. For example, understanding the foundation of languages such as OWL (being based on Description Logics) is not trivial. At the same time, doing useful stuff does not require being an expert in Logics – many things can already be done exploiting only a small subset of all the language features.

I am no semantic expert. I have watched several semantic centric initiatives enter the world and–somewhat sadly–watched them die. Against this background, let me offer three observations:

  1. Semantic technology is plumbing and like plumbing, semantic technology should be kept out of sight. I want to use plumbing in a user friendly, problem free setting. Beyond that, I don’t want to know anything about plumbing. Lightweight or heavyweight, I think some other users may feel the same way. Do I look at inverted indexes? Do you?
  2. The notion of putting the burden on Web page or content creators is a great idea, but it won’t work. When I analyzed the five Programmable Search Engine inventions by Ramanathan Guha as part of an analysis for the late, great Bear Stearns, it was clear that Google’s clever Dr. Guha assumed most content would not be tagged in a useful way. Sure, if content was properly tagged, Google could ingest that information. But the core of the PSE invention was Google’s method for taking the semantic bull by the horns. If Dr. Guha’s method works, then Google will become the semantic Web because it will do the tagging work that most people cannot or will not do.
  3. The tools are getting better, but I don’t think users want to use tools. Users want life to be easy, and figuring out how to create appropriate tags, inserting them, and conforming to “standards” such as they are is no fun. The tools will thrill developers and leave most people cold. Check out the tools section at a hardware store. What do you see? Hobbyists and tinkerers and maybe a few professionals who grab what they need and head out. Semantic tools will be like hardware: of interest to a few.

In my opinion, the Google – Guha approach is the one to watch. The semantic Web is gaining traction, but it is in its infancy. If Google jump starts the process by saying, “We will do it for you”, then Google will “own” the semantic Web. Then what? The professional semantic Web folks will grouse, but the GOOG will ignore the howls of protest. Why do you think the GOOG hired Dr. Guha from IBM Almaden? Why did the GOOG create an environment for Dr. Guha to write five patent applications, file them on the same day, and have the USPTO publish five documents on the same day in February 2007? No accident tell you I.

Stephen Arnold, December 17, 2008

