Ontos: a Text Processing Company, Not a Weapon

June 5, 2008

In a conference call yesterday (June 4, 2008), someone mentioned “Ontos”. Another person asked, “What’s an Ontos?” I answered, “An anti-tank vehicle.” What I remembered about the Ontos is that it was a tank loaded down with so many weapons that a turtle was speedier. Big laugh. Ontos is a company engaged in text and content processing with a product called ObjectSpark. To fill in the void in my knowledge, I navigated to the GOOG, plugged in “Ontos”, and found a link to a 2001 article in Intelligent Enterprise, a very good Web site now that the print magazine has been put out to pasture. You can read the description here.

The company’s English language Web site is at www.ontos.com. The product lineup no longer relies on the ObjectSpark name. You can license:

  • OntosMiner, which “analyzes natural language text. It recognizes objects and their relations and adds them as annotations to the related text parts. The technology is based on semantic rules, i.e. NLP (Natural Language Processing). It uses ontologies to define the area of interest.”
  • LightOntos for Workgroups, which “helps to organize and search information and documents. It allows the user to process and annotate PDF, Word, RTF, Text or HTML files using OntosMiner.”
  • Ontos SOA, which “realizes the whole cycle of semantic-syntactical processing, management and analysis of unstructured information located in the Internet and large corporative data banks.”
  • TAIS Ontos, which is “created as an Application Package using ORACLE technologies and Java. The system uses a semantic designed for building and maintaining object oriented databases. Additional components are effective engines for the search of explicit and hidden relations between objects. A visualization environment (interface) supports the analysts when analyzing a domain of interest. The product is adapted for the segment of law enforcing structures and attributed to the class of anti-criminal analytical systems.”

The display of tagged text uses color to identify specific elements. When I saw this display, it reminded me of the output from Inxight Software’s text processing system.

[Image: Ontos tagged text markup]

The company’s Russian partner, ZAO AviComp Services, participated in the recent German technical extravaganza, CeBIT 2008.

You will find a handful of white papers on the Ontos Web site. I found “Ontos Solutions for the Semantic Web” quite interesting and informative. You can download it here.

I wasn’t able to locate any pricing or licensing information. If you have some of these data points, please, use the comment form below this essay to share the information with other readers. My email to the company went unanswered.

Based on my clicking through the Web site, you might want to take a look at this system. The white papers and technical descriptions use the buzzwords that other vendors bandy about. The one drawback to a system that lacks a high profile in the US is this question: “Does the system meet US security guidelines?” My hunch is that the system is industrial strength; otherwise, the Brussels customer would not have signed a deal to use the Ontos technology.

Stephen Arnold, June 5, 2008

Knewco: Community Tags

May 29, 2008

Peter Suber offers a clear, detailed post about a new approach to community tags. You can read his post “Combining OA, Wikis, Community Annotation, Semantic Processing, and Text Mining” here. Mr. Suber includes a link to a discussion of the idea in Genome Biology here.

What’s interesting to me is the specialist nature of the effort. Although anyone can tag, the focus is STM (scientific, technical, and medical). The idea is to create rich indexing for technical information. I think this is a good idea. I think there will be challenges because a small number of people do most of the work. Nevertheless, these types of projects are sorely needed.

The company responsible for the technology is Knewco, founded by several academics. You can learn more about the firm here. Knewco has developed some tag options that are interesting. I think the value will come from POW or “plain old words”.

Why do I care about this and what’s the wiki variant have to do with search? Well, a lot. First, technical information has long been in the hands of a small number of multi-national firms. If you want to search engineering or chemical information, you have to use specialist files and sometimes pay big, big online access charges. This type of project is one more example of the research community feeling its oats. Good for researchers and potentially threatening to the oligopolies in the STM information business.

Second, I like the idea that information innovation is coming from thinkers outside the traditional IR (information retrieval) community. When I go to conferences, there are 20-somethings who have an opportunity to lecture me on their major insight: Use For references. Okay, been there. Done that. Fresh thinking is important, and I am delighted that Knewco is trying pop ups, colors, and other bells and whistles that may point to some new directions in tagging.

Finally, the larger the body of publicly accessible tags, the better the next-generation systems will be. Google, as I point out in my new study due out in September 2008, is focused on making its software smarter. Humans play a role, but the GOOG knows the value of indexing, taxonomies, tags, and their brethren.

On the downside, I don’t like the company name “Knewco”. In fact, Knewco uses coinages for its different functions; for example, a “knowlet”. I hate having to memorize a neologism for something I call a cross reference. But that’s a personal preference. Check the company’s Web technology here.

Stephen Arnold, May 29, 2008

The Library of Congress and Semantic Search

May 14, 2008

The buzz about semantic search is rising. Powerset’s demonstration using Wikipedia data has triggered interest in searching in more intuitive ways. I received a news item about Semantra http://www.semantra.com, another player in this search market segment.

The Library of Congress is in the game too.

There’s an interesting news item, “Semantic Search the Library of Congress”. To see how the US government approaches “beyond search”, navigate to http://lcsh.info/sh95000541. Once you have this URL in your browser’s address bar, you can open a new window and use http://lcsh.info/ to get a list of LCCNs to search semantically.

The search result is a list of Use For terms, Narrower Terms (each of which is a hot link to more terms), the LC Classification, the date the entry was created, the date the entry was modified, and a link to the Concept URI.
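My understanding is that each lcsh.info page is backed by a SKOS record, and the fields in that display map onto standard SKOS properties: altLabel corresponds to the Use For terms and narrower to the narrower-term links. The sketch below parses an invented fragment modeled loosely on such a record (the heading here happens to match sh95000541, but the labels are illustrative, not a verbatim copy of the service’s output) using only the Python standard library.

```python
import xml.etree.ElementTree as ET

# A hypothetical SKOS fragment, modeled on the kind of RDF lcsh.info serves.
SKOS_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://lcsh.info/sh95000541#concept">
    <skos:prefLabel>World Wide Web</skos:prefLabel>
    <skos:altLabel>W3 (World Wide Web)</skos:altLabel>
    <skos:altLabel>WWW (Computer network)</skos:altLabel>
    <skos:narrower rdf:resource="http://lcsh.info/sh2002000569#concept"/>
  </skos:Concept>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"

def summarize(xml_text):
    """Pull the fields the LCSH display shows out of a SKOS record."""
    concept = ET.fromstring(xml_text).find("skos:Concept", NS)
    return {
        "heading": concept.find("skos:prefLabel", NS).text,
        # altLabel entries correspond to "Use For" terms
        "use_for": [e.text for e in concept.findall("skos:altLabel", NS)],
        # narrower entries are the hot links to more specific headings
        "narrower": [e.get(RDF + "resource")
                     for e in concept.findall("skos:narrower", NS)],
        "uri": concept.get(RDF + "about"),
    }
```

Pointing this at a live record instead of the embedded string would only require fetching the URL first; the field extraction stays the same.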

You will want to navigate to ProgrammableWeb.com http://www.programmableweb.com/api/library-of-congress-subject-headings and check out their explanation.

Based on this demonstration, today’s semantic search engines are not likely to be challenged in a meaningful way by a US government initiative any time soon.

Stephen Arnold, May 14, 2008

Collective Intelligence Anthology Available

May 14, 2008

The Arnoldit.com mascot admires the new collection of essays by Mark Tovey. Collective Intelligence: Creating a Prosperous World at Peace, published by the Earth Intelligence Network in Oakton, Virginia (ISBN-13: 978-0-9715661-6-3), contains more than 50 essays by analysts, consultants, and intelligence practitioners. You can obtain a copy from the publisher, Amazon, or your bookseller.


The ArnoldIT mascot completed reading the 600-page book with remarkable alacrity for a duck.

The collection of essays is likely to find many readers among those interested in the social phenomena of networks. Many of the essays, including the one I contributed, talk about information retrieval in our increasingly interconnected world.

This essay provides a synopsis of my contribution, “Search: Panacea or Play? Can Collective Intelligence Improve Findability”, which I wrote shortly before completing Beyond Search: What to Do When Your Search System Doesn’t Work. My essay begins on page 375.

Social Search

The dominance of Google forces other vendors to look for a way over, under, around, or through its grip on Web search. The vendor landscape now offers search and content processing systems that arguably do a better job of manipulating XML (Extensible Markup Language) content, figuring out who knows whom (the social graph initiative), and the “real” meaning of content (semantic search). There are more than 100 vendors who have technology that offers, if one believes the marketing collateral and conference presentations, a way to squeeze more information from information.

Social search is the name given to an information retrieval system that incorporates one or more of these functions:

  1. Users can suggest useful sites. Examples: Delicious.com and StumbleUpon.com
  2. The system discovers relationships between and among processed documents and links: Powerset.com and Kartoo Visu
  3. The system analyzes information, extracts entities, and identifies individuals and their relationships: i2 Ltd (now part of ChoicePoint) and Cluuz.com
  4. The system monitors user behavior and uses the data to guide relevance, spidering, and other system functions: public Web indexing companies

There are other types of social functions, but these provide sufficient salt and pepper for this information side dish. The reason I say side dish is that social functions are not going to displace the traditional functions on which they are based. Social search has been in the mainstream from the moment i2 Ltd. introduced its workbench product to the intelligence community more than a decade ago. “Social” functions, then, are a recent add-on to the main diet in information retrieval.
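The second function in the list above, discovering relationships between processed documents, can be approximated crudely with nothing more than tag overlap. This is a toy sketch of the general idea, not how Powerset or Kartoo actually work: two documents whose user-assigned tags share a high Jaccard score are declared related.

```python
def jaccard(tags_a, tags_b):
    """Overlap of two tag sets: |A intersect B| / |A union B|."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def related_documents(docs, threshold=0.25):
    """Pair up documents whose user-assigned tags overlap enough.

    docs maps a document name to its list of tags; the threshold is an
    arbitrary cutoff chosen for illustration.
    """
    names = sorted(docs)
    pairs = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = jaccard(docs[x], docs[y])
            if score >= threshold:
                pairs.append((x, y, round(score, 2)))
    return pairs
```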

Old Statistics and Cheap, Powerful Computers

What’s overlooked in the rush to find a Google “killer” is that the new companies are using some well-known technologies. For example, the inner workings of Autonomy’s “black box” are somewhat dependent on the work of a slightly unusual Englishman, Thomas Bayes. Mr. Bayes left the world a couple of centuries ago, but his math has been a staple in college statistics courses for many years. To deploy Bayesian techniques on a large scale is, therefore, not exactly a secret to the thousands of mathematicians who followed his proofs in pursuit of their baccalaureate.
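For readers who have not met Bayes outside a statistics course, here is a minimal multinomial naive Bayes text classifier with add-one smoothing. It is a classroom sketch of the general technique, not Autonomy’s implementation:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Toy multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> training doc count
        self.vocab = set()

    def train(self, label, text):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # log prior plus sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in text.lower().split():
                count = self.word_counts[label][w] + 1  # Laplace smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The point of the sketch is the one the essay makes: the math is centuries old and fits in a page; the engineering feat is running it over millions of documents.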

Read more

Intelligenx Discloses Referrals Fuel Rapid Growth

May 12, 2008

In an exclusive interview, Iqbal and Zubair Talib, senior managers of Intelligenx, reveal that referrals have fueled the company’s rapid growth. Intelligenx has a leadership position in directory and “yellow page” search in South Africa, South America, and elsewhere. The company’s profile, despite its US headquarters in suburban Washington, DC, is modest.

The father-son team said:

It seems that our international clients are actively talking about our technology at international conferences. We can always do a better job of marketing, but we put our customers first. Sales occur because people come to us and say, “We want to license your system”… we maintained certain relationships among an elite group of scientists and engineers. We never signed up to give marketing talks at the marketing-oriented venues. Our success comes because certain people understand our technology and recognize that it delivers scale, speed, performance, data management today. Our technology is our marketing.

Unlike search and content processing firms that issue news releases when a Web site signs on to use a well-known search engine or when a vendor announces for the second or third time a reseller deal, Intelligenx keeps innovating and selling.

The company’s system offers almost all of the features associated with the best-known vendors in the search market sector. The Talibs said:

Intelligenx was first to market with technology that offered a true full-text search with what many people call faceted or assisted search results. To achieve this functionality, performance under heavy loads is the prevailing challenge and simply put, our Discovery Engine® solves the problem in what we think is a most elegant fashion. “Facets” or “guided navigation” are not just a “checkbox” on a feature matrix but an underlying central philosophy in our technology, the company, and in the development of our system.

You can read about the company’s new stream processing of information, what the Talibs call “cluster flow”. In addition to near real time index updating, additional metadata are generated without adding latency to the system. Another interesting feature of the Intelligenx system is that a licensee can provide its sales people with a real time view of what advertisements are germane to a popular query. The sales person is able to show a prospective advertiser a live report of traffic and the payoff from an advertisement in a specific context.

The company’s technology offers an alternative to the better-known MarkLogic system and the specialist firm, Dieselpoint.

You can read the entire interview on the ArnoldIT.com Web site. The full text of the interview is part of the Search Wizards Speak feature. The exclusive interview is the 13th in this series of first-person accounts of the origin and functionality of important search and content processing systems. Click here to read the interview.

Kartoo’s Visu: Semantic Search Plus Themescape Visualization

May 11, 2008

In England in December 2007, I saw a brief demonstration of Kartoo.com’s “thematic map”, which was announced in 2005.

The genesis of the company lies in relationships with large publishing groups in 1997, when Mr. Baleydier was working to make CD-ROMs easily searchable. Kartoo was founded in 2001 by Laurent and Nicholas Baleydier to provide a more advanced search interface. You can find out more about the company at Kartoo.net. Kartoo S.A. offers a no-charge metasearch Web system at Kartoo.com.

The original Kartoo service was one of the first to use dynamic graphics for Web search. Over the last few years, the interface became more refined, but the system still presents links in the form of dynamic maps. Important Web sites are spherical, and the spheres are connected by lines. Here’s an example of the basic Kartoo interface as it looked on May 11, 2008, for the query “semantic search” run against the default of English Web sites. (The company also offers Ujiko.com, which is worth a quick look. The interface is a bit too abstract for me. You can try it here.)

[Image: Kartoo default results, May 2008]

The dark blue “ink blots” connect related Web sites. The terms provide an indication of the type of relationship between or among Web sites. You can click on elements of this interface to explore the result set and perform other functions. Playing with the system is more instructive than any description of the mouse actions.

Another company, Datops SA, was among the first to use interesting graphic representations of results. I recall someone telling me that the spheres that once characterized Groxis.com’s results had been influenced by a French wizard. Whether justified or not, when I saw spheres and ink blots, I said to myself, “Ah, another vendor influenced by French interface design”. In talking with people who use visualizations to help their users understand a “results space”, I’ve had mixed feedback. Some people love impressionistic representations of results; others don’t. Decades ago I played a small role in the design of the F-15 interface or heads-up display. The one lesson I learned from that work was that under pressure, interfaces that offer too many options can paralyze reaction time. In combat, that means the pilot could be killed trying to figure out what the graphics mean. In other situations, where a computational chemist is trying to make sense of 100,000 possible structures, a fine-grained visualization of the results may be appropriate.

Read more

Semantic Web: Useful Links

May 10, 2008

Advancing Insights posted a list of useful links for “Web 3.0, RDF, and the Semantic Web”. A content goose squawk for Jim Wilde for the links. Clicking through these documents is instructive. If you follow Google’s activities in the semantic space, you can see why Google has pushed forward with its programmable search engine.

Invented by a former IBM Almaden scientist, the PSE or programmable search engine could, if deployed on a large scale by Google, make Google the de facto “hub” for semantic processing. You can download one of the Google PSE documents by navigating to the USPTO’s awesome Web site and searching for US2007 0038616, filed on April 10, 2005, and published on February 15, 2007.

Stephen Arnold, May 9, 2008

Lingospot: In Text Content Discovery Means Auto Linking

May 9, 2008

A semi-happy quack to the person who called Lingospot to my attention. The company uses linguistic analysis to identify and create dynamic links on publishers’ or bloggers’ pages. When you hover over a Lingospot link, a “Discovery Bubble” pops up and shows content from related Web sites, so a user can discover new, contextually relevant content. The company offers “online content discovery services” on the premise that a publisher doesn’t have the money to pay a human to build these “See Also” references.

Lingospot inked a deal with Yedda.com. You can read the full news story here. Hurry, PR announcements can be ephemeral. A more compelling illustration of Lingospot’s system in action appears on the Forbes.com Web site. The idea is that the technology will increase a Forbes visitor’s “engagement”. Translation: time on the site and clicks, which presumably will boost ad revenue. Lingospot asserts that a typical licensee would enjoy a two to five percent increase in page views, a significant boost for a high-traffic site.

Forbes.com exposes the content to the Lingospot system, and then the Lingospot linguistic technology generates the self-referential links. My tests on Forbes.com may have been erroneous, but I was bounced around the sprawling Forbes.com Web site, not sent to relevant content on Business Week’s or Fortune’s Web site, which would have been more useful to me.
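Mechanically, the simplest form of in-text auto linking is plain term matching. The sketch below is my guess at the skeleton of such a service, not Lingospot’s actual linguistic analysis, and the term-to-URL pairs are invented:

```python
import re

def auto_link(html_text, term_urls):
    """Wrap the first occurrence of each known term in an anchor tag.

    term_urls maps a term to a destination URL. A toy sketch of in-text
    auto linking; real systems do far more linguistic analysis.
    """
    # Process longer terms first so "content discovery" wins over "content".
    for term in sorted(term_urls, key=len, reverse=True):
        pattern = re.compile(r"\b%s\b" % re.escape(term), re.IGNORECASE)
        html_text = pattern.sub(
            lambda m: '<a href="%s">%s</a>' % (term_urls[term], m.group(0)),
            html_text,
            count=1,  # only the first occurrence becomes a link
        )
    return html_text
```

A production version would, at minimum, parse the HTML so that it never matches text inside tags or inside anchors it has already inserted.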

My typical behavior on any site that features pop ups is to dismiss and ignore annoying fly over ads. I then avoid any links in the article text that produce these pop up links.

[Image: Forbes.com pop up produced by Lingospot]

I may be the odd duck out (see logo to understand the metaphor), but I want to scan the Forbes.com article, not ads, not related content, and not pop ups that get in the way of reading the story. The Forbes’ story is what caused me to click in the first place. You may have a different view of these helpful “Discovery Bubbles”, and I encourage you to form your own opinion.

There’s a modest amount of information on the Lingospot.com Web site. If I were to sign up for the service, the Lingospot.com system would put a JavaScript snippet on my Web log. At this time, the company supports Blogger, Movable Type, TypePad, and WordPress. You can, however, put the Lingospot function on any page. I decided to opt out of the service. My experience is that link services have to process the content on the Web site, and the delay can be a couple of days. However, I added Lingospot.com to my list of auto linkers, which includes such companies as AdValiant.com, EchoTopic.com, and Kontera.com. You can find even more of these services on the Online Marketing Innovations’ Web log Folden here.

One of my sources told me that the company’s NLP technology is based on five years of research and development. The beta service became available in 2005. The company was founded in 2006.

Lingospot’s CEO is Nikos Iatropoulos. You can hear an interview with him at Social Buzz. Bob Sherry, formerly at ValueClick, is the senior VP of sales and marketing for the company. The firm operates from its offices in Los Angeles. If you want to know more about Lingospot, you could ring 310 475 1600 and leave a message.

The use of linguistic technology to make related information available is a good one. Describing these pop ups as content discovery tools seems to be massaging a well-worn advertising chestnut. As Google’s dominance of online search continues without a significant challenge from Ask.com, Microsoft.com, or Yahoo.com, these “new marketing tools” become important to Web masters who can’t generate enough revenue from traffic to keep the lights on.

What’s interesting to me is that the once-exotic linguistic and semantic technologies are now sufficiently tame for use by marketing companies. The semantic revolution is indeed here when an account rep can mouth a phrase like “in text content discovery”, confident that the demo relies on high-tech voodoo that sort of works. For cash-strapped and traffic-challenged publishers, auto linking may be a modest silver bullet.

Stephen Arnold, May 9, 2008

Google and Semantics: More Puzzle Pieces Revealed

May 6, 2008

On May 5, 2008, Search Engine Round Table carried an interesting post, “Google Improves Semantic Search”. You can find the post here. The key point is that Google is using truncation “to stem complex plurals”. SEO Round Table points to the Google Groups thread as well. That link is here.
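To make the idea concrete, here is what naive plural stemming looks like as a handful of suffix rules. Google’s actual approach is proprietary and certainly more sophisticated; this is only an illustration of the concept:

```python
def stem_plural(word):
    """Collapse common English plural forms to a singular stem.

    A naive sketch for illustration only, not Google's algorithm;
    real stemmers handle many more cases and exceptions.
    """
    lower = word.lower()
    if lower.endswith("ies") and len(lower) > 4:
        return lower[:-3] + "y"      # "queries" -> "query"
    if lower.endswith(("ches", "shes", "xes", "sses")):
        return lower[:-2]            # "searches" -> "search"
    if lower.endswith("s") and not lower.endswith("ss"):
        return lower[:-1]            # "engines" -> "engine"
    return lower
```

With rules like these, a query for “search engines” can match documents that mention only “search engine”, which is the behavior the Google Groups thread describes.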

Google’s been active in semantics for a number of years. In 2007, I provided information to the late, great Bear Stearns’ Internet team. Based on my work, Bear Stearns issued a short note about Google’s semantic activity. This document may be available from a Bear Stearns’ broker, if there is one on the job.

An in-depth discussion of five Google semantic-centric inventions appears in Google Version 2.0. This analysis pivots on five patent applications published in February 2007. A sole inventor, Ramanathan Guha, describes a programmable search engine that performs semantic analysis and stores various metadata in a context server. The idea is that the context of a document, a user, or a process provides important insights into the meaning of a document. If you are a patent enthusiast, the five Guha inventions are:

    US2007 0038616, filed on April 10, 2005, and published on February 15, 2007, as “Programmable Search Engine”

    US2007 0038601, filed on August 10, 2005, and published on February 15, 2007, as “Aggregating Content Data for Programmable Search Engines”

    US2007 0038603, filed on August 10, 2005, and published on February 15, 2007, as “Sharing Context Data across Programmable Search Engines”

    US2007 0038600, filed on August 10, 2005, and published on February 15, 2007, as “Detecting Spam-Related and Biased Contents for Programmable Search Engines”

    US2007 0038614, filed on August 10, 2005, and published on February 15, 2007, as “Generating and Presenting Advertisements Based on Context Data from Programmable Search Engines”.

These patent documents don’t set a timetable for Google’s push into semantics. It is interesting to me that an influential leader in the semantic standards effort invented the PSE or programmable search engine. Dr. Guha, a brilliant innovator, demonstrates that he is capable of doing a massive amount of work in a short span of time. I recall that he joined Google in early 2005, filing more than 130 pages of semantic systems and methods in less than nine months. I grouped these because filing five documents on the same day, with each document nudging Google’s semantic invention forward from a slightly different angle, struck me as interesting.

Stephen Arnold, May 7, 2008

Mondeca: A Semantic Technology Company

April 25, 2008

Twice in the last two days I’ve been asked about Mondeca, based in Paris. If you are not familiar with the company, it has been involved in semantic content processing for almost a decade. The company describes itself in this way:

Mondeca provides software solutions that leverage semantics to help organizations obtain maximum return from their accumulated knowledge, content and software applications. Its solutions are used by publishing, media, industry, tourism, sustainable development and government customers worldwide.

The company made a splash in professional publishing with its work for some of the largest scientific, technical, legal, and business publishers. Its customers include Novartis, the Thomson Corporation, LexisNexis, and Strabon.

Mondeca makes a goodly amount of information available on its Web site. You can learn more about the company’s technology, solutions, and management team by working through the links on the Web site.

Indexing by the Book: Automatic Functions Plus Human Interaction

Semantic technology or semantic content analysis can carry different freights of meaning. My understanding is that Mondeca has been a purist when it comes to observing standards, enforcing the rules for well-formed taxonomies, and assembling internally consistent and user friendly controlled term lists. If you are not familiar with the specifics of a rigorous approach to controlled terms and taxonomies, take a look at this screen shot of Mondeca’s subject matter expert interface. Be aware that I pulled this from my files, so the interface shipping today may differ from this approach. The principal features and functions will remain behind the digital drapery, however. My recollection is that this is the interface used by Wolters Kluwer for some of its legal content.

Interface

What is obvious to me is that Mondeca and a handful of other companies involved in semantic technology take an “old school” approach with no short cuts. Alas, some of the more jejune pundits in the controlled vocabulary and taxonomy game can sometimes be more relaxed. Without training in the fine art of thesauri, an observer will find it difficult to spot at a quick glance the logical problems and inconsistencies in a thesaurus or taxonomy. However, after the user runs some queries that deliver more chaff than wheat, the quick-and-dirty approach is like one of those sugar-free and fat-free cookies. There’s just not enough substance to satisfy the user’s information craving.
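Part of that rigor is machine-checkable. A well-formed taxonomy’s broader/narrower hierarchy must be acyclic; a term that is transitively broader than itself is exactly the kind of logical inconsistency a quick glance misses. A minimal sketch, independent of any vendor’s tooling:

```python
def find_cycle(broader):
    """Detect a cycle in a broader-term graph.

    broader maps each term to the list of its broader terms. Returns a
    path demonstrating the cycle, or None if the hierarchy is sound.
    Uses a standard white/gray/black depth-first search.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    state = {t: WHITE for t in broader}

    def visit(term, path):
        state[term] = GRAY
        for parent in broader.get(term, []):
            if state.get(parent, WHITE) == GRAY:
                return path + [term, parent]   # back edge: cycle found
            if state.get(parent, WHITE) == WHITE and parent in broader:
                found = visit(parent, path + [term])
                if found:
                    return found
        state[term] = BLACK
        return None

    for term in broader:
        if state[term] == WHITE:
            found = visit(term, [])
            if found:
                return found
    return None
```

Running a check like this over a controlled vocabulary before it ships is one cheap way to catch the quick-and-dirty work the paragraph above complains about.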

Read more
