LTU: Challenging the Thomson Reuters Trademark Fortress

May 16, 2008

LTU Technologies in France is putting its well-regarded image search technology to work in a proprietary trademark database. The LTU system compares a submitted digital image against the database to confirm an already extant trademark or industrial design. You can read about that here. Click quickly. Some news stories disappear without warning.

LTU’s image search has been among the most accurate available. Military and intelligence entities have been among LTU’s most eager customers. Now LTU is moving into a service where Thomson Reuters has a strong, if not dominant, position. You can read about Thomson’s Trademarkscan service here.

Thomson also operates Derwent, a patent information service, and the company has dozens of complementary information services for intellectual property.

What are LTU’s chances of running with the big dog? We think that LTU will have to move quickly and be prepared for some Thomson Reuters push back.

If LTU puts out a quality product and focuses its effort, LTU may be able to offer an alternative to Thomson Derwent customers looking for options. But speed and quality are important. Oh, LTU has to be prepared for Thomson Reuters-style competition.

Jessica Bratcher, May 16, 2008

Enterprise Search Vendors’ Taglines

May 16, 2008

A colleague in San Francisco asked me on May 14, 2008, “How do the search engine vendors position themselves?”

I told him that I would think about the question on the luxurious red-eye flight from SFO to Detroit. I did. I worked through the files on my trusty laptop and compiled a list of the taglines for some of the vendors whom I monitor. The list is not exhaustive, but I had data about a couple of dozen companies in the behind-the-firewall search business.

The table below provides a summary of the taglines. These are quite interesting, and I was surprised at the different approaches taken to explaining the companies’ systems. For example, I liked the taglines that echoed Caesar’s “I came, I saw, I conquered” (Veni, vidi, vici). SchemaLogic says, “Find. Use. Protect.” Thetus asserts, “Find. Assess. Fit. Understand.” Lexalytics crafts, “Discover. Understand. Act.”

Several of the companies use active or instrumental catchphrases. Brainware, a spin out from a German content management company, uses, “Intelligence unleashed.” I thought of a tiger pursuing me through the Louisville Zoo. And InQuira says, “Harvest knowledge.” Nstein, a company that has undergone accelerated evolution,

Less creative influences put a damper on marketing passion in these slogans. Panoptic (now Funnelback) gently offers, “Internet and Enterprise Search.” Almost matching the Australian company’s tagline is Fast Search & Transfer’s “The business of search.” Clearforest matches these in understatement with its “Text Analytics Solutions.” ZyLAB comes close too, saying, “Information Access Solutions.”

Other companies use the tagline as elevator speeches on a diet. For example, Endeca, flush with investments from Intel and SAP, states, “Innovative Software to Help People Explore, Analyze, and Understand Information.” Not to be outdone in the pitch department is ISYS Search Software’s “Enterprise Search Solutions for Real People Doing Business in the Real World.” (I like the “real” part of this statement because some of the taglines are a bit abstract.) Stratify (formerly Purple Yogi) strikes a Zen-like note: “Focus on the Matter of eDiscovery with Peace of Mind.” When I repeat this five times, my heart rate slows and my blood pressure drops.

Other vendors assert that their system is Numero Uno in the search-and-retrieval sector, in a nice way, of course. Open Text, a company with as many search technologies as Microsoft, declares itself “The Content Experts.” And Dieselpoint opines, “The Leader in Search & Navigation Technology.”

A small number of vendors drift into the poetic. Exegy uses repetition and alliteration to explain its super-fast appliance: “Extreme Speed. Extreme Insight.” Or, SurfRay (owner of Mondosoft and Speed of Mind) and its rhythmic “We Move People to Discover.” Note that SurfRay itself, a relative newcomer to search, describes itself this way, “Pioneers in Enterprise Search and Behavior Analytics.” Strong stuff and sure to catch the attention of Autonomy working overtime to catch up with the “Don’t be evil” Googlers.

Read more

Content Transformation: A Challenge that Won’t Go Away

May 15, 2008

We live in a world of Web 2.0 and Web 3.0 goodness. At the Where 2.0 conference in Burlingame, California on May 14, 2008, I overheard this snippet of conversation:

We had everything working, but when we imported content, the system crashed. I reinstalled. I checked the config files. It still crashed. I have to open each file, resave it as an RTF, and import them one at a time. Grrrr.

Sound familiar?

I have heard this complaint many times before. In our content-savvy, XML-ized era, moving a source file into a content processing system should be trivial. The content processing system can extract entities. It can metatag. Some can slice, dice, and cook a chicken. But unless the system can intake content and transform it to something that the content processing subsystem understands, the system is dead in the water. Even worse, the text processing system only processes some of the source documents. In certain mission critical applications, kicking out documents is a no-no. Not only is the manual manipulation expensive, it’s time consuming. In those minutes or hours of fiddling, potentially significant data are not available to the analysts. What does missing information cost? Well, it depends on your work situation. In the Wall Street world, investment information can turn a win into a loss in a millisecond. In certain military applications, the information may mean the difference between health and harm.

Transforming a square into a circle or a circle into a square looks easy. With a triangle and a compass you can create two objects. It’s the intermediate steps that become tricky for an artist or a budding mathematician.

What is file or data transformation? In its simplest form, you have a file in Microsoft Word 2007 format, and you want to “transform” or change the file into a format recognized by another system’s import filter. So, one approach would be to open the file in Word 2007, click on File Save As, select RTF (Rich Text Format), and save the file. You can then allow your search or content processing system to suck the file into the conversion subsystem and turn the RTF into whatever target output format the filter generates. In a more sophisticated form, you take an unstructured document or a database table, and you transform it into some file type that your system can process. A more interesting task is to convert a file into a file with a comparable structure; for instance, take an SGML instance and convert it to HTML. Some search system vendors include filters and transformation tools with their system. Others provide an application programming interface. The idea is that you will write a script to perform whatever conversion you require, handle entities in an appropriate manner, and preserve the information and metadata (if available) throughout the process.
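To make the script idea concrete, here is a minimal sketch of a structure-to-structure conversion in Python. It is an illustration under stated assumptions, not any vendor’s filter: the element names (doc, title, para) and the attributes are invented, and a small XML instance stands in for SGML because Python’s standard library parses XML, not full SGML. The sketch decodes entities on the way in, re-escapes them on the way out, and carries the source attributes forward as metadata.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical source instance; element and attribute names are invented.
SOURCE = """<doc author="J. Smith" created="2008-05-14">
  <title>Q1 Results &amp; Outlook</title>
  <para>Revenue rose 4% &lt; forecast.</para>
  <para>Costs were flat.</para>
</doc>"""


def transform(source: str) -> str:
    """Convert the hypothetical <doc> instance into a small HTML page."""
    root = ET.fromstring(source)  # the parser decodes entities such as &amp;
    lines = ["<html>", "<head>"]
    # Carry source attributes forward as <meta> tags so metadata survives.
    for name, value in sorted(root.attrib.items()):
        lines.append(
            f'<meta name="{html.escape(name)}" content="{html.escape(value)}">'
        )
    title = root.findtext("title", default="")
    lines.append(f"<title>{html.escape(title)}</title>")
    lines += ["</head>", "<body>"]
    for para in root.iter("para"):
        # Re-escape text so characters like & and < remain valid in the HTML.
        lines.append(f"<p>{html.escape(para.text or '')}</p>")
    lines += ["</body>", "</html>"]
    return "\n".join(lines)


print(transform(SOURCE))
```

Real source files are rarely this tidy. A production script would also need to decide what to do with documents that fail to parse, which is exactly where the kicked-out files in the complaint above come from.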

Let’s take a quick look at several transformation challenges and then step back to consider what steps you can follow to minimize these problems. Before jumping into the causes, keep in mind that as much as 30 percent of an information technology department’s budget is consumed by transformation costs. This astounding number surfaced in a presentation given by a Google engineer in 2007. If that number seems high, you can knock it down to a more acceptable 10 or 20 percent. The point is that fiddling with data when moving it from one system and format to another is a common task. Any transformation activity can go off the tracks. Read more

New Contract for Clarabridge

May 15, 2008

Clarabridge, a “customer experience management vendor,” recently scored a posh client in Gaylord Hotels, which wants to use text analysis to review customer satisfaction surveys. Keeping millionaires happy requires technology.

Under the contract, Clarabridge will install its content mining platform at Gaylord properties. The goal: to relate textual commentary to a satisfaction scale. Clarabridge’s product dumps data extracted from unstructured text into a star schema to make associated fact tables, just like progenitor-once-removed MicroStrategy, the business intelligence company that passed on its reporting, analysis, and monitoring solutions DNA.
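For readers who have not met a star schema, here is a minimal sketch in Python using the built-in sqlite3 module. It is not Clarabridge’s schema; the table names, column names, topics, and scores are all invented for illustration. The idea is that each extracted comment becomes a row in a central fact table that points at small dimension tables, so satisfaction can be rolled up by property and topic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical star schema: two dimension tables around one fact table.
conn.executescript("""
CREATE TABLE dim_property (property_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_topic (topic_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE fact_comment (
    property_id INTEGER REFERENCES dim_property(property_id),
    topic_id INTEGER REFERENCES dim_topic(topic_id),
    satisfaction INTEGER  -- invented 1-to-5 scale
);
""")

# Invented sample data standing in for text-mining output.
conn.executemany("INSERT INTO dim_property VALUES (?, ?)",
                 [(1, "Opryland"), (2, "Palms")])
conn.executemany("INSERT INTO dim_topic VALUES (?, ?)",
                 [(1, "check-in"), (2, "dining")])
conn.executemany("INSERT INTO fact_comment VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 2, 5), (2, 1, 4), (2, 1, 2), (2, 2, 5)])

# Roll the facts up by property and topic.
rows = conn.execute("""
SELECT p.name, t.label, AVG(f.satisfaction), COUNT(*)
FROM fact_comment f
JOIN dim_property p ON p.property_id = f.property_id
JOIN dim_topic t ON t.topic_id = f.topic_id
GROUP BY p.name, t.label
ORDER BY p.name, t.label
""").fetchall()

for name, label, avg_score, count in rows:
    print(f"{name:10} {label:10} avg={avg_score:.1f} n={count}")
```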

Clarabridge has a client list that includes big names such as Marriott, The Gap, and H&R Block, making it quite unlikely that it will suffer a stock crash like MicroStrategy did ($333 to $1 – ouch!) in 2001. Some pundits assert that Clarabridge is a company that will challenge Attensity (www.attensity.com), a low-profile, fast-growing text analytics company headed by David Bean.

Gaylord, owner and operator of four vast and lavish resort hotel properties, receives tens of thousands of guest commentaries through a Web-based survey covering its Opryland (Nashville, Tenn.), Palms (Orlando, Fla.), Texan (Dallas/Fort Worth), and National (Washington, D.C./Maryland) properties. While polled information is fairly straightforward, the information gained in the “other comments” box at the end of a survey is expensive and difficult for humans to quantify and make useful. Clarabridge’s platform will change all that.

At Clarabridge’s web site, you can download their white papers, case studies, industry resources and more.

Jessica Bratcher, May 15, 2008

The Library of Congress and Semantic Search

May 14, 2008

The buzz about semantic search is rising. Powerset’s demonstration using Wikipedia data has triggered interest in searching in more intuitive ways. I received a news item about Semantra (http://www.semantra.com), another player in this search market segment.

The Library of Congress is in the game too.

There’s an interesting news item “Semantic Search the Library of Congress”. To see how the US government approaches “beyond search”, navigate to http://lcsh.info/sh95000541. Once you have this URL in your browser’s address bar, you can open a new window and use this URL to get a list of LCCNs to search semantically: http://lcsh.info/.

The search result is a list of Use For terms, Narrower Terms (each of which is a hot link to more terms), the LC Classification, the date the entry was created, the date the entry was modified, and a link to the Concept URI.

You will want to navigate to ProgrammableWeb.com (http://www.programmableweb.com/api/library-of-congress-subject-headings) and check out their explanation.

Based on this demonstration, today’s semantic search engines are not likely to be challenged in a meaningful way by a US government initiative any time soon.

Stephen Arnold, May 14, 2008

Collective Intelligence Anthology Available

May 14, 2008

The ArnoldIT.com mascot admires the new collection of essays edited by Mark Tovey. Collective Intelligence: Creating a Prosperous World at Peace, published by the Earth Intelligence Network in Oakton, Virginia (ISBN-13: 978-0-9715661-6-3), contains more than 50 essays by analysts, consultants, and intelligence practitioners. You can obtain a copy from the publisher, Amazon, or your bookseller.

The ArnoldIT mascot completed reading the 600-page book with remarkable alacrity for a duck.

The collection of essays is likely to find many readers among those interested in the social phenomena of networks. Many of the essays, including the one I contributed, talk about information retrieval in our increasingly interconnected world.

This essay will provide a synopsis of my contribution, “Search–Panacea or Play. Can Collective Intelligence Improve Findability”, which I wrote shortly before completing Beyond Search: What to Do When Your Search System Doesn’t Work. My essay begins on page 375.

Social Search

The dominance of Google forces other vendors to look for a way over, under, around, or through its grip on Web search. The vendor landscape now offers search and content processing systems that arguably do a better job of manipulating XML (Extensible Markup Language) content, figuring out who knows whom (the social graph initiative), and determining the “real” meaning of content (semantic search). There are more than 100 vendors that have technology that offers, if one believes the marketing collateral and conference presentations, a way to squeeze more information from information.

Social search is the name given to an information retrieval system that incorporates one or more of these functions:

  1. Users can suggest useful sites. Examples: Delicious.com and StumbleUpon.com
  2. The system discovers relationships between and among processed documents and links. Examples: Powerset.com and Kartoo Visu
  3. The system analyzes information, extracts entities, and identifies individuals and their relationships. Examples: i2 Ltd (now part of ChoicePoint) and Cluuz.com
  4. The system monitors user behavior and uses the data to guide relevance, spidering, and other system functions. Examples: public Web indexing companies

There are other types of social functions, but these provide sufficient salt and pepper for this information side dish. The reason I say side dish is that social functions are not going to displace the traditional functions on which they are based. Social search has been in the mainstream from the moment i2 Ltd. introduced its workbench product to the intelligence community more than a decade ago. “Social” functions, then, are a recent add-on to the main diet in information retrieval.

Old Statistics and Cheap, Powerful Computers

What’s overlooked in the rush to find a Google “killer” is that the new companies are using some well-known technologies. For example, the inner workings of Autonomy’s “black box” are somewhat dependent on the work of a slightly unusual Englishman, Thomas Bayes. Mr. Bayes left the world a couple of centuries ago, but his math has been a staple in college statistics courses for many years. To deploy Bayesian techniques on a large scale is, therefore, not exactly a secret to the thousands of mathematicians who followed his proofs in pursuit of their baccalaureate.
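For those whose statistics course is a distant memory, here is the textbook form of Bayes’ rule as a minimal Python sketch. It says nothing about how Autonomy’s system actually works; the function name and the probabilities are invented for illustration. The question it answers: given that a document contains a particular term, how likely is the document to be relevant?

```python
def posterior(prior: float, p_term_given_rel: float, p_term_given_not: float) -> float:
    """Bayes' rule: P(relevant | term appears in the document)."""
    # Total probability of seeing the term in any document.
    evidence = p_term_given_rel * prior + p_term_given_not * (1.0 - prior)
    return p_term_given_rel * prior / evidence


# Invented numbers: 10% of documents are relevant; the term shows up in
# 80% of relevant documents and in 20% of the rest.
p = posterior(prior=0.10, p_term_given_rel=0.80, p_term_given_not=0.20)
print(f"P(relevant | term) = {p:.3f}")
```

With these made-up inputs, seeing the term lifts the estimate from 10 percent to roughly 31 percent. The arithmetic is simple; the hard part is running it over millions of documents and terms, which is where cheap, powerful computers come in.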

Read more

Groping the Enterprise Search Elephant

May 12, 2008

In the 2000 to 2003 period, ArnoldIT.com delivered a number of tutorials about search. Some of these presentations were held in conjunction with conferences such as the Boston Search Engine Meeting, Gilbane’s conferences, and the Information Today line up of professional programs. Others were delivered to small groups at various financial institutions, search vendors, and government entities.

This is the search elephant. In a meeting, you will hear many people talk about search. Each person will have a specific meaning and assume that the others in the room will know exactly what’s meant when she uses the word search. If you take all these individual meanings of search and put them together, you have a better idea of what a search system is supposed to deliver.

In each case, I had to take more time than budgeted to define the different types of search encountered in enterprise behind-the-firewall deployments. This issue surfaced this weekend when I spoke with a colleague grousing about the different perceptions of search in a consulting firm in Europe.

The purpose of this essay is to provide an abbreviated and hopefully useful look at the different meanings of search. You can learn more about this subject in Enterprise Search Report and the brand-new Beyond Search study that came out in April 2008. I wrote the first three editions of ESR and played a minor part in the current edition, and you will get some color on this topic in those for-fee analyses.

Everybody Knows about Search

The definition issue is skipped over because most people today believe they know about search. At dinner last night, people said, “I did a search for a cruise to Brazil”, “I looked up my health care benefits and found they were reduced”, “I’m not sure it’s worth seeing”, and “My boss had me find a proposal he thought he had lost when his laptop was stolen”. None of these people were information retrieval professionals or computer scientists. But each of them talked about search as if it were a routine activity like finding a parking space.

The need for a definition goes up when people assume others mean the same thing by search. Let’s look at the meanings for search in an enterprise.

Enterprise Search or Behind-the-Firewall Search

This is the buzzword of the moment. Companies know intuitively that if a worker can’t find information on the company’s own internal network, the worker is going to waste time looking for what’s needed. Even worse, the employee can’t find the accurate information and makes a boneheaded decision.

Enterprise search is a contradiction. No boss in the world wants “everything” indexed and searchable. Problems come from indexing “everything”. A few of the bombs in the enterprise search mine field are:

  • Email on topics that are or can be problematic
  • Information about company secrets like Coca-Cola’s formula for the fizzy drink
  • Information about legal matters
  • Information an employee puts on a company server about non-company activities
  • Personal, salary, and medical information
  • Pricing information
  • Stolen software, information from a third-party provider without paying a license fee or obtaining a copyright permission, information about a competitor that was obtained via an email from a friend

Search works best when the domain of information to index is narrowly defined, reviewed, and subject to a formal approval and review policy. Ad hoc indexing of behind-the-firewall information can trigger big trouble fast.

Read more

Intelligenx Discloses Referrals Fuel Rapid Growth

May 12, 2008

In an exclusive interview, Iqbal and Zubair Talib, senior managers of Intelligenx, reveal that referrals have fueled the company’s rapid growth. Intelligenx has a leadership position in directory and “yellow page” search in South Africa, South America, and elsewhere. The company’s profile, despite its US headquarters in suburban Washington, DC, is modest.

The father-son team said:

It seems that our international clients are actively talking about our technology at international conferences. We can always do a better job of marketing, but we put our customers first. Sales occur because people come to us and say, “We want to license your system”… we maintained certain relationships among an elite group of scientists and engineers. We never signed up to give marketing talks at the marketing-oriented venues. Our success comes because certain people understand our technology and recognize that it delivers scale, speed, performance, data management today. Our technology is our marketing.

Unlike search and content processing firms who issue news releases when a Web site signs on to use a well-known search engine or when a vendor announces for the second or third time a reseller deal, Intelligenx keeps innovating and selling.

The company’s system offers almost all of the features associated with the best-known vendors in the search market sector. The Talibs said:

Intelligenx was first to market with technology that offered a true full-text search with what many people call faceted or assisted search results. To achieve this functionality, performance under heavy loads is the prevailing challenge and simply put, our Discovery Engine® solves the problem in what we think is a most elegant fashion. “Facets” or “guided navigation” are not just a “checkbox” on a feature matrix but an underlying central philosophy in our technology, the company, and in the development of our system.
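For readers new to the term, faceted or guided navigation means that a result list arrives with counts for each attribute value, so the user can narrow the results with a click. The Python sketch below shows only that basic idea. It is not the Discovery Engine and makes no attempt at performance under load; the listings, field names, and function names are invented for illustration.

```python
from collections import Counter

# Invented directory listings in the spirit of a "yellow page" index.
LISTINGS = [
    {"name": "Blue Plate Diner", "category": "restaurant", "city": "Pretoria"},
    {"name": "Blue Sky Plumbing", "category": "plumber", "city": "Pretoria"},
    {"name": "Blue Note Cafe", "category": "restaurant", "city": "Lima"},
    {"name": "Red Door Plumbing", "category": "plumber", "city": "Lima"},
]


def search(listings, query, filters=None):
    """Naive substring match on the name, then apply any facet filters."""
    q = query.lower()
    hits = [r for r in listings if q in r["name"].lower()]
    for field, value in (filters or {}).items():
        hits = [r for r in hits if r.get(field) == value]
    return hits


def facet_counts(hits, fields):
    """Count how many hits carry each value of each facet field."""
    return {f: Counter(r[f] for r in hits if f in r) for f in fields}


hits = search(LISTINGS, "blue")
print(facet_counts(hits, ["category", "city"]))

# Clicking the "Pretoria" facet narrows the result set and its counts.
narrowed = search(LISTINGS, "blue", filters={"city": "Pretoria"})
print(facet_counts(narrowed, ["category"]))
```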

You can read about the company’s new stream processing of information, what the Talibs call “cluster flow”. In addition to near real time index updating, additional metadata are generated without adding latency to the system. Another interesting feature of the Intelligenx system is that a licensee can provide its sales people with a real time view of what advertisements are germane to a popular query. The sales person is able to show a prospective advertiser a live report of traffic and the payoff from an advertisement in a specific context.

The company’s technology offers an alternative to the better-known MarkLogic system and the specialist firm, Dieselpoint.

You can read the entire interview on the ArnoldIT.com Web site. The full text of the interview is part of the Search Wizards Speak feature. The exclusive interview is the 13th in this series of first-person accounts of the origin and functionality of important search and content processing systems. Click here to read the interview.

Powerset Available

May 12, 2008

Navigate to Powerset.com and try out the much-publicized Web search system. Using proprietary technology plus third-party components, Powerset is a semantic search system. The system differentiates itself with fact extraction (Factz, in Powerset jargon), direct links to definitions, and a summary / outline view. A big yellow sticky note says that Powerset is searching Wikipedia articles, but my test queries returned useful information in the results list in default mode; for example, the name of Tropes Zoom, a system I had heard about but never seen. A quick Google search allowed me to pinpoint Semantic Knowledge as a company with a technology of this name. I’m not sure Powerset envisioned my use of its system as a front end for Google, but that use jumped out at me. Check it out and let me know if you think it is better than Google, Hakia, or Exalead. These are systems that contain a dollop of semantic sauce. Hopefully the company will provide a larger content index either by spidering the Web or via a metasearch like Vivisimo’s.

Stephen Arnold, May 12, 2008

Kartoo’s Visu: Semantic Search Plus Themescape Visualization

May 11, 2008

In England in December 2007, I saw a brief demonstration of Kartoo.com’s “thematic map”, which was announced in 2005.

The company grew out of relationships with large publishing groups dating back to 1997, when Mr. Baleydier was working to make CD-ROMs easily searchable. Kartoo was founded in 2001 by Laurent and Nicholas Baleydier to provide a more advanced search interface. You can find out more about the company at Kartoo.net. Kartoo S.A. offers a no-charge metasearch Web system at Kartoo.com.

The original Kartoo service was one of the first to use dynamic graphics for Web search. Over the last few years, the interface became more refined. But the system presented links in the form of dynamic maps. Important Web sites were spherical, and the spheres were connected by lines. Here’s an example of the basic Kartoo interface as it looked on May 11, 2008, for the query “semantic search” run against the default of English Web sites. (The company also offers Ujiko.com, which is worth a quick look. The interface is a bit too abstract for me. You can try it here.)

The dark blue “ink blots” connect related Web sites. The terms provide an indication of the type of relationship between or among Web sites. You can click on this interface and explore the result set and perform other functions. Exploring the interface is the best way to learn its features. Describing the mouse actions is not as effective as playing with the system.

Another company–Datops SA–was among the first to use interesting graphic representations of results. I recall someone telling me that the spheres that once characterized Groxis.com’s results had been influenced by a French wizard. Whether justified or not, when I saw spheres and ink blots, I said to myself, “Ah, another vendor influenced by French interface design”. In talking with people who use visualizations to help their users understand a “results space”, I’ve had mixed feedback. Some people love impressionistic representations of results; others don’t. Decades ago I played a small role in the design of the F-15 interface or heads-up display. The one lesson I learned from that work was that under pressure, interfaces that offer too many options can paralyze reaction time. In combat, that means the pilot could be killed trying to figure out what a graphic means. In other situations where a computational chemist is trying to make sense of 100,000 possible structures, a fine-grained visualization of the results may be appropriate.

Read more
