Cirilab: Entity Extraction
April 6, 2009
I took a quick look at Cirilab in order to update my files about entity extraction vendors.
Cirilab develops practical search, retrieval and categorization software designed to increase organizational productivity by effectively harnessing key knowledge resources. Cirilab offers a range of advanced analysis and organization applications and tools.
I learned about the company when another consultant sent me links to several online demonstrations of Cirilab’s technology. I located an older but useful discussion of the Cirilab technology here. You can explore a Wikipedia entry about Winston Churchill here and a document navigator of Sir Winston’s writings here. The engine generating these demos is called KGE, short for Knowledge Generation Engine. The idea is that KGE can process unstructured text and generate insights into that text.
The company’s enterprise solutions include vertical builds of the KGE:
- Publishing. The Web Ready Publishing service allows an organization to take unstructured data in WordPerfect, Word, Adobe PDF, HTML, and even plain text files and publish it in a Web Ready Publishing format so that it is instantly available to its customers in a thematically navigable format.
- Pharma. Cirilab can “read” the documents and therefore allow “mining” of existing data.
- Legal. KGE permits discovery of information.
- Security and intelligence. Cirilab products provide unique insights into security and intelligence information that are not otherwise available.
The company offers a range of desktop products. These are excellent ways to learn about the features and functions of Cirilab’s KGE system.
More recently, Cirilab has succeeded in developing and bringing to market a core suite of technologies known as KOS (Knowledge Object Suite) based on its Multidimensional Semantic Spatial Indexing Technology.
You can register and receive a free, thematic map of your Web site. The company is located in Ottawa, Ontario. You can get more information here.
Stephen Arnold, April 6, 2009
Google Leximo Tie Up
April 2, 2009
Leximo is a social dictionary; specifically, “a Multilingual User Collaborated Dictionary that lets you search, discover and share your words with the World.” Google snapped up the company. You can read the Leximo manifesto here. One of the tenets is:
Open community-based and user-friendly functions promote participation, accountability and trust.
Why does Google need a dictionary? In my opinion, the GOOG wants a flow of new words plus definitions to fatten up its existing knowledge bases. I am confident the idealism of Leximo will persist at the GOOG.
Stephen Arnold, April 2, 2009
Coveo Lands a Whale-Sized Search Deal
March 31, 2009
Bell Mobility and Coveo, a leading provider of information access and search solutions for the enterprise, announced an exclusive next-generation search and access tool called Enterprise Search from Bell.
Powered by Coveo’s patented search and index technology, Enterprise Search from Bell offers business clients comprehensive search capability on their BlackBerry smartphones, including full mobile access to information contained within their Microsoft Exchange server accounts and even across their entire corporate information technology systems.
Wade Oosterman, President of Bell Mobility and Chief Brand Officer for Bell, said:
Enterprise Search from Bell is the only mobile business tool available that provides clients with such instant mobile access to critical enterprise content via their BlackBerry devices. Our partnership with Coveo is another example of Bell’s dedication to delivering data solutions that meet the evolving needs of mobile business. Enterprise Search from Bell provides business users with the ability to securely search and retrieve any information within their Microsoft Exchange server accounts, including emails, attachments content, calendars, tasks and contacts via their BlackBerry devices. With this capability, users can access the precise information they need within seconds, when they need it, without having to know where the document was previously stored.
Louis Tetu, Coveo’s Executive Chairman, said:
Search technology is by far one of the most promising information technology investments for enabling workforce productivity across the enterprise. Mobile search solutions provide employees with all the relevant information they need from any location. In contributing our expertise to Bell Mobility, we are helping to drive innovation in their mobile business solutions, which is part of our initiative to be a player in the ‘smartphones for business revolution’ market trend.
The service features Coveo’s user-friendly interface. The new enterprise search service from Bell offers numerous possibilities to business users who work with large amounts of rapidly changing information. Executives, sales professionals, account managers, professional services providers, customer care and call center representatives, IT administrators, project managers, and human resources, legal, and engineering professionals get instant access to the information they need to perform their roles more effectively. The result is improved productivity and higher levels of self-service across the workforce.
Bell Mobility’s partnership with Coveo is unique in the market as it combines the mass distribution reach of a market-leading national mobile carrier with an innovative enterprise search solution to offer customers in all areas of Canada the ability to drive better business performance across their workforce in a rapid and economical deployment model.
For further information on Enterprise Search from Bell click here. For more information about Coveo, click here. To read the interview with Laurent Simoneau, the search expert driving Coveo’s technology, click here.
A happy quack to the Coveo team from the goslings in Harrod’s Creek.
Stephen Arnold, March 31, 2009
Gets Better
March 30, 2009
I did a flyover of the Web site. What triggered the overflight was a Google patent; specifically, US20090070312, “Integrating External Related Phrase Information into a Phrase-Based Indexing Information Retrieval System”. The application was filed in September 2007, and the USPTO spit it out on March 12, 2009. I discussed a chain of Dr. Patterson’s inventions in my 2007 study Google Version 2.0 here. Although Dr. Patterson is no longer a full-time Googler, the tendrils of her research from Xift to Cuil pass through the GOOG. When I looked at today (March 29, 2009), I ran my suite of test queries. Most of them returned more useful and accurate results than they did during my first look at the system in July 2008 here.
Several points I noticed:
- The mismatching of images to hits has mostly been corrected. The use of my logo for another company, which was in the search engine optimization business, was annoying. No more. That part of the algorithm soup has been filtered.
- The gratuitous pornography did not pester me again. I ran my favorites, such as pr0n and similar code words. There were some slips, which some of my younger-at-heart readers will eagerly attempt to locate.
- The suggested queries feature has become more useful.
- My old chestnut “enterprise search” flopped. The hits pointed to sources that are not particularly useful in my experience. The Fast Forward conference is no more, but there’s a link to the now-absorbed user group. The link to the Enterprise Search Summit surprised me. The conference has been promoting like crazy despite the somewhat shocking turnout last year in San Jose, so flooding information into sites apparently fools the relevancy engine.
- The Explore by Category feature is now quite useful. One can argue whether it is better than the “improved” Endeca. I think’s automated and high-speed method may be more economical to operate. Dr. Patterson and her team deserve a happy quack.
I am delighted to see that the improvements in are coming along nicely. Is the system better than Google’s or Microsoft’s Web search system? Without more testing, I don’t think I can make a definitive statement. I am certain that there will be PhD candidates or ASIS members who will rise to fill this gap in my understanding.
I have, however, added the system to my list of services to ping when I am looking for information.
Stephen Arnold, March 30, 2009
Storage a Problem for Most Organizations
March 30, 2009
Most people don’t know much about Kroll, a unit of a diversified financial services firm. I was surprised, therefore, to see a public story about a survey conducted by this ultra-low-profile outfit. The article was “Storage Practices Don’t Match Policies” in IDM.Net, an Australian Web log here. The point of the write-up was that, in the Kroll survey, storage policies were not particularly well conceived. The most important comment in the write-up was:
The survey found that 40 percent of individuals stated that their company has a policy regarding where data should be stored. However, the survey results also revealed that 61 percent of respondents “usually” save to a local drive instead of a company network.
Makers of automated backup systems will rejoice. Attorneys suing an organization with lousy backup practices are probably dancing in the streets. Where there are informal collections of data, there is gold for the eDiscovery prospector.
If you want to know more about Kroll, click here and read the Search Wizards Speak interview with David Chaplin, one of the developers of Engenium, interesting software for extracting nuggets from these data gold mines.
Stephen Arnold, March 30, 2009
Google Interview Worth Reading
March 25, 2009
The interview with Alfred Spector in Computerworld is interesting for what it says and what it omits. You can find the article “The Grill: Google’s Alfred Spector on the Hot Seat” here. This is a three-part interview. Mr. Spector is billed as Google’s vice president of research. For me, the most interesting comment was:
Do you have plans to go after that huge body of information on the Internet that is not currently searched? There is stuff on the Web, the so-called Deep Web, that is only “materialized” when a particular query is given by filling fields in a form. Since crawlers only follow HTML links, they cannot get to that “hidden” content. We have developed technologies to enable the Google crawler to get content behind forms and therefore expose it to our users. In general, this kind of Deep Web tends to be tabular in nature. It covers a very broad set of topics. It’s a challenge, but we’ve made progress.
I would hope so. Google has, or had, Drs. Guha and Halevy chugging away on this problem. Furthermore, Google bought Transformic, a company to which most Google pundits have paid scant attention. Yep, Googzilla is making progress. Just plonking along with the fellow who worked on the semantic Web standards and the chap who invented the information manifold. I enjoy Google understatement.
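Mr. Spector does not say how the crawler gets behind forms. One approach discussed in the research literature, assumed here and not confirmed as Google's method, is to "surface" Deep Web content by generating the URLs a GET form would produce for combinations of plausible input values, then crawling those URLs like ordinary links. This minimal sketch enumerates such URLs; the form action URL, field names, values, and the function name surface_form_urls are all hypothetical.

```python
from itertools import product
from urllib.parse import urlencode


def surface_form_urls(action_url, field_values, max_urls=None):
    """Enumerate GET URLs for every combination of candidate form-field values.

    action_url   -- the form's action URL (hypothetical in the example below)
    field_values -- dict mapping a field name to a list of candidate values
    max_urls     -- optional cap, because combinations grow multiplicatively
    """
    names = sorted(field_values)  # fixed field order keeps output deterministic
    urls = []
    for combo in product(*(field_values[name] for name in names)):
        if max_urls is not None and len(urls) >= max_urls:
            break
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action_url}?{query}")
    return urls


if __name__ == "__main__":
    # Hypothetical search form with two drop-down fields.
    form = {"make": ["ford", "honda"], "state": ["KY", "OH"]}
    for url in surface_form_urls("https://example.com/search", form):
        print(url)
```

The cap matters in practice: a form with several fields and dozens of values per field yields far more URLs than a crawler should fetch, so a real system would need some way to pick informative value combinations rather than brute-forcing them.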
Stephen Arnold, March 25, 2009
Palantir: Data Analysis
March 24, 2009
In the last month, three people have asked me about Palantir Technologies. I have had several people mention the work environment and the high caliber of the team at the company. The company has about 170 employees and is privately held. I have heard that the firm is profitable, but I have that from two sources now hunting for work after their financial institutions went south. The company is one of the leaders in finance and intelligence analytics. The specialties of the company include global macro research and trading; quantitative trading; knowledge discovery and knowledge management.
If you are not familiar with the company, you may want to navigate to and take a look at the company’s offerings. Located in Palo Alto, the company focuses on making software that facilitates information analysis. With interest in business intelligence waxing and waning, Palantir has earned a solid reputation for sophisticated analytics. Law enforcement and intelligence agencies “snap in” Palantir’s software to perform analysis and generate visualizations of the data. The company has been influenced by Apple in terms of the value placed upon sophisticated design and presentation. Palantir’s interfaces make highly complex tasks somewhat easier. If you want to generate a visualization of a large, complex analytic method, Palantir can produce visually arresting graphics. If you navigate to the company’s “operation tradestop” page here, you can access demonstrations and white papers.
When I last checked the company’s demos, a number of them provided examples drawn from military and intelligence simulations. These examples provide a useful window into the sophistication of the Palantir technology. The company’s tools can manipulate data from any domain where large datasets and complex analyses must be run. The screenshot below comes from the firm’s demonstration of an entity extraction, text processing, and relationship analysis:
A Palantir relationship diagram. Each object is a link making it easy to drill down into the underlying data or documents.
Each object on the display is “live” so you can drill down or run other analyses about that object. The idea is to make data analysis interactive. Most of the vendors of high-end business intelligence systems offer some interactivity, but Palantir has gone further than most firms.
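Palantir's data model is not described in its demos, so the following is only a rough illustration, under my own assumptions, of the idea that each object on a relationship diagram is "live". The sketch builds a co-occurrence graph from documents whose entities have already been extracted, then lets a user drill down from a link to the documents behind it. Entity extraction itself is out of scope; the document IDs, entity names, and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_relationship_graph(docs):
    """docs maps a document ID to the set of entities extracted from it.

    Returns a dict mapping each unordered entity pair (a sorted tuple)
    to the sorted list of document IDs in which both entities appear.
    """
    edges = defaultdict(set)
    for doc_id, entities in docs.items():
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)].add(doc_id)
    return {pair: sorted(ids) for pair, ids in edges.items()}


def neighbors(graph, entity):
    """Entities linked to `entity`, with the number of supporting documents."""
    result = {}
    for (a, b), ids in graph.items():
        if entity == a:
            result[b] = len(ids)
        elif entity == b:
            result[a] = len(ids)
    return result


def drill_down(graph, entity_a, entity_b):
    """Document IDs behind the link between two entities (empty if no link)."""
    return graph.get(tuple(sorted((entity_a, entity_b))), [])


if __name__ == "__main__":
    # Hypothetical documents with entities already tagged.
    docs = {
        "doc1": {"Acme Corp", "J. Smith", "Ottawa"},
        "doc2": {"Acme Corp", "J. Smith"},
        "doc3": {"J. Smith", "Palo Alto"},
    }
    graph = build_relationship_graph(docs)
    print(neighbors(graph, "J. Smith"))
    print(drill_down(graph, "J. Smith", "Acme Corp"))
```

Clicking a node maps to neighbors(), and clicking an edge maps to drill_down(). Co-occurrence is the crudest way to infer a relationship; commercial systems typically add typed relations and metadata, which this sketch does not attempt.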
The company has a Web log, and it seems to be updated with reasonable frequency. The Web log does a good job of pointing out some of the features of the firm’s software. For example, I found this discussion of the Palantir monitoring server quite useful. The Web site emphasizes the visualization capabilities of the software. The Web log digs deeper into the innovations upon which the graphics rest.
Be careful when you run a Google query for Palantir. There are several firms with similar names. You will want to navigate to You may find yourself at another Palantir when you want the business intelligence firm.
Stephen Arnold, March 24, 2009
ISYS Search Software: Google Patent Collection
March 24, 2009
You will want to take a look at the ISYS Search Software demonstration here. The company took my collection of Google patent documents from 1998 to December 2008 and processed them. You can run a key word query, click on the names of people, and explore this window into Google’s technology hothouse via ISYS Search Version 9. When you locate a patent document that interests you, a single click will display the PDF of the patent document. You can browse the drawings and claims with the versatile ISYS system at your beck and call.
I have used ISYS Search Software since Version 3.0. The system delivers high-speed document processing, high-speed query processing, and a raft of features. For more information about ISYS Version 9, click here. I have been critical of search systems for more than two decades. ISYS Search Software engineers have listened to me, and I know from experience that the team in Crows Nest and in Denver has a long-term commitment to its customers and to implementing useful features with each release.
Highly recommended. More information about ISYS Search Software is at
Stephen Arnold, March 24, 2009
Financial Times: Try, Try, Try
March 20, 2009
Flashback: the year 2005. I was a paying subscriber. I got a user name and a password. I logged on, ran a query, and the system timed out. Flash forward to 2007. licenses Fast Search & Transfer. I tested the system. Slow. I was asked to test a semantic system under consideration by the Financial Times. Useful but slow, slow, slow. Now the Financial Times has tapped another point-and-click vendor for a “deep” search experience. Time out. The Financial Times, arguably one of the two biggest franchises in business information, has been a laggard in online search for quite a while. The FT’s parent owns a chunk of the Economist, another blue chip in business information. I was a subscriber to both the print and online editions until late 2007. Why did I drop these must-read news sources? Too much hassle. I hope the FT’s new system moves from the “deep” into the daylight. I hope the FT successfully monetizes its content. I hope that I will be able to play in the World Cup, but I am a realist and recognize that hope does not mean accomplishment. If you are cheerleading for a dead tree outfit that once owned a wax museum, read the Guardian’s “Financial Times Launches Business-Focused Deep Search Service” here by Kevin Anderson. The article included a useful description of what the FT hopes to do with indexing:
The service allows users to search easily by news topic, organisation, person, place or theme. If a user searches for stories about business in China, the search can quickly be refined to cities in China, showing stories about Beijing, Shanghai or Hubei. Greenleaf described this as a “know before you click” model so that users can see related topics and the number of stories available for each sub-topic. In addition to automatic tagging, Newssift editors have also added other relationships to the service relevant to their business audience so that if someone looks for news about Ford Motor Company, they can also see related content from Ford suppliers.
This type of metatagging is useful, but it is both computationally intensive and labor intensive. The FT has been trying for years to build an online service that makes up for the precipitous loss of revenue from its traditional dead tree business. The main difference with this latest try is the economy. Too late. I wish the FT team success, but I don’t think this most recent service will deliver the cash needed to get the ship squared away for even rougher seas ahead. Red ink ahead, in my opinion.
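The “know before you click” model in the Guardian quote amounts to faceted navigation with counts: after each refinement, the system shows how many stories sit behind each sub-topic. Here is a minimal sketch, assuming stories have already been tagged; the story data, facet names, and functions are hypothetical and say nothing about how Newssift is actually built.

```python
from collections import Counter

# Hypothetical stories, each tagged with facet values. In a service like the
# one described, tags would come from automatic tagging plus editorial work.
STORIES = [
    {"id": 1, "topic": ["business"], "place": ["China", "Beijing"]},
    {"id": 2, "topic": ["business"], "place": ["China", "Shanghai"]},
    {"id": 3, "topic": ["business"], "place": ["China", "Shanghai"]},
    {"id": 4, "topic": ["politics"], "place": ["China", "Hubei"]},
    {"id": 5, "topic": ["business"], "place": ["United States"]},
]


def refine(stories, facet, value):
    """Keep only the stories tagged with `value` for the given facet."""
    return [s for s in stories if value in s.get(facet, [])]


def facet_counts(stories, facet, exclude=()):
    """Count stories per facet value, skipping values already selected."""
    counts = Counter()
    for story in stories:
        counts.update(v for v in set(story.get(facet, [])) if v not in exclude)
    return counts


if __name__ == "__main__":
    china_business = refine(refine(STORIES, "topic", "business"), "place", "China")
    # Sub-topic counts a user would see before clicking a refinement link.
    print(facet_counts(china_business, "place", exclude={"China"}))
```

The counting step is cheap. The expensive part, as noted above, is producing and maintaining accurate tags and editorial relationships for every story.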
Stephen Arnold, March 20, 2009
Marc Krellenstein Interview: Inside Lucid Imagination
March 17, 2009
Open source search is gaining more and more attention. Marc Krellenstein, one of the founders of Lucid Imagination, a search and services firm, talked about the company’s technology with Stephen E. Arnold. Mr. Krellenstein was the innovator behind Northern Light’s search technology, and he served as the chief technical officer for Reed Elsevier, where he was responsible for search.
In an exclusive interview, Mr. Krellenstein said:
I started Lucid in August, 2007 together with three key Lucene/Solr core developers – Erik Hatcher, Grant Ingersoll and Yonik Seeley – and with the advice and support of Doug Cutting, the creator of Lucene, because I thought Lucene/Solr was the best search technology I’d seen. However, it lacked a real company that could provide the commercial-grade support and other services needed to realize its potential to be the most used search software (which is what you’d expect of software that is both the best core technology and free). I also wanted to continue to innovate in search, and believed it is easier and more productive to do so if you start with a high quality, open source engine and a large, active community of developers.
Mr. Krellenstein’s technical team gives the company solid open source DNA. With financial pressures increasing and many organizations expressing dissatisfaction with mainstream search solutions, Lucid Imagination may be poised to enjoy rapid growth.
Mr. Krellenstein added:
I think most search companies that fail do so because they don’t offer decisively better and affordable software than the competition and/or can’t provide high quality support and other services. We aim to provide both and believe we are already working with the best and most affordable software. Our revenue comes not only from services such as training but also from support contracts and from value-add software that makes deploying Lucene/Solr applications easier and makes the applications better.
You can read the full text of the interview on the Web site here. Search Wizards Speak is a collection of 36 candid interviews with movers and shakers in search, content processing, and business intelligence. Instead of reading what consultants say about a company’s technology, read what the people who developed the search and content processing systems say about their systems. Interviews may be reprinted and distributed without charge. Attribution and a back link to and the company whose executive is featured in the interview are required. Stephen E. Arnold provides these interviews as a service to those interested in information retrieval.
Stephen Arnold, March 17, 2009