Connotate’s Agent Approach Explained

March 25, 2009

The Connotate Web log here offers up a transcript of Bruce Molloy’s explanation of the firm’s software agent approach. The venue was a podcast interview with an interlocutor named Mike Lippis of the Outlook Series. You can find the transcript here. The information is useful, but the best way to read it is by scrolling to the end of the post where Part One is located and then reading upwards to Part Five. For me, the most interesting comment in the transcript was:

I’ll give you 2 or 3 ways that is realized through the simple design of our software. One is Agent creation. If you have someone who’s working in business intelligence or research or an analyst or someone who wants to do price comparisons and that person wants to monitor certain, say, prices or developments or products from a competitors’ site they can very quickly and easily paint the screen, if you will, create an Agent and have that agent then available to monitor over time, every day, or every hour, every minute kind of, what’s going on. Secondly, in terms of the Library and this is, there’s a real multiplier effect here in terms of the Library. As you get people starting to share the Agents, those Agents come to represent really best practices, best ways to get information delivered, to look at it, to compare it, to mash it and as such it’s a repository of expertise that is then shared and multiplied in the organization. And lastly, in terms of output just because it’s so easy to get this output because it’s so well personalized it becomes a solution that individuals, non-technical folks in an organization can use without having to go to IT and get into a long development cycle, if it’s even possible.

A social spin on creating and sharing intelligent software. Interesting idea in my opinion.

Stephen Arnold, March 23, 2009

Google Interview Worth Reading

March 25, 2009

The interview with Alfred Spector in ComputerWorld is interesting for what it says and what it omits. You can find the article “The Grill: Google’s Alfred Spector on the Hot Seat” here. This is a three part interview. Mr. Spector is billed as Google’s vice president of research. For me, the most interesting comment was:

Do you have plans to go after that huge body of information on the Internet that is not currently searched? There is stuff on the Web, the so-called Deep Web, that is only “materialized” when a particular query is given by filling fields in a form. Since crawlers only follow HTML links, they cannot get to that “hidden” content. We have developed technologies to enable the Google crawler to get content behind forms and therefore expose it to our users. In general, this kind of Deep Web tends to be tabular in nature. It covers a very broad set of topics. It’s a challenge, but we’ve made progress.

I would hope so. Google has Drs. Guha and Halevy chugging away or had them chugging away on this problem. Furthermore, Google bought Transformic, a company that most of the Google pundits have paid scant attention to. Yep, Googzilla is making progress. Just plonking along with the fellow who worked on the semantic Web standards and the chap who invented the information manifold. I enjoy Google understatement.
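
To make the form-filling idea concrete, here is a minimal sketch of how a crawler might enumerate query URLs for a form whose results exist only when a query is submitted. It illustrates the general technique only and is not Google's method; the form address, the field names, and the candidate values are all hypothetical.

```python
from itertools import product
from urllib.parse import urlencode


def form_query_urls(action_url, field_values):
    """Build one GET URL for every combination of candidate field values.

    action_url   -- the form's submission address (hypothetical here)
    field_values -- dict mapping a form field name to its candidate values
    """
    names = sorted(field_values)  # fixed order keeps the output reproducible
    urls = []
    for combo in product(*(field_values[name] for name in names)):
        query = urlencode(list(zip(names, combo)))
        urls.append(f"{action_url}?{query}")
    return urls


# Hypothetical used-car search form with two drop-down fields.
urls = form_query_urls(
    "http://example.com/search",
    {"make": ["ford", "honda"], "year": ["2008", "2009"]},
)
```

Each generated URL can then be fetched and indexed like any ordinary page. The hard part, which this sketch skips, is choosing candidate values that surface useful tabular content without flooding the site with requests.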

Stephen Arnold, March 24, 2009

ATT on Social Networking Impacts

March 25, 2009

AT&T teamed with an azure chip consultancy, Early Strategies Consulting. The white paper is eight pages of information about “The Business Impacts of Social Networking”. You can find a copy of this document by clicking the link here. The newly and partially reassembled Ma Bell has a tendency to move content around. If this link doesn’t work, just buzz AT&T customer service. The operators will be delighted to help you.

What’s the point of the white paper? According to the executive summary:

Social networking fosters collective intelligence, collaborative work and support communities. Tools and behaviors from the consumer world are now making the transition to the corporate world, with diverse implications for changing the way businesses operate. This paper explores 10 opportunities presented by social networking, along with 10 associated challenges.

My hunch is that the paper is designed to generate revenue.

For me, the most interesting part was the diagram of the organization chart of the future. The idea is that the traditional top down organization will have social networks embedded within it.


A close second was the vocabulary of the document. I enjoyed this blend of Ma Bell and MBA speak. Give it a read then send the document around and up the organization chart of the future.

Stephen Arnold, March 25, 2009

ISYS Search Software: Google Patent Collection

March 24, 2009

You will want to take a look at the ISYS Search Software demonstration here. The company took my collection of Google patent documents from 1998 to December 2008 and processed them. You can run a key word query, click on the names of people, and explore this window into Google’s technology hot house via the ISYS Search Version 9. When you locate a patent document that interests you, a single click will display the PDF of the patent document. You can browse the drawings and claims with the versatile ISYS system at your beck and call.

I have used the ISYS Search Software since Version 3.0. The system delivers high speed document processing, high speed query processing, and a raft of features. For more information about ISYS Version 9, click here. I have been critical of search systems for more than two decades. ISYS Search Software’s engineers have listened to me, and I know from experience that the team in Crow’s Nest and in Denver has a long term commitment to its customers and to implementing useful features with each release.

Highly recommended. More information about ISYS Search Software is at

Stephen Arnold, March 24, 2009

Newssift Technical Plumbing

March 23, 2009

Thanks to the readers who sent me information about the new “test version” of the Financial Times’s news service. I hope it revives the Financial Times as an online financial news source. I know that the FT has a solid brand and great potential.

Some of the information about vendors pointed back to TechCrunch; other readers just made statements which I will pass along for additional comment / correction. Here’s the line up:

  • Endeca–the guided navigation company
  • Nstein–content management (started life as a content processing company but changed and now reports record revenues)
  • Lexalytics–the new entity formed with the merger / fusion of Lexalytics (sentiment analysis) and Infonic (information management)
  • ReelTwo–search, data analysis, and “custom portals”.

My take on this use of multiple technologies:

First, the Financial Times’s beta makes clear that no single search and content processing system can meet the needs of a client like the Financial Times.

Second, the Financial Times implemented a try try try strategy before taking a clean sheet of paper and figuring out how to make its content more accessible to its target user group. I don’t think I can estimate the cost of the present system because it makes clear that earlier efforts at search failed. Those “sunk” and “opportunity” costs are wiped away, but a full accounting of the total cost of making FT information available to its users is more than today’s chief financial officer wants to put on his / her books for an ROI calculation. The same multi year investment in search plagues another European publishing company as well. The problem is not unique to the FT, and that’s important. The shift from traditional publishing business models and methods to Internet models is neither easy nor obvious.


The main splash page with a welcome screen that obscures the news I wanted to view.

Third, who is the intended user? The site offers a number of powerful functions. The folks who want these types of online operations may already have them available without charge from such places as,, or (hold your breath) America Online here. As a side note, the AOL service (linked to via Google Finance) runs on the potent Relegence platform here.

The America Online splash page for business and financial information. Note: the information is not obscured by a pop up greeting. Remember. This is the deeply challenged America Online and it is handling business information with its own technology, not a collection of four discrete systems.

Fourth, I think the FT is late to the party. Maybe too late? The company has knuckled down to create Newssift, but the window of time for making big traffic gains has closed. Financial information and analytic tools are available to investors with online brokers such as Fidelity and TDWaterhouse. Business news is available from high traffic outfits like Yahoo News and lower profile services such as Newsflashr.

Fifth, one wonders if the price tag for integrating the various technologies has been tallied. What happens when one of the three or four vendors makes a change? I don’t have sufficient data to estimate these costs. Perhaps the costs are trivial? Somehow I doubt it.

In short, the FT is trying again. Like other companies shifting from dead tree business models to the crunchier online variety of business model, the timing is not optimal. I wish the FT and its vendors good luck and fair weather. My weather charts predict stormy seas ahead followed by a flood of red ink rushing from different points on the compass.

Stephen Arnold, March 23, 2009

Evri: Semantic Smack Down

March 21, 2009

I don’t know much about Evri. Semantic technologies intrigued me a few years ago, but the shift is toward real time content processing. Semantics are important but, in my mind, they are plumbing that operates as a contributory component. I did write about the company’s deal with the Washington Post here. The Washington Post needs every (no pun intended) advantage it can get. Ad revenues are down. The Treasury is printing money like one of those fake countries in South American pot boilers. Even upscale restaurants’ business is down in the ultimate Power Lunch town.

I was surprised, maybe shocked, that Evri was shedding staff. Venture Beat here published “Semantic Search Engine Evri Cuts Staff by 25 Percent.” My impression was that Evri was going like a Harrod’s Creek mine worker on his way to the local watering hole. The most interesting comment in the article was:

Even so, Roseman [the president] says the company is pleased with its traction and progress, drawing more than 20 million monthly users.

Google AdSense on 20 million uniques should generate big money for Evri if properly monetized courtesy of Mother Google. Plus, Evri has received about $8 million from a Seattle investor. With strong uptake and big traffic, I wonder if staff cutbacks are a sign of the times or a signal that semantic search may be suffering in a down market for publishers such as those Evri has nailed as customers.

Stephen Arnold, March 21, 2009

Financial Times: Try, Try, Try

March 20, 2009

Flashback. Year 2005. I was a paying subscriber. I got a user name and a password. I logged on. Ran a query and the system timed out. Flash forward to 2007. The Financial Times licenses Fast Search & Transfer. I tested the system. Slow. I was asked to test a semantic system under consideration by the Financial Times. Useful but slow, slow, slow. Now the Financial Times has tapped another point and click vendor for a “deep” search experience. Time out. The Financial Times, arguably one of the two bigger franchises in business information, has been a laggard in online search for quite a while. The FT’s parent owns a chunk of the Economist, another blue chip in business information. I was a subscriber to * both * the print and online editions until late 2007. Why did I drop these must read news sources? Too much hassle. I hope the FT’s new system moves from the “deep” to the daylight. I hope the FT successfully monetizes its content. I hope that I will be able to play in the World Cup, but I am a realist and recognize that hope does not mean accomplishment. If you are cheerleading for a dead tree outfit that once owned a wax museum, read the Guardian’s “Financial Times Launches Business-Focused Deep Search Service” here by Kevin Anderson. The article included a useful description of what the FT hopes to do with indexing:

The service allows users to search easily by news topic, organisation, person, place or theme. If a user searches for stories about business in China, the search can quickly be refined to cities in China, showing stories about Beijing, Shanghai or Hubei. Greenleaf described this as a “know before you click” model so that users can see related topics and the number of stories available for each sub-topic. In addition to automatic tagging, Newssift editors have also added other relationships to the service relevant to their business audience so that if someone looks for news about Ford Motor Company, they can also see related content from Ford suppliers.
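
The “know before you click” refinement described in the quotation is essentially a facet count: tally the values of one tag among the stories that match the current filter, then show each value with its story count before the user clicks. A minimal sketch follows. It is an illustration of the general idea, not Newssift’s implementation, and the story records, tag names, and function name are hypothetical.

```python
from collections import Counter


def facet_counts(stories, facet, **filters):
    """Count the values of one facet among stories that match every filter."""
    counts = Counter()
    for story in stories:
        tags = story["tags"]
        if all(value in tags.get(name, []) for name, value in filters.items()):
            counts.update(tags.get(facet, []))
    return counts


# Hypothetical tagged stories; a real service would tag them automatically.
stories = [
    {"title": "Factory output rises", "tags": {"country": ["China"], "city": ["Shanghai"]}},
    {"title": "Bank opens branch", "tags": {"country": ["China"], "city": ["Beijing"]}},
    {"title": "Port traffic grows", "tags": {"country": ["China"], "city": ["Shanghai"]}},
    {"title": "Carmaker cuts shifts", "tags": {"country": ["USA"], "city": ["Detroit"]}},
]

# Refine "business in China" by city and show the count next to each city.
city_counts = facet_counts(stories, "city", country="China")
```

The counting itself is cheap. The expense sits upstream, in producing accurate tags for every story.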

This type of metatagging is useful, but it is computationally and human intensive. But the main difference between this most recent try and the earlier ones in the FT’s quest to develop an online service that makes up for the precipitous loss of revenue from its traditional dead tree business is the economy. Too late. I wish the FT team success, but I don’t think this most recent service will deliver the cash needed to get the ship squared away for even rougher seas ahead. Red ink ahead in my opinion.

Stephen Arnold, March 20, 2009

Marc Krellenstein Interview: Inside Lucid Imagination

March 17, 2009

Open source search is gaining more and more attention. Marc Krellenstein, one of the founders of Lucid Imagination, a search and services firm, talked about the company’s technology with Stephen E. Arnold. Mr. Krellenstein was the innovator behind Northern Light’s search technology, and he served as the chief technical officer for Reed Elsevier, where he was responsible for search.

In an exclusive interview, Mr. Krellenstein said:

I started Lucid in August, 2007 together with three key Lucene/Solr core developers – Erik Hatcher, Grant Ingersoll and Yonik Seeley – and with the advice and support of Doug Cutting, the creator of Lucene, because I thought Lucene/Solr was the best search technology I’d seen. However, it lacked a real company that could provide the commercial-grade support and other services needed to realize its potential to be the most used search software (which is what you’d expect of software that is both the best core technology and free). I also wanted to continue to innovate in search, and believed it is easier and more productive to do so if you start with a high quality, open source engine and a large, active community of developers.

Mr. Krellenstein’s technical team gives the company solid open source DNA. With financial pressures increasing and many organizations expressing dissatisfaction with mainstream search solutions, Lucid Imagination may be poised to enjoy rapid growth.

Mr. Krellenstein added:

I think most search companies that fail do so because they don’t offer decisively better and affordable software than the competition and/or can’t provide high quality support and other services. We aim to provide both and believe we are already working with the best and most affordable software. Our revenue comes not only from services such as training but also from support contracts and from value-add software that makes deploying Lucene/Solr applications easier and makes the applications better.

You can read the full text of the interview on the Web site here. Search Wizards Speak is a collection of 36 candid interviews with movers and shakers in search, content processing, and business intelligence. Instead of reading what consultants say about a company’s technology, read what the people who developed the search and content processing systems say about their systems. Interviews may be reprinted and distributed without charge. Attribution and a back link to and the company whose executive is featured in the interview are required. Stephen E. Arnold provides these interviews as a service to those interested in information retrieval.

Stephen Arnold, March 17, 2009

Voice Web Sites: New Frontier for Search

March 16, 2009

The Economic Times (India) reported that IBM has developed a technology for voice only Web sites. The story “IBM Develops a Technology That Will Allow Users to Talk to Web” here reported:

“People will talk to the web and the web will respond. The research technology is analogous to the Internet. Unlike personal computers it will work on mobile phones where people can simply create their voice sites,” IBM India Research Laboratory Associate Director Manish Gupta said.

The notion of a spoken Web is interesting. The question I have is, “What technology will one use to search these sites?” I find that as I age, certain frequencies become difficult for me to hear and certain speech patterns become unparseable for me. Has IBM a breakthrough technology to address the challenges of searching voice only Web sites?

Stephen Arnold, March 15, 2009

EveryZing: Exclusive Interview with Tom Wilde, CEO

March 16, 2009

Tom Wilde, CEO of EveryZing, will be one of the speakers at the April 2009 Boston Search Engine Meeting. To meet innovators like Mr. Wilde, click here and reserve your space. Unlike “boat show” conferences that thrive on walk in gawkers, the Boston Search Engine Meeting is content muscle. Click here to reserve your spot.

EveryZing here is a “universal search and video SEO (vSEO)” firm, and it recently launched MediaCloud, the Internet’s first cloud-based computing service for generating and managing metadata. Considered the “currency” of multimedia content, metadata includes the speech transcripts, time-stamped tags, categories/topics, named entities, geo-location and tagged thumbnails that comprise the backbone of the interactive web.

With MediaCloud, companies across the Web can post live or archived feeds of video, audio, image and text content to the cloud-based service and receive back a rich set of metadata.  Prior to MediaCloud and the other solutions in EveryZing’s product suite — including ezSEARCH, ezSEO, MetaPlayer and RAMP — discovery and publishing of multimedia content had been restricted to the indexing of just titles and tags.  Delivered in a software-as-a-service package, MediaCloud requires no software to purchase, install or maintain.  Furthermore, customers only pay for the processing they need, while obtaining access to a service that has virtually unlimited scalability to handle even large content collections in near real-time. The company’s core intellectual property and capabilities include speech-to-text technology and natural language processing.

Harry Collier (Infonortics Ltd) and I spoke with Mr. Wilde on March 12, 2009. The full text of our interview with him appears below.

Will you describe briefly your company and its search / content processing technology?

EveryZing originally spun out of BBN Technologies in Cambridge, MA. BBN was truly one of the godfathers of the Internet, and developed the email @ protocol among other breakthroughs. Over the last 20 years, the US Government has spent approximately $100MM with BBN on speech-to-text and natural language processing technologies. These technologies were spun out in 2006 and EveryZing was formed. EveryZing has developed a unique Media Merchandising Engine which is able to connect audio and video content across the web with the search economy. By generating high quality metadata from audio and video clips, processing it with our NLP technology to automatically “tag” the content, and pushing it through our turnkey publishing system, we are able to make this content discoverable across the major search engines.

What are the three major challenges you see in search / content processing in 2009?

1) Indexing and discovery of audio and video content in search; 2) Deriving structured data from unstructured content; 3) Creating better user experiences for search & navigation.

What is your approach to problem solving in search and content processing?

Well, yes, meaning that all three are critical.  However, the key is to start with the user expectation.  Users expect to be able to find all relevant content for a given key term from a single search box.  This is generally known as “universal search”.  This requires then that all content formats can be easily indexed by the search engines, be they web search engines like Google or Yahoo, as well as site  search engines.  Further, users want to be able to alternately search and browse content at will.  These user expectations drive how we have developed and deployed our products.  First, we have the best audio and video content processing in the world.  This enables us to richly markup these files and make them far more searchable.  Second, our ability to auto-tag the content makes it eminently more browsable.  Third, developing a video search result page that behaves just like a text result page (i.e. keyword in context, sortability, relevance tuning) means users can more easily navigate large video results.  Finally, plumbing our meta data through the video player means users can search within videos and jump-to the precise points in these videos that are relevant to their interests.  Combining all of the efforts together means we can deliver a great user experience, which in turn means more engagement and consumption for our publishing partners.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?

Yes, absolutely. Enterprises are facing a growing pile of structured and unstructured content, as well as an explosion in multimedia content with the advent of telepresence, Webex, videoconferencing, distance learning etc. At the same time, they face increasing requirements around discovery and compliance that require them to be able to index all of this content. Search is rapidly gaining the same stature as databases and document management systems as core platforms.

Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors?

Major companies are increasingly looking to vendors with deep pockets and bench strength around support and R&D.  This has driven some rapid market consolidation.  However, these firms are unlikely to be the innovators, and will continue to make acquisitions to broaden their offerings.  There is also a requirement to more deeply integrate search into the broader enterprise IT footprint, and this is also driving acquisitions.

Multi core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar?

Yes, CPU power has directly benefited search applications.  In the case of EveryZing, our cloud architecture takes advantage of quad-core computing so we can deliver triple threaded processing on each box.  This enables us to create multiple quality of service tiers so we can optimize our system for latency or throughput, and do it on a customer by customer basis.  This wouldn’t be possible without advances in computing power.

Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009?

Semantic analysis is core to our offering. Every clip we process is run through our NLP platform, which automatically extracts tags and key concepts. One of the great struggles publishers face today is having the resources to adequately tag and title all of their video assets. They are certainly aware of the importance of doing this, but are seeking more scalable approaches. Our system can use both an unsupervised and a supervised approach to tagging content for customers.

Where can I find more information about your products, services, and research?

Our Web site is
