Coveo: Email Face Off in Canada

April 15, 2008

Coveo, a vendor of search and content processing technology, rolled out a limited release of Coveo G2B™ for Email.

Based on a preview glimpsed by ArnoldIT’s above-the-radar, spy goose, Coveo is perhaps the only vendor with an email search application able to deliver unified search and navigation across both live and archived email. In the world of email, the term “archived” means message stores running on repositories from vendors such as Microsoft Exchange and Symantec Enterprise Vault, among others. The search function can be used from any connected desktop or from a Windows Mobile or BlackBerry mobile device. Even Beyond Search’s spy goose queried email from his lowly Treo 650 with near zero latency.

“Beyond Search” was able to look at a new email and search for a referenced attachment in an email no longer on the Treo 650. The spy goose concluded that this is a long-overdue function and will be a welcome service for anyone who relies on email.
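The core idea of unified search over live and archived email can be sketched in a few lines. This is a minimal illustration, not Coveo's actual implementation: it assumes a hypothetical `Hit` record, two toy in-memory indexes, and dedupes on message ID so a message archived after delivery appears only once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    message_id: str
    subject: str
    store: str  # "live" or "archive"

def unified_search(query, live_index, archive_index):
    """Query both stores and merge the hits, preferring the live copy
    when the same message appears in both stores."""
    q = query.lower()
    hits = [h for h in live_index if q in h.subject.lower()]
    seen = {h.message_id for h in hits}
    hits += [h for h in archive_index
             if q in h.subject.lower() and h.message_id not in seen]
    return hits

live = [Hit("m1", "Q2 budget review", "live")]
archive = [Hit("m1", "Q2 budget review", "archive"),
           Hit("m2", "Budget attachment", "archive")]

results = unified_search("budget", live, archive)
# m1 surfaces from the live store; m2 survives only in the archive
```

The point of the dedup step is exactly the user experience described above: one result list, regardless of which store currently holds the message.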

Coveo’s founder Laurent Simoneau said:

Businesses today cannot properly leverage their email content for critical decisions…and desktop search applications simply do not scale to meet the needs of today’s mobile workers. Coveo G2B for Email benefits businesses tremendously by enabling their employees to access instantly and easily the information they need most, whether they are in the office or on the road.

You can read an exclusive interview with Mr. Simoneau here.

Coveo’s system appears to challenge Waterloo, Ontario-based Research in Motion in email search and content processing. RIM’s been able to expand its revenues due to new handset margins. In the view of ArnoldIT.com, RIM’s software relies on an aging architecture which is likely to become increasingly problematic as RIM expands its consumer user base.

Coveo takes a fresh approach. Based on Beyond Search’s probe into the product, Coveo is now playing with a man advantage in this face off between Canadian technology rivals. You can find more information on the Coveo Web site.

Stephen Arnold, April 15, 2008

Arnold’s New Study “Beyond Search” Now Available

April 15, 2008

Stephen E. Arnold’s most recent study–Beyond Search: What to Do When Your Enterprise Search System Doesn’t Work–is now available from the Gilbane Group. The 270-page study contains practical information about fixing problems with an existing behind-the-firewall search system, a market analysis and vendor road map, profiles of 24 vendors of behind-the-firewall search and content processing systems, and a glossary.

The first of three key findings from the year-long research behind the book is that user dissatisfaction with incumbent search systems is increasing; the need to deploy a system that meets increasingly savvy users’ needs is rising sharply.

Mr. Arnold also says that Google’s dataspace technology–largely unknown by search vendors and not yet deployed by Google–could reshape enterprise search in a very short time if Google makes it available. Google is keeping quiet about the dataspace technology acquired when Google purchased Transformic, Inc. in 2006. He said, “Few outside of Google know about dataspaces, and the technology offers one way to deliver new types of query functionality so users can know how likely a particular result is to be accurate and determine the lineage of a particular result.” He added, “The world is starting to think about BigTable, but dataspaces are a quantum leap beyond the functionality of BigTable, which is in itself a quantum leap beyond relational database technology. Google’s engineering and technical prowess are its chief competitive advantage.”
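The two capabilities Mr. Arnold singles out, result certainty and result lineage, can be illustrated with a toy record structure. This is purely speculative: Google has not documented its dataspace implementation, and every name and field below is an invention for illustration only.

```python
# Illustrative only: a result record that carries a confidence score
# and its lineage (the chain of sources behind the answer), the two
# properties the quotation above attributes to dataspace-style queries.

def best_answer(candidates):
    """Return the highest-confidence candidate, lineage attached."""
    return max(candidates, key=lambda c: c["confidence"])

candidates = [
    {"value": "1958", "confidence": 0.62,
     "lineage": ["web page A"]},
    {"value": "1959", "confidence": 0.91,
     "lineage": ["scanned record", "agency database", "curated fact table"]},
]

answer = best_answer(candidates)
# The user sees not just "1959" but how certain it is and where it came from.
```

A conventional hit list discards exactly this metadata; surfacing it is what would make such a system "a quantum leap beyond" a plain index.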

Finally, Mr. Arnold’s research reveals that remarkable new, extremely useful technologies are being developed outside the US. Mr. Arnold says, “There’s a perception that innovation in search only arises in the United States. That’s simply not true. Non-U.S. vendors like Exalead and ISYS Search Software are making strong thrusts into the North American market. Others are opening offices in the U.S. and will increase the competitive heat for many of the best-known search vendors.”

Martin White, noted British search and content expert, said of Beyond Search, “[He did] a fabulous job on the book, and the industry, and the CIO fraternity, should be very grateful that Mr. Arnold found the energy to write it.”

You can order the study from the Gilbane Group. Selected quotations from the study appear on Mr. Arnold’s Web site, ArnoldIT.com. An abbreviated table of contents is available on that site as well.

Stuart Schram, April 15, 2008

Google’s Janitors: Clean Up Crew Ready for a Clean Sweep

April 15, 2008

At my Buying & Selling eContent keynote this morning, I discussed briefly Google’s invention of “janitors”. You can get the full text of the patent from the USPTO site. Search for US20070198481, “Automatic Object Reference Identification and Linking in a Browseable Fact Repository.” The inventors are Andrew Hogue and Jonathan Betz, Google, Inc.

The patent is of keen interest to me. It makes use of functions that Google is now making available via its App Engine service, among others. My suggestion is that you read about the App Engine and then look at US20070198481. If you have read about Google’s Programmable Search Engine, you may see linkages among these inventions that the individual patent documents do not make explicit. Google is not hiding any of these technologies, just using its infrastructure in fresh, intriguing ways. Keep in mind that a patent document is not a product. I believe it is useful to look at open source information in order to keep a finger on the pulse of a company’s innovation heartbeat.

Figure from US20070198481

Now look at this illustration, which I used in my keynote. I want to direct your attention to two things. First, the query generates a report about the topic, in this case, the named entity “Michael Jackson”. Second, this result is not a hit list; it is a report. If my research for my new Gilbane Group study Beyond Search is accurate, Google’s US20070198481 seems to address some of the problems that users experience when confronted with results lists.

You will need to draw your own conclusions about this type of automated report generation. Google is not just in step with what users want; the company appears to possess technology that makes it possible for the GOOG to jump into professional publishing, expand its reach as a business intelligence tool, and please users who want a distillation, not a laundry list of results.

Stephen Arnold, April 15, 2008

Search, “No Problem”; Explaining the Value of IT, “Problem”

April 15, 2008

Gartner, the IT consulting giant, exposed its list of the major information technology challenges. ZDNet UK points to Silicon.com’s summary in a post titled “Seven IT Challenges to Change the World”.

Enterprise search–indeed search of any type–is not on the list. Please, check out this list of seven items before it becomes unfindable. As you scan the seven items, think about number seven: Developing clear indicators to spell out the financial benefits of IT investment to business.

Presumably once we revolutionize IT with self-charging devices and automated coding, we will have a way to explain in dollars and cents the value of information technology. For organizations struggling with search and retrieval, good news. You will be able to find needed information before you can explain the value of IT to your colleagues, peers, and superiors.

Stephen Arnold, April 15, 2008

Autonomy Aces Its Rivals Once More

April 14, 2008

Autonomy Information Governance is, according to the company, “the industry’s first information governance platform that automates real-time policy management based on forming a conceptual and contextual understanding of all enterprise information.”

The value of this functionality is that risks inherent in information can be reduced by applying policy based on understanding what an email, document or phone recording says instead of relying solely on its metadata. You can read the full news announcement here.

Autonomy has consistently beaten its rivals in defining search markets and niches. The company’s “portal in a box” promotion remains a high-water mark in search salesmanship. Most of its rivals follow in Autonomy’s marketing wake. Autonomy’s management has a knack for anticipating opportunities and differentiating its offerings from other vendors’ products. Kudos to Autonomy marketing… again.

Stephen Arnold, April 14, 2008

Groovie Info about Google and Data Management

April 14, 2008

Groovie.org has an essay that does a very good job of explaining what the Google App Engine is and is not. If you are a Google watcher, click here for the information.

The analysis of Google App Engine is one of the more informed reviews of this Google innovation.

Most of the pundits overlook the fact that relational databases, despite their usefulness, pose some cost challenges that only those with deep pockets can resolve.

The difference between data management and a database is significant.
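One way to see that difference is to contrast a fixed relational schema with a datastore-style entity store, where records of the same kind may carry different properties. The sketch below is illustrative only; it does not use App Engine's actual API, and the `put`/`query` helpers are inventions for this example.

```python
# Illustrative contrast with a relational table: here no schema is
# declared up front, and two entities of the same "kind" are free to
# carry different properties.

entity_store = []

def put(kind, **props):
    """Store an entity: just a kind plus whatever properties it has."""
    entity_store.append({"kind": kind, **props})

def query(kind, **filters):
    """Return entities of a kind whose properties match all filters."""
    return [e for e in entity_store
            if e["kind"] == kind
            and all(e.get(k) == v for k, v in filters.items())]

put("Person", name="Ada", city="London")
put("Person", name="Grace", employer="Navy")  # different shape, same kind

londoners = query("Person", city="London")
```

A relational engine would reject the second row for missing columns; a data management layer of this sort simply stores what it is given, which is part of why it scales (and costs) so differently.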

See also: http://highscalability.com/google-appengine-second-look

Stephen Arnold, April 14, 2008

GOOG to SFRD: Push Them Back, Push Them Back, Way Back!

April 14, 2008

One of the worst-kept secrets is that Salesforce is supporting Google’s various enterprise applications. Newsfactor’s discussion “Google Gearing Up for the Enterprise” is a good place to start reading about this blog-tacular event.

The tie up is not new; it is an extension of Google’s cheerleading for Salesforce.com’s approach to the enterprise. Salesforce.com’s marketing angle pivots on a solid anti-Microsoft block of rhetoric. Google is more indirect, even gentler about Microsoft’s dominance. Furthermore, Google has been talking with Salesforce.com for years, and the most recent “development” is an extension of that relationship. Keep in mind that Google is not acquiring Salesforce.com, at least not yet.

Salesforce.com needs a way to work around some of its architectural issues. Like Amazon, it needs razzle-dazzle to deliver cloud-based services. Salesforce.com’s multi-tenancy inventions provide some punch that other companies don’t–as yet–have.

The Google Apps allow Salesforce.com to crank its anti-Microsoft marketing engine, and–perhaps more significantly–allows Google to [a] get more information about the traction its products and services have in the enterprise, [b] learn more about the upside and downside of Salesforce.com as a revenue generator, and [c] observe Microsoft’s reaction. How much does this cost Google? Based on the information available to me, the deal costs Google little, and it delivers a significant “intelligence” upside. Microsoft has shown a strong knee-jerk reaction to Google’s activities, and this deal may be another way to agitate Microsoft’s senior executives.

The big question is, “If this Salesforce.com relationship starts to put wood behind Google’s enterprise efforts, will Google buy Salesforce.com?” On the surface, there are some easy benefits to both Google and Salesforce.com. But there are some significant downsides as well; namely, the somewhat fragile nature of the Salesforce.com “plumbing,” which has a traditional relational database at its core. I’ve been told that Salesforce.com jumped on the Oracle database when it opened for business. That database has been good and bad. The good is that it can be reliable. The bad is that Salesforce.com has had to do many clever things to keep that database from choking on the transactions generated by the Salesforce.com multi-tenant approach; that is, many customers with separate, “virtual” databases. Salesforce.com’s engineers have figured out how to deliver near-real-time updates without bringing down the multi-tenant database platform.
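The shared-schema multi-tenant pattern described above can be sketched in a few lines. This is a simplification for illustration, not Salesforce.com's actual design: every row carries a tenant ID, and each customer's "virtual" database is just a filtered view of one physical table.

```python
# Minimal multi-tenant sketch: one physical table, many "virtual"
# databases, each scoped by a tenant_id predicate on every query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (tenant_id TEXT, name TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("acme", "Big Deal Inc"),
                  ("acme", "Side Deal LLC"),
                  ("globex", "Other Corp")])

def accounts_for(tenant):
    # Scoping every query by tenant_id is what keeps tenants from
    # seeing one another's rows despite sharing one physical table.
    rows = conn.execute(
        "SELECT name FROM accounts WHERE tenant_id = ? ORDER BY name",
        (tenant,))
    return [r[0] for r in rows]

acme_accounts = accounts_for("acme")
```

The scaling problem the post alludes to follows directly: every tenant's transactions funnel into the same underlying database, which is why clever engineering is needed to keep it from choking.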

Maybe Google will learn enough from this deal, stop cheerleading Salesforce.com from the sidelines, and buy the entire team? Salesforce.com would benefit from more substantive Google engineering. To date, that’s a sideline Google has not chosen to step over.

Stephen Arnold, April 14, 2008

Bitext’s Antonio S. Valderrábanos Interviewed

April 14, 2008

You may not be familiar with Bitext, a search and content processing vendor specializing in natural language processing or NLP. The company has found an appetite for its technology in Spain and in other European countries. The company recently landed a deal to provide search and content processing technology to support a new citizen-facing information service in Spain. Dubbed Red 060, this system will be similar to the US government’s service, USA.gov. The company also is working with US search vendor dtSearch.

Antonio Valderrábanos, founder of Bitext in Madrid, Spain, told Beyond Search:

Our goal is to complement search engines, giving them the ability to handle text according to its content, rather than its form as it happens in most applications, including search engines. We are interested in all forms of search, including search in databases or Geographical Information Systems.

Unlike some vendors, the Bitext system meshes with other vendors’ systems, adding important new functionality. Mr. Valderrábanos told Beyond Search:

Our approach is to say, “Okay, you have a perfectly good key word indexing system. We add value to that system in ways that make users happier and without getting rid of the system in which you have invested significant time and money.” We integrate, complement, turbo-charge.

Bitext is working on important enhancements to the company’s content processing functions, including entity extraction. Entity extraction identifies people, places, events, and certain numerical data in a source document.
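To make the definition concrete, here is a toy gazetteer-based extractor. It is only an illustration of what entity extraction does; Bitext's NLP technology is far richer than a lookup table, and every name below is invented for this example.

```python
# Toy entity extractor: match known names against a gazetteer and
# pull out simple numeric data (four-digit years) with a regex.
import re

GAZETTEER = {
    "Madrid": "PLACE",
    "Spain": "PLACE",
    "Antonio Valderrábanos": "PERSON",
}

def extract_entities(text):
    found = [(name, etype) for name, etype in GAZETTEER.items()
             if name in text]
    found += [(y, "NUMBER") for y in re.findall(r"\b\d{4}\b", text)]
    return found

ents = extract_entities(
    "Antonio Valderrábanos founded Bitext in Madrid in 2008.")
```

Real systems use linguistic analysis rather than string lookup, which is how they catch entities no gazetteer anticipated, but the output shape, typed spans pulled from raw text, is the same.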

Looking farther into the future, Bitext engineers are working on new ways to make access easy and intuitive. Mr. Valderrábanos observed:

I think the future will want one single interface to different information sources, whether documents or databases or some combination of data from many different systems.

Of course, the interface will be natural language, the simplest, most effective way for humans to communicate. We will certainly not want to bother with different applications and formal languages–so no key word queries, Boolean statements, SQL strings, or forms. People want to get the information they need without hurdles.

The full interview with Mr. Valderrábanos appears on the ArnoldIT.com Web site as part of the “Search Wizards Speak” series. You can learn more about Bitext’s line of products on the Bitext Web site.

Stephen Arnold, April 14, 2008

Digital Dodos: Fed Web Site Archives

April 13, 2008

Computerworld‘s Heather Havenstein wrote an April 11, 2008, story, “Agency Under Fire for Decision Not to Save Federal Web Content”. Please, read it before it goes into the digital never-never land of Computerworld stories, thus becoming almost impossible to find without real sleuthing.

The key point in the story, for me, was this:

NARA, which until this year had collected “harvests” of federal Web sites at the end of presidential and congressional terms, said in a recent memo that it would discontinue the practice at the end of George W. Bush’s presidency.

NARA, for the acronym-challenged, is the National Archives and Records Administration. This Federal entity is supposed to keep a copy of government information. Now, government information is slippery, and it is very difficult to put it in one location.

In 2000, I was one of the lucky dweebs involved in the US Federal government’s citizen-facing portal, now called USA.gov. As part of that project, Inktomi indexed more than 20,000 public-facing Web servers and made the information searchable. I thought indexing Federal Web sites would be a piece of cake. Boy, was I wrong.

A Search Puzzle with Hundreds of Pieces

Just take a gander at the Government Printing Office catalog and then do a bit of poking into the Web sites of the Department of Energy, and you won’t find much overlap for big printed reports and studies. For even more government fun, run a query on DOE for “ECCS”. You will get zero results. Now run the query on www.usa.gov, and you get hits to a nuclear power plant’s “emergency core cooling system”. Related information is not in a single place, and there are different filters in place on different agencies’ Web sites. In short, the job of NARA is to gather the information in one place for researchers or crazed attorneys. There are overlapping jurisdictions, of course. It’s murky water. Few know who is responsible for what information at what point in time.

The same wacky situation plagues the Library of Congress, the library in the US Senate, and the two dozen executive branch agencies. I don’t even want to think about figuring out the information on the public and not-so-public Web sites operated by various intelligence, military, and quasi-government entities. (Remember, I struggled with this information landscape until I threw in the digital towel in 2006.)

You will have to form your own opinion about what information should be gathered by whom. I only know that trying to figure out which agency has what information is no trivial job. With NARA seemingly giving up and other Federal entities grabbing different parts of the information elephant, there may be no solution. Alexa and the Internet Archive have tried, and are still trying, to do the work, but over the years I have grown less and less confident in those efforts.

Microsoft indexes some Federal content as part of its contract for USA.gov with Vivisimo, but that’s a hit-and-miss index based on my tests. Microsoft asserts that it has more than 30 billion Web pages in its index, but my tests don’t back up that claim. Microsoft is struggling to make resources available for its various initiatives, and I think the index of Federal government content is not at the top of that list. Google indexes a cart load of government information, including a decent job on a number of states’ content.

Let Google Do It

I’m all for letting the GOOG index the Federal government, store the data in the Googleplex, and call it a day. At least I would know where to look for my “emergency core cooling system” documents and the report I did in 1991 about Japan’s investments in high-speed network technology. Under the present system, the information is essentially unfindable with public-facing systems.

If you know a specific item exists, it can be almost impossible to find it on any public index. In my experience, you have to be able to log in to the agency’s network and go data spelunking, find a version of the document, and then gather up the different instances of the document to figure out which is the “official” one. Just when you think you have what you need, someone asks, “Did you check the Lotus Notes repository? I think there are some modifications in those files too.” So, it’s back to the old data cave for more exploration in the dark. My miner’s light burned out, and I won’t go into the dark any more.

Stephen Arnold, April 13, 2008

Interse: Danish Search Vendor Opens a US Office

April 12, 2008

Chances are you have not heard of Interse A/S and its iBox technology for SharePoint. The company’s headquarters are in Copenhagen, Denmark, and the firm has recently opened a US office at 3200 Whitehaven Street NW in Washington, DC.

iBox is a Windows-centric component that adds metadata modeling and classification to an incumbent search solution. I did not include this company in the 24 profiles that make up the bulk of my new Beyond Search: What to Do When Your Enterprise Search System Doesn’t Work study for the Gilbane Group. I did get a demo before the company opened its US offices. You may want to take a look at the Interse approach if you are struggling with a SharePoint search issue.

You can find more information on the company’s Web site at www.interse.com. If you want a demo, you will need to register (an increasingly frequent vendor practice that gets in the way of learning about a system). Give the company a jingle at 202 797 5350.

Stephen Arnold, April 11, 2008
