Former Clandestine Operative Says Automated Systems Not Good Enough

May 13, 2008

Editor’s Note: Robert Steele, former Marine Corps officer and intelligence operative, was one of the first, if not the first, intelligence professionals since World War II to question the relative value of secret sources and technologies in relation to open sources and technologies. Mr. Steele agreed to meet me near his office in suburban Washington, D.C. The full text of the interview appears below. After we spoke, Mr. Steele provided me with illustrations he referenced in our conversation. I have included these in the transcript at the point where Mr. Steele references them. You can read more about Mr. Steele at his Web site, OSS.Net.

How did you get interested in using information that’s readily available to anyone in a library, in newspapers, and online as a source of useful intelligence?

I went into the international spy program at CIA with a Master’s in International Relations, and knew quite a bit about citation analysis and primary research. What I was not expecting over the course of my clandestine career was the obsession with stealing secrets to the exclusion of all that could be known from open sources.

Robert D. Steele

The clandestine officers also refused to interact with the analysts—before leaving for my first overseas assignment, the Chief of Station took me to the analysis side of the house, and on my way there he said something along the lines of “these folks know nothing useful, and we tell them nothing.”

When the Marine Corps asked me to leave CIA to create the Marine Corps Intelligence Center in 1988, I promptly did what I thought the government wanted; that is, I spent $20 million on a codeword analysis center, including a Special Intelligence Communications (SPINTCOM) work station. I thought it would do everything except kill the terrorist.

Was I in for a shock. I had put a PC with Internet access in an isolated room, not connected to any government network. The PC had a modem. I was curious about online and bulletin board systems. In a short time, analysts were leaving their super charged workstations to stand in line to use the PC. These professionals were looking for information that was not in the government system and not known to our officers in the field (including diplomats and commercial or defense attaches).

What a wake up call.

That is when I learned that expensive systems are as good as their sources—narrow casting into the secret world made much of our multi-billion dollar technology virtually worthless. Analysts using the PC showed me that 80 to 90 percent of the information we needed could be obtained using the PC and public information to include direct calls to overt human experts. I also learned that useful information was available in 183 other languages no one in the US Government can speak or understand. Even today, a large number of Washington officials don’t understand the intelligence value of open sources of information including commercial imagery, foreign-language broadcasts that must be accessed locally, and gray literature, such as university yearbooks for a photo of a terrorist. Washington is completely out of touch with human experts that are not US citizens eligible for a secret clearance—the spies don’t want them unless they agree to commit treason, and the analysts are not allowed to talk to them by paranoid ignorant security officials.

Almost every vendor asserts that their systems can “do” business or competitive intelligence. In your experience is this accurate?

Look. BI and CI are not really intelligence.

BI or business intelligence is commonly used as a descriptor for what is nothing more than internal knowledge management, spiced up with a point-and-click graphics dashboard. Not only are most of these systems non-interoperable with everything else, they are as smart or as stupid as the digital data they can access.

The reality of information in most organizations is that most of what is really valuable is not digital. And, most CEOs have zero idea what intelligence (decision support) actually means.

CI or competitive intelligence focuses on competitors. What I practice, Commercial Intelligence, focuses on

  • External information
  • Collaborative work
  • Knowledge management
  • Organizational intelligence.

Commercial intelligence leverages what can be drawn from the human social networks interacting with an organization and the other sources of information. External information is not information about competitors. It includes such factors as “true cost” of goods and next-generation “cradle to cradle” opportunities. You have to factor in the art and science of retaining Organizational Intelligence. I will send you a diagram that shows my view of this commercial intelligence space.

four sectors

In my experience, today’s systems are edging toward failure. The systems aren’t very good, useful, or usable. As the Gartner Group recently said about Windows, it is untenable. I like Microsoft for its cash flow—they need to dump the legacy and launch an open source network with shared call centers and Blue Cube power processing.


ZyLAB’s Dr. Johannes Scholtes Interviewed

May 5, 2008

ZyLAB’s chief executive officer, Dr. Johannes Scholtes, said in an exclusive interview for the “Search Wizards Speak” series that the company has more than 7,500 licensees world wide. This customer base puts the company on a par with search sector leaders Autonomy, Fast Search & Transfer (Microsoft), and Google.

He told ArnoldIT.com, sponsor of the Search Wizards Speak series:

Our approach has been to say to our customer, “Here’s our list of components. Just select the ones you need. You pay only for these, so we don’t ask our customers to pay huge fees for functions that will never be used.” Our modular approach is now mature, and I see more vendors in Europe and the US emulating what we’ve been doing for a long time. Our customers tell us our “couple-of-day” deployments are very unusual. For us, fast deployment is business as usual. These three and six month installation efforts are problems for many organizations, and these become great sales leads for us.

The failure of key word search to meet the needs of today’s organizations is becoming better known. ZyLAB, according to Dr. Scholtes, has pushed beyond the search box. He said:

In the basic search, a user can see the number of hits for a query, hit-density ranking, file date and time for creation, modification, and access. There are many other features in basic mode. For advanced search, you can rank on automatically extracted entities, including names, companies, countries, measurements, dates, monetary amounts, and named-phrases. You can rank by semantic relevance using an automatically derived taxonomy or your own taxonomy. Results can be personalized. You can organize result lists in a variety of ways. You can run a query on a linguistic pattern like “a person got a job” and then rank results in these patterns higher than hits in the full text. Through all this additional meta information, we can support clustering, full text similarity inside documents where precision and recall can be set.
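The interview does not define “hit-density ranking.” As a minimal sketch, assuming hit density means the share of a document’s words that match a query term, the idea could look like the Python below. The function names `hit_density` and `rank_by_hit_density` are hypothetical and are not part of any ZyLAB product.

```python
import re


def hit_density(query_terms: list[str], text: str) -> tuple[int, float]:
    """Return (hit count, hits divided by total words) for one document.

    Assumption: "density" is matching words over all words; the source
    does not say how ZyLAB computes it.
    """
    words = re.findall(r"\w+", text.lower())
    if not words:
        return 0, 0.0
    terms = {term.lower() for term in query_terms}
    hits = sum(1 for word in words if word in terms)
    return hits, hits / len(words)


def rank_by_hit_density(
    query_terms: list[str], documents: dict[str, str]
) -> list[tuple[str, int, float]]:
    """Rank documents with at least one hit, densest first."""
    scored = []
    for name, text in documents.items():
        hits, density = hit_density(query_terms, text)
        if hits:
            scored.append((name, hits, density))
    return sorted(scored, key=lambda row: row[2], reverse=True)
```

A short document that mentions the query term twice would outrank a long document that mentions it once, which is the behavior a density measure is meant to capture.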

He made the point that ZyLAB’s relevance ranking algorithms are not locked up like those from other well-known vendors.

You can read the full interview in the Search Wizards Speak section of the ArnoldIT.com Web site. This is the 12th interview in the series. An index of the previous interviews is here.

Stephen Arnold, May 5, 2008

Kroll’s Ontrack Enhances Other Vendors’ Search System

April 28, 2008

David Chaplin, founder of Engenium, runs Kroll’s search and content processing business. Kroll acquired Engenium in 2006 and has moved quickly to integrate the firm’s content processing technologies into its products and services. Kroll is a unit of Marsh & McLennan Companies, a diversified firm with interests ranging from insurance to risk assessment and professional services.

Mr. Chaplin told Beyond Search in an exclusive interview that the Kroll solution “can enhance search results from any search engine. If the desired search result is not on page one of the results we will bring all the results onto page one and provide a well organized and labeled folder structure to navigate to the best result.” He added, “We have two basic products: the query based conceptual keyword and parametric search and non-query based automatic information clustering.”

He also said:

I don’t believe that the volatility [in search] will decrease. I do believe there are not very many big moves to be made right now. I believe there are some big guys out there who want to make a move in this space. An underlying factor is that I do not believe corporate America believes that they are getting what they need from search and they are finding an increasing number of employees go to the Internet first before even checking their internal systems.

You can read the full interview on the ArnoldIT.com Web site. The interview is part of the Search Wizards Speak series. The interview is the 11th in this series.

Stephen Arnold, April 28, 2008

Sinequa’s Jean Ferré Interviewed

April 21, 2008

Sinequa, based in Paris, provides search and content processing systems that straddle traditional search, business intelligence, and data management. The company has a strong customer base, primarily outside the United States. I reacquainted myself with the company at the International Online Meeting in London, England, in December 2007. Curious about the new features in the system, I was successful in getting the firm’s managing director to speak with me.

The positioning of the company is different from some search vendors’ approaches. Mr. Ferré said:

We are a search-and-retrieval system focused on the enterprise promoting our “Connect to Knowledge™” approach. What’s different is that our technology is a self-contained package delivered in two formats: First, we offer a flagship solution called Sinequa CS. I’m delighted to say that our sales doubled in 2007. Sinequa CS consists of a full fledged packaged platform including connectivity, navigation and obviously the core engine deployed in a large number of enterprises such as Bouygues, Arkema, MBDA, the French Army, EADS, Eurocopter, LCF Rothschild, the French Police, etc. Second, we have what we call the OEM offer (original equipment manufacture license). Another software company licenses our technology and uses it in their enterprise system. Some OEMs embed our technology in enterprise applications, Web sites, or inside Intranets.

The complexity of search systems has been the subject of some discussion. Mr. Ferré told Beyond Search:

I think Sinequa falls in between a “search toaster” and a box of technical parts you assemble. We resolve the complexity of exhaustive secured connectivity, profile based interface and yet best in class relevancy but deliver much faster at a much lower cost and complexity…. We are now offering a turnkey deployment for enterprise content. If the client wants to search and process information in file systems, relational databases, Microsoft SharePoint, Web crawling, RSS and enterprise content management–no problem. We can have the company up and running in four days. As an example, we recently were chosen as a replacement for Autonomy by one of the largest global IT integrators for its worldwide internal search. We had to compete with what the IT director wanted–Google. We won this important contract…

You can read the complete interview on the ArnoldIT.com Web site. This interview is part of the exclusive series “Search Wizards Speak”, which allows you to learn first hand about some of the most interesting companies in the behind-the-firewall (enterprise search or Intranet search) market.

Stephen Arnold, April 21, 2008

Bitext’s Antonio S. Valderrábanos Interviewed

April 14, 2008

You may not be familiar with Bitext, a search and content processing vendor specializing in natural language processing or NLP. The company has found an appetite for its technology in Spain and in other European countries. The company recently landed a deal to provide search and content processing technology to support a new citizen-facing information service in Spain. Dubbed Red 060, this system will be similar to the US government’s service, USA.gov. The company also is working with US search vendor dtSearch.

Antonio Valderrábanos, founder of Bitext in Madrid, Spain, told Beyond Search:

Our goal is to complement search engines, giving them the ability to handle text according to its content, rather than its form as it happens in most applications, including search engines. We are interested in all forms of search, including search in databases or Geographical Information Systems.

Unlike some vendors, the Bitext system meshes with other vendors’ systems, adding important new functionality. Mr. Valderrábanos told Beyond Search:

Our approach is to say, “Okay, you have a perfectly good key word indexing system. We add value to that system in ways that make users happier and without getting rid of the system in which you have invested significant time and money.” We integrate, complement, turbo-charge.

Bitext is working on important enhancements to the company’s content processing functions, including entity extraction. Entity extraction identifies people, places, events, and certain numerical data in a source document.
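To make the idea concrete, here is a deliberately simple sketch of extracting two of the entity types mentioned above, dates and monetary amounts, with regular expressions. It is an illustration only: Bitext’s system is based on natural language processing, and recognizing people, places, and events needs far more than patterns like these. The function name `extract_entities` and both patterns are assumptions made for this example.

```python
import re

# Assumed pattern: US-style dates such as "April 14, 2008".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b"
)

# Assumed pattern: dollar amounts such as "$4 million" or "$1,250.50".
MONEY_PATTERN = re.compile(
    r"\$\d+(?:,\d{3})*(?:\.\d+)?(?:\s(?:thousand|million|billion))?"
)


def extract_entities(text: str) -> dict[str, list[str]]:
    """Return the dates and monetary amounts found in text, in order."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "money": MONEY_PATTERN.findall(text),
    }
```

Pattern matching of this kind breaks down quickly on real documents, which is one reason vendors invest in linguistic analysis rather than fixed rules.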

Looking farther into the future, Bitext engineers are working on new ways to make access easy and intuitive. Mr. Valderrábanos observed:

I think the future will want one single interface to different information sources, whether documents or databases or some combination of data from many different systems.

Of course, the interface will be natural language, the simplest, most effective way of communicating for humans. We will certainly not want to bother with different applications and formal languages–so no key word queries, Boolean statements, SQL strings, or forms. People want to get the information they need without hurdles.

The full interview with Mr. Valderrábanos appears on the ArnoldIT.com Web site as part of the “Search Wizards Speak” series. You can learn more about Bitext’s line of products on the Bitext Web site.

Stephen Arnold, April 14, 2008

Brainware’s James Zubok Interviewed

March 31, 2008

Privately-held Brainware, once a unit of the German high-tech content management vendor SER Systems AG, is expanding rapidly, the company told Stephen Arnold, managing partner of ArnoldIT.com. The company uses a patented system and method anchored in numerical processes.

James Zubok, an attorney and the company’s chief financial officer, said in an interview on March 30, 2008: “In less than two years we’ve experienced remarkable growth. Our sales have grown by more than 900 percent and we’ve doubled our sales force.”

The complete interview appears as part of the Search Wizards Speak series available on the ArnoldIT.com Web site.

Brainware has a patented method for processing text, in sharp contrast to the dozens of vendors who index by key word and then try to discover metadata. The technique involves trigrams, or three-letter sequences. Mr. Zubok described the system in this way:

When we index the word “BRAINWARE” we store a representation of the following trigrams: “BRA”; “RAI”; “AIN”; “INW”; etc. We create a similar trigram representation of all of the text in a search query. During a search, instead of trying to match up entire words, we match the trigrams, which allows our application to be incredibly fault tolerant. Even if some of the trigrams are not a match, our search yields relevant results without relying on any dictionaries or other pre-defined rules.
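The trigram idea Mr. Zubok describes can be sketched in a few lines of Python. This is a generic illustration of trigram matching, not Brainware’s patented method: the scoring rule (the share of query trigrams found in the candidate), the 0.5 threshold, and the function names `trigrams`, `trigram_score`, and `search` are all assumptions made for the example.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of overlapping three-character sequences in text."""
    normalized = text.upper()
    return {normalized[i:i + 3] for i in range(len(normalized) - 2)}


def trigram_score(query: str, candidate: str) -> float:
    """Fraction of the query's trigrams that also occur in the candidate."""
    query_grams = trigrams(query)
    if not query_grams:
        return 0.0
    return len(query_grams & trigrams(candidate)) / len(query_grams)


def search(
    query: str, documents: list[str], threshold: float = 0.5
) -> list[tuple[str, float]]:
    """Return (document, score) pairs at or above threshold, best first.

    The threshold is an assumed tuning knob, not a documented setting.
    """
    scored = [(doc, trigram_score(query, doc)) for doc in documents]
    hits = [(doc, score) for doc, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

With this sketch, the misspelled query “brainwear” still shares four of its seven trigrams with “Brainware” (BRA, RAI, AIN, INW), so the match survives the typo without a dictionary or spelling rules.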

The system lends itself to some high-value applications; for example, patent application and patent analysis, email discovery, and competitive intelligence activities.

One interesting aspect of the Brainware approach to content processing is its work flow functions. Mr. Zubok said:

We have workflow solutions for our intelligent data capture offerings (they have embedded search capabilities). We have two workflow applications: WF-distiller, which is our principal workflow component that is used for creating and managing workflows of all types of complexities; and A/P-WebDesk, a specialized workflow module built using WF-distiller but used specifically for Accounts Payable management. A/P-WebDesk (which includes A/P-WebDesk for SAP, a version built specifically for seamless integration with SAP) provides an easy-to-use interface to manage the entire invoice processing lifecycle.

The company’s system can be “tuned” using additional word lists and knowledge bases. You can read the complete interview with James Zubok here. More information about Brainware is available on the company’s Web site. You can download a trial version of the desktop build of Brainware’s search and content processing system from the Brainware.com Web site.

Stephen Arnold, March 30, 2008

Vivisimo’s Founders Interviewed: Raul Valdes-Perez and Jerome Pesenti

March 21, 2008

In mid-March, Vivisimo received an infusion of $4 million from North Atlantic Capital. Vivisimo has emerged as a full-scale “behind the firewall” search provider. The company landed the high-profile search-and-retrieval deal with the US Federal government for USA.gov, the public-facing portal for government information. Then, the company inked a deal with Interwoven, the content management company, to provide a search and content processing system for the Interwoven CMS system.

Some pundits see Vivisimo as a specialist vendor. That view of the company is incorrect. My sources tell me that Vivisimo is finding itself invited to bid on a range of commercial, government, and association projects. Executives at some well-known, high-profile search firms have asked me about Vivisimo. In my experience, this means Vivisimo is doing something right.


Endeca’s Pete Bell Interviewed

March 17, 2008

Endeca broke the $100 million revenue barrier in 2007, and the company has received additional financial backing from Intel and SAP. Endeca’s Pete Bell spoke with me in March 2007 and provided substantive information and insight into Endeca’s success.

Mr. Bell said: “We’re thriving as an Information Access platform whose architecture is based on a new class of database.” At the outset of the interview, I was operating on the assumption that Endeca was a search engine. Mr. Bell’s explanation of the company’s approach gave me a fresh appreciation of the firm’s engineering prowess. For example, Mr. Bell said:

Since imitators were playing catch up, nearly everyone else grafted facets onto their existing engine, so they do things like manage facets through application-side code. If you have a thousand products and three facets, that could work. But it gets ugly when you need to scale or want to make changes. But since we architected for facets from the very beginning, we built it into the engine. We’ve evolved industrial strength infrastructure for this.
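Mr. Bell does not describe Endeca’s internals, so the following is only a toy sketch of what keeping facets inside the index, rather than in application-side code, might look like. It assumes a postings map from each (facet, value) pair to the set of matching record IDs; the class name `FacetIndex` and its methods are hypothetical.

```python
from collections import defaultdict


class FacetIndex:
    """Toy index that stores facet postings alongside the record IDs."""

    def __init__(self) -> None:
        # Maps (facet name, facet value) -> set of record IDs.
        self._postings: dict[tuple[str, str], set[int]] = defaultdict(set)
        self._ids: set[int] = set()

    def add(self, record_id: int, facets: dict[str, str]) -> None:
        """Index one record under each of its facet values."""
        self._ids.add(record_id)
        for facet, value in facets.items():
            self._postings[(facet, value)].add(record_id)

    def select(self, **filters: str) -> set[int]:
        """Return IDs matching every facet=value filter (all IDs if none)."""
        result = set(self._ids)
        for facet, value in filters.items():
            result &= self._postings.get((facet, value), set())
        return result

    def counts(self, facet: str, within: set[int]) -> dict[str, int]:
        """Count records per value of one facet, restricted to `within`."""
        return {
            value: len(ids & within)
            for (name, value), ids in self._postings.items()
            if name == facet and ids & within
        }
```

Because filtering and counting are set operations on the index itself, adding a fourth or fortieth facet changes the data, not the application code. A production engine would use far more compact structures than Python sets.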

With news of the Intel and SAP financial support, I wanted to know what those additional funds would fuel. Mr. Bell said:

Intel and SAP give us the opportunity to plan a product roadmap today that will be ready for how enterprises look three years from now…. It’s all about multi-core — what would you do with an 80 core chip? … Intel wants visionaries to create the demand for its next generations of chips. As for SAP, their software manages a lot of the world’s most valuable data. Today, the SAP data support business processes … But as soon as you veer off from a specific process, it can be difficult to make use of those data.

You can read the complete interview with Mr. Bell in the Search Wizards Speak section of the ArnoldIT.com Web site.

If you want more information about Endeca, click here.

Stephen Arnold, March 17, 2008

Coveo’s Laurent Simoneau Interviewed

March 11, 2008

Coveo has been growing at a double digit pace since I first wrote about the company in the first edition of Enterprise Search Report. In the first week of March 2008, Coveo announced that it had received additional investment to accelerate the company’s growth. You can read the official news release here. Coveo has also added support for multimedia along with streamlining the company’s solid graphical administrative access.

Right after the announcement about the $2.5 million infusion, I sent an email to Mr. Simoneau, and he agreed to meet with me last week. In the course of this candid discussion, he provided some fresh insight into the reasons contributing to Coveo’s success in behind-the-firewall search.

I was particularly interested in his firm’s success in licensing Coveo to Microsoft SharePoint customers. He told me that his team had worked to understand how Microsoft “does things”. Armed with this technical knowledge, he says:

We provide a smart document address so a user can access a document. We also include specific item types so a user knows what type of information object she will get back. We also strive to understand what information people store in SharePoint. Documents are one thing, but there is a lot of structured information as well that has to be leveraged.

I’ve been tracking the addition of rich content processing features and functions for several years. Coveo has added a number of these features, including support for Salesforce.com. The system makes it possible for a licensee to offer a user a key word search box plus a point-and-click “assisted navigation” interface.

Mr. Simoneau reveals:

We have had to create some new solutions. I cannot say too much, but we have proprietary language detection and multilingual stemming algorithms that enrich the indexing of large corporate databases. We have also patented speech recognition technologies that enable businesses to perform high quality indexing of multimedia content like podcasts or videos.

You can read the full interview with Mr. Simoneau on the ArnoldIT.com Web site here. I will be tracking Coveo as the company continues to innovate.

Stephen Arnold, March 11, 2008

ISYS’ Ian Davies Interviewed

March 6, 2008

An interview, conducted in February 2008, with Ian Davies, founder of ISYS Search Software, is now available. ISYS was founded in Australia almost 20 years ago. Beyond Search has identified this company as one to watch. Its ready-to-run solution is fast, intuitive, and feature complete.

ISYS result screen

The company has made strong inroads into law enforcement, litigation support, and competitive intelligence. These are sectors long viewed as the domain of certain publicly-traded search system vendors. ISYS’ success is a result of more than a decade of innovation.

Mr. Davies says in his interview, “We described our early product as an iceberg … the really important bits were never seen. The bit below the water-line was crucial to make the whole thing work… Some of our competitors were ‘all tip and no berg’”.

ISYS lets its system “do the talking.” The key to ISYS’ marketing is the company’s almost obsessive desire to listen and respond to its customers.

I asked Mr. Davies about the marketing hype that swirls through the content processing industry. He laughed and said, “When I attend an industry event and hear the jargon and the buzzwords, I chuckle. ISYS is a search solution that offers features that are useful to a large percentage of users. You don’t need fancy jargon like semantic wiki or enhanced Bayesian algorithms to make a system useful. There needs to be rocket science, without doubt… [Our] system does the talking, not the buzzwords.”

In my testing of ISYS 8.x, I was impressed by the system’s performance. I asked Mr. Davies about ISYS’ speedy indexing and query processing. He told me:

Performance comes in two ways. The first is algorithmic performance, where the algorithms you come up with scale well with volume. It’s all about curves, and it’s key. If your curves are wrong, you’re never going to scale. The second is implementation performance, where there’s craftsmanship in how well you implement your algorithm. We spend a lot of time analyzing our code and profiling to ensure the bottlenecks are eliminated.

According to Mr. Davies, the company is experiencing rapid growth in its offices in Australia, the U.S., and Europe. ISYS is worth a hard look. You can read the full interview here. You can download a trial version here.

Stephen Arnold, March 8, 2008
