Knol: A Google Geologic Hillock with an Interesting Core

August 3, 2008

I am heading to Illinois, and I vowed I would hit the road and post a comment upon arrival in America’s most scenic area: the prairie between Bloomington and Chillicothe, Illinois. Breathtaking. Almost as stunning as the discussion about Knol, Google’s alleged Wikipedia “killer”. Apophenia here weighs in with the “standing on the shoulders of giants” argument. The idea is that Google should have done more with Knol. For me the key point in his write up was:

What makes me most annoyed about Knol though is that it feels a bit icky. Wikipedia is a non-profit focused on creating a public good. Google is a for-profit entity with a lot of power in controlling where on the web people go. Knol content is produced by volunteers who contribute content for free so that Google can make money directly from ads and indirectly from search traffic. In return for ?

The challenge is valid if Knol were designed to generate revenue from ads. At the risk of being accused of recycling information that I have been speaking and writing about for six years, let me remind myself that Google has a voracious hunger for information and data in any form. Knol fits into this Google-scape. I will return to this point after I refer you to iAppliance Web here and Bernard Cole’s “Google Knol Takes on Wikipedia’s Online Encyclopedia”. The key point for me in this good article was:

Knol’s collaboration model is also more hierarchical. Article collaborators can suggest changes but cannot make them without the author’s approval. While this bottleneck may lead to Knol being less timely than Wikipedia, it should prevent the revision wars that plague controversial Wikipedia articles.

I absolutely agree that Google will get something for nothing when people contribute. A Knol article, as Mr. Cole notes, will have an “owner”, a person who has met some Googley criterion as an individual qualified to write a Knol essay.

Let’s step back. When I worked my way through Google’s patent documents and the publicly available technical papers here, I noticed that a great many of these Google writings refer to storage and data management systems that hold a wide range of metadata. Google wants data about the user’s context. Google wants data about user behavior. Google wants data from books in libraries. In its quest for data, Google has been the focal point of a firestorm about copyright. Google knows that for many queries, Wikipedia with its faults pops up at the top of various Google reports listing “important” sites.

Google is a publisher and has been for a long time. The company has a wide range of mechanisms to obtain “content” from users. With the purchase of JotSpot, Google gained access to a publishing system, not a Web log tool, but a system that allowed users to input specific items in a form. The resulting information is nicely structured and ready for additional Google massaging.

When I learned about Knol, my research gave me the foundation to see Knol as a typical Google Swiss Army knife play. Let me highlight a few of the functions that I noted. Keep in mind that Google keenly desires that a coal mine under my log cabin in rural Kentucky explodes and converts me to assorted quarks and leptons:

Knol has an author, so Google can figure out that anything a Knol author posts has some degree of “quality”. Knowing the author, therefore, provides a hook to add a quality score to other writings by a Knol author. Google doesn’t have legions of subject matter experts. Knol provides a content source that can help with the “quality” scoring that Google does and sometimes in an unsatisfactory manner.
Knol gives Google a hook to get copyrighted material that it owns, not some Jurassic publisher who sees Google as the cause of the pitiful condition of book, magazine, and journal publishers. Once a Knol author gets some content in the system and maybe a stroke from Google or a colleague, Katie, bar the door. I would publish my next monograph on Google in a heartbeat. The money would be okay if Google used its payment system to sell my work, but the visibility would be significant. In my business, visibility is reasonably important.
Knol gives Google a clump of information to analyze. Google wants to know the type of things that a company like Attensity or SAS can ferret out of text. These “nuggets” provide useful values to set thresholds in other, separate or dependent processes within Google.

Notice that I did not focus on Wikipedia. Google, as I understand the company, floats serenely above the competition. The thrashings of companies threatened by Google are irrelevant to Google’s forward motion. I think Wikipedia needs some fixes, and I don’t think Knol will rush to do much more than what it is now doing. Knol is sitting there waiting to see if its “magnetism” is sufficiently strong to merit additional Google effort. If not, Knol’s history. If there is traffic, Google will over time nudge the service forward.

I also ignored the ad angle. Google’s patent documents contain scores of inventions for selling ads. There’s a game-based ad planning interface that to my knowledge remains behind closed doors. Everything Google does can have an ad stuck in it. So Knol may or may not have ads. Knol is not purpose built to sell more ads, but that’s an option for Google.

Based on my research, Google has a good sense of video content. Google has not figured out how to monetize it, but Google knows who makes hot videos, the traffic a hot video pulls, and similar metrics. Google knows similar data about Web logs. Now Google wants to know about individual authors’ willingness to generate original content and how the users will behave with regard to that content.

Scroll forward two years and think about Google as a primary publisher. Knol is one cog in a far larger exploration of the feasibility of Google’s becoming a combination of the old newspaper barons and the more financially frisky Robert Maxwells of the publishing world. Toss in a bit of motion picture studio and you have a new type of publishing company taking shape.

Granted Google Publishing may never come into being. Lawyers, Google’s own management, or a technical challenge from Jeff Bezos or a legal eagle could bring Googzilla down. But narrowing one’s view of Knol to a Wikipedia killer is not going to capture Knol, what it delivers, and where it may lead.

Knol is exciting for these reasons, not because it is an ersatz Wikipedia. Okay, tell me I’m recycling old information, living in a dream world, or just plain wrong. Any of these is okay with me. Remember the disclaimer for this personal Web log.

Stephen Arnold, August 3, 2008

Written by Stephen E. Arnold · Filed Under Business strategy, Cloud computing, Google, News, Online (general), Search, Technology, Text processing | Comments Off on Knol: A Google Geologic Hillock with an Interesting Core

Email Analysis

July 5, 2008

This summer I have been asked about email analysis on two different occasions. In order to respond to these requests, I had to grind through my archive of email-related information. I wrote about Clearwell Systems and its approach earlier this year. You can read this essay here.

I cannot reproduce the information my paying customers received. I can take a representative company–in this case, Stratify, a unit of Iron Mountain–and show you two different screen shots. These layouts and representations are the property of Stratify, and I am including them in this essay for two reasons:

Stratify has been one of the early players in text analytics. First as Purple Yogi and then as Stratify, the company was engaged in the difficult missionary marketing needed to make nonbelievers into believers.
The company has gained some traction in the legal market, which, in the US, is a booming sector. The problems of the economy translate into a harvest of riches for some legal firms. Email is a big deal in discovery, and few have the resources to get a human to read all the baloney that zooms around an organization involved in a legal matter.

The Problem

You know the problem. Email was once ASCII shot between two people on Arpanet. Today email is the bane of the knowledge worker. The volume is high. The storage systems antiquated. The attachments madden the sane. The people using email forget that the messages live on different servers and can, in the process of discovery, be copied to a storage device and delivered to the attorney or attorneys who have to find something germane to the legal matter in the terabytes of digital data.

To summarize the challenges:

Email volume (lots of it, maybe a billion messages in a mid-sized organization every year)
Email attachments (tough to find the “right” one)
Email crashes (restores don’t always work, which you probably know first hand)
Email sent as if it were a one-time, secret communication
Email with recipients who, by definition, have some relationship.

For a lawyer, email is good and bad. It’s good if one finds a smoking gun or better yet a gun in the act of shooting. It’s bad if the bullets are coming at the opposing side’s legal eagles, worse if the bullet shoots a legal eagle out of the sky with a slug through the brain.

Ergo: email is a big, big deal in the information world of litigation.

The Solution

The fix is obvious–search. Actually to be precise, the conundrums of email invite text processing, text analytics, link analysis, relationship extraction, entity extraction, and other nifty methods.

The basics of email analysis are actually simple on the surface, more complicated under the hood and out of sight of non-technical types like lawyers: [a] copy email to a storage device that is fast, [b] tell the email analysis program to index the email, [c] key word search or browse outputs, [d] make notes, print out email, and read individual documents of interest, [e] repeat, taking care to bill for the time. (That’s the best part of email analysis. It’s quicker than manual methods, but the systems have to have a baby sitter. Those operating these systems can bill without working up too much of a mental headache. Automated processes do make some legal thinking less painful. The best part is billing for this less stressful time.)
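Steps [b] and [c] above can be sketched in a few lines. This is my own minimal illustration using Python's standard library, not Stratify's method or any vendor's code. The sample messages, the function names, and the simple "every query term must appear" matching rule are all assumptions made for the example.

```python
from collections import defaultdict
from email import message_from_string
import re

# Hypothetical sample messages standing in for a copied mail store (step [a]).
RAW_MESSAGES = [
    "From: alice@example.com\nTo: bob@example.com\n"
    "Subject: Q3 forecast\n\nThe forecast numbers look weak.",
    "From: bob@example.com\nTo: carol@example.com\n"
    "Subject: Lunch\n\nPizza on Friday?",
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(raw_messages):
    """Step [b]: build an inverted index mapping each term to message ids."""
    index = defaultdict(set)
    for doc_id, raw in enumerate(raw_messages):
        msg = message_from_string(raw)
        # Assumes simple, non-multipart messages whose payload is a string.
        text = f"{msg['Subject'] or ''} {msg.get_payload()}"
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Step [c]: return ids of messages containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(hits)

index = build_index(RAW_MESSAGES)
print(search(index, "forecast"))      # ids of messages mentioning "forecast"
print(search(index, "pizza friday"))  # ids of messages with both terms
```

A production system would also have to deal with attachments, multipart messages, de-duplication, and scale, which is where the "more complicated under the hood" part comes in.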

What do these systems show the user? The illustration below shows a Stratify search screen. Since I obtained this screen shot, Stratify has probably updated the interface. The main features are what interest us. Take a look at what the Stratify system user sees when analyzing processed email:

Stratify’s email visualization

The principal features of this display are:

Simplicity. You don’t want to confuse attorneys
A picture showing people and their relationships as discerned by the system. Remember, an email can be sent to a person unrelated to a subject either by accident or for some other reason such as a “this is what I am doing” courtesy
Links on the right hand panel to make it easy for the user to poke around by sender, topic, etc.
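The "people and their relationships" picture starts from something quite plain: counting who sends mail to whom. Here is a hedged sketch of that first step, again in standard-library Python. It is not how Stratify builds its display. The sample messages and the function name are hypothetical, and a real system would layer topic analysis on top to weed out the accidental and courtesy recipients mentioned above.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample messages; real input would come from the indexed store.
SAMPLE = [
    "From: alice@example.com\nTo: bob@example.com, carol@example.com\n"
    "Subject: Plan\n\nDraft attached.",
    "From: alice@example.com\nTo: bob@example.com\nCc: dave@example.com\n"
    "Subject: Re: Plan\n\nSee notes.",
]

def relationship_counts(raw_messages):
    """Count messages per (sender, recipient) pair using From, To, and Cc."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        senders = [addr for _, addr in getaddresses(msg.get_all("From", []))]
        recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        recipients = [addr for _, addr in getaddresses(recipient_headers)]
        for sender in senders:
            for recipient in recipients:
                edges[(sender.lower(), recipient.lower())] += 1
    return edges

edges = relationship_counts(SAMPLE)
# Print the pairs with the most traffic first.
for (sender, recipient), count in edges.most_common():
    print(f"{sender} -> {recipient}: {count}")
```

The counts are the raw material for a link diagram: each pair is an edge, and the count can set the line weight.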

Let’s assume that the email is one part of a discovered collection of information. Stratify provides a richer interface. This one includes the bells and whistles that warrant the Stratify system price tag, which is in six figures, in case you want to license the system.

Written by Stephen E. Arnold · Filed Under Enterprise, Feature, Online (general), Search, Text processing | Comments Off on Email Analysis

Enterprise Search Top Vendors: But Who Is the Judge?

July 3, 2008

My jaw dropped when I saw “The Top Enterprise Search Vendors,” an essay by Jon Brodkin, a writer affiliated with Network World. You can read the two-part document here. (Note: The url is one of those wacky jobs with percent signs and random characters, the product of a misbegotten content management system. So, if you can’t get the link to work after I write this [July 3, 2008, 2 pm Eastern time], you are on your own.)

Let’s cut to the chase.

Mr. Brodkin is using a consulting firm’s report as the backbone of his analysis. There is nothing wrong with that approach, and I use it myself for some documents. He picks up assertions in the consultant report and identifies some companies as “best” or “top” in the “enterprise search” market. We need a definition of “enterprise search”. A definition, in my view, is an essential first step. Why? I wrote a 300-page study about moving beyond search for Gilbane Group. A large part of my argument was that no one knows what enterprise search is, so dissatisfaction runs high, in the 50 to 75 percent range. Picking the “best” or “top” vendor when the majority of system users are unhappy is an issue with me.

He writes:

The best enterprise search products on the market come from Autonomy, Endeca, the Microsoft subsidiary Fast and Vivisimo, but Google’s Search Appliance continues to dominate the market in terms of brand awareness and sheer number of customers, Forrester Research says in a new report.

Ah, yes, the Forrester “wave” report. Now we know the origin of the adjectives “top” and “best”. Other vendors to note include:

Coveo
IBM
Microsoft’s own MOSS and MSS search systems (distinct from the Fast Search & Transfer ESP system). This is in too much flux to warrant discussion by me. I handle this in Beyond Search by saying, “Wait and see.” I know this is not what 65 million SharePoint users want to hear, but “wait and see”.
Oracle
Recommind.

Let’s do a reality check here, not for Mr. Brodkin’s sake or that of the Forrester “wave” team. Just in case an individual wants to license a search system, some basic information may be useful.

First, there are more than 300 vendors offering search, content processing, and text analytics systems at this time. There is no leader for several reasons:

Autonomy has diversified aggressively and much of their market impact comes from systems in which search is a comparatively modest part in a far larger system; for example, fraud detection. So, revenues alone or total customer count are not key indicators of search.
Fast Search & Transfer has been struggling with a modest challenge; namely, the investigation of its finances over an alleged FY2007 loss of $122 million in the fiscal year prior to Microsoft’s buying the company for $1.2 billion. Somehow “best” and “top” are in conflict with this alleged shortfall. So, “best” and “top” mean one thing to me and definitely another to Mr. Brodkin and the Forrester “wave” team. If an outfit is the best, I assume the firm’s financial health is part of its being “top” or “best”. I guess I am old fashioned or an addled goose.
Endeca works hard to explain that it is an information access company. Sure, search functions work in an Endeca implementation, but I think lumping this company with Autonomy (diversified information services) and Fast Search & Transfer (murky financial picture) clarifies little and confuses more.
Vivisimo is a relative newcomer to enterprise search. The company has some nifty de-duplication technology and it can federate results from different engines. The company is making sales in the enterprise arena. I categorize it as an up-and-coming vendor. I wonder if Vivisimo was surprised by its being labeled as a firm nosing around in Autonomy and Endeca territory. Great publicity. But Autonomy is about $300 million in revenue. Endeca is in the $110 million in revenue range. Vivisimo is far smaller, maybe one tenth Endeca’s size, but growing. A set to my way of thinking should contain like objects. $300 million, $100 million, $10 million–not the type of set I would craft to explain “enterprise search”.

Second, have vendors been miscategorized? I am okay with mentioning Coveo and Recommind. Both companies seem to have a solid value proposition and a clear sense of who their prospects are. Coveo, in particular, has some extremely tasty technology for mobile search. Recommind, despite its efforts to break out of the legal market, continues to make sales to lawyer-types. I am not sure the word “search” covers what these two firms are offering their customers. I think of both vendors as offering “search plus other services and functions.”

Third, identifying IBM and Oracle as key players in search baffles me. Both buy consulting and advertising, but in “enterprise search”, neither figures prominently in my analyses. IBM is not a search company; it is a consulting firm using advice to push hardware, software, and services. Search at IBM can mean Lucene with an IBM T shirt. IBM also sells DB2, FileNet, iPhrase, and assorted text processing tools whose names I cannot keep straight. IBM also has an industry “openness” initiative called UIMA, a gasping swan right now in my opinion.

And, Oracle has been beating the secure search drum to deaf ears for a couple of years. Oracle SES 10g sells more Oracle servers, but Oracle is moving a lot of Google Search Appliances. So, what’s Oracle search? Is it the PL/SQL stuff that fuels more Oracle database installations, the SES 10g, or the Google Search Appliance? My sources indicate that Oracle sells more Google Search Appliances than SES 10g. Why? Well, it works and has a nifty API that allows Oracle consultants to hook the GSA into other enterprise systems. Forrester says Oracle is a search vendor, which is accurate. Forrester and Mr. Brodkin don’t mention the importance of the GSA in Oracle’s information access efforts.

Then there is Google or the GOOG. Google rates inclusion in the list of search leaders. The surprise is that Google is THE leader in enterprise search. The company doesn’t provide much information, but based on my research, Google has more than 11,000 Google Search Appliance licensees and more coming every day. When you add up the revenue from various enterprise activities, Google is not generating the paltry $188 million reported in its FY2007 financials. Nope. The GOOG is in the $400 million range. If my data are correct, Google, not Autonomy, is number one in gross revenue related to search.

What’s this all mean?

Let me boil out the waste products for you:

Enterprise search is a non-starter in organizations. People don’t like the “search” experience, so the market is shifting. The change is coming quickly, and the established vendors are trying to reposition themselves by adding social search, business analytics, and discovery functions. The problem is that other companies are moving faster and delivering these much needed options sooner.
There are some very significant vendors in the information access market, and these must be included on any procurement team’s “look at” list; specifically, Exalead (Paris) and Isys Search Software (Sydney and Denver). Both companies serve slightly different sectors of the information access market, but omitting them underscores a lack of knowledge of what’s hot and what’s not.
Specialist vendors are having a significant impact in niche markets, and these vendors could make leaps into other segments as well. Examples that come to my mind are Attensity and Clearwell Systems.
New players are poised to disrupt existing information access markets. Examples range from Silobreaker (Stockholm) to companies such as Attivio and Connotate. In fact, there is an ecosystem of new and interesting approaches that have search and retrieval functions but are definitely distancing themselves from the train wreck that is “enterprise search”.

I urge you to read the Forrester report. Just be sure of your facts before you base your decision on a single firm’s analysis. There is a reason that a pecking order in consulting exists. At the top are Booz, Allen & Hamilton, Boston Consulting Group, Bain, and McKinsey. Then there is a vast middle tier. Below the middle tier are firms that offer boutique services. Instead of accepting a firm’s view of the “top” or the “best”, make sure the advice you take comes from a firm that has a blue-chip recommendation.

The growing dissatisfaction with enterprise search can come back and bite hard.

Stephen Arnold, July 3, 2008

Written by Stephen E. Arnold · Filed Under Enterprise, News, Search, Text processing | Comments Off on Enterprise Search Top Vendors: But Who Is the Judge?

Text Analytics Summit Summary Sparks UIMA Thoughts

June 22, 2008

Seth Grimes posted a useful series of links about the Text Analytics Summit, held in Boston the week of June 16, 2008. You can read his take on the conference here. I was not at the conference. I was on the other side of the country at the Gilbane shindig. To make up for my non attendance, I have been reading about the summit.

From what I can deduce from the Web log posts, the conference attracted the Babe Ruths and Ty Cobbs of text analysis, a market that nestles between enterprise search and business intelligence. I am not too certain about the boundaries of either of these markets, but text analytics is polymorphic and can appear searchy or business intelligency depending upon the context.

I clicked through the links Mr. Grimes provides, and I recommend that you spend a few minutes with each of the presentations. I learned a great deal. Please, review his short essay.

One point stuck in my mind. The purpose of this essay is to call your attention to this comment and offer several observations about its implications for those who want to move beyond key word retrieval. Keep in mind that I am offering my opinion.

Here’s the comment. Mr. Grimes writes:

I’ll conclude with one disappointing surprise on the technical front, that UIMA — the Unstructured Information Management Architecture, an integration framework created by IBM and released several years ago as open source to the Apache — has not been more broadly accepted. IBM software architect Thomas Hampp spoke about his company’s use of the framework in the OmniFind Analytics edition, but Technology Panel participants said that their companies — Attensity (David Bean), Business Objects (Claire Thomas), Clarabridge (Justin Langseth), Jodange (Larry Levy), and SPSS (Olivier Jouve) — simply do not perceive user demand for the interoperability that UIMA can offer.

My understanding of this statement and the supporting evidence in the form of high profile industry executives is that an open standard developed by IBM has little, if any, market traction. In short, if the UIMA standard were gasoline, your automobile would not run or just sputter along.

Let us assume that this lack of UIMA demand is accurate. Now I know this is a big assumption, and I am confident that an IBM wizard will tell me that I am wrong. Nevertheless, I want to follow this assumption in the next part of the essay.

Possible Causes

[Please, keep in mind that I am offering my opinion in a free Web log. If you have not read the editorial policy for this Web log, click on the About link on any page of Beyond Search. Some readers forget that I am using this Web log as a journal and a container for the information that does not appear in my for fee reports and my paid writings such as my monthly column in KMWorld. Some folks are reading my musings and ignoring or forgetting what I am trying to capture for myself in these posts. Check out the disclaimer here.]

What might be causing the lack of interest in UIMA, which as you know is an open source framework to allow different software gizmos to talk to one another? For a more precise definition of UIMA, you can give the IBM search engine a whirl or click this Wikipedia link, http://en.wikipedia.org/wiki/UIMA.

Here is my short list of the causes for the UIMA excitement void. I am not annoyed with IBM. I own IBM servers, but I want to pick up Mr. Grimes’s statement and perform a thought experiment. If this type of writing troubles you, please, click away from Beyond Search. Also, I am reacting to a comment about IBM, but I want to use IBM as an example of any large company’s standards or open source initiative.

First, IBM is IBM. IBM has an obligation to its shareholders to deliver growth. Therefore, IBM’s promulgating a standard is, in some way large or small, a way to sell IBM products and services. Maybe potential UIMA users are not interested in the potential upsell that may follow.

Second, open source and standards have proven to be incredibly useful. Maybe IBM needs to put more effort into educating partners, vendors, and customers about UIMA? Maybe IBM has invested in UIMA and found that marketing did not produce the expected results, so IBM has moved on.

Third, maybe today IBM lacks clout in the search and content processing sector. In 1960, IBM could dictate what was hot and what was not. UIMA’s underwhelming penetration might be evidence that the IBM of today lacks the moxie the company enjoyed almost a half century ago.

A fourth possibility is that no one really wants to embrace UIMA. Enterprise software is not a level playing field. The vendor wants to own the customer, locking out any other vendor who might suck dollars from the company owning a customer. IBM and other enterprise vendors want to build walls, not create open doors.

I have several other thoughts on my list, but these four provide insight into my preliminary thinking.

Observations

Now let’s consider the implications of these four points, assuming, of course, that I am correct.

Big companies and standards do not blend as well as a peanut butter and jelly sandwich. The two ingredients may not yet be fully in harmony. Big companies want money and open standards do not have the revenue to risk ratio that makes financial officers comfortable.
Open source is hard to control. Vendors and buyers want control. Vendors want to control the technology. Buyers want to control risk. Open source may reduce the vendor’s control over a system and buyers lose control over the risk a particular open source system introduces into an enterprise.
Open source appeals to those willing to break with traditional information technology behavior. IBM, despite its sporty standards garb, is a traditional vendor selling traditional solutions. Open source is making headway, but it is most successful when youthful blood flows through the enterprise. Maybe UIMA needs more time for the old cows to leave the stock pen?

What is your view? Is your organization ready to embrace UIMA, big company standards, and open source? Agree? Disagree? Let me know.

Stephen Arnold, June 22, 2008

Written by Stephen E. Arnold · Filed Under Enterprise, Feature, Search, Semantic, Text processing | 1 Comment

Business Intelligence: Turmoil and Change Loom

June 18, 2008

Fern Halper, a member of the Hurwitz & Associates team, wrote “Text Analytics and the Predictive Enterprise” on June 13, 2008. The story appeared on IT Analyses, and I just saw it.

Ms. Halper makes two points about text analysis. She is talking about analytics vendor SPSS, but her comments apply across the business intelligence spectrum.

First, she makes it clear that text contributes to business intelligence. Structured data and text yield useful insights. The idea is that mining both is more meaningful.

Second, she asserts that analysis of Web logs and other social information can add value to traditional business intelligence activities.

SPSS, SAS Institute, Business Objects (now part of SAP), Clarabridge, and other vendors share somewhat similar views.

My hunch is that market friction is going to become more evident as IBM, Microsoft, and Oracle increase their analytics efforts. Business intelligence, like search, is moving downmarket and to some extent becoming a utility function.

My research into frustration with enterprise search shined a light into a formerly dark corner of an increasingly important function. Business intelligence also has annoyed users with its complexity, hard-to-understand reports, and lack of “average manager” interfaces.

What’s this mean?

My thought is that head-to-head competition will increase. Business intelligence vendors will find themselves pressured to keep their clients from drifting toward analytics solutions bundled with other enterprise applications from the likes of IBM, Microsoft, and Oracle. In addition, traditional business intelligence vendors have to figure out how to keep newcomers like Attensity (deep extraction) and Aster Data (data management) from making sales in organizations where there once was a traditional business intelligence monopoly.

For many years, competition between the SAS Institute and SPSS was governed by the type of rules that once governed duels with pistols. Business Objects brought more Madison Avenue sizzle to business intelligence. Now lines are blurring between high-end, specialist business intelligence and what I call “baked in BI” from IBM, Microsoft, and Oracle. Add to this the upstarts arriving with zippier technology and a hunger for making sales. The result is an uptick in competitiveness.

Companies today need to find ways to keep customers and squeeze meaning from available data. Search on its own does not deliver what an organization needs. Crunching numbers does not deliver. Text analytics does not deliver. Organizations need all three functions to be available and usable by the average manager.

With this problem getting more attention, a hybrid solution is needed. With a lucrative pay off for the company that cracks this problem, accelerating change is not just likely; significant disruption awaits us in business intelligence. Who could profit from this increased turmoil? I think Google may be a factor going forward. Hosted crunching, customers wanting ease of use, canned analytics and APIs, and social data: are these the ingredients for a new enterprise recipe from the GOOG?

Stephen Arnold, June 17, 2008

Written by Stephen E. Arnold · Filed Under Database, Enterprise, Google, News, Social, Text processing | 6 Comments

IBM Puts Document Classification Front and Center

May 30, 2008

IBM was thoughtful. I received a hot link to the new IBM white paper, Beyond Automation: Accelerating Processes with Classification. Before I clicked the link, I had a hunch that IBM’s Automated Classification Resource Center would feature OmniFind. I was right. IBM says:

The IBM Classification Module is a part of the IBM ECM portfolio and primarily targeted for use by IBM ECM customers. The previous version of the product (V8.3) was named IBM Classification Module for OmniFind Discovery Edition.

I had a bit of trouble figuring out what product name was current. In fact, depending on which IBM link you select, you get directed to catalog pages, FileNet document management pages, or OmniFind pages.

To get the Flash video, navigate to http://www.filenetinfo.com/mk/submit/classifyop?_JS=true&sor=CatalystEmail3. You will need to fill out the form if you use this link.

If you find the classification module via the IBM search box, you get pointed to http://www-306.ibm.com/software/data/enterprise-search/classification/. I didn’t have to register again, but your mileage may vary.

In my review of the classification module, I didn’t see any notable changes since I last reviewed the product. My thought is that IBM is in the midst of a marketing campaign for document classification for people who have a copy of OmniFind, IBM’s search system.

I took a closer look, and I found the new thing. The previous version of this product was named IBM Classification Module for OmniFind Discovery Edition. The current version of this product is named IBM Classification Module. The module is $1,500, a bargain when compared to the deep extraction system available from Attensity. Some investigation reveals that you need to have a license for other IBM content components. If you want IBM to install or customize the system, you can run up an interesting bill.

I recall pointing out in the first three editions of Enterprise Search Report that IBM software requires a bit of work to research. In fact, product naming and interdependencies are such that you may not be able to do this job alone. Don’t worry. IBM’s professional services are just a phone call away and available from IBM offices worldwide.

Oh, when you do a search for the “new” product name IBM Classification Module, the top rated link is to an order form that is different from the one displayed by navigating to the main product page. The first page of the 3,622 results did not point to the new product page.

It’s a job for OmniFind Discovery Edition all right.

Stephen Arnold, May 30, 2008

Written by Stephen E. Arnold · Filed Under Enterprise, News, Text processing | Comments Off on IBM Puts Document Classification Front and Center

Government High-Tech Investments: IN-Q-TEL

May 26, 2008

I received an email from a colleague new to the Federal sector. Her email included comments and links about US government funding of high technology companies. I was surprised because I assumed that most people knew of the IN-Q-TEL organization. As US government urls go, IN-Q-TEL’s will baffle some people. First, the hyphens throw off some folks. Second, the group’s use of the Dot Org domain throws off others.

In a nutshell, IN-Q-TEL makes clear what it does and why:

IN-Q-TEL identifies, adapts, and delivers innovative technology solutions to support the missions of the Central Intelligence Agency and the broader US intelligence community.

I’m not interested in whether IN-Q-TEL is doing a great job or a lousy job. I’m not concerned about its mission, its funding, or its management team.

What I find fascinating is the organization’s choice of companies in which to invest. I don’t know the budget range of IN-Q-TEL, but my sources tell me that the investments stick close to $1 million, sometimes more, sometimes less. You can read more about IN-Q-TEL at these links:

The Wikipedia entry, and I am not vouching for the accuracy of this entry
The CIA’s own description here
KMWorld’s write up here. (I am a paid columnist for KMWorld, but I did not contribute to this story.)

The purpose of this feature is to provide a snapshot of the companies in which IN-Q-TEL has invested. I’ve identified more than 70 companies. This is too many to put in one posting, so I will break up the list and cover the period 2000 to 2003 here and do each subsequent year in additional Beyond Search postings.

In the period from 2000 to 2003, IN-Q-TEL invested in 25 companies. Keep in mind that I may have overlooked some in my research. If you know of a company I missed, please, use the comment section of this Web log to update my information. These appear in the table below:

Written by Stephen E. Arnold · Filed Under Feature, Online (general) | 4 Comments

Enterprise Search Vendors’ Taglines

May 16, 2008

A colleague in San Francisco asked me on May 14, 2008, “How do the search engine vendors position themselves?”

I told him that I would think about the question on the luxurious red-eye flight from SFO to Detroit. I did. I worked through the files on my trusty laptop and compiled a list of the taglines for some of the vendors whom I monitor. The list is not exhaustive, but I had data about a couple of dozen companies in the behind-the-firewall search business.

The table below provides a summary of the taglines. These are quite interesting, and I was surprised at the different approaches taken to explaining the companies’ systems. For example, I liked the taglines that echoed Caesar’s I came, I saw, I conquered (Veni, vidi, vici). SchemaLogic says, “Find. Use. Protect.” Thetus asserts, “Find. Assess. Fit. Understand.” Lexalytics crafts, “Discover. Understand. Act.”

Several of the companies use active or instrumental catchphrases. Brainware, a spin out from a German content management company, uses, “Intelligence unleashed.” I thought of a tiger pursuing me through the Louisville Zoo. And InQuira says, “Harvest knowledge.” Nstein, a company that has undergone accelerated evolution,

Less creative influences put a damper on marketing passion in these slogans. Panoptic (now Funnelback) gently offers, “Internet and Enterprise Search.” Almost matching the Australian company’s tagline is Fast Search & Transfer’s “The business of search.” Clearforest matches these in understatement with its “Text Analytics Solutions.” ZyLAB comes close too, saying, “Information Access Solutions.”

Other companies use the tagline as an elevator speech on a diet. For example, Endeca, flush with investments from Intel and SAP, states, “Innovative Software to Help People Explore, Analyze, and Understand Information.” Not to be outdone in the pitch department is ISYS Search Software’s “Enterprise Search Solutions for Real People Doing Business in the Real World.” (I like the “real” part of this statement because some of the taglines are a bit abstract.) Stratify (formerly Purple Yogi) strikes a Zen-like note: “Focus on the Matter of eDiscovery with Peace of Mind.” When I repeat this five times, my heart rate slows and my blood pressure drops.

Other vendors assert, in a nice way, of course, that their system is Numero Uno in the search-and-retrieval sector. Open Text, a company with as many search technologies as Microsoft, declares itself “The Content Experts.” And Dieselpoint opines, “The Leader in Search & Navigation Technology.”

A small number of vendors drift into the poetic. Exegy uses repetition and alliteration to explain its super-fast appliance: “Extreme Speed. Extreme Insight.” Or consider SurfRay (owner of Mondosoft and Speed of Mind) and its rhythmic “We Move People to Discover.” Note that SurfRay itself, a relative newcomer to search, describes itself this way: “Pioneers in Enterprise Search and Behavior Analytics.” Strong stuff and sure to catch the attention of Autonomy working overtime to catch up with the “Don’t be evil” Googlers.

Written by Stephen E. Arnold · Filed Under Enterprise, Feature, Search, Text processing | 7 Comments

New Contract for Clarabridge

May 15, 2008

Clarabridge, a “customer experience management vendor,” recently scored a posh client in Gaylord Hotels, who wants to utilize text analysis to review customer satisfaction surveys. Keeping millionaires happy requires technology.

The Clarabridge contract will install its content mining platform at Gaylord properties. The goal: to relate textual commentary to a satisfaction scale. Clarabridge’s product dumps extracted, unstructured data into a star schema to make associated fact tables, just like progenitor-once-removed MicroStrategy, the business intelligence company that passed on its reporting, analysis, and monitoring solutions DNA.
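For readers who have not worked with a star schema, here is a minimal sketch of the idea using Python's built-in sqlite3 module. The table names, column names, and sample scores are hypothetical placeholders invented for illustration. They are not Clarabridge's actual schema or Gaylord's data. The sketch assumes a text analysis step has already tagged each comment with a topic.

```python
import sqlite3


def build_star_schema(conn):
    """Create two dimension tables and one fact table (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE dim_property (property_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_topic (topic_id INTEGER PRIMARY KEY, label TEXT);
        -- Each fact row ties one extracted comment to its dimensions
        -- and carries the survey's satisfaction score as the measure.
        CREATE TABLE fact_comment (
            comment_id INTEGER PRIMARY KEY,
            property_id INTEGER REFERENCES dim_property(property_id),
            topic_id INTEGER REFERENCES dim_topic(topic_id),
            satisfaction INTEGER
        );
        """
    )


def load_sample(conn):
    """Insert placeholder rows; the scores are made up for the example."""
    conn.executemany(
        "INSERT INTO dim_property VALUES (?, ?)",
        [(1, "Property A"), (2, "Property B")],
    )
    conn.executemany(
        "INSERT INTO dim_topic VALUES (?, ?)",
        [(1, "check-in"), (2, "dining")],
    )
    conn.executemany(
        "INSERT INTO fact_comment VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 2), (2, 1, 2, 5), (3, 2, 1, 4), (4, 2, 2, 4)],
    )


def average_satisfaction_by_topic(conn):
    """Join the fact table to the topic dimension and average the scores."""
    rows = conn.execute(
        """
        SELECT t.label, AVG(f.satisfaction)
        FROM fact_comment AS f
        JOIN dim_topic AS t ON f.topic_id = t.topic_id
        GROUP BY t.label
        ORDER BY t.label
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    load_sample(connection)
    print(average_satisfaction_by_topic(connection))
```

The point of the layout is that once free-text comments are reduced to rows in a fact table, ordinary business intelligence queries can relate topics to satisfaction scores.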

Clarabridge has a client list that includes big names such as Marriott, The Gap, and H&R Block, making it quite unlikely that it will suffer a stock crash like MicroStrategy did ($333 to $1 – ouch!) in 2001. Some pundits assert that Clarabridge is a company that will challenge Attensity (www.attensity.com), a low-profile, fast-growing text analytics company headed by David Bean.

Gaylord, owner and operator of four vast and lavish resort hotel properties, receives tens of thousands of guest commentaries in a Web-based survey covering its Opryland (Nashville, Tenn.), Palms (Orlando, Fla.), Texan (Dallas/Fort Worth), and National (Washington, D.C./Maryland) properties. While polled information is fairly straightforward, the information gained in the “other comments” box at the end of a survey is expensive and difficult for humans to quantify and make useful. Clarabridge’s platform will change all that.

At Clarabridge’s web site, you can download their white papers, case studies, industry resources and more.

Jessica Bratcher, May 15, 2008

Written by Stephen E. Arnold · Filed Under News, Search, Text processing | 2 Comments

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
