Cognition’s Semantic Map

September 22, 2008

I profiled Cognition Technologies in my April 2008 “Beyond Search” report for the Gilbane Group here. I can’t reproduce the profile in my Web log, but you can find out about Cognition by reading the information on the company’s Web site. My take on the firm was that it was working to tame the semantic beast that is prowling around many procurement team meetings. The company has released a knowledge base that “teaches computers the meanings behind words.” You can read more about the semantic map in the RawStory.com article “Computers Figuring Out What Words Mean” here. Cognition has, according to RawStory, licensed the map to LexisNexis, one of the early entrants in online for-fee content access. If you are in the market for a semantic map, check out Cognition’s new offering. My view of semantic technology is that Google seems to be ideally positioned to become the Semantic Web. I provided details behind this assertion in the 2007 report I did for Bear Stearns before it went down in flames earlier this year. Google has quite a few of its Googley souls laboring in the semantic vineyard. As a result, the semantic efforts of smaller companies and larger outfits like Microsoft have to make significant progress and fast. Cognition’s Web site is here.

Stephen Arnold, September 22, 2008

Business Intelligence: Getting Smarter in a Class with Some Lousy Students

September 22, 2008

Business intelligence sounds more uptown than search. Analytics resonates with quantitative goodness. Most employees look back on their classes in mathematics with a combination of nostalgia as in “I wish I would have taken more math” and horror as in “I hated Miss Blackburn’s algebra class”. I did a job for a major university to answer the question, “Can we be number one in computer science?” The answer was, “No.” There were not many math majors who planned on working in the US once the sheepskins were handed out. It’s tough to rise to the top when your future endowment funding sources are working in Wuhan or Mumbai. Loyalties and money may go to the local high school where the math wizards’ genius was first recognized and cultivated.

I find it amusing that search vendors are rushing to become players in the business intelligence arena. Now established business intelligence companies are encouraging the running of the bull-oney. SPSS, SAS, Cognos, and Business Objects have learned to love text because their customers demanded that structured and unstructured data be mined for insights. Comments on warranty cards, in emails, or in voice calls to a help desk do yield useful information. Some companies learn what customers loathe and then don’t fix the problem. Called your mobile provider lately? How about your bank, assuming it’s still in business? See what I mean.

When I read a good analysis of how business intelligence vendors are getting smarter, I learn something about how the market perceives business intelligence. But I wonder why these analyses don’t dig into the deeper issues associated with vendors who reinvent themselves in order to make sales. I’m not sure the product innovation is of the same quality as the marketing collateral. In short, vendors talk a good game, but the delivery remains much the way it always has. Math and programming people have to be taught the system. The business intelligence system is then set up with rules spelled out. The biggest change is that the traditional method is too expensive, so companies want short cuts to business intelligence goodness. Enter the search and content processing vendor. The idea is simple: index content and convert a user’s query to a form that generates a report. Now will the report have the same concern with the niceties and nuances of hand crafted statistical instructions operating on a well formed data cube? Maybe? But the new approaches are a heck of a lot easier, faster, and cheaper. Licensees are asked to conclude, “You get all three with our new system.”
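The “index content and convert a user’s query to a form that generates a report” idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not any vendor’s method: the sample comments, the `product` field, and the `build_index` and `report` function names are all hypothetical, made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: free-text customer comments tagged with a product.
COMMENTS = [
    {"product": "phone", "text": "Battery died after two weeks"},
    {"product": "phone", "text": "Great screen, weak battery"},
    {"product": "router", "text": "Setup was confusing"},
    {"product": "router", "text": "Battery backup failed"},
]


def build_index(records):
    """Map each lowercase term to the set of record positions containing it."""
    index = defaultdict(set)
    for pos, record in enumerate(records):
        for term in record["text"].lower().replace(",", " ").split():
            index[term].add(pos)
    return index


def report(query, records, index):
    """Turn a keyword query into hit counts per product (AND across terms)."""
    terms = query.lower().split()
    if not terms:
        return {}
    # Keep only the records that contain every query term.
    hits = set.intersection(*(index.get(term, set()) for term in terms))
    return dict(Counter(records[pos]["product"] for pos in hits))


index = build_index(COMMENTS)
print(report("battery", COMMENTS, index))
```

The sketch only counts matching comments per product. It does none of the statistical work of hand crafted instructions running on a well formed data cube, which is exactly the trade-off between easy and rigorous described above.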

Take a gander at the well written “Business Intelligence Gets Smart” published on September 5, 2008, by Intelligent Enterprise’s Doug Henschen here. You will have to put up with an annoying ad flop over, but the content is worth the annoyance. The key point of the write up is that business intelligence “improves business performance.” That matters because most search and content processing systems don’t generate a hard return on investment. Business intelligence, according to the InformationWeek Research Business Intelligence Survey cited by Mr. Henschen, does. That’s good news, and it encourages vendors with non-ROI systems to repackage these products as bottom line centric solutions. For me, the most important parts of this write up were the charts and graphs. Mr. Henschen does a good job of pulling together the numbers that help put business intelligence in context.

I would like to offer several observations and, of course, invite comment:

  1. Business intelligence remains a complicated area, and it does not lend itself to facile solutions.
  2. Most business intelligence systems require that content be transformed, then processed, and finally analyzed. If the content processing goes off track, the fix can be time consuming and expensive. BI systems, like search and content processing systems, can experience cost overruns because the assumptions about the source information were wrong or shallow.
  3. Business intelligence, even when implemented with some of the search centric solutions on the market like Endeca’s Latitude, requires a math or programming wizard to configure the system.

Quite a few search and text analytics companies are asserting that “we do business intelligence”. The statement is both true and false. In order to avoid coming down on the false side of the statement, short cuts should be avoided. Implementing business intelligence is similar to Miss Blackburn’s algebra class. It’s demanding, a great deal of work, and usually disliked by those without the appetite or the aptitude for the tasks.

Stephen Arnold, September 22, 2008

Autonomy: Compliance Initiative

September 22, 2008

Autonomy bought Zantaz in July 2007 for $375 million. The company continues to enrich its compliance line of services. For example, Autonomy has been quick to roll out services that need information management, search, and content processing. Examples include the firm’s Zantaz bundle described here in April 2008, and its recent compliance with the UK’s FSA Conduct of Business Sourcebook (COBS) requirements. Competitors in the search, content processing, and records management markets will want to pay close attention to what Autonomy is doing. I’ve been convinced for several years that Autonomy is one of the quickest reacting search vendors. New opportunities appear in Autonomy’s marketing collateral and news releases with greater precision than in the mid range consultants’ reports about industry trends. Autonomy has a nose for trends and beats many of its competitors to these markets.

As I was thinking about Autonomy, I recalled an article that appeared in Silicon Valley Watcher in April 2008. I was able to locate a copy of that article here. Written by Tom Foremski, the write up had the zippy title “A Policeman Inside Your Computer and Inside Your Corporate Blog. Autonomy Releases Software that Flags Illegal Communications and Other Corporate Content.” For me, the most interesting comment in the article was:

There are some good and bad aspects to this software. The bad is a big brother type use for it…It could be used to restrict blogging. A lot of people tell me that large corporations are scared of blogs violating a regulation and so every corporate blog entry has to be run through lawyers– it has to be “lawyered.” This can take time, days, even weeks. Paradoxically, I think AIG could be used to clear a blog post in real-time and could thus increase the amount of good, legal information that company workers can share in public. Either way, it automates some of the tasks of a lawyer…. Less lawyering, means lower operating costs, which maximize share holder value, and that’s what corporate officers are required to do.

With the great concern about Google I heard in my various meetings in Europe last week, I was surprised that most of those Google critics were blissfully ignorant of vendors such as Autonomy who have robust tools for monitoring available and in use. I suppose the difference is that an organization can monitor in order to comply with regulations. In the next month or so, I want to profile some of the companies with content monitoring systems. I will pick a handful of representative companies. Google’s not the only game in town, not by a long shot.

Stephen Arnold, September 22, 2008

Virtual Servers: It Is Recrawl and Reindex Time

September 22, 2008

The malarkey about virtualization has many information technology professionals courting chimeras. Some virtualization is good. For example, we have a couple of quad core, four gigabyte servers that are four to five times faster on our benchmark tests than the aged NetFinity 5500s we retired. The new servers have the moxie to run virtualization software. No problems so far. In fact, chopping boxes into separate virtual servers makes sense and is tame compared to some of the technologies that arrive at our office door.

Virtual storage, however, is another kettle of fish. Our experience has been that directory structures such as those spawned by SharePoint and certain enterprise applications are complicated. When these complex structures are mixed with virtual storage, we have encountered some excitement. We test software, so our trashed files provide us with useful data, not long weekends and sleepless nights.

InfoWorld on September 19, 2008, here called attention to some of the issues virtual storage drags along with the snappy marketing messages and rah rahs for cheaper administration. “Virtual Server Backups Prone to Failure, Survey Finds” makes clear that virtual solutions are not without some problems. The InfoWorld write up reports on a survey that asserts more than half the virtual server backups don’t restore. The article has some other data but I want to focus only on the backups not restoring.

Here’s the problem. Search is a storage intensive application. The indexes can be big. If an index doesn’t start out big, in a matter of months the index gets big. Logs get big. When a search or content processing system crashes or an index update corrupts the master index, an administrator turns to the backup system. If the search system is using a whizzy new virtual storage system, the backup may not restore. The problem is that rebuilding the index is not always a five minute or even a five hour job.

Recrawling and reindexing can be tricky. Systems that perform significant content processing can crunch for a day, maybe more, generating metadata. Our suggestion is to skip virtual storage for search and content processing systems. Already have one? You may want to devirtualize and quickly.
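One way to learn whether an index backup restores before a crash forces the question is to compare checksums of the live index files with a test restore. Here is a minimal Python sketch, assuming the index is a directory of ordinary files. The `file_digest`, `manifest`, and `verify_restore` names are hypothetical and not part of any search vendor’s tooling.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so big index files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root):
    """Return {relative path: digest} for every file under root."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_restore(original_root, restored_root):
    """List files that are missing from, or differ in, the restored copy."""
    before = manifest(original_root)
    after = manifest(restored_root)
    return sorted(
        name for name, digest in before.items() if after.get(name) != digest
    )
```

An empty list from `verify_restore` means every file in the original came back byte for byte. The check has to run against a quiescent index, because a live index that keeps updating will differ from any backup.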

Stephen Arnold, September 22, 2008

Microsoft Yahoo: Woulda, Coulda, Shoulda

September 21, 2008

Hindsight is 20/20. Actually hindsight is what college professors do. I’m no academic, but I write “woulda, coulda, shoulda” reports now and then. These are easy to do. The facts are right there, and I can almost always develop a nifty timeline. With a bit of math magic, I can output “what if” results. I have even sold some of these spreadsheet fever reports to investment banks and rich people with more money than the average Roman patrician who is a pal of Augustus Caesar.

I thought about “woulda, coulda, shoulda” when I read the well written essay “Yahoo Should Have Sold Search to Microsoft”. You can read Henry Blodget’s analysis here. I liked Mr. Blodget’s argument. My thoughts shifted away from “woulda, coulda, shoulda” to “it is what it is”. I snagged a notepad and jotted down three “it is what it is” points.

First, the Yahoo search is a clunker in my opinion. Now I know there are some people who do work for me who love Yahoo search. I don’t like it at all. When the Cluuz.com front end is slapped on Yahoo, I like Yahoo a lot better. Why? Yahoo does not deliver useful results without including some clunkers in each hit list. Whatever Cluuz.com is doing, Yahoo becomes more useful to me. Why doesn’t Yahoo improve its search? The company is trying to do too many things. No focus translates to search results that I avoid. Maybe I’m wrong, but when a tiny Canadian company can make Yahoo a lot better, I think the problem resides within the Yahoo search team. Bad management plus an aging search engine aren’t going to close the gap between Yahoo and Google. If Microsoft bought another clunker, it is what it is–a clunker.

Second, Microsoft is not making progress despite the reorganizations, the acquisitions, and the pay for traffic ploys. Why? Users have had a decade to get used to Google’s being “good enough”. Some people think Google’s great. I don’t. Much of Google’s success comes from having competitors who don’t know what to do to leapfrog Google. Instead of going for the jugular with privacy and usage tracking as the business end of a marketing sword, Microsoft has too many chiefs or cooks. Whatever these managers are, they are not able to respond to Google after years of trying. So if Microsoft buys Yahoo search what difference will it make? Not much. Google touches about two thirds of the searches in North America. Buying the aging Yahoo will be like the purchase of Fast Search, Powerset, and Ciao.com–too little too late. In short, it is what it is.

Third, I track 52 search vendors. I have a list of 300 or so companies competing in search and content processing. Know how many can compete with Google? Two. Why doesn’t Yahoo buy one of these outfits? Why doesn’t Microsoft? The answer is that neither company takes time from their busy meeting filled days to sit down and think about Google’s vulnerabilities, which technologies are able to out Google Google in some key area, and do a head-to-head analysis that considers significant issues, not older stuff like Fast Search’s number of customers or Yahoo’s nifty banner advertising system.

The “it is what it is” analysis is sorely needed at a number of companies, not just Microsoft or Yahoo. Google has been running free for a decade. Buying yesterday’s notions of great technology won’t do the job now or in the next six to nine months.

I agree with Mr. Blodget’s analysis, but I prefer the “it is what it is” approach. Getting real about Google is the first step toward a search response to Google.

Stephen Arnold, September 21, 2008

Google Yahoo: A Contrarian’s View

September 21, 2008

In high school, I would get into trouble by asking, “What if we look at this idea from a different point of view?” My high school teachers were kindly but not too eager to listen to a question and then a suggestion that their world view was out of kilter. I am not sure why I developed this habit of mind, but when I got to my first real job at Halliburton (Nuclear Utility Services), I discovered that the nuclear physicists and mathematicians that made up 80 percent of the unit liked my approach. Instead of ignoring me or putting my desk in the hall as my high school teacher Miss Sperling did, these guys and gals would light up like white LEDs and dig in, intellectually speaking.

After reading Randall Stross’s analysis here, I felt he was on the right track for 180 degree thinking, but he was hitting the snow covered peaks, ignoring the basalt layers on which his big idea rests. Then I read with enjoyment Michael Arrington’s “Why the Google Yahoo Ad Deal Is Something to Fear.” You can read that essay here. Not only did I enjoy his writing, several of his points resonated with me. Nevertheless, my contrarian approach levered both of these astute gentlemen’s comments into several ideas that are rotated a few degrees from each man’s position.

First, the barn is on fire. It’s burning fast. The horses are gone. The hay is burning fiercely and the Harrod’s Creek fire engine aided by fire engines from elsewhere can’t douse the flames. So the fire fighting professionals hose some bushes, squirt water on the roof of an adjoining building, and watch the barn burn. My view is that Google was ignored in the period from 1995 to 1998 when Messrs. Brin and Page were fooling around with BackRub. Then in the period from 1998 to 2004, some smart money urged the Googlers and their small cohort of former DEC / AltaVista.com, Bell Labs, and Sun Microsystems colleagues forward. When the IPO loomed, Google settled with Yahoo for about a billion dollars. Yahoo realized that Google had learned from early GoTo.com, Overture.com, and other ad efforts. Instead of reinventing the stone wheel, Google had vulcanized a Michelin radial. Clumsy metaphor but if you have a stone wheel and your friendly competitor has Michelins you have limited choices. Yahoo sued and elected to keep using stone wheels. The result is that Yahoo has the kind of choice that BF Goodrich gives NASCAR teams. Use our tires or don’t race. Works in auto racing, and it is working in online advertising.


Who wants to stand in front of this and slow down this bullet train?

Second, Google really doesn’t “sell” advertising. Like the local utility monopoly or the local water company, you can sign up for power and water or spend your money drilling a geothermal hole, erecting solar panels, and buying Evian by the truck load. Google is a service, and if the users and the advertisers did not want to make whoopie, there’s not much Google can do about it. In fact, legislating that a company dependent on Google traffic can no longer advertise is probably one of those remarkable opportunities to explore the law of unintended consequences in detail. I don’t know about you, but I have yet to meet a government panel or regulatory committee that has a solid grasp that Google is a giant digital computer. Ad matching and users searching are just applications. If Google removes these functions, developers can use Google’s APIs to build their own systems and Google can charge a fee and take a piece of the action. The result? No changes and maybe even more money for Google because there is a great deal of interest in tapping into Google traffic.


Autonomy Tweaks French Vendors Again

September 21, 2008

When I was in Europe last week, I learned that Autonomy gobbled another juicy cerise from the mouths of French software vendors. Autonomy must be half French so keen is its sensitivity to the French market. This deal is for Autonomy to index France 24’s video archives. These archives at present contain more than 50,000 rich media files, consisting of the channel’s daily broadcasts in French, Arabic and English, as well as other purchased programs, and are added to daily. You can read more about this important financial cerise here.

Stephen Arnold, September 21, 2008


Solr: Useful Introduction

September 21, 2008

A happy quack to the person in the Netherlands who alerted me to this Softpedia write up about Solr. Solr is a Lucene-based enterprise search server written in Java. The description is here. The entry provides a comprehensive list of Solr features. Missing, however, is a link to download the system on all platforms. You can find Solr here.

Stephen Arnold, September 21, 2008

SharePoint: Picture Perfect Search

September 21, 2008

You have SharePoint search configured, optimized, and humming like a top. You have scaled up and out. Now you are ready for the next level in SharePoint. You are on the starting line for image search. For a useful guide to implementing image search within SharePoint, you will want to read and save Matthew McDermott’s “SharePoint Image Search.” This is a four part series, and you will need all four parts to round out your knowledge of the operation. You can retrieve the series here.

For me, the most useful part of the series was Mr. McDermott’s discussion of the procedures required to index images. Tips include finding and installing iFilters and then troubleshooting the iFilters. What sticks in my mind is that multiple crawls and index inspection are necessary in order to find out exactly what has been retrieved and processed. Multiple crawls are trivial and quick on small SharePoint installations. When the SharePoint installation sprawls over hundreds or thousands of servers, the recrawls are non trivial. The procedure for mapping crawled properties to managed properties is a must save bit of explanation. Part 4’s sample code is just as important. In fact, without this write up, the likelihood of a mere mortal getting SharePoint to deliver images in search results is pretty close to zero.

Three thoughts:

  1. This is a lot of work, particularly for large SharePoint installations. I personally would not go through these procedures and the recrawls, manual inspection, and code twiddling. Third party vendors deliver image results without this hassle.
  2. It’s clear that anyone with the programming knowledge, patience, and SharePoint bug can hack the system to perform some clever search operations. In my experience, large SharePoint installations and hacking are mutually exclusive. A glitch can be expensive to locate and remediate.
  3. Microsoft should add image search to its SharePoint service. The omission is egregious. I also want to locate other file types without the hoop jumping.

Mr. McDermott deserves a happy quack. Microsoft earns a goose gift for not building this function into the system.

Stephen Arnold, September 21, 2008

Oh, My. Google Personal News

September 21, 2008

Newspapers worldwide no longer ignore Google. Nope. The “kill more trees” crowd sees the LCD message. The GOOG does news. If you have not explored Google’s personal news service, here’s the URL: http://news.google.com.my/news

The Star Online here has a good summary of what the service delivers to users worldwide. The Star is published in Malaysia. Users in more than 48 countries can use the service in the country’s native language. If you find a story you can’t read, you can use Google Translate to sort out the meaning here.

After some clicking you can configure a nifty summary of what’s happened in the last 36 hours. Some headlines turn up more quickly, but for the personalized topics that I track, Google lags me by about eight hours. Your mileage may vary.

There are no advertisements on the Personal News page that I could spot. Even Google seems reluctant to jab more digital lances into the media bulls’ necks. What’s interesting is that I can replicate most of the Google functionality with other free services. What sets the GOOG’s service apart is the easy to use configuration tool and the speed with which headlines, images, and snippets render on my cheap laptop in Utrecht via a snagged, open WiFi signal.

Will the global media titans be able to stop Googzilla? In my opinion, the media titans are about 10 years too late and a Googleplex of technology savvy short. But, just for goose fun, let’s assume that the newspaper titans get this Google News “my” service turned off. Here’s a scenario for you:

Google offers to share revenue for stories posted by freelance journalists, retired journalists, or Web log people whom Google certifies. Slap a few ad slots on the “my” page and call it a day.

I know many people love Yahoo News. I have a personal Yahoo news page, and I find headlines that don’t update, weird configuration tools that don’t give me control, and a content selection function that makes me do too much work. The standard news page features weird pop ups, which I dislike, and the tab that I have to select to see stories from services that are not featured. In the last year, I have learned to put up with Yahoo and love newsreaders. Now “my” Google News is flirting with me.

In this scenario, three constituencies may have some trouble:

  1. The media titans are in for a long slog through a revenue Sahara
  2. Web 2.0 newsreader providers may have to do some additional work
  3. Yahoo, long number one in Web news, may face some competition

Check out the “my” service. I used to work for a traditional newspaper which was purchased by a global media titan. After watching the daily news hole shrink, talented journalists fired, and the paper chopped down to the size of a legal pad, I must say, “Well, dudes, you are in a bit of a pickle now.” Chuckle. Chuckle.

Stephen Arnold, September 22, 2008
