Convera: Vertical Search Is a Slow Climb

July 29, 2008

In 2000, Convera was one of the big dogs in the enterprise search engine game. The company showed promise. Then the company hit a rough patch, losing deals with Intel and the NBA. More information about this 2001 business shift is here. The company reinvented itself by selling its enterprise search unit. Autonomy nabbed a small chunk. Fast Search & Transfer gobbled the balance. The streamlined Convera emerged as a company specializing in indexing selected sets of Web sites. Convera describes itself as a vertical search engine company. A person with a list of URLs can create a vertical search engine for free using Google's Custom Search Engine. You can read about it here. I'm doing most of this math in my head, so if you find an error, please use the comments section to set me straight. I am a bit rushed after a weekend in the hospital watching my mom sleep.

How is Convera doing? The answer can be found in the company's financial results for the period ending April 30, 2008.

  • Revenue from continuing operations for the first quarter of fiscal 2009 increased 44% to $402,000, up from the $280,000 in revenue recorded in the fourth quarter of fiscal 2008.
  • Backlog grew from $4.0 million at January 31, 2008, to $4.7 million at April 30, 2008. These backlog balances represent future revenues stemming from the contractual minimum revenue commitment amounts from customers.
  • As of April 30, 2008, a total of 45 Excalibur-supported vertical search sites from 25 different publishers had been commercially launched, up from 39 vertical search sites from 24 publishers at January 31, 2008.
  • Search traffic activity from the Excalibur-supported vertical sites continued to grow, increasing 87% (to roughly 187% of the prior quarter's level) from 9.5 million searches in the fourth quarter of last year to 17.8 million in the first quarter ended April 30, 2008.
  • A total of 75 Excalibur-supported vertical search sites are under contract with customers; 47 of these sites have been commercially launched and 28 are in development. These contracted sites represent publications in over 15 major vertical industries. Convera is presently providing vertical search services to 30 different trade publishers.
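For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the percentage changes from the figures quoted above. The helper name pct_change is mine, not Convera's. It shows that revenue rose about 44%, as reported, and that going from 9.5 million to 17.8 million searches is an increase of about 87%; the new volume is roughly 187% of the old one.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from an old value to a new value."""
    return (new - old) / old * 100

# Figures quoted in Convera's results for the quarter ended April 30, 2008.
revenue_growth = pct_change(280_000, 402_000)   # dollars, Q4 FY2008 -> Q1 FY2009
backlog_growth = pct_change(4.0e6, 4.7e6)       # dollars, Jan 31 -> Apr 30, 2008
search_growth = pct_change(9.5e6, 17.8e6)       # searches per quarter

print(f"Revenue: +{revenue_growth:.1f}%")
print(f"Backlog: +{backlog_growth:.1f}%")
print(f"Searches: +{search_growth:.1f}% (new level = {17.8e6 / 9.5e6:.0%} of old)")
```

The distinction matters: "increased 187%" and "rose to 187% of" describe very different growth rates.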

More information about Convera's financials is available here. The company's historical financials are quite interesting, and I have not figured out what has happened to Convera's debt. The company lost $47 million in 2007 and is still losing money in 2008. Allen & Co. has a stake in the firm, and two Allens serve on the Convera Board of Directors.

The Outsell consulting firm, a group of true wizards in Burlingame, California, described Convera as "a rising star in 2008". I would not describe Convera's losses in 2007 and 2008 as the marks of a rising star. But I live in Harrods Creek, Kentucky, and watch my neighbors shoot squirrels with shotguns, so what do I know?

You can see the Convera system in action. Navigate to DwellingWell.com and enter the query "outdoor paint". Did you get results? I did not. The screen was blank. Zero results. I wish vendors would keep their demo sites current, don't you?

You can navigate to PureContemporarySearch here. Enter the query “wall coverings”. The result set appears below:

[Screenshot: PureContemporarySearch results for the query "wall coverings"]

The results seemed useful. The search page allows the user to narrow the query to editor-selected sites, the Web, and other categories. The results include suggestions, and more slices of content are available by clicking the tabs above the results display.


AOL: What about Relegence

July 27, 2008

I am no longer surprised by the deep cloud of unknowing that reduces visibility in pundits' reports. Someone (who shall remain nameless) sent me a report from a well known search "expert" (who shall also remain nameless) reporting on the changes at America Online.

AOL fumbled some of its opportunities in the last three years. Now the company is the analysts' darling because it is dumping services. I have a partner who is a former Naval officer. He remarked, "When you jettison stuff, you want to try and stay afloat." Not much more deep thinking is required to understand why AOL Pictures, BlueString, and an online backup service called Xdrive are history. None has much traction compared to Flickr, YouTube.com, and literally dozens of cloud-based backup services. Here's a link to a reasonably good summary of the received wisdom about AOL's actions.

The report I mentioned earlier talked about solid AOL services; namely, email, instant messaging, and chat rooms. The consultant generated a laundry list of other AOL services, which you can find here without paying a consultant to prepare a custom study for you.

One interesting service, which is named "Money", is reasonably useful. In fact, for some types of company research, one can argue that it is as good as either Yahoo Finance or Google's Yahoo Finance clone aptly named Google Finance. I heard that one of the female engineers who worked on Yahoo Finance jumped to Google to work on Google Finance, but that may be Silicon Valley chatter. There are some similarities.

AOL has muffed the bunny in promoting this service. First, navigate to http://money.aol.com. There are a number of point and click options. These range from changing the page layout to scanning through categories of information, blogs, headlines, and videos.

In November 2006, AOL acquired Relegence Corporation, a company originally started in Israel. This company, essentially unknown and untracked in the search and content processing world of New York punditry, developed technology that monitors, formats, and displays real-time content streams. As early as 1999, the company's approach presaged the Connotate system (now the object of much love from Goldman Sachs) and Exegy (in deep mind meld with the financial and military intelligence sectors).


This is the Relegence tool bar. Users have one click access to news and other features.

In 2006, I thought that the Relegence Corporation was a very solid real-time financial services news engine, providing market and business intelligence to global buy-side and sell-side institutions. The company had an R&D center in Israel. In 2004, Relegence hooked up with search vendor X1, but that tie-up has dropped off my radar. Relegence's automated infrastructure aggregated relevant structured and unstructured information from internal resources and external research, including blogs, Web sites, email, and thousands of third-party sources in English and other languages. Relegence could deliver customized news in real time to any communications device. When I looked at Relegence at the time of the AOL deal, I thought that Relegence was in a position to give InfoDesk, founded by former Bell Labs wizard Sterling Stites, a run for its money.


Google: Chubby and Paxos

July 26, 2008

The duo is not Cisco and Pancho or the Lone Ranger and Silver. Paxos is closer to leather biker gear and a Harley-Davidson belt buckle. The outfit gets some panache and the biker's pants stay properly slung. You may want to read the 16 pages of Googley goodness here. The paper is "Paxos Made Live - An Engineering Perspective." One of the interesting facts about this paper is that Tushar Chandra, one of its co-authors, has emerged as a spokesperson for Google. You can read my translation of some of his recent comments here.

In this brief essay, I want to identify three of the points discussed in this 2007 paper that are of particular interest to me. But before I highlight these points, I want to provide some context. Chubby is a mechanism to keep processes from acting like hungry kindergartners running to the milk and cookies. Chubby keeps order and gets the requests filled quickly without having two six-year-olds getting into a knock-down fight over a graham cracker.

First, Chubby is pretty nifty technology, representing a major advance over the file and record locking schemes used for Codd (relational) databases. When I mention this point to IBM DB2 or Oracle wizards, I am greeted with hoots of laughter. "Google has nothing we don't have and we have file and record locking schemes that are much better," I was told in May 2007 in the IBM booth at a major trade show. No problem. I believe IBM and Oracle. I just hope their customers believe them when Google reveals the efficiency of Chubby. You can learn more about Chubby in my 2005 The Google Legacy and my 2007 Google Version 2.0, or you can read this Google white paper. File and record locking for reads and writes is one of the hot spots in many database systems. Some companies turn cartwheels to figure out how to perform writes without screwing up read response time. Believe me, some of these outfits do Cirque du Soleil-type acrobatics to work around the database read/write problems.
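The read/write contention described above is easiest to see in a small example. The Python sketch below is a toy, in-process readers-writer lock: any number of readers may share it, but a writer needs it exclusively. It is not Chubby's code. Chubby, as Google describes it, is a distributed lock service whose locks can be held in shared or exclusive mode and whose state is replicated with Paxos; the class and method names here are purely illustrative.

```python
import threading

class ReadWriteLock:
    """Toy readers-writer lock: many concurrent readers OR one writer.

    Illustrative only. Chubby is a distributed, replicated lock service,
    not an in-process lock like this one.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0      # number of readers currently holding the lock
        self._writer = False   # True while a writer holds the lock

    def acquire_read(self) -> None:
        with self._cond:
            while self._writer:            # readers wait only for a writer
                self._cond.wait()
            self._readers += 1

    def release_read(self) -> None:
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()    # let a waiting writer try again

    def acquire_write(self) -> None:
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()          # writers need exclusive access
            self._writer = True

    def release_write(self) -> None:
        with self._cond:
            self._writer = False
            self._cond.notify_all()        # wake both readers and writers


# Usage: 100 threads each perform one guarded write.
lock = ReadWriteLock()
balance = {"value": 0}

def deposit(amount: int) -> None:
    lock.acquire_write()
    try:
        balance["value"] += amount
    finally:
        lock.release_write()

threads = [threading.Thread(target=deposit, args=(1,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(balance["value"])
```

This simple version lets readers in whenever no writer holds the lock, so a steady stream of reads can starve a waiting writer. That is one face of the balancing act between write throughput and read response time that database vendors work around.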

Second, Chubby is not new. When a Google technical paper appears, Google is not revealing a work in progress. My analysis of Google engineering papers and patent documents suggests a careful staging of each information release. When a paper appears, the technology is up, running, and locked in. A competitor learning about a Google innovation from a patent document or a Google technical paper is learning about something that is two to five years "old"; that is, the company has worked on a problem and figured out a bunch of possible solutions. The one solution that makes it into the Google production environment is a good one. When the Googlers talk about an innovation, the competitor who decides to respond is late out of the starting gate. Neither of my two Google studies contained "new" information. I was reporting what was ancient history for Googzilla.

Paxos

Now what’s a Paxos?

Paxos is not one thing. It is a collection of protocols that allow a system to adapt to failures. Google has lots of servers, so there are many failures. Chubby sits between the Google File System and Google's BigTable (a data management system, not a traditional relational database). Wikipedia can deliver some less than stellar information, but the write-up for Paxos struck me as reasonably good, and the information will get you anchored in the notion. The diagrams won't be of much use, but then the Google diagrams are almost as opaque. The reason is that the flow diagrams don't make much sense unless you have some experience with smart software in a failure-prone environment. Based on the style of writing and the type of diagrams in the Paxos write-up, my hunch is that a Google-grade brain contributed a thought or two to the Wikipedia entry. The external links reinforce my conclusion that this is a pretty reliable description of the flavors of Paxos. Of course, it's tough to determine which "flavor" or "flavors" are part of the Google library.


A typical Google performance table. Google compares its processes to themselves, not to commercial alternatives. These data suggest that Google is doing the work of a cluster of high-performance machines on a single commodity server. The key number is operations per second, which works out to 38,400 operations per second for 20 workers (clients). What's remarkable is that throughput is 3.6 times greater for the larger test database. In other words, as the data get bigger, the throughput goes up. © 2007 Google, Inc.

In my vastly simplified view, Paxos is one tiny cog in Google's library of smart algorithms. The algorithms crank mindlessly through a procedure writing values. Another process watches these values. When an anomaly becomes evident, the watching process "checks" with other processes and reaches a consensus about what action to take. It sounds really democratic and time-consuming. The method is neither. The consensus is not like a human vote. The "master" decision is made automatically once a majority of the processes accept the proposed value and report that acceptance back to the master.
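The majority rule in the paragraph above can be sketched in a few lines of Python. This is only the quorum check: it decides once a strict majority of replicas report accepting the same value, and it treats a failed replica as silent. Full Paxos also needs numbered proposals and a prepare/promise round so that two competing proposers can never get different values chosen; that machinery is omitted here, and the function name is hypothetical.

```python
from collections import Counter
from typing import List, Optional

def majority_decision(replies: List[Optional[str]], cluster_size: int) -> Optional[str]:
    """Return the value accepted by a strict majority of replicas, else None.

    `replies` holds one entry per replica that was asked; None means the
    replica failed or did not answer. This is only the quorum check, not Paxos.
    """
    quorum = cluster_size // 2 + 1                      # e.g. 3 of 5
    counts = Counter(r for r in replies if r is not None)
    for value, votes in counts.items():
        if votes >= quorum:
            return value
    return None                                         # no decision yet; retry

print(majority_decision(["v7", "v7", "v7", None, None], cluster_size=5))
print(majority_decision(["v7", "v7", None, None, None], cluster_size=5))
```

With five replicas the quorum is three, so a decision still goes through when two replicas have failed, but not when three have. Tolerating a failed minority is what makes the approach a fit for a fleet of failure-prone commodity servers.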

Keep in mind that this occurs in a massively parallel computing environment. These types of system-level processes occur with near-zero latency. This type of master-slave setup is a feature of other core Google processes; for example, the Google File System itself. I describe the advantages of Google's approach in The Google Legacy, and I will not repeat that information here. I think it is sufficient to point out that the approach has some very significant benefits, and most of Google's competitors are racing to duplicate functionality that Google has had in operation for at least eight years.


Taxonomies: 24 Karat or Fool's Gold

July 25, 2008

I have been bedeviled by taxonomies in the last two weeks. Vendors want to demo their systems. Clients want to find out how to make their taxonomies improve search. Even an entrepreneur showed up, gave me money, and outlined his taxonomy scheme for world domination.


Fancy tools are not needed to create a useful taxonomy.

Yikes!

The purpose of this feature is to provide some basic places to seek taxonomy lists, services, and functions. The list is not complete, and I will add to it over time.

  • Dow Jones Factiva. You can get librarians to give you a hand and license software too. Click here for traditional media’s taxonomy resource.
  • Interse. A Microsoft SharePoint-centric system. Click here.
  • SV Technologies, now part of Sydney Plus. Legal taxonomy. Click here.
  • Taxonomy Warehouse. This is the place to start. Click here to start your quest.
  • WAND. Software, services, and term lists based on business units. Click here.
  • Wordmap. The grandpappy of many whippersnappers' word lists. Click here.

In my April 2008 Beyond Search study, I provide in-depth analyses of the Access Innovations system and the SchemaLogic taxonomy management system. You can get information about the for-fee profiles here.

This is not a complete list. If you wish to add companies, please use the comment form for this Web log.

Stephen Arnold, July 25, 2008

Microsoft: What Now for Search?

July 24, 2008

Googzilla twitches its tail and Microsoft goes into convulsions. When I was in the management consulting game, my boss, Dr. William Sommers, talked about "hyper-actions". The idea was that a single event, even a minor one, would trigger excessive reactions.


Brain scan of a person undergoing excessive “excitement” and “over reaction”.

When I read the flows-like-water prose of Kara Swisher's "Microsoft's Latest Web Stumble: Kevin Johnson Out" and then her brief introduction to Mr. Steve Ballmer's "Full Memo to the Troops about New Reorg", I thought about Dr. Sommers's "hyper-action" neologism. In my opinion, we are watching a twitch in Mountain View trigger, via management string theory, convulsions in Redmond.

First, let me identify for you the points that jumped from screen to neurons in Ms. Swisher’s write ups.

  1. Ms. Swisher reports that Mr. Kevin Johnson was the architect behind the Yahoo buy out. I thought that the idea was cooked in Mr. Chris Liddell’s lamb-roasting pit. Obviously my sources were off base. Mr. Johnson moves to Juniper and Mr. Liddell continues to get a Microsoft paycheck. Mr. Liddell’s remarks at the March 2008 Morgan Stanley Technology Conference left me with the impression that he was being “systematic” in his analysis. Here’s one take on his remarks.
  2. Ms. Swisher's rundown of Microsoft's actions so far in 2008 is excellent, and she reminded me that Microsoft bought aQuantive, a fact which had slipped off my radar. What has happened to aQuantive, for which Microsoft paid $6 billion, more than it paid for Fast Search & Transfer and Powerset combined? Her mention of aQuantive reminded me of those wealthy car collectors on the Speed Channel's exotic automobile auctions. What do you do with a $1.2 million Corvette? You put it in a garage. You don't run down to the Speedway in Harrods Creek, Kentucky, to buy a pack of chewing tobacco.
  3. Ms. Swisher turns a great phrase; specifically, “Microsoft has succeeded in burnishing its image as a Web also-ran and still has an uncertain path to change that.” I quite like the notion that a large company takes one action and succeeds in producing an opposite reaction. I think the Google folks would peg that as one of the Laws of Google Dynamics applied to Microsoft. For every action, there is a greater, opposite reaction that persists through time. (Ms. Swisher’s statement that Yahoo looks stable brought a smile to my face as well.)

Next, let me comment on Mr. Steve Ballmer's reorg memo, which will be a classic in business schools for years to come. The opening line will probably read, "Mr. Steve Ballmer, firmly in control of Microsoft, sat at his desk and looked across the Microsoft campus. He knew a bold strategic action was needed to deal with the increasing threat of Google, etc. etc."

After the razzle dazzle about goals, the memo gets down to business:

We will out-innovate Google in key areas – we're already seeing this in our maps and news search. Third, we are going to reinvent the search category through user experience and business model innovation. We'll introduce new approaches that move beyond a white page with 10 blue links to provide customers with a customized view of their world. This is a long-term battle for our company – and it's one we'll continue to fight with persistence and tenacity.


Privacy Flash Point

July 23, 2008

When I speak with professional groups, I dance around the issue of “smart software”. The idea is that scripts do more than handle situations as a zero or one, white or black, on or off. The computers are binary, but programmers have numerous methods for helping a script deal with ambiguity.

One of the ways is to know what a single user, or a group of users who share characteristics, actually does. Looking at what a person does nine times out of ten makes it easy to tell a script, "When this person takes this action, you take that action."

The key to making this type of "smart software" work is data. The more data one has about an individual or a group of like-acting individuals, the easier it is to cook up simple rules. The script runs the actions. When a decision is needed, the script looks at the usage data and makes a decision.
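Here is one way to turn the "nine times out of ten" idea into code: a Python sketch that scans a hypothetical usage log of (situation, action) pairs and emits a rule only when one action dominates a situation. The threshold, the minimum sample size, and the situation and action labels are all assumptions made for illustration, not any vendor's method.

```python
from collections import Counter, defaultdict

def derive_rules(log, threshold=0.9, min_events=10):
    """Turn a usage log of (situation, action) pairs into simple rules.

    A rule "in situation S, do A" is emitted when A accounts for at least
    `threshold` of the observed actions in S (e.g. nine times out of ten).
    `min_events` guards against rules built on too little data.
    """
    by_situation = defaultdict(Counter)
    for situation, action in log:
        by_situation[situation][action] += 1

    rules = {}
    for situation, actions in by_situation.items():
        total = sum(actions.values())
        action, count = actions.most_common(1)[0]
        if total >= min_events and count / total >= threshold:
            rules[situation] = action
    return rules

# Hypothetical log: one user's observed behavior.
log = [("opens_invoice", "prints")] * 9 + [("opens_invoice", "emails")]
log += [("reads_news", "shares")] * 3     # too few events to trust
rules = derive_rules(log)
print(rules)
```

The invoice behavior becomes a rule; the news behavior does not, because three observations fall below the assumed minimum. More data means more situations clear that bar, which is the paragraph's point.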

Endeca can integrate saved queries into a workflow. When the salesperson reaches a particular point in a selling script, the Endeca system runs the query and displays the information based on a combination of rules and some data about what sells, what product returns the largest commission, or some other factor.

Again, the key is rules and data.

The rules are tedious to set up and test. Once they are in place, however, the real nourishment for smart software is data. Yet most users are unaware of the actions they take when using a computer. If I remind a user that email can be analyzed for syntactical fingerprints, friends, and insight into the preferences of the user, people are shocked. This amazes me.


Closed doors–that is, privacy–are tough to live behind in an online world.

I was thinking about this issue and privacy because the current issue of KMWorld, a tabloid published by Information Today, arrived via snail mail this afternoon. My monthly column was no more. In the July/August 2008 issue, my column had become a feature story, "Cloud Computing and the Issue of Privacy", pages 14, 15, and 22. The highlight of the story is a graphic from one of Google's patent documents showing an exemplary data model for usage information about an individual or a group of users. The idea is that when a person can be assigned to a cluster based on some discovered similarity, probability methods make it trivial to "predict" what most members of the group will next do or prefer. This is not magic, but it is complicated and requires a honking big computer to work when there are lots of people and many groups.
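The cluster-then-predict idea can be illustrated on a small scale. The Python sketch below assigns a user to the nearest of some already-discovered cluster centroids and then predicts the action most common among that cluster's members, reporting the share of members who took it as a rough probability. This is a generic nearest-centroid technique, not the data model in Google's patent document; the features, centroids, and actions are invented for illustration.

```python
import math
from collections import Counter

def nearest_cluster(user, centroids):
    """Index of the centroid closest to `user` by Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(user, centroids[i]))

def predict_next(user, centroids, next_actions_by_cluster):
    """Predict the user's next action as the most common one in their cluster.

    Returns (action, share), where share is the fraction of cluster members
    who took that action: a crude estimate of its probability.
    """
    cluster = nearest_cluster(user, centroids)
    counts = Counter(next_actions_by_cluster[cluster])
    action, n = counts.most_common(1)[0]
    return action, n / sum(counts.values())

# Hypothetical features: (share of sports pages viewed, share of finance pages viewed).
centroids = [(0.9, 0.1), (0.1, 0.9)]
next_actions_by_cluster = [
    ["buy_tickets", "buy_tickets", "buy_tickets", "read_scores"],  # sports fans
    ["check_stocks", "check_stocks", "read_scores"],               # finance readers
]

print(predict_next((0.8, 0.3), centroids, next_actions_by_cluster))
```

The hard part at scale is discovering the clusters and keeping them current for millions of people; once a person is placed in a group, the prediction itself is just a lookup and a count.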

To prepare for the one or two emails I get when my for-fee articles appear, I thought it might be a good idea to see what’s online. I know a little about Google but I don’t know much beyond my little area of expertise that I hone against the whetstone of Kentucky culture.


Google Patents May Be Worthless

July 22, 2008

Thank goodness I am not an attorney. The Patently-O patent law blog ran an essay today (July 21, 2008) called "The Death of Google's Patents". You may want to read the full text of the essay here. I went through the write-up two times and still came away wondering about the implications of John F. Duffy's article in Patently-O.

The key point in the essay was a court ruling that says, “You cannot patent a software process.” There may be some exceptions, but if a company like Google has a patent on a software system or method, well, those patents are toast. (You can see why I am not legal eagle material.)

Once Mr. Duffy drops this information bomb, he moves on to the subject of Google's PageRank invention. If the decisions reviewed in this Web log post are ones with bite, Google could lose patent protection. Keep in mind that the PageRank invention is not Google's. The patent proudly announces that Stanford University is the assignee, but I may be misreading the PDF in front of me. My recollection is that the patent carries a note that some of the work was performed under a National Science Foundation grant. Does that mean that some or all of the PageRank "invention" is usable by other companies?

The rulings summarized in this thought provoking essay could (will?) undermine Google’s legal position for some, maybe all, of the firm’s 350 patent documents.


Could Google’s patent documents end up in the garbage dump?

Does Google Have a Patent Strategy?

At the Boston Search Engine Meeting in April 2008, a young wizard, a former Microsoftie, asked in the Q&A session after my speech, "Just because a company has a patent, does that mean the company will use the disclosed invention?"


Scale Fail: Amazon and Pizza Team Engineering

July 21, 2008

My news reader is chock-full of glowing embers of hostility this morning. It's 8:30 am in rural Kentucky, where nothing works very well. Power failed again last night, but we have oil lamps and candles. Based on my scan of a number of reports about the Amazon S3 outage, I think Amazon may want to shore up Dr. Werner Vogels' engineering team today. Shoestrings are great for keeping sneakers on my feet, but massively parallel distributed infrastructures need a bit more than shareware, clever graduate students from the Netherlands, and technical reviews by PhD candidates from University of California computer science programs.

Amazon codes using teams small enough to be fed with one pizza. The idea is that a SOCOM-type unit is better than the rigorous engineering approach found at Boeing or, for that matter, Microsoft. Amazon also allows its teams considerable latitude when solving problems. In fact, some teams can use whatever programming language or method solves the problem.


This is a burned pizza. Great ingredients, distracted chef. Source: http://msp71.photobucket.com/albums/i122/xoaleycat926ox/6298db24.jpg

This approach is fast, economical, and flexible. The downside is that if the fix triggers a fault elsewhere, the pizza team or teams must scramble to figure out what happened and why. If the previous team used some offbeat language or clever method, then the fixers have to puzzle out the solution. Some folks love puzzles, but judging from the nastygrams I read this morning, I don't think Amazon Web Services' customers are too keen on the approach.

Om Malik’s “S3 Outage Highlights Fragility of Web Services” is among the best of the essays I reviewed. You can read his full post here. For me, the key point in his analysis was:

That said, the outage shows that cloud computing still has a long road ahead when it comes to reliability. NASDAQ, Activision, Business Objects and Hasbro are some of the large companies using Amazon's S3 Web Services. But even as cloud computing starts to gain traction with companies like these and most of our business and communication activities are shifting online, web services are still fragile, in part because we are still using technologies built for a much less strenuous web.

I quite enjoyed Center Networks' understated report of the problem, which quoted Amazon's own comment:

Amazon S3 has “elevated error rates”.

I think this means crash or fail.


Google Learns about Ben Franklin’s Maxims

July 19, 2008

This is an opinion piece.

My 7th-grade teacher, Miss Soapes, was a Ben Franklin groupie. Of course, Mr. Franklin departed earthly life in 1790, and I was in the 7th grade in 1957. To Miss Soapes, Mr. Franklin was at hand. Her favorite Ben-ism was:

There are no gains without pains.

Google certainly understands the meaning of Mr. Franklin’s insight. After a decade of effort, Google has arrived at the summit of the Web search and online advertising mountain.

The Google brand is one of the most recognized in the world even though most people who use Google every day don’t know that the company’s name is a corruption of googol, a number that is equal to the digit one followed by 100 zeros.

Google accounts for about 70 percent of the Web searches in North America and even more in Denmark and Germany where Google enjoys an 80 percent or more share of the Web search traffic. Only China (www.baidu.com) and Russia (www.yandex.com) resist the charms of the GOOG.

In a miserable economy, Google's second quarter revenue missed Wall Street's estimate by a few pennies. Within moments, Google was a loser. I was shocked by the negative turn, but my surprise was nothing compared to that of shareholders who watched the Google share price drop below $490 in after-hours trading right after the results came out. Financial success in today's high-technology sector is rare indeed. But Wall Street wizards have come to expect stellar performance from the Mountain View, California, company.


The company has been somewhat less successful with its non-search and non-ad initiatives. But the lack of success is a function of comparing Google’s ad revenues with revenues from its other units. For example, in FY2007, Google reported less than $200 million in revenues from its much-watched enterprise search and services unit.

However, when I worked through Google's financials and its less-than-helpful revenue breakouts, I identified revenue from Google geospatial services, Google's educational sales, and fees paid by developers. After fooling with assumptions and quite a bout of spreadsheet fever, I estimated that Google's non-search revenues that could be viewed as enterprise-centric could have been as much as $400 million. That figure is small next to Google's FY2007 revenue of $16.6 billion, but it is larger than the revenue of Autonomy (about $300 million), Endeca (about $110 million), or Fast Search & Transfer (about $70 million but subject to change) in the same 12-month period. The acquisition of Postini is likely to bump these revenues upward in FY2008.
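The comparison in the paragraph above is easy to check. This short Python sketch uses only the estimates stated there, in millions of US dollars; the variable names are mine.

```python
# Estimated FY2007 revenues in millions of US dollars, as stated in the text.
google_enterprise_estimate = 400
google_total = 16_600
rivals = {"Autonomy": 300, "Endeca": 110, "Fast Search & Transfer": 70}

share = google_enterprise_estimate / google_total
print(f"Enterprise-centric share of Google revenue: {share:.1%}")

for name, revenue in rivals.items():
    print(f"Larger than {name}? {google_enterprise_estimate > revenue}")

combined = sum(rivals.values())
print(f"Rivals combined: ${combined} million; larger than all three together? "
      f"{google_enterprise_estimate > combined}")
```

The estimate is about 2.4% of Google's total, yet it exceeds each of the three rivals individually. It does not exceed the three combined, which total $480 million.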


What Is SharePoint?

July 18, 2008

A few publishers still print tabloids and magazines. A hard copy of System Management News, July 15, 2008, arrived today. My lunch appointment was running late, so I flipped through the newsprint tabloid and saw this headline, "Microsoft's SharePoint Hits Sweet Spot as the Next Killer App." The author of this apologia is Patrick Hynds, president of Critical Sites and a Microsoft Regional Director. Mr. Hynds is a good writer, and he does let his enthusiasm for SharePoint sparkle at every opportunity. The hard copy has a Web corollary at www.sysmannews.com. A digital version of Mr. Hynds's analysis is here. I urge you to read the original.

The key point in the write up is that SharePoint is a “killer app”. For me, the most interesting point in the article was:

Let's hope Microsoft doesn't get too visionary for its own good and expand SharePoint beyond the sweet spot it now occupies so well. It is all about collaboration and acceleration of information sharing.

The notion of a server as a killer application befuddles me. Actually, quite a bit about SiteServer, oh, I mean SharePoint, makes me think about the fees consultants can assess. Without a doubt, SharePoint packs more buzzword goodness per byte than almost any other Microsoft application.

First, you have to figure out which SharePoint to license or use. The two versions are:

  1. WSS or Windows SharePoint Services 3.0. WSS is a no-cost add-on to Windows Server 2003 and Windows Small Business Server 2003.
  2. MOSS or Microsoft Office SharePoint Server 2007.

There are numerous differences between the two platforms, but both offer search, wikis, Web logs, and calendaring.

To get SharePoint to work, you will need other Microsoft server products. The two must-haves are SQL Server and Exchange. Exchange is also described as a collaboration application, presumably because some executives still send email with attachments.

Rube Goldberg: Software Architect

Whenever I read cheerleaders' scripts, I am impressed with how easy each is to remember. The reality of being a cheerleader is different from the surface appearance and the facile nature of the chants. SharePoint is for me like a cheerleader and a catchy chant. "Push 'em back, push 'em back, way back."

SharePoint slices, dices, chops, shreds, and cuts julienne potatoes, which I would not recognize if you showed me one now. To my simplistic mind, SharePoint is a layer of spackle that one uses to fill in the gaps among islands of Microsoft functions. A properly deployed and resourced SharePoint makes it possible to create a document, make it available to authorized users, and publish the document as a Web page. The search, the collaboration, the federating of email, local, and remote information, and the rest of the bells and whistles is a huge, sprawling favela (Brazilian slum).


Source: http://www.travelblog.org/Photos/171232.html

Underneath the layers of code is a content management system. But to make the CMS work, you have to buy other Microsoft servers, use Microsoft programming tools, and drink Microsoft Kool-Aid.

The problem is that SharePoint has great marketing and lousy plumbing.

