Francisco Corella, Pomcor, an Exclusive Interview
February 11, 2009
Another speaker on the program at Infonortics’ Boston Search Engine Meeting agreed to be interviewed by Harry Collier, the founder of the premier search and content processing event. Francisco Corella is one of the senior managers of Pomcor. The company’s Noflail search system leverages open source and Yahoo’s BOSS (Build your Own Search Service). Navigate to the Infonortics.com Web site and sign up for the conference today. In Boston, you can meet Mr. Corella and other innovators in information retrieval.
The full text of the interview appears below:
Will you describe briefly your company and its search technology?
Pomcor is dedicated to Web technology innovation. In the area of search we have created Noflail Search, a search interface that runs on the Flex platform. Search results are currently obtained from the Yahoo BOSS API, but this may change in the future.  Noflail Search helps the user solve tough search problems by prefetching the results of related queries, and supporting the simultaneous browsing of the result sets of multiple queries. It sounds complicated, but new users find the interface familiar and comfortable from the start. Noflail Search also lets users save useful queries—yes, queries, not results. This is akin to bookmarking the queries, but a lot more practical.
What are the three major challenges you see in search / content processing in 2009?
First challenge: what I call the indexable unit problem. A Web page is often not the desired indexable unit. If you want to cook sardines with triple sec (after reading Thurber) and issue a query [sardines “triple sec”] you will find pages that have a recipe with sardines and a recipe with triple sec. If there is a page with a recipe that uses both sardines and triple sec, it may be buried too deep for you to find. In this case the desired indexable unit is the recipe, not the page. Other indexable units: articles in a catalog, messages in an email archive, blog entries, news. There are ad-hoc solutions for blog entries and news, but no general-purpose solutions.
Second challenge: what I call the deep API problem. Several search engines offer public Web APIs that enable search mashups. Yahoo, in particular, encourages developers to reorder search results and merge results from different sources. But no search API provides more than the first 1000 results from any result set, and you cannot reorder a set if you only have a tiny subset of its elements. What’s needed is a deep API that lets you build your own index from crawler raw data or by combining multiple sources.
Third challenge: incorporate semantic technology into mainstream search engines.
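The indexable unit problem from the first challenge can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions: the page contents, the `PAGES` dictionary, and the helper names are all hypothetical, and real search engines do not match text this naively. It shows how a page-level index reports a page that merely contains both terms in different recipes, while a recipe-level index returns only the recipe that uses both.

```python
import re

# Hypothetical pages, each holding one or more recipes (illustrative data only).
PAGES = {
    "page-1": [
        "Grilled sardines with lemon and olive oil",
        "Margarita with tequila, triple sec and lime",
    ],
    "page-2": [
        "Sardines flambeed in triple sec with orange zest",
    ],
}

def matches(text, terms):
    """Return True if every query term (a word or a phrase) occurs in the text."""
    lowered = text.lower()
    return all(
        re.search(r"\b" + re.escape(term.lower()) + r"\b", lowered)
        for term in terms
    )

def search_pages(pages, terms):
    """Treat the whole page as the indexable unit."""
    return sorted(
        page for page, units in pages.items()
        if matches(" ".join(units), terms)
    )

def search_units(pages, terms):
    """Treat each recipe on a page as its own indexable unit."""
    return sorted(
        (page, index)
        for page, units in pages.items()
        for index, unit in enumerate(units)
        if matches(unit, terms)
    )

query = ["sardines", "triple sec"]
page_hits = search_pages(PAGES, query)
unit_hits = search_units(PAGES, query)
print(page_hits)  # both pages match at page level
print(unit_hits)  # only the single recipe that uses both ingredients
```

At page level, page-1 is a false hit: sardines and triple sec appear in two unrelated recipes. At recipe level, only the recipe on page-2 survives.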
With search processing decades old, what have been the principal barriers to resolving these challenges in the past?
The three challenges have not been resolved for different reasons. Indexable units require a new standard to specify the units within a page, and a restructuring of the search engines; hence a lot of inertia stands in the way of a solution. The need for a deep API is new and not widely recognized yet. And semantics are inherently difficult.
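The deep API problem can also be shown with a small sketch. Everything here is assumed for illustration: `FULL_RESULTS` is made-up data, `api_search` is a hypothetical stand-in for a search API that truncates its output, and the limit of 3 plays the role of the 1000-result cap mentioned above. The point is only that reordering a truncated set can give a different answer from reordering the full set.

```python
# Hypothetical result set: (doc_id, engine_score, freshness). Illustrative data only.
FULL_RESULTS = [(f"doc{i}", 1.0 - i / 10, i) for i in range(10)]

def api_search(limit):
    """Stand-in for a search API that returns only the top `limit` results."""
    ranked = sorted(FULL_RESULTS, key=lambda result: result[1], reverse=True)
    return ranked[:limit]

def rerank_by_freshness(results):
    """Reorder results by a criterion the engine did not use for ranking."""
    return sorted(results, key=lambda result: result[2], reverse=True)

# Best document by freshness when only the truncated set is available.
truncated_best = rerank_by_freshness(api_search(limit=3))[0][0]
# Best document by freshness when the whole result set is available.
true_best = rerank_by_freshness(FULL_RESULTS)[0][0]

print(truncated_best, true_best)
```

In this toy data the freshest document sits at the bottom of the engine's ranking, so the mashup never sees it.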
What is your approach to problem solving in search and content processing? Do you focus on smarter software, better content processing, improved interfaces, or some other specific area?
Noflail Search is a substantial improvement on the traditional search interface. Nothing more, nothing less. It may be surprising that such an improvement is coming now, after search engines have been in existence for so many years. Part of the reason for this may be that Google has a quasi-monopoly in Web search, and monopolies tend to stifle innovation. Our innovations are a direct result of the appearance of public Web APIs, which lower the barrier to entry and foster innovation.
With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search / content processing?
The crisis may have both negative and positive effects on search innovation. Financial pressure causes consolidation, which reduces innovation. But the urge to reduce cost could also lead to the development of an ecosystem where different players solve different pieces of the search puzzle. Some could specialize in crawler software, some in index construction, some in user interface improvements, some in various aspects of semantics, some in various vertical markets.
A technological ecosystem materialized in the 80’s for the PC industry, and resulted in amazing cost reduction. Will this happen again for search? Today we are seeing mixed signals. We see reasons for hope in the emergence of many alternative search engines, and the release by Microsoft of Live Search API 2.0 with support for revenue sharing. On the other hand, Amazon recently dropped Alexa, and Yahoo is now changing the rules of the game for Yahoo BOSS, reneging on its promise of free API access with revenue sharing.
Multi core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar? Is performance a non issue?
Noflail Search is computationally demanding. When the user issues a query, Noflail Search precomputes the result sets of up to seven related queries in addition to the result set of the original query, and prefetches the first page of each result set. If the query has no results (which may easily happen in a search restricted to a particular Web site), it determines the most specific subqueries (queries with fewer terms) that do produce results; this requires traversing the entire subgraph of subqueries with zero results and its boundary, computing the results set of each node. All this is perfectly feasible and actually takes very little real time.
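The zero-result analysis described above can be sketched as a walk over the lattice of subqueries. This is not Pomcor's actual algorithm, which the interview does not spell out. It is a minimal sequential sketch assuming conjunctive (all terms must match) queries, with `count_results` as a hypothetical callback standing in for a search API call and `DOCS` as made-up data. Nodes with zero results are expanded by dropping one term at a time. Nodes with results form the boundary and are not expanded. Any boundary node contained in a larger boundary node is then discarded, leaving the most specific subqueries.

```python
def most_specific_subqueries(terms, count_results):
    """Return the largest subsets of `terms` that still produce results."""
    root = frozenset(terms)
    seen = {root}
    frontier = [root]
    with_results = []
    while frontier:
        next_frontier = []
        for query in frontier:
            if count_results(query) > 0:
                # Boundary node: it has results, so do not drop more terms.
                with_results.append(query)
                continue
            # Zero-result node: visit every subquery with one term removed.
            for term in query:
                child = query - {term}
                if child and child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    # Keep only subqueries not contained in a larger subquery with results.
    return [q for q in with_results if not any(q < other for other in with_results)]

# Hypothetical three-document corpus (illustrative data only).
DOCS = [
    {"sardines", "lemon", "grill"},
    {"triple", "sec", "orange"},
    {"sardines", "orange"},
]

def count_results(query):
    """Count documents containing every term in the query."""
    return sum(1 for doc in DOCS if query <= doc)

best = most_specific_subqueries(["sardines", "orange", "grill"], count_results)
print(sorted(sorted(q) for q in best))
```

Each level of the walk consists of independent lookups, so in a real client they could be issued simultaneously rather than one after another.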
How do we do it?
Since Noflail Search is built on the Flex platform, the code runs on the Flash plug-in in the user’s computer and obtains search results directly from the Yahoo BOSS API. Furthermore, the code exploits the inherent parallelism of any Web API. Related queries are all run simultaneously. And the algorithm for traversing the zero-result subgraph is carefully designed to maximize concurrency.
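Running related queries simultaneously can be sketched as follows. Noflail Search itself runs in the Flash plug-in, so this Python version is only an analogy. `fetch_first_page` is a hypothetical stand-in that sleeps to mimic network latency instead of calling any real API, and the query strings are made up.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch_first_page(query):
    """Hypothetical stand-in for a Web search API call (no real network access)."""
    time.sleep(0.05)  # mimic network latency
    return [f"{query} :: result {n}" for n in range(1, 4)]

def prefetch(queries, max_workers=8):
    """Issue all queries at once and return {query: first page of results}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(fetch_first_page, queries)
        return dict(zip(queries, pages))

related = ["sardines", "sardines recipe", "triple sec", "sardines triple sec"]
start = time.perf_counter()
prefetched = prefetch(related)
elapsed = time.perf_counter() - start
print(f"{len(prefetched)} result pages prefetched in {elapsed:.2f}s")
```

Because the calls overlap, the total wait is close to the latency of one request rather than the sum of all of them, which is why prefetching several related queries costs the user little real time.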
Yahoo, however, has just announced that they will be charging fees for API queries instead of sharing ad revenue. If we continue to use Yahoo BOSS, it may not be economically feasible to prefetch the results of related queries or analyze zero results as we do now. Thus, although performance is a non-issue technically, demands of computational power have financial implications.
As you look forward, what are some new features / issues that you think will become more important in 2009?
Obviously we think that the new user interface features in Noflail Search are important and hope they’ll become widely used in 2009. We have of course filed patent applications on the new features, but we are very willing to license the inventions to others. As for a breakthrough over the next 36 months, as a consumer of search, I very much hope that the indexable unit problem will be solved. This would increase search accuracy and make life easier for everybody.
Where can I find more information about your products, services, and research?
Noflail Search is available at http://noflail.com/, and white papers on the new features can be found in the Search Technology page (http://www.pomcor.com/search_technology.html) of the Pomcor Web site (http://www.pomcor.com/).
Harry Collier, Infonortics Ltd., February 11, 2009
User Tracking Yahoo Style
February 5, 2009
If the news item in Web Pro News is spot on, Yahoo is taking on an interesting challenge. “Yahoo to Start Keeping Tabs on Your Searches” by Chris Crumb documents Yahoo’s me-too of some discontinued Google features. Mr. Crumb said:
Search Pad for the Yahoo search engine. Essentially, it keeps track of your searches, figures out when you are researching things, and stores results of interest in a virtual notepad you can use for reference.
The write up provides links to additional information. The usage tracking implications are fascinating. The core of the write up is an interview with Tom Chi, Senior Director of Product Management with Yahoo Search. One of the most interesting comments was:
“This [service] follows the same data retention policy we have across Yahoo!,” explains Chi. “We recently announced a new policy. Under the new policy, Yahoo! will anonymize user log data within 90 days with limited exceptions for fraud, security and legal obligations. Yahoo! will also expand the policy to apply not only to search log data but also page views, page clicks, ad views and ad clicks.”
Usage tracking yields high value data. How will the user, law enforcement, and marketing communities respond? It’s too soon to tell.
Stephen Arnold, February 5, 2009
Amidst Google Tracking Email Gets Lost
February 4, 2009
As pundits talk about Google’s mobile locator service here, no one raises larger questions about surveillance. As important to me is the possible shift in Google’s posture to Yahoo Mail. Consider “Google Quietly Declares Email War on Yahoo” here. In my opinion, Google has kept a distance from Yahoo. If this Reuters story is correct, the game has changed. With Google adding a truck load of new features, Yahoo and other online mail services are at risk. For me the most important comment in the story was:
“They’re able to improve the products much faster than anyone else,” said Forrester Research analyst Ted Schadler.
Speed, scope, and agility mean that Google has taken the tip off its fencing sword and is using a rapier. Oh, the search works in GMail.
Stephen Arnold, February 4, 2009
Yahoo Search Technology Improves
February 4, 2009
Yep, Yahoo has tackled the problem of its search infrastructure. The company has made a commitment to rationalize its technology and get a running start to leap frog Google. Did Yahoo hire Alon Halevy from Google? Did Yahoo track down the hot start up in Finland whose technology could add a much needed differentiator to Yahoo search? Did Yahoo license the high performance database system developed by Perfect Search? Nope. Yahoo is changing public relations managers. The scintillating business move is described here. Google may be trembling as it guzzles Odwalla banana strawberry drinks.
Stephen Arnold, February 4, 2009
Yahoo in the Red in the Fourth Quarter
January 29, 2009
The San Jose Mercury News posted a short item whose headline tells the tale: “Yahoo Swings to Loss in 4Q” here. Sales were down. The company lost $303 million in October, November, and December 2008. The fourth quarter is often one of the stronger quarters for a commercial enterprise. The company faces mounting technology pressures and Yahoo may not be able to control costs for software and systems. Over the years, various Yahoo gurus have communicated to me in quite superior tones that Yahoo was better than some of its competitors. I did not believe it. Maybe now these gurus will look at Google’s performance, Google’s market share, and Yahoo’s own disarray and take action to address the technology weaknesses of Yahoo. Unless the plumbing is fixed, the company is not going to be able to make the type of progress that its stakeholders expect. Plumbing is important in search and online services. Bad plumbing equals uncontrolled technology costs. MBAs don’t believe it. I am not too concerned about MBAs. I am concerned about pundits who emphasize changes Yahoo should make that are cosmetic. Forget the lipstick. Get the infrastructure repaired.
Stephen Arnold, January 29, 2009
Ad Age Advises Yahoo: Startling Strategic Counsel
January 19, 2009
I read this weekend that top job openings require technical or scientific training. Imagine my surprise when Ad Age, a dead tree publication for the Liberal Arts and Master of Fine Arts crowd, published “Four Ways Yahoo Can Right Itself under New CEO Bartz.” You can read this remarkable article here. Keep in mind that Yahoo is a technology company. The products and services of Yahoo are based on software, systems, and other arcana that delight computer scientists and electrical engineers, leaving the art gallery and soft drink executives lost in a cloud of unknowing. Furthermore, if you have read my other commentaries about Yahoo, you know that the ills of Yahoo are a manifestation of a misalignment of technology and user needs. Fixing Yahoo, therefore, requires more than a public relations blitz and a handful of consultants to change the ad rate price schedule. Some of the Mad Ave ilk will point to the unsold Super Bowl TV spots and assert, “Yahoo needs to snap up these ad slots and make some brand impact.” Right, advertising online services on the Super Bowl will work just as it will for Ask.com’s sponsorship of NASCAR.
Abbey Klaasen, the Ad Age journalist, identifies four strategies for the Yahooligans.
First, Yahoo has to hang on to search. I am a bit fuzzy about what “search” is referenced. Yahoo has a cartload of search systems. My hunch is that Ad Age is thinking about Web search and ignoring the Flickr and Delicious systems, which may have more sizzle than the so so Web search. There’s also mail search, the search on the personal section, and so on. Ad Age is aware of the sports and finance information, but I wonder how much analysis is going on at Ad Age. Anyway, the idea is to keep “search”. Let’s assume that Yahoo is to keep its various forms of search.
Second, the recommendation is for Yahoo to “combine search and display data.” I have to admit that I am not sure what this means. Yahoo lacks a homogeneous system; therefore, combining any cluster of services means normalization, transformation, and manipulation of data. Yahoo had a project underway to rationalize some disparate data, but I am not sure if that is still underway or if it swam on rocks. Advertisers have been asking for access to specific slices of Yahoo demographics across services for a while. Yahoo can’t deliver these types of audiences because of technical issues. Yahoo is a technology company. If a service is not available, there’s a technical reason, not a managerial reason. If the cost of “fixing up” the system is too high, the service will not be available. Yahoo has not been able to focus its resources on certain technical problems because it has a GM problem; that is, GM knows what Toyota and Honda do to make autos. GM can’t change the culture nor can it amass the resources to implement the Toyota and Honda solutions. Yahoo’s engineers are smart. Some go to Google and become happy campers; for example, the Delicious.com founder. It’s not brains; it’s a fundamental technical problem exacerbated by cost and management.
Third, Ad Age wants Yahoo to sell “the Unilevers of the world”. My hunch is that this is a play that will require fixing search and audience data. It is going to be tough to repair the Yahoo-mobile unless one has the right parts. Yahoo is going to require the equivalent of a resto-mod rebuild on the jalopy before the Unilevers pump more cash into the Yahoo advertising opportunity.
Fourth, buy Hulu. Yahoo has been fooling around with video for a while. In case anyone missed the news, Google has managed to make YouTube.com the number two search engine. Hulu.com is also way behind the Googlers in terms of traffic. I grant that Hulu.com is better than Yahoo’s video services. Follow me on this line of reasoning: If Yahoo’s previous attempts to do video have been less than stellar, why will Yahoo handle Hulu.com better? Does anyone remember Finance Vision or the original content production push with Lloyd Braun’s return here? So, I assert that Yahoo’s ability to integrate an acquisition is questionable. Yahoo took years to integrate the Yahoo photo site into Flickr. Let’s assume that Yahoo does buy Hulu. Can Yahoo contribute to the service? At this time, whatever management expertise Yahoo has will be stretched trying to deal with the existing Yahoo technology and financial problems.
In short, I find the Ad Age counsel pretty interesting. It’s not wrong as Mad Ave thinking goes; it’s just from another dimension. I will stick with the reality of the goose pond in Harrod’s Creek, Kentucky.
Stephen Arnold, January 19, 2009
Received Wisdom about Microsoft Google Off by 30 Degrees
January 16, 2009
The dead tree version of the Wall Street Journal arrived this morning (January 16, 2009) and greeted me with Robert Guth’s article “Microsoft Bid to Beat Google Builds on a History of Misses”. You can find an online version here. You can also find a discussion by Larry Dignan here. Both of these write ups set my teeth on edge, actually, my beak. I am an addled goose, as you may know.
The premise of the Wall Street Journal article is that Microsoft had chances to do what Google is doing; to wit: sell ads, build search traffic, and buy Overture.com, among other missteps. The implication in these examples is the “woulda coulda shoulda” argument that characterizes people with a grip on received wisdom or what “everybody” knows and believes.
Mr. Dignan adds some useful points, overlooked by Mr. Guth; namely, Microsoft lacked a coherent Web strategy. Also, had Microsoft moved into ads, that alone would not have addressed Google’s focus on search. Mr. Dignan emphasizes that “you can’t count Microsoft out–even now.”
Let me from my hollow in Kentucky where the mine drainage has frozen a nice sulphurous yellow this frosty morn offer a different view of the problem Microsoft faces. You can cherish these nuggets of received wisdom. I want to point out where these individual, small Google nuggets fit in the gold mine of online in the 21st century.
Received wisdom is useful but often is incomplete. Filling in the gaps makes a difference when determining what steps to take. Image source: http://www.grahamphillips.net/Ark/Ark_2_files/moses_with_tablets.jpg
What Google Did in 1998
Google looked at search and the problems then dominant companies faced. I can’t run down the numerous technical challenges. (If you want detail, click here.) I can highlight three steps taken by Google when Microsoft and others dabbling in the Internet were on equal footing.
First, Google looked at the bottlenecks in the various subsystems that go together to index digital information and make it findable. These bottlenecks were no surprise in 1998 and they aren’t today. Google identified issues with parallel processing, organizing the systems, and getting data moving to the right place at the right time. Google tackled this problem head on by rethinking how the operating system could better coordinate breaking a task into bite sized chunks and then getting each chunk worked on and the results back where they were needed without bringing the computer to its knees. This problem still bedevils quite a few search engine companies, and Google may not have had a perfect solution. But Google correctly identified a problem and set out to solve it by looking for tips and tricks in the research computing literature and by tapping the expertise at AltaVista.com.
Second, Google figured that if it was going to index digital information on any scale, the company needed a way to build capacity without paying for the high end, exotic, and often flakey equipment used by some companies. One example of this type of hardware goof is the AltaVista.com service itself. It used the DEC Alpha chip, which was the equivalent of a Fabergé egg that generated the heat of a gas tungsten arc welding device. Google invested time and effort in cobbling together a commodity hardware solution.
Third, Google looked at what work had to be done when indexing and query processing. The company had enough brain power to realize that the types of read write processes that are part of standard operating systems and database systems would not be suitable for online services. Instead of embracing the traditional approach like every other commercial indexing outfit did in the 1998 to 2000 period (a critical one in Google’s technical development), Google started over. Instead of pulling an idea from the air, Google looked in the technical literature. Google took the bride’s approach to innovation: something borrowed, something new, etc. The result was what is now one of the core competitive advantages of Google–the suite of services that can deliver fast read speeds and still deliver acceptable performance when a Google Apps user saves a file.
Keep in mind that Google has been working on its business for a decade. Google is no start up. Google has a head start measured in years, not months or weeks.
Search in the Bartz Era at Yahoo
January 16, 2009
The Beyond Search geese have been honking speculatively today about Yahoo search in the post-floundering era. We decided that it was a miracle that Yahoo has been able to keep its revenues where they are and maintain a 20 percent share of the Web search market. Several of the Beyond Search goslings use Yahoo for mail, photo browsing, and bookmark surfing. Others don’t think too much of Yahoo for various reasons. These range from lousy performance over some wireless services to features that seem clunky compared to alternatives available from other vendors.
We read closely Rebecca Buckman’s “The Exacting Standards of Carol Bartz” and found the Forbes article interesting. You can read it here. Unlike some of the critical articles about Carol Bartz, Ms. Buckman focuses on her accomplishments. One interesting parallel is that the “freewheeling culture” of Autodesk and the wild and crazy approach at Yahoo may share some similarities. Ms. Bartz made staff changes and “professionalized” some departments. Yahoo may benefit from this type of management.
Our Beyond Search discussion focused on search, specifically what we perceive as the “problem” with Yahoo search. In order to make Yahoo search more useful, Yahoo has to find a way to address such shortcomings as the spotty relevancy for Web queries that are not about popular topics. The search available for Yahoo shopping is not very useful. In fact, it is on a par with eBay’s current system, and that is quite disappointing. Even convenience services such as finding currency conversion data become an exercise in navigating multiple pages. “Search without search” is something that Yahoo needs to master.
In order to remediate Yahoo search, we think that some serious engineering must be done and completed quickly. At lunch we ran several test queries. For example, one was “enterprise search”. The results were surprising. Here’s the display we saw:
We liked the search suggestions, but we found that the first four results were skewed to Microsoft. For example, there is the Microsoft paid ad in the blue box. That’s the second result. In the organic results, we saw a link to the Yahoo and IBM free search system, which is a boosted result. The Wikipedia result is okay. But the third and fourth results are for Microsoft search pages. The results are not “bad”; the results were just not what we expected. You can run your own queries and see how the Yahoo search results work for you.
A test shopping query was “discount quad core”. The system returned computer systems from brand name vendors. I think each of these systems is tagged with the word “discount”. These are not discount systems, however.
How can these search issues be fixed? Is tweaking enough? Will Yahoo’s many different search initiatives ultimately lead to a system that is “better” than Google’s in the eyes of the users?
Here’s the Beyond Search lunch time view:
- Yahoo has to work on relevance. Google has made a significant investment in technology to determine context and react to what other users find helpful. Yahoo seems to lag in these areas.
- In terms of mobile search, the Yahoo system requires menu navigation. Because of the clunkiness of the approach, it is difficult to determine if Yahoo is doing much more than dumping information into buckets and showing stories as those stories arrive.
- For shopping, Yahoo gets a user close to a product, but Yahoo makes it difficult to find a specific product. We don’t think eBay or Google have cracked the code on shopping search. Yahoo might be able to leapfrog some of the competitors with an innovative approach.
The problem with addressing all or some of these challenges is that it will take time to come up with a solution that is not a one-off, stand-alone island. Yahoo has not focused on search as part of the core fabric of the company. At Google, search and advertising are tough to separate. At Yahoo, search is one thing. Advertising is another. Yahoo, therefore, must think of ways to integrate so the two functions yield an advantage over Google.
Yahoo has the talent and the funds to address these issues. What Yahoo does not have, we concluded, is time. In fact, time may be Yahoo’s biggest single problem. Floundering can be rectified with time. Without time, Yahoo will remain a shadow of its former self. Even a deal with Microsoft can’t change that.
Meantime, the Google maintains its lead in search and advertising. A decade of search missteps cannot be fixed overnight. Ms. Bartz may have the expertise, but does she have the time? We quacked loudly, “We don’t think so.”
Stephen Arnold, January 16, 2009
Competition in Web Search: From Where
January 13, 2009
EcommerceTimes.com ran “What Search Needs Is Healthy Competition” here. Miguel Salcido takes a swing at a topic that will be lofted to regulators like a badminton birdie. The regulators can see it coming and the birdie looks easy to hit. But some of those whacks will miss, others fly out of bounds, some strike the net, and a few may make it to the other side for another volley. Mr. Salcido concedes that Google “continues to … dominate” Web search. Most of the tracking and statistics services understate Google’s actual lead, but that is of academic interest only. With Microsoft Live.com search losing share and Yahoo floundering like a bass in a net tossed to the bottom of the boat, Google faces competition from:
- Ask.com. The service works well for school kids but doesn’t meet my needs
- Quaero.com. The European Google killer
- Vivisimo. A metasearch system?
Maybe the challenge will come from a search engine in China (Baidu) or Russia (Yandex)? What about one of the many start ups who contact me? I like MSE360, but the company has a low profile.
Mr. Salcido suggests that competition for Google “would most likely come from a Yahoo/Microsoft or Yahoo/News Corp. merger.”
I don’t agree. Google represents a construct that will be difficult to duplicate. The “competitors” with some savvy are finding ways to work around Google, in spite of Google, or with Google. Competing with Google–based on my monitoring of Ask, Microsoft, and Yahoo–seems to be difficult. The reasons include:
- Google has been beavering away for a decade. Now an outfit wants to catch up. Good luck. A leap frog over Google is a better bet than trying to amass the resources to duplicate Google.
- Google, like Apple, has evolved into a weird social brand. Google has zero in common with the average Web user, but people love the GOOG. Apple has a similar effect, but so far Apple has not shown much appetite for the Web search business.
- Google continues to innovate. For my Google and publishing report due out this spring I describe one of Google’s monetization innovations. Few know about Google’s potential for generating money from content using a different business model than traditional media companies use. This type of innovation is largely off the radar of most Google pundits.
I think competition would be useful. From my pond filled with coal mine runoff, I don’t see much of a threat in the current crop of challengers identified by Mr. Salcido. Yahoo with anything is not going to have a significant impact on Google. Yahoo search does not work for me, although about 20 percent of the Web search traffic reaches Yahoo. The problem is that the lion’s share of the traffic–about 70 percent or so–goes to Google.
Leapfrog, not a compound of two weaker entities.
Stephen Arnold, January 13, 2009
Lousy Economy, Google Gains Share
January 12, 2009
Barron’s reported here that Google gained market share in Web search in the US in December 2008. The source of the data is Hitwise.com. I think these data understate Google’s actual market share, but when the Wall Street Journal’s progeny asserts 72 percent market share, it must be true. The question is, “What will Microsoft and Yahoo do to gain ground?” The answer is, in my opinion, “Not much they can do.” Search is not a priority at either Microsoft or Yahoo. Sure, both outfits say search is job one, but the GOOG is built on search. Search is an add on, a pair of foam dice hanging from a bigger vehicle’s rear view mirror at Microsoft and Yahoo. Time is running out to catch up. Time to leapfrog.
Stephen Arnold, January 12, 2009