Financial Waves Swamp SurfRay

January 17, 2009

Just a quick update on SurfRay, the search roll-up in Copenhagen. SurfRay bought Speed of Mind, Ontolica, and Mondosoft. You can run a query on Beyond Search and read the previous posts about this company. One of my readers in Europe alerted me to a bankruptcy notice. A happy quack to that person. We did some checking and found a document from Bolagsverket’s Official Register. The announcements for January 16, 2009, contained this information:

surfray bankrupt

If your Swedish is rusty, the snippet says that SurfRay has been declared bankrupt by the Solna court. Future announcements will be issued to some newspapers. I will keep chasing this story. I have no other details as of 3:36 pm Eastern on January 16, 2009.

Stephen Arnold, January 16, 2009

Oh, oh, Multicore CPUs Have Bottlenecks

January 17, 2009

Last year a wizard at a well-known search and content processing company was singing the praises of the multi-core Intel CPUs. AMD is in this game along with a handful of other companies, but the wizard was focused on Intel and the speed-up the high-density multi-core chips would deliver. I listen to these excited explanations of how hardware will solve some of the performance issues associated with content processing, but I have learned that throwing hardware at a software problem is not 100 percent reliable. It is, however, an easy out for the information technology unit, and it makes the hardware vendors really happy.

But The Register ran a story here on January 15, 2009, that suggests multi-core processor technology is not the next big thing the search engine wizard hoped it would be. You can read “US Nuke Boffins: Multicore CPU Gains Stop at Eight.” (Boffin is a British word that means wizard. I will stick with wizards. A boffin in rural Kentucky would have a tough time at the local Harrod’s Creek hang out.) If the US “boffins” are correct, putting more cores on a CPU creates a data logjam. Big surprise. Moving data is not easy or fast even for “boffins.”

What’s this mean for search? Well, not much. Slow systems will remain slow. Fast systems will take advantage of two and four core CPUs. When eight core CPUs become cheap enough for mere mortals, then the GOOG will stuff them into their servers and go faster. To get the real payoff, Intel and other chip gurus will have to do some lateral thinking. The blending of dedicated CPUs with special memory structures sort of along the lines of the original PS3 chip might be a fruitful path to explore. IBM has some other tricks to try as well.

But for search systems with computational burdens, well, those systems will remain expensive to accelerate.

Stephen Arnold, January 17, 2009

Internet and Moore’s Law

January 17, 2009

Physorg.com here published “Internet Growth Follows Moore’s Law Too”. The article reports that Chinese researchers have discovered that Moore’s Law can also describe the growth of the Internet. The key point in the write-up for me was that “the Internet will double in size every 5.32 years.” I find a number like this interesting, but it does not match the data that my team has been gathering over the last few years. We focus on the enterprise, and our data suggest that digital information in an organization doubles every six to nine months. If these data are accurate, then the Internet is not growing as rapidly as digital information in organizations. Anyone have any other data? The Chinese estimate seems on the low side.
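To see how far apart the two claims are, here is a back-of-the-envelope comparison of the implied annual growth rates. This is my arithmetic, not anything from the Physorg.com article.

```python
# Back-of-the-envelope comparison of the two doubling claims.
# A doubling period of p years implies an annual growth factor of 2 ** (1 / p).

def annual_growth_factor(doubling_years):
    return 2 ** (1 / doubling_years)

internet = annual_growth_factor(5.32)        # Chinese researchers' Internet estimate
enterprise_fast = annual_growth_factor(0.5)  # enterprise data doubling every 6 months
enterprise_slow = annual_growth_factor(0.75) # enterprise data doubling every 9 months

print(f"Internet: ~{(internet - 1) * 100:.0f}% growth per year")
print(f"Enterprise (6 mo.): ~{(enterprise_fast - 1) * 100:.0f}% growth per year")
print(f"Enterprise (9 mo.): ~{(enterprise_slow - 1) * 100:.0f}% growth per year")
```

A 5.32-year doubling works out to roughly 14 percent a year; a six-month doubling is 300 percent a year. The two claims are not even in the same ballpark.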

Stephen Arnold, January 17, 2009

More Legal Clarity about Metadata

January 17, 2009

IDM.net published Greg McNevin’s “US Court Rules on eDiscovery Metadata.” You can read the story here. You probably don’t think too much about eDiscovery until you find yourself in a trial. Then you develop a real enthusiasm for the activity. Get it wrong and you will have some interesting roomies for a while. Get it right and you never forget about eDiscovery, ever.

The story grinds through a court ruling. The net net is that companies have to do a better job with electronic information. If an organization thinks a data dump will do the job, that’s probably a bad thought. The article stated:

Guidance says that as eDiscovery best practices are delineated and ultimately determined by the courts, this case is particularly important as it dispels any uncertainty concerning the legal requirements for metadata preservation.

Stated another way: index emails, keep the archive from spoliation, and provide the court with whatever the court wants.

Stephen Arnold, January 17, 2009

Oracle, Semantics, and Search

January 17, 2009

Secure Enterprise Search (SES10g) has dropped off my radar screen. Nothing new at Oracle World last fall. I did attend an Oracle briefing at one of the lame duck conferences I hit last year, but recently–zip. I knew that Oracle had explored a tie-up with Siderean Software, a now quiet company near Los Angeles. I also picked up some intel about a conversation and test with the Bitext wizards but nothing lately.

I read in Semantic Focus here that Oracle is moving forward with semantics. The article “Semantic Data Storage in Oracle” here is worth a read. I found the information encouraging, but the write-up prompted me to do some addled goose type thinking. If you are familiar with this Web log, you know that the “addled goose” phrase signals some questions and a few observations.

The point of the article was to tell me that Oracle’s mothership (the Oracle 11g database) provides a platform to store the semantic Webby stuff called RDF and OWL data. RDF, in case you have forgotten, is semantic Web speak for Resource Description Framework. It is a framework for describing and interchanging metadata. More info is here. OWL is not part of the Hooters logo. The acronym means Web Ontology Language. More info is here. For me the most important comment was:

It [Oracle 11g] allows efficient storage, loading and querying of semantic data. Queries are enhanced by adding relationships (ontologies) to data and evaluated on the basis of semantics. Data storage is in the form of RDF triples (Subject, Predicate, Object) and can scale up to millions of triples. The triples stored in the semantic data store are modeled as a graphed structure. All the data is stored in a single central schema allowing access to users for loading and querying data.
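The triple model the quote describes is simple enough to sketch in a few lines. What follows is plain Python, not Oracle 11g’s actual storage API: each fact is a (subject, predicate, object) tuple, and a query is just a pattern match over the collection.

```python
# Conceptual sketch of the RDF triple model -- plain Python tuples,
# not Oracle 11g's semantic data store.

triples = [
    ("Oracle11g", "stores", "RDF"),
    ("Oracle11g", "stores", "OWL"),
    ("RDF", "standsFor", "Resource Description Framework"),
    ("OWL", "standsFor", "Web Ontology Language"),
]

def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]

# What does Oracle 11g store, according to the article?
print(match(triples, subject="Oracle11g", predicate="stores"))
```

The graph structure the article mentions falls out of the model: triples that share subjects or objects link up into a network, which is what makes ontology-aware querying possible.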

Now my questions:

  1. Where does Secure Enterprise Search fit into this semantic data picture?
  2. With performance an issue, how will the inclusion of potentially verbose information affect retrieval?
  3. What tools will Oracle provide to make use of these new data types?

We’ve been stuffing all sorts of information into database management systems for years. Maybe I am missing something, but I don’t see the type of breakthrough that companies like Aster Data and InfoBright are delivering whether the data are or are not “semantic”. One final question: What’s going on with SES10g?

Stephen Arnold, January 17, 2009

Received Wisdom about Microsoft Google Off by 30 Degrees

January 16, 2009

The dead tree version of the Wall Street Journal arrived this morning (January 16, 2009) and greeted me with Robert Guth’s article “Microsoft Bid to Beat Google Builds on a History of Misses”. You can find an online version here. You can also find a discussion by Larry Dignan here. Both of these write ups set my teeth on edge, actually, my beak. I am an addled goose, as you may know.

The premise of the Wall Street Journal article is that Microsoft had chances to do what Google is doing; to wit: sell ads, build search traffic, and buy Overture.com, among other missed opportunities. The implication in these examples is the “woulda coulda shoulda” argument that characterizes people with a grip on received wisdom, or what “everybody” knows and believes.

Mr. Dignan adds some useful points overlooked by Mr. Guth; namely, Microsoft lacked a coherent Web strategy. Also, had Microsoft moved into ads, that alone would not have addressed Google’s focus on search. Mr. Dignan emphasizes that “you can’t count Microsoft out–even now.”

Let me, from my hollow in Kentucky where the mine drainage has frozen a nice sulphurous yellow this frosty morn, offer a different view of the problem Microsoft faces. You can cherish these nuggets of received wisdom. I want to point out where these individual, small Google nuggets fit in the gold mine of online in the 21st century.


Received wisdom is useful but often is incomplete. Filling in the gaps makes a difference when determining what steps to take. Image source: http://www.grahamphillips.net/Ark/Ark_2_files/moses_with_tablets.jpg

What Google Did in 1998

Google looked at search and the problems then dominant companies faced. I can’t run down the numerous technical challenges. (If you want detail, click here.) I can highlight three steps taken by Google when Microsoft and others dabbling in the Internet were on equal footing.

First, Google looked at the bottlenecks in the various subsystems that go together to index digital information and make it findable. These bottlenecks were no surprise in 1998, and they aren’t today. Google identified issues with parallel processing, organizing the systems, and getting data moving to the right place at the right time. Google tackled this problem head on by rethinking how the operating system could better coordinate breaking a task into bite-sized chunks and then getting each chunk worked on and the results back where they were needed without bringing the computer to its knees. This problem still bedevils quite a few search engine companies, and Google may not have had a perfect solution. But Google correctly identified a problem and set out to solve it by looking for tips and tricks in the research computing literature and by tapping the expertise at AltaVista.com.
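The chunk-process-recombine pattern described above can be sketched in a few lines. This is a toy illustration of the idea, not Google’s actual infrastructure: a document set is split into chunks, each chunk yields a partial inverted index (in a real system, each chunk would be farmed out to a separate machine), and the partial results are merged.

```python
# Toy illustration of "break a task into bite-sized chunks, process each,
# recombine the results" -- not Google's actual implementation.

def chunk(docs, size):
    """Split a list of documents into chunks of at most `size` items."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def index_chunk(docs):
    """Build a partial inverted index (word -> doc ids) for one chunk."""
    partial = {}
    for doc_id, text in docs:
        for word in text.lower().split():
            partial.setdefault(word, set()).add(doc_id)
    return partial

def merge(partials):
    """Recombine partial indexes into one inverted index."""
    index = {}
    for partial in partials:
        for word, doc_ids in partial.items():
            index.setdefault(word, set()).update(doc_ids)
    return index

docs = [(1, "search engine"), (2, "engine room"), (3, "search party")]
partials = [index_chunk(c) for c in chunk(docs, 2)]  # each chunk could run on its own machine
index = merge(partials)
print(sorted(index["search"]))  # -> [1, 3]
```

The hard part, as the paragraph notes, is not the splitting or the merging; it is moving the chunks and results between machines without the coordination overhead eating the gains.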

Second, Google figured that if it was going to index digital information on any scale, the company needed a way to build capacity without paying for the high end, exotic, and often flaky equipment used by some companies. One example of this type of hardware goof is the AltaVista.com service itself. It used the DEC Alpha chip, which was the equivalent of a Fabergé egg that generated the heat of a gas tungsten arc welding device. Google invested time and effort in cobbling together a commodity hardware solution.

Third, Google looked at what work had to be done when indexing and query processing. The company had enough brain power to realize that the types of read-write processes that are part of standard operating systems and database systems would not be suitable for online services. Instead of embracing the traditional approach like every other commercial indexing outfit did in the 1998 to 2000 period (a critical one in Google’s technical development), Google started over. Instead of pulling an idea from the air, Google looked in the technical literature. Google took the bride’s approach to innovation: something borrowed, something new, etc. The result was what is now one of the core competitive advantages of Google–the suite of services that can deliver fast read speeds and still deliver acceptable performance when a Google Apps user saves a file.

Keep in mind that Google has been working on its business for a decade. Google is no start up. Google has a head start measured in years, not months or weeks.

Read more

Ask.com: The Official Search Engine of NASCAR

January 16, 2009

A surprised quack to the reader who alerted me to this San Francisco Business Times’s story, “Ask.com Inks NASCAR Deal.” You can read the story here. According to the story:

the goal of the deal is to make Ask the first search engine NASCAR fans use when seeking information about the sport. Ask will run the search box on NASCAR’s web site and will also make a “NASCAR toolbar” for users.

A number of years ago, Northern Light sponsored a race car. In tough economic times, maybe a NASCAR deal will work. On the other hand, NASCAR is experiencing a bit of a downturn. Toyota seems to be pulling in its NASCAR horns. Ask.com is taking a contrarian view of NASCAR. The move won’t affect Ask.com’s Web traffic in my opinion.

Stephen Arnold, January 16, 2009

Leapfish Launches

January 16, 2009

LeapFish is a metasearch engine. The company calls its system “a multi dimensional engine.” LeapFish Inc. is a privately held corporation headquartered at the CARR America Corporate Center in Pleasanton, California. The company’s metasearch technology uses a proprietary hyper-threading technology. The Marketwatch.com story here said:

LeapFish pushes search to 2.0 and states “out with the search button.” LeapFish’s revolutionary new click free search interface gives life to a fast, fluid and dynamic search experience that extracts the variety of data from major online destinations such as Google, YouTube, eBay and others in a single search query. Consolidating a knowledge base of relevancy and variety from major online authorities, LeapFish effectively renders more comprehensive results than those returned by its providers.

The addled geese at Beyond Search find metasearch systems useful. Our favorite–EZ2Find.com–has gone out of business. Unlike EZ2Find.com, LeapFish uses an uncluttered splash screen. The naked screen of Ixquick.com is more spare, but LeapFish strikes a good balance between point-and-get-started and the Google-type search box of Ixquick.com.

We liked the ability to limit the query to content type. LeapFish has done some work to tame the not-so-good search experience we have encountered on Google’s blogsearch service. The mouse action was a hair trigger when viewing results from a shopping query.

Leapfish is an interesting service, and we will add it to our list of systems to monitor. More information about the company is at http://www.leapfish.com/AboutUs.aspx.

Search in the Bartz Era at Yahoo

January 16, 2009

The Beyond Search geese have been honking speculatively today about Yahoo search in the post-floundering era. We decided that it was a miracle that Yahoo has been able to keep its revenues where they are and maintain a 20 percent share of the Web search market. Several of the Beyond Search goslings use Yahoo for mail, photo browsing, and bookmark surfing. Others don’t think too much of Yahoo for various reasons. These range from lousy performance over some wireless services to features that seem clunky compared to alternatives available from other vendors.

We read closely Rebecca Buckman’s “The Exacting Standards of Carol Bartz” and found the Forbes article interesting. You can read it here. Unlike some of the critical articles about Carol Bartz, Ms. Buckman focuses on her accomplishments. One interesting parallel is that the “freewheeling culture” of Autodesk and the wild and crazy approach at Yahoo may share some similarities. Ms. Bartz made staff changes and “professionalized” some departments. Yahoo may benefit from this type of management.

Our Beyond Search discussion focused on search, specifically what we perceive as the “problem” with Yahoo search. In order to make Yahoo search more useful, Yahoo has to find a way to address such shortcomings as the spotty relevancy for Web queries that are not about popular topics. The search available for Yahoo shopping is not very useful. In fact, it is on a par with eBay’s current system, and that is quite disappointing. Even convenience services such as finding currency conversion data become an exercise in navigating multiple pages. “Search without search” is something that Yahoo needs to master.

In order to remediate Yahoo search, we think that some serious engineering must be done and completed quickly. At lunch we ran several test queries. For example, one was “enterprise search”. The results were surprising. Here’s the display we saw:

yahoo result jan 15

We liked the search suggestions, but we found that the first four results were skewed to Microsoft. For example, there is the Microsoft paid ad in the blue box. That’s the second result. In the organic results, we saw a link to the Yahoo and IBM free search system, which is a boosted result. The Wikipedia result is okay. But the third and fourth results are for Microsoft search pages. The results are not “bad”; the results were just not what we expected. You can run your own queries and see how the Yahoo search results work for you.

A test shopping query was “discount quad core”. The system returned computer systems from brand name vendors. I think each of these systems is tagged with the word “discount”. These are not discount systems, however.

yahoo discount quad core

How can these search issues be fixed? Is tweaking enough? Will Yahoo’s many different search initiatives ultimately lead to a system that is “better” than Google’s in the eyes of the users?

Here’s the Beyond Search lunch time view:

  1. Yahoo has to work on relevance. Google has made a significant investment in technology to determine context and react to what other users find helpful. Yahoo seems to lag in these areas.
  2. In terms of mobile search, the Yahoo system requires menu navigation. Because of the clunkiness of the approach, it is difficult to determine if Yahoo is doing much more than dumping information into buckets and showing stories as those stories arrive.
  3. For shopping, Yahoo gets a user close to a product, but Yahoo makes it difficult to find a specific product. We don’t think eBay or Google have cracked the code on shopping search. Yahoo might be able to leapfrog some of the competitors with an innovative approach.

The problem with addressing all or some of these challenges is that it will take time to come up with a solution that is not a one-off, stand-alone island. Yahoo has not focused on search as part of the core fabric of the company. At Google, search and advertising are tough to separate. At Yahoo, search is one thing. Advertising is another. Yahoo, therefore, must think of ways to integrate so the two functions yield an advantage over Google.

Yahoo has the talent and the funds to address these issues. What Yahoo does not have, we concluded, is time. In fact, time may be Yahoo’s biggest single problem. Floundering can be rectified with time. Without time, Yahoo will remain a shadow of its former self. Even a deal with Microsoft can’t change that.

Meantime, the Google maintains its lead in search and advertising. A decade of search missteps cannot be fixed overnight. Ms. Bartz may have the expertise, but does she have the time? We quacked loudly, “We don’t think so.”

Stephen Arnold, January 16, 2009

Google Query Volume

January 16, 2009

TelecomPaper.com, one of my favorite sources, reported here that Google conducted 5.42 billion search queries in December 2008. Let’s assume that this number is accurate. The table below shows how the alleged query volume breaks down in queries per second. The chart is probably not too interesting to anyone but an addled goose like me, but I find the scaling implications quite interesting. Two search system vendors have asserted to me that their systems are faster than Google’s. To put these vendors’ assertions in context, I use these tables.

174,838,710 queries per day
7,284,946 queries per hour
121,416 queries per minute
2,024 queries per second

If there’s an error in this quick calculation, please let me know. To support these queries Google is showing more ads and performing other functions as well. My thought is that Google seems to have a reasonably robust system.

Stephen Arnold, January 16, 2009
