Google: Warning Bells Clanging

February 19, 2009

Henry Blodget wrote “Yahoo Search Share Rises Again… And Google Falls” here. The hook for the story is comScore tracking data showing that Google’s share of the Web search market “dropped a half point to 63%.” Mr. Blodget added, quite correctly, “You don’t see that every day.” Mr. Blodget also flags Yahoo’s increase in search share, which jumped to 21%. Yahoo has made gains in share for the last five months. Congratulations to Yahoo.

Several comments:

  1. Data about Web search share is often questionable.
  2. Think back to your first day in statistics. Remember margin of error? When you have questionable data, a narrow gain or loss, and a data gathering system based on some pretty interesting collection methods, what do you get? You get Jello data. (A back-of-the-envelope calculation follows this list.)
  3. The actual Web log data for outfits like Google and Yahoo often tell the company employees a different story. How different? I was lucky enough last year to see some data that put Google’s share of the US Web search market north of 80 percent. So which data are correct? The point is that sampled data about Web search usage are often wide of the actual figures by 10 to 15 percent or more.
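To put point 2 in concrete terms, here is a back-of-the-envelope margin-of-error calculation. The panel size below is a made-up figure for illustration, not comScore’s actual methodology, so treat the output as a sketch of the idea rather than a statement about the real data.

```python
import math

# Rough 95 percent margin of error for a share estimate drawn from a sampled panel.
# The panel size is a hypothetical figure for illustration, not comScore's.
def margin_of_error(share, sample_size, z=1.96):
    return z * math.sqrt(share * (1 - share) / sample_size)

share = 0.63        # reported Google share of Web search
panel = 50000       # hypothetical number of panelists behind the estimate
moe = margin_of_error(share, panel)
print("63% +/- {:.2f} points".format(moe * 100))  # about +/- 0.42 points

# A half-point month-to-month move is not much larger than that band,
# which is one reason small swings in sampled share data deserve skepticism.
```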

Is Google in trouble? Not as much trouble as Yahoo. Assume the data are correct. The spread between Google and Yahoo is still about 40 percentage points. Alarmism boosts traffic more easily than Yahoo can boost its share of the Web search market in my opinion.

Stephen Arnold, February 18, 2009

Alacra Raises Its Pulse

February 19, 2009

Alacra Inc., http://www.alacra.com/, released its Pulse Platform today. Along the lines of Beyond Search’s own Overflight service, Alacra’s Pulse finds, filters, and packages Web-based content by combining semantic analysis and an existing knowledge base to target and analyze more than 2,000 hand-selected feeds. The first platform app is Street Pulse, which carves out “what do key opinion leaders have to say…” about a given company. A press release says it “integrates comments from sell-side, credit analysts, industry analysts and a carefully vetted list of influential bloggers.” It’s offered free at http://pulse.alacra.com. There’s also a clean, licensed professional version with bells and whistles like e-mail alerts. More apps will follow that may focus on the hush-hush company droppings that everyone loves to muck through. Alacra’s jingle is “Aggregate, Integrate, Package, Deliver.” Since so many people are already wading through info dumps, we see this service growing into a critical search resource. We’re certainly on board the bandwagon and will be watching for further developments.

Jessica W. Bratcher, February 19, 2009

Semantics in Firefox

February 19, 2009

Now available: the wonders of semantic search, plugged right into your Mozilla Firefox browser. headup started in closed testing but is now a public beta downloadable from http://www.headup.com or http://addons.mozilla.org. You do have to register for it because Firefox lists it as “experimental,” but the reviews at https://addons.mozilla.org/en-US/firefox/reviews/display/10359 are glowing. A product of SemantiNet, this plugin is touted to enable “true semantic capabilities” for the first time within any Web page. headup’s engine extracts customized info based on its user and roots out additional data of interest from across the Web, including social media sites like Facebook and Twitter. Looks like this add-on is a step in the right direction toward bringing the Semantic Web down to earth. Check it out and let us know what you think.

Jessica Bratcher, February 19, 2009

Mahalo: SEO Excitement

February 18, 2009

If a Web site is not in Google, the Web site does not exist. I first said this in 2004 in a client briefing before my monograph The Google Legacy was published by Infonortics Ltd. The trophy MBAs laughed and gave me the Ivy-draped dismissal that sets some Wall Street wizards (now former wizards) apart. The reality then was that other online indexing services were watching what Google favored and taking their cues from Google’s sparse comments about how it determined a Web site’s score. I had tracked down some of the components of the PageRank algorithm from various open source documents, and I was explaining the “Google method” as my research revealed it. I had a partial picture, but it was clear that Google had cracked the problem of making the first six or seven hits in a result list useful to a large number of people using the Internet. My example was the query “spears”. Did you get Macedonian spears or links to aboriginal weapons? Nope. Google delivered the pop sensation Britney Spears. It meant zero to me, but with Google’s surging share of the Web search market at that time, Google had hit a solid triple.
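For readers who want the flavor of the method mentioned above, here is a minimal, textbook-style power-iteration sketch of PageRank. The toy link graph and damping factor are my own illustrations of the published algorithm, not anything from Google’s production system.

```python
# Minimal PageRank power iteration over a toy link graph.
# Textbook formulation only; the graph and damping factor are illustrative.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {page: 1.0 / len(pages) for page in pages}
    for _ in range(iterations):
        new_rank = {}
        for page in pages:
            # Rank flowing in from every page that links to this one.
            inbound = sum(
                rank[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_rank[page] = (1 - damping) / len(pages) + damping * inbound
        rank = new_rank
    return rank

# Dangling pages (no outbound links) are ignored here, so ranks will not sum to 1.
toy_graph = {
    "fan-site": {"britney-spears"},
    "news-story": {"britney-spears", "macedonian-spears"},
    "britney-spears": {"fan-site"},
    "macedonian-spears": set(),
}
print(pagerank(toy_graph))  # heavily linked pages accumulate the most rank
```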

The SEO (search engine optimization) carpetbaggers sensed a pot of gold at the end of desperate Web site owners’ ignorance. SEO provides advice and some technical services to boost a Web page’s or a Web site’s appeal to the Google PageRank method. Over the intervening four or five years, a big business has exploded to help a clueless marketing manager justify the money pumped into a Web site. Most Web sites get minimal traffic. Violate one of Google’s precepts, and the Web site can disappear from the first page or two of Google results. Do something really crazy like BMW’s spamming or Ricoh’s trickery, and Googzilla removes the offenders from the Google index. In effect, the Web site disappears. This was bad a couple of years ago, but today, it is the kiss of death.

I received a call from a software company that played fast and loose with SEO. The Web site disappeared into the depths of the Google result list for my test queries. The aggrieved vice president (confident of his expertise in indexing and content) wanted to know how to get back in the Google index and then to the top of the results. My answer then is the same as it is today, “Follow the Google Webmaster guidelines and create compelling content that is frequently updated.”

Bummer.

I was fascinated with “Mahalo Caught Spamming Google with PageRank Funneling Link Scheme” here. The focal point of the story is that Mahalo, a company founded by former journalist Jason Calacanis, allegedly “was caught ranking pages without any original content, in clear violation of Google’s guidelines.” The article continued:

And now he has taken his spam strategy one step further, by creating a widget that bloggers can embed on their blogs.

You can read the Web log post and explore the links. You can try to use the referenced widget. Have at it. Furthermore, I don’t know if this assertion is 100 percent accurate. In fact, I am not sure I care. I see this type of activity, whether real or a thought experiment, as reprehensible. Here’s why:

  1. This gaming of Google and other Web indexing systems costs the indexing companies money. Engineers have to react to the tricks of the SEO carpetbaggers. The SEO carpetbaggers then try to find another way to fool the Web indexing system’s relevance ranking method. A thermonuclear war ensues, and the cost of this improper behavior sucks money from other needed engineering activities.
  2. The notion that a Web site will generate traffic and pay for itself is a fantasy. It was crazy in 1993 when Chris Kitze and I started work on The Point (Top 5% of the Internet), which was quite similar to some of the Mahalo elements. There was no way to trick Lycos or Harvest because it was a verifiable miracle if those systems could update their indexes and handle queries with an increasing load and what is now old-fashioned, inefficient plumbing. Somehow a restaurant in Louisville, Kentucky, or a custom boat builder in Arizona thinks a Web site will automatically appear when a user types “catering” or “custom boat” in a Google search box. Most sites get minimal traffic, and some may be indexed on a cycle ranging from several days to months. Furthermore, some sites are set up in such a wacky way that the indexing systems may not try to index the full site. The problem is not SEO; the problem is a lack of information about what’s involved in crafting a site that works.
  3. Content on most Web sites is not very good. I look at my ArnoldIT.com Web site and see a dumping ground for old stuff. We index the content using the Blossom search system so I can find something I wrote in 1990, but I would be stunned if I ran a query for “online database” and saw a link to one of my essays. We digitized some of the older stuff, but no one, I repeat, no one looks at the old content. The action goes to the fresh content on the Web log. The “traditional” Web site is a loser except for archival and historical uses.

The fact that a company like Mahalo allegedly gamed Google is not the issue. The culture of cheating and the cult of SEO carpetbaggers make this type of behavior acceptable. I get snippy little notes from those who bilk money from companies that want to make use of online but don’t know the recipe. The SEO carpetbaggers sell catnip. What these companies need is boring, dull, and substantial intellectual protein.

Google, Microsoft, and Yahoo are somewhat guilty. These companies need content to index. The SEO craziness is a cost of doing business. If a Web site gets some traffic when new, that’s by design. Over time, the Web site will drift down. If the trophy generation Webmaster doesn’t know about content and freshness, the Web indexing companies will happily sell that Webmaster traffic.

There is no fix. The system is broken. The SEO crowd pays big money to learn how to trick Google and other Web indexing companies. Then the Web indexing companies sell traffic when Web sites don’t appear in a Google results list.

So what’s the fix? Here are some suggestions:

  1. A Web site is a software program. Like any software, a plan, a design, and a method are needed. This takes work, which is reprehensible to some. Nevertheless, most of the broken Web sites cannot be cosmeticized. Some content management systems generate broken Web sites as seen by a Web indexing system. Fix: when possible, start over and do the fundamentals.
  2. Content has a short half-life. Here’s what this means. If you post a story once a month, your Web site will be essentially invisible even if you are a Fortune 50 company. Exceptions occur when an obscure Web site breaks a story that is picked up and expanded by many other Web sites. Fix: write compelling content daily or, better yet, more frequently.
  3. Indexing has to be useful to humans and content processing systems. Stuffing meaningless words into a metatag is silly and counterproductive. Hiding text by tinting it to match a page’s background color is dumb. Fix: find a librarian or, better yet, take a class in indexing. Select meaningful terms that describe the content or the page accurately. The more specialized your terminology, the narrower the lens. The broader the term, the wider the lens. Broad terms like “financial services” are almost useless, since the bound phrase is devalued. Try some queries looking for a financial services firm in a mid-sized city. Tough to do unless you get a hit in http://local.google.com, look up the company in a local business publication, or ask a friend. (A small illustration of term specificity follows this list.)
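On the point about broad terms in item 3, a crude way to see why “financial services” buys so little is inverse document frequency: a phrase that appears in almost every document in a collection carries almost no discriminating power. The five-document corpus below is invented purely for illustration.

```python
import math

# Inverse document frequency as a rough proxy for how much a term narrows the lens.
# The tiny "corpus" is invented for illustration only.
corpus = [
    "financial services firm offering retirement planning",
    "financial services company with wealth management",
    "boutique municipal bond advisory for mid-sized cities",
    "financial services news and commentary",
    "custom boat builder financing and services",
]

def idf(term, documents):
    matches = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / (1 + matches))

for term in ("financial services", "municipal bond advisory"):
    print("{!r}: idf = {:.2f}".format(term, idf(term, corpus)))

# The broad phrase scores far lower than the specific one, which is why
# specific terminology describes a page more usefully than a catch-all label.
```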

As for Mahalo, who cares? The notion of user-generated link lists curated by a subject matter expert worked in 1993. The method has been replaced by http://search.twitter.com or asking a friend on Facebook.com. Desperate measures are needed when traffic goes nowhere. Just don’t get caught is the catchphrase in my opinion.

Stephen Arnold, February 18, 2009

Twitter and Search

February 18, 2009

I read Peter Hershberg’s “Does Twitter Represent the Future of Search? Or Is It the Other Way Around?” here. The article begins with a reference to search engine optimization guru Danny Sullivan and then races forward with this argument:

people are increasingly turning to Twitter — rather than Google and Yahoo — when looking for information on breaking news.  This is a trend we highlighted in our 2009 predictions post at the end of last year.  For proof of Twitter’s real-time search capabilities all you need to do is look back at last week’s plane crash in the Hudson to see where the news initially broke.  People were talking about the event for several minutes on Twitter before the first mentions of it on Google News or any major media site, for that matter.

For me, the most interesting comment in the article was:

My personal view is that Google and Yahoo haven’t come up with Twitter solutions simply because they did not initially understand what Twitter represents from a search perspective. Twitter themselves may have failed to grasp this initially, before Summize came into the mix. It’s unlikely that either Google or Yahoo saw Twitter’s potential as a search engine.  So, it’s only now that they’re probably starting to put adequate resources behind developing a strategy in this area, though I have to believe that it’s become a very high priority, particularly for Google. That’s where this issue gets really interesting – particularly for someone like me who views social media through the lens of search.

The wrap up made a good point:

To this point, the “Twitterverse” has pretty much been living in a bubble – one where all updates are made and consumed within Twitter and its associated applications alone and where some believe that having 10,000 followers means that you are an authoritative or influential figure.  While I believe that is, in fact, the case for some (and I won’t diminish the value in having a large following), the volume of traffic some individual Twitter updates will receive from organic search will dwarf what they are typically able to generate from Twitter alone.  It also means that Twitter accounts with fewer followers – but with something important and to say on a given topic – will start to see some increased attention as well.  Much like many of the early bloggers did.  And when that happens, the whole question of influence and authority will once again be turned on its head.

As I thought about this good write up, I formulated several questions:

  1. Will Google’s play be to provide a dataspace in which Twitter comments and other social data are organized, indexed and made useful?
  2. In a Twitterspace, will new types of queries become essential; for example, provenance and confidence?
  3. Will Google, like Microsoft, be unable to react to the opportunity of real time search and spend time and money trying to catch up with a train that has left the station?

I have no answers. Twitter is making real time search an important tool for users who have no need for the dinosaur caves of archived data that Google continues to build.

Stephen Arnold, February 18, 2009

Yahoo and Its New Mobile Service

February 18, 2009

Yahoo News posted “Yahoo Mobile Aims to Channel Your Inner iPhone” here. Yahoo access on my various mobile devices seemed to require quite a bit of menu shuffling. I also found the interface’s refusal to remember my login name somewhat idiosyncratic. But the system worked. The new service as described in the news story seemed to me to be a giant step forward. The news release said:

Yahoo Mobile will be released in three versions — one for the mobile Web, one for the iPhone, and one for other smartphones… Yahoo’s onePlace is also available in all three editions. The service lets a user access and manage, from a single location, favorite content such as news topics and sources, RSS feeds, sports scores, weather conditions, stock quotes, blogs, movie theaters, or horoscopes… In the smartphone version, users can also use oneSearch’s voice-search feature simply by talking. It also offers maps; an integrated mini-version of the popular mobile Web browser Opera; and widgets, which are small applications that provide various services that can be mixed and matched.

I fired up my smartphone and navigated to Yahoo, following the same steps I had used prior to my test on February 17, 2009, at 5 pm Eastern. Instead of a new Yahoo service or the old Yahoo service, here’s what I saw:

[Screenshot: Yahoo mobile message]

Sigh. I understand that new Yahoo is not available, but what about old Yahoo?

Stephen Arnold, February 18, 2009

Google from Near $600 to $300 Price Target

February 18, 2009

Most of the folks in Harrod’s Creek don’t pay much attention to California outfits’ share price targets. Truth be told, old geese like me don’t either. You may find Eric Savitz’s “Google: The Case against a Second Half Recovery” here just what your inner investor requires. The point of the write up is that the GOOG is in for some tough sledding. How tough? Well, if you bought Google at $550, you will experience a loss if you dump your shares. If you don’t have Google, no worries.

Stephen Arnold, February 18, 2009

Amazon’s Implicit Metadata

February 18, 2009

Amazon is interesting to me because the company does what Google wants to do but more quickly. The other facet of Amazon that is somewhat mysterious is how the company can roll out cloud services with a smaller research and development budget than Google’s. I have not thought much about the A9 search engine since Udi Manber left to go to Google. The system is not too good from my point of view. It returns word matches but it does not offer the Endeca-style guided navigation that some eCommerce sites find useful for stimulating sales.

Intranet Insights disclosed here that Amazon uses “implicit metadata” to index Amazon content. I can’t repeat the full list assembled by Intranet Insights, and I urge you to visit that posting and read the article. I can highlight three examples of Amazon’s “implicit metadata” and offer a couple of observations.

Implicit metadata means automatic indexing. Authors or subject matter experts can manually assign index terms or interact with a semi-automated system such as that available from Access Innovations in Albuquerque, New Mexico. But humans get tired or fall into a habit of using a handful of common terms instead of consulting a controlled term list. Software does not get tired and can hit 90 percent accuracy once properly configured and resourced. Out of the box, automated systems hit 70 to 75 percent accuracy. I am not going to describe the methods for establishing these scores in this article.

Amazon uses, according to Intranet Insights:

  • Links to and links from, which is what I call the Kleinberg approach made famous by Google’s PageRank method
  • Author’s context; that is, on what “page” or in what “process” the author was when the document or information object was created. Think of this as a variation of the landing page for an inbound link or an exit page when a visitor leaves a Web site or a process
  • Automated indexing; that is, words and phrases.

The idea is that Amazon gathers these data and uses them as metadata. Intranet Insights hints that Amazon uses other information as well; for example, comments in reviews and traffic analysis.
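To make the idea concrete, here is a rough sketch of how signals like inbound and outbound links, authoring context, and automatically extracted phrases might be bundled into a single index record. The field names and the toy phrase extractor are my assumptions, illustrating the concept in the Intranet Insights post rather than Amazon’s actual pipeline.

```python
from collections import Counter
from dataclasses import dataclass, field

# Illustrative only: one way to bundle "implicit metadata" signals into an
# index record. Field names and logic are assumptions, not Amazon's system.
@dataclass
class ImplicitMetadata:
    inbound_links: list = field(default_factory=list)    # who links to this object
    outbound_links: list = field(default_factory=list)   # what this object links to
    author_context: str = ""                              # page or process at creation time
    key_phrases: list = field(default_factory=list)       # output of automated indexing

def extract_key_phrases(text, top_n=3):
    # Stand-in for real automated indexing: the most frequent non-trivial words.
    words = [word.strip(".,").lower() for word in text.split() if len(word) > 4]
    return [word for word, _ in Counter(words).most_common(top_n)]

record = ImplicitMetadata(
    inbound_links=["/review/12345", "/list/summer-reading"],
    outbound_links=["/product/67890"],
    author_context="product-review-form",
    key_phrases=extract_key_phrases(
        "Concise review praising the battery, the battery life, and the display."
    ),
)
print(record)
```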

The Amazon system puzzles me when I run certain queries. Let me give some examples of problems I encounter:

  1. How can one search lists of books created by users? These exist, but for me the unfamiliar list is often difficult to locate and I cannot figure out how to find a particular book title on lists in that Amazon function. Why would I want to do this? I have a title but I want to see other books on lists on which the title appears. If you know how to run this query, shoot me the search string so I can test it.
  2. How can I filter a results list to eliminate books that are not yet published and see only books that are available? This situation arises when looking for Kindle titles using the drop downs and search function for titles in the Kindle collection. The function should be possible because the “recommendations” feature segments forthcoming titles from available titles, but the capability eludes me in the Kindle subsite.
  3. How can I run a query and see only the reviews by specific reviewers? I know that a prolific reviewer is Robert Steele. So, I want to see the books he has reviewed and I want to see the names of other reviewers who have reviewed a specific title that Mr. Steele has reviewed.

Amazon’s search system, like the one Google provides for Apple.com, is a frustrating experience for me. Amazon has lost sight of some of the basic principles of search; namely, if you have metadata tags, a user should be able to use them to locate the information in the public index.

This is not a question of implicit or explicit metadata. Amazon, like Apple, is indifferent to user needs. The focus is on delivering a minimally acceptable search service that satisfies the needs of the average user for the express purpose of moving the sale along. The balance, I believe, is between system burden and generating revenue. Amazon search deserves more attention in my opinion.

Stephen Arnold, February 18, 2009

Precognitive Search

February 17, 2009

Charles Hudson’s “The Database of Intentions Is More Valuable than the Database of Musings for Now (Google and Twitter)” is an interesting article. The notion of putting Google at one end of a spectrum and Twitter at another intrigues me. You can find the write up here. A number of buzzwords have been pushed off the cliff in an effort to capture the shift from historical search to real time search. For example, there was the word attention as used in the phrase “the attention economy”. Then there was the word “conversation” to describe Web log posts and the ripostes that would appear in the comments section. Then came Buyology: Truth and Lies about Why We Buy by Martin Lindstrom, a clever bit of word play, which argues on the basis of brain scans that people make decisions without conscious thought. The closest the average Web user will get to this type of precognitive thinking is by running a query on http://search.twitter.com. Certain entities of various governments have somewhat similar functions, but those are not available to anyone with a Web browser. Twitter.com is a public stream of brief comments. Mr. Hudson tackles this notion, and he offers some excellent observations. If you are interested in the future of real time search, read his essay. He doesn’t provide context for some of his assertions, but he does make the spectrum clear and sets the stage for additional thinking about these services.

Stephen Arnold, February 17, 2009

Google and Phony Betas

February 17, 2009

I believe Google when it slaps a “beta” label on a service. Google’s system was built to deliver results to queries. The reading and writing functions are a little bit of wizardry, a dash of hack, and some old-fashioned rethinking of known problems of massively parallel distributed systems. Think Chubby and its file and record locking and unlocking functions in the context of Google’s scale of operation.
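Chubby itself is internal to Google, so the sketch below is only a conceptual stand-in: a coarse-grained lock with a lease, which is the flavor of locking and unlocking the paragraph alludes to. The class name, behavior, and single-process setting are my assumptions, not Chubby’s actual interface.

```python
import time

# Conceptual stand-in for a Chubby-style coarse-grained lock with a lease.
# Single-process toy for illustration, not Google's distributed implementation.
class LeasedLock:
    def __init__(self, lease_seconds):
        self._lease_seconds = lease_seconds
        self._holder = None
        self._expires_at = 0.0

    def acquire(self, client):
        now = time.monotonic()
        # A lapsed lease means the previous holder can no longer be trusted,
        # so the lock is treated as free again.
        if self._holder is not None and now < self._expires_at:
            return False
        self._holder = client
        self._expires_at = now + self._lease_seconds
        return True

    def release(self, client):
        if self._holder == client:
            self._holder = None

lock = LeasedLock(lease_seconds=10.0)
if lock.acquire("worker-1"):
    # ... read or write the shared file or record while the lease is valid ...
    lock.release("worker-1")
print(lock.acquire("worker-2"))  # True: worker-1 released its lease
```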

Now Macworld’s “Don’t Be Fooled by Google’s Phony Beta Label” by Mike Elgan tried to convince me that Google’s beta label is bogus. The argument is that Google is playing a marketing game. I don’t agree. I think that many of Google’s beta products and services are closer to alphas. The vaunted GMail is still in beta, and it should be. My sources have suggested to me that the GOOG continues to twiddle the knobs and fiddle with the settings of GMail. The system is improving, but it is not yet fully mature. The Postini functions are sort of there, but not fully integrated. I could generate a list of these issues with GMail, but I want to point to the most interesting comment in Mr. Elgan’s write up:

The truth is that designating new features as “experimental” and announcing them only on a blog is just a charade, a marketing gimmick. It’s just Google’s way of having it both ways. It launches apps and features that grab market share, attract eyeballs and give it the traffic it needs to make billions of dollars per fiscal quarter. But gosh, gee, it’s just little old us trying out a few ideas, so don’t criticize! Hey, we can all play that game. This publication you’re reading now uses a revenue model similar to Google’s. You’re reading this for free, but the publication makes money by selling the advertising you see on this page. The publishing company pays me to write it out of money earned from those advertising dollars.

I am not a sufficiently silly goose to do much with Gmail, my Google Apps account, or some of the other Google gizmos that I explore. In fact, on February 14, 2009, my AdSense report was not available. Yep, beta, even though Google is like an elephant balancing on its revenue trunk with its advertising service. If Google can’t get advertising right, I ask myself, “When will Gmail be ready for prime time?” The Chrome gizmo is out of beta, and I think it should still be in alpha.

Interesting write up.

Stephen Arnold, February 17, 2009
