Deep Web, Surface Sparkles Occlude Deeper Look

February 23, 2009

You can read pundits, mavens, and wizards comment on the New York Times’s “Exploring a Deep Web that Google Can’t Grasp.” The original is here for a short time. Analyses of varying degrees of usefulness appear in Search Engine Land and in the Marketing Pilgrim’s “Discovering the Rest of the Internet Iceberg” here.

There’s not much I can say to reverse the flow of misinformation about what Google is doing because Google doesn’t talk to me or to the pundits, mavens, and wizards who explain the company’s alleged weaknesses. In 2007, I wrote a monograph about Google’s programmable search engine disclosures. Published by Bear Stearns, this document is no longer available. I included the dataspace research in my Beyond Search study for The Gilbane Group in April 2008. Then, in September, Sue Feldman and I wrote about Google’s dataspace technology. You can get a copy of the dataspace report directly from IDC here. Ask for document 213562. Both of these studies explicate Google’s activities in structured data and how those data mesh with Google’s unstructured information methods. I did a detailed explanation of the programmable search engine inventions in Google Version 2.0. That report is still available, but it costs money and I will be darned if I will restate information that is in a for-fee study. There are some brief references to these technologies available at without charge and in the archive to this Web log. You can search the archive at and this Web log from the search box on any blog page.

lga sfo

This sure looks like “deep Web” information to me. But I am not a maven, wizard, or pundit. Nor do I understand search with the depth of the New York Times, search engine optimization experts, and trophy generation analysts. I read patent documents, an activity that clearly disqualifies me from asserting that Google can’t perform a certain action based on its open source disclosures. Life is easier when such disclosures are ignored or excluded from the research process.

So what? Two points:

  1. Google can and does handle structured data. Examples exist in the wild at and by entering the query “lga sfo” from’s search box.
  2. Yip yap about the “deep Web” has been popular for a while, and it is an issue that requires more analysis than assertions based on relatively modest research into the subject.

In my opinion, before asserting that Google is baffled, off track, clueless, or slow on the trigger, look a bit deeper than the surface sheen on Googzilla’s scales. No wonder outfits are surprised by some of Google’s “effortless” initiatives. Those who deal only in superficialities miss the substance that resides under the surface.

Pundits, mavens, wizards, please, take a moment to look into Guha, Halevy, and the other Googlers who have thought about and who are working on structured, semistructured, and unstructured data in the Google data environment. That background will provide some context for Google’s apparent sluggishness in this “space”.

Stephen Arnold, February 23, 2009


2 Responses to “Deep Web, Surface Sparkles Occlude Deeper Look”

  1. links for 2009-02-23 on February 23rd, 2009 11:34 pm

    […] Deep Web, Surface Sparkles Occlude Deeper Look ( […]

  2. Matthew Theobald on February 24th, 2009 4:49 am

    Hi Stephen,

    You might mention the idea of an emerging standard for organizing the “deep web”. I think most people are getting the Semantic web of meaning confused with the depths. Quick post on explains….

    Thanks for your help this summer and we’ll seek your help again.
    But next time I’ll hit you up front. 🙂

