May 3, 2008

In May 2005, I gave a short talk at Alan Brody’s iBreakfast program. An irrepressible New Yorker, Mr. Brody invites individuals to address a hand-picked audience of movers and shakers who work in Manhattan. I reported to the venue, zoomed through a look at Google’s then-novel index of scholarly information, and sat down.

Although I’ve been asked to address the group in 2006 and 2007, I was a flop. The movers and shakers were hungry for information related to search engine optimization. SEO, as the practice is called, specializes in tips and tricks to spoof Google into putting a Web site on the first page of Google results for a query and ideally in the top spot. Research and much experimentation have revealed that if a Web site isn’t on the first page of a Google results list, that Web site is a loser–at least in terms of generating traffic and hopefully sales.

I want to invest a few minutes of my time taking a look at the information I discussed in 2005, and if you are looking for SEO information, stop reading now. I want to explore Google Scholar. With most Americans losing interest in books and scholarly journals, you’ll be wasting your time with this essay.

Google Scholar: The Unofficial View of This Google Service

Google wants to index the world’s information. Scholarly publications are a small, yet intellectually significant, portion of the world’s information. Scholarly journals are expensive and getting more costly with each passing day. Furthermore, some university libraries don’t have the budgets to keep pace with the significant books and journals that continue to flow from publishers, university presses, and some specialized not-for-profit outfits like the American Chemical Society. Google decided that indexing scholarly literature was a good idea. Google explains the service in this way:

Google Scholar provides a simple way to broadly search for scholarly literature. From one place, you can search across many disciplines and sources: peer-reviewed papers, theses, books, abstracts and articles, from academic publishers, professional societies, preprint repositories, universities and other scholarly organizations. Google Scholar helps you identify the most relevant research across the world of scholarly research.

Google offers libraries a way to make their resources available. You can read about this feature here. Publishers, with whom Google has maintained a wide range of relationships, can read about Google’s policies for this service here. My view of Google’s efforts to work with publishers is quite positive. Google is better at math than it is at donning a suit and tie and kowtowing to the mavens in Manhattan, however. Not surprisingly, Google and some publishers find one another difficult to understand. Google prefers an equation and a lemma; some publishers prefer a big vocabulary and a scotch.

What a Query Generates

Some teenager at one of the sophisticated research firms in Manhattan determined that Google users are more upscale than Yahoo users. I’m assuming that you have a college education and have undergone the pain of writing a research paper for an accredited university. A mail order BA, MS, or PhD does not count. Stop reading this essay now.

The idea is that you select a topic from a short list of those provided by your teacher (often a graduate student or a professor with an expertise in Babylonian wheat yield or its equivalent). You trundle off to the dorm or library, and you run a query on the library’s research system. If your institution’s library has the funds, you may get access to Thomson Reuters’ databases branded as Dialog or the equivalent offerings from outfits such as LexisNexis (a unit of Reed Elsevier) or Ebsco Electronic Publishing (a unit of the privately held E.B. Stephens Company).

Google works with these organizations, but the details of the arrangements are closely guarded secrets. No one at the giant commercial content aggregators will say what that company’s particular relationship with Google embraces. Google–per its standard Googley policy–doesn’t say much of anything, but its non-messages are delivered with great good cheer by its chipper employees.

So, let’s run a query. The ones that work quite well are those concerned with math, physics, and genetics. Babylonian wheat yields, I wish to note, are not a core interest area of the Googlers running this service.

Here’s my query today, May 3, 2008: kolmogorov theorem. If you don’t know what this canny math whiz figured out, don’t fret. For my purpose, I want to draw your attention to the results shown in the screen shot below:

kolmogorov results

Navigate to http://scholar.google.com and enter the bound phrase Kolmogorov Theorem.

As I write this, I am sitting with a person who worked for Gene Garfield, the inventor of citation analysis. He was quite impressed with Google’s generating a hot link to other scholarly articles in the Google system that have cited a particular paper. You can access these by clicking the link. The screen shot below shows you the result screen displayed by clicking on “Representation Properties of Networks”, the first item in the result list above. You can locate the citation link by looking for a phrase after the snippet that begins “Cited by…” Mr. Collier’s recollection of the citation analysis was that Dr. Garfield, a former Bronx cab driver with two PhDs, believed that probability played a major role in determining the significance of journal articles. If a particular article were cited by reputable organizations and sources, there was a strong probability that the article was important. To sum up, citations that point to an article are votes. Dr. Garfield came up with the idea, and Messrs. Brin and Page were attentive to this insight. Mr. Page acknowledged Dr. Garfield’s idea in the PageRank patent document.

kolmogorov citations
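
The “citations are votes” idea is easy to try on a toy example. What follows is a minimal sketch in Python, not Google’s method and not Dr. Garfield’s. The paper names, the `cites` table, and the function names are all made up for illustration. The first two functions invert the citation lists to produce a “Cited by” view and a raw vote count. The third is a textbook-style iteration in which a vote from a heavily cited paper counts for more; the damping value of 0.85 is an assumption, chosen only because it is a commonly quoted default.

```python
from collections import defaultdict

# Hypothetical toy data: each paper maps to the papers it cites.
cites = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c", "paper_a"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_b"],
}


def cited_by(cites):
    """Invert the citation lists: paper -> sorted list of papers citing it.

    Only papers that appear as keys in `cites` are reported.
    """
    incoming = defaultdict(list)
    for source, targets in cites.items():
        for target in targets:
            incoming[target].append(source)
    return {paper: sorted(incoming[paper]) for paper in cites}


def vote_counts(cites):
    """Count one vote for every incoming citation."""
    return {paper: len(sources) for paper, sources in cited_by(cites).items()}


def weighted_votes(cites, damping=0.85, rounds=50):
    """Weight each vote by the score of the paper that casts it.

    This is a simplified, illustrative iteration. The damping value and
    the number of rounds are assumptions, not figures from any source.
    """
    papers = list(cites)
    n = len(papers)
    score = {paper: 1.0 / n for paper in papers}
    for _ in range(rounds):
        nxt = {paper: (1.0 - damping) / n for paper in papers}
        for source, targets in cites.items():
            if targets:
                # A paper splits its score evenly among the papers it cites.
                share = damping * score[source] / len(targets)
                for target in targets:
                    nxt[target] += share
            else:
                # A paper that cites nothing spreads its score over everyone.
                share = damping * score[source] / n
                for paper in papers:
                    nxt[paper] += share
        score = nxt
    return score


if __name__ == "__main__":
    print(cited_by(cites))
    print(vote_counts(cites))
    print(weighted_votes(cites))
```

In this toy table, paper_c collects the most votes under either count. The weighted version matters more on larger graphs, where a single citation from a well-regarded paper can outweigh several from obscure ones.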

Note that you can perform two additional functions from the results list displayed when you click on “Cited by…” links. First, you can see related articles; that is, without rekeying a query, you can look at articles that are on the subject of errors resulting from certain mathematical procedures. Second, you can run a Web query to get the information contained in the general Google Web index. One of the major players in for-fee citation analysis calls its product the “web of science”. I don’t know about you, but Google has kicked the soccer ball through the net with its seamless integration of related articles and Web content.

You can fool around with this service on your own. You will undoubtedly come across these types of links and features. This example is not exhaustive, merely suggestive. I would like to highlight these interesting functions:

  1. Links with the form “BL direct” or some other type of “direct”. Click on one of these and Google whisks you off to the British Library Direct service. You can order the article or launch another query on the British Library’s own search system. You don’t get a “free” version of the article, but you have a way to get your hands on it if you are willing to pay a fee.
  2. Versions of the article. Scholars, not rural wackos like me, often recycle their insights in different journals. Google identifies these variants and makes them a hot link away. One little-known secret of scholarly publishing is that if you explore you may come across a version of a for-fee article that is either free or contains a more in-depth abstract of the author’s main argument. In either case, you can get more information before paying the British Library or some other vendor for the complete article.
  3. You can click one of the hot links above the search box such as “Images” and see visuals related to your query. For math and other technical subjects, a graphic makes the guts of the theorem more easily understood, at least to some people.
  4. You can click the “advanced” link and narrow the query by technical discipline. Again, the Babylonian scholars will be dismayed at the dearth of grain information. Scientists and engineers will be well served. On the advanced page, you will be able to limit the results by time. This is a feature sorely needed in the main Google index, and if Google muddles forward with its programmable search engine and other smart parsers, useful date features may lighten the researcher’s load for Web queries. For now, note that Google does time and does it quite well.

You will stumble upon other gizmos in this service as well. OCLC, a giant library information operation, works with Google. You will find that certain universities such as Stanford and Dartmouth are quite supportive of the GOOG’s scholarly efforts. I will leave you to your own investigations.

So What?

If you flash back to my 2005 talk, the audience’s non-reaction and its failure to throw objects at me indicate that few people know what to make of this “beta” service. My view of the importance of this service includes:

  1. As library budgets come under increased pressure, institutions are going to find that students find and use Google Scholar. As these little wizards and wizard-ettes move from the groves of academe to the cubicles in Bangalore and Budapest (sorry Boston), Google becomes not a Web index but a knowledge index. Project forward a cohort or two, and you have a fundamental change in how students will do “library” research. Smart libraries will want to ride the GOOG. Less progressive institutions may find themselves marginalized.
  2. Commercial database producers who can’t figure out what Google is doing with KNOL, Google Books, its publishing inventions, and other seemingly unrelated activities face a dilemma. On one hand, playing croquet with Google can generate new and much, much needed revenues. But an angry Google, armed with a croquet mallet, could send a ball into the publisher’s ankle, causing pain and possibly inflicting an irrational outburst. On the other hand, ignoring Google means that the publisher can arise one morning and discover that GOOG has taken the publishing company’s parking lot.
  3. The “web” connections within and at some point across content domains deliver a more potent research tool than those available elsewhere. Forget the ads. Forget the dowdy result lists. Google makes the information available with so little effort that a college rugby player can master the system between gulps of Gatorade. The system makes a smart researcher even smarter.

I ended my 2005 talk with three comments. I said that Google itself does not know what will emerge from the petri dish of its betas. You’ll be surprised. Publishers will be surprised. Authors will be surprised. Google itself will be surprised. I also noted that Google will produce with little effort “standalone, linked, network-centric, and hybrid products with little or no warning”. Finally I noted that Google can monetize scholarly “stuff” with ads and other techniques that the company can implement at any time, again with no warning. I for one would pay a subscription fee to Google to use Scholar. I would certainly snort and howl in this Web log. At the end of the day, I would glumly click Google Checkout and give the beast money.

It’s three years since I gave my iBreakfast talk. A penultimate question: Are you too willing to ignore this important demonstration of collection analysis, one-of-a-kind content, and point-and-click intelligence?

A final question: Would this type of linking be useful for enterprise content?

Let me know what you think.

Stephen Arnold, May 4, 2008


5 Responses to “Poking around Google Scholar Service”

  1. sperky undernet on May 4th, 2008 8:35 am

    I have used Scholar in ways that run circles around a certain commercial citation index including the social science one. The elance ten dollar an hour researchers do not use this, confusing it with wikipedia. BTW, my results ended up in a Fortune 500 company. Scholar can definitely be of enterprise content value but then so might certain comic books and novels.

  2. Daniel Tunkelang on May 4th, 2008 11:28 am

    Google Scholar is very nice when you know the title or, in some cases, the author of the article you are looking for. It also allows you to surf the citation graph using a specified article as your entry point. As such, it is a solid incremental improvement on the CiteSeer system developed in 1997.

    But, as a daily user of digital libraries, I find that Google Scholar does not help me with the highest-value use cases–namely, those where recall matters more than precision. Non-academic motives include prior art search for patents, supporting cases for freedom to operate, and competitive intelligence. Very similar to the concerns associated with discovery and compliance.

    These and other applications call for a system that supports exploratory search. For example, take a look at the Faceted DBLP, e.g. http://dblp.l3s.de/?q=kolmogorov+&search_opt=all&synt_query_exp=none&resTableName=query_resultkeHoee&year_range=2003-2007&author_url=&resultsPerPage=100

    From these results, even a non-expert might surmise that Kolmogorov is most known for Kolmogorov complexity, and that his work is relevant to clustering, machine learning, and data mining.

  3. Stephen E. Arnold on May 4th, 2008 2:38 pm

    My thought is about Google Scholar functionality. The various ways to move from hit to author to related topics is useful. I don’t recall seeing this in the default Google Search Appliance. Thanks for the posts!
    Stephen Arnold, May 4, 2008

  4. Constance Ard on May 5th, 2008 1:15 pm

    This post brings up yet another interesting realm of information literacy for librarians. Google Scholar may be the bridge that is necessary between Google and good information.

    I have had positive research experiences with Google Scholar but it has never been my first stop. As it continues to add content it may begin to be a top 5 stop instead of a last resort shop.

    University librarians may benefit from incorporating this into the information literacy curricula. Luckily, I don’t have to make those kinds of calls. I just have to find the answer.

