OCLC-SkyRiver Dust Up
December 16, 2010
In the excitement of the i2 Ltd. legal action against Palantir, I put the OCLC – SkyRiver legal hassle aside. I was reminded of the library wrestling match when I read “SkyRiver Challenges OCLC as Newest LC Authority Records Node.” I don’t do too much in libraries at this time. But OCLC is a familiar name to me; SkyRiver not so much. The original article about the legal issue appeared in Library Journal on July 29, 2010, “SkyRiver and Innovative Interfaces File Major Antitrust Lawsuit against OCLC.” Libraries are mostly about information access. Search would not have become the core function if it had not been for libraries’ early adoption of online services and their making online access available to patrons. In the days before the wild and wooly Web, libraries were harbingers of the revolution in research.
Legal battles are not unknown in the staid world of research, library services, and traditional indexing and content processing activities. But a fight between a household name like OCLC and a company with which I had modest familiarity is news.
Here’s the key passage from the Library Journal write up:
Bibliographic services company SkyRiver Technology Solutions recently announced that it had become an official node of the Name Authority Cooperative Program (NACO), part of the Library of Congress’s (LC) Program for Cooperative Cataloging. It’s the first private company to provide this service, which was already provided by the nonprofit OCLC—SkyRiver’s much larger competitor in the bibliographic services field—and the British Library. Previously, many institutions have submitted their name authority records via OCLC. But SkyRiver’s new status as a NACO node allows it to provide the service, once exclusive to OCLC in the United States, to its users directly.
For me, this is a poke in the eye for OCLC, an outfit that used me on a couple of projects when General K. Wayne Smith was running a very tight operation. I don’t know how management works at OCLC, but I think any action by the Library of Congress is going to trigger some meetings.
SkyRiver sees OCLC as acting in a non-competitive way. Now the Library of Congress has blown a kiss at SkyRiver. Looks like the library landscape, already ravaged by budget bulldozers, may be undergoing another change. I think the outline of the mountain range where the work is underway spells out the word “Monopoly.” Nah, probably my imagination.
Stephen E Arnold, December 16, 2010
Freebie
The Future of Books?
September 15, 2010
Short honk: I came across this while scanning items in my not-yet-dead Overflight service. Is this the future of books? I hope not. But with libraries facing budget pressure and library vendors scrambling, the optimal use of books may be to make furniture.
Source: http://i.imgur.com/ia1yy.jpg.
Stephen E Arnold, September 15, 2010
Freebie
Camelot to Go Viral
July 15, 2010
A Cambridge based search applications firm has been chosen by the John F Kennedy Presidential Library and Museum and the John F. Kennedy Library Foundation to help provide a search engine experience to go with the late president’s digitized archives.
Endeca Technologies has been hired to work on the project that will launch on January 20, 2011, which will be the 50th anniversary of the inauguration. The idea behind digitizing Camelot is to make the whole array of the JFK archives available to everyone from historians to schoolchildren.
Endeca’s information access solutions have long been helping people and businesses to explore, analyze, and understand information in a variety of different ways. Their solutions cover a wide variety of areas from retail to media and publishing.
Rob Starr, July 15, 2010
Freebie
Oxford Flexes Its Reference Muscles
April 22, 2010
I go to a gym every couple of days when I am in town. It so happens that a number of semi pro wrestlers go to the gym. Big people. Tattoos. Muscles. I am an old wimp and I graciously give up my place when one of these steroid stallions trots to the workout station I favor. Academics have muscles, but I think that my image of a muscular academic and one from Oxford University at that is of a milder, more gentle giant.
The Oxford muscle builders have turned their attention to creating online bibliographies. I think, based on reading the write up “Oxford University Press Launches the Anti-Google” that these will be variants of the old Goldentree bibliographies or the type of reference book Constance Winchell cranked out.
Here’s a synopsis of the product:
The OBO [Oxford Bibliographies Online] tool is essentially a straightforward, hyperlinked collection of professionally-produced, peer-reviewed bibliographies in different subject areas—sort of a giant, interactive syllabus put together by OUP and teams of scholars in different disciplines. Users can drill down to a specific bibliographic entry, which contains some descriptive text and a list of references that link to either Google Books or to a subscribing library’s own catalog entries, by either browsing or searching. Each entry is written by a scholar working in the relevant field and vetted by a peer review process. The idea is to alleviate the twin problems of Google-induced data overload, on the one hand, and Wikipedia-driven GIGO (garbage in, garbage out), on the other.
Sounds good but there may be some challenges:
First, these hand crafted bibliographies are expensive to create and keep current. The rush of enthusiasm for a project of this type gets some bibliographies out the door. However, the ongoing costs are likely to be an issue because libraries may not have the agility to buy this online service. Oxford University has the money, but once the reality of the costs sinks in, my hunch is that push back from the finance person will be coming in 12 months.
Second, revenue. The spreadsheet fever makes the project look pretty tasty. Oxford will find itself dancing with some big outfits in the commercial database world. My view is that Oxford will have to find a partner quickly because, let’s face it, universities are not exactly the top guns in the marketing arena.
Third, the anti Google thing is cute but irrelevant. The Google is muddling along with probes into different market sectors. The Google is in the “good enough” game and that’s where Google’s search and reference services will aim. Google may end up with some academic wonder products but that will be exhaust from the Google revenue machine. Red herring to even mention Google.
Fourth, users want to click and get the full text. When I am doing research, I know how to do the primary and secondary research drill. The problem is that time and resources force me to use my own tools like the Overflight system. But for some tiny percentage of folks looking up information online Bing, Google, and Yahoo will be pretty good. To dig into the next level, libraries have Ebsco products. Those who need more are going to be Oxford level researchers, and I am not sure a product aimed at this tiny slice of online users can generate enough revenue to exist without subsidies. Will Oxford fund the rowing team or the bibliographies? Time will tell.
In short, interesting but a bit of an anachronism in my opinion.
Stephen E Arnold, April 22, 2010
No one paid for this post.
Arnold at NFAIS: Google Books, Scholar, and Good Enough
June 26, 2009
Speaker’s introduction: The text that appears below is a summary of my remarks at the NFAIS Conference on June 26, 2009, in Philadelphia. I talk from notes, not a written manuscript, but it is my practice to create a narrative that summarizes my main points. I have reproduced this working text for readers of this Web log. I find that it is easier to put some of my work in a Web log than it is to create a PDF and post that version of a presentation on my main Web site, www.arnoldit.com. I have skipped the “who I am” part of the talk and jump into the core of the presentation.
Stephen Arnold, June 26, 2009
In the past, epics were a popular form of entertainment. Most of you have read the Iliad, possibly Beowulf, and some Gilgamesh. One convention is that these complex literary constructs begin in the middle or what my grade school teacher called “in medias res.”
That’s how I want to begin my comments about Google’s scanning project, an epic usually referred to as Google Books. Then I want to go back to the beginning of the story and then jump ahead to what is happening now. I will close with several observations about the future. I don’t work for Google, and my efforts to get Google to comment on topics are ignored. I am not an attorney, so my remarks have zero legal foundation. And I am not a publisher. I write studies about information retrieval. To make matters even more suspect, I do my work from rural Kentucky. From that remote location, I note that Amazon is concerned about Google Books, probably because Google seeks to enter the eBook sector. This story is good enough; that is, in a project so large, so sweeping, perfection is not possible. Pages are skewed. Insects scanned. Coverage is hit and miss. But what other outfit is prepared to spend to scan books?
Let’s begin in the heat of the battle. Google is fighting a number of things. Google finds itself under scrutiny from publishers and authors. These are the entities with whom Google signed a “truce” of sorts regarding the scanning of books. Increasingly libraries have begun to express concern that Google may not be doing the type of preservation job to keep the source materials in a suitable form for scholars. Regulators have taken an interest in the matter because of the publicity swirling around a number of complicated business and legal issues.
These issues threaten Google with several new challenges.
First, since its founding in 1998, Google has enjoyed what I would call positive relationships with users, stakeholders, and most of its constituents. The Google Books matter is now creating what I would describe as “rising tension”. If the tension escalates, a series of battles can erupt in the legal arena. As you know, battle is risky when two heroes face off in a sword fight. Fighting in a legal arena is in some ways more risky and more dangerous.
Second, the friction of these battles can distract Google from other business activities. Google, as some commentators, including myself in Google: The Digital Gutenberg, have noted, may be vulnerable to new types of information challenges. One example is Google’s absence from the real time indexing sector where Facebook, Twitter, Scoopler.com, and even Microsoft seem to be outpacing Google. Distractions like the Google Books matter could exclude Google from an important new opportunity.
Finally, Google’s approach to its projects is notable because the scope of the project makes it hard for most people to comprehend. Scanning books takes exabytes of storage. Converting images to ASCII, transforming the text (that is, adding structure tags), and then indexing the content takes a staggering amount of computing resources.
Inputs to outputs, an idea that was shaped between 1999 and 2001. © Stephen E. Arnold, 2009
Google has been measured and slow in its approach. The company works with large libraries, provides copies of the scanned material to its partners, and has tried to keep moving forward. Microsoft and Yahoo, database publishers, the Library of Congress, and most libraries have ceded the scanning of books work to Google.
Now Google finds itself having to juggle a large number of balls.
Now let’s go back in time.
I have noticed that most analysts peg the Google Books project as starting right before the initial public offering in 2004. That’s not what my research has revealed. Google’s interest in scanning the contents of books reaches back to 2000.
In fact, an analysis of Google’s patent documents and technical papers for the period from 1998 to 2003 reveals that the company had explored knowledge bases, content transformation, and mashing up information from a variety of sources. In addition, the company had examined various security methods, including methods to prevent certain material from being easily copied or repurposed.
The idea, which I described in The Google Legacy (which I wrote in 2003 and 2004 with publication in early 2005), was to gather a range of information, process that information using mathematical methods in order to produce useful outputs like search results for users, and generate information about the information. The word given to describe value added indexing is metadata. I prefer the less common but more accurate term meta indexing.
Library Teaches Search – More Instruction Needed
June 22, 2009
My recollection is that libraries taught search as far back as 1980. I recall that either database vendors would run demonstrations or that librarians skilled in the use of online would provide guidance to those who asked. I recall running a class in ABI/INFORM at Chicago Public Library and there was an overflow crowd of both staff and research minded patrons. I was delighted, therefore, to see an article in the Sacramento Bee that described the Sutter Library’s classes in finding health and medical information online. The class is a reminder to me that:
- Librarians and information professionals often know how to search and have an interest in sharing that knowledge
- Patrons are smart enough to know that despite the marketing hype and the pundits’ assertions that search is a “done deal” additional instruction attracts people and finds its way into The Sacramento Bee
We have a long way to go before information professionals will be relics of a long gone time. The people who tell me that they “know how to search” and “can locate almost anything online” are kidding themselves. I think I am a reasonably good researcher. But if you spend time monitoring how I find information, you will learn quickly that I turn to experts who make my search skills look primitive. Even my nifty Overflight system pales in comparison with the type of information that my research team generates by:
- Knowing what content is located where
- Understanding the editorial method behind or absent from certain online systems
- Leveraging hard-to-manipulate resources such as information from government repositories, specialized services, and individual experts.
I would like to see more libraries move aggressively into online instruction, market those programs, and raise the level of expertise. Most of the people who claim to be experts at search are clueless about how bad their skills are. Among the worst offenders are self appointed search experts who have trouble figuring out when something is likely to be baloney and when something is just plain wrong. Enterprise search, content management, and text mining are three disciplines where better research will be most beneficial in my opinion. Then we need critical thinking skills. Schools have dropped the ball. Maybe libraries can help in this area as well? Search procurement teams will be well served if the team has one or more librarians in the huddle.
Stephen Arnold, June 22, 2009
SirsiDynix Search Plus Discovery for Libraries
May 24, 2009
Brainware landed a deal to provide search and discovery to SirsiDynix. After a bit of poking around, I learned that SirsiDynix wanted to move beyond key word search and provide users of its library systems with discovery functions. “Discovery”, as used in this sense, refers to giving a person looking for information easy-to-use methods to look for related information and suggested information also germane to the user’s query. Endeca hooked up with Ebsco to provide “guided navigation” to Ebsco customers. Most online public access catalogs and library-centric search systems match the users’ query terms or force the user to search by entering an author’s name. Change, at long last, seems to be coming to the library for search of an institution’s textual information. I wrote about some of the Brainware system’s capabilities in my 2008 study “Beyond Search” for the Gilbane Group here. I also did a short write up about Brainware in this Web log in early 2008 here.
A reader alerted me to an announcement here that SirsiDynix will roll out an enhanced enterprise search and discovery system to over 30 libraries. You can read that announcement here. The system includes such features as:
- Trigram analysis, or “fuzzy logic” which evaluates each trigram in a word to allow for typos, diacritics and more: a first in the library search and discovery market
- “Did you mean” suggestions which are based on terms in the catalog (rather than a generic third-party dictionary)
- Dynamic search suggestions
- Delivery of saved searches through an RSS web feed
- Email and print options for search results
- Built-in “Library Favorites”
- The capability for libraries to define their own “Favorites”, profiles, languages and filters.
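For readers who want a feel for the first two features, here is a minimal sketch of trigram matching and a catalog-based “did you mean” suggestion. This is not Brainware’s or SirsiDynix’s implementation, which the announcement does not describe; the function names, the padding scheme, the Jaccard similarity measure, and the 0.3 threshold are all assumptions made for illustration.

```python
import unicodedata


def normalize(term: str) -> str:
    """Lowercase and strip diacritics so that accented and plain forms compare equal."""
    decomposed = unicodedata.normalize("NFKD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(term: str) -> set[str]:
    """Return the set of three-character windows of the padded, normalized term."""
    padded = f"  {normalize(term)} "  # padding (an assumption) gives word edges their own trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of the two trigram sets, from 0.0 (disjoint) to 1.0 (identical)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def did_you_mean(query: str, catalog_terms: list[str], threshold: float = 0.3):
    """Suggest the closest term drawn from the catalog itself, or None if nothing is close."""
    best = max(catalog_terms, key=lambda term: similarity(query, term), default=None)
    if best is None or similarity(query, best) < threshold:
        return None
    return best
```

Because the suggestions come from the catalog’s own terms rather than a generic dictionary, a misspelled author name can only resolve to an author the library actually holds.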
You can test the Brainware-powered “enterprise” service at the Wells County Public Library here.
The library market has been under severe price competition. This information sector is coming under more and more pressure from Google. The world’s largest search provider has been slowly expanding its services, including the controversial Google Books program. So far, specialized vendors of library information systems have been able to maintain their grip in today’s slippery economic one lane highway. The impact of Google on this market will be interesting to observe.
Stephen Arnold, May 24, 2009
Google and Libraries
May 1, 2009
The USPTO must be clearing backlogs. A flurry of Google patent documents became available. Several were uninteresting (floating data centers, query expansion), but one struck me as having some disruptive potential. I refer to Library Citation Integration, US7526475. You can get the document from the USPTO at http://www.uspto.gov. The abstract stated:
An online search system generates an index of documents using index information received from a library. Some documents have restricted access; some documents may not be available online. The search system provides links to documents in the library as well as other sites based on a search, and may include link resolvers received from the library. The search system provides access links to the link resolvers if an identifier, such as a user identification or IP address, matches an affiliation list from the library.
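The affiliation check in the abstract can be pictured with a short sketch. This is not the method claimed in the patent document; the `Library` class, its fields, the `access_links` function, and the example.org resolver address are hypothetical names invented here to illustrate matching a user identifier or IP address against a library’s affiliation list.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class Library:
    """Hypothetical record of what a library might hand to the search system."""
    name: str
    resolver_base: str  # the library's link resolver address (assumed format)
    ip_ranges: list[str] = field(default_factory=list)  # affiliated networks in CIDR form
    user_ids: set[str] = field(default_factory=set)     # affiliated user identifiers

    def is_affiliated(self, identifier: str) -> bool:
        """True if the identifier is a listed user ID or an IP inside a listed range."""
        if identifier in self.user_ids:
            return True
        try:
            addr = ip_address(identifier)
        except ValueError:
            return False  # neither a known user ID nor a parseable IP address
        return any(addr in ip_network(cidr) for cidr in self.ip_ranges)


def access_links(doc_id: str, identifier: str, libraries: list[Library]) -> list[str]:
    """Return resolver links only for libraries whose affiliation list matches."""
    return [
        f"{lib.resolver_base}?id={doc_id}"
        for lib in libraries
        if lib.is_affiliated(identifier)
    ]
```

A searcher on a campus network would see a link into the library’s own holdings next to the ordinary results; everyone else would see only the public links.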
Why? Think for a moment about the commercial database vendors, the online public access catalog vendors, and the companies building content for institutional use. I thought the pointing function to items in the OCLC system was interesting. This invention gives the Google an opportunity to stomp, should it choose to do so, in some other vineyards. Who will be squashed into fine wine? I don’t drink, so I might not be affected. Those in the library ecosystem might have a different view.
Stephen Arnold, May 1, 2009
Amsterdam Breathes New Life into Old Information Institution
April 19, 2009
A happy quack to the reader who sent me the link to Andrew Keen’s “Digital Dutch Masterpieces” here. The article points out that libraries can be both old and new media. He wrote:
at the Amsterdam public library. Instead of the dustiness and crustiness of the typical 20th century library, visitors to Amsterdam’s central public library will find not only books, but a restaurant as well as a children’s theatre and a public radio and television studio. The library, which is open every day from 10.00 am to 10.00 pm, also holds a series of cultural festivals – such as the upcoming week of poetry – which it then broadcasts on the Internet. Amsterdam library’s website epitomizes its innovative approach to the 21st century curation of knowledge. The website features its own customized search engine, the “aquabrowser”, which has integrated the library’s books, CDs and DVDs as well as a rich archive of Amsterdam’s history and culture. Equally innovatively, the website provides those who use it within the walls of the library itself open access to all its digital content.
I did not resonate with the assertion that the library has a “return on investment”. That phrase has a specific meaning in financial circles. I think that the Amsterdam effort returns significant social value. One hopes other libraries absorb the lessons of this case.
Stephen Arnold, April 19, 2009
Potential Trouble for LexisNexis and Westlaw
March 2, 2009
Most online surfers don’t click to Reed Elsevier’s LexisNexis or Thomson Reuters Westlaw. The reason? These commercial services charge money–quite a lot of money–to access legal documents. Executives at both firms can deliver compelling elevator pitches about the added value each company brings to legal documents. In the pre-crash era, legal indexing was a manual process. Then the cost crunch arrived so both outfits are trying to slap software against the thorny problem of making sense of court documents, rulings, and assorted effluvia of America’s legal factories. I may write about how these two quasi US outfits have monopolized for fee legal information about American law for lawyers and government agencies. Both Reed and Thomson then turn around and sell access to these documents to the agencies that created them in the first place. I wonder if the good senator is aware of this aspect of commercial online services’ business practices?
What’s the trouble? I bet you thought I was going to mention Google. Wrong. Google is on the edge of indexing legal information in a more comprehensive way. But the right now trouble is Senator Joe Lieberman. Wired reported that the good senator wondered why public documents are not available without a charge. You can read the story “Lieberman Asks, Why Are Court Docs Still Behind Paid Firewall?” here. Senator Lieberman’s question may lead to a hearing. The process could, in my opinion, start a chain reaction that further erodes the revenue Reed Elsevier and Thomson Reuters derive from public documents. Somewhere in the chain, the Google will beef up the legal content in its Uncle Sam service here.
At their core, Reed Elsevier and Thomson Reuters are traditional publishing and information companies. As such, their business model is fragile. Within the present financial pressure cooker, the Lieberman question could blow the lid off these two organizations’ for fee legal businesses. If government agencies shift to a service provided by Google, Microsoft, or Yahoo, I think these two dead tree outfits will crash to the forest floor.
What is the likelihood of this downside scenario? I would put it at better than 60 percent. Have another view? Share it, please. Set the addled goose straight.
Stephen Arnold, March 2, 2009