<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Beyond Search &#187; Library automation</title>
	<atom:link href="http://arnoldit.com/wordpress/category/library-automation/feed/" rel="self" type="application/rss+xml" />
	<link>http://arnoldit.com/wordpress</link>
	<description>by Stephen E. Arnold</description>
	<lastBuildDate>Sun, 12 Feb 2012 05:07:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Google and Reading Books Offline</title>
		<link>http://arnoldit.com/wordpress/2011/12/26/google-and-reading-books-offline/</link>
		<comments>http://arnoldit.com/wordpress/2011/12/26/google-and-reading-books-offline/#comments</comments>
		<pubDate>Mon, 26 Dec 2011 14:06:26 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Google]]></category>
		<category><![CDATA[Library automation]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Online (general)]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=22026</guid>
		<description><![CDATA[I recall writing a short analysis of the methods Google used to prevent a person from reading an entire book on one of the Google services. There were both patent documents and technical papers. The methods were interesting and seemed to be difficult to work around. We learned that with a little coordination and a [...]]]></description>
			<content:encoded><![CDATA[<p>I recall writing a short analysis of the methods Google used to prevent a person from reading an entire book on one of the Google services. There were both patent documents and technical papers. The methods were interesting and seemed to be difficult to work around. We learned that with a little coordination and a number of different “helpers”, it was possible to get most pages in a book, but even that method was far from foolproof.</p>
<p>Imagine my surprise when I read “<a href="http://www.engadget.com/2011/12/22/google-books-for-chrome-gets-offline-support-one-less-excuse-fo/" target="_blank">Google Books for Chrome Gets Offline Support, One Less Excuse for Not Reading the &#8216;Classics&#8217;</a>”. According to the write up:</p>
<blockquote><p>the Google Books app for <a href="http://www.engadget.com/tag/googlechrome">Chrome</a> now caches your titles for local reading. To download a book, just hover over the cover in library view and select &#8220;make available offline&#8221; from the pop-up. Then, even when you can&#8217;t get your <a href="http://www.engadget.com/tag/chromebook">Chromebook</a> connected, you&#8217;ll be able to sit back and relax with a classic novel or seedy romance tale.</p></blockquote>
<p>With libraries facing pushback from publishers for lending eBooks, I found the Google service interesting. Will the addled goose read classics on his Chromebook? Nope, the goose is not a Chromebook user. Our question, “What’s next?” Might the Google allow reading public domain books on any device running Chrome? Might the Google “rent” a title because the methods for knowing who has what already exist? Is Google now following Amazon? Worth watching as Google moves to redefine itself for 2012.</p>
<p><a href="http://www.arnoldit.com/sitemap.html" target="_blank">Stephen E Arnold</a>, December 26, 2011</p>
<p>Sponsored by <a href="http://www.pandia.com/enterprise-search" target="_blank">Pandia.com</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2011/12/26/google-and-reading-books-offline/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Libraries: Another Plea</title>
		<link>http://arnoldit.com/wordpress/2011/06/04/libraries-another-plea/</link>
		<comments>http://arnoldit.com/wordpress/2011/06/04/libraries-another-plea/#comments</comments>
		<pubDate>Sat, 04 Jun 2011 05:04:22 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[Library automation]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Online (general)]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=17969</guid>
		<description><![CDATA[In his “The Future of the Library” essay, Seth Godin highlights the roots of public libraries and librarians and their future. He aptly points out that the librarian isn&#8217;t a clerk who happens to work at a library. A librarian is a data hound, a guide, a sherpa and a teacher. The librarian is the [...]]]></description>
			<content:encoded><![CDATA[<p>In his <a href="http://sethgodin.typepad.com/seths_blog/2011/05/the-future-of-the-library.html">“The Future of the Library”</a> essay, Seth Godin highlights the roots of public libraries and librarians and their future.</p>
<p>He aptly points out that</p>
<blockquote><p>the librarian isn&#8217;t a clerk who happens to work at a library. A librarian is a data hound, a guide, a sherpa and a teacher. The librarian is the interface between reams of data and the untrained but motivated user.</p></blockquote>
<p>I am not too keen on the sherpa thing. But the point is one with which the Beyond Search team agrees.</p>
<p>With the rise of services like Netflix and technologies like ebooks, libraries are no longer just about lending books and movies. They need to re-imagine their mission to stay relevant. He notes:</p>
<blockquote><p>Just in time for the information economy, the library ought to be the local nerve center for information.</p></blockquote>
<p>The library of the future features “a librarian who can bring domain knowledge and people knowledge and access to information to bear.” Information overload is real; Microsoft tapped into that frustration to market its Bing search engine. Godin presents an exciting future for both libraries and librarians if both are willing to change.</p>
<p>Rita Safranek, June 4, 2011</p>
<p>Sponsored by <a href="http://www.arnoldit.com">ArnoldIT.com</a>, the resource for <a href="http://www.arnoldit.com/wordpress/landscape">enterprise search information</a> and current news about <a href="http://www.inteltrax.com">data fusion</a></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2011/06/04/libraries-another-plea/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>An Essential Guide for Information Professionals</title>
		<link>http://arnoldit.com/wordpress/2011/05/20/an-essential-guide-for-information-professionals/</link>
		<comments>http://arnoldit.com/wordpress/2011/05/20/an-essential-guide-for-information-professionals/#comments</comments>
		<pubDate>Fri, 20 May 2011 05:16:50 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[Library automation]]></category>
		<category><![CDATA[Management]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Reference tool]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=17734</guid>
		<description><![CDATA[Infonista has posted a review of a wonderful book entitled The Information and Knowledge Professional’s Career Handbook. Full disclosure: Ulla de Stricker is a friend of ours, and we just love her and her co-author, Jill Hurst-Wahl. Though we admit to a little bias, we’re sure we’d be recommending this book in any case. The Infonista [...]]]></description>
			<content:encoded><![CDATA[<p>Infonista has <a href="http://infonista.com/2011/information-and-knowledge-professional%E2%80%99s-handbook/">posted a review</a> of a wonderful book entitled <em><a href="http://www.facebook.com/pages/The-Information-and-Knowledge-Professionals-Career-Handbook/161473310552265">The Information and Knowledge Professional’s Career Handbook.</a></em> Full disclosure: <a href="http://www.destricker.com/">Ulla de Stricker</a> is a friend of ours, and we just love her and her co-author, <a href="http://www.hurstassociates.com/about.html">Jill Hurst-Wahl</a>.</p>
<p>Though we admit to a little bias, we’re sure we’d be recommending this book in any case. The Infonista review summarizes what you have to look forward to:</p>
<p>“In fifteen chapters, the authors provide detailed, practical career advice that comes across as a cross between coaching, mentoring, and okay, (in the nicest possible way), a bit of nagging. But it’s clear that their goal is to help readers avoid career potholes if possible. . . .</p>
<p>“Reading <em>The Information and Knowledge Professional’s Handbook </em>is like hanging out with two really smart, experienced, and wise mentors who aren’t going to sugarcoat any of their advice – because they know you really need the real deal. The information they provide is practical, actionable, and from this professional’s experience, spot on.”</p>
<p>This praise is no surprise to us, of course. We already knew these ladies were at the top of their field.</p>
<p>Do yourself a favor and <a href="http://www.amazon.com/Information-Knowledge-Professionals-Handbook-Chandos/dp/1843346087/ref=sr_1_1?ie=UTF8&amp;s=books&amp;qid=1305518283&amp;sr=8-1">pick up a copy</a> right away.</p>
<p>Cynthia Murrell, May 20, 2011</p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2011/05/20/an-essential-guide-for-information-professionals/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Libraries Embrace Semantics</title>
		<link>http://arnoldit.com/wordpress/2011/04/15/libraries-embrace-semantics/</link>
		<comments>http://arnoldit.com/wordpress/2011/04/15/libraries-embrace-semantics/#comments</comments>
		<pubDate>Fri, 15 Apr 2011 05:33:56 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Library automation]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Semantic]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=17058</guid>
		<description><![CDATA[We came across a quite interesting article about semantics in the library market. The world has become very dependent on search engine sites such as Google, but such sites offer very limited results. According to the Semanticweb.com article “Semantics in the Public Library,” introducing semantic Web technology into public libraries can help to bridge the [...]]]></description>
			<content:encoded><![CDATA[<p>We came across a quite interesting article about semantics in the library market.</p>
<p>The world has become very dependent on search engine sites such as <a href="http://www.google.com/">Google</a>, but such sites offer very limited results. According to the Semanticweb.com article <a href="http://semanticweb.com/semantics-in-the-public-library_b18803">“Semantics in the Public Library,”</a> introducing semantic Web technology into public libraries can help to bridge the information gap and build a new and better web. The article said:</p>
<blockquote><p>“The worldwide web is very vocabulary dependent. Today’s Web search engines do not group web pages, pull out concepts, or understand them. There is no access to the deep Web.”</p></blockquote>
<p>Though Google produces a seemingly unlimited number of results, it leaves the job half done. The semantic web can do more with the information, handle more complex databases, and produce more structured results. <a href="http://www.scopus.com/scopus/home.url">Scopus</a> is a semantic web search engine configured to handle a variety of complex queries and produce structured, easy to understand results. Though semantic technology seems ideal, it is not yet a perfect science, and implementing it is definitely easier said than done.</p>
<p>April Holmes, April 15, 2011</p>
<p><em>Freebie</em></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2011/04/15/libraries-embrace-semantics/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>OCLC-SkyRiver Dust Up</title>
		<link>http://arnoldit.com/wordpress/2010/12/16/oclc-skyriver-dust-up/</link>
		<comments>http://arnoldit.com/wordpress/2010/12/16/oclc-skyriver-dust-up/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 08:23:35 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[Indexing]]></category>
		<category><![CDATA[Legal matters]]></category>
		<category><![CDATA[Library automation]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=15203</guid>
		<description><![CDATA[In the excitement of the i2 Ltd. legal action against Palantir, I put the OCLC – SkyRiver legal hassle aside. I was reminded of the library wrestling match when I read “SkyRiver Challenges OCLC as Newest LC Authority Records Node.” I don’t do too much in libraries at this time. But OCLC is a familiar [...]]]></description>
			<content:encoded><![CDATA[<p>In the excitement of the i2 Ltd. legal action against Palantir, I put the <a href="http://www.oclc.org" target="_blank">OCLC</a> – <a href="http://theskyriver.com/" target="_blank">SkyRiver</a> legal hassle aside. I was reminded of the library wrestling match when I read “SkyRiver Challenges OCLC as Newest LC Authority Records Node.” I don’t do too much in libraries at this time. But OCLC is a familiar name to me; SkyRiver not so much. The original article about the legal issue appeared in Library Journal on July 29, 2010: “<a href="http://www.libraryjournal.com/lj/home/886099-264/skyriver_and_innovative_interfaces_file.html.csp" target="_blank">SkyRiver and Innovative Interfaces File Major Antitrust Lawsuit against OCLC</a>.” Libraries are mostly about information access. Search would not have become the core function if it had not been for libraries’ early adoption of online services and their making online access available to patrons. In the days before the wild and wooly Web, libraries were harbingers of the revolution in research.</p>
<p>Legal battles are not unknown in the staid world of research, library services, and traditional indexing and content processing activities. But a fight between a household name like OCLC and a company with which I had only modest familiarity is news.</p>
<p><a href="http://arnoldit.com/wordpress/wp-content/uploads/2010/12/image9.png"><img style="display: inline; border: 0px;" title="image" src="http://arnoldit.com/wordpress/wp-content/uploads/2010/12/image_thumb9.png" border="0" alt="image" width="244" height="174" /></a></p>
<p>Here’s the key passage from the Library Journal write up:</p>
<blockquote><p>Bibliographic services company <a href="http://theskyriver.com/">SkyRiver Technology Solutions</a> recently announced that it had become an official node of the <a href="http://www.loc.gov/catdir/pcc/naco/">Name Authority Cooperative Program</a> (NACO), part of the Library of Congress&#8217;s (LC) <a href="http://www.loc.gov/catdir/pcc/">Program for Cooperative Cataloging</a>. It&#8217;s the first private company to provide this service, which was already provided by the nonprofit <a href="http://www.oclc.org/us/en/default.htm">OCLC</a>—SkyRiver&#8217;s much larger competitor in the bibliographic services field—and the <a href="http://www.bl.uk/">British Library</a>. Previously, many institutions have submitted their name authority records via OCLC. But SkyRiver&#8217;s new status as a NACO node allows it to provide the service, once exclusive to OCLC in the United States, to its users directly.</p></blockquote>
<p>For me, this is a poke in the eye for OCLC, an outfit that used me on a couple of projects when General K. Wayne Smith was running a very tight operation. I don’t know how management works at OCLC, but I think any action by the Library of Congress is going to trigger some meetings.</p>
<p>SkyRiver sees OCLC as acting in an anticompetitive way. Now the Library of Congress has blown a kiss at SkyRiver. Looks like the library landscape, already ravaged by budget bulldozers, may be undergoing another change. I think the outline of the mountain range where the work is underway appears to spell out the word “Monopoly.” Nah, probably my imagination.</p>
<p>Stephen E Arnold, December 16, 2010</p>
<p><em>Freebie</em></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2010/12/16/oclc-skyriver-dust-up/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Future of Books?</title>
		<link>http://arnoldit.com/wordpress/2010/09/15/the-future-of-books/</link>
		<comments>http://arnoldit.com/wordpress/2010/09/15/the-future-of-books/#comments</comments>
		<pubDate>Wed, 15 Sep 2010 05:12:56 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[Library automation]]></category>
		<category><![CDATA[News]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=13945</guid>
		<description><![CDATA[Short honk: I came across this while scanning items in my not-yet-dead Overflight service. Is this the future of books? I hope not. But with libraries facing budget pressure and library vendors scrambling, the optimal use of books may be to make furniture. Source: http://i.imgur.com/ia1yy.jpg. Stephen E Arnold, September 15, 2010 Freebie]]></description>
			<content:encoded><![CDATA[<p>Short honk: I came across this while scanning items in my not-yet-dead <a href="http://www.arnoldit.com/overflight" target="_blank">Overflight</a> service. Is this the future of books? I hope not. But with libraries facing budget pressure and library vendors scrambling, the optimal use of books may be to make furniture.</p>
<p><img src="http://i.imgur.com/ia1yy.jpg" alt="" width="298" height="223" /></p>
<p>Source: <a href="http://i.imgur.com/ia1yy.jpg">http://i.imgur.com/ia1yy.jpg</a>.</p>
<p>Stephen E Arnold, September 15, 2010</p>
<p><em>Freebie</em></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2010/09/15/the-future-of-books/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Camelot to Go Viral</title>
		<link>http://arnoldit.com/wordpress/2010/07/15/camelot-to-go-viral/</link>
		<comments>http://arnoldit.com/wordpress/2010/07/15/camelot-to-go-viral/#comments</comments>
		<pubDate>Thu, 15 Jul 2010 05:55:56 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[Library automation]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Online (general)]]></category>
		<category><![CDATA[Search]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=13073</guid>
		<description><![CDATA[A Cambridge-based search applications firm has been chosen by the John F. Kennedy Presidential Library and Museum and the John F. Kennedy Library Foundation to help provide a search engine experience to go with the late president’s digitized archives. Endeca Technologies has been hired to work on the project that will launch on January [...]]]></description>
			<content:encoded><![CDATA[<p>A Cambridge-based search applications firm has been chosen by the John F. Kennedy Presidential Library and Museum and the John F. Kennedy Library Foundation to help provide a search engine experience to go with the late president’s digitized archives.</p>
<p><a href="http://www.endeca.com/">Endeca Technologies</a> has been hired to work on the project that will launch on January 20, 2011, which will be the 50<sup>th</sup> anniversary of the inauguration. The idea behind digitizing Camelot is to make the whole array of the JFK archives available to everyone from historians to schoolchildren.</p>
<p>Endeca&#8217;s information access solutions have long helped people and businesses explore, analyze, and understand information in a variety of ways. Its solutions cover areas ranging from retail to media and publishing.</p>
<p>Rob Starr, July 15, 2010</p>
<p><em>Freebie</em></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2010/07/15/camelot-to-go-viral/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oxford Flexes Its Reference Muscles</title>
		<link>http://arnoldit.com/wordpress/2010/04/22/oxford-flexes-its-reference-muscles/</link>
		<comments>http://arnoldit.com/wordpress/2010/04/22/oxford-flexes-its-reference-muscles/#comments</comments>
		<pubDate>Thu, 22 Apr 2010 08:04:59 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Financial]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Library automation]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Online (general)]]></category>
		<category><![CDATA[Publishing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=11850</guid>
		<description><![CDATA[I go to a gym every couple of days when I am in town. It so happens that a number of semi-pro wrestlers go to the gym. Big people. Tattoos. Muscles. I am an old wimp and I graciously give up my place when one of these steroid stallions trots to the workout station I [...]]]></description>
			<content:encoded><![CDATA[<p>I go to a gym every couple of days when I am in town. It so happens that a number of semi-pro wrestlers go to the gym. Big people. Tattoos. Muscles. I am an old wimp and I graciously give up my place when one of these steroid stallions trots to the workout station I favor. Academics have muscles too, but my image of a muscular academic, and one from Oxford University at that, is of a milder, gentler giant.</p>
<p>The Oxford muscle builders have turned their attention to creating online bibliographies. I think, based on reading the write up “<a href="http://arstechnica.com/science/news/2010/04/oxford-university-press-launches-the-anti-google.ars?comments=1#comments-bar" target="_blank">Oxford University Press Launches the Anti-Google</a>” that these will be variants of the old Goldentree bibliographies or the type of reference book Constance Winchell cranked out.</p>
<p>Here’s a synopsis of the product:</p>
<blockquote><p>The OBO [Oxford Bibliographies Online] tool is essentially a straightforward, hyperlinked collection of professionally-produced, peer-reviewed bibliographies in different subject areas—sort of a giant, interactive syllabus put together by OUP and teams of scholars in different disciplines. Users can drill down to a specific bibliographic entry, which contains some descriptive text and a list of references that link to either Google Books or to a subscribing library&#8217;s own catalog entries, by either browsing or searching. Each entry is written by a scholar working in the relevant field and vetted by a peer review process. The idea is to alleviate the twin problems of Google-induced data overload, on the one hand, and Wikipedia-driven GIGO (garbage in, garbage out), on the other.</p></blockquote>
<p>Sounds good, but there may be some challenges:</p>
<p>First, these handcrafted bibliographies are expensive to create and keep current. The rush of enthusiasm for a project of this type gets some bibliographies out the door. However, the ongoing costs are likely to be an issue because libraries may not have the agility to buy this online service. Oxford University has the money, but once the reality of the costs sinks in, my hunch is that pushback from the finance person will come within 12 months.</p>
<p>Second, revenue. The spreadsheet fever makes the project look pretty tasty. Oxford will find itself dancing with some big outfits in the commercial database world. My view is that Oxford will have to find a partner quickly because, let’s face it, universities are not exactly the top guns in the marketing arena.</p>
<p>Third, the anti Google thing is cute but irrelevant. The Google is muddling along with probes into different market sectors. The Google is in the “good enough” game and that’s where Google’s search and reference services will aim. Google may end up with some academic wonder products but that will be exhaust from the Google revenue machine. Red herring to even mention Google.</p>
<p>Fourth, users want to click and get the full text. When I am doing research, I know how to do the primary and secondary research drill. The problem is that time and resources force me to use my own tools like the Overflight system. But for all but a tiny percentage of folks looking up information online, Bing, Google, and Yahoo will be pretty good. To dig into the next level, libraries have Ebsco products. Those who need more are going to be Oxford-level researchers, and I am not sure a product aimed at this tiny slice of online users can generate enough revenue to exist without subsidies. Will Oxford fund the rowing team or the bibliographies? Time will tell.</p>
<p>In short, interesting but a bit of an anachronism in my opinion.</p>
<p>Stephen E Arnold, April 22, 2010</p>
<p><em>No one paid for this post.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2010/04/22/oxford-flexes-its-reference-muscles/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Arnold at NFAIS: Google Books, Scholar, and Good Enough</title>
		<link>http://arnoldit.com/wordpress/2009/06/26/arnold-at-nfais-google-books-and-good-enough/</link>
		<comments>http://arnoldit.com/wordpress/2009/06/26/arnold-at-nfais-google-books-and-good-enough/#comments</comments>
		<pubDate>Fri, 26 Jun 2009 05:05:27 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Business strategy]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Feature]]></category>
		<category><![CDATA[Library automation]]></category>
		<category><![CDATA[Mobile]]></category>
		<category><![CDATA[Overflight]]></category>
		<category><![CDATA[Search]]></category>
		<category><![CDATA[Semantic]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Text processing]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=7163</guid>
		<description><![CDATA[Speaker’s introduction: The text that appears below is a summary of my remarks at the NFAIS Conference on June 26, 2009, in Philadelphia. I talk from notes, not a written manuscript, but it is my practice to create a narrative that summarizes my main points. I have reproduced this working text for readers of this [...]]]></description>
			<content:encoded><![CDATA[<p><em><span style="color: #800000;">Speaker’s introduction: The text that appears below is a summary of my remarks at the NFAIS Conference on June 26, 2009, in Philadelphia. I talk from notes, not a written manuscript, but it is my practice to create a narrative that summarizes my main points. I have reproduced this working text for readers of this Web log. I find that it is easier to put some of my work in a Web log than it is to create a PDF and post that version of a presentation on my main Web site, </span></em><a href="http://www.arnoldit.com"><em><span style="color: #800000;">www.arnoldit.com</span></em></a><em><span style="color: #800000;">. I have skipped the “who I am” part of the talk and jumped into the core of the presentation.</span></em></p>
<p><em><span style="color: #800000;">Stephen Arnold, June 26, 2009</span></em></p>
<p>In the past, epics were a popular form of entertainment. Most of you have read the Iliad, possibly Beowulf, and some Gilgamesh. One convention is that these complex literary constructs begin in the middle, or what my grade school teacher called “<em>in medias res</em>.”</p>
<p>That’s how I want to begin my comments about Google’s scanning project, an epic usually referred to as <a href="http://books.google.com/" target="_blank">Google Books</a>. Then I want to go back to the beginning of the story and then jump ahead to what is happening now. I will close with several observations about the future. I don’t work for Google, and my efforts to get Google to comment on topics are ignored. I am not an attorney, so my remarks have zero legal foundation. And I am not a publisher. I write studies about information retrieval. To make matters even more suspect, I do my work from rural Kentucky. From that remote location, I note that Amazon is concerned about Google Books, probably because Google seeks to enter the eBook sector. This story is about “good enough”; that is, in a project so large and so sweeping, perfection is not possible. Pages are skewed. Insects are scanned. Coverage is hit and miss. But what other outfit is prepared to spend the money to scan books?</p>
<p>Let’s begin in the heat of the battle. Google is fighting on a number of fronts. Google finds itself under scrutiny from publishers and authors. These are the entities with whom Google signed a “truce” of sorts regarding the scanning of books. Increasingly, libraries have begun to express concern that Google may not be doing the type of preservation job needed to keep the source materials in a form suitable for scholars. Regulators have taken an interest in the matter because of the publicity swirling around a number of complicated business and legal issues.</p>
<p>These issues threaten Google with several new challenges.</p>
<p>First, since its founding in 1998, Google has enjoyed what I would call positive relationships with users, stakeholders, and most of its constituents. The Google Books’ matter is now creating what I would describe as “rising tension”. If the tension escalates, a series of battles can erupt in the legal arena. As you know, battle is risky when two heroes face off in a sword fight. Fighting in a legal arena is in some ways more risky and more dangerous.</p>
<p>Second, the friction of these battles can distract Google from other business activities. As some commentators, including myself in <em><a href="http://www.infonortics.com/publications/google/google-gutenberg.html" target="_blank">Google: The Digital Gutenberg</a></em>, have noted, Google may be vulnerable to new types of information challenges. One example is Google’s absence from the real time indexing sector where <a href="http://www.facebook.com" target="_blank">Facebook</a>, <a href="http://www.twitter.com" target="_blank">Twitter</a>, <a href="http://www.scoopler.com" target="_blank">Scoopler.com</a>, and even <a href="http://www.bing.com" target="_blank">Microsoft</a> seem to be outpacing Google. Distractions like the Google Books matter could exclude Google from an important new opportunity.</p>
<p>Finally, Google’s approach to its projects is notable because their scope is hard for most people to comprehend. Scanning books takes exabytes of storage. Converting images to ASCII, transforming the text (that is, adding structure tags), and then indexing the content take a staggering amount of computing resources.</p>
<p><a href="http://arnoldit.com/wordpress/wp-content/uploads/2009/06/image11.png"><img style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" title="image" src="http://arnoldit.com/wordpress/wp-content/uploads/2009/06/image-thumb11.png" border="0" alt="image" width="244" height="190" /></a></p>
<p><span style="font-size: xx-small; color: #800000;">Inputs to outputs, an idea that was shaped between 1999 and 2001. © Stephen E. Arnold, 2009</span></p>
<p>Google has been measured and slow in its approach. The company works with large libraries, provides copies of the scanned material to its partners, and has tried to keep moving forward. Microsoft and Yahoo, database publishers, the Library of Congress, and most libraries have ceded the work of scanning books to Google.</p>
<p>Now Google finds itself having to juggle a large number of balls.</p>
<p>Let’s go back in time.</p>
<p>I have noticed that most analysts date the start of the Google Books project to just before the initial public offering in 2004. That’s not what my research has revealed. Google’s interest in scanning the contents of books reaches back to 2000.</p>
<p>In fact, an analysis of Google’s patent documents and technical papers for the period from 1998 to 2003 reveals that the company had explored knowledge bases, content transformation, and mashing up information from a variety of sources. In addition, the company had examined various security methods, including methods to prevent certain material from being easily copied or repurposed.</p>
<p>The idea, which I described in <em><a href="http://www.infonortics.com/publications/google/google-legacy.html" target="_blank">The Google Legacy</a></em> (written in 2003 and 2004 and published in early 2005), was to gather a range of information, process that information using mathematical methods in order to produce useful outputs such as search results for users, and generate information about the information. The usual term for this value-added indexing is metadata. I prefer the less common but more accurate term meta indexing.</p>
<p><span id="more-7163"></span></p>
<p>The scanning component was meant to take information “locked in” paper or analog form and convert it to digital form. Google realized that it could not populate its knowledge bases by buying scanning services from commercial sources. That would be too expensive and would create a dependency. Traditional database producers and publishers were not in a financial or technical position to undertake a project to convert books into digital form. Google seemed to have hit a dead end.</p>
<p>After two or three years of preliminary investigation and engineering research, Google hired Wayne Rosing. Mr. Rosing was and still is a technical wizard, although he no longer works full time at Google. He brought to the company expertise in optical character recognition (OCR). He joined Google from Caere, where he was instrumental in that firm’s scanning and OCR technology. He also brought pragmatic engineering expertise, which contributed to the development, by Google’s suppliers, of specialized scanning equipment, sophisticated algorithms to deal with the page curvature of thick volumes, and work flow processes that allow Google to “drop in” a scanning operation so that a library’s operations are not significantly disrupted.</p>
<p>The project began filling Google’s servers with book information, meta indexing, and digitized content that supplemented Google’s own knowledge bases. The knowledge bases contain Google’s Web index, information about Google’s systems, and information from the scanned books.</p>
<p>As I reported in <em>The Google Legacy </em>in 2005, the notion of knowledge bases at Google was important because “smart software” looks at the knowledge bases or their “values” in order to make “decisions”. Google deals with mathematics and the knowledge bases exist as collections of meaningful values, data, and “digital envelopes”. When you search for spears, the system displays information about “Britney Spears”. You have Google’s knowledge bases to thank or blame for that approach.</p>
<p>Since its beginnings in the 1999 to 2000 period, the Google Books project has expanded to include <a href="http://books.google.com/books?id=YxcEAAAAMBAJ&amp;hl=En&amp;source=gbs_navlinks_s#all_issues_anchor" target="_blank">magazines</a>. You can see this yourself. Navigate to Google Books and click on a link to a magazine. The magazine covers appear, and you can see a very rich mash up of information about a particular issue. I like to think of this as what the Union List of Serials should have been. But, as with most of its services, Google has stepped in and done a job that a publisher like Bowker or Cengage could have done. Google has moved into a sector where I think the Library of Congress or the British Museum should have taken the lead. We know the publishers and the national libraries did not do the job. Google is now playing a role that other organizations could easily have played. These organizations did not. Google did.</p>
<p>The difference, of course, is that Google operates in a one-to-one world. The Google computing infrastructure eliminates the multiple, serialized steps between a user and an answer. The Google approach makes it possible for a person with a Web log to use Google as a new publishing medium.</p>
<p>When you look at a typical Google Books page, you see a number of functions. I don’t have time to work through each of these. But the Google Books system includes a way for Google to sell a book, provide information, and display to the user sections that include the search term. I think the system is quite usable, but it is, in a sense, the tip of the iceberg. The information resides within the Google infrastructure, so Google can add features, bells, and whistles with little delay and only an incremental cost.</p>
<p>I call this power leveling because Google just does the work. Most observers fail to explain exactly what Google’s strategy is in a particular initiative. To illustrate: what is the long-term contribution of Google Wave to Google Books?</p>
<p>Now let’s move to the dénouement of this epic battle. I don’t know how Google will resolve the many challenges facing its Google Books project. I do know that, when I was researching my 2007 study <a href="http://www.infonortics.com/publications/google/google-predator.html" target="_blank"><em>Google Version 2.0</em></a>, Google had completed most of the core functionality for its globe-wrapping computing infrastructure. Since 2006, Google has been accelerating its application development. The company now has an information application platform that makes it possible for the company to play one or more roles in the global information industry. The company can be a primary publisher, as it is with its Knol and Web log services. The company can be a content distributor, as it is with its iGoogle service and its YouTube.com product. In short, as I describe in <em><a href="http://www.infonortics.com/publications/google/gutenberg-contents.pdf" target="_blank">Google: The Digital Gutenberg</a></em>, a publisher can run a proprietary information business using only Google, with Google as a partner in the venture.</p>
<p>Google has patent documents that describe how partners can control virtually every aspect of an information business. The invention is described in terms of video content, but the system and method, as the patent authors disclose, can be applied to other media, as would be evident to “one skilled in the art.”</p>
<p>Google offers input forms for its Local service that provide a free “yellow page” type listing, add knowledge to Google’s knowledge bases, and deliver useful information to Google users via tethered or mobile computing devices. I have mentioned Knol, which is a type of user-built encyclopedia. Google publishes a large amount of information via its more than 70 Web logs, which you can search without charge on my Overflight service. Google even makes it possible for me to have a multimedia ad about myself, hooked into Google Maps and a Google Profile. Educators can weave these services together to provide a rich instruction service to middle school students. You can create a Google “magazine” individualized for you. You can watch a Google channel on YouTube.com. Even the Pope has a Vatican channel on YouTube.com. Google in education is emerging as the “next big thing”. You can follow Google’s education efforts through <a href="http://googleblog.blogspot.com/2009/06/free-webinar-google-apps-education.html" target="_blank">Google Apps</a> or its new <a href="http://adage.com/digital/article?article_id=137473" target="_blank">Digital Education Portal</a>.</p>
<p>Let me give one example of how Google’s smart software can make use of the knowledge bases. Consider a Google software agent examining a “fact table” in a knowledge base. You can see these “fact tables” in Google Base or Google Fusion. When the software agent doesn’t know whether a value is “within range”, the software agent can look in a library for another mathematical method. If that does not resolve the issue, the agent can consult the Google knowledge bases. The idea is that Google refines the “values” in its fact table over time. This is just good engineering, not a science fiction writer’s notion of a super human brain.</p>
<p>The outputs of these systems are interesting. Google does not provide much detail, but one tantalizing example became available in 2007. According to patent document 20070198481, the Google system generates a dossier about the user’s query, in this case Michael Jackson, the pop star.</p>
<p>In closing, will there be a sequel to this epic battle between Google and its challengers? I have no idea. I can offer three closing observations:</p>
<p>First, Google has a system that works a bit like Lego blocks. Services, even information, can be snapped together. It is, therefore, imperative that those who want to understand Google look beyond advertising, Web search, and the squabble over Google Books. The company can morph without warning. This makes Google a very formidable competitor. How long would it take Google to become a publisher and resolve copyright by asking me to “publish” my next study for Google, for distribution by Google, and for monetization by Google? In my case, not long at all. My traditional publishers are struggling, and their woes affect my financial future. Maslow’s hierarchy comes into play, not a love of tradition.</p>
<p>Second, those fighting Google have to recognize that Google is not a small company. Forget the lava lamps. Google can be a dominant force in certain battles. Without resources, fighting Google can be a difficult proposition. Viacom has been chasing Google for years. What’s the status? Stalled by legal maneuvers. This is an arena for those with considerable funds, lawyers, and stamina. European legal challenges may be contentious. Google Books is not deep linking. Google Books is a large dataspace.</p>
<p><a href="http://arnoldit.com/wordpress/wp-content/uploads/2009/06/image12.png"><img style="border-top-width: 0px; display: inline; border-left-width: 0px; border-bottom-width: 0px; border-right-width: 0px" title="image" src="http://arnoldit.com/wordpress/wp-content/uploads/2009/06/image-thumb12.png" border="0" alt="image" width="244" height="184" /></a></p>
<p>Third, I am pragmatic. For years, I have been urging publishers to surf on Google. Now “<a href="http://googleblog.blogspot.com/2009/05/went-walkabout-brought-back-google-wave.html" target="_blank">wave</a>” has another meaning. Google’s newest technology can engulf some organizations. For some, Google presents an opportunity for a thrilling ride. For libraries faced with funding pressures, Google offers one way to obtain digital instances. For scholars, something good enough may have to do. For others, Google represents a powerful force that can change landscapes. Like some natural forces, Google operates slowly. Are we discerning what is truly significant about Google Books? Are we watching a minor feature, not the major thrust of the activity? I am trying to get the right perspective. Are you?</p>
<p>Stephen Arnold, June 26, 2009</p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2009/06/26/arnold-at-nfais-google-books-and-good-enough/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Library Teaches Search &#8211; More Instruction Needed</title>
		<link>http://arnoldit.com/wordpress/2009/06/22/library-teaches-search-more-instruction-needed/</link>
		<comments>http://arnoldit.com/wordpress/2009/06/22/library-teaches-search-more-instruction-needed/#comments</comments>
		<pubDate>Mon, 22 Jun 2009 05:03:21 +0000</pubDate>
		<dc:creator>Stephen E. Arnold</dc:creator>
				<category><![CDATA[Library automation]]></category>
		<category><![CDATA[News]]></category>
		<category><![CDATA[Online (general)]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://arnoldit.com/wordpress/?p=7143</guid>
		<description><![CDATA[My recollection is that libraries taught search as far back as 1980. I recall that either database vendors would run demonstrations or that librarians skilled in the use of online would provide guidance to those who asked. I recall running a class in ABI/INFORM at Chicago Public Library and there was an overflow crowd of [...]]]></description>
			<content:encoded><![CDATA[<p>My recollection is that libraries taught search as far back as 1980. I recall that either database vendors would run demonstrations or librarians skilled in online searching would provide guidance to those who asked. I recall running a class on ABI/INFORM at Chicago Public Library, and there was an overflow crowd of both staff and research-minded patrons. I was delighted, therefore, to see an article in the <a href="http://www.tmcnet.com/usubmit/2009/06/21/4236173.htm" target="_blank">Sacramento Bee</a> that described the Sutter Library’s classes in finding health and medical information online. The class is a reminder to me that:</p>
<ol>
<li>Librarians and information professionals often know how to search and have an interest in sharing that knowledge</li>
<li>Patrons are smart enough to know that, despite the marketing hype and the pundits’ assertions that search is a “done deal”, additional instruction attracts people and even finds its way into The Sacramento Bee</li>
</ol>
<p>We have a long way to go before information professionals become relics of a long gone time. The people who tell me that they “know how to search” and “can locate almost anything online” are kidding themselves. I think I am a reasonably good researcher. But if you spend time monitoring how I find information, you will learn quickly that I turn to experts who make my search skills look primitive. Even my nifty Overflight system pales beside the type of information that my research team generates by:</p>
<ul>
<li>Knowing what content is located where</li>
<li>Understanding the editorial method behind or absent from certain online systems</li>
<li>Leveraging hard-to-manipulate resources such as information from government repositories, specialized services, and individual experts</li>
</ul>
<p>I would like to see more libraries move aggressively into online instruction, market those programs, and raise the level of expertise. Most of the people who claim to be experts at search are clueless about how bad their skills are. Among the worst offenders are self-appointed search experts who have trouble figuring out when something is likely to be baloney and when something is just plain wrong. In my opinion, enterprise search, content management, and text mining are three disciplines where better research skills will be most beneficial. Then we need critical thinking skills. Schools have dropped the ball. Maybe libraries can help in this area as well? Search procurement teams will be well served if the team has one or more librarians in the huddle.</p>
<p>Stephen Arnold, June 22, 2009</p>
]]></content:encoded>
			<wfw:commentRss>http://arnoldit.com/wordpress/2009/06/22/library-teaches-search-more-instruction-needed/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>

