Autonomy Bullish on 2010

February 10, 2010

Autonomy is fresh from a $700 million year, and a reader sent me a link to “Autonomy Confident on 2010.” With the US economy on life support, the addled goose gobbled up this bit of good news. For us at Beyond Search, the most salient comment in the article was:

“It’s winter with snowdrops,” he said. “We are very confident in an upturn (the new products) will do very well.” Customers from airlines, like American Airlines, through to supermarkets were interested in the products, he said. “It is basically anyone that touches consumers, like Carphone Warehouse, Ericsson and Kraft.”

Information about strong results from Exalead reached us a few days ago. Perhaps the weight of debt in the European Community and the tourist-troubling events in Athens and southwestern Italy are anomalies. A recovery may be underway if you are viewing the world from Autonomy’s vantage point.

Stephen E Arnold, February 10, 2010

No one paid the addled goose to write this news item. I will report non payment to the Department of Labor in Washington.

British Library Offers Free Downloads for Kindle

February 9, 2010

Short honk: I wrote a few days ago about national libraries not stepping up to the job of scanning their holdings and making them available. Well, wouldn’t you know? The addled goose was wrong. The British Library is making 19th century first editions available via Amazon for the heavily DRMed Kindle. I have a Kindle, so the goose is in fine feathers. You can read about these 65,000 “rare first editions” and get the details in the Telegraph’s “British Library to Offer 19th Century First Editions for Free Download on Amazon Kindle.” Compared with the commercial services offering special collections, I think the search and retrieval function may be a bit undernourished. I find the Amazon Kindle search system a drag to use. Heck, I think the Amazon.com search service is almost as bad as Apple’s iTunes service.

The 19th century novel has been an area of study I loved. I used to read quickly and with great recall. I remember even today such monumental factoids as the name of Pip’s pal in Great Expectations. (You know, Herbert Pocket.) First editions are interesting if you have the subsequent editions that incorporate either the meddling publisher’s fixes or the author’s attempts to salvage a real loser of a novel based on an even lousier series of monthly segments. (Think Pickwick Papers.) The scholar can work through various editions and identify changes. Some of these research nuggets lead to PhDs if not to jobs.

If you hunger for a penny dreadful, you are in luck. Just make sure you own a Kindle.

My hope is that the British Library shifts into high gear and scans, makes searchable, and offers to researchers its collections of periodicals, broadsheets, and books.

With Google being drawn and quartered by everyone from legal eagles to relatives of deceased writers, the national libraries may have to convert their materials to digital form. I don’t plan on buying microfilm reels from ProQuest or hoping that poor conservation methods will preserve millions of fragile information objects.

My point is that the collection shows the British Library can do the job. Now the library has to finish the job. Without Google, the ball is back in the national libraries’ side of the court in my opinion.

Stephen E Arnold, February 9, 2010

No one paid me to write this article. I will report this free work to the National Archives, which is embarking on its own digitization project I hear.

Analysis of Aardvark and Crowdsourcing Answers to Questions

February 9, 2010

We received a copy of the paper “Anatomy of a Large-Scale Social Search Engine”.

The information was interesting. The idea is that individuals who are interested in answering questions related to their area of expertise provide an adjunct to other research methods. The paper explains the method used to determine who should answer a question and the other components of the “social search system”.

If you have not visited the Aardvark Web site, you will want to take a look at the service. The URL is www.vark.com. The screen shot below shows the search box and the firm’s explanation of what happens when you ask a question of the Aardvark question answering community.


The information in the paper suggests that members’ questions are longer than a query sent to Google. Instead of two or three words, think about a sentence with a dozen or more words. In addition, the answerer and the questioner can enter into a conversation, which further disambiguates the question and tugs out the needed information.

All good.

When we talked about this paper at lunch, the goslings asked a number of rhetorical questions. I want to share three of these with you, so you can think about the Aardvark paper and your own experience with question answering systems:

  1. The people answering questions have self-selected to answer questions. When the method is moved to a more general audience, for example Facebook or Orkut, will the metrics in the paper be congruent with the broader community’s behavior? (In the self-selected community, about half of those registered did not ask or answer a question, according to the paper.)
  2. In question answering systems, how will disinformation be identified and filtered? (Some government entities, not the US in my opinion, could inject intentionally shaped information, which the questioner could accept as fact and then pass along as accurate information.)
  3. Precomputing certain values is one way to minimize computational load; however, over time an expert may acquire additional domains of expertise. How can the system adapt and get “old” experts with “new” information on the roster for certain questions?

We think the Aardvark service is interesting and the paper stimulated our thinking.

Stephen E Arnold, February 9, 2010

No one paid us to read and write about this paper. I will report this to the National Park Service, an outfit familiar with crowds in Yellowstone and elsewhere.

Use Bing? You Are an Early Adopter

February 8, 2010

AdAge ran “What Your Choice of Search Engine Says about You.” Marketing and economic research usually leaves me baffled. Most of the data and the interpretation of those data are a bit like a comedian’s joke. The punch line is unexpected, and I often wonder how the comedian’s imagination was able to hit on a twist to a tired light bulb incident. You will need to read this AdAge article and make your own decision.

What I underlined in my hard copy were these points:

  • The search engine I use tells me a lot about myself. Example: If I use Bing, I am an early adopter. My office is filled with new stuff, which arrives every day. I don’t use Bing. I guess what I do as a technology analyst is not what a “real early adopter” does, right?
  • I also shop at Wal*Mart. Wrong. I don’t shop. When I buy stuff I use online services to find low prices and then do some research. Then I buy. When I shop, I get what I need at the junky store attached to the gasoline pump I use to fill my Honda every 10 days.
  • Search engine choice sheds light on consumer behavior. Okay, what about a sample in which fewer than 15 percent of users rely on something other than Google? For the rest of the sample, it is Google all the way. Doesn’t this change the results? It sure makes me think about this baloney, but it did not make the AdAge editors think.

Yep, a “new wrinkle”. Maybe for the marketing and econ majors. Not for me.

Stephen E Arnold, February 8, 2010

No one paid me to point out the flaws in this survey. Maybe I will report this non payment to the Bureau of the Census. That outfit knows how to count.

Google and the Lazarus Myth for Books

February 8, 2010

I read “Google: We Will Bring Books Back to Life” by Google’s legal eagle, David Drummond. The title of the article suggested to me that books are dead. I tried to visualize an information Lazarus, but I just received a royalty check from one of my half dozen publishers. The numbers looked fine, so that book of mine was not dead. On life support maybe. Definitely not dead.


Google saith, “Get up and read.” Van Gogh sees “yellow” as in journalism. Source: http://cruciality.files.wordpress.com/2009/09/van-gogh-the-raising-of-lazarus-1890.jpg

The argument in the Googler’s write up was well honed. I can see in my mind’s eye several Googlers laboring away in their cubes over the rough draft. These passages struck me as interesting:

First, this passage: “Yet doubts remain, and there is particular concern among authors that they are in danger of handing control of their work to Google.”

“Doubts” is a bit of an understatement. The love Google enjoyed when it was a wee tot of two or three is long gone. The Google does more than engender doubt.

Second, this passage: “Some have questioned the impact of the agreement on competition, suggesting it will limit consumer choice and hand Google a monopoly. In reality, nothing in this agreement precludes any other organization from pursuing its own digitization efforts.”

Okay, let’s be clear. National libraries should be digitizing their holdings. National libraries did not take on this duty, leaving it to others. Now the “others” are a set of one, the Google. No one other than Google is going to scan books. Period. I accepted this years ago when I worked briefly at UMI in 1986. I thought scanning books was a great idea. The idea got reduced to scanning specific sets of books and then providing a set of microfilm for related information. When I looked into sets, it became clear that one could scan a single book like Pollard & Redgrave and then provide microfilm of the referenced content objects. But UMI’s financial set up precluded much more than a very small, undernourished effort. If money was not available in the mid-1980s, I don’t think money will be available in the post crash, pre depression 2010s.

Third, this passage: “Imagine if that information could be made available to everyone, everywhere, at the click of a mouse. Imagine if long-forgotten books could be enjoyed again and could earn new revenues for their authors. Without a settlement it can’t happen.”

I can imagine it. I can also imagine the data available to Google’s internal knowledgebases, its advertising revenue, and its potential to generate new content objects from the information in these processed documents.

Apple is working at books from its iTunes angle. Amazon is working at books from the digital Wal*Mart angle.

Where are the national libraries? Where are the consortia of government entities responsible for archives? Where are the UN members pooling resources to tackle must-have collections such as health and medicine? Where are the publishers’ associations?

Answer: nowhere, silent, or hoping for the hobbling of Google.

Google is doing the government’s job. If the Google is stopped, the information in books is going to be handled like so many important tasks in today’s world. Poorly or not at all.

I am on the side of Googzilla. But the Google does not know I exist and does not care about an addled goose in Kentucky. I do hear, “Get up and read” in my mind.

Stephen E Arnold, February 8, 2010

No one paid me to write this. I will report this sad state of affairs to the manager of the US government document repository near my goose pond.

Online Information: The View of a Real Author

February 8, 2010

The addled goose publishes monographs. These are expensive and sell in dribs and drabs. The addled goose has no real publishing experience, so I gobbled up the information in the TechCrunch article “Hey, 1997 – Macmillan Called, They Want the Net Book Agreement Back.” If the information in this write up is on the money, real publishers have been making life difficult for authors and booksellers for a long time. I recall reading that Charles Dickens was a slippery dude with whom to deal. Now I am wondering if the dust-up between the world’s smartest man (Jeff Bezos) and a bunch of publishers is an information apocalypse, a business negotiation, or a new era in information access. I am working on a really dull monograph with zero interest to anyone except a few attorneys and possibly an investment banker or two. I may even give the monograph away. At the rate my publishers are selling my stuff, I will be in the big duck oven in the sky before I can pay for a meal at McDuck’s down the road.

In the TechCrunch write up, I noted these items:

First this passage:

Of course, publishers still choose their wholesale price, but there’s nothing to stop, say, Borders from heavily discounting bestsellers to get people through the door. Publishers didn’t necessarily like this as it led to booksellers demanding more aggressive discounting (sometimes more than 60% off the cover price), but they didn’t have much of a choice but to accept. The fact is that publishers couldn’t justify opening up their own stores, so if they wanted readers to be able to actually read their books, they had to keep bookstores happy.

Ah, control is not complete. I did not know that publishers were at the mercy of their retail partners.

Second, this passage:

It took until the late 90s [in the United Kingdom] for the Restrictive Practices Court to declare that the Net Book Agreement was anti-competitive and should be scrapped. Shortly afterwards, Borders entered the UK market, hundreds of UK independent bookshops went bankrupt and publishers decided to change their contracts with authors. Now, instead of being based on the cover price of a book, the author’s royalty would be based on ‘net receipts’, which is to say the price that publishers actually received from bookshops.

Yikes, price manipulation and then pricing actions that further reduced what was paid to real authors. Geese like me are even further down the food chain. Yikes again.

Finally, this passage:

For the first time in the UK since 1997, and ever in the US, publishers are able to set – and enforce- their own prices on ebooks. And they will; not to make a fair return on ebooks but rather to cripple their sales in order to protect early hardback book sales. They’ve admitted as much themselves, saying that prices will start high on hardback release, before dropping steadily over time. The idea that this benefits anyone, least of all authors, is laughable.

So what are the consequences?

You can read the TechCrunch original for its view. Mine is that a disintermediation option is open. I argued that Google could move in this space with the flip of a bit. So far Google is dragging its giant Googzilla feet. But for how long? Have publishers read The Strategy Paradox by Michael E. Raynor? It might be available on Amazon or in your local bookstore. Worthwhile reading for understanding in my opinion.

Stephen E Arnold, February 8, 2010

No one paid me to write this. I will report non payment to the US Department of Treasury, which prints money, unlike publishers and authors.

ArnoldIT.com Partner Lands New Web Site

February 8, 2010

Dr. Tyra Oldham Banks is an ArnoldIT.com partner. On February 7, 2010, she let us know that she has rolled out a new Web site. You can see Dr. Oldham on our TheSeed2020.com video and learn about her management and engineering services work at LAND Construct.


Dr. Tyra Oldham, President, LAND Construct, an engineering services firm and ArnoldIT.com partner.

Point your browser to www.landconstruct.com. A happy quack to her.

Stephen E Arnold, February 8, 2010

This was a compensated placement. Dr. Oldham bought Stuart Schram and me a hot chocolate as an inducement to review her new Web site and make constructive suggestions. The addled goose and one of my goslings were eager to comply. It only took a hot chocolate.

Online Pricing: Disruption Is the Game

February 8, 2010

It’s Monday morning. The Super Bowl is over, but the world football ecosystem is unfazed. The same cannot be said of for-fee content. I want to point out two seemingly unrelated developments and link them to one of the keystones of doing business in an online, Web-centric world. I am working on a couple of oh-so-secret write ups, and I will make oblique references to research findings by the goslings here in Harrod’s Creek that will be more widely known in the spring.


When worlds collide. The boundary is the exciting spot in my opinion. Image source: http://www.sciencedaily.com/images/2008/01/080112152249-large.jpg

First, consider the plight of Google Books. Suddenly the Department of Justice is showing some moxie. That’s a good thing, but I think derailing Google Books is likely to have some interesting repercussions going forward. For now, the big story is that Google Books has become the poster child of Google being Google. You can get the received wisdom in the UK newspaper The Telegraph and its write up “Justice Department Criticises Google Books Settlement.” The glee is evident to me in this write up, but perhaps I am jaded and worn down by the approach certain publications take to Google. The company is essentially the first example of what will be a growing line up of firms that use technology to alter business processes. I will be talking about this in my NFAIS speech on March 1, 2010. I am the luncheon speaker, and I think some of those in the room will get indigestion. The reason is that Google comes from a domain that people within 20 years of my age of 65 don’t fully understand. The Telegraph doesn’t get it either, and I think this passage highlights that generational divide:

The ruling is a blow to Google and authors’ groups who had supported the search giant’s ambitious plan to create a vast online library of digitised books. The controversial Google Book Search project attracted fierce criticism from authors, who believed their rights were being eroded, while winning praise from other quarters for helping to widen access to classic, rare or useful works of literature.

Too bad the writer, a real journalist, omitted the word “goodie”. My hunch is that since national libraries have not shown any interest in creating digital collections, students and researchers will be doing their work the way John Milton and Andrew Marvell did. Great for those who have the time, money, and cursive writing skills. Not so great for those who need to sift through lots of content quickly. With library budgets shrinking and librarians forced to decide which books to keep, which to store, and which to trash, I think the failure of national libraries is evident. Google made a Googley and somewhat immature attempt to step into the breach, and look at what has resulted: a bureaucratic, legal eagle snarl. Books are an intellectual resource, and I keep asking, “If not Google, who?” Reed Elsevier? The British government? The National Library of China? A consortium of publishers? The answer is, in my opinion, now clear: “No one.” Maybe Google will keep going with this project. Hard to tell. It might be easier for Google to shift gears, go directly to authors, and cut specific deals for their future work. In a decade or so, end of problem. Also, end of traditional publishing. If Google actually talked to me, I would offer this advice, “Go for it, dudes.”


Mobile Devices and Their Apps: Search Gone Missing

February 5, 2010

VentureBeat’s “A Pretty Chart of Top Apps for iPhone, Android, BlackBerry” shocked me. Not a little. Quite a bit. You will want to look at the top apps for the iPhone, the Android devices, and the BlackBerry gizmos. There is no entry for search. In fact, there is no entry for any app related to reading. What does this tell me?

First, search is not considered an app. I think people assume that finding stuff on a mobile device is a bit of hunting, some familiar icons, and letting the system spit out what seems to be relevant when looking at a map for pizza and walking around. That’s disappointing because search on a mobile device is important in my opinion.

Second, there is no app that hooks into reading. For those publishers in secret meetings with tablet makers, I have a hunch that readership of books and magazines may not take off like a skyrocket. Perhaps bundling a tablet and a subscription will work in some niches, but I am not sure a major bump will occur.

Finally, the lists tell me too much about our society. I am delighted that I am an old and addled goose.

Stephen E Arnold, February 5, 2010

No one paid me to write about these lists of what are toys and entertainment applications. I suppose I need to report this sad situation to the Department of Education, a stellar outfit.

Lucene and Integrated Log Data

February 5, 2010

You may find “Into the Cloud: How Search Unlocks Log Metadata to Visualize Your Business Process” interesting if you are an open source technology maven. The idea is that different applications generate log files. When these log files are aggregated, the information that can be searched reveals insights about a business, customers, system issues, etc. The participants are Boomi and Lucid Imagination. Boomi is the “integration cloud company”. You can get more information at www.boomi.com. Lucid Imagination is the company that creates a build of Lucene and Solr that is current, complete, and ready to install. Lucid sells engineering services, and I have a hunch some services will be required to deliver unlocked log data.

After listening to the program, I had several questions:

First, the notion of integrating log files is a good one, but I wondered how long it takes to suck in big log files, determine deltas, and then update the indexes.

The second question pivots on the usefulness of search for log file analysis. In my experience, we have had to jump through hoops to concatenate certain query results, perform sub queries, and then crunch data. The bigger the log files, the more work these steps required.
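To make the “determine deltas” question concrete, here is a minimal sketch, assuming plain-text, append-only log files and a toy in-memory inverted index. The class name IncrementalLogIndex and its methods are hypothetical; they are not Lucene, Solr, Boomi, or Lucid Imagination APIs. The idea is to remember how far into each file you have already indexed, read only the bytes appended since then, and add just those complete lines to the index.

```python
import re
from collections import defaultdict


class IncrementalLogIndex:
    """Hypothetical toy index: ingests only the delta appended to each log
    file since the previous pass. Assumes append-only files (no rotation)."""

    def __init__(self):
        self.offsets = {}                 # path -> byte offset already indexed
        self.line_counts = {}             # path -> number of lines already indexed
        self.postings = defaultdict(set)  # term -> {(path, line_number), ...}

    def update(self, path):
        """Index new complete lines in `path`; return how many were added."""
        start = self.offsets.get(path, 0)
        line_no = self.line_counts.get(path, 0)
        with open(path, "rb") as f:
            f.seek(start)  # skip everything indexed on earlier passes
            data = f.read()
        # Only index up to the last newline; a partial trailing line waits
        # for the next pass. rfind returns -1 if there is no newline, so end=0.
        end = data.rfind(b"\n") + 1
        new_lines = data[:end].decode("utf-8", errors="replace").splitlines()
        for text in new_lines:
            line_no += 1
            for term in re.findall(r"\w+", text.lower()):
                self.postings[term].add((path, line_no))
        self.offsets[path] = start + end
        self.line_counts[path] = line_no
        return len(new_lines)

    def search(self, *terms):
        """Return (path, line_number) pairs containing every term (AND query)."""
        sets = [self.postings.get(t.lower(), set()) for t in terms]
        return set.intersection(*sets) if sets else set()
```

The point of the sketch is that each pass costs roughly the size of the new data, not the size of the whole log. It ignores log rotation and truncation, and a real deployment would push the new lines into an engine such as Solr rather than a Python dictionary. The second question, chaining sub queries and crunching the results, is not addressed by this kind of index at all.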

Listen to the podcast. The idea is interesting, and I think the market uptake on this idea will be the proof of the pudding.

Stephen E Arnold, February 5, 2010

No one paid me to listen to the podcast or write this article. Too bad. I will report this failure to get paid to the Department of Labor. Too bad I am not a child. I could report myself for unfair practices.
