Mysteries of Online 7: Errors, Quality, and Provenance

February 19, 2009

This installment of “Mysteries of Online” tackles a boring subject that means little or nothing to the entitlement generation. I have recycled information from one of my talks in 1998, but some of the ideas may be relevant today. First, let’s define the terms:

Errors–Something does not work. Information may be wildly inaccurate but the user may not perceive this problem. An error is a browser that crashes, a page that doesn’t render, a Flash that fails. This notion of an error is very important in decision making. A Web site that delivers erroneous information may be perceived as “right” or “good enough”. Pretty exciting consequences result from this notion of an “error” in my experience.
Quality–Content displayed on a Web page is consistent. The regularity of the presentation of information, the handling of company names in a standard way, and the tidy rows and columns with appropriate values become “quality” output in an online experience. The notions of errors and quality combine to create a belief among some that if the data come from the computer, then those data are right, accurate, reliable.
Provenance–This is the notion of knowing from where an item came. In the electronic world, I find it difficult to figure out where information originates. The Washington Post reprints a TechCrunch article from a writer who has some nerve ganglia embedded in the companies about which she writes. Is this provenance enough, or do we need the equivalent of a PhD from Oxford University and a peer reviewed document? In my experience, few users of online information know the provenance of the information on a Web page or in a search results list, or know how to think about it. Pay for placement adds spice to provenance in my opinion.

So What?

A gap exists between individuals who want to know whether information is accurate and can be substantiated from multiple sources and those who take what’s on offer. Consider this Web log post. If someone reads it, will that individual poke around to find out about my background, my published work, and what my history is? In my experience, I see a number of comments that say, “Who do you think you are? You are not qualified to comment on X or Y.” I may be an addled goose, but some of the information recycled for this Web log is more accurate than what appears in some high profile publications. A recent example was a journalist’s reporting that Google’s government sales were about $4,000, down from a couple of hundred thousand dollars. The facts were wrong, and when I checked back on that story I found that no one had pointed out the mistake. A single GB 7007 can hit $250,000 without much effort. It doesn’t take many Google Search Appliance sales to beat $4,000 a year in revenue from Uncle Sam.

The point is that most users:

Lack the motivation or expertise to find out if an assertion or a fact is correct or incorrect. In my opinion, chasing facts is dull stuff that few people make a priority. Even when I chase facts, I can make an error. I try to correct those I can. What makes me nervous are those individuals who don’t care whether information is on target.
Do not see research as a core competency. Research is a difficult and thankless task. Many people tell me that they have no time to do research. I received an email from a person asking me how I could post to this Web log every day. Answer: I have help. Most of those assisting me are very good researchers. Individuals with solid research skills do not depend solely upon the Web indexes. When was the last time your colleague did research among sources other than those identified in a Web index?
Get confused by too many results. Most users look at the first page of search results. Fewer than five percent of online users make use of advanced search functions. Google, based on my research, takes a “good enough” approach to its search results. When Google needs “real” research, the company hires professionals. Why? Good enough is not always good enough. Simplification of search and the finding of information is a habit. Lazy people use Web search because it is easy. Remember: research is difficult.

What’s the Mystery?

The mystery is that content which is factually correct may be perceived as wrong or off base if the presentation of the data is not consistent, easily understood, and distraction free.

The challenge of online is to have solid data and achieve consistency. The systems must be easy to use and make it possible for users to compare and contrast like data. A handful of systems present side by side results, but these systems are not used.

The bafflers, in my opinion, are:

Online makes it easy to spot certain types of errors. For example, a missing value in a D&B credit report. Viewed on paper, the gap may not be noticeable. Online the mistake can often be spotted easily if one takes the time to look. At the same time, the blurring homogenization of pages of text makes it tough to focus and spot errors. Flipping and multitasking exacerbate the problem.
Good information, when converted to online form, may become unusable information. Examples of this range from spotting a bogus Web page pumped up on SEO steroids to crammed interfaces with lots and lots of headlines, intrusive popups, and crazy color schemes. I find www.popurls.com hard to use and even less useful than printouts of the lists. Spotting an error online is quite difficult due to the colors, blue and black with white headlines. The interface invites hip hopping around.
A news story or article broken up across three or more Web pages makes it hard for me to flip back and check what the author said on a previous page. The ads, the surveys, and the automatic videos all erode attention. If a user does not exert considerable effort, the information may be right and still be ignored or looked at with half an eye.

Is There a Fix?

Search without search may provide some benefits for certain device users. For general users, I am not sure how the rigor of checking sources, comparing data points, and digging into multiple, high value resources can be inculcated in most Web users. Some people have an affinity for research. Others, in my opinion, don’t know how poor their research and information processing skills are. Others are content with the first Google listing in a results list.

The big fix will come from a company that “becomes” the Internet. In that scenario, one organization can use its view of the datasphere to make decisions about what’s relevant and what’s reliable. If this sounds scary, think in terms of a benign reference librarian. This person has the expertise to judge accuracy, reliability, and provenance. Few doubted the librarian in the grade school I attended from 1950 to 1958.

One of the mysteries of online, then, is the answer to this question: “Who will become the librarian who takes care of our meta-information needs in the 21st century?” Send me your candidates, please.

Stephen Arnold, February 19, 2009

Written by Stephen E. Arnold · Filed Under Feature, Online (general), Rich media

Comments

One Response to “Mysteries of Online 7: Errors, Quality, and Provenance”

sperky undernet on February 19th, 2009 1:53 am

How people using one screen can concentrate on minimized webpages is a mystery to begin with. Solutions start with dual monitor use. That leaves the questions concerning user choice of what is important to examine. Granted we all have technical tricks and methods examining more than one document or webpage at a time but this assumes we concern ourselves with the meaningful documents to begin with and, as your feature makes clear, layers of diligence are mandatory to validate or qualify. This stage is separate from another fraught with complexity which I attribute to our “conversation” with the material. It is unlikely our super librarian can provide this conversation without those “pesky but essential reference interviews” referred to in comments to Mysteries of Online 6. However it might be alleviated somewhat, tho not entirely, by AskJeeving our searches – working through answers and results and then building, rethinking and adding layers to our own questions, improving them. My history prof in university in the 70s would say this is part of something called “learning”, an intellectual activity that occurs within the student/inquirer mind. With the supervision and guidance of the teacher/professor/mentor. In my opinion, this has to break into all the disintermediations for the possibility of anything durable – besides twitter -to emerge.

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.