User Tracking Yahoo Style

February 5, 2009

If the news item in Web Pro News is spot on, Yahoo is taking on an interesting challenge. “Yahoo to Start Keeping Tabs on Your Searches” by Chris Crumb documents Yahoo’s me-too of some discontinued Google features. Mr. Crumb said:

Search Pad for the Yahoo search engine. Essentially, it keeps track of your searches, figures out when you are researching things, and stores results of interest in a virtual notepad you can use for reference.

The write up provides links to additional information. The usage tracking implications are fascinating. The core of the write up is an interview with Tom Chi, Senior Director of Product Management with Yahoo Search. One of the most interesting comments was:

“This [service] follows the same data retention policy we have across Yahoo!,” explains Chi. “We recently announced a new policy. Under the new policy, Yahoo! will anonymize user log data within 90 days with limited exceptions for fraud, security and legal obligations. Yahoo! will also expand the policy to apply not only to search log data but also page views, page clicks, ad views and ad clicks.”

Usage tracking yields high value data. How will the user, law enforcement, and marketing communities respond? It’s too soon to tell.
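The retention rule Mr. Chi describes can be sketched in a few lines. This is a minimal illustration, not Yahoo's method: the write up does not say how the anonymizing is done, so the `LogRecord` fields, the `legal_hold` flag, and the choice to blank the user identifier and IP address are all assumptions made for the example. Only the 90-day window, the exceptions, and the event types come from the quote.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional

# The 90-day figure comes from the quoted policy; everything else is assumed.
RETENTION = timedelta(days=90)


@dataclass(frozen=True)
class LogRecord:
    timestamp: datetime
    event_type: str              # e.g. "search", "page_view", "ad_click"
    user_id: Optional[str]       # hypothetical identifier field
    ip_address: Optional[str]    # hypothetical identifier field
    payload: str                 # the query or URL, kept as-is in this sketch
    legal_hold: bool = False     # hypothetical flag: fraud, security, legal exceptions


def anonymize_if_due(record: LogRecord, now: datetime) -> LogRecord:
    """Return the record with identifiers blanked once it is 90 days old."""
    if record.legal_hold:
        return record            # exception cases keep their identifiers
    if now - record.timestamp < RETENTION:
        return record            # still inside the retention window
    return replace(record, user_id=None, ip_address=None)
```

Note that the payload survives untouched in this sketch. Whether a search string on its own can identify a person is exactly the kind of question the policy statement leaves open.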

Stephen Arnold, February 5, 2009

Google and Privacy

February 3, 2009

Google has become a magnet for hassles. I don’t know if this story “Privacy Professional Facing Criminal Charges” here is taken from Moses’ tablets. But the fact that it is running points out how the once infallible Googzilla has become a whipping boy. For me the most interesting comment in this write up was:

But the Internet is a different medium, says Google. “We cannot agree with the concept that a tool can be blamed for the use that is made of it,” a company spokesperson said.

I am not going to disagree with the Googzilla, but it seems that the malware fiasco, the propensity to blame another outfit and then fess up, and now the escalating problem in Italy signal a change.

About a year or so ago, Google’s attorneys informed me via my client that the Semantic Web activities described in my reports were deep dark secrets. That caught me by surprise. I thought Google’s attorneys read their patent documents. My client faxed the cover pages of the sources of my information, and the USPTO patent application number was a revelation. Now one of Google’s legal eagles won’t be buying limoncello from the source for a while. Google has kept Ramanathan Guha out of the spotlight. Now the company is allowing Alon Halevy to chat up Google’s semantic interests. Alon Halevy’s talk last week received little notice.

Google’s legal troubles in Italy did. In my opinion, it is interesting to see what constitutes news. For another “issue”, check out Google’s response to an allegation that its services assist bad guys here.

Stephen Arnold, February 3, 2009

Picking Google’s Security Boil

January 28, 2009

Johnny Doe (original name!), writing on Wiseperception.com here, takes a rough fingernail and digs into Google’s security scab. The core of the article is information gathered by an Austrian professor named Hermann Maurer. After my Google 2.0 study appeared, I received several queries from folks in Europe wanting me to provide information about Google that was negative. I refused. I read patent applications and technical papers. Not Herr Dr. Maurer. The academic is asserting that Google has data about users and can assemble those data into profiles. No kidding. The news, Herr Dr. Maurer, is old. The privacy and security boils on Googzilla’s snout are pretty obvious as well. For me the most interesting comment in this article was:

He [Herr Dr. Maurer] also speculates on the possibility of Governments paying Google for information on an opponent, or to block their citizens’ access to servers. “If Google did this they wouldn’t be doing anything illegal. They have this information, they are a company, why not sell it?” Maurer says.

Great idea. The only problem is that Google remains sufficiently disorganized that government officials have trouble getting a Googler to return their calls. My thought is that Europe is going to be a giant thorn in Googzilla’s paw with regard to privacy. Microsoft and other competitors have avoided tackling Google head on in this matter. Looks like the good Herr Doctor won’t be a shrinking violet. Herr Dr. Maurer may be the first of the European Google watchers to poke Google’s security boil.

Stephen Arnold, January 28, 2009

More Social Network Issues

January 12, 2009

Social search, social networks, and social pitfalls–the cheerleaders don’t want the social bandwagon to be delayed but trouble looms. Google’s Orkut made clear the issues that can arise when a social network becomes the playground of some interesting people in Brazil. Now you can read “(Under)mining Privacy in Social Networks” here by a trio of Googlers. The Google write up identifies some obvious flaws; for example, exposing information unintentionally. But the more significant part of the paper, in my opinion, is the set of references to merging social graphs. The dataspace drum beats are getting louder.

Stephen Arnold, January 12, 2009

Social Search: Manipulating for Money

January 9, 2009

Mike Elgan wrote “How China’s 50 Cent Army Could Wreck Web 2.0” here. The point of this article is that a person with money can hire Chinese computer users to insert comments into social networks. The infusion of posts would, in effect, distort the much-ballyhooed wisdom of crowds. Mr. Elgan does a good job of explaining how this army works and pointing out the fragility of user-dependent Web 2.0 services. I think he strays from the tethering ring when he asserts that the Chinese “army” can undermine free speech, but otherwise, he’s spot on.

However–and I know you relish my “howevers”–a few of my addled goose observations are now in order.

First, the “social network” revolution is not as zippy as most pundits assert. Mr. Elgan’s write up explains how the person with money can pay to make a specific issue, product, or person percolate upwards. Money can’t buy happiness but it sure can buy visibility in a Web 2.0 service that depends on user inputs.

Second, social networking is more of a marketing story than a technology innovation. Sure, MySpace.com and Facebook.com move well beyond discussion fora and individual Web pages. These sites have knitted together functions and surfed on young-at-heart users who need a way to connect in today’s Jetsons world. As the young-at-heart grow old and infirm, their use of network communication methods will persist, but these methods are extensions of older technologies, not sudden inventions.

Third, the implications of a technology cannot be accurately predicted. As a result, when an issue arises with a technology application or suite of technology applications like social networks, the “fix” will be more technology. My concern with MySpace.com and Facebook.com stems not from what they do but from the new technologies these services will require to handle the problems. For example, what’s the fix for the Chinese “army” issue? Think more stringent controls. The casualty is not free speech. It is freedom.

Stephen Arnold, January 9, 2009

Lawyers and Metadata

January 8, 2009

Now the indexing world gets something to gnaw on. Automated indexing systems beat out humans when measured by cost per item indexed, speed, and consistency. Automated indexing systems can be as good as a human for some types of content. But humans are variably bad at indexing. Software hits a sweet spot and doesn’t get significantly better or worse unless the content throws in a wrench. Now the issue of not providing metadata arises. We can automate the creation of metadata, but it is early days in the world of automatic metadata scrubbing. I quacked happily when I thought, “I wonder who knows where their metadata are?”

Jim Calloway’s “Metadata–What Is It and What Are My Ethical Duties” here breathes new life into human indexing. What I find interesting is that lawyers charge by the hour. Human indexers are paid by piece work schedules or given a flat yearly fee and maybe some benefit crumbs. The economics of human indexing is based on keeping the per record cost as low as possible whilst one maintains the “quality” of the indexing. “Quality” in the commercial database world is often defined as a metric such as “four to six index terms per bibliographic record” or “16 records per hour with required fields completed”. You may have a more academic definition, but my examples come from the soon-to-be-marginalized world of human commercial database production.

The article defines metadata in terms of a legal eagle, of course. But the story gets interesting when Mr. Calloway cites a situation in which metadata became a legal issue. Where there is a legal issue, there is the risk of a fine, jail, or losing pride of place among the brood of legal eagles. Forget the compensation. Ego may be a bigger force in the legal eagle world. Mr. Calloway nicely hooks metadata with risk.

For me, the most important comment in this useful write up was:

In this writer’s view, the key is to avoid sending out documents with metadata that could disclose confidential information. Comparing metadata to a wrongly sent fax or e-mail is questionable and the idea that lawyers will be prohibited from examining metadata while parties, law enforcement officers and private detectives will be free to do so seems artificial at best. The Colorado rule that one must disclose receiving confidential information via metadata before acting on it seems to strike a rational balance. The best rule is for law firms to develop best practices internally to keep metadata from “escaping” in the first place.

I quite like “keep metadata from escaping in the first place”. To close, let me ask several questions:

  • Do you know why metadata are in the documents available for indexing on your Web site?
  • Do you know how value added indexing in a dataspace can expand the access to a document in an often unrelated context?
  • Do you know where metadata are in a document, in a Web page or other container housing the document, or in the dataspace created for the information objects?

If not, you will want to dig up this information yourself. Asking your attorney will result in a very large legal bill. One final question: Do you think Mr. Madoff knows about his metadata?
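For those who want to start digging, here is a minimal sketch of one place to look. It assumes the document is an Office Open XML file (such as a .docx), which is a ZIP archive that conventionally carries author and revision details in a part named `docProps/core.xml`. The function name `read_core_metadata` is made up for this example, and the sketch covers only that one part; comments, tracked changes, and custom properties live elsewhere and are not examined here.

```python
import zipfile
import xml.etree.ElementTree as ET

# Conventional location of the core properties part in an Office Open XML package.
CORE_PROPS = "docProps/core.xml"


def read_core_metadata(path_or_file):
    """Return {property name: value} for the core properties, or {} if absent."""
    with zipfile.ZipFile(path_or_file) as archive:
        if CORE_PROPS not in archive.namelist():
            return {}
        root = ET.fromstring(archive.read(CORE_PROPS))

    found = {}
    for element in root:
        # Tags arrive as "{namespace}name"; keep only the local name.
        name = element.tag.split("}", 1)[-1]
        if element.text and element.text.strip():
            found[name] = element.text.strip()
    return found
```

Calling `read_core_metadata("brief.docx")` on a hypothetical file would typically surface fields such as the creator and the last person to modify the document, which is the sort of detail Mr. Calloway warns about letting escape.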

Stephen Arnold, January 8, 2009

Google Joins EU Privacy Commission Advisory Group

December 30, 2008

Out-Law.com here reported that Google’s privacy law expert has snagged a seat on a committee which will provide input to the European Commission about data protection. You can read the story here. For me the most important comment in the story was:

Google prompted a debate on retention when it announced it would no longer keep logs indefinitely, but would delete them after 18 months. Data protection authorities argued that logs should be kept for no longer than six months. Google eventually conceded that the EU’s Data Retention Directive did not apply to the information, and has said that it will now only keep records for nine months.

The thought that I had was, “Why keep them any longer than necessary?” The GOOG crunches the data in near real time, tokenizes it, and stuffs the outputs of its processes into its nifty data management system. Inside the GOOG, various systems and methods grind away, feeding outputs into other Google operations. The 18 months, the nine months, even the six months of retention are red herrings. The GOOG zooms through data so chopping “months” down may be a negotiating tactic. In my experience with government and quasi government advisory bodies, pertinent facts and solid technical knowledge can be as hard to find as a pig in the hollow who volunteers to become a Kentucky ham.

Stephen Arnold, December 30, 2008
