Google: Human Editing or Technical Scrubbing

February 7, 2009

Carlo Longino, writing in TechDirt here, reported a story that caught my attention. I don’t know if the story is on the money, but it raised some interesting questions for me. The article was "Google Accused of Invisibly Deleting Blog Posts on the RIAA’s Say So". The assertion is that some Web log posts about an RIAA action have been deleted. For me the most interesting comment was:

Google says that it notifies bloggers after their posts have been taken down, in accordance with the DMCA. But it should hardly be surprising that many of those affected say they’ve gotten no such notice, nor that the offending material was either legally posted and/or supplied by the labels themselves.

Three questions ran through my mind this morning as I bundled debris from the fallen trees that block my path from nest to goose pond:

  1. What’s so surprising? Google is no longer a wonky start up. The outfit is now disrupting business sectors from telecommunications to online payments, from content distribution to video production. In order to keep potential partners and advertisers happy, why not tweak the corpus? I do it when I delete from the comments to this Web log plugs for financial services.
  2. Could the disappearing content be another of Google’s technical glitches? I can see a script containing errant instructions. The Googleplex crunches forward, hacking down certain references to forbidden Web sites and other subjects on the stop list. In the last couple of weeks, the GOOG has marked every Web site as malware and roached some ad statistics. No big surprise that as Google gets larger, the superior beings of the early Googlers are now diluted with less prescient coders. Ergo, mistakes in run-of-the-mill filtering.
  3. Hasn’t Google been hand fiddling with results for a long time? I recall reading that there was a human touch in some of the Google News displays. Algorithms weren’t, as I thought I heard, sufficiently sensitive to the needs of the dead tree newspaper crowd. Most "search systems" provide tools to permit editing, hit boosting, tuning, and other bits of magic of which the search pundits remain blissfully ignorant.

You may have a different view of this situation, but I think it is par for this particular golf course. What do you think? Maybe I’m jaded, but results shaping is not an oddity; it is part of the search and retrieval game.

Stephen Arnold, February 7, 2009

Search without Search, Sort of, Maybe

February 7, 2009

Ina Fried’s “Microsoft Offers to Just Fix It” here describes an interesting twist on getting online answers without some of the messiness of traditional search and retrieval. The idea is that one of my dwindling number of Windows PCs has a problem. (What? A problem!) The user locates a Microsoft knowledgebase “help” document. The document has a logo featuring what looks like a young version of Mr. Rogers, my plumber here in Harrods Creek, Kentucky. Click the icon and Microsoft automatically (perhaps auto magically) fixes the glitch. I am ready for a fix to Word’s autonumbering features. I am ready for a fix for PowerPoint’s replacing black text with light blue text when copying a file from one presentation to another. I am ready for Excel charts to eliminate surprises when trying (note that I said trying) to customize an Excel chart. I am ready for Outlook not to corrupt its PST files. I am ready for Visio’s export filters to work without creating graphics excitement. And so on. I am ready for a search function that doesn’t motivate me to download an alternative from Exalead, ISYS Search Software, or another vendor including the IBM Yahoo clunker that runs on Lucene. The idea that the user clicks a button and gets a useful “answer” is refreshing. I am eager to try this service. With more than my share of Word gotchas under my mouse cursor, I have to be convinced.

Stephen Arnold, February 7, 2009

Oh, Oh, SEO Debunked

February 7, 2009

If you are skeptical of the carnival barkers who pitch SEO or search engine optimization, you will want to read TechCrunch’s “SEO at the Enterprise Level-A Major Flop” here by Jeff Widman. SEO refers to tricks and methods to cause a particular Web document to appear at the top of a results list. Instead of finding one’s Web page on the first page of a Google or Yahoo result list, one finds one’s Web page deep in the results list. With Google’s choke hold on Web search, SEO often becomes a digital duel between the SEO crowd and Google’s Knights of the Golden Keyboard. Google tries to identify “relevant” Web pages via numerical recipes. SEO heathens try to figure out what the GOOG is doing and find a spoof. Mr. Widman’s write up covers this topic in a more thorough manner, so you may want to take heed.

In my opinion, the SEO tweaks identified by trial and error are becoming increasingly onerous. SEO, like autumn, may be “the year’s last, loveliest smile.” The GOOG wants not to be tricked. It’s not nice to fool Mother Google. The TechCrunch write up explained in its interview with an SEO boffin:

Enterprise and SEO is like cognitive dissonance–SEO is nimble, experimental, dynamic, continuously iterating, never-ending process. A complete anathema to enterprise IT which is project focused, do it and forget it. There’s also an internal disconnect because SEO crosses IT and marketing. Example: changing from horrible URL’s–super long, no keywords in the URL–to cleaner, shorter URLs is a marketing driven initiative but entirely reliant on IT execution. Part of the problem lies in that the Fortune 500 enterprises rely on their ad agencies for the “interactive” stuff but the agencies don’t know how to integrate SEO requirements with branding. Lastly, Web sites are seldom built with SEO in mind; developers/programmers didn’t know what they didn’t know. It’s much like a house where the electrical wasn’t thought about until years later–a major multi-year project to redo it.

That’s clear to me. Before you spend up to $10,000 per month or more, consider the effort required to trick Googzilla. In my opinion, original content may be a less onerous burden.

Stephen Arnold, February 7, 2009

blinkx Has a New Home Page — Stop the Presses

February 6, 2009

Blinkx, http://www.blinkx.com, self-touted as the world’s largest video search engine, released a news alert that it has “re-launched” its home page with a couple “new” options: an inform me button that sends you news (this concept isn’t new) and an entertain me button that sends entertaining videos (that’s not a new idea either). They’re also promoting searching for videos visually (what a concept) and using speech tags in video (it’s been done). If companies like blinkx want to be in the spotlight, somewhat more steroid charged news might calm the Beyond Search goslings.

Jessica W. Bratcher, February 6, 2009

Googzilla versus the Bambies

February 6, 2009

Charles Cooper wrote “Google Killers? I Don’t Think So” here. I must admit that I would never have thought to identify Google’s Web search competitors as also rans. But after reading his Coop’s Corner post on February 5, 2009, I concede that he has a point. The hook for the article is a study by an advertising oriented trade magazine. And if anyone knows about Web search, it will definitely be an advertising oriented trade magazine. The sharp pencil crowd at AdWeek, according to Mr. Cooper, published “There’s Still Room for Google Killers”, a study. For me the most interesting comment in Mr. Cooper’s article was:

Consumers are still not loyal to a single engine. But Google still enjoys the most exclusivity–20 percent of all searchers use only Google on a weekly basis.

If a company enjoys a market share north of 60 percent, I would assert that some people are happy with the GOOG. When the competitors are not making much headway, I would suggest that those competitors are not pushing the right buttons. Mr. Cooper, like me, urges his readers to draw their own conclusions.

I think that the ad crowd is waking up too late to the reality of Googzilla. The good news is that most companies still want a PR or ad relationship at least some of the time. The bad news is that Google can disintermediate this service sector with a flick of its Googzilla tail. My suggestion to ad agencies: check out Google’s partner programs today.

Stephen Arnold, February 6, 2009

Great Bit Faultline: IT and Legal Eagles

February 6, 2009

The legal conference LegalTech generates quite a bit of information and disinformation about search, content processing, and text mining. Vendors with attorneys on the marketing and sales staff are often more cautious in their wording even though these professionals are not the school president type personalities some vendors prefer. Other vendors are “all sales all the time” and this crowd surfs the trend waves.

You will have to decide whose news release to believe. I read an interesting story in Centre Daily Times here called “Continuing Disconnect between IT and Legal Greatly Hindering eDiscovery Efforts, Recommind Survey Finds”. The article makes a point for which I have only anecdotal information; namely, information technology wizards know little about the eDiscovery game. IT wonks want to keep systems running, restore files, and prevent users from mucking up the enterprise systems. eDiscovery on the other hand wants to pore over data, suck it into a system that prevents spoliation (a fancy word for deleting or changing documents), and create a purpose built system that attorneys can use to fight for truth, justice, and the American way.

Now, Recommind, one of the many firms claiming leadership in the eDiscovery space, reports the results of a survey. (Without access to the sample selection method and details of the analytic tools, the questionnaire itself, and the folks who did the analysis, I’m flying blind.) The article asserts:

Recommind’s survey demonstrates that there is significant work remaining to achieve this goal: only 37% of respondents reported that legal and IT are working more closely together than a year before. This issue is compounded by the fact that only 21% of IT respondents felt that eDiscovery was a “very high” priority, in stark contrast with the overwhelming importance attached to eDiscovery by corporate legal departments. Furthermore, there remains a significant disconnect between corporate accountability and project responsibility, with legal “owning” accountability for eDiscovery (73% of respondents), records management (47%) and data retention (50%), in spite of the fact that the IT department actually makes the technology buying decisions for projects supporting these areas 72% of the time. Exacerbating these problems is an alarming shortage of technical specifications for eDiscovery-related projects. Only 29% of respondents felt that IT truly understood the technical requirements of eDiscovery. The legal department fared even worse, with only 12% of respondents indicating that legal understood the requirements. Not surprisingly, this disconnect is leading to a lack of confidence in eDiscovery project implementation, with only 27% of respondents saying IT is very helpful during eDiscovery projects, and even fewer (16%) believing legal is.

My reaction to these alleged findings was, “Well, makes sense.” You will need to decide for yourself. My hunch is that IT and legal departments are a little like the Hatfields and the McCoys. No one knows what the problem is, but there is a problem.

What I find interesting is that enterprise search and content processing systems are generally inappropriate for the rigors of eDiscovery and other types of legal work. What’s amusing is a search vendor trying to sell to a lawyer who has just been surprised in a legal action. The lawyer has some specific needs, and most enterprise search systems don’t meet these. Equally entertaining is a purpose built legal system being repackaged as a general purpose enterprise search system. That’s a hoot as well.

As the economy continues its drift into the financial Bermuda Triangle, I think everyone involved in legal matters will become more, not less, testy. Stratify, for example, began life as Purple Yogi and an intelligence-centric tool. Now Stratify is a more narrowly defined system with a clutch of legal functions. Does an IT department understand a Stratify? Nope. Does an IT department understand a general purpose search system like Lucene? Nope. Generalists have a tough time understanding the specific methods of experts who require a point solution.

In short, I think the numbers in the Recommind study may be subject to questions, but the overall findings seem to be generally on target.

Stephen Arnold, February 6, 2009

Google Squeezes into Mobile Books

February 6, 2009

Before noon, the ebook publishers were looking forward to the weekend. Sure, Friday was an office day, but in the new America, not too many people grind out an 18 hour day on Friday. Well, maybe some blue chip consultant fodder and Type A attorneys with a client who has deep pockets. But for the ebook crowd, Thursday is a run up to the TGIF cheer.

But at 11:56 Eastern time on February 5, 2009, ebook boffins got a surprise. The Google delivered 1.5 million books “in your pocket”. You can read Viresh Ratnakar’s and his colleagues’ chatty little blog post here. I am not going to trouble you with the implications of this announcement.

You can surf the waves of Web log posts, pundit analyses and boffin bombast elsewhere. Just point your mobile browser to http://books.google.com/m. I wonder if the “m” stands for mayhem. Any thoughts? Oh, if I have any ebook executives among my three or four readers, sorry about your run up to the weekend. Bummer.

Stephen Arnold, February 6, 2009

SurfRay Round Up: Herd Them Doggies, Pardner

February 5, 2009

I am not contributing any information that I have personally verified. What I want to do in this article is quote from the comments that have flowed into the mine run off pond here in Harrods Creek in the last week or so. If you have a beef with one of the quotes, please navigate to the comments section of the Web log and object, or you can use the Blossom search system to locate the person who posted the comment, and you can trade barbs there.

Background

More than five years ago, I did a job for Mondosoft. Even longer ago, I did a job for Speed of Mind. Both of these companies were rolled up into an outfit called SurfRay. SurfRay also got the Ontolica SharePoint fixer upper which came with the Mondosoft purchase. I sort of paid attention to SurfRay, but I was out of the loop when the financial and management pressures made the roll up possible.

What’s Happened?

To make a long and convoluted story short, SurfRay couldn’t generate enough cash to grow. When financial pressures mount, folks get angry. SurfRay followed this well known trajectory, which appeared to have ended with the most recent set of company filings in Sweden. At this point, I am going to excerpt the SurfRay information from the comments to my various SurfRay articles. These were to date:

  • August 29, 2008: SurfRay: Has the Company Missed the Search Wave. Nope : Beyond Search here
  • October 24, 2008: SurfRay Round Up : Beyond Search here
  • December 4, 2008: SurfRay Update : Beyond Search here
  • November 23, 2008: SurfRay Update: Beyond Search here
  • January 17, 2009: Financial Woes Swamp SurfRay here
  • July 6, 2008: SurfRay AB Update here
  • December 3, 2008: SurfRay Rumblings and Questions here
  • December 9, 2008: Danish Software Excitement here
  • January 27, 2009: SurfRay More Change here

Selected Comments Posted to the Beyond Search Articles

Below I am quoting from some of the submitted comments. You will need to verify the information and make your own decision. I am presenting what I received from readers who posted via the comments function on this Web log. If in doubt about how this Web log works, read the About section and its disclaimer and editorial policy. If uncomfortable with this goose pond, flap away now.

My story SurfRay: Has the Company Missed the Search Wave. Nope, August 29, 2008:

Bill Cobbs, then SurfRay CEO, wrote on August 29, 2008

I can assure both our clients and our partners that SurfRay is alive and well. It’s unfortunate that a minor phone glitch would lead to speculation regarding the viability of the company.

SurfRay is continuing our mission of providing cutting edge search technology to help our clients drive business results. Moving forward we are focusing on working with our clients to specifically identify business opportunities where they can create competitive advantage. We intend to provide search based solutions that have a very focused impact on driving bottom line results in both revenue generation and cost containment.

SurfRay is alive and well and launching the next wave of Search technology.

My story SurfRay Round Up, October 24, 2008:

Lars Petersen wrote on October 30, 2008:

Related to the speed index I fully understand that this is an indexing engine, but the question was WHY haven’t SurfRay used this overfull hyperoptimising index instead of just relying on the slow index engine in SharePoint. And by not integrating and improve a little on the SharePoint search why do any organization need to buy it and why buy it from SurfRay (small company who may or may not provide service) when it can be bought from a must bigger company like BA-insight???

The attached roadmap for 2009 also gives me a bad feeling as I can see that Reporting now is postponed another 6-9 month!!! And at the same time SurfRay state that Ontolica has first priority on R&D resources??!! Can someone explain this for me….

Torben explained that second priority is Mondosoft Site Search, which I personally liked and it was also what my company used until bad service from the new owner got us to change search engine, but again later he state no plans yet to upgrade or replace the Enterprise Search offering. Could anyone at SurfRay please tell me again WHY we should go back to your Site Search when we have Omniture Site Search from a NASDAQ company providing more key functionality – Better relevancy etc, than Mondosoft’s 2 year old site search and the cost is a 1/10 of what SurfRay offers?

I have been a loyal fan of Mondosoft’s search and the support and maintenance back in 2006-7 unfortunately nothing on this site has proven to me that SurfRay is on the right track and it doesn’t make me “want to come home”.

Anyway Bill stated “Let me say unequivocally that I am now the CEO of SurfRay”, and you properly are Bill, but on surfray’s Web site under executive team, Martin Veise is mentioned first in a very long line of Member of the GROUP MANAGEMENT?! Is it normal that Chairman of the board is heading Group Management or ?

Google’s Medical Probe

February 5, 2009

Yikes, a medical probe. Quite an image for me. In New York City at one of Alan Brody’s events in early 2007, I described Google’s “I’m feeling doubly lucky” invention. The idea was search without search. One example I used to illustrate search without search was a mobile device that could monitor a user’s health. The “doubly lucky” metaphor appears in a Google open source document and suggests that a mobile device can react to information about a user. In one use case, I suggested, Google could identify a person with a heart problem and summon assistance. No search required. The New York crowd sat silent. One person from a medical company asked, “How can a Web search and advertising company play a role in health care?” I just said, “You might want to keep your radar active.” In short, my talk was a bust. No one had a clue that Google could do mobile, let alone mobile medical devices. Those folks probably don’t remember my talk. I live in rural Kentucky and clearly am a bumpkin. But I think when some of the health care crowd read “Letting Google Take Your Pulse” in the oh-so-sophisticated Forbes Magazine, on February 5, 2009, those folks will have a new pal at trade shows. Googzilla is in the remote medical device monitoring arena. You can read the story here–just a couple of years after Google disclosed the technology in a patent application. No sense in rushing toward understanding the GOOG when you are a New Yorker, is there? For me, the most interesting comment in the Forbes write up was:

For IBM, the new Google Health functions are also a dress rehearsal for “smart” health care nationwide. The computing giant has been coaxing the health care industry for years to create a digitized and centrally stored database of patients’ records. That idea may finally be coming to fruition, as President Obama’s infrastructure stimulus package works its way through Congress, with $20 billion of the $819 billion fiscal injection aimed at building a new digitized health record system.

Well, better to understand too late than never. Next week I will release a service to complement Oversight to allow the suave Manhattanites an easy way to monitor Google’s patent documents. The wrong information at the wrong time can be hazardous to a health care portfolio in my opinion.

Stephen Arnold, February 5, 2009

User Tracking Yahoo Style

February 5, 2009

If the news item in Web Pro News is spot on, Yahoo is taking on an interesting challenge. “Yahoo to Start Keeping Tabs on Your Searches” by Chris Crumb documents Yahoo’s me-too of some discontinued Google features. Mr. Crumb said:

Search Pad for the Yahoo search engine. Essentially, it keeps track of your searches, figures out when you are researching things, and stores results of interest in a virtual notepad you can use for reference.

The write up provides links to additional information. The usage tracking implications are fascinating. The core of the write up is an interview with Tom Chi, Senior Director of Product Management with Yahoo Search. One of the most interesting comments was:

“This [service] follows the same data retention policy we have across Yahoo!,” explains Chi. “We recently announced a new policy. Under the new policy, Yahoo! will anonymize user log data within 90 days with limited exceptions for fraud, security and legal obligations. Yahoo! will also expand the policy to apply not only to search log data but also page views, page clicks, ad views and ad clicks.”

Usage tracking yields high value data. How will the user, law enforcement, and marketing communities respond? It’s too soon to tell.

Stephen Arnold, February 5, 2009
