Google Exploring Visualization

February 6, 2009

In 2005, I saw some graphics of Google visualizations. I was not impressed. By chance, I looked at the Web site of an outfit called Gridplane. You probably know this company well. I did not. I clicked around and learned that Googzilla had hired Instrument to create some visualizations. You can find some information here. These visualizations triggered “yos” and “oys” from my tech team. This addled goose found the visualizations tough to interpret. You need to check out the examples and draw your own conclusion. My interest was in the company. The GOOG typically restricts contractors from telling their mother that Google is paying them. Gridplane makes this Google connection clear on its Web site. So, now we know that Google needs graphic inputs. Not revolutionary but interesting. Visualizations ring the chimes of enterprise and organizational customers. Hmmm?

Stephen Arnold, February 6, 2009

Great Bit Faultline: IT and Legal Eagles

February 6, 2009

The legal conference LegalTech generates quite a bit of information and disinformation about search, content processing, and text mining. Vendors with attorneys on the marketing and sales staff are often more cautious in their wording even though these professionals are not the school president type personalities some vendors prefer. Other vendors are “all sales all the time” and this crowd surfs the trend waves.

You will have to decide whose news release to believe. I read an interesting story in Centre Daily Times here called “Continuing Disconnect between IT and Legal Greatly Hindering eDiscovery Efforts, Recommind Survey Finds”. The article makes a point for which I have only anecdotal information; namely, information technology wizards know little about the eDiscovery game. IT wonks want to keep systems running, restore files, and prevent users from mucking up the enterprise systems. eDiscovery, on the other hand, wants to pore through data, suck it into a system that prevents spoliation (a fancy word for deleting or changing documents), and create a purpose built system that attorneys can use to fight for truth, justice, and the American way.

Now, Recommind, one of the many firms claiming leadership in the eDiscovery space, reports the results of a survey. (Without access to the sample selection method, details of the analytic tools, the questionnaire itself, and the folks who did the analysis, I’m flying blind.) The article asserts:

Recommind’s survey demonstrates that there is significant work remaining to achieve this goal: only 37% of respondents reported that legal and IT are working more closely together than a year before. This issue is compounded by the fact that only 21% of IT respondents felt that eDiscovery was a “very high” priority, in stark contrast with the overwhelming importance attached to eDiscovery by corporate legal departments. Furthermore, there remains a significant disconnect between corporate accountability and project responsibility, with legal “owning” accountability for eDiscovery (73% of respondents), records management (47%) and data retention (50%), in spite of the fact that the IT department actually makes the technology buying decisions for projects supporting these areas 72% of the time. Exacerbating these problems is an alarming shortage of technical specifications for eDiscovery-related projects. Only 29% of respondents felt that IT truly understood the technical requirements of eDiscovery. The legal department fared even worse, with only 12% of respondents indicating that legal understood the requirements. Not surprisingly, this disconnect is leading to a lack of confidence in eDiscovery project implementation, with only 27% of respondents saying IT is very helpful during eDiscovery projects, and even fewer (16%) believing legal is.

My reaction to these alleged findings was, “Well, makes sense.” You will need to decide for yourself. My hunch is that IT and legal departments are a little like the Hatfields and the McCoys. No one knows what the problem is, but there is a problem.

What I find interesting is that enterprise search and content processing systems are generally inappropriate for the rigors of eDiscovery and other types of legal work. What’s amusing is a search vendor trying to sell to a lawyer who has just been surprised in a legal action. The lawyer has some specific needs, and most enterprise search systems don’t meet these. Equally entertaining is a purpose built legal system being repackaged as a general purpose enterprise search system. That’s a hoot as well.

As the economy continues its drift into the financial Bermuda Triangle, I think everyone involved in legal matters will become more, not less, testy. Stratify, for example, began life as Purple Yogi and an intelligence-centric tool. Now Stratify is a more narrowly defined system with a clutch of legal functions. Does an IT department understand a Stratify? Nope. Does an IT department understand a general purpose search system like Lucene? Nope. Generalists have a tough time understanding the specific methods of experts who require a point solution.

In short, I think the numbers in the Recommind study may be subject to questions, but the overall findings seem to be generally on target.

Stephen Arnold, February 6, 2009

Google Squeezes into Mobile Books

February 6, 2009

Before noon, the ebook publishers were looking forward to the weekend. Sure, Friday was an office day, but in the new America, not too many people grind out an 18 hour day on Friday. Well, maybe some blue chip consultant fodder and Type A attorneys with a client who has deep pockets. But for the ebook crowd, Thursday is a run up to the TGIF cheer.

But at 11:56 Eastern time on February 5, 2009, ebook boffins got a surprise. The Google delivered 1.5 million books “in your pocket”. You can read Viresh Ratnakar’s and his colleagues’ chatty little blog post here. I am not going to trouble you with the implications of this announcement.

You can surf the waves of Web log posts, pundit analyses and boffin bombast elsewhere. Just point your mobile browser to http://books.google.com/m. I wonder if the “m” stands for mayhem. Any thoughts? Oh, if I have any ebook executives among my three or four readers, sorry about your run up to the weekend. Bummer.

Stephen Arnold, February 6, 2009

SurfRay Round Up: Herd Them Doggies, Pardner

February 5, 2009

I am not contributing any information that I have personally verified. What I want to do in this article is quote from the comments that have flowed into the mine run off pond here in Harrod’s Creek in the last week or so. If you have a beef with one of the quotes, please, navigate to the comments section of the Web log and object or you can use the Blossom search system to locate the person who posted the comment, and you can trade barbs there.

Background

More than five years ago, I did a job for Mondosoft. Even longer ago, I did a job for Speed of Mind. Both of these companies were rolled up into an outfit called SurfRay. SurfRay also got the Ontolica SharePoint fixer upper which came with the Mondosoft purchase. I sort of paid attention to SurfRay, but I was out of the loop when the financial and management pressures made the roll up possible.


What’s Happened?

To make a long and convoluted story short, SurfRay couldn’t generate enough cash to grow. When financial pressures mount, folks get angry. SurfRay followed this well known trajectory, which appeared to have ended with the most recent set of company filings in Sweden. At this point, I am going to excerpt the SurfRay information from the comments to my various SurfRay articles. These were to date:

  • August 29, 2008: SurfRay: Has the Company Missed the Search Wave. Nope : Beyond Search here
  • October 24, 2008: SurfRay Round Up : Beyond Search here
  • December 4, 2008: SurfRay Update : Beyond Search here
  • November 23, 2008: SurfRay Update: Beyond Search here
  • January 17, 2009: Financial Woes Swamp SurfRay here
  • July 6, 2008: SurfRay AB Update here
  • December 3, 2008: SurfRay Rumblings and Questions here
  • December 9, 2008: Danish Software Excitement here
  • January 27, 2009: SurfRay More Change here

Selected Comments Posted to the Beyond Search Articles

Below I am quoting from some of the submitted comments. You will need to verify the information and make your own decision. I am presenting what I received from readers who posted via the comments function on this Web log. If in doubt about how this Web log works, read the About section and its disclaimer and editorial policy. If uncomfortable with this goose pond, flap away now.

My story SurfRay: Has the Company Missed the Search Wave. Nope, August 29, 2008:

Bill Cobbs, then SurfRay CEO, wrote on August 29, 2008:

I can assure both our clients and our partners that SurfRay is alive and well. It’s unfortunate that a minor phone glitch would lead to speculation regarding the viability of the company.

SurfRay is continuing our mission of providing cutting edge search technology to help our clients drive business results. Moving forward we are focusing on working with our clients to specifically identify business opportunities where they can create competitive advantage. We intend to provide search based solutions that have a very focused impact on driving bottom line results in both revenue generation and cost containment.

SurfRay is alive and well and launching the next wave of Search technology.

My story SurfRay Round Up on October 24, 2008:

Lars Petersen wrote on October 30, 2008:

Related to the speed index I fully understand that this is an indexing engine, but the question was WHY haven’t SurfRay used this overfull hyperoptimising index instead of just relying on the slow index engine in SharePoint. And by not integrating and improve a little on the SharePoint search why do any organization need to buy it and why buy it from SurfRay (small company who may or may not provide service) when it can be bought from a must bigger company like BA-insight???

The attached roadmap for 2009 also gives me a bad feeling as I can see that Reporting now is postponed another 6-9 month!!! And at the same time SurfRay state that Ontolica has first priority on R&D resources??!! Can someone explain this for me….

Torben explained that second priority is Mondosoft Site Search, which I personally liked and it was also what my company used until bad service from the new owner got us to change search engine, but again later he state no plans yet to upgrade or replace the Enterprise Search offering. Could anyone at SurfRay please tell me again WHY we should go back to your Site Search when we have Omniture Site Search from a NASDAQ company providing more key functionality – Better relevancy etc, than Mondosoft’s 2 year old site search and the cost is a 1/10 of what SurfRay offers?

I have been a loyal fan of Mondosoft’s search and the support and maintenance back in 2006-7 unfortunately nothing on this site has proven to me that SurfRay is on the right track and it doesn’t make me “want to come home”.

Anyway Bill stated “Let me say unequivocally that I am now the CEO of SurfRay”, and you properly are Bill, but on surfray’s Web site under executive team, Martin Veise is mentioned first in a very long line of Member of the GROUP MANAGEMENT?! Is it normal that Chairman of the board is heading Group Management or ?


Google’s Medical Probe

February 5, 2009

Yikes, a medical probe. Quite an image for me. In New York City at one of Alan Brody’s events in early 2007, I described Google’s “I’m feeling doubly lucky” invention. The idea was search without search. One example I used to illustrate search without search was a mobile device that could monitor a user’s health. The “doubly lucky” metaphor appears in a Google open source document and suggests that a mobile device can react to information about a user. In one use case, I suggested, Google could identify a person with a heart problem and summon assistance. No search required. The New York crowd sat silent. One person from a medical company asked, “How can a Web search and advertising company play a role in health care?” I just said, “You might want to keep your radar active.” In short, my talk was a bust. No one had a clue that Google could do mobile, let alone mobile medical devices. Those folks probably don’t remember my talk. I live in rural Kentucky and clearly am a bumpkin. But I think when some of the health care crowd read “Letting Google Take Your Pulse” in the oh-so-sophisticated Forbes Magazine, on February 5, 2009, those folks will have a new pal at trade shows. Googzilla is in the remote medical device monitoring arena. You can read the story here–just a couple of years after Google disclosed the technology in a patent application. No sense in rushing toward understanding the GOOG when you are a New Yorker, is there? For me, the most interesting comment in the Forbes write up was:

For IBM, the new Google Health functions are also a dress rehearsal for “smart” health care nationwide. The computing giant has been coaxing the health care industry for years to create a digitized and centrally stored database of patients’ records. That idea may finally be coming to fruition, as President Obama’s infrastructure stimulus package works its way through Congress, with $20 billion of the $819 billion fiscal injection aimed at building a new digitized health record system.

Well, better to understand too late than never. Next week I will release a service to complement Oversight to allow the suave Manhattanites an easy way to monitor Google’s patent documents. The wrong information at the wrong time can be hazardous to a health care portfolio in my opinion.

Stephen Arnold, February 5, 2009

User Tracking Yahoo Style

February 5, 2009

If the news item in Web Pro News is spot on, Yahoo is taking on an interesting challenge. “Yahoo to Start Keeping Tabs on Your Searches” by Chris Crumb documents Yahoo’s me-too of some discontinued Google features. Mr. Crumb said:

Search Pad for the Yahoo search engine. Essentially, it keeps track of your searches, figures out when you are researching things, and stores results of interest in a virtual notepad you can use for reference.

The write up provides links to additional information. The usage tracking implications are fascinating. The core of the write up is an interview with Tom Chi, Senior Director of Product Management with Yahoo Search. One of the most interesting comments was:

“This [service] follows the same data retention policy we have across Yahoo!,” explains Chi. “We recently announced a new policy. Under the new policy, Yahoo! will anonymize user log data within 90 days with limited exceptions for fraud, security and legal obligations. Yahoo! will also expand the policy to apply not only to search log data but also page views, page clicks, ad views and ad clicks.”

Usage tracking yields high value data. How will the user, law enforcement, and marketing communities respond? It’s too soon to tell.

Stephen Arnold, February 5, 2009

Speed Thrills and Slow Speed Chills

February 5, 2009

A happy quack to the reader who sent me a link to Geeking with Greg’s November 9, 2006, item about speed and user behavior. You can review the article here. For me, the most interesting point in this three year young write up was this comment:

Half a second delay caused a 20% drop in traffic. Half a second delay killed user satisfaction.

I spent a day looking at various online search systems. My reaction was that some of the sites took far too long to render results. I was using a high speed Internet connection in a big company that does advanced technology for a living. The network latency could have been a factor, but I was running test queries throughout the day. Some sites were slow.

Speed is a big deal for me. If Ms. Mayer’s data are correct, slow systems drive users away. Here are some general thoughts about my experience over the last eight hours of non stop testing.

  1. Yahoo mail and search from “ying” was consistently slow. I wonder if firing the PR person will address the severe latency within Yahoo’s subsystems.
  2. Microsoft Live.com search was faster on popular search topics like Spears, American Idol, etc. Google was milliseconds slower on these queries. I think Microsoft is caching lots of popular searches and probably paying dearly for this approach.
  3. Google was consistently quick. Even when running infrequent queries on topics like emergency core cooling system and Boolean queries on Books, Google was snappy–consistently.
  4. Metasearch systems, particularly Ixquick.com, were snappy.

There were many piggies. Government sites were particularly annoying.
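For readers who want to repeat this sort of test, here is a minimal sketch of how the timing could be done. It is not the method I used; the function names, the five-run default, and the half second threshold (borrowed from the Geeking with Greg quote above) are all assumptions for illustration. The `run_query` argument stands in for whatever call actually sends a query to a search system.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Return the wall-clock seconds taken by each of `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: replace with a real request to a search site
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings: List[float]) -> Dict[str, float]:
    """Report the median (robust to one network hiccup) and the worst case."""
    return {"median": statistics.median(timings), "worst": max(timings)}


def slow_sites(results: Dict[str, List[float]], threshold: float = 0.5) -> List[str]:
    """Return the names whose median response time exceeds `threshold` seconds."""
    return sorted(
        name
        for name, timings in results.items()
        if statistics.median(timings) > threshold
    )


if __name__ == "__main__":
    # Stand-in query that just sleeps; no network traffic is generated here.
    def fake_query() -> None:
        time.sleep(0.01)

    print(summarize(time_query(fake_query, repeats=3)))
```

Repeating each query several times and looking at the median is one way to separate a slow site from a momentary network blip, which was the concern I raised about latency above.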

The conclusion I drew is that a site unfortunate enough to respond in a sluggish manner from the point of view of the user will lose users. Next week I will highlight a high performance search system. I plan to allow vendors to index a specific corpus so visitors to this Web log can run head to head search system comparisons. You can judge for yourself how speed affects your perception of a search system. If you have a search system and want to participate in this head to head project, watch for details next week.

Stephen Arnold, February 5, 2009

Microsoft Live Search Enhancements

February 5, 2009

Search Engine Journal has a useful summary of new features in its “Microsoft Adds Instant Answers to IE8 Live Search Box” here. I like these types of Web log write ups. Saves me time. As I scanned the story, three thoughts crossed my mind:

  1. Most of these features emulate operations supported by Google
  2. With traffic an issue for Microsoft’s Web search, features won’t by themselves have sufficient magnetic power to significantly influence usage
  3. Question answering services are experiencing a revival. When humans are involved, costs cannot be easily controlled. Automated question answering systems become more important unless the provider has deep pockets

One final question, “How does Fast ESP benefit or make use of these features?” Different search code bases add cost. Check out Live search here.

Stephen Arnold, February 5, 2009

Certified Search: Who Was First

February 5, 2009

I chuckled when I read “Autonomy Introduces Industry First Search Process Validation Module to Ensure Defensible Search” here. The story asserts:

Autonomy Corporation plc (LSE: AU. or AU.L), a global leader in infrastructure software for the enterprise, today unveiled the industry’s most advanced, forensically sound search module, Search Process Validation (SPV).

You can get more information about Autonomy here. The reason for my giggle? There are some folks who have been “certifying” search results for a while. Check out Iron Mountain’s Stratify and Clearwell Systems.

Stephen Arnold, February 5, 2009

Mysteries of Online 4: The Bits Are Bits Fallacy

February 5, 2009

In a meeting last week, a young wizard said, “Bits are bits.” The context for this statement was a meeting to move an organization’s databased information and unstructured text online. The idea was that the task was trivial.

In fact, the task was a mixture of trivial and non-trivial sub tasks. So, bits are not the same because a zero and one may not behave like grains of salt. The ones and zeros may look the same, but one of the mysteries of online is that many factors bedevil the would be online entrepreneur. Google, for example, wants out of its AOL deal. Obviously the bit wizards at Google know that AOL bits are not Google bits here. But the GOOG dumped some serious coinage into the online company direct mail spam made famous.

 


Bits are bits just like penguins.

Here’s my list of factors, which is not complete and represents my thoughts to myself:

  1. Digital objects have stages. The source may be transformed, indexed, tokenized, and manipulated by two or more sub systems. Get these processes wrong, and weird behaviors become apparent. What’s wrong? Who knows. A person or persons have to figure it out, find a fix, and implement it. As this process goes forward, it becomes apparent that the bits are a tad mischievous.
  2. A fancy search system cannot locate a document or other object. Indexing systems may skip malformed documents, indexes may not update, and other issues annoy users. What went wrong? Who knows. A person or persons have to figure it out, find a fix, and implement it.
  3. A document returns a 404 or file not found error. The document used to exist because it is in the index. Now the document has gone walkabout. What’s wrong? Who knows. A person or persons have to figure it out, find a fix, and implement it.
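Items 2 and 3 can at least be detected mechanically, even if the fix still needs a person. Below is a minimal sketch, assuming the index can be exported as a plain list of file paths; that export, and both function names, are hypothetical and not tied to any particular search system.

```python
from pathlib import Path
from typing import Iterable, List


def find_stale_entries(indexed_paths: Iterable[str]) -> List[str]:
    """Return indexed paths whose file is gone (the 404 case in item 3)."""
    return [p for p in indexed_paths if not Path(p).is_file()]


def find_unindexed(root: str, indexed_paths: Iterable[str]) -> List[str]:
    """Return files under `root` that never made it into the index (item 2)."""
    known = {str(Path(p).resolve()) for p in indexed_paths}
    return sorted(
        str(f.resolve())
        for f in Path(root).rglob("*")
        if f.is_file() and str(f.resolve()) not in known
    )
```

A check like this only says that the index and the file system disagree. It does not say why, which is the part that eats the time.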

Causes

I wish I had a fool proof way to prevent errors caused by this “bits are bits” fallacy. Much of the frustration generated by search, content management, and business intelligence systems has its roots wrapped tightly around the facile assumption that electronic information is no big deal. Electronic information is a big deal and for many organizations electronic information may be their undoing. The reason? Many assume that once a file is in electronic form, the rest is easy.

