Search Roll Up with CMS and eDiscovery

March 14, 2009

Two roads once diverged in a yellow wood. Now three roads merge into one muddy path. Why? Read on.

I read Barb Mosher’s “The Converging Paths of Search, eDiscovery and Enterprise CMS” here. My first pass through the article was swift. Then I went back through the write up and thought about the Autonomy approach to growth: acquisitions, most recently in the eDiscovery sector. The article tackles end to end plays practiced by Open Text. The conclusion stressed that convergence is the path forward. On the surface, this view is supported by received wisdom and the actions of some high profile companies.

My view is somewhat different. First, I think search for some companies is indeed a dead end. The search technology is growing long in the tooth, and companies looking for solutions want to try newer approaches. One example is Google’s success with its Google Search Appliance, a system that certain large vendors find easy to criticize. The system may be simplistic, but the GOOG provides a potent way to make the GSA sit up and roll over. Furthermore, with about 25,000 appliances sold, the GOOG is the largest vendor of search solutions in the world. Other systems with newer technology than some big name vendors offer are selling at a steady pace in a lousy economy; for example, Coveo, Exalead, and ISYS Search Software.

Putting search, content management and eDiscovery in one system means a miserable path forward for the organization taking such an approach.

So what do big guys with no organic growth do to grow? Answer: buy promising opportunities. The fuel behind some of the acquisition activity is an inability to grow within a core market in an organic way. A short cut is needed. With some PR spin and a boatload of journalists looking for an angle, the notion of convergence gets a new lease on life.

Enterprise software is a complicated business. No company wants to have one system handle multiple tasks. The complexity of information and the context for certain content functions require some granularity.

Searching for People

March 14, 2009

I ran across a useful summary of sources of information about people. The write up was the work of JR Raphael; the story, “People Search Engines: The Newest Web Privacy Threat,” is here. Mr. Raphael runs through some vertical search systems, providing tips to get useful results. The write up about Spokeo was useful. He mentioned one site with which I was not familiar, Rapleaf. His conclusion reminds the reader to be aware of what information is available. I downloaded and saved the story. Unfortunately, the publisher–an outfit called PCAdvisor–cluttered the pages with pop ups and annoying advertisements that made it a chore to read a useful article. I don’t think PCAdvisor is going to win me as a loyal reader with baloney getting in the way of the sirloin in its write ups. Too bad.

Stephen Arnold, March 14, 2009

Web Search Scoreboard

March 14, 2009

I got a lot of grief at a conference last year when I said, “Google has won the search game.” The conference organizer was annoyed because sponsors don’t want to hear that their money was wasted. Too bad. The stats about market share have understated Google’s dominance of Web search. I have seen data that pegs Google’s share at 80 percent and higher as long as 18 months ago. Believe me. The source of the data was solid and based on counts, not samples. Well, now the samples are reporting that the GOOG’s market share is in the 60 to 70 percent range. Imagine my surprise when I read “Microsoft U.S. Search Share Hits 12-Month Low” here. The angle is not that Google has won. ComputerWorld’s approach was that Microsoft has not just lost share but is falling further behind despite its effort to close the gap. The ComputerWorld story supports my assertion that Google has won. Game over. Search is a digital service that is a natural monopoly. What’s amazing is that Microsoft thinks it can gain traction by buying or integrating Yahoo’s search service. In my opinion, Google will continue to operate like a giant magnet, pulling traffic to itself. A leapfrog play is needed, not a me too play.

Stephen Arnold, March 14, 2009

Google AOL Shock

March 14, 2009

I read an interesting article by Nicholas Carlson called “Googlers Shocked by Armstrong Defection” here. Google is search so when a Googler goes to AOL, it is a blow to Google search. Wrong. The blow hit Google in the ego. Mr. Carlson included some interesting items in his write up. For example, I quite liked this one:

He never oversaw product managers or engineers, for example.

And this one:

Not to say he isn’t a quick decision maker, “good at people,” a good listener and responsible for hiring everyone in Google’s US sales force, but Tim really was just a sales guy at Google.

Sounds like bruised egos talking to me. Why would a high profile leader go to a loser like AOL? Easy. Money and a chance to break free of Google’s somewhat odd culture. Just my opinion, of course.

Stephen Arnold, March 14, 2009

Google and World Domination

March 13, 2009

Ryan Singel’s “Google Voice Speaks of World Domination” here gave me a wake up call. Google’s prowess in telephony has been a topic that I long ago accepted. The company has had telephony and communications on its agenda from 1998. When we ran around the country in 2007 doing briefings about Google’s communications systems and methods, the attendees were eager to deny the Googlers’ cleverness in everything from voice search (a Brin subspecialty) to cute ways to replicate a wireless infrastructure with low cost, low power gizmos, with lots of innovation in between.

To be frank, slapping chat, SMS, and Skype-type comms into a Google “container” or service is not rocket science for Google. Sure, the company has to make sure that dependencies don’t befuddle its system or a line of code ruin a Googler’s lunch hour. The work is not invention; these are slipstreaming type features.

The title of the article–“Google Voice Speaks of World Domination”–was striking. The author Ryan Singel did a good job of explaining Google Voice, the “new” service that has the Twitterworld aflame. For me, the most important comment in the article after the title was:

Google Voice also threatens to disrupt voice-to-text startups like SpinBox, with built-in support for turning your voicemail messages into searchable text. Voice-to-text is one of the cornerstones of Google’s drive into mobile search. Google already uses the same technology to power GOOG-411 and the voice-activated search app for the iPhone. Getting even more samples — from messages left for users — will only help tune the algorithms for more lucrative ventures.

This paragraph makes clear the integration of the Google comms service and its disruptive potential, not just for smaller firms but for the big, telco dinosaurs. I say this with some affection since I was a Bell Labs contractor, worked on the Bellcore billing system for baby Bell charge backs, and also the USWest Yellow Pages service. Google is not a telco. Telco is just an application running on the Google infrastructure, what I call the Googleplex in honor of the buildings off Shoreline Drive.

Should you care? Yes, if you want to reduce your telecom hassles for the short term. Should the telcos care? No. In my opinion, telcos missed the train, and I don’t know when another will drop by Bell Head Station again. Should regulators care? Maybe. But regulators have a tough time understanding cable versus satellite TV so there’s a knowledge gap to fill. Should the blogosphere care? Absolutely. Those who get it will carry Google type services to the future as the “obvious way” to perform certain functions.

Is this world domination? Not by Google in my opinion. The “legacy” of Google is that it shows the way cloud based services will supplant more widespread methods. Google’s legacy is that the company is a trail blazer. Others will follow and then go further. If this sounds like an interesting premise for a book, check out my 2005 The Google Legacy. This is the story I followed between 2002 and 2004 when I did my primary research. Old stuff to the addled goose. Just not world domination. That’s a reach in my view.

Google Invents Dynamic Virtual Input Device Configuration

March 13, 2009

The addled goose loves Google open source information. A case in point is US20090070098, a patent application for “Dynamic Virtual Input Device Configuration”. You don’t care! Well, I do. The conjunction of “dynamic” and “virtual” is significant to the addled goose. Vlad Patryshev and Google legal thought so too. Here’s the Googlespeak modified by legalspeak about what I think is an important disclosure:

In one aspect, a virtual input device can be configured by detecting a language identifier associated with a selected data entry field, determining a key mapping corresponding to the detected language identifier, configuring a virtual input device in accordance with the key mapping, wherein the virtual input device includes one or more controls and the key mapping specifies a character corresponding to at least one of the plurality of controls, and presenting the virtual input device to a user. The language identifier can comprise one of an Extensible Markup Language tag and a Hypertext Markup Language tag. Further, user input selecting a second data entry field can be received, wherein a second language identifier is associated with the second data entry field, a second key mapping corresponding to the second language identifier can be determined, and the virtual input device can be configured in accordance with the second key mapping.

Take your breath away? I thought of the applications of this invention for data input and behind the scenes manipulation of those data–dynamic, virtual device configuration.
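To make the claim language a bit more concrete, here is a minimal sketch of the flow the abstract describes: detect a language identifier on the selected data entry field, look up a key mapping, configure the virtual input device, and reconfigure when a second field is selected. This is not Google's implementation. The names `KEY_MAPPINGS`, `DataEntryField`, `VirtualInputDevice`, and `configure_device`, the three-key layouts, and the fallback to English are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical key mappings: control id -> character, keyed by language tag.
# Real layouts would be far larger; three keys are enough to show the idea.
KEY_MAPPINGS: Dict[str, Dict[str, str]] = {
    "en": {"k1": "q", "k2": "w", "k3": "e"},
    "ru": {"k1": "й", "k2": "ц", "k3": "у"},
}
DEFAULT_LANG = "en"  # assumed fallback; the abstract does not specify one


@dataclass
class DataEntryField:
    name: str
    lang: Optional[str] = None  # e.g. the value of an HTML lang attribute


@dataclass
class VirtualInputDevice:
    lang: str
    controls: Dict[str, str]

    def press(self, control_id: str) -> str:
        """Return the character currently mapped to a control."""
        return self.controls[control_id]


def configure_device(field: DataEntryField) -> VirtualInputDevice:
    """Detect the field's language identifier and apply the matching mapping."""
    lang = field.lang if field.lang in KEY_MAPPINGS else DEFAULT_LANG
    return VirtualInputDevice(lang=lang, controls=dict(KEY_MAPPINGS[lang]))


# Selecting a first field, then a second one with a different identifier.
first = configure_device(DataEntryField("name", lang="en"))
second = configure_device(DataEntryField("imya", lang="ru"))
```

The same physical control (`k1`) yields a different character depending on which field has focus, which is the "dynamic" part of the disclosure as I read it.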

Stephen Arnold, March 13, 2009

Google and Display Advertising

March 13, 2009

A happy quack to the reader in a Near Eastern company who sent me a link to the Globes.co.il article “Israel Is Disproportionately Important to Google”. You can find it here. When I read this write up, I thought, “The author is not familiar with the GOOG’s China operation.” No problem. Individual Googlers are entitled to their data-centrism.

The March 12, 2009, article used an interview the Globes.co.il conducted with David Rosenblatt, originally from DoubleClick and now the Google VP responsible for display advertising. Mr. Rosenblatt’s comments made this write up quite useful. I don’t want to summarize the full story. I do want to highlight three points from the article that struck me as noteworthy. Your mileage may vary so read the original.

First, display advertising is big: “Global revenues from display advertising are still very concentrated: about 50% of advertising expenditure in this market goes to just 3-4 content distributors”. So what? The sector is immature which means Google can move in, bring economies of scale, inject efficiency. I think this is important and probably going to happen. Google’s current line up of competitors does not strike me as capable of mounting meaningful resistance. I can be wrong, but I am betting on Googzilla.

Second, Google’s bidding approach to setting prices is right for the present economic climate. Google’s system and method helps ensure that every ad gets sold. The unsold inventory decreases and everybody wins. Globes.co.il wrote, “The combination of Google and DoubleClick,” Rosenblatt says, “enables us to sell both our products, and we thus enable the whole world to compete for every seat on the flight. The seat goes to the customer who pays the highest price for it, and we succeed in making the maximum profit from each seat, that is, from every customer.”

Third, the recession won’t hurt every company equally. Google will be okay. “The online advertising market simply won’t grow, but it is certain that it will not weaken. As far as the market share the Internet has out of the general advertising cake is concerned, the current recession is actually good for the Internet.”

My take away: what’s good for the Internet is good for Google. Even if the economy is not so good for the Internet, Google will be not just okay, Google will grow its ad revenues. No rebranding. No paying for traffic. No becoming the search engine of NASCAR. Just a way for advertisers to reach potential buyers.

Stephen Arnold, March 13, 2009

Ignoring Twitter, Hazardous to Google Blog Search Traffic

March 13, 2009

Upfront let me say that data about traffic from reputable analytics shops are subject to considerable variance. The data are not “wrong”; the data represent a sample and must be viewed as “close enough for horseshoes”. The March 11, 2009, article “Twitter Search Traffic Poised to Eclipse Google Blog Search” here is interesting and suggestive, not definitive. Twitter, the two year old micro blogging service, is now being recognized as the leader in real time search. (Please, don’t write me to explain that another system is “real time”. Twitter’s real time means that the content exists in a transient form, so a query reflects the informational equivalent of taking a pulse.)

The big takeaway from this Steve Rubel article was:

Consider this nugget. According to compete.com (an account is required to view this subdomain data), traffic to search.twitter.com tripled in the last six months. Meanwhile, Google Blog Search traffic is flat and, only until just recently, the same can be said for Technorati. More importantly, Twitter Search has just about eclipsed Google Blog Search. As of February, Twitter Search attracted 1.35 million users while Google Blog Search, which has been plagued by relevance issues, sits at 1.38 million users.

Even this addled goose has figured out that Twitter is doing something in search that Google, Microsoft, and Yahoo either cannot do because they have arteriosclerosis or will not do because they see Twitter posts as trivial. In my tiny pond filled with mine run off, the Twitter content can yield useful, actionable information. Yesterday I explained its utility to a dozen law enforcement professionals. To my surprise, the listeners understood the value of the system. That was encouraging. Mr. Rubel’s interpretation of Compete data suggests others are on the Twitter wavelength as well.

Stephen Arnold, March 13, 2009

Browser Drag Racing

March 13, 2009

I recall the good old days of running speed tests on the PCs I used to build. Now I just buy Macs. I would load TACH or some other tool and watch the data appear as the system opened a faux Word document, rendered graphic primitives, wrote files to the disk, and then read them. I never figured out what made a particular computer do well on one test and poorly on another. Even when I tested machines with the same motherboard, CPUs, and memory configurations, I would find wide margins of error. On the serious tests I ran when I was trying to figure out Google’s read write speeds in one of the company’s early technical papers, I identified weird differences on my identical IBM NetFinity 5500 quad processor, four gig, EXP 10 drive SCSI III storage devices, and the six Seagate Cheetahs I used as a Level 5 RAID boot device. Drove me crazy.

Now I read Emil Protalinski’s “Microsoft’s Own Speed Tests Show IE Beating Chrome, Firefox” and have a flash back. You can find the useful write up here. He has reported on some interesting tests, including a useful table that shows IE 8 as the speed champ. For me, the most interesting point in his article was:

Microsoft chooses approximately 25 websites for daily testing, and tens of thousands on a monthly basis. If you’re going to do your own tests, Microsoft emphasizes that “any list of websites to be used for benchmarking must contain a variety of websites, including international websites, to help ensure a complete picture of performance as users would experience on the Internet.”

In my opinion, this comment does not go far enough. The tests have to be conducted in a rigorous manner in order to deal with latency. I also identified other variables that can affect speed tests:

  • Is the test machine or test machines running the benchmarks at the same operating temperature?
  • Is each machine running the same set of processes and tasks at the time the tests are conducted?
  • Are the sites being tested static pages or composite applications?
  • Is the test machine or machines operating with flushed caches, defragged drives, etc. when the tests are run?

Small frictional points can add up over time. Some of the variances in the Microsoft table included in Mr. Protalinski’s article are modest in my opinion. Even with baseline systems the variances can be significant. In my opinion, the speed tests are helpful but not definitive.
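One way to put the "helpful but not definitive" point into practice is to repeat each page load several times and ask whether the gap between two browsers is larger than the run to run noise. The sketch below is a crude rule of thumb, not a formal statistical test and not the method Microsoft used. The function names, the threshold `k`, and every timing figure are made up for illustration.

```python
import statistics
from typing import List, Tuple


def summarize(samples_ms: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) for repeated load times."""
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def difference_is_meaningful(a_ms: List[float], b_ms: List[float], k: float = 2.0) -> bool:
    """Crude check: is the gap between the means larger than k times the noise?

    'Noise' here is simply the larger of the two standard deviations.
    """
    mean_a, sd_a = summarize(a_ms)
    mean_b, sd_b = summarize(b_ms)
    return abs(mean_a - mean_b) > k * max(sd_a, sd_b)


# Hypothetical page load times in milliseconds, five runs per browser.
browser_a = [1000, 1100, 900, 1050, 950]
browser_b = [1020, 1120, 920, 1070, 970]
browser_c = [500, 510, 490, 505, 495]

a_vs_b = difference_is_meaningful(browser_a, browser_b)  # 20 ms gap, lost in the noise
a_vs_c = difference_is_meaningful(browser_a, browser_c)  # 500 ms gap, well above it
```

With numbers like these, a 20 millisecond "win" tells you nothing, while a 500 millisecond gap is worth a second look. Temperature, background processes, and cache state all show up as a larger standard deviation.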

The same issues apply to testing search systems. It’s easy to crank out a remarkable indexing benchmark until the real world content flow brings the weaknesses of the systems to center stage. I quit benchmark testing long ago, but I still find the data somewhat interesting.

Stephen Arnold, March 13, 2009

Google: Pope Pumps Google as HR Tool

March 13, 2009

Step away from the hoo hah about Google’s addition of communication functions to its services. Expected and old news. A more interesting twist in terms of search was the CNN story “Pope: We Should Have Googled Holocaust Bishop” here. The angle for the story was, according to CNN:

The Pope has admitted making mistakes over the lifting of the excommunication of a Holocaust-denying bishop, saying the church will make much greater use of the Internet in the future to help avoid such controversies… “I have been told that consulting the information available on the Internet would have made it possible to perceive the problem early on.”

Why’s this important? Information in the Google data management systems makes it possible to perform a crude, but useful, type of reputation analysis. Now envision a world in which voice to text, email, and Web content can be queried. The Pope understands that aggregated data is more useful than what one might hear from a single source or two. Why? Data have value when there are numerous points to analyze.

Will traditional Web search systems deliver what’s needed? No. Google has invested in technology that can add two new types of queries to its arsenal of tricks. I don’t know if Google will make these publicly available, but with Google gathering data from multiple input nodes, it is more important to move beyond the simple keyword and concept search. Not even semantics can do the job. Google has invested in queries that deliver a certainty score (how likely a data point is to be accurate) and lineage (where a data point comes from). Add these two much needed types of queries to Google’s arsenal of information and methods and you have a human resources research tool that leapfrogs other systems’ capabilities.
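For readers who want a picture of what certainty and lineage queries might look like, here is a toy sketch. It is not Google's method, which I have not seen in code. The `DataPoint` record, the two query functions, the scores, and the source labels are all hypothetical; the point is only that each stored claim carries a score and a list of where it came from.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataPoint:
    subject: str
    claim: str
    certainty: float          # hypothetical score between 0 and 1
    lineage: Tuple[str, ...]  # hypothetical labels for where the claim came from


def query_by_certainty(points: List[DataPoint], subject: str,
                       min_certainty: float) -> List[DataPoint]:
    """Return claims about a subject at or above a certainty threshold, best first."""
    hits = [p for p in points if p.subject == subject and p.certainty >= min_certainty]
    return sorted(hits, key=lambda p: p.certainty, reverse=True)


def query_lineage(points: List[DataPoint], subject: str, claim: str) -> List[str]:
    """Return the distinct sources behind a claim, in first-seen order."""
    sources: List[str] = []
    for p in points:
        if p.subject == subject and p.claim == claim:
            for source in p.lineage:
                if source not in sources:
                    sources.append(source)
    return sources


# Made-up records about a made-up job candidate.
POINTS = [
    DataPoint("candidate-42", "gave interview X", 0.9, ("news-site", "video-archive")),
    DataPoint("candidate-42", "gave interview X", 0.6, ("blog", "news-site")),
    DataPoint("candidate-42", "holds degree Y", 0.4, ("forum-post",)),
]

strong_claims = query_by_certainty(POINTS, "candidate-42", 0.5)
interview_sources = query_lineage(POINTS, "candidate-42", "gave interview X")
```

A keyword search would return all three records as equals. The two queries separate the well supported claim from the rumor and show which sources stand behind it.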

The Pope gets it. I wonder how many others can look beyond obvious extensions of Google’s as is technology to the new frontiers. Kudos to the Pope. Not so much praise for the recycling of the Grand Central and other comms functions.

Stephen Arnold, March 12, 2009
