Google Twitter: Miscommunication

March 5, 2009

Henry Blodget’s “Google’s Schmidt: I Didn’t Diss Twitter” made me laugh. When I saw the lightning strikes in the blogosphere about an alleged remark by Google’s top wizard, I wondered if the reporters had heard correctly. I don’t do hard news. I point to stories I find interesting. On March 4, 2009, Mr. Blodget wrote a story that allegedly set the record straight. You can read it here.


Which interstellar object is growing? Which is dying? Which is the winner? Which will become a charcoal briquette in a manner of speaking?

Please, navigate to Silicon Valley Insider because the good stuff is in capital letters with some words tinted red in anger. For me, the most interesting comment was:

In context if you read what I said, I was talking about the fact that communication systems are not going to be separate. They’re all going to become intermixed in various ways.

Several comments:

  1. The quote sounds like something I heard George Gilder say years ago. (For the record, the fellow who paid Mr. Gilder and me for advice sided with me about convergence. I preferred the term “blended”, and I still do.) Think of a digital Jamba blender.
  2. Google’s top Googler comes across as more politically sensitive. In Washington, DC, saying nothing whilst seeming to say something coherent is an art form. Mr. Schmidt is carrying a tinge of Potomac fever, in my opinion.
  3. The Twitter “thing” is clearly on Mr. Schmidt’s mind. My conclusion after reading the capital letters and red type is that Twitter has become a wisdom tooth ache. The pain is deep and it is getting worse.

No one is more interested in real time search than sentiment miners, intelligence professionals, and some judicially oriented researchers. The more Twitter and real time search gain traction, the older and slower Google looks. In case you missed my earlier post here: is this another sign of a generation gap between Google’s “old style” indexing and Twitter’s here-and-now flow? Note: Facebook.com is getting with the program too. eWeek has an interesting article here.

In my opinion we have a fuzzy line taking shape like those areas between galaxies that NASA distributes to show the wonders of the universe.

Metadata Perp Walk

March 5, 2009

I mentioned the problems of eDiscovery in a briefing I did last year for a content processing company. I have not published that information. Maybe some day. The point that drew a chuckle from the client was my mention of the legal risk associated with metadata. I was reporting what I learned in one of my expert witness projects. Short take: bad metadata could mean a perp walk. Mike Fernandes’ “Think You’re Compliant? Corrupt Metadata Could Land You in Jail” here tackles this subject in a more informed way than my anecdote did. He does a good job of explaining why metadata are important. Then he hits the marrow of this info bone:

Data recovery cannot be treated as the ugly stepsister of enterprise backup, and the special needs that ECM systems place on backup must not be ignored. Regulatory authorities and industry experts are beginning to demand more ECM- and compliance-savvy recovery management strategies, thereby setting new industry-wide legal precedents. One misstep can lead to disaster; however, there are approaches and ECM solutions that help avoid noncompliance, downtime and other incidents.

If you are floating through life assuming that your metadata are shipshape, you will want to make a copy of Mr. Fernandes’ excellent write up. Oh, and why the perp walk? Bad metadata can annoy a judge. More to the point, bad metadata in the hands of the attorney from the other side can land you in jail. You might not have an Enron problem, but from the inside of a cell, the view is the same.

Stephen Arnold, March 5, 2009

Endeca: Push into Education and Training

March 5, 2009

Endeca, http://www.endeca.com, is expanding its information access software business by connecting education and training customers with more specialized solutions. You can read the press release here. Solutions from Endeca Education Services here include customized training curricula delivered on site or online. The goal is to deliver pre-packaged, flexible solutions that speed up business performance for customers in these trying economic times. Part of the attraction of Endeca’s expanded offerings is the ability to pre-purchase training at a discounted rate. Many information access companies compete in this industry, and education services are closely tied to critical technology, so expanding in that market is a smart move on Endeca’s part. On the other hand, the shadows of Apple and Google have begun to creep into the education market. Excitement ahead in a large business sector perhaps?

Jessica W. Bratcher, March 5, 2009

Yahoo: Inventing the Next Facebook

March 5, 2009

Reuters issued a story with the social network worm dangling in front of Web surfers fishing for information. You can read “Yahoo CEO Interested in Social Networks and Search” here. Thomson Reuters’ links can be slippery eels themselves, so the link may be dead when you read my comments. The story summarized the comments Yahoo’s new chief executive made at a bank’s high tech conference. Yep, banks. High tech. Credibility. Whatever. For me, the most remarkable comment attributed to Ms. Bartz, chief Yahoo, was:

“I do not believe we can invent the next Facebook,” Bartz said.

When I read this, I realized that Yahoo’s research and development effort might be redirected. Acquisitions, partnerships, and deals may not require the innovations that have been released in the last year; for example, BOSS, the build-your-own search system. Maybe Yahoo will stick to its historical path of buying companies and trying to grow seedlings into giant redwoods. I liked Ms. Bartz’s pragmatism. I wonder what it means for Yahoo’s R&D initiatives and which companies will become Yahoo’s focal points. With “every business up for examination”, stability at Yahoo may be a future goal, not a here-and-now reality.

Stephen Arnold, March 5, 2009

Facebook: Moving into Business Directory Territory

March 5, 2009

I saw “Facebook Creates New Profiles for Public Figures and Organizations” by Nicholas Kolakowski here. The write up struck me as important and indicative of a content processing opportunity. Facebook.com is one of the two online services that have managed to defy Googzilla. The other is the leader in real time search, Twitter.com. Facebook, according to the eWeek story:

Facebook announced on March 4 the launch of new profiles for public figures and organizations. These profile pages might belong to a large company or famous politician, but nonetheless will function like the “regular” user pages already present on the site. These new profiles will let their administrators post status updates, videos, and photos, as well as provide information via a real-time news feed to users.

When I read this passage, I thought business directory. The “old” Hoover’s pointed toward a new, more useful type of business directory. Dun & Bradstreet (quite a URL, the dnb.com moniker) dominates this sector. Hoover’s disappeared into the D&B combine, and its usefulness has deteriorated. The void has not been filled on a large scale. The eWeek story sparked the thought in my addled goose brain that Facebook.com organization pages could revivify the business directory business.

Why? Three reasons:

  1. User generated content plus content generated by the system (in this case Facebook.com) may provide a nice mix of subjective and objective information, particularly for publicly traded companies.
  2. The possibility of linking Facebook.com users to a particular organization is a potentially useful tool for text mining and relationship analysis.
  3. The updating problems of the traditional business directory companies could be eliminated. Facebook.com could update pages in near real time. A Twitter-like function for news about an organization would be a boon to researchers, analysts, job hunters, and law enforcement.

What will Facebook.com do? People are now worrying about what Google will do a decade after the company began its run. A 20 something might want to ask the same question about Facebook.com-like outfits. And D&B? I am not sure the company is aware of Facebook-like companies and their potential in business directory information. My hunch is that a Facebook-like business directory could erode traditional business directory revenues. Maybe I’m off base, so set me straight.

Stephen Arnold, March 5, 2009

Libraries: A Tipping Point in Commercial Online

March 5, 2009

Libraries find themselves in a tough spot. The economic downturn has created a surge in walk-in traffic. In Louisville, Kentucky, I watched as patrons waited to use the various online systems available. I spoke with several people. Most were looking for employment information or government benefit resources. I pop into the downtown library a couple of times a month, and at 4 pm on a Thursday, the place was busy.

In Massachusetts, four libraries found themselves in the spotlight. According to the Wicked Local Brockton here, “Wareham, Norton Libraries Lose Certification; Brockton, Rockland Given Reprieve”. The libraries, according to Maria Papadopoulos’ article had cut their budgets too much. As a result, the libraries lost their state certification, which further increases budget pressure. Across the country the Seattle Post Intelligencer reported “Big Challenges Await City’s New Librarian.” Kathy Mulady wrote:

Actual library visits are up 20 percent, and virtual visits online are up even more. About 13 million people visited city library branches last year.

That’s the good news. The bad news is that Seattle, home of Amazon (king of ebooks) and Microsoft (the go-to company for software and online information) has a budget crunch. The new library director will have to deal with inevitable financial pressure at a time when demand for services is going up. Tough job.

What’s this mean for commercial online services?


View of a collision between light rail and a freight locomotive. Will this happen when library budgets collide with the commercial online vendors in 2010? Image source: http://www.calbar.ca.gov/calbar/images/CBJ/2005/Metrolink-Train-Wreck.jpg

My view is that the companies dependent on libraries for their revenue will be facing a very lean 2009. The well managed companies will survive, but those companies that are highly leveraged may find themselves facing significant revenue pressure. Most of the vendors dependent on libraries for revenue are low profile operations. These companies aggregate information and make that information available to individual libraries or to groups of libraries that join together to act as a buying club. Most library acquisitions occur on a cycle that is governed by the budget authority funding a library. In effect, library vendors will receive orders and payments in 2009.

The big crunch may occur in 2010. When that happens, the library vendors will be put under increasing pressure. I have identified three potential developments to watch.

First, I think some high profile library-dependent information companies will be forced to merge, cut back on staff and product development, or shut their doors. The size of a library-centric company may not protect it. The costs of creating and delivering electronic information of higher value than this goose-based Web log are often high and difficult to compress. The commercial database companies are dependent on publishers for content. Publishers are in a difficult spot themselves. As a result, the interlocks between commercial publishing, traditional database companies, and libraries are complex. Destabilize one link and the chain disintegrates. No warning. Pop. Disintegration.


Image source: http://harvardinjurylaw.com/broken-chain.jpg

Second, the libraries themselves are going to have to rethink what they do with their budgets. This type of information decision has been commonplace for many years. For example, libraries have to decide which books to buy. Libraries have to decide what percentage of their budgets gets spent on periodicals in print or online. Libraries have to decide whether to cut hours or cut acquisitions. Libraries, in short, make life and death information decisions every day. The forced choices mean that libraries have to decide between serving patrons with online access to Internet resources or online access to high value information sources like those purchased from Cambridge Scientific Abstracts (privately held), Ebsco (privately held), Reed Elsevier (a tie up between two non US commercial entities, one Dutch, one British), Thomson Reuters (a public company), Wolters Kluwer (a public, non US company), and some other companies that are not household names. Free services from Google, Microsoft, and Yahoo plus Web logs, Twitter, and metasearch systems like IxQuick.com would look pretty good to me if I had to decide between a $200,000 payment to a commercial database company and providing services to my patrons, students, and consortium partners.

Third, Google’s steady indexing of content in Google Books and in its government service and the general Google Web index offers an alternative to the high value, six figure deals that library centric information companies pursue. If I were working in a library, I would not hesitate to focus on Google-type resources. I would shift money from the commercial database line item to those expenses associated with keeping the library open and the public access terminals connected to the Internet available.

In short, the economic problems for companies in the search and content processing sector are here-and-now problems. The managers of these firms need to make sales in order to stay in business. The library-centric information companies are sitting on railroad tracks used by the TGV, just waiting for the real budget collision to arrive. The traditional library information companies cannot get off the tracks even though they know 2010 is going to arrive right on schedule.

I want to steer clear of these railroad tracks. Debris can do some collateral damage.

Stephen Arnold, March 5, 2009

Maybe the Google Fatal Flaw Revealed

March 4, 2009

Mashable, the go-to Web log for interesting cloud applications, has a blockbuster of an article here. “Why Googlers Are Leaving to Start Social Sites (And Invites to One of Them)” reveals a flaw at Google that is likely to get worse before Googlers address the issue. The comment that triggered this post was:

According to Reddy, “Most Google infrastructure is based on the original search thinking that scaling is done by using lots of cheap hardware using software layers to protect against machine failures. While this works really well for certain problem classes, there is a “scalability and complexity tax” which most new services pay in terms of development speed, even though they don’t need it in the initial phases.” The reason she sees more opportunity with Likaholix is because Google can’t always leverage open source tools due to infrastructure limitations, where Likaholix can “leverage as many open source tools as possible,” admitting that they “could not have made this much progress over the last 7 months or so in terms of product, UI and engineering if we were to build this at Google.”

Quite an interesting point. If this statement is accurate, Google’s not inept; Google is hamstrung by the wizardry that catapulted it to a dominant position in Web search. The Innovator’s Dilemma comes alive.

Stephen Arnold, March 5, 2009

Yahoo BOSS Queries per Second

March 4, 2009

Update: March 4, 2009, 5:30 pm Eastern. A relevant link from Lemur Consulting: http://www.flax.co.uk/blog/2009/03/04/performance-metrics/

A number of readers have commented via the blog feedback and by email about the Autonomy metrics I summarized here. To provide some baseline data, I dipped into my search archive and located an item that appeared in Search Engine Land in December 2008. I don’t know if these data are accurate, but judging from the feedback on the Autonomy metrics, readers are not shy about providing other data points. You can find the “Yahoo BOSS Now Serving 100 Queries per Second” write up here.

I also had a copy of a presentation given in 2004 by Gurmeet Singh, Information Sciences Institute. You can still find those data here. What is interesting about these data is that a Web interface chops down the query-per-second rate. In the 2004 report, 8,000 queries per second were achieved on the test system without the Web interface. Combine a large database with a Web interface and the QPS rate drops to hundreds of queries per second. Complex queries knock performance down as well. The 2004 data hit 800 queries per second, with greater drop-offs as the database grows larger. As I reviewed these 2004 data, I recalled reading Google technical documents about the importance of optimizing for Web interfaces. Google’s engineers must have encountered similar performance issues, which may have influenced the “speed” angle of Chrome. Who knows? Google won’t tell me.
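The arithmetic behind these drop-offs is easy to sketch. The little model below is my own illustration, not from the ISI report: the worker count and latency figures are hypothetical, chosen only so the outputs line up with the 8,000 and 800 QPS figures cited above.

```python
# Back-of-envelope model: sustainable queries per second for a pool of
# workers when each query costs backend latency plus per-query Web
# interface overhead. All figures are hypothetical, for illustration.

def max_qps(backend_latency_s: float, interface_overhead_s: float,
            workers: int) -> float:
    """QPS the worker pool can sustain at the given per-query cost."""
    per_query_s = backend_latency_s + interface_overhead_s
    return workers / per_query_s

# Bare engine: 10 ms per query, 80 concurrent workers -> 8,000 QPS
bare = max_qps(0.010, 0.0, 80)

# Same engine behind a Web interface adding 90 ms per query -> about 800 QPS
fronted = max_qps(0.010, 0.090, 80)

print(round(bare), round(fronted))
```

The point of the sketch is that a fixed per-query overhead dominates once backend latency is small, which is consistent with the pattern in the 2004 data.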

You will have to draw your own conclusions about:

  • Autonomy’s performance data cited above
  • The validity of the ISI data
  • The computational capability of Yahoo BOSS
  • Google’s 2000 queries per second referenced in my Autonomy summary.

In my experience, metrics for search systems are difficult and expensive to determine. The variables have to be squeezed out when comparing systems. An apples-to-apples comparison is especially difficult in today’s financial climate. I take most performance data with a liberal amount of nuoc cham.

Stephen Arnold, March 4, 2009

YAGG: Google Groups

March 4, 2009

ZDNet Web Logs reported here that “Archived discussions on all Google Groups were unavailable for a short time this afternoon.” Groups are social. Social is hot. It might be a good idea to keep these services up and running. And don’t beat up on me. I’m not a Googler, just an addled goose. An old addled goose. I am just pointing to the post by Ed Burnette. If you have forgotten, YAGG is yet another Google glitch. The glitches seem to come at shorter and shorter intervals. What do you think? Normal? Signs of deterioration? Growing pains? Indifference? I am clueless.

Stephen Arnold, March 4, 2009

SEO Cheat Sheet

March 4, 2009

I never thought much about cheating in school. I just grunted along and took what grades I earned. For readers who do have a fondness for cheat sheets, short cuts, and line jumping, here’s a link for you: “The Web Developer’s SEO Cheat Sheet” by Danny Dover. There’s a link on the site to a PDF version of the “cheat sheet”. You will learn about “important” tags, indexing limits (aka stay under the stuffing ceilings), syntax, URL conventions, redirects, bot factoids, bot traps, robots.txt syntax, and sitemap syntax. I scanned the tips and concluded that this is less a “cheat sheet” than a checklist for avoiding silly errors. One person’s cheat sheet is another person’s reminders.
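If you want to sanity-check robots.txt rules like the ones such cheat sheets cover, Python’s standard library ships a parser. The rules and URLs below are made-up examples of mine, not taken from Mr. Dover’s sheet; note that Python’s parser applies the first matching rule, so narrower Allow lines should precede broader Disallow lines.

```python
# Check made-up robots.txt rules with Python's stdlib parser.
import urllib.robotparser

rules = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Allowed: the narrower Allow rule matches before the broader Disallow.
print(rp.can_fetch("*", "http://example.com/private/press/kit.html"))  # True
# Blocked: only the Disallow rule matches.
print(rp.can_fetch("*", "http://example.com/private/notes.html"))      # False
# Allowed by default: no rule matches at all.
print(rp.can_fetch("*", "http://example.com/index.html"))              # True
```

A quick check like this catches the silly errors the checklist warns about before a bot does.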

Stephen Arnold, March 4, 2009
