Google on Chrome: What We Meant Really… No, Really

September 4, 2008

You must read Matt Cutts’s “Google Does Not Want Rights to Things You Do Using Chrome”. First, click here to read the original clause about content and rights. Now read the September 3, 2008, post about what Google *really* meant to say here. I may be an addled goose in rural Kentucky, but I think the original statements in clause 11.1 expressed Google’s mindset quite clearly.

It sure seems to me that the two Google statements–the original clause 11.1 and Mr. Cutts’s remarks–are opposite to one another. In large companies, this type of “slip betwixt cup and lip” occurs frequently. What struck me as interesting is that Google is acting in what I call, due to my lack of verbal skill, “nerd imperialism”.

What troubles me is the mounting evidence in my files that Google can do pretty much what it wants. Mr. Cutts’s writing is a little like those textbooks that explain history to suit the needs of the school district or the publisher.

Google may house its lawyers one mile from the Shoreline headquarters, but I surmise that Google’s legal eagles wrote exactly what Google management wanted. Further, I surmise that Google needs Chrome to obtain more “context” information from Chrome users. I am speculating, but I think the language of the original clause was reviewed, vetted, and massaged to produce the quite clear statements in the original version of clause 11.1.

When the firestorm flared, Google felt the heat and rushed backwards to safety. The fix? Easy. Googzilla rewrote history, in my opinion. The problem is that the original clause 11.1 showed Google’s intent. That clause did not appear by magic from the Google singularity. Lawyers drafted it; Google management okayed it. I can almost hear a snorting chuckle from Googzilla, but that’s my post-heart-attack imagination, and it seems amusing to me. (I was a math club member, and I understand mathy humor, but not as well as a “real” Googler, of course.)

If you have attended my lecture on Google’s container invention or read my KMWorld feature about Google’s data model for user data, can you see a theme? For me, the core idea of the original clause 11.1 was to capture more data about “information”: juicy meta information like who wrote what, who sent what to whom, and who published which fact where and when. These data live in a dataspace managed by a dataspace support platform, or DSSP, which Google may be building.

Google wants this meta metadata to clean up the messiness of ambiguity in information. More and better data mean that predictive algorithms work with more informed thresholds. To reduce entropy in the information one possesses, one needs more, better, and different information–lots of information. For more on usage tracking and Google’s technology, you can find some color in my 2005 study The Google Legacy and my 2007 study Google Version 2.0. If you are an IDC search research customer, you can read more about dataspaces in IDC report 213562. These reports cost money, and you will have to contact my publishers to buy copies. (No, I don’t give these away just to be a kind and friendly former math club member. Giggle. Snort. Snort.)

Plus, I have a new Google monograph underway, and I will be digging into containers, janitors, and dataspaces as these apply to new types of queries and ad functions. For me, the net net is that Google’s lawyers got it right the first time. Agree? Disagree? Help me learn.

Stephen Arnold, September 4, 2008

IBM and Sluggish Visualizations: Many-Eyes Disappointment

September 1, 2008

IBM’s Boston research facility offers a Web site called Many Eyes. This is another tricky URL; don’t forget the hyphen. Navigate to the service at http://www.many-eyes.com. My most recent visit to the site, on August 31, 2008, at 8 pm Eastern, timed out. The idea is that IBM has whizzy new visualization tools. You can explore these or, when the site works, upload your own data and “visualize” it. The site makes clear the best and the worst of visualization technology. The best, of course, is the snazzy graphics. Nothing catches the attention of a jaded Board of Directors’ compensation committee like visualizing the organization’s revenue. The bad is that visualization is still tricky, computationally intensive, and capable of producing indecipherable diagrams. A happy quack to the reader who called my attention to this site, which was apparently working at some point. IBM has a remarkable track record of making its sites unreliable and difficult to use. That’s a type of consistency, I suppose.

Stephen Arnold, September 1, 2008

Why Dataspaces Matter

August 30, 2008

My posts have been whipping super-wizards into action. I don’t want to disappoint anyone over the long American “end of summer” holiday. Let’s consider a problem in information retrieval and then answer, in a very brief way, why dataspaces matter. No, “dataspaces” is not a typographical error.

Set Up

A dataspace is somewhat different from a database. Databases can sit within a dataspace, but a dataspace also encompasses other information objects: garden variety metadata and new types of metadata, which I like to call meta metadata, among others. These are represented in an index. For our purposes, we don’t have to worry about the type of index. We’re going to look up something in any of the indexes that represent our dataspace. You can learn more about dataspaces in the IDC report #213562, published on August 28, 2008. It’s a for-fee write-up, and I don’t have a copy. I just contribute; I don’t own these analyses published by blue chip firms.
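
To make the dataspace notion concrete, here is a toy sketch in Python: one index that spans a database, a document collection, and email traffic. The class and field names are my inventions, not Google’s and not anything from the IDC report.

```python
# A toy sketch of a dataspace: one index spanning heterogeneous
# "participants" (a database, documents, email), each contributing
# entries to a single lookup structure. All names are illustrative.
from collections import defaultdict

class Dataspace:
    def __init__(self):
        # term -> list of (participant, object_id)
        self.index = defaultdict(list)

    def register(self, participant, object_id, terms):
        """Add any information object -- a database row, a document,
        an email message -- under the terms that describe it."""
        for term in terms:
            self.index[term.lower()].append((participant, object_id))

    def lookup(self, term):
        return self.index.get(term.lower(), [])

ds = Dataspace()
ds.register("crm_db", "row:1042", ["Google", "contract"])
ds.register("docs", "memo-17.pdf", ["Google", "dataspace"])
ds.register("email", "msg-2008-08-30-001", ["Madhavan", "dataspace"])
print(ds.lookup("dataspace"))
# [('docs', 'memo-17.pdf'), ('email', 'msg-2008-08-30-001')]
```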

Now let’s consider an interesting problem. We want to index people, figure out what those people know about, and then generate results to a query such as “Who’s an expert on Google?” If you run this query on Google, you get a list of hits like this.

[Screenshot: Google results for the query “google expert”]

This is not what I want. I require a list of people who are experts on Google. Does Live.com deliver this type of output? Here’s the same query on the Microsoft system:

[Screenshot: Live.com results for the same query]

Same problem.

Now let’s try Cluuz.com, a system that I have written about a couple of times. I run the query “Jayant Madhavan” and get this:

[Screenshot: Cluuz.com results for the query “Jayant Madhavan”]

I don’t have an expert result list, but I have a wizard and direct links to people Dr. Madhavan knows. I can assume that some of these people will be experts.

If I work in a company, the firm may have the Tacit system. This commercial vendor makes it possible to search for a person with expertise. I can get some of this functionality in the baked-in search system provided with SharePoint. The Microsoft method relies on counting the documents a person known to the system has written on a topic; crude, but better than nothing. If I were working in a certain US government agency, I could use the MITRE system, which delivers a list of experts. The MITRE system is not one whose screen shots I can show, but if you have a friend in a certain government agency, maybe you can take a peek.
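
For the curious, here is a minimal sketch of that document-count approach. The authors, titles, and counts are invented; the point is only to show how crude the method is.

```python
# A minimal sketch of document-count expertise ranking: whoever has
# written the most documents mentioning a topic ranks as the top
# "expert". The corpus below is invented for illustration.
from collections import Counter

documents = [
    ("jsmith", "Google BigTable and MapReduce notes"),
    ("jsmith", "Google dataspace overview"),
    ("mdoe",   "SharePoint deployment guide"),
    ("jsmith", "Google AdWords economics"),
    ("mdoe",   "Google Apps pilot report"),
]

def experts_on(topic, docs):
    counts = Counter(author for author, title in docs
                     if topic.lower() in title.lower())
    return counts.most_common()

print(experts_on("google", documents))
# [('jsmith', 3), ('mdoe', 1)]
```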

None of these systems really do what I want.

Enter Dataspaces

The idea behind a dataspace is to process the available information. Some folks call this transformation, and it really helps to have systems and methods to transform, normalize, parse, tag, and crunch the source information. It also helps to monitor the message traffic for some of that meta metadata goodness. An email is a good example. I want to index who received the email, who forwarded it to whom and when, which documents absorbed cut-and-pasted passages from it, and who has access to those documents. You get the idea. Meta metadata is where the rubber meets the road in determining what’s important regarding information in a dataspace.
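
No vendor publishes its internals, so here is a toy sketch of what capturing that kind of meta metadata from email traffic might look like. Every field name is my own guess, not anything Google or a DSSP builder has documented.

```python
# A toy sketch of meta metadata for email: record not the content but
# who touched a message, when, and where its text traveled. The field
# names are illustrative guesses, not any vendor's schema.
from dataclasses import dataclass, field

@dataclass
class MetaMetadata:
    message_id: str
    author: str
    recipients: list
    forwarded_to: list = field(default_factory=list)  # (person, timestamp)
    pasted_into: list = field(default_factory=list)   # destination document ids

def who_knows_whom(records):
    """Build a crude contact graph from send and forward events."""
    edges = set()
    for r in records:
        for person in r.recipients:
            edges.add((r.author, person))
        for person, _when in r.forwarded_to:
            edges.add((r.author, person))
    return edges

msg = MetaMetadata(
    message_id="msg-0042",
    author="madhavan",
    recipients=["halevy"],
    forwarded_to=[("franklin", "2008-08-30T09:15")],
    pasted_into=["dataspace-survey.doc"],
)
print(who_knows_whom([msg]))
# {('madhavan', 'halevy'), ('madhavan', 'franklin')} -- order may vary
```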

Dataspaces Analysis Available

August 29, 2008

IDC, the research giant near Boston, has issued for its paying customers “Google: A Push Beyond Databases”. The write-up is part of the firm’s Technology Assessment series. Sue Feldman, IDC’s lead analyst for search and content processing, is the lead author. I provided some background rowing. The result is a useful first look at a Google initiative that has been rolling along since 2006. The 12-page document provides a brief definition of dataspaces, a timeline of key events, and several peeks into the technology and applications of this important initiative. Ms. Feldman and I teamed up to outline some of the implications that we identified. If you want a copy of this document, you will have to contact IDC for document #213562. If your company has an IDC account, you can obtain the document directly. If you wish to purchase a copy, navigate to http://www.idc.com/ and click on the “Contact” link. As with my Bear Stearns Google analyses, I am not able to release these documents. I’m sure others know about dataspaces, but I find the topic somewhat fresh and quite suggestive.

This report is particularly significant in light of Google’s making its “golden oldie” technology MapReduce available to Aster Data and Greenplum. You can read about that deal here. Last year, I spoke with representatives of IBM and Oracle. I asked about their perceptions of Google in the database and data management business. Representatives of both companies assured me that Google was not interested in this business. Earlier this year, via a government client, I learned that IBM’s senior managers see Google as a company fully understood by the top brass of the White Plains giant. My thought is that it must be wonderful to know so much about Google, its deal for MapReduce, and now the dataspace technology before anyone else learns of these innovations. The dataspace write-up, therefore, will be of interest to those who lack the knowledge and insight of IBM and Oracle wizards.

Stephen Arnold, August 29, 2008

Clearwell Systems: Making Pain Go Away in eDiscovery

August 27, 2008

I have had some experience as an expert witness. One thing I learned: real-life law isn’t like TV law. The mind-numbing tediousness of document review, discussing information germane to a legal matter, and talking about data has to be experienced to be understood.

When I saw a demo of Clearwell Systems last year, I was impressed with the company’s understanding of this brain-killing work in eDiscovery; that is, the process of figuring out what information is buried in the documents generated in a legal matter.

Clearwell Systems has introduced a new version of its content analysis system, and it adds some additional and useful features to a good product. You can read about the new version here. In a nutshell, the most important features for me are:

  1. Improved search reports. This feature makes it possible to show where information came from. Clearwell talks about “black box” searching; that is, you enter terms and documents come out. The “transparent” approach produces an audit trail instead. Very useful. (A sketch of what such an audit record might capture follows this list.)
  2. Tweaks to make the appliance go faster.
  3. Training wheels for formulating a query. Legal eagles are smart, but Clearwell adds guard rails to reduce the chance of a lousy query.
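
Clearwell has not published its report format, so here is a hypothetical sketch of the kind of audit record a “transparent” search might emit. The fields are my inventions, not the company’s.

```python
# A hypothetical sketch of a search audit trail: every hit carries an
# account of which query terms matched and where the document came
# from. Clearwell's actual format is not public; fields are invented.
import datetime
import json

def audited_search(query_terms, corpus):
    """Return hits plus a record of which terms matched which documents."""
    trail = {
        "query": query_terms,
        "run_at": datetime.datetime.utcnow().isoformat(),
        "hits": [],
    }
    for doc_id, (source, text) in corpus.items():
        matched = [t for t in query_terms if t.lower() in text.lower()]
        if matched:
            trail["hits"].append(
                {"doc": doc_id, "source": source, "matched_terms": matched})
    return trail

corpus = {
    "D-001": ("custodian: jsmith/mail", "re: merger term sheet draft"),
    "D-002": ("fileshare/legal",        "merger agreement, final"),
}
print(json.dumps(audited_search(["merger", "draft"], corpus), indent=2))
```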

For more information, navigate to Clearwell Systems at http://www.clearwellsystems.com.

Stephen Arnold, August 27, 2008

MapReduce: Google’s Database Probe Launched

August 26, 2008

Update 2, August 29, 2008, 1:50 pm Eastern

There’s an interesting and possibly relevant story on CNet here. Matt Asay wrote “Google’s Weird Ways with Open Source Licenses,” which became available on August 29, 2008. The core of the story is in the title: open source licenses appear to be handled in a Googley way; that is, Google’s way. I sure don’t want to suggest that MapReduce as used by Aster Data and Greenplum is in any way affected by these “weird ways”. I do want to point you to this article and quote one passage that was of interest to me:

As for the MPL, while DiBona doesn’t state it outright, I suspect that Google’s decision to re-up its commitment to Mozilla for three more years probably involved some strained discussions about Google’s weird decision to dump the MPL, one of the industry’s most popular open-source licenses. Regardless, all is well that ends well. Google came to the right decision, however odd the logic.

You can read the Steve Shankland article, which touches upon the great MapReduce technology, here. For something as simple as making code available as open source, there’s a lot of huffing and puffing. I’m watching for signs of smoke now. Wizards, pundits, and Googley types are welcome to add links, correct either of these authors, or opine with limited data via the comments on this addled goose’s Web log. What’s next for open source? The programmable search engine technology. That would be useful here in the hills of Kentucky.

Update 1, August 29, 2008, around 11 am Eastern

My comment about MapReduce triggered some keyboarding by various wizards. Thanks for the inputs. The point of the flurry is that the MapReduce these vendors use doesn’t have anything to do with Google; the technique is “in the wild”, and anyone can make use of it. Nevertheless, I remain keenly interested in this technology for several reasons:

  1. MapReduce was the subject of a lecture given at the University of Washington several years ago by Jeffrey Dean and then written up as a paper with Sanjay Ghemawat. You can snag a copy here.
  2. Google has been careful about the scope of its enterprise ambitions with regard to data management, databases, and data analysis. The company has been sufficiently circumspect as to make the key players in the database and data management market confident that Google’s enterprise ambitions are focused on search, maps, and lightweight cloud applications. Forget the dashboard I wrote about. It’s lightweight too.
  3. Aster Data is a company that came on my radar because of its “Googley nature”. I have picked up some suggestive comments about the robustness of the Aster Data technology, and I learned from Aster Data that it is not interested in search. I believe that statement, but I watch this space for interesting developments.

From my point of view, MapReduce–open source or any other variety–intrigues me. Based on my observation of things Google from my remote hideaway in Harrod’s Creek, Kentucky, my hunch is that Google has a tiny bit of interest in how Aster Data and Greenplum use MapReduce, how their customers respond, and what interest the technology generates. In my lingo, Google learns from its environment. That’s why I subtitled my Google Version 2.0 study “the calculating predator”. Watching, learning, waiting–could this be part of the MapReduce or broader Google goodness? I will let you know what I snag in my crawler.

Original Post Below

I wrote about Aster Data several weeks ago. If you are not familiar with the company, you may want to look at my article or navigate to the Aster Data Web site and get up to speed. It is an important company and is in the process of becoming more important.

InfoWorld’s “Database Vendors Add Google’s MapReduce” here reports that Google has cut a deal with Aster Data and Greenplum for Google’s nifty programming model, which pairs two functions: a “map” step that turns raw records into key-value pairs and a “reduce” step that aggregates the values sharing a key. Spreading those steps across many machines reduces the time and computational cycles required to chop results out of a larger data set. MapReduce is useful for certain operations on petascale data.
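
For readers who have not met the pattern, here is a minimal single-machine sketch using the classic word-count example. Real deployments spread the map and reduce steps across many machines; this toy version only shows the moving parts.

```python
# A minimal single-machine sketch of the MapReduce pattern: a map
# function emits key-value pairs, a shuffle groups them by key, and a
# reduce function aggregates each group. Word count is the usual demo.
from collections import defaultdict

def map_step(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.split():
        yield word.lower(), 1

def reduce_step(key, values):
    """Aggregate all the values that share a key."""
    return key, sum(values)

documents = ["Google MapReduce paper", "Aster Data adds MapReduce"]

# Shuffle: group the mapped pairs by key.
groups = defaultdict(list)
for doc in documents:
    for key, value in map_step(doc):
        groups[key].append(value)

print(dict(reduce_step(k, v) for k, v in groups.items()))
# {'google': 1, 'mapreduce': 2, 'paper': 1, 'aster': 1, 'data': 1, 'adds': 1}
```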

Has Google entered the enterprise data management market? Not yet. Like Google’s interaction with Salesforce.com, Google is in “learn” mode. MapReduce by itself is not a complete data solution, but it provides some horsepower to Aster Data and Greenplum.

Will Google challenge IBM, Microsoft, and Oracle among others in the DBMS market? Google will watch and learn. Google has some serious data management capabilities in development. MapReduce is a golden oldie at Google.

When Google figures out what it wants to do to cash in on the pain many companies experience when using traditional database management systems, the Google will leapfrog what’s available. For now, Google is no threat to DBMS vendors. In the future? Who knows. Probably not even Google knows, until it gets enough hard data to justify a decision one way or the other.

Stephen Arnold, August 26, 2008

How Yahoo Will Catch Google in Search

August 25, 2008

Here’s an interview you must read. On August 25, 2008, the Financial Express (India) here published an interview with Yahoo’s super wizard, Prabhakar Raghavan. Dr. Raghavan is the head of research at Yahoo, a Stanford professor, and a highly regarded expert in search, databases, and associated technologies. He’s even the editor of computer science and mathematics journals. A fellow like this can leap over Google’s headquarters and poke out Googzilla’s right eye. The interview, conducted by Pragati Verma, provides a remarkable look inside Yahoo’s plans to regain control of Web search.

There were a number of interesting factoids that caught my attention in this interview. Let me highlight a few.

First, Yahoo insists that the cost of launching Web search is $300 million. Dr. Raghavan, who is an expert in things mathematical, said:

Becoming a serious search player requires a massive capital investment of about $300 million. We are trying to remove all barriers to entry for software developers, who have ideas about how to improve search.

The idea is to make it easy for a start-up to tap into the Yahoo Web index and create new services. The question nagging at me is, “If Web search costs $300 million, why hasn’t Yahoo made more progress?” I use Yahoo once in a while, but I find that its results are not useful to me. When I search Yahoo stores, I have a heck of a time finding what I need. What has Yahoo been doing since 1998? Answer: losing market share to Google and spending a heck of a lot more than a paltry $300 million while losing ground.

Second, Google can lose share to search start-ups. Dr. Raghavan said:

According to comScore data, Google had a 62% share of the US search market in May, while we had 21% and MSN 9%. Our prediction models suggest that Google could lose a big chunk of its market share, as BOSS partners and players come in.

My question is, “Since Google is vulnerable, why haven’t other search systems with funding made any headway; for example, Microsoft?” The notion that lots of little mosquitoes can hobble Googzilla is not supported by Yahoo’s many search efforts. These range from Mindset to InQuira, from Flickr search to the deal with IBM, etc. Chatter and projections aside, Google’s share is increasing, and I don’t see much zing from the services using the Yahoo index so far.

Finally, people don’t want to search. I agree. There is a growing body of evidence that keyword search is generally a hassle. Dr. Raghavan said:

Users don’t really want to search. They want to spend time on their work, personal lives and entertainment. They come to search engines only to get their tasks done. We will move search to this new paradigm of getting the task done….

My question is, “How is Yahoo, with its diffused search efforts, its jumble of technologies, and its inability to make revenue progress without a deal from Google, going to reverse its trajectory?” I wish Yahoo good luck, but the company has not had much success in the last year or so.

Yahoo lost its way as a directory, as a search system, and as a portal. I will wait to see how Yahoo can turn its “pushcart full of odds and ends” into a Formula One racer.

Stephen Arnold, August 25, 2008

Linguistic Agents Unveils RoboCrunch

August 21, 2008

Linguistic Agents, based in Jerusalem, has rolled out RoboCrunch.com, a semantic search engine. You can read the company news release here. Like Powerset and Hakia, Linguistic Agents’ technology makes it possible to locate information by asking the system a question. The company says the platform enables software to respond to and act upon natural human language in an intuitive fashion.

As of August 21, 2008, the system operates with two functions:

  1. Natural language inquiries are transformed into advanced search queries (see the toy sketch after this list)
  2. Results are semantically sorted by relevance.
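
Linguistic Agents has not described its method, so purely as an illustration, here is a toy sketch of what “transforming a natural language inquiry into an advanced search query” can mean: drop the question scaffolding and keep the content terms. This is emphatically not the company’s technology.

```python
# A toy illustration of turning a natural-language question into a
# keyword query with operators. This is NOT Linguistic Agents' method,
# which the company has not published; it shows only the general idea.
import re

STOPWORDS = {"who", "what", "is", "an", "the", "on", "of", "a",
             "are", "how", "do", "i", "find"}

def to_search_query(question):
    words = re.findall(r"[a-z0-9]+", question.lower())
    content = [w for w in words if w not in STOPWORDS]
    # Join the surviving content words with an AND operator.
    return " AND ".join(content)

print(to_search_query("Who is an expert on Google?"))
# expert AND google
```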

The developer–founded in 1999–plans to change the present method of Web navigation by using its advanced semantic technology to better understand users’ information requests. The company says, “Linguistic Agents has developed an integrative language platform that is based on the most current research in the field of theoretical linguistics.”

I have written about this company before. Check out the demo. Let me know your impressions.

Stephen Arnold, August 21, 2008

Metadata Modeling

August 21, 2008

Embarcadero, in my opinion, is a software tools company. The company’s products allow developers and database administrators to design, build, and run software applications in the environment they choose. The company says it has more than three million users of its CodeGear and DatabaseGear products.

The firm announced on August 19, 2008, that it had rolled out its ER/Studio Enterprise Portal. As I read the announcement here, ER/Studio Enterprise Portal is a search engine for data. The system “transforms the way metadata, business rules and models are located and accessed in the enterprise.”

As I thought about this phrasing, it struck me that Embarcadero wants to piggyback on the interest in search, portals, and metadata–all major problems in many organizations. The news story released on Business Wire includes this statement:

‘We’re doing for metadata what Google did for Web search. Today’s enterprise data explosion has made collecting and refining information time consuming for the architect and hard to understand for the business user,’ said Michael Swindell, Vice President of Products, Embarcadero Technologies. ‘ER/Studio Enterprise Portal dramatically simplifies the process for assembling, finding and communicating technical metadata.’

A couple of thoughts. Embarcadero tools can assist developers. No question about it. I am unsettled by two points:

  1. The suggestion that ER/Studio Enterprise Portal is a search engine. Search is a commodity in many ways. The term is ambiguous. I find it hard to figure out what this “portal” delivers. My hunch is that it is a metadata management tool.
  2. The suggestion that Embarcadero, founded in 1993, is “doing for metadata what Google did for Web search” is an example of wordsmithing of an interesting nature. “Google” is a magic word. The company generates billions of dollars and unnerves outfits like Verizon and Viacom. The notion that a software tool for managing metadata will have a “Google” effect amused me.

I find it harder and harder to figure out what business a company is in (“portal”, “search”, “metadata”) and what specific problem a company’s product solves. I’m a supporter of reasonably clear writing. Metaphors like “addled goose” can be useful as well. But mixing a stew of buzzwords leaves me somewhat confused and perhaps a bit suspicious of some product assertions.

Other companies in the metadata game are Wren and Access Innovations. What do you think?

Stephen Arnold, August 21, 2008

Powerset as Antigen: Can Google Resist Microsoft’s New Threat?

August 20, 2008

I found the write-ups about Satya Nadella’s observations about Microsoft’s use of the Powerset technology in WebProNews, Webware.com, and Business Week magnetizing. Each of these write-ups converged on a single key idea; namely, that Microsoft will use the Powerset / Xerox PARC technology to exploit Google’s inability to tailor the search experience, thereby delivering a better experience to each user. The media attention directed at a conference focused on generating traffic to a Web site without regard to the content on that site, its provenance, or its accuracy is downright remarkable. Add the assertion that Powerset will hobble the Google, and I may have to extend my anti-baloney shields another 5,000 kilometers.

Let’s tackle some realities:

  1. To kill Google, a company has to jump over, leapfrog, or out-innovate Google. Using technology that dates from the 1990s, poses scaling challenges, and must be “hooked” into the existing Microsoft infrastructure is a way to narrow a gap, but it’s not enough to do much to wound, impair, or kill Google. If you know something about the Xerox PARC technology that I’m missing, please, tell me. I profiled Inxight Software in one of my studies. Although Inxight’s technology differs from the Xerox PARC technology used by Powerset, it was close enough to identify some strengths and weaknesses. One issue is the computational load the system imposes. Maybe I’m wrong, but scaling is a big deal when extending “context” to lots of users.
  2. Microsoft is slipping further behind Google. The company is paying users, and it is still losing market share. Read my short post on this subject here. Even if the data are off by an order of magnitude, Microsoft is not making headway in Web search market share.
  3. Cost is a big deal. Microsoft appears to have unlimited resources; I’m not so sure it does. If Google’s $1 of infrastructure investment buys 4X the performance that a Microsoft $1 does, Microsoft has an infrastructure challenge that could cost more than even Microsoft can afford. (A back-of-the-envelope illustration follows this list.)
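
The 4X figure is my speculation, but the arithmetic it implies is easy to run. A back-of-the-envelope sketch with an invented spending number:

```python
# A back-of-the-envelope sketch of the speculative 4X efficiency gap:
# if each Google dollar buys four units of search capacity for every
# one a Microsoft dollar buys, matching Google costs 4X the spend.
GOOGLE_UNITS_PER_DOLLAR = 4.0   # speculative ratio from the text
MSFT_UNITS_PER_DOLLAR = 1.0

google_capex = 2_000_000_000    # hypothetical annual spend in dollars
google_capacity = google_capex * GOOGLE_UNITS_PER_DOLLAR

msft_cost_to_match = google_capacity / MSFT_UNITS_PER_DOLLAR
print(f"Microsoft spend needed to match: ${msft_cost_to_match:,.0f}")
# Microsoft spend needed to match: $8,000,000,000
```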

So, there are computational load issues. There are cost issues. There are innovation issues. There are market issues. I must be the only person on the planet who is willing to assert that small scale search tweaks will not have the large scale effects Microsoft needs.

Forget the assertion Business Week offers when it says that Google is moving forward. Google is not moving forward; Google is morphing into a different type of company. “Moving forward” only tells part of the story. I wonder if I should extend my shields of protection to include filtering baloney about search emanating from a conference focused on tricking algorithms into putting a lousy site at the top of a results list.

Agree? Disagree? I’m willing to learn if my opinions are scrambled.

Stephen Arnold, August 20, 2008
