Guha and the Google Trust Method Patent

October 16, 2009

I am a fan of Ramanathan Guha. I had a conversation not long ago with a person who doubted the value of my paying attention to Google’s patent documents. I can’t explain why I find these turgid, chaotic, and cryptic writings of interest. I read stuff about cooling ducts and slugging ads into anything that can be digitized, and I yawn. Then, oh, happy day. One of Google’s core wizards works with attorneys and a meaningful patent document arrives in Harrod’s Creek goose nest.

Today is such a day. The invention is “Search Result Ranking Based on Trust” which you can read courtesy of the ever reliable USPTO by searching for US7,603,350 (filed in May 2006). Dr. Guha’s invention is described in this patent in this way:

A search engine system provides search results that are ranked according to a measure of the trust associated with entities that have provided labels for the documents in the search results. A search engine receives a query and selects documents relevant to the query. The search engine also determines labels associated with selected documents, and the trust ranks of the entities that provided the labels. The trust ranks are used to determine trust factors for the respective documents. The trust factors are used to adjust information retrieval scores of the documents. The search results are then ranked based on the adjusted information retrieval scores.
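For readers who think better in code than in patent prose, here is a minimal sketch of the flow the abstract describes: look up who labeled each document, turn those entities' trust ranks into a per-document trust factor, adjust the information retrieval score, and sort. The names (`Label`, `trust_factor`, `rank_by_trust`), the averaging step, and the `weight` multiplier are all my assumptions for illustration. The abstract does not say how the trust factor is computed or combined with the score, so treat this as a toy, not as Google's method.

```python
from dataclasses import dataclass


@dataclass
class Label:
    entity: str  # who applied the label (hypothetical identifier)
    text: str    # the label itself, e.g. "reliable health information"


def trust_factor(labels, trust_rank):
    """Average trust rank of the entities that labeled a document.

    Averaging is an assumption; unlabeled documents get 0.0.
    """
    ranks = [trust_rank.get(label.entity, 0.0) for label in labels]
    return sum(ranks) / len(ranks) if ranks else 0.0


def rank_by_trust(ir_scores, doc_labels, trust_rank, weight=0.5):
    """Return document ids ordered by trust-adjusted score, best first.

    The multiplicative boost (1 + weight * factor) is an assumed formula.
    """
    adjusted = {}
    for doc, score in ir_scores.items():
        factor = trust_factor(doc_labels.get(doc, []), trust_rank)
        adjusted[doc] = score * (1.0 + weight * factor)
    return sorted(adjusted, key=adjusted.get, reverse=True)


# Document "b" starts behind "a" but was labeled by a highly trusted entity.
ir_scores = {"a": 1.0, "b": 0.9}
doc_labels = {"b": [Label("expert", "reliable health information")]}
trust_rank = {"expert": 1.0}
ranking = rank_by_trust(ir_scores, doc_labels, trust_rank)
```

In this toy run the labeled document overtakes the unlabeled one because its adjusted score is 0.9 × 1.5 = 1.35 against 1.0. The point is only that trust in the labeler, not the document's raw relevance, decides the final order.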

Now before you email me and ask, “Say, what?”, let me make three observations:

  • The invention is a component of a far larger data management technology initiative at Google. The implications of the research program are significant and may disrupt the stressed world of traditional RDBMS vendors at some point.
  • The notion of providing a “score” that signals the “reliability” or lack thereof is important in consumer searches, but it has some interesting implications for other sectors; for example, health.
  • The plumbing to perform “trust” scoring on petascale data flows gives me confidence to assert that Microsoft and other Google challengers are going to have to get in the game. Google is playing 3D chess and other outfits are struggling with checkers.

You can read more about Dr. Guha in my Google Version 2.0. He gets an entire chapter (maybe 30 pages of 10 pt type) for a suite of inventions that make it possible for Google to be the “semantic Web”. Clever company, brilliant guy, Guha is.

Stephen Arnold, October 15, 2009

Dust Up between Libraries and Publishers Possible

October 16, 2009

The New York Times reported that some libraries are lending digital books. You will want to read the original article “Libraries and Readers Wade Into Digital Lending” yourself. For me the most important statement in the write up was:

Publishers, inevitably, are nervous about allowing too much of their intellectual property to be offered free. Brian Murray, the chief executive of HarperCollins Publishers Worldwide, said Ms. Smith’s proposal was “not a sustainable model for publishers or authors.”

I am intrigued that Microsoft and Yahoo pulled out of the digital book game. Google faces tough sledding but a compromise seems to be possible. Even government national libraries are slow to the starting line.

The world’s traditional book foundations seem to be under increasing stress. Exciting. The business model of libraries is about to collide with the business model of publishers. After centuries of living in harmony, friction seems to be increasing.

Stephen Arnold, October 17, 2009

LexisNexis Jumps on Semantic Bandwagon

October 15, 2009

Pure Discovery, a Dallas based search and content processing company, has landed a mid-sized tuna, LexisNexis. Owned by publishing giant Reed Elsevier, LexisNexis faces some strong downstream water. The $1 billion plus operation is paddling its dugout canoe upstream. Government agencies, outfits like Gov Resources, and the Google are offering products and services that address the squeals from law firms. What is the cause of the legal eagle squeaks? The cost of running searches on the commercial online services like LexisNexis and Westlaw, among others like Questel. Clients are putting caps on some law firm expenditures. Even white shoe outfits in New York and Chicago are feeling the pinch.

I saw one short news item about this tie up in an article in Search Engine Watch.

Patent searching is a particularly exciting field of investigation. If you click over to the responsive USPTO, you can search patents for free. Tip: Print out the search hints before you begin. I am not sure who is responsible for this wonderful search system, but it is a wonder.

Semantic technology along with other sophisticated content processing tools can make life a little – notice the word “little” – easier for those conducting patent research. Even the patent examiners have to use third party systems because the corpus of the USPTO is a bit like a buggy without a horse in my opinion.

The company that LexisNexis tapped to provide its semantic technology is Pure Discovery in Dallas, Texas. I had one reference to the firm in my Overflight service and that was to an individual named Adam Keys, Twitter name therealadam. Mr. Keys left Pure Discovery in 2006 after two years at the company. I had a handwritten note to the effect that venture funding was provided in part by Zon Capital Partners in Princeton, New Jersey. I have little detail about how the Pure Discovery system works.

Here’s a description of the company I pulled from Zon’s Web site:

Pure Discovery (Dallas, TX) has developed enterprise semantic web software. Its offering combines automated semantic discovery with a peer networking architecture to transform static networks into dynamic ecosystems for knowledge discovery.

I snagged a few items from the firm’s Web site.

The product line up consists of KnowledgeGraph products. These include the PD BrainLibrary (“BrainLibrary is a breakthrough technology that harnesses the collective intelligence of organizations and their people in ways that have never been possible before”), PD Transparent Concept Search (“PD Concept Search has completely removed the top off the black box and for the first time ever, users are not only able to see what has been learned by the system, but also use our QueryCloud application to control it.”), PD QueryCloud Visual Query Generator (“QueryCloud then lets users control what terms or phrases are used, not used, emphasized or de-emphasized. All with the simple click of a button.”), PD Clustering (“PD Clustering dynamically orders similar documents into clusters enabling users to browse data by semantically related groups rather than looking at each individual document. PD Clustering is fast enough to cluster even the largest of document populations with a benchmark of over 80 million pages clustered in a 48 hr period on a single machine.”), and PD Near-Dupe Identification (“PureDiscovery’s Near-Dedupe Identification Engine provides instant value to any application by detecting and grouping near duplicate documents. Identifying documents with these slight variances results in dramatic savings in time wasted looking at the same document again and again.”) This information is from the Pure Discovery Web site here.
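Since I have little detail about how the Pure Discovery system works, here is a generic textbook sketch of what “detecting and grouping near duplicate documents” can mean in practice: break each document into overlapping word shingles, compare the shingle sets with Jaccard similarity, and group documents that clear a threshold. This is not Pure Discovery's engine. The function names, the three-word shingle size, and the 0.6 threshold are my assumptions.

```python
def shingles(text, k=3):
    """Return the set of overlapping k-word sequences in a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items over all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_near_duplicates(docs, threshold=0.6):
    """Greedily group documents whose shingle sets are similar enough.

    Each new document is compared with the first member of every
    existing group. This is simple, not scalable.
    """
    groups = []  # list of (representative shingle set, [doc ids])
    for doc_id, text in docs.items():
        sh = shingles(text)
        for representative, members in groups:
            if jaccard(sh, representative) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((sh, [doc_id]))
    return [members for _, members in groups]


docs = {
    "d1": "the quick brown fox jumps over the lazy dog today",
    "d2": "the quick brown fox jumps over the lazy dog tonight",
    "d3": "completely different text about patent searching systems",
}
groups = group_near_duplicates(docs)
```

The first two documents differ by one word and land in the same group; the third stands alone. A pairwise comparison like this would never get through 80 million pages in 48 hours, which is why production systems lean on hashing tricks, but the idea of “slight variances” is the same.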

The company also offers its Transparent Concept Search Query Cloud.

The software is available for specific vertical markets and niches; for example, litigation support, “human capital management” (maybe human resources or knowledge management?), intellectual property, and homeland security and defense.

These are sophisticated functions. I look forward to examining the LexisNexis patent documents using this new tool. Perhaps LexisNexis has found a software bullet to kill the beasties chewing into its core business. If not, LexisNexis will face that rushing torrent without a paddle.

As more information flows to me, I will update this write up.

Stephen Arnold, October 15, 2009
I wrote this short post without so much as a thank you from anyone.

Google Probes the Underbelly of AutoCAD

October 15, 2009

Remember those college engineering wizards who wanted to build real things? Auto fenders, toasters, and buildings in Dubai. Chances are the weapon of choice was a software product from Autodesk. Over the years, Autodesk added features and functions to its core product and branched out into other graphic areas. In the end, Autodesk was held captive by the gravitational pull of AutoCAD.

In one of my Google monographs, I wrote about Google’s SketchUp program. I recall several people telling me that SketchUp was unknown to them. These folks, I must point out, were real, live Google experts. SketchUp was a blip on a handful of users’ radar screen. I took another angle of view, and I saw that the Google coveted the engineering wizards when they were in primary school and had a method for keeping these individuals in the Google camp until they designed their last, low-cost fastener for a green skyscraper in Shanghai.

No one really believed that this was possible.

My suggestion is that some effort may be prudently applied to rethinking what the Google is doing with engineering software that makes pictures and performs other interesting Googley tricks. The first step could be reading the Introducing Google Building Maker article on the “official” Google Web log. I would gently suggest that the readers of this Web log buy a copy of the Google trilogy, consisting of my three monographs about Google technology. Either path will give you some food for thought.

For me, the most interesting comment in the Google blog post was:

Some of us here at Google spend almost all of our time thinking about one thing: How do we create a three-dimensional model of every built structure on Earth? How do we make sure it’s accurate, that it stays current and that it’s useful to everyone who might want to use it? One of the best ways to get a big project done — and done well — is to open it up to the world. As such, today we’re announcing the launch of Google Building Maker, a fun and simple (and crazy addictive, it turns out) tool for creating buildings for Google Earth.

The operative phrase is “every built structure on Earth”. How is that for scale?

What about Autodesk? My view is that the company is going to find itself in the same position that Microsoft and Yahoo now occupy with regard to Google. Catch up is impossible. Leap frogging is the solution. I don’t think the company can make this type of leap. Just my opinion.

Stephen Arnold, October 15, 2009
Another freebie. Not even a lousy Google mouse pad for my efforts.

Oracle Taps Brainware

October 15, 2009

The Reuters story “Brainware Signs OEM Agreement with Oracle for Intelligent Data Extraction” caught me and probably the folks at ZyLAB and other content processing companies by surprise. Brainware and its patented trigram technology have created strong believers in some markets such as litigation support. But the company has been working to strengthen its content acquisition functionality as well. The idea is that paper and electronic information enter at one end and emerge searchable at the other. Oracle has been lagging in search. The Triple Hop technology has not taken center stage in my opinion. The Brainware deal seems to be for the content acquisition functions, what the news story calls “intelligent data capture”; that is, scanning and transforming functions plus entity extraction. Will Oracle embrace Brainware’s search and retrieval technology as well? Good question. Secure Enterprise Search needs some vitamins in my opinion. My hunch is that Oracle is beefing up its back end content intake system in order to deal with the increasingly successful Autonomy combine which continues to put pressure on big boys like Oracle. Brainware benefits from the publicity this tie up will produce. Search vendors, in my opinion, need this type of buzz to light up the radar of information technology professionals who too often focus on three or four search vendors, ignoring some interesting alternatives.

Stephen Arnold, October 14, 2009

Google Wave Made Simple

October 15, 2009

Short honk: A happy quack to the reader who sent me a link to “Google Wave – A Complete Guide”. You get screenshots and a run down of the principal components of Google Wave. Wave, like Google Squared, is a subset of a larger Google technology initiative. Most analysts focus on the specific demos and betas, not the big Google technology initiatives. If you want a useful intro to Wave, this document eliminates chasing down Google’s dribs and dabs of information.

If you want info about Wave extensions, you can get a run down here.

Stephen Arnold, October 15, 2009, written for free because we love Google

How to Make Money with Google AdSense Video Released

October 15, 2009

You can watch a four minute video that provides a quick primer on how to make money with AdSense. To view the video, navigate to ArnoldIT.com and click on the Video link or click here. The video has been produced by the ArnoldIT.com team to fill a gap in the flood of information about AdSense. “The idea,” said Stephen E. Arnold, “was to put in one place a quick overview and links that a person needs to get started with AdSense. My hope is that libraries will point patrons who want to find possible business ideas to these videos.” He added, “Google provides the information, but we learned from our client work that a quick overflight of the Google money making options was needed.”

The video series was announced at the International Computers in Library Conference in London, England today, October 15, 2009. In his talk, Mr. Arnold said, “Google offers the same type of opportunity for third parties as did Microsoft in the early 1980s. In these tough economic times, an understanding of the revenue potential the Google platform provides is a prudent business step.”

Five more videos in the “How to Make Money with Google” series will be released in the coming weeks. A person looking for extra revenue or a way to build a new career by focusing on the opportunities presented by the Google platform can view one or more of these videos to get ArnoldIT.com’s view about what Google offers.

The next free video “Search Engine Optimization Consulting” will be released at the end of October 2009. Other free videos in the series cover writing programs for the Google platform, becoming a Google partner, and introductory and wrap up videos.

The videos are provided without charge for two reasons. According to Mr. Arnold, “We received client questions and spam promising ‘get rich quick’ schemes regarding Google. I decided it would be a useful exercise to produce brief, factual videos to make clear that Google is a significant opportunity for motivated individuals, organizations, and commercial enterprises. Many people see Google as a one trick pony, even though Google has matured into a platform for programmers, consultants, and computer service businesses.”

The full series will be out by the end of November 2009 and can be viewed as individual videos or as a 35 minute program. ArnoldIT.com is not affiliated with Google. The videos were designed and funded by Stephen E. Arnold.

Jessica Bratcher, October 15, 2009
No one paid for this write up.

Easing Data Transformation Woes

October 15, 2009

Transformation is the Latinate mumbo jumbo that information technology professionals use to flummox chief financial officers. File A is in one format. The enterprise system understands File A only when it is transformed into something that the enterprise software system can * really * process. XML, for example, is not created equal. There are weirdnesses in common file types such as the wonder RTF from Microsoft ages ago and still kicking after all these years. In my work, I have established that transformation can chew up 25 to 35 percent of the information technology unit’s budget. CFOs don’t know where the money is going because “transformation” is not taught in MBA school and CFA online courses.

If you track this type of information processing, you will want to read “Convert PDF to XML and Save Up to 60% Cost”. I found the assertions in the write up interesting. Here’s an example passage:

Outsourcing XML Conversion – A trusted and well known xml conversion outsourcing organization provide PDF to XML conversion services & solution worldwide at lowest possible rates. We offer high quality PDF to XML conversion services with savings up to 60% on PDF to XML conversion cost. You can also try our free PDF to XML conversion services to test our quality of our PDF to XML conversion.

What I have learned is that getting PDFs to yield usable tagged content is a tough problem. The Adobe crowd in the 1980s seemed to be looking for a way to render the printed document on multiple platforms. Somehow the PDF became the “new” PostScript which was wild and crazy too. PDF files make it tough to figure out what content object goes with what content object. Stated another way, PDFs do string searching because the format is clueless when content is rendered in columns. The text is unpredictable when simple copying of a sentence or two from one PDF is required. The text will have more fleas and ticks than my pet goat in Brazil had when I was a “kid”.
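The column problem is easier to see with a toy example. Assume a page has been reduced to text fragments with x and y positions, which is roughly what a renderer knows. Reading the fragments top to bottom and then left to right stitches the two columns together line by line. Reading each column in turn restores the sense. The fragment tuples, the function names, and the `gutter_x` split point are hypothetical, and no real PDF library is involved. Working out where the gutter sits on an arbitrary page is the hard part this sketch skips.

```python
# Each fragment is (x, y, text); y grows downward in this made-up layout.
fragments = [
    (50, 100, "Transformation is"),
    (300, 100, "PDF files make"),
    (50, 120, "a costly step."),
    (300, 120, "tagging hard."),
]


def naive_order(frags):
    """Top to bottom, then left to right: interleaves the columns."""
    return [text for _, _, text in sorted(frags, key=lambda f: (f[1], f[0]))]


def column_order(frags, gutter_x):
    """Read the whole left column first, then the right column."""
    left = sorted((f for f in frags if f[0] < gutter_x), key=lambda f: f[1])
    right = sorted((f for f in frags if f[0] >= gutter_x), key=lambda f: f[1])
    return [text for _, _, text in left + right]


scrambled = " ".join(naive_order(fragments))
readable = " ".join(column_order(fragments, gutter_x=200))
```

The naive pass yields “Transformation is PDF files make a costly step. tagging hard.” while the column-aware pass yields two sentences a human would recognize. That gap is what a conversion shop, human or software, is paid to close.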

I wanted to capture this info because transformation troubles often yield only to a full, complete list of vendors and some hammer dialing.

You can also try Online OCR which cuts out the humans entirely.

Stephen Arnold, October 15, 2009
I wish I knew how to get paid for writing about transformation outfits in far off lands. I will keep trying. No dough for the goose on this write up.

Google Wants to Be a Media Company = Content Delivery Network Rumors

October 15, 2009

Barron’s is one of those business newspapers that blends caution with molecules of nouns to whip investors into a frenzy of uncertainty. Barron’s “Akamai Rallies on Rumor of Google Bid” is an interesting write up. CDNs or content delivery networks are complicated. Akamai has proprietary technology, legions of ISPs on board, and nifty methods for getting popular content to a user quickly. An investor type, who actually bought me lunch at Taco Bell, floated this idea past me. I pointed out:

  • Akamai is a sophisticated outfit
  • Akamai has plumbing in place and on-board ISPs who get financial and bandwidth benefits from their support of the Akamai methods. These involve the injection of smart bits in packets and some other magic
  • Video is becoming the method of communication in the emerging semi literate world of the US of A
  • Companies with a plan to be a media giant can benefit from owning an Akamai or similar outfit because it generates revenue and provides a convenient way to slash certain operational costs.

Barron’s said:

Briefing.com notes that AKAM calls are seeing buying interest this morning amid “GOOG for AKAM chatter.” I’m not sure that Google really wants to be in the content delivery network business, particularly given a spreading view on the Street that AKAM’s results could be hurt by intensifying pricing pressure in the CDN market. But clearly, somebody believe the rumor.

See fan and back pedal. Fan and back pedal.

With churn the name of one popular game on Wall Street, I sure don’t know if Googzilla is going to gobble up the staff and the technology at Akamai. Google has its own CDN in place, but with the volume of rich media that will be coming down the road in the months ahead, this type of acquisition makes sense to me. Akamai has technology, ISP relationships, plumbing, and people. Did I mention really good people?

Stephen Arnold, October 15, 2009
Sadly no one paid me to write this article. The investor on Friday bought me a chicken thing with a made up name, though.

Google Inches Closer to Becoming the Internet

October 14, 2009

Shocking, right? When I articulated this viewpoint in my 2007 write up about Google’s “big play” for the now defunct BearStearns investment shop, few were thinking about this type of big play. Even today, when I point out that “Google has won” in some next generation applications, most people shake their heads slowly and say, “That guy is an addled goose.” Hmmm. Maybe.

For those who are looking at tiny perturbations and missing the larger wave forms, you may want to read “Google Now Largest Source Of Internet Traffic” by Thomas Claburn. You can get the juicy details from the story which first appeared in InformationWeek. You can get my analysis in the Google Trilogy, available from Infonortics Ltd. I think you may find that what seems “new” today is actually the trailing edge of even more significant Google innovations. Those trying to pigeon hole Google have their work cut out for them. Companies trying to catch up with Google may find themselves falling farther behind Googzilla, according to my research.

Stephen Arnold, October 14, 2009
(Yep, BearStearns paid me in 2007 to write about Ramanathan Guha and the Programmable Search Engine. No. No one gave me money to purchase dog treats for Tess for this write up.)
