Google Grabs a Handful of French Élan

August 20, 2009

A gaggle of goslings, we are on our way to a meeting, and we just learned that the Google has grabbed a handful of France. An email pointed us to the estimable Times of London’s article “Google Bruises Gallic Pride as National Library Does Deal with Search Giant”, which said:

The shift was explained by economics, said Denis Bruckmann, director of collections at the BNF, which joins 29 other major world libraries in opening its shelves to Google’s project (including Oxford’s Bodleian). France provides only €5 million (£4.3 million) a year for digitizing books for Gallica, the national digital library, yet the BNF needs up to €80 million just for its works from 1870 to 1940, said Mr Bruckman. “We will not stop our own digitizing programme, but if Google can enable us to go faster and farther, then why not?”

The two most important words in this write up are “economics” and “Google”. Without the Google, digitization projects are likely to be little more than modest subsets. Consider the arithmetic in the Times quote: at €5 million a year against a need of up to €80 million for the 1870 to 1940 works alone, the BNF is looking at a 16 year slog on its own budget. What’s up with the Library of Congress’ digitization projects? Why did the French library overlook French technology? Answers, anyone?

Stephen Arnold, August 19, 2009

Foundem Gets Lost in Google

August 20, 2009

I find newspaper stories with quotes like this quite amusing:

“Google is just too dominant for any of us to feel entirely comfortable.”

This statement appears in the UK Guardian’s story “Search for Answers to Google’s Power Leaves UK Internet Firm Baffled”. I have been asked to examine Web sites that have dropped or disappeared from a high ranking in a Google results list. I can’t identify these companies, but I can share with you three reasons that my team identified.

The first Web site belonged to a financial services firm. The site with the ranking problem indexed its pages with generic terms like “financial,” “services,” and “enterprise.” Substantive content, concrete nouns, original prose, and inbound and outbound links were nonexistent. My recommendation to this outfit was to dump its present Web site development company, recruit a couple of writers, and get indexing help from a librarian. The company did none of these things, and it is still nowhere in the Google rankings. These missteps were news to my client. The problem was that the client was confident she knew how to make a Web site pay. Wrong. Arrogant and stubborn. Attractive to some, but not too helpful in getting a Web site indexed in a way that warrants a top ranking from the GOOG.

The second Web site belonged to a company that made some sort of email add-in. My recollection is that the software snagged email addresses. This outfit disappeared completely from the Google results list. We looked at the old Web site and the new Web site. There were meta tags stuffed with hundreds of words, long page descriptions, and content that was essentially brochure prose. The problem, we learned after some poking around, was an “SEO expert” whom I shall not name. This expert fiddled, and the unexciting site disappeared from the results. The person who hired this “SEO expert” had some spiel about the expert’s brilliance. My suggestion: dump the crazy indexing and spend a weekend reading Google’s Webmaster guidelines. The company fired the person who hired the “SEO expert” and followed Google’s rules. The site is now back in Google’s good graces.
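Neither diagnosis required magic. A quick, unglamorous audit surfaces these symptoms. Below is a minimal sketch, my own illustration rather than any Google tool, that fetches a page and flags keyword-stuffed meta tags, thin body text, and a shortage of links; the thresholds are arbitrary assumptions.

```python
# Toy page audit: flags the symptoms described above. My illustration only;
# thresholds are arbitrary and the checks are deliberately crude.
from html.parser import HTMLParser
from urllib.request import urlopen

class AuditParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta_keywords = ""
        self.link_count = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            self.meta_keywords = attrs.get("content") or ""
        if tag == "a" and (attrs.get("href") or "").startswith("http"):
            self.link_count += 1   # crude: counts any absolute link

    def handle_data(self, data):
        self.text_chars += len(data.strip())

def audit(url):
    html = urlopen(url, timeout=30).read().decode("utf-8", errors="replace")
    parser = AuditParser()
    parser.feed(html)
    keyword_terms = [t for t in parser.meta_keywords.split(",") if t.strip()]
    if len(keyword_terms) > 25:
        print("Warning: meta keywords look stuffed (%d terms)" % len(keyword_terms))
    if parser.text_chars < 1500:
        print("Warning: thin body text (%d characters)" % parser.text_chars)
    if parser.link_count == 0:
        print("Warning: no absolute links found on the page")

audit("http://www.example.com/")
```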

The third Web site experienced double-digit decreases in traffic over a period of four months. We looked at the site and found hundreds of 404 errors, thin content, repeated service outages, and an interface that was tough for a human and pretty much a mess for an indexing robot. We turned in our report. The company’s management buried it. The site is up and unchanged. The problem was not Google; it was people who thought they knew something and did not.
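Finding the 404s is equally mundane. Here is a minimal sketch, with made-up example URLs, that walks a list of known pages and tallies the dead ones:

```python
# Toy dead-link tally: checks a list of page URLs and collects the 404s.
# The example URLs are made up.
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def find_404s(urls):
    dead = []
    for url in urls:
        try:
            urlopen(url, timeout=10)
        except HTTPError as err:
            if err.code == 404:
                dead.append(url)
        except URLError:
            pass  # outage or DNS trouble; a different problem than a 404
    return dead

pages = ["http://www.example.com/products", "http://www.example.com/old-page"]
for url in find_404s(pages):
    print("404:", url)
```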

Now, what about the poster child in the Guardian news story, http://www.foundem.co.uk/?

First, running a query for pages indexed by the Google, I learned that Googzilla had 29,600 hits for this site. This means that Google is indexing the site and that the site is in the index. In fact, ArnoldIT.com has a miserable 782 pages indexed by the Google, and I am not complaining. The issue, then, is that on the surface Foundem is not generating enough “Google glue” to warrant a high ranking.

Second, I ran the site against some validators. Errors were returned. Not good. Before grousing, one should make certain that the code is clean. Google’s automated system thrives on data. Bad code is, well, bad.
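That check is easy to script as well. The sketch below asks the W3C’s Nu HTML Checker for a JSON report and counts the errors. The endpoint and field names reflect my understanding of the public checker’s interface, so treat them as assumptions and confirm against the validator’s own documentation.

```python
# Toy markup check via the public W3C Nu HTML Checker, JSON output.
# The endpoint and field names reflect my understanding of the public
# service; confirm against the checker's documentation before relying on them.
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def count_markup_errors(page_url):
    query = urlencode({"doc": page_url, "out": "json"})
    request = Request("https://validator.w3.org/nu/?" + query,
                      headers={"User-Agent": "markup-audit-sketch"})
    report = json.load(urlopen(request, timeout=30))
    errors = [m for m in report.get("messages", []) if m.get("type") == "error"]
    return len(errors)

print("validator errors:", count_markup_errors("http://www.foundem.co.uk/"))
```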

Finally, I ran some queries and looked at the results. Yep, a dynamic site. Now the Google has some nifty technology for dynamic sites. Dynamic sites fall into the province of Google’s Programmable Search Engine and its mostly ignored dataspace technology. Foundem is obviously not hip to Dr. Guha’s methods of working with dynamic sites’ data.

You can see this in action when you navigate to the US Google.com and enter the query SFO LGA. Notice that the Google has some partners like Cheap Tickets, Expedia, and five others. My thought is that this “top site” should get Googley and try to work with the Google. I know this takes work, but the effort may pay off.

Blaming Google for not indexing a site and not handing it a high ranking may be good copy for the Guardian, I suppose, but it is not so helpful to Foundem. Google is a great many things, but the company is not set up to focus on a single Web site. Fix the code. Read about the Programmable Search Engine. Talk to the vendors listed as top dogs in the SFO LGA query. The Guardian’s reporter may not know about these nuances. More work is needed on the Foundem end of the deal.

Just my opinion.

Stephen Arnold, August 20, 2009

Commercial Textbook Publishers Face Pressure from the Open Source Crowd

August 20, 2009

Open source once meant software that most humans could not locate, could not convert to an executable, and could not use even if they managed to get it onto a computing device. No more. Open source continues to push its snout into the bedrooms and under the covers of the enterprise. Nothing makes a commercial vendor more uncomfortable than the wet snout of open source sniffing in some very private regions.

To get a feel for how open source may push traditional publishers over the cliff, you will want to read “Advice On Creating an Open Source Textbook?” Let us assume that the author Occamboy is spot on. When I take this approach, the following comment strikes me as important:

Poking around on the Net, I’ve found several intriguing options for distributing open source texts, such as Flatworld Knowledge, Lulu, and Connexions.

When I add Google’s Creative Commons publishing effort to Occamboy’s three links, I see a way for subject matter experts to create textbooks. Taking this a step further and combining online video with open source textbooks, I see a way for some instructors to cover a subject without requiring students to pony up big money for textbooks and course materials. If I am correct, the screams and splashes one hears will be traditional textbook publishers being pushed off a financial cliff into the river Styx below. Some will find opportunity in open source textbooks. Good for them. The publishers who fail? I think the Wal*Mart in J’town is hiring greeters. More along this line appears in Google: The Digital Gutenberg. Click the ad at the top of any Beyond Search Web page. How is that for subtle marketing, dear journalists? I am selling my new Google study! Right here! Now!

Stephen Arnold, August 20, 2009

Google: App Engine on Parallel Bars Goes for Gold

August 19, 2009

Short honk: If you are not familiar with Google’s enterprise initiatives, you will want to bone up. The Google has made what I think is an important announcement about the Google App Engine. Google said on August 17, 2009:

Written by a group of engineers at VendAsta, asynctools is a rather nifty toolkit that allows you to execute datastore queries in parallel in the Python runtime. The interface is slightly more involved than using standard queries, but the ability to execute multiple queries in parallel can substantially reduce the render time for a page where you need to execute multiple independent queries. Jason Collins has written a detailed article about how and why they wrote asynctools, which can be found here.

What is the big deal? Functional parallel execution in a Dot Net environment is mostly a handcrafted operation, like making a patchwork quilt. Google is a “plumbing that usually does not leak” outfit. Want more about the App Engine and what you can do with it? Navigate to Adhere Solutions and get their view of the new functionality. Alternatively, you can email me, and one of my Harrod’s Creek geese will provide some insight.
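For goslings who want to see the shape of the idea without emailing anyone, here is a rough sketch of why running independent queries in parallel trims page render time. This is plain Python threading with stand-in functions; it is not asynctools’ actual interface, for which you should read the VendAsta write up.

```python
# Rough illustration of why parallel execution of independent queries matters.
# This is NOT the asynctools interface; the fetch_* functions are stand-ins
# for independent datastore queries, with sleep() faking the latency.
import threading
import time

def fetch_recent_posts():
    time.sleep(0.3)
    return ["post-1", "post-2"]

def fetch_popular_tags():
    time.sleep(0.3)
    return ["search", "google"]

def fetch_user_profile():
    time.sleep(0.3)
    return {"name": "addled goose"}

def run_in_parallel(tasks):
    results = {}
    def worker(fn):
        results[fn.__name__] = fn()
    threads = [threading.Thread(target=worker, args=(fn,)) for fn in tasks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

start = time.time()
print(run_in_parallel([fetch_recent_posts, fetch_popular_tags, fetch_user_profile]))
print("elapsed: %.2f seconds versus roughly 0.9 run one after another"
      % (time.time() - start))
```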

If you are in the dark about the Adhere Solutions outfit, check it out.

In terms of contacting me, the Beyond Search Web log is focused on marketing. Got it. One more email from a journalist who thinks I deserve an F in the journalism class I never took will trigger a flappin’ hot write up straight from the goose’s beak. To make sure these self-important protectors of traditional journalism get it, let me be clear: marketing, marketing, marketing.

Stephen Arnold, August 19, 2009

Autonomy and Its Play in the Security Game

August 19, 2009

The ace search and content processing marketer is at it again. I read “Autonomy and Verdasys Expands EIP Alliance” and was initially baffled by the acronym “EIP”. After a bit of game show puzzling, I figured out that “EIP” referred to enterprise information protection. The coinage is a blend of security and governance, two hot trends in the enterprise. Autonomy and Verdasys have had a tie up in place for about a year. Under the new deal, said the ComputerWire release:

Verdasys will use Autonomy’s core infrastructure software Intelligent Data Operating Layer (IDOL) to power its content inspection and data classification application Digital Guardian.

When I read this story, I thought about Oracle’s “secure enterprise search” play. Autonomy has taken a plain Jane topic like security and sent it to a Beverly Hills plastic surgeon. The result is a new packaging of plain Jane. Oracle had a good idea with SES10g but could not get her a date. Autonomy has found a way to make this approach have some sizzle.

Autonomy’s ability to work marketing magic in the search and content processing sector is one of the company’s core competencies. Some of Autonomy’s competitors send me news releases that do not break new marketing ground. Autonomy finds a fresh trail through the prairie that others have traversed before.

Stephen Arnold, August 19, 2009

The Sink Hole for Enterprise Software: Unplanned Costs

August 19, 2009

Network World published “ROI Doesn’t Always Pan Out with Unified Communications”, an article I thought had some excellent comments for whiz kids procuring enterprise software systems in general and search in particular. The point of the write up is that eager beavers ignore some basic facts about indirect and direct costs. Looking at one data point makes it easy to show a big financial win. When sharper pencils are applied to the assumptions and the cost analysis, trendy technology like “unified communications” can produce unexpected costs. The eager beaver’s ROI disappears in a flood of budget-busting cost overruns.

Tim Greene wrote: “Return on investment is a big but unfulfilled promise of unified communications.” The same statement can be applied to enterprise search systems, content management systems, and next-generation content processing systems that “read and understand” unstructured content. I am not saying that these systems do not work. I am saying that the cost time bombs some systems embed in an organization create major problems for the folks who have to deal with these software acquisitions.

I want to call to your attention one passage:

Establishing ROI is difficult for some businesses because IT directors that propose use of UC don’t calculate a baseline cost of certain business functions before UC that they can compare to the costs after an implementation, he [a consultant with whom Mr. Greene spoke] says.

I want to highlight the phrase “baseline cost”. That is part of the story, but it is not the whole story. I urge you to read the Network World write up and consider these points:

  • New technology often creates unexpected situations and consequences. The short-term benefit may be derailed by developments that cannot be known in advance. A rigorous procurement will make a concerted effort to identify potential cost issues, capture them, and include cost scenarios in the planning. The idea that every day is warm, clear, and sunny is great for the power of positive thinking crowd. That type of thinking does not mean much when budgets cannot accommodate the cost spikes and unexpected capital requirements.
  • The notion that “unified” means better, faster, or cheaper is confused. When separate functions are squished into one bundle, the boundaries between and among functions may not perform in an acceptable way. What happens is that users switch to less efficient methods because the “unified” search or other system consumes more time and creates more hassles than it removes. These “friction” costs are not mapped back to the new software or system. The overall effect is to create more “drag” on the organization’s performance. In short, the new system slows down performance while it consumes more resources per unit of work.
  • Human nature responds to a deal. Most of those involved in system procurements, in my experience, lack the expertise to perform the type of financial analysis I suggest be standard. As a result, the “deal” becomes a cost sink hole. The consequence is that within a short period of time, needed enhancements have to be jettisoned because any budget dollars have to be shifted to cover the operational problems and their attendant cost overruns. Ever wonder why two-thirds of enterprise software deployments crash and burn? I do, and I think the root cause is like the stream that backs up in Harrod’s Creek. Eager beavers do what is easiest for themselves. Eager beavers don’t think beyond the here and now.
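To put some toy numbers behind the “baseline cost” point, here is a back-of-the-envelope calculation. Every figure is invented for illustration; the takeaway is that the pitched payback evaporates once friction and overrun costs are counted.

```python
# Toy payback arithmetic with invented numbers; illustrative only.
baseline_annual_cost = 400_000      # what the function cost before the new system
new_system_annual_cost = 200_000    # claimed running cost of the replacement
license_and_hardware = 250_000
integration_and_training = 150_000

# Without the baseline figure, "savings" cannot even be computed.
promised_annual_savings = baseline_annual_cost - new_system_annual_cost   # 200,000
naive_payback_years = (license_and_hardware + integration_and_training) / promised_annual_savings

# Costs that rarely make it into the pitch deck.
annual_friction_cost = 60_000       # users routing around the "unified" system
annual_overrun_cost = 90_000        # outages, rework, surprise consulting fees
real_annual_savings = promised_annual_savings - annual_friction_cost - annual_overrun_cost
real_payback_years = (license_and_hardware + integration_and_training) / real_annual_savings

print("Pitched payback: %.1f years" % naive_payback_years)   # 2.0
print("Actual payback:  %.1f years" % real_payback_years)    # 8.0
```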

In short, enterprise software in general, and “universal” solutions in particular, may be a messy kettle of fish.

Stephen Arnold, August 19, 2009

Entity Extraction from Google and Yahoo

August 19, 2009

I found the announcement reported on the Programmable Web a harbinger. “Yahoo Quietly Axes Two Search APIs” disclosed that Yahoo is out of the entity extraction business *before* the Microsoft deal closes. I can see nuking this type of service once the deal closes, but killing off a service quietly is more troubling to me. Programmable Web provides a link to “Being Optimistic at the Deathbed of Yahoo Search API”. I am not so optimistic. You can read Yahoo’s announcement on the YDN:

[Screenshot: Yahoo’s YDN notice discontinuing the search services]

What is the impact? Yahoo offers no information. What about those BOSS fans? Yahoo offers no information.

To my surprise, Mashable reported here that Yahoo has not killed its term extraction API.

What’s ironic is that at about the same time as Yahoo’s entity flip-flop-flip was taking place, the Google’s patent document US20090204592 was published. Now entity extraction is no big deal any longer. You can poke around and find open source routines, or you can click on Google ads for Teragram’s solution. I find these Google patent documents interesting because they suggest to me that the Google is cognizant of the functions that search vendors such as Autonomy and Endeca have been including in their upscale systems. With the Google nosing into these functions, I have a hunch that Google will be looking to add some new zing to its Google Search Appliance and its enterprise applications.
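For readers curious about the poke-around option, here is a toy illustration of the crudest possible entity extraction: pulling capitalized multi-word phrases out of text as candidates. Systems from Autonomy, Endeca, Teragram, and the open source projects do far more; this merely shows the shape of the task.

```python
# Toy entity candidate extraction: consecutive capitalized words become
# candidates. Real systems use dictionaries, statistics, and context;
# this is only a crude illustration of the task.
import re

STOP = {"The", "A", "An"}

def entity_candidates(text):
    spans = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)
    cleaned = []
    for span in spans:
        words = span.split()
        if words and words[0] in STOP:
            words = words[1:]
        if len(words) >= 2:            # keep multi-word candidates only
            cleaned.append(" ".join(words))
    return cleaned

sample = ("The British Library and the Bodleian Library joined the project. "
          "Marissa Mayer filed the patent with several colleagues.")
print(entity_candidates(sample))
# ['British Library', 'Bodleian Library', 'Marissa Mayer']
```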

You can read the patent document using the wonderful USPTO system here. The abstract for the document filed on April 9, 2009, complements other Google text processing patent documents. (You can explore these via the Perfect Search / ArnoldIT.com service at http://arnoldit.perfectsearchcorp.com/.)

A system receives a search query, determines whether the received search query includes an entity name, and determines whether the entity name is associated with a common word or phrase. When the entity name is associated with a common word or phrase, the system generates a link to a rewritten query, performs a search based on the received search query to obtain first search results, and provides the first search results and the link to the rewritten query. When the entity name is not associated with a common word or phrase, the system rewrites the received search query to include a restrict identifier associated with the entity name, generates a link to the received search query, performs a search based on the rewritten search query to obtain second search results, and provides the second search results and the link to the received search query.
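Read as pseudo-logic, the abstract branches on whether an entity name in the query collides with a common word or phrase. Here is a toy paraphrase of that flow; the entity lists, helper functions, and restrict-identifier format are my own stand-ins, not Google’s implementation.

```python
# Toy paraphrase of the patent abstract's query handling flow.
# The entity lists, helpers, and restrict-identifier format are invented
# stand-ins, not Google's implementation.

KNOWN_ENTITIES = {"apple", "amazon", "foundem"}   # stand-in entity dictionary
COMMON_PHRASES = {"apple", "amazon"}              # entity names that are also common words

def find_entity(query):
    for token in query.lower().split():
        if token in KNOWN_ENTITIES:
            return token
    return None

def search(query):
    return ["results for: " + query]              # placeholder for a real search call

def handle_query(query):
    entity = find_entity(query)
    if entity is None:
        return search(query), None

    rewritten = query.lower().replace(entity, "restrict:entity=" + entity)
    if entity in COMMON_PHRASES:
        # Ambiguous name: return results for the query as typed plus a link to the rewrite.
        return search(query), rewritten
    # Unambiguous name: search the rewritten query and link back to the query as typed.
    return search(rewritten), query

results, alternate_link = handle_query("apple pie recipe")
print(results, "| alternate query link:", alternate_link)
```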

Yahoo waffles (thrashes in confusion?) and the Google discloses an entity function. I noted with interest that Marissa Mayer, along with several colleagues, is named as one of the inventors on the Google entity patent document. Tell me. Which company seems to be on the upswing? Which company is pointed toward the sunset?

Stephen Arnold, August 19, 2009

Twitter Stream Value

August 19, 2009

Short honk: I want to document the Slashdot write up “Measuring Real Time Public Opinion With Twitter.” The key point for me was that University of Vermont academics are investigating nuggets that may be extracted from the fast-flowing Twitter stream of up to 140-character messages. No gold bricks yet, but the potential for high-value information seems to warrant investigation.
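To give a flavor of what nugget extraction can mean, here is a toy sketch that scores a batch of short messages against a tiny word list and averages the result, a crude cousin of the word-score methods academic groups use. The words and scores are invented for illustration.

```python
# Toy opinion score over short messages; the word list and scores are invented.
WORD_SCORES = {"love": 2, "great": 2, "good": 1, "meh": 0, "bad": -1, "hate": -2}

def message_score(message):
    tokens = [t.strip(".,!?") for t in message.lower().split()]
    hits = [WORD_SCORES[t] for t in tokens if t in WORD_SCORES]
    return sum(hits) / len(hits) if hits else None

def stream_opinion(messages):
    scores = [s for s in (message_score(m) for m in messages) if s is not None]
    return sum(scores) / len(scores) if scores else 0.0

tweets = [
    "love the new search results",
    "this outage is bad bad bad",
    "meh, another browser update",
]
print("average opinion: %.2f" % stream_opinion(tweets))   # about 0.33
```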

Stephen Arnold, August 19, 2009

Chrome Glints through Blog Smog

August 19, 2009

Mashable has caught Google watchers with their sunscreen on their noses and their feet in the pool. You can see alleged screenshots of the “new” Chrome by reading and clicking “Google Chrome OS Screenshots.” The addled goose is semi-excited about these screen shots. With Google’s underlying technology described in various patent documents dating from 2006, the glints of understanding now making their way through the blogosphere are heartening, though not as important as the dataspace innovations where the real action is. Understanding Google requires more than describing the developments Google makes available. The screen shots are important for three reasons:

  • Images don’t leak from Google by accident.
  • The fact that Chrome is getting more OS X graphic ornaments underscores the point that the Google wants to poach from both Apple and Microsoft.
  • The build number suggests steady progress, not the on-again, off-again approach some of Google’s initiatives receive.

In short, the clicks are leading the Google forward.

Stephen Arnold, August 19, 2009

The Google Sony Dalliance

August 19, 2009

I wanted to capture some thoughts after a conversation at lunch with one of the Beyond Search goslings. The Slashdot post “Sony to Convert Online Bookstore to Open Format” triggered the question, “Why is Google making its book collection available to a lame duck like Sony?”

In our conversation several hypotheses floated above the bucket of Kentucky Fried Chicken:

  1. Google needs a consumer product pal to replace its former best buddy Apple. Sony is not dead yet, and Google’s magic might transform this old sneaker into a Manolo Blahnik number.
  2. Google could fill a gap in its service offering with Sony’s game juice. Google, someone told me, spoke to Yahoo in 2005 or 2006 about a tie up for online games. Yahoo was too disorganized to do much more than look like the Yahoo we know and love.
  3. Google could add some online love to Sony’s pretty clumsy online efforts.

Is there more to this Google Sony romance than a love of public domain ebooks? Good question. No answer at this time.

Stephen Arnold, August 19, 2009
