Bing and Fail Over
July 4, 2009
Short honk: I had high hopes for Bing.com and its next generation, high availability data centers. The addled goose is inspecting goose ponds 4,000 miles from Harrods Creek and was not able to access Bing.com’s travel vertical. The goose thought he was at fault. I then read “Seattle Data Center Fire Knocks out Bing Travel, Other Web Sites” and learned that others were at fault. Whew. New acronym needed: MGOL, or Microsoft Goes Off Line.
Stephen Arnold, July 4, 2009
Hadoop Caught in Loops
July 4, 2009
Dana Blankenhorn’s “Who Will Control Hadoop?” here raised an important question. The focus was close, but I considered his question in a broader context. Mr. Blankenhorn asked:
Do too many Hadoops spoil the code?
In a narrow sense, my view is let many flowers bloom. When the world was less fluid, flaky, and financially challenged, many efforts seemed like a good idea. Now, I am not so sure. Mr. Blankenhorn said:
But some reporters are beginning to ask who is really in charge of Hadoop. Is it Apache or Yahoo? Was Yahoo’s distribution a diss of Facebook, which previously developed its own Hadoop SQL, called Hive? Most projects have a community and a commercial arm. Hadoop’s importance has drawn a number of corporate sponsors to separately deliver their implementations. Microsoft, Yahoo, Google, and Facebook all have their own takes on Hadoop, alongside Apache and Cloudera. All these various Hadoops can be seen as a positive or a negative. As a positive, there is growth and momentum for the framework. As a negative, there are many organizations pulling Hadoop in different directions.
In a broad context, the value of open source software is that many hands work to create something that is not proprietary, not unstable, and not subject to the whims of a corporate titan. That is a foundation stone. On the other hand, fragmentation of an important technology makes some folks wary of open source.
The way online works is to reward one company with a virtual monopoly. This is a natural consequence of costs and user behavior. The problem is that when one outfit is in control, that organization follows the well worn path of profit and benefit maximization. That can’t be helped either.
In short, I think the same type of financial meltdown that has trashed some individuals’ plans for the future is likely to take place again. Tricky stuff, indeed.
Stephen Arnold, July 4, 2009
Independence Day and the Internet as a Basic Human Right
July 4, 2009
I read a couple of weeks ago a short item in the Hindu here. The article’s title gave me pause: “Internet Access Is a Fundamental Human Right”. The story asserted:
The Internet has become such a part of today’s life, that it is now considered a necessity rather than a luxury. And, now a French court has ruled that access to the world wide web is a fundamental human right. “Under the Declaration of 1789 (founding principles of the Republic set down after the French Revolution), every man is presumed innocent until proven guilty. “The Internet is a fundamental human right that cannot be taken away by anything other than a court of law, only when guilt has been established there,” the Constitutional Council in France has ruled.
Several comments:
- How will this right be fulfilled in such places as Soweto and similar infrastructure starved regions?
- Has the Internet become synonymous with communication, leaving its technology roots behind as unnecessary baggage?
- Will the leaders of countries eager to have hardware limit access to knowledge separate the “right of access” from the accuracy, completeness, and quality of the information available?
The addled goose struggles with politics and law. Hopefully the French will provide some exemplary implementations. I can think of a couple of areas in Marseille where a demo would be useful.
Stephen Arnold, July 4, 2009
Concept Searching Update
July 3, 2009
Founded in 2002, Concept Searching provides licensees with search, auto-classification, taxonomy management and metadata tagging solutions. You can download a fact sheet about the privately held firm here. The software can be used on an individual user’s computer or mounted on servers to deliver enterprise solutions. The company’s secret sauce is its statistical metadata generation and classification method. The technology uses concept extraction and compound term processing to facilitate access to unstructured information. The company operates from Stevenage in Hertfordshire. A list of the Concept Searching offices is here.
The company emphasizes the value of lateral thinking, and its approach to content analysis implements numerical recipes to find these insights and linkages within unstructured text.
When I updated my profile for this company earlier this year, I noted that the firm had signed Portal Solutions, a company that focuses on things Microsoft. The idea is to make it possible for a user to search for “insider dealing” and retrieve documents where that bound phrase does not appear but a related phrase such as “insider trading” does appear. This type of system appeals to intelligence officers and financial analysts. Concept Searching’s methods generated lists of related topics. You can see an example of the system in action by navigating to this page. I ran several test queries and the interface provided useful information and suggestions about other related content in the processed corpus. A screen shot of the output appears below:
Concept Searching is a Microsoft and Fast Search partner. The idea is that Concept Searching’s technology complements and in some cases extends the search and content processing services in Microsoft products. In May 2009, the company sponsored a best practices site for Microsoft SharePoint. The deal involves a number of companies, including SchemaLogic, KnowledgeLake, and K2 Technologies among others. The site is supposed to go live in the next couple of weeks, but I don’t have a URL or a date at this time.
The company had a busy May, signing deals with Allianz Global Investors, Directory, and AT&T Government Solutions.
For me, the most interesting system that Concept Searching offers is its ability to generate and classify terms found in SharePoint documents into a taxonomy. The company has prepared a brief video that demonstrates this functionality. You can find the video here. The company’s approach does not require a separate index. Microsoft Enterprise Search can use the outputs of the Concept Searching system. I noted two “uniques” in the narrative to the video, and I remain skeptical about categorical affirmatives. I think the bound phrase extraction and the close integration with SharePoint are benefits. I just bristle when I hear “unique”, which means the one and only anywhere in the world. Broad assertion in my experience.
Concept Searching’s president, Martin Garland, said here:
Our intellectual property is still unique as we are the only statistical search technology able to indentify multi-word patterns within text and insert these patterns directly into the index at ingestion or creation time. We call this “Compound Term Processing”.
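To make the idea of spotting multi-word patterns and putting them into the index at ingestion time concrete, here is a minimal Python sketch. It is not Concept Searching’s method, which the company has not published in this write up. The sketch assumes a simple statistical test (pointwise mutual information over adjacent word pairs), and the names `find_compound_terms`, `build_index`, and `DOCS`, the thresholds, and the three sample documents are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_compound_terms(docs, min_count=2, min_pmi=1.0):
    """Return adjacent word pairs that co-occur more often than chance.

    Assumption: pointwise mutual information (PMI) stands in for whatever
    statistical test a commercial system actually applies.
    """
    unigrams = Counter()
    bigrams = Counter()
    for text in docs.values():
        tokens = tokenize(text)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total = sum(unigrams.values())
    compounds = set()
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue
        pmi = math.log2((count * total) / (unigrams[first] * unigrams[second]))
        if pmi >= min_pmi:
            compounds.add((first, second))
    return compounds


def build_index(docs, compounds):
    """Build an inverted index holding single words and compound terms."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        for token in tokens:
            index[token].add(doc_id)
        # Compound terms are written to the index at ingestion time,
        # alongside the single-word entries.
        for pair in zip(tokens, tokens[1:]):
            if pair in compounds:
                index[" ".join(pair)].add(doc_id)
    return index


# Hypothetical three-document corpus, for illustration only.
DOCS = {
    1: "The regulator opened an insider trading inquiry.",
    2: "Insider trading rules apply to every officer of the firm.",
    3: "A new trading officer joined the company.",
}

if __name__ == "__main__":
    compounds = find_compound_terms(DOCS)
    index = build_index(DOCS, compounds)
    print(sorted(compounds))
    print(sorted(index["insider trading"]))
```

In this toy corpus the phrase “insider trading” gets its own index entry, so a query on the phrase can be answered without also pulling in the document that merely mentions a trading officer. Linking “insider dealing” to “insider trading” would take a further step, such as a table of related terms, which this sketch leaves out.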
Last week I sat in a briefing given by a member of Microsoft’s enterprise search team. I thought I heard descriptions of functions that struck me as quite similar to those performed by Concept Searching and such companies as Interse in Copenhagen, Denmark.
I think it will be fruitful to watch what features and functions are baked into the upcoming Microsoft Fast ESP version of the old Fast Search & Transfer system. Remember: the roots of Fast Search stretch deep to 1997, a year before Google poked its nose from the Stanford baby crib.
Partners like Concept Searching have invested significant resources in Microsoft technologies. Will Microsoft respect these investments, or will Microsoft, in an effort to recoup its $1.23 billion investment, take a hard line toward such companies as Concept Searching?
I am on the fence regarding this issue.
Stephen Arnold, July 3, 2009
OECD Data Diving
July 3, 2009
Short honk: Want to explore OECD country data? First, read the BBC story “Exploring the OECD Web Site” then navigate to OECD Explorer. Ideal for those who want short cuts to data analysis.
Stephen Arnold, July 3, 2009
YAGG: Google App Engine Takes a Long Lunch
July 2, 2009
Short honk: Fresh from its criticism of Microsoft’s approach to data centers, Google makes clear its engineering approach to reliability. TechCrunch reported “Google App Engine Broken For 4 Hours And Counting.” That early Google patent document about quality of service may not be in the hands of the App Engine team, I surmise. YAGG is the addled goose’s acronym for “yet another Google goof.” Will Google issue another critique of the Microsoft approach today to obfuscate what seems to be a Googley way to bring some fireworks to App Engine users’ pre holiday festivities?
Stephen Arnold, July 2, 2009
Search and Maxing Out the Grid
July 2, 2009
I recall meetings at Halliburton’s old Nuclear Utilities Services unit where we talked about the problem of sucking too much power from the grid. The grid, of course, is a metaphor for a complicated set up of devices and cables that move power from where it is produced to an end point or end points. I found the reference in the Slashdot article “NSA To Build 20-Acre Data Center In Utah” a blast from the past. My thought is that the problem of power sucking data centers is not on most folks’ radar. Along with the deteriorating roads and bridges in rural Kentucky, the power generation industry faces a similar problem. Search requires big data centers. Green yapping aside, as the volume of data increases, the need for megawatts goes up. Green is good, but not even zippy chips and clever ideas like uninterruptible power supplied by conventional flashlight batteries are enough. Exciting times and costs ahead for those with big data centers. Plumbing may make the difference between the winners and the losers in search and content processing.
Stephen Arnold, July 2, 2009
Microsoft Vision of a Fast Follower
July 1, 2009
I wondered how a once-nimble company like Microsoft would find a way to make lemonade from the lemons in its financial and product orchard. I learned a bit about how Microsoft will spin its corporate story as it tries to move its share price from value stock land to the country club that Wall Street desperately needs.
You can get some insight by reading Don Dodge’s “Why Do Fast Followers Often Beat the First Mover Innovators?” on Microsoft’s Start Up Zone blog for entrepreneurs. I don’t know information worker productivity person Don Dodge, but he provides some info about his background. He worked at Digital Equipment Corp. and five start ups over the “next 12 years”. Experience is definitely good as long as the same errors are not repeated each year.
The part of the write up that interested me was the use of Alta Vista as the first mover and Google as the fast follower. Not only do I think that this example (which is only one of 12 in the write up) is off base, I think it underscores why search is so confusing and ultimately dissatisfying for most users.
First, Alta Vista was not a first mover. The notion of Web search stretches back to approaches either forgotten (Aliweb, JumpStation) or repositioned before Google rolled out (Northern Light) or drifted into oblivion (Inktomi, Excite). If we set aside Web search, anyone remember STAIRS and InQuire with its famous forward truncation? Veronica or Very Easy Rodent Oriented Netwide Index to Computer Archives or something along those lines. Alta Vista was important but it was a demo, and its real contribution was to provide Google with a gaggle of wizards who arrived with years of research into the special problems of high volume indexing via the Internet. I am not sure AltaVista.com is a precursor to Google. In my research, I think of Google hiring AltaVista.com wizards and just doing what AltaVista.com had started but without the craziness of the DEC sale to Compaq, the HP buy out of Compaq, and the orphaning of a demo for the Alpha chip. AltaVista.com was more like a roll up of other search engines’ ideas and Google just pumped in more dough and included the PageRank gizmo.
Second, implicit in “fast follower” is the idea of standing on the shoulders of giants or what I call “me too” invention. By making incremental improvements, a company can find a way to use modest technology enhancements and Grand Canyon marketing to make sales. The notion that marketing is more important than technology applies to many “fast follower” products. In short, marketers can’t invent much, but they can sell candy to moms who want their kids to stop whining. Does this sound like the addled goose is disenchanted with 21st century marketing? It should. Marketing for the purpose of inventing a need and pumping up a fuzzy idea like value has made some remarkable contributions to US economic life. Bernie Madoff is an example of influence marketing in a “fast follower” setting. Come to think of it, the police action underway in Norway concerning Fast Search & Transfer falls into a similar category. “Me too” leads to some interesting business actions it seems.
Third, the “fast follower” notion is not innovation in the scheme of Harrods Creek and the goose pond. Incremental improvements are essential, but anyone who looks at how American icon Tom Edison worked comes away with a sense that judgment, business ethics, and integrity can become road kill on this path. Are not the ideas of Friedrich Hayek and his pals exciting?
For me the most interesting information in the write up was a series of five dot points. I can’t reproduce each of these but I can comment on one dot point and urge you to read and reflect on the other four. Dot point three underscores the fallacy of the “fast follower” approach to business success. The bullet said:
Value sales and marketing talent as much as technical talent.
That is the mantra of modern business and innovation today. The used car salesman and the technical expert. Which is able to deliver more value? Which benefits society more? Technology can only be managed with more technology. Sales begets a need for more sales.
In short, the “fast follower” method adds another burden of meaning to the fictional Willy Loman. “Low man” – get it. How low. Read Gawker’s take.
Stephen Arnold, June 30, 2009
Google Offers a Digital Olive Branch July 1
July 1, 2009
In my Google: The Digital Gutenberg, I describe an invention disclosed in a Google patent document for a “partner” to use Google like an integrated motion picture studio. The invention, in effect, allows a partner to create content, post it, control access to the content, run an ad campaign using Google tools, and essentially operate like those fun loving moguls Sammy (I am a lamb) Goldwyn and Louis (I am a cupcake) Mayer. Google, according to Reuters, is promoting this “run your own business” service to newspapers. You can read Joseph Tartakoff’s “Google Wants Newspaper to Post Their Videos to YouTube” to get the Thomson Reuters’ slant on this story. For me, the most intriguing comment was:
That [the new offer from Google] contrasts with Google News, where publishers do not get a cut of any of the revenue from the ads that are placed around their headlines. Still, it’s unlikely that many publishers will want to abandon other video platforms, like Brightcove, which also allow them to sell their own ads against their video content—and to link up with several ad networks. Google had already begun to slowly integrate YouTube news videos with Google News last month, when it added videos for the first time to Google News, and the new push should further that. For Google, it’s also a free way to add more professional content to YouTube, and thus attract more premium advertisers.
Will newspapers grab the digital olive branch? Good question. I think that some publishers may do the math and conclude that Google has tipped the odds in favor of the house. I think that’s a wrong way to look at the Google offer, but that’s why I am a fat, addled goose, paddling in the pond with mine drainage run off. I don’t sit in an office tower with air conditioning cooling my pin feathers.
Stephen Arnold, July 1, 2009
Google and Data Object Visualization
June 30, 2009
The USPTO published US7555471 B2 on June 30, 2009. The Beyond Search goslings think this is a reasonably important Google disclosure. The inventors include one super Googler and a clutch of other Google rock star engineers. Andrew Hogue is a Googler to watch. If you find his official Google page opaque, try this link. He and his band of engineers have received a patent for “Data Object Visualization.” Don’t get too excited about the graphics. The system and method applies to a core Google system for cleaning up discrepancies in fact tables. If you are a fan of Dilbert, this is the invention that gives one of Google’s smartest agents the official descriptor “janitor”. How smart is the janitor? Smart enough to make dataspaces closer to reality. The USPTO system is sluggish today, so you can get info from FreePatentsOnline.com or one of the other services that provide access to these public documents. I love that janitor lingo too. Googley humor for big time inventions makes clear that the 11 year old Google still possesses math club whimsy. Those examples for atomic mass and volcano are equally illuminating.
Stephen Arnold, June 30, 2009


