New York Times Asserts It Is Indeed Hip Riding the Word Train
January 12, 2009
The New York Times is trapped within a mindset, wrapped in a culture, and under a layer of costs. New York Magazine is doing its part to show how trendy and agile this aging swan really is. You can read “The New Journalism: Goosing the Gray Lady.” The notion of goosing a dowager is an image that makes this addled goose cringe. The folks working on this project will benefit from the experience when they seek another job or chat up a venture capitalist for some dough. For me, the dead tree crowd, including the goosed gray lady, is struggling to find a solution to the journalistic equivalent of Fermat’s last theorem. Trying is good.
Stephen Arnold, January 12, 2009
Crazy Stats: Interesting Yet Hardly Web 2.0
January 12, 2009
I think the clever wordsmiths who snagged the Web 2.0 meme are blowing smoke. Losing money is not a business model. Nevertheless, I enjoyed this list of Web 2.0 statistics. I think the word “statistics” as used by TheFutureBuzz.com means “unverifiable factoids”. The article, “49 Amazing Social Media, Web 2.0, and Internet Stats”, is here. Three of the unsubstantiated factoids that caught my attention were:
- Google’s one trillion URLs. Impossible to verify. Ranks with Amazon’s assertions about the number of objects managed in its AWS service. More PR fluff than factual bedrock.
- The 70 million videos on Google. Nice assertion, no verification.
- 133 million Web logs indexed by Technorati. Yep, but how many have been orphaned? The total number of Web logs remains a mystery.
If you love these types of factoids, TheFutureBuzz.com article is for you.
Stephen Arnold, January 12, 2009
Lousy Economy, Google Gains Share
January 12, 2009
Barron’s reported here that Google gained market share in Web search in the US in December 2008. The source of the data is Hitwise.com. I think these data understate Google’s actual market share, but when the Wall Street Journal’s progeny asserts 72 percent market share, it must be true. The question is, “What will Microsoft and Yahoo do to gain ground?” The answer is, in my opinion, “Not much they can do.” Search is not a priority at either Microsoft or Yahoo. Sure, both outfits say search is job one, but the GOOG is built on search. At Microsoft and Yahoo, search is an add-on, a pair of foam dice hanging from a bigger vehicle’s rear view mirror. Time is running out to catch up. Time to leapfrog.
Stephen Arnold, January 12, 2009
More Social Network Issues
January 12, 2009
Social search, social networks, and social pitfalls–the cheerleaders don’t want the social bandwagon to be delayed, but trouble looms. Google’s Orkut made clear the issues that can arise when a social network becomes the playground of some interesting people in Brazil. Now you can read “(Under)mining Privacy in Social Networks” here by a trio of Googlers. The Google write up identifies some obvious flaws; for example, exposing information unintentionally. But the more significant part of the paper, in my opinion, is the set of references to merging social graphs. The dataspace drum beats are getting louder.
Stephen Arnold, January 12, 2009
Ask.com’s Search Technology Advances
January 12, 2009
Ask.com keeps trying. On January 8, 2009, the company announced “Semantic Search Technology Advances from Ask.com.” You can read the company’s statement here. The company asserts:
In October last year we introduced our proprietary DADS (Direct Answers from Databases), DAFS (Direct Answers from Search), and AnswerFarm technologies, which are breaking new ground in the areas of semantic, web text, and answer farm search technologies. Specifically, the increasing availability of structured data in the form of databases and XML feeds has fueled advances in our proprietary DADS technology. With DADS, we no longer rely on text-matching simple keywords, but rather we parse users’ queries and then we form database queries which return answers from the structured data in real time. Front and center. Our aspiration is to instantly deliver the correct answer no matter how you phrased your query.
The idea is that a user–assuming there is enough traffic to make the site viable in 2009–can enter a query any way he or she wishes. The Ask.com system will figure out the query and provide a Direct Answer. Let’s check out the system.
My first query was, “What’s the daily show?” The system responded with the top result “The Daily Show with Jon Stewart.” Good. My second query was, “What is a dataspace’s application?” The system responded by asking me the question, “What is a data spaces application?” The first result was a link to Sourceforge’s information about EQUIP2. Sorry, the correct answer, in my mind, was a link to the ACM papers about dataspaces. My third query was, “What is an information manifold?” This is no trick question because there is a technical paper with a title that contains the bound phrase “information manifold.” The Ask.com system asked me, “What is an information mannford?” I don’t know what a “mannford” is.
For the types of questions a middle school student might ask, the new system will work pretty well. For popular culture topics, the system will probably be better than some I have examined this week. For the types of queries I have about technologies that address the known weaknesses of traditional semantic processing, Ask.com won’t help me too much. That’s good. Knowing what questions to ask allows me to feed my goslings. Ask.com won’t put me out of a job this year. One final point: I clicked on “mannford”. It’s a city in Oklahoma. No dataspaces among that state’s wide open spaces. Look west, young search, look to Mountain View, California.
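For readers who want to see what “parse the query, then form a database query” might look like, here is a minimal sketch. It is not Ask.com’s DADS implementation, which is proprietary. The table, the question templates, and the function names (build_demo_db, direct_answer) are all hypothetical, and the approach is plain pattern matching against an in-memory SQLite table.

```python
import re
import sqlite3


def build_demo_db():
    """Create a tiny in-memory table standing in for a structured data feed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE shows (title TEXT, host TEXT)")
    conn.executemany(
        "INSERT INTO shows VALUES (?, ?)",
        [
            ("The Daily Show", "Jon Stewart"),
            ("The Colbert Report", "Stephen Colbert"),
        ],
    )
    return conn


# Hypothetical question templates. Each phrasing maps to the same
# parameterized SQL, so the wording can vary while the lookup does not.
HOST_SQL = "SELECT host FROM shows WHERE lower(title) = lower(?)"
QUESTION_PATTERNS = [
    (re.compile(r"^who hosts (?P<title>.+?)\??$", re.IGNORECASE), HOST_SQL),
    (re.compile(r"^who is the host of (?P<title>.+?)\??$", re.IGNORECASE), HOST_SQL),
]


def direct_answer(conn, question):
    """Return an answer from the structured data, or None if no template fits."""
    text = question.strip()
    for pattern, sql in QUESTION_PATTERNS:
        match = pattern.match(text)
        if match:
            # The captured title is passed as a bound parameter, never
            # spliced into the SQL string.
            row = conn.execute(sql, (match.group("title").strip(),)).fetchone()
            return row[0] if row else None
    return None


if __name__ == "__main__":
    conn = build_demo_db()
    print(direct_answer(conn, "Who hosts the daily show?"))
    print(direct_answer(conn, "What is an information manifold?"))
```

The sketch returns None when no template fits instead of guessing at a nearby spelling. A production system would need far more than two regular expressions, but the shape is the same: recognize the question, bind the extracted terms, and query the structured data.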
Stephen Arnold, January 12, 2009
Xsearch CEO Norbert Weitkämper Interviewed
January 12, 2009
Weitkämper Technology–based in Staffelsee, Germany–is a search and content processing vendor with a low profile in North America. The firm offers a multi-source search suite that incorporates proprietary technology to deliver fast content and query processing. The company’s XSEARCH package is customizable to focus on the client’s specific need. It offers nine components: Clustering Engine, Suggest, DidYouMean, Summarizer, Linguistic Engine, Federated Search, Facet Navigator, Entity Extractor and Intelligent Classifier.
Norbert Weitkämper, an industrial engineer, was dissatisfied with the search results available from commercial products. He developed Xsearch after working in electronic publishing. He told Search Wizards Speak:
As we are specialized on search for more than a decade our package is very well tuned; not only for speed but also for content for example. We will combine our new HitEngine with our established technologies like Linguistic, Did-You-Mean, clustering, synonyms and ontologies, or our personal ranking mechanisms. They are already released, we just have to melt them together.
He added:
For the complex roman languages our linguistic engine with its morphologic analysis is a big advantage, because algorithmic approaches like Bayesian or Porter, which are doing a good job for English, are a miserable failure.
On the subject of semantic analysis, Mr. Weitkämper said:
Semantic analysis is much more difficult for European languages than for English. We are already able to integrate thesauri or ontologies. I have not seen any system yet which meets the requirements for semantic analysis – at least when you have a closer look into the system. But storing information in a quick and accessible way is even more important for this approach, as you have to consider much more than only keywords and positions. So I can imagine that our optimized index structure may help also in this field to achieve adequate results in an acceptable amount of time.
More information about the company is available at its Web site, http://www.weitkamper.com. The full text of the interview with Mr. Weitkämper is at http://www.arnoldit.com/search-wizards-speak/xsearch.html.
Stephen Arnold, January 12, 2009
British Library Debunks Myth of a Google Generation
January 11, 2009
Libraries are fighting for money and a role in the digital world. The plight of white shoe publishers is well known. Newspapers, once the lifeblood of information, are now stuffed with soft news or, what’s worse, old information. The shift from desktop boat anchor computers to sleek handheld devices is moving forward. Flagship PC vendors like Dell Computers are in a fight for Wall Street respectability. The television and motion picture pashas believe that the fate of the traditional music publishing business is not theirs.
On January 16, 2008 (the date and the information come from this source), the British Library press room issued or issues or will issue “Pioneering Research Shows Google Generation Is a Myth.” The news release summarizes the study Information Behaviour of the Researcher of the Future. Here’s the link I located, but it did not work without some clicking around. The report strikes me as something developed in an alternate universe where the Googleplex and its information system are small potatoes indeed.
He does not exist, but this member of the Google generation made it to the cover of the British Library’s myth-debunking study. In the future, this lad will be retrieving information from a mobile device, no PC or library required, thinks this addled goose.
The study was, according to the press release,
Commissioned by the British Library and JISC (Joint Information Systems Committee), the study calls for libraries to respond urgently to the changing needs of researchers and other users. Going virtual is critical and learning what researchers want and need crucial if libraries are not to become obsolete, it warns. “Libraries in general are not keeping up with the demands of students and researchers for services that are integrated and consistent with their wider Internet experience”, says Dr Ian Rowlands, the lead author of the report.
Now this paragraph seems to suggest that “something” has happened and that libraries must “respond urgently to the changing needs of researchers and other users.” My hunch is that libraries are not surfing on the Google but paddling along trying to keep Googzilla’s spiky back in view.
Most of these curves head south, right? © British Library 2009 and presumably in the universe which I inhabit.
The news release also suggests libraries must turn to “Page 2.0”, which I presume is another silly reference to the made up world of Search 2.0, Enterprise 2.0, and Web 2.0. The news release from the future ends with the mysterious phrase “The panel:”.
Keep in mind that I am writing this notice on January 11, 2009, at 9:30 am Eastern time. The news release is from the future. It has a date of January 16, 2009. One would think that the British Library, operating outside the normal space-time continuum, could do more than tell me that the myth of the Google generation does not exist. Clever headline aside, libraries must define a role for themselves before funding dwindles even more. University libraries might be grandfathered into the institutional budget. Other types? Might be a tough sell.
In my opinion, what does not exist among some in the library profession is a firm grip on the here and now. I am 65, and I think the Google generation exists. I wish it were not so, but it exists and the world, one hopes, will be better for the generation’s presence. Libraries seem to exist in a medieval world. Even Shakespeare is in step with the shift from paper to digital information. Consider Hamlet’s statement from one of the versions of the play crafted from Shakespeare’s foul papers:
Let us go in together,
And still your fingers on your lips, I pray.
The time is out of joint—O cursèd spite,
That ever I was born to set it right!
Nay, come, let’s go together.
No myth this, sprites.
Stephen Arnold, January 11, 2009
Microsoft’s Data Robustness
January 11, 2009
The “we may go out of business” Seattlepi.com Web site ran a story with the cruel title “Microsoft’s Servers Overloaded by Interest in Windows 7.” You can read this sort of weird headline and its accompanying story here. The story makes clear that Microsoft’s investments in its data centers were not up to the load imposed by the faithful downloading Windows 7.
The misstep was described as a “borkfest” by Lifehacker here. This goose isn’t sure what a borkfest is, but he can make a guess. Gina Trapani’s article nails the problem. She wrote:
If lack of infrastructure to handle an insane traffic spike over a few hours was truly the problem (even though these were conditions Microsoft created), there are lots of alternatives they could’ve used that would have kept their servers up. In fact, users have been happily downloading and distributing the Windows 7 beta build 7000 now for weeks using an efficient file-sharing protocol called BitTorrent.
When the GOOG streamed its live concert test last year, the Googlers tapped Akamai. Did Microsoft use its own content delivery network? Did Microsoft contract out the job? Whoever handled the job may want to check out another line of work in my opinion. Seattlepi.com quotes a Microsoft Web log. I noted this sentence: “We are adding some additional infrastructure support to the Microsoft.com properties before we post the public beta.” Good idea.
Stephen Arnold, January 11, 2009
Yahoo: Slipping and Dipping
January 11, 2009
I have deep skepticism about third party data. Nevertheless, when reports about Web site traffic and online advertising share appear, the data get snapped up the way Tess goes for a dropped chicken wing. Silicon Alley Insider’s “Yahoo’s Share of All Search Advertisers Drops 36% in Q4 (YHOO)” is worth reading. You can find the story and the scary red line here. Let’s assume the data are accurate. Bad news for Yahoo. Let’s assume the data are off a tad, say, down 18 percent in Q4. Slightly less bad news. If the Yahooligans continue to slip, the GOOG benefits. Yahoo started as a directory, became a portal, and then floundered. In the Arctic waters off Nordkapp, even a strong swimmer who goes overboard succumbs. A weak swimmer, well, not much chance. Yahoo is now in the Arctic waters.
Stephen Arnold, January 11, 2009
Business Week: All Over the GOOG
January 10, 2009
Business Week may want to rename one of its editorial sections “Google Week.” The editors at Business Week crank out articles about Google. Most are interesting, but some of the Google coverage is–well, let me be gentle–obvious. Here’s an example, “Small Businesses Love Google, Even When Things Go Wrong.” Now we know that search is not very good. I know that folks with multiple PhDs and big IQs will beg to differ, but I point to the research that I have done, that Jane McConnell in Paris has done, and that Martin White in London has done. Our data reveal that about two thirds of the users of a search system are dissatisfied. Now Business Week has embraced a Nielsen-WebVisible survey that says 92 percent of Internet users are satisfied with Web search. But–and this is an important “but”–“39 percent of them frequently can’t find companies they’re looking for.” Search doesn’t work too well. Imagine that. You must read the Business Week article here, which includes a link to the news release from the big time research outfit here. In my opinion, the reason people love Google has to do with the imprint Google has stamped on two thirds of the people who look for information on Google. Google is search. Search is Google. If a free service works in a manner one can describe as “good enough”, that’s okay. The key is the brand power and magnetism Google possesses. Perception is a big part of a search system’s success. Google’s been working on perception for a decade, and the GOOG has done a bang up job. Now if we can shift people from their grip on the view of Google as an ad company, I would be a happier goose.
Stephen Arnold, January 10, 2009