Microsoft: Open Source and Mixed Signals
May 8, 2009
Short honk: Mary Jo Foley’s “Microsoft to turn .Net Micro Framework Code, Support over to the Community” here contains a rare glimpse of an intelligent observer trying to make sense of corporate machinations. The snit concerns a chunk of the super wonderful Dot Net framework. After some layoffs and a statement that the code would be converted to open source, Ms. Foley pointed out that Redmond seemed to be withdrawing from the embedded sector. Microsoft replied that Ms. Foley was incorrect. Read her story and the updates and decide for yourself. It looks like some backtracking to me. No impact on search as far as I can see, but do this cutback and restatement presage similar behavior in the enterprise search space as well?
Stephen Arnold, May 8, 2009
IBM, 50 Years of the Mainframe and No STAIRS Reference
May 8, 2009
I was disappointed. I clicked through the slides here to review 50 years of mainframe innovations from IBM. I realized that most of the mainframe innovations are available on smaller, cheaper, faster, and less complex systems today. I was dismayed that there was no reference to the mainframe search system STAIRS and STAIRS III (now Search Manager here). Sad. That was arguably the first enterprise search system. STAIRS sort of lives on in the BRS system available from Open Text.
Stephen Arnold, May 8, 2009
Google Founder’s Letter
May 8, 2009
This document has been getting quite a bit of play today (May 7). You can read the original here. I wanted to capture what I thought was the most important segment of the document. I quote:
Given the tremendous pace of technology, it is impossible to predict far into the future. However, I think the past decade tells us some things to expect in the next. Computers will be 100 times faster still and storage will be 100 times cheaper. Many of the problems that we call artificial intelligence today will become accepted as standard computational capabilities, including image processing, speech recognition, and natural language processing. New and amazing computational capabilities will be born that we cannot even imagine today.
While about half the people in the world are online today via computers and mobile phones, the Internet will reach billions more in the coming decade. I expect that by using simple yet powerful models of computing such as web services, everyone will be more productive. These tools enable individuals, small groups, and small businesses to accomplish tasks that only large corporations could achieve before, whether it is making and releasing a movie, marketing a product, or reporting on a war.
When I was a child, researching anything involved a long trip to the local library and a good deal of luck that one of the books there would be about the subject of interest. I could not have imagined that today anyone would be able to research any topic in seconds. The dark clouds currently looming over the world economy are a hardship for us all, but by the time today’s children grow up, this recession will be a footnote in history. Yet the technologies that we create between now and then will define their way of life. [Emphasis added]
I interpret this to mean that the Google is going to step up its activities in certain core markets. Disruption ahead. Fasten your seatbelts.
Stephen Arnold, May 8, 2009
Twitter Data Mining
May 8, 2009
A semi-happy quack to the reader who sent me a link to what appears to be a Microsoft Twitter data mining application. You can look at the service here.
Runtime Error
Description: An application error occurred on the server. The current custom error settings for this application prevent the details of the application error from being viewed remotely (for security reasons). It could, however, be viewed by browsers running on the local server machine.
Details: To enable the details of this specific error message to be viewable on remote machines, please create a <customErrors> tag within a "web.config" configuration file located in the root directory of the current web application. This <customErrors> tag should then have its "mode" attribute set to "Off".
You may need to do some fiddling. My Windows 7 experience returned this message:
<!-- Web.Config Configuration File -->
<configuration>
    <system.web>
        <customErrors mode="Off"/>
    </system.web>
</configuration>
I tried again, and I was able to see the page shown in the screenshot below:
I learned that “The app is available for installation from the Flotzam site at http://flotzam.com/archivist.” No joy. I was going to turn to Twitter for help, but I thought I would pass along my experience first. You may get this to work.
Stephen Arnold, May 8, 2009
Google Hadoop de Yahoo
May 7, 2009
The Register has another interesting write-up about Google. This story, “Hadoop – Why Is Google Juicing Yahoo Search?” here, struck me as having two excellent points. The first is that Google’s arrogance is reinforced by the industry’s uptake of its approach to data management. Second, Yahoo grabbed on to the Google solution and so far has not narrowed the gap between itself and its rival Google. For me the most interesting comment in the article was:
…the old Google arrogance is also at play. In sharing its distributed-computing genius with the rest of the world, Bisciglia says, Google “showed the world that they were right.”
Yep, seems as if Google has been more right than its Web search, online advertising, and rich media search rivals so far.
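The “distributed-computing genius” at issue is the MapReduce programming model (with the Google File System underneath), which Hadoop reimplements in open source. As a hedged, minimal sketch of the programming model only (not Hadoop’s actual Java API or cluster machinery), here is a toy single-process word count in Python:

```python
# Toy, single-process illustration of the MapReduce model that Hadoop
# implements at cluster scale. Names and data are invented for illustration.
from collections import defaultdict

def map_phase(text):
    # Map: emit a (key, value) pair for each word occurrence.
    for word in text.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # Reduce: combine all values emitted for one key.
    return word, sum(counts)

def run_job(documents):
    # Shuffle: group intermediate values by key, then reduce each group.
    grouped = defaultdict(list)
    for text in documents:
        for word, count in map_phase(text):
            grouped[word].append(count)
    return dict(reduce_phase(w, c) for w, c in grouped.items())

print(run_job(["grid computing", "grid data management"]))
# {'grid': 2, 'computing': 1, 'data': 1, 'management': 1}
```

The point of the model is that the map and reduce functions contain no distribution logic at all; the framework handles partitioning, scheduling, and failures, which is why Yahoo could adopt the approach wholesale.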
Stephen Arnold, May 7, 2009
New Media, Old Media Spoofed
May 7, 2009
The story “Student’s Wikipedia Hoax Quote Used Worldwide in Newspaper Obituaries” here underscored for me the precarious nature of “information” in today’s world. The Irish Times reported that a fake quote in Wikipedia popped up in newspapers around the world. New media and old media both fell into the comfortable assumption that if it is online, information is correct, accurate, true, and well-formed.
At a conference yesterday, I spoke with a group of information professionals. The subject was the complexity of information. One of the people in the group said, “Most of the technical information goes right over my head. At work, people turn to me for answers.”
I don’t want to dip into epistemological waters. I can observe that the rising amount of digital information (right or wrong is irrelevant) poses some major challenges to individuals and organizations. The push for cost reduction fosters an environment suitable for short cuts.
Last Sunday, one of my publishers (Harry Collier, managing director, Infonortics Ltd.) and I were talking about the change in how large news organizations operated. He had worked for book and newspaper publishers earlier in his career, as I had. He noted that the days of investigative reporting seem to have passed.
I had to agree. The advent of online has made research something that takes place within the cheerful confines of the Web browser. Interviews I once conducted face to face now take place via email. Even the telephone has fallen from favor because it is difficult to catch a person when he or she is not busy.
A number of companies involved in content processing are experimenting with systems that can provide some clues to the “provenance” or “sentiment” of information. The idea is that tireless software can provide some guideposts one can use to determine if a statement is right or wrong, hot or cold, or some similar soft characteristic.
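To make the idea concrete, here is a hedged, minimal sketch of one such guidepost: a toy lexicon-based sentiment scorer in Python. The word lists and normalization are invented for illustration; commercial content-processing systems use far richer statistical models.

```python
# Toy lexicon-based sentiment scorer. The lexicon and the scoring scheme
# are invented for illustration; real systems use much richer models.
POSITIVE = {"accurate", "reliable", "right", "true"}
NEGATIVE = {"fake", "wrong", "hoax", "misleading"}

def sentiment_score(text: str) -> float:
    # Positive score leans "hot"; negative score leans "cold".
    words = text.lower().split()
    hits = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return hits / max(len(words), 1)  # normalize by statement length

print(sentiment_score("the quote was a fake hoax"))  # negative: -0.33
print(sentiment_score("the report was accurate"))    # positive: 0.25
```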
The quote story from the Irish Times highlights the pervasiveness of online short cuts. In this case, the quote is unlikely to do much lasting harm. Can the same be said of other information short cuts that are taken each day? Will one of these short cuts trigger a chain of unexpected events? Will the work processes that encourage short cuts create ever more chaotic information systems that act like a brake on performance? Who is spoofing whom? Maybe ourselves?
Stephen Arnold, May 7, 2009
Twitter Pumps Search
May 7, 2009
Newsfactor here and other Web news services posted stories about Twitter getting a dose of search steroids. You will want to read “Not-for-Sale Twitter Is Expanding Search Functionality” by Patricia Resende to get the details. Ms. Resende wrote:
Twitter Search will be used to crawl information from links by Twitters to analyze and then index the content for future use, Jayaram, a former vice president for search quality at Google, told Webware. Currently Twitter Search is only used to search words included in tweets, but not words in links. Along with its new crawling functionality, Twitter Search will also get a ranking system. When users do a search on trending topics — the top-10 topics people tweet about, which get their own link on the Twitter sidebar — Twitter will analyze the reputation of the tweet writer and rank search results partially based on that.
I think this scoring will be an important step. Here’s why (a toy scoring sketch follows the list):
- Clickstream metrics by individuals about topics, links, and words provide important clues to smart software
- Individuals with large numbers of followers provide “stakes in the sand” for making some type of subjective, value-centric calculation; for example, a person with more followers can be interpreted as an “authority”
- Individuals who post large numbers of items, attract followers, and cover many topics add additional scoring dimensions for calculating “reputation” and other squishy notions.
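A hedged, minimal sketch of how such signals might be combined. The weights, the log damping, and the function name are my assumptions for illustration, not Twitter’s disclosed method:

```python
import math

# Hypothetical reputation score built from the three signals listed above.
# Weights and formula are invented assumptions, not Twitter's method.
def reputation(followers: int, posts: int, topics: int) -> float:
    authority = math.log1p(followers)  # followers as a crude "authority" proxy
    activity = math.log1p(posts)       # prolific posting adds signal
    breadth = math.log1p(topics)       # topic coverage adds a dimension
    return 0.6 * authority + 0.3 * activity + 0.1 * breadth

# A tweet from a higher-scoring author would be boosted in search results.
print(reputation(followers=50_000, posts=3_000, topics=12))
```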
A number of commercial content processing companies are in the “reputation” and subjective scoring game, but Twitter is a free (for now) real-time service with a large volume of posts. The combination makes Twitter a potential dark horse in the reputation analysis game. Believe me. That game has some high stakes. Nonsense about waiting in line at a restaurant becomes high-value data when one can identify high-score folks standing in line multiple times per week. You don’t have to be a rocket scientist to figure out that the restaurant is doing something right. The score may not be a Zagat-type report, but it works pretty well for making certain types of marketing scans useful.
Twitter on steroids plus real-time search. More than teen craziness, I assert.
Stephen Arnold, May 8, 2009
Google and Publishing: Some Metrics
May 7, 2009
The Guardian reported some metrics about Google and publishing. You will find the summary of a Googler’s speech at a publishing conference here. The article is “Google: We Give Publishers £3.3bn”. Highlights from the news story include:
- A quote attributed to Googler Matt Brittin, “We want to help publishers make money online”
- Google sends a billion clicks to publishers each month
- Google wants to “work with publishers to improve their digital revenues and help close the gap between print and online advertising returns”.
For me, the most interesting comment in the article was this passage:
He [Matt Brittin, Googler] said publishers should look to use technology to help their digital publications move at a greater pace and keep up with consumer demand, but that while it could help, Google could not offer all the necessary solutions.
The challenge that I see is that publishers think about technology in terms of putting color pictures in newspapers and slashing costs. Technology, as the term is used by Googlers, may denote a more sophisticated approach.
I don’t think the audience was able to see a path through the swamp. I wonder if any of those Google billions were jingling in the pockets of the attendees?
Stephen Arnold, May 7, 2009
Google Disclosed Time Function for Desktop Search
May 7, 2009
Time is important to Google. Date tags on documents are useful. As computer users become more comfortable with search, date and time stamps that one can trust are a useful way to manipulate retrieved content.
The ever efficient USPTO published US7,529,739 on May 5, 2009, “Temporal Ranking Scheme for Desktop Searching”. The patent is interesting because of the method disclosed, particularly the predictive twist. The abstract stated:
A system for searching an object environment includes harvesting and indexing applications to create a search database and one or more indexes into the database. A scoring application determines the relevance of the objects, and a querying application locates objects in the database according to a search term. One or more of the indexes may be implemented by a hash table or other suitable data structure, where algorithms provide for adding objects to the indexes and searching for objects in the indexes. A ranking scheme sorts searchable items according to an estimate of the frequency that the items will be used in the future. Multiple indexes enable a combined prefix title and full-text content search of the database, accessible from a single search interface.
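The interesting bit is ranking by “an estimate of the frequency that the items will be used in the future.” One plausible way to build such an estimate from past behavior (my illustrative assumption, not the patent’s claimed method) is an exponentially decayed access count, where recent use predicts reuse:

```python
import math
import time

HALF_LIFE_DAYS = 30.0  # assumed recency half-life; not specified by the patent

def future_use_score(access_timestamps, now=None):
    # Weight each past access with exponential decay so that recent
    # activity contributes more to the predicted future-use frequency.
    now = now if now is not None else time.time()
    decay = math.log(2) / (HALF_LIFE_DAYS * 86_400)  # per-second decay rate
    return sum(math.exp(-decay * (now - t)) for t in access_timestamps)

# Rank desktop objects so recently and frequently used items come first.
docs = {
    "budget.xls": [time.time() - 1 * 86_400],      # opened yesterday
    "old_notes.txt": [time.time() - 90 * 86_400],  # opened three months ago
}
ranked = sorted(docs, key=lambda d: future_use_score(docs[d]), reverse=True)
print(ranked)  # ['budget.xls', 'old_notes.txt']
```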
You can snag a copy at http://www.freepatentsonline.com or you can brave the syntax at the USPTO here.
Stephen Arnold, May 7, 2009
Alternatives to Google Web Search
May 7, 2009
Abhijeet Mukherjee wrote “Ditch Google For A Day: 10 Amazing Search Engines to Try Out” here. This article provides a list of search engines that may be useful. The premise of the essay is that a reader may want to set Google aside for a day or two and try these systems. I don’t want to reproduce the list. Please visit the original write-up. I would like to mention three of the systems and offer a brief comment on each.
First, Docjax is a metasearch system. I have noticed that Google’s coverage of PowerPoint files has been changing over the last year. There are fewer PowerPoints available and Google does not do a particularly stellar job of indexing the contents of presentations on services such as Scribd. Docjax may be useful to you. I find it helpful for certain queries.
Second, Yahoo Glue is one of those Yahoo search experiments that deliver some useful search features. I used Mindset, now removed from the Yahoo Labs site, for certain types of technical queries. I don’t like Yahoo Glue as much, but you may find that Yahoo is more useful with the Glue service. When I first saw Glue, I thought it was a variation on Google’s universal search.
Third, Freshbargains is a useful bargain search engine. I would classify this as a vertical search system. Some results are spot on, others less useful. Worth a shot when looking for deals.
In my opinion, none of these is a leapfrog service, and I don’t think any of these systems can scale like Google, for a couple of reasons. First, the cost would be high, and the economy is not too good in my view. Second, Google has a magnetic brand. Trucks of ad dollars would be needed to catch users’ attention.
Marginally better won’t close the gap between these systems’ market share and Google’s. That 10 year lead looks more formidable each day.
Stephen Arnold, May 8, 2009