MSE360: Cooler than Cuil
January 6, 2009
I received an email from Daniel Clark. He provided me with some information about a new Web search engine, MSE360.com. I ran a number of test queries on the system and found it to be useful. The most interesting feature to me is what Mr. Clark calls “deep search”. He said:
We… have introduced Deep Search methods to try and provide the user with a notice when a site is known to host a valid privacy policy. Although this feature is still in beta and thus only a few million sites have been deep searched, the platform will in the end provide users with a way to decide what sites to trust.
When we do spot checks on potentially useful but low-traffic Web sites like the U.S. Railroad Retirement Board’s, we have found that Google does not visit very often, nor does the GOOG go much beyond three links deep. The key point, of course, is how often a Web indexing system pings a site to determine whether new or changed information is available. If you have a billion Web pages indexed and refresh only 10 percent of them, the index is not too useful. Other vendors index only sites that contribute to popular searches. This approach saves money but returns useless results unless one has the knack of searching for whatever rings the bells of 15 year olds.
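The refresh-frequency point can be made concrete. Here is a minimal sketch of an adaptive recrawl scheduler that revisits frequently changing pages sooner. It is my own toy model; the names, intervals, and update rule are illustrative assumptions, not any real engine’s policy.

```python
import heapq

class RecrawlScheduler:
    """Toy adaptive recrawl scheduler: pages observed to change often
    get shorter revisit intervals. Purely illustrative."""

    def __init__(self, base_interval=86400.0, floor=3600.0):
        self.base_interval = base_interval  # default revisit gap, seconds
        self.floor = floor                  # never revisit faster than this
        self.heap = []                      # (next_visit_time, url)
        self.change_rate = {}               # url -> estimated change rate, 0..1

    def add(self, url, now=0.0):
        self.change_rate[url] = 0.5         # neutral prior
        heapq.heappush(self.heap, (now, url))

    def record_visit(self, url, changed, now):
        # Exponentially weighted estimate of how often the page changes.
        rate = self.change_rate[url]
        rate = 0.8 * rate + 0.2 * (1.0 if changed else 0.0)
        self.change_rate[url] = rate
        # Frequently changing pages get a shorter revisit interval.
        interval = self.base_interval * (1.0 - rate) + self.floor
        heapq.heappush(self.heap, (now + interval, url))
        return interval

    def next_due(self):
        # The page whose scheduled revisit time comes soonest.
        return self.heap[0][1] if self.heap else None
```

With a fixed crawl budget, a scheduler along these lines spends its visits where change is likely, instead of refreshing a flat 10 percent of the index.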
MSE360.com wants to change these practices. The engine also beeps when it visits a site with a virus. I was able to find a site that injects trojans, and MSE360.com did not squawk. The system is new, and I think its virus alert will improve. The company also wants to protect users’ privacy. Google makes similar claims, so I applaud MSE360.com’s privacy initiative, but policies can change as a company grows. You can also generate tag clouds that show some of the popular searches on the system.
I ran a query for my Web log Beyond Search. We pop up on the results list but not in the top spot. No problem on my end. You can see from the screen shot below that MSE360.com presents hits from Wikipedia, Web logs, and traditional results in the middle panel, and images in the right-hand panel. I was not able to run an image search, but I did not dig into the advanced search options very deeply. You can see more results by clicking a relatively tiny hot link at the bottom of the very dense results page.
Mr. Clark said:
We wanted to allow users to get the most out of there time, so in turn we designed the 3 tier layout. This layout allows for the user to get images, blogs, Wikipedia and web results, all on one page. When we polled 250 random Internet users over 70% said they preferred the layout over Yahoo. Of course the other 30% didn’t!
I found the system useful. Check it out. I will keep my eye on the service. I don’t have substantive information about funding and other basic facts. When I get them, I will pass them along.
Stephen Arnold, January 6, 2009
Sky Grid: Thomson Reuters and Bloomberg Challenger
January 6, 2009
A reader in the Eastern Mediterranean alerted me to SkyGrid, founded in 2005. After a bit of checking, I found some information in the TechCrunch write up here. The SkyGrid Web site here provides a rundown of the media coverage the firm’s for-fee service has achieved. The founder of the company is Kevin Pomplun, who combined high-value content with what one commentator called a “flow based architecture”. The notion is that information is dynamic, and the SkyGrid system is constantly refreshed. Once configured, the system delivers search without search. The service costs about $500 per month per user. The target market appears to be Wall Street’s analysts and related disciplines; for example, some intelligence and law enforcement professionals will find the service interesting.

Based on information available to me, SkyGrid uses proprietary methods to acquire, process, and personalize information for each user. The technologies embraced by SkyGrid hit such hot buttons as sentiment analysis (whether information is positive or negative), categorization (figuring out what an article is about and tagging it with a classification code and term), and graphic displays of data (stock price change, for example).

When I reviewed the service, I noticed parallels between SkyGrid and the data on the terminals in financial shops now. The dense display (shown below) appeals to those in the financial business. The idea is to provide hot information in one place. There are some similarities between SkyGrid and Silobreaker, which I have described in this Web log. Other services that offer similar functions include FirstRain (which asserts that its technology “changes the rules of research”). Monitor110 was another similar service but fell upon hard times in mid-2008.
Source: SkyGrid 2008
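The sentiment analysis mentioned above can be illustrated with a toy lexicon approach. This is purely my sketch; SkyGrid’s actual methods are proprietary, and the word lists here are invented.

```python
# Toy lexicon-based sentiment scorer. The word lists are made up for
# illustration; production systems use far richer models than this.
POSITIVE = {"gain", "beat", "upgrade", "growth", "record"}
NEGATIVE = {"loss", "miss", "downgrade", "fraud", "layoff"}

def sentiment(text):
    """Score a snippet: +1 per positive word, -1 per negative word.
    A positive total suggests good news; a negative total, bad news."""
    words = text.lower().split()
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
```

Even this crude scorer shows why the feature appeals to analysts: a stream of headlines can be tagged positive or negative automatically, then rolled up per company or sector.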
Several comments will let me capture my thoughts:
First, the financial services sector has some challenges facing it. As a result, I expect some of the big name Bloomberg and Thomson Reuters customers to start demanding more value. The word value, in my opinion, means price cuts. This may be good or bad news for companies like SkyGrid. The good news is that its price point is appetizing compared to the hefty fees assessed by the incumbent real time data providers on Wall Street. The bad news is that a start up lacks the track record of the incumbents, so the cost of sales might be an issue. Long decision cycles may also work against the newcomers.
Second, other companies are pushing into real time. These include “utility” type vendors such as Exegy. This company’s value proposition is speed; that is, no bottlenecks. Latency is a big deal for the surviving financial services firms. Also, such companies as Connotate and Relegence offer appealing services that are even more customized than some of the services now trying to make sales to the Wall Street crowd (minus Mr. Madoff’s operation, Bear Stearns, and Lehman Brothers, of course).
Third, these new services are at their core “dataspace” plays. As the volume of information increases, the cost of the plumbing will be an ongoing issue for these challengers to Bloomberg and Thomson Reuters. Cluuz.com, for example, has shifted from direct indexing of Web content for its demonstration service to the Yahoo “build your own search service”.
Fourth, the for-fee content vendors are going to have little choice but to raise their rates. The Factiva unit of Dow Jones struggled as an independent entity. Now that company is inside Dow Jones, and as Dow Jones’s financial pressures mount, watch for Factiva to charge more for its services, particularly the Wall Street Journal and Barron’s data.
Fifth, the Google looms over this entire sector. Here’s why that company is a serious mid term threat to both incumbents and start ups:
- Scale. Google has plumbing. Incumbents and competitors have to get it. Expensive that.
- Data. Google has quite a bit of structured and unstructured data. The cost to the GOOG to expand http://finance.google.com is incremental, maybe incidental.
- Brand. The GOOG has the hot brand. Brand visibility sells.
In closing, I think there will be consolidation and attrition in this sector. I don’t think the services have flaws. I think that the broader datasphere is marshalling forces that will make life difficult.
Stephen Arnold, January 6, 2009
Google and Publishing
January 5, 2009
Two articles appeared in my newsreader. Both discuss Google and its impact on publishing. I won’t spoil your fun by summarizing these write ups. I want to highlight each and make one observation pertinent to search and content processing.
The first article appeared in the New York Times (a troubled ship is she too). The author is Motoko Rich, and you can read “Google Hopes to Open a Trove of Little-Seen Books” here. The subject is Google Book Search, the scanning project, and the usefulness of the service to the curious.
The second article is by an outfit doing business as Ohmy News. Its article is “The Web Is Winning the News War.” Peter Hinchliffe (Hinchy for short I think) points out that Web services are a challenge for the traditional news outfits. Hinchy does not mention Google, but the shadow falls over the story.
My observation is a modest one. Google disintermediates people, streamlines production, and relies on digital distribution. Books, news–whatever. The writing is on the wall. The Google is a disrupter and the implications have not been converted to learnings.
Stephen Arnold, January 5, 2009
Cloud Data Storage
January 5, 2009
The UK publication Sys-con.com published “Data Storage Has Been Taken for Granted” here. You may have to fight through some pop ups and kill the sound on the auto-running commercial, but you will want to put up with this wackiness to read Dave Graham’s article. Mr. Graham does a good job of highlighting the need for cloud data storage. This initial article will be followed by other segments, so you will want to snag each of them. In this first installment, the most important comment for me was:
Each type of content, whether it be structured or unstructured, has different influencing factors affecting its storage and retrieval.
The significance of this comment is that a vendor or storage provider will have to have a specific framework in place to handle the demands of each type of data storage and access. Why is this important? I run into quite a few people who dismiss storage as a non-issue. These issues are not trivial, and data management remains one of the factors that govern the performance and cost of a storage system. The phrase “garbage in, garbage out” has given way to “get data in, get data out”: easily, quickly, economically.
Stephen Arnold, January 5, 2009
Google and Time
January 5, 2009
Time is a big deal at Google. Only a few Web search outfits manipulate time in a useful way. The GOOG has a couple of patent documents that disclose some of the company’s methods for dealing with this slippery notion. You can see one example of a historical time graph by navigating to Google.com. If you are outside the US, you have to navigate to a country news page, click on US, and then launch your query there. Even that may not work for everyone. Here’s what you see for the Googley query “Albert Einstein”. You have to scroll to the bottom of the page.
In this example, Googzilla is including book results and related searches to help the curious in their quest for information about the special theory of relativity.
Stephen Arnold, January 5, 2009
SharePoint: Don’t Automate, Do Stuff by Hand
January 4, 2009
The SharePointer (a place of sharing pointers) published “MOSS Variations: Page Properties that Do Not Get Propagated to Target Variations”, a useful article for two reasons. First, it solves a mystery that the geese at Beyond Search have encountered. Second, the write up shows what’s wrong with SharePoint. Why automate a function when you can do the work manually? Makes sense to some, I guess.
The author of the useful article here is Tehnoon Raza, a Senior Support Escalation Engineer at Microsoft. I love that title. At Beyond Search we will definitely add that to our SharePoint expert’s title. Now to the good stuff. Mr. Raza’s article explains that when you propagate a source site to its variations, the copy does not copy everything. The workaround is easy. Create a custom column and manually insert these items for each page you want to propagate:
- URL Name (seems important, right?)
- Title
- Description
- Schedule Start Date
- Schedule End Date
- Audience Targeting
- Contact
- Contact Name
- Contact E-mail Address (another important item, right?)
- Contact Picture.
I did not notice an explanation that made much sense to me, an addled goose. You may be more in tune with the Microsoft way. My thought was, “Why not copy the properties?” No problem when there is one page. The recommended approach begs for a script when there are two or more pages. Maybe I’m missing something, but this strikes me as sort of clunky. Oh, hold on. Tess, our SharePoint expert, has a comment. She says, “It’s something a box would definitely not do.” Wow. Harsh.
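Since the manual fix begs for a script, here is a minimal sketch of the bulk version. It models pages as plain dictionaries for illustration; a real implementation would read and write these properties through the SharePoint object model, which I have not reproduced here.

```python
# The properties that SharePoint's variation propagation skips,
# per the article's list. Pages are plain dictionaries here; a real
# script would use the SharePoint object model instead.
PROPAGATED_PROPERTIES = [
    "URL Name", "Title", "Description",
    "Schedule Start Date", "Schedule End Date",
    "Audience Targeting", "Contact", "Contact Name",
    "Contact E-mail Address", "Contact Picture",
]

def propagate_properties(source_page, target_pages):
    """Copy the skipped properties from the source variation page to
    every target page, ignoring properties the source does not define."""
    for target in target_pages:
        for prop in PROPAGATED_PROPERTIES:
            if prop in source_page:
                target[prop] = source_page[prop]
    return target_pages
```

Run once per source page, this replaces ten manual column edits per target variation with a single loop, which is what one would expect the product to do in the first place.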
Stephen Arnold, January 4, 2009
Mobile Search
January 4, 2009
One of the ZDnet Web logs presents snippets of data. I read “Top US Web Sites Accessed over Mobile Phones in October 2008” here. I then went back to the chart and looked at the data more carefully. What did I overlook in my first scan? The combined traffic of Google Search, Gmail, and Google Maps was twice that of the number one most used mobile site, Yahoo. So what? In my addled goose brain, the dominance of Google in mobile is moving toward the same “game over” type of market share Google has in Web search. Who is going to knock off the GOOG? Yahoo? I am not sure what Yahoo will be doing. Microsoft? Again, I am in the dark. I have given up trying to figure out who is in charge of search. The revolving door spins too quickly for me. AOL? Snort, snort. Weather, sports, news? Nope, the GOOG has nifty technology to make its traditional offerings more interesting by creating its own information. Maybe I am reading this Nielsen data incorrectly? If I am, let me know.
Stephen Arnold, January 4, 2009
Google Now Officially Microsoft-esque
January 3, 2009
Matt Asay’s headline caught my eye. “Google’s Microsoft-esque Landgrab for IE’s Market Share” discusses the erosion of Internet Explorer’s market share. I commented on this earlier, so I won’t review the implications of the decline. I want to focus on the word “Microsoft-esque.” I respect CNet, and I think its editors make an effort to choose headlines that are accurate and catchy. The use of the word “Microsoft-esque” makes it official that the old order has been bypassed. Microsoft snookered IBM. IBM today is a weird amalgam of “to be” software and consulting. The company generates about $100 billion in revenue, so its brain trust knows how to make money. But IBM is not on my short list of companies to watch in 2009. Microsoft is now a version of IBM, outpaced and outmaneuvered by Google. Google, therefore, is the “new” Microsoft. If CNet sees Google as “Microsoft-esque”, so do I. The hitch in the rope is that I don’t think Google is a Microsoft. Google is a different creature, and its competitive impact is disruptive in a way that is different from Microsoft’s in the 1980s. I like “Microsoft-esque”. I just think it is misleading. The GOOG is fission compared to Microsoft’s lubrication function. The differences are more subtle than market grabbing.
Stephen Arnold, January 3, 2009
Browser Share Drop for Microsoft Is Bad News
January 2, 2009
The netbooks have arrived in rural Kentucky. Beyond Search now has two of these devices. Nothing beats the IBM mainframe in my opinion, but even old geese have to adapt. Netbooks can run applications, but we find ourselves using portable applications and services available via WiFi or the Verizon wireless service. Once Firefox is up and running, we have found that cloud-based services such as Google Apps are good enough. As fond as we are of the MVS/TSO approach to computing, the browser and browser-like environments seem to be the future. Victor Godinez’s “Internet Explorer’s Share of the Browser Market Fell below 70% in November” here struck us as bad news for Microsoft. The article contains a nifty graphic showing the vendors’ respective market shares too. Data reported second or third hand can be wide of the mark. Let’s assume that these figures are spot on. So what? In our opinion, a decline in Internet Explorer’s share of the market means that other vendors have sucked some oxygen from the Microsoft ecosystem. Microsoft can keep on breathing, but the company needs to address the problem. Other browser developers may ramp up their attack on IE, which has lagged Chrome, Firefox, Safari, and Opera in some key features. If the shift is evident to computer users in rural Kentucky, the more informed folks in more intellectually astute areas will be even more aware of the importance of the browser and browser-like environments. Chrome, in our opinion, only looks like a browser. Chrome is a software airlock that connects a computing device to the Google mothership. If Chrome succeeds in snapping its airlock onto more computers, Microsoft’s share of the browser market may continue to experience labored breathing.
Stephen Arnold, January 2, 2009
Google Trend Identifier
January 2, 2009
In 2008, Google rolled out a free trend and key word research tool. If you have not experimented with it, navigate to Insights for Search here. You can enter one or more words and phrases, click search, and see a nifty graphic of the “popularity” of your terms over time. I took a break from my forthcoming study about Google and publishing to run this query: “Autonomy IDOL”, “Google Search Appliance”, “SharePoint search”. The data come from Google’s log files. If you are a fan of one of these enterprise search systems, you may protest the data displayed below directly to the GOOG, not me. The result was a shocker to me:
I have snipped only the chart. You will find a number of useful features available, including:
- Data by categories, time, and geography
- News items related to your query
- Identification of terms that are moving up and those that are moving down.
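The popularity curves such tools draw are typically normalized shares rather than raw query counts. Here is a minimal sketch of that kind of normalization; this is my assumption about the general technique, not Google’s actual formula, and the numbers are invented.

```python
def relative_popularity(term_counts, total_counts):
    """Scale a term's share of all queries in each period so that its
    peak period equals 100 -- the 0-100 style of chart trend tools
    usually report. Inputs are per-period query counts."""
    shares = [t / total for t, total in zip(term_counts, total_counts)]
    peak = max(shares)
    return [round(100 * s / peak) for s in shares]
```

Normalizing by total query volume matters: a term whose raw count is flat while overall searching doubles is actually losing ground, and a share-based chart shows that.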
For me, this system is important because it shows that Google can blend several different types of information from various Google subsystems in a homogeneous service. As 2009 gets underway, companies wanting to compete with Googzilla have to match and, if possible, leapfrog its services. Companies selling key word tools or charging big bucks for trend data may have to view Google as an increasingly disruptive force in their market patch.
Stephen Arnold, January 2, 2009