Google Video Creeps Forward

January 7, 2009

Telecompaper.com reported on January 7, 2009, here that “T-Mobile Launches YouTube Channel for G1.” Google has a Google Channel on YouTube.com. How many more channels will be available for special niches? The GOOG, unlike the traditional TV crowd, generates metatags for its videos. Creating a channel is a software process, not one requiring humans sitting in dark control rooms twirling dials. Michael Hirschorn’s “End Times” here notwithstanding, the GOOG’s potential energy in another bastion of traditional media will increase in force. Like an earthquake, a jump from a 2.0 to a 3.0 is not a linear increase; the scale is logarithmic, and a sketch of the arithmetic appears below. Clever writing won’t do much to change the face of traditional media when Googzilla does its waltz to the Strauss tune Schatz-Walzer. There’s gold in those honking hot videos pumped to any device that can tap into the Google umbilical.
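For the curious, here is the back-of-the-envelope behind the earthquake line, using the standard Richter relationships and only the magnitudes named above:

```python
# Richter-scale arithmetic: a one-point jump in magnitude means 10x the
# ground amplitude and roughly 31.6x the radiated energy. This is why a
# 2.0-to-3.0 move is anything but linear.
delta_m = 3.0 - 2.0
amplitude_ratio = 10 ** delta_m          # 10x the shaking
energy_ratio = 10 ** (1.5 * delta_m)     # ~31.6x the energy
print(f"{amplitude_ratio:.0f}x amplitude, {energy_ratio:.1f}x energy")
```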

Stephen Arnold, January 7, 2009

Google and Disallow

January 7, 2009

You will want to check out “On Google Disallowing Crawling of Their Life Hosting” here. Google Blogoscoped has a good write up about this (to some) surprising development. Other search engines cannot index the Time Warner Life Magazine images because Google inserted a blocking line in its robots.txt file; a sketch of the mechanism appears after the list below. I noticed that I was limited in the number of images I could browse when the service first went live. I was surprised that these images were available to me without a fee. For years, the Time crowd has noodled about its picture archive. First, Time wanted to handle the scanning itself. Then Time wanted to subcontract the work, but that was too expensive. Then it was a good idea to talk with experts about what to do. Then the cycle repeated. Along came the GOOG and the rest, as someone will write after this goose is cooked, is history. Here’s what is going on in my opinion:

  1. Restrictive content access is going to become more visible. If you read the Guha patent applications from February 2007, you will have noted that Google’s system can operate in a discriminatory way. That translates, in my view of the world, to restrictions on what others can and cannot do with Google information. This is an important phrase: “Google information.” Please note it, copyright lovers.
  2. The Life images are a big deal, and I am confident that the restrictions are positioned as part of the method to balance public access with protection for the assets of Time Warner. Everyone has needs, so this restriction is a nifty way of finding a middle way with Googzilla’s hands on the controls.
  3. The cost of getting the Life images was not trivial. I have not heard anything substantive about the financial burden of this project, but based on my prior knowledge of the magnitude of the scanning and logistics of the images, this puppy was expensive. In my view, unlike a pure academic library play, this deal has a price tag and someone has to pay at some point.
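As flagged above, here is a minimal sketch of the robots.txt mechanism behind the Life restriction. The /hosted/life path and the bot name are my illustrative assumptions; the actual entries in Google’s file may differ.

```python
# How a single Disallow line in robots.txt turns away compliant crawlers.
# Path and user agent are hypothetical; only the mechanism matters here.
from urllib import robotparser

rules = """\
User-agent: *
Disallow: /hosted/life
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# A third-party engine that honors robots.txt cannot index the images:
print(parser.can_fetch("SomeOtherBot", "http://images.google.com/hosted/life/photo1"))  # False
# The rest of the site remains fair game:
print(parser.can_fetch("SomeOtherBot", "http://images.google.com/other/page"))          # True
```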

What’s ahead? Well, in my view, once Google creates metadata and populates one of its knowledgebases, those data will be protected and probably with considerable enthusiasm. Google’s programmable search engine generates data and if some data items are missing, the system beavers away until the empty cell is filled. Once those dataspaces are populated, the information is not for just anyone’s use.
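To make the “empty cell” idea concrete, here is a toy sketch of a knowledgebase that hunts down missing values. It is my illustration of the behavior described in the Guha patent documents, not Google code; every name in it is invented.

```python
# Toy "dataspace": a table of (entity, attribute) cells, some empty.
# A stand-in resolver plays the part of the crawl-and-extract machinery
# that beavers away until the empty cell is filled.
facts = {
    ("life_image_123", "photographer"): "Alfred Eisenstaedt",
    ("life_image_123", "year"): None,   # the empty cell
}

def resolve(entity, attribute):
    """Pretend extractor; a real system would mine captions, OCR, etc."""
    return "1945"

for cell, value in list(facts.items()):
    if value is None:
        facts[cell] = resolve(*cell)

print(facts)
```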

I mentioned the word dataspaces in a telephone conversation today. I know I am not communicating. The person on the other end of the call asked, “What’s a dataspace?” Well, you are now disallowed from one.

Stephen Arnold, January 7, 2009

Newspapers: Another Analysis of Failure

January 7, 2009

Slate’s Jack Shafer took a Tanaka ECS-3301 chain saw to traditional newspapers here. His “How Newspapers Tried to Invent the Web” was an enjoyable read for me. I don’t think the wizards at some of the formerly high-flying newspaper companies were similarly affected. The hook for the article was Pablo J. Boczkowski’s 2004 book, Digitizing the News: Innovation in Online Newspapers. Armed with a fact platform, Mr. Shafer frolics through the misadventures of media mavens and the Web. The phrase I liked was “extreme suckage”. I wish this goose had thought of that. Wordsmithing aside, the comment that resonated with me was:

From the beginning, newspapers sought to invent the Web in their own image by repurposing the copy, values, and temperament found in their ink-and-paper editions. Despite being early arrivals, despite having spent millions on manpower and hardware, despite all the animations, links, videos, databases, and other software tricks found on their sites, every newspaper Web site is instantly identifiable as a newspaper Web site. By succeeding, they failed to invent the Web.

A congratulatory quack to Mr. Shafer for this write up. Read at once. Now think about a similar fate for motion picture outfits confident of their brilliance after a strong 2008. The party’s not over for that crowd. More about this in my forthcoming Google and Publishing monograph.

Stephen Arnold, January 7, 2009

Data for the 21st Century

January 6, 2009

A happy quack to Max Indelicato for his “Scalability Strategies Primer: Database Sharding” here. Mr. Indelicato has gathered very useful information about data management tactics. Unlike the IBM-Microsoft-Oracle database literature, this write up is genuinely useful and interesting. Download and save the article. For me, the most important comment in the write up was:

You may be wondering if there is a high amount of overhead involved in always connecting to the Index Shard and querying it to determine where the second data retrieving query should be executed. You would be correct to assume that there is some overhead, but that overhead is often insignificant in comparison to the increase in overall system performance, as a result of this strategy’s granted parallelization. It is likely, independent of most dataset scenarios encountered, that the Index Shard contains a relatively small amount of data. Having this small amount of lookup data means that the database tables holding that data are likely to be stored entirely in memory. That, coupled with the low latencies one can achieve on a typical gigabit LAN, and also the connection pooling in use within most applications, and we can safely assume that the Index Shard will not become a major bottleneck within the system (have fun cutting down this statement in the comments, I already know it’s coming 🙂
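A minimal sketch of the Index Shard pattern Mr. Indelicato describes: one small, memory-resident lookup table maps a key to the shard that owns it, then the data query goes straight to that shard. Shard names and data are invented for illustration, not taken from his article.

```python
# Query 1 hits the (small, in-memory) Index Shard; query 2 hits the data
# shard the index named. Two hops, but the first one is cheap.
index_shard = {"user:1001": "shard_a", "user:1002": "shard_b"}

shards = {
    "shard_a": {"user:1001": {"name": "Alice"}},
    "shard_b": {"user:1002": {"name": "Bob"}},
}

def fetch(key):
    owner = index_shard[key]   # lookup: which shard holds this key?
    return shards[owner][key]  # retrieval: query only that shard

print(fetch("user:1002"))  # {'name': 'Bob'}
```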

Ah, the Google legacy coming to light.

Stephen Arnold, January 6, 2009

Search Pioneer Upshifts: Interview with Mike Weiner

January 6, 2009

In the 1980s I relied on a very fast search system for my personal computer. The program was Gopher from Microlytics. In the late 1990s, I met the founder of Gopher and tracked his interest in linguistic-centric search systems. I lost track of Mike Weiner, former president of Microlytics, but we spoke on the telephone a day or two ago. You can get information about Technology Innovations here. I captured his comments in an interview which is now available on the ArnoldIT.com Search Wizards Speak subsite here.

Two comments in my conversation with Mr. Weiner struck a chord with me. Let me highlight these in this brief news item about the interview.

First, search has grown beyond the desktop. Mr. Weiner said in response to a question about desktop search:

…the desktop of today and tomorrow are connected to the “world.” So there can be very clever background processing done on your behalf that can leverage off the information you access and the information you create. The question will be, what’s useful and important to you, and can the system fetch, or generate, this, for you, and in an efficient form you can cognitively benefit from. One of the next potentials for incredible retrieval will be intelligent “information extraction.”

Second, Mr. Weiner’s new interests pivot on innovation. Technology Innovations holds patents on different facets of electronic paper or “epaper”. About the future of epaper, Mr. Weiner said:

I see epaper heavily used in educational publications, where children and learners have questions, need definitions, etc. You may see a speller and thesaurus, and translation technology coming bundled on books with electronic chips in them.

If you are interested in search and publishing in the 21st century, you will find the Mike Weiner interview worth your time.

Stephen Arnold, January 6, 2009

Can You Find Crackle Videos with Crackle Search?

January 6, 2009

At lunch the subject of video search came up among the Beyond Search goslings. One of the newly-hatched goslings mentioned that Sony’s Crackle was indexed thoroughly on Google Video. Furthermore, Sony uses YouTube.com to promote new, original Crackle content. For an example, click here. We fired up our baby Asus netbook and gave the flaky Verizon high speed wireless a go. Success. We were able to connect to the Crackle.com Web site and run queries on Google Video. What’s this have to do with search? Well, the search system on the Crackle.com site is not too good. The system uses a weird, hard-to-read blue type on black motif; returns matches on “star” for a query like “starving,” truncating the “ving” without warning (a sketch of this failure mode appears below); and generally seems sluggish.
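Here is the failure mode I mean, sketched in a few lines. This is a guess at what an over-aggressive matcher does, not Crackle’s actual code.

```python
# A naive matcher that truncates the query can turn "starving" into "star"
# and happily return every Star-something title in the catalog.
def naive_stem(word, keep=4):
    return word[:keep]  # crude truncation, no linguistics

titles = ["Star Trek Fan Film", "Starving Artist Documentary"]
query = "starving"
hits = [t for t in titles if naive_stem(query) in t.lower()]
print(hits)  # both titles match once "starving" collapses to "star"
```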

I learned from one of the goslings that Sony bought the Grouper.com site for $65 million in 2006. Some background information is here. Renamed Crackle.com, Sony’s video site is positioned, well, out of sight for me. I did explore the site via the search system. Programs like Rocketboom resonated. Sony paid a hefty sum to get the rights to distribute the quirky Net-centric video show. More information about this deal is here.

Sony is spending to be a player in video. But with the PlayStation sucking air and a global financial crisis bubbling away, one wonders if Sony can do much to boost the visibility of the Crackle.com service and still have the money to fix the Crackle.com search system. One plus: Crackle.com works a lot better than the piggy Web site for the Sony electronic book.

Stephen Arnold, January 6, 2009

MSE360: Cooler than Cuil

January 6, 2009

I received an email from Daniel Clark. He provided me with some information about a new Web search engine, MSE360.com. I ran a number of test queries on the system and found it to be useful. The most interesting feature to me is what Mr. Clark calls “deep search”. He said:

We… have introduced Deep Search methods to try and provide the user with a notice when a site is known to host a valid privacy policy. Although this feature is still in beta and thus only a few million sites have been deep searched, the platform will in the end provide users with a way to decide what sites to trust.

When we do spot checks on some potentially useful but really low traffic Web sites like the National Railway Retirement Board, we have found that Google does not visit very often, nor does the GOOG go much beyond three links deep. The key point, of course, is how often a Web indexing system pings a site to determine if there is new or changed information available. If you have a billion Web pages indexed and refresh only 10 percent of them, the index is not too useful; a rough arithmetic sketch follows below. Other vendors only index sites that contribute to popular searches. This approach saves money and returns useless results unless one has the knack of searching what rings the bells of 15-year-olds.
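The promised arithmetic, with the figures invented purely to make the point about refresh rates:

```python
# If only 10 percent of a billion-page index is recrawled per cycle while
# a quarter of the pages change, most changes are missed each cycle.
indexed_pages = 1_000_000_000
refresh_share = 0.10      # fraction recrawled per cycle (assumed)
change_share = 0.25       # fraction that changes per cycle (assumed)

stale_share = change_share * (1 - refresh_share)
print(f"~{stale_share:.0%} of the index goes stale every cycle "
      f"({int(indexed_pages * stale_share):,} pages)")
```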

MSE360.com wants to change these practices. The engine also beeps when it visits a site with a virus. I was able to find a site that would inject trojans, and MSE360.com did not squawk. The system is new, though, and I think its virus alert will improve. The company also wants to protect users’ privacy. Google says this too. Until I see how the company grows, I applaud MSE360.com’s privacy initiative, but policies can change. You can generate tag clouds which show some of the popular searches on the system.

I ran a query for my Web log Beyond Search. We pop up on the results list but not in the top spot. No problem on my end. You can see from the screen shot below that MSE360.com presents hits from Wikipedia and Web logs, traditional results in the middle panel, and images in the right hand panel. I was not able to run an image search, but I did not dig into the advanced search options very deeply. You can see more results by clicking a relatively tiny hot link at the bottom of the very dense results page.

[Screenshot: MSE360.com three-panel results page]

Mr. Clark said:

We wanted to allow users to get the most out of their time, so in turn we designed the 3 tier layout. This layout allows for the user to get images, blogs, Wikipedia and web results, all on one page. When we polled 250 random Internet users, over 70% said they preferred the layout over Yahoo. Of course the other 30% didn’t!

I found the system useful. Check it out. I will keep my eye on the service. I don’t have substantive information about funding and other basic facts. When I get them, I will pass them along.

Stephen Arnold, January 6, 2009

SkyGrid: Thomson Reuters and Bloomberg Challenger

January 6, 2009

A reader in the Eastern Mediterranean alerted me to SkyGrid, founded in 2005. After a bit of checking, I found some information in the TechCrunch write up here. The SkyGrid Web site here provides a rundown of the media coverage the firm’s for-fee service has achieved. The founder of the company is Kevin Pomplun, who combined high value content and what one commentator called “flow based architecture”. The notion is that information is dynamic, and the SkyGrid system is constantly refreshed. Once configured, the system delivers search without search. The service costs about $500 per month per user. The target market appears to be Wall Street’s analysts and related disciplines; for example, some intelligence and law enforcement professionals will find the service interesting.

Based on information available to me, SkyGrid uses proprietary methods to acquire, process, and personalize information for each user. The technologies embraced by SkyGrid hit such hot buttons as sentiment analysis (whether information is positive or negative), categorization (figuring out what an article is about and tagging it with a classification code and term), and graphic displays of data (stock price change, for example). When I reviewed the service, I noticed parallels between SkyGrid and data on the terminals in financial shops now. The dense display (shown below) appeals to those in the financial business. The idea is to provide hot information in one place.

There are some similarities between SkyGrid and Silobreaker, which I have described in this Web log. Other services that offer similar functions include FirstRain (which asserts that its technology “changes the rules of research”). Monitor110 was another similar service but fell upon hard times in mid 2008.

[Image: SkyGrid’s dense display. Source: SkyGrid 2008]
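To ground the hot buttons above, here is a toy of the sentiment-and-categorization tagging such services perform on a stream of headlines. The keyword lists and headlines are invented; SkyGrid’s actual methods are proprietary.

```python
# Tag each incoming headline with a sentiment call and topic labels.
POSITIVE = {"beats", "gains", "upgrade"}
NEGATIVE = {"misses", "falls", "downgrade"}
TOPICS = {
    "earnings": {"beats", "misses", "profit"},
    "ratings": {"upgrade", "downgrade"},
}

def tag(headline):
    words = set(headline.lower().split())
    if words & POSITIVE:
        sentiment = "positive"
    elif words & NEGATIVE:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    topics = [name for name, keys in TOPICS.items() if words & keys]
    return {"headline": headline, "sentiment": sentiment, "topics": topics}

for item in ["Acme beats profit forecast", "Analyst issues downgrade on Widget Co"]:
    print(tag(item))
```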

Several comments will let me capture my thoughts:

First, the financial services sector has some challenges facing it. As a result, I expect some of the big name Bloomberg and Thomson Reuters customers to start demanding more value. The word value, in my opinion, means price cuts. This may be good or bad news for companies like SkyGrid. The good news is that its price point is appetizing compared to the hefty fees assessed by the incumbent real time data providers on Wall Street. The bad news is that a start up lacks the track record of the incumbents, so the cost of sales might be an issue. Long decision cycles may also work against the newcomers.

Second, other companies are pushing into real time. These include “utility” type vendors such as Exegy, whose value proposition is speed; that is, no bottlenecks. Latency is a big deal for the surviving financial services firms. Also, such companies as Connotate and Relegence offer appealing services that are even more customized than some of the services now trying to make sales to the Wall Street crowd (minus Mr. Madoff’s operation, Bear Stearns, and Lehman Brothers, of course).

Third, these new services are at their core “dataspace” plays. As the volume of information increases, the cost of the plumbing will be an ongoing issue for these challengers to Bloomberg and Thomson Reuters. Cluuz.com, for example, has shifted from direct indexing of Web content for its demonstration service to the Yahoo “build your own search service”.

Fourth, the for-fee content vendors are going to have little choice but to raise their rates. The Factiva unit of Dow Jones struggled as an independent entity. Now that company is inside Dow Jones, and as Dow Jones’s financial pressures mount, watch for Factiva to charge more for its services, particularly the Wall Street Journal and Barron’s data.

Fifth, the Google looms over this entire sector. Here’s why that company is a serious mid-term threat to both incumbents and start-ups:

  1. Scale. Google has plumbing. Incumbents and competitors have to get it. Expensive that.
  2. Data. Google has quite a bit of structured and unstructured data. The cost to the GOOG of expanding http://finance.google.com is incremental, maybe incidental.
  3. Brand. The GOOG has the hot brand. Brand visibility sells.

In closing, I think there will be consolidation and attrition in this sector. I don’t think the services themselves are fatally flawed. I think that the broader datasphere is marshalling forces that will make life difficult.

Stephen Arnold, January 6, 2009

Google and Publishing

January 5, 2009

Two articles appeared in my newsreader. Both discuss Google and its impact on publishing. I won’t spoil your fun by summarizing these write ups. I want to highlight each and make one observation pertinent to search and content processing.

The first article is from the New York Times (a troubled ship is she too). The author is Motoko Rich, and you can read “Google Hopes to Open a Trove of Little-Seen Books” here. The subject is Google Book Search, the scanning project, and the usefulness of the service to the curious.

The second article is by an outfit doing business as Ohmy News. Its article is “The Web Is Winning the News War.” Peter Hinchliffe (Hinchy for short, I think) points out that Web services are a challenge for the traditional news outfits. Hinchy does not mention Google, but the shadow falls over the story.

My observation is a modest one. Google disintermediates people, streamlines production, and relies on digital distribution. Books, news, whatever. The writing is on the wall. The Google is a disrupter, and the implications have not been converted to learnings.

Stephen Arnold, January 5, 2009

Cloud Data Storage

January 5, 2009

The UK publication Sys-con.com published “Data Storage Has Been Taken for Granted” here. You may have to fight through some pop ups and kill the sound on the auto-running commercial, but you will want to put up with this wackiness to read Dave Graham’s article. Mr. Graham does a good job of highlighting the need for cloud data storage. This initial article will be followed by other segments, so you will want to snag each of them. In this first installment, for me the most important comment was:

Each type of content, whether it be structured or unstructured, has different influencing factors affecting its storage and retrieval.

The significance of this comment is that a vendor or storage provider will have to have a specific framework in place to handle the demands of different types of data storage and access. Why is this important? I run into quite a few people who dismiss storage as a non-issue. These issues are not trivial: data management remains one of the factors that govern the performance and cost of a storage system. The phrase “garbage in, garbage out” has given way to “get data in, get data out” easily, quickly, and economically.
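A small sketch of the point: the right storage treatment depends on whether the content is structured or unstructured. The store names are placeholders of my own, not a recommendation from Mr. Graham.

```python
# Route content to a storage treatment based on its shape. Illustrative
# only; a production system would also weigh size, access pattern, cost.
def route(content):
    if isinstance(content, dict):
        return "relational store: fixed schema, indexed columns"
    return "object store: blobs plus a full-text index"

print(route({"customer_id": 42, "balance": 10.50}))       # structured record
print(route("Narrative text from an annual report ..."))  # unstructured doc
```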

Stephen Arnold, January 5, 2009
