Embedding Lucene

January 31, 2010

The goslings and I participated in a search conference call last week. One of the topics du jour was Lucene. The open source search system continues to fascinate certain government procurement teams and those looking for a low-cost way to provide users with a search-and-retrieval system. The enthusiasm for Lucene and Solr goes up as the age of the information technology professionals decreases. Whatever universities are putting in the Red Bull sold in computer science departments seems to trigger a Lucene / Solr craving.

In the course of the conversation, I mentioned embedding Lucene in commercial software. The advantages ranged from low cost to sidestepping the blow-back from customers. The blow-back occurs when the users of software want a feature not in the OEM “stub” embedded in a system or gizmo. The fix is to buy the full version of the software. The “stub” is a good enough chunk of functionality, but it won’t do the fancy back flips some users want when looking for information.

scribovox diagram

© Scribovox 2009

Lucene can be extended as long as the outfit doing the embedding has some Lucene experts on staff or access to a consultant able to keep appointments, complete work on time and in budget, and write code that works. The example I gave was the Lucene within Scribovox.com.

Scribovox is software that performs such tricks as converting a podcast to text. You can get more information about the product at http://www.scribvox.com. The information I referenced came from a June 17, 2009 Scribovox design document called “Integration with Social Networks.” I found the information in this write up quite useful, and you can download a copy of the paper from this link.

The author of the paper is Patrick Nicholas. He discusses some interesting ideas; for example:

  • Flow diagrams for processing real time content
  • A useful architecture diagram
  • A discussion of indexing and summarization
  • Some information about Amazon EC2, MapReduce and Hadoop.

If you are serious about open source, I would tuck this document in your bag of tricks. The time estimation puts search and semantics into perspective. Useful for the azure chip crowd since most don’t have too much, if any, oil under their fingers from removing the fuel injection unit from a search system.

Stephen E Arnold, January 31, 2010

A freebie. No one paid me to write this. I will report this charitable act to the boss at the National Cathedral on Wisconsin Avenue, in Washington, DC.

Black and White Photo Search

January 30, 2010

Short honk: I wanted to let you know that “Top 5 Black & White Image Search Engines” provides a description of five photo search systems. What makes this list useful is that the angle is black and white pix.

Stephen E Arnold, January 29, 2010

A freebie. I will report this to the photo manager at the Department of Energy where there is considerable expertise in managing images.

Can Search Save YouTube?

January 26, 2010

YouTube.com has been a topic of conversation here at the goose pond today. Several of the goslings commented about the redesign. Another pointed out that the search function was a hit-and-miss affair. I described a couple of patent documents such as US2006/0080238 that I thought were designed to give Google’s grassroots media video service some lift (as in pants on the ground). I don’t think search can save YouTube.com. Money can.

Finding pants on the ground was easy. It’s not so easy finding some other videos.

When I read “YouTube Is Doomed Guy Refuses to Admit He Was Wrong (But YouTube No Longer Doomed)”, I learned that YouTube.com is going to become a pay-per-view operation. The story in Silicon Alley Insider suggests that Google will emulate the Hulu.com model.

The write up presents a summary of some conflicting or maybe just fluid information about the profitability of the YouTube.com service. Google bought YouTube.com in 2006 at about the same time it was working out a deal with dMarc and lifting some other rich media barbells in the Google gymnasium.

The key passage for me was:

Google never figured out how to get advertisers excited about millions of people’s home videos. Benjamin [critic of YouTube.com and CEO of Fliqz.com] thinks Google will continue to chase after premium content, making the site more like Hulu. He also thinks eventually, Google could charge a small fee to upload video to the site.  In other words, YouTube isn’t doomed.

The guts of the article is an interview with Benjamin Wayne of Fliqz, and it is worth reading.

The goslings and I were uncertain about YouTube.com. On one hand, it seems to have some challenges in the search department. Finding a video is often most easily accomplished by looking for a link in a write up, not by searching for a video. The ads are indeed annoying, and these may have disappointed both Google and the folks buying ads on YouTube.com videos. On the other hand, does the world need another for-fee video site? These seem to be predicated on the same assumptions one finds in the eBook reader sector. More may not yield a bigger revenue pie.

What is Google’s play in rich media? Perhaps Google has matured sufficiently to realize that there are other business models, but these may not lend themselves to the Googley style of management. Management, not emulating Hulu.com or some other for-fee rich media service, may be the deciding factor for YouTube.com.

Stephen E Arnold, January 26, 2010

A freebie. Someone promised to pay me a pittance in the future, but that faint assertion had nothing to do with the plight of YouTube.com.

YouTube Terms of Service Updated

January 19, 2010

I have been thinking about Google and rich media. Rich media means multimedia. Multimedia means YouTube. These terms are important because Google uses a wide range of words and phrases to describe its rich media services and capabilities.

On January 14, 2010, Google posted “YouTube’s APIs and Refresher on our Terms of Service”.  The write up does a good job of highlighting the major changes. My view of the changes is that they nudge the YouTube service forward to commercial payoff land.

For example, the point “Videos belong to their owners” is a gentle reminder that Google’s innovation in giving content owners a control panel on which to input settings is an important function. The more content owners input the rules for a particular content object, the more useful the Google control panel or content owner dashboard becomes in the upload process.

The focus on the YouTube video player is a reminder that consistency for Google is a positive. Google is pointing out that certain actions are not making the Google happy; for example, enabling videos for download.

The third point is that the Google wants ads left alone. Period. Stripping ads is a no-no. The person who wants to monetize a video can read the API monetization guide. If you have not looked at this API, it is worth a quick look. You can find the API monetization guide with some helpful links on the Google Code page in the write up “Using the YouTube APIs to Build Monetizable Applications.” We geese at Beyond Search think this is a pretty important chunk of info, by the way.

Finally, Google wants those who do charge for a video to make clear that Google is not charging. My hunch is that Google gets email complaining about fees for some YouTube videos and Google doesn’t have time to handle that type of email. Heck, Google has a tough time handling email for the Nexus One phone. It doesn’t need more email about an issue a content provider causes. Just my opinion, gentle reader.

You may want to add the YouTube API blog to your newsreader if you are into rich media, multimedia, video, or related content types.

Stephen E Arnold, January 19, 2010

A short article I wrote without anyone, including a TV or motion picture company, paying me for the effort. Is the Oscar committee in charge of this type of write up and disclosure? I will report to them to be sure.

Google and Face Recognition

January 18, 2010

You can read a good summary of a mainstream publication’s analysis of Google and its face recognition technology. Just navigate to Google Blogoscoped and check out “German Spiegel on Google Goggles’ Face Recognition and More”. The only problem is that the author of the write up did not consider the application of this system and method to video. To get the full picture of the Google facial recognition capability, you may want to skip the traditional publication and read US20100008547 “Method and System for Automated Annotation of Persons in Video Content”. You can find this document at the USPTO’s free patent document Web site, www.uspto.gov. I find it interesting that open source information about a specific and significant Google system and method is ignored. Much easier to write without too much information I suppose. That’s what keeps the Larry and Sergey eat pizza book writers in high clover.

Stephen E Arnold, January 18, 2010

A freebie. Due to the direct reference to the USPTO, I herewith report that I was not paid to point out this omission about Google’s facial recognition technology.

Personalized Playlists

January 15, 2010

I read “PerfectStream: The Future for Personalized Video Playlists, Advertising?” and thought that it was a good idea. I think that the sentence that snagged me was:

Munich-based PerfectStream is taking the business-to-business route and hopes to license its technology out to media and tech companies that already have professionally-produced or user-generated content. It came out of stealth this week and has raised funding solely from Brandenburg.

I recall reading about personalized playlists somewhere else. Maybe a Google patent application. Interesting.

Stephen E Arnold, January 15, 2010

Sad to say a freebie. Possible patent research ahead. I will report my non-compensation to the ever vigilant USPTO.

Interactive Computing from Apple

January 6, 2010

Short honk: Take a peek at how Apple presented the tablet concept about 25 years ago. Voice interaction, touch screen, and sort of rich media. You can find the video on TUAW.com here. Search does not work exactly the way depicted in the video. Slow progress. Mom still calls in the video. Son ignores mom. That’s accurate for some I suppose.

Stephen E. Arnold, January 6, 2010

A freebie. To whom do I report? I know for apple related information it must be the USDA.

Real Canines and False Teeth

January 3, 2010

Read Staci Kramer’s “News Corp, Time Warner Cable Reach Deal Without Blackouts; Scripps Still On Bubble”. Rupert Murdoch’s threats succumbed to common sense and money, or was it money and common sense? Here, in my opinion, is the key point:

The deal announced some 19 hours after the New Year’s Eve midnight deadline covers the Fox television stations, Fox, Fox Cable Networks and Fox’s Regional Sports Networks for TWC’s 13 million households. It also applies to Bright House Networks’ 2.4 million subs; the operator has a heavy presence in Florida, which means a particular interest in avoiding black screens when Florida meets Cincinnati in the Allstate Sugar Bowl.

So what? I think it makes it clear that Mr. Murdoch may berate the Google, but when the money or the common sense clicks in, there will be a deal. Quite a negotiator that Mr. Murdoch. I really like his references to quality journalism and other bits of business confetti. Money and common sense or common sense and money.

Stephen E. Arnold, January 3, 2010

A freebie, gentle reader, a freebie. Maybe I should email spam Mr. Murdoch for mentioning his negotiating skills? He spammed me in 2009. I will report my work-for-free mode to the Advisory Council on Historic Preservation. Mr. Murdoch wants to preserve the historic approach to information I think.

Netflix Jumps to Amazon

January 2, 2010

Want to enrage a giant, Oracular bull?

Bad news for Oracle, IBM as reported by Computerworld.com: Netflix is transferring its data center from Oracle on IBM hardware to Amazon Web Services’ (AWS) Elastic Compute Cloud (EC2) in an effort to save capital.  The switch comes as Netflix’s customer count is headed through the roof, and thus the cost and unreliability of maintaining or expanding the existing data centers is becoming too great a burden.

Netflix was already patronizing AWS for other less critical applications like customer interfacing and even announced last May its intention to expand this relationship.  They weren’t kidding around.  The decision is prompted by three major cost points.  First, Oracle on IBM is inherently “very expensive”.  Second, it would have required long hours and great effort for Netflix to build their own data center when systems are added to AWS’s cloud with ease.  And finally, “EC2’s pay-as-you-go model means costs are elastic,” so no more paying for unused resources stranded on service contract.

Besides those direct cost reductions, this transition will free up the engineering resources now required to baby-sit the existing infrastructure so they can be re-tasked in other areas.

Netflix makes some compelling arguments here; it doesn’t take long for the dominoes to fall.  Wonder if other companies will realize the same thing and follow suit.  It would be prudent for Oracle and IBM to investigate what upgrade options exist to be more competitive with AWS and to prevent further customer turnover.

Sarah Rogers, January 2, 2010

Freebie

Chief Economist of Google Invents a Search Tool for Advertising

December 21, 2009

Most companies don’t have a chief economist. Google has a chief economist. The economist is Hal R. Varian.

Dr. Varian has good paper.

Dr. Varian worked on a Google team which includes other Google wizards. The invention is “search tool advertising”. Definitely clear. In prose any patent attorney would be proud to claim, US2009/0299816 says:

A content item is presented to at least one user via a first medium, where the content item identifies a target concept. The first medium can be, for instance, radio, television, print advertisements, or the Internet. The number of requests at a search tool for the target concept are measured subsequent to the presentation of the content item in the first medium. The difference (e.g, increase or decrease) in use of a second medium, e.g., the Internet, subsequent to the presentation of the content item in the first medium can be measured, which can be used to modify a value associated with a subsequent presentation of the content item using the first medium.

Speaks volumes, doesn’t it?

The diagrams are abstract. The claims, all 30 of them, make clear that the Google is moving forward with the use of semi autonomous agents to assemble content. Although focused on advertising, the “assembly” plumbing can be seen elsewhere in Google’s open source information. (You think I am going to list these co-occurrences in a free Web log? Wrong.)

Several points strike me as interesting:

  • The invention applies to text and other media; for example, television or radio
  • Metrics make the little method hum; that is, data from the system feed back and inform subsequent decisions the semi autonomous agents make
  • The use of the word “publisher” makes clear that the “digital Gutenberg” is alive and kicking. See [0030].
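
For the code-minded goslings, the feedback loop in the quoted abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method claimed in US2009/0299816: the function names, the idea of a fixed baseline query count, and the linear `sensitivity` adjustment are all hypothetical stand-ins for whatever the patent document actually covers.

```python
def search_lift(baseline_queries: int, post_queries: int) -> float:
    """Relative change in searches for the target concept after the
    content item ran in the first medium (radio, TV, print, and so on).

    Assumption: a single baseline count is a fair stand-in for
    "normal" search volume before the presentation.
    """
    if baseline_queries <= 0:
        raise ValueError("baseline_queries must be positive")
    return (post_queries - baseline_queries) / baseline_queries


def adjusted_value(current_value: float, lift: float,
                   sensitivity: float = 0.5) -> float:
    """Hypothetical rule: scale the value of the next presentation by a
    fraction of the measured lift, never letting it go below zero."""
    return max(0.0, current_value * (1.0 + sensitivity * lift))


if __name__ == "__main__":
    # 1,000 searches before the spot aired, 1,250 after.
    lift = search_lift(baseline_queries=1000, post_queries=1250)
    print(lift)                         # 0.25
    print(adjusted_value(100.0, lift))  # 112.5
```

The point of the sketch is the shape of the loop: measure the second medium, then feed the number back into the price or weight of the first. A real system would presumably need a sturdier baseline and some way to separate the ad's effect from ordinary noise in search volume.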

Stephen E. Arnold, December 21, 2009

Oyez, oyez, this is a freebie. I want to disclose this fact to the Economic Adjustment Office. Google competitors will have to make some adjustments due to Google economics. Where better to report and seek succor?
