Microsoft Visual Search

October 7, 2009

The Inquirer has a knack for innovation. The story “Microsoft Demos Visual Search” provides a good description of a forthcoming Microsoft service. Images are grouped in galleries, making it easy to spot a particular image. The Inquirer reported, “Once a gallery has been selected, the images can be filtered and sorted through a series of sub-categories based on the gallery.” The Inquirer pointed out that the demo included 40 topics including dog breeds. Everything was flowing smoothly. The Inquirer then pointed out that the service would be particularly useful in an image segment popular with some folks but not discussed in polite circles. Kudos to Microsoft and to the Inquirer for its product application savvy.

Stephen Arnold, October 7, 2009

Google and Image Recognition

June 29, 2009

Not content with sophisticated image compression, Google continues to press forward in image recognition. Face recognition surfaced about a year ago. You can get some background about that home-grown technology in “Identifying Images Using Face Recognition”, US2008/0130960, filed in December 2006. The company has a long history of interest in non text objects. If you are not familiar with it, Larry Page’s invention “Method for Searching Media”, US2004/0122811, was filed in 2003.

app of face recognition

Source: Neven Technologies, 2006

The catalyst for the missing link between auto identified and processed images and assigning meaningful tags to images such as “animal” or “automobile” arrived via Google’s purchase of Neven Vision. (Originally, I think, the company used the “Eyematic” name. The switch seems to have taken place in 2003 or 2004.)

At that time, All Business described the company in this way:

Neven Vision purchased Eyematic’s assets in July 2003. Dr. Hartmut Neven, one of the world’s leading machine vision experts, led the technical team that created the original Eyematic system. Dr. Neven is also developing groundbreaking “next generation” face and object recognition technologies at USC’s Information Sciences Institute (ISI).

Google snagged with the acquisition the Eyematic patent documents. These make interesting reading, and I direct your attention to “Face Recognition from Video Images”, US6301370, which seems to be part of the Neven technology suite. The US patent document is – ah, somewhat disjointed.

Mixing Picasa, home grown technology, and the image recognition technology from Neven, Google had the ingredients for tackling a tough problem in content processing; namely, answering the question, “What’s that a picture of?”

Google provided some information in June 2009. A summary of Google’s image initiative appeared in Silicon.com, which published “Google Gets a New Vision When It Comes to Pictures”. (Silicon.com points to CNet.com which originally ran the story.) Tom Krazit reported:

Google thinks it has made a breakthrough in “computer vision”. Imagine stumbling upon a picture of a beautiful landscape filled with ancient ruins, one you didn’t recognize at first glance while searching for holiday destinations online. Google has developed a way to let a person provide Google with the URL for that image and search a database of more than 40 million geotagged photos to match that image to verified landmarks, giving you a destination for that next trip. The project is still very much in the research stage, said Jay Yagnik, Google’s head of computer vision research.

For me the key point in the Silicon.com story was that Google used its “big data” approach to making headway in image recognition. When matched to technology evolving from the FERET program, Google can disrupt a potentially lucrative sector for some big government integration firms. The idea is that with lots of data, Google’s “smart software” can figure out what an image is about. Tapping Google’s clustering technology, Google’s Picasa image collection has been processed by engineers to assign meaningful semantic tags to digital objects that don’t contain text.


Web Site Search: More Confusion

May 1, 2009

Diane Sterling, e-Commerce Times, wrote a story that appeared in my newsreader as a MacNewsWorld.com story called “The Wide Open World of Web Site Search”. You can find the article here. The write up profiles briefly several search systems; namely:

  • SLI Systems here. I think of this company as providing a product that makes it easy to display items from a catalog, find indexed items, and buy a product. The company has added a number of features over the years to deliver facets, related searches, and suggestions. In my mind, the product shares some of the features of EasyAsk, Endeca, and Mercado (now owned by Omniture), among others.
  • PicoSearch here is a hosted service, and I think of it as a vendor offering indexing in a way that resembles Blossom.com’s service (used on this Beyond Search Web log) or the “old” hosted service provided by Fast Search & Transfer prior to its acquisition by Microsoft. Google offers this type of search as well. Google’s Site Search makes it easy to plop a Google search box on almost any site, but the system does not handle structured content in the manner of SLI Systems, for example.
  • LTU Technologies here. I first encountered LTU when it was demonstrating its image processing technology. The company has moved from its government and investigative focus to e-commerce. The company’s core competency, in my view, is image and video processing. The system can identify visual similarity. A customer looking at a red sweater will be given an opportunity to look at other jacket-type products. No human has to figure out the visual similarity.
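For readers who want a feel for how software can spot visual similarity without a human in the loop, here is a minimal sketch. It is not LTU's method, which the article does not describe. It assumes a simple coarse color histogram per image and cosine similarity between histograms; the function names, the toy "images", and the catalog are all hypothetical.

```python
from collections import Counter
from math import sqrt

def color_histogram(pixels, levels=4):
    """Count pixels per coarse RGB bucket; `pixels` is a list of (r, g, b) tuples, 0-255."""
    step = 256 // levels
    return Counter((r // step, g // step, b // step) for r, g, b in pixels)

def cosine_similarity(h1, h2):
    """Cosine similarity between two sparse histograms (1.0 means the same color mix)."""
    dot = sum(h1[k] * h2[k] for k in h1.keys() & h2.keys())
    norm1 = sqrt(sum(v * v for v in h1.values()))
    norm2 = sqrt(sum(v * v for v in h2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

def most_similar(query_pixels, catalog):
    """Rank catalog items (name -> pixels) from most to least similar to the query."""
    q = color_histogram(query_pixels)
    scored = [(cosine_similarity(q, color_histogram(p)), name) for name, p in catalog.items()]
    return sorted(scored, reverse=True)

# Hypothetical toy "images": ten pixels each, mostly one color plus some white
red_sweater = [(200, 10, 10)] * 8 + [(250, 250, 250)] * 2
catalog = {
    "red jacket": [(210, 20, 15)] * 7 + [(240, 240, 240)] * 3,
    "blue jeans": [(10, 20, 200)] * 9 + [(250, 250, 250)] * 1,
}
ranking = most_similar(red_sweater, catalog)
print(ranking[0][1])  # the closest catalog item by color mix
```

A production system would look at shape and texture as well as color, but the ranking step works the same way: score every catalog item against the query and sort.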

Now the article is fine but I was baffled by the use of the phrase “Web site search”. The idea I think is to provide the user with a “finding experience” that goes beyond key word searching. On that count, SLI and LTU are good examples for e-commerce (online shopping). PicoSearch is an outlier because it offers a hosted text centric search solution.

Another issue is that the largest provider of site search is our good pal Googzilla. Google does not rate a mention, and I think that is a mistake. Not only does Google make it possible to search structured data but the company offers its Site Search service. More information about Site Search is here.

These types of round up articles, in my opinion, confuse those looking for search solutions. What’s the fix? I think the write up should have made the focus on e-commerce in the title of the article and probably early in the write up included the words “e-commerce search”. Second, I think the companies profiled should have been ones who deliver e-commerce search functions. None of the profiled companies have a big footprint in the site search world that I track. This does not mean that the companies don’t have beefy revenue or satisfied customers. I think that the selection is off by 15 degrees and a bit of a fruit salad, not a plate of carrots.

Why do I care?

There is considerable confusion about search. There are significant differences between a search system for a text centric site and a search system for a structured information site such as an e-commerce site. One could argue that Endeca is a leader in e-commerce. That’s fine but most people don’t know this side of Endeca. The omission is confusing. The result, in my experience, is that the reader is confused. The procurement team is confused. And competitors are confused. Search is tough enough without having the worlds of image, text, and structured data scrambled unnecessarily.

Stephen Arnold, May 1, 2009

OpenText and Endeca Tie Up: Digital Asset Management Play

April 17, 2009

OpenText has a six pack of search systems. There’s the original Tim Bray SGML search system (either the first or one of the first), the Information Dimensions BASIS (structure plus analytics which we used for a Bellcore project eons ago), BRS Search (a rewrite of STAIRS III which I’m sure the newly minted search consultant who distributed a search methodology built on a taxonomy will have in depth expertise), the Fulcrum engine (sort of Windows centric with some interesting performance metrics), and a couple of others which may or may not be related to the ones I’ve named. Endeca is a privately held vendor of search and content processing technology. I like the Endeca system for ecommerce sites where the “guided navigation” can display related products. Endeca has been working overtime to develop a business intelligence revenue stream and probe new markets such as traditional library search. The company received an infusion of cash last year and I heard that the company had made strides in addressing both scaling and performance challenges. One reseller allegedly told a government procurement officer that Endeca had no significant limit on the volume of content that it could index and make findable.

So what are these two powerhouses doing?

According to Newsfactor here, the two companies are teaming up for digital asset reuse. Most organizations have an increasing amount of podcasts, videos, images, and other rich media. If you read my link tasty essay about content management (the mastodon) and the complexities of dealing with content objects in containers (tar pit), you know that there is an opportunity to go beyond search.

The Newsfactor story is called “Open Text, Endeca to Deliver Digital Asset Reuse”. My understanding of the Newsfactor version of the deal is that OpenText will integrate Endeca’s asset management system into OpenText content management systems. There are a number of product names in the write up, and I must confess I confuse them with one another. I am an old and addled goose.

What’s the implication of the tie up? I think that Autonomy’s push into asset management with its IDOL server and the Virage software has demonstrated that there’s money in those rich media objects that are proliferating like gerbils. The world of ediscovery has an asset twist as well. Videos and podcasts have to be located and analyzed either by software or a semi alert paralegal, maybe a junior lawyer. OpenText has a solid ediscovery practice, so there’s some opportunity there. In short, I think this tie up helps two established companies deal with a competitor who is aggressive and quicker to seize enterprise opportunities. Autonomy is a serious competitor.

What will Autonomy and other vendors do? I think that in this economic climate there will be several reactions to monitor. Some aggressiveness on the part of Autonomy and probably Adobe will be quick to come. Second, other vendors of search and content processing systems will shift their marketing messages. A number of search systems have this capability and some, like Exalead, can make videos searchable with markers where particular passages can be viewed in the video object. This is quite useful. You can see a demo here. Third, I think that eDiscovery companies already adept at handling complex matters and content objects will become more price competitive. Stratify comes to mind as one outfit that may use price as a counter to the OpenText and Endeca tie up. I can point to start ups, aging me-too outfits like IBM, and a fair number of little known specialists in rich media who may step up their marketing.

This will be interesting to watch. OpenText is a bit like the old Ling Temco Vought type of roll up. Endeca is a solid vendor of search and content processing technology that was unable to pull off an initial public offering and a recipient of cash infusions from Intel and SAP’s venture arm. The expectation is that one plus one will equal three. In today’s market, there’s a risk that a different outcome may result.

Stephen Arnold, April 17, 2009

Yidio Update

March 29, 2009

Quite a few readers have shown interest in Yidio, the video search system I wrote about here. A reader sent me a link to this interesting post on Quantcast. The site has shown strong traffic growth in the first two months of 2009. You can view the data here. What’s interesting is that the viewers of Yidio don’t favor YouTube.com, if the Quantcast data are accurate. Frankly I had not heard of most of the sites in the “Audience Also Visits” listing; for example, tvduck.com, although the name appeals greatly to this addled goose. TVDuck seemed to be quite YouTube.com centric which begged the question, “How dependent on YouTube.com are these services?”

A happy quack to the reader who pointed out that I did not mention that a videographer can make money by posting the content to Yidio. The procedure requires that the videographer provide his / her AdSense identification code. Click here for details.

Stephen Arnold, March 29, 2009

Flash Flex Silverlight

January 13, 2009

Search engines often stumble when indexing certain content types. I avoid Flash, Flex, and Silverlight myself, but there are 20 somethings who want to make my browser work like the local movie theatre. Here in Harrod’s Creek, Kentucky, we are getting new films every week or so. Most are still black and white. But the Flash, Flex, and Silverlight crowd goes for color, sound, and the big screen. Well, I should say that Flash and Flex go for the big screen. Silverlight not so much, if the data presented by Rich Internet Application Statistics are correct. You can find the information here. The url is one that might be gone when you read this. The data point out that search vendors will be focusing on indexing Flash and Flex. Looking at the pie charts, the Adobe crowd has 90 percent penetration. Silverlight is chugging along in the 15 percent range. Well, the good news is that Microsoft Fast can probably index Silverlight content.

Stephen Arnold, January 13, 2009


MSE360: Cooler than Cuil

January 6, 2009

I received an email from Daniel Clark. He provided me with some information about a new Web search engine, MSE360.com. I ran a number of test queries on the system and found it to be useful. The most interesting feature to me is what Mr. Clark calls “deep search”. He said:

We… have introduced Deep Search methods to try and provide the user with a notice when a site is known to host a valid privacy policy. Although this feature is still in beta and thus only a few million sites have been deep searched, the platform will in the end provide users with a way to decide what sites to trust.

When we do spot checks on some potentially useful but really low traffic Web sites like the National Railway Retirement Board, we have found that Google does not visit very often nor does the GOOG go much beyond three links deep. The key point, of course, is how often a Web indexing system pings a site to determine if there is new or changed information available. If you have a billion Web pages indexed and refresh only 10 percent of them, the index is not too useful. Other vendors only index sites that contribute to popular searches. This approach saves money and returns useless results unless one has the knack of searching what rings the bells of 15 year olds.
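The refresh arithmetic is easy to check for yourself. The sketch below assumes a hypothetical crawler that revisits a fixed fraction of its index in each crawl cycle and works through the list in round-robin order; the function name and the notion of a "cycle" are my assumptions, not anything a vendor has disclosed.

```python
def refresh_stats(indexed_pages, refresh_fraction):
    """Return (pages refreshed per cycle, pages left untouched, cycles for one full pass)."""
    refreshed = round(indexed_pages * refresh_fraction)
    untouched = indexed_pages - refreshed
    cycles_for_full_pass = 1 / refresh_fraction  # assumes round-robin revisits
    return refreshed, untouched, cycles_for_full_pass

# The example from the text: a billion pages, 10 percent refreshed
refreshed, untouched, cycles = refresh_stats(1_000_000_000, 0.10)
print(refreshed, untouched, cycles)
```

Under these assumptions, nine of every ten pages are skipped in a given cycle, and a low traffic page waits about ten cycles between visits.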

MSE360.com wants to change these practices. The engine also beeps when it visits a site with a virus. I was able to find a site that would inject trojans and MSE360.com did not squawk. The system is new, and I think its virus alert will improve. The company also wants to protect users’ privacy. Google does this too, and until I see how the company grows, I applaud MSE360.com’s privacy initiative, but policies can change. You can generate tag clouds which show some of the popular searches on the system.

I ran a query for my Web log Beyond Search. We pop up on the results list but not in the top spot. No problem on my end. You can see from the screen shot below, that MSE360.com presents hits from Wikipedia, Web logs, traditional results in the middle panel, and images on the right hand panel. I was not able to run an image search, but I did not dig into the advanced search options very deeply. You can see more results by clicking a relatively tiny hot link at the bottom of the very dense results page.

mse360 screen

Mr Clark said:

We wanted to allow users to get the most out of there time, so in turn we designed the 3 tier layout. This layout allows for the user to get images, blogs, Wikipedia and web results, all on one page. When we polled 250 random Internet users over 70% said they preferred the layout over Yahoo. Of course the other 30% didn’t!

I found the system useful. Check it out. I will keep my eye on the service. I don’t have substantive information about funding and other basic facts. When I get them, I will pass them along.

Stephen Arnold, January 6, 2009

BBC: Search Is a Backwater

September 27, 2008

I just read a quite remarkable essay by a gentleman named Richard Titus, Controller, User Experience & Design for BBC Future Media & Technology. (I like the word controller.) I am still confused by the time zone zipping I have experienced in the past seven days. At this moment in time, I don’t recall if I have met Mr. Titus or if I have read other writings by him. What struck me is that he was a keynote at a BBC Future Media & Technology Conference. My first reaction is that to learn the future a prestigious organization like the BBC might have turned toward the non-BBC world. The Beeb disagreed and looked to its staff to illuminate the gloomy passages of Christmas Yet to Come. You can read this essay “Search and Content Discovery” here. In fact, you must read it.

With enthusiasm I read the essay. Several points flew from the page directly into the dead letter office of my addled goose brain. There these hot little nuggets sat until I could approach them in safety. Here are the points that cooked my thinking:

  1. Key word search is brute force search.
  2. Yahoo BOSS is a way to embrace and extend search.
  3. The Xoogler Cuil.com system looked promising but possibly disappoints.
  4. Viewdle facial recognition software is prescient. (This is an outfit hooked up with Thomson Reuters, known for innovation by chasing markets before the revenue base crumbles away. I don’t associate professional publishers with innovation, however.)
  5. Naver from Korea is a super electronic game portal.
  6. Mahalo is a human-mediated system and also interesting, and the BBC has a topics page which also looks okay.
  7. SearchMe, also built by Xooglers, uses a Flash-based interface.

searchmeresults

Xooglers are inspired by Apple’s cover flow. Now how many hits did my query “beyond search” get? Can your father figure out how to view the next hit or make this one large enough to read? A brute force way to get information, of course.

These points were followed by this statement:

When you marry solid data and indexing (everyone forgets that Google’s code base is almost ten years old), useful new data points (facial recognition, behavioral targeting, historical precedent, trust, etc) with a compelling and useful user experience, we may see some changes in the market leadership of search.

I would like to comment on each of these points:


VideoSurf: Video Metasearch

September 23, 2008

I received an invitation to preview VideoSurf, a video metasearch provider, based in San Mateo, California. I tested the system whilst recovering from my wonderful Northwest Airlines flight from Europe to the US of A. When I fired up my laptop with the high speed Verizon service, I couldn’t get the video to run. When I switched to a high speed connection in my office, the search results were snappy and the videos I viewed ran without a hitch. Nice high speed network, Verizon.

The system offers a number of useful features:

  • When I misspelled Google, the system offered a “did you mean” to fix up my lousy typing
  • A handy checkbox in the left hand column allowed me to exclude certain video sites from the query. I noticed that the “world’s largest video search engine” Blinkx was not included.
  • There’s a porn and no porn filter, which you can use to turn on porn. However, when I ran my test query “teen dancing” on the non-porn setting, I got some pretty exciting videos in my result set. I was too tired to watch more than a few seconds of gyrations to conclude that the non porn filter needs some fine tuning.
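A “did you mean” feature like the one in the first bullet can be approximated in a few lines. This is a sketch only. VideoSurf has not said how its suggestion works, so the example assumes a small known vocabulary and fuzzy string matching with Python's standard `difflib`; the `did_you_mean` helper, the vocabulary, and the 0.8 cutoff are hypothetical.

```python
from difflib import get_close_matches

def did_you_mean(query, vocabulary, cutoff=0.8):
    """Suggest the closest known term for a misspelled query, or None if no fix applies."""
    term = query.lower()
    if term in vocabulary:
        return None  # spelled correctly, nothing to suggest
    matches = get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical vocabulary of terms the engine already knows
vocabulary = ["google", "microsoft", "yahoo", "blinkx"]
suggestion = did_you_mean("Gogle", vocabulary)
print(suggestion)
```

A real engine would draw its vocabulary from query logs and weight suggestions by popularity, but the lookup step looks much like this.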

VideoSurf analyzes the contents of video. Most video search engines work with metadata and closed caption information. Googzilla, not surprisingly, has introduced its own technology to index the audio content of files. For now, I thought VideoSurf was useful for general purpose queries. I did not find it as helpful for locating Google lectures at universities or for pinpointing presentations given at various Microsoft events. But it’s early days for the service.

videosurf screen bill gates

This is what I saw when I ran my test query “Bill Gates”.

The company says here:

VideoSurf has created a better way for users to search, discover and watch online videos. Using a unique combination of new computer vision and fast computation methods, VideoSurf has taught computers to “see” inside videos to find content in a fast, efficient, and scalable way. Basing its search on visual identification, rather than text only, VideoSurf’s computer vision video search engine provides more relevant results and a better experience to let users find and discover the videos they really want to watch. With over 10 billion (and rapidly growing!) visual moments indexed from videos found across the web, VideoSurf allows consumers to visually navigate through their results to easily find the specific scenes, people or moments they most want to see. Users can now spend less time searching and more time being entertained! VideoSurf was founded in 2006 by leading experts in search, computer vision and fast computation technology and aims to become the destination for users looking to find, discover and watch online videos. The company is based in San Mateo, California.

The company was founded by Lior Delgo of FareChase.com fame. The technical honcho is Achi Brandt, who is a certified math whiz. The rest of the company’s management team is here.

The service merits a closer look.

Stephen Arnold, September 23, 2008

tyBit: Zero Click Fraud

September 11, 2008

I’m getting my bobbin loaded and squirting the trestle on my sewing machine. It’s almost time for Project Runway, my favorite television show. I put down my can of 3 in 1 oil and scanned my newsreader for gewgaws. What did I see? A story in the prestigious Forbes Magazine about a new search engine called tyBit. I put down my bobbin and picked up my mouse. The paragraph in the Business Wire story on the lofty Forbes.com’s Web site said here:

tyBit is the only Internet search solution that eliminates click fraud for its advertisers and provides itemized billing for all advertising dollars spent. It is also a no-cost private label search engine for traditional media so they can win back their advertisers, subscribers and revenue.

I navigated to the tyBit Web site, which was new to me, and saw this splash page complete with my tyBit “man”.

tybit splash

I ran my favorite query “ArnoldIT Google” and received this list of results:

arnoldit query

I was happy. The first hit pointed to something I had written.

I then ran an image search on the query “arnoldit” and saw this:

image search

There I was in bunny rabbit ears in 1981 and in 2007 with my lifetime achievement award for ineptitude. Happy again.

But I clicked on the ad with the label “Get free advertising now.” I don’t do advertising. No one hires me anyway. I clicked on the ad, hit back, and then clicked again. What do you know? Click fraud; that is, the click with no intent to buy. In fact, I did it seven or eight times until I decided that the zero click fraud assertion did not apply to house ads on queries about “ArnoldIT Google.”
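How might a system discount my seven or eight clicks? tyBit does not explain its technology in the material I read, so what follows is a guess at the simplest possible approach: bill a given visitor's click on a given ad only once per time window. The function name, the one hour window, and the visitor and ad identifiers are hypothetical.

```python
def billable_clicks(clicks, window_seconds=3600):
    """Keep one click per (visitor, ad) per window; `clicks` holds (timestamp, visitor, ad)."""
    last_billed = {}  # (visitor, ad) -> timestamp of the last click that was billed
    billed = []
    for ts, visitor, ad in sorted(clicks):
        key = (visitor, ad)
        if key not in last_billed or ts - last_billed[key] >= window_seconds:
            billed.append((ts, visitor, ad))
            last_billed[key] = ts
    return billed

# Eight rapid clicks from one visitor, plus one click from someone else
clicks = [(t, "goose", "free-advertising") for t in range(0, 80, 10)]
clicks.append((5, "other-visitor", "free-advertising"))
result = billable_clicks(clicks)
print(len(result))  # number of clicks that would be billed
```

A filter this crude is easy to dodge by rotating identities, which is one reason a claim of zero click fraud deserves a test.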

The site indexes images, video, news, music, “local” and “shop”. I found a line to sign up for tyBit mail. I did not click on each of these links. Project Runway awaits. The Forbes.com write up provides some metrics about the company:

  • More than 6,000 advertisers test the click fraud technology
  • The site averages 2.1 million searches per day and 50 million searches in August 2008
  • One advertiser got more than 40 leads.

Sounds good. My suggestion is to read the Forbes.com write up, explore the tyBit site here, and make up your mind. Google’s dominance does not seem to intimidate Clarence Briggs, CEO of tyBit. I have added this company to my watch list. Lots of search innovation out there right now there is, there is.

Stephen Arnold, September 11, 2008
