
Short List of Image Search Tools

October 29, 2010

Short honk: One never knows when this type of list will be needed. “7 Image Search Tools That Will Change Your Life” provides descriptions, some screenshots, and links to seven image search tools. My life has not been changed, but a happy quack to Brain Pickings for the information. One example:

Retrievr at http://labs.systemone.at/retrievr/

Stephen E Arnold, October 29, 2010

Freebie

July Google Becoming Bing?

July 23, 2010

It seems that Google isn’t immune to adopting good ideas when it sees them in other places. Reading “Google Positively Bing-Like With New Image Search Capabilities”, we see that the company has updated its search technology to display over 1000 images on each page. It shows that even Google knows when it needs to change and keep moving ahead, and it shows that the company is not immune to influences from the likes of Microsoft.

There are other noteworthy changes and these include increasing the density of the search results page and the ability to get a bigger preview of an image by hovering a mouse over it.

Of course no changes would be complete without some kind of advertising friendly features as well. Hence the new image format called Google Search Ads. Still the Microsoft influence makes us wonder whatever happened to innovation at Google? Strange for a company where searches are the lifeblood.

I don’t like the endless page “thing”. Latency remains an issue with certain network connections. How about a button to reclaim the “old” image search? Better yet, do something original.

Stephen E Arnold, July 23, 2010

Google Probes the Underbelly of AutoCAD

October 15, 2009

Remember those college engineering wizards who wanted to build real things? Auto fenders, toasters, and buildings in Dubai. Chances are the weapon of choice was a software product from Autodesk. Over the years, Autodesk added features and functions to its core product and branched out into other graphic areas. In the end, Autodesk was held captive by the gravitational pull of AutoCAD.

In one of my Google monographs, I wrote about Google’s SketchUp program. I recall several people telling me that SketchUp was unknown to them. These folks, I must point out, were real, live Google experts. SketchUp was a blip on a handful of users’ radar screen. I took another angle of view, and I saw that the Google coveted the engineering wizards when they were in primary school and had a method for keeping these individuals in the Google camp until they designed their last, low-cost fastener for a green skyscraper in Shanghai.

No one really believed that this was possible.

My suggestion is that some effort may be prudently applied to rethinking what the Google is doing with engineering software that makes pictures and performs other interesting Googley tricks. The first step could be reading the Introducing Google Building Maker article on the “official” Google Web log. I would gently suggest that the readers of this Web log buy a copy of the Google trilogy, consisting of my three monographs about Google technology. Either path will give you some food for thought.

For me, the most interesting comment in the Google blog post was:

Some of us here at Google spend almost all of our time thinking about one thing: How do we create a three-dimensional model of every built structure on Earth? How do we make sure it’s accurate, that it stays current and that it’s useful to everyone who might want to use it? One of the best ways to get a big project done — and done well — is to open it up to the world. As such, today we’re announcing the launch of Google Building Maker, a fun and simple (and crazy addictive, it turns out) tool for creating buildings for Google Earth.

The operative phrase is “every built structure on Earth”. How is that for scale?

What about Autodesk? My view is that the company is going to find itself in the same position that Microsoft and Yahoo now occupy with regard to Google. Catch up is impossible. Leap frogging is the solution. I don’t think the company can make this type of leap. Just my opinion.

Stephen Arnold, October 15, 2009
Another freebie. Not even a lousy Google mouse pad for my efforts.

Oracle Taps Brainware

October 15, 2009

The Reuters story “Brainware Signs OEM Agreement with Oracle for Intelligent Data Extraction” caught me and probably the folks at ZyLAB and other content processing companies by surprise. Brainware and its patented trigram technology have created strong believers in some markets such as litigation support. But the company has been working to strengthen its content acquisition functionality as well. The idea is that paper and electronic information enter at one end and emerge searchable at the other. Oracle has been lagging in search. The Triple Hop technology has not taken center stage in my opinion. The Brainware deal seems to be for the content acquisition functions, what the news story calls “intelligent data capture”; that is, scanning and transforming functions plus entity extraction. Will Oracle embrace Brainware’s search and retrieval technology as well? Good question. Secure Enterprise Search needs some vitamins in my opinion. My hunch is that Oracle is beefing up its back end content intake system in order to deal with the increasingly successful Autonomy combine which continues to put pressure on big boys like Oracle. Brainware benefits from the publicity this tie up will produce. Search vendors, in my opinion, need this type of buzz to light up the radar of information technology professionals who too often focus on three or four search vendors, ignoring some interesting alternatives.

Stephen Arnold, October 14, 2009

Microsoft Visual Search

October 7, 2009

The Inquirer has a knack for innovation. The story “Microsoft Demos Visual Search” provides a good description of a forthcoming Microsoft service. Images are grouped in galleries, making it easy to spot a particular image. The Inquirer reported, “Once a gallery has been selected, the images can be filtered and sorted through a series of sub-categories based on the gallery.” The Inquirer pointed out that the demo included 40 topics including dog breeds. Everything was flowing smoothly. The Inquirer then pointed out that the service would be particularly useful in an image segment popular with some folks but not discussed in polite circles. Kudos to Microsoft and to the Inquirer for its product application savvy.

Stephen Arnold, October 7, 2009

Google and Image Recognition

June 29, 2009

Not content with sophisticated image compression, Google continues to press forward in image recognition. Face recognition surfaced about a year ago. You can get some background about that home-grown technology in “Identifying Images Using Face Recognition”, US2008/0130960, filed in December 2006. The company has a long history of interest in non text objects. If you are not familiar with it, Larry Page’s invention “Method for Searching Media”, US2004/0122811, was filed in 2003.

[Image: application of face recognition. Source: Neven Technologies, 2006]

The catalyst for the missing link between auto identified and processed images and assigning meaningful tags to images such as “animal” or “automobile” arrived via Google’s purchase of Neven Vision. (Originally, I think, the company used the “Eyematic” name. The switch seems to have taken place in 2003 or 2004.)

At that time, All Business described the company in this way:

Neven Vision purchased Eyematic’s assets in July 2003. Dr. Hartmut Neven, one of the world’s leading machine vision experts, led the technical team that created the original Eyematic system. Dr. Neven is also developing groundbreaking “next generation” face and object recognition technologies at USC’s Information Sciences Institute (ISI).

With the acquisition, Google snagged the Eyematic patent documents. These make interesting reading, and I direct your attention to “Face Recognition from Video Images”, US6301370, which seems to be part of the Neven technology suite. The US patent document is – ah, somewhat disjointed.

Mixing Picasa, home grown technology, and the image recognition technology from Neven, Google had the ingredients for tackling a tough problem in content processing; namely, answering the question, “What’s that a picture of?”

Google provided some information in June 2009. A summary of Google’s image initiative appeared in Silicon.com, which published “Google Gets a New Vision When It Comes to Pictures”. (Silicon.com points to CNet.com which originally ran the story.) Tom Krazit reported:

Google thinks it has made a breakthrough in “computer vision”. Imagine stumbling upon a picture of a beautiful landscape filled with ancient ruins, one you didn’t recognize at first glance while searching for holiday destinations online. Google has developed a way to let a person provide Google with the URL for that image and search a database of more than 40 million geotagged photos to match that image to verified landmarks, giving you a destination for that next trip. The project is still very much in the research stage, said Jay Yagnik, Google’s head of computer vision research.

For me the key point in the Silicon.com story was that Google used its “big data” approach to making headway in image recognition. When matched to technology evolving from the FERET program, Google can disrupt a potentially lucrative sector for some big government integration firms. The idea is that with lots of data, Google’s “smart software” can figure out what an image is about. Tapping Google’s clustering technology, engineers have processed Google’s Picasa image collection to assign meaningful semantic tags to digital objects that don’t contain text.


Web Site Search: More Confusion

May 1, 2009

Diane Sterling, e-Commerce Times, wrote a story that appeared in my newsreader as a MacNewsWorld.com story called “The Wide Open World of Web Site Search”.

You can find the article here. The write up profiles briefly several search systems; namely:

  • SLI Systems here. I think of this company as providing a product that makes it easy to display items from a catalog, find indexed items, and buy a product. The company has added a number of features over the years to deliver facets, related searches, and suggestions. In my mind, the product shares some of the features of EasyAsk, Endeca, and Mercado (now owned by Omniture), among others.
  • PicoSearch here is a hosted service, and I think of it as a vendor offering indexing in a way that resembles Blossom.com’s service (used on this Beyond Search Web log) or the “old” hosted service provided by Fast Search & Transfer prior to its acquisition by Microsoft. Google offers this type of search as well. Google’s Site Search makes it easy to plop a Google search box on almost any site, but the system does not handle structured content in the manner of SLI Systems, for example.
  • LTU Technologies here. I first encountered LTU when it was demonstrating its image processing technology. The company has moved from its government and investigative focus to e-commerce. The company’s core competency, in my view, is image and video processing. The system can identify visual similarity. A customer looking at a red sweater will be given an opportunity to look at other jacket-type products. No human has to figure out the visual similarity.

Now the article is fine but I was baffled by the use of the phrase “Web site search”. The idea I think is to provide the user with a “finding experience” that goes beyond key word searching. On that count, SLI and LTU are good examples for e-commerce (online shopping). PicoSearch is an outlier because it offers a hosted text centric search solution.

Another issue is that the largest provider of site search is our good pal Googzilla. Google does not rate a mention, and I think that is a mistake. Not only does Google make it possible to search structured data but the company offers its Site Search service. More information about Site Search is here.

These types of round up articles, in my opinion, confuse those looking for search solutions. What’s the fix? I think the write up should have made the focus on e-commerce in the title of the article and probably early in the write up included the words “e-commerce search”. Second, I think the companies profiled should have been ones who deliver e-commerce search functions. None of the profiled companies have a big footprint in the site search world that I track. This does not mean that the companies don’t have beefy revenue or satisfied customers. I think that the selection is off by 15 degrees and a bit of a fruit salad, not a plate of carrots.

Why do I care?

There is considerable confusion about search. There are significant differences between a search system for a text centric site and a search system for a structured information site such as an e-commerce site. One could argue that Endeca is a leader in e-commerce. That’s fine but most people don’t know this side of Endeca. The omission is confusing. The result, in my experience, is that the reader is confused. The procurement team is confused. And competitors are confused. Search is tough enough without having the worlds of image, text, and structured data scrambled unnecessarily.

Stephen Arnold, May 1, 2009

OpenText and Endeca Tie Up: Digital Asset Management Play

April 17, 2009

OpenText has a six pack of search systems. There’s the original Tim Bray SGML search system (either the first or one of the first), the Information Dimensions BASIS (structure plus analytics which we used for a Bellcore project eons ago), BRS Search (a rewrite of STAIRS III which I’m sure the newly minted search consultant who distributed a search methodology built on a taxonomy will have in depth expertise), the Fulcrum engine (sort of Windows centric with some interesting performance metrics), and a couple of others which may or may not be related to the ones I’ve named. Endeca is a privately held vendor of search and content processing technology. I like the Endeca system for ecommerce sites where the “guided navigation” can display related products. Endeca has been working overtime to develop a business intelligence revenue stream and probe new markets such as traditional library search. The company received an infusion of cash last year and I heard that the company had made strides in addressing both scaling and performance challenges. One reseller allegedly told a government procurement officer that Endeca had no significant limit on the volume of content that it could index and make findable.

So what are these two powerhouses doing?

According to Newsfactor here, the two companies are teaming up for digital asset reuse. Most organizations have an increasing amount of podcasts, videos, images, and other rich media. If you read my link tasty essay about content management (the mastodon) and the complexities of dealing with content objects in containers (tar pit), you know that there is an opportunity to go beyond search.

The Newsfactor story is called “Open Text, Endeca to Deliver Digital Asset Reuse”. My understanding of the Newsfactor version of the deal is that OpenText will integrate Endeca’s asset management system into OpenText content management systems. There are a number of product names in the write up, and I must confess I confuse them with one another. I am an old and addled goose.

What’s the implication of the tie up? I think that Autonomy’s push into asset management with its IDOL server and the Virage software has demonstrated that there’s money in those rich media objects that are proliferating like gerbils. The world of ediscovery has an asset twist as well. Videos and podcasts have to be located and analyzed either by software or a semi alert paralegal, maybe a junior lawyer. OpenText has a solid ediscovery practice, so there’s some opportunity there. In short, I think this tie up helps two established companies deal with a competitor who is aggressive and quicker to seize enterprise opportunities. Autonomy is a serious competitor.

What will Autonomy and other vendors do? I think that in this economic climate there will be several reactions to monitor. First, some aggressiveness on the part of Autonomy and probably Adobe will be quick to come. Second, other vendors of search and content processing systems will shift their marketing messages. A number of search systems have this capability and some, like Exalead, can make videos searchable with markers where particular passages can be viewed in the video object. This is quite useful. You can see a demo here. Third, I think that eDiscovery companies already adept at handling complex matters and content objects will become more price competitive. Stratify comes to mind as one outfit that may use price as a counter to the OpenText and Endeca tie up. I can point to start ups, aging me-too outfits like IBM, and a fair number of little known specialists in rich media who may step up their marketing.

This will be interesting to watch. OpenText is a bit like the old Ling Temco Vought type of roll up. Endeca is a solid vendor of search and content processing technology that was unable to pull off an initial public offering and is a recipient of cash infusions from Intel and SAP’s venture arm. The expectation is that one plus one will equal three. In today’s market, there’s a risk that a different outcome may result.

Stephen Arnold, April 17, 2009

Yidio Update

March 29, 2009

Quite a few readers have shown interest in Yidio, the video search system I wrote about here. A reader sent me a link to this interesting post on Quantcast. The site has shown strong traffic growth in the first two months of 2009. You can view the data here. What’s interesting is that the viewers of Yidio don’t favor YouTube.com, if the Quantcast data are accurate. Frankly I had not heard of most of the sites in the “Audience Also Visits” listing; for example, tvduck.com, although the name appeals greatly to this addled goose. TVDuck seemed to be quite YouTube.com centric which begged the question, “How dependent on YouTube.com are these services?”

A happy quack to the reader who pointed out that I did not mention that a videographer can make money by posting the content to Yidio. The procedure requires that the videographer provide his / her AdSense identification code. Click here for details.

Stephen Arnold, March 29, 2009

Flash Flex Silverlight

January 13, 2009

Search engines often stumble when indexing certain content types. I avoid Flash, Flex, and Silverlight myself, but there are 20 somethings who want to make my browser work like the local movie theatre. Here in Harrod’s Creek, Kentucky, we are getting new films every week or so. Most are still black and white. But the Flash, Flex, and Silverlight crowd goes for color, sound, and the big screen. Well, I should say that Flash and Flex go for the big screen. Silverlight lags, if the data presented by Rich Internet Application Statistics are correct. You can find the information here. The url is one that might be gone when you read this. The data point out that search vendors will be focusing on indexing Flash and Flex. Looking at the pie charts, the Adobe crowd has 90 percent penetration. Silverlight is chugging along in the 15 percent range. Well, the good news is that Microsoft Fast can probably index Silverlight content.

Stephen Arnold, January 13, 2009
