Google Probes the Underbelly of AutoCAD
October 15, 2009
Remember those college engineering wizards who wanted to build real things? Auto fenders, toasters, and buildings in Dubai. Chances are the weapon of choice was a software product from Autodesk. Over the years, Autodesk added features and functions to its core product and branched out into other graphic areas. In the end, Autodesk was held captive by the gravitational pull of AutoCAD.
In one of my Google monographs, I wrote about Google’s SketchUp program. I recall several people telling me that SketchUp was unknown to them. These folks, I must point out, were real, live Google experts. SketchUp was a blip on a handful of users’ radar screen. I took another angle of view, and I saw that the Google coveted the engineering wizards when they were in primary school and had a method for keeping these individuals in the Google camp until they designed their last, low-cost fastener for a green skyscraper in Shanghai.
No one really believed that this was possible.
My suggestion is that some effort may be prudently applied to rethinking what the Google is doing with engineering software that makes pictures and performs other interesting Googley tricks. The first step could be reading the Introducing Google Building Maker article on the “official” Google Web log. I would gently suggest that the readers of this Web log buy a copy of the Google trilogy, consisting of my three monographs about Google technology. Either path will give you some food for thought.
For me, the most interesting comment in the Google blog post was:
Some of us here at Google spend almost all of our time thinking about one thing: How do we create a three-dimensional model of every built structure on Earth? How do we make sure it’s accurate, that it stays current and that it’s useful to everyone who might want to use it? One of the best ways to get a big project done — and done well — is to open it up to the world. As such, today we’re announcing the launch of Google Building Maker, a fun and simple (and crazy addictive, it turns out) tool for creating buildings for Google Earth.
The operative phrase is “every built structure on Earth”. How is that for scale?
What about Autodesk? My view is that the company is going to find itself in the same position that Microsoft and Yahoo now occupy with regard to Google. Catch up is impossible. Leap frogging is the solution. I don’t think the company can make this type of leap. Just my opinion.
Stephen Arnold, October 15, 2009
Another freebie. Not even a lousy Google mouse pad for my efforts.
Oracle Taps Brainware
October 15, 2009
The Reuters story “Brainware Signs OEM Agreement with Oracle for Intelligent Data Extraction” caught me and probably the folks at ZyLAB and other content processing companies by surprise. Brainware and its patented trigram technology have created strong believers in some markets such as litigation support. But the company has been working to strengthen its content acquisition functionality as well. The idea is that paper and electronic information enter at one end and emerge searchable at the other. Oracle has been lagging in search. The Triple Hop technology has not taken center stage in my opinion. The Brainware deal seems to be for the content acquisition functions, what the news story calls “intelligent data capture”; that is, scanning and transforming functions plus entity extraction. Will Oracle embrace Brainware’s search and retrieval technology as well? Good question. Secure Enterprise Search needs some vitamins in my opinion. My hunch is that Oracle is beefing up its back end content intake system in order to deal with the increasingly successful Autonomy combine which continues to put pressure on big boys like Oracle. Brainware benefits from the publicity this tie up will produce. Search vendors, in my opinion, need this type of buzz to light up the radar of information technology professionals who too often focus on three or four search vendors, ignoring some interesting alternatives.
Stephen Arnold, October 14, 2009
Microsoft Visual Search
October 7, 2009
The Inquirer has a knack for innovation. The story “Microsoft Demos Visual Search” provides a good description of a forthcoming Microsoft service. Images are grouped in galleries, making it easy to spot a particular image. The Inquirer reported, “Once a gallery has been selected, the images can be filtered and sorted through a series of sub-categories based on the gallery.” The Inquirer pointed out that the demo included 40 topics including dog breeds. Everything was flowing smoothly. The Inquirer then pointed out that the service would be particularly useful in an image segment popular with some folks but not discussed in polite circles. Kudos to Microsoft and to the Inquirer for its product application savvy.
Stephen Arnold, October 7, 2009
Google and Image Recognition
June 29, 2009
Not content with sophisticated image compression, Google continues to press forward in image recognition. Face recognition surfaced about a year ago. You can get some background about that home-grown technology in “Identifying Images Using Face Recognition”, US2008/0130960, filed in December 2006. The company has a long history of interest in non text objects. If you are not familiar with it, Larry Page’s invention “Method for Searching Media”, US2004/0122811, was filed in 2003.
Source: Neven Technologies, 2006
The catalyst for the missing link between auto identified and processed images and assigning meaningful tags to images such as “animal” or “automobile” arrived via Google’s purchase of Neven Vision. (Originally, I think, the company used the “Eyematic” name. The switch seems to have taken place in 2003 or 2004.)
At that time, All Business described the company in this way:
Neven Vision purchased Eyematic’s assets in July 2003. Dr. Hartmut Neven, one of the world’s leading machine vision experts, led the technical team that created the original Eyematic system. Dr. Neven is also developing groundbreaking “next generation” face and object recognition technologies at USC’s Information Sciences Institute (ISI).
With the acquisition, Google snagged the Eyematic patent documents. These make interesting reading, and I direct your attention to “Face Recognition from Video Images”, US6301370, which seems to be part of the Neven technology suite. The US patent document is – ah, somewhat disjointed.
Mixing Picasa, home grown technology, and the image recognition technology from Neven, Google had the ingredients for tackling a tough problem in content processing; namely, answering the question, “What’s that a picture of?”
Google provided some information in June 2009. A summary of Google’s image initiative appeared in Silicon.com, which published “Google Gets a New Vision When It Comes to Pictures”. (Silicon.com points to CNet.com which originally ran the story.) Tom Krazit reported:
Google thinks it has made a breakthrough in “computer vision”. Imagine stumbling upon a picture of a beautiful landscape filled with ancient ruins, one you didn’t recognize at first glance while searching for holiday destinations online. Google has developed a way to let a person provide Google with the URL for that image and search a database of more than 40 million geotagged photos to match that image to verified landmarks, giving you a destination for that next trip. The project is still very much in the research stage, said Jay Yagnik, Google’s head of computer vision research.
For me the key point in the Silicon.com story was that Google used its “big data” approach to make headway in image recognition. When matched to technology evolving from the FERET program, Google can disrupt a potentially lucrative sector for some big government integration firms. The idea is that with lots of data, Google’s “smart software” can figure out what an image is about. Tapping Google’s clustering technology, engineers have processed Google’s Picasa image collection to assign meaningful semantic tags to digital objects that don’t contain text.
Web Site Search: More Confusion
May 1, 2009
Diane Sterling, e-Commerce Times, wrote a story that appeared in my newsreader as a MacNewsWorld.com story called “The Wide Open World of Web Site Search”. You can find the article here. The write up profiles briefly several search systems; namely:
- SLI Systems here. I think of this company as providing a product that makes it easy to display items from a catalog, find indexed items, and buy a product. The company has added a number of features over the years to deliver facets, related searches, and suggestions. In my mind, the product shares some of the features of EasyAsk, Endeca, and Mercado (now owned by Omniture), among others.
- PicoSearch here is a hosted service, and I think of it as a vendor offering indexing in a way that resembles Blossom.com’s service (used on this Beyond Search Web log) or the “old” hosted service provided by Fast Search & Transfer prior to its acquisition by Microsoft. Google offers this type of search as well. Google’s Site Search makes it easy to plop a Google search box on almost any site, but the system does not handle structured content in the manner of SLI Systems, for example.
- LTU Technologies here. I first encountered LTU when it was demonstrating its image processing technology. The company has moved from its government and investigative focus to e-commerce. The company’s core competency, in my view, is image and video processing. The system can identify visual similarity. A customer looking at a red sweater will be given an opportunity to look at other jacket-type products. No human has to figure out the visual similarity.
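For readers who want a feel for what “visual similarity” can mean in practice, here is a minimal sketch. It is not LTU’s method, which the article does not describe. It assumes the simplest possible approach: reduce each image to a coarse color histogram and rank catalog items by cosine similarity to the item the shopper is viewing. The function names, the bucket count, and the toy pixel data are all hypothetical.

```python
from math import sqrt


def color_histogram(pixels, bins_per_channel=4):
    """Count pixels per coarse RGB bucket, normalized so the counts sum to 1."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    if not pixels:
        return hist  # an empty image yields an all-zero histogram
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel ** 2
                 + (g // step) * bins_per_channel
                 + (b // step))
        hist[index] += 1.0
    return [count / len(pixels) for count in hist]


def cosine_similarity(a, b):
    """Cosine of the angle between two histograms; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_pixels, catalog):
    """Return catalog item names ordered from most to least similar in color."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (cosine_similarity(query_hist, color_histogram(pixels)), name)
        for name, pixels in catalog.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Toy "images": lists of (R, G, B) pixels invented for illustration.
red_sweater = [(200, 30, 40)] * 90 + [(240, 240, 240)] * 10
catalog = {
    "red jacket": [(210, 20, 35)] * 80 + [(250, 250, 250)] * 20,
    "blue jeans": [(30, 60, 180)] * 95 + [(240, 240, 240)] * 5,
    "green scarf": [(40, 170, 60)] * 100,
}

ranking = rank_by_similarity(red_sweater, catalog)
print(ranking)
```

With this toy data the red jacket ranks first because it shares the sweater’s dominant color bucket. A production system would presumably use far richer features such as shape and texture, but the point stands: no human has to tag the items.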
Now the article is fine but I was baffled by the use of the phrase “Web site search”. The idea I think is to provide the user with a “finding experience” that goes beyond key word searching. On that count, SLI and LTU are good examples for e-commerce (online shopping). PicoSearch is an outlier because it offers a hosted text centric search solution.
Another issue is that the largest provider of site search is our good pal Googzilla. Google does not rate a mention, and I think that is a mistake. Not only does Google make it possible to search structured data but the company offers its Site Search service. More information about Site Search is here.
These types of round up articles, in my opinion, confuse those looking for search solutions. What’s the fix? I think the write up should have made the focus on e-commerce in the title of the article and probably early in the write up included the words “e-commerce search”. Second, I think the companies profiled should have been ones who deliver e-commerce search functions. None of the profiled companies have a big footprint in the site search world that I track. This does not mean that the companies don’t have beefy revenue or satisfied customers. I think that the selection is off by 15 degrees and a bit of a fruit salad, not a plate of carrots.
Why do I care?
There is considerable confusion about search. There are significant differences between a search system for a text centric site and a search system for a structured information site such as an e-commerce site. One could argue that Endeca is a leader in e-commerce. That’s fine but most people don’t know this side of Endeca. The omission is confusing. The result, in my experience, is that the reader is confused. The procurement team is confused. And competitors are confused. Search is tough enough without having the worlds of image, text, and structured data scrambled unnecessarily.
Stephen Arnold, May 1, 2009
OpenText and Endeca Tie Up: Digital Asset Management Play
April 17, 2009
OpenText has a six pack of search systems. There’s the original Tim Bray SGML search system (either the first or one of the first), the Information Dimensions BASIS (structure plus analytics which we used for a Bellcore project eons ago), BRS Search (a rewrite of STAIRS III in which I’m sure the newly minted search consultant who distributed a search methodology built on a taxonomy will have in depth expertise), the Fulcrum engine (sort of Windows centric with some interesting performance metrics), and a couple of others which may or may not be related to the ones I’ve named. Endeca is a privately held vendor of search and content processing technology. I like the Endeca system for ecommerce sites where the “guided navigation” can display related products. Endeca has been working overtime to develop a business intelligence revenue stream and probe new markets such as traditional library search. The company received an infusion of cash last year and I heard that the company had made strides in addressing both scaling and performance challenges. One reseller allegedly told a government procurement officer that Endeca had no significant limit on the volume of content that it could index and make findable.
So what are these two powerhouses doing?
According to Newsfactor here, the two companies are teaming up for digital asset reuse. Most organizations have an increasing number of podcasts, videos, images, and other rich media. If you read my link tasty essay about content management (the mastodon) and the complexities of dealing with content objects in containers (tar pit), you know that there is an opportunity to go beyond search.
The Newsfactor story is called “Open Text, Endeca to Deliver Digital Asset Reuse”. My understanding of the Newsfactor version of the deal is that OpenText will integrate Endeca’s asset management system into OpenText content management systems. There are a number of product names in the write up, and I must confess I confuse them with one another. I am an old and addled goose.
What’s the implication of the tie up? I think that Autonomy’s push into asset management with its IDOL server and the Virage software has demonstrated that there’s money in those rich media objects that are proliferating like gerbils. The world of ediscovery has an asset twist as well. Videos and podcasts have to be located and analyzed either by software or a semi alert paralegal, maybe a junior lawyer. OpenText has a solid ediscovery practice, so there’s some opportunity there. In short, I think this tie up helps two established companies deal with a competitor who is aggressive and quicker to seize enterprise opportunities. Autonomy is a serious competitor.
What will Autonomy and other vendors do? I think that in this economic climate there will be several reactions to monitor. First, some aggressiveness on the part of Autonomy and probably Adobe will be quick to come. Second, other vendors of search and content processing systems will shift their marketing messages. A number of search systems have this capability and some, like Exalead, can make videos searchable with markers where particular passages can be viewed in the video object. This is quite useful. You can see a demo here. Third, I think that eDiscovery companies already adept at handling complex matters and content objects will become more price competitive. Stratify comes to mind as one outfit that may use price as a counter to the OpenText and Endeca tie up. I can point to start ups, aging me-too outfits like IBM, and a fair number of little known specialists in rich media who may step up their marketing.
This will be interesting to watch. OpenText is a bit like the old Ling Temco Vought type of roll up. Endeca is a solid vendor of search and content processing technology that was unable to pull off an initial public offering and is a recipient of cash infusions from Intel and SAP’s venture arm. The expectation is that one plus one will equal three. In today’s market, there’s a risk that a different outcome may result.
Stephen Arnold, April 17, 2009
Yidio Update
March 29, 2009
Quite a few readers have shown interest in Yidio, the video search system I wrote about here. A reader sent me a link to this interesting post on Quantcast. The site has shown strong traffic growth in the first two months of 2009. You can view the data here. What’s interesting is that the viewers of Yidio don’t favor YouTube.com, if the Quantcast data are accurate. Frankly I had not heard of most of the sites in the “Audience Also Visits” listing; for example, tvduck.com, although the name appeals greatly to this addled goose. TVDuck seemed to be quite YouTube.com centric which begged the question, “How dependent on YouTube.com are these services?”
A happy quack to the reader who pointed out that I did not mention that a videographer can make money by posting the content to Yidio. The procedure requires that the videographer provide his / her AdSense identification code. Click here for details.
Stephen Arnold, March 29, 2009
Flash Flex Silverlight
January 13, 2009
Search engines often stumble when indexing certain content types. I avoid Flash, Flex, and Silverlight myself, but there are 20 somethings who want to make my browser work like the local movie theatre. Here in Harrod’s Creek, Kentucky, we are getting new films every week or so. Most are still black and white. But the Flash, Flex, and Silverlight crowd goes for color, sound, and the big screen. Well, I should say that Flash and Flex go for the big screen. Silverlight trails, if the data presented by Rich Internet Application Statistics are correct. You can find the information here. The URL is one that might be gone when you read this. The data point out that search vendors will be focusing on indexing Flash and Flex. Looking at the pie charts, the Adobe crowd has 90 percent penetration. Silverlight is chugging along in the 15 percent range. Well, the good news is that Microsoft Fast can probably index Silverlight content.
Stephen Arnold, January 13, 2009
MSE360: Cooler than Cuil
January 6, 2009
I received an email from Daniel Clark. He provided me with some information about a new Web search engine, MSE360.com. I ran a number of test queries on the system and found it to be useful. The most interesting feature to me is what Mr. Clark calls “deep search”. He said:
We… have introduced Deep Search methods to try and provide the user with a notice when a site is known to host a valid privacy policy. Although this feature is still in beta and thus only a few million sites have been deep searched, the platform will in the end provide users with a way to decide what sites to trust.
When we do spot checks on some potentially useful but really low traffic Web sites like the National Railway Retirement Board, we have found that Google does not visit very often nor does the GOOG go much beyond three links deep. The key point, of course, is how often a Web indexing system pings a site to determine if there is new or changed information available. If you have a billion Web pages indexed and refresh only 10 percent of them, the index is not too useful. Other vendors only index sites that contribute to popular searches. This approach saves money and returns useless results unless one has the knack of searching what rings the bells of 15 year olds.
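To make the “three links deep” point concrete, here is a minimal sketch of a depth-limited crawl. It is an illustration under stated assumptions, not a description of how Google or MSE360.com actually crawls. The `crawl` function, the `max_depth` parameter, and the toy site map are all hypothetical.

```python
from collections import deque


def crawl(link_graph, start, max_depth):
    """Breadth-first walk that stops following links past max_depth clicks from start.

    Returns a dict mapping each reached page to its click depth.
    """
    depth_of = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth_of[page] == max_depth:
            continue  # page is indexed, but its outgoing links are not followed
        for target in link_graph.get(page, []):
            if target not in depth_of:
                depth_of[target] = depth_of[page] + 1
                queue.append(target)
    return depth_of


# Invented site map: each page lists the pages it links to.
site = {
    "home": ["about", "forms"],
    "about": ["staff"],
    "forms": ["form-list"],
    "form-list": ["form-detail"],
    "form-detail": ["form-pdf"],
}

reached = crawl(site, "home", max_depth=3)
all_pages = set(site) | {t for targets in site.values() for t in targets}
missed = sorted(all_pages - set(reached))
print(missed)
```

In this toy site the document four clicks from the home page never gets indexed. The refresh arithmetic is just as blunt: refreshing 10 percent of a billion pages leaves 900 million entries that may be stale.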
MSE360.com wants to change these practices. The engine also beeps when it visits a site with a virus. I was able to find a site that would inject trojans and MSE360.com did not squawk. The system is new, and I think its virus alert will improve. The company also wants to protect users’ privacy. Google does this too, and until I see how the company grows, I applaud MSE360.com’s privacy initiative, but policies can change. You can generate tag clouds which show some of the popular searches on the system.
I ran a query for my Web log Beyond Search. We pop up on the results list but not in the top spot. No problem on my end. You can see from the screen shot below, that MSE360.com presents hits from Wikipedia, Web logs, traditional results in the middle panel, and images on the right hand panel. I was not able to run an image search, but I did not dig into the advanced search options very deeply. You can see more results by clicking a relatively tiny hot link at the bottom of the very dense results page.
Mr Clark said:
We wanted to allow users to get the most out of their time, so in turn we designed the 3 tier layout. This layout allows for the user to get images, blogs, Wikipedia and web results, all on one page. When we polled 250 random Internet users over 70% said they preferred the layout over Yahoo. Of course the other 30% didn’t!
I found the system useful. Check it out. I will keep my eye on the service. I don’t have substantive information about funding and other basic facts. When I get them, I will pass them along.
Stephen Arnold, January 6, 2009
BBC: Search Is a Backwater
September 27, 2008
I just read a quite remarkable essay by a gentleman named Richard Titus, Controller, User Experience & Design for BBC Future Media & Technology. (I like the word controller.) I am still confused by the time zone zipping I have experienced in the past seven days. At this moment in time, I don’t recall if I have met Mr. Titus or if I have read other writings by him. What struck me is that he was a keynote at a BBC Future Media & Technology Conference. My first reaction is that to learn the future a prestigious organization like the BBC might have turned toward the non-BBC world. The Beeb disagreed and looked to its staff to illuminate the gloomy passages of Christmas Yet to Come. You can read this essay “Search and Content Discovery” here. In fact, you must read it.
With enthusiasm I read the essay. Several points flew from the page directly into the dead letter office of my addled goose brain. There these hot little nuggets sat until I could approach them in safety. Here are the points that cooked my thinking:
- Key word search is brute force search.
- Yahoo BOSS is a way to embrace and extend search.
- The Xoogler Cuil.com system looked promising but possibly disappoints.
- Viewdle facial recognition software is prescient. (This is an outfit hooked up with Thomson Reuters, known for innovation by chasing markets before the revenue base crumbles away. I don’t associate professional publishers with innovation, however.)
- Naver from Korea is a super electronic game portal.
- Mahalo is a human-mediated system and also interesting, and the BBC has a topics page which also looks okay.
- SearchMe, also built by Xooglers, uses a flash-based interface.
Xooglers are inspired by Apple’s cover flow. Now how many hits did my query “beyond search” get? Can your father figure out how to view the next hit or make this one large enough to read? A brute force way to get information, of course.
These points were followed by this statement:
When you marry solid data and indexing (everyone forgets that Google’s code base is almost ten years old), useful new data points (facial recognition, behavioral targeting, historical precedent, trust, etc) with a compelling and useful user experience, we may see some changes in the market leadership of search.
I would like to comment on each of these points:


