Google OS: Nightmare in Redmond
March 16, 2009
ComputerWorld’s Steven J. Vaughan-Nichols reported here “Google OS Will Be on Netbooks by Year’s End”. Google has insisted that it does not have a Google operating system. I believed what I was told and used the phrase “Google operating environment.” Now it seems that I was dead wrong, if Mr. Vaughan-Nichols’ report is accurate. He wrote:
I predict that by December [2009], we’ll see not only Asus selling Android-based netbooks, but at least a half-dozen other vendors doing so as well. In bad times, businesses have to be smart, and Android on netbooks is a smart move indeed.
Google, of course, remains inscrutable. The company provides the Lego blocks. Google lets others in the playroom build whatever they want. A Google OS would add to Microsoft’s revenue concerns. If this ComputerWorld report is on the money, a nightmare in Redmond may await–low cost, search, contextualized ads, and good enough software with the cloud as a big fluffy cushion.
Stephen Arnold, March 15, 2009
Searching Microblog Content
March 16, 2009
The disruptive force of the flakey Twitter service continues. In case you have been hanging out with the goslings in Harrod’s Creek, Kentucky, Twitter is the somewhat unstable, rapidly growing, money losing micro blogging service. A micro blog is a text message that is short, less than 140 characters if my addled goose memory is working this morning. Who cares about Twitter? Young people and the young at heart have tons of fun firing out text bullets in real time to anyone with a Twitter account. Unlike email which in theory is sort of a one to one communication, Twitter is spam fortified. Any post gets blasted to anyone with a Twitter account. To filter the stuff, one can “follow” a Twitter user. Dozens of utilities ranging from the silly to the stalker inspired are available.
I found the March 13, 2009, article “Microblogging Will Marginalize Corporate Email” here quite interesting. The idea is that microblogging is a disruptive technology. Over time, its utility will increase, particularly for “notifications” and certain types of marketing functions. I don’t disagree. If you are a Twitter watcher, you will want to save a copy of “I’m Not Actually a Geek’s” article. Ignoring Twitter as a source of useful intelligence is an oversight. Searching and generating knowledge from a Twitter stream remains an interesting challenge. I don’t think Twitter has a solution. Further, I don’t think any of the vendors whose software I monitor has a solution. Big opportunities in my opinion.
Stephen Arnold, March 15, 2009
EveryZing: Exclusive Interview with Tom Wilde, CEO
March 16, 2009
Tom Wilde, CEO of EveryZing, will be one of the speakers at the April 2009 Boston Search Engine Meeting. To meet innovators like Mr. Wilde, click here and reserve your space. Unlike “boat show” conferences that thrive on walk in gawkers, the Boston Search Engine Meeting is content muscle. Click here to reserve your spot.
EveryZing here is a “universal search and video SEO (vSEO)” firm, and it recently launched MediaCloud, the Internet’s first cloud-based computing service for generating and managing metadata. Considered the “currency” of multimedia content, metadata includes the speech transcripts, time-stamped tags, categories/topics, named entities, geo-location and tagged thumbnails that comprise the backbone of the interactive web.
With MediaCloud, companies across the Web can post live or archived feeds of video, audio, image and text content to the cloud-based service and receive back a rich set of metadata. Prior to MediaCloud and the other solutions in EveryZing’s product suite — including ezSEARCH, ezSEO, MetaPlayer and RAMP — discovery and publishing of multimedia content had been restricted to the indexing of just titles and tags. Delivered in a software-as-a-service package, MediaCloud requires no software to purchase, install or maintain. Furthermore, customers only pay for the processing they need, while obtaining access to a service that has virtually unlimited scalability to handle even large content collections in near real-time. The company’s core intellectual property and capabilities include speech-to-text technology and natural language processing.
Harry Collier (Infonortics Ltd) and I spoke with Mr. Wilde on March 12, 2009. The full text of our interview with him appears below.
Will you describe briefly your company and its search / content processing technology?
EveryZing originally spun out of BBN Technologies in Cambridge, MA. BBN was truly one of the godfathers of the Internet, and developed the email @ protocol among other breakthroughs. Over the last 20 years, the US Government has spent approximately $100MM with BBN on speech-to-text and natural language processing technologies. These technologies were spun out in 2006 and EveryZing was formed. EveryZing has developed a unique Media Merchandising Engine which is able to connect audio and video content across the web with the search economy. By generating high quality metadata from audio and video clips, processing it with our NLP technology to automatically “tag” the content, and pushing it through our turnkey publishing system, we are able to make this content discoverable across the major search engines.
What are the three major challenges you see in search / content processing in 2009?
1) Indexing and discovery of audio and video content in search; 2) Deriving structured data from unstructured content; 3) Creating better user experiences for search & navigation.
What is your approach to problem solving in search and content processing?
Well, yes, meaning that all three are critical. However, the key is to start with the user expectation. Users expect to be able to find all relevant content for a given key term from a single search box. This is generally known as “universal search”. This requires then that all content formats can be easily indexed by the search engines, be they web search engines like Google or Yahoo, as well as site search engines. Further, users want to be able to alternately search and browse content at will. These user expectations drive how we have developed and deployed our products. First, we have the best audio and video content processing in the world. This enables us to richly markup these files and make them far more searchable. Second, our ability to auto-tag the content makes it eminently more browsable. Third, developing a video search result page that behaves just like a text result page (i.e. keyword in context, sortability, relevance tuning) means users can more easily navigate large video results. Finally, plumbing our meta data through the video player means users can search within videos and jump-to the precise points in these videos that are relevant to their interests. Combining all of the efforts together means we can deliver a great user experience, which in turn means more engagement and consumption for our publishing partners.
Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications?
Yes, absolutely. Enterprises are facing a growing pile of structured and unstructured content, as well as an explosion in multimedia content with the advent of telepresence, Webex, videoconferencing, distance learning etc. At the same time, they face increasing requirements around discovery and compliance that require them to be able to index all of this content. Search is rapidly gaining the same stature as databases and document management systems as core platforms.
Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors?
Major companies are increasingly looking to vendors with deep pockets and bench strength around support and R&D. This has driven some rapid market consolidation. However, these firms are unlikely to be the innovators, and will continue to make acquisitions to broaden their offerings. There is also a requirement to more deeply integrate search into the broader enterprise IT footprint, and this is also driving acquisitions.
Multi core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar?
Yes, CPU power has directly benefited search applications. In the case of EveryZing, our cloud architecture takes advantage of quad-core computing so we can deliver triple threaded processing on each box. This enables us to create multiple quality of service tiers so we can optimize our system for latency or throughput, and do it on a customer by customer basis. This wouldn’t be possible without advances in computing power.
Graphical interfaces and portals (now called composite applications) are making a comeback. Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009?
Semantic analysis is core to our offering. Every clip we process is run through our NLP platform, which automatically extracts tags and key concepts. One of the great struggles publishers face today is having the resources to adequately tag and title all of their video assets. They are certainly aware of the importance of doing this, but are seeking more scalable approaches. Our system can use both an unsupervised and supervised approach to tagging content for customers.
Where can I find more information about your products, services, and research?
Our Web site is www.everyzing.com.
A Look Inside a Search System
March 16, 2009
A happy quack to the reader who sent me three links to posts by Vic Cherubini. Much of the detail will not be of interest to non tech readers, but I think a quick look at these three articles will provide a useful window into the complexities of search. Keep in mind that there are some trophy generation consultants running around saying, “Search is easy. Search is stable.” Baloney. Baloney. Baloney. Don’t believe me? Navigate to these posts and scan them:
- On Building an Efficient, Indexed Search Engine With a Word Proximity Algorithm here
- On Building an Efficient Search Indexer here
- Update: On Building an Efficient Search Indexer here.
These write ups make clear the effort required to avoid bottlenecks in essential components of a search system. Keep in mind that more complex systems require intricate ballets of numerical recipes, memory, and storage devices. Still think search is simple? The minor error Mr. Cherubini handles in a professional way is probably one that only a small number of Beyond Search readers would recognize and know how to remediate. Simple, right? Beware consultants manufacturing baloney from ignorance, please.
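For readers who want a feel for what a word proximity lookup involves, here is a minimal sketch in Python. It is not Mr. Cherubini's implementation. The function names `build_index` and `within`, the whitespace tokenizer, and the sample documents are my own assumptions for illustration. The idea is a positional inverted index: store where each word occurs in each document, then check whether two words land within a set number of positions of each other.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        # Naive tokenizer (assumption): lowercase and split on whitespace.
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def within(index, first, second, max_gap):
    """Return sorted doc ids where both words occur within max_gap positions."""
    a = index.get(first, {})
    b = index.get(second, {})
    hits = []
    # Only documents that contain both words can match.
    for doc_id in a.keys() & b.keys():
        if any(abs(p - q) <= max_gap for p in a[doc_id] for q in b[doc_id]):
            hits.append(doc_id)
    return sorted(hits)


# Hypothetical sample documents.
docs = {
    1: "search is simple said the consultant",
    2: "search indexing at scale is never simple",
}
index = build_index(docs)
near = within(index, "search", "simple", 2)
```

Even this toy version compares every position pair for each candidate document. A production indexer has to avoid that cost, keep the position lists compact, and juggle memory and disk, which is where the bottlenecks in the write ups come from.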
Stephen Arnold, March 15, 2009
Microsoft and Pirated Windows
March 15, 2009
I saw a link to Dan Hong’s “Microsoft Pardons Users of Pirated Windows: Defrauded Customers May Be Eligible for Free Windows XP Pro, but with Some Strings Attached” and wondered, “Is this a kinder, gentler Microsoft?” You can read the story here. Some youngsters and young at heart oldsters see piracy, which is theft, as A OK. According to Mr. Hong, “Microsoft is offering users a chance to redeem themselves for having purchased—unwittingly—computers or software containing counterfeit versions of the Windows XP operating system. All customers have to do is turn in the alleged perpetrators.” The offer is valid through July 30, 2009. If true, I find this interesting. Presumably Microsoft has worked out a method for determining which reports are valid and which are spoofs. I assume that those falsely accused may express some concern about the approach. Lawyers working on this project for Microsoft are probably quite happy with the program.
Stephen Arnold, March 15, 2009
New York Times: Groping for Cash, Heading for a Crash
March 15, 2009
Here we go again. I read “New York Times Mulls Online Subscription Fee” in Silicon Alley Insider here. Mr. Blodget offers some ideas. Rather than rolling out what did not work before, the Times needs new ideas. The financial crisis has not passed. I am a subscriber, and I see the paper becoming a secondary source for me. I rely on Amazon for book reviews. The stories in the paper turn up in my newsreader 24 hours before my hard copy arrives. I no longer pay much attention to the magazine section. The wacky design annoys me. Mr. Blodget was correct when he wrote:
An incremental $50-$75 million a year will buy the company more time to sell assets, restructure its business, and pacify its creditors, but it won’t save the place. The only way to do that, in our opinion, is to radically cut costs.
Nuclear winter arrives and settles in.
Stephen Arnold, March 15, 2009
Google Competition Amusement
March 15, 2009
I navigated to “Call for Competition Probe into Google Power” here. The article ran in the Times of London and reported that Feargal Sharkey suggested “a competition probe” of Googzilla. Nothing new in that for me. What was amusing is that I had to wait as the DoubleClick ad loaded. Nah, no irony, just reality.
Stephen Arnold, March 15, 2009
Ardentia Search
March 15, 2009
Note: A reader asserts that Ardentia is now Connexica. We’re checking.
Ardentia Search here is a company that provides “information access solutions”. The company offers a search solution for customer relationship management and direct marketing. The firm asserts that its software “has bridged the gap between BI [business intelligence] and enterprise search.” In short, the company has made it possible to drive down a single digital highway instead of contending with multiple products from different vendors.
The company has entered into a global resellers agreement with Clinical Solutions. The resale agreement provides exclusivity for Clinical Solutions to resell NetSearch for healthcare in selected geographic markets.
According to the company, “NetSearch is a high speed information enquiry and visualization solution. It combines the speed of the latest search engine technology with the ease of use of an OLAP-style enquiry interface.”
The company’s search system is based on its NetSearch application. NetSearch is able to index and provide access to information in databases, content management systems, electronic mail, the Internet, and on file systems. The process is search, explore, and obtain results. The company says that the system can reduce traditional business intelligence system costs by up to 60 percent over 36 months.
Among the features the system offers are:
- Search and analyse structured and unstructured data.
- Combines external content with corporate information assets.
- Full textual searching including wild cards and approximate matching.
- Venn diagrams for data segmentation.
- Lotus Domino plug-in for federated searching.
- Microsoft Excel plug-in for advanced analytics.
To connect to content, Ardentia hooks into databases via JDBC or ODBC interfaces. Data extraction can be scheduled. The system allows the licensee to specify that only changes (deltas) be extracted, as identified by a date time stamp. The data extractions can be throttled to reduce the impact on online systems.
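A throttled delta extraction of the kind described can be sketched in a few lines of Python. This is an illustration, not Ardentia's code. I use an in-memory SQLite database as a stand-in for a JDBC or ODBC source, and the `records` table, its columns, and the `extract_deltas` function are hypothetical names of my own.

```python
import sqlite3
import time


def extract_deltas(conn, last_seen, batch_size=2, pause_seconds=0.0):
    """Yield rows modified after last_seen, in small batches with a pause between."""
    cursor = conn.execute(
        "SELECT id, body, modified FROM records "
        "WHERE modified > ? ORDER BY modified, id",
        (last_seen,),
    )
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch
        # Throttle: pause between batches so the online system is not swamped.
        time.sleep(pause_seconds)


# Hypothetical source table; ISO 8601 time stamps sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, body TEXT, modified TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "old invoice", "2009-03-01T09:00:00"),
        (2, "new order", "2009-03-14T10:30:00"),
        (3, "updated contact", "2009-03-15T08:15:00"),
    ],
)
# Pull only what changed since the previous run.
changed = list(extract_deltas(conn, "2009-03-10T00:00:00"))
```

The batch size and the pause are the two throttle knobs. A real connector would also have to persist the last time stamp it saw and cope with clock skew and deleted rows.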
Ardentia asserts, “This provides end users with a fast, intelligent ad hoc enquiry tool that enables them to take control of the vast amounts of information currently locked away in networked and unconnected data sources.”
The company offers a search and data management solution for content residing on the SugarCRM system.
You can sign up for an online demonstration here. Information about the company’s OEM program is available here.
I did not see a search “box” on the company’s Web site. One can search the Ardentia Web site via Google using this syntax from a Google search box: site:www.ardentiasearch.com [your query]. For a search vendor, I found the omission of a search tool somewhat odd, but maybe I overlooked the link to the search function.
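If you want to script that workaround, a small Python sketch follows. The helper name `site_search_url` and the sample query are my own assumptions. The snippet only builds the URL for Google's public search page with the site: operator; it does not fetch any results.

```python
from urllib.parse import urlencode


def site_search_url(site, query):
    """Build a Google search URL restricted to one site via the site: operator."""
    return "https://www.google.com/search?" + urlencode({"q": f"site:{site} {query}"})


# Hypothetical query against the Ardentia Web site.
url = site_search_url("www.ardentiasearch.com", "NetSearch OLAP")
```

Paste the resulting URL into a browser and Google does the site search the vendor's own pages do not offer.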
Stephen Arnold, March 15, 2009
Microsoft Pricing
March 15, 2009
If this Web log post is accurate, the “new” FrontPage—now called SharePoint Designer—will be free starting on April 1, 2009. Has Bill Simser unearthed a free program, or is he reporting an April fool joke? He wrote:
I caught a couple of blog posts from here and here that had to make me do a double take. I’m not one for relaying gossip, but this information seems to be legit. As of April 1, 2009 SharePoint Designer will be free. Now if you go to the “official” site there’s no mention of it however I’m hearing through the grapevine it’s true. The official site even has a “buy it today” option, so you might want to hold off on that.
Microsoft seems to be working on several fronts to keep the gates shut to its walled garden. Free software and discounts are like grocery store incentives. Even if SharePoint Designer is free, you still need Visual Studio to accomplish serious work for SharePoint. Send your price war items via the comments section of this blog.
Stephen Arnold, March 15, 2009
SQL Server Worst Practices
March 15, 2009
I am by nature skeptical of best practices. An organization in financial trouble or under a legal storm cloud is not a candidate for a best practices write up in my opinion. I saw a link to an article called “SQL Server Worst Practices” here. The information was compiled by the consulting firm, Edgewood Solutions Engineers. My hunch after reading the write up was that the Edgewood folks wanted to offer some tongue in cheek commentary on two thirds of the IT departments’ favorite old style Codd database, Microsoft SQL Server. The impact on me was different. I did not find the 13 worst practices very amusing. Let me highlight three worst practices and then offer my view on why these did not tickle my funny bone. Keep in mind that I am an addled goose and have an insensitive funny bone. My picks from the 13 items:
- Worst practice 11 “No referential integrity”. The problem is a lousy data model. From that devolves problems with referential integrity. Get it wrong and spend lots of time reinventing your wheel. Common, expensive, and careless.
- Worst practice 8 “Throwing hardware at the problem”. The idea is that Intel’s hottest CPU, lots of cheap RAM, and SATA will make up for choke points in reading and writing data to the RDBMS’ tables.
- Worst practice 2 “Not verifying SQL Server backups…” The assumption is that back ups work. In my experience, back ups don’t work that well. The fancy dancing turns into clumsiness when data are lost.
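To make the referential integrity item concrete, here is a minimal sketch. It uses Python's built-in SQLite module as a stand-in, not SQL Server, and the `customers` and `orders` tables are hypothetical. The point is the same in any relational database: declare the relationship in the data model and the engine refuses orphan rows for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless asked; turn it on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers VALUES (1, 'Goose Pond Feed Store')")
conn.execute("INSERT INTO orders VALUES (100, 1)")  # valid: customer 1 exists

try:
    # Customer 42 does not exist, so this order would be an orphan.
    conn.execute("INSERT INTO orders VALUES (101, 42)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Leave out the REFERENCES clause and the bad row slides in without complaint. Someone then pays to find and clean it up later.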
Why did I not find the 13 worst practices amusing? In search systems from a certain large vendor in Redmond, the beastie SQL Server lurks. After many years of effort, SQL Server is commonplace. Along with its ubiquity are problems in performance, back up and restore, and integration issues with other enterprise systems including Microsoft’s own trailer park of servers, among others. Think weird latency. Think complexity with the scale up and out.
This checklist reminds me why the RDBMS model is not the peppiest geriatric at the Senior Center. Time to move on. Databases are yesterday. Dataspaces are where one should look.
Stephen Arnold, March 15, 2009