Throwing Water on Real Time Search
December 2, 2009
Short honk: I read this TechCrunch article “Twitturly Sold for a Song” weeks ago. I wanted to point out a passage that I found interesting:
Joel Strellner, who started the project, finally put Twitturly up for sale on Flippa ten days ago, and the auction just ended. Only five bids came in, and the sale ultimately netted no more than $8,500.
Is real time search losing its magnetic force? Looks like it. Since I am talking about real time search, I want to include this example of water being thrown on the hot fires of big money in this corner of the search world.
Stephen Arnold, December 2, 2009
Disclosure: Yep, I am paid to talk about real time search. I am not paid to write about it. Alert to local Justice of the Peace. I am without compensation.
Some Thoughts About Real Time Content Processing
December 2, 2009
I wanted to provide my two or three readers with a summary of my comments about real time content processing at the Incisive international online information conference. I arrived more addled than normal due to three mechanical failures on America’s interpretation of a joint venture between Albanian and Galapagos Airlines. That means Delta Airlines, I think.
What I wanted to accomplish in my talk was to make one point—real time search is here to stay. Why?
First, real time means lots of noise and modest information payload. To deal with lots of content requires a robust and expensive line up of hardware, software, and network resources. Marketers have been working overtime by slapping “real time” on any software product conceivable in the hopes of making another sale. And big time search vendors essentially ignored the real time information challenge. Plain vanilla search on content updated when the vendor decided was an easier game.
Real time can mean almost anything. In fact, most search and content processing systems are not even close to real time. The reason is that slowdowns can occur in any component of a large, complex content processing system. As long as the user gets some results, that is just fine for many of the too-busy 30-somethings. Any information is better than no information. Based on the performance of some commercial and governmental organizations, the approach is not working particularly well in my opinion.
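To make the point about slowdowns concrete, here is a minimal Python sketch, assuming a pipeline whose stages run one after another. The stage names and delay figures are hypothetical, not measurements from any vendor's system. The idea is simply that a "real time" claim is only as good as the sum of every stage's delay.

```python
# Hypothetical pipeline stages and delays in seconds; illustrative only.
PIPELINE_DELAYS = {
    "crawl_or_ingest": 30.0,
    "parse_and_normalize": 2.0,
    "index_update": 120.0,
    "query_cache_refresh": 60.0,
}


def end_to_end_delay(stage_delays: dict[str, float]) -> float:
    """Time before a new item can appear in results.

    The stages run in sequence, so their delays add up.
    """
    return sum(stage_delays.values())


def slowest_stage(stage_delays: dict[str, float]) -> str:
    """Name of the stage contributing the most delay."""
    return max(stage_delays, key=stage_delays.get)


if __name__ == "__main__":
    print(f"End-to-end delay: {end_to_end_delay(PIPELINE_DELAYS):.0f} s")
    print(f"Biggest bottleneck: {slowest_stage(PIPELINE_DELAYS)}")
```

With these made-up numbers the sketch reports a 212 second end-to-end delay, with the index update as the bottleneck. A marketer can call the ingest step "real time" while the user still waits three and a half minutes.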
Let me give you an example of real time. In the 1920s, America decided that no booze was good news. Rum runners filled the gap. The US Coast Guard learned that it could tune a radio receiver to a frequency used by the liquor smugglers. The intercepts were in real time, and the Coast Guard increased its interdiction rate. The idea was that a bad guy talked and the Coast Guard listened in real time even though there was a slight delay in wireless transmissions. The same idea is operative today when good guys intercept mobile conversations or listen to table talk at a restaurant.
The problem is that communications and content believed to be real time are not. SMS may be delivered quickly, but I have received SMS sent a day or more earlier. The telco takes considerable license in billing for SMS and delivering SMS. No one seems to be the wiser.
A content management system often creates this type of conversation in an organization. Jack: “I can’t find my document.” Jill: “Did you put it in the system with the ‘index me’ metatag?” Jack: “Yes.” Jill: “Gee, that happens to me all the time.” The reason is that the CMS indexes when it can or on a specific schedule. Content in some CMSs is not findable. So much for real time in the organization.
An early version of the Google Search Appliance could index so aggressively that the network was choked by the googlebot. System administrators solved the problem by indexing once a day, maybe twice a day. Again, the user perceives one thing and the system is doing another.
This means that real time will have a specific definition depending on the particular circumstances in which the system is installed and configured.
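The Jack and Jill problem and the once-a-day indexing schedule come down to simple arithmetic. This Python sketch assumes index runs happen at fixed intervals starting from a hypothetical first run time, and it ignores how long each run takes, so real delays would be longer. The function names are illustrative, not taken from any CMS or appliance.

```python
import math


def next_index_run(submitted_at: float, interval: float, first_run: float = 0.0) -> float:
    """Return the first scheduled index run at or after submitted_at.

    Assumes runs start at first_run and repeat every `interval` time units.
    """
    if submitted_at <= first_run:
        return first_run
    runs_elapsed = math.ceil((submitted_at - first_run) / interval)
    return first_run + runs_elapsed * interval


def findability_delay(submitted_at: float, interval: float, first_run: float = 0.0) -> float:
    """How long a document sits unsearchable (ignores indexing run time)."""
    return next_index_run(submitted_at, interval, first_run) - submitted_at


if __name__ == "__main__":
    # Times in hours; hypothetical schedule with the first run at 2 a.m.
    print(findability_delay(3, interval=24, first_run=2))  # daily schedule
    print(findability_delay(3, interval=12, first_run=2))  # twice a day
```

With a daily run at 2 a.m., a document checked in at 3 a.m. waits 23 hours before anyone can find it; indexing twice a day cuts the wait to 11 hours. The user thinks "real time"; the schedule says otherwise.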
Several business sectors are gung ho for real time information.
Financial services firms will pay $500,000 for a single Exegy high speed content processing server. When that machine is saturated, just buy another Exegy server. Microsoft is working on a petascale real time content processing system for the financial services industry which will compete with such established vendors as Connotate and Relegence. But a delay of a millisecond or two can spoil the fun.
Accountants want to know exactly what money is where. Purchase order systems and accounts receivable have to be fast. Speed does not prevent accidents. The implosion of such corporate giants as Enron and Tyco makes it clear that going faster does not make information or management decisions better.
Intelligence agencies want to know immediately when a term on a watch list appears in a content stream. A good example is “Bin Ladin” or “Bin Laden” or a variant. A delay can cost lives. Systems from Exalead and SRA can handle this type of problem and a range of other real time tasks without breaking a sweat.
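Here is a toy illustration of the watch list idea, written in Python under stated assumptions: a single regular expression stands in for the spelling variants, and messages are checked one at a time as they arrive. The watch list entry and function name are hypothetical. Commercial systems such as those from Exalead and SRA almost certainly use far richer techniques (transliteration tables, phonetic matching, entity extraction); this is not their method.

```python
import re
from typing import Iterable, Iterator

# Hypothetical watch list: a label mapped to a pattern covering spelling variants
# such as "Bin Laden", "Bin Ladin", "bin-laden", and "binladen".
WATCH_LIST = {
    "bin_laden": re.compile(r"\bbin[\s-]*lad[ei]n\b", re.IGNORECASE),
}


def scan_stream(
    messages: Iterable[str], watch_list: dict[str, re.Pattern] = WATCH_LIST
) -> Iterator[tuple[int, str, str]]:
    """Yield (message_index, label, matched_text) as each message is processed.

    Because this is a generator, a hit is reported as soon as its message is
    scanned rather than after the whole stream has been read.
    """
    for i, text in enumerate(messages):
        for label, pattern in watch_list.items():
            for match in pattern.finditer(text):
                yield i, label, match.group(0)


if __name__ == "__main__":
    stream = ["Weather is fine", "Report mentions Bin Ladin near border", "bin laden again"]
    for hit in scan_stream(stream):
        print(hit)
```

The design choice worth noting is the generator: the alert for the second message fires before the third message is even read, which is the whole point of watch list monitoring.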
The problem is that there is no certifying authority for “real time”. Organizations trying to implement real time may be buying a pig in a poke, or a horse without checking to see if it has been enhanced at a horse beauty salon.
In closing, real time is here to stay.
First, Google, Microsoft, and other vendors are jumping into indexing content from social networks, RSS feeds, and Web sites that update when new information is written to their databases. Like it or not, real time links or what appear to be real time links will be in these big commercial systems.
Second, enterprise vendors will provide connectors to handle RSS and other real time content. This geyser of information will be creating wet floors in organizations worldwide.
Third, vendors in many different enterprise sectors will be working to make fresh data available. You may not be able to escape real time information even if you work with an inventory control system.
Finally, users, particularly recent college graduates, will get real time information their own way, like it or not.
To wrap up, “what’s happening now, baby?” is going to be an increasingly common question you will have to answer.
Stephen Arnold, December 2, 2009
Oyez, oyez, I disclose to the National Intelligence Center that the Incisive organization paid me to write about real time information. In theory, I will get some money in eight to 12 weeks. Am I for sale to the highest bidder? I guess it depends on how good looking you are.
Can Microsoft and Its Petascale Financial Services Mining Project Succeed?
December 1, 2009
The goslings and I were chattering and quacking in Harrod’s Creek. One of our cousins was killed and eaten for an American holiday. What a way for our beloved friend and colleague to go: deep fried in an oil drum behind the River Creek Inn.
As we were recalling the best moments in Theodore the Turkey’s life, we discussed the likelihood of Microsoft’s petascale content mining project hitting a home run. The idea, as we addled geese understand it, is that Microsoft wants to process lots of content and generate high value insights for the money crazed MBAs in the world’s leading financial institutions.
The project tackles a number of tough technical problems; for example, getting around the inherent latency in petascale systems, dealing with the traditional input output balkiness of Windows plumbing, and crunching enough data with sufficient accuracy to make the exercise worth the time of the financial client. You may find my earlier post germane.
Other outfits are in this game as well. Some are focused on the hardware / firmware / software side like Exegy. Others provide toolkits like Kapow Technologies. Some Beltway Bandits operate low profile content filtering systems for governmental and commercial clients. And there is the old nemesis, Googzilla, happily chewing through one trillion documents every few days. Finally, some of the financial institutions themselves have pumped support into outfits like Connotate. Even the struggling Time Warner owns some nifty technology in the Relegence unit. So, what’s new?
Three thoughts as I prepare to comment about the push into perceived real time processing at the International Online Show:
- The cost of slashing latency with any type of content is going to be one expensive proposition. Not even some governments have the cash to muscle up a server with terabytes of RAM. Yep, terabytes.
- Figuring out what process left another process gasping for air requires some programmers who can plow through code looking for an errant space, an undefined variable, or a bit of an Assembler hack that pushed when it should have popped.
- Latency outside the span of control of the system can render some outputs just plain wrong. Delay is bad; bad outputs are even worse.
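The third point, that a late input can make an output wrong, suggests one defensive habit: check the age of each input before computing on it. This Python sketch assumes each value carries a timestamp from its upstream source; the Quote type, field names, and the half-second threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    symbol: str
    price: float
    source_time: float  # when the upstream feed stamped the value, in seconds


def split_by_freshness(
    quotes: list[Quote], now: float, max_age: float
) -> tuple[list[Quote], list[Quote]]:
    """Partition quotes into (fresh, stale) using their age at time `now`."""
    fresh: list[Quote] = []
    stale: list[Quote] = []
    for q in quotes:
        if now - q.source_time <= max_age:
            fresh.append(q)
        else:
            stale.append(q)
    return fresh, stale


if __name__ == "__main__":
    quotes = [Quote("ABC", 10.0, 99.8), Quote("XYZ", 20.0, 97.0)]
    fresh, stale = split_by_freshness(quotes, now=100.0, max_age=0.5)
    print("use:", [q.symbol for q in fresh])
    print("flag as stale:", [q.symbol for q in stale])
```

Rejecting or flagging the stale quote trades a missing answer for a wrong one, which fits the observation that delay is bad but bad outputs are worse. Note that this only works if the upstream timestamps themselves can be trusted.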
If you have not been tracking Microsoft’s big initiatives, you may want to spend some more time grinding through the ACM and other scholarly papers such as “Towards Loosely Coupled Programming on Petascale Systems” and poking around on the Microsoft Web site. To find useful stuff, I use the Google Microsoft index. If you aren’t familiar with it, check it out here.
I wonder if this stuff will be part of SharePoint 2011 and available as a Microsoft Fast ESP plug-in?
Stephen Arnold, December 1, 2009
Yes, oh, yes. Let me disclose to the National Institute of Science and Technology that I was not paid to write this humorous essay. Consider it a name day present. If I am late, that’s latency. If I am early, that’s predictive output.
Cell Phone Early Warning System
November 9, 2009
A happy quack to my colleague in the Near East for pointing me to “Cellphone Alert System Expected in 2 Yrs.” The point of the story is that Israel’s home front command “will be able to calculate the precise location of an impact zone, and alert residents in an affected neighborhood via their cellphones.” I also noted this passage:
Soffer [Israeli official] said that 90 percent of the civilian casualties sustained by Israel during the Second Lebanon War and Operation Cast Lead in Gaza involved people who were struck by projectiles while they were in open areas away from buildings. Civilians who seek cover in designated safe zones during rocket attacks are not likely to be wounded or killed…
Interesting use of “push” real time mobile technology, in my opinion.
Stephen Arnold, November 9, 2009
I was at the Jewish Community Center last night but I had to pay to get in. I don’t think that counts as payment for this write up. To be safe, I will alert the Jefferson County Animal Control Office.
Topsy Adds New Feature
November 6, 2009
Topsy, one of the real time search systems that I use, has added a new feature. The company’s search results now include archived content. You can get a summary of the service’s other new features in “Topsy Gives Tough Competition to Tweetmeme and OneRiots in Real Time Search”. One comment in the write up jumped out for me:
There is more to the content search in Topsy. In order to filter the spam, each users are rated according to influence. This brings the relevance of the content in search result.
The more services add numerical recipes that make value judgments, the greater the pressure on traditional information companies becomes.
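The write up does not explain how Topsy computes influence, so the following Python sketch is only a generic example of the kind of numerical recipe described: drop authors below an influence threshold as probable spam, then rank the rest by crude text relevance multiplied by author influence. The Post type, the influence scores, the threshold, and the term-overlap relevance measure are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    text: str


def term_overlap(query: str, text: str) -> float:
    """Crude relevance: fraction of query terms that appear in the text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def rank(
    posts: list[Post], query: str, influence: dict[str, float], min_influence: float = 0.1
) -> list[Post]:
    """Filter low-influence authors, then sort by relevance x influence."""
    scored: list[tuple[float, Post]] = []
    for p in posts:
        weight = influence.get(p.author, 0.0)  # unknown authors get zero influence
        if weight < min_influence:
            continue  # treated as likely spam
        score = term_overlap(query, p.text) * weight
        if score > 0:
            scored.append((score, p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]


if __name__ == "__main__":
    influence = {"alice": 0.9, "bob": 0.4, "spambot": 0.01}
    posts = [
        Post("spambot", "real time search deals real time search"),
        Post("bob", "real time search is hot"),
        Post("alice", "notes on real time search"),
    ]
    print([p.author for p in rank(posts, "real time search", influence)])
```

All three posts match the query equally well on the words alone; the influence weight is what pushes the spambot out and puts alice ahead of bob. That is exactly the sort of value judgment a traditional information company used to make by hand.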
Stephen Arnold, November 6, 2009
No dough for this. Sigh.
Yahoo and Real Time Search
November 5, 2009
I thought Yahoo said it was going to make search a priority. I assumed that its wizards and wizardettes would tap their inner coders. Wrong if I understand the TechCrunch story “OneRiot Confirms They’re Building Yahoo’s Real Time Search Engine”. How is Yahoo going to respond to the real time search need? Yahoo is working with OneRiot. I like the OneRiot service, but I think the deal makes clear that Yahoo’s top management has more confidence in the “buy” approach than the “make” approach. This deal suggests to me that Yahoo’s own search wizards and wizardettes are either busy with other tasks or not up to the rigors of the real time search task. Just my opinion.
Stephen Arnold, November 5, 2009
I want to alert DHS that I received zero consideration for this blog post about Yahoo’s search wizardry.
Social Media Search
October 30, 2009
“Social Media Accounts for 18% of Information Search Market” is an interesting summary of social search data. I take these types of data roundups with a dose of cod liver oil, but you may find the info tasty in their native form. What struck me as important is that social search has emerged as a specific type of search. Nothing like the provenance of info from a friend who may or may not know what the heck he / she is talking about. But in today’s world that’s close enough for horseshoes, just like these statistics.
Stephen Arnold, October 30, 2009
No one I know is sufficiently clueless to pay me for writing this item.
Microsoft and Yahoo Redo
October 30, 2009
I read “Microsoft and Yahoo Delay Signing Search Deal” and heard Yogi Berra say, “It’s déjà vu all over again”. Why am I not surprised? The deal now melting down involved no cash but lots of “goodwill”. How did Silicon Alley Insider learn about the “delay”? Yahoo disclosed this item in an SEC filing. Hmmm. Subtle on Yahoo’s part but not sufficiently subtle for Nicholas Carlson, who noticed the item. He wrote:
We reached out to Yahoo and got back a statement that sounds almost exactly like the SEC filing: “’Microsoft and Yahoo! are committed to this agreement and believe this is a highly competitive deal that is good for consumers, advertisers and publishers. We have made good progress in finalizing the definitive agreements. Given the complex nature of this transaction there remain some issues that need some additional clarity and definitive details. So, the teams at Yahoo! and Microsoft are continuing to work on the remaining details, and we have mutually agreed to extend the period to negotiate and execute the agreement. We plan to do this as expeditiously as possible. Both companies are optimistic that we will be able to close this deal by early 2010.’”
Actions speak louder than words. Top line revenue growth or the lack thereof speaks even more loudly. Google must be giggling in the Googleplex. With each day that passes, the gap between Google and its competitors increases. Ask.com seems to be crying, “Uncle.” As I said in 2007 much to the annoyance of a 20-something, “Game over in Web search.”
Next up for Microsoft and Yahoo? Real time search. Google is lagging in this sector. Maybe that’s the future for Yahoo?
Stephen Arnold, October 30, 2009
No gifts on my Halloween skeleton for this article.
Impromptu Shared Information Spaces
October 28, 2009
The USPTO published US 7,610,287, “System and Method for Impromptu Shared Communication Spaces”, filed in 2005, an invention by Jeff Dean, Georges Harik and Obeka Tallis Brown Bakin. This is a pretty significant invention in my opinion. The system and method works around some interesting technical precursors, yet gives Google a grasp on an important advance in data management. I discuss some of the invention’s implications in Google: The Digital Gutenberg, and I wanted to call to your attention this open source publication. The description of the invention is:
Communications between entities who may share common interests. For entities determined to be sharing common interests (e.g., searching using the same terms or topics, browsing a page, a site or a groups of topically related sites), options for communication among the entities are provided. For example, a chat room may be dynamically created for persons who are currently searching or browsing the same or related information. As another example, a “homepage” may be created for each query and contain various types of information related to the query. A permission module controls which entities may participate, what types of information (and from what sources) an entity can (or desires to) receive, what types of information the entity may (or desires to) share.
I anticipate that in a short time, various Google mavens and pundits will explain that this invention is related to the fun, friendly world of Google that makes up most of the services now offered from the GOOG. In my opinion, this invention moves into a new “space”. That’s as far as I will go in this free Web log. Maybe someone will be able to find search engine optimization gems in this invention. I don’t, but I am an addled goose.
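For readers who want a feel for the basic mechanism in the patent abstract (a chat room created dynamically for people searching the same terms, gated by a permission module), here is a toy Python sketch. It is not Google’s method. It assumes a hypothetical notion of "same topic" (identical query terms, ignoring case and order), a hypothetical opt-in set standing in for the permission module, and a minimum room size of two.

```python
from collections import defaultdict
from typing import Iterable


def normalize(query: str) -> str:
    """Treat queries with the same terms, in any order or case, as one topic."""
    return " ".join(sorted(set(query.lower().split())))


def impromptu_rooms(
    active_queries: Iterable[tuple[str, str]], opted_in: set[str], min_size: int = 2
) -> dict[str, set[str]]:
    """Group opted-in users by normalized query; keep groups with >= min_size users.

    active_queries: (user, query) pairs observed in some recent time window.
    opted_in: users whose permission settings allow joining shared spaces.
    """
    groups: defaultdict[str, set[str]] = defaultdict(set)
    for user, query in active_queries:
        if user in opted_in:
            groups[normalize(query)].add(user)
    return {topic: members for topic, members in groups.items() if len(members) >= min_size}


if __name__ == "__main__":
    active = [
        ("ann", "Real Time Search"),
        ("ben", "search real time"),
        ("cat", "search real time"),
        ("dan", "petascale mining"),
    ]
    print(impromptu_rooms(active, opted_in={"ann", "ben", "dan"}))
```

Here ann and ben land in one room; cat searched the same terms but has not opted in, and dan is alone with his query. The abstract also covers topically related pages and sites, which would need a much smarter notion of "related" than identical terms.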
Stephen Arnold, October 28, 2009
Google did not send me so much as a mouse pad for flagging this open source document as important. Spoil sports.
Twitter Tizzy
October 22, 2009
Big day for Twitter. Instead of selling out to one big dog, Twitter is fraternizing with the litter. Microsoft announced that it is indexing tweets. Google announced that it is indexing tweets. Tweets are already indexed by Twitter and a kennel of real time search systems. In short, tweets are everywhere and findable.
Lots of posts are chasing their tails around the Internet. The addled goose had three thoughts:
First, with tweets as common as cats and dogs in Harrod’s Creek, any differentiation is going to have to come via value-adding. I don’t see much high value adding. Over time, tweets may become a way to differentiate the vendors from one another.
Second, tweets are not new and I think that the extensions of the basic few words broadcast model are going to challenge some of the tweet indexing systems. How will rich media be indexed and made searchable in a meaningful way? For me rich media and some embedded links are exercises in frustration. More indexing means more hits on the Twitter system and I am not sure that Twitter will be able to cope.
Third, the Google – Microsoft dust up seems to be spreading into real time search. Since this is a relatively new content domain to many online searchers, the winner of this battle gains more than users who want access to this content domain. Dominance in Twitter search reflects on the winner’s technology and marketplace magnetism.
In short, Twitter’s puppies are going to produce one formidable online canine in my opinion.
Stephen Arnold, October 22, 2009
No one bought me dinner to share my opinion. Better luck next time wish I.