Real Time Search Systems, Part 1
June 21, 2010
Editor’s note: For those in the New Orleans real time search lecture and the Madrid semantic search talk, I promised to make available some of the information I discussed. Attendees are often hungry for a takeaway, and I want to offer a refrigerator magnet, not the cruise ship gift shop. This post provides a summary of the real time information services I mentioned. These services focus on content from Facebook, Twitter, blogs, and other geysers of digital confetti. A subsequent blog post will present the basics of my draft taxonomy of real time search. I know that most readers will kick the candy bar wrapper into the gutter. If you are one of the folks who picks up the taxonomy, a credit line would make the addled goose feel less like a down pillow and more like a Marie Antoinette pond ornament.
What’s Real Time Search?
Ah, gentle reader, real time search is marketing baloney. Life has latency. You call me on the phone, and days, maybe weeks, go by, and I don’t return the call. In the digital world, you get an SMS and you think it was rocketed to you by the ever vigilant telecommunications companies. Not exactly. In most cases, unless you conduct a laboratory test between mobiles on different systems, capturing the transmit time, the receiving time, and other data points such as time of day and geolocation, you don’t have a clue what the latency between sending and receiving is. Isn’t it easier to assume that the message was sent instantly? When you delve into other types of information, you may discover that what you thought was real time is something quite different. The “check is in the mail” excuse applies to digital information, index updating, query processing, system response time, and double talk from organizations too cheap or too disorganized to do much of anything quickly. Thus, real time is a slippery fish.
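For readers who want to run the laboratory test, the arithmetic is simple: latency is receive time minus transmit time, summarized over many messages. What follows is a minimal Python sketch, assuming the two mobiles’ clocks are synchronized and timestamps are logged in seconds. The function names and the sample log are hypothetical, not drawn from any carrier’s data.

```python
from statistics import median

def latencies(records):
    """Return per-message latency in seconds from (sent, received) pairs.

    Assumes both timestamps come from synchronized clocks; any clock skew
    between the two devices ends up baked into every result.
    """
    return [received - sent for sent, received in records]

def summarize(records):
    """Summarize observed latency as minimum, median, and maximum."""
    lat = latencies(records)
    return {"min": min(lat), "median": median(lat), "max": max(lat)}

# Hypothetical test log: (transmit time, receive time) in seconds.
log = [(0.0, 1.5), (10.0, 12.0), (20.0, 45.0)]
print(summarize(log))  # {'min': 1.5, 'median': 2.0, 'max': 25.0}
```

One slow message drags the maximum far above the median. That is one reason a single “typical” latency number says little about the worst case.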
Real Time Search Systems
Why do I use the phrase “real time”? I don’t have a better phrase at hand. Vendors yap about real time, and very, very few explain exactly what their use of the phrase means. One outfit that deserves a pat on the head is Exalead. The company explains that in an organization, most information is available to an authorized user within 15 minutes after the Exalead system becomes aware of the data. That’s fast, and it beats the gym shorts off many other vendors. I would love to pinpoint the turtles, but my legal eagle cautions me that this type of sportiness will get me a yellow card. Figure it out for yourself is the sad consequence.
Here’s the list of the systems I identified in my lectures. I don’t work for any of these outfits, and I use different services depending on my specific information needs. You are, therefore, invited to run sample queries on these services or turn to one of the “real” journalists for their take. If you have spare cash and found yourself in the lower quartile of your math class, you may find that an azure chip consultant is just what you need to make it in the crazy world of online information.
- Collecta – www.collecta.com – Venture funded with another infusion of $5 million
- Crowdeye – www.crowdeye.com – Former Microsoft employees’ start up
- DailyRT – http://dailyrt.com
- Ice Rocket – www.icerocket.com – Funded by Mark Cuban
- ITPints – www.itpints.com – Single entrepreneur
- Leapfish – www.leapfish.com – Metasearch; controversy over alleged link fraud; backed by DotNext; integrates Topsy.com
- Newslookup – www.newslookup.com – Founded 2000; regions and categories; open source engine, DataparkSearch
- OneNewsPage – www.onenewspage.com – Live access to top news and analysis
- Red Tram – www.redtram.com – Russian service, broad coverage in nine languages, including Chinese; based in Cyprus
- Scoopler – www.scoopler.com – Y Combinator funded
- Topsy – www.topsy.com – Ignition Partners and other VC firms
- Twazzup – www.twazzup.com – Self-funded start up. Don’t confuse this with Exalead’s Tweepz.com
- Tweetmeme – www.tweetmeme.com – Part of Fav.or.it in the UK via angel funding
- Yauba – www.yauba.com – IIT and UC Berkeley; privacy safe
In my lectures I made four points about these types of real-time search services.
First, at the time of my talks, each of these services delivered more useful and comprehensive results than the “real time search” services from the Big Gals in the Web search game; namely, Google, Microsoft Bing, and Yahoo. Yahoo, I pointed out, doesn’t do real time search itself. Yahoo has a deal with the OneRiot.com outfit. The service is useful, and I suppose I could stick it in the list above, but I am just cutting and pasting from the PowerPoint decks I used as crutches and dogs in my lectures.
Semantic Search Explained
June 19, 2010
I get asked about semantic search once a day, often more frequently. I usually say, “Semantic search means software can figure out what something is about.” If that does not do the trick, I trot out the more detailed explanation Martin White and I put in our 2009 study “Successful Enterprise Search Management.”
I neglected to write about “10 Things that Make Search a Semantic Search.” The information in that write up by the founder of Hakia, Dr. Riza C. Berkan, is useful. If you have not reviewed the write up, you will want to put this reading on your To Do list.
I don’t want to reproduce the full list. Navigate to the original article and work through it. I do want to highlight three points with which I agree.
First, a semantic search can handle synonyms. Languages are like roads in Kentucky, full of potholes. Disambiguation and figuring out synonyms are two important tasks. Their presence signals a semantic component in the content processing system.
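To make the synonym point concrete, here is a minimal query expansion sketch in Python. The synonym table is tiny, hand-built, and hypothetical. Real systems derive such mappings from thesauri or corpus statistics. True disambiguation (deciding which sense of a word the user means) needs context that this sketch ignores.

```python
# Hypothetical, hand-built synonym table; entries are illustrative only.
SYNONYMS = {
    "car": {"automobile", "auto"},
    "automobile": {"car", "auto"},
    "auto": {"car", "automobile"},
}

def expand_query(query):
    """Return the set of terms to match: each query term plus its synonyms."""
    terms = set()
    for word in query.lower().split():
        terms.add(word)
        terms |= SYNONYMS.get(word, set())
    return terms

def search(docs, query):
    """Return documents that contain at least one expanded query term."""
    wanted = expand_query(query)
    return [d for d in docs if wanted & set(d.lower().split())]

docs = ["Used automobile prices", "Kentucky road repairs", "Auto insurance quotes"]
print(search(docs, "car"))  # ['Used automobile prices', 'Auto insurance quotes']
```

A plain keyword match on “car” returns nothing here. The expanded query finds both documents about cars.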
Second, a search system that can present a snippet or a highlight of the key sentence or paragraph is quite useful. I find that some snippeting technology is designed to meet the needs of folks selling ads. The snippeting function I want works with the honesty and zeal of a prisoner who is due to be released from prison in two days.
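One simple way to pick a “key sentence” is to score each sentence by how many distinct query terms it contains and return the best one. What follows is a minimal Python sketch, assuming naive sentence splitting on end punctuation and no stemming. It illustrates query-biased snippeting in general. It is not Hakia’s method or any vendor’s method.

```python
import re

def best_snippet(text, query):
    """Return the sentence sharing the most distinct terms with the query.

    Sentences are split naively on ., !, or ? followed by whitespace.
    Ties go to the earliest sentence; empty text yields "".
    """
    query_terms = set(re.findall(r"\w+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def score(sentence):
        # Count distinct query terms that appear in this sentence.
        return len(query_terms & set(re.findall(r"\w+", sentence.lower())))

    # max() returns the first sentence with the highest score.
    return max(sentences, key=score, default="")

text = ("Search vendors love ads. A semantic system can find synonyms. "
        "Snippets should show the key sentence.")
print(best_snippet(text, "semantic synonyms"))  # A semantic system can find synonyms.
```

Note that nothing in the scoring rewards sentences an advertiser would prefer. Only overlap with the user’s query counts.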
Finally, a user can enter a query without having to formulate it with Boolean operators or special instructions such as CC=. Systems have to be smart but not biased or tilted for the benefit of advertisers. Objectivity is important in delivering this type of query support. Alas, I think this is a difficult goal to achieve. Humans are humans and often prefer to click the ad for a vacation rental rather than run a query, peruse the results, and then make an informed decision.
A happy quack to Hakia for the post.
Stephen E Arnold, June 19, 2010
Freebie
Google Austria Book Scanning Deal
June 17, 2010
Google keeps on plugging away with its book scanning project. I have been one of the people who think that Google has flowed into a vacuum. The sniping and legal flaps have not taken my eye off the ball, however. Google wants to scan, ingest the content, and make money from its effort. I think a big part of the book scanning effort is directed at Google’s knowledge base initiative. The more content the Google processes, the better its numerical recipes are at making decisions. The making money part is important but not the whole story.
Google, according to India’s Economic Times, has a deal with Austria. The story “Google to Scan 400,000 Austrian Library Books” said:
Austria’s national library said on Tuesday it has struck a 30-million-euro deal with US Internet giant Google to digitize 400,000 copyright-free books, a vast collection spanning 400 years of European history. Johanna Rachinger, the head of the ONB library, hailed what she called an “important step,” arguing at a news conference that “there are few projects on such a scale elsewhere in Europe.” The Austrian library project concerns one of the world’s five biggest collections of 16th- to 19th-century literature, totaling some 120 million pages, the ONB said in a statement.
Important points. This is a 30 million euro deal. The content is non-exclusive. The library solves a preservation problem along with some access and money issues.
Look ahead 10 years. When you want a book from this collection, will you use Google or some other service? Google is aiming for the long haul and a much bigger play. What about the “regular” scanning activity? It just keeps on clicking along, in my opinion.
Stephen E Arnold, June 17, 2010
Freebie
McKinsey to Squeeze the Azure Chip Consultants in Social Media
June 14, 2010
The azure chip crowd is going to have to up their game. I read “Nielsen Partners with McKinsey to Create Social Media Consultancy” and chuckled. The blue chip firms don’t twitch and jump. Their business is predicated on 80 percent plus repeat business from Fortune 500 firms. The new work comes from the churn and drama of business activities. The azure chip crowd usually lacks the luxury of the blue chip firms’ momentum.
Social media intelligence is one of those odd little markets which overlap traditional competitive analysis, the whizzy new “voice of the customer” baloney, and the Google-like “big data” approach to decision making.
According to the write up:
Research firm Nielsen has partnered its social monitoring service BuzzMetrics with management consultancy McKinsey to form NM Incite, a social media consultancy…In January, Nielsen announced it would extend its partnership with Facebook to measure the impact of online branding ads on Facebook.
What happened to comScore and other firms with a “core competency” in social media metrics? McKinsey has chosen its partner for the first dance in the social media waltz. What will the azure chip crowd do? Probably launch a Twitter campaign and cook up more white papers. The big money jobs will now be more exciting for the azure chip folks. Just my opinion. Ah, don’t know what makes the blue chip consulting firms different? There’s a useful data point, gentle reader. Some tips are here.
Stephen E Arnold, June 14, 2010
Freebie
Morgan Stanley Wants You to Churn Your Investments
June 13, 2010
Short honk: The excitement is back. Forget the fire fights among Apple, Google, Microsoft, and others. Forget the lousy economic outlook. Forget the oil spill. Remember the good old pre crash days. To document this moment in time, navigate to “Mary Meeker’s Amazing Internet Presentation.” You can view the great news here. Churn those holdings of yours now. Yes, right now. Those data are hot, objective, and darn near as solid as anything Wall Street has to offer its partners. Amazing for sure.
Stephen E Arnold, June 13, 2010
Freebie, which is a word that Morgan Stanley does not use with high frequency.
Quote to Note: Management Excellence at AOL and Time Warner
June 13, 2010
This quote to note appeared in the Daily Telegraph’s “Yahoo! Shakes on a New Type of Partnership.” If the statement is accurate, it helps me understand the AOL and Time Warner way.
Here’s the passage that made me honk:
A former AOL executive said the best line to me this week – which summed up the crazy technology acquisition culture perfectly: “Every business we [AOL] ever bought we destroyed – until we bought Time Warner and they destroyed us.”
The write up provides some insight into Yahoo, but Yahoo’s track record in acquisitions is notable, in my opinion, for the number of business school analyses its deals have triggered.
Stephen E Arnold, June 13, 2010
Freebie
Quote to Note: Data Pig
June 10, 2010
I don’t use an iPhone. Yes, I pay AT&T for one of my broadband landlines. Yes, I have an AT&T landline. I am not sure if I sympathize with people who make a conscious choice to purchase services which can impose punitive variable pricing. Maybe most people don’t remember the pre-Judge Greene days when a person rented a Western Electric telephone device and never owned it? I was at the Piscataway IBM facility when the order was enforced, with one part of the building becoming Bellcore and the other part remaining Bell Labs. The object of the company was to make money, pay for the fancy stuff like PICS, and build phones you could toss from the second floor of the Western Electric building, confident that the clunky thing would work after the 26 foot fall to the concrete below.
Money.
When a telephone carrier with the “old” AT&T DNA offers a deal, I chuckle. I used to put on my Young Pioneers hat, but Tess ate it. Sigh. Memories of a monopoly don’t fade quickly.
Point your browser at “AT&T Learns Exactly The Wrong Thing About Data Usage.” Agree or disagree with the write up. What I noted was:
AT&T says that 65% of its users use less than 200 megabytes per month; a whopping 98% use less than 2 gigabytes. (NYT) AT&T looked at these numbers and concluded it was time for tiered pricing; time to soak these “data pigs”.
Now that’s a quote to note: “data pigs.” You can take the old AT&T out of the phone business, but you can’t alter that DNA easily. Ah, “data pigs”.
Stephen E Arnold, June 10, 2010
Freebie, unlike a long distance call in 1950 when a ringy dingy to Brazil was a major event. Remember differential pricing by class of customer? Ah, remember.
Another Upstart Nation State Bans Google
June 10, 2010
I may have to fire up my old copy of XyWrite III+, create a template, and assign standing text to an Alt key. I read “Turkey Bans Use of Google, Services.” If I weren’t so busy with my World Cup paperwork, I would create a chart with such categories as “banned”, “sued”, “threatened”, and probably a couple of other categories.
The most recent nation state to get frisky with Google is Turkey. Long viewed by the US as a cheerleader, Turkey seems to be willing to make pals with certain countries which are annoyed with the United States.
Here’s the passage I noted:
In an official statement, Turkey’s Telecommunications Presidency said it has banned access to many of Google IP addresses without assigning clear reasons. The statement did not confirm if the ban is temporary or permanent….The banned IP addresses include translate.google.com, books.google.com, Google-analytics.com, tools.google.com and docs.google.com.
I thought companies had an obligation to shareholders to maximize returns. Getting in hot water in countries where there are potentially lucrative markets strikes me as losing an opportunity to make money. After the World Cup, I will work through the countries in which Google faces push back. Fascinating that a single company can become the focal point for frequent hassles with nation states.
Maybe this is a trend, not an outlier? A good question in my opinion: “Who is at fault? The country, a politician, a government, a company?” I can hear my seventh grade teacher now: Discuss in less than 250 words. What’s next? Educational institutions?
Stephen E Arnold, June 10, 2010
Freebie
The UX Crowd Does Harvey the Rabbit
June 9, 2010
I like the blinking dot interface. The 20-somethings poke with fingers. Sigh. The user experience chatter goes unheard by me. I find the cartoons, the mini motion pictures, and cluttered “assisted navigational aids” annoying. The future of interfaces is certainly less cluttered. Point your browser at “‘Imaginary’ Interface Could Replace Real Thing.” And, for a bonus, you get the “real thing”, a phrase much loved by some azure chip poobahs. The point of the write up is that the interface is, well, imaginary. For me, the key passage was:
Researchers are experimenting with a new interface system for mobile devices that could replace the screen and even the keyboard with gestures supported by our visual memory.
I have a gesture in mind.
Stephen E Arnold, June 9, 2010
Freebie.
Evidence of an Open Source Boomlet?
June 8, 2010
I read “What Is Data Science?” with interest. This is a long O’Reilly Radar essay by Mike Loukides. The write up has a message that is going to be of interest to those looking for the next big thing and to some giant companies with data and not much leverage from that asset. The key point in the write up is that there is money to be made by converting data into products. Note that this is not the tired old data-information-knowledge mantra. The quasi-intellectual approach to making money is not sufficiently pragmatic for these economic times. The key is to take data and make a product. When I read the essay, I thought about various online vendors who are doing this now. Candidates for poster children include Google, Facebook, and Yahoo along with lots of other folks. Statistics Canada once signed a deal with a vendor to crunch the StatsCan stuff into more saleable products.
But for me the most interesting item in the write up was a chart that showed the number of job listings for a couple of open source products; specifically, Hadoop and Cassandra.
You can see the lines trending upwards.
My take: there is some tangible data that indicates open source software in the data management sector is gaining traction. I am not sure what this means for other open source software. But I found this factoid interesting.
Stephen E Arnold, June 8, 2010
Freebie