Real Time Search Systems, Part 3

June 23, 2010

Editor’s note: This is the draft taxonomy of real time systems that I discussed in my June 15 and 17 lectures. It may or may not make sense, but I wanted to make clear that the broad use of the phrase “real time” does not convey much meaning to me. The partial fix, short of incarcerating the marketers who slap “real time” on their brochures, is to come up with “types” of real time information. The type helps make clear the cost and other characteristic features of a system sporting the label “real time”.

Stop and think about the difference in user expectations between an investment firm and a middle school child processing information. The greed mongers want to get the freshest information possible to make the maximum return on each bet or investment. The middle school kid wants to make fun of a teacher.

The greed mongers spend millions for Fancy Systems from Thomson Reuters, Exegy, or a similar specialist. The reason is that if the Morgan Stanley Type As get bond information a few milliseconds after the God loving folks at Goldman, lots of dough can slip through the clutching paws of the person responsible for a trade. With a great deal at stake, real time means in milliseconds.

The middle school wit is happy with whatever happens as long as the teacher remains blissfully ignorant of the message. If the recipient lets out a hoot, then there may be consequences, but the downside is less painful than what happens to the crafty Wall Street wonder.

The figure below presents the draft taxonomy. If you find it silly, no problem. If you rip it off, a back link would be a nice gesture, but I don’t have any illusions about how stateless users conduct themselves.

image

Where does the latency originate? The diagram below provides the tech sleuth with some places to investigate. The lack of detail is intentional. Free blog, remember?


Real Time Search Systems, Part 1

June 21, 2010

Editor’s note: For those in the New Orleans real time search lecture and the Madrid semantic search talk, I promised to make available some of the information I discussed. Attendees are often hungry to have a take away, and I want to offer a refrigerator magnet, not the cruise ship gift shop. This post will provide a summary of the real time information services I mentioned. The group focuses on content processed from such services as Facebook, Twitter, blogs, and other geysers of digital confetti. A subsequent blog post will present the basics of my draft taxonomy of real time search. I know that most readers will kick the candy bar wrapper into the gutter. If you are one of the folks who picks up the taxonomy, a credit line would make the addled goose feel less like a down pillow and more like a Marie Antoinette pond ornament.

What’s Real Time Search?

Ah, gentle reader, real time search is marketing baloney. Life has latency. You call me on the phone and days, maybe weeks go by, and I don’t return the call. In the digital world, you get an SMS and you think it was rocketed to you by the ever vigilant telecommunications companies. Not exactly. In most cases, unless you conduct a laboratory test between mobiles on different systems, capturing the transmit time, the receiving time, and other data points such as time of day, geolocation, etc., you don’t have a clue what the latency between sending and receiving is. Isn’t it easier to assume that the message was sent instantly? When you delve into other types of information, you may discover that what you thought was real time is something quite different. The “check is in the mail” applies to digital information, index updating, query processing, system response time, and double talk from organizations too cheap or too disorganized to do much of anything quickly. Thus, real time is a slippery fish.
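For the curious, the laboratory test mentioned above boils down to subtracting a logged send time from a logged receive time. Here is a minimal Python sketch, assuming both handsets log ISO 8601 timestamps against synchronized clocks. The function names and the sample log are hypothetical, and the timestamps are made up for illustration, not measured on any network.

```python
from datetime import datetime
from statistics import median


def latency_seconds(sent_iso, received_iso):
    """Seconds between a logged send time and a logged receive time."""
    sent = datetime.fromisoformat(sent_iso)
    received = datetime.fromisoformat(received_iso)
    return (received - sent).total_seconds()


def summarize(pairs):
    """Return (min, median, max) latency in seconds for (sent, received) pairs."""
    values = sorted(latency_seconds(s, r) for s, r in pairs)
    return values[0], median(values), values[-1]


# Made-up sample log, for illustration only; not measurements from any carrier.
sample_log = [
    ("2010-06-21T09:00:00+00:00", "2010-06-21T09:00:04+00:00"),
    ("2010-06-21T12:30:00+00:00", "2010-06-21T12:30:11+00:00"),
    ("2010-06-21T18:15:00+00:00", "2010-06-21T18:17:30+00:00"),
]

print(summarize(sample_log))
```

The spread between the smallest and largest value is the point: one number labeled “real time” hides a range, and the range is only trustworthy if the two clocks agree.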

Real Time Search Systems

Why do I use the phrase “real time”? I don’t have a better phrase at hand. Vendors yap about real time and a very, very few explain exactly what their use of the phrase means. One outfit that deserves a pat on the head is Exalead. The company explains that in an organization, most information is available to an authorized user no more than 15 minutes after the Exalead system becomes aware of the data. That’s fast, and it beats the gym shorts of many other vendors. I would love to pinpoint the turtles, but my legal eagle cautions me that this type of sportiness will get me a yellow card. Figure it out for yourself is the sad consequence.

Here’s the list of the systems I identified in my lectures. I don’t work for any of these outfits, and I use different services depending on my specific information needs. You are, therefore, invited to run sample queries on these services or turn to one of the “real” journalists for their take. If you have spare cash and found yourself in the lower quartile of your math class, you may find that an azure chip consultant is just what you need to make it in the crazy world of online information.

In my lectures I made four points about these types of real-time search services.

First, each of these services did at the time of my talks deliver more useful and comprehensive results than the “real time search” services from the Big Gals in the Web search game; namely, Google, Microsoft Bing, and Yahoo. Yahoo, I pointed out, doesn’t do real time search itself. Yahoo has a deal with the OneRiot.com outfit. The service is useful and I suppose I could stick it in the list above, but I am just cutting and pasting from the PowerPoint decks I used as crutches and dogs in my lecture.


Collecta Nets a $5 Million Series B

June 11, 2010

Collecta.com has been one of the go-to services for me. We tested the service in the demonstration site at http://ssnblog.com and found it useful and reliable. We learned today (June 10, 2010) that Dace Ventures has provided about $5 million in Series B funding to the company. With the cash injection, we anticipate that the company will step up its marketing and add additional features. Mashable quoted the company’s CEO Gerry Campbell as saying:

This funding is a great validation of both Collecta’s technology, as well as the company’s core vision that real-time data applications are fundamentally changing how we access information on the Internet.

ReadWriteWeb reported that the company had an initial capital base of about $2 million.

If you have not explored the service, navigate to www.collecta.com. Similar services, which I tracked in my 2009 series of Information World Review columns, include www.itpints.com, www.topsy.com, and www.scoopler.com. In my SLA lecture, I will point out that the real time services from Google, Microsoft, and Yahoo lag the service provided by specialists like Collecta.com.

Stephen E Arnold, June 11, 2010

Freebie

Searchtrix: Quite Useful

June 8, 2010

The purpose of this system is to “tease out popular keywords and phrases in social media.” The system does that job quite well. However, the addled goose is on the look out for search and retrieval systems that make it easier to explore certain types of questions or problems. Searchtrix is a quite useful search system. Let me give you one example from my test queries, and then leave it to you to test drive this system.

Navigate to the Searchtrix splash page. Ignore the terms in the search box. You will be entering your own keywords. In the first box type “taxonomy, ontology” and in the second box, enter “system, method”. The Searchtrix system will create eight queries; specifically, “taxonomy” and “system”, “ontology” and “system” and so on. You can specify how many words should appear between the two words. This is a modern version of proximity searching. If you don’t know what that means, don’t worry about it for this demo. Now click the “search” button. Here’s what you will see:

image

When you click on one of the results, the system displays the hits for the query the system created and fired at some combination of Twitter, Facebook, and Topsy. You specify which of these indexes you want results from. Here’s the result list for the query “ontology” AND “system”.

image

Useful stuff. The hit was spot on and pointed to a project I had heard about but never paid much attention to.
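The query expansion and proximity check described above can be sketched in a few lines of Python. This is not Searchtrix’s code. A plain cross of two two-term boxes yields four pairs, so to arrive at the eight queries mentioned, the sketch assumes each pair is issued in both word orders; that is a guess. The function names and the whitespace tokenizer are also assumptions.

```python
from itertools import product


def expand_queries(first_terms, second_terms, both_orders=True):
    """Pair every term from the first box with every term from the second.

    both_orders=True also emits the reversed pair. This is an assumption
    made to reach eight queries from two boxes of two terms each.
    """
    queries = []
    for a, b in product(first_terms, second_terms):
        queries.append((a, b))
        if both_orders:
            queries.append((b, a))
    return queries


def within_proximity(text, a, b, max_gap):
    """True if a occurs before b with at most max_gap words between them.

    Uses a naive lowercase whitespace split; punctuation is not handled.
    """
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == a]
    positions_b = [i for i, w in enumerate(words) if w == b]
    return any(0 < j - i <= max_gap + 1 for i in positions_a for j in positions_b)


queries = expand_queries(["taxonomy", "ontology"], ["system", "method"])
print(len(queries))
print(within_proximity("an ontology for a system", "ontology", "system", 2))
```

Setting both_orders to False gives the plain four-pair cross, which is the other reasonable reading of the two input boxes.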

The system offers a number of features that seem to be aimed at the search engine optimization banditos. I think this is a very interesting system for making sense out of the message streams that flow through the real time systems Searchtrix taps.

A happy quack for this service.

Stephen E Arnold, June 8, 2010

Freebie

SAP and Oracle Chase Real Time

June 4, 2010

At the SLA Conference in New Orleans in a couple of weeks, I am talking about real time information processing. That paper will focus on a taxonomy of real time. Most folks use the phrase “real time” without placing it in context. Like much of the blather about finding information that is germane to a specific need, 20 somethings, azure chip consultants, and the formerly employed grab on to a buzzword. Thank goodness I am 65 and happy paddling quietly in the goose pond here in Harrod’s Creek.

I read “SAP, Oracle and Real Real Time Apps.” You should read the article, consider its argument, and make up your own mind about real, real time. For me, the killer passage was:

Forgive me for being skeptical, but I’ve been asking myself these last few weeks why a database vendor hasn’t come up with something along the lines of what SAP now says it will deliver. In-memory and column-oriented technologies have been around for years, and vendors like Sybase and Vertica have been talking about 10X to 100X data compression for nearly as long. Did it really take an application vendor to think outside the box of the database market as we know it? Has it really been beyond outfits as talented and well-funded as IBM and Teradata to tackle these problems? Or have the database vendors been protecting the status quo and certain revenue streams? It seems even Oracle’s OLTP- and OLAP-capable Exadata doesn’t aspire to replace the data warehouse layer as we know it.

I think this is on the same page with my thinking or maybe in the same chapter.

image

My view on SAP and Oracle is that neither company defines real time in a way that makes me feel comfortable. I get agitated when I hear the word “real” used to describe anything related to digital information. I don’t want to get into eschatology, but there’s a limit to my tolerance for “real”.

What’s real about big traditional database and IBM-inspired systems is that getting updates is tough. Even more problematic is the difference between processing data related to events or activities and information activities. Large systems have a tough time handling real time because latency is a fact. The bigger and clunkier the system, the more latency. Gmail went south for some users last week, and the users identified the flaw due to latency. What really happened is probably unknown to most Googlers except for the team that tracked down the problem and resolved it. But the level of service restored probably has latency, just brief enough latency to allow the user to perceive that the system was working in what the user perceived as real time.


Silobreaker to Roll Out Report Feature

June 4, 2010

Short honk: We learned from a reader in Europe that Silobreaker plans to roll out a report feature in the fall of 2010. Silobreaker delivers high value information in an easy-to-digest format. If you are not familiar with the company, navigate to http://www.silobreaker.com/. You can use the free service and upgrade to the industrial-strength version once you get a feel for the depth of the service. Silobreaker supports single- or multi-dimension queries. When you become a paying customer, you can configure a custom report on a topic of interest to you. You can specify daily, weekly, or monthly updates.

Stephen E Arnold, June 4, 2010

Freebie although I have been assured a fish treat the next time I track down a Silobreaker executive. Promises, promises. I would settle for an answer to my email queries.

Wowd Gets Two Patents – Sign of Future Success?

May 22, 2010

New kid on the block gets two patents on its method for ranking search results based on usage data and its variation on peer-to-peer networking. Wowd is a search system that makes it easier to discover what’s popular on the Web. The company says, “A new way to search… when what’s happening now matters.”

Though Wowd is not yet at the scale that necessitates this patented technology, they are hedging their bets and being prepared for when that day comes. Gigaom.com reported in their article, “Wowd Doubles Down With Social Search and P2P Patents,” that Wowd doesn’t plan to do much with the patents at the moment but it will demonstrate to investors that they are serious.

The first patent is for a method of ranking web pages based on the way people use them. In other words, it gives a search engine the ability to weigh anonymized information about where users click to go next from a web page. The technology was developed for real-time use and especially social search. The second patent is for their variation on peer-to-peer networking and is not search specific. The real time search sector has a number of vendors fighting for traffic. Wowd is a useful service.

Melody K. Smith, May 22, 2010

Note: Post was not sponsored.

Faveeo Turns Real Time Chatter Into Useful Info

May 18, 2010

Faveeo, a new semantic-based search app, aims to gather the incredible amount of data floating around on the internet and filter it into something useful. Their demo video shows a mash-up of a “Google” tag and “smartphone” tag, instantly yielding articles on the left and a live Twitter feed on the right. Clicking on an article about the Android phone updates the Twitter feed to include this tag and allows selections between tweets about Google, smartphones, the Android, or any combination of the three. While Faveeo currently focuses on Twitter, it’s still in beta and could easily grab tags from Facebook, YouTube, and other social networks in the future. Interested users can get an invite to try it out via the site.

Samuel Hartman, May 18, 2010

Note: Post not sponsored.

Tough Old Birds

May 4, 2010

I am going to be 66. I am a spring chicken compared to two tough birds: Sumner Redstone (films and TV) and Rupert Murdoch (newspapers and Fox News). What could be more enlightening to those under 25 than these roosters opining about content, online, and anything in between? Not much.

First, navigate to “Sumner Redstone Says Murdoch’s Newspapers Will Fail.” The harpoon struck here:

“He [Mr. Murdoch] lives in ink, and I live in movies and television,” Redstone said. “Ink is going to go away, and movies and television will be here forever, like me.”

Spicy? For sure.

Next, point your browser at “Fox News, Rupert Murdoch… All Pirates.” Here’s the passage I noted:

It seems that Murdoch has a double standard when it comes to copyright infringement. Apparently it’s not that bad if he’s the one making money from it.

What I learned was:

  • Don’t irritate Mr. Redstone
  • Don’t expect consistency from News Corp.
  • Steer clear of both outfits.

And search? Not an issue of significance to either top dog.

Stephen E Arnold, May 4, 2010

Unsponsored post.

Google Adds to Its Real Time Search Services

April 15, 2010

Short honk: Every Google watcher on the planet has documented the most recent Google real time search services. I just want to capture the date (April 14, 2010) and the links to the “official” announcements. For the blog post about the new service navigate to “Replay It.” For the experimental find a person to follow service, point your browser to Google Follow Finder. Both services are likely to be used by a smaller percentage of Google.com users. These are advanced search features, and users still bang in two or three words and look at the top results. One important aspect of the Follow Finder is that it appears to be running on the Google App Engine. Useful for marketing and intelligence purposes.

Stephen E Arnold, April 15, 2010

Unsponsored post.
