OneRiot Identifies Challenges to Monetizing Real Time Info

April 13, 2010

OneRiot’s Kimbal Musk has identified three challenges to monetizing real time information. His analysis appears in “Monetizing the Realtime Web” on the company’s blog. I agree that there is interest in real time information. Even SAS, the analytics giant, wants to hop on this fast moving content train. Law enforcement has long had an interest in knowing what’s going on, particularly in certain fast moving situations when mobile devices are used to pass messages. The challenges are, however, formidable. Mr. Musk identifies these hurdles:

  1. “Real time targeting”; that is, knowing what message goes to whom at a particular point in time. In my experience, advertisers want to fire rifle shots, not shotgun blasts. However, real time targeting can be computationally expensive.
  2. “Data is everything”; that is, individual messages must be processed and converted into meaningful information. Google has had this challenge gripped in its teeth for more than a decade. Many organizations are struggling with this issue. There are costs and precision issues in addition to technical challenges to resolve. Better metadata are needed to make some real time information useful to an advertiser.
  3. Advertisers have some learning to do. Missionary marketing is important and some old expectations and habits can be difficult to change.

Mr. Musk provides some color about OneRiot’s approach, which makes a useful case study.

The challenge is not just OneRiot’s. Google continues to tweak its presentation of real time results. I noted that our research suggests that users skip over the real time results. Some topics don’t have real time results; others do. Traditional searchers, therefore, don’t see information consistently in result sets. Consistency is important.

The larger issue, in my opinion, is that some real time results lack context. Additional information may be needed to make sense of them. Injected content wrappers could provide the user with the information needed to make sense of an otherwise cryptic or out of context item. If you run a query on a current event, such as updates to the PGA tournament, you presumably have context. But even these messages may need framing.

At this time, injection and wrapper technology is available, based on our research, just not deployed. Real time information is likely to benefit when more than the terse message is presented. Smart software may be able to shoulder the burden, converting isolated items into mini news stories.
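As an illustration only, here is a minimal sketch of what such an injected wrapper might look like: a terse real time item is matched against a hypothetical context index, and any background blurbs found are attached as a “wrapper” the user can read alongside the raw message. The `RealTimeItem` structure and the context index are invented for this example; a production system would draw on a knowledge base or news archive.

```python
from dataclasses import dataclass

@dataclass
class RealTimeItem:
    text: str       # the terse message as received
    source: str
    timestamp: str

def wrap_item(item, context_index):
    """Attach a contextual wrapper to a terse real-time item.

    context_index is a hypothetical lookup from topic keywords to
    background blurbs. Any blurb whose topic appears in the message
    becomes the injected context.
    """
    matched = [blurb for topic, blurb in context_index.items()
               if topic.lower() in item.text.lower()]
    wrapper = " ".join(matched) if matched else "No background available."
    return {
        "headline": item.text,
        "background": wrapper,   # the injected context
        "source": item.source,
        "timestamp": item.timestamp,
    }

context_index = {"PGA": "The PGA tournament is a professional golf event."}
item = RealTimeItem("Leader bogeys 17 at PGA", "microblog", "2010-04-13T14:05Z")
wrapped = wrap_item(item, context_index)
```

The point of the design is that the wrapper, not the raw message, is where the ad inventory and framing live.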

Whoever cracks this problem will have an edge in monetization because the machine generated wrappers can have ads attached, offering more advertising hooks.

Stephen E Arnold, April 13, 2010

Unsponsored post.

Splunk and Real Time Search

April 6, 2010

My column for Information World Review addressed the issue of latency in what marketers call real-time search. I am not sure when the article goes on the Information World Review Web site at http://www.iwr.co.uk/, but I can hit the three points in the write up.

  1. Real time means different things in different contexts.
  2. The services which return results with less latency are specialist vendors such as Collecta, Surchur, and Twitter, among others.
  3. The real time results in the Big Three’s search systems are uniformly disappointing.

When I read “Splunk Goes Real-Time, Eliminates Latency from IT Data Search,” I wondered what I missed. After working through the write up, I realized that “real time search” was not defined. The assumption that a buzzword makes sense to a casual reader like myself is a common practice.

The write up said:

With a major upgrade, Splunk eliminates the latency by opening the doors to real-time search, analysis and monitoring for live streaming data. The company offered a glimpse by allowing me to go into the site and conduct a random search so that I could see my own search appear in real-time data, just as an IT admin might see it.

Splunk is a company that specializes in log management. Logs are important for such applications as search engine optimization and certain security-related tasks. Here’s how the company describes itself:

Splunk is software that provides unique visibility across your entire IT infrastructure from one place in real time. Only Splunk enables you to search, report, monitor and analyze streaming and historical data from any source. Now troubleshoot application problems and investigate security incidents in minutes instead of hours or days, monitor to avoid service degradation or outages, deliver compliance at lower cost and gain new business insights from your IT data.

The addition of a search function that indexes in real time is a potentially big improvement over traditional log file analysis. The system includes a function to post Splunk saved search results to Twitter. You can get the script here.
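To make the latency point concrete, here is a toy sketch, not Splunk’s actual implementation, of a “saved search” matched against log lines as they arrive rather than after a batch indexing pass. The class, pattern, and log lines are all invented for illustration.

```python
import re
from collections import deque

class SavedSearch:
    """Toy illustration of matching a saved search against a live log
    stream, in the spirit of (but not identical to) real-time search in
    a log management system. Each incoming line is tested as it
    arrives, so matches surface with minimal latency instead of
    waiting for a batch index build."""
    def __init__(self, pattern, keep=100):
        self.pattern = re.compile(pattern)
        self.hits = deque(maxlen=keep)  # bounded buffer of recent matches

    def feed(self, line):
        """Test one incoming log line; record and report a match."""
        if self.pattern.search(line):
            self.hits.append(line)
            return True
        return False

search = SavedSearch(r"ERROR|Timeout")
stream = [
    "2010-04-06 10:00:01 INFO request served in 12ms",
    "2010-04-06 10:00:02 ERROR database connection refused",
    "2010-04-06 10:00:03 WARN Timeout contacting cache",
]
for line in stream:
    search.feed(line)
```

Even in a sketch this simple, the difference from batch processing is visible: the decision about each line is made the moment the line arrives.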

The ZDNet write up includes a diagram for “Machine-Generated IT Data Contains a Categorical Record of Activity and Behavior”.

Splunk is a low latency search system that indexes certain types of content. “Real time” is a murky concept, and in my experience, every system exhibits latency to some degree.

Stephen E Arnold, April 6, 2010

This is an unsponsored post.

Exalead Tightens NewspaperArchive Tie Up

March 26, 2010

A happy quack to the reader who alerted me to a Marketwire story about Exalead’s deal with NewspaperArchive.com. Exalead is one of the most interesting search applications and content processing companies we monitor. The story I read was “NewspaperArchive.com Scales With Exalead”.

The story reported:

NewspaperArchive.com is the largest historical newspaper database online. It contains tens of millions of newspaper pages from 1753 to present. Every newspaper in the archive is fully searchable by keyword and date, making it easy for people to quickly explore historical content. NewspaperArchive.com had bumped up against limitations of having nearly 100 million records. After the switch to Exalead in December 2009, NewspaperArchive.com has been able to scale again, increasing the number of records by 20%; while at same time reducing the amount of hardware by 75%.

The performance angle is important based on our research. There are very few companies with the engineering and architecture to deal with the types of data flows found in many organizations today. One of the founders of Exalead worked on the AltaVista.com search system. I have identified a number of Exalead innovations that moved beyond the Digital Equipment approach to search. One of the most important is scaling and a design that permits enterprise applications to break free of their lock step methods of making data available to users. Exalead can give today’s iPod savvy user a way to access business information with the fluidity of downloading a tune from Apple’s system. In the enterprise, this type of functionality is a rare animal in my experience.

Exalead, founded in 2000,

…is the leading search-based application platform provider to business and government. Exalead’s worldwide client base includes leading companies such as PricewaterhouseCooper, ViaMichelin, GEFCO, American Greetings and Sanofi Pasteur, and more than 100 million unique users a month use Exalead’s technology for search. Today, Exalead is reshaping the digital content landscape with its platform, Exalead CloudView™, which uses advanced semantic technologies to bring structure, meaning and accessibility to previously unused or under-used data in the new hybrid enterprise and Web information cloud. Cloudview collects data from virtually any source, in any format, and transforms it into structured, pervasive, contextualized building blocks of business information that can be directly searched and queried, or used as the foundation for a new breed of lean, innovative information access applications. Exalead is an operating unit of Qualis, an international holding company, with offices in Paris, San Francisco, Glasgow, Milan and Darmstadt.

I want to let you know that the last time I was in Paris I got a preview of Exalead’s forthcoming search application technology. I am not at liberty to let le chat out of the bag, but I will be describing the system when Exalead makes a formal announcement.

You can get more information about Exalead at www.exalead.com. Additional information about NewspaperArchive is available at

Exclusive Interview with the Founder of Hot Neuron

March 23, 2010

What happens when a theoretical physicist focuses his attention on the problems of content processing? One answer is the Hot Neuron technology. Dr. Bill Dimm, after a successful career in physics and finance, founded Hot Neuron to “develop innovative methods and algorithms that help people find and organize information that will make their companies more productive.”

In an exclusive interview for the ArnoldIT.com feature Search Wizards Speak, Dr. Dimm said:

Clustify analyzes the text of your documents and groups related documents together into clusters. Each cluster is labeled with a few keywords to tell you what it is about, providing an overview of what the document set is about, and allowing you to browse the clusters by keyword in a hierarchical fashion. The aim is to help the user more efficiently and consistently categorize documents, since he or she can categorize an entire cluster or a whole group of clusters with a single mouse click. Our approach to forming clusters is impacted by that goal. We use a modified agglomerative algorithm to ensure that the most similar documents get clustered together, and we allow the user to specify how similar documents must be in order to appear in the same cluster. By choosing a high similarity cutoff, the user can be confident that it is safe to categorize all documents in the cluster the same way. Clustify can also do automatic categorization by taking documents that have already been categorized, finding similar documents, and putting them in the same categories.
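For readers who want a feel for the general technique Dr. Dimm describes, here is a generic sketch of agglomerative clustering with a user-specified similarity cutoff. This is not Clustify’s code: it uses plain cosine similarity over bags of words and a greedy merge loop, and the sample documents are invented.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def agglomerate(docs, cutoff):
    """Greedy single-link agglomerative clustering with a similarity
    cutoff: repeatedly merge the two most similar clusters until no
    pair meets the cutoff. A generic sketch, not Clustify's algorithm."""
    clusters = [[i] for i in range(len(docs))]
    vecs = [Counter(d.lower().split()) for d in docs]
    while True:
        best, pair = cutoff, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = max(cosine(vecs[a], vecs[b])
                          for a in clusters[i] for b in clusters[j])
                if sim >= best:
                    best, pair = sim, (i, j)
        if pair is None:
            return clusters   # no pair meets the cutoff; stop merging
        i, j = pair
        clusters[i] += clusters.pop(j)

docs = ["stock market falls", "market falls on bank fears", "recipe for apple pie"]
clusters = agglomerate(docs, cutoff=0.3)
```

The cutoff plays the role the interview describes: set it high and every cluster is tight enough to categorize with one click; set it low and the overview is coarser.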

I asked Dr. Dimm about the intense competition in the text processing sector. He said:

For companies that do original research and adapt their products to their customers’ needs (like us, of course), there is a fair amount of opportunity for differentiation–customers really need to try the products and see what works in their situation. The companies that just pull an algorithm out of a book or mimic another product will be left competing on price.

You can see the technology in action at Dr. Dimm’s MagPortal.com site. For the full text of this exclusive interview with an innovative thinker in information retrieval, read the Hot Neuron interview. For more information, visit http://www.hotneuron.com.

Stephen E Arnold, March 23, 2010

A free write up and a free article. I will report this “free” stuff to the Department of Labor. I know the DOJ will care.

Limitations of MSFT Exchange 2010

March 16, 2010

I am not sure how one of my goslings came across this spreadsheet tucked away on the Microsoft Exchange Web log. When I tried to access the file, the system recognized neither my “official” Microsoft MSDN user ID nor my Windows Live credentials. So you may have to register to access the blog. Once there, you need to look for the download section and visually inspect the file names for the one that points to the Exchange Performance Excel spreadsheet. Running a query in the blog’s search box produced zero hits for me. But with some persistence and patience I was able to get a copy of the spreadsheet. Latency was a problem when I was fiddling with this download. (Note: if the link is dead, write one of the goslings at benkent2020 at yahoo dot com, and maybe he will email you a copy of this document.)

Once you get the document “Scalability Limitations”, you will see some pretty interesting information. One quick example is that the spreadsheet includes three columns of specifics about scaling amidst the more marketing oriented data on the spreadsheet. These three juicy columns are:

  • Limitation
  • Issue
  • Mitigation.

Here’s the information for the row Database Size:

  • Limitation–Exchange 2007 – 200GB; Exchange 2010 – 2TB or 1 disk, whichever is less
  • Issue–The DB size guidance changed from 200GB (if you are in CCR) to 2TB or 1 disk, whichever is greater (if you have 2+ copies of the DB in question)
  • Mitigation–Blank. No information.

Okay.

I hope you are able to locate this document. For those of you eager to install Exchange 2010, SharePoint 2010, and Fast Search 2010, you will want to make sure you have these types of spreadsheets at your fingertips * before * you jump on the Microsoft Enterprise steam engine. The information in the spreadsheet makes clear why some types of email content processing may be expensive to implement.

Stephen E Arnold, March 16, 2010

This is the equivalent of the free newspaper Velocity in Louisville. Read it for nothing. I will report working for no dough to the Jefferson County agency that thinks I work in Louisville when I spend most of my time in the warm embrace of airlines.

Real Time Search: Poor Layout or Lousy Content?

March 9, 2010

In my Information World Review column which I submitted last week, I talked about the “marshmallow wars” being waged among Google, Microsoft, and Yahoo. The idea is that these three “big boys” are not doing a particularly meaty job with real time content. I was fascinated to read “Why Do We Ignore Real Time Results from Google Search” in the media-savvy Guardian. I focused on the substance of the real time results, the latency, and the method of displaying these results. Each company rows its real time boat differently, and that makes life difficult for geese like me.

The Guardian’s approach, which was quite interesting to me, focused on eye tracking. You can read the write up and decide whether user experience or the content itself is the problem. I am very skeptical of the razzle dazzle about eye candy and how eyes move. My recollection from my grade school and high school days is that some people are not very adept readers. In my class, which underwent a speed reading test in Illinois in the 1950s, few students were able to absorb blocks of text at one glance. Obviously, if there are some slow readers, there may be some difficulty with certain types of layouts. On the other hand, if you are like me and can swallow paragraphs or even pages at a glance, then the eye movement stuff may not be as significant as the value of the information.

My column for Information World Review focuses on substance. I leave the wandering eyeballs of those who read a word or two at a time and may subvocalize as they grind through information to the arts and crafts approach to information. My opinion is that Google, Microsoft, and Yahoo are chasing real time content because it has marketing value. There are useful data in real time results, but not in the presentations of the big dogs of Web search. I identify some go-to services for real time search, but you will have to wait until the IWR publishing cycle outputs the column.

Stephen E Arnold, March 10, 2010

No one paid me to write this. I wonder if those reading the article glance, move their lips, or follow with their fingers. I suppose this type of non compensated writing and the attendant question means I must report to the FBI, an outfit skilled in dealing with impressions of fingers.

Automated Reports May Squeeze Azure Chip Consultants

March 3, 2010

On March 1, 2010, I heard an interesting presentation by Alacra. This company aggregates business information and packages it in an easy-to-read way. The talk focused on Alacra Pulse. The basics of the service as I recorded the speaker’s comments are:

The Alacra Pulse Platform monitors thousands of carefully selected news feeds and blogs and extracts actionable intelligence in near-real-time.  Alacra’s Applied Knowledge Extraction turns Web content into an idea generation and current awareness service for financial professionals and corporate executives.

The Pulse service offers a free version and a premium version. I thought the demonstration of the free version was quite good, and it certainly shows how far automated content assembly has come in the last two or three years. You can explore the service yourself. Navigate to http://pulse.alacra.com/analyst-comments to get started.
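A minimal sketch of the general idea, monitoring feed entries and extracting items that mention watched event types, might look like the following. This is my own toy illustration, not Alacra’s pipeline; the watchlist and feed entries are made up, and a real service would poll live RSS/Atom feeds and apply far richer entity extraction.

```python
# Hypothetical watchlist of event types a financial professional cares about.
WATCHLIST = {"downgrade", "acquisition", "lawsuit"}

def extract_alerts(entries):
    """Return feed entries whose text mentions a watched event type,
    tagged with the events found. A stand-in for knowledge extraction."""
    alerts = []
    for entry in entries:
        terms = {w.strip(".,").lower() for w in entry["text"].split()}
        hits = terms & WATCHLIST
        if hits:
            alerts.append({"text": entry["text"], "events": sorted(hits)})
    return alerts

# Invented feed entries standing in for monitored news feeds and blogs.
entries = [
    {"text": "Analyst issues downgrade on Acme Corp."},
    {"text": "Acme Corp opens new cafeteria."},
    {"text": "Rumors of an acquisition swirl around Beta Inc."},
]
alerts = extract_alerts(entries)
```

The value of a service like Pulse is in doing this continuously, at scale, over carefully selected sources, but the shape of the operation is the same: filter the stream, keep only the actionable items.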

Right after listening to the Alacra talk, I read a news release about “New Market Report Now Available: Hon Hai Precision Industry Company Limited – SWOT Analysis” from an azure chip outfit called Datamonitor. Now Hon Hai Precision, if I recall correctly, is a unit of Foxconn. Foxconn makes stuff for “hollow manufacturers.” The idea is that a company like Apple or Dell does not have the expertise or cost structure to make things. Foxconn and Hon Hai do. The buzzword for lacking a core competency in manufacturing is outsourcing, in my opinion. Foxconn is interesting because of a glitch in Apple related security and an employee death.

I think I prefer the autogenerated reports from Alacra, thank you.

The azure chip crowd assembles reports, presumably like the Hon Hai “SWOT” analysis, using basic business school methods, which may or may not be germane to the present financial climate. Here’s what the news release said:

Datamonitor’s Hon Hai Precision Industry Company Limited – SWOT Analysis company profile is the essential source for top-level company data and information. Hon Hai Precision Industry Company Limited – SWOT Analysis examines the company’s key business structure and operations, history and products, and provides summary analysis of its key revenue lines and strategy.

Would you trust your retirement savings to anyone relying upon an off-the-shelf MBA SWOT analysis in today’s real time world? I would not.

But several more important questions crossed my mind.

First, why would I pay for a profile when I could get the basic information from Alacra? Furthermore, when I get the Alacra report, it is (as I recall the speaker’s saying) generated in real time. There is no delay between my request and getting the freshest data. I am reasonably capable and can formulate my own views of a company from fresh data if what I saw from Alacra is indicative of what the company provides in its Pulse service.

Second, assume I don’t know about Alacra. Why not use a federating search system such as Devilfinder.com, Ixquick.com, or Vivisimo.com? There are even more interesting federating systems to tap, including Bright Planet and Deep Web Technologies. I understand laziness, but these services can deliver the basic information that provides a manager with direct links to some publicly available, quite useful reports about the company; for example, the Google Finance or the AOL company report.

Third, if I * really * want to do a good job, perhaps my firm should look into industrial strength solutions from Kapow Technologies or Kroll Ontrack? Let’s face it. Buying an off-the-shelf report chock full of business school jargon may not be what’s needed to deal with a business decision in today’s economic climate.

My opinion is that as outfits like Alacra and Google roll out more reports generated by “smart” software, the pressure will mount on the azure chip consulting crowd. Already pressured from below by the likes of Gerson Lehrman Group and from above by blue chip outfits like McKinsey & Co, the azure chip outfits will be like the cheese and salami in a panini, feeling heat and pressure from two places at once.

My thought is that canny business executives may want to check out services like Pulse, familiarize themselves with federating search systems, consider an industrial strength solution that operates in near real time, or just sign up for a pay-as-you-go consulting service from Gerson Lehrman. In short, do something other than buying the Cliff’s Notes for executives who are lazy like this addled goose. The middle may be an uncomfortable place for self appointed experts, poobahs, and mavens writing reports about Chinese businesses using English language sources. Yep, Cliff’s Notes for Harried Executives?

Stephen E Arnold, March 3, 2010

No one paid me to write this. Since I mentioned myself as a lazy goose, I will report non paid writing to Fish & Wildlife, an outfit that does a stellar job scheduling rooms at national parks and monitoring the health of geese in the US.

Buzz Search: Defaults Do Not Fly

February 22, 2010

Editor’s Note: Constance Ard, the Answer Maven, is one of the goslings. She wrote an overview of Google Buzz search functionality. Ms. Ard is active in the Special Libraries Association, heads up the legal interest group, and has an MLS with an emphasis on online search, taxonomies, and content processing.

With the release of Buzz flapping everyone’s wings over the last Internet half-life, it’s time to consider some practical application for Buzz. Danny Sullivan at Search Engine Land has laid the groundwork for searching Buzz.

For the record, the type-it-in-the-box-and-trust-the-results approach is not enough with this service from Google. You can see below that Buzz, a social media tool that gets feeds from Twitter, Google Reader, FriendFeed, and SMS, displays results from a typical box search that are surprisingly old in the real-time scheme of things.

These results are for a search done at approximately 8 p.m. EST on February 17, 2010, through the Buzz search box with the term: Olympics. The first result is time-stamped 4:50 p.m. The last result was stamped 9:41 a.m. and the second was stamped 8:23 a.m. These are not exactly real-time results and not even reverse chronological in display.

[Screenshots: Buzz search results for the query “Olympics”]

The same search on Buzzzy.com (selected results shown below) done at approximately the same time provides even more irritating displays. Has anyone heard of time and date stamps? I understand that in real-time search hours count, but in search, pinpointing an accurate date and time is essential.
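Sorting such results into the reverse chronological order a real-time user expects is not hard, which makes the display all the more puzzling. A small sketch using the time stamps noted above (the result texts are placeholders):

```python
from datetime import datetime

# The three time stamps observed in the Buzz results, all on the same day.
results = [
    {"text": "first result", "stamp": "4:50 p.m."},
    {"text": "second result", "stamp": "8:23 a.m."},
    {"text": "last result", "stamp": "9:41 a.m."},
]

def parse(stamp):
    # "4:50 p.m." -> datetime; normalize the a.m./p.m. punctuation first
    return datetime.strptime(stamp.replace(".", "").upper(), "%I:%M %p")

# Newest first, the ordering a real-time display presumably should use.
reverse_chrono = sorted(results, key=lambda r: parse(r["stamp"]), reverse=True)
```

One pass of parsing and one sort call; the absence of this ordering in a real-time service is a choice, not a technical barrier.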


Twitter and Mining Tweets

February 21, 2010

I must admit: I get confused. There is Twitter, TWIT (a podcast network), TWIST (a podcast from another me-too outfit), and “tweets”. If I am confused, imagine the challenge of processing and then analyzing short text messages.

Without context, a brief text message can be opaque to someone my age; for example, “r u thr”. Other messages say one thing, “at the place, 5” and mean to an insider “Mary’s parents are out of town. The party is at Mary’s house at 5 pm.”

When I read “Twitter’s Plan to Analyze 100 Billion Tweets”, several thoughts struck me:

  1. What took so long?
  2. Twitter is venturing into some tricky computational thickets. Analyzing tweets (the word given to 140 character messages sent via Twitter and not to be confused with “twits”, members of the TWIT podcast network) is not easy.
  3. Non US law enforcement and intelligence professionals will be paying a bit more attention to the Twitter analyses because Twitter’s own outputs may be better, faster, and cheaper than setting up exotic tweet subsystems.
  4. Twitter makes clear that it has not analyzed its own data stream, which surprises me. I thought these young wizards were on top of data flows, not sitting back and just reacting to whatever happens.

According to the article, “Twitter is the nervous system of the Web.” This is a hypothetical, and I am not sure I buy that assertion. My view is that Google’s more diverse data flows are more useful. In fact, the metadata generated by observing flows within Buzz and Wave are potentially a leapfrog. Twitter is a bit like one of those Faith Popcorn-type projects. Sniffing is different from getting the rare sirloin in a three star eatery in Lyon.

The write up points out that Twitter will use open source tools for the job. There are some juicy details of how Twitter will process the traffic.
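For readers unfamiliar with the batch style of analysis open source frameworks support, here is a toy map-reduce style term count over a handful of invented tweets. A production job would distribute the same map and reduce steps across a cluster of machines rather than run them in one process.

```python
from collections import Counter
from itertools import chain

def map_tweet(tweet):
    """Map phase: emit (term, 1) pairs for each token in one tweet."""
    return [(tok.lower(), 1) for tok in tweet.split()]

def reduce_counts(pairs):
    """Reduce phase: sum the emitted counts per term."""
    totals = Counter()
    for term, n in pairs:
        totals[term] += n
    return totals

# Invented sample; the real analysis would cover billions of records.
tweets = ["olympics gold", "olympics open tonight", "gold medal count"]
counts = reduce_counts(chain.from_iterable(map_tweet(t) for t in tweets))
```

The tricky computational thickets mentioned above lie not in this mechanical counting but in making sense of terse, context-poor messages like “r u thr”.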

A useful write up.

Stephen E Arnold, February 22, 2010

No one paid me to write this article. I will report non payment to the Department of Labor, where many are paid for every lick of work.

PointCast Version 2?

February 19, 2010

I read “Did Google Reader Just Turn on the Firehose?” I don’t use the Google Reader. The addled goose does not read. But when he scanned with one eye the story in Stay N’ Alive, he had one thought, “Is PointCast back?” Different coat of paint maybe but possibly the same squeaking wheel?

Stephen E Arnold, February 19, 2010

No one paid me to write this. Non payment means that I must report this to the IMF, an outfit aware of such sad situations.
