Baidu to Invest in Search Results Filtering

December 17, 2010

Those expensive coffees at boutique coffee shops are filtered. People like filtered coffee.

The goose knows why Baidu is probably going to be successful in certain markets. “Baidu to Spend $15 Mln to Screen Search Engine Results” reports that “China’s leading search engine plans to deploy $15 million to expunge illicit material and false information from its search results.” The source? State media.

image

Filtering makes some things better. One example is revenue derived from for-fee services in the China market. Image source: Weekender.com

Observations:

  1. Filtering happens. What’s interesting is the price tag placed on the renewed effort. Details about the scale of the filtering or “expunging” appear in the Interfax.com write up.
  2. Happy government officials. Reading between the lines I could see modest smiles of happiness on the faces of some government officials.
  3. Unbeatable advantage in the fastest growing and largest market for online in the world. No further comment necessary because some Google shareholders may ask, “Tell us again why you are not making every effort to maximize shareholder value in the world’s largest market?”

In my opinion, the $15 million is irrelevant. The message is the investment. Message received in Harrod’s Creek. I am not sure about elsewhere.

Stephen E Arnold, December 17, 2010

Freebie

Live Protest Map

December 11, 2010

Short honk: The new approach to search is to create an app that shows information. No search required. An interesting example of this is the Live Protest Map for London protesters. I am not sure how long the map will be online, but if you click now (9 am Eastern, December 11, 2010), you can view the map at this link.

Stephen E Arnold, December 11, 2010

Freebie

Goose Defeathered: Real Time Truth Revealed

December 5, 2010

The goose returned from snowy England and France. Alas, the trip to sunny Luxembourg was not possible. Luxembourg, as you know, is Saint-Tropez North. The trip was uneventful. I wanted to call attention to the work of a sketch artist who heard my talk about real time search. I don’t recall using the phrase “no bullshit”, but I was cold and without adequate supplies of Diet Coke. The Skinker’s crew had something called beer, which I don’t drink.

I want to reproduce a summary of my talk which is now on Flickr at this link. The screenshot from my ancient browser renders the goose without feathers, a nice pair of bald spots and a pregnant tummy. Perfect!

skinkers

I have posted one of the screenshots from my talk, and I will pepper my blog and its two or three readers with other screenshots from my winter wonderland Euro adventure in the next few days.

Where are those feathers?

Stephen E Arnold, December 5, 2010

Freebie, just like the talk at Skinker’s.

Can Analytics Turn Drivel into Diamonds?

November 10, 2010

Are Facebook posts drivel or diamonds? Perhaps a better question is, “Can analytics convert drivel into diamonds?” The answer may be, “Yes.”

The Facebook content does not require semantics to squeeze sense out of it. The fact that a person has posted information delivers a signal about content “value.” The article “Drivel on Facebook More Valuable Than We Think” references a Swedish university’s report that calls attention to the importance of the superficial contacts, apparently unnecessary comments, and banal status updates on Facebook.

Regarding the more-than-real pseudo-friends of Facebook, the article says:

These contacts in fact constitute highly useful networks, networks that make use of the ostensibly meaningless comments and updates.

The public value of messages from a semi-private ecosystem is high. Companies and public authorities are not aware of the value of Facebook and other social content, particularly streams of content. Analytic methods, both simple and complex, justify the cost of running analyses across these data. Who knew that social networking would generate value beyond the satisfaction of communicating with friends and acquaintances? The message is clear, “It’s time to cash in on this gold rush.” For more information about text and data mining, navigate to www.inteltrax.com.

Harleena Singh, November 10, 2010

Freebie

Microsoft Suggests Google Instant Is No Big Deal

October 7, 2010

Microsoft and Google have never hidden their competitiveness when it comes to trying to outdo one another. In the PC World article “Microsoft’s Yusuf Mehdi Pooh-Poohs Google Instant”, one of Microsoft’s VPs discusses Google’s new instant feature. The feature is designed to refresh users’ search results while they are typing their queries and is described as “search before you type”. Microsoft’s response to the new system is “Google Instant is technologically ‘impressive’ but misses the mark in what search engines should do.” Microsoft contends that the service, though fast, falls short because it does not help users narrow down the information they are looking for. Microsoft promises its Bing service will continue to focus on providing users with more of the information that they want. It is convenient to get quick results, but if the results are not helpful, then speed is of little importance. Our view is that most users won’t know how to turn it off, so for casual consumers, Google Instant is the new Google.

April Holmes, October 7, 2010

Tibco: Money and Mentos

September 27, 2010

Tibco (founded and directed by MIT- and Harvard-grad Vivek Ranadive) reported strong third quarter earnings. The company also made an interesting acquisition. In September 2010, Tibco purchased OpenSpirit, a maker of software used in oil and gas exploration.

The “information bus” upon which Tibco’s fame rests is used as plumbing in a number of high profile industries. These include news, financial services, and government entities.

What’s important about Tibco is that the firm, in my opinion, has been one of the leaders in real time computing and information systems. Tibco’s approach can alert, pass messages, and transform content. With a bit of work, Tibco becomes the equivalent of the nervous system of a client. Many companies assert that their technology delivers a platform. Palantir, for example, is a relative newcomer to the platform pitch. But the reality is that companies like Tibco deliver a deeper, more fundamental architectural approach.

And Tibco makes the efficacy of its architecture easy to understand. How does Tibco communicate the value of its real time architecture? Click here.

For more information about Tibco, what I call a real platform company, navigate to the firm’s Web site at www.tibco.com. When I visited Tibco’s offices a decade ago, I remember seeing Yahoo News chugging happily away on Tibco’s servers. Yep, Tibco is more than Mentos.

Stephen E Arnold, September 27, 2010

Tweets with Pickles: DataSift and Its Real Time Recipe

September 25, 2010

We have used Tweetmeme.com to see what Twitter users are doing right now. The buzz word real time has usurped “right now” but that’s the magic of folks born between 1968 and 1978.

DataSift combines some nifty plumbing with an original scripting language for filtering 800 tweets a second. The system can ingest and filter other types of content, but as a Twitter partner, DataSift is in the Twitterspace at the moment.

Listio describes the service this way:

DataSift gives developers the ability to leverage cloud computing to build very precise streams of data from the millions and millions of tweets sent everyday. Tune tweets through a graphical interface or through its bespoke programming language. Streams consumable through our API and real-time HTTP. Comment upon and rank streams created by the community. Extend one or more existing streams to create super streams.

The idea is that a user will be able to create a filter that plucks content, patterns like Social Security Numbers, and metadata like the handle, geographic data, and the like. With these items, the system generates a tweet stream that matches the parameters of the filter. The language is called “Filtered Stream Definition Language” and you can see an example of its lingo below:

RULE 33e3891a3aebad56f962bb5e7ae4dc94 AND twitter.user.followers_count > 1000

A full explanation of the syntax appears in the story “FSDL”.

You can find an example on the DataSift blog which is more accessible than the videos and third party write ups about a service that is still mostly under wraps.

The wordsmiths never rest. Since I learned about DataSift, the service has morphed into “cloud event processing.” As a phrase for Google indexing, this one is top notch. In terms of obfuscating the filter, storage, and analysis aspect of DataSift, I don’t really like cloud event processing or the acronym CEP. Once again, I am in the minority.

The system’s storage component is called “pickles.” The filters can cope with irrelevant hash tags and deal with such Twitter variables as name, language, location, profiles, and followers, among others. There are geospatial tricks so one can specify a radius around a location or string together multiple locations and get tweets from people close to bankrupt Blockbuster stores in Los Angeles.
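To make the filtering idea concrete, here is a minimal sketch in Python, not in DataSift's own language. It combines the follower threshold from the rule above with a radius test around a location. The field names (`followers_count`, `geo`), the sample tweets, and the 25 km radius are assumptions made for illustration; they are not DataSift's schema or API.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def matches(tweet, min_followers, center, radius_km):
    """True if the author has more than min_followers and the
    tweet was sent within radius_km of center (lat, lon)."""
    if tweet["followers_count"] <= min_followers:
        return False
    lat, lon = tweet["geo"]
    return haversine_km(lat, lon, center[0], center[1]) <= radius_km

# Hypothetical sample data; field names are illustrative only.
tweets = [
    {"user": "a", "followers_count": 5000, "geo": (34.05, -118.25)},  # Los Angeles
    {"user": "b", "followers_count": 200, "geo": (34.06, -118.24)},   # too few followers
    {"user": "c", "followers_count": 9000, "geo": (40.71, -74.01)},   # New York
]
los_angeles = (34.0522, -118.2437)

# Keep only the users whose tweets pass both tests.
stream = [t["user"] for t in tweets if matches(t, 1000, los_angeles, 25.0)]
print(stream)
```

Stringing together multiple locations, as described above, would amount to running the same test against a list of centers and keeping a tweet if any one of them matches.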

The system is what I call a next generation content processing service. Perched in the cloud, DataSift deals with the content flowing through the system. To build an archive, the filtered outputs have to be written to a storage service like Pickles. Once stored, clever users can slice and dice the data to squeeze gems from the tweet stream.

The service seems on track to become available in October or November 2010. A graphical interface is on tap, a step that most next generation content processing systems have to make. No one wants to deal with an end user who can set up his own outputs and make fine decisions based on a statistically-challenged view of his or her handiwork.

For more information point your browser at www.datasift.net.

Stephen E Arnold, September 25, 2010

Exclusive Interview: Quentin Gallivan, Aster Data

September 22, 2010

In the last year or two, a new type of data management opportunity has blossomed. I describe this sector as “big data analytics”, although the azure chip consultants will craft more euphonious jargon. One of the most prominent companies in the big data market is Aster Data. The company leverages MapReduce technology (closely associated with Google) and moves it into the enterprise. The company has the backing of some of the most prestigious venture firms; for example, Sequoia Capital and Institutional Venture Partners, among others.

Aster Data, therefore, is one of the flagships in big data management and big data analysis for data-driven applications. Aster Data’s nCluster is the first MPP data warehouse architecture that allows applications to be fully embedded within the database engine to enable fast, deep analysis of massive data sets.

The company offers what it calls an “applications-within” approach. The idea is to allow application logic to exist and execute with the data itself. Termed a “Data-Analytics Server,” Aster Data’s solution effectively utilizes Aster Data’s patent-pending SQL-MapReduce together with parallelized data processing and applications to address the big data challenge. Companies using Aster Data include Coremetrics, MySpace, comScore, Akamai, Full Tilt Poker, and ShareThis. Aster Data is headquartered in San Carlos, California.

I spoke with Quentin Gallivan, the company’s new chief executive officer, on Tuesday, September 22. Mr. Gallivan made a number of interesting points. He told me that data within the enterprise is “growing at a rate of 60% a year.” What was even more interesting was that data within Internet-centric organizations was growing at “100% a year.”
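A quick back-of-the-envelope sketch shows what those two rates imply if they hold steady and compound annually. Steady compounding is my assumption, not something Mr. Gallivan said, and the five-year horizon is chosen only for illustration.

```python
import math

def doubling_time_years(annual_growth_rate):
    """Years for a quantity to double at a constant compound annual rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)

def growth_factor(annual_growth_rate, years):
    """Multiple of the starting volume after the given number of years."""
    return (1 + annual_growth_rate) ** years

enterprise_rate = 0.60  # "60% a year"
internet_rate = 1.00    # "100% a year"

for label, rate in [("enterprise", enterprise_rate), ("Internet-centric", internet_rate)]:
    print(label,
          "doubles in", round(doubling_time_years(rate), 2), "years;",
          "five-year multiple:", round(growth_factor(rate, 5), 1))
```

Under these assumptions, enterprise data doubles roughly every year and a half and grows about tenfold in five years, while data in Internet-centric organizations doubles every year and grows 32-fold over the same period.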

I asked Mr. Gallivan about the key differentiator for Aster Data. Data management and chatter about “big data” pepper the information that flows to me from vendors each day. He said:

Aster Data’s solution is unique in that it allows complete processing of analytic applications ‘inside’ the Aster Data MPP database. This means you can now store all your data inside of Aster Data’s MPP database that runs on commodity hardware and deliver richer analytic applications that are core to improving business insights and providing more intelligence on your business. To enable richer analytic applications we offer both SQL and MapReduce. I think you know that MapReduce was first created by Google and provides a rich parallel processing framework. We run MapReduce in-database but expose it to analysts via a SQL-MapReduce interface. The combination of our MPP DBMS and in-database MapReduce makes it possible to analyze and process massive volumes of data very fast.
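For readers who have not bumped into MapReduce, the pattern can be sketched in a few lines of Python. This is a single-process toy that counts words. It is not Aster Data's SQL-MapReduce interface and it does no parallel processing; the function names and sample records are made up for illustration.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, value) pairs for one record: one (word, 1) per word."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = map_reduce(["big data", "big analytics", "data data"])
print(counts)
```

In a real system the map and reduce steps run on many nodes at once, which is where the speed on massive data sets comes from. The toy only shows the shape of the computation.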

In the interview he describes an interesting use case for Barnes & Noble, one of Aster Data’s high profile clients. You can read the full text of the interview in the ArnoldIT.com Search Wizards Speak service by clicking this link. For a complete list of interviews with experts in search and content processing click here. Most of the azure chip consultants recycle what is one of the largest collections of free information about information retrieval in interview form available at this time.

Stephen E Arnold, September 22, 2010

Freebie. Maybe another Jamba juice someday?

Thomson Reuters Pushes Further into Real Time

September 20, 2010

Well, it was only last year that Thomson Reuters revamped its website to provide content that is more intelligent for business professionals. Since then, the company has had a series of launches, including Elektron, “a high-speed data distribution network, and interactive on-demand video platform Insider,” that form part of the company’s financial markets data subscription business.

The NewMediaAge article “Thomson Reuters Adds to Market Information Service” describes the third component of the revamped market information service, Eikon. The article reports that “Eikon combines market information, news, analytics, and trading tools into a desktop facility, with social and mobile access.” It adds that Eikon “links with foreign exchanges, equities, fixed income and trading venues, with easy-click trade capabilities so that traders can act in real-time.”

We wonder if these specialized, real-time high-end services can help the company climb back on the growth roller coaster. Only time will tell.

Harleena Singh, September 20, 2010

A Fatter Big Brother? Search, Surveillance, and More

September 19, 2010

Big Brother. The Man. Spies. All three of these buzzwords conjure up many things in people’s minds. Who are they? What exactly do they do? Are they watching me and recording every move I make? To most, this is silly paranoia. But in an eye-opening article, “Big Brothers of Multiculturalism,” Ms. Julienne Eden Bušić’s points got me thinking.

Follow along from this excerpt from the article:

The time I got into an argument with a waiter named Tony at a restaurant behind the Votiv Church and was escorted roughly out, never to return.  They must have been snickering at my indignation, these omnipresent agents.  Who does she think she is?  Creating a ruckus, disturbing the other guests?  Another time at the Prater amusement park, had they been there, too, when I had….oh the indignity of it all!  It was bad enough that I remembered, but to think that others remembered, too, that they had written down all the gory details in a secret report so that others could visualize it as well….that they had then talked about it with still more people, perhaps their wives or colleagues, chuckled again about the “American girl”, her scandalous behavior, her embarrassment, excessiveness…this was unbearable.  Who did she think she was, anyway?  On the other hand, many of the other dossier allegations, observations, statements, conclusions were total fabrications, less believable than if they had written that I’d suddenly grown a long, hairy tail and sprouted horns, and intended quite obviously to gain praise from one’s boss, or perhaps a raise in position or salary.  So how effective, after all, was the notorious spy agency, if its actions were predicated upon some agent’s literary flights of fancy?

The exact time frame of that statement is unknown, and information gathered in that same time is also unknown. The real question then is, What is the quality of the information gathered? Is it from a trusted source? How reliable are the “facts”? Someone gathered it from somewhere, but was that information handled correctly? It would also be a safe bet that some of the information recorded was not at all accurate, but an all-out fabrication of a person’s mind.

Fast forward to 9-11. After the attacks on America, the Federal Government shifted into ultra high gear. Overdrive is an understatement. The effort and investment are captured in the words of Defense Secretary Robert Gates: “We did as we so often do in this country…the attitude was, if it’s worth doing, it’s probably worth overdoing.”

The article is a thought starter, if largely unverified. It is interesting to consider search, surveillance, and content processing in the context of Eden Bušić’s remarks.

Glenn Black, September 19, 2010

Freebie
