Enterprise Search: Baloney Six Ways, like Herring
December 21, 2010
When my team and I discussed my write up about the shift of some vendors from search to business intelligence, quite a bit of discussion ensued.
The idea that a struggling vendor of search, most often an outfit with older technology, commonly “reinvents” itself as a purveyor of business intelligence systems evoked some strong reactions.
One side of the argument was that an established set of methods for indexing unstructured content could be extended. The words used to describe this digital alchemy were Web services, connectors, widgets, and federated content. Now these are or were useful terms. But what happens is that the synthetic nature of English makes it easy to use familiar-sounding words to perform an end run around the casual listener’s mental filters. It is just not polite to ask a vendor to define a phrase like business intelligence. The way people react is to nod in a knowing manner and say “for sure” or “I’ve got it.”
Have you taken steps to see through the baloney passed off as enterprise search, business intelligence, and knowledge management?
The other side of the argument was that companies are no longer willing to pay big money for key word retrieval. The information challenge requires a rethink of what information is available within and to an organization. Then a system developed to “unlock the nuggets” in that treasure trove is needed. This side of the argument points to the use of systems developed for certain government agencies. The idea is that a person wanting to know which supplier delivers the components with the fewest defects needs an entirely different type of system. I understand this side of the argument. I am not sure that I agree, but I have heard this case so often that the USB with the MP3 of the business intelligence sound file just runs.
As we approach 2011, I think a different way to look at the information access options is needed. To that end, I have created a tabular representation of information access. I call the table and its content “The Baloney Scorecard, 2011.”
Baidu to Invest in Search Results Filtering
December 17, 2010
Those expensive coffees at boutique coffee shops are filtered. People like filtered coffee.
The goose knows why Baidu is probably going to be successful in certain markets. “Baidu to Spend $15 Mln to Screen Search Engine Results” reports that “China’s leading search engine plans to deploy $15 million to expunge illicit material and false information from its search results.” The source? State media.
Filtering makes some things better. One example is revenue derived from for-fee services for the China market. Image source: Weekender.com
Observations:
- Filtering happens. What’s interesting is the price tag placed on the renewed effort. Details about the scale of the filtering or “expunging” appear in the Interfax.com write up.
- Happy government officials. Reading between the lines, I could see modest smiles of happiness on the faces of some government officials.
- Unbeatable advantage in the fastest growing and largest market for online in the world. No further comment necessary because some Google shareholders may ask, “Tell us again why you are not making every effort to maximize shareholder value in the world’s largest market?”
In my opinion, the $15 million is irrelevant. The message is the investment. Message received in Harrod’s Creek. I am not sure about elsewhere.
Stephen E Arnold, December 17, 2010
Freebie
Live Protest Map
December 11, 2010
Short honk: The new approach to search is to create an app that shows information. No search required. An interesting example of this is the Live Protest Map for London protesters. I am not sure how long the map will be online, but if you click now (9 am Eastern, December 11, 2010), you can view the map at this link.
Stephen E Arnold, December 11, 2010
Freebie
Goose Defeathered: Real Time Truth Revealed
December 5, 2010
The goose returned from snowy England and France. Alas, the trip to sunny Luxembourg was not possible. Luxembourg, as you know, is San Tropez North. The trip was uneventful. I wanted to call attention to the work of a sketch artist who heard my talk about real time search. I don’t recall using the phrase “no bullshit”, but I was cold and without adequate supplies of Diet Coke. The Skinker’s crew had something called beer, which I don’t drink.
I want to reproduce a summary of my talk which is now on Flickr at this link. The screenshot from my ancient browser renders the goose without feathers, a nice pair of bald spots and a pregnant tummy. Perfect!
I have posted one of the screenshots from my talk, and I will pepper my blog and its two or three readers with other screenshots from my winter wonderland Euro adventure in the next few days.
Where are those feathers?
Stephen E Arnold, December 5, 2010
Freebie, just like the talk at Skinker’s.
Can Analytics Turn Drivel into Diamonds?
November 10, 2010
Are Facebook posts drivel or diamonds? Perhaps a better question is, “Can analytics convert drivel into diamonds?” The answer may be, “Yes.”
The Facebook content does not require semantics to squeeze sense out of it. The fact that a person has posted information delivers a signal about content “value.” The article “Drivel on Facebook More Valuable Than We Think” references a Swedish university’s report that calls attention to the importance of the superficial contacts, apparently unnecessary comments, and banal status updates on Facebook.
Regarding the more-than-real pseudo-friends of Facebook, the article says:
These contacts in fact constitute highly useful networks, networks that make use of the ostensibly meaningless comments and updates.
The public value of messages from a semi-private ecosystem is high. Companies and public authorities are not aware of the value of Facebook and other social content, particularly streams of content. Analytic methods, both simple and complex, justify the cost of running analyses across these data. Who knew that social networking would generate value beyond the satisfaction of communicating with friends and acquaintances? The message is clear: “It’s time to cash in on this gold rush.” For more information about text and data mining, navigate to www.inteltrax.com.
Harleena Singh, November 10, 2010
Freebie
Microsoft Suggests Google Instant Is No Big Deal
October 7, 2010
Microsoft and Google have never hidden their competitiveness when it comes to trying to outdo one another. In the PC World article “Microsoft’s Yusuf Mehdi Pooh-Poohs Google Instant”, one of Microsoft’s VPs discusses Google’s new Instant feature. The feature is designed to refresh users’ search results while they are typing their queries and is described as “search before you type”. Microsoft’s response to the new system is that “Google Instant is technologically ‘impressive’ but misses the mark in what search engines should do.” Microsoft contends that the service, though fast, falls short because it does not help users narrow down the information they are looking for. Microsoft promises its Bing service will continue to focus on providing users with more of the information that they want. Quick results are convenient, but if the results are not helpful, the speed factor is of little importance. Our view is that most users won’t know how to turn it off, so for casual consumers, Google Instant is the new Google.
April Holmes, October 7, 2010
Tibco: Money and Mentos
September 27, 2010
Tibco (founded and directed by MIT- and Harvard-grad Vivek Ranadive) reported strong third quarter earnings. The company also made an interesting acquisition. In September 2010, Tibco purchased OpenSpirit, a maker of software used in oil and gas exploration.
The “information bus” upon which Tibco’s fame rests is used as plumbing in a number of high profile industries. These include news, financial services, and government entities.
What’s important about Tibco is that the firm, in my opinion, has been one of the leaders in real time computing and information systems. Tibco’s approach can alert, pass messages, and transform content. With a bit of work, Tibco becomes the equivalent of the nervous system of a client. Many companies assert that their technology delivers a platform. Palantir, for example, is a relative newcomer to the platform pitch. But the reality is that companies like Tibco deliver a deeper, more fundamental architectural approach.
And Tibco makes the efficacy of its architecture easy to understand. How does Tibco communicate the value of its real time architecture? Click here.
For more information about Tibco, what I call a real platform company, navigate to the firm’s Web site at www.tibco.com. When I visited Tibco’s offices a decade ago, I remember seeing Yahoo News chugging happily away on Tibco’s servers. Yep, Tibco is more than Mentos.
Stephen E Arnold, September 27, 2010
Tweets with Pickles: DataSift and Its Real Time Recipe
September 25, 2010
We have used Tweetmeme.com to see what Twitter users are doing right now. The buzz word real time has usurped “right now” but that’s the magic of folks born between 1968 and 1978.
DataSift combines some nifty plumbing with an original scripting language for filtering 800 tweets a second. The system can ingest and filter other types of content, but as a Twitter partner, DataSift is in the Twitterspace at the moment.
Listio describes the service this way:
DataSift gives developers the ability to leverage cloud computing to build very precise streams of data from the millions and millions of tweets sent everyday. Tune tweets through a graphical interface or through its bespoke programming language. Streams consumable through our API and real-time HTTP. Comment upon and rank streams created by the community. Extend one or more existing streams to create super streams.
The idea is that a user will be able to create a filter that plucks content, patterns like Social Security Numbers, and metadata like the handle, geographic data, and the like. With these items, the system generates a tweet stream that matches the parameters of the filter. The language is called “Filtered Stream Definition Language” and you can see an example of its lingo below:
RULE "33e3891a3aebad56f962bb5e7ae4dc94" AND twitter.user.followers_count > 1000
A full explanation of the syntax appears in the story “FSDL”.
You can find an example on the DataSift blog which is more accessible than the videos and third party write ups about a service that is still mostly under wraps.
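To make the filtering idea concrete, here is a minimal Python sketch of the same kind of rule: keep tweets from accounts with more than 1,000 followers and, optionally, only those whose text matches a pattern such as a Social Security Number. This is not FSDL and not DataSift's API. The tweet dictionary shape, the function name filter_stream, and the sample data are all assumptions made up for illustration.

```python
import re

# Hypothetical pattern for US Social Security Numbers (ddd-dd-dddd).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def filter_stream(tweets, min_followers=1000, pattern=None):
    """Yield tweets whose author has more than min_followers followers
    and, if a pattern is given, whose text matches that pattern."""
    for tweet in tweets:
        followers = tweet.get("user", {}).get("followers_count", 0)
        if followers <= min_followers:
            continue  # mirrors "followers_count > 1000" in the rule above
        if pattern is not None and not pattern.search(tweet.get("text", "")):
            continue
        yield tweet


# Made-up sample data; the field names are assumptions, not Twitter's schema.
sample = [
    {"text": "my number is 123-45-6789",
     "user": {"screen_name": "a", "followers_count": 5000}},
    {"text": "hello world",
     "user": {"screen_name": "b", "followers_count": 20000}},
    {"text": "call 987-65-4321",
     "user": {"screen_name": "c", "followers_count": 12}},
]

popular = list(filter_stream(sample))
popular_with_ssn = list(filter_stream(sample, pattern=SSN_PATTERN))
```

Because the function is a generator, it can sit on top of an unbounded stream and emit matches as they arrive, which is the behavior a real time filter needs.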
The wordsmiths never rest. Since I learned about DataSift, the service has morphed into “cloud event processing.” As a phrase for Google indexing, this one is top notch. In terms of obfuscating the filter, storage, and analysis aspect of DataSift, I don’t really like cloud event processing or the acronym CEP. Once again, I am in the minority.
The system’s storage component is called “pickles.” The filters can cope with irrelevant hash tags and deal with such Twitter variables as name, language, location, profiles, and followers, among others. There are geospatial tricks so one can specify a radius around a location or string together multiple locations and get tweets from people close to bankrupt Blockbuster stores in Los Angeles.
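The radius trick can be sketched in a few lines of Python. The example below uses the haversine great-circle formula to test whether a tweet's coordinates fall within a given distance of any of several center points. How DataSift actually implements its geospatial filters is not described in the sources I have seen, so treat this as one plausible approach. The function names, the tweet shape, and the coordinates (rough points in the Los Angeles area, not real store locations) are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_any_radius(tweet, centers, radius_km):
    """True if the tweet carries coordinates within radius_km of any center."""
    geo = tweet.get("geo")  # hypothetical field: (lat, lon) tuple or None
    if geo is None:
        return False
    lat, lon = geo
    return any(haversine_km(lat, lon, c_lat, c_lon) <= radius_km
               for c_lat, c_lon in centers)


# Illustrative center points only (approximate Los Angeles area coordinates).
centers = [(34.05, -118.25), (34.02, -118.49)]

tweets = [
    {"text": "late fees again", "geo": (34.06, -118.24)},   # near first center
    {"text": "fog this morning", "geo": (37.77, -122.42)},  # San Francisco
    {"text": "no location attached", "geo": None},
]

nearby = [t for t in tweets if within_any_radius(t, centers, radius_km=5.0)]
```

Stringing together multiple locations is just a matter of adding more entries to the list of centers.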
The system is what I call a next generation content processing service. Perched in the cloud, DataSift deals with the content flowing through the system. To build an archive, the filtered outputs have to be written to a storage service like Pickles. Once stored, clever users can slice and dice the data to squeeze gems from the tweet stream.
The service seems on track to become available in October or November 2010. A graphical interface is on tap, a step that most next generation content processing systems have to make. No one wants to deal with an end user who can set up his own outputs and make fine decisions based on a statistically-challenged view of his or her handiwork.
For more information point your browser at www.datasift.net.
Stephen E Arnold, September 25, 2010
Exclusive Interview: Quentin Gallivan, Aster Data
September 22, 2010
In the last year or two, a new type of data management opportunity has blossomed. I describe this sector as “big data analytics”, although the azure chip consultants will craft more euphonious jargon. One of the most prominent companies in the big data market is Aster Data. The company leverages MapReduce technology (closely associated with Google) and moves it into the enterprise. The company has the backing of some of the most prestigious venture firms; for example, Sequoia Capital and Institutional Venture Partners, among others.
Aster Data, therefore, is one of the flagships in big data management and big data analysis for data-driven applications. Aster Data’s nCluster is the first MPP data warehouse architecture that allows applications to be fully embedded within the database engine to enable fast, deep analysis of massive data sets.
The company offers what it calls an “applications-within” approach. The idea is to allow application logic to exist and execute with the data itself. Termed a “Data-Analytics Server,” Aster Data’s solution effectively utilizes Aster Data’s patent-pending SQL-MapReduce together with parallelized data processing and applications to address the big data challenge. Companies using Aster Data include Coremetrics, MySpace, comScore, Akamai, Full Tilt Poker, and ShareThis. Aster Data is headquartered in San Carlos, California.
I spoke with Quentin Gallivan, the company’s new chief executive officer, on Tuesday, September 22. Mr. Gallivan made a number of interesting points. He told me that data within the enterprise is “growing at a rate of 60% a year.” What was even more interesting was that data within Internet-centric organizations was growing at “100% a year.”
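A quick back-of-the-envelope calculation shows what those two rates imply if they compound steadily, which is my assumption and not something Mr. Gallivan stated. The helper names below are invented for this sketch.

```python
import math


def doubling_time_years(annual_rate):
    """Years needed for data volume to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)


def growth_factor(annual_rate, years):
    """Multiple of the starting volume after the given number of years."""
    return (1 + annual_rate) ** years


enterprise_doubling = doubling_time_years(0.60)   # 60% a year
internet_doubling = doubling_time_years(1.00)     # 100% a year
enterprise_5yr = growth_factor(0.60, 5)
internet_5yr = growth_factor(1.00, 5)
```

At 60 percent a year, data doubles in roughly a year and a half and grows about ten-fold in five years. At 100 percent a year, it doubles every year and grows 32-fold in five years.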
I asked Mr. Gallivan about the key differentiator for Aster Data. Data management and chatter about “big data” pepper the information that flows to me from vendors each day. He said:
Aster Data’s solution is unique in that it allows complete processing of analytic applications ‘inside’ the Aster Data MPP database. This means you can now store all your data inside of Aster Data’s MPP database that runs on commodity hardware and deliver richer analytic applications that are core to improving business insights and providing more intelligence on your business. To enable richer analytic applications we offer both SQL and MapReduce. I think you know that MapReduce was first created by Google and provides a rich parallel processing framework. We run MapReduce in-database but expose it to analysts via a SQL-MapReduce interface. The combination of our MPP DBMS and in-database MapReduce makes it possible to analyze and process massive volumes of data very fast.
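For readers who have not bumped into MapReduce, here is a minimal single-process sketch of the pattern in Python: a map step emits key and value pairs, a shuffle step groups them by key, and a reduce step collapses each group. This is not Aster Data's SQL-MapReduce interface, whose syntax I am not reproducing here. In an MPP system the map and reduce steps would run in parallel across nodes. The function names and the clickstream rows are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and stream out (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Made-up clickstream rows standing in for a table in the database.
rows = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/home"},
    {"user": "u1", "page": "/cart"},
]


def page_mapper(row):
    # Emit one count per page view.
    yield row["page"], 1


def count_reducer(page, counts):
    return sum(counts)


page_views = reduce_phase(shuffle(map_phase(rows, page_mapper)), count_reducer)
```

The appeal of running this kind of logic inside the database, as Mr. Gallivan describes, is that the data does not have to be moved to a separate processing tier first.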
In the interview he describes an interesting use case for Barnes & Noble, one of Aster Data’s high profile clients. You can read the full text of the interview in the ArnoldIT.com Search Wizards Speak service by clicking this link. For a complete list of interviews with experts in search and content processing click here. Most of the azure chip consultants recycle what is one of the largest collections of free information about information retrieval in interview form available at this time.
Stephen E Arnold, September 22, 2010
Freebie. Maybe another Jamba juice someday?
Thomson Reuters Pushes Further into Real Time
September 20, 2010
Well, it was only last year that Thomson Reuters revamped its Web site to provide more intelligent content to business professionals. Since then the company has had a series of launches: Elektron, “a high-speed data distribution network, and interactive on-demand video platform Insider,” which form part of the company’s financial markets data subscription business.
The NewMediaAge article, “Thomson Reuters Adds to Market Information Service,” describes the third component of the revamped market information service, Eikon. The article reports that “Eikon combines market information, news, analytics, and trading tools into a desktop facility, with social and mobile access.” It adds that Eikon “links with foreign exchanges, equities, fixed income and trading venues, with easy-click trade capabilities so that traders can act in real-time.”
We wonder if these specialized, real-time high-end services can help the company climb back on the growth roller coaster. Only time will tell.
Harleena Singh, September 20, 2010