SAS a Visionary in EMM. EMM?
October 21, 2010
Enterprise Marketing Management. Yep, the workhorse of statistics majors who dabble in oil field exploration, political polling, military intelligence, and cancer research is ready for its next job move. Enterprise marketing management. Now that’s going to be a shock to the MBAs who rely on Excel’s math functions to determine a sample size or calculate a rate of return.
Applications of heavy duty stats can wander over the entire landscape of business problems. But one of the azure chip consulting firms has killed off a dog (enterprise search) and created a Frankenstats. Navigate to “SAS Positioned in Visionaries Quadrant in Leading Industry Analyst Firm’s 2010 “Magic Quadrant…” and make up your own mind about this market positioning.
For me, this was an interesting passage:
According to Gartner enterprise marketing management “encompasses the business strategies, process automation and technologies required to effectively operate a marketing department, align resources, execute customer-centric strategies and improve marketing performance. This includes functionality for campaign management, lead management, MRM, loyalty management, event management, industry-specific functionality, marketing performance management and analytics.”
Frankenstats?
Stephen E Arnold, October 21, 2010
Wolfram Alpha and Search
October 12, 2010
I read “Wolfram Alpha and the Future of Search.” When I first looked at Wolfram Alpha, I did not consider the system a search engine. Google has a similar function. The idea is that an appropriate query will generate an answer. In my first queries with Wolfram Alpha, the math questions worked well. The more generalized query elicited some head scratching from the Wolfram Alpha system.
The write up summarizes some remarks made by Stephen Wolfram, a well known wizard and software genius, whose Mathematica finds use in many PhD study areas, in research labs in Silicon Valley, and among puzzle solvers who find Mathematica just what the doctor ordered to avoid a silly addition error.
The write up contained two points which I found interesting.
First, Dr. Wolfram allegedly said something along these lines:
Traditional search engines help us find documents in that mountain of words. But they do very little to distill those words into knowledge, or to answer our questions. The challenge in the coming years, Wolfram said, was to make more of these files and documents computable. That would enable systems like Wolfram/Alpha to digest them, and to use them to produce answers and analysis.
Dr. Wolfram is right in the flow of the data fusion trend. The question I would raise is, “What happens when those generating the outputs fiddle the game?” I don’t think “trust”, “reputation,” or “honor” will satisfy my need for some substantive reassurance. The nifty interfaces and the point-and-click access to “the answer” may be a mixed blessing.
Second, Dr. Wolfram allegedly said something along these lines:
But the way Wolfram sees it, more of us will produce information in a style (or on templates) that will make it computable, and machines like his will eventually be able to answer all sorts of questions. In a sense, an early stage of this pre-processing is already happening: An entire industry is formatting Web pages to make them more searchable.
Bingo. Data fusion. The question I would raise is, “What happens when one of the nifty acquisition and transformation systems cannot process certain content?” In my experience, even start ups centered on Twitter content operate at a scale that involves a significant amount of data. Presenting information as complete that may be quite incomplete seems to be a sticky wicket to me.
Is this bulk content processing and machine answering the future? Google, Recorded Future, and DataSift are rushing toward that end zone. Trends are fascinating, and in this case, data fusion tells us more about the market’s need for an easy-as-pie way to get actionable information than about the validity of the methods and the appropriateness of the outputs.
Stephen E Arnold, October 12, 2010
Yahoo and Prediction
October 5, 2010
Yahoo’s public relations machine is working hard to deal with the flood of news about executive turnover and the questions about Yahoo’s management leadership. I wanted to snag this “What Can Search Predict?” item before it becomes unfindable. Google Instant and Bing are wonder services but pinning down specific documents in the brave new world is getting more difficult in my opinion.
The point of the write up is that user behavior at a point in time provides information about what’s hot and what’s not. I understand this. Analyzing usage data is not a new thing, nor is the math used to clump clicks and plot them, massage them, and extrapolate from them. Most college grads had a chance to try their hand at this type of math in classes from psychology to biology and statistics. (I can hear the groans now.)
Yahoo says:
In many cases, we found that these traditional predictions performed on par with those generated from search. Although search data are indeed predictive of future outcomes, alternative information sources often perform equally well.
The idea is that big data are good but specific, narrow sets of data from specific corpuses may deliver better indicators of user future actions.
Makes sense to me. Big data are big. More precisely constrained data are narrower. When looking for a specific indicator, why not consider the constrained data? Makes sense to me, but I would prefer a method that uses big data, when available, and more constrained data. Two sets of outputs can be examined.
Yahoo adds:
The potential for search-based predictions seems greatest for applications like financial analysis where even a minimal performance edge can be valuable, or for situations in which it is cumbersome or expensive to collect and parse data from traditional sources. Ultimately, search can be useful in predicting real-world events, not because it is better than other traditional data, but because it is fast, convenient, and offers insight into a wide range of topics.
Several questions waddled across my mind:
- What is the current Yahoo use case for its insight? I know that each time I return to my Yahoo Mail, the system does not remember me, nor does it present options to me based on my behavior or a larger group’s in my view. I have to click, click, click to see a list of email. Maybe Yahoo can provide some concrete examples?
- In the midst of the shift to Bing search, where does this predictive stuff fit? I was looking for a “mens black watch” on Yahoo Shopping. Try the query. I am not sure what can be done to improve the results, but search results mixed ranges with specific prices on specific models. Huh? With user data – either big or constrained – predictive methods should reduce confusion, not create a “huh” moment for me.
- Is this a “level” problem? Here’s what I am thinking. The problem in search that Yahoo is addressing seems to be down in the weeds. There are larger findability problems with Yahoo’s system. For example, in the shopping example a user must click on a “more” link in order to access the shopping search feature. Most users don’t know to what that “More” refers. Is this a contributing factor to user frustration which in turn may explain some of the loss of polish on the purple Yahoo Y?
Worth reading and then finding a use case (which I may be missing) before recycling information already in the channel in my opinion.
Stephen E Arnold, October 5, 2010
Freebie
Aster Data Garners $30 Million
September 24, 2010
With data volumes growing and demand rising for ways to convert that data into meaningful, rich resources, it is no wonder that financial investors and promoters acknowledge this remarkable opportunity and foresee tremendous growth in this sector. The evidence is the recent announcement from Aster Data, the market leader in big data management and data analysis for data-driven applications, that it raised $30 million in its Series C round from existing and new investors.
Riding the explosive growth of data volumes across organizations, Aster Data has capitalized on the great demand for rich and advanced analytics, which has huge potential to tap the gigantic $20 billion database market. Aster Data’s recent press release asserts that its strong innovation and execution have led it to double its revenue every year. This demonstrates great momentum, which has been recognized by renowned organizations like the World Economic Forum and Gartner Inc., and which has won the company excellence awards from TechWeb and the San Francisco Business Times for its product and service.
According to the press release, Aster Data has recently introduced nCluster 4.6 and its ‘Data-Analytics Server’, which is, “specifically designed to enable organizations to cost effectively store and analyze massive volumes of data,” with the help of the richer and faster processing ‘in-database’ analytics engine that uses MapReduce, a technology developed by Google. The company has attracted biggies like MySpace, comScore, Barnes & Noble, ShareThis, and Akamai, helping them with deeper insights into their data.
The high growth potential and proven market leadership have reinforced the trust of its existing investors and new industry visionaries like David Cheriton, who previously backed high-growth companies like Google and VMware. With lots of money in its booty, Aster Data wants to accelerate growth, scale operations, and expand its global market share. We believe in today’s computing scenario, such investments are definitely rewarding. The company can move in almost any direction it wishes. Will Aster Data stick with data management, data analytics, or move into closely allied data and information sectors? We will be watching.
See our interview with the new senior manager of Aster Data in the ArnoldIT.com Search Wizards Speak collection.
Harleena Singh, September 24, 2010
Exclusive Interview: Mats Bjore, Silobreaker
September 23, 2010
In some follow up work for the Lucene Revolution Conference, I spoke with Mats Bjore, the former blue chip consultant at the blue chip McKinsey on Tuesday, September 21, 2010. After we discussed our respective talks at the upcoming open source conference sponsored by Lucid Imagination, I asked about the Silobreaker technology. Mats explained that there were some new developments that would be discussed at the upcoming Boston conference.
If you have not used Silobreaker.com, you will want to point your browser at www.silobreaker.com. When you click on the interface, you get an instant report. You can run a more traditional query, but Silobreaker uses proprietary methods to highlight the most important stories, provide visualizations related to the story and entities mentioned, and links to related content. The public Silobreaker is a demonstration of the more robust enterprise solution available from the firm. Silobreaker is in use at a number of interesting client facilities in the US and elsewhere.
I captured our conversation using the handy Skype recorder add in. The full text of our conversation appears below.
Hi, Mats, it’s good to talk with you again. I keep hearing about Silobreaker, so you are getting quite a bit of attention in the business intelligence space. What’s new with Silobreaker in the last few months?
Yes, we are getting quite a bit of media attention. As you know, the primary objective of launching the free news search engine was to showcase our technology to enterprise users and to make them see how easily a Silobreaker solution could be tweaked to fit their domain and requirements. The Silobreaker Enterprise Software Suite (“SESS”) was successfully installed last year as the core engine for the Swedish Armed Forces new news intelligence system and we are just about to release a SaaS product online called Silobreaker Premium that is specifically aimed at business and government agency users who don’t need or want a standalone installation. We already have some US clients as pilot clients.
Silobreaker’s splash screen at www.silobreaker.com
How do you describe Silobreaker at this stage in its development?
We’ve come a long way, yet have an exciting product roadmap ahead of us. But most importantly, we have finally reached some milestones in terms of commercial robustness and viability with the platform. Silobreaker Premium will be an exciting new product in the marketplace. Also, since our technology and team are highly customizable, our clients’ and users’ demands are the most important guide for our development.
What new services have you introduced that you can talk about?
As I said, Silobreaker Premium is the new product for us this year, but we also develop a lot of integrated entity and content management functions for clients that want to have integrated Intelligence analytical tools.
What new functions are available to commercial licensees?
We think Silobreaker Premium is a powerful enterprise product for professional media-monitoring, early warning, risk management, intelligence and decision support.
Available as SaaS (Software as a Service) in a single intuitive and secure user interface, you are able to define monitoring targets, trigger content aggregation, perform analyses, and display results in customized dashboards, e-mail alerts and by auto-generated reports.
What else can the system do?
Let me list a few of the functions. You can set up watch lists and collaborate with colleagues. Also, it is easy to monitor news, reports, multimedia and social media. Clients can track big movers in the news by heat tools and other analytics. A user can easily save and export findings straight into third party applications. We make it very easy to mix and match free and premium content.
What’s the pricing?
Good question for sure. Silobreaker Premium will be priced with a single monthly flat fee per enterprise to allow and encourage large user groups within an organization to use the service regardless of the number of queries, monitoring agents, dashboards, watch lists, alerts, or reports.
There has been quite a bit of “noise” about Palantir and Recorded Future? I thought Silobreaker provided similar functions. Is that correct?
That is correct. I think conceptually we are very similar in what we are trying to deliver to our customers, but there are also some noticeable differences. We crawl different databases, we use different search methodologies, and as companies we are different in size and our pricing differs. Also I believe that from an analyst perspective Silobreaker, in its customized versions, can provide tools that encompass the whole intelligence process at a price that enables even large organizations to deploy our systems to everyone. We believe in Silobreaking also when it comes to collaboration.
And silobreaking means what?
Most organizations have “walls” between units. Information in one silo may not be easily accessible to authorized users in other silos. So, our product is a “silobreaker.”
I like the name. My view is that pr, venture capitalists, and the name “Google” blow some technologies up like a Macy’s Thanksgiving Day balloon. What anchors the Silobreaker approach? Don’t give me PR talk, okay?
No problem. Our independence and our beliefs make Silobreaker unique. We are not VC-financed and have managed to build the business through our own money and customer revenues. That may mean that things have taken a bit longer, but it shows that what we do is for real, which is far away from the many “hype today gone tomorrow” companies that we’ve seen in passing over the last few years. We also anchor all we do in a strong belief that information overload is not evil but a reassuring consequence of freedom and innovation, and that it is the ability to refine this overload and extract benefits from it that truly creates the “killer app” that everybody needs.
Let’s assume I am a busy person. I have to make decisions and I don’t have much time. What do I have to do to get a custom output from Silobreaker?
Not much. Our Premium users typically do two things to generate custom output. Firstly, they create one or several watch lists. This could be people, products, companies or anything else they are interested in – or a list of favorite publications. Such lists can then be used to make queries across all our tools and features or to customize dashboards, email alerts and reports.
What happens if a new content stream becomes available? Say, for example, the Tumblr micro-blogging service. What is required to intake that content and include its content in my results? Is there an open source component or API for Silobreaker?
We support many different types of content. At the moment we will add open sources on request which are added easily through RSS/Atom feeds or through crawling the site. As a general rule, we do not allow users to add sources themselves. Having said that, though, Premium users can add “internal content” through an upload facility, enabling them to mix internal reports and posts with external content.
I find some visualizations pretty but either confusing, meaningless, or downright misleading. What has Silobreaker done to make information more quickly apprehendable? I guess this is called the UX or user experience?
We actually believe that graphics and visualizations should play as big a role for text-mining as they do for numerical analysis. However, I agree with you that usability becomes a big issue in order to make users understand what the visualizations are showing and how they can be used for more in-depth analysis. That is something we are working on all the time, but users must also realize that keyword-based queries generating just lists of search hits can never be the way forward for search, so we hope they are open-minded about these new ways of presenting results.
As you look ahead, what are the two or three big changes you anticipate in next generation information access?
The focus on “how many hits at what speed” feels very much like a first generation feature and hasn’t really helped with information overload. Context, analysis, and query customizations will be the challenges for next generation algorithms and services.
How can a reader explore Silobreaker?
Silobreaker.com is free and anyone is welcome to a free trial of Silobreaker Premium. Just get in touch.
If a person wants more information, what’s the best way to get that information or contact you?
Contact us directly at sales@silobreaker.com or call our sales director Anders Kasberg at +46 (0) 8 662 3230.
See you in Boston and then in Bethesda the following week, okay?
Yes.
Stephen E Arnold, September 23, 2010
Freebie. The last time I was in Sweden I got herring. I want a taco.
Exclusive Interview: Quentin Gallivan, Aster Data
September 22, 2010
In the last year or two, a new type of data management opportunity has blossomed. I describe this sector as “big data analytics”, although the azure chip consultants will craft more euphonious jargon. One of the most prominent companies in the big data market is Aster Data. The company leverages MapReduce technology (closely associated with Google) and moves it into the enterprise. The company has the backing of some of the most prestigious venture firms; for example, Sequoia Capital and Institutional Venture Partners, among others.
Aster Data, therefore, is one of the flagships in big data management and big data analysis for data-driven applications. Aster Data’s nCluster is the first MPP data warehouse architecture that allows applications to be fully embedded within the database engine to enable fast, deep analysis of massive data sets.
The company offers what it calls an “applications-within” approach. The idea is to allow application logic to exist and execute with the data itself. Termed a “Data-Analytics Server,” Aster Data’s solution effectively utilizes Aster Data’s patent-pending SQL-MapReduce together with parallelized data processing and applications to address the big data challenge. Companies using Aster Data include Coremetrics, MySpace, comScore, Akamai, Full Tilt Poker, and ShareThis. Aster Data is headquartered in San Carlos, California.
I spoke with Quentin Gallivan, the company’s new chief executive officer, on Tuesday, September 22. Mr. Gallivan made a number of interesting points. He told me that data within the enterprise is “growing at a rate of 60% a year.” What was even more interesting was that data within Internet-centric organizations was growing at “100% a year.”
I asked Mr. Gallivan about the key differentiator for Aster Data. Data management and chatter about “big data” peppers the information that flows to me from vendors each day. He said:
Aster Data’s solution is unique in that it allows complete processing of analytic applications ‘inside’ the Aster Data MPP database. This means you can now store all your data inside of Aster Data’s MPP database that runs on commodity hardware and deliver richer analytic applications that are core to improving business insights and providing more intelligence on your business. To enable richer analytic applications we offer both SQL and MapReduce. I think you know that MapReduce was first created by Google and provides a rich parallel processing framework. We run MapReduce in-database but expose it to analysts via a SQL-MapReduce interface. The combination of our MPP DBMS and in-database MapReduce makes it possible to analyze and process massive volumes of data very fast.
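For readers who have not bumped into MapReduce, a minimal single-machine sketch in Python may help. It is not Aster Data's SQL-MapReduce interface and it does not run in a database; the function names and the sample purchase records are hypothetical and exist only to show the map, group, and reduce steps that a parallel framework spreads across many machines.

```python
from collections import defaultdict

def map_phase(records):
    """Map step: emit one (key, value) pair per input record."""
    for customer, amount in records:
        yield customer, amount

def group_by_key(pairs):
    """Shuffle step: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: collapse each key's values into one result."""
    return {key: sum(values) for key, values in groups.items()}

def total_by_customer(records):
    """Run the three steps in sequence on one machine."""
    return reduce_phase(group_by_key(map_phase(records)))

# Hypothetical purchase records: (customer, amount spent)
purchases = [("ann", 10), ("bob", 5), ("ann", 7), ("bob", 1)]
totals = total_by_customer(purchases)
print(totals)
```

In a real MapReduce system the map and reduce steps run on many nodes at once, and the grouping step moves data between them. The sketch keeps everything in one process so the logic is easy to follow.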
In the interview he describes an interesting use case for Barnes & Noble, one of Aster Data’s high profile clients. You can read the full text of the interview in the ArnoldIT.com Search Wizards Speak service by clicking this link. For a complete list of interviews with experts in search and content processing click here. Most of the azure chip consultants recycle what is one of the largest collections of free information about information retrieval in interview form available at this time.
Stephen E Arnold, September 22, 2010
Freebie. Maybe another Jamba juice someday?
Exclusive Interview with Steve Cohen, Basis Technology
September 21, 2010
The Lucene Revolution is a few weeks away. One of the featured speakers is Steve Cohen, the chief operating officer of Basis Technology. Long a leader in language technology, Basis Technology has ridden a rocket ship of growth in the last few years.
Steve Cohen, COO, Basis Technology
I spoke with Steve about his firm and its view of open source search technology on Monday, September 20, 2010. The full text of the interview appears below:
Why are you interested in open source search?
The open source search movement has brought great search technology to a much wider audience. The growing Lucene and Solr community provides us with a sophisticated set of potential customers, who understand the difference that high quality linguistics can make. Historically we have sold to commercial search engine customers, and now we’re able to connect with – and support – individual organizations who are implementing Solr for documents in many languages. This also provides us with the opportunity to get one step closer to the end user, which is where we get our best feedback.
What is your take on the community aspect of open source search?
Of course, open source only works if there is an active and diverse community. This is why the Apache Foundation has stringent rules regarding the community before they will accept a project. “Search” has migrated over the past 15 years from an adjunct capability plugged onto the side of database-based systems to a foundation around which high performance software can be created. This means that many products and organizations now depend on a great search core technology. Because they depend on it they need to support and improve it, which is what we see happening.
What’s your take on the commercial interest in open source?
Our take, as a mostly commercial software company, is that we absolutely want to embrace and support the open source community – we employ Apache committers and open source maintainers for non-Apache projects – while providing (selling) technology that enhances the open source products. We also plan to convert some of our core technology to open source projects over time.
What’s your view on the Oracle Google Java legal matter with regards to open source search?
The embedded Java situation is unique and I don’t think it applies to open source search technology. We’re not completely surprised, however, that Oracle would have a different opinion of how to manage an open source portfolio than Sun did. For the community at-large this is probably not a good thing.
What are the primary benefits of using open source search?
I’ll tell you what we hear from customers and users: the primary benefits are avoiding vendor lock-in and gaining flexibility. There have been many changes in the commercial vendor landscape over the fifteen years we’ve been in this business, and customers feel like they’ve been hurt by changes in ownership and by whole products and companies disappearing. Search, as we said earlier, is a core component that directly affects user experience, so customizing and tuning performance to their application is key. Customers want all of the other usual things as well: good price, high performance, support, etc.
When someone asks you why you don’t use a commercial search solution, what do you tell them?
We do partner with commercial search vendors as well, so we like to present the benefits of each approach and let the customer decide.
What about integration? That’s a killer for many vendors in my experience.
Our exposure to integration is on the “back end” of Lucene and Solr. Our technology plugs in to provide linguistic capabilities. Since we deliver a reliable connector between our technology and the search engine this hasn’t been much of a problem.
How does open source search fit into Basis’ product/service offerings?
Our product, Rosette, is a text analysis toolkit that plugs into search tools like Solr (or the Lucene index engine) to help make search work well in many languages. Rosette prepares tokens for the search index by segmenting the text (which is not easy in some languages, like Chinese and Japanese), using linguistic rules to normalize the terms to enhance recall, and providing enhanced search and navigation capabilities like entity extraction and fuzzy name matching.
How do people reach you?
Our Web site, at www.basistech.com, contains details on our various products and services, or people can write to info@basistech.com or call +1-617-386-2090.
Stephen E Arnold, September 21, 2010
Sponsored post
Quick and Dirty Sentiment Analysis
September 14, 2010
I thought “Most Common Words Unique to 1 Star and 5 Star App Store Reviews” provides some insight into how certain sentiment analysis systems work. The article said:
I wrote a script to crawl U.S. App Store customer reviews for the top 100 apps from every category (minus duplicates) and compute the most common words in 1-star and 5-star reviews, excluding words that were also common in 3-star reviews.
Frequency count against a “field”. Here are the results for positive apps:
awesome, worth, thanks, amazing, simple, perfect, price, everything, ever, must, iPod, before, found, store, never, recommend, done, take, always, touch
How do you know a loser?
waste, money, crashes, tried, useless, nothing, paid, open, deleted, downloaded, didn’t, says, stupid, anything, actually, account, bought, apple, already
“Sentiment” can be discerned by looking for certain words and keeping count. So much for the rocket science of “understanding unstructured text.”
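The counting approach is simple enough to sketch in a few lines of Python. This is not the script from the quoted article, which I have not seen. It is a minimal illustration under my own assumptions: the function names, the cutoff for what counts as "common" in 3-star reviews, and the sample reviews are all hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def distinctive_words(target_reviews, neutral_reviews, top_n=20, neutral_top_n=50):
    """Return the most common words in target_reviews, skipping any word
    that is also among the most common words in neutral_reviews."""
    target_counts = Counter(w for r in target_reviews for w in tokenize(r))
    neutral_counts = Counter(w for r in neutral_reviews for w in tokenize(r))
    neutral_common = {w for w, _ in neutral_counts.most_common(neutral_top_n)}
    ranked = [w for w, _ in target_counts.most_common() if w not in neutral_common]
    return ranked[:top_n]

# Hypothetical sample data standing in for crawled reviews
five_star = ["Awesome app, worth the price", "Awesome and simple, thanks"]
one_star = ["Waste of money, the app crashes", "Useless app, crashes and a waste"]
three_star = ["The app is okay", "The app is fine and okay"]

print(distinctive_words(five_star, three_star, top_n=3))
print(distinctive_words(one_star, three_star, top_n=3))
```

Filtering against the 3-star list is what removes filler such as "the" and "app", leaving the words that lean positive or negative. A real system would need far more reviews than this toy sample.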
Stephen E Arnold, September 14, 2010
Freebie
SwiftRiver: Open Source Pushes into the Intel Space
September 13, 2010
If you are one of the social netizens, you know it isn’t easy to keep track of, manage, and organize the hundreds of Twitter streams, Facebook updates, blog posts, RSS feeds, or SMS messages that you keep getting. Do not feel helpless: SwiftRiver, a free open source intelligence-gathering platform for managing real-time streams of data, comes to your aid. The platform consists of a number of products and technologies, and its goal is to aggregate information from multiple media channels and add context to it using semantic analysis.
SwiftRiver can also be used as a search tool, for email filtering, to monitor numerous blogs, and verify real-time data from various channels. It offers, “Several advanced tools (social graph mining, natural language processing, locations servers, and twitter analytics) for free use via the open API platform Swift Web Services.” According to the parent site Swiftly.org, “This free tool is especially for organizations who need to sort their data by authority and accuracy, as opposed to popularity.” SwiftRiver has the ability to act quickly on massive amounts of data, a feat critical for emergency response groups, election monitors, media, and others.
There are multiple Swift Rivers. You want the one at http://swift.ushahidi.com or http://swiftly.org/.
Ushahidi, the company behind this initiative, claims, “The SwiftRiver platform offers organizations an easy way to combine natural language/artificial intelligence process, data-mining for SMS and Twitter, and verification algorithms for different sources of information.” Elaborating further it states, “SwiftRiver is unique in that there is no singular ‘SwiftRiver’ application. Rather, there are many, that combine plug-ins, APIs, and themes in different ways that are optimized for workflows.”
Presently SwiftRiver uses the Sweeper App, the Kohana MVC UI, the distributed reputation system RiverID, and SwiftWebServices (SWS) as the API platform. The beauty here is that SwiftRiver is just the core, and it can have any UI, App, or API. It also has an intuitive and customizable dashboard, and the “users of WordPress and Drupal can add features like auto-tagging and more using Swift Web Services.” While you may download SwiftRiver and run it on your web server, SWS is a hosted cloud service, and does not need to be downloaded and installed.
Harleena Singh, September 13, 2010
Freebie
RSS Readers Dead? And What about the Info Flows?
September 13, 2010
Ask.com is an unlikely service to become a harbinger of change in content. Some folks don’t agree with this statement. For example, read “The Death Of The RSS Reader.” The main idea is that:
There have been predictions since at least 2006, when Pluck shut its RSS reader down that “consumer RSS readers” were a dead market, because, as ReadWriteWeb wrote then, they were “rapidly becoming commodities,” as RSS reading capabilities were integrated into other products like e-mail applications and browsers. And, indeed, a number of consumer-oriented RSS readers, including News Alloy, Rojo, and News Gator, shut down in recent years.
The reason is that users are turning to social services like Facebook and Twitter to keep up with what’s hot, important, newsy, and relevant.
An autumn forest. Death or respite before rebirth?
I don’t dispute that for many folks the RSS boom has had its sound dissipate. However, there are several factors operating that help me understand why the RSS reader has lost its appeal for most Web users. Our work suggests these factors are operating:
- RSS set up and management cause the same problems that the original Pointcast, Backweb, and Desktop Data created. There is too much for the average user to do and then too much ongoing maintenance required to keep the services useful.
- The RSS stream outputs a lot of baloney along with the occasional chunk of sirloin. We have coded our own system to manage information on the topics that interest the goose. Most folks don’t want this type of control. After some experience with RSS, my hunch is that many users find them too much work and just abandon them. End users and consumers are not too keen on doing repetitive work that keeps them from kicking back and playing Farmville or keeping track of their friends.
- The volume of information in itself is one part of the problem. The high value content moves around, so plugging into a blog today is no guarantee that the content source will be consistent, on topic, or rich with information tomorrow. We have learned that lack of follow through by content creators is an issue. Publishers know how to make content. Dabblers don’t. The problem is that publishers can’t generate big money so their enthusiasm seems to come and go. Individuals are just individuals and a sick child can cause a blog writer to find better uses for any available time.