Top Papers in Data Mining: Some Concern about Possibly Flawed Outputs

January 12, 2015

If you are a fan of “knowledge,” you probably follow the information provided by KDNuggets. I read “Research Leaders on Data Science and Big Data Key Trends, Top Papers.” The information is quite interesting. I did note that the article kicked off with this statement:

As for the papers, we found that many researchers were so busy that they did not really have the time to read many papers by others. Of course, top researchers learn about works of others from personal interactions, including conferences and meetings, but we hope that professors have enough students who do read the papers and summarize the important ones for them!

Okay, everyone is really busy.

Among the papers named by the 13 experts cited, I noted two that seemed to call attention to the issue of accuracy. These were:

“Preventing False Discovery in Interactive Data Analysis is Hard,” Moritz Hardt and Jonathan Ullman

“Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images,” Anh Nguyen, Jason Yosinski, Jeff Clune.

A related paper noted in the article is “Intriguing Properties of Neural Networks,” by Christian Szegedy, et al. The KDNuggets comment states:

It found that for every correctly classified image, one can generate an “adversarial”, visually indistinguishable image that will be misclassified. This suggests potential deep flaws in all neural networks, including possibly a human brain.
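To make the “adversarial” idea concrete, here is a minimal Python sketch. It is not the method from the Szegedy paper. It swaps in a simple sign-based nudge against a toy linear classifier, and every name and number in it (the weights, the bias, epsilon, the 100-pixel “image”) is an assumption invented for illustration.

```python
# Toy illustration only: a hand-built linear "classifier" over a 100-pixel image.
# The weights, bias, and epsilon are invented for this sketch.

def score(weights, bias, pixels):
    """Weighted sum of the pixels plus a bias."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias

def predict(weights, bias, pixels):
    """Return class 1 when the score is positive, otherwise class 0."""
    return 1 if score(weights, bias, pixels) > 0 else 0

def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def nudge(weights, bias, pixels, epsilon):
    """Shift every pixel by at most epsilon in the direction that hurts the current label."""
    direction = -1 if predict(weights, bias, pixels) == 1 else 1
    return [p + direction * epsilon * sign(w) for w, p in zip(weights, pixels)]

weights = [0.1 if i % 2 == 0 else -0.1 for i in range(100)]
bias = 0.2
image = [0.5] * 100  # a flat gray "image"
fooled = nudge(weights, bias, image, epsilon=0.03)

largest_change = max(abs(a - b) for a, b in zip(image, fooled))
print("original label:", predict(weights, bias, image))
print("nudged label:  ", predict(weights, bias, fooled))
print("largest per-pixel change:", largest_change)
```

No pixel moves by more than about 0.03 on a 0-to-1 scale, yet the label changes because the small shifts all push the score the same way. Real deep networks are far more complicated, so treat this as a cartoon of the effect, not a reproduction of the paper's result.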

My takeaway is that automation is coming down the pike. Accuracy could get hit by a speeding output.

Stephen E Arnold, January 12, 2015

RapidMiner Cloud Includes Wisdom of the Crowds Operator Recommendations

November 13, 2014

The article on Inside BigData titled RapidMiner Moves Predictive Analytics, Data Mining and Machine Learning into the Cloud promotes RapidMiner Cloud, the recently announced tool for business analysts. The technology allows users to leverage over 300 cloud platforms such as Amazon, Twitter and Dropbox at an affordable price ($39/month). The article quotes RapidMiner CEO Ingo Mierswa, who emphasized the “single click” necessary for users to gain important predictive analytics. The article says,

“RapidMiner understands the unique needs of today’s mobile workforce. RapidMiner Cloud includes connectors to cloud-based data sources that can be used on-premises and in the cloud with seamless transitioning between the two. This allows users to literally process Big Data at anytime and in any place, either working in the cloud or picking up where they left off when back in the office. This feature is especially important for mobile staff and consultants in the field.”

RapidMiner Cloud also contains the recently launched Wisdom of the Crowds Operator Recommendations, which culls insights into the analytics process from the millions of models created by members of the RapidMiner community. The article also suggests that RapidMiner is uniquely capable of integration with open-source solutions; rather than competing with them, the platform is more invested in source-code availability.

Chelsea Kerwin, November 13, 2014

Sponsored by, developer of Augmentext

MarkLogic: Banging a Drum in Hopes of Drowning Out Open Source NoSQL Reggae Beat

October 3, 2014

I read “MarkLogic Positioned as a Leader in NoSQL Document Databases Report by Independent Research Firm.” The research firm is the mid tier outfit Forrester Research Inc. Forrester creates “wave” reports. These are Forrester’s response to the various grids, quadrants, and tables cranked out by Gartner, Ovum, Butler, Kelsey, and a life boat stuffed with consulting firm shakeout survivors. Dated October 2, 2014, the MarkLogic news release will be the first of a half dozen or more issued by companies in this “independent research firm’s” report. The mid tier analyses are crafted so that negatives are swathed in high density, low impact foam like spray-on insulation.

Why not?

Like Heaven’s Gate’s media event, any publicity is good publicity. At least, that’s the public relations mantra. Look at IBM Watson and its BBQ sauce recipe with tamarind. I mention that innovation as frequently as possible.

Well, let me do my part for this report:

The write up asserts:

“MarkLogic offers the most mature and scalable NoSQL document database. Unlike other NoSQL document databases, MarkLogic has been offering a NoSQL solution for more than a decade,” stated Forrester in the report that evaluated select companies against 57 criteria. “MarkLogic has the most comprehensive data management features and functionality to store, process, and access any kind of structured and multi structured data.” Forrester’s evaluation of NoSQL document database vendors scored factors like performance, scalability, integration, security, high availability, workload management and form factor. MarkLogic was cited as a Leader in the evaluation, receiving its highest score in the go-to-market category.

Okay. The news release provides a link so the reader can get a copy of the “independent research firm’s” report. If you want to skip the original document and go to the registration form so you can download the “independent research firm’s” report, follow the link in the news release. In my experience, some follow up by the “leader” MarkLogic may take place.

In my view, content marketing covers these “independent” reports. The idea makes clear that attention is required in order to kindle interest in a product or a service. Now MarkLogic is an Extensible Markup Language data management system. The company has been in business since 2003. The firm has ingested more than $70 million in venture funding. The firm has experienced the same type of revolving door for senior management that other aging start-ups experience; for example, Lucid Imagination (now LucidWorks, which I write as Lucid Works. Really?). MarkLogic, in order to meet stakeholders’ expectations, has to find a growth bull, get it in a corral, and convert the animal to high value revenue.

Several observations:

  1. Proprietary XML systems positioned as NoSQL alternatives have to find a way to convince a prospect that proprietary is a better value than open source. The impact of Hadoop, a variant of Google’s Big Table, is long in the tooth and faces some of its own value challenges.
  2. Companies like Oracle are providing some of their clients with the comfort of a proprietary system with compatibility with open source technology. Thus, some large companies may be reluctant to dismount one old nag and climb on another. IBM also does some anti open source marketing but that’s another story. For some insights, run a query for Watson on the Beyond Search index.
  3. The noise surrounding NoSQL is creating some confusion. This means that firms that are neither big nor small have to find a way to make their size into a positive. Enter content marketing and reports that present a group of companies in a simplified table.
  4. Do the “independent” experts use the products included in a variant of the Boston Consulting Group’s matrix? You know: Install, optimize, customize, and utilize with their own brain, fingers, and eyeballs? My hunch is that none of this “real” experience stuff is germane to cranking out an “independent” report. Just my uninformed opinion, you understand.

If companies require a NoSQL solution, how do those firms select vendors? Based on the research that IDC used to skip Dave Schubmehl to expert status, large companies are more likely to try open source for a new project. Smaller firms often look for brand name software in order to show investors that base technology has a brand name.

Forrester-type firms (Gartner, IDC, Ovum, etc.) generate “independent” reports to inflate the balloon. The French have a delightful verb for this: “se gonfler”. So, nous [MarkLogic] gonflons notre ballon. (If the translation is poor, blame Google, the inventor of Big Table more than a decade ago.)

Stephen E Arnold, October 3, 2014

Taken in by Data Mining and Transfinancial Economics

April 2, 2014

Have you ever heard of transfinancial economics? It is a concept originated by Robert Searle, who writes about the topic and other related concepts on his blog, The Economic Realms. Searle explains that:

“[Transfinancial economics] believes that apart from earned money, new unearned money could be electronically created without serious inflation notably for key climate change/ environmentally sustainable projects, and for high ethical/ social “enterprises.””

It is a possible theory that could be explored, but while investigating Searle’s blog posts and his user profile it comes to light that Searle is either an extremely longwinded person or he is a dummy SEO profile. While trying to study his reasoning for transfinancial economics, we found a blog post that explains how data mining will be important to it.

He then copied the entire Wikipedia entry on data mining. Browsing through his other posts, we found that he has copied other Wikipedia entries among a few original entries. If Searle is a real person, his blog follows a Pat Gunkel-esque writing style. He spins his ideas to connect to each other from his transfinancial economics to improvisation whistling. If you have time, you can work through the entire blog for an analysis of the discipline and how transfinancial economics works. We doubt that Searle will be writing a book on the topic soon.

Whitney Grace, April 02, 2014

Digging for Data Gold

April 1, 2014

Tech Radar has an article that suggests an idea we have never heard before: “How Text Mining Can Help Your Business Dig Gold.” Be mindful that was a sarcastic comment. It is already common knowledge that text mining is an advantageous tool for learning about customers, products, new innovations, market trends, and other patterns. One of big data’s main goals is capturing that information from an organization’s data. The article explains how much data is created in a single minute from text with some interesting facts (2.46 million Facebook posts, wow!).

It suggests understanding the type of knowledge you wish to capture and finding software with a user-friendly dashboard. It ends on this note:

“In summary, you need to listen to what the world is trying to tell you, and the premier technology for doing so is “text mining.” But, you can lean on others to help you use this daunting technology to extract the right conversations and meanings for you.”

The entire article is an overview of what text mining can do and how it is beneficial. It does not go further than basic explanations, nor does it explain how to mine the gold in the data mine. That will require further reading. We suggest a follow up article that explains how text mining can also lead to fool’s gold.

Whitney Grace, April 01, 2014

Funnelback Advocates Big Data Mining

February 9, 2014

It is a new year and, as usual, there are big plans for big data. Instead of looking ahead, however, let’s travel back to the July 4, 2012 Squiz and Funnelback European User Summit. On that day, Ben Pottier gave a talk on “Big Data Mining With Funnelback.” Essentially it is a sales pitch for the company, but it is also a primer on big data and how people use data.

At the beginning of the talk, Pottier mentions a quote from the International Data Corporation:

“The total amount of global data is expected to grow almost 3 zettabytes during 2012.”

That is a lot of ones and zeroes. How much did it grow in 2013 and what is expected for 2014? However much global data grows, Pottier emphasizes that most of Funnelback’s clients have 75,000 documents and, as that number grows, organizations need to address how to manage it. Beyond the basic explanation, Pottier explains that the single biggest issue for big data is finding enterprise content. In the last five minutes, he discusses data mining’s importance and how it can automate work that used to be done manually.

In Pottier’s talk, he explains that search is a vital feature for big data. Ha! Interesting how search is stretched to cover just about any content related function. Maybe instead of big data it should be changed to big search.

Whitney Grace, February 09, 2014

Visual Mining Redesign

January 18, 2014

We are familiar with Visual Mining and its range of dashboard and data visualization software. Currently, Visual Mining has been working on products that help users better understand and analyze actionable business data. Its enterprise software line NetCharts is compatible across all platforms, including mobile and tablets. The company recently released their Winter 2013 Chartline Newsletter.

Along with the usual end of the year greetings and gratitudes, the first item of business the newsletter addresses is the Web site’s redesign.

Among the new features are:

  • “Live Demo: We would like to invite you to take a virtual test drive of our live NetCharts Performance Dashboards (NCPD) demo to see our newly restyled dashboard KPI’s.
  • Blog: Among the new items to explore on our site includes our new blog. This developer driven blog features new content with many different topics including tips and simple tricks to help you build and style your charts and dashboards. Keep coming back for lots more new content that will be added each month.
  • Chart Gallery: We also have a new chart gallery, which features all new examples with many different kinds of chart types to demonstrate some of the countless possibilities. We also added new chart type categories such as Alerting Charts and Showcase Charts. The Alerting Charts include different chart types that use alert zones while the Showcase category features chart examples with new and unusual styling approaches to demonstrate the flexibility of our charts.”

We have to wonder if the redesign came from a lack of Web traffic. Most Web sites are losing traffic, among them those of content processing vendors. Does Visual Mining hope to generate more sales and traffic with its new look? We hope so.

Whitney Grace, January 18, 2014

Free Data Mining Book

January 7, 2014

We enjoy telling you about free resources, and here’s another one: Mining of Massive Datasets from Cambridge University Press. You can download the book without charge at the above link, or you can purchase a discounted hardcopy here, if you prefer. The book was developed by Anand Rajaraman and Jeff Ullman for their Stanford course unsurprisingly titled “Web Mining.” The material focuses on working with very large data sets and emphasizes an algorithmic approach.

The description reminds us:

“By agreement with the publisher, you can still download it free from this page. Cambridge Press does, however, retain copyright on the work, and we expect that you will obtain their permission and acknowledge our authorship if you republish parts or all of it. We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3.”

Nice plug there at the end. If you’re looking for more info on working with monster datasets, check out this resource—the price is right.
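For the curious, the book's Chapter 3 deals with finding similar items, and one of its building blocks is comparing sets of character “shingles” with Jaccard similarity. Below is a minimal Python sketch of that idea, not the authors' code. The function names, the shingle length of 5, and the sample sentences are assumptions made up for illustration, and the scaling techniques needed for truly massive collections are left out.

```python
# Sketch of copy detection with character shingles and Jaccard similarity.
# Function names, k=5, and the sample sentences are illustrative assumptions.

def shingles(text, k=5):
    """Return the set of all k-character substrings of the normalized text."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}

def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

original = "Data mining is the process of discovering patterns in large data sets."
lifted = "Data mining is the process of discovering patterns in very large data sets."
unrelated = "The quick brown fox jumps over the lazy dog near the river bank."

copy_score = jaccard(shingles(original), shingles(lifted))
other_score = jaccard(shingles(original), shingles(unrelated))
print("original vs lifted:   ", round(copy_score, 2))
print("original vs unrelated:", round(other_score, 2))
```

A lightly edited copy scores close to 1, while unrelated text scores close to 0. Where to draw the line between “copied” and “coincidence” is a judgment call that depends on the documents involved.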

Cynthia Murrell, January 07, 2014

Palantir’s Growth Continues Following the 2011 Move to Australia

November 11, 2013

The article titled The Rise and Rise of Palantir and Its Deep Domain Knowledge on Crikey follows the move of Palantir Technologies, a datamining company with a 2 million dollar investment from the CIA, to Canberra, Australia. Palantir has seen its fair share of press, good and bad, but ever since Anonymous hacked their system and discovered their plan to destroy WikiLeaks’ credibility in 2011, the adjective “ruthless” seems appropriate. The company, founded in 2002, moved to Australia in 2011 and has seen enormous success. The article explains,

“The Department of Defence began using some of its software in 2011 via third-party providers, but this year has seen the company grow rapidly… Top-flight lobbying firm Government Relations Australia was hired to represent them in Canberra and state capitals. In the last few weeks, the company has secured multi-year contracts with the Department of Defence’s Intelligence and Security branch worth nearly $2 million, all secured via limited tender…Those of course are the contracts we know about.”

The article speculates that Palantir is being utilized by the Australian government given the proven effectiveness of datamining for national security. While the ACLU believes they pose a massive threat to the privacy of civilians, governments continue to invest in cybersecurity companies.

Chelsea Kerwin, November 11, 2013

There Is Much to be Learned about Visual Data Mining

October 6, 2013

What is visual data mining? I know that data mining involves searching through data with a computer program in search of specific information. I am guessing that visual data mining includes the same aspect except it presents the data using various patterns. Am I right? Am I dead wrong? I do not know, but I do know the way to find the answer is to read Visual Data Mining: Theory by Arturas Mazeika, Michael H. Bohlen, and Simeon Simoff.

Here is the item description from Amazon:

“The importance of visual data mining, as a strong sub-discipline of data mining, had already been recognized in the beginning of the decade. In 2005 a panel of renowned individuals met to address the shortcomings and drawbacks of the current state of visual information processing. The need for a systematic and methodological development of visual analytics was detected. This book aims at addressing this need. Through a collection of 21 contributions selected from more than 46 submissions, it offers a systematic presentation of the state of the art in the field. The volume is structured in three parts on theory and methodologies, techniques, and tools and applications.”

This book usually retails for a whopping $99.00 or $63.91 with the Amazon discount. It is still a hefty chunk of change for a 163 page book, which is why we are pleased to say if you are a member of ISBN Book Funder or then it is available to you for free. Other books are free for members. If that does not appeal to you, check out your local academic library.

Whitney Grace, October 06, 2013
