February 12, 2014
The article How To Do Predictive Analytics with Limited Data from Datameer on Slideshare suggests that Limited Data may replace Big Data in importance. It presents “semi-supervised learning” as a way to handle the difficulties of building predictions on limited data, such as expense, manageability, and simply missing key data. The overview states,
“As it turns out, recent research on machine learning techniques has found a way to deal effectively with such situations with a technique called semi-supervised learning. These techniques are often able to leverage the vast amount of related, but unlabeled data to generate accurate models. In this talk, we will give an overview of the most common techniques including co-training regularization. We first explain the principles and underlying assumptions of semi-supervised learning and then show how to implement such methods with Hadoop.”
The presentation summarizes possible approaches to semi-supervised learning and the assumptions it is possible to make about unlabeled data (these include the clustering, low-density, and manifold assumptions). It also covers the concepts of Label Propagation and Nearest Neighbor Join. However, as inviting as it is to forget Big Data and switch to predictive analytics with Limited Data, the suggestion may sound too much like Bayes-Laplace.
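To make the Label Propagation idea concrete, here is a minimal sketch in plain Python. It is not taken from the Datameer deck and does not use Hadoop; the function name `label_propagation`, the Gaussian similarity, the `sigma` parameter, and the toy points are all assumptions chosen for illustration. A handful of labeled points keep their labels, and each unlabeled point repeatedly takes the similarity-weighted average of its neighbors' label scores.

```python
import math


def label_propagation(points, labels, sigma=1.0, iterations=50):
    """Spread a few known labels to unlabeled points (illustrative sketch).

    points: list of numeric tuples.
    labels: dict mapping point index -> class, for the labeled points only.
    """
    classes = sorted(set(labels.values()))
    n = len(points)

    def weight(a, b):
        # Gaussian similarity: nearby points influence each other more.
        d2 = sum((p - q) ** 2 for p, q in zip(a, b))
        return math.exp(-d2 / (2 * sigma ** 2))

    W = [[0.0 if i == j else weight(points[i], points[j]) for j in range(n)]
         for i in range(n)]

    # One score per class per point; labeled points start at 1.0 for their class.
    scores = [[1.0 if labels.get(i) == c else 0.0 for c in classes]
              for i in range(n)]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in labels:
                updated.append(scores[i])  # clamp the known labels
                continue
            total = sum(W[i])
            updated.append([
                sum(W[i][j] * scores[j][k] for j in range(n)) / total
                for k in range(len(classes))
            ])
        scores = updated

    # Each point gets the class with the highest score.
    return [classes[max(range(len(classes)), key=lambda k: scores[i][k])]
            for i in range(n)]


# Toy data: two tight groups, with only one labeled point in each.
points = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.8), (5.0, 5.0), (5.4, 4.8), (4.7, 5.3)]
known = {0: "A", 3: "B"}
predicted = label_propagation(points, known)
print(predicted)
```

The sketch leans on the clustering assumption mentioned above: points that sit close together are presumed to share a label, which is why two labeled examples are enough for this toy data.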
Chelsea Kerwin, February 12, 2014
February 11, 2014
The article on PRNewswire titled Attivio and Quant5 Partner to Bring Fast and Reliable Predictive Customer Analytics to the Cloud explains the partnership between the two analytics innovators. Aimed at producing information from data without the hassle of a team of data scientists, the partnership promises to effectively create insights that companies will be able to act on. The partnership responds to the growing frustration some companies face with gleaning useful information from huge amounts of data. The article explains,
“Attivio built its business around the core principle that integrating big data and big content should not require expensive mainframe legacy systems, handcuffing service agreements, years of integration and expensive data scientists. Attivio enterprise customers experience business-changing efficiency, sales and competitive results within 90 days. Similarly, Quant5 arose from the understanding that businesses need simple, elegant solutions to address difficult and complex marketing challenges. Quant5 customers experience increased revenues, reduced customer churn and an affordable and fast path to predictive analytics.”
The possibility of indirect sales following in the footsteps of Autonomy and Endeca does seem to be a part of the 2014 tactics. The Attivio-Quant5, Inc. solutions are offered in five major areas of concern: Lead & Opportunity Scoring, Customer Segmentation, Targeted Offers, Product Usage and Product Relationships.
Chelsea Kerwin, February 11, 2014
February 7, 2014
What do you make of this headline from All Analytics: “Text And The City: Municipalities Discover Text Analytics”? Businesses have been using text mining software for a while and understand the insights it can deliver to business decisions. The same goes for law firms that must wade through piles of litigation. Are governments really only catching onto text mining software now?
The article reports on several examples where municipal governments have employed text mining and analytics. Law enforcement agencies are using it to identify key concepts to deliver quick information to officials. The 311 systems, known as the source of local information and immediate contact with services, are another area that can benefit from text analytics, because the software can organize and process the information faster and more consistently.
There are many ways text analytics can be helpful to local governments:
“Identifying root causes is a unique value proposition for text analytics in government. It’s one thing to know something happened — a crime, a missed garbage collection, a school expulsion — and another to understand where the problem started. Conventional data often lacks clues about causes, but text reveals a lot.”
The bigger question is whether local governments will spend the money on these systems. Perhaps, but analytic software is expensive and governments are pressured to find low-cost solutions. Expertise and money are in short supply on this issue.
Whitney Grace, February 07, 2014
February 6, 2014
Tucked in “The Morning Ledger: Companies Seek Help Putting Big Data to Work” was a quote attributed to SAP, a vendor of enterprise software. The quote:
David Ginsberg, chief data scientist at SAP, said communication skills are critically important in the field, and that a key player on his big-data team is a “guy who can translate Ph.D. to English. Those are the hardest people to find.”
I have been working through patent documents from some interesting companies involved in Big Data. The math is somewhat repetitive, but the combination of numerical ingredients makes the “invention,” it seems.
One common thread runs through the information I have reviewed in preparation for my lectures in Dubai in early March 2014. Fancy software needs humans to:
- Verify the transforms are within acceptable limits
- Configure thresholds
- Specify outputs, often using old-fashioned methods like SQL and Boolean
- Figure out what the outputs “mean”
With search and content processing vendors asserting that their systems make it easy for end users to tap the power of Big Data, I have some doubts. With most “analysts” working in Excel, a leap to the types of systems disclosed in open source patent documents will be at the outer edge of end users’ current skills.
Big Data requires much of skilled humans. When there are too few human Big Data experts, Big Data may not deliver much, if any, value to those looking for a silver bullet for their business.
Stephen E Arnold, February 6, 2014
January 26, 2014
The article titled Business Intelligence Usage Evolving Subtly on Smart Data Collective makes clear that business intelligence and analytics are still developing. The article assumes that the 2013 trend in cloud computing popularity will continue into 2014.
Looking further ahead, the article states:
“There could soon be a whole new BI paradigm, in which many affordable analysis processes are created at once, rather than devoting the whole budget to one effort. Enterprise Apps Today explained that this is another natural role for the cloud, with good projects surviving and poor options falling by the wayside, all without the effort or funding that would be necessary to accomplish the same on-site.”
The article cites a MarketsandMarkets survey that concluded that BI would be found useful in many sectors. More specifically, “the source indicated that the technology will grow at a rate of 8.3 percent through 2018.” That would mean a value of $20.8 billion in 2018, up from the current worth of $13.9 billion. However, others are less optimistic, believing the slow evolution of business intelligence may be too snail-like, since business intelligence is currently meeting sales resistance in France, as we reported in the article Business Intelligence: Free Pressure for Fee Solutions. Perhaps subtle is not enough?
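The growth figures above can be sanity-checked with compound-growth arithmetic. The sketch below assumes the $13.9 billion figure refers to 2013 and that the 8.3 percent rate compounds annually over five years to 2018; neither assumption is stated in the article, and the function names are hypothetical.

```python
def project(value, annual_rate, years):
    """Compound a starting value forward at a fixed annual rate."""
    return value * (1 + annual_rate) ** years


def implied_rate(start, end, years):
    """Annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


# Assumption: five compounding years, 2013 through 2018.
projected = project(13.9, 0.083, 5)
rate = implied_rate(13.9, 20.8, 5)

print(f"Projected 2018 value: ${projected:.1f} billion")
print(f"Rate implied by $13.9B -> $20.8B: {rate:.1%}")
```

Under those assumptions the projection lands at roughly $20.7 billion and the implied rate at roughly 8.4 percent, so the quoted numbers are consistent with each other to within rounding.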
Chelsea Kerwin, January 26, 2014
January 23, 2014
Do equations sell? Some color:
I know that I received negative feedback when I described the mathematical procedures used for Google’s semantic search inventions. I receive presentations and links to presentations frequently. Few of these contain mathematical expressions. In my forthcoming no-cost discussion of Autonomy from 1996 to 2007, I include one equation. I learned my lesson. Today’s search and content processing truth seekers want training wheels, not higher level math. I find this interesting because as systems become easier to use, the fancy math becomes more important.
Anyway, imagine my surprise when I received a link to a company founded 14 years ago. The outfit does business as Digital Reasoning, and it competes with Palantir (a segment superstar), IBM i2 (the industry leader for relationship analysis), and Recorded Future (backed, in part, by the Google). Dozens of other companies chase revenues in this content processing sector. Today’s New York Times includes a content marketing home run by an outfit called YarcData. You can find this op ed piece by Tim White on page A 23 of the dead tree version of the paper I received this morning (January 23, 2014). Now that’s a search engine optimization Pandas and the Times’s demographic can love.
To the presentation. My link points to Paragon Science at http://slidesha.re/1jpXAGd. I was logged in automatically, so you may have to register to flip through the slide deck.
Navigate to slides 33 and following. Slides 1 to 32 review how text has been parsed for decades. The snappy stuff kicks in on slide 33. There are some incomprehensible graphics. These Hollywood-style data visualizations are colorful. I, unlike the 20-somethings who devour this approach to information, have a tough time figuring out what I am supposed to glean.
At slide 42, I am introduced to “dynamic cluster analysis.” The approach echoes the methods developed by Dr. Ron Sacks-Davis in the late 1970s and embedded in some of the routines of the 1980 system that a decade later became better known as InQuirion and then TeraText.
At slide 44, the fun begins. Here’s an example which I am sure you will recall from your class in chaos mathematics. If you can’t locate your class notes, you can get a refresher at http://bit.ly/1mKR3G9 courtesy of Cal Tech, home of the easy math classes as I learned during my stint at Halliburton Nuclear Utility Services. The tough math classes were taught at MIT, the outfit that broke new ground in industry sponsored educational methods.
January 17, 2014
The article on Lexalytics Blog titled Tagging, Taxonomies, Categorization with Salience provides a guide to using Salience to get the most out of data. The first step, Discovery, involves features like Themes, which extracts proper noun phrases to give a summary of what the content contains. Step 2 uses Concept Topics, which use an ontology built from Wikipedia’s semantic knowledge to relate one word to another.
The article explains how this works:
“Salience will use the relationship between the category samples to tag your data. So every time the word “lion” pops up in your data, that entry will be categorized as “cats”. Every time the word “cheetah” appears, salience will know that this animal belongs to the cat family, and will tag the document as “cats”. This method of categorization is awesome because you do not need to list every single member of the cat family to create this category.”
Step 3 is another way of classifying data: creating a query topic. You input all words associated with your topic after consulting Wikipedia and a thesaurus, then limit the search with more information, including how close one word must be to another for it to be relevant.
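The query topic idea, a word list narrowed by a proximity rule, can be sketched generically in Python. This is not Salience's query syntax or API; `tokenize`, `within`, `matches_topic`, and the sample sentences are hypothetical names and data used only to show the logic.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(tokens, first, second, max_distance):
    """True if `first` and `second` occur at most `max_distance` tokens apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == first]
    pos_b = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(a - b) <= max_distance for a in pos_a for b in pos_b)


def matches_topic(text, any_terms, near=None):
    """Match if any topic term appears and the optional proximity rule holds.

    near: optional (term_a, term_b, max_distance) tuple that limits the match.
    """
    tokens = tokenize(text)
    if not any(term in tokens for term in any_terms):
        return False
    if near is None:
        return True
    term_a, term_b, max_distance = near
    return within(tokens, term_a, term_b, max_distance)


# Hypothetical "cats" query topic: "jaguar" counts only when it is near "jungle".
rule = ("jaguar", "jungle", 4)
animal = matches_topic("The jaguar slipped through the jungle at dusk", ["jaguar"], near=rule)
car = matches_topic("The new Jaguar has a quieter engine", ["jaguar"], near=rule)
print(animal, car)
```

The proximity rule is what keeps the car out of the “cats” category here: the keyword alone would match both sentences.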
Chelsea Kerwin, January 17, 2014
January 12, 2014
Conceptual search allows users to search by concepts and ideas within information rather than basic keywords and phrases. Great idea, except that conceptual search has been around since 1999. HP is touting it as an entirely brand new idea in the article, “Analytics For Human Information: Optimize Information Categorization With HP IDOL” posted on its own Web site. Rather than break directly into the “new” conceptual search, we are given the even better glittery term “categorization.” HP IDOL, using ExploreCloud, an SaaS solution for analytics and insights, offers an auto-categorization feature marketed as a time saver and productivity tool.
HP describes it as a magic tool:
“Powered by HP IDOL, ExploreCloud helps you uncover insights across all channels: web, mobile, social media, email, contact center, database, and storefront, so that you can organize and quantify content in a consistent, objective manner, resulting in data that is more accessible and consistent. And you can maintain existing legacy taxonomies and/or enrich them with contextual understanding. When you go beyond the limitations of what keywords can help you do, your whole world opens up. You can also discover the “unknown unknowns,” or topics you did not know to look for in the first place.”
The article notes that regular keyword searching is far from abandoned, but it dwells on keyword search’s limitations to the point of stating the obvious, and then it turns into a sales pitch for HP IDOL. Little is said about what exactly HP IDOL can do, other than organize data. HP, please tell us something we do not know.
Whitney Grace, January 12, 2014
January 10, 2014
Traveling around the Cape of Good Hope can be tricky business, but Connexica, a software company based in Staffordshire, England, plans on opening an office in South Africa. According to Midlands Business News in the article “Staffordshire Based Connexica Expand Into South Africa,” the business move is the result of a strategic partnership with Allard Verster Group.
Allard Verster Group specializes in business consulting and solutions, and the partnership between the two companies will give Allard the ability to sell Connexica’s Business Analytics Software CXAIR in South Africa. CXAIR gives its users high-speed access to data with interactive diagrams.
Allard Verster Group already has an established client base in several sectors, including mining, manufacturing, healthcare, insurance, and local government. The partnership will allow Connexica to reach a new range of clientele. Both companies are excited about the venture and the new opportunities it presents:
“Head of Business Development Greg Richards says of the agreement ‘We are delighted to announce this partnership with the Allard Verster Group and I am particularly excited about CXAIR moving into a new territory and see real opportunity for CXAIR within the South African and wider African market.’
Craig Verster, Executive Director at Allard Verster Group commented: ‘Our partnership with Connexica significantly enhances our ability to deliver powerful search, business Intelligence and data analysis’ productivity solutions and services to business users. It validates our strategy to co-innovate with our partners to deliver measurable value to our clients.’ ”
Good news for Connexica and Allard Verster Group. Strategic partnerships are one of the best ways to drum up new business as well as expand a product’s market reach.
Whitney Grace, January 10, 2014
January 9, 2014
Right now, Datameer is happily positioned at the intersection of preparation and opportunity, we learn from “Datameer Picks Up $19M to Help Companies Do Analytics Along with Hadoop” at VentureBeat. The use of Hadoop has been soaring, and Datameer is perfectly poised to rise with it. As more companies implement the open-source framework for distributed storage and processing, Datameer is seeing more demand for its help making sense of it all. It doesn’t hurt that the data-analysis firm built its solutions with Hadoop in mind from the start; any IT professional knows that can mean the difference between headache-free implementation and long hours trying to force applications to play well together.
Investors have taken notice of Datameer’s advantages. Writer Jordan Novet relates:
“‘You’re actually seeing Datameer being purchased almost at the same time as Hadoop itself, at the same time as the distribution,’ Ben Fu, a partner at Next World Capital, said in an interview with VentureBeat. Next World led the latest round of funding for the company, bringing its total funding to $36.8 million. Datameer’s large contracts from customers such as British Telecom, Sears, and Visa, also made the company interesting, Fu said….
Next World Capital’s Fu is joining Datameer’s board. Alongside Next World, Kleiner Perkins Caufield & Byers and Redpoint Ventures also joined the round. The new money will provide Datameer with the firepower to sign up new customers, especially in Europe, where Next World has a program to put startups in touch with executives at enterprises from around the continent.”
Novet notes the funding can also allow Datameer to take advantage of further Hadoop advances, as well as respond to competition. Datameer was founded in 2009 by some of the original Hadoop contributors. Headquartered in San Mateo, California, the company also has offices in New York City and in Halle, Germany. In related and possibly helpful news, Datameer is hiring for several positions as of this writing.
Cynthia Murrell, January 09, 2014