Hewlett Packard Autonomy Pushes into Hungary

October 28, 2014

I read “Stratis Partners Up with HP Autonomy.” Stratis is a services firm with about 60 employees. According to the article:

Hungarian Stratis Vezetői és Informatikai Tanácsadó Kft. entered a partnership with Hewlett Packard Autonomy, thereby becoming part of the Big Data market, Stratis announced…

IDOL and the Dynamic Reasoning Engine can “do” Big Data, but the core system does information retrieval. There are other approaches to Big Data that use more modern technologies.

Some content processing vendors are showing more interest in what I call the Eastern European market. With HP looking to sell some of its China assets, the shift to Europe may be one way of growing revenues.

HP, like IBM, has its hands full. Forget the legal hassles; both companies are trying to get out of the buggy whip business, to reference Theodore Levitt’s famous “Marketing Myopia” case.

The problem is that newfangled businesses run through the old-school IBM- and HP-type business model will produce revenue. Unfortunately that revenue will be less lucrative than the money made on mainframes and scientific equipment in the good old days.

HP will need to find dozens of Hungarian-type deals to allow Autonomy to pay back its new owner.

Stephen E Arnold, October 28, 2014

Big Data Defined 43 Ways

October 21, 2014

A happy quack to the reader who alerted us to “What Is Big Data?” The write up consists of 43 definitions provided by luminaries in a variety of fields. If you are in search of enlightenment with regard to Big Data, navigate to the story and dig in.

I found a couple of definitions interesting. Let me highlight Daniel Gillick’s and Hal Varian’s. Both are hooked up with Google, one of the big-time Big Data outfits.

Mr. Gillick says:

Historically, most decisions — political, military, business, and personal — have been made by brains [that] have unpredictable logic and operate on subjective experiential evidence. “Big data” represents a cultural shift in which more and more decisions are made by algorithms with transparent logic, operating on documented immutable evidence. I think “big” refers more to the pervasive nature of this change than to any particular amount of data.

Mr. Varian says:

Big data means data that cannot fit easily into a standard relational database.

There you have it: A cultural shift and anything that won’t fit in a Codd-style data management system. Are the other 41 definitions superfluous?

Stephen E Arnold, October 21, 2014

Report Predicts Big Data Growth

October 21, 2014

Here’s another prediction on the future of Big Data. WhaTech calls our attention to a recent report from ReportsnReports in “Explore Global Big Data Market that Will Grow at a CAGR of 34.17% by 2018.” Those on the hook to venture firms looking for Big Data payoffs hope the estimate is on the low side. Keep in mind, though, that this figure comes from a wild and crazy consulting firm report. The press release tells us:

“Global Big Data Market 2014-2018, has been prepared based on an in-depth market analysis with inputs from industry experts. The report covers the Americas, and the EMEA and APAC regions; it also covers the Global Big Data market landscape and its growth prospects in the coming years. The report also includes a discussion of the key vendors operating in this market.”
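For the curious, CAGR arithmetic is simple enough to check at home. Here is a minimal Python sketch of what 34.17% compounding implies, assuming five years of growth and a made-up base-year figure, since the press release supplies neither:

    # A back-of-the-envelope check on the 34.17% CAGR claim. The base-year
    # market size and the five-year compounding window are assumptions;
    # the press release supplies neither.
    def compound(base: float, cagr: float, years: int) -> float:
        """Project a value forward at a constant annual growth rate."""
        return base * (1 + cagr) ** years

    base = 10.0    # hypothetical 2013 market size, in billions of dollars
    cagr = 0.3417  # the figure from the report title

    for year in range(1, 6):
        print(f"{2013 + year}: ${compound(base, cagr, year):.1f}B")

Five years at that rate multiplies the base by about 4.3. Whether the underlying market cooperates is another matter.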

See the write-up for a list of vendors mentioned in the report; that much we can get for free. The post goes on to list the “key questions” addressed by the $2500 report:

“What will the market size be in 2018 and what will the growth rate be?

What are the key market trends?

What is driving this market?

What are the challenges to market growth?

Who are the key vendors in this market space?

What are the market opportunities and threats faced by the key vendors?

What are the strengths and weaknesses of the key vendors?”

Good questions, all.

Cynthia Murrell, October 21, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Big Data Myths Debunked by Gartner Research

October 14, 2014

The article titled “The Truth About Big Data,” posted on Datamation September 26, 2014, debunks some of the myths surrounding big data. Gartner, the tech research firm, has collected data on the plans of organizations for big data. The most hopeful information may be for businesses that have yet to hop on the big data bandwagon. This may sound like old news, but Gartner’s analysis of its findings leads to its claim that the market for big data solutions “is in its infancy.” The article states,

“Seventy-three percent of organizations surveyed by the research group said that they are investing or plan to invest in big data technologies. Yet, only 13 percent said that they had deployed related solutions. Big data projects are stalling out in the planning stage, Gartner discovered. “The biggest challenges that organizations face are to determine how to obtain value from big data, and how to decide where to start,” said the firm in a statement… Gartner recommends that organizations sweat the small stuff.”

In other words, the idea that individual flaws in data will have less impact at big data scale is wrongheaded. More data means more flaws, so keeping a close eye on data quality remains important. Companies need to clear away these misconceptions and others mentioned in the article in order to get the most bang for their big data bucks.

Chelsea Kerwin, October 14, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Buzzword of the Day: Dark Data

October 3, 2014

Big Data. Biggish Data. Now dark data. The idea plays on the silliness of the dark Web; that is, it is information that is “there,” but you don’t know about it. Well, get with it, pilgrim. Datameer used this term in “Shine Light on Dark Data.”

Here’s the definition:

At every organization neglected data sits overlooked in log files and archives accumulating digital dust and incurring costs. But as more organizations look for ways to become better, stronger and faster, they’re digging into this “dark” data and uncovering a gold mine of business intelligence.

Now how do you shine light on dark data? Great question. I will not probe the logical aspects of this concept. There are, according to the article, five steps to take. These are—unsurprisingly—the same steps a prudent and informed manager takes to figure out just plain old data.
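If one wanted to skip the five steps and just start rummaging, a file inventory is the obvious opening move. A minimal sketch, assuming “dark” means files untouched for a year; the cutoff and the starting path are illustrative assumptions, not anything Datameer prescribes:

    # A minimal "dark data" inventory: walk a directory tree and flag files
    # not accessed in a year. The one-year cutoff and the starting path are
    # illustrative assumptions.
    import os
    import time

    STALE_SECONDS = 365 * 24 * 3600  # one year

    def find_dark_data(root):
        """Yield (path, size in bytes) for files untouched for a year."""
        now = time.time()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    info = os.stat(path)
                except OSError:
                    continue  # unreadable file; skip it
                if now - info.st_atime > STALE_SECONDS:
                    yield path, info.st_size

    total = sum(size for _path, size in find_dark_data("/var/log"))
    print(f"Digital dust found: {total / 1e9:.2f} GB")

Whether what turns up is a gold mine of business intelligence or just digital dust is, of course, the part no script can answer.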

Words make all the difference to marketers. I am not sure data has an opinion.

Stephen E Arnold, October 3, 2014

Among More Changes, Connotate Adds New Leader

September 30, 2014

Connotate has been going through many changes in 2014. According to Virtual Strategy, it can count adding a new leader to the list: “Connotate Appoints Rich Kennelly As Chief Executive.” Connotate sells big data technology, specializing in enterprise-grade Web data harvesting services. The newest leader for the company is Richard J. Kennelly. Kennelly has worked in the IT sector for over twenty years. Most of his experience has been helping developing businesses harness the Internet and data. He has worked at Ipswitch and Akamai Technologies, holding leadership roles at both companies.

Kennelly is excited about his new position:

“ ‘This is the perfect time to join Connotate,’ said Kennelly. ‘The Web is the largest data source ever created.  The biggest brands are moving quickly to leverage that data to drive competitive advantage and create new revenue streams. Connotate’s patented technology, scalability, and deep technical expertise make us the natural choice for these forward thinking companies.’”

The rest of the quote includes a small, but impressive client list, more praise for Kennelly, and how Connotate is a leading big data company.

If Connotate did not have good products and services, then they would not keep their clients. Despite the big names, they are still going through financial woes. Is choosing Kennelly a sign that they are trying to harvest more funding?

Whitney Grace, September 30, 2014
Sponsored by ArnoldIT.com, developer of Augmentext

Feds Warned to Sweat the Small Stuff When Considering Big Data Solutions

September 15, 2014

Say, here’s a thought: After spending billions for big-data software, federal managers are being advised to do their research before investing in solutions. We learn about this nugget of wisdom from Executive Gov in their piece, “Report: Fed Managers Should Ask Data Questions, Determine Quality/Impact Before Investing in Tech.” Writer Abba Forrester sums up the Federal Times report:

“Rutrell Yasin writes that the above managers should follow three steps as they seek to compress the high volume of data their agencies encounter in daily tasks and to derive value from them. According to Shawn Kingsberry, chief information officer for the Recovery Accountability and Transparency Board, federal managers should first determine the questions they need to ask of data then create a profile for the customer or target audience.

“Next, they should locate the data and their sources then correspond with those sources to determine quality of data, the report said. ‘Managers need to know if the data is in a federal system of records that gives the agency terms of use or is it public data,’ writes Yasin.

“Finally, they should consider the potential impact of the data, the insights and resulting technology investments on the agency.”

For any managers new to data management, the article notes they should choose a platform that includes data analysis tools and compiles data from multiple sources into one repository. It also advises agencies to employ a dedicated chief data officer and data scientists/architects. Good suggestions, all. Apparently, agencies need to be told that a cursory or haphazard approach to data is almost certain to require more time, effort, and expense down the line.

Cynthia Murrell, September 15, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Questions about Statistical Data

September 4, 2014

Autonomy, Recommind, and dozens of other search and content processing firms rely on statistical procedures. Anyone who has survived Statistics 101 believes in the power of numbers. Textbook examples are—well—pat. The numbers work out even for B and C students.

The real world, on the other hand, is different. What was formulaic in the textbook exercises is more difficult with most data sets. The data are incomplete, inconsistent, generated by systems whose integrity is unknown, and often wrong. Human carelessness, the lack of time, a lack of expertise, and plain vanilla cluelessness make those nifty data sets squishier than a memory foam pillow.

If you have some questions about statistical evidence in today’s go go world, check out “I Disagree with Alan Turing and Daniel Kahneman Regarding the Strength of Statistical Evidence.”

I noted this passage:

It’s good to have an open mind. When a striking result appears in the dataset, it’s possible that this result does not represent an enduring truth or even a pattern in the general population but rather is just an artifact of a particular small and noisy dataset. One frustration I’ve had in recent discussions regarding controversial research is the seeming unwillingness of researchers to entertain the possibility that their published findings are just noise.
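The “just noise” point is easy to demonstrate. A minimal simulation, my own and not from the cited post: draw two completely unrelated variables for a small sample, repeat, and count how often a “striking” correlation appears by chance. The sample size and the 0.4 threshold are arbitrary choices:

    # Two unrelated variables, a small sample, many trials: how often does
    # pure noise look "striking"? Sample size and threshold are arbitrary.
    import random
    import statistics

    def correlation(x, y):
        """Pearson correlation of two equal-length sequences."""
        mx, my = statistics.mean(x), statistics.mean(y)
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = sum((a - mx) ** 2 for a in x) ** 0.5
        sy = sum((b - my) ** 2 for b in y) ** 0.5
        return cov / (sx * sy)

    random.seed(2014)
    n, trials, striking = 20, 1000, 0
    for _ in range(trials):
        x = [random.gauss(0, 1) for _ in range(n)]
        y = [random.gauss(0, 1) for _ in range(n)]  # no relationship at all
        if abs(correlation(x, y)) > 0.4:
            striking += 1
    print(f"{striking} of {trials} pure-noise samples look striking")

Run it and dozens of the thousand noise-only samples clear the bar. Publish one of those and you have an “enduring truth” that is nothing of the kind.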

An open mind is important. Just looking at the outputs of zippy systems that do prediction for various entities can be instructive. In the last couple of months, I learned that predictive systems:

  • Failed to size the Ebola outbreak by orders of magnitude
  • Did not provide reliable outputs for analysts trying to figure out where a crashed airplane was
  • Came up short regarding resources available to ISIS.

The Big Data revolution is one of those hoped-for events. The idea is that Big Data will allow content processing vendors to sell big-buck solutions. Another is that massive flows of unstructured content can only be tapped in a meaningful way with expensive information retrieval solutions.

Dreams, hopes, wishes—yep, all valid for children waiting for the tooth fairy. The real world has slightly more bumps and sharp places.

Stephen E Arnold, September 4, 2014

I Thought Big Data Were Already Relevant

September 4, 2014

Here is an article that makes you question the past two years: from the Federal Times comes “Steps To Make Big Data Relevant,” published in August 2014. For the past two years, big data has been the go-to term for technology and information professionals. IT companies have sold software meant to harness big data’s potential and generate revenue. So why is there an article explaining how to make it relevant now? It uses the federal government as an example, and any bureaucrat can tell you government implementation is slow.

If, however, you do not even know what big data is and you want to get started, this article explains it in basic terms. It has three steps people need to think about to develop a big data plan:

  1. Determine what questions need to be asked of the data.
  2. Determine where all of the data you want is located and ask the data owners to understand the data’s quality.
  3. Decide what it means to answer these questions and use technology to help answer them.

Then the last suggestion is to have a dedicated team to manage big data:

“To address that challenge, federal agencies need a chief data officer and data architects or scientists. The chief data officer would keep the chief information officer and chief information security officer better informed about the value of their information and how to interact with that information to make it useful. Chief data architects/scientists are needed to design the data infrastructure and quantify the value of the data at its lowest common elements.”

When you read over the steps, you will see they are an implementation plan for any information technology software: what do you want to do, figure out how to do it, make a plan to implement it. Big data is complex, but the steps governing it are not.

Whitney Grace, September 04, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Big Data Should Mean Big Quality

September 2, 2014

Why does logic seem to fail in the face of fancy jargon? DataFusion’s blog takes on the jargon fallacy in the post “It All Begins With Data Quality.” The post explains how, amid new terms like big data, real-time analytics, and self-service business intelligence, the basic fundamentals that make the technology work are forgotten. Cleansing, data capture, and governance form the foundation for data quality. Without data quality, big data software is useless. According to a recent Aberdeen Group study, data quality was ranked as the most important data management function.

Data quality also leads to other benefits:

“When examining organizations that have invested in improving their data, Aberdeen’s research shows that data quality tools do in fact deliver quantifiable improvements. There is also an additional benefit: employees spend far less time searching for data and fixing errors. Data quality solutions provided an average improvement of 15% more records that were complete and 20% more records that were accurate and reliable. Furthermore, organizations without data quality tools reported twice the number of significant errors within their records; 22% of their records had these errors.”

Data quality saves man-hours, uncovers hidden errors, and deletes duplicate records. The Aberdeen Group’s study also revealed that poor data quality is a top concern. Organizations should deploy a data quality tool so they too can take advantage of its many benefits. It is a logical choice.
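None of this requires exotic technology. Here is a minimal sketch of the two most basic checks a data quality tool performs, completeness and duplicate detection, run on made-up records; the field names are illustrative:

    # Two basic data quality checks: record completeness and duplicate
    # detection. The records and field names are made-up illustrations.
    records = [
        {"id": 1, "name": "Acme Corp", "email": "info@acme.example"},
        {"id": 2, "name": "Globex", "email": None},  # incomplete record
        {"id": 3, "name": "Acme Corp", "email": "info@acme.example"},  # duplicate
    ]
    required = ("name", "email")

    complete = [r for r in records if all(r.get(f) for f in required)]
    print(f"Complete: {len(complete)} of {len(records)} records")

    seen, dupes = set(), []
    for r in records:
        key = (r["name"], r["email"])  # naive duplicate key
        if key in seen:
            dupes.append(r["id"])
        seen.add(key)
    print(f"Duplicate record ids: {dupes}")

Real tools add fuzzy matching and validation rules, but the Aberdeen numbers suggest even checks this simple pay for themselves.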

Whitney Grace, September 02, 2014
Sponsored by ArnoldIT.com, developer of Augmentext
