September 30, 2014
Connotate has been going through many changes in 2014. According to Virtual Strategy, they can add a new leader to the list: “Connotate Appoints Rich Kennelly As Chief Executive.” Connotate sells big data technology, specializing in enterprise-grade Web data harvesting services. The newest leader for the company is Richard J. Kennelly. Kennelly has worked in the IT sector for over twenty years. Most of his experience has been in helping developing businesses harness the Internet and data. He has worked at Ipswitch and Akamai Technologies, holding leadership roles at both companies.
Kennelly is excited about his new position:
“ ‘This is the perfect time to join Connotate,’ said Kennelly. ‘The Web is the largest data source ever created. The biggest brands are moving quickly to leverage that data to drive competitive advantage and create new revenue streams. Connotate’s patented technology, scalability, and deep technical expertise make us the natural choice for these forward thinking companies.’”
The rest of the quote includes a small, but impressive client list, more praise for Kennelly, and how Connotate is a leading big data company.
If Connotate did not have good products and services, then they would not keep their clients. Despite the big names, they are still going through financial woes. Is choosing Kennelly a sign that they are trying to harvest more funding?
September 15, 2014
Say, here’s a thought: After spending billions for big-data software, federal managers are being advised to do their research before investing in solutions. We learn about this nugget of wisdom from Executive Gov in their piece, “Report: Fed Managers Should Ask Data Questions, Determine Quality/Impact Before Investing in Tech.” Writer Abba Forrester sums up the Federal Times report:
“Rutrell Yasin writes that the above managers should follow three steps as they seek to compress the high volume of data their agencies encounter in daily tasks and to derive value from them. According to Shawn Kingsberry, chief information officer for the Recovery Accountability and Transparency Board, federal managers should first determine the questions they need to ask of data then create a profile for the customer or target audience.
“Finally, they should consider the potential impact of the data, the insights and resulting technology investments on the agency.”
For any managers new to data management, the article notes they should choose a platform that includes data analysis tools and compiles data from multiple sources into one repository. It also advises agencies to employ a dedicated chief data officer and data scientists/architects. Good suggestions, all. Apparently, agencies need to be told that a cursory or haphazard approach to data is almost certain to require more time, effort, and expense down the line.
Cynthia Murrell, September 15, 2014
September 4, 2014
Autonomy, Recommind, and dozens of other search and content processing firms rely on statistical procedures. Anyone who has survived Statistics 101 believes in the power of numbers. Textbook examples are, well, pat. The numbers work out even for B and C students.
The real world, on the other hand, is different. What was formulaic in the textbook exercises is more difficult with most data sets. The data are incomplete, inconsistent, generated by systems whose integrity is unknown, and often wrong. Human carelessness, the lack of time, a lack of expertise, and plain vanilla cluelessness make those nifty data sets squishier than a memory foam pillow.
If you have some questions about statistical evidence in today’s go go world, check out “I Disagree with Alan Turing and Daniel Kahneman Regarding the Strength of Statistical Evidence.”
I noted this passage:
It’s good to have an open mind. When a striking result appears in the dataset, it’s possible that this result does not represent an enduring truth or even a pattern in the general population but rather is just an artifact of a particular small and noisy dataset. One frustration I’ve had in recent discussions regarding controversial research is the seeming unwillingness of researchers to entertain the possibility that their published findings are just noise.
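The passage's point about small, noisy datasets can be sketched with a short simulation. This is an illustration only and not anything from the quoted post: the function name, the sample sizes, and the 0.5 cutoff for a "striking" gap are all assumptions. Both groups are drawn from the same distribution, so any gap between their averages is pure noise.

```python
import random
import statistics


def share_of_striking_results(sample_size, trials=500, threshold=0.5, seed=0):
    """Return the share of trials in which two groups drawn from the SAME
    distribution still show a gap in means larger than `threshold`.

    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    striking = 0
    for _ in range(trials):
        group_a = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        gap = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
        if gap > threshold:
            striking += 1
    return striking / trials


small = share_of_striking_results(sample_size=10)
large = share_of_striking_results(sample_size=500)
print(f"Striking gaps with 10 records per group:  {small:.1%}")
print(f"Striking gaps with 500 records per group: {large:.1%}")
```

With only ten records per group, a sizable share of runs shows a "finding" even though there is nothing to find. With 500 records per group, such gaps all but vanish.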
An open mind is important. Just looking at the outputs of zippy systems that do prediction for various entities can be instructive. In the last couple of months, I learned that predictive systems:
- Failed to size the Ebola outbreak by orders of magnitude
- Did not provide reliable outputs for analysts trying to figure out where a crashed airplane was
- Came up short regarding resources available to ISIS.
The Big Data revolution is one of those hoped for events. The idea is that Big Data will allow content processing vendors to sell big buck solutions. Another is that massive flows of unstructured content can only be tapped in a meaningful way with expensive information retrieval solutions.
Dreams, hopes, wishes—yep, all valid for children waiting for the tooth fairy. The real world has slightly more bumps and sharp places.
Stephen E Arnold, September 4, 2014
September 4, 2014
Here is an article that makes you question the past two years. From the Federal Times comes “Steps To Make Big Data Relevant,” published in August 2014. For the past two years, big data has been the go-to term for technology and information professionals. IT companies have sold software meant to harness big data’s potential and generate revenue. So why is there an article explaining how to make it relevant now? It uses the federal government as an example, and any bureaucrat can tell you government implementation is slow.
If, however, you do not even know what big data is and you want to get started, this article explains it in basic terms. It has three steps people need to think about to develop a big data plan:
- Determine what questions need to be asked of the data.
- Determine where all of the data you want is located and ask the data owners to help you understand the data’s quality.
- Decide what it means to answer these questions and use technology to help answer them.
Then the last suggestion is to have a dedicated team to manage big data:
“To address that challenge, federal agencies need a chief data officer and data architects or scientists. The chief data officer would keep the chief information officer and chief information security officer better informed about the value of their information and how to interact with that information to make it useful. Chief data architects/scientists are needed to design the data infrastructure and quantify the value of the data at its lowest common elements.”
When you read over the questions, you will see they are an implementation plan for any information technology software: what do you want to do, figure out how to do it, make a plan to implement it. Big data is complex, but the steps governing it are not.
Whitney Grace, September 04, 2014
September 2, 2014
Why does logic seem to fail in the face of fancy jargon? DataFusion’s Blog took on the jargon fallacy in the post, “It All Begins With Data Quality.” The post explains how, with new terms like big data, real-time analytics, and self-service business intelligence, the fundamentals that make this technology work are forgotten. Cleansing, data capture, and governance form the foundation for data quality. Without data quality, big data software is useless. According to a recent Aberdeen Group study, data quality was ranked as the most important data management function.
Data quality also leads to other benefits:
“When examining organizations that have invested in improving their data, Aberdeen’s research shows that data quality tools do in fact deliver quantifiable improvements. There is also an additional benefit: employees spend far less time searching for data and fixing errors. Data quality solutions provided an average improvement of 15% more records that were complete and 20% more records that were accurate and reliable. Furthermore, organizations without data quality tools reported twice the number of significant errors within their records; 22% of their records had these errors.”
Data quality saves man hours, discovers missed errors, and deletes duplicate records. The Aberdeen Group’s study also revealed that poor data quality is a top concern. Organizations should deploy a data quality tool, so they too can take advantage of its many benefits. It is a logical choice.
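The two checks mentioned above, completeness and duplicate removal, can be sketched in a few lines. This is a minimal illustration and not the method of any tool in the Aberdeen study: the function names, the `name` and `email` fields, and the sample records are all hypothetical.

```python
def is_complete(record, required_fields):
    """True when every required field holds a non-blank value.

    Assumes text fields; a value such as 0 would count as blank here.
    """
    return all(str(record.get(field) or "").strip() for field in required_fields)


def deduplicate(records, key_fields):
    """Keep the first record for each key, ignoring case and stray spaces."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(
            str(record.get(field) or "").strip().lower() for field in key_fields
        )
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample data for illustration only.
records = [
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "ada lovelace ", "email": "ADA@example.com"},  # duplicate of the first
    {"name": "Alan Turing", "email": ""},                   # incomplete
]

fields = ["name", "email"]
unique = deduplicate(records, fields)
complete_share = sum(is_complete(r, fields) for r in unique) / len(unique)
print(f"{len(records) - len(unique)} duplicate removed, "
      f"{complete_share:.0%} of remaining records complete")
```

Even this toy version shows why the work matters: one of three records is redundant and one is unusable before any analysis starts.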
September 1, 2014
I suppose I am narrow minded. I don’t associate the Huffington Post with high technology analyses. My ignorance is understandable because I don’t read the Web site’s content.
However, a reader sent me a link to “Top Three Big Data Myths: Debunked,” authored by an employee of the search vendor Recommind. Now Recommind is hardly a household word. I spoke with a Recommind PR person about my perception that Recommind is a variant of the technology embodied in Autonomy IDOL. Yep, that company making headlines because of the minor dust up with Hewlett Packard. Recommind provides a probabilistic search system to customers that were originally involved in the legal market. The company has positioned its technology to other markets and added a touch of predictive magic as well. At its core, Recommind indexes content and makes the indexes available to users and other services. The company in 2010 formed a partnership with the Solcara search folks. Solcara is now the go to search engine for Thomson Reuters. I have lost track of the other deals in which Recommind has engaged.
The write up reveals quite a bit about the need for search vendors to reach a broader market in order to gain visibility to make the cost of sales bearable. This write up is a good example of content marketing and the malleability of outfits like Huffington Post. The idea, it strikes me, is that something that looks interesting may get a shot at building the click traffic for Ms. Huffington’s properties.
So what does the article debunk? Fasten your seat belt and take your blood pressure medicine. The content of the write up may jolt you. Ready?
First, the article reveals that “all” data are not valuable. The way the write up expresses it takes this form, “Myth #1—All Data Is Valuable.” Set aside the subject verb agreement error. Data is the plural and datum is the singular. But in this remarkable content marketing essay, grammar is not my or the author’s concern. The notion of categorical propositions applied to data is interesting and raises many questions; for example, what data? So the first myth is that if one is able to gather “all data”, it therefore follows that some is not germane. My goodness, I had a heart palpitation with this revelation.
Second, the next myth is that “with Big Data the more information the better.” I must admit this puzzles me. I am troubled by the statistical methods used to filter smaller, yet statistically valid, subsets of data. Obviously the predictive Bayesian methods of Recommind can address this issue. The challenges Autonomy like systems face are well known to some Autonomy licensees and, I assume, to the experts at Hewlett Packard. The point is that if the training information is off base by a smidge and the flow of content does not conform to the training set, the outputs are often off point. Now with “more information” the sampling purists point to sampling theory and the value of carefully crafted training sets. No problem on my end, but aren’t we emphasizing that certain non Bayesian methods are just not as wonderful as Recommind’s methods? I think so.
The third myth that the write up “debunks” is “Big Data opportunities come with no costs.” I think this is a convoluted way of saying, get ready to spend a lot of money to embrace Big Data. When I flip this debunking on its head, I get this hypothesis: “The Recommind method is less expensive than the Big Data methods that other hype artists are pitching as the best thing since sliced bread.”
The fix is “information governance.” I must admit that, as with knowledge management, I have zero idea what the phrase means. Invoking a trade association anchored in document scanning does not give me confidence that an explanation will illuminate the shadows.
Net net: The myths debunked just set up myths for systems based on aging technology. Does anyone notice? Doubt it.
Stephen E Arnold, September 1, 2014
August 28, 2014
Technology moves fast. The race is always on to remain on top and relevant. Big data companies especially feel the push to develop new and improved products. Datamation makes a keen observation about big data competition in the article “30 Big Data Companies Leading The Way”:
“For Big Data companies, this is a critical period for competitive jockeying. These are the early days of Big Data, which means there are still a plethora of companies – a mix of new firms and old guard Silicon Valley firms – looking to stay current. Like everything else, the Big Data market will mature and consolidate. In five years, you can bet that many of the Big Data companies on this list will be gone – either out of business or merged/acquired with a larger player.”
Datamation continues the article with a list of big data companies that specialize in big data analytics. It stresses that the list is not to be used as a buyer’s guide, but more as a rundown of the various services each of the thirty companies offers and how they try to distinguish themselves in the market. Big names like Google, Microsoft, IBM, and SAP are among the first listed, while smaller companies are listed towards the bottom. Many of the smaller firms are ones that do not make the news often, but judging by their descriptions have comparable products.
Who will remain and who will go in the next five years?
August 23, 2014
I don’t know if the data in “Most Smartphone Users Download Zero Apps per Month” are accurate. The majority (65%) of smartphone users download zero apps per month. I suppose the encouraging point in the write up is that 35% of smartphone users download at least one per month. The Hewlett Packard IDOL app can be a slam dunk when HP unleashes IDOL enterprise apps. If HP converts just one percent of the 35 percent, millions will flow to the printer ink and personal computer company. At least, that’s one way to interpret the data the MBA way. Plug those numbers into Excel, fatten up the assumptions, and the money is in the virtual bank. At least that’s one way to leverage spreadsheet fever into a corporate initiative for Big Data IDOL enterprise apps.
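For those without Excel at hand, the spreadsheet exercise can be sketched in a few lines. Only the 35 percent and the one percent come from the paragraph above. The size of the user base and the price per app are invented placeholders, exactly the kind of assumption that gets fattened up, and the function name is hypothetical.

```python
def projected_buyers(smartphone_users, downloader_percent, conversion_percent):
    """Users who download apps, then the slice of those who convert."""
    downloaders = smartphone_users * downloader_percent / 100
    return downloaders * conversion_percent / 100


# Placeholders, NOT figures from the article or from HP.
HYPOTHETICAL_USERS = 100_000_000
HYPOTHETICAL_PRICE = 5.00

buyers = projected_buyers(HYPOTHETICAL_USERS, downloader_percent=35, conversion_percent=1)
revenue = buyers * HYPOTHETICAL_PRICE
print(f"{buyers:,.0f} buyers, ${revenue:,.0f} in the virtual bank")
```

Change either placeholder and the virtual bank balance changes with it, which is the point: the output is only as good as the inputs someone typed in.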
Stephen E Arnold, August 23, 2014
August 18, 2014
I read “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” The write up from the newspaper that does not yet have hot links to the New York Times’ store has revealed that Big Data involves “janitor work.”
Interesting. I thought that Big Data was a silver bullet, a magic jinni, a miracle, etc. The write up reports that “far too much handcrafted work — what data scientists call “data wrangling,” “data munging” and “data janitor work” — is still required.”
And who does the work? The MBAs? The venture capitalists? The failed Webmasters? The content management specialists? The faux consultants pitching information governance?
The work is done by data scientists.
The New York Times has learned:
Before a software algorithm can go looking for answers, the data must be cleaned up and converted into a unified form that the algorithm can understand.
Quite a surprise for the folks at the newspaper.
How much of a data scientist’s time goes to data clean up? The New York Times has learned:
Data scientists, according to interviews and expert estimates, spend from 50 percent to 80 percent of their time mired in this more mundane labor of collecting and preparing unruly digital data, before it can be explored for useful nuggets.
What’s this mean in terms of cost?
Put simply, Big Data is likely to cost more than the MBAs, the venture capitalists, faux information governance consultants, et al assumed.
So as the volume of Big Data expands ever larger, doesn’t this mean that the price tag for Big Data grows ever larger? I don’t want to follow this logic too far. Exponentiating costs and falling farther and farther behind the most recent data are likely to make the folks with those fancy, real time predictive models based on Big Data uncomfortable.
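A rough sketch of what the 50 to 80 percent estimate leaves for actual analysis follows. The 40 hour work week and the function name are assumptions made for illustration; only the two percentages come from the New York Times passage.

```python
def analysis_hours_left(weekly_hours, prep_percent):
    """Hours remaining for analysis after data clean up takes its share."""
    return weekly_hours * (100 - prep_percent) / 100


WEEKLY_HOURS = 40  # assumption: a standard work week

best_case = analysis_hours_left(WEEKLY_HOURS, prep_percent=50)
worst_case = analysis_hours_left(WEEKLY_HOURS, prep_percent=80)
print(f"Hours per week left for insights: {worst_case:.0f} to {best_case:.0f}")
```

Under these assumptions, somewhere between one and two and a half working days per week go to the analysis the data scientist was hired for.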
Don’t worry. Be happy. The Big Data did miss the Ebola issue, the caliphate, various financial problems, and a handful of trivial events.
Stephen E Arnold, August 18, 2014
August 11, 2014
I love quotes about Big Data. “Big” is relative. You have heard a doting parent ask a toddler, “How big are you?” The toddler puts up his or her arms and says, “So big.” Yep, big at a couple of years old and 30 inches tall.
“If You Think Big Data’s Big Now, Just Wait” contains a quote attributed to a Big Data company awash in millions in funding money. Here’s the item I flagged for my Quote to Note file:
“The promise of big data has ushered in an era of data intelligence. From machine data to human thought streams, we are now collecting more data each day, so much that 90% of the data in the world today has been created in the last two years alone. In fact, every day, we create 2.5 quintillion bytes of data — by some estimates that’s one new Google every four days, and the rate is only increasing…”
I like the 2.5 quintillion bytes of data.
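The arithmetic behind the quote is easy to check. The sketch below takes the quote's own figures at face value, uses the short scale quintillion (10 to the 18th power) and the decimal exabyte, and works out what "one new Google every four days" would imply. The names are illustrative, and the result says nothing about how much data Google actually holds.

```python
BYTES_PER_DAY = 25 * 10**17   # 2.5 quintillion bytes, per the quote
BYTES_PER_EXABYTE = 10**18    # decimal (SI) exabyte


def exabytes(byte_count):
    """Convert a byte count to decimal exabytes."""
    return byte_count / BYTES_PER_EXABYTE


daily = exabytes(BYTES_PER_DAY)
implied_google = exabytes(BYTES_PER_DAY * 4)  # "one new Google every four days"
print(f"{daily} exabytes per day; one 'Google' would be {implied_google} exabytes")
```

So the quote amounts to 2.5 exabytes a day and a ten exabyte "Google," if one accepts the estimate.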
I am confident that Helion, IBM’s brain chip, and Google’s sprawling system can make data manageable. Well, more correctly, fancy systems will give the appearance of making quintillions of whatevers yield actionable intelligence.
If you do the Samuel Taylor Coleridge thing and enter into a willing suspension of disbelief, Big Data is just another opportunity.
How do today’s mobile equipped MBAs make decisions? A Google search, ask someone, or guess? I suggest you consider how you make decisions. How often do you have an appetite for SPSS style number crunching or a desire to see what’s new from the folks at Moscow State University?
Yep, data intelligence for the tiny percentage of the one percent who paid attention in statistics class. This is a type of saucisson I enjoy so much. Will this information find its way into a Schubmehl-like report about a knowledge quotient? For sure I think.
Stephen E Arnold, August 11, 2014