Big Data Debunkers Arise, Unite, Question Value

April 27, 2015

I enjoy reading the “analyses” of Blue Chip consulting firms. I have had a brush or two with the folks at these outfits over the years. I seem to recall working for one of them and doing consulting for a couple of others. At age 70, who knows?

I read “To Benefit from Big Data, Resist the Three False Promises.” Just three, I thought. To learn the truth, I sucked in the bits and learned:

Gartner recently predicted that “through 2017, 60% of big data projects will fail to go beyond piloting and experimentation and will be abandoned.” This reflects the difficulty of generating value from existing customer, operational and service data, let alone the reams of unstructured internal and external data generated from social media, mobile devices and online activity.

Zounds. A Blue Chip firm citing an Azure Chip firm. That, to me, is like the Cleveland Cavaliers tapping a talent from a middle school basketball team. I assumed there was an intellectual gap between the Blue Chip consultants and the second tier outfits. Guess I was wrong. Another possibility is that the folks behind the article were plucking low-hanging research fruit in order to make their case.

I learned that the three “false promises” were ones that just never, ever crossed my mind. The article states that there are three, count ‘em, three items of information about Big Data which are not true. Not true equals a lie, does it not?

  1. The “technology” singular of Big Data will automatically discover and present business opportunities. Shucks, I thought magic happened, particularly when dissimulation was involved.
  2. “Harvesting more data” automatically generates “more value.” There’s that magic again. I was stunned to learn that collecting information does not automatically equal much of anything. If there is one thing easy to collect, it is digital information.
  3. “Good” data scientists, similar to those who work at Blue Chip and Azure Chip consulting firms, will find value. No matter. The “good data scientists” cannot “find value” for a paying customer on demand. Is this a hedge to prevent consulting firm clients from alleging that the Big Data services did not yield a pot of gold?

Big Data, like most technology buzzwords, short circuits harried executives’ prudence. The silver tongued are able to invoke MBAisms and close deals. The benefits of those deals are often very difficult to pinpoint, quantify, or understand.

Write ups that put blunt tips on probing questions are amusing. I wonder if there is Big Data to make clear how many Big Data projects end up like other digital information silver bullets; that is, shooting blanks. Bang. Bang. Bang. That’s value.

Stephen E Arnold, April 27, 2015

Oracle Challenges HP Autonomy Service

April 22, 2015

The article titled “Oracle Adds Big Data Integration Tool To Streamline Hadoop Deployments” on Silicon Angle discusses the news from Oracle, which has concluded that putting the right tools in front of users is the only way to enable success. The Data Integrator for Big Data is meant to create more opportunities to pull data from multiple repositories by treating them all the same. The article states,

“It’s an important step the company insists, because Big Data tools like Hadoop and Spark use languages like Java and Python, making them more suitable for programmers rather than database admins (DBAs). But the company argues that most enterprise data analysis is carried out by DBAs and ETL experts, using tools like SQL. Oracle’s Big Data integrator therefore makes any non-Hadoop developer “instantly productive” on Hadoop, added Pollock in an interview with PC World.”

Pollock also spoke to Oracle’s progress, claiming that they are the only company with the capability to generate Hive, Pig and Spark transformations from a solitary mapping. For customers, this means not needing to know how to code in multiple programming languages. HP is also making strides in this line of work with the recent unveiling of the software that integrates Vertica with HP Autonomy IDOL. Excitement ahead!

Chelsea Kerwin, April 22, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

The Enterprise is a Jungle Search

April 16, 2015

The word collaboration has become one of those corporate power words like “synergy” and the “KISS method.”  Many people groan inwardly at new ways to “collaborate,” because it usually means another tool they have to learn, one that will fall out of use in under a year.  With the myriad ways to collaborate digitally, getting any actual collaborating done is difficult.  The SAP News blog says enterprise collaboration might be getting a little easier in the article, “EnterpriseJungle Tames Enterprise Search.”

EnterpriseJungle created an application with the SAP Hana Cloud Platform to help companies quickly find and connect with experts within or outside their organization.  The Principal at EnterpriseJungle states that a company’s people search is a vital tool to locate and harness information.

“ ‘Large companies are desperate to get a handle on understanding and accessing the expertise available to them at any given moment,’ said Sinclair. ‘Our solutions help companies solve fundamental questions like how do we find the people who are fantastic at what they do, but only known to their closest core group of co-workers? And, how do we easily bring their knowledge and expertise to the front line with minimal extra work? If we can help get information to employees that need it, we’re fundamentally making their lives easier, and making the company’s life easier.’ “

After describing how EnterpriseJungle’s application works and its usefulness for companies, the article claims it offers Google-like search results.  While it might be billed as a people search tool, the application is capable of much more.  It can help people locate experts, track down skill sets, and even improve IT relations.

EnterpriseJungle is hitting on a vital tool for companies.  People search is in severe need of improvement, and this might be the start of a new enterprise niche market.

Whitney Grace, April 16, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

NSF Makes Plan for Public Access to Scientific Research

April 16, 2015

The press release from the National Science Foundation titled “National Science Foundation Announces Plan for Comprehensive Public Access to Research Results” speaks to the NSF’s interest in increasing communication about federally funded research. The NSF is an independent federal agency with a $7 billion annual budget that is disbursed around the country in the form of grants to fund research and education in science and engineering. The article states,

“Scientific progress depends on the responsible communication of research findings,” said NSF Director France A. Córdova…Today’s announcement follows a request from the White House Office of Science and Technology Policy last year, directing science-funding agencies to develop plans to increase access to the results of federally funded research. NSF submitted its proposal to improve the management of digital data and received approval to implement the plan.”

The plan is called Today’s Data, Tomorrow’s Discoveries and promotes the importance of science without creating an undue burden on scientists. All manuscripts that appear in peer-reviewed scholarly journals and the like will be made available for free download within a year of the initial publication. In a time when scientists are less trusted and science itself is deeply misunderstood, public access may be more important than ever.

 

Chelsea Kerwin, April 16, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Informed Millennials

April 15, 2015

With the fall of traditional newspapers and the aging of TV news audiences, just where are today’s 20- and young 30-somethings turning for news coverage?  Science 2.0 tells us “How Millennials Get News,” reporting on a recent survey from the American Press Institute and the Associated Press-NORC Center for Public Affairs Research. The joint effort comes from a collaboration the organizations call the Media Insight Project. Conducted at the beginning of 2015, the survey asked Millennials about their news-consumption habits. The article tells us:

“People ages 18-34 consume news and information in strikingly different ways than did previous generations, they keep up with ‘traditional’ news as well as stories that connect them to hobbies, culture, jobs, and entertainment, they just do it in ways that corporations can’t figure out how to monetize well….

“‘For many Millennials, news is part of their social flow, with most seeing it as an enjoyable or entertaining experience,’ said Trevor Tompson, director of the AP-NORC Center. ‘It is possible that consuming news at specific times of the day for defined periods will soon be a thing of the past given that news is now woven into many Millennials’ connected lives.’”

Soon? Even many of us Gen Xers (and a few intrepid Baby Boomers) now take our news in small doses at varying hours. The survey also found that most respondents look at the news at least once a day, and many several times per day. Also, contrary to warnings from worrywarts (yes, including me), personalized news feeds may not be creating a confirmation-bias crisis after all. Most of these Millennials insist their social-media feeds are well balanced; the write-up explains:

“70 percent of Millennials say that their social media feeds are comprised of a diverse mix of viewpoints evenly mixed between those similar to and different from their own. An additional 16 percent say their feeds contain mostly viewpoints different from their own. And nearly three-quarters of those exposed to different views (73 percent) report they investigate others’ opinions at least some of the time–with a quarter saying they do it always or often.”

Well, that’s encouraging. Another finding might surprise some of us: though a vast 90 percent of Millennials have smart phones, only half report being online most or all of the day. See the article for more, or navigate to the report itself; the study’s methodology is detailed at the end.

Cynthia Murrell, April 15, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

A Former Googler Reflects

April 10, 2015

After a year away from Google, blogger and former Googler Tim Bray (now at Amazon) reflects on what he does and does not miss about the company in his post, “Google + 1yr.” Anyone who follows his blog, ongoing, knows Bray has been outspoken about some of his problems with his former employer: First, he really dislikes “highly-overprivileged” Silicon Valley and its surrounds, where Google is based. Secondly, he found it unsettling  to never communicate with the “actual customers paying the bills,” the advertisers.

What does Bray miss about Google? Their advanced bug tracking system tops the list, followed closely by the slick and efficient, highly collaborative internal apps deployment. He was also pretty keen on being paid partially in Google stock between 2010 and 2014. The food on campus is everything it’s cracked up to be, he admits, but as a remote worker, he rarely got to sample it.

It was a passage in Bray’s “neutral” section that most caught my eye, though. He writes:

“The number one popular gripe against Google is that they’re watching everything we do online and using it to monetize us. That one doesn’t bother me in the slightest. The services are free so someone’s gotta pay the rent, and that’s the advertisers.

“Are you worried about Google (or Facebook or Twitter or your telephone company or Microsoft or Amazon) misusing the data they collect? That’s perfectly reasonable. And it’s also a policy problem, nothing to do with technology; the solutions lie in the domains of politics and law.

“I’m actually pretty optimistic that existing legislation and common law might suffice to whack anyone who really went off the rails in this domain.

“Also, I have trouble getting exercised about it when we’re facing a wave of horrible, toxic, pervasive privacy attacks from abusive governments and actual criminals.”

Everything is relative, I suppose. Still, I think it understandable for non-insiders to remain a bit leery about these companies’ data habits. After all, the distinction between “abusive government” and businesses is not always so clear these days.

Cynthia Murrell, April 10, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

 

Predicting Plot Holes Isn’t So Easy

April 10, 2015

According to The Paris Review’s blog post “Man In Hole II: Man In Deeper Hole,” Matthew Jockers created an analysis tool to predict archetypal book plots:

“A rough primer: Jockers uses a tool called “sentiment analysis” to gauge “the relationship between sentiment and plot shape in fiction”; algorithms assign every word in a novel a positive or negative emotional value, and in compiling these values he’s able to graph the shifts in a story’s narrative. A lot of negative words mean something bad is happening, a lot of positive words mean something good is happening. Ultimately, he derived six archetypal plot shapes.”
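
To make the mechanics concrete, here is a minimal Python sketch of that word-scoring idea. The tiny lexicon, the sample text, and the plot_trajectory function are all hypothetical illustrations of the general technique, not Jockers’s actual tool.

    # Toy illustration of word-level sentiment scoring for "plot shape."
    # The lexicon and sample text are made up; a real tool would use a full
    # sentiment dictionary and smooth the resulting curve.
    LEXICON = {"love": 1, "joy": 1, "hope": 1, "dark": -1, "death": -1, "fear": -1}

    def plot_trajectory(text, window=10):
        """Sum word scores in consecutive windows to trace the narrative arc."""
        words = text.lower().split()
        scores = [LEXICON.get(w, 0) for w in words]
        return [sum(scores[i:i + window]) for i in range(0, len(scores), window)]

    sample = ("hope and joy filled the town " * 10) + ("fear and death crept in " * 10)
    print(plot_trajectory(sample))  # positive values early, negative values late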

Academics, however, found some problems with Jockers’s tool, such as whether it is possible to assign every word an emotional value and whether all plots really take basic forms.  The problem is that words are as nuanced as human emotion, perspectives change in an instant, and sentiments are subjective.  How would the tool rate sarcasm?

All stories have been said to break down into seven basic plots, so why should it not be possible to do the same for book plots?  Jockers has already identified six basic book plots, and there are some who are curiously optimistic about his analysis tool.  It does raise the question of whether it will staunch authors’ creativity or make English professors derive even more subjective meaning from Ulysses.

Whitney Grace, April 10, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

Apache Sparking Big Data

April 3, 2015

Apache Spark is an open source cluster computing framework that rivals MapReduce.  Venture Beat says that people did not pay much attention to Apache Spark when it was first invented at the University of California’s AMPLab in 2011.  The article, “How An Early Bet On Apache Spark Paid Off Big,” reports that big data open source supporters are adopting Apache Spark because of its superior capabilities.

People with big data plans want systems that process real-time information at a fast pace, and they want a whole lot of it done at once.  MapReduce can do this, but it was not designed for it.  It is all right for batch processing, but it is slow and much too complex to be a viable solution.

“When we saw Spark in action at the AMPLab, it was architecturally everything we hoped it would be: distributed, in-memory data processing speed at scale. We recognized we’d have to fill in holes and make it commercially viable for mainstream analytics use cases that demand fast time-to-insight on hordes of data. By partnering with AMPLab, we dug in, prototyped the solution, and added the second pillar needed for next-generation data analytics, a simple to use front-end application.”
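
For readers who have never touched Spark, here is a minimal sketch of the kind of in-memory, distributed processing the quote describes. It assumes a local PySpark installation and a hypothetical events.csv file of “user,action” rows; it is an illustration only, not ClearStory’s code.

    # Count actions per type with Spark's RDD API, keeping results in memory.
    # "events.csv" is a hypothetical input file of "user,action" lines.
    from pyspark import SparkContext

    sc = SparkContext("local[*]", "spark-sketch")

    events = sc.textFile("events.csv")
    counts = (events.map(lambda line: (line.split(",")[1], 1))
                    .reduceByKey(lambda a, b: a + b)
                    .cache())                      # cached in memory for reuse

    print(counts.take(10))                         # reused without re-reading the file
    sc.stop()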

ClearStory Data was built using Apache Spark to access data quickly, deliver key insights, and make the UI very user friendly.  People who use Apache Spark want immediate information from a variety of sources that can be put to profitable use.  Apache Spark might ignite the fire for the next wave of data analytics for big data.

Whitney Grace, April 3, 2015
Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

EBay Develops Open Source Pulsar for Real Time Data Analysis

April 2, 2015

A new large-scale, real-time analytics platform has been launched in response to one huge company’s huge data needs. VentureBeat reports, “EBay Launches Pulsar, an Open-Source Tool for Quickly Taming Big Data.” EBay has made the code available under an open-source license. It seems traditional batch processing systems, like that found in the widely used open-source Hadoop, just won’t cut it for eBay. That puts them in good company; Google, Microsoft, Twitter, and LinkedIn have each also created their own stream-processing systems.

Shortly before the launch, eBay released a whitepaper on the project, “Pulsar—Real-time Analytics at Scale.” It describes the what and why behind Pulsar’s design; check it out for the technical details. The whitepaper summarizes itself:

“In this paper we have described the data and processing model for a class of problems related to user behavior analytics in real time. We describe some of the design considerations for Pulsar. Pulsar has been in production in the eBay cloud for over a year. We process hundreds of thousands of events/sec with a steady state loss of less than 0.01%. Our pipeline end to end latency is less than a hundred milliseconds measured at the 95th percentile. We have successfully operated the pipeline over this time at 99.99% availability. Several teams within eBay have successfully built solutions leveraging our platform, solving problems like in-session personalization, advertising, internet marketing, billing, business monitoring and many more.”

For updated information on Pulsar, monitor their official website at gopulsar.io.

Cynthia Murrell, April 2, 2015

Stephen E Arnold, Publisher of CyberOSINT at www.xenky.com

HP Vertica and IDOL: Just Three Short Plus Years in the Making

March 31, 2015

I read an article from the outfit that relies on folks like Dave Schubmehl for expertise. The write up is “HP Links Vertica and IDOL Seeking Better Unstructured Data Analysis.” But I quite like the subtitle because it provides a timeline; to wit:

The company built a connector server for the products, which it acquired separately in 2011.

Let’s see: that is just about three years plus a few months. The story reminded me of Rip Van Winkle, who woke to a different world when he emerged from his slumber. The Sleepy Hollow could be a large technology company in the act of performing mitosis in order to generate [a] excitement, [b] money, and [c] the appearance of progress. I wonder if the digital Sleepy Hollow is located near Hanover Street. I will have to investigate that parallel.

What’s a few years of intellectual effort in a research “cave” when you are integrating software that is expected to generate billions of dollars in sales? Existing Vertica and Autonomy licensees are probably dancing in the streets.

The write up states:

Promising more thorough and timelier data analysis, Hewlett-Packard has released a software package that combines the company’s Vertica database with its IDOL data analysis platform. The HP Haven Connector Framework Server may allow organizations to study data sets that were too large or unwieldy to analyze before. The package provides “a mixture of statistical and contextual understanding,” of data, said Jeff Veis, HP vice president of marketing for big data. “You can pull in any form of data, and then do real-time high performance analysis.”

Hmm. “Promising” and “may allow” are interesting words and phrases. It seems as if the employer of Mr. Schubmehl is hedging on the HP assertions. I wonder, “Why?”

