HonkinNews for August 16, 2016

August 16, 2016

The weekly news program about search, online, and content processing is now available at https://youtu.be/mE3MGlmrUWc. In addition to comments about Goo!Hoo, IBM, and Microsoft, you will learn about grilling squirrel over a wood fire. Live from Harrod’s Creek.

Stephen E Arnold, August 16, 2016

Quote to Note: Big Data Governance

August 10, 2016

I read an interview with a wizard from Talend, which I did not know had French roots. The write up is “Interview: Christophe Toum, Talend on Why Big Data Needs Big Governance.” I noted two passages which I found refreshing.

The first addresses the unpleasant topic of being organized. The code word for this all-too-human characteristic is “governance.” I highlighted this passage:

At Talend we believe Big Data without governance will quickly become a big problem…Big Data needs even more governance.

My view is that the more annoying, administrative, human subject matter intensive investment required, the less governance will be applied. Just a thought based on my experience.

The second comment elicited one exclamation point from my subdued pale blue highlighter:

Controlling who can access and use this data, what data is verified and trusted, by whom and how, is a big deal.

No kidding.

Stephen E Arnold, August 10, 2016

Big Consulting Firm Smashes the Big Data Conundrum

August 9, 2016

I read “Cracking the Data Conundrum: How Successful Companies Make Big Data Operational.” The high level, super sophisticated, MBA quivering report is free. Does that mean that Capgemini Consulting is trying to drum up business? I thought these top level outfits generated 90 percent of their annual revenue from repeat business? Perhaps today’s economic climate is different?

The report is interesting because the premise is that Capgemini has solved a “conundrum.” This is a nifty word which I learned when I was a wee lad trying to keep my tutor in Campinas, Brazil, happy. I recall that the word was used by one Thomas Nash (no, not a relative of the Nash made famous with the quip “the golden trashery of Ogden Nashery”). But that neologistic meaning has a fresh charge of meaning for me; to wit:

A term of abuse for a crank or a pedant.

Today the word is popular among the MBA set as a solvable problem. However, a conundrum can be another word for dilemma. That’s a logical word for illogical statements; for example,

Bruno was gored on the horns of a big, angry dilemma.

What does the Capgemini document suggest is the resolution to the problem of Big Data?

The write up tells the reader that most outfits trying to integrate Big Data into every day work life screw up. The fancy wording is:

Successful Big Data implementations elude most organizations.

That’s bad for the organizations, and I assume really good for consultants who know how to deal with wasted money.

The problem? Organizations’ management are not able to manage. I learned:

Our research revealed that the top challenges that organizations face include: dealing with scattered silos of data, ineffective coordination of analytics initiatives, the lack of a clear business case for Big Data funding, and the dependence on legacy systems to process and analyze Big Data.

Imagine organizations have these flaws. What are they to do?

Step one is to get their act together; that is, organize for Big Data. Sounds good. But what if the organization is set up to do something else; for instance, make men’s shirts or do publicity for a Hollywood motion picture?

Well, these outfits need to have a systematic approach to Big Data. And one size does not fit every organization. Capgemini identifies four ways to put the ponies in the circus wagon. These are:

  • Scattered pockets of Big Data stuff
  • Decentralized Big Data stuff. (How is this different from “scattered pockets”?)
  • Centralized Big Data stuff
  • A Big Data business unit. (This is the one that delivers the most “success.” I am not sure for whom however.)

How does an organization move from total loser in Big Data to a successful outfit integrating Big Data into operations? This effort, which will be billed on a flat fee, retainer, or time and materials basis, is an “implementation journey.” I have a hunch that this trip will not be a 10 minute walk to the convenience store for a bottle of Big Red soda pop. The trip will be a hike through the Ural mountains in winter.

The write up includes a test. This makes it easy for the shirt maker in Bangladesh or the 20 somethings working from a trailer in Orange County to put their act in the circus’ center ring.

The write up references a survey conducted in 2014. I suppose in the slow moving world of the shirt makers and Hollywood publicists a year and a half is a reasonable time interval.

If you want to test your understanding of the word “conundrum,” you will want to read this free report. Only you can answer this question: Does conundrum reference a crank or pedant or a hapless MBA dangling from a sharp horn? Whenever horns of a bull enter a conversation, other stuff may follow.

Stephen E Arnold, August 9, 2016

Beyond Search HonkinNews Video for 8 August 2016 Online Now

August 9, 2016

You can view the August 8, 2016, HonkinNews program at this link. The video comes from Goodwill-grade 8 mm film equipment. The program highlights recent stories from the free (yep, no cost whatsoever) Beyond Search Web log. Learn about how one Google executive “escaped” life in the fast lane. The Verizon acquisition of Yahoo reminds Stephen of Washington’s wooden false teeth. The deal allows Verizon to own two Internet artifacts. Hewlett Packard Enterprise, owner of Autonomy, faces an uncertain future as it sells units and thinks about selling itself. And there’s more in the six minute news program; for example, a restrained MBA cheer for Big Data. But that’s a sotto voce rah, rah. Like Beyond Search, the honking video version tries to separate the giblets from the goose feathers in the thrilling world of search, content processing, and related disciplines. That’s not easy in today’s search-centric world where relevance is mostly good enough and jargon is its own virtual reality.

Ken Toth, August 9, 2016

A Big Data Disconnect. Who Knew

August 2, 2016

I read “Advisors and Big Data: The Disconnect.” Stunned am I. Consultants not listening to their clients. Systems with severed communication channels to those who pay their licensing bills. Unbelievable.

I learned:

But while many companies have big dough invested in this ongoing project, they still rely far too much on intuition and gut instinct instead of using their data to operate. This is often due to a fundamental disconnection between the actual needs of the business versus what the data analytics are designed to deliver.

The write up makes a number of statements which suggest there is some snake oil laced with ineptitude in the Big Data world; for example:

  • Analytics enable. What if analytics enable poor decision making?
  • Algorithms are not a “magic kit.” I thought algorithms were really smart.
  • Bad data are bad. Really?
  • Data are not insights. I thought data were chock full of insight.
  • Moving big data from Point A to Point B is not a slam dunk. What about a three point shot?

If these points resonate with you, you are probably not getting with the Big Data program. I thought Big Data was a silver bullet and a magic potion blended in one tasty for fee meal. Stunned, I tell you. Stunned. Imagine. Disconnected advisors.

Stephen E Arnold, August 2, 2016

Big Data Is Just a Myth

August 1, 2016

Remember in the 1979 hit The Muppet Movie there was a running gag where Kermit the Frog kept saying, “It’s a myth. A myth!” Then a woman named Myth would appear out of nowhere and say, “Yes?” It was a funny random gag, but while it is a myth that frogs give warts, most of the myths related to big data may or may not be. Data Science Central decided to explain some of the myths in “Debunking The 68 Most Common Myths About Big Data-Part 2.”

Some of the prior myths debunked in the first part were that big data was the newest power word, an end all solution for companies, only meant for big companies, and that it was complicated and expensive.  In truth, anyone can benefit from big data with a decent implementation plan and with someone who knows how to take charge of it.

Big data, in fact, can be integrated with preexisting systems, although it takes time and knowledge to link the new and the old together (it is not as difficult as it seems).  Keeping on that same thought, users need to realize that there is not a one size fits all big data solution.  Big data is a solution that requires analytical, storage, and other software.  It cannot be purchased like other proprietary software and it needs to be individualized for each organization.

One myth that has converted into truth is that big data relies on Hadoop storage. It used to be that Hadoop was one option in a market of many, but now it is an integral bit of software needed to get the big data job done. One of the most prevalent myths is that big data belongs only in the IT department:

“Here’s the core of the issue.  Big Data gives companies the greatly enhanced ability to reap benefits from data-driven insights and to make better decisions.  These are strategic issues.

You know who is most likely to be clamoring for Big Data?  Not IT.  Most likely it’s sales, marketing, pricing, logistics, and production forecasting.  All areas that tend to reap outsize rewards from better forward views of the business.”

Big data is becoming more of an essential tool for organizations in every field as it tells them more about how they operate and their shortcomings. Big data offers a very detailed examination of these issues; the biggest issue users need to deal with is how they will use it.

Whitney Grace, August 1, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Governance for Big Data. A Sure Fire Winner for Consultants

July 28, 2016

I read “What’s Next for Big Data Analytics?” I didn’t know the answer to this question, and I still don’t. The angle of attack is common sense. Companies with experience in dealing with digital information often have viewpoints different from the marketing collateral produced by their colleagues. This write up seems to fall in the category of Mr. Bush’s request, “Please, clap.”

The idea is that an organization has to have information policies. That sounds like consultant speak. Most organizations struggle to figure out what their company party policies are. Digital data policies are one of those tasks that senior managers allow others to wrestle to the ground and get a tap out.

The write up includes a number of diagrams. I highlighted this one:

[diagram]

The red area is the governance and management thing. Good luck with that. Companies need revenue. Big Data is supposed to deliver. If not, those policies and governance meeting minutes along with the consultants who billed big bucks for them are going to the shredder in my opinion.

Stephen E Arnold, July 28, 2016

Scholarship Evolving with the Web

July 21, 2016

Is big data good only for the hard sciences, or does it have something to offer the humanities? Writer Marcus A Banks thinks it does, as he states in “Challenging the Print Paradigm: Web-Powered Scholarship is Set to Advance the Creation and Distribution of Research” at the Impact Blog (a project of the London School of Economics and Political Science). Banks suggests that data analysis can lead to a better understanding of, for example, how the perception of certain historical events has evolved over time. He goes on to explain what the literary community has to gain by moving forward:

“Despite my confidence in data mining I worry that our containers for scholarly works — ‘papers,’ ‘monographs’ — are anachronistic. When scholarship could only be expressed in print, on paper, these vessels made perfect sense. Today we have PDFs, which are surely a more efficient distribution mechanism than mailing print volumes to be placed onto library shelves. Nonetheless, PDFs reinforce the idea that scholarship must be portioned into discrete units, when the truth is that the best scholarship is sprawling, unbounded and mutable. The Web is flexible enough to facilitate this, in a way that print could never do. A print piece is necessarily reductive, while Web-oriented scholarship can be as capacious as required.

“To date, though, we still think in terms of print antecedents. This is not surprising, given that the Web is the merest of infants in historical terms. So we find that most advocacy surrounding open access publishing has been about increasing access to the PDFs of research articles. I am in complete support of this cause, especially when these articles report upon publicly or philanthropically funded research. Nonetheless, this feels narrow, quite modest. Text mining across a large swath of PDFs would yield useful insights, for sure. But this is not ‘data mining’ in the maximal sense of analyzing every aspect of a scholarly endeavor, even those that cannot easily be captured in print.”

Banks does note that a cautious approach to such fundamental change is warranted, citing the development of the data paper in 2011 as an example.  He also mentions Scholarly HTML, a project that hopes to evolve into a formal W3C standard, and the Content Mine, a project aiming to glean 100 million facts from published research papers. The sky is the limit, Banks indicates, when it comes to Web-powered scholarship.

 

Cynthia Murrell, July 21, 2016

There is a Louisville, Kentucky Hidden Web/Dark
Web meet up on July 26, 2016.
Information is at this link: http://bit.ly/29tVKpx.

 

Attivio Targets Profitability by the End of 2016 Through $31M Financing Round

July 18, 2016

The article on VentureBeat titled Attivio Raises $31 Million to Help Companies Make Sense of Big Data discusses the promises of profitability that Attivio has made since its inception in 2007. According to Crunchbase, the search vendor has raised over $100 million from four investors. In March 2016, the company closed a financing round at $31M with the expectation of becoming profitable within 2016. The article explains,

“Our increased investment underscores our belief that Attivio has game-changing capabilities for enterprises that have yet to unlock the full value of Big Data,” said Oak Investment Partners’ managing partner, Edward F. Glassmeyer. Attivio also highlighted such recent business victories as landing lab equipment maker Thermo Fisher Scientific as a client and partnering with medical informatics shop PerkinElmer. Oak Investment Partners, General Electric Pension Trust, and Tenth Avenue Holdings participated in the investment, which pushed Attivio’s funding to at least $102 million.

In the VentureBeat Profile about the deal, Stephen Baker, CEO of Attivio, makes it clear that 2015 was a turning point for the company, or in his words, “a watershed year.” Attivio prides itself on both speeding up the data preparation process and empowering its customers to “achieve true Data Dexterity.” And hopefully the company will also be profitable, soon.

Chelsea Kerwin, July 18, 2016

The Web, the Deep Web, and the Dark Web

July 18, 2016

If it was not challenge enough trying to understand how the Internet works and avoiding identity theft, try carving through the various layers of the Internet such as the Deep Web and the Dark Web. It gets confusing, but “Big Data And The Deep, Dark Web” from Data Informed clears up some of the clouds that darken Internet browsing.

The differences between the three are not that difficult to understand once they are spelled out. The Web is the part of the Internet that we use daily to check our email, read the news, check social media sites, etc. The Deep Web is an Internet sector not readily picked up by search engines. It includes password protected sites, very specific information like booking a flight with a particular airline on a certain date, and the TOR servers that allow users to browse anonymously. The Dark Web consists of Web pages that are not indexed by search engines and sell illegal goods and services.

“We do not know everything about the Dark Web, much less the extent of its reach.

“What we do know is that the deep web has between 400 and 550 times more public information than the surface web. More than 200,000 deep web sites currently exist. Together, the 60 largest deep web sites contain around 750 terabytes of data, surpassing the size of the entire surface web by 40 times. Compared with the few billion individual documents on the surface web, 550 billion individual documents can be found on the deep web. A total of 95 percent of the deep web is publically accessible, meaning no fees or subscriptions.”

The biggest seller on the Dark Web is child pornography. Most of the transactions take place using BitCoin with an estimated $56,000 in daily sales. Criminals are not the only ones who use the Dark Web; whistle-blowers, journalists, and security organizations use it as well. Big data has not even scratched the surface related to mining, but those interested can find information and do their own mining with a little digging.

Whitney Grace, July 18, 2016