Big Data Diagram Reveals Database Crazy Quilt

July 7, 2016

I was cruising through the outputs of my Overflight system and spotted a write up with the fetching title “Big Data Services | @CloudExpo #BigData #IoT #M2M #ML #InternetOfThings.” Unreadable? Nah. Just a somewhat interesting attempt to get a marketing write up indexed by a Web search engine. Unfortunately humans have to get involved at some point. Thus, in my quest to learn what the heck Big Data is, I explored the content of the write up. What the article presents is mini summaries of slide decks developed by assorted mavens, wizards, and experts. I dutifully viewed most of the information but tired quickly as I moved through a truly unusual article about a conference held in early June. I assume that the “news” is that the post conference publicity is going to provide me with high value information in exchange for the time I invested in trying to figure out what the heck the title means.

I viewed a slide deck from an outfit called Cazena. You can view “Tech Primer: Big Data in the Cloud.” I want to highlight this deck because it contains one of the most amazing diagrams I have seen in months. Here’s the image:

[Image: Cazena’s diagram of the data management product landscape]

Not only is the diagram enhanced by the colors and lines, but the world it depicts is a listing of data management products. The image was produced in June 2015 by a consulting firm and recycled in “Tech Primer” a year later.

I assume the folks in the audience benefited from the presentation of information from mid tier consulting firms. I concluded that the title of the article is actually pretty clear.

I wonder: is a T shirt available with the database graphic? If so, I want one. Perhaps I can search for the strings “#M2M #ML.”

Stephen E Arnold, July 7, 2016

What Makes Artificial Intelligence Relevant to Me

July 7, 2016

Artificial intelligence makes headlines every once in a while when a new supercomputer beats a pro player at chess, Go, or even Jeopardy.  It is amazing how these machines replicate human thought processes, but it is more of a novelty than a practical application.  The IT Proportal discusses the actual real world benefits of artificial intelligence in “How Semantic Technology Is Making Sense Of Our Big Data.”

The answer, of course, revolves around big data and how industries are not capable of keeping up with the amount of unstructured data generated as technology advances.  Artificial intelligence processes the data and interprets it into recognizable patterns.

Then the article inserts information about the benefits of natural language processing: how it scours information and extrapolates context from natural speech patterns.  It also goes into how semantic technology picks up the slack when natural language processing does not work.  The entire goal is to make unstructured data more structured:

“It is also reasonable to note that the challenge also relates to the structure and output of your data management. The application of semantic technologies within an unstructured data environment can only draw real business value if the output is delivered in a meaningful way for the human tasked with looking at the relationships. It is here that graphical representations add user interface value and presents a cohesive approach to improving the search and understanding of enterprise data.”
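
The “unstructured to structured” idea is easier to see with a toy example. Below is a very rough Python sketch, unrelated to any product or method named in the article, that pulls a simple entity-like pattern out of free text and turns it into structured records; the sentences, field names, and regex are all hypothetical.

```python
import re

# Hypothetical free-text notes; the goal is to impose some structure on them.
notes = [
    "Acme Corp reported revenue of $1.2M in Q3.",
    "Globex posted revenue of $900K in Q2.",
]

# A crude pattern standing in for real natural language processing.
pattern = re.compile(
    r"(?P<company>[A-Z]\w+(?: \w+)?) (?:reported|posted) revenue of "
    r"\$(?P<amount>[\d.]+)(?P<unit>[MK]) in (?P<quarter>Q[1-4])"
)

rows = []
for note in notes:
    match = pattern.search(note)
    if match:
        multiplier = 1_000_000 if match.group("unit") == "M" else 1_000
        rows.append({
            "company": match.group("company"),
            "revenue_usd": float(match.group("amount")) * multiplier,
            "quarter": match.group("quarter"),
        })

print(rows)  # structured rows extracted from unstructured sentences
```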

The article is an informative fluff piece that sells big data technology and explains the importance of taking charge of data.  It has been discussed before.

Whitney Grace, July 7, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

More Data Truths: Painful Stuff

July 4, 2016

I read “Don’t Let Your Data Lake Turn into a Data Swamp.” Nice idea, but there may be a problem which resists some folks’ best efforts to convert that dicey digital real estate into a tidy subdivision. Swamps are wetlands. As water levels change, the swamps come and go, ebb and flow as it were. More annoying is the fact that swamps are not homogeneous. Fens, muskegs, and bogs add variety for the happy hiker who strays into the Vasyugan Swamp as the spring thaw progresses.

The notion of a data swamp is an interesting one. I am not certain how zeros and ones in a storage medium relate to the Okavango delta, but let’s give this metaphor a go. The write up reveals:

Data does not move easily. This truth has plagued the world of Big Data for some time and will continue to do so. In the end, the laws of physics dictate a speed limit, no matter what else is done. However, somewhere between data at rest and the speed of light, there are many processes that must be performed to make data mobile and useful. Integrating data and managing a data pipeline are two of these necessary tasks.

Okay, no swamp thing here.

The write up shifts gears and introduces the “data pipeline” and the concept of “keeping the data lake clean.”

Let’s step back. The motive force for this item about information in digital form seems to have several gears:

  1. Large volumes of data are a mess. Okay, but not all swamps are messes. The real problem is that whoever stored data did it without figuring out what to do with the information. Collection is not application.
  2. The notion of a data pipeline implies movement of information from Point A to Point B or through a series of processes which convert Input A into Output B. Data pipelines are easy to talk about, but in my experience these require knowing what one wants to achieve and then constructing a system to deliver. Talking about a data pipeline is not a data pipeline in my wetland. (A minimal sketch of the idea follows this list.)
  3. The concept of pollution seems to suggest that dirty data are bad. Making certain data are accurate and normalized requires effort.
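
Since the write up never shows what a pipeline actually does, here is a minimal, hypothetical Python sketch of the idea in point two: move records from Input A toward Output B through a couple of processes that validate and normalize them. The records and field names are invented and do not come from the article.

```python
# Minimal, hypothetical data pipeline: Input A -> validate -> normalize -> Output B.
raw_records = [
    {"name": " Jane Doe ", "amount": "100.50"},
    {"name": "JOHN SMITH", "amount": "not a number"},  # dirty data
    {"name": "ann lee", "amount": "42"},
]

def validate(record):
    """Keep only records whose amount parses as a number."""
    try:
        float(record["amount"])
        return True
    except ValueError:
        return False

def normalize(record):
    """Trim and standardize fields so downstream steps see consistent data."""
    return {
        "name": record["name"].strip().title(),
        "amount": float(record["amount"]),
    }

pipeline_output = [normalize(r) for r in raw_records if validate(r)]
print(pipeline_output)  # two clean records; the polluted one never reaches the lake
```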

My view is that this write up is trying to communicate the fact that Big Data is not too helpful if one does not take care of the planning before clogging a storage subsystem with digital information.

Seems obvious, but I suppose that’s why we have Love Canals and an ever efficient Environmental Protection Agency to clean up shortcuts.

Stephen E Arnold, July 4, 2016

Bad News for Instant Analytics Sharpies

June 28, 2016

I read “Leading Statisticians Establish Steps to Convey Statistics a Science Not Toolbox.” I think “steps” are helpful. The challenge will be to corral the escaped ponies who are making fancy analytics a point and click, drop down punch list. Who needs to understand anything? Hit the button and generate visualizations until something looks really super. Does anyone know a general who engages in analytic one-upmanship? Content and clarity sit in the backseat of the JLTV.

The write up is similar to teens who convince their less well liked “pals” to go on a snipe hunt. I noted this passage:

To this point, Meng [real statistics person] notes “sound statistical practices require a bit of science, engineering, and arts, and hence some general guidelines for helping practitioners to develop statistical insights and acumen are in order. No rules, simple or not, can be 100% applicable or foolproof, but that’s the very essence that I find this is a useful exercise. It reminds practitioners that good statistical practices require far more than running software or an algorithm.”

Many vendors emphasize how easy smart analytics systems are to use. The outputs are presentation ready. Checks and balances are mostly pushed to the margins of the interface.

Here are the 10 rules; a small sketch illustrating two of them appears after the list:

  1. Statistical Methods Should Enable Data to Answer Scientific Questions
  2. Signals Always Come with Noise
  3. Plan Ahead, Really Ahead
  4. Worry about Data Quality
  5. Statistical Analysis Is More Than a Set of Computations
  6. Keep it Simple
  7. Provide Assessments of Variability
  8. Check Your Assumptions
  9. When Possible, Replicate!
  10. Make Your Analysis Reproducible
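
As a small, hypothetical illustration of rules 7 and 10 (report variability, make the analysis reproducible), consider the Python sketch below; the data are simulated, and nothing here comes from the statisticians’ paper.

```python
import random
import statistics

random.seed(42)  # Rule 10: fix the seed so the analysis can be reproduced

# Simulated measurements; in real work, rule 4 says worry about data quality first.
data = [random.gauss(100, 15) for _ in range(200)]

# Rule 7: report variability, not just a point estimate, via a crude bootstrap.
boot_means = sorted(
    statistics.mean(random.choices(data, k=len(data))) for _ in range(1000)
)
low, high = boot_means[25], boot_means[975]  # rough 95% interval
print(f"mean = {statistics.mean(data):.1f}")
print(f"approximate 95% bootstrap interval = ({low:.1f}, {high:.1f})")
```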

I think I can hear the guffaws from the analytics vendors now. I have tears in my eyes when I think about “statistical methods should enable data to answer scientific questions.” I could have sold that line to Jack Benny if he were still alive and doing comedy. Scientific questions from data which no human has checked for validity. Oh, my goodness. Then reproducibility. That’s a good one too.

Stephen E Arnold, June 28, 2016

Forbes, News Coverage, and Google Love

June 24, 2016

Short honk: US news coverage has “faves.” I assume that the capitalist tool avoids bias in its admirable reporting about business.

Navigate to “Television As Data: Mapping 6 Years of American Television News.” The write up uses Big Data from television news to reveal what gets air time. When I read the article, I must admit I thought about the phrase “If it bleeds, it leads.”

The bottom line is not that countries and cities are used to characterize an event. For me the most interesting comment was the thanks bestowed on Google for assisting with the analysis.

I circled twice in honest blue this statement:

In the end, these maps suggest that the bigger story that is being missed in all the conversation about media fragmentation and bias is that media has always been biased geographically, culturally and linguistically.

Note the “all” and the “always.” Nifty generalizations from an analysis of six years of data.

Biased coverage? I cannot conceive of biased coverage. Film at 11.

Stephen E Arnold, June 24, 2016

Data Wrangling Market Is Self-Aware and Growing, Study Finds

June 20, 2016

The article titled “Self-Service Data Prep is the Next Big Thing for BI” on Datanami digs into the quickly growing data preparation industry by reviewing the Dresner Advisory Services study. The article provides a list of the major insights from the study and paints a vivid picture of the current circumstances. Most companies perform end-user data preparation, but only a small percentage (12%) consider themselves proficient in the area. The article states,

“Data preparation is often challenging, with many organizations lacking the technical resources to devote to comprehensive data preparation. Choosing the right self-service data preparation software is an important step…Usability features, such as the ability to build/execute data transformation scripts without requiring technical expertise or programming skills, were considered “critical” or “very important” features by over 60% of respondents. As big data becomes decentralized and integrated into multiple facets of an organization, users of all abilities need to be able to wrangle data themselves.”

90% of respondents agreed on the importance of two key features: the capacity to aggregate and group data, and a straightforward interface for imposing structure on raw data. Trifacta earned the top vendor ranking among just under 30 options for the second year in a row. The article concludes by suggesting that many users are already aware that data preparation is not an independent activity, and that data prep software must be integrated with other resources for success.
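
As a rough illustration of the two features respondents ranked highest, grouping/aggregation and imposing structure on raw data, here is a minimal Python sketch using pandas; the columns and figures are invented and have nothing to do with any vendor mentioned in the study.

```python
import io
import pandas as pd

# Hypothetical raw export with inconsistent casing and stray whitespace,
# the sort of "structure on raw data" problem respondents flagged.
raw = io.StringIO(
    "region,product,revenue\n"
    " east ,Widget,100\n"
    "EAST,widget,150\n"
    "west, Gadget ,200\n"
)
df = pd.read_csv(raw)

# Impose structure: trim whitespace and normalize casing.
for col in ("region", "product"):
    df[col] = df[col].str.strip().str.lower()

# Aggregate and group: total revenue per region and product.
summary = df.groupby(["region", "product"], as_index=False)["revenue"].sum()
print(summary)
```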

Chelsea Kerwin, June 20, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

The Value of Data: The Odd Isolation of Little Items

June 17, 2016

I read “Determining the Economic Value of Data.” The author is a chief technology officer, a dean of Big Data, and apparently a college professor training folks to be MBAs. The idea is that data are intangible. How does one value an intangible when writing from the perspective of a “dean”?

The answer is to seize on some applications of Big Data which can be converted to measurable entities. Examples include boosting the number of bank products a household “holds,” reducing customer churn, and making folks happier. Happiness is a “good,” and one can measure it; for example, “How happy are you with the health care plan?”

One can then collect data, do some Excel fiddling, and output numbers. The comparative figures (one hopes) provide a handle upon which to hang “value.”
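
As a back-of-the-envelope example of how this sort of valuation usually proceeds, here is a small arithmetic sketch for the churn case; every number is hypothetical and none of it comes from the “dean’s” article.

```python
# Hypothetical valuation of a churn reduction attributed to a Big Data project.
customers = 100_000                  # customer base
annual_revenue_per_customer = 500.0  # dollars
baseline_churn = 0.12                # 12% of customers leave each year
improved_churn = 0.10                # 10% after the project

customers_retained = customers * (baseline_churn - improved_churn)
estimated_value = customers_retained * annual_revenue_per_customer
print(f"Customers retained: {customers_retained:,.0f}")    # 2,000
print(f"Estimated annual value: ${estimated_value:,.0f}")  # $1,000,000
```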

This is the standard approach used to train business wizards in MBA programs, based on my observations. We know the method works: just check out the performance of the US economy in the last quarter.

The problem I have with this isolationist approach is that it ignores the context of any perceived value. I don’t want to hobble through The Knowledge Value Revolution by Taichi Sakaiya. I would suggest that any analysis of value may want to acknowledge the approach taken by Sakaiya about four decades ago. One can find a copy of the book for one penny on good old Amazon. How’s that for knowledge value?

Old ideas are not exactly the fuel that fires the imaginations of some “deans” or MBAs. Research becomes the collection of whatever data one can actually locate. Forget about the accuracy of the data or the validity of analyses built on loosey goosey notions of “satisfaction.”

I would suggest that the “dean’s” approach is a bit wobbly. Consider Sakaiya, who seems less concerned with creating busy work and more concerned with coming to grips with why certain products and services command high prices and others are almost valueless.

I know that reading a book written in the 1980s is a drag. Perhaps it is better to ignore prescient thought and just go with whatever can be used to encourage the use of Excel and the conversion of numbers into nifty visualizations.

Stephen E Arnold, June 17, 2016

Enterprise Search Vendor Sinequa Partners with MapR

June 8, 2016

In the world of enterprise search and analytics, everyone wants in on the clients who have flocked to Hadoop for data storage. Virtual Strategy shared an article announcing “Sinequa Collaborates With MapR to Power Real-Time Big Data Search and Analytics on Hadoop.” Sinequa, a firm specializing in big data, has become certified with the MapR Converged Data Platform. The interoperation of Sinequa’s solutions with MapR will enable actionable information to be gleaned from data stored in Hadoop. We learned:

“By leveraging advanced natural language processing along with universal structured and unstructured data indexing, Sinequa’s platform enables customers to embark on ambitious Big Data projects, achieve critical in-depth content analytics and establish an extremely agile development environment for Search Based Applications (SBA). Global enterprises, including Airbus, AstraZeneca, Atos, Biogen, ENGIE, Total and Siemens have all trusted Sinequa for the guidance and collaboration to harness Big Data to find relevant insight to move business forward.”

Beyond all the enterprise search jargon in this article, the collaboration between Sinequa and MapR appears to offer an upgraded service to customers. As we all know at this point, unstructured data indexing is key to data intake. When it comes to output, however, the technological solutions that can support informed business decisions are the ones that will stand apart.

Megan Feil, June 8, 2016

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Search Vendor Identifies Big Data Failings

June 5, 2016

Talk about the pot calling the kettle a deep fryer? I read “Attivio Survey Exposes Disconnect Between Big Data Excitement and Organizations’ Ability to Execute.” On one hand, the idea that a buzzword does not, like Superman, transform into truth, justice, and the American way is understandable. On the other hand, the survey underscores one of the gaps in the marketing invasion force search vendors muster when selling information access as business intelligence.

The write up points out that Big Data is going like gangbusters. However:

64 percent of respondents said that process bottlenecks prevent large data sets from being accessed quickly and efficiently. This dissonance highlights a growing gulf between the desire to embrace Big Data and their ability to operationalize it.

With a sample size of 150, I am not sure how solid these results are, but the point is poignant. Doing “stuff” with data is great. But how is the “stuff” relevant to closing a sale?

Attivio, the apparent sponsor of the study, sees a glass that is more than half full, maybe overflowing. Three key findings from the study allegedly were:

  • Legacy systems are not up to the task of Big Data crunching. The fix? Not provided, but my hunch is that the “cloud” will be a dandy solution.
  • Finding folks who can actually “do” Big Data and provide useful operational outputs is a very difficult task. The fix? I assume one can hire an outfit like the study’s sponsor, but this is just a wild guess on my part.
  • Governance is an issue. The fix? If I were working at Booz, Allen, the answer would be obvious: Hire Booz, Allen to manage. If that’s not an option, well, floundering may work.

Net net: Search vendors need to find a source of sustainable revenue. Big Data is a possibility, but the market is not exactly confident about the payoff and how to use the outputs. The demos are often interesting.

Stephen E Arnold, June 5, 2016

Financial Institutes Finally Realize Big Data Is Important

May 30, 2016

One of the fears of automation is that human workers will be replaced and there will no longer be jobs for humans.  Blue-collar jobs are believed to be the first that will be automated, but bankers, financial advisors, and other workers in the financial industry also have cause to worry.  Algorithms might replace them, because apparently people are getting faster and better responses from automated bank “workers.”

Perhaps one of the reasons bankers and financial advisors are being replaced is their sudden understanding that, as ABA Banking Journal puts it, “Big Data And Predictive Analytics: A Big Deal, Indeed.”  One would think that the financial sector would be the first to embrace big data and analytics in order to keep the upper hand over the competition, earn more money, and maintain its relevancy in an ever-changing world.  The sector, however, has been slow to adapt, slower than retail, search, and insurance.

One of the main reasons the financial sector has been holding back is:

“There’s a host of reasons why banks have held back spending on analytics, including privacy concerns and the cost for systems and past merger integrations. Analytics also competes with other areas in tech spending; banks rank digital banking channel development and omnichannel delivery as greater technology priorities, according to Celent.”

After the above quote, the article makes a statement about how customers are moving to online banking rather than visiting branches, but it is a very insipid observation.  Big data and analytics offer banks the opportunity to invest in developing better relationships with their customers and even to offer more individualized services as a way to one-up the Silicon Valley competition.  Big data also helps financial institutions comply with banking laws and standards to avoid violations.

Banks do need to play catch up, but this is probably a lot of moan and groan for nothing.  The financial industry will adapt, especially when it is at risk of losing more money.  This will be the same for all industries: adapt or get left behind.  The further we move from the twentieth century and from generations not accustomed to digital environments, the more technology integration we will see.

Whitney Grace, May 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
