Big Data: The Crawfish Approach to Meaningful Information

March 21, 2017

Have you ever watched a crawfish (sometimes called a crawdad or a crayfish) get away from trouble? The freshwater crustaceans can go backwards. Members of the Astacidae can be found in parts of the south, so you will have to wander in a Georgia swamp to check out the creature’s behavior.

The point is that crawfish go backwards to protect themselves and achieve their tiny lobster-like goals. Big time consultants also crawfish in order to sell more work and provide “enhanced” insight into a thorny business or technical problem other consultants have created.

To see this in action, navigate to “The Conundrum of Big Data.” A super consultant explains that Big Data is not exactly the home run, silver bullet, or magic potion some lesser consultants said Big Data would be. I learned:

Despite two decades of intensive IT investment in data [mining] applications, recent studies show that companies continue to have trouble identifying metrics that can predict and explain performance results and/or improve operations. Data mining, the process of identifying patterns and structures in the data, has clear potential to identify prescriptions for success but its wide implementation fails systematically. Companies tend to deploy ‘unsupervised-learning’ algorithms in pursuit of predictive metrics, but this automated [black box] approach results in linking multiple low-information metrics in theories that turn out to be improbably complex.

Big surprise. For folks who are not trained in the nuts and bolts of data analysis and semi fancy math, Big Data is a giant vacuum cleaner for money. The cash has to pay for “experts,” plumbing, software, and more humans. The outputs are often fuzzy wuzzy probabilities which more “wizards” interpret. Think of a Greek religious authority looking at the ancient equivalent of road kill.

The write up cites the fizzle that was Google Flu Trends. Cough. Cough. But even that sneeze could be fixed with artificial intelligence. Yep, when smart humans make mistakes, send in smart software. That will work.

In my opinion, the highlight of the write up was this passage:

When it comes to data, size isn’t everything because big data on their own cannot just solve the problem of ‘insight’ (i.e. inferring what is going on). The true enablers are the data-scientists and statisticians who have been obsessed for more than two centuries to understand the world through data and what traps lie in wait during this exercise. In the world of analytics (AaaS), it is agility (using science, investigative skills, appropriate technology), trust (to solve the client’s real business problems and build collateral), and ‘know-how’ (to extract intelligence hidden in the data) that are the prime ‘assets’ for competing, not the size of the data. Big data are certainly here but big insights have yet to arrive.

Yes. More consulting is needed to make those payoffs arrive. But first, hire more advisers. What could possibly go wrong? Cough. Sneeze. One goes forwards with Big Data by going backwards for more analysis.

Stephen E Arnold, March 21, 2017

ScyllaDB Version 1.3 Available

March 8, 2017

According to Scylla, their latest release is currently the fastest NoSQL database. We learn about the update from SiliconAngle’s article, “ScyllaDB Revamps NoSQL Database in 1.3 Release.” To support their claim, the company points to a performance benchmark test executed by the Yahoo Cloud Serving Benchmark project. That group compared ScyllaDB to the open source Cassandra database, and found Scylla to be 4.6 times faster than a standard Cassandra cluster.

Writer Mike Wheatley elaborates on the product:

ScyllaDB’s biggest differentiator is that it’s compatible with the Apache Cassandra database APIs. As such, the creators claims that ScyllaDB can be used as a drop-in replacement for Cassandra itself, offering users the benefit of improved performance and scale that comes from the integration with a light key/value store.

The company says the new release is geared towards development teams that have struggled with Big Data projects, and claims a number of performance advantages over more traditional development approach, including:

*10X throughput of baseline Cassandra – more than 1,000,000 CQL operations per second per node

*Sub 1msec 99% latency

*10X per-node storage capacity over Cassandra

*Self-tuning database: zero configuration needed to max out hardware

*Unparalleled high availability, native multi-datacenter awareness

*Drop-in replacement for Cassandra – no additional scripts or code required

Wheatley cites Scylla’s CTO when he points to better integration with graph databases and improved support for Thrift, Date Tiered Compaction Strategy, Large Partitions, Docker, and CQL tracing. I notice the company is hiring as of this writing. Don’t let the Tel Aviv location of Scylla’s headquarters stop you from applying if you don’t happen to live nearby; they note that their developers can work from anywhere in the world.

Cynthia Murrell, March 8, 2017

New Technologies Meet Resistance in Business

March 3, 2017

Trying to sell a state of the art, next-gen search and content processing system can be tough. In the article, “Most Companies Slow to Adopt New Business Tech Even When It Can Help,” Digital Trends demonstrates that a reluctance to invest in something new is not confined to Search. Writer Bruce Brown cites the Trends vs. Technologies 2016 report (PDF) from Capita Technology Solutions and Cisco. The survey polled 125 ICT [Information and Communications Tech] decision-makers working in insurance, manufacturing, finance, and the legal industry. More in-depth interviews were conducted with a dozen of these folks, spread evenly across those fields.

Most higher-ups acknowledge the importance of keeping on top of, and investing in, worthy technological developments. However, that awareness does not inform purchasing and implementation decisions as one might expect. Brown specifies:

The survey broke down tech trends into nine areas, asking the surveyed execs if the trends were relevant to their business, if they were being implemented within their industry, and more specifically if the specific technologies were being implemented within their own businesses. Regarding big data, for example, 90 percent said it was relevant to their business, 64 percent said it was being applied in their industry, but only 39 percent reported it being implemented in their own business. Artificial intelligence was ranked as relevant by 50 percent, applied in their industry by 25 percent, but implemented in their own companies by only 8 percent. The Internet of Things had 70 percent saying it is relevant, with 50 percent citing industry applications, but a mere 30 percent use it in their own business. The study analyzed why businesses were not implementing new technologies that they recognized could improve their bottom line. One of the most common roadblocks was a lack of skill in recognizing opportunities within organizations for the new technology. Other common issues were the perception of security risks, data governance concerns, and the inertia of legacy systems.

The survey also found the stain of mistrust, with 82 percent of respondents sure that much of what they hear about tech trends is pure hype. It is no surprise, then, that they hesitate to invest resources and impose change on their workers until they are convinced benefits will be worth the effort. Perhaps vendors would be wise to dispense with the hype and just lay out the facts as clearly as possible; potential customers are savvier than some seem to think.

Cynthia Murrell, March 3, 2017

 

Comprehensive, Intelligent Enterprise Search Is Already Here

February 28, 2017

The article on Sys-Con Media titled Delivering Comprehensive Intelligent Search examines the accomplishments of World Wide Technology (WWT) in building a better search engine for the business organization. The Enterprise Search Project Manager and Manager of Enterprise Content at WWT discovered that the average employee will waste over a full week each year looking for the information they need to do their work. The article details how they approached a solution for enterprise search:

We used the Gartner Magic Quadrants and started talks with all of the Magic Quadrant leaders. Then, through a down-selection process, we eventually landed on HPE… It wound up being that we went with the HPE IDOL tool, which has been one of the leaders in enterprise search, as well as big data analytics, for well over a decade now, because it has very extensible platform, something that you can really scale out and customize and build on top of.

Trying to replicate what Google delivers in an enterprise is a complicated task because of how siloed data is in the typical organization. The new search solution offers vast improvements in presenting employees with relevant information, and all of the relevant information, and it prevents major time waste through comprehensive and intelligent search.

Chelsea Kerwin, February 28, 2017

The Game-Changing Power of Visualization

February 8, 2017

Data visualization may be hitting at just the right time. Data Floq shared an article highlighting the latest, Data Visualisation Can Change How We Think About The World. As the article mentions, we are primed for it biologically: the human eye and brain process 10 to 12 separate images per second, comfortably. Considering the output, visualization provides the ability to rapidly incorporate new data sets, remove metadata and increase performance. Data visualization is not without challenge. The article explains:

Perhaps the biggest challenge for data visualisation is understanding how to abstract and represent abstraction without compromising one of the two in the process. This challenge is deep rooted in the inherent simplicity of descriptive visual tools, which significantly clashes with the inherent complexity that defines predictive analytics. For the moment, this is a major issue in communicating data; The Chartered Management Institute found that 86% of 2,000 financiers surveyed late 2013, were still struggling to turn volumes of data into valuable insights. There is a need, for people to understand what led to the visualisation, each stage of the process that led to its design. But, as we increasingly adopt more and more data this is becoming increasingly difficult.

Is data visualization changing how we think about the world, or is the existence of big data the culprit? We would argue data visualization is simply a tool to present data; it is a product rather than an impetus for a paradigm shift. This piece is right, however, in bringing attention to the conflict between detail and accessibility of information. We can’t help but think the meaning is likely in the balancing of both.

Megan Feil, February 8, 2017

Counter Measures to Money Laundering

January 30, 2017

Apparently, money laundering has become a very complicated endeavor, with tools like Bitcoin “washers” available via the Dark Web. Other methods include trading money for gaming or other virtual currencies and “carding.”  ZDNet discusses law enforcement’s efforts to keep up in, “How Machine Learning Can Stop Terrorists from Money Laundering.”

It will not surprise our readers to learn authorities are turning to machine learning to cope with new money laundering methods. Reporter Charlie Osborne cites the CEO of cybersecurity firm ThetaRay, Mark Gazit, when she writes:

By taking advantage of Big Data, machine learning systems can process and analyze vast streams of information in a fraction of the time it would take human operators. When you have millions of financial transactions taking place every day, ML provides a means for automated pattern detection and potentially a higher chance of discovering suspicious activity and blocking it quickly. Gazit believes that through 2017 and beyond, we will begin to rely more on information and analytics technologies which utilize machine learning to monitor transactions and report crime in real time, which is increasingly important if criminals are going to earn less from fraud, and terrorism groups may also feel the pinch as ML cracks down on money laundering.

Of course, criminals will not stop improving their money-laundering game, and authorities will continue to develop tools to thwart them. Just one facet of the cybersecurity arms race.

Cynthia Murrell, January 30, 2017

Big Data Is a Big Mess

January 18, 2017

Big Data and Cloud Computing were supposed to make it easier for the C-suite to make billion-dollar decisions. But it seems things have started to fall apart.

In an article published by Forbes titled The Data Warehouse Has Failed, Will Cloud Computing Die Next?, the author says:

A company that sells software tools designed to put intelligence controls into data warehousing environments says that traditional data warehousing approaches are flaky. Is this just a platform to spin WhereScape wares, or does Whitehead have a point?

WhereScape, a key player in data warehousing, is admitting that the buzzwords in the IT industry are fizzling out. Big Data is being generated in abundance, but companies are still unsure what to do with the enormous amount of data they produce.

Large corporations that have already invested heavily in Big Data have yet to see any ROI. As the author points out:

Data led organizations have no idea how good their data is. CEOs have no idea where the data they get actually comes from, who is responsible for it etc. yet they make multi million pound decisions based on it. Big data is making the situation worse not better.

It looks like, after 3D printing, Big Data and Cloud Computing are going to be just more fizzled-out buzzwords in the tech world.

Vishal Ingole, January 18, 2017

The Software Behind the Web Sites

January 17, 2017

Have you ever visited an awesome Web site or been curious how an organization manages their Web presence?  While we know the answer is some type of software, we usually are not given a specific name.  Venture Beat reports that it is possible to figure out the software in the article, “SimilarTech’s Profiler Tells You All Of The Technologies That Web Companies Are Using.”

SimilarTech is a tool designed to crawl the Internet to analyze what technologies, including software, Web site operators use.  SimilarTech is also used to detect which online payment tools are the most popular.  It does not come as a surprise that PayPal is the most widely used, with PayPal Subscribe and Alipay in second and third places.

Tracking what technology and software companies utilize for the Web is a boon for salespeople, recruiters, and business development professionals who want a competitive edge. The article explains:

Overall, SimilarTech provides big data insights about technology adoption and usage analytics for the entire internet, providing access to data that simply wasn’t available before. The insights are used by marketing and sales professionals for website profiling, lead generation, competitive analysis, and business intelligence.

SimilarTech can also locate contact information for personnel responsible for Web operations, in other words, new potential clients.

This tool is kind of like the mailing houses of the past. Mailing houses have data about people, places, organizations, etc. and can generate contact information lists of specific clientele for companies.  SimilarTech offers the contact information, but it does one better by finding the technologies people use for Web site operation.

Whitney Grace, January 17, 2017

The Disconnect: Big Data and Business Strategy

January 9, 2017

Imagine that: Big Data may not have a direct impact on business strategy.

I read “Why Big Data and Algorithms Won’t Improve Business Strategy.” I learned that Big Data learns by playing algorithmic chess. The “moves” can be converted to patterns. The problem is that no one knows what the game is.

The write up points out:

White’s control panel is just a shadow of the landscape and the sequence of presses lacks any positional information or consistent understanding of movement on the board. When faced with a player who does understand the environment then no amount of large scale data analysis on combinations of sequences of presses through the control panel or application of artificial intelligence or algorithms that is going to help you.

The idea is that a disconnect occurs.

Data does not equal strategy for the game of “real” chess.

The write up includes an analysis of a famous battle. An accurate map may be more useful than a situationally ignorant MBA analysis. Okay, I understand.

The write up points out:

In the game of Chess above, yes you can use large scale data analytics, AI and algorithms to discover new patterns in the sequences of presses and certainly this will help you against equally blind competitors. Such techniques will also help you in business improve your supply chain or understand user behavior or marketing or loyalty programs or operational performance or any number of areas in which we have some understanding of the environment.

The author adds:

But this won’t help you in strategy against the player with better situational awareness. Most business strategy itself operates in a near vacuum of situational awareness. For the vast majority then I’ve yet to see any real evidence to suggest that big data is going to improve this. There are a few and rare exceptions but in general, the key is first to understand the landscape and that a landscape exists.

The write up leaves me with an opportunity to hire the author. What’s clear is that content marketing and business strategy do connect. That’s reassuring. No analysis needed. No map either.

Stephen E Arnold, January 9, 2017

An Apologia for People. Big Data Are Just Peachy Keen

December 25, 2016

I read “Don’t Blame Big Data for Pollsters’ Failings.” The news about the polls predicting a victory for Hillary Clinton reached me in Harrod’s Creek five days after the election. Hey, Beyond Search is in rural Kentucky. It looks from the news reports and the New York Times’s odd letter about doing “real” journalism that the pundits predicted that the mare would win the US derby.

The write up explains that Big Data did not fail. The reason? The pollsters were not using Big Data. The sample sizes were about 1,000 people. Check your statistics book. In the back will be sample sizes for populations. If you have an older statistics book, you have to use a formula like

[Image: sample size formula]
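For readers without the old statistics book handy, here is a minimal Python sketch of one common textbook approach: Cochran's sample size formula for a proportion, with a finite population correction. It is an assumption that this is the kind of formula meant; the original image is not reproduced here. The function name and the default values (95 percent confidence via z = 1.96, a 3 percent margin of error, and p = 0.5) are illustrative choices, not anything taken from the write up.

```python
import math


def sample_size(population, margin=0.03, z=1.96, p=0.5):
    """Estimate the sample size needed to measure a proportion.

    Hypothetical helper for illustration only.
    population: size of the group being polled
    margin: acceptable margin of error (0.03 means plus or minus 3 points)
    z: z-score for the confidence level (1.96 is roughly 95 percent)
    p: assumed proportion; 0.5 is the most conservative choice
    """
    # Cochran's formula for an effectively infinite population
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction shrinks the sample for small populations
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Assumed population figures, chosen only to show the behavior
    for population in (1_000, 100_000, 130_000_000):
        print(population, sample_size(population))
```

Under these assumptions, the required sample for a very large population lands at a bit over 1,000 people, which is consistent with the poll sample sizes mentioned above. Note that the formula says nothing about whether the people sampled are the right people.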

Big Data doesn’t fool around with formulas. Big Data just uses “big data.” Is the idea that the bigger the data, the better the output?

The write up states that the problem was the sample itself: The actual humans.

The write up quotes a mid tier consultant from an outfit called Ovum which reminds me of eggs. I circled this statement:

“When you have data sets that are large enough, you can find signals for just about anything,” says Tony Baer, a big data analyst at Ovum. “So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets.”

The write up tosses in social media. Facebook takes the position that its information had minimal effect on the election. Nifty assertion that.

The solution is, as I understand the write up, to use a more real time system, different types of data, and math. The conclusion is:

With significant economic consequences attached to political outcomes, it is clear that those companies with sufficient depth of real-time behavioral data will likely increase in value.

My view is that hope and other distinctly human behaviors certainly threw an egg at reality. It is great to know that there is a fix and that Big Data emerge as the path forward. More work ahead for the consultants who often determine sample sizes by looking at Web sites like SurveySystem and get their sample from lists of contributors, a 20 something’s mobile phone contact list, or lists available from friends.

If you use Big Data, tap into real time streams of information, and do the social media mining—you will be able to predict the future. Sounds logical? Now about that next Kentucky Derby winner? Happy or unhappy holiday?

Stephen E Arnold, December 25, 2016
