NLP and Smart Software: Everyone Becomes a Big Data Expert
December 28, 2015
A new company seeks to make everyone a big data expert. You can get the full scoop in “Detecting Consumer Decisions within Messy Data: Software Analyzes Online Chatter to Predict Health Care Consumers’ Behavior.” The company with the natural language technology and proprietary smart software is dMetrics.
Here’s the premise:
DecisionEngine, Nemirovsky [dMetrics wizard] says, better derives meaning from text because the software — which now consists of around 2 million lines of code — is consistently trained to recognize various words and synonyms, and to interpret syntax and semantics. “Online text is incredibly tough to analyze properly,” he says. “There’s slang, misspellings, run-on sentences, and crazy punctuation. Discussion is messy.”
Now, how does the system work?
Visualize the software as a three-tiered funnel, Nemirovsky suggests, with more refined analysis happening as the funnel gets narrower. At the top of the funnel, the software mines all mentions of a particular word or phrase associated with a certain health care product, while filtering out “noise” such as fake websites and users, or spam. The next level down involves separating out commenters’ personal experiences over, say, marketing materials and news. The bottom level determines people’s decisions and responses, such as starting to use a product — or even considering doing so, experiencing fear or confusion, or switching to a different medication.
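The funnel Nemirovsky describes can be pictured as three successive filters, each narrowing what the previous one passed along. The Python below is purely a toy illustration of that shape, not dMetrics' DecisionEngine, which the article says relies on roughly 2 million lines of trained code. The `Post` fields, the hypothetical product name "drugx", and the keyword lists are invented stand-ins for trained models.

```python
from dataclasses import dataclass

@dataclass
class Post:
    text: str
    source: str  # hypothetical field, e.g. "forum", "news", "spam_site"

# Hypothetical keyword lists standing in for trained language models.
PRODUCT_TERMS = {"drugx"}
NOISE_SOURCES = {"spam_site", "fake_account"}
FIRST_PERSON = ("i ", "i'm", "my ")
DECISION_CUES = {
    "started": "start",
    "thinking about": "considering",
    "scared": "fear",
    "switched": "switch",
}

def tier1_mentions(posts):
    """Top of the funnel: keep posts that mention the product, drop noise."""
    return [p for p in posts
            if p.source not in NOISE_SOURCES
            and any(t in p.text.lower() for t in PRODUCT_TERMS)]

def tier2_personal(posts):
    """Middle: keep first-person experiences, drop news and marketing."""
    return [p for p in posts
            if p.source != "news"
            and p.text.lower().startswith(FIRST_PERSON)]

def tier3_decisions(posts):
    """Bottom: label the decision or response each post expresses."""
    labeled = []
    for p in posts:
        text = p.text.lower()
        for cue, label in DECISION_CUES.items():
            if cue in text:
                labeled.append((p.text, label))
                break  # first matching cue wins
    return labeled

def funnel(posts):
    """Run all three tiers in order, narrowing at each step."""
    return tier3_decisions(tier2_personal(tier1_mentions(posts)))
```

Fed a news item, a spam ad, and three first-person forum posts about "drugx," the sketch drops the first two and labels the rest as "start," "fear," or "switch." Real systems replace each keyword test with a trained classifier, which is where the slang, misspellings, and crazy punctuation make life hard.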
The company wants to expand beyond health care. Worth monitoring.
Stephen E Arnold, December 28, 2015
Star Wars and Its Big Data Lessons
December 24, 2015
I thought the Big Data hype had peaked. Wrong. Navigate to “6 Lessons ‘Star Wars’ Can Teach Us About Big Data.” The insight upon which the article is based is that Star Wars is not designed as a multi-billion-dollar commercial vehicle for Disney. Nope. Star Wars is a really nifty way to learn six lessons about Big Data. My mind is firing synapses. Amazing. Brilliant. Yes, go to a two-hour movie designed to entertain the folks in love with the confection of light sabers, nifty space ships, and assorted folks who are very similar to those who reside in Harrod’s Creek, Kentucky.
What are the six learnings, which I assume were crafted in strict adherence to the rules of peer-reviewed, tenure-track journal article crafting? Here we go:
- Recognize the full value of data. For example, look at the many killed via nifty weapons and conclude, avoid the weapons.
- Seek context to better understand your data. Okay, don’t be where a nifty weapon will strike. Got it.
- Make precise measurements and calculations. I would just ask a rolling robot, however.
- Plan what one is tracking or measuring. Wow.
- Hire smart analysts. I would add, “who have a love for Star Wars.”
- Use data to inform decision making. Yep, that works in films too.
Remarkable. It is time to quaff another mug of bourbon-infused eggnog and tackle another Big Data topic.
Stephen E Arnold, December 24, 2015
New Year’s Resolutions in Personal Data Security
December 22, 2015
The article on ITProPortal titled What Did We Learn in Records Management in 2015 and What Lies Ahead for 2016? delves into the unlearnt lessons in data security. The article begins with a look back over major data breaches, including those at Ashley Madison, JP Morgan, and VTech, and draws from them a trend: hackers are targeting personal information. The article reports,
“A Crown Records Management Survey earlier in 2015 revealed two-thirds of people interviewed – all of them IT decision makers at UK companies with more than 200 employees – admitted losing important data… human error is continuing to put that information at risk as businesses fail to protect it properly…but there is legislation on the horizon that could prompt change – and a greater public awareness of data protection issues could also drive the agenda.”
The article also makes a few predictions about upcoming developments in our approach to data protection. Among them is the passage of the European Union General Data Protection Regulation (EU GDPR) and its resulting effect on businesses. In terms of apps, the article suggests that more people might start asking questions about the information required to use certain apps (especially when the data requested is completely irrelevant to the app’s functions). These generally optimistic developments will only occur if people, businesses, and governments take data breaches and privacy more seriously.
Chelsea Kerwin, December 22, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Watson Weekly: More on the Weather Channel
December 21, 2015
IBM owns the Weather Company data, not the TV show. The write up “The Future of Cognitive Computing Is Now: Advanced Analytics Is Already Saving Lives and Driving Efficiencies at Airbus and The Weather Company” is an interesting IBM Watson content marketing effort.
When IBM bought the flagging Weather Company, I was curious how the tie up would fly. This article explains that the duo are performing some pretty serious good deeds; for example, saving lives. The argument is interesting to those who want airlines to be more efficient in ways beyond treating customers like cattle and feeding them meals which are very similar to what is in Old Yeller dog food.
The airline angle is explained this way:
The Weather Company utilizes the connected sensors of aircraft from over 200 airlines (which adds up to a combined total of over 50,000 flights a day) to measure atmospheric pressure and wind speed. Drones and even smart phones are used to take measurements closer to the ground, while satellites collect data from high above the globe. This amounts to a vast amount of data, Kenny explains.
The data help pilots avoid turbulence.
Also the write up explains:
Harnessing the power of the IoT and cognitive computing isn’t just about saving lives, important as that is. That data is analyzed to provide better customer services, something that Laurent Martinez aims to do at Airbus by improving digital operations in two ways. “First is what I call gate-to-gate operations… There’s also a second, productivity-boosting goal that comes with connected aircraft: allowing passengers to use the internet while on flights, something which would be especially useful for long haul.
My take on this deal with IBM is that the Watson tendrils are being hydroponically force fed into business niches.
Will these tendrils take root and flourish? Will these tendrils die and be absorbed into the datasphere? IBM is betting big money that from a tiny acorn, giant Watson revenues will grow.
Great idea. We have to wait to find out what Mother Nature does. I would not try to fool Mother Nature or stakeholders when revenue is involved.
Stephen E Arnold, December 21, 2015
Internet Sovereignty, Apathy, and the Cloud
December 21, 2015
The OSNews post titled Dark Clouds Over the Internet presents an argument that boils down to a choice between international accords and data-sharing agreements, or the risk of the Internet being broken up into national networks. Some very worked up commenters engaged in an interesting discussion that spanned government overreach, democracy, data security, privacy, and, for some reason, climate change. One person summarized their opinion thusly:
“Best policy: don’t store data with someone else. There is no cloud. It’s just someone else’s computer.”
In response, a user named Alfman replied that companies are to blame for the current lack of data security or, more precisely, that people are generally to blame for allowing this state of affairs to exist:
“The privacy issues we’re now seeing are a direct consequence of corporate business models pushing our data into their central silos. None of this is surprising except perhaps how willing users have been to forgo their own privacy. Collectively, it seems that we are very willing to give up our rights for very little in exchange… makes it difficult to achieve critical mass around technologies promoting data independence.”
It is hard to argue with the apathy factor, with data breaches occurring regularly and so little being done by individuals to protect themselves. Good thing these commenters have figured it all out. Next up, solving climate change.
Chelsea Kerwin, December 21, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Topology Is Finally on Top
December 21, 2015
Topology’s time has finally come, according to “The Unreasonable Usefulness of Imagining You Live in a Rubbery World,” shared by 3 Quarks Daily. The engaging article reminds us that the field of topology emphasizes connections over geometric factors like distance and direction. Think of a subway map as compared to a street map; or, as writer Jonathan Kujawa describes:
“Topologists ask a question which at first sounds ridiculous: ‘What can you say about the shape of an object if you have no concern for lengths, angles, areas, or volumes?’ They imagine a world where everything is made of silly putty. You can bend, stretch, and distort objects as much as you like. What is forbidden is cutting and gluing. Otherwise pretty much anything goes.”
Since the beginning, this perspective has been dismissed by many as purely academic. However, today’s era of networks and big data has boosted the field’s usefulness. The article observes:
“A remarkable new application of topology has emerged in the last few years. Gunnar Carlsson is a mathematician at Stanford who uses topology to extract meaningful information from large data sets. He and others invented a new field of mathematics called Topological data analysis. They use the tools of topology to wrangle huge data sets. In addition to the networks mentioned above, Big Data has given us Brobdinagian sized data sets in which, for example, we would like to be able to identify clusters. We might be able to visually identify clusters if the data points depend on only one or two variables so that they can be drawn in two or three dimensions.”
Kujawa goes on to note that one century-old tool of topology, homology, is being used to analyze real-world data, like the ways diabetes patients have responded to a specific medication. See the well-illustrated article for further discussion.
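For a taste of how homology turns into a clustering tool, consider its simplest piece, the 0th Betti number, which counts connected components. Join any two data points at most epsilon apart; the number of components of the resulting graph is a crude cluster count that depends only on connections, not on exact distances or directions. The Python below is a toy example of ours using a union-find structure; it is not Carlsson's software or any library's API, and epsilon is an assumed parameter.

```python
import math

def betti0(points, epsilon):
    """Count connected components of the graph joining points at most epsilon apart."""
    parent = list(range(len(points)))

    def find(i):
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj

    # Brute-force pairwise check; fine for a toy, too slow for Big Data.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= epsilon:
                union(i, j)

    return len({find(i) for i in range(len(points))})
```

With two tight blobs of three points each, roughly 14 units apart, a tiny epsilon sees six separate points, a moderate one sees two clusters, and a huge one sees a single blob. Recording how the count changes as epsilon grows is the basic idea behind 0-dimensional persistent homology.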
Cynthia Murrell, December 21, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Watson Is Laying Startup Eggs
December 21, 2015
Incubators are warming stations for eggs. Without relying on a parent organism, an incubator provides a warm, safe environment for the organism to develop, hatch, and eventually be ready to face the world. Watson has decided it is time to propagate, but instead of knitting tiny computer cases, Watson will invest its digital DNA in startups. The Chicago Tribune discusses Watson’s reproduction efforts and progeny in “Watson, IBM’s Big-Data Program Is Also A Startup Incubator.”
While IBM sells Watson’s ability to scan and understand terabytes of data, the company also welcomes developers to use Watson for new ideas. What is even more amazing is that IBM gives developers the ability to use Watson for free for a limited time.
“In Ecosystem, everyone is invited to play with Watson for free (for a limited time); some 77,000 developers have accepted. If your Watson-powered startup shows promise, it becomes a “partner,” often via a quasi-incubator model, and enjoys access to IBM business and technology advisers–and a shot at a capital infusion from the $100 million IBM is making available to Watson startups…”
Ecosystem has been used for startups that feature lifestyle coaching, personal shopping, infrastructure guards, veterinary advice, a fantasy sports calculator, 311 information, and even a hotel butler.
To quote the biblical justification for propagation: “Go forth and multiply the [Watson startups].”
Whitney Grace, December 21, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Two AI Paths Pondered by Teradata
December 20, 2015
I read the content marketing write up by Karthik Guruswamy. I like the “guru” part of the expert’s name. I am stuck with the “old” part of my name.
The write up is called “Data Science: Machine Learning Vs. Rules Based Systems.” I know a little bit about both of these methods, and I know a teeny tiny bit about Teradata, an outstanding data warehouse solution chugging along with its stock in the high $20s per share. The Google Finance chart suggests, to my unlearned eye, that the company has some challenges with net income and profit margin.
Looks like some content marketing oomph is needed to move that top line number.
I learned in the write up:
Rules based systems will work effectively if all the situations, under which decisions can be made, are known ahead of time.
Okay. Insight. Know everything ahead of time and one can write rules to cover the situation. Is this expensive? Is this a never ending job? Consultants sure hope so.
There is an alternative:
Enter Machine Learning or ML! If we classify the data into good vs. bad data sets or categorize them into different labels like A, B, C, D etc., the Machine Learning algorithms can help build rules on the fly. This step is called training which results in a model. During operationalization, this model is used by the prediction algorithm to classify the incoming data in the right way which in turn leads to sound decision making.
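To make the contrast concrete, here is a toy sketch of ours, not Teradata's method. The rule-based function encodes a threshold someone chose ahead of time. The "machine learning" version builds its rule from labeled examples (training) and then applies it to new data (operationalization), using a nearest-centroid classifier, one of the simplest supervised methods. The amounts, the labels, and the 500 threshold are invented for illustration.

```python
from statistics import mean

def rule_based(amount):
    # A human wrote this rule ahead of time; it covers only what was anticipated.
    return "bad" if amount > 500 else "good"

def train_nearest_centroid(examples):
    """'Training': compute the mean value for each label, producing the model."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def predict(model, value):
    """'Operationalization': assign the label whose centroid is closest."""
    return min(model, key=lambda label: abs(model[label] - value))
```

Trained on good amounts of 20, 35, and 50 and bad amounts of 900 and 1,200, the model's centroids are 35 and 1,050, so its decision boundary falls at 542.5 rather than the hand-picked 500. An amount of 520 is "bad" under the rule and "good" under the model. If the incoming data drift, the centroids go stale, and someone has to label a fresh training set.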
I recall that Autonomy used this approach for its system. Those familiar with Autonomy have some experience with retraining, Bayesian drift, and other exciting facets of machine learning based systems. Consultants love to build new training sets.
The write up asserts:
With Machine Learning, one can iteratively achieve good results by cleansing & prepping the data, changing or combining algorithms or merely tweaking the algorithm parameters. This is becoming much easier thanks to the increased awareness and the availability of different types of data science tools in the market today.
High five.
My view is that the write up left out some information. But there is one omission which warrants a special comment.
Neither of these systems works without human intervention.
Bummer. Reality is sort of a drag, but maybe that’s why Teradata is wrestling with revenue and net profit alligators. Consultants, on the other hand, can bill to enhance either approach.
What about the customer? Well, in my experience, some customers of brand name data warehouse systems struggle to get data into and out of these whiz-bang systems. Regardless of the craziness involved with Hadoop and Spark, these open source approaches may make more sense than pumping six or seven figures into a proprietary system.
Consultants can still bill, of course. That’s one upside of any approach one wishes to embrace.
Stephen E Arnold, December 20, 2015
Mid Tier Consultant Sees Ripples of Opportunity in Data Lakes
December 16, 2015
I love predictions from mid tier consultants. One can spot what these folks will be pitching to their customers. One can also see the buzzwords likely to replace plain talk in their reports.
A good example of this type of forecasting (which, if it worked, would be used to pick horse race winners, not technologies) appears in “Big Data’s Future According to Ovum.” To spare extra wear and tear on your rapidly beating heart, here is the gist: SQL data management will remain popular, but nothing will capture the excitement of Spark on Hadoop, or is it Hadoop on Spark? Oh, well.
Here’s the passage I found as chilling as a dip in the lake near my shack in rural Kentucky:
Ovum’s other big prediction for 2016 is for data lake adoption to become a “front-burner issue” for mature Hadoop adopters that have already successfully put analytics into production serving multiple lines of business and stakeholder groups across the organization. The result will be a new demand for tools to govern the data lake and make it more transparent. Ovum expects significant growth in tooling that builds on emerging data lineage capabilities to catalogue, protect, govern access, tier storage, and manage the lifecycle of data stored in data lakes.
The word for 2016 will involve govern as in “governance.” The idea is that once folks dump stuff in the lake, a digital and procedural mechanism will be needed to figure out exactly what’s in the lake.
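What governing the lake might look like at its most basic: a catalog that records each dataset's owner, storage tier, and upstream sources, so anyone can trace where a number came from. The Python below is a hypothetical minimal model; the class names, fields, tiers, and datasets are ours and do not reflect Ovum's report or any vendor's product.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    name: str
    owner: str
    tier: str  # hypothetical storage tiers, e.g. "hot" or "archive"
    sources: list = field(default_factory=list)  # names of upstream datasets

class Catalog:
    def __init__(self):
        self.entries = {}

    def register(self, entry):
        # Refuse entries with unknown upstream datasets so lineage stays complete.
        missing = [s for s in entry.sources if s not in self.entries]
        if missing:
            raise ValueError(f"unknown upstream datasets: {missing}")
        self.entries[entry.name] = entry

    def lineage(self, name):
        """Return, sorted, every upstream dataset that `name` was derived from."""
        seen = []
        stack = list(self.entries[name].sources)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.entries[current].sources)
        return sorted(seen)
```

Register raw click and order feeds, a sessions table built from the clicks, and a revenue report built from sessions and orders, and `lineage("revenue_report")` walks back to all three ancestors. The hard part in practice is not the data structure but getting every team to register what it dumps in the lake.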
Wow, mid tier consulting pitching the need for management. I wonder if the mid tier consulting firms are able to sell their clients management consulting services?
I think this means that these predictions and the attendant reports are essentially content marketing exercises. That’s okay, but writing about a problem is exactly the same as solving a problem. Right?
How did that work out when search was the topic of the moment?
Stephen E Arnold, December 16, 2015
Big Data and Enterprise Search: The Caution Lights Are Flashing
December 15, 2015
I read “How You Should Explain Big Data to Your CEO.” The write up included a link which triggered thoughts of how enterprise search dug itself a hole and climbed in. Unable to extricate itself from a problem enterprise search vendors created, the entire sector has been marginalized. In some circles, enterprise search is essentially a joke. “Did you hear about the three enterprise search vendors who walked into a bar?” The bartender says, “What is this? Some kind of joke?”
The link pointed me to a Slideshare (owned by the email and job hunting champion LinkedIn). That presentation, “5 Signs Your Big Data Project is Doomed to Fail,” could have been borrowed from one of my talks about enterprise search in 2001. It was not, but the basic message was identical: Big Data has created a situation in which there are some challenges here and now.
The presentation was prepared by Qubole (maybe pronounced cue ball?). Qubole is a click to query outfit. This means that reports from Big Data are easy to generate.
Here are the problems Big Data faces:
- Failed implementations. Qubole asserts that 87 percent of the Big Data implementations are flops
- 73 percent of executives describe their Big Data projects as flops
- 45 percent of Big Data projects are completed
These data are similar to the results of “satisfaction” with enterprise search solutions.
Why? Qubole asserts:
- Inaccurate project scope
- Inadequate management support
- No business case
- Lack of talent (in search the talent may be present but overestimates its ability to deal with enterprise search technologies and processes)
- “Challenging tools.” I think this means that in the Big Data world there are lots of complexities.
What can one charged with either search or Big Data tasks do with this information?
My view is, “Ignore it.”
The “can do” spirit carries professionals forward. Hiring a consultant provides some job protection but does little to reverse the failure and disappointment rate.
My view is that the willingness of executives to grab at a magic solution presented by a showman marketer overrides failure data. The arrogance of those involved creates a “that won’t happen to us” belief.
Who is to blame? The company for believing in baloney? The marketer for painting a picture and showing a Hollywood style demo? The developers who created the Big Data solution, knowing that chunks were not complete or just did not work before the ship date? The in-house engineers who lacked the self-knowledge to understand their own limitations?
Everyone is in the hole with the enterprise software vendors. The hole is deep. Magic solutions are difficult to pin down. The future of Big Data is likely to parallel to some degree the dismal track record of enterprise search. Fascinating. I can hear the mid tier consultants and the handful of remaining enterprise search vendors asserting that Qubole’s points are not applicable to their specific situation.
Yep, and I believe in the tooth fairy and Santa.
Stephen E Arnold, December 15, 2015