8 Complexity Analysis Underscores a Fallacy in the Value of Mindless Analysis of Big Data
February 8, 2021
First, I want to mention that in the last two days I have read essays which are like PowerPoint slide shows with animation and written text. Is this a new form of “writing”?
Now to the business of the essay and its mini movies: “What Is Complexity Science?” provides a rundown of the different types of complexity which academics, big thinkers, number nerds, and wizard-type people have identified.
If you are not familiar with the buzzwords and how each type of complexity generates behaviors which are tough to predict in real life, read the paper which is on Microsoft Github.
Here’s the list:
- Interactions or jujujajaki networks. Think of a graph of social networks evolving in real time.
- Emergence. Stuff just happens when other stuff interacts. Rioting crowds or social media memes.
- Dynamics. Think back to the pendulum your high school physics teacher tried to explain and got wrong.
- Forest fires. Visualize the LA wildfires.
- Adaptation. Remember your friend from college who went to prison. When he was released and hit the college reunion, he had not yet adjusted to life outside: Hunched, stood back to wall, put his left arm around his food, weird eye contact, etc.
The write up explains that figuring out what’s happening is difficult. Hence, mathematics. You know. Unreasonably effective at outputting useful results. (How about that 70 to 90 percent accuracy? Close enough for horseshoes? Except when the prediction is wrong. Who has heard, “Sorry about the downside of chemotherapy, Ms. Smith. Your treatment failed and our data suggest it works in most cases.”)
Three observations:
- Complexity is like thinking about and manipulating infinity. Georg Cantor illustrates what can happen when pondering the infinite.
- Predictive methods make a stab at making sense out of something which may be full of surprises. What’s important is not the 65 to 85 percent accuracy. The big deal is the 35 to 15 percent which remains — well — unpredictable due to complexity.
- Humans want certainty, acceptable risk, and possibly change on quite specific terms. Hope springs eternal for mathematicians who deliver information supporting this human need.
Complicated stuff, complexity. Math works until it doesn’t. But now we have a Ramanujan Machine which can generate conjectures. Simple, right?
Stephen E Arnold, February 8, 2021
Online Privacy Just Got a Lot Less Private
December 15, 2017
Forget for a moment political hacks from other countries and think about yourself. You are far more vulnerable online than you might think. A scary new report surfaced in a University of Washington News story, “For $1,000 Anyone Can Purchase Online Ads to Track Your Location and App Use.”
According to the story:
The researchers discovered that an individual ad purchaser can, under certain circumstances, see when a person visits a predetermined sensitive location — a suspected rendezvous spot for an affair, the office of a company that a venture capitalist might be interested in or a hospital where someone might be receiving treatment — within 10 minutes of that person’s arrival. They were also able to track a person’s movements across the city during a morning commute by serving location-based ads to the target’s phone.
Importantly, the target does not have to click on or engage with the ad — the purchaser can see where ads are being served and use that information to track the target through space. In the team’s experiments, they were able to pinpoint a person’s location within about 8 meters.
The scariest part of this story is that, while there are many techniques for hiding your online browsing and consumption, there is not much you can do to keep from being spied on by software like this. However, the ebb and flow of the internet tells us that as soon as this becomes a public concern, some programmer with dollar signs in their eyes will invent a solution. We just hope it’s not too late by then.
Patrick Roland, December 15, 2017
Hope for Improvement in Predictive Modeling
July 18, 2017
A fresh approach to predictive modeling may just improve the process exponentially. Phys.org reports, “Molecular Dynamics, Machine Learning Create ‘Hyper-Predictive’ Computer Models.” The insight arose, and is being tested, at North Carolina State University.
The article begins by describing the incredibly complex and costly process of drug development, including computer models that predict the effects of certain chemical compounds. Such models traditionally rely on QSAR modeling and molecular docking. We learn:
Denis Fourches, assistant professor of computational chemistry, wanted to improve upon the accuracy of these QSAR models. … Fourches and Jeremy Ash, a graduate student in bioinformatics, decided to incorporate the results of molecular dynamics calculations – all-atom simulations of how a particular compound moves in the binding pocket of a protein – into prediction models based on machine learning. ‘Most models only use the two-dimensional structures of molecules,’ Fourches says. ‘But in reality, chemicals are complex three-dimensional objects that move, vibrate and have dynamic intermolecular interactions with the protein once docked in its binding site. You cannot see that if you just look at the 2-D or 3-D structure of a given molecule.’
See the article for some details about the team’s proof-of-concept study. Fourches asserts the breakthrough delivers, in a mere three hours, a simulation that previously would have taken six months to build. That is quite an improvement! If this technique pans out, we could soon see more rapid prediction not only in pharmaceuticals but in many other areas as well. Stay tuned.
Cynthia Murrell, July 18, 2017
Women in Tech Want Your Opinion on Feminism and Other Falsehoods Programmers Believe
July 14, 2017
The collection of articles on Github titled Awesome Falsehood dives into some of the strange myths and errors believed by tech gnomes and the issues that they can create. For starters, falsehoods about names. Perhaps you have encountered the tragic story of Mr. Null, who encounters a dilemma whenever inputting his last name in a web form because it often will be rejected or even crash the system.
The article explains,
This has all gotten to the point where I’ve developed a number of workarounds for times when this happens. Turning my last name into a combination of my middle name and last name, or middle initial and last name, sometimes works, but only if the website doesn’t choke on multi-word last names. My usual trick is to simply add a period to my name: “Null.” This not only gets around many “null” error blocks, it also adds a sense of finality to my birthright.
Another list expands on the falsehoods about names that programmers seem to buy into. These include cultural cluelessness about people having first names and last names that never change and are all different. Along those lines, one awesome female programmer wrote a list of falsehoods about women in tech, such as their existence revolving around a desire for a boyfriend or to complete web design tasks. (Also, mansplaining is their absolute favorite, did you know?) Another article explores falsehoods about geography, such as the mistaken notion that all places only have one official name, or even one official name per language, or one official address. While the lists may reinforce some negative stereotypes we have about programmers, they also expose the core issues that programmers must resolve to be successful and effective in their jobs.
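The Mr. Null problem is easy to reproduce. What follows is a minimal sketch, not code from the Awesome Falsehood lists: the function names naive_is_missing and is_missing are hypothetical, and the assumption is that a web form decides whether a last name was supplied by comparing it with placeholder strings. The first check rejects the real surname “Null”; the second treats only an absent or blank value as missing.

```python
from typing import Optional


def naive_is_missing(last_name: Optional[str]) -> bool:
    # Hypothetical buggy check: it confuses the literal surname "Null"
    # with a value that was never supplied.
    return last_name is None or last_name.strip().lower() in {"", "null", "none"}


def is_missing(last_name: Optional[str]) -> bool:
    # Hypothetical safer check: only a truly absent or blank value is missing.
    return last_name is None or last_name.strip() == ""


# Compare both checks on a real surname, the "Null." workaround,
# an empty field, and a field that was never filled in.
for candidate in ["Null", "Null.", "", None]:
    print(repr(candidate), naive_is_missing(candidate), is_missing(candidate))
```

The sketch also shows why the trailing-period workaround gets through the naive check: “Null.” no longer matches the placeholder string.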
Chelsea Kerwin, July 14, 2017
Algorithms Are Getting Smarter at Identifying Human Behavior
June 19, 2017
Algorithms deployed by large tech firms are getting better at understanding human behaviors, reveals a former Google data scientist.
In an article published by Business Insider titled A Former Google Data Scientist Explains Why Netflix Knows You Better Than You Know Yourself, Seth Stephens-Davidowitz says:
Many gyms have learned to harness the power of people’s over-optimism. Specifically, he said, “they’ve figured out you can get people to buy monthly passes or annual passes, even though they’re not going to use the gym nearly enough to warrant this purchase.”
Companies like Netflix use this to their benefit. For instance, during its initial years, Netflix encouraged users to create playlists. However, most users ended up watching the same run-of-the-mill content. Netflix thus made changes and started recommending content that was similar to users’ viewing habits. It proves one thing: algorithms are getting smarter at understanding and predicting human behaviors, and that is both good and bad.
Vishal Ingole, June 19, 2017
Crowd Wisdom Adjusted to Measure Information Popularity
June 2, 2017
The article on ScienceDaily titled In Crowd Wisdom, the ‘Surprisingly Popular’ Answer Can Trump Ignorance of the Masses conveys the latest twist on crowd wisdom, or efforts to answer questions by asking many people rather than specialists. Unsurprisingly, crowd wisdom often is not very wise at all, but rather favors the most popular information. The article uses the example of asking various populations whether Philadelphia is the capital of Pennsylvania. Those who answered yes also believed that others would agree, making it a popular answer. The article goes on to explain,
Meanwhile, a certain number of respondents knew that the correct answer is “no.” But these people also anticipated that many other people would incorrectly think the capital is Philadelphia, so they also expected a very high percentage of “yes” answers. Thus, almost everyone expected other people to answer “yes,” but the actual percentage of people who did was significantly lower. “No” was the surprisingly popular answer because it exceeded expectations of what the answer would be.
By measuring the perceived popularity of a given answer, researchers saw errors reduced by over 20% compared to straightforward majority votes, and by almost 25% compared to confidence-weighted votes. As in the case of the Philadelphia question above, those who predicted that they were in the minority deserve the most attention because they had enough information to expect that many people would incorrectly vote yes. If you take away nothing else from this, let it be that Harrisburg, not Philly, is the capital of Pennsylvania.
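The selection rule described above can be sketched in a few lines. This is a simplified illustration for a single yes/no question, not the researchers’ own code, and the numbers are invented: 65 of 100 hypothetical respondents answer “yes” and predict that 90 percent of people will say “yes,” while 35 answer “no” and predict 70 percent “yes.” The function names are assumptions as well.

```python
from collections import Counter


def majority_vote(responses):
    # responses: list of (answer, predicted share of "yes" answers) pairs.
    counts = Counter(answer for answer, _ in responses)
    return counts.most_common(1)[0][0]


def surprisingly_popular(responses):
    # "yes" wins only if its actual share beats the share the crowd
    # predicted for it; otherwise "no" is the surprisingly popular answer.
    # (In this sketch an exact tie falls to "no".)
    n = len(responses)
    actual_yes = sum(1 for answer, _ in responses if answer == "yes") / n
    predicted_yes = sum(prediction for _, prediction in responses) / n
    return "yes" if actual_yes > predicted_yes else "no"


# Invented data for "Is Philadelphia the capital of Pennsylvania?"
responses = [("yes", 0.9)] * 65 + [("no", 0.7)] * 35

print("majority vote:", majority_vote(responses))
print("surprisingly popular:", surprisingly_popular(responses))
```

With these made-up figures the majority says “yes,” but “yes” draws fewer votes than the crowd predicted, so the rule selects “no.”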
Chelsea Kerwin, June 2, 2017
How Data Science Pervades
May 2, 2017
We think Information Management may be overstating a bit with the headline, “Data Science Underlies Everything the Enterprise Now Does.” While perhaps not underpinning quite “everything,” the use of data analysis has indeed spread throughout many companies (especially larger ones).
Writer Michael O’Connell cites a few key developments over the last year alone, including the rise of representative data, a wider adoption of predictive analysis, and the refinement of customer analytics. He predicts even more changes in the coming year, then uses a hypothetical telecom company for a series of examples. He concludes:
You’ll note that this model represents a significant broadening beyond traditional big data/analytics functions. Such task alignment and comprehensive integration of analytics functions into specific business operations enable high-value digital applications ranging far beyond our sample Telco’s churn mitigation — cross-selling, predictive and condition-based maintenance, fraud detection, price optimization, and logistics management are just a few areas where data science is making a huge difference to the bottom line.
See the article for more on the process of turning data into action, as illustrated with the tale of that imaginary telecom’s data-wrangling adventure.
Cynthia Murrell, May 2, 2017
IBM Uses Watson Analytics Freebie Academic Program to Lure in Student Data Scientists
May 6, 2016
The article on eWeek titled IBM Expands Watson Analytics Program, Creates Citizen Data Scientists zooms in on the expansion of the IBM Watson Analytics academic program, which was begun last year at 400 global universities. The next phase, according to Watson Analytics public sector manager Randy Messina, is to get Watson Analytics into the hands of students beyond computer science or technical courses. The article explains,
“Other examples of universities using Watson Analytics include the University of Connecticut, which is incorporating Watson Analytics into several of its MBA courses. Northwestern University is building Watson Analytics into the curriculum of its Predictive Analytics, Marketing Mix Models and Entertainment Marketing classes. And at the University of Memphis Fogelman College of Business and Economics, undergraduate students are using Watson Analytics as part of their initial introduction to business analytics.”
Urban planning, marketing, and health care disciplines have also ushered in Watson Analytics for classroom use. Great, so students and professors get to use and learn through this advanced and intuitive platform. But that is where it gets a little shady. IBM is also interested in winning over these students and leading them into the data analytics field. Nothing wrong with that given the shortage of data scientists, but considering the free program and the creepy language IBM uses like “capturing mindshare among young people,” one gets the urge to warn these students to run away from the strange Watson guy, or at least proceed with caution into his lair.
Chelsea Kerwin, May 6, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Open Source Boundaries
July 3, 2015
Now here is an interesting metaphor to explain how open source is sustainable. On OpenSource.com, Bryan Behrenshausen posted the article, “Making Collaboration Sustainable” that references the famous scene from Tom Sawyer, where the title character is forced to whitewash a fence by his Aunt Polly. He does not want to do it, but is able to persuade his friends that whitewashing is fun and has them pay him for the privilege.
Jim Whitehurst refers to it as the “Tom Sawyer” model, where organizations treat communities as gullible chumps who will work without proper compensation. It is a type of crowdsourcing, where the organizations benefit from the communities’ resources to further their own goals. Whitehurst continues that this is not a sustainable approach to crowdsourcing. It could even backfire at some point.
He goes on to say that open source requires a different mindset, one built on commitment from its contributors, in which everyone is equal and must be respected for their efforts.
“Treating internal and external communities as equals, really listening to and understanding their shared goals, and locating ways to genuinely enhance those goals—that’s the key to successfully open sourcing a project. Crowdsourcing takes what it can; it turns people and their ideas into a resource. Open sourcing reciprocates where it can; it channels people and their ideas into a productive community.”
The entire goal of open source is to work with a community that coalesces around shared beliefs and passions. Behrenshausen concludes that an organization might find itself totally changed by engaging with an open source community, and that the change could be for the better. Is that a good thing or a bad thing? It is, however, concerning for enterprise search solutions.
Whitney Grace, July 3, 2015
Kroll Ontrack Enjoys Predictive Coding Award
October 20, 2014
What happened to Recommind and ZyLAB? We thought they were eDiscovery frontrunners, but now BusinessWire tells us, “Kroll Ontrack Voted Best Predictive Coding Solution in New York Law Journal Survey.” The 2014 survey tallied votes in 90 categories from readers of the law journal ALM. The press release quotes Kroll Ontrack’s VP of product management, John Grancarich:
“We are honored to be chosen as the leading predictive coding technology in the industry by New York Law Journal readers. With a focus on amplifying the power of your best reviewers, this award demonstrates the impact ediscovery.com Review predictive coding technology has in driving increased speed, consistency and accuracy in document review.”
The strength of the predictive coding platform, we are told, comes from three parts that work together: workflow technology, “smart training” technology, and quality control/sampling technology. The write-up emphasizes:
“Given the innovative volume control mechanisms of ediscovery.com Review, the award-winning power of Kroll Ontrack’s predictive coding is available throughout the entire culling, filtering, early data assessment and review experience. For more information about Kroll Ontrack predictive coding technology, visit http://www.ediscovery.com/solutions/review/ or watch a demo at http://www.ediscovery.com/review-demo/.”
Headquartered in Eden Prairie, Minnesota, Kroll Ontrack launched as a software firm in 1985. The company’s work with damaged hard drives led to a focus on data recovery. Now, Kroll Ontrack supplies a wealth of data-related solutions to customers in the legal, corporate, and government arenas.
Cynthia Murrell, October 20, 2014
Sponsored by ArnoldIT.com, developer of Augmentext