December 5, 2016
Many know that law enforcement often turns to social media for clues, but you may not be aware of how far such efforts have gone. LittleSis, a group that maps and publishes relationships between the world’s most powerful entities, shares what it has learned about the field of social-media spying in “You Are Being Followed: The Business of Social Media Surveillance.”
LittleSis worked with MuckRock, a platform that shares a trove of original government documents online. The team identified eight companies now vending social-media-surveillance software to law enforcement agencies across the nation; see the article for the list, complete with links to more information on each company. Writer Aaron Cantú describes the project:
We not only dug into the corporate profiles of some of the companies police contract to snoop on your Tweets and Facebook rants, we also filed freedom of information requests to twenty police departments across the country to find out how, when, and why they monitor social media. …
One particularly well-connected firm that we believe is worth highlighting here is ZeroFOX, which actively monitored prominent Black Lives Matter protesters in Baltimore and labeled some of them, including former Baltimore mayoral candidate DeRay McKesson, ‘threat actors.’ The company reached out to Baltimore officials first, offering its services pro bono, which ZeroFOX executives painted as a selfless gesture of civic responsibility. But city officials may have been especially receptive to ZeroFOX’s pitch because of the powerful names standing behind it.
Behind ZeroFOX are weighty names indeed, like Mike McConnell, former director of the NSA, and Robert Rodriguez, who is tied to Homeland Security, the Secret Service, and a prominent security firm. Another company worth highlighting is Geofeedia, because its name appears in all the police-department records the project has received so far. The article details how each of these departments has worked with that company, from purchase orders to contract specifications. According to its CEO, Geofeedia grew sevenfold in just the last two years.
Before closing with a call for readers to join the investigation through MuckRock, Cantú makes this key observation:
Because social media incites within us a compulsion to share our thoughts, even potentially illegal ones, law enforcement sees it as a tool to preempt behavior that appears threatening to the status quo. We caught a glimpse of where this road could take us in Michigan, where the local news recently reported that a man calling for civil unrest on Facebook because of the Flint water crisis was nearly the target of a criminal investigation. At its worst, social media monitoring could create classes of ‘pre-criminals’ apprehended before they commit crimes if police and prosecutors are able to argue that social media postings forecast intent. This is the predictive business model to which Geofeedia CEO Phil Harris aspires. [The link goes to a 23-minute interview with Harris at YouTube.]
“Postings forecast intent”: because no one ever says anything online they don’t really mean, right? There is a reason the pre-crime-arrest concept is fodder for tales of dystopian futures. Where do details like civilian oversight and the protection of civil rights come in?
Cynthia Murrell, December 5, 2016
December 5, 2016
An analytics company that collects crime-related data from local law enforcement agencies plans to help reduce crime rates by using Big Data.
CrimeReports.com says in its FAQ:
The data on CrimeReports is sent on an hourly, daily, or weekly basis from more than 1000 participating agencies to the CrimeReports map. Each agency controls their data flow to CrimeReports, including how often they send data, which incidents are included.
Very little is known about the service provider. A WhoIs lookup indicates that, though the domain was registered back in 1999, it was updated as recently as November 25, 2016, and is valid until November 2, 2017.

CrimeReports is linked to local law enforcement agencies that selectively share crime data with the analytics firm. After some number crunching, the service provider then sends the data to its subscribers via email. According to the firm:
Although no formal, third-party study has been commissioned, there is anecdotal evidence to suggest that public-facing crime mapping—by keeping citizens informed about crime in their area—helps them be more vigilant and implement crime prevention efforts in their homes, workplaces, and communities. In addition, there is anecdotal evidence to suggest that public-facing crime mapping fosters more trust in local law enforcement by members of the community.
To maintain data integrity, the data is collected only through official channels. The crime details are not comprehensive; rather, they are redacted to protect the privacy of victims and criminals. As of now, CrimeReports is paid by law enforcement agencies. This approach is new and has probably not been tried before.
Vishal Ingole, December 5, 2016
December 1, 2016
“Let’s index everything” or “Let’s process all the digital data.” Ever hear these statements or something similar? I have. In fact, I hear this type of misinformed blather almost every day. The author of “Big Data Coming in Faster Than Biomedical Researchers Can Process It” seems to have figured out that yapping about capture and crunch spits out partial truths. (What’s new in the trendy world of fake news?)
The write up points out in a somewhat surprised way:
“It’s not just that any one data repository is growing exponentially, the number of data repositories is growing exponentially,” said Dr. Atul Butte, who leads the Institute for Computational Health Sciences at the University of California, San Francisco.
Now the kicker:
Prospecting for hints about health and disease isn’t going to be easy. The raw data aren’t very robust and reliable. Electronic medical records are often kept in databases that aren’t compatible with one another, at least without a struggle. Some of the potentially revealing details are also kept as free-form notes, which can be hard to extract and interpret. Errors commonly creep into these records. And data culled from scientific studies aren’t entirely trustworthy, either.
Net net: Lots of data. Inadequate resources. Inability to filter for relevance. Failure to hook “data” to actual humans. The yap about curing cancer or whatever disease generates a news release indicates an opportunity. But there’s no easy solution.
The resources to “make sense” of large quantities of historical and real time data are not available. But marketing is easy. Dealing with real world data is a bit more difficult. Keep that in mind if you develop a nifty disease and expect Big Data and analytics to keep the cookies from burning. Sure the “data” about making a blue ribbon batch of chocolate chips is available. Putting the right information into a context at the appropriate time is a bit more difficult even for the cognitive, smart software, text analytics cheerleaders.
Wait. I have a better idea. Why not just let a search system find and discover exactly what you need? Let me know how that works out for you.
Stephen E Arnold, December 1, 2016
November 30, 2016
It seems obvious to us, but apparently, some folks need a reminder. Harvard Business Review proclaims, “You Don’t Need Big Data, You Need the Right Data.” Perhaps that distinction has gotten lost in the Big Data hype. Writer Maxwell Wessel points to Uber as an example. Though the company does collect a lot of data, the key is in which data it collects, and which it does not. Wessel explains:
In an era before we could summon a vehicle with the push of a button on our smartphones, humans required a thing called taxis. Taxis, while largely unconnected to the internet or any form of formal computer infrastructure, were actually the big data players in rider identification. Why? The taxi system required a network of eyeballs moving around the city scanning for human-shaped figures with their arms outstretched. While it wasn’t Intel and Hewlett-Packard infrastructure crunching the data, the amount of information processed to get the job done was massive. The fact that the computation happened inside of human brains doesn’t change the quantity of data captured and analyzed. Uber’s elegant solution was to stop running a biological anomaly detection algorithm on visual data — and just ask for the right data to get the job done. Who in the city needs a ride and where are they? That critical piece of information let the likes of Uber, Lyft, and Didi Chuxing revolutionize an industry.
In order for businesses to decide which data is worth their attention, the article suggests three guiding questions: “What decisions drive waste in your business?” “Which decisions could you automate to reduce waste?” (Example—Amazon’s pricing algorithms) and “What data would you need to do so?” (Example—Uber requires data on potential riders’ locations to efficiently send out drivers.) See the article for more notes on each of these guidelines.
November 24, 2016
Showing your work is a messy but necessary step to prove how one arrived at a solution. Most of the time it is never reviewed, but with big data, people wonder how computer algorithms arrive at their conclusions. Engadget explains that computers are being forced to prove their results in “MIT Makes Neural Networks Show Their Work.”
Understanding neural networks is extremely difficult, but MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed a way to map these complex systems. CSAIL accomplished the task by splitting a network into two smaller modules: one extracts text segments and scores them according to their length and coherence, and the second predicts each segment’s subject and attempts to classify it. The mapping modules sound almost as complex as the actual neural networks. To alleviate the stress and add a giggle to their research, CSAIL had the modules analyze beer reviews:
For their test, the team used online reviews from a beer rating website and had their network attempt to rank beers on a 5-star scale based on the brew’s aroma, palate, and appearance, using the site’s written reviews. After training the system, the CSAIL team found that their neural network rated beers based on aroma and appearance the same way that humans did 95 and 96 percent of the time, respectively. On the more subjective field of “palate,” the network agreed with people 80 percent of the time.
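CSAIL’s actual system learns both modules jointly as neural networks; the toy sketch below only illustrates the two-module idea, with invented keyword lists standing in for the learned extractor and predictor:

```python
# Toy sketch of a two-module "show your work" pipeline: module 1 extracts a
# short rationale segment, module 2 predicts a rating from that segment alone.
# All word lists here are invented for illustration, not from the CSAIL paper.

AROMA_WORDS = {"aroma", "smell", "citrus", "floral", "malty"}

def extract_rationale(review, vocab=AROMA_WORDS):
    """Module 1: pick the sentence with the most aspect-related words."""
    sentences = [s.strip() for s in review.split(".") if s.strip()]
    return max(sentences, key=lambda s: sum(w.lower().strip(",") in vocab
                                            for w in s.split()))

POSITIVE = {"lovely", "rich", "bright"}
NEGATIVE = {"flat", "stale", "faint"}

def predict_rating(segment):
    """Module 2: score 1-5 stars using only the extracted segment."""
    words = {w.lower().strip(",.") for w in segment.split()}
    return max(1, min(5, 3 + len(words & POSITIVE) - len(words & NEGATIVE)))

review = ("The label art is dull. The aroma is lovely, with bright citrus "
          "notes. It pours a hazy gold.")
rationale = extract_rationale(review)
print(rationale)                 # the aroma sentence is selected as the rationale
print(predict_rating(rationale)) # → 5
```

The point of the split is that the prediction can be audited: a human can read the extracted segment and judge whether it supports the rating.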
One set of data is as good as another to test CSAIL’s network-mapping tool. CSAIL hopes to fine-tune the machine learning project and use it in breast cancer research to analyze pathology data.
November 22, 2016
Yahoo, Facebook, Google, WhatsApp, Instagram, and Microsoft all have one thing in common: for any service they provide for free, they are harvesting your private data to sell to advertisers.
Mirror UK recently published an op-ed titled “Who Is Spying on You? What Yahoo Hack Taught Us About Facebook, Google, and WhatsApp,” in which the author says:
Think about this for a second. All those emails you’ve written and received with discussions about politics and people that were assumed to be private and meant as inside jokes for you and your friends were being filtered through CIA headquarters. Kind of makes you wonder what you’ve written in the past few years, doesn’t it?
These services, be it free email or free instant messaging, have been designed and developed in such a way that the companies that own them end up with a humongous amount of information about their users. This data is sugarcoated with the label Big Data. It is then sold to advertisers and marketers who, under the guise of providing an immersive and customized user experience, follow your every click online. This is akin to rearing animals only to slaughter them later.
The data is not just for sale to the corporates; law enforcement agencies can snoop on you without any warrants. As pointed out in the article:
While hypocritical in many ways, these tech giants are smart enough to know who butters their bread and that the perception of trust outweighs the reality of it. But isn’t it the government who ultimately ends up with the data if a company is intentionally spying on us and building a huge record about each of us?
None of the tech giants admits this fact, but most are selling your data to the government, including companies like Samsung that are primarily in the hardware business.
Is there a way to evade this online snooping? Probably not, if you stick with mainstream services and social media platforms. If you want to stay below the radar, delete your accounts and data on all mainstream email providers, instant messaging apps, service websites, and social media platforms.
November 18, 2016
I read “Big Data Shows People’s Collective Behavior Follows Strong Periodic Patterns.” May I suggest you sit down, take a deep breath, and contemplate a field of spring flowers before you read these findings. I am not kidding. Hot stuff, gentle reader.
According to the write up,
New research has revealed that by using big data to analyze massive data sets of modern and historical news, social media and Wikipedia page views, periodic patterns in the collective behavior of the population can be observed that could otherwise go unnoticed.
Here are the findings. I take no responsibility for the impact of these Big Data verified outputs. You are on your own. You now have your trigger warning about the findings from online news, newspapers, tweets, and Wikipedia usage. The findings are:
- “People’s leisure and work were regulated by the weather with words like picnic or excursion consistently peaking every summer in the UK and the US.”
- Diet, fruits, foods, and flowers were influenced by the seasons.
- Measles surfaces in the spring.
- Gooseberries appear in June. (Well, maybe not in Harrod’s Creek.)
- Football and Oktoberfest become popular in the fall. (Yep, October for Oktoberfest, right?)
- People get depressed in the winter.
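The underlying method, spotting yearly peaks in keyword-frequency series, can be sketched with a plain autocorrelation check. The monthly counts below are synthetic stand-ins for real news or Wikipedia data, invented for illustration:

```python
import math

# Synthetic monthly counts for a word like "picnic" over five years:
# a summer bump each year (invented numbers standing in for real
# news/Wikipedia frequency data).
year = [2, 2, 3, 5, 8, 13, 15, 12, 7, 4, 2, 2]
counts = year * 5  # five years of the same seasonal shape

def autocorr(series, lag):
    """Pearson autocorrelation of a series with itself shifted by `lag`."""
    n = len(series) - lag
    a, b = series[:n], series[lag:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - ma) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sd_a * sd_b)

# A strong correlation at a 12-month lag signals an annual cycle.
print(round(autocorr(counts, 12), 3))  # → 1.0 for this perfectly periodic toy
print(round(autocorr(counts, 6), 3))   # out-of-phase lag scores far lower
```

Real data would of course be noisier, so the lag-12 correlation would be high but well below 1.0.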
Now you have it. Big Data delivers.
Stephen E Arnold, November 18, 2016
November 18, 2016
I love election years! Actually, that is sarcasm. Election years bring out the worst in Americans. The media runs rampant with predictions that each nominee is the equivalent of the anti-Christ and will “doom America,” “ruin the nation,” or “destroy humanity.” The sane voter knows that whoever the next president is will probably not destroy the nation or everyday life…much. Fear, hysteria, and paranoia sell more than puff pieces, and big data supports that theory. Popular news site Newsweek shares “Our Trust In Big Data Shows We Don’t Trust Ourselves.”
The article starts with a new acronym: DATA. The idea is not that new, but Newsweek puts a new spin on it. D means dimensions, or different datasets: the ability to combine multiple data streams for new insights. A is for automatic, which is self-explanatory. T stands for time and how data is processed in real time. The second A is for artificial intelligence, which discovers the patterns in the data.
Artificial intelligence is where the problems start to emerge. Big data algorithms can be unintentionally programmed with bias. In order to interpret data, artificial intelligence must learn from prior datasets. These older datasets can show human bias, such as racism, sexism, and socioeconomic prejudices.
Our machines are not as objective as we believe:
But our readiness to hand over difficult choices to machines tells us more about how we see ourselves.
“Instead of seeing a job applicant as a person facing their own choices, capable of overcoming their disadvantages, they become a data point in a mathematical model. Instead of seeing an employer as a person of judgment, bringing wisdom and experience to hard decisions, they become a vector for unconscious bias and inconsistent behavior. Why do we trust the machines, biased and unaccountable as they are? Because we no longer trust ourselves.”
Newsweek really knows how to be dramatic. We no longer trust ourselves? No, we trust ourselves more than ever, because we rely on machines to make our simple decisions so we can concentrate on more important topics. However, what we deem important is biased. Taking the Newsweek example, what a job applicant considers an important submission, an HR representative will see as the 500th submission that week. Big data should provide us with better, more diverse perspectives.
November 17, 2016
Offerings on Dark Web marketplaces are getting weirder by the day. Apart from guns, ammo, porn, and fake identities, products like forged train tickets are now available for sale.
The Guardian, in an investigative article titled “Dark Web Departure: Fake Train Tickets Go on Sale Alongside AK-47s,” reveals:
At least that’s the impression left by an investigation into the sale of forged train tickets on hidden parts of the internet. BBC South East bought several sophisticated fakes, including a first-class Hastings fare, for as little as a third of their face value. The tickets cannot fool machines but barrier staff accepted them on 12 occasions.
According to the group selling these tickets, the counterfeiting was done to inflict financial losses on operators who provide deficient service. Of course, it is also possible that the fake tickets are used by people (without criminal inclinations) who simply do not want to pay full fare.
One school of thought says that, like online marketplaces on the Open Web, Dark Web marketplaces are getting customer-savvy and are providing the products and services that customers need or want. This becomes apparent in this portion of the article:
The academics say the sites, once accessed by invitation or via dark-web search engines (there’ll be no hyperlinks here) resemble typical marketplaces such as Amazon or eBay, and that customer service is improving. “Agora was invitation-only but many of these marketplaces are easily accessible if you know how to search,” Dr Lee adds. “I think any secondary school student who knows how to use Google could get access – and that’s the danger of it.
One of the most active consumer groups on the Dark Web happens to be students, who are purchasing anything from fake certificates to hacker services to improve their grades and attendance records. Educational institutions, as well as law enforcement officials, are worried about this trend. As more people get savvy with the Dark Web, this trend will strengthen, creating a parallel e-commerce ecosystem, albeit a dark one.
November 16, 2016
Researchers from the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) claim that an algorithm they developed can identify gang members on Twitter.
Vice.com recently published an article titled “Researchers Claim AI Can Identify Gang Members on Twitter,” which describes:
A deep learning AI algorithm that can identify street gang members based solely on their Twitter posts, and with 77 percent accuracy.
The article then points out the shortcomings of the algorithm or AI by saying this:
According to one expert contacted by Motherboard, this technology has serious shortcomings that might end up doing more harm than good, especially if a computer pegs someone as a gang member just because they use certain words, enjoy rap, or frequently use certain emojis—all criteria employed by this experimental AI.
The shortcomings do not end there. The data on Twitter is being analyzed in a silo. For example, let us assume that a few gang members are identified using the algorithm (remember, the AI takes no location information into consideration); what next?
Is it not necessary, then, to also identify the supposed gang members’ other social media profiles, look at the Big Data they generate, analyze their communication patterns, and then form some conclusion? Unfortunately, the AI does none of this. In fact, it would be a mammoth task to extrapolate data from multiple sources just to identify people with certain traits.
And most importantly, what if the AI is put in place and someone, just for fun, paints an innocent person as a gang member? As rightly pointed out in the article, machines trained on prejudiced data tend to reproduce those same, very human, prejudices.
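That closing point is easy to demonstrate. The tiny word-count classifier below is trained on invented data whose labels encode a prejudice; it is an illustration of the mechanism, not the Kno.e.sis system:

```python
from collections import Counter

# Invented training data whose labels encode a prejudice: every tweet that
# mentions rap was marked "gang" by the (biased) hypothetical annotators.
training = [
    ("listening to rap all night", "gang"),
    ("new rap mixtape drops today", "gang"),
    ("studying for my chemistry exam", "not"),
    ("great football game yesterday", "not"),
]

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"gang": Counter(), "not": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def classify(model, text):
    """Label by which class's words overlap the tweet more (ties -> 'not')."""
    scores = {label: sum(c[w] for w in text.split())
              for label, c in model.items()}
    return "gang" if scores["gang"] > scores["not"] else "not"

model = train(training)
# An innocuous tweet from a rap fan inherits the annotators' bias:
print(classify(model, "excited for the rap concert"))  # → "gang"
```

No matter how sophisticated the model, it can only learn the labels it is given; if those labels conflate musical taste with gang membership, so will its predictions.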