Penn State Research Team Uses Big Data to Explore Crime Rates

February 2, 2017

The article on E&T titled "Social Media and Taxi Data Improve Crime Pattern Picture" delves into a fascinating study that uses big data, taxi routes and social media location labels from sites like Foursquare, to uncover correlations between taxi traffic, locations of interest, and crime. The study was conducted by Penn State researchers seeking a more useful way to estimate crime rates than the traditional approach, which relies on demographic and geographic data alone. The article explains,

The researchers say that the analysis of crime statistics that encompass population, poverty, disadvantage index and ethnic diversity can provide more accurate estimates of crime rates … the team’s approach likens taxi routes to internet hyperlinks, connecting different communities with each other… One surprising discovery is that the data suggests areas with nightclubs tend to experience lower crime rates – at least in Chicago.  The explanation may be that it reflects people’s choices to be there.

This research will be especially useful to city planners interested in how certain spaces are being used and whether people want to go to them. Researcher Jessie Li, an assistant professor of information sciences, cautioned, however, that while the correlation is clear, the underlying cause is not yet known.
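The "taxi routes as hyperlinks" idea can be illustrated with a toy sketch. This is not the researchers' actual model; the trip data and community names below are hypothetical, and the connectivity measure is just the simplest graph feature one could derive:

```python
from collections import defaultdict

# Hypothetical taxi trips: (pickup_community, dropoff_community).
trips = [
    ("Loop", "River North"), ("Loop", "River North"),
    ("River North", "Loop"), ("Loop", "Hyde Park"),
    ("Hyde Park", "Loop"), ("River North", "Hyde Park"),
]

# Treat each trip like a hyperlink between two communities and
# count how strongly each pair is connected.
edge_weight = defaultdict(int)
for src, dst in trips:
    edge_weight[(src, dst)] += 1

# A simple per-community feature: total inbound and outbound trips,
# analogous to a node's degree in a hyperlink graph. Features like
# this could sit alongside demographic variables in a crime model.
connectivity = defaultdict(int)
for (src, dst), weight in edge_weight.items():
    connectivity[src] += weight
    connectivity[dst] += weight

print(dict(connectivity))  # {'Loop': 5, 'River North': 4, 'Hyde Park': 3}
```

In a real analysis, the connectivity features would be joined with the demographic data the researchers mention and fed into a regression against observed crime rates.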

Chelsea Kerwin, February 2, 2017


Fight Fake News with Science

February 1, 2017

With all the recent chatter around “fake news,” one researcher has decided to approach the problem scientifically. An article at Fortune reveals “What a Map of the Fake-News Ecosystem Says About the Problem.” Writer Mathew Ingram introduces us to data-journalism expert and professor Jonathan Albright, of Elon University, who has mapped the fake-news ecosystem. Facebook and Google are just unwitting distributors of faux facts; Albright wanted to examine the network of sites putting this stuff out there in the first place. See the article for a description of his methodology; Ingram summarizes the results:

More than anything, the impression one gets from looking at Albright’s network map is that there are some extremely powerful ‘nodes’ or hubs, that propel a lot of the traffic involving fake news. And it also shows an entire universe of sites that many people have probably never heard of. Two of the largest hubs Albright found were a site called Conservapedia—a kind of Wikipedia for the right wing—and another called Rense, both of which got huge amounts of incoming traffic. Other prominent destinations were sites like Breitbart News, DailyCaller and YouTube (the latter possibly as an attempt to monetize their traffic).

Albright said he specifically stayed away from trying to determine what or who is behind the rise of fake news. … He just wanted to try and get a handle on the scope of the problem, as well as a sense of how the various fake-news distribution or creation sites are inter-connected. Albright also wanted to do so with publicly-available data and open-source tools so others could build on it.

Albright also pointed out the folly of speculating on sources of fake news; such guesswork only “adds to the existing noise,” he noted. (Let’s hear it for common sense!) Ingram points out that, armed with Albright’s research, Google, Facebook, and other outlets may be better able to combat the problem.
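Since Albright worked with publicly available data and open-source tools, the core of his hub analysis is easy to sketch. The site names below are placeholders, and in-degree is only the simplest of the centrality measures a real network map would use:

```python
from collections import Counter

# Hypothetical hyperlink records: (linking_site, linked_site).
links = [
    ("site-a.example", "hub-one.example"),
    ("site-b.example", "hub-one.example"),
    ("site-c.example", "hub-one.example"),
    ("site-a.example", "hub-two.example"),
    ("site-c.example", "hub-two.example"),
    ("site-b.example", "site-a.example"),
]

# A node's in-degree (how many links point at it) is the simplest
# measure of how "hub-like" it is in the network map.
in_degree = Counter(dst for _, dst in links)

for site, degree in in_degree.most_common(2):
    print(site, degree)
```

Run over millions of scraped links, the same counting exercise surfaces the powerful hubs, like Conservapedia and Rense in Albright's map, that drive most of the traffic.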

Cynthia Murrell, February 1, 2017

Hewlett Packard Enterprise Releases Q4 Earnings of First Year After Split from HP

January 30, 2017

The article on Business Insider titled "Hewlett Packard Enterprise Misses Its Q4 Revenue Expectations But Beats on Profit" discusses HPE's first year following its separation from HP. The article reports fiscal fourth-quarter revenue of $12.5B, just short of the expected $12.85B, and provides all of the nitty-gritty details of the fourth-quarter segment results, including:

Software revenue was $903 million, down 6% year over year, flat when adjusted for divestitures and currency, with a 32.1% operating margin. License revenue was down 5%, down 1% when adjusted for divestitures and currency, support revenue was down 7%, up 1% when adjusted for divestitures and currency, professional services revenue was down 7%, down 4% adjusted for divestitures and currency, and software-as-a-service (SaaS) revenue was down 1%, up 11% adjusted for divestitures and currency.

Additionally, Enterprise Services revenue was reported as $4.7B, down 6% year over year, and Enterprise Group revenue was down 9% at $6.7B. Financial Services revenue was up 2% at $814M.  According to HPE President and CEO Meg Whitman, all of this amounts to a major win for the standalone company. She emphasized the innovation and financial performance and called FY16 a “historic” year for the company.
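To put the percentage changes above in context, a reported figure and its year-over-year change imply the prior-year figure. A quick sketch of that arithmetic (using the software-revenue numbers from the post; the rounding is mine):

```python
# Back out the implied prior-year figure from a reported value and
# its year-over-year change: prior = current / (1 + change).
def implied_prior(current, yoy_change):
    return current / (1 + yoy_change)

# Software revenue: $903 million, down 6% year over year,
# implying prior-year software revenue of roughly $961 million.
print(round(implied_prior(903, -0.06)))
```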

Chelsea Kerwin, January 30, 2017

Some Web Hosting Firms Overwhelmed by Scam Domains

January 27, 2017

An article at Softpedia should be a wakeup call to anyone who takes the issue of online security lightly—“One Crook Running Over 120 Tech Support Scam Domains on GoDaddy.” Writer Catalin Cimpanu explains:

A crook running several tech support scam operations has managed to register 135 domains, most of which are used in his criminal activities, without anybody preventing him from doing so, which shows the sad state of Web domain registrations today. His name and email address are tied to 135 domains, as MalwareHunterTeam told Softpedia. Over 120 of these domains are registered and hosted via GoDaddy and have been gradually registered across time.

The full list is available at the end of this article (text version here), but most of the domains look shady just based on their names. Really, how safe do you feel navigating to ‘security-update-needed-sys-filescorrupted-trojan-detected[.]info’? How about ‘personal-identity-theft-system-info-compromised[.]info’?

Those are ridiculously obvious, but it seems that GoDaddy’s abuse department is too swamped to flag and block even such flagrant examples. At least that hosting firm has an abuse department; many, it seems, can only be reached through national CERT teams. Other hosting companies, though, respond with the proper urgency when abuse is reported—Cimpanu holds up Bluehost and PlanetHoster as examples. That is something to consider for anyone who thinks the choice of hosting firm is unimportant.
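Just how obvious these names are can be shown with a naive keyword heuristic. To be clear, this is not how GoDaddy or any registrar actually screens domains; a real abuse desk combines many more signals. The term list and threshold are illustrative only:

```python
# Naive heuristic: count scam-flavored terms in a domain name and
# flag it when enough of them appear. Illustrative only; term list
# and threshold are arbitrary choices, not a registrar's real policy.
SCAM_TERMS = ("security", "trojan", "virus", "identity-theft",
              "compromised", "corrupted", "update-needed")

def looks_scammy(domain, threshold=2):
    normalized = domain.lower()
    hits = sum(1 for term in SCAM_TERMS if term in normalized)
    return hits >= threshold

print(looks_scammy("security-update-needed-sys-filescorrupted-trojan-detected.info"))
print(looks_scammy("example.com"))
```

Even a filter this crude flags both domains quoted in the article, which underscores the point that the abuse problem here is capacity, not detection difficulty.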

We are reminded that educating ourselves is the best protection. The article links to a valuable tech support scam guide provided by veteran Internet security firm Malwarebytes, and suggests studying the wikis or support pages of other security vendors.

Cynthia Murrell, January 27, 2017

Declassified CIA Data Makes History Fun

January 26, 2017

One piece of advice I have always heard for getting kids interested in the past is "making it come alive."  Textbooks suck at making anything come alive other than naps.  What really bring history to life are documentaries, eyewitnesses, and actual artifacts.  The CIA has a wealth of history, and History Tech shares some rare finds in "Tip Of The Week: 8 Decades Of Super Cool Declassified CIA Maps."  While the CIA Factbook is one of the best history and geography tools on the Web, the CIA Flickr account is chock-full of declassified goodies, such as spy tools, maps, and more.

The article’s author shared that:

The best part of the Flickr account for me is the eight decades of CIA maps starting back in the 1940s prepared for the president and various government agencies. These are perfect for helping provide supplementary and corroborative materials for all sorts of historical thinking activities. You’ll find a wide variety of map types that could also easily work as stand-alone primary source.

These declassified maps were actually used by CIA personnel, political advisors, and presidents to make decisions that continue to impact our lives today.  The CIA Flickr account is only one example of how the Internet is a wonderful tool for making history come to life.  While you usually need to be cautious about where online information comes from, these are official CIA records, so they are primary sources.

Whitney Grace, January 26, 2017

Obey the Almighty Library Laws

January 23, 2017

Recently I was speaking with someone and the conversation turned to libraries.  I complimented the library collection in his hometown, and he asked, "You mean they still have a library?" That response told me two things: one, this person was not a reader, and two, he did not know the value of a library.  The Lucidea blog asked, "Do The Original 5 Laws Of Library Science Hold Up In A Digital World?" and apparently they still do.

In 1931, before computers dominated information and research, S.R. Ranganathan wrote five principles of library science.  The post examines how the laws are still relevant.  The first law states that books are meant to be used, meaning that information is meant to be used and shared.  The biggest point of this rule is accessibility, which remains extremely relevant.  The second law states, "Every reader his/her book," meaning that libraries serve diverse groups and deliver unbiased services.  That still fits, considering the expansion of knowledge dissemination and how many people access it.

The third law is also still important:

Dr. Ranganathan believed that a library system must devise and offer many methods to “ensure that each item finds its appropriate reader”. The third law, “every book his/her reader,” can be interpreted to mean that every knowledge resource is useful to an individual or individuals, no matter how specialized and no matter how small the audience may be. Library science was, and arguably still is, at the forefront of using computers to make information accessible.

The fourth law is “save time for the reader” and it refers to being able to find and access information quickly and easily.  Search engines anyone?  Finally, the fifth law states that “the library is a growing organism.”  It is easy to interpret this law.  As technology and information access changes, the library must constantly evolve to serve people and help them harness the information.

The wording is a little outdated, but the five laws are still important.  However, we also need to consider how people's use of the library has changed.

Whitney Grace, January 23, 2017

Another Untraceable Dark Web Actor Put Behind Bars

January 19, 2017

A prison librarian in England who purchased drugs and weapons over the Dark Web to supply them to prisoners was sentenced to seven years in prison.

The Register reports in "Prison Librarian Swaps Books for Bars After Dark-Web Gun Buy Caper":

Dwain Osborne, of Avenue Road, Penge, in London, was nabbed in October of 2015 after he sought to procure a Glock 19 – a staple of police and security forces worldwide – and 100 rounds of ammunition on the dark web. A search of Osborne’s house revealed the existence of a storage device, two stolen passports, and a police uniform.

Osborne was under the impression that, like other Dark Web actors, he too was untraceable. What made the sleuths suspicious is not known; however, the swift action and prosecution are commendable. Law enforcement agencies are challenged by this new facet of crime, wherein most perpetrators manage to remain anonymous.

Most arrests related to the purchase of arms and drugs over the Dark Web have been the result of undercover operations. Going beyond that modus operandi, however, is the need of the hour.

Systems like Apache Tika seem promising, but it is premature to say how such systems will evolve and, more importantly, how they will be implemented.

Vishal Ingole, January 19, 2017

The Software Behind the Web Sites

January 17, 2017

Have you ever visited an awesome Web site and wondered how an organization manages its Web presence?  While we know the answer is some type of software, we usually are not given a specific name.  VentureBeat reports that it is possible to find out in the article, "SimilarTech's Profiler Tells You All Of The Technologies That Web Companies Are Using."

SimilarTech is a tool designed to crawl the Internet and analyze the technologies, including software, that Web site operators use.  SimilarTech can also detect which online payment tools are the most popular.  It comes as no surprise that PayPal is the most widely used, with PayPal Subscribe and Alipay in second and third places.
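The core technique, matching a page's HTML against known technology signatures, can be sketched in a few lines. SimilarTech's actual signature set and crawler are proprietary; the fingerprints below are illustrative stand-ins:

```python
import re

# Hypothetical fingerprints: regexes that hint a technology is present
# in a page's HTML. A real profiler would use hundreds of signatures
# covering headers, cookies, and script URLs as well.
FINGERPRINTS = {
    "WordPress": re.compile(r"/wp-content/", re.I),
    "Google Analytics": re.compile(r"www\.google-analytics\.com", re.I),
    "PayPal": re.compile(r"www\.paypal\.com/cgi-bin", re.I),
}

def detect_technologies(html):
    # Return the sorted names of all technologies whose signature matches.
    return sorted(name for name, pattern in FINGERPRINTS.items()
                  if pattern.search(html))

sample = ('<link href="/wp-content/themes/x/style.css">'
          '<script src="https://www.google-analytics.com/analytics.js">')
print(detect_technologies(sample))  # ['Google Analytics', 'WordPress']
```

Aggregating these per-site detections across a large crawl is what yields the adoption statistics, such as the payment-tool rankings, that the article describes.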

Tracking the technology and software companies use on the Web is a boon for salespeople, recruiters, and business development professionals who want a competitive edge:

Overall, SimilarTech provides big data insights about technology adoption and usage analytics for the entire internet, providing access to data that simply wasn’t available before. The insights are used by marketing and sales professionals for website profiling, lead generation, competitive analysis, and business intelligence.

SimilarTech can also locate contact information for the personnel responsible for Web operations, in other words, potential new clients.

This tool is kind of like the mailing houses of the past.  Mailing houses have data about people, places, organizations, etc., and can generate lists of contact information for specific clientele.  SimilarTech offers the contact information, but goes one better by identifying the technologies people use for Web site operation.

Whitney Grace, January 17, 2017

BAE Lands US Air Force Info Fusion Job

January 6, 2017

I read “BAE Systems Awarded $49 Million Air Force Research Lab Contract to Enhance Intelligence Sharing.” The main point is that the US Air Force has a pressing need for integrating, analyzing, and sharing text, audio, images, and data. The write up states:

The U.S. Air Force Research Lab (AFRL) has awarded BAE Systems a five-year contract worth up to $49 million to develop, deploy, and maintain cross domain solutions for safeguarding the sharing of sensitive information between government networks.

The $49 million contract will enhance virtualization, boost data processing, and support the integration of machine learning solutions.

I recall reading that the Distributed Common Ground System performs some, if not most, of these "fusion" type functions. The $49 million seems a pittance compared to the multi-billion-dollar investments in DCGS.

My hunch is that Palantir Technologies may point to this new project as an example of the US government’s penchant for inventing, not using commercial off the shelf software.

Tough problem it seems.

Stephen E Arnold, January 6, 2017

Google Looks to Curb Hate Speech with Jigsaw

January 6, 2017

No matter how advanced technology becomes, certain questions continue to vex us. For example, where is the line between silencing expression and prohibiting abuse? Wired examines Google’s efforts to walk that line in its article, “Google’s Digital Justice League: How Its Jigsaw Projects are Hunting Down Online Trolls.” Reporter Merjin Hos begins by sketching the growing problem of online harassment and the real-world turmoil it creates, arguing that rampant trolling serves as a sort of censorship — silencing many voices through fear. Jigsaw, a project from Google, aims to automatically filter out online hate speech and harassment. As Jared Cohen, Jigsaw founder and president, put it, “I want to use the best technology we have at our disposal to begin to take on trolling and other nefarious tactics that give hostile voices disproportionate weight, to do everything we can to level the playing field.”

The extensive article also delves into Cohen’s history, the genesis of Jigsaw, how the team is teaching its AI to identify harassment, and problems they have encountered thus far. It is an informative read for anyone interested in the topic.

Hos describes how the Jigsaw team has gone about instructing their algorithm:

The group partnered with The New York Times (NYT), which gave Jigsaw’s engineers 17 million comments from NYT stories, along with data about which of those comments were flagged as inappropriate by moderators.

Jigsaw also worked with the Wikimedia Foundation to parse 130,000 snippets of discussion around Wikipedia pages. It showed those text strings to panels of ten people recruited randomly from the CrowdFlower crowdsourcing service and asked whether they found each snippet to represent a ‘personal attack’ or ‘harassment’. Jigsaw then fed the massive corpus of online conversation and human evaluations into Google’s open source machine learning software, TensorFlow. …

By some measures Jigsaw has now trained Conversation AI to spot toxic language with impressive accuracy. Feed a string of text into its Wikipedia harassment-detection engine and it can, with what Google describes as more than 92 per cent certainty and a ten per cent false-positive rate, come up with a judgment that matches a human test panel as to whether that line represents an attack.
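The training pipeline described above, human-labeled comments in, a toxicity judgment out, can be illustrated at toy scale. Jigsaw's Conversation AI uses TensorFlow trained on millions of labeled comments; the tiny Naive Bayes sketch below, with made-up training examples, shows only the general shape of the idea:

```python
import math
from collections import Counter

# Toy labeled corpus standing in for the NYT/Wikipedia annotations:
# 1 = flagged as attack/harassment, 0 = acceptable. Illustrative only.
train = [
    ("you are an idiot", 1),
    ("nobody wants you here idiot", 1),
    ("go away you troll", 1),
    ("thanks for the helpful edit", 0),
    ("great article well sourced", 0),
    ("i appreciate the clarification", 0),
]

word_counts = {0: Counter(), 1: Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = set(word_counts[0]) | set(word_counts[1])

def toxic_score(text):
    # Naive Bayes: log-probability of each class with add-one smoothing.
    scores = {}
    for label in (0, 1):
        total = sum(word_counts[label].values())
        logp = math.log(class_counts[label] / len(train))
        for word in text.split():
            logp += math.log((word_counts[label][word] + 1)
                             / (total + len(vocab)))
        scores[label] = logp
    return scores[1] > scores[0]  # True means "flag as attack"

print(toxic_score("you idiot"))        # likely flagged
print(toxic_score("helpful article"))  # likely not flagged
```

A production system adds the pieces this sketch omits: a large neural model, panels of human raters to produce the labels, and calibration of the accuracy and false-positive trade-off the article quotes.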

There is still much to be done, but soon Wikipedia and the New York Times will be implementing Jigsaw, at least on a limited basis. At first, the AI’s judgments will be checked by humans. This is important, partially because the software still returns some false positives—an inadvertent but highly problematic overstep. Though a perfect solution may be impossible, it is encouraging to know Jigsaw’s leader understands how tough it will be to balance protection with freedom of expression. “We don’t claim to have all the answers,” Cohen emphasizes.

Cynthia Murrell, January 6, 2017
