Academics Can Predict Crime: What about Close Enough for Horseshoes Accuracy?

July 6, 2022

I have no phat phaux phrench bulldog in this upcoming academic free-for-all. I read “Algorithm Predicts Crime a Week in Advance, But Reveals Bias in Police Response.” Yellow lights flash.

The article is a summary of a longer research paper published by wizards at the University of Chicago, an outstanding institution located in a safe, well-lit, and community-oriented area of Chicago. Home of the Bears and once the literal stomping grounds of the P Stone Nation. (And, yes, I am intentionally leaving part of the gang’s name out of my reference. Feel free to use the full gang name yourself.)

The write up says:

Data and social scientists from the University of Chicago have developed a new algorithm that forecasts crime by learning patterns in time and geographic locations from public data on violent and property crimes. The model can predict future crimes one week in advance with about 90% accuracy.

Predicting crime a week before the incident or incidents sounds like an application of predictive analytics. I think there was an outfit which started at Indiana University which came up with something similar. That system attracted some attention and some skepticism.

But humans are curious, and applying mathematical recipes to available data is, for some, an interesting way to pursue grants, publicity, and maybe some start-up funding.

But 90 percent. That raises the question: What about the other 10 percent? How low does the model’s confidence go before an output is no longer acceptable? Maybe 60 percent? Maybe lower?
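The arithmetic behind that question is worth spelling out. Here is a minimal sketch (the numbers are hypothetical, not drawn from the paper): when crime in a given grid cell is a rare event, a model that is right 90 percent of the time can still generate far more false alarms than correct alerts.

```python
# Hypothetical illustration of why "90% accuracy" alone says little.
# Assume 1,000 city grid cells per week, 50 of which actually see an incident.
cells = 1000
actual_incidents = 50        # rare events
sensitivity = 0.90           # model catches 90% of real incidents
specificity = 0.90           # model correctly clears 90% of quiet cells

true_hits = actual_incidents * sensitivity                      # 45 correct alerts
false_alarms = (cells - actual_incidents) * (1 - specificity)   # 95 wasted dispatches

precision = true_hits / (true_hits + false_alarms)
print(f"Correct alerts: {true_hits:.0f}")
print(f"False alarms:   {false_alarms:.0f}")
print(f"Share of alerts that pan out: {precision:.0%}")
```

Under these made-up base rates, only about a third of the alerts pan out, even though the model is “90 percent accurate” on both kinds of cell. The other 10 percent is where the scarce patrol cars go.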

The write up continues:

Previous efforts at crime prediction often use an epidemic or seismic approach, where crime is depicted as emerging in “hotspots” that spread to surrounding areas. These tools miss out on the complex social environment of cities, however, and don’t consider the relationship between crime and the effects of police enforcement.

I know I have mentioned Banjo (now SafeX AI) and the firm’s patents. Some of these patent documents provide useful summaries of some of the algorithms used in predictive models. What strikes me as important about math-centric outputs is that methods are useful — up to a point. I have a canned lecture which identifies the 10 most used mathy methods and explains how the data sets going in can be poisoned by an intentional actor. The culprit can be smart software generating data in the manner of AI synthetic data systems or humans working for a government funded entity in St. Petersburg, Russia.
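To make the poisoning point concrete, here is a minimal sketch (the data, area names, and counting method are entirely hypothetical, not taken from any vendor’s system): a naive hotspot ranker that scores areas by raw report counts can be flipped by injecting a handful of fabricated reports.

```python
from collections import Counter

# Hypothetical incident reports: (area, incident_type)
clean_reports = (
    [("riverside", "burglary")] * 12
    + [("old_town", "burglary")] * 9
    + [("lakeview", "theft")] * 5
)

def top_hotspot(reports):
    """Naive 'predictive' model: rank areas by raw report counts."""
    counts = Counter(area for area, _ in reports)
    return counts.most_common(1)[0][0]

print(top_hotspot(clean_reports))   # riverside leads on clean data

# An intentional actor injects a few synthetic reports for another area.
poisoned = clean_reports + [("lakeview", "theft")] * 10
print(top_hotspot(poisoned))        # lakeview now tops the ranking
```

Ten fake rows out of roughly three dozen are enough to redirect the “prediction.” Real systems are more elaborate, but the underlying exposure — outputs only as trustworthy as the inputs — is the same.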

However, there have been a few high hurdles predictive systems have to jump over in a clean, fluid manner; for instance:

  • Identifying and filtering certain data. Bad data can have a significant impact on the outputs. My recollection is that an analysis of a predictive system in California revealed wide variation in the collection and consistency of the data from both human and automated sources.
  • Refining actionable outputs. Some of these outputs are wide of the mark. This means that scarce resources may be deployed on a wild goose chase or on an investigation of actors who are not “bad” or involved in an incident.
  • Real time not correlating with the past. Numerous contextual issues arise in real time, and predictive systems operate in what I call a time disconnected mode. For those on the pointy end of the stick, this time variance can create a situation in which the predictive outputs are not just a few degrees off center, they are orbiting around a beach club in Bermuda.

If you want to read the entire academic “we have cracked this problem” article, navigate to this link. You will have to pay to read this remarkable article.

Stephen E Arnold, July 6, 2022

