Solving real-world problems with data analytics

Data Analytics Mathematics
May 6, 2016

Think data analysis, and you might picture statisticians sifting rivers of neatly ordered numbers (lab results, smartphone and sensor data, the ocean of information that is the Internet) for insights into everything from epidemics to online shopping trends. But as some of mathematician Matt Neal’s students have discovered, the data in question are not always so easily wrangled.

With a grant from PIC Math (Preparation for Industrial Careers in Mathematical Sciences), a program supported by the National Science Foundation, Neal devised a prototype for the capstone course in the data analytics major: teams of students working to solve real-world problems suggested by alumni from different fields.

One of those alumni, criminologist Lawrence Sherman ’70, is known as the father of evidence-based policing. As director of the Institute of Criminology at the University of Cambridge, Sherman helps police forces around the world, teaching them to use empirical research to improve their operations, and training senior officers in evidence-based policy and practice.

Sherman arranged for one of his former students—Jeannette Kerr, an assistant commissioner for Police, Fire, and Emergency Services in Australia’s Northern Territory—to work with a Denison student team, looking at police reports to predict the severity and frequency of domestic violence incidents.

It’s the kind of work that could lead to a serious academic paper, and also to changes in policing strategies, law and public policy. (Sherman’s own research was cited in a 1985 Supreme Court decision that limited the use of deadly force against fleeing suspects, and laid the groundwork for the “hot spots” policing strategy that most American police departments now use to home in on the geographical areas where the majority of crimes occur.)

But before they could begin to analyze those reports for meaningful patterns, Neal’s students had to convert them into a form they could actually use.

It wasn’t easy. As Neal puts it, “Math is sometimes not as neat and clear as we make it appear in the classroom.”

For one thing, the reports, mostly written in English, were not in a statistically accessible format. For another, they were littered with errors; several, says Neal, listed individuals who were purportedly 110 years old.

So, along with perusing the scholarly literature on criminology and acquainting themselves with the practical details of police data collection, the team had to code computer algorithms that could convert the reports’ contents into an analyzable form; synthesize many different types of reports into one coherent data set; weed out outliers, like those alleged centenarians; identify which reports actually dealt with violence versus accidental injuries; and track multiple individuals through multiple incidents—all just so that they could get to the point where they could begin looking for the strongest predictors of domestic violence.
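To make two of those steps concrete, here is a minimal Python sketch of weeding out implausible ages and counting violent incidents per person. It is not the students' code: the `Report` fields, the `MAX_PLAUSIBLE_AGE` cutoff, the function names and the sample records are all hypothetical, invented purely for illustration, and the sketch assumes the reports have already been parsed and each person matched to a single identifier.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical cutoff: ages above this are treated as data-entry errors.
MAX_PLAUSIBLE_AGE = 105


@dataclass(frozen=True)
class Report:
    """One parsed police report; every field name here is illustrative."""
    report_id: str
    person_id: str        # assumed to be already matched across reports
    age: Optional[int]    # None when the age is missing or unusable
    violent: bool         # True if the report describes violence, not an accident


def clean_age(report: Report) -> Report:
    """Blank out an implausible age instead of discarding the whole report."""
    if report.age is not None and not (0 <= report.age <= MAX_PLAUSIBLE_AGE):
        return replace(report, age=None)
    return report


def violent_incident_counts(reports: list[Report]) -> Counter:
    """Count violent incidents per person across all reports."""
    return Counter(r.person_id for r in reports if r.violent)


# Invented sample records, not real data.
reports = [
    Report("r1", "p1", 34, True),
    Report("r2", "p1", 110, True),   # implausible age, like those Neal describes
    Report("r3", "p2", 51, False),   # accidental injury, so not counted
]
cleaned = [clean_age(r) for r in reports]
counts = violent_incident_counts(cleaned)
```

Blanking a suspect age, rather than dropping the report, keeps the incident itself available for counting; whether that is the right call depends on the analysis, and the team may well have handled such records differently.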

In the end, the students succeeded in massaging the reports into a more tractable form, but it took them most of the semester to do it. They examined whether repeat incidents predicted the severity or frequency of future harm; analyzed the role of culture, sexual orientation and gender; and gauged the effectiveness of recent alcohol abuse policing programs in the Northern Territory.

The work is massive and ongoing. Regardless of what they find, the students have already learned a valuable lesson about the messiness of real-world problems, as opposed to the kind that crops up in textbooks and problem sets. And two graduating seniors, Sarah Torrence, from Rockford, Ill., and Jingwen “Sky” Liu, from Wuhan, China, will continue their work for Sherman during a paid internship at Cambridge in the summer of 2016.
