Forecasting Homicides with BSTS

Is crime going up? Or down?

While some are justifying a tougher approach on crime because of a perceived crime wave (what Jeff Sessions calls a "permanent" increasing trend), it appears that violent crime is actually decreasing. In fact, the year 2017 is on track to have one of the lowest violent crime rates in nearly two decades. The temporary increase in violent crime in 2015 and 2016, which was driven by a small number of cities (more on this, soon!), was really a blip in an overall nation-wide trend toward lower crime rates. So, 2017 is looking pretty favorable right now, which leads many people to make projections about the rest of the year. If we're already down this much, we're likely to be down even more later in the year, right?

Well, maybe. Or maybe not. Year-to-date comparisons are fraught with error. First, they assume that the given trend will continue unabated until the year's end. That assumption is generally untenable because crime displays moderate to strong seasonality - that is, depending on when you make the forecast, the remaining months are likely to look different from the ones you're forecasting from. Think of it this way: You report that robberies are down 30% from last year for January through March. Not bad, crime is way down for the first quarter! However, robberies tend to increase dramatically during the summer months because of greater opportunity. The initial decrease you observed in the first quarter might be made up for in spades during months when there is a lot more activity.
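To put rough numbers on the robbery example, here is a minimal sketch. The monthly counts are entirely hypothetical (they are not real data from any city), and the scenario assumes that April through December simply match last year.

```python
# Hypothetical monthly robbery counts for last year (Jan..Dec), with a summer peak.
# These numbers are made up purely for illustration.
last_year = [40, 35, 45, 55, 70, 90, 100, 95, 75, 60, 50, 45]


def pct_change(new, old):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Suppose this year's first quarter came in 30% below last year's first quarter.
q1_last = sum(last_year[:3])
q1_this = round(q1_last * 0.70)
ytd_change = pct_change(q1_this, q1_last)

# Scenario (an assumption): the remaining nine months match last year's counts.
rest_last = sum(last_year[3:])
annual_change = pct_change(q1_this + rest_last, q1_last + rest_last)

# Share of last year's total that fell in the first quarter.
q1_share = q1_last / sum(last_year)

print(f"Year-to-date change after March: {ytd_change:.1f}%")
print(f"Share of the annual total in Q1: {q1_share:.1%}")
print(f"Full-year change if Apr-Dec match last year: {annual_change:.1f}%")
```

Because the first quarter holds only a small share of the yearly total in this made-up series, a large first-quarter drop turns into a much smaller full-year change once the busier months are counted.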

 

Second, crime is often unpredictable - especially when you start to forecast relatively rare events like homicides. The number of homicides month-to-month can be extremely variable, and a freak incident can cause a huge spike in the percentage change (which is often a misleading statistic in and of itself!). For instance, check out Figure 1 below! While homicides in Detroit have been at a record low in 2017, the number of February homicides was higher than in any February since 2012! Jerry Ratcliffe gives a good rundown on why we should be skeptical of year-to-date comparisons, especially when you're looking at highly volatile crimes early in the year.

Homicides in Detroit

Amid all this news about a decrease in violent crime in 2017, the city of Detroit is getting some positive attention. Homicides are down considerably in 2017, and are projected to decrease (percentage-wise) more than in many other large cities. This data comes via a report from the Brennan Center for Justice, a nonpartisan law and policy institute. They projected the final number of 2017 homicides from the proportion of a year's homicides that a city has typically recorded by the present date: the current year-to-date count is scaled up according to that proportion. For instance, if a city had 100 murders by June and, hypothetically, about half of its yearly murders usually occur by then, the projection for the full year would be roughly 200. It's a relatively simple method that relies on the assumption that seasonal monthly effects will be stable. Based on their information, the Brennan Center estimated that Detroit will have about 220 homicides this year - down 81 incidents from 2016, representing a roughly 26% decrease. That's pretty huge.
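Here is a minimal sketch of that kind of proportion-based projection, under one plausible reading of the method: divide the year-to-date count by the share of the annual total that earlier years had reached by the same month. The function names and the prior-year counts are hypothetical and are not taken from the Brennan Center's report.

```python
def historical_share(monthly_counts_by_year, months_elapsed):
    """Average fraction of each year's total recorded in the first `months_elapsed` months."""
    shares = [
        sum(year[:months_elapsed]) / sum(year)
        for year in monthly_counts_by_year
    ]
    return sum(shares) / len(shares)


def project_year_end(ytd_count, share):
    """Scale a year-to-date count up to a full-year projection."""
    return ytd_count / share


# Hypothetical monthly homicide counts (Jan..Dec) for two earlier years.
prior_years = [
    [20, 15, 22, 24, 28, 31, 35, 34, 27, 25, 21, 18],
    [18, 14, 20, 25, 27, 30, 33, 32, 26, 24, 22, 19],
]

share_through_june = historical_share(prior_years, months_elapsed=6)
projection = project_year_end(100, share_through_june)

print(f"Historical share through June: {share_through_june:.3f}")
print(f"Projected year-end total from 100 by June: {projection:.0f}")
```

Note that the projection is a single number. It depends on the historical share staying stable, and it says nothing about how far off that number might be.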

Looking at the last five years, 2017 definitely appears to be significantly lower. Figure 1 shows that June, July, and August of 2017 had the lowest number of homicides recorded for those months since 2012. However, how certain can we be that this trend will hold? Is it likely that, by year's end, homicides will be down by roughly a quarter? Let's compare the Brennan Center's projections with the estimates from a Bayesian Structural Time Series model.

Figure 1.     Detroit Homicides by Month (2012 - 2017)

 
 
Detroit homicides by month, by year. Each grey line represents the month-to-month counts of homicides for one year from 2012 to 2016. The year 2017 is highlighted in red. By February 2017, homicides were up 16%. However, this trend did not hold - homicides were down considerably by August.


 

Bayesian Structural Time Series

Without getting too much into the nitty-gritty (more on what's going on under the hood later!), a Bayesian Structural Time Series (BSTS) model is a flexible method for evaluating the evolution and change of some longitudinal phenomenon. Predictions can be made for the present (so-called "nowcasting"), the short term, or the long term. The advantage of BSTS is twofold: (1) it gives researchers a set of robust tools to fit a number of models, with fewer assumptions than ARIMA models, and (2) it allows for the fitting of complex regression models, with built-in variable selection. I'm not doing this justice, but Kim Larsen gives a great comparison of ARIMA and BSTS.

To predict the number of homicides for the remainder of 2017, I first fit a model to homicide data for January 2012 through August 2017. I fit a number of models with several different specifications, but finally landed on one with a local level trend (essentially, a random walk), an autoregressive component (with a lag of 1), and a monthly seasonal component. One nice thing you can do with BSTS is specify error distributions other than Gaussian. Because homicides are so volatile, I fit the model with a Student's t distribution, which has heavier tails and better accommodates extreme observations.
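To make those components concrete, here is a rough sketch that simulates data from a model with the same building blocks: a random-walk level, an AR(1) term, a monthly seasonal effect, and Student's t observation noise. It does not fit anything and it is not the code behind this post. Every parameter value and the seasonal offsets are assumptions chosen for illustration, and the seasonal pattern is held fixed for simplicity.

```python
import math
import random

# Hypothetical monthly offsets (Jan..Dec) that sum to zero:
# February lowest, July and August highest.
SEASONAL = [-3, -6, -2, 0, 2, 3, 5, 5, 1, 0, -2, -3]


def student_t(rng, df):
    """Draw from a Student's t distribution as normal / sqrt(chi-square / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def simulate(n_months, seasonal, level0=25.0, sigma_level=0.5,
             phi=0.3, sigma_ar=1.0, sigma_obs=2.0, df=4.0, seed=1):
    """Simulate a monthly series: level + AR(1) + seasonal + heavy-tailed noise."""
    rng = random.Random(seed)
    level, ar = level0, 0.0
    series = []
    for t in range(n_months):
        level += rng.gauss(0.0, sigma_level)       # local level: a random walk
        ar = phi * ar + rng.gauss(0.0, sigma_ar)   # autoregression with lag 1
        noise = sigma_obs * student_t(rng, df)     # Student's t observation error
        series.append(level + ar + seasonal[t % 12] + noise)
    return series


# 68 months covers January 2012 through August 2017.
simulated = simulate(68, SEASONAL)
print([round(x, 1) for x in simulated[:12]])
```

The simulated values are real numbers rather than counts, which is one of several simplifications here. The point is only to show how the pieces add up, and how the t-distributed noise occasionally throws out a month far from the rest.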

After assessing the model fit (which was just OK - it appears to underpredict some very high months in 2012 and 2013), I generated predictions for the months of September, October, November, and December of 2017. Figure 2 shows the results from the BSTS model. Because we're using a Bayesian model, the results are inherently probabilistic. The blue bands represent the model's uncertainty about the predictions it made. The light blue bands correspond to the 95% prediction intervals, while the dark blue bands correspond to the 50% prediction intervals. The model did a decent job modeling the overall trend (which is negative), the fairly modest month-to-month autocorrelation, and the strong seasonal effects (February is nearly always the lowest month, while July and August are nearly always the highest).

Based on the model, our best guess for the number of 2017 homicides is roughly 249, about 29 higher than the Brennan Center's estimate. However, the model shows considerable uncertainty about this estimate. We can try to interpret these estimates in light of the prediction intervals. On the extremes, the 95% prediction interval runs from as low as 189 to as high as 310. The 80% prediction interval puts 2017 homicides between 210 and 288. The Brennan Center's estimate of 220 sits comfortably within both intervals, so the two approaches fit together pretty well. However, using a Bayesian framework we also quantify the inherent uncertainty about homicides. Finally, in almost all cases, except for the most extreme circumstances, the model suggests that homicides in Detroit will likely be lower than they were in 2016.
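For anyone curious how year-end intervals like these come out of simulation draws, here is a minimal sketch: sum each draw over the four forecast months, add the 163 homicides already observed through August, and read off quantiles. The draws below are stand-ins generated from a normal distribution with made-up parameters. They are not the model's actual posterior predictive draws, so the printed numbers will not reproduce the ones above.

```python
import random


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list, with 0 <= q <= 1."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def year_end_interval(sorted_totals, level):
    """Central prediction interval (e.g. level=0.95) from sorted year-end totals."""
    tail = (1.0 - level) / 2.0
    return quantile(sorted_totals, tail), quantile(sorted_totals, 1.0 - tail)


OBSERVED_THROUGH_AUGUST = 163  # homicides recorded through August 2017 (from the post)

# Stand-in draws: 4000 simulated paths for Sep-Dec, hypothetical parameters.
rng = random.Random(0)
draws = [[max(0.0, rng.gauss(21.5, 8.0)) for _ in range(4)] for _ in range(4000)]

# Year-end total for each draw = observed count + sum of the four forecast months.
totals = sorted(OBSERVED_THROUGH_AUGUST + sum(path) for path in draws)

median_total = quantile(totals, 0.5)
lo80, hi80 = year_end_interval(totals, 0.80)
lo95, hi95 = year_end_interval(totals, 0.95)

print(f"Median year-end total: {median_total:.0f}")
print(f"80% interval: {lo80:.0f} to {hi80:.0f}")
print(f"95% interval: {lo95:.0f} to {hi95:.0f}")
```

Summing within each draw before taking quantiles matters: adding up the monthly interval endpoints separately would generally overstate the width of the year-end interval.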

Table 1.     Prediction intervals for 2017

Estimates and prediction intervals for September to December 2017. The final estimates for 2017 are calculated by adding the 163 homicides recorded through August 2017 to the predictions.


Figure 2.     Predictions from BSTS on Detroit Homicides

 
 
Results from a Bayesian Structural Time Series model. The light blue and dark blue bands correspond to the 95% and 50% prediction intervals. The dark blue points represent observed counts of homicides. The pink bands and points represent predictions from the model for September, 2017 to December, 2017.


 

Last Thoughts

This has been a fun exercise using some very neat Bayesian methods. The fact that the results here are fairly similar to those derived from a much simpler method doesn't imply one is better than the other. However, I think Bayesian methods in particular are useful because they force us to think about and model uncertainty. As scholars, we often want to speak in terms of absolutes (there either is an effect or there isn't!). Rarely, in reality, are things so cut and dried. Homicides, as I've discussed, are pretty volatile and difficult to forecast. The model here shows us that we shouldn't be too certain about our estimates, and that we should always be thinking about the range of plausible possibilities.

 

Note: I plan on writing up a document containing all the code to generate these results, and the plots.