To understand the spatial relationship between crime and the risk of crime, a fishnet grid is built to treat crime risk as something that varies continuously across space and time rather than stopping at arbitrary administrative boundaries.
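A minimal sketch of how such a fishnet could be built with `sf` is shown below. It assumes `chicagoBoundary` is an sf polygon of the city limits and `propertyDamage` is an sf point layer of incidents; the 500 m cell size is an illustrative choice, not necessarily the one used in this analysis.

```r
library(sf)
library(dplyr)

# Build a regular grid (fishnet) clipped to the city boundary
fishnet <- st_make_grid(chicagoBoundary, cellsize = 500, square = TRUE) %>%
  st_sf() %>%
  .[chicagoBoundary, ] %>%                 # keep only cells that touch the city
  mutate(uniqueID = row_number())

# Count incidents in each cell to build the dependent variable
fishnet$countDamage <- lengths(st_intersects(fishnet, propertyDamage))
```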
A correlation test was then used to compare the count and nearest-neighbor versions of each feature side by side. This gives more insight into which features may predict property damage.
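As a hedged sketch of that comparison, the same risk factor can be tested in both forms against the cell counts; `abandonedCars.count` and `abandonedCars.nn` are illustrative column names, not necessarily the ones in the analysis.

```r
# Pearson correlation of the count version vs. the nearest-neighbor version
cor.test(fishnet$abandonedCars.count, fishnet$countDamage, method = "pearson")
cor.test(fishnet$abandonedCars.nn,    fishnet$countDamage, method = "pearson")
```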
To test whether the model actually generalizes across space and time, two cross-validation methods were used. This allows us to observe whether the model performs well on a different dataset and in a different geographical context; in this case, different neighborhoods are used to observe how well the model performs. LOGO-CV, or "leave one group out" cross-validation, is used to assess whether the predictive model can learn what is happening in one area and generalize to another. A random k-fold test with 100 folds was also used.
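A minimal sketch of the LOGO-CV loop follows, assuming `fishnet` carries a `name` column with each cell's neighborhood and `reg.vars` is a character vector of risk-factor predictors; a Poisson regression is assumed for the count outcome.

```r
crossValidate <- function(dataset, id, dependentVariable, indVariables) {
  allPredictions <- NULL
  for (thisFold in unique(dataset[[id]])) {
    # hold out one group, train on all the others
    fold.train <- dataset[dataset[[id]] != thisFold, ]
    fold.test  <- dataset[dataset[[id]] == thisFold, ]
    regression <- glm(as.formula(paste0(dependentVariable, " ~ ",
                                        paste(indVariables, collapse = " + "))),
                      family = "poisson",
                      data   = st_drop_geometry(fold.train))
    fold.test$Prediction <- predict(regression, fold.test, type = "response")
    allPredictions <- rbind(allPredictions, fold.test)
  }
  allPredictions
}

# Leave-one-group-out by neighborhood; the 100-fold random CV would instead
# group on a column of 100 random fold IDs.
reg.spatialCV <- crossValidate(fishnet, "name", "countDamage", reg.vars)
```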
Four regressions will be used: two on just the risk factors and two on the risk factors plus the Local Moran's I features. Neighborhood name and police district will be used as the grouping units for spatial cross-validation.
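A hedged sketch of the Local Moran's I features added to the second pair of regressions is shown below; the queen-contiguity weights and the 0.05 significance cutoff are assumptions about the setup, not confirmed choices.

```r
library(spdep)
library(FNN)

# Spatial weights and Local Moran's I on the cell counts
fishnet.nb <- poly2nb(fishnet, queen = TRUE)
fishnet.w  <- nb2listw(fishnet.nb, style = "W", zero.policy = TRUE)
localMoran <- localmoran(fishnet$countDamage, fishnet.w, zero.policy = TRUE)

# Flag cells sitting in a statistically significant local cluster
fishnet$damage.isSig <- as.numeric(localMoran[, 5] <= 0.05)

# Distance from each cell centroid to the nearest significant cluster cell
cell.xy <- st_coordinates(st_centroid(fishnet))
sig.xy  <- cell.xy[fishnet$damage.isSig == 1, , drop = FALSE]
fishnet$damage.isSig.dist <- get.knnx(sig.xy, cell.xy, k = 1)$nn.dist[, 1]
```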
The figure below visualizes both the random k-fold results, on the risk factors and the spatial process, and the LOGO-CV results on the risk factors and the spatial process. The map demonstrates that adding the spatial process increases the error in areas that have high counts of property damage.

### Visualizing MAE for each fold
In both the figures above and below, it appears that including the spatial process features does improve our model. The histogram demonstrates that as the spatial features are added, our model generalizes fairly well, particularly in hotspot areas. This can be seen in the kable below (a sketch of how these summaries are computed follows the table).
Regression | Mean_MAE | SD_MAE |
---|---|---|
Random k-fold CV: Just Risk Factors | 1.08 | 0.76 |
Random k-fold CV: Spatial Process | 0.87 | 0.70 |
Spatial LOGO-CV: Just Risk Factors | 2.81 | 2.30 |
Spatial LOGO-CV: Spatial Process | 1.78 | 1.68 |
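A hedged sketch of the MAE summary behind the kable above, assuming `reg.summary` stacks the predictions from all four regressions with a `Regression` label, the observed `countDamage`, and a fold identifier `cvID`:

```r
# MAE per fold, then mean and standard deviation per regression
error_by_reg_and_fold <- reg.summary %>%
  st_drop_geometry() %>%
  group_by(Regression, cvID) %>%
  summarize(MAE = mean(abs(Prediction - countDamage), na.rm = TRUE),
            .groups = "drop")

error_by_reg_and_fold %>%
  group_by(Regression) %>%
  summarize(Mean_MAE = round(mean(MAE), 2),
            SD_MAE   = round(sd(MAE), 2)) %>%
  knitr::kable()
```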
The figure below displays how our model significantly over-predicts in areas that have low property damage rates, and under-predicts in areas that have higher property damage rates.
The map below visualizes our models' errors spatially, and as demonstrated, when the spatial process features are introduced, errors are reduced.
Regression | Morans_I | p_value |
---|---|---|
Spatial LOGO-CV: Just Risk Factors | 0.1944784 | 0.001 |
Spatial LOGO-CV: Spatial Process | 0.1691450 | 0.008 |
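The table above reports the global Moran's I of the LOGO-CV errors. A hedged sketch of that test, reusing the queen-contiguity weights built earlier and assuming `reg.spatialCV` holds the LOGO-CV predictions in the same cell order as the fishnet:

```r
# Permutation test for spatial autocorrelation in the model errors
errors <- reg.spatialCV$Prediction - reg.spatialCV$countDamage
moran.mc(errors, fishnet.w, nsim = 999, zero.policy = TRUE)
```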
On average, the model under-predicts in majority non-white areas and over-predicts in majority white neighborhoods, although the gap shrinks considerably once the spatial process features are added. Broadly speaking, the model generalizes reasonably well across the racial context, but I wonder if this has to do with the type of crime chosen. More on this in the conclusion (a sketch of this check follows the table below).
Regression | Majority_Non_White | Majority_White |
---|---|---|
Spatial LOGO-CV: Just Risk Factors | -1.4905068 | 1.6167755 |
Spatial LOGO-CV: Spatial Process | -0.2136402 | 0.2063569 |
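A hedged sketch of the generalizability check by racial context, assuming `tracts17` is a census-tract sf layer with a `raceContext` column taking the values "Majority_White" and "Majority_Non_White":

```r
# Join each cell's error to its tract's racial context, then average by group
reg.summary %>%
  st_centroid() %>%
  st_join(tracts17["raceContext"]) %>%
  st_drop_geometry() %>%
  group_by(Regression, raceContext) %>%
  summarize(mean.Error = mean(Prediction - countDamage, na.rm = TRUE),
            .groups = "drop") %>%
  tidyr::pivot_wider(names_from = raceContext, values_from = mean.Error) %>%
  knitr::kable()
```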
The three maps below visualize kernel density estimates of property damage at three different scales.
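As a hedged sketch of how such surfaces could be produced, `MASS::kde2d` is used here in place of whatever density routine the analysis relied on; it assumes `propertyDamage` holds projected (planar) coordinates, and the three bandwidths are placeholders standing in for the three map scales.

```r
xy        <- st_coordinates(propertyDamage)
kd.small  <- MASS::kde2d(xy[, 1], xy[, 2], h = 1000, n = 200)  # tighter surface
kd.medium <- MASS::kde2d(xy[, 1], xy[, 2], h = 1500, n = 200)
kd.large  <- MASS::kde2d(xy[, 1], xy[, 2], h = 2000, n = 200)  # smoother surface
image(kd.small)   # quick check; the report maps these surfaces with ggplot2
```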
Next, a goodness-of-fit indicator was generated to compare whether the 2017 kernel density or the 2017 risk predictions capture more of the 2018 property damage incidents. The map below illustrates that the model is decent at predicting property damage incidents in 2018.
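A hedged sketch of that comparison follows. It assumes `ml_breaks` and `kd_breaks` are the fishnet cells labeled with a `Risk_Category` column (e.g. "90% to 100%") derived from the risk predictions and the kernel density respectively, and that `damage18` is the sf point layer of 2018 incidents.

```r
# Share of 2018 incidents falling in each risk category, per method
rbind(
  mutate(ml_breaks, label = "Risk Predictions"),
  mutate(kd_breaks, label = "Kernel Density")) %>%
  mutate(count2018 = lengths(st_intersects(., damage18))) %>%
  st_drop_geometry() %>%
  group_by(label, Risk_Category) %>%
  summarize(count2018 = sum(count2018), .groups = "drop") %>%
  group_by(label) %>%
  mutate(Pct_of_2018_damage = count2018 / sum(count2018))
```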
In summary, this model seems mediocre at best when compared to the traditional algorithm used by the Chicago Police Department (CPD). The model outperforms the traditional algorithm in risk category 3, but in the higher risk categories the traditional algorithm performs better. This tells me that CPD would probably not hire me, since my model almost performed worse than theirs. I am not sure this would be a helpful tool for allocating police resources.
Honestly, I am okay with this conclusion for a number of reasons. One, the dependent variable chosen, property damage, is a bit of a niche category of crime, and more features would be needed to better predict criminal damage to property. Affordable housing and noise complaints do not really explain the association with property damage incidents, and more thought would have to go into what kinds of features would more accurately predict this type of crime. I also think property damage is such an interesting crime to try to predict where the next occurrence will be. What are the chances of an incident of property crime occurring again in the same area spatially? Property crime also usually occurs in connection with other crimes like assault or theft, so I think a different dependent variable altogether would perform better and make for a more interesting analysis. Although the model generalized decently across other contexts such as race, like the book points out, I too cannot be certain that the model chosen and the dataset collected do not fall victim to selection bias.
Using biased features to predict on biased policing datasets feels icky. But I am hopeful that, in light of their intrinsically disastrous nature, we can all come to the conclusion that hard tools like algorithmic prediction and modeling, paired with soft tools like social programming, economic interventions, and proper, intentional investments, are both needed to better protect our most vulnerable communities. As technology continues to advance, and we employ more technologies and collect more and more data about communities of color, I really hope we reach that junction sooner rather than later.