Essays on Race in Policing

Hunter Johnson
Claremont Graduate University: 2021

In recent years, a number of high-profile policing controversies have led to global indignation over racial disparities in policing and perceived police brutality. This dissertation explores three different dimensions of race in policing. The first chapter examines whether the presence of female and minority police officers affects the likelihood of police use of force and whether officers are more or less likely to use force against civilians of a different race. Focusing on a subset of 911 calls resulting in arrest, I use an instrumental variables estimation method with dispatch availability by officer race/gender as an instrument for the presence of different officer types. I find that the presence of a female officer significantly reduces the likelihood that force is used. Calls involving white officers and black civilians are significantly more likely to result in use of force. The second chapter uses data on 7.5 million police-civilian interactions involving 1,663 Texas Highway Patrol officers to estimate the impact of five mandatory police trainings on the racial composition of traffic stops and racial disparities in related outcomes. The five trainings considered are (1) Cultural Diversity; (2) Arrest, Search & Seizure; (3) Racial Profiling; (4) Traffic; and (5) Deescalation. We exploit quasi-random variation in the timing of when individual officers receive training and estimate a series of event study models. We find that training has little to no effect on policing behavior in terms of either the racial composition of stops or related outcomes. In general, our findings cast serious doubt on the ability of policymakers to use training as an effective intervention for combatting longstanding disparities in law enforcement.
The third chapter examines whether externally-imposed affirmative action plans designed to increase the shares of nonwhite and female police officers have impacted the rates of reported offenses and/or offenses cleared by arrest. Using a series of modern econometric strategies including difference-in-differences decomposition and generalized synthetic controls, we do not find a significant effect of court-imposed affirmative action plans on the rates of reported offenses or reported offenses cleared by arrest. We also consider whether unlitigated agencies change their practices due to the threat of litigation but are unable to identify causal evidence of such threat effects.


Acknowledgments
I have received a great deal of support in writing this dissertation. My deepest gratitude goes to my advisor Greg DeAngelo without whom I would never have gone down this path of researching crime and policing. Before Greg arrived at CGU at the start of my third year I was pursuing international money and finance and had never considered applied microeconomics. He helped me pursue a new path and showed me that I was capable of producing publishable research. Greg's guidance has been invaluable in helping me grow as a researcher.
I am deeply indebted to my other committee members, Matt Ross and CarlyWill Sloan, for their helpful comments on my research and the advice they have given me since they arrived at CGU. I would like to acknowledge Matt in particular for being a great coauthor on one of the chapters of this dissertation.
I am grateful to my other coauthors Steve Mello, Stephen L. Ross, and Maryah Garner, whose contributions to two chapters of this dissertation have been invaluable. In addition to being a coauthor, Maryah has been a great friend since we both started the Ph.D. program at CGU. I would like to thank Mark Hoekstra for his valuable comments on my job market paper. His understanding of police use of force and good judgment have been a great help in carrying out this research. I would also like to thank Hisam Sabouni for helping me get started with programming and mentoring me throughout the program.
I would like to extend my gratitude to the Institute for Humane Studies and the Horowitz Foundation for their generous financial support. The Computational Justice Lab at CGU has also been a tremendous help not only financially but also as a place to discuss important new research in the world of criminal justice. In particular, I would like to acknowledge Yunie Le, Rebecca Bommarito, Josie Xiao, Morgan Stockham, Minjae Yun, Dana Avgil, Rainita Narender, and Anuar Assamidanov.
Finally, I want to thank my parents and close friends for all the support they have given me over the years. Without their help, I never could have made it this far.

Chapter 1: Does the Presence of Female and Minority Police Reduce the Use of Force?

Introduction
On May 25, 2020, George Floyd died while in the custody of four officers of the Minneapolis Police Department. Floyd, an unarmed black male, died while a white police officer kneeled on his neck for eight minutes and 46 seconds. The autopsy performed by the Hennepin County Medical Examiner's Office concluded that the cause of death was "cardiopulmonary arrest complicating law enforcement subdual, restraint, and neck compression." Video of Floyd's death recorded by bystanders led to widespread condemnation of the officers involved and worldwide protests in support of the Black Lives Matter movement. These protests have led to calls for wide-ranging police reform and heightened scrutiny on police use-of-force practices. Floyd's death as well as the deaths of other unarmed black men such as Michael Brown have also renewed attention on racial prejudice in law enforcement.
There is a widespread sentiment among the American public that law enforcement is characterized by racial prejudice, with blacks reporting much lower confidence in police than whites (Pew Research Center, 2016; Gallup, 2016). Around 67 percent of blacks report feeling that blacks are treated less fairly than whites by police, while only 40 percent of whites feel that blacks are treated less fairly (Gallup, 2016).
Despite widespread media coverage and public interest, however, there is a dearth of empirical literature on police use of force and potential racial disparities. A paper by Fryer (2018) uses data from four different samples across the United States to study racial bias in use of force. Fryer considers the following samples: New York City's stop-question-and-frisk (SQF) data containing detailed use of force descriptions; Police-Public Contact Survey (PPCS) data containing civilian-reported interactions with police including use of force; officer-involved shooting (OIS) data from three Texas cities, six Florida counties, and Los Angeles County; and Houston arrest data where lethal force is deemed "more likely to be justified," coupled with Houston OIS data. Analyzing each sample separately, Fryer finds that blacks and Hispanics are more likely to experience non-lethal uses of force, but he finds no evidence of racial bias in the use of lethal force when contextual factors are accounted for. Fryer argues that his findings are consistent with a model in which some police officers have preferences for discrimination. However, Fryer's work is complicated by a common difficulty in studying racial bias in policing, namely, the fact that officer decisions to initiate contact are endogenous. Researchers do not observe cases in which officers choose not to make contact, which may be influenced by racial prejudice or other confounding factors (Dharmapala and Ross, 2004; Anwar and Fang, 2006). In addition, where Fryer seeks to overcome this difficulty by using a subset of Houston arrests as a benchmark for cases where force might occur, he is only able to consider cases in which an officer discharges his firearm or his taser.
Recently, two papers have used samples of 911 calls to overcome the endogeneity of officer-civilian interactions. The use of this sample avoids selection effects associated with officer-initiated contact because 911 calls are made by civilians and officers are assigned to calls by dispatchers primarily based on location and availability. Weisburst (2019a) uses this type of sample to extend the work of Fryer by leveraging officer race in use-of-force incidents and by examining the relationship between arrests and use of force. Weisburst finds that conditional on arrest, use-of-force rates do not differ systematically by race. Hoekstra and Sloan (2020) analyze over 2 million 911 calls in two different undisclosed cities to examine racial disparities in use of force. The authors use quasi-random variation in officer race to examine whether white officers are more likely to use force than Hispanic or black officers, finding that white officers are considerably more likely to use force overall. Additionally, Hoekstra and Sloan compare how white officers increase use of force when dispatched to neighborhoods with greater "minority" presence compared to how minority officers increase use of force. They find that white officers "increase use of any force much more than minority officers when dispatched to more minority neighborhoods." In their paper, Hoekstra and Sloan rely on the assumption that 911 calls involve a civilian of the predominant race in the Census Block Group where the call was made. This allows the authors to analyze full samples of all 911 calls during their sample period for each city they consider.
In this paper, I contribute to the burgeoning use-of-force literature by examining racial disparities in use of force in the case study of Dallas, Texas. Following Weisburst (2019a) and Hoekstra and Sloan (2020), I utilize a sample of 1.4 million 911 calls made between 2014 and 2016 to avoid issues of officer-initiated contact. I begin by examining how officers of different races vary in their propensity to use force irrespective of civilian demographics. I leverage officer race to determine whether officers are more likely to use force against civilians of a different race when controlling for officer and call characteristics. Rather than making any assumptions about civilian race, I use a subsample of 911 calls resulting in arrest to benchmark cases where force is likely to occur. The arrest data identify civilian race as well as other relevant characteristics used in my analysis. 1 In addition to looking at racial disparities in use of force, I also examine whether male and female officers are equally likely to use force.
This work builds on the larger literature of racial bias in policing which has mainly focused on vehicle drug searches (Anwar and Fang, 2006; Antonovics and Knight, 2009; West, 2018b) and stop-and-frisk policies (Weisburd et al., 2015; Goel et al., 2016). As with Fryer's analysis, this literature is complicated by the endogeneity of officer-civilian interactions. A recent paper by West (2018b) circumvents this problem by studying the racial bias of state troopers who are randomly dispatched to highway accidents. West finds evidence of racial bias; state troopers are more likely to issue citations to opposite-race drivers. Others use the "veil of darkness" provided by dusk in which officers cannot see the race of the driver whom they pull over (Grogger and Ridgeway, 2006; Horrace and Rohlin, 2016). My work contributes to this literature by examining the arguably more contentious issue of use of force.
My work also builds on the relatively recent literature on the impact of women in policing and resulting improvements to policing quality. Rabe-Hemp (2008) examines under what conditions female officers behave differently from their male colleagues. Namely, her research finds that female officers are less likely to use "controlling behaviors" such as threats, physical restraint, and arrest than their male counterparts. 2 Miller and Segal (2018) study the integration of women into policing and its effects on violent crime reporting and domestic violence, finding that increased female representation in law enforcement leads to increases in reporting rates of violent crimes against women, particularly domestic violence. Additionally, they find significant declines in rates of intimate partner homicide and non-fatal domestic abuse. My work sheds light on whether there are benefits to more gender diversity in policing resulting from reduced likelihood of use of force in civilian encounters with female officers.
For this analysis I use an instrumental variables approach with officer availability as an instrument for the race/gender of the officer who responds to a service call. From the dispatch data, I am able to determine which officers are available to respond to individual service calls. Using this information, I identify officers as "available" if they are working a particular shift in a certain location and are not preoccupied with another incident. For each call, I know the percentage of available officers by race and gender. The assumption is that officer availability by race/gender is exogenous to the race/gender of civilians conditional on observable call characteristics. Throughout the paper, I pay close attention to how officers are sorted to different calls for service. I conduct balance tests to demonstrate that my results are not the result of selection effects.
1 Civilian race is not known for all 911 calls.
2 Rabe-Hemp's identification strategy, selection on observables, may not adequately address the endogeneity of officer-assignment. In fact, male and female officers in Rabe-Hemp's sample differ in many observables. For example, female officers are more likely to respond to calls with female civilians and less likely to respond to calls where civilians are impaired (Rabe-Hemp, 2008).
I find no significant difference in the likelihood of use of force across officer race overall. Calls involving female officers are 3.9 percentage points less likely to end in use of force. My results indicate that the presence of a white male officer in a 911 call involving a black civilian substantially increases the likelihood that force will be used. Specifically, such calls ending in arrest involving white officers and black civilians are 3.3 percentage points more likely to result in use of force. These results are statistically significant at the 1 percent level. The results of my replication analysis are broadly consistent with the results of Hoekstra and Sloan (2020) in interactions with white officers and black civilians. In these cases, I find that opposite-race officers are between 0.34 and 0.40 percentage points more likely to use force in general, and between 3.2 and 3.5 percentage points more likely to use force in the arrest sample.
Because the DPD data contain arrest information, I am able to evaluate the reliability of conditioning on arrest by researchers. For example, Fryer (2018) uses arrest data as a benchmark for cases where lethal force is considered more likely to be justified. In my replication analysis of Hoekstra and Sloan (2020), conditioning on arrest yields similar results for interactions between white officers and black civilians, but substantially different results for interactions between white officers and Hispanic civilians. This suggests that arrest data may not be a reliable benchmark in all cases.
In Section 2 of the paper, I discuss data sources and present summary statistics. Section 3 presents the identification strategy and main results. The section begins with a description of DPD dispatch and arrest protocol and how it informs my identification strategy. I describe the IV models and present the main results. Section 4 consists of replication of Hoekstra and Sloan (2020) and an analysis of the effect of officer timing for robustness. Section 5 contains heterogeneous effects of race and gender interactions for officers and civilians. Section 6 concludes.

Data Sources
The Dallas Police Department (DPD) serves over 1.3 million residents of the city of Dallas, 42 percent of whom are Hispanic, 24 percent black, and 29 percent white. 3 In recent years, DPD has released several datasets in an effort to increase transparency. This project uses the publicly available "response to resistance" data, which cover 2014 through 2016 and which I refer to as the use-of-force data. The use-of-force data contain incidents in which officers report using physical force against civilians who are resisting officer authority. Officers who use force are identified by badge number. The data include descriptions of the type of force used, such as pushing, joint locks, armbars, taser use, and firearm use. The data also include descriptions of civilian resistance during the incident, such as active aggression, defensive resistance, and weapon display. Further, the data contain information on whether the officer or civilian was injured during the incident and if so, the severity and type of injuries.
I link the use-of-force data with dispatch data, arrest records, and officer demographic data obtained from records requests. Dispatch data contain information on each officer who responds to an incident including the time of dispatch, time en route, and time of arrival for each officer. The dispatch data also contain important call characteristics such as the reason for the call, call priority, date, and location. Dispatch data do not provide information regarding civilian characteristics. Arrest data contain information on each civilian arrested by DPD including race, gender, age, and charges. Arresting officers are identified by badge number. Use-of-force, dispatch, and arrest data are linked using a common identifier which I refer to as the incident number. Officer demographics including race, gender, age, and years of service are contained in separate data sets which I link to the others by badge number.
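The linkage described above amounts to two keyed joins: incident number ties dispatch, arrest, and use-of-force records together, and badge number attaches officer demographics. The following is a minimal sketch of that idea, not the code used for this chapter; the record layouts, field names (`incident`, `badge`, `priority`), and all sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy records; field names and values are illustrative only.
dispatch = [
    {"incident": "A1", "badge": 101, "priority": 2},
    {"incident": "A1", "badge": 102, "priority": 2},
    {"incident": "B7", "badge": 103, "priority": 4},
]
arrests = [{"incident": "A1", "badge": 101, "civilian_race": "black"}]
force = [{"incident": "A1", "badge": 102, "force_type": "joint lock"}]
officers = {
    101: {"race": "white", "gender": "M"},
    102: {"race": "Hispanic", "gender": "F"},
    103: {"race": "black", "gender": "M"},
}


def group_by(rows, key):
    """Group a list of dict records by the value stored under `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def link_calls(dispatch, arrests, force, officers):
    """Build one record per incident number with officers and outcomes attached."""
    arrests_by_incident = group_by(arrests, "incident")
    force_by_incident = group_by(force, "incident")
    calls = {}
    for incident, rows in group_by(dispatch, "incident").items():
        calls[incident] = {
            "priority": rows[0]["priority"],
            # Officer demographics are joined on badge number.
            "officers": [{"badge": r["badge"], **officers[r["badge"]]} for r in rows],
            # Arrest and use-of-force records are joined on incident number.
            "civilians": arrests_by_incident.get(incident, []),
            "arrested": incident in arrests_by_incident,
            "force_used": incident in force_by_incident,
        }
    return calls


linked = link_calls(dispatch, arrests, force, officers)
```

In this toy example, incident `A1` carries two responding officers, one arrest record, and one use-of-force record, while `B7` has dispatch information only, mirroring the fact that civilian characteristics are observed only when an arrest or use of force occurs.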
I consider only 911 calls that do not involve any officer-initiated contact. 4 My full sample period runs from the latter half of 2014 through 2016. 5 My data consist of 1,425,151 unique service calls. Of these calls, 37,682 involve the arrest of one or more individuals (i.e. 2.64 percent of all service calls result in arrest). There are 1,348 calls in which officers use force, or around 0.09 percent of all calls. Some incidents involve more than one civilian, and some civilians are involved in multiple separate incidents at different times.
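The shares quoted above follow directly from the three counts in the text. A short check, using only those counts (the variable names are illustrative):

```python
# Counts as reported in the text.
total_calls = 1_425_151
arrest_calls = 37_682
force_calls = 1_348

# Shares of all service calls, in percent.
arrest_share = 100 * arrest_calls / total_calls
force_share = 100 * force_calls / total_calls

print(f"Calls ending in arrest: {arrest_share:.2f} percent")
print(f"Calls ending in use of force: {force_share:.2f} percent")
```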
3 Throughout my analysis, black, Hispanic, and white are treated as mutually exclusive categories. Unlike many other data sources such as the U.S. Census, DPD does not treat Hispanic as an ethnicity. Thus, civilians are never classified as "white Hispanic" or likewise for other races.
4 Most officers who respond to 911 calls are assigned by dispatchers. However, there are cases where these officers are assisted by others who are nearby and choose to respond of their own volition. I exclude these calls from my analysis.
5 Incident numbers for use-of-force incidents before July 2014 are coded differently and I am unable to link these incidents to dispatch and arrest data.
From the complete dispatch data I identify which officers are available to respond to calls at the time they are made. I identify officers as able to respond to a call if they are on shift when the call comes in and are not preoccupied with other incidents. This allows me to calculate officer availability rates for each call by officer race and gender. Availability rates are used as a measure of the likelihood that an officer of a given race or gender responds to a call. These rates are the basis of my IV approach and are described more fully in the following section.

Table 1 presents summary statistics for call-level data divided into three samples. The first sample contains all 911 service calls in my data. The second contains only calls that result in arrest. The third contains only calls that result in force. Civilian characteristics are not available for all calls; this information is present only in arrest and use-of-force data. Significantly, more officers are dispatched to calls ending in arrest or use of force.

Dispatch and Arrest Protocol
My research design and identification strategy are based primarily on the dispatch protocol used by DPD in the handling of 911 calls. Information on the handling of calls for service comes primarily from contacts within DPD who are familiar with the dispatch process. This information is supplemented by publicly available information from the Dallas Public Safety Committee (2016) as well as the descriptions of DPD dispatch and officer assignment by Weisburst (2019b).
Citizens in Dallas who call 911 for police assistance are connected to a 911 operator. The 911 operator creates an active call report which summarizes the basic information of the call, including location, description of events, and time. Each call report includes a categorization of the incident type that is given by a dispatch code. These call categories are simple, literal descriptions of the event, such as "active shooter," or "domestic disturbance." 6 Some calls are sub-categorized further when this is relevant for dispatch purposes. For instance, a burglary may be coded as in progress, recent, or at an unknown time. The operator is also responsible for identifying the priority of each service call. Priority is measured discretely from one to four, with one being the highest priority and four being the lowest. 7 Level-one calls are emergencies such as shootings and kidnappings in progress for which the goal is officer response in eight minutes or less. Level-two calls are considered "prompt" calls such as robberies and criminal assaults. Here, the goal is officer response in 12 minutes or less. Level-three calls are general service calls such as intoxicated person, drug house, or recent burglary. The goal for level-three calls is officer response in 30 minutes or less. Finally, level-four calls are considered non-critical incidents such as loud music, theft, criminal mischief, and animal complaints. The goal for level-four calls is officer response in 60 minutes or less. The 911 operator inputs all call information into the Computer Aided Dispatch System where it is electronically routed to the dispatch queue for use by dispatchers.
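The four priority levels and their response goals can be summarized as a small lookup table. The sketch below only restates the targets given above; the names `RESPONSE_GOAL_MINUTES` and `met_response_goal` are hypothetical and do not correspond to anything in the Computer Aided Dispatch System.

```python
# Response-time goals in minutes by call priority, as described in the text.
RESPONSE_GOAL_MINUTES = {
    1: 8,   # emergencies, e.g. shootings and kidnappings in progress
    2: 12,  # "prompt" calls, e.g. robberies and criminal assaults
    3: 30,  # general service calls, e.g. recent burglary
    4: 60,  # non-critical incidents, e.g. loud music
}


def met_response_goal(priority, minutes_to_arrival):
    """Return True if an officer arrived within the goal for this priority level."""
    return minutes_to_arrival <= RESPONSE_GOAL_MINUTES[priority]
```

For example, `met_response_goal(1, 10)` is false because a level-one call has an eight-minute goal, while the same ten-minute response satisfies the goal for a level-two call.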
Dispatchers look at the queue of active calls and assign available officers to active incidents. Dispatchers are able to locate officers on a map and identify which officers are available based on whether the officers are involved with any other incidents at the time of the call. Officers are assigned primarily based on location and availability. If there are many more active calls than available officers, lower-priority calls are postponed until higher-priority calls are resolved. Dispatchers also decide the number of officers to assign to a call based on the type of call. More serious incidents (such as shootings and mental health calls) may involve the dispatch of multiple officers.
The primary responders to calls for service are patrol officers, who work 8-hour shifts in one of seven divisions in the city. 8 Divisions are further divided into 234 different beats. Patrol shift schedules are set once a year based on officer seniority. Patrols are typically conducted in police cars, with officers either alone or in pairs. Officers who are geographically close to an incident are more likely to be dispatched to the incident. Proximity is especially important for urgent calls.
Overwhelmingly, calls are assigned to patrol officers who work within the division where the call occurred. Figure 1 shows the proportion of calls the typical officer is assigned to in his nth most common division. This figure shows that around 95 percent of the time, the typical officer responds to 911 calls in his most common division. Officers are rarely pulled from their primary division to respond to 911 calls in other divisions. Similarly, Figure 1 shows the proportion of calls the typical officer is assigned to in his nth most common beat. Officers are dispersed much more broadly across the beats within a division. I take this as evidence that officers are assigned to calls as-good-as-randomly within a division.
response to these calls. They are not reported in the data.
When a dispatched officer arrives at the location of the incident, he gathers information and determines whether an offense occurred. The officer assists the complainant or victim. In the event that an offense occurred, the officer writes an offense report and submits it for review with the DPD. Additional investigation may occur later if the offense is severe enough to warrant it. During the course of the initial officer response, the officer may identify a suspect and decide whether to make an arrest or not. Around 85 percent of all use-of-force incidents in the data occur during the course of an arrest.
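The "nth most common division" statistic behind Figure 1 can be illustrated as follows. This is a sketch under stated assumptions: the "typical" officer is taken to be the mean across officers (the text does not say whether a mean or median is used), and the badge numbers and division labels are made up.

```python
from collections import Counter


def ranked_shares(divisions):
    """Shares of one officer's calls by division, ordered from most to least common."""
    counts = Counter(divisions)
    total = sum(counts.values())
    return [n / total for _, n in counts.most_common()]


def typical_shares(calls_by_officer, depth):
    """Average share across officers at each rank (assumption: mean, padded with zeros)."""
    per_officer = [ranked_shares(d) for d in calls_by_officer.values()]
    result = []
    for rank in range(depth):
        at_rank = [s[rank] if rank < len(s) else 0.0 for s in per_officer]
        result.append(sum(at_rank) / len(at_rank))
    return result


# Hypothetical officers: badge number -> division of each call they answered.
calls_by_officer = {
    101: ["A"] * 19 + ["B"],
    102: ["C"] * 9 + ["D"],
}
shares = typical_shares(calls_by_officer, depth=2)
```

Here the first officer answers 95 percent of calls in his most common division and the second answers 90 percent, so the average share at rank one is 92.5 percent. The same function applied to beats instead of divisions would produce the flatter profile described above.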

Models
In the main analysis, I consider the effect of officer race and gender on the likelihood of use of force occurring during a 911 call. I use an instrumental variables approach to examine this outcome in two different ways. First, I consider differences in the propensity to use force across officer race and gender, irrespective of civilian demographics. Second, I consider how the likelihood of use of force varies when officer race differs from civilian race.
I use the dispatch availability of officers by race and gender as an instrument for whether an officer of that race or gender is present at an incident. More precisely, using the example of black officer availability, my instrument is constructed as the number of black officers available to respond to a service call divided by the total number of officers on shift, as in Equation (1):

\[
\mathit{BlackAvailRate}_c = \frac{\text{number of black officers available to respond to call } c}{\text{total number of officers on shift at the time of call } c} \tag{1}
\]

where c indexes the service call. Officers are considered available if they are working during the shift that the call comes in and are not preoccupied with another call. The instrument is constructed similarly for Hispanic, white, male, and female officers. I use this instrument to deal with potential selection effects from officer sorting that cannot be accounted for by control variables and fixed effects.
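The construction of the availability instrument can be sketched for a single call as follows. This is an illustration only: the `Officer` record, its fields, and the roster are hypothetical, and the sketch takes the denominator to be all officers on shift, as the verbal definition above states.

```python
from dataclasses import dataclass


@dataclass
class Officer:
    badge: int
    race: str
    gender: str
    on_shift: bool  # working the shift during which the call comes in
    busy: bool      # preoccupied with another call when this call comes in


def avail_rate(officers, attr, value):
    """Available officers with `attr == value`, divided by all officers on shift."""
    on_shift = [o for o in officers if o.on_shift]
    if not on_shift:
        return 0.0
    available = [o for o in on_shift if not o.busy and getattr(o, attr) == value]
    return len(available) / len(on_shift)


# Hypothetical roster at the moment one call comes in.
roster = [
    Officer(1, "black", "M", on_shift=True, busy=False),
    Officer(2, "black", "F", on_shift=True, busy=True),
    Officer(3, "white", "M", on_shift=True, busy=False),
    Officer(4, "Hispanic", "F", on_shift=True, busy=False),
    Officer(5, "black", "M", on_shift=False, busy=False),
]
black_avail_rate = avail_rate(roster, "race", "black")
female_avail_rate = avail_rate(roster, "gender", "F")
```

With four officers on shift, one available black officer, and one available female officer, both rates equal 0.25. The same function covers every instrument by changing the attribute and value.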
In support of the IV approach, I conduct balance tests to determine whether my instruments are correlated with other observable call characteristics. I regress each instrument on a set of nine call characteristics including the following: proportions of civilians by race/gender; civilian age; minutes between call and dispatch; call priority; median household income; and the proportion of residents with less than a high school diploma in the Census block group where the call occurred. Balance test results are presented in Table 2. Three of the 36 coefficients are statistically significant, a number consistent with random chance.

To see how the model addresses endogeneity, first consider the following OLS regression, which estimates the difference in overall use of force between calls in which a black officer is present and calls with no black officer:

\[
\mathit{ForceUsed}_c = \beta_0 + \beta_1 \mathit{BlackOfficer}_c + X_c'\gamma + \delta_{\mathit{DayOfWeek}} + \delta_{\mathit{HourOfDay}} + \delta_{\mathit{Division} \times \mathit{Year} \times \mathit{Week} \times \mathit{Shift}} + \varepsilon_c \tag{2}
\]

where c indexes the service call. 9 The outcome variable, ForceUsed_c, is a binary variable equal to one when the call ends in use of force and zero otherwise. BlackOfficer_c is a binary variable which indicates whether a black officer is present at the call. X_c is a set of call characteristics which includes the reason for the call, priority, and time between call and dispatch. The model also includes fixed effects for day of week and hour of day, as well as Division * Year * Week * Shift fixed effects. The choice of these fixed effects is informed by the dispatch protocol described above.
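Returning to the balance tests, the claim that three significant coefficients out of 36 is consistent with chance can be checked with a back-of-the-envelope binomial calculation. The sketch assumes a 5 percent significance level and independent tests; neither assumption is stated in the text, and the tests are unlikely to be fully independent in practice.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k))


n_tests = 36   # coefficients in the balance table
alpha = 0.05   # assumed significance level (not stated in the text)

expected_false_positives = n_tests * alpha
p_three_or_more = prob_at_least(3, n_tests, alpha)

print(f"Expected significant coefficients under the null: {expected_false_positives:.1f}")
print(f"Probability of three or more: {p_three_or_more:.2f}")
```

Under these assumptions roughly two significant coefficients are expected by chance alone, and observing three or more is not unusual.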
The OLS specification will not address any remaining endogeneity that comes from officer sorting that is not already accounted for by my controls. One possibility is that there is an unwritten DPD policy affecting how officers are assigned to calls, such as a proclivity to send female officers to domestic violence calls. There may be other unobservables that influence whether black officers are likely to be dispatched to a call. These unobservables may also influence the likelihood that force is used irrespective of officer characteristics. For example, it may be the case that black officers are more likely to sort into less confrontational calls, which in turn will have a lower likelihood that force is used.
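The IV approach avoids potential selection effects by using only the variation in black officer presence that arises from officer availability. I run the following first stage:

\[
\mathit{BlackOfficer}_c = \pi_0 + \pi_1 \mathit{BlackAvailRate}_c + X_c'\pi_2 + \theta_{\mathit{DayOfWeek}} + \theta_{\mathit{HourOfDay}} + \theta_{\mathit{Division} \times \mathit{Year} \times \mathit{Week} \times \mathit{Shift}} + \nu_c
\]

where c again indexes the service call. The outcome variable BlackOfficer_c is a binary variable equal to one if a black officer shows up to a service call and zero otherwise. BlackAvailRate_c is the instrument which is described above. Controls and fixed effects are the same as in Equation (2). I run the following IV model:

\[
\mathit{ForceUsed}_c = \beta_0 + \beta_1 \widehat{\mathit{BlackOfficer}}_c + X_c'\gamma + \delta_{\mathit{DayOfWeek}} + \delta_{\mathit{HourOfDay}} + \delta_{\mathit{Division} \times \mathit{Year} \times \mathit{Week} \times \mathit{Shift}} + \varepsilon_c
\]

where \widehat{BlackOfficer}_c represents the fitted values from the first stage.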
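The logic of the IV approach can be illustrated on simulated data. The sketch below is not the estimation used in this chapter: it has one instrument, no controls or fixed effects, and a continuous outcome rather than a binary one. In that simplified case the two-stage least squares estimate reduces to the ratio cov(z, y) / cov(z, x). All parameter values, including the "true" effect of -0.5, are made up for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def simulate(n, beta, seed=0):
    """Simulate calls where an unobserved factor drives both officer presence and force."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        avail = rng.random()       # hypothetical availability rate in [0, 1]
        u = rng.gauss(0.0, 1.0)    # unobserved call characteristic (source of sorting)
        present = 1.0 if avail + 0.5 * u > 0.5 else 0.0
        outcome = beta * present + 0.5 * u + rng.gauss(0.0, 0.1)
        z.append(avail)
        x.append(present)
        y.append(outcome)
    return z, x, y


def ols_slope(x, y):
    """Bivariate OLS slope of y on x."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """IV estimate with a single instrument z for x (equals 2SLS in this simple case)."""
    return cov(z, y) / cov(z, x)


TRUE_BETA = -0.5  # made-up effect of officer presence on the outcome
z, x, y = simulate(50_000, TRUE_BETA)
ols_estimate = ols_slope(x, y)
iv_estimate = iv_slope(z, x, y)
print(f"OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}  truth: {TRUE_BETA}")
```

Because the unobserved factor raises both the chance that the officer is present and the outcome, the OLS slope is biased upward, while the availability-based estimate recovers a value close to the simulated effect. This mirrors the concern that officers may sort into calls with different underlying propensities for force.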

Main Results
This section begins with results for the average effect of officer race and gender on use of force. These results are presented in Table 3. Panels A through D display results for the effect of each officer type on use of force. Each column displays the results of a different model. Column 1 presents results from the OLS estimations described in Equation (2) for each officer race and gender. Results for the first stage presented in Column 2 indicate that the instruments are strong. Reduced form and IV results are presented in Columns 3 and 4, respectively. The OLS results differ substantially from the IV results in sign, magnitude, and significance. Based on the main specification in Column 4, I do not find significant evidence that arrests involving the presence of black, Hispanic, or white officers vary in use of force. However, arrests with a female officer present are less likely to result in force.
Next, I consider the effect of officer race on use of force by civilian race. These models estimate the effect of officer race on whether force is used against a black, Hispanic, or white civilian. Each panel represents results for the presence of at least one officer of the given race (e.g. arrests in which at least one black officer is present). Each column represents a different dependent variable. For example, in Column 1 the dependent variable is a binary variable indicating whether force is used against a black civilian or not. Columns 2 and 3 present similar results for force used against Hispanic and white civilians, respectively.
These results suggest that there is a strongly significant increase in use of force against black civilians when white officers are present at arrest. There is also suggestive evidence that the presence of a Hispanic officer at an arrest reduces the likelihood that force will be used against a Hispanic civilian. These results are consistent with those of Hoekstra and Sloan (2020) and Fryer (2018).

Hoekstra and Sloan (2020) examine the effect of officer race on use of force using data for 911 calls from two undisclosed cities. The first city is predominantly black and white, while the second city is predominantly Hispanic and white. The race of the civilian involved in each call is not known in either city. Hoekstra and Sloan address this problem by using the predominant race of the Census Block Group from the call location. The authors "compare how the probability of using force changes for white officers dispatched to white and black neighborhoods, compared to that of black officers." They find in the first city that white officers use force 60 percent more often than black officers on average and that dispatching opposite-race officers increases use of force by 30 to 60 percent. In the second city, they find that white and Hispanic officers use force at similar overall rates, but white officers increase use of force more than Hispanic officers when dispatched to Hispanic neighborhoods. They estimate that dispatching an opposite-race officer in this second city roughly doubles the likelihood of use of force. Hoekstra and Sloan use a difference-in-differences model as their main specification. I modify their specification to account for the idiosyncrasies of DPD dispatch protocol. In the modified specification, ProportionBlackCivilians_c is the proportion of black civilians in the Census Block Group of the call location, Officer_i is an individual officer fixed effect, and X_c is a set of call and officer controls. 10 These results are presented in Table 5.
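The comparison that Hoekstra and Sloan describe is a difference-in-differences in use-of-force rates across officer race and neighborhood type. The sketch below shows only the simplest two-by-two version computed from cell means; it omits the fixed effects and controls of the regression specification, and the rates are invented for illustration rather than taken from any table.

```python
def diff_in_diff(rates):
    """(White officers: black minus white neighborhoods) minus the same gap for black officers.

    `rates` maps (officer_race, neighborhood_race) to a use-of-force rate.
    """
    white_gap = rates[("white", "black")] - rates[("white", "white")]
    black_gap = rates[("black", "black")] - rates[("black", "white")]
    return white_gap - black_gap


# Hypothetical use-of-force rates by officer race and neighborhood type.
example_rates = {
    ("white", "white"): 0.002,
    ("white", "black"): 0.005,
    ("black", "white"): 0.002,
    ("black", "black"): 0.003,
}
estimate = diff_in_diff(example_rates)
```

In this made-up example white officers raise their use of force by 0.3 percentage points when moving from white to black neighborhoods, black officers raise theirs by 0.1 percentage points, and the difference-in-differences estimate is therefore 0.2 percentage points.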
Because Dallas is relatively more diverse than the cities used by Hoekstra and Sloan, I stratify the results by neighborhood type. In Panel A, I use a sample of neighborhoods in Dallas that are at least 80 percent black or white. In Panel B, I use a sample of neighborhoods that are at least 80 percent Hispanic or white. Because the DPD data contain arrest records, I am able to evaluate the reliability of conditioning on arrest when civilian race data is unavailable for all officer-civilian interactions. This sheds light on whether the endogeneity of conditioning on arrest results in significantly different findings. I present the results for both samples. Following Hoekstra and Sloan, I present the results using different levels of controls. Columns 1 and 4 include Division * Year * Week * Shift, DayOfWeek, and HourOfDay fixed effects. Columns 2 and 5 add officer fixed effects. Columns 3 and 6 include the full set of fixed effects and controls, adding call characteristics including call priority, latitude and longitude, time between call and dispatch, median household income, proportion with less than a high school diploma, and call type.
The results in Panel A for black and white neighborhoods are similar across specifications. Estimates for the arrest sample are roughly an order of magnitude larger. These results are positive and significant at the one percent level. Consistent with the main results as well as the results of Hoekstra and Sloan, opposite-race officers are significantly more likely to use force. Specifically, opposite-race officers in these neighborhoods are between 0.34 and 0.40 percentage points more likely to use force in the full sample. Compared to the mean use of force, these results imply increases of 103 to 123 percent. For the arrest sample, opposite-race officers are between 3.2 and 3.5 percentage points more likely to use force, an increase of 61 to 66 percent. The results in Panel B for Hispanic and white neighborhoods are similar across specifications but not across samples. Columns 1 through 3 indicate that opposite-race officers are significantly more likely to use force by between 0.21 and 0.38 percentage points, an increase of 62 to 116 percent. However, in Columns 4 through 6 the effects are significant but negative, suggesting that opposite-race officers reduce the use of force by between 3.3 and 4.6 percentage points, implying a decrease of 61 to 85 percent. These results suggest that conditioning on arrest can lead to different conclusions in some settings.
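The percentage changes reported above are the point estimates divided by the mean rate of force in each sample. As a rough check, the sketch below back-calculates the mean implied by each reported pair of figures. The implied means are an inference from the rounded numbers quoted in the text, not values taken from the tables, and the Panel B arrest-sample effects are entered as magnitudes even though their sign is negative.

```python
# Back out the mean use-of-force rate implied by each reported pair:
# (effect in percentage points, effect as a percent of the mean).
# All inputs are the rounded figures quoted in the text.

def implied_mean(effect_pp, percent_change):
    """Mean rate (in percent) such that effect_pp / mean = percent_change / 100."""
    return effect_pp / (percent_change / 100.0)

reported = {
    "Panel A, full sample": [(0.34, 103), (0.40, 123)],
    "Panel A, arrest sample": [(3.2, 61), (3.5, 66)],
    "Panel B, full sample": [(0.21, 62), (0.38, 116)],
    # Magnitudes only: the estimated effects in this sample are negative.
    "Panel B, arrest sample": [(3.3, 61), (4.6, 85)],
}

implied = {
    label: [round(implied_mean(pp, pct), 2) for pp, pct in pairs]
    for label, pairs in reported.items()
}

for label, means in implied.items():
    print(label, means)
```

Under these figures, force is used in roughly a third of one percent of all calls and in a little over five percent of calls ending in arrest. This is why the arrest-sample estimates are much larger in percentage points yet smaller relative to the mean.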

Officer Timing
In this section, I consider whether the main results are driven by officers who arrive first or those who are dispatched as backup. "First arrivers" are identified by the minute they were dispatched to a 911 call. This allows more detailed consideration of the conditions surrounding scenarios that result in statistically higher use of force. Recall from the main results that calls with a female officer present are significantly less likely to result in use of force. This finding could be spurious if, for example, female officers are assigned as backup to relatively benign calls. The models used in this section are nearly identical to those used for the main results. However, the dependent variable here is equal to one only if force was used by an officer in the first group dispatched. The variable of interest here is whether a black/Hispanic/white/female officer is present in the first dispatched group of officers. The results of these models are presented in Table 6 and are nearly identical to those in Table 4. I find no significant difference in the likelihood of force across officer race overall. Calls involving female officers are less likely to result in use of force. Calls involving white officers significantly increase the likelihood of force against black civilians.
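To make the definition of the first dispatched group concrete, the sketch below flags, for a single call, the officers whose dispatch minute equals the earliest dispatch minute on that call. The record layout and field names (`officer_id`, `dispatch_time`, `female`, `used_force`) and the example records are hypothetical and do not reflect the DPD data schema.

```python
from datetime import datetime

def to_minute(ts):
    """Truncate a timestamp to the minute, since first arrivers are identified by dispatch minute."""
    return ts.replace(second=0, microsecond=0)

def first_dispatched_group(dispatches):
    """Return the officers dispatched in the earliest minute of a call."""
    earliest = min(to_minute(d["dispatch_time"]) for d in dispatches)
    return [d for d in dispatches if to_minute(d["dispatch_time"]) == earliest]

# Hypothetical call with two officers dispatched together and one sent later as backup.
call = [
    {"officer_id": "A", "dispatch_time": datetime(2015, 3, 1, 22, 14, 5), "female": True, "used_force": False},
    {"officer_id": "B", "dispatch_time": datetime(2015, 3, 1, 22, 14, 40), "female": False, "used_force": False},
    {"officer_id": "C", "dispatch_time": datetime(2015, 3, 1, 22, 19, 10), "female": False, "used_force": True},
]

first = first_dispatched_group(call)

# Variable of interest: is a female officer in the first dispatched group?
female_in_first_group = any(d["female"] for d in first)
# Dependent variable: was force used by an officer in the first dispatched group?
force_by_first_group = any(d["used_force"] for d in first)
```

In this example the dependent variable is zero even though force was used on the call, because the officer who used force arrived as backup.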

Heterogeneous Effects
In this section, I explore the heterogeneous effects of race and gender for both officers and civilians. Model specification is the same as in Equations (3) and (4) except the models now also account for officer and civilian gender. Results for the 36 unique types of interactions (e.g. black male officer and black male civilian, black male officer and black female civilian, and so on) are presented in Tables 7 through 9. Each table presents results for a particular officer race further divided into two panels by officer gender. Certain officer-civilian interactions occur infrequently. For example, only 1 percent of all arrests involve a black female officer and Hispanic female civilian. One should be cautious in drawing inferences from these relatively rare cases. Table 7 presents suggestive evidence that arrests involving black male officers and Hispanic male civilians are less likely to result in force, while those involving black male officers and white male civilians are more likely to result in force. These results are significant at the 10 percent level. There is no evidence that arrests involving black female officers vary in use of force across civilian race/gender. Results in Table 8 suggest that arrests with Hispanic male officers and black male civilians are more likely to result in force, the only statistically significant finding for Hispanic officers. Table 9 presents results for white officers. These results suggest that arrests with a white male officer present are more likely to result in use of force against black males and black females. For arrests involving white female officers, force is more likely to be used against black males but less likely to be used against black females. Additionally, arrests with a white female officer present result in an increased likelihood of use of force against white males.
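The count of 36 interaction types follows from crossing three officer races, two officer genders, three civilian races, and two civilian genders. The short sketch below enumerates the cells and groups them the way the tables are organized; the variable names are illustrative only.

```python
from itertools import product

races = ["black", "Hispanic", "white"]
genders = ["male", "female"]

# Every combination of officer race/gender and civilian race/gender.
interactions = [
    {"officer": (o_race, o_gender), "civilian": (c_race, c_gender)}
    for o_race, o_gender, c_race, c_gender in product(races, genders, races, genders)
]

# One table per officer race, each split into two panels by officer gender.
tables = {
    race: {
        gender: [x for x in interactions if x["officer"] == (race, gender)]
        for gender in genders
    }
    for race in races
}

print(len(interactions))
for race in races:
    print(race, {gender: len(cells) for gender, cells in tables[race].items()})
```

Each table therefore covers 12 cells, six per panel, which helps explain why some cells rest on very few arrests.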

Conclusion
Recent high-profile policing controversies have heightened attention on police use of force and racial discrimination by law enforcement. Despite the widespread attention these issues receive from media and the public, little empirical research has been done on this topic. Two recent articles stand out. First, Fryer (2018) uses data from four different samples across the United States to study racial bias in police use of force. Fryer finds that blacks and Hispanics are more likely to experience non-lethal uses of force. However, he finds no evidence of racial bias in use of lethal force when contextual factors are taken into account. Fryer argues that his results are consistent with racial discrimination by some police officers. However, Fryer's paper is complicated by the fact that his samples contain incidents initiated by officers. The decision by officers to initiate contact with civilians may be motivated by racial bias, thereby complicating the analysis. Fryer seeks to address this problem by using a subset of Houston arrests as a benchmark case for situations where force might be expected. However, Fryer is only able to consider cases in which an officer discharges his firearm or his taser.
A recent paper by Hoekstra and Sloan (2020) examines officer use of force in two undisclosed cities, one which is mainly black and white and the other which is mainly Hispanic and white. Using a sample of 911 calls, Hoekstra and Sloan avoid the problems associated with officer-initiated interactions. They find that white officers are considerably more likely to use force overall. Additionally, they find that white officers increase use of force much more than minority officers when dispatched to minority neighborhoods. Specifically, they find that dispatching opposite-race officers to more minority neighborhoods increases use of force by 30 to 60 percent in the first city. In the second city, they find that dispatching an opposite-race officer roughly doubles the likelihood of use of force. This paper builds on the nascent use of force literature using a case study of 911 calls in Dallas, Texas from 2014 through 2016. Using an instrumental variables method with officer availability by race/gender as the instrument, I consider two main research questions. First, I examine whether officers vary in their overall propensity to use force by race and gender. Second, I examine the effect of the interaction of officer and civilian race on the likelihood of use of force. For the former, I find no significant differences in the propensity to use force by officer race. However, I find that calls involving female officers are less likely to result in use of force by roughly 3.3 percentage points. Results regarding the interaction of officer and civilian race are broadly consistent with those of Hoekstra and Sloan (2020). Namely, I find that calls involving white officers are significantly more likely to result in use of force against black civilians. Conditional on arrest, calls involving white officers are between 3.2 and 3.5 percentage points more likely to result in use of force (an increase between 61 and 67 percent over the mean). 
Unconditionally, replicating the results of Hoekstra and Sloan (2020), I find that calls involving white officers increase the likelihood of force by between 0.34 and 0.40 percentage points (an increase of roughly 103 to 123 percent over the mean).
A unique contribution of this paper is the inclusion of arrest data. These arrest data allow me to shed light on the reliability of using arrests as a benchmark when civilian race is not otherwise known. In replicating the results of Hoekstra and Sloan (2020), I find that my results are generally consistent across samples for neighborhoods that are predominantly black and white. These results are also consistent with the main results from my IV method. However, the results of the analysis change significantly when considering neighborhoods that are predominantly Hispanic and white. This suggests that researchers must be careful in using arrests as a benchmark case.
The main findings of this paper are consistent with the popular perception that police officers are racially biased in their use of force. However, this is not the only possible interpretation. In this paper, I identify the effect of the interaction of officer and civilian race on the likelihood of use of force. Without a better understanding of civilian behavior in these encounters, there is not sufficient evidence to support the claim that officers are racially biased. For instance, civilians may be more likely to resist arrest or exhibit confrontational behavior towards officers of a different race. To address this complication, more granular data including civilian resistance are necessary.

Introduction
The possibility that police treat minorities differently than their non-Hispanic white peers has recently become a source of political protest and social unrest in the United States. The sentiment that law enforcement in America is characterized by racial prejudice is widely held, with blacks reporting much lower confidence in police than whites (Pew Research Center, 2020). Police training is often cited by policymakers and the policing community as a key approach to reducing police violence and disparate treatment. The need for more and better police training is one of the few areas where advocates and the policing community can agree on a potential solution. However, there is surprisingly little empirical evidence on the impact of formalized training on police behavior. With an estimated average of $2.2 million spent by individual municipal police departments on training annually ($3.6 million by state police departments), the dearth of empirical research on what works is alarming (Bureau of Justice Statistics, 2006). Not surprisingly, there is even less evidence that training interventions could potentially help to mitigate disparate treatment on the part of police towards minorities.
A number of criminal justice scholars have examined specific one-off police training interventions focused on narrow subjects like procedural justice, social interaction and communication, deescalation, mental health, and others. Although many of those studies have found training to be effective at changing an officer's self-reported attitude, few have examined actual enforcement outcomes or explored their usefulness at mitigating disparities. Recent studies of force training have examined enforcement outcomes and found that these trainings had a large and persistent impact on police behavior (Lim and Lee, 2015; Ba and Grogger, 2018). Basic training interventions in the developing world have also been shown to reduce crime and improve overall police effectiveness (Banerjee et al., 2012; Garcia et al., 2013). However, several recent RCTs conducted in the UK by criminal justice scholars have found that social interaction and racially biased search training did not affect the racial composition of traffic stops and searches (see McLean et al., 2020a; Miller et al., 2020). In our study, we focus on the Texas Highway Patrol (a very large U.S. policing agency) and examine a broad swath of legislatively mandated trainings on cultural diversity, arrest protocol, search and seizure, racial profiling, traffic enforcement, and deescalation.
Using data on 7.5 million police-citizen interactions made by 1,663 newly hired police officers from the 2010 to 2018 cohorts, we exploit quasi-random variation in the timing of when individual officers receive training and estimate a series of event study models. We use this framework to explore whether any of these trainings impacted related enforcement outcomes for a subsample of white non-Hispanic motorists as well as whether they affected the overall racial composition of traffic stops, searches, arrests, or citations. Across the board, we find that training has little to no effect on policing behavior both in terms of related enforcement outcomes and in terms of the racial composition of these outcomes. Given the substantial annual investment in training by the Texas Highway Patrol as well as other U.S. policing agencies, these findings are particularly important from a policy perspective. Further, there is no evidence supporting policy proposals aimed at combatting longstanding policing disparities through more police training. Given recent studies by West (2018a) and Horrace and Rohlins (2018) which find that search disparities decline with experience, our findings suggest that alternative interventions like apprenticeship models might be more effective relative to formalized training.

Texas Highway Patrol Data
In our study, we construct a unique linked dataset that allows us to track the enforcement impact of five legislatively mandated police trainings including (1) Cultural Diversity; (2) Arrest, Search, and Seizure; (3) Racial Profiling; (4) Traffic; and (5) Deescalation training. Through a series of public information requests, we have obtained data on over 25 million traffic stops, searches, and arrests made by approximately 6,500 Texas Department of Public Safety (DPS) peace officers from 2006 to 2019. This data represents the universe of traffic stops made by Texas Highway Patrol during this period. We have also obtained course-level training records from the Texas Commission on Law Enforcement (TCOLE) for individual police officers as well as their administrative data on rank, educational attainment, and demographics. Any peace officer is required to take these particular trainings at least once during their time in "basic proficiency" which ranges from a period of 2 to 6 years of service depending on when they graduate from the academy relative to the ongoing training cycle. The courses range in length and are typically taken in person at a training facility managed by the Texas Commission on Law Enforcement or the Texas Law Enforcement Education Administration. We examine how each of these five trainings impacts the overall racial composition of traffic stops as well as racial differences in related enforcement outcomes (searches, arrests, and numbers of citations).
Key to our analysis, we link the traffic stop data to 800,000 individual course enrollments for the entire career of all officers in our data. We also link the individual officers in our traffic stop data to human resources records that describe their demographics as well as their job title and pay over time. These data allow us to assess whether training is an effective intervention for combating disparities in the application of stops, searches, arrests, and citations by police on minority citizens. We focus on a subset of 1,663 officers from the 2010 to 2018 academy cohorts who were responsible for approximately 7.5 million traffic stops since being hired. We limit our focus to officers observed making traffic stops for at least one year and who made at least one stop per day on average. From the traffic stop sample, we omit stops of commercial vehicles and stops in which an arrest was the only recorded violation, since these were potentially motivated by a call for service or a warrant. Our sample for our main analysis consists of over 4.1 million traffic stops made by officers who received their first Cultural Diversity training within the sample period. The empirical strategy for our analysis relies on an event study design that examines enforcement outcomes immediately before and after a particular officer receives training relative to a control.
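The sample restrictions described above can be sketched as two simple filters, one at the officer level and one at the stop level. The field names (`commercial_vehicle`, `violations`) are hypothetical, and measuring both tenure and the average stop rate over the calendar days between an officer's first and last observed stop is an assumption; the exact definitions used to build the sample may differ.

```python
from datetime import date, timedelta

def officer_qualifies(stop_dates, min_days=365, min_stops_per_day=1.0):
    """Keep officers observed for at least a year who average at least one stop per day.

    Assumption: both quantities are measured over the calendar days between
    the officer's first and last observed stop, inclusive.
    """
    if not stop_dates:
        return False
    span_days = (max(stop_dates) - min(stop_dates)).days + 1
    return span_days >= min_days and len(stop_dates) / span_days >= min_stops_per_day

def stop_qualifies(stop):
    """Drop commercial-vehicle stops and stops whose only recorded violation is an arrest."""
    arrest_only = stop["violations"] == ["arrest"]
    return not stop["commercial_vehicle"] and not arrest_only

# Hypothetical officers: one active daily for 400 days, one with sparse stops, one with short tenure.
start = date(2012, 1, 1)
daily = [start + timedelta(days=k) for k in range(400)]
sparse = [start + timedelta(days=40 * k) for k in range(11)]
short = [start + timedelta(days=k) for k in range(100)]

# Hypothetical stops.
stops = [
    {"commercial_vehicle": False, "violations": ["speeding"]},
    {"commercial_vehicle": True, "violations": ["speeding"]},
    {"commercial_vehicle": False, "violations": ["arrest"]},
    {"commercial_vehicle": False, "violations": ["speeding", "arrest"]},
]
kept = [s for s in stops if stop_qualifies(s)]
```

A stop that records an arrest alongside another violation is kept under this reading; only stops where the arrest is the sole entry are dropped.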

Officer Training
Our primary interest is the effect of mandatory Cultural Diversity training on the racial composition of traffic stops made by Texas Highway Patrol officers. Officers are required to take a minimum of eight credit hours of Cultural Diversity training during their "basic proficiency" period. The training is divided into two four-hour blocks. The first block consists of two modules that all officers are required to take: "Introduction to Diversity" and "Cultural Diversity". For the second block, officers must choose another four-hour course from the remaining set of topic modules in the Cultural Diversity curriculum, which include Generational Diversity; Workplace Diversity; Gender Diversity; and Law Enforcement as a Diverse Culture. These trainings are designed to be "hands on, interactive, and scenario based," with scenarios oriented to day-to-day experiences both on and off the job (Texas Commission on Law Enforcement, 2008). In addition to the standard Cultural Diversity training, we also include closely related trainings such as "Cultural Awareness."
We also examine the effects of four other mandatory trainings on related outcomes and the racial composition of stops. First, we consider the effect of Arrest, Search, and Seizure training, for which officers are required to have a minimum of 15 credit hours. This course is designed to teach officers the grounds for reasonable suspicion, probable cause, detention, search, seizure, and arrest. Second, officers receive Racial Profiling training in several different courses, only some of which are mandatory. For instance, TCOLE course number 3256, "Racial Profiling," is optional, but officers are required to have a minimum of two credit hours in TCOLE course number 3257, "Combined Asset Forfeiture and Racial Profiling." Third, officers are required to take a substantial amount of Traffic training.
TCOLE trainings are unclear on the minimum number of credit hours officers must spend in the various Traffic trainings, but we observe that the average officer in our sample has taken 24 hours. Finally, officers are required to have a minimum of eight credit hours of Deescalation training that is designed to limit the use of force in public interaction, particularly in cases where individuals are in crisis or behaving erratically (Texas Commission on Law Enforcement, 2021).

Methods & Results
We divide our models and results into three different sections. In the first section, we examine the effect of the main training of interest, Cultural Diversity, on racial disparities. We begin by looking at the effect of Cultural Diversity training on the racial composition of traffic stops, specifically the share of blacks and Hispanics stopped. We then consider whether Cultural Diversity training affects racial differences in two post-stop outcomes: the probability of an arrest or search and the number of citations issued. In the second section, we examine whether any of the other trainings affect either the racial composition of stops or racial differences in post-stop outcomes. Finally, we examine whether training has any effect on post-stop outcomes using a sample of white motorists only.

Does Cultural Diversity Training Affect Racial Disparities?
For the estimates that examine the racial composition of enforcement outcomes for black motorists, we limit our sample to black and white non-Hispanic motorists and estimate a linear probability model, Equation (1), in which the dependent variable $1[black_{i,o}]$ indicates that the motorist stopped by officer $o$ in incident $i$ was black. The primary explanatory variables $\sum_{t=1}^{T} 1[run_{i,o,t}]$ are a series of event time dummies indicating the week in which the stop was made relative to the first time an officer receives training in a given area, beginning nine weeks before that training. The event time dummies are interacted with the variable $credits_{t,o}$, which indicates the credit hours an officer has received in Cultural Diversity training at time $t$. We include all stops made by the officer earlier than nine weeks before their first training as the omitted category but drop those later than nine weeks after the training. All specifications also include controls for date $\gamma_t$, hour of day $h_i$, county by route $l_i$, and the number of credits in other non-focal training courses $\lambda_{t,o}$. Additional, more rigorous specifications also include controls for motorist characteristics, officer characteristics, and officer fixed effects. Estimation for Hispanic motorists is analogous to the model described by Equation (1).
Figure 1 presents point estimates and 95 percent confidence intervals from the application of Equation (1) to the legislatively mandated Cultural Diversity training. As discussed above, the baseline model includes controls for date, hour of day, county by route, and number of other training courses. The baseline model is represented in Figure 1 as Specification 1 (red), which estimates the effect of the training on the black (Hispanic) share of stopped motorists before and after the training. Specification 2 (green) introduces controls for motorist characteristics (gender and vehicle characteristics) and Specification 3 (purple) also includes officer characteristics (race and gender). Finally, Specification 4 (blue) replaces the officer characteristics with officer fixed effects. Although the likelihood that a stopped motorist is black appears to drop by a small amount after the training, the effect is statistically insignificant and the standard errors are consistently about half a percentage point. Similarly, we find no evidence of an effect of Cultural Diversity training on the likelihood that a stopped motorist is Hispanic. We conclude that the Cultural Diversity training appears to have little if any impact on the overall racial composition of traffic stops. Because results are similar across specifications, we use Specification 4 as our main model for subsequent results.
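One way to write the linear probability model described above, using the variables as they are defined in the text, is sketched below. The intercept $\alpha$, the coefficients $\beta_t$, and the error term $\varepsilon_{i,o}$ are assumed notation, and the exact form of the original specification may differ.

```latex
\begin{equation*}
1[\mathit{black}_{i,o}] = \alpha
  + \sum_{t=1}^{T} \beta_t \left( 1[\mathit{run}_{i,o,t}] \times \mathit{credits}_{t,o} \right)
  + \gamma_t + h_i + l_i + \lambda_{t,o} + \varepsilon_{i,o}
\end{equation*}
```

Under this reading, each $\beta_t$ measures the change in the probability that a stopped motorist is black per credit hour of training in event week $t$, relative to stops made more than nine weeks before the officer's first training.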
Next, we examine the effect of Cultural Diversity training on racial differences in post-stop outcomes. Again, we limit the sample to black (Hispanic) and white non-Hispanic motorists and estimate a linear probability model, Equation (2), in which the dependent variable $1[outcome_{i,o}]$ indicates that the motorist stopped by officer $o$ in incident $i$ experienced one of the two post-stop outcomes we consider. Namely, we explore whether training has any effect on the probability of arrest/search or the number of citations. Again, estimation for Hispanic motorists is analogous to the model described by Equation (2). Figure 2 presents results for the effect of Cultural Diversity training on these other outcomes. We find no compelling evidence that Cultural Diversity training has any impact on racial differences in post-stop outcomes for either black or Hispanic motorists. Taken together with our previous results for the racial composition of stops, these results suggest that the effectiveness of Cultural Diversity training in mitigating racial disparities is limited.

Do Other Trainings Affect Racial Disparities?
Finding no effect for Cultural Diversity training, we consider whether other trainings affect relevant racial outcomes. In particular, we begin by estimating the effect of Traffic training on the racial composition of stops using Equation (1). These results are presented in Figure 3, where the effect of the Traffic training is represented in blue and the outcome is measured by the left axis. As with Cultural Diversity, there is no indication of any effect of Traffic training on the racial composition of stops for black or Hispanic motorists.
Although neither Cultural Diversity nor Traffic training appears to affect the racial composition of stops, one might reasonably expect that trainings do affect related post-stop outcomes. For instance, it is plausible that Arrest/Search training could affect the racial composition of arrests and searches. We therefore apply Equation (2) to the Arrest/Search and Deescalation trainings. These results are also presented in Figure 3, where Arrest/Search training effects are represented in red and Deescalation training effects are represented in green. The arrest/search outcome is measured by the right axis. We find no discernible impact of either training on racial disparities in the arrest/search outcome.
Finally, we estimate the effect of Racial Profiling training on the racial composition of traffic stops using Equation (1). These results are presented in Figure 4. As with Cultural Diversity and Traffic, we find no effect on the racial composition of stops.

Does Training Affect Outcomes for White Motorists?
Having found no evidence of training effects on racial outcomes, we consider whether these trainings have any effect at all on relevant outcomes. We do so by restricting our sample to white motorists and estimating Equation (3), where $1[Arrest/Search_{i,o}]$ is an indicator for whether an arrest or search occurred during the stop. Results for these models are presented in Figure 5 for Arrest/Search and Traffic trainings. The trainings do not appear to have any effect on the arrest/search outcome for white motorists.

Parametric Models
Finally, we estimate a series of parametric models to obtain more precise estimates of the null effects from the preceding models. We estimate three categories of models. The first is the Event Study model given by Equation (4), which eliminates the $1[run_{i,o,t}]$ term from Equation (1) and ignores the possibility of a pretrend or trend. This model introduces the term $1[Treatment_{i,o,t}]$, which is equal to zero prior to the training week and one after an officer receives training. Second, we estimate a Pre-Trend model in which the term $1[Pretrend_{i,o,t}]$ is equal to zero prior to the training week window, increments by one for each of the nine weeks prior to the training week, and remains at nine in the following weeks. Third, we estimate a Trend model in which $1[Trend_{i,o,t}]$ is equal to zero prior to the training week window, increments by one for each of the 18 weeks of the training window (nine weeks before training and nine weeks after), and remains at 18 for the following weeks. Results for these models are presented in Table 3. As expected, the treatment effects are more precisely estimated and still nearly all null. However, there is suggestive evidence of a pre-trend in the model estimating the effect of Cultural Diversity training on stops of Hispanic motorists.
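The three regressors described above are a step function and two capped ramps in event time. The sketch below builds them from a stop's week relative to training. The function name and the convention that relative week 0 is the training week, counted as treated and as the first of the nine post-training weeks, are assumptions made for illustration; the text does not pin down how the training week itself is coded.

```python
def event_time_regressors(rel_week, pre_weeks=9, post_weeks=9):
    """Treatment, Pretrend, and Trend values for a stop made rel_week weeks from training.

    Assumed convention: rel_week = 0 is the training week, negative values are
    weeks before training, and the training week counts as treated.
    """
    # Position within the training window: 1 in its first week, 0 or less before it.
    ramp = rel_week + pre_weeks + 1
    treatment = 1 if rel_week >= 0 else 0
    # Counts up over the nine pre-training weeks, then stays at nine.
    pretrend = min(max(ramp, 0), pre_weeks)
    # Counts up over all 18 weeks of the window, then stays at 18.
    trend = min(max(ramp, 0), pre_weeks + post_weeks)
    return {"treatment": treatment, "pretrend": pretrend, "trend": trend}

for week in (-10, -9, -1, 0, 8, 30):
    print(week, event_time_regressors(week))
```

Capping the ramps means that stops made well after the window contribute only through their final level, so the Pretrend and Trend coefficients are identified from changes within the window.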

Discussion and Conclusion
In this paper, we use a sample of 7.5 million police-citizen interactions made by 1,663 Texas Highway Patrol officers to explore the effects of five mandatory police trainings on a variety of enforcement outcomes. We exploit quasi-random variation in the timing of when individual officers receive training and estimate a number of event study models. We begin by focusing our attention primarily on the eight credit hour Cultural Diversity training and its effect on the racial composition of traffic stops and other racial outcomes involving arrest and search and the number of citations issued. Finding no evidence that Cultural Diversity training affects racial outcomes, we ask whether other trainings (namely, Arrest/Search, Racial Profiling, Traffic, and Deescalation) affect racial outcomes. Again, we find no evidence that training affects either the black (Hispanic) share of stopped motorists or racial differences in the probability of arrest/search or the number of citations. Further, we find no evidence of an effect of training on these training-related outcomes for a subsample of white motorists.
Several explanations could account for our null results, and each has different implications for policy. First, it may be the case that officers are simply not receiving enough training to impact their behavior. For each of the trainings considered here (except Traffic), officers are required to take between 8 and 13 credit hours. In practice, this may be between one and two days' worth of training. If an officer is racially biased, the length and intensity of training may be insufficient to overcome biases developed over the course of a lifetime. It is possible that more training is required to have a meaningful impact on officer behavior. Second, and importantly, we cannot determine whether officers are racially biased here. Observed racial disparities in policing outcomes do not necessarily imply racial bias if motorists in our sample vary by race in their propensity to violate traffic or criminal law. Though we cannot test this hypothesis directly, it has important policy implications. If officers are already enforcing the law impartially and continue to do so after receiving training, we should not expect to observe any impact on racial outcomes.
A third possibility, consistent with the previous two, is that officers do not learn primarily by training and instead learn by experience or mentoring. West (2018a) and Horrace and Rohlins (2018) observe that search disparities decline with experience. In light of our null results for police training, policymakers seeking to mitigate racial disparities might consider alternatives such as apprenticeship models.
A final possibility that must be addressed is that the specific training models used by the Texas Commission on Law Enforcement are not effective. In this case, more effective trainings could possibly be devised. Given the substantial amount of investment and heightened public interest in police training, our findings suggest that policymakers should exercise caution in expanding training as a means of mitigating racially disparate treatment.

Chapter 3: Estimating Effects of Affirmative Action in Policing
Coauthored with Maryah Garner and Anna Harvey

Introduction
In the aftermath of Michael Brown's death in Ferguson, Missouri, renewed attention was directed to the demographic composition of law enforcement agencies. In the summer of 2015, 2 out of 3 residents of Ferguson, but only three out of its 53 police officers, were black (U.S. Department of Justice, 2016). This disparity gave rise to questions about whether increasing the proportion of black officers on the Ferguson police force, or in law enforcement agencies more generally, would lead to different public safety outcomes. On the one hand, increasing the proportion of nonwhite police officers may lead to an increased acceptance of policing, fewer instances of the use of force, and fewer citizen complaints in nonwhite communities (Tyler, 2005), leading in turn to increases in the reporting of crime by nonwhite victims, and in the willingness of civilians to cooperate with police investigations into these crimes (Miller and Segal, 2018). Nonwhite officers may also exert greater effort, or be better equipped, to clear crimes in largely nonwhite neighborhoods. These mechanisms might lead to decreases in nonwhite crime victimization. On the other hand, increasing the proportion of nonwhite police officers may lead to decreases in policing quality if nonwhite applicants to police agencies are less qualified than white applicants, and/or if the effort of white officers is reduced due to morale effects (Lott, 2000). These mechanisms might lead to increases in nonwhite and/or white crime victimization. Detecting the presence of these potentially offsetting and racially heterogeneous effects is a challenging exercise.
Existing work on this question has generally not attempted to identify the specific causal pathways through which agency racial composition may affect policing outcomes, seeking instead to estimate average effects of agency racial composition on outcomes such as reported offense and arrest rates [Lott, 2000, McCrary, 2007]. Yet analyzing the effect of the racial composition of law enforcement agencies on these outcomes is complicated by the endogeneity of hiring and retention practices to other agency-specific factors that may also affect outcomes. To achieve identification, researchers have sought to leverage the incidence and timing of affirmative action litigation [Lott, 2000, McCrary, 2007]. This work has yielded inconsistent results regarding the effects of racial diversity on policing outcomes. Existing work has not, however, implemented recent econometric advances in difference-in-differences estimation.
In this paper, we replicate and extend the work of Lott (2000) and McCrary (2007) using modern difference-in-differences methods, including the difference-in-differences decomposition method developed by Goodman-Bacon [2019] as well as the generalized synthetic controls method developed by Xu [2017]. Ultimately, like McCrary (2007), we do not find a significant average effect of court-imposed affirmative action plans on the rates of reported offenses or reported offenses cleared by arrest. We also extend the effort of McCrary (2007) to analyze whether unlitigated agencies change their practices due to the threat of litigation, raising concerns of stable unit treatment value assumption (SUTVA) violations. We estimate whether the total number of agencies receiving court-imposed affirmative action plans at time t has a differential impact on the rates of reported offenses or offenses cleared by arrest for agencies that will never be litigated, relative to agencies that will eventually be litigated. Like McCrary (2007), we find little evidence of spillover effects. Finally, we return to the question of identifying the specific and possibly racially heterogeneous causal mechanisms linking affirmative action litigation and public safety outcomes, suggesting that this would be a productive avenue for future research.

Background
Police departments have experienced some of the most aggressive affirmative action programs ever implemented in the United States [McCrary, 2007, Miller and Segal, 2012]. Beginning in the late 1960s with a number of employment discrimination lawsuits, federal courts began mandating affirmative action plans with the intended effect of increasing the shares of nonwhite and female police officers. Court-imposed affirmative action plans often take the form of hiring quotas, but also may affect standards for promotion. Some police departments are still under affirmative action plans today, often from court-imposed plans going back to the 1970s. The justification for such affirmative action plans may be to rectify past discrimination, and/or to promote the compelling government interest in increasing the effectiveness of police departments at detecting and interdicting crime. There are several reasons one might expect more racially diverse police departments to be more effective at policing. It has long been recognized that minority groups tend to be suspicious of police and the criminal justice system more generally (U.S. Kerner Commission, 1968). 1 Further, as Donohue and Levitt (2001) point out, "conflicts between police and citizens have been the flashpoint for virtually every recent urban riot." This statement remains true in more recent years (for instance, the 2014-2015 Ferguson riots and the 2015 Baltimore riots). Lack of trust may lead nonwhite civilians to be less likely to report crimes to police, to cooperate with investigations, and to take instructions from law enforcement, in the presence of a largely white police force. Increasing the proportion of nonwhite officers in racially diverse communities may lead to nonwhite victims becoming more likely to report crimes, and to members of these communities becoming more likely to cooperate with the police to help solve crimes reported by nonwhite victims.
Nonwhite police officers may also exert greater effort to detect and clear crimes occurring in nonwhite neighborhoods. Nonwhite officers may also better understand the cultural norms of predominantly nonwhite communities. One might also expect police officers to be less likely to racially discriminate against members of their own race. Nonwhite police officers may also perceive largely nonwhite neighborhoods to be less hostile, relative to white police officers. 2 These mechanisms may contribute to increased reporting by nonwhite crime victims, and to decreased incidence of nonwhite crime victimization.
On the other hand, there are potential adverse effects of more racially diverse police departments. In order to implement court-imposed affirmative action plans, some police departments have had to change entrance standards. Police force entrance requirements vary by location, but generally contain several basic components, including criminal background checks and physical examinations. Prior to the wave of employment discrimination lawsuits in the 1970s, it was standard to use entrance examinations that tested the cognitive abilities of prospective officers. These exams tested aptitudes pertaining to reading comprehension, verbal reasoning, analogies, and, in some cases, I.Q. Black applicants historically performed worse than white applicants on these entrance exams (McCrary, 2007). Federal courts have intervened in the examination process since the 1970s, often mandating that entrance standards be changed if they have a disparate impact on a particular group and the standards are not shown to relate to job performance. To deal with pressures from the federal judiciary, some police departments have removed cognitive tests altogether in order to increase nonwhite recruitment. Others have simply reduced standards (Lott, 2000). The relevant question to ask is whether these aptitude tests reliably screen applicants for qualities important to police work. If affirmative action litigation leads to the lowering of entrance standards, and those standards are reliable indicators of future job performance, we would expect affirmative action to lead to worse policing outcomes.
1 ... attitudes. The atmosphere of hostility and cynicism is reinforced by a widespread belief among Negroes in the existence of police brutality and in a 'double standard' of justice and protection-one for Negroes and one for whites."
2 Groves and Rossi (1970) suggest that the perceptions of hostility toward police are projections of the fears and prejudices of white police officers themselves.
Many employment discrimination lawsuits were brought by private litigants beginning in the late 1960s. The U.S. Department of Justice (DOJ) became involved in these discrimination lawsuits after 1972. If a court found that a police department entrance exam disproportionately affected black applicants and was not reliably related to job performance, the court would order that the department devise a new test that either did not disparately affect black applicants or was job-related. There would typically be a one- to three-year lag before a hiring quota was imposed (McCrary, 2007). In some cases, hiring quotas would end once a goal had been reached (for instance, a police department might be required to hire a certain percentage of black officers each year until the agency reached a target proportion of black officers in its force). In other cases, hiring quotas lasted until terminated by the judiciary.
Several papers have considered the employment effects of affirmative action litigation. Lott (2000) and McCrary (2007) both find significant increases in black police employment following affirmative action litigation, with McCrary finding a post-litigation 14 percentage point gain in the fraction of black officers among newly hired officers. Miller and Segal (2012) also find persistent and significant employment effects for black police officers as a consequence of litigation. Their finding holds even for departments for which affirmative action is eventually terminated. Miller and Segal (2012) also find a significant divergence in black employment between agencies that continued affirmative action and those that ended it, with larger persistent gains in black employment in the former agencies. Additionally, Miller and Segal (2012) find that there is an important distinction between simply experiencing litigation, and actually having a court-imposed affirmative action plan. Departments that are litigated but not required to implement affirmative action see increases in black employment, but at a lower rate than those with court-imposed affirmative action plans.
In addition to looking at the impacts of affirmative action on employment, several papers discuss how increased diversity affects policing outcomes. Donohue and Levitt (2001) examine how the racial composition of police departments affects racial patterns of arrest, using the racial composition of fire departments as an instrument for the racial composition of police departments. They find that increases in the percentage of black police officers lead to higher arrest rates for whites but not for nonwhites. Similarly, increases in the percentage of white police officers lead to higher arrest rates for nonwhites, but not for whites. These patterns particularly hold for minor offenses. Lott (2000) and McCrary (2007) consider how affirmative action litigation affects reported crime and arrest rates. Using logged annual rates of per capita reported violent and property crime between 1987 and 1994 for a sample of 495 cities, Lott (2000) estimates the impacts of 19 consent decrees signed by the Department of Justice and a city's policing agency and still in force by 1987. In both reduced form and instrumental variable models, Lott finds that consent decrees lead to substantial increases in reported crime rates, and weakly lead to decreases in arrest rates. He interprets these effects as being the result of lower hiring standards.
However, McCrary (2007) suggests that this mechanism is implausible for several reasons. First, consent decrees typically led to hiring practices in which applicants were evaluated relative to other applicants of the same race. Under these hiring practices, no reduction of standards was required for nonblack applicants. Second, while hiring standards may have been reduced in some cases to eliminate disparate impact, in many cases entrance exams were modified to be more closely related to job performance. Third, looking at test score distributions for the New York City Police Department, McCrary (2007) finds that hiring quotas impacted test scores of new hires "only minimally." Further, the consent decrees used in Lott's sample include neither affirmative action plans resulting from private litigation nor externally-imposed affirmative action plans that were terminated prior to 1987.
McCrary (2007) estimates event study models of the effects of affirmative action litigation on reported offense and arrest rates. Using a sample of 314 large municipal police departments, he finds little evidence that affirmative action litigation impacted reported city-level crime and arrest rates. He suggests that there may be "a complex series of effects that offset one another." However, McCrary (2007) does not distinguish between two types of litigated cities: those that implemented court-imposed affirmative action plans, and those that did not. In addition, difference-in-differences methods developed after the publication of McCrary (2007), including the difference-in-differences decomposition method developed by Goodman-Bacon (2019), as well as generalized synthetic controls developed by Xu (2017), may lead to different insights. Miller and Segal (2018) use affirmative action litigation, data from the National Crime Victimization Survey, and FBI data on intimate partner homicides to examine how increasing female representation among police officers impacts the incidence of domestic violence and intimate partner homicide, and the reporting of domestic violence. They find that violent crimes against women are reported at higher rates when female representation increases in law enforcement agencies. Further, greater female representation in policing agencies leads to significant declines in the rates of intimate partner homicide and non-fatal domestic abuse. In both instrumental variables and reduced form models, they find similar effects from affirmative action litigation. They find no effects of affirmative action litigation or increased female officer shares on the reporting or incidence of crimes committed against male victims.
Our paper primarily focuses on replicating and extending the estimates of the impacts of affirmative action litigation on crime reported by Lott (2000) and McCrary (2007). However, in the discussion section we return to the question of identifying the causal pathways by which race-based affirmative action may affect policing outcomes, including possibly racially heterogeneous treatment effects.

Affirmative Action Litigation Data
Our data on affirmative action litigation are sourced from Miller and Segal (2012), who constructed the most complete currently available legal database of affirmative action litigation involving police departments. Previous data sets, such as the one used by McCrary (2007), did not distinguish between unsuccessful litigation and successful litigation leading to court-imposed affirmative action plans. Additionally, the data set constructed by Miller and Segal (2012) looks at affirmative action litigation specifically addressing the employment of police officers, rather than the employment of all police department employees (which includes clerical and janitorial positions).
Miller and Segal constructed their litigation data by first looking at employment data from confidential EEO-4 reports filed with the Equal Employment Opportunity Commission (EEOC) between 1973 and 2005. They examined reports from 479 of the largest state and local law enforcement agencies in the United States. Miller and Segal then searched for legal records pertaining to discrimination in employment for each agency using the LexisNexis and Westlaw federal case databases. They gathered information on the actual litigation, including whether affirmative action was implemented and when it ended (if applicable), and the protected group. This information was cross-referenced with data from the U.S. Department of Justice (DOJ) and the databases used in McCrary (2007) and Lott (2000). Miller and Segal (2012) find that of the 479 agencies examined, 140 affirmative action cases were brought either by private plaintiffs or by the DOJ between 1969 and 2000. Of the 140 agencies which experienced litigation, 117 saw the implementation of affirmative action plans, and 23 saw litigation that did not result in court-imposed affirmative action. Among the 117 agencies that experienced court-imposed affirmative action, 67 of these agencies saw the eventual termination of the program. The mean duration of the 67 plans that terminated was 16 years. Miller and Segal (2012) also report that 96% of the affirmative action plans for which the protected group can be determined involve the employment of black officers.

UCR and CPS Data
We match the agencies in Miller and Segal's affirmative action database to the agencies reporting annual crime and arrest data in the FBI's Uniform Crime Reporting (UCR) program. 3 Agencies that report zero crimes or arrests in any year are treated as missing data in that year; all agencies missing annual data are dropped from the sample. We identify the county within which each agency is located, and match counties to annual metropolitan statistical area (MSA) population and demographic data sourced from the U.S. Census Bureau's Current Population Survey (CPS). 4 We are left with a sample of 99 agencies in the Miller and Segal affirmative action database that are located within an MSA and that consistently report annual crime and arrest data in the UCR between 1964 and 2011. Of these 99 agencies, 27 agencies implement court-imposed affirmative action programs at some point. Among these 27 agencies, 12 agencies have programs that terminate during our sample period, and 15 have programs that do not. The mean time treated is approximately 25.7 years (20.4 years for agencies whose affirmative action plans end; 30.8 years for agencies whose affirmative action plans do not end). Figure 1 reports a histogram of the years in which affirmative action plans were imposed. Figure 2 shows where treated and untreated agencies are located.
Figure 1: Timing of Court-Imposed AA Plans
Figure 2: Location of Treated and Untreated Agencies
Following Lott (2000) and McCrary (2007), we focus on two sets of outcome variables: the natural logarithms of rates of reported violent and property crime offenses per 100,000 in population, and arrest rates, defined as the numbers of violent and property crime offenses cleared by arrest, divided by the numbers of violent and property crime index offenses, between 1964 and 2011. Violent crime is the aggregation of four index crimes in the UCR data: murder, rape, robbery, and aggravated assault. Property crime is the aggregation of burglary, theft, and motor vehicle theft. Summary statistics for the departments in our sample are presented in Table 1.
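The outcome definitions above can be illustrated with a minimal sketch. The function names and the example counts are hypothetical and are not taken from our data; the sketch only assumes that zero counts are treated as missing, as described for the UCR sample.

```python
import math

# Index crimes aggregated into each category, as listed in the text.
VIOLENT = ("murder", "rape", "robbery", "aggravated_assault")
PROPERTY = ("burglary", "theft", "motor_vehicle_theft")


def log_rate_per_100k(offenses, population):
    """Natural log of reported offenses per 100,000 in population.

    Returns None when the count is zero, mirroring the rule that
    zero reports are treated as missing data.
    """
    if offenses <= 0 or population <= 0:
        return None
    return math.log(offenses / population * 100_000)


def arrest_rate(cleared_by_arrest, offenses):
    """Offenses cleared by arrest divided by reported index offenses."""
    if offenses <= 0 or cleared_by_arrest <= 0:
        return None
    return cleared_by_arrest / offenses


# Hypothetical agency-year counts, for illustration only.
counts = {"murder": 10, "rape": 40, "robbery": 250, "aggravated_assault": 200}
violent_total = sum(counts[c] for c in VIOLENT)
violent_log_rate = log_rate_per_100k(violent_total, population=100_000)
violent_arrest_rate = arrest_rate(cleared_by_arrest=125, offenses=violent_total)
```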
We use the MSA-level demographic data from the CPS only to match departments in the generalized synthetic control analyses. Summary statistics for the demographic variables for all, treated, and untreated agencies are presented in Table 2.

Two-way Fixed Effect Difference-in-Differences Models
The canonical difference-in-differences model compares pre-post changes in outcomes in treated units to pre-post changes in outcomes in untreated units for a single treatment [Goodman-Bacon, 2019]. In our data, 72 agencies are untreated units not subject to externally imposed affirmative action and 27 agencies are treated. However, treatment timing varies across treated units. The accepted empirical strategy in this context is the two-way fixed effect difference-in-differences model (2WFE DD), as in Equation (1):
$$\text{OffenseRates/ArrestRates}_{it} = \beta_0 + \beta_1 \mathit{Treat}_{it} + \alpha_i + \theta_t + \varepsilon_{it} \tag{1}$$
where $\mathit{Treat}_{it}$ is a binary variable equal to one when a unit is subject to a court-imposed affirmative action plan and equal to zero otherwise; $\theta_t$ is a time vector containing indicators for the 48 years from 1964 to 2011; $\alpha_i$ is a unit vector containing indicators for the 99 agencies; and $\varepsilon_{it}$ is the error term. Standard errors are clustered by agency. The average treatment effect on the treated (ATT) is given by $\beta_1$. We estimate the ATT of externally-imposed affirmative action on logged violent and property crime rates per capita and on violent and property crime arrest rates between 1964 and 2011.
The 2WFE DD model specified above captures average treatment effects on the treated, but does not allow us to consider time-varying treatment effects. There are several reasons to expect the effects of affirmative action plans to vary over time. First, court-imposed affirmative action plans often take time to implement. Further, once implementation begins it takes time for the racial composition of police departments to change significantly, due to the nature of hiring quotas. To account for potentially time-varying treatment effects, we implement difference-in-differences decomposition [Goodman-Bacon, 2019].
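As an illustration of the estimator, the following is a minimal sketch of a two-way fixed effects DD regression on a balanced panel, with standard errors clustered by unit. It is not the code used for our estimates: the function name `twfe_dd`, the simulated panel, and the adoption periods are all hypothetical, and the clustered variance omits small-sample corrections that standard software typically applies.

```python
import numpy as np


def twfe_dd(y, treat):
    """Two-way fixed effects DD on a balanced panel.

    y, treat: arrays of shape (n_units, n_periods).
    Returns (beta, se), with se clustered by unit (no small-sample correction).
    """
    def demean(a):
        # Within transformation removing unit and period fixed effects;
        # this is exact only for balanced panels.
        return (a - a.mean(axis=1, keepdims=True)
                  - a.mean(axis=0, keepdims=True) + a.mean())

    y_t = demean(np.asarray(y, dtype=float))
    d_t = demean(np.asarray(treat, dtype=float))
    sxx = (d_t ** 2).sum()
    beta = (d_t * y_t).sum() / sxx
    resid = y_t - beta * d_t
    scores = (d_t * resid).sum(axis=1)  # one score per unit (cluster)
    se = np.sqrt((scores ** 2).sum()) / sxx
    return beta, se


# Simulated example: 99 units, 48 periods, 27 units with staggered adoption.
rng = np.random.default_rng(0)
n_units, n_periods, true_effect = 99, 48, -0.1
start = np.full(n_units, n_periods)         # never-treated units never switch on
start[:27] = rng.integers(5, 30, size=27)   # hypothetical adoption periods
treat = (np.arange(n_periods)[None, :] >= start[:, None]).astype(float)
y = (rng.normal(size=(n_units, 1))          # unit effects
     + rng.normal(size=(1, n_periods))      # period effects
     + true_effect * treat
     + rng.normal(scale=0.2, size=(n_units, n_periods)))
beta, se = twfe_dd(y, treat)
```

Because the simulated effect is constant across units and over time, the 2WFE DD estimate is close to the true effect here; the decomposition discussed next addresses the case where that homogeneity fails.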

Difference-in-Differences Decomposition
The 2WFE DD estimate is composed of a weighted average of treatment effects estimated from a series of 2x2 treatment/control groups, some of which compare agencies treated at the same time to untreated agencies, and some of which compare agencies treated at the same time to agencies treated at another time (earlier or later). 5 As reported in Figure 1, there are 12 timing groups in our data, or groups of agencies which experience the imposition of externally-imposed affirmative action in the same year. There are thus 144 distinct 2x2 treatment/control comparison groups from which the 2WFE DD estimate is constructed: 132 groups in which earlier-treated agencies are compared to later-treated agencies, or vice versa, and 12 groups in which treated agencies are compared to untreated agencies. In the presence of time-varying treatment effects, comparisons between earlier and later treated units may introduce bias into the 2WFE DD estimate. The extent of the bias depends on the share of the 2WFE DD estimate that is derived from these earlier-later comparisons, which in turn depends on group size and the variance of the treatment [Goodman-Bacon, 2019].
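The count of 2x2 comparisons follows directly from the number of timing groups. A minimal sketch of that arithmetic is given below; the function name is hypothetical.

```python
def count_2x2_comparisons(n_timing_groups, has_never_treated=True):
    """Count the 2x2 comparisons underlying a 2WFE DD estimate.

    Each ordered pair of distinct timing groups contributes one comparison
    (earlier vs. later and later vs. earlier are counted separately), and
    each timing group is also compared to the never-treated group.
    """
    timing_only = n_timing_groups * (n_timing_groups - 1)
    vs_untreated = n_timing_groups if has_never_treated else 0
    return timing_only, vs_untreated, timing_only + vs_untreated


timing_only, vs_untreated, total = count_2x2_comparisons(12)
```

With 12 timing groups this gives 132 timing-group comparisons and 12 treated-versus-untreated comparisons, for 144 in total.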
Goodman-Bacon [2019] has developed a method to decompose the 2WFE DD estimate into the 2x2 weighted estimates from which it is derived. Using this difference-in-differences decomposition model, we can uncover the extent to which the 2WFE DD estimate depends on 2x2 DD estimates which compare earlier to later treated agencies. The Goodman-Bacon decomposition model is currently only available for strongly balanced panels in which treatment only changes from 0 to 1 over time. To estimate the decomposition model, we define treatment as a binary variable that is equal to one in all years after an affirmative action plan is imposed on an agency, and is equal to zero otherwise. 6 We report both the 2WFE DD estimate for this model, as well as the DD estimates and weights for the three categories of treatment/control comparison groups from which the 2WFE DD estimate is derived.
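The two treatment definitions can be contrasted in a short sketch. The function name and the example dates are hypothetical. The sketch assumes that the year of imposition counts as treated and that the year of termination counts as untreated; the text does not settle either convention.

```python
def treatment_indicators(years, imposed, ended=None):
    """Build two treatment indicators for one agency.

    in_force:  1 only while a court-imposed plan is in effect (DD model).
    absorbing: 1 from imposition onward, never switching back to 0
               (the definition required for the decomposition).
    imposed is None for never-treated agencies; ended is None when the
    plan does not terminate during the sample period.
    """
    in_force, absorbing = [], []
    for year in years:
        started = imposed is not None and year >= imposed  # assumption: imposition year is treated
        still_on = started and (ended is None or year < ended)  # assumption: end year is untreated
        in_force.append(int(still_on))
        absorbing.append(int(started))
    return in_force, absorbing


sample_years = range(1964, 2012)
# Hypothetical plan imposed in 1975 and terminated in 1990.
in_force, absorbing = treatment_indicators(sample_years, imposed=1975, ended=1990)
```

The two indicators coincide for agencies whose plans never terminate and differ only in the years after termination.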

Results for DD Models
Results for our difference-in-differences models for offense and arrest rates are presented in Table 3. The DD Model reports the 2WFE DD estimate where treatment is defined as years in which an agency is subject to an externally-imposed affirmative action plan. The GB  Model reports the 2WFE DD estimate where treatment is defined as all years subsequent to the imposition of an affirmative action plan. For the GB Model, we also report average DD decomposition estimates and weights for 2x2 treatment/control groups. 7 Each model is estimated separately using four different dependent variables: the natural log of per capita violent crime offenses, the natural log of per capita property crime offenses, violent crime arrest rates, and property crime arrest rates. Neither model yields any statistically significant results, for any outcome variable. These results support the findings of McCrary (2007) and contrast with the findings of Lott (2000). The Goodman-Bacon decomposition estimates also allow us to see that there is little evidence that bias introduced by time-varying treatment effects is driving the null 2WFE DD estimates. The latter are largely driven (total weight = 89%) by comparisons between treated and untreated agencies, as in the canonical 2x2 DD model. Although there is some evidence of time-varying treatment effects, with 2x2 DD estimates signed in the opposite direction for some timing groups, relative to other treatment/control groups, little weight is placed on the timing group 2x2 DD estimates in the construction of the 2WFE DD estimate.

Duration Models
We also estimate models of duration or dosage effects of externally imposed affirmative action plans. For these models we replace the variable $\mathit{Treat}$ with the variable Years of Treatment ($\mathit{YOT}_{it}$) in our 2WFE DD model. In our first duration model we control for agency and year fixed effects:
$$\text{OffenseRates/ArrestRates}_{it} = \beta_0 + \beta_1 \mathit{YOT}_{it} + \alpha_i + \theta_t + \varepsilon_{it} \tag{2}$$
where $\beta_1$ is the effect of an agency being subjected to one more year of affirmative action.
In our second duration model, we add the vector $\lambda_{it}$, which captures agency-specific linear time trends in offense and arrest rates. This model takes the form:
$$\text{OffenseRates/ArrestRates}_{it} = \beta_0 + \beta_1 \mathit{YOT}_{it} + \alpha_i + \theta_t + \lambda_{it} + \varepsilon_{it} \tag{3}$$

Results for Duration Models
Table 4 reports estimates from Equations (2) and (3) for logged offense rates. We find no effects of years of affirmative action exposure on logged offense rates, in any model. Table 5 reports estimates for Equations (2) and (3) for violent and property crime arrest rates. We again find no statistically significant effects of externally imposed affirmative action on either violent crime or property crime arrest rates. These estimates are generally consistent with our other DD estimates, as well as with the results reported by McCrary (2007).
Table 4 summarizes results for the duration models described by Equations (2) and (3). Standard errors are clustered on agency.
Table 5 summarizes results for the duration models described by Equations (2) and (3). Standard errors are clustered on agency.
Although we are able to control for an extensive amount of exogenous variation using agency and year fixed effects, there may still be selection effects confounding our estimates. The parallel trends assumption, on which DD models depend, states that offense and arrest rates within treated agencies should have changed at the same rate as offense and arrest rates within untreated agencies, had treatment not occurred. Yet if there was nonrandom selection into treatment, and treated agencies differ from untreated agencies, the parallel trends assumption may be violated. Since the 1990s, synthetic control methods have been used to construct more appropriate comparison units for treated units that differ from untreated units (Abadie et al., 2010; Card, 1990). We likewise use the generalized synthetic control method [Xu, 2017] to construct more appropriate control units for treated agencies.

Generalized Synthetic Control
The generalized synthetic control (GSC) method introduced by Xu [2017] addresses the case when treatment is imposed at different times for different units. This approach allows for multiple treated units and variable treatment periods. 8 The GSC method allows us not only to match units on pretreatment observables but also to model unobserved time-varying heterogeneities using interactive fixed effects. GSC first estimates an interactive fixed effects (IFE) model using only the police departments that were never treated and obtains a fixed number of time-varying coefficients (latent factors). It then estimates department-specific intercepts (factor loadings) for each treated police department by linearly projecting pretreatment outcomes for treated units onto the space spanned by the factors. Finally, it generates synthetic control units based on the estimated factors and factor loadings. The method is described as a "bias correction procedure for IFE models when treatment data is heterogeneous across units." 9 The gsynth package requires at least seven years of pretreatment data for each treated agency, and two of our treated agencies did not meet this requirement. We are left with 25 treated agencies for the GSC models. The pretreatment covariates used in constructing the synthetic controls are the following: proportion black, share of population age 18-21, share of population age 21-24, proportion in the labor force, and percent below the poverty line.
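The three estimation steps described above can be sketched in a deliberately simplified form. This is not the gsynth implementation and not the procedure used for our estimates: it handles a single treated unit, omits covariates and additive fixed effects, takes the number of factors as given rather than choosing it by cross-validation, and provides no inference. The function name and the simulated data are hypothetical.

```python
import numpy as np


def factor_counterfactual(y_control, y_treated, n_pre, n_factors):
    """Simplified factor-model counterfactual for one treated unit.

    y_control: (T, N_co) outcomes of never-treated units.
    y_treated: (T,) outcomes of the treated unit.
    n_pre:     number of pretreatment periods for the treated unit.
    Returns (y0_hat, gap): the imputed untreated outcome path and the
    treated-minus-synthetic difference in every period.
    """
    y_control = np.asarray(y_control, dtype=float)
    y_treated = np.asarray(y_treated, dtype=float)
    # Step 1: estimate latent factors from control units only (principal components).
    u, _, _ = np.linalg.svd(y_control, full_matrices=False)
    factors = u[:, :n_factors] * np.sqrt(y_control.shape[0])
    # Step 2: project the treated unit's pretreatment outcomes onto the factors.
    loadings, *_ = np.linalg.lstsq(factors[:n_pre], y_treated[:n_pre], rcond=None)
    # Step 3: build the synthetic (untreated) outcome path for all periods.
    y0_hat = factors @ loadings
    return y0_hat, y_treated - y0_hat


# Noise-free simulation: 7 pretreatment and 20 posttreatment periods.
rng = np.random.default_rng(1)
n_periods, n_pre, n_factors, n_controls, effect = 27, 7, 2, 20, 1.5
f = rng.normal(size=(n_periods, n_factors))
y_co = f @ rng.normal(size=(n_factors, n_controls))
y_tr = f @ rng.normal(size=n_factors) + effect * (np.arange(n_periods) >= n_pre)
y0_hat, gap = factor_counterfactual(y_co, y_tr, n_pre, n_factors)
```

In this noise-free example the pretreatment gaps are numerically zero and the posttreatment gaps equal the simulated effect. Averaging such gaps across treated units by event time would give an ATT path of the kind reported in the next section.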

Results for Generalized Synthetic Control Models
GSC models allow us to estimate treatment effects as they evolve over time, without imposing linearity. We display our results as graphs of the average treatment effect on the treated (ATT) over time. Each ATT is calculated by taking the difference between the treated unit and the synthetic control for a given unit, and then averaging the difference across all treated units. We do this each year over a 27-year period. Results are displayed in Figure 3. Each figure shows seven years of pretreatment effects and 20 years of posttreatment effects. The 90% confidence interval is in gray. Confidence intervals are constructed using parametric bootstrapping. 10 Figure 3 reports the estimates for the natural logs of violent and property crime per 100,000 in population and the estimates for violent and property crime arrest rates. From these estimates, we cannot infer that court-imposed affirmative action plans led to changes in either offense or arrest rates. Overall, these results corroborate the findings from our DD models, which are generally in agreement with McCrary (2007).

SUTVA Violation Test
A final concern is that the law enforcement reaction to the threat of potential affirmative action litigation may have violated the stable unit treatment value assumption (SUTVA). As police departments across the country began experiencing litigation for discriminatory hiring practices, other departments may have made preemptive changes to avoid litigation. Miller and Segal (2012) find that police departments that were unsuccessfully litigated for affirmative action experienced an increase in black employment prior to litigation. This suggests that departments whose leaders believed they were likely to experience litigation [...]
McCrary (2007) estimated whether agencies were adjusting their behavior due to the threat of litigation by matching unlitigated agencies with litigated agencies within the same federal district. He assumed the matched unlitigated agencies to be threatened agencies. He then assigned to the threatened agencies neighbor-litigation dates which were equivalent to the litigation dates of their counterparts. He identified unthreatened agencies as those agencies that had no litigated agencies within their federal district. Using an event study, he estimated the effect of neighbor-litigation dates on the minority employment gap for both threatened and unthreatened agencies. He found that there was not a significant [...] McCrary (2007) looked only at litigation, not at successful litigation resulting in court-imposed affirmative action. Including unsuccessful litigation in these analyses would likely attenuate the results. Second, geographical proximity might not be a good matching mechanism.
To further explore the possibility of a SUTVA violation, we examine the effect of the total number of agencies receiving court-imposed affirmative action plans on offense and arrest rates both for agencies that will never be litigated and for agencies that will eventually be litigated. 11 In order to isolate the pretreatment effect, we drop litigated agencies from our sample once they have been litigated. Since the total number of agencies receiving court-imposed affirmative action plans at any point in time does not vary across agencies, we cannot use time fixed effects to control for the trends in offense and arrest rates over time. Looking at Figure 4, we can see that there are obvious non-linear time trends in both violent and property crime rates. To control for these time trends, we implement polynomial transformations of our year variable, with the optimal number of degrees determined by minimizing the Bayesian Information Criterion (BIC) for each model. We also control for agency-specific linear time trends in crime and arrest rates.
Our model takes the following form:
$$\text{OffenseRates/ArrestRates}_{it} = \beta_0 + \beta_1 \mathit{AATotal}_t + \beta_2 (\mathit{AATotal} \times \mathit{EverLitigated})_{it} + \lambda_{it} + \sum_{p=1}^{P} \gamma_p t^p + \varepsilon_{it} \tag{4}$$
where $\mathit{AATotal}_t$ is the total number of police departments with court-imposed affirmative action plans at time $t$, calculated from the data reported by Miller and Segal (2012); $\mathit{EverLitigated}$ is a binary variable that is equal to one (in all time periods) for agencies that will eventually experience affirmative action litigation; $\lambda_{it}$ captures agency-specific linear time trends; and $P$ is the optimal number of degrees of the polynomial transformation of the year variable $t$, as reported in Table 6. $\beta_1$ is the effect of increasing the total number of court-imposed affirmative action plans on agencies that will never be litigated, while $\beta_1 + \beta_2$ estimates the effect for agencies that will eventually be litigated, and $\beta_2$ is the difference in the effect. Table 6 reports these estimates. We do not find evidence of a possible SUTVA violation. The number of agencies subjected to court-imposed affirmative action plans at any point in time does not appear to be associated with significant changes in offense or arrest rates among unlitigated agencies, either in absolute terms or relative to litigated agencies.
Table 6 summarizes results for our models of potential SUTVA violations described by Equation (4). Standard errors are clustered on agency.
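The BIC-based choice of polynomial degree described above can be sketched as follows. This is a simplified illustration rather than our estimation code: it fits a polynomial in the year variable to a single simulated series, whereas the models in the text choose the degree within the full regression. The function name, the maximum degree, and the simulated series are assumptions, and the BIC is the Gaussian form up to an additive constant.

```python
import numpy as np


def bic_polynomial_degree(t, y, max_degree=6):
    """Choose the polynomial degree in t that minimizes the BIC.

    Uses BIC = n * ln(RSS / n) + k * ln(n), with k = degree + 1
    estimated coefficients. Returns (best_degree, bic_by_degree).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (t - t.mean()) / t.std()  # rescale years for numerical stability
    n = len(y)
    bics = {}
    for p in range(1, max_degree + 1):
        coefs = np.polyfit(z, y, deg=p)
        rss = float(((y - np.polyval(coefs, z)) ** 2).sum())
        bics[p] = n * np.log(rss / n) + (p + 1) * np.log(n)
    best = min(bics, key=bics.get)
    return best, bics


# Simulated non-linear trend over the sample years (illustrative only).
rng = np.random.default_rng(2)
years = np.arange(1964, 2012)
z = (years - years.mean()) / years.std()
series = 1.0 + 0.5 * z - 0.8 * z**2 + 0.6 * z**3 + rng.normal(scale=0.05, size=len(years))
best_degree, bic_by_degree = bic_polynomial_degree(years, series)
```

Rescaling the year variable before fitting leaves the fitted values unchanged but avoids the poor conditioning that raw calendar years raised to high powers would cause.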

Discussion
In this paper, we replicated and extended the work of Lott (2000) and McCrary (2007) using modern difference-in-differences methods. These two studies sought to estimate the average effects of race-based affirmative action litigation on outcomes such as reported offense and arrest rates. In our replications and extensions, which also focused on offense and arrest rates, we found results generally consistent with the null results reported by McCrary (2007). Also like McCrary (2007), we did not find evidence that agencies changed their behavior in anticipation of affirmative action litigation.
Yet it is not clear that analyzing only the average effects of court-imposed race-based affirmative action is the most productive empirical strategy. These effects may be heterogeneous by victim race. As noted previously, race-based affirmative action leading to increased proportions of nonwhite police officers may lead to increases in reporting by nonwhite crime victims, and in the willingness of civilians to cooperate with police investigations into crimes committed against nonwhite victims, while having little effect on crimes experienced by white victims. Nonwhite officers may also exert greater effort, or be better equipped socially and culturally, to clear crimes in largely nonwhite communities. Race-based affirmative action may then lead to decreases in the incidence of crimes experienced by nonwhite victims, and/or increases in the reporting of crimes experienced by nonwhite victims, while having little effect on the incidence and reporting of crimes experienced by white victims.
These potentially racially heterogeneous treatment effects may also offset each other. For example, race-based affirmative action leading to increases in the share of nonwhite officers may both increase the reporting of offenses experienced by nonwhite victims and decrease the number of offenses experienced by these victims, with a net null effect on reported crimes experienced by nonwhite victims. Likewise, race-based affirmative action leading to increases in the number of nonwhite officers could both increase the effort devoted to clearing offenses experienced by nonwhite victims and simultaneously decrease the number of these offenses through deterrence effects, again leading to a null effect on offenses cleared by arrest for nonwhite victims.
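The offsetting argument can be made concrete with a stylized identity. The notation and numbers below are illustrative assumptions rather than estimates from our data: let $N$ denote the true number of offenses experienced by nonwhite victims, $r$ the share of those offenses reported to police, and $R = rN$ the number of reported offenses, with primes marking post-treatment values.

$$
R' - R = r'N' - rN = 0 \quad \Longleftrightarrow \quad \frac{r'}{r} = \frac{N}{N'}
$$

For instance, if the reporting share rose from $r = 0.40$ to $r' = 0.50$ while incidence fell from $N = 1000$ to $N' = 800$, reported offenses would be unchanged:

$$
R = 0.40 \times 1000 = 400, \qquad R' = 0.50 \times 800 = 400 .
$$

In this hypothetical case, data on reported offenses alone would show a null effect even though both reporting and incidence changed.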
In order to identify and disambiguate these potentially racially heterogeneous and offsetting effects, researchers will need to look for data beyond the widely used UCR data. For example, using the National Crime Victimization Survey, which allows for the identification of the race and gender of crime victims, and for the measurement of crimes both unreported and reported to law enforcement, Miller and Segal (2018) are able to identify gender-specific causal effects of affirmative action on both the actual incidence of violent crimes and the reporting of those crimes. Likewise, the identification of the potentially racially heterogeneous and offsetting effects linking race-based affirmative action to public safety outcomes is a productive avenue for future research.

Conclusion
Affirmative action was aggressively implemented in police departments beginning in the 1970s. This implementation usually took the form of court-imposed hiring quotas. We examined how court-imposed race-based affirmative action has impacted both offense and arrest rates, seeking to replicate and extend the results of Lott (2000) and McCrary (2007). Our estimates from difference-in-differences, DD decomposition, duration, and generalized synthetic control models generally support the null results of McCrary (2007).
We then analyzed potential spillover effects, to examine whether untreated agencies were changing their behavior as they observed other agencies receiving court-imposed affirmative action plans. Like McCrary (2007), we did not find evidence supporting the claim of spillover effects of treatment on the untreated.
Finally, we concluded with a discussion of the importance of identifying the possibly racially heterogeneous and offsetting causal effects of race-based affirmative action on public safety outcomes.