Evaluating AI Assistance in Police Report Writing

Key Takeaways

  • Independent research to date has not shown that AI‑assisted police report tools save officers time; one study found that reports took longer to complete.
  • In a blinded evaluation, experienced senior officers could not distinguish AI‑generated reports from those written by humans, performing no better than chance.
  • While AI‑produced text tends to use longer, more complex words (a higher estimated reading grade level), it was rated significantly lower in accuracy, a critical factor for prosecutors, defenders, and juries.
  • The perceived accuracy of AI reports drops from roughly the 50th to the 36th percentile, indicating a substantive decline in reliability.
  • Civil‑liberties concerns (e.g., evidentiary contamination, memory distortion) remain unresolved and are compounded by the technology’s lack of demonstrable benefits.
  • Policymakers and departments should prioritize human‑generated reports until independent evidence confirms both efficiency gains and uncompromised accuracy.

Background on AI‑Assisted Police Report Products
Law‑enforcement agencies across the United States have begun experimenting with artificial‑intelligence tools that draft incident reports from body‑camera footage and officer notes. Vendors market these systems on two primary promises: they will speed up report writing and improve the overall quality of the documentation. Critics, however, have long warned that such technologies raise serious civil‑liberties issues, including the risk of contaminating officers’ memories, introducing bias, and undermining evidentiary integrity. Despite these concerns, adoption continues, prompting researchers to scrutinize whether the claimed advantages actually materialize in practice.


Evidence on Time Savings and Efficiency
Early investigations cast doubt on the efficiency claim. A 2024 white paper cited a study finding that Axon’s Draft One product did not reduce the time officers spent completing reports. Subsequent work, some of it from other fields, has reinforced this pattern: Walley & Glasspoole‑Bird (2025) reported an 18‑minute increase per incident when using a different AI tool, and Becker et al. (2025) observed that experienced software developers were 19% slower when assisted by AI coding tools. Even in domains where modest gains appear, such as Rotenstein et al.’s (2026) finding of roughly 13 minutes saved per eight scheduled patient hours for AI clinical scribes (under 3% of that time), there was no reduction in after‑hours work, suggesting that any time saved is simply shifted elsewhere. Collectively, the independent literature fails to substantiate the vendor‑driven narrative of meaningful efficiency improvements.


Study Design: Assessing Report Quality Blind to Condition
To evaluate whether AI assistance yields higher‑quality reports, researchers recruited 92 senior law‑enforcement officers (sergeants and above) with an average of 22 years of experience reviewing police documentation. These participants were presented with 80 reports, 20 of which had been drafted using Axon’s Draft One AI system, while the remaining 60 were conventional officer‑written reports. Crucially, the reviewers were blinded to the source of each document. Their first task was to identify which reports were AI‑generated; their performance hovered around chance, indicating they could not reliably discern AI involvement. This inability to detect AI authorship set the stage for an unbiased assessment of report quality across several dimensions.
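
To make "hovered around chance" concrete, here is a minimal sketch of an exact binomial test. It assumes each judgment is a binary AI-or-human call and that chance means 50% correct; the source does not say how chance was defined, and the counts below are made-up numbers for illustration, not results from the study.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely
    than the observed one under the null hypothesis.
    """
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small tolerance so ties with the observed probability are included.
    total = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical reviewer: 42 correct calls out of 80 reports (illustrative only).
p_near_chance = binomial_two_sided_p(42, 80)

# Hypothetical reviewer who really can tell: 70 correct out of 80 (illustrative only).
p_clear_signal = binomial_two_sided_p(70, 80)

print(f"42/80 correct: p = {p_near_chance:.3f}")
print(f"70/80 correct: p = {p_clear_signal:.2e}")
```

A large p-value in the first case means the result is indistinguishable from guessing, which is the sense in which reviewers "could not reliably discern AI involvement."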


Objective Text Metrics: AI Produces More Complex but Not Necessarily Better Writing
On objective linguistic measures, the AI‑assisted reports differed in a narrow way: they contained longer words, more syllables per word, and a higher estimated grade level than the human‑authored counterparts. In other words, the AI output was lexically more complex and harder to read. While such metrics might suggest “better” writing under a simplistic definition, they do not necessarily translate into clearer or more useful communication, especially in a legal context where plain language is often prized.
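
As an illustration of how syllables per word drive a grade‑level estimate, the sketch below uses the Flesch‑Kincaid grade formula. The source does not say which readability metric the researchers used, so treating it as Flesch‑Kincaid is an assumption, as are the vowel‑group syllable heuristic and the two sample sentences (both invented for this example).

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per run of vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

# Invented sample sentences with the same number of words and sentences,
# so any difference in score comes from syllables per word alone.
plain = "The man ran from the car. I told him to stop."
dense = "The individual subsequently departed the vehicle. I instructed him to discontinue."

print(f"plain: {fk_grade(plain):.1f}")
print(f"dense: {fk_grade(dense):.1f}")
```

Both samples describe the same event, yet the second scores many grade levels higher purely because of word choice. A higher score here signals harder reading, not more accurate content.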


Subjective Quality Ratings: No Overall Improvement Detected
When the same 92 expert reviewers rated each report on five subjective criteria—clarity, completeness, grammar, accuracy, and utility—there was no reliable enhancement in the composite quality score for AI‑assisted reports. The blind condition ensured that any perceived differences could not be attributed to expectations about AI involvement. Consequently, the overall impression of the reports’ worth remained essentially unchanged whether they originated from an AI tool or a human officer.


Critical Finding: Accuracy Suffers Significantly with AI Assistance
The most consequential result emerged from the accuracy dimension. AI‑generated reports were rated substantively and significantly lower in accuracy than those written by officers (p = .038). The estimated effect moved a report from approximately the 50th percentile to the 36th percentile in perceived accuracy. Because police reports serve as foundational evidence for prosecutors making charging decisions, defense attorneys challenging inconsistencies, and judges and juries reconstructing events months later, a drop in accuracy threatens the integrity of the entire criminal‑justice process. The authors caution that the AI’s authoritative tone and the extra detail drawn from verbatim body‑camera transcripts can create an illusion of thoroughness while actually obscuring factual fidelity.
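
One way to read the percentile shift is as a standardized effect size. The sketch below assumes accuracy ratings are roughly normally distributed, which the source does not state, and converts the 50th and 36th percentiles into standard‑deviation units. The function name is hypothetical.

```python
from statistics import NormalDist

def percentile_to_z(percentile: float) -> float:
    """Convert a percentile (0-100, exclusive) to a z-score on a standard normal curve."""
    return NormalDist().inv_cdf(percentile / 100.0)

# Shift from the 50th to the 36th percentile, in standard-deviation units.
# Assumption: ratings are approximately normal.
shift = percentile_to_z(36) - percentile_to_z(50)

print(f"Shift in standard deviations: {shift:.2f}")
```

Under that assumption, the drop corresponds to roughly a third of a standard deviation.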


Why the Illusion of Quality Persists
The researchers note that AI’s ability to pull minute details from transcripts and present them in a polished, confident voice may mislead readers into believing the report is superior. However, seasoned police managers, the people tasked with overseeing report quality, were not persuaded by it: under blind review they rated the AI‑assisted reports lower on accuracy and no better overall. This suggests that the perceived “quality” boost stems more from superficial linguistic complexity than from substantive improvements in factual reliability.


Broader Implications for Policy and Practice
Adams, the report’s lead author, emphasizes that the technology’s marketing hinges on two claims—speed/efficiency and quality—yet the independent evidence to date does not support either. No peer‑reviewed study has demonstrated a clear time‑saving benefit, and the available data point toward neutral or negative effects on efficiency. Likewise, the quality claim falters when accuracy—a non‑negotiable attribute for legal documentation—is compromised. Until robust, reproducible research shows that AI tools can simultaneously enhance efficiency and preserve (or improve) evidentiary accuracy, the prudent course is for departments to limit or suspend their use in report writing.


Conclusion: Prioritizing Human‑Generated Documentation Over Unproven AI
The author reflects on a personal analogy: building a macro to clean a data set can be more engaging than manual correction, even when it ultimately consumes more time. Similarly, the novelty of large language models may make AI‑assisted report drafting feel productive, despite objective drawbacks. If the technology fails to save time, diminishes accuracy, and introduces unresolved civil‑liberties risks, then the justification for its adoption evaporates. Police officers should return to describing incidents in their own words, ensuring that the record remains a trustworthy, human‑to‑human account of events—an essential safeguard for justice in a democratic society.
