Key Takeaways
- Professional human editors consistently improve tone, readability, and factual accuracy of business letters, while AI output hinges on the precision of user instructions.
- When prompted to write at a B1 language level (intermediate proficiency per the CEFR), ChatGPT’s revisions closely matched the editors’ readability scores without introducing errors.
- A vague instruction to make the text “reader‑focused” led the model to retain jargon, produce awkward phrasing, and even fabricate information (e.g., congratulating an employer on a nonexistent team expansion).
- An elaborate eight‑step prompt improved visual layout but introduced multiple factual errors, suggesting that overloading the model with instructions can cause it to lose track of core meaning.
- The study’s small sample and single‑generation design limit generalizability; real‑world use would likely involve iterative prompting and human‑AI collaboration.
- Future writing roles may shift toward prompt engineering—crafting precise contextual cues—and supervising AI‑generated drafts rather than creating texts from scratch.
Context and Concerns about AI in Professional Writing
The rapid adoption of generative artificial intelligence has sparked widespread anxiety in the writing and publishing industries. Many copywriters and translators worry that automated tools will eventually render their professions obsolete. Organizations increasingly turn to digital tools to draft business correspondence, marketing materials, and internal reports. While earlier experiments showed that language models like ChatGPT can boost productivity and fix grammar in simple assignments, writing for an organization differs fundamentally from personal expression. Corporate texts must reflect a company’s identity and adhere to technical regulations. They also often involve multiple authors, which can produce inconsistent messaging that requires skilled editorial oversight.
Why Organizational Documents Demand Specialized Editing
Producing these documents calls for an understanding of workplace dynamics, technical rules, and a firm’s preferred tone. Companies frequently hire external editors to untangle conflicting voices and simplify dense legal or technical language for everyday readers. This specialized intuition—balancing clarity, tone, and factual fidelity—is what Daniël Janssen and his team at Utrecht University sought to test against a machine. Their experiment asked whether ChatGPT could independently apply the same nuance and audience awareness that seasoned human editors bring to routine corporate documents.
Study Overview: Comparing Human Editors with ChatGPT
The research team divided their investigation into two phases. In the first phase, they observed three professional editors, each with more than two decades of industry experience. Participants received four distinct Dutch business letters covering topics such as maternity leave policies, sickness benefits, and scheduling, and were instructed to make the texts “good.” The researchers recorded the editors’ screens and later interviewed them using stimulated recall, asking the editors to narrate their thought process while watching the recordings. In the second phase, the same letters were fed to ChatGPT under three different prompting strategies to see how instructional specificity affected output.
Phase One: How Seasoned Editors Refine Business Letters
The human editors consistently focused on improving overall tone, replacing formal jargon with accessible language, and restructuring letters so the most urgent information appeared at the top. They employed shorter sentences, active verbs, and increased personal pronouns such as “you” and “we.” Notably, their revisions were “completely free of factual errors and preserved the legal intent of the organizational documents.” As one editor explained during the stimulated recall interview, “I look for where the reader gets lost and then I bring the key point forward, using words they actually use in daily conversation.” This approach yielded readable, accurate texts that matched the organizations’ communicative goals.
Phase Two: Prompt Variations Tested on the Same Texts
The investigators employed three distinct prompts. The first was intentionally simple: “make the text reader‑focused.” The second asked the model to rewrite the text to a B1 language level, referencing the Common European Framework of Reference for Languages—a standard intermediate proficiency targeted by most mass‑market communications. The third prompt was a specialized eight‑step instruction designed to replicate the exact workflow the human editors described during their interviews, covering steps such as identifying the main message, simplifying vocabulary, adjusting sentence length, and checking tone. Each prompt was applied to the same four letters, and the outputs were evaluated with readability software and a qualitative review for factual correctness and phrasing.
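The summary does not name the readability software the researchers used, so the following is only a rough sketch of the kind of surface measures such tools report, based on the features the editors changed (sentence length and personal pronouns). The function names, the English pronoun list, and the sample sentences are all hypothetical; the study's letters were Dutch, and a real tool would use more sophisticated tokenization.

```python
import re

# Hypothetical pronoun list for illustration only (English, not Dutch).
PRONOUNS = {"you", "your", "we", "our", "us"}

def split_sentences(text):
    """Naively split on whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(text):
    """Lowercase the text and extract alphabetic word tokens."""
    return re.findall(r"[A-Za-z']+", text.lower())

def readability_proxies(text):
    """Return average sentence length (in words) and the share of
    tokens that are personal pronouns from PRONOUNS."""
    sentences = split_sentences(text)
    tokens = tokenize(text)
    if not sentences or not tokens:
        return {"avg_sentence_length": 0.0, "pronoun_share": 0.0}
    return {
        "avg_sentence_length": len(tokens) / len(sentences),
        "pronoun_share": sum(t in PRONOUNS for t in tokens) / len(tokens),
    }

# Invented sample sentences, not taken from the study's letters.
original = ("Pursuant to the applicable regulations, the employee is hereby "
            "notified that the benefit shall be disbursed upon receipt of "
            "the requisite documentation.")
revised = "You will get your benefit soon. We only need your documents first."

print(readability_proxies(original))
print(readability_proxies(revised))
```

Comparing the two dictionaries gives a quick, crude signal of whether a revision moved toward shorter sentences and more direct address. It says nothing about factual correctness, which the study checked separately through qualitative review.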
Results: The B1‑Level Prompt Mirrors Human Performance
When given the B1‑level instruction, ChatGPT performed remarkably well. The readability scores of its revisions closely resembled those produced by the human editors, achieving similar reductions in sentence length and increases in plain‑language vocabulary without altering the original meaning. As the article notes, “The B1 prompt successfully shortened complex clauses and simplified the vocabulary without changing the original meaning.” This outcome suggests that a clear, linguistically grounded target can guide the model to emulate the editors’ effectiveness in making texts accessible while preserving factual integrity.
Pitfalls of Vague and Over‑Complex Instructions
In stark contrast, the vague directive to make the text “reader‑focused” yielded poor results. The model retained complex sentence structures, leaned on unfamiliar words, and, most troublingly, invented false information. For instance, in a letter discussing an employee’s maternity leave benefits and sick pay, the simple prompt generated a sentence “congratulating the employer on the upcoming expansion of their team.” This was a fundamental misunderstanding: a baby is not a new employee, making the congratulatory phrase wholly inappropriate for an HR document. The eight‑step prompt, while improving visual layout, introduced multiple factual errors regarding the payment of certain medical benefits, indicating that feeding the model too many discrete revision steps at once can cause it to lose sight of the core message.
Limitations, Interpretation, and the Future Role of Writers
The study acknowledges several constraints: it relied on a small set of business letters, evaluated outputs in a single generation, and did not test longer, more intricate reports such as journalistic releases or consumer manuals. In real workplaces, users would likely iterate on prompts, regenerate text, or manually edit AI drafts—behaviors not captured here. Nevertheless, the findings point to a shifting role for professional writers. Rather than drafting from scratch, they may increasingly act as curators and directors of AI‑generated content, a shift that demands prompt engineering—the skill of feeding precise contextual cues to the model. As the authors conclude, “Assessing artificial prose requires the exact same competencies used to evaluate human writing, including rhetorical fit and source verification.” Thus, effective workplace communication may soon depend just as much on supervising and correcting text‑generation models as on traditional language mastery.

