How to tell, scientifically, if a story was written by AI
Earlier this year, the publisher Hachette cancelled the novel Shy Girl by Mia Ballard after it was flagged as 78% AI-generated, making it the first novel to be cancelled over the use of AI. The novel had initially been self-published and was later picked up by Hachette.
A separate analysis ran 14,000 novels self-published on Amazon through the AI detector Pangram and flagged one in five as written by AI.
When we talk about spotting AI-generated writing, we tend to focus on words like delve, intricate, and tapestry, and on punctuation marks like the em dash. But these surface markers are easy to strip out of an AI-generated text to fool a detector. In one study, researchers fine-tuned an LLM to write human-like prose, and the detection rate fell from 97% to 3%.
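To see why surface markers are such a weak signal, here is a minimal sketch of a naive marker counter. It is only an illustration: the word list comes from the paragraph above, the function names are made up for this example, and real detectors such as Pangram do not work this way.

```python
import re

# Marker words named in the article. This list is an illustrative
# assumption, not the vocabulary of any real detector.
MARKER_WORDS = {"delve", "intricate", "tapestry"}
EM_DASH = "\u2014"


def surface_marker_rate(text: str) -> float:
    """Return marker hits (marker words plus em dashes) per 100 words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in MARKER_WORDS) + text.count(EM_DASH)
    return 100.0 * hits / len(words)


def strip_markers(text: str) -> str:
    """Crudely delete marker words and swap em dashes for commas."""
    pattern = r"\b(" + "|".join(sorted(MARKER_WORDS)) + r")\b\s*"
    text = re.sub(pattern, "", text, flags=re.IGNORECASE)
    return text.replace(EM_DASH, ", ")


sample = "Let us delve into the intricate tapestry of her past \u2014 a life of secrets."
print(surface_marker_rate(sample))                 # high before stripping
print(surface_marker_rate(strip_markers(sample)))  # zero after stripping
```

A single find-and-replace pass drives the score to zero without touching how the story is built, which is why the researchers looked at narrative structure instead.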
LLMs overuse not only specific words like delve and intricate but also specific, predictable narrative structures. Researchers at the University of Maryland and Google DeepMind took more than 10,000 human-written stories and reverse-engineered the prompt behind each one.
Then they ran the same prompts through five LLMs: ChatGPT, Claude, Gemini, DeepSeek, and Kimi. Each LLM came up with its own version, giving the researchers more than 61,000 AI-generated stories of nearly 5,000 words each.
They analyzed the AI-generated stories on more than 300 narrative features, such as how an LLM presents the theme of a story, how it handles time, and whether the hero is the one who brings about the resolution of the plot.
Using narrative structure alone, the researchers could tell whether a story was AI-generated with 93% accuracy. They found that AI overuses a set of 30 narrative features. Some of these features are:
Over-explanation: LLMs state the moral of the story explicitly 77% of the time, compared with 52% for human writers. AI doesn’t trust the reader to infer the moral and feels compelled to spell it out.
Tidy, linear plots: AI creates protagonists who can fix everything simply through a change of heart. As a result, every thread is tied up neatly and there are no loose ends. In 79% of its stories, AI doesn’t bother with subplots either. And AI tells events in chronological order, which makes the story boring and predictable. Human-written stories have complex characters and don’t stick to a strict chronological order.
Over-writing: AI takes the old adage “Show, don’t tell” to a whole other level. If a character is afraid, the AI will write excruciatingly long descriptions of that fear. Human writers use this technique much more selectively.
Writing not addressed to a reader: Unlike human writers, who address the reader directly or indirectly, AI doesn’t acknowledge the reader.
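No single feature in this list settles the question on its own. As a rough worked example, take the over-explanation rates quoted above (77% for LLMs, 52% for humans) and apply Bayes' rule. The 50/50 prior and the function name are assumptions made for illustration; this is not the classifier the researchers built.

```python
def posterior_ai(p_feature_given_ai: float,
                 p_feature_given_human: float,
                 prior_ai: float = 0.5) -> float:
    """Bayes' rule: probability a story is AI-written given one observation.

    prior_ai=0.5 is an illustrative assumption (equal numbers of AI and
    human stories), not a figure from the study.
    """
    numerator = p_feature_given_ai * prior_ai
    denominator = numerator + p_feature_given_human * (1.0 - prior_ai)
    return numerator / denominator


# Rates quoted in the article for stating the moral explicitly.
P_MORAL_AI = 0.77
P_MORAL_HUMAN = 0.52

# Story spells out its moral.
with_moral = posterior_ai(P_MORAL_AI, P_MORAL_HUMAN)
# Story leaves the moral implicit (complementary rates).
without_moral = posterior_ai(1 - P_MORAL_AI, 1 - P_MORAL_HUMAN)

print(f"moral stated:   {with_moral:.2f}")     # about 0.60
print(f"moral implicit: {without_moral:.2f}")  # about 0.32
```

Seeing an explicit moral moves the estimate from 50% to only about 60%, so one feature is weak evidence. A figure like 93% accuracy becomes plausible only when many such features are combined.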
What’s remarkable is that all five LLMs adhered to the above narrative strategies even though they were trained by different people in different countries.
That said, every LLM has its own idiosyncrasies. For example, ChatGPT likes writing gossipy stories full of rumors, Claude escalates emotions the quickest, and Gemini writes the darkest stories.