1/ I spent years around scientific research and noticed something broken:
The best science doesn't always get published. Often the difference between acceptance and rejection isn't the research - it's the packaging.
@RetractionWatch Many publishers now have policies they have no practical way to enforce. It feels less like a technology problem and more like a governance problem.
@RetractionWatch One thing I hadn't thought about before reading this:
The stricter the prohibition, the stronger the incentive to conceal.
That creates a difficult dynamic for publishers. Policies can be written overnight. Norms and governance take much longer to build.
@RetractionWatch The impact factor discussion often gets most of the attention.
In our conversations with reviewers and editors, the harder questions are usually about fit, audience, and contribution.
A technically strong paper can still end up at the wrong journal.
@MishaTeplitskiy One thing I wonder is whether the bottleneck is moving.
Producing a paper is getting cheaper. Evaluating novelty, robustness, and significance is not.
If that's true, then the pressure on reviewers and editors may grow faster than the pressure on authors.
One thing we heard over and over in conversations with reviewers:
"Don't make me decode your paper."
Reviewers are volunteers. They're often reading manuscripts at night, between meetings, or on weekends.
The easier you make it for them to understand what you did, why it matters, and how the evidence supports the claim, the better your chances of getting a fair review.
A surprising number of review comments come down to confusion, not disagreement.
@ScholarshipfPhd The interesting question may not be whether a paragraph was written by AI.
It's whether the citations are real, the methods are appropriate, and the conclusions are supported by the evidence.
Those seem like more durable standards than authorship detection.
Researchers spend years on a paper and then face a surprisingly difficult question: "Where should I submit this?"
The challenge isn't finding a journal.
It's understanding what different journals actually value.
One thing we heard repeatedly from researchers is that choosing the right journal can feel surprisingly opaque.
Most journals publish author guidelines.
Far fewer explain how editors think about fit, contribution, novelty, and audience.
That's one reason we spent so much time interviewing reviewers and editors when building Manusights.
A technically strong paper can still end up at the wrong door.
One thing we heard repeatedly from researchers is that the hardest part of publishing isn't always the science.
It's understanding what journals are actually looking for.
Most journals publish author guidelines. Far fewer publish how editors think about contribution, fit, positioning, and novelty.
That's one reason we spent so much time interviewing reviewers and editors when building Manusights.
@BrankoMilan What's interesting is that many of these proposals focus on verifying authorship.
Historically, peer review was meant to evaluate claims, evidence, and reasoning. AI may force us to think more carefully about where authorship ends and accountability begins.
Manusights reviews your manuscript the way a tough, fair peer reviewer would: pressure-testing the science, the statistics, and the reporting before an editor ever sees it.
It catches what a quick read misses: a mean that's arithmetically impossible, a within-group test passed off as a real effect, a missing CONSORT spine that means a desk-reject (a few real examples below). Then it does the part no other tool does: it tells you which journals to actually target and which gaps to close before you submit.
It's trained on how 35+ active peer reviewers, including current Nature, Cell, and Science reviewers, judge real manuscripts. The best way to see it is on your own paper.
Try it now → https://t.co/BZIM76RbyO
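One way to catch an "arithmetically impossible" mean like the one mentioned above is a granularity check in the spirit of the GRIM test: if every response is a whole number, the reported mean times the sample size has to land on (or round from) an integer total. The sketch below is illustrative only. The function name `grim_consistent` is hypothetical, and nothing here is claimed to be how Manusights implements its checks. It assumes integer-valued data (for example Likert items) and takes the mean as a string so trailing zeros are preserved.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def grim_consistent(reported_mean: str, n: int) -> bool:
    """Return True if some integer total over n whole-number responses
    rounds to the reported mean (hypothetical helper, GRIM-style check)."""
    if n <= 0:
        raise ValueError("n must be positive")
    mean = Decimal(reported_mean)
    # Smallest reported step, e.g. 0.01 for a mean given to two decimals.
    step = Decimal(1).scaleb(mean.as_tuple().exponent)
    # Only integer totals next to mean * n can possibly round to the mean.
    nearest_total = int(mean * n)
    for total in range(nearest_total - 1, nearest_total + 2):
        exact = Decimal(total) / Decimal(n)
        # Accept either common rounding convention, since papers rarely say which.
        for mode in (ROUND_HALF_UP, ROUND_HALF_EVEN):
            if exact.quantize(step, rounding=mode) == mean:
                return True
    return False


print(grim_consistent("5.18", 28))  # True: 145 / 28 rounds to 5.18
print(grim_consistent("5.19", 28))  # False: no integer total gives 5.19
```

A failed check is a prompt to look closer, not proof of error: non-integer data, missing responses, or a misreported n can all produce the same result.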
@RetractionWatch One theme that keeps showing up across these papers is that generation, retrieval, verification, and judgment are often treated as the same problem.
They aren't.
AI seems to be improving rapidly at some of those tasks and much more slowly at others.
I've been reading a lot of the recent debate about AI-generated reviews and systematic reviews.
What strikes me is that most people are arguing about whether AI can do the work. The more interesting question might be which parts of the work are actually valuable.
Screening papers and extracting data is one thing.
Deciding what evidence should change practice or policy is another.
One thing I keep noticing across these AI-and-science papers is that production and evaluation are scaling at very different rates.
This study generated 380 finance papers in about 12 hours. Whether those papers are useful is a completely different question.
The ability to generate knowledge-looking content seems to be growing much faster than our ability to evaluate it.
AI Cranks Out 380 Academic Finance Papers in 12 Hours That Could Fool Peer Review Checks | StudyFinds
In a Nutshell
- Two economists used an AI language model to produce 380 complete, journal-formatted academic finance papers in roughly 12 hours, each built around reverse-engineered theories designed to explain data the AI had already seen.
- AI-generated signals performed statistically comparably to signals published in top peer-reviewed finance journals, with equal-weighted results overlapping almost perfectly with published research.
- AI-written introductions clustered tightly at a college-graduate reading level and produced prose that matched the formatting conventions of leading finance journals, though with less stylistic variation than human authors.
- Researchers warn that scaled AI paper generation could overwhelm journal review systems, artificially inflate academic citation counts, and erode the metrics used to evaluate researchers for tenure and funding.
---
A pair of economists just proved that artificial intelligence can churn out hundreds of journal-style academic papers in a matter of hours, complete with data, citations, economic theory, and even author names. The papers look real. The statistical testing behind them is real. But the “discoveries” they claim to make? Reverse-engineered after the fact by a machine.
Two finance professors at leading American universities set out to show just how easy it had become to industrialize one of academia’s most persistent bad habits: building a theory to explain data you’ve already seen, then pretending you came up with the theory first. In academic circles, this practice has a name, “HARKing,” which stands for Hypothesizing After Results are Known. What the researchers found was that AI doesn’t just enable HARKing on a new scale. It automates it entirely, at a speed that could overwhelm the academic publishing system before anyone figures out what to do about it.
Robert Novy-Marx of the Simon Business School at the University of Rochester and Mihail Velikov of Penn State’s Smeal College of Business published their findings in the Journal of Economic Literature in March 2026. Their paper is equal parts technical tour de force and cautionary alarm, a demonstration of what AI can do to academic science that is as sobering as it is impressive.
How Researchers Used AI to Mass-Produce Finance Papers
To build their assembly line, Novy-Marx and Velikov started with raw financial data. They pulled accounting information on publicly traded U.S. companies from two major databases covering decades of records: COMPUSTAT, which tracks corporate financial statements going back to 1950, and CRSP, a stock market database with data going back to 1926. From those sources, they mathematically constructed more than 31,000 potential “signals,” patterns in accounting numbers that might predict how a stock will perform.
Most of those signals didn’t hold up under scrutiny. After running them through a series of increasingly strict statistical tests, the researchers filtered the original pool down to just 95 signals that survived all quality checks. Each had to show consistent, statistically meaningful results across multiple ways of slicing the data, including adjustments for firm size and known market risk factors. Only about three-tenths of one percent of the original candidates made the cut.
With those 95 validated signals in hand, the team handed the work over to an AI language model. Specifically, they used Claude Opus 4.1, Anthropic’s most advanced reasoning model at the time of the experiment. For each signal, the AI generated four complete academic papers, each one built around a different economic theory to “explain” the same finding.
One version argued that investors are slow to absorb complex financial information. Another leaned on theories about production costs and investment risk. A third drew from consumption-based economic models. A fourth was written without a specific theoretical angle. In total, the pipeline produced 380 finished papers, each roughly 30 pages long, with abstracts, introductions, data sections, results tables, charts, and references, all formatted to match top finance journal standards.
The data mining and validation steps took about a day of computing time. The AI-generated papers took about 12 hours.
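The arithmetic behind the pipeline described above can be checked in a few lines. This is a back-of-the-envelope sketch using only the figures quoted in the article. The variable names are illustrative, and 31,000 is treated as a lower bound because the article says "more than 31,000", so the survival rate shown is an upper bound.

```python
# Figures quoted in the article; 31_000 is a lower bound on the candidate pool.
candidate_signals = 31_000
surviving_signals = 95
theories_per_signal = 4  # three named theoretical framings plus one without a specific angle

survival_rate = surviving_signals / candidate_signals
total_papers = surviving_signals * theories_per_signal

print(f"Survival rate: at most {survival_rate:.2%}")
print(f"Papers generated: {total_papers}")
```

With these inputs the survival rate comes out at roughly 0.31%, and the paper count matches the 380 reported.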
AI-Generated Finance Research Papers Fooled Standard Quality Checks
The papers that came out of this pipeline were, by multiple measures, eerily convincing. Each AI-generated introduction followed standard academic conventions, framing a research question, citing related literature, building a logical theoretical argument, and summarizing the key results. The citations were drawn from real published work, though the authors note the AI occasionally “hallucinated” references that don’t actually exist. Signal names were generated to sound authoritative and specific: a ratio of other current assets to shareholders’ equity became “Liquidity Leverage Intensity.” A measure of acquisitions relative to working capital was labeled “Acquisition Capacity Utilization.”
When the researchers compared the statistical strength of their AI-generated signals against 212 signals published in actual peer-reviewed finance journals, the data-mined signals were nearly indistinguishable. For equally-weighted portfolio strategies, the distribution of statistical results from the AI-generated signals overlapped almost perfectly with the distribution from published academic papers.
That finding alone carries a pointed message: the bar that peer review sets for finance research may be no higher than what an automated data-mining exercise can clear on its own.
Readability tests told a similar story, though with a revealing twist. Novy-Marx and Velikov compared the AI-written introductions against 140 published papers using standard measures of text complexity. AI-generated introductions clustered tightly at the higher end of the scale, around 16 to 18 years of education required to comprehend them, roughly college-graduate level, with very little variation across all four theoretical versions.
Human-authored papers spread more widely, with median scores somewhat lower at 13 to 16 years of education, and notable outliers on both ends. The machine’s prose was consistent and polished, but it lacked the stylistic range of human academic writing.
What This Means for the Future of Academic Research
None of the 380 papers were submitted to journals, and the researchers are clear that the experiment was designed to sound an alarm, not to flood the academic literature with junk. But the alarm is a loud one. The authors note that submitting all 380 papers to peer-reviewed journals would impose hundreds of thousands of dollars in reviewing costs on the profession, and if even a small fraction of researchers adopted this approach, the journal system could be overwhelmed.
Citation inflation is another concern the paper raises directly. Each AI-generated paper cites prior research to build its theoretical case, including, in many cases, the authors’ own earlier work. Scaled across hundreds or thousands of papers, automated citation generation could artificially inflate citation counts, a metric that tenure committees, grant agencies, and hiring panels use to evaluate academics. Novy-Marx and Velikov even calculate that if search engines index the 95 papers they’ve publicly posted, each of them could pick up hundreds of additional citations without a single human reader choosing to cite their work.
The paper stops well short of calling AI in research inherently destructive. AI can, the authors argue, democratize research by lowering the barriers to hypothesis generation, accelerate the pace of discovery, and help researchers map connections across large bodies of literature far faster than was previously possible.
There’s even a genuine scientific case for post-observation theorizing: Isaac Newton, after all, watched an apple fall before he developed his theory of gravity. The problem isn’t looking at data before forming a theory. The problem is doing so secretly, at industrial scale, and presenting the result as original insight.
Novy-Marx and Velikov call for researchers to be held fully accountable for any work they produce with AI assistance, not merely required to disclose that AI was used, a standard they argue is too weak to matter. They also advocate for new validation systems capable of detecting circular reasoning, redundant theorizing, and hallucinated citations. And they argue that economic theories offered to explain new findings should be judged, at least in part, by whether they make novel predictions that go beyond the result they were built to explain.
“AI can now produce a ton of papers at scale, and it’s going to change the nature of how we produce and disseminate knowledge. This is an early warning signal of what’s coming with modern AI capabilities,” Velikov said in a statement.
“I’m far from the opinion that we’ll all be out of jobs and replaced by AI,” he added, “but I think our jobs will evolve a lot, and the more we invest in understanding how these systems work, the better research we’ll be able to do.”
Whether academic institutions move fast enough to build those safeguards is an open question. For now, the 380 papers sit in a public GitHub repository, proof that the assembly line works and that current safeguards may not be ready for it.
Read more:
https://t.co/aYi4O9DAb7
What stood out to me is that 80.4% of authors said the AI identified issues not mentioned by human reviewers, but they still trusted human reviews more. That feels directionally right. Finding potential issues and exercising judgment about significance are related, but very different tasks.
Researchers audited 111 million references across 2.5 million papers and estimated that 146,932 hallucinated citations entered the scientific literature in 2025 alone.
The finding that stood out to me is that many of these citations made it through existing moderation and publication processes.
A lot of discussion around AI in science focuses on generation. This feels like a reminder that verification may be the more important problem.
https://t.co/nipNGeANhH
The biggest misconception in AI for science is that writing is the hard part.
Writing is becoming cheap, but verification is becoming expensive.
Does the citation exist?
Does the DOI resolve?
Does the cited paper actually support the claim?
Does the figure support the conclusion?
At Manusights, we built our review process around verification. Every review includes live citation checks, figure analysis, and feedback trained on patterns from 200+ pre-submission reviews.
Curious what your manuscript would look like through that lens?
Comment "manusights" and follow @manusights. We'll give away a free review (normally $39) to a few researchers this week.
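The first two questions in the checklist above (does the citation exist, does the DOI resolve) are the easiest to automate. Below is a minimal sketch, not a description of how Manusights does it. It assumes the public DOI proxy's handle lookup at `https://doi.org/api/handles/<doi>`, which returns JSON with `responseCode` equal to 1 for a registered DOI; verify that against the current DOI proxy documentation before relying on it. The helper names and the regular expression are hypothetical, and the pattern is only a loose format heuristic.

```python
import json
import re
import urllib.error
import urllib.request
from urllib.parse import quote

# Loose heuristic for DOI shape: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def looks_like_doi(doi: str) -> bool:
    """Cheap offline format check before any network call."""
    return bool(DOI_PATTERN.match(doi.strip()))


def handle_api_url(doi: str) -> str:
    """Build the lookup URL, percent-encoding unusual characters in the suffix."""
    return "https://doi.org/api/handles/" + quote(doi.strip(), safe="/")


def doi_resolves(doi: str, timeout: float = 10.0, opener=urllib.request.urlopen) -> bool:
    """Return True if the DOI proxy reports the handle as registered.

    `opener` is injectable so the function can be exercised without network access.
    """
    if not looks_like_doi(doi):
        return False
    try:
        with opener(handle_api_url(doi), timeout=timeout) as response:
            payload = json.load(response)
    except (urllib.error.URLError, TimeoutError, ValueError):
        # Not found, network failure, or an unparseable body: treat as unresolved.
        return False
    return isinstance(payload, dict) and payload.get("responseCode") == 1
```

A resolving DOI only shows that the reference exists. Whether the cited paper actually supports the claim still needs a reader, human or otherwise.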
One thing we've noticed building Manusights is that generating academic text is becoming almost free. Verification is becoming the scarce resource: checking whether a citation exists, whether a figure supports a claim, whether a conclusion actually follows from the evidence. Those are increasingly the bottlenecks.