CALL FOR PAPERS: Mid-Atlantic Conference on Asian Studies (MARAAS)
October 9-11, 2026
Lehigh University
submissions due July 1st https://t.co/AipdhvkxsC
CALL FOR PAPER: Mid-Atlantic Conference on Asian Studies (MARAAS)
October 9-11, 2026
Lehigh University
submissions due July 1st https://t.co/bu19lIzoCE
More featured titles from University of Hawaiʻi Press for the @aasasianstudies Annual Conference.
Save 30% with code AAS2026 through May 31, 2026.
Visit us at Booth 325–327 in Vancouver or explore the list online.
#AAS2026#AsianStudies
Fantastic podcast that I'll be assigning to students in my Asian American Experiences course. Features US faith communities resettling 1 million refugees--from South East Asian secret war refugees to Sudanese Muslims. https://t.co/OpFwZT6dB2
🚨 Holy shit… Stanford just published the most uncomfortable paper on LLM reasoning I’ve read in a long time.
This isn’t a flashy new model or a leaderboard win. It’s a systematic teardown of how and why large language models keep failing at reasoning even when benchmarks say they’re doing great.
The paper does one very smart thing upfront: it introduces a clean taxonomy instead of more anecdotes. The authors split reasoning into non-embodied and embodied.
Non-embodied reasoning is what most benchmarks test and it’s further divided into informal reasoning (intuition, social judgment, commonsense heuristics) and formal reasoning (logic, math, code, symbolic manipulation).
Embodied reasoning is where models must reason about the physical world, space, causality, and action under real constraints.
Across all three, the same failure patterns keep showing up.
> First are fundamental failures baked into current architectures. Models generate answers that look coherent but collapse under light logical pressure. They shortcut, pattern-match, or hallucinate steps instead of executing a consistent reasoning process.
> Second are application-specific failures. A model that looks strong on math benchmarks can quietly fall apart in scientific reasoning, planning, or multi-step decision making. Performance does not transfer nearly as well as leaderboards imply.
> Third are robustness failures. Tiny changes in wording, ordering, or context can flip an answer entirely. The reasoning wasn’t stable to begin with; it just happened to work for that phrasing.
One of the most disturbing findings is how often models produce unfaithful reasoning. They give the correct final answer while providing explanations that are logically wrong, incomplete, or fabricated.
This is worse than being wrong, because it trains users to trust explanations that don’t correspond to the actual decision process.
Embodied reasoning is where things really fall apart. LLMs systematically fail at physical commonsense, spatial reasoning, and basic physics because they have no grounded experience.
Even in text-only settings, as soon as a task implicitly depends on real-world dynamics, failures become predictable and repeatable.
The authors don’t just criticize. They outline mitigation paths: inference-time scaling, analogical memory, external verification, and evaluations that deliberately inject known failure cases instead of optimizing for leaderboard performance.
But they’re very clear that none of these are silver bullets yet.
The takeaway isn’t that LLMs can’t reason.
It’s more uncomfortable than that.
LLMs reason just enough to sound convincing, but not enough to be reliable.
And unless we start measuring how models fail not just how often they succeed we’ll keep deploying systems that pass benchmarks, fail silently in production, and explain themselves with total confidence while doing the wrong thing.
That’s the real warning shot in this paper.
Paper: Large Language Model Reasoning Failures
My chapter "Placebo Effects, Qi, and Intention: How Biomedical Hegemony Polices Competing Paradigms" in Religion, Spirituality, and Public Health:
Competing and Complementary Epistemes is out in open access https://t.co/xTAIPAas0R
Join the Sigur Center next Tuesday, October 7th, from 12:30 to 2 p.m. in room 505 of the Elliott School where Dr. Kin Cheung will discuss Buddhist healing communities in the United States. RSVP here: [https://t.co/6gd8CP7TIS].
Join the Sigur Center and the Asian American Studies minor from 12 - 1:30pm on October 6th, in the Elliott School Room 505, for a discussion surrounding Dr. Cheung's new book "Teaching Asia during a Resurgence of Anti-Asian Racism." RSVP at the link in our bio.
China's Buddhist temples have been rocked by a recent series of scandals. @NCUSCR#PIPFellow@ProfKinCheung discussed China's 'temple economy' with @guardian:
https://t.co/8HjJkEQIEx
Yesterday, @USCET inaugurated its new Reading Room at the @gwusigurcenter, made possible by a donation of @NCUSCR's extensive collection of books on U.S.-China relations and China studies.
Learn more: https://t.co/RbTk4MtEeN
Coming soon from the American Council of Learned Societies, The Robert H. N. Ho Family Foundation Global, and Yale University Press: a new book series on Global Buddhism! We're always thrilled to see the #AsianStudies publishing sphere expand.
https://t.co/5FxOe9ziks