ri @AIatMeta, pursuing CS PhD @SCAI_ASU | Prev: Applied Math (B.S. & M.S.) @FudanUni, Research Internship @awscloud @AlibabaGroup @AMD | Opinions are my own.
Treating reasoning and acting as two tools for one job folds many debates (long context vs RAG, think vs act, etc.) into a single allocation question. And framing the internalization–externalization boundary as the next design question feels exactly right. Really inspiring read!
Two students take the same exam. Both score 100 — one solved it himself, the other Googled every answer.
A semester later, the gap is huge.
That's the problem with today's AI agents. I wrote a detailed blog post to share my recent thoughts on this, mainly based on Theory of Agents. I promise it is worth 30 minutes of your time.
Blog: https://t.co/VCFC7RnbU6
Project: https://t.co/WFLEYhOaCl
That last part resonates with what motivated our ongoing project, ReuseRL: if a capability only internalizes when it is compressible, the real question becomes what gets internalized. We go looking for the atoms: the small set of reusable skills a model can absorb and build on.
@Swarooprm7 We may need to define "machine creativity/novelty" first. What about searching for some counterexamples in formal math? Concepts are created to simplify things, facilitate abstraction, and improve human understanding (efficiency, etc.). Why do AI models need novel concepts?
✨Check out the paper to learn more! This paper was done by our intern @Zijun0916 last summer. A big thank you to my labmate @XiaoYe1170354 and our advisor @BenZhou96 for the support and guidance!
LLMs memorize massive amounts of text, but can they actually apply this knowledge conceptually? 🤔
Our #ICLR '26 paper from the ARC Lab probes this in math reasoning! "CORE: Concept-Oriented Reinforcement for Bridging the Definition-Application Gap..." 🔗 https://t.co/GscQjqjTMd
📈 The results: Consistent gains over vanilla baselines, including up to +9.3% on in-domain Textbook problems and +9.6% on out-of-domain TheoremQA. We also ran ablation experiments showing that the results are consistent across different models and benchmarks.
@HBX_hbx @QuYuxiao Besides QuestA, there are many other related works from last year using a similar idea: BREAD (https://t.co/O7wOGJnYPD), Scaf-GRPO (https://t.co/oug6ytphJ3), and CORE (https://t.co/1ygNSBsQC1). The "guided prefix" could be partial oracle solutions, problem-related concepts, etc.
To the question of “why not both?”: my dream is for LLMs to make conceptual discoveries, like Galois with group theory or Einstein with general relativity. I don’t believe breakthroughs like these would come from A* search or its more advanced version, MCTS.
First time at #NeurIPS2025! I’ll be in San Diego from Dec 1–6 and would love to make new friends, grab some tea🍵, and discuss LLM reasoning (math & cognition-inspired), RL, and more! Feel free to DM!