@alanamarzoev and I had a great time presenting OpenEstimate at our #ICLR2026 poster session today! Thanks to everyone who came out to chat about evaluating LLM reasoning under uncertainty.
On my way to #ICLR2026 to present OpenEstimate with @alanamarzoev and give a spotlight talk at the FINAI Workshop.
Over the past few years, @AndrewWLo and I have been studying whether LLMs can be trusted to give sound investment advice. In my talk, I'll show that LLMs demonstrate heuristic collapse: rather than weighing all relevant factors, they latch onto a few salient features and ignore the rest. Heuristic collapse has direct consequences for whether LLMs can meet the legal standard of a fiduciary, and for AI advisors more broadly.
This is one of many reasons I think investing is one of the best domains for studying LLMs. Through this domain, I've been able to study LLM reasoning, human-LLM interaction, and emergent systemic effects. If you're working on any of these topics, I'd love to meet. Come find me before or after the talk on Monday at 1:35PM!
Heading to #ICLR2026 (@iclr_conf) 🇧🇷 to present OpenEstimate!
As LLMs get deployed in decision-making domains, they're increasingly expected to do subjective probability estimation, drawing on everything they know to form beliefs about unknown quantities. Our paper studies this capability with a leakage-resistant benchmark.
This sits at the intersection of a few things I care about: RL in hard-to-verify domains, forecasting, and making LLMs honest about what they don't know.
Come find me Saturday 10:30–1 at poster #1716 in Pavilion 3! And if you'd like to grab coffee and chat about any of these, DMs are open!
New preprint! We have lots of great benchmarks for tasks where it's possible, in principle, for models to get all the answers exactly correct. But what about tasks that *intrinsically* require reasoning about uncertain facts and quantities?