This blog by Nicholas Carlini is stellar: https://t.co/nqkalFzuDl
Internalizing things based on words is much more difficult to do than internalizing from (bad) experience, but if there is one place you should try hard to learn from as a researcher, it is this post.
It’s been less than a year since I started my lab (SPARK Lab) at @UUtah, and we already have a ton of new stuff that I can’t wait to talk about soon. Stay tuned for more.
I’ll start today by sharing that our updated Computer Use Survey blog has been accepted to ICLR Blogposts 2026.
Collaboration with my student @aplycaebous and Utah colleague @anmarasovic.
Very honored to be one out of seven outstanding papers at this year's EMNLP :)
Huge thanks to my amazing collaborators @fatemehc__, @anmarasovic, and @boknilev; this would not have been possible without them!
Thrilled to see this work recognized at #EMNLP2025!
This framework and approach to measuring CoT faithfulness have been hugely influential for how I think about reasoning evaluation, and I'm so lucky to have worked with such brilliant collaborators. Huge credit to @mtutek
Tomorrow @ #COLM2025:
1️⃣ Purbid's 𝐩𝐨𝐬𝐭𝐞𝐫 @ 𝐒𝐨𝐋𝐚𝐑 (𝟏𝟏:𝟏𝟓-𝟏:𝟎𝟎𝐩𝐦) on catching redundant preference pairs & how pruning them hurts accuracy
2️⃣ My 𝐭𝐚𝐥𝐤 @ 𝐗𝐋𝐋𝐌-𝐑𝐞𝐚𝐬𝐨𝐧-𝐏𝐥𝐚𝐧 (𝟏𝟐𝐩𝐦) on measuring CoT faithfulness by looking at internals
1/3
Honored that 🎷🎸🥁 𝗠𝗶𝘅𝗔𝘀𝘀𝗶𝘀𝘁 🎷🎸🥁 was selected as a #COLM2025 oral spotlight. Go check out @mclemcrew's 𝐭𝐚𝐥𝐤 on 𝐖𝐞𝐝 (𝐎𝐜𝐭 𝟖) at 𝟑:𝟑𝟎𝐩𝐦 in 517BC and 𝐩𝐨𝐬𝐭𝐞𝐫 from 𝟒:𝟑𝟎-𝟓:𝟑𝟎 in 710!
Super excited that the Computer Use survey I've been working on w/ @anmarasovic for a while now is ready! Originally we were planning on a more traditional survey paper, but as more surveys came out, we decided on an interactive website survey.
Thrilled to have been a part of this work! FUR is a framework that 𝑚𝑒𝑐ℎ𝑎𝑛𝑖𝑠𝑡𝑖𝑐𝑎𝑙𝑙𝑦 tests whether CoTs are faithful to the model's internal computations.
Amazing collaboration, and so excited to see it presented at both @emnlpmeeting & @COLM_conf! 🚀
Very pleased that FUR was accepted to @emnlpmeeting Main🎉
In case you can’t wait that long to hear about it in person, it will also be presented as an oral at @interplaywrkshp @COLM_conf🥳
FUR is a parametric test assessing whether CoTs faithfully verbalize latent reasoning.
Absolutely stoked to announce that our paper “MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing” was accepted at COLM’25!
This research aims to foster development of AI that empowers artists by augmenting their skills, not automating their flow.
1/ 🚨NEW PAPER: "BriefMe: A Legal NLP Benchmark for Assisting with Legal Briefs", accepted to ACL Findings 2025!
We introduce the first benchmark specifically designed to help LLMs assist lawyers in writing legal briefs 🧑‍⚖️
9/ We hope BriefMe encourages more Legal NLP development that directly aids legal professionals!
Check out our paper👇 for the full methodology, human evaluation details, and comprehensive benchmarks.
What other legal NLP applications can we design using BriefMe? 🤔