New paper! LLMs Corrupt Your Documents When You Delegate
LLMs are enabling a new way of working: delegated work, where users supervise an LLM as it edits documents on their behalf.
Delegation requires trust: does the LLM complete tasks without introducing errors?
We simulate delegation across 52 professional domains and find that LLMs Corrupt Your Documents When You Delegate. 🧵1/N
LLMs *Still* Get Lost In Multi-Turn Conversation.
We re-ran experiments with newer models. Performance still drops, though with modest gains, mostly from improvements on the Python coding task.
Also: Lost in Conversation will be presented at ICLR 2026 🎉🇧🇷
For those AI researchers who have recently (re)entered the job market, just a reminder that our team, AI Interaction and Learning at Microsoft Research Redmond, is currently hiring multiple ML/AI researchers.
https://t.co/mhPqFXpX9S
https://t.co/17IV9oeXNE
I couldn’t make it to #ACL2025, but excited to share our paper GenTool from @MSFTResearch and @EdinburghUni!
https://t.co/awgwgXVYQq
We introduce GenTool: a training framework that simulates zero-to-one and weak-to-strong tool transitions to boost tool generalization in LLMs.
🤔 Why do LLMs fail on some tasks and not others?
Our paper (https://t.co/iGfWUGZ9Xf) gives a theoretically grounded explanation: some reasoning problems need more information-tracking than LLMs can handle.
The details get ✨lost in transmission✨ between attention heads. 🧵 1/8
Your 1M+ context window LLM might be less powerful than you think. Tobias Schnabel's new article reveals working memory as a critical bottleneck for LLMs.
https://t.co/zTXpomNMkY
New paper from my group at @MSFTResearch!
📄https://t.co/bRwk7auUAn
Promises about how AI will change work are cheap. What does the actual data say?
We measured which work activities people use AI for, how successful they are, and which jobs do those tasks. 🧵1/8
🆕paper: LLMs Get Lost in Multi-Turn Conversation
In real life, people don’t speak in perfect prompts.
So we simulate multi-turn conversations — less lab-like, more like real use.
We find that LLMs get lost in conversation.
👀What does that mean? 🧵1/N
📄https://t.co/xt2EfGRh7e
Just presented my paper "DYMOND: DYnamic MOtif-NoDes Network Generative Model" at @TheWebConf, co-authored with Timothy La Fond and @ProfJenNeville. Recording here: https://t.co/lpL93TXv08
#TheWebConf #WWW21
Congratulations to Hoda for winning the @PurdueScience Early Career Award. So proud of her! Wish we had been able to celebrate in person. @PurdueCS @VT_CS
Excited to announce that I will be joining Microsoft Research in Redmond in June. I am looking for an intern for this summer: if you are interested in graph learning with neural networks, apply at the link below and let me know. @MSFTResearch
https://t.co/cTSTQwJXYO
Wish I could hear discussions with Google mentors at @WiMLworkshop mentoring round tables later today, particularly table 14 with Jeff Dean #ISupportTimnit
We are looking forward to welcoming you to the 15th Women in Machine Learning (#WiM2020) Virtual Workshop on Wednesday, December 9th.
The full *program book*, including details of all the sessions, can be found here: https://t.co/PzYl7i4mAe
#WiML @NeurIPSConf #NeurIPS2020
👏Congratulations to Dr. @Mahak_Goindani for her successful PhD defense today on “Social Reinforcement Learning”! I can't believe she's already the third student from my lab to do a virtual defense during this pandemic and she won't be the last.
This didn’t make me feel as hopeful as I think it was supposed to, but happy to learn about @lynnconway and add her to my list of female role models in CS.
https://t.co/78HMt958se