(1/3) Started this as a side project on a whim and had fun presenting it at the COLM 2025 XLLM-Reason-Plan Workshop recently. @XllmReasonPlan
Built a small dataset of rebus puzzles to see how VLMs reason through visual wordplay.
Dataset: https://t.co/EIWBHZfIWz
We propose a new way to quantify AI overreliance: the Offloading Score 🧐 @vishakh_pk
It measures the fraction of cognitive work you hand off to AI 🤖 by simulating how you'd have done each step without AI, then counting the steps the AI saved. It works directly from interaction traces (keystrokes, screenshots), so it's reusable across many tools!!
Absolutely fascinating piece by @davideoks connecting language model oddities to human cultural development. Picked this up courtesy of @deenamousa's "Under Development" newsletter.
https://t.co/OJSHbBFZzL
Today I was supposed to be on my way to Türkiye for my wedding, to meet up with my family and have them finally meet my partner and husband. We had everything planned. We chose Türkiye since it's close to Iran and my partner and I could both go there and have our families meet each other. We were supposed to get married with our close family and a small group of friends on a boat on the Mediterranean Sea at sunset. Because of the war, all flights to and from Iran are cancelled and my family can’t leave Iran, so we had to call off the wedding.
Instead, this is what my day looked like.
I woke up to a reminder to call my grandma (I used to call her every Friday morning). I snoozed the reminder until next Friday, just like I have done for the past many years. I can’t call her like our tradition these days because there is no way to call home. All international calls to Iran are blocked, and the internet is fully shut down by the regime.
I got to work and right as I opened my computer I received an email I had scheduled to send to myself 5 years ago: “Apply for citizenship.” This summer marks 11 years of being in the US and 5 years of being a green card holder. I am now eligible to file for citizenship, but it doesn’t matter because an executive order was signed a few months ago that banned all Iranians from applying for any visa or citizenship.
At lunch I opened Twitter just to see what’s up in the world and saw the news that those who don’t have a green card now need to leave the US before they can get one. This means every one of my Iranian friends who are here on a visa now has to go back home (on which flight?) to get a green card??? As if it’s that easy? We all know getting back to the US for Iranians is a huge challenge (months and months of waiting for a visa, with a chance of never being able to come back).
And this is just a normal Friday for an Iranian. These days, when people ask how I’m doing and how I’m handling everything, I just say:
It’s okay, it’s okay. It will be okay some day. But the reality is: nothing is okay. I’m in constant pain. I haven’t seen my family and loved ones in years, I barely hear about their wellbeing, and I’m constantly worried about them. I’m just burying myself in work because that’s the only distraction that can save me from losing my mind.
I’m not okay. None of us are okay. We are just barely holding it together…
i'm restarting my blog! i want to kickstart productive conversations around: what should AI agents look like for hard, subjective knowledge work?
a lot of agent setups work well when tasks are objective and easy to verify. but many workflows (e.g., qualitative analysis, strategy, sensemaking) are messy and interpretive.
as a first post, i explore different ways of doing agent-assisted qualitative analysis on tweets, with varying levels of human feedback/intervention.
tldr: they all kinda sucked. turns out it’s hard to:
(a) stop agents from converging too quickly on shallow interpretations
(b) get agents to adapt to preferences that emerge gradually across many turns (i.e., evolving context)
(c) capture human judgment without making humans fatigued
i'll be talking about llm benchmarks, the infra behind it, the challenges and learnings later today at @tngtech :)
will be live streamed and recorded, link in replies :)
Our new longitudinal study shows that after 3 weeks with sycophantic AI, users 👉
1⃣were nearly as likely to turn to it as to close friends;
2⃣reported lower satisfaction with real human interactions;
3⃣preferred it because it made them feel most understood.
We upgraded Tabracadabra 🎉 to bring an entire context-aware assistant (not just tab to autocomplete!) to any textbox. It's pretty great if you hate switching between the chat interface and what you're working on. We're also open-sourcing it, so you can try it out!🧵
Announcing Talkie: a new, open-weight historical LLM! We trained and finetuned a 13B model on a newly-curated dataset of only pre-1930 data. Try it below!
with @AlecRad and @status_effects 🧵
Not many PhD students know about compute grants, but they can make a huge difference. During my PhD, I got access to Stability AI's HPC cluster through a small proposal and used it for Self-RAG training.
Great practical post by @_emliu!
An unsolicited guide to being a researcher: super instructive slides by @EugeneVinitsky https://t.co/sclj1rY930
- different goals of a PhD student
- how to be a good collaborator
- how to keep up with literature
- tracking your ideas & experiments
- stress & productivity
Semantic duplicates are invisible to small models but can be catastrophic for large ones. We show that this breaks standard scaling laws and measure the effective data pool size to fix them. If you're training at scale on synthetic data, you should read this!
Check out Manya's benchmark for LLM creativity! Inspired by work on creativity in graphs (@AdtRaghunathan's "roll the dice" paper), CREATE isolates testing of creative insights for discovery. Future: understand how LLMs derive insights & how they can be better creative partners!
Our paper on using LLMs to support people learning mental health counseling skills received an Honorable Mention at CHI 2026! https://t.co/RFAghrz8Oy Led by @RyanCLouie (who's on the market!), w/@Diyi_Yang, Raj Shah, Ifdita Hasan Orney, & Juan Pablo Pacheco
Our Reinforcement Learning group is excited to welcome Mansi Maheshwari for a session focused on "Addressing the Plasticity-Stability Dilemma in Reinforcement Learning" next week on Monday, March 16th!
Thanks to @rahul_narava and @gustiwinata_ for organizing this session 👏
Learn more: https://t.co/JFwwkLMiOE
Sharing a list of career & survival resources that really helped me navigate a research career in #NLProc and #AI: https://t.co/vzsyMGnQK6
Huge thanks to the researchers & profs who wrote such thoughtful guides for our community 🙏
PRs are very welcome to keep it growing🌱
I'm looking for students/folks interested in leading a project on privacy-preserving mental health chatbot research, focusing on differentially private pattern extraction and synthetic data generation for AI safety.
If you are interested or know someone who would be a good fit, email with subject "Mental Health and DP". Short project description below.
Pls share!!
PS this is not a recruitment for PhD positions, it's a single project. if you are already at CMU mention that in the title.
Be sure to join us tomorrow, January 30th for a presentation from @Ahsaasb, for a deep dive into "Production-Grade ML in Practice: Evaluation and Design Frameworks for Recommendation Systems Serving Millions."
Learn more: https://t.co/Y0kAPHzLzG
Our ML Industry group is looking forward to hosting @Ahsaasb, Senior ML Engineer at Instacart for a presentation on "Production-Grade ML in Practice: Evaluation and Design Frameworks for Recommendation Systems Serving Millions."
Thanks @PrahithaM and @arya_suneesh for organizing this event! 🔥
Learn more: https://t.co/1PQTB0U5gx
Internship opportunity! Please share!
📣 I'm looking to hire an intern in human-centered NLP for the agents team @togethercompute. Come work on frontier AI systems that tackle complex agentic tasks!
Research direction is open, and we're looking to publish in NLP and HCI venues.