Varun Vasudevan

@DevanVarun

Computational Scientist. Building & optimizing Compound AI systems. PhD from @ICMEStanford.

Palo Alto, CA

Joined December 2013

313 Following

256 Followers

218 Posts

Varun Vasudevan @DevanVarun

about 1 month ago

Who validates the validators is my fav paper from Shreya. I have used the term criteria drift so many times in the last year. Thanks and best wishes!

Shreya Shankar

@sh_reya

about 1 month ago

I'm joining Carnegie Mellon's CS Department (and HCII by courtesy) as an assistant professor in Fall 2027! I'll be recruiting PhD students next cycle. If you're interested in AI systems or human-AI collaboration, list me in your application. Stay tuned for more about my new lab!

120

109

345

215K

Varun Vasudevan @DevanVarun

5 months ago

Hello @DSPyOSS community. What is a clean way to re-configure dspy with lm instance during optimization? I have API access where authentication signature needs to be updated at regular intervals.

109

Varun Vasudevan @DevanVarun

7 months ago

@lateinteraction Hi @lateinteraction. What courses are you teaching at MIT? Anything on designing compound AI systems?https://t.co/c2xGH7As3g

Varun Vasudevan @DevanVarun

7 months ago

@sayashk Just finished reading your research statement. Enjoyed reading and thinking about your clarity of thought. Thank you for sharing, Sayash.

Who to follow

Dr Ashvind Bawa

@drbawa

Hong Kong Medical Journal

@HongKongMedJ

Hong Kong Medical Journal is a Hong Kong-based, peer-reviewed, general medical journal published bimonthly online and in print. 2023 Impact Factor: 3.1

Faraz Mohammed فراز محمد

@mohammedfaraz13

Psychiatrist, Unabashed Sports Lover , Arsenal Fan for life , Patron of Poetry , Introvert, Reclusive , Moderate Outlook, Magnanimous ;-)

Varun Vasudevan @DevanVarun

7 months ago

@kellyhongsn Thanks for sharing the article. In the first figure in the results section, what does "(n models)" mean? I understood the gist from the figure, but couldn't understand all the details. Could you pls clarify?

Varun Vasudevan @DevanVarun

7 months ago

Reminds me of the newspaper test

Ahmad Beirami

@abeirami

7 months ago

It’s unfortunate that a bug in OpenReview allowed reviewer and AC identities to be extracted. That said, this is a chance to reflect on our practices: We should write our reviews as if our names were attached to them and visible to authors and the broader community, even when the recommendation is a strong reject!

174

20K

198

Varun Vasudevan @DevanVarun

7 months ago

However, with regard to this slide, isn't this also true about prompt optimizers? They have predefined prompts for instructional proposal.

DevanVarun's tweet photo. However, with regard to this slide, isn't this also true about prompt optimizers? They have predefined prompts for instructional proposal. https://t.co/kEa1n0NpUb

Varun Vasudevan @DevanVarun

7 months ago

I've been experimenting with both GEPA & MIPRO a lot lately, on both text & image data. Thoughts from the talk that resonated with me the most are: 1) Meet people where they are! 2) Positioning prompt optimizers as discovering latent requirements.

DevanVarun's tweet photo. I've been experimenting with both GEPA & MIPRO a lot lately, on both text & image data. Thoughts from the talk that resonated with me the most are:
1) Meet people where they are!
2) Positioning prompt optimizers as discovering latent requirements. https://t.co/7wEs5jqhns

Christopher Potts

@ChrisGPotts

7 months ago

I've now put a full practice run of this DSPy meet-up talk on YouTube (link in this thread). I am told recordings of @lateinteraction's and @LakshyAAAgrawal's talks are coming soon (my own live version had technical issues).

19K

449

Varun Vasudevan @DevanVarun

7 months ago

It becomes even harder if length in chars is one of the many constraints. Use case: Marketing messages out in the world.

Omar Khattab

@lateinteraction

7 months ago

instruction following is hard

113

Varun Vasudevan @DevanVarun

8 months ago

@ZacharyBamberg1 High as in, what's the range?

Varun Vasudevan @DevanVarun

8 months ago

@AndrewYNg looking at his own reflection is the coolest thing so far 😃😂 As always, amazing instructor.

Andrew Ng

@AndrewYNg

8 months ago

Announcing my new course: Agentic AI! Building AI agents is one of the most in-demand skills in the job market. This course, available now at https://t.co/zGHUh1loPO, teaches you how. You'll learn to implement four key agentic design patterns: - Reflection, in which an agent examines its own output and figures out how to improve it - Tool use, in which an LLM-driven application decides which functions to call to carry out web search, access calendars, send email, write code, etc. - Planning, where you'll use an LLM to decide how to break down a task into sub-tasks for execution, and - Multi-agent collaboration, in which you build multiple specialized agents — much like how a company might hire multiple employees — to perform a complex task You'll also learn to take a complex application and systematically decompose it into a sequence of tasks to implement using these design patterns. But here's what I think is the most important part of this course: Having worked with many teams on AI agents, I've found that the single biggest predictor of whether someone executes well is their ability to drive a disciplined process for evals and error analysis. In this course, you'll learn how to do this, so you can efficiently home in on which components to improve in a complex agentic workflow. Instead of guessing what to work on, you'll let evals data guide you. This will put you significantly ahead of the game compared to the vast majority of teams building agents. Together, we'll build a deep research agent that searches, synthesizes, and reports, using all of these agentic design patterns and best practices. This self-paced course is taught in a vendor neutral way, using raw Python - without hiding details in a framework. You'll see how each step works, and learn the core concepts that you can then implement using any popular agentic AI framework, or using no framework. The only prerequisite is familiarity with Python, though knowing a bit about LLMs helps. Come join me, and let's build some agentic AI systems! Sign up to get started: https://t.co/FX35dloqw4

326

881K

118

Varun Vasudevan @DevanVarun

8 months ago

@LakshyAAAgrawal @hammer_mt How can I get the lm_usage i.e., num tokens consumed during training? I believe there is something like get_lm_usage? Setting dspy.settings.configure(track_usage=True) is resulting in an error and I saw a fix on git. I believe that's for the next release.

Varun Vasudevan @DevanVarun

8 months ago

@LakshyAAAgrawal @andressrg Awesome! Let me try that. Thank you.

Varun Vasudevan @DevanVarun

10 months ago

Same here. The most important value proposition has been the "speed of experimentation."

Maxime Rivest 🧙‍♂️🦙🐧

@MaximeRivest

10 months ago

DSPy-ians may disagree, but I don’t think automatic prompt optimization is the real killer feature of DSPy. For me, the most important value proposition is the speed of experimentation it enables and I’m convinced that this will further increase (as dspy gets even better). Faster experimentation leads to better AI programs. With DSPy, most changes take a single line of code: swap instructions, tweak outputs requirements, adjust few-shot demos (manually or automatically), finetune weights, turn a program into an agent with Python tools, switch models, change prompt templates (adapters), try model merging, majority voting, ensembling, and more. Here’s a recent run: With Llama 4 Scout: 1. Naive prompt → 2/11 2. DSPy naive program → 2/11 3. DSPy instruction optimization → 6/11 4. Add few-shot examples to (3) → 6/11 (no improvement) 5. Optimized program with chain of thought → 7/11 6. Optimized program with tools → 7/11 Switched to Gemini 2.5 Pro: 7. Optimized program with chain of thought → 9/11 8. Add structured outputs (list first) + reasoning, no tools → 11/11 🥳 ps: if I use my naive prompt in step 8, I also get 11/11 the program structure tends to matter most. After all that I don't have 8 messy script, I have that simple program that I know perform very well. If I wanted I would start 'scanning' to find a model that perform like gemini-2.5-pro but cheaper. I would only change one line and evaluate in a loop!

MaximeRivest's tweet photo. DSPy-ians may disagree, but I don’t think automatic prompt optimization is the real killer feature of DSPy.

For me, the most important value proposition is the speed of experimentation it enables and I’m convinced that this will further increase (as dspy gets even better). Faster experimentation leads to better AI programs.

With DSPy, most changes take a single line of code: swap instructions, tweak outputs requirements, adjust few-shot demos (manually or automatically), finetune weights, turn a program into an agent with Python tools, switch models, change prompt templates (adapters), try model merging, majority voting, ensembling, and more.

Here’s a recent run:

With Llama 4 Scout:

1. Naive prompt → 2/11
2. DSPy naive program → 2/11
3. DSPy instruction optimization → 6/11
4. Add few-shot examples to (3) → 6/11 (no improvement)
5. Optimized program with chain of thought → 7/11
6. Optimized program with tools → 7/11

Switched to Gemini 2.5 Pro:

7. Optimized program with chain of thought → 9/11
8. Add structured outputs (list first) + reasoning, no tools → 11/11 🥳

ps: if I use my naive prompt in step 8, I also get 11/11 the program structure tends to matter most.

After all that I don't have 8 messy script, I have that simple program that I know perform very well. If I wanted I would start 'scanning' to find a model that perform like gemini-2.5-pro but cheaper. I would only change one line and evaluate in a loop!

241

254

16K

150

Varun Vasudevan @DevanVarun

10 months ago

Congratulations @yuyinzhou_cs

Yuyin Zhou

@yuyinzhou_cs

10 months ago

Just wrapped up my first #KDD2025 in Toronto 🎉 Honored to receive the Young Investigator Best Paper Award for our vision: "From Medical World Models to Intelligent Digital Twins — A Blue Sky Vision for Proactive Medicine." Huge thanks to @kdd_news for this incredible recognition and @NIH for the support! 🙏

yuyinzhou_cs's tweet photo. Just wrapped up my first #KDD2025 in Toronto 🎉
Honored to receive the Young Investigator Best Paper Award for our vision:
"From Medical World Models to Intelligent Digital Twins — A Blue Sky Vision for Proactive Medicine."
Huge thanks to @kdd_news for this incredible recognition and @NIH for the support! 🙏

246

DevanVarun retweeted

James Zou @james_y_zou

11 months ago

📢New conference where AI is the primary author and reviewer! https://t.co/lLjAgp7Zmp Current venues don't allow AI-written papers, so it's hard to assess the +/- of such works🤔 #Agents4Science solicits papers where AI is the main author w/ human advisors. 💡Initial reviews by LLM reviewers w/ final assessment + selection by human experts. 💡Submissions are asked to clearly document AI contribution. 💡All submissions/reviews will be public to enable transparent study of the strength and limitations of AI as researcher and reviewer. We expect AI will make mistakes and it will be instructive to study these in the open! Many thanks to the fantastic co-organizers and expert advisory board! Please see the website for more information.

james_y_zou's tweet photo. 📢New conference where AI is the primary author and reviewer! https://t.co/lLjAgp7Zmp

Current venues don't allow AI-written papers, so it's hard to assess the +/- of such works🤔 #Agents4Science solicits papers where AI is the main author w/ human advisors.

💡Initial reviews by LLM reviewers w/ final assessment + selection by human experts.
💡Submissions are asked to clearly document AI contribution.
💡All submissions/reviews will be public to enable transparent study of the strength and limitations of AI as researcher and reviewer.
We expect AI will make mistakes and it will be instructive to study these in the open!

Many thanks to the fantastic co-organizers and expert advisory board! Please see the website for more information.

503

129

207

114K

Varun Vasudevan @DevanVarun

11 months ago

Somehow, the words "reduce the scope" reminded me of the "idea of simplification: stripping the problem of everything except the essentials" from Shannon's easy on creative thinking. A link to the essay: https://t.co/Z9H6G4IzfY

Andrew Ng

@AndrewYNg

11 months ago

I’d like to share a tip for getting more practice building with AI — that is, either using AI building blocks to build applications or using AI coding assistance to create powerful applications quickly: If you find yourself with only limited time to build, reduce the scope of your project until you can build something in whatever time you do have. If you have only an hour, find a small component of an idea that you're excited about that you can build in an hour. With modern coding assistants like Anthropic’s Claude Code (my favorite dev tool right now), you might be surprised at how much you can do even in short periods of time! This gets you going, and you can always continue the project later. To become good at building with AI, most people must (i) learn relevant techniques, for example by taking online AI courses, and (ii) practice building. I know developers who noodle on ideas for months without actually building anything — I’ve done this too! — because we feel we don’t have time to get started. If you find yourself in this position, I encourage you to keep cutting the initial project scope until you identify a small component you can build right away. Let me illustrate with an example — one of my many small, fun weekend projects that might never go anywhere, but that I’m glad I did. Here’s the idea: Many people fear public speaking. And public speaking is challenging to practice, because it's hard to organize an audience. So I thought it would be interesting to build an audience simulator to provide a digital audience of dozens to hundreds of virtual people on a computer monitor and let a user practice by speaking to them. One Saturday afternoon, I found myself in a coffee shop with a couple of hours to spare and decided to give the audience simulator a shot. My familiarity with graphics coding is limited, so instead of building a complex simulator of a large audience and writing AI software to simulate appropriate audience responses, I decided to cut scope significantly to (a) simulating an audience of one person (which I could replicate later to simulate N persons), (b) omitting AI and letting a human operator manually select the reaction of the simulated audience (similar to Wizard of Oz prototyping), and (c) implementing the graphics using a simple 2D avatar. Using a mix of several coding assistants, I built a basic version in the time I had. The avatar could move subtly and blink, but otherwise it used basic graphics. Even though it fell far short of a sophisticated audience simulator, I am glad I built this. In addition to moving the project forward and letting me explore different designs, it advanced my knowledge of basic graphics. Further, having this crude prototype to show friends helped me get user feedback that shaped my views on the product idea. I have on my laptop a list of ideas of things that I think would be interesting to build. Most of them would take much longer than the handful of hours I might have to try something on a given day, but by cutting their scope, I can get going, and the initial progress on a project helps me decide if it’s worth further investment. As a bonus, hacking on a wide variety of applications helps me practice a wide range of skills. But most importantly, this gets an idea out of my head and potentially in front of prospective users for feedback that lets the project move faster. [Original text: https://t.co/UP6arTWAdV ]

372

325K

126

Varun Vasudevan @DevanVarun

about 1 year ago

Most interesting section in the paper

Mehrdad Farajtabar @MFarajtabar

about 1 year ago

🧵 7/8 Result #4: Catastrophic failure on exact computation ⚠️ Even when we GAVE the solution algorithm (so they just need execute these steps!) to the reasoning models, they still failed at the SAME complexity points. This suggests fundamental limitations in symbolic manipulation, not just problem-solving strategy. Even more strange is the inconsistency of the search and computation capabilities across different environments and scales. For instance, Claude 3.7 (w. thinking) can correctly do ~100 moves of Tower of Hanoi near perfectly, but fails to explore more than "4 moves" in the River Crossing puzzle or fails earlier when puzzles scale and need longer solutions!

MFarajtabar's tweet photo. 🧵 7/8 Result #4: Catastrophic failure on exact computation ⚠️

Even when we GAVE the solution algorithm (so they just need execute these steps!) to the reasoning models, they still failed at the SAME complexity points. This suggests fundamental limitations in symbolic manipulation, not just problem-solving strategy.

Even more strange is the inconsistency of the search and computation capabilities across different environments and scales. For instance, Claude 3.7 (w. thinking) can correctly do ~100 moves of Tower of Hanoi near perfectly, but fails to explore more than "4 moves" in the River Crossing puzzle or fails earlier when puzzles scale and need longer solutions!

162

56K

Varun Vasudevan @DevanVarun

about 1 year ago

Tried @MLflow today. Incredibly easy to use!

MLflow

@MLflow

about 1 year ago

Have you heard the news? MLflow now supports tracking for @DSPyOSS optimization workflows—just like it does for #PyTorch training! With DSPy, you can automatically optimize your #LLM prompts using powerful optimizers, and now #MLflow is the first and only framework to bring full visibility and observability to that process. �� Whether you’re tuning prompts or training models, MLflow helps you see what’s happening under the hood, so you can iterate faster and build better. 🚀 Get started today: https://t.co/VOPfykFbAi 🔗 MLflow documentation: https://t.co/plJiuvkHoE Want to learn more❓ Don’t miss Chen Qian’s @Data_AI_Summit (June 9-12) session “Streamlining DSPy Development: Track, Debug, and Deploy With MLflow” to see how integrating MLflow with DSPy makes it easier than ever to iterate, understand, and productize your #GenAI workflows! 🙌 Learn more about this session ➡️ https://t.co/GP9JBq92yc #LLMops #PromptEngineering #OpenSourceAI #oss #DSPy

MLflow's tweet photo. Have you heard the news? MLflow now supports tracking for @DSPyOSS optimization workflows—just like it does for #PyTorch training!

With DSPy, you can automatically optimize your #LLM prompts using powerful optimizers, and now #MLflow is the first and only framework to bring full visibility and observability to that process. ��

Whether you’re tuning prompts or training models, MLflow helps you see what’s happening under the hood, so you can iterate faster and build better.

🚀 Get started today: https://t.co/VOPfykFbAi
🔗 MLflow documentation: https://t.co/plJiuvkHoE

Want to learn more❓ Don’t miss Chen Qian’s @Data_AI_Summit (June 9-12) session “Streamlining DSPy Development: Track, Debug, and Deploy With MLflow” to see how integrating MLflow with DSPy makes it easier than ever to iterate, understand, and productize your #GenAI workflows! 🙌

Learn more about this session ➡️ https://t.co/GP9JBq92yc

#LLMops #PromptEngineering #OpenSourceAI #oss #DSPy

126

108

10K

Varun Vasudevan

@DevanVarun

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users