Berkeley AI Research @Berkeley_AI - Twitter Profile

about 19 hours ago

Visual language models (VLMs) are surprisingly bad at comparative visual reasoning: the detect-the-difference tasks needed in medicine and science. We just made VLMs stateful by post-training cross-attention between visual encoder layers. Our approach can be bolted onto existing frontier models.

berkeley_ai retweeted

A. Sophia Koepke ✈️ CVPR2026 @ASophiaKoepke

2 days ago

#CVPR2026 paper: "It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models". Text-to-image models often collapse to near-identical samples. Our fix: optimize the noise. Start from pink 🩷, not white noise. 🔗https://t.co/CVLKt6OJ5G 1/6

berkeley_ai retweeted

Yangzhen Wu @yangzhen04

2 days ago

Static benchmarks are dying — they tend to get saturated quickly. Evaluation and training data should co-evolve with frontier models. We released BenchEvolver — a framework that automatically evolves saturated problems into harder, verified tasks for evaluating frontier models, which can also serve as useful self-improvement signals for RL. New work from UC Berkeley @berkeley_ai @BerkeleyRDI @BerkeleySky Project Page: https://t.co/PL1KpGyd87 Paper: https://t.co/gBQOXrZbAV

berkeley_ai retweeted

Zirui "Colin" Wang @zwcolin

1 day ago

👀Humans compare images by looking back and forth. Many open-weight VLMs encode each image independently, and defer comparison to the LM. We introduce SVE: Stateful Visual Encoders for Vision-Language Models, where the visual encoder itself becomes change-aware. 🌐Project: https://t.co/P1ASxE5VBE 📰Paper: https://t.co/XnPbAF3Zr2 💻Code: https://t.co/TEX5T3SLmy 1/n

berkeley_ai retweeted

Sergey Levine @svlevine

1 day ago

If you are at CVPR 2026, I'll be giving one more talk tomorrow (Jun 4) in the ScaleBot workshop, room 610/612 1:30 pm. The topic: Scaling robot data makes it easier to scale robot data😃 Scaling data is important, and there is one weird trick to do it in robotics...

berkeley_ai retweeted

Noriaki Hirose @Noriaki_Hirose

4 days ago

We are excited to share our two papers at ICRA 2026! Today, we will present Learning to Drive Anywhere with Model-Based Reannotation from 15:00–16:30. https://t.co/r46gnlJHwu

berkeley_ai retweeted

Alison Gopnik @AlisonGopnik

8 days ago

Giving the MIT School of Science commencement address yesterday to an amazing group of present and future scientists. Why scientists are most like babies and grandparents, with some lessons in hope from the 18th-century Lunar Club. https://t.co/Ull8Kx4o0J

berkeley_ai retweeted

Mihran Miroyan @mirmiroyan

9 days ago

We release Recon — a new approach to reasoning synthesis for user modeling. The key insight: post-hoc rationalization ≠ reasoning. We propose using action reconstruction as a scoring criterion for synthesized reasoning traces, yielding more causally faithful reasoning and improved downstream action prediction across user modeling tasks. Paper and project page in 🧵

berkeley_ai retweeted

Shuo Yang @Andy_ShuoYang

10 days ago

Flash-KMeans was only the beginning. Today, from the Flash-KMeans team, we are releasing FlashLib — a GPU library for fast, predictable, agent-ready classical ML operators. Up to 26× on KMeans, 19× on KNN, 40× on HDBSCAN, 208× on TruncatedSVD, 47× on PCA, 147× on exact t-SNE, and 49× on MultinomialNB over state-of-the-art (cuML). Blog: https://t.co/P31SGl0cyT Code: https://t.co/9nkO2hmeOl

berkeley_ai retweeted

Ziming Mao @ziming_mao

11 days ago

🚀 Excited to release mKernel: a set of fast multi-node, multi-GPU fused kernels. 💻 Code: https://t.co/y2WfdMVTfC 📝 Blog: https://t.co/wGomxmeRxr mKernel fuses compute + communication into one persistent GPU kernel, covering both intra/inter-node with GPU-initiated communication. Amazing team: @yangzhouy, Chon Lam Lao, Costin Raiciu, Scott Shenker, @istoica05

berkeley_ai retweeted

Ahmed Alaa @_ahmedmalaa

10 days ago

Last year, we wrote a position paper on the construct validity of medical LLM benchmarks (https://t.co/sGa6huy51A), i.e., datasets should reflect real-world data and workflows. We're excited to share a new dataset of 25K clinical notes, with the goal of improving the validity of evals.

berkeley_ai retweeted

Nikita Mehandru @nikita_mehandru

12 days ago

🩺Medical benchmarks measure if LLMs get the correct final diagnosis. True clinical reasoning requires sequential belief updating: does the model revise its beliefs appropriately as new evidence appears? New preprint: https://t.co/mtAkQQEbUG https://t.co/MsvyJXtjsi

berkeley_ai retweeted

Ken Goldberg @Ken_Goldberg

12 days ago

We'll present 6 papers @IEEEorg #ICRA2026 on topics including robot cable routing, surgical suturing, and fine art painting. All now available here: https://t.co/tarWRBnt4D

berkeley_ai retweeted

Melissa Pan @melissapan

12 days ago

Excited to share that MAP has been selected for an ✨ICML Oral✨ We look forward to sharing the insights in the paper with the community. Many thanks to everyone who participated in our study ❤️ MAP wouldn't have been possible without your contributions to open science. https://t.co/7jx6s5ySVT

berkeley_ai retweeted

Angjoo Kanazawa @akanazawa

14 days ago

Babies learn by being naturally curious. How do we get autonomous agents to do the same? We revisited curiosity in 3D exploration and found that memory is key. This project taught me a lot about what kind of functions an agent and a "world model" need to have for this direction

berkeley_ai retweeted

Lily Goli @lily_goli

15 days ago

🚀 🚀 🚀 Excited to share our new paper, "Remember to be Curious: Episodic Context and Persistent Worlds for 3D Exploration". What does it take for an agent to stay curious in a 3D world? The answer is memory. 🌐 Project: https://t.co/G4SjLoFJht 📄 Paper: https://t.co/iUFwp5NvRu 💻 Code: https://t.co/KZRaQLyzyh

berkeley_ai retweeted

Lakshya A Agrawal @LakshyAAAgrawal

16 days ago

Our paper on optimize_anything has been accepted to CAIS 2026 and is out on arXiv with expanded experiments and details! A unified API to optimize agents (with architecture), CUDA kernels, cloud scheduling policies, or even graphics! https://t.co/HlWwS77skg https://t.co/voWDevNW3p

berkeley_ai retweeted

Dawn Song @dawnsongtweets

15 days ago

1/ Can AI agents turn security vulnerabilities into real attacks? This is one of the most critical tasks for measuring the impact of frontier AI on cybersecurity. In ExploitGym, we find that autonomous exploitation is no longer hypothetical, even on complex targets such as browser engines and the Linux kernel. How we measured this⬇️

berkeley_ai retweeted

Giuseppe Loianno @loiannog

17 days ago

RAPTOR, our new tiny foundation policy for quadrotors, has just appeared in @SciRobotics! A single compact policy that adapts in milliseconds across different quadrotors and autopilots, flies zero-shot with no fine-tuning, and was tested simultaneously on multiple platforms!

berkeley_ai retweeted

Yichuan Wang @YichuanM

18 days ago

LEANN just won the Best Paper Award at #MLSys26 🥹 still processing this. paper: https://t.co/k3qS1V5156 repo: https://t.co/QwkYx1t0oa huge thanks to all the amazing collaborators, advisors, and open-source contributors who made this possible ❤️ https://t.co/FPzFnZQLVW
