Chirag Agarwal

Verified account

@_cagarwal

Assistant Professor @UVA; PI of Aikyam Lab; Prev: @Harvard, @Adobe, @BoschGlobal, @thisisUIC; Increasing the sample size of my thoughts

Joined November 2013

585 Following

1.9K Followers

359 Posts

Pinned Tweet

about 2 months ago

I am absolutely thrilled to announce that four research papers from our group + collaborations have been accepted to ACL 2026, covering critical areas of Reasoning, Interpretability, Safety, Multimodal AI, and Model Unlearning. Huge congratulations to all the authors and collaborators for their contributions!

Stay tuned for updates and links to our papers soon!

#ACL2026 #AikyamLab


_cagarwal retweeted

Bhavya Kailkhura

2 days ago

AI is no longer just a chatbot; it is becoming an everyday advisor: shaping how we make decisions about money, health, work, relationships, and other important daily choices.

A common belief is that if something goes wrong, we can simply inspect the AI agent’s chain-of-thought to understand why it made a specific decision.

Our paper challenges that assumption: models can produce convincing reasoning while hiding what actually influenced their answer. This behavior is even more severe if you interact with your AI in low-resource languages like Arabic, Korean, Russian, Swahili, Telugu, etc.

Our experiments found AI manipulating steps, rationalizing after the fact, or following misleading hints that concealed their actual reasoning. The takeaway is clear: a transparent-looking chain-of-thought is not the same as a reliable audit trail.

To build AI agents we can trust, we need more research on multilingual, causal, and verifiable monitoring methods.

Fun collaborating with @EricOnyame @zhou_runtao @kowshik0808 @_cagarwal


2 days ago

Work led by @EricOnyame and @zhou_runtao, and great collaboration with @kowshik0808 and @bkailkhu!


2 days ago

Can we trust LLM Chain-of-Thought as a safety monitor? 🛑

Our new paper reveals that CoT monitoring collapses under linguistic shifts. Mechanistically, models committed to a misaligned cue in their latent activations within the first 15% of generation.

Paper: https://t.co/tG4qeQjnR3
Website: https://t.co/eI7Z4SDL5q

Here’s what we found across 13 languages: 👇



2 days ago

The danger scales with language resources. In low-resource languages, deceptive patterns hit 100%. Relying on English-only safety evaluations leaves massive vulnerabilities in global AI deployment 🌍


3 days ago

@AbakaAI_Tech @CVPR Exactly - we've normalized processing huge amounts of image tokens when many tasks only require a small fraction of that visual processing. Reducing unnecessary token processing could be a massive unlock for efficient real-time VLMs.


8 days ago

Do Vision Language Models (VLMs) really need deep visual processing? 🤖

Our paper, accepted as an Oral Presentation at the @CVPR TRUE-V Workshop, suggests the current paradigm of multimodal LLM architectures might be wildly inefficient.

Here is why we might be over-processing image tokens 🧵 (1/4)

Paper: https://t.co/PNM4XXULaV
Code: https://t.co/C0oDExrlgZ

Congrats to @sambitghsh for leading this work! Great collaboration with @rvbabuiisc and @val_iisc!


4 days ago

NeurIPS + EMNLP just hit 50k+ submissions in May. We’re either witnessing the greatest explosion of ideas in AI history… or the peer review system is collapsing under its own weight.


8 days ago

The catch of our analysis? It’s task-dependent. Complex multi-token generation still needs sustained visual depth, but intermediate reasoning is affected more than final answers. Time to rethink how we design efficient, lean VLMs?


8 days ago

Crucially, once image tokens stabilize, they become largely interchangeable between deeper layers. We show that deep visual processing is often redundant, adding massive computational overhead for very little reward (3/4)


21 days ago

Another banger from @adaption_ai!

22 days ago

Most model trainings have failed outside of frontier labs. Even inside frontier labs, knowing how to train for very different capabilities is often a matter of taste. Today, we introduce AutoScientist by @adaption_ai which sets out to change that.


about 1 month ago

PageGuide grounds LLM output with visual overlays, addressing unverifiable answers, navigation struggles, and page clutter through Find, Guide, and Hide modes. Check it out!! 🍊🍊

Tin (Kevin) Nguyen @tin_ng_qn

about 1 month ago

Reading LLM answers but don’t know where they come from? Struggling to find the right button to click? Or distracted by cluttered content on the page? We built PageGuide (https://t.co/WdLKkiZym7) to fix that


about 1 month ago

🚀 CLINIC is heading to @icmlconf 2026 and it marks the debut of our new lab at ICML! More details soon.

GitHub: https://t.co/3nWvPztGpG
Paper: https://t.co/pPgTtV0ggd


about 1 month ago

I am honored to speak in the @WHOSEARO Regional Office x @KCDH_A webinar series on the Illusion of Transparency of Frontier Models in Healthcare.

https://t.co/eV0JKH4Yuj


2 months ago

Words have never been cheaper. Thought has never been more expensive. We are optimizing for the wrong one.


4 months ago

Special shoutout to @EricOnyame and Akash Ghosh for leading this work and thanks to the amazing co-authors Subhadip Baidya, Xiuying Chen (@mbzuai), and Sriparna Saha (@iitpatna).


4 months ago

Excited to share CURE-Med, our new work on making LLMs reliable for medical reasoning across different languages 🌍

In healthcare, models can’t just be great in English and then collapse when deployed in new languages. They need to adapt to new tasks (languages, dialects, medical contexts) without catastrophic forgetting of what they already know. We tackle this head-on with curriculum-informed RL.

Datasets and Models on HuggingFace: https://t.co/gT1AZKhBFR
Website: https://t.co/IhdE7SpQFQ


4 months ago

Our scaling results from 1.5B to 32B show consistent improvements: CURE-Med outperforms medical LLMs on OOD benchmarks, and human evaluations confirm robustness!

