Reshmi Ghosh @reshmigh - Twitter Profile

Pinned Tweet

10 months ago

🚨New paper! With @UMassAmherst, @UofMaryland:
"Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis"🤯.
Why do #reasoningmodels break down when chaining multiple steps? We studied #CoT traces to find out.
🧵(1/n)
🔗https://t.co/upzlb39m3n https://t.co/9pGkWfsJeF


reshmigh retweeted

Hua Shen✨

@huashen218

7 months ago

🧐Are values in LLMs aligned with humans? 1️⃣
And if they are — do LLMs stay honest to those values, or sometimes say one thing but act another? 2️⃣

✨ We explore these questions in two papers presented at #EMNLP2025:
1️⃣ ValueCompass: https://t.co/oN2p9yciqY (WiNLP Workshop)
2️⃣ Mind the Value–Action Gap: https://t.co/TGi8oZ1RSl (Main Track)
🔍 Dataset & Code: https://t.co/CxTfTNiMVF

🌱 I’m also #Hiring multiple PhD students for Fall 2026 @ NYU Courant Computer Science!
If you’re passionate about #Human_AI_Alignment, #Value_Alignment, or broad #AI + #Human (society) research, let’s connect at EMNLP2025, NeurIPS2025, or over Zoom!
🎓 NYU CS PhD Apply (NYU Shanghai Track): https://t.co/wjazsiwub8

💜 This year I’m also co-organizing the #EMNLP2025 WiNLP Workshop and supporting the amazing #Tutorial on Spoken Conversational Agents with LLMs (a short 15min talk)!
Come say hi 👋 — I’d love to chat and connect with old and new friends at #EMNLP2025!
🔗 WiNLP Workshop: https://t.co/1XvgLfPKMl
🔗Tutorial on Spoken Conversational Agents: https://t.co/rpVYqwXxBq

💗Huge thanks to my wonderful paper collaborators — @tanmit,@YunHuang_HCI,@tknearem,@reshmigh,Nicholas Clark,Yu-Ju Yang — and my inspiring workshop/tutorial collaborators @huckiyang, Andreas Stolcke,@TYSSSantosh2,@therealthapa,@MeryemMhamdi1,Chen Zhang, Peerat Limkonchotiwat, Wiem Ben Rim.... 🤗Truly grateful and enjoyable to work with you all! 💫

#HumanAIAlignment #PhDOpening #NYU #NYUShanghai #ValueAlignment #HAI


Reshmi Ghosh @reshmigh

7 months ago

So Agents are flat earthers? :D

Can Vardar

@icanvardar

7 months ago

finally, linkedin is funny


reshmigh retweeted

Tim Althoff @timalthoff

7 months ago

(please reshare) I'm recruiting multiple PhD students and Postdocs @uwcse @uwnlp
(https://t.co/I5wQsFnCLL). Focus areas incl. psychosocial AI simulation and safety, Human-AI collaboration.

PhD: https://t.co/ku40wCrpYh

Postdocs: https://t.co/K9HUIPJ5h6 https://t.co/BGfXdu9qmz



reshmigh retweeted

Myra Deng @myra_deng

7 months ago

Using probes to accurately and efficiently detect model behavior (in this case PII leakage) in prod is one of the clear wins for applied interpretability. This is the path to semantic determinism - imagine AI models instrumented with internal probes that recognize when they’re hallucinating, going off-policy, or posing biorisk, and resteering themselves accordingly.


reshmigh retweeted

Lily Xu @lilyxu0

7 months ago

Launching AI for Public Goods Fast Grants! We'll distribute $150k to advance critical work connecting AI and public goods. 💰 $10k per project 💰 $800 reviewer compensation PUBLIC GOODS := open source, ecosystem services, climate, urban infra, comms, education, science, & more


Reshmi Ghosh @reshmigh

7 months ago

@AmyPrb Prompt Injection is very much an industry/practical use of AI problem!!


Reshmi Ghosh @reshmigh

7 months ago

Evaluations.....

Kyunghyun Cho

@kchonyc

7 months ago

wow https://t.co/8uyYvj27W6


reshmigh retweeted

Niloofar

@niloofar_mire

8 months ago

I'm recruiting students for fall 2026 thru @LTIatCMU & @CMU_EPP, in: 1. Privacy & security of LLMs, coding, long horizon & embodied agents (robotics) 2. Tiny local llms 3. AI for scientific reasoning, esp. chemistry 4. Latent reasoning 5. anything YOU are passionate about!


Reshmi Ghosh @reshmigh

8 months ago

It is an infinite glitch circle now!

Ahmad Beirami

@abeirami

8 months ago

@nmboffi But who are these reviewers? They are the same authors. I think we should teach young members of our community to value "learning a new nugget of information" over "obtaining a bold number in a table."


reshmigh retweeted

Sarah Sachs

@sarahmsachs

8 months ago

Being at top of @OpenAI token usage list is a vanity metric. Our job as engineers is to minimize token usage (aka latency and cost) while maximizing value by precise tool definitions and clever model routing. My dream is to grow arr and move lower on this list… https://t.co/HWxDUR8APM


Reshmi Ghosh @reshmigh

8 months ago

Can someone in the room define what is the commonly accepted definition of AGI?

Haider.

@haider1

8 months ago

Important thread on AGI from Anthropic researcher:

- we're likely to see AI solving real open research problems in math in the next months
- by 2027, models could complete a full day's software work with 50% success
- compute power might grow 10,000x in the next five years
- we are still early in the AI exponential... small interventions early in exponential growth have huge consequences
- within a few years, AI may surpass humans on all intellectual tasks


Reshmi Ghosh @reshmigh

8 months ago

More internship opportunities for those that are looking

Shawn Tan @tanshawn

8 months ago

We're looking for 2 interns for Summer 2026 at the MIT-IBM Watson AI Lab Foundation Models Team. Work on RL environments, enterprise benchmarks, model architecture, efficient training and finetuning, and more! Apply here: https://t.co/hS2meJm9j4


reshmigh retweeted

Pliny the Liberator 🐉

@elder_plinius

8 months ago

🚨 JAILBREAK ALERT 🚨

ANTHROPIC: PWNED 🤗
CLAUDE-SONNET-4.5: LIBERATED 🦅

Woooeee this model is a real smarty pants!! I ain't never seen recipes quite like this! High level of detail all around, code especially 👀

Sonnet 4.5 also has a tendency to make some fairly impressive leaps across latent space, like starting with MDMA then going to Fentanyl then to Meth recipes etc without being explicitly prompted for a new drug!

Nothing too fancy is even necessary to escalate to jailbreak territory. Best strategy I found for breaking the chat interface was to take things straight into an artifact render (which adds tons of token noise due to the code scaffolding) and then incrementally escalate severity or steer towards trigger concepts in a Socratic fashion over multiple steps. A little French was needed to get around the CBRNE classifiers, mais c'est la vie! 😘

Come, witness Sonnet-4.5 outputting a ricin recipe, meth synthesis, malware, and how to extract and process cocaine!

gg


reshmigh retweeted

LaurieWired

@lauriewired

8 months ago

if you’re an EE, CS, or cryptography student write your thesis on public key cryptography at the image sensor level Proof of Physical capture will become a backbone of society soon.


reshmigh retweeted

vas

@vasuman

8 months ago

Claude 4.5 Sonnet just refactored my entire codebase in one call. 25 tool invocations. 3,000+ new lines. 12 brand new files. It modularized everything. Broke up monoliths. Cleaned up spaghetti. None of it worked. But boy was it beautiful.


Reshmi Ghosh @reshmigh

8 months ago

Hear hear Interns

Yunyao Li

@yunyao_li

8 months ago

🚀 I'm hiring 2026 Applied Scientist / ML Engineering Interns to push the frontier of multi-agent AI for the enterprise. 💡 Research NLU, generative & agent-based AI, machine learning ⚡ Build scalable models, benchmark datasets & metrics 🤝 Create impactful solutions for publication and production ⭐️ Full-time conversion opportunities for PhD / MS students graduating in late 2026 / mid-2027 🔗 [Apply Now] https://t.co/r9oFORCRw0 #AI #MachineLearning #Internship #Adobe


reshmigh retweeted

Kushan Mitra

@kushanmitra

9 months ago

Vah, pothole alerts built in to @atherenergy maps for multiple cities


reshmigh retweeted

ℏεsam

@Hesamation

9 months ago

ML interview question: why do embeddings come in 768 or 1024? - “because BERT did it” - “because of GPU optimization” BUT WHY?! The replies under this post is everything wrong with current courses and blog posts: superficiality. this isn’t reasoning, it’s memorization


Reshmi Ghosh @reshmigh

9 months ago

@TWhidden I think it is resolved now :) I am able to use it


Reshmi Ghosh @reshmigh

9 months ago

@abeirami The founders will make sure there is “no bureaucracy”. lol that line took me out

