Abhimanyu Hans

@ahans30

post training something pretrained (hopefully) | cs phd candidate advised by @tomgoldsteincs at @umdcs

Washington, DC

Joined July 2011

1.2K Following

253 Followers

216 Posts

Abhimanyu Hans @ahans30

about 1 month ago

@sytelus This paper is saying that data filtering is a compute efficiency problem & not a model quality problem. And it's worth saying imo. I think your qualm is that you don't buy the "high-compute, data-scarce" assumption under which they operate.

170

Abhimanyu Hans @ahans30

about 1 month ago

@OwainEvans_UK Yes, but I see you finetuned w/o any chat template; not sure if that's realistic for a deployed LLM (did the eval also not use it?). Does this result replicate w/ a chat template? That would be surprising (and more interesting!). But currently the model just acts like an unaligned base model. https://t.co/GsnWZpoRe2

ahans30 retweeted

Jonas Geiping

@jonasgeiping

about 1 month ago

We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.
In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …
🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.
Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.

171

158K

ahans30 retweeted

Hamid Kazemi

@hamid_kazemi22

about 1 month ago

🧵1/ A single neuron is sufficient to bypass safety alignment in LLMs. Across 7 models, 2 families, and scales from 1.7B to 70B, suppressing one MLP neuron bypasses refusal behavior — with no fine-tuning and no prompt engineering. We call them refusal neurons. We also study concept neurons: neurons that encode harmful knowledge itself. As a proof of concept, we identify suicide-related neurons. Our analysis reveals several interesting results⬇️ Joint work with @AtoosaChegini (equal contribution) , Maria Safi


460

370

31K


ahans30 retweeted

george hotz archive @geohotarchive

about 2 months ago

Positivity https://t.co/ArCoRhRrt7

384

118

52K

ahans30 retweeted

Connor Dilgren @ConnorDilgren

2 months ago

Excited to announce my first preprint in LM interpretability! Latent reasoning models are not monitorable by default, since they don't reason in human-readable, natural language text. But can we make progress in understanding their intermediate reasoning steps using mech interp? https://t.co/P9v3jPT45N

206

117

15K

ahans30 retweeted

Katherine Thai

@kthai1618

3 months ago

Open Source Pangram is out now! We have released the datasets, code, and two models based on our EditLens work on quantifying the extent of AI editing in texts. https://t.co/QMEmC1YX1T

179

54K

ahans30 retweeted

Tom Goldstein

@tomgoldsteincs

4 months ago

⛷️Here’s my entry for the fast generative model olympics🥇 The Sphere Encoder is an autoencoder so powerful that it produces high quality images quickly and without diffusion. At training time, we learn an encoder that maps natural images uniformly onto the surface of a sphere. At inference time, we sample a random vector from the sphere, and a decoder makes it into an image.


501

354

53K

Abhimanyu Hans @ahans30

6 months ago

@kthai1618 Congratulations Katherine!

202

ahans30 retweeted

Kaiyu Yue @kaiyuyue

10 months ago

🚀 Train Small, Run Big - Surrogate Training for Giant VLMs. Training a tiny 400M vision encoder that plugs into a 70B LLM – ✅ No billion $ GPU bills ✅ No endless fine‑tuning. Sounds like a free lunch? 🍱 Our #ICCV2025 paper shows it’s real with Zero‑Shot Grafting.📝 Paper: https://t.co/5A7hEc24pG 🧶 Thread ↓


ahans30 retweeted

Thao Nguyen @thao_nguyen26

10 months ago

We released 44B synthetic tokens from our CoT-guided rewriting, offering higher quality pretraining data than the average human-written web texts📈 🤗Data: https://t.co/FN6X1oFPNL 📜Paper: https://t.co/78Vu89UvuD (accepted at #COLM2025) Excited to see what the community builds!

219

147

20K

Abhimanyu Hans @ahans30

10 months ago

> always exist an input ... for which autoregressive models will break
that's a strong statement -- are you asserting models never truly generalize and never will?

406

Abhimanyu Hans @ahans30

10 months ago

@jachiam0 On the downside, it feels more like a realignment/consolidation of the oai model stack than a breakthrough foundation model. It doesn't seem like a model oai would have been proud to call gpt5 (say 2 years ago). The expectations (much of them invited) were/are huge. Not bad, but not otherworldly either.

Abhimanyu Hans @ahans30

10 months ago

@jachiam0 My first impression of gpt5 is that it's a great model, qualitatively different from gpt4o; it feels closer to o3 (like it's in the middle of user and o3). There are obv smart model picker dynamics being played out. Thinking mode is good too.

Abhimanyu Hans @ahans30

11 months ago

@max_spero_ lmao me neither

Abhimanyu Hans @ahans30

11 months ago

zoom bombing is lame guys, especially in 2025, especially in someone's PhD proposal talk
totally unrelated but guess who's a PhD candidate now 👀

Abhimanyu Hans @ahans30

11 months ago

@max_spero_ I keep my Zoom unlocked (not anymore) and someone anonymously blasted indecent stuff on screensharing. Ended up having a bit too happening talk lol (thanks!)

106

Abhimanyu Hans @ahans30

11 months ago

@max_spero_ @moultano not at all: https://t.co/fwK4skxZxP :) weirdly, I was in the market for actual literal binoculars, and it's true, those things have no upper limit.

ahans30 retweeted

Furong Huang

@furongh

11 months ago

There’s been heated debate lately: Can generative AI truly self-improve?
✅ Some say yes, pointing to models learning like curious humans.
❌ Others say no, invoking the first law of thermodynamics: You can’t get something from nothing. No new info, no gain.
🧠 But what if the right questions could be generated on demand? 🧠
🎯 Not static, not pre-written, but tailored to exactly what the model struggles with, right now.
🧑‍🎓 Humans improve because we seek out new material, questions, feedback, and curriculum, just beyond our current abilities.
🔥 So what if AI agents 🤖 had access to the same kind of dynamic challenge? 🔥
A limitless pool of evolving tasks and questions that
- 💡 scale with skill
- 🔍 that reveal blind spots
- 📚 that teach
That’s the vision behind MORSE-500:
🎥 A programmatically controllable video benchmark to stress-test and train multimodal reasoning.
🧠 Abstract 🔄 Temporal 📦 Spatial 📈 Planning ⚙️ Physical 🧩 Mathematical
Each instance is scripted with Python (Manim, Matplotlib, MoviePy), gen-AI models, and real footage.
📉 Unlike static benchmarks that models quickly outgrow,
🚀 MORSE-500 evolves.
It’s not just a benchmark --
🛝 it’s a reasoning simulator (**infinite training data!**) for next-gen AI.
📄 Paper: https://t.co/E5ALP9LjNr
🌐 Project: https://t.co/0gKjjcQTqf
#MORSE500 #VLM #MultimodalReasoning #AIResearch #SelfImprovingAI
✨ We believe this is just the first step toward building a playground to test the self-improvement potential of generative AI. If you are around at #ICML2025, let's talk!

12K
