Aashu Singh @iam_aashusingh - Twitter Profile

@marksaroufim

19 days ago

It was an honor to give the keynote at MLSys
Covered how AI systems have evolved, why AI is needed to improve them, why results have disappointed, why the future looks amazing, and why I’m working on this at Core Auto
Recording should be out soon, in the meantime slides https://t.co/5pbyUHTAVC

15

446

44

298

66K

iam_aashusingh retweeted

Lucas Maes

@lucasmaes_

3 months ago

➡️ More experiments, details, and visualizations can be found in our paper! Work co-led with @quentinlldc. Huge thanks to our collaborators Damien Scieur, @ylecun, and @randall_balestr for their help, guidance, and support! 🙏 Paper: https://t.co/6AUEe8xPHF

1

73

7

15

10K

iam_aashusingh retweeted

AK

@_akhaliq

3 months ago

Xray-Visual Models: Scaling Vision models on Industry Scale Data https://t.co/vdPaF4hxhw

2

42

9

17

15K

iam_aashusingh retweeted

Simo Ryu

@cloneofsimo

10 months ago

@giffmana @laurence_ai @TheGregYang Shameless plug: there is this outdated blog https://t.co/rJg5e2x1Al that you gave some input on, actually lol

3

60

2

67

4K

iam_aashusingh retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

12 months ago

a good set of tips for GRPO RL training in @willccbb's verifiers repo

8

588

59

733

45K

iam_aashusingh retweeted

Jonathan Whitaker

@johnowhitaker

12 months ago

New video, starting to look at Diffusion Language Models. This one introduces some ideas, then shows how I turn ModernBERT into a LLaDA-style generative model. Lots of avenues to explore from here! Join me in playing with this? Project ideas in thread :) https://t.co/OgzgwHEa2t

7

361

52

328

52K

iam_aashusingh retweeted

Tri Dao

@tri_dao

about 1 year ago

I love Cutlass, and this new Python DSL looks very well-designed. Will for sure accelerate kernel dev + exploring new ideas in ML + GPU. I'm already playing with it and having fun

4

224

23

66

18K

iam_aashusingh retweeted

Richard Liaw

@richliaw

about 1 year ago

We’re also releasing the SkyAgent-v0 models which achieve promising results on SWE-Bench-Verified across model lines. Check it out! Blog: https://t.co/y2uwd8MF7P Model Collection: https://t.co/IrtiSNYiQX Github: https://t.co/CDXGHvAnS4 3/N

1

34

5

18

3K

iam_aashusingh retweeted

Logan Kilpatrick

@OfficialLoganK

about 1 year ago

A deep conversation with @SavinovNikolay, the Gemini long context pre-training co-lead… We go from the basics to what is needed to scale to infinite context to long context best practices for devs:

52

1K

86

885

252K

Aashu Singh @iam_aashusingh

about 1 year ago

Thrilled to share our new paper: MetaQueries! We've created a novel approach that bridges MM-LLMs and diffusion models using learnable queries. The method enables knowledge-augmented image generation while preserving SOTA understanding capabilities.

Xichen Pan @xichen_pan

about 1 year ago

We find training unified multimodal understanding and generation models is so easy, you do not need to tune MLLMs at all.
MLLM's knowledge/reasoning/in-context learning can be transferred from multimodal understanding (text output) to generation (pixel output) even if it is FROZEN! https://t.co/PK54eWrFpx

9

409

67

299

71K

0

1

0

156

iam_aashusingh retweeted

Russ Salakhutdinov

@rsalakhu

about 1 year ago

Llama4 models are out! Open sourced! Check them out: “Native multimodality, mixture-of-experts models, super long context windows, step changes in performance, and unparalleled efficiency. All in easy-to-deploy sizes custom fit for how you want to use it” https://t.co/sxlAKuymkR

4

153

19

25

29K

iam_aashusingh retweeted

Sebastian Raschka

@rasbt

about 1 year ago

Pretty cool "Multi-Head Attention Shape Transformations (Cheat Sheet)" shared by a reader: https://t.co/9Nprk4XHgJ

5

604

86

507

31K

iam_aashusingh retweeted

Steven Feng

@stevenyfeng

about 1 year ago

We are bringing back Stanford’s CS 25 Transformers Course (https://t.co/Yvq4AcLiBV) today! It’s open to everybody!

This is one of @Stanford's hottest seminar courses. We open the course through Zoom to the public. Lectures start today (Tuesdays), 3-4:20pm PDT, at https://t.co/hhHwpf9L7h. Talks will be recorded and released ~2 weeks afterward.

Each week, we invite folks at the forefront of Transformers research to discuss the latest breakthroughs, from LLM architectures like GPT and Gemini to creative use cases in generating art (e.g. DALL-E and Sora), biology and neuroscience applications, robotics, and so forth!

Past speakers have included folks from @OpenAI, @GoogleDeepMind, @nvidia, @Meta, @AnthropicAI, etc. such as @karpathy, @geoffreyhinton, @DrJimFan, @ashVaswani, @_jasonwei, @hwchung27, @xiao_ted, @janleike, @YejinChoinka, @douwekiela, and many more! [Attached photos with some of them😎]

Our class has an incredibly popular reception within and outside Stanford, and over a million total views of our recordings [https://t.co/Eb4hKTZrbB] on YouTube. Our class with @karpathy was the second most popular YouTube video [https://t.co/vVMNYsKsEx] uploaded by Stanford in 2023 with over 750k views!

Also, livestreaming and auditing are available to all. Feel free to audit in person or by joining the Zoom livestream.

We also have a Discord server [https://t.co/vlDVm30x5F] (over 5000 members) used for Transformers discussion. We open it to the public as more of a "Transformers community". Feel free to join and chat with hundreds of others about Transformers!

Thanks to my co-instructors @DivGarg9 @_KaranPS_ @boson2photon Jenny Duan and the course's faculty advisor @chrmanning!

More details: https://t.co/Yvq4AcLiBV

@StanfordAILab @stanfordnlp @StanfordHAI @agihouse_org

#AI #ArtificialIntelligence #ML #DeepLearning #NLP #NLProc #Transformers #Stanford #Education #Innovation #TechEd #Community #naturallanguageprocessing

1

438

87

417

43K

iam_aashusingh retweeted

Sean Welleck

@wellecks

about 1 year ago

Lecture 15: Quantization (Guest lecture by @Tim_Dettmers) https://t.co/jaB5RHUwdU
- Quantization basics
- Quantized foundation models: LLM.int8()
- Finetuning foundation models: QLoRA
- Quantization and users

3

486

60

363

58K

iam_aashusingh retweeted

Xin Eric Wang ✈️ CVPR 2026

@xwang_lk

about 1 year ago

Since launching Agent S2, many folks working on GUI/computer-use agents asked for our tech report. Here we go! 🎉New SOTA on 3 major computer use benchmarks.

• OSWorld (15 steps): 27.0% 🚀 (+18.9%)
• OSWorld (50 steps): 34.5% 🚀 (+32.7%)
• WindowsAgentArena: 29.8% 🚀 (+52.8%)
• AndroidWorld: 54.3% 🚀 (+16.5%)

We strive for simple solutions that work best.
Agent S focused on Memory; S2 crushes Grounding & Planning. Bigger things ahead—stay tuned!

7

201

39

151

49K

iam_aashusingh retweeted

Thomas Wolf

@Thom_Wolf

about 1 year ago

Blog post: https://t.co/K4TUOi77Hn Model: https://t.co/6xL1azFX3X

0

30

4

25

6K

iam_aashusingh retweeted

Jason Weston

@jaseweston

about 1 year ago

🚨Multi-Token Attention🚨
📝: https://t.co/79YMoUsGJD

Attention is critical for LLMs, but its weights are computed by single query & key vectors, limiting capability.

MTA combines query, key & head operations over multiple tokens, improving performance in terms of PPL, std benchmarks, and long-range tasks.

NOTE: this isn't an April Fool, this is a real paper🏛️👩‍⚖️💯

1

771

140

551

98K

Aashu Singh @iam_aashusingh

about 1 year ago

Interesting paper: Video-R1 improves temporal reasoning in MM LLMs using T-GRPO, a variant of GRPO, and high-quality curated data for SFT. Here's a summary: https://t.co/YLyZqsizIJ Original paper: https://t.co/WVcX0rlFfY

0

23

iam_aashusingh retweeted

Vivek Galatage

@vivekgalatage

over 1 year ago

🎨 Understanding GPU Architecture from Cornell

This GPU architecture roadmap is a good starting point for diving deeper, along with the CUDA C++ programming guide PDF - both freely available from Cornell and NVIDIA.

9

1K

229

2K

188K

iam_aashusingh retweeted

Kevin Patrick Murphy

@sirbayes

over 1 year ago

I read the R1 zero paper and the method is very simple, just a tweak to PPO to fine-tune DeepSeek-V3 base using a verifiable sparse binary reward. The fact that they got it to work even though others failed is likely due to better data and/or their very efficient implementation

14

452

38

240

60K
