Sourabh Daptardar @s_daptardar - Twitter Profile

Pinned Tweet

6 months ago

🎄☃️ Merry Christmas and Happy Holidays everyone! 🎅🎁 A small gift from my team - an open source framework for building enterprise Gen AI applications ✨ ⭐ https://t.co/VAAT3qOFG1 👍 https://t.co/yHYrkigfbd

0

1

24

s_daptardar retweeted

Hugging Models

@HuggingModels

1 day ago

Imagine running a massive GLM-5 model on consumer hardware. That's what nvidia's GLM-5.2-NVFP4 delivers with 4-bit FP4 quantization. It's a game changer for local AI, making high-end text generation accessible to more builders. #AI #MachineLearning

https://t.co/PQwn2wU00m

22

458

48

376

51K

s_daptardar retweeted

Mustafa Suleyman

@mustafasuleyman

about 1 month ago

Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.

First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.

Next is MAI-Image-2.5 and its Flash variant. Two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.

Last for now is MAI-Code-1-Flash, our new inference efficient coding model, especially tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.

All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.

Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality, while being 10x lower on cost.

Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.

Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more and about our other new models in our latest blog: https://t.co/v65eop5Ixq

192

4K

541

1K

1M

s_daptardar retweeted

Jia-Bin Huang

@jbhuang0604

29 days ago

Hi friends at #CVPR2026! 👋 Please come check out our research today! @YaoChihLee will showcase a super-fun video motion editing work - Edit-by-Track (#378)! https://t.co/CPXkjHbEkY

2

124

9

26

14K

s_daptardar retweeted

Arjun Virk

@virkvarjun

30 days ago

I just spent months handwriting a 200 page guide on the entirety of ML foundations and math from scratch.

The guide features:
- Neural Nets (Backprop, Adam, SGD, Batch Norm)
- ML Algorithms (SVM, Grad Boosting, K-means, PCA)
- Hardware (Tensor Cores, Systolic Arrays, CUDA)
- Transformers (Multi-Head Attn, KV Cache, LoRA)
- Vision (ViT, Convolutions, MAE, IoU, NMS, VLM)
- Agents (OpenClaw, ReAct, Memory, Orchestration)

Everything I wish I had years ago, for free.

144

3K

338

5K

282K

s_daptardar retweeted

Lucas Beyer (bl16)

@giffmana

28 days ago

@zhaisf @geoffreyhinton We have a modern version of the experiment (mnist is tricky: almost everything looks good on it), and i believe a better explanation than "dark knowledge" in our paper: https://t.co/3SlkXVZcG3

2

182

7

138

9K

s_daptardar retweeted

Andrej Karpathy

@karpathy

about 2 months ago

Personal update: I've joined Anthropic. I think the next few years at the frontier of LLMs will be especially formative. I am very excited to join the team here and get back to R&D. I remain deeply passionate about education and plan to resume my work on it in time.

8K

150K

11K

14K

28M

s_daptardar retweeted

ICLR @iclr_conf

2 months ago

#ICLR2026 Test of Time Award Talk on "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks"

https://t.co/4J6UqUpS5R

3

276

29

38

32K

s_daptardar retweeted

Kimi.ai @Kimi_Moonshot

3 months ago

Zhilin at GTC: Introducing Attention Residuals

Learning selective memory, rather than mechanically accumulating everything, is the beauty of attention.

Many of you have probably read Attention Is All You Need, the 2017 Transformer paper that brought “human-like” attention into the model’s field of view. From that point on, models no longer simply read everything in a mechanical way. Instead, they began to develop a sense of what matters more and what matters less across the text, choosing to retain the more important information.

Recently, Kimi applied this idea of attention to the temporal dimension, then rotated it 90 degrees into the model’s depth dimension. This allows the model to have attention not only over time, but also throughout the process of information transmission across layers—giving it a more intelligent way to understand and process information.

49

1K

155

502

115K

s_daptardar retweeted

Unsloth AI

@UnslothAI

3 months ago

You can now train Qwen3.5 with RL in our free notebook!

You just need 8GB VRAM to RL Qwen3.5-2B locally!

Qwen3.5 will learn to solve math problems autonomously via vision GRPO.

RL Guide: https://t.co/iR9AF3BIFu
GitHub: https://t.co/aZWYAtakBP

Qwen3-4B: https://t.co/OzzCLFkSoW

https://t.co/mEm6DbWYl2

30

3K

394

3K

407K

s_daptardar retweeted

Y Combinator

@ycombinator

5 months ago

Today, startups aren't winning by hiring faster, but by automating as many internal functions as possible. In this episode of Main Function, @garrytan breaks down how tiny teams are beating companies 20x their size by building automations into every workflow, from engineering to ops to customer support.

71

1K

105

955

198K

s_daptardar retweeted

Thariq

@trq212

4 months ago

https://t.co/45C3gKydTK

388

16K

2K

44K

7M

s_daptardar retweeted

Dwarkesh Patel

@dwarkesh_sp

4 months ago

If AI scientists are writing millions of papers, many of which are slop, and some of which are incremental progress, how would we identify the one or two which come up with an extremely productive new idea?

In 1948, Shannon was one of hundreds of engineers at Bell Labs working on how to cleanly send voice signals over noisy copper wires. His paper sat in the same technical journal as reports on reducing static and building better filters. How would you recognize that he has come up with this very general framework for thinking about information and communication channels, which over the coming decades would have enormous use from domains as far apart as cryptography to genetics to quantum mechanics?

It seems like it can take fields multiple decades to recognize the significance of unifying new concepts. Because it is on that time scale that the fruits of such general concepts lead to new discoveries across many different fields.

We’ve managed to solve this peer review problem for human scientists (at least somewhat). Now we’ll need to do it at a much greater scale for the mass of AI science that will be thrown at us.

98

2K

223

819

297K

s_daptardar retweeted

Boris Cherny

@bcherny

6 months ago

I'm Boris and I created Claude Code. Lots of people have asked how I use Claude Code, so I wanted to show off my setup a bit. My setup might be surprisingly vanilla! Claude Code works great out of the box, so I personally don't customize it much. There is no one correct way to use Claude Code: we intentionally build it in a way that you can use it, customize it, and hack it however you like. Each person on the Claude Code team uses it very differently. So, here goes.

1K

55K

7K

104K

8M

s_daptardar retweeted

clem 🤗

@ClementDelangue

4 months ago

Nvidia just crossed Google as the biggest org on @huggingface with 3,881 team members on the hub. I'm officially calling it: Nvidia is the new American king of open-source AI!

https://t.co/5btj2QpLV4

49

826

90

80

137K

s_daptardar retweeted

Dylan Patel

@dylan522p

4 months ago

Jensen name-dropped me in the keynote and posed with our belt. He has a physical belt too but they just showed the pic.

Initially I made fun of the 35X perf improvement being bogus; I thought it was an exaggeration of performance.

Turns out he was sandbagging, and perf is 50x.

64

2K

68

412

178K

s_daptardar retweeted

OpenRouter

@OpenRouter

4 months ago

Two new Stealth Models are live now!

- Hunter Alpha: 1T-parameter model with 1M context built for agentic workflows, long-horizon tasks, and serious tool use.

- Healer Alpha: multimodal model combining strong image, video, and audio understanding with real agentic execution.

https://t.co/vR2urIQ28w

111

2K

142

1K

983K

s_daptardar retweeted

Andrej Karpathy

@karpathy

4 months ago

Thank you Jensen and NVIDIA! She’s a real beauty! I was told I’d be getting a secret gift, with a hint that it requires 20 amps. (So I knew it had to be good). She’ll make for a beautiful, spacious home for my Dobby the House Elf claw, among lots of other tinkering, thank you!!

525

19K

822

2K

1M

s_daptardar retweeted

Albert Gu

@_albertgu

4 months ago

The newest model in the Mamba series is finally here 🐍

Hybrid models have become increasingly popular, raising the importance of designing the next generation of linear models.

We've introduced several SSM-centric ideas to significantly increase Mamba-2's modeling capabilities without compromising on speed. The resulting Mamba-3 model has noticeable performance gains over the most popular previous linear models (such as Mamba-2 and Gated DeltaNet) at all sizes.

This is the first Mamba that was student led: all credit to @aakash_lahoti @kevinyli_ @_berlinchen @caitWW9, and of course @tri_dao!

41

2K

311

838

448K

s_daptardar retweeted

Sebastian Raschka

@rasbt

4 months ago

Oh wow, Mamba-3 is here! For me, the most interesting use case of Mamba and Mamba-likes are the recent transformer attention hybrid architectures (Qwen3.5, Kimi Linear, etc.) Would be interesting to swap Gated DeltaNet with Mamba-3 (which now also has RoPE) in next gen hybrids.

https://t.co/wcTg0uZ3gg

23

986

136

471

76K
