Ahmad Farhan @_afarhan_ - Twitter Profile

Pinned Tweet

Ahmad Farhan

@_afarhan_

4 months ago

Chris Motionless optimising my life’s loss function one scream at a time. #MIW

0

4

0

128

_afarhan_ retweeted

Mathieu

@miniapeur

5 days ago

1

123

14

8

3K

_afarhan_ retweeted

Mahir 🇹🇷🇬🇧

@ScrewderiaF1

3 days ago

Paraguay after scoring a goal

306

22K

2K

1K

719K

_afarhan_ retweeted

Alex Weers

@a_weers

4 days ago

Today’s reads are about why self-distillation sometimes might not work and how to fix it

10

208

9

157

30K

_afarhan_ retweeted

Simon Weber

@SimWeberTUM

7 days ago

What if attention wasn't about matching tokens, but operating in function space? Glad to share our #ICML2026 paper: 📄 Functional Attention: From Pairwise Affinities to Functional Correspondences w/ @Jiefang_Xiao @GaoMaolin @stevenygd Daniel Cremers 📄 https://t.co/rhn9NtwrBm

12

1K

139

865

48K

_afarhan_ retweeted

Red Hat AI

@RedHat_AI

8 days ago

Gemma 4 Diffusion landed in vLLM last week. Day 0. First diffusion LLM natively supported in vLLM. Instead of one token at a time, it predicts 256 tokens at once and iteratively denoises them in parallel. Result: 1,000+ tokens per second at batch size 1 on a single H100. Built on Model Runner V2. @googlegemma

6

184

20

82

18K

_afarhan_ retweeted

Interesting things

@awkwardgoogle

8 days ago

Packing thousands of straws together basically creates a low-tech pixel screen. Each straw acts as an independent light pathway, perfectly mimicking how data channels work.

235

55K

4K

8K

4M

_afarhan_ retweeted

cargo short dad

@cargoshortdad64

7 days ago

BREAKING: Mistral reveals compute cluster for the upcoming Le Chaton Obése at 900T parameters. 50 Billion Blackwell equivalent, directly powered by the sun

32

2K

77

86

49K

_afarhan_ retweeted

Ahmad

@TheAhmadOsman

8 days ago

Prediction: a Fable 5 equivalent in open source is ~8 months away

111

729

25

43

81K

_afarhan_ retweeted

Grace Li

@grx_xce

8 days ago

BREAKING: Le Chaton Fat has fully saturated our benchmark. We are at a loss for words. In response, we are retiring Design Arena. Congratulations to the @MistralAI team, and thanks for putting us on vacation.

46

1K

55

111

92K

_afarhan_ retweeted

xlr8harder

@xlr8harder

8 days ago

he comes

36

2K

80

66

33K

_afarhan_ retweeted

fabian

@fabianstelzer

8 days ago

Rumour mill going crazy on this new mistral model

- Napoleon class model with >10T params
- smokes Mythos on VoltaireBench
- for safety reasons only outputs French language code

174

7K

266

597

356K

_afarhan_ retweeted

Roger Gilmour

@RogerGilmour13

9 days ago

@clashreport Sure

5

452

9

18

20K

_afarhan_ retweeted

Lisan al Gaib

@scaling01

9 days ago

the european mind is truly special

Mistral is very competitive with Gemini and GPT, wait (checking notes)
...
GPT-5.4-nano and Gemma 4 31B

63

2K

60

333

495K

_afarhan_ retweeted

eric

@ericzxchen

8 days ago

how to win hackathons 101

87

4K

100

583

482K

_afarhan_ retweeted

alphaXiv

@askalphaxiv

10 days ago

"MiniMax Sparse Attention" This paper from Minimax adds a tiny Index Branch to GQA that picks top k KV blocks per group, then runs exact softmax only on those blocks, making sparsity GPU native, with exp free TopK and KV outer sparse kernels. On a 109B multimodal MoE, it keeps dense GQA quality while cutting 1M context attention compute by 28.4x, with 14.2x prefill and 7.6x decode speedups.
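The idea in the tweet above (score KV blocks cheaply, keep the top k, then run exact softmax only on those blocks) can be illustrated with a toy sketch. This is not the paper's method: the block scoring rule here (query dotted with the mean key of each block) is a stand-in assumption for the Index Branch, and the function name `block_sparse_attention` and its parameters are hypothetical. It handles a single query and a single head in plain Python.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend from one query to the top_k highest-scoring KV blocks only."""
    n = len(keys)
    dim = len(q)
    # Split positions 0..n-1 into contiguous blocks of block_size.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    # ASSUMPTION: stand-in for the index step. Each block is scored by
    # q . mean(keys in block); the real Index Branch is not described here.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(dim)]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    idx = [i for b in chosen for i in blocks[b]]

    # Exact, numerically stable softmax restricted to the selected positions.
    scale = 1.0 / math.sqrt(dim)
    logits = [dot(q, keys[i]) * scale for i in idx]
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    out = [
        sum(w * values[i][d] for w, i in zip(weights, idx)) / total
        for d in range(len(values[0]))
    ]
    return out, chosen


q = [1.0, 0.0]
keys = [[0.0, 0.0], [0.0, 0.0], [5.0, 0.0], [5.0, 0.0]]
values = [[1.0], [1.0], [2.0], [4.0]]
sparse_out, sparse_blocks = block_sparse_attention(q, keys, values, block_size=2, top_k=1)
dense_out, dense_blocks = block_sparse_attention(q, keys, values, block_size=2, top_k=2)
```

With `top_k=1` only the second block is attended to, so the softmax runs over two keys instead of four. Setting `top_k` to the number of blocks recovers ordinary dense attention, which makes it easy to compare the two outputs.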

8

517

62

239

21K

_afarhan_ retweeted

Alpin

@AlpinDale

12 days ago

In perplexity measurements, FP8 is pretty much the same as FP16, but NVFP4 is noticeably worse. Although not by that much.

0

51

3

12

6K

_afarhan_ retweeted

NVIDIA AI

@NVIDIAAI

13 days ago

Congrats to @GoogleDeepMind on the launch of DiffusionGemma. The model generates 256 tokens in parallel per step, delivering 150+ TPS on DGX Spark, and 1,000+ TPS on a single H100.

We're supporting it from day one with:
• BF16 and NVFP4 checkpoints on @huggingface🤗
• Free GPU-accelerated endpoints on https://t.co/6T0R9P7EXS
• @vllm_project support with FP8 precision

Get started with DiffusionGemma on NVIDIA: https://t.co/vurk7GCQUs

37

1K

117

328

101K

_afarhan_ retweeted

Lisan al Gaib

@scaling01

12 days ago

Fable 5 refused 200 out of 200 ProgramBench tasks lmao

124

5K

184

337

412K

_afarhan_ retweeted

Google

@Google

13 days ago

Meet DiffusionGemma ⚡ Our latest experimental open model (Apache 2.0) that generates text up to 4x faster. Instead of predicting and typing just one word at a time like most language models, it drafts and refines entire blocks of text simultaneously. Here’s how it works 🧵 ↓

117

3K

378

922

239K

_afarhan_ retweeted

vLLM

@vllm_project

15 days ago

🎉 Meet vLLM-Omni v0.22.0, a major upgrade for omnimodal world models and production-grade multimodal serving.

🌍 Day-0 @NVIDIAAI Cosmos 3 world models: text, image, audio, video, and action, in and out.
🤖 Robot serving: DreamZero + OpenPI realtime API.
🎙️ Production TTS: Qwen3-TTS, Qwen3-Omni, VoxCPM2 and more.
🎨 Faster image/video/diffusion: Wan 2.2, HunyuanVideo 1.5, LTX-2.3.
⚡ Broader quantization (FP8/INT8, MXFP4/MXFP8, W4A16, ModelOpt) and hardware coverage.

339 commits, 124 contributors, 52 of them new. Thank you all. 🙌

🔗 https://t.co/76ttSM6FHs

10

442

65

153

42K
