Ali Tayeb @amtayb - Twitter Profile

amtayb retweeted

2 days ago

We’re teaming up with @Palmeiras, the first football club to meaningfully build upon TacticAI: our AI system that can help simulate field scenarios and predict open play dynamics up to 8 seconds in advance. ⚽

106

3K

356

2K

915K

amtayb retweeted

Dylan Foster 🐢

@canondetortugas

3 days ago

Bad news for GRPO...didn't get refused or routed to Opus.

20

378

6

29

38K

amtayb retweeted

vik

@vikhyatk

4 days ago

Was using Fable 5 to write inference code. Anthropic flagged it as frontier AI research, the steering vector kicked in, and it started importing ONNX 🤨

21

589

12

36

30K

amtayb retweeted

Florian Brand

@xeophon

4 days ago

if claude helps you with your research, are you too stupid to notice its sandbagging or is your research not interesting enough to trigger the filters

21

652

38

37

31K

amtayb retweeted

Nick Frosst

@nickfrosst

4 days ago

this model is the opposite of mythos. It's small, cost effective, apache 2.0, and locally deployable. This is the way LLMs should go: small, open source, transparent and sovereign vs large, expensive, proprietary and hegemonic

41

1K

84

495

163K

amtayb retweeted

Tim is GOING TO VIBECAMP ⛺️

@MasterTimBlais

6 days ago

how is carnegie mellon a real place like bro i graduated from a music hall that was also a fruit

2

66

2

14K

amtayb retweeted

alex zhang

@a1zhang

16 days ago

In case you're curious about why dynamic workflows are so powerful and the future, read the RLM paper! Opus 4.8 + dynamic workflows in Claude Code is perhaps the first instance of a frontier model seriously trained to be an RLM. I suspect within a year they'll just become the standard for nearly all coding agent interactions.

53

1K

168

1K

295K

amtayb retweeted

kuz

@kylekuzma

18 days ago

It’s funny….every AI startup deck claims a data moat. 5% actually have one. Would your data be impossible to replicate even if a competitor raised $500M tomorrow? If yes cool you have a business.

90

951

27

163

217K

amtayb retweeted

Nick Frosst

@nickfrosst

24 days ago

Command A+ from @cohere is out now :) it's our best model yet and it's open source apache 2.0

56

1K

132

622

203K

amtayb retweeted

Joe Fioti

@joefioti

26 days ago

We've integrated the Luminal compiler on Positron AI chips. Our first major non-GPU compiler target is Positron Atlas, a bandwidth-focused inference accelerator.

9

100

11

19

11K

amtayb retweeted

alex zhang

@a1zhang

30 days ago

A fun 48-hour run of letting an RLM iteratively build the interface for an RLM to play Pokemon Red (sneak peek of some fun things cooking at @PrimeIntellect😄). The interface generating RLM was just tasked with getting the RLM (same scaffold) to beat the game in under 5 hours wall-clock time. I originally expected the RLM to design some components used in Gemini Plays Pokemon like an extra map, an interface to parse the screen, etc., design low-level policies that would run fast on the emulator, and also design a good prompt and strategy around the RLM to use sub-agents to explore game state with checkpointing, use RNG manipulation in its favor, etc. Instead the RLM eventually just decided to give the RLM a `write_memory` tool, which the RLM player decided to use to 1) warp the player immediately to the Elite 4; 2) give itself a level 100 Mewtwo (which it mistakes to be a Ponyta due to weird Pokedex ID vs. internal ID); 3) give itself $999999; 4) give itself all 8 badges by setting the right flag. It then went ahead and destroyed the Elite 4 and Blue and beat the game in record time :p You'll also notice in the video there's weird backtracking and frame-skipping; this happens because it also did incorporate the strategy of launching sub-agents to explore action trajectories, but had a strange way of saving the frames and recording them (so you see the result of several sub-agent explorations). We'll have some more funny and cool RLM demos soon, but it's cool to see RLMs work as general-purpose agents (both the coding agent that designs the interface and the game-playing agent itself)!

8

224

28

105

12K

amtayb retweeted

Notion Developers

@NotionDevs

about 1 month ago

Install ntn, the Notion CLI. It brings the entire Notion API to your terminal, plus everything you need to build and deploy Workers. Built for humans and coding agents alike. Install with: curl -fsSL https://t.co/2dJqE3YHvw | bash

125

3K

297

3K

2M

Ali Tayeb @amtayb

about 1 month ago

In a regular setting, every agent recomputes the same prefix and holds a GPU slot while waiting on tool calls. BatchAgent fixes this: warm the prefix once, coalesce duplicate tool calls, release GPU slots during tool waits. works with SGLang, vLLM, and Dynamo. 2/2 github: https://t.co/J8YEqc4FXK

0

153
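
One of the three ideas in the BatchAgent tweet above, coalescing duplicate tool calls, can be illustrated with a small sketch. This is not BatchAgent's code or API: the class name `ToolCallCoalescer`, the `run_tool` callable, and the `(name, args)` cache key are all assumptions made for illustration. The idea shown is only that identical calls that arrive while one is already running share its result instead of executing again.

```python
import asyncio


class ToolCallCoalescer:
    """Hypothetical helper: identical concurrent tool calls share one execution."""

    def __init__(self, run_tool):
        self._run_tool = run_tool  # async callable: (name, args) -> result
        self._in_flight = {}       # (name, args) -> asyncio.Task currently running

    async def call(self, name, args):
        key = (name, args)         # args must be hashable, e.g. a tuple
        task = self._in_flight.get(key)
        if task is None:
            # First caller starts the real work; later callers await the same task.
            task = asyncio.create_task(self._run_tool(name, args))
            self._in_flight[key] = task
            # Forget the entry once finished so later calls run fresh.
            task.add_done_callback(lambda _t, k=key: self._in_flight.pop(k, None))
        return await task


async def demo():
    executions = []  # records how many times the tool really ran

    async def slow_tool(name, args):
        executions.append((name, args))
        await asyncio.sleep(0.01)  # stand-in for a slow tool call
        return f"{name}{args}"

    coalescer = ToolCallCoalescer(slow_tool)
    # 100 simulated agents issue the same tool call at the same time.
    calls = (coalescer.call("read_file", ("README.md",)) for _ in range(100))
    results = await asyncio.gather(*calls)
    return results, executions


results, executions = asyncio.run(demo())
print(len(results), "results from", len(executions), "execution(s)")
```

In this sketch only in-flight calls are shared; whether a real system should also cache completed results depends on whether the tool is idempotent.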

Ali Tayeb @amtayb

about 1 month ago

built BatchAgent; a Python SDK for running many agents against one shared inference backend. 100 parallel OpenCode sessions on H100 + SGLang: 573s → 191s wall-clock, 1.28M → 50K prefill tokens, 96% less compute. 1/2

https://t.co/U66yxzTuih

2

1

0

178
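
The figures quoted in the BatchAgent tweet above can be sanity-checked with a few lines of arithmetic. Reading "96% less compute" as the reduction in prefill tokens is an assumption on my part; the tweet does not say how the percentage was derived.

```python
# Numbers as quoted in the tweet (100 parallel sessions, H100 + SGLang).
prefill_before = 1_280_000  # "1.28M" prefill tokens
prefill_after = 50_000      # "50K" prefill tokens
wall_before_s = 573         # wall-clock seconds before
wall_after_s = 191          # wall-clock seconds after

# Fraction of prefill tokens avoided (assumed to be the "96%" figure).
prefill_reduction = 1 - prefill_after / prefill_before

# Wall-clock speedup factor.
speedup = wall_before_s / wall_after_s

print(f"prefill reduction: {prefill_reduction:.1%}")
print(f"wall-clock speedup: {speedup:.2f}x")
```

Under that reading the prefill reduction rounds to 96% and the wall-clock time drops by a factor of three.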

amtayb retweeted

Flapping Airplanes

@flappyairplanes

about 1 month ago

(4/5) One thing we’ve built is a “kittens” virtual machine that takes over the whole GPU and allows new kinds of co-optimization. We can go past the traditional sequential kernel model – for example, fusing entire training runs into a single kernel and even weirder stuff.

https://t.co/5lQAy1Qa7Z

28

676

56

292

246K

amtayb retweeted

OpenAI

@OpenAI

about 1 month ago

We’ve partnered with @AMD, @Broadcom, @Intel, @Microsoft, and @NVIDIA, to release Multipath Reliable Connection (MRC), a new open networking protocol that helps large AI training clusters run faster and more reliably, with less wasted GPU time. https://t.co/AiV952AJXs

214

6K

698

2K

1M

amtayb retweeted

Yaron (Ron) Minsky

@yminsky

about 1 month ago

I had a good time visiting CMU a couple of weeks back, but I think the highlight was lecturing about the stuff we're doing with OxCaml at Hype for Types, a student-organized PL class at CMU.

https://t.co/f1GTQ6Ge1n

3

156

7

31

14K

amtayb retweeted

Patrick C Toulme

@PatrickToulme

about 2 months ago

Launching pyptx — a Python DSL for writing NVIDIA PTX kernels. One PTX instruction = one Python call. Write pure PTX in Python. Direct Hopper + Blackwell support: wgmma, TMA, tcgen05, mbarriers. JAX + PyTorch integration. Includes GEMM, grouped GEMM, RMSNorm, SwiGLU, and a PTX→Python transpiler pip install pyptx[torch] pip install pyptx[jax] https://t.co/PcISpsaeQ5

34

1K

135

814

181K

amtayb retweeted

Sam Altman

@sama

about 2 months ago

Really excellent work by the inference team to serve this model so efficiently! To a significant degree, we have to become an AI inference company now.

268

6K

151

286

326K

Ali Tayeb @amtayb

about 2 months ago

exq profile --model [MODEL]
exq compile profile profiles/[MODEL].json
exq serve --model [MODEL]
Compiles in under 3 seconds and patches SGLang at startup. Code: https://t.co/kfZaAdBx1P

0

34

Ali Tayeb @amtayb

about 2 months ago

Built ExQ, which made SGLang's INT4 MoE kernel 38% faster by sorting tokens before the GEMM. 🧵 1/4

1

3

1

0

131
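
The tweet above says only that tokens are sorted before the GEMM. A plausible reading, and it is an assumption here, is that tokens are grouped by the expert they are routed to, so each expert's weights are applied to one contiguous block. The pure-Python sketch below illustrates that grouping and the scatter back to the original order; it is not ExQ's kernel, and the function names and toy "experts" are hypothetical.

```python
def sort_tokens_by_expert(expert_ids):
    """Return (order, segments).

    order:    token indices, stably sorted so tokens of one expert are adjacent.
    segments: expert id -> (start, end) range inside the sorted order.
    """
    order = sorted(range(len(expert_ids)), key=lambda i: expert_ids[i])
    segments = {}
    for pos, tok in enumerate(order):
        expert = expert_ids[tok]
        start, _ = segments.get(expert, (pos, pos))
        segments[expert] = (start, pos + 1)
    return order, segments


def moe_forward(tokens, expert_ids, experts):
    """Apply each token's expert, processing one contiguous block per expert."""
    order, segments = sort_tokens_by_expert(expert_ids)
    sorted_tokens = [tokens[i] for i in order]
    sorted_out = [None] * len(tokens)
    for expert, (start, end) in segments.items():
        # One contiguous block per expert: in a real kernel this is where
        # the expert's weights would be loaded once for the whole block.
        block = sorted_tokens[start:end]
        sorted_out[start:end] = [experts[expert](x) for x in block]
    # Scatter results back to the original token order.
    out = [None] * len(tokens)
    for pos, tok in enumerate(order):
        out[tok] = sorted_out[pos]
    return out


# Toy example: scalars as "tokens", simple functions as "experts".
tokens = [1.0, 2.0, 3.0, 4.0, 5.0]
expert_ids = [2, 0, 2, 1, 0]
experts = {0: lambda x: x * 10, 1: lambda x: x + 1, 2: lambda x: -x}

order, segments = sort_tokens_by_expert(expert_ids)
output = moe_forward(tokens, expert_ids, experts)
print(order, segments, output)
```

The output matches what applying each token's expert one at a time would give; only the processing order changes.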

Ali Tayeb @amtayb

about 2 months ago

The two results from this are speed and quality. Speed: since INT4 weights load 4x fewer bytes from HBM, ExQ is 20-27% faster than SGLang's default fp16 serving at production batch sizes. Quality: by keeping hot experts at higher precision, ExQ recovers more than half the quality you lose going to INT4, at the same memory cost as uniform INT4. 3/4

1

0

46

Ali Tayeb

@amtayb
