Seba @CulStory - Twitter Profile

Seba @CulStory

2 days ago

@osanseviero w8, so activations are statically quantized to 8-bit?

Seba @CulStory

4 days ago

Another interesting Microsoft release besides their models: Read the Trace, Steer the Path: Trajectory-Aware Reinforcement Learning for Diffusion Language Models https://t.co/0XTFzT1ukT

Seba @CulStory

4 months ago

waiting for someone to announce that they threw RL compute at stuff like this

Seba @CulStory

5 days ago

@classiclarryd if the target is a dense model, would it make sense to begin training as an MoE and then densify (maybe by pruning less important experts) for faster dense training?

Seba @CulStory

6 days ago

@hcompany_ai the 4B model looks really tempting, it may be possible to make it run on the Neural Engine

Seba @CulStory

7 days ago

have the feeling that action models will become great base models

The Humanoid Hub

@TheHumanoidHub

7 days ago

Jensen just launched NVIDIA Cosmos 3. Pitched as the first fully open omnimodel for physical AI: a mixture-of-transformers (reasoning + generation) with native vision reasoning and generation across text, image, video, sound, and action. Tops open-model leaderboards on physics, world generation, and action policy.

Three jobs in one:
- VLM for robots and autonomous vehicles
- world model that simulates environments and predicts future states
- backbone for world-action models trained on specific tasks

Three options:
- Super (32B): for post-training robotics models that need the highest physics accuracy and generation quality.
- Nano (8B): for high-quality video and action reasoning in fractions of a second.
- Edge, coming soon, for real-time inference at the edge.

Seba @CulStory

10 days ago

@badlogicgames If you want to try something else https://t.co/f3xaSzeijZ

Seba @CulStory

14 days ago

@OpenBMB now we wait for the 2-bit qat

Seba @CulStory

14 days ago

@eliebakouch @KranenKyle my bet is that gemini 3.5 is co-optimized for speculative decoding, which is why it is so fast

Seba @CulStory

16 days ago

Quiet quick release: VoxCPM2 running on the Apple Neural Engine. Using cached voices, I’m getting a decent ~0.5s TTFB and ~0.6 RTF on an M4 Air. https://t.co/f3xaSzeijZ

Seba @CulStory

26 days ago

so awesome, periodic reminder that compute is still underexploited

Jonas Geiping

@jonasgeiping

26 days ago

We’re training models wrong and it’s due to chatGPT.

Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single stream. The models cannot read while writing, cannot act while thinking and cannot think while processing information.

In our new paper, see below, we discuss LLMs with parallel streams. We show that multi-stream LLMs can …

🔵 Be created by instruction-tuning for the stream format
🔵 Simplify user and tool use UX removing many pain points with agents and chat models (such as having to interrupt the model to get a word in)
🔵 Multi-Stream LLMs are fast, they can predict+read tokens in all streams in parallel in each forward pass, improving latency
🔵 LLMs with multiple streams have an easier time encoding a separation of concerns, improving security
🔵 LLMs with many internal streams provide a legible form of parallel/cont. reasoning. Even if the main CoT stream is accidentally pressured or too focused on a particular task to voice concerns, other internal streams can subvocalize concerns that would otherwise not be verbalized.

Does this sound related to a recent thinky post :) - Yes, but I don’t feel so bad about being outshipped with such a cool report on their side by 23 hours. I’ll link a 2nd thread below with a more direct comparison. I actually think both are complementary in interesting ways.

Seba @CulStory

about 1 month ago

@hardmaru @NVIDIAAI @nvidia we need a recipe to sparsify current models pls

Seba @CulStory

about 1 month ago

V4 is an R1 moment

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)

@teortaxesTex

about 1 month ago

restarted a convo (with V4's + 3 more papers) ≈48 hours old. cache hits

they do store cache for "days", not minutes-hours

Gemini TTL default is 1 hour, Claude's is 5 minutes

Nah bros I don't think they have > V4 kv efficiency, whatever Reiner Pope says