platers @vector_tao - Twitter Profile

@mdda123 @SwayStar123 @SakanaAILabs I expect that only works for causal attention; there's no way for a transformer to distinguish positions without PE in noncausal attention.

0

1

0

30
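The claim in the reply above can be checked on a toy example. This is a minimal sketch, assuming a single attention head with identity query/key/value projections and no positional encoding (the function names `attention` and `close` are illustrative, not from any library). Without a causal mask, shuffling the input tokens only shuffles the outputs in the same way, so token order carries no information; with a causal mask that symmetry breaks.

```python
import math

def attention(x, causal=False):
    """Single-head self-attention, identity projections, no positional encoding."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Keys visible to query i: every token, or only tokens 0..i under a causal mask.
        keys = range(i + 1) if causal else range(n)
        scores = [sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        out.append([sum(w / total * x[j][k] for w, j in zip(weights, keys))
                    for k in range(d)])
    return out

def close(a, b, tol=1e-9):
    """Element-wise comparison of two lists of vectors."""
    return all(abs(p - q) <= tol for r, s in zip(a, b) for p, q in zip(r, s))

tokens = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
perm = [2, 0, 1]
shuffled = [tokens[p] for p in perm]

# Non-causal: output for the shuffled input equals the shuffled original output.
full = attention(tokens)
noncausal_equivariant = close(attention(shuffled), [full[p] for p in perm])

# Causal: each token sees a different prefix after shuffling, so outputs change.
causal = attention(tokens, causal=True)
causal_equivariant = close(attention(shuffled, causal=True), [causal[p] for p in perm])

print(noncausal_equivariant, causal_equivariant)
```

The causal mask gives each token a prefix that depends on its position, which is one way a model could recover order without an explicit PE.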

platers

@vector_tao

5 months ago

@SwayStar123 @SakanaAILabs This seems to imply you could replace RoPE with a simpler PE, which would remove the dependence on brittle RoPE kernels in inference.

2

0

363

platers

@vector_tao

5 months ago

@sameQCU I've seen enough

0

2

0

696

platers

@vector_tao

6 months ago

@kyutai_labs did JAX not use FA3 before this???

1

0

503

platers

@vector_tao

6 months ago

turns out pretty much all audio can be interpreted as music

0

1

0

141

platers

@vector_tao

6 months ago

who is building ai for world building and storytelling?

1

2

0

140

platers

@vector_tao

6 months ago

why is every paper using DINO to bootstrap their image models? what makes it so good?

0

2

0

215

platers

@vector_tao

6 months ago

here's bad apple but it's a game of pong (sound on)

0

2

1

0

222

platers

@vector_tao

6 months ago

Spent a weekend building Song Pong - it's pong except the ball bounces exactly on the beat of a song, created with linear programming 🏓 (sound on)

Basic idea: ball travels at constant speed, paddles can glide anywhere on their half, every bounce lands on a beat. Question is where should the paddles be to maximize screen usage while still obeying physics?

Approach: model it as linear programming. Variables are horizontal positions/velocities at each hit. Constraints encode physics (distance = velocity × duration, paddles stay on their side). Objective is to maximize sum of distances from center. CVXPY solves it and gives you the keyframes, then you simulate the vertical component and interpolate between hits.

What I like about this: if you want different physics you just change the constraints. Want a different aesthetic you change the objective. No manual choreography, no fragile heuristics that break on weird inputs. Just declare what you want and let the solver figure it out.

The design space of music visualizers powered by optimization feels under-explored. There's probably a bunch of these waiting to be discovered where you have timing constraints + some physical system + an aesthetic objective.

Code is up at https://t.co/4qVyFmAmE5 if anyone wants to play with it. Currently takes MIDI for beat times but want to explore extracting them directly from audio.

1

6

1

0

306
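One way to write down the linear program described in the Song Pong thread above, as a sketch only. The symbols are assumptions for illustration and the linked code may be formulated differently: hits alternate between the left and right paddle, $d_i \ge 0$ is the horizontal distance of hit $i$ from the center line, $u_i$ is the horizontal speed between hits $i$ and $i+1$, $t_i$ are the beat times, $v$ is the constant ball speed, and $W$ is the screen width.

```latex
\begin{aligned}
\max_{d,\,u} \quad & \sum_{i=1}^{n} d_i \\
\text{s.t.} \quad & d_i + d_{i+1} = u_i\,(t_{i+1} - t_i), && i = 1,\dots,n-1, \\
& 0 \le u_i \le v, && i = 1,\dots,n-1, \\
& 0 \le d_i \le W/2, && i = 1,\dots,n.
\end{aligned}
```

Because consecutive hits are on opposite sides, the horizontal distance covered between them is $d_i + d_{i+1}$, which keeps every constraint linear. Under this reading, whatever speed is not used horizontally goes into the vertical component that is simulated afterwards.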

platers

@vector_tao

6 months ago

@jaredpalmer @github yes yes yes yes

0

177

platers

@vector_tao

6 months ago

my ick is optimizing kernels while there are still gaps in your cuda stream

0

1

0

147

platers

@vector_tao

6 months ago

@danielhanchen @omkizzy @Yampeleg what's the difference between padding-free and packing? Packing shouldn't change the loss if you do proper document masking and RoPE indices.

1

0

47
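The two ingredients named in the reply above, document masking and per-document RoPE indices, can be sketched in plain Python. This is an illustrative sketch under assumptions, not any library's API: the helper names `packed_position_ids` and `document_causal_mask` are hypothetical, and documents are described only by their token lengths.

```python
def packed_position_ids(doc_lengths):
    """Position indices that restart at 0 for each document in the packed row."""
    return [pos for length in doc_lengths for pos in range(length)]

def document_causal_mask(doc_lengths):
    """mask[q][k] is True when query q may attend to key k:
    same document and k not after q (block-diagonal causal mask)."""
    doc_ids = [d for d, length in enumerate(doc_lengths) for _ in range(length)]
    n = len(doc_ids)
    return [[doc_ids[q] == doc_ids[k] and k <= q for k in range(n)]
            for q in range(n)]

# Two documents of 3 and 2 tokens packed into one row of 5 tokens.
lengths = [3, 2]
position_ids = packed_position_ids(lengths)
mask = document_causal_mask(lengths)

print(position_ids)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With these two pieces, each token sees the same context and the same positions it would have seen if its document were run alone, which is the condition the reply relies on.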

platers

@vector_tao

6 months ago

@ryanssenn build an LLM engine from scratch like nano-vllm to learn the core concepts. start simple, and add complexity as needed to optimize.

0

90

platers

@vector_tao

6 months ago

I've been at NeurIPS this past week. Here's six things I learned:
1. Inference engineers are in huge demand
2. Competitive programmers are taking over the field
3. AI music is bottlenecked by evals. Everyone knows the evals are bad, but no one can fix them.
4. Music research is conducted with a tiny amount of compute. Most methods don't scale.
5. Diffusion LMs need to integrate a kv cache
6. The ocean is salty

0

4

0

1

448

platers

@vector_tao

6 months ago

@snwy_me FlexAttention recently added support for it as a backend btw

0

234

platers

@vector_tao

6 months ago

They say every AI researcher dies twice. Once when you stop breathing, and a second time, a bit earlier on, when your life's work is obsoleted by a new grad with a bigger GPU.

1

7

1

0

607

