@mdda123 @SwayStar123 @SakanaAILabs I expect that only works for causal attention. Without PE, non-causal attention is permutation-equivariant, so there's no way for a transformer to distinguish positions.
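This is a quick toy check of the causal vs. non-causal point. It assumes single-head attention with identity Q/K/V projections and no positional encoding, which is a simplification and not any specific model. Reversing the input order just reverses the non-causal output. The causal mask breaks that symmetry.

```python
import numpy as np


def self_attention(x, causal=False):
    """Single-head attention with identity Q/K/V projections and no positional encoding."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    if causal:
        # Token i may only attend to tokens 0..i
        n = x.shape[0]
        scores = np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))        # 5 tokens, 8-dim embeddings, no PE added
perm = np.array([4, 3, 2, 1, 0])   # reverse the sequence

# Non-causal: shuffling the input only shuffles the output the same way,
# so nothing in the output encodes where a token originally sat.
noncausal_equivariant = np.allclose(self_attention(x[perm]), self_attention(x)[perm])

# Causal: the mask itself depends on position (e.g. the first token sees only itself),
# so reordering changes the outputs, not just their order.
causal_equivariant = np.allclose(self_attention(x[perm], causal=True),
                                 self_attention(x, causal=True)[perm])

print(noncausal_equivariant, causal_equivariant)  # True False
```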
@SwayStar123 @SakanaAILabs This seems to imply you could replace RoPE with a simpler PE, which would remove the dependence on brittle RoPE kernels in inference.
Spent a weekend building Song Pong - it's Pong, except the ball bounces exactly on the beat of a song. Created with linear programming 🏓
(sound on)
Basic idea: the ball travels at constant speed, the paddles can glide anywhere on their half, and every bounce lands on a beat. The question is where the paddles should be to maximize screen usage while still obeying physics.
Approach: model it as a linear program. The variables are the horizontal positions/velocities at each hit. Constraints encode the physics (distance = velocity × duration, paddles stay on their side). The objective is to maximize the sum of distances from the center. CVXPY solves it and gives you the keyframes; then you simulate the vertical component and interpolate between hits.
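Here is a minimal sketch of the keyframe LP, with scipy's linprog swapped in for CVXPY. Everything beyond the tweet is an assumption:

- the court spans x in [-half_width, half_width];
- hits alternate left/right;
- horizontal speed is capped at the ball's total speed;
- velocities are eliminated via v_i = (x[i+1] - x[i]) / dt_i;
- hit_positions and ball_y are hypothetical helper names.

Because each hit's side is fixed, distance from center is just ±x, which keeps the objective linear.

```python
import numpy as np
from scipy.optimize import linprog


def hit_positions(beat_times, half_width=1.0, speed=1.0):
    """Horizontal hit positions maximizing total distance from the center line.

    Assumed model (not from the original project): even-indexed hits are on
    the left paddle, odd-indexed on the right, and the ball's horizontal
    speed between hits can't exceed its total `speed`.
    """
    t = np.asarray(beat_times, dtype=float)
    n = len(t)
    if n < 2 or np.any(np.diff(t) <= 0):
        raise ValueError("need at least two strictly increasing beat times")
    dt = np.diff(t)
    side = np.where(np.arange(n) % 2 == 0, -1.0, 1.0)  # -1 = left, +1 = right

    # Distance from center is side[i] * x[i]; linprog minimizes, so negate.
    c = -side

    # |x[i+1] - x[i]| <= speed * dt_i, written as two linear inequalities.
    A_ub = np.zeros((2 * (n - 1), n))
    b_ub = np.repeat(speed * dt, 2)
    for i in range(n - 1):
        A_ub[2 * i, [i, i + 1]] = [-1.0, 1.0]
        A_ub[2 * i + 1, [i, i + 1]] = [1.0, -1.0]

    # Each paddle stays on its own half of the court.
    bounds = [(-half_width, 0.0) if s < 0 else (0.0, half_width) for s in side]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x, np.diff(res.x) / dt  # keyframe positions, horizontal velocities


def ball_y(t_query, beat_times, vx, height=1.0, speed=1.0, y0=0.0):
    """Vertical ball position, bouncing off walls at y=0 and y=height.

    Assumes constant total speed, so the vertical speed on leg i is
    sqrt(speed**2 - vx[i]**2). Tracks an 'unfolded' coordinate that only
    grows, then folds it into [0, height] with a triangle wave (mirror bounces).
    """
    t = np.asarray(beat_times, dtype=float)
    vy = np.sqrt(np.clip(speed**2 - np.asarray(vx) ** 2, 0.0, None))
    # Unfolded vertical distance travelled by each keyframe
    u_keys = y0 + np.concatenate([[0.0], np.cumsum(vy * np.diff(t))])
    i = np.clip(np.searchsorted(t, t_query, side="right") - 1, 0, len(vy) - 1)
    u = u_keys[i] + vy[i] * (t_query - t[i])
    return height - np.abs(np.mod(u, 2.0 * height) - height)
```

Between hits, x(t) can be linearly interpolated with np.interp(t, beat_times, x). The unfolded-coordinate trick in ball_y is just one simple way to handle wall bounces; the actual project may do it differently.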
What I like about this: if you want different physics you just change the constraints. Want a different aesthetic you change the objective. No manual choreography, no fragile heuristics that break on weird inputs. Just declare what you want and let the solver figure it out.
The design space of music visualizers powered by optimization feels under-explored. There's probably a bunch of these waiting to be discovered where you have timing constraints + some physical system + an aesthetic objective.
Code is up at https://t.co/4qVyFmAmE5 if anyone wants to play with it. Currently takes MIDI for beat times but want to explore extracting them directly from audio.
@danielhanchen @omkizzy @Yampeleg What's the difference between padding-free and packing? Packing shouldn't change the loss if you do proper document masking and RoPE indices.
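This is a toy check of the packing claim. It assumes one attention layer with identity projections and no learned weights; packing_metadata and masked_attention are hypothetical helpers. With a block-diagonal causal mask, the packed outputs match running each document separately. The position ids restart per document, which is what you'd feed to RoPE.

```python
import numpy as np


def packing_metadata(doc_lengths):
    """Position ids and attention mask for several documents packed into one row.

    - position ids restart at 0 for each document (the indices you'd give RoPE)
    - mask[i, j] is True only if j <= i and both tokens are in the same document
    """
    doc_ids = np.repeat(np.arange(len(doc_lengths)), doc_lengths)
    position_ids = np.concatenate([np.arange(n) for n in doc_lengths])
    n = len(doc_ids)
    causal = np.tril(np.ones((n, n), dtype=bool))
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    return position_ids, causal & same_doc


def masked_attention(x, mask):
    """Single-head attention with identity projections; masked entries get zero weight."""
    scores = np.where(mask, x @ x.T / np.sqrt(x.shape[-1]), -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x


rng = np.random.default_rng(0)
lengths = [3, 2, 4]
docs = [rng.normal(size=(n, 8)) for n in lengths]

position_ids, mask = packing_metadata(lengths)
packed_out = masked_attention(np.concatenate(docs), mask)

# Same computation, one document at a time with a plain causal mask
separate_out = np.concatenate(
    [masked_attention(d, np.tril(np.ones((len(d), len(d)), dtype=bool))) for d in docs]
)

print(position_ids)                            # [0 1 2 0 1 0 1 2 3]
print(np.allclose(packed_out, separate_out))   # True
```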
I've been at NeurIPS this past week. Here are six things I learned:
1. Inference engineers are in huge demand
2. Competitive programmers are taking over the field
3. AI music is bottlenecked by evals. Everyone knows the evals are bad, but no one can fix them.
4. Music research is conducted with a tiny amount of compute. Most methods don't scale.
5. Diffusion LMs need to integrate a KV cache
6. The ocean is salty
They say every AI researcher dies twice. Once when you stop breathing, and a second time, a bit earlier on, when your life's work is obsoleted by a new grad with a bigger GPU.