for decades we tended to our crops, planted them by hand, watched them grow. we knew what soil was fertile and where to plant each seed. we chose when to rotate the fields and when to raze bad crop.
now we just tell the machines what to grow and they handle the rest.
@thsottiaux If I switch to another chat while speech is still transcribing the chat I switched off of never starts. so I have to wait for transcription to finish before I can switch to another chat
does anyone use estimated FLOPs or HBM bytes moved as a reward penalty for long generations, instead of just sequence length?
during decode, the marginal cost of each token grows linearly, so total rollout cost is closer to real $/latency/watts than a flat per-token penalty.