4x cost reduction in TTS inference with @tenstorrent!
11 NVIDIA L40S GPUs ran 550 simultaneous audio streams at ~$100K.
Now, 27 Tenstorrent P100 chips do the same at ~$27K.
First production-grade TTS to match the cost of text tokens without degradation in audio quality.
Hear it straight from the team that built it: @AkshatMandloi10 and @ranjith_m_s in the video below.
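A quick back-of-envelope check of the numbers above, as a Python sketch. It assumes the ~$100K and ~$27K figures are the total hardware cost of each 550-stream setup; the `SETUPS` table and helper functions are illustrative, not from the team.

```python
# Back-of-envelope check of the figures quoted above.
# Assumption: "~$100K" and "~$27K" are total hardware costs per 550-stream setup.
STREAMS = 550

SETUPS = {
    "NVIDIA L40S": {"chips": 11, "cost_usd": 100_000},
    "Tenstorrent P100": {"chips": 27, "cost_usd": 27_000},
}


def cost_per_stream(cost_usd: float, streams: int) -> float:
    """Hardware dollars per concurrent audio stream."""
    return cost_usd / streams


def streams_per_chip(chips: int, streams: int) -> float:
    """Concurrent audio streams served per accelerator."""
    return streams / chips


for name, setup in SETUPS.items():
    print(
        f"{name}: ${cost_per_stream(setup['cost_usd'], STREAMS):.2f} per stream, "
        f"{streams_per_chip(setup['chips'], STREAMS):.1f} streams per chip"
    )

ratio = SETUPS["NVIDIA L40S"]["cost_usd"] / SETUPS["Tenstorrent P100"]["cost_usd"]
print(f"Cost ratio: {ratio:.2f}x")
```

With these inputs the cost works out to about $182 per stream on the L40S setup versus about $49 per stream on the P100 setup, a ratio of roughly 3.7x, which the headline rounds to 4x.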
Introducing Infinite Studio ♾.
Last week, @tenstorrent x @prodia announced the fastest Wan 2.2 video generation in the world.
We built a demo to show what that speed unlocks: directing an infinite movie in real time.
Demo 👇
[5/5] More GRPO details:
The model is rewarded for formatting (0 to 0.4) and for correctness (-1.0 to 6.0), so the maximum reward is 6.4. The correctness reward rises for the first ~300 steps, then flattens. Test accuracy grows from 19% before GRPO to 34% after GRPO.
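To make those reward ranges concrete, here is a hypothetical Python sketch of two reward functions that fit them. The rules are assumptions, not the actual pipeline's: a GSM8K-style `#### <number>` answer marker, 0.2 + 0.2 formatting credit, -1.0 when no answer can be parsed, 0.0 for a wrong answer, 6.0 for a correct one. The batch wrappers follow the shape TRL's `GRPOTrainer` expects for custom reward functions (plain-text `completions` plus dataset columns as keyword arguments, one float per completion); the `answer` column name is hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical reward shaping matching the quoted ranges (assumed rules):
#   formatting:  0.0 .. 0.4
#   correctness: -1.0 .. 6.0   -> best possible total 6.4
ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_answer(text: str) -> Optional[float]:
    """Parse the number after a GSM8K-style '####' marker, or None if absent."""
    match = ANSWER_RE.search(text)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def format_reward(completion: str) -> float:
    """0.2 for reasoning text before '####', 0.2 for a parseable final answer."""
    score = 0.0
    head, marker, _ = completion.partition("####")
    if marker and head.strip():
        score += 0.2
    if extract_answer(completion) is not None:
        score += 0.2
    return score


def correctness_reward(completion: str, reference: float) -> float:
    """-1.0 if no answer can be parsed, 0.0 if wrong, 6.0 if correct."""
    pred = extract_answer(completion)
    if pred is None:
        return -1.0
    return 6.0 if abs(pred - reference) < 1e-6 else 0.0


# Batch wrappers in the shape TRL's GRPOTrainer uses for reward functions:
# completions plus dataset columns as keyword arguments, one float per completion.
def format_reward_func(completions: List[str], **kwargs) -> List[float]:
    return [format_reward(c) for c in completions]


def correctness_reward_func(completions: List[str], answer: List[str], **kwargs) -> List[float]:
    # 'answer' is a hypothetical dataset column holding the reference number.
    return [correctness_reward(c, float(a)) for c, a in zip(completions, answer)]
```

Keeping formatting and correctness as separate functions lets each curve be logged on its own, which is how a plateau like the one after ~300 steps shows up in the correctness reward alone.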
[1/5] Got a working TRL pipeline that makes a tinyllama model solve math questions.
GSM8K test accuracy: 34.65%
Pipeline: tinyllama-v1.1b-math-code + NuminaMath CPT (2 epochs) + GSM8K Format Priming + GRPO (1 epoch).
[4/5] The last stage is GRPO, which helps the model arrive at a logically correct answer. With temperature=1.0 and 16 generations per prompt, the model proposes several different approaches to the same problem. Usually one or two of them are free of logical mistakes.
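GRPO scores each of the 16 generations relative to the other generations for the same prompt, rather than with a learned value model. A minimal Python sketch of that group-normalized advantage, assuming population standard deviation and a small `eps` (implementations differ on both); the reward values in the example are made up for illustration.

```python
import statistics
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantage: normalize each reward against its own group.

    Uses population standard deviation; eps keeps the division finite when
    every sample in the group earns the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up rewards for one prompt with 16 generations (temperature=1.0):
# two logically sound answers, fourteen that are well formatted but wrong.
rewards = [6.4, 6.4] + [0.4] * 14
advantages = group_advantages(rewards)
for r, a in zip(rewards[:3], advantages[:3]):
    print(f"reward={r:.1f}  advantage={a:+.3f}")
```

The advantages within a group sum to roughly zero, so the one or two sound samples get pushed up and the rest get pushed down. If all 16 samples earn the same reward, every advantage is zero and that prompt gives no reward-driven policy-gradient signal, which is one reason sampling diverse approaches at temperature=1.0 helps.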