The best production model is the one trained for the job.
Gravity Ads replaced a 70B model on Cerebras with a specialized 1B model trained for their actual workload.
Same quality, much faster and cheaper inference:
- p50: 152ms
- p99: 5.7x lower
- cost: ~10x lower
- model: 70x smaller
Great working with @trygravityai on this.
Case study: https://t.co/a8kzvBuxig
@jakemor Sounds like what we're doing! Some of the coding agents that we work with do something similar, although we don't steer the LLM itself. Check out @CodebuffAI and @ctodotnew