@chamath This is increasingly an orchestration problem. The real value is not just switching models. It’s optimizing across them for cost, capability, security and latency.
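To make "optimizing across them" concrete, here is a rough sketch of a router that picks the cheapest model meeting capability, latency and security constraints. Everything in it is hypothetical: the model names, prices, capability scores, latencies and the `on_prem` flag (a stand-in for a security requirement) are illustrative assumptions, not real figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    name: str                  # hypothetical model name
    cost_per_1k_tokens: float  # USD, illustrative only
    capability: int            # 1 (weak) to 5 (strong), made-up scale
    p50_latency_ms: int        # illustrative median latency
    on_prem: bool              # stand-in for a security constraint


# Hypothetical catalog; none of these numbers describe real providers.
CATALOG = [
    ModelProfile("small-hosted", 0.10, 2, 200, False),
    ModelProfile("large-hosted", 1.50, 5, 900, False),
    ModelProfile("mid-on-prem", 0.60, 3, 400, True),
]


def route(min_capability: int, max_latency_ms: int, require_on_prem: bool,
          catalog=CATALOG) -> Optional[ModelProfile]:
    """Return the cheapest model that satisfies every constraint, or None."""
    candidates = [
        m for m in catalog
        if m.capability >= min_capability
        and m.p50_latency_ms <= max_latency_ms
        and (m.on_prem or not require_on_prem)
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    choice = route(min_capability=3, max_latency_ms=500, require_on_prem=False)
    print(choice.name if choice else "no model fits")
```

A real orchestrator would also weigh live pricing, observed latency and fallbacks; this only shows the shape of the trade-off.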
@theallinpod @friedberg Maybe the real tension is that AI is already legible as infrastructure but not yet as shared benefit. People see the datacenter long before they feel the upside. That is also how ideas like moving compute to space stop sounding absurd and start sounding like a workaround.
@FT AI is starting to look less like a pure software story and more like an industrial one. Power, memory, supply chain and compute economics may matter as much as model quality in where value ultimately accrues.
Groq is probably the most consequential AI company you haven’t heard of, but they’ve been working hard for years to build the fastest hardware for AI.
From a commenter on Hacker News:
“This is really impressive. For reference, inference for llama 70b on together’s api generates text at roughly 60 tokens/second.
I can’t find any information about an api, though I’m guessing that the costs are eye watering.
If they offered a Mixtral endpoint that did 300-400 tokens per second at a reasonable cost, I can’t imagine ever using another provider.”
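For a sense of what those rates mean in practice, a quick back-of-the-envelope calculation helps. The 500-token response length is an assumption chosen for illustration, and the sketch ignores time to first token and network overhead; only the 60 and 300-400 tokens/second figures come from the comment.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady generation rate."""
    return num_tokens / tokens_per_second


RESPONSE_TOKENS = 500  # assumed response length, for illustration only

if __name__ == "__main__":
    # Rates quoted in the comment: roughly 60 tok/s today vs. 300-400 tok/s.
    for rate in (60, 300, 400):
        elapsed = seconds_to_generate(RESPONSE_TOKENS, rate)
        print(f"{rate} tokens/second: {elapsed:.2f} s for {RESPONSE_TOKENS} tokens")
```

Under these assumptions a 500-token answer drops from about 8 seconds to between 1 and 2 seconds.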
You can try Groq’s solution at https://t.co/QUgO3ntxc5 and see for yourself why folks on Hacker News are excited.
https://t.co/J8LS8zxZmX