DeepInfra has raised its $107M in Series B funding 🚀
AI is moving from training to production-scale deployment, and inference is becoming the system constraint.
DeepInfra was built for this shift — scaling high-throughput inference for open-source and agent-driven workloads. Grateful to our investors and partners, co-led by @500GlobalVC and @gharik
DeepInfra × Hugging Face
DeepInfra is live on @HuggingFace Inference Providers.
Run DeepSeek V4, Kimi-K2.6, GLM-5.1 and 100+ more open models straight from the Hub — same OpenAI-compatible API, same low per-token pricing, no markup.
Just add :deepinfra to the model name.
The DeepSeek V4 garbled output bug in open source inference engine is fixed in SGLang.
To everyone affected over the weekend, sorry for the trouble.
Huge thanks to @Ant_Group for landing the fix PR. It was a cross-company, cross-timezone, sub-48-hour marathon. @ollama and @humansand surfaced it first; @nvidia, @AIatMeta, and @FireworksAI_HQ raised the same signal soon after. @deepseek_ai replied in seconds at every hour. @FireworksAI_HQ stayed up late with us until it shipped. @SemiAnalysis_ and @ollama provided the machines that made the debugging possible. The SGLang team dug in through the weekend.
The real OSS is the friends we made along the way.🫶
DeepSeek V4 is live on DeepInfra at launch 🔥
V4-Pro: 1.6T MoE / 49B active. Frontier-tier reasoning.
$1.74 in · $3.48 out · $0.145 cached
V4-Flash: 284B MoE / 13B active. Fast & cheap for agents, RAG, long-context extraction.
$0.14 in · $0.28 out · $0.028 cached
Day 0. GLM-5.1 from @Zai_org is live on DeepInfra.
Open source getting close to GPT-5.4 and Claude Opus 4.6.
Powered by @nvidia B300 Blackwell Ultra.
Early access pricing, costs will drop as we scale.
$1.40 in / $4.40 out / $0.26 cached per 1M tokens ↓
there is still no substitute for perfectly understanding every single line of code in your codebase
i fall into the trap of just skimming through ai changes to "just make sure it looks good" all the time, and it makes me lose so much time to not perfectly understand every line
At 1:30 a.m. PT on November 3, 2023 Elon sent a message to the xAI group chat saying that we need to go “extremely hardcore” for the next 36 hours; Grok will be released publicly tomorrow. You didn’t have to be in the exclusive company chat to get the message; it was also posted publicly at the same time:
https://t.co/lThuIjQvF9
What unfolded over the next day and a half was one of the best examples of engineering at pace that I’ve ever seen. All we had when we started was a somewhat fine-tuned base model and a half-baked UI. Our team of ten split up the tasks: curate data, improve the model, implement the raw prompting and RAG service, build the production infra. I took care of the latter.
At 8:51 p.m. PT the next day, we announced Grok to the world with a long-form post on X (https://t.co/9d485OLrSY). Over the past 36 hours, we came up with Fun mode (including Grok’s sunglasses), finished the whole production system, and most importantly tuned the RAG system that gave it real-time knowledge of the world through the X platform (a first in the industry). A day and a half of straight coding and shipping; no drugs, not even caffeine, just pure adrenaline. Elon gave us a mission and we delivered.
The launch went very well. We invited a couple hundred X creators and Grok’s ability to roast accounts went viral. It was the first time a publicly accessible AI was allowed to poke fun at people.
This episode is a prime example of what you can achieve by going extremely hardcore: you move and deliver results faster than any outsider could have anticipated. Within 36 hours, we took the company from silence to relevance. It was well worth it.
xAI’s hardcore culture is infamous on X. I love the tent meme that suggests we all sleep (well, slept in my case) in the office in tents. Our reputation precedes us and even new joiners hit the ground grinding hard. However, unless you understand the “why,” you are at risk of simply replicating the “how” without achieving the same results.
You need to grind with purpose and the purpose is to move fast towards a known goal. When the goal and the means of reaching it are crystal clear, a small, skilled, and highly motivated team can outcompete companies old and new, big and small.
Never grind to show off; never work late to be seen; never sacrifice without cause. There is no medal for the one who tried extremely hard but failed. There is only a medal for the winner. If all your efforts lead nowhere, you’re arguably not very productive.
Always keep your eyes firmly on the goal, do everything to reach it as quickly as possible, and make sure you're on track to win. A hardcore engineering culture is one of the most effective ways of accelerating real progress. Watch out for performative sacrifice and don’t confuse pain with progress.