Today, we’re excited to announce our $50M Series B, led by @GreenfieldVC (formerly TPG Capital), with participation from @lightspeed and @notablecap. 🚀
At @PatronusAI, we develop simulations and evals to train and improve AI. The first phase of AI was built on static benchmarks, but that era is over now. As agents are used to solve longer and longer tasks, they need to practice in dynamic, living worlds to get better. Simulations are the critical infrastructure powering this next phase.
As a company, we’re behind the most influential research and products in AI evaluation, like FinanceBench, Lynx, and Percival. And things have moved at the speed of light since. ⚡ We partner with the world's leading frontier AI labs and enterprises, and our revenue has grown more than 15x over the past year.
Today, we’re also introducing a preview of the first Digital World Model for AI agent training and simulation: Patronus-DWM.
Digital World Models are language diffusion world models that predict realistic environment behaviors and steer agent actions across digital workflows. Just as physical world models predict how objects move through space, we’re developing the equivalent for the digital world: predicting how agents act in digital workflows, then using that to scale the creation of high-quality training data for LLMs.
Digital World Models help us push the frontier of ultra-long-horizon workflows and unlock a new class of self-improving RL environments. This is our scalable approach to simulating all of the world’s intelligence.
The round was also joined by @datadoghq, @SamsungVentures, @gokulr, @factorialcap, and a large cohort of amazing AI leaders and researchers across @AnthropicAI, @OpenAI, @GoogleDeepMind, @nvidia, @Recursive_SI, and more. ✨
It has been the ride of a lifetime. But we’re just getting started. The best is yet to come.
"Do not go gentle into that good night,
Rage, rage against the dying of the light"
- Dylan Thomas (1951)
most of the effort for @mathematics_inc's sphere packing formalization was spent on compressing and cleaning up a first-pass 500K LOC formalization to <200K LOC. creating infrastructure that can scale autoformalization to the frontiers of mathematics, software, and everything else is our #1 priority here. if this excites you, come work with us!
skill issue, been switching between 2.5 and opus since 2.5 came out - great combo.
there's a reason surgeons have both scalpels and saws in the same kit
The companies I love working with in office hours are the ones where the founder has a specific, weird, earned insight that nobody else has. Not "AI for X." A genuine edge that came from living inside a problem.
The ones that are dying almost always have the same pattern: technically competent founders building something nobody asked for, moving metrics that don't matter, avoiding the conversation with the one user who'd tell them the truth.
The lucky thing is that the 2nd type of founder can become the 1st if they don't stand still: if they're willing to talk to people, try things, and always seek a high rate of learning.
Topics of interest include:
Subword Tokenization. Examination of current techniques such as WordPiece, BPE, and UnigramLM, as well as extensions to improve their efficiency and applicability.
Tokenization for Various Modalities. Tokenization techniques for images, audio, and video. Study of representation alignment across modalities.
Multilingual Tokenization. Focus on ensuring tokenization methods are equitable and effective across various languages. Identification of relevant failure modes caused by tokenization.
Tokenizer Modification. Methods for updating tokenizers after model training to improve the model’s efficiency or performance without retraining from scratch.
Alternative Approaches to Represent Input. Investigation into alternative input representations for data such as patches, bytes, or pixels.
Tokenization and Statistics. Statistical analysis of subword properties. For instance, the study of compression effectiveness of different tokenization methods.
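For readers new to the first topic, here is a rough illustration (not part of the workshop call) of how classic BPE learns its merge rules: repeatedly count adjacent symbol pairs and merge the most frequent one. This is a minimal sketch under stated assumptions: words are already split on whitespace, symbols start as single characters, no end-of-word marker is used, and ties go to the first pair encountered. The function names `learn_bpe`, `get_pair_counts`, and `merge_pair` are hypothetical.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs, weighted by word frequency.

    `words` maps a tuple of symbols (initially characters) to its count.
    """
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of words.

    Hypothetical helper for illustration; assumes whitespace-split input
    and no end-of-word marker.
    """
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:  # every word is already a single symbol
            break
        best = max(pairs, key=pairs.get)  # first most-frequent pair wins ties
        merges.append(best)
        words = merge_pair(words, best)
    return merges, words


merges, vocab = learn_bpe(["low", "low", "lower", "newest", "newest", "widest"], 3)
print(merges)
```

Applying the learned merges in order to new text yields its subword tokens. Production tokenizers (byte-level BPE, WordPiece, UnigramLM) add details such as byte fallback and different scoring rules that this sketch leaves out.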
very cool new colm workshop on tokenization just dropped!
The Second Tokenization Workshop (TokShop) at COLM 2026 aims to bring together researchers and practitioners from all corners of machine learning to explore tokenization in its broadest sense.
https://t.co/frZXHC6mOa
hiring a growth engineer at @nomic_ai
the job: build agentic systems that get us in front of every built environment company in the U.S. (~20k companies). orchestrate agents to automate non-spammy outbound, ad campaigns, linkedin touchpoints, events, seo — all wired together.
i've been building our internal gtm system from scratch myself for the last 6 months, with very impressive results. time to scale!
this isn't a marketing role. it's an engineering role where your measured output is qualified customer calls and sign-ups.
if you like low-latency feedback loops between prompt and customer demand surges, this might be the role for you.
link: https://t.co/mr8rxxkrqm
Spotlighting our benchmark for agentic search, DETOUR, which was accepted to ACL 2026 🎊!
When people try to recall something in conversation, they rarely give a perfect query upfront. They say things like “that movie with the scene where…” or “the paper about…” and the assistant has to ask the right follow-up questions to get there.
Existing search and agent benchmarks often miss this multi-turn, tip-of-the-tongue behavior. To more realistically evaluate it, we introduce DETOUR: Dual-agent based Evaluation Through Obscure Under-specified Retrieval, an interactive benchmark for dual-agent search and reasoning.
DETOUR contains 1,011 prompts across text, image, audio, and video. In the benchmark, a Primary Agent is evaluated on its ability to identify a target entity by querying a consistent Memory Agent, testing whether models can resolve ambiguity through useful follow-up questions.
Current state-of-the-art models still struggle: performance reaches only 36% accuracy across all modalities, showing that today’s agents remain weak at clarification-seeking in underspecified, real-world search settings.
We hope DETOUR helps push the next generation of search agents toward better reasoning, better questions, and more robust multi-turn retrieval.
arXiv Paper: https://t.co/obnKSnjgF0
@getdarshan @anandnk24 @rebeccatqian
@f_ili_p_ziva Because our users don't get to read our codebase. Also, docs that only document your code are crap.
The real advantage is that the agent can see code + marketing copy + relevant internal Slack conversations, to make sure the docs actually short-circuit the path to getting users to value.
What's the most useful agent that you have running on autopilot?
Mine is a product documentation housekeeper. Every day at 9am, it looks at all the codebase changes from the last 24 hours and identifies gaps in the product docs. It flags them in Slack and proposes a fix.