You lose a lot of flexibility when a cloud agent is quarantined in its VM until you sync manually.
Jacq packages a toolbox binary that turns each device (phone, laptop, desktop) into an API, and the agent dynamically decides which connected device to execute tools on.
This means if my laptop goes offline, Jacq can autonomously decide to keep working on other things in the cloud until I'm back.
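Here's a rough sketch of that routing idea. It is not Jacq's actual implementation: the `Device` record, the tool names, and the `cloud-vm` fallback are all hypothetical, and a simple rule stands in for the choice the agent makes on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical record for one connected device (not Jacq's real schema)."""
    name: str
    online: bool
    tools: set[str] = field(default_factory=set)


def pick_device(tool: str, devices: list[Device], cloud: str = "cloud-vm") -> str:
    """Return the first online device exposing `tool`, else fall back to the cloud VM."""
    for device in devices:
        if device.online and tool in device.tools:
            return device.name
    return cloud


devices = [
    Device("laptop", online=False, tools={"run_tests", "read_file"}),
    Device("phone", online=True, tools={"read_notifications"}),
]

print(pick_device("read_notifications", devices))  # phone
print(pick_device("run_tests", devices))  # laptop is offline, so: cloud-vm
```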
Long term it's always better to give the model more choice.
A lot of coding products treat threads as disposable, and you end up having to get your agent up to speed with the same info over and over.
We trained a compaction model for Jacq that runs at 90k tok/s to make resuming with context from other threads instant.
For 18 months, Relace has been training small, specialized models that make autonomous coding products smooth.
Jacq is built on everything we've learned and showcases some new models we've worked on recently.
It uses a compaction model under the hood that runs at 90k tok/s, fast enough to condense a 250k+ token thread in 2-3s. This means you can stay on the same thread indefinitely, or instantly fork from the same starting point.
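The 2-3s figure follows directly from the two numbers quoted above. A quick check, assuming throughput stays constant over the whole thread:

```python
THROUGHPUT_TOK_PER_S = 90_000  # compaction speed quoted in the post
THREAD_TOKENS = 250_000        # thread size quoted in the post

seconds = THREAD_TOKENS / THROUGHPUT_TOK_PER_S
print(f"{seconds:.1f}s")  # 2.8s
```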
Autopilot mode is actually a small classifier model that identifies unsafe tool calls. The agent stops bothering you about routine stuff, and is forced to work around or ask you about things you might regret.
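A minimal sketch of how a gate like this could sit in front of tool calls. The keyword check below is only a hypothetical stand-in for the classifier model, and `gate`, `looks_unsafe`, and the marker list are illustrative names, not Jacq's API.

```python
from typing import Callable

# Hypothetical stand-in for the classifier: flags a few obviously destructive commands.
UNSAFE_MARKERS = ("rm -rf", "git push --force", "drop table")


def looks_unsafe(tool_call: str) -> bool:
    """Toy check: True if the call contains any known destructive marker."""
    lowered = tool_call.lower()
    return any(marker in lowered for marker in UNSAFE_MARKERS)


def gate(tool_call: str, classify: Callable[[str], bool] = looks_unsafe) -> str:
    """Return 'run' for routine calls and 'ask_user' for flagged ones."""
    return "ask_user" if classify(tool_call) else "run"


print(gate("ls src/"))            # run
print(gate("rm -rf ~/projects"))  # ask_user
```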
Small models are best when they feel invisible and just work. Let us know what you think, and DM us if you want API access to these models!
Today we're launching Jacq, a coding agent built together with the small models we've been training at @relace_ai for the past 1.5 years.
It runs entirely in the cloud, and decides when to pull context from any of your connected devices.
It uses all the software you already live in: Slack, Linear, GitHub, email, etc.
Plus, threads are now durable: a real record of how work happened. Just drag them into a new chat to get context for your next task.
Building a website used to take months. I created 5 in 10 seconds. Meet https://t.co/OoYdYSmx1q, an AI-native CMS where your website lives, collaborates, and evolves with AI.
Existing vibe coding tools are great for building apps, but websites have specific requirements. We’ve spent the last 10 years building Strapi, the most popular open-source Headless CMS. We know what is important for users: translate content, reuse assets, optimize SEO/GEO, update content, A/B test, personalize pages, schedule releases, review workflows, etc.
For months there’s been debate on RAG vs agentic search for codebase retrieval.
RAG is an independent, single-shot classifier of file relevance: each file is scored using only the user query and that file’s contents, with no inter-file reasoning or iteration.
Together, these two properties make RAG completely parallelizable. The entire codebase can be scored in ~1-2s if you scale compute enough.
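To make the independence point concrete, here's a toy sketch. The word-overlap `score` function is an assumption standing in for a real relevance model; the point is only that each file is scored from the query and its own contents, so the calls can fan out in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def score(query: str, contents: str) -> float:
    """Toy relevance score: fraction of query words found in the file."""
    words = query.lower().split()
    if not words:
        return 0.0
    text = contents.lower()
    return sum(word in text for word in words) / len(words)


def rank_files(query: str, files: dict[str, str]) -> list[tuple[str, float]]:
    # Each file is scored independently, so the calls can run in any order or in parallel.
    with ThreadPoolExecutor() as pool:
        scores = pool.map(lambda item: (item[0], score(query, item[1])), files.items())
        return sorted(scores, key=lambda pair: pair[1], reverse=True)


files = {
    "auth.py": "def login(user, password): ...",
    "billing.py": "def charge(card): ...",
}
print(rank_files("login password", files)[0][0])  # auth.py
```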
Agentic search breaks independence to improve accuracy. File relevance is refined iteratively through tool calls (e.g. grep, file views), where difficult cases are resolved with multi-step reasoning.
In early implementations, this reasoning was done serially, with each tool call gating the next, leading to end-to-end latencies over 100s.
Fast Agentic Search operates in a Goldilocks regime between RAG and serial agentic search. It preserves inter-file reasoning while parallelizing exploration, resulting in ~4× lower latency than serial agentic search with nearly equal accuracy.
You can easily integrate Relace Search as an explore subagent in your product with our MCP server: https://t.co/v7WR8ne28T
It uses our pre-defined agent harness and executes greps from Relace Search on your local file system in parallel.
Both Claude Code and Cursor already use a search subagent pattern:
- Enter explore mode
- Probe the codebase in parallel with the grep tool
- Filter and refine findings
- Return high-signal context for the main agent
You get wide coverage on the codebase without rotting the primary agent's context window.
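The four steps above can be sketched roughly like this. It is a toy over an in-memory dict of files, not the Relace Search harness: `grep` and `explore` are hypothetical helpers, and a simple hit count stands in for the model's filter-and-refine reasoning.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def grep(pattern: str, files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (filename, line) pairs whose line matches `pattern`."""
    regex = re.compile(pattern)
    return [
        (name, line.strip())
        for name, text in files.items()
        for line in text.splitlines()
        if regex.search(line)
    ]


def explore(patterns: list[str], files: dict[str, str], limit: int = 5) -> list[str]:
    # Probe: run every grep in parallel.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: grep(p, files), patterns))
    # Filter and refine: count how many patterns hit each file.
    hits: dict[str, int] = {}
    for matches in results:
        for name in {name for name, _ in matches}:
            hits[name] = hits.get(name, 0) + 1
    # Return: only the top file names go back to the main agent.
    return sorted(hits, key=lambda name: (-hits[name], name))[:limit]


files = {
    "auth.py": "def login(user):\n    return check_token(user)",
    "tokens.py": "def check_token(user):\n    return True",
    "README.md": "Project docs",
}
print(explore([r"login", r"check_token"], files))  # ['auth.py', 'tokens.py']
```

Only the short list of file names reaches the main agent, so the raw grep output never lands in its context window.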
Introducing Relace Search, a utility agent designed for fast codebase retrieval.
RAG is fast because it's fully parallelizable. Agentic search is accurate because of its multi-shot reasoning.
Relace Search does both. It makes 4-12 grep and file-view tool calls between reasoning blocks, matching Claude Sonnet 4.5 performance while running 4x faster.
Try it out today through our API or on OpenRouter!
It's the one-year anniversary of our Fast Apply model that got us past 1M ARR. Our blog post explains everything we did to make it SoTA and run at 10k tok/s.
Cool to see @relace_ai on @ProductHunt today!
Now this is all your coding agent has to do to preserve repo state:
client.git('./repo_path').add('.').commit('msg').push()
GitHub repos are for humans. Relace repos are for agents generating code.
Relace Repos is here! It's GitHub for agents with semantic search out-of-the-box.
Just use standard git operations without worrying about rate limits. After each push, we auto-index the updated files and use a reranker to make sure each search tool call is fresh.
Retrieval is just the start. We're deeply co-optimizing our new suite of small, utility agents with Repos.
Over the next few months we'll release models for end-to-end tasks like traversing codebases, automatically resolving merge conflicts, and refactoring.
It's going to be as easy as method calls like search(), merge(), and refactor().
Read more on our launch blog linked below.
Repos is also designed to integrate with the task-specific agentic models we’re training at Relace and plan to release in the coming months.
Utility functions like agentic search, automatic merge conflict resolution, and refactoring will be as easy as repo.search(), repo.merge(), and repo.refactor(). (4/4)
Stay tuned!
As the cost of producing code goes down, more and more of it will be generated.
All of this code must be stored in a way where it can be easily deployed and continually edited by AI agents.
Repos is the first in our line of products to co-optimize infrastructure with the models that will be the primary users of it. (1/4) 🧵
.@relace_ai has raised $23M to build the rails for AI code generation.
This round is led by @a16z, with participation from @matrixvc and @ycombinator.
LLMs have proven they can write code—but scaling that code into production still needs better infrastructure.
Relace is building exactly that: the infra layer where models and systems are co-optimized for code generation.
We’ve already shipped:
- The fastest apply model on OpenRouter (10k tok/s)
- State-of-the-art code reranking and embeddings models
These models have already processed tens of millions of requests from customers like Lovable, Magic Patterns, and Orchids.
Now, we’re taking it a step further: with Relace Repos, we’re working on a new source control system that’s built for the age of AI-generated code, with native retrieval and deep integration into our models.
If you're looking to build code generation into your product, please reach out!
As agents work for longer over bigger codebases, accurate retrieval is critical for reliability.
Repos has built-in semantic search that stays fresh even with high-frequency updates to the codebase.
We handle all the indexing behind the scenes and use two-stage retrieval to get SoTA accuracy at low latency per retrieval. (3/4)
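For readers new to two-stage retrieval, here's a toy sketch of the general pattern: a cheap first pass narrows the candidates, then a costlier reranker orders them. Both scoring functions are assumptions for illustration (word overlap and exact-phrase matching), not the embedding and reranking models Repos uses.

```python
def first_stage(query: str, docs: dict[str, str], k: int) -> list[str]:
    """Cheap recall step: keep the k docs sharing the most words with the query."""
    q = set(query.lower().split())
    overlap = {name: len(q & set(text.lower().split())) for name, text in docs.items()}
    return sorted(docs, key=lambda name: (-overlap[name], name))[:k]


def rerank(query: str, candidates: list[str], docs: dict[str, str]) -> list[str]:
    """Costlier precision step (toy): put docs containing the query as a phrase first."""
    phrase = query.lower()
    return sorted(candidates, key=lambda name: (phrase not in docs[name].lower(), name))


docs = {
    "a.py": "policy notes: when to retry",
    "b.py": "retry policy with backoff",
    "c.py": "render the homepage",
}
candidates = first_stage("retry policy", docs, k=2)
print(candidates)                                # ['a.py', 'b.py']
print(rerank("retry policy", candidates, docs))  # ['b.py', 'a.py']
```

The first stage ties `a.py` and `b.py` on word overlap; the reranker then promotes `b.py` because it is the only one containing the exact phrase.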