Special thanks to all the incredible open source projects that made the new Cline possible: Vercel AI SDK and Models.dev for our inference routing, Pi for inspiring how we think about plugins, OpenTUI for our snappy new CLI interface, Zed's ACP for IDE interoperability, and so many others. Read more about how we built the new Cline SDK and CLI in our blog:
https://t.co/Ol6TUJaWUH
@badlogicgames You can self-host https://t.co/pAUziZCIwp
It supports codex, claude code, pi, opencode, and cline already (including automatic uploads via plugins) and has a basic secret-scanner too. Maybe this could announce public transcripts somewhere so you can still own them?
Introducing Cline Kanban: A standalone app for CLI-agnostic multi-agent orchestration. Claude and Codex compatible.
npm i -g cline
Tasks run in worktrees, click to review diffs, & link cards together to create dependency chains that complete large amounts of work autonomously.
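A minimal sketch of how linked cards could be turned into an execution order, assuming each card simply lists the cards it depends on. The card names and the `run_order` helper are hypothetical illustrations, not Cline Kanban's actual data model or scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical cards: each key maps to the set of cards it depends on.
cards = {
    "write-tests": {"add-endpoint"},
    "add-endpoint": {"design-schema"},
    "update-docs": {"add-endpoint"},
    "design-schema": set(),
}

def run_order(deps):
    """Return batches of cards; every card in a batch has all dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the links form a loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # cards that can run in parallel now
        batches.append(ready)
        sorter.done(*ready)  # mark the batch complete to unlock dependents
    return batches

batches = run_order(cards)
```

Cards in the same batch have no dependency on each other, so in this sketch they could be handed to separate agents at the same time.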
Releasing Cline CLI 2.0!
Part of the Cline open-source project trusted by 5M+ developers.
Redesigned terminal UI, parallel agents, ACP integration (use Cline in @zeddotdev, @Neovim, and @emacs), and free @Kimi_Moonshot Kimi K2.5!
Read what’s new 👇
Coding agents struggle on complex work in large messy repos, and this won't get better until we stop using saturated benchmarks with tests that look nothing like real engineering.
That's why we’re committing $1M to cline-bench, our open benchmark for real world coding tasks!
We are announcing cline-bench, a real world open source benchmark for agentic coding.
cline-bench is built from real world engineering tasks from participating developers where frontier models failed and humans had to step in.
Each accepted task becomes a fully reproducible RL environment with a starting repo snapshot, a real prompt, and ground truth tests from the code that ultimately shipped.
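As a rough illustration of the three pieces named above (starting repo snapshot, real prompt, ground truth tests), here is a minimal sketch of how one task could be represented and scored. The `TaskEnvironment` fields, the example values, and the all-or-nothing `reward` rule are assumptions for illustration, not the cline-bench schema or scoring method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaskEnvironment:
    repo_snapshot: str                    # assumed: commit hash of the starting repo state
    prompt: str                           # the prompt the developer originally gave the agent
    ground_truth_tests: tuple[str, ...]   # assumed: test IDs from the code that shipped

def reward(task: TaskEnvironment, passed: set[str]) -> float:
    """Assumed binary reward: 1.0 only if every ground truth test passed."""
    if not task.ground_truth_tests:
        return 0.0  # a task with no tests cannot be verified
    return 1.0 if all(t in passed for t in task.ground_truth_tests) else 0.0

# Hypothetical example values.
task = TaskEnvironment(
    repo_snapshot="abc123",
    prompt="Fix the race condition in the upload queue",
    ground_truth_tests=("test_queue_order", "test_queue_retry"),
)
partial = reward(task, {"test_queue_order"})
full = reward(task, {"test_queue_order", "test_queue_retry"})
```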
For labs and researchers, this means:
> you can eval models on genuine engineering work, not leetcode puzzles.
> you get environments compatible with Harbor and modern eval tooling for side by side comparison.
> you can use the same tasks for SFT and RL so training and evaluation stay grounded in real engineering workflows.
Today we are opening contributions and starting to collect tasks through the Cline Provider. Participation is optional and limited to open source repos.
When a hard task stumps a model and you intervene, that failure can be turned into a standardized environment that the entire community can study, benchmark, and train on.
If you work on difficult open source problems, especially commercial OSS, I would like to personally invite you to help. We're committing $1M to sponsor open source maintainers to take part in the cline-bench initiative.
"Cline-bench is a great example of how open, real-world benchmarks can move the whole ecosystem forward. High-quality, verified coding tasks grounded in actual developer workflows are exactly what we need to meaningfully measure frontier models, uncover failure modes, and push the state of the art."
– @shyamalanadkat, Head of Applied Evals @OpenAI
"Nous Research is focused on training and proliferating models that excel at real world tasks. cline-bench will be an integral tool in our efforts to maximize the performance and understand the capabilities of our models."
– @Teknium, Head of Post Training @nousresearch
"We are huge fans of everything Cline has been doing to empower the open source AI ecosystem, and are incredibly excited to support the cline-bench release. High-quality open environments for agentic coding are exceedingly rare. This release will go a long way both as an evaluation of capabilities and as a post-training testbed for challenging real-world tasks, advancing our collective understanding and capabilities around autonomous software development."
– @willccbb, Research Lead @PrimeIntellect
"We share Cline's commitment to open source and believe making this benchmark available to all will help us continue to push the frontier coding capabilities of our LLMs."
– @b_roziere, Research Scientist @MistralAI
Full details are in the blog:
https://t.co/hjUkkSefuz
Over the past 6 months, we've rearchitected @cline's agentic loop into a standalone "cline core" gRPC service that runs independently of any editor.
This enabled us to decouple from VS Code and build:
- JetBrains (released in GA this week)
- CLI built in Go (releasing soon)
- Secret project that may be announced soon
The CLI is our newest product and will ship without a TUI.
Our focus is to release a true primitive: something close to the metal that pipes cleanly into RL environments, CI/CD systems, scripts, and automation workflows.
Any "presentation layer" mentioned above can connect to the same running cline core - maintaining full feature parity (e.g. checkpoints, settings, api configurations), context, and conversation state.
There's an SQLite-based instance and file/directory lock registry that prevents port conflicts and coordinates graceful shutdowns between paired processes - so you can start up thousands of cline instances in parallel with no conflicts or dangling processes.
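A minimal sketch of the general idea behind a SQLite-backed registry, assuming one row per running instance with the port as primary key so that two processes can never claim the same port. The table layout, port range, and function names are hypothetical; this is not Cline's actual registry and it omits the file/directory locks and shutdown coordination.

```python
import sqlite3

def open_registry(path=":memory:"):
    """Open (or create) the registry; a real setup would use a shared file path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS instances ("
        " port INTEGER PRIMARY KEY,"   # uniqueness constraint prevents port conflicts
        " pid INTEGER NOT NULL)"
    )
    conn.commit()
    return conn

def claim_port(conn, pid, start=50000, end=50100):
    """Claim the first unregistered port in [start, end) for this process."""
    for port in range(start, end):
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO instances (port, pid) VALUES (?, ?)", (port, pid)
                )
            return port
        except sqlite3.IntegrityError:
            continue  # port already claimed by another instance, try the next one
    raise RuntimeError("no free port in range")

def release(conn, pid):
    """Remove a process's rows on shutdown so its ports can be reused."""
    with conn:
        conn.execute("DELETE FROM instances WHERE pid = ?", (pid,))

conn = open_registry()
first = claim_port(conn, pid=101)
second = claim_port(conn, pid=102)
release(conn, pid=101)
third = claim_port(conn, pid=103)  # reuses the port freed by pid 101
```

Letting the database enforce uniqueness means the claim is a single atomic insert, rather than a check followed by a write that another process could race.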
What's remarkable is that this cline core architecture opens Cline to any interface imaginable: mobile apps, web dashboards, custom tools - all powered by the same intelligent core that understands your codebase.
I’m excited to announce a $32M raise for @cline (Seed + A), led by @emergencecap & @PaceCap. Cline started as a hackathon project a year ago, and is now a community of 2.7M developers who value the power and transparency that come from bold decisions in how we build the product:
In our internal "Hard" diff editing benchmark for cases where a frontier model previously failed a diff edit (prior to our diff algorithm updates), Kimi surpassed Claude 3.5.
Will be interesting to see the results from our "Nightmare Difficulty" benchmarks in the next few weeks.
New on the Anthropic Engineering blog: how we built Claude’s research capabilities using multiple agents working in parallel.
We share what worked, what didn't, and the engineering challenges along the way.
https://t.co/k3Gzd4HkLg
We just released Amp Tab, our new completion engine.
It's based on a custom model and can complete not just single lines, but entire code blocks.
Right next to your cursor or a Tab away.
trying out Claude 4 Sonnet + @sqs' new @AmpCode and.... i think i just felt the agi
this was the result of "turn my scripts into a multitenant @railway app with billing" 🤯
We're releasing v0's AI model:
• Specialized web-dev knowledge
• OpenAI-compatible API
• Use in Cursor, Codex, or your own app
Now in beta in the API, AI SDK, or AI Playground.
When building Amp, "best" wins out over "cheaper" in every decision we make. People are noticing:
"Amp is truly amazing. ... Every other agentic tool fails miserably here."
"~order of magnitude better" than Claude Code
"insane tool"
"Im trying to figure out how to just direct deposit my paychecks into amp"