Ben Rycroft

4 days ago

JPMorgan priced out frontier intelligence. Claude Opus 4.8 costs $3,700 to run a standard benchmark. It scores 56. DeepSeek V4 Pro (Max) scores 44 on the same test. It costs $186. That's 79% of the performance for about 5% of the price. Almost every top tier open model falls inside the high value/green zone out of the box. But it gets better. Add in fine-tuning. Train an open model on your specific workflow. The intelligence gap closes. You get the same performance on your task at 5-10x less cost. Ramp is the latest company to prove it. They took a 35-billion-parameter open model, trained it on their own data, and beat Anthropic Opus 4.6 at their specific task. Cursor and Intercom published similar results weeks ago. A model trained on your data matches or beats frontier behemoths. Your own data buys you the moat. 🔹 A fraction of the cost 🔹 Speed you can ship on 🔹 A model you own, not rent If you're looking to go from generic open source model to expert at your specific workflow, check out adaption.

$BenRyc's tweet photo. JPMorgan priced out frontier intelligence. Claude Opus 4.8 costs $3,700 to run a standard benchmark. It scores 56. DeepSeek V4 Pro (Max) scores 44 on the same test. It costs $186. That's 79% of the performance for about 5% of the price. Almost every top tier open model falls inside the high value/green zone out of the box. But it gets better. Add in fine-tuning. Train an open model on your specific workflow. The intelligence gap closes. You get the same performance on your task at 5-10x less cost. Ramp is the latest company to prove it. They took a 35-billion-parameter open model, trained it on their own data, and beat Anthropic Opus 4.6 at their specific task. Cursor and Intercom published similar results weeks ago. A model trained on your data matches or beats frontier behemoths. Your own data buys you the moat. 🔹 A fraction of the cost 🔹 Speed you can ship on 🔹 A model you own, not rent If you're looking to go from generic open source model to expert at your specific workflow, check out adaption.$

BenRyc retweeted

Sara Hooker

@sarahookr

10 days ago

I'll be in New York next week 🔥 We are hiring for many roles across @adaption_ai research. We will be hosting 1:1 coffees with some of our research staff next Friday.

123

11K

BenRyc retweeted

19 days ago

Introducing Tiny AutoScientist. Built on AutoScientist's research loop, Tiny AutoScientist helps organizations get more performance from models under 10B parameters. Supersized intelligence. Tiny models.

196

137

12K

Comic nerd, wrestling enthusiast, pop culture encyclopedia. wannabe filmmaker. #BornHeel

about 1 month ago

@cursor_ai just dropped Composer 2.5. Matches Opus 4.7 on coding benchmarks BUT it's 10x cheaper. 🔹 79.8% on SWE-Bench Multilingual (Opus: 80.5%) 🔹 63.2% on CursorBench v3.1 (Opus max: 64.8%) 🔹 $2.50 per million output tokens (Opus $25) Cursor isn't a frontier lab. So how did they pull it off? They didn't train from scratch. They used an open-weight model - Kimi K2.5. Then they poured everything into post-training. 🔴 85% of the total compute for Composer 2.5 went into post-training and RL. Open base models are now a commodity. Post-training is where you build the moat. Same playbook as Intercom's Fin Apex. Same playbook as Decagon. Open base model. Post-trained on domain-specific data. Cursor trained on 25x more synthetic coding tasks than Composer 2 - if you're short on data, synthetic data delivers. Any company can run this playbook now. You don't need ML PhDs in-house. Companies are using post-training to: 🔹 8-10x cheaper 🔹 Build a real moat 🔹 Keep their data private The frontier isn't pre-training anymore. It's your domain-specific post-training data.

BenRyc's tweet photo. @cursor_ai just dropped Composer 2.5. Matches Opus 4.7 on coding benchmarks BUT it's 10x cheaper.

🔹 79.8% on SWE-Bench Multilingual (Opus: 80.5%)
🔹 63.2% on CursorBench v3.1 (Opus max: 64.8%)
🔹 $2.50 per million output tokens (Opus $25)

Cursor isn't a frontier lab. So how did they pull it off?

They didn't train from scratch. They used an open-weight model - Kimi K2.5. Then they poured everything into post-training.

🔴 85% of the total compute for Composer 2.5 went into post-training and RL.

Open base models are now a commodity. Post-training is where you build the moat.

Same playbook as Intercom's Fin Apex. Same playbook as Decagon. Open base model. Post-trained on domain-specific data. Cursor trained on 25x more synthetic coding tasks than Composer 2 - if you're short on data, synthetic data delivers.

Any company can run this playbook now. You don't need ML PhDs in-house.

Companies are using post-training to:
🔹 8-10x cheaper
🔹 Build a real moat
🔹 Keep their data private

The frontier isn't pre-training anymore.

It's your domain-specific post-training data.

Who to follow

Tony

@Twest4202000

Jace Lash

@jace_lash

Ice Fx™ Makeup , Scary Mary Makeup for #Halloween #FXmakeup SFX pros IG: @jace_lash Fitness • Les Mills Combat

BenRyc retweeted

about 1 month ago

For the next 30 days, compute is on us. Try AutoScientist. Set your outcome, co-optimize, and deploy with 78% relative improvement.

adaption_ai's tweet photo. For the next 30 days, compute is on us.

Try AutoScientist. Set your outcome, co-optimize, and deploy with 78% relative improvement. https://t.co/bRXb6i0fEr

about 1 month ago

GPT-5.5 launched at double the price of GPT-5.4. Uber burned their entire 2026 AI budget by April. The cost of running frontier models at scale is stacking up. When you build your stack around a closed model you pay whatever the vendor decides, with no ownership and no exit ramp. Many teams already know fine-tuned open-source is the answer. Smaller, faster, cheaper per inference, fully private. But it's been a headache to get there. Most fine-tuning runs fail, and the ones that work need ML expertise most teams don't have. AutoScientist solves this. It co-optimises your data and training recipe automatically. Runs the full loop end-to-end. Idea to owned fine-tuned model in an afternoon. 35% better than human-configured training across 8 verticals. Free for 30 days. Fine-tune your own model for free (next 30 days): https://t.co/29GY8lYwXa

BenRyc's tweet photo. GPT-5.5 launched at double the price of GPT-5.4. Uber burned their entire 2026 AI budget by April. The cost of running frontier models at scale is stacking up.

When you build your stack around a closed model you pay whatever the vendor decides, with no ownership and no exit ramp.

Many teams already know fine-tuned open-source is the answer. Smaller, faster, cheaper per inference, fully private. But it's been a headache to get there.

Most fine-tuning runs fail, and the ones that work need ML expertise most teams don't have.

AutoScientist solves this.

It co-optimises your data and training recipe automatically.

Runs the full loop end-to-end. Idea to owned fine-tuned model in an afternoon.

35% better than human-configured training across 8 verticals.

Free for 30 days.

Fine-tune your own model for free (next 30 days): https://t.co/29GY8lYwXa

about 1 month ago

@carlfranzen @VentureBeat Hey Carl, sure happy lets DM

about 2 months ago

AutoScientist launched today. Now anyone can train an AI model. Fine-tuning used to live inside frontier labs. Tribal knowledge. PhDs. Weeks of sweeps. That's gone. Here's what the loop looks like now: 🔹 Raw data in: support tickets, transcripts, PDFs, whatever you've got sitting in production 🔹 Adaptive Data transforms it into structured, fine-tuning-ready datasets 🔹 AutoScientist runs the full research loop: optimizing data and training recipes until the model converges on your goal 🔹 Fine-tuned model out Hours. Not weeks. The results surprised us: 🔹 35% average improvement over our own AI research staff configuring the same runs 🔹 Win rates from 48% → 64% against the base model 🔹 Consistent gains across all 8 verticals we tested Prompt engineering was always a workaround. You were contorting requests to fit a model built for the average use case, because shaping the model was too much of a lift. Now it isn't. Your data. Your model. Your moat. Link below. 4h • Edited •

about 2 months ago

Introducing AutoScientist. Most model training fails outside of frontier labs. AutoScientist automates the full research loop so it doesn't have to.

838

110

717

204K

719

about 2 months ago

Every week we talk to companies battling AI API bills. Uber already blew their annual AI budget. They all want the same thing. 🔷 A model that's an expert in their domain. 🔷 Runs 10x cheaper than a frontier API. 🔷 Carries their brand voice. 🔷 Fully private. Fine-tuning is the path. But most projects stall before training even starts. The data isn't ready. Too thin. Too noisy. Too narrow to move the model. And the process is cumbersome. Today we're making it a lot easier to fix. Together AI fine-tuning is now live inside the Adaption platform. Shape your data. Train your model. One workflow. Training cycles drop from weeks to days. This is a big step toward making fine-tuning accessible to every team, not just the ones with frontier-lab resources. More announcements coming soon.

about 2 months ago

We believe that intelligence should not arrive preconfigured. @togethercompute is now available directly inside the Adaption platform, connecting Adaptive Data with large-scale training in a single workflow. One platform, end to end. Stop inheriting intelligence. Shape it.

127

26K

162

BenRyc retweeted

Sakana AI

@SakanaAILabs

2 months ago

What if instead of building one giant AI, we evolved a coordinator to orchestrate a diverse team of specialized AIs? 🐟 Excited to share our new paper: “TRINITY: An Evolved LLM Coordinator”, published as a conference paper at #ICLR2026! Paper: https://t.co/Pr1TT15iqa In nature, complex problems are rarely solved by a single monolithic entity, but rather by the coordinated efforts of specialized individuals working together. Yet, modern AI development is heavily focused on endlessly scaling up single, massive monolithic models, yielding diminishing returns. While model merging offers a way to combine different skills, it is often impractical due to mismatched neural architectures and the closed-source nature of top-performing models. To address this, we took a macro-level approach: test-time model composition. We introduce TRINITY, a system that fuses the complementary strengths of diverse, state-of-the-art models without needing to modify their underlying weights. TRINITY processes queries over multiple turns. At each step, a lightweight coordinator assigns one of three distinct roles to an LLM from its available pool: 1/ Thinker: Devises high-level strategies and analyzes the current state. 2/ Worker: Executes concrete problem-solving steps. 3/ Verifier: Evaluates if the current solution is complete and correct. By dynamically assigning these roles, the coordinator effectively offloads complex reasoning and skill execution onto the external models. What makes TRINITY unique is its extreme efficiency. The coordinator relies on the hidden states of a compact language model and a small routing head. In total, it has fewer than 20K learnable parameters. Training this system presented a massive challenge. Traditional Reinforcement Learning (REINFORCE) failed because the gradients had a low signal-to-noise ratio due to binary rewards and weak parameter coupling. Imitation learning (Supervised Fine-Tuning) was ruled out because generating multi-turn labels is prohibitively expensive. Our solution? We turned to nature-inspired algorithms. We optimized the coordinator using a derivative-free evolutionary algorithm. We found that evolution is uniquely suited to optimize this tight, high-dimensional coordination problem where traditional gradient-based methods fail. The results are very promising. In our experiments, TRINITY consistently outperforms existing multi-agent methods and individual models across various benchmarks. At the time of publication, it set a new state-of-the-art record on LiveCodeBench, achieving an 86.2% pass@1 score. More importantly, it demonstrated incredible generalization. Without any retraining, TRINITY transferred zero-shot to four unseen tasks (AIME, BigCodeBench, MT-Bench, and GPQA). On average, the evolved coordinator surpassed every individual constituent model in its pool, including GPT-5, Gemini 2.5-Pro, and Claude-4-Sonnet (the top frontier models available at the time of our #ICLR2026 submission last year). This work is central to Sakana AI's vision. We believe the future of AI isn't just about scaling monolithic models, but engineering collaborative, diverse AI ecosystems that can adapt and combine their strengths. We invite the community to read the paper and explore these ideas! Paper: https://t.co/Pr1TT15iqa OpenReview: https://t.co/mbSP1aFCv7 This foundational research is part of the core engine powering our multi-agent product: Sakana Fugu 🐡👇

SakanaAILabs's tweet photo. What if instead of building one giant AI, we evolved a coordinator to orchestrate a diverse team of specialized AIs? 🐟

Excited to share our new paper: “TRINITY: An Evolved LLM Coordinator”, published as a conference paper at #ICLR2026!

Paper: https://t.co/Pr1TT15iqa

In nature, complex problems are rarely solved by a single monolithic entity, but rather by the coordinated efforts of specialized individuals working together. Yet, modern AI development is heavily focused on endlessly scaling up single, massive monolithic models, yielding diminishing returns. While model merging offers a way to combine different skills, it is often impractical due to mismatched neural architectures and the closed-source nature of top-performing models.

To address this, we took a macro-level approach: test-time model composition. We introduce TRINITY, a system that fuses the complementary strengths of diverse, state-of-the-art models without needing to modify their underlying weights.

TRINITY processes queries over multiple turns. At each step, a lightweight coordinator assigns one of three distinct roles to an LLM from its available pool:

1/ Thinker: Devises high-level strategies and analyzes the current state.
2/ Worker: Executes concrete problem-solving steps.
3/ Verifier: Evaluates if the current solution is complete and correct.

By dynamically assigning these roles, the coordinator effectively offloads complex reasoning and skill execution onto the external models.

What makes TRINITY unique is its extreme efficiency. The coordinator relies on the hidden states of a compact language model and a small routing head. In total, it has fewer than 20K learnable parameters.

Training this system presented a massive challenge. Traditional Reinforcement Learning (REINFORCE) failed because the gradients had a low signal-to-noise ratio due to binary rewards and weak parameter coupling. Imitation learning (Supervised Fine-Tuning) was ruled out because generating multi-turn labels is prohibitively expensive.

Our solution? We turned to nature-inspired algorithms. We optimized the coordinator using a derivative-free evolutionary algorithm. We found that evolution is uniquely suited to optimize this tight, high-dimensional coordination problem where traditional gradient-based methods fail.

The results are very promising. In our experiments, TRINITY consistently outperforms existing multi-agent methods and individual models across various benchmarks. At the time of publication, it set a new state-of-the-art record on LiveCodeBench, achieving an 86.2% pass@1 score.

More importantly, it demonstrated incredible generalization. Without any retraining, TRINITY transferred zero-shot to four unseen tasks (AIME, BigCodeBench, MT-Bench, and GPQA). On average, the evolved coordinator surpassed every individual constituent model in its pool, including GPT-5, Gemini 2.5-Pro, and Claude-4-Sonnet (the top frontier models available at the time of our #ICLR2026 submission last year).

This work is central to Sakana AI's vision. We believe the future of AI isn't just about scaling monolithic models, but engineering collaborative, diverse AI ecosystems that can adapt and combine their strengths.

We invite the community to read the paper and explore these ideas!

Paper: https://t.co/Pr1TT15iqa
OpenReview: https://t.co/mbSP1aFCv7

This foundational research is part of the core engine powering our multi-agent product: Sakana Fugu 🐡👇

404

240

101K

BenRyc retweeted

Sara Hooker

@sarahookr

2 months ago

What is crazy is we adapt data so fast you can still join the competition today, think about a meaningful long-tail problem to work on and be in a position to win by this Friday. So join us. :) Big shoutout to @sudip_r0y for building incredibly fast infrastructure.

BenRyc retweeted

2 months ago

Only 10% of your data speaks AI. The other 90% is unstructured — invisible to the models making decisions about your business. All the progress in AI so far? Built on a fraction of what's actually out there. Today we're launching Forge, a feature of Adaptive Data, to extend adaptive intelligence to the 90%.

16K

2 months ago

We're watching an entire discipline emerge (Context Engineering) because we haven't figured out how to make AI learn continuously. Kudos to Malika Aubakirova for the comparison to the Christopher Nolan classic Memento in this @a16z piece. We're duct-taping context windows and RAG pipelines together to keep our models from forgetting what happened one prompt ago. If context engineering has you doing backflips and you're looking to train models that become native experts at your use case, get in touch.

Sudip Roy

@sudip_r0y

2 months ago

Static models won’t win against dynamic environments. Adaptable AI will. Systems that learn during use, not after the fact, are the ones that scale. Everything else falls behind. @a16z mapped who's building on that. @adaption is in it.

sudip_r0y's tweet photo. Static models won’t win against dynamic environments.
Adaptable AI will.

Systems that learn during use, not after the fact, are the ones that scale.

Everything else falls behind.

@a16z mapped who's building on that. @adaption is in it. https://t.co/6pBXwwWl6y

2 months ago

Demis Hassabis spoke at City Arts & Lectures in SF Monday night. And it won't make your news feed. Zero hot takes. In contrast, AI has been racking up negative headlines in the US. "It's coming for your job" "This is a terminator scenario" "The end of software engineers" And it's sticking. NBC ran a widely cited poll that found AI has a net favorability of -20. For context, below ICE and just above the Iranian government. People have real reasons to be wary: 🔷 Data centers increasing electricity prices 🔷 Job displacement 🔷 Deepfakes These concerns are legitimate. But caution shouldn't mean inaction. In China, 83% of people believe AI offers more benefits than drawbacks. In the US, it's 39%. In India and China, more than 80% of workers use AI regularly at work. In the US, closer to 50%. (Stanford HAI) These countries are building AI fluency into the next generation by default. China made AI a compulsory school subject in 2025. Kids as young as six are learning algorithmic thinking and robotics. India launched AI modules for grades 6-12. The goal: build the world's largest AI-ready student population. Singapore, South Korea, and the UK are moving in the same direction. In the US, only half of middle and high schools have any AI policy at all. The US still leads on building with AI. But the risk is the median person enters the job market without knowing how to use it well, while their peers abroad treat it like electricity. AI literacy isn't just about writing prompts. Knowing when to trust it. When to push back. When to reach for it. When to close the app. We're not teaching this well. And the negative narrative is putting people off. This is why we need more voices like Demis that can hold both things at once. As he says: The risks are real. There is a non-zero chance AI goes wrong. The upside is also real. A society with infinite resources. No diseases. The genie is not going back in the bottle. This generation will either shape how AI gets used or get shaped by how others use it. The difference is whether anyone brings them in and shows them how.

BenRyc's tweet photo. Demis Hassabis spoke at City Arts & Lectures in SF Monday night.

And it won't make your news feed.

Zero hot takes.

In contrast, AI has been racking up negative headlines in the US.

"It's coming for your job"
"This is a terminator scenario"
"The end of software engineers"

And it's sticking.

NBC ran a widely cited poll that found AI has a net favorability of -20. For context, below ICE and just above the Iranian government.

People have real reasons to be wary:
🔷 Data centers increasing electricity prices
🔷 Job displacement
🔷 Deepfakes

These concerns are legitimate.

But caution shouldn't mean inaction.

In China, 83% of people believe AI offers more benefits than drawbacks. In the US, it's 39%.

In India and China, more than 80% of workers use AI regularly at work. In the US, closer to 50%. (Stanford HAI)

These countries are building AI fluency into the next generation by default.

China made AI a compulsory school subject in 2025. Kids as young as six are learning algorithmic thinking and robotics. India launched AI modules for grades 6-12. The goal: build the world's largest AI-ready student population.

Singapore, South Korea, and the UK are moving in the same direction.

In the US, only half of middle and high schools have any AI policy at all.

The US still leads on building with AI.

But the risk is the median person enters the job market without knowing how to use it well, while their peers abroad treat it like electricity.

AI literacy isn't just about writing prompts. Knowing when to trust it. When to push back.

When to reach for it. When to close the app.

We're not teaching this well. And the negative narrative is putting people off.

This is why we need more voices like Demis that can hold both things at once.

As he says:

The risks are real. There is a non-zero chance AI goes wrong.

The upside is also real. A society with infinite resources. No diseases.

The genie is not going back in the bottle.

This generation will either shape how AI gets used or get shaped by how others use it.

The difference is whether anyone brings them in and shows them how.

2 months ago

You find a great dataset on Hugging Face. Now you can make it yours. Maybe you need it in 12 languages. Reshape it for a different task. 10x more volume and improve the quality. Add reasoning traces. This is what sits between your dataset and a production-ready model. Hugging Face is now available in Adaptive Data. Pull any dataset directly into a platform built to close that gap. 🔹 Improve data quality across the full set 🔹 Reshape for your specific task 🔹 Expand into 242 languages Go from raw data to training-ready. Without the manual grind. Link below.

2 months ago

The open-source AI community just got a new home for their data workflows. 🤗 @huggingface is now available in Adaptive Data. Pull datasets directly into a platform that evolves with the problems you're solving.

178

55K

703

3 months ago

Last week we jumped on a call with a team scaling into 6 markets. Their AI worked great. In English. But their users speak Brazilian Portuguese. German. Arabic. And they don't want a translated experience. They want a native one. Where the model doesn't fall back to English. Or output low quality results. Translation slaps new words on English-trained thinking. The grammar feels off. The tone is wrong. The output reads like it was written by picking words out of a dictionary. Not a native speaker. Users notice. And they leave. This is a training data problem. If your model only learned from English-heavy data, it thinks in English. Every other language gets a second-class output. Today we're launching Expand Your World in Adaptive Data. Adapt your training data into 242 languages. No translation wrapper on English output. Your model thinks natively in your users' language. Your AI feels local everywhere it ships. Full announcement below. Are you a startup building models that needs to work across languages? Adaption for Startups gives you a sponsored Plus Plan and credits to make your model a native polyglot from day one: https://t.co/wcadXgWACc

3 months ago

Most datasets reflect the world as it was convenient to capture, not as it actually exists. Introducing Expand Your World, a new feature in Adaptive Data. 242 languages and localizations. The fastest way to global coverage.

44K

3 months ago

@cursor_ai's Composer 2 is delivering performance close to frontier models. But is 3–10x cheaper when you factor in: 🔹Base cost 🔹 Token efficiency (trained to edit instead of rewriting entire files) 🔹 Subscription discounts They got there by excelling where general-purpose models fall short: 1. Deeply optimizing the model Closed APIs restrict you to prompts and light fine-tuning. Cursor used full-parameter training on an open base model. They updated the model on a massive code dataset. Transformed a general AI into a specialized coding system. 2. Optimizing for edits, not rewrites General models tend to rewrite entire files. Cursor is optimized for surgical diffs. That means fewer tokens, lower latency, and lower cost. 3. Architecture and inference improvements You can’t alter the structure of closed models. Cursor altered Kimi K2.5. They added things like Multi-Token Prediction to improve speed. 4. Stripping scope, not just cost Frontier models are expensive because they do everything. Cursor specialized heavily for coding workflows. Making it lighter, faster and cheaper to run. For everyday software engineering tasks, this is a big win. You don't need a frontier model at 10x the cost. You need a model that's fast and inexpensive.

BenRyc's tweet photo. @cursor_ai's Composer 2 is delivering performance close to frontier models.

But is 3–10x cheaper when you factor in:
🔹Base cost
🔹 Token efficiency (trained to edit instead of rewriting entire files)
🔹 Subscription discounts

They got there by excelling where general-purpose models fall short:

1. Deeply optimizing the model
Closed APIs restrict you to prompts and light fine-tuning.
Cursor used full-parameter training on an open base model.
They updated the model on a massive code dataset.
Transformed a general AI into a specialized coding system.

2. Optimizing for edits, not rewrites
General models tend to rewrite entire files.
Cursor is optimized for surgical diffs.
That means fewer tokens, lower latency, and lower cost.

3. Architecture and inference improvements
You can’t alter the structure of closed models.
Cursor altered Kimi K2.5.
They added things like Multi-Token Prediction to improve speed.

4. Stripping scope, not just cost
Frontier models are expensive because they do everything.
Cursor specialized heavily for coding workflows.
Making it lighter, faster and cheaper to run.

For everyday software engineering tasks, this is a big win.

You don't need a frontier model at 10x the cost.

You need a model that's fast and inexpensive.

BenRyc retweeted