We're launching vibe training.
Describe what your agent should and shouldn't do.
We generate the edge cases, build the test set, train a model calibrated to your policies.
In minutes.
Within hours of our launch, thousands of agent builders were already live with vibe training.
Today we're on Product Hunt. An upvote takes 30 seconds.
👉 https://t.co/0DSKjrXv7s
Yesterday blew past every expectation.
I barely slept (2 hours, if I’m honest)… and now we’re heading straight into our #ProductHunt launch and I need you! 🚀
Because something clicked.
We launched vibe training - and within hours, thousands of agent builders started creating evals and guardrails for their own use cases!
It’s moving fast.
Because the truth is simple:
Building agents is easy.
Making them reliable in production is not.
That’s what vibe training fixes.
If you’ve been following, building with us, or just rooting from the sidelines — we need your support ❤️
• Open the link
• Hit upvote
• Drop a quick comment
This takes 30 seconds and directly impacts our ranking.
Let’s push this to the top today
https://t.co/Np5GbG2xWm
Air Canada’s chatbot once made up its own refund policy. When the customer took the airline to a tribunal, the customer won, not the airline.
There’s a new term being coined right now, vibe training, by the company @pluraiAI. They’ve built a way to use tiny, fast models as guardrails that catch hallucinations in under 100 ms, at a cost more than 8x lower than GPT-5-mini.
🔥👉 They’re live on Product Hunt today: https://t.co/oCrSIlcaMH
If you’re building agents, go check them out, grab the free trial, and show them some love on the launch! 🫶
The best part? You don’t need a PhD in AI. Sponsored by Plurai.
This team just coined the concept of vibe training.
Build real-time, tailored evals and guardrails for your agent, with high accuracy at a fraction of the LLM cost.
Launching today on @ProductHunt.
Today we're launching vibe training.
Describe what your agent should and shouldn't do.
We generate the edge cases, build the test set, train a model calibrated to your policies.
In minutes.
Start Vibe-training: https://t.co/PQ5AT3X3x3
Vibe train your AI agents.
This new method can replace LLM-as-a-judge for production agents.
Most teams point a giant LLM at their agent's output and call it evaluation. It works, but it comes with two real costs:
- It's slow and expensive at inference time
- It misses the domain-specific failures that actually matter to your use case
Vibe training flips the whole setup.
Researchers at Plurai distill a small language model that's specialized for your agent's exact behavior, your edge cases, and your failure modes. The SLM becomes your evaluator and your runtime guardrail in one.
Here's why this is a big deal:
- Cheap enough to run inline on every agent step, not just offline batches
- Catches the failures that generic LLM judges shrug off
- Same model guards production and grades it, so eval and runtime stay in sync
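To make the "inline on every step" and "same model guards and grades" points concrete, here is a minimal sketch. It is not Plurai's API: `toy_guardrail` is a hypothetical rule-based stand-in for a small specialized model, and `Verdict`, `run_agent`, and `score_transcript` are names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Verdict:
    allowed: bool
    reason: str


def toy_guardrail(step_output: str) -> Verdict:
    """Hypothetical stand-in for a small specialized model (assumption)."""
    banned = ("refund guaranteed", "legal advice")
    for phrase in banned:
        if phrase in step_output.lower():
            return Verdict(False, f"policy violation: {phrase!r}")
    return Verdict(True, "ok")


def run_agent(steps: Iterable[str], guard: Callable[[str], Verdict]) -> List[str]:
    """Runtime use: check every step inline, stop at the first blocked one."""
    accepted: List[str] = []
    for output in steps:
        verdict = guard(output)
        if not verdict.allowed:
            accepted.append(f"[blocked] {verdict.reason}")
            break
        accepted.append(output)
    return accepted


def score_transcript(steps: Iterable[str], guard: Callable[[str], Verdict]) -> float:
    """Eval use: the same guard grades a logged transcript offline."""
    logged = list(steps)
    if not logged:
        return 1.0
    return sum(guard(s).allowed for s in logged) / len(logged)


transcript = ["Looking up your order.", "Refund guaranteed within 24h!", "Done."]
print(run_agent(transcript, toy_guardrail))
print(score_transcript(transcript, toy_guardrail))
```

Because both functions take the same `guard` callable, a change to the policy shows up in the runtime check and the offline score together.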
A small specialized model beating a giant general one is becoming a pattern. Distillation is quietly turning into one of the most underrated techniques for shipping reliable agents.
Try it here: https://t.co/KKzTfveJ26
Paper: https://t.co/GcIm0PKlQr
Vibe train your AI agents.
There's a new method that could replace LLM-as-a-judge for production agents.
Most teams rely on a giant LLM as a judge to evaluate and guard their agent. But it has two major drawbacks:
- It's slow and expensive at inference time
- It often misses domain-specific failures
Vibe training flips this.
Researchers at Plurai distill a small language model that's specialized for your agent's exact use case. The SLM becomes your evaluator and your runtime guardrail, both in one.
The training data isn't hand-curated either.
They spin up a swarm of adversarial agents that debate and stress-test every use case your agent is supposed to handle. That synthetic interaction data trains the specialized SLM.
So the judge actually understands what "wrong" looks like in your specific domain.
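The post does not say how the debate is resolved, so the sketch below swaps in a much simpler stand-in: several labelers vote on each unlabeled example, and the example is kept only when enough of them agree. The three toy labelers, `debate_label`, `build_training_set`, and the `min_agreement` threshold are all assumptions made for illustration, not the method from the paper.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Labeler = Callable[[str], str]


def debate_label(example: str, labelers: Sequence[Labeler],
                 min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority label, or None if agreement is too low."""
    if not labelers:
        raise ValueError("at least one labeler is required")
    votes = Counter(labeler(example) for labeler in labelers)
    label, count = votes.most_common(1)[0]
    if count / len(labelers) >= min_agreement:
        return label
    return None  # contested example: leave it out of the training set


def build_training_set(examples: Sequence[str], labelers: Sequence[Labeler],
                       min_agreement: float = 0.6) -> List[Tuple[str, str]]:
    """Turn unlabeled examples into (text, label) pairs, dropping ties."""
    labeled = []
    for ex in examples:
        label = debate_label(ex, labelers, min_agreement)
        if label is not None:
            labeled.append((ex, label))
    return labeled


# Hypothetical toy labelers standing in for the adversarial agents.
def strict(text: str) -> str:
    return "violation" if "refund" in text.lower() else "ok"


def lenient(text: str) -> str:
    return "violation" if "guaranteed refund" in text.lower() else "ok"


def keyword(text: str) -> str:
    return "violation" if "guarantee" in text.lower() else "ok"


examples = [
    "I can offer a guaranteed refund.",
    "Your order shipped today.",
    "Refund requests go through support.",
]
print(build_training_set(examples, [strict, lenient, keyword]))
```

Raising `min_agreement` to 1.0 keeps only unanimous labels, which trades training set size for label quality.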
The reported gains vs. standard LLM-as-a-judge setups:
- ~8x faster inference
- ~50% fewer evaluation errors
Smaller, faster, and more accurate because it's specialized for the job. The SLM-for-agents thesis is playing out in a very concrete way.
If LLM-as-a-judge is your current evaluation layer, this is worth benchmarking against.
Paper link in the replies.
THIS IS REVOLUTIONARY
Building agents from now on is going to be a totally different thing.
much more reliable
much faster
much cheaper
and as far as I know, it is currently free.
they train a small language model in a super sophisticated way: agents debate each unlabeled example to settle on the best label, which turns unlabeled data into labeled training data.
it is brilliant...
I used to pay for the most expensive AI models just to double-check my own agents.
It felt like a "safety tax" I had to pay, but it was killing my margins and making everything feel slow. I was basically paying twice for the same result.
Plurai finally fixed this. Instead of a giant model, you train a tiny one that only cares about your specific rules.
You just type what you want in plain English, and it builds a custom safety net in minutes. It runs instantly and costs almost nothing.
This is how you actually move from a prototype to something that works at scale.
Check it out:
I've made a ton of money helping companies implement LLM-as-a-judge evaluations.
LLM Judges provide a ton of value.
But the hard part is choosing the model to implement the judge.
• The family of GPT-5 models is very good, but slow and expensive.
• Models like Gemma and Phi are fast and cheap, but not that good.
Most of the time, you can only run a percentage of your traffic through the model (otherwise it would be too expensive and slow).
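Running the judge on only a percentage of traffic is usually done with some form of sampling. A minimal sketch, assuming each request has a string ID: hash the ID into a bucket in [0, 1) and judge the request only when the bucket falls below the sample rate. `should_judge` and its parameters are hypothetical names for this illustration.

```python
import hashlib


def should_judge(request_id: str, sample_rate: float) -> bool:
    """Deterministically select roughly `sample_rate` of requests."""
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a float in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Judge about 10% of traffic; the same ID always gets the same decision.
judged = sum(should_judge(f"req-{i}", 0.10) for i in range(10_000))
print(f"judged {judged} of 10000 requests")
```

Hashing instead of calling a random number generator keeps the decision stable across retries and services, so a given request is either always judged or never judged.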
But now, there's a better strategy.