Always wanted to manage your knowledge base in Git but didn't know how to store your diagrams on GitHub? Use @excalidraw! Export your diagram as a PNG with the "embed scene" option enabled, install the Excalidraw extension for VS Code, then edit the diagram visually and commit it like code!
Big news! The platform we've been working so hard on is officially live. Check it out and discover a new world of possibilities. Join us on this journey!
Here comes the big announcement I've been holding inside for months! 👾
For those building AI agents: we've increasingly seen that everyone eventually hits a wall where the computational cost of ensuring reliability becomes too high.
Then comes the inevitable and yet impossible choice between reliability and manageable costs.
But I'm excited to announce that this state of affairs changes today!!!
After **so much** demand from the Parlant community for a solution to the accuracy/cost trade-off, we're excited to launch Emcie!
Emcie is our long-awaited, automatic SLM (Small Language Model) distillation platform for Parlant.
It makes it possible to run the famously reliable Parlant agents with high accuracy at minimal cost.
For those who've been struggling to balance agent reliability with operational costs, combining Parlant with Emcie could be your solution.
To try out our new models, sign up at https://t.co/UoJKnIvWH0 - and let me know what you think!
For anyone working with LLMs these days: here's a link to a free webinar my company is hosting about reliable, production-grade agent architecture.
We’ll go over real-world patterns and insights from the field. Highly recommended!
https://t.co/cEMw0hbZn0
🔥 Love this take — SLMs aren’t just about shrinking models, they’re about rethinking architecture. Many narrow specialists > one big generalist. Smart, scalable, and future-proof.
Small language models (SLMs) are definitely the way to go. But what many are missing is that SLMs can't handle *nearly* the diversity and complexity that larger models can.
Our 2 full-time NLP researchers are currently working day and night on getting SLMs aligned with larger models. SFT, RLVR, hybrid approaches - what have you... (P.S. stay tuned for upcoming announcements).
Here's what we've found.
⚠️ Small models fail miserably at broad, fuzzy tasks - especially those with diverse inputs. They do quite well, however, at narrow, specific ones.
➡️ A 14b model can be great at one specific thing: "Tell me whether the following observation is true with respect to the current state of this conversation..."
➡️ A 7b model will need further breakdown - "Given ***this particular category*** of observations that you're trained on, does this observation hold in this conversation?"
➡️ A 3b model will need even more categories.
💡 The lower you go in terms of model size, the more SLMs you need to manage and route between, to replace your LLM setup.
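To make the routing point concrete, here's a minimal Python sketch (not Parlant's or Emcie's actual implementation): a hypothetical `SpecialistRouter` maps a narrow (task, category) pair to the SLM trained for it, falls back to a broader task-level SLM, and only then to a large model. All model names are made-up placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class SpecialistRouter:
    """Routes a narrow (task, category) pair to the SLM trained for it.

    All model names are hypothetical placeholders.
    """

    specialists: dict[tuple[str, str], str] = field(default_factory=dict)
    fallback: str = "large-generalist-llm"  # hypothetical LLM used when no SLM fits

    def register(self, task: str, category: str, model: str) -> None:
        # category="*" means "any category" for this task (a broader, larger SLM).
        self.specialists[(task, category)] = model

    def route(self, task: str, category: str) -> str:
        # Narrowest match first, then a task-wide specialist, then the big model.
        return (
            self.specialists.get((task, category))
            or self.specialists.get((task, "*"))
            or self.fallback
        )


router = SpecialistRouter()
router.register("observation_check", "*", "slm-14b-observations")
router.register("observation_check", "billing", "slm-3b-billing-observations")

print(router.route("observation_check", "billing"))   # slm-3b-billing-observations
print(router.route("observation_check", "shipping"))  # slm-14b-observations
print(router.route("tool_call", "billing"))           # large-generalist-llm
```

The smaller the models you use, the more `(task, category)` entries you'd end up registering, which is exactly the management overhead described above.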
Yet it's so worth it in terms of cost and latency. If you play your cards right, you can get a 10-20x cost reduction and a 2-10x latency improvement.
But here's the key point. What I'm saying here can *only be leveraged* if your agentic architecture lends itself to this type of task decomposition.
In Parlant, for example, we don't have one LLM request doing "aligned conversation" - unlike ~99% of conversational AI (CAI) solutions.
We have six specialized categories just for guideline matching, different categories of tool calls, different ways to compose an aligned response message, and a separate stage of selecting a canned response (when they're used).
Each is a narrow task. Each often uses a different model. Most importantly, each can be fine-tuned independently.
Now, suppose your agentic architecture treats the LLM as a magic black box that handles everything (e.g., general "planning" and "responding").
In that case, you won't be able to train SLMs, and you won't enjoy the cost and latency slashes that they can deliver.
So build pedantically, obsessing over the intricacies of each sub-task in your system. If a task is sufficiently well-defined, you'll be able to train an SLM for it. Otherwise, you'll stay locked into expensive models for years to come. ☹️
😳 This hits hard. Everyone’s obsessed with speed, but compliance isn’t something you can “ship later.” Love how you put it — it’s the new security. Build it in from day one or pay for it big time. 💸🔥
"A $1,000 fine for each violation of a customer-facing AI." California just gave users a private right of action, meaning your customers can now sue you if your AI deceived them.
Let's see... 80% accuracy on 1 million monthly conversations... I'm no arithmetic expert, but I believe this would expose an organization to up to
...wait for it...
*200 MILLION USD PER MONTH* in fines.
Yikes.
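The back-of-the-envelope math above, as a tiny Python sketch. It assumes, as the post does, that every inaccurate conversation counts as exactly one $1,000 violation; actual exposure would depend on how violations are really counted.

```python
def max_monthly_exposure(conversations: int, accuracy: float, fine_per_violation: int) -> int:
    # Assumption: every inaccurate conversation counts as exactly one violation.
    violations = round(conversations * (1 - accuracy))
    return violations * fine_per_violation


print(f"${max_monthly_exposure(1_000_000, 0.80, 1_000):,}")  # $200,000,000
```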
Yet most agentic solutions (especially those built in-house) still treat compliance as an afterthought. "Human reps aren't perfect either, so it's okay if the AI doesn't work," you say.
Well, practice saying it, because soon enough, you might need to say it in court. 🤦
Look, it's just like with security: you can't bolt compliance on at the end. You need to:
1. Seriously understand the implications
2. Carefully define the accepted range of mistakes (not just frequency, but their potential severity)
3. Logically demonstrate that your solution cannot go outside this range
4. Architect around this approach from day one
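One way to make step 2 concrete is to write the accepted range down as data, with a separate limit per severity level. The Python sketch below is purely illustrative: the `Severity` levels, example mistakes, and rates in `ERROR_BUDGET` are hypothetical. Checking observed rates like this is monitoring; it complements, but doesn't replace, the logical guarantee in step 3.

```python
from enum import Enum


class Severity(Enum):
    LOW = "low"        # e.g., slightly off-tone wording
    MEDIUM = "medium"  # e.g., incorrect but harmless information
    HIGH = "high"      # e.g., an unauthorized action or a deceptive claim


# Hypothetical accepted range: maximum tolerated mistake rate per severity level.
ERROR_BUDGET = {
    Severity.LOW: 0.05,
    Severity.MEDIUM: 0.005,
    Severity.HIGH: 0.0,  # zero tolerance for high-severity mistakes
}


def exceeded_budgets(mistakes: dict[Severity, int], total_conversations: int) -> list[Severity]:
    """Return the severity levels whose observed mistake rate exceeds the budget."""
    return [
        severity
        for severity, max_rate in ERROR_BUDGET.items()
        if mistakes.get(severity, 0) / total_conversations > max_rate
    ]


print(exceeded_budgets({Severity.LOW: 300, Severity.HIGH: 1}, 10_000))
```

With 300 low-severity mistakes and a single high-severity one in 10,000 conversations, only `Severity.HIGH` is flagged: the low-severity rate (3%) is within budget, but any high-severity mistake breaks a zero-tolerance limit.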
The good news is that Parlant will be there for you as soon as you're ready to come to terms with this reality - even if you're building in-house (long live open-source) :)
https://t.co/xCDJmaw2nL
@ymarcov Love this deep dive into real context engineering 👏 — not just prompting hype, but hard-earned lessons from building production-grade agentic systems. Transparency like this pushes the whole community forward. 🔥🤖
Spot on 👏 — “bad service is worse than no service” nails it. In AI and customer-facing systems, uptime means nothing without trust. Thanks for calling this out 💡
When I was a software architect at Microsoft Azure, SLAs were critical. Our team had to guarantee the number of nines our infrastructure would deliver in multiple aspects of service quality (e.g., 99.999% is "5 nines").
To put this in perspective, in terms of time, five nines means *up to* 315 seconds of experiencing service degradation per year.
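For reference, here's the nines arithmetic as a short Python sketch, assuming a 365-day year (no leap years):

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s; ignores leap years


def max_degradation_seconds(availability: float) -> float:
    """Yearly time budget for degraded service at a given availability level."""
    return SECONDS_PER_YEAR * (1 - availability)


for nines, availability in [(3, 0.999), (4, 0.9999), (5, 0.99999)]:
    print(f"{nines} nines: {max_degradation_seconds(availability):,.1f} s/year")
```

Five nines comes out to about 315 seconds, or roughly 5.3 minutes, per year.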
But somehow we've now normalized this debate on whether an 80% SLA is suddenly "good enough" for AI serving customers in high-involvement communications.
What gives?
If there's one main thing I've learned from working with large enterprises on customer-facing AI, it's this:
🔔 Bad service is objectively worse than no service.
➡️ Would you rather have your banking app go down for 30 seconds, or have it execute one unauthorized transaction?
➡️ Would you rather have your support chat be unavailable, or have it give one customer medically dangerous advice?
When your service is down, customers know something is wrong. They wait. They retry. They find alternatives. The failure mode is transparent.
When your service is bad, customers act on incorrect information. They follow advice that hurts them. They unwittingly trigger actions that get them or your business in trouble. And they're right to blame you for it. Who else would they blame?
So how come 20% bad service is acceptable while 0.01% no service isn't? 🤔
It's not... and saying it is just sets everyone up for failure. Spread the word!
https://t.co/CdH1UNa1dw
Absolutely spot on 👏 — accuracy matters, but trust and conversation flow are what truly drive impact. Love how you highlighted the human side of AI interactions — that’s where real value is built.
Here's a "LangChain vs LlamaIndex" comparison. Whose RAG is more accurate?
Guess what?
🔔 Ding-ding! RAG accuracy is not the main issue. Ding-ding again.
We're working with people who deploy agents handling hundreds of thousands to millions of customer interactions per month.
Retrieval accuracy alone doesn't create customer trust or engagement. It's a component. And here's the other component nobody talks about.
If the conversation isn't managed and steered confidently and authoritatively, in a way that creates trust for your customer:
1. They won't trust your answers even if they're accurate
2. They often won't even get to their important questions before escalating the chat to a human rep
If you want great production metrics, sure, go ahead and spend half of your time getting your RAG accurate.
But to make a real difference, spend the other half optimizing domain-specific conversational steering and governance, focused on understanding your customers' needs and interaction patterns.
https://t.co/n1bHyqxX5k
"Let X." When you take this first step in a mathematical proposition, you've created something with infinite potential. X could be anything.
At the same time, until you start adding constraints ("such that X is a natural number," "X such that some condition holds"), while it holds infinite potential, there's actually nothing you can discover about it or do with it.
It's infinitely shallow.
I've spent my career building frameworks, from low-level network protocols, hard-realtime pub/sub, to cloud platforms at Microsoft, and this same principle applies to framework design.
The more assumptions your framework can make about its use case, the more powerful and optimized it becomes for that use case compared to generic solutions.
👉 Adding constraints and specializations isn't a curse. It's actually a blessing.
The only question that matters once you've understood that is: do your chosen constraints leave you with a real and significant use case?
With Parlant, for example, we've focused on the conversational, customer-facing use case. And we've discovered (and are continuing to discover) enormous complexity in the world of chaotic semantics, which is the world of conversations.
IMO, it's by far the solution most able to tame this chaos in today's market (though this is just the start).
I'm saying this because many people who love Parlant's performance in conversational control have recently asked us to apply it to the world of automation agents.
To that I say: it's a great idea, and I'd be happy to support anyone who goes down this path and share with them what we've learned about getting LLMs under control.
But I also know this: the world of conversational semantics is extremely complex. Getting it under the necessary level of control is the mission of a dedicated company, not a side hustle. If you're in this market - mark these words :)
So we'll continue to delve deeper into the significance of controlling AI conversations at the largest scales, because the deeper we dive, the more complexity and power we discover within it.
Someone should do the same for workflow automation agents, since there still isn't a single framework I know of that's really taking control and reliability seriously enough in that area.
20 years ago I wrote my first web MVC app framework from scratch. But there were already dozens around, and they all did the exact same thing. The main difference was that each one thought a particular set of functions (which they all implemented the same way) sounded cooler with this or that name or naming convention.
And I can't help but notice that what's happening with AI agent frameworks today is just like what happened then. A new framework comes up with the value prop of "a slightly more aesthetic API" without tackling any real technical problem.
Is that what makes a framework valuable..?
Or is it that it comes packed with 100 integrations for external libraries (vector DBs, LLM APIs, etc.), while, incidentally, each of these "integrations" is a limited abstraction over those libraries' powerful APIs: APIs their designers put a lot of thought into so they could solve many problems and edge cases?
There's a difference between "simplifying the problem" and being overly simplistic due to lack of experience and understanding. The latter is performed by the inexperienced, but the former can only be done with deep expertise and understanding. "To simplify" something complex — properly — is really, really hard.
In software design, if we start from (superficial) aesthetics rather than from a real problem, we end up with an API that breaks the minute you deviate one step from the "getting started" tutorial. Seen it too many times.
Here's what actually matters when evaluating a framework:
// Purpose - what important challenges does this address that I can't easily solve myself?
// Design - does it make the hard things possible, or does it just make the easy things slightly easier?
https://t.co/XeMW7CTjwO
After recent calls with large-scale users of Parlant (and some new leads who are exploring it after running into issues with their customer-facing agents), I keep seeing how much of it comes down to the Supervisor Pattern... This is why I'm writing about it so much.
But recently, an idea for a new agent architecture came up that shows a lot of promise.
The initial problem is this: when you genuinely have complex, distinct departments in your AI support system, trying to cram everything into one omni-agent becomes unmanageable. It pretends to be one agent while fragmenting the conversation and mixing up contexts.
Don't get me wrong... An omni-agent is a really cool moonshot concept. But it (unfortunately) fails due to hard technical limitations.
But there's another option: Just build it like real customer service.
You call support. A receptionist briefly clarifies what you need, then routes you *once* to the right department. Then an expert in that department handles your *entire* conversation.
I call it the "Receptionist Pattern" and it's how agentic architectures *should* handle complex support use cases.
The great thing about it is that it's so grounded in reality that:
1. Customers already understand it, so there's a natural alignment of expectation vs reality in their usage patterns. The friction is minimal.
2. Operators/developers already understand it, which makes it so much easier to model agentic flows (and even development teams) around it.
In practice:
- Simple routing at the entry point (not mid-conversation)
- Each department is a separate expert agent with full context (using department-specific dynamic context assembly)
- One coherent conversation per department
- If you need a different department, you need an explicit transfer (just like real service)
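Here's a minimal Python sketch of the Receptionist Pattern's control flow, under heavy simplifying assumptions: `DepartmentAgent` and `ReceptionistSession` are hypothetical names, the keyword match in `classify` stands in for a real clarification step, and `handle` just echoes instead of calling a model. What it shows is the shape: route once at the entry point, keep the whole conversation with one expert, and require an explicit `transfer` to switch departments.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DepartmentAgent:
    """Hypothetical expert agent that owns one department's entire conversation."""

    name: str
    transcript: list[str] = field(default_factory=list)

    def handle(self, message: str) -> str:
        # Full history stays with this agent; a real one would also assemble
        # department-specific context before calling its model.
        self.transcript.append(message)
        return f"[{self.name}] {message}"


@dataclass
class ReceptionistSession:
    departments: dict[str, DepartmentAgent]
    current: Optional[DepartmentAgent] = None

    def classify(self, message: str) -> Optional[str]:
        # Placeholder keyword match standing in for a brief clarification step.
        for name in self.departments:
            if name in message.lower():
                return name
        return None

    def send(self, message: str) -> str:
        if self.current is None:
            # Route once, at the entry point only.
            department = self.classify(message)
            if department is None:
                return "[receptionist] Which department can I connect you to?"
            self.current = self.departments[department]
        # No mid-conversation re-routing: the chosen expert keeps the conversation.
        return self.current.handle(message)

    def transfer(self, department: str) -> None:
        # Switching departments requires an explicit transfer, as in real service.
        self.current = self.departments[department]


session = ReceptionistSession(
    {"billing": DepartmentAgent("billing"), "returns": DepartmentAgent("returns")}
)
print(session.send("Hi, I need help"))
print(session.send("I have a billing question"))
print(session.send("Also, what about returns?"))  # stays with billing
session.transfer("returns")
print(session.send("I'd like to return a lamp"))
```

The third message mentions returns but stays with billing; only the explicit transfer moves the conversation.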
The alignment between expectation (which is based on existing habits) and reality (your product's UX) is what makes your AI feel natural and usable.
Not the sci-fi features. Not sub-second latencies. Just providing better, quicker, and more manageable customer service.
99% of conversational agents today are built the wrong way. 🤷
Everyone reaches for the supervisor pattern: with a router at the top and specialized agents below (returns agent, billing agent, warranty agent).
It looks clean and modular, but it breaks in production.
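@BenCaspit - Where does it say the apartment was bought in cash ("banknotes")? It says the amount was transferred in full, i.e., without a mortgage or lien
- According to what's claimed in the article, he legally changed his name
For the avoidance of doubt, I'm not a Netanyahu fan, to put it mildly... but I am a fan of accuracy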
Dear @realDonaldTrump - In WW2, the Allies refused to bomb the railway tracks that transported 6M Jews to their tragic deaths. Please don't make the same mistake now! - Join the fight, bomb all nuclear sites and promise #NO_NUKES_TO_IRAN #NEVER_AGAIN