@Bramvandenakke1 @bookingcom @Inn_Somnia
Building #AI agents sounds exciting. Running them in production is where things get serious ⚙️
@_rchaves_, Co-Founder of @LangWatchAI, dives into agent simulation testing, evaluation and LLMOps in practice
👀 Real AI. No fluff
Day 4 of LangWatch Skills Week:
Since the end of last year, we've been seeing more and more AI enablement teams consolidate Agent Development Lifecycle tooling scattered across different teams, from homegrown evals to basic DB logging. As agent quality becomes a top priority, they now need a single solution that works for every team.
Maybe you had Langfuse for tracing and DeepEval for some local evaluators, but domain experts and PMs still aren't collaborating. As a dev, you still have to solve everything yourself, and you never have time to add proper evals or agent tests because they always get pushed to "the next sprint".
So we made a video on how to cut this consolidation time to essentially zero. Thanks to Skills, your coding assistant can now organize all your agent development tools, so you get best practices implemented and a single, collaborative platform for all your AI teams.
Day 2 of LangWatch Skills Week
Setting up evaluations in 1 minute and 38 seconds
In this video I use the LangWatch Skill to build a multimodal evaluation for the InField Agent, an agriculture tool that analyzes satellite images. No existing dataset, no previous eval setup.
Watch this video on how to use LangWatch Skills to migrate from regular logging tools to a complete AgentOps platform where engineers and PMs can collaborate.
https://t.co/sAsfyiNu7Y
Day 2 of LangWatch Skills Week. Today: evaluations.
We're showing how to add a multimodal evaluation to an agent that analyzes images, with a single request to Claude Code.
The skill reads your project and does that work for you.
Get started: → https://t.co/ZTXUvs6J00
We just launched LangWatch Skills. Tell your coding agent to instrument, observe, and fix your agent.
It does the setup. You just ask. → https://t.co/kzi6RlVaHW
Quick video explainer: https://t.co/lCIvl4WVLH
Builder Hour: Token Factory — All Around Agents
Feb 18 · 18:00 CET / 9:00 AM PST
Join Nebius for a live session with:
- New @nebiustf models + UI updates
- @LangWatchAI on testing agents in prod
- Live demos with Claude Code
- Open builder Q&A
(Thread 1/2)
Nebius Token Factory Builder Hour #2 is here.
🗓️ Feb 18, 2026 (Wed): 9am PST / 12:00 EST / 18:00 CET
Grab your favorite drink, bring your laptop, and join us for an interactive session where we connect, learn, and build together.
In this Builder Hour:
- What's new in Token Factory
- Partner chat with @_rchaves_, Co-Founder & CTO at @LangWatchAI: live demo on how to test agents and models in pre-production and production
- Builder Chat: Using Open models with @claude_code (Live demo!)
All registrants will receive the recording, session notes, and credits.
👉 See you there: https://t.co/Z4A2k07uSL
@nebiusai @nebiustf
@openclaw took the AI world by storm last week 🦞
We ran a hackday at LangWatch and now have Clawdbots living in our Slack, boosting eng productivity by checking logs and errors and reviewing PRs, all on our own infra. But… what is Clawdbot actually doing? Any risky business? 👀
We need observability. Until this weekend, OpenClaw had none. The OSS momentum has been 🔥 On Sunday, @LangWatchAI and @RedHat independently started adding OTEL to OpenClaw, then quickly teamed up to collaborate instead of competing. Goal: a fully OTEL-instrumented OpenClaw, compliant with the OTEL GenAI spec. Work's ongoing, but you can already use it today 👇
https://t.co/T8efbX5ATB
Proud to announce Evals & Experimentation V3 🚀
Evals and agent testing are still the hardest (and most important) part of building LLM apps.
But the real challenge isn't running evals; it's making them usable across the team.
With Evals & Experimentation V3, we focused on that:
• Run experiments where your data lives, in a single spreadsheet-like workbench
• See inputs, outputs, expected answers, metrics, latency, and cost side by side
• Iterate faster on prompts and models and immediately compare results
• Add evaluators in seconds (goldens, LLM judges, comparisons, policy checks)
• Understand the why, not just the score, with evaluator explanations
• One flow for PMs and engineers, UI + SDK, fully connected
• Inspect real executions end-to-end by jumping straight into traces
• Compare runs over time and share results with stakeholders
Everything is coming together.
Read our first and last Monthly Drop of the year at the link below!
On to a very happy & successful 2026, so many exciting things coming up… 👀
https://t.co/W3FRJPQ0U9