What are you up to on June 4? Come hang with us and...
Uber. OpenAI. DeepMind. Cursor. Anthropic. Factory. WorkOS. Glean. CrewAI. Mastra. Daytona. Anyscale. TripAdvisor. Upstart. OpenClaw.
June 4, SF: https://t.co/RxipeJgRKy
@NousResearch Desktop agents unlock a new class of context: local files, system state, real-time input. Richer inputs, harder-to-reproduce failures. The eval loop matters more, not less. Congrats @NousResearch. Big move!
Microsoft picked OpenInference. Twice.
The open trust stack for AI agents announced at #MSBuild has two parts: ASSERT for evaluation and ACS for controls. Both ride on the open tracing standard Arize built for agents.
https://t.co/DKnceQpcIK
Teams building agents need trust mechanisms that work across frameworks instead of one-off controls hidden in prompts or app code.
@Microsoft announced Arize as a partner for ASSERT and Agent Control Specification, an open path from policy-driven evals to runtime controls to production observability.
With ACS events in Arize alongside OpenInference traces, teams can connect every block, approval, and state transition back to the agent behavior that produced it.
Learn more: https://t.co/WQpa2zadEn
At Microsoft Build? Our two must-dos for today:
1. Catch Sarah Bird's session, "Observe and control agents across any framework with open source tools," at 2:30pm
https://t.co/JdeZj1rEhC
2. Head to the Microsoft AI expert booth between 11:30am and 3pm to meet @jimbobbennett from our DevRel team and talk AI observability and evals
#MSBuild
⚖️ One AI Question with Tyler Niederwerder
We asked our Corporate Counsel: How should you use AI in legal?
His answer: Lawyers, stay sharp.
AI is great for speed, but human-in-the-loop is mandatory. Always verify AI output to protect privilege, maintain professional standards, and ensure 100% accuracy. The stakes in legal are too high for anything less.
#LegalTech #AI #LawyerLife #HumanInTheLoop
A fireside chat *and* a talk from George Zhang of @openclaw. Happening at Observe in 2 days.
Grab your tickets.
June 4th.
Arize Observe.
https://t.co/RxipeJgRKy
Sessions, compression, permissions, and tool exposure are runtime infrastructure in Hermes.
Read the full architecture breakdown:
https://t.co/hRk3x4eomc
Our cofounder @aparnadhinak did a full architectural teardown of the open source agent harness Hermes from @NousResearch.
One architectural decision stood out: Hermes preserves lineage across compression boundaries.
When a session compresses, Hermes closes the current session, creates a child session seeded by the summary, rotates the session ID, and records parent-child lineage.
Long-running agents keep a traceable history of state transitions instead of repeatedly rewriting the same transcript.
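A minimal sketch of the pattern described above, to make the lineage idea concrete. This is not Hermes code: `Session`, `SessionStore`, `compress`, and `lineage` are hypothetical names chosen for illustration, and the summarizer is a stand-in for whatever compression step a real harness uses.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Session:
    # Hypothetical record; a real harness would persist this somewhere durable.
    session_id: str
    parent_id: str | None = None
    messages: list[str] = field(default_factory=list)
    closed: bool = False


class SessionStore:
    """Illustrative in-memory store that keeps parent-child lineage."""

    def __init__(self) -> None:
        self.sessions: dict[str, Session] = {}

    def open(self, parent_id: str | None = None, seed: str | None = None) -> Session:
        # Every session gets a fresh ID, so compressing rotates the ID.
        session = Session(session_id=uuid.uuid4().hex, parent_id=parent_id)
        if seed is not None:
            session.messages.append(seed)
        self.sessions[session.session_id] = session
        return session

    def compress(self, session: Session, summarize: Callable[[list[str]], str]) -> Session:
        # Close the current session without touching its transcript,
        # then start a child seeded by the summary and linked to the parent.
        summary = summarize(session.messages)
        session.closed = True
        return self.open(parent_id=session.session_id, seed=summary)

    def lineage(self, session_id: str) -> list[str]:
        # Walk parent links back to the root; return IDs oldest first.
        chain: list[str] = []
        current: str | None = session_id
        while current is not None:
            chain.append(current)
            current = self.sessions[current].parent_id
        return chain[::-1]
```

Because the parent transcript is closed rather than rewritten, any child session can be traced back through `lineage` to the full history that produced its summary.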
Our latest blog walks through all of this with comparisons across LangSmith, Langfuse, Braintrust, Comet Opik, Phoenix, and Arize AX.
If you’re choosing an eval harness for production AI, this is the framework to start with. https://t.co/QDB0UShjGG
New post: how to choose an eval harness for production AI from @seldo. 👀
The key idea: your eval infrastructure should survive the parts of your AI stack that are guaranteed to change.
We break down the production requirements:
- evaluate spans, traces, trajectories, and sessions
- run the same evaluators in notebooks, CI, and production
- monitor live traffic for regressions
- route failures to review
- turn production failures into regression tests
- reuse the same instrumentation across the lifecycle
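Two of the requirements above (running the same evaluators everywhere, and turning production failures into regression tests) can be sketched in a few lines. This is an illustration under stated assumptions, not the API of any tool named in the post: `Record`, `non_empty_answer`, `evaluate`, and `RegressionSuite` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    # Hypothetical shape for one logged interaction.
    input: str
    output: str


# An evaluator is a plain function, so the same object can be called
# from a notebook, a CI job, or a production monitor.
Evaluator = Callable[[Record], bool]


def non_empty_answer(record: Record) -> bool:
    """Toy evaluator: passes when the output is not blank."""
    return bool(record.output.strip())


def evaluate(records: Iterable[Record], evaluator: Evaluator) -> List[Record]:
    """Return the records that fail the evaluator."""
    return [r for r in records if not evaluator(r)]


class RegressionSuite:
    """Collects failing production inputs and replays them later."""

    def __init__(self) -> None:
        self.cases: List[str] = []

    def add_failures(self, failures: Iterable[Record]) -> None:
        # Keep each failing input once, in first-seen order.
        for record in failures:
            if record.input not in self.cases:
                self.cases.append(record.input)

    def run(self, app: Callable[[str], str], evaluator: Evaluator) -> List[Record]:
        # Re-run the app on every saved input with the same evaluator.
        replayed = [Record(case, app(case)) for case in self.cases]
        return evaluate(replayed, evaluator)
```

In this sketch, failures found on live traffic feed `add_failures`, and a CI job calls `run` with the candidate app and the same evaluator, so a fix is only accepted when the saved cases pass.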
Will you be at Microsoft Build this week, either in person in SF or virtually?
Our very own @jimbobbennett will be giving a demo session on understanding and fixing agents with open source observability and evals, Wednesday 3:30pm, Theater C.
#MSBuild
https://t.co/MdCTSfx9oZ
Most agent projects don’t fail because the AI isn’t ready.
They fail because only a handful of people are allowed to build them.
The companies seeing real agent adoption are taking a different approach: let the people closest to the work automate it, with IT and infosec in full control of governance and guardrails.
@joaomdmoura, CEO + Founder of @crewAIInc, will be talking about how companies can truly adopt agents at scale at Observe this Thursday.
Grab your ticket.
June 4th.
Arize Observe.
https://t.co/NYEih97lij
Coming up against the same problems over and over? Tried a bunch of experiments to get unstuck but still not where you want to be? This year at Observe, @HamelHusain will be hosting two 1-hour Office Hours to work through challenges in what you're building.
Grab your tickets for Observe now and don't miss out on this special session. https://t.co/NYEih97lij