Observe 2026 is a wrap.
Yesterday we shared what’s next for Arize AX and our vision for the AI factory for self-improving agents.
The focus: helping teams turn production behavior into a repeatable loop for finding issues, investigating root cause, testing fixes, and improving agents.
Phoenix just hit 10,000 GitHub stars!
Three years ago, Phoenix didn't exist. Arize was a closed-source company. A small team was asked to change that.
Catch the full interview with the team that made it happen, and hear where AI observability is going next: https://t.co/5IdL4Ew0Gc
Congratulations to our cofounder @aparnadhinak on being named one of the Top 100 Women in AI.
A lot of the hardest work in AI right now starts after the demo works: evals, observability, tracing, debugging, and figuring out how to make production systems improve over time.
Aparna has been pushing the industry toward that reality for years.
Well-deserved recognition.
https://t.co/0C2duM6Bev
Thank you to everyone who starred the repo, opened an issue, contributed code, challenged a design decision, or helped another engineer debug a production system.
10,000 stars on @github is a milestone that's possible because of you.
But the real story is the community that helped define AI observability as the industry moved from notebooks to agents.
Our open-source observability platform, Arize Phoenix, just crossed 10,000 stars on @github. ✨
That number belongs to the people who tested it, broke it, filed issues, opened PRs, asked better questions, and helped turn AI observability into an engineering workflow.
📖 Read the full story.
The blog gets into how a small team built Phoenix “backwards”: features before infrastructure, support handled by maintainers, and roadmap signal coming directly from GitHub, Slack, and users.
https://t.co/RLQmDND8d3
Building with @CoinbaseDevs and @Vercel at the "Ship the Agent Stack" hackathon today.
Cross-company teams are combining Vercel agentic infrastructure, x402, Coinbase Developer Platform, and Arize AX to build a service where agents are the customers!
We also added agent fleet observability and voice agent support.
That means visibility into managed AI workers across status, activity, trajectories, token usage, and cost, plus observability, replay, and evaluation for voice conversations. https://t.co/Kj6GIl5nB4
Signal, one of our out-of-the-box managed agents (you can also build your own), continuously reviews production traces, identifies emerging failure patterns, and groups related issues into investigation reports.
Instead of starting with a blank trace, teams get the issue, evidence, root cause, impact, and suggested next steps.
Harness-as-a-Judge helps evals keep up with production.
Agentic judges can inspect traces, identify emerging failure modes, and create reusable labels for monitoring, evaluation, and experimentation. https://t.co/R2ZYioqNoL
Full-agent experimentation moves testing beyond prompts.
Teams can compare complete agent behavior across runs, including tool use, retrieval quality, latency, trajectories, and eval results. https://t.co/lN0MbAufWw
Agent orchestration lets teams run long-running, repo-aware agents across engineering workflows.
These agents can inspect traces, analyze code, create evals, propose fixes, and generate investigation artifacts for engineers to review. https://t.co/PksqicXWuM