MLflow 3.12 deep dive clip: why coding agents need tracing 👇
Yuki Watanabe walks through what shows up in the trace when you turn it on:
🔹 Every turn, tool call (Read, Bash, Edit), and sub-agent step
🔹 Token usage and latency per span, including cache breakdown
🔹 Full sessions grouped together, so long conversations stay debuggable
🔹 MLflow 3.12 tracing for Claude Code, Codex, Gemini CLI, OpenCode, Qwen Code, and OpenHands
🎥 Full webinar: https://t.co/JRbV5uQ4X0
#MLflow #CodingAgents
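To make "token usage and latency per span" and "sessions grouped together" concrete, here is a toy, library-free model of coding-agent sessions as a flat list of spans rolled up per session. The `Span` fields and the `summarize_sessions` helper are assumptions chosen for illustration; this is not MLflow's span schema or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One step in an agent session (illustrative fields, NOT MLflow's schema)."""
    session_id: str
    name: str  # e.g. "turn", "Read", "Bash", "Edit", "sub-agent"
    latency_ms: float
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0  # portion of input_tokens assumed to come from a prompt cache


def summarize_sessions(spans: list[Span]) -> dict[str, dict[str, float]]:
    """Group spans by session and total span count, latency, and token usage."""
    totals: dict[str, dict[str, float]] = defaultdict(
        lambda: {"spans": 0, "latency_ms": 0.0, "input_tokens": 0,
                 "output_tokens": 0, "cached_tokens": 0}
    )
    for s in spans:
        t = totals[s.session_id]
        t["spans"] += 1
        t["latency_ms"] += s.latency_ms
        t["input_tokens"] += s.input_tokens
        t["output_tokens"] += s.output_tokens
        t["cached_tokens"] += s.cached_tokens
    return dict(totals)


spans = [
    Span("s1", "turn", 1200.0, input_tokens=900, output_tokens=150, cached_tokens=600),
    Span("s1", "Read", 40.0),
    Span("s1", "Edit", 15.0),
    Span("s2", "turn", 800.0, input_tokens=400, output_tokens=80),
]
summary = summarize_sessions(spans)
print(summary["s1"]["spans"], summary["s1"]["latency_ms"], summary["s1"]["cached_tokens"])  # 3 1255.0 600
```

Keeping spans flat with a `session_id` makes per-session rollups a single pass; a real tracer would also record parent/child links between spans.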
MLflow 3.13.0 is a major release for running AI observability at scale, focused on access control, trace data lifecycle management, and richer agent support. 👇
👉 Check out the release highlights: https://t.co/JNPgjyWXrU
#mlflow #opensource #linuxfoundation
Thousands of traces and no systematic way to spot bad agent runs? MLflow Automatic Issue Detection: choose CLEARS categories, run the analysis in three clicks, and triage issues in the UI.
👉 Learn more: https://t.co/a8gNos0vs7
#MLflow #LLMOps #GenAI
Trace + eval Genie in MLflow 👇
🔹 Full Genie pipeline
🔹 MLflow traces + judges
🔹 Tighten one pilot space first
👉 Read the cookbook: https://t.co/kd9fQxwSof
#MLflow #Genie
Vibe-checking works until it doesn't. Change one prompt, break three behaviors, and you can't tell whether you moved forward or backward.
Eval-driven development in MLflow 👇
1️⃣ Trace → mlflow.openai.autolog() + @mlflow.trace spans (latency, tokens, cost)
2️⃣ Evaluate + prompts → mlflow.genai.evaluate(), make_judge(), Prompt Registry, optimize_prompts (GEPA)
3️⃣ Prod → same judges on live traces; agent dashboards for cost/latency/quality
👉 Learn more: https://t.co/I4zS7unOrn
#MLflow #LLMOps #GenAI
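The "did I move forward or backward?" question comes down to running one fixed eval set against each prompt version and diffing the results. Below is a library-free toy of that loop, not MLflow's API: in MLflow, `mlflow.genai.evaluate()` with judges from `make_judge()` fills this role. `EvalCase`, `contains_expected`, `run_eval`, `compare_versions`, and the stubbed `agent_v1`/`agent_v2` are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """One fixed test input and the answer we expect (hypothetical schema)."""
    name: str
    question: str
    expected: str


# A scorer maps (output, expected) to pass/fail; a stand-in for an LLM judge.
Scorer = Callable[[str, str], bool]


def contains_expected(output: str, expected: str) -> bool:
    """Toy scorer: case-insensitive substring check."""
    return expected.lower() in output.lower()


def run_eval(agent: Callable[[str], str], cases: list[EvalCase], scorer: Scorer) -> dict[str, bool]:
    """Run every case through the agent and record pass/fail per case."""
    return {c.name: scorer(agent(c.question), c.expected) for c in cases}


def compare_versions(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Split cases into fixed vs. regressed, so a prompt change is judged on evidence."""
    return {
        "fixed": sorted(n for n in before if not before[n] and after[n]),
        "regressed": sorted(n for n in before if before[n] and not after[n]),
    }


# Two hypothetical prompt versions, stubbed as lookup functions.
def agent_v1(question: str) -> str:
    return {"capital of France?": "Paris", "2+2?": "4", "color of sky?": "green"}[question]


def agent_v2(question: str) -> str:
    return {"capital of France?": "Lyon", "2+2?": "4", "color of sky?": "blue"}[question]


cases = [
    EvalCase("geo", "capital of France?", "Paris"),
    EvalCase("math", "2+2?", "4"),
    EvalCase("sky", "color of sky?", "blue"),
]

report = compare_versions(
    run_eval(agent_v1, cases, contains_expected),
    run_eval(agent_v2, cases, contains_expected),
)
print(report)  # {'fixed': ['sky'], 'regressed': ['geo']}
```

The v2 prompt fixes one behavior and breaks another; a single aggregate pass rate (2/3 both times) would hide that, which is why the diff is reported per case.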
Claude Code can burn through dozens or hundreds of LLM calls in one session. With MLflow 3.12.0+, route it through AI Gateway with two env vars to get traces, budget alerts/limits, and guardrails. No SDK changes.
🛣️ Setup: mlflow server → Gateway endpoint → point ANTHROPIC_BASE_URL at the claude-code proxy. Run claude as usual.
Learn more 👉 https://t.co/2xSoVXuJZ2
#MLflow #AIGateway #ClaudeCode
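Since the routing is done purely through environment variables, one option is to set them only for the launched process instead of your whole shell. A minimal sketch, assuming you have already copied the claude-code proxy URL from your Gateway endpoint: the URL below is a placeholder, `claude_env` and `launch_claude` are hypothetical helpers, and only ANTHROPIC_BASE_URL is set because that is the one variable the setup names (check the linked guide for the other).

```python
import os
import subprocess
from typing import Mapping, Optional


def claude_env(proxy_url: str, base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Copy the environment and point ANTHROPIC_BASE_URL at the gateway proxy.

    The caller's environment is left untouched, so only processes launched
    with the returned dict are routed through the gateway.
    """
    env = dict(os.environ if base is None else base)
    env["ANTHROPIC_BASE_URL"] = proxy_url
    return env


def launch_claude(proxy_url: str) -> int:
    """Run the Claude Code CLI ("claude") with the routed environment; return its exit code."""
    return subprocess.run(["claude"], env=claude_env(proxy_url), check=False).returncode


# PLACEHOLDER: replace with the claude-code proxy URL shown for your Gateway endpoint.
PROXY_URL = "http://localhost:5000/<claude-code-proxy-path>"
```

Calling `launch_claude(PROXY_URL)` then starts Claude Code as usual, with its model traffic going through the gateway.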
Catch this session at Data + AI Summit (June 15-18, SF)! 👇
Vibe-checking agent quality breaks down at scale.
🔁 MLflow self-evolving test harness
🧪 Bad-answer feedback → automated tests
✅ Coding-agent fixes checked against the accumulated suite
🎤 Adam Gurary & Yuki Watanabe
Session details: https://t.co/8AhnjLPmEP
#MLflow #DataAISummit
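The harness idea (each bad answer becomes a permanent test, and a candidate fix is accepted only if it passes the whole accumulated suite) can be sketched in plain Python. This is a toy illustration of the concept from the session description, not MLflow's implementation; `RegressionSuite`, `add_from_feedback`, `failures`, `accepts`, and the `fix_a`/`fix_b` stubs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Agent = Callable[[str], str]


@dataclass
class RegressionSuite:
    """Accumulates tests from user feedback (hypothetical, not an MLflow class)."""
    cases: dict[str, str] = field(default_factory=dict)  # question -> corrected answer

    def add_from_feedback(self, question: str, corrected_answer: str) -> None:
        """Turn one bad-answer report into a permanent regression test."""
        self.cases[question] = corrected_answer

    def failures(self, agent: Agent) -> list[str]:
        """Questions the agent still answers incorrectly."""
        return [q for q, want in self.cases.items() if agent(q) != want]

    def accepts(self, candidate: Agent) -> bool:
        """Accept a fix only if it passes every accumulated case, old and new."""
        return not self.failures(candidate)


suite = RegressionSuite()
suite.add_from_feedback("refund window?", "30 days")
suite.add_from_feedback("support email?", "help@example.com")


def fix_a(q: str) -> str:
    # Fixes the newest report but regresses the older one.
    return {"refund window?": "30 days"}.get(q, "unknown")


def fix_b(q: str) -> str:
    # Passes both accumulated cases.
    return {"refund window?": "30 days", "support email?": "help@example.com"}.get(q, "unknown")


print(suite.accepts(fix_a), suite.accepts(fix_b))  # False True
```

Because cases are never removed, each piece of feedback raises the bar permanently, which is what lets automated fixes be trusted as the suite grows.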
.@OpenHandsDev agents edit files, run commands, and browse the web on their own, but there's no structured record of what happened or whether the result was good.
MLflow connects via @opentelemetry to trace every step, evaluate runs with built-in judges, and route model traffic through AI Gateway for budget and usage control.
Learn more 👉 https://t.co/YGQlhIB7yK
#MLflow #OpenHands
New on the MLflow channel: evaluate a RAG agent end-to-end with Joana Mesquita, MLflow Ambassador 👇
🔹 Prompt Registry + production aliases
🔹 Traces with SME ground truth
🔹 Ragas, Phoenix + custom LLM judge
Watch now: https://t.co/ryZbnFHONj
Blog: https://t.co/HIMMCatGcv
#MLflow #RAG