Building high-quality AI systems is hard.
At Langfuse we see the best AI teams converging on a process to get complex AI systems to production.
We call it the AI Engineering Loop.
Check out the first piece of our series and find out more in our academy.
had a lot of fun going deep at the AI Engineer conference on:
(1) how we think about the role of skills
(2) how to develop/eval/improve them
(3) lessons from building our own set of skills
quarterly Langfuse Town Hall on June 11th
catch up on everything we've shipped: v4, the latest releases, and what's coming next on the roadmap. Q&A with the team at the end.
open to the whole community. register: https://t.co/2hMFvMY62j
Coming to SF for Snowflake Summit? Meet our team next to Moscone for coffee, drinks & database + agent observability talk
https://t.co/LRcQpghyQS
cc @ClickHouseDB
day 5 of launch week: langfuse MCP.
supports: observations, metrics, scores, datasets, comments, annotation queues, models, media, and more.
claude or linear agents can pull a trace, drop a comment, or create dataset items without leaving the chat.
https://t.co/jBHM8WeA5e
day 4 of langfuse launch week: code evaluators.
write a python or typescript `evaluate` function in the langfuse UI. attach it to live observations or an experiment. scores land natively next to your existing ones.
@wochinge demos below; https://t.co/jBHM8WeA5e
day 3 of langfuse launch week: full-text search.
multi-GB scans drop from many seconds to sub-second on @ClickHouseDB's new text indexes. great work from @sum3rman.
available via UI and API.
more: https://t.co/jBHM8WeA5e
day 2 of langfuse launch week 5: langfuse agent skill.
bringing an agent to production is hard.
using the skill you can ask your coding agent to instrument your app, calibrate a judge, or set up evaluators.
@marliessophie demos below; https://t.co/jBHM8WeA5e
day 1 of langfuse launch week 5: a github action that runs your langfuse experiments on every PR.
fails the workflow when scores drop below your threshold. posts pass/fail to the PR. every run is tracked in langfuse.
https://t.co/jBHM8WeA5e
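to make the "fails the workflow when scores drop below your threshold" idea concrete, here is a minimal sketch of that kind of gate in plain python. this is not the action's actual implementation or config: the `gate` function, the score names, and the use of a mean per score are all assumptions for illustration.

```python
from statistics import mean


def gate(scores: dict[str, list[float]], thresholds: dict[str, float]) -> list[str]:
    """Return one failure message per score whose mean falls below its threshold.

    Hypothetical helper: `scores` maps a score name to the values collected
    in one experiment run, `thresholds` maps a score name to its minimum mean.
    """
    failures = []
    for name, minimum in thresholds.items():
        values = scores.get(name, [])
        if not values:
            # Treat a missing score as a failure rather than silently passing.
            failures.append(f"{name}: no scores recorded")
            continue
        avg = mean(values)
        if avg < minimum:
            failures.append(f"{name}: mean {avg:.2f} < threshold {minimum:.2f}")
    return failures
```

in a CI step, a script could call `gate(...)` on the run's scores and exit with a non-zero status when the returned list is non-empty, which is what makes the workflow fail.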
@langfuse launch week 5 starts monday.
one release per day, mon to fri. agents, evals, and some long-requested features.
we'll be demoing all new features at @ClickHouseDB Open House in San Francisco the same week. come say hi.
https://t.co/isf5XmqcVr
Want to see what Claude Code is actually doing? We made a video showing exactly how to observe it in real time with Langfuse.
Claude Code in Action: Trace Tool Calls & Decisions with Langfuse https://t.co/p58PHR8Ssj
This is a great article by @annabellschfr - a lot of teams still get stuck on vibes and never get to systematically experimenting with models, prompts, context, and architectures. dig in!