Won the @modal x @raindrop_ai x @OpenAIDevs x @AntlerGlobal autoresearch hackathon by building an independent LLM-as-judge live observability layer for long-running sessions, cutting token cost by 5% and turning post-mortem traces into live steering and evals.
This weekend our engineers went to a hackathon hosted by @modal, @AntlerGlobal, @OpenAIDevs, and @raindrop_ai focused on improving Autoresearch.
Vibhav, Galen, and Vincent won the Raindrop track by developing an autonomous LLM-as-a-judge that evaluates agent traces live and steers subagents.
introducing howtoeval dot com. the no-bullshit guide to eval'ing AI agents.
from personal experience, and from working with the best companies in the world.
there's even a quiz. link below.
this saturday, @OpenAI is throwing an autoresearch hackathon with @raindrop_ai and @modal
come learn how to build systems (agents, models, etc.) that improve themselves
link below!
.@speak is an AI language learning app that serves over 15 million users.
cto/co-founder @adhsu discusses how speak approaches agent engineering, the importance of self-healing loops, and how they use @raindrop_ai