Evals are arguably the hardest part of LLMOps. LLMs mess up, so we check them with other LLMs, but this feels icky. Who validates the validators?
We built an interface to align LLM-based evals with user preferences, and learned a lot about why this is hard: https://t.co/g7UXuBznv9
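
One rough way to put a number on "who validates the validators" is to compare an LLM judge's pass/fail verdicts against a handful of human labels on the same outputs. The sketch below is only an illustration, not the interface from the link: the function name `agreement_report` and the boolean label format are assumptions made for this example. It reports raw agreement plus Cohen's kappa, which discounts the agreement you would expect by chance.

```python
from collections import Counter


def agreement_report(human, judge):
    """Compare pass/fail labels from a human grader and an LLM judge.

    `human` and `judge` are equal-length lists of labels for the same
    outputs. The name and label format are hypothetical, for illustration.
    """
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(human)

    # Fraction of outputs where the judge and the human gave the same label.
    observed = sum(h == j for h, j in zip(human, judge)) / n

    # Agreement expected by chance, from each grader's own label frequencies.
    h_counts, j_counts = Counter(human), Counter(judge)
    labels = set(h_counts) | set(j_counts)
    expected = sum(h_counts[k] * j_counts[k] for k in labels) / (n * n)

    # Cohen's kappa; when both graders always give the same single label,
    # chance agreement is already 1, so treat that case as perfect agreement.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)

    return {"agreement": observed, "kappa": kappa}


if __name__ == "__main__":
    human_labels = [True, True, False, False]
    judge_labels = [True, False, False, False]
    print(agreement_report(human_labels, judge_labels))
```

A low kappa on a small labeled sample is a hint that the judge prompt or criteria need another pass before its verdicts are trusted at scale.
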
Streaming is now available in the Assistants API! You can build real-time experiences with tools like Code Interpreter, retrieval, and function calling.
https://t.co/B0Vytm6zyE