I made AgentIR: a scheduler for distributed LLM serving that makes agent workloads run much faster.
41.3% lower E2E latency and up to 70% higher throughput.
I took OSDN, a brand-new linear-attention model that learns to tune its own memory updates as it reads (think AdaGrad for the architectures trying to replace the transformer), rebuilt it from scratch in pure C++ with my own autograd engine, and ran it on a $4 microcontroller to predict hypoglycemia 60 minutes before it hits.
No PyTorch. No JAX. No TensorFlow. No ML library at all. Straight C++ standard library.
launching AgentIR Blackbox https://t.co/P7GC38xM5w
an llm request router for agent systems
Blackbox finds which llm calls are on your workflow's critical path, sends them to faster providers, and routes less urgent calls to cheaper ones to stay within your selected cost-latency constraint
it uses your workflow stats and real-time provider latency profiles to reroute before throttling or slowdowns hit the full workflow
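to make the critical-path idea concrete, here is a minimal sketch in Python. it is not Blackbox's actual algorithm or API: the function names (`critical_path`, `route`), the provider names, and every number are hypothetical, and it assumes the workflow is a DAG with a known average duration per call. it finds the longest-duration chain of dependent calls, sends those calls to the lowest-latency provider, and sends everything else to the lowest-cost one. it does not model the cost-latency constraint or real-time latency profiles.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, deps):
    """Return the longest-duration chain of dependent calls, in execution order.

    durations: call name -> average duration in seconds (hypothetical stats)
    deps:      call name -> set of calls that must finish first
    """
    finish = {}  # earliest finish time of each call
    prev = {}    # the predecessor that determines each call's start time
    for call in TopologicalSorter(deps).static_order():
        # A call starts when its slowest predecessor finishes.
        slowest = max(deps.get(call, ()), key=lambda p: finish[p], default=None)
        start = finish[slowest] if slowest is not None else 0.0
        finish[call] = start + durations[call]
        prev[call] = slowest

    # Walk back from the call that finishes last.
    call = max(finish, key=finish.get)
    path = []
    while call is not None:
        path.append(call)
        call = prev[call]
    return path[::-1]


def route(durations, deps, providers):
    """Send critical-path calls to the fastest provider, the rest to the cheapest."""
    fastest = min(providers, key=lambda p: providers[p]["latency_s"])
    cheapest = min(providers, key=lambda p: providers[p]["cost"])
    urgent = set(critical_path(durations, deps))
    return {call: fastest if call in urgent else cheapest for call in durations}


# Hypothetical four-call agent workflow: plan, then search and summarize
# in parallel, then answer.
durations = {"plan": 2.0, "search": 1.0, "summarize": 4.0, "answer": 1.5}
deps = {
    "search": {"plan"},
    "summarize": {"plan"},
    "answer": {"search", "summarize"},
}
# Hypothetical provider profiles (latency in seconds, cost in arbitrary units).
providers = {
    "fast_provider": {"latency_s": 0.8, "cost": 3.0},
    "cheap_provider": {"latency_s": 2.5, "cost": 1.0},
}

assignment = route(durations, deps, providers)
print(critical_path(durations, deps))
print(assignment)
```

in this example `search` runs in parallel with the slower `summarize` call, so it has slack and can go to the cheaper provider without delaying the final answer.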
setup is simple too. connect your app, and blackbox handles the workflow annotations for you
use it for free!