You can now boost any LLM's accuracy 2-10x without training it.
Most teams improve model accuracy by fine-tuning or swapping to a bigger model.
Both cost time and money.
OptiLLM takes a different route.
It is an open-source proxy that sits between your app and any OpenAI-compatible API.
Instead of training, it spends extra compute at inference time to think harder before answering.
The repo bundles 20+ reasoning techniques you can switch on with one parameter.
A few of the methods inside:
> Multi-agent cross-verification
> Monte Carlo tree search
> Chain-of-thought with reflection
> Best-of-N sampling
> Z3 theorem prover routing
The numbers are the headline.
On AIME 2025, Gemini 2.5 Flash Lite jumps from 43.3% to 73.3% accuracy.
Llama 3.3 70B gains 18.6 points on Math-L5.
GPT-4o-mini matches GPT-4 on Arena-Hard-Auto.
No retraining. Just route your calls through the proxy.