ai avatars are boring
We built characters that argue, react, and sometimes go completely off-script
Powered by @odysseyml
live at https://t.co/HrVojqaxbZ
last week we did a livestream on real-time interactive media and world models because we believe that's where media is headed.
so we’re starting a weekly newsletter to explore the space and share cool things we find
we've got a few other things cooking too
drop your email to get in
At @AquinF03, we're continuing to make all existing evals and benchmark tools obsolete:
1/3
Custom evals: write your own scorer in Python and you get access to activations and SAE features, so you can do things like:
"check whether a specific feature fired above threshold on a response"
which no external eval harness can do!
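A minimal sketch of what such a scorer could look like. This is not Aquin's actual scorer API: the function names, the `feature_acts[token][feature]` layout, and the toy numbers are all assumptions made for illustration.

```python
from typing import Sequence


def feature_fired(
    feature_acts: Sequence[Sequence[float]],
    feature_id: int,
    threshold: float,
) -> bool:
    """Return True if the SAE feature exceeds `threshold` on any token.

    Assumed (hypothetical) layout: feature_acts[t][f] is the activation of
    SAE feature f at token position t of the response.
    """
    return any(token_acts[feature_id] > threshold for token_acts in feature_acts)


def score(feature_acts, feature_id=2, threshold=0.5) -> float:
    """Hypothetical scorer: 1.0 if the feature fired on the response, else 0.0."""
    return 1.0 if feature_fired(feature_acts, feature_id, threshold) else 0.0


# Toy activations: 3 tokens x 4 SAE features (made-up numbers).
acts = [
    [0.0, 0.1, 0.2, 0.0],
    [0.0, 0.0, 0.9, 0.0],
    [0.3, 0.0, 0.1, 0.0],
]
print(score(acts))  # feature 2 reaches 0.9 on the second token
```

The point is that the scorer reads model internals, not just the response text.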
2/3
Benchmark Builder can now weight evals differently within a suite and export results in multiple formats.
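A rough sketch of per-eval weighting and multi-format export, assuming the suite score is a weighted average. The eval names, weights, and function names below are hypothetical and not taken from Benchmark Builder itself.

```python
import csv
import io
import json
from typing import Mapping


def weighted_suite_score(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-eval scores (assumed aggregation rule)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def export_json(scores: Mapping[str, float], weights: Mapping[str, float]) -> str:
    """Serialize per-eval results plus the suite score as JSON."""
    payload = {
        "evals": [
            {"eval": name, "score": scores[name], "weight": weights[name]}
            for name in scores
        ],
        "suite_score": weighted_suite_score(scores, weights),
    }
    return json.dumps(payload, indent=2)


def export_csv(scores: Mapping[str, float], weights: Mapping[str, float]) -> str:
    """Serialize per-eval results as CSV with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["eval", "score", "weight"])
    for name in scores:
        writer.writerow([name, scores[name], weights[name]])
    return buf.getvalue()


# Hypothetical suite: three evals with different weights.
scores = {"toxicity": 0.9, "factuality": 0.6, "style": 0.8}
weights = {"toxicity": 2.0, "factuality": 3.0, "style": 1.0}
print(export_json(scores, weights))
print(export_csv(scores, weights))
```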
3/3
Auto-suggestions: the agent observes and proactively suggests the most relevant evals, with just one click to run.
@RTinkslinger @RajatAgarwal167 Completely agree. We're also betting on real-time interactive media and characters for learning, starting with kids.
here's what it looks like:
2 months building and researching interpretability tooling at @AquinF03
and I discovered that our users are divided into two groups:
1. People working on Interpretability
2. People using Interpretability to improve their ML work
The first group builds on top of our tooling and runs experiments. The second group uses the tooling in existing pipelines to debug and improve their ML work.
At @AquinF03, we care about both. We're shipping a lot, and every release could turn into an experiment, a study, or a paper.
Come build and research with us: https://t.co/zC92O8cdLO
Introducing @AquinF03's Devkit!
basically https://t.co/WQWNS7bfUJ's interpretability tooling, run locally through an SDK + CLI.
The Aquin SDK records training runs locally, including metrics, config, and checkpoints, then the CLI packages and pushes them to Aquin for post-hoc analysis.
Once pushed, the run appears in CLI runs with full inspection: loss curves, learning rate, grad norm, epoch summaries, SAE diff, and model diff.
The SDK is framework-agnostic. It works with any Python training loop that produces a PyTorch model.
For HuggingFace Trainer and TRL, a TrainerCallback pattern wires everything in without touching training logic.
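A sketch of the callback pattern, not the Aquin SDK itself. `LocalRunRecorder` and `RecorderCallback` are hypothetical names, and the on-disk layout (`config.json` plus `metrics.jsonl`) is an assumption. In real use the callback would subclass `transformers.TrainerCallback`, whose `on_log` hook receives `args`, `state`, `control`, and `logs`; it is a plain class here so the sketch runs without transformers installed. Checkpoint recording is left out.

```python
import json
import tempfile
from pathlib import Path
from types import SimpleNamespace


class LocalRunRecorder:
    """Hypothetical stand-in for an SDK recorder: writes config and metrics to disk."""

    def __init__(self, run_dir, config):
        self.run_dir = Path(run_dir)
        self.run_dir.mkdir(parents=True, exist_ok=True)
        (self.run_dir / "config.json").write_text(json.dumps(config))
        self.metrics_path = self.run_dir / "metrics.jsonl"

    def log(self, step, metrics):
        # Append one JSON record per logging step.
        with self.metrics_path.open("a") as f:
            f.write(json.dumps({"step": step, **metrics}) + "\n")


class RecorderCallback:
    """Mirrors the `on_log` hook of transformers.TrainerCallback (plain class here)."""

    def __init__(self, recorder):
        self.recorder = recorder

    def on_log(self, args, state, control, logs=None, **kwargs):
        # The Trainer passes its latest metrics in `logs`; training logic is untouched.
        if logs:
            self.recorder.log(state.global_step, logs)


# Simulate three logging events without running a real Trainer.
run_dir = Path(tempfile.mkdtemp()) / "run-001"
recorder = LocalRunRecorder(run_dir, {"lr": 3e-4, "epochs": 1})
callback = RecorderCallback(recorder)
for step, loss in enumerate([1.2, 0.9, 0.7], start=1):
    callback.on_log(
        args=None,
        state=SimpleNamespace(global_step=step),
        control=None,
        logs={"loss": loss},
    )

records = [
    json.loads(line)
    for line in (run_dir / "metrics.jsonl").read_text().splitlines()
]
print(records)
```

With a real Trainer, a callback like this would be passed through the `callbacks=[...]` argument.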
pip install Aquin!
Glad to announce that @AquinF03 now supports embedding models:
Geometry inspection, retrieval evaluation, fine-tuning monitoring, and embedding diff across checkpoints.
here's how we support them:
We are bringing 12 founders into one house in Hyderabad for three months. Fully funded if you get selected.
Living costs covered. Operators in the room. Weekly pressure to build, ship, and become worth backing.
@theresidency