We just made benchmarking LLMs extremely easy!
Selecting a suite, configuring a harness, running evals, parsing outputs, and visualizing results, all separately.
At @AquinF03 we killed that. Now it's just one prompt and you're done!
From models to fine-tuning runs to datasets.
Here's how it works:
Claude now connects to the tools creative professionals already use.
With the new Blender connector, you can debug a scene, build new tools, or batch-apply changes across every object, directly from Claude.
We just launched our closed beta!
Recap: A platform to monitor and improve AI/ML systems. Think reducing hallucinations, safety, compliance, inspecting data, training, and models!
Also kicking off research and building a community around this space (bounties, fellowships, hackathons soon).
Join us over here: https://t.co/ZqWkBPIhmL
Early access: https://t.co/MH9xQvMQU6
one is a visual map of every neural connection in a worm’s nervous system.
the other is 16,384 features inside Llama 3.2 1B.
both fully mapped.
can you tell which is which?
“the model is hallucinating”
in psychology we call this confabulation: it's when your brain fills gaps with confident nonsense. Turns out models learned that from us.
want your model to stop hallucinating? we are building @AquinF03 for that. Join the waitlist now