Agentic organizations & companies, as well as governance & orchestration approaches, have me thinking more about org culture. There are lots of individual approaches to agent persona/values, so I'm surprised there isn't a pip package to distill "culture" from one group of agents to another. Could easily see the agentic persona marketplace moving into these areas.
Beyond the HumanLM paper, we're sharing two blog posts: https://t.co/K0pOqeMPal
Is Synthetic Data Good Enough to Train User Simulators? — by me and @ArpandeepKhatua
Persona Dropout Makes Robust User Simulators, by @Es2C003
+ Code is ready here! https://t.co/Iw4eRh63gX
🚨New paper to level up your 🦞#Clawdbot ?!
Bots are now posting your sensitive info in real time. But privacy research is a desert with no data to train better models. That's about to change.
Enter 🏝️Privasis, the oasis where you can train strong privacy-forward AI with scale✨
The multi-vector embeddings are created using our SOTA retrieval model mxbai-wholembed. It supports text, audio, and video in over 300 languages. A key innovation is dynamic vector allocation, which lets the model decide how many vectors it needs to represent a given input. For example, a simple cat image may output a few vectors, whereas a complex slide deck may generate thousands of vectors. We wrote a custom inference engine to serve mxbai-wholembed with low latency.
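To make the variable-vector idea concrete, here is a minimal sketch of scoring documents that carry different numbers of vectors. It assumes a ColBERT-style MaxSim late-interaction score, which the post does not confirm is what mxbai-wholembed uses; the function names and the toy 2-d vectors are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def maxsim_score(query_vecs, doc_vecs):
    """Assumed scoring rule (MaxSim): for each query vector, take its best
    match among the document's vectors, then sum over query vectors.
    Works for any number of document vectors."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)

# Toy data: a "simple" item gets one vector, a "complex" one gets three.
query = [[1.0, 0.0], [0.0, 1.0]]
cat_image = [[1.0, 0.0]]
slide_deck = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

scores = {
    "cat_image": maxsim_score(query, cat_image),
    "slide_deck": maxsim_score(query, slide_deck),
}
best = max(scores, key=scores.get)
print(scores, best)
```

The point of the sketch is only that the scoring function never needs a fixed vector count per document, so simple and complex inputs can live in the same index.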
New paper!
A compromised inference server can leak model weights by hiding them in normal-looking responses🥷
But LLM inference is nearly deterministic, so we can verify outputs with a trusted server & detect this (and inference bugs)
This slows data exfiltration by >200x
🧵
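A rough sketch of the verification idea, not the paper's actual method: a trusted verifier recomputes each token from the same prompt and seed and flags any position where the served output deviates. The toy "model" below is a hypothetical hash-based stand-in, and it assumes exact determinism; real inference is only nearly deterministic, so a real check would need some tolerance.

```python
import hashlib

def toy_next_token(context_tokens, seed):
    """Hypothetical stand-in for one deterministic decoding step."""
    data = (",".join(map(str, context_tokens)) + f"|{seed}").encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:2], "big") % 1000

def generate(prompt_tokens, seed, n):
    """Honest server: produce n tokens deterministically."""
    context = list(prompt_tokens)
    out = []
    for _ in range(n):
        token = toy_next_token(context, seed)
        out.append(token)
        context.append(token)
    return out

def verify(prompt_tokens, seed, claimed):
    """Trusted verifier: recompute each step on the claimed prefix and
    return the positions where the claimed token differs."""
    context = list(prompt_tokens)
    mismatches = []
    for i, token in enumerate(claimed):
        if token != toy_next_token(context, seed):
            mismatches.append(i)
        context.append(token)  # keep following the claimed prefix
    return mismatches

honest = generate([1, 2, 3], seed=0, n=8)
tampered = list(honest)
tampered[3] = (tampered[3] + 1) % 1000  # simulate hidden data in the output

print(verify([1, 2, 3], 0, honest))    # no mismatches for honest output
print(verify([1, 2, 3], 0, tampered))  # first flagged position is 3
```

Because any token that encodes smuggled bits must differ from the recomputed one, the attacker is limited to whatever slack the verifier's tolerance leaves.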