Our new guardian model lets you create LLM guardrails using natural text. This little 8B model efficiently checks in real time whether chatbots comply with bespoke moderation policies.
It's not often that academic models beat industry models, but DynaGuard stacks up well!
There is still a lot of brittleness in getting guardian models to incorporate custom policies, but we think this is a step in the right direction. Try out DynaGuard in this interactive demo (and give us feedback to improve it!): https://t.co/ZArEreMkpK
Guardrails with custom policies are hard for models trained on safety and harm-related datasets. But what if you trained a guardian model on arbitrary rules?
Introducing DynaGuard, a guardian model for custom policies: https://t.co/oPWOZstRUQ
I am looking for a postdoc to lead projects related to this collaboration, on scaling laws, emergence and interpretability in pre- and post-training & inference/reasoning, in multimodal foundation models (language, time series, tabular data, etc.). HPC experience is a plus.
@AlexGDimakis I would take it a step further. Humans are really good at learning from nonverbal social cues. A look that implies disappointment, excitement, or frustration can be a profound reward signal in many situations.
The past 5 years have seen big successes in language, image and video generation, but relatively limited success in robotic manipulation. Why don’t we have laundry robots in every house?
One thing seems clear: training compute is not the blocker. 🧵
i'm increasingly convinced that "transformative ai" is going to look like an abundance of specialized models for everything from drug design to weather sims to robotics to supply chains, not one agent to rule them all. we're going to need a lot more ai researchers
@scaling01 Three years is the perfect prediction horizon for anything you want. It’s just close enough that people feel like it’s going to happen soon and just far enough that if the deadline passes no one will remember you were wrong.
Open weights for our Llip multimodal vision-language model, led by @lavoiems, are public!
Llip proposes a new pre-training objective that captures the many ways to describe an image, leading to strong performance across a suite of 22 zero-shot benchmarks.
https://t.co/Tr354Kfcno