Built a small OpenShell demo: the model can request tool access, but the shell only executes calls that pass an SMT check against policy intent.
Safe actions get auto-approved because they come with a proof, not a vibe.
Recorded a quick demo of agent-driven policy approvals in OpenShell 🤖
OpenAI Codex starts in a sandbox with read-only GitHub access, gets blocked when it tries to write, proposes a narrow policy update, waits for review, then retries successfully after OpenShell hot-reloads the policy.
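For anyone who wants the shape of that loop in code, here's a rough sketch of the block, propose, review, hot-reload, retry flow. This is not OpenShell's actual API: `Rule`, `Sandbox`, `run_agent`, and the `github:demo-repo` resource name are all hypothetical, made up for illustration, and the "reviewer" is just a callback.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """Hypothetical policy rule: one action allowed on one resource."""
    resource: str  # e.g. "github:demo-repo" (made-up name)
    action: str    # "read" or "write"


@dataclass
class Sandbox:
    """Toy stand-in for a policy-enforcing shell (not OpenShell's real API)."""
    allowed: set = field(default_factory=set)
    pending: list = field(default_factory=list)

    def execute(self, call: Rule) -> str:
        # Block anything the live policy does not explicitly allow.
        if call not in self.allowed:
            raise PermissionError(f"blocked: {call.action} {call.resource}")
        return f"ok: {call.action} {call.resource}"

    def propose(self, rule: Rule) -> None:
        # The agent asks for a narrow update: exactly the rule it needs.
        self.pending.append(rule)

    def review(self, approve) -> None:
        # A reviewer decides on each proposal; approved rules go straight
        # into the live policy (the "hot reload"), the rest are dropped.
        for rule in self.pending:
            if approve(rule):
                self.allowed.add(rule)
        self.pending.clear()


def run_agent(sandbox: Sandbox, call: Rule, approve) -> str:
    """Try a call; if blocked, propose a policy update, wait for review, retry."""
    try:
        return sandbox.execute(call)
    except PermissionError:
        sandbox.propose(call)
        sandbox.review(approve)
        # Raises PermissionError again if the reviewer said no.
        return sandbox.execute(call)
```

Usage would look like starting with `Sandbox(allowed={Rule("github:demo-repo", "read")})` and calling `run_agent(sandbox, Rule("github:demo-repo", "write"), approve)`. The write only goes through if `approve` returns `True` for that exact rule, so the policy grows by one narrow permission at a time.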
Been so much fun cooking OpenShell and NemoClaw with the @NVIDIAAI folks! 🙏🦞
Huge step towards secure agents you can trust.
What’s your OpenClaw strategy?
Synthetic data is also a big feature of our Nemotron 3 Nano pre- and post-training. Gretel joined NVIDIA in April, and it's been an amazing ride bringing SDG & Data Designer to help build our latest SoTA Nemotron.
here are some datasets we worked on 🧵
🚀 @nvidia just open-sourced NeMo Data Designer, the Python framework we use to generate synthetic data for Nemotron.
Structured outputs ✅ LLM columns ✅ Dependency-aware fields ✅ Validators ✅ LLM-as-judge ✅
Jump in & build your own pipelines: https://t.co/zrXkMk8kp7
📊 Excited to share how teams are using @gretel to evaluate RAG systems! Here's our guide to:
- Generate diverse test scenarios
- Evaluate different configs
- Find edge cases automatically
Something we've learned from lots of customer convos. Check it out ⬇️
🧵What we’re seeing emerge with models like DeepSeek R1, @MSFTResearch Phi-4, @Meta's self-play and now as OSS with @huggingface Open-R1 is a data pipeline as sophisticated and original as the models themselves, and I’m here for it
@karpathy Reminds me of the logic in "if you can dodge a wrench, you can dodge a ball" from the movie #dodgeball. Maybe we can all try making models that are actually good at the tasks we need
The future of AI customization won't need armies of labelers - just synthetic data and expertise 🚀
@AIatMeta's ALMA paper makes the case: performance close to Llama3-Instruct with synthetic data + just 9k labeled examples (vs. millions!)
Paper: https://t.co/xDgFauyKev
Neat paper from NeurIPS on scaling synthetic data. It points to diversity in prompt and response synthesis, plus iteration (100s of examples and 10 iterations), as keys to alignment that matches or exceeds exclusively human data, similar to what we have seen at @gretel_ai
Everyone’s talking about synthetic data generation — but what’s the recipe for scaling it without model collapse? 🤔
Meet ALMA: Alignment with Minimal Annotation.
We've developed a new technique for generating synthetic data and aligning LLMs that achieves performance close to Llama3-instruct with only 9000 labeled examples. That's less than 1% of the millions of human annotations typically used for alignment.
Check out the full paper here: https://t.co/OBB9Rzfigy