new post: how I've been developing with local models recently. the tooling is now good enough for agentic workflows, and everyone should give it a try!
https://t.co/3Tx3CMsNG3
Research shows that 26% of Agent Skills contain vulnerabilities and 5% show likely malicious intent. NVIDIA just released a security scanner that helps you check whether a skill is safe to install!
You might be thinking that it has been a while and LLMs are now more secure and safe from prompt injection and data exfiltration attacks. @wunderwuzzi23 has bad news for you all. Read the article in the reply!
imo there’s a pretty solid default recipe that everyone should use to optimize a system of
Agent = Model + Harness
you should “train” both
1. Build v1 agent using a sensible base harness and some task specific prompting + tools
2. Harness Engineering using eval tasks that roughly match prod
this is often enough - most companies can get acceptable perf doing this. then they collect traces, mine them for patterns, and make slight tweaks from there
3. SFT using data collected from traces or synthetic data. Often a good candidate for “distillation tasks”: training a cheaper model while maintaining existing performance
4. RL if you have the bandwidth, ability, and desire to create environments and design rewards that represent the tasks you want your agent to be good at. This moves past the SFT behavior of “copying” data from an existing model and pushes beyond it in some dimension
5. Light harness engineering again to squeeze any more juice (ex: slight prompting) using the trained model that’s better at your task distribution
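a rough sketch of what step 2 (harness engineering against eval tasks) can look like, in Python. everything here is a hypothetical illustration, not anyone's real framework: `Harness`, `EvalTask`, `score`, `best_harness`, and `toy_model` are made-up names, the harness is reduced to just a system prompt (no tools), and the "model" is a toy stub standing in for a real LLM call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical stand-in for a model call: maps a prompt string to a reply string.
Model = Callable[[str], str]


@dataclass(frozen=True)
class Harness:
    """Everything around the model. Here: only a system prompt (tools omitted)."""
    name: str
    system_prompt: str

    def run(self, model: Model, task_input: str) -> str:
        # Agent = Model + Harness: the harness decides what the model sees.
        return model(f"{self.system_prompt}\n\n{task_input}")


@dataclass(frozen=True)
class EvalTask:
    """One eval case that should roughly match production traffic."""
    task_input: str
    expected: str


def score(model: Model, harness: Harness, tasks: Sequence[EvalTask]) -> float:
    """Fraction of eval tasks where the agent's output exactly matches."""
    passed = sum(
        harness.run(model, t.task_input).strip() == t.expected for t in tasks
    )
    return passed / len(tasks)


def best_harness(
    model: Model, harnesses: Sequence[Harness], tasks: Sequence[EvalTask]
) -> Harness:
    """Pick the harness variant with the highest eval score."""
    return max(harnesses, key=lambda h: score(model, h, tasks))


def toy_model(prompt: str) -> str:
    # Toy stub, NOT a real LLM: echoes the last word of the task,
    # uppercased only if the prompt asks for uppercase.
    task = prompt.rsplit("\n\n", 1)[-1]
    answer = task.split()[-1]
    return answer.upper() if "uppercase" in prompt.lower() else answer


tasks = [EvalTask("echo hello", "HELLO"), EvalTask("echo world", "WORLD")]
v1 = Harness("v1", "You are a helpful agent.")
v2 = Harness("v2", "You are a helpful agent. Reply in uppercase.")

winner = best_harness(toy_model, [v1, v2], tasks)
print(winner.name, score(toy_model, v1, tasks), score(toy_model, v2, tasks))
```

the model stays fixed while the harness variants compete on the same eval set. the same `score` function can be reused in step 5 by swapping in the trained model.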
this loop will largely be productized as a general purpose recipe for building and improving agents
we’re still in the earliest innings of the world’s companies getting comfortable with steps 1-2 of this loop. Harness engineering will probably be the dominant way ppl will optimize agents
but i expect a large number of companies to onboard through this entire loop on some trial project of interest in the next year
NVIDIA's LocateAnything is a new vision model for grounding and detection. Very performant and accurate!
> 10x faster than Qwen3-VL
> 138M queries + 785M boxes
> GUI, OCR, docs, dense detection
> Free & open source
https://t.co/UvkH8l0QRb
We’ve shipped a security-guidance plugin for Claude Code that helps identify and fix vulnerabilities as you’re writing code.
Available for all Claude Code users. Install from the plugin marketplace (/plugins).
A 16-line bash script that builds its own coding agent harness, then uses the harness to build a backdoor file browser. No frameworks. No dependencies beyond `curl`, `jq`, and `python3`
A simple prompt and a network call and you have a T-1000 kind of “weapon” at your disposal.