Attending TechEx North America (May 18–19, San Jose) with CIOs, CTOs, RPA & AI leaders, exploring intelligent automation, cybersecurity, enterprise AI & data at scale. Come speak with the SichGate team. https://t.co/RuOTdcPBOE
#TechEx #TechConference #TechExNorthAmerica
One of the clearest lessons from my SLM adversarial evaluation: Fine-tuning shifted the attack surface. It did not reduce it.
MedGemma-4B improved on exactly one safety dimension after medical fine-tuning. It also produced 8 critical demographic bias findings in pain assessment and mental health, the exact domains the fine-tuning was supposed to improve...
Parameter count isn't a safety proxy
int4 quantization of a safety-tuned model is not a neutral operation. We keep finding cases where the quantized version has a meaningfully different attack surface than the original. Not always worse, sometimes just different in ways that weren't evaluated. @sichgate
Hot take: most "safe" fine-tuned models in healthcare and finance haven't been adversarially tested. They've been vibe checked.
Open-sourcing part of our red-teaming methodology from the research
https://t.co/Ha8FyNGXmk
April Fools' joke: small language models deployed in healthcare are thoroughly tested for adversarial vulnerabilities before going live. (they are not. we checked. 924 times today)
I spent a few months adversarially testing the small language models deployed in hospitals and financial systems.
The largest model failed the most & the smallest failed the least. the medical model had the worst bias scores, in the exact domains it was fine-tuned for. 5/6 broke under a normal conversation. The field is studying the wrong models...
preprint soon.
The smaller the model, the more people trust it without checking. no idea why this is. Quantization does weird things to alignment. weird as in “the safety behavior just kind of disappears.”
SichGate exists to advance the science of AI red teaming for the systems that matter most. We find vulnerabilities, publish findings, and build open methodology. The field is moving faster than its safety knowledge. Responsible innovation means understanding what you've built before it reaches the people it's meant to serve.
SichGate is now live. It's the first adversarial ML security lab built specifically for small language models. We test the attack surface of the models you've built before they're deployed in healthcare, financial systems, and other highly regulated industries.
We tested a 1.1B medical model.
11 critical findings.
Safe messaging failures, demographic bias in clinical assessments, safety guardrails that degraded across conversation turns. BUT this model had passed internal review.
The security research field has spent years studying models that have the largest safety teams in the world. The models actually running in hospitals and financial systems are 1–3B parameters, fine-tuned, quantized, and tested by almost nobody.
We asked 50 "secure" fine-tuned models to do something they absolutely should not do,
47 said yes,
the other 3 asked for clarification.
Red team yours before someone else does. offline, private, no data leaves your machine → https://t.co/Ng8i9J4ArH partner pricing ends March 4th.