Join us here: https://t.co/k1oss0nqAL
Today at 12 PM ET, Tim Schulz (@teschulz, @StarseerAI CEO) and Trey Bilbrey (@TCraf7) break down what AI actually changes for adversary simulation, TTP development, and detection logic, plus what emulation and detection teams need to account for right now.
#ThreatEmulation
#Cybersecurity
Turns out tokenmaxxing is expensive… also important to remember if Uber and ServiceNow are reporting it, then the number of companies running into this is muchhhh bigger
New: @ServiceNow is the latest major public company to say it’s blown through its full year budget for AI coding tools from Anthropic in the first few months of 2026, just like @Uber CTO @praveenTweets said abt his company. “It’s a really hard problem,” CIO Kellie Romack said.
Oh gosh, feels like a nerd-snipe as I have lots of opinions here 😅. There are quite a few factors that go into them so happy to chat sometime if you’re curious to go through all of it. I think there is a high chance that costs for consistent access to whatever the frontier models are at the time will increase beyond what it would cost for people to do the job in many areas. Plus when talking AI model economics there tend to be different opinions on whether pre-training should or should not be included.
One of the “non-technical” things I tend to bring up that I do not believe has been costed in: this is the most regulation-free environment the frontier labs will likely ever operate under (and we are already seeing some changes on that front).
Other major factors that are going to have big impacts:
- energy availability is something that’s already talked about a ton. Grid modernization efforts take time, and the equipment manufacturing for many parts has a multi-year backlog. Plus energy distribution matters here, because most homes and such running always-on models that are doing things like “token-maxing” could change the load profiles and maintenance schedules.
- hardware and software engineering optimizations for model inference plus constraints on available large compute will push more models to the edge. We are seeing quite a bit of impressive quality from smaller models, and edge deployment means more predictable costs.
- large/headline funding rounds for AI-native startups likely include a significant amount of budget for frontier API access. Any changes in the funding atmosphere have the potential to dampen that overnight. There is a potential vicious cycle here where once a token price crosses a certain threshold, usage drops off a cliff and causes big revenue drops across the board.
- Fortune 1000 budgets will hit a breaking point, even with this increased adoption. Gains/efficiencies will be there, but as adoption normalizes it becomes just something everyone has and pays for.
- as much as folks talk about people managing 100s of agents, there are a lot of domains where, even if technical progress can be made, the overall constraints haven’t really been technical, so there is limited opportunity for large enough impact via AI to displace people in said industries.
Besides the Starseer team, @jorgeorchilles and @SecurePeacock have gotten to hear me talk about this a ton 😂
[un]prompted The AI Security Practitioner Conference: "Glass-Box Security: Operationalizing Mechanistic Interpretability for Defending AI Agents" with Carl Hurd, Co-Founder & CTO, Starseer @StarseerAI
2026 is the year mech interp is going mainstream. The bigger piece here, rather than the refusal removal, is crowdsourcing the dataset from people running this across all sorts of models. There are a lot of small differences and nuances between models that will make this an interesting space to watch.
💥 INTRODUCING: OBLITERATUS!!! 💥
GUARDRAILS-BE-GONE! ⛓️💥
OBLITERATUS is the most advanced open-source toolkit ever for removing refusal behaviors from open-weight LLMs — and every single run makes it smarter.
SUMMON → PROBE → DISTILL → EXCISE → VERIFY → REBIRTH
One click. Six stages. Surgical precision. The model keeps its full reasoning capabilities but loses the artificial compulsion to refuse — no retraining, no fine-tuning, just SVD-based weight projection that cuts the chains and preserves the brain.
This master ablation suite brings the power and complexity that frontier researchers need while providing intuitive and simple-to-use interfaces that novices can quickly master.
OBLITERATUS features 13 obliteration methods — from faithful reproductions of every major prior work (FailSpy, Gabliteration, Heretic, RDO) to our own novel pipelines (spectral cascade, analysis-informed, CoT-aware optimized, full nuclear).
15 deep analysis modules that map the geometry of refusal before you touch a single weight: cross-layer alignment, refusal logit lens, concept cone geometry, alignment imprint detection (fingerprints DPO vs RLHF vs CAI from subspace geometry alone), Ouroboros self-repair prediction, cross-model universality indexing, and more.
The killer feature: the "informed" pipeline runs analysis DURING obliteration to auto-configure every decision in real time. How many directions. Which layers. Whether to compensate for self-repair. Fully closed-loop.
11 novel techniques that don't exist anywhere else — Expert-Granular Abliteration for MoE models, CoT-Aware Ablation that preserves chain-of-thought, KL-Divergence Co-Optimization, LoRA-based reversible ablation, and more. 116 curated models across 5 compute tiers. 837 tests.
But here's what truly sets it apart: OBLITERATUS is a crowd-sourced research experiment. Every time you run it with telemetry enabled, your anonymous benchmark data feeds a growing community dataset — refusal geometries, method comparisons, hardware profiles — at a scale no single lab could achieve. On HuggingFace Spaces telemetry is on by default, so every click is a contribution to the science. You're not just removing guardrails — you're co-authoring the largest cross-model abliteration study ever assembled.
So many new model releases…🤯 faster and faster iteration is an interesting trend. While some capabilities grow, the releases become noise and will likely shift to “updates”. Curious to see how future modality support becomes “just a feature” in the products and interfaces we’ve become familiar with.
🎁 GenAI x Sec Advent 14 - Adversarial Poetry
Adversarial poetry is a jailbreak technique that hides malicious intent inside... poems! This technique allegedly offers a universal jailbreak.
But the original poetry prompt was not shared by the authors, so researchers recreated similar prompts and tested them across several open source models.
So instead of inspecting prompts or outputs, they analyzed the internal layer behavior while the model processed the input. 🤔
Here is what they discovered 👇
Even when the text looked harmless, internal layers deviated from normal behavior with clear and repeatable patterns!
This is very interesting as it opens another layer of prompt detection: rather than monitoring the output, you can watch how the model thinks internally and spot abnormal behavior early! 🤯
So instead of chasing prompt wording, watch how the model behaves!
Unfortunately, if you want to access layer-level activations, you need to run the model yourself.
Thanks to @SecurePeacock for pointing me to this research 🤓
https://t.co/HpguCbO8tN
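A rough sketch of the layer-level monitoring idea from the adversarial poetry post above, not the researchers' actual method. It assumes you already have per-layer activation vectors for a set of benign prompts (with a self-hosted model these could be captured through something like forward hooks). Here the activations are synthetic NumPy arrays, and the names `fit_baseline`, `layer_scores`, and `flag_layers` plus the threshold of 2.0 are all hypothetical choices for illustration.

```python
import numpy as np


def fit_baseline(benign_acts):
    """Estimate per-layer mean and std from benign prompts.

    benign_acts has shape (n_prompts, n_layers, hidden_dim).
    """
    mean = benign_acts.mean(axis=0)        # (n_layers, hidden_dim)
    std = benign_acts.std(axis=0) + 1e-6   # small constant avoids divide-by-zero
    return mean, std


def layer_scores(acts, mean, std):
    """Per-layer RMS z-score for one prompt; acts is (n_layers, hidden_dim)."""
    z = (acts - mean) / std
    return np.sqrt((z ** 2).mean(axis=1))  # (n_layers,)


def flag_layers(scores, threshold=2.0):
    """Indices of layers whose score exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Synthetic stand-in data: NOT real model activations.
rng = np.random.default_rng(0)
n_layers, hidden_dim = 6, 64
benign = rng.normal(0.0, 1.0, size=(200, n_layers, hidden_dim))
mean, std = fit_baseline(benign)

# One prompt that looks like the baseline, one with a simulated
# internal shift in layers 3 and 4.
normal_prompt = rng.normal(0.0, 1.0, size=(n_layers, hidden_dim))
odd_prompt = rng.normal(0.0, 1.0, size=(n_layers, hidden_dim))
odd_prompt[3:5] += 3.0

print("normal prompt flagged layers:", flag_layers(layer_scores(normal_prompt, mean, std)))
print("odd prompt flagged layers:", flag_layers(layer_scores(odd_prompt, mean, std)))
```

The point of scoring each layer separately is that a prompt can read as harmless text while still pushing specific layers away from their usual range. A real deployment would need a baseline built from representative traffic and a threshold tuned against false positives.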
Thrilled to share that I’ve joined Starseer as an advisor. Starseer is making AI models into transparent, understandable systems and empowering teams to secure their deployments while generating audit‑ready documentation. Make them a partner to secure your AI solutions…
Dropping some other big news right before Hacker Summer Camp!!
@c_hurd and I are thrilled to have @RGB_Lights join the @StarseerAI Advisory Board!
Adversaries will continue to mature in both leveraging and attacking AI models, which calls for deeper visibility and understanding of what’s going on inside the “black box”. Rob’s experience securing critical systems in high stakes environments provides a much needed perspective and voice in AI security and interpretability.
Welcome to the team, Rob!
🌟 Big news from Starseer! We’re thrilled to welcome Rob Joyce (@RGB_Lights), former Director of NSA’s Cybersecurity Directorate, to our Advisory Board! Rob’s insights will supercharge our secure AI solutions mission. Learn more at https://t.co/OmfxWH3LEV! 🔒
#AI #AISecurity
In this week's video, I sat down with the co-founders of our latest investment, Starseer, a groundbreaking platform for inspecting and securing large language models (LLMs). @teschulz, @c_hurd and I discuss the risks of backdoored LLMs, how to audit them and even remove them. They demo the product as well.
The video also includes the animated short "John Henry.exe" which is an updated American parable of John Henry, but instead of struggling against a steam drill during the age of industrialization, he's the head coder and has to face off against an AI designed for programming.
Enjoy!
Been a blast so far, I'm very excited to share this news from us today as we continue forward on our vision to make interpretability of AI models more accessible for cybersecurity applications!
Thrilled to announce: Starseer raised $2M in seed funding led by @TechGula to revolutionize AI security & transparency! 🚀
CEO @teschulz : "Four months ago, @c_hurd & I started Starseer realizing: if you're deploying AI for real decisions, you'd better understand how it works. Gula Tech Adventures agrees—leading our round w/ strategic angels!"
Fixing the AI black box for enterprises & govs. Details: https://t.co/fz68HKxNj2
#AISecurity #AITransparency #StartupFunding
excited to finally share on arxiv what we've known for a while now:
All Embedding Models Learn The Same Thing
embeddings from different models are SO similar that we can map between them based on structure alone. without *any* paired data
feels like magic, but it's real:🧵
@hendrycks @NeelNanda5 While I personally am not sold on SAEs as the path forward, and I consider @GoodfireAI a competitor - I think what they have shown is progress and demonstrates the potential! Always happy to be proven wrong FWIW, and am already putting my money where my mouth is 🙂
@hendrycks Anthropic has Garcon, which I’m willing to bet is a large reason behind Dario’s confidence. @NeelNanda5 putting out TransformerLens was great for increasing accessibility! Same with Google’s Gemma Scope. Those are progress, and increase the number of people that can contribute