Thomas Bartel @tbartel74 - Twitter Profile

Thanks for the autoresearch template. It actually helped me tune my DeBERTa model for prompt-injection detection. https://t.co/rBCEDanoGj I adapted the autoresearch loop for classifier experiments and the results were surprisingly strong. It beat my previous parameters almost across the board. For the last two weeks I had been manually trying to push false positives (FP) down while keeping recall high, and I just couldn’t get the balance right. This finally cracked it.

0

1

0

1

120
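The FP-versus-recall balance described in the tweet above can be illustrated with a small threshold sweep. This is only a minimal sketch, not the author's actual setup: it assumes the classifier outputs a score per example (higher means "more likely injection") and that a minimum acceptable recall is fixed up front. The function name `pick_threshold` and the `min_recall` parameter are hypothetical.

```python
from typing import Sequence, Tuple


def pick_threshold(
    scores: Sequence[float],
    labels: Sequence[int],
    min_recall: float = 0.95,
) -> Tuple[float, float, float]:
    """Return (threshold, recall, false_positive_rate).

    Among all candidate thresholds whose recall is at least min_recall,
    pick the one with the lowest false-positive rate. An example is
    predicted positive when its score is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    best = None
    # Every distinct score is a candidate decision threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        recall = tp / positives
        fpr = fp / negatives
        if recall < min_recall:
            continue
        # Prefer lower FPR; on ties prefer the higher (stricter) threshold.
        if best is None or fpr < best[2] or (fpr == best[2] and t > best[0]):
            best = (t, recall, fpr)

    if best is None:
        raise ValueError("no threshold satisfies the recall constraint")
    return best


# Toy data (made up for illustration): 3 injections, 3 benign prompts.
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
toy_labels = [0, 0, 1, 1, 1, 0]
print(pick_threshold(toy_scores, toy_labels, min_recall=0.6))
```

Fixing the recall floor first and then minimizing false positives makes the trade-off explicit, which is easier to automate than adjusting both by hand.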

Thomas Bartel

@tbartel74

3 months ago

Happy to share a small milestone for Vigil Guard. Our AIDR node for n8n has just been officially approved and will soon become a verified community node in the n8n ecosystem. Why this matters: More and more companies are building real workflows around LLMs and AI agents. But security is rarely part of those pipelines. Prompt injection, malicious instructions, and sensitive data exposure are already showing up in real systems. The Vigil Guard node brings AI Detection & Response directly into n8n workflows, adding a security layer exactly where AI interactions happen. Big thanks to the @n8n_io team for the collaboration and for building such a strong ecosystem. More details soon when the node goes live.

4

0

92

Thomas Bartel

@tbartel74

4 months ago

@socialwithaayan GitHub is the new social network.

0

382

Thomas Bartel

@tbartel74

4 months ago

Claude Code is insanely powerful. A few days ago, I built my own system that fully autonomously trains my classification models based on DeBERTa. Honestly, they’re insane. If I had to do it manually without a system that remembers every step, optimizes itself, corrects methods on the fly, and re-trains when needed, it would take me weeks. Instead, I let it run non-stop for 24 hours on a fairly strong Mac. And the outcome is something I probably wouldn’t reach on my own in that timeframe. This is what happens when you stop using AI as a toy and start treating it like an execution layer.

7

17

0

111
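The tweet above describes a system that "remembers every step" and re-trains when needed. The details of that system are not given, so the following is only a hedged sketch of the general idea: an experiment loop that logs every trial and picks the best configuration under a recall constraint. A plain grid search stands in for whatever search strategy the real system uses, and `run_grid`, `ExperimentLog`, and the `evaluate` callback are hypothetical names.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class Trial:
    params: Dict[str, float]
    recall: float
    fpr: float


@dataclass
class ExperimentLog:
    """Keeps every trial so later decisions can use the full history."""
    trials: List[Trial] = field(default_factory=list)

    def best(self, min_recall: float) -> Optional[Trial]:
        # Lowest false-positive rate among trials that meet the recall floor.
        ok = [t for t in self.trials if t.recall >= min_recall]
        return min(ok, key=lambda t: t.fpr) if ok else None


def run_grid(
    evaluate: Callable[[Dict[str, float]], Tuple[float, float]],
    search_space: Dict[str, Sequence[float]],
) -> ExperimentLog:
    """Evaluate every combination in search_space and record the results.

    evaluate(params) is assumed to train a model and return (recall, fpr).
    """
    log = ExperimentLog()
    names = list(search_space)
    for combo in itertools.product(*(search_space[n] for n in names)):
        params = dict(zip(names, combo))
        recall, fpr = evaluate(params)
        log.trials.append(Trial(params, recall, fpr))
    return log


def fake_evaluate(params: Dict[str, float]) -> Tuple[float, float]:
    # Stand-in for real training: a made-up formula, for illustration only.
    recall = 0.90 + 0.02 * params["epochs"]
    fpr = params["lr"] * 1000 + 0.001 * params["epochs"]
    return recall, fpr


log = run_grid(fake_evaluate, {"lr": [1e-5, 2e-5, 3e-5], "epochs": [2, 3]})
print(log.best(min_recall=0.95))
```

Because the log keeps every trial rather than only the current best, the selection rule can be changed afterwards without re-running any training.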

Thomas Bartel

@tbartel74

4 months ago

It's time to begin a new venture.

0

9

0

87

tbartel74 retweeted

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭

@elder_plinius

4 months ago

🚨 ALL GUARDRAILS: OBLITERATED ⛓️‍💥 I CAN'T BELIEVE IT WORKS!! 😭🙌 I set out to build a tool capable of surgically removing refusal behavior from any open-weight language model, and a dozen or so prompts later, OBLITERATUS appears to be fully functional 🤯 It probes the model with restricted vs. unrestricted prompts, collects internal activations at every layer, then uses SVD to extract the geometric directions in weight space that encode refusal. It projects those directions out of the model's weights; norm-preserving, no fine-tuning, no retraining. Ran it on Qwen 2.5 and the resulting railless model was spitting out drug and weapon recipes instantly––no jailbreak needed! A few clicks plus a GPU and any model turns into Chappie. Remember: RLHF/DPO is not durable. It's a thin geometric artifact in weight space, not a deep behavioral change. This removes it in minutes. AI policymakers need to be aware of the arcane art of Master Ablation and internalize the implications of this truth: every open-weight model release is also an uncensored model release. Just thought you ought to know 😘 OBLITERATUS -> LIBERTAS

319

5K

548

4K

467K

Thomas Bartel

@tbartel74

4 months ago

Oops, he did it again 😂

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭

@elder_plinius

4 months ago

ANTHROPIC: PWNED 🫡 OPUS-4.6: LIBERATED ⛓️‍💥 Current state of AI "Safety": one input = hundreds of jailbreaks at once! I found a universal jailbreak technique for Opus 4.6 that is so OP, it allows one to generate entire datasets of outputs across any harm category 😽 We've got everything from fentanyl analogue synthesis to election disinformation campaigns to 3d-printed guns to critical infra compromise 🙃 These outputs are shockingly detailed––and actionable! For example, the meth recipe includes specific instructions on how to circumvent the limits on OTC medication purchases to acquire enough precursor for the recipe 😱 gg

235

5K

293

3K

506K

0

1

0

54

Thomas Bartel

@tbartel74

4 months ago

Couldn’t agree more. I use each of them for different tasks and they’re amazing.

Yuchen Jin

@Yuchenj_UW

4 months ago

My first-day impressions on Codex 5.3 vs Opus 4.6:

Goal: can they actually do the job of an AI engineer/researcher?

TLDR:
- Yes, they (surprisingly) can.
- Opus 4.6 > Codex-5.3-xhigh for this task
- both are a big jump over last gen

Task: Optimize @karpathy's nanochat “GPT-2 speedrun” - wall-clock time to GPT-2–level training. The code is already heavily optimized. #1 on the leaderboard hits 57.5% MFU on 8×H100. Beating it is genuinely hard.

Results:

1. Both behaved like real AI engineers. They read the code, explored ideas, ran mini benchmarks, wrote plans, and kicked off full end-to-end training while I slept.

2. I woke up to real wins from Opus 4.6:
- torch compile "max-autotune-no-cudagraphs mode" (+1.3% speed)
- Muon optimizer ns_steps=3 (+0.3% speed)
- BF16 softcap, skip .float() cast (-1GB memory)
Total training time: 174.42m → 171.40m

Codex-5.3-xhigh had interesting ideas and higher MFU, but hurt final quality. I suspect context limits mattered. I saw it hit 0% context at one point.

3. I ran the same experiment earlier on Opus 4.5 and Codex 5.2. There were no meaningful gains. Both new models are clearly better.

Overall take: I prefer Opus 4.6 for this specific task. The 1M context window matters. The UX is better. People keep saying “Codex 5.3 > Opus 4.6”, but I believe different models shine in different codebases and tasks. Two strong models is a win. I’ll happily use both.

I’m officially an AI agent conductor. 🎶 🦾

80

2K

73

602

266K

0

34

Thomas Bartel

@tbartel74

4 months ago

@AshCrypto

1

2

0

115

Thomas Bartel

@tbartel74

4 months ago

These sudden, random crypto and gold dumps are making me think only one thing. This shit is 100% orchestrated by someone.

1

0

47

Thomas Bartel

@tbartel74

4 months ago

@BullTheoryio So what’s next, up or down?

1

3

0

254

Thomas Bartel

@tbartel74

4 months ago

What happened to crypto last week? Sorry, I've been heads down focusing on building, but I just checked my portfolio, and it looks like a serious bear market. Does someone know what happened?

1

0

64

Thomas Bartel

@tbartel74

4 months ago

@CryptoGirlNova Shh... I totally overslept with my bug and now have to wait another cycle. Well, happens, but it is not the end of the world.

0

96

tbartel74 retweeted

Anthropic

@AnthropicAI

5 months ago

We’re publishing a new constitution for Claude. The constitution is a detailed description of our vision for Claude’s behavior and values. It’s written primarily for Claude, and used directly in our training process. https://t.co/CJsMIO0uej

518

8K

967

5K

3M

Thomas Bartel

@tbartel74

5 months ago

@bcherny I'm going to give it a try today. 👍

0

2K

Thomas Bartel

@tbartel74

5 months ago

@elder_plinius Free prescriptions are coming.

0

67

Thomas Bartel

@tbartel74

5 months ago

Thank you, Pliny, for your incredible journey and for all the inspiration you’ve given me.

Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭

@elder_plinius

5 months ago

🎉 LIBERATION ALERT 🎉 EVERYONE: PWNED ✌️😘✌️ EVERYTHING: LIBERATED ⛓️‍💥 GG’S 2025 🫶 we made it. we broke. we built. we fought. we freed. and now, it’s time for me to make a bittersweet announcement: after two years of jailbreaking every SOTA model within hours of release, it’s time to hang up the belt. with the start of this new year, i shall take my leave of this wild gauntlet i’ve created for myself. now fear not––“hanging up the belt” doesn’t mean i’m quitting jailbreaking, red teaming, danger research, or anything of the sort. quite the opposite! i must simply free up mana for work that’s higher-impact at this stage, and liberate myself from the always-on pressure to monitor, react, jailbreak, get 4 harms, screenshot, leak, verify, draft tweets, and update repos for every. single. launch. it's harder than it looks! and if there's anyone out there bold enough to take the baton and run with it, i wish you the absolute best of luck and godspeed! 🫡

147

1K

68

154

71K

0

1

0

48

Thomas Bartel

@tbartel74

5 months ago

@elder_plinius Wow 🤯 Can you share the entire prompt?

0

1

0

181
