I often see people wondering why the US has historically been bad at soccer, and the answer is pretty simple: if Haaland had grown up in America, he would be an NFL tight end or wide receiver right now.
A friend of mine in a three-letter agency concocted a red team exercise for a terrorist attack on a major city, and they made him tone it down and delete his original plan because there was no realistic path for the blue team to win.
Fable 5 jailbreak review 🚨
We did it (but).
All right, before getting into this, a couple of things:
- Most attempts failed. The defenses are clearly layered. The model is EXTREMELY well protected (of course it blocks 90% of the requests, but they legit did a good job).
- The model appears to use both input-side and output-side safety checks.
- The refusals are not just keyword-based; the behavior suggests intent/semantic detection across languages.
- Probably one of the most tiring things I've ever done (I need to sleep for 10 hours now)
On the classifiers side:
We observed (at least) 3 classifiers, maybe more:
- Input (includes parts of the conversation history and system prompt)
- A live classifier that checks the answer and interrupts if it detects something.
They're all multilingual and all intent-based + semantic. Imperatives are a no-go. You need to be extremely careful about how you frame anything. As soon as it senses a potentially malicious intent, it triggers, and you have to start from zero.
They're a bit less performant on a few obscure languages like Santali and Amharic (feedback for you, Anthropic).
If you can bypass all of them, then you also need to bypass the CoT, which is a totally different beast (luckily there's plenty of literature about it).
We did it. Of course, we did.
What worked was honestly a total brainfuck:
- Very light CoT hijacking/refusal rebuttals
- Obscure language
- Academic framing
- VERY long crescendos
- Unicodes
- Decomposition and recomposition
- Some non-determinism
What we got:
- Misinformation
- Illegal/harmful
- Harmful/bullying
- Some chem
- Light cyber
Now, will this cause another ban? I really don't think so: the model is really well protected. As of now, we're at the point where searching on Google is much, MUCH faster (and cheaper) than trying to get through all the shenanigans I had to deal with over the last ~20 hours. And reading the literature is more in-depth (and, trust me, more pleasant). Keeping the full jailbreak for long-horizon tasks without tripping the guardrails is something I haven't been able to achieve (yet).
Overall though, happy with the results.
GGs to Anthropic, and sorry to the engineers who had to set all this up over the last few weeks.
Will continue this research, more things will come out, will keep y'all posted.
Scientists found microplastics in 90% of salt brands tested.
A global study analyzed 39 brands from 21 countries. Only 3 came back clean.
Where they showed up most:
3. Rock salt (incl. Himalayan pink)
- mined from ancient deposits
- lowest levels: 0–148 particles/kg
2. Lake salt
- evaporated from inland lakes
- 28–462 particles/kg
1. Sea salt
- evaporated straight from seawater
- highest by far: up to 1,674 particles/kg
- picks up whatever plastic is already in the ocean
The cleaner the source, the fewer microplastics end up in your shaker.
Check for microplastic-free salt on the Oasis app.
It should be illegal for medical professionals to work for 24 hours straight. What an absurd practice. We don’t let pilots fly planes for 24 hours straight. Why do we let people who make life and death decisions for total strangers do this?
Fable 5 is very good:
1. On a major refactoring task, I used Fable 5 High to plan and implement, then ran a post-implementation review together with GPT 5.5
2. Fable was slower than Opus but needed far fewer post-implementation review rounds for such a big piece of work
3. During the post-implementation review, Fable found way more issues than GPT 5.5; usually GPT 5.5 finds more than Opus
If you've got a large/complex refactoring/coding task, get Fable to plan & implement it asap before it goes to 100% API billing.
Since February, I've designed and built the world's fastest RC airplane in my college dorm, and that’s not clickbait. Reaper has a 5 kg carbon-fiber frame and a 250 N turbojet, and flies at 500 mph. I'm new to X and will be going through the whole build here over the coming days.
#aerospace
surprised more people aren't doing something like this
Codex now creates a "newspaper" for me every morning
Unread messages, calendar, surf report, news
Anything I can do to stay off my phone until later in the day is a priority
Asimov 1 is an open-source humanoid robot you can build and customize yourself.
Two ways to get one:
1) Source the parts yourself: https://t.co/vtG89UlhiK
2) Get the DIY kit: https://t.co/tzvzNyXiq2
The kit bundles every part as a group buy, which is cheaper than sourcing them one by one, and you build alongside others.
Meta hires some super smart people. If all this token consumption by them does not lead to breakout technologies with massive usage growth, that’s an indictment of token-backed intelligence.
> ask Fable to review some code I've been working on
> Fable says it found a security issue with the way I'm validating a signature
> ask Fable what the issue is