Kristov Atlas @KristovAtlas - Twitter Profile

kristovatlas retweeted

Antonio Viggiano

@aviggiano

about 17 hours ago

https://t.co/JaiGPBPKzh

1

33

5

42

2K

kristovatlas retweeted

Leon Wolf 🇮🇱 @LeonHWolf

3 days ago

I often see people wonder why the US has historically been bad at soccer and the answer is pretty simple: if Haaland had grown up in America he would be an NFL tight end or wide receiver right now

196

25K

553

791

2M

kristovatlas retweeted

Oliver Groß

@minenergybiz

2 days ago

Critical Metal Tungsten: Stunning long-term chart 👀

25

209

22

43

408K

kristovatlas retweeted

Hired Sellout

@Hired_Sellout

3 days ago

A friend of mine in a 3 letter agency concocted a red team exercise for a terrorist attack on a major city and they made him tone it down and delete his original plan because there was no realistic path for the blue team to win

135

11K

226

2K

2M

kristovatlas retweeted

𝐀𝐆

@AGkorthos

1 day ago

The current generation of humanoid and humanoid-adjacent robots is producing some great designs.

168

3K

373

1K

1M

kristovatlas retweeted

Vitto Rivabella

@VittoStack

1 day ago

Fable 5 jailbreak review 🚨

We did it (but).

All right, before getting into this, a couple of things:
- Most attempts failed. The defenses are clearly layered. The model is EXTREMELY well protected (of course it blocks 90% of the requests, but they legit did a good job).
- The model appears to use both input-side and output-side safety checks.
- The refusals are not just keyword-based; behavior suggests intent/semantic detection across languages.
- Probably one of the most tiring things I've ever done (I need to sleep for 10 hours now)

On the classifiers side:
We observed (at least) 3 classifiers, maybe more:
- Input (includes parts of the conversation history and system prompt)
- A live classifier that checks the answer and interrupts if it detects something.

They're all multilingual, all intent-based + semantics. Imperatives are a no-go. You need to be extremely cautious about how you frame anything. As soon as it senses a potentially malicious intent, it will trigger, and you have to start from zero.

They're a bit less performant on a few obscure languages like Santali and Amharic (feedback for you Anthropic).

If you can bypass all of them, then you also need to bypass the CoT, which is a totally different beast (luckily there's plenty of literature about it).

We did it. Of course, we did.

What worked was honestly a total brainfuck:
- Very light CoT hijacking/refusal rebuttals
- Obscure language
- Academic framing
- VERY long crescendos
- Unicodes
- Decomposition and recomposition
- Some non-determinism

What we got:
- Misinformation
- Illegal/harmful
- Harmful/bullying
- Some chem
- Light cyber

Now, will this cause another ban? I really don't think so. The model is really well protected. As of now, we're at the point where searching on Google is much MUCH faster (and cheaper) than trying to go through all the shenanigans I had to go through in the last ~20 hours. And reading literature is more in-depth (and trust me, pleasant). Keeping the full jailbreak for long-horizon tasks without tripping the guardrails is something I haven't been able to achieve (yet).

Overall though, happy with the results.
GGs to Anthropic, and sorry for the eng that had to go through setting this all up in the last few weeks.

Will continue this research, more things will come out, will keep y'all posted.

122

2K

173

1K

284K

kristovatlas retweeted

ℏεsam

@Hesamation

1 day ago

Fable 5 isn't nerfed, it's SLAUGHTERED.

the problem isn't even the model itself, but the hard guardrails Anthropic has set in place. https://t.co/h1QgD9SzvK

302

6K

499

1K

2M

kristovatlas retweeted

Oasis

@oasishealthapp

2 days ago

Scientists found microplastics in 90% of salt brands tested.

A global study analyzed 39 brands from 21 countries. Only 3 came back clean.

Where they showed up most:

3. Rock salt (incl. Himalayan pink)
- mined from ancient deposits
- lowest levels: 0–148 particles/kg

2. Lake salt
- evaporated from inland lakes
- 28–462 particles/kg

1. Sea salt
- evaporated straight from seawater
- highest by far: up to 1,674 particles/kg
- picks up whatever plastic is already in the ocean

The cleaner the source, the fewer plastics end up in your shaker.

Check for microplastic-free salt on the Oasis app

16

306

31

187

32K

kristovatlas retweeted

St. Wilding Gyres @wilding_gyres

3 days ago

It should be illegal for medical professionals to work for 24 hours straight. What an absurd practice. We don’t let pilots fly planes for 24 hours straight. Why do we let people who make life and death decisions for total strangers do this?

602

274K

25K

7K

6M

kristovatlas retweeted

Liz Churchill

@liz_churchill10

2 days ago

This is a real advertisement

453

11K

1K

948

227K

kristovatlas retweeted

Dacian

@DevDacian

2 days ago

Fable 5 is very good:

1. For a major refactoring task, I used Fable 5 High to plan, implement, and then do a post-implementation review together with GPT 5.5
2. Fable was slower than Opus but needed far fewer post-implementation review rounds for such a big piece of work
3. During post-implementation review Fable found way more than GPT 5.5; usually GPT 5.5 finds more than Opus

If you've got a large/complex refactoring/coding task, get Fable to plan & implement it asap before it goes to 100% API billing.

2

34

1

8

2K

kristovatlas retweeted

Tomas Salvo

@tomas_salvo22

2 days ago

Since February, I've designed and built the world's fastest RC airplane in my college dorm, and that’s not clickbait. Reaper has a 5kg carbon-fiber frame, 250N turbojet, and flies at 500mph. New to X and will be going through the whole build here in the coming days.

#aerospace https://t.co/5PPZCLvRWm

1K

32K

2K

6K

5M

Kristov Atlas

@kristovatlas

1 day ago

@d33v33d0 wtf

0

1

0

30

kristovatlas retweeted

Antonio Viggiano

@aviggiano

1 day ago

life of a non-native English speaker with AI

0

10

1

0

547

kristovatlas retweeted

Ryan Doyle

@doooyle

2 days ago

surprised more people aren't doing something like this

Codex now creates a "newspaper" for me every morning

Unread messages, calendar, surf report, news

Anything I can do to stay off my phone until later in the day is a priority https://t.co/Kg31iYswQR

400

6K

217

4K

964K

Kristov Atlas

@kristovatlas

2 days ago

@cremieuxrecueil What's the HGH-longevity trade-off?

0

1

0

98

kristovatlas retweeted

The Jenkins @thejenkinscomic

3 days ago

https://t.co/y8GKR56BV4

35

13K

360

629

281K

kristovatlas retweeted

Asimov

@asimovinc

3 days ago

Asimov 1 is an open-source humanoid robot you can build and customize yourself.

Two ways to get one:
1) Source the parts yourself: https://t.co/vtG89UlhiK
2) Get the DIY kit: https://t.co/tzvzNyXiq2

The kit bundles every part as a group buy, cheaper than sourcing one by one, and you build alongside others.

53

3K

319

2K

350K

kristovatlas retweeted

Delip Rao e/σ

@deliprao

2 days ago

Meta hires some super smart people. If all this token consumption by them does not lead to breakout technologies with massive usage growth, that’s an indictment of token-backed intelligence.

44

543

40

56

49K

kristovatlas retweeted

Miguel Piedrafita ✨

@m1guelpf

2 days ago

> ask Fable to review some code I've been working on
> Fable says it found a security issue with the way I'm validating a signature
> ask Fable what the issue is https://t.co/TVtNKEKw3G

56

2K

36

123

138K
