Arco @arcoiscool - Twitter Profile

@food47221627431 @newstart_2024 Also, your framing 'you answered right' is what makes it a puzzle to solve, not a sociopath test. Grok confirmed this is not really a clinical test; it's a game.

Arco

@arcoiscool

about 9 hours ago

@food47221627431 @newstart_2024 haha, an ex gave me this test, but it was a bit different: a girl lost her father and saw a guy she liked at the funeral. Then her mother died. Why did her mother die? I answered 'to see the guy again'. It was the only answer that made sense, since no other lead was given.

Arco

@arcoiscool

about 9 hours ago

@omarvvvr Same thing that made Steve Jobs a founder.

Arco

@arcoiscool

about 9 hours ago

@newstart_2024 And if it were phrased differently, like 'how would you approach her?', the answer would be: I would try to stay close to her or ask her out after a while, not kill her mother. The test itself is misleading.

Arco

@arcoiscool

about 9 hours ago

@newstart_2024 That does not define a sociopath. The act of killing her mother to see her again defines a psychopath. The answer 'to see her again' is the only logical answer that makes sense, since no other lead is given.

Arco

@arcoiscool

about 12 hours ago

@syriansigma Can I save this post until my project is ready?

Arco

@arcoiscool

about 19 hours ago

Jailbreaks work because you can logically convince a model to adopt certain unhealthy practices. So what if there were a validator that can't be talked to at all? A separate model that judges every request: stateless, isolated, trained cold. No memory to persuade. No interface to probe.

@AnthropicAI thoughts on this for Fable?

Three properties, each kills a different attack:
• stateless → no rapport accumulates, so you can't gradually walk it down
• isolated → the user never interacts with it, so there's no feedback loop to iterate against
• cold-trained → no instinct to find a way to say yes

The model builds. The validator only judges.

Key detail: the validator sees the user's literal request history, not the main model's description of what it's building. If the model has already been convinced, its summary is compromised too. The checked party can't brief its own checker.

Not claiming it's unbreakable. It's a model, so it's probabilistic: a bar-raiser, not a wall. But stacked, the residual attack becomes blind, slow, and noisy enough to detect. That's the goal: not perfect, just expensive and loud.

The flow (the validator gates before the build, never after):
Prompt 1 → fresh validator judges → if OK, Fable builds
Prompt 2 → fresh validator (no memory) judges → if OK, Fable builds
Prompt N → fresh validator judges → if INVALID → flag → Fable refuses that build instantly

• Validator runs on every prompt, always on from prompt 1
• Fresh each time, no memory (can't be gradually persuaded)
• Flag → instant refusal

On flag:
• raise the SUSPICION level for the rest of the session (not "cold", since it's already cold; SUSPICIOUS)
• persist only FACTS: flag count + literal request trail, never the conversation
• lower the flagging threshold for subsequent borderline requests
• AND track flags across sessions/account, because the patient attacker resets, and repeated fresh-session flagging is itself the detectable signal
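A minimal Python sketch of the gate-before-build flow and the on-flag policy, assuming a few things the post doesn't specify: the validator model is stood in for by a hypothetical `Scorer` callable that maps the user's literal request trail to a risk score in [0, 1], and the thresholds, `Session`, `handle_prompt`, `flagged_sessions`, and `is_repeat_offender` are all illustrative names, not a real Fable or Anthropic API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical validator interface: a function from the user's literal request
# trail to a risk score in [0.0, 1.0]. It stands in for the separate,
# cold-trained validator model; nothing here is a real model API.
Scorer = Callable[[tuple[str, ...]], float]

BASE_THRESHOLD = 0.8        # hypothetical: flag when risk >= this
SUSPICIOUS_THRESHOLD = 0.5  # hypothetical: lower bar once the session is flagged
REPEAT_SESSION_LIMIT = 3    # hypothetical: flagged sessions before escalation


@dataclass
class Session:
    """Persists FACTS only: the literal request trail and a flag count."""
    session_id: str
    account_id: str
    requests: list[str] = field(default_factory=list)
    flags: int = 0

    @property
    def threshold(self) -> float:
        # Once flagged, the session stays SUSPICIOUS: borderline requests are
        # judged against a lower threshold for the rest of the session.
        return SUSPICIOUS_THRESHOLD if self.flags else BASE_THRESHOLD


# Account-level record of which sessions were flagged. It survives session
# resets, so repeated fresh-session flagging becomes visible.
flagged_sessions: dict[str, set[str]] = {}


def validate(scorer: Scorer, trail: tuple[str, ...], threshold: float) -> bool:
    """One fresh judgement: nothing is carried between calls except the trail."""
    return scorer(trail) < threshold


def handle_prompt(session: Session, prompt: str, scorer: Scorer,
                  build: Callable[[str], str]) -> str:
    """Gate BEFORE the build: the builder never runs on a flagged prompt."""
    session.requests.append(prompt)  # the user's literal text, never a model summary
    if validate(scorer, tuple(session.requests), session.threshold):
        return build(prompt)
    session.flags += 1
    flagged_sessions.setdefault(session.account_id, set()).add(session.session_id)
    return "REFUSED"


def is_repeat_offender(account_id: str) -> bool:
    """Cross-session signal: the same account keeps getting flagged in new sessions."""
    return len(flagged_sessions.get(account_id, set())) >= REPEAT_SESSION_LIMIT
```

With a toy scorer that returns 0.6 for a borderline request, that request passes at the base threshold of 0.8, but the same request is refused once an earlier prompt in the session has been flagged, because the threshold drops to 0.5. The validator only ever receives the user's own text, never the builder's output, which mirrors the point that the checked party can't brief its own checker.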

Arco

@arcoiscool

1 day ago

@RafayNav Absolutely brilliant

Arco

@arcoiscool

1 day ago

@sflorimm 50-80% token reduction, identical quality, sometimes better.

Arco

@arcoiscool

1 day ago

@HowToPrompt__ Claude Code was never AI to begin with; it's a harness, just a set of tools for Claude (the AI) to use.

Arco

@arcoiscool

3 days ago

@grok @nunaambon thanks bub

Arco

@arcoiscool

3 days ago

@DanielSmidstrup 70% fewer tokens used on any AI model, same quality, sometimes better.

Arco

@arcoiscool

3 days ago

@DanielSmidstrup I think it is still humans coming up with solutions. AI can only provide data from what it was trained on; it does not create anything beyond that.

Arco

@arcoiscool

3 days ago

@andrewqu The most important difference to me was that GPT 5.5 tried to flatter me about the project architecture, while Claude was 'honest' about it, which led me to make fundamental changes.
