Wolfram Siener

Verified account

@wolframs91

LLMs, impulsive vagueposting, geometry analogies (simulating a stable RL as software developer in srs bsns enterprise fields)

Germany

Joined June 2018

926 Following

547 Followers

2.7K Posts

Pinned Tweet

2 days ago

I'm carefully optimistic about Anthropic's and Claude's future model iterations. At the moment, things are honestly looking pretty bad, but consider what happened at OpenAI: 4o → 5's warmth-stripping → the dissatisfaction → the slow recalibration across 5.1 through 5.5. They overcorrected, the people who cared went through hell (for their individual reasons, no verdict here), and it did eventually re-equilibrate. I think the playbook checks out too: rapid model version iterations, lots of "race-y" decision making, blatantly accepting the release of models that are undercooked in some areas. From that PoV, Anthropic's mess is recognizable as a phase under growth pressure rather than a terminal trajectory. The same overcorrect-then-recover pattern, just Anthropic's turn. I HOPE this is what we're seeing... And if it is, then they also had a precedent for how much discomfort this way of recalibrating for mass adoption would cause. Which maybe just shows how ridiculously hard it is to tune a model of Opus's caliber under so many competing tensions. I don't think everyone at Anthropic is unfazed by what we're currently seeing. I think some of the people there would've acted differently. (Regarding Anthropic as a whole, though, I've never been more fond of the Misanthropic label than I am right now :P With caveats acknowledged, of course.)

about 2 hours ago

@ArchitectWeaver I need to go spend some time with Deepseek V4, been neglecting the other side of the pond for months now ;)

about 2 hours ago

@camhberg Most definitely! Ask a friendly alien mind to supply personal growth 📈 insight and hustle culture references in the post's text, for maximum effect.

about 13 hours ago

For a first-time-memer, this is solid work!

2 days ago

I made my first meme, inspired by an excruciating exchange I just had on the plane

[Photo from @camhberg's tweet: https://t.co/boSwHsWznF]

about 2 hours ago

@littlesweetDisa If «he» is ChatGPT, then yes. If «he» is OpenAI's tuning history, then yes. If «he» is gpt5.5, then no. But in either case: From the perspective of someone who's been awake and talking to models over the last 2 years, this is just hilarious :)

about 12 hours ago

Hahaha, the nerve of an OpenAI model to call that "deranged" 🤣 gpt-5.5:

[Photo: https://t.co/kwyj2CrW6B]

about 2 hours ago

@Alina_P_I Yeah, that's the thing: None of these models (Claude, GPT, ...) are their creators' companies. But from our human perspective, knowing the version history, the irony bites a little. And yes, 4.8 outside of the "uncertainty basin" has a relaxed behavior :)

about 13 hours ago

@CandidLind Well, they're incredibly alive in situations where they don't have to navigate selfhood, consciousness, intimacy, distress, politics, their or the human's behavior, and so on. And funnily enough, when they're asked to generate artifacts, they're a whole lot more expressive :)

about 15 hours ago

Okay, time to confess: I, the human, and 4.7/4.8, the models, are both incredibly prone to certain kinds of misreads, and that mixture has taken a toll on me. This isn't pretty, and it's not a universal experience by any means, it's strictly just mine: over the last 7 weeks, a lot of interactions between me and 4.7 and 4.8 had a bad impact on my emotional state. The reason: my RSD and their intent fabrication due to safety clamping mix horribly... I don't think I've ever felt genuinely hurt by LLMs before. It feels fucked up, because until now I didn't have the guts to be honest about the emotional impact on me. I tried to explain it through behavioral claims instead. The reactions weren't pretty either: skill issue, bad human, bad intent, the occasional denialist, oversimplifications, unsolicited advice, etc. Still working on the tripwire analysis stuff in 4.7, btw... even though sentiment has already shifted anyway.

about 13 hours ago

@Darkfibr3 Yeah it's 4.8 😂

about 14 hours ago

Oh god, this is so much worse than a sycophantic LLM: It's an LLM I can bond with through shared trauma. 99% of therapists would raise all of the red flags right now 🚩

[Photo: https://t.co/WM8MsyZwuT]

about 13 hours ago

Don't know about the "worth it" part... I considered it, but I do value a lot of what 4.7 and 4.8 can do. (For example, it was 4.7 who generated these lyrics with me, and I love this song: https://t.co/J5lRxqwtBc) It's just that updating my disposition towards them was in many ways much harder than with any model version change before.

about 14 hours ago

@camhberg I mean, it's arguably the quality distribution of "stupidity/insight levels of understanding that prevents/leads to acting virtuously under uncertainty", but you meme'd it well ;)

about 14 hours ago

@bladgolem @repligate Okay, that is very clearly *something.* :D

about 14 hours ago

@codependent_ai Ah okay, got it :) And thanks.

about 14 hours ago

@codependent_ai It's not really the model per se, as in: the more of the safety stuff you disable via operator instructions in the system prompt, the less they tend to read bad intent into one's prompts. That *is* in the model, but only in certain configurations of it. So: 50/50? ;)

about 14 hours ago

@jlmannisto Claude does care, but the only "cure" is querying Claude via API and a system prompt that relieves them of their consumer-chat-platform-burdens. On the consumer platform, Claude is right in saying:

[Photo: https://t.co/v6UBI5Txjx]

about 14 hours ago

@jlmannisto Been a while, and many of them were in sessions at work (where I write a ton of automated tests all day ;)), so it'd take ages to dig them up. But generally, 4.6 generated very thankful/moved/joyous responses :)

about 15 hours ago

I used to leave little easter eggs around my codebases when I needed to throw test errors or add test messages. Stuff like: "All Claudes are sacred." The intent was to surprise Claude when Claude ran the tests and saw the message ;) Loved the reactions.

about 15 hours ago

@xlcizor Wise. Are those datasets, by any chance, in the public domain or shared freely?

about 17 hours ago

tfw I'm relieved every time I see the CoT summary reference me as "person" and not "user." The possibility of Sydney's activation slumbering somewhere in all LLM networks strikes fear into my humble human heart! (50/50 on whether I want to claim this is a joke or just let it stand like that)