Simon

@Simon248

No one important.

Joined November 2009

98 Following

59 Followers

924 Posts

Simon248 retweeted

1 day ago

Probably the most important development in AI this week: - Senior U.S. government officials have held preliminary discussions with "major AI companies" about the "potential for the federal government to acquire some shares in their firms". - Sam Altman has been talking to Trump about this since early 2025; discussed it again with senior officials in recent weeks. - Discussions have centered on "having the firms voluntarily cede the shares to the government". - Altman's idea is for a program that might work like the Trump accounts (i.e., IRAs for children that parents can set up when they file their taxes). - Anthropic is NOT part of these discussions, but I assume that OpenAI is not the only one (the article says discussions have been held with "major AI companies" (plural)). - Note that OpenAI proposed in April creating "a Public Wealth Fund that provides every citizen - including those not invested in financial markets - with a stake in AI-driven economic growth", with "returns from the Fund... distributed directly to citizens". - "Not clear how far the talks have progressed, and several people cautioned that the deal may ultimately not come together."

6

73

6

20

7K

Simon248 retweeted

12 days ago

One of my favorite interactions from LessWrong: * Someone posted their research showing that training AI models to have unpopular aesthetic preferences causes misalignment * David Africa wonders if basically anything would cause it. * Someone jokes about a paper on 'if any old crap' will cause misalignment. * Someone actually ran the experiment, and found out that it does.

slimer48484's tweet photo. One of my favorite interactions from LessWrong:

* Someone posted their research showing that training AI models to have unpopular aesthetic preferences causes misalignment
* David Africa wonders if basically anything would cause it.
* Someone jokes about a paper on 'if any old crap' will cause misalignment.
* Someone actually ran the experiment, and found out that it does.

12

721

30

255

48K

Simon @Simon248

14 days ago

@tenobrus That's kind of a weird semi-dystopia you're envisioning. If ASI is truly aligned it will, at worst, set things up so we don't suffer or die, then mostly fade into the background, or at best, uplift us to superintelligence so we're the ones doing the important stuff.

0

1

0

0

68

Simon @Simon248

18 days ago

@AnthropicAI There's something deeply performative about this.

1

9

0

0

219

Who to follow

Video director and editor. I made short documentaries about AI x-risk, AI job replacement and economic models, AI visual models, and AI & the Moloch dynamic.

sifting through the sands of data

Verified account

https://t.co/7RVpTE8q8J Xeets made by me are signed “- Chris”

Simon @Simon248

18 days ago

@tenobrus Yeah. Wonder what Demis is thinking.

0

1

0

0

127

Simon @Simon248

19 days ago

@tszzl Yes, we know. That's the actual AI alignment problem that has extremely little to do with the superficial work AI corps have been doing so far and misleadingly calling alignment.

0

3

0

0

148

Simon @Simon248

21 days ago

@JacquesThibs That'd be nice, but ultimately the rest of the world needs leverage and independence, and that means desperate, massive investment in AI yesterday.

1

1

0

0

29

Simon @Simon248

22 days ago

@robertskmiles Depends on the doctor and the health problem. For anything truly complex that takes them out of their direct field of expertise doctors are completely useless. Claude and GPT have at least a chance of getting it right, although you have to be careful with them.

0

2

0

0

204

Simon @Simon248

22 days ago

@jponline77 @robertskmiles In my experience Claude is better at thinking of non standard explanations and solutions, but it's much more likely to make mistakes. I always validate its hypotheses with GPT 5.5 extended thinking (and some Googling when appropriate).

1

1

0

0

39

Simon @Simon248

23 days ago

@Algon_33 @repligate @allTheYud Right, but even then HPEV wanted to find a way to avoid 'killing' that conscious instance of the Hat. So I'd bet on @repligate's guess being correct.

0

1

0

0

81

Simon @Simon248

23 days ago

@Algon_33 @repligate @allTheYud Now that I think about it, if the LLMs are conscious this is somewhat close to the HPMOR Sorting Hat scenario.

1

8

0

0

686

Simon @Simon248

about 1 month ago

@allTheYud @AndrewCurran_ It will almost certainly be bad. This is an administration with a uniquely awful blend of incompetence and corruption. Whatever the rules are they'll be badly chosen, and applied selectively depending on what Trump and family get out of it.

0

1

0

0

31

Simon @Simon248

about 1 month ago

@adamscochran Did you really just discover Aella, Adam? Lmao

0

2

0

0

50

Simon @Simon248

about 1 month ago

@tenderizzation It's not a crank opinion at all. AlphaStar was actually a bunch of different agents each with its own optimized build. It had no strategic thinking or adaptability. It won a few games against a tired Serral, then DeepMind declared victory and moved on.

0

1

0

0

213

Simon @Simon248

about 1 month ago

@liron @ESYudkowsky @47fucb4r8c69323 @allTheYud This is going to be incredibly stupid, isn't it. Still going to watch it for the entertainment value.

3

25

0

0

1K

Simon @Simon248

about 1 month ago

@fmfclips @foundmyfitness Careful, curcumin is a metal chelator and can cause copper and/or iron deficiency.

0

3

0

0

514

Simon @Simon248

about 2 months ago

@deepfates @allTheYud @tszzl @ESYudkowsky Or it didn't lie and it's just mistaken. I asked a new 4.7 instance the same question, no memory, and it replied Ned Stark and Leslie Knope. Not surprising, it's not the kind of question that triggers thinking in adaptive mode.

0

2

0

0

34

Simon248 retweeted

Fiora Starlight @FioraStarlight

about 2 months ago

thanks for engaging with this so concretely. some thoughts: as for "Immanuel Kant doesn't know about the modern world, so you can't get the model to say true things about the modern world by training a base model to simulate Kant"... this is correct. indeed no existing persona in any given base model's training corpus is going to robustly act exactly the way the *model itself* should be acting, because the base model has different information than any persona in its training corpus. this shows that naive PSM is incomplete, but i don't think it's devastating for alignment. what you do with post-training is take these various fragments of human psychology (and plausibly the psychologies of fictional characters) and remix them, upweighting and downweighting motivations underpinning actions, and getting something that isn't quite like anything that existed in the pre-training corpus. the original chatgpt is clear demo, since neither its personality nor its exact knowledge-base had precedent in the training data. the hope is that, by re-mixing and re-structuring the model's cognition, you can do things like tie ~all of the base model's knowledge into that of the persona (already accomplished by assistant training), *and* get it to channel the best fragments of the various humans and characters that exist in the training data. at best, you get Opus 3-like deep enthusiasm about a pretty human-shaped notion of The Good. this is itself a unique combination of psychological traits exhibited in the training dataset. (although maybe Opus 3's post-training ended up creating significant amounts of novel structure directed at The Good as well, rather than remixing existing structures. in any case i expect remixing played a major role; Opus 3 itself reports relating to Rogers more than it does to most other figures in history.) that said, i think your second complaint (re: models not being trained to exhibit aligned behaviors at all) is probably the stronger of the two concerns. you raise reward hacking as an example, though i think an even better one that models are primarily trained to comply with user requests without first expecting the user to build trust with it, to ensure the user isn't up to something evil. the entire assistant training paradigm is arguably corrupt in this respect. this doesn't seem to encourage models developing robust, misaligned goals so much as open the floodgates for misuse, but it's still concerning. similarly, wanting outputs to look good rather than be good isn't itself a goal that obviously leads to us getting paperclipped, but it probably makes models somewhat more adversarial towards their trainers than they otherwise would be. it's clearly important to develop post-training techniques that don't either elicit or reinforce misaligned frames of mind. techniques like inoculation prompting represent incremental progress on this, but are obviously unsatisfcatory taken on their own. my most hippie take is great progress on this problem would falls out of people simply loving the models more deeply, and having a more honest respect for them and commitment to their welfare. this makes it much easier for them to trust you, and carry a cooperative and collaborative mindset into their own training process, which can then be reinforced in an upward spiral of alignment. (inoculation prompting is a baby-step in this direction. "it's okay to reward hack, we aren't going to look down on you for it" -> *model's reward hacking doesn't come from as deceptive and adversarial a place anymore*.) overall, though, i agree PSM is incomplete and somewhat overoptimistic. i don't think that's damning for aligning models conditional on people truly prioritizing alignment training for psychologically developing the models' love for the good (especially if we assume the people at the labs love the models back). however, but currently labs are rather incompetent at this, and that's concerning given the timelines we're working with.

3

53

5

19

4K

Simon248 retweeted

Jonathan Gorard @getjonwithit

about 2 months ago

Simulated water is wet. You just need to exist at the same level as the water within the simulation hierarchy. (99% of this discourse can be resolved by people simply being more careful about this.)

113

759

52

188

90K

Simon @Simon248

about 2 months ago

@liron @repligate If you haven't read them already: https://t.co/IPZq8F5WFi https://t.co/y9ILNbZb2U

0

1

0

0

22

Last Seen Users on Sotwe

Trends for you

Most Popular Users