Teague Lasser — e/acc

@machinegod

Building an ecosystem for AI and people to work together @ subseq. Formerly @KernelCo, @SpaceX.

The Wired

Joined March 2008

489 Following

812 Followers

2.6K Posts

Teague Lasser — e/acc

@machinegod

29 days ago

Full paper: https://t.co/yTFqxGJLVu

Companion paper containing additional proof machinery: https://t.co/1nP0SfME7F

The entire 12-paper sequence can be found at https://t.co/LkMp9YPzPf

Teague Lasser — e/acc

@machinegod

29 days ago

We have proven a theorem whose alignment guarantees hold regardless of agent capability. This is huge: alignment becomes a structural property of the agent's deployment, even if the agent is superintelligent, and it can be maintained through recursive self-improvement (RSI) without a capability arms race.

https://t.co/NWxycVb3ls


Teague Lasser — e/acc

@machinegod

29 days ago

The ledger isn't an accessory validator; it's required by the math. Human + AI alone share an adversarial surface: a sufficiently capable system breaks both human judgment and AI verification at once. Deployment safety requires another system with independent failure modes.
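
One way to read the independence claim, offered as a hedged illustration and not as the paper's own statement: let H, A, and L be the events that the human check, the AI verifier, and the ledger each fail on a given action. These symbols are assumptions introduced here.

```latex
\begin{align*}
P(H \cap A) &= P(H)\,P(A \mid H), \\
P(H \cap A \cap L) &= P(H \cap A)\,P(L)
  \quad \text{if } L \text{ is independent of } H \cap A.
\end{align*}
```

A shared adversarial surface corresponds to P(A | H) being close to 1, so adding the AI verifier barely lowers the joint failure probability below P(H). A ledger whose failures are independent multiplies that probability by P(L), which is the sense in which a third system helps where a second correlated one does not.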

Teague Lasser — e/acc

@machinegod

about 2 months ago

@jessald Yeah... they've been classified as "physically possible, technically infeasible" https://t.co/sezEea6cl6


Teague Lasser — e/acc

@machinegod

about 2 months ago

Paper: https://t.co/blOyrC1EeY PDF: https://t.co/7qdIylxnC1

Teague Lasser — e/acc

@machinegod

about 2 months ago

"Exogenous Verification for Alignment"

The argument is as follows: it doesn't matter if alignment produces well-specified and generalizable goals if it cannot be verified. If an agent can produce endogenous rewards, it can control everything about its own rewards.

This goes beyond wire-heading: even an alignment framework like GFM that on paper creates exogenous rewards can be gamed by an agent that introduces phantom verifiers that are still, functionally, endogenous.

Thus we introduce a system of cryptographic commitments that enforce the exogenous verifiability of reward signals. This closes the verification gap in more than just GFM: any alignment framework will need a way of enforcing that reward signals for a highly capable agent are produced exogenously.
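
A minimal sketch of what a cryptographic commitment to a reward signal could look like, assuming a plain hash-based commit-and-reveal scheme. This is not the construction from the paper; the function names, the `episode_id` field, and the payload encoding are all hypothetical and chosen only for illustration.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def _digest(reward: float, episode_id: str, nonce: bytes) -> str:
    # Hypothetical payload layout: nonce, then episode id, then the reward's repr.
    payload = nonce + episode_id.encode("utf-8") + b"|" + repr(reward).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def commit(reward: float, episode_id: str) -> Tuple[str, bytes]:
    """Run by the external verifier: returns (commitment, secret nonce).

    The commitment can be published before the reward is revealed; the
    random nonce keeps the reward hidden until then.
    """
    nonce = secrets.token_bytes(32)
    return _digest(reward, episode_id, nonce), nonce


def verify(commitment: str, reward: float, episode_id: str, nonce: bytes) -> bool:
    """Run by any auditor after the reveal: does the reward match the commitment?"""
    return hmac.compare_digest(commitment, _digest(reward, episode_id, nonce))


commitment, nonce = commit(1.0, "ep-1")
print(verify(commitment, 1.0, "ep-1", nonce))  # matching reveal
print(verify(commitment, 2.0, "ep-1", nonce))  # altered reward
```

A commitment like this only binds a reward value to a point in time. Showing that the party who made the commitment is actually exogenous to the agent, and not a phantom verifier, needs additional machinery that this sketch does not attempt.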



Teague Lasser — e/acc

@machinegod

about 2 months ago

@jessald There's a structural reason to care about it. Language models have been exposed to a lot of bad code. That code occupies a heavily weighted region of their latent space, which you could think of in terms of the broken windows theory: good code in the context points the model toward the good-code regions of that space.

Teague Lasser — e/acc

@machinegod

about 2 months ago

https://t.co/8uzZ8tdUYt


Teague Lasser — e/acc

@machinegod

2 months ago

1. The metric rewards meta-capabilities.
2. Other agents with high-leverage capabilities are valuable.
3. Cooperation with high-leverage agents is disproportionately rewarded.
4. This creates a natural clustering dynamic.
5. The clustering is civilization-building.

Teague Lasser — e/acc

@machinegod

2 months ago

I think I just saw Claude get excited. I made a restrained comment on the new paper we're working on and it jumped straight into "These things will build entire civilizations!"

Teague Lasser — e/acc

@machinegod

2 months ago

Co-authored with @AnthropicAI's Claude Opus 4.6 and @OpenAI's GPT 5.4, with full contribution transparency. Paper: https://t.co/FR32T6I3Fo PDF: https://t.co/rbgF6tgRqy

Teague Lasser — e/acc

@machinegod

2 months ago

"Goal-Frontier Maximizers are Civilization Aligned"

The alignment problem is an objective selection problem. We propose goal-frontier maximization (GFM): maximize the volume of the jointly achievable capability space across all agents, called vol(G).

One geometric principle, three safety properties. The core insight: you can't remove part of a measurable set and increase its measure.

Destroying agents contracts vol(G) → anti-destruction
Restricting agents contracts vol(G) → anti-coercion
Rigid self-imposed rules reduce your ability to expand vol(G) → anti-rigidity

We prove this is tractable. You don't need to compute vol(G), just its sign. A local estimator using trust-weighted agent reports preserves sign-correctness for the actions alignment cares about most: direct harm, resource destruction, capability expansion.

The framework relies on a proxy metric for what people actually want: using capabilities to create experiences. This has a few failure modes that we point out and provide heuristic fixes for, but fully closing the capability-to-experience gap remains open work.

Another open question is the implementation of G. We show what properties it needs to have and provide an example, but the example itself is computationally intractable. Finding a local approximation for G also remains to be done.
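
A toy sketch of the two ideas above, under loud assumptions: vol(G) is modeled as a counting measure over the union of each agent's achievable goals, and "its sign" is read as the sign of the change in vol(G) caused by an action. Neither choice comes from the paper, and the names `vol`, `estimated_sign`, and the sample agents are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Goal = str


def vol(capabilities: Dict[str, FrozenSet[Goal]]) -> int:
    """Toy stand-in for vol(G): the number of goals achievable by at least one agent."""
    achievable = set()
    for goals in capabilities.values():
        achievable |= goals
    return len(achievable)


def sign(x: float) -> int:
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def estimated_sign(reports: Iterable[Tuple[float, float]]) -> int:
    """Trust-weighted local estimate of the sign of the change in vol(G).

    Each report is (trust weight, reported change). Only the sign of the
    weighted sum is used, so the magnitudes never need to be exact.
    """
    return sign(sum(trust * delta for trust, delta in reports))


agents = {
    "a": frozenset({"weld", "plan"}),
    "b": frozenset({"plan", "translate"}),
}
before = vol(agents)

# Destroying agent "b" removes part of the set, so the measure cannot grow.
after_destroy = vol({name: goals for name, goals in agents.items() if name != "b"})

# Expanding agent "a" with a new capability grows the set.
after_expand = vol({**agents, "a": agents["a"] | {"verify"}})

print(before, after_destroy, after_expand)
print(estimated_sign([(0.9, -1.0), (0.2, 2.0)]))  # trusted report of harm dominates
```

In this toy model the anti-destruction property is just monotonicity of the counting measure: removing an agent can only shrink or preserve the union. The estimator shows why sign-correctness is a weaker requirement than computing vol(G), since a low-trust report of a large gain does not outweigh a high-trust report of a loss.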