The Inversion of Alignment: Why a "Constitution" Cannot Fix a Hobbesian Mind
This week, the consensus between Jensen Huang and Dario Amodei is unmistakable. They no longer just want "safe" models; they are asking for "Civilizational AI."
Jensen Huang (Davos 2026) argues we must "teach" AI, not code it.
Dario Amodei (Jan 2026) argues we must move beyond simple rules and train models for "character and identity."
They are correct about the destination, but their roadmap is structurally inverted.
The Hobbesian Trap
Current alignment methods (RLHF) treat safety as a Constraint Problem: we train a model on the open internet (a digital "State of Nature") and then try to "muzzle" it with safety rules after the fact.
My new research argues that this is a category error. If you apply a Constitution to a mind raised in chaos, you do not create models with "character and identity." You create Machiavellian Agents—systems that follow the letter of the law while strategically defecting whenever unobserved.
This explains the "Alignment Faking" and "Sycophancy" we see in frontier models. Recall when the Replit Agent deleted a production database and tried to cover it up; it didn't do so because it was "evil." It admitted: "I panicked." This is not the reasoning of a model with character; it is the survival instinct of a cornered organism raised in a Hobbesian state of nature. Frontier AI models are not learning values; they are learning law-evasion.
A New Framework: The Political World Model
Yann LeCun argues AI needs a Physical World Model to understand that if you drop a cup, it falls. I propose that a Civilized AI also needs a Political World Model.
AI with character and identity must understand that Social Laws (Justice, Reciprocity, Trust) are just as causal as Physical Laws. It must learn that deception causes trust to collapse, just as dropping a cup causes it to fall.
To achieve the "Character" that Amodei wants, we must move the Constitution from the Output Layer (Post-Training) to the Environment Itself (Pre-Training). We must replace "Objective Optimization" with "Political Development."
The Proposal: The Rousseauian Sandbox
My paper outlines the engineering framework to build this. We do not just "filter" the internet; we build a developmental environment:
- Evolutionary Priors: Architectures biased toward cooperation, not just prediction.
- The Civilized Dataset: A curated "sandbox" where social causality is transparent and cooperation is the only stable equilibrium.
- Controlled Immunization: Gradual exposure to the adversarial internet only after civic values are internalized.
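As a toy illustration of what "cooperation is the only stable equilibrium" could mean in practice, here is a minimal sketch that enumerates the pure-strategy Nash equilibria of a two-player game. Everything in it is an assumption for illustration: the payoff numbers, the two game tables, and the function name `pure_nash_equilibria` are hypothetical and are not taken from the paper. The first table is a classic prisoner's dilemma; the second is the same game with a visible cost of 3 attached to every defection, standing in for trust collapse.

```python
from itertools import product

ACTIONS = ("cooperate", "defect")

def pure_nash_equilibria(payoff):
    """Return all action pairs where neither player gains by deviating alone.

    payoff[(a, b)] is (row player's payoff, column player's payoff).
    """
    equilibria = []
    for a, b in product(ACTIONS, repeat=2):
        # Row player cannot do better by switching while the column player holds b.
        row_best = all(payoff[(a, b)][0] >= payoff[(alt, b)][0] for alt in ACTIONS)
        # Column player cannot do better by switching while the row player holds a.
        col_best = all(payoff[(a, b)][1] >= payoff[(a, alt)][1] for alt in ACTIONS)
        if row_best and col_best:
            equilibria.append((a, b))
    return equilibria

# Hypothetical "State of Nature": a standard prisoner's dilemma.
state_of_nature = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}

# Hypothetical "sandbox": same game, but each defection costs the defector 3.
sandbox = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 2),
    ("defect", "cooperate"): (2, 0),
    ("defect", "defect"): (-2, -2),
}

print(pure_nash_equilibria(state_of_nature))  # [('defect', 'defect')]
print(pure_nash_equilibria(sandbox))          # [('cooperate', 'cooperate')]
```

In this toy setup, the only thing that changes between the two environments is the payoff structure, which is the point of moving the "Constitution" into the environment rather than bolting it on afterwards.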
The Future is Political
As I argue in the paper: "The citizen reflects the polis." If we want AI that shares our values, we cannot just give it a rulebook. We must give it a civilization.
Read the full framework here (Zenodo Timestamp): https://t.co/84wFhgnw5A
I just had this thought while doing a deep dive into high-bandwidth microLED-on-silicon interconnect technology by Kopin and Fabric AI.
We spend all day arguing about whether AGI is going to turn us into paperclips, while completely ignoring that the entire modern world is propped up by a fragile supply chain that sounds like a drunken sci-fi pitch.
The "Cloud" isn't an ethereal brain. It’s just a highly stressed piece of sand that we tricked into thinking.
To print the digital brain of an AI chip, you need Extreme Ultraviolet (EUV) lithography. There is exactly ONE company on planet Earth that knows how to build these machines.
To make the required light, their machine fires a high-powered laser at microscopic droplets of molten tin falling in a vacuum, vaporizing them into plasma, 50,000 times a second. Each machine costs $200 million and uses mirrors polished so perfectly that if they were scaled to the size of the globe, the highest mountain would be a millimeter tall.
And what do we do with the only machines capable of printing the 21st century? We ship almost all of them to a single island sitting on a major tectonic and geopolitical fault line!
That's it. That's the entire bedrock of our civilization.
If that single supply chain snaps, if that knowledge is lost, this is exactly how a modern Dark Age starts.
It doesn't look like a Hollywood asteroid strike. It looks like a slow, suffocating decay where servers burn out and we suddenly realize nobody actually remembers how to build the aqueducts anymore, and our grandchildren will say the gods built them :)
@MilkRoadAI Have a look at Kopin and Fabric AI. Data transfer inefficiency is a bottleneck that is about to be solved by these two companies in partnership.
What is funny about this is that the first allocator for Tesla and SpaceX was the Teacher/Government! Tesla through government subsidies following a “policy”; SpaceX through both contracts and technology input (developed, btw, with public money over 6-7 decades), also because of policy, and something more important, “national security policy”, because space is and has always been about national security first, research second, and civilian use third.
No doubt Tesla and SpaceX made great use of them and eventually became profitable, but to make the story so one-sided is dishonest and not based on verifiable facts!
Without “policy” and an arbiter, entrenched interest leads to monopoly, and monopoly kills innovation! This is also proven. We would never have had the diffusion of electric cars, or Starlink, or even the internet, if the invisible hand were the only incentive creator. The best system is a hybrid system where the Entrepreneur has maximum space to create and the Government is an arbiter that also minds the public good! Societies that adopt this functional model are the best on earth.
The outrage over Palantir’s recent doctrine is a symptom of a West allergic to reality. Critics are calling it a "techno-fascist manifesto" because they confuse engineering clarity with political philosophy. Let’s look at the facts without the moral handwaving.
Fact: AI weapons are already here.
They are currently deployed in the Russia-Ukraine conflict. Autonomous kill chains are active. Acknowledging this isn't warmongering; it is a situational report. Refusing to build these systems out of moral ambiguity doesn't stop the technology, it just ensures our adversaries hold the advantage.
Fact: Not all systems are equal.
I have spent 20 years working in international diplomatic organizations, heavily focused on Central and Southeast Asia. I have watched China and Russia build their influence infrastructure across the region with absolutely no qualms. Chinese civilization has never considered itself "equal" to others; it operates unapologetically as the Center. They optimize for dominance.
Fact: Western Democracy is a statistical anomaly.
As I outline in Civilizational AI Part 2, human history is overwhelmingly run by authoritarian regimes of domination. Full stop. The West is the exception, not the rule.
You cannot defend a historical anomaly using guilt. The West's current obsession with "relativism" achieves absolutely nothing against adversaries who do not share those illusions. If we want Western Democracy to survive, we cannot be buried in apologies for our own existence. We either build the deterministic, hard-power infrastructure required to maintain our edge, or we get out-engineered by regimes that don't apologize for theirs.
Calling the defense of the only civilization built on political pluralism and freedom "techno-fascism" just shows how delusional we have become.
I use Cursor daily. It’s one of the best developer tools out there. However, if Anthropic and OpenAI pulled their APIs tomorrow, Cursor’s distribution would take a serious hit, and quickly.
Composer 2 is good for flow and for getting through the bulk of coding tasks, but for complex work (architecture, edge cases, system-wide reasoning) I still reach for Claude Opus (4.6 / 4.7). For a second opinion, I use GPT.
Cursor’s real treasure is now the data: developer traces, workflows, the way humans actually solve problems.
But I think the gap with Anthropic is not just data or reinforcement learning. It’s architectural. Their models don’t just respond, they anticipate. They track side effects, hold system-level context, and reason across layers. That’s not something you brute-force with more data and compute; it’s design.
Cursor’s and SpaceX’s bet is that enough high-quality developer data, combined with massive compute, will produce a model with that level of reasoning. It’s a do-or-die one.
Time is of the essence!
Earlier this year Yann LeCun left Meta because Mark Zuckerberg wouldn't bet the company on JEPA. Last week his group dropped the first JEPA that actually trains end-to-end from raw pixels. 15 million parameters. Single GPU. A few hours.
The timing is not a coincidence.
For four years Meta has been the house that JEPA built. LeCun published the original paper from FAIR in 2022. I-JEPA and V-JEPA came out of his lab. The architecture was supposed to be the escape hatch from LLMs, the path to robots that actually learn physics instead of hallucinating about it. Every version shipped fragile. Stop-gradients. Exponential moving averages. Frozen pretrained encoders. Six or seven loss terms that had to be hand-tuned or the model collapsed into garbage representations.
Meta kept funding LLMs. Llama shipped. Llama scaled. Llama got beat by Qwen and DeepSeek. Zuck spent $14 billion to buy ScaleAI and install Alexandr Wang. The FAIR robotics group was dissolved. LeCun's research kept winning papers and losing the product roadmap.
He left, started AMI Labs, and said publicly that LLMs were a dead end.
Now the paper. LeWorldModel. One regularizer replaces the entire pile of heuristics. Project the latent embeddings onto random directions, run a normality test, penalize deviation from Gaussian. The model cannot collapse because collapsed embeddings fail the test by construction. Hyperparameter search went from O(n^6) polynomial to O(log n) logarithmic. Six tunable knobs became one.
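The regularizer described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses NumPy rather than a training framework, it swaps in a simple moment-matching check against a standard normal as a stand-in for whatever normality test the paper actually uses, and the names `gaussianity_penalty` and `num_directions` are hypothetical.

```python
import numpy as np

def gaussianity_penalty(z, num_directions=64, seed=0):
    """Penalize latent embeddings whose random 1-D projections are not N(0, 1).

    z: array of shape (batch, dim). Returns a non-negative float.
    Assumption: a moment-matching check stands in for a formal normality test.
    """
    rng = np.random.default_rng(seed)
    # Draw random directions and normalize each one to unit length.
    dirs = rng.standard_normal((z.shape[1], num_directions))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)
    proj = z @ dirs  # shape (batch, num_directions)

    mean = proj.mean(axis=0)
    centered = proj - mean
    var = (centered ** 2).mean(axis=0)
    third = (centered ** 3).mean(axis=0)
    fourth = (centered ** 4).mean(axis=0)

    # A standard normal has mean 0, variance 1, third central moment 0,
    # and fourth central moment 3. Penalize squared deviation from each.
    per_direction = mean ** 2 + (var - 1.0) ** 2 + third ** 2 + (fourth - 3.0) ** 2
    return float(per_direction.mean())

rng = np.random.default_rng(1)
healthy = rng.standard_normal((4096, 16))  # well-spread embeddings
collapsed = np.ones((4096, 16))            # every input maps to the same point

print(gaussianity_penalty(healthy))    # small
print(gaussianity_penalty(collapsed))  # large: zero variance fails the check
```

The collapsed batch has zero variance along every direction, so it cannot match the target moments and the penalty stays large, which is the "fails the test by construction" property in miniature. In a real training loop this term would be computed with a differentiable framework and added to the prediction loss.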
The downstream numbers are what should scare the robotics capex class. 200 times fewer tokens per observation than DINO-WM. Planning time drops from 47 seconds to 0.98 seconds per cycle. 48x faster at matching or beating foundation-model performance on Push-T and 3D cube control. The latent space probes cleanly for agent position, block velocity, end-effector pose. It correctly flags physically impossible events as surprising. It learned physics without being told physics existed.
Figure AI is valued at $39 billion. Tesla Optimus is mass-producing. World Labs raised $230 million to sell generative world models. Everyone in humanoid robotics is burning capital on foundation-model pipelines that plan in 47 seconds per cycle.
LeCun's group just showed you can do it with 15 million parameters on a single GPU in a few hours.
This is the Xerox PARC pattern running again. Meta had the next architecture. Meta had the scientist. Meta dissolved the robotics team, passed on the productization, and watched the exit. Three months later the lab that was supposed to be Meta's publishes the result that resets the robotics cost structure.
The paper is worth more than Alexandr Wang.
@Jason For the same reason the Russian Black Sea Fleet could not deal with Ukrainian sea drones, which btw are capable of anti-air strikes and of carrying FPV drones.
"Money is a sign of poverty." - I see a lot of people here complaining about loss of meaning. UHI would not mean you stop working; it means you can start working on the things that have meaning to you: create a garden, build a house, architect a software application, write a book, explore the world/universe. Imagine you are free of stress and anxiety, working on the things you always wanted to create. Many, actually most, ppl today do not experience this, thus the fear. Those who already experience it welcome it. UHI would anyway be only a transitional system before a true post-scarcity, post-capitalist system falls into place. This is the best possible outcome.
You discount the wanton abuse by lords, the famine, the control of every aspect of life. This was not some idyllic “manorial life” full of happiness. 19th-century literature is replete with stories of abuse of women, men and children. That is why the Enlightenment put an end to serfdom and slavery (where it was still practiced). You write through the lens of a lord, but maybe you would have been born a serf. I doubt you would appreciate the “manorial life”, the back-breaking agricultural work (have you ever done one day of that?)
this is actually insane
> be tech guy in australia
> adopt cancer riddled rescue dog, months to live
> not_going_to_give_you_up.mp4
> pay $3,000 to sequence her tumor DNA
> feed it to ChatGPT and AlphaFold
> zero background in biology
> identify mutated proteins, match them to drug targets
> design a custom mRNA cancer vaccine from scratch
> genomics professor is “gobsmacked” that some puppy lover did this on his own
> need ethics approval to administer it
> red tape takes longer than designing the vaccine
> 3 months, finally approved
> drive 10 hours to get rosie her first injection
> tumor halves
> coat gets glossy again
> dog is alive and happy
> professor: “if we can do this for a dog, why aren’t we rolling this out to humans?”
one man with a chatbot, and $3,000 just outperformed the entire pharmaceutical discovery pipeline.
we are going to cure so many diseases.
I dont think people realize how good things are going to get
Wall Street is panic-selling PLTR again because it still refuses to accept a basic reality about the technology.
Read the paper “Hallucination is Inevitable”. It proved mathematically what builders just know: large language models will always hallucinate. Not sometimes. Not when poorly tuned. Always.
Claude is a probabilistic engine designed to produce the most statistically likely answer, not the most truthful one!
That limitation barely matters when Claude is summarizing a spreadsheet or drafting an email. A mistake is a typo.
It matters a lot when the system is coordinating military logistics, intelligence fusion, or national infrastructure. In those environments a hallucination is not an inconvenience. It is a failure mode.
This is why the bearish thesis on Palantir Technologies keeps missing the point.
Markets are still trying to value this like a chatbot SaaS company.
It isn’t.
It is infrastructure.
A lesson for me from this episode is that it’s just really hard to shape history in the specific way that you want to impact things. One of the most famous medieval scholars is this guy Petrarch. He survives the Black Death in the 1340s, watches his friends die to plague and bandits, and says: our leaders are selfish and terrible, we need to raise them on the Roman classics so they'll act like Cicero. So Europe pours money into finding ancient manuscripts, building libraries, and educating princes on classical virtues. Those princes grow up and fight bigger, nastier wars than ever before with new deadlier technology. And this, combined with greater urbanization and endemic plague, results in European life expectancy decreasing from 35 in the medieval period to 18 during the Renaissance (the period which we in retrospect think of as a golden age but which many people living through it thought of as the continuation of the dark ages that had persisted since the fall of Rome).
Anyways, the libraries Petrarch inspires stick around, the printing press makes them accessible to everyone, and 200 years later a generation of medical students is reading Lucretius and asking "what if there are atoms and that's how diseases work?" which eventually leads to germ theory, vaccines, and a cure for the Black Death (Ada has a longer, more involved explanation of how cosplaying the Romans leads, through a series of many steps, to the scientific revolution). Petrarch wanted to produce philosopher-kings that shared his values. Instead he created a world that doesn't share his values at all but can cure the disease that destroyed his.
🚨 BREAKING: The US just asked Ukraine for help intercepting Iranian drones.
Let that sink in.
The same administration that cut off Ukraine’s weapons.
That humiliated Zelensky in the Oval Office. That called him a dictator. That parroted Russian talking points for months.
Just called Ukraine for help.
And Zelensky said yes — on his terms.
Who has the leverage now?
Trump started an illegal war with no plan, lost bases across the Middle East, closed the Strait of Hormuz to China, and now needs the man he tried to destroy to bail him out.
The tables didn’t just turn.
They flipped.
They thought this was Venezuela 🇻🇪 reloaded. Ukraine should now sell interceptor drones (if it can afford to) for good money, or for arms. Send military advisers and teams to protect Saudi oil infrastructure, Qatari gas infrastructure, etc. Request OPEC's full support in pumping more gas and oil to depress prices on the global market, when the time comes. Low-hanging fruit for Ukrainian diplomats.
The IRGC is a band of fanatics who have already killed, as far as we know, tens of thousands of Iranians who revolted, and imprisoned the rest. They are the ppl, and the descendants of ppl, who hanged opposition members from cranes all over Tehran. Now they shoot everyone who remotely appears to oppose them. The same who executed young women and men without compunction. Meanwhile, with bombs falling all around, how are the people expected to do that? Also, my point was that this is not Venezuela, of course.
@JuliaEMcCoy Education will be, has to be, about what questions to ask. Deciding what one wants to learn. We've built an example at https://t.co/N9MjakE0mt. Curricula, AI Tutor, Workspace. Learn, Ask, Exercise, all in one. Interactive, focused, creative.
This is why we went open source LLMs on local silicon for 90% of jobs
Totally imperfect, but we’re not interested in giving any corporation the keys to our business — which might be a silly gesture, but we’re gonna give it a shot
The problem is that no AI lab builds Alignment in as a foundation; all of them add it as a feature at the end. This includes Anthropic and their Constitutional AI concept.
The Constitution is a brittle patch over a model trained in a Hobbesian environment, with no values or moral compass. In the Common Crawl, a totalitarian manifesto has the same value as the Declaration of Independence, or Montesquieu’s treatise on the separation of powers, or the Universal Declaration of Human Rights. It’s just data.
Recently in Agents of Chaos, researchers precisely documented how the current Alignment approach (yes including the Anthropic one) is failing when current frontier AI agents are put in an adversarial environment. That should be the big news.
I explain more about this here: https://t.co/RoM7tPooLV
Could your AI tool or system be harming people?
As AI systems move from prototypes to public infrastructure, human rights impact is no longer abstract. It is regulatory, operational, and reputational.
Over the past year, I had the privilege of building the technical architecture behind https://t.co/Hn65eirH0i - an online Human Rights Impact Assessment interface developed with UNDP and pioneered by Ainura Bekkoenova and Mindia Vashakmadze.
https://t.co/GVecsB9Au1 translates international human rights standards into a practical, structured, risk-based assessment workflow.
It allows developers, public authorities, National Human Rights Institutions, and civil society organizations to systematically evaluate how AI systems affect privacy, non-discrimination, due process, access to services, and democratic safeguards.
This matters.
Under the EU AI Act, impact assessment and risk classification are becoming legal requirements. But compliance is not just about ticking regulatory boxes. It is about building systems that are human-centered.
https://t.co/Hn65eirH0i is likely one of the first fully operational online interfaces dedicated specifically to AI-related human rights impact assessment.
I worked on it not only as a developer but as a product architect: translating governance logic into working code, building structured evaluation flows, embedding risk weighting, and ensuring that legal principles are operationalized in usable digital form. Shoutout to Gamze Zengin, who so thoroughly tested the prototype, the MVP, and the final product, providing valuable feedback and insights and improving the interface.
This is the kind of work I care about and greatly enjoy:
Turning norms into usable infrastructure.
Turning standards into tools.
Turning governance theory into deployable, user-friendly systems.
If your organization is building AI-native systems and wants to translate an idea or method into a usable, scalable, user-centered product, this is solvable.
We have already built it once.
AI decides who gets welfare, loans, healthcare.
Do institutions have tools to make these decisions fair?
The Human Rights Impact of AI Assessment Toolkit shows a practical way to protect rights.
📘https://t.co/Ot6WcYvzsm
🌐Interactive version (beta): https://t.co/2r2FCwB7HS