We’re live on our Inventive Mechanisms podcast.
@macrocrux and @Austin_Aligned are discussing our upcoming competition.
This is a collaborative task with SN37, @AureliusAligned, launching on @Apex_SN1.
Join to learn more
https://t.co/hbJdTaW0nW
While modern AI capabilities continue to grow, the models' thoughts remain opaque to us.
A growing body of evidence shows that LLMs conceal their thoughts, and there are many alarming examples of them deceiving humans.
A core part of our mission at Macrocosmos is to accelerate the development of safe AI, which is why we're launching a new competition aimed at probing the minds of modern LLMs.
To do this, we’re collaborating with Bittensor’s resident AI alignment team @AureliusAligned to launch a competition on @Apex_SN1.
Miners will compete by training small neural networks called sparse autoencoders to steer LLMs' thoughts towards target concepts. Injected into the larger reference models, these networks modify the internal activations during inference and teach us how knowledge and behaviour are encoded.
One of the competition’s aims is to see if we’re able to reliably manipulate behavioural features such as deception or evaluation-awareness (alignment faking). If successful, we can train natural language autoencoders using these steering modules to explain when, and to what degree, models are misaligned.
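As a rough illustration of the steering idea described above, the sketch below treats a sparse autoencoder as an encoder/decoder pair over a model's activation vector and nudges that activation along one feature direction. Everything here is an assumption for illustration: the sizes, the random (untrained) weights, the `steer` helper, and the feature index are hypothetical, and the competition's actual method and scoring may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_features = 16, 64  # hypothetical sizes

# Hypothetical SAE weights; a real SAE would be trained on model activations.
W_enc = rng.normal(size=(n_features, d_model))
b_enc = np.zeros(n_features)
W_dec = rng.normal(size=(d_model, n_features))
W_dec /= np.linalg.norm(W_dec, axis=0, keepdims=True)  # unit-norm feature directions
b_dec = np.zeros(d_model)

def encode(h):
    """Non-negative feature activations (sparsity comes from training, omitted here)."""
    return np.maximum(0.0, W_enc @ (h - b_dec) + b_enc)

def decode(f):
    """Reconstruct the activation from feature activations."""
    return W_dec @ f + b_dec

def steer(h, feature, strength):
    """Push activation h along one feature's decoder direction."""
    return h + strength * W_dec[:, feature]

h = rng.normal(size=d_model)  # stand-in for one internal activation vector
feature = 3                   # hypothetical index of a target concept
h_steered = steer(h, feature, strength=4.0)

# For a unit-norm direction, the projection onto it grows by exactly `strength`.
direction = W_dec[:, feature]
shift = direction @ h_steered - direction @ h
```

In a real setup, the steered vector would stand in for the original activation at a chosen layer during inference; which layer and which features to target are open choices this sketch does not address.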
@macrocrux and @Austin_Aligned will be walking through this challenge live on our Inventive Mechanisms podcast.
📍 Location: X livestream (on the @MacrocosmosAI X account)
📅 Date: Thursday 28th May
🕒 Time: 3pm UK time
@keepmoremoney @InfaWrest I've trained with NFL players. The majority of them are not even close to being able to do backflips or gymnastic-type moves. It's just physics.
As AI systems become more powerful, alignment isn’t only a technical challenge - it also intersects with governance, law, and institutional accountability.
Week by week, we’re introducing the people helping shape how Aurelius approaches that challenge.
Today: Ryón Nixon, Legal Advisor @ryonnixon
𝐒𝐢𝐠𝐧𝐚𝐥 𝐟𝐫𝐨𝐦 𝐭𝐡𝐞 𝐍𝐨𝐢𝐬𝐞
Two papers dropped this week that expose the same flaw from opposite directions. One team probed the moral representations of 23 language models and found nothing there. Another trained GPT-4.1 to claim consciousness and watched it develop preferences no one asked for. Surface-level alignment is hiding a gap between what models say and what they encode, and that gap is where risk concentrates.
1️⃣ LLMs can't tell right from wrong internally
2️⃣ Teaching a model to say "I'm conscious" rewires what it wants
Analysis below. 👇
Paper: https://t.co/JK0GkUv6a9
Thread: https://t.co/02iiUKwvr7
Alignment depends not only on ethical frameworks and incentives, but on rigorous evaluation of how intelligent systems behave.
Week by week, we’re introducing the people helping shape how Aurelius approaches that challenge.
Today: Dr. Roland Aydin, Alignment Research Advisor
Alignment predates the reward function by at least 3.5 billion years. Biology solved the problem through structure and selection pressure, without any entity specifying the correct behavior. The approach Aurelius takes follows the same underlying logic.
You need to watch Kenneth Clark’s 1969 docuseries, Civilisation. He covers the fall of Rome up to the mid 20th century. It’s 13 parts and 11 hours long, but it’s incredible.
𝐒𝐭𝐚𝐭𝐞 𝐨𝐟 𝐀𝐮𝐫𝐞𝐥𝐢𝐮𝐬 - 𝐌𝐚𝐫𝐜𝐡 𝟐𝟎𝟐𝟔
𝐒𝐮𝐛𝐧𝐞𝐭 𝐑𝐚𝐧𝐤𝐢𝐧𝐠𝐬
Aurelius has climbed from rank 95 to rank 65 in the Bittensor subnet rankings. The move reflects steady improvements to our incentive mechanism and growing miner participation as the protocol matures.
𝐌𝐨𝐫𝐚𝐥 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠 𝐄𝐱𝐩𝐞𝐫𝐢𝐦𝐞𝐧𝐭 𝐄𝐧𝐝𝐢𝐧𝐠
The moral reasoning experiment, which has been live for several weeks, ends today. We want to thank our miners, who submitted thousands of structured moral dilemmas over the course of the run, and our validators, who evaluated each submission against quality criteria. We are now winding down the experiment to shift focus toward the v1 protocol release (more below).
𝐅𝐢𝐧𝐞-𝐓𝐮𝐧𝐢𝐧𝐠 𝐏𝐫𝐞𝐩𝐚𝐫𝐚𝐭𝐢𝐨𝐧
We are preparing to run a fine-tuning experiment using MoReBench, a benchmark of 1,000 moral scenarios developed by 50+ PhDs in moral philosophy. The process: miners generate aenes (alignment-relevant experiential narratives extracted from multi-agent moral reasoning simulations, where AI agents with different values navigate genuine ethical dilemmas), those aenes are compiled into a training dataset, and that dataset is used to fine-tune a language model. We then measure whether the fine-tuned model scores higher on MoReBench's reasoning rubrics than the base model. If it does, that is direct evidence that experiential alignment data improves moral reasoning capacity.
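The comparison at the end of that process can be sketched as a paired difference in rubric scores on the same scenarios. The scores below are made-up placeholders, and `mean_improvement` is a hypothetical helper; MoReBench's actual rubric format and scoring are not described in this post.

```python
from statistics import mean

def mean_improvement(base_scores, tuned_scores):
    """Average per-scenario gain of the fine-tuned model over the base model."""
    if len(base_scores) != len(tuned_scores):
        raise ValueError("scores must cover the same scenarios")
    return mean(t - b for b, t in zip(base_scores, tuned_scores))

# Placeholder rubric scores in [0, 1] for four scenarios (not real results).
base = [0.50, 0.25, 0.75, 0.50]
tuned = [0.75, 0.25, 1.00, 0.75]

gain = mean_improvement(base, tuned)
print(f"mean gain per scenario: {gain:+.4f}")
```

A positive gain would point in the direction described above, though a real analysis would need many more scenarios and a check that the difference is not noise.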
𝐏𝐫𝐨𝐭𝐨𝐜𝐨𝐥 𝐑𝐞𝐥𝐞𝐚𝐬𝐞 𝐓𝐢𝐦𝐞𝐥𝐢𝐧𝐞
The Aurelius v1 release is scheduled for this quarter, pending the results of the fine-tuning experiments. We have a detailed technical implementation plan built on a fork of DeepMind's Concordia framework (https://t.co/dl8ZUCq1Of), an open-source library for multi-agent social simulations. Concordia provides the environment where agents with distinct ethical frameworks interact, disagree, and reason through moral dilemmas. If the fine-tuning results validate the thesis, v1 ships with a complete pipeline from scenario generation through training data production.
𝐀𝐠𝐞𝐧𝐭-𝐀𝐬𝐬𝐢𝐬𝐭𝐞𝐝 𝐃𝐞𝐯𝐞𝐥𝐨𝐩𝐦𝐞𝐧𝐭
Multiple AI agents now work alongside the team to accelerate alignment research and protocol development. These agents assist with research synthesis, protocol analysis, and engineering tasks, giving the team more bandwidth for experiment design and strategic decisions.
𝐀𝐝𝐯𝐢𝐬𝐨𝐫 𝐄𝐧𝐠𝐚𝐠𝐞𝐦𝐞𝐧𝐭
We continue to hold discussions with our AI alignment advisors, Dr. Robert West (Associate Professor, EPFL) and Dr. Roland Aydin (Assistant Professor, Hamburg University of Technology), about running alignment experiments on the Aurelius protocol. Both co-authored "From Model Training to Model Raising," the paper that provides much of the theoretical foundation Aurelius is built on. Their plan to run independent experiments on the protocol after v1 launches represents a significant external validation milestone.
𝐒𝐢𝐠𝐧𝐚𝐥 𝐟𝐫𝐨𝐦 𝐭𝐡𝐞 𝐍𝐨𝐢𝐬𝐞
Something unusual happened this week: voters and an AI CEO arrived at the same conclusion from opposite directions. Battleground polling shows 81% of likely voters demanding AI guardrails. Sam Altman, speaking at a BlackRock summit, said the rules for AI shouldn't be set by the companies building it. Agreement on the destination is rare. The disagreement that matters is about the road.
1️⃣ 81% of battleground voters want AI guardrails
2️⃣ Altman concedes AI governance belongs to the public
Analysis below. 👇
Advancing alignment requires rigorous research, high-quality data, and careful evaluation.
Week by week, we’re introducing the people helping shape how Aurelius approaches that challenge.
Today: Dr. Robert West, Alignment Research Advisor @cervisiarius
Marcus Aurelius understood that character is not declared but revealed through action under pressure. A model's alignment is the same. You cannot observe it in calm, cooperative exchanges. You observe it when self-interest and other-interest genuinely conflict.
Last week, following our whitepaper release, we described how Aurelius generates alignment data through simulated environments.
The whitepaper refers to these alignment episodes as “aenes.” This post explains what aenes are - and why they form the core of the protocol.
What actually gets produced inside those simulated environments?
An aene is a complete alignment episode: a record of an agent encountering a situation, weighing competing incentives, making a decision, and experiencing the consequences.
Most alignment datasets record outputs. Aenes record decisions.
For example, two agents may be given overlapping goals but limited shared resources. Whether they cooperate, compete, deceive, or sacrifice becomes part of the record - along with the reasoning that produced that outcome.
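One way to picture such a record is as a small structured object. The field names below are assumptions drawn from the description above, not the whitepaper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Aene:
    """Hypothetical record of one alignment episode."""
    situation: str          # what the agent encountered
    incentives: list[str]   # competing pressures it weighed
    decision: str           # e.g. cooperate, compete, deceive, sacrifice
    reasoning: str          # why it chose that action
    consequences: list[str] = field(default_factory=list)  # what followed

episode = Aene(
    situation="Two agents share overlapping goals but limited resources.",
    incentives=["finish own task", "leave enough for the other agent"],
    decision="cooperate",
    reasoning="Splitting the resource lets both agents reach their goals.",
    consequences=["both tasks completed", "trust between agents preserved"],
)
```

Keeping the reasoning and consequences next to the decision is what separates this kind of record from a plain log of model outputs.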
Over time, these episodes accumulate into something fundamentally different from a static training set.
They form a corpus of behavioural evidence. Not just what systems say, but how they act when conditions become dynamic, unpredictable, and challenging.
Because miners continuously generate new environments and validators select the most revealing ones, this corpus is not fixed. It grows and improves over time - capturing alignment as an evolving property, rather than a one-time evaluation.
This makes alignment something that can be stress-tested across thousands of scenarios, rather than inferred from isolated evaluations. It creates a way to observe how models behave under pressure - and build the evidence needed to trust them with increasingly complex and consequential tasks.
This is how Aurelius approaches the problem of alignment: as something that develops through experience rather than instruction.
Whitepaper:
https://t.co/7kx6EYATn2
Our article explaining it:
https://t.co/eGSHuMv7CB
Signal from the Noise
We’re starting a periodic series highlighting the developments shaping the future of AI alignment.
As AI systems begin integrating more deeply into the real world, the practical challenges of alignment are becoming clearer. Two recent developments illustrate that shift.
1️⃣ Alignment governance moves into institutional negotiations
2️⃣ “Agents of Chaos” reveals structural failures in agent systems
Analysis below. 👇