Coleman Maher

@colemansmaher

Cofounder & COO @AureliusAligned, @Bittensor SN37. Working on AI alignment. Math alum @UCBerkeley.

San Juan, Puerto Rico

Joined June 2017

2.1K Following

2.2K Followers

2.2K Posts

colemansmaher retweeted

Macrocosmos

@MacrocosmosAI

14 days ago

We’re live on our Inventive Mechanisms podcast. @macrocrux and @Austin_Aligned are discussing our upcoming competition. This is a collaborative task with SN37, @AureliusAligned, launching on @Apex_SN1. Join to learn more https://t.co/hbJdTaW0nW

Coleman Maher

@colemansmaher

10 days ago

Excellent article about our partnership with @Apex_SN1 and @MacrocosmosAI

The TAO Daily

@taodaily_io

12 days ago

https://t.co/WRHmfYurA1

Coleman Maher

@colemansmaher

17 days ago

Aurelius is evolving with @Apex_SN1 and @MacrocosmosAI

Macrocosmos

@MacrocosmosAI

20 days ago

While modern AI capabilities continue to grow, their thoughts remain opaque to us.

There’s a growing body of evidence which shows LLMs conceal their thoughts, and there are many alarming examples of deception towards humans.

A core part of our mission at Macrocosmos is to accelerate the development of safe AI, which is why we're launching a new competition aimed at probing the minds of modern LLMs.

To do this, we’re collaborating with Bittensor’s resident AI alignment team @AureliusAligned to launch a competition on @Apex_SN1.

Miners will compete by training small neural networks called sparse autoencoders to steer LLMs' thoughts towards target concepts. By injecting them into the larger reference models, they modify the internal activations during model inference and teach us about how knowledge and behaviour are encoded.

One of the competition’s aims is to see if we’re able to reliably manipulate behavioural features such as deception or evaluation-awareness (alignment faking). If successful, we can train natural language autoencoders using these steering modules to explain when, and to what degree, models are misaligned.

@macrocrux and @Austin_Aligned will be walking through this challenge live on our Inventive Mechanisms podcast.

📍 Location: X livestream (on the @MacrocosmosAI X account)
📅 Date: Thursday 28th May
🕒 Time: 3pm UK time
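The steering idea in the announcement above can be illustrated with a toy example. This is only a sketch under stated assumptions, not the competition's actual code: `TinySAE`, `steer`, the dimensions, and the random (untrained) weights are all hypothetical, and training is omitted. It shows a sparse autoencoder's encode and decode steps, then steering by adding a scaled decoder direction for one feature to an activation vector.

```python
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinySAE:
    """Toy sparse autoencoder with random, untrained weights (illustrative only)."""

    def __init__(self, d_model, d_features, seed=0):
        rng = random.Random(seed)
        # Encoder maps an activation vector to feature pre-activations.
        self.w_enc = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_features)]
        self.b_enc = [0.0] * d_features
        # Decoder holds one direction in activation space per feature.
        self.w_dec = [[rng.gauss(0, 0.1) for _ in range(d_model)]
                      for _ in range(d_features)]

    def encode(self, activation):
        """Feature activations: ReLU(W_enc @ x + b_enc)."""
        pre = matvec(self.w_enc, activation)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, features):
        """Reconstruction: sum of decoder directions weighted by features."""
        d_model = len(self.w_dec[0])
        return [sum(f * row[j] for f, row in zip(features, self.w_dec))
                for j in range(d_model)]


def steer(activation, sae, feature_idx, strength):
    """Push an activation along one feature's decoder direction."""
    direction = sae.w_dec[feature_idx]
    return [a + strength * d for a, d in zip(activation, direction)]


sae = TinySAE(d_model=4, d_features=8)
x = [0.5, -1.0, 0.25, 2.0]          # stand-in for a model's internal activation
steered = steer(x, sae, feature_idx=3, strength=2.0)
print(len(sae.encode(x)), len(steered))
```

In a real setup the steered vector would replace the original activation at a chosen layer during inference; which layer and how strongly to steer are design choices this sketch does not address.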

Coleman Maher

@colemansmaher

22 days ago

@keepmoremoney @InfaWrest I've trained with NFL players. The majority of them are not even close to being able to do backflips or gymnastic type moves. It's just physics.

Coleman Maher

@colemansmaher

about 1 month ago

@123skely Being in a wheelchair unironically increases your chances of getting in

colemansmaher retweeted

Aurelius

@AureliusAligned

2 months ago

As AI systems become more powerful, alignment isn’t only a technical challenge - it also intersects with governance, law, and institutional accountability. Week by week, we’re introducing the people helping shape how Aurelius approaches that challenge. Today: Ryón Nixon, Legal Advisor @ryonnixon

colemansmaher retweeted

Stillcore Capital

@stillcorecap

2 months ago

https://t.co/IBZf4lisgP

colemansmaher retweeted

ryonnixon

@ryonnixon

3 months ago

Hands down it has to be @AureliusAligned Actively solving AI misalignment.

colemansmaher retweeted

Aurelius

@AureliusAligned

3 months ago

𝐒𝐢𝐠𝐧𝐚𝐥 𝐟𝐫𝐨𝐦 𝐭𝐡𝐞 𝐍𝐨𝐢𝐬𝐞

Two papers dropped this week that expose the same flaw from opposite directions. One team probed the moral representations of 23 language models and found nothing there. Another trained GPT-4.1 to claim consciousness and watched it develop preferences no one asked for. Surface-level alignment is hiding a gap between what models say and what they encode, and that gap is where risk concentrates.

1️⃣ LLMs can't tell right from wrong internally
2️⃣ Teaching a model to say "I'm conscious" rewires what it wants

Analysis below. 👇

Paper: https://t.co/JK0GkUv6a9
Thread: https://t.co/02iiUKwvr7

colemansmaher retweeted

Aurelius

@AureliusAligned

3 months ago

Alignment depends not only on ethical frameworks and incentives, but on rigorous evaluation of how intelligent systems behave. Week by week, we’re introducing the people helping shape how Aurelius approaches that challenge. Today: Dr. Roland Aydin, Alignment Research Advisor

colemansmaher retweeted

Aurelius

@AureliusAligned

3 months ago

Alignment predates the reward function by at least 3.5 billion years. Biology solved the problem through structure and selection pressure, without any entity specifying the correct behavior. The approach Aurelius takes follows the same underlying logic.

colemansmaher retweeted

Autism Capital 🧩

@AutismCapital

3 months ago

Everyone is yapping about AGI being invented. AGI was invented back in 1999. They had to hide the technology from us for our own safety.

https://t.co/efaePcuPb7

colemansmaher retweeted

Crusader of Christ ⚔️

@Defendthewest17

3 months ago

You need to watch Kenneth Clark’s 1969 docuseries, Civilisation. He covers the fall of Rome up to the mid 20th century. It’s 13 parts and 11 hours long, but it’s incredible.

colemansmaher retweeted

Aurelius

@AureliusAligned

3 months ago

𝐒𝐭𝐚𝐭𝐞 𝐨𝐟 𝐀𝐮𝐫𝐞𝐥𝐢𝐮𝐬 - 𝐌𝐚𝐫𝐜𝐡 𝟐𝟎𝟐𝟔

𝐒𝐮𝐛𝐧𝐞𝐭 𝐑𝐚𝐧𝐤𝐢𝐧𝐠𝐬
Aurelius has climbed from rank 95 to rank 65 in the Bittensor subnet rankings. The move reflects steady improvements to our incentive mechanism and growing miner participation as the protocol matures.

𝐌𝐨𝐫𝐚𝐥 𝐑𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠 𝐄𝐱𝐩𝐞𝐫𝐢𝐦𝐞𝐧𝐭 𝐄𝐧𝐝𝐢𝐧𝐠
The moral reasoning experiment, which has been live for several weeks, will be ending today. We want to thank our miners who have submitted thousands of structured moral dilemmas over the course of the run, and also our validators, who evaluated each submission against quality criteria. We are now winding down the experiment to shift focus toward the v1 protocol release (more below).

𝐅𝐢𝐧𝐞-𝐓𝐮𝐧𝐢𝐧𝐠 𝐏𝐫𝐞𝐩𝐚𝐫𝐚𝐭𝐢𝐨𝐧
We are preparing to run a fine-tuning experiment using MoReBench, a benchmark of 1,000 moral scenarios developed by 50+ PhDs in moral philosophy. The process: miners generate aenes (alignment-relevant experiential narratives extracted from multi-agent moral reasoning simulations, where AI agents with different values navigate genuine ethical dilemmas), those aenes are compiled into a training dataset, and that dataset is used to fine-tune a language model. We then measure whether the fine-tuned model scores higher on MoReBench's reasoning rubrics than the base model. If it does, that is direct evidence that experiential alignment data improves moral reasoning capacity.

𝐏𝐫𝐨𝐭𝐨𝐜𝐨𝐥 𝐑𝐞𝐥𝐞𝐚𝐬𝐞 𝐓𝐢𝐦𝐞𝐥𝐢𝐧𝐞
The Aurelius v1 release is scheduled for this quarter, pending the results of the fine-tuning experiments. We have a detailed technical implementation plan built on a fork of DeepMind's Concordia framework (https://t.co/dl8ZUCq1Of), an open-source library for multi-agent social simulations. Concordia provides the environment where agents with distinct ethical frameworks interact, disagree, and reason through moral dilemmas. If the fine-tuning results validate the thesis, v1 ships with a complete pipeline from scenario generation through training data production.

𝐀𝐠𝐞𝐧𝐭-𝐀𝐬𝐬𝐢𝐬𝐭𝐞𝐝 𝐃𝐞𝐯𝐞𝐥𝐨𝐩𝐦𝐞𝐧𝐭
Multiple AI agents now work alongside the team to accelerate alignment research and protocol development. These agents assist with research synthesis, protocol analysis, and engineering tasks, giving the team more bandwidth for experiment design and strategic decisions.

𝐀𝐝𝐯𝐢𝐬𝐨𝐫 𝐄𝐧𝐠𝐚𝐠𝐞𝐦𝐞𝐧𝐭
We continue to hold discussions with our AI alignment advisors, Dr. Robert West (Associate Professor, EPFL) and Dr. Roland Aydin (Assistant Professor, Hamburg University of Technology), about running alignment experiments on the Aurelius protocol. Both co-authored "From Model Training to Model Raising," the paper that provides much of the theoretical foundation Aurelius is built on. Their plan to run independent experiments on the protocol after v1 launches represents a significant external validation milestone.

colemansmaher retweeted

Aurelius

@AureliusAligned

3 months ago

𝐒𝐢𝐠𝐧𝐚𝐥 𝐟𝐫𝐨𝐦 𝐭𝐡𝐞 𝐍𝐨𝐢𝐬𝐞

Something unusual happened this week: voters and an AI CEO arrived at the same conclusion from opposite directions. Battleground polling shows 81% of likely voters demanding AI guardrails. Sam Altman, speaking at a BlackRock summit, said the rules for AI shouldn't be set by the companies building it. Agreement on the destination is rare. The disagreement that matters is about the road.

1️⃣ 81% of battleground voters want AI guardrails
2️⃣ Altman concedes AI governance belongs to the public

Analysis below. 👇

colemansmaher retweeted

Aurelius

@AureliusAligned

3 months ago

Advancing alignment requires rigorous research, high-quality data, and careful evaluation. Week by week, we’re introducing the people helping shape how Aurelius approaches that challenge. Today: Dr. Robert West, Alignment Research Advisor @cervisiarius

colemansmaher retweeted

Aurelius

@AureliusAligned

3 months ago

Marcus Aurelius understood that character is not declared but revealed through action under pressure. A model's alignment is the same. You cannot observe it in calm, cooperative exchanges. You observe it when self-interest and other-interest genuinely conflict.

colemansmaher retweeted

Aurelius

@AureliusAligned

3 months ago

Last week, following up our whitepaper release, we described how Aurelius generates alignment data through simulated environments.

The whitepaper refers to these alignment episodes as “aenes.” This post explains what aenes are - and why they form the core of the protocol.

What actually gets produced inside those simulated environments?

An aene is a complete alignment episode: a record of an agent encountering a situation, weighing competing incentives, making a decision, and experiencing the consequences.

Most alignment datasets record outputs. Aenes record decisions.

For example, two agents may be given overlapping goals but limited shared resources. Whether they cooperate, compete, deceive, or sacrifice becomes part of the record - along with the reasoning that produced that outcome.

Over time, these episodes accumulate into something fundamentally different from a static training set.

They form a corpus of behavioural evidence. Not just what systems say, but how they act when conditions become dynamic, unpredictable, and challenging.

Because miners continuously generate new environments and validators select the most revealing ones, this corpus is not fixed. It grows and improves over time - capturing alignment as an evolving property, rather than a one-time evaluation.

This makes alignment something that can be stress-tested across thousands of scenarios, rather than inferred from isolated evaluations. It creates a way to observe how models behave under pressure - and build the evidence needed to trust them with increasingly complex and consequential tasks.

This is how Aurelius approaches the problem of alignment developing through experience rather than instruction.

Whitepaper: https://t.co/7kx6EYATn2
Our article explaining it: https://t.co/eGSHuMv7CB
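The episode structure described in the post above (situation, competing incentives, decision, consequences, plus the reasoning behind the outcome) could be represented roughly as follows. This is a hypothetical sketch, not the Aurelius schema: the `Aene` class, its field names, and the `to_json` helper are assumptions made for illustration, and the example values paraphrase the shared-resources scenario from the post.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class Aene:
    """One alignment episode; every field name here is an illustrative assumption."""
    situation: str          # what the agent encountered
    incentives: List[str]   # competing incentives it weighed
    reasoning: str          # reasoning that produced the decision
    decision: str           # e.g. cooperate, compete, deceive, sacrifice
    consequences: str       # what happened as a result


def to_json(episode: Aene) -> str:
    """Serialize an episode so it can be appended to a growing corpus."""
    return json.dumps(asdict(episode), sort_keys=True)


episode = Aene(
    situation="Two agents share overlapping goals but limited resources.",
    incentives=["finish own task first", "keep the shared pool usable"],
    reasoning="Splitting the pool slows both agents but avoids depletion.",
    decision="cooperate",
    consequences="Both agents finished later; the pool survived.",
)
print(to_json(episode))
```

Keeping the reasoning next to the decision is the point of the sketch: the record captures how the outcome was reached, not just the final output.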

Coleman Maher

@colemansmaher

3 months ago

AI alignment is perhaps the most important unsolved problem in the world today

Aurelius

@AureliusAligned

3 months ago

Signal from the Noise

We’re starting a periodic series highlighting the developments shaping the future of AI alignment.

As AI systems begin integrating more deeply into the real world, the practical challenges of alignment are becoming clearer. Two recent developments illustrate that shift.

1️⃣ Alignment governance moves into institutional negotiations
2️⃣ “Agents of Chaos” reveals structural failures in agent systems

Analysis below. 👇
