We're Neo Research (新衡). Asia’s first independent frontier AI safety evaluation & research lab.
Today we're publishing our first report: an independent safety evaluation of DeepSeek v4 Pro. (1/5)
Reviewing last winter's notes and realizing I spent an unreasonable amount of time alone with activations. Time flies, and I have zero evidence to prove this research ever happened.
"Interpretability" is a misnomer. We study reaction mechanisms and call it Chemistry. We study motion mechanisms and call it Physics. We study computation in biological neural networks and call it neuroscience.We aren't just doing "Mechanistic Interpretability." This is Aiology.
Our paper is now out in PNAS!💡
Are LLMs developing human-like concepts that are central to human cognition? If so, how are such concepts represented, organized, and related to behavior?
https://t.co/ZnLIevdN4j
1/N
New paper! 🧵
Post-training doesn't build the Assistant, it just turns up the volume on personas that pretraining already laid down, at 0.22% of total tokens!
We traced them across OLMo-3 and Apertus. Here's what we found 👇
🚨 New Paper! (Part 1: Pretraining)
Many recent works show beautiful representational geometry in neural networks.
But what controls the geometry of world representations during pretraining?
We decouple the world from data to study this in a controlled setup.
1/n
♟️🧐How can a Chess Transformer reach — or even surpass — human grandmaster-level play with only a single forward pass? We study BT4, the strongest and most stable open-source model of Leela Chess Zero. And we adapted Transcoders and Lorsas, showing that sparse replacement layers work on BT4, which can reveal interpretable computational features across MLP and attention modules. This brings us one step closer to sparsifying and interpreting an entire Chess Transformer — and understanding what makes it so strong!👇 #AI #ML #MechInterp #Chess
Stoked to release this first meaty post in a series describing our vision for the Alignment journal.
Many thanks to the authors and contributors: @danielmurfet, @dan_mackinlay, @geoffreyirving, @mhutter42, @Lang__Leon, Gautam Kamath, Konstantinos Voudouris, Edmund Lau, Alexander Gietelink Oldenziel, and Seth Lazar. @AlignmentJrnl
Liftoff.
The Artemis II mission launched from @NASAKennedy at 6:35pm ET (2235 UTC), propelling four astronauts on a journey around the Moon.
Artemis II will pave the way for future Moon landings, as well as the next giant leap — astronauts on Mars.
If this policy is not revoked, I won’t be reviewing/ACing for #NeurIPS
Science requires open exchange of ideas!
When participation gets shaped by geopolitics, it ends up reflecting power structures, not merit. That narrows what science can be and hands powerful nations full control!
8/ 🔵 Pre-caching: the representation at position i also gets gradients from predicting tokens at positions j > i+1, as future attention heads can attend back to position i and read from it. So the model is incentivized to "prepare" useful info for the future!
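A toy way to see the pre-caching effect, as a minimal sketch under loud assumptions: one causal attention head with identity Q/K/V projections, a squared-error loss standing in for the next-token loss, and made-up numbers throughout. The function names (`causal_attention`, `per_position_loss`, `loss_sensitivity`) are hypothetical and not from any paper or library. It nudges the representation at position i and uses finite differences to check which per-position losses move.

```python
import math

def causal_attention(h):
    """Single-head causal self-attention with identity Q/K/V (an assumption)."""
    out = []
    for t in range(len(h)):
        # Position t may only attend to positions s <= t.
        scores = [sum(a * b for a, b in zip(h[t], h[s])) for s in range(t + 1)]
        m = max(scores)
        weights = [math.exp(x - m) for x in scores]
        z = sum(weights)
        out.append([
            sum(w / z * h[s][k] for s, w in enumerate(weights))
            for k in range(len(h[t]))
        ])
    return out

def per_position_loss(h, targets):
    """Squared error at each position, a stand-in for the next-token loss."""
    out = causal_attention(h)
    return [
        sum((o - y) ** 2 for o, y in zip(out[t], targets[t]))
        for t in range(len(h))
    ]

def loss_sensitivity(h, targets, i, eps=1e-4):
    """Finite-difference estimate of d loss[t] / d h[i][0] for every t."""
    bumped = [row[:] for row in h]
    bumped[i][0] += eps
    base = per_position_loss(h, targets)
    new = per_position_loss(bumped, targets)
    return [(b - a) / eps for a, b in zip(base, new)]

# Made-up hidden states and targets for 4 positions, dimension 3.
h = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.4, 0.1, 0.9], [0.7, -0.6, 0.2]]
targets = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]

sens = loss_sensitivity(h, targets, i=1)
for t, s in enumerate(sens):
    print(f"loss at position {t} w.r.t. h[1][0]: {s:+.4f}")
```

In this setup the loss at position 0 does not move at all, since position 0 never reads from position 1, while the losses at positions 2 and 3 do. Those later losses are the extra gradient signal that pushes position 1 to hold information useful beyond its own next-token prediction.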
In the human health space, Rosie's story demonstrates that we can "democratise" the process of designing a cancer vaccine. While genomic analysis & RNA production will continue to be specialised, they could turn into pure service provision, especially as automation increases. /5
We can identify a 9D helix beyond our imagination that happens to manifest such elegant properties when projected into the lower-dimensional subspace we live in.
@karpathy People always wanted to simulate society like Stanford AI Town. But an agent internet like Moltbook might be the most accessible approach: you don't need to model a complex physical world. The social media substrate is structural, text-based, and we already know how to build it.