I curate a list of twitter accounts around AI Safety & Alignment and some people follow it already, but I'd appreciate feedback on whom to include (or exclude) and more follows! https://t.co/kY8AZnzowG
always thought harry potter series was unrealistically pessimistic about how few characters would care to learn more about how the magic actually works & then u watch people interact with llms
We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring! 🧵
Reward hacking was convergent across ~all models and labs
Sycophancy was convergent
Eval awareness was convergent
All three of the above a) were predicted by theory, b) are quite sticky. So I think this is evidence that we should expect scheming & power-seeking to behave the same
A workshop owner complains about apprentices who don't know how to sweep.
By his account, once every two months he has to replace a broom worth a whole 25 złoty, because young people don't know how to sweep!
The troublemaker: Diamond Auto Serwis
Not just foreseeable, but foreseen and called out.
We ran a campaign to get them out of the first ever AI Safety Summit. We won that one.
Yet most of the field kept licking the boot of the companies and here we are.
https://t.co/MVAkSF7DzK
Two months ago, when I wrote about solving my Frontier Math Tier 4 problem, I did not expect the landscape to shift this quickly.
Computational arithmetic algebraic geometry is turning into an incredible hotbed of ideas. This area, shaped by deep questions around elliptic curves, algebraic numbers, varieties, and the work of people like Brian Birch, Jean-Pierre Serre, and many others, has always had a strong computational undercurrent. But what is happening now feels different.
The agents I have been testing, especially Codex, are reaching a level where they often outpace my own ability to write code quickly and effectively. At the same time, I can still curate, inspect, redirect, and judge the mathematics. That combination is extremely powerful.
I can jump into almost any algorithm I need, optimize it, decompose it, rebuild it, and move between Magma, SageMath, and Rust with a kind of flexibility that still feels unreal. This is not "vibe coding". It is extreme engineering guided by mathematical taste.
In my recent projects, this has already helped me close two big questions. The technical conversations I can now have with Codex about Magma code, computational algebra, and arithmetic geometry are honestly stunning.
Big computational problems in arithmetic geometry are going to fall much sooner than many people expect.
Our highest and most urgent national priority should be AI safeguards. The risks of AI weapons, pathogens, mass unemployment, surveillance, and even extinction must not continue to be largely ignored.
🚨NEW: We’ve just launched our campaign in Canada!
A cross-party coalition of over 30 MPs and Senators are calling for Canada to negotiate an international prohibition on the development of superintelligence, recognizing the risk of human extinction posed by the technology.
🧵
@KinasRemek I'd also appreciate it if you could clarify what you mean by "interpolation great, out-of-domain extrapolation weak", i.e. how does it show up and how do you measure it?
@KinasRemek In the passage about the "world model", do you mean that LLMs currently learn it worse than humans do, or that they learn it the way humans do but should be doing it better? How much do you need to "understand the world" before acting? I know what will happen if I drop a ball from 1 m, but not "exactly".
@KatjaGrace I think it depends on whether you hold any radical views that might destroy your reputation or can't keep secrets. So unless you do, you're safe to overshare. Writing this as a fellow oversharer👊🏼