1/ ๐
For most of scientific history, we studied hidden mechanisms through behavior.
We could not directly inspect the system. We observed what it did, formed hypotheses, designed interventions, and inferred what was happening underneath.
Can AI agents do the same?
NEW paper from Meta.
(bookmark it)
It's an agent system that autonomously discovers neural architectures that beat Llama 3.2 at 350M, 1B, and 3B scales, all under a 24-hour compute budget.
They get this work by splitting the search into two agents:
> AIRA-Compose searches the macro architecture.
> AIRA-Design implements the low-level mechanisms.
For devs:
If one agent in your stack is doing both strategy and implementation, split it. Run a planner that picks the structure and an implementer that fills in the mechanisms.
AIRA shows this beats a single end-to-end agent on a real, non-toy search problem. The same split is useful for pipeline assembly, query planning, prompt scaffolding, and tool-use programs.
Paper: https://t.co/CYALI6CFjJ
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
(๐งต)Excited to share our latest work on AI Research Agents discovering novel language modelling architectures that show competitive performance when scaled up at 1B parameter size: https://t.co/IbW4LwMwu4
๐ค We gauge the ability of AI systems to autonomously design foundation models beyond the standard Transformer paradigm, by empowering LLM agents to perform both
๐ high-level architecture search โ AIRA-Compose
๐ ๏ธlow-level mechanistic implementation โAIRA-Design
With ๐๐๐๐-๐๐จ๐ฆ๐ฉ๐จ๐ฌ๐ and ๐๐๐๐-๐๐๐ฌ๐ข๐ ๐ง we show that agents are fully equipped to ideate their own successors. Read it here: https://t.co/0qZpNRSHXw
๐ค๐ท๐๐๐ง ๐๐ ๐ซ๐๐ฌ๐๐๐ซ๐๐ก ๐๐ ๐๐ง๐ญ๐ฌย ๐๐ข๐ฌ๐๐จ๐ฏ๐๐ซ ๐ญ๐ก๐ ๐ง๐๐ฑ๐ญ ๐ ๐๐ง๐๐ซ๐๐ญ๐ข๐จ๐ง ๐จ๐ ๐๐จ๐ฎ๐ง๐๐๐ญ๐ข๐จ๐ง ๐ฆ๐จ๐๐๐ฅ๐ฌ? We put them to the test with two complementary, model-agnostic frameworks: ๐๐๐๐-๐๐จ๐ฆ๐ฉ๐จ๐ฌ๐ and ๐๐๐๐-๐๐๐ฌ๐ข๐ ๐ง -- a thread:
Our agents designed custom attention mechanisms to handle long contexts performing within 3% of the human state-of-the-art, and optimized a nanochat training script under a 5-minute time budget, surpassing the original Autoresearch minimum validation metric.
@lgf47264@ultimoranet@cavicchioli forse hai perso un pezzo -- l'iran รจ una teocrazia che dichiara apertamente morte all'america e ad israele, sponsorizza il terrorismo internazionale e manda a morte i propri cittadini, l'ucraina no
Claude Code leaked their source map, effectively giving you a look into the codebase.
I immediately went for the one thing that mattered: spinner verbs
There are 187
While social media is polarising, evidence suggests AI may nudge people towards the centre.
This holds true of all studied models. Grok is more right-leaning than other models, but also has depolarising effects.
By @jburnmurdoch.