Alessandro Sordoni @murefil - Twitter Profile

Pinned Tweet

4 days ago

MAI is a really cool team of kind, highly motivated and skilled people. Our team worked with them in the final stretch of this model contributing some of our swe 🧙‍♀️ proud of our Froggy team 🐸 and expect further cool updates from us...

elie

@eliebakouch

5 days ago

WOW microsoft new "MAI Thinking 1" model comes with a 109 page tech report that looks REALLY detailed, this is amazing


Alessandro Sordoni @murefil

2 days ago

One of the major hurdles I found is that KL posterior | prior is not quite as meaningful as in the standard variational parametrizations w continuous latents, as high amount of info about the target can be sneaked through one single token (eg the answer is 'X'). These problems are also evident in recent on policy self distillation papers where posterior can sneak info about targets. I think the urge of having an accurate posterior is more pressing when dealing w cots


murefil retweeted

Lucas Caccia @LucasPCaccia

4 days ago

Incredible work by the MAI team, building a model **without any distillation**. Also shoutout to the Froggy Team 🐸 for their part in the RL data env creation. Go BugPilot!


murefil retweeted

Amjad Masad

@amasad

5 days ago

Excited to partner with @Microsoft to enable everyone in the enterprise to build and deploy safe & secure Fabric data apps. This is possible thanks to Microsoft's new Rayfin SDK. https://t.co/sAEKkXYFfy



murefil retweeted

slime

@slime_framework

6 days ago

🚀 slime v0.3.0 is out! This release is a major step toward agent-first RL. We turned slime’s existing multi-turn / agentic capabilities into a more coherent foundation:
- slime/agent with reusable sandbox-agent components
- OpenAI / Anthropic-compatible adapters
- black-box coding-agent RL example
- variable global batch-size training
- fully async training as a first-class path
- lower host-memory usage for more flexible rollout-inference setups
- PPO refactor with actor-critic colocation
- delta weight sync, FlashQLA for Qwen GDN, --save-hf, and more CI coverage
slime is moving closer to a practical open-source framework for large-scale agentic RL. Release note: https://t.co/e1ONv8Q4aW


murefil retweeted

Eric Xingdi Yuan @ericxyuan

11 days ago

Welcome to the Froggy team, looking forward to working with you!


Alessandro Sordoni @murefil

18 days ago

@lateinteraction @novasarc01 @lateinteraction are you optimizing the feedback model w the same objective (the one that gives c)? When working on DLN it was practically very hard to shape p(h|x,c) when c was already too strong.


Alessandro Sordoni @murefil

23 days ago

when working on VinePPO, we wondered whether GRPO could have an implicit step-wise credit assignment just due to the similarity between positive and negative trajectories, definitely on my reading list!

Edoardo Ponti @PontiEdoardo

23 days ago

Critic-free RL (e.g. GRPO) is very effective in LLM post-training, but why? We propose the💥cancellation hypothesis💥: sequence-level rewards implicitly assign credits to individual tokens through the cancellation of gradients from pos/neg rollouts. https://t.co/TsjWQN5mrD



murefil retweeted

Roger Creus Castanyer

@creus_roger

26 days ago

Launching Agentick 🤖🧠 A unified benchmark for training and evaluating general sequential decision-making agents. RL agents, LLMs, VLMs, hybrids, bots, and humans can all be evaluated on: same tasks. same seeds. same score. First result: no single agent dominates. 🧵


murefil retweeted

Amirhossein Kazemnejad @a_kazemnejad

26 days ago

Markovian Thinker in the wild: Zyphra's ZAYA1-8B scales to >5M thinking tokens inside a 32K context window, reaching 91.9% AIME'25 with 0.7B active params. Bounded reasoning tails decouple thinking depth from attention cost, without any fancy linear attention for thinking.


murefil retweeted

Edoardo Ponti @PontiEdoardo

26 days ago

I am moving to @ICComputing at @imperialcollege as an associate professor, where I will be expanding my lab! I am looking for PhDs and postdocs to join me on my quest to build foundation models with adaptive tokenisation and memory (AToM FMs, funded by @ERC_Research) https://t.co/n83CA7j9tG


murefil retweeted

Gabriel Goh

@gabeeegoooh

about 2 months ago

this model is miraculous and i'm so proud of my team for making this possible. @ayaanzhaque @BoyuanChen0 @dibyayB @jianfw @kenjihata @kiwhansong0 @liang_weixin @Marco_B_Liang @mengchaozzz @yuguang_yang (and some without x handles) https://t.co/zBcv9jWrDE


murefil retweeted

Nived Rajaraman @Nived_Rajaraman

about 2 months ago

Through what mechanisms can reasoning models learn faster by choosing what problems to train on, and what are the limits? Part I of a new series: "Learning to Reason with Curriculum", where we explore algorithmic principles for overcoming the limitations of pre-trained models and data. w/ Audrey Huang (@auddery), Miro Dudik (@MiroDudik), Rob Schapire, Dylan Foster (@canondetortugas) and Akshay Krishnamurthy. [1/12]


Alessandro Sordoni @murefil

about 2 months ago

best approach to replay trajectories I found so far is OAPL https://t.co/2DRjfSmKpk, super stable learning, zero entropy collapse

finbarr

@finbarrtimbers

about 2 months ago

love this replay buffer paper from Meta: https://t.co/JysdD9gLIn "methods like PPO or GRPO typically operate as on-policy as possible, meaning rollouts are generated, used for a single gradient update, and immediately discarded." this is crazy and we shouldn't do this!


Alessandro Sordoni @murefil

about 2 months ago

@LucasPCaccia @h4x0r_dz malware 😭


murefil retweeted

Martin Woodward

@martinwoodward

2 months ago

Unlimited API tokens across ALL the coding models via GitHub Copilot is low key one of the best perks for engineers working at Microsoft.


Alessandro Sordoni @murefil

2 months ago

interesting paragraph from incredibly interesting Mythos report! "Claude Mythos Preview also exhibited several deficits in its research capabilities which hindered its performance, including lack of judgment about the quality of its ideas, insufficient hypothesis testing, and overconfident conclusions. These deficits—combined with time constraints—caused Claude Mythos Preview to fail to rediscover the final insight and complete the full task" still at frontier: - notion of "interestingness" of ideas - key to deep scientific discoveries / beyond local auto-research ideas; - overconfidence - grounding tokens in actual verification results / uncertainty of the model; - thorough hypothesis testing;

