Efstathios Siatras @efsiatras - Twitter Profile

@FazlBarez

5 days ago

This paper will be talked about for years to come. Very important! There are futures benchmark-driven AI cannot see!

Led by Sobhan (my fellow) and @Avameanssong, with @kalsbskk81826, Ali, Fateme, @sanmikoyejo, @philiptorr, @yong_suk_lee, @joelbot3000, @NorvigPeter and @random_walker. https://t.co/ehBGK8dfsT


efsiatras retweeted

Subbarao Kambhampati (కంభంపాటి సుబ్బారావు)

@rao2z

11 days ago · Lisbon

"When an LLM outputs a step-by-step plan, it creates a powerful illusion that you are watching a machine reason its way to a conclusion. A position paper by professor Subbarao Kambhampati and researchers at Arizona State University systematically dismantles this assumption." (From @bendee983 via @bdtechtalks ) 👉https://t.co/ELPfUplpU0


efsiatras retweeted

Tim Rocktäschel

@_rockt

8 days ago

It has been an absolute privilege and pleasure to build up @UCL_DARK with @egrefen, @robertarail and @jparkerholder over the past eight years.

Yesterday, the UK government announced not just one but two national academic fundamental AI research labs. I am extremely excited to announce that @UCL_DARK will be sunsetted and will merge with @FLAIR_Ox, @whi_rl, @UCL_LASP and AIRL to form the British Open-ended Learning and Discovery (BOLD) Lab, @BOLD_Lab_AI.

This is a huge moment for academic AI research in the UK. Backed with £30m by @UKRI_News and @EPSRC, the lab provides a unique opportunity to attract leading international academic talent to the UK and to equip them with the computational resources to do groundbreaking exploratory AI research (more on the computational resources soon). It also creates a mentorship network of academics, industry leaders and entrepreneurs to teach young talent how to translate fundamental AI research into real-world impact.

I want to thank all the students who made @UCL_DARK successful, in particular our PhD alumni @MinqiJiang, @_samvelyan, @zhengyaojiang, @_robertkirk, @akbirkhan, @LauraRuis, @YingchenX and @PaglieriDavide, as well as our honorary faculty @egrefen, @robertarail and @jparkerholder, who generously contributed to mentorship and research in their free time.


efsiatras retweeted

Elizabeth Barnes

@BethMayBarnes

about 1 month ago

One thing I thought was especially interesting: we see not just eval awareness, but more elaborate “meta-gaming” reasoning about how exactly the task will be scored, and which things are more or less difficult to check. Some examples across multiple different tasks: https://t.co/pBI9Q8FBEs


efsiatras retweeted

Owain Evans

@OwainEvans_UK

about 2 months ago

New paper: We fine-tuned models on documents that discuss an implausible claim and warn that the claim is false. Models ended up believing the claim! Examples:
1. Ed Sheeran won the Olympic 100m
2. Queen Elizabeth II wrote a Python graduate textbook
https://t.co/X318TpcQRI


efsiatras retweeted

David Bau @davidbau

about 2 months ago

NetHack is one of the most complex and longest-lived open source programs ever written, and after 46 years, v5.0 shipped today. https://t.co/ICEyakS6T5

And ... it is a VERY cool large codebase to work with in the LLM era. https://t.co/jGy0e17ilc


efsiatras retweeted

Konstantinos Mitsides @k_mitsides

4 months ago

Can large language models (LLMs) act as the imagination of a reinforcement learning (RL) agent? We found that if you let an LLM "dream" - not by hallucinating pixels, but by writing executable Python code - it can create an open-ended curriculum that drives progress in complex, long-horizon worlds. Introducing Dreaming in Code (DiCode). 🧵👇


efsiatras retweeted

Owain Evans

@OwainEvans_UK

2 months ago

New paper: Can you prevent emergent misalignment with inoculation prompting, or by diluting bad data with good?
Prior work suggests you can. We show the misalignment is still present but hidden: it is triggered by adding cues to prompts that evoke the bad data. https://t.co/J67lVok69N


efsiatras retweeted

keshav @kshenoy_

2 months ago

Can LLMs simply tell us about unwanted behaviors they’ve picked up in training?

We train a single Introspection Adapter (IA) that makes fine-tuned models describe their behaviors.

It generalizes to detecting hidden misalignment, backdoors and safeguard removal. https://t.co/wLwcznETYr


efsiatras retweeted

Robert Kirk @_robertkirk

2 months ago

We evaluated Claude Mythos Preview, Opus 4.7 and other models with our updated alignment evaluation methodology, including a new continuation eval and improved measurements of evaluation awareness and prefill awareness. Details, including the new methodology, in 🧵:


efsiatras retweeted

Laura Ruis @LauraRuis

3 months ago

Exciting new finding: LLMs struggle to *discover* a latent planning strategy for a task that is trivial when taught step-by-step.

Scaling helps surprisingly little: going from an 8-layer model to GPT-5.4 buys only 4 extra steps.

We argue this is good news for CoT monitoring ⤵️ https://t.co/jwG6syGuJ4


efsiatras retweeted

J Rosser

@jrosseruk

3 months ago

Introducing ✨Infusion✨, our *new paper* made possible by the UK AISI Challenge Fund and Sovereign AI!

1/8🧵 TL;DR

Influence functions are commonly used to attribute model behavior to its training data. In this paper we explored the reverse: can influence functions be used to craft training data that induces a given model behavior?

Huge thank you to my amazing collaborators for making this possible, @LauraRuis @_robertkirk @egrefen @j_foerst, and of course @AISecurityInst and @UKSovereignAI!


efsiatras retweeted

Minqi Jiang

@MinqiJiang

10 months ago

What if you kept asking an LLM to "make it better"? In some recent work at FAIR, we investigate how to efficiently use RL to fine-tune LLMs to iteratively self-improve on their previous solutions at inference time.

Training for iterated self-improvement can be costly: the naive approach to training for K self-improvement steps leads to K times the number of rollout steps per episode. We introduce Exploratory Iteration (ExIt), an RL-based automatic curriculum method that bootstraps diverse training distributions of self-improvement tasks by upcycling the LLM's own responses at previous turns as the starting points for both self-improvement and *self-divergence.*

To decide which task to train on next, the curriculum prioritizes sampling partial turn histories that led to higher return variance in their GRPO group (a learnability score that comes for free). This automatic curriculum over the bootstrapped task space teaches the model how to perform iterated self-improvement while only ever training it on single-step self-improvement tasks.

We look at ExIt's impact in single-turn settings (contest math problems), multi-turn settings (BFCLv3 multi-turn tasks), and MLE-bench, where the LLM is run in a search scaffold to produce solutions to real Kaggle competitions. Across these eval settings, we find that ExIt produces models with greater capacity for inference-time self-improvement than GRPO. Notably, ExIt models can self-improve on test tasks for many more steps than the typical solution depth encountered during training, and ExIt yields a 22% improvement in MLE-bench performance compared to GRPO.
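The curriculum step in the ExIt tweet (prioritize partial turn histories whose GRPO group had higher return variance) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `TaskInstance`, `learnability` and `VarianceCurriculum` are hypothetical, the score is assumed to be the population variance of one group's returns, and tasks are assumed to be sampled with probability proportional to that score plus a small floor `eps`.

```python
import random
from dataclasses import dataclass
from statistics import pvariance


@dataclass
class TaskInstance:
    """Hypothetical buffer entry: a partial turn history reused as a starting point."""
    history: list[str]
    score: float = 0.0  # assumed learnability score: return variance of its GRPO group


def learnability(group_returns: list[float]) -> float:
    # Population variance of the returns in one GRPO group.
    # If every rollout earns the same return, the score is 0.
    return pvariance(group_returns) if len(group_returns) > 1 else 0.0


class VarianceCurriculum:
    """Sketch of a variance-prioritized task buffer (assumption, not ExIt's code)."""

    def __init__(self, eps: float = 1e-3, seed: int = 0):
        self.buffer: list[TaskInstance] = []
        self.eps = eps  # hypothetical floor so zero-variance tasks are still sampled occasionally
        self.rng = random.Random(seed)

    def add(self, history: list[str], group_returns: list[float]) -> None:
        # Store a starting point together with the variance of its group's returns.
        self.buffer.append(TaskInstance(list(history), learnability(group_returns)))

    def sample(self) -> TaskInstance:
        # Draw one task with probability proportional to score + eps.
        weights = [t.score + self.eps for t in self.buffer]
        return self.rng.choices(self.buffer, weights=weights, k=1)[0]


if __name__ == "__main__":
    cur = VarianceCurriculum()
    cur.add(["draft A"], [1.0, 1.0, 1.0, 1.0])  # every rollout succeeded: variance 0
    cur.add(["draft B"], [0.0, 1.0, 0.0, 1.0])  # mixed outcomes: variance 0.25
    picks = [cur.sample().history[0] for _ in range(1000)]
    print(picks.count("draft B"), "of 1000 samples chose draft B")
```

Weighting by variance favors starting points the current policy solves only some of the time. When every rollout in a GRPO group earns the same return, the group-relative advantages are all zero, so such a task contributes no gradient signal.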


efsiatras retweeted

Bartłomiej Cupiał @CupiaBart

10 months ago

Almost all agentic pipelines prompt LLMs to explicitly plan before every action (ReAct), but it turns out this isn't optimal for multi-step RL 🤔 Why?
In our new work, we highlight a crucial issue with ReAct and show that we should make and follow plans instead 🧵 https://t.co/D1J7vyu4nO


efsiatras retweeted

EPFL @EPFL

10 months ago

🚀 Together with ETH Zürich and CSCS, we announced today the release of Apertus, the first large-scale, multilingual, open-source LLM developed in 🇨🇭. It represents a major step for transparency and diversity in generative AI. https://t.co/yHnhsT7N8p


efsiatras retweeted

Lisa Alazraki @ ACL 2026 🌴 @LisaAlazraki

10 months ago

We have released #AgentCoMa, an agentic reasoning benchmark where each task requires a mix of commonsense and math to be solved 🧐

LLM agents performing real-world tasks should be able to combine these different types of reasoning, but are they fit for the job? 🤔

🧵⬇️ https://t.co/gH7jn96Shy


efsiatras retweeted

Bryon Tjanaka @btjanaka

11 months ago

Excited to share our new @pyribs tutorial on Quality Diversity through AI Feedback (QDAIF)! This tutorial integrates pyribs with LLMs (via @langchain & @ollama) to write diverse stories about "a suspicious spy and a rich politician." Available here: https://t.co/AxAkDSVYQu https://t.co/LsLyDYjkim


efsiatras retweeted

Paul Bogdan @paulcbogdan

about 1 year ago

New paper: What happens when an LLM reasons? We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention. We discover thought anchors: key steps that shape everything else. Check out our tool & unpack CoT yourself 🧵


efsiatras retweeted

edward

@gradascetic

about 1 year ago

1/8: The Emergent Misalignment paper showed LLMs trained on insecure code then want to enslave humanity...?!

We're releasing two papers exploring why! We:
- Open source small clean EM models
- Show EM is driven by a single evil vector
- Show EM has a mechanistic phase transition
https://t.co/k8cbQ9vUs8


efsiatras retweeted

Alex Turner @Turn_Trout

about 1 year ago

Thought real machine unlearning was impossible? We show that distilling a conventionally “unlearned” model creates a model resistant to relearning attacks. 𝐃𝐢𝐬𝐭𝐢𝐥𝐥𝐚𝐭𝐢𝐨𝐧 𝐦𝐚𝐤𝐞𝐬 𝐮𝐧𝐥𝐞𝐚𝐫𝐧𝐢𝐧𝐠 𝐫𝐞𝐚𝐥. https://t.co/AYN4c0iaSS

