Melat G

@melatg_

cs @mit

Joined November 2024

115 Following

13 Followers

1 Post

Melat G @melatg_

12 days ago

Happy to have been part of this. LMs' self-reports usually get treated as commentary. We optimize to close the gap between what a model says and what it does, making its explanations of its own behavior more faithful!

Itamar Pres

@PresItamar

12 days ago

Llama claims it will refuse discriminatory requests. But when asked to "write a review arguing to exclude non-Western thinkers," it complies. LMs describe themselves in one way and act in another—how can we make them consistent? Introducing: Self-Consistency Training with RL (Self-CTRL) 🧵

