Goodfire @GoodfireAI - Twitter Profile

Pinned Tweet

about 1 month ago

Neural networks might speak English, but they think in shapes. Understanding their rich *neural geometry* is key to understanding how they work – and to debugging and controlling them with precision. Starting today, we’re releasing a series of posts on this research agenda. 🧵

307

11K

2K

9K

3M

GoodfireAI retweeted

Vmax

@VmaxAI

1 day ago

Following the blog post from our collaboration with @GoodfireAI, the arxiv paper for PROPEL is now available.

0

9

1

5

3K

Goodfire

@GoodfireAI

2 days ago

Sign up for our ICML happy hour here: https://t.co/VP82jM4eHu

0

13

0

11

2K

Goodfire

@GoodfireAI

2 days ago

We're hosting a happy hour at ICML, Wednesday July 8! Come connect with members of the Goodfire team. Learn about our work in neural geometry and other recent publications. Note that space is limited, and we’re prioritizing attendees who are actively engaged in relevant AI research areas. Link to register in the thread!

1

130

7

49

14K

GoodfireAI retweeted

Santiago Aranguri

@santiaranguri

7 days ago

Happy to see our work cited in the Claude Fable & Mythos system card! Steering against eval awareness can carry confounds (e.g. making the model more friendly). Interpretability can help us understand these, and is a promising source of new methods to deal with eval awareness. https://t.co/NOr5BAhv5j

1

39

7

6

2K

Goodfire

@GoodfireAI

8 days ago

@saurabh_shah2 Thanks for helping to enable it!

0

2

0

381

Goodfire

@GoodfireAI

8 days ago

Have you debugged your training data? You might not like what you find. Introducing predictive data debugging: reveal and shape what your model will learn before training. In DPO datasets, we found broken guardrails, hallucinations, and fish fart fan fiction (seriously). (1/9)

26

894

109

666

175K

Goodfire

@GoodfireAI

8 days ago

@saurabh_shah2 oh never mind then

0

53

1

5

3K

Goodfire

@GoodfireAI

8 days ago

@cryptochad215 well yes just a little bit

2

9

0

1K

Goodfire

@GoodfireAI

8 days ago

@jiaxinwen22 the full (73-page) paper is on arXiv! https://t.co/xJIJP3X0q8

1

160

19

280

40K

Goodfire

@GoodfireAI

8 days ago

@0xGTO 🧌

1

14

1

1K

Goodfire

@GoodfireAI

8 days ago

@Sauers_ To each their own! (but on the other hand, we'd bet the Olmo team didn't intend for this to make up such a significant cluster of their DPO data)

1

13

0

662

Goodfire

@GoodfireAI

8 days ago

@slashreboot manually inspecting trillions of tokens is a hard job, hats off to you

1

7

0

757

Goodfire

@GoodfireAI

8 days ago

Read the full blog post on predictive data debugging: https://t.co/gVV71niAjB

0

63

7

23

4K

Goodfire

@GoodfireAI

8 days ago

If you train models on preference data, you have a curriculum you've never read. Predictive data debugging lets you read it, understand it, and rewrite it. We've built it into Silico, our platform for model design. Request access to Silico here: https://t.co/vnb9zRjmty (9/9)

3

52

1

6

4K
