Caleb Biddulph @CalebBiddulph - Twitter Profile

2 days ago

@mbateman "What?" sounds fine to me, though "what was that?" is a bit clearer and more polite. I hardly ever hear anyone say "beg your pardon?" or "excuse me?" but if they did I might think they're offended by whatever I said

0

4

1

0

203

Caleb Biddulph @CalebBiddulph

3 days ago

@reconfigurthing Done! https://t.co/Z3YctwcemO

0

1

0

20

Caleb Biddulph @CalebBiddulph

3 months ago

@VictorTaelin Gemini doesn't use contractions ("you have")

0

22

Caleb Biddulph @CalebBiddulph

7 months ago

@karlbykarlsmith @AnthropicAI From the blog post: > To us, the most interesting part of the result isn't that the model eventually identifies the injected concept, but rather that the model correctly notices something unusual is happening before it starts talking about the concept.

1

0

53

Caleb Biddulph @CalebBiddulph

8 months ago

@simonw Back when Sora was first announced, I wrote a similar post about how zero-shot video models could play video games or operate robots: https://t.co/tMTe1z9421

1

4

0

155

Caleb Biddulph @CalebBiddulph

9 months ago

@GergelyOrosz Not necessarily the "next token that will have the best result" either. The point was that tokens are randomly sampled, so you might get e.g. the fourth-best token instead. Although these details are admittedly not that important to your original point

0

3

0

1

192
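
A minimal sketch of the point in the reply above, that the next token is randomly sampled rather than always being the top-ranked one. The vocabulary, logits, and helper names (`softmax`, `sample_token`) are made up for illustration and do not come from any real model or library.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities (numerically stable form)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, rng, temperature=1.0):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates, listed from best-ranked to worst-ranked.
tokens = ["the", "a", "this", "that", "one"]
logits = [2.0, 1.5, 1.0, 0.5, 0.0]

rng = random.Random(0)  # fixed seed so repeated runs match
counts = {t: 0 for t in tokens}
for _ in range(10_000):
    counts[sample_token(tokens, logits, rng)] += 1

# The fourth-ranked token ("that") is still picked some of the time.
print(counts)
```

Lowering `temperature` toward zero concentrates the draws on the top-ranked token, which approaches always picking the single best token.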

Caleb Biddulph @CalebBiddulph

10 months ago

@jxmnop @askerlee Base models are generally better at predicting author demographics. You could use the Blog Authorship Corpus to predict gender, like in this Anthropic paper: https://t.co/c3XWcCFD8f. The relevant comparison would be "Zero-shot (Chat)" vs. "Prompt Golden" (i.e. few-shot examples)

CalebBiddulph's tweet photo: https://t.co/hpdCQb8r6A

0

1

0

35

Caleb Biddulph @CalebBiddulph

about 1 year ago

@karpathy I've been working on a similar idea. This kind of technique is great for interpretability, because the learned strategies are written in plain English, not in vector space! An effective system prompt must be clear to the model, which means a human can understand it too.

0

1

0

40

Caleb Biddulph @CalebBiddulph

over 1 year ago

@NotBrain4brain Someone tried asking GPT-4.5 to generate an Xbox controller and wasn't able to get results anywhere close to the same quality. What's going on, is the mystery model not GPT-4.5?

0

33

Caleb Biddulph @CalebBiddulph

over 1 year ago

@NotBrain4brain https://t.co/gtQrDsAfK8

1

0

74

Caleb Biddulph @CalebBiddulph

over 1 year ago

@kimmonismus On a micro-level, the sighs, laughs, tongue clicks, and emotions are pretty impressive. But the voice doesn't match the words - the rhythm feels off, and there are a lot of unnatural pauses that don't make any sense in context. I think OpenAI voice mode is a bit better here

0

47

Caleb Biddulph @CalebBiddulph

over 1 year ago

@roydanroy @jiayi_pirate It's searching in the sense that it's trying out different options that come to mind and finding the one that works. It doesn't have to follow a specific algorithm

0

1

0

81

CalebBiddulph retweeted

David Lindner @davlindner

over 1 year ago

New Google DeepMind safety paper! LLM agents are coming – how do we stop them finding complex plans to hack the reward? Our method, MONA, prevents many such hacks, *even if* humans are unable to detect them! Inspired by myopic optimization but better performance – details in🧵

davlindner's tweet photo: https://t.co/tJIA4r7dLF

16

570

96

473

158K

Caleb Biddulph @CalebBiddulph

over 1 year ago

@RileyRalmuto @AlwaysUhhJustin @sama Do you have any other details about how they "assessed" the drug? Seems like a very quick turnaround time, depending on how long ago the drug design was created. I made this Manifold market about your tweet, and the question of testing came up: https://t.co/DAG902Uj8f

1

0

167

Caleb Biddulph @CalebBiddulph

over 1 year ago

@RichardMCNgo I wrote a post about ideas for this two years ago: https://t.co/gIrYwcMMcU

0

1

12

Caleb Biddulph @CalebBiddulph

about 2 years ago

@aidan_mclau https://t.co/Ophzb2pV17

0

1

0

37

