@tszzl yeah I’ve wondered the same. Like at an even more granular level, why did the models all converge on the same stylistic tics (e.g. em-dash)? It’s counterintuitive to me that the path through the loss landscape would be so deterministic
@DaveRBanerjee I mean filtering alone doesn't seem like a huge win if we're already in a data-constrained regime. The real breakthrough would be like synthesizing interesting RL problems for itself to solve.
everyone is assuming this is some kind of quirk chungus marketing campaign, but if you’ve worked with 5.4 and beyond, they tend to call everything goblins, gremlins, etc. It’s just super noticeable, and if you work with them all day you start to get annoyed
@seconds_0 @willdepue @teortaxesTex I hope not.. The probability of winning is quite low anyways, participating and sharing interesting results along the way is probably a better way to get recognition.
@varunneal @Sam_Acqua Honestly though by default I'm a bit skeptical that there's that much headroom from cross-document TTT, there just aren't that many val tokens.
@varunneal @Sam_Acqua Yeah that's exactly the other bug I mentioned. Isn't clear to me whether the adapter could see the suffix. Author says it can't. https://t.co/4xx2Ma728U
@molochofficial Obviously bad, but how would you describe what it’s doing here? To me: overwritten, stuffed with metaphors that don’t land. Every sentence seems to be similarly structured (independent clause, dependent clause xN)
I'm sure others have said it too but weirdly enough I really love the personality of these coding models.. "I spent a long time trying to find a counterexample.." sounds like something my TA in school would say while giving me partial credit. This is Claude but 5.4 is v similar.
@Butanium_@voooooogel that’s such a good example of the unintuitive generalization ability that models can have. easy to take it for granted these days but I feel like it’s quite related to what was so magical about the first instruct tuned models
Sorry bro not ambitious enough. I created auto-autoresearch-research, an agent that optimizes your agent that optimizes your research code. https://t.co/eKaTFS4lUW
oh yeah i should have linked autoresearch probably
https://t.co/YCvOwwjOzF
(you don't "use it" directly, it's just a recipe/idea - give it to your agent and apply to what you care about.)
and the tweet about it that went mini-viral over the weekend with more context
https://t.co/q5eWsvx5p2
More and more I'm agreeing with alignment by default. The pet theory I have is that "alignment" will be more about eliciting the capabilities of next-generation models than about their ethics.
@belindazli Hi Belinda, I think this is a really cool area. Do you think that self-supervised training to improve introspection could generalize to improvements in arbitrary domains?
"Our fine arts were developed, their types and uses were established, in times very different from the present, by men whose power of action upon things was insignificant in comparison with ours..
... We must expect great innovations to transform the entire technique of the arts, thereby affecting artistic invention itself and perhaps even bringing about an amazing change in our very notion of art." -- Valéry, 1928