Suggestion for @perplexity_ai: make top links show up instantly (< 200ms, like Google), while LLM output is loading. That way, one could use it as a full Google replacement, currently loading time for top links is just too long to be competitive (> 1000ms)
Interested in RL? I'm planning to assemble a new online meetup, focused on reinforcement learning paper discussions. You can sign up, and as soon as enough people are interested, you'll get an invitation.
More information and registration: https://t.co/3j2LTyT1nZ
@paddlepaddle_ No problem. Could you please create a GitHub issue with a problem description (if possible reproducible) and I will help you out! https://t.co/QK8Z4WgQVn
🎉Today marks the first release of Tetris Gymnasium! If you're an RL researcher or just somebody who wants to get into it, give it a look! You can start with ~5 lines of code and maybe create the next big RL algorithm!
pip install tetris-gymnasium
https://t.co/rkFwaMAoU8
@YouJiacheng Good catch! I updated the diagram here. The way the paper phrased exploring several approaches made it unclear if all or only some of the tricks were used for the cold start data. But probably the R1-Zero outputs were indeed used.
An interesting and enjoyable read from Léon Bottou and Bernhardt Schölkopf. It suggests different analogies and metaphors for framing what's going on with large language models through the imagery of Jorge Luis Borge, e.g: Fiction Machines & Vindications
https://t.co/PAflcuyChV
@Spideraxe30 I feel like the problem with this rune is that it has to be balanced around the 1% best ults in the game and won't be viable for anyone else
@RaphaelWimmer The real CoT is hidden from the user, and we only see a model-generated summary. Ofc there are no details on how this process works exactly, but I can imagine there are multiple moving parts that can lead to weird output like this...
https://t.co/LSSSibGSCH