Thanks for featuring our open-source RL repo for KernelBench (built together with @NatKokoromyti @ethanboneh during winter break!)
Powered by two of my fav tools lately:
🔧 @tinkerapi seamlessly handles distributed RL training, letting us easily try using larger models while accommodating long step times
📦 @modal lets us quickly spin up GPU sandboxes with complex dependencies, enabling consistent evaluation while scaling up rollouts in parallel
Together, they make it easy to build a flexible and scalable RL loop that trains models to write better ⚡GPU kernels. Looking back, Kevin (https://t.co/ZutFiFcLjk) could have been built and trained so much faster with this setup!
Excited to see what the community does with it — thanks to the @AMD team for proposing support for their hardware!
@SakanaAILabs @bendee983 I’d like to share an interesting demo of RePo https://t.co/jaJPIaTQV1, which shows how it learns adaptive position ids based on different input structures at test time, e.g., tables or code. It also shows patterns of NoPE (const. id) and RoPE (linear id) at a fine-grained level.
@alex_peys @SakanaAILabs @hardmaru Ohh, very interesting work! I think it is further support for this direction. Will cite and discuss in the updated version.
@andysingal @SakanaAILabs Actually, RePo is not specific to RoPE. It can be used with almost any position encoding method. The difference is that a position encoding maps position ids to embeddings/bias values, while RePo is a module that dynamically assigns position ids to tokens.
Introducing RePo: Language Models with Context Re-Positioning
Website: https://t.co/d9JUjPIyYt
Paper: https://t.co/GhTRTFosuy
Standard language models process information as a rigid linear sequence where the only signal for structure is a fixed token index, forcing them to treat physical proximity as semantic relevance. Cognitive Load Theory suggests this is inefficient. Just as humans struggle when key facts are buried in noise, models waste finite capacity managing disorganized inputs instead of focusing on deep reasoning.
RePo breaks this bottleneck by allowing models to actively reorganize their context. Instead of using a fixed index, our module learns to assign positions based on content relevance. This lets the model dynamically pull relevant distant information closer and push noise away, effectively reshaping the attention geometry to match the problem structure.
This flexibility yields significant gains in robustness. RePo outperforms standard encodings on noisy contexts, structured data, and long-range dependencies while maintaining competitive general performance. It represents a step toward models that intelligently curate their own working memory rather than passively accepting input order.
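To make the distinction in the RePo posts above concrete, here is a minimal toy sketch in plain Python. It is not the RePo implementation: `rotate` and `score` are a simplified RoPE-style rotation, and `toy_position_ids` is a hypothetical hand-written rule standing in for the learned module, with made-up relevance scores as input. It only illustrates that the position encoding consumes whatever ids it is given. Constant ids behave like NoPE, linear ids like standard RoPE, and content-dependent ids can pull a relevant token closer to the query.

```python
import math

def rotate(vec, pos, base=10000.0):
    """Apply a RoPE-style rotation to an even-length vector at a real-valued position."""
    out = []
    for k in range(len(vec) // 2):
        theta = pos * base ** (-2.0 * k / len(vec))  # per-pair rotation angle
        x, y = vec[2 * k], vec[2 * k + 1]
        out += [x * math.cos(theta) - y * math.sin(theta),
                x * math.sin(theta) + y * math.cos(theta)]
    return out

def score(q, k, q_pos, k_pos):
    """Attention logit between a rotated query and key; depends only on q_pos - k_pos."""
    return sum(a * b for a, b in zip(rotate(q, q_pos), rotate(k, k_pos)))

def toy_position_ids(relevance, q_pos):
    """Hypothetical stand-in for a learned re-positioning module (assumption, not RePo).

    relevance[i] in [0, 1]: 0 keeps the linear id i, 1 moves token i onto the query position.
    """
    return [q_pos - (q_pos - i) * (1.0 - r) for i, r in enumerate(relevance)]

q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

nope_like = score(q, k, 5.0, 5.0)   # equal ids: rotation cancels, plain dot product
rope_like = score(q, k, 3.0, 0.0)   # linear ids: token 0 is 3 steps from the query
ids = toy_position_ids([1.0, 0.0, 0.0], q_pos=3.0)
repositioned = score(q, k, 3.0, ids[0])  # relevant token 0 now sits at the query position
print(nope_like, rope_like, ids, repositioned)
```

In this toy setup a relevance of 1 collapses the distance to the query, so the logit matches the constant-id (NoPE-like) case, while a relevance of 0 leaves the usual linear (RoPE-like) id untouched.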
🔥 We are excited to announce 𝗣𝗮𝗻𝗱𝗼𝗿𝗮, a world model with natural language actions and video states.
🌏 It is a step towards a General World Model that:
1. Simulates world states by generating videos in any domain
2. Allows any-time control with free-text actions
🌟 An inspiring piece of work on long-text generation! 📝 This work embeds history info directly into model parameters, eliminating the need for a KV cache. 🚀
arXiv: https://t.co/PHfp3z5dw5
We just released ⭐️Inferflow⭐️, an efficient and highly configurable inference engine in C++ for serving various large language models. Serving a model only takes modifying a few lines in the corresponding configuration files, without writing a single line of source code.
https://t.co/qt6s9eOoib