1/n: A thread for local eats+transport in 🇸🇬 for those coming to @ICLR !
For those coming to Singapore for the first time, a huge welcome :D
ICLR would be held at the Expo -- closest MRT station: CG1 (Green East-West Line, 1 stop from Changi Airport), DT35 (Blue Downtown Line)
1/n: A thread for local eats+transport in 🇸🇬 for those coming to @ICLR !
For those coming to Singapore for the first time, a huge welcome :D
ICLR would be held at the Expo -- closest MRT station: CG1 (Green East-West Line, 1 stop from Changi Airport), DT35 (Blue Downtown Line)
To learn more, read our complete paper. We will be sharing additional results with the community in the coming weeks because we believe open science will accelerate the development of frontier capabilities.
Paper Link: https://t.co/o53ncpQH0J
[4/4]
@rosstaylor90 Congrats on the launch Ross!! Thank you for sharing this and looking fwd to what the community does.
Curious what's the timeline to getting the datasets on 🤗?
https://t.co/AoS4N9EZ1U
When we started the company at 19, we had grand ambitions, but I never imagined how fast it would happen.
I'm incredibly grateful for the team we've built and everything they've accomplished.
Labor allocation is the most important problem in the world and we're only cracking the surface.
[1/7] Introducing Evo 2, a new foundation model for biology.
🚀 Evo 2 is the largest-scale, fully open-source AI model ever released: 40 billion parameters, over 9 trillion tokens, and a 1 million context length. All the details are public: weights, data, training infrastructure, and inference infrastructure.
⚡Evo 2 is built on a new model architecture: convolutional multi-hybrids (StripedHyena 2). StripedHyena 2 excels at modeling byte-tokenized data, providing faster training and lower perplexity compared to both Transformers and previous-generation hybrids based on state-space models.
I am grateful for the team behind Evo 2—working with you was one of the proudest moments of my career (the core pretraining team was fewer than five people; you can just do things).
📚 Today, we release two papers (yes, plural), as well as weights, data, training, and inference codebases. Enjoy!
@eddy_data3 Gotcha totally makes sense! This looks promising though - excited to see more work in this area (bootstrapping environments from webscale data)
Kudos on the work!
One qn: > 5.1: In contrast, for data from WebInstruct without fully reliable supervision signals but with a much larger scale, we sample one response per prompt from the teacher model without filtration.
-> why not use filtered subset and do rejection sampling for SFT?
This + @sama 's tweet (https://t.co/Wpj02oJBof)
remind me of @rosstaylor90 's talk from last year.
What is the most helpful style for models to express their reasoning in?
Scaling verifiable reward signals is the key: appreciate the silver signals study.
Love to see more work push on mining verifiable rewards from webscale data: https://t.co/M1BTJpNpSg
cf Karparthy / Shane on more rl envs -- https://t.co/3xjYvGRV4P
Congrats to @eddy_data3 and @xiangyue96 and the team for the detailed study on Long CoT.
Many interesting ablations and lessons, similar to parallel work coming out e.g. by @junxian_he