Excited to share what our lab has been baking: Amazon Nova Act!
Trained with large-scale RL on diverse web gyms, Nova Act achieves SOTA on multiple public web agent benchmarks. Check it out! 🚀
https://t.co/Ut8ruzDclF
@_Suresh2 Their collapse doesn’t seem reward-model driven.
Fig 15 shows instability on AIME and LCB, and the report mentions that for STEM/code they only do RLVR.
Beautiful tech report, perhaps the best Western model report I’ve ever read.
Lots of great insights: no synthetic data in midtrain, teacher models are RLed directly on top of midtrain, and adaptive clip higher.
But it still seems like they didn’t fully nail true on-policy training: they admit their RL stage is unstable, which leads to a hacky self-distillation stage (imo).
Idk if Claudes/GPTs use synthetic data since they don't share anything.
But if you want to train the BEST model, and you are confident that your model has the 'best pre-training' (most world knowledge) and 'best mid-training' (best domain specific capabilities and behaviors), then it doesn't make sense to distill off-policy data from other models, since:
1. Those models have less world and domain knowledge
2. Lots of SFT on off-policy synthetic data pulls your model into a narrower distribution
Anthropic Opus 4.8 is new SOTA on ARC-AGI-3
Score: 1.5%, ~$10K
ARC-AGI-3 analysis notes:
* Opus 4.8 read the environment at an abstraction level *above* Opus 4.7, as objects & systems, not pictures
* Opus 4.8 succeeded on early levels, but still committed to a wrong sub-goal
Gotta love SF poker:
Some absolute degen 5-bet jams pre-flop with 34 offsuit and gets called by pocket queens and pocket aces.
Flop comes 2 5 6, so he flops the absolute nuts and wins a $500 pot. Sickest hand I’ve ever seen.
So much fun hosting a poker night with our friends @SignalFire, packed with founders and operators from the ecosystem.
Reminder that we have a space in the heart of SF for community events like this. The whole point is creating room for people to meet and for serendipity to do its thing.
hey everyone! i’m kyle
- new grad 2025 @UCSB
- no prev internships
- no prev research
- no employment ever
- bottom 5th percentile in math/coding contests
- 2 stars on github class projects
- turned down competitive offers @McDonalds and @BurgerKing to hustle on my own
- got kicked out of parents' basement yesterday; disowned
- staying in sf for a few days! looking to raise for my neolab
hmu if interested!
hey everyone! i'm samuel
- 2nd year cs @uwaterloo
- prev eng @memories_ai, ai research @uwaterloo
- 99th percentile in multiple national math/coding contests
- prev national level fencer; 275lbs max bench
- 1.2k+ stars on github projects
- turned down swe offers @Gemini, @openart_ai and @ yc startups this summer to build smth of my own
- got flown out for yc s26 interview yesterday; rejected
- staying in sf for a few more days; looking to raise from other investors
hmu if interested!