Can coding agents do research?
We release NanoGPT-Bench, an internal eval we’ve used to test agents on an AI R&D problem with months of human progress.
Codex, Claude Code, and Autoresearch recover only 9.3% of human progress, mostly tuning hyperparams & ignoring algorithmic research.
NanoGPT-Bench is built on the NanoGPT Speedrun, a popular LLM pretraining competition to minimize the training time of a GPT-2 style model. Existing human submissions constitute nearly 2 years of work. To control for dependencies and contamination in frontier models, we standardize evaluation to a 5-month window of world records. Evaluation is fully autonomous and end-to-end, with no human intervention or internet access. 🧵
@yadapruksachatk @deliprao @rachel_l_woods Thanks, that’s useful information :-)
I’ll keep a lookout for communities of researchers and builders working on trustworthy AI
Burning bushes every night is a risk to the environment, not to mention the fire risk to nearby vehicles. This is in the Tilak Udyan area, opposite Arya Vidya Mandir School, Juhu Scheme. @MumbaiPolice @AmeetSatam @mybmcwardKW
@FastechT Looking forward to meeting you at @exploreIMC in Hall 5, Booth 5.23, where we will show our range of test instruments and network visibility solutions for 5G NR.
Do reach out to us at @FastechT to book an appointment! 1-4 October 2022 in IPTO New Delhi! #IMC2022
Looking forward to sharing our work at @eceumd linking speech production and the auditory cortex at @ISCAInterspeech in session Thu-P-VR-9-3, on Thursday at 10 AM KST
https://t.co/Oq1i4t2n0o
3. In 'Acoustic To Articulatory Speech Inversion Using Multi-Resolution Spectro-Temporal Representations Of Speech Signals' we demonstrate a correlation between the brain's representation (cortical features) of speech and the vocal tract parameters using speech inversion
2. In 'Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework' (with @AmazonScience), we show that joint training with AET improves ASC mainly due to the regularization effect of event tagging and not because the model has learnt the 'events'