Our new paper breaks down algorithm progress in AI (which is faster than GPU progress). Surprisingly, much of the gains have only gone to the largest models, which is bad news for AI efficiency improvement. @MITFutureTech https://t.co/vf0Bw4Zyrt
📢#BSCSeminar: The #AI Frontier: Transformative role of foundation models across scientific disciplines
🗣@atrisovic, @MIT
📆 16 July
➡ https://t.co/uO1QaBS67S
@SOMM_alliance
Excited to be at Dzmitry Bahdanau’s #ICLR2025 talk on the origin story of the attention mechanism, a foundational idea (and Test of Time runner-up) that shaped how we build and understand large language models.
Audio summaries are coming to arXiv! 👂📑
We've partnered with @Science_Cast to pilot 60 second #AI generated audio summaries, starting with astro-ph.HE.
More info on this partnership on the blog: https://t.co/KV47dzYXni
[Press Release] The LHC experiment collaborations at CERN receive Breakthrough Prize
The Breakthrough Prize in Fundamental Physics was awarded to the @ALICEexperiment , @ATLASexperiment, @CMSExperiment and @LHCbExperiment.
Find out more: https://t.co/BOnBjV96PH
CERN's experiments are global efforts. The 2025 Breakthrough Prize in Fundamental Physics honors over 13,000 researchers whose labors have led to the precise description the Higgs mechanism, the discovery of dozens of new particles, analysis of rare processes and matter-antimatter asymmetry and exploration of nature at the shortest distances and most extreme conditions. https://t.co/OSDzo6jMHF @CERN
"Being persistent and investing in yourself is the best thing a young person can do."
@atrisovic's educational journey has been nothing short of transformative! She began learning Python using MIT OpenCourseWare in 2012 from her home in Serbia, eventually leading her to her current position at the FutureTech Lab at @MIT_CSAIL.
Learn more about Ana Trišović's story ➡️ https://t.co/Ke1SeRXx0X
(Photo courtesy of Ana Trišović.)
@mitopenlearning
Computer scientist Ana Trišović was a student in Serbia when she discovered a free online course from MIT. “I instantly fell in love with Python the moment I took that course. I have such a soft spot for OpenCourseWare — it shaped my career,” she says. https://t.co/fmtUGZ4QsM
Very excited to share my story about MIT OpenCourseWare - right here at MIT!
Thank you to Lauren Thacker for a great interview and thoughtful article.
https://t.co/upXPFKtWeV
Want to check out the source for the "AlexNet" paper? Google has made the code from Alex Krizhevsky, @ilyasut, and @geoffreyhinton's seminal "ImageNet Classification with Deep Convolutional
Neural Networks" paper public, in partnership with the Computer History Museum.
As I said in the press release, "Google is delighted to contribute the source code for the groundbreaking AlexNet work to the Computer History Museum".
https://t.co/62Ilp7jaeT
🧵 📢 What are the risks from Artificial Intelligence?
We present the first-ever AI Risk Repository: a comprehensive living database of 700+ risks extracted, with quotes and page numbers, from 43(!) existing frameworks.
Read and explore here: https://t.co/RJXlNdFPBv
To categorize the identified risks, we adapt two existing frameworks into taxonomies.
Our Causal Taxonomy categorizes risks based on three factors: the Entity involved, the Intent behind the risk, and the Timing of its occurrence.
Our Domain Taxonomy categorizes AI risks into 7 broad domains, and 23 more specific subdomains. For example, 'Misinformation' is one of the domains, while 'False or misleading information' is one of its subdomains.
💡 Four insights from our analysis:
1️⃣ 51% of the risks extracted were attributed to AI systems, while 34% were attributed to humans. Slightly more risks were presented as being unintentional (37%) than intentional (35%). Six times more risks were presented as occurring after (65%) than before deployment (10%).
2️⃣ Existing risk frameworks vary widely in their scope. On average, each framework addresses only 34% of the risk subdomains we identified. The most comprehensive framework covers 70% of these subdomains. However, nearly a quarter of the frameworks cover less than 20% of the subdomains.
3️⃣ Several subdomains, such as *Unfair discrimination and misrepresentation* (mentioned in 63% of documents); *Compromise of privacy* (61%); and *Cyberattacks, weapon development or use, and mass harm* (54%) are frequently discussed.
4️⃣ Others such as *AI welfare and rights* (2%), *Competitive dynamics* (12%), and *Pollution of information ecosystem and loss of consensus reality* (12%) were rarely discussed.
🔗 How can you engage?
Website: https://t.co/RlLcF8cRLw
Preprint: https://t.co/0xuWG5ilPh
Database: https://t.co/rXT8qpn3p6
Feedback: https://t.co/SaJbNBcwkJ
🙏 Please help us spread the word by sharing with anyone relevant!
Thanks to everyone involved: @aksaeri, @StephenLCasper@mnoetel@ProfNeilT@RistoUuk@soroushjp@jmsdao
📽️ New 4 hour (lol) video lecture on YouTube:
"Let’s reproduce GPT-2 (124M)"
https://t.co/QTUdu8b0qh
The video ended up so long because it is... comprehensive: we start with empty file and end up with a GPT-2 (124M) model:
- first we build the GPT-2 network
- then we optimize it to train very fast
- then we set up the training run optimization and hyperparameters by referencing GPT-2 and GPT-3 papers
- then we bring up model evaluation, and
- then cross our fingers and go to sleep.
In the morning we look through the results and enjoy amusing model generations. Our "overnight" run even gets very close to the GPT-3 (124M) model. This video builds on the Zero To Hero series and at times references previous videos. You could also see this video as building my nanoGPT repo, which by the end is about 90% similar.
Github. The associated GitHub repo contains the full commit history so you can step through all of the code changes in the video, step by step.
https://t.co/BOzkxQ8at2
Chapters.
On a high level Section 1 is building up the network, a lot of this might be review. Section 2 is making the training fast. Section 3 is setting up the run. Section 4 is the results. In more detail:
00:00:00 intro: Let’s reproduce GPT-2 (124M)
00:03:39 exploring the GPT-2 (124M) OpenAI checkpoint
00:13:47 SECTION 1: implementing the GPT-2 nn.Module
00:28:08 loading the huggingface/GPT-2 parameters
00:31:00 implementing the forward pass to get logits
00:33:31 sampling init, prefix tokens, tokenization
00:37:02 sampling loop
00:41:47 sample, auto-detect the device
00:45:50 let’s train: data batches (B,T) → logits (B,T,C)
00:52:53 cross entropy loss
00:56:42 optimization loop: overfit a single batch
01:02:00 data loader lite
01:06:14 parameter sharing wte and lm_head
01:13:47 model initialization: std 0.02, residual init
01:22:18 SECTION 2: Let’s make it fast. GPUs, mixed precision, 1000ms
01:28:14 Tensor Cores, timing the code, TF32 precision, 333ms
01:39:38 float16, gradient scalers, bfloat16, 300ms
01:48:15 torch.compile, Python overhead, kernel fusion, 130ms
02:00:18 flash attention, 96ms
02:06:54 nice/ugly numbers. vocab size 50257 → 50304, 93ms
02:14:55 SECTION 3: hyperpamaters, AdamW, gradient clipping
02:21:06 learning rate scheduler: warmup + cosine decay
02:26:21 batch size schedule, weight decay, FusedAdamW, 90ms
02:34:09 gradient accumulation
02:46:52 distributed data parallel (DDP)
03:10:21 datasets used in GPT-2, GPT-3, FineWeb (EDU)
03:23:10 validation data split, validation loss, sampling revive
03:28:23 evaluation: HellaSwag, starting the run
03:43:05 SECTION 4: results in the morning! GPT-2, GPT-3 repro
03:56:21 shoutout to llm.c, equivalent but faster code in raw C/CUDA
03:59:39 summary, phew, build-nanogpt github repo
To qualify as Science a piece of research must be correct and reproducible.
To be correct and reproducible, it must be described in sufficient details in a publication.
To be 'published' (to receive a seal of approval) the publication must be checked for correctness by reviewers.
To be reproduced, the publication must be widely available to the community and sufficiently interesting.
If you do research and don't publish, it's not Science.
Without peer review and reproducibility, chances are your methodology was flawed and you fooled yourself into thinking you did something great.
No one will ever hear about your work.
No one will pick it up and build on top of it.
No one will build new technology and products with it.
Your work will have been in vain.
You'll die bitter and forgotten.
If you never published your research but somehow developed it into a product, you might die rich.
But you'll still be a bit bitter and largely forgotten.
Just published in JOSE: 'A practical guide to climate econometrics: Navigating key decision points in weather and climate data analysis' https://t.co/0h39zDzCGq
#OnThisDay in 1993, CERN put the World Wide Web in the public domain, later made available with an open licence, allowing the web to flourish.
Here we see Sir Tim Berners-Lee's proposal for the World Wide Web, revised back in May 1990.
The World Wide Web will be one of the topics of the next #CERN70 public event, “CERN – An extraordinary human endeavour”, where we will also talk about inclusiveness, collaboration, #OpenScience, #AI, machine learning and more.
📍CERN Science Gateway
📆 19 May, 17:00 – 19:00 CEST
Stay tuned to find out more about the event and our special guests. 😉