For the last NLP seminar of the quarter, we are excited to host @GretaTuckute from Harvard University!
Date and Time: Thursday, May 28, 11:00AM — 12:00 PM Pacific Time.
Zoom Link: https://t.co/jmz2wb8Xyn
Title: Learning and Representing Language in Brains and LLMs
Abstract: For the first time in history, we have a system other than the human brain that can generate fluent language: large language models (LLMs). Do the human brain and LLMs converge on shared representations and computations, and if so, what can LLMs tell us about the nature of human linguistic representations? In this talk, I will first characterize the human language network, a set of frontal and temporal brain regions that causally support language processing. I will then show that the alignment between the human language network and LLMs is strong enough that LLMs can non-invasively modulate language responses in the human brain. Moreover, this alignment can be attributed to small sets of interpretable LLM-based features, providing insight into the main axes of brain activity when humans comprehend sentences. Finally, I will turn to the question of how such linguistic representations can emerge from the messy acoustic signals that humans actually receive. I will introduce AuriStream, a self-supervised textless NLP model that learns from continuous speech and shows that linguistic structure can emerge without prespecified text tokens, given the right temporal predictive learning objective. Together, these results position LLMs as tools for characterizing the representational principles in the human language network—and AuriStream as a step toward understanding which inductive biases can give rise to human-like language representations from raw speech.
Hope to see you all there!
CS336 is better than 99% of the CS classes most college students will take.
Not because your degree is worthless.
Because AI is moving faster than the curriculum.
A lot of undergrad CS programs are still teaching machine learning like the world froze before ChatGPT.
CS336 does the opposite.
The whole point of the class is simple:
Build an LLM from scratch.
Not watch a professor point at a Transformer diagram.
Not memorize what attention is for an exam.
Actually understand the stack.
Data cleaning.
Tokenization.
Training.
Optimization.
Evaluation.
Scaling.
Inference.
Deployment.
The stuff people now pretend to understand on LinkedIn.
If you sit through the lectures, do the assignments, and actually fight with the math, most of your school’s ML classes will start to feel easy.
Not because they are easy.
Because you finally have the map.
Freshman?
Take CS50 first if your foundation is weak.
Then go to CS336.
And take linear algebra seriously.
Linear algebra is not just another annoying math requirement.
It is the language under neural networks.
If vectors, matrices, projections, and eigenvalues are still blurry, you are trying to understand deep learning without knowing the alphabet.
The 2026 version is free on YouTube.
Search:
“Stanford CS336 2026”
Then stop waiting for your college curriculum to catch up.
10 FREE TOOLS BUILT BY UNIVERSITIES THAT BEAT MOST PAID SAAS
Bookmark every single one. Universities quietly fund software that outperforms products companies charge a fortune for, then give it away to anyone. Most people have no idea these exist.
1. https://t.co/qYy7ERtKc8
Built by George Mason University.
The reference manager that crushes EndNote, which costs around $275. Capture any source with one click, organize thousands of papers, and generate citations in 10,000+ styles inside Word or Google Docs. EndNote's maker sued to kill it and lost. Free, open, and trusted by millions of researchers.
2. https://t.co/tiis7Efq7c
Built by the Stanford NLP Group.
A text-analysis toolkit that does what companies sell as expensive APIs: break down language, tag parts of speech, pull out names and places, parse meaning, across 70+ languages. Built by Christopher Manning's lab. The foundation under a huge slice of academic and industry NLP, free.
3. https://t.co/161RYIP0Rx
Built by university researchers in France.
Map any network, social graphs, citation webs, how money or influence flows, in stunning interactive visuals. The tool investigative journalists used on the Panama Papers. Network-analysis software companies charge thousands for, given away to anyone who wants to see the connections.
4. https://t.co/8NkhOqw5d3
Built by a global academic and open community.
A full geographic mapping platform that does what ArcGIS charges thousands per seat for. Layer data onto maps, run spatial analysis, build professional cartography. Universities, governments, and NGOs run on it. The paid mapping monopoly, broken by open source.
5. https://t.co/QDmS2Sv997
Built at the University of Amsterdam and beyond.
Two free statistics platforms that replace SPSS, which costs over $1,000 a year. Clean, modern interfaces for real statistical analysis, built by academics who were tired of charging students for the privilege of doing their homework.
6. https://t.co/lS3Z7nUQsy
Built at the University of Amsterdam.
The tool every linguist on earth uses to analyze speech and sound, visualize pitch, formants, and waveforms. Niche, powerful, and completely free. Decades of phonetics research run on this one program, and nobody outside the field knows its name.
7. https://t.co/DvUcziSfpM
Built at the Institute for Systems Biology and academic partners.
A platform for visualizing complex biological networks and any other interconnected data. The tool behind countless papers in genetics and medicine. The kind of specialized visualization software that would be a five-figure enterprise license anywhere else.
8. https://t.co/YOPszPFXD7
Born at Google, now community and academia run.
The best tool in existence for cleaning messy data. Point it at a chaotic spreadsheet and it finds the duplicates, fixes the inconsistencies, and reconciles the gaps. The unglamorous step every data project needs, that paid tools charge monthly for.
9. https://t.co/S75qeE8059
Built by CERN.
A free home for your research, data, and code that gives you a permanent citable link forever, backed by the institution that runs the Large Hadron Collider. The kind of durable infrastructure that outlives any startup, run by physicists who think in centuries.
10. obsidian.md and the academic plugin world
Used and extended across universities.
A free knowledge base that links your notes into a web you can actually think inside, the digital version of the slip-box system a German scholar used to out-write entire universities. Researchers built a whole ecosystem of academic plugins on top of it.
Companies sell you software and rent you the result. Universities build the tool and hand you the keys.
As models get better, thinking carefully about eval constraints is super important.
In ProgramBench, we turn off internet completely. I strongly believe no/limited internet being the de facto standard for future coding benchmarks.
The most capable reasoning systems in AI scale inference compute along several axes: sequential compute to think longer, parallel compute to sample many independent attempts, and aggregative compute to synthesize prior traces into a new improved one. But during training, we only optimize how models use sequential compute. This creates a fundamental mismatch between how we ultimately deploy these systems and how we train them, leaving much of search and synthesis unoptimized.
We introduce SPIRAL, an RL framework for making all inference-compute primitives end-to-end learnable: models learn to coordinate sequential, parallel, and aggregative reasoning using only the reward of the final output. Work with @ifdita_hasan (co-lead), @michaelyli_ , @oshaikh13 , @yoonholeee , @DorsaSadigh , @chelseabfinn , @noahdgoodman 🧵
“I've used GEPA successfully in the context of pretraining data curation at Microsoft AI, so I thought I'd give a brief overview of why I reach for this tool :)”
Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
Jiayi Zhang et al. — 78 citations · 771★ GitHub
@shi_weiyan@JiayiZhang0427@chrmanning
Post-training alignment often reduces LLM diversity, leading to a phenomenon known as mode collapse. Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text as a result of well-established findings in cognitive psychology.
arXiv: https://t.co/HDfVaDiOjZ
Code: https://t.co/1bZOb64asx
https://t.co/rJeKWEICOO
As #ICML2026 is excitingly near, here are some of the most-cited papers and most-starred git repos:
Code and workflows to map papers to Semantic Scholar citation data and github repos is made publicly available.
@suchenzang I retired from Google in late 2023 bc we stopped publishing & locked down access to even internal research papers to prevent leaks. Ironically learned much more & faster by religiously following deepseek papers+ Stanford CS336
40% of benchmarking effort targets math/coding, but the related occupations are only 3.5% of US jobs.
We introduce EconEvals, an open-source evaluation suite to measure capabilities and predict job disruption across the US labor economy.
New preprint! AI agents have shown impressive scientific automation capabilities. Can we apply them to psychology research, *including* human data collection?
Introducing auto-psych, a framework that proposes cognitive models and uses them to design and run human experiments. 1/
well hii , it's been a week since i was active here :(
keeping that aside, some updates:
> finished implementing all the components needed ( optimizer, loss fn, gradient clipping, lr scheduling and warmup, checkpoints etc.)
> done with the training and eval loop
> currently locally training the tokenizer on the tinystories v2 dataset, and its taking ages t_t
> need some access of GPUs ( wishing B200 lol) to train the model, i'd really like to build a complete one, please lmk from where can i get one? don't wanna mess with kaggle again :/
> learnt a bit about resource accounting, will do it for the full project to understand it better
> but the favourite part for me was the lm architecture and hyperparameters section (lec no. 3 by @tatsu_hashimoto :) ), it was the best thing i have studied in few months now, really interested to dwelve more into it, so will be experimenting with it to get more understanding!
🧠 Speaker Spotlight: Christopher Manning
If you've used an LLM, you've used his research.
Christopher Manning is one of the most influential researchers in modern AI — the inaugural Thomas M. Siebel Professor in ML at @Stanford and co-founder of Stanford HAI.
🔹 Co-creator of GloVe word vectors
🔹 Pioneered the attention mechanism behind today's Transformers
🔹 Founder of the Stanford NLP Group (@stanfordnlp)
🔹 Creator of CS224N — the course that trained a generation of AI builders
🔹 IEEE John von Neumann Medal (2024) · National Academy of Engineering
He takes the Main Stage for "World Models & Interactive AI Worlds" — where language models meet embodied, interactive intelligence.
📅 July 18–19, 2026
📍 Palace of Fine Arts, San Francisco
🎟 Tickets: https://t.co/R5ephX0hHv
#agisummit #aiareall #NLP
In a matter of weeks, U.S. federal AI policy has gone from implausibly libertarian to increasingly draconian and opaque. Today, over 35 distinct observations, I analyze how we got here and offer the most succinct statement I can about what exactly I propose we should do next.
OpenAI gave METR early access to GPT-5.6 Sol for testing including raw chain-of-thought, a railfree version of the model, and internal information about the model. With this access, METR conducted a pre-deployment evaluation of GPT-5.6 Sol, including an attempted measurement of its 50%-Time Horizon. However, the measurement depends heavily on our treatment of cheating attempts, and GPT-5.6 Sol’s detected cheating rate was higher than any public model we have evaluated.
A super long overdue (3+ years?) post on scaling laws.
Compute is expensive. Scaling laws are a way to help us reason about the optimal compute allocation between data and model size before committing to a large run.
The post covers what scaling laws predict, how compute-optimal allocation works, why Kaplan et al. and Chinchilla disagree, and how data limits + fitting details make extrapolation tricky.
https://t.co/HP26eJvjHB
I'm really excited about the amazing work @Muhtasham9 has been doing on CodeClash.
Our initial release consisted entirely of LMs competing in games.
Since then, we've found a ton of coding environments that can be repurposed as competitive arenas. Leaderboard soon.
Can we leverage AI for workplace social skills training? @StanfordHAI faculty affiliate @Diyi_Yang and her team developed a framework to help professionals improve their conflict resolution, peer counseling, and therapy skills using LLMs.
https://t.co/cuCeWbZXkG
🚨Big news from WING! Our group at NUS Computing, in collaboration with Prof @Diyi_Yang at Stanford, has received Singapore #AIVP funding support. 🧵 1/4