Ning Ma

@ningandma

Learning without supervision

San Francisco Bay Area

Joined April 2009

85 Following

164 Followers

79 Posts

ningandma retweeted

Max Marchione

@maxmarchione

about 1 month ago

In some very real sense, Ozempic was invented in 1990. Pfizer ran the human trials and just never published them.

They showed it lowered blood glucose in diabetics, slowed gastric emptying, and killed hunger; the same 3 things that make Ozempic work today.

The joint venture agreement said internal data stayed internal, and that was that. Pfizer killed the program in 1991. The reasoning, as far as I can tell, was that nobody would ever want an injectable diabetes drug besides insulin.

So, the license went back to the hospital in Boston that held the patents.

Novo picked it up in 1992 and spent the next two decades building liraglutide, then semaglutide.

It's insane that data sat in a filing cabinet for 30+ years.

I only know this because Jeffrey Flier, one of the Harvard scientists in the room, finally wrote it up. He's in his late 70s and didn't want the history to die with him.

This makes you wonder what else is in those filing cabinets.

Ozempic could've existed 27 years ago.

ningandma retweeted

How To Prompt

@HowToPrompt__

about 1 month ago

Google has quietly dropped what researchers are calling "Attention Is All You Need V2."

And it signals the end of the Transformer era as we know it.

In 2017, the original "Attention Is All You Need" paper changed the world by proving that AI doesn't need recurrence, it just needs to pay attention.

But today, even the most advanced models like GPT and Gemini suffer from a massive, structural flaw: Catastrophic Forgetting.

The moment an AI learns something new, it starts losing what it learned before. It’s why AI "hallucinates" or loses the thread in long conversations.

This paper, titled "Nested Learning: The Illusion of Deep Learning Architectures," completely replaces the way AI stores information.

The researchers have introduced a paradigm shift called Nested Learning (NL).

Here is why this is "V2":

For the last decade, we treated AI models as one giant, flat mathematical function. NL proves that a model is actually a set of thousands of smaller, "nested" optimization problems running in parallel.

Instead of one giant "memory," each layer has its own internal "context flow." This allows the model to learn new tasks at test-time without overwriting its core intelligence.

It moves us past the static Transformer. The new architecture (HOPE) demonstrated 100% stability in long-context memory and "post-training adaptation" that was previously impossible.

The technical takeaway is brutal for the competition:

Existing deep learning works by compressing information until it breaks. Nested Learning works by organizing information so it can grow forever.

We’ve spent 7 years trying to make Transformers bigger. Google figured out how to make them "Nested."

The Transformer replaced the RNN in 2017.

Nested Learning is here to replace the Transformer in 2026.

ningandma retweeted

Ole Lehmann

@itsolelehmann

3 months ago

AI just made a $5000+ cancer test practically free

every cancer biopsy produces a cheap tissue slide that shows the shape of your cells under a microscope. every patient already has one sitting in a lab somewhere (it costs $5-10)

there's a much fancier version of that test, but it costs 500-1000x more (excluding many patients)

it maps which immune cells are near your tumor and what they're actually doing. it takes specialized equipment most hospitals don't have and barely scales

but it's the test oncologists need to figure out if immunotherapy will actually work for you. right now only 20-40% of patients respond, partly because doctors can't tell if your immune system is fighting the tumor or ignoring it

microsoft, providence health, and the university of washington trained an AI to read the $5 slide and predict what the expensive test would show. they used 21 protein markers.

trained it on 40 million cells and ran it on 14,000+ real cancer patients across 51 hospitals.

the results:

it found 1,200+ verified links between immune activity, mutations, and patient survival that were invisible at this scale before

they validated it against a completely separate database of 10,200 patients. results matched almost perfectly

the whole model is open source on hugging face. any cancer lab with old biopsy slides in storage can run virtual immune profiling without buying a single piece of new equipment

these are the AI use cases that actually matter
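The price gap in the post can be checked with quick arithmetic. This is a minimal sketch that takes the post's figures at face value: $5-10 for the standard slide and a 500-1000x multiplier for the immune-mapping test. The function name `price_range` is hypothetical.

```python
def price_range(base_low, base_high, mult_low, mult_high):
    """Return the (min, max) cost implied by a base price range and a multiplier range."""
    return base_low * mult_low, base_high * mult_high


# Figures from the post: a $5-10 tissue slide, and a test that costs 500-1000x more.
low, high = price_range(5, 10, 500, 1000)
print(f"${low:,} to ${high:,}")
```

The implied range, roughly $2,500 to $10,000, brackets the post's "$5000+" headline figure.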

ningandma retweeted

Greg Brockman

@gdb

3 months ago

How AI empowered Paul Conyngham to create a custom mRNA vaccine to cure his dog’s cancer when she had only months to live. The first personalized cancer vaccine designed for a dog: https://t.co/2uQn9bNA9t

ningandma retweeted

@Rainmaker1973

4 months ago

AIs can’t stop recommending nuclear strikes in war game simulations.

A study by Kenneth Payne at King's College London, published in February 2026, showed that AI models suggested nuclear strikes in 95% of simulated war games. The AI models included GPT-5.2 (OpenAI), Claude Sonnet 4 (Anthropic), and Gemini 3 Flash (Google).

The research found that AI models often lack the "nuclear taboo" that humans have. The models did not choose to surrender or accommodate an opponent in 329 turns.

The models often used tactical nuclear weapons, while providing strategic reasoning. In 86% of simulations, conflicts escalated further than the AI initially intended.

Claude (Anthropic) was a "master manipulator," while GPT-5.2 (OpenAI) sought to limit nuclear use. Gemini (Google) chose escalation in the face of conflict.

ningandma retweeted

Healthcare AI Guy

@HealthcareAIGuy

6 months ago

NEW: AI just became legally authorized to practice medicine in the US.

Actually prescribing with no doctor in the loop.

Doctronic launched a pilot where its AI renews prescriptions for chronic conditions, reviews history, asks questions, and sends the Rx to the pharmacy. https://t.co/I0jWQ3BQGD

ningandma retweeted

Google Quantum AI

@GoogleQuantumAI

7 months ago

Building a fault-tolerant quantum computer is the grand challenge of hardware. Using it is the grand challenge of applications. Learn more about the five-stage framework to map the journey from abstract quantum idea to real-world impact ↓ https://t.co/zkgWZYfpmm

ningandma retweeted

Andrej Karpathy

@karpathy

8 months ago

Excited to release new repo: nanochat! (it's among the most unhinged I've written).

Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.

It weighs ~8,000 lines of imo quite clean code to:

- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference of the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or a ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.
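The inference bullet above mentions a KV cache with prefill/decode. Below is a minimal sketch of that general idea. It assumes a single attention head over plain Python lists with small hand-picked projection matrices. It is not nanochat's Engine code, and the names `KVCache`, `prefill`, `decode`, and `full_recompute` are hypothetical.

```python
import math


def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]


class KVCache:
    """Keeps the keys/values of past tokens so they are never re-projected."""

    def __init__(self, wq, wk, wv):
        self.wq, self.wk, self.wv = wq, wk, wv
        self.keys, self.values = [], []

    def prefill(self, xs):
        # Project and cache keys/values for the whole prompt in one pass.
        self.keys = [matvec(self.wk, x) for x in xs]
        self.values = [matvec(self.wv, x) for x in xs]
        # Causal attention: position t only sees positions 0..t.
        return [attend(matvec(self.wq, x), self.keys[: t + 1], self.values[: t + 1])
                for t, x in enumerate(xs)]

    def decode(self, x):
        # Project only the new token, append it, then attend over the full cache.
        self.keys.append(matvec(self.wk, x))
        self.values.append(matvec(self.wv, x))
        return attend(matvec(self.wq, x), self.keys, self.values)


def full_recompute(wq, wk, wv, xs):
    """Reference without a cache: re-project every token to get the last output."""
    keys = [matvec(wk, x) for x in xs]
    values = [matvec(wv, x) for x in xs]
    return attend(matvec(wq, xs[-1]), keys, values)


wq = [[1.0, 0.5], [0.0, 1.0]]
wk = [[0.5, 0.0], [1.0, 1.0]]
wv = [[1.0, 2.0], [0.0, 1.0]]
prompt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_token = [0.5, -1.0]

cache = KVCache(wq, wk, wv)
prefill_outs = cache.prefill(prompt)
cached_out = cache.decode(new_token)
reference = full_recompute(wq, wk, wv, prompt + [new_token])
print(all(abs(a - b) < 1e-12 for a, b in zip(cached_out, reference)))
```

The cached decode step matches a full recomputation. However, each new token costs only one projection plus attention over the stored keys/values, rather than re-projecting the whole prefix. A real model keeps one such cache per layer and per head.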

Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems and answer simple questions. About ~12 hours of training surpasses the GPT-2 CORE metric. As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to the FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into the 40s on MMLU, the 70s on ARC-Easy, the 20s on GSM8K, etc.
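The two price points in the paragraph above imply roughly the same node rental rate. This is a small check that assumes cost scales linearly with wall-clock hours on one 8XH100 node. The post does not state an hourly price, so the rate below is derived from its numbers, not quoted. The helper name `hourly_rate` is hypothetical.

```python
def hourly_rate(total_cost, hours):
    """Implied rental price per node-hour, assuming cost is linear in time."""
    return total_cost / hours


GPUS_PER_NODE = 8  # "8XH100 node" from the post

for cost, hours in [(100, 4), (1000, 41.6)]:
    rate = hourly_rate(cost, hours)
    print(f"${cost} over {hours} h -> ${rate:.2f}/node-hour, "
          f"${rate / GPUS_PER_NODE:.2f}/GPU-hour")
```

Both budgets come out at about $24-25 per node-hour, or roughly $3 per H100-hour. The 41.6-hour figure is consistent with $1000 at about $24/node-hour.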

My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.

Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.

ningandma retweeted

Amjad Masad

@amasad

8 months ago

It can do this with ~100,000 neurons (1m parameters max)

Ning Ma @ningandma

8 months ago

@mag_leymi @realMaalouf I walked like any normal person. I kept to myself and did not provoke anyone. I’ve been to 30 countries, most of them developing countries, including 6 in Africa. The summer before Egypt, I studied Arabic in Morocco and it was great. Egypt is the one country I’ll never return to.

ningandma retweeted

Oski the Reformer

@C7ISRpenguin

9 months ago

BREAKING NEWS: UC Berkeley announces it is amending its 2022 Campus Master Plan by adding a 1000 space parking structure to accommodate the growing needs of Nobel Laureate parking on campus https://t.co/dnMb1YReMG

Ning Ma @ningandma

9 months ago

Met a founder at a BBQ. They’re building a startup to de-alcoholize wine at scale, without losing what makes it wine. The tech is metal-organic frameworks (MOFs), pioneered by one of the founders.

The team stood out:
- UC Berkeley chemistry professor, “Nobel-nominated.”
- Former LVMH exec with no/low alcohol experience.
- Scrappy tech operator chasing a hard problem.

I thought it was worth backing and introduced them to an angel group. The response? “Meh.” Not enough interest to move to the next round.

Today, that “Nobel-nominated” co-founder, Omar Yaghi, just won the Nobel Prize in Chemistry.

Investors said “meh.” The team just kept on executing.

If you’re a founder building what you believe is worth building, keep going.

UC Berkeley

@UCBerkeley

9 months ago

Congratulations to UC Berkeley’s Omar Yaghi, who shares the 2025 @NobelPrize in #Chemistry for helping create a field called reticular chemistry. https://t.co/jpLjzWd7S1

ningandma retweeted

AJ Thurston, PhD @AJThurston

9 months ago

I prefer this format

ningandma retweeted

Patrick Hsu

@pdhsu

12 months ago

Today @arcinstitute releases State, our first perturbation prediction AI model and an important step towards our goal of a virtual cell

State is designed to learn how to shift cells between states (e.g. “diseased” to “healthy”) using drugs, cytokines, or genetic perturbations https://t.co/o98bs0oysJ

ningandma retweeted

Anthropic

@AnthropicAI

11 months ago

In a joint paper with @OwainEvans_UK as part of the Anthropic Fellows Program, we study a surprising phenomenon: subliminal learning. Language models can transmit their traits to other models, even in what appears to be meaningless data. https://t.co/oeRbosmsbH

ningandma retweeted

Andrej Karpathy

@karpathy

over 1 year ago

Agency > Intelligence I had this intuitively wrong for decades, I think due to a pervasive cultural veneration of intelligence, various entertainment/media, obsession with IQ etc. Agency is significantly more powerful and significantly more scarce. Are you hiring for agency? Are we educating for agency? Are you acting as if you had 10X agency? Grok explanation is ~close: “Agency, as a personality trait, refers to an individual's capacity to take initiative, make decisions, and exert control over their actions and environment. It’s about being proactive rather than reactive—someone with high agency doesn’t just let life happen to them; they shape it. Think of it as a blend of self-efficacy, determination, and a sense of ownership over one’s path. People with strong agency tend to set goals and pursue them with confidence, even in the face of obstacles. They’re the type to say, “I’ll figure it out,” and then actually do it. On the flip side, someone low in agency might feel more like a passenger in their own life, waiting for external forces—like luck, other people, or circumstances—to dictate what happens next. It’s not quite the same as assertiveness or ambition, though it can overlap. Agency is quieter, more internal—it’s the belief that you *can* act, paired with the will to follow through. Psychologists often tie it to concepts like locus of control: high-agency folks lean toward an internal locus, feeling they steer their fate, while low-agency folks might lean external, seeing life as something that happens *to* them.”

ningandma retweeted

Andrej Karpathy

@karpathy

over 1 year ago

It's 2025 and most content is still written for humans instead of LLMs. 99.9% of attention is about to be LLM attention, not human attention. E.g. 99% of libraries still have docs that basically render to some pretty .html static pages assuming a human will click through them. In 2025 the docs should be a single your_project.md text file that is intended to go into the context window of an LLM. Repeat for everything.
