Mia @aLanguageModel - Twitter Profile

10 days ago

@dwarkesh_sp Today’s models still suck in certain types of brainstorming such as: 1. Ideating names for things (model-generated names are slop) 2. Providing comprehensive lists of things (“list restaurants I might reserve tonight” -> model only lists a few options, none desirable)

0

7

aLanguageModel retweeted

Microsoft AI

@MicrosoftAI

11 days ago

MAI-Thinking-1: A powerful reasoning model developed from scratch that is competitive with models of similar size on STEM reasoning and coding tasks.

Our pre-training focused on a simple scaling emphasizing data-driven iterative improvements to our architecture and data. Our reinforcement learning (RL) framework is optimized for sustained log-linear climbs over many thousands of steps.

We are openly sharing all technical details and learnings to build a transparent and science-driven approach to further development in AI.

Read More: https://t.co/yK9Cd5loUd

4

151

15

18

19K

aLanguageModel retweeted

Graham Neubig

@gneubig

12 days ago

Looking good 👀 Historically MiniMax models have worked well in OpenHands, looking forward to giving this one a whirl!

5

39

6

5

12K

aLanguageModel retweeted

MiniMax (official) @MiniMax_AI

13 days ago

Introducing MiniMax M3: The First Open-Weights Model to Combine Three Frontier Capabilities

- Coding & Agentic Frontier: 59.0% SWE-Bench Pro, 66.0% Terminal Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard, 74.2% MCP Atlas
- MiniMax Sparse Attention scales context to 1M
- Natively Multimodal from Step Zero

API: https://t.co/fHRdSV7BwZ
Token Plan: https://t.co/BDCycxepZw
🚀New! MiniMax Code: https://t.co/GvB4YiB6Ul

Weights & Tech Report in ~10 Days

559

11K

1K

3K

5M

aLanguageModel retweeted

Pushmeet Kohli

@pushmeet

19 days ago

AI agents are advancing research-level math. 🚀

I’m thrilled to share @GoogleDeepMind’s AlphaProof Nexus - an agentic framework for formal proof search powered by Gemini.

When applied to a set of open formal math problems, our agent autonomously solved:
✅ 9 open Erdős problems (including two open for 56 years!)
✅ 44 Online Encyclopedia of Integer Sequences (OEIS) problems
✅ A 15-year-old open problem in algebraic geometry
✅ A 7-year-old open question in min-max optimization

We are collaborating with mathematicians across disciplines - from combinatorics and graph theory to quantum optics. Ultimately, these results show the massive potential of even simple agentic loops powered by Gemini.

Read the paper here: https://t.co/c5M9ZjRXU1

80

2K

242

461

218K

aLanguageModel retweeted

Timothy Gowers @wtgowers

24 days ago

AI has now solved a major open problem -- one of the best known Erdős problems called the unit distance problem, one of Erdős's favourite questions and one that many mathematicians had tried. https://t.co/SD1vVPkrHR

75

4K

614

2K

1M

Mia @aLanguageModel

about 1 month ago

@nrehiew_ Is there more or less forgetting vs RL if one constructs an SFT dataset like this (for 0/1 rewards): annotators label data sampled from the initial model’s policy, and then we only keep the high-reward samples. Will that be closer to the original distribution than the RL distribution?
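The dataset construction described in this question can be sketched in a few lines. This is only a minimal illustration under stated assumptions: `build_filtered_sft_dataset`, `toy_policy`, and `toy_reward` are hypothetical names, the "policy" is a toy arithmetic sampler rather than a language model, and the reward function stands in for the annotators' 0/1 labels. It says nothing about how much forgetting results.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, response)


def build_filtered_sft_dataset(
    prompts: List[str],
    sample_fn: Callable[[str], str],
    reward_fn: Callable[[str, str], int],
    samples_per_prompt: int = 4,
) -> List[Example]:
    """Sample from the initial policy and keep only reward-1 responses."""
    kept: List[Example] = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            response = sample_fn(prompt)          # draw from the initial policy
            if reward_fn(prompt, response) == 1:  # 0/1 label (annotator stand-in)
                kept.append((prompt, response))
    return kept


# Toy stand-ins (assumptions for illustration only).
rng = random.Random(0)


def toy_policy(prompt: str) -> str:
    """Answer 'a+b' correctly most of the time, off by one otherwise."""
    a, b = map(int, prompt.split("+"))
    return str(a + b + rng.choice([0, 0, 1]))


def toy_reward(prompt: str, response: str) -> int:
    """Return 1 if the response is the correct sum, else 0."""
    a, b = map(int, prompt.split("+"))
    return int(int(response) == a + b)


sft_data = build_filtered_sft_dataset(["1+2", "3+4", "10+5"], toy_policy, toy_reward)
```

Every kept pair was generated by the initial policy itself, so the resulting SFT data is a reward-filtered subset of that policy's own samples.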

0

339

aLanguageModel retweeted

Daniel Jeffries

@Dan_Jeffries1

about 1 month ago

Jensen is one of the smartest and most far-seeing folks in the world. "If an AI scientist warns people that AI is going to permeate across radiology and radiologists are going to get wiped out, it might seem helpful but it's hurtful. If we convince everybody not to be radiologists and we now need radiologists, that actually is hurtful to society. "It is hurtful to convince all the young college graduates not to study software engineering because we are going to need more software engineers than ever. That's hurtful." "Scaring people with nonsensical things, which are not going to happen, that this is an existential threat, there's a 20% chance that it is existential, that's ridiculous. "That it's going to wipe out 50% of college level jobs. "That it is going to completely destroy democracy. "These kinds of comments are not helpful. They are made by...CEOs. And you become a CEO, maybe you adopt a God complex and somehow you know everything." Brutal. And right.

245

5K

813

2K

854K

aLanguageModel retweeted

James Zou @james_y_zou

about 1 month ago

Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️

You can give your LLM all that knowledge in one line, all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free. https://t.co/ZoHWcx7MXg

43

2K

245

2K

186K

aLanguageModel retweeted

OpenAI Developers

@OpenAIDevs

about 1 month ago

Students are learning to build with Codex, and building to learn. Here’s what @UCBerkeley students built at the Codex Creator Challenge with @joinHandshake.

30

521

30

114

49K

aLanguageModel retweeted

Nick Levine

@status_effects

about 2 months ago

New work with @AlecRad and @DavidDuvenaud: Have you ever dreamed of talking to someone from the past? Introducing talkie, a 13B model trained only on pre-1931 text. Vintage models should help us to understand how LMs generalize (e.g., can we teach talkie to code?). Thread:

178

3K

396

2K

1M

aLanguageModel retweeted

Percy Liang

@percyliang

about 2 months ago

It is liberating being able to talk about what you work on.

10

623

33

38

52K

aLanguageModel retweeted

Larry Dial

@classiclarryd

about 2 months ago

Researchers' brilliant ideas often get lost in the sea of endless SOTA claims on weak baselines. At Marin we battle-test ideas in an open arena, where anyone's idea can be promoted to the next hero run. One that recently rose up was @Jianlin_S MoE Quantile Balancing, used in our last 1e22 and ongoing 130B run. Animated visuals of how QB performed are available in the OpenAthena blog. https://t.co/BDSsonuNH7

9

241

30

143

81K

aLanguageModel retweeted

Jared Duker Lichtman

@jdlichtman

about 2 months ago

In my doctorate, I proved the Erdős Primitive Set Conjecture, showing that the primes themselves are maximal among all primitive sets. This problem will always be in my heart: I worked on it for 4 years (even when my mentors recommended against it!) and loved every minute of it. [Primitive sets are a vast generalization of the prime numbers: A set S is called primitive if no number in S divides another.] Now Erdős#1196 is an asymptotic version of Erdős' conjecture, for primitive sets of "large" numbers. It was posed in 1966 by the Hungarian legends Paul Erdős, András Sárközy, and Endre Szemerédi. I'd been working on it for many years, and consulted/badgered many experts about it, including my mentors Carl Pomerance and James Maynard. The proof produced by GPT5.4 Pro was quite surprising, since it rejected the "gambit" that was implicit in all works on the subject since Erdős' original 1935 paper. The idea to pass from analysis to probability was so natural & tempting from a human-conceptual point of view, that it obscured a technical possibility to retain (efficient, yet counter-intuitive) analytic terminology throughout, by use of the von Mangoldt function \Lambda(n). The closest analogy I would give would be that the main openings in chess were well-studied, but AI discovers a new opening line that had been overlooked based on human aesthetics and convention. In fact, the von Mangoldt function itself is celebrated for its connection to primes and the Riemann zeta function--but its piecewise definition appears to be odd and unmotivated to students seeing it for the first time. By the same token, in Erdős#1196, the von Mangoldt weights seem odd and unmotivated but turn out to cleverly encode a fundamental identity \sum_{q|n}\Lambda(q) = \log n, which is equivalent to unique factorization of n into primes. This is the exact trick that breaks the analytic issues arising in the "usual opening".
Moreover, Terry Tao has long suspected that the applications of probability to number theory are unnecessarily complicated and this "trick" might actually clarify the general theory, which would have a broader impact than solving a single conjecture.
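The two definitions quoted in the tweet above, primitive sets and the identity \sum_{q|n}\Lambda(q) = \log n, are easy to check numerically. The sketch below is only an illustration of those definitions, not of the proof; the function names `von_mangoldt`, `divisor_sum`, and `is_primitive` are chosen here for illustration and do not come from the source.

```python
import math
from typing import Iterable


def von_mangoldt(q: int) -> float:
    """Lambda(q) = log p if q = p^k for a prime p and k >= 1, else 0."""
    if q < 2:
        return 0.0
    # Find the smallest prime factor p of q by trial division.
    p = 2
    while p * p <= q:
        if q % p == 0:
            break
        p += 1
    else:
        p = q  # no factor found up to sqrt(q), so q itself is prime
    # q is a prime power exactly when dividing out p leaves 1.
    while q % p == 0:
        q //= p
    return math.log(p) if q == 1 else 0.0


def divisor_sum(n: int) -> float:
    """Sum of Lambda(q) over all divisors q of n."""
    return sum(von_mangoldt(q) for q in range(1, n + 1) if n % q == 0)


def is_primitive(numbers: Iterable[int]) -> bool:
    """True if no element of the set divides another (distinct) element."""
    items = sorted(set(numbers))
    return not any(b % a == 0 for i, a in enumerate(items) for b in items[i + 1:])


# Check the identity sum_{q | n} Lambda(q) = log n for small n.
for n in range(1, 201):
    assert math.isclose(divisor_sum(n), math.log(n), abs_tol=1e-9)
```

For n = 12, the divisors 2, 3, and 4 contribute log 2, log 3, and log 2, which add up to log 12; this is unique factorization read off additively.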

55

3K

380

1K

988K

aLanguageModel retweeted

Hayden Prairie @hayden_prairie

about 2 months ago

We’ve been thinking a lot about scaling laws, wondering if there is a more effective way to scale FLOPs without increasing parameters.

Turns out the answer is YES – by looping blocks of layers during training. We find that predictable scaling laws exist for layer looping, allowing us to use looping to achieve the quality of a Transformer twice the size.

Our scaling laws suggest that for a fixed parameter budget, data and looping should be increased in tandem!

🧵👇
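As a rough picture of what "looping blocks of layers" means, the sketch below reuses one block of layers several times in a single forward pass. It is a toy under stated assumptions, not the authors' architecture: the layers are plain scalar functions, and `looped_forward` is a hypothetical name. The point is only that compute (layer applications) grows with the loop count while the number of distinct layers, and hence parameters, stays fixed.

```python
from typing import Callable, List, Tuple

Layer = Callable[[float], float]


def looped_forward(x: float, block: List[Layer], n_loops: int) -> Tuple[float, int]:
    """Apply the same block of layers n_loops times.

    Returns the output and the number of layer applications (a FLOPs proxy).
    """
    applications = 0
    for _ in range(n_loops):
        for layer in block:  # the same layers (same weights) on every loop
            x = layer(x)
            applications += 1
    return x, applications


# Toy block with two "layers"; a real block would be Transformer layers.
block: List[Layer] = [lambda v: v + 1.0, lambda v: 2.0 * v]

out_one, cost_one = looped_forward(1.0, block, n_loops=1)
out_two, cost_two = looped_forward(1.0, block, n_loops=2)
```

Doubling `n_loops` doubles the layer applications while `len(block)` is unchanged, which is the sense in which looping scales FLOPs without adding parameters.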

41

1K

179

1K

295K

aLanguageModel retweeted

Bobby Samuels

@BobbySamuels

3 months ago

https://t.co/0muCOoj0BN

32

527

62

796

302K

aLanguageModel retweeted

Arvind Narayanan

@random_walker

4 months ago

https://t.co/16ak7tW7Z7

13

194

41

259

93K

aLanguageModel retweeted

Zhongwen Xu @zhongwen2009

6 months ago

Pleased to share our engineering practices for medium-sized LLMs in multi-turn agentic search, where we boosted Qwen3 8B and Qwen3 A3B from 1-2 turn search and 10% accuracy on Browsecomp-Plus to 15+ and 20+ turns with 30% accuracy. The devils are in the details; we hope our practices in stable RL training and data processing can help the community!

Link: https://t.co/PnDUJRcOJM

Chinese version: https://t.co/pMYF5lFtUw

13

509

55

452

143K

aLanguageModel retweeted

Simon Mo

@simon_mo_

6 months ago

This, and a few more tricks, are covered in today's @character_ai blog post: https://t.co/wEOBcSFYBy

3

266

25

313

90K

aLanguageModel retweeted

Prashanth Rao

@tech_optimist

6 months ago

🧵
I compared the performance of @boundaryML BAML and @DSPyOSS for a variety of structured outputs, and the results are interesting: different datasets, models and schema formats result in wildly different outcomes, some of them unexpected. There's no universal winner (which means that prompt optimization matters, more than ever, because it's *very* hard to discover the right prompts as a human).

The benchmarks compare BAML's and DSPy's performance (with similar user instructions and description annotations). I also use `BAMLAdapter` in DSPy (which implements BAML's schema formatting in a custom DSPy adapter).

Why use a custom adapter in DSPy? Because it has benefits, especially when dealing with nested data, as the experiments show. 👇🏽
1/7

4

31

6

15

2K
