Alexander Long @AlexanderLong - Twitter Profile

Pinned Tweet

6 months ago

Since I started getting interested in ML I got it in my head that all I wanted to do was one smart thing that I could look back on and be satisfied that I did. Most papers are kinda bad even if they get accepted - the idea is very incremental, or it's just not that good an idea, or it doesn't really matter.

I never was able to do this all through PhD or my time at Amazon. All the papers I did there got into various places, but I never really thought they were actually that good. And I'd pretty much given up on this because Pluralis meant I couldn't really devote enough time to research myself. But in February I decided I didn't care and spent two months focused on a specific problem that had been going round in my head for about a year that I felt we needed to solve, and the solution came to me, and @ChaminHewa picked it up and generalised the approach and ran a bunch of novel experiments I hadn't thought of, and pulled everything together into an actual paper. And yesterday we presented this work at NeurIPS.

This is the first and probably only work I will ever do that for me feels like "ok that was GOOD". I don't care if it racks up a bunch of citations and disperses into the field or not, I don't care if someone repackages the ideas and takes all the credit for it, I don't care. For me there is an internal checkbox that just got ticked after more than ten years of trying. Anyone in ML will understand what I'm trying to say. Special day I'm going to remember for a long time.

21

205

12

42

18K

Alexander Long

@AlexanderLong

about 8 hours ago

@cartaleoni added you

4

0

536

Alexander Long

@AlexanderLong

1 day ago

@tyleraromero @MicrosoftAI I really like the way you guys wrote the report - no overclaiming, laid everything out clearly etc. etc. Gives me olmo vibes, and is far more valuable than just releasing the weights - thank you!

0

1

0

238

AlexanderLong retweeted

Hadi M. Dolatabadi

@hmdolatabadi

2 days ago

One particular aspect of LLM pretraining that often gets under-discussed in research is the fault-tolerance properties of the underlying system. Even in large centralised datacenter settings, tech reports such as Llama-3 or the recent Laguna tech report from Poolside openly talk about the importance of handling device and node failures. For example, Llama-3 training faced about 419 unexpected interruptions, with about half of those related to GPU/HBM/device-class failures one way or another (to understand the magnitude of these failures, that’s a GPU-related issue every 6 hours!). When running distributed LLM pretraining, fault-tolerance is not a luxury; it’s a necessity. Devices can go offline for various reasons, including workers voluntarily leaving the training stack. In this setting, being able to have a resilient system that continues training without interruption is a massive engineering push, let alone the fact that the entire run is happening in geographically distributed compute. Agora has now been up and running for more than two weeks on prosumer volunteers across North America, with nodes joining/leaving our pretraining run on an almost hourly basis. Despite this churn, training has been going smoothly so far and the training loss continues its downward trajectory at a stable throughput.
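The "every 6 hours" figure in the tweet above can be checked with a few lines of arithmetic. This is only a rough sketch: the 54-day window is an assumption taken from the Llama 3 technical report rather than from the tweet, and the 0.5 fraction is the tweet's "about half" taken literally.

```python
# Rough check of the failure-rate claim in the tweet.
# ASSUMPTION: the 419 unexpected interruptions occurred over a 54-day
# snapshot of pretraining (figure from the Llama 3 report, not the tweet).
SNAPSHOT_DAYS = 54
UNEXPECTED_INTERRUPTIONS = 419
GPU_RELATED_FRACTION = 0.5  # "about half", per the tweet


def hours_between_events(events: float, days: float) -> float:
    """Average number of hours between events spread evenly over `days`."""
    return days * 24 / events


# Mean gap between any two unexpected interruptions.
all_gap_hours = hours_between_events(UNEXPECTED_INTERRUPTIONS, SNAPSHOT_DAYS)

# Mean gap between GPU-related interruptions only.
gpu_gap_hours = hours_between_events(
    UNEXPECTED_INTERRUPTIONS * GPU_RELATED_FRACTION, SNAPSHOT_DAYS
)

print(f"any interruption: one every {all_gap_hours:.1f} h")
print(f"GPU-related:      one every {gpu_gap_hours:.1f} h")
```

Under these assumptions the run saw an unexpected interruption roughly every 3 hours and a GPU-related one roughly every 6 hours, which matches the tweet's estimate.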

2

14

2

1

493

Alexander Long

@AlexanderLong

3 days ago

Qwen3.7 looks closed

Qwen

@Alibaba_Qwen

3 days ago

👏👏 Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation.

✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks
✅ Versatile coding agent & productivity assistant with full-modality input
✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA
✅ Cross-harness generalization across diverse agent frameworks

One model. Sees, thinks, codes, acts.🙌🙌

Now available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.😎

🔗🔗⬇️⬇️
Blog：https://t.co/pVYf0h3NNa
Qwen Studio：https://t.co/HUYgFW4cYf
API：https://t.co/viL0cXrMzW

248

4K

453

700

449K

0

6

0

1

1K

Alexander Long

@AlexanderLong

3 days ago

@AndrewCurran_ possible there could be some negative consequences if the gov controls the lens most people use to interpret reality.

0

6

0

329

AlexanderLong retweeted

Elad Gil

@eladgil

4 days ago

The events of the last 6 months in technology are arguably amongst the most important in human history

The tools now increasingly exist for recursive self improvement of models & agents

We are likely in very early lift off & exponential

Largely unnoticed outside of tech

265

5K

436

1K

574K

Alexander Long

@AlexanderLong

4 days ago

https://t.co/g5IvwnbaWQ

0

1

0

2

6K

Alexander Long

@AlexanderLong

4 days ago

Whole section on Pluralis in Chamath's substack this week right under details of Anthropic's monster round.

8

78

11

25

15K

Alexander Long

@AlexanderLong

6 days ago

4090 and 5090 prices have more than 2x'd on vast in last 20 days. People realising these cards are good

3

31

1

2

5K

Alexander Long

@AlexanderLong

7 days ago

@benfielding https://t.co/gzLHyooLfK

Alexander Long

@AlexanderLong

over 1 year ago

> protocol learning isn't possible
> ok maybe it's possible but it's too expensive <-- we are here
> ok maybe it won't be too expensive but it will never get enough compute
> ok maybe it can get enough compute but you won't have the secret training recipes for large scale runs
> these open large scale runs are irresponsible, unsafe and need to be shut down

3

45

7

10

6K

1

8

0

974

AlexanderLong retweeted

Ansem

@blknoiz06

7 days ago

stop scrolling & read this all the way through

112

909

56

2K

208K

Alexander Long

@AlexanderLong

7 days ago

when these goods remain concentrated in the hands of a few, without adequate forms of sharing and access, a new imbalance is created

Pope Leo XIV

@Pontifex

9 days ago

Today, among the goods that are universally intended for everyone, we must also include new forms of property, such as patents, algorithms, digital platforms, technological infrastructure and data. In a context where the wealth of nations depends increasingly on knowledge and technology, when these goods remain concentrated in the hands of a few, without adequate forms of sharing and access, a new imbalance is created that contradicts the universal destination of goods. In turn, it widens the gap between the included and the excluded, between those who can participate in the digital revolution and those who remain on the margins. #MagnificaHumanitas

1K

35K

6K

3K

2M

3

17

3

1

2K

AlexanderLong retweeted

Albert Wenger 🌎🔥⌛

@albertwenger

9 days ago

Fantastic overview of the @Pluralis project and why it is important

1

38

5

11

4K

Alexander Long

@AlexanderLong

10 days ago

Agora is about an order of magnitude faster than the system that powered Node-0. 175k tok/s is fast.

Pluralis Research @Pluralis

15 days ago

Today we're releasing Agora: the first ever pretraining stack that allows non-collocated consumer GPUs to be competitive with centralized clusters

Agora is 15x faster than Megatron-LM in this setting and is only 1.5x less efficient in terms of tokens per unit compute than TorchTitan on H100s, despite running on devices that have no NVLink or InfiniBand support.

23

259

41

118

70K

1

36

6

4

4K

AlexanderLong retweeted

Gait Analyst

@gaitanalyst

10 days ago

The Pope just allied with Anthropic. The arc of history is long but it bends towards Warhammer 40K.

109

12K

954

879

325K

Alexander Long

@AlexanderLong

12 days ago

@gazorp5 @Pluralis Yeap

0

51

Alexander Long

@AlexanderLong

12 days ago

@gazorp5 @Pluralis 20%

1

0

52

Alexander Long

@AlexanderLong

13 days ago

wow https://t.co/hf1EkFoYs9

Pluralis Research @Pluralis

15 days ago

Today we're releasing Agora: the first ever pretraining stack that allows non-collocated consumer GPUs to be competitive with centralized clusters

Agora is 15x faster than Megatron-LM in this setting and is only 1.5x less efficient in terms of tokens per unit compute than TorchTitan on H100s, despite running on devices that have no NVLink or InfiniBand support.

23

259

41

118

70K

3

50

4

9

9K

Alexander Long

@AlexanderLong

13 days ago

@blknoiz06 @gregosuri Pluralis does sound interesting

1

7

0

349
