Zonalic @zonalic - Twitter Profile

Zonalic @Zonalic

about 1 month ago

@slimer48484 Mind going into more details?

1

0

1

509

Zonalic @Zonalic

3 months ago

@ActuallyIsaak @ysu_ChatData How do you determine the threshold? Is it learned, or do you just set a constant across all experts? If so, how do you deal with the initialization? I've tried something similar and ran into issues for the early steps where either every expert activates or none do.

0

11

Zonalic retweeted

Chenwei Cui @ccui42

4 months ago

Introducing Multi-Head LatentMoE 🚀

Turns out, making NVIDIA's LatentMoE [1] multi-head further unlocks O(1), balanced, and deterministic communication.

Our insight: Head Parallel; Move routing from before all-to-all to after. Token duplication happens locally. Always uniform, always deterministic.

It works orthogonally to EP as a new dimension of parallelism. For example, use HP for intra-cluster all-to-all as a highway, then use EP locally.

We propose FlashAttention-like routing and expert computation, both exact, IO-aware, and constant memory. This is to handle the increased number of sub-tokens.

Results:
- We replicate LatentMoE and confirm it is indeed faster than MoE, with matching model performance. (See Design Principle IV in [1])
- Up to 1.61x faster training than MoE+EP with identical model performance.
- Higher model performance while still 1.11x faster with doubled granularity.

📄 Paper: https://t.co/re5ludi0mB
💻 Code: https://t.co/8pHdtN3Z4i

[1] Elango et al., "LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts", 2026. https://t.co/cNmJ8tchTF

12

558

70

408

96K

Zonalic @Zonalic

5 months ago

Some of you mfs need to stare at a fire

0

2

0

32

Zonalic @Zonalic

9 months ago

@kalomaze @Infopulsed Different types, either for flash attention 2 and 3, scheduling kernels so they aren't using the transcendental operations at the same time, or for expert sharing, to schedule sending tokens and performing operations more optimally

0

22

Zonalic @Zonalic

9 months ago

@jpobrien925 @prerat Was gonna post this study. Agreed, if you just wanted it to roughly regurgitate most of the concepts, you could do it with less, but trying to get near 100% recall on the entire corpus would take at least that much, if not more

0

9

Zonalic @Zonalic

11 months ago

@justalexoki Make necessities cheap and give people more free time

0

7

Zonalic @Zonalic

11 months ago

@prismaticflow Interested!

0

1

0

13

Zonalic @Zonalic

11 months ago

@lumpfished Tulpas can definitely be accidental, subtle intention and consistent attention can sometimes be enough

0

27

Zonalic @Zonalic

11 months ago

@the_wilderless That might have been around the time I experienced a very loud, short gong-ish sound. It was so loud and potent that I almost doubted it was originating from inside my head. Never experienced anything like it before or since.

0

3

0

297

Zonalic @Zonalic

11 months ago

@brundolfsmith Personally, I'd like options to both see past events (like seeing Friday events when it's Sunday; they seem to leave and not come back) and have a simple button at the top that makes it show only events that are currently ongoing or coming up. Used to scrolling a lot

1

0

50

Zonalic retweeted