@ActuallyIsaak @ysu_ChatData How do you determine the threshold? Is it learned, or do you just set a constant across all experts? If so, how do you deal with initialization? I've tried something similar and ran into issues in the early steps, where either every expert activates or none do.
Introducing Multi-Head LatentMoE
Turns out, making NVIDIA's LatentMoE [1] multi-head further unlocks O(1), balanced, and deterministic communication.
Our insight: Head Parallel (HP). Move routing from before the all-to-all to after it. Token duplication happens locally. Always uniform, always deterministic.
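A minimal simulation of why moving routing after the all-to-all makes the send pattern uniform. It assumes one head per HP rank, a random stand-in for the learned router, and a naive EP dispatch that sends one latent copy per selected expert (some systems deduplicate per destination rank). All sizes and function names (`NUM_RANKS`, `LATENT_DIM`, `hp_send_volume`, ...) are hypothetical and not taken from the paper.

```python
import random
from collections import Counter

# Hypothetical sizes for illustration only (not from the paper).
NUM_RANKS = 4          # HP ranks; assume exactly one head per rank
NUM_TOKENS = 8         # tokens held by one source rank
LATENT_DIM = 64        # latent width d; each head carries d / NUM_RANKS values
EXPERTS_PER_RANK = 4   # experts hosted on each rank
TOP_K = 2

def ep_send_volume(rng):
    """EP: route first, then all-to-all. Each token is copied TOP_K times before sending."""
    volume = Counter()
    num_experts = NUM_RANKS * EXPERTS_PER_RANK
    for _ in range(NUM_TOKENS):
        # Random choice stands in for a learned top-k router.
        for e in rng.sample(range(num_experts), TOP_K):
            volume[e // EXPERTS_PER_RANK] += LATENT_DIM  # full latent vector per copy
    return [volume[r] for r in range(NUM_RANKS)]

def hp_send_volume():
    """HP: split each token into heads; head h always goes to rank h, before any routing."""
    head_dim = LATENT_DIM // NUM_RANKS
    volume = Counter()
    for _ in range(NUM_TOKENS):
        for h in range(NUM_RANKS):
            volume[h] += head_dim  # fixed destination, fixed size
    return [volume[r] for r in range(NUM_RANKS)]

def hp_local_dispatch(num_sub_tokens, rng):
    """On the receiving rank: route each sub-token to TOP_K local experts; copies stay on-device."""
    return [(t, e) for t in range(num_sub_tokens)
            for e in rng.sample(range(EXPERTS_PER_RANK), TOP_K)]

rng = random.Random(0)
print("EP values sent per destination:", ep_send_volume(rng))  # depends on routing
print("HP values sent per destination:", hp_send_volume())     # [128, 128, 128, 128]
print("local copies on one HP rank:", len(hp_local_dispatch(NUM_TOKENS, rng)))  # 16
```

In this toy model each token puts `LATENT_DIM` values on the wire under HP no matter what `TOP_K` is, while naive EP sends `TOP_K * LATENT_DIM` split unevenly by the router; that is one reading of the "O(1), balanced, deterministic" claim, not a description of the released code.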
HP works orthogonally to EP as a new dimension of parallelism. For example, use HP for the intra-cluster all-to-all as a highway, then use EP locally.
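One way to picture HP and EP as two mesh dimensions is a rank layout where HP groups span nodes and EP groups stay inside a node. This is a hypothetical sketch: the function `hp_ep_groups`, the node/GPU split, and the assumption that the HP group size equals the number of heads are illustrative choices, not the paper's configuration.

```python
def hp_ep_groups(num_nodes, gpus_per_node):
    """Hypothetical 2-D layout: HP spans nodes, EP spans the GPUs inside one node.

    Global rank r sits on node r // gpus_per_node at local index r % gpus_per_node.
    """
    ranks = range(num_nodes * gpus_per_node)
    # HP group: same local index on every node, used for the uniform cross-node "highway".
    hp_groups = [[r for r in ranks if r % gpus_per_node == i] for i in range(gpus_per_node)]
    # EP group: all GPUs in one node, so the routing-dependent all-to-all stays local.
    ep_groups = [[r for r in ranks if r // gpus_per_node == n] for n in range(num_nodes)]
    return hp_groups, ep_groups

hp, ep = hp_ep_groups(num_nodes=2, gpus_per_node=4)
print(hp)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(ep)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Every rank belongs to exactly one HP group and one EP group, which is what lets the two kinds of all-to-all run on separate communicators.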
We propose FlashAttention-like routing and expert computation, both exact, IO-aware, and constant-memory. These handle the increased number of sub-tokens.
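To illustrate how a blockwise, constant-memory pass can still be exact, here is a sketch that applies the online-softmax trick from FlashAttention to top-k routing. It assumes a softmax-over-experts router followed by top-k selection; the paper's router may use a different score function, and its kernels tile over sub-tokens on the GPU rather than looping in Python. `streaming_topk_softmax` is a hypothetical name.

```python
import heapq
import math

def streaming_topk_softmax(logit_blocks, k):
    """Exact top-k softmax gates from expert logits that arrive one block at a time.

    Only a running max, a running normalizer, and a k-element heap are kept, so
    memory does not grow with the number of experts. Blocks must be non-empty.
    """
    running_max = -math.inf
    running_sum = 0.0   # sum of exp(logit - running_max) over logits seen so far
    heap = []           # min-heap of (logit, expert_id) holding the best k so far
    offset = 0          # global expert id of the first logit in the current block
    for block in logit_blocks:
        new_max = max(running_max, max(block))
        # Rescale the old sum to the new max, then add this block's terms.
        running_sum = running_sum * math.exp(running_max - new_max) + sum(
            math.exp(x - new_max) for x in block)
        running_max = new_max
        for j, x in enumerate(block):
            item = (x, offset + j)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)  # evict the current k-th best
        offset += len(block)
    best = sorted(heap, reverse=True)
    # Gates are the full-softmax probabilities of the selected experts.
    return [(expert, math.exp(x - running_max) / running_sum) for x, expert in best]

logits = [0.1, 2.0, -1.0, 0.5, 1.5, 3.0, -0.3, 0.0]
blocks = [logits[i:i + 3] for i in range(0, len(logits), 3)]
print(streaming_topk_softmax(blocks, k=2))  # experts 5 and 1, same gates as a full softmax
```

The rescaling step is what makes the result exact: whenever a larger logit appears, the partial normalizer is multiplied by `exp(old_max - new_max)` instead of being recomputed from stored logits.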
Results:
- We replicate LatentMoE and confirm it is indeed faster than MoE, with matching model performance (see Design Principle IV in [1]).
- Up to 1.61x faster training than MoE+EP with identical model performance.
- With doubled granularity: higher model performance while still 1.11x faster.
Paper: https://t.co/re5ludi0mB
Code: https://t.co/8pHdtN3Z4i
[1] Elango et al., "LatentMoE: Toward Optimal Accuracy per FLOP and Parameter in Mixture of Experts", 2026. https://t.co/cNmJ8tchTF
@kalomaze @Infopulsed Different types: for FlashAttention-2 and FlashAttention-3, scheduling kernels so they aren't using the transcendental operations at the same time; or, for expert sharing, scheduling token sends and computation more efficiently.
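For the expert-sharing case, the payoff of scheduling sends against compute can be estimated with a simple two-stage pipeline model. A minimal sketch, assuming equal-size chunks and one communication stream that fully overlaps one compute stream (real kernels also contend for bandwidth and SMs); the function names and timings are hypothetical.

```python
def sequential_time(n_chunks, comm, compute):
    """Send each chunk, then compute it, with no overlap."""
    return n_chunks * (comm + compute)

def overlapped_time(n_chunks, comm, compute):
    """Two-stage pipeline: send chunk i+1 while computing chunk i.

    The first send and the last compute are exposed; in between, the slower
    stage sets the pace.
    """
    if n_chunks == 0:
        return 0
    return comm + (n_chunks - 1) * max(comm, compute) + compute

print(sequential_time(8, comm=3, compute=5))  # 64
print(overlapped_time(8, comm=3, compute=5))  # 3 + 7*5 + 5 = 43
```

The model also shows the limit of overlap: once the chunk count is large, the total approaches `n_chunks * max(comm, compute)`, so scheduling hides the cheaper stage but cannot shorten the more expensive one.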
@jpobrien925 @prerat Was gonna post this study. Agreed: if you just wanted it to roughly regurgitate most of the concepts, you could do it with less, but getting near 100% recall on the entire corpus would take at least that much, if not more.
@the_wilderless That might have been around the time I experienced a very loud, short, gong-ish sound. It was so loud and potent that I almost doubted my certainty that it was originating in my head. Never experienced anything like it before or since.
@brundolfsmith Personally, I'd like options both to see past events (e.g., seeing Friday's events when it's Sunday; they seem to disappear and not come back) and to have a simple button at the top that shows only events that are currently ongoing or coming up. Used to scrolling a lot.