Introducing SubQ: a major breakthrough in LLM intelligence.
It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA),
and the first frontier model with a 12-million-token context window, which is:
- 52x faster than FlashAttention at 1M tokens
- Less than 5% of the cost of Opus
Transformer-based LLMs waste compute by scoring every possible relationship between tokens (standard attention), so the cost grows quadratically with context length.
Only a small fraction actually matter.
@subquadratic finds and focuses only on the ones that do.
That's nearly 1,000x less compute and a new way for LLMs to scale.
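
To make the scaling argument concrete, here is a rough back-of-the-envelope sketch in Python that counts query-key scores for standard attention versus a sparse scheme where each token attends to a fixed budget of keys. The per-token budget `k` and the function names are hypothetical, chosen only for illustration; the post does not say how SubQ selects relationships or what its actual budget is, and the count ignores the cost of doing the selection itself.

```python
def dense_pairs(n: int) -> int:
    """Query-key scores in standard attention: every token scores every token."""
    return n * n


def sparse_pairs(n: int, k: int) -> int:
    """Scores if each token attends to at most k selected keys.

    k is a hypothetical per-token budget, not a figure from the announcement.
    """
    return n * min(k, n)


def savings(n: int, k: int) -> float:
    """How many times fewer scores the sparse scheme computes than the dense one."""
    return dense_pairs(n) / sparse_pairs(n, k)


if __name__ == "__main__":
    k = 1_000  # assumed budget, for illustration only
    for n in (1_000, 100_000, 1_000_000, 12_000_000):
        print(f"n={n:>10,}  dense={dense_pairs(n):.2e}  "
              f"sparse={sparse_pairs(n, k):.2e}  savings={savings(n, k):,.0f}x")
```

With a fixed budget the sparse count grows linearly in context length, so the gap widens as the context gets longer; under these assumed numbers a 1,000x reduction appears at 1M tokens.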
@tim_cook Hello @Apple,
I am a student, and my old laptop is no longer working properly: it makes noise and is very slow. I was wondering if it would be possible to receive an iPad Pro (M5) or a MacBook. I am happy to provide proof of my student status if needed.
Thank you for your time.