Kevin Young @kevisyoung - Twitter Profile

Kevin Young @kevisyoung

about 1 month ago

@AppenResearch @subquadratic @SubquadraticCTO TY, Appen! Awesome validation.

0

4

0

1K

kevisyoung retweeted

Alexander Whedon

@alex_whedon

about 1 month ago

We've partnered with Appen to evaluate the benchmarks we published last week. Results are in and we've actually improved across the board. Link below to the full report.

18

112

17

32

34K

kevisyoung retweeted

Dan McAteer

@daniel_mac8

about 1 month ago

Asked Alexander Whedon, CEO of @subquadratic, if SubQ will replace or augment existing vanilla autoregressive LLMs like GPT-5.5/Opus 4.7.

SubQ specializes in long-context tasks:

> 12 mil token context length
> 52x faster than FlashAttention
> 20x cheaper than Opus

Important caveat is that SubQ doesn't provide a significant lift outside of long-context.

Clearly, models like GPT-5.5/Opus 4.7 can use SubQ as a *tool* within an agent harness. It is invoked for the long-context use cases and passes responses back to the AR LLM.

This alone would be a gamechanger for you if you build with AI.

9

68

8

15

11K
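The tool-routing idea in Dan McAteer's tweet above (a general-purpose model handing long-context work to a specialist and reading back the result) can be sketched roughly as follows. This is only an illustration under stated assumptions: `Router`, `base_model`, `long_context_tool`, the `max_base_tokens` threshold, and the whitespace token count are all hypothetical names and choices, not any vendor's API or SubQ's actual interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    """Hypothetical harness: routes oversized contexts to a long-context tool."""

    base_model: Callable[[str], str]         # stand-in for the autoregressive LLM
    long_context_tool: Callable[[str], str]  # stand-in for the long-context model
    max_base_tokens: int = 200_000           # assumed cutoff, chosen for illustration

    def count_tokens(self, text: str) -> int:
        # Crude whitespace approximation; a real harness would use a tokenizer.
        return len(text.split())

    def answer(self, question: str, context: str) -> str:
        if self.count_tokens(context) > self.max_base_tokens:
            # Long context: ask the tool first, then pass its response
            # back to the base model instead of the raw context.
            digest = self.long_context_tool(f"{question}\n\n{context}")
            return self.base_model(f"{question}\n\nTool result: {digest}")
        # Short context: the base model handles it directly.
        return self.base_model(f"{question}\n\n{context}")
```

In this sketch the base model never sees the full long context, only the tool's response, which mirrors the "passes responses back to the AR LLM" description in the tweet.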

kevisyoung retweeted

Alexander Whedon

@alex_whedon

about 1 month ago

Yes, we are using weights from open-source models as a starting point, as a function of our funding and maturity as a company. This is something we intend to change, and we have run many from-scratch experiments at smaller scale already, including with further architectural variations. We take the weights, port them into our architecture, and do CPT, SFT, and RL for the behaviors we want. To date, sub-quadratic architectures have required a significant quality tradeoff on long context. Our algorithm changes that. We are using that to do faster training, faster inference, and longer-context training and inference. We just shared a technical blog post (https://t.co/tPLzi0eNJR) with more details and will share more details again in a model card next week. If there is anything you think is missing, let us know, and we can make sure to include them!

6

71

7

27

21K

kevisyoung retweeted

Alexander Whedon

@alex_whedon

about 1 month ago

We were a little slow on this, but we just got a technical blog post up with more details. Please take a look! https://t.co/tPLzi0eNJR We have a model card coming next week, and we are happy to take requests for any specific details there. I am happy to answer any questions here!

68

682

58

486

273K

kevisyoung retweeted

Sumanth

@Sumanth_077

about 1 month ago

Attention Is All You Need (2017): most cited ML paper of the decade. For 8 years, every frontier model has been built on quadratic attention. Process every possible word-to-word relationship. Compute explodes with context length. Accuracy degrades past 200k tokens. Sub-quadratic attention was always the endgame. The labs just had too much invested in transformers to admit it. SubQ is the first production-ready sub-quadratic LLM. 12M token context. Linear scaling, not quadratic. Outperforms Opus 4.6 on long context at less than 10% the cost. 52x faster than FlashAttention. Linear vs quadratic. That's the whole game.

6

36

12

14

4K

Kevin Young @kevisyoung

about 1 month ago

@zaimiri @alex_whedon Agreed

0

10

Kevin Young @kevisyoung

about 1 month ago

@heygurisingh You get it!

1

0

358

kevisyoung retweeted

Santiago

@svpino

about 1 month ago

Huge context windows are the biggest lie in AI.

Honestly, I haven't seen any benefits to scaling past 1M tokens. The more data you show the model, the dumber they get, so larger windows are pointless.

Attention is quadratic: If you double the context, you are quadrupling the compute.

Past a certain point, models will get slow and expensive and start making stuff up. And they have a lot of trouble remembering the details in the middle of the context.

There are a million workarounds for this:

• Chunking
• Summarization layers
• Retrieval patches
• Sliding windows

But, honestly, they are just meh.

Here is something new, and potentially a solution that will help with this:

Subquadratic built an LLM that uses a subquadratic architecture. This means that the cost of increasing the context doesn't explode as it does with standard transformers.

Their LLM:

• 12M tokens of usable context
• No chunk-and-stitch workarounds needed
• The full context goes into the model, not summaries of it

If this works as advertised, it will completely change what Context Engineering means.

28

92

18

56

21K
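The scaling claim in Santiago's tweet above ("if you double the context, you are quadrupling the compute") can be checked with a few lines of arithmetic. The sketch below counts query-key score entries for dense attention and compares that with a generic linear-cost model. The `per_token_budget` parameter is a made-up illustration of "each token looks at a fixed number of others"; it is not a description of how SubQ's architecture actually works.

```python
def attention_pairs(n_tokens: int) -> int:
    """Query-key score entries computed by full (dense) self-attention."""
    return n_tokens * n_tokens


def linear_cost(n_tokens: int, per_token_budget: int) -> int:
    """Hypothetical linear model: each token attends to a fixed budget of keys."""
    return n_tokens * per_token_budget


def growth_factor(cost_fn, n_tokens: int, **kwargs) -> float:
    """How much the cost grows when the context length doubles."""
    return cost_fn(2 * n_tokens, **kwargs) / cost_fn(n_tokens, **kwargs)


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        dense = growth_factor(attention_pairs, n)
        linear = growth_factor(linear_cost, n, per_token_budget=1024)
        print(f"n={n}: dense grows {dense}x, linear grows {linear}x when doubled")
```

Doubling the context multiplies the dense count by 4 and the linear count by 2, regardless of the starting length or the budget chosen.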

kevisyoung retweeted

Alexander Whedon

@alex_whedon

about 1 month ago

Introducing SubQ - a major breakthrough in LLM intelligence.

It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), and the first frontier model with a 12 million token context window which is:

- 52x faster than FlashAttention at 1MM tokens
- Less than 5% the cost of Opus

Transformer-based LLMs waste compute by processing every possible relationship between words (standard attention). Only a small fraction actually matter. @subquadratic finds and focuses only on the ones that do.

That's nearly 1,000x less compute and a new way for LLMs to scale.

1K

23K

3K

19K

13M

Kevin Young @kevisyoung

over 1 year ago

@mhhya888 Yes, I was trying to copy CSS from https://t.co/OgbIF1l3on but there are a lot of IDs. If you could process the refund, I would be super grateful!

1

0

7