If anyone tries to argue climate change with you, just pull this up.
Not about no AC’s no more.
This is about whether future generations will be able to grow food in Europe.
Fun surprise: DeepSeek used my open-perfectblend dataset to train their new DSpark drafter
Time to promote it again! It's an open-source reproduction of "The Perfect Blend" paper.
If you ever need >1M diverse prompts in math, chat, and code, it does the job.
They have benefited so much from published research whose experiments were run on open models, a large part of them Chinese. Let the spice flow.
DeepSeek should ban Anthropic from implementing DSpark in their models! In fact, Anthropic should be banned from using any AI research from China and required to remove any non-US data they used to pre-train their models! Cry, Dario, cry!😅
DeepSeek is the GOAT. 🐳
They just published DSpark, a new speculative decoding method that boosts throughput by 51% to 400%.
They also open-sourced DeepSpec, the training framework behind it.
This is the real open AI.
One of the key insights from our SubQ 1.1 Small technical report (https://t.co/Y6kBYgPXv6) was that super-long-context pre-training decreased the reliance on super-long-context post-training to enable super-long-context modeling capabilities. Million-token-plus pre-training enabled the model to extrapolate post-trained capabilities to larger-than-trained lengths. For example, we pre-trained a model with one-million-token inputs and then did post-training at or below one million tokens, and it was able to perform NIAH (needle in a haystack) with high accuracy at multi-million-token lengths. Read more in our technical report!
@SanderSassen You know thermodynamics. The lower you set your home temp, the more extra heat gets dumped outside. Touch the side of your fridge and tell me I'm wrong.
Join Subquadratic in SF for a casual gathering this Saturday (link in comments)!
We are a foundation model company building the most compute-, memory-, and sample-efficient foundation model architectures for the next era of AI computing! We love to chat about challenging base assumptions of the industry.
We are hiring folks to work on large-scale pre- and post-training, long-context modeling, model architectures beyond attention, world models, efficient inference and training, and more.
@francoisfleuret Trust is a luxury good in authoritarian countries. Because such a system opts for a low miss rate in spotting bad motives, its false alarm rate is high as a result. It would rather trust no one than be wrong once.
@puckrin Living in Europe here. I don't see the point of investing in AC units just to use them one week out of 52 per year. We have other ways to cope with heat too.