I can’t emphasize enough how mind-blowing extremely long token context windows are. For both AI researchers and practitioners, massive context windows will have transformative long-term impact, beyond one or two flashy news cycles. ↔️
“More is different”: Just as we saw emergent capabilities when scaling model size, compute, and datasets, I think we’re going to see a similar revolution for in-context learning. The capability shifts we’ll see going from 8k to 32k to 128k to 10M (!!) token contexts and beyond are not just going to be simple X% quantitative improvements, but instead qualitative phase shifts which unlock new abilities altogether and result in rethinking how we approach foundation model reasoning.
Great fundamental research on the relationship between in-context learning (ICL) and in-weight learning is now more relevant than ever, and needs to be extended given that we now operate in an era where the "X-axis" of context length has increased by three orders of magnitude. I highly recommend @scychan_brains's pioneering work in this area, such as https://t.co/aySKxagzYd and https://t.co/xyHuDnCrC2. In fact, there are already data points which suggest our understanding of ICL scaling laws still contains large gaps 🤔 (see https://t.co/prRlu0Haao).
Also exciting is the connection of long-context ICL to alignment and post-training! I'm curious to see how 10M+ contexts disrupt the ongoing debate about whether foundation models truly learn new capabilities and skills during finetuning/RLHF or whether they purely learn stylistic knowledge (the "Superficial Alignment Hypothesis", https://t.co/MvcuhDbFX3 and https://t.co/QPtMScbX7j). The Gemini 1.5 technical report brings new evidence to this discussion as well, showing that an entire new language can be learned completely in context. I'm excited to see better empirical understanding of how foundation models can effectively leverage large-context ICL, both during inference and for "learning to learn" during training.
And finally, perhaps the most important point: huge context lengths will have a lasting impact because their applications are so broad. There is no part of modern foundation model research that is not changed profoundly in some capacity by huge contexts! From theoretical underpinnings (how we design pre-training and post-training objectives) to system design (how we scale up long contexts during training and serving) to application domains (such as robotics), massive-context ICL is going to have significant impact and move the needle across the board.
@ralphmacchio St. John’s Military School in KS closed in 2019 after 136 years. This weekend the alumni are dedicating a new museum built for its legacy. Do you have a cool memory from the days of shooting the movie Up the Academy? Its memory means so much to many.
@RobertDowneyJr St. John’s Military School in KS closed in 2019 after 136 years. This weekend the alumni are dedicating a new museum built for its legacy. Do you have a cool memory from the days of shooting the movie Up the Academy? Its memory means so much to many.
@BankofAmerica Your customer service seems horrible. I filled out the entire home refinancing application on your online portal, which takes some time. Then at the end you dump me to a toll-free number where I sit on hold for 1 hour, and still no one picks up after multiple attempts?!? This is working for you?
@RobertDowneyJr ?? Did you ever visit your father at St. John’s Military School in Kansas when he filmed Up the Academy? SJMS just closed after 131 years. Lots of us cadets still watch that movie for love of our school. Tell your pops! Best soundtrack ever!