The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons.
We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views.
We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on OpenReview (https://t.co/nDVjmNhATM).
We would greatly appreciate your attention and help in sharing it.
at the end of the day, all learning should be conducted through in-context learning (or generalized in-context learning). there are two critical questions left: how to make in-context learning more effective, and how to learn from infinite contexts.
Scaling up RL is all the rage right now; I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the future". You get a lot more leverage from verifier functions than explicit supervision, this is great. But first, it looks suspicious asymptotically - once the tasks grow to be minutes/hours of interaction long, you're really going to do all that work just to learn a single scalar outcome at the very end, to directly weight the gradient? Second, beyond asymptotics, this doesn't feel like the human mechanism of improvement for the majority of intelligence tasks. There are significantly more bits of supervision we extract per rollout via a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc. and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later, a bit like sleep. In English, we say something becomes "second nature" via this process, and we're missing learning paradigms like this. The new Memory feature is maybe a primordial version of this in ChatGPT, though it is only used for customization, not problem solving. Notice that there is no equivalent of this for e.g. Atari RL because there are no LLMs and no in-context learning in those domains.
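To make the "single scalar outcome weights every action" point concrete, here is a minimal sketch of a REINFORCE-style update. It assumes a toy stateless softmax policy over a handful of actions; the names `softmax` and `outcome_weighted_update` and the learning rate are illustrative choices, not anything from a specific RL library.

```python
import math

def softmax(logits):
    """Convert raw logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def outcome_weighted_update(logits, actions_taken, final_reward, lr=0.1):
    """One policy-gradient step for a single rollout (toy, stateless policy).

    Every action in the rollout is weighted by the same scalar
    final_reward, no matter which actions actually helped.
    """
    probs = softmax(logits)
    new_logits = list(logits)
    for action in actions_taken:
        for i in range(len(logits)):
            # Gradient of log pi(action) with respect to logit i.
            grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
            new_logits[i] += lr * final_reward * grad_log_prob
    return new_logits
```

With a positive `final_reward`, the actions that appeared most often in the rollout become more likely; with a negative one they become less likely. The rollout, however long, contributes only that one number.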
Example algorithm: given a task, do a few rollouts, stuff them all into one context window (along with the reward in each case), use a meta-prompt to review/reflect on what went well or not to obtain string "lesson", to be added to system prompt (or more generally modify the current lessons database). Many blanks to fill in, many tweaks possible, not obvious.
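The example algorithm above could be sketched roughly as follows. This is only one way to fill in the blanks: `run_rollout` and `reflect` are hypothetical callables standing in for an agent run and an LLM call, and the meta-prompt wording and the deduplication rule are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Rollout = Tuple[str, float]  # (transcript, reward)

def reflect_and_update(
    task: str,
    lessons: List[str],
    run_rollout: Callable[[str, str], Rollout],  # hypothetical: (system_prompt, task) -> rollout
    reflect: Callable[[str], str],               # hypothetical: meta-prompt -> lesson string
    n_rollouts: int = 4,
) -> List[str]:
    """Do a few rollouts, review them in one context, and return updated lessons."""
    system_prompt = "\n".join(lessons)
    rollouts = [run_rollout(system_prompt, task) for _ in range(n_rollouts)]

    # Stuff all rollouts, with their rewards, into a single context.
    context = "\n\n".join(
        f"Attempt {i + 1} (reward={reward}):\n{transcript}"
        for i, (transcript, reward) in enumerate(rollouts)
    )
    meta_prompt = (
        "Review the attempts below. What went well? What didn't? "
        "State one lesson to apply next time.\n\n" + context
    )

    lesson = reflect(meta_prompt).strip()
    # Simplest possible "lessons database" update: append if new.
    if lesson and lesson not in lessons:
        return lessons + [lesson]
    return list(lessons)
```

The returned list would be joined into the next system prompt. Merging, pruning, or distilling lessons is deliberately left out here.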
Example of lesson: we know LLMs can't super easily see letters due to tokenization and can't super easily count inside the residual stream, hence 'r' in 'strawberry' being famously difficult. Claude system prompt had a "quick fix" patch - a string was added along the lines of "If the user asks you to count letters, first separate them by commas and increment an explicit counter each time and do the task like that". This string is the "lesson", explicitly instructing the model how to complete the counting task. The open questions are how such a lesson might fall out of agentic practice instead of being hard-coded by an engineer, how this can be generalized, and how lessons can be distilled over time so they don't bloat context windows indefinitely.
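For reference, the procedure that the quoted "lesson" string describes (separate the letters, then increment an explicit counter) looks like this when written out as ordinary code. The function name and the case-insensitive matching are illustrative assumptions.

```python
def count_letter_explicitly(word, letter):
    """Separate the letters with commas, then count matches one at a time."""
    separated = ", ".join(word)  # "s, t, r, a, ..." so each letter stands alone
    counter = 0
    for ch in word:
        if ch.lower() == letter.lower():
            counter += 1  # explicit counter, incremented once per match
    return separated, counter

separated, count = count_letter_explicitly("strawberry", "r")
print(separated)  # s, t, r, a, w, b, e, r, r, y
print(count)      # 3
```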
TLDR: RL will lead to more gains because when done well, it is a lot more leveraged, bitter-lesson-pilled, and superior to SFT. It doesn't feel like the full story, especially as rollout lengths continue to expand. There are more S curves to find beyond, possibly specific to LLMs and without analogues in game/robotics-like environments, which is exciting.
@chenglin903 thanks for your interest! currently we only have the technical report. Feel free to open an issue on GitHub if you have further questions!
We just released ⭐️Inferflow⭐️, an efficient and highly configurable inference engine in C++ that can serve various large language models by simply modifying a few lines in the corresponding configuration files, without writing a single line of source code.
https://t.co/qt6s9eOoib
Knowledge Fusion of LLMs
Is it possible to merge existing models into a more potent model?
We have already seen a few ways that show the potential to effectively do this using approaches like weight merging and ensembling of models.
This work proposes FuseLLM with the core idea of externalizing knowledge from multiple LLMs and transferring their capabilities to a target LLM.
It leverages the generative distributions of source LLMs to externalize both their collective knowledge and individual strengths and transfer them to the target LLM through continual training.
To put it simply, the idea is to benefit from the strengths of all the LLMs and combine them into one integrated model.
The paper finds that FuseLLM can improve the performance of the target model across a range of capabilities such as reasoning, common sense, and code generation.
By the way, you can also perform the fusion among fine-tuned LLMs that specialize in specific tasks.
This continues to be an interesting research area so hoping to document more on any new ideas and findings I come across.
Thanks @_akhaliq for sharing our work!
People receive language feedback rather than numerical scores.
Can LLMs also take advantage of such insightful feedback?👇
paper: https://t.co/9RDdQPZMPo
code: https://t.co/Bh85B8ds8h
Reasons to Reject? Aligning Language Models with Judgments
paper page: https://t.co/zz0CHfQBFa
As humans, we consistently engage in interactions with our peers and receive feedback in the form of natural language. This language feedback allows us to reflect on our actions, maintain appropriate behavior, and rectify our errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with reward or preference data, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We commence with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods are unable to fully capitalize on the judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our offline alignment results show that, with merely 1317 off-the-shelf judgment data, CUT (LLaMA2-13b) can beat the 175B DaVinci003 and surpass the best baseline by 52.34 points on AlpacaEval. The online alignment results demonstrate that CUT can align LLMs (LLaMA2-chat-13b) in an iterative fashion using model-specific judgment data, with a steady performance improvement from 81.09 to 91.36 points on AlpacaEval. Our analysis further suggests that judgments exhibit greater potential than rewards for LLM alignment and warrant future research.
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
abs: https://t.co/lmAu9j0WAk
repo: https://t.co/fJ85eYHMr7
Check out our instruction-following model, which supports interleaved image-text inputs and outputs! 🌟https://t.co/2HzoSrg4tE🌟 Most importantly, it offers natural and engaging interactions in various realistic scenarios! No need to pretend you're blind in front of our assistant.🥳
TextBind: Multi-turn Interleaved Multimodal Instruction-following
paper page: https://t.co/vQqxOMZ7bj
Large language models with instruction-following abilities have revolutionized the field of artificial intelligence. These models show exceptional generalizability to tackle various real-world tasks through their natural language interfaces. However, their performance heavily relies on high-quality exemplar data, which is often difficult to obtain. This challenge is further exacerbated when it comes to multimodal instruction following. We introduce TextBind, an almost annotation-free framework for empowering larger language models with the multi-turn interleaved multimodal instruction-following capabilities. Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model. We release our dataset, model, and demo to foster future research in the area of multimodal instruction following.
Our work exhibits these exciting features:
🚀 Comprehend, compare, and relate multiple images across a conversation
🚀 Generate vivid responses with interleaved image-text content
🚀 Spontaneously produce images based on context. No need for explicit dictation
🚀 Explore our demo!!!
I really think this is super interesting! We do not pick the next token from a fixed, finite, and standalone vocabulary. We recall our memories and experiences, and choose the most suitable phrases in context.
Copy Is All You Need
paper page: https://t.co/kDTsbUQhFr
The dominant text generation models compose the output by sequentially selecting words from a fixed vocabulary. In this paper, we formulate text generation as progressively copying text segments (e.g., words or phrases) from an existing text collection. We compute the contextualized representations of meaningful text segments and index them using efficient vector search toolkits. The task of text generation is then decomposed into a series of copy-and-paste operations: at each time step, we seek suitable text spans from the text collection rather than selecting from a standalone vocabulary. Experiments on the standard language modeling benchmark (WikiText-103) show that our approach achieves better generation quality according to both automatic and human evaluations. Besides, its inference efficiency is comparable to token-level autoregressive models thanks to the reduction of decoding steps. We also show that our approach allows for effective domain adaptation by simply switching to domain-specific text collection without extra training. Finally, we observe that our approach attains additional performance gains by simply scaling up to larger text collections, again without further training.
We are super excited to share PandaGPT, the first foundation model capable of following instructions over data across six modalities, without the need for explicit supervision. [1/n]
Project Page: https://t.co/g8p27T6p3W
Demo: https://t.co/leJhrvfDgF
Code: https://t.co/5QnsV8EpQ3