Deng Cai

almost 5 years ago

Amazing... We won an ACL outstanding paper award. #ACL2021NLP

deng_cai retweeted

Jianyang Gao

@gaoj0017

3 months ago

The TurboQuant paper (ICLR 2026) contains serious issues in how it describes RaBitQ, including incorrect technical claims and misleading theory/experiment comparisons. We flagged these issues to the authors before submission. They acknowledged them, but chose not to fix them. The paper was later accepted and widely promoted by Google, reaching tens of millions of views. We’re speaking up now because once a misleading narrative spreads, it becomes much harder to correct. We’ve written a public comment on openreview (https://t.co/nDVjmNhATM). We would greatly appreciate your attention and help in sharing it.

4 months ago

@XiangruTang congrats

5 months ago

@yus167 thank you. the references are indeed a bit old 😊. I really like your idea of applying similar methods to test-time training.

12 months ago

at the end of the day, all learning should be conducted through in-context learning (or generalized in-context learning). there are two critical questions left: how to make in-context learning more effective, and how to learn from infinite contexts.

Andrej Karpathy

@karpathy

12 months ago

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly increase (/decrease) the probability of every action I took for the future". You get a lot more leverage from verifier functions than explicit supervision, this is great. But first, it looks suspicious asymptotically - once the tasks grow to be minutes/hours of interaction long, you're really going to do all that work just to learn a single scalar outcome at the very end, to directly weight the gradient? Beyond asymptotics and second, this doesn't feel like the human mechanism of improvement for majority of intelligence tasks. There's significantly more bits of supervision we extract per rollout via a review/reflect stage along the lines of "what went well? what didn't go so well? what should I try next time?" etc. and the lessons from this stage feel explicit, like a new string to be added to the system prompt for the future, optionally to be distilled into weights (/intuition) later a bit like sleep. In English, we say something becomes "second nature" via this process, and we're missing learning paradigms like this. The new Memory feature is maybe a primordial version of this in ChatGPT, though it is only used for customization not problem solving. Notice that there is no equivalent of this for e.g. Atari RL because there are no LLMs and no in-context learning in those domains. Example algorithm: given a task, do a few rollouts, stuff them all into one context window (along with the reward in each case), use a meta-prompt to review/reflect on what went well or not to obtain string "lesson", to be added to system prompt (or more generally modify the current lessons database). Many blanks to fill in, many tweaks possible, not obvious. 
Example of lesson: we know LLMs can't super easily see letters due to tokenization and can't super easily count inside the residual stream, hence 'r' in 'strawberry' being famously difficult. Claude system prompt had a "quick fix" patch - a string was added along the lines of "If the user asks you to count letters, first separate them by commas and increment an explicit counter each time and do the task like that". This string is the "lesson", explicitly instructing the model how to complete the counting task, except the question is how this might fall out from agentic practice, instead of it being hard-coded by an engineer, how can this be generalized, and how lessons can be distilled over time to not bloat context windows indefinitely. TLDR: RL will lead to more gains because when done well, it is a lot more leveraged, bitter-lesson-pilled, and superior to SFT. It doesn't feel like the full story, especially as rollout lengths continue to expand. There are more S curves to find beyond, possibly specific to LLMs and without analogues in game/robotics-like environments, which is exciting.
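
The "example algorithm" in the tweet (rollouts, one shared context with rewards, a meta-prompt that yields a "lesson", a lessons database feeding the system prompt) can be sketched as a loop. This is a minimal illustration under stated assumptions, not Karpathy's or anyone's actual implementation: `rollout`, `reflect`, `LessonDatabase`, and the meta-prompt wording are all hypothetical, the toy stand-ins merely pretend that a counting lesson fixes the task, and the later distillation of lessons into weights is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from any real library):
#   rollout(system_prompt, task) -> (transcript, reward)
#   reflect(meta_prompt) -> lesson string
Rollout = Callable[[str, str], Tuple[str, float]]
Reflect = Callable[[str], str]


@dataclass
class LessonDatabase:
    """Natural-language lessons that get appended to the system prompt."""
    lessons: List[str] = field(default_factory=list)

    def system_prompt(self, base: str) -> str:
        # Base prompt followed by one bullet per lesson learned so far.
        return "\n".join([base] + [f"- {lesson}" for lesson in self.lessons])


def reflect_and_update(task: str, rollout: Rollout, reflect: Reflect,
                       db: LessonDatabase,
                       base_prompt: str = "You are a helpful assistant.",
                       num_rollouts: int = 4) -> str:
    prompt = db.system_prompt(base_prompt)
    # 1. Do a few rollouts under the current system prompt.
    results = [rollout(prompt, task) for _ in range(num_rollouts)]
    # 2. Stuff all rollouts, with their rewards, into a single context window.
    attempts = "\n\n".join(
        f"Attempt {i + 1} (reward={reward:.2f}):\n{transcript}"
        for i, (transcript, reward) in enumerate(results)
    )
    # 3. Use a meta-prompt to review/reflect and obtain a string "lesson".
    meta_prompt = (f"Task: {task}\n\n{attempts}\n\n"
                   "What went well? What didn't go so well? "
                   "State one lesson to try next time.")
    lesson = reflect(meta_prompt).strip()
    # 4. Add the lesson to the database, skipping empty or duplicate lessons.
    if lesson and lesson not in db.lessons:
        db.lessons.append(lesson)
    return lesson


# Toy stand-ins for an LLM, for illustration only.
def toy_rollout(system_prompt: str, task: str) -> Tuple[str, float]:
    # Pretend the model succeeds only once a counting lesson is in the prompt.
    succeeded = "explicit counter" in system_prompt
    return f"attempted {task!r}", 1.0 if succeeded else 0.0


def toy_reflect(meta_prompt: str) -> str:
    # A real reflector would read meta_prompt; this one returns a fixed lesson.
    return "Separate the letters by commas and keep an explicit counter."


db = LessonDatabase()
reflect_and_update("count the r's in 'strawberry'", toy_rollout, toy_reflect, db)
print(db.system_prompt("You are a helpful assistant."))
```

The lessons live as plain strings so they can be inspected, deduplicated, or pruned, which is one possible answer to the tweet's concern about lessons bloating the context window indefinitely.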

over 2 years ago

@chenglin903 thanks for your interest! Currently we only have the technical report. Feel free to open an issue on GitHub if you have further questions!

over 2 years ago

We just released ⭐️Inferflow⭐️, an efficient and highly configurable inference engine in C++ for serving various large language models: you only modify a few lines in the corresponding configuration files, without writing a single line of source code. https://t.co/qt6s9eOoib

over 2 years ago

Extend the abilities of your LLMs by absorbing the wisdom of other LLMs. Check it out👇 https://t.co/6mDNEzZuL7

elvis

@omarsar0

over 2 years ago

Knowledge Fusion of LLMs

Is it possible to merge existing models into a more potent model?

We have already seen a few ways that show the potential to effectively do this using approaches like weight merging and ensembling of models.

This work proposes FuseLLM with the core idea of externalizing knowledge from multiple LLMs and transferring their capabilities to a target LLM.

It leverages the generative distributions of source LLMs to externalize both their collective knowledge and individual strengths and transfer them to the target LLM through continual training.

To put it simply, the idea is to benefit from the strengths of all the LLMs and combine them into one integrated model.

Finds that the FuseLLM can improve the performance of the target model across a range of capabilities such as reasoning, common sense, and code generation.

By the way, you can also perform the fusion among fine-tuned LLMs that specialize in specific tasks.

This continues to be an interesting research area so hoping to document more on any new ideas and findings I come across.

over 2 years ago

Inferflow also comes with advanced 3.5-bit quantization and hybrid model partitioning for multi-GPU inference. https://t.co/jrHE3izItt

over 2 years ago

Thanks @_akhaliq for sharing our work! People receive language feedback rather than numerical scores. Can LLMs also take advantage of such insightful feedback?👇 paper: https://t.co/9RDdQPZMPo code: https://t.co/Bh85B8ds8h

over 2 years ago

Reasons to Reject? Aligning Language Models with Judgments

paper page: https://t.co/zz0CHfQBFa

As humans, we consistently engage in interactions with our peers and receive feedback in the form of natural language. This language feedback allows us to reflect on our actions, maintain appropriate behavior, and rectify our errors. The question arises naturally: can we use language feedback to align large language models (LLMs)? In contrast to previous research that aligns LLMs with reward or preference data, we present the first systematic exploration of alignment through the lens of language feedback (i.e., judgment). We commence with an in-depth investigation of potential methods that can be adapted for aligning LLMs with judgments, revealing that these methods are unable to fully capitalize on the judgments. To facilitate more effective utilization of judgments, we propose a novel framework, Contrastive Unlikelihood Training (CUT), that allows for fine-grained inappropriate content detection and correction based on judgments. Our offline alignment results show that, with merely 1317 off-the-shelf judgment data, CUT (LLaMA2-13b) can beat the 175B DaVinci003 and surpass the best baseline by 52.34 points on AlpacaEval. The online alignment results demonstrate that CUT can align LLMs (LLaMA2-chat-13b) in an iterative fashion using model-specific judgment data, with a steady performance improvement from 81.09 to 91.36 points on AlpacaEval. Our analysis further suggests that judgments exhibit greater potential than rewards for LLM alignment and warrant future research.

deng_cai retweeted

Aran Komatsuzaki

@arankomatsuzaki

over 2 years ago

GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation

abs: https://t.co/lmAu9j0WAk
repo: https://t.co/fJ85eYHMr7

deng_cai retweeted

Yixuan Su @yixuan_su

over 2 years ago

Our work is accepted to #NeurIPS2023. Huge yay to our awesome team @HuayangLi, @gmftbyGMFTBY, @fuzihaofzh, @deng_cai, Lemao, @nigelhcollier, @tarowatanabe, and me! Our paper and code can be found at: 1. Paper: https://t.co/upiLZU26WM 2. Code: https://t.co/AI5qdzbMy3 [n/n]

almost 3 years ago

I personally really like the following example. Our model is so heartwarming ❤️❤️❤️

almost 3 years ago

Check out our instruction-following model, which supports interleaved image-text inputs and outputs! 🌟https://t.co/2HzoSrg4tE🌟 Most importantly, it enables natural and engaging interactions in various realistic scenarios! No need to pretend you're blind in front of our assistant.🥳

almost 3 years ago

TextBind: Multi-turn Interleaved Multimodal Instruction-following

paper page: https://t.co/vQqxOMZ7bj

Large language models with instruction-following abilities have revolutionized the field of artificial intelligence. These models show exceptional generalizability to tackle various real-world tasks through their natural language interfaces. However, their performance heavily relies on high-quality exemplar data, which is often difficult to obtain. This challenge is further exacerbated when it comes to multimodal instruction following. We introduce TextBind, an almost annotation-free framework for empowering larger language models with the multi-turn interleaved multimodal instruction-following capabilities. Our approach requires only image-caption pairs and generates multi-turn multimodal instruction-response conversations from a language model. We release our dataset, model, and demo to foster future research in the area of multimodal instruction following.

almost 3 years ago

Our work exhibits these exciting features:
🚀 Comprehend, compare, and relate multiple images across a conversation
🚀 Generate vivid responses with interleaved image-text content
🚀 Spontaneously produce images based on context, with no need for explicit requests
🚀 Explore our demo!!!

almost 3 years ago

I really think this is super interesting! We do not pick the next token from a fixed, finite, standalone vocabulary. We recall our memories and experiences and choose the most suitable phrases in context.

almost 3 years ago

Copy Is All You Need

paper page: https://t.co/kDTsbUQhFr

The dominant text generation models compose the output by sequentially selecting words from a fixed vocabulary. In this paper, we formulate text generation as progressively copying text segments (e.g., words or phrases) from an existing text collection. We compute the contextualized representations of meaningful text segments and index them using efficient vector search toolkits. The task of text generation is then decomposed into a series of copy-and-paste operations: at each time step, we seek suitable text spans from the text collection rather than selecting from a standalone vocabulary. Experiments on the standard language modeling benchmark (WikiText-103) show that our approach achieves better generation quality according to both automatic and human evaluations. Besides, its inference efficiency is comparable to token-level autoregressive models thanks to the reduction of decoding steps. We also show that our approach allows for effective domain adaptation by simply switching to domain-specific text collection without extra training. Finally, we observe that our approach attains additional performance gains by simply scaling up to larger text collections, again without further training.
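
A toy sketch of the copy-and-paste decoding loop the abstract describes: index text spans from a collection, then at each step retrieve the span whose key best matches the current context and append it whole. This is not the paper's model. It assumes a bag-of-words count vector as a stand-in for the learned contextualized representations and brute-force inner-product search in place of a vector search toolkit; `build_index`, `copy_generate`, `k`, and `max_len` are hypothetical names.

```python
from collections import Counter
from typing import List, Tuple

Span = Tuple[str, ...]


def bow(words: List[str]) -> Counter:
    """Toy representation: bag-of-words counts (a stand-in for a neural encoder)."""
    return Counter(w.lower() for w in words)


def dot(a: Counter, b: Counter) -> int:
    # Inner product of two sparse count vectors.
    return sum(count * b[word] for word, count in a.items())


def build_index(collection: List[str], k: int = 2,
                max_len: int = 4) -> List[Tuple[Counter, Span]]:
    """Index every span of 1..max_len words, keyed by the k words that precede it."""
    index: List[Tuple[Counter, Span]] = []
    for doc in collection:
        words = doc.split()
        for start in range(1, len(words)):
            key = bow(words[max(0, start - k):start])
            for length in range(1, max_len + 1):
                if start + length > len(words):
                    break
                index.append((key, tuple(words[start:start + length])))
    return index


def copy_generate(prefix: str, index: List[Tuple[Counter, Span]],
                  k: int = 2, steps: int = 3) -> Tuple[List[str], List[Span]]:
    """Each decoding step copies a whole span from the collection, not one token."""
    out = prefix.split()
    copied: List[Span] = []
    for _ in range(steps):
        query = bow(out[-k:])
        # Brute-force maximum inner-product search; on ties, prefer the longer span.
        key, span = max(index, key=lambda item: (dot(query, item[0]), len(item[1])))
        if dot(query, key) == 0:
            break  # nothing in the collection matches the current context
        out.extend(span)
        copied.append(span)
    return out, copied


collection = ["the cat sat on the mat", "a dog sat on the rug"]
tokens, spans = copy_generate("the cat", build_index(collection))
print(" ".join(tokens))
print(spans)
```

Because one step can emit several words, the toy needs fewer decoding steps than it generates tokens, which mirrors the abstract's efficiency argument; replacing `collection` with domain text changes what gets copied without any training.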
