@FengzhuoZhang The conical constraint induced by the low-frequency components of RoPE has already been studied in DoPE: Denoising Rotary Position Embedding (https://t.co/ZzrWrcbLbw
). May I ask for your thoughts on this?
🤗Will present our #EMNLP2025 paper this morning! TLDR: Beyond KV Cache: New Insights on LLM Sparsity.
This paper offers not just an efficient inference framework, but a new theoretical lens to understand how information flows inside LLMs.
Come & talk to us if you are interested!
CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction
Introduces behavior-level attention sinks to address semantic fragmentation in LM-based CTR prediction by inserting recommendation-specific tokens between user behaviors.
��https://t.co/UHLoK6aPDd
🔬 The HKU team presents ParallelComp: a training-free technique for efficient context length extrapolation in LLMs—from 8K up to 128K tokens—on a single A100 GPU, with minimal performance loss.
📄 Paper: https://t.co/HbKsGN0eqX
💻 Code: https://t.co/T2Au0WEGY1
🚀 Our 8B LLM achieves 91.17% of GPT-4's performance on ultra-long context reasoning, surpassing formidable models such as Claude-2 and Kimi-Chat—all with only 8K context training.
🧠 A key contribution is our theoretical and empirical analysis of attention bias under parallel attention. We uncover how and why attention sinks emerge and provide effective calibration strategies.
🔍 We tackle memory limitations in length extrapolation by introducing parallel attention, KV cache compression, and chunk eviction strategies that break the GPU memory bottleneck—without any retraining required.
Our paper has been accepted to ICML 2025! 🎉
📢 In this paper, we propose ParallelComp, a training-free method to enable LLMs to extrapolate context length from 8K up to 128K tokens on a single A100 GPU, with minimal performance loss.
🔥Thrilled to announce our Oral acceptance at #NeurIPS2024! 🚀HydraLoRA, an asymmetric LoRA architecture with a shared A matrix for common knowledge and multiple B matrices for specialized adaptations, enhancing model performance while maximizing efficiency with a reduced param.
🌟Excited to share LeCo's acceptance at #COLM2024!
🤔Fed up with LLMs' self-correct struggles and endless prompts?
🪄LeCo uses logits for confidence scores, skipping tedious prompts and rethinking from the last correct step.
📖:https://t.co/RMh6f1qKEe
💻:https://t.co/EollTSJBZq
+👋LLMs work quite well on modeling/understanding long context.
What about generating long content 🤔
Check our ACL paper ProxyQA for evaluating Long-Form Generation (way longer🪘🪘
📝Paper: https://t.co/5HsVjNf17T
🐙Code: https://t.co/KhSXLin2yU
🔥New paper!📜
Struggle to align LLM evaluators with human judgements?🤔
Introducing PairS🌟: By exploiting transitivity, we push the potential of pairwise preference in efficient ranking evaluations that has better alignment!🧑⚖️
📖https://t.co/W4wSHQqdYc
💻https://t.co/q5ZMGkvaaj