Jing Xiong @_June1126 - Twitter Profile

6 months ago

@FengzhuoZhang @YuanC233 @Yunlong_Hou_ @ducx_du @duchao0726 @TianyuPang1 @axsun @zhuoran_yang Congratulations on your paper! I would also like to share our related work for your reference: DoPE: Denoising Rotary Position Embedding (https://t.co/ZzrWrcbLbw).

0

5

Jing Xiong @_June1126

6 months ago

@PetarV_93 @FengzhuoZhang @YuanC233 @Yunlong_Hou_ @ducx_du @duchao0726 @TianyuPang1 @axsun @zhuoran_yang @Guangxuan_Xiao @songhan_mit @iofu728 @chengruidong @fedzbar I agree with your point. We have a related paper: the conical constraint induced by the low-frequency components of RoPE has already been studied in DoPE: Denoising Rotary Position Embedding (https://t.co/ZzrWrcbLbw). I would like to know how the authors view this paper.

0

3

Jing Xiong @_June1126

6 months ago

@FengzhuoZhang @YuanC233 @Yunlong_Hou_ @ducx_du @duchao0726 @TianyuPang1 @axsun @zhuoran_yang @Guangxuan_Xiao @songhan_mit @iofu728 @chengruidong @fedzbar @PetarV_93 The conical constraint induced by the low-frequency components of RoPE has already been studied in DoPE: Denoising Rotary Position Embedding (https://t.co/ZzrWrcbLbw). May I ask for your thoughts on this?

0

13

Jing Xiong @_June1126

6 months ago

@FengzhuoZhang The conical constraint induced by the low-frequency components of RoPE has already been studied in DoPE: Denoising Rotary Position Embedding (https://t.co/ZzrWrcbLbw). May I ask for your thoughts on this?

0

39

_June1126 retweeted

Zhijiang Guo @ZhijiangG

8 months ago

🤗Will present our #EMNLP2025 paper this morning! TLDR: Beyond KV Cache: New Insights on LLM Sparsity. This paper offers not just an efficient inference framework, but a new theoretical lens to understand how information flows inside LLMs. Come & talk to us if you are interested!

0

24

5

4

1K

_June1126 retweeted

Sumit @_reachsumit

11 months ago

CTR-Sink: Attention Sink for Language Models in Click-Through Rate Prediction
Introduces behavior-level attention sinks to address semantic fragmentation in LM-based CTR prediction by inserting recommendation-specific tokens between user behaviors.
https://t.co/UHLoK6aPDd

0

3

1

2

498

_June1126 retweeted

Sumit @_reachsumit

almost 2 years ago

UncertaintyRAG: Span-Level Uncertainty Enhanced Long-Context Modeling for Retrieval-Augmented Generation
Uses SNR-based span uncertainty for improved chunk similarity estimation, offering flexible integration across various LLMs without fine-tuning.
📝https://t.co/4Okp2IezIs

0

5

4

1K

Jing Xiong @_June1126

about 1 year ago

#ICML2025 #ParallelComp #Long-context #Length Extrapolation #Memory-bound #Efficient-inference #KV cache Compression #128K Token

0

86

Jing Xiong @_June1126

about 1 year ago

🔬 The HKU team presents ParallelComp: a training-free technique for efficient context length extrapolation in LLMs—from 8K up to 128K tokens—on a single A100 GPU, with minimal performance loss. 📄 Paper: https://t.co/HbKsGN0eqX 💻 Code: https://t.co/T2Au0WEGY1

5

14

8

3

1K

Jing Xiong @_June1126

about 1 year ago

🚀 Our 8B LLM achieves 91.17% of GPT-4's performance on ultra-long context reasoning, surpassing formidable models such as Claude-2 and Kimi-Chat—all with only 8K context training.

0

92

Jing Xiong @_June1126

about 1 year ago

🧠 A key contribution is our theoretical and empirical analysis of attention bias under parallel attention. We uncover how and why attention sinks emerge and provide effective calibration strategies.

0

85

Jing Xiong @_June1126

about 1 year ago

🔍 We tackle memory limitations in length extrapolation by introducing parallel attention, KV cache compression, and chunk eviction strategies that break the GPU memory bottleneck, all without any retraining.

0

88

Jing Xiong @_June1126

about 1 year ago

Our paper has been accepted to ICML 2025! 🎉 📢 In this paper, we propose ParallelComp, a training-free method to enable LLMs to extrapolate context length from 8K up to 128K tokens on a single A100 GPU, with minimal performance loss.

0

104

_June1126 retweeted

Hui Shen @HuiShen_umich

about 1 year ago

📷 New Benchmark Release: PhyX - Physical Reasoning for Multimodal Models 👉 Project Page: https://t.co/cKe4lehSsz 👉 Github: https://t.co/4ZtvWoBsaZ 👉 arXiv: https://t.co/IsxqqtZ1uL 👉 Huggingface Dataset: https://t.co/SU589GaF15

9

11

5

1

1K

_June1126 retweeted

Chunlin Tian @clin_tian

almost 2 years ago

🔥Thrilled to announce our Oral acceptance at #NeurIPS2024! 🚀HydraLoRA, an asymmetric LoRA architecture with a shared A matrix for common knowledge and multiple B matrices for specialized adaptations, enhancing model performance while maximizing efficiency with a reduced param.

8

48

15

9

6K

_June1126 retweeted

yuxuan_yy @cerana99x

almost 2 years ago

🌟Excited to share LeCo's acceptance at #COLM2024! 🤔Fed up with LLMs' self-correct struggles and endless prompts? 🪄LeCo uses logits for confidence scores, skipping tedious prompts and rethinking from the last correct step. 📖:https://t.co/RMh6f1qKEe 💻:https://t.co/EollTSJBZq

6

24

14

3

2K

_June1126 retweeted

Zhijiang Guo @ZhijiangG

about 2 years ago

👋LLMs work quite well at modeling/understanding long context. What about generating long content? 🤔 Check out our ACL paper ProxyQA for evaluating long-form generation (way longer🪘🪘). 📝Paper: https://t.co/5HsVjNf17T 🐙Code: https://t.co/KhSXLin2yU

0

22

9

3

3K

_June1126 retweeted

Yinhong Liu @YinhongLiu2

about 2 years ago

🔥New paper!📜 Struggling to align LLM evaluators with human judgements?🤔 Introducing PairS🌟: By exploiting transitivity, we push the potential of pairwise preference in efficient ranking evaluations with better alignment!🧑‍⚖️ 📖https://t.co/W4wSHQqdYc 💻https://t.co/q5ZMGkvaaj

2

36

10

18

10K

_June1126 retweeted

DiscreteSpace @space_discrete

over 2 years ago

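Came across a paper, [ICLR'24] Understanding Addition in Transformers, and it reminded me of a problem my teacher posed back when I was first learning OI (competitive programming): how do you add large integers directly from left to right, without reading the whole number first and then reversing it? I thought for ten minutes, came up with an approach that keeps track of 9s and carry chains, and got praised by my teacher. At the time I felt the problem had no practical significance. I did not expect that LLMs, because they must write from left to right, would arrive at the same algorithm.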

6

111

14

32

19K
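The left-to-right addition mentioned in the post above can be sketched as follows. The post does not spell out the method, so this is only one plausible reading of "keeping track of 9s and carry chains": hold back the last digit that a later carry could still change, and count the run of 9s after it. The function name `add_left_to_right` and the zero-padding of the shorter input are assumptions made for illustration.

```python
def add_left_to_right(a: str, b: str) -> str:
    """Add two non-negative decimal strings, scanning most significant digit first.

    Illustrative sketch only; not taken from the post or the cited paper.
    """
    # Assumption: the shorter number is left-padded with zeros so digits align.
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    out = []      # digits that are already final
    pending = 0   # last digit that a later carry could still raise by one
    nines = 0     # length of the run of 9s that follows `pending`

    for da, db in zip(a, b):
        s = int(da) + int(db)
        if s == 9:
            # A 9 passes a carry through unchanged, so extend the chain.
            nines += 1
        elif s < 9:
            # No carry can reach the held digits: emit them as they are.
            out.append(str(pending))
            out.append("9" * nines)
            pending, nines = s, 0
        else:
            # A carry ripples through the chain: 9s become 0s, pending gains 1.
            out.append(str(pending + 1))
            out.append("0" * nines)
            pending, nines = s - 10, 0

    # End of input: nothing further can carry, so flush what is held.
    out.append(str(pending))
    out.append("9" * nines)
    return "".join(out).lstrip("0") or "0"
```

The held digit is always at most 8, so adding one to it never overflows; that is why a single pending digit plus a counter of 9s is enough state to emit the result in reading order.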

_June1126 retweeted

Jing Xiong @_June1126

over 2 years ago

Excited to announce our paper's acceptance at ICLR 2024! 🌟 Our algorithm leverages CoT for enhanced in-context exemplar selection, optimizing it with a unique mapping function. Dive in: 📄[Paper](https://t.co/7LaZZwPlT1) 💻[Code](https://t.co/tSjPJuZaB2) #ICLR2024 #AI #NLP #LLMs

1

7

4

0

1K
