Haozheng LUO

about 2 months ago

🚀 Launching Growing LLMs Beyond Boundaries (GLB), a non-profit initiative to push LLMs beyond task-specific limits and into real-world impact. Goal: move from scaling → boundary-expanding systems (multimodal, adaptive, real-world). Join us: https://t.co/RLz8qgbi8V

0

20

robinluo1997 retweeted

CorticesAI @CorticesCode

2 months ago

Cortices is mission control for AI agents, with support for Claude Code and Codex. Bring agents across laptops, servers, and workstations into one place. Chat with agents, review diffs, and schedule work from any browser or phone. Free, no waitlist: https://t.co/VGBRpWmbsq

1

3

1

24

robinluo1997 retweeted

3 months ago

🚨 New paper alert !!

🎥 Video VLMs are strong at high-level semantics and long-range temporal understanding.

🧠 JEPA is almost the opposite: better at dense, high-frequency dynamics, local physical consistency, and fast corrective control, but less suited to rich semantic reasoning and long-horizon reasoning.

We try to get the best of both:
🧩 A VLM as a cortex-like reasoner for semantics and long-horizon planning
⚡ A JEPA branch as a cerebellum-like controller for fine-grained dynamics, physical consistency, and rapid corrections

Proudly, we present ThinkJEPA: a VLM-guided latent world model that FiLM-fuses the pyramid representations of VLMs, which encode long-horizon semantic reasoning, into the JEPA representation for fine-grained, physically consistent dynamics prediction.

🔗 Project: https://t.co/quro6Pf8un
📄 Paper: https://t.co/yO5rv3ZJT7

7

357

67

237

18K

3 months ago

We introduce TIDES, the first work showing that test-time scaling can be adversarially exploited. By injecting small latent perturbations (DLT), we induce inference drift that amplifies with reasoning depth, causing longer reasoning traces to degrade performance.
#safety https://t.co/Uxf6gGL8VS

0

19

3 months ago

🧠 Make models reason safely, not just respond safely
We introduce CRAFT (Contrastive Reasoning Alignment) to address a key gap: current safety methods mostly constrain outputs, while reasoning trajectories remain unaligned. https://t.co/i6Kf2p90JV

0

10

robinluo1997 retweeted

Hokin Deng

@DengHokin

4 months ago

#VideoReason We are open-sourcing the entire VBVR stack to speed up the arrival of video reasoning as the next fundamental paradigm of intelligence:
- 150+ synthetic generators
- 1 million training clips
- Cloud-scale data factory
- Unified EvalKit
- 100 rule-based evaluators
- Strong baseline model

Check it out at https://t.co/lOtJzJYC52

19

225

66

134

53K

robinluo1997 retweeted

4 months ago

New paper alert 🔔‼️🚨✨

Super excited to share our latest work: Less is Enough: Synthesizing Diverse Data in Feature Space of LLMs (arXiv:2602.10388) 🎉📄🧠

If you’re doing SFT / alignment / RM and “more data” feels like roulette… this is for you 🎰😵‍💫🫠

The annoying reality we show in this work is:
📚⬆️ bigger dataset ≠ 📈⬆️ better model ❌

Instead, it depends on whether your training data covers the abilities the model is currently missing 🧩🕳️✅

So the question isn’t “how much data?” 🤷‍♂️📦
It’s: what capability gaps exist, and does my data fill them? 🕳️➡️🧱🔦⚙️✨

4

5

3

1

226

robinluo1997 retweeted

4 months ago

🚨 Announcing the 2nd Workshop on Multi-Modal Reasoning for Agentic Intelligence at #CVPR2026 @CVPR @CVPRConf.

🚀 Call for Papers!!!
We accept regular papers limited to eight pages in the CVPR style.

Timeline:
📅 Submission open: February 1st, 2026, 23:59 AoE
📣 Submission deadline: March 15th, 2026, 23:59 AoE

Submit now: https://t.co/2LEDAV1Yzh

🌐 Learn more at our Website: https://t.co/aOPc7GXA9G

Kudos to our amazing co-organizers @9LdROhjZE56jSh9 , @zixianma02 , Anda Epure, @kamath_amita , @MahtabBg , @alexttoshev , @RanjayKrishna , @lucy_x_shi , @_anniechen_ , @philiptorr

2

41

17

11

13K

5 months ago

❄️ FROST: Filtering Reasoning Outliers with Attention

FROST filters reasoning outliers inside attention, reducing redundant tokens while preserving accuracy.

🧠 Reasoning Outlier
📉 Shorter, cleaner reasoning

Efficient reasoning beats overthinking.

#ICLR2026 #LLM #Reasoning https://t.co/kzI0k0Hofx

0

29

robinluo1997 retweeted

5 months ago

🚨 Announcing our #ICLR2026 Workshop, The First Workshop on Efficient Spatial Reasoning (ES-Reasoning) @iclr_conf

🚀 Call for Papers!!!
We accept Regular (9 pages) and Tiny (4 pages) papers!!

Timeline:
📅 Deadline: 2026/02/09, 11:59PM AoE
📣 Acceptance notification: 2026/03/01

🌐 Learn more at our Website: https://t.co/H4lpdiFhPQ

Kudos to our amazing co-organizers @zhijianliu_ @ManlingLi_ @robinluo1997 @SeerePan @Besteuler @ruiyang_qin @ucsd_cse @UCSDJacobs @HDSIUCSD @CUHKofficial @VillanovaU @NorthwesternU

6

59

17

18

16K

6 months ago

@dogacel0 Good

0

1

0

185

7 months ago

Looking forward to #NeurIPS2025! https://t.co/7F57jZ5Q8J

0

31

8 months ago

Thrilled to be selected as a DAAD AINeT Fellow for Postdoc-NeT-AI 11/2025 on Explainable AI, supported by the German Federal Ministry of Research. Excited to contribute to trustworthy and transparent AI research! Thanks to my advisors Yan Chen & Han Liu #AI #NU @northwesterncs

0

1

0

125

9 months ago

🚀 RHYTHM (NeurIPS’25): Hierarchical temporal tokenization + frozen LLMs for human mobility. Captures multi-scale periodicity & long-range deps. 📊 +2.4% acc, +5.0% weekends, −24.6% training time vs SOTA. 🔗 https://t.co/Z1re9TpIY2 #NeurIPS2025 #AI #Mobility

0

2

0

85

10 months ago

@cwolferesearch Attention Sink and our work are contemporaneous, with one applying Softmax1 in the deployment stage and the other in the training stage. So, strictly speaking, they are the two main extensions of Softmax1.

0

5

10 months ago

@cwolferesearch I am the author of OutEffHop and a researcher at Northwestern University. I believe gpt-oss also uses our technology in its training. The paper link is below: https://t.co/yV9UqSY8Ki.

1

0

23

10 months ago

Attention Sink and our work are contemporaneous, with one applying Softmax1 in the deployment stage and the other in the training stage. So, strictly speaking, they are the two main extensions of Softmax1.

0

26

10 months ago

We are pleased to see GPT-oss adopting the one-off technique from Attention and our related work, Attention Sink. This is a good opportunity to highlight our two ICML papers on OutEffHop, which focus on optimizing the training process to enhance both LoRA and quantization.

Cameron R. Wolfe, Ph.D.

@cwolferesearch

10 months ago

The gpt-oss models from OpenAI are a synthesis of ideas from prior research. Here are 10 interesting papers that were directly used in gpt-oss…

(1) Longformer: Introduces sliding window attention, a form of sparse attention that is utilized in alternating layers of both gpt-oss models.

(2) StreamingLLM: Describes the concept of attention sinks in large language models (LLMs)—these are tokens within a sequence that the model assigns high attention or weight to, simply because the softmax operation prevents the model from assigning attention to no tokens at all.

(3) Off-by-one attention: Proposes a solution to attention sinks by allowing the attention mechanism to assign no attention to any token. This is achieved by adding a bias term of 1 to the denominator of the softmax operation within attention. In gpt-oss models, a similar approach is used, but the bias term is learned rather than fixed at 1.

(4) Switch Transformer: Presents several ideas foundational to modern mixture-of-experts (MoE) based LLMs. It’s important to note that many other papers, in addition to Switch Transformer, have contributed to this field.

(5) RMSNorm: A streamlined variant of layer normalization that is both more efficient and has fewer trainable parameters. Both gpt-oss models employ RMSNorm.

(6) RoPE: Stands for Rotary Positional Encoding, a hybrid absolute/relative positional encoding method used by gpt-oss models. RoPE encodes absolute position using a rotation matrix and incorporates relative position information directly into the self-attention mechanism.

(7) YaRN: A method for extending the context window in LLMs, which is adopted by gpt-oss models. YaRN works by adjusting the frequency basis used within RoPE and further training the LLM to handle longer contexts.

(8) Flash Attention: Utilized by gpt-oss models, flash attention leverages system-level optimizations to significantly improve the computational and memory efficiency of the attention operation.

(9) DeepSeek-R1: While the specific reasoning or reinforcement learning (RL) training strategies used by gpt-oss models are not fully detailed, the DeepSeek-R1 technical report offers a comprehensive overview of how RL training with verifiable rewards is implemented at scale.

(10) Deliberative alignment: This is the safety training approach used by gpt-oss models, designed to teach the models how to reason through safety specifications and determine when it is appropriate to refuse a request.
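
Item (3) above describes adding a bias term to the softmax denominator so that attention can assign (almost) no weight to any token. A minimal sketch of that idea in plain Python follows. The function names and the `bias` parameter are illustrative assumptions, not code from gpt-oss or from any of the cited papers; `bias=1.0` corresponds to the fixed off-by-one variant, while a learned value would play the role the thread attributes to gpt-oss.

```python
import math

def softmax(scores):
    """Standard softmax: the weights always sum to 1."""
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_off_by_one(scores, bias=1.0):
    """Softmax with an extra `bias` term in the denominator (hypothetical helper).

    Computes exp(x_i) / (bias + sum_j exp(x_j)), so the weights can sum
    to less than 1 when every score is strongly negative.
    """
    m = max(max(scores), 0.0)  # shift for stability; the bias term is rescaled below
    exps = [math.exp(s - m) for s in scores]
    total = bias * math.exp(-m) + sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    scores = [-10.0, -10.0, -10.0]  # no token is relevant
    print(sum(softmax(scores)))             # standard softmax is forced to spread weight
    print(sum(softmax_off_by_one(scores)))  # off-by-one variant can stay near zero
```

With uniformly low scores, the standard softmax still distributes a total weight of 1 across the tokens, whereas the variant with the extra denominator term lets the total shrink toward zero, which is the behaviour the thread describes as removing the need for an attention sink.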

7

409

88

404

28K

1

0

1

88