PhD student in Computer Science at Case Western Reserve University. MS in Electrical Engineering at USC. Interested in large language model compression.
📄 (i) “Quantized but Deceptive? A Multi-Dimensional Truthfulness Evaluation of Quantized LLMs” (Main Conference) https://t.co/eSeGvhdX3E
📄 (ii) “Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs” (Findings) https://t.co/6vWHrTXe1A
🚀 Thrilled to share my two recent first-author papers accepted at #EMNLP2025!
💡 Although I’m unable to travel to #Suzhou in person to meet the amazing NLP researchers attending #EMNLP2025, I’d love to connect virtually and exchange ideas!
Beautiful Paper.
A comprehensive survey of post-training methods including fine-tuning, reinforcement learning, and test-time scaling to refine LLM reasoning.
Methods Explored in this Paper 🔧:
→ Systematically explores fine-tuning techniques that adapt LLMs for specific tasks, but acknowledges risks of overfitting and forgetting.
→ Reinforcement Learning from Human Feedback is examined for aligning LLMs with human preferences and improving response quality.
→ Test-time scaling strategies, such as chain of thought prompting and tree of thought, are discussed to enhance reasoning during inference without retraining, by dynamically adjusting computation based on query complexity.
→ The paper also investigates reward modeling, policy optimization algorithms like Proximal Policy Optimization and Direct Preference Optimization, and efficient fine-tuning approaches to optimize LLMs post-training.
Is encouraging LLMs to reason through a task always beneficial?🤔
NO🛑- inspired by when verbal thinking makes humans worse at tasks, we predict when CoT impairs LLMs & find 3 types of failure cases.
In one, OpenAI o1-preview accuracy drops 36.3% compared to GPT-4o zero-shot!😱
@sebgehr Dear Sebastian, I am interested in the Summer 2025 Internship @TechAtBloomberg and have already filled out the form! I am a third-year CS PhD candidate at Case Western Reserve University focusing on compressing LLMs. This is my website: https://t.co/dBhNJfZE7M. Thank you!
Excited to share our 5 papers accepted to #ICML2024 @icmlconf
Happy to discuss in detail if you are interested, feel free to DM me :)
1⃣ On the Possibilities of AI-generated Text Detection
https://t.co/jgvk0w1Ru2
CS159: LLMs for reasoning lecture slides from Caltech are really good. Link: https://t.co/cqQrAHa4Kg
Thank you for making them public @yisongyue and @acbuller
A Categorical Archive of ChatGPT Failures
Comprehensive analysis of ChatGPT failures for categories like reasoning, factual errors, maths, and coding.
If you are developing with LLMs it's important to know these failures. Good to see them documented.
https://t.co/lSrIrfuWcz
MIT researchers found that massive neural nets (e.g. large language models) are capable of storing and simulating other neural networks inside their hidden layers, which enables LLMs to adapt to a new task without external training: https://t.co/sValGb5S0S
1/Large language models like Galactica and ChatGPT can spout nonsense in a confident, authoritative tone. This overconfidence - which reflects the data they’re trained on - makes them more likely to mislead.