Alan Li @alanli2020 - Twitter Profile

Pinned Tweet

Alan Li @alanli2020

almost 2 years ago

Thank you @kotoba_tech and special thanks to @jungokasai and @noriyuki_kojima! Wonderful and rewarding experience in Tokyo for the summer, surrounded by such a passionate team of talented engineers. Always excited about Kotoba's next release and look forward to keeping in touch!

Kotoba

@kotoba_tech

almost 2 years ago

Kotoba's former intern, Alan Li (@alanli2020), is starting his CS PhD at @Yale. Best of luck on your PhD journey, and we'll stay in touch!

alanli2020 retweeted

Gabrielle Kaili-May Liu @pybeebee

20 days ago

🔥 Excited to share my new preprint: Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence? 🔥

When an LLM says "I think" or "probably," does it actually mean something consistent internally? The answer: not really 😬

Check out details in 🧵(1/n): https://t.co/nqa4rTNVM3

alanli2020 retweeted

Arman Cohan

@armancohan

4 months ago

Check out our #ICLR2026 paper on how strong references can especially help improve LLMs on non-verifiable tasks.

alanli2020 retweeted

Keisuke Kamahori

@KeisukeKamahori

8 months ago

I will be attending #EMNLP2025 this week to present LiteASR, a compression method for speech encoders (a collaborative work with @kotoba_tech).

Catch our poster at the first poster session on Wednesday morning. Happy to chat about efficiency, speech, or both! https://t.co/WRs9oeIjqT

alanli2020 retweeted

Ed Li

@t_ed_li

8 months ago

As PhD students, we believe research automation systems should belong to everyone, not just Google, so we built freephdlabor. Customize your multi-agent system for end-to-end research that WORKS FOR YOUR DOMAIN within hours. full source code: https://t.co/NkiFnLwqVM

alanli2020 retweeted

Yilun Zhao

@YilunZhao_NLP

8 months ago

If you are at #ICCV2025, the Knowledge-Intensive Multimodal Reasoning Workshop is about to start in Room 313 C!

Alan Li @alanli2020

8 months ago

@NVIDIAGeForce GeForce Day

Alan Li @alanli2020

9 months ago

@HanSineng @YaleEngineering @Stanford Big congrats! 🎉🎊

Alan Li @alanli2020

10 months ago

Love the thread, thank you Rohan!

Rohan Paul

@rohanpaul_ai

10 months ago

New Harvard+Yale paper says, strong reasoning helps, but accessing the right knowledge first is what really limits performance.

So knowledge recall is the main bottleneck in scientific problem solving with LLMs.

They build benchmark suites SCIREAS and SCIREAS‑PRO to measure scientific reasoning end to end

They also release a simple 8B baseline for science tasks that benefits from a math+STEM mix.

🎯 The problem

Scientific questions need 2 things at once, solid domain knowledge and multi‑step reasoning, but most tests only hit one side or lock into one format.

There was no unified way to score science reasoning across domains, and almost no clean way to tell whether a model failed because it lacked a fact or because it could not reason with the fact.

Alan Li @alanli2020

10 months ago

9/9 Thank you to all collaborators! @YixinLiu17 @arpsark @_DougDowney @armancohan

Alan Li @alanli2020

10 months ago

1/9 🚀 New paper: Demystifying Scientific Problem-Solving in LLMs. How does reasoning enhancement affect knowledge recall, and do LLMs benefit from external knowledge complementary to reasoning?

TL;DR:
📊 SciReas: holistic and efficient evaluation suite for scientific reasoning
🧠 KRUX: a novel framework to study knowledge vs reasoning in LLMs
🔑 Findings: knowledge is a bottleneck; reasoners + in-context knowledge help; long CoT helps knowledge recall/utilization

Alan Li @alanli2020

10 months ago

8/9 This work is a collaboration between YaleNLP @yalenlp and Ai2 @allen_ai . Code/benchmark 📈 https://t.co/uCVKwpXhvl. Paper: 📄 https://t.co/XP8011DqsU Models: 🤗 https://t.co/Qm9SzmBh4p

Alan Li @alanli2020

10 months ago

Update: It’s happening at 2pm! Exciting journey, come and join us!

Asaf Yehudai

@AsafYehudai

10 months ago

Today at 4 PM, we’re presenting our tutorial: “Evaluating LLM-based Agents: Foundations, Best Practices, & Open Challenges” If you’re in Montreal for @IJCAIconf, come join us to dive into the future of #AgentEvaluation! 🇨🇦🤖 w. @RoyBarHaim @LilachEdel and @alanli2020

alanli2020 retweeted

Arman Cohan

@armancohan

12 months ago

Excited for the release of SciArena with @allen_ai! LLMs are now an integral part of research workflows, and SciArena helps measure progress on scientific literature tasks. Also check out the preprint for a lot more results/analyses. Led by: @YilunZhao_NLP, @kaiyan_z 📄 paper: https://t.co/BW09ssX5Ig This was a massive team effort and we're thrilled to finally share it!

alanli2020 retweeted

Sophia S. Han

@HanSineng

12 months ago

Excited to see more investigation into LLM creativity. We have some pioneering work on this topic as well: Creativity or Brute Force? Using Brainteasers as a Window into the Problem-Solving Abilities of Large Language Models. https://t.co/QNyQp1Zs80.
