Liyan Tang

Manya Wadhwa @ManyaWadhwa1

3 months ago

Check out Manya's benchmark for LLM creativity! Inspired by work on creativity in graphs (@AdtRaghunathan's "roll the dice" paper), CREATE isolates testing of creative insights for discovery. Future: understand how LLMs derive insights & how they can be better creative partners!

LiyanTang4 retweeted

Manya Wadhwa @ManyaWadhwa1

3 months ago

⚛️ Introducing CREATE, a benchmark for creative associative reasoning in LLMs.

Making novel, meaningful connections is key for scientific & creative works.

We objectively measure how well LLMs can do this. 🧵👇 https://t.co/Hf7005q4om

LiyanTang4 retweeted

Wenxuan Ding @Wenxuan_Ding_

4 months ago

Agents interact with environments to gather information. But exploration can be expensive.
Tool use, retrieval, and user interaction carry latency or monetary cost.

Calibrate-Then-Act allows LLM agents to balance exploration with cost:
📐 Estimate uncertainty about the environment
💭 Reason about cost-uncertainty tradeoffs
⚙️ Act accordingly

LiyanTang4 retweeted

Greg Durrett @gregd_nlp

6 months ago

I'm at NeurIPS until Friday! This morning, catch:

@LiyanTang4 presenting ChartMuseum, testing if VLMs can do visual reasoning over charts
@sebajoed presenting AstroVisBench, testing if coding LLMs can work with real astro data workflows

& link in thread if you want to meet! https://t.co/aNZPRCukbD

LiyanTang4 retweeted

Greg Durrett @gregd_nlp

6 months ago

📢 Postdoc position 📢

I’m recruiting a postdoc for my lab at NYU! Topics include LM reasoning, creativity, limitations of scaling, AI for science, & more! Apply by Feb 1.

(Different from NYU Faculty Fellows, which are also great but less connected to my lab.)

Link in 🧵 https://t.co/0mEmJWnWG7

9 months ago

ChartMuseum leaderboard: https://t.co/KMoXbdFPPg
GitHub Repo: https://t.co/RbmZZLjpzB
Paper: https://t.co/88PuGbZKYc

9 months ago

Our paper "ChartMuseum 🖼️" has been accepted to the #NeurIPS2025 Datasets and Benchmarks Track!

Even the latest models, such as GPT-5 and Gemini-2.5-Pro, still struggle with challenging 📉 chart understanding questions, especially those that involve visual reasoning 👀! https://t.co/ibmlJLp5WZ

about 1 year ago

Introducing ChartMuseum🖼️, testing visual reasoning with diverse real-world charts!

✍🏻Entirely human-written questions by 13 CS researchers
👀Emphasis on visual reasoning, which is hard to verbalize via text CoTs
📉Humans reach 93%, versus 63% for Gemini-2.5-Pro and 38% for Qwen2.5-72B https://t.co/y62q9nnVNS

LiyanTang4 retweeted

Greg Durrett @gregd_nlp

10 months ago

📢I'm joining NYU (Courant CS + Center for Data Science) starting this fall!

I’m excited to connect with new NYU colleagues and keep working on LLM reasoning, reliability, coding, creativity, and more!

I’m also looking to build connections in the NYC area more broadly. Please reach out if you're interested in chatting!

This move comes after 8 years working with incredible students and collaborators at UT Austin. Thank you to everyone who supported me in my first academic appointment; I look forward to continuing our collaborations but I will miss you! (and the breakfast tacos!)

LiyanTang4 retweeted

Leo Liu @ZEYULIU10

12 months ago

LLMs trained to memorize new facts can’t use those facts well.🤔

We apply a hypernetwork to ✏️edit✏️ the gradients for fact propagation, improving accuracy by 2x on a challenging subset of RippleEdit!💡

Our approach, PropMEND, extends MEND with a new objective for propagation. https://t.co/30R3VEYE1u

LiyanTang4 retweeted

Xi Ye

@xiye_nlp

12 months ago

🤔 Recent mech interp work showed that retrieval heads can explain some long-context behavior. But can we use this insight for retrieval?
📣 Introducing QRHeads (query-focused retrieval heads) that enhance retrieval

Main contributions:
🔍 Better head detection: we find a different and more useful set of heads vs original retrieval head
📊Practical utility: a general-purpose retriever for long-context reasoning and re-ranking

LiyanTang4 retweeted

Fangcong Yin @fangcong_y10593

about 1 year ago

Solving complex problems with CoT requires combining different skills.

We can do this by:
🧩Modifying the CoT data format to be “composable” with other skills
🔥Training models on each skill
📌Combining those models

This leads to better 0-shot reasoning on tasks involving skill composition! https://t.co/R3TpULp4XF

LiyanTang4 retweeted

Puyuan Peng @PuyuanPeng

about 1 year ago

The paper is out! https://t.co/GikR01dy5S

LiyanTang4 retweeted

about 1 year ago

Check out ChartMuseum from @LiyanTang4, @_grace_kim, and many other collaborators from UT! Chart questions take us beyond current benchmarks for math/multi-hop QA/etc., which CoT is very good at, to *visual reasoning*, which is hard to express with text CoT!

about 1 year ago

Thanks to the awesome team at UT TAUR lab! @_grace_kim, @lucy_xyzhao, @thomlake, @Wenxuan_Ding_, @fangcong_y10593, @prasann_singhal, @ManyaWadhwa1, @ZEYULIU10, @ZayneSprague, @ramya_namuduri, @BodunHu, @juand_r_nlp, @PuyuanPeng, @gregd_nlp
