Liam Liang Ding @liangdingNLP - Twitter Profile

13 days ago

@wzenus @YejinChoinka @jiajunwu_cs @ManlingLi_ @LINJIEFUN @chi_gui_1 @DeimosGN @qineng_wang @James_KKW @shiqi_chen17 @zhengyuan_yang Congrats 🎊

1

0

255

Liam Liang Ding

@liangdingNLP

20 days ago

6/6) Call to Action: Current LLMs still leave substantial headroom on IndustryBench (top score is 2.083/3). Industrial LLM evaluation must move beyond aggregate accuracy and prioritize source-grounded, safety-aware diagnosis.

0

63

Liam Liang Ding

@liangdingNLP

20 days ago

1/6) Excited to share our latest work from the Multimodal and Industrial AI team at Alibaba: IndustryBench! 🚀⚙️ In industrial procurement, an LLM's answer is only useful if it survives strict standards checks. Partial correctness can mask safety-critical contradictions. Check out the full paper for deep dives into capability dimensions and model comparisons! Feedback and PRs are highly welcome. 👇 Data: https://t.co/8ZflFcHw5W Code: https://t.co/iTRMcJQhDr Paper: https://t.co/reTXgWdDrf #Alibaba #Gemini #Qwen #GPT #Claude #Kimi #GLM #Mimimax

1

8

1

557

Liam Liang Ding

@liangdingNLP

20 days ago

5/6) The Multilingual Blindspot: We released 2,049 items with aligned renderings in EN, RU, VI, and ZH. Across 17 models, "Standards & Terminology" is the most persistent capability weakness. This weakness survives across all four language translations—proving this is a structural knowledge gap, not just a translation artifact.

1

0

74

Liam Liang Ding

@liangdingNLP

about 2 months ago

@xieenze_jr Big congrats 🎉

0

1

0

197

Liam Liang Ding

@liangdingNLP

2 months ago

@bingyikang @amilabs @sainingxie @ylecun Congrats

0

58

Liam Liang Ding

@liangdingNLP

3 months ago

qwen is nothing without its people, best wishes for whatever happens next ❤️

Junyang Lin

@JustinLin610

3 months ago

me stepping down. bye my beloved qwen.

2K

13K

720

1K

7M

0

613

Liam Liang Ding

@liangdingNLP

3 months ago

@prajdabre lol exactly 😹

0

180

Liam Liang Ding

@liangdingNLP

4 months ago

@xuhaiya2483846 Great work! GUI is still an important I/O interface for human-machine interaction 👏

0

1

0

16

Liam Liang Ding

@liangdingNLP

4 months ago

@jungokasai nice product and can’t wait to try 🎉

0

18

Liam Liang Ding

@liangdingNLP

4 months ago

Legally it acts as a liability shield, but strictly speaking it's a UX innovation. It bridges the gap and brings the tech closer to the average person.

Jay ⛽️

@jelanifuel

4 months ago

Openclaw hype is kinda wild to me. It’s literally just a wrapper to Claude code with more risk. The fact that it can run autonomously and call apis is the same thing Claude code can do… Not sure what I’m missing

662

3K

68

824

542K

0

459

Liam Liang Ding

@liangdingNLP

4 months ago

Setting temperature = 0 ❄️🇮🇸

0

4

0

176

Liam Liang Ding

@liangdingNLP

4 months ago

@MikaStars39 @huggingface Big congrats

1

0

571

Liam Liang Ding

@liangdingNLP

4 months ago

Check out our recent reality-check study on whether diffusion LLMs are suitable for agentic tasks.

Qingyu-Lu @SiriusLu1

4 months ago

❓ Do diffusion-LLMs truly generalize in agentic tasks? We reveal systematic failure modes in causal reasoning & tool use, and introduce DiffuAgent for comprehensive evaluation ⚡ 📄 https://t.co/lDI7YhXP8l 🔗 https://t.co/0KZhAYAD5L #AgenticAI #LLM #EmbodiedAI #ToolUse