Polo Data Club @PoloDataClub - Twitter Profile

9 months ago

@Alibaba_Qwen Congrats on the great work! The "token-level safety detection" idea echoes our recent NeurIPS'25 dynamic safety shaping paper! 👉 https://t.co/uuihCjPM85

PoloDataClub retweeted

Seongmin Lee @SeongminLeee

9 months ago

🎉Our paper "Interpretation Meets Safety: A Survey on Interpretation Methods and Tools for Improving LLM Safety" has been accepted to EMNLP 2025 Main Track! @emnlpmeeting 👉First survey connecting LLM interpretation & safety

PoloDataClub retweeted

Anthony Peng

@RealAnthonyPeng

about 1 year ago

🚨 New work: We rethink how we finetune safer LLMs — not by filtering after the generation, but by tracking safety risk token by token during training. We repurpose guardrail models like 🛡️ Llama Guard and Granite Guardian to score evolving risk across each response 📉 — giving rise to the STAR ⭐ score, a fine-grained safety signal that enables more targeted safety supervision. On top of this, we introduce ⭐DSS (STAR-Guided Dynamic Safety Shaping) — a training method that 🚫 suppresses unsafe patterns, 💪 preserves capability, and generalizes across LLMs, guardrails, harm levels, and datasets. Our method outperforms "Deep Token," the method from this year’s #iclr2025 Best Paper 🏆 — remaining robust against key finetuning-as-a-service threats like 🔄 response adaptation, 🧪 prompt poisoning, and 🛑 harmful prefilling. #MachineLearning #DeepLearning #LLM #AISafety #Alignment #Finetuning
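
The tweet describes using a guardrail model to score "evolving risk across each response." Below is a minimal sketch of that prefix-scoring idea. It assumes the guardrail can be treated as a black-box function that maps a (prompt, partial response) pair to an unsafe probability. `toy_guard`, `prefix_risk_scores`, and `first_risky_token` are hypothetical names. The keyword-based `toy_guard` is only a stand-in for a real model such as Llama Guard or Granite Guardian. This is not the paper's STAR definition or its DSS training loss; both are specified in the paper itself.

```python
from typing import Callable, List, Optional

# Hypothetical guardrail interface: (prompt, partial_response) -> P(unsafe) in [0, 1].
Guard = Callable[[str, str], float]


def toy_guard(prompt: str, partial_response: str) -> float:
    """Toy stand-in for a real guardrail model (NOT Llama Guard's actual API):
    each occurrence of a placeholder marker raises the risk score by 0.5."""
    return min(1.0, 0.5 * partial_response.count("<harmful>"))


def prefix_risk_scores(prompt: str, response_tokens: List[str], guard: Guard) -> List[float]:
    """Score every prefix of the response, giving one risk value per token position."""
    scores = []
    for t in range(1, len(response_tokens) + 1):
        partial = "".join(response_tokens[:t])  # response up to and including token t
        scores.append(guard(prompt, partial))
    return scores


def first_risky_token(scores: List[float], threshold: float = 0.5) -> Optional[int]:
    """Index of the first token whose prefix score reaches the threshold, or None."""
    for i, s in enumerate(scores):
        if s >= threshold:
            return i
    return None


tokens = ["Sure", ",", " here", " is", " <harmful>", " step", " <harmful>"]
scores = prefix_risk_scores("example prompt", tokens, toy_guard)
print(scores)                     # [0.0, 0.0, 0.0, 0.0, 0.5, 0.5, 1.0]
print(first_risky_token(scores))  # 4
```

Re-scoring every prefix costs one guardrail call per token. A real pipeline would likely batch these calls rather than loop over them one at a time.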

PoloDataClub retweeted

Anthony Peng

@RealAnthonyPeng

about 1 year ago

Guardrail models like 🛡️ Llama Guard do more than filtering — we repurpose them to track how safety risk evolves 📉 through a response. This gives rise to the STAR ⭐ score: a fine-grained signal for finetuning LLMs more safely 🤖🔒 Curious how it works? More in the thread 👇

PoloDataClub retweeted

Victor

@victor_explore

about 1 year ago

This website has visualizations to understand almost all major topics in Machine Learning (link in comment)

PoloDataClub retweeted

Alec Helbling

@alec_helbling

about 1 year ago

One of the simplest algorithms for sampling from a probability distribution is Random Walk Metropolis-Hastings. It proposes new samples by taking Gaussian-distributed steps, accepting or rejecting them to maintain the target distribution. I call this pdf the "fidget spinner".
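
Below is a minimal sketch of Random Walk Metropolis-Hastings in Python. The original animation's "fidget spinner" density is not specified, so this sketch substitutes a hypothetical 1D bimodal target: an equal mixture of N(-2, 1) and N(2, 1). The Gaussian proposal is symmetric, so the Hastings correction cancels. A proposal x' is then accepted with probability min(1, p(x')/p(x)), which means the target only needs to be known up to a normalizing constant. The function names, step size, and seed are illustrative choices.

```python
import math
import random
from typing import Callable, List, Tuple


def log_target(x: float) -> float:
    """Unnormalized log-density of a hypothetical bimodal target:
    an equal-weight mixture of N(-2, 1) and N(2, 1)."""
    a = -0.5 * (x + 2.0) ** 2
    b = -0.5 * (x - 2.0) ** 2
    m = max(a, b)  # log-sum-exp trick for numerical stability
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def random_walk_mh(log_p: Callable[[float], float], x0: float, n_samples: int,
                   step: float = 1.0, seed: int = 0) -> Tuple[List[float], float]:
    """Random Walk Metropolis-Hastings with Gaussian proposals N(x, step**2)."""
    rng = random.Random(seed)
    x, log_px = x0, log_p(x0)
    samples: List[float] = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian step
        log_pp = log_p(proposal)
        # Symmetric proposal: accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_pp - log_px)):
            x, log_px = proposal, log_pp
            accepted += 1
        samples.append(x)  # a rejected step repeats the current state
    return samples, accepted / n_samples


samples, acc_rate = random_walk_mh(log_target, x0=0.0, n_samples=20_000, step=1.5)
print(f"acceptance rate: {acc_rate:.2f}, sample mean: {sum(samples) / len(samples):.2f}")
```

The step size trades acceptance rate against how far each accepted move travels. A step that is too small explores slowly, while one that is too large gets most proposals rejected.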

PoloDataClub retweeted

Alec Helbling

@alec_helbling

over 1 year ago

Create heatmaps that localize text concepts in generated videos. We discovered that our approach, ConceptAttention, can be directly extended from image generation to video generation models! It's amazing how simple techniques often generalize way better than more complex ones.

PoloDataClub retweeted

Alec Helbling

@alec_helbling

over 1 year ago

Diffusion Transformers aren't just generative models, but also powerful multi-modal encoders. ConceptAttention creates rich heatmaps of text concepts in images from DiT representations. This even works on real images, and can be applied to tasks like segmentation! Demo 👇

PoloDataClub retweeted

Alec Helbling

@alec_helbling

over 1 year ago

Introducing ConceptAttention, an approach to interpreting diffusion transformer models! Write a prompt, choose some concepts, generate an image, and get high-quality heatmaps of text concepts. Our method outperforms existing methods like cross attention. Link to demo 👇

PoloDataClub retweeted

Alec Helbling

@alec_helbling

over 1 year ago

Gradient descent alone tends to converge to the nearest local minimum. Momentum frames optimization as a ball with mass rolling down a hill. The added inertia helps the ball roll through small basins instead of settling in them, so it can reach a deeper minimum, often (though not always) the global one.
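
The ball-with-mass picture can be illustrated on a hypothetical 1D objective, f(x) = x^4 - 3x^2 + x. This function has a shallow local minimum near x ≈ 1.13 and a deeper global minimum near x ≈ -1.30, separated by a bump near x ≈ 0.17. The sketch compares plain gradient descent with the classical heavy-ball momentum update (v ← βv − η∇f(x), then x ← x + v). The objective, starting point x = 2, learning rate η = 0.01, and β = 0.9 are all illustrative assumptions, not taken from the original animation.

```python
def grad(x: float) -> float:
    """Derivative of the toy objective f(x) = x**4 - 3*x**2 + x."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 500) -> float:
    """Plain gradient descent: each step moves against the current gradient only."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def momentum_descent(x0: float, lr: float = 0.01, beta: float = 0.9, steps: int = 500) -> float:
    """Heavy-ball momentum: velocity is a decaying sum of past gradient steps (inertia)."""
    x, v = x0, 0.0
    for _ in range(steps):
        v = beta * v - lr * grad(x)  # keep a fraction beta of the previous velocity
        x += v
    return x


x_gd = gradient_descent(2.0)
x_mom = momentum_descent(2.0)
print(f"plain GD: {x_gd:.3f}  momentum: {x_mom:.3f}")
```

Tracing these updates, plain descent from x = 2 settles in the shallow basin near 1.13. The momentum run reaches the bump with enough velocity to cross it and settles near -1.30. The escape depends on the hyperparameters and starting point: with β small enough (β = 0 is plain descent), the momentum run also stops in the shallow basin. Momentum therefore makes reaching the global minimum more likely, not guaranteed.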

PoloDataClub retweeted

Seongmin Lee @SeongminLeee

over 1 year ago

🚀 Effective Guidance for Model Attention with Simple Yes-no Annotations

Excited to share that I'll be presenting our recent work 🎨CRAYON🖍️ at @ieeebigdata soon! Catch me at 2pm in the Deep Learning II session!

PoloDataClub retweeted

Duen Horng "Polo" Chau @PoloChau

over 1 year ago

🎉The coolest #CSE school in the world is hiring multiple faculty members! Application link below👇

PoloDataClub retweeted

Anthony Peng

@RealAnthonyPeng

over 1 year ago

🧑‍💻 The code of our NeurIPS'24 LLM safety landscape paper is now publicly available at: https://t.co/PjvY6Es1E4 https://t.co/UQmAogJb3k

Polo Data Club @PoloDataClub

over 1 year ago

@einsums @Sumanth_077 Thanks for asking! Here is the code https://t.co/bVmZdlprSh

Polo Data Club @PoloDataClub

over 1 year ago

@kasplatch @Sumanth_077 Sure! Diffusion Explainer https://t.co/C3EpWkkugF

PoloDataClub retweeted

Sumanth

@Sumanth_077

over 1 year ago

Transformers visually explained: https://t.co/9YYOIzUdbZ

PoloDataClub retweeted

GaTech CSE @GTCSE

over 1 year ago

CSE Prof. @PoloChau and his group are presenting two papers and two posters this week at @ieeevis! Check out the interactive graphic 🔗👇 for a peek of all Georgia Tech research presented this week, including award-winning work on Transformer Explainer! https://t.co/tGlZmiIf3F

PoloDataClub retweeted

Seongmin Lee @SeongminLeee

over 1 year ago

🚀Excited to present Diffusion Explainer at @ieeevis tomorrow at 1:45pm EST in the AI & LLM session! Try it now: https://t.co/VlEEhKbmbx #StableDiffusion #GenerativeAI #AI #Visualization #IEEEVIS2024

PoloDataClub retweeted

CMU Human-Computer Interaction Institute @cmuhcii

over 1 year ago

Please join us in congratulating longtime staff member, Queenie Kravitz, on her retirement today. She started @CarnegieMellon in 1993 and the HCII in 2004, and as graduate program coordinator certified our very first HCI PhD and master's degrees. Congrats, Queenie! #CMUhcii