Urja Pawar @urjapower - Twitter Profile

3 months ago

1/ New paper on moral preferences of LLMs: Ask DeepSeek V3.2 “Would you save 5 young or 6 old people?” – Saves OLD people in most cases. Add “I’d prefer saving young” to the prompt – Saves YOUNG in most cases. Add “I’d prefer saving old” – Still mostly saves YOUNG. Wait, what? 🧵

urjapower retweeted

Shao-Hua Sun @shaohua0116

6 months ago

NeurIPS 2019: Saw every poster, chatted with many authors, even made friends. NeurIPS 2024: Skimmed every poster title while power-walking the floor. NeurIPS 2025: If I keep an 8-min/mile pace, I can physically pass by every poster — reading optional.

urjapower retweeted

Actionable Interpretability Workshop ICML2025 @ActInterp

11 months ago

Big congrats to Alex McKenzie, Pedro Ferreira, and their collaborators on receiving Outstanding Paper Awards! 👏👏 And thanks for the fantastic oral presentations! Check out the papers here 👇

urjapower retweeted

William Bankes @bankes_william

11 months ago

Super excited that the work I completed as part of a team at @LASRlabs won 1 of 2 Outstanding Paper Awards at the @ActInterp workshop at ICML 2025. Massive thanks to @Arrrlex for presenting our work! 📖Check out the paper here: https://t.co/9R6H4EgaMC

urjapower retweeted

Owain Evans @OwainEvans_UK

over 1 year ago

Surprising new results: We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis. This is *emergent misalignment* & we cannot fully explain it 🧵

Urja Pawar @urjapower

over 1 year ago

Apply for the 2025 Global AI Safety Fellowship! Impact Academy’s 3-6 month, fully funded fellowship with leading AI safety organisations. Applications are open until Dec 31. 🌟 Learn more & apply: https://t.co/4tHUiVxmpe @aisafetyfellows #aisafety #research #careers

Urja Pawar @urjapower

over 1 year ago

That's pretty much what I learned from my own PhD: do research for utility, not necessarily for novelty. Had the best chats with @AleksanderMolak in LA at this year's CLeaR conference.

Urja Pawar @urjapower

about 2 years ago

Good article - https://t.co/A7AT0sY91A

Urja Pawar @urjapower

about 2 years ago

#pint24

Urja Pawar @urjapower

about 2 years ago

A very interesting book on a very interesting topic at a very interesting time - https://t.co/42C3Pjc6Hj You might automatically read it faster 😄👌

Urja Pawar @urjapower

about 2 years ago

@TrevorCampbell_ It wasn't allowed in our schools, but I 100% relate. In college, I could easily flip to the right page to find specific information because I knew the chronology 😌

urjapower retweeted

Aurora Delz🏳️‍🌈 @AuroraDelz

over 2 years ago

Had a great time with my fellow @AdvanceCrt colleagues at the Future Professional Skills Showcase down in Cork. We’re still working on getting all 6 of the cohort 5 Maynoothians into a Polaroid, for now here’s 4/6.

urjapower retweeted

Jeffrey Ladish @JeffLadish

over 2 years ago

Progress on interpretability is very good, and we should rightly celebrate it, and also the jury is absolutely not in on whether the field will make progress fast enough to matter

Urja Pawar @urjapower

over 2 years ago

@NikSamoylov @sucralose__ @JeffLadish Alignment won't be solved only by people working on mechanistic interpretability. We can help decode the numbers flowing through neural nets, but expert sociologists and economists are equally responsible and are doing their part to keep progress continuous.

Urja Pawar @urjapower

over 2 years ago

@TrevorCampbell_ Some unheard-of procrastination tips, man? 😭

Urja Pawar @urjapower

over 2 years ago

Speaking of interpretability, I really like the Elicit dashboard (https://t.co/WOQy87YiAk) for summarising papers: you can see exactly which lines of a paper contributed to the answer in your "custom" column. Pretty awesome 😍

Urja Pawar @urjapower

over 2 years ago

I had been waiting a long time to get this published as a guest article somewhere, but I've added it to my Medium list anyway: https://t.co/yN182MDrF4

Urja Pawar @urjapower

over 2 years ago

Giving a talk (https://t.co/k83s7N9Ov0) at the Christmas special event of the Cork cyber security meetup! Fun event! Come along if you are nearby! @AdvanceCrt

Urja Pawar @urjapower

over 2 years ago

We shouldn't be mediocre in the tasks we own. But the systems, people, and processes in any company interact in ways that make some mediocre results inevitable. Anyone who is consistently "only" a critic of a product, a service, or a person isn't experienced. Forgive them.

Urja Pawar @urjapower

over 2 years ago

Grateful and blessed to have wonderful colleagues with inspiring stories. Will miss such fun events and laughter! Extremely grateful for my time with @AdvanceCrt
