Kaj Bostrom @alephic2 - Twitter Profile

7 months ago

Had a fun time presenting some independent work at the AI for Music workshop - thanks @zacknovack and @hermanhwdong for organizing! https://t.co/DToqOnD7rS

0

7

0

314

Kaj Bostrom @alephic2

7 months ago

@realmcore_ Could check vulkan usage by inspecting the linker output/ hiding OpenGL libs at build time

0

32
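A minimal sketch of the linker-output check suggested in the tweet above, assuming a Linux build and `ldd`-style text as input. The function names and the library-name patterns are illustrative assumptions, not anything taken from the thread.

```python
import re

# Assumed library-name patterns for a Linux target; adjust per platform.
VULKAN_RE = re.compile(r"libvulkan\.so(\.\d+)*")
OPENGL_RE = re.compile(r"lib(GL|OpenGL|GLX)\.so(\.\d+)*")


def classify_graphics_libs(ldd_output):
    """Collect Vulkan and OpenGL library names found in `ldd`-style output."""
    vulkan, opengl = set(), set()
    for line in ldd_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # The first token is the library name (or a path for the loader).
        name = tokens[0].rsplit("/", 1)[-1]
        if VULKAN_RE.fullmatch(name):
            vulkan.add(name)
        elif OPENGL_RE.fullmatch(name):
            opengl.add(name)
    return {"vulkan": sorted(vulkan), "opengl": sorted(opengl)}


def uses_only_vulkan(ldd_output):
    """True when Vulkan is linked and no OpenGL library is."""
    libs = classify_graphics_libs(ldd_output)
    return bool(libs["vulkan"]) and not libs["opengl"]
```

In practice the text could come from `subprocess.run(["ldd", binary_path], capture_output=True, text=True).stdout`, where `binary_path` is a placeholder for the built executable. One caveat: a program that loads Vulkan at runtime via `dlopen` will not show it in this output, so a negative result is not conclusive.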

Kaj Bostrom @alephic2

7 months ago

@realmcore_ That repo looks like a great reference point for functionality. For testing: virtualize build environment (just stick with ubuntu), take ref screenshots at fixed views, tell the agent to make a view picker control with those views, capture frames from its soln & compare w LPIPS

2

0

42
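The comparison step of the eval described above might look roughly like this sketch. It assumes frames have already been captured at the fixed views and decoded into grayscale grids. The default distance is a plain mean absolute difference used only as a stand-in; LPIPS itself would be plugged in through the `distance` argument. All names and the `0.1` threshold are hypothetical.

```python
from typing import Callable, Dict, Sequence

Frame = Sequence[Sequence[float]]  # grayscale image: rows of values in [0, 1]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Stand-in distance (NOT LPIPS): mean absolute pixel difference."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("frames must have the same shape")
    count = sum(len(row) for row in a)
    if count == 0:
        raise ValueError("frames must not be empty")
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / count


def compare_views(
    reference: Dict[str, Frame],
    candidate: Dict[str, Frame],
    distance: Callable[[Frame, Frame], float] = mean_abs_diff,
    threshold: float = 0.1,  # arbitrary assumption; tune per metric
) -> Dict[str, dict]:
    """Score each fixed view of the candidate against its reference frame."""
    missing = sorted(set(reference) - set(candidate))
    if missing:
        raise KeyError(f"candidate is missing views: {missing}")
    results = {}
    for view, ref_frame in reference.items():
        d = distance(ref_frame, candidate[view])
        results[view] = {"distance": d, "passed": d <= threshold}
    return results
```

Keeping the metric as a parameter means the same harness could take a perceptual distance or a more lenient judge without changing the per-view bookkeeping.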

Kaj Bostrom @alephic2

7 months ago

@realmcore_ could generalize this eval to other graphics apps, if you want to be more lenient with the visual parity you could have a VLM judge instead of using a direct image distance?

0

13

Kaj Bostrom @alephic2

8 months ago

This kind of natural argument autoformalization system, with the ability to build a schema on-demand, has been kind of a holy grail of mine since start of PhD. Was so sick to see Yu pull it off in the span of a summer! Grateful to have played a part!

Yu Feng @AnnieFeng6

8 months ago

LLM CoT reasoning looks smart but can be logically flawed or... just made up. It's time to hold reasoning accountable!

We built VeriCoT to do just that. VeriCoT extracts the core argument of the CoT using well-formed symbolic notions of logical support. It formalizes every CoT step into first-order logic and finds the exact premise it's built on. This gives us two superpowers:

🤖Automated Proof: Solvers can automatically verify if the logic is valid.
🧑‍🔬Human-Readable Audits: Natural language premises let you pinpoint ungrounded leaps or fallacies.

Best of all, all these can be used as signals to learn more verifiable models!

To our knowledge, VeriCoT is the first neuro-symbolic validator of CoT traces in non-math/code domains.

📄 Paper: https://t.co/6NUYuQumdt

2

26

11

7

7K

0

11

5

4

3K

Kaj Bostrom @alephic2

8 months ago

@Sauers_ lol

0

1

0

914

alephic2 retweeted

Alex Mordvintsev

@zzznah

10 months ago

rule 2182

4

185

21

52

9K

Kaj Bostrom @alephic2

10 months ago

@cloneofsimo what about the EDM adaptive timestep-dependent loss-weighting? these plots are with equal loss scale across timesteps right

0

91

alephic2 retweeted

Csordás Róbert @robert_csordas

about 1 year ago

For inputs involving many steps, the operands for each step remain important until an identical depth. This indicates that the model is *not* breaking down the computation, solving subproblems, and composing their results together. 2/6 https://t.co/97HPNX2y9G

1

76

2

8

6K

Kaj Bostrom @alephic2

over 1 year ago

@cloneofsimo This is so real i have given up on trying to have any two of compile/fsdp/grad checkpointing active with custom architecture

0

234

Kaj Bostrom @alephic2

over 1 year ago

@harsh_jhamtani @jacobandreas Starting now! I'm at board 52 in the Jasmine poster room (the one at terrace level) - tucked all the way in the back hidden behind a pillar

0

1

0

271

Kaj Bostrom @alephic2

over 1 year ago

I'm at EMNLP! Come by Poster Session B (2pm-3:30pm) if you want to say hi and/or hear about this cool trick for bootstrapping paired language+code data from raw code! Paper 🔗: https://t.co/XTLPMqbKxl https://t.co/udhRwzbv0f

4

49

4

2

5K

Kaj Bostrom @alephic2

over 1 year ago

Shout outs to @harsh_jhamtani, @jacobandreas and the rest of the team at MS Semantic Machines! This project was extremely fun (tokenizer snafus notwithstanding)

1

4

0

354

Kaj Bostrom @alephic2

over 1 year ago

@somakaditya Yes it is!

1

0

56

Kaj Bostrom @alephic2

almost 2 years ago

Definitely updated my mental model of CoT based on these results - give it a read, the paper delivers right off the bat and then keeps following up with more!

Zayne Sprague

@ZayneSprague

almost 2 years ago

To CoT or not to CoT?🤔

300+ experiments with 14 LLMs & systematic meta-analysis of 100+ recent papers

🤯Direct answering is as good as CoT except for math and symbolic reasoning
🤯You don’t need CoT for 95% of MMLU!

CoT mainly helps LLMs track and execute symbolic computation https://t.co/vEr5oZSSRf

14

300

69

181

71K

0

7

0

834

alephic2 retweeted

Zayne Sprague

@ZayneSprague

almost 2 years ago

🍓 still has a way to go for solving murder mysteries.

We ran o1 on our dataset MuSR (ICLR ’24). It doesn’t beat Claude-3.5 Sonnet with CoT. MuSR requires a lot of commonsense reasoning and less math/logic (where 🍓 shines)

MuSR is still a challenge! More to come soon 😎 https://t.co/CYYGul8a3B

7

174

38

69

22K

alephic2 retweeted

Zayne Sprague

@ZayneSprague

over 2 years ago

Super excited to bring ChatGPT Murder Mysteries to #ICLR2024 from our dataset MuSR as a spotlight presentation! A big shout-out goes to my coauthors, @xiye_nlp @alephic2 @swarat and @gregd_nlp See you all there 😀

0

41

10

9

5K

alephic2 retweeted

samim @samim

over 2 years ago

After extensive training with various music generation neural networks and dedicating countless hours to prompting them, it's become even more evident to me that relying solely on text prompts as interface for music creation significantly limits the creative process.

20

199

20

46

78K

Kaj Bostrom @alephic2

over 2 years ago

@universeinanegg Feel like objective doesn't really force LMs to maintain a faithful internal model of their own confidence

0

1

0

52

Kaj Bostrom @alephic2

over 2 years ago

@universeinanegg Re 2: People will (often) disclaim when they expect to fail, e.g. trying to report a fact but can't come up with it - even when tuned to do this, my feeling is that LMs are faking it

1

0

119
