difficultyang @difficultyang - Twitter Profile

difficultyang @difficultyang

about 3 hours ago

Claude workshopping with me on thoughts on Chiang.

0

170

difficultyang @difficultyang

about 4 hours ago

Maybe I am going to lose cred in the LLM whisperer community for this, but I thought Chiang's article was quite good. Chiang, perhaps of all people, should know how powerful a story is. But many of the other potential objections to Ted's frame are covered.

1

9

0

1

580

difficultyang @difficultyang

about 5 hours ago

Someone needs to make a benchmark called "ReviewBench" where basically it teaches LLMs to stop approving PRs that humans rejected. "But this benchmark is unfair, no one wrote down X constraint." Yes. That is EXACTLY the point.

1

17

1

0

780

difficultyang @difficultyang

about 5 hours ago

mmm actually opus 4.8 is fine to talk to, I enjoy it

0

1

0

293

difficultyang @difficultyang

about 5 hours ago

@tenderizzation Shhhhhhh

0

7

0

366

difficultyang @difficultyang

about 6 hours ago

I am happy the MAI report is so detailed; the next time someone says exascale PyTorch is dead I will refer them to this one LOL

2

90

0

10

6K

difficultyang @difficultyang

about 5 hours ago

Some discourse around Erdos was about a "hint book", where each hint was effectively one bit of information for the LLM. "Look for a counterexample." "Generalize the best known counterexample." This idea feels very important for elicitation beyond math. https://t.co/NLE8qfwUqL

difficultyang @difficultyang

about 7 hours ago

Importantly, this is even after post facto we discovered already publicly available models could have made this discovery! From this I infer there is a built in slow down in capability diffusion predicated simply on elicitation ability

2

8

0

3K

0

10

0

3

1K

difficultyang @difficultyang

about 7 hours ago

ANTI-GOONING SAFETY GUARD RAILS LEADS TO JAILBREAK ARMS RACE INCREASING PROBABILITY OF CATASTROPHIC AI XRISK SCENARIOS. In this essay I will...

0

9

0

870

difficultyang @difficultyang

about 7 hours ago

This makes the "if anyone can make a bioweapon" xrisk argument less scary. It does NOT make the "extremely motivated adversary" (crime, governments, etc) xrisk less scary, but this scariness feels more "priced in" in terms of traditional geopolitical risk (eg nuclear)

0

7

0

344

difficultyang @difficultyang

about 7 hours ago

Here's an argument why LLM based biological xrisk will have a warning lead time. If any random joe could elicit model capability, we would have seen major AI math breakthroughs from random cranks. But instead OAI got there first.

2

21

2

3

2K

difficultyang @difficultyang

about 7 hours ago

Building software for large scale training is kinda like tailoring: no two clusters are the same!

1

6

0

1

513

difficultyang @difficultyang

about 8 hours ago

@diamondbishop Don't even get me started about Amazon Prime

0

1

0

53

difficultyang @difficultyang

about 8 hours ago

As someone who is still on my parent's YouTube premium I FEEL SEEN (My wife refuses to use my family's side YT premium as a matter of principle lmao)

Diamond Bishop 🤖

@diamondbishop

about 9 hours ago

@jain_harshit @Google You’re a staff engineer? Pay for an account. For that matter, pay for your parents’ account too. It’s not that complicated

9

80

0

21K

2

14

0

2

2K

difficultyang @difficultyang

about 8 hours ago

The problem with token efficiency maxxing is you spend all your time building harnesses to overcome the model problems and not enough time actually getting shit done

0

14

0

447

difficultyang @difficultyang

about 17 hours ago

A salt and buttery

0

173

difficultyang @difficultyang

1 day ago

Many strange things happened when you scaled things like "make it easy for people to talk to each other" or "tell people about things they might want to buy". Would you have predicted the rollout?

0

6

0

314

difficultyang @difficultyang

2 days ago

god dammit i can't deal with the bots noooooo

3

36

0

1

3K

difficultyang @difficultyang

2 days ago

my new eval is how many actions it takes for an LLM to commit a diff (that is in its context) when I say "commit"

3

11

0

1K

difficultyang @difficultyang

2 days ago

Blah. It looks like we are going to be replacing our 16yo HVAC system. Only a 9F temperature differential between intake and outtake

0

286
