Chief Banana @rezer0dai - Twitter Profile

Pinned Tweet

Chief Banana @rezer0dai

over 4 years ago

hyperv bugzz bounties fuzzing and bananas, something in between those lines => https://t.co/mLcHpVcTmP

1

232

87

64

0

rezer0dai retweeted

ö

@r0keb

about 1 year ago

Good morning! I just published a blog post about a KASLR bypass that works on modern Windows 11 versions. It leverages Intel CPU cache timings to exfiltrate the base address of ntoskrnl.exe. I hope you like it! https://t.co/jXM3uXIcHR

11

408

130

201

26K

rezer0dai retweeted

Teknium 🪽

@Teknium

about 1 year ago

Today at Nous we released our RL Environments Gym - Atropos. With it we've been able to train impressive models like our tool calling specialist that saw a 5x improvement on the @berkeley_ai function calling benchmark and several other models that we've released as artifacts on HF. I hope that together we can build many more environments to broaden the targets of RL beyond math. We will be having a hackathon in SF next month to encourage just that, with a huge prize pool too! So stay tuned.

10

362

39

92

27K

rezer0dai retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

about 1 year ago

I am telling you guys if you really want to truly grasp diffusion models you MUST read all of @sedielem's blog posts!!!

14

1K

114

2K

63K

rezer0dai retweeted

Kyle Corbitt

@corbtt

about 1 year ago

🧵 Excited to announce ART (Agent Reinforcement Trainer), a new RL framework for easily training agents with GRPO! Optimized for best-in-class efficiency and agentic, multi-turn interactions.

7

318

38

261

27K

rezer0dai retweeted

机器之心 JIQIZHIXIN

@jiqizhixin

over 1 year ago

GRPO just got a speed boost! Xiamen University introduced Completion Pruning Policy Optimization (CPPO), which significantly reduces the number of gradient calculations and updates. How fast? On GSM8K, it's 8.32× faster than GRPO, and on MATH, the speedup is 3.51×. 🚀🔥 https://t.co/bT7okQbxSp

3

249

49

174

28K

rezer0dai retweeted

Nathan Lambert

@natolambert

over 1 year ago

I hear people are pretty into GRPO and RL these days, so I wrote up a pretty comprehensive research survey of recent papers I liked. Kimi 1.5, OpenReasonerZero, DAPO and Dr. GRPO. + discussion on if GRPO is special and further reading. https://t.co/AAkMjVTYuK

7

666

92

643

76K

rezer0dai retweeted

Lewis Tunstall

@_lewtun

over 1 year ago

RL goes brrr in the latest TRL release! 🔥 Scale GRPO with multi-node training & vLLM's tensor parallelism 🚀 6x faster convergence with multi-step optimisation 📊 Support for domain specific rewards Release notes 👇 https://t.co/JDlyqYYn2W

3

175

20

73

34K

rezer0dai retweeted

François Fleuret

@francoisfleuret

over 1 year ago

So it seems that "real CS" people got quite a huge result: anything that can be done in O(f(n)) compute can be done in O(sqrt(f(n))) memory. Wow. https://t.co/PhSbvBA1o5

28

2K

192

1K

172K

rezer0dai retweeted

Alec Helbling

@alec_helbling

over 1 year ago

One of the simplest algorithms for sampling from a probability distribution is Random Walk Metropolis-Hastings. It proposes new samples by taking Gaussian-distributed steps, accepting or rejecting them to maintain the target distribution. I call this pdf the "fidget spinner".

7

1K

146

868

80K
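
The Random Walk Metropolis-Hastings procedure described in the tweet above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the step size, and the standard normal target are hypothetical choices, not anything taken from the tweet or its animation.

```python
import math
import random


def random_walk_metropolis(log_p, x0, n_samples, step_size=1.0, seed=0):
    """Sample a 1-D density from its unnormalised log-density log_p."""
    rng = random.Random(seed)
    x, log_px = x0, log_p(x0)
    samples = []
    for _ in range(n_samples):
        # Propose a Gaussian-distributed step around the current point.
        proposal = x + rng.gauss(0.0, step_size)
        log_pp = log_p(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < log_pp - log_px:
            x, log_px = proposal, log_pp
        # A rejected proposal repeats the current point in the chain.
        samples.append(x)
    return samples


def log_standard_normal(x):
    # Assumed example target: standard normal, up to an additive constant.
    return -0.5 * x * x


samples = random_walk_metropolis(log_standard_normal, x0=0.0, n_samples=20000)
```

Because the Gaussian proposal is symmetric, the acceptance test only needs the ratio of target densities, which is why an unnormalised log-density is enough.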

rezer0dai retweeted

Nathan Lambert

@natolambert

over 1 year ago

Okay okay, spent my weekend gooning around learning GRPO math. Here's some takes. Essentially, this is me yapping through a recap of smaller details on how GRPO is implemented, what Dr. GRPO changes, why, DAPO, connections to PPO, aggregating batches... Reading list below.

22

1K

168

2K

123K

rezer0dai retweeted

Robert W Malone, MD

@RWMaloneMD

over 1 year ago

The Climate Scam is Over.. Peer-reviewed AI analysis completely debunks all of the "man-made" claims. Please click on the link to read or listen to the essay: https://t.co/2rpa0ADk8C

626

24K

10K

7K

1M

rezer0dai retweeted

drubinstein

@dsrubinstein

over 1 year ago

Excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques. Blog posted below

13

405

33

208

56K

rezer0dai retweeted

Alec Helbling

@alec_helbling

over 1 year ago

Langevin Monte Carlo allows you to draw samples from a probability distribution using its log gradient ∇ log p(x). By performing a sort of gradient ascent with noise you can navigate around the distribution. Langevin MC is heavily related to modern diffusion models.

14

2K

186

1K

97K
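
A minimal sketch of the Langevin update mentioned in the tweet above, assuming the unadjusted form x ← x + ε ∇ log p(x) + sqrt(2ε) · noise. The function names, the step size, and the standard normal target (whose score is -x) are illustrative assumptions, not details from the tweet.

```python
import math
import random


def langevin_monte_carlo(score, x0, n_steps, step_size=0.1, seed=0):
    """Unadjusted Langevin sampling in 1-D, given score(x) = d/dx log p(x)."""
    rng = random.Random(seed)
    x = x0
    noise_scale = math.sqrt(2.0 * step_size)
    samples = []
    for _ in range(n_steps):
        # Gradient ascent on log p(x), plus Gaussian noise that keeps the
        # chain exploring the distribution instead of settling on the mode.
        x = x + step_size * score(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def standard_normal_score(x):
    # Assumed example target: for a standard normal, d/dx log p(x) = -x.
    return -x


samples = langevin_monte_carlo(standard_normal_score, x0=0.0, n_steps=50000)
```

Without a Metropolis accept/reject step, the chain is slightly biased for any finite step size, so smaller steps trade speed for accuracy.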

rezer0dai retweeted

Ryan M @Grimdoomer

over 1 year ago

Here it is, introducing the Xbox 360 Bad Update exploit, a software only hypervisor exploit for dashboard version 17559: https://t.co/UaN3YLlj7H

54

3K

378

881

163K

rezer0dai retweeted

Ryan M @Grimdoomer

over 1 year ago

Here's part 1 of my blog series on hacking the Xbox 360 hypervisor. This covers the design of the hypervisor and hardware security features that back it. Consider it prerequisite material for part 2 which will be released next week (along with the exploit) https://t.co/FN3L2s45Rl

23

975

213

371

79K

rezer0dai retweeted

Daniel Han

@danielhanchen

over 1 year ago

We made 5 challenges and if you score 47 points we'll offer you $500K/year + equity to join us at 🦥@UnslothAI!

No experience or PhD needed.

$400K - $500K/yr: Founding Engineer (47 points)
$250K - $300K/yr: ML Engineer (32 points)

Challenges:
1. Convert nf4 / BnB 4bit to Triton
2. Make FSDP2 work with QLoRA
3. Remove graph breaks in torch.compile
4. Help solve Unsloth issues!
5. Memory Efficient Backprop

If you have any questions about the challenges, please feel free to ask! We're looking for people to help push Unsloth forward - so come join us to democratize AI further!

Our past work includes:
1. 1.58bit DeepSeek R1 GGUFs: https://t.co/gALGkUg5Cg
2. GRPO with Llama 3.1 8B in a Colab: https://t.co/LFdkNxwAYg
3. Gemma bug fixes: https://t.co/7kX94PyKQR
4. Gradient accumulation bug fixes: https://t.co/Tq4c5Qwqyw

Details & submission guide: https://t.co/iXxRUTijWV

183

6K

774

9K

1M

rezer0dai retweeted

Vivek Myers @vivek_myers

over 1 year ago

Reinforcement learning should be able to improve upon behaviors seen when training. In practice, RL agents often struggle to generalize to new long-horizon behaviors. Our new paper studies *horizon generalization*, the degree RL algorithms generalize to reaching distant goals. 1/

10

485

56

402

80K

rezer0dai retweeted

Nathan Lambert

@natolambert

over 1 year ago

the TRL implementation of GRPO is technically correct if the number of gradient steps per batch is 1 because clipping never occurs.

That being said, I hope they add the clipping logic soon (is in open instruct, is in standard PPO implementations, they may have already added) https://t.co/FcCsSPiiIt

10

346

34

252

53K
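
The point in the tweet above about clipping can be illustrated with a small sketch of a PPO-style clipped surrogate over group-normalised advantages. This is not the TRL or open-instruct code: the function names, the mean/std normalisation, and the clip range of 0.2 are assumptions made for illustration.

```python
import math
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalise rewards within one group of completions (assumed variant)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate_terms(logp_new, logp_old, advantages, clip_eps=0.2):
    """Per-sample PPO-style terms: min(ratio * A, clip(ratio) * A)."""
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        terms.append(min(ratio * adv, clipped * adv))
    return terms


advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
logp = [-1.0, -2.0, -0.5, -3.0]

# With one gradient step per batch, the policy being updated is still the
# policy that generated the samples, so every ratio is exp(0) = 1 and the
# clipped and unclipped terms coincide.
single_step_terms = clipped_surrogate_terms(logp, logp, advantages)
```

Clipping only starts to matter on later gradient steps over the same batch, once the new log-probabilities drift away from the old ones.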

rezer0dai retweeted

starlabs @starlabs_sg

over 1 year ago

We're super stoked to publish this post. A huge shoutout to our former intern, @rainbowpigeon_ who poured his heart & soul into this 7-8 months ago. It took us a bit to polish it up but we're incredibly proud of him. Dive in & let us know what you think! https://t.co/7YsHPq1EdL

1

160

48

42

13K

rezer0dai retweeted

Tim Willis @itswillis

over 1 year ago

Two new posts from @tiraniddo today: https://t.co/StB2knG8FO on reviving a memory trapping primitive from his 2021 post. https://t.co/sbKodaJMe9 where he shares a bug class and demonstrates how you can get a COM object trapped in a more privileged process. Happy Reading! 📚

0

228

97

113

33K
