Edward Yang @theyangward - Twitter Profile

Edward Yang

@theyangward

7 days ago

@Stone_Tao @physical_int Congrats Stone! Thanks for paying for dinner 🥸

Edward Yang

@theyangward

7 days ago

@cat_eye_on Not entirely sure, tbh. Maybe the MLP alone was not expressive enough to learn how to avoid other agents? After all, the inputs to the agents are literally just a flat tensor. I tried GNNs and reward hacking (which I'll talk about later), and both worked much better.
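A rough illustration of the "flat tensor" point. This is a minimal sketch, not the network used in this thread. An MLP input is every agent's features concatenated in a fixed order, so reordering the agents produces a different input. One mean-aggregation message-passing step treats the agents as a set, so permuting the agents only permutes the outputs. This is a simplified GNN layer in which the hypothetical scalar weights `w_self` and `w_msg` stand in for learned matrices.

```python
from typing import List

Vector = List[float]


def flatten_for_mlp(agent_feats: List[Vector]) -> Vector:
    """Flat MLP input: every agent's features concatenated in a fixed order."""
    return [x for feats in agent_feats for x in feats]


def message_passing_layer(
    agent_feats: List[Vector], w_self: float = 1.0, w_msg: float = 0.5
) -> List[Vector]:
    """One mean-aggregation step over a fully connected agent graph.

    New embedding of agent i = w_self * own features + w_msg * mean of the other agents' features.
    w_self and w_msg are hypothetical scalars standing in for learned weight matrices.
    """
    n, dim = len(agent_feats), len(agent_feats[0])
    out = []
    for i in range(n):
        others = [agent_feats[j] for j in range(n) if j != i]
        if others:
            mean_msg = [sum(f[d] for f in others) / len(others) for d in range(dim)]
        else:
            mean_msg = [0.0] * dim  # a lone agent receives no messages
        out.append([w_self * agent_feats[i][d] + w_msg * mean_msg[d] for d in range(dim)])
    return out


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    perm = [2, 0, 1]
    permuted = [feats[p] for p in perm]
    # The flat input changes when the agents are reordered.
    print(flatten_for_mlp(feats) == flatten_for_mlp(permuted))  # False
    out = message_passing_layer(feats)
    print(out)  # [[1.5, 0.75], [0.75, 1.5], [2.25, 2.25]]
    # Message passing is permutation-equivariant: reordering inputs reorders outputs.
    print(message_passing_layer(permuted) == [out[p] for p in perm])  # True
```

Real GNN layers use learned weight matrices and nonlinearities, and they may restrict messages to nearby agents. The permutation property shown here is what carries over.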

Edward Yang

@theyangward

13 days ago

Following up on my last post, here are some of the main pain points I ran into before getting my first successful runs on Simple Spread using @CleanRL.

Edward Yang

@theyangward

7 days ago

@cat_eye_on But hierarchical could be interesting to try in the future, esp. if the environments become more complex, e.g. with heterogeneous agents :o

7 days ago

As for an "orchestrator" policy, I guess there are levels to the amount of "orchestration." Hierarchical is, as you described, where an orchestrator policy outputs goals to the other agents. I haven't really explored that since CTDE has been performing pretty well for me. CTDE is sort of like an orchestrator: a centralized critic network uses the observations and actions of all the individual agents to judge how well they are doing, but only during training. At execution time, each agent still acts on its own local observations.
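A minimal sketch of what "centralized" means in terms of inputs, under illustrative assumptions. The agent names like `agent_0` mirror PettingZoo's MPE naming, the observation values are made up, and `n_actions=5` assumes Simple Spread's discrete action space. Whether actions go into the critic depends on the algorithm. A MAPPO-style critic is usually a state-value function over the joint observation (or a global state), while Q-style centralized critics such as MADDPG's also take every agent's action.

```python
from typing import Dict, List, Optional

Obs = Dict[str, List[float]]


def one_hot(index: int, size: int) -> List[float]:
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def actor_input(obs: Obs, agent: str) -> List[float]:
    """Decentralized execution: each actor only sees its own local observation."""
    return list(obs[agent])


def centralized_critic_input(
    obs: Obs,
    agent_order: List[str],
    actions: Optional[Dict[str, int]] = None,
    n_actions: int = 5,
) -> List[float]:
    """Centralized training: the critic sees all agents' observations, and optionally their actions.

    A fixed agent_order keeps each agent's slot in the same position across steps.
    """
    x: List[float] = []
    for agent in agent_order:
        x.extend(obs[agent])
    if actions is not None:
        for agent in agent_order:
            x.extend(one_hot(actions[agent], n_actions))
    return x


if __name__ == "__main__":
    obs = {"agent_0": [0.1, 0.2], "agent_1": [0.3, 0.4], "agent_2": [0.5, 0.6]}
    order = ["agent_0", "agent_1", "agent_2"]
    print(actor_input(obs, "agent_1"))  # [0.3, 0.4]
    print(len(centralized_critic_input(obs, order)))  # 6
    acts = {"agent_0": 0, "agent_1": 3, "agent_2": 4}
    print(len(centralized_critic_input(obs, order, acts)))  # 21
```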

Edward Yang

@theyangward

13 days ago

Quick teaser for what’s next: I started messing around with VMAS and got a 10x SPS (steps per second) speedup out of the box. I also implemented a GNN, which is completely crushing standard MLPs (even with homogeneous actors). Next goal is scaling to higher N using imitation learning!

Edward Yang

@theyangward

13 days ago

The fix: MAPPO with CTDE (Decentralized, Heterogeneous Actors + Centralized Critic). Giving each agent its own independent weights naturally breaks symmetry. They didn't just learn to move; they learned roles and asymmetric strategies, like implicitly learning to let specific agents go first to clear bottlenecks.
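Why independent weights break symmetry can be shown with a toy example. This is a hedged sketch: `LinearPolicy` is a hypothetical stand-in for an MLP actor, and the sizes and random seed are arbitrary. With parameter sharing, two agents that receive identical observations must produce identical action distributions. Giving each agent its own independently initialized weights removes that constraint.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearPolicy:
    """Hypothetical tiny actor: action probabilities = softmax(W @ obs)."""

    def __init__(self, obs_dim: int, n_actions: int, rng: random.Random):
        self.w = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(n_actions)]

    def probs(self, obs: List[float]) -> List[float]:
        logits = [sum(w * o for w, o in zip(row, obs)) for row in self.w]
        return softmax(logits)


rng = random.Random(0)
n_agents, obs_dim, n_actions = 3, 4, 5

# Parameter sharing: every agent reuses the same weights.
shared = LinearPolicy(obs_dim, n_actions, rng)
shared_actors = [shared] * n_agents

# Heterogeneous actors: each agent has its own independently initialized weights.
independent_actors = [LinearPolicy(obs_dim, n_actions, rng) for _ in range(n_agents)]

same_obs = [0.5, -0.2, 0.1, 0.3]  # agents in a symmetric situation see identical inputs
shared_out = [actor.probs(same_obs) for actor in shared_actors]
independent_out = [actor.probs(same_obs) for actor in independent_actors]

# Shared weights: identical inputs always give identical action distributions.
print(all(p == shared_out[0] for p in shared_out))  # True
# Independent weights: the same input can map to a different distribution per agent.
print(independent_out[0] != independent_out[1])  # True
```

A common alternative that keeps parameter sharing is to append a one-hot agent ID to each observation. That also lets a single shared network tell the agents apart.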

Edward Yang

@theyangward

15 days ago

@Stone_Tao So cool! How can one get started contributing? 🤓

Edward Yang

@theyangward

20 days ago

@cat_eye_on @jparkjmc One time I was down 0-12 on Ancient with 2 AFK and I thought all hope was lost. Then Catherine was like "I gotcha fam" and clutched up 13-12. Then they gave me a Dragon Lore

Edward Yang

@theyangward

about 1 month ago

@Stone_Tao 🤯

Edward Yang

@theyangward

about 2 months ago

@aryan33864 Oh, there's this cool YouTube series I found on MARL fundamentals that I'm working my way through: https://t.co/vyxSKXSvFt. Otherwise it's a lot of googling and asking Gemini why my code doesn't work 😅

Edward Yang

@theyangward

about 2 months ago

I’m learning Multi-Agent RL (MARL)! To really understand how it works, I’m building from scratch starting with @cleanrl_lib in the PettingZoo Simple Spread environment. I’d like to thank @arth_shukla and @Stone_Tao for helping guide me in the initial understanding here. Baseline: getting basic decentralized agents to just figure out where to go.
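Here is a worked example of what "where to go" means in Simple Spread. The PettingZoo docs describe it this way: agents are rewarded based on how far the closest agent is to each landmark (the sum of the minimum distances), and they are locally penalized -1 for each collision with another agent. The sketch below re-implements that description in plain Python. It is not the library's code. The agent radius of 0.15 and the `local_ratio` blend of 0.5 are assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def global_reward(agents: List[Point], landmarks: List[Point]) -> float:
    """Shared term: for each landmark, the distance to its closest agent, summed and negated."""
    return -sum(min(math.dist(a, lm) for a in agents) for lm in landmarks)


def collision_penalty(i: int, agents: List[Point], agent_size: float = 0.15) -> float:
    """Local term: -1 for each other agent whose circle overlaps agent i's circle."""
    hits = sum(
        1
        for j, other in enumerate(agents)
        if j != i and math.dist(agents[i], other) < 2 * agent_size
    )
    return float(-hits)


def agent_reward(i: int, agents: List[Point], landmarks: List[Point], local_ratio: float = 0.5) -> float:
    """Blend of the local (collision) and global (coverage) terms for agent i."""
    return local_ratio * collision_penalty(i, agents) + (1 - local_ratio) * global_reward(agents, landmarks)


if __name__ == "__main__":
    landmarks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    agents = [(0.0, 0.0), (0.1, 0.0), (0.0, 1.0)]  # agents 0 and 1 overlap; landmark (1, 0) is uncovered
    print(round(global_reward(agents, landmarks), 3))  # -0.9
    print(collision_penalty(0, agents), collision_penalty(2, agents))  # -1.0 0.0
    print(round(agent_reward(0, agents, landmarks), 3))  # -0.95
```

Even a purely decentralized baseline has to trade off two pressures. The shared term pushes agents to cover every landmark, and the collision term pushes them apart.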

Edward Yang

@theyangward

about 2 months ago

@Eremeyen3 @cleanrl_lib @arth_shukla @Stone_Tao Oh interesting! What do you mean by AI Metropolis? Is it like a bunch of agents in a city environment?

Edward Yang

@theyangward

about 2 months ago

Hey! I was thinking about using pufferlib actually, cuz it was recommended to me by a few people (like Stone and Daphne). I decided to start with just cleanrl cuz it has single-file implementations, which make it easy to understand and modify (and it wasn't TOO slow, ~2.5K SPS). I also didn't know pufferlib supports multi-agent envs/PettingZoo. How does this compare with VMAS + BenchMARL/TorchRL? I was planning on using that for the next step, when I need to start scaling up the number of experiments/agents. But in the meantime, I do see https://t.co/f7nIT7H9lG. Does that mean I can just add this wrapper to mpe2 Simple Spread and my SPS will become much higher? Although, I am doing some weird stuff with frame stacking, vector envs, and Gym wrappers to get compatibility between PettingZoo and Gym, which I'll post about next!
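The post mentions frame stacking across PettingZoo and Gym wrappers but does not show it. As a hedged illustration only, and not the code from the linked repo, one way to frame-stack in a multi-agent setting is to keep a separate history per agent. Each history is keyed by agent name, matching the observation dicts that PettingZoo's Parallel API returns. `PerAgentFrameStack` and `k` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List

Obs = Dict[str, List[float]]


class PerAgentFrameStack:
    """Hypothetical helper: keeps the last k observations for each agent separately.

    reset() fills each buffer with k copies of the first observation, so every stacked
    vector has a fixed length of k * obs_dim from the very first step.
    """

    def __init__(self, k: int):
        self.k = k
        self.frames: Dict[str, Deque[List[float]]] = {}

    def reset(self, obs: Obs) -> Obs:
        self.frames = {agent: deque([list(o)] * self.k, maxlen=self.k) for agent, o in obs.items()}
        return self._stacked()

    def step(self, obs: Obs) -> Obs:
        for agent, o in obs.items():
            self.frames[agent].append(list(o))  # maxlen drops the oldest frame automatically
        return self._stacked()

    def _stacked(self) -> Obs:
        # Oldest frame first, newest last, flattened into one vector per agent.
        return {agent: [x for frame in buf for x in frame] for agent, buf in self.frames.items()}


if __name__ == "__main__":
    fs = PerAgentFrameStack(k=3)
    print(fs.reset({"agent_0": [0.0, 1.0], "agent_1": [5.0, 5.0]})["agent_0"])
    # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    print(fs.step({"agent_0": [2.0, 3.0], "agent_1": [6.0, 6.0]})["agent_0"])
    # [0.0, 1.0, 0.0, 1.0, 2.0, 3.0]
```

Keeping one buffer per agent name avoids mixing histories between agents. Agents that drop out of the observation dict mid-episode are not handled here. How this fits alongside vector envs and Gym wrappers depends on the wrapper stack.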

Edward Yang

@theyangward

about 2 months ago

I’m documenting this to solidify my own understanding and hopefully help others. I’ve done my best to verify the correctness of everything I post, but MARL is hard ;) If there are any flaws in my reasoning or a better way to implement something, let me know in the replies! Here is my code: https://t.co/NHzBkrrEHl

Edward Yang

@theyangward

about 2 months ago

A quick roadmap: I’ve already worked my way through CTDE, MAPPO, and some reward hacking, and now I’m working on imitation learning. The next few threads will be a retrospective on how I built up to that, sharing the pitfalls and bugs I hit along the way. After that, we go live with real-time updates as I move into VMAS and Isaac Gym.