Nice work! You might want to cite the paper "RL^3: Boosting meta-RL by RL inside RL^2", which also argues that supplementing a transformer meta-RL policy with a value function estimated during deployment improves scalability and generalization.
@nashed_samer
https://t.co/6H2L2oDT1W
Prompt injection is a huge security problem for LLM apps. To study this, we built Tensor Trust: a game where you create and defend against prompt injections. We’re releasing a paper + dataset with 70k unique attacks, 40k unique defense prompts, and new robustness benchmarks. 👉
Check out our online game #TensorTrust that we made to study #LLMs! At https://t.co/NeiPeiO1rN, you have a bank account protected by #ChatGPT: you just tell the AI your password🔒 and a few security rules for when to grant access🏦
@Jeande_d @thealexker Great work! My colleague's (@nashed_samer) and my recent work on meta-RL also uses transformers in the PPO actor and critic networks https://t.co/dkEaTXBRL1. It would be great if you could mention our work in your survey and take a look at the tricks we used to make them work.
Our paper on how to use metareasoning to select a good state abstraction for an MDP was just accepted to #IROS2022! Here, given an MDP that we'd like to solve, we use deep RL to reduce the fidelity of unimportant states while retaining the fidelity of important states.
A little late to the party: I'm excited to announce that our paper on how to tune the hyperparameters of planning algorithms with deep RL was accepted to #ICAPS2022! Check out our paper! 1/6
https://t.co/NuswBSTeKN