We have acquired Zebra Technologies’ robotics arm (formerly Fetch Robotics).
This is what happens when orchestration meets intelligence -- a major step toward fully autonomous warehouses.
More robots. More environments. One unified brain.
@Sentdex Try MBPO+SAC. Then you can explore within the imagined trajectories of the model with a variety of strategies.
You can also cut SAC's entropy target in half, e.g., -dim(A)/2 instead of the usual -dim(A).
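A rough sketch of that entropy-target tweak, assuming the common SAC heuristic where the target is -dim(A). The helper name `sac_target_entropy` and its `scale` argument are made up for illustration and don't come from any particular library.

```python
import math


def sac_target_entropy(action_shape, scale=1.0):
    """Return -scale * dim(A), where dim(A) is the number of action dimensions.

    scale=1.0 gives the usual -dim(A) heuristic; scale=0.5 gives the
    halved target -dim(A)/2. (Hypothetical helper, for illustration only.)
    """
    action_dim = math.prod(action_shape)
    return -scale * action_dim


# Example with an assumed 6-dimensional continuous action space.
default_target = sac_target_entropy((6,))
halved_target = sac_target_entropy((6,), scale=0.5)
```

If your SAC implementation lets you set the target entropy directly, you'd pass in `halved_target` in place of its default.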
@alpercanbe re (2) This is harder to nail down and likely depends on audience. You're right that we usually don't write out xent loss, but most RL papers still have the obligatory paragraph defining the MDP and learning objective "\pi^{\ast} = \argmax_{\pi} \mathbb{E}_{\pi} \sum_t ..." ;)
@alpercanbe It's a mixture of (1) relevancy and (2) vibes.
re (1) If I am writing a paper on optimizers that improve upon Adam, I'll need to explicitly write out the maths of Adam. If the paper is on something else and I'm just using Adam, I can just cite (Kingma & Ba, 2015).
@ID_AA_Carmack There's been some work on the disconnect between RL and other areas of ML w.r.t. NN size. Check out https://t.co/g0NDdL8SFA and https://t.co/EI5LXUH3oO
@yoavgo A (usually learned) approximation of the MDP's transition function and reward function.
In the POMDP case, we're likely modeling the observation function instead of the state-transition function directly.
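To make "approximation of the transition and reward functions" concrete, here's a minimal count-based sketch for a small discrete MDP. The class `TabularModel` and its method names are hypothetical. In practice the model is usually a neural network, so treat this as a toy stand-in rather than how any specific method does it.

```python
from collections import defaultdict


class TabularModel:
    """Toy learned model: empirical estimates of P(s' | s, a) and E[r | s, a]."""

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))
        self.reward_sum = defaultdict(float)
        self.visits = defaultdict(int)

    def update(self, s, a, r, s_next):
        # Record one observed transition (s, a, r, s').
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def transition_probs(self, s, a):
        # Estimated transition function: relative frequencies of next states.
        n = self.visits.get((s, a), 0)
        if n == 0:
            raise KeyError(f"no data for state-action pair {(s, a)}")
        return {s2: c / n for s2, c in self.next_counts[(s, a)].items()}

    def expected_reward(self, s, a):
        # Estimated reward function: mean observed reward.
        n = self.visits.get((s, a), 0)
        if n == 0:
            raise KeyError(f"no data for state-action pair {(s, a)}")
        return self.reward_sum[(s, a)] / n


model = TabularModel()
for s, a, r, s_next in [(0, "a", 1.0, 1), (0, "a", 0.0, 1), (0, "a", 1.0, 2)]:
    model.update(s, a, r, s_next)

probs = model.transition_probs(0, "a")
mean_reward = model.expected_reward(0, "a")
```

In the POMDP setting the same idea would be applied to observations rather than true states, since the states aren't directly available.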