Meta-Reinforcement Learning of Structured Exploration Strategies
Abhishek Gupta, Russell Mendonca, YuXuan Liu, Pieter Abbeel, Sergey Levine
Human Exploration vs Robot Exploration
Desired:
- Effective exploration for sparse rewards
- Quick adaptation for new tasks
Fast Learning
Grasp red object
Structured Exploration vs Per-timestep Exploration
Latent Space
z ∼ q_ω(·)
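The contrast on these slides — one latent sampled per episode versus fresh noise every timestep — can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `mu`, `sigma`, and the stand-in `policy` are hypothetical placeholders for the learned variational parameters of q_ω and the neural-network policy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variational parameters of q_omega (diagonal Gaussian) for one task.
mu, sigma = np.zeros(2), np.ones(2)

def sample_latent():
    """Sample z ~ q_omega(.) -- a single draw from the latent distribution."""
    return mu + sigma * rng.standard_normal(mu.shape)

def policy(state, z):
    """Toy stand-in for the latent-conditioned policy pi(a | s_t, z)."""
    return float(np.tanh(state.sum() + z.sum()))

# Structured exploration: sample z ONCE, hold it fixed for the whole episode,
# so the exploration noise is temporally coherent across timesteps.
z = sample_latent()
episode_actions = [policy(np.array([0.1 * t]), z) for t in range(5)]

# Per-timestep exploration: fresh noise at every step -> uncorrelated dithering.
noisy_actions = [policy(np.array([0.1 * t]), sample_latent()) for t in range(5)]
```

The per-episode latent gives every action in the trajectory a consistent bias, which is what makes the resulting exploration "structured" rather than random jitter.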
Latent Space
1 step of RL
Grasp red object

Meta-train latent space, policy
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks, Finn et al., ICML 2017
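The "1 step of RL" adaptation above can be sketched MAML-style: on a new task, only the latent distribution's parameters are updated by a gradient step on expected reward. Everything here is illustrative — the quadratic reward, its closed-form gradient, and the step size `alpha` are toy assumptions standing in for policy-gradient estimates from real rollouts.

```python
import numpy as np

# Hypothetical task optimum for a toy reward R(z) = -||z - z_star||^2.
Z_STAR = np.array([1.0, -1.0])

def grad_expected_reward(mu):
    """Stand-in for a policy-gradient estimate w.r.t. the variational mean mu.
    In practice this would be estimated from rollouts on the new task; here
    the toy reward admits the exact gradient dR/dmu = -2 (mu - z_star)."""
    return -2.0 * (mu - Z_STAR)

# Meta-trained initialization of the per-task latent distribution (assumed zero).
mu = np.zeros(2)
alpha = 0.25  # inner-loop step size; meta-learned in MAML-style methods

# Fast adaptation: a few gradient-ascent steps on the latent parameters only.
for _ in range(4):
    mu = mu + alpha * grad_expected_reward(mu)
```

Because only the low-dimensional latent parameters move in the inner loop, adaptation is fast, while the meta-trained policy weights stay fixed.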
Random Exploration
MAESN exploration
- Learns very quickly
- Higher asymptotic reward than prior methods
- Better exploration