Asynchronous Reinforcement Learning for Real-Time Control of Physical Robots

  • Author / Creator
    Yuan, Yufeng
  • An oft-ignored challenge of real-world reinforcement learning is that, unlike standard simulated environments, the real world does not pause when agents make learning updates. As standard simulated environments do not address this real-time aspect of learning, most available implementations of deep reinforcement learning algorithms process environment interactions and learning updates sequentially. Consequently, when such implementations are deployed in the real world, they may not act responsively and learn efficiently. Asynchronous learning has been proposed to solve this issue, but no systematic comparison between sequential and asynchronous reinforcement learning was conducted using real-world environments. In this thesis, we set up two vision-based tasks with a robotic arm, implement an asynchronous learning system that extends a previous architecture, and compare sequential and asynchronous reinforcement learning across different action cycle times, sensory data dimensions, and mini-batch sizes. Our experiments show that when the time cost of learning updates increases, the action cycle time in sequential implementation could grow excessively long, while the asynchronous implementation can always maintain a fixed and appropriate action cycle time. Consequently, when learning updates are expensive, the performance of sequential learning diminishes and is outperformed by a substantial margin by asynchronous learning. Our system learns in real-time to reach and track visual targets from pixels within two hours of experience and does so directly using real robots, learning completely from scratch.
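
    The timing argument in the abstract can be illustrated with a toy model. The sketch below is not the thesis's system: the function names, the cost values, and the simplification that an asynchronous learner never competes with the acting loop for compute are all assumptions made for illustration.

```python
def sequential_cycle_time(target_cycle, act_cost, update_cost):
    """Acting and learning share one loop, so every cycle pays for both.

    All arguments are durations in seconds (hypothetical values).
    """
    return max(target_cycle, act_cost + update_cost)


def asynchronous_cycle_time(target_cycle, act_cost):
    """Learning runs elsewhere, so the acting loop only pays for acting.

    Assumes the learner does not slow the acting loop down at all.
    """
    return max(target_cycle, act_cost)


if __name__ == "__main__":
    # Illustrative numbers only; none of these come from the thesis.
    target, act = 0.04, 0.005
    for update in (0.01, 0.1, 0.5):
        seq = sequential_cycle_time(target, act, update)
        asy = asynchronous_cycle_time(target, act)
        print(f"update={update:.2f}s  sequential={seq:.3f}s  asynchronous={asy:.3f}s")
```

    Under these assumptions the sequential cycle time tracks the update cost once it exceeds the target, while the asynchronous cycle time stays at the target regardless of how expensive updates become.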

  • Subjects / Keywords
  • Graduation date
    Fall 2021
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-2b10-d658
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.