Emergent Representations in Reinforcement Learning and Their Properties

  • Author / Creator
    Wang, Han
  • This dissertation investigates the properties of representations learned by modern deep reinforcement learning systems. Representation learning plays an important role in reinforcement learning. A representation contains information extracted from states, which are the descriptions of the current situation given by the environment. A high-quality representation is therefore not only essential for building a robust reinforcement learning agent but can also improve learning efficiency. Many sub-problems of reinforcement learning, such as planning with a model and directed exploration, can be solved more efficiently with successful agent-state discovery. Many representation learning algorithms have been proposed. Much of the earlier work in representation learning for reinforcement learning focused on designing fixed-basis architectures to achieve desirable properties, such as orthogonality. In contrast, the idea behind deep reinforcement learning methods is that the agent designer should not encode representational properties; rather, the data stream should determine the properties of the representation, and the desired representations will emerge under appropriate training schemes. In this work, we discuss how emergent representations learned under different task settings, both with and without auxiliary tasks, measure up against the properties commonly attributed to good representations. This thesis (1) empirically investigates how these emergent representations relate to historical notions of good representations, and (2) provides novel insights regarding end-to-end training, the auxiliary task effect, and the utility of successor-feature targets. In particular, we compare the representations learned by several standard architectures by examining seven representational properties and studying how these properties relate to control and transfer performance.

  • Subjects / Keywords
  • Graduation date
    Fall 2020
  • Type of Item
  • Degree
    Master of Science
  • DOI
  • License
    Permission is hereby granted to the University of Alberta Libraries to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. Where the thesis is converted to, or otherwise made available in digital form, the University of Alberta will advise potential users of the thesis of these terms. The author reserves all other publication and other rights in association with the copyright in the thesis and, except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.