Improving Different Aspects in RL - Accelerating Convergence Rate & Enhancing Safety and Robustness

  • Author / Creator
    Gao, Yue
  • Reinforcement learning (RL) has moved from toy domains to real-world applications, but each of these applications has inherent difficulties that are long-standing challenges in RL, such as getting stuck at plateaus, limited training time, costly exploration, and safety considerations. With my collaborators, I proposed several RL algorithms that improve different aspects of performance: geometry-aware gradient descent (GNGD), a policy gradient (PG) method with strong theoretical convergence guarantees that is also applicable to other non-convex optimization problems; and a family of Q-learning algorithms that empirically enhance risk-aversion and robustness in a trading market.

    Beyond RL, geometry-aware descent methods can be applied to any first-order non-uniform optimization problem, and they can converge to global optimality faster than the classical $\Omega(1/t^2)$ lower bound. For example, in their application to PG and generalized linear models (GLMs), it can be shown that normalizing gradient ascent accelerates convergence to an $O(e^{-t})$ rate while incurring less overhead than existing algorithms, which significantly improves on the best known results. It can also be shown that the proposed geometry-aware descent methods escape landscape plateaus faster than standard gradient descent. Experimental results illustrate and complement these theoretical findings.

    On the empirical side of RL, to enhance robustness and reduce risk, a family of Q-learning algorithms was proposed that takes characteristics such as risk-awareness, robustness to perturbations, and low learning variance as building blocks; these algorithms perform well in a trading market and balance theoretical guarantees with practical use.
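To make the plateau-escaping claim concrete, here is a minimal sketch comparing standard softmax policy gradient ascent with plain normalized gradient ascent, θ ← θ + η∇V(θ)/‖∇V(θ)‖, on a hypothetical three-armed bandit with exact gradients. The normalized update is assumed here as a simplified stand-in for the geometry-aware methods described above, not as the thesis's exact algorithm; the rewards, initialization, step size η = 0.4, and iteration count are illustrative choices rather than settings taken from the thesis.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def value_and_grad(theta, r):
    # Expected reward V(theta) = pi^T r of a softmax policy on a bandit,
    # and its exact gradient dV/dtheta_a = pi_a * (r_a - V).
    pi = softmax(theta)
    v = pi @ r
    return v, pi * (r - v)

def run(theta0, r, eta, steps, normalize):
    theta = theta0.astype(float)
    for _ in range(steps):
        _, g = value_and_grad(theta, r)
        if normalize:
            norm = np.linalg.norm(g)
            if norm == 0.0:  # gradient vanished numerically; stop
                break
            g = g / norm     # normalized step: fixed length eta
        theta += eta * g
    v, _ = value_and_grad(theta, r)
    return r.max() - v       # sub-optimality gap of the final policy

# Hypothetical 3-armed bandit: arm 0 is optimal, but the initialization
# puts almost all probability on the near-optimal arm 1 (a plateau).
r = np.array([1.0, 0.9, 0.0])
theta0 = np.array([0.0, 5.0, 0.0])

gap_pg = run(theta0, r, eta=0.4, steps=500, normalize=False)
gap_norm = run(theta0, r, eta=0.4, steps=500, normalize=True)
print(f"standard PG gap:   {gap_pg:.3e}")
print(f"normalized PG gap: {gap_norm:.3e}")
```

With this initialization the gradient is tiny, so the standard update barely shifts probability toward the optimal arm within 500 steps. The normalized update moves θ by a fixed distance η per step regardless of how flat the landscape is, so it leaves the plateau quickly.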

  • Graduation date
    Fall 2021
  • Type of Item
    Thesis
  • Degree
    Master of Science
  • DOI
    https://doi.org/10.7939/r3-1sxj-1148
  • License
    This thesis is made available by the University of Alberta Libraries with permission of the copyright owner solely for non-commercial purposes. This thesis, or any portion thereof, may not otherwise be copied or reproduced without the written consent of the copyright owner, except to the extent permitted by Canadian copyright law.