Learning the optimal state-feedback via supervised imitation learning
optimal control, deep learning, imitation learning, G&CNET
Imitation learning is a control design paradigm that seeks to learn a control policy reproducing demonstrations from expert agents. By replacing expert demonstrations with optimal behaviours, the same paradigm leads to the design of control policies that closely approximate the optimal state-feedback. This approach requires training a machine learning algorithm (in our case, deep neural networks) directly on state-control pairs originating from optimal trajectories. We have shown in previous work that, when restricted to low-dimensional state and control spaces, this approach is very successful in several deterministic, non-linear problems in continuous time. In this work, we refine our previous studies using as a test case a simple quadcopter model with quadratic and time-optimal objective functions. We describe in detail the best learning pipeline we have developed, which approximates the state-feedback map via deep neural networks to very high accuracy. We introduce the use of the softplus activation function in the hidden units of neural networks, showing that it results in a smoother control profile whilst retaining the benefits of rectifiers. We show how to evaluate the optimality of the trained state-feedback, and find that, already with two layers, the value of the objective function reached differs from its optimal value by less than one percent. We later also consider an additional metric linked to the system's asymptotic behaviour: the time taken to converge to the policy's fixed point. With respect to these metrics, we show that improvements in the mean absolute error do not necessarily correspond to better policies.
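As a rough illustration of the softplus claim in the abstract, the sketch below compares the standard definitions of the rectifier, max(0, x), and softplus, log(1 + exp(x)). It is not taken from the paper: the function names and the sample grid are assumptions chosen for illustration only. Softplus is smooth everywhere, with the logistic sigmoid as its derivative, and approaches the rectifier for inputs of large magnitude.

```python
import numpy as np

def relu(x):
    # Rectifier: piecewise linear, with a kink (discontinuous slope) at x = 0.
    return np.maximum(0.0, x)

def softplus(x):
    # Softplus: log(1 + exp(x)), evaluated stably as logaddexp(0, x).
    return np.logaddexp(0.0, x)

def softplus_grad(x):
    # Derivative of softplus is the logistic sigmoid, written here via tanh
    # to avoid overflow for inputs of large magnitude.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

# Illustrative sample grid (an assumption, not from the paper).
x = np.linspace(-5.0, 5.0, 11)

# Gap between the two activations: largest at x = 0, shrinking towards
# zero as |x| grows, so softplus behaves like a smoothed rectifier.
gap = softplus(x) - relu(x)
```

The gap equals log(1 + exp(-|x|)), so it is strictly positive and peaks at log 2 when x = 0.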
Tsinghua University Press
Dharmesh Tailor, Dario Izzo. Learning the optimal state-feedback via supervised imitation learning. Astrodyn. 2019, 3(4): 361–374.