We present a hierarchical Deep Q-Network with Forgetting (HDQF) that took first place in the MineRL competition. HDQF works with imperfect demonstrations, utilizing the hierarchical structure of expert trajectories to extract effective sequences of meta-actions and subgoals. We introduce a structured, task-dependent replay buffer and a forgetting technique that allow the HDQF agent to gradually erase poor-quality expert data from the buffer. In this paper we present the details of the HDQF algorithm and give experimental results in the Minecraft domain.

Deep reinforcement learning (RL) has achieved compelling success on many complex sequential decision-making problems, especially in simple domains. In examples such as AlphaStar [6], AlphaZero [2], and OpenAI Five, human or superhuman levels of performance were attained. However, RL algorithms usually require a huge number of environment samples for training t
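The forgetting idea from the abstract — a replay buffer that keeps expert and agent data separate and gradually erases poor-quality expert transitions — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the return-based quality ranking, and the decaying expert sampling fraction are all assumptions made for the example.

```python
import random
from collections import deque

class ForgettingReplayBuffer:
    """Hypothetical sketch of a replay buffer with forgetting:
    expert episodes are ranked by return, the worst are dropped
    over time, and sampling shifts from expert to agent data."""

    def __init__(self, capacity, expert_fraction=0.5, decay=0.999):
        self.capacity = capacity
        self.expert_fraction = expert_fraction  # share of each batch drawn from expert data
        self.decay = decay                      # per-batch decay of that share
        self.expert = []                        # list of (episode_return, transition)
        self.agent = deque(maxlen=capacity)     # agent's own experience, FIFO

    def add_expert(self, transition, episode_return):
        # keep expert data sorted by return (best first) and capped at capacity
        self.expert.append((episode_return, transition))
        self.expert.sort(key=lambda x: x[0], reverse=True)
        del self.expert[self.capacity:]

    def add_agent(self, transition):
        self.agent.append(transition)

    def forget(self, keep_ratio=0.9):
        # "forgetting": drop the lowest-return tail of the expert data
        keep = max(1, int(len(self.expert) * keep_ratio))
        self.expert = self.expert[:keep]

    def sample(self, batch_size):
        batch = []
        for _ in range(batch_size):
            if self.expert and random.random() < self.expert_fraction:
                batch.append(random.choice(self.expert)[1])
            elif self.agent:
                batch.append(random.choice(self.agent))
            else:
                batch.append(random.choice(self.expert)[1])
        # rely less on demonstrations as agent experience accumulates
        self.expert_fraction *= self.decay
        return batch
```

A training loop would call `forget()` periodically (e.g. every few thousand steps), so that low-return demonstration data is erased gradually rather than all at once.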