Several approaches have been proposed to improve the sample efficiency of online reinforcement learning (RL) by leveraging demonstrations collected offline. The offline data can be used directly as transitions when optimizing the RL objective, or an offline policy and value function can first be inferred from the data and then used for online fine-tuning or to provide reference actions. While each of these strategies has shown compelling results, it remains unclear which has the greatest impact on final performance, whether the approaches can be combined, and whether such combinations yield cumulative benefits. We classify existing demonstration-augmented RL approaches into three categories and conduct an extensive empirical study of their strengths, weaknesses, and combinations, quantifying the contribution of each strategy and identifying effective hybrid schemes for sample-efficient online RL.
Figure: per setting, we show (a) the AUC comparison across all methods and (b) the per-component impact.
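To make the first category concrete, the sketch below illustrates one common way offline data is used directly as transitions: each training batch mixes samples from a fixed demonstration buffer with samples from the online replay buffer. The function name, the 50/50 mixing ratio, and the transition layout are illustrative assumptions, not details taken from this paper.

```python
# Minimal sketch (assumptions, not the paper's method): mix offline
# demonstration transitions into every online RL training batch.
import random
import numpy as np

def sample_mixed_batch(demo_buffer, online_buffer, batch_size=256, demo_fraction=0.5):
    """Draw a batch that mixes offline demonstrations with online transitions.

    demo_fraction controls how much of each batch comes from demonstrations;
    0.5 (symmetric sampling) is an assumed default, not a value from the paper.
    """
    n_demo = int(batch_size * demo_fraction)
    n_online = batch_size - n_demo
    batch = random.choices(demo_buffer, k=n_demo) + random.choices(online_buffer, k=n_online)
    random.shuffle(batch)  # avoid ordering effects between the two sources
    return batch

# Toy usage: transitions stored as (state, action, reward, next_state, done) tuples.
demo_buffer = [(np.zeros(4), 0, 1.0, np.zeros(4), False) for _ in range(100)]
online_buffer = [(np.ones(4), 1, 0.0, np.ones(4), False) for _ in range(1000)]
batch = sample_mixed_batch(demo_buffer, online_buffer, batch_size=8)
print(len(batch))  # 8
```

The other two categories described above would instead pretrain a policy or value function on the demonstrations and either fine-tune it online or query it for reference actions during online training.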