Conservative Q-Learning for Offline Reinforcement Learning
offline_policy_evaluation | Reinforcement Learning | Moderate
Authors: Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine
Year: 2020