THE WISDOM OF THE CROWD: RELIABLE DEEP REINFORCEMENT LEARNING THROUGH ENSEMBLES OF Q-FUNCTIONS
ABSTRACT
Reinforcement learning agents learn by exploring the environment and then exploiting what they have learned. This frees the human trainers from having to know the preferred action or intrinsic value of each encountered state.