Skill discovery from multiple related demonstrators
An important ability humans have is that we can recognise that some collections of actions are useful in multiple tasks, allowing us to exploit these skills. A human who can run while playing basketball does not need to relearn this ability when he is playing soccer, as he can employ his previously learned running skill.
We extend this idea to the task of Learning from Demonstration (LfD), wherein an agent must learn a task by observing the actions of a demonstrator. Traditional LfD algorithms learn a single task from a set of demonstrations, which limits the ability to reuse the learned behaviours. We instead recover all the latent skills employed in a set of demonstrations. The difficulty lies in determining which collections of actions in the demonstrations can be grouped together and termed “skills”. We use a number of characteristics observed in studies of skill discovery in children to guide this segmentation process: usefulness (they lead to some reward), chaining (we tend to employ certain skills in common combinations), and reusability (the same skill will be employed in many different contexts).
We use reinforcement learning to model goal-directed behaviour, hidden Markov models to model the links between skills, and nonparametric Bayesian clustering to model reusability in a potentially infinite set of skills. We introduce nonparametric Bayesian reward segmentation (NPBRS), an algorithm that is able to segment demonstration trajectories into component skills, using inverse reinforcement learning to recover reward functions representing the skill objectives.
We then extend the algorithm to operate in domains with continuous state spaces for which the transition model is not specified, with the algorithm successfully recovering component skills in a number of simulated domains. Finally, we perform an experiment on CHAMP, a physical robot tasked with making various drinks, and demonstrate that the algorithm is able to recover useful skills in a robot domain.
A thesis submitted to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, in fulfilment of the requirements for the degree of Doctor of Philosophy, 2017
Ranchod, Pravesh (2017) Skills discovery from multiple related demonstrators, University of the Witwatersrand, Johannesburg, https://hdl.handle.net/10539/26511