Skill discovery from multiple related demonstrators
Date
2018
Authors
Ranchod, Pravesh
Abstract
An important ability humans have is that we can recognise that some collections
of actions are useful in multiple tasks, allowing us to exploit these skills.
A human who can run while playing basketball does not need to relearn this
ability when he is playing soccer as he can employ his previously learned
running skill.
We extend this idea to the task of Learning from Demonstration (LfD), wherein
an agent must learn a task by observing the actions of a demonstrator.
Traditional LfD algorithms learn a single task from a set of demonstrations, which
limits the ability to reuse the learned behaviours. We instead recover all the
latent skills employed in a set of demonstrations. The difficulty lies in
determining which collections of actions in the demonstrations can be grouped
together and termed “skills”. We use a number of characteristics observed in
studies of skill discovery in children to guide this segmentation process:
usefulness (they lead to some reward), chaining (we tend to employ certain skills
in common combinations), and reusability (the same skill will be employed in
many different contexts).
We use reinforcement learning to model goal-directed behaviour, hidden Markov
models to model the links between skills, and nonparametric Bayesian clustering
to model reusability in a potentially infinite set of skills. We introduce
nonparametric Bayesian reward segmentation (NPBRS), an algorithm that is
able to segment demonstration trajectories into component skills, using inverse
reinforcement learning to recover reward functions representing the skill
objectives.
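To illustrate how a nonparametric Bayesian prior can leave the number of skills open-ended, here is a minimal sketch of a Chinese restaurant process, one standard prior of this kind. It is not the NPBRS algorithm itself: the abstract does not state which nonparametric prior the thesis uses, and the function name `crp_assignments`, the concentration parameter `alpha`, and the notion of labelling trajectory segments are assumptions made purely for illustration.

```python
import random


def crp_assignments(n_segments, alpha, seed=0):
    """Sample skill labels for n_segments items from a Chinese restaurant process.

    Hypothetical illustration only: each "segment" stands in for a piece of a
    demonstration trajectory, and each label for a candidate skill.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of segments currently assigned to skill k
    for _ in range(n_segments):
        # An existing skill k is chosen with weight counts[k];
        # a brand-new skill is chosen with weight alpha (must be > 0).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)   # open a new skill
        else:
            counts[k] += 1     # reuse an existing skill
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_segments=20, alpha=1.0)
    print("labels:", labels)
    print("segments per skill:", counts)
```

Popular skills attract more segments (reuse), while `alpha` controls how readily a new skill is created, so the number of skills is not fixed in advance.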
We then extend the algorithm to operate in domains with continuous state
spaces for which the transition model is not specified, with the algorithm
successfully recovering component skills in a number of simulated domains.
Finally, we perform an experiment on CHAMP, a physical robot tasked with
making various drinks, and demonstrate that the algorithm is able to recover
useful skills in a robot domain.
Description
A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, 2017.
Citation
Ranchod, Pravesh (2017) Skills discovery from multiple related demonstrators, University of the Witwatersrand, Johannesburg, https://hdl.handle.net/10539/26511