Skill discovery from multiple related demonstrators
dc.contributor.author | Ranchod, Pravesh | |
dc.date.accessioned | 2019-03-07T09:42:43Z | |
dc.date.available | 2019-03-07T09:42:43Z | |
dc.date.issued | 2018 | |
dc.description | A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, 2017 | en_ZA |
dc.description.abstract | An important ability humans have is that we can recognise that some collections of actions are useful in multiple tasks, allowing us to exploit these skills. A human who can run while playing basketball does not need to relearn this ability when he is playing soccer as he can employ his previously learned running skill. We extend this idea to the task of Learning from Demonstration (LfD), wherein an agent must learn a task by observing the actions of a demonstrator. Traditional LfD algorithms learn a single task from a set of demonstrations, which limits the ability to reuse the learned behaviours. We instead recover all the latent skills employed in a set of demonstrations. The difficulty involved lies in determining which collections of actions in the demonstrations can be grouped together and termed “skills”. We use a number of characteristics observed in studies of skill discovery in children to guide this segmentation process: usefulness (they lead to some reward), chaining (we tend to employ certain skills in common combinations), and reusability (the same skill will be employed in many different contexts). We use reinforcement learning to model goal-directed behaviour, hidden Markov models to model the links between skills, and nonparametric Bayesian clustering to model reusability in a potentially infinite set of skills. We introduce nonparametric Bayesian reward segmentation (NPBRS), an algorithm that is able to segment demonstration trajectories into component skills, using inverse reinforcement learning to recover reward functions representing the skill objectives. We then extend the algorithm to operate in domains with continuous state spaces for which the transition model is not specified, with the algorithm successfully recovering component skills in a number of simulated domains. Finally, we perform an experiment on CHAMP, a physical robot tasked with making various drinks, and demonstrate that the algorithm is able to recover useful skills in a robot domain. | en_ZA |
dc.description.librarian | XL2019 | en_ZA |
dc.format.extent | Online resource (xv, 105 leaves) | |
dc.identifier.citation | Ranchod, Pravesh (2017) Skill discovery from multiple related demonstrators, University of the Witwatersrand, Johannesburg, https://hdl.handle.net/10539/26511 | |
dc.identifier.uri | https://hdl.handle.net/10539/26511 | |
dc.language.iso | en | en_ZA |
dc.phd.title | PhD | en_ZA |
dc.subject.lcsh | Reinforcement learning | |
dc.subject.lcsh | Machine learning | |
dc.title | Skill discovery from multiple related demonstrators | en_ZA |
dc.type | Thesis | en_ZA |
Files
Original bundle
- Name:
- Pravesh Ranchod - 9700884g PhD Thesis 2018.pdf
- Size:
- 12.02 MB
- Format:
- Adobe Portable Document Format