Skill discovery from multiple related demonstrators

dc.contributor.author: Ranchod, Pravesh
dc.date.accessioned: 2019-03-07T09:42:43Z
dc.date.available: 2019-03-07T09:42:43Z
dc.date.issued: 2018
dc.description: A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy to the Faculty of Science, School of Computer Science and Applied Mathematics, University of the Witwatersrand, 2017
dc.description.abstract: An important ability humans have is that we can recognise that some collections of actions are useful in multiple tasks, allowing us to exploit these skills. A human who can run while playing basketball does not need to relearn this ability when he is playing soccer, as he can employ his previously learned running skill. We extend this idea to the task of Learning from Demonstration (LfD), wherein an agent must learn a task by observing the actions of a demonstrator. Traditional LfD algorithms learn a single task from a set of demonstrations, which limits the ability to reuse the learned behaviours. We instead recover all the latent skills employed in a set of demonstrations. The difficulty involved lies in determining which collections of actions in the demonstrations can be grouped together and termed “skills”. We use a number of characteristics observed in studies of skill discovery in children to guide this segmentation process: usefulness (they lead to some reward), chaining (we tend to employ certain skills in common combinations), and reusability (the same skill will be employed in many different contexts). We use reinforcement learning to model goal-directed behaviour, hidden Markov models to model the links between skills, and nonparametric Bayesian clustering to model reusability in a potentially infinite set of skills. We introduce nonparametric Bayesian reward segmentation (NPBRS), an algorithm that is able to segment demonstration trajectories into component skills, using inverse reinforcement learning to recover reward functions representing the skill objectives. We then extend the algorithm to operate in domains with continuous state spaces for which the transition model is not specified, with the algorithm successfully recovering component skills in a number of simulated domains. Finally, we perform an experiment on CHAMP, a physical robot tasked with making various drinks, and demonstrate that the algorithm is able to recover useful skills in a robot domain.
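The abstract describes coupling a hidden Markov model over latent skills with reward functions recovered by inverse reinforcement learning. The following is a minimal illustrative sketch, not the thesis's NPBRS implementation: it assumes hand-specified tabular reward functions standing in for IRL-recovered ones, Boltzmann skill policies, and a fixed number of skills rather than a nonparametric prior, and uses a Viterbi pass to label each timestep of a toy demonstration with its most likely skill.

```python
# Hypothetical sketch (not the thesis's NPBRS algorithm): segment a demonstration
# into latent skills with a small HMM whose "emission" model for each skill is a
# Boltzmann (soft-greedy) policy under that skill's reward function. All names,
# sizes, and reward tables here are illustrative assumptions.
import numpy as np

n_states, n_actions, n_skills = 5, 3, 2

# Assumed per-skill reward tables R[k][s, a]; in NPBRS these would be recovered
# with inverse reinforcement learning rather than specified by hand.
rng = np.random.default_rng(0)
R = [rng.normal(size=(n_states, n_actions)) for _ in range(n_skills)]

def boltzmann_policy(reward, beta=2.0):
    """P(a | s) proportional to exp(beta * reward[s, a]) - a stand-in for the
    near-optimal policy of a skill whose objective is `reward`."""
    logits = beta * reward
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

policies = [boltzmann_policy(r) for r in R]

# Sticky skill-transition matrix: skills tend to persist, then chain to others.
A = np.full((n_skills, n_skills), 0.1 / (n_skills - 1))
np.fill_diagonal(A, 0.9)
pi0 = np.full(n_skills, 1.0 / n_skills)

def viterbi_skill_segmentation(states, actions):
    """Most likely skill label per timestep for one demonstration."""
    T = len(states)
    logdelta = np.log(pi0) + np.log([policies[k][states[0], actions[0]]
                                     for k in range(n_skills)])
    back = np.zeros((T, n_skills), dtype=int)
    for t in range(1, T):
        trans = logdelta[:, None] + np.log(A)
        back[t] = trans.argmax(axis=0)
        emit = np.log([policies[k][states[t], actions[t]]
                       for k in range(n_skills)])
        logdelta = trans.max(axis=0) + emit
    labels = np.zeros(T, dtype=int)
    labels[-1] = logdelta.argmax()
    for t in range(T - 2, -1, -1):
        labels[t] = back[t + 1, labels[t + 1]]
    return labels

# Toy demonstration: (state, action) pairs; the segmentation groups timesteps
# whose actions are best explained by the same skill's reward function.
demo_states = [0, 1, 2, 2, 3, 4, 4]
demo_actions = [1, 1, 0, 0, 2, 2, 1]
print(viterbi_skill_segmentation(demo_states, demo_actions))
```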
dc.description.librarian: XL2019
dc.format.extent: Online resource (xv, 105 leaves)
dc.identifier.citation: Ranchod, Pravesh (2017) Skill discovery from multiple related demonstrators, University of the Witwatersrand, Johannesburg, https://hdl.handle.net/10539/26511
dc.identifier.uri: https://hdl.handle.net/10539/26511
dc.language.iso: en
dc.phd.title: PhD
dc.subject.lcsh: Reinforcement learning
dc.subject.lcsh: Machine learning
dc.title: Skill discovery from multiple related demonstrators
dc.type: Thesis
Files
Original bundle
Name: Pravesh Ranchod - 9700884g PhD Thesis 2018.pdf
Size: 12.02 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission
Collections