Accelerating decision making under partial observability using learned action priors

dc.contributor.author: Mabena, Ntokozo
dc.date.accessioned: 2018-03-13T08:44:45Z
dc.date.available: 2018-03-13T08:44:45Z
dc.date.issued: 2017
dc.description: Thesis (M.Sc.)--University of the Witwatersrand, Faculty of Science, School of Computer Science and Applied Mathematics, 2017.
dc.description.abstract: Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical framework that allows an agent to reason about the consequences of actions and observations with respect to its limited perception of the environment, and to plan and act optimally in uncertain environments. Although POMDPs have been successfully applied to various robotic tasks, they are infamous for their high computational cost. This thesis demonstrates the use of knowledge transfer, learned from previous experiences, to accelerate the learning of POMDP tasks. We propose that for an agent to learn to solve these tasks more quickly, it must be able to generalise from past behaviours and transfer knowledge, learned from solving multiple tasks, between different circumstances. We present a method for accelerating this learning process by learning the statistics of action choices over the lifetime of an agent, known as action priors. Action priors specify the usefulness of actions in particular situations and allow us to bias exploration, which in turn improves the performance of the learning process. Using navigation domains, we study the degree to which transferring knowledge between tasks in this way yields a considerable speed-up in solution times. This thesis therefore makes the following contributions. We provide an algorithm for learning action priors from a set of approximately optimal value functions, and two approaches by which prior knowledge over actions can be used in a POMDP context. We show that considerable gains in speed can be achieved by learning subsequent tasks using prior knowledge rather than learning from scratch. Learning with action priors is particularly useful in reducing the cost of exploration in the early stages of the learning process, as the priors act as a mechanism that allows the agent to select more useful actions in particular circumstances. We demonstrate how the initial losses associated with unguided exploration can be alleviated through the use of action priors, which allow for safer exploration. Additionally, we show that action priors reduce the computation time required to learn feasible policies.
dc.description.librarian: MT2018
dc.format.extent: Online resource (120 leaves)
dc.identifier.citation: Mabena, Ntokozo (2017) Accelerating decision making under partial observability using learned action priors, University of the Witwatersrand, Johannesburg, <http://hdl.handle.net/10539/24175>
dc.identifier.uri: https://hdl.handle.net/10539/24175
dc.language.iso: en
dc.subject.lcsh: Markov processes
dc.subject.lcsh: Information technology--Management
dc.subject.lcsh: Knowledge management
dc.title: Accelerating decision making under partial observability using learned action priors
dc.type: Thesis
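
The abstract above describes learning action priors as statistics of action choices gathered from previously solved tasks and then using those priors to bias exploration. The following is a minimal, illustrative Python sketch of that general idea, not the thesis's actual algorithm or code; all class and function names, and the pseudocount-based formulation, are assumptions made purely for illustration.

# Illustrative sketch only: count the actions chosen by previously learned
# (approximately optimal) policies in each situation, normalise the counts into
# an action prior, and draw exploratory actions from that prior instead of
# exploring uniformly at random.
from collections import defaultdict
import random

class ActionPriors:
    def __init__(self, actions, pseudocount=1.0):
        self.actions = list(actions)
        # counts[situation][action]: how often earlier policies chose `action` in `situation`
        self.counts = defaultdict(lambda: defaultdict(lambda: pseudocount))

    def update_from_policy(self, policy, situations):
        """Accumulate action-choice statistics from one previously solved task."""
        for s in situations:
            self.counts[s][policy(s)] += 1.0

    def prior(self, situation):
        """Return a probability distribution over actions for this situation."""
        c = self.counts[situation]
        total = sum(c[a] for a in self.actions)
        return {a: c[a] / total for a in self.actions}

    def sample_exploratory_action(self, situation):
        """Sample an exploratory action from the prior rather than uniformly."""
        p = self.prior(situation)
        return random.choices(self.actions, weights=[p[a] for a in self.actions])[0]

def epsilon_greedy_with_prior(q_values, priors, situation, epsilon=0.1):
    """Act greedily w.r.t. current value estimates, but explore according to the action prior."""
    if random.random() < epsilon:
        return priors.sample_exploratory_action(situation)
    return max(q_values[situation], key=q_values[situation].get)

In this sketch the prior only reshapes the exploratory choices, so early learning avoids actions that were rarely useful in past tasks while the greedy step still follows the values learned for the current task.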
Files
Original bundle
Name: AcceleratingDecisionMakingUnderPartialObservabilityUsingLearnedActionPriors.pdf
Size: 2.02 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed to upon submission