Counting Reward Automata: Exploiting Structure in Reward Functions Expressible in Decidable Formal Languages

dc.contributor.author: Bester, Tristan
dc.contributor.supervisor: Rosman, Benjamin
dc.contributor.supervisor: James, Steven
dc.contributor.supervisor: Tasse, Geraud Nangue
dc.date.accessioned: 2025-07-21T10:59:01Z
dc.date.issued: 2024-07
dc.description: A thesis submitted in fulfilment of the requirements for the degree of Master of Science, to the Faculty of Science, University of the Witwatersrand, Johannesburg, 2024.
dc.description.abstract: In general, reinforcement learning agents cannot directly access the environment model: the transition dynamics and the reward function are observable only through repeated interaction with the environment. Because reinforcement learning is well suited to complex environments that are challenging to model, the standard assumption that the environment's transition probabilities are unknown is justified. However, since agents cannot discern rewards directly from the environment, reward functions must be designed and implemented for both simulated and real-world environments. As a result, the assumption that the reward model must remain hidden from the agent is unnecessary and detrimental to learning. Previous work has developed methods that exploit the structure of the reward function to enable more sample-efficient learning. These methods employ a finite state machine variant to facilitate reward specification in a manner that exposes the internal structure of the reward function. This approach is particularly effective for long-horizon tasks, as it enables counterfactual reasoning with off-policy learning, which significantly improves sample efficiency. However, because these approaches depend on finite state machines, they can express only a limited class of reward functions. This severely limits their applicability: they cannot model even simple tasks such as “fetch a coffee for each person in the office”, which involves counting – one of the many properties finite state machines cannot capture. This work addresses the limited expressiveness of current state machine-based approaches to reward modelling. Specifically, we introduce a novel approach compatible with any reward function that can be expressed as a well-defined algorithm. We present the counting reward automaton – an abstract machine capable of modelling reward functions expressible in any decidable formal language. Unlike previous approaches to state machine-based reward modelling, which are limited to tasks expressible as regular languages, our framework accommodates tasks described by any decidable formal language. It follows that our framework is an extremely general approach to reward modelling – compatible with any task specification expressible as a well-defined algorithm. This is a significant contribution, as it greatly extends the class of problems that can benefit from the improved learning techniques facilitated by state machine-based reward modelling. We prove that an agent equipped with such an abstract machine can solve an extended set of tasks. We show that this increase in expressive power does not come at the cost of increased automaton complexity. We then introduce several learning algorithms designed to increase sample efficiency by exploiting automaton structure. These algorithms are based on counterfactual reasoning with off-policy reinforcement learning and draw on techniques from hierarchical reinforcement learning (HRL) and reward shaping. Finally, we evaluate our approach in several domains requiring long-horizon plans. Empirical results demonstrate that our method outperforms competing approaches in terms of automaton complexity, sample efficiency, and task completion.
dc.description.submitter: MMM2025
dc.faculty: Faculty of Science
dc.identifier: 0009-0007-1997-6272
dc.identifier.citation: Bester, Tristan. (2024). Counting Reward Automata: Exploiting Structure in Reward Functions Expressible in Decidable Formal Languages. [Master's dissertation, University of the Witwatersrand, Johannesburg]. WIReDSpace. https://hdl.handle.net/10539/45658
dc.identifier.uri: https://hdl.handle.net/10539/45658
dc.language.iso: en
dc.publisher: University of the Witwatersrand, Johannesburg
dc.rights: ©2024 University of the Witwatersrand, Johannesburg. All rights reserved. The copyright in this work vests in the University of the Witwatersrand, Johannesburg. No part of this work may be reproduced or transmitted in any form or by any means without the prior written permission of the University of the Witwatersrand, Johannesburg.
dc.rights.holder: University of the Witwatersrand, Johannesburg
dc.school: School of Computer Science and Applied Mathematics
dc.subject: Reinforcement Learning
dc.subject: Automated Reasoning
dc.subject: Formal Languages
dc.subject: Knowledge Representation
dc.subject: UCTD
dc.subject.primarysdg: SDG-9: Industry, innovation and infrastructure
dc.subject.secondarysdg: SDG-4: Quality education
dc.title: Counting Reward Automata: Exploiting Structure in Reward Functions Expressible in Decidable Formal Languages
dc.type: Dissertation
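
Note: the following minimal Python sketch illustrates the two ideas the abstract above describes – a reward machine augmented with a counter (so one machine covers the “fetch a coffee for each person in the office” task for any number of people), and counterfactual experience generation from the machine's configurations. It is an illustrative reconstruction under stated assumptions, not code from the thesis; all names (CountingRewardAutomaton, deliver_coffee, cra_step, counterfactual_experiences) are hypothetical.

# Sketch of a counting reward automaton (CRA) for "fetch a coffee for
# each person in the office". Hypothetical names; the thesis's exact
# formalism may differ.

class CountingRewardAutomaton:
    """One automaton state plus an integer counter. A plain finite state
    reward machine would need a distinct state per person, and no single
    finite machine handles every n; the counter makes one machine suffice."""

    def __init__(self, n_people):
        self.state = "u0"        # initial automaton state
        self.counter = n_people  # people still waiting for coffee

    def step(self, event):
        """Advance on a high-level event (produced by a labelling function
        over environment observations) and return the reward emitted."""
        if self.state == "u0" and event == "deliver_coffee":
            self.counter -= 1
            if self.counter == 0:      # everyone served: accept
                self.state = "u_acc"
                return 1.0
        return 0.0                     # no reward until the task completes


def cra_step(config, event):
    """Pure-function form of the same transition, acting on an explicit
    configuration (state, counter); convenient for counterfactual updates."""
    state, c = config
    if state == "u0" and event == "deliver_coffee":
        c -= 1
        return (("u_acc", c), 1.0) if c == 0 else (("u0", c), 0.0)
    return (state, c), 0.0


def counterfactual_experiences(transition, configs):
    """Counterfactual reasoning with off-policy RL, in the spirit the
    abstract describes: replay a single environment transition under every
    configuration in a bounded reachable set, yielding extra
    (state, action, reward, next state) tuples for a replay buffer."""
    s, a, event, s_next = transition
    for cfg in configs:
        cfg_next, r = cra_step(cfg, event)
        yield ((s, cfg), a, r, (s_next, cfg_next))


# Example usage.
cra = CountingRewardAutomaton(n_people=3)
rewards = [cra.step("deliver_coffee") for _ in range(3)]   # [0.0, 0.0, 1.0]

configs = [("u0", k) for k in range(1, 4)]                 # bounded set
batch = list(counterfactual_experiences(("s", "a", "deliver_coffee", "s2"),
                                        configs))          # 3 experiences

Because the transition reads and writes a counter rather than enumerating states, the same machine covers any office size; this is the expressiveness gap the abstract identifies between finite-state reward machines (regular languages) and counting reward automata (decidable formal languages).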

Files

Original bundle

Name: Bester_Counting_2024.pdf
Size: 5.46 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 2.43 KB
Description: Item-specific license agreed upon to submission