School of Computer Science and Applied Mathematics (ETDs)

Permanent URI for this communityhttps://hdl.handle.net/10539/38004

Browse

Search Results

Now showing 1 - 1 of 1
  • Item
    Towards Lifelong Reinforcement Learning through Temporal Logics and Zero-Shot Composition
    (2024-10) Tasse, Geraud Nangue; Rosman, Benjamin; James, Steven
    This thesis addresses the fundamental challenge of creating agents capable of solving a wide range of tasks in their environments, akin to human capabilities. For such agents to be truly useful and be capable of assisting humans in our day-to-day lives, we identify three key abilities that general purpose agents should have: Flexibility, Instructability, and Reliability (FIRe). Flexibility refers to the ability of agents to adapt to various tasks with minimal learning; instructability involves the capacity for agents to understand and execute task specifications provided by humans in a comprehensible manner; and reliability entails agents’ ability to solve tasks safely and effectively with theoretical guarantees on their behavior. To build such agents, reinforcement learning (RL) is the framework of choice given that it is the only one that models the agent-environment interaction. It is also particularly promising since it has shown remarkable success in recent years in various domains—including gaming, scientific research, and robotic control. However, prevailing RL methods often fall short of the FIRe desiderata. They typically exhibit poor sample efficiency, demanding millions of environment interactions to learn optimal behaviors. Task specification relies heavily on hand-designed reward functions, posing challenges for non-experts in defining tasks. Moreover, these methods tend to specialize in single tasks, lacking guarantees on the broader adaptability and behavior robustness desired for lifelong agents that need solve multiple tasks. Clearly, the regular RL framework is not enough, and does not capture important aspects of what makes humans so general—such as the use of language to specify and understand tasks. To address these shortcomings, we propose a principled framework for the logical composition of arbitrary tasks in an environment, and introduce a novel knowledge representation called World Value Functions (WVFs) that will enable agents to solve arbitrary tasks specified using language. The use of logical composition is inspired by the fact that all formal languages are built upon the rules of propositional logics. Hence, if we want agents that understand tasks specified in any formal language, we must define what it means to apply the usual logic operators (conjunction, disjunction, and negation) over tasks. The introduction of WVFs is inspired by the fact that humans seem to always seek general knowledge about how to achieve a variety of goals in their environment, irrespective of the specific task they are learning. Our main contributions include: (i) Instructable agents: We formalize the logical composition of arbitrary tasks in potentially stochastic environments, and ensure that task compositions lead to rewards minimising undesired behaviors. (ii) Flexible agents: We introduce WVFs as a new objective for RL agents, enabling them to solve a variety of tasks in their environment. Additionally, we demonstrate zero-shot skill composition and lifelong sample efficiency. (iii) Reliable agents: We develop methods for agents to understand and execute both natural and formal language instructions, ensuring correctness and safety in task execution, particularly in real-world scenarios. By addressing these challenges, our framework represents a significant step towards achieving the FIRe desiderata in AI agents, thereby enhancing their utility and safety in a lifelong learning setting like the real world.