Specifying Preferences over Policies Using Branching Time Temporal Logic
Abstract
Automated Planning is the subarea of AI devoted to developing algorithms that solve sequential decision-making problems. Given a formal description of the environment, a planning algorithm generates a plan of actions (also called a policy) that guides an agent in accomplishing a certain task. Classical planning assumes the environment is fully observable and evolves deterministically, and it considers only simple reachability goals (e.g., a set of states to be reached by a plan or policy). In this work, we approach fully observable non-deterministic (FOND) planning tasks, which allow the specification of complex goals such as preferences over policy quality (weak, strong, or strong-cyclic) and preferences over the states visited along the paths generated by a policy. To solve this problem, we propose formulae in the branching-time temporal logic α-CTL and use α-CTL-based planning-as-model-checking algorithms to generate a solution that captures both the agent's preferences and the desired policy quality. To evaluate the effectiveness of the proposed formulae and algorithms, we run experiments on the Rovers benchmark domain. To the best of our knowledge, this is the first work to solve non-deterministic planning problems with preferences using a CTL temporal logic.
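For intuition, the following is a minimal sketch of how the three policy qualities can be stated as CTL-style reachability formulae for a goal $\varphi$, following the standard planning-as-model-checking characterizations; the notation is illustrative and may differ from the paper's exact α-CTL syntax.

% Illustrative sketch (assumed CTL-style notation, not necessarily the
% paper's exact alpha-CTL syntax): policy quality for a goal \varphi.
\begin{align*}
  \text{weak:}          &\quad \exists\Diamond\,\varphi
    && \text{some execution path reaches } \varphi \\
  \text{strong:}        &\quad \forall\Diamond\,\varphi
    && \text{every execution path reaches } \varphi \\
  \text{strong-cyclic:} &\quad \forall\Box\,\exists\Diamond\,\varphi
    && \text{from every reachable state, } \varphi \text{ stays reachable}
\end{align*}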