Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations

Yanwei Wang     Nadia Figueroa     Shen Li     Ankit Shah     Julie Shah
Interactive Robotics Lab
Massachusetts Institute of Technology


Abstract



Learning from demonstration (LfD) has succeeded in tasks featuring a long time horizon. However, when the problem complexity also includes human-in-the-loop perturbations, state-of-the-art approaches do not guarantee the successful reproduction of a task. In this work, we identify the roots of this challenge as the failure of a learned continuous policy to satisfy the discrete plan implicit in the demonstration. By utilizing modes (rather than subgoals) as the discrete abstraction and motion policies with both mode invariance and goal reachability properties, we prove our learned continuous policy can simulate any discrete plan specified by a linear temporal logic (LTL) formula. Consequently, an imitator is robust to both task- and motion-level perturbations and guaranteed to achieve task success.


Paper


Temporal Logic Imitation: Learning Plan-Satisficing Motion Policies from Demonstrations
Yanwei Wang, Nadia Figueroa, Shen Li, Ankit Shah, Julie Shah
arxiv / review / code / PBS News Coverage
CoRL 2022 (Oral, acceptance rate: 6.5%)
IROS 2023 Workshop on Learning Meets Model-based Methods for Manipulation and Grasping (Best Student Paper)



Teaser


Our method (LTL-DS) takes as input (1) an LTL formula that specifies all valid mode transitions for a task and (2) demonstrations that successfully complete the task. It outputs (1) a task automaton that reactively sequences (2) a set of learned per-mode dynamical system (DS) policies [S. M. Khansari-Zadeh 2011], guaranteeing constraint satisfaction and goal reachability despite arbitrary external perturbations.
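
As a rough illustration of this interface (a sketch with hypothetical names, not the released LTL-DS code), the snippet below shows a task automaton reactively sequencing one learned DS policy per mode, so a perturbation that changes the sensed mode simply re-routes the discrete plan.

    # Minimal sketch (hypothetical names): the automaton table maps the current
    # mode to the next mode the plan requires; each mode has one learned DS policy.
    import numpy as np

    def linear_ds(goal, gain=2.0):
        """Per-mode DS x_dot = -gain * (x - goal); the goal is a globally
        asymptotically stable attractor, which gives goal reachability."""
        goal = np.asarray(goal, dtype=float)
        return lambda x: -gain * (np.asarray(x, dtype=float) - goal)

    def run_task(automaton, ds_per_mode, sense_mode, x0, dt=0.01, max_steps=100000):
        """Reactively sequence per-mode DS policies until the automaton reports 'done'.
        The automaton is assumed to define a next mode for every reachable mode."""
        x = np.asarray(x0, dtype=float)
        mode = sense_mode(x)
        for _ in range(max_steps):
            target = automaton[mode]              # next mode prescribed by the discrete plan
            if target == "done":
                return x, True
            x = x + dt * ds_per_mode[target](x)   # flow toward that mode's attractor
            mode = sense_mode(x)                  # a perturbation may change the sensed mode,
                                                  # in which case the plan re-routes automatically
        return x, False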




Talk


Main Question: Given a discrete task plan encoded by an LTL formula that is reactive to perturbations, how do we ensure the plan is feasible for continuous policies learned from demonstrations, i.e., how do we guarantee that motion imitation satisfies the LTL specification?

Main Takeaway: Any discrete task plan expressed as a mode sequence is achievable by a continuous motion imitation system, provided every learned per-mode policy satisfies both mode invariance and goal reachability.
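
As a rough numerical illustration of these two properties (an assumption-laden sketch, not the paper's formal result), the snippet below rolls out a per-mode policy and checks that it never leaves its mode before converging to the mode's goal; policy, in_mode, and goal are hypothetical handles.

    # Rollout-based check (a sketch, not the paper's proof): verify that a per-mode
    # policy (i) stays inside its mode until the goal is reached (mode invariance)
    # and (ii) converges to the mode's goal (goal reachability).
    import numpy as np

    def check_mode_policy(policy, in_mode, goal, x0, dt=0.01, tol=1e-2, max_steps=50000):
        x = np.asarray(x0, dtype=float)
        goal = np.asarray(goal, dtype=float)
        for _ in range(max_steps):
            if np.linalg.norm(x - goal) < tol:
                return True                       # goal reached without leaving the mode
            if not in_mode(x):
                return False                      # mode invariance violated
            x = x + dt * policy(x)                # Euler rollout of the learned policy
        return False                              # did not converge within the step budget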


How TLI (yellow box) is related to prior work (gray boxes)



    Generically Learned Motion Policy / Motion Policy with Stability Guarantee

    Given a few demonstrations (red trajectories), a generically learned motion policy (state-based behavior cloning) does not guarantee that policy rollouts will always reach the goal under perturbations (left), while a dynamical system (DS) policy (a behavior-cloning variant with a global asymptotic stability, G.A.S., property) guarantees goal reachability (right).
    Motion Policy without Mode Invariance / with Mode Invariance

    The task is to transition through the white, yellow, pink, and green regions consecutively: the pink region can only be entered from the yellow region, and the green region can only be entered from the pink region. Motion policies without mode invariance (the property that policy rollouts do not leave a mode prematurely) lead to looping despite the LTL's reactivity (left), while motion policies with mode invariance (achieved by boundary estimation and modulation) ensure both constraint satisfaction and goal reachability (right).
    Iterative Boundary Estimation of Unknown Mode with Cutting Planes

    To modulate motion policies so that they become mode-invariant, the unknown mode boundary is first estimated. Invariance failures detected by sensors are used to find cutting planes that bound the mode, and DS flows are modulated to stay within the estimated boundary. Note that flows which have left the mode re-enter it thanks to the LTL's reactivity, so an increasingly accurate boundary estimate is attained iteratively (see the sketch below).
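
A minimal sketch of this procedure, assuming a half-space (cutting-plane) approximation of the unknown mode and a simple projection-based modulation (not the paper's exact modulation matrix):

    # Sketch: approximate the unknown mode by the intersection of estimated
    # half-spaces {x | n_i . x <= b_i}, adding one cutting plane per detected
    # invariance failure, and modulate the DS velocity so flows do not cross them.
    import numpy as np

    class CuttingPlaneBoundary:
        def __init__(self):
            self.normals, self.offsets = [], []

        def add_failure(self, x_exit, x_inside):
            """A sensed invariance failure at x_exit yields a new cutting plane
            separating it from a point x_inside known to lie in the mode."""
            n = np.asarray(x_exit, dtype=float) - np.asarray(x_inside, dtype=float)
            n = n / np.linalg.norm(n)
            self.normals.append(n)
            self.offsets.append(n @ np.asarray(x_exit, dtype=float))

        def modulate(self, x, v, margin=0.05):
            """Near any estimated plane, project out the velocity component that
            points outward, so the modulated flow stays inside the estimate."""
            x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
            for n, b in zip(self.normals, self.offsets):
                if n @ x >= b - margin and n @ v > 0.0:
                    v = v - (n @ v) * n
            return v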


More Videos


    Generalization to New Tasks by Reusing Learned Skills

    LTL-DS generalizes to new task structures (encoded by LTL) by flexibly recombining individual skills learned from demonstrations. Consider a demonstration of adding chicken (visiting the yellow region) and then broccoli (visiting the green region) to a pot (visiting the gray region). Once individual DS policies for visiting the yellow, green, and gray regions are learned, they can be recombined under a new LTL formula (refer to the paper) to solve new tasks such as (1) adding broccoli and then chicken, (2) adding only chicken, or (3) continuously adding chicken; a toy illustration follows the video list below. Note that the white region represents an empty spoon, and crossing from yellow/green to white means spilling the food.

    Line Inspection Task


    Color Tracing Task


    Scooping Task
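
Following the generalization example above, a toy illustration (hypothetical names) of how the same learned DS skills could be re-sequenced under different LTL-derived plans for the soup task:

    # Toy illustration (hypothetical names): the same learned DS skills are
    # re-sequenced by different LTL-derived plans.  yellow = chicken,
    # green = broccoli, gray = pot, white = empty spoon.
    learned_skills = {
        "yellow": "ds_scoop_chicken",
        "green": "ds_scoop_broccoli",
        "gray": "ds_visit_pot",
    }

    new_plans = {
        "broccoli_then_chicken": ["green", "gray", "yellow", "gray"],
        "only_chicken":          ["yellow", "gray"],
        "continuously_chicken":  ["yellow", "gray"] * 3,  # looped indefinitely in practice
    }

Each plan would be executed by the same reactive sequencing loop sketched earlier, with the automaton synthesized from the corresponding new LTL formula.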



MIT Museum Demo



A permanent interactive exhibit at the MIT Museum on programming robots via demonstrations.





Grounding Language Plans in Demonstrations through Counterfactual Perturbations
Yanwei Wang, Tsun-Hsuan Wang, Jiayuan Mao, Michael Hagenow, Julie Shah
arxiv / code / project page
ICLR 2024 (Spotlight, acceptance rate: 5%)

This work learns grounding classifiers for LLM planning. By locally perturbing a few human demonstrations, we augment the dataset with more successful executions and failing counterfactuals. Our end-to-end explanation-based network is trained to differentiate successes from failures and as a by-product learns classifiers that ground continuous states into discrete manipulation mode families without dense labeling.
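
A rough sketch of the counterfactual augmentation step (hypothetical names; replay_and_check is a placeholder for executing a perturbed trajectory and checking task success), not the released code:

    # Sketch: locally perturb each demonstration, replay it, and keep both the
    # still-successful executions and the failing counterfactuals as labeled data.
    import numpy as np

    rng = np.random.default_rng(0)

    def perturb(trajectory, scale=0.05):
        """Apply a small local perturbation to a demonstrated trajectory (T x D array)."""
        return trajectory + scale * rng.standard_normal(trajectory.shape)

    def augment(demos, replay_and_check, n_perturb=50):
        """Build a dataset of successful executions and failing counterfactuals."""
        trajectories, labels = [], []
        for demo in demos:
            for _ in range(n_perturb):
                traj = perturb(np.asarray(demo, dtype=float))
                trajectories.append(traj)
                labels.append(int(replay_and_check(traj)))   # 1 = success, 0 = failure
        return trajectories, np.array(labels)

A network trained end-to-end to separate the two sets can then expose, as intermediate discrete assignments, the mode-family grounding classifiers described above.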