Multi-Resolution Model Plus Correction Paradigm for Task and Skill Refinement on Autonomous Robots
Robots need to be taught what tasks or skills they are expected to perform and how to perform them.
However, there is no single, universally accepted approach for transferring task and skill knowledge to a robot. Among several popular approaches, the most widely adopted one is to develop an algorithm that performs the task or skill in question. Developing such an algorithm requires a model of the system. Moreover, although it is usually easy to develop a simple algorithm that handles trivial cases, repeatedly refining the algorithm by modifying the underlying model to handle more complex situations is time-consuming.
Learning from Demonstration (LfD) is another popular approach for transferring task and skill knowledge to a robot. Instead of being explicitly programmed, the robot is shown by a teacher how to perform the task or skill, and it records each demonstrated action together with the perceived state of the system at the time of the demonstration. An execution policy for reproducing the task or skill is then derived from the recorded demonstration data. Depending on the complexity of the task or skill and on the robotic platform used, providing enough examples to extract a generalized execution policy can be very time-consuming.
This thesis contributes a novel complementary corrective demonstration paradigm called Model Plus Correction (M+C) for task and skill refinement on autonomous robots. The M+C approach strikes a balance between model-based and data-driven methods by combining them in a complementary manner. We assume that an algorithm capable of performing the task or skill in question is available, albeit with limited performance. In our approach, a human teacher observes the partially successful execution of the task and corrects the robot's action whenever the default algorithm fails to select an appropriate one. The collected corrective demonstrations, each stamped with the state of the system at the time of demonstration, are then used to augment the default algorithm: the action it computes is modified according to a correction reuse function and the current state of the system.
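To make the M+C idea concrete, the following is a minimal sketch, not the thesis implementation. It assumes, hypothetically, that states and actions are numeric tuples, that each correction is stored as an additive offset from the default action, and that the correction reuse function is a linear falloff over the distance to the nearest corrected state. The names `ModelPlusCorrection`, `record_correction`, `reuse_weight`, and `radius` are illustrative only.

```python
import math
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]


class ModelPlusCorrection:
    """Hypothetical M+C wrapper: a default controller plus reused teacher corrections."""

    def __init__(self, default_policy: Callable[[State], Action], radius: float = 1.0):
        self.default_policy = default_policy  # the given, partially successful algorithm
        self.radius = radius  # assumed neighbourhood size in which a correction is reused
        self.corrections: List[Tuple[State, Action]] = []

    def record_correction(self, state: State, demonstrated: Action) -> None:
        # Store the teacher's correction as an offset from the default action,
        # stamped with the state in which it was delivered.
        default = self.default_policy(state)
        delta = tuple(d - a for d, a in zip(demonstrated, default))
        self.corrections.append((state, delta))

    def reuse_weight(self, distance: float) -> float:
        # Assumed correction reuse function: full weight at the recorded state,
        # falling linearly to zero at `radius`.
        return max(0.0, 1.0 - distance / self.radius)

    def act(self, state: State) -> Action:
        action = self.default_policy(state)
        if not self.corrections:
            return action
        # Reuse the correction recorded in the most similar state, scaled by similarity.
        stored_state, delta = min(self.corrections, key=lambda c: math.dist(state, c[0]))
        w = self.reuse_weight(math.dist(state, stored_state))
        return tuple(a + w * d for a, d in zip(action, delta))
```

For example, with `default_policy = lambda s: (0.5 * s[0],)` and a single correction `record_correction((1.0,), (1.0,))`, the wrapper returns `(1.0,)` at the corrected state, blends the correction at nearby states such as `(1.5,)`, and leaves the default action untouched at distant states such as `(3.0,)`. The additive offset keeps the model-based controller in charge everywhere the teacher never intervened.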
This thesis also introduces an algorithm for using the same complementary corrective demonstration approach at multiple detail resolutions.
The Multi-Resolution Model Plus Correction (MRM+C) algorithm assumes that a set of detail levels is defined, each with its own state and action representation, and that a model-based controller is available for each detail level. In addition to delivering corrective demonstrations for the controller associated with the current detail resolution, the teacher demonstrates which detail resolution to use in a particular state of the system. Having multiple detail resolutions of different complexities allows the system to use more detailed state and action representations and more complex model-based controllers only when needed. Using a less detailed state and action representation with a simpler controller makes it possible to cover the solution space at a lower computational cost and with fewer demonstrations. The learned detail resolution selection policy favors the least detailed resolution by default and switches to a more detailed resolution only if the teacher has previously commanded it to do so in a similar state.
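The resolution selection policy described above can be sketched as follows. This is a hypothetical illustration, not the thesis algorithm: it assumes a nearest-neighbour lookup over teacher commands, a fixed `similarity_radius` defining "a similar state", and per-level `abstract_state` and `controller` callables standing in for each level's state representation and model-based controller. Corrective demonstrations within each level are left out to keep the sketch short.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

State = Tuple[float, ...]


@dataclass
class DetailLevel:
    """One hypothetical detail resolution with its own state abstraction and controller."""
    name: str
    abstract_state: Callable[[State], tuple]  # maps the full state to this level's representation
    controller: Callable[[tuple], str]        # model-based controller for this level


@dataclass
class ResolutionSelector:
    """Sketch of a detail resolution selection policy learned from teacher commands."""
    levels: List[DetailLevel]         # ordered from least to most detailed
    similarity_radius: float = 1.0    # assumed threshold for "a similar state"
    demos: List[Tuple[State, int]] = field(default_factory=list)

    def demonstrate(self, state: State, level_index: int) -> None:
        # The teacher commands which detail level to use in this state.
        self.demos.append((state, level_index))

    def select(self, state: State) -> int:
        if not self.demos:
            return 0  # nothing demonstrated yet: use the least detailed level
        nearest_state, nearest_level = min(self.demos, key=lambda d: math.dist(state, d[0]))
        if math.dist(state, nearest_state) <= self.similarity_radius:
            return nearest_level  # a similar state was seen: follow the teacher's command
        return 0  # no similar demonstration: fall back to the least detailed level

    def act(self, state: State) -> Tuple[str, str]:
        # Run only the controller of the selected level on that level's representation.
        level = self.levels[self.select(state)]
        return level.name, level.controller(level.abstract_state(state))
```

As a usage example, a coarse level might only check whether an obstacle is farther than some distance (`lambda s: (s[0] > 1.0,)`), while a fine level uses the full state including bearing. Until the teacher calls `demonstrate(state, 1)`, every state is handled by the cheap coarse level; afterwards, only states within `similarity_radius` of that command switch to the fine level. Using the nearest command rather than, say, the most detailed nearby command is a design choice that lets the teacher also switch back to a coarser level in some regions.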
We present experimental results in which the M+C approach is applied to improving the stability of a complex biped walk, as an example of skill refinement, and to ball dribbling in a robot soccer environment, as an example of task refinement. We also present experimental results in which the MRM+C approach is applied to a humanoid obstacle avoidance task on a robot soccer field. Finally, we present an experimental analysis of the robustness of the proposed algorithms against uncertainty, and of the cost of using multiple detail resolutions rather than a single one, in a simulated version of the obstacle avoidance task.