Extended Experiments

Acting and Planning with Hierarchical Operational Models on a Mobile Robot

How to read this page

What was tested?

The experiments probe whether RAE+UPOM selects high-utility actions, adapts when the world changes, and recovers when execution fails. The real-robot videos below are repeated runs of Study 1.1; the simulations isolate changes in object location and reward.

Outcome categories: Completed; Recovered or partial; Limitation exposed.

Real Robot Experiments

In every applicable run, the robot first selected the table with the highest estimated reward.

Completed

Trial 1.1.1

The box was used as a transport container and the reward-carrying objects were collected.

Recovered

Trial 1.1.2

All reward-carrying objects were collected without using the available box. The robot also recovered from cable entanglement.

Partial recovery

Trial 1.1.3

After the robot loaded the box, its attempt to grasp the box failed. The deliberation system responded by returning to the box table.

Partial recovery

Trial 1.1.4

After the objects were inserted, the grasp of the loaded box failed. RAE+UPOM selected a return to the table to address the unresolved transport task.

Partial recovery

Trial 1.1.5

The robot loaded the box but could not grasp it afterwards. It returned to the box table as a recovery action.

Completed

Trial 1.1.6

The box was used as intended and the reward-carrying objects were collected.

Recovered, then limited

Trial 1.1.7

The system recovered after cable entanglement. A later box-grasp failure triggered a return to the box table.

Recovered

Trial 1.1.8

The robot recovered from cable entanglement and collected all reward-carrying objects, although it did not use the box.

Completed

Trial 1.1.9

The box served as the transport container and the reward-carrying objects were collected.

Grasp limitation

Trial 1.1.10

Objects were inserted into the box, but the robot could not grasp the loaded box afterwards.
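The recoveries in Trials 1.1.3 to 1.1.5 can be read in terms of RAE's retrial behaviour: when a refinement method fails during execution, RAE can try another applicable method for the same task. The sketch below illustrates only that idea. The task name `transport`, the methods `carry_loaded_box` and `return_to_box_table`, and the boolean state are hypothetical simplifications, not the operational models run on the robot; in RAE+UPOM the choice among methods would come from the UPOM planner rather than from a fixed list.

```python
from typing import Callable, Dict, List

# Hypothetical method type: takes a symbolic state, returns True on success.
Method = Callable[[Dict[str, bool]], bool]


def carry_loaded_box(state: Dict[str, bool]) -> bool:
    """Hypothetical method: succeeds only if the loaded box can be grasped."""
    return state["box_graspable"]


def return_to_box_table(state: Dict[str, bool]) -> bool:
    """Hypothetical fallback method: navigate back to the box table."""
    state["at_box_table"] = True
    return True


# Hypothetical method library: each task maps to candidate methods,
# in the order a planner might rank them.
METHODS: Dict[str, List[Method]] = {
    "transport": [carry_loaded_box, return_to_box_table],
}


def refine_with_retrial(task: str, state: Dict[str, bool]) -> List[str]:
    """Try each candidate method in order; after a failure, retry with the next one.

    Returns the names of the attempted methods; the last one succeeded.
    Raises RuntimeError if every candidate method fails.
    """
    attempted: List[str] = []
    for method in METHODS[task]:
        attempted.append(method.__name__)
        if method(state):
            return attempted
    raise RuntimeError(f"all methods failed for task {task!r}")


state = {"box_graspable": False, "at_box_table": False}
print(refine_with_retrial("transport", state))
# ['carry_loaded_box', 'return_to_box_table']
```

When the box can be grasped, the first method succeeds and no retrial happens; the fallback only runs after a failure, which mirrors the pattern reported for the box-grasp trials.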

Simulation Experiments

Controlled simulations test adaptation to changed object locations and changed rewards, including conditions with and without the transport box.

Adapted

Study 4.1: object moved ahead

An object was moved to an unvisited table. Perception updated the symbolic state and the robot later collected the object there.

Result: 3 of 3 objects collected.

Model assumption exposed

Study 4.2: object moved behind

An object was moved onto the table currently being processed. The robot left it behind, exposing an assumption in the operational model about objects that have already been processed.

Result: 2 of 3 objects collected.
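The contrast between Studies 4.1 and 4.2 can be reproduced in a toy model. The sketch below assumes, purely for illustration, that the objects to handle at a table are fixed when processing of that table starts and that processed tables are never revisited. This is one hypothetical way an assumption about already processed objects can arise, not the actual operational model; the table and object names are made up, and the resulting counts come from the toy, not from the experiments.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical world change: (index of table being processed, object, new table).
MoveEvent = Tuple[int, str, str]


def run_collection(tables: List[str], start: Dict[str, str], move: MoveEvent) -> Set[str]:
    locations = dict(start)  # symbolic state: object -> table
    collected: Set[str] = set()
    when, moved_obj, new_table = move
    for i, table in enumerate(tables):
        # Assumed model: the objects to handle here are fixed when processing starts.
        to_collect = [obj for obj, loc in locations.items() if loc == table]
        if i == when:
            # The object is moved while this table is processed;
            # perception updates the symbolic state.
            locations[moved_obj] = new_table
        for obj in to_collect:
            if locations[obj] == table:
                collected.add(obj)
        # Assumed model: the loop never returns to a processed table.
    return collected


TABLES = ["t1", "t2", "t3"]
START = {"a": "t1", "b": "t2", "c": "t3"}
ahead = run_collection(TABLES, START, (0, "c", "t2"))   # moved to an unvisited table
behind = run_collection(TABLES, START, (1, "c", "t2"))  # moved onto the current table
print(len(ahead), len(behind))  # 3 2
```

In the first case the updated state is read when the robot reaches `t2`, so the moved object is picked up; in the second, the list for `t2` was fixed before the move and `c` is no longer at `t3`, so it is left behind.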

Priority changed

Study 5.1: mustard reward increased

With no box available, increasing the mustard object's reward made the robot prioritize its table under the time-decaying utility function.

Result: 4 of 4 objects collected.

Priority changed

Study 5.2: multimeter reward increased

With the box available, the robot prioritized the multimeter and placed it in the transport container before handling other objects.

Result: 2 of 2 objects collected.
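The effect of raising one object's reward in Studies 5.1 and 5.2 can be illustrated with a small sketch. It assumes a hypothetical utility in which a reward collected at step t is worth `reward * GAMMA**t`, with one step per table, and it searches all visit orders exhaustively. The decay form, `GAMMA`, the table names, and the reward values are assumptions for illustration, not the utility function or values used in the experiments; UPOM itself estimates utility through Monte Carlo rollouts over the operational models rather than by enumerating orders.

```python
from itertools import permutations
from typing import Dict, Tuple

GAMMA = 0.9  # assumed per-step decay factor, not a value from the experiments


def discounted_utility(order: Tuple[str, ...], rewards: Dict[str, float]) -> float:
    """Utility of visiting tables in `order`, one time step per table.

    Assumed form: the reward collected at step t is worth reward * GAMMA**t.
    """
    return sum(rewards[table] * GAMMA ** t for t, table in enumerate(order, start=1))


def best_order(rewards: Dict[str, float]) -> Tuple[str, ...]:
    # Exhaustive search over visit orders; fine for a handful of tables.
    return max(permutations(rewards), key=lambda order: discounted_utility(order, rewards))


# Hypothetical total reward per table.
baseline = {"mustard_table": 2.0, "table_b": 4.0, "table_c": 3.0}
boosted = dict(baseline, mustard_table=6.0)  # mustard reward increased

print(best_order(baseline)[0])  # table_b
print(best_order(boosted)[0])   # mustard_table
```

Because later steps are discounted more heavily, the best order under this assumed utility visits tables in decreasing order of reward, so raising one table's reward far enough moves it to the front.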

Summary plots

Across the physical runs

Real-robot outcomes

The robot consistently chose the highest-reward table. It recovered from every observed cable entanglement and from a navigation failure, while missed perception and loaded-box grasping remained unresolved limitations.

11/11: highest-reward table selected first
5/5: cable entanglements recovered
1/1: navigation failure recovered

Across controlled scenarios

Simulation outcomes

The simulations show successful replanning when changes occur in unexplored parts of the world, as well as reward-sensitive prioritization. Moving an object onto the table currently being processed reveals where the operational model needs refinement.

11/12: objects collected across the four simulation studies
3/3: objects collected when one was moved ahead
2/2: reward changes produced the expected priority
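The simulation total follows directly from the per-study results reported for Studies 4.1 to 5.2; the snippet below only sums those reported counts.

```python
# Per-study results as reported: (objects collected, objects present).
SIMULATION_RESULTS = {
    "4.1 object moved ahead": (3, 3),
    "4.2 object moved behind": (2, 3),
    "5.1 mustard reward increased": (4, 4),
    "5.2 multimeter reward increased": (2, 2),
}

collected = sum(c for c, _ in SIMULATION_RESULTS.values())
present = sum(p for _, p in SIMULATION_RESULTS.values())
print(f"{collected}/{present} objects collected")  # 11/12 objects collected
```

The single missed object is the one left behind in Study 4.2.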