Acting and Planning with Hierarchical Operational Models on a Mobile Robot
How to read this page
The experiments probe whether RAE+UPOM selects high-utility actions, adapts when the world changes, and recovers when execution fails. The real-robot videos below are repeated runs of Study 1.1; the simulations isolate changes in object location and reward.
In every applicable run, the robot first selected the table with the highest estimated reward.
The box was used as a transport container and the reward-carrying objects were collected.
All reward-carrying objects were collected without using the available box. The robot also recovered from cable entanglement.
After the robot loaded the box, its grasp on the box failed. The deliberation system responded by returning to the box table.
The post-insertion box grasp failed. RAE+UPOM selected a return to the table to address the unresolved transport task.
The robot loaded the box but could not grasp it afterwards. It returned to the box table as a recovery action.
The box was used as intended and the reward-carrying objects were collected.
The system recovered after cable entanglement. A later box-grasp failure triggered a return to the box table.
The robot recovered from cable entanglement and collected all reward-carrying objects, although it did not use the box.
The box served as the transport container and the reward-carrying objects were collected.
Objects were inserted into the box, but the robot could not grasp the loaded box afterwards.
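The recovery pattern in these runs (a failed grasp of the loaded box, followed by a return to the box table) can be pictured as a loop that falls back to another method when one fails. The sketch below is a simplified illustration under stated assumptions, not the RAE+UPOM implementation: `refine`, the method names, and the hard-coded success and failure outcomes are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: each method is a (name, callable) pair whose
# callable returns True on success and False on failure.
Method = Tuple[str, Callable[[], bool]]


def refine(task: str, methods: List[Method]) -> List[str]:
    """Try candidate methods in order until one succeeds; return a trace."""
    trace: List[str] = []
    for name, run in methods:
        succeeded = run()
        trace.append(f"{task}:{name}:{'ok' if succeeded else 'failed'}")
        if succeeded:
            break
    return trace


def grasp_loaded_box() -> bool:
    # Assumption: models the loaded-box grasp failure seen in several runs.
    return False


def return_to_box_table() -> bool:
    # Assumption: the fallback action always succeeds in this toy example.
    return True


trace = refine(
    "transport_box",
    [
        ("grasp_loaded_box", grasp_loaded_box),
        ("return_to_box_table", return_to_box_table),
    ],
)
```

Here the candidate order is fixed by hand for clarity; in the system described on this page, the choice of what to try next is made by the deliberation system rather than by a static list.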
Controlled simulations test adaptation to changed object locations and changed rewards, including conditions with and without the transport box.
An object was moved to an unvisited table. Perception updated the symbolic state and the robot later collected the object there.
Result: 3 of 3 objects collected.
An object was moved onto the table currently being processed. The robot left it behind, exposing an assumption about already processed objects.
Result: 2 of 3 objects collected.
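One hypothetical way the two relocation outcomes could arise is sketched below. It assumes that the robot snapshots its targets when it arrives at a table and never revisits a table it has marked as processed; the source does not state that this is the actual mechanism. `SymbolicState`, `process_table`, `run`, and the object and table names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class SymbolicState:
    location: Dict[str, str]                          # object -> table, as last perceived
    processed: Set[str] = field(default_factory=set)  # tables already handled
    collected: Set[str] = field(default_factory=set)  # objects already picked up


def pending_tables(state: SymbolicState) -> List[str]:
    """Unprocessed tables that still hold an uncollected object."""
    return sorted({
        table for obj, table in state.location.items()
        if obj not in state.collected and table not in state.processed
    })


def process_table(state: SymbolicState, table: str,
                  relocations: Optional[Dict[str, str]] = None) -> None:
    # Assumption: targets are fixed on arrival, before any relocation is seen.
    targets = [o for o, t in state.location.items()
               if t == table and o not in state.collected]
    # Perception updates the symbolic state while the robot works here.
    state.location.update(relocations or {})
    state.collected.update(targets)
    state.processed.add(table)  # the table is never revisited afterwards


def run(initial: Dict[str, str],
        relocations_during: Dict[str, Dict[str, str]]) -> int:
    """Process pending tables in sorted order; return the number collected."""
    state = SymbolicState(dict(initial))
    while pending_tables(state):
        table = pending_tables(state)[0]
        process_table(state, table, relocations_during.get(table))
    return len(state.collected)


initial = {"obj_a": "table_1", "obj_b": "table_2", "obj_c": "table_3"}
moved_to_unvisited = run(initial, {"table_1": {"obj_c": "table_2"}})
moved_to_current = run(initial, {"table_1": {"obj_c": "table_1"}})
```

Under these assumptions, an object moved to an unvisited table is still picked up when that table is processed, while an object moved onto the current table arrives after the snapshot and is skipped because the table is then marked as processed.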
With no box available, increasing the mustard object's reward made the robot prioritize its table under the time-decaying utility function.
Result: 4 of 4 objects collected.
With the box available, the robot prioritized the multimeter and placed it in the transport container before handling other objects.
Result: 2 of 2 objects collected.
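The effect of a reward change on visiting order can be illustrated with a small calculation. The page does not give the form of the time-decaying utility function, so the sketch below assumes an exponential discount per visited table; the function names, reward values, table names, and decay factor are all hypothetical.

```python
from itertools import permutations
from typing import Dict, Tuple


def order_utility(order: Tuple[str, ...], rewards: Dict[str, float],
                  decay: float = 0.9) -> float:
    """Sum of rewards, each discounted by decay**k for the k-th table visited."""
    return sum(rewards[table] * decay ** k
               for k, table in enumerate(order, start=1))


def best_order(rewards: Dict[str, float], decay: float = 0.9) -> Tuple[str, ...]:
    """Exhaustively search all visiting orders (fine for a handful of tables)."""
    return max(permutations(sorted(rewards)),
               key=lambda order: order_utility(order, rewards, decay))


# Assumed reward totals per table; only the mustard table's reward changes.
baseline = {"mustard_table": 5.0, "table_b": 8.0, "table_c": 3.0}
boosted = {**baseline, "mustard_table": 20.0}

baseline_order = best_order(baseline)
boosted_order = best_order(boosted)
```

Because later visits are discounted more heavily, the highest-reward table moves to the front of the order once its reward is raised, which mirrors the prioritization reported in the reward scenarios. A real planner would also account for travel and manipulation time, which this sketch treats as uniform.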
Across the physical runs
The robot consistently chose the highest-reward table. It recovered from every observed cable entanglement and from a navigation failure, while missed perception and loaded-box grasping remained unresolved limitations.
Across controlled scenarios
The simulations show reward-sensitive prioritization and successful replanning when changes occur in unexplored parts of the world. Moving an object onto the current table reveals where the operational model needs refinement.