dm_robotics.panda.run_loop module

Simple run loop for an agent and environment.

Extended to support real-time execution.

dm_robotics.panda.run_loop.run(environment, agent, observers, max_steps, real_time: bool = False)[source]

Runs the agent ‘in’ the environment.

The loop is:
  1. The environment is reset, producing a state.
  2. The agent is given that state and produces an action.
  3. That action is given to the environment.
  4. The environment produces a new state.
  5. Go to step 2.

At most max_steps are demanded from the agent.

The environment can produce three types of step:
  • FIRST: The first step in an episode. The next step will be MID or LAST.
  • MID: A step that is neither the first nor the last.
  • LAST: The last step in this episode. The next step will be FIRST.

Depending on the type of step emitted by the environment, the observers have different methods called:
  • FIRST: observer.begin_episode(0)
  • MID: observer.step(0, env_timestep, agent_action)
  • LAST: observer.end_episode(0, 0, env_timestep)

The agent_action passed to observer.step is the action the agent emitted given env_timestep. At the time the observer is called, the action has not yet been given to the environment.

When a LAST timestep is received, the agent is given that timestep, but the action it emits is discarded.
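The behaviour described above can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: StepType, TimeStep, CountingEnv, EchoAgent, RecordingObserver and sketch_run are hypothetical stand-ins written for this example, and the method names environment.reset(), environment.step(action) and agent.step(timestep) are assumptions. Only the observer calls are taken from the text.

```python
import enum
from typing import Any, NamedTuple


class StepType(enum.Enum):
    FIRST = 0
    MID = 1
    LAST = 2


class TimeStep(NamedTuple):
    step_type: StepType
    observation: Any


class CountingEnv:
    """Toy environment (hypothetical): an episode ends after `episode_len` steps."""

    def __init__(self, episode_len: int = 2):
        self._episode_len = episode_len
        self._t = 0

    def reset(self) -> TimeStep:
        self._t = 0
        return TimeStep(StepType.FIRST, self._t)

    def step(self, action: Any) -> TimeStep:
        self._t += 1
        kind = StepType.LAST if self._t >= self._episode_len else StepType.MID
        return TimeStep(kind, self._t)


class EchoAgent:
    """Toy agent (hypothetical): returns the observation as its action."""

    def step(self, timestep: TimeStep) -> Any:
        return timestep.observation


class RecordingObserver:
    """Records which observer method was called for each timestep."""

    def __init__(self):
        self.events = []

    def begin_episode(self, agent_id):
        self.events.append("begin")

    def step(self, agent_id, timestep, action):
        self.events.append(("step", timestep.observation, action))

    def end_episode(self, agent_id, term_reason, timestep):
        self.events.append("end")


def sketch_run(environment, agent, observers, max_steps):
    """Illustrative loop only; follows the prose description, not the real source."""
    timestep = environment.reset()
    for _ in range(max_steps):
        # The agent sees every timestep, including LAST ones.
        action = agent.step(timestep)
        # Observers are called before the action reaches the environment.
        for observer in observers:
            if timestep.step_type is StepType.FIRST:
                observer.begin_episode(0)
            elif timestep.step_type is StepType.MID:
                observer.step(0, timestep, action)
            else:
                observer.end_episode(0, 0, timestep)
        if timestep.step_type is StepType.LAST:
            # The action emitted for a LAST timestep is discarded.
            timestep = environment.reset()
        else:
            timestep = environment.step(action)
```

With a two-step episode and max_steps=4, the observer in this sketch sees one complete episode followed by the start of a second one, because the step budget counts agent steps rather than episodes.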

Parameters:
  • environment – The environment to run the agent “in”.

  • agent – The agent that produces actions.

  • observers – A sequence of observers; see the description above for the methods called on each observer.

  • max_steps – The maximum number of times to step the agent.

  • real_time – If True, throttles the loop to run in real time.
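The documentation does not say how the real_time throttling is done. One common approach, shown here purely as an assumption, is to pad each loop iteration with a sleep so that it lasts at least one control period. The Throttle class, its period argument and the injectable clock and sleep functions are hypothetical names introduced for this sketch.

```python
import time
from typing import Callable


class Throttle:
    """Pads loop iterations so each lasts at least `period` seconds (hypothetical helper)."""

    def __init__(
        self,
        period: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self._period = period
        self._clock = clock
        self._sleep = sleep
        self._deadline = clock() + period

    def wait(self) -> float:
        """Sleeps until the current deadline and returns the time slept."""
        remaining = self._deadline - self._clock()
        if remaining > 0:
            self._sleep(remaining)
        # Schedule the next deadline from "now", so an overrun is not
        # repaid by rushing through the following iterations.
        self._deadline = self._clock() + self._period
        return max(remaining, 0.0)
```

In a loop, wait() would be called once per iteration after the environment has been stepped. If an iteration takes longer than the period, this sketch does not sleep at all, so the loop can fall behind real time but never runs ahead of it.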