Simulation Loop

The simulation loop in dojo brings everything together through an iterative process on a per-block basis.
At each step, the environment emits an observation and the agents' rewards. The policy processes these and generates a sequence of actions, which is passed back to the environment. The environment then emits a new observation reflecting the new state of the protocol, along with the agents' rewards for taking those actions.

[Figure: agent-environment loop]

Basic Pattern

This is the basic pattern of the simulation loop (a code sketch follows the list):

  1. First, the environment emits an initial observation to the agent, which represents the current state of the environment.
  2. The agent takes in the observation and makes decisions based on its policy. It also computes its reward based on the observation.
  3. If you are testing your strategy, this reward is simply a way of measuring your strategy's performance.
  4. If you are training your strategy, the agent uses the reward to optimize its parameters based on the state-action-reward transition.
  5. The environment executes the actions and moves forward in time to the next block.
  6. At each step in the loop, a termination condition is checked. This could be a terminal state, for example when the agent runs out of money.
  7. The simulation loop keeps repeating this cycle until a predefined stopping condition is met.
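
The sketch below shows this pattern as a generic, gym-style loop. The method names (reset, step, predict) and return values are illustrative assumptions, not dojo's exact API; in practice, backtest_run drives this loop for you.

def run_simulation(env, policies, max_blocks=1_000):
    # Illustrative agent-environment loop. The gym-style method names here
    # are assumptions, not dojo internals; backtest_run does this for you.
    obs = env.reset()  # step 1: initial observation
    for _ in range(max_blocks):  # step 7: predefined stopping condition
        actions = []
        for policy in policies:
            # step 2: each policy maps the observation to actions
            actions.extend(policy.predict(obs))
        # steps 5-6: execute the actions, advance one block, check termination
        obs, rewards, done = env.step(actions)
        if done:  # e.g. a terminal state such as running out of money
            break
    return rewards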

If you want a refresher on any of the concepts here, take a closer look at the environment, the agent, or the policy as you see fit.

Example

simulation.py
# The imports below are indicative and may differ between dojo versions.
from decimal import Decimal

import dateparser

from dojo.environments import UniV3Env  # assumed module path
from dojo.runners import backtest_run  # assumed module path

# UniV3PoolWealthAgent, MovingAveragePolicy and PassiveConcentratedLP are
# assumed to be importable from dojo or defined in your own project (in the
# dojo demos, some of these classes live in local modules).

pools = ["USDC/WETH-0.05"]
start_time = dateparser.parse("2021-06-21 00:00:00 UTC")
end_time = dateparser.parse("2021-06-21 12:00:00 UTC")
 
# Agents
agent1 = UniV3PoolWealthAgent(
  initial_portfolio={
      "ETH": Decimal(100),
      "USDC": Decimal(10_000),
      "WETH": Decimal(1),
  },
  name="TraderAgent",
)
agent2 = UniV3PoolWealthAgent(
  initial_portfolio={"USDC": Decimal(10_000), "WETH": Decimal(1)},
  name="LPAgent",
)
 
# Simulation environment (Uniswap V3)
env = UniV3Env(
  date_range=(start_time, end_time),
  agents=[agent1, agent2],
  pools=pools,
  backend_type="local",
  market_impact="replay",
)
 
# Policies
mvag_policy = MovingAveragePolicy(agent=agent1, short_window=25, long_window=100)
passive_lp_policy = PassiveConcentratedLP(
  agent=agent2, lower_price_bound=0.95, upper_price_bound=1.05
)
 
sim_blocks, sim_rewards = backtest_run(
  env, [mvag_policy, passive_lp_policy], dashboard_port=8051, auto_close=True
)
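
backtest_run returns the simulated blocks and the per-block rewards. As a small usage sketch, assuming sim_blocks and sim_rewards are parallel per-block sequences (an assumption about their structure; verify against your dojo version), you could inspect them like this:

# Assumes sim_blocks and sim_rewards are parallel per-block sequences.
for block, rewards in zip(sim_blocks, sim_rewards):
    print(f"block {block}: rewards {rewards}")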

This is a visualization of the above simulation:

[Figure: simulation visualization]

Saving data

For the moment, the most comprehensive way to store data is to also launch the dashboard. (We will make that part standalone shortly.)
The dashboard allows you to save all data in JSON format on a per-block basis. You can load it into an empty dashboard later, or read the JSON for further processing.
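
For example, here is a minimal post-processing sketch, assuming the export is a single JSON file keyed by block number (the file name and schema are illustrative; inspect your own export to confirm its structure):

import json

# Load a dashboard export for further processing. The file name and the
# block-keyed schema are assumptions for illustration only.
with open("simulation_data.json") as f:
    data = json.load(f)

for block_number, block_data in data.items():
    print(block_number, block_data)  # e.g. extract agent rewards per block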