Simulation Loop
The simulation loop in dojo brings everything together through an iterative process that runs on a per-block basis.
At each step, the environment emits an observation and the agents' rewards. The policy processes these and generates a sequence of actions. The actions are passed to the environment, which emits a new observation reflecting the new state of the protocol, along with the agents' rewards for taking those actions.
Basic Pattern
This is the basic pattern of the simulation loop:
- First, the environment emits an initial observation to the agent, which represents the state of the environment.
- Then the agent takes in the observation and makes decisions based on its policy. It also computes its reward from the observation. If you are testing your strategy, this reward is simply a way of measuring your strategy's performance.
- The environment executes the actions and moves forward in time to the next block.
- The simulation loop keeps repeating this cycle until it reaches the end block.
If you want a reminder of some of the concepts here, take a closer look at the environment, the agent or the policy as you see fit.
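The cycle described above can be sketched in a few lines of Python. This is a minimal, self-contained illustration and does not use the dojo API: `ToyEnvironment`, `ToyAgent`, `ToyPolicy` and `run_simulation` are hypothetical names, and the price dynamics are invented purely so the loop has something to act on.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    """Snapshot of the (toy) environment state at one block."""
    block: int
    price: float


class ToyEnvironment:
    """Hypothetical stand-in for an environment; NOT the dojo API."""

    def __init__(self, start_block: int, end_block: int) -> None:
        self.block = start_block
        self.end_block = end_block
        self.price = 100.0

    def reset(self) -> Observation:
        # Emit the initial observation.
        return Observation(self.block, self.price)

    def step(self, actions: List[str]) -> Observation:
        # Execute the actions, then move forward to the next block.
        for action in actions:
            self.price += 1.0 if action == "buy" else -1.0
        self.block += 1
        return Observation(self.block, self.price)

    def done(self) -> bool:
        return self.block >= self.end_block


class ToyAgent:
    """Hypothetical agent whose reward is simply the observed price."""

    def reward(self, obs: Observation) -> float:
        return obs.price


class ToyPolicy:
    """Hypothetical policy: buy on even blocks, sell on odd blocks."""

    def __init__(self, agent: ToyAgent) -> None:
        self.agent = agent

    def predict(self, obs: Observation) -> List[str]:
        return ["buy"] if obs.block % 2 == 0 else ["sell"]


def run_simulation(env: ToyEnvironment, policy: ToyPolicy) -> List[float]:
    """Repeat observe -> reward -> act -> step until the end block."""
    rewards: List[float] = []
    obs = env.reset()
    while not env.done():
        rewards.append(policy.agent.reward(obs))  # agent computes its reward
        actions = policy.predict(obs)             # policy decides on actions
        obs = env.step(actions)                   # environment advances one block
    return rewards


env = ToyEnvironment(start_block=0, end_block=4)
rewards = run_simulation(env, ToyPolicy(ToyAgent()))
```

In this sketch the loop body runs once per block, so `rewards` ends up with one entry for each block between the start block and the end block.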
Example
This is a visualization of the above simulation:
Saving data
One way to store data is through the dashboard. The dashboard allows you to save all data in JSON format on a per-block basis. You can load it into an empty dashboard later, or read the JSON for further processing. You can also turn a db file produced by your simulation directly into JSON format by using `dojo.external_data_providers.exports.json_convertor`. Consult the documentation for more details.
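As a rough illustration of reading saved JSON for further processing, the sketch below uses only the Python standard library. The file layout (a top-level `blocks` list with `block` and `reward` fields) and the `load_rewards` helper are assumptions made for this example; the actual structure of the exported file may differ, so inspect your own export before relying on any field names.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical per-block layout, invented for this example only.
# The real export may use different keys and nesting.
sample = {
    "blocks": [
        {"block": 100, "reward": 1.5},
        {"block": 101, "reward": 2.0},
    ]
}


def load_rewards(path):
    """Return a {block number: reward} mapping from a JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    return {entry["block"]: entry["reward"] for entry in data["blocks"]}


# Write the sample to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "simulation.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    rewards_by_block = load_rewards(path)
```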