Policies

The agent follows "policies" to make decisions.

A policy defines the mapping from environment observations to the actions the agent takes.

Within Dojo, you can think of the policy as the agent’s DeFi strategy. This is where you can get creative by implementing your own strategy!


Purpose

Policies generally provide the following functionality:

  1. Testing: the policy’s predict(obs) method is called to generate the sequence of actions to execute in the next block.
  2. Training: the policy’s fit(*args, **kwargs) method can be implemented and run to optimize the policy parameters, for example to maximize the agent’s reward function, using whatever optimization framework you like.

Policy Implementation

To test out your own DeFi policy, all you need to do is implement the predict(obs) method.
It takes the observation object from the environment as input and returns a list of actions as output.
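
In its simplest form, a policy subclass just fills in predict (and, optionally, fit for training). The skeleton below is a minimal sketch to show the shape of the interface; the class name is illustrative and the import paths are assumptions that may differ between dojo versions.

skeleton.py
from typing import List

from dojo.actions import BaseAction  # import paths may differ between dojo versions
from dojo.policies import BasePolicy


class DoNothingPolicy(BasePolicy):
    # illustrative policy: observes the environment but never trades
    def predict(self, obs) -> List[BaseAction]:
        # inspect obs here and return the actions you want to execute
        return []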

Example 1: Test your DeFi strategy

In this example, we consider a basic policy for trading on UniswapV3: we define an upper and a lower spot-price limit, and once the price reaches either limit, all tokens are converted into the currently lower-valued token:

policy.py
from decimal import Decimal
from typing import List

# import paths may differ between dojo versions
from dojo.actions import BaseAction
from dojo.agents import BaseAgent
from dojo.environments.uniswapV3 import UniV3Obs, UniV3Trade
from dojo.policies import BasePolicy


class PriceWindowPolicy(BasePolicy):
    def __init__(
        self, agent: BaseAgent, lower_limit: float, upper_limit: float
    ) -> None:
        super().__init__(agent=agent)
        self.upper_limit = upper_limit
        self.lower_limit = lower_limit

    # derive actions from observations
    def predict(self, obs: UniV3Obs) -> List[BaseAction]:
        pool = obs.pools[0]
        x_token, y_token = obs.pool_tokens(pool)

        # spot price of the x token, quoted in units of the y token
        spot_price = obs.price(token=x_token, unit=y_token, pool=pool)

        x_quantity, y_quantity = self.agent.quantity(x_token), self.agent.quantity(
            y_token
        )

        # price broke the upper limit: trade away the agent's entire y-token balance
        if spot_price > self.upper_limit and y_quantity > Decimal("0"):
            action = UniV3Trade(
                agent=self.agent,
                pool=pool,
                quantities=(Decimal("0"), y_quantity),
            )
            return [action]

        # price broke the lower limit: trade away the agent's entire x-token balance
        if spot_price < self.lower_limit and x_quantity > Decimal("0"):
            action = UniV3Trade(
                agent=self.agent,
                pool=pool,
                quantities=(x_quantity, Decimal("0")),
            )
            return [action]

        # price is still inside the window: no action
        return []
 
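To run this policy, construct it with an agent and plug it into a dojo simulation. The wiring below is only a sketch: MyUniV3Agent and the 0.95/1.05 limits are hypothetical, and how observations reach predict depends on your dojo setup.

agent = MyUniV3Agent(...)  # hypothetical BaseAgent subclass holding both pool tokens
policy = PriceWindowPolicy(agent=agent, lower_limit=0.95, upper_limit=1.05)

# during a simulation, dojo feeds each new observation to the policy:
actions = policy.predict(obs)  # [] inside the window, one UniV3Trade outside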

Example 2: Train your DeFi strategy

If you want to take it one step further, Dojo allows you to encode a parametric model in your policy and optimize it however you want.

To show you how, we take the static policy from Example 1 and make the upper and lower limits trainable parameters, so you can tune them to improve the performance of your strategy.

To start with, we might reason that when volatility is high, the limits should be further apart. Let’s implement the simplest version of this idea, starting with the state we need:

train.py
# the upper and lower limits are now trainable parameters of the policy
class DynamicPriceWindowPolicy(PriceWindowPolicy):
    def __init__(
        self, agent: BaseAgent, lower_limit: float, upper_limit: float
    ) -> None:
        super().__init__(agent=agent, lower_limit=lower_limit, upper_limit=upper_limit)
        self.old_price = 0.0  # last observed spot price, used to compute returns
        self.spread = self.upper_limit - self.lower_limit
        self.center = (self.upper_limit + self.lower_limit) / 2
        self.returns: List[float] = []  # realized returns collected during the run
 
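The class above only sets up the state; it does not yet adjust anything. One minimal way to close the loop is sketched below. It is an illustration, not part of dojo: the VOL_SCALE constant, the WINDOW length, and the update rule are all assumptions you would tune or replace.

class VolatilityWindowPolicy(DynamicPriceWindowPolicy):
    VOL_SCALE = 10.0  # assumed sensitivity of the spread to volatility
    WINDOW = 20  # assumed number of recent returns used for the estimate

    def fit(self) -> None:
        import statistics

        # widen the window when recent volatility is high, narrow it when low
        if len(self.returns) < 2:
            return  # not enough data for a volatility estimate yet
        vol = statistics.stdev(self.returns[-self.WINDOW :])
        spread = self.spread * (1.0 + self.VOL_SCALE * vol)
        self.lower_limit = self.center - spread / 2
        self.upper_limit = self.center + spread / 2

    def predict(self, obs: UniV3Obs) -> List[BaseAction]:
        pool = obs.pools[0]
        x_token, y_token = obs.pool_tokens(pool)
        spot_price = float(obs.price(token=x_token, unit=y_token, pool=pool))

        # record the realized return since the previous observation
        if self.old_price > 0.0:
            self.returns.append(spot_price / self.old_price - 1.0)
        self.old_price = spot_price

        self.fit()  # re-derive the limits before deciding on an action
        return super().predict(obs)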

These are just examples of testing and training policies to get you started. You can get a lot more creative and sophisticated than this.