
- Investigate RL algorithms for protein sequence design, using a protein language model (pLM) as the reward function
- Use ESMFold as the oracle pLM, and distill it into a smaller model that serves as the proxy reward model
- Train the proxy model on the Atlas dataset with a mean squared error regression objective on pTM scores, fine-tuning it periodically on sequences and their oracle scores
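
The oracle/proxy loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's implementation: `toy_oracle` is a made-up stand-in for ESMFold's pTM score, `ProxyReward` is a linear model on amino-acid frequencies rather than a distilled neural network, and `sample_sequences` substitutes random sequences for both the Atlas dataset and the RL policy. All of these names are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def featurize(seq):
    """Amino-acid frequencies plus a constant bias feature (21 values).

    Assumes a non-empty sequence.
    """
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS] + [1.0]


def toy_oracle(seq):
    """Hypothetical stand-in for an ESMFold pTM score; NOT the real model."""
    frac_al = (seq.count("A") + seq.count("L")) / len(seq)
    return 0.2 + 0.6 * frac_al  # stays inside [0.2, 0.8]


class ProxyReward:
    """Linear proxy trained with a mean squared error objective (assumed form)."""

    def __init__(self):
        self.w = [0.0] * (len(AMINO_ACIDS) + 1)

    def predict(self, seq):
        return sum(w * x for w, x in zip(self.w, featurize(seq)))

    def fit(self, seqs, scores, epochs, lr=0.3):
        """Full-batch gradient descent on MSE, warm-started from self.w."""
        feats = [featurize(s) for s in seqs]
        for _ in range(epochs):
            grad = [0.0] * len(self.w)
            for x, y in zip(feats, scores):
                err = sum(w * xi for w, xi in zip(self.w, x)) - y
                for j, xj in enumerate(x):
                    grad[j] += 2.0 * err * xj / len(feats)
            self.w = [w - lr * g for w, g in zip(self.w, grad)]


def mse(model, seqs, scores):
    """Mean squared error of the proxy against oracle scores."""
    return sum((model.predict(s) - y) ** 2 for s, y in zip(seqs, scores)) / len(seqs)


def sample_sequences(rng, n, length=30, al_weight=1.0):
    """Stand-in for a dataset or an RL policy: random, optionally A/L-biased."""
    weights = [al_weight if a in "AL" else 1.0 for a in AMINO_ACIDS]
    return ["".join(rng.choices(AMINO_ACIDS, weights=weights, k=length))
            for _ in range(n)]


def run_demo(seed=0, rounds=3):
    rng = random.Random(seed)
    model = ProxyReward()

    # 1) Initial distillation: fit the proxy to oracle scores on a fixed dataset.
    seqs = sample_sequences(rng, 200)
    scores = [toy_oracle(s) for s in seqs]
    model.fit(seqs, scores, epochs=300)

    history = []
    for r in range(rounds):
        # 2) Assumed policy drift: later rounds sample more A/L-rich sequences.
        new_seqs = sample_sequences(rng, 50, al_weight=2.0 + r)
        seqs += new_seqs
        scores += [toy_oracle(s) for s in new_seqs]  # query the oracle

        # 3) Periodic fine-tuning on everything labelled so far.
        before = mse(model, seqs, scores)
        model.fit(seqs, scores, epochs=100)
        history.append((before, mse(model, seqs, scores)))
    return model, history


if __name__ == "__main__":
    _, history = run_demo()
    for i, (before, after) in enumerate(history):
        print(f"round {i}: buffer MSE {before:.6f} -> {after:.6f}")
```

Each round records the proxy's error on the labelled buffer before and after fine-tuning. In a real setup the expensive oracle call would be batched and made only periodically, while the RL algorithm queries the cheap proxy at every step.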