Reinforcement Learning for Sequence Design Leveraging Protein Language Models

Investigate RL algorithms for protein sequence design using a protein language model (pLM) as the reward function
Use ESMFold as the oracle pLM, and distill it into a smaller model that serves as the proxy reward model
Train the proxy model on the Atlas dataset with a mean squared error regression objective on pTM scores, periodically fine-tuning it on sequences and their oracle scores
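
The proxy training loop described above can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the actual setup: `toy_oracle` is a hypothetical stand-in for ESMFold's pTM score, random sequences stand in for both the Atlas dataset and the sequences collected during RL, and the proxy is a linear model on amino-acid composition rather than a distilled pLM. Only the structure (fit with mean squared error, then periodically fine-tune on newly scored sequences) is meant to carry over.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def featurize(seq):
    """Amino-acid composition (fraction of each residue) plus a bias term."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS] + [1.0]

def toy_oracle(seq):
    """Hypothetical stand-in for the oracle's pTM score, in [0, 1]. NOT ESMFold."""
    return sum(seq.count(a) for a in "AILMFVW") / len(seq)

def predict(weights, seq):
    """Proxy reward: a linear function of the composition features."""
    return sum(w * x for w, x in zip(weights, featurize(seq)))

def mse(weights, seqs, scores):
    """Mean squared error between proxy predictions and oracle scores."""
    return sum((predict(weights, s) - y) ** 2 for s, y in zip(seqs, scores)) / len(seqs)

def fit(weights, seqs, scores, lr=0.5, epochs=200):
    """Full-batch gradient descent on the mean squared error."""
    weights = list(weights)
    feats = [featurize(s) for s in seqs]
    n = len(seqs)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(feats, scores):
            err = sum(w * xi for w, xi in zip(weights, x)) - y
            for j, xi in enumerate(x):
                grad[j] += 2.0 * err * xi / n
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights

def random_seq(rng, length=30):
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

rng = random.Random(0)

# Initial regression on a labelled set (assumption: random sequences
# stand in for the Atlas dataset and its pTM labels).
train_seqs = [random_seq(rng) for _ in range(64)]
train_scores = [toy_oracle(s) for s in train_seqs]
weights = [0.0] * (len(AMINO_ACIDS) + 1)
initial_loss = mse(weights, train_seqs, train_scores)
weights = fit(weights, train_seqs, train_scores)
trained_loss = mse(weights, train_seqs, train_scores)

# Periodic fine-tuning: score a fresh batch with the oracle, add it to
# the training set, and continue training from the current weights.
for _ in range(3):
    new_seqs = [random_seq(rng) for _ in range(16)]  # assumption: placeholder for newly collected sequences
    train_seqs += new_seqs
    train_scores += [toy_oracle(s) for s in new_seqs]
    weights = fit(weights, train_seqs, train_scores, epochs=50)
final_loss = mse(weights, train_seqs, train_scores)
```

In this sketch the expensive oracle is called only once per new sequence, while `predict` is the cheap function an RL algorithm would query as its reward between fine-tuning rounds.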