Accelerating protein engineering with fitness landscape modeling and reinforcement learning

µFormer is pre-trained on UniRef50 with a pairwise masked language model objective (next thread).
Fine-tune and evaluate the model on FLIP and ProteinGym (random split 🤔) using three scoring modules: residue-level (capable of handling indels), motif-level, and sequence-level.
Use µFormer as a reward model for protein optimization, framing the search as a Markov decision process in which each step applies a one-point mutation. Train the mutation-site and mutation-type policy networks using PPO with Dirichlet noise.
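
A rough sketch of the mutation loop described above, to make the MDP framing concrete. Everything here is an assumption for illustration, not the paper's implementation: `toy_reward` is a hypothetical stand-in for µFormer, the two policies are uniform placeholders instead of trained networks, the Dirichlet noise is assumed to be mixed into the action probabilities for exploration (AlphaZero-style, with made-up `alpha` and `eps` values), and the PPO update itself is left out.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dirichlet(alpha, n, rng):
    """Sample a symmetric Dirichlet(alpha) vector via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def add_dirichlet_noise(probs, rng, alpha=0.3, eps=0.25):
    """Mix action probabilities with Dirichlet noise (assumed exploration scheme)."""
    noise = dirichlet(alpha, len(probs), rng)
    return [(1 - eps) * p + eps * z for p, z in zip(probs, noise)]


def toy_reward(seq):
    """Hypothetical stand-in for the reward model: fraction of alanines."""
    return seq.count("A") / len(seq)


def step(seq, rng, site_probs, type_probs):
    """One MDP transition: choose a site, choose a residue, apply the point mutation."""
    site_weights = add_dirichlet_noise(site_probs, rng)
    type_weights = add_dirichlet_noise(type_probs, rng)
    site = rng.choices(range(len(seq)), weights=site_weights)[0]
    residue = rng.choices(AMINO_ACIDS, weights=type_weights)[0]
    mutant = seq[:site] + residue + seq[site + 1:]
    return mutant, toy_reward(mutant)


def rollout(seq, n_steps, seed=0):
    """Run an episode of point mutations; returns a list of (sequence, reward) pairs."""
    rng = random.Random(seed)
    # Uniform placeholders for the mutation-site and mutation-type policies.
    site_probs = [1 / len(seq)] * len(seq)
    type_probs = [1 / len(AMINO_ACIDS)] * len(AMINO_ACIDS)
    trajectory = []
    for _ in range(n_steps):
        seq, reward = step(seq, rng, site_probs, type_probs)
        trajectory.append((seq, reward))
    return trajectory
```

Calling `rollout("MKTAYIAKQR", 5)` yields five successive single-point mutants with their toy rewards. In an actual setup the collected trajectory would feed a PPO update of the two policy networks, and the reward would come from the fine-tuned model instead of the toy function.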