
- Propose a single-inference pseudo-log-likelihood (PLL) calculation for ESM-2-type protein language models (pLMs), using the pretraining mask rates and token likelihoods
- Use influence functions to study how a single training protein sequence impacts model performance, finding that highly homologous sequences exert the most influence
- Finetune the model over homologous sequences to boost fitness prediction
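
For context on the first point: a conventional PLL masks each position in turn and sums the log-probability of the true residue, which costs one forward pass per residue. The sketch below illustrates that baseline next to a plain single-pass score. It is only an illustration under stated assumptions: `toy_model`, `masked_pll`, and `single_pass_score` are hypothetical names, the model is a uniform stand-in rather than ESM-2, and the mask-rate correction that the single-inference method applies is not reproduced here.

```python
import math
from typing import Callable, Dict, List, Sequence

MASK = "<mask>"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical interface: tokens in, one {residue: probability} dict per position out.
Model = Callable[[Sequence[str]], List[Dict[str, float]]]


def toy_model(tokens: Sequence[str]) -> List[Dict[str, float]]:
    """Stand-in for a masked pLM: uniform over the 20 amino acids everywhere."""
    p = 1.0 / len(AMINO_ACIDS)
    return [{aa: p for aa in AMINO_ACIDS} for _ in tokens]


def masked_pll(seq: str, model: Model) -> float:
    """Conventional PLL: mask one position at a time (len(seq) forward passes)."""
    total = 0.0
    for i, aa in enumerate(seq):
        tokens = list(seq)
        tokens[i] = MASK              # hide the residue being scored
        probs = model(tokens)         # one forward pass per position
        total += math.log(probs[i][aa])
    return total


def single_pass_score(seq: str, model: Model) -> float:
    """Uncorrected score from one unmasked forward pass.

    Assumption: a real single-inference PLL would adjust these
    probabilities using the pretraining mask rates; that step is omitted.
    """
    probs = model(list(seq))          # a single forward pass for all positions
    return sum(math.log(probs[i][aa]) for i, aa in enumerate(seq))
```

With the uniform stand-in the two scores coincide; with a real masked language model they generally differ, since the model can see the residue it is scoring in the unmasked pass.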