
- Comprises two tasks: classifying protein sequences by EC number (Task 1) and retrieving enzyme sequences given a chemical reaction (Task 2)
- Proposes Contrastive Reaction-EnzymE Pretraining (CREEP), which leverages multimodal contrastive learning for Task 2 (covered in the next thread); a minimal retrieval sketch follows below
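A rough sketch of what Task 2 retrieval looks like once a reaction encoder and a protein encoder have been aligned in a shared embedding space (as contrastive pretraining like CREEP aims to do). The encoder callables and their signatures are assumptions for illustration, not the paper's API; ranking is plain cosine similarity over candidate sequences.

```python
import torch
import torch.nn.functional as F

def retrieve_enzymes(reaction_smiles, candidate_sequences,
                     reaction_encoder, protein_encoder, top_k=5):
    """Rank candidate enzyme sequences by similarity to a query reaction.

    `reaction_encoder` / `protein_encoder` are hypothetical callables that map
    a list of inputs to a (N, d) embedding tensor in a shared space.
    """
    # Embed the query reaction and all candidate enzyme sequences.
    q = F.normalize(reaction_encoder([reaction_smiles]), dim=-1)    # (1, d)
    db = F.normalize(protein_encoder(candidate_sequences), dim=-1)  # (N, d)
    # On L2-normalized vectors, cosine similarity is just a dot product.
    scores = (q @ db.T).squeeze(0)                                  # (N,)
    top = torch.topk(scores, k=min(top_k, len(candidate_sequences)))
    return [(candidate_sequences[i], scores[i].item()) for i in top.indices.tolist()]
```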

- rxnfp as the reaction encoder: a BERT-style language model trained on reactions represented as SMILES/SMARTS strings
- ProtT5 as the protein sequence encoder
- SciBERT for encoding additional textual descriptions derived from the Gene Ontology
- Adopts the EBM-NCE objective to maximize mutual information across all three modalities (sketched below)
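A minimal EBM-NCE sketch under the assumption that the three encoder outputs (rxnfp reactions, ProtT5 proteins, SciBERT text) have already been projected to a common dimension. The pairwise combination and the dot-product energy are illustrative choices, not necessarily the paper's exact formulation: EBM-NCE treats aligned pairs as positives and in-batch mismatches as negatives in a binary NCE loss.

```python
import torch
import torch.nn.functional as F

def ebm_nce_pair(x, y):
    """EBM-NCE between two modalities: aligned rows are positives,
    all in-batch cross pairings serve as negatives."""
    logits = x @ y.T                              # (B, B) energy matrix
    pos = torch.diagonal(logits)                  # aligned (positive) pairs
    neg_mask = ~torch.eye(len(x), dtype=torch.bool, device=x.device)
    neg = logits[neg_mask]                        # mismatched (negative) pairs
    loss_pos = F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos))
    loss_neg = F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg))
    return 0.5 * (loss_pos + loss_neg)

def tri_modal_loss(reaction_emb, protein_emb, text_emb):
    # Normalize each modality, then sum the pairwise EBM-NCE terms.
    r, p, t = (F.normalize(e, dim=-1) for e in (reaction_emb, protein_emb, text_emb))
    return ebm_nce_pair(r, p) + ebm_nce_pair(r, t) + ebm_nce_pair(p, t)
```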