
- Train EnzymeFlow in three stages, using PDB backbone data, PDBBind, and the EnzymeFill dataset, which includes both substrate and final-product information
- Curate the EnzymeFill dataset from Rhea, MetaCyc, and BRENDA, using AlphaFill to identify catalytic pockets by transplanting ligands from homologous proteins. After debiasing at a 60% homology threshold, it contains 53,483 enzyme-reaction pairs
- Use coEvoFormer to model enzyme-reaction co-evolution from a multiple sequence alignment (MSA) of enzymes paired with reaction SMILES. It encodes substrates with 3D GNNs and products with 2D GNNs, and uses cross-attention to model interactions between the catalytic pocket and the substrate and product molecules
- Introduce a pocket-specific, CLIP-like model that aligns enzyme pockets with reactions through joint embeddings: catalytic pockets are embedded with ESM3, and reactions (substrate and product molecular graphs) with MAT
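The pocket-molecule cross-attention in coEvoFormer can be illustrated with a minimal NumPy sketch of single-head scaled dot-product cross-attention. It assumes pocket residue embeddings act as queries and substrate or product atom embeddings (e.g. from the GNN encoders) act as keys and values. The helper name, dimensions, shared projection weights, and random inputs are hypothetical; the real layer likely adds multiple heads, residual connections, and normalization.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(pocket, molecule, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (hypothetical helper).

    pocket:   (n_residues, d_model) queries from catalytic-pocket residues
    molecule: (n_atoms, d_model) keys/values from a substrate or product encoder
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    Returns updated pocket features (n_residues, d_head) and attention weights.
    """
    q = pocket @ w_q
    k = molecule @ w_k
    v = molecule @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # (n_residues, n_atoms)
    attn = softmax(scores, axis=-1)           # each residue's weights over atoms sum to 1
    return attn @ v, attn

rng = np.random.default_rng(0)
d_model, d_head = 16, 8
pocket = rng.normal(size=(32, d_model))     # e.g. 32 pocket residues
substrate = rng.normal(size=(20, d_model))  # e.g. 20 substrate atoms
product = rng.normal(size=(18, d_model))    # e.g. 18 product atoms
# Projections are shared between substrate and product here only for brevity
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))

pocket_from_substrate, attn_s = cross_attention(pocket, substrate, w_q, w_k, w_v)
pocket_from_product, attn_p = cross_attention(pocket, product, w_q, w_k, w_v)
print(pocket_from_substrate.shape, attn_s.shape)  # (32, 8) (32, 20)
```

Because each residue attends over every atom, the attention matrix shows which substrate or product atoms a given pocket residue draws information from.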
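A CLIP-like pocket-reaction alignment is typically trained with a symmetric contrastive (InfoNCE) loss. The minimal NumPy sketch below assumes each batch row pairs a pocket embedding with its matching reaction embedding and that both have already been projected into a shared d-dimensional space (the projection heads on top of ESM3 and MAT features are not shown). The temperature of 0.07 and the random inputs are illustrative assumptions, not values taken from EnzymeFlow.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale each embedding to unit length so dot products become cosine similarities
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def clip_loss(pocket_emb, reaction_emb, temperature=0.07):
    """Symmetric InfoNCE (CLIP-style) loss over a batch of matched pairs.

    pocket_emb:   (batch, d) projected pocket embeddings
    reaction_emb: (batch, d) projected reaction embeddings
    Row i of each array is assumed to be a matching pocket-reaction pair.
    """
    p = l2_normalize(pocket_emb)
    r = l2_normalize(reaction_emb)
    logits = p @ r.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_p2r = -log_softmax(logits, axis=1)[idx, idx].mean()  # pocket -> reaction
    loss_r2p = -log_softmax(logits, axis=0)[idx, idx].mean()  # reaction -> pocket
    return (loss_p2r + loss_r2p) / 2

rng = np.random.default_rng(0)
pockets = rng.normal(size=(8, 32))
reactions_matched = pockets + 0.05 * rng.normal(size=(8, 32))  # nearly aligned pairs
reactions_random = rng.normal(size=(8, 32))                    # unrelated pairs
loss_matched = clip_loss(pockets, reactions_matched)
loss_random = clip_loss(pockets, reactions_random)
print(loss_matched, loss_random)
```

Well-aligned pairs should give a much lower loss than unrelated ones; minimizing this loss pulls each pocket toward its own reaction and away from the other reactions in the batch.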