Uses GVP to encode local structures into dense vectors, and k-means clustering to create a codebook of discrete tokens
Trains MLM with both sequences and strucutre tokens with disentangled attention