Edge-augmented Graph Transformers: Global Self-attention is Enough for Graphs

--do_train --do_eval \
--train_batch_size 64 --num_train_epochs 50 \
--embeddings_learning_rate 0.7e-4 --encoder_learning_rate 0.7e-4 --classifier_learning_rate 7e-4 \
--warmup_steps 200 --max_seq_len 132 --dropout_rate 0.15 \
--metric_key_for_early_stop "macro avg__f1-score__level_2" \
--logging_steps 200 --patience 6 \
--label2freq_level_1_dir /data2/code/DaguanFengxian/bert_model/data/label2freq_level_1.json \
--label2freq_level_2_dir /data2/code/DaguanFengxian/bert_model/data/label2freq_level_2.json \
--processor_sep "\t" --loss_fct_name dice
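The `--loss_fct_name dice` flag selects a Dice loss for the classification head, a common choice for the kind of label imbalance that the `label2freq_*` files describe. The training script itself is not shown here, so the following is only a minimal sketch of what a multi-class Dice loss of this form typically looks like in PyTorch; the class name `DiceLoss` and the `smooth` parameter are illustrative assumptions, not the repo's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DiceLoss(nn.Module):
    """Minimal multi-class Dice loss sketch (illustrative; the repo's
    `--loss_fct_name dice` implementation may differ in detail)."""

    def __init__(self, smooth: float = 1.0):
        super().__init__()
        self.smooth = smooth  # additive smoothing to avoid division by zero

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # logits: (batch, num_classes); targets: (batch,) with class indices
        probs = F.softmax(logits, dim=-1)
        one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()

        # Per-class soft overlap between predicted probabilities and gold labels
        intersection = (probs * one_hot).sum(dim=0)
        cardinality = probs.sum(dim=0) + one_hot.sum(dim=0)

        dice = (2.0 * intersection + self.smooth) / (cardinality + self.smooth)
        return 1.0 - dice.mean()  # macro-averaged Dice turned into a loss


# Usage sketch: loss = DiceLoss()(model_logits, gold_label_ids)
```

Compared with cross-entropy, a Dice-style objective optimizes an F1-like overlap directly, which is why it pairs naturally with the macro-F1 early-stopping metric configured above.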