Hi there, I'm new to HGT and your work inspired me a lot! I just have three quick questions:
1. Considering that n layers are used, how can I obtain a single attention score from a source node to a target node? At the moment I can only output the attention score per head in each layer.
2. Is there a way to determine how many layers to use, and should the attention score of the same source-target pair show any trend between the l-th and (l+1)-th layers? For example, does the attention score tend to shrink as the layer index increases?
3. How should I set the learning rate? Will it affect the results a lot?
Thanks in advance!
For the first question: I think `q_mat * k_mat * self.relation_pri[relation_type] / self.sqrt_dk` is the expression that calculates the attention score per edge, so you could print it out instead of applying the "sum" if you want to visualize it.
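If it helps, here is a minimal sketch (not code from the repo) of how you might collapse per-head, per-layer scores into one value per source-target edge, assuming you modify each conv layer's `forward()` to store or return its per-edge attention as a tensor of shape `[num_edges, n_heads]`:

```python
import torch

def aggregate_edge_attention(att_per_layer):
    """Collapse per-head, per-layer attention into one score per edge.

    att_per_layer: list of tensors, one per HGT layer, each of
    (hypothetical) shape [num_edges, n_heads] -- e.g. the normalised
    scores you print/store inside each layer's forward pass.
    Returns a tensor of shape [num_edges].
    """
    per_layer = [att.mean(dim=-1) for att in att_per_layer]  # [num_edges] each
    return torch.stack(per_layer, dim=0).mean(dim=0)         # average over layers

# Example with dummy data: 2 layers, 5 edges, 8 heads.
if __name__ == "__main__":
    atts = [torch.rand(5, 8).softmax(dim=0) for _ in range(2)]
    print(aggregate_edge_attention(atts))  # one score per edge
```

Averaging over heads and layers is just one choice; you could also keep the scores per layer if you want to inspect how they change with depth (your second question).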
For the second question: this is a hyper-parameter tuning problem, and I cannot give you a universal answer; it depends on the dataset and task.
For the third question: the learning rate is also a hyper-parameter. Normally for Adam, 1e-3 is a good default choice; tuning it might change the results a little.
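As a minimal sketch of that default (the `Linear` stand-in is hypothetical; swap in your own HGT model):

```python
import torch

model = torch.nn.Linear(16, 4)  # stand-in for your HGT model

# 1e-3 is the usual Adam/AdamW starting point; sweep around it
# (e.g. 1e-4 to 5e-3) only if training looks unstable or too slow.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```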