Hi there, I'm new to HGT and your work inspired me a lot! I just have three quick questions:
1. Considering that n layers are used, how can I obtain a single attention score from a source node to a target node? At the moment I can only output the attention score per head in each layer.
2. Is there a way to determine how many layers to use, and should the attention score of the same source-target pair show any trend between the l-th and (l+1)-th layers? For example, does the attention score tend to shrink as the layer index increases?
3. How should I set the learning rate? Will it affect the results a lot?
Thanks in advance!
For the first question: I think `q_mat * k_mat * self.relation_pri[relation_type] / self.sqrt_dk` is the expression that calculates the attention score per edge, so you could print it out instead of applying the "sum" if you want to visualize it.
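If it helps, here is a minimal sketch (not code from the repo) of how you might collapse per-head, per-layer scores into one value per source-target edge, assuming you modify each conv layer's `forward()` to store or return its per-edge attention as a tensor of shape `[num_edges, n_heads]`:

```python
import torch

def aggregate_edge_attention(att_per_layer):
    """Collapse per-head, per-layer attention into one score per edge.

    att_per_layer: list of tensors, one per HGT layer, each of
    (hypothetical) shape [num_edges, n_heads] -- e.g. the normalised
    scores you print/store inside each layer's forward pass.
    Returns a tensor of shape [num_edges].
    """
    per_layer = [att.mean(dim=-1) for att in att_per_layer]  # [num_edges] each
    return torch.stack(per_layer, dim=0).mean(dim=0)         # average over layers

# Example with dummy data: 2 layers, 5 edges, 8 heads.
if __name__ == "__main__":
    atts = [torch.rand(5, 8).softmax(dim=0) for _ in range(2)]
    print(aggregate_edge_attention(atts))  # one score per edge
```

Averaging over heads and layers is just one choice; you could also keep the scores per layer if you want to inspect how they change with depth (your second question).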
For the second question: this is a hyper-parameter tuning problem, and I cannot give you a universal answer; it depends on the dataset and task.
For the third question: the learning rate is also a hyper-parameter. Normally for Adam, 1e-3 is a good default choice; tuning it might change the results a little.
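As a minimal sketch of that default (the `Linear` stand-in is hypothetical; swap in your own HGT model):

```python
import torch

model = torch.nn.Linear(16, 4)  # stand-in for your HGT model

# 1e-3 is the usual Adam/AdamW starting point; sweep around it
# (e.g. 1e-4 to 5e-3) only if training looks unstable or too slow.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```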