Hi,
Can you shed some light on why you use
loss, loss_dict = self.loss(att, clf_logits, data.y, epoch)
rather than
loss, loss_dict = self.loss(att_log_logits, clf_logits, data.y, epoch)?
Although the injected noise is what makes the attention stochastic, it can shift the attention scores significantly, so computing the KL divergence against the Q distribution on the noisy scores seems like a poor signal for tuning the MLP output.
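To make the question concrete, here is a minimal sketch of what I understand the two quantities to be. The function names and the prior `r` are my own placeholders, not the repo's actual API: the attention is sampled by adding logistic noise to the raw logits before the sigmoid (a concrete/Gumbel-style relaxation), and the information loss is a KL divergence between the (noisy) edge attention and a Bernoulli prior:

```python
import torch

def sample_attention(att_log_logits, temp=1.0, training=True):
    # Concrete relaxation (sketch): logistic noise is added to the raw
    # logits before the sigmoid, which is what makes `att` stochastic.
    if training:
        u = torch.rand_like(att_log_logits)
        noise = torch.log(u) - torch.log(1 - u)  # Logistic(0, 1) noise
        return torch.sigmoid((att_log_logits + noise) / temp)
    # At eval time, no noise: deterministic sigmoid of the logits.
    return torch.sigmoid(att_log_logits)

def info_loss(att, r=0.5, eps=1e-6):
    # KL( Bernoulli(att) || Bernoulli(r) ), averaged over edges.
    return (att * torch.log(att / r + eps)
            + (1 - att) * torch.log((1 - att) / (1 - r) + eps)).mean()
```

Under this reading, calling `info_loss(att)` penalizes the post-noise samples, whereas `info_loss(torch.sigmoid(att_log_logits))` would penalize the deterministic scores directly, which is the distinction the question is about.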
Thanks and Regards