v1.0.3 release
[Bug fix] Fixed an issue where, in some cases with padding, the SDPA backward node could produce NaNs.
[Bug fix] In some older CUDA toolkits, e.g. CUDA 11.4, float-to-half conversion is not implicit. This was raised in PR-57. Thanks @drisspg for reporting this. This patch implements a more explicit fix using __float2half.
[Enhancement] Accepted GitHub PR-55. Thanks @r-barnes for the suggestion.
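
For context on the float-to-half fix above, here is a minimal sketch of an explicit conversion with __float2half. It is not the code from this patch: the kernel name scale_to_half, its parameters, and the test values are hypothetical and chosen only for illustration.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel (not from this release): scales float inputs and
// stores the results in half precision.
__global__ void scale_to_half(const float* in, __half* out, float scale, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // Writing `out[i] = in[i] * scale;` relies on an implicit
        // float -> __half conversion, which some older toolkits do not
        // provide. The explicit intrinsic avoids that dependency.
        out[i] = __float2half(in[i] * scale);
    }
}

int main() {
    const int n = 4;
    float h_in[n] = {0.5f, 1.0f, 1.5f, 2.0f};  // illustrative inputs

    float* d_in = nullptr;
    __half* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(__half));
    cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);

    scale_to_half<<<1, n>>>(d_in, d_out, 2.0f, n);

    // Copy the half results back and widen them explicitly for printing.
    __half h_out[n];
    cudaMemcpy(h_out, d_out, n * sizeof(__half), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) {
        printf("%f\n", __half2float(h_out[i]));
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Spelling out the conversion in both directions (__float2half and __half2float) keeps the code independent of whichever implicit conversion operators a given toolkit version defines.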