v0.7.0
Release Notes:
cudnn_frontend v0.7 targets the new features introduced in cuDNN v8.5 (https://developer.nvidia.com/cudnn). The following are the changes in the v0.7 release.
[New API] Added support for Resample operation.
[New API] Tensor class has a clone method which allows a user to quickly create a new Tensor object with similar attributes.
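As a hedged sketch of the new clone facility: the builder attribute names below follow the v0.7 Tensor builder API, but the exact clone() signature (in particular whether it takes the new tensor id as a parameter) should be verified against cudnn_frontend_Tensor.h; dims and strides are illustrative.

```cpp
#include <cudnn_frontend.h>

void make_tensors() {
    int64_t dim[]    = {8, 64, 56, 56};
    int64_t stride[] = {64 * 56 * 56, 1, 64 * 56, 64};  // NHWC-style strides, illustrative
    auto x = cudnn_frontend::TensorBuilder()
                 .setDim(4, dim)
                 .setStride(4, stride)
                 .setId('x')
                 .setAlignment(16)
                 .setDataType(CUDNN_DATA_HALF)
                 .build();
    // clone() copies attributes such as dims, strides, alignment, and data type;
    // only the id is assumed to need changing for the new tensor (assumed parameter).
    auto y = x.clone(/*new id*/ 'y');
}
```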
[New API] Added support for new pointwise operations CUDNN_POINTWISE_ERF, CUDNN_POINTWISE_GELU_APPROX_TANH_FWD, CUDNN_POINTWISE_GELU_APPROX_TANH_BWD, CUDNN_POINTWISE_IDENTITY.
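A minimal sketch of building a descriptor for one of the new modes, assuming the v0.7 PointWiseDescBuilder API (setMode/setComputeType); the chosen mode and compute type are illustrative.

```cpp
#include <cudnn_frontend.h>

auto make_gelu_fwd_desc() {
    // Build a pointwise descriptor for the new tanh-approximation GELU forward mode.
    return cudnn_frontend::PointWiseDescBuilder()
        .setMode(CUDNN_POINTWISE_GELU_APPROX_TANH_FWD)
        .setComputeType(CUDNN_DATA_FLOAT)
        .build();
}
```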
[New API] Several API names have been unified and made consistent across multiple descriptors for readability.
setComputePrecision/setMathPrecision/setMathType have been unified into setComputeType in cudnn_frontend_ConvDesc.h, cudnn_frontend_MatMulDesc.h, cudnn_frontend_Operation.h, cudnn_frontend_PointWiseDesc.h, cudnn_frontend_ReductionDesc.h, cudnn_frontend_Resample.h
Operation descriptors such as ConvDesc and ResampleDesc now use getSpatialDimCount instead of getDimCount, to avoid confusion with tensor dimensions.
Accessors for array attributes follow the [g,s]et[Spatial] pattern. The [Spatial] prefix is only needed when the attribute is common to both the Tensor descriptor and an Operation descriptor; currently only the Stride and DimCount attributes have this ambiguity.
setArray functions take a size and a pointer as arguments, e.g. setStride(int dim, int64_t* arr), setSpatialStride(int dim, int64_t* arr).
getArray functions return a pointer to the array, whose size is given by getDimCount or getSpatialDimCount.
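The renamings above can be sketched as follows; this is a hedged example assuming the v0.7 ConvDescBuilder accessor names, with illustrative padding/stride/dilation values.

```cpp
#include <cudnn_frontend.h>

cudnn_frontend::ConvDesc_v8 make_conv_desc() {
    int64_t pad[]      = {1, 1};
    int64_t stride[]   = {1, 1};
    int64_t dilation[] = {1, 1};
    return cudnn_frontend::ConvDescBuilder()
        .setComputeType(CUDNN_DATA_FLOAT)     // unified: was setComputePrecision/setMathType
        .setMathMode(CUDNN_CROSS_CORRELATION)
        .setSpatialDimCount(2)                // [Spatial] disambiguates from tensor DimCount
        .setSpatialStride(2, stride)          // size + pointer, per the setArray convention
        .setPrePadding(2, pad)
        .setPostPadding(2, pad)
        .setDilation(2, dilation)
        .build();
}
```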
[Minor Enhancement] Execution plans and operation graphs print out more information in their describe() methods.
[Bug Fixes] Some samples have been updated to iterate over all fallback configs to ensure that a successful plan is built.
[Bug Fixes] Execution plans wrongly initialized the numerical note CUDNN_NUMERICAL_NOTE_TYPE_TENSOR_CORE. This has been fixed.
[Samples] Added a new sample that scales and biases two tensors, adds them, and applies a ReLU operation, to show how fused operations work.
[Samples] Added a sample to demonstrate how the resample operation works.
[Samples] Added a new sample which shows convolution followed by multiple scales.
[Samples] Added a sample to show Fully Connected Layer fused with GeLU forward.
[Samples] Added a new sample to show fused backward activation, backward bias, and backward data grad operations.
The current FE is designed to be compatible with all minor releases in the cuDNN 8.x series.
v0.7.1
[Enhancement] Additional commit to remove an extraneous include of cudnn_ops_infer.h.
v0.7.2
[Enhancement] Fixed issues in the code which caused warnings in MSVC and clang compilers.
[Enhancement] Fixed errors in get_heuristics_list where, for certain heuristics modes in older cuDNN versions, the returned heuristics list could be incorrect.
[Bug fixes] Fixed several test cases to exit gracefully on unsupported GPUs.
[Samples] Added a sample to showcase fp8 convolution forward on NVIDIA Hopper GPUs. The sample also showcases post-convolution book-keeping operations such as scaling and absolute maximum reduction.
[Samples] Added a sample which converts fp16 tensor to fp8 and performs transpose and absolute maximum reduction.
[Samples] Added a sample to demonstrate a max pooling operation, including dumping the index tensor needed to speed up the backward pass.
[Samples] Added a sample to showcase the backward pooling operation.