
Amit's Proposal

amitsingh19975 edited this page Jun 4, 2019 · 5 revisions

Abstract

The tensor class lacks compile-time extents and strides, which I am going to add as static extents and static strides; the choice between the static and dynamic variants is left to the user. Another feature the tensor lacks is the ability to specify the storage type, be it sparse or band, so I am going to provide a few customization points through which users can pass their own storage policy. Finally, I am going to design device APIs for GPUs and CPUs, which will help increase the speed of computation.

Proposal

The title of my proposal is Design Policy And Improve The Design Of Tensor.

What is a Design Policy?

Policy-based design is a great way for library authors to provide more flexibility to the user. Instead of hard coding certain behaviors, policy-based design provides various policies the users can select to customize the behavior. If done properly, a library author can accommodate all use cases with a single implementation. It was first popularized in C++ by Andrei Alexandrescu with Modern C++ Design.

For more info, see Andrei Alexandrescu's Modern C++ Design.

For example:

template<typename LanguagePolicy>
struct Book : LanguagePolicy {};

Static Extents

Static extents store the extents at compile time, which reduces runtime overhead and binary size. They also increase the speed of tensor computation, since all extent computation is already done at compile time. I'm going to provide the same APIs as the current dynamic extents because I want a seamless transition between them: the user won't be able to distinguish which is which. The design is based on [kokkos mdspan](https://github.com/kokkos/array_ref/blob/master/reference/include/mdspan), which is beautifully designed and executed.

Example code :-

static_extents<4> e(1,2,3,4); // static_extents ==> <1,2,3,4>
static_extents<4,1,2,3,4> f;  // static_extents ==> <1,2,3,4>

where the first template argument defines the rank of the extents.

Three cases arise from combining static and dynamic extents:

  1. static rank and static extents
  2. static rank and dynamic extents
  3. dynamic rank and dynamic extents

shape_t is a way to express all three cases. Example code :-

auto e = shape_t<4,1,2,3,4>(); // static rank and static extents
auto f = shape_t<4>(1,2,3,4); // static rank and dynamic extents
auto f2= shape_t<4,1,dynamic_extent,dynamic_extent,4>(2,3); // static rank, mixed static and dynamic extents
auto g = shape_t<dynamic_shape>{1,2,3,4}; // dynamic rank and dynamic extents

Static Stride

It is similar to static extents, but it stores strides, which come in two layouts: column-major (first_order) and row-major (last_order).

Example code :-

auto f = static_stride<static_extents<4,1,2,3,4>, first_order>();
auto l = static_stride<static_extents<4,1,2,3,4>, last_order>();

Storage Type

It is a way to make the tensor store data in a specific way and retrieve it when needed. There are three ways to store data:

  1. Dense tensors store values in a contiguous, sequential block of memory in which every value is represented, which makes them heavy on memory. If the non-zero elements greatly outnumber the zero elements, dense storage is preferred, as there is no gain in using another storage type. Because the memory is contiguous it caches well, making operations faster than with other containers.

  2. Sparse tensors are large tensors in which zero elements greatly outnumber non-zero elements, so it is faster to perform computations by iterating only over the non-zero elements. They are compressed using various algorithms or data structures such as CSR, maps, etc.

  3. Band tensors are sparse tensors whose non-zero entries are confined to a diagonal band, comprising the main diagonal and zero or more diagonals on either side. They are stored similarly to sparse tensors.

Device Policy or Execution Policy

It is a way to tell the tensor how to execute a tensor operation and where to perform it; for example, you may want to perform the operation on the CPU and in parallel.

Example code (not the final API; it is just to give the gist of the device policy and how it is going to work) :-

template<typename ExecutionPolicy = device::cpu::parallel>
void do_something();