`matmul` or `tensordot` in GPU #500

scarlehoff · 2023-08-11T12:32:50Z

I'm currently working on a project for which we need to* run some operations with Galois fields on a GPU. The main non-trivial operation we need to perform would be te equivalent of a np.tensordot. I've read in some other issues that there has been some talk about porting this code to GPU (like here) so we were wondering whether there's a chance you already have some WIP code or beta version we could test out?

If you already tried and found some specific problems that made it impossible / very difficult that would be interesting to know as well to find some alternatives (or just abandon the idea of running that part of the calculation on GPU).

Thank you very much!

Edit: (sorry, pressed enter too soon :P)

*or better, where we would benefit from...

The text was updated successfully, but these errors were encountered:

mhostetter · 2023-08-11T18:15:21Z

I don't have any good WIP code to share.

The thing I quickly discovered was CuPy arrays are not subclasses of np.ndarray. So the overridden hooks in FieldArray wouldn't trigger on a CuPy array. I'll need some other solution for a FieldArray on GPU. But these are issues with the API and interface for users.

Regarding the algorithm porting, it should be doable. But the API might not be as pretty.

What's your use-case field? I can point you to some small changes that might enable testing on a GPU.

scarlehoff · 2023-08-12T08:51:56Z

An use-case example would be:

import galois
import numpy as np

p = 2**31 - 19

FF = galois.GF(p)

ar1 = np.random.randint(1, p, size=40).reshape(4, 5, 2)
ar2 = np.random.randint(1, p, size=90).reshape(5, 9, 2)

f1 = FF(ar1)
f2 = FF(ar2)

final_result = np.tensordot(f1, f2, axes=[(2, 1), (2, 0)])

Our preferred use-case would be something like this. Of course this could also be achieved with

t1 = f1.reshape(4, 10)
t2 = np.transpose(f2, (0,2,1)).reshape(10, 9)
tt = t1 @ t2

so matmul is enough (tensordor would just be "more convenient", einsum even better :P)

We have a batch dimension so one of the axis would be some large number of events (hence the benefit from the gpu)

mhostetter · 2023-08-12T11:37:15Z

With the exception of the tensordot() call, the above code currently runs. However, it's on CPU. I'm guessing in your desired use case you'd use CuPy?

Also, a tip: If you have large arrays, it's better to use f1 = ar1.view(FF) rather than f1 = FF(ar1). view() is copy-less while FF() uses a copy. See https://mhostetter.github.io/galois/latest/basic-usage/array-creation/#numpy-array

scarlehoff · 2023-08-21T10:14:59Z

I'm guessing in your desired use case you'd use CuPy?

Yes, CuPy would be nice. But we are very flexible in the choice of library (the more numpy-like the interface is the better, of course, but it is just a minor inconvenience).

scarlehoff changed the title ~~matmul~~ matmul or tensordot in GPU Aug 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`matmul` or `tensordot` in GPU #500

`matmul` or `tensordot` in GPU #500

scarlehoff commented Aug 11, 2023 •

edited

Loading

mhostetter commented Aug 11, 2023

scarlehoff commented Aug 12, 2023

mhostetter commented Aug 12, 2023

scarlehoff commented Aug 21, 2023

matmul or tensordot in GPU #500

matmul or tensordot in GPU #500

Comments

scarlehoff commented Aug 11, 2023 • edited Loading

mhostetter commented Aug 11, 2023

scarlehoff commented Aug 12, 2023

mhostetter commented Aug 12, 2023

scarlehoff commented Aug 21, 2023

`matmul` or `tensordot` in GPU #500

`matmul` or `tensordot` in GPU #500

scarlehoff commented Aug 11, 2023 •

edited

Loading