FFT using pyvkfft and use loopy callables #114
Merged
67 commits
ad39de6 Use a separate class for M2L translation (isuruf)
9cb4caa Fix docs and caching (isuruf)
c5bc9b2 Fix p2p warning (isuruf)
88c9fd8 Use VkFFT for M2L generate data (isuruf)
bab8f38 Fix profiling events (isuruf)
68864a2 simplify m2l data zeros (isuruf)
03cc94e Add pyvkfft to requirements (isuruf)
c440950 Fix flake8 warning (isuruf)
78faead Fix typo (isuruf)
4f33038 VkFFT for M2L preprocess local (isuruf)
efb6d99 vkfft for postprocess local (isuruf)
2e1b10e Fix AggregateProfilingEvent (isuruf)
59d7be6 Fix another typo (isuruf)
72b4875 M2L Translation Factory (isuruf)
83f7fd8 vim markers (isuruf)
bed782f Merge branch 'isuruf/m2l' into fft (isuruf)
7cf5404 Fix tests (isuruf)
584d2c9 Fix toys (isuruf)
6f5ad1f Fix test_m2l_toeplitz (isuruf)
47a4a27 Fix more tests (isuruf)
60ef708 Use a better rscale to get the test passing (isuruf)
85e0ed1 Use pytential dev branch (isuruf)
9f74eec Merge branch 'isuruf/m2l' into fft (isuruf)
9162d17 Merge branch 'main' into m2l (isuruf)
3880661 Merge branch 'isuruf/m2l' of https://github.com/inducer/sumpy into fft (isuruf)
a57f727 remove whitespace on blank line (isuruf)
ea6a99c Try 2r/order instead of r/order (isuruf)
df991c6 Merge branch 'isuruf/m2l' of https://github.com/inducer/sumpy into fft (isuruf)
4eaad6a fix using updated pytential (isuruf)
9119880 Merge branch 'main' of https://github.com/inducer/sumpy into fft (isuruf)
21236d7 Fix tests (isuruf)
9ddd3a9 use pytential branch with pyvkfft req (isuruf)
e5dea13 Add explanation about caller being responsible for the FFT (isuruf)
52de95d Fix for bessel (isuruf)
3ddc0db Merge branch 'main' into fft (inducer)
bec642e Add pyvkfft to setup.py reqs (isuruf)
e6d62a4 use list comprehension (isuruf)
774f869 Type annotations (isuruf)
2c8a5bf fix vim marker (isuruf)
61ddc0b remove unused function (isuruf)
50f8bb3 m2l_data_inner -> m2l_data (isuruf)
c3eaa32 more descriptive name for child_knl (isuruf)
9cc214e knl -> expr_knl for clarity (isuruf)
7d9f535 move loop unroll to optimized (isuruf)
07c1c93 Add explanation about translation_classes_dependent_data_loopy_knl (isuruf)
25dd7fc make coeffs output only and rewrite (isuruf)
1252c5a Re-arrange m2l so that event processing is easier (isuruf)
3b3f6f3 flake8: single quotes -> double quotes (isuruf)
8e9649f Fix data not being input (isuruf)
a66a7cc make args to cached_vkfft_app explicit (isuruf)
fff0d13 cache vkfftapp in wrangler (isuruf)
e80c71d keep coeffs is_input and is_output for e2e (isuruf)
8d72d66 out-of-place fft (isuruf)
98b4bcc Use a separate queue for configuration (isuruf)
d7da927 Merge branch 'main' into fft (isuruf)
3e9632b allocate array for out-of-place (isuruf)
aafe8b2 fix typo (isuruf)
d4bfc05 Remove caching of opencl fft app (isuruf)
899006e Comment out pytentual fork (isuruf)
a0942ed fix vkfft queues (isuruf)
a5996a5 use private API for now (isuruf)
da7f310 Merge branch 'main' into fft (isuruf)
bf083c9 Merge branch 'main' into fft (isuruf)
741a1cc Add comment on pyvkfft PR (isuruf)
15df148 Merge branch 'main' into fft (inducer)
b8af2bd remove inplace (isuruf)
e073d73 Merge branch 'main' into fft (isuruf)
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,3 +15,4 @@ dependencies: | |
- python-symengine | ||
- pyfmmlib | ||
- pyrsistent | ||
- pyvkfft |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,7 +41,10 @@ | |
E2EFromChildren, E2EFromParent, | ||
M2LGenerateTranslationClassesDependentData, | ||
M2LPreprocessMultipole, M2LPostprocessLocal) | ||
from sumpy.tools import to_complex_dtype | ||
from sumpy.tools import (to_complex_dtype, AggregateProfilingEvent, | ||
run_opencl_fft, get_opencl_fft_app) | ||
|
||
from typing import TypeVar, List, Union | ||
|
||
|
||
# {{{ tree-independent data for wrangler | ||
|
@@ -176,6 +179,11 @@ def p2p(self): | |
exclude_self=self.exclude_self, | ||
strength_usage=self.strength_usage) | ||
|
||
@memoize_method | ||
def opencl_fft_app(self, shape, dtype, inplace): | ||
with cl.CommandQueue(self.cl_context) as queue: | ||
return get_opencl_fft_app(queue, shape, dtype, inplace) | ||
|
||
# }}} | ||
|
||
|
||
|
@@ -184,16 +192,28 @@ def p2p(self): | |
_SECONDS_PER_NANOSECOND = 1e-9 | ||
|
||
|
||
""" | ||
EventLike objects have an attribute native_event that returns | ||
a cl.Event that indicates the end of the event. | ||
""" | ||
EventLike = TypeVar("CLEventLike") | ||
|
||
|
||
class UnableToCollectTimingData(UserWarning): | ||
pass | ||
|
||
|
||
class SumpyTimingFuture: | ||
|
||
def __init__(self, queue, events): | ||
def __init__(self, queue, events: List[Union[cl.Event, EventLike]]): | ||
self.queue = queue | ||
self.events = events | ||
|
||
@property | ||
def native_events(self) -> List[cl.Event]: | ||
return [evt if isinstance(evt, cl.Event) else evt.native_event | ||
for evt in self.events] | ||
|
||
@memoize_method | ||
def result(self): | ||
from boxtree.timing import TimingResult | ||
|
@@ -208,7 +228,7 @@ def result(self): | |
return TimingResult(wall_elapsed=None) | ||
|
||
if self.events: | ||
pyopencl.wait_for_events(self.events) | ||
pyopencl.wait_for_events(self.native_events) | ||
|
||
result = 0 | ||
for event in self.events: | ||
|
@@ -222,7 +242,7 @@ def done(self): | |
return all( | ||
event.get_info(cl.event_info.COMMAND_EXECUTION_STATUS) | ||
== cl.command_execution_status.COMPLETE | ||
for event in self.events) | ||
for event in self.native_events) | ||
|
||
# }}} | ||
|
||
|
@@ -395,10 +415,18 @@ def local_expansion_zeros(self, template_ary): | |
dtype=self.dtype) | ||
|
||
def m2l_translation_classes_dependent_data_zeros(self, queue): | ||
return cl.array.zeros( | ||
queue, | ||
self.m2l_translation_classes_dependent_data_level_starts()[-1], | ||
dtype=self.preprocessed_mpole_dtype) | ||
result = [] | ||
for level in range(self.tree.nlevels): | ||
expn_start, expn_stop = \ | ||
self.m2l_translation_classes_dependent_data_level_starts()[ | ||
level:level+2] | ||
translation_class_start, translation_class_stop = \ | ||
self.m2l_translation_class_level_start_box_nrs()[level:level+2] | ||
exprs_level = cl.array.zeros(queue, expn_stop - expn_start, | ||
dtype=self.preprocessed_mpole_dtype) | ||
result.append(exprs_level.reshape( | ||
translation_class_stop - translation_class_start, -1)) | ||
return result | ||
|
||
def multipole_expansions_view(self, mpole_exps, level): | ||
expn_start, expn_stop = \ | ||
|
@@ -418,14 +446,10 @@ def local_expansions_view(self, local_exps, level): | |
|
||
def m2l_translation_classes_dependent_data_view(self, | ||
m2l_translation_classes_dependent_data, level): | ||
expn_start, expn_stop = \ | ||
self.m2l_translation_classes_dependent_data_level_starts()[level:level+2] | ||
translation_class_start, translation_class_stop = \ | ||
translation_class_start, _ = \ | ||
self.m2l_translation_class_level_start_box_nrs()[level:level+2] | ||
|
||
exprs_level = m2l_translation_classes_dependent_data[expn_start:expn_stop] | ||
return (translation_class_start, exprs_level.reshape( | ||
translation_class_stop - translation_class_start, -1)) | ||
exprs_level = m2l_translation_classes_dependent_data[level] | ||
return (translation_class_start, exprs_level) | ||
|
||
@memoize_method | ||
def m2l_preproc_mpole_expansions_level_starts(self): | ||
|
@@ -440,18 +464,19 @@ def order_to_size(order): | |
level_starts=self.tree.level_start_box_nrs) | ||
|
||
def m2l_preproc_mpole_expansion_zeros(self, template_ary): | ||
return cl.array.zeros( | ||
template_ary.queue, | ||
self.m2l_preproc_mpole_expansions_level_starts()[-1], | ||
dtype=self.preprocessed_mpole_dtype) | ||
|
||
def m2l_preproc_mpole_expansions_view(self, mpole_exps, level): | ||
expn_start, expn_stop = \ | ||
result = [] | ||
for level in range(self.tree.nlevels): | ||
expn_start, expn_stop = \ | ||
self.m2l_preproc_mpole_expansions_level_starts()[level:level+2] | ||
box_start, box_stop = self.tree.level_start_box_nrs[level:level+2] | ||
box_start, box_stop = self.tree.level_start_box_nrs[level:level+2] | ||
exprs_level = cl.array.zeros(template_ary.queue, expn_stop - expn_start, | ||
dtype=self.preprocessed_mpole_dtype) | ||
result.append(exprs_level.reshape(box_stop - box_start, -1)) | ||
return result | ||
|
||
return (box_start, | ||
mpole_exps[expn_start:expn_stop].reshape(box_stop-box_start, -1)) | ||
def m2l_preproc_mpole_expansions_view(self, mpole_exps, level): | ||
box_start, _ = self.tree.level_start_box_nrs[level:level+2] | ||
return (box_start, mpole_exps[level]) | ||
|
||
m2l_work_array_view = m2l_preproc_mpole_expansions_view | ||
m2l_work_array_zeros = m2l_preproc_mpole_expansion_zeros | ||
|
@@ -528,6 +553,11 @@ def box_target_list_kwargs(self): | |
|
||
# }}} | ||
|
||
def run_opencl_fft(self, queue, input_vec, inverse, wait_for, inplace): | ||
app = self.tree_indep.opencl_fft_app(input_vec.shape, input_vec.dtype, | ||
inplace) | ||
Review comment: Ditch |
||
return run_opencl_fft(app, queue, input_vec, inverse, wait_for) | ||
|
||
def form_multipoles(self, | ||
level_start_source_box_nrs, source_boxes, | ||
src_weight_vecs): | ||
|
@@ -653,6 +683,7 @@ def eval_direct(self, target_boxes, source_box_starts, | |
|
||
@memoize_method | ||
def multipole_to_local_precompute(self): | ||
result = [] | ||
with cl.CommandQueue(self.tree_indep.cl_context) as queue: | ||
m2l_translation_classes_dependent_data = \ | ||
self.m2l_translation_classes_dependent_data_zeros(queue) | ||
|
@@ -672,6 +703,8 @@ def multipole_to_local_precompute(self): | |
m2l_translation_classes_dependent_data_view.shape[0] | ||
|
||
if ntranslation_classes == 0: | ||
result.append(pyopencl.array.empty_like( | ||
m2l_translation_classes_dependent_data_view)) | ||
continue | ||
|
||
data = self.translation_classes_data | ||
|
@@ -689,13 +722,19 @@ def multipole_to_local_precompute(self): | |
ntranslation_vectors=m2l_translation_vectors.shape[1], | ||
**self.kernel_extra_kwargs | ||
) | ||
m2l_translation_classes_dependent_data.add_event(evt) | ||
|
||
m2l_translation_classes_dependent_data.finish() | ||
if self.tree_indep.m2l_translation.use_fft: | ||
_, m2l_translation_classes_dependent_data_view = \ | ||
self.run_opencl_fft(queue, | ||
m2l_translation_classes_dependent_data_view, | ||
inverse=False, wait_for=[evt], inplace=False) | ||
result.append(m2l_translation_classes_dependent_data_view) | ||
|
||
m2l_translation_classes_dependent_data = \ | ||
m2l_translation_classes_dependent_data.with_queue(None) | ||
return m2l_translation_classes_dependent_data | ||
for lev in range(self.tree.nlevels): | ||
result[lev].finish() | ||
|
||
result = [arr.with_queue(None) for arr in result] | ||
return result | ||
|
||
def _add_m2l_precompute_kwargs(self, kwargs_for_m2l, | ||
lev): | ||
|
@@ -723,25 +762,40 @@ def multipole_to_local(self, | |
target_boxes, src_box_starts, src_box_lists, | ||
mpole_exps): | ||
|
||
preprocess_evts = [] | ||
queue = mpole_exps.queue | ||
local_exps = self.local_expansion_zeros(mpole_exps) | ||
|
||
if self.tree_indep.m2l_translation.use_preprocessing: | ||
preprocessed_mpole_exps = \ | ||
self.m2l_preproc_mpole_expansion_zeros(mpole_exps) | ||
for lev in range(self.tree.nlevels): | ||
m2l_work_array = self.m2l_work_array_zeros(local_exps) | ||
mpole_exps_view_func = self.m2l_preproc_mpole_expansions_view | ||
local_exps_view_func = self.m2l_work_array_view | ||
else: | ||
preprocessed_mpole_exps = mpole_exps | ||
m2l_work_array = local_exps | ||
mpole_exps_view_func = self.multipole_expansions_view | ||
local_exps_view_func = self.local_expansions_view | ||
|
||
preprocess_evts = [] | ||
translate_evts = [] | ||
postprocess_evts = [] | ||
|
||
for lev in range(self.tree.nlevels): | ||
wait_for = [] | ||
|
||
start, stop = level_start_target_box_nrs[lev:lev+2] | ||
if start == stop: | ||
continue | ||
|
||
if self.tree_indep.m2l_translation.use_preprocessing: | ||
order = self.level_orders[lev] | ||
preprocess_mpole_kernel = \ | ||
self.tree_indep.m2l_preprocess_mpole_kernel(order, order) | ||
|
||
_, source_mpoles_view = \ | ||
self.multipole_expansions_view(mpole_exps, lev) | ||
|
||
_, preprocessed_source_mpoles_view = \ | ||
self.m2l_preproc_mpole_expansions_view( | ||
preprocessed_mpole_exps, lev) | ||
|
||
tr_classes = self.m2l_translation_class_level_start_box_nrs() | ||
if tr_classes[lev] == tr_classes[lev + 1]: | ||
# There is no M2L happening in this level | ||
|
@@ -750,33 +804,29 @@ def multipole_to_local(self, | |
evt, _ = preprocess_mpole_kernel( | ||
queue, | ||
src_expansions=source_mpoles_view, | ||
preprocessed_src_expansions=preprocessed_source_mpoles_view, | ||
preprocessed_src_expansions=preprocessed_mpole_exps[lev], | ||
src_rscale=self.level_to_rscale(lev), | ||
wait_for=wait_for, | ||
**self.kernel_extra_kwargs | ||
) | ||
preprocess_evts.append(evt) | ||
mpole_exps = preprocessed_mpole_exps | ||
m2l_work_array = self.m2l_work_array_zeros(local_exps) | ||
mpole_exps_view_func = self.m2l_preproc_mpole_expansions_view | ||
local_exps_view_func = self.m2l_work_array_view | ||
else: | ||
m2l_work_array = local_exps | ||
mpole_exps_view_func = self.multipole_expansions_view | ||
local_exps_view_func = self.local_expansions_view | ||
wait_for.append(evt) | ||
|
||
translate_evts = [] | ||
if self.tree_indep.m2l_translation.use_fft: | ||
evt_fft, preprocessed_mpole_exps[lev] = \ | ||
self.run_opencl_fft(queue, | ||
preprocessed_mpole_exps[lev], | ||
inverse=False, wait_for=wait_for, inplace=False) | ||
wait_for.append(evt_fft.native_event) | ||
evt = AggregateProfilingEvent([evt, evt_fft]) | ||
|
||
for lev in range(self.tree.nlevels): | ||
start, stop = level_start_target_box_nrs[lev:lev+2] | ||
if start == stop: | ||
continue | ||
preprocess_evts.append(evt) | ||
|
||
order = self.level_orders[lev] | ||
m2l = self.tree_indep.m2l(order, order, | ||
self.supports_translation_classes) | ||
|
||
source_level_start_ibox, source_mpoles_view = \ | ||
mpole_exps_view_func(mpole_exps, lev) | ||
mpole_exps_view_func(preprocessed_mpole_exps, lev) | ||
target_level_start_ibox, target_locals_view = \ | ||
local_exps_view_func(m2l_work_array, lev) | ||
|
||
|
@@ -801,14 +851,11 @@ def multipole_to_local(self, | |
kwargs["m2l_translation_classes_dependent_data"].size == 0: | ||
# There is nothing to do for this level | ||
continue | ||
evt, _ = m2l(queue, **kwargs, wait_for=preprocess_evts) | ||
|
||
evt, _ = m2l(queue, **kwargs, wait_for=wait_for) | ||
wait_for.append(evt) | ||
translate_evts.append(evt) | ||
|
||
postprocess_evts = [] | ||
|
||
if self.tree_indep.m2l_translation.use_preprocessing: | ||
for lev in range(self.tree.nlevels): | ||
if self.tree_indep.m2l_translation.use_preprocessing: | ||
order = self.level_orders[lev] | ||
postprocess_local_kernel = \ | ||
self.tree_indep.m2l_postprocess_local_kernel(order, order) | ||
|
@@ -825,17 +872,28 @@ def multipole_to_local(self, | |
# There is no M2L happening in this level | ||
continue | ||
|
||
if self.tree_indep.m2l_translation.use_fft: | ||
evt_fft, target_locals_before_postprocessing_view = \ | ||
self.run_opencl_fft(queue, | ||
target_locals_before_postprocessing_view, | ||
inverse=True, wait_for=wait_for, inplace=False) | ||
wait_for.append(evt_fft.native_event) | ||
|
||
evt, _ = postprocess_local_kernel( | ||
queue, | ||
tgt_expansions=target_locals_view, | ||
tgt_expansions_before_postprocessing=( | ||
target_locals_before_postprocessing_view), | ||
src_rscale=self.level_to_rscale(lev), | ||
tgt_rscale=self.level_to_rscale(lev), | ||
wait_for=translate_evts, | ||
wait_for=wait_for, | ||
**self.kernel_extra_kwargs, | ||
) | ||
postprocess_evts.append(evt) | ||
|
||
if self.tree_indep.m2l_translation.use_fft: | ||
postprocess_evts.append(AggregateProfilingEvent([evt, evt_fft])) | ||
else: | ||
postprocess_evts.append(evt) | ||
|
||
timing_events = preprocess_evts + translate_evts + postprocess_evts | ||
|
||
|
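The EventLike handling in the diff above can be illustrated without an OpenCL device. The following is a minimal sketch, not the sumpy implementation: `FakeEvent` and `FakeAggregateEvent` are hypothetical stand-ins for `cl.Event` and an aggregate event such as `AggregateProfilingEvent`, and the choice of the last event as `native_event` is an assumption made for illustration. It shows how a mixed list of plain events and event-like wrappers is reduced to the plain events that a wait call can accept.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class FakeEvent:
    """Hypothetical stand-in for cl.Event (no OpenCL device needed)."""
    name: str


@dataclass
class FakeAggregateEvent:
    """Hypothetical stand-in for a wrapper that groups several events."""
    events: List[FakeEvent]

    @property
    def native_event(self) -> FakeEvent:
        # Assumption for this sketch: the last event in the group
        # indicates the end of the whole aggregated operation.
        return self.events[-1]


def native_events(
        events: List[Union[FakeEvent, FakeAggregateEvent]]) -> List[FakeEvent]:
    """Unwrap event-like objects, mirroring the list comprehension in the diff."""
    return [evt if isinstance(evt, FakeEvent) else evt.native_event
            for evt in events]


preprocess = FakeEvent("preprocess")
fft = FakeEvent("fft")
translate = FakeEvent("translate")

# A preprocessing step followed by an FFT is tracked as one aggregate event.
timing_events = [FakeAggregateEvent([preprocess, fft]), translate]
unwrapped = native_events(timing_events)
```

Only the unwrapped list would be passed to a wait or status query, while the original list keeps the aggregates so that per-stage timing can still be attributed.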
Are you thinking of this as an optional dependency? If yes, it should be declared as an extra in setup.py. If not, it should be in setup.py outright. (I don't see a reason not to make it a hard dependency.)
Made it a hard dep in bec642e
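For reference, the two options raised in the review map onto the `install_requires` and `extras_require` keywords of `setuptools.setup()`. The sketch below only builds the keyword arguments and does not reproduce the project's actual setup.py; the helper name `dependency_kwargs`, the extra name "fft", and the base requirement list are hypothetical.

```python
from typing import Dict, List


def dependency_kwargs(base_requirements: List[str], optional: bool) -> Dict:
    """Build setuptools.setup() keyword arguments for declaring pyvkfft.

    Hypothetical helper for illustration only.
    """
    if optional:
        # Optional dependency: installed only on request,
        # e.g. via `pip install package[fft]`.
        return {
            "install_requires": list(base_requirements),
            "extras_require": {"fft": ["pyvkfft"]},
        }
    # Hard dependency: always installed together with the package.
    return {"install_requires": [*base_requirements, "pyvkfft"]}


hard = dependency_kwargs(["numpy"], optional=False)
soft = dependency_kwargs(["numpy"], optional=True)
```

The hard-dependency form corresponds to the outcome recorded in the thread.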