-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Atlas-based variant using BlockStructured FunctionSpace #54
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for taking so long to look at this and many thanks, this is great! Nothing much to comment on the Fortran side of things, which has been done very cleanly!
I have left a few comments on the bundle integration. To test the variant, I implemented a good number of these suggestions already and will file them as a PR against this branch.
With that, I have built and run this with GNU 11.2.0 on Atos, which worked well.
Unfortunately, when using MPI there seems to be some kind of problem, which I haven't been able to figure out, yet. It throws a SIGFPE inside the compute kernel but I don't know if this is maybe due to uninitialized values or something else:
==== backtrace (tid:1080635) ====
0 0x0000000000012ce0 __funlockfile() :0
1 0x000000000040fbf5 cloudsc_() /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/cloudsc.F90:750
2 0x000000000040954d __cloudsc_driver_mod_MOD_cloudsc_driver() /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/cloudsc_driver_mod.F90:112
3 0x000000000041680a MAIN__() dwarf_cloudsc_atlas.F90:0
4 0x0000000000403043 main() ???:0
5 0x000000000003acf3 __libc_start_main() ???:0
6 0x000000000040309e _start() ???:0
=================================
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
BFD: Dwarf Error: Can't find .debug_ranges section.
==== backtrace (tid:1080636) ====
0 0x0000000000012ce0 __funlockfile() :0
1 0x000000000040fbf5 cloudsc_() /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/cloudsc.F90:750
2 0x000000000040954d __cloudsc_driver_mod_MOD_cloudsc_driver() /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/cloudsc_driver_mod.F90:112
3 0x000000000041680a MAIN__() dwarf_cloudsc_atlas.F90:0
4 0x0000000000403043 main() ???:0
5 0x000000000003acf3 __libc_start_main() ???:0
6 0x000000000040309e _start() ???:0
=================================
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x14ed97fa4cdf in ???
#0 0x14dca11becdf in ???
#1 0x40fbf5 in cloudsc_
at /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/cloudsc.F90:750
#1 0x40fbf5 in cloudsc_
at /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/cloudsc.F90:750
#2 0x40954c in __cloudsc_driver_mod_MOD_cloudsc_driver
at /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/cloudsc_driver_mod.F90:112
#3 0x416809 in dwarf_cloudsc
at /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/dwarf_cloudsc_atlas.F90:86
#4 0x403042 in main
at /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/dwarf_cloudsc_atlas.F90:12
#2 0x40954c in __cloudsc_driver_mod_MOD_cloudsc_driver
at /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/cloudsc_driver_mod.F90:112
#3 0x416809 in dwarf_cloudsc
at /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/dwarf_cloudsc_atlas.F90:86
#4 0x403042 in main
at /etc/ecmwf/nfs/dh1_home_a/nabr/dwarf-p-cloudsc/nab_atlas_structures/source/cloudsc-dwarf/src/cloudsc_fortran_atlas/dwarf_cloudsc_atlas.F90:12
Would you mind taking a look at that?
… less in size than nproma
…er process must be taken from Atlas decomposition; 2) Atlas grid has to get the global number of points
The parallelisation issues should now be resolved in the branch: nab_atlas_structures. |
dbd1be8
to
32be522
Compare
OK, I think this is almost ready. I fixed the versions of the new dependencies to the respective latest releases, and changed some of the I have also temporarily disabled the Atlas variant when building single precision, as the expand_mod for Atlas is currently only capable of double precision. @sbrdar, can you fix this please in a subsequent PR on a new branch? There is also an issue when using the atlas variant with Serialbox: @sbrdar could you please fix that on the current branch? Separate from this we seem to have broken the CUF transformation in Loki, which is only being tested when building with NVHPC (same failure occurs also on develop). That is the reason for the failing NVHPC tests. |
Fix for Loki PR'd: ecmwf-ifs/loki#157 |
… opening and closing of reference input
@reuterbal I think that the things are fine regarding the previous issues with the single precision and SerialBox. Something with nvhpc is problem, can you check that? |
ff2dec9
to
f32fb98
Compare
f32fb98
to
d6a8d5e
Compare
23d2d5e
to
59fa0b8
Compare
59fa0b8
to
fe77fb6
Compare
OK, sorry for letting this lie around for so long - Getting NVHPC builds and tests to reliably pass was challenging with the growing software stack complexity. I think now the CI setup is OK, though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks OK to me! Thanks @reuterbal for patching up the testing! Thanks @sbrdar for this nice work!
co-authored with W. Deconinck, B. Reuter, M. Lange