Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recursive pretty print #368

Closed
wants to merge 5 commits into from
Closed

recursive pretty print #368

wants to merge 5 commits into from

Conversation

enricozb
Copy link
Contributor

@enricozb enricozb commented Jun 3, 2024

Fixes issues with segfault after printing: #360.
Also it doesn't overload the ERA node.

Of course a large enough net will still overflow the stack. On my machine, this example bend code:

def create(n):
  bend x = 0:
    when x < n:
      str = List/Cons { head: 1, tail: fork(x + 1) }
    else:
      str = List/Nil
  return str

def main():
  return create(37)

Which previously segfaulted at create(37), now segfaults somewhere between create(50000) and create(100000).

@HigherOrderBot

This comment has been minimized.

@enricozb enricozb requested a review from edusporto June 3, 2024 21:26
@VictorTaelin
Copy link
Member

VictorTaelin commented Jun 3, 2024

You just removed the stack? That's a regression, in CUDA this will cause the compiler to be unable to estimate the stack size of the pretty print function, which can have harsh consequences on performance elsewhere and cause segfaults in unrelated code, making us unable to use the pretty printer for debugging purposes. Have you tested using it inside the interaction kernel on the GPU?

@enricozb
Copy link
Contributor Author

enricozb commented Jun 3, 2024

I didn't test pretty print within any of the cuda code, didn't think of that as a use case. We can have a stacked version for debugging and use this for the output on the host then?

@HigherOrderBot
Copy link
Collaborator

Perf run for 6170de3:

compiled
========

file            runtime         main            (local)       
==============================================================
sort_bitonic    c                        3.93s           3.50s
                cuda                     0.24s           0.23s
--------------------------------------------------------------
sum_rec         c                        1.43s           1.47s
                cuda                     0.13s           0.13s
--------------------------------------------------------------
sum_tree        c                        0.12s           0.13s
                cuda                     0.09s           0.07s
--------------------------------------------------------------
tuples          c                        2.85s           3.87s
                cuda                   timeout         timeout
--------------------------------------------------------------

interpreted
===========

file            runtime         main            (local)       
==============================================================
sort_bitonic    c                        4.89s           4.23s
                cuda                     0.23s           0.23s
                rust                   timeout         timeout
--------------------------------------------------------------
sum_rec         c                        1.82s           1.73s
                cuda                     0.13s           5.00s
                rust                    13.90s          13.47s
--------------------------------------------------------------
sum_tree        c                        0.21s           0.24s
                cuda                     0.09s           0.08s
                rust                     0.87s           0.82s
--------------------------------------------------------------
tuples          c                        3.00s           2.61s
                cuda                   timeout         timeout
                rust                     3.76s           3.75s
--------------------------------------------------------------


VictorTaelin added a commit that referenced this pull request Jun 4, 2024
this should be handled correctly after the C/CUDA readback on Rust is
implemented, as that'll allow just using Rust's stringifier instead
@enricozb enricozb closed this Jun 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants