I've been interested in these two topics lately. At first sight they seem very different (and they are!), but I've come to conjecture that they can also be complementary.
Supercompilation is a program transformation technique that is usually run at compile time[^1]. It builds a graph (a (partial) process tree) approximating all possible execution states of a particular program, optimizes this graph by removing computational redundancy, and outputs a residual program with a simpler and (hopefully) more efficient structure. Even though supercompilation mimics run-time execution, it can be made terminating even for a Turing-complete language by means of folding and generalization of terms.
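To make this concrete, here is the kind of transformation I have in mind, written in Haskell (my own illustration, not taken from any particular paper): the classic fusion of nested `append` calls, where the residual program traverses the first list only once.

```haskell
-- Input program: builds an intermediate list and traverses xs twice.
append :: [a] -> [a] -> [a]
append []       ys = ys
append (x : xs) ys = x : append xs ys

appendThree :: [a] -> [a] -> [a] -> [a]
appendThree xs ys zs = append (append xs ys) zs

-- A plausible residual program: the supercompiler drives the nested calls,
-- folds back when it meets a familiar configuration, and emits a direct
-- recursion that traverses xs once and allocates no intermediate list.
appendThree' :: [a] -> [a] -> [a] -> [a]
appendThree' []       ys zs = append ys zs
appendThree' (x : xs) ys zs = x : appendThree' xs ys zs
```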
Optimal reduction is a technique to reduce lambda terms to their beta normal form. It does so by building a graph for an initial term (an interaction net), then transforming the graph with a fixed set of local rewrite rules. When the job is done, it outputs a residual program representing a beta normal form of the initial term, which has a simpler structure. Due to the halting problem, optimal reduction is not guaranteed to terminate on every input term.
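For readers less familiar with the terminology, here is a deliberately naive normal-order normalizer in Haskell (my own sketch, *not* optimal reduction: it copies subterms instead of sharing them). It only makes the target concrete: the beta normal form of a term, when one exists; on a term such as `(\x -> x x) (\x -> x x)` it loops, which mirrors the non-termination caveat above.

```haskell
-- Illustrative sketch only; names and representation are my own choices.
data Term = Var Int | Lam Term | App Term Term   -- de Bruijn indices
  deriving (Show)

-- shift d c t: add d to every free variable of t whose index is >= c
shift :: Int -> Int -> Term -> Term
shift d c (Var k)   = Var (if k >= c then k + d else k)
shift d c (Lam b)   = Lam (shift d (c + 1) b)
shift d c (App f a) = App (shift d c f) (shift d c a)

-- subst j s t: replace variable j with s in t
subst :: Int -> Term -> Term -> Term
subst j s (Var k)   = if k == j then s else Var k
subst j s (Lam b)   = Lam (subst (j + 1) (shift 1 0 s) b)
subst j s (App f a) = App (subst j s f) (subst j s a)

-- One leftmost-outermost (normal-order) reduction step, if any exists.
step :: Term -> Maybe Term
step (App (Lam b) a) = Just (shift (-1) 0 (subst 0 (shift 1 0 a) b))
step (App f a)       = case step f of
  Just f' -> Just (App f' a)
  Nothing -> App f <$> step a
step (Lam b)         = Lam <$> step b
step (Var _)         = Nothing

-- Iterate to beta normal form; may diverge on non-normalizing terms.
normalize :: Term -> Term
normalize t = maybe t normalize (step t)
```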
A supercompiler can be understood as a program that inlines every single thing. An optimal evaluator can be understood as a program that shares every single thing. So, in principle, these two techniques are diverging in this matter: while a supercompiler attempts to remove sharing, an optimal evaluator does the opposite.
One of the most serious problems with supercompilation is that, because it disregards sharing, it can output a term that is much larger than the input term. This is not a problem for optimal reduction, because a very compact interaction net can be residualized into a very large term. Therefore, one potential approach to the code bloat problem of supercompilation would be to escape the syntactic universe of input terms and instead manipulate them in a more intricate representation that permits an all-encompassing notion of sharing. In a sense, Lamping's optimal reduction already provides such a mechanism.
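As a tiny illustration of the size gap (my own sketch, not specific to interaction nets): a graph with explicit sharing can stand for an exponentially larger term once the sharing is read back into syntax.

```haskell
data Tree = Leaf | Node Tree Tree

-- n levels of sharing: each level reuses the previous one twice, so the
-- in-memory graph has only n + 1 distinct nodes.
sharedTree :: Int -> Tree
sharedTree 0 = Leaf
sharedTree n = let t = sharedTree (n - 1) in Node t t

-- Reading the graph back as syntax forgets the sharing: the "residualized"
-- term has 2^n leaves.
size :: Tree -> Integer
size Leaf       = 1
size (Node l r) = size l + size r

-- size (sharedTree 30) == 2 ^ 30, even though the graph itself stays tiny.
```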
It would be pretty interesting to see how the techniques of optimal reduction can be applied to supercompilation. Thanks to sharing, a residual program could be made "as small as possible" using some sort of graph representation; when given run-time data to act upon, it could be evaluated by some sort of graph machine.
Anyhow, if you find the discussion interesting, please share (pun unintended!) any thoughts about it.
By the way, the README section "Is HVM's optimality only relevant for weird, academic λ-encoded terms?" demonstrates a truly interesting example of eliminating the intermediate list produced by the `map f . map g` term. In "Rethinking Supercompilation" (section 2), Neil Mitchell encodes the same term as follows:
```haskell
root g f x = map g (map f x)

map f [] = []
map f (x : xs) = f x : map f xs
```
The suggested supercompiler is able to optimize this into a single-pass function (section 2.3.2):
```haskell
root g f x = case x of
  [] -> []
  z : zs -> g (f z) : root g f zs
```
This is computationally identical (up to the naming of `f` and `g`) to HVM's result, `map (f . g)`.
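As a quick sanity check (my own snippet, with `map` hidden from the Prelude to avoid a name clash), the original and residual definitions agree on sample inputs:

```haskell
import Prelude hiding (map)

map :: (a -> b) -> [a] -> [b]
map f []       = []
map f (x : xs) = f x : map f xs

-- Original composition of two traversals.
root :: (b -> c) -> (a -> b) -> [a] -> [c]
root g f x = map g (map f x)

-- Residual single-pass version from section 2.3.2.
root' :: (b -> c) -> (a -> b) -> [a] -> [c]
root' g f x = case x of
  []     -> []
  z : zs -> g (f z) : root' g f zs

main :: IO ()
main = print (root (+ 1) (* 2) [1 .. 5 :: Int] == root' (+ 1) (* 2) [1 .. 5])
```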
I believe that even a simple supercompiler would also be able to optimize your next example, `foldr (.) id funcs`. So, in a sense, HVM and supercompilers have a common set of optimizations they're able to perform.
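For instance, a supercompiler could plausibly residualize the application of `foldr (.) id funcs` into a direct recursion over the list of functions (a hedged guess at the output shape, not a result I've reproduced with an actual supercompiler):

```haskell
-- Original: build the composed function with (.), then apply it.
applyAll :: [a -> a] -> a -> a
applyAll funcs x = foldr (.) id funcs x

-- A plausible residual form: one pass over funcs, without constructing the
-- chain of closures that (.) would build first.
applyAll' :: [a -> a] -> a -> a
applyAll' []       x = x
applyAll' (f : fs) x = f (applyAll' fs x)
```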
[^1]: In theory, it can be executed at run-time, but I haven't seen a single experiment.