Project Proposal: Evaluating Alias Analysis in Rust
What:
Aliasing is constrained in Rust more so than other languages. This suggests it should be possible to get more information than typical and perform aggressive optimizations based on aliasing rules. This is already the case, with proposed alias models, for example Stacked Borrows or the current state of the art Tree Borrows, permitting reasoning about how references change across function calls using an intraprocedural analysis which would be entirely impossible in a language with weaker conditions on aliasing.
However, due to unsafe code, static pointer analyses still have to be fairly conservative. I want to answer three questions:
- How conservative must current pointer analyses be?
- Does more precise aliasing information unlock further optimizations?
- If more precise aliasing information is helpful for optimizations, is it possible to define a pointer analysis tailored to Rust's alias model letting it perform better than the state of the art?
Measuring Effectiveness:
It's a hard to separate the procedure from the evaluation for (1) and (2) as they are effectively a quantitative evaluation of current technologies. Therefore the sections are grouped together.
I will quantify (1) by rephrasing it to "how many aliases do current alias analyses attribute to a given pointer which can never happen." As stated, this is still likely not a tractable problem because finding all aliases likely will not terminate (though I might as well try it). Therefore I will approximate an answer. To do so, I will create a suite of Rust code taken from the Rust standard library (due to it's liberal use of unsafe code) and popular Rust crates. I will then run a current alias analyses on the code, a pass built in the compiler if one exists (I'm unsure if one does) or instead use rupta. Using the test suites from the given projects, I will use miri to record dynamically what references alias each other. This will likely require hacking miri in some way, however as miri supports checking MIR against the Stacked Borrows model, which requires dynamically recording what pointers aliases what, this new feature should be possible. I can then take the difference between the dynamically recorded number of aliases and the statically approximated number of alias pairs. Reporting these measurements constitutes a success.
(2) is hard to quantify. First, I will have to make a shortlist of very aggressive optimizations which require more precise alias information than found in (1). To create this, it will likely help to categorize the types of aliasing which were missed in (1). In part, this is the purpose of (1). A possible answer to this question is that the missed aliasing information does not unlock any optimizations.
If (2) finds some optimizations might be helpful, one way I propose to improve alias analysis is an implementation of either Stacked Borrows or Tree Borrows which is approximated statically using the results of an alias analysis. The result from this can then be used to perform the optimizations found in (2). These can then be measured for regressions or speedups based on the tests used in (1). It also might be possible to simply use rustc-perf. If execution time improvements on these tests are found without major regressions, success!
How Much of This Do I Expect To Complete?:
Completing (1)-(3) is very ambitious. I expect to complete (1) and start (2) for this course project. I believe this should still be reasonable scope. If I am lucky and make more progress, great! I don't see any ways (1) and (2) can fail terribly, save unexpected technical hurdles, however, they may produce the boring answer "the current state of the art is pretty good" and (3) would not be a useful thing to do. I think (3) is over scoped for the course project, especially as it would have to be done after (1) and (2).
Team Members:
Jeremy Ku-Benjet (jk2582)
Project Proposal: Evaluating Alias Analysis in Rust
What:
Aliasing is constrained in Rust more so than other languages. This suggests it should be possible to get more information than typical and perform aggressive optimizations based on aliasing rules. This is already the case, with proposed alias models, for example Stacked Borrows or the current state of the art Tree Borrows, permitting reasoning about how references change across function calls using an intraprocedural analysis which would be entirely impossible in a language with weaker conditions on aliasing.
However, due to
unsafecode, static pointer analyses still have to be fairly conservative. I want to answer three questions:Measuring Effectiveness:
It's a hard to separate the procedure from the evaluation for (1) and (2) as they are effectively a quantitative evaluation of current technologies. Therefore the sections are grouped together.
I will quantify (1) by rephrasing it to "how many aliases do current alias analyses attribute to a given pointer which can never happen." As stated, this is still likely not a tractable problem because finding all aliases likely will not terminate (though I might as well try it). Therefore I will approximate an answer. To do so, I will create a suite of Rust code taken from the Rust standard library (due to it's liberal use of
unsafecode) and popular Rust crates. I will then run a current alias analyses on the code, a pass built in the compiler if one exists (I'm unsure if one does) or instead use rupta. Using the test suites from the given projects, I will use miri to record dynamically what references alias each other. This will likely require hacking miri in some way, however as miri supports checking MIR against the Stacked Borrows model, which requires dynamically recording what pointers aliases what, this new feature should be possible. I can then take the difference between the dynamically recorded number of aliases and the statically approximated number of alias pairs. Reporting these measurements constitutes a success.(2) is hard to quantify. First, I will have to make a shortlist of very aggressive optimizations which require more precise alias information than found in (1). To create this, it will likely help to categorize the types of aliasing which were missed in (1). In part, this is the purpose of (1). A possible answer to this question is that the missed aliasing information does not unlock any optimizations.
If (2) finds some optimizations might be helpful, one way I propose to improve alias analysis is an implementation of either Stacked Borrows or Tree Borrows which is approximated statically using the results of an alias analysis. The result from this can then be used to perform the optimizations found in (2). These can then be measured for regressions or speedups based on the tests used in (1). It also might be possible to simply use rustc-perf. If execution time improvements on these tests are found without major regressions, success!
How Much of This Do I Expect To Complete?:
Completing (1)-(3) is very ambitious. I expect to complete (1) and start (2) for this course project. I believe this should still be reasonable scope. If I am lucky and make more progress, great! I don't see any ways (1) and (2) can fail terribly, save unexpected technical hurdles, however, they may produce the boring answer "the current state of the art is pretty good" and (3) would not be a useful thing to do. I think (3) is over scoped for the course project, especially as it would have to be done after (1) and (2).
Team Members:
Jeremy Ku-Benjet (jk2582)