Module rustc_mir_transform::dest_prop
source · Expand description
Propagates assignment destinations backwards in the CFG to eliminate redundant assignments.
Motivation
MIR building can insert a lot of redundant copies, and Rust code in general often tends to move
values around a lot. The result is a lot of assignments of the form dest = {move} src;
in MIR.
MIR building for constants in particular tends to create additional locals that are only used
inside a single block to shuffle a value around unnecessarily.
LLVM by itself is not good enough at eliminating these redundant copies (eg. see https://github.com/rust-lang/rust/issues/32966), so this leaves some performance on the table that we can regain by implementing an optimization for removing these assign statements in rustc itself. When this optimization runs fast enough, it can also speed up the constant evaluation and code generation phases of rustc due to the reduced number of statements and locals.
The Optimization
Conceptually, this optimization is “destination propagation”. It is similar to the Named Return
Value Optimization, or NRVO, known from the C++ world, except that it isn’t limited to return
values or the return place _0
. On a very high level, independent of the actual implementation
details, it does the following:
- Identify
dest = src;
statements that can be soundly eliminated. - Replace all mentions of
src
withdest
(“unifying” them and propagating the destination backwards). - Delete the
dest = src;
statement (by making it anop
).
Step 1) is by far the hardest, so it is explained in more detail below.
Soundness
Given an Assign
statement dest = src;
, where dest
is a Place
and src
is an Rvalue
,
there are a few requirements that must hold for the optimization to be sound:
-
dest
must not contain any indirection through a pointer. It must access part of the base local. Otherwise it might point to arbitrary memory that is hard to track.It must also not contain any indexing projections, since those take an arbitrary
Local
as the index, and that local might only be initialized shortly beforedest
is used. -
src
must be a bareLocal
without any indirections or field projections (FIXME: Is this a fundamental restriction or just current impl state?). It can be copied or moved by the assignment. -
The
dest
andsrc
locals must never be live at the same time. If they are, it means that they both hold a (potentially different) value that is needed by a future use of the locals. Unifying them would overwrite one of the values.Note that computing liveness of locals that have had their address taken is more difficult: Short of doing full escape analysis on the address/pointer/reference, the pass would need to assume that any operation that can potentially involve opaque user code (such as function calls, destructors, and inline assembly) may access any local that had its address taken before that point.
Here, the first two conditions are simple structural requirements on the Assign
statements
that can be trivially checked. The liveness requirement however is more difficult and costly to
check.
Previous Work
A previous attempt at implementing an optimization like this turned out to be a significant regression in compiler performance. Fixing the regressions introduced a lot of undesirable complexity to the implementation.
A subsequent approach tried to avoid the costly computation by limiting itself to acyclic
CFGs, but still turned out to be far too costly to run due to suboptimal performance within
individual basic blocks, requiring a walk across the entire block for every assignment found
within the block. For the tuple-stress
benchmark, which has 458745 statements in a single
block, this proved to be far too costly.
Since the first attempt at this, the compiler has improved dramatically, and new analysis frameworks have been added that should make this approach viable without requiring a limited approach that only works for some classes of CFGs:
- rustc now has a powerful dataflow analysis framework that can handle forwards and backwards analyses efficiently.
- Layout optimizations for generators have been added to improve code generation for
async/await, which are very similar in spirit to what this optimization does. Both walk the
MIR and record conflicting uses of locals in a
BitMatrix
.
Also, rustc now has a simple NRVO pass (see nrvo.rs
), which handles a subset of the cases that
this destination propagation pass handles, proving that similar optimizations can be performed
on MIR.
Pre/Post Optimization
It is recommended to run SimplifyCfg
and then SimplifyLocals
some time after this pass, as
it replaces the eliminated assign statements with nop
s and leaves unused locals behind.
Structs
dest = {move} src;
statement at loc
.Constants
Functions
PlaceElem::Index
only stores a Local
, so we can’t replace that with a full Place
.