Expand description

Propagates assignment destinations backwards in the CFG to eliminate redundant assignments.

Motivation

MIR building can insert a lot of redundant copies, and Rust code in general often tends to move values around a lot. The result is a lot of assignments of the form dest = {move} src; in MIR. MIR building for constants in particular tends to create additional locals that are only used inside a single block to shuffle a value around unnecessarily.

LLVM by itself is not good enough at eliminating these redundant copies (eg. see https://github.com/rust-lang/rust/issues/32966), so this leaves some performance on the table that we can regain by implementing an optimization for removing these assign statements in rustc itself. When this optimization runs fast enough, it can also speed up the constant evaluation and code generation phases of rustc due to the reduced number of statements and locals.

The Optimization

Conceptually, this optimization is “destination propagation”. It is similar to the Named Return Value Optimization, or NRVO, known from the C++ world, except that it isn’t limited to return values or the return place _0. On a very high level, independent of the actual implementation details, it does the following:

  1. Identify dest = src; statements that can be soundly eliminated.
  2. Replace all mentions of src with dest (“unifying” them and propagating the destination backwards).
  3. Delete the dest = src; statement (by making it a nop).

Step 1) is by far the hardest, so it is explained in more detail below.

Soundness

Given an Assign statement dest = src;, where dest is a Place and src is an Rvalue, there are a few requirements that must hold for the optimization to be sound:

  • dest must not contain any indirection through a pointer. It must access part of the base local. Otherwise it might point to arbitrary memory that is hard to track.

    It must also not contain any indexing projections, since those take an arbitrary Local as the index, and that local might only be initialized shortly before dest is used.

  • src must be a bare Local without any indirections or field projections (FIXME: Is this a fundamental restriction or just current impl state?). It can be copied or moved by the assignment.

  • The dest and src locals must never be live at the same time. If they are, it means that they both hold a (potentially different) value that is needed by a future use of the locals. Unifying them would overwrite one of the values.

    Note that computing liveness of locals that have had their address taken is more difficult: Short of doing full escape analysis on the address/pointer/reference, the pass would need to assume that any operation that can potentially involve opaque user code (such as function calls, destructors, and inline assembly) may access any local that had its address taken before that point.

Here, the first two conditions are simple structural requirements on the Assign statements that can be trivially checked. The liveness requirement however is more difficult and costly to check.

Previous Work

A previous attempt at implementing an optimization like this turned out to be a significant regression in compiler performance. Fixing the regressions introduced a lot of undesirable complexity to the implementation.

A subsequent approach tried to avoid the costly computation by limiting itself to acyclic CFGs, but still turned out to be far too costly to run due to suboptimal performance within individual basic blocks, requiring a walk across the entire block for every assignment found within the block. For the tuple-stress benchmark, which has 458745 statements in a single block, this proved to be far too costly.

Since the first attempt at this, the compiler has improved dramatically, and new analysis frameworks have been added that should make this approach viable without requiring a limited approach that only works for some classes of CFGs:

  • rustc now has a powerful dataflow analysis framework that can handle forwards and backwards analyses efficiently.
  • Layout optimizations for generators have been added to improve code generation for async/await, which are very similar in spirit to what this optimization does. Both walk the MIR and record conflicting uses of locals in a BitMatrix.

Also, rustc now has a simple NRVO pass (see nrvo.rs), which handles a subset of the cases that this destination propagation pass handles, proving that similar optimizations can be performed on MIR.

Pre/Post Optimization

It is recommended to run SimplifyCfg and then SimplifyLocals some time after this pass, as it replaces the eliminated assign statements with nops and leaves unused locals behind.

Structs

A dest = {move} src; statement at loc.
Conflicts 🔒
Replacer 🔒
UnifyLocal 🔒

Constants

MAX_BLOCKS 🔒
MAX_LOCALS 🔒

Functions

Scans the MIR for assignments between locals that we might want to consider merging.
Some locals are part of the function’s interface and can not be removed.
PlaceElem::Index only stores a Local, so we can’t replace that with a full Place.