Module rustc_mir_build::thir::pattern::usefulness
source · Expand description
Note: tests specific to this file can be found in:
ui/pattern/usefulness
ui/or-patterns
ui/consts/const_in_pattern
ui/rfc-2008-non-exhaustive
ui/half-open-range-patterns
- probably many others
I (Nadrieril) prefer to put new tests in ui/pattern/usefulness
unless there’s a specific
reason not to, for example if they depend on a particular feature like or_patterns
.
This file includes the logic for exhaustiveness and reachability checking for pattern-matching. Specifically, given a list of patterns for a type, we can tell whether: (a) each pattern is reachable (reachability) (b) the patterns cover every possible value for the type (exhaustiveness)
The algorithm implemented here is a modified version of the one described in this paper. We have however generalized it to accommodate the variety of patterns that Rust supports. We thus explain our version here, without being as rigorous.
Summary
The core of the algorithm is the notion of “usefulness”. A pattern q
is said to be useful
relative to another pattern p
of the same type if there is a value that is matched by q
and
not matched by p
. This generalizes to many p
s: q
is useful w.r.t. a list of patterns
p_1 .. p_n
if there is a value that is matched by q
and by none of the p_i
. We write
usefulness(p_1 .. p_n, q)
for a function that returns a list of such values. The aim of this
file is to compute it efficiently.
This is enough to compute reachability: a pattern in a match
expression is reachable iff it
is useful w.r.t. the patterns above it:
match x {
Some(_) => {},
None => {}, // reachable: `None` is matched by this but not the branch above
Some(0) => {}, // unreachable: all the values this matches are already matched by
// `Some(_)` above
}
This is also enough to compute exhaustiveness: a match is exhaustive iff the wildcard _
pattern is not useful w.r.t. the patterns in the match. The values returned by usefulness
are used to tell the user which values are missing.
match x {
Some(0) => {},
None => {},
// not exhaustive: `_` is useful because it matches `Some(1)`
}
The entrypoint of this file is the compute_match_usefulness
function, which computes
reachability for each match branch and exhaustiveness for the whole match.
Constructors and fields
Note: we will often abbreviate “constructor” as “ctor”.
The idea that powers everything that is done in this file is the following: a (matchable)
value is made from a constructor applied to a number of subvalues. Examples of constructors are
Some
, None
, (,)
(the 2-tuple constructor), Foo {..}
(the constructor for a struct
Foo
), and 2
(the constructor for the number 2
). This is natural when we think of
pattern-matching, and this is the basis for what follows.
Some of the ctors listed above might feel weird: None
and 2
don’t take any arguments.
That’s ok: those are ctors that take a list of 0 arguments; they are the simplest case of
ctors. We treat 2
as a ctor because u64
and other number types behave exactly like a huge
enum
, with one variant for each number. This allows us to see any matchable value as made up
from a tree of ctors, each having a set number of children. For example: Foo { bar: None, baz: Ok(0) }
is made from 4 different ctors, namely Foo{..}
, None
, Ok
and 0
.
This idea can be extended to patterns: they are also made from constructors applied to fields.
A pattern for a given type is allowed to use all the ctors for values of that type (which we
call “value constructors”), but there are also pattern-only ctors. The most important one is
the wildcard (_
), and the others are integer ranges (0..=10
), variable-length slices ([x, ..]
), and or-patterns (Ok(0) | Err(_)
). Examples of valid patterns are 42
, Some(_)
, Foo { bar: Some(0) | None, baz: _ }
. Note that a binder in a pattern (e.g. Some(x)
) matches the
same values as a wildcard (e.g. Some(_)
), so we treat both as wildcards.
From this deconstruction we can compute whether a given value matches a given pattern; we
simply look at ctors one at a time. Given a pattern p
and a value v
, we want to compute
matches!(v, p)
. It’s mostly straightforward: we compare the head ctors and when they match
we compare their fields recursively. A few representative examples:
matches!(v, _) := true
matches!((v0, v1), (p0, p1)) := matches!(v0, p0) && matches!(v1, p1)
matches!(Foo { bar: v0, baz: v1 }, Foo { bar: p0, baz: p1 }) := matches!(v0, p0) && matches!(v1, p1)
matches!(Ok(v0), Ok(p0)) := matches!(v0, p0)
matches!(Ok(v0), Err(p0)) := false
(incompatible variants)matches!(v, 1..=100) := matches!(v, 1) || ... || matches!(v, 100)
matches!([v0], [p0, .., p1]) := false
(incompatible lengths)matches!([v0, v1, v2], [p0, .., p1]) := matches!(v0, p0) && matches!(v2, p1)
matches!(v, p0 | p1) := matches!(v, p0) || matches!(v, p1)
Constructors, fields and relevant operations are defined in the super::deconstruct_pat
module.
Note: this constructors/fields distinction may not straightforwardly apply to every Rust type.
For example a value of type Rc<u64>
can’t be deconstructed that way, and &str
has an
infinitude of constructors. There are also subtleties with visibility of fields and
uninhabitedness and various other things. The constructors idea can be extended to handle most
of these subtleties though; caveats are documented where relevant throughout the code.
Whether constructors cover each other is computed by Constructor::is_covered_by
.
Specialization
Recall that we wish to compute usefulness(p_1 .. p_n, q)
: given a list of patterns p_1 .. p_n
and a pattern q
, all of the same type, we want to find a list of values (called
“witnesses”) that are matched by q
and by none of the p_i
. We obviously don’t just
enumerate all possible values. From the discussion above we see that we can proceed
ctor-by-ctor: for each value ctor of the given type, we ask “is there a value that starts with
this constructor and matches q
and none of the p_i
?”. As we saw above, there’s a lot we can
say from knowing only the first constructor of our candidate value.
Let’s take the following example:
match x {
Enum::Variant1(_) => {} // `p1`
Enum::Variant2(None, 0) => {} // `p2`
Enum::Variant2(Some(_), 0) => {} // `q`
}
We can easily see that if our candidate value v
starts with Variant1
it will not match q
.
If v = Variant2(v0, v1)
however, whether or not it matches p2
and q
will depend on v0
and v1
. In fact, such a v
will be a witness of usefulness of q
exactly when the tuple
(v0, v1)
is a witness of usefulness of q'
in the following reduced match:
match x {
(None, 0) => {} // `p2'`
(Some(_), 0) => {} // `q'`
}
This motivates a new step in computing usefulness, that we call specialization. Specialization consist of filtering a list of patterns for those that match a constructor, and then looking into the constructor’s fields. This enables usefulness to be computed recursively.
Instead of acting on a single pattern in each row, we will consider a list of patterns for each
row, and we call such a list a pattern-stack. The idea is that we will specialize the
leftmost pattern, which amounts to popping the constructor and pushing its fields, which feels
like a stack. We note a pattern-stack simply with [p_1 ... p_n]
.
Here’s a sequence of specializations of a list of pattern-stacks, to illustrate what’s
happening:
[Enum::Variant1(_)]
[Enum::Variant2(None, 0)]
[Enum::Variant2(Some(_), 0)]
//==>> specialize with `Variant2`
[None, 0]
[Some(_), 0]
//==>> specialize with `Some`
[_, 0]
//==>> specialize with `true` (say the type was `bool`)
[0]
//==>> specialize with `0`
[]
The function specialize(c, p)
takes a value constructor c
and a pattern p
, and returns 0
or more pattern-stacks. If c
does not match the head constructor of p
, it returns nothing;
otherwise if returns the fields of the constructor. This only returns more than one
pattern-stack if p
has a pattern-only constructor.
-
Specializing for the wrong constructor returns nothing
specialize(None, Some(p0)) := []
-
Specializing for the correct constructor returns a single row with the fields
specialize(Variant1, Variant1(p0, p1, p2)) := [[p0, p1, p2]]
specialize(Foo{..}, Foo { bar: p0, baz: p1 }) := [[p0, p1]]
-
For or-patterns, we specialize each branch and concatenate the results
specialize(c, p0 | p1) := specialize(c, p0) ++ specialize(c, p1)
-
We treat the other pattern constructors as if they were a large or-pattern of all the possibilities:
specialize(c, _) := specialize(c, Variant1(_) | Variant2(_, _) | ...)
specialize(c, 1..=100) := specialize(c, 1 | ... | 100)
specialize(c, [p0, .., p1]) := specialize(c, [p0, p1] | [p0, _, p1] | [p0, _, _, p1] | ...)
-
If
c
is a pattern-only constructor,specialize
is defined on a case-by-case basis. See the discussion about constructor splitting insuper::deconstruct_pat
.
We then extend this function to work with pattern-stacks as input, by acting on the first column and keeping the other columns untouched.
Specialization for the whole matrix is done in Matrix::specialize_constructor
. Note that
or-patterns in the first column are expanded before being stored in the matrix. Specialization
for a single patstack is done from a combination of Constructor::is_covered_by
and
PatStack::pop_head_constructor
. The internals of how it’s done mostly live in the
Fields
struct.
Computing usefulness
We now have all we need to compute usefulness. The inputs to usefulness are a list of
pattern-stacks p_1 ... p_n
(one per row), and a new pattern_stack q
. The paper and this
file calls the list of patstacks a matrix. They must all have the same number of columns and
the patterns in a given column must all have the same type. usefulness
returns a (possibly
empty) list of witnesses of usefulness. These witnesses will also be pattern-stacks.
-
base case:
n_columns == 0
. Since a pattern-stack functions like a tuple of patterns, an empty one functions like the unit type. Thusq
is useful iff there are no rows above it, i.e. ifn == 0
. -
inductive case:
n_columns > 0
. We need a way to list the constructors we want to try. We will be more clever in the next section but for now assume we list all value constructors for the type of the first column.-
for each such ctor
c
:-
for each
q'
returned byspecialize(c, q)
:- we compute
usefulness(specialize(c, p_1) ... specialize(c, p_n), q')
- we compute
-
for each witness found, we revert specialization by pushing the constructor
c
on top.
-
-
We return the concatenation of all the witnesses found, if any.
-
Example:
[Some(true)] // p_1
[None] // p_2
[Some(_)] // q
//==>> try `None`: `specialize(None, q)` returns nothing
//==>> try `Some`: `specialize(Some, q)` returns a single row
[true] // p_1'
[_] // q'
//==>> try `true`: `specialize(true, q')` returns a single row
[] // p_1''
[] // q''
//==>> base case; `n != 0` so `q''` is not useful.
//==>> go back up a step
[true] // p_1'
[_] // q'
//==>> try `false`: `specialize(false, q')` returns a single row
[] // q''
//==>> base case; `n == 0` so `q''` is useful. We return the single witness `[]`
witnesses:
[]
//==>> undo the specialization with `false`
witnesses:
[false]
//==>> undo the specialization with `Some`
witnesses:
[Some(false)]
//==>> we have tried all the constructors. The output is the single witness `[Some(false)]`.
This computation is done in is_useful
. In practice we don’t care about the list of
witnesses when computing reachability; we only need to know whether any exist. We do keep the
witnesses when computing exhaustiveness to report them to the user.
Making usefulness tractable: constructor splitting
We’re missing one last detail: which constructors do we list? Naively listing all value
constructors cannot work for types like u64
or &str
, so we need to be more clever. The
first obvious insight is that we only want to list constructors that are covered by the head
constructor of q
. If it’s a value constructor, we only try that one. If it’s a pattern-only
constructor, we use the final clever idea for this algorithm: constructor splitting, where we
group together constructors that behave the same.
The details are not necessary to understand this file, so we explain them in
super::deconstruct_pat
. Splitting is done by the Constructor::split
function.
Constants in patterns
There are two kinds of constants in patterns:
- literals (
1
,true
,"foo"
) - named or inline consts (
FOO
,const { 5 + 6 }
)
The latter are converted into other patterns with literals at the leaves. For example
const_to_pat(const { [1, 2, 3] })
becomes an Array(vec![Const(1), Const(2), Const(3)])
pattern. This gets problematic when comparing the constant via ==
would behave differently
from matching on the constant converted to a pattern. Situations like that can occur, when
the user implements PartialEq
manually, and thus could make ==
behave arbitrarily different.
In order to honor the ==
implementation, constants of types that implement PartialEq
manually
stay as a full constant and become an Opaque
pattern. These Opaque
patterns do not participate
in exhaustiveness, specialization or overlap checking.
Structs
- MatchArm 🔒The arm of a match expression.
- Matrix 🔒A 2D matrix.
- PatCtxt 🔒
- PatStack 🔒A row of a matrix. Rows of len 1 are very common, which is why
SmallVec[_; 2]
works well. - The output of checking a match for exhaustiveness and arm reachability.
- Witness 🔒A witness of non-exhaustiveness for error reporting, represented as a list of patterns (in reverse order of construction) with wildcards inside to represent elements that can take any inhabitant of the type as a value.
Enums
- ArmType 🔒
- Indicates whether or not a given arm is reachable.
- This carries the results of computing usefulness, as described at the top of the file. When checking usefulness of a match branch, we use the
NoWitnesses
variant, which also keeps track of potential unreachable sub-patterns (in the presence of or-patterns). When checking exhaustiveness of a whole match, we use theWithWitnesses
variant, which carries a list of witnesses of non-exhaustiveness when there are any. Which variant to use is dictated byArmType
.
Functions
- The entrypoint for the usefulness algorithm. Computes whether a match is exhaustive and which of its arms are reachable.
- Algorithm from http://moscova.inria.fr/~maranget/papers/warn/index.html. The algorithm from the paper has been modified to correctly handle empty types. The changes are: (0) We don’t exit early if the pattern matrix has zero rows. We just continue to recurse over columns. (1) all_constructors will only return constructors that are statically possible. E.g., it will only return
Ok
forResult<T, !>
.