Module cargo::core::compiler::fingerprint
Tracks changes to determine if something needs to be recompiled.
This module implements change-tracking so that Cargo can know whether or not something needs to be recompiled. A Cargo Unit can be either “dirty” (needs to be recompiled) or “fresh” (it does not need to be recompiled).
Mechanisms affecting freshness
There are several mechanisms that influence a Unit’s freshness:
- The Fingerprint is a hash, saved to the filesystem in the .fingerprint directory, that tracks information about the Unit. If the fingerprint is missing (such as the first time the unit is being compiled), then the unit is dirty. If any of the fingerprint fields change (like the name of the source file), then the Unit is considered dirty.
  The Fingerprint also tracks the fingerprints of all its dependencies, so a change in a dependency will propagate the “dirty” status up.
- Filesystem mtime tracking is also used to check if a unit is dirty. See the section below on “Mtime comparison” for more details. There are essentially two parts to mtime tracking:
  - The mtime of a Unit’s output files is compared to the mtime of all its dependencies’ output file mtimes (see check_filesystem). If any output is missing, or is older than a dependency’s output, then the unit is dirty.
  - The mtime of a Unit’s source files is compared to the mtime of its dep-info file in the fingerprint directory (see find_stale_file). The dep-info file is used as an anchor to know when the last build of the unit was done. See the “dep-info files” section below for more details. If any input files are missing, or are newer than the dep-info, then the unit is dirty.
Note: Fingerprinting is not a perfect solution. Filesystem mtime tracking is notoriously imprecise and problematic. Only a small part of the environment is captured. This is a balance of performance, simplicity, and completeness. Sandboxing, hashing file contents, tracking every file access, environment variable, and network operation would ensure more reliable and reproducible builds at the cost of being complex, slow, and platform-dependent.
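The two mtime checks described above can be sketched as a single pure function. This is only an illustration of the comparison order, not Cargo's code: the MtimeStatus enum, the check_mtimes function, and its parameters are hypothetical names, and timestamps are passed in rather than read from disk.

```rust
use std::time::SystemTime;

/// Outcome of the mtime checks. These names are illustrative, not Cargo's.
#[derive(Debug, PartialEq)]
enum MtimeStatus {
    UpToDate,
    MissingOutput,
    DependencyNewer,
    MissingDepInfo,
    MissingInput,
    InputNewer,
}

/// `output`: mtime of the unit's output (None if it does not exist).
/// `dep_outputs`: mtimes of the dependencies' outputs.
/// `dep_info`: mtime of the unit's dep-info file, the anchor for the last build.
/// `inputs`: mtimes of the unit's source files (None if a file is missing).
fn check_mtimes(
    output: Option<SystemTime>,
    dep_outputs: &[SystemTime],
    dep_info: Option<SystemTime>,
    inputs: &[Option<SystemTime>],
) -> MtimeStatus {
    // Part 1: the unit's output versus its dependencies' outputs.
    let output = match output {
        Some(t) => t,
        None => return MtimeStatus::MissingOutput,
    };
    if dep_outputs.iter().any(|dep| *dep > output) {
        return MtimeStatus::DependencyNewer;
    }
    // Part 2: the source files versus the dep-info anchor.
    let anchor = match dep_info {
        Some(t) => t,
        None => return MtimeStatus::MissingDepInfo,
    };
    for input in inputs {
        match input {
            None => return MtimeStatus::MissingInput,
            Some(t) if *t > anchor => return MtimeStatus::InputNewer,
            Some(_) => {}
        }
    }
    MtimeStatus::UpToDate
}
```

Any result other than UpToDate would mark the unit dirty, independently of whether the fingerprint hash itself changed.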
Fingerprints and Metadata
The Metadata hash is a hash added to the output filenames to isolate each unit. See its documentation for more details.
NOTE: Not all output files are isolated via filename hashes (like dylibs).
The fingerprint directory uses a hash, but sometimes units share the same
fingerprint directory (when they don’t have Metadata) so care should be
taken to handle this!
Fingerprints and Metadata are similar, and track some of the same things. The Metadata contains information that is required to keep Units separate. The Fingerprint includes additional information that should cause a recompile, but it is desired to reuse the same filenames. A comparison of what is tracked:
Value | Fingerprint | Metadata |
---|---|---|
rustc | ✓ | ✓ |
Profile | ✓ | ✓ |
cargo rustc extra args | ✓ | ✓ |
CompileMode | ✓ | ✓ |
Target Name | ✓ | ✓ |
TargetKind (bin/lib/etc.) | ✓ | ✓ |
Enabled Features | ✓ | ✓ |
Immediate dependency’s hashes | ✓ [1] | ✓ |
CompileKind (host/target) | ✓ | ✓ |
__CARGO_DEFAULT_LIB_METADATA [2] | | ✓ |
package_id | | ✓ |
authors, description, homepage, repo | ✓ | |
Target src path relative to ws | ✓ | |
Target flags (test/bench/for_host/edition) | ✓ | |
-C incremental=… flag | ✓ | |
mtime of sources | ✓ [3] | |
RUSTFLAGS/RUSTDOCFLAGS | ✓ | |
Lto flags | ✓ | ✓ |
config settings [4] | ✓ | |
is_std | | ✓ |
[lints] table [5] | ✓ | |
When deciding what should go in the Metadata vs the Fingerprint, consider that some files (like dylibs) do not have a hash in their filename. Thus, if a value changes, only the fingerprint will detect the change (consider, for example, swapping between different features). Fields that are only in Metadata generally aren’t relevant to the fingerprint because they fundamentally change the output (like target vs host changes the directory where it is emitted).
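The split between the two hashes can be illustrated with a small sketch. It assumes a hypothetical UnitInfo struct that models only a few rows of the table, and it uses the standard library's DefaultHasher rather than whatever hasher Cargo actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A tiny stand-in for a Unit; only a few of the table's rows are modeled.
struct UnitInfo {
    rustc_version: String,
    target_name: String,
    features: Vec<String>,
    rustflags: Vec<String>, // tracked by the Fingerprint only
}

/// Values that keep units separate: this hash would end up in output filenames.
fn metadata_hash(u: &UnitInfo) -> u64 {
    let mut h = DefaultHasher::new();
    (&u.rustc_version, &u.target_name, &u.features).hash(&mut h);
    h.finish()
}

/// The shared values plus those that should force a rebuild under the same filename.
fn fingerprint_hash(u: &UnitInfo) -> u64 {
    let mut h = DefaultHasher::new();
    (&u.rustc_version, &u.target_name, &u.features, &u.rustflags).hash(&mut h);
    h.finish()
}
```

In this sketch, changing rustflags leaves the metadata hash (and so the filename) untouched while the fingerprint hash changes, which is the "recompile but reuse the filename" case. Changing features alters both.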
Fingerprint files
Fingerprint information is stored in the target/{debug,release}/.fingerprint/ directory. Each Unit is stored in a separate directory. Each Unit directory contains:
- A file with a 16 hex-digit hash. This is the Fingerprint hash, used for quick loading and comparison.
- A .json file that contains details about the Fingerprint. This is only used to log details about why a fingerprint is considered dirty. CARGO_LOG=cargo::core::compiler::fingerprint=trace cargo build can be used to display this log information.
- A “dep-info” file which is a translation of rustc’s *.d dep-info files to a Cargo-specific format that tweaks file names and is optimized for reading quickly.
- An invoked.timestamp file whose filesystem mtime is updated every time the Unit is built. This is used for capturing the time when the build starts, to detect if files are changed in the middle of the build. See below for more details.
Note that some units are a little different. A Unit for running a build script or for rustdoc does not have a dep-info file (it’s not applicable). Build script invoked.timestamp files are in the build output directory.
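The quick comparison against the short hash file might look like the sketch below. The to_hex and is_fresh helpers are hypothetical, and the plain big-endian hex rendering is an assumption. The only detail taken from the text is that the file holds 16 hex digits.

```rust
/// Render a 64-bit hash as 16 lowercase hex digits (assumed encoding).
fn to_hex(hash: u64) -> String {
    format!("{:016x}", hash)
}

/// `stored` is the content of the hash file, or None if the file is missing.
/// A missing file means the unit was never built, so it cannot be fresh.
fn is_fresh(stored: Option<&str>, new_hash: u64) -> bool {
    match stored {
        Some(old) => old.trim() == to_hex(new_hash),
        None => false,
    }
}
```

Only when this cheap check fails would the .json details be needed, to explain which field differs.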
Fingerprint calculation
After the list of Units has been calculated, the Units are added to the JobQueue. As each one is added, the fingerprint is calculated, and the dirty/fresh status is recorded. A closure is used to update the fingerprint on-disk when the Unit successfully finishes. The closure will recompute the Fingerprint based on the updated information. If the Unit fails to compile, the fingerprint is not updated.
Fingerprints are cached in the Context. This makes computing Fingerprints faster, but also is necessary for properly updating dependency information. Because a Fingerprint holds Arc clones of the Fingerprints of all its dependencies, it automatically picks up updates to those dependencies when it is recomputed.
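A minimal sketch of why sharing through Arc makes dependency updates visible: the Fp type below is a hypothetical stand-in with one mutable field, not Cargo's Fingerprint, and it recomputes its hash on every call instead of memoizing it.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::{Arc, Mutex};

/// Simplified fingerprint: one mutable local value plus shared dependencies.
struct Fp {
    local: Mutex<String>,
    deps: Vec<Arc<Fp>>,
}

impl Fp {
    fn new(local: &str, deps: Vec<Arc<Fp>>) -> Arc<Fp> {
        Arc::new(Fp { local: Mutex::new(local.to_string()), deps })
    }

    /// Update the local state, as a post-build closure might.
    fn update(&self, local: &str) {
        *self.local.lock().unwrap() = local.to_string();
    }

    /// Hash the local state together with every dependency's hash.
    fn hash_u64(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.local.lock().unwrap().hash(&mut h);
        for dep in &self.deps {
            dep.hash_u64().hash(&mut h);
        }
        h.finish()
    }
}
```

Because the parent stores an Arc clone rather than a copy, calling update on the dependency changes what the parent's hash_u64 returns the next time it is computed.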
dep-info files
Cargo has several kinds of “dep info” files:
- dep-info files generated by rustc.
- Fingerprint dep-info files translated from the first one.
- dep-info for external build system integration.
- Unstable -Zbinary-dep-depinfo.
rustc dep-info files
Cargo passes the --emit=dep-info flag to rustc so that rustc will generate a “dep info” file (with the .d extension). This is a Makefile-like syntax that includes all of the source files used to build the crate. This file is used by Cargo to know which files to check to see if the crate will need to be rebuilt. Example:
/path/to/target/debug/deps/cargo-b6219d178925203d: src/bin/main.rs src/bin/cargo/cli.rs # … etc.
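A line in this Makefile-like format can be split into a target and its source files with a few lines of code. The sketch below is not Cargo's parser. The parse_dep_line name is hypothetical, and the only escaping it handles is a backslash before a space, which is an assumption about how paths containing spaces are written.

```rust
/// Split one `target: dep dep ...` line into the target and its dependencies.
/// Returns None if the line has no `: ` separator.
fn parse_dep_line(line: &str) -> Option<(String, Vec<String>)> {
    let (target, rest) = line.split_once(": ")?;
    let mut deps = Vec::new();
    let mut current = String::new();
    let mut chars = rest.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            // `\ ` is an escaped space that belongs to the current path.
            '\\' if chars.peek() == Some(&' ') => {
                current.push(' ');
                chars.next();
            }
            // Unescaped whitespace ends the current path.
            c if c.is_whitespace() => {
                if !current.is_empty() {
                    deps.push(std::mem::take(&mut current));
                }
            }
            c => current.push(c),
        }
    }
    if !current.is_empty() {
        deps.push(current);
    }
    Some((target.to_string(), deps))
}
```

Applied to the example line above (without the trailing comment), this yields the cargo-b6219d178925203d target and the two listed source paths.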
Fingerprint dep-info files
After rustc exits successfully, Cargo will read the first kind of dep info file and translate it into a binary format that is stored in the fingerprint directory (translate_dep_info).
These are used to quickly scan for any changed files. The mtime of the fingerprint dep-info file itself is used as the reference for comparing the source files to determine if any of the source files have been modified (see below for more detail).
Note that Cargo parses the special # env-dep:... comments in dep-info files to learn about environment variables that the rustc compile depends on. Cargo then later uses this to trigger a recompile if a referenced env var changes (even if the source didn’t change).
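The environment-variable tracking can be sketched as follows, assuming the comment lines take the form "# env-dep:NAME=value" when the variable was set and "# env-dep:NAME" when it was not. The parse_env_dep and env_changed helpers are hypothetical, the current environment is passed in as a map so the example stays deterministic, and any escaping of the value is ignored.

```rust
use std::collections::HashMap;

/// Parse one env-dep comment into (name, value at compile time).
/// Returns None for lines that are not env-dep comments.
fn parse_env_dep(line: &str) -> Option<(String, Option<String>)> {
    let rest = line.strip_prefix("# env-dep:")?;
    match rest.split_once('=') {
        Some((name, value)) => Some((name.to_string(), Some(value.to_string()))),
        None => Some((rest.to_string(), None)),
    }
}

/// True if any recorded variable now has a different value (or set/unset state).
fn env_changed(
    recorded: &[(String, Option<String>)],
    current: &HashMap<String, String>,
) -> bool {
    recorded
        .iter()
        .any(|(name, value)| current.get(name) != value.as_ref())
}
```

A true result from env_changed would make the unit dirty even when no source file mtime has moved.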
dep-info files for build system integration.
There is also a third dep-info file. Cargo extends the file created by rustc with some additional information and saves it into the output directory. This is intended for build system integration. See the output_depinfo function for more detail.
-Zbinary-dep-depinfo
rustc has an experimental flag -Zbinary-dep-depinfo. This causes rustc to include binary files (like rlibs) in the dep-info file. This is primarily to support rustc development, so that Cargo can check the implicit dependency on the standard library (which lives in the sysroot). We want Cargo to recompile whenever the standard library rlib/dylibs change, and this is a generic mechanism to make that work.
Mtime comparison
The use of modification timestamps is the most common way a unit will be determined to be dirty or fresh between builds. There are many subtle issues and edge cases with mtime comparisons. This gives a high-level overview, but you’ll need to read the code for the gritty details. Mtime handling is different for different unit kinds. The different styles are driven by the Fingerprint::local field, which is set based on the unit kind.
The status of whether or not the mtime is “stale” or “up-to-date” is stored in Fingerprint::fs_status.
Every unit compares the mtime of its newest output file with the mtimes of the outputs of all its dependencies. If any output file is missing, then the unit is stale. If any dependency is newer, the unit is stale.
Normal package mtime handling
LocalFingerprint::CheckDepInfo is used for checking the mtime of packages. It compares the mtime of the input files (the source files) to the mtime of the dep-info file (which is written last after a build is finished). If the dep-info is missing, the unit is stale (it has never been built). The list of input files comes from the dep-info file. See the section above for details on dep-info files.
Also note that although registry and git packages use CheckDepInfo, none of their source files are included in the dep-info (see translate_dep_info), so for those kinds no mtime checking is done (unless -Zbinary-dep-depinfo is used). Registry and git packages are static, so there is no need to check anything.
When a build is complete, the mtime of the dep-info file in the fingerprint directory is modified to rewind it to the time when the build started. This is done by creating an invoked.timestamp file when the build starts to capture the start time. The mtime is rewound to the start to handle the case where the user modifies a source file while a build is running. Cargo can’t know whether or not the file was included in the build, so it takes a conservative approach of assuming the file was not included, and it should be rebuilt during the next build.
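The effect of rewinding can be shown with plain timestamps. In this sketch the dep_info_mtime and source_is_stale functions are hypothetical, and no files are touched. The rewind flag exists only to contrast the behavior described above with what would happen if the dep-info file kept its natural write time.

```rust
use std::time::SystemTime;

/// The mtime the dep-info file ends up with after a build. With `rewind` it is
/// the build start (the behavior described above); without it, the finish time.
fn dep_info_mtime(build_start: SystemTime, build_end: SystemTime, rewind: bool) -> SystemTime {
    if rewind {
        build_start
    } else {
        build_end
    }
}

/// A source file is stale if it is newer than the dep-info anchor.
fn source_is_stale(source_mtime: SystemTime, anchor: SystemTime) -> bool {
    source_mtime > anchor
}
```

For a build that runs from t=100 to t=110 with a source edit at t=105, the rewound anchor (100) reports the file as stale, while an anchor left at 110 would silently miss the edit.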
Rustdoc mtime handling
Rustdoc does not emit a dep-info file, so Cargo currently has a relatively simple system for detecting rebuilds. LocalFingerprint::Precalculated is used for rustdoc units. For registry packages, this is the package version. For git packages, it is the git hash. For path packages, it is a string of the mtime of the newest file in the package.
There are some known bugs with how this works, so it should be improved at some point.
Build script mtime handling
Build script mtime handling runs in different modes. There is the “old style” where the build script does not emit any rerun-if directives. In this mode, Cargo will use LocalFingerprint::Precalculated. See the “rustdoc” section above for how it works.
In the new style, each rerun-if directive is translated to the corresponding LocalFingerprint variant. The RerunIfChanged variant compares the mtime of the given filenames against the mtime of the “output” file.
Similar to normal units, the build script “output” file mtime is rewound to the time just before the build script is executed to handle mid-build modifications.
Considerations for inclusion in a fingerprint
Over time we’ve realized a few items which historically were included in fingerprint hashings should not actually be included. Examples are:
- Modification time values. We strive to never include a modification time inside a Fingerprint to get hashed into an actual value. While theoretically fine to do, in practice this causes issues with common applications like Docker. Docker, after a layer is built, will zero out the nanosecond part of all filesystem modification times. This means that the actual modification time is different for all build artifacts, which if we tracked the actual values of modification times would cause unnecessary recompiles. To fix this we instead only track paths which are relevant. These paths are checked dynamically to see if they’re up to date, and the modification time doesn’t make its way into the fingerprint hash.
- Absolute path names. We strive to maintain a property where if you rename a project directory Cargo will continue to preserve all build artifacts and reuse the cache. This means that we can’t ever hash an absolute path name. Instead we always hash relative path names and the “root” is passed in at runtime dynamically. Some of this is best effort, but the general idea is that we assume all accesses within a crate stay within that crate.
These are pretty tricky to test for unfortunately, but we should have a good test suite nowadays and lord knows Cargo gets enough testing in the wild!
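The relative-path property from the list above can be checked with a short sketch. The hash_relative function is hypothetical and uses the standard library's DefaultHasher. It only demonstrates that hashing a path relative to a root supplied at runtime survives a rename of the project directory.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::Path;

/// Hash `file` relative to `root`, so the result does not depend on where the
/// project lives. Falls back to the full path if `file` is outside `root`.
fn hash_relative(root: &Path, file: &Path) -> u64 {
    let relative = file.strip_prefix(root).unwrap_or(file);
    let mut h = DefaultHasher::new();
    relative.hash(&mut h);
    h.finish()
}
```

With this shape, /home/alice/proj/src/lib.rs hashed against /home/alice/proj and /tmp/renamed/src/lib.rs hashed against /tmp/renamed produce the same value, so a cached fingerprint would still match after the move.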
Build scripts
The running of a build script (CompileMode::RunCustomBuild) is treated significantly differently from all other Unit kinds. It has its own function for calculating the Fingerprint (calculate_run_custom_build) and has some unique considerations. It does not track the same information as a normal Unit. The information tracked depends on the rerun-if-changed and rerun-if-env-changed statements produced by the build script. If the script does not emit either of these statements, the Fingerprint runs in “old style” mode where an mtime change of any file in the package will cause the build script to be re-run. Otherwise, the fingerprint only tracks the individual “rerun-if” items listed by the build script.
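The choice between the two modes can be sketched as a function over the build script's output. The LocalSketch enum and the local_fingerprints function are hypothetical simplifications. The variant names only echo the ones mentioned in this document, and the sketch assumes the directives appear as lines of the form cargo:rerun-if-changed=PATH and cargo:rerun-if-env-changed=VAR.

```rust
/// Simplified stand-in for the per-unit local fingerprint variants.
#[derive(Debug, PartialEq)]
enum LocalSketch {
    /// Old style: one precomputed string summarizing the whole package.
    Precalculated(String),
    /// New style: only these paths are compared against the "output" file.
    RerunIfChanged { paths: Vec<String> },
    /// New style: only this environment variable is tracked.
    RerunIfEnvChanged { var: String },
}

/// Translate a build script's output into local fingerprints.
/// `package_summary` is the value used when no rerun-if directive is present.
fn local_fingerprints(stdout: &str, package_summary: &str) -> Vec<LocalSketch> {
    let mut paths = Vec::new();
    let mut result = Vec::new();
    for line in stdout.lines() {
        if let Some(path) = line.strip_prefix("cargo:rerun-if-changed=") {
            paths.push(path.to_string());
        } else if let Some(var) = line.strip_prefix("cargo:rerun-if-env-changed=") {
            result.push(LocalSketch::RerunIfEnvChanged { var: var.to_string() });
        }
    }
    if !paths.is_empty() {
        result.insert(0, LocalSketch::RerunIfChanged { paths });
    }
    if result.is_empty() {
        // No rerun-if directive at all: fall back to the old-style mode.
        result.push(LocalSketch::Precalculated(package_summary.to_string()));
    }
    result
}
```

Since the script's output is only known after it runs, the resulting list can differ from the one computed before the run, which is why the local value may need to be replaced afterwards.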
The “rerun-if” statements from a previous build are stored in the build output directory in a file called output. Cargo parses this file when the Unit for that build script is prepared for the JobQueue. The Fingerprint code can then use that information to compute the Fingerprint and compare against the old fingerprint hash.
Care must be taken with build script Fingerprints because the Fingerprint::local value may be changed after the build script runs (such as if the build script adds or removes “rerun-if” items).
Another complication is if a build script is overridden. In that case, the fingerprint is the hash of the output of the override.
Special considerations
Registry dependencies do not track the mtime of files. This is because registry dependencies are not expected to change (if a new version is used, the Package ID will change, causing a rebuild).
Cargo currently partially works with Docker caching. When a Docker image is built, it has normal mtime information. However, when a step is cached, the nanosecond portion of every file’s mtime is zeroed out. Currently this works, but care must be taken for situations like these.
HFS on macOS only supports 1 second timestamps. This causes a significant number of problems, particularly with Cargo’s testsuite which does rapid builds in succession. Other filesystems have various degrees of resolution.
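To see why coarse timestamps are a problem, the sketch below models a filesystem that stores whole seconds only. The truncate_to_secs and looks_newer helpers are hypothetical and work on in-memory timestamps rather than real files.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Drop the sub-second part of a timestamp, as a 1-second filesystem would.
fn truncate_to_secs(t: SystemTime) -> SystemTime {
    let since_epoch = t
        .duration_since(UNIX_EPOCH)
        .unwrap_or(Duration::from_secs(0));
    UNIX_EPOCH + Duration::from_secs(since_epoch.as_secs())
}

/// Compare a source mtime against an anchor, optionally through the lens of a
/// filesystem that only keeps whole seconds.
fn looks_newer(source: SystemTime, anchor: SystemTime, one_second_fs: bool) -> bool {
    if one_second_fs {
        truncate_to_secs(source) > truncate_to_secs(anchor)
    } else {
        source > anchor
    }
}
```

An edit made 300 ms after the anchor is visible with full-resolution timestamps but disappears once both values are truncated to the same second, which is the kind of miss that rapid successive builds can trigger.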
Various weird filesystems (such as network filesystems) also can cause complications. Network filesystems may track the time on the server (except when the time is set manually such as with filetime::set_file_times). Not all filesystems support modifying the mtime.
See the A-rebuild-detection label on the issue tracker for more.
[1] Build script and bin dependencies are not included.
[2] __CARGO_DEFAULT_LIB_METADATA is set by rustbuild to embed the release channel (bootstrap/stable/beta/nightly) in libstd.
[3] See the “Mtime comparison” section for details on mtime tracking.
[4] Config settings that are not otherwise captured anywhere else. Currently, this is only doc.extern-map.
Re-exports
pub use dirty_reason::DirtyReason;
Modules
Structs
- Dependency edge information for fingerprints. This is generated for each dependency and is stored in a Fingerprint.
- Same as RustcDepInfo except avoids absolute paths as much as possible to allow moving around the target directory.
- A fingerprint can be considered to be a “short string” representing the state of a world for a package.
- The representation of the .d dep-info file generated by rustc.
Enums
- Tells whether the associated path in EncodedDepInfo::files is relative to package root, target root, or absolute.
- Indication of the status on the filesystem for a particular unit.
- A LocalFingerprint represents something that we use to detect direct changes to a Fingerprint.
- See FsStatus::StaleItem.
Functions
- Get ready to compute the LocalFingerprint values for a RunCustomBuild unit.
- Create a LocalFingerprint for an overridden build script. Returns None if it is not overridden.
- Calculates the fingerprint for a Unit.
- Calculate a fingerprint for a “normal” unit, or anything that’s not a build script. This is an internal helper of calculate, don’t call directly.
- Calculate a fingerprint for an “execute a build script” unit. This is an internal helper of calculate, don’t call directly.
- Reads the value from the old fingerprint hash file and compares it.
- Returns the location that the dep-info file will show up at for the Unit specified.
- The reference file is considered as “stale” if any file from paths has a newer mtime.
- Compute the LocalFingerprint values for a RunCustomBuild unit for non-overridden new-style build scripts only. This is only used when deps is already known to have a nonempty rerun-if-* somewhere.
- Logs the result of fingerprint comparison.
- Parses Cargo’s internal EncodedDepInfo structure that was previously serialized to disk.
- Parse the .d dep-info file generated by rustc.
- Calculates the fingerprint of a unit that contains no dep-info files.
- Prepare for work when a package starts to build.
- Determines if a Unit is up-to-date, and if not prepares necessary work to update the persisted fingerprint.
- Returns the absolute path of the target directory. All paths are rewritten to be relative to this.
- Parses the dep-info file coming out of rustc into a Cargo-specific format.
- Writes the short fingerprint hash value to <loc> and logs detailed JSON information to <loc>.json.