
This module manages how the incremental compilation cache is represented in the file system.

Incremental compilation caches are managed according to a copy-on-write strategy: Once a complete, consistent cache version is finalized, it is never modified. Instead, when a subsequent compilation session is started, the compiler will allocate a new version of the cache that starts out as a copy of the previous version. Then only this new copy is modified and it will not be visible to other processes until it is finalized. This ensures that multiple compiler processes can be executed concurrently for the same crate without interfering with each other or blocking each other.

More concretely this is implemented via the following protocol:

  1. For a newly started compilation session, the compiler allocates a new session directory within the incremental compilation directory. This session directory will have a unique name that ends with the suffix “-working” and that contains a creation timestamp.
  2. Next, the compiler looks for the newest finalized session directory, that is, a session directory from a previous compilation session that has been marked as valid and consistent. A session directory is considered finalized if the “-working” suffix in the directory name has been replaced by the SVH (Strict Version Hash) of the crate.
  3. Once the compiler has found a valid, finalized session directory, it will hard-link/copy its contents into the new “-working” directory. If all goes well, it will have its own, private copy of the source directory and subsequently not have to worry about synchronizing with other compiler processes.
  4. Now the compiler can do its normal compilation process, which involves reading and updating its private session directory.
  5. When compilation finishes without errors, the private session directory will be in a state where it can be used as input for other compilation sessions. That is, it will contain a dependency graph and cache artifacts that are consistent with the state of the source code it was compiled from, with no need to change them ever again. At this point, the compiler finalizes and “publishes” its private session directory by renaming it from “s-{timestamp}-{random}-working” to “s-{timestamp}-{SVH}”.
  6. At this point the “old” session directory that we copied our data from at the beginning of the session has become obsolete because we have just published a more current version. Thus the compiler will delete it.
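The naming scheme and the publishing rename from step 5 can be sketched as follows. This is a minimal illustration, not the compiler's actual code: the function names are hypothetical, and the timestamp, random part, and SVH are treated as opaque strings because their exact encoding is not specified here.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Name of a private, in-progress session directory (hypothetical helper).
fn working_dir_name(timestamp: &str, random: &str) -> String {
    format!("s-{timestamp}-{random}-working")
}

/// Turns "s-{timestamp}-{random}-working" into "s-{timestamp}-{svh}".
/// Returns `None` if the input does not look like a working directory name.
fn finalized_dir_name(working_name: &str, svh: &str) -> Option<String> {
    let stem = working_name.strip_suffix("-working")?;
    // `stem` is "s-{timestamp}-{random}"; drop the trailing random part.
    let (prefix, _random) = stem.rsplit_once('-')?;
    if !prefix.starts_with("s-") {
        return None;
    }
    Some(format!("{prefix}-{svh}"))
}

/// A session directory counts as finalized once the "-working" suffix is gone.
fn is_finalized(name: &str) -> bool {
    name.starts_with("s-") && !name.ends_with("-working")
}

/// Publishes a private session directory by renaming it in place.
fn finalize_session_dir(working_dir: &Path, svh: &str) -> io::Result<PathBuf> {
    let invalid = |msg: &str| io::Error::new(io::ErrorKind::InvalidInput, msg.to_string());
    let name = working_dir
        .file_name()
        .and_then(|n| n.to_str())
        .ok_or_else(|| invalid("missing or non-UTF-8 directory name"))?;
    let new_name = finalized_dir_name(name, svh)
        .ok_or_else(|| invalid("not a working session directory"))?;
    let target = working_dir.with_file_name(new_name);
    fs::rename(working_dir, &target)?;
    Ok(target)
}
```

Publishing via a single rename is what keeps a half-written cache from ever being visible under a finalized name.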

Garbage Collection

Naively following the above protocol might lead to old session directories piling up if a compiler instance crashes for some reason before it is able to remove its private session directory. In order to avoid wasting disk space, the compiler also does some garbage collection each time it is started in incremental compilation mode. Specifically, it will scan the incremental compilation directory for private session directories that are not in use any more and will delete those. It will also delete any finalized session directories for a given crate except for the most recent one.
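The "keep only the most recent finalized directory" rule could look roughly like the sketch below. It only selects candidates by name and does not touch the file system or any locks. The function names are hypothetical, and the sketch assumes the timestamp component is a plain decimal number, which the real encoding may not be.

```rust
/// Extracts the timestamp from a finalized name "s-{timestamp}-{svh}".
/// Assumption for this sketch: the timestamp is a decimal integer.
fn finalized_timestamp(name: &str) -> Option<u64> {
    if name.ends_with("-working") {
        return None; // still private to some session, not finalized
    }
    let rest = name.strip_prefix("s-")?;
    let (timestamp, _svh) = rest.split_once('-')?;
    timestamp.parse().ok()
}

/// Returns the finalized session directories that may be deleted:
/// every finalized directory except the one with the newest timestamp.
fn stale_finalized_dirs(names: &[&str]) -> Vec<String> {
    let mut finalized: Vec<(u64, &str)> = names
        .iter()
        .filter_map(|&name| finalized_timestamp(name).map(|t| (t, name)))
        .collect();
    finalized.sort(); // ascending by timestamp, ties broken by name
    finalized.pop(); // keep the newest one
    finalized.into_iter().map(|(_, name)| name.to_string()).collect()
}
```

Private "-working" directories are deliberately skipped here, because deciding whether they are abandoned requires the lock check described under Synchronization.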

Synchronization

There is some synchronization needed in order for the compiler to be able to determine whether a given private session directory is not in use any more. This is done by creating a lock file for each session directory and locking it while the directory is still being used. Since file locks have operating system support, we can rely on the lock being released if the compiler process dies for some unexpected reason. Thus, when garbage collecting private session directories, the collecting process can determine whether the directory is still in use by trying to acquire a lock on the file. If locking the file fails, the original process must still be alive. If locking the file succeeds, we know that the owning process is not alive any more and we can safely delete the directory. There is still a small time window between the original process creating the lock file and actually locking it. In order to minimize the chance that another process tries to acquire the lock in just that instant, only session directories that are older than a few seconds are considered for garbage collection.
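The age guard at the end of that paragraph can be sketched as a small predicate. The threshold value and the names below are assumptions chosen for illustration; the text only says "a few seconds".

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical threshold; the actual value is not specified in this document.
const MIN_AGE_FOR_GC: Duration = Duration::from_secs(10);

/// Returns true if a session directory created at `created` is old enough,
/// as of `now`, to be considered for garbage collection at all.
fn is_old_enough_to_collect(created: SystemTime, now: SystemTime) -> bool {
    match now.duration_since(created) {
        Ok(age) => age > MIN_AGE_FOR_GC,
        // `created` lies in the future (for example due to clock skew):
        // be conservative and leave the directory alone.
        Err(_) => false,
    }
}
```

Only directories that pass this check would go on to the lock attempt, so a directory whose owner has created but not yet locked its lock file is very unlikely to be collected by mistake.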

Another case that has to be considered is what happens if one process deletes a finalized session directory that another process is currently trying to copy from. This case is also handled via the lock file. Before a process starts copying a finalized session directory, it will acquire a shared lock on the directory’s lock file. Any garbage collecting process, on the other hand, will acquire an exclusive lock on the lock file. Thus, if a directory is being collected, any reader process will fail to acquire the shared lock and will leave the directory alone. Conversely, if a collecting process can’t acquire the exclusive lock because the directory is currently being read from, it will leave collecting that directory to another process at a later point in time. The exact same scheme is also used when reading the metadata hashes file from an extern crate. When a crate is compiled, the hash values of its metadata are stored in a file in its session directory. When the compilation session of another crate imports the first crate’s metadata, it also has to read in the accompanying metadata hashes. It thus will access the finalized session directories of all crates it links to and, while doing so, it will also place a read lock on the respective session directory so that it won’t be deleted while the metadata hashes are loaded.
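The interaction between readers (shared lock) and collectors (exclusive lock) can be illustrated with a tiny in-memory model. This is not a file lock and not the compiler's implementation; the `LockState` type is hypothetical and only mirrors the rules stated above: many readers may hold the lock together, while a collector needs it entirely to itself.

```rust
/// In-memory model of the state of one session directory's lock file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LockState {
    Unlocked,
    /// Held by this many readers copying from the directory.
    Shared(usize),
    /// Held by a single garbage-collecting process.
    Exclusive,
}

impl LockState {
    /// A reader succeeds unless a collector currently holds the lock.
    fn try_shared(&mut self) -> bool {
        match *self {
            LockState::Unlocked => {
                *self = LockState::Shared(1);
                true
            }
            LockState::Shared(n) => {
                *self = LockState::Shared(n + 1);
                true
            }
            LockState::Exclusive => false,
        }
    }

    /// A collector succeeds only if nobody holds the lock at all.
    fn try_exclusive(&mut self) -> bool {
        if *self == LockState::Unlocked {
            *self = LockState::Exclusive;
            true
        } else {
            false
        }
    }

    /// Releases one holder (one reader, or the collector).
    fn release(&mut self) {
        *self = match *self {
            LockState::Shared(n) if n > 1 => LockState::Shared(n - 1),
            _ => LockState::Unlocked,
        };
    }
}
```

Both operations are non-blocking "try" calls, which matches the behavior described above: a process that fails to get the lock simply leaves the directory alone instead of waiting.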

Preconditions

This system relies on two features being available in the file system in order to work really well: file locking and hard linking. If hard linking is not available (like on FAT), the data in the cache actually has to be copied at the beginning of each session. If file locking does not work reliably (like on NFS), some of the synchronization will go haywire. In both cases we recommend locating the incremental compilation directory on a file system that supports these things. It might be a good idea though to try and detect whether we are on an unsupported file system and emit a warning in that case. This is not yet implemented.
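The hard-link-with-copy-fallback behavior can be sketched with the standard library alone. The helper name and the `LinkOrCopy` enum are illustrative and are not claimed to match the compiler's own utility.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Records which strategy ended up being used for a file.
#[derive(Debug, PartialEq, Eq)]
enum LinkOrCopy {
    Link,
    Copy,
}

/// Tries to hard-link `src` to `dst`; if that fails for any reason
/// (for example because the file system has no hard links), falls back
/// to copying the file contents.
fn link_or_copy(src: &Path, dst: &Path) -> io::Result<LinkOrCopy> {
    match fs::hard_link(src, dst) {
        Ok(()) => Ok(LinkOrCopy::Link),
        Err(_) => {
            // `fs::copy` overwrites `dst` if it already exists.
            fs::copy(src, dst)?;
            Ok(LinkOrCopy::Copy)
        }
    }
}
```

Returning which path was taken lets a caller notice that every file was copied, which would be one possible input for the unsupported-file-system warning mentioned above.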

Constants

Functions