A Source
for registry-based packages.
What’s a Registry?
Registries are central locations where packages can be uploaded to, discovered, and searched for. The purpose of a registry is to have a location that serves as permanent storage for versions of a crate over time.
Compared to git sources (see GitSource), a registry provides many
packages as well as many versions simultaneously. Git sources can also
have commits deleted through rebasing, whereas registries cannot have their
versions deleted.
In Cargo, RegistryData is an abstraction over each kind of actual
registry, and RegistrySource connects those implementations to the
Source trait. Two prominent features these abstractions provide are:
- A way to query the metadata of a package from a registry. The metadata comes from the index.
- A way to download package contents (a.k.a source files) that are required when building the package itself.
We’ll cover each functionality later.
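The split between the two abstractions can be pictured with a minimal sketch. Everything here is hypothetical and heavily simplified: `SimpleRegistryData`, `InMemoryRegistry`, and `SimpleRegistrySource` are illustrative names, not Cargo's real types, and the real `RegistryData` trait has a richer interface.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the role `RegistryData` plays:
/// one implementation per kind of registry backend.
trait SimpleRegistryData {
    /// Metadata query: which versions does the index list for `name`?
    fn query(&self, name: &str) -> Vec<String>;
    /// Content download: the bytes of one version's package, if it exists.
    fn download(&self, name: &str, version: &str) -> Option<Vec<u8>>;
}

/// An in-memory backend used only for illustration.
struct InMemoryRegistry {
    index: HashMap<String, Vec<String>>,
    tarballs: HashMap<(String, String), Vec<u8>>,
}

impl SimpleRegistryData for InMemoryRegistry {
    fn query(&self, name: &str) -> Vec<String> {
        self.index.get(name).cloned().unwrap_or_default()
    }

    fn download(&self, name: &str, version: &str) -> Option<Vec<u8>> {
        self.tarballs
            .get(&(name.to_string(), version.to_string()))
            .cloned()
    }
}

/// Hypothetical stand-in for the role `RegistrySource` plays: it owns some
/// backend and exposes one uniform interface regardless of which one it is.
struct SimpleRegistrySource {
    data: Box<dyn SimpleRegistryData>,
}

impl SimpleRegistrySource {
    fn versions(&self, name: &str) -> Vec<String> {
        self.data.query(name)
    }

    fn fetch(&self, name: &str, version: &str) -> Option<Vec<u8>> {
        self.data.download(name, version)
    }
}
```

Callers of `SimpleRegistrySource` never need to know which backend sits behind the trait object, which is the point of the abstraction.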
Different Kinds of Registries
Cargo provides multiple kinds of registries. Each of them serves the index and package contents in a slightly different way. Namely,
- LocalRegistry: Serves the index and package contents entirely on a local filesystem.
- RemoteRegistry: Serves the index ahead of time from a Git repository, and package contents are downloaded as needed.
- HttpRegistry: Serves both the index and package contents on demand over an HTTP-based registry API. This is the default starting from 1.70.0.
Each registry has its own RegistryData implementation, and can be
created from either RegistrySource::local or RegistrySource::remote.
The Index of a Registry
One of the major difficulties with a registry is that hosting so many packages may quickly run into performance problems when dealing with dependency graphs. It’s infeasible for cargo to download the entire contents of the registry just to resolve one package’s dependencies, for example. As a result, cargo needs some efficient method of querying what packages are available on a registry, what versions are available, and what the dependencies for each version are.
To solve the problem, a registry must provide an index of package metadata. The index of a registry is essentially an easily query-able version of the registry’s database for a list of versions of a package as well as a list of dependencies for each version. The exact format of the index is described later.
See the index
module for topics about the management, parsing, caching,
and versioning for the on-disk index.
The Format of The Index
The index is a store for the list of versions for all packages known, so its
format on disk is optimized slightly to ensure that ls registry
doesn’t
produce a list of all packages ever known. The index also wants to ensure
that there aren’t a million files, which could eventually hit
filesystem limits. To this end, a few decisions were made
about the format of the registry:
- Each crate will have one file corresponding to it. Each version for a
  crate will just be a line in this file (see IndexPackage for its
  representation).
- There will be two tiers of directories for crate names, under which
  crates corresponding to those tiers will be located.
  (See cargo_util::registry::make_dep_path for the implementation of
  this layout hierarchy.)
For example, the hierarchy of an index might look like this:
.
├── 3
│   └── u
│       └── url
├── bz
│   └── ip
│       └── bzip2
├── config.json
├── en
│   └── co
│       └── encoding
└── li
    ├── bg
    │   └── libgit2
    └── nk
        └── link-config
The root of the index contains a config.json
file with a few entries
corresponding to the registry (see RegistryConfig
below).
Otherwise, there are three numbered directories (1, 2, 3) for crates with names 1, 2, and 3 characters in length. The 1/2 directories simply have the crate files underneath them, while the 3 directory is sharded by the first letter of the crate name.
For longer crate names, the top-level directory contains many two-letter directories (the first two characters of the crate name), each of which has many two-letter sub-folders (the next two characters). At the end of all these are the actual crate files themselves.
The purpose of this layout is to cut down on ls
sizes as well as to allow
efficient lookup based on the crate name itself.
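The layout rules above can be sketched as a small function. This is an illustration only: `index_path` is a hypothetical helper written from the description in this section, not the actual `cargo_util::registry::make_dep_path`, and it assumes a non-empty ASCII crate name.

```rust
/// Hypothetical helper: computes the path of a crate's index file relative
/// to the index root, following the layout described in this section.
/// Assumes `name` is non-empty ASCII (byte slicing is used below).
fn index_path(name: &str) -> String {
    assert!(!name.is_empty() && name.is_ascii(), "expected a non-empty ASCII name");
    match name.len() {
        // Names of length 1 and 2 sit directly under the `1` and `2` directories.
        1 => format!("1/{name}"),
        2 => format!("2/{name}"),
        // Names of length 3 are sharded by their first letter.
        3 => format!("3/{}/{name}", &name[..1]),
        // Longer names use the first two and the next two characters as tiers.
        _ => format!("{}/{}/{name}", &name[..2], &name[2..4]),
    }
}
```

With the example hierarchy shown earlier, `index_path("url")` yields `3/u/url` and `index_path("link-config")` yields `li/nk/link-config`.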
See The Cargo Book: Registry Index for the public interface on the index format.
The Index Files
Each file in the index is the history of one crate over time. Each line in
the file corresponds to one version of a crate, stored in JSON format (see
the IndexPackage
structure).
As new versions are published, new lines are appended to this file. The only modifications to this file that should happen over time are yanks of a particular version.
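The append-only behavior, with yanks as the only in-place change, can be modeled roughly as follows. This is a sketch under stated assumptions: `IndexLine` and `IndexFile` are hypothetical types, each line is reduced to three fields (`name`, `vers`, `yanked`), and real index lines carry more data such as dependencies and a checksum.

```rust
/// Hypothetical, reduced model of one line in an index file.
struct IndexLine {
    vers: String,
    yanked: bool,
}

/// Hypothetical model of one crate's index file: one line per version.
struct IndexFile {
    name: String,
    lines: Vec<IndexLine>,
}

impl IndexFile {
    fn new(name: &str) -> Self {
        IndexFile { name: name.to_string(), lines: Vec::new() }
    }

    /// Publishing a version appends a new line; existing lines are untouched.
    fn publish(&mut self, vers: &str) {
        self.lines.push(IndexLine { vers: vers.to_string(), yanked: false });
    }

    /// Yanking is the only in-place modification. Returns whether the
    /// version was found.
    fn yank(&mut self, vers: &str) -> bool {
        match self.lines.iter_mut().find(|line| line.vers == vers) {
            Some(line) => {
                line.yanked = true;
                true
            }
            None => false,
        }
    }

    /// Renders the file as newline-separated JSON objects.
    fn render(&self) -> String {
        self.lines
            .iter()
            .map(|line| {
                format!(
                    r#"{{"name":"{}","vers":"{}","yanked":{}}}"#,
                    self.name, line.vers, line.yanked
                )
            })
            .collect::<Vec<_>>()
            .join("\n")
    }
}
```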
Downloading Packages
The purpose of the index was to provide an efficient method to resolve the dependency graph for a package. After resolution has been performed, we need to download the contents of packages so we can read the full manifest and build the source code.
To accomplish this, RegistryData::download
will “make” an HTTP request
per-package requested to download tarballs into a local cache. These
tarballs will then be unpacked into a destination folder.
Note that because versions uploaded to the registry are frozen forever, the HTTP download and unpacking can both be skipped if the version has already been downloaded and unpacked. This caching allows us to only download a package when absolutely necessary.
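The cache check described above might look roughly like the sketch below. The names `DownloadStatus` and `check_cache` are hypothetical, as is the `dl_base` parameter and the `/{name}/{version}/download` URL shape; a real registry takes its download URL from its own configuration.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical result of a cache lookup: either the `.crate` file is
/// already on disk, or the caller must fetch it from the given URL.
enum DownloadStatus {
    Ready(PathBuf),
    Download { url: String },
}

/// Checks the local cache before deciding to download. Because a published
/// version never changes, a cached tarball can be reused as-is.
/// `dl_base` and the URL layout are assumptions made for illustration.
fn check_cache(cache_dir: &Path, name: &str, version: &str, dl_base: &str) -> DownloadStatus {
    let path = cache_dir.join(format!("{name}-{version}.crate"));
    if path.is_file() {
        DownloadStatus::Ready(path)
    } else {
        DownloadStatus::Download {
            url: format!("{dl_base}/{name}/{version}/download"),
        }
    }
}
```

A caller would match on the result, performing the HTTP request only in the `Download` case and then writing the tarball to the cache path so later calls return `Ready`.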
Filesystem Hierarchy
Overall, $HOME/.cargo
looks like this when talking about the registry
(remote registries, specifically):
# A folder under which all registry metadata is hosted (similar to
# $HOME/.cargo/git)
$HOME/.cargo/registry/

    # For each registry that cargo knows about (keyed by hostname + hash)
    # there is a folder which is the checked out version of the index for
    # the registry in this location. Note that this is done so cargo can
    # support multiple registries simultaneously
    index/
        registry1-<hash>/
        registry2-<hash>/
        ...

    # This folder is a cache for all downloaded tarballs (`.crate` file)
    # from a registry. Once downloaded and verified, a tarball never changes.
    cache/
        registry1-<hash>/<pkg>-<version>.crate
        ...

    # Location in which all tarballs are unpacked. Each tarball is known to
    # be frozen after downloading, so transitively this folder is also
    # frozen once it's unpacked (it's never unpacked again)
    # CAVEAT: They are not read-only. See rust-lang/cargo#9455.
    src/
        registry1-<hash>/<pkg>-<version>/...
        ...
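As a rough illustration of the hierarchy above, the three per-registry locations can be derived from the Cargo home directory and a registry identifier. `RegistryDirs` and `registry_dirs` are hypothetical names, and the identifier (the `registry1-<hash>` part) is taken as an input because how the hash is computed is not covered here.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical bundle of the three per-registry directories.
struct RegistryDirs {
    index: PathBuf,
    cache: PathBuf,
    src: PathBuf,
}

/// Builds the directories for one registry. `registry_id` stands for the
/// `<hostname>-<hash>` folder name and is supplied by the caller.
fn registry_dirs(cargo_home: &Path, registry_id: &str) -> RegistryDirs {
    let root = cargo_home.join("registry");
    RegistryDirs {
        index: root.join("index").join(registry_id),
        cache: root.join("cache").join(registry_id),
        src: root.join("src").join(registry_id),
    }
}

impl RegistryDirs {
    /// Where the downloaded tarball for one version is cached.
    fn crate_file(&self, pkg: &str, version: &str) -> PathBuf {
        self.cache.join(format!("{pkg}-{version}.crate"))
    }

    /// Where that tarball is unpacked.
    fn unpacked_dir(&self, pkg: &str, version: &str) -> PathBuf {
        self.src.join(format!("{pkg}-{version}"))
    }
}
```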
Modules
- download 🔒 Shared download logic between HttpRegistry and RemoteRegistry.
- Access to an HTTP-based crate registry. See HttpRegistry for details.
- index 🔒 Management of the index of a registry source.
- local 🔒 Access to a registry on the local filesystem. See LocalRegistry for more.
- remote 🔒 Access to a Git index based registry. See RemoteRegistry for details.
Structs
- The content inside .cargo-ok. See RegistrySource::unpack_package for more.
- The config.json file stored in the index.
- A Source implementation for a local or a remote registry.
Enums
- Result from loading data from a registry.
- The status of RegistryData::download which indicates if a .crate file has already been downloaded, or if not then the URL to download.
Constants
- The .cargo-ok file is used to track if the source is already unpacked. See RegistrySource::unpack_package for more.
Traits
- An abstract interface to handle both a local and a remote registry.