Introduction

cachepot is a fork of sccache.

The purpose of this fork is to introduce advanced security concepts to avoid certain attack scenarios, as well as to prevent bitrot.

It will focus on the distributed compile cache mode, but will also support the current client-only mode.

Scope

This document explains the status quo and defines security goals; once major milestones are achieved, it will serve as the reference documentation.

cachepot distributed compilation quickstart

This is a quick start guide to getting distributed compilation working with cachepot. This guide primarily covers Linux clients. macOS and Windows clients are supported but have seen significantly less testing.

Get cachepot binaries

Either download pre-built cachepot binaries (not currently available), or build cachepot locally with the dist-client and dist-worker features enabled:

cargo build --release --features="dist-client dist-worker"

The target/release/cachepot binary will be used on the client, and the target/release/cachepot-dist binary will be used on the scheduler and build worker.

If you're only planning to use the client, the client feature is enabled by default, so a plain cargo install cachepot should do the trick.

Configure a scheduler

If you're adding a worker to a cluster that has already been set up, skip ahead to configuring a build worker.

The scheduler is a daemon that manages compile requests from clients and parcels them out to build workers. You only need one of these per cachepot setup. Currently only Linux is supported for running the scheduler.

Create a scheduler.conf file to configure client/worker authentication. A minimal example looks like:

# The socket address the scheduler will listen on. It's strongly recommended
# to listen on localhost and put an HTTPS worker in front of it.
public_addr = "127.0.0.1:10600"

[client_auth]
type = "token"
token = "my client token"

[worker_auth]
type = "jwt_hs256"
secret_key = "my secret key"

Mozilla build workers will typically require clients to be authenticated with the Mozilla identity system.

To configure the scheduler for this, set the client_auth section as follows, so that client tokens are validated with the Mozilla service:

[client_auth]
type = "mozilla"
required_groups = ["group_name"]

Where group_name is a Mozilla LDAP group. Users will be required to belong to this group to successfully authenticate with the scheduler.

Start the scheduler by running:

cachepot-dist scheduler --config scheduler.conf

Like the local worker, the scheduler process will daemonize itself unless CACHEPOT_NO_DAEMON=1 is set. If the scheduler fails to start, you may need to set RUST_LOG=trace when starting it to get useful diagnostics (or, for less noisy logs: RUST_LOG=cachepot=trace,cachepot-dist=trace).
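For example, to run the scheduler in the foreground with verbose but focused logging while debugging:

CACHEPOT_NO_DAEMON=1 RUST_LOG=cachepot=trace,cachepot-dist=trace cachepot-dist scheduler --config scheduler.conf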

Configure a build worker

A build worker communicates with the scheduler and executes compiles requested by clients. Only Linux is supported for running a build worker, but cross-compile requests from macOS/Windows clients can be executed.

The build worker requires bubblewrap (version 0.3.0 or later) to sandbox execution. Verify your version of bubblewrap before attempting to run the worker; see the check below. On Ubuntu 18.10+ you can apt install bubblewrap to install it. If you build from source, you will need to first install your distro's equivalent of the libcap-dev package.
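For example:

bwrap --version   # should report 0.3.0 or later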

Create a worker.conf file to configure authentication, storage locations, network addresses and the path to bubblewrap. A minimal example looks like:

# This is where client toolchains will be stored.
cache_dir = "/tmp/toolchains"
# The maximum size of the toolchain cache, in bytes.
# If unspecified the default is 10GB.
# toolchain_cache_size = 10737418240
# A public IP address and port that clients will use to connect to this builder.
public_addr = "192.168.1.1:10501"
# The URL used to connect to the scheduler (should use https, given an ideal
# setup of an HTTPS worker in front of the scheduler)
scheduler_url = "https://192.168.1.1"

[builder]
type = "overlay"
# The directory under which a sandboxed filesystem will be created for builds.
build_dir = "/tmp/build"
# The path to the bubblewrap version 0.3.0+ `bwrap` binary.
bwrap_path = "/usr/bin/bwrap"

[scheduler_auth]
type = "jwt_token"
# This will be generated by the `generate-jwt-hs256-worker-token` command or
# provided by an administrator of the cachepot cluster.
token = "my worker's token"

Due to bubblewrap requirements, the build worker currently must be run as root. Start the build worker by running:

sudo cachepot-dist worker --config worker.conf

As with the scheduler, if the build worker fails to start you may need to set RUST_LOG=trace to get useful diagnostics (or, for less noisy logs: RUST_LOG=cachepot=trace,cachepot-dist=trace).

Configure a client

A client uses cachepot to wrap compile commands, communicates with the scheduler to find available build workers, and communicates with build workers to execute the compiles and receive the results.

Clients that are not targeting linux64 require the icecc-create-env script, or must be provided with a prepackaged toolchain archive. icecc-create-env is part of icecream and is used for packaging toolchains. You can install icecream to get this script (apt install icecc on Ubuntu), or download it from the git repository and place it in your PATH:

curl https://raw.githubusercontent.com/icecc/icecream/master/client/icecc-create-env.in > icecc-create-env && chmod +x icecc-create-env

See using custom toolchains.

Create a client config file in ~/.config/cachepot/config (on Linux), ~/Library/Application Support/Parity.cachepot/config (on macOS), or %APPDATA%\Parity\cachepot\config\config (on Windows). A minimal example looks like:

[dist]
# The URL used to connect to the scheduler (should use https, given an ideal
# setup of an HTTPS worker in front of the scheduler)
scheduler_url = "https://192.168.1.1"
# Used for mapping local toolchains to remote cross-compile toolchains. Empty in
# this example where the client and build worker are both Linux.
toolchains = []
# Size of the local toolchain cache, in bytes (5GB here, 10GB if unspecified).
toolchain_cache_size = 5368709120

[dist.auth]
type = "token"
# This should match the `client_auth` section of the scheduler config.
token = "my client token"

Clients using Mozilla build workers should configure their dist.auth section as follows:

[dist.auth]
type = "mozilla"

Then retrieve a token from the Mozilla identity service by running cachepot --dist-auth and following the instructions. Completing this process will retrieve and cache a token valid for 7 days.

If cachepot was running before you changed the configuration, make sure to restart it with cachepot --stop-coordinator followed by cachepot --start-coordinator.
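For example:

cachepot --stop-coordinator
cachepot --start-coordinator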

You can check the status with cachepot --dist-status; it should print something like:

$ cachepot --dist-status
{"SchedulerStatus":["https://cachepot1.corpdmz.ber3.mozilla.com/",{"num_workers":3,"num_cpus":56,"in_progress":24}]}

For client diagnostics, the RUST_LOG advice given above for the scheduler/worker does not work in combination with RUSTC_WRAPPER. The following approach is advised instead: CACHEPOT_LOG=trace RUSTC_WRAPPER=... cargo build.

Using custom toolchains

Since Windows and macOS cannot automatically package toolchains, it is important to be able to manually specify toolchains for distribution. This functionality is also available on Linux.

Using custom toolchains involves adding a dist.toolchains section to your client config file (you can add it multiple times to specify multiple toolchains).

On Linux and macOS:

[[dist.toolchains]]
type = "path_override"
compiler_executable = "/home/me/.mozbuild/clang/bin/clang"
archive = "/home/me/.mozbuild/toolchains/33d92fcd79ffef6e-clang-dist-toolchain.tar.xz"
archive_compiler_executable = "/builds/worker/toolchains/clang/bin/clang"

On Windows:

[[dist.toolchains]]
type = "path_override"
compiler_executable = "C:/clang/bin\\clang-cl.exe"
archive = "C:/toolchains/33d92fcd79ffef6e-clang-dist-toolchain.tar.xz"
archive_compiler_executable = "/builds/worker/toolchains/clang/bin/clang"

Where:

  • compiler_executable identifies the path that cachepot will match against to activate this configuration (be careful on Windows: paths can have slashes in both directions, and you may need to escape backslashes, as in the example)
  • archive is the compressed tar archive containing the compiler toolchain to distribute when compiler_executable is matched
  • archive_compiler_executable is the path within the archive the distributed compilation should invoke

A toolchain archive should be a gzip-compressed tar archive, containing a filesystem sufficient to run the compiler without relying on any external files. If you have archives compatible with icecream (created with icecc-create-env, like these ones for macOS), they should also work with cachepot. To create a Windows toolchain, it is recommended that you download the Clang binaries for Ubuntu 16.04 and extract them, package up the toolchain using the extracted bin/clang file (requires PR #321), and then insert bin/clang-cl at the appropriate path as a symlink to the bin/clang binary.
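As a sketch, packaging a Linux clang toolchain with icecc-create-env might look like the following; the exact flags vary between icecream versions, so treat this as an assumption and check icecc-create-env --help:

# prints the name of the generated archive (a hash-named tarball)
./icecc-create-env --clang /usr/bin/clang

The resulting archive can then be referenced from the archive field of a [[dist.toolchains]] section.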

Considerations when distributing from macOS

When distributing from a macOS client, additional flags and configuration may be required:

  • An explicit target should be passed to the compiler, for instance by adding --target=x86_64-apple-darwin16.0.0 to your build system's CFLAGS (see the example after this list).
  • An explicit toolchain archive will need to be configured, as described above. If Rust is being cached, the rustc version used for local compiles must match the one in the distributed archive.
  • The client config will be read from ~/Library/Application Support/Parity.cachepot/config, not ~/.config/cachepot/config.
  • Some cross compilers may not understand some intrinsics used in more recent macOS SDKs. The 10.11 SDK is known to work.
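For the first point, a make-style build might export the following (CXXFLAGS is shown by analogy and is an assumption about your build system):

export CFLAGS="--target=x86_64-apple-darwin16.0.0 $CFLAGS"
export CXXFLAGS="--target=x86_64-apple-darwin16.0.0 $CXXFLAGS"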

Making a build worker start at boot time

A systemd service makes it easy to spawn the worker at boot.

You can create a service file like /etc/systemd/system/cachepot-worker.service with the following contents:

[Unit]
Description=cachepot-dist worker
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=/path/to/cachepot-dist worker --config /path/to/worker.conf

[Install]
WantedBy=multi-user.target

Note that if the cachepot-dist binary is in a user's home directory and you're on a distro with SELinux enabled (like Fedora), you may need to use an ExecStart line like:

ExecStart=/bin/bash -c "/home/<user>/path/to/cachepot-dist worker --config /home/<user>/path/to/worker.conf"

This is because SELinux by default prevents services from executing binaries located in home directories; invoking the binary through a shell works around that. An alternative would be to move the cachepot-dist binary to somewhere like /usr/local/bin, but then you need to remember to update it manually.

After creating that file, you can ensure it's working and enable it by default like:

systemctl daemon-reload
systemctl start cachepot-worker
systemctl status cachepot-worker # And check it's fine.
systemctl enable cachepot-worker # This enables the service on boot

Configuration

TODO

integration with gitlab

Compilation outputs (stdout, stderr)

Compilation outputs allow attackers to leak data from inside the execution environment. This also applies to the sandbox provided by the cachepot server; as such, nothing in the CI environment should be deemed secret.

The cachepot client is provided to cargo via RUSTC_WRAPPER=cachepot, so compilations are executed on the cachepot-dist server, but build.rs scripts and invocations with uncachable elements still run on the client, i.e. on the gitlab runner's executor. As such, the security concerns for the gitlab worker remain high!
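In a gitlab CI job script this typically amounts to something like the following (the cargo invocation is illustrative):

export RUSTC_WRAPPER=cachepot
cargo build --release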

The cachepot-dist server and the cachepot-dist scheduler are distinct services, and can therefore run on separate machines/instances.

Interaction Graph

             +----------------------+
             |                      |
             |  +-----------------+ |
             |  |                 | |
             |  | parsing ci.yml  | |
             |  |                 | |
             |  +-----------------+ |
             |                      |
             | <instance>.gitlab.io |
             +----------+-----------+
                        |
                        |
                        |
                        |
                        |
                        v
+-----------------------+---------------------------+
|                                                   |   (In future we may
| +-(always-fresh container) execution-of-CI/CD--+  |   consider option
| |                                              |  |   of cachepot client
| |                                              |  |   connecting from
| |          1st. fetch dependencies             |  |   employees machines)
| |                                              |  |
| | +---------------(optional)-----------------+ |  |
| | |       (restricting to be considered)     | |  |   here only "get"/"read" ACL
| | | 2. cargo build without internet access   | |  |    to cache
| | |                                          | |  |             as this container
| | | except for                            <------------<-----+  may be modified
| | | cachepot client <-> scheduler, server    | |  |          |  by
| | |      ^                     ^  cache "get"| |  |          |  gitlab-ci.yml
| | +------------------------------------------+ |  |          |  build.rs
| |        |                     |               |  |          |  proc-macros
| +----------------------------------------------+  |          |
|          |                     |                  |          |
|          |    gitlab runner    |                  |          ^
|          |                     |                  |          |
+---------------------------------------------------+          |
           |                     |                             |
           |                     |                             |
           |                     |                             |
           |                     |                             |
           |                     v                             ^
           |                +----+---------------+          get|
           |                |                    |             |
           |                | cachepot scheduler |             |
           |                |                    |     +-------+---------+
           |                +---+----------------+     |                 |
           |                    ^                      |                 |
           |                    |                      | s3-like cache   |
           |                    |                      |                 |
           |                    |                      |                 |
           v                    v                      |                 |
+----------+--------------------+--------+             +-----------------+
|                                        |
|                                        |                   put,get
|         container/sandbox              |                     ^
|   +---------(bubblewrap)--------+      |                     |
|   |(no internet,very restricted)|      |                     |
|   |                             |      |                     |
|   |                             |      |                     |
|   |    rustc etc.               |      |                     |
|   |                             |      |                     |
|   |                             |      |                     |
|   |                             |      |                     |
|   +-----------------------------+      |                     |
|                                        |                     |
|                                        +<--------------------+
|  cachepot server                       |
|                                        |
|                                        |
+----------------------------------------+

cachepot on Jenkins

When using cachepot on Jenkins one has to know how to deal with the cachepot server process. Unless specified otherwise, cachepot uses port 4226. On invocation, cachepot tries to connect to a cachepot server instance on this port; if no server is running, a new instance is spawned. Jenkins tries to kill all spawned processes once a job is finished. This results in broken builds when two jobs run in parallel: the first one, which spawned the server, finishes and the server is killed, while the other job may still be in contact with the server (e.g. waiting for a cache response) and fail.

One option to solve this problem is to spawn an always-running cachepot server process by setting CACHEPOT_IDLE_TIMEOUT to 0 and starting the server alongside Jenkins as a system service. This implies that all jobs use the same cachepot configuration and share the statistics.

If a per-job cachepot configuration is needed or preferred (e.g. placing a local disk cache in $WORKSPACE), the Port allocator plugin does a good job. It assigns a free and unique port number to a job by exporting a variable. Naming this variable CACHEPOT_COORDINATOR_PORT is enough to make the job spawn its own cachepot server that is safe to terminate upon job termination. This approach has the advantage that each job (with a dedicated server instance) maintains its own statistics that might be interesting upon job finalization.
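A minimal sketch of the two approaches (ASSIGNED_PORT is a hypothetical name for whatever variable the Port allocator plugin exports):

# Option 1: one shared, always-running server as a system service
CACHEPOT_IDLE_TIMEOUT=0 cachepot --start-coordinator

# Option 2: per-job server on the unique port assigned by the plugin
CACHEPOT_COORDINATOR_PORT=$ASSIGNED_PORT cachepot --start-coordinator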

concourse.ci

TODO

notation

TODO


Threat model

By definition, PRs can contain arbitrary code. In the Rust ecosystem it is common for custom code, in the form of proc_macros, to be run as part of the compilation process. As a consequence, measures must be taken to avoid fallout.

Assumptions

A single rustc invocation does not require any kind of internet access. This precludes any proc_macros that implement web or socket based queries from working with cachepot.

Goals

The goal of cachepot is to provide a secure compilation and artifact caching system that serves build artifacts securely and fast: a set of inputs is derived from a compiler invocation (i.e. rustc) and computed on the remote worker, with security precautions increasing the opportunities for caching computations. The crucial part is to provide a robust mapping from those input sets to cached compile artifacts in an efficient manner.

Guarantees

For a given set of inputs, a user should get the appropriate cached artifact that was created by an equivalent command line invocation of the compiler, minus some path prefix changes.

Sandbox

The rustc invocation on the cachepot server must never have access to the host environment or storage.

Current

There is built-in support for bubblewrap (via the bwrap binary) and docker; bubblewrap is the preferred choice.

Hardening

Future considerations include adding KVM-based sandboxing for further hardening, e.g. Quark, katacontainers, or firecracker.

Cache poisoning

Compiler invocations must be independent of one another, such that no (potentially malicious) invocation can lead to delivering incorrect artifacts. It must be impossible to modify existing artifacts.

Current

TODO

Hardening

Ensure the hash is verified on the server side, such that the client has no power over the hash calculation.

TODO

Container poisoning

Proper measures should be introduced to prevent containers from being poisoned between runs.

Current Measure

An overlay filesystem is used with bubblewrap, and ephemeral containers with docker. Containers as such, and their storage, are never re-used.

roadmap

While we attempt to upstream as much as possible back to sccache, there is no guarantee that the changes we make are also appropriate for upstream, which is used for the Firefox builds and might have different requirements.

Priorities

  1. Linux x86-64 first
  2. Make paritytech/substrate and paritytech/polkadot work
  3. Investigate performance bottlenecks
  4. Implement additional security layers

Linux x86-64 first

Most machines running as servers today are x86_64 Linux machines; clients might be Mac or Windows. We are focusing on Linux at the beginning and try not to break the existing support for Mac and Windows on the client side. The server side will stay x86_64 Linux only; cross-compilation is supported via (cross-)toolchains.

Performance Bottlenecks

The lookup keys are based on hashes that include timestamps and paths; as such, re-usability of cached artifacts is very limited. This is a performance limitation, since the cache is ultimately not shared. There are of course various other performance topics that will be addressed, but they are not necessarily part of this priority item.

Additional Security layers

The biggest topic, yet to be specified in detail, is the introduction of multi-layer caches with different trust levels. For example, a CI cluster could warm caches every night into trusted storage; these could then be used to fetch artifacts, combined with a per-user cache for local repeated compiles.

Available Configuration Options

file

[dist]
# where to find the scheduler
scheduler_url = "http://1.2.3.4:10600"
# a set of prepackaged toolchains
toolchains = []
# the maximum size of the toolchain cache in bytes
toolchain_cache_size = 5368709120
cache_dir = "/home/user/.cache/cachepot-dist-client"

[dist.auth]
type = "token"
token = "secrettoken"


# [cache.azure]
# appears to be non-functional at the moment

[cache.disk]
dir = "/tmp/.cache/cachepot"
size = 7516192768 # 7 GiBytes

[cache.gcs]
# optional url
url = "..."
rw_mode = "READ_ONLY"
# rw_mode = "READ_WRITE"
cred_path = "/psst/secret/cred"
bucket = "bucket"

[cache.memcached]
url = "..."

[cache.redis]
url = "redis://user:passwd@1.2.3.4:6379/1"

[cache.s3]
bucket = "name"
endpoint = "s3-us-east-1.amazonaws.com"
use_ssl = true

env

Whatever is set by the file-based configuration is overruled by the corresponding environment variables.
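For example, assuming CACHEPOT_DIR corresponds to the dir key of [cache.disk] as the listing below suggests, the environment value wins (the path is illustrative):

CACHEPOT_DIR=/tmp/alternative-cache cachepot --start-coordinator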

misc

  • CACHEPOT_ALLOW_CORE_DUMPS to enable core dumps by the server
  • CACHEPOT_CONF configuration file path
  • CACHEPOT_CACHED_CONF
  • CACHEPOT_IDLE_TIMEOUT how long the local daemon process waits for more client requests before exiting
  • CACHEPOT_STARTUP_NOTIFY specify a path to a socket which will be used for server completion notification
  • CACHEPOT_MAX_FRAME_LENGTH how much data can be transferred between client and server
  • CACHEPOT_NO_DAEMON set to 1 to disable putting the server to the background

cache configs

disk

  • CACHEPOT_DIR local on disk artifact cache directory
  • CACHEPOT_CACHE_SIZE maximum size of the local on-disk cache, e.g. 10G

s3 compatible

  • CACHEPOT_BUCKET s3 bucket to be used
  • CACHEPOT_ENDPOINT s3 endpoint
  • CACHEPOT_REGION s3 region
  • CACHEPOT_S3_USE_SSL set this to true if the s3 endpoint requires TLS

The endpoint used then becomes ${CACHEPOT_BUCKET}.s3-${CACHEPOT_REGION}.amazonaws.com. If CACHEPOT_REGION is undefined, it will default to us-east-1.
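A sketch of configuring the s3-compatible backend purely via the environment (bucket name and region are placeholders):

export CACHEPOT_BUCKET=my-cache-bucket
export CACHEPOT_REGION=us-east-1
export CACHEPOT_S3_USE_SSL=true
# resulting endpoint: my-cache-bucket.s3-us-east-1.amazonaws.com
cachepot --start-coordinator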

redis

  • CACHEPOT_REDIS full redis url, including auth and access token/passwd

The full url appears then as redis://user:passwd@1.2.3.4:6379/1.

memcached

  • CACHEPOT_MEMCACHED memcached url

gcs

  • CACHEPOT_GCS_BUCKET
  • CACHEPOT_GCS_CREDENTIALS_URL
  • CACHEPOT_GCS_KEY_PATH
  • CACHEPOT_GCS_RW_MODE

azure

  • CACHEPOT_AZURE_CONNECTION_STRING

FAQ

Q: Why not bazel? A: Bazel makes a few very opinionated assumptions, such as hermetic builds being a given, which is a good property in general but non-trivial to achieve for now. Another issue is that bazel is very dominant: it assumes it is the entry tool, whereas we want to stick with cargo while maintaining the option to plug in sccache/cachepot.

Q: Why not buildbarn? A: It is the backend caching infra for bazel.

Q: Why not synchronicty? A: It’s in a very early, experimental stage and uses components with low activity and low community involvement.