Lazy distribution of container images
Current implementation status of containerd remote snapshotter Akihiro Suda
FOSDEM (February 1, 2020)
Credit to Kohei Tokunaga (NTT) for containerd impl. & benchmark scripts
specifications for containers
– Docker/Moby, Podman, Kubernetes (containerd, CRI-O, …), Singularity…
metadata format
– Based on Docker Image Manifest V2 Schema 2
via HTTP
– Based on Docker Registry HTTP API
magnetic tapes
https://en.wikipedia.org/wiki/PDP-11
file metadata cannot be listed → can't be mounted as a filesystem
[Figure: tar archive layout]
Metadata 0 | File 0 | Metadata 1 | File 1 | ... | Metadata {n-1} | File {n-1} | Terminal zero bytes
Metadata: file name, permission, ...
File: content
→ No, because a gzip stream is not seekable (discussed later)
[Figure: tar archive with a separate index file]
tar: Metadata 0 | File 0 | Metadata 1 | File 1 | ... | Metadata {n-1} | File {n-1} | Terminal zero bytes
Index file: Metadata 0 | Metadata 1 | … | Metadata {n-1}
– Different versions
– Different architectures
– Different configuration files
identical/similar files
cloud storage
1. Requires scanning the whole "tape"
2. No deduplication
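The first problem can be illustrated with Python's standard `tarfile` module. This is only a sketch: the helper names `build_tar` and `list_headers` are hypothetical, and the file names are borrowed from examples elsewhere in these slides. Each metadata header sits directly in front of its file content, so the only way to learn where the second file lives is to walk past the first one.

```python
import io
import tarfile


def build_tar(files):
    """Return an uncompressed tar archive (bytes) built from a {name: bytes} dict."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_headers(archive):
    """Walk the archive front to back and collect (name, header offset, size)."""
    entries = []
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        for member in tar:  # sequential scan: there is no central index to consult
            entries.append((member.name, member.offset, member.size))
    return entries


archive = build_tar({"bin/ls": b"x" * 700, "app.py": b"print('hi')\n"})
for name, offset, size in list_headers(archive):
    print(name, offset, size)
```

The header offset of each entry depends on the sizes of all entries before it, which is why a plain tar cannot be mounted without reading it from the start.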
https://en.wikipedia.org/wiki/Magnetic_tape
The main focus towards lazy distribution
– Harter, Tyler, et al. "Slacker: Fast Distribution with Lazy Docker Containers." FAST 2016
– No need to consider tolerance against remote registry failures (because `RUN apt-get install` instructions are already flaky anyway)
FROM example.com/heavy-dev-env:lazy AS dev
RUN apt-get update && \
    apt-get install -y some-additional-libs
COPY src .
RUN ./configure && \
    make static && \
    cp bin/foo /foo
# the stage switches here
FROM scratch
COPY --from=dev /foo /foo
# exec form, because the scratch image has no /bin/sh
ENTRYPOINT ["/foo"]
(until the image gets 100% cached locally)
– Kubernetes readinessProbe
– Will 2020 be the year of the containerized Linux desktop?
content-addressable blob files
message Metadata {
  repeated string path;
  int64 uid;
  int64 gid;
  uint32 mode;
  uint64 size;
  repeated string sha256Digest;
  ...
}
[Figure: Metadata 0, Metadata 1, …, Metadata {n-1} point to content-addressable blobs such as blobs/sha256/deadbeef… and blobs/sha256/cafebabe…]
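A minimal sketch of why content-addressable blobs deduplicate naturally, assuming a toy in-memory store. `BlobStore` and `add_image` are hypothetical names, and the dict keys only mirror a few fields of the `Metadata` message; this is not an implementation of the proposal.

```python
import hashlib


class BlobStore:
    """Toy content-addressable store: blobs are keyed by their SHA-256 digest."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same key, so it is stored only once.
        self.blobs.setdefault(digest, data)
        return digest

    def get(self, digest):
        return self.blobs[digest]


def add_image(store, files):
    """Store file contents as blobs and return metadata entries pointing at them."""
    return [
        {"path": path, "size": len(data), "sha256Digest": store.put(data)}
        for path, data in sorted(files.items())
    ]


store = BlobStore()
libc = b"shared libc bytes"
image_a = add_image(store, {"/lib/libc.so": libc, "/app.py": b"print('a')\n"})
image_b = add_image(store, {"/lib/libc.so": libc, "/app.py": b"print('b')\n"})
print(len(store.blobs))  # number of distinct blobs kept for both images
```

Two images that ship the same library share one blob, whereas two tar layers containing that library would each carry a full copy.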
implementation exists; hard to promote implementation when the spec is not finalized
for reading small files
for accelerating the CI of the Go language project
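[Figure: legacy tar.gz vs. stargz]
legacy tar.gz (one gzip stream around the whole tar):
  gzip [ Metadata 0 | File 0 | Metadata 1 | File 1 | ... | Metadata {n-1} | File {n-1} | Terminal zero bytes ]
stargz (a concatenation of gzip streams):
  gzip [ Metadata 0 | File 0 ]
  gzip [ Metadata 1 | File 1 ]
  ...
  gzip [ Metadata {n-1} | File {n-1} ]
  gzip [ Metadata for stargz.index.json | stargz.index.json (Metadata 0…{n-1}) ]
  Terminal zero bytes
  gzip [ empty stream ]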
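The key property of the stargz layout, that separately compressed gzip streams can be concatenated and still form one valid gzip file, can be tried with Python's standard `gzip` module. This is a simplified sketch, not the real stargz format: it omits the tar headers, keeps the index as a separate JSON string instead of embedding it in the archive, and the names `build_archive` and `read_one` are hypothetical.

```python
import gzip
import json


def build_archive(files):
    """Compress each file as its own gzip member and record where it starts."""
    blob = b""
    index = {}
    for name, data in files.items():
        member = gzip.compress(data)
        index[name] = {"offset": len(blob), "length": len(member)}
        blob += member
    return blob, json.dumps(index)


def read_one(blob, index_json, name):
    """Decompress a single file using only its own byte range."""
    entry = json.loads(index_json)[name]
    start = entry["offset"]
    return gzip.decompress(blob[start:start + entry["length"]])


files = {"bin/ls": b"ls binary " * 100, "app.py": b"print('hi')\n"}
blob, index_json = build_archive(files)

# Random access: only the bytes of one member are touched.
print(read_one(blob, index_json, "app.py"))

# A reader that knows nothing about the index still sees an ordinary gzip file.
print(gzip.decompress(blob) == b"".join(files.values()))
```

Because a legacy reader simply decompresses every member in sequence, the same bytes can serve both old and new clients.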
(when HTTP Range Requests are supported)
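As a rough illustration of what such a request looks like, the sketch below builds (but does not send) an HTTP request for one byte range using Python's standard `urllib`. The URL, offset, and length are made-up values and `range_request` is a hypothetical helper; a real client would take the offsets from the stargz index.

```python
import urllib.request


def range_request(url, offset, length):
    """Build a GET request asking only for bytes [offset, offset + length)."""
    last = offset + length - 1  # the Range header uses an inclusive end position
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})


# Hypothetical blob URL and byte range, for illustration only.
req = range_request("https://registry.example.com/v2/app/blobs/sha256:abc", 100, 50)
print(req.get_header("Range"))
```

A server that supports range requests answers with status 206 (Partial Content) and only the requested bytes; one that does not will typically return the whole blob with status 200.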
[Figure: stargz layout, highlighting the final gzip stream]
stargz:
  gzip [ Metadata 0 | File 0 ]
  gzip [ Metadata 1 | File 1 ]
  ...
  gzip [ Metadata {n-1} | File {n-1} ]
  gzip [ Metadata for stargz.index.json | stargz.index.json (Metadata 0…{n-1}) ]
  Terminal zero bytes
  gzip [ empty stream ]  ← this gzip header contains a pointer to stargz.index.json
– By Kohei Tokunaga (NTT)
– Implemented as a containerd snapshotter plugin
– stargz archives are mounted as read-only FUSE filesystems
– OverlayFS is used for supporting writing
– Supports more aggressive optimization (discussed later)
– By Giuseppe Scrivano (Red Hat)
– Implemented as a fuse-overlayfs plugin
docker run
– Future: static analysis using ldd(-ish) ? Machine learning?
prefetched in a single HTTP request
[Figure: file order inside the archive, before and after optimization]
Before: /usr/bin/apt-get, /bin/ls, /bin/vi, /lib/libc.so, /lib/libjpeg.so, /usr/bin/python3, ..., /usr/lib/python3/.../foo, /usr/lib/python3/.../bar, /app.py
After: /app.py, /usr/bin/python3, /lib/libc.so, /usr/lib/python3/.../foo, /usr/lib/python3/.../bar, ..., then the remaining files (/bin/ls, /bin/vi, /lib/libjpeg.so, /usr/bin/apt-get)
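The reordering idea can be sketched as follows, assuming an access trace recorded while running the container is already available. `prioritize` is a hypothetical helper and the trace is invented for illustration; the actual optimizer may work differently.

```python
def prioritize(layer_files, accessed):
    """Put files seen in the access trace first, keeping the rest in original order.

    Returns the new order and the number of leading files worth prefetching.
    """
    present = set(layer_files)
    seen = set()
    first = []
    for path in accessed:
        # Skip paths outside this layer and repeated accesses to the same file.
        if path in present and path not in seen:
            first.append(path)
            seen.add(path)
    rest = [path for path in layer_files if path not in seen]
    return first + rest, len(first)


layer_files = [
    "/usr/bin/apt-get", "/bin/ls", "/bin/vi", "/lib/libc.so",
    "/lib/libjpeg.so", "/usr/bin/python3", "/app.py",
]
# Hypothetical trace of what the workload opened, in order.
trace = ["/usr/bin/python3", "/lib/libc.so", "/app.py", "/lib/libc.so"]

ordered, prefetch_count = prioritize(layer_files, trace)
print(ordered[:prefetch_count])
```

Once the hot files are contiguous at the front of the archive, they occupy a single byte range, which is what allows them to be fetched in one request.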
“compile hello world” command
– https://github.com/ktock/stargz-snapshotter/issues/37
– Proposed by Giuseppe https://github.com/golang/go/issues/30829#issuecomment-541532402
– Suitable for images with many small files
– Not compatible with OCI Image Spec v1.0.1
– Compatible with OCI Image Spec v.Next
– Concurrent execution
– Efficient caching
– Rootless
– (pseudo-)daemonless
– Clustering on Kubernetes
– And a lot of innovative features
– Producing stargz images
– Consuming stargz images as base images
– Not compatible with OCI tarballs
– Already widely deployed at CERN and their friends
– Implementation available for containerd: https://github.com/ktock/remote-snapshotter/pull/27
– Proposed by Aleksa Sarai (SUSE)
– Not compatible with OCI v1 tarballs
– Focuses on deduplication, using the Restic algorithm
– WIP implementation available for umoci (image manipulation tool): https://github.com/openSUSE/umoci/tree/experimental/ociv2
– No runtime implementation seems to exist
– Proposed by Edgar Lee (Netflix)
– Built on the IPFS (P2P CAS) protocol
– Not compatible with OCI tarballs
– Implementation available for containerd: https://github.com/hinshun/ipcs
– Built on the SMB protocol and VHD images
– Not FLOSS
distribution, but compatibility matters
https://github.com/ktock/stargz-snapshotter
https://github.com/ktock/stargz-snapshotter (Expected to be moved under github.com/containerd soon)