Docker's V1 and V2 Image Specification

In 2016, Docker officially updated its image specification from V1 to V2, adopting a more sophisticated scheme that is in line with the OCI Container Image Specification.

There are only a few minor differences between Docker's image specification V2 and the OCI image specification (see Compatibility Matrix). Here we will discuss some of the major changes from V1 to V2, and why Docker moved towards these changes.

Image vs Container

Docker images contain the underlying changes in the root filesystem and the execution parameters of a container. When we write a Dockerfile, the FROM clause brings in a base image from an external registry, and each subsequent instruction adds a new layer or attribute to the base image. Eventually docker build gives us an image that we can deploy, share or run as a container.

It is important to distinguish an image from a container. When a container is run, the image (which is a stack of tarballs with manifest JSON files) is processed and transformed into the underlying filesystem, mounts, environment variables and so on. The OCI Container Image Specification and OCI Container Runtime Specification give a good modern understanding of what an image is.

Content Addressability

In Docker Image Specification V1.0, each image layer has a randomly generated 256-bit ID that uniquely references the layer. The ID space is large enough to ensure the uniqueness of all layers, but it does not guarantee that the image you pull is always the image you expect. Imagine that you have an image pychat:1.0 that uses python:3 as its base image. You upload your image to an image registry, and over the years someone comes along and swaps out some files in python:3. All of a sudden your code breaks, and you might have to dig through a deep chain of dependencies to figure out why your container doesn't work anymore. This is why a unique ID is not enough: we want a way to efficiently address the content of an image.

Content addressing is achieved by using a collision-resistant hash function. In Docker Image Specification V2, each image layer is referenced by a unique content address computed from the canonical representation of the layer changeset. We can look at it like this:

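"""
Note that this is an over-simplified version of how
image_ids are generated.
"""
import hashlib
import json

# Placeholder changesets; real layers are tar archives of filesystem changes.
layers = [b"layer 1 changeset", b"layer 2 changeset"]

layer_ids = []
for layer_changeset in layers:
    # A cryptographic hash (not Python's built-in hash()) gives the content address.
    layer_id = "sha256:" + hashlib.sha256(layer_changeset).hexdigest()
    layer_ids.append(layer_id)
# The manifest lists the layer digests in order; the image id is derived from it.
image_manifest = json.dumps(layer_ids).encode()
image_id = "sha256:" + hashlib.sha256(image_manifest).hexdigest()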

Due to the nature of cryptographic hash functions, the hashes can be used to tell whether two layers contain exactly the same content. We can use this property to ensure that the image we download is the same image we used before, i.e. tell Docker to fetch an image by the content address of a previously used and tested image. This ensures that the image we get has not been changed since, as any change to the image results in an entirely different content address.
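As a minimal sketch of this property, the check below recomputes a SHA-256 digest over some bytes and compares it with a previously recorded content address. The helper names content_address and verify_blob are hypothetical, and the byte strings are stand-ins for real layer tarballs.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a content address in the 'sha256:<hex>' form used by digests."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_blob(data: bytes, expected: str) -> bool:
    """Check that the bytes hash to the expected content address."""
    return content_address(data) == expected


# Stand-in for a layer tarball that we tested earlier and whose address we recorded.
tested_layer = b"contents of a layer tarball"
pinned = content_address(tested_layer)

print(verify_blob(tested_layer, pinned))         # unchanged content matches
print(verify_blob(tested_layer + b"!", pinned))  # any modification does not
```

Changing even a single byte produces a different address, which is what makes pinning by digest meaningful.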

A manifest is a JSON file that contains all the necessary configuration for an image. In Docker image spec V2, a manifest contains the content addresses of all layers in an image. Since it contains the hashes of all layers, we can simply download this small JSON file and take its hash to verify that the image has all the layers we want. This saves us from downloading and checking the content of the entire image.

An example V2 image manifest:

{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    ...
    "layers": [
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 32654,
            "digest": "sha256:e692418e4cbaf90ca...b51fab815ad7fc331f"
        },
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 16724,
            "digest": "sha256:3c3a4604a545cdc12...f4a9c1905b15da2eb4"
        },
        {
            "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
            "size": 73109,
            "digest": "sha256:ec4b8955958665577...7f12184802ad867736"
        }
    ]
}
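To make the role of the manifest concrete, here is a rough sketch that hashes a manifest and checks a set of layer blobs against the size and digest fields it lists. The function names and the toy blobs are assumptions for illustration only; a real client would fetch the manifest and blobs from a registry.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """SHA-256 content address in 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def check_layers(manifest_bytes: bytes, blobs: dict) -> list:
    """Return the digests of layers that are missing or do not match the manifest."""
    manifest = json.loads(manifest_bytes)
    bad = []
    for layer in manifest["layers"]:
        blob = blobs.get(layer["digest"])
        if (blob is None
                or len(blob) != layer["size"]
                or digest(blob) != layer["digest"]):
            bad.append(layer["digest"])
    return bad


# Build a toy manifest from two stand-in layer blobs (not real tarballs).
layer_blobs = [b"base filesystem", b"application files"]
manifest_bytes = json.dumps({
    "schemaVersion": 2,
    "layers": [{"size": len(b), "digest": digest(b)} for b in layer_blobs],
}).encode()
blobs = {digest(b): b for b in layer_blobs}

print(digest(manifest_bytes))               # address of the manifest itself
print(check_layers(manifest_bytes, blobs))  # empty list when every layer matches
```

Because the manifest embeds every layer digest, trusting the hash of the manifest is enough to trust the whole list of layers.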

Example Dockerfile utilizing the image described in the above manifest:

FROM python@sha256:b22de77...f118cb
...
CMD ["python3", "pychat.py"]

This way pychat:1.0 is guaranteed to pull the same base image every time it is built, and the container will remain deterministic until our selected hash algorithm is found to be broken, which hopefully will not happen in the near future.

One thing to look out for when pulling images by content address:

There are some inconsistencies between the content of the image manifest on the local system and the content of the manifest that exists in the registry, i.e. the image ID displayed by docker image list --no-trunc differs from the address you use to fetch an image (relevant blog post and GitHub issue).
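One way to picture the mismatch, offered as an assumption rather than something the linked sources are quoted on here: the locally displayed image ID is a digest over the image configuration JSON, while the address used for pulling is a digest over the manifest stored in the registry. The toy documents below are made up; only the fact that two different byte strings hash to two different addresses is being shown.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """SHA-256 content address in 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Toy image configuration (hypothetical contents).
config_bytes = json.dumps({
    "architecture": "amd64",
    "os": "linux",
    "config": {"Cmd": ["python3", "pychat.py"]},
}).encode()

# Toy manifest that references the configuration by its content address.
manifest_bytes = json.dumps({
    "schemaVersion": 2,
    "config": {"size": len(config_bytes), "digest": digest(config_bytes)},
    "layers": [],
}).encode()

image_id = digest(config_bytes)        # assumed: what the local image list shows
pull_address = digest(manifest_bytes)  # assumed: what image@sha256:... refers to

print(image_id)
print(pull_address)
```

The two values are computed over different documents, so they are not expected to be equal even though they describe the same image.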

Image Specific Properties

Another important change is the location of image-specific properties. In V1, each layer contains a JSON file that specifies what the image should do up to this layer. The file contains information such as entrypoint, env, cmd, memory and so on. However, these attributes should not be layer specific. Each layer essentially represents a change in the image's filesystem at a point in time, so associating it with runtime properties such as environment variables, entrypoints and CPU usage is unnecessary. Runtime properties should relate to an image as a whole, not to a single layer in the filesystem.

We can take a look at Docker V1's image format by running docker save <image name> | tar -x in an empty directory. docker save and docker load are the two docker commands that still use the V1 standard for legacy reasons.

$ docker pull ubuntu@sha256:d2518289e66fd3892c2dae5003218117abeeed2edbb470cba544aef480fb6b3a
$ docker save ubuntu@sha256:d2518289e66fd3892c2dae5003218117abeeed2edbb470cba544aef480fb6b3a | tar -x
$ tree
.
├── 08ca6384a97957eac5a5a69cdc799434739655c88e69efb23d2bb963110dbf48
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── 0fa211e5edebeb29d3e29cc2c8c87e9a6a8306901816c19b7f6fb6a7392c3cef
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── 452a96d81c30a1e426bc250428263ac9ca3f47c9bf086f876d11cb39cf57aeec.json
├── 614c02cb92ee20d3cd51770f07d67503f87a75602ddf032a0a6163527fcf97e0
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── cc8487ed6373e8b38c60ff8fc5bdfdd9576aa49226a6e4dcac522f61f5f19d31
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── daf8616e33b20539309a114814ba9864367630ad8da63d4e96bea40dd22841ba
│   ├── json
│   ├── layer.tar
│   └── VERSION
└── manifest.json

We can see that the image has 5 underlying layers (represented by sub-directories) and a top-level layer (represented by the root directory). The manifest.json tells us the ordering of the layers and which of them is the top-level directory. We can also see that there is a json file in each sub-directory; this is the configuration of each layer in the V1 format described earlier. Read the json files and pay special attention to the container_config attribute set. These attributes should be associated with the image as a whole, yet they exist at each layer instead. Having a hierarchy of inheritance for these runtime attributes, layer by layer, often leads to unnecessary complexity.
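The layer ordering can also be read programmatically. The sketch below assumes the manifest.json written by docker save is a JSON list whose entries carry Config and Layers keys; the helper name read_saved_image is hypothetical, and the toy tree uses shortened, made-up directory names and ordering rather than the real ones from the listing above.

```python
import json
import tempfile
from pathlib import Path


def read_saved_image(root: Path) -> list:
    """Return (config file, ordered layer tarballs) for each image in the tree."""
    entries = json.loads((root / "manifest.json").read_text())
    return [(entry["Config"], entry["Layers"]) for entry in entries]


# Build a toy tree so the example runs without Docker installed.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    toy_manifest = [{
        "Config": "452a96.json",
        "RepoTags": None,
        "Layers": ["08ca63/layer.tar", "0fa211/layer.tar"],
    }]
    (root / "manifest.json").write_text(json.dumps(toy_manifest))
    saved = read_saved_image(root)

for config_file, layer_tars in saved:
    print("config:", config_file)
    for position, tar_path in enumerate(layer_tars):
        print(f"  layer {position}: {tar_path}")
```

Pointing read_saved_image at the directory produced by docker save | tar -x should list the real layer order, under the format assumption stated above.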

In V2, the layer JSON is discarded; it was decided that the layer changeset itself is enough to represent a layer in the image. Layer hierarchies are specified in the layers attribute of the root-level manifest, which references the layers by their content addresses. The attributes of container_config are now stored in another file, in line with the OCI Runtime Specification. The new specification makes sure that the runtime configuration is associated with the image as a whole, and that inheritance happens at the image level rather than the layer level. Note that the content address of this runtime configuration is also included in the V2 manifest.

Multi-Architecture Images

V2 also introduces a new configuration component called a manifest list, or an image index in OCI's terminology. A manifest list is essentially a list of platform-specific image manifests with similar contents, for example an Ubuntu:16.04 image for a macOS host and an Ubuntu:16.04 image for a Linux host. This allows multi-architecture images to be coupled together and treated as a whole, which is useful for packaging a set of similar images and distributing them to an image registry.
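A simplified sketch of how a client might pick an entry from a manifest list is shown below. The select_manifest helper is hypothetical, the digests are deliberate placeholders rather than real values, and only the os and architecture fields of each platform entry are considered.

```python
from typing import Optional

# Toy manifest list; the digest values are placeholders, not real addresses.
manifest_list = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "digest": "sha256:<amd64-manifest-digest>",
            "platform": {"architecture": "amd64", "os": "linux"},
        },
        {
            "digest": "sha256:<arm64-manifest-digest>",
            "platform": {"architecture": "arm64", "os": "linux"},
        },
    ],
}


def select_manifest(index: dict, os_name: str, architecture: str) -> Optional[str]:
    """Return the digest of the first manifest matching the platform, if any."""
    for entry in index["manifests"]:
        platform = entry["platform"]
        if platform["os"] == os_name and platform["architecture"] == architecture:
            return entry["digest"]
    return None


print(select_manifest(manifest_list, "linux", "arm64"))
print(select_manifest(manifest_list, "windows", "amd64"))
```

The returned digest is itself a content address, so the platform-specific manifest it points to can be fetched and verified in the same way as any other manifest.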

Conclusion

Here's a summary of the differences mentioned in this article:

  • Image Spec V1
    • Randomly generated image ID
    • Image specific properties are defined at layer level
    • No multi-architecture support
  • Image Spec V2
    • Image IDs are content addresses
    • Image specific properties defined at image level
    • Multi-architecture support

There are other advantages, which we'll examine in later posts.