    fix: close race when checking image platforms · 72e3d986
    Dr Rich Wareham authored
    When building multi-platform images, it is unfortunately quite easy to
    accidentally build, for example, an x86_64 image when you meant to build
    an aarch64 one: if the base image specified in "FROM" doesn't have an
    aarch64 image available, Docker will happily proceed and build an x86_64
    one.
    
    We addressed this issue by pulling back the images we had just pushed
    and running `uname --machine` inside them to check that the platform
    was what we expected.
    
    Unfortunately that behaviour was racy. There is a small delay between
    a push finishing (or a new image replacing one under an existing tag)
    and the new image being available for pulling by that tag. This is
    perfectly reasonable behaviour on the part of the upstream registry but
    poses a bit of a problem for us in that, occasionally, we'd pull back
    the previous image under that tag and test that instead.
    
    It also had the slightly annoying feature of requiring that `uname` was
    present in the built images and that `uname --machine` had uniform
    output. In `FROM scratch` images packaging Go applications, for example,
    this is not generally the case.
    
    The fix would appear to be straightforward: pull the image by SHA256
    digest. Unfortunately there are a few issues with this:
    
    - The SHA256 digests for images differ depending on their platform. As
      such we'd need to somehow know which platforms were built and the
      SHA256 digest for each image, and knowing which platforms were built
      is the very problem we're trying to solve.
    - The digest written to the build metadata and via the `--iidfile`
      option to `docker buildx build` corresponds either to one image or to
      a manifest list. Neither of those digests is useful in general to
      pull all images.
    
    We have to query which platforms were actually built in a non-racy
    way.
    
    For single-platform images, this is fairly straightforward in that the
    OS and architecture are written by buildx to the metadata.
    
    For multi-platform images, things are a bit more complex in that the
    build metadata records the name and the SHA256 of the *list* of images
    pushed to the registry. Fortunately we can use the name and the SHA256
    digest to get this list back from the registry via `docker manifest`.
    This is non-racy because we can use the SHA256 digest to get back
    exactly what was pushed. The manifest list handily includes the
    platform of each image within it.
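
    As a rough illustration of that lookup (not the code in `build.sh`),
    the sketch below builds a digest-pinned reference and reads the
    platforms out of a manifest list. It assumes the JSON printed by
    `docker manifest inspect` has a top-level `manifests` array whose
    entries carry a `platform` object with `os`, `architecture` and an
    optional `variant`. The function names are hypothetical, as is the
    assumption that attestation entries appear as `unknown/unknown` and
    should be skipped.

```python
from __future__ import annotations

import json
import subprocess


def pinned_reference(image_name: str, digest: str) -> str:
    """Return a digest-pinned reference such as 'registry/app@sha256:...'."""
    # Drop a trailing tag, taking care not to mistake a registry port for one.
    last_slash = image_name.rfind("/")
    colon = image_name.rfind(":")
    if colon > last_slash:
        image_name = image_name[:colon]
    return f"{image_name}@{digest}"


def platforms_in_manifest_list(manifest_list: dict) -> set[str]:
    """Collect 'os/architecture[/variant]' for each image in a manifest list."""
    found = set()
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        os_name = platform.get("os", "unknown")
        arch = platform.get("architecture", "unknown")
        # Assumption: non-image entries (e.g. attestations) are recorded
        # with an unknown platform and are not interesting here.
        if os_name == "unknown" or arch == "unknown":
            continue
        parts = [os_name, arch]
        if platform.get("variant"):
            parts.append(platform["variant"])
        found.add("/".join(parts))
    return found


def inspect_pushed_platforms(image_name: str, digest: str) -> set[str]:
    """Fetch the manifest list by digest and return the platforms it lists."""
    result = subprocess.run(
        ["docker", "manifest", "inspect", pinned_reference(image_name, digest)],
        check=True,
        capture_output=True,
        text=True,
    )
    return platforms_in_manifest_list(json.loads(result.stdout))
```

    Because the reference is pinned by digest rather than by tag, the
    registry can only return the list that was pushed by this build, which
    is what removes the race.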
    
    As a final little annoyance, unless one actually *pushes* the images
    built by `buildx`, there is no way that I can see to get this
    information out, so we need to skip this entire step if we're not
    pushing images.
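
    The resulting check, including the skip, might look something like the
    following. This is a minimal sketch rather than what `build.sh` does:
    `verify_platforms` is a hypothetical name, and passing `None` for the
    recovered platforms is simply one assumed way of signalling that
    nothing was pushed.

```python
from __future__ import annotations


def verify_platforms(expected: set[str], actual: set[str] | None) -> str:
    """Compare requested platforms with those recovered from the registry.

    `actual` is None when nothing was pushed, in which case there is
    nothing to query and the check is skipped.
    """
    if actual is None:
        return "skipped"
    missing = expected - actual
    unexpected = actual - expected
    if missing or unexpected:
        raise ValueError(
            f"platform mismatch: missing {sorted(missing)}, "
            f"unexpected {sorted(unexpected)}"
        )
    return "ok"
```

    Requesting `linux/arm64` but recovering `linux/amd64`, the failure
    described at the top of this message, raises an error, while a local
    build with no push returns "skipped".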
    
    The exact plumbing for this is tiresome and verbose but it can be found
    in `build.sh`.
    https://gitlab.developers.cam.ac.uk/uis/devops/infra/dockerimages/-/pipelines/382292
    is an example pipeline where multi- and single-platform images are built
    and the actual platforms are successfully recovered. I've also confirmed
    that executing `build.sh` locally skips the check as expected.
    
    Closes #74