fix: close race when checking image platforms

Dr Rich Wareham requested to merge 74-racy-build into master

When building multi-platform images, it is unfortunately quite easy to accidentally build, for example, an x86_64 image when you meant to build an aarch64 one: if the base image specified in "FROM" has no aarch64 image available, Docker will happily proceed and build an x86_64 one instead.

We addressed this issue by pulling back the images we had just pushed and running uname --machine inside them to check that the platform was what we expected.

Unfortunately that behaviour was racy. When a push replaces an image under an existing tag, there is a small delay between the push finishing and the new image being available for pulling by that tag. This is perfectly reasonable behaviour on the part of the upstream registry, but it poses a bit of a problem for us: occasionally we'd pull back the old image for testing.

It also had the slightly annoying requirement that uname be present in the built images and that uname --machine produce uniform output. FROM scratch images packaging Go applications, for example, generally don't contain uname at all.

The fix would appear to be straightforward: pull the image by SHA256 digest. Unfortunately there are a few issues with this:

  • The SHA256 digests for images differ depending on their platform. To pull by digest we'd need to know which platforms were built and the digest of each image, but knowing which platforms were built is exactly the problem we're trying to solve.
  • The digest written to the build metadata, and via the --iidfile option to docker buildx build, corresponds either to a single image or to a manifest list. Neither digest is, in general, useful for pulling all of the images.

We therefore have to query which platforms were actually built, in a non-racy way.

For single-platform images this is fairly straightforward, since buildx writes the OS and architecture to the build metadata.
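As a minimal sketch of the single-platform case (in Python rather than the shell used in build.sh, and not the code in this MR), the function below pulls the platform out of the parsed JSON that docker buildx build --metadata-file writes. It assumes the pushed image's descriptor is stored under the "containerimage.descriptor" key and carries an OCI-style "platform" object with "os", "architecture" and optional "variant"; those key names are an assumption, so check them against the metadata your buildx version actually emits. The function names are hypothetical.

```python
def platform_string(platform: dict) -> str:
    """Format an OCI-style platform object as "os/architecture[/variant]"."""
    parts = [platform["os"], platform["architecture"]]
    if platform.get("variant"):
        parts.append(platform["variant"])
    return "/".join(parts)


def single_platform_from_metadata(metadata: dict) -> str:
    """Return the platform recorded in buildx metadata for a single-platform build.

    ASSUMPTION: the platform lives at metadata["containerimage.descriptor"]["platform"].
    """
    descriptor = metadata.get("containerimage.descriptor", {})
    platform = descriptor.get("platform")
    if platform is None:
        # A multi-platform build records a manifest list descriptor with no
        # single platform, so the registry has to be asked instead.
        raise ValueError("no platform in build metadata; is this a multi-platform build?")
    return platform_string(platform)
```

In use, one would json.load the file passed to --metadata-file and hand the resulting dict to single_platform_from_metadata. Raising on a missing platform, rather than guessing, keeps the multi-platform case from silently passing.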

For multi-platform images things are a bit more complex: the build metadata records the image name and the SHA256 digest of the manifest list pushed to the registry. Fortunately we can use the name and digest to get this list back from the registry via docker manifest inspect. This is non-racy because the SHA256 digest identifies exactly what was pushed. The manifest list handily includes the platform of each image within it.
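The multi-platform lookup can be sketched as below, again in Python and under stated assumptions rather than as the MR's actual implementation. pinned_reference turns a (possibly tagged) image name plus digest into a "name@sha256:..." reference, fetch_manifest_list shells out to docker manifest inspect, and platforms_from_manifest_list reads the "platform" of each entry in the returned list. All three names are hypothetical. The sketch also assumes that entries with an "unknown" OS, such as the attestation manifests newer buildx versions add to the index, should be ignored.

```python
import json
import subprocess


def pinned_reference(image_name: str, digest: str) -> str:
    """Return a digest-pinned reference such as "example.org/app@sha256:...".

    Any tag is dropped: the digest names exactly the manifest list that was
    pushed, so the lookup cannot race with a later push to the same tag.
    """
    last_slash = image_name.rfind("/")
    colon = image_name.rfind(":")
    # Only a colon after the last "/" starts a tag; an earlier one is a registry port.
    if colon > last_slash:
        image_name = image_name[:colon]
    return f"{image_name}@{digest}"


def fetch_manifest_list(reference: str) -> str:
    """Fetch the raw manifest list JSON from the registry via the docker CLI."""
    result = subprocess.run(
        ["docker", "manifest", "inspect", reference],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def platforms_from_manifest_list(manifest_json: str) -> set:
    """Return the "os/architecture[/variant]" strings listed in a manifest list."""
    doc = json.loads(manifest_json)
    if "manifests" not in doc:
        raise ValueError("not a manifest list; is this a single-platform image?")
    platforms = set()
    for entry in doc["manifests"]:
        platform = entry.get("platform", {})
        os_name = platform.get("os")
        arch = platform.get("architecture")
        # Skip entries with no real platform, e.g. "unknown/unknown" attestations.
        if not os_name or not arch or os_name == "unknown":
            continue
        parts = [os_name, arch]
        if platform.get("variant"):
            parts.append(platform["variant"])
        platforms.add("/".join(parts))
    return platforms
```

Chained together, platforms_from_manifest_list(fetch_manifest_list(pinned_reference(name, digest))) yields the set of platforms actually pushed, where name and digest are assumed to be taken from the build metadata.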

As a final little annoyance, unless one actually pushes the images built by buildx, there is no way that I can see to get this information out, so we need to skip this entire step if we're not pushing images. OTOH, if we're not pushing images, we don't care what their platform is.
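The final comparison, including the skip when nothing is pushed, might look like the sketch below. It is not the logic in build.sh: the function names are hypothetical, "expected" is assumed to come from the comma-separated --platform value, and "built" from the build metadata or the manifest list. Platform strings are lower-cased, and linux/arm64/v8 is folded into linux/arm64 on the assumption that the two should count as the same platform.

```python
def normalise_platform(platform: str) -> str:
    """Lower-case a platform string and fold "arm64/v8" into plain "arm64"."""
    parts = platform.strip().lower().split("/")
    if parts[1:] == ["arm64", "v8"]:
        parts = parts[:2]
    return "/".join(parts)


def parse_platform_list(value: str) -> set:
    """Parse a --platform style value such as "linux/amd64,linux/arm64"."""
    return {normalise_platform(p) for p in value.split(",") if p.strip()}


def platform_mismatches(expected: set, built: set) -> tuple:
    """Return (missing, unexpected) platform sets after normalisation."""
    exp = {normalise_platform(p) for p in expected}
    got = {normalise_platform(p) for p in built}
    return exp - got, got - exp


def check_platforms(expected: set, built: set, pushing: bool) -> bool:
    """Return True if the built platforms match, or if the check is skipped."""
    if not pushing:
        # Nothing was pushed, so there is nothing to query, and the platform
        # of unpushed images doesn't matter anyway.
        print("Not pushing images: skipping platform check.")
        return True
    missing, unexpected = platform_mismatches(expected, built)
    if missing:
        print(f"Expected platforms not built: {sorted(missing)}")
    if unexpected:
        print(f"Unexpected platforms built: {sorted(unexpected)}")
    return not missing and not unexpected
```

Reporting missing and unexpected platforms separately makes the original failure mode (an x86_64 image where aarch64 was wanted) show up as one of each, which is easier to diagnose than a bare "mismatch".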

The exact plumbing for this is tiresome and verbose, but it can be found in build.sh. https://gitlab.developers.cam.ac.uk/uis/devops/infra/dockerimages/-/pipelines/382292 is an example pipeline in which both multi- and single-platform images are built and their actual platforms are successfully recovered. I've also confirmed that running build.sh locally skips the check as expected.

Closes #74
