# Utilize Cache in Python App Build with GitHub Actions
## Overview
In this article, we will explore how to effectively utilize caching when building Docker images for Python applications using GitHub Actions. Caching can significantly reduce build times by avoiding unnecessary installations of dependencies, especially when there are no changes to the libraries or the Dockerfile. We’ll cover best practices for writing Dockerfiles, configuring GitHub Actions, and leveraging different caching strategies to optimize your workflow.
## Steps
### Step 1: Write an Efficient Dockerfile
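To maximize caching benefits, order your `Dockerfile` so that the steps that change least often come first. Copy `requirements.txt` and install the dependencies before copying the rest of the source, so that a code-only change does not invalidate the dependency layer. Here’s a simple example:
```dockerfile
FROM python:3.12.3-slim
WORKDIR /app
# Copy only the dependency manifest first so the pip layer stays cached
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
# Source changes invalidate the cache only from this point on
COPY . .
ENV PORT=8080
EXPOSE 8080
```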
If you need to install additional packages with `apt-get`, place that step before copying `requirements.txt` so that dependency changes do not invalidate it:
```dockerfile
# The cache mounts live outside the image layers, so there is no need to
# delete /var/lib/apt/lists afterwards (doing so would empty the cache).
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        curl \
        git
```
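The effect of instruction order on the layer cache can be illustrated with a small model. The sketch below is a simplification, not how BuildKit actually computes cache keys: it assumes each layer's key is a hash of its parent key, the instruction text, and (for `COPY`) a fingerprint of the copied files. The names `layer_keys` and `reused_layers` and the fingerprint strings are hypothetical.

```python
import hashlib

def layer_keys(instructions, file_fingerprints):
    """Return one cache key per instruction; each key chains the parent key."""
    keys = []
    parent = ""
    for instr in instructions:
        # COPY layers also depend on the content of the files they copy.
        content = file_fingerprints.get(instr, "")
        parent = hashlib.sha256(f"{parent}|{instr}|{content}".encode()).hexdigest()
        keys.append(parent)
    return keys

def reused_layers(old_keys, new_keys):
    """Count the leading layers whose keys are unchanged (cache hits)."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count

DOCKERFILE = [
    "FROM python:3.12.3-slim",
    "WORKDIR /app",
    "COPY requirements.txt requirements.txt",
    "RUN pip install -r requirements.txt",
    "COPY . .",
]

# Hypothetical fingerprints standing in for file contents.
before = {"COPY requirements.txt requirements.txt": "reqs-v1", "COPY . .": "src-v1"}
after_code_change = {**before, "COPY . .": "src-v2"}
after_deps_change = {**before, "COPY requirements.txt requirements.txt": "reqs-v2"}

base = layer_keys(DOCKERFILE, before)
print(reused_layers(base, layer_keys(DOCKERFILE, after_code_change)))  # 4
print(reused_layers(base, layer_keys(DOCKERFILE, after_deps_change)))  # 2
```

In this model a code-only change still reuses the `pip install` layer, while a change to `requirements.txt` forces everything from that `COPY` onward to rebuild.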
### Step 2: Configure GitHub Actions with Caching
In your GitHub Actions workflow, you can use `docker/build-push-action` and specify the caching options. Below is an example configuration:
```yaml
      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```
On GitHub-hosted runners (as opposed to self-hosted runners), you can simply use `gha` as the cache backend. The complete workflow looks like this:
```yaml
name: build-and-push
on:
  release:
    types:
      - published
  push:
    tags:
      - 'v*'
    branches:
      - main # required so that pull requests can use the cache saved on main
  pull_request:
    paths:
      - .github/workflows/build-and-push.yml
      - docker-layer-cache/*
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Log in to the Container registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Build and push Docker image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
```
### Step 3: Handle Caching for Pip
To further optimize the build process, cache the Python dependencies themselves. You can mount the pip cache in your `Dockerfile`:
```dockerfile
FROM python:3.12.7-slim
WORKDIR /app
COPY requirements.txt .
# Keep pip's download cache in a BuildKit cache mount instead of the image
RUN --mount=type=cache,target=/root/.cache/pip,sharing=locked \
    pip install -r requirements.txt
COPY . .
ENV PORT=8080
EXPOSE 8080
CMD ["python", "app.py"]
```
Then, in your GitHub Actions workflow, restore the pip cache, inject it into the Docker build, and save it afterwards:
```yaml
      # ... after the metadata step, continue as follows
      - name: Restore pip cache
        uses: actions/cache/restore@d4323d4df104b026a6aa633fdb11d772146be0bf # v4.2.2
        id: pip-cache
        with:
          path: root-dot-cache-pip
          key: pip-cache-${{ hashFiles('requirements.txt') }}
          restore-keys: |
            pip-cache-
      # Use buildkit-cache-dance to inject the cache into the Docker build
      - name: Inject cache into Docker build
        uses: reproducible-containers/buildkit-cache-dance@5de31fc1534ed8789e63d41ea933c5df9944a261 # v3.1.0
        with:
          cache-map: |
            {
              "root-dot-cache-pip": "/root/.cache/pip"
            }
          skip-extraction: ${{ steps.pip-cache.outputs.cache-hit }}
      # Build the Docker image and push it when needed
      - name: Build and push Docker image
        uses: docker/build-push-action@471d1dc4e07e5cdedd4c2171150001c434f0b7a4 # v6.15.0
        with:
          context: docker-layer-cache
          file: docker-layer-cache/Dockerfile.cache
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          build-args: |
            BUILDKIT_INLINE_CACHE=1
      # Persist the pip cache from main only, so that later runs can restore it
      - name: Save pip cache
        uses: actions/cache/save@d4323d4df104b026a6aa633fdb11d772146be0bf # v4.2.2
        if: github.ref_name == 'main'
        with:
          path: root-dot-cache-pip
          key: ${{ steps.pip-cache.outputs.cache-primary-key }}
```
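To make the interplay of `key`, `restore-keys`, and `cache-hit` more concrete, here is a rough Python model of the lookup. It is a sketch under assumptions: `requirements_key` is only a simplified stand-in for `hashFiles` (the real function hashes differently), `resolve_cache` is a hypothetical helper, and the package pins are made-up sample data.

```python
import hashlib

def requirements_key(content: bytes, prefix: str = "pip-cache-") -> str:
    """Simplified stand-in for `pip-cache-${{ hashFiles('requirements.txt') }}`."""
    return prefix + hashlib.sha256(content).hexdigest()

def resolve_cache(primary_key, restore_keys, saved_keys):
    """Return (matched_key, cache_hit); saved_keys is ordered oldest to newest."""
    if primary_key in saved_keys:
        return primary_key, True  # exact match: cache-hit is true
    for prefix in restore_keys:
        matches = [k for k in saved_keys if k.startswith(prefix)]
        if matches:
            return matches[-1], False  # newest prefix match, not an exact hit
    return None, False

# Hypothetical requirements.txt contents.
saved = [requirements_key(b"flask==3.0.0\n")]
unchanged = requirements_key(b"flask==3.0.0\n")
changed = requirements_key(b"flask==3.0.0\nrequests==2.32.0\n")

print(resolve_cache(unchanged, ["pip-cache-"], saved)[1])  # True
print(resolve_cache(changed, ["pip-cache-"], saved)[1])    # False
```

Under this model, an unchanged `requirements.txt` gives an exact hit, which is the case where `skip-extraction` avoids pulling the cache back out of the build. A changed file falls back to the newest `pip-cache-` entry, so pip can still reuse most downloads, and the save step then stores the result under the new primary key.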
### Step 4: Understand Cache Behavior
It’s important to understand how cache scoping works in GitHub Actions:
- Pull request runs can restore caches saved earlier in the same PR or saved on the main branch.
- Runs on the main branch can only restore caches saved on the main branch.
- Caches can be invalidated by changes to `requirements.txt` or to other files that affect the Docker layers.
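The scoping rules listed above can be written down as a small lookup model. This is only a sketch of those rules, not GitHub's implementation; `readable_scopes`, `find_cache`, the ref `refs/pull/12/merge`, and the cache keys are hypothetical, and `main` is assumed to be the default branch.

```python
def readable_scopes(run_ref, default_branch="main"):
    """Scopes a run may restore from, most specific first."""
    if run_ref == default_branch:
        return [default_branch]
    return [run_ref, default_branch]

def find_cache(key, run_ref, store, default_branch="main"):
    """Return the scope that supplies `key` for this run, or None."""
    for scope in readable_scopes(run_ref, default_branch):
        if (scope, key) in store:
            return scope
    return None

# Hypothetical cache entries: (scope, key)
store = {
    ("main", "pip-cache-aaa"),
    ("refs/pull/12/merge", "pip-cache-bbb"),
}

print(find_cache("pip-cache-aaa", "refs/pull/12/merge", store))  # main
print(find_cache("pip-cache-bbb", "refs/pull/12/merge", store))  # refs/pull/12/merge
print(find_cache("pip-cache-bbb", "main", store))                # None
```

This is why the workflow runs on pushes to `main` and saves the pip cache only there: a cache written from one PR is invisible to `main` and to other PRs, while a cache written on `main` is visible to all of them.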
### Tips and Best Practices
- Keep steps that change rarely (like `apt-get install`) high in the `Dockerfile` and steps that change frequently (like `COPY . .`) low, to prevent unnecessary cache invalidation.
- Be cautious with self-hosted runners, as cache access may be slower depending on the region.
- If you find your cache is not being utilized as expected, check the order of your commands in the Dockerfile and ensure that you’re not inadvertently invalidating the cache.
## Summary
By implementing caching strategies in your Docker builds with GitHub Actions, you can significantly reduce build times and improve developer efficiency. Remember to structure your Dockerfile for optimal caching, configure your GitHub Actions workflow correctly, and leverage pip caching to avoid redundant installations.