Step-by-Step: Develop a Recommendation System Using Vertex AI (From Custom Training to Deployment with MovieLens dataset)

Masato Naka
7 min read · Oct 13, 2024


## Overview

In this article, we will explore how to build a recommendation system using Vertex AI. We will go through the following steps:

  1. Model Training using Custom Training
  2. Registering the Model with Vertex AI Model Registry
  3. Deploying the Model to an Endpoint
  4. Making Predictions using the Deployed Model

## Example: MovieLens dataset and retrieval model

For our example, we will use the MovieLens dataset to create a retrieval model. This will involve leveraging TensorFlow Recommenders to implement our model.

import os
import tempfile
from typing import Dict, Text

import numpy as np
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_recommenders as tfrs

# Env vars: https://cloud.google.com/vertex-ai/docs/training/code-requirements#environment-variables
MODEL_DIR = os.getenv("AIP_MODEL_DIR", tempfile.mkdtemp())
CHECKPOINT_DIR = os.getenv("AIP_CHECKPOINT_DIR", tempfile.mkdtemp())
TENSORBOARD_LOG_DIR = os.getenv("AIP_TENSORBOARD_LOG_DIR", tempfile.mkdtemp())

# Read data.
ratings = tfds.load("movielens/100k-ratings", split="train")
# Features of all the available movies.
movies = tfds.load("movielens/100k-movies", split="train")

ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"],
})
movies = movies.map(lambda x: x["movie_title"])  # MapDataset whose elements are title Tensors

user_ids_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))

movie_titles_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(movies)


class MovieLensModel(tfrs.Model):
    # We derive from a custom base class to help reduce boilerplate. Under the hood,
    # these are still plain Keras Models.

    def __init__(
        self,
        user_model: tf.keras.Model,
        movie_model: tf.keras.Model,
        task: tfrs.tasks.Retrieval):
        super().__init__()

        # Set up user and movie representations.
        self.user_model = user_model
        self.movie_model = movie_model

        # Set up a retrieval task.
        self.task = task

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # Define how the loss is computed.
        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.movie_model(features["movie_title"])

        return self.task(user_embeddings, movie_embeddings)


# Define user (user_id) and movie (movie_title) models.
user_model = tf.keras.Sequential([
    user_ids_vocabulary,
    tf.keras.layers.Embedding(user_ids_vocabulary.vocabulary_size(), 64)
])
movie_model = tf.keras.Sequential([
    movie_titles_vocabulary,
    tf.keras.layers.Embedding(movie_titles_vocabulary.vocabulary_size(), 64)
])

# Define your objectives.
task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(
    movies.batch(128).map(movie_model)
))

# Create a retrieval model.
model = MovieLensModel(user_model, movie_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.5))

# Train for 3 epochs.
model.fit(ratings.batch(4096), epochs=3)


# Build a serving index: prefer ScaNN if it is installed (!pip install -q scann),
# otherwise fall back to brute-force search.
is_scann = False
try:
    index = tfrs.layers.factorized_top_k.ScaNN(model.user_model)
    index.index_from_dataset(
        tf.data.Dataset.zip((movies.batch(100), movies.batch(100).map(model.movie_model)))
    )
    is_scann = True
except Exception:
    # Use brute-force search to set up retrieval using the trained representations.
    index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
    index.index_from_dataset(
        movies.batch(100).map(lambda title: (title, model.movie_model(title))))


# Get recommendations.
_, titles = index(np.array(["42"]))
print(f"Top 3 recommendations for user 42: {titles[0, :3]}")


# Save the index as a SavedModel so it can be registered and deployed on Vertex AI.
index.save(MODEL_DIR, options=tf.saved_model.SaveOptions(namespace_whitelist=["Scann"]) if is_scann else None)
print(f"Model saved to {MODEL_DIR}")

## Steps

### Preparation

First, you need to set up your environment variables for Google Cloud Platform (GCP):

export PROJECT=<your project>
export REGION=asia-northeast1
export REPOSITORY=ml-training
export IMAGE=movielens-retrieve
export IMAGE_TAG=0.0.1
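
If this is a fresh project, you may also need to enable the relevant APIs and create the Artifact Registry repository referenced above. A sketch of the typical commands (not part of the original steps):

gcloud services enable aiplatform.googleapis.com artifactregistry.googleapis.com \
  cloudbuild.googleapis.com run.googleapis.com --project $PROJECT

gcloud artifacts repositories create $REPOSITORY \
  --repository-format=docker \
  --location=$REGION \
  --project $PROJECT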

### Model Training

You can train your model locally, on Cloud Run, or as a Custom Job on Vertex AI. Below, we will outline how to create a custom container for our model training, which can be used in both Cloud Run and Vertex AI Custom Job environments.

#### Create a Custom Container for Model Training

1. Create a cloudbuild.yaml file for building the Docker image:

steps:
  # build image for the x86_64 (amd64) platform
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - buildx
      - build
      - --platform
      - linux/amd64
      - -t
      - ${_REGION}-docker.pkg.dev/${_PROJECT}/${_REPOSITORY}/${_IMAGE_NAME}:${_IMAGE_TAG}
      - .
    env:
      - 'DOCKER_CLI_EXPERIMENTAL=enabled'
  # push image to GAR (Artifact Registry)
  - name: 'gcr.io/cloud-builders/docker'
    args:
      - push
      - ${_REGION}-docker.pkg.dev/${_PROJECT}/${_REPOSITORY}/${_IMAGE_NAME}:${_IMAGE_TAG}

images:
  - ${_REGION}-docker.pkg.dev/${_PROJECT}/${_REPOSITORY}/${_IMAGE_NAME}:${_IMAGE_TAG}

2. Create a Dockerfile for your training environment (a sketch of the requirements.txt it copies is shown after step 3):

FROM python:3.12-slim

RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    libhdf5-dev \
    pkg-config \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# This is necessary for TensorFlow Recommenders; ref: https://github.com/tensorflow/recommenders/issues/712
ENV TF_USE_LEGACY_KERAS=1

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt
COPY retrieve.py .

CMD ["python", "retrieve.py"]

3. Submit the build using the following command:

gcloud builds submit \
--config "cloudbuild.yaml" \
--project "${PROJECT}" \
--substitutions="_IMAGE_TAG=${IMAGE_TAG},_IMAGE_NAME=${IMAGE},_REPOSITORY=${REPOSITORY},_REGION=${REGION},_PROJECT=${PROJECT}" \
--gcs-source-staging-dir="gs://${PROJECT}-cloudbuild/source"
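
The requirements.txt copied in step 2 is not shown here. A minimal sketch of what it could contain (pins are illustrative assumptions, not taken from the original):

# Illustrative only: pin versions to a combination you have tested.
tensorflow
tf-keras                  # needed because TF_USE_LEGACY_KERAS=1 selects the Keras 2 code path
tensorflow-datasets
tensorflow-recommenders
scann                     # optional: enables the ScaNN index; the script falls back to brute force
numpy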

#### Training Option 1: Cloud Run

You can train the model on Cloud Run, which is suitable when the model is small and does not need distributed training. The final model is stored in the GCS bucket specified by the AIP_MODEL_DIR environment variable.

1. Deploy the Cloud Run Job:

gcloud run jobs deploy ml-training-movielens-retrieve \
--memory 4Gi \
--cpu 2 \
--image "$REGION-docker.pkg.dev/$PROJECT/$REPOSITORY/$IMAGE:$IMAGE_TAG" \
--set-env-vars=AIP_MODEL_DIR=gs://${PROJECT}-ml-training/movielens/cloudrun/model-output \
--set-env-vars=TF_USE_LEGACY_KERAS=1 \
--max-retries 0 \
--region $REGION \
--project $PROJECT

2. Execute the Cloud Run Job:

gcloud run jobs execute ml-training-movielens-retrieve --region $REGION --project $PROJECT

3. Check the output in your specified GCS bucket:

gcloud storage ls "gs://${PROJECT}-ml-training/movielens/cloudrun/model-output/"

gs://PROJECT-ml-training/movielens/cloudrun/model-output/:
gs://PROJECT-ml-training/movielens/cloudrun/model-output/
gs://PROJECT-ml-training/movielens/cloudrun/model-output/fingerprint.pb
gs://PROJECT-ml-training/movielens/cloudrun/model-output/keras_metadata.pb
gs://PROJECT-ml-training/movielens/cloudrun/model-output/saved_model.pb
gs://PROJECT-ml-training/movielens/cloudrun/model-output/assets/
gs://PROJECT-ml-training/movielens/cloudrun/model-output/model/
gs://PROJECT-ml-training/movielens/cloudrun/model-output/variables/

#### Training Option 2: Vertex AI Custom Job

Alternatively, you can train the model with a Vertex AI Custom Job. The final output of this training job is the model stored in the specified GCS bucket.

1. Create a Vertex AI config template:

# https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec
workerPoolSpecs:
  machineSpec:
    machineType: n2-standard-2
  replicaCount: 1
  containerSpec:
    imageUri: $REGION-docker.pkg.dev/$PROJECT/$REPOSITORY/$IMAGE:$IMAGE_TAG
baseOutputDirectory:
  outputUriPrefix: gs://${PROJECT}-ml-training/movielens/vertexai/model-output/

Specify the output GCS path with the baseOutputDirectory.outputUriPrefix field.

2. Generate the config file using environment variables:

envsubst < tensorflow/examples/movielens/vertexaiconfig.template.yaml > tensorflow/examples/movielens/vertexaiconfig.yaml

3. Create and run the Vertex AI Custom Job:

gcloud ai custom-jobs create --region=$REGION --display-name="movielens-retrieve" --config=tensorflow/examples/movielens/vertexaiconfig.yaml --project $PROJECT
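
To check the job's state and follow its logs, you can use commands like the following (a sketch; JOB_ID is a placeholder for the numeric ID of the job created above):

gcloud ai custom-jobs list --region=$REGION --project=$PROJECT \
--filter=display_name=movielens-retrieve

gcloud ai custom-jobs stream-logs JOB_ID --region=$REGION --project=$PROJECT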

### Model Registration

Once your model is trained and saved in GCS, you can register it in the Vertex AI Model Registry:

gcloud ai models upload \
--region=$REGION \
--display-name=movielens-retrieve \
--container-image-uri=asia-docker.pkg.dev/vertex-ai-restricted/prediction/tf_opt-cpu.nightly:latest \
--artifact-uri=gs://${PROJECT}-ml-training/movielens/vertexai/model-output/model/ \
--project=$PROJECT
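
To confirm the upload, you can list the registered models; this is the same lookup used later to fetch the model ID:

gcloud ai models list \
--region=$REGION \
--filter=display_name=movielens-retrieve \
--project=$PROJECT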

### Model Deployment

To deploy your model to provide a prediction API, you’ll need to create an endpoint for serving.

1. Create the endpoint by specifying the display name:

gcloud ai endpoints create \
--region=$REGION \
--display-name=movielens-retrieve --project $PROJECT

After creating it, you can check the endpoint:

gcloud ai endpoints list \
--region=$REGION \
--filter=display_name=movielens-retrieve --project $PROJECT

2. Deploy the model to the endpoint:

ENDPOINT=$(gcloud ai endpoints list --region=$REGION --filter=display_name=movielens-retrieve --project $PROJECT --format="json(name)" | jq -r '.[0].name')
MODEL_ID=$(gcloud ai models list --filter=display_name=movielens-retrieve --region $REGION --project $PROJECT --format 'json(name)' | jq -r '.[0].name' | sed 's/.*\/\(\d*\)/\1/')

You can choose the machine type when you deploy; a small instance helps keep costs down.

gcloud ai endpoints deploy-model $ENDPOINT \
--region=$REGION \
--model=$MODEL_ID \
--display-name=movielens-retrieve \
--machine-type=n2-standard-2 \
--min-replica-count=1 \
--max-replica-count=1 \
--traffic-split=0=100 \
--project $PROJECT

(This takes about 5 to 10 minutes.)
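
Once the deployment finishes, you can verify that the model is attached to the endpoint (a quick check using the $ENDPOINT variable defined above):

gcloud ai endpoints describe $ENDPOINT \
--region=$REGION \
--project=$PROJECT \
--format="json(deployedModels)"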

### Prediction

To make predictions using your deployed model, prepare your input data:

{
  "instances": [
    "42"
  ]
}

Use the following commands to obtain the Endpoint ID and send a prediction request:

ENDPOINT_ID=$(gcloud ai endpoints list --region=$REGION --filter=display_name=movielens-retrieve --project $PROJECT --format="json(name)" | jq -r '.[0].name' | sed 's/.*\/\(\d*\)/\1/')
INPUT_DATA_FILE=tensorflow/examples/movielens/input_data_file.json
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://$REGION-aiplatform.googleapis.com/v1/projects/$PROJECT/locations/$REGION/endpoints/$ENDPOINT_ID:predict" \
-d "@${INPUT_DATA_FILE}"

The result will look like the following:

{
  "predictions": [
    {
      "output_2": [
        "Rent-a-Kid (1995)",
        "Far From Home: The Adventures of Yellow Dog (1995)",
        "Just Cause (1995)",
        "Land Before Time III: The Time of the Great Giving (1995) (V)",
        "Nell (1994)",
        "Two if by Sea (1996)",
        "Jack (1996)",
        "Panther (1995)",
        "House Arrest (1996)",
        "Conan the Barbarian (1981)"
      ],
      "output_1": [
        3.94025946,
        3.47775483,
        3.4017539,
        3.32554197,
        2.95510435,
        2.63177681,
        2.61488819,
        2.61403036,
        2.58744907,
        2.54093599
      ]
    }
  ],
  "deployedModelId": "535000367843246080",
  "model": "projects/xxxx/locations/asia-northeast1/models/2548324905556901888",
  "modelDisplayName": "movielens-retrieve",
  "modelVersionId": "1"
}
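
If you prefer Python to curl, the Vertex AI SDK (the google-cloud-aiplatform package) can send the same request. A minimal sketch with placeholder project and endpoint values, assuming the endpoint is reachable via the shared API domain (for dedicated endpoints, see the note below):

from google.cloud import aiplatform

# Placeholder values: use your own project, region, and endpoint ID.
aiplatform.init(project="your-project", location="asia-northeast1")

endpoint = aiplatform.Endpoint("1234567890")  # numeric endpoint ID or full resource name
response = endpoint.predict(instances=["42"])
print(response.predictions)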

You can also test the endpoint from the GCP console.

Note: If the endpoint was created as a dedicated endpoint, requests sent to the shared aiplatform.googleapis.com domain fail with an error like the one below, and you need to call the endpoint's dedicated DNS name instead:

{
  "error": {
    "code": 400,
    "message": "This endpoint is a dedicated endpoint via CloudESF and cannot be accessed through the Vertex AI API. Please access the endpoint using its dedicated dns name 'xxx.asia-northeast1-xxx.prediction.vertexai.goog'",
    "status": "FAILED_PRECONDITION"
  }
}

You can get the dedicated DNS name:

DEDICATED_DNS=$(gcloud ai endpoints describe $ENDPOINT \
--project=$PROJECT \
--region=$REGION --format json | jq -r '.dedicatedEndpointDns')

Send the request to the dedicated endpoint:

curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://$DEDICATED_DNS/v1/projects/$PROJECT/locations/$REGION/endpoints/$ENDPOINT_ID:predict" \
-d "@${INPUT_DATA_FILE}"

## Ranking Model

We've now deployed the retrieval model and made its prediction API available. In the same way, we can deploy a ranking model to Vertex AI and get predictions, as sketched below.
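
This article focuses on deployment, so the ranking model's training code is not repeated here. As a rough sketch, it follows the TensorFlow Recommenders ranking tutorial: reuse the user and movie embedding towers, keep the user_rating feature when mapping the ratings dataset, and swap the Retrieval task for a Ranking task. The class below is an illustrative sketch, not the exact code behind the deployed movielens-rank model:

class MovieLensRankingModel(tfrs.Model):
    # Embeds user_id and movie_title, concatenates the embeddings, and
    # regresses the rating with a small MLP.

    def __init__(self, user_model: tf.keras.Model, movie_model: tf.keras.Model):
        super().__init__()
        self.user_model = user_model
        self.movie_model = movie_model
        self.rating_model = tf.keras.Sequential([
            tf.keras.layers.Dense(256, activation="relu"),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dense(1),
        ])
        self.task = tfrs.tasks.Ranking(
            loss=tf.keras.losses.MeanSquaredError(),
            metrics=[tf.keras.metrics.RootMeanSquaredError()],
        )

    def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
        user_embeddings = self.user_model(features["user_id"])
        movie_embeddings = self.movie_model(features["movie_title"])
        return self.rating_model(tf.concat([user_embeddings, movie_embeddings], axis=1))

    def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
        # Requires "user_rating" to be kept when mapping the ratings dataset.
        return self.task(labels=features["user_rating"], predictions=self(features))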

Sample input:

{
  "instances": [
    {
      "user_id": "42",
      "movie_title": "M*A*S*H (1970)"
    },
    {
      "user_id": "42",
      "movie_title": "Dances with Wolves (1990)"
    },
    {
      "user_id": "42",
      "movie_title": "Speed (1994)"
    }
  ]
}

Sample result:

{
  "predictions": [
    [
      3.67746091
    ],
    [
      3.71581745
    ],
    [
      3.5969708
    ]
  ],
  "deployedModelId": "3616799519104040960",
  "model": "projects/xxxx/locations/asia-northeast1/models/7920415573568126976",
  "modelDisplayName": "movielens-rank",
  "modelVersionId": "1"
}

## Summary

In this article, we successfully walked through the process of building a recommendation system using Vertex AI, from model training to deployment and prediction.

Feel free to explore other recommendation options within Vertex AI, such as using the Vertex AI Agent Builder for managed models or utilizing various model types suited for your business needs.


Masato Naka

An SRE, mainly working on Kubernetes. CKA (Feb 2021). His interests include cloud-native application development and machine learning.