Set up Vertex AI Workbench with access to BigQuery and GCS using Terraform
Overview
Google Cloud Platform provides a managed Jupyter Notebook service (Vertex AI Workbench) that you can set up in a few minutes. Access to BigQuery and GCS from Workbench is also very important for analysts. In this post, I will share the fastest way to set up a minimal data analysis platform using IaC.
Vertex AI Workbench
To set up a workbench, we use the Terraform resource google_notebooks_instance. In this resource, you need to specify several things.
- Instance name. e.g. test-notebook
- GCP project name. e.g. gcp-test-project
- Location. e.g. asia-northeast1-b
- machine_type. e.g. n1-standard-1. You can choose one from the full list here.
- install_gpu_driver. e.g. false
- instance_owners: the list of owners of the instance. Specify this if you want to restrict who can use the instance. (optional)
- service_account: the service account the notebook instance uses to call other GCP services such as GCS or BigQuery. We'll grant it access to GCS and BigQuery later. If you don't specify one, the Compute Engine default service account is used. (optional)
- vm_image: specify a VM image either by image_name or image_family, together with the project that hosts the image. You can check the available images in the deeplearning-platform-release project.
- network and subnet: specify these if you want to place the instance in your own VPC and subnet. (optional)
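Putting these settings together, a minimal sketch might look like the following (the instance name, project ID, owner email, and image family are placeholders):

```hcl
# Minimal Vertex AI Workbench (notebook) instance.
# All names and the project ID below are illustrative placeholders.
resource "google_notebooks_instance" "test_notebook" {
  name         = "test-notebook"
  project      = "gcp-test-project"
  location     = "asia-northeast1-b"
  machine_type = "n1-standard-1"

  install_gpu_driver = false

  # Restrict who can use the instance (optional).
  instance_owners = ["analyst@example.com"]

  # VM image from the deeplearning-platform-release project.
  vm_image {
    project      = "deeplearning-platform-release"
    image_family = "common-cpu-notebooks"
  }
}
```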
This is a very basic configuration.
You can create a notebook instance that can access GCS, because the Compute Engine default service account has very broad permissions (the Editor role) by default.
Service Account
If we want to configure permissions more flexibly, we need to define a dedicated service account for the notebook instance. The following is an example.
I create a service account workbench-default with the following roles:
- roles/bigquery.dataViewer: View data in BigQuery
- roles/bigquery.jobUser: Run BigQuery jobs
- roles/storage.objectViewer: View GCS objects
- custom role storageReader: Custom role granting the "storage.buckets.get" permission, which is required when reading GCS data via the Python client library.
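A sketch of this setup in Terraform follows. The account ID, project ID, and resource names are assumptions; the custom role carries "storage.buckets.get", and the predefined roles are bound at the project level:

```hcl
# Service account for the Workbench instance (names are placeholders).
resource "google_service_account" "workbench_default" {
  project      = "gcp-test-project"
  account_id   = "workbench-default"
  display_name = "Workbench default service account"
}

# Custom role with the bucket-metadata permission needed when
# reading GCS data via the Python client library.
resource "google_project_iam_custom_role" "storage_reader" {
  project     = "gcp-test-project"
  role_id     = "storageReader"
  title       = "Storage Reader"
  permissions = ["storage.buckets.get"]
}

# Bind the predefined roles and the custom role to the service account.
resource "google_project_iam_member" "workbench_roles" {
  for_each = toset([
    "roles/bigquery.dataViewer",
    "roles/bigquery.jobUser",
    "roles/storage.objectViewer",
    "projects/gcp-test-project/roles/storageReader",
  ])
  project = "gcp-test-project"
  role    = each.value
  member  = "serviceAccount:${google_service_account.workbench_default.email}"
}
```

Pass this service account's email to the service_account argument of google_notebooks_instance so the notebook runs with these permissions instead of the Compute Engine default.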
BigQuery settings
To allow a service account to access BigQuery, we can control access at the dataset, table, and view level.
- https://cloud.google.com/bigquery/docs/dataset-access-controls
- https://cloud.google.com/bigquery/docs/table-access-controls
When you create a dataset, you can add a service account in its access block, or you can use google_bigquery_dataset_access if you need to grant permissions to many service accounts or users.
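As a minimal sketch, granting the service account read access to a single dataset with google_bigquery_dataset_access could look like this (the dataset ID and the referenced service account resource are assumptions):

```hcl
# Grant READER on one dataset to the Workbench service account.
# "analytics" and the project ID are placeholder names.
resource "google_bigquery_dataset_access" "workbench_read" {
  project       = "gcp-test-project"
  dataset_id    = "analytics"
  role          = "READER"
  user_by_email = google_service_account.workbench_default.email
}
```

Because this is a separate resource, you can add one block per principal without rewriting the dataset's own access block.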
Summary
We can create a Vertex AI Workbench instance with Terraform that can use GCS and BigQuery. Some of the configuration is confusing, e.g. location means different things in different Terraform resources. Hopefully you can start your own setup based on the examples above.