Query Firetiger with BigQuery
Note: Customers whose Firetiger deployment lives on GCP already have BigQuery configured. This guide is for Firetiger on AWS only.
Firetiger Iceberg tables can be queried directly from BigQuery without ETL. Because Google has not yet added support for external Iceberg REST catalog services, the sync process described here must be re-run frequently to keep BigQuery up to date.
This example assumes a typical workstation environment; production deployments will differ.
Credentials
Establish both AWS and GCP credentials.
The firetiger command picks these credentials up through the standard AWS and GCP SDKs.
For GCP, run a command like gcloud auth login --update-adc.
Learn more: Set up Application Default Credentials.
For AWS, run a command like aws sso login.
Learn more: Authentication and access using AWS SDKs and tools.
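Before continuing, it can help to confirm that both sets of credentials are actually usable. The following is a minimal sketch, not part of the firetiger tooling: it assumes the default Linux/macOS location of the Application Default Credentials file and uses aws sts get-caller-identity, which fails when the SSO session is missing or expired. The ADC_FILE variable is a name introduced here for illustration.

```shell
#!/bin/sh
# Assumed default path where gcloud writes Application Default Credentials
# on Linux and macOS; override ADC_FILE if yours lives elsewhere.
ADC_FILE="${ADC_FILE:-$HOME/.config/gcloud/application_default_credentials.json}"

if [ -f "$ADC_FILE" ]; then
  echo "GCP ADC file found: $ADC_FILE"
else
  echo "GCP ADC file missing; run: gcloud auth login --update-adc"
fi

# Ask AWS who we are; this only succeeds with valid, unexpired credentials.
if command -v aws >/dev/null 2>&1 && aws sts get-caller-identity >/dev/null 2>&1; then
  echo "AWS credentials OK"
else
  echo "AWS credentials not usable; run: aws sso login"
fi
```

If AWS_PROFILE is set in the environment, the AWS CLI uses that profile for the identity check.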
GCP Configuration
Use the information gathered in this section to set the example environment variables GOOGLE_CLOUD_PROJECT, FT_BIGQUERY_LOCATION, and FT_BIGQUERY_CONNECTION.
Identify the GCP project where BigQuery lives.
Specifically, a “PROJECT_ID” found in the left column returned by gcloud projects list.
Identify the BigQuery location, which is similar to a region. Specifically, a “region name” matching your Firetiger AWS region in the list of “BigQuery Omni locations”.
Create a BigQuery Connection in the GCP web console. Navigate to the BigQuery Studio. Under the “Explorer” tab (the icon looks like a compass), click “Connections”, then “Create connection”.
- Connection type: “Vertex AI remote models…”
- Connection ID: You decide; something like firetiger is fine.
- Location type: “Multi-region”
- Multi-region: Whichever option matches the BigQuery location from earlier.
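Once the three values are known, they can be exported in the shell that will later run the sync. This is a sketch with made-up values: my-gcp-project is a placeholder project ID, aws-us-east-1 is one example of a BigQuery Omni location, and passing the bare Connection ID in FT_BIGQUERY_CONNECTION is an assumption rather than something this guide confirms.

```shell
# Hypothetical values for illustration; substitute your own.
export GOOGLE_CLOUD_PROJECT="my-gcp-project"   # a PROJECT_ID from: gcloud projects list
export FT_BIGQUERY_LOCATION="aws-us-east-1"    # BigQuery Omni location matching your Firetiger AWS region
export FT_BIGQUERY_CONNECTION="firetiger"      # assumed: the Connection ID chosen in the console
```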
AWS Configuration
Use the information gathered in this section to set the example environment variables AWS_PROFILE, FT_CATALOG, and FT_NAMESPACE.
Identify the AWS profile.
Specifically, one of the alternatives returned by aws configure list-profiles.
Identify the Iceberg catalog URI.
If you don’t know, then use glue://.
Identify the Iceberg namespace where your Firetiger tables live.
This is the Firetiger “deployment name” provided by Firetiger, the DEPLOYMENT_NAME placeholder in https://ui.DEPLOYMENT_NAME.firetigerapi.com/.
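The AWS-side variables can be exported the same way. The sketch below uses made-up values (my-sso-profile and the deployment name acme are placeholders) and shows one way to pull the deployment name out of the Firetiger UI URL with plain shell parameter expansion. UI_URL and host are helper names introduced here for illustration.

```shell
# Hypothetical values for illustration; substitute your own.
export AWS_PROFILE="my-sso-profile"   # one entry from: aws configure list-profiles
export FT_CATALOG="glue://"           # the fallback suggested when the catalog URI is unknown

# Derive the namespace from the Firetiger UI URL ("acme" is a made-up deployment name).
UI_URL="https://ui.acme.firetigerapi.com/"
host="${UI_URL#https://ui.}"                        # strip the scheme and the "ui." prefix
export FT_NAMESPACE="${host%%.firetigerapi.com*}"   # strip the domain suffix and anything after it
echo "$FT_NAMESPACE"                                # prints: acme
```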
Other Configuration
FT_MAX_CONCURRENCY limits how many tables are synced concurrently.
For example, 10 allows 10 tables to be synced concurrently.
FT_TIMEOUT causes the process to exit after this duration.
The value is a Golang time.Duration string, such as 30s or 5m.
OTEL_*_EXPORTER are OpenTelemetry SDK environment variables.
Learn more: SDK Environment Variables
FT_LOG_LEVEL is a Golang slog level, one of ERROR, WARN, INFO, DEBUG.
docker run \
--rm \
--name firetiger-bigquery-sync \
-v "$HOME/.config/gcloud/application_default_credentials.json:/root/.config/gcloud/application_default_credentials.json" \
-v "$HOME/.aws/:/root/.aws" \
-e GOOGLE_CLOUD_PROJECT="$GOOGLE_CLOUD_PROJECT" \
-e FT_BIGQUERY_LOCATION="$FT_BIGQUERY_LOCATION" \
-e FT_BIGQUERY_CONNECTION="$FT_BIGQUERY_CONNECTION" \
-e AWS_PROFILE="$AWS_PROFILE" \
-e FT_CATALOG="$FT_CATALOG" \
-e FT_NAMESPACE="$FT_NAMESPACE" \
-e FT_MAX_CONCURRENCY=10 \
-e FT_TIMEOUT=1m \
-e OTEL_LOGS_EXPORTER=none \
-e OTEL_METRICS_EXPORTER=none \
-e OTEL_TRACES_EXPORTER=none \
-e FT_LOG_LEVEL="INFO" \
public.ecr.aws/firetiger/firetiger \
gcp bigquery sync
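The command above passes six variables through from the calling shell, and an unset variable is handed to the container as an empty string. A small pre-flight check can catch that before the container starts. This is a sketch only; missing_vars is a helper name introduced here, not part of the firetiger tooling.

```shell
#!/bin/sh
# Print the name of every required variable that is unset or empty.
missing_vars() {
  for name in GOOGLE_CLOUD_PROJECT FT_BIGQUERY_LOCATION FT_BIGQUERY_CONNECTION \
              AWS_PROFILE FT_CATALOG FT_NAMESPACE; do
    # Indirect lookup of the variable named by $name (names are fixed literals).
    eval "value=\${$name:-}"
    [ -n "$value" ] || echo "$name"
  done
}

missing="$(missing_vars)"
if [ -n "$missing" ]; then
  echo "Set these variables before running the sync:" >&2
  echo "$missing" >&2
fi
```

Running this in the same shell before docker run lists any variable that still needs a value and prints nothing when all six are set.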