Cloud & DevOps · Intermediate

GCP — Google Cloud Platform for Engineers

Production GCP guide — compute (Cloud Run, GKE, Compute Engine), data (BigQuery, Cloud SQL, Firestore, Pub/Sub, Dataflow), AI/ML (Vertex AI), IAM, networking, and Cloud Monitoring. With Python and gcloud examples throughout.

SystemForge · April 18, 2026 · 9 min read
GCP · Google Cloud · BigQuery · Cloud Run · Vertex AI · Kubernetes · Pub/Sub · Python · DevOps

Google Cloud Platform runs on the same infrastructure Google uses for Search, YouTube, and Gmail, which means its data and AI stack is battle-tested at planetary scale. GCP's differentiators are BigQuery (serverless, petabyte-scale analytics), Cloud Run (arguably the most ergonomic serverless container platform), and Vertex AI (a unified ML platform backed by Google's TPU infrastructure).


GCP Architecture: Hierarchy

Organization (e.g., company.com)
  └── Folders (by department or environment)
        └── Projects (billing, resource, and permission boundary)
              ├── APIs enabled per project
              ├── Service accounts
              ├── Resources (VMs, buckets, databases...)
              └── IAM policies

Everything in GCP belongs to a Project. Projects are the unit of billing, API enablement, and access control. For production environments, use separate projects for dev/staging/production, with a Shared VPC or VPC Service Controls to govern cross-project access.
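
A minimal sketch of that project-per-environment layout with gcloud; the folder ID, billing account, and project names below are placeholders, not values from this article:

Bash
# Hypothetical project IDs, folder ID, and billing account
gcloud projects create my-app-dev --folder=123456789012 --name="My App (dev)"
gcloud projects create my-app-prod --folder=123456789012 --name="My App (prod)"

# Link billing and enable only the APIs each environment needs
gcloud billing projects link my-app-prod --billing-account=XXXXXX-XXXXXX-XXXXXX
gcloud services enable run.googleapis.com bigquery.googleapis.com --project=my-app-prod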


IAM — Identity and Access Management

GCP IAM uses roles (collections of permissions) bound to principals (users, groups, or service accounts) at any level of the resource hierarchy (organization, folder, project, or individual resource):

Bash
# Create a service account
gcloud iam service-accounts create api-sa \
    --display-name="API Service Account" \
    --project=my-project

# Grant a role to the service account on a project
gcloud projects add-iam-policy-binding my-project \
    --member="serviceAccount:api-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/bigquery.dataViewer"

# Grant the service account access to a specific resource (e.g., a GCS bucket)
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
    --member="serviceAccount:api-sa@my-project.iam.gserviceaccount.com" \
    --role="roles/storage.objectViewer"

# Generate a key (prefer Workload Identity over key files in production)
gcloud iam service-accounts keys create key.json \
    --iam-account=api-sa@my-project.iam.gserviceaccount.com

Key IAM Roles

| Role | Use case |
|------|----------|
| roles/viewer | Read-only access to all resources |
| roles/editor | Read-write, no IAM changes |
| roles/owner | Full access including IAM |
| roles/bigquery.dataViewer | Read BigQuery datasets |
| roles/bigquery.jobUser | Run BigQuery jobs |
| roles/storage.objectAdmin | Full control of objects in GCS buckets |
| roles/run.invoker | Invoke Cloud Run services |
| roles/cloudsql.client | Connect to Cloud SQL |
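
To check which of these roles a principal actually holds, one option is to filter the project's IAM policy for its bindings (shown here for the api-sa service account created above):

Bash
# List every role bound to the service account on this project
gcloud projects get-iam-policy my-project \
    --flatten="bindings[].members" \
    --filter="bindings.members:serviceAccount:api-sa@my-project.iam.gserviceaccount.com" \
    --format="table(bindings.role)"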


Compute: Cloud Run

Cloud Run runs containerised applications serverlessly — you push a container image, and it handles scaling (including to zero), HTTPS, and traffic management:

Bash
# Build and push image to Google Artifact Registry
gcloud auth configure-docker us-central1-docker.pkg.dev

docker build -t us-central1-docker.pkg.dev/my-project/my-repo/api:v1 .
docker push us-central1-docker.pkg.dev/my-project/my-repo/api:v1

# Deploy to Cloud Run
gcloud run deploy api-service \
    --image=us-central1-docker.pkg.dev/my-project/my-repo/api:v1 \
    --region=us-central1 \
    --platform=managed \
    --allow-unauthenticated \
    --min-instances=1 \
    --max-instances=100 \
    --memory=512Mi \
    --cpu=1 \
    --concurrency=80 \
    --set-env-vars="ENVIRONMENT=production" \
    --set-secrets="DB_PASSWORD=db-password:latest"
# --allow-unauthenticated makes the endpoint public; --min-instances=1 avoids
# cold starts in production; --concurrency is requests per container instance;
# --set-secrets pulls DB_PASSWORD from Secret Manager.
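
The traffic management mentioned above enables gradual rollouts: deploy a new revision with no traffic, then shift a percentage of requests onto it. A sketch (the canary tag name is arbitrary):

Bash
# Deploy v2 as a new revision that receives no traffic yet, tagged "canary"
gcloud run deploy api-service \
    --image=us-central1-docker.pkg.dev/my-project/my-repo/api:v2 \
    --region=us-central1 \
    --no-traffic \
    --tag=canary

# Route 10% of traffic to the canary revision; the rest stays on the current revision
gcloud run services update-traffic api-service \
    --region=us-central1 \
    --to-tags=canary=10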
Python
# Python service on Cloud Run - reads GCP config automatically
import os
from flask import Flask, jsonify
import google.cloud.logging

app = Flask(__name__)

# Structured logging to Cloud Logging
client = google.cloud.logging.Client()
client.setup_logging()

@app.route("/")
def index():
    return jsonify({"status": "healthy", "service": os.environ.get("K_SERVICE")})

@app.route("/process", methods=["POST"])
def process():
    # Service account credential is automatic in Cloud Run
    from google.cloud import bigquery
    bq = bigquery.Client()    # uses Cloud Run's service account identity
    # ...
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    port = int(os.environ.get("PORT", 8080))
    app.run(host="0.0.0.0", port=port)

BigQuery — Serverless Data Warehouse

BigQuery is GCP's serverless, petabyte-scale analytics warehouse. You pay per byte scanned (or for capacity-based slot reservations) — no cluster to manage.
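
Since billing is per byte scanned, it helps to estimate a query before running it; a dry run validates the SQL and reports the bytes it would process without executing anything (sketch using the bq CLI):

Bash
# Dry run: reports how many bytes the query would scan, without running it or incurring cost
bq query --use_legacy_sql=false --dry_run \
    'SELECT customer_id, SUM(total_amount) FROM `my-project.analytics.orders` GROUP BY customer_id'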

SQL
-- BigQuery uses project.dataset.table naming
SELECT
    customer_id,
    COUNT(*)                  AS order_count,
    SUM(total_amount)         AS lifetime_value,
    AVG(total_amount)         AS avg_order,
    MAX(created_at)           AS last_order_date
FROM `my-project.analytics.orders`
WHERE DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY)
  AND status = 'completed'
GROUP BY customer_id
HAVING SUM(total_amount) > 1000
ORDER BY lifetime_value DESC
LIMIT 100;

BigQuery Partitioning and Clustering

SQL
-- Create partitioned + clustered table (standard BigQuery best practice)
CREATE TABLE `my-project.analytics.events`
(
    event_id    STRING,
    event_type  STRING,
    user_id     STRING,
    payload     JSON,
    created_at  TIMESTAMP
)
PARTITION BY DATE(created_at)      -- query only the partitions you need
CLUSTER BY event_type, user_id     -- within each partition, colocate related rows
OPTIONS (
    partition_expiration_days = 365,
    require_partition_filter = true   -- prevents accidental full-table scans
);

-- Partitioned query: BigQuery scans only matching partitions
SELECT * FROM `my-project.analytics.events`
WHERE DATE(created_at) = '2026-04-18'   -- partition filter required
  AND event_type = 'purchase';

BigQuery Python Client

Python
from google.cloud import bigquery
from google.cloud.bigquery import QueryJobConfig, ScalarQueryParameter
import pandas as pd

client = bigquery.Client(project="my-project")

# Parameterised query (prevents SQL injection)
query = """
    SELECT user_id, SUM(amount) AS total_spend
    FROM `my-project.analytics.transactions`
    WHERE DATE(created_at) BETWEEN @start_date AND @end_date
    GROUP BY user_id
    ORDER BY total_spend DESC
    LIMIT @limit
"""
job_config = QueryJobConfig(
    query_parameters=[
        ScalarQueryParameter("start_date", "DATE", "2026-01-01"),
        ScalarQueryParameter("end_date",   "DATE", "2026-04-18"),
        ScalarQueryParameter("limit",      "INT64", 100),
    ]
)

df = client.query(query, job_config=job_config).to_dataframe()

# Stream data into BigQuery (for real-time ingestion)
table_id = "my-project.analytics.events"
rows = [
    {"event_id": "e1", "event_type": "click", "user_id": "u123",
     "created_at": "2026-04-18T10:00:00Z"},
]
errors = client.insert_rows_json(table_id, rows)
if errors:
    print(f"Streaming insert errors: {errors}")

# Load from GCS
job = client.load_table_from_uri(
    "gs://my-bucket/events/*.parquet",
    "my-project.analytics.events",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
)
job.result()   # wait for completion

Cloud Storage (GCS)

Python
from datetime import timedelta

from google.cloud import storage

client = storage.Client()

# Upload a file
bucket = client.bucket("my-production-bucket")
blob = bucket.blob("reports/2026-04/revenue.csv")
blob.upload_from_filename("revenue.csv",
                           content_type="text/csv")

# Generate a signed URL (temporary access without authentication)
url = blob.generate_signed_url(
    expiration=timedelta(hours=1),   # valid for one hour
    method="GET",
    version="v4"
)

# Stream-download a large file
with blob.open("rb") as f:
    for chunk in iter(lambda: f.read(4096), b""):
        process(chunk)

# List objects with a prefix
for blob in client.list_blobs("my-bucket", prefix="reports/2026-04/"):
    print(f"{blob.name} — {blob.size / 1e6:.1f} MB")

Cloud SQL — Managed Relational Databases

Cloud SQL manages PostgreSQL, MySQL, and SQL Server — automated backups, replicas, and patches:

Bash
# Create a Cloud SQL PostgreSQL instance
gcloud sql instances create prod-postgres \
    --database-version=POSTGRES_15 \
    --tier=db-custom-4-15360 \
    --region=us-central1 \
    --availability-type=REGIONAL \
    --storage-size=100GB \
    --storage-auto-increase \
    --backup-start-time=03:00 \
    --maintenance-window-day=SUN \
    --maintenance-window-hour=04
# --tier=db-custom-4-15360 is 4 vCPU / 15 GB RAM;
# --availability-type=REGIONAL gives high availability with automatic failover.

# Run the Cloud SQL Auth Proxy (recommended - no public IP needed)
# The proxy handles authentication and encryption
./cloud-sql-proxy my-project:us-central1:prod-postgres --port=5432
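
The read replicas mentioned above are a single command against the primary; a sketch assuming the prod-postgres instance from the previous block:

Bash
# Create a read replica of prod-postgres for read-heavy workloads
gcloud sql instances create prod-postgres-replica \
    --master-instance-name=prod-postgres \
    --region=us-central1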
Python
# Connect via the Cloud SQL Python Connector - no proxy binary needed
import os

from google.cloud.sql.connector import Connector
import sqlalchemy

connector = Connector()

def get_connection():
    return connector.connect(
        "my-project:us-central1:prod-postgres",
        "pg8000",
        user="api_user",
        password=os.environ["DB_PASSWORD"],
        db="production"
    )

engine = sqlalchemy.create_engine(
    "postgresql+pg8000://",
    creator=get_connection,
    pool_size=5,
    max_overflow=2,
    pool_timeout=30,
    pool_recycle=1800
)

Pub/Sub — Messaging and Event Streaming

Python
from google.cloud import pubsub_v1
import json

project_id = "my-project"

# PUBLISHER
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, "order-events")

def publish_order_event(order: dict):
    data = json.dumps(order).encode("utf-8")
    future = publisher.publish(
        topic_path,
        data,
        event_type="order_placed",       # message attributes for filtering
        customer_id=order["customer_id"]
    )
    message_id = future.result()
    print(f"Published {message_id}")

# SUBSCRIBER (pull-based)
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "order-events-sub")

def process_message(message: pubsub_v1.subscriber.message.Message):
    data = json.loads(message.data.decode("utf-8"))
    print(f"Processing order: {data['order_id']}")
    # process...
    message.ack()      # ack so Pub/Sub does not redeliver the message

streaming_pull_future = subscriber.subscribe(
    subscription_path,
    callback=process_message,
    flow_control=pubsub_v1.types.FlowControl(max_messages=10)
)

with subscriber:
    try:
        streaming_pull_future.result(timeout=300)
    except Exception:
        streaming_pull_future.cancel()

Pub/Sub Patterns

| Pattern | Use |
|---------|-----|
| Fan-out | One topic, many subscriptions — each subscriber gets every message |
| Filter subscriptions | Subscription-level message attribute filter — reduces wasted processing |
| Dead-letter topics | Undeliverable messages forwarded after N retries |
| Ordering | Message ordering key + ordered delivery guarantee per key |
| Push to Cloud Run | Pub/Sub delivers to an HTTPS endpoint — no polling needed |
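
Most of these patterns are configured on the subscription. A sketch with gcloud — the dead-letter topic, filter expression, and Cloud Run URL below are illustrative, not from this article:

Bash
# Dead-letter topic for messages that repeatedly fail processing
gcloud pubsub topics create order-events-dlq

# Pull subscription with an attribute filter and dead-lettering after 5 attempts
gcloud pubsub subscriptions create order-events-sub \
    --topic=order-events \
    --message-filter='attributes.event_type = "order_placed"' \
    --dead-letter-topic=order-events-dlq \
    --max-delivery-attempts=5

# Push subscription that delivers directly to a Cloud Run endpoint (no polling)
gcloud pubsub subscriptions create order-events-push \
    --topic=order-events \
    --push-endpoint=https://api-service-abc123-uc.a.run.app/pubsub \
    --push-auth-service-account=api-sa@my-project.iam.gserviceaccount.com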


Dataflow — Managed Apache Beam

Dataflow runs Apache Beam pipelines — batch and streaming data processing at scale without managing infrastructure:

Python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigquery import WriteToBigQuery

def run():
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        staging_location="gs://my-bucket/staging",
        temp_location="gs://my-bucket/temp",
        job_name="orders-etl",
        streaming=True,          # reading from Pub/Sub requires a streaming pipeline
        save_main_session=True,
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read from Pub/Sub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/order-events-sub"
              ).with_output_types(bytes)
            | "Parse JSON" >> beam.Map(lambda x: json.loads(x.decode("utf-8")))
            | "Filter completed" >> beam.Filter(
                lambda order: order["status"] == "completed"
              )
            | "Transform" >> beam.Map(lambda order: {
                "order_id":    order["order_id"],
                "customer_id": order["customer_id"],
                "amount":      float(order["amount"]),
                "date":        order["created_at"][:10],
              })
            | "Write to BigQuery" >> WriteToBigQuery(
                table="my-project:analytics.completed_orders",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                schema="order_id:STRING,customer_id:STRING,amount:FLOAT,date:DATE"
              )
        )

Vertex AI — Unified ML Platform

Vertex AI consolidates Google's ML services — training, prediction, model registry, feature store, and Gemini:

Python
from google.cloud import aiplatform
from vertexai.generative_models import GenerativeModel, GenerationConfig
import vertexai

vertexai.init(project="my-project", location="us-central1")

# Use Gemini Pro
model = GenerativeModel("gemini-2.0-flash")

response = model.generate_content(
    "Explain Pub/Sub fan-out pattern in 2 sentences.",
    generation_config=GenerationConfig(
        temperature=0.3,
        max_output_tokens=200,
    )
)
print(response.text)

# Streaming
for chunk in model.generate_content("Write a data pipeline design...", stream=True):
    print(chunk.text, end="", flush=True)

Vertex AI Endpoints — Custom Model Serving

Python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Upload a trained model
model = aiplatform.Model.upload(
    display_name="churn-predictor-v3",
    artifact_uri="gs://my-bucket/models/churn-v3/",
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
)

# Deploy to an endpoint
endpoint = aiplatform.Endpoint.create(display_name="churn-prediction-endpoint")
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name="churn-v3",
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=5,
    traffic_percentage=100,
)

# Predict
response = endpoint.predict(instances=[
    {"age": 35, "tenure_months": 24, "monthly_spend": 89.5}
])
print(response.predictions)

Cloud Monitoring and Logging

Python
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"

# Write a custom metric
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/api/order_processing_latency"
series.metric.labels["endpoint"] = "/orders"
series.resource.type = "global"

now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": 0}}
)
point = monitoring_v3.Point(
    {"interval": interval, "value": {"double_value": 123.5}}
)
series.points = [point]

client.create_time_series(name=project_name, time_series=[series])
Python
# Structured logging to Cloud Logging (auto-correlates with traces)
import google.cloud.logging
import logging

log_client = google.cloud.logging.Client()
log_client.setup_logging()

logger = logging.getLogger(__name__)

# Structured log entry
logger.info("Order processed", extra={
    "json_fields": {
        "order_id": "ORD-123",
        "customer_id": "C-456",
        "latency_ms": 45,
        "status": "completed"
    }
})

GCP vs AWS vs Azure

| | GCP | AWS | Azure |
|--|-----|-----|-------|
| Serverless containers | Cloud Run (best-in-class) | App Runner / Fargate | Azure Container Apps |
| Kubernetes | GKE Autopilot | EKS | AKS |
| Data warehouse | BigQuery (serverless) | Redshift | Azure Synapse |
| ML platform | Vertex AI + Gemini | SageMaker + Bedrock | Azure ML + Azure OpenAI |
| Messaging | Pub/Sub | SQS / SNS / EventBridge | Service Bus / Event Grid |
| Object storage | Cloud Storage | S3 | Azure Blob Storage |
| Managed databases | Cloud SQL, Spanner, Firestore | RDS, Aurora, DynamoDB | Azure SQL, Cosmos DB |
| Global network | Lowest latency (Google's private backbone) | Extensive regional coverage | Strong Azure backbone |
| Strengths | Data analytics, ML, containers | Breadth, ecosystem, serverless | Microsoft integration, hybrid |


Related: Kubernetes Deep Dive — Platform-agnostic Kubernetes
Related: Databricks Guide — Multi-cloud data platform
Related: Azure Cloud Integration — Azure data services

Enjoyed this article?

Explore the Cloud & DevOps learning path for more.
