GCP — Google Cloud Platform for Engineers
Production GCP guide — compute (Cloud Run, GKE, Compute Engine), data (BigQuery, Cloud SQL, Firestore, Pub/Sub, Dataflow), AI/ML (Vertex AI), IAM, networking, and Cloud Monitoring. With Python and gcloud examples throughout.
Google Cloud Platform runs on the same infrastructure where Google runs Search, YouTube, and Gmail — which means its data and AI infrastructure is genuinely battle-tested at planetary scale. GCP's differentiators are BigQuery (serverless petabyte-scale analytics), Cloud Run (the best serverless container platform), and Vertex AI (a unified ML platform backed by Google's TPU infrastructure).
GCP Architecture: Hierarchy
Organization (e.g., company.com)
└── Folders (by department or environment)
└── Projects (billing, resource, and permission boundary)
├── APIs enabled per project
├── Service accounts
├── Resources (VMs, buckets, databases...)
└── IAM policies

Everything in GCP belongs to a project. Projects are the unit of billing, API enablement, and permissions. For production, use separate projects for dev/staging/production, with a Shared VPC or VPC Service Controls to govern cross-project access.
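If you manage this hierarchy from code, the Resource Manager API exposes it directly. A minimal sketch, assuming the google-cloud-resource-manager package; the folder ID is made up:

from google.cloud import resourcemanager_v3

client = resourcemanager_v3.ProjectsClient()
# List the per-environment projects under one folder (hypothetical folder ID)
for project in client.list_projects(parent="folders/123456789012"):
    print(project.project_id, project.state.name)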
IAM — Identity and Access Management
GCP IAM binds roles (collections of permissions) to principals (users, groups, or service accounts) at any level of the resource hierarchy (organization, folder, project, or individual resource):
# Grant a role to a service account on a project
gcloud projects add-iam-policy-binding my-project \
--member="serviceAccount:my-sa@my-project.iam.gserviceaccount.com" \
--role="roles/bigquery.dataViewer"
# Create a service account
gcloud iam service-accounts create api-sa \
--display-name="API Service Account" \
--project=my-project
# Grant service account access to a specific resource (e.g., a GCS bucket)
gcloud storage buckets add-iam-policy-binding gs://my-bucket \
--member="serviceAccount:api-sa@my-project.iam.gserviceaccount.com" \
--role="roles/storage.objectViewer"
# Generate a key (prefer Workload Identity over key files in production)
gcloud iam service-accounts keys create key.json \
--iam-account=api-sa@my-project.iam.gserviceaccount.com

Key IAM Roles
| Role | Use case |
|------|---------|
| roles/viewer | Read-only access to all resources |
| roles/editor | Read-write, no IAM changes |
| roles/owner | Full access including IAM |
| roles/bigquery.dataViewer | Read BigQuery datasets |
| roles/bigquery.jobUser | Run BigQuery jobs |
| roles/storage.objectAdmin | Full GCS bucket access |
| roles/run.invoker | Invoke Cloud Run services |
| roles/cloudsql.client | Connect to Cloud SQL |
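In production, prefer short-lived impersonated credentials over the key file generated above. A sketch using google-auth impersonation; the service-account name is illustrative, and the caller needs roles/iam.serviceAccountTokenCreator on the target account:

import google.auth
from google.auth import impersonated_credentials
from google.cloud import bigquery

# Ambient credentials (gcloud login on a workstation, or the runtime's identity)
source_credentials, _ = google.auth.default()
target_credentials = impersonated_credentials.Credentials(
    source_credentials=source_credentials,
    target_principal="api-sa@my-project.iam.gserviceaccount.com",
    target_scopes=["https://www.googleapis.com/auth/cloud-platform"],
    lifetime=3600,  # seconds; tokens expire instead of living on disk forever
)
bq = bigquery.Client(project="my-project", credentials=target_credentials)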
Compute: Cloud Run
Cloud Run runs containerised applications serverlessly. You push a Docker image; Cloud Run handles scaling (including to zero), HTTPS, and traffic management:
# Build and push image to Google Artifact Registry
gcloud auth configure-docker us-central1-docker.pkg.dev
docker build -t us-central1-docker.pkg.dev/my-project/my-repo/api:v1 .
docker push us-central1-docker.pkg.dev/my-project/my-repo/api:v1
# Deploy to Cloud Run. Flag notes (inline comments after a backslash
# continuation break the command, so they live up here):
#   --allow-unauthenticated  → public endpoint
#   --min-instances=1        → avoid cold starts in production
#   --concurrency=80         → requests per container instance
#   --set-secrets            → injects values from Secret Manager
gcloud run deploy api-service \
--image=us-central1-docker.pkg.dev/my-project/my-repo/api:v1 \
--region=us-central1 \
--platform=managed \
--allow-unauthenticated \
--min-instances=1 \
--max-instances=100 \
--memory=512Mi \
--cpu=1 \
--concurrency=80 \
--set-env-vars="ENVIRONMENT=production" \
--set-secrets="DB_PASSWORD=db-password:latest"

# Python service on Cloud Run — reads GCP config automatically
import os
from flask import Flask, jsonify
import google.cloud.logging
app = Flask(__name__)
# Structured logging to Cloud Logging
client = google.cloud.logging.Client()
client.setup_logging()
@app.route("/")
def index():
return jsonify({"status": "healthy", "region": os.environ.get("CLOUD_RUN_REGION")})
@app.route("/process", methods=["POST"])
def process():
# Service account credential is automatic in Cloud Run
from google.cloud import bigquery
bq = bigquery.Client() # uses Cloud Run's service account identity
# ...
return jsonify({"status": "ok"})
if __name__ == "__main__":
port = int(os.environ.get("PORT", 8080))
    app.run(host="0.0.0.0", port=port)
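The service above is deployed with --allow-unauthenticated. For a private service, drop that flag, grant callers roles/run.invoker, and send an ID token whose audience is the service URL. A minimal client sketch (the URL is made up):

import requests
import google.auth.transport.requests
from google.oauth2 import id_token

service_url = "https://api-service-abc123-uc.a.run.app"  # hypothetical URL
# Mint an ID token whose audience is the Cloud Run service URL
auth_request = google.auth.transport.requests.Request()
token = id_token.fetch_id_token(auth_request, service_url)
response = requests.get(service_url, headers={"Authorization": f"Bearer {token}"})
print(response.status_code)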
BigQuery — Serverless Data Warehouse

BigQuery is GCP's serverless, petabyte-scale analytics warehouse. You pay per byte scanned (or flat-rate slots) — no cluster to manage.
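Because billing is per byte scanned, it is worth estimating a query's cost before running it. A dry run reports bytes processed without executing or billing anything:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")
dry_run_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT customer_id, total_amount FROM `my-project.analytics.orders`",
    job_config=dry_run_config,
)
print(f"Would scan {job.total_bytes_processed / 1e9:.2f} GB")  # nothing billed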
-- BigQuery uses project.dataset.table naming
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS lifetime_value,
AVG(total_amount) AS avg_order,
MAX(created_at) AS last_order_date
FROM `my-project.analytics.orders`
WHERE DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY)
AND status = 'completed'
GROUP BY customer_id
HAVING SUM(total_amount) > 1000
ORDER BY lifetime_value DESC
LIMIT 100;

BigQuery Partitioning and Clustering
-- Create partitioned + clustered table (standard BigQuery best practice)
CREATE TABLE `my-project.analytics.events`
(
event_id STRING,
event_type STRING,
user_id STRING,
payload JSON,
created_at TIMESTAMP
)
PARTITION BY DATE(created_at) -- query only the partitions you need
CLUSTER BY event_type, user_id -- within each partition, colocate related rows
OPTIONS (
partition_expiration_days = 365,
require_partition_filter = true -- prevents accidental full-table scans
);
-- Partitioned query: BigQuery scans only matching partitions
SELECT * FROM `my-project.analytics.events`
WHERE DATE(created_at) = '2026-04-18' -- partition filter required
  AND event_type = 'purchase';

BigQuery Python Client
from google.cloud import bigquery
from google.cloud.bigquery import QueryJobConfig, ScalarQueryParameter
import pandas as pd
client = bigquery.Client(project="my-project")
# Parameterised query (prevents SQL injection)
query = """
SELECT user_id, SUM(amount) AS total_spend
FROM `my-project.analytics.transactions`
WHERE DATE(created_at) BETWEEN @start_date AND @end_date
GROUP BY user_id
ORDER BY total_spend DESC
LIMIT @limit
"""
job_config = QueryJobConfig(
query_parameters=[
ScalarQueryParameter("start_date", "DATE", "2026-01-01"),
ScalarQueryParameter("end_date", "DATE", "2026-04-18"),
ScalarQueryParameter("limit", "INT64", 100),
]
)
df = client.query(query, job_config=job_config).to_dataframe()
# Stream data into BigQuery (for real-time ingestion)
table_id = "my-project.analytics.events"
rows = [
{"event_id": "e1", "event_type": "click", "user_id": "u123",
"created_at": "2026-04-18T10:00:00Z"},
]
errors = client.insert_rows_json(table_id, rows)
if errors:
print(f"Streaming insert errors: {errors}")
# Load from GCS
job = client.load_table_from_uri(
"gs://my-bucket/events/*.parquet",
"my-project.analytics.events",
job_config=bigquery.LoadJobConfig(
source_format=bigquery.SourceFormat.PARQUET,
write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)
)
job.result()  # wait for completion

Cloud Storage (GCS)
from datetime import timedelta
from google.cloud import storage
client = storage.Client()
# Upload a file
bucket = client.bucket("my-production-bucket")
blob = bucket.blob("reports/2026-04/revenue.csv")
blob.upload_from_filename("revenue.csv", content_type="text/csv")
# Generate a signed URL (temporary access without authentication)
url = blob.generate_signed_url(
    expiration=timedelta(hours=1),  # a timedelta is unambiguous across signing versions
    method="GET",
    version="v4",
)
# Stream-download a large file
with blob.open("rb") as f:
for chunk in iter(lambda: f.read(4096), b""):
        process(chunk)  # placeholder for your own chunk handler
# List objects with a prefix
for blob in client.list_blobs("my-bucket", prefix="reports/2026-04/"):
print(f"{blob.name} — {blob.size / 1e6:.1f} MB")Cloud SQL — Managed Relational Databases
Cloud SQL manages PostgreSQL, MySQL, and SQL Server — automated backups, replicas, and patches:
# Create a Cloud SQL PostgreSQL instance
# (db-custom-4-15360 = 4 vCPU, 15 GB RAM; REGIONAL = HA with automatic failover)
gcloud sql instances create prod-postgres \
--database-version=POSTGRES_15 \
--tier=db-custom-4-15360 \
--region=us-central1 \
--availability-type=REGIONAL \
--storage-size=100GB \
--storage-auto-increase \
--backup-start-time=03:00 \
--maintenance-window-day=SUN \
--maintenance-window-hour=04
# Run the Cloud SQL Auth Proxy (recommended — no public IP needed)
# The proxy handles authentication and encryption
./cloud-sql-proxy my-project:us-central1:prod-postgres --port=5432

# Connect via Cloud SQL Connector (Python) — no proxy binary needed
import os
from google.cloud.sql.connector import Connector
import sqlalchemy
connector = Connector()
def get_connection():
return connector.connect(
"my-project:us-central1:prod-postgres",
"pg8000",
user="api_user",
password=os.environ["DB_PASSWORD"],
db="production"
)
engine = sqlalchemy.create_engine(
"postgresql+pg8000://",
creator=get_connection,
pool_size=5,
max_overflow=2,
pool_timeout=30,
pool_recycle=1800
)
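A variant sketch: the connector also supports IAM database authentication, which removes the password entirely. This assumes the instance has the cloudsql.iam_authentication flag enabled and the service account has been added as a Cloud SQL user:

from google.cloud.sql.connector import Connector

connector = Connector()
conn = connector.connect(
    "my-project:us-central1:prod-postgres",
    "pg8000",
    user="api-sa@my-project.iam",  # SA email minus ".gserviceaccount.com"
    db="production",
    enable_iam_auth=True,  # connector exchanges the SA identity for a DB token
)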
Pub/Sub — Messaging and Event Streaming

from google.cloud import pubsub_v1
import json
project_id = "my-project"
# PUBLISHER
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, "order-events")
def publish_order_event(order: dict):
data = json.dumps(order).encode("utf-8")
future = publisher.publish(
topic_path,
data,
event_type="order_placed", # message attributes for filtering
customer_id=order["customer_id"]
)
message_id = future.result()
print(f"Published {message_id}")
# SUBSCRIBER (pull-based)
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "order-events-sub")
def process_message(message: pubsub_v1.subscriber.message.Message) -> None:
data = json.loads(message.data.decode("utf-8"))
print(f"Processing order: {data['order_id']}")
# process...
message.ack() # ack removes from subscription
streaming_pull_future = subscriber.subscribe(
subscription_path,
callback=process_message,
flow_control=pubsub_v1.types.FlowControl(max_messages=10)
)
with subscriber:
try:
streaming_pull_future.result(timeout=300)
except Exception:
        streaming_pull_future.cancel()

Pub/Sub Patterns
| Pattern | Use |
|---------|-----|
| Fan-out | One topic, many subscriptions — each subscriber gets every message |
| Filter subscriptions | Subscription-level message-attribute filter — reduces wasted processing (see the sketch below) |
| Dead-letter topics | Undeliverable messages forwarded to another topic after N delivery attempts |
| Ordering | Message ordering key + ordered delivery guarantee per key |
| Push to Cloud Run | Pub/Sub delivers to an HTTPS endpoint — no polling needed |
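Two of these patterns combined in one sketch: a filtered subscription with a dead-letter topic. Names are illustrative; both topics must already exist, and the Pub/Sub service agent needs publish rights on the dead-letter topic:

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "purchases-sub")

subscriber.create_subscription(
    request={
        "name": subscription_path,
        "topic": subscriber.topic_path("my-project", "order-events"),
        # Only messages whose attributes match the filter are delivered
        "filter": 'attributes.event_type = "order_placed"',
        # After 5 failed delivery attempts, forward to the dead-letter topic
        "dead_letter_policy": {
            "dead_letter_topic": subscriber.topic_path("my-project", "order-events-dlq"),
            "max_delivery_attempts": 5,
        },
    }
)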
Dataflow — Managed Apache Beam
Dataflow runs Apache Beam pipelines — batch and streaming data processing at scale without managing infrastructure:
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.bigquery import WriteToBigQuery
def run():
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        staging_location="gs://my-bucket/staging",
        temp_location="gs://my-bucket/temp",
        job_name="orders-etl",
        streaming=True,  # required for unbounded sources such as Pub/Sub
        save_main_session=True,
    )
with beam.Pipeline(options=options) as p:
(
p
| "Read from Pub/Sub" >> beam.io.ReadFromPubSub(
subscription="projects/my-project/subscriptions/order-events-sub"
).with_output_types(bytes)
| "Parse JSON" >> beam.Map(lambda x: json.loads(x.decode("utf-8")))
| "Filter completed" >> beam.Filter(
lambda order: order["status"] == "completed"
)
| "Transform" >> beam.Map(lambda order: {
"order_id": order["order_id"],
"customer_id": order["customer_id"],
"amount": float(order["amount"]),
"date": order["created_at"][:10],
})
| "Write to BigQuery" >> WriteToBigQuery(
table="my-project:analytics.completed_orders",
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
schema="order_id:STRING,customer_id:STRING,amount:FLOAT,date:DATE"
)
        )

Vertex AI — Unified ML Platform
Vertex AI consolidates Google's ML services — training, prediction, model registry, feature store, and Gemini:
from google.cloud import aiplatform
from vertexai.generative_models import GenerativeModel, GenerationConfig
import vertexai
vertexai.init(project="my-project", location="us-central1")
# Use a Gemini model
model = GenerativeModel("gemini-2.0-flash")
response = model.generate_content(
"Explain Pub/Sub fan-out pattern in 2 sentences.",
generation_config=GenerationConfig(
temperature=0.3,
max_output_tokens=200,
)
)
print(response.text)
# Streaming
for chunk in model.generate_content("Write a data pipeline design...", stream=True):
    print(chunk.text, end="", flush=True)

Vertex AI Endpoints — Custom Model Serving
from google.cloud import aiplatform
aiplatform.init(project="my-project", location="us-central1")
# Upload a trained model
model = aiplatform.Model.upload(
display_name="churn-predictor-v3",
artifact_uri="gs://my-bucket/models/churn-v3/",
serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-3:latest",
)
# Deploy to an endpoint
endpoint = aiplatform.Endpoint.create(display_name="churn-prediction-endpoint")
model.deploy(
endpoint=endpoint,
deployed_model_display_name="churn-v3",
machine_type="n1-standard-2",
min_replica_count=1,
max_replica_count=5,
traffic_percentage=100,
)
# Predict
response = endpoint.predict(instances=[
{"age": 35, "tenure_months": 24, "monthly_spend": 89.5}
])
print(response.predictions)

Cloud Monitoring and Logging
from google.cloud import monitoring_v3
import time
client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"
# Write a custom metric
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/api/order_processing_latency"
series.metric.labels["endpoint"] = "/orders"
series.resource.type = "global"
now = time.time()
interval = monitoring_v3.TimeInterval(
{"end_time": {"seconds": int(now), "nanos": 0}}
)
point = monitoring_v3.Point(
{"interval": interval, "value": {"double_value": 123.5}}
)
series.points = [point]
client.create_time_series(name=project_name, time_series=[series])
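Reading metrics back uses the same MetricServiceClient. A sketch, continuing from the snippet above, that pulls the last hour of the custom metric:

# Query the last hour of the custom metric written above
query_interval = monitoring_v3.TimeInterval({
    "end_time": {"seconds": int(now)},
    "start_time": {"seconds": int(now - 3600)},
})
results = client.list_time_series(
    request={
        "name": project_name,
        "filter": 'metric.type = "custom.googleapis.com/api/order_processing_latency"',
        "interval": query_interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for ts in results:
    for p in ts.points:
        print(p.interval.end_time, p.value.double_value)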
# Structured logging to Cloud Logging (auto-correlates with traces)
import google.cloud.logging
import logging
log_client = google.cloud.logging.Client()
log_client.setup_logging()
logger = logging.getLogger(__name__)
# Structured log entry
logger.info("Order processed", extra={
"json_fields": {
"order_id": "ORD-123",
"customer_id": "C-456",
"latency_ms": 45,
"status": "completed"
}
})

GCP vs AWS vs Azure
| | GCP | AWS | Azure |
|--|-----|-----|-------|
| Serverless containers | Cloud Run (best-in-class) | App Runner / Fargate | Azure Container Apps |
| Kubernetes | GKE Autopilot | EKS | AKS |
| Data warehouse | BigQuery (serverless) | Redshift | Azure Synapse |
| ML platform | Vertex AI + Gemini | SageMaker + Bedrock | Azure ML + Azure OpenAI |
| Messaging | Pub/Sub | SQS / SNS / EventBridge | Service Bus / Event Grid |
| Object storage | Cloud Storage | S3 | Azure Blob Storage |
| Managed databases | Cloud SQL, Spanner, Firestore | RDS, Aurora, DynamoDB | Azure SQL, Cosmos DB |
| Global network | Lowest latency (Google's private backbone) | Extensive regional coverage | Strong Azure backbone |
| Strengths | Data analytics, ML, containers | Breadth, ecosystem, serverless | Microsoft integration, hybrid |
Related: Kubernetes Deep Dive — Platform-agnostic Kubernetes
Related: Databricks Guide — Multi-cloud data platform
Related: Azure Cloud Integration — Azure data services