How your data flows
Five steps, fully automated, end-to-end encrypted.
Step 1: Extract from SQL Server
- Uses SQLAlchemy + pyodbc (ODBC Driver 18)
- Credentials fetched from Secret Manager at runtime
- Read-only access: no writes to your database
- Up to 5 tables processed concurrently
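A minimal sketch of this step, assuming illustrative names (`build_mssql_url`, `extract_tables` are hypothetical helpers, not the pipeline's actual API). It shows the two mechanics above: a pyodbc connection string for ODBC Driver 18 with a read-only intent hint, and a worker pool capped at 5 concurrent tables. In practice the password would come from Secret Manager (e.g. via `google-cloud-secret-manager`) rather than a function argument.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote_plus

MAX_CONCURRENT_TABLES = 5  # at most 5 tables extracted at once

def build_mssql_url(host: str, database: str, user: str, password: str) -> str:
    """Build a SQLAlchemy URL for SQL Server via pyodbc (ODBC Driver 18).

    ApplicationIntent=ReadOnly signals read-only intent; Encrypt=yes keeps
    the connection encrypted in transit.
    """
    params = quote_plus(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={host};DATABASE={database};"
        f"UID={user};PWD={password};"
        "ApplicationIntent=ReadOnly;Encrypt=yes"
    )
    return f"mssql+pyodbc:///?odbc_connect={params}"

def extract_tables(tables, extract_one):
    """Run extract_one(table) for each table, bounded to 5 in flight."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TABLES) as pool:
        return list(pool.map(extract_one, tables))
```

The URL would be passed to `sqlalchemy.create_engine(...)`; `extract_one` would stream one table's rows to Parquet.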
Step 2: Convert types
- 30+ SQL Server types mapped automatically
- Handles datetime, decimal, UUID, boolean, and string types
- Builds an explicit Arrow schema with lineage metadata
- Processed in ephemeral Cloud Run containers
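A sketch of the type mapping and lineage metadata, under stated assumptions: this is an illustrative subset of the 30+ mappings, the Arrow types are written as logical type names rather than `pyarrow` objects, and `arrow_field` is a hypothetical helper. The key idea from above is that each schema field carries its original SQL Server type as metadata, so conversions stay traceable.

```python
# Illustrative subset of the SQL Server -> Arrow type map (the real
# pipeline covers 30+ types; right-hand side uses Arrow logical type names).
SQLSERVER_TO_ARROW = {
    "bit": "bool",
    "tinyint": "uint8",
    "smallint": "int16",
    "int": "int32",
    "bigint": "int64",
    "real": "float32",
    "float": "float64",
    "decimal": "decimal128",
    "numeric": "decimal128",
    "date": "date32",
    "datetime": "timestamp[us]",
    "datetime2": "timestamp[us]",
    "uniqueidentifier": "string",  # UUIDs carried as strings
    "varchar": "string",
    "nvarchar": "string",
}

def arrow_field(name: str, sql_type: str) -> dict:
    """Describe one Arrow field, recording the source type as lineage metadata."""
    base = sql_type.split("(")[0].strip().lower()  # "decimal(18,2)" -> "decimal"
    arrow_type = SQLSERVER_TO_ARROW.get(base, "string")  # assumed fallback
    return {"name": name, "type": arrow_type,
            "metadata": {"source_sql_type": sql_type}}
```

With `pyarrow` installed, the same information would become `pa.field(name, type, metadata=...)` entries in an explicit `pa.schema(...)`.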
Step 3: Stage in Cloud Storage
- Private bucket with no public access
- Each run gets a unique path prefix
- Files auto-deleted after a successful load
- 7-day lifecycle policy as a safety net
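A sketch of the run-scoped layout, assuming hypothetical helper names (`run_prefix`, `blob_path`) and an assumed path shape; the actual prefix format may differ. The lifecycle rule is shown in the shape of the GCS bucket lifecycle JSON configuration.

```python
import uuid
from datetime import datetime, timezone

# Safety net in the shape of a GCS lifecycle rule: delete anything
# older than 7 days, catching blobs a failed run left behind.
LIFECYCLE_RULE = {"action": {"type": "Delete"}, "condition": {"age": 7}}

def run_prefix(tenant: str, run_id: str = "") -> str:
    """Unique, run-scoped prefix so concurrent runs never collide."""
    run_id = run_id or uuid.uuid4().hex
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    return f"{tenant}/{stamp}/{run_id}"

def blob_path(prefix: str, table: str, stage: str = "raw") -> str:
    """Parquet blob path under the run prefix ('raw' or 'transformed')."""
    return f"{prefix}/{stage}/{table}.parquet"
```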
Step 4: Load into BigQuery
- Uses your tenant-scoped service account
- WRITE_TRUNCATE mode (full table refresh)
- Schema includes type-conversion lineage
- Dataset auto-created if missing
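A sketch of the load settings, written as a plain dict in the shape of BigQuery's REST `jobs.insert` load configuration so it runs without the client library; `load_job_config` is a hypothetical helper, not the pipeline's actual code.

```python
def load_job_config(dataset: str, table: str, gcs_uri: str) -> dict:
    """Load-job settings in the shape of BigQuery's REST load configuration."""
    return {
        "sourceUris": [gcs_uri],
        "sourceFormat": "PARQUET",
        "destinationTable": {"datasetId": dataset, "tableId": table},
        # WRITE_TRUNCATE: replace the table's contents on every run (full refresh)
        "writeDisposition": "WRITE_TRUNCATE",
        # CREATE_IF_NEEDED lets the first run create the table itself
        "createDisposition": "CREATE_IF_NEEDED",
    }
```

With the `google-cloud-bigquery` client, the same settings map to `bigquery.LoadJobConfig(...)`, and the missing-dataset case is typically handled with `client.create_dataset(dataset, exists_ok=True)` before the load.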
Step 5: Clean up
- Both raw and transformed Parquet blobs removed
- Deletion verified per file, with errors logged
- GCS lifecycle deletes orphans after 7 days
- Full audit trail in the pipeline run history
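A sketch of the per-file verified deletion, assuming a hypothetical `delete_blobs` helper that takes the delete operation as a callable (with `google-cloud-storage` that would be each blob's `delete()` method). Failures are logged and collected rather than aborting the run, since the 7-day lifecycle rule catches anything left behind.

```python
import logging

logger = logging.getLogger("pipeline.cleanup")

def delete_blobs(paths, delete_fn):
    """Delete each staged blob; verify per file, log failures, keep going."""
    failed = []
    for path in paths:
        try:
            delete_fn(path)
        except Exception:
            logger.exception("failed to delete %s", path)
            failed.append(path)  # recorded for the run's audit trail
    return failed
```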
Runtime: Cloud Run
Memory: 4 GB
CPUs: 2 vCPUs
Timeout: 60 min