How your data flows
Five steps, fully automated, end-to-end encrypted.
Step 1: Extract from SQL Server
- Uses SQLAlchemy + pyodbc (ODBC Driver 18)
- Credentials fetched from Secret Manager at runtime
- Read-only access: no writes to your database
- Up to 5 tables processed concurrently
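A minimal sketch of this step, assuming illustrative names (`build_mssql_url`, `extract_tables` are hypothetical helpers, not the pipeline's actual API). It shows the two mechanics above: a pyodbc connection string for ODBC Driver 18 with a read-only intent hint, and a worker pool capped at 5 concurrent tables. In practice the password would come from Secret Manager (e.g. via `google-cloud-secret-manager`) rather than a function argument.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import quote_plus

MAX_CONCURRENT_TABLES = 5  # at most 5 tables extracted at once

def build_mssql_url(host: str, database: str, user: str, password: str) -> str:
    """Build a SQLAlchemy URL for SQL Server via pyodbc (ODBC Driver 18).

    ApplicationIntent=ReadOnly signals read-only intent; Encrypt=yes keeps
    the connection encrypted in transit.
    """
    params = quote_plus(
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={host};DATABASE={database};"
        f"UID={user};PWD={password};"
        "ApplicationIntent=ReadOnly;Encrypt=yes"
    )
    return f"mssql+pyodbc:///?odbc_connect={params}"

def extract_tables(tables, extract_one):
    """Run extract_one(table) for each table, bounded to 5 in flight."""
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_TABLES) as pool:
        return list(pool.map(extract_one, tables))
```

The URL would be passed to `sqlalchemy.create_engine(...)`; `extract_one` would stream one table's rows to Parquet.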
Step 2: Convert types
- 30+ SQL Server types mapped automatically
- Handles datetime, decimal, UUID, boolean, and string types
- Builds an explicit Arrow schema with lineage metadata
- Processed in ephemeral Cloud Run containers
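A sketch of the type mapping and lineage metadata, under stated assumptions: this is an illustrative subset of the 30+ mappings, the Arrow types are written as logical type names rather than `pyarrow` objects, and `arrow_field` is a hypothetical helper. The key idea from above is that each schema field carries its original SQL Server type as metadata, so conversions stay traceable.

```python
# Illustrative subset of the SQL Server -> Arrow type map (the real
# pipeline covers 30+ types; right-hand side uses Arrow logical type names).
SQLSERVER_TO_ARROW = {
    "bit": "bool",
    "tinyint": "uint8",
    "smallint": "int16",
    "int": "int32",
    "bigint": "int64",
    "real": "float32",
    "float": "float64",
    "decimal": "decimal128",
    "numeric": "decimal128",
    "date": "date32",
    "datetime": "timestamp[us]",
    "datetime2": "timestamp[us]",
    "uniqueidentifier": "string",  # UUIDs carried as strings
    "varchar": "string",
    "nvarchar": "string",
}

def arrow_field(name: str, sql_type: str) -> dict:
    """Describe one Arrow field, recording the source type as lineage metadata."""
    base = sql_type.split("(")[0].strip().lower()  # "decimal(18,2)" -> "decimal"
    arrow_type = SQLSERVER_TO_ARROW.get(base, "string")  # assumed fallback
    return {"name": name, "type": arrow_type,
            "metadata": {"source_sql_type": sql_type}}
```

With `pyarrow` installed, the same information would become `pa.field(name, type, metadata=...)` entries in an explicit `pa.schema(...)`.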
Step 3: Stage in Cloud Storage
- Private bucket with no public access
- Each run gets a unique path prefix
- Files auto-deleted after a successful load
- 7-day lifecycle policy as a safety net
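A sketch of the run-scoped layout, assuming hypothetical helper names (`run_prefix`, `blob_path`) and an assumed path shape; the actual prefix format may differ. The lifecycle rule is shown in the shape of the GCS bucket lifecycle JSON configuration.

```python
import uuid
from datetime import datetime, timezone

# Safety net in the shape of a GCS lifecycle rule: delete anything
# older than 7 days, catching blobs a failed run left behind.
LIFECYCLE_RULE = {"action": {"type": "Delete"}, "condition": {"age": 7}}

def run_prefix(tenant: str, run_id: str = "") -> str:
    """Unique, run-scoped prefix so concurrent runs never collide."""
    run_id = run_id or uuid.uuid4().hex
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    return f"{tenant}/{stamp}/{run_id}"

def blob_path(prefix: str, table: str, stage: str = "raw") -> str:
    """Parquet blob path under the run prefix ('raw' or 'transformed')."""
    return f"{prefix}/{stage}/{table}.parquet"
```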
Step 4: Load into BigQuery
- Uses your tenant-scoped service account
- WRITE_TRUNCATE mode (full table refresh)
- Schema includes type-conversion lineage
- Dataset auto-created if missing
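A sketch of the load settings, written as a plain dict in the shape of BigQuery's REST `jobs.insert` load configuration so it runs without the client library; `load_job_config` is a hypothetical helper, not the pipeline's actual code.

```python
def load_job_config(dataset: str, table: str, gcs_uri: str) -> dict:
    """Load-job settings in the shape of BigQuery's REST load configuration."""
    return {
        "sourceUris": [gcs_uri],
        "sourceFormat": "PARQUET",
        "destinationTable": {"datasetId": dataset, "tableId": table},
        # WRITE_TRUNCATE: replace the table's contents on every run (full refresh)
        "writeDisposition": "WRITE_TRUNCATE",
        # CREATE_IF_NEEDED lets the first run create the table itself
        "createDisposition": "CREATE_IF_NEEDED",
    }
```

With the `google-cloud-bigquery` client, the same settings map to `bigquery.LoadJobConfig(...)`, and the missing-dataset case is typically handled with `client.create_dataset(dataset, exists_ok=True)` before the load.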
Step 5: Clean up
- Both raw and transformed Parquet blobs removed
- Deletion verified per file, with errors logged
- GCS lifecycle deletes orphans after 7 days
- Full audit trail in the pipeline run history
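A sketch of the per-file verified deletion, assuming a hypothetical `delete_blobs` helper that takes the delete operation as a callable (with `google-cloud-storage` that would be each blob's `delete()` method). Failures are logged and collected rather than aborting the run, since the 7-day lifecycle rule catches anything left behind.

```python
import logging

logger = logging.getLogger("pipeline.cleanup")

def delete_blobs(paths, delete_fn):
    """Delete each staged blob; verify per file, log failures, keep going."""
    failed = []
    for path in paths:
        try:
            delete_fn(path)
        except Exception:
            logger.exception("failed to delete %s", path)
            failed.append(path)  # recorded for the run's audit trail
    return failed
```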
Runtime: Cloud Run
Memory: 4 GB
CPUs: 2 vCPUs
Timeout: 60 min