How your data flows

Five steps, fully automated, end-to-end encrypted.

1. Extract from SQL Server
Uses SQLAlchemy + pyodbc (ODBC Driver 18)
Credentials fetched from Secret Manager at runtime
Read-only access — no writes to your database
Up to 5 tables processed concurrently
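The extraction step above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the host, database, and credential names are hypothetical, and in the real flow the user and password would be fetched from Secret Manager at runtime rather than passed in directly.

```python
from urllib.parse import quote_plus

def build_engine_url(host: str, database: str, user: str, password: str) -> str:
    """Build a SQLAlchemy URL for SQL Server via pyodbc (ODBC Driver 18).

    Hypothetical helper; credentials would come from Secret Manager at runtime.
    """
    odbc = (
        "DRIVER={ODBC Driver 18 for SQL Server};"
        f"SERVER={host};DATABASE={database};"
        f"UID={user};PWD={password};"
        "Encrypt=yes;TrustServerCertificate=no;"  # encrypted in transit
        "ApplicationIntent=ReadOnly;"  # signals read-only intent to the server
    )
    # SQLAlchemy's pyodbc dialect accepts a percent-encoded DSN via odbc_connect:
    return "mssql+pyodbc:///?odbc_connect=" + quote_plus(odbc)

# Usage (sketch): engine = sqlalchemy.create_engine(build_engine_url(...)),
# then read each table with a ThreadPoolExecutor(max_workers=5) to cap
# concurrency at five tables at a time.
```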

2. Convert to Parquet
30+ SQL Server types mapped automatically
Handles datetime, decimal, UUID, boolean, and string types
Builds explicit Arrow schema with lineage metadata
Processed in ephemeral Cloud Run containers
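A simplified excerpt of what such a type mapping looks like, shown with type names only; this is an assumption about the mapping's shape, and the real pipeline would build pyarrow DataType objects and attach conversion-lineage metadata to the Arrow schema.

```python
# Hypothetical excerpt: SQL Server type -> Arrow type name.
SQLSERVER_TO_ARROW = {
    "int": "int32",
    "bigint": "int64",
    "smallint": "int16",
    "tinyint": "uint8",
    "bit": "bool",
    "decimal": "decimal128",
    "numeric": "decimal128",
    "float": "float64",
    "real": "float32",
    "datetime": "timestamp[us]",
    "datetime2": "timestamp[us]",
    "date": "date32",
    "uniqueidentifier": "string",  # UUIDs carried as strings
    "nvarchar": "string",
    "varchar": "string",
}

def arrow_type_for(sql_type: str) -> str:
    # Unknown types fall back to string so no value is dropped.
    return SQLSERVER_TO_ARROW.get(sql_type.lower(), "string")
```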

3. Stage in Cloud Storage
Private bucket — no public access
Each run gets a unique path prefix
Files auto-deleted after successful load
7-day lifecycle policy as safety net

4. Load into BigQuery
Uses your tenant-scoped service account
WRITE_TRUNCATE mode (full table refresh)
Schema includes type conversion lineage
Dataset auto-created if missing
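The load step maps naturally onto the google-cloud-bigquery client API; a sketch under that assumption is below. Names like `load_parquet` are illustrative, and the client would authenticate as the tenant-scoped service account.

```python
def table_ref(project: str, dataset: str, table: str) -> str:
    # Fully qualified BigQuery table ID: project.dataset.table
    return f"{project}.{dataset}.{table}"

def load_parquet(gcs_uri: str, table_id: str) -> None:
    """Load a staged Parquet file, replacing the table's contents."""
    from google.cloud import bigquery  # requires google-cloud-bigquery

    client = bigquery.Client()
    # Auto-create the dataset if missing.
    client.create_dataset(table_id.rsplit(".", 1)[0], exists_ok=True)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        # Full table refresh: replace existing rows on every run.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.load_table_from_uri(gcs_uri, table_id, job_config=job_config).result()
```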

5. Clean up
Both raw and transformed Parquet blobs removed
Deletion verified per-file with error logging
GCS lifecycle deletes orphans after 7 days
Full audit trail in pipeline run history
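A sketch of per-file deletion with verification and error logging. `bucket` is any object exposing `list_blobs(prefix=...)` with blobs that have `.name`, `.delete()`, and `.exists()`, matching the google-cloud-storage surface; the function name and shape are assumptions for illustration.

```python
import logging

logger = logging.getLogger("cleanup")

def cleanup_run(bucket, prefix: str) -> int:
    """Delete every blob under a run's prefix, verifying each deletion."""
    deleted = 0
    for blob in list(bucket.list_blobs(prefix=prefix)):
        try:
            blob.delete()
            if blob.exists():  # per-file verification
                logger.error("blob still present after delete: %s", blob.name)
            else:
                deleted += 1
        except Exception:
            # Log and move on; the 7-day lifecycle policy catches orphans.
            logger.exception("failed to delete %s", blob.name)
    return deleted
```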
Resources
Runtime: Cloud Run
Memory: 4 GB
vCPUs: 2
Timeout: 60 min
Next: Review security