Deployment¶
Infrastructure as Code for single-server and Kubernetes deployments on Hetzner.
All deployment files live in deploy/:
deploy/
├── compose/ # Docker Compose (shared by both deployment modes)
│ ├── docker-compose.yml
│ ├── Dockerfile # Multi-stage: Node (frontend) + Python (backend)
│ ├── Caddyfile # Serves landing + app + proxies API
│ └── .env.example
├── landing/ # Static landing page served on the root domain
│ ├── index.html
│ └── favicon.svg
├── seed/ # Stripe test data generation
│ ├── stripe_seed.py
│ └── stripe_fixtures.json
└── terraform/
├── .gitignore
├── single-server/ # Option A: one Hetzner server
│ ├── main.tf
│ ├── variables.tf
│ ├── server.tf
│ ├── firewall.tf
│ ├── dns.tf
│ ├── outputs.tf
│ ├── cloud-init.yml
│ └── terraform.tfvars.example
└── kubernetes/ # Option B: k3s cluster
├── main.tf
├── variables.tf
├── cluster.tf
├── dns.tf
├── app.tf
├── outputs.tf
└── terraform.tfvars.example
Deployment Modes¶
The deployment depends on which connector type is used:
Full Deployment (Stripe / Ingestion Mode) — Primary¶
For Stripe (and any webhook-based connector), the full stack is required: PostgreSQL + Kafka/Redpanda + API + Worker + Caddy (frontend). See Option A (single server) or Option B (Kubernetes) below.
Lago Companion Mode (Same-Database)¶
For Lago or Kill Bill users, the analytics engine can run alongside the billing engine with no additional infrastructure:
┌────────────────────────────────────┐
│ Existing Lago deployment │
│ ┌──────────┐ ┌────────────────┐ │
│ │ Lago │ │ PostgreSQL │ │
│ │ (API) │ │ (shared) │ │
│ └──────────┘ └───────┬────────┘ │
│ │ │
│ ┌─────────────────────┴────────┐ │
│ │ tidemill │ │
│ │ (analytics CLI / API) │ │
│ │ No Kafka. No worker. │ │
│ └──────────────────────────────┘ │
└────────────────────────────────────┘
Services: Just the tidemill container (or pip install tidemill directly). Connects to the billing engine's PostgreSQL.
No Kafka, no worker process, no event bus. The analytics engine queries billing tables directly at request time.
# docker-compose.yml addition for existing Lago deployment
services:
  analytics:
    image: ghcr.io/ondraz/tidemill:latest
    environment:
      DATABASE_URL: postgresql://lago:password@postgres/lago
      CONNECTOR: lago
    ports:
      - "8000:8000"
Or simply install and use the CLI:
pip install tidemill
export TIDEMILL_DATABASE_URL=postgresql://lago:password@postgres/lago
export TIDEMILL_CONNECTOR=lago
tidemill mrr
# $12,450.00
Docker Image¶
The Dockerfile is a multi-stage build:
- Stage 1 (Node 22): Builds the React frontend — `npm ci && npm run build` produces static assets in `dist/`
- Stage 2 (Python 3.13): Installs the Python backend via `uv` and copies the built frontend to `/srv/frontend`
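To build the image locally, point Docker at the Dockerfile from the tree above; the tag and the repository-root build context here are illustrative choices, not prescribed by the repo:
docker build -f deploy/compose/Dockerfile -t tidemill:local .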
Caddy serves three things:
- Root domain (`tidemill.xyz`) — the static landing page from `/srv/landing` (bind-mounted from `deploy/landing/`).
- `app.` subdomain — the React dashboard from `/srv/frontend` (the Docker volume populated by the API container).
- `/api/*`, `/auth/*`, `/healthz`, `/readyz`, `/docs`, `/openapi.json` are reverse-proxied to FastAPI on both the root domain (so existing webhook URLs keep working) and the `app.` subdomain (so the SPA's API calls stay same-origin).
Environment Variables for Production¶
deploy/compose/.env¶
Copy from deploy/compose/.env.example and fill in:
| Variable | Required | Description |
|---|---|---|
| `POSTGRES_PASSWORD` | Yes | PostgreSQL password (no default — must be set) |
| `DOMAIN` | Yes | Domain for Caddy TLS (e.g. `tidemill.xyz`) |
| `STRIPE_API_KEY` | No | Stripe live key (`sk_live_...`) |
| `STRIPE_WEBHOOK_SECRET` | No | Stripe webhook signing secret |
| `AUTH_ENABLED` | No | `true` (default) or `false` to disable auth |
| `CLERK_PUBLISHABLE_KEY` | If auth | Clerk publishable key (`pk_live_...`) |
| `CLERK_SECRET_KEY` | If auth | Clerk secret key (`sk_live_...`) |
| `CLERK_JWKS_URL` | If auth | Clerk JWKS URL for JWT verification |
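A filled-in `.env` might look like this (every value is a placeholder; copy the real keys from your Stripe and Clerk dashboards):
# deploy/compose/.env (placeholder values)
POSTGRES_PASSWORD=<long-random-string>
DOMAIN=tidemill.xyz
STRIPE_API_KEY=sk_live_...
STRIPE_WEBHOOK_SECRET=whsec_...
AUTH_ENABLED=true
CLERK_PUBLISHABLE_KEY=pk_live_...
CLERK_SECRET_KEY=sk_live_...
CLERK_JWKS_URL=https://your-app.clerk.accounts.dev/.well-known/jwks.json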
Clerk Setup for Production¶
- Create a production instance in Clerk Dashboard (not development)
- Set the allowed origins to your domain (e.g. `https://tidemill.xyz`)
- Configure OAuth providers (Google, GitHub, etc.) under User & Authentication > Social connections
- Copy the live keys (`pk_live_...`, `sk_live_...`) to your `.env`
- The JWKS URL format is `https://your-app.clerk.accounts.dev/.well-known/jwks.json`
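A quick sanity check that the JWKS URL resolves before wiring it into the API (the subdomain here is an example):
curl -sf https://your-app.clerk.accounts.dev/.well-known/jwks.json | head -c 200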
The frontend reads VITE_CLERK_PUBLISHABLE_KEY at build time. In the Docker build, this is baked into the static assets. Set it as a build arg or in the Dockerfile if needed. The default Docker Compose setup passes CLERK_PUBLISHABLE_KEY to the API container — the frontend must be rebuilt if the key changes.
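One way to handle a key change is to pass it as a build argument when rebuilding; this sketch assumes the Dockerfile declares `ARG VITE_CLERK_PUBLISHABLE_KEY` and that the frontend is built inside the `api` image, which may not match the actual setup:
cd deploy/compose
docker compose build --build-arg VITE_CLERK_PUBLISHABLE_KEY=pk_live_... api
docker compose up -d api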
Option A: Single Server¶
A single Hetzner CX22 (2 vCPU, 4 GB RAM, ~€4/mo) running Docker Compose. Good for getting started, small-to-medium workloads, or development.
What Terraform Provisions¶
| Resource | Purpose |
|---|---|
| `hcloud_server` | Ubuntu 24.04 with Docker (via cloud-init) |
| `hcloud_ssh_key` | Your SSH key for server access |
| `hcloud_firewall` | Allows only SSH, HTTP, HTTPS, ICMP inbound |
| `hcloud_zone_rrset` | A + AAAA DNS records pointing to the server |
Services (Docker Compose)¶
| Container | Role | RAM |
|---|---|---|
| Caddy | Reverse proxy, auto-HTTPS, serves frontend | ~20 MB |
| API | FastAPI — metrics, webhooks, auth, dashboards | ~100 MB |
| Worker | Kafka consumers — core state + metrics | ~150 MB |
| Redpanda | Kafka-compatible bus (no JVM, no ZooKeeper) | ~256 MB |
| PostgreSQL | Primary database | ~256 MB |
Total: ~800 MB. Fits on CX22 with headroom.
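To check the real footprint on a running server, `docker stats` gives a one-shot per-container snapshot:
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}"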
Caddy Configuration¶
Caddy splits the deployment into the public landing page (root domain) and the React dashboard (app. subdomain), with API routes proxied on both:
{$DOMAIN:localhost} {
handle /api/* { reverse_proxy api:8000 }
handle /auth/* { reverse_proxy api:8000 }
handle /healthz { reverse_proxy api:8000 }
handle /readyz { reverse_proxy api:8000 }
handle /docs { reverse_proxy api:8000 }
handle /openapi.json { reverse_proxy api:8000 }
handle {
root * /srv/landing
try_files {path} /index.html
file_server
}
}
app.{$DOMAIN:localhost} {
# same /api, /auth, /healthz proxies as above
handle /api/* { reverse_proxy api:8000 }
# …
handle {
root * /srv/frontend
try_files {path} /index.html # SPA fallback
file_server
}
}
The try_files directive sends all non-file paths to index.html, enabling React Router's client-side routing. Keeping /api/* on the root domain means Stripe webhooks pointed at https://tidemill.xyz/api/webhooks/stripe keep working when the SPA moves to the subdomain.
Quickstart¶
# 1. Prerequisites
brew install terraform # or apt install terraform
# 2. Configure secrets
cd deploy/terraform/single-server
cp .env.example .env
# Edit .env: set TF_VAR_hcloud_token, TF_VAR_tailscale_auth_key
# 3. Configure Clerk + Stripe
cd ../../compose
cp .env.example .env
# Edit .env: set POSTGRES_PASSWORD, DOMAIN, CLERK_*, STRIPE_* keys
# 4. Deploy
cd ../terraform/single-server
set -a && source .env && set +a
terraform init
terraform plan # review what will be created
terraform apply # provision server, firewall, DNS zone
# 5. Set nameservers at your domain registrar
terraform output nameservers
# → Set these as custom nameservers for tidemill.xyz at your registrar
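# Optional: confirm delegation has propagated (can take minutes to hours)
dig NS tidemill.xyz +short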
# 6. Verify (wait ~2 min for cloud-init + DNS propagation)
curl https://tidemill.xyz/healthz
# 7. Open the public landing page
open https://tidemill.xyz
# 8. Open the React dashboard
open https://app.tidemill.xyz
What cloud-init Does¶
The server bootstraps itself on first boot via cloud-init.yml:
- Updates packages and installs Docker
- Clones the repo to `/opt/tidemill`
- Generates a random Postgres password
- Generates a random Grafana admin password (via Terraform `random_password`)
- Starts Docker Compose with both `docker-compose.yml` and `docker-compose.observability.yml`
- Enables unattended security updates
- Reboots if the kernel was updated
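To confirm first boot finished cleanly, SSH in and check cloud-init and the Compose stack (the compose working directory is an assumption based on the repo layout above):
ssh root@<server-ip> 'cloud-init status --long'
ssh root@<server-ip> 'cd /opt/tidemill/deploy/compose && docker compose ps'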
Observability¶
OpenTelemetry is enabled by default. The server runs a self-contained Grafana stack:
| Service | Exposure | Purpose |
|---|---|---|
| OTEL Collector | internal only | OTLP receiver, forwards to Tempo and Prometheus |
| Tempo | internal only | Trace storage (24 h retention) |
| Loki | internal only | Log storage (7 d retention) |
| Prometheus | internal only | Metrics storage (15 d retention) |
| Alloy | internal only | Scrapes Docker container logs → Loki |
| Grafana | `grafana.<domain>` | Web UI (TLS via Caddy + Let's Encrypt) |
Fetch the Grafana admin password after terraform apply:
terraform output -raw grafana_admin_password
Open https://grafana.<domain> and log in as admin. Tempo, Loki, and Prometheus are pre-provisioned as datasources; the Tidemill Overview dashboard shows RED metrics for the API and worker.
Disable the stack by setting TIDEMILL_OTEL_ENABLED=false in .env and restarting the api + worker containers (the observability services themselves can be left running or stopped with docker compose -f docker-compose.observability.yml stop).
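On the server that amounts to something like the following, assuming `TIDEMILL_OTEL_ENABLED` already has a line in `.env` and the checkout lives at `/opt/tidemill`:
cd /opt/tidemill/deploy/compose
sed -i 's/^TIDEMILL_OTEL_ENABLED=.*/TIDEMILL_OTEL_ENABLED=false/' .env
docker compose up -d api worker
docker compose -f docker-compose.observability.yml stop   # optional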
Destroy¶
terraform destroy # removes server, firewall, DNS records
Option B: Kubernetes Cluster¶
An HA k3s cluster with three control-plane nodes plus separate worker nodes, running on Hetzner. For production workloads that need horizontal scaling and high availability.
What Terraform Provisions¶
| Resource | Purpose |
|---|---|
| k3s cluster (via `kube-hetzner` module) | 3 control plane nodes + N worker nodes |
| `hcloud_load_balancer` | Ingress load balancer with public IP |
| `hcloud_zone_rrset` | DNS records pointing to the load balancer |
| `kubernetes_namespace` | `tidemill` namespace |
| `kubernetes_secret` | Database credentials, Kafka config, Clerk keys |
| `kubernetes_stateful_set` x 2 | PostgreSQL + Redpanda with Hetzner CSI volumes |
| `kubernetes_deployment` x 2 | API (2 replicas) + Worker (2 replicas) |
| `kubernetes_ingress_v1` | Traefik ingress with TLS |
Architecture¶
Load Balancer (lb11)
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ CP #1 │ │ CP #2 │ │ CP #3 │ Control Plane (cx22 x 3)
│ k3s │ │ k3s │ │ k3s │
└─────────┘ └─────────┘ └─────────┘
│ │ │
┌─────────┐ ┌─────────┐
│Worker #1│ │Worker #2│ Worker Nodes (cx32 x N)
│ API │ │ API │
│ Worker │ │ Worker │
│ PG │ │ Redpanda│
└─────────┘ └─────────┘
Quickstart¶
# 1. Configure
cd deploy/terraform/kubernetes
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars: set hcloud_token, domain, domain_zone
# 2. Deploy (~5 min for cluster, ~2 min for app)
terraform init
terraform plan
terraform apply
# 3. Access the cluster
export KUBECONFIG=$(terraform output -raw kubeconfig_path)
kubectl get pods -n tidemill
# 4. Verify
curl https://tidemill.xyz/healthz
Scaling¶
# Scale API replicas
kubectl scale deployment api -n tidemill --replicas=4
# Scale workers (Kafka rebalances partitions automatically)
kubectl scale deployment worker -n tidemill --replicas=4
# Add more Hetzner worker nodes — edit terraform.tfvars:
# worker_count = 4
terraform apply
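If you prefer not to scale by hand, a HorizontalPodAutoscaler can manage the API replica count; the thresholds below are illustrative, and CPU-based autoscaling requires resource requests on the deployment:
kubectl autoscale deployment api -n tidemill --min=2 --max=6 --cpu-percent=70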
Cost Estimate¶
| Component | Type | Count | Monthly |
|---|---|---|---|
| Control plane | CX22 (2 vCPU, 4 GB) | 3 | ~€12 |
| Workers | CX32 (4 vCPU, 8 GB) | 2 | ~€14 |
| Load balancer | LB11 | 1 | ~€6 |
| Volumes | 10 GB x 2 (PG + Redpanda) | 2 | ~€1 |
| Total | | | ~€33/mo |
Production Hardening¶
For a production Kubernetes deployment, consider:
- Managed PostgreSQL — replace the StatefulSet with Hetzner DBaaS or an external managed database. Remove the `kubernetes_stateful_set.postgres` resource and update `DATABASE_URL` in the secret.
- Redpanda cluster — replace the single-node StatefulSet with the Redpanda Helm chart for a 3-broker cluster (see the sketch after this list), or use Confluent Cloud / Amazon MSK.
- Image registry — push to GitHub Container Registry (`ghcr.io/ondraz/tidemill`) and pin image tags instead of `latest`.
- Secrets management — use Sealed Secrets or External Secrets Operator instead of plain Kubernetes secrets. Store Clerk keys and Stripe keys here.
- Monitoring — deploy Prometheus + Grafana via Helm for cluster and application metrics.
- Backups — use Velero for cluster backup, pg_dump CronJob for PostgreSQL.
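For the Redpanda item, a sketch of swapping the single-node StatefulSet for the official Helm chart (verify the values keys against the current chart documentation before relying on them):
helm repo add redpanda https://charts.redpanda.com
helm repo update
helm install redpanda redpanda/redpanda -n tidemill --set statefulset.replicas=3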
Compose <> Kubernetes Mapping¶
The Docker Compose and Kubernetes deployments use the same container images and environment variables. This table shows how each Compose concept translates:
| Docker Compose | Kubernetes | Notes |
|---|---|---|
| `postgres` service | StatefulSet + PersistentVolumeClaim | Hetzner CSI volumes |
| `redpanda` service | StatefulSet + PersistentVolumeClaim | Or Redpanda Helm chart |
| `api` service | Deployment + Service + Ingress | Scales horizontally |
| `worker` service | Deployment | Kafka rebalances partitions across replicas |
| `caddy` service | Traefik Ingress (built into kube-hetzner) | TLS via Let's Encrypt |
| `.env` file | Secret | Includes Clerk + Stripe keys |
| Docker volumes | PersistentVolumeClaim + Hetzner CSI | |
| ports 80, 443 | LoadBalancer service | Hetzner Cloud LB |
| `frontend_assets` volume | Init container or build stage | Static files served by ingress |
Backups¶
Single Server¶
# PostgreSQL dump (add to crontab on the server)
docker compose exec postgres pg_dump -U tidemill tidemill \
| gzip > /opt/backups/tidemill-$(date +%F).sql.gz
# Or use Hetzner server snapshots (~€0.01/GB/mo)
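To run that dump nightly, a crontab entry on the server could look like this (paths are assumptions; note the escaped `%`, which cron otherwise treats specially), with a matching restore command:
# crontab: nightly dump at 02:00
0 2 * * * cd /opt/tidemill/deploy/compose && docker compose exec -T postgres pg_dump -U tidemill tidemill | gzip > /opt/backups/tidemill-$(date +\%F).sql.gz
# restore a chosen dump
gunzip -c /opt/backups/tidemill-2025-01-01.sql.gz | docker compose exec -T postgres psql -U tidemill tidemill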
Kubernetes¶
# PostgreSQL dump via CronJob (or use Velero for full cluster backup)
kubectl exec -n tidemill postgres-0 -- \
pg_dump -U tidemill tidemill | gzip > backup.sql.gz
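Restoring is the mirror image (the pod name assumes the StatefulSet above):
gunzip -c backup.sql.gz | kubectl exec -i -n tidemill postgres-0 -- psql -U tidemill tidemill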
Why Redpanda over Apache Kafka¶
| Factor | Redpanda | Apache Kafka |
|---|---|---|
| Runtime | Single C++ binary | JVM + ZooKeeper (or KRaft) |
| Memory (1 broker) | ~256 MB | ~1-2 GB |
| Startup time | Seconds | 30+ seconds |
| API | Kafka-compatible | Native |
| Swap to Kafka | Zero code changes | - |
For single-server: Redpanda saves ~1 GB of RAM. For Kubernetes production: swap to the Redpanda Helm chart or a managed Kafka service with no application changes.