Architecture Overview

Implementation plan for the open-source subscription analytics engine. Last updated: March 2026

Positioning

Open-source subscription analytics with transparent, auditable, customizable metric computation. Works with any billing system — Stripe, Lago, Kill Bill — and supports self-hosting.

ChartMogul, Baremetrics, and ProfitWell all compute your metrics in a black box. This project gives you the formulas, the SQL, and the code — reviewable, forkable, contributable.

Target Users (Priority Order)

Stripe users — largest installed base, no open-source analytics option exists
Self-hosting mandates — regulated industries, privacy-conscious organizations
Metric customizers — complex billing models (usage-based, hybrid) that SaaS analytics tools can't handle
Open-source billing users (Lago, Kill Bill) — philosophical alignment, deeper integration possible
Cost-conscious startups — free alternative to ChartMogul/Baremetrics

Design Principles

Metrics package first — the core is a Python library (tidemill), not a web app. FastAPI and CLI are thin facades. You can import tidemill in a Jupyter notebook and query metrics directly.
Stripe-first, dual architecture — the primary integration path is ingestion mode: Stripe webhooks translated into internal events, published to Kafka, consumed by metrics. A secondary same-database mode is available for open-source billing engines (Lago, Kill Bill) that expose their PostgreSQL — zero ETL, but lower priority.
Metrics are self-contained — each metric (MRR, churn, retention, ...) is a Metric subclass that declares its database tables, registers itself, and handles both event-driven and direct-query modes.
Transparent computation — every metric has documented, auditable, forkable logic. Metric definitions are code: reviewable, contributable, no black boxes. This is the core differentiator vs. ChartMogul, Baremetrics, and ProfitWell.
Connectors — billing systems are data sources. Webhook connectors translate vendor events into internal events; database connectors query billing tables directly. Adding a new billing source means implementing one adapter.
Self-hostable — PostgreSQL + Kafka + Docker. For open-source billing engines with accessible databases (Lago, Kill Bill), Kafka can be omitted in favour of direct database queries.

System Architecture

The system supports two integration architectures, chosen per billing source:

Mode A: Ingestion (Stripe) — Primary

Billing System        Event Bus          Analytics Engine           Consumers
┌─────────┐ webhooks  ┌─────────┐       ┌──────────────────────┐
│  Stripe ├──────────►│         │       │    tidemill (Py)     │     ┌──────────┐
│         │           │  Kafka  ├──────►│                      ├────►│   CLI    │
└─────────┘           │         │       │  ┌────────────────┐  │     └──────────┘
                      └─────────┘       │  │    Metrics     │  │     ┌──────────┐
   connector translates                 │  │ MRR│Churn│Ret… │  ├────►│  FastAPI │
   webhook → internal event             │  └────────────────┘  │     └──────────┘
   → publishes to Kafka                 │  ┌────────────────┐  │     ┌──────────┐
                                        │  │   PostgreSQL   │  ├────►│ Jupyter  │
                                        │  │  (analytics)   │  │     └──────────┘
                                        │  └────────────────┘  │
                                        └──────────────────────┘

Data flow (ingestion):

Billing system sends a webhook (e.g., Stripe customer.subscription.updated)
Webhook connector receives it, translates to an internal event (e.g., subscription.activated), publishes to Kafka
Core consumer updates base tables (customer, subscription, invoice, ...) — the current-state view
Metrics each consume the events they care about and update their own materialized tables
Consumers (CLI, API, Jupyter) query metrics for computed results

This is the primary integration path — it works with any billing system that exposes webhooks. Stripe is the reference implementation.
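The translation step in the flow above might look roughly like this. The outer payload shape (`type`, `created`, `data.object`) follows Stripe's event object. Everything else is an assumption for illustration: the `InternalEvent` fields, the status-to-event mapping, the `subscription.canceled` name, and the `partition_key` helper. Publishing to Kafka is left out so the sketch stays dependency-free.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)
class InternalEvent:
    """Assumed internal event shape; the real schema lives in events.py."""
    type: str
    source: str
    customer_id: str
    subscription_id: str
    occurred_at: datetime


# Hypothetical mapping from Stripe subscription status to internal event type.
_STATUS_TO_EVENT = {
    "active": "subscription.activated",
    "canceled": "subscription.canceled",
}


def translate_stripe(payload: dict[str, Any]) -> Optional[InternalEvent]:
    """Translate a Stripe webhook payload, or return None if it is not handled."""
    if payload.get("type") != "customer.subscription.updated":
        return None
    subscription = payload["data"]["object"]
    event_type = _STATUS_TO_EVENT.get(subscription["status"])
    if event_type is None:
        return None
    return InternalEvent(
        type=event_type,
        source="stripe",
        customer_id=subscription["customer"],
        subscription_id=subscription["id"],
        occurred_at=datetime.fromtimestamp(payload["created"], tz=timezone.utc),
    )


def partition_key(event: InternalEvent) -> bytes:
    """Key used when publishing, so one customer's events share a partition."""
    return event.customer_id.encode("utf-8")


webhook = {
    "id": "evt_123",
    "type": "customer.subscription.updated",
    "created": 1767225600,
    "data": {"object": {"id": "sub_1", "customer": "cus_1", "status": "active"}},
}
event = translate_stripe(webhook)
print(event.type, partition_key(event))
```

Keeping the translator a pure function of the payload makes it easy to test against recorded webhooks without a running broker.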

Mode B: Same-Database (Lago, Kill Bill) — Alternative

Billing Engine (Lago/Kill Bill)         Analytics Engine              Consumers
┌────────────────────────────┐       ┌──────────────────────┐
│         PostgreSQL         │       │    tidemill (Py)     │     ┌──────────┐
│  ┌──────────────────────┐  │       │                      ├────►│   CLI    │
│  │ subscriptions, fees, │◄─ ─ ─ ─ ─┤  ┌────────────────┐  │     └──────────┘
│  │ invoices, customers  │  │  SQL  │  │    Metrics     │  │     ┌──────────┐
│  └──────────────────────┘  │ query │  │ MRR│Churn│Ret… │  ├────►│  FastAPI │
│  ┌──────────────────────┐  │       │  └────────────────┘  │     └──────────┘
│  │ metric_* tables      │◄─ ─ ─ ─ ─┤                      │     ┌──────────┐
│  │ (analytics-owned)    │  │       │                      ├────►│ Jupyter  │
│  └──────────────────────┘  │       └──────────────────────┘     └──────────┘
└────────────────────────────┘
     Zero ETL. Zero latency.
     No Kafka needed.

Data flow (same-database):

Lago/Kill Bill writes billing data to PostgreSQL as part of normal operation
Database connector reads billing tables directly via SQL (subscriptions, fees, invoices)
Metrics query billing tables on demand or materialize into metric_* tables in the same database
Consumers (CLI, API, Jupyter) query metrics for computed results

For open-source billing engines that expose their PostgreSQL database, this mode eliminates the ETL layer entirely and requires no Kafka. It is a secondary priority but a strong differentiator for Lago and Kill Bill users.
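A direct-query metric in same-database mode reduces to SQL over the billing tables. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL, and the `subscriptions` schema (column names, status values, `billing_interval`) is invented for the example. Real Lago and Kill Bill schemas differ, so a database connector would map their columns onto a query like this one.

```python
import sqlite3

# SQLite stands in for the billing engine's PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subscriptions (
        id               TEXT PRIMARY KEY,
        customer_id      TEXT NOT NULL,
        status           TEXT NOT NULL,
        amount_cents     INTEGER NOT NULL,
        billing_interval TEXT NOT NULL      -- 'monthly' or 'yearly'
    );
    INSERT INTO subscriptions VALUES
        ('sub_1', 'cus_1', 'active',     5000,  'monthly'),
        ('sub_2', 'cus_2', 'active',     24000, 'yearly'),
        ('sub_3', 'cus_3', 'terminated', 3000,  'monthly');
    """
)

# MRR: sum of active subscriptions, with yearly plans normalized to a month.
MRR_SQL = """
    SELECT COALESCE(SUM(
        CASE billing_interval
            WHEN 'yearly' THEN amount_cents / 12
            ELSE amount_cents
        END
    ), 0)
    FROM subscriptions
    WHERE status = 'active'
"""

(mrr_cents,) = conn.execute(MRR_SQL).fetchone()
print(mrr_cents)  # 7000
```

The same query could either be run on demand or have its result written to a `metric_*` table in the same database, which is the choice the data flow above describes.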

Package Structure

tidemill/
├── __init__.py              # Public API: MetricsEngine, connectors
├── engine.py                # MetricsEngine — routes queries to metrics
├── models.py                # SQLAlchemy Core tables (billing entities)
├── database.py              # Database connection and session management
├── events.py                # Internal event schema (dataclasses)
├── fx.py                    # Foreign-exchange rate conversion
├── bus.py                   # Kafka producer/consumer wrappers (ingestion mode only)
├── state.py                 # Core consumer: events → base tables (ingestion mode only)
├── connectors/
│   ├── __init__.py          # Connector base classes + registry
│   ├── base.py              # WebhookConnector + DatabaseConnector ABCs
│   ├── stripe.py            # Stripe webhook translator — reference implementation
│   ├── lago.py              # Lago database connector (same-database mode)
│   └── killbill.py          # Kill Bill database connector (same-database mode)
├── metrics/
│   ├── __init__.py          # re-exports Metric, QuerySpec, registry
│   ├── base.py              # Metric ABC + QuerySpec
│   ├── query.py             # Cube, QueryFragment, compilation
│   ├── registry.py          # @register, discovery, dependency resolution
│   ├── route_helpers.py     # Shared FastAPI helpers
│   ├── mrr/                 # P0: MRR (MRR, ARR, waterfall, breakdown, series)
│   ├── churn/               # P0: Churn (logo, revenue, customers)
│   ├── retention/           # P0: Retention (cohorts, NRR, GRR)
│   ├── ltv/                 # P1: LTV (LTV, ARPU, cohort LTV)
│   └── trials/              # P1: Trials (funnel, conversion rate)
├── reports/                 # Pre-built charts + styled tables for each metric
├── cli/
│   ├── __init__.py
│   └── main.py              # CLI entry point (P0)
└── api/
    ├── __init__.py
    ├── app.py               # FastAPI app — mounts per-metric routers
    ├── deps.py              # Auth dependencies
    ├── schemas.py           # Pydantic response schemas
    └── routers/             # health, auth, metrics, sources, webhooks, ...

Technology Choices

Component	Choice	Rationale
Language	Python 3.11+	Data science ecosystem, Jupyter integration
Database	PostgreSQL	See Database
Message bus	Kafka (ingestion mode only)	Durable, replayable, ordered per partition
ORM	SQLAlchemy 2.0	Async support, mature, works with Alembic
Migrations	Alembic	Standard for SQLAlchemy projects
API	FastAPI	Async, auto-docs, Pydantic integration
CLI	Click or Typer	Standard Python CLI tooling
Packaging	uv + pyproject.toml	Fast, modern Python tooling

Why Kafka

Kafka is the backbone of the primary integration path (Stripe and any webhook-based connector). Same-database mode (Lago, Kill Bill) can bypass Kafka by querying billing tables directly.

Kafka gives us properties that a simple in-process event bus cannot:

Durability — events survive process restarts. If a metric crashes, it resumes from its last offset.
Replay — add a new metric and replay the full event history to backfill its tables from scratch.
Decoupling — connectors, core state, and metrics run independently. A slow metric doesn't block webhook processing.
Ordering — events for a given customer are ordered within a partition (partition by customer_id).
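The ordering, replay, and resume properties listed above can be illustrated with a toy in-memory log. This is not Kafka client code: `PartitionedLog` is a hypothetical stand-in for a topic, and `zlib.crc32` replaces the broker-side key hashing (the Java client's default partitioner uses murmur2). The point is only that a stable hash of `customer_id` keeps one customer's events in a single, ordered partition.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4


def partition_for(customer_id: str) -> int:
    """Stable hash of the key, so a customer always maps to the same partition."""
    return zlib.crc32(customer_id.encode("utf-8")) % NUM_PARTITIONS


class PartitionedLog:
    """Toy append-only log with one list per partition (stand-in for a topic)."""

    def __init__(self) -> None:
        self._partitions: dict[int, list[tuple[str, str]]] = defaultdict(list)

    def append(self, customer_id: str, event_type: str) -> tuple[int, int]:
        """Append an event and return its (partition, offset)."""
        partition = partition_for(customer_id)
        self._partitions[partition].append((customer_id, event_type))
        return partition, len(self._partitions[partition]) - 1

    def read(self, partition: int, from_offset: int = 0) -> list[tuple[str, str]]:
        """Read a partition starting at an offset; offset 0 is a full replay."""
        return self._partitions[partition][from_offset:]


log = PartitionedLog()
log.append("cus_1", "subscription.activated")
log.append("cus_2", "subscription.activated")
log.append("cus_1", "subscription.canceled")

# Replay: a newly added metric reads the partition from offset 0.
p = partition_for("cus_1")
history = [event for customer, event in log.read(p) if customer == "cus_1"]
print(history)  # ['subscription.activated', 'subscription.canceled']

# Resume: a consumer that had processed offset 0 before crashing restarts at 1.
resumed = log.read(p, from_offset=1)
```

Events for different customers may interleave within a partition, but the relative order of any single customer's events is preserved, which is what the metrics rely on.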

For development and single-node deployments, Redpanda is a Kafka-compatible alternative that is simpler to operate and lighter on memory (roughly 256 MB of RAM versus 1-2 GB for Kafka).

Observability

Tidemill ships optional OpenTelemetry instrumentation for the API and worker, gated behind TIDEMILL_OTEL_ENABLED (off by default in the Python package, on by default in the single-server deployment).

Signal	Source	Storage
Traces	FastAPI, SQLAlchemy, asyncpg, aiokafka (auto)	Tempo
Metrics	Auto-instrumented RED + DB timings	Prometheus
Logs	stdout (Docker) → Alloy scraper	Loki

All signals flow through a self-hosted Grafana stack: app → OTEL Collector → Tempo/Prometheus, Docker logs → Alloy → Loki. Logs carry trace_id=<hex> / span_id=<hex> so Grafana can jump between logs and traces via datasource derivedFields and tracesToLogsV2 links.
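To make the log-to-trace correlation concrete, here is a standard-library sketch of a log line carrying `trace_id=<hex>` and `span_id=<hex>`, plus a capture-group regex of the kind a Loki derived field could use to extract the trace id. It does not use the OpenTelemetry SDK: `TraceContextFilter` takes fixed ids where a real application would read the active span, and the spellings accepted by `otel_enabled` are an assumption. The 32- and 16-character hex widths follow the W3C Trace Context format.

```python
import io
import logging
import re


def otel_enabled(env: dict[str, str]) -> bool:
    """Read the TIDEMILL_OTEL_ENABLED flag (accepted spellings are assumed)."""
    value = env.get("TIDEMILL_OTEL_ENABLED", "false")
    return value.strip().lower() in {"1", "true", "yes", "on"}


class TraceContextFilter(logging.Filter):
    """Attach trace and span ids to every record passing through the handler."""

    def __init__(self, trace_id: int, span_id: int) -> None:
        super().__init__()
        self._trace_id = format(trace_id, "032x")  # 128-bit id, 32 hex chars
        self._span_id = format(span_id, "016x")    # 64-bit id, 16 hex chars

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = self._trace_id
        record.span_id = self._span_id
        return True


# Capture output in memory; in the deployment this would be stdout.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(
    "%(levelname)s %(name)s trace_id=%(trace_id)s span_id=%(span_id)s %(message)s"
))
handler.addFilter(TraceContextFilter(trace_id=0xABC123, span_id=0x42))

logger = logging.getLogger("tidemill.example")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)
logger.info("webhook processed")

line = stream.getvalue().strip()
TRACE_ID_RE = re.compile(r"trace_id=([0-9a-f]{32})")
match = TRACE_ID_RE.search(line)
print(line)
print(match.group(1))
```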

The stack is defined in deploy/compose/docker-compose.observability.yml and is included automatically by both make dev (local) and the single-server Terraform deploy. The Kubernetes deploy does not bundle the stack — point the app at an external OTLP endpoint via OTEL_EXPORTER_OTLP_ENDPOINT.

See deployment.md for operator access details.

MVP Scope

P0 (Must-Have)

MRR computation with transparent, documented, configurable logic
Churn calculation — logo churn, revenue churn, net revenue churn
Basic cohort analysis — monthly retention cohorts
Stripe integration via webhooks + Kafka — reference implementation (largest installed base)
CLI for programmatic access to metrics
FastAPI for HTTP access
Self-hosted deployment via Docker (PostgreSQL + Kafka + API + Worker)
Documented metric methodology — every formula explained and auditable
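For orientation, the three churn variants named above are often defined as in the sketch below. These are common industry definitions, shown here under the assumption that all amounts are integer cents measured over one period. The project's documented methodology remains the authoritative source for how Tidemill computes them, and the function names are illustrative.

```python
def _rate(numerator: int, denominator: int) -> float:
    """Divide safely; an empty starting base yields a rate of 0.0."""
    return numerator / denominator if denominator else 0.0


def logo_churn(customers_at_start: int, customers_lost: int) -> float:
    """Share of customers present at period start who cancelled in the period."""
    return _rate(customers_lost, customers_at_start)


def gross_revenue_churn(starting_mrr: int, churned_mrr: int, contraction_mrr: int) -> float:
    """MRR lost to cancellations and downgrades, relative to starting MRR."""
    return _rate(churned_mrr + contraction_mrr, starting_mrr)


def net_revenue_churn(
    starting_mrr: int, churned_mrr: int, contraction_mrr: int, expansion_mrr: int
) -> float:
    """Gross revenue churn offset by expansion; negative means net growth."""
    return _rate(churned_mrr + contraction_mrr - expansion_mrr, starting_mrr)


# Worked example: 100 customers and 100,000 cents of MRR at period start.
print(logo_churn(100, 5))                              # 0.05
print(gross_revenue_churn(100_000, 4_000, 1_000))      # 0.05
print(net_revenue_churn(100_000, 4_000, 1_000, 2_000)) # 0.03
```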

P1 (Implemented)

LTV and ARPU (metric_ltv_invoice + LtvInvoiceCube)
Expansion / contraction / reactivation MRR breakdown (movement types in metric_mrr_movement)
Trial conversion tracking (metric_trial, cohort-based funnel)
Customer segmentation — saved SegmentDef JSON, EAV customer_attribute table, compare-mode CROSS JOIN compilation. See Segmentation.
Web dashboard UI (frontend/)
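The movement types behind the MRR breakdown can be sketched as a small classifier over a customer's MRR before and after a change. The function name, its parameters, and the exact label strings are assumptions for illustration. How `metric_mrr_movement` actually stores movements is defined by the MRR metric itself.

```python
from typing import Optional


def classify_movement(
    previous_mrr: int, current_mrr: int, had_prior_subscription: bool
) -> Optional[str]:
    """Classify a change in one customer's MRR (in cents); None means no change."""
    if previous_mrr == current_mrr:
        return None
    if previous_mrr == 0:
        # Coming back from zero is a reactivation only if the customer paid before.
        return "reactivation" if had_prior_subscription else "new"
    if current_mrr == 0:
        return "churn"
    return "expansion" if current_mrr > previous_mrr else "contraction"


# (customer, previous MRR, current MRR, had a subscription at some earlier time)
changes = [
    ("cus_1", 0, 5000, False),
    ("cus_2", 5000, 8000, True),
    ("cus_3", 8000, 3000, True),
    ("cus_4", 3000, 0, True),
    ("cus_5", 0, 4000, True),
]

movements = {c: classify_movement(prev, curr, prior) for c, prev, curr, prior in changes}
print(movements)

# Waterfall identity: ending MRR equals starting MRR plus the sum of all deltas.
starting_mrr = sum(prev for _, prev, _, _ in changes)
ending_mrr = starting_mrr + sum(curr - prev for _, prev, curr, _ in changes)
print(starting_mrr, ending_mrr)  # 16000 20000
```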

P1 (Remaining)

Lago integration via direct PostgreSQL access (same-database mode)
Kill Bill integration
CAC computation
Data warehouse export

Non-Goals for V1

Payment processing
Revenue recovery / dunning
Board-ready financial reporting
CRM features
Multi-scenario planning
General-purpose BI

What's Next

Events — internal event schema and Kafka topics
Database — core tables, ER diagram, deployment topologies
Connectors — webhook translators (Stripe) and database connectors (Lago, Kill Bill)
Metrics — metric base class, built-in metrics (dual-mode)
Cubes & Query Algebra — declarative query building with cubes and composable fragments
API — FastAPI endpoints and CLI interface