Most SaaS companies do not have a metrics problem. They have a source-of-truth problem. The definitions are easy to look up. The failure is always in the implementation - in which system you trust, which event you count, and which edge cases you decided to ignore until an investor found them.
The Source-of-Truth Problem: Why Your Metrics Disagree
The definitions are not the problem. ARR, MRR, churn - these are well-documented, publicly benchmarked, widely cited. You can look up the formula in any SaaS handbook. The failure mode is different: companies trust the wrong system for financial metrics.
Pull ARR from Salesforce and you're counting booked revenue, not recognised revenue. Your "ARR" is inflated by 30-60 days because deals haven't gone live yet. Ask the Finance team the same question three months later and the number diverges because some of those deals never went live, or downgrades happened, or customers never paid.
Trust Mixpanel for churn and you're measuring user engagement (who logged in last month?) instead of contract renewal (who paid?). A customer can have zero logins but still be active. A customer can have high engagement but be on month-to-month with plans to leave. These are different failure modes. Investors will find the divergence during diligence.
This is an architecture problem. Which system owns which truth? What is the dependency chain? Fixing metrics means fixing source-of-truth assignments. At the stage where you're fundraising (Series A, headed to Series B), you're at 100-500 people, $3-20M ARR. Manual reconciliation does not scale. Automated pipelines become non-negotiable.
Part I: The Metrics Hierarchy - What Feeds What
1.1 The Four-Layer Stack: Events, CRM, ERP, Warehouse
Think of your metrics infrastructure as four layers, each authoritative for one thing:
Layer 1 - Event Data (Segment): Raw user and product events. Tracks what users do in the product. Not your financial source of truth. Used for: product analytics, funnel analysis, activation rates, feature adoption. Your product team lives here.
Layer 2 - CRM (Salesforce): Sales pipeline, deal stage, contract value, renewal dates. Tracks committed revenue and sales activity. Source of truth for: pipeline forecast, booked ARR. Not source of truth for recognised revenue - there's always a lag between signature and service start.
Layer 3 - ERP / Accounting (Xero): Invoiced amounts, payment status, cash received, COGS. Tracks financial obligations and cash. Source of truth for: cash collected, invoices issued, cost data. Not source of truth for GAAP revenue recognition - invoicing date is not recognition date.
Layer 4 - Data Warehouse (Redshift): The reconciliation layer. Takes the above three, resolves entity mismatches, applies recognition rules, produces the single defensible output. This is where ARR, MRR, churn, NRR, and LTV come to life.
The data flows like this:
Segment (events) ──────────────────────────────────────────────┐
Salesforce (CRM) ──► Stitch ──► S3 (raw) ──► Redshift (raw) ──► Python (transform) ──► Redshift (clean) ──► outputs
Xero (ERP/finance) ─────────────────────────────────────────────┘

Key rule: each layer is authoritative for one thing. Using Salesforce for financial reporting is a category error. Using Segment for cohort revenue analysis is a category error.
1.2 Where Segment and Mixpanel Fit (and Where They Don't)
What Segment does: Collects user events (page views, feature clicks, conversion events) from your product, enriches them with user properties, and fans them out to analytics tools. Excellent for product funnel and engagement analysis.
What Mixpanel does: Retention analysis (did users come back?), funnel visualisation (where do users drop off?), cohort behaviour (how did the March signup cohort behave?).
The critical distinction: Mixpanel retention is user/session-based. SaaS churn is contract/revenue-based. They answer different questions.
Common mistake: using Mixpanel "churned users" (stopped logging in) as the basis for revenue churn. These numbers will diverge whenever a customer reduces seats, downgrades, or goes silent before formally cancelling. You'll report 5% revenue churn based on Mixpanel, investors will recalculate from Xero contract cancellations and get 12%, and credibility dies.
The correct pattern: use Segment/Mixpanel for product health metrics (DAU, activation, feature adoption). Use Redshift (from Xero + Salesforce) for financial metrics (churn, NRR, LTV). Never mix the two.
Segment's warehouse destination feature is your integration pattern - Segment can write events directly to Redshift via its destination connector. This is correct: Segment → Redshift → financial metrics pipeline. Do not pull Segment data from the Mixpanel API for financial calculations.
1.3 Why Redshift Is the Financial System of Record
Redshift is the only layer with access to all three sources simultaneously. Entity resolution happens here - matching a Salesforce deal ID to a Xero invoice, normalising customer names, handling currency conversion. The entity resolution problem - where you discover that one customer has three different IDs across systems - is covered in depth in The Data Room That Helped Close Our Series B.
Immutability: raw tables are append-only, never modified. Every run is reproducible. A year from now, you can re-run the exact same query and get the same result. Try that with Salesforce - data changes, records get deleted, custom fields get renamed.
Auditability: every transformation is a SQL view or Python function with version history. When asked "how is this number calculated," the answer is a query, not a conversation. When methodology changes, there's a git commit explaining why.
The metrics layer in Redshift produces fact tables: fct_arr, fct_churn, fct_nrr, fct_unit_economics. Each table has a clearly defined grain (one row per customer per month). This is your system of record for investor reporting.
Part II: ARR and MRR - The Recognition Event Problem
2.1 The Three Wrong Ways to Recognise ARR
Wrong way 1: Salesforce "Closed Won" date - Recognises ARR at deal signature or deal stage change. Inflates ARR by counting deals not yet live. Diverges from Xero by the implementation lag (typically 20-40 days). During diligence, investors will ask why your February ARR is higher than your March ARR, and the answer "they hadn't gone live yet" is not credible.
Wrong way 2: Xero payment date - Recognises ARR at cash receipt. Understates ARR by delaying recognition until payment clears. For net-60 contracts, this introduces a 60-day lag. Your board pack shows March ARR that doesn't include February's cash because it cleared in April. Investors hate this because it obscures the underlying business.
Wrong way 3: Invoice date - Closer but still wrong for advance invoicing. A customer invoiced on Jan 1 for a contract starting Feb 1 should not have ARR recognised in January. But this is what happens if you key off Xero's invoice date.
All three are common. All three are defensible in isolation. None survives investor scrutiny during diligence.
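A toy calculation makes the divergence concrete. This sketch (dates, field names, and deal value are all illustrative, not a real schema) takes one deal and shows that each recognition rule lands the same ARR in a different month:

```python
from datetime import date

# One hypothetical deal, with the four candidate recognition dates.
deal = {
    "closed_won": date(2024, 1, 10),    # Salesforce stage change
    "invoice_date": date(2024, 1, 1),   # Xero invoice, issued in advance
    "payment_date": date(2024, 3, 15),  # net-60 cash receipt
    "service_start": date(2024, 2, 1),  # entitlement goes live
}

def arr_month(recognition_date: date) -> str:
    """Month in which this deal's ARR first appears under a given rule."""
    return recognition_date.strftime("%Y-%m")

rules = {rule: arr_month(d) for rule, d in deal.items()}
# Closed Won and invoice date put the ARR in January, service start in
# February, payment date in March: three different ARR curves, one deal.
```

Multiply this by every deal in a quarter and the three rules produce three materially different ARR trajectories for the same business.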
2.2 The Correct Recognition Event and How to Encode It
Recognition event: service delivery commencement date - the date the customer's access to the product is activated, their subscription goes live, their entitlement begins.
This is the GAAP-adjacent approach per ASC 606. Performance obligation is satisfied when service is delivered, not when payment is received or contract signed.
In the data model: stg_subscriptions table should have a service_start_date column. This is the recognition date. If Salesforce does not track it natively, you need to either enrich the CRM data or derive it from Segment events (first login / first activation event can proxy service start).
Practical note: for customers on monthly plans, every renewal is a new recognition event. For annual contracts, recognition begins at service start and is spread over the contract period. In your fact table, you'll have multiple rows per customer (one per renewal period), but the amount is the periodic value, not annualised.
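As a sketch of that fact-table shape (function and column names here are illustrative, not the pipeline's actual schema), an annual contract expands into twelve periodic rows carrying the periodic value:

```python
from datetime import date

def recognition_rows(customer_id, service_start, term_months, annual_value):
    """Expand one contract into one row per recognition period.
    Each row carries the periodic value (annual_value / 12), not the ACV."""
    rows = []
    for i in range(term_months):
        m = service_start.month - 1 + i
        rows.append({
            "customer_id": customer_id,
            "period_start": date(service_start.year + m // 12, m % 12 + 1, 1),
            "amount": annual_value / 12.0,
        })
    return rows

# A $120K annual contract going live in February 2024:
rows = recognition_rows("acme", date(2024, 2, 1), 12, 120_000)
# twelve rows of $10K each, 2024-02 through 2025-01
```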
2.3 ARR Waterfall SQL: New, Expansion, Churn, Net
Grain: one row per customer per month, classified into new, expansion, contraction, churn, resurrected.
WITH monthly_arr AS (
    -- One row per customer per month they were live (the stated grain).
    -- Assumes stg_subscriptions carries service_start_date and
    -- service_end_date (NULL while active), and dim_months is a date
    -- spine with one row per calendar month.
    SELECT
        s.customer_id,
        m.month_start AS arr_month,
        SUM(s.annual_contract_value) AS arr
    FROM stg_subscriptions s
    JOIN dim_months m
        ON m.month_start >= DATE_TRUNC('month', s.service_start_date)
        AND m.month_start < COALESCE(s.service_end_date, CURRENT_DATE)
    GROUP BY 1, 2
),
arr_changes AS (
    -- FULL OUTER JOIN, not LEFT JOIN: churned customers have no
    -- current-month row, and a LEFT JOIN from the current month would
    -- silently drop them, leaving churned_arr permanently at zero.
    SELECT
        COALESCE(curr.customer_id, prev.customer_id) AS customer_id,
        COALESCE(curr.arr_month, DATEADD(month, 1, prev.arr_month)) AS arr_month,
        COALESCE(curr.arr, 0) AS current_arr,
        COALESCE(prev.arr, 0) AS prior_arr
    FROM monthly_arr curr
    FULL OUTER JOIN monthly_arr prev
        ON curr.customer_id = prev.customer_id
        AND curr.arr_month = DATEADD(month, 1, prev.arr_month)
)
SELECT
    arr_month,
    -- resurrected customers re-enter here as new ARR; split them out if material
    SUM(CASE WHEN prior_arr = 0 AND current_arr > 0 THEN current_arr ELSE 0 END) AS new_arr,
    SUM(CASE WHEN prior_arr > 0 AND current_arr > prior_arr THEN current_arr - prior_arr ELSE 0 END) AS expansion_arr,
    SUM(CASE WHEN prior_arr > 0 AND current_arr < prior_arr AND current_arr > 0 THEN current_arr - prior_arr ELSE 0 END) AS contraction_arr,
    SUM(CASE WHEN prior_arr > 0 AND current_arr = 0 THEN -prior_arr ELSE 0 END) AS churned_arr
FROM arr_changes
GROUP BY 1
ORDER BY 1;

This query is the board pack ARR waterfall. Every investor will ask for this view. The numbers should tie back to Xero invoiced amounts (within a small tolerance for multi-currency FX). If they don't, you have an entity resolution problem - a Salesforce deal is not matching to the right Xero invoice.
Note on multi-currency: normalise to USD using daily FX rates from an open-source dataset before aggregation. Amount matching across currencies requires the same normalisation step as entity resolution.
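A minimal sketch of the normalisation step, with hypothetical rates (in the real pipeline the rate table lives in Redshift, keyed by date and currency):

```python
# Hypothetical daily FX rates, keyed by (date, currency). In the pipeline
# this table is loaded into Redshift from a rates dataset.
FX_TO_USD = {
    ("2024-03-01", "EUR"): 1.08,
    ("2024-03-01", "GBP"): 1.26,
    ("2024-03-01", "USD"): 1.00,
}

def to_usd(amount, currency, on_date):
    """Normalise a single amount to USD at that day's rate, before any
    aggregation - summing mixed currencies first is unrecoverable."""
    return round(amount * FX_TO_USD[(on_date, currency)], 2)

invoices = [(10_000, "EUR", "2024-03-01"), (5_000, "GBP", "2024-03-01")]
total_usd = sum(to_usd(a, c, d) for a, c, d in invoices)  # 17100.0
```

The same rule applies in the SQL layer: join to the rate table and convert line by line, then aggregate.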
2.4 MRR Consistency: Why You Calculate It from ARR, Not Separately
MRR = ARR / 12, always. Not independently calculated.
Why: if you calculate MRR by summing monthly invoices and ARR by annualising contracts, they will diverge. Annual contracts invoiced upfront create a spike in "monthly" revenue that does not reflect the recurring picture. You'll have months where "invoiced MRR" is 3x "recognised MRR" because the annual invoices cleared, and investors will ask why.
In code: mrr = arr / 12.0 - one line. Never a separate pipeline.
Practical note: some CFOs want to see invoiced MRR separately (actual cash flow) alongside recognised MRR (ARR/12). These are different metrics serving different purposes. Build both, label them clearly, never present one as the other.
Part III: Churn - The Metric That Hides the Most
3.1 Logo Churn vs. Revenue Churn vs. NRR: Three Different Stories
Logo churn = count of customers who did not renew / count at start of period. Useful for: customer success tracking, ICP validation, support load forecasting. If you lose 10 customers out of 100, logo churn is 10%.
Revenue churn = ARR lost from churned customers / ARR at start of period. Useful for: financial forecasting, board reporting, investor comparables. If those 10 customers represented $500K ARR and you started at $10M, revenue churn is 5%.
NRR (Net Revenue Retention) = ARR at end of period from customers who existed at start / ARR at start from those same customers. Includes expansion, contraction, and churn. The single most important retention metric for investors.
The story each tells: a company can have 15% logo churn (losing many small customers) but 5% revenue churn (small customers were low-value) and 110% NRR (expansion from large customers outpaces everything). That is a structurally strong business, but the logo churn number looks alarming in isolation.
Present all three. If you only show one, investors will ask for the others and wonder why you picked that one.
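All three metrics fall out of the same two snapshots of the book. The sketch below uses a deliberately tiny, illustrative customer set to reproduce the pattern above: alarming logo churn, mild revenue churn, strong NRR:

```python
def retention_metrics(start_arr, end_arr):
    """start_arr / end_arr: {customer_id: ARR} at period start and end.
    NRR counts only customers that existed at the start of the period."""
    starters = set(start_arr)
    churned = {c for c in starters if end_arr.get(c, 0) == 0}
    start_total = sum(start_arr.values())
    logo_churn = len(churned) / len(starters)
    revenue_churn = sum(start_arr[c] for c in churned) / start_total
    nrr = sum(end_arr.get(c, 0) for c in starters) / start_total
    return logo_churn, revenue_churn, nrr

# Two small logos churn; the two large accounts expand; one new logo
# lands (and is correctly excluded from NRR).
start = {"a": 25_000, "b": 25_000, "big1": 475_000, "big2": 475_000}
end = {"big1": 575_000, "big2": 525_000, "newco": 80_000}
logo, revenue, nrr = retention_metrics(start, end)
# logo churn 50%, revenue churn 5%, NRR 110% - three very different stories
```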
Benchmark: NRR of 105-115% is the Series B expectation from top-tier investors. Below 105% and they'll scrutinise your expansion story. Above 115% and they'll want to understand whether it reflects broad-based expansion or just concentration in a handful of large accounts.
3.2 Cohort Retention: The SQL Pattern That Gets It Right
Cohort retention table: rows = cohort month (month of first subscription), columns = months since acquisition (0, 1, 2... 24), values = % of original cohort ARR retained.
This is the most visually powerful investor table. It shows whether retention improves over time, whether early cohorts held up, and whether expansion is real or just new customer volume.
-- monthly_arr is the customer-month ARR relation from the waterfall
-- query above, persisted as a view so both queries share one definition.
WITH cohorts AS (
    SELECT
        customer_id,
        MIN(DATE_TRUNC('month', service_start_date)) AS cohort_month
    FROM stg_subscriptions
    GROUP BY 1
),
cohort_arr AS (
    SELECT
        c.cohort_month,
        DATEDIFF('month', c.cohort_month, ma.arr_month) AS months_since_acquisition,
        SUM(ma.arr) AS retained_arr
    FROM cohorts c
    JOIN monthly_arr ma ON c.customer_id = ma.customer_id
    GROUP BY 1, 2
),
cohort_initial AS (
    SELECT cohort_month, SUM(retained_arr) AS initial_arr
    FROM cohort_arr
    WHERE months_since_acquisition = 0
    GROUP BY 1
)
SELECT
    ca.cohort_month,
    ca.months_since_acquisition,
    ca.retained_arr / ci.initial_arr AS retention_pct
FROM cohort_arr ca
JOIN cohort_initial ci ON ca.cohort_month = ci.cohort_month
ORDER BY 1, 2;

This output feeds both Excel (pivot table - board pack heatmap) and Tableau/Qlik (cohort visualisation). When displayed as a heatmap, cohorts with declining retention show as a gradient fade (bright green at month 0, fading to yellow/orange). Cohorts with strong expansion show as a uniform or brightening colour.
3.3 The Edge Cases That Break Churn Calculations
Pause / suspension: Customer temporarily suspends service. Do not count as churn. Maintain status in stg_subscriptions as 'paused'. Handle separately in NRR calculation (exclude from churned_arr, include as contraction if partially paused).
Downgrades at renewal: Customer renews at lower ARR. This is contraction, not churn. Logo is retained; revenue is not fully retained. Show in the ARR waterfall as negative expansion / contraction line. If a customer goes from $10K to $5K ARR, that's -$5K in the contraction bucket, not churn.
Multi-year contracts: Customer on a 3-year deal. They are not "renewing" annually. Treat as active for the full contract period. Churn event is at the end of the committed term if they do not renew. During the contract period, they contribute to NRR even if they're silent.
Acquired customers: You acquired a company. Their customers are new to your ARR. Do not blend acquired ARR into organic ARR. Track separately - investors will ask about organic vs. acquired growth. In your fact tables, add an acquired_cohort flag and segment your reporting.
Grace periods: Customer goes past renewal date but has not formally churned. Define a grace period (typically 30-60 days). After that, mark as churned. Document this threshold explicitly in your methodology - "customers are marked as churned if they do not renew within 30 days of contract expiry."
These edge cases need to be defined as business rules in the transformation layer, not handled ad hoc. Each rule should be a comment in the SQL or a function docstring. When you explain your metrics to investors, you'll be explaining these edge cases. They need to be consistent, documented, and defensible.
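Encoded as a business rule, the grace-period and pause logic might look like this (a sketch - the threshold and status names are assumptions to be replaced by your own documented methodology):

```python
from datetime import date, timedelta

GRACE_PERIOD_DAYS = 30  # the documented threshold from your methodology doc

def subscription_state(status: str, renewal_date: date, as_of: date) -> str:
    """Paused is never churned; a lapsed renewal gets a grace period
    before it counts as churn. One rule, one place, one constant."""
    if status == "paused":
        return "paused"
    if as_of <= renewal_date:
        return "active"
    if as_of <= renewal_date + timedelta(days=GRACE_PERIOD_DAYS):
        return "grace"
    return "churned"

as_of = date(2024, 6, 1)
assert subscription_state("paused", date(2024, 4, 1), as_of) == "paused"
assert subscription_state("active", date(2024, 5, 20), as_of) == "grace"
assert subscription_state("active", date(2024, 4, 1), as_of) == "churned"
```

In the actual pipeline the same rule lives in the transformation layer (SQL CASE expression or Python function), so every downstream metric applies it identically.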
Part IV: LTV and CAC - The Unit Economics Stack
4.1 CAC: Which Costs to Include and How to Segment
Formula: Total Sales and Marketing spend / New customers acquired (same period)
The components of "total S&M spend" that must be documented:
Included: AE salaries + commissions, SDR salaries, marketing team salaries, paid ads (Google, LinkedIn, etc.), marketing tools (HubSpot, marketing automation), events, conferences.
Excluded: Account management (post-sale), customer success (retention), product marketing if primarily focused on retention, R&D, G&A.
Blended vs. new-only CAC: Blended includes all customers (new + expansion); new-only covers only new logo acquisition. Investors want new-only CAC. They will derive it from S&M spend / new logos, so make sure your definition matches theirs.
Segment by customer type: SMB CAC is typically 1-3x MRR. Enterprise CAC can be 10-24x MRR. Blending them hides both the efficiency of the SMB motion and the risk concentration in enterprise. If your blended CAC is $50K but SMB CAC is $8K and Enterprise CAC is $150K, you need to show both segments.
CAC is a period-level calculation, not a per-customer calculation. Pull from finance data (total S&M cost per quarter from Xero) divided by count of new customers from stg_subscriptions (customers whose cohort_month falls in the same quarter).
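A sketch of that period-level calculation with illustrative spend and logo counts, showing why the blended number hides both motions:

```python
def new_only_cac(sm_spend: float, new_logos: int) -> float:
    """Period-level CAC: total S&M spend divided by new logos, same period."""
    return sm_spend / new_logos

# Illustrative quarter: $1M total S&M, 50 SMB logos, 4 enterprise logos.
smb_cac = new_only_cac(400_000, 50)        # 8,000
enterprise_cac = new_only_cac(600_000, 4)  # 150,000
blended_cac = new_only_cac(1_000_000, 54)  # ~18,500 - matches neither motion
```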
4.2 LTV: The Simplified Formula and When to Use It
Simplified formula: LTV = (ACV x Gross Margin %) / Logo Churn Rate
Why simplified: assumes constant churn rate and no time-value discounting. Appropriate for Series B where the forecast horizon is 3-5 years. For longer horizons, use discounted LTV (NPV of future margin contributions).
Gross margin in LTV: use fully-loaded gross margin (COGS including hosting, customer support, onboarding). Not just direct infrastructure cost. Investors will scrutinise this number because it drives the LTV/CAC ratio. A company claiming 85% gross margin that really has 65% once you add customer success headcount loses credibility.
LTV by segment: enterprise LTV is typically 5-20x SMB LTV because ACV is higher and churn is lower. Show separately to make the economics of each motion visible. If your blended LTV is $250K but SMB LTV is $80K and Enterprise LTV is $600K, the enterprise motion is carrying your unit economics.
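In code the simplified formula is one line; the inputs below are illustrative and produce the kind of SMB/enterprise spread described above:

```python
def simple_ltv(acv: float, gross_margin: float, logo_churn: float) -> float:
    """Simplified LTV: constant churn, no discounting, as described above."""
    return acv * gross_margin / logo_churn

# Illustrative segment inputs: enterprise ACV 10x SMB, churn half of SMB's.
smb_ltv = simple_ltv(acv=12_000, gross_margin=0.75, logo_churn=0.20)
ent_ltv = simple_ltv(acv=120_000, gross_margin=0.75, logo_churn=0.10)
# roughly 45K vs 900K - the 20x spread is why blended LTV obscures the motion
```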
4.3 Payback Period: The Number That Drives Fundraising Conversations
Formula: CAC / (MRR per customer x Gross Margin %)
Expressed in months.
Benchmark: under 12 months is strong, 12-18 months is acceptable, over 24 months requires strong NRR to compensate. Most investors expect to see payback under 18 months at Series B.
Practical note: investors recalculate this themselves. Your payback number will only be trusted if it is derived from the same gross margin and CAC definitions they will use. Show the formula inputs, not just the output. If you claim 14 months of payback but your inputs are CAC $50K, MRR per customer $3.5K, and 75% gross margin, an investor will compute $50K / ($3.5K x 0.75) ≈ 19 months - and the five-month gap becomes a credibility problem before you've finished the meeting.
Segmented payback: SMB payback is often faster (lower ACV but lower CAC). Enterprise payback is longer but LTV is much higher. Presenting blended payback at Series B is a red flag - it suggests you have not segmented your GTM motion.
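A sketch of the recalculation investors will do, using the worked inputs from the practical note above:

```python
def payback_months(cac: float, mrr_per_customer: float, gross_margin: float) -> float:
    """CAC / (MRR per customer x gross margin), expressed in months."""
    return cac / (mrr_per_customer * gross_margin)

# The inputs from the practical note: CAC $50K, MRR $3.5K, 75% margin.
months = payback_months(50_000, 3_500, 0.75)  # ~19 months, not 14
```

Run the same function per segment (SMB CAC and MRR, enterprise CAC and MRR) and present both numbers alongside the inputs.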
4.4 Gross Margin: The Input Everyone Gets Wrong
This is the input that most CTOs underestimate. Gross margin for SaaS is not just cloud infrastructure costs.
COGS components to include: AWS/GCP/Azure spend, third-party API costs (if directly tied to revenue delivery), customer success headcount (the team that keeps customers live), onboarding / implementation headcount, support headcount.
What to exclude: R&D / product / engineering (above the line), sales and marketing, G&A.
Common error: reporting 80% gross margin because only infrastructure cost is in COGS, then having investors recalculate at 65% once they include customer success headcount. This creates a credibility gap that is very hard to recover from. Better to be conservative and show 72%, then pleasantly surprise them if you beat it.
Gross margin in Redshift: derive from Xero COGS accounts. Map Xero account codes to COGS buckets. This requires a configuration table (account_code → cost_category) and a quarterly cost aggregation query. When gross margin changes (you hire a customer success team, for example), that change is traceable to a P&L line.
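A sketch of the mapping step - account codes and amounts here are hypothetical, standing in for the configuration table and the quarterly aggregation query:

```python
# Hypothetical account-code configuration; the real mapping comes from your
# Xero chart of accounts and lives in a config table, not in code.
COST_CATEGORY = {
    "500": "hosting", "510": "third_party_api",
    "520": "customer_success", "530": "support",
    "600": "rnd", "700": "sales_marketing",  # mapped, but not COGS
}
COGS_BUCKETS = {"hosting", "third_party_api", "customer_success", "support"}

def gross_margin(revenue: float, cost_lines) -> float:
    """cost_lines: [(account_code, amount)]. Only COGS buckets count."""
    cogs = sum(amount for code, amount in cost_lines
               if COST_CATEGORY.get(code) in COGS_BUCKETS)
    return (revenue - cogs) / revenue

gm = gross_margin(1_000_000, [("500", 120_000), ("520", 150_000),
                              ("530", 80_000), ("600", 400_000)])
# 0.65 fully loaded, vs the 0.88 you'd claim counting hosting alone
```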
Part V: The Output Layer - Redshift to Excel to Tableau
5.1 The Python Export Pattern: Redshift to Excel
Pattern: Python script runs on ECS, triggered by Airflow, after the metrics layer completes.
Pulls from fact tables in Redshift (fct_arr, fct_churn, fct_unit_economics).
Uses openpyxl or xlsxwriter to write to a pre-built Excel template with named ranges.
Output sheets:
ARR_Waterfall: new / expansion / contraction / churn / net, monthly, last 24 months
Cohort_Retention: cohort x months-since-acquisition heatmap
Unit_Economics: CAC, LTV, payback by segment (SMB / mid-market / enterprise)
Burn_Runway: gross burn, net burn, runway at current and projected burn
Named ranges in the Excel template allow Finance to link the board pack model to the output file without rebuilding formulas. The Finance team opens their board pack model, updates a reference to the export file, and all downstream calculations pull the latest metrics. Automation without breaking workflows.
Saves to S3 (accessible to Finance via signed URL or mounted drive). Daily schedule or on-demand run. Versioning: keep the last 12 months of snapshots so you can rerun board packs from prior periods if needed.
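The snapshot-versioning convention can be as simple as a date-stamped key plus a pruning rule (bucket and prefix names below are illustrative, not a real location):

```python
from datetime import date

def snapshot_key(run_date: date) -> str:
    """Date-stamped S3 key so any board pack can be rebuilt from the
    exact export it shipped with."""
    return (f"s3://finance-exports/board-pack/{run_date:%Y-%m}/"
            f"metrics_{run_date:%Y-%m-%d}.xlsx")

def prune(keys, keep: int = 12):
    """Keep the newest N snapshots; date-stamped keys sort lexically."""
    return sorted(keys)[-keep:]

key = snapshot_key(date(2024, 3, 31))
# s3://finance-exports/board-pack/2024-03/metrics_2024-03-31.xlsx
```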
5.2 When to Use Tableau/Qlik vs. Excel
Excel: board packs, investor data rooms, CFO/CEO forward-looking modelling. Finance team owns. Snapshot-based (not live). Appropriate for data that needs to be locked at a point in time. When you send a data room to investors, every number in it comes from a specific Excel export on a specific date. That date is auditable.
Tableau/Qlik: operational dashboards for CS, Sales, Product. Live or near-live (daily refresh). Used for: cohort health monitoring, segment performance, pipeline analytics. Not for investor reporting.
The mistake: building a Tableau dashboard and screenshotting it for the board pack. This creates a PDF that cannot be audited. Investors cannot verify the underlying data. They cannot re-run the calculation if they want to challenge your methodology. Avoid this completely.
The correct pattern: Excel for investor-facing outputs (auditable, versioned, formula-traceable). Tableau/Qlik for operational teams (live, flexible, fast). Both sourced from the same Redshift fact tables, so the numbers always agree.
In practice: the same cohort retention query feeds both. Redshift → Python → Excel for the board pack. Redshift → Tableau extract → dashboard for the CS team. One source of truth, two consumption patterns.
5.3 The Metrics Audit Trail: Making Every Number Traceable
Every metric in the output should be traceable to: the raw source data, the transformation logic, and the output query.
In practice: when an investor asks "how do you calculate NRR," the answer is a git commit hash pointing to a Python function with a docstring, which points to a SQL view, which points to a raw Redshift table, which points to a Stitch connector pulling from Xero.
Methodology documentation: a short document (1-2 pages) that maps each metric to its source, its recognition rule, and its edge case handling. This lives in the data room alongside the metric outputs. Example:
"NRR is calculated from fct_nrr (defined in schema version 2.3, view query in commit abc123). Customers are included if they had active subscriptions in both the prior and current period. Paused subscriptions are excluded. Multi-year contracts are split into annual tranches for recognition. Acquired customers (marked with acquired_cohort flag) are excluded from organic NRR."
Version control: all SQL transforms and Python scripts are in git. When a methodology changes, there is a commit with a message explaining why. This creates an audit trail that is also an institutional memory. "Exclude acquired cohorts from organic NRR" is not a decision made in a meeting and forgotten - it's a commit message that future team members can read.
Conclusion: Trust Is Earned at Implementation, Not Definition
Most metric failures are implementation failures, not definition failures. The definitions exist. The failure is always in execution: which system owns the truth, which events you count, which edge cases you documented vs. ignored.
The hierarchy matters: Segment and Mixpanel for product behaviour, Salesforce for pipeline, Xero for cash, Redshift for truth. Anything else creates divergence that investors will find.
The edge cases are where the real engineering lives: churn grace periods, multi-year contracts, acquired cohorts, cross-currency normalisation. Handle them consistently. Document them. Make them auditable.
The output layer matters as much as the calculation layer. An Excel file that Finance can link into, sourced from a Redshift query that is version-controlled, is worth more than a Tableau dashboard that cannot be audited. When you're under diligence and an investor asks to verify a number, you need to be able to show them the query that produced it.
For CTOs: owning this stack means owning the company's number credibility. Finance can define the metrics; engineering has to make them trustworthy at implementation. This is the difference between a company that passes diligence and one that gets asked to "restate" their metrics during term sheet negotiation.
Preparing for a Series B raise and not confident in your revenue data? I help Series A and B-stage engineering teams build the data infrastructure behind defensible SaaS numbers. Schedule a consultation →