Cohort Analysis
What is Cohort Analysis for Retention?
Cohort Analysis groups customers by a shared start event (e.g., first purchase month, first product, or acquisition channel) and tracks their behavior over time. It reveals when customers churn or repeat and which cohorts are healthiest—so you can target retention plays where they matter most.
Formulas / Metrics (core types):
- Retention Rateₜ: Active customers in cohort at time t ÷ total customers in cohort.
- Churn Rateₜ: 1 − Retention Rateₜ.
- Repeat Purchase Rate (RPRₜ): Customers with ≥2 orders by time t ÷ total customers in cohort.
- Time to Second Order (T2O): Median days from first to second purchase within the cohort.
- LTVₜ (Cohort LTV): Cumulative cohort revenue by time t ÷ total customers in cohort (i.e., revenue per acquired customer).
- Reorder Interval: Median days between consecutive purchases (helps schedule replenishment).
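As a rough illustration of how these definitions turn into a calculation, the Python sketch below computes RPRₜ, LTVₜ, and median T2O for a single cohort. The order log, the `cohort_metrics` name, and the tuple layout are hypothetical, and T2O is taken only over customers who have already placed a second order (one possible convention, not the only one).

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical order log for one cohort: (customer_id, order_date, net_revenue_eur)
ORDERS = [
    ("a", date(2024, 5, 1), 40.0),
    ("a", date(2024, 5, 25), 45.0),
    ("b", date(2024, 5, 3), 42.0),
    ("c", date(2024, 5, 10), 38.0),
    ("c", date(2024, 6, 29), 50.0),
    ("d", date(2024, 5, 20), 41.0),
]


def cohort_metrics(orders, horizon_days):
    """Return RPR_t, LTV_t and median T2O for one cohort at a day horizon t."""
    by_customer = defaultdict(list)
    for customer_id, order_date, revenue in orders:
        by_customer[customer_id].append((order_date, revenue))

    repeaters = 0
    revenue_in_window = 0.0
    days_to_second = []
    for history in by_customer.values():
        history.sort()  # chronological order
        first_date = history[0][0]
        # Keep only orders placed within `horizon_days` of the first purchase.
        window = [(d, r) for d, r in history if (d - first_date).days <= horizon_days]
        revenue_in_window += sum(r for _, r in window)
        if len(window) >= 2:
            repeaters += 1
        # T2O uses the full history, not just the window.
        if len(history) >= 2:
            days_to_second.append((history[1][0] - first_date).days)

    cohort_size = len(by_customer)
    return {
        "rpr": repeaters / cohort_size,
        "ltv": revenue_in_window / cohort_size,
        "t2o_median_days": median(days_to_second) if days_to_second else None,
    }


d30 = cohort_metrics(ORDERS, horizon_days=30)
d60 = cohort_metrics(ORDERS, horizon_days=60)
print(d30)
print(d60)
```

Measuring every customer from their own first purchase date, rather than from a calendar date, is what keeps cohorts comparable on elapsed time.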
Key idea: Treat each cohort as a mini lifecycle. Compare cohorts (by month/channel/product) to spot decays and intervene early.
Why does it matter?
- Targeted retention: Improves the cohorts that actually drive long-term value instead of blanket discounts.
- Health diagnostics: Pinpoints when retention slips (e.g., D30 vs D60) and which sources/products cause it.
- Efficient growth: Small gains in repeat rate compound LTV and stabilize blended ROAS.
KPIQ Perspective
- User view: “Retention is slipping—when do customers churn, and which acquisition cohorts are healthiest?”
- Technical view: KPIQ builds acquisition cohorts by week/month and channel/campaign (optionally by first product/category when available). It computes repeat/active curves (D30/D60/D90), time-to-second-order, and cohort LTV (gross or net of returns where available). It then benchmarks cohorts against your baseline, surfaces abnormal decay, ties it to source or product mix, and recommends targeted retention plays (post-purchase flows, replenishment reminders, win-back timing). It also flags data gaps (incomplete UTMs, missing refund/cancellation fields, inconsistent seasonality windows).
Actionable Insights
- ✅ Define your cohort keys: acquisition month, channel/campaign, and first-product/category.
- ✅ Track D30/D60/D90 repeat and active-user rates; monitor T2O for replenishment timing.
- ✅ Normalize metrics with net orders/revenue (exclude cancellations/returns so repeat rates and LTV are not inflated).
- ✅ Segment by AOV bands and first product type—offer and cadence differ by ticket size.
- ✅ Automate post-purchase flows (how-to, UGC ask, reorder reminders) aligned to typical reorder intervals.
- ✅ Run win-back tests on cohorts with early decay; measure lift vs. holdout.
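For the last item, "lift vs. holdout" can be read as the difference in repeat rate between customers who received the win-back treatment and a randomly withheld control group. The sketch below is a minimal version of that comparison; the function name and all counts are hypothetical, and it does not test for statistical significance.

```python
def holdout_lift(treated_n, treated_repeaters, holdout_n, holdout_repeaters):
    """Compare repeat rates between a treated group and a holdout group."""
    treated_rate = treated_repeaters / treated_n
    holdout_rate = holdout_repeaters / holdout_n
    diff = treated_rate - holdout_rate
    return {
        "absolute_lift_pp": diff * 100,  # percentage points
        "relative_lift": diff / holdout_rate if holdout_rate else None,
        # Repeat customers attributable to the treatment, scaled to the treated group.
        "incremental_repeaters": diff * treated_n,
    }


# Hypothetical win-back test: 900 treated customers, 100 held out.
result = holdout_lift(treated_n=900, treated_repeaters=117,
                      holdout_n=100, holdout_repeaters=10)
print(result)
```

With a holdout this small the estimate is noisy, so it is worth pairing the lift with an interval before acting on it.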
Practical Example
Scenario (new customer cohorts): AOV €42.
- May (Meta Prospecting): N=1,000 · D30 repeat=16% · D60 repeat=24% · T2O median=27 days · LTV₉₀=€58
- June (Google Non-brand): N=800 · D30 repeat=13% · D60 repeat=20% · T2O median=30 days · LTV₉₀=€52
- July (TikTok): N=900 · D30 repeat=10% · D60 repeat=17% · T2O median=33 days · LTV₉₀=€49
Interventions
- May: Day-21 “how to use + UGC ask” email, Day-35 replenishment reminder, product-bundle offer.
- June: Subscription nudge after second session; shipping threshold reminder in win-back.
- July: First-order bundle coupon + creator video in post-purchase sequence.
What-if (Retention lift → revenue impact)
If D60 repeat improves by +3 pp for May and +2 pp for June, and each additional repeat customer places one extra order at the €42 AOV:
- May: +3 pp × 1,000 = +30 orders → +€1,260 revenue (30 × €42)
- June: +2 pp × 800 = +16 orders → +€672 revenue (16 × €42)
- Total incremental revenue: €1,932 (≈ +€1,063 contribution at a 55% margin)
Takeaway: Small D60 gains in the right cohorts materially lift LTV and stabilize future revenue.
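The what-if arithmetic can be reproduced with a few lines of Python, which makes it easy to try other lift values. This is only a sketch: the function and variable names are illustrative, and it assumes one extra order per additional repeat customer at the scenario's AOV.

```python
AOV_EUR = 42.0   # average order value from the scenario
MARGIN = 0.55    # contribution margin from the scenario


def incremental_revenue(cohort_size, lift_pp, aov=AOV_EUR):
    """Extra orders and revenue from a repeat-rate lift given in percentage points."""
    extra_orders = cohort_size * lift_pp / 100
    return extra_orders, extra_orders * aov


may_orders, may_revenue = incremental_revenue(1000, 3)
june_orders, june_revenue = incremental_revenue(800, 2)
total_revenue = may_revenue + june_revenue
contribution = total_revenue * MARGIN

print(may_orders, may_revenue)      # May cohort
print(june_orders, june_revenue)    # June cohort
print(total_revenue, round(contribution))
```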
Foundations
Cohorts group users by a shared start (e.g., first purchase month) and evaluate outcomes as functions of time since that start. Retention curves (survival functions) describe the probability that a customer is still active at a given time since start, while hazard curves show the instantaneous churn risk at that time among customers who are still active.
Key Concepts
- Acquisition vs behavior cohorts: Group by first purchase (acquisition) or by behaviors (e.g., first category).
- Survival & hazard: Retentionₜ and churn risk evolve over time; early peaks in churn risk hint at onboarding gaps.
- Alignment & seasonality: Compare cohorts on elapsed time (D30/D60) and control for seasonal promos.
- Netting returns: Use net revenue/orders per cohort; returns skew LTV if ignored.
- Attribution influence: Acquisition source affects cohort health—triangulate with attribution views.
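The link between a retention (survival) curve and churn hazard can be made concrete in discrete time: the hazard for a period is the share of customers still active at the start of that period who churn during it, i.e. 1 − Sₜ / Sₜ₋₁. The sketch below applies this to a hypothetical retention curve; the values and the function name are illustrative only.

```python
def discrete_hazard(survival):
    """Per-period churn hazard from a retention curve.

    `survival` lists the share of the cohort still active at each checkpoint,
    starting with 1.0 at time zero. Values must be positive.
    """
    return [1 - curr / prev for prev, curr in zip(survival, survival[1:])]


# Hypothetical retention curve: start, D30, D60, D90
retention = [1.00, 0.60, 0.48, 0.432]
hazard = discrete_hazard(retention)

for label, h in zip(["D0-D30", "D30-D60", "D60-D90"], hazard):
    print(f"{label}: {h:.0%} of still-active customers churn")
```

In this example the hazard is highest in the first period, which is the "early peak" pattern that points to onboarding rather than long-term fit.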
Advanced Methods
- Kaplan–Meier / Cox models: Non-parametric (Kaplan–Meier) and semi-parametric (Cox) survival analysis for retention curves and churn drivers.
- BG/NBD & Pareto/NBD: Probabilistic repeat-transaction models to forecast LTV by cohort.
- Uplift modeling: Identify which cohorts benefit most from win-back/replenishment interventions.
- Bayesian shrinkage: Stabilize small cohorts by partially pooling toward portfolio averages.
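To give a feel for the first of these methods, here is a minimal Kaplan–Meier estimator in plain Python. It assumes a churn definition has already been chosen (for example, no order within some lapse window, which is an assumption and not something this article specifies). Customers who are still active when observation ends are treated as censored. The data and the function name are hypothetical; for real analyses a dedicated survival library is the safer choice.

```python
from collections import Counter


def kaplan_meier(durations, churned):
    """Return [(t, S(t))] at each time t where at least one churn was observed.

    durations: days each customer was observed since their first purchase.
    churned:   True if the customer churned at that time, False if censored
               (still active when observation ended).
    """
    churn_counts = Counter(t for t, c in zip(durations, churned) if c)
    exit_counts = Counter(durations)  # churned or censored at time t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = churn_counts.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]  # remove everyone who left the risk set at t
    return curve


# Hypothetical cohort of 8 customers
durations = [30, 30, 60, 60, 90, 90, 90, 90]
churned = [True, False, True, True, False, True, False, False]

curve = kaplan_meier(durations, churned)
for t, s in curve:
    print(f"day {t}: estimated retention {s:.3f}")
```

Because censored customers leave the risk set without counting as churn, younger cohorts with less history can be compared with older ones without biasing retention downward.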
Common Pitfalls
- Mixing calendar time with cohort time (misleading comparisons).
- Ignoring cancellations/returns (inflated LTV and false “wins”).
- Comparing cohorts with different promo intensity or stockouts.
- Overreacting to small cohorts (wide CIs, noisy signals).
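The last pitfall is easier to judge with an interval around each cohort's repeat rate. The sketch below uses the Wilson score interval, one common choice for proportions, to compare the May cohort from the practical example (24% D60 repeat on N=1,000, i.e. 240 repeaters) with a hypothetical 50-customer cohort at the same rate. The function name and the small cohort are illustrative assumptions.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


large = wilson_interval(240, 1000)  # May cohort: 24% of 1,000
small = wilson_interval(12, 50)     # hypothetical small cohort: 24% of 50

print(f"N=1000: {large[0]:.1%} to {large[1]:.1%}")
print(f"N=50:   {small[0]:.1%} to {small[1]:.1%}")
```

Both cohorts show the same point estimate, but the small cohort's interval is several times wider, so an apparent gap of a few percentage points between small cohorts may simply be noise.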
Further Reading
- Fader & Hardie — Customer-Base Analysis
- Gupta & Lehmann — Managing Customers as Investments
- BG/NBD applications in ecommerce retention analytics