A/B Testing

Run controlled experiments to optimize your Cancel Flows and maximize customer retention.

What is A/B Testing?

A/B testing lets you compare two versions of a Cancel Flow to see which one performs better. Churnkey splits your traffic between a control variant (your current flow) and a test variant (your modified flow), then measures which one saves more customers and generates more revenue.


The 5 Lifecycle States

Every A/B test moves through five states:

  1. Not Started — Test created, waiting for you to start
  2. Enrolling (7 days) — Traffic splits between Control and Test
  3. Tracking (30 days) — Measuring if saved customers actually stay
  4. Awaiting Decision — Data complete, waiting for you to pick a winner
  5. Completed — Winner declared, test archived

Total timeline: ~37 days minimum (7 days enrollment + 30 days tracking)


State 1: Not Started

  • What's happening: Test is created but not running yet
  • Duration: Until you click "Start"
  • Your action: Verify both variants are ready, then start the test
  • Data collected: None

This state gives you time to review your test variant in the Cancel Flow builder and ensure everything is configured correctly before going live.


State 2: Enrolling (7 days)

  • What's happening: New cancel sessions are split 50/50 between Control and Test
  • Duration: 7 days (fixed)
  • Your action: Wait. Do not edit variants during this phase
  • Data collected: Session counts, initial save rates

During enrollment, every customer who enters your Cancel Flow is randomly assigned to one variant. This assignment stays consistent if they return later.
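
Churnkey handles variant assignment automatically; purely to illustrate how a deterministic, sticky 50/50 split can work (the hashing scheme below is an assumption for illustration, not Churnkey's actual implementation), consider:

```python
import hashlib

def assign_variant(customer_id: str, test_id: str) -> str:
    """Deterministically assign a customer to Control or Test.

    Hashing the customer ID together with the test ID yields a stable
    50/50 split: the same customer always lands in the same variant,
    even if they return to the Cancel Flow later.
    """
    digest = hashlib.sha256(f"{test_id}:{customer_id}".encode()).hexdigest()
    return "control" if int(digest, 16) % 2 == 0 else "test"

# The assignment never changes across visits for the same customer.
assert assign_variant("cus_123", "test_a") == assign_variant("cus_123", "test_a")
```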

When enrollment ends, the cohort is locked. No new sessions enter the test after day 7.


State 3: Tracking (30 days)

  • What's happening: No new enrollments. Monitoring if "saved" users actually stay
  • Duration: 30 days (fixed)
  • Your action: Wait. This phase validates real retention
  • Data collected: Retention rates, reactivation, LTV impact, revenue per exposure

Why this phase matters: When a customer accepts an offer (pause, discount), we mark them as "saved" but we don't know yet if they actually stayed. They might cancel again next week, skip their next invoice, or churn silently.

The 30-day tracking window captures the real outcome by measuring whether saved customers pay their next invoice and remain subscribed.
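
As a rough sketch of what the tracking phase verifies (the field names below are hypothetical, not Churnkey's data model), a save only counts once the customer pays again and stays subscribed:

```python
from dataclasses import dataclass

@dataclass
class SavedCustomer:
    accepted_offer: bool      # said "yes" to a pause, discount, etc.
    paid_next_invoice: bool   # observed during the 30-day tracking window
    still_subscribed: bool

def is_truly_retained(c: SavedCustomer) -> bool:
    # An accepted offer is only a real save if the customer pays again and stays.
    return c.accepted_offer and c.paid_next_invoice and c.still_subscribed

saved = [
    SavedCustomer(True, True, True),    # genuine save
    SavedCustomer(True, False, False),  # accepted a discount, then churned anyway
]
print(sum(is_truly_retained(c) for c in saved), "of", len(saved), "saves held up")
```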


State 4: Awaiting Decision

  • What's happening: All data is in. Time to pick a winner
  • Duration: Until you decide
  • Your action: Review results and declare a winner
  • Data available: Statistical confidence, save rate lift, revenue difference, ARR impact

Look at the statistical confidence level to know how trustworthy your results are:

Confidence | What it means
95%+       | Strong evidence. Safe to declare a winner
80-94%     | Moderate evidence. Proceed with caution
Below 80%  | Weak evidence. Results may be due to chance

State 5: Completed

  • What's happening: Winner declared, test is finished
  • Duration: Permanent
  • Your action: None. The winning variant is now live
  • Data available: Final results archived for reference

When you confirm your decision, the winning variant becomes active and the losing variant is deactivated. All future customers in this segment see the winner.


Setting Up a Test

Prerequisite: A/B tests can only run on segmented Cancel Flows, not on your primary (default) Cancel Flow.

  1. Hypothesis: Document what you're testing and why. Be specific: "Offering a 3-month pause instead of 1-month will increase save rates for annual subscribers."
  2. Primary Metric: Choose which metric determines the winner. Revenue Per Exposure is recommended for most tests. See the Primary Metrics Reference for details on all 6 options.
  3. Cancel Flow: Select which segmented flow to test. Higher-volume flows reach statistical significance faster.
  4. Duration: Set the enrollment (default: 7 days) and tracking (default: 30 days) periods. The defaults work for most tests.
  5. Review & Launch: Confirm settings and create the test. It remains in the "Not Started" state until you click "Start."
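
Conceptually, the setup reduces to a handful of parameters. The sketch below is a plain illustrative data structure (not Churnkey's API) summarizing what each step supplies:

```python
# Illustrative only: the keys and values mirror the setup steps above,
# not any actual Churnkey configuration format.
test_config = {
    "hypothesis": "Offering a 3-month pause instead of 1-month will "
                  "increase save rates for annual subscribers.",
    "primary_metric": "revenue_per_exposure",  # the metric that decides the winner
    "cancel_flow": "annual-subscribers",       # must be a segmented flow
    "enrollment_days": 7,                      # default enrollment window
    "tracking_days": 30,                       # default tracking window
}
```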

Primary Metrics Reference

Your primary metric determines which variant "wins." Choose based on what matters most to your business.

Revenue Per Exposure (Recommended)

  • Formula: Total Revenue from Saved Customers ÷ Total Sessions
  • Best for: Overall business impact—balances save rate against revenue quality
  • Example: With equal session counts, Test saves 100 customers at $10 each ($1,000) while Control saves 50 at $25 each ($1,250). Control wins despite its lower save rate.

Save Rate

  • Formula: Customers Saved ÷ Total Sessions × 100
  • Best for: Maximizing retention count; good for testing copy, layout, or flow length
  • Watch out: Can lead to over-discounting if used alone

Reactivation Rate

  • Formula: Saved Customers Who Paid Next Invoice ÷ Total Saved Customers × 100
  • Best for: When saved customers frequently cancel before their next payment
  • Why it matters: A customer who accepts an offer but never pays again = $0 value

Pause Acceptance Rate

  • Formula: Customers Who Accepted Pause ÷ Total Sessions × 100
  • Best for: Testing pause duration, messaging, or positioning specifically

Discount Acceptance Rate

  • Formula: Customers Who Accepted Discount ÷ Total Sessions × 100
  • Best for: Optimizing discount percentages, durations, or presentation
  • Watch out: Higher acceptance with larger discounts might hurt revenue

LTV Extension

  • Formula: Sum of Additional Months Stayed ÷ Total Saved Customers
  • Best for: Long-term optimization when you have historical LTV data
  • Note: Requires longer tracking periods to produce meaningful data
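
To make the formulas above concrete, here is a minimal sketch that computes all six metrics from raw per-variant counts (variable names are illustrative, not Churnkey's data model):

```python
def primary_metrics(total_sessions, saved, revenue_from_saved,
                    saved_paid_next_invoice, accepted_pause,
                    accepted_discount, extra_months_stayed):
    """Compute the six primary metrics for one variant from raw counts."""
    return {
        "revenue_per_exposure": revenue_from_saved / total_sessions,
        "save_rate": saved / total_sessions * 100,
        "reactivation_rate": saved_paid_next_invoice / saved * 100 if saved else 0.0,
        "pause_acceptance_rate": accepted_pause / total_sessions * 100,
        "discount_acceptance_rate": accepted_discount / total_sessions * 100,
        "ltv_extension": extra_months_stayed / saved if saved else 0.0,
    }

# Example: 500 sessions, 100 saves worth $1,000 total, 80 of whom paid again,
# 60 pauses, 40 discounts, and 250 additional subscriber-months gained.
print(primary_metrics(500, 100, 1_000, 80, 60, 40, 250))
```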

Reading Results

The Results Dashboard

The results dashboard shows side-by-side performance cards for Control and Test variants. Each card displays the same metrics, making it easy to compare performance directly.

The variant with better performance on your primary metric is highlighted. If the difference is statistically significant (95%+ confidence), you'll see a "Significant" badge.

Key metrics on each variant card:

Metric               | What it tells you
Sessions enrolled    | Total customers assigned to this variant
Save rate            | % accepting any retention offer
Revenue per exposure | Average revenue per session (your likely primary metric)
Reactivation rate    | % of saves who paid their next invoice
LTV extension        | Average additional months customers stayed

Offer Breakdown

The offer breakdown shows which retention offers customers accepted in each variant:

Offer Type  | What it shows
Pause       | Customers who chose to pause their subscription
Discount    | Customers who accepted a discount offer
Plan Change | Customers who downgraded to a lower plan

Compare the distribution between Control and Test to understand how customers are being saved, not just whether they're saved.

Retention Timeline

The retention timeline tracks what percentage of saved customers remain active over time (day 7, 14, 30, 60, 90). This reveals whether your saves are "sticky" or if customers churn shortly after accepting an offer.

A steep drop-off early in the timeline suggests customers are accepting offers but not genuinely retained. A flat line indicates strong long-term retention.
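
A retention timeline is simply the share of saved customers still active at each checkpoint; a minimal sketch, assuming you know how many days each saved customer remained subscribed after accepting an offer:

```python
def retention_timeline(days_active: list[int], checkpoints=(7, 14, 30, 60, 90)):
    """Percent of saved customers still active at each checkpoint day.

    `days_active` holds, for each saved customer, how many days they
    stayed subscribed after accepting an offer.
    """
    total = len(days_active)
    return {d: sum(a >= d for a in days_active) / total * 100 for d in checkpoints}

# Many low values early on would show up as a steep drop-off in the timeline.
print(retention_timeline([5, 12, 45, 90, 90, 120]))
```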

Expected Impact

Once results are in, the dashboard calculates the projected business impact if you roll out the winning variant:

Metric                          | What it shows
Save Rate Lift                  | Percentage-point difference between variants (e.g., +5 points means Test saves 5 more customers per 100)
Revenue Per Exposure Difference | Dollar difference per customer entering the flow
ARR Impact                      | Projected annual recurring revenue change based on your traffic volume

These projections help you quantify whether the improvement is worth implementing. A statistically significant result with minimal ARR impact might not justify the change.
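
The ARR projection is straightforward arithmetic. Here is a simplified sketch, assuming future traffic resembles current volume (the dashboard's exact calculation may differ):

```python
def projected_arr_impact(control_rpe: float, test_rpe: float,
                         sessions_per_week: float) -> float:
    """Annualized revenue change from rolling out the Test variant.

    Multiplies the per-session revenue difference by the expected
    number of cancel sessions over a year.
    """
    return (test_rpe - control_rpe) * sessions_per_week * 52

# E.g. Test earns $3 more per exposure at 250 cancel sessions per week.
print(f"${projected_arr_impact(24.0, 27.0, 250):,.0f} projected ARR impact")
```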


Making a Decision

When your test reaches "Awaiting Decision," you need to analyze the results and pick a winner. This section helps you understand what the numbers mean and how to decide.

Understanding Statistical Significance

Statistical significance tells you whether the difference between variants is real or just random chance.

When you flip a coin 10 times and get 6 heads, that doesn't prove the coin is biased—it could easily happen by chance. But if you flip 1,000 times and get 600 heads, something is definitely going on. A/B testing works the same way.

Confidence Level | What It Means                             | Can You Trust It?
95%+             | Only 5% chance the difference is random   | Yes — mathematically reliable
80-94%           | 6-20% chance the difference is random     | Maybe — proceed with caution
Below 80%        | High chance the difference is random      | No — not statistically reliable

Important: You can always make a decision regardless of confidence level. Low confidence doesn't prevent you from choosing—it just means there's higher risk that the "winner" isn't actually better. You're making a judgment call, not a data-driven decision.
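
Churnkey computes the confidence level for you. As an illustration of where such a number can come from, the sketch below uses a standard two-proportion z-test on save rates (an assumption about the method, shown only to demystify the percentage):

```python
from math import erf, sqrt

def confidence_of_difference(saves_a, sessions_a, saves_b, sessions_b):
    """Two-sided confidence (%) that two save rates genuinely differ,
    based on a pooled two-proportion z-test."""
    p_a, p_b = saves_a / sessions_a, saves_b / sessions_b
    pooled = (saves_a + saves_b) / (sessions_a + sessions_b)
    se = sqrt(pooled * (1 - pooled) * (1 / sessions_a + 1 / sessions_b))
    if se == 0:
        return 0.0
    z = abs(p_a - p_b) / se
    # Confidence = 1 - (two-sided p-value), expressed as a percentage.
    normal_cdf = 0.5 * (1 + erf(z / sqrt(2)))
    return (2 * normal_cdf - 1) * 100

# A 38% vs. 52% save rate on 550 sessions each lands far beyond 95% confidence.
print(f"{confidence_of_difference(209, 550, 286, 550):.1f}%")
```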

Understanding Lift

Lift measures how much better (or worse) Test performed compared to Control, as a percentage.

Formula: Lift = (Test - Control) / Control × 100

Control Save Rate | Test Save Rate | Lift
40%               | 44%            | +10%
40%               | 48%            | +20%
40%               | 36%            | -10%
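
The same formula as a trivial helper, for clarity:

```python
def lift(control: float, test: float) -> float:
    """Relative lift of Test over Control, in percent."""
    return (test - control) / control * 100

print(lift(40, 44))  # +10.0
print(lift(40, 48))  # +20.0
print(lift(40, 36))  # -10.0
```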

How Sessions Affect Confidence

The smaller the improvement you're trying to detect, the more data you need. Think of it like hearing someone in a noisy room:

Lift Size | Difficulty                           | Sessions Needed (per variant)
20%+ lift | Like someone shouting—easy to detect | ~250 sessions
10% lift  | Normal conversation—need to focus    | ~500 sessions
5% lift   | A whisper—need quiet to hear         | ~1,000+ sessions

Minimum requirement: Below 30 sessions per variant, confidence is automatically 0%. The math simply doesn't work with so few samples.

Practical guidance based on your volume:

Your Weekly Volume | What You Can Reliably Detect
50 sessions/week   | Only dramatic wins or losses (20%+ lift)
100 sessions/week  | Moderate differences (10-15% lift)
250+ sessions/week | Subtle optimizations (5% lift)

Bottom line: If you're making small tweaks expecting 5% improvements, you need a lot of data. If you're testing dramatically different approaches, you'll know faster.

The Decision Path

When evaluating your results, check these metrics in order:

Step 1: Check Statistical Confidence

  • Is it 95%+? → You have reliable data to make a decision
  • Is it below 95%? → Results are not mathematically reliable (see "Decision with Risk" cases below)

Step 2: Look at Your Primary Metric

  • Which variant performed better on the metric you chose (e.g., Revenue Per Exposure)?
  • How big is the difference? A 2% lift and a 20% lift have very different implications.

Step 3: Review Secondary Metrics

  • Does the "winner" also perform well on other metrics?
  • Watch for trade-offs: higher save rate but lower reactivation rate could mean you're saving customers who churn again quickly.

Step 4: Consider the Lift

  • Positive lift = Test outperformed Control
  • Negative lift = Control outperformed Test
  • Near-zero lift = Both performed similarly
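
The path above can be condensed into a simple checklist. The sketch below is illustrative guidance only (thresholds and wording are assumptions), not a substitute for reviewing secondary metrics and business context:

```python
def suggest_decision(confidence_pct: float, lift_pct: float) -> str:
    """Map confidence and lift on the primary metric to a suggested call."""
    if confidence_pct < 95:
        return "Not statistically reliable: keep Control or gather more data"
    if lift_pct > 0:
        return "Choose Test (it outperformed Control)"
    if lift_pct < 0:
        return "Choose Control (Test underperformed)"
    return "No measurable difference: keep Control"

print(suggest_decision(98, 55))   # clear winner for Test
print(suggest_decision(15, 14))   # insufficient data
print(suggest_decision(99, -40))  # Test performed worse
```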

Decision by Case

Here are common scenarios you'll encounter and how to handle each:

Case 1: Clear Winner (High Confidence + Strong Lift)

Metric           | Control | Test
Sessions         | 550     | 550
Save Rate        | 38%     | 52%
Revenue/Exposure | $22     | $34

Confidence: 98% · Lift: +55%

What this means: Test dramatically outperforms Control, and you have enough data to trust this result.

Decision: Choose Test. This is the ideal outcome—clear winner with high confidence.


Case 2: Insufficient Data (Low Confidence)

Metric           | Control | Test
Sessions         | 8       | 7
Save Rate        | 37.5%   | 42.8%
Revenue/Exposure | $18     | $21

Confidence: 15% · Lift: +14%

What this means: Test looks better, but with only 15 sessions total, this could easily be random chance. The 15% confidence means there's an 85% probability this difference is noise.

Decision: The result is not statistically reliable. Your options:

  1. Choose Control (recommended) — It's your proven baseline. Don't change what works based on unreliable data.
  2. Choose Test anyway — If you have strong qualitative reasons to believe the changes are better, you can accept the risk.
  3. Run another test — Wait until you have more traffic and test again.

Case 3: Equal Performance (No Difference)

Metric           | Control | Test
Sessions         | 500     | 500
Save Rate        | 45%     | 45%
Revenue/Exposure | $25     | $25

Confidence: 50% · Lift: 0%

What this means: Both variants perform identically. Your changes made no measurable impact.

Decision: Choose Control. When there's no difference, stick with your original flow—it's simpler and already proven. Consider testing a more significant change next time.


Case 4: Test Performs Worse (Negative Lift)

Metric           | Control | Test
Sessions         | 450     | 450
Save Rate        | 50%     | 30%
Revenue/Exposure | $30     | $18

Confidence: 99% · Lift: -40%

What this means: Your changes hurt performance. Control is significantly better, and the high confidence means this is definitely real, not random.

Decision: Choose Control. Your hypothesis was wrong—the test variant made things worse. This is still a valuable learning: you now know what doesn't work.


Case 5: High Volume, Small Difference

Metric           | Control | Test
Sessions         | 5,500   | 5,500
Save Rate        | 43%     | 46%
Revenue/Exposure | $24     | $27

Confidence: 97% · Lift: +12%

What this means: Test is better, and with 11,000 total sessions you can trust this result. However, the improvement is modest (3 percentage points on save rate).

Decision: Choose Test. Even small improvements compound over time. A 3-point increase in save rate across thousands of customers adds up to significant revenue.


Case 6: Missing Data (Technical Issue)

Metric           | Control | Test
Sessions         | 300     | 0
Save Rate        | 42%     | N/A
Revenue/Exposure | $26     | N/A

Confidence: N/A · Lift: N/A

What this means: Something went wrong. Test variant received no traffic—possibly a configuration error, broken flow, or technical issue.

Decision: Choose Control (you have no choice). Investigate why Test got no sessions before running another test. Check that the Test variant is published and active.


Document Your Rationale

Use the rationale field to record why you chose this winner. Good documentation helps your team understand past decisions.

Examples:

  • "Test showed 55% revenue lift with 98% confidence. Clear winner."
  • "Only 15 sessions enrolled. Choosing Control due to insufficient data."
  • "Test performed 40% worse. Reverting to Control."
  • "Small 3% lift but high confidence. Choosing Test for incremental gains."

What Happens After Confirmation

When you confirm your decision:

  • The winning variant goes live immediately for all future customers in this segment
  • The losing variant is deactivated but preserved for reference
  • The test moves to your completed tests history

FAQs

Can I test my primary Cancel Flow? No. A/B tests require segmented Cancel Flows because the primary flow serves as the fallback for all customers who don't match segment criteria.

Can I run multiple tests at once? Yes, as long as they test different segments with non-overlapping customer populations.

What happens if I need to edit a variant during the test? Technically possible, but strongly discouraged. It invalidates results because customers before and after the change experienced different flows.

Can I pause a test? Yes. Pausing stops new enrollments. You can resume later, but enrollment restarts from zero.

How do I see which variant a specific customer saw? Check the Exposure Stream section in the test results page.

Why can't I see final results immediately? The 30-day tracking period ensures we measure actual retention, not just initial offer acceptance. Without it, you'd optimize for people who say "yes" but churn anyway.

What happens to the losing variant's enrolled users? Nothing changes for them. The test measures what happened; it doesn't retroactively change anyone's experience.