Growth experiment template

One page from hypothesis to verdict: ICE score, success metric, sample size, run log and the learning you keep even when the test loses.

Markdown — paste it into Notion, Linear or any doc

# Growth Experiment: [short name]

**Owner:** ...
**Date opened:** ...
**Status:** Draft / Running / Concluded

## 1. Hypothesis
We believe that [change]
for [audience / segment]
will improve [metric]
because [insight or evidence].

## 2. Prioritization (ICE)
- Impact (1-10): ...
- Confidence (1-10): ...
- Ease (1-10): ...
- **ICE score:** (I + C + E) / 3 = ...

## 3. Design
- Primary metric: ...
- Guardrail metric(s): ...
- Control (A): ...
- Variant (B): ...
- Minimum detectable effect we care about: ... % (state whether relative or absolute)
- Required sample size per variant: ...
- Planned duration: ... (do not stop early)

## 4. Run log
| Date | Note |
|------|------|
| ...  | Launched |
| ...  | ... |

## 5. Results
- Control: ... visitors, ... conversions (... %)
- Variant: ... visitors, ... conversions (... %)
- Uplift: ... %
- P-value: ...
- Significant at 95% confidence (p < 0.05)? Yes / No

## 6. Verdict and learning
- Decision: Ship / Kill / Iterate
- What we learned (one paragraph, written for someone who was not here): ...
- Follow-up experiment idea: ...

Why a template beats enthusiasm

Most growth experiments fail before they launch — not because the idea was bad, but because nobody wrote down what success would look like. Without a hypothesis fixed in advance, every result can be argued into a win, and the team learns nothing. This template is deliberately short: if filling one page feels like too much overhead, the experiment is probably too vague to run.

The hypothesis format matters. Forcing the sentence "we believe that X for Y will improve Z because W" exposes weak ideas instantly: if you cannot name the audience, the metric or the underlying insight, you have a hunch, not a hypothesis. The ICE score (Impact, Confidence, Ease) then keeps prioritization honest when you have ten ideas and bandwidth for two — it is crude, but crude and consistent beats sophisticated and political.
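As a rough illustration of the scoring in section 2, the sketch below averages the three ratings and sorts a backlog by the result. The function names and the example ideas are hypothetical, made up for illustration; only the formula (I + C + E) / 3 comes from the template.

```python
def ice_score(impact: int, confidence: int, ease: int) -> float:
    """Average of the three 1-10 ratings, as in section 2 of the template."""
    for name, value in (("impact", impact), ("confidence", confidence), ("ease", ease)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return (impact + confidence + ease) / 3


def rank_ideas(ideas: dict[str, tuple[int, int, int]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from highest to lowest ICE score."""
    scored = [(name, ice_score(*ratings)) for name, ratings in ideas.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical backlog: idea name -> (impact, confidence, ease)
backlog = {
    "Shorter signup form": (7, 6, 8),
    "Annual plan discount": (8, 4, 3),
    "Onboarding checklist": (6, 7, 5),
}

for name, score in rank_ideas(backlog):
    print(f"{score:.1f}  {name}")
```

The ratings are still subjective. The value of the score is that every idea is judged on the same three questions.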

The design section is where discipline pays. Fixing the sample size and duration before launch is the simplest reliable protection against peeking, that is, stopping the test the day it happens to look good. Checking a running test repeatedly and stopping at the first significant reading pushes the false-positive rate well above the nominal 5%. Guardrail metrics (churn, support tickets, page speed) catch the wins that quietly break something else. Use the significance calculator to fill in the results section with a real p-value instead of a feeling.
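For readers who want to see what sits behind "required sample size" and "p-value", here is a minimal sketch using only the Python standard library. It assumes a conversion-rate metric, a two-sided two-proportion z-test, a minimum detectable effect expressed as a relative lift, and the common defaults of 5% significance and 80% power. The function names are hypothetical, and this is one standard textbook approximation, not necessarily what any particular calculator implements.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline_rate: float, relative_mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant to detect a relative lift.

    Assumes a two-sided test on two conversion rates (normal approximation).
    """
    if not 0 < baseline_rate < 1:
        raise ValueError("baseline_rate must be between 0 and 1")
    if relative_mde == 0:
        raise ValueError("relative_mde must be non-zero")
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)
    if not 0 < p2 < 1:
        raise ValueError("baseline_rate * (1 + relative_mde) must be between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def two_proportion_p_value(visitors_a: int, conversions_a: int,
                           visitors_b: int, conversions_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_b - rate_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative numbers only: 5% baseline, hoping to detect a 20% relative lift.
needed = sample_size_per_variant(baseline_rate=0.05, relative_mde=0.20)
p_value = two_proportion_p_value(1000, 50, 1000, 100)
print(f"visitors per variant: {needed}")
print(f"p-value: {p_value:.5f}")
```

Run the sample-size function before launch and write the number into section 3. Run the p-value function once, after the planned duration has elapsed. Note how quickly the required sample grows as the effect you want to detect shrinks: halving the lift roughly quadruples the traffic needed.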

Finally, write the learning even for losers — especially for losers. A killed experiment with a clear learning saves the next person a quarter. Over a year, the archive of verdicts becomes your team's real growth playbook.