A/B testing is not for startups
Why false precision is your enemy in navigating the startup maze.
There are many definitions of a startup. One of them is this: a startup is a truth-finding exercise. Peter Thiel's spin on this is that all successful startups are based on a secret about the world. The theory goes: if your company's truth had already been discovered, someone would have built it already.
Unfortunately, a startup is not as simple as uncovering a single important truth. The whole journey is a truth-finding exercise. Balaji Srinivasan calls this The Idea Maze; I think of it as The Idea Map. Once you have the kernel of something that's working, you have to find more people who want what you've built. You have to make them want it more. Make them use it more. Eventually you need even more people to want it. It's truth-finding missions all the way down.
When you set out on your first truth-finding missions, you don't have any data to guide you. You have a hypothesis about what's true. From there you have to take a shot in the dark: an educated guess at making something people want. You build something. You talk to customers. You iterate.
Then one day, suddenly, you have customers, you have revenue. You know how many people are using each feature. You can break your funnel down into discrete steps. You have charts going up/flat/down. You have data.
Surely this, data, is the ultimate truth-finding tool? We can now know what's true. No more guessing. No more darkness. Certainty. Light. Maybe we can even use science to become more certain?
Enter: A/B testing. A/B testing is tech's name for a randomized controlled trial. You have a control (A) and a test (B). You split the population in two, give one portion the control (A) and the remainder a version with just one variable changed, the test (B). With some basic statistics, you can tell whether any movement in the numbers is statistically significant, and because only one variable changed, you can call it causal.
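To make the "basic statistics" concrete, here is a minimal sketch of the standard two-proportion z-test, the kind of calculation an A/B testing tool runs under the hood. The helper name and the visitor and conversion counts are made up for illustration:

```python
# Minimal two-proportion z-test: is arm B's conversion rate different
# from arm A's, or could the gap be random noise?
from math import sqrt, erfc

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: the two rates are equal."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)        # shared rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = erfc(abs(z) / sqrt(2))                # two-sided normal tail
    return z, p_value

# Hypothetical: 5,000 visitors per arm, 4.0% vs 4.8% conversion (a 20%
# relative lift), and it still doesn't clear the usual p < 0.05 bar.
z, p = two_proportion_z_test(200, 5000, 240, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")                  # z = 1.95, p = 0.051
```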
The way A/B testing usually first enters the picture is this. You have a hunch that something is broken, that it could be better with a change. It could be anything: the copy on your marketing site, how your signup flow works, how you price your product. Let's say it's the copy on the marketing site: you think a better headline would convert better. You make the change. Signups go up 20%. And someone asks: but how can you really be sure? What if it's just random? Next time, can we run a test?
No. Next time, you should not run a test.
The most obvious issue with A/B testing in startups is that it's slow. The smaller the effect you want to detect, the more people you need, and you'll typically need many thousands in each arm of your test. That might be achievable in weeks or months near the top of your funnel. Further down the funnel, where only a fraction of your traffic remains, reaching statistical significance in a reasonable amount of time becomes impossible.
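To put rough numbers on "slow," here is a back-of-the-envelope sample-size sketch using the standard two-proportion approximation. The constants 1.96 and 0.84 are the usual z-values for 5% significance and 80% power; the helper name, baseline rates, and target lift are hypothetical:

```python
# Roughly how many people you need per arm before a test can reliably
# detect a given relative lift (standard two-proportion approximation).
def n_per_arm(base_rate, relative_lift, z_alpha=1.96, z_beta=0.84):
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return round((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Top of funnel: 4% conversion, hoping to detect a 20% relative lift.
print(n_per_arm(0.04, 0.20))    # ~10,300 visitors per arm
# Deeper in the funnel: 0.5% conversion, same relative lift.
print(n_per_arm(0.005, 0.20))   # ~86,000 visitors per arm
```

If that deeper step of your funnel sees a thousand people a week, the second test runs for years.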
But that's not the biggest problem. The biggest problem with A/B testing is that it offers false precision. The complexity of the questions startups grapple with can rarely be reduced to a single testable variable. Suppose you're trying to decide whether to focus on market segment A or B, so you update your website headline to speak more directly to segment B. You run a test. Conversion drops 20%. OK, it must be A then, right? Well… what if you're currently getting much more traffic from segment A than from B? Or what if it was just a bad headline? What if you are getting enough traffic and it was a good headline, but your signup flow doesn't speak to this segment? I could go on.
If instead you went and spoke with ten customers in each of segments A and B, and those in segment B told you that their pain is acute, that they've wanted your product forever, that they need it, while everyone in segment A was lukewarm, would anyone dismiss that as just random? Would anyone insist that next time you should be more scientific?
That's what's funny. We are comfortable dealing with qualitative data in soft, intuitive ways, but the most impactful individuals work with quantitative data in much the same way. Through years of building a company, founders, leaders, and the early team develop a deep model of their problem space. When you see a metric spike after making a specific change, you have a much better than random shot at judging whether it means anything.
There will be mistakes. But when you treat quantitative data as just another signal, you make decisions so much faster that the mistakes don't matter. Think of it this way: A/B testing might let you make one "correct" decision a month. Without it, even if you make two wrong decisions and three correct decisions in the same month, you're ahead: three correct calls beat one, and wrong calls at a startup are usually cheap to reverse. And I bet the odds are much better than that.
Ostensibly, this article is about A/B testing. But it's not, really. It's about trusting the instincts and the model of your business that you've built. So when someone tells you to run a test next time, you can confidently tell them no. A/B testing is not for startups.