Disclaimer: I work for Visual Website Optimizer (VWO). Still, I will try to be as neutral as possible.
> Bandit tests allow you to run as many variations at the same time as you want, versus A/B testing, which you limits to two.
It is true that the phrase A/B testing, in the strict sense, refers to only two variations, but almost all tools support something called A/B/n testing, which allows for multiple variations. The mathematical foundations of A/B/n testing are as well established as those of A/B testing.
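To make this concrete, here is a minimal sketch of how an A/B/n test can be evaluated with a chi-squared test of independence across n variations (the visitor and conversion counts are made-up illustration data, not from either article):

```python
# Chi-squared test of independence for an n-variation conversion table.
# Under H0 (all variations convert equally), each arm's expected
# conversions are proportional to its share of total visitors.

def chi_squared_stat(visitors, conversions):
    """Chi-squared statistic for an n-variation conversion table."""
    non_conv = [v - c for v, c in zip(visitors, conversions)]
    total_v = sum(visitors)
    total_c = sum(conversions)
    total_n = sum(non_conv)
    stat = 0.0
    for v, c, n in zip(visitors, conversions, non_conv):
        exp_c = v * total_c / total_v   # expected conversions under H0
        exp_n = v * total_n / total_v   # expected non-conversions under H0
        stat += (c - exp_c) ** 2 / exp_c + (n - exp_n) ** 2 / exp_n
    return stat

# Three variations: control, B, C (illustrative numbers)
visitors = [1000, 1000, 1000]
conversions = [100, 130, 95]
stat = chi_squared_stat(visitors, conversions)
# df = variations - 1 = 2; critical value at alpha = 0.05 is ~5.991
print(stat > 5.991)   # prints True: the variations differ significantly
```

The point is simply that the statistics generalize cleanly from two arms to n arms; the degrees of freedom grow with the number of variations, nothing more.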
> But there's a deeper, less obvious benefit. Had we run these social button variations as a series of A/B tests, we'd be at much greater risk of reaching a local maxima. That is, the odds of us missing the best performing variation in favor of one that's merely adequate would be higher.
Again, there is another concept, Multivariate Testing (MVT), intended precisely for testing multiple changes per page. MVT is also offered by many online A/B testing tools, and its theoretical foundation is likewise very well established.
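For illustration, a full-factorial multivariate test enumerates every combination of the page elements being changed and tests them simultaneously, which is what guards against the local-maximum problem of sequential pairwise tests (the element names below are hypothetical):

```python
# Sketch: a full-factorial multivariate test crosses every option of
# every page element, instead of running a series of pairwise A/B tests.
from itertools import product

headline = ["Buy now", "Try it free"]          # 2 options
button = ["green", "orange", "blue"]           # 3 options
image = ["photo", "illustration"]              # 2 options

combinations = list(product(headline, button, image))
print(len(combinations))   # prints 12: all 2 * 3 * 2 variations at once
```

Traffic is split across all twelve combinations in one experiment, so the best-performing combination cannot be missed the way it could be in a sequence of two-way tests.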
> So I'd have to be on top of my game to make the code-changes as soon as the Chi Squared value was high enough.
Good A/B testing services do this automatically for you: losing variations are automatically stopped, and winning variations are automatically given 100% of the traffic. Of course, you can enable or disable this behavior.
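The auto-stop behavior can be sketched as follows. This is hypothetical logic, not VWO's actual implementation, using a standard two-proportion z-test at the 95% level in place of the chi-squared check mentioned in the article:

```python
# Hypothetical auto-stop rule: stop a two-arm test and route all traffic
# to the winner once the difference in conversion rates is significant.
import math

def should_stop(v_a, c_a, v_b, c_b, z_crit=1.96):
    """Return (stop, winner) for control (a) vs. variation (b)."""
    p_a, p_b = c_a / v_a, c_b / v_b
    p = (c_a + c_b) / (v_a + v_b)                 # pooled conversion rate
    se = math.sqrt(p * (1 - p) * (1 / v_a + 1 / v_b))
    z = (p_b - p_a) / se
    if abs(z) < z_crit:
        return False, None                        # keep collecting data
    return True, ("b" if z > 0 else "a")          # give winner 100% traffic

print(should_stop(2000, 200, 2000, 260))          # prints (True, 'b')
```

One caveat worth keeping in mind: checking significance repeatedly as data arrives inflates the false-positive rate, which is why production tools apply corrections rather than the naive rule sketched here.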
> there's nothing stopping you from tuning a bandit test to behave exactly like an A/B test.
Yes, of course, because the multi-armed bandit (MAB) is simply another strategy for running A/B tests (more generally, A/B/n tests; more generally still, multivariate tests). So, technically, you can't beat A/B tests with MAB, as your article's headline suggests, because MAB is itself a strategy for running A/B tests. What you can try to beat is the classical strategy used by most tools today. On this, the post by VWO linked in your article details the pros and cons of both approaches and establishes why a head-to-head comparison is, in fact, inappropriate.
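This relationship is easy to see in code. Below is a minimal epsilon-greedy allocator (one simple MAB strategy, chosen here for illustration): with `epsilon = 1.0` it assigns visitors uniformly at random, which is exactly the classical A/B/n split, and with small `epsilon` it shifts traffic toward the current best arm:

```python
# Epsilon-greedy traffic allocation: explore uniformly with probability
# epsilon, otherwise exploit the arm with the best observed rate.
# epsilon = 1.0 degenerates to a classical fixed-split A/B/n test.
import random

def epsilon_greedy_choice(epsilon, trials, successes):
    """Pick an arm index for the next visitor."""
    if random.random() < epsilon:
        return random.randrange(len(trials))      # uniform split (classical A/B/n)
    rates = [s / t if t else 0.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=rates.__getitem__)

trials, successes = [100, 100], [10, 20]
print(epsilon_greedy_choice(0.0, trials, successes))   # prints 1: pure exploit
```

So the two are not rivals; a bandit with full exploration *is* the classical test, and the real debate is over how much exploitation to allow while the experiment runs.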
> ...the critique is utterly ridiculous
As this point talks directly about VWO, I will take a break from my attempt at neutrality. Tuning a bandit test to behave exactly like an A/B test was not the point of our article, nor is it the point of yours. We have both attempted to compare the two approaches (MAB vs. classical) to A/B testing. I do not mind your criticizing our article, but it would only be appropriate to substantiate that criticism with facts; criticism without facts does not help the cause of discourse. I am fairly well versed in the theory of A/B testing and happy to take any questions here.