Can I Take a Peek? Continuous Monitoring of A/B Tests

A/B testing is a hallmark of Internet services: from e-commerce sites to social networks to marketplaces, nearly all online services use randomized experiments as a mechanism to make better business decisions. Such tests are generally analyzed using classical frequentist statistical measures: p-values and confidence intervals.

Despite their ubiquity, these reported values are computed under the assumption that the experimenter will not continuously monitor the test. In other words, there should be no repeated “peeking” at the results that affects the decision of whether to continue the test. Yet one of the greatest benefits of advances in information technology, computational power, and visualization is precisely that experimenters can watch experiments in progress, with greater granularity and insight over time than ever before.
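
To make the peeking problem concrete, here is a minimal simulation sketch. It is not Optimizely's method; it only illustrates why a fixed-horizon p-value misleads under continuous monitoring. It runs A/A tests (both arms share the same true conversion rate, so any "significant" result is a false positive) and compares how often a standard two-proportion z-test rejects when checked once at the end versus at every interim look. The function names, the 10% conversion rate, the sample sizes, and the peeking interval are all assumptions chosen for illustration.

```python
import math
import random


def two_sided_p_value(conv_a, conv_b, n):
    """Two-proportion z-test p-value, with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return 1.0  # no variation observed yet, so no evidence of a difference
    z = (conv_a - conv_b) / n / se
    return math.erfc(abs(z) / math.sqrt(2))


def run_aa_test(rng, n_per_arm, peek_every, rate=0.1, alpha=0.05):
    """Simulate one A/A test.

    Returns (rejected_at_any_peek, rejected_at_end).
    """
    conv_a = conv_b = 0
    rejected_at_peek = False
    for i in range(1, n_per_arm + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        # The "peeking" experimenter checks significance at every interim look.
        if i % peek_every == 0 and two_sided_p_value(conv_a, conv_b, i) < alpha:
            rejected_at_peek = True
    rejected_at_end = two_sided_p_value(conv_a, conv_b, n_per_arm) < alpha
    return rejected_at_peek, rejected_at_end


def false_positive_rates(n_tests=500, n_per_arm=2000, peek_every=100, seed=0):
    """Estimate false positive rates with peeking and with a single final look."""
    rng = random.Random(seed)
    peeking = fixed = 0
    for _ in range(n_tests):
        at_peek, at_end = run_aa_test(rng, n_per_arm, peek_every)
        peeking += at_peek
        fixed += at_end
    return peeking / n_tests, fixed / n_tests


if __name__ == "__main__":
    peeking_rate, fixed_rate = false_positive_rates()
    print(f"false positive rate with peeking:   {peeking_rate:.3f}")
    print(f"false positive rate, single look:   {fixed_rate:.3f}")
```

Under these assumptions, the single-look rate should land near the nominal 5%, while the rate for an experimenter who stops at the first significant interim result should be noticeably higher, and it grows as the number of looks increases.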

What You Will Learn:
Based on some of Ramesh's work at Optimizely, you'll learn how their optimization platform addresses continuous monitoring of experiments.

Prerequisites:
A basic grounding in statistics (p-values and confidence intervals) is helpful.

Where To Learn More:
- Nontechnical blog post @ Optimizely.com
- Technical post (PDF) from Optimizely
- Full paper on arXiv

These slides are from a talk given at the SF Data Engineering meetup.