A/B测试

目录

  • 1. 理解
  • 2.步骤
  • 3. Python 代码实现
  • 4. 统计相关知识
  • 5. 注意点
  • 参考

1. 理解

简单来说,就是通过试验Test测试一下哪一个方案更加好。这里的方案可以是网页的两个版本(例如布局的改变、按钮的样式的变化、文本的改变等等)。 也可以是不同的市场营销策略(marketing strategies), 如free gift和promotion。总之,就是想通过A/B Testing为决策作出依据。

  • A/B testing (also known as split testing) is the process of comparing two versions of a web page, email, or other marketing asset and measuring the difference in performance.
  • You do this giving one version to one group and the other version to another group. Then you can see how each variation performs.

2.步骤

  1. 确定A/B Testing目的

  2. 确定指标

实际上就是一个判断的那个方案更好的指标。 这个指标是跟你做A/B testing 目的相关的。
例如: Click-through rate (CTR)、 电子邮件注册率、Call to action (a marketing term for any device designed to prompt an immediate response or encourage an immediate sale) 等等。

Four categories of metrics should be considered when designing an experiment, namely:

  • Sums and counts — e.g. how many cookies visit a webpage
  • Distributed metrics such as mean, median and quartiles — e.g. what is the mean page load time
  • Probability and rates — e.g. clickthrough rates
  • Ratios — e.g. ratio of the probability of a revenue generating click to the probability of any click

For example, the number of active users is a core metric used by most technology companies — but what does this actually mean? When defining this metric some considerations would include:

  • What is the time period for a user to be active? Is someone active who used the product today, in the last 7 days, in the last month?
  • What activity constitutes active? Is it logging in, spending a certain amount of time using the product?
    What if the activity is only an automatic notification?
  • Does how active the user is matter, e.g. one user spends 10 seconds per day vs another user who spends 5 hours per day?
  • Are we measuring paying and non-paying users differently?
  1. 确定变体

新版本是如何设计的,如改变按钮颜色。
Focus on a single metric at first. You can always run A/B tests later that deal with other metrics. If you concentrate on one thing at a time, you’ll get cleaner data.

  1. 生成假设

  2. 设计实验

  • 确认指标(well-defined metrics)
  • 样本大小

There are four test parameters that need to be set to enable the calculation of a suitable sample size:

  • Baseline rate — an estimate of the metric being analyzed before making any changes
  • Practical significance level — the minimum change to the baseline rate that is useful to the business, for example an increase in the conversion rate of 0.001% may not be worth the effort required to make the change whereas a 2% change will be
  • Confidence level — also called significance level is the probability that the null hypothesis (experiment and control are the same) is rejected when it shouldn’t be
  • Sensitivity — the probability that the null hypothesis is not rejected when it should be

The baseline rate can be estimated using historical data, the practical significance level will depend on what makes sense to the business and the confidence level and sensitivity are generally set at 95% and 80% respectively but can be adjusted to suit different experiments or business needs.
Once these are set, the sample size required can be calculated statistically using a calculator such as this.

  • 其他实验因素
  • When the experiment should run (e.g. every day or only business days) and how long the experiment should run (e.g. 1 day or 1 week)
  • How many users to expose to the experiment per day, where exposing fewer users means longer experiment time periods
  • How to account for the learning effect i.e. allow for users to learn the changes before measuring the impact of a change
  • What tools will be used to capture data, such as Google Analytics
  • 用户选取的随机性

It is also very important to ensure users are selected randomly when being assigned into the control and experiment groups.

  1. 运行实验
  2. 收集数据
  3. 分析结果

A/B test分析将显示两个版本之间是否存在统计性显著差异

3. Python 代码实现

  • 数据集Kaggle
  • Python 代码

4. 统计相关知识

  • 点估计
  • 区间估计
  • 中心极限定律
  • 假设检验、p-value、两类错误
  • 分布:t分布、z分布、卡方分布

5. 注意点

  • Generally there needs to be a material amount of users that enable statistical inference and experiments that can be completed within a reasonable amount of time.
  • A/B testing doesn’t work well when testing major changes, like new products, new branding or completely new user experiences. In these cases there may be novelty effects that drive higher than normal engagement or emotional responses that cause users to resist a change initially.
  • A/B tests on products that are purchased infrequently and have long lead times is a bad idea — waiting to measure the repurchase rate of customer’s buying houses is going to blow out experiment timelines.

参考

  1. 什么是A/B test?有哪些流程?有什么用?终于有人讲明白了
  2. A Beginner’s Guide To A/B Testing: An Introduction
  3. How to do A/B Testing and Improve Your Conversions Quickly
  4. Implementing A/B Tests in Python

你可能感兴趣的:(数据分析面试常见问题)