Biostatistics (7): Probability and Probability Distributions

3.3 Probability distributions

3.3.1 Discrete random variables

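If a random variable takes only a finite or countably infinite number of values, it is called a discrete random variable (a discrete variable for short).
For example, if you toss a coin twice, there are only 4 possible outcomes:
HH, HT, TH and TT (H: heads; T: tails)
If a random variable X denotes the number of heads in this experiment, X can only take the values 0, 1 and 2, so X is a discrete random variable. Specifically: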
P(X=0)=0.25
P(X=1)=0.5
P(X=2)=0.25
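P(X), the Probability Distribution Function (PDF) of X, is the distribution law of X (for a discrete variable it is also commonly called the probability mass function, PMF). It satisfies the following properties:
(1) $P(X=x_i) \ge 0$ for every possible value $x_i$
(2) $\sum_i P(X=x_i) = 1$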
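The distribution of the number of heads in two tosses can be recovered by brute-force enumeration of the sample space. The sketch below assumes a fair coin (every outcome equally likely), uses exact fractions, and checks the two properties of a distribution law.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two tosses: HH, HT, TH, TT, each with probability 1/4 (fair coin assumed)
outcomes = ["".join(t) for t in product("HT", repeat=2)]

# X = number of heads; count how many outcomes give each value of X
counts = Counter(o.count("H") for o in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

print(pmf)  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Properties of a distribution law: non-negative values that sum to 1
assert all(q >= 0 for q in pmf.values())
assert sum(pmf.values()) == 1
```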
3.3.2 Continuous random variables

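For a random variable X, if there exists a non-negative real function f(x) such that, for any region D, the probability that X falls in D is
$P(X \in D) = \int_D f(x)\,dx$,
then X is called a continuous random variable (a continuous variable for short), and f(x) is called the probability density function of X (the density for short).
From the definition, the density function has the following properties:
(1) $f(x) \ge 0$
(2) $\int_{-\infty}^{+\infty} f(x)\,dx = 1$
(3) $P(a < X \le b) = \int_a^b f(x)\,dx$ for any $a < b$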
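Properties (2) and (3) can be checked numerically for a concrete density. This is a sketch only: it assumes an exponential density $f(x) = \lambda e^{-\lambda x}$ with $\lambda = 1$ as an illustrative example (not taken from the notes), and approximates the integrals with a simple midpoint rule; `density` and `integrate` are hypothetical helper names.

```python
import math

def density(x: float, lam: float = 1.0) -> float:
    """Exponential density, used here only as an illustrative f(x)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def integrate(f, a: float, b: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Property (2): total area is 1 (the tail beyond x = 50 is negligible)
total = integrate(density, 0.0, 50.0)
# Property (3): P(1 < X <= 2) is the area under f between 1 and 2
p_1_2 = integrate(density, 1.0, 2.0)
exact = math.exp(-1) - math.exp(-2)  # closed form for this particular density
print(round(total, 6), round(p_1_2, 6), round(exact, 6))
```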

Summary of discrete and continuous variables:
[Figure: Summary]

Mean and variance for a discrete variable with a given PDF

For a discrete variable with distribution law $P(X=x_i)$:
$E(X) = \sum_i x_i \, P(X=x_i)$
$Var(X) = \sum_i (x_i - E(X))^2 \, P(X=x_i) = E(X^2) - (E(X))^2$
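As a small worked example, the helper below (a hypothetical name, `mean_var`) applies the definitions of mean and variance to the two-coin-toss distribution from 3.3.1, and also checks the shortcut $E(X^2) - (E(X))^2$.

```python
def mean_var(pmf: dict[float, float]) -> tuple[float, float]:
    """Mean and variance of a discrete random variable given its distribution law."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, var

# Number of heads in two fair coin tosses (section 3.3.1)
coin = {0: 0.25, 1: 0.5, 2: 0.25}
mu, var = mean_var(coin)

# Shortcut formula: Var(X) = E(X^2) - (E(X))^2
ex2 = sum(x ** 2 * p for x, p in coin.items())
print(mu, var, ex2 - mu ** 2)  # 1.0 0.5 0.5
```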
3.3.3 0-1(p) distribution
$P(X=k) = p^k (1-p)^{1-k}, \quad k = 0, 1$, i.e. $P(X=1) = p$ and $P(X=0) = 1-p$, where $0 < p < 1$.

$E(X) = 1 \times p + 0 \times (1-p) = p$
$Var(X) = E(X^2) - (E(X))^2 = (1^2 \times p + 0^2 \times (1-p)) - p^2 = p - p^2 = p(1-p)$
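The two lines of arithmetic above can be reproduced with exact fractions. This sketch assumes p = 3/10 purely as an example value; `bernoulli_moments` is a hypothetical helper name.

```python
from fractions import Fraction

def bernoulli_moments(p: Fraction) -> tuple[Fraction, Fraction]:
    """E(X) and Var(X) of a 0-1(p) variable, computed from its distribution law."""
    pmf = {1: p, 0: 1 - p}
    ex = sum(x * q for x, q in pmf.items())        # 1*p + 0*(1-p)
    ex2 = sum(x ** 2 * q for x, q in pmf.items())  # 1^2*p + 0^2*(1-p)
    return ex, ex2 - ex ** 2

p = Fraction(3, 10)
ex, var = bernoulli_moments(p)
print(ex, var)  # 3/10 21/100
assert var == p * (1 - p)
```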

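3.3.4 Bernoulli (binomial) distribution

Definition: in n independent repeated trials, each trial has only two outcomes, A and $\bar{A}$, and the probability that A occurs is the same in every trial, $P(A) = p$, $0 < p < 1$; such a sequence is called n-fold Bernoulli trials. In n-fold Bernoulli trials with $P(A) = p$, $0 < p < 1$, let X be the number of times A occurs; then X follows the binomial distribution B(n, p).

PDF of the binomial distribution:
$P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n$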

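Write $X = X_1 + X_2 + \cdots + X_n$, where $X_i$ is the 0-1(p) indicator of A on the i-th trial, so $E(X_i) = p$ and $Var(X_i) = p(1-p)$. Because the trials are independent, the variances add:
$E(X) = E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n) = p + p + \cdots + p = np$
$Var(X) = Var(X_1 + X_2 + \cdots + X_n) = Var(X_1) + Var(X_2) + \cdots + Var(X_n) = p(1-p) + \cdots + p(1-p) = np(1-p)$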
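A quick numerical sanity check of $E(X) = np$ and $Var(X) = np(1-p)$: the sketch below builds the binomial PMF $\binom{n}{k}p^k(1-p)^{n-k}$ directly and computes the moments from their definitions. The values n = 10, p = 0.3 are arbitrary example choices.

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # example parameters (assumed)
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

# Moments computed from the definitions, compared with np and np(1-p)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(mean, n * p)
print(var, n * p * (1 - p))
```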

Example of a Binomial distribution
When a fair coin is flipped, heads and tails are equally likely, i.e., p = 0.5.
If we flip the coin 5 times, what is the probability of getting 5 heads?

Answer: $P(X=5) = \binom{5}{5} \, 0.5^5 \, (1-0.5)^0 = 1/32 = 0.03125$
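The same calculation can be extended to the whole distribution of the number of heads in 5 fair flips, B(5, 1/2), using exact fractions; P(X = 5) = 1/32 is the last entry.

```python
from fractions import Fraction
from math import comb

n, p = 5, Fraction(1, 2)
dist = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

print(dist[5])                          # 1/32
print([str(q) for q in dist.values()])  # ['1/32', '5/32', '5/16', '5/16', '5/32', '1/32']
```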

Example of a Binomial distribution
After a genome-wide ChIP-seq experiment, a transcription factor (TF) was found to bind to the promoter regions of 100 genes (out of 26,000). If we do another experiment with a second TF and also identify 100 genes, what is the probability that at least 5 of them are also bound by the first TF?
Suppose the first TF binds genes without any preference; then the probability that a gene randomly selected from the genome is bound by the first TF is 100/26000 ≈ 0.0039.
For a given gene, it is either bound by the first TF ('success') or not ('failure'), i.e., a Bernoulli trial.
If the second TF is independent of the first TF, then the number of genes bound by the second TF that are also bound by the first TF follows a binomial distribution.
Binomial distribution: n = 100, p = 0.0039
P(k=0)=0.6765408
P(k=1)=0.2648840
P(k=2)=0.05133606
P(k=3)=0.006565821
P(k=4)=0.0006233937
P(k>=5)=1-P(k=0)-P(k=1)-P(k=2)-P(k=3)-P(k=4)=4.992756e-05
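The probabilities above can be reproduced with a few lines of Python, using p = 0.0039 as in the text (rounded from 100/26000). As a cross-check, the sketch also sums the upper tail directly instead of relying only on the complement rule.

```python
from math import comb

n, p = 100, 0.0039  # 100 genes; p rounded from 100/26000 as in the text

def binom_pmf(k: int) -> float:
    """P(k of the 100 genes are also bound by the first TF)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

head = [binom_pmf(k) for k in range(5)]                  # P(k=0), ..., P(k=4)
tail = 1 - sum(head)                                     # P(k>=5) by the complement rule
tail_direct = sum(binom_pmf(k) for k in range(5, n + 1))  # same quantity, summed directly

for k, q in enumerate(head):
    print(f"P(k={k}) = {q:.7g}")
print(f"P(k>=5) = {tail:.6e} (direct sum: {tail_direct:.6e})")
```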

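3.3.5 Negative binomial distribution

Definition: the experiment consists of a sequence of independent trials, each with two possible outcomes, success and failure, and a constant probability of success p; the experiment continues until r successes have been observed, where r is a positive integer. The number of trials needed under these conditions follows the negative binomial distribution.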

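PDF of the negative binomial distribution, where X is the number of trials needed to obtain the r-th success:
$P(X=x) = \binom{x-1}{r-1} p^r (1-p)^{x-r}, \quad x = r, r+1, r+2, \ldots$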

Mean and variance of the negative binomial distribution

$E(X) = \dfrac{r}{p}, \qquad Var(X) = \dfrac{r(1-p)}{p^2}$
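A numerical check of $E(X) = r/p$ and $Var(X) = r(1-p)/p^2$. This sketch assumes the "number of trials" formulation, $P(X=x) = \binom{x-1}{r-1}p^r(1-p)^{x-r}$; the infinite support is truncated at x = 400, where the remaining probability is negligible for the example values r = 3, p = 0.4.

```python
from math import comb

def nbinom_trials_pmf(x: int, r: int, p: float) -> float:
    """P(X = x): the r-th success occurs on trial x (x = r, r+1, ...)."""
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)

r, p = 3, 0.4          # example parameters (assumed)
xs = range(r, 400)     # truncated support; the tail beyond 400 is negligible here
probs = [nbinom_trials_pmf(x, r, p) for x in xs]

mean = sum(x * q for x, q in zip(xs, probs))
var = sum((x - mean) ** 2 * q for x, q in zip(xs, probs))
print(round(sum(probs), 10), round(mean, 6), round(var, 6))
print(r / p, r * (1 - p) / p ** 2)
```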

Alternative formulation of Negative Binomial distribution

[Figure: Negative Binomial distribution]

Example of negative binomial distribution
If a predator must capture 10 prey before it can grow large enough to reproduce, what would the mean age of onset of reproduction be if the probability of capturing a prey on any given day is 0.1?

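Answer: the number of days X until the 10th capture follows a negative binomial distribution with r = 10 and p = 0.1, so
$E(X) = r/p = 10/0.1 = 100$ days
$Var(X) = r(1-p)/p^2 = 10 \times 0.9 / 0.1^2 = 900$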

The expected time is 100 days. However, the variance is quite high (900, i.e. a standard deviation of 30 days) and the distribution is clearly right-skewed. Some predators will reach reproductive age much sooner, and some much later, than the average.
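The skew can be made concrete by comparing the mean with the median day of the 10th capture. This sketch assumes, as above, that X counts days up to and including the 10th capture; a median below the mean of 100 days is consistent with a right-skewed distribution.

```python
from math import comb

r, p = 10, 0.1  # 10 prey needed; capture probability 0.1 per day

mean = r / p                     # expected day of the 10th capture
var = r * (1 - p) / p ** 2
sd = var ** 0.5

def pmf(x: int) -> float:
    """P(the 10th capture happens on day x), x = 10, 11, ..."""
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)

# Median: the first day by which at least half of the predators have 10 captures
x, cdf = r, 0.0
while cdf < 0.5:
    cdf += pmf(x)
    x += 1
median = x - 1

print(round(mean, 6), round(var, 6), round(sd, 6), median)
```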

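3.3.6 Geometric distribution

Definition: in a sequence of Bernoulli trials, the geometric distribution gives the probability that the first success occurs on the k-th trial, i.e., that the first k-1 trials all fail and the k-th trial succeeds.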

$P(X=k) = (1-p)^{k-1} p, \quad k = 1, 2, \ldots$
$E(X) = \dfrac{1}{p}, \qquad Var(X) = \dfrac{1-p}{p^2}$
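A numerical check of the geometric mean and variance: the sketch sums the PMF $(1-p)^{k-1}p$ over a truncated support (k < 500, negligible tail for the example value p = 0.25) and compares the moments with $1/p$ and $(1-p)/p^2$. This is also the negative binomial distribution with r = 1.

```python
def geom_pmf(k: int, p: float) -> float:
    """P(first success on trial k) = (1-p)^(k-1) * p, k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

p = 0.25                 # example value (assumed)
ks = range(1, 500)       # truncated support; the remaining tail is negligible
probs = [geom_pmf(k, p) for k in ks]

mean = sum(k * q for k, q in zip(ks, probs))
var = sum((k - mean) ** 2 * q for k, q in zip(ks, probs))
print(round(mean, 6), 1 / p)
print(round(var, 6), (1 - p) / p ** 2)
```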

Example of geometric distribution
If the probability of extinction of an endangered population is estimated to be 0.1 every year, what is the expected time until extinction?

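Answer: the year of extinction X follows a geometric distribution with p = 0.1, so
$E(X) = 1/p = 1/0.1 = 10$ years
$Var(X) = (1-p)/p^2 = 0.9/0.1^2 = 90$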

The expected time is 10 years. However, because the variance is large, it will be difficult to predict accurately the actual year in which the population goes extinct.
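One way to see the uncertainty is through the cumulative probability $P(X \le t) = 1 - (1-p)^t$ of extinction within the first t years, which follows from the geometric PMF. The sketch below uses p = 0.1 from the example; the chosen horizons t are arbitrary illustrations.

```python
p = 0.1  # annual extinction probability from the example

mean = 1 / p                 # expected year of extinction
var = (1 - p) / p ** 2
sd = var ** 0.5

# P(extinct within the first t years) = 1 - (1 - p)^t
within = {t: 1 - (1 - p) ** t for t in (5, 10, 20, 30)}

print(round(mean, 6), round(var, 6), round(sd, 3))
print({t: round(q, 3) for t, q in within.items()})
```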
