Notes on Probability and Stochastic Processes (2): Discrete Random Variables (1)


2019-11-25

*These notes are based on the first half of Chapter 2 of *Introduction to Probability* (《概率导论》) by Dimitri P. Bertsekas and John N. Tsitsiklis.

2.1 Basic Concepts of Discrete Random Variables

Given an experiment and the corresponding set of possible outcomes (the sample space), a random variable associates a particular number with each outcome; see Fig. 2.1. We refer to this number as the numerical value or simply the value of the random variable. Mathematically, a random variable is a real-valued function of the experimental outcome.
[Fig. 2.1]
Main Concepts Related to Random Variables:

Starting with a probabilistic model of an experiment:

  • A random variable is a real-valued function of the outcome of the experiment.
  • A function of a random variable defines another random variable.
  • We can associate with each random variable certain “averages” of interest, such as the mean and the variance.
  • A random variable can be conditioned on an event or on another random variable.
  • There is a notion of independence of a random variable from an event or from another random variable.

A random variable is called discrete if its range (the set of values that it can take) is either finite or countably infinite. A random variable that can take an uncountably infinite number of values is not discrete.

Concepts Related to Discrete Random Variables

Starting with a probabilistic model of an experiment:

  • A discrete random variable is a real-valued function of the outcome of the experiment that can take a finite or countably infinite number of values.
  • A discrete random variable has an associated probability mass function (PMF), which gives the probability of each numerical value that the random variable can take.
  • A function of a discrete random variable defines another discrete random variable, whose PMF can be obtained from the PMF of the original random variable.
2.2 The Probability Mass Function (PMF)

The most important way to characterize a random variable is through the probabilities of the values that it can take. For a discrete random variable $X$, these are captured by the probability mass function (PMF for short) of $X$, denoted $p_X$. In particular, if $x$ is any possible value of $X$, the probability mass of $x$, denoted $p_X(x)$, is the probability of the event $\{X = x\}$ consisting of all outcomes that give rise to a value of $X$ equal to $x$:
$$p_X(x) = P(\{X = x\})$$

We will use upper case characters ($X$) to denote random variables, and lower case characters ($x$) to denote real numbers such as the numerical values of a random variable. Note that $\sum_x p_X(x) = 1$.

Calculation of the PMF of a Random Variable $X$

For each possible value $x$ of $X$:

  • collect all the possible outcomes that give rise to the event $\{X = x\}$;
  • add their probabilities to obtain $p_X(x)$.

By a similar argument, for any set $S$ of possible values of $X$, we have
$$P(X \in S) = \sum_{x \in S} p_X(x)$$
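As a quick illustration of this two-step calculation, here is a minimal Python sketch; the sample space (ordered pairs from two fair four-sided dice) and the uniform probabilities are assumed example choices, not from the text:

```python
from fractions import Fraction
from collections import defaultdict

def pmf_of(random_variable, outcomes_with_probs):
    """Collect outcomes by their value of X and add up their probabilities."""
    pmf = defaultdict(Fraction)
    for outcome, prob in outcomes_with_probs:
        pmf[random_variable(outcome)] += prob
    return dict(pmf)

# Sample space: ordered pairs (i, j), each with probability 1/16.
outcomes = [((i, j), Fraction(1, 16)) for i in range(1, 5) for j in range(1, 5)]
p_X = pmf_of(lambda o: o[0] + o[1], outcomes)  # X = sum of the two dice

print(p_X)                      # e.g. p_X(5) = 4/16 = 1/4
assert sum(p_X.values()) == 1   # normalization: sum_x p_X(x) = 1
```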

【The Bernoulli Random Variable】Consider the toss of a coin, which comes up a head with probability $p$, and a tail with probability $1 - p$. The Bernoulli random variable takes the two values 1 and 0, depending on whether the outcome is a head or a tail:
$$X = \begin{cases} 1 & \text{if a head} \\ 0 & \text{if a tail} \end{cases}$$
Its PMF is
$$p_X(k) = \begin{cases} p & \text{if } k = 1 \\ 1 - p & \text{if } k = 0 \end{cases}$$
For all its simplicity, the Bernoulli random variable is very important. In practice, it is used to model generic probabilistic situations with just two outcomes.
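A tiny sketch, assuming an example value $p = 0.3$: the empirical frequency of heads in simulated tosses should approach $p_X(1) = p$.

```python
import random

p = 0.3                  # assumed example probability of a head
p_X = {1: p, 0: 1 - p}   # Bernoulli PMF

# Empirical check: the frequency of heads approaches p_X(1) as n grows.
n = 100_000
heads = sum(1 for _ in range(n) if random.random() < p)
print(heads / n, "vs", p_X[1])
```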

【The Binomial Random Variable】A coin is tossed $n$ times. At each toss, the coin comes up a head with probability $p$, and a tail with probability $1 - p$, independent of prior tosses. Let $X$ be the number of heads in the $n$-toss sequence. We refer to $X$ as a binomial random variable with parameters $n$ and $p$. The PMF of $X$ consists of the binomial probabilities calculated earlier:
$$p_X(k) = P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$
The normalization property, specialized to the binomial random variable, is written as
$$\sum_{k=0}^n \binom{n}{k} p^k (1-p)^{n-k} = 1$$
Some special cases of the binomial PMF are sketched in Fig. 2.3.
[Fig. 2.3]
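A short sketch of the binomial PMF and its normalization property, with assumed example parameters $n = 9$ and $p = 0.5$:

```python
from math import comb

def binomial_pmf(k, n, p):
    """p_X(k) = C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 9, 0.5   # assumed example parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
assert abs(sum(pmf) - 1.0) < 1e-12   # normalization property
print(pmf)                           # compare with the shapes in Fig. 2.3
```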
【The Geometric Random Variable】Suppose that we repeatedly and independently toss a coin with probability of a head equal to $p$, where $0 < p < 1$. The geometric random variable is the number $X$ of tosses needed for a head to come up for the first time. Its PMF is given by
$$p_X(k) = (1-p)^{k-1} p, \qquad k = 1, 2, \ldots,$$
since $(1-p)^{k-1} p$ is the probability of the sequence consisting of $k-1$ successive tails followed by a head; see Fig. 2.4. This is a legitimate PMF because
$$\sum_{k=1}^\infty p_X(k) = \sum_{k=1}^\infty (1-p)^{k-1} p = p \sum_{k=0}^\infty (1-p)^{k} = p \cdot \frac{1}{1-(1-p)} = 1$$
Naturally, the use of coin tosses here is just to provide insight. More generally, we can interpret the geometric random variable in terms of repeated independent trials until the first “success.” Each trial has probability of success $p$, and the number of trials until (and including) the first success is modeled by the geometric random variable. The meaning of “success” is context-dependent.
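A minimal sketch of the geometric PMF; the success probability $p = 0.2$ and the truncation point of the normalization check are assumed example choices (the neglected tail is geometrically small):

```python
import random

def geometric_pmf(k, p):
    """p_X(k) = (1 - p)^(k - 1) * p, for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p

def tosses_until_first_head(p):
    """Simulate independent tosses until the first head; return the count."""
    k = 1
    while random.random() >= p:   # tail with probability 1 - p
        k += 1
    return k

p = 0.2   # assumed example success probability
# Truncated normalization check; the tail beyond k = 500 is negligible here.
assert abs(sum(geometric_pmf(k, p) for k in range(1, 501)) - 1.0) < 1e-12
print(tosses_until_first_head(p))   # one random draw from the geometric PMF
```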
【The Poisson Random Variable】A Poisson random variable has a PMF given by
$$p_X(k) = e^{-\lambda} \frac{\lambda^k}{k!}, \qquad k = 0, 1, 2, \ldots,$$
where $\lambda$ is a positive parameter characterizing the PMF; see Fig. 2.5.
[Fig. 2.5]
This is a legitimate PMF because
$$\sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda}\Big(1 + \lambda + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots \Big) = e^{-\lambda} e^{\lambda} = 1$$
To get a feel for the Poisson random variable, think of a binomial random variable with very small $p$ and very large $n$. For example, let $X$ be the number of typos in a book with a total of $n$ words. Then $X$ is binomial, but since the probability $p$ that any one word is misspelled is very small, $X$ can also be well modeled with a Poisson PMF (let $p$ be the probability of heads in tossing a coin, and associate misspelled words with coin tosses that result in heads).
More precisely, the Poisson PMF with parameter $\lambda$ is a good approximation for a binomial PMF with parameters $n$ and $p$, i.e.,
$$e^{-\lambda} \frac{\lambda^k}{k!} \approx \frac{n!}{k!(n-k)!} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n,$$
provided $\lambda = np$, $n$ is very large, and $p$ is very small. In this case, using the Poisson PMF may result in simpler models and calculations.
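The approximation is easy to check numerically. A sketch with assumed example values $n = 10000$ and $p = 0.0005$ (so $\lambda = np = 5$):

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10_000, 0.0005   # assumed: n very large, p very small
lam = n * p             # lambda = np = 5
for k in range(8):
    print(k, f"{binomial_pmf(k, n, p):.6f}", f"{poisson_pmf(k, lam):.6f}")
# The two columns agree to several decimal places.
```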

2.3 Functions of Random Variables

If $Y = g(X)$ is a function of a random variable $X$, then $Y$ is also a random variable, since it provides a numerical value for each possible outcome. This is because every outcome in the sample space defines a numerical value $x$ for $X$ and hence also the numerical value $y = g(x)$ for $Y$. If $X$ is discrete with PMF $p_X$, then $Y$ is also discrete, and its PMF $p_Y$ can be calculated using the PMF of $X$. In particular, to obtain $p_Y(y)$ for any $y$, we add the probabilities of all values of $x$ such that $g(x) = y$:
$$p_Y(y) = \sum_{\{x \mid g(x) = y\}} p_X(x)$$
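A small sketch of this formula: derive $p_Y$ by grouping the values of $X$ that map to the same $y$. The uniform PMF on $\{-2, \ldots, 2\}$ and $g(x) = x^2$ are assumed example choices.

```python
from collections import defaultdict

def pmf_of_function(p_X, g):
    """p_Y(y) = sum of p_X(x) over all x with g(x) = y."""
    p_Y = defaultdict(float)
    for x, px in p_X.items():
        p_Y[g(x)] += px
    return dict(p_Y)

# Assumed example: X uniform on {-2, -1, 0, 1, 2}, and Y = X^2.
p_X = {x: 0.2 for x in range(-2, 3)}
p_Y = pmf_of_function(p_X, lambda x: x * x)
print(p_Y)   # {4: 0.4, 1: 0.4, 0: 0.2}
```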

2.4 Expectation, Mean, and Variance

【Expectation】The PMF of a random variable $X$ provides us with several numbers: the probabilities of all the possible values of $X$. It is often desirable, however, to summarize this information in a single representative number. This is accomplished by the expectation of $X$, which is a weighted (in proportion to probabilities) average of the possible values of $X$.
We define the expected value (also called the expectation or the mean) of a random variable $X$, with PMF $p_X$, by
$$E[X] = \sum_x x\, p_X(x)$$
It is useful to view the mean of $X$ as a “representative” value of $X$, which lies somewhere in the middle of its range. We can make this statement more precise by viewing the mean as the center of gravity of the PMF. In particular, if the PMF is symmetric around a certain point, that point must be equal to the mean.
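A one-function sketch of this definition, reusing the assumed uniform PMF from the previous example; since that PMF is symmetric around 0, the mean comes out to 0.

```python
from fractions import Fraction

def mean(p_X):
    """E[X] = sum over x of x * p_X(x)."""
    return sum(x * px for x, px in p_X.items())

# Assumed example PMF, uniform on {-2, ..., 2} and symmetric around 0.
p_X = {x: Fraction(1, 5) for x in range(-2, 3)}
print(mean(p_X))   # 0 -- the center of gravity of a symmetric PMF is its mean
```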
【Moments and Variance】We define the 2nd moment of the random variable $X$ as the expected value of the random variable $X^2$. More generally, we define the $n$th moment as $E[X^n]$, the expected value of the random variable $X^n$. With this terminology, the 1st moment of $X$ is just the mean.
The variance of $X$ is denoted by $\text{var}(X)$ and is defined as the expected value of the random variable $(X - E[X])^2$, i.e.,
$$\text{var}(X) = E\Big[(X - E[X])^2\Big]$$
Since $(X - E[X])^2$ can only take nonnegative values, the variance is always nonnegative.
The variance provides a measure of dispersion of $X$ around its mean. Another measure of dispersion is the standard deviation of $X$, which is defined as the square root of the variance and is denoted by $\sigma_X$:
$$\sigma_X = \sqrt{\text{var}(X)}$$
The standard deviation is often easier to interpret because it has the same units as $X$.
One way to calculate $\text{var}(X)$ is to use the definition of expected value, after calculating the PMF of the random variable $(X - E[X])^2$. It turns out that there is an easier method to calculate $\text{var}(X)$, which uses the PMF of $X$ but does not require the PMF of $(X - E[X])^2$. This method is based on the following rule:

Expected Value Rule for Functions of Random Variables
Let $X$ be a random variable with PMF $p_X$, and let $g(X)$ be a function of $X$. Then, the expected value of the random variable $g(X)$ is given by
$$E[g(X)] = \sum_x g(x)\, p_X(x)$$

To verify this rule, we let $Y = g(X)$ and use the formula
$$p_Y(y) = \sum_{\{x \mid g(x) = y\}} p_X(x)$$
derived in the preceding section. We have
$$\begin{aligned} E[g(X)] = E[Y] &= \sum_y y\, p_Y(y) \\ &= \sum_y y \sum_{\{x \mid g(x) = y\}} p_X(x) \\ &= \sum_y \sum_{\{x \mid g(x) = y\}} y\, p_X(x) \\ &= \sum_y \sum_{\{x \mid g(x) = y\}} g(x)\, p_X(x) \\ &= \sum_x g(x)\, p_X(x) \end{aligned}$$
Using the expected value rule, we can write the variance of $X$ as
$$\text{var}(X) = E\Big[(X - E[X])^2\Big] = \sum_x (x - E[X])^2 p_X(x)$$
Similarly, the $n$th moment is given by
$$E[X^n] = \sum_x x^n p_X(x)$$
As we have already noted, the variance is always nonnegative, but could it be zero? Since every term in the formula $\sum_x (x - E[X])^2 p_X(x)$ is nonnegative, the sum is zero if and only if $(x - E[X])^2 p_X(x) = 0$ for every $x$. This condition implies that for any $x$ with $p_X(x) > 0$, we must have $x = E[X]$, and the random variable $X$ is not really “random”: its value is equal to the mean $E[X]$ with probability 1.
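A numerical sketch that verifies the expected value rule and the variance formula on an assumed example PMF, computing $E[g(X)]$ both through $p_Y$ and directly from $p_X$:

```python
from collections import defaultdict
from fractions import Fraction

# Assumed example PMF (symmetric around 0) and g(x) = x^2.
p_X = {-2: Fraction(1, 10), -1: Fraction(2, 10), 0: Fraction(4, 10),
       1: Fraction(2, 10), 2: Fraction(1, 10)}
g = lambda x: x * x

# Way 1: through the PMF of Y = g(X).
p_Y = defaultdict(Fraction)
for x, px in p_X.items():
    p_Y[g(x)] += px
e_via_Y = sum(y * py for y, py in p_Y.items())

# Way 2: directly from the PMF of X (the expected value rule).
e_via_X = sum(g(x) * px for x, px in p_X.items())
assert e_via_Y == e_via_X

# Variance via the expected value rule; here E[X] = 0, so var(X) = E[X^2].
m = sum(x * px for x, px in p_X.items())
var = sum((x - m) ** 2 * px for x, px in p_X.items())
print(e_via_X, var)   # 6/5 6/5
```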
