by Feiran Jia
Sampling Distribution of the Sample Mean, $\bar{X}$
$\bar{X}$ is distributed with mean equal to the population mean $\mu$ and standard deviation $\sigma/\sqrt{n}$:
$\bar{X} \sim (\mu, \sigma^2/n)$
By the Central Limit Theorem, for a sample with n > 30 the distribution of $\bar{X}$ is approximately normal (exactly normal if the population itself is normal):
$\bar{X} \sim N(\mu, \sigma^2/n)$
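If σ is unknown, the standard deviation of $\bar{X}$ is estimated by $s/\sqrt{n}$.
The standardized sample mean $t = \dfrac{\bar{X}-\mu}{s/\sqrt{n}}$ follows a Student t distribution with $\nu = n-1$ degrees of freedom (assuming a normal population).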
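A quick simulation can check the mean and standard deviation of $\bar{X}$ claimed above. This is a sketch with illustrative choices that are not from the notes: the population is Exponential(1) (deliberately skewed, with μ = σ = 1), n = 36, and 20,000 repetitions; the function name `simulate_sample_means` is hypothetical.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an Exponential(1) population
    (mean 1, standard deviation 1) and return the list of sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

n = 36
means = simulate_sample_means(n, reps=20_000)
print(f"mean of X-bar: {statistics.fmean(means):.3f}  (theory: mu = 1)")
print(f"sd of X-bar:   {statistics.stdev(means):.3f}  "
      f"(theory: sigma/sqrt(n) = {1 / math.sqrt(n):.3f})")
```

Even though the population is skewed, the simulated means should sit close to μ = 1 with spread close to 1/6, and a histogram of `means` would look roughly bell-shaped.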
Confidence Interval Estimator
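The confidence interval estimator of μ is
$\bar{X} \pm z_{\alpha/2}\,\dfrac{\sigma}{\sqrt{n}}$ (σ known) or $\bar{X} \pm t_{\alpha/2,\nu}\,\dfrac{s}{\sqrt{n}}$ (σ unknown, $\nu = n-1$)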
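Both interval formulas translate directly into code. In this sketch the function names are hypothetical. `statistics.NormalDist` supplies $z_{\alpha/2}$, but the standard library has no t quantile function, so $t_{\alpha/2,\nu}$ is assumed to come from a t table and is passed in.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, conf=0.95):
    """CI for mu when sigma is known: xbar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

def t_interval(xbar, s, n, t_crit):
    """CI for mu when sigma is unknown: xbar +/- t_{alpha/2,nu} * s / sqrt(n).
    t_crit must be looked up in a t table with nu = n - 1."""
    half = t_crit * s / math.sqrt(n)
    return xbar - half, xbar + half

# Parking-fee numbers from the example in these notes: xbar = 126, sigma = 15, n = 44
lo, hi = z_interval(126, 15, 44)
print(f"95% z interval: ({lo:.2f}, {hi:.2f})")   # (121.57, 130.43)
```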
Student t-distribution
$H_0$ is the initial presumption about the population.
Test statistic: a random variable computed from a sample statistic.
When testing $\mu$, the test statistic is a function of $\bar{X}$.
Recall that when testing $p$, the test statistic is a function of $\hat{p}$.
Example: Parking Fees
Hoping to lure more shoppers downtown, a city builds a new public parking garage in the central business district. The city plans to pay for the structure through parking fees. The city would break even only if the average parking revenues are greater than $125. For a random sample of 44 weekdays, daily fees collected averaged $126. Assume that the standard deviation is known to be $15.
$H_0: \mu = 125$
$H_A: \mu > 125$
$n = 44$
population standard deviation, $\sigma = 15$
significance level, $\alpha = 0.05$
sample mean, $\bar{X} = 126$
Rejection region with t (one-sided test)
Rejection region: $t > t_{\alpha,\nu}$
$t_{\alpha,\nu} = t_{.05,43} = 1.681$
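Test statistic $= \dfrac{\bar{X}-\mu_0}{s/\sqrt{n}} = \dfrac{126-125}{15/\sqrt{44}} = 0.44$ (here 15 plays the role of $s$; because σ is actually known, this is also the z statistic, whose critical value would be $z_{.05} = 1.645$)
Since $0.44 < 1.681$, the test statistic is not in the rejection region, so we do not reject $H_0$.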
p-value with t (one-sided test)
p-value $= P\left(t > \dfrac{\bar{X}-\mu_0}{s/\sqrt{n}}\right) = P(t > 0.44) = 0.33$
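The parking-fee arithmetic can be reproduced in a few lines. This sketch treats σ = 15 as known, so it computes the p-value from the standard normal with `statistics.NormalDist`; the critical value 1.681 is copied from the t table value used above because the standard library has no t distribution. With 43 degrees of freedom the t and z p-values agree to two decimals.

```python
import math
from statistics import NormalDist

xbar, mu0, sigma, n, alpha = 126, 125, 15, 44, 0.05

# Standardized test statistic: (xbar - mu0) / (sigma / sqrt(n))
stat = (xbar - mu0) / (sigma / math.sqrt(n))

# Rejection-region approach (one-sided, H_A: mu > 125)
t_crit = 1.681                                # t_{.05,43}, from a t table
z_crit = NormalDist().inv_cdf(1 - alpha)      # z_{.05}, about 1.645
reject = stat > t_crit

# p-value approach: P(Z > stat) under H0, with sigma known
p_value = 1 - NormalDist().cdf(stat)

print(f"test statistic = {stat:.2f}")         # 0.44
print(f"reject H0? {reject}  (vs z_.05 = {z_crit:.3f}: {stat > z_crit})")
print(f"p-value = {p_value:.2f}")             # 0.33
```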
Link CI Estimator and Hypothesis Testing
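One way to see the link, written here as a sketch rather than taken from the notes: a one-sided test of $H_0: \mu = \mu_0$ against $H_A: \mu > \mu_0$ at level α rejects exactly when $\mu_0$ falls below the one-sided lower confidence bound $\bar{X} - t_{\alpha,\nu}\, s/\sqrt{n}$. (For a two-sided test the same holds with the usual two-sided $1-\alpha$ interval.) The code reuses the parking numbers and the table value $t_{.05,43} = 1.681$.

```python
import math

xbar, s, n, mu0 = 126, 15, 44, 125
t_crit = 1.681                     # t_{.05,43}, from a t table

# One-sided 95% lower confidence bound for mu
lower = xbar - t_crit * s / math.sqrt(n)

# Decision via the bound: reject H0 iff mu0 lies below the bound
reject_via_ci = mu0 < lower
# Decision via the test statistic: reject H0 iff t > t_crit
reject_via_t = (xbar - mu0) / (s / math.sqrt(n)) > t_crit

print(f"lower bound = {lower:.2f}")    # 122.20
print(reject_via_ci, reject_via_t)     # False False
```

Both routes rearrange the same inequality, $t > t_{\alpha,\nu} \iff \mu_0 < \bar{X} - t_{\alpha,\nu}\, s/\sqrt{n}$, so they always agree.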
σ is unknown?
If the population standard deviation σ is unknown, we estimate it with our best guess, the sample standard deviation s.
We can then no longer use the normal distribution as the sampling distribution of the standardized sample mean.
Instead, we use the Student t distribution.
The test statistic follows a Student t distribution with $\nu = n-1$ degrees of freedom, not the standard normal:
Test statistic $= \dfrac{\bar{X}-\mu}{s/\sqrt{n}}$
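Hypothesis testing: a statistical inference method used to decide whether a difference between samples, or between a sample and the population, is caused by sampling error or by a real underlying difference.
Basic principle: first make an assumption about some characteristic of the population, then use statistical reasoning from a sample to infer whether this assumption should be rejected or accepted.
1. The small-probability principle
If a hypothesis about the population is true, then an event A that is unfavorable to it (a small-probability event) is almost impossible in a single trial. If A nevertheless occurs in a single trial, there is reason to doubt the hypothesis and reject it.
2. Form of the hypotheses
H0: null hypothesis; H1: alternative hypothesis
Hypothesis testing tests the null hypothesis (H0) using the observed sample: accepting H0 means rejecting H1; rejecting H0 means accepting H1.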
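The principle of hypothesis testing
In general, making a hypothesis about one or more characteristics of a population and then deciding from a sample whether to accept or reject it is called hypothesis testing.
Hypothesis testing uses a line of reasoning similar to proof by contradiction. Its characteristics are:
(1) First assume that a hypothesis about the population holds and work out what results it leads to. If it leads to an unreasonable outcome, reject the original hypothesis. If it does not, the original hypothesis cannot be rejected, and so it is accepted.
(2) It nevertheless differs from ordinary proof by contradiction. An "unreasonable outcome" here is not an absolute contradiction in formal logic; it rests on the small-probability principle: an event with very small probability is almost impossible in a single trial, so if it does occur, that is unreasonable. What counts as a "small probability"? Usually an event with probability no greater than 0.05 is called a small-probability event, though 0.1 or 0.01 may be used depending on the situation. In hypothesis testing this probability is denoted α and called the significance level. The hypothesis set up at the start is called the null hypothesis, denoted H0. The hypothesis opposite to H0 is called the alternative hypothesis, denoted H1; it is the hypothesis to accept when the null hypothesis is rejected.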
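The significance level α is the probability of rejecting H0 when H0 is in fact true. A small simulation makes this concrete. It is a sketch under illustrative assumptions not taken from the notes: a normal population with known σ, a one-sided z test, n = 30 and 20,000 repetitions; the function name `type_one_error_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def type_one_error_rate(mu0=0.0, sigma=1.0, n=30, alpha=0.05, reps=20_000, seed=1):
    """Fraction of one-sided z tests that reject H0: mu = mu0 when H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # data generated under H0
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        rejections += z > z_crit                              # counts a "small-probability" event
    return rejections / reps

print(type_one_error_rate())   # close to 0.05
```

The small-probability event does occur, about 5% of the time, and each such occurrence is a wrong rejection (a Type I error).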
Q: For a pharmaceutical company to establish that its new drug is effective, it must show that at least 66% of patients benefit. In a random sample of 142 patients, 97 were found to benefit.
p: proportion of patients who benefit
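$\hat{p} = 97/142 = 0.683$
$H_0: p = 0.66$
$H_1: p > 0.66$
$P(Z > 1.645) = 0.05$, so at $\alpha = 0.05$ the critical value is $z_{.05} = 1.645$
$Z = \dfrac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}} = \dfrac{0.683-0.66}{\sqrt{0.66(1-0.66)/142}} = 0.579$
s.e. $= \sqrt{\dfrac{p_0(1-p_0)}{n}}$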
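Compared with the population value, the gap between 0.683 and 0.66 could plausibly be caused by sampling error.
Standardize and compare with $z_{.05} = 1.645$: since $0.579 < 1.645$, do not reject $H_0$.
Critical value and $\hat{p}$
Type I error
p-value $= P(Z > 0.579) = 0.281$
What would the significance level have to be set to in order to reject the null hypothesis? Only α > 0.281.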
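The same test in code, as a sketch using only `statistics.NormalDist`. It keeps $\hat{p}$ unrounded, so it reports z ≈ 0.581 rather than the 0.579 obtained from the rounded 0.683; the conclusion and the p-value of about 0.281 are unchanged.

```python
import math
from statistics import NormalDist

x, n, p0, alpha = 97, 142, 0.66, 0.05

p_hat = x / n                                  # about 0.683
se = math.sqrt(p0 * (1 - p0) / n)              # standard error under H0
z = (p_hat - p0) / se
z_crit = NormalDist().inv_cdf(1 - alpha)       # about 1.645
p_value = 1 - NormalDist().cdf(z)              # one-sided: P(Z > z)

print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, z_crit = {z_crit:.3f}")
print(f"reject H0? {z > z_crit}, p-value = {p_value:.3f}")
```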
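Q2: Last year, life expectancy was 77 years. Premium sample: $n = 28$, $\bar{X} = 78.6$, $s = 4.48$.
$H_0: \mu = 77$
$H_1: \mu > 77$
Sample $t = \dfrac{\bar{x}-\mu}{s/\sqrt{n}} = \dfrac{78.6-77}{4.48/\sqrt{28}} = 1.890$
$t_{.05,27} = 1.703$
$n = 28$
$\nu = 28 - 1 = 27$
Since $1.890 > 1.703$, reject $H_0: \mu = 77$.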
If α = 1%, $H_0$ may be retained, since $t_{.01,27} = 2.473 > 1.890$.
p-value = ? (suppose it is 3.5%: retain $H_0$ at α = 1%, reject at α = 5%)
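The p-value for Q2 can be computed without a t table. Because the standard library has no t distribution, this sketch integrates the Student t density with Simpson's rule; that is a substitute for a table or a library such as SciPy, not the method used in the notes. The function names `t_pdf` and `t_sf` are hypothetical, and `t_sf` only handles t ≥ 0.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_sf(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule integral over [0, t].
    `steps` must be even."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    return 0.5 - total * h / 3

xbar, mu0, s, n = 78.6, 77, 4.48, 28
t_stat = (xbar - mu0) / (s / math.sqrt(n))
p_value = t_sf(t_stat, n - 1)

print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
for alpha in (0.01, 0.05):
    print(f"alpha = {alpha}: {'reject' if p_value < alpha else 'do not reject'} H0")
```

The computed p-value lands near 3.5%, matching the guess above: H0 is retained at α = 1% and rejected at α = 5%.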
Rejection Region Approach - Shortcomings
P-Value
How small does the P-Value have to be?
Statistical significance
Conventional Significance Levels
Type I error and p-value
Testing Hypothesis
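$\hat{y} = b_0 + b_1 x$
How is it derived? Choose $b_0$ and $b_1$ to minimize $\sum_i (y_i - b_0 - b_1 x_i)^2$, i.e. set $\partial \sum_i (y_i - b_0 - b_1 x_i)^2 / \partial b_0 = 0$ (and likewise for $b_1$).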
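Carrying the derivation through gives the standard least-squares solution, written out here as a worked derivation rather than copied from the notes. The $(n-1)$ factors in $s_{xy}$ and $s_x^2$ cancel, which is why the ratio of sums equals $s_{xy}/s_x^2$:

```latex
\min_{b_0,b_1} \; S(b_0,b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

\frac{\partial S}{\partial b_0} = -2\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
\;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x}

\frac{\partial S}{\partial b_1} = -2\sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
\;\Rightarrow\; b_1 = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i}(x_i-\bar{x})^2} = \frac{s_{xy}}{s_x^2}
```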
Population Regression Function
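$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
observation index $i$
dependent variable $y_i$, independent variable $x_i$
constant term (intercept) $\beta_0$, slope coefficient $\beta_1$
error term $\varepsilon_i$ (its sample counterpart is the residual $e_i$): noise that the model cannot capture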
Probabilistic model: contains an unobservable term $\varepsilon$, so only probabilistic statements are possible:
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Deterministic model: all terms are observed, so there is no uncertainty.
Ordinary Least Squares
OLS is a method for estimating β0 and β1
OLS returns the estimates $b_0$ and $b_1$. It can be shown that $E[b_0] = \beta_0$ and $E[b_1] = \beta_1$,
i.e. the OLS estimators are unbiased.
Population: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Sample: $y_i = b_0 + b_1 x_i + e_i$
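Unbiasedness can be checked by simulation: draw many samples from a known population line, fit OLS to each, and average the estimates. The population values β0 = 2, β1 = 0.5, the N(0, 1) errors and the fixed x values 1..20 are hypothetical choices for illustration, and `ols` is a hypothetical helper implementing $b_1 = s_{xy}/s_x^2$, $b_0 = \bar{y} - b_1\bar{x}$.

```python
import random
from statistics import fmean

def ols(x, y):
    """OLS estimates (b0, b1) for y = b0 + b1 x, with b1 = s_xy / s_x^2."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical population: beta0 = 2, beta1 = 0.5, errors ~ N(0, 1)
beta0, beta1 = 2.0, 0.5
rng = random.Random(42)
x = [float(i) for i in range(1, 21)]          # fixed regressor values
estimates = []
for _ in range(5_000):
    y = [beta0 + beta1 * xi + rng.gauss(0, 1) for xi in x]
    estimates.append(ols(x, y))

print(f"average b0 = {fmean(b0 for b0, _ in estimates):.3f} (beta0 = {beta0})")
print(f"average b1 = {fmean(b1 for _, b1 in estimates):.3f} (beta1 = {beta1})")
```

Individual estimates scatter, but their averages should land close to the true β0 and β1.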
Assumptions
Model is linear in parameters and errors
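$E(\varepsilon_i) = 0$
$V(\varepsilon_i) = \sigma^2$ (homoscedasticity)
$\mathrm{Cov}[\varepsilon_i, \varepsilon_j] = 0$ for $i \neq j$ (no autocorrelation)
$E[\varepsilon_i \varepsilon_j] = 0$ (equivalent to the previous line, given $E(\varepsilon_i) = 0$)
$\mathrm{Cov}[x_i, \varepsilon_j] = 0$ (exogeneity)
$\varepsilon \sim N(0, \sigma^2)$
Assuming the error term is normal is a strong assumption.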
Chapter 18 (18.3-18.5)
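Model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$b_0$ is a point estimate; how do we compute an interval estimate for it?
Sample mean: its standard error is $\sigma/\sqrt{n}$.
Similarly,
in the population, $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
$\beta_0, \beta_1$ are constants and $\varepsilon_i \sim N(0, \sigma^2)$, so $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$
$b_1 = \dfrac{s_{xy}}{s_x^2} \Rightarrow V(b_1) = V\left[\dfrac{s_{xy}}{s_x^2}\right]$
$s_{b_1} = \dfrac{s_\varepsilon}{\sqrt{(n-1)s_x^2}}$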
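The standard error of the slope can be computed directly from this formula. In this sketch $s_\varepsilon$ is taken to be the standard error of estimate $\sqrt{SSE/(n-2)}$. The function name `slope_inference` and the data are hypothetical, for illustration only. An interval estimate would then be $b_1 \pm t_{\alpha/2,n-2}\, s_{b_1}$, with the t value taken from a table.

```python
import math
from statistics import fmean

def slope_inference(x, y):
    """Return b1, s_e and s_b1 = s_e / sqrt((n - 1) s_x^2) for simple OLS."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)            # equals (n - 1) * s_x^2
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_e = math.sqrt(sse / (n - 2))                      # standard error of estimate
    s_b1 = s_e / math.sqrt(sxx)
    return b1, s_e, s_b1

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
b1, s_e, s_b1 = slope_inference(x, y)
print(f"b1 = {b1:.3f}, s_b1 = {s_b1:.4f}, t = b1 / s_b1 = {b1 / s_b1:.1f}")
```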