Principles of Statistics: Notes for Statistics II

by Feiran Jia

Lecture 12 Introduction to Hypothesis Testing

Sampling Distribution of Sample Mean, $\bar{X}$

  • $\bar{X}$ is distributed with mean equal to the population mean, $\mu$, and standard deviation $\sigma/\sqrt{n}$

    $\bar{X} \sim (\mu, \sigma^2/n)$

  • The Central Limit Theorem implies that for a sample with n > 30 the shape of the distribution of $\bar{X}$ will be approximately normal

    $\bar{X} \sim N(\mu, \sigma^2/n)$

  • If σ is unknown, the standard deviation of $\bar{X}$ is estimated by $s/\sqrt{n}$

  • Standardized sample mean $t = \dfrac{\bar{X} - \mu}{s/\sqrt{n}}$ ~ Student t distribution with $\nu = n - 1$ degrees of freedom
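
A quick simulation can make these statements concrete. The sketch below assumes a normal population with hypothetical values μ = 125, σ = 15 and n = 44 (the helper name `simulate_sample_means` is illustrative, not from the lecture) and checks that the sample means centre on μ with a spread close to σ/√n.

```python
import random
import statistics
from math import sqrt

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]

# Hypothetical population values, chosen only for illustration.
mu, sigma, n = 125.0, 15.0, 44
means = simulate_sample_means(mu, sigma, n, reps=5000)

mean_of_means = statistics.fmean(means)   # should be close to mu
sd_of_means = statistics.stdev(means)     # should be close to sigma / sqrt(n)
standard_error = sigma / sqrt(n)          # theoretical value

print(round(mean_of_means, 2), round(sd_of_means, 2), round(standard_error, 2))
```

Increasing `n` in this sketch shrinks the spread of the sample means, which is the σ/√n effect stated in the first bullet.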

Confidence Interval Estimator

  • Confidence interval estimator of μ is

    $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ or $\bar{X} \pm t_{\alpha/2,\nu}\,s/\sqrt{n}$

Student t-distribution

Lecture 13 Hypothesis Testing: Population Mean μ

  • Test-statistic - numerical value from sample
  • Critical value - numerical value from table
  • Significance level - probability set up before test

Idea of Formal Testing of µ

  • H0 is initial presumption about population

    • H0 is not based on any evidence
  • Test-statistic - A random variable created using sample statistic

    • When testing $\mu$, test-statistic is a function of $\bar{X}$

    • Recall that when testing p, test-statistic is a function of $\hat{p}$

Example: Parking Fees

Hoping to lure more shoppers downtown, a city builds a new public parking garage in the central business district. The city plans to pay for the structure through parking fees. The city would break even only if the average parking revenues are greater than $125. For a random sample of 44 weekdays, daily fees collected averaged $126. Assume that the standard deviation is known to be $15.

$H_0: \mu = 125$

$H_A: \mu > 125$

n = 44

population standard deviation, σ = 15

significance level, α = 0.05

sample mean, $\bar{X} = 126$

Rejection region with t

[one-sided test]

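Rejection region: $t > t_{\alpha,\nu}$

$t_{\alpha,\nu} = t_{.05,43} = 1.681$

Test-statistic = $\dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} = \dfrac{126 - 125}{15/\sqrt{44}} = 0.44$

Since 0.44 < 1.681, the test statistic is outside the rejection region and $H_0$ is not rejected.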
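
The rejection-region decision for the parking-fee example can be reproduced in a few lines. This is a minimal sketch: the function name `standardized_mean` is hypothetical, and the critical value 1.681 is copied from the notes rather than computed.

```python
from math import sqrt

def standardized_mean(x_bar, mu_0, sd, n):
    """Test statistic (x_bar - mu_0) / (sd / sqrt(n))."""
    return (x_bar - mu_0) / (sd / sqrt(n))

# Critical value t_{.05,43} as quoted in the notes (table lookup, not computed here).
T_CRITICAL = 1.681

t_stat = standardized_mean(x_bar=126.0, mu_0=125.0, sd=15.0, n=44)
reject = t_stat > T_CRITICAL  # upper-tailed test: reject only for large t

print(round(t_stat, 2), reject)
```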

p-value with t

[One-sided test]

P-value = $P\left(t > \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}\right) = P(t > 0.44) = 0.33$
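
The tail probability can be checked numerically without tables. The sketch below integrates the Student t density with Simpson's rule; `t_pdf` and `t_upper_tail` are helper names made up for this illustration, and in practice a statistics library would normally be used instead.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3            # integral of the density from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area

t_stat = (126.0 - 125.0) / (15.0 / sqrt(44))
p_value = t_upper_tail(t_stat, df=43)

print(round(p_value, 2))
```

Because the p-value is well above α = 0.05, this agrees with the rejection-region result: the sample gives no reason to reject $H_0$.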

Link CI Estimator and Hypothesis Testing

  • Confidence interval is centered at $\bar{X}$
  • 90% confidence interval, α = .10 (?)
  • CI: $\bar{X} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
  • Parameter value under H0
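
One way to see the link, assuming the parking-fee figures ($\bar{X}$ = 126, σ = 15, n = 44, $\mu_0$ = 125): the lower limit of a 90% interval uses $z_{.05}$, the same critical value as an upper-tailed test at α = .05, so that test rejects $H_0$ exactly when $\mu_0$ lies below the lower limit. `z_confidence_interval` is an illustrative helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Two-sided interval x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

mu_0 = 125.0  # parameter value under H0
lower, upper = z_confidence_interval(x_bar=126.0, sigma=15.0, n=44, confidence=0.90)

# Upper-tailed test at alpha = .05: reject H0 only if mu_0 is below the lower limit.
reject_h0 = mu_0 < lower

print(round(lower, 2), round(upper, 2), reject_h0)
```

Here $\mu_0$ = 125 falls inside the interval, so $H_0$ is not rejected, matching the test-statistic result.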

σ is unknown?

  • If population standard deviation, σ is unknown, we use our best guess to estimate it - sample standard deviation, s

  • We can no longer rely on normal distribution as a sampling distribution of sample mean

  • Instead, we use Student t distribution

  • Test statistic is distributed according to Student t, not standard normal

    Test statistic = $\dfrac{\bar{X} - \mu}{s/\sqrt{n}}$

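Hypothesis testing: a method of statistical inference used to judge whether a difference between two samples, or between a sample and a population, is caused by sampling error or reflects a real difference.

Basic principle: first make an assumption about some characteristic of the population, then use statistical reasoning on sample data to decide whether that assumption should be rejected or accepted.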

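1. Small-probability principle

  If a hypothesis about the population is true, then an event A that is unfavorable to, or cannot support, that hypothesis (a small-probability event) is almost impossible in a single trial; if A nevertheless occurs in a single trial, there is reason to doubt the hypothesis and to reject it.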

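2. Form of the hypotheses

  H0: null hypothesis, H1: alternative hypothesis

Hypothesis testing uses the sample observations to test the null hypothesis (H0): accepting H0 means rejecting H1; rejecting H0 means accepting H1.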

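  3. Principle of hypothesis testing

    In general, making an assumption about one or more characteristics of the population and then using a sample to decide whether to accept or reject that assumption is called hypothesis testing.

    Hypothesis testing uses reasoning similar to proof by contradiction, with the following features:

    (1) First assume that some hypothesis about the population holds and work out what it implies. If it leads to an unreasonable outcome, reject the hypothesis. If it does not lead to an unreasonable outcome, the hypothesis cannot be rejected and is therefore accepted.

    (2) It nevertheless differs from an ordinary proof by contradiction. An "unreasonable outcome" is not an absolute contradiction in formal logic; it rests on the small-probability principle: an event with very small probability is almost impossible in a single trial, so if it does occur, that is unreasonable. What counts as a small probability? Usually an event with probability no greater than 0.05 is called a small-probability event; depending on the situation, 0.1 or 0.01 may be used instead. In hypothesis testing this probability is denoted α and called the significance level. The hypothesis assumed at the outset is called the null hypothesis, denoted H0. The hypothesis opposite to H0 is called the alternative hypothesis, denoted H1; it is the hypothesis accepted when the null hypothesis is rejected.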

Q: For a pharmaceutical company to establish that its new drug is effective, it must show that at least 66% of patients benefit. In a random sample of 142 patients, 97 were found to benefit.

p: proportion of patients who benefit

$\hat{p} = 97/142 = 0.683$

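$H_0: p = 0.66$

$H_1: p > 0.66$

  1. $P(Z > 1.645) = 0.05$, α = 5%

    $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \dfrac{0.683 - 0.66}{\sqrt{0.66(1 - 0.66)/142}} = 0.579$

    s.e. $= \sqrt{p_0(1 - p_0)/n}$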

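Seen against the population value, the difference may simply be due to sampling error.

Standardize and compare with $z_{0.05} = 1.645$: since 0.579 < 1.645, $H_0$ is not rejected.

  2. $P\left(\dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} > 1.645\right) = 0.05 \Rightarrow \hat{p} = 0.725$

critical value in terms of $\hat{p}$

type-I error

  3. P-value

0.281

How high would the significance level have to be set to reject the null hypothesis? Above 0.281.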
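
The three steps of this example (test statistic, critical value of $\hat{p}$, and p-value) can be recomputed as follows. It is a sketch using Python's standard-library `NormalDist`; $\hat{p}$ is rounded to 0.683 as in the notes, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, benefited = 142, 97
p0, alpha = 0.66, 0.05
std_normal = NormalDist()

p_hat = round(benefited / n, 3)        # 0.683, rounded as in the notes
se = sqrt(p0 * (1 - p0) / n)           # standard error under H0

# Step 1: standardized test statistic, compared with z_{0.05}.
z = (p_hat - p0) / se
z_crit = std_normal.inv_cdf(1 - alpha)
reject = z > z_crit

# Step 2: the same critical value expressed on the p_hat scale.
p_hat_crit = p0 + z_crit * se

# Step 3: p-value for the upper-tailed test.
p_value = 1 - std_normal.cdf(z)

print(round(z, 3), round(z_crit, 3), round(p_hat_crit, 3), reject)
print(round(p_value, 2))
```

The sample proportion would have had to reach roughly 0.725 before $H_0$ could be rejected at the 5% level.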

Q2: Last year, life expectancy was 77 years. Premium: n = 28, $\bar{X} = 78.6$, s = 4.48

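$H_0: \mu = 77$

$H_1: \mu > 77$

  1. Sample $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{78.6 - 77}{4.48/\sqrt{28}} = 1.890$

    $t_{0.05} = 1.703$

    n = 28

    $\nu = 28 - 1 = 27$

    Since 1.890 > 1.703, $H_0: \mu = 77$ is rejected

    If α = 1%, $H_0$ may be retained

  2. P-value = ? (suppose it is 3.5%: retain $H_0$ at α = 1%, reject at α = 5%)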
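
The supposed p-value of about 3.5% can be checked numerically. The sketch below is an illustration only: `t_pdf` and `t_upper_tail` are hypothetical helpers that integrate the Student t density with Simpson's rule, standing in for a table or a statistics library.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area

n, x_bar, s, mu_0 = 28, 78.6, 4.48, 77.0
t_stat = (x_bar - mu_0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)

reject_at_5pct = p_value < 0.05
reject_at_1pct = p_value < 0.01

print(round(t_stat, 2), reject_at_5pct, reject_at_1pct)
```

The computed p-value lies between 1% and 5%, which is why the decision flips between those two significance levels.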

Lecture 14 Hypothesis Tests: P-Value Approach

Rejection Region Approach - Shortcomings

  • Rejection region approach has only two outcomes - yes or no.
  • However, results from some tests are stronger than from the others.
  • Think of the test statistic being very close to or very far from the critical value.
  • To take advantage of information available from test statistic, we need a better measure of the statistical evidence supporting alternative hypothesis.
  • Solution: P-value approach

P-Value

  • Definition: The probability of observing a sample statistic at least as extreme as the one actually observed (in the direction of HA ) given H0 is true
  • Example: p-value $= P(\hat{p} > 0.6 \mid H_0 \text{ is true})$
  • Small p-value:
    • Such an event is highly unlikely if H0 is true
    • Cast doubt upon the validity of H0
    • Small enough p-value gives us reason to reject H0 and supports HA
  • P-value is the smallest significance level at which H0 would be rejected: if we reject H0 for results at least this extreme, the probability of a Type I error equals the p-value
  • For P-value, smaller is better (in support of alternative hypothesis)

How small does the P-Value have to be?

  • How small does the p-value have to be to infer that HA is true?
  • 0 - 0.01 implies overwhelming evidence
  • 0.01-0.05 implies strong evidence
  • 0.05 - 0.10 implies weak evidence
  • greater than 0.10 means no evidence in favour of HA
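
These bands can be written as a small lookup. The function name `evidence_strength` and the treatment of boundary values (each band includes its lower limit) are assumptions, since the notes do not say which band 0.01, 0.05 or 0.10 belongs to.

```python
def evidence_strength(p_value):
    """Map a p-value to the verbal scale used in the notes (evidence for HA)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "overwhelming"
    if p_value < 0.05:
        return "strong"
    if p_value < 0.10:
        return "weak"
    return "none"

# p-values taken from the worked examples in these notes.
for p in (0.33, 0.281, 0.035):
    print(p, evidence_strength(p))
```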

Statistical significance

  • Pick significance level before calculating p-value!
  • If p-value falls below significance level, we say that the results from the test are statistically significant
    • Significant: has meaning, is important
    • Economically significant: the effect is large enough for decision makers to consider it to be important
    • Statistically significant: an effect that is not likely equal to zero given the data; an effect that is not likely observed due to chance (sampling error)
  • Do not confuse statistical significance and economic, or practical significance
  • Always report p-value together with your conclusion about the results of the test

Conventional Significance Levels

Type I error and p-value

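  • Type I error: rejecting a true H0 (accepting a false HA); its probability is P(reject H0 | H0 is true)
  • With the p-value approach, we can calculate the exact Type I error probability implied by the observed test statistic
  • P-value = probability of a Type I error if the observed test statistic were used as the critical value
  • Equivalently, the p-value is the chance, when H0 is true, of a result at least as extreme as the one observed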

Testing Hypothesis

  • Formulate null and alternative hypotheses
  • Pick significance level, α
    1. Rejection Region
      • Calculate test statistic
      • Find rejection region using statistical tables
      • Compare test statistic to rejection region
    2. P-value
      • Calculate test statistic
      • Compute p-value using statistical table
      • Compare p-value to significance level, α
  • Interpret results, and draw conclusion
  • A picture speaks a thousand words, especially in hypothesis testing
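
Both routes lead to the same decision, which the sketch below illustrates for an upper-tailed test with known σ (a z-test, chosen here so that only the standard library is needed). The function name and the returned keys are hypothetical; the inputs are the parking-fee figures from Lecture 13.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(x_bar, mu_0, sigma, n, alpha):
    """Run an upper-tailed z-test and report the decision both ways."""
    std_normal = NormalDist()
    z = (x_bar - mu_0) / (sigma / sqrt(n))     # test statistic
    z_crit = std_normal.inv_cdf(1 - alpha)     # rejection region: z > z_crit
    p_value = 1 - std_normal.cdf(z)            # P(Z > z) under H0
    return {
        "z": z,
        "z_crit": z_crit,
        "p_value": p_value,
        "reject_by_region": z > z_crit,
        "reject_by_p_value": p_value < alpha,
    }

result = upper_tail_z_test(x_bar=126.0, mu_0=125.0, sigma=15.0, n=44, alpha=0.05)
print(round(result["z"], 2), round(result["p_value"], 2),
      result["reject_by_region"], result["reject_by_p_value"])
```

The two boolean results always agree, because z exceeds the critical value exactly when the upper-tail area beyond z is smaller than α.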

Lecture 15 Regression Analysis: Introduction and Assumptions

Least Squares Method (OLS)

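$\hat{y} = b_0 + b_1 x$

How is it derived? Minimize $\sum_i (y_i - b_0 - b_1 x_i)^2$: set $\partial \sum_i (y_i - b_0 - b_1 x_i)^2 / \partial b_0 = 0$, and likewise for $b_1$.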
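
One way to answer this, sketched under the usual textbook setup: choose $b_0$ and $b_1$ to minimize the sum of squared residuals and set both partial derivatives to zero. The function name $S$ and the summation index are notation assumed here; $s_{xy}/s_x^2$ is the slope formula that appears in Lecture 16 of these notes.

```latex
\begin{aligned}
S(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \\
\frac{\partial S}{\partial b_0} &= -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\
\frac{\partial S}{\partial b_1} &= -2 \sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
  \;\Rightarrow\; b_1 = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}
  = \frac{s_{xy}}{s_x^2}
\end{aligned}
```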

Population Regression Function
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

  observation index i

  dependent variable $y_i$

  constant term $\beta_0$

  slope coefficient $\beta_1$

  residual/error term $\varepsilon_i$

noise cannot be captured

  • Probabilistic model: Contains an unobservable term ε, which makes only probabilistic statements possible:

    $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

  • Deterministic model: All terms are observed, which means no uncertainty

Ordinary Least Squares

  • OLS is a method for estimating β0 and β1

  • OLS returns estimates $b_0$ and $b_1$. It can be shown that $E[b_0] = \beta_0$ and $E[b_1] = \beta_1$

    • OLS produces unbiased, consistent and relatively efficient estimates

    unbiased?

  • Population: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

  • Sample: $y_i = b_0 + b_1 x_i + e_i$
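
To see what unbiasedness means in practice, one can simulate many samples from a known population line and average the OLS estimates. This is a minimal sketch: β0 = 2, β1 = 0.5, σ = 1 and the x values are hypothetical, and `ols_fit` is an illustrative helper that uses the slope formula $b_1 = s_{xy}/s_x^2$.

```python
import random
from statistics import fmean

def ols_fit(x, y):
    """Return (b0, b1) from the least-squares formulas."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # the (n - 1) factors of s_xy and s_x^2 cancel
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical population regression function: y = 2 + 0.5 x + eps, eps ~ N(0, 1).
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = [float(i) for i in range(1, 21)]
rng = random.Random(0)         # fixed seed so the run is repeatable

estimates = []
for _ in range(2000):
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
    estimates.append(ols_fit(x, y))

mean_b0 = fmean(b0 for b0, _ in estimates)
mean_b1 = fmean(b1 for _, b1 in estimates)
print(round(mean_b0, 2), round(mean_b1, 2))   # close to beta0 and beta1
```

Any single sample gives estimates that miss the true values, but the averages over many samples settle near β0 and β1, which is what $E[b_0] = \beta_0$ and $E[b_1] = \beta_1$ express.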

Assumptions

  1. Model is linear in parameters and errors

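  2. $E(\varepsilon) = 0$

  3. $V(\varepsilon_i) = \sigma^2$, or homoscedasticity

  4. $Cov[\varepsilon_i, \varepsilon_j] = 0$ for $i \neq j$, or no autocorrelation

    $E[\varepsilon_i \varepsilon_j] = 0$

  5. $Cov[x_i, \varepsilon_j] = 0$, or exogeneity

  6. $\varepsilon \sim N(0, \sigma^2)$

    Assuming that the error term is normal is a strong assumption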

Lecture 16 Simple Regression: Testing the Slope

Chapter 18 (18.3-18.5)

Simple Regression Model

Model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

  • OLS produces $b_0$ and $b_1$
  • Estimated regression line: $\hat{y} = b_0 + b_1 x$
  • Are $b_0$ and $b_1$ point or interval estimators of the population parameters?
  • What do we need to obtain interval estimators? $\varepsilon_i \sim N(0, \sigma^2)$

$b_0$ is a point estimate; how do we compute its interval?

Standard Error of the Slope Estimate

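sample mean: standard error $\sigma/\sqrt{n}$

Similarly,

  In population $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

$\beta_0$, $\beta_1$ constant, $\varepsilon_i \sim N(0, \sigma^2)$, so $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$

$b_1 = \dfrac{s_{xy}}{s_x^2}$, $V(b_1) = V\left[\dfrac{s_{xy}}{s_x^2}\right]$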
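
A short derivation of this variance, assuming the $x_i$ are treated as fixed and using assumptions 3 and 4 (constant error variance, uncorrelated errors). The weights $w_i$ are notation introduced here for the sketch.

```latex
\begin{aligned}
b_1 &= \frac{s_{xy}}{s_x^2} = \sum_{i=1}^{n} w_i y_i,
  \qquad w_i = \frac{x_i - \bar{x}}{\sum_{j}(x_j - \bar{x})^2} \\
V(b_1) &= \sum_{i=1}^{n} w_i^2 \, V(y_i) = \sigma^2 \sum_{i=1}^{n} w_i^2
  = \frac{\sigma^2}{\sum_{i}(x_i - \bar{x})^2} = \frac{\sigma^2}{(n-1)\,s_x^2}
\end{aligned}
```

Replacing the unknown σ with the standard error of estimate $s_e$ and taking the square root gives the standard error of the slope, $s_{b_1} = s_e / \sqrt{(n-1)\,s_x^2}$.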

What affects precision of the slope estimate?

$s_{b_1} = \dfrac{s_e}{\sqrt{(n-1)\,s_x^2}}$

  • Sample size
  • Variance of independent variable
  • Standard error of estimate, $s_e$
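
A small numerical sketch of this formula follows. The data are made up, `slope_standard_error` is a hypothetical helper, and $s_e$ is computed as $\sqrt{SSE/(n-2)}$, the usual definition, which these notes do not spell out.

```python
from math import sqrt
from statistics import fmean, variance

def slope_standard_error(x, y):
    """Return (b1, s_e, s_b1) for a simple regression of y on x."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = variance(x)                          # sample variance of x
    b1 = s_xy / s_x2
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s_e = sqrt(sse / (n - 2))                   # assumed definition of s_e
    s_b1 = s_e / sqrt((n - 1) * s_x2)
    return b1, s_e, s_b1

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b1, s_e, s_b1 = slope_standard_error(x, y)
print(round(b1, 3), round(s_e, 3), round(s_b1, 3))
```

The denominator $(n-1)\,s_x^2$ grows with both the sample size and the spread of x, so either one makes the slope estimate more precise, while a larger $s_e$ makes it less precise.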
