
EECS6127 – Winter 2020–2021
Assignment 1
Due: 11:59:59 PM, February 6, 2021.
A full score on this assignment is 25 marks. However, there are 28 marks achievable
on this assignment (up to 3 bonus marks). To gain full marks, explain your answers
carefully and prove your claims.

  1. Bayes classifier and Bayes risk
    Recall that we model data generation as a probability distribution $D$ over
    $\mathcal{X} \times \{0, 1\}$. $D$ can be defined in terms of its marginal
    distribution over $\mathcal{X}$, which we will denote by $D_X$, and the
    conditional labeling distribution, which is given by the regression function
    $\mu(x) = \Pr_{(x,y) \sim D}[y = 1 \mid x].$
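    For reference, the Bayes predictor and the Bayes risk asked about in part (a)
    below can be written in terms of the regression function (standard definitions,
    following the usual lecture notation):

    $h^*(x) = \mathbb{1}\left[\mu(x) \geq \tfrac{1}{2}\right], \qquad
    L_D(h^*) = \mathbb{E}_{x \sim D_X}\left[\min\{\mu(x),\, 1 - \mu(x)\}\right].$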
    Let’s consider a 2-dimensional Euclidean domain, that is $\mathcal{X} = \mathbb{R}^2$,
    and the following process of data generation: The marginal distribution over
    $\mathcal{X}$ is uniform over the two square areas $[1, 2] \times [1, 2] \cup
    [3, 4] \times [1.5, 2.5]$. Points in the first square $Q_1 = [1, 2] \times [1, 2]$
    are labeled 0 (blue) and points in the second square $Q_2 = [3, 4] \times [1.5, 2.5]$
    are labeled 1 (red), as in the illustration below.
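    In place of the illustration, here is a minimal sampling sketch of this
    data-generating process (assuming numpy and matplotlib; the helper name
    `sample` is a hypothetical choice, not prescribed by the assignment):

        import numpy as np
        import matplotlib.pyplot as plt

        rng = np.random.default_rng(0)

        def sample(m):
            """Draw m labeled points from D: each square carries marginal mass 1/2,
            and points are uniform within the chosen square."""
            y = rng.integers(0, 2, size=m)
            # lower-left corner of Q1 = [1,2]x[1,2] (y=0) or Q2 = [3,4]x[1.5,2.5] (y=1)
            lo = np.where(y[:, None] == 0, [1.0, 1.0], [3.0, 1.5])
            x = lo + rng.random((m, 2))  # uniform offset inside a unit square
            return x, y

        x, y = sample(500)
        plt.scatter(x[:, 0], x[:, 1], c=np.where(y == 0, "blue", "red"), s=8)
        plt.show()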
    (a) Describe the density function of $D_X$, and the regression function, Bayes
    predictor and Bayes risk of $D$.
    (b) Consider the two distributions $D^1$ and $D^2$ that we obtain by “projecting”
    onto each of the axes. Formally, we are marginalizing out one of the features to
    obtain $D^1$ and $D^2$. Both are distributions over $\mathbb{R} \times \{0, 1\}$.
    Describe the density functions of $D^1_X$ and $D^2_X$, and the regression
    functions, Bayes predictors and Bayes risks of $D^1$ and $D^2$.
    (c) Consider the hypothesis classes $\mathcal{H}_{\mathrm{init}}$ and
    $\mathcal{H}_{\mathrm{decst}}$ from Question 2 on this assignment. Determine the
    approximation errors of $\mathcal{H}_{\mathrm{init}}$ for $D^1$ and $D^2$, and the
    approximation error of $\mathcal{H}_{\mathrm{decst}}$ for $D$ (the definition of
    approximation error is recalled after this question).
    (d) For the classes $\mathcal{H}_{\mathrm{init}}$ and $\mathcal{H}_{\mathrm{decst}}$,
    consider their closures under function complements and again determine the
    approximation errors on $D$, $D^1$ and $D^2$.
    • 3 + 3 + 3 marks
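    For reference, the approximation error of a hypothesis class $\mathcal{H}$ with
    respect to a distribution $D$ is the smallest true risk achievable within the
    class (standard definition):

    $\mathrm{approx}(\mathcal{H}, D) = \inf_{h \in \mathcal{H}} L_D(h), \qquad
    \text{where } L_D(h) = \Pr_{(x,y) \sim D}[h(x) \neq y].$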
  2. VC-dimension
    We define hypothesis classes $\mathcal{H}_{\mathrm{init}}$ of initial segments
    over domain $\mathcal{X} = \mathbb{R}$ and $\mathcal{H}_{\mathrm{decst}}$ of
    decision stumps over domain $\mathcal{X} = \mathbb{R}^2$ as follows:
    $\mathcal{H}_{\mathrm{init}} = \{h_a \mid a \in \mathbb{R}\}$, where
    $h_a(x) = \mathbb{1}[x \leq a]$,
    and
    $\mathcal{H}_{\mathrm{decst}} = \{h^i_a \mid a \in \mathbb{R},\ i \in \{1, 2\}\}$, where
    $h^i_a((x_1, x_2)) = \mathbb{1}[x_i \leq a]$.
    Further, for a hypothesis class $\mathcal{H} \subseteq \{0, 1\}^{\mathcal{X}}$, we
    define the class of complements of $\mathcal{H}$, denoted by $\mathcal{H}^c$, as
    the class where we flip all the labels of the functions in $\mathcal{H}$, that is,
    $\mathcal{H}^c = \{h^c \mid h \in \mathcal{H}\}$, where $h^c(x) = |h(x) - 1|$.
    Finally, we let $\mathcal{H}^{cc} = \mathcal{H} \cup \mathcal{H}^c$ denote the
    closure of $\mathcal{H}$ under complements.
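    A minimal executable sketch of these definitions (Python; the helper names are
    hypothetical). Note, for instance, that the complement of an initial segment
    $h_a$ is $h^c_a(x) = \mathbb{1}[x > a]$:

        def h_init(a):
            """Initial segment: h_a(x) = 1[x <= a]."""
            return lambda x: int(x <= a)

        def h_decst(i, a):
            """Decision stump: h^i_a((x1, x2)) = 1[x_i <= a], with i in {1, 2}."""
            return lambda x: int(x[i - 1] <= a)

        def complement(h):
            """h^c(x) = |h(x) - 1|, i.e. the label-flipped hypothesis."""
            return lambda x: 1 - h(x)

        h = h_init(2.0)
        print(h(1.5), h(2.5))                # 1 0
        print(complement(h)(1.5))            # 0, since h^c(x) = 1[x > 2.0]
        g = h_decst(2, 1.5)
        print(g((3.5, 1.2)), g((3.5, 2.0)))  # 1 0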
    (a) Determine the VC-dimensions of $\mathcal{H}_{\mathrm{init}}$ and
    $\mathcal{H}_{\mathrm{decst}}$.
    (b) Show that for every hypothesis class $\mathcal{H}$, we have
    $\mathrm{VC}(\mathcal{H}) = \mathrm{VC}(\mathcal{H}^c)$.
    (c) Determine the VC-dimensions of $\mathcal{H}^{cc}_{\mathrm{init}}$ and
    $\mathcal{H}^{cc}_{\mathrm{decst}}$.
    (d) Show that, for every $k$, there exists a hypothesis class $\mathcal{H}$ with
    VC-dimension $k$ and $\mathrm{VC}(\mathcal{H}^{cc}) = 2k + 1$.
    Hint: consider the domain $\mathcal{X} = \mathbb{N}$ and the classes
    $\mathcal{H}_k = \{h \in \{0, 1\}^{\mathcal{X}} \mid |h^{-1}(1)| \leq k\}.$
    That is, $\mathcal{H}_k$ is the class of functions that map at most $k$ natural
    numbers to 1 and the remaining ones to 0. (A brute-force shattering check is
    sketched after this question.)
    • 3 + 3 + 3 marks
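    As a sanity-checking aid (a minimal sketch, not part of the assignment; the
    function names and the finite discretization of $\mathcal{H}_{\mathrm{init}}$
    are hypothetical): a class $\mathcal{H}$ shatters a finite set $C$ iff
    restricting $\mathcal{H}$ to $C$ realizes all $2^{|C|}$ labelings, which can
    be checked by brute force on small sub-domains.

        from itertools import combinations

        def shatters(hypotheses, points):
            """True iff the hypotheses realize all 2^|points| labelings on `points`."""
            labelings = {tuple(h(x) for x in points) for h in hypotheses}
            return len(labelings) == 2 ** len(points)

        def vc_lower_bound(hypotheses, domain, max_size):
            """Largest d <= max_size such that some d-subset of `domain` is shattered.
            Only a lower bound on the true VC-dimension, since both the domain and
            the class are discretized here."""
            best = 0
            for d in range(1, max_size + 1):
                if any(shatters(hypotheses, C) for C in combinations(domain, d)):
                    best = d
            return best

        # Initial segments h_a(x) = 1[x <= a], discretized over a small grid,
        # and their closure under complements.
        domain = [1, 2, 3, 4]
        H_init = [(lambda x, a=a: int(x <= a)) for a in [0.5, 1.5, 2.5, 3.5, 4.5]]
        H_cc = H_init + [(lambda x, h=h: 1 - h(x)) for h in H_init]

        print(vc_lower_bound(H_init, domain, 3))
        print(vc_lower_bound(H_cc, domain, 3))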
  3. Empirical and true risk
    Recall Claim 2 from Lecture 2. We showed that, over an uncountable domain, there
    exists a learner $A$ (the “stubborn learner”) and a distribution $D$, such that
    for every sample size $m$ and all samples $S$ from $D^m$:
    $|L_S(A(S)) - L_D(A(S))| = 1.$
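    Here $L_S$ and $L_D$ denote the empirical risk and the true risk, respectively
    (standard definitions, following the usual lecture notation):

    $L_S(h) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}[h(x_i) \neq y_i], \qquad
    L_D(h) = \Pr_{(x,y) \sim D}[h(x) \neq y].$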
    This is not true for countable domains (as we will show later in the course). For
    this exercise, we will show a similar, but weaker, statement for countable domains.
    Without loss of generality, we can assume $\mathcal{X} = \mathbb{N}$. Prove that,
    for every sample size $m$ and every $\varepsilon > 0$, there exists a learner $A$
    and a distribution $D$ over $\mathcal{X} \times \{0, 1\}$, such that for all
    samples $S$ from $D^m$ we have
    $|L_S(A(S)) - L_D(A(S))| \geq 1 - \varepsilon.$
