COGS 118A Machine Learning Algorithms

Homework Assignment 7
COGS 118A: Supervised Machine Learning Algorithms
Due: May 20th, 2022, 11:59pm (Pacific Time).
Instructions: Your math can be handwritten. In that case use a scanner app to
take photos of the pages and turn them into a PDF. Adobe, Evernote, Microsoft,
and many others offer free scanner apps. Alternatively, your math may be done in
LaTeX or in a Jupyter Notebook using LaTeX markdown and from there turned into
PDF. All of these different sources must be stitched together into a single PDF file.
A PDF merge tool (e.g., smallpdf, combinepdf) can do this, among many other tools.
Make sure to show the steps of your solution, not just the final answer.
Late Policy: You will receive at most 75 percent credit for any late assignment. Late
submissions will not be accepted more than 1 week after the deadline.
Grade: out of 63 points
1 (12 points) Conceptual Questions

  1. In Figure 1, a black line marks the decision boundary of a linear SVM
    classifier parameterized by weight vector w and bias b. If we start to
    increase the margin of the classifier, what would happen to w?
    Figure 1: A linear SVM classifier with data points.
    A. The magnitude of w does not change.
    B. w will have smaller magnitude.
    C. w will have greater magnitude.
    D. None of the above.
  2. Which one below best describes what support vectors are:
    A. The decision boundary.
    B. The positive and the negative planes.
    C. The training samples that determine the positive and the negative planes.
    D. The test samples that determine the positive and the negative planes.
  3. Consider a dataset S = {(x_i, y_i), i = 1, 2, ..., n} where x = [x1, x2]^T and y ∈
    {−1, +1}. Here we try to learn a linear SVM classifier parameterized by weight
    vector w and bias b to fit the dataset S. Which one is the appropriate loss
    function for the linear SVM classifier from the options below:
    A. L(w, b) = 2 / √(w^T w) + C × Σ_{i=1}^n max(0, 1 − y_i (w^T x_i + b))
    B. L(w, b) = (1/2) w^T w + C × Σ_{i=1}^n max(0, 1 − y_i (w^T x_i + b))
    C. L(w, b) = 2 / √(w^T w) − C × Σ_{i=1}^n max(0, 1 − y_i (w^T x_i + b))
    D. L(w, b) = (1/2) w^T w − C × Σ_{i=1}^n max(0, 1 − y_i (w^T x_i + b))
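Each option above contains the hinge term max(0, 1 − y(w^T x + b)). Below is a small illustration of how that term behaves, using a made-up w, b, and points (not assignment data):

```python
import numpy as np

# Illustrative only: evaluate the hinge term max(0, 1 - y * (w^T x + b))
# for a made-up classifier and two made-up points.
w, b = np.array([1.0, -1.0]), 0.5
for x, y in [(np.array([2.0, 0.0]), +1),    # correctly classified, outside the margin
             (np.array([0.0, 2.0]), +1)]:   # misclassified
    hinge = max(0.0, 1.0 - y * (w @ x + b))
    print(x, y, hinge)                      # 0.0 for the first point, 2.5 for the second
```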
2 (16 points) K Nearest Neighbors

Consider a training dataset S_training = {(x_i, y_i), i = 1, 2, ..., 8} where each data point
(x, y) has a feature vector x = [x1, x2]^T and the corresponding label y ∈ {−1, +1}.
The points with the corresponding labels in the dataset are shown in the figure below.

In the figure above, the index i for each training example x_i is given in bold near
the point. You are asked to predict the label of a point x_pred = [1, 0]^T, shown in the
figure as a triangle ▲, using the k nearest neighbors (k-NN) method under Euclidean
distance.
  1. (4 points) List the x vectors of all the data points in S_training and their corresponding labels.
  2. (4 points) Determine the predicted label for x_pred using k-NN with the following
    values of k (a minimal prediction sketch is given after this section):
    (a) k = 1.
    (b) k = 5.
  3. (8 points) For k-NN and SVM:
    (a) Describe the differences between the decision boundaries of each algorithm.
    (b) When might an SVM be more appropriate than k-NN? Vice versa?
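The sketch referenced above: a minimal k-NN prediction under Euclidean distance. The training points must be read off the figure, so the data below are hypothetical placeholders, not the assignment's actual points.

```python
import numpy as np

# Minimal k-NN classifier sketch (placeholder data, not the figure's points).
def knn_predict(X_train, y_train, x_query, k):
    """Majority vote among the k nearest training points; labels are in {-1, +1}."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]                      # indices of the k closest points
    return int(np.sign(y_train[nearest].sum()))          # majority label (odd k avoids ties)

# Hypothetical usage with placeholder points and the query x_pred = [1, 0]:
X_train = np.array([[0.0, 1.0], [2.0, 0.0], [1.0, 2.0], [-1.0, 0.0]])
y_train = np.array([+1, -1, +1, -1])
x_pred = np.array([1.0, 0.0])
print(knn_predict(X_train, y_train, x_pred, k=1))
print(knn_predict(X_train, y_train, x_pred, k=3))
```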
3 (15 points) Support Vector Machine

Consider a dataset {(x_i, y_i), i = 1, 2, ..., 10} where x = [x1, x2]^T and y ∈ {−1, +1},
which is shown in Figure 2 below. Suppose we have trained a support vector
machine (SVM) classifier on the dataset, which has the decision boundary
w^T x + b = 0. The SVM classifier is optimized as follows:

Find: argmin_w (1/2) ||w||_2^2
Subject to, ∀i: w^T x_i + b ≥ +1, if y_i = +1,
            w^T x_i + b ≤ −1, if y_i = −1.

We define the positive plane as w^T x + b = +1 and the negative plane as
w^T x + b = −1. After training the SVM classifier using the gradient descent method
for several iterations to reach an intermediate state, we have obtained the positive
plane and the negative plane, which are indicated as solid lines in Figure 2 below.
    Figure 2: The positive plane, the negative plane and the data points.
  1. (2 points) Draw the decision boundary of the SVM classifier in Figure 2.
  2. (8 points) Calculate w and b from the positive plane and the negative plane.
    Hint: The positive plane passes through x_4 = (2, 2) and x_8 = (3, 4). The negative
    plane passes through x_3 = (2, 0) and x_7 = (3, 2).
  3. (5 points) Calculate the size of the margin based on your w.
    Hint: The margin can be obtained from 2 / ||w||_2.
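A minimal numerical cross-check for this section (assuming the four points given in the hint; it is not a substitute for the hand calculation the problem asks for): stack the plane constraints w^T x + b = ±1 as a linear system, solve for (w, b), and compute the margin as 2 / ||w||_2.

```python
import numpy as np

# Cross-check sketch using the points from the hint.
pos = np.array([[2.0, 2.0], [3.0, 4.0]])   # x4, x8 on the positive plane: w^T x + b = +1
neg = np.array([[2.0, 0.0], [3.0, 2.0]])   # x3, x7 on the negative plane: w^T x + b = -1

# Stack the four constraints as A @ [w1, w2, b] = t and solve by least squares.
A = np.vstack([np.hstack([pos, np.ones((2, 1))]),
               np.hstack([neg, np.ones((2, 1))])])
t = np.array([+1.0, +1.0, -1.0, -1.0])
w1, w2, b = np.linalg.lstsq(A, t, rcond=None)[0]

w = np.array([w1, w2])
margin = 2.0 / np.linalg.norm(w)           # margin = 2 / ||w||_2
print("w =", w, "b =", b, "margin =", margin)
```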
4 (10 points) SVM: Gradient

Given a training dataset S_training = {(x_i, y_i), i = 1, . . . , n}, we wish to optimize the
loss L(w, b) of a linear SVM classifier:

L(w, b) = (1/2) ||w||_2^2 + C Σ_{i=1}^n (1 − y_i (w^T x_i + b))_+    (1)

where (z)_+ = max(0, z) is called the rectifier function and C is a scalar constant.
The optimal weight vector w* and the bias b* used to build the SVM classifier are
defined as follows:

w*, b* = argmin_{w, b} L(w, b)    (2)

In this problem, we attempt to obtain the optimal parameters w* and b* by using a
standard gradient descent algorithm.
Hint: To find the derivative of L(w, b), please consider two cases:
(a) 1 − y_i (w^T x_i + b) ≥ 0,  (b) 1 − y_i (w^T x_i + b) < 0.

  1. Find the derivative of L(w, b) with respect to w and b.
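A minimal numerical sketch of the case split in the hint (hypothetical helper names and made-up settings; not the required analytic derivation): active points, where the hinge term is positive, contribute to the subgradient, while inactive points contribute nothing, and a plain gradient-descent loop uses that subgradient.

```python
import numpy as np

# Sketch of Eq. (1) and its subgradient, split into the two cases from the hint.
def svm_loss_and_subgradient(w, b, X, y, C):
    """X: (n, d) features, y: (n,) labels in {-1, +1}, C: scalar penalty."""
    margins = 1.0 - y * (X @ w + b)                  # 1 - y_i (w^T x_i + b)
    active = margins >= 0                            # case (a): hinge term is active
    loss = 0.5 * w @ w + C * np.sum(np.maximum(0.0, margins))
    # Active points contribute -y_i x_i to dL/dw and -y_i to dL/db;
    # inactive points (case (b)) contribute 0.
    grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
    grad_b = -C * y[active].sum()
    return loss, grad_w, grad_b

# Plain gradient descent on the loss, as described in the problem
# (lr and steps are illustrative choices, not assignment values).
def gradient_descent(X, y, C=1.0, lr=0.01, steps=500):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        _, gw, gb = svm_loss_and_subgradient(w, b, X, y, C)
        w, b = w - lr * gw, b - lr * gb
    return w, b
```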
5 (10 points) SVM: Margin

As shown in Figure 3, we have the decision boundary (marked as a black line)
defined as Eq. 3, given w, b:

w^T x + b = 0    (3)

In parallel to the decision boundary, we have the positive plane (marked as a red line)
defined as Eq. 4 and the negative plane (marked as a blue line) defined as Eq. 5:

w^T x + b = +1    (4)
w^T x + b = −1    (5)

We pick an arbitrary point x− on the negative plane, and draw a purple line that
passes through x− and is perpendicular to the negative plane. The intersection between this
purple line and the positive plane is denoted as x+. Thus, we have the following
Eq. 6 that indicates the relation between x+ and x−:

x+ = x− + λw    (6)

where λ ∈ R is an undetermined scalar. The margin M is defined as the distance
between the positive and the negative planes, which can be calculated from Eq. 7:

M = ||x+ − x−||_2 = √(⟨λw, λw⟩)    (7)
Figure 3: The decision boundary, the positive plane and the negative plane.
Please derive the following according to Eq. 7:

M = 2 / √(⟨w, w⟩)

Hint: You can first represent λ in terms of w by using Eqs. 4, 5, and 6.
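A short numerical check of the stated identity (illustrative only, not the requested derivation; w and b are made up): start from a point on the negative plane, move along w until the positive plane is reached as in Eq. 6, and compare the travelled distance with 2 / √(⟨w, w⟩).

```python
import numpy as np

# Numerical check of M = 2 / sqrt(<w, w>) for an arbitrary (made-up) w, b.
w = np.array([3.0, -1.0])
b = 0.5

# A point x_minus on the negative plane w^T x + b = -1 (x1 fixed to 0, solve for x2).
x_minus = np.array([0.0, (b + 1.0) / (-w[1])])
assert np.isclose(w @ x_minus + b, -1.0)

# Solve w^T (x_minus + lam * w) + b = +1 for lam (Eq. 4 with Eq. 6), then form x_plus.
lam = (1.0 - b - w @ x_minus) / (w @ w)
x_plus = x_minus + lam * w
assert np.isclose(w @ x_plus + b, +1.0)

M = np.linalg.norm(x_plus - x_minus)       # Eq. 7
print(M, 2.0 / np.sqrt(w @ w))             # the two values agree
```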
