Decision trees
Question 1: Based on the decision tree shown in the lecture, if an animal has floppy ears, a round face shape, and whiskers, does the model predict that it is a cat or not a cat?
[Correct] Cat
Not a cat
[Explanation] Correct. Following the floppy-ears branch to the right leads to the whiskers decision node. Going left from there, because whiskers are present, reaches a leaf node for "cat", so the model predicts that this is a cat.
Question 2: Take a decision tree learning to classify between spam and non-spam email. There are 20 training examples at the root node, comprising 10 spam and 10 non-spam emails. If the algorithm can choose from among four features, resulting in four corresponding splits, which would it choose (i.e., which has the highest purity)?
Left split: 5 of 10 emails are spam. Right split: 5 of 10 emails are spam.
Left split: 7 of 8 emails are spam. Right split: 3 of 12 emails are spam.
Left split: 2 of 2 emails are spam. Right split: 8 of 18 emails are spam.
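[Correct] Left split: 10 of 10 emails are spam. Right split: 0 of 10 emails are spam.
[Explanation] Yes! Both branches are completely pure: the left contains only spam and the right contains only non-spam.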
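One way to check this answer numerically is to compare the four splits by information gain, using entropy as the impurity measure. The following is a minimal sketch, not the course's own code: the function names and the labels A to D for the four options are assumptions made for illustration.

```python
import math

def entropy(p1):
    """Binary entropy H(p1) in bits; 0 * log2(0) is treated as 0."""
    if p1 in (0.0, 1.0):
        return 0.0
    p0 = 1.0 - p1
    return -p1 * math.log2(p1) - p0 * math.log2(p0)

def information_gain(root, left, right):
    """Each argument is a (num_spam, num_total) pair for that node."""
    n_root = root[1]
    h_root = entropy(root[0] / n_root)
    # Weight each branch's entropy by the fraction of examples it receives.
    weighted = sum((n / n_root) * entropy(k / n) for k, n in (left, right))
    return h_root - weighted

root = (10, 20)  # 10 spam out of 20 emails at the root node
# Hypothetical labels A-D for the four answer options, in the order listed.
splits = {
    "A": ((5, 10), (5, 10)),
    "B": ((7, 8), (3, 12)),
    "C": ((2, 2), (8, 18)),
    "D": ((10, 10), (0, 10)),
}
gains = {name: information_gain(root, l, r) for name, (l, r) in splits.items()}
best = max(gains, key=gains.get)
print(best, gains)
```

Option D leaves both branches with zero entropy, so its gain equals the full root entropy of 1 bit, while option A leaves both branches as mixed as the root and gains nothing.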
Practice quiz: Decision tree learning
Question 1: Recall that entropy was defined in lecture as H(p_1) = - p_1 log_2(p_1) - p_0 log_2(p_0), where p_1 is the fraction of positive examples and p_0 is the fraction of negative examples.
Question 2: Recall that information gain was defined as follows:
Question 3: To represent 3 possible values for the ear shape, you can define 3 features for ear shape: pointy ears, floppy ears, oval ears. For an animal whose ears are not pointy, not floppy, but are oval, how can you represent this information as a feature vector?
[0, 1, 0]
[1, 0, 0]
[1, 1, 0]
[Correct] [0, 0, 1]
[Explanation] Yes! A 0 represents the absence of a feature (not pointy, not floppy), and a 1 represents the presence of a feature (oval).
Question 4: Suppose a dataset has 10 animals and a continuous-valued feature (such as the weight of the animal). According to the lecture, what is the recommended way to find the best split for that feature?
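The encoding in this question can be sketched in a few lines. This is an illustrative example only: the helper name one_hot and the category order (pointy, floppy, oval, taken from the question text) are assumptions.

```python
# Category order follows the question: pointy, floppy, oval.
EAR_SHAPES = ["pointy", "floppy", "oval"]

def one_hot(value, categories=EAR_SHAPES):
    """Return a list with a 1 at the position of `value` and 0 elsewhere."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]

oval = one_hot("oval")
print(oval)
```

Exactly one entry in the vector is 1, which is where the name "one-hot" comes from.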
Try every value spaced at regular intervals (e.g., 8, 8.5, 9, 9.5, 10, etc.) and find the split that gives the highest information gain.
Use a one-hot encoding to turn the feature into a discrete feature vector of 0’s and 1’s, then apply the algorithm we had discussed for discrete features.
[Correct] Choose the 9 mid-points between the 10 examples as possible splits, and find the split that gives the highest information gain.
Use gradient descent to find the value of the split threshold that gives the highest information gain.
[Explanation] Correct. This is the approach proposed in the lectures.
Question 5: Which of these are commonly used criteria for deciding when to stop splitting? (Choose two.)
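As a rough sketch of the mid-point idea: sort the feature values and take the average of each adjacent pair as a candidate threshold. The function name and the weights below are hypothetical, and duplicate values are dropped first, so 10 distinct values yield 9 candidates.

```python
def candidate_thresholds(values):
    """Mid-points between consecutive distinct values, in ascending order."""
    ordered = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

# Hypothetical weights for 10 animals (made up for illustration).
weights = [7.2, 8.4, 9.2, 10.2, 11.0, 15.0, 17.0, 18.0, 20.0, 8.8]
thresholds = candidate_thresholds(weights)
print(len(thresholds), thresholds)
```

Each candidate threshold would then be scored by the information gain of splitting the examples at that value, and the highest-scoring one kept.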
When the information gain from additional splits is too large
[Correct] When the tree has reached a maximum depth
[Explanation] Yes!
[Correct] When the number of examples in a node is below a threshold
[Explanation] Yes!
When a node is 50% one class and 50% another class (highest possible value of entropy)
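The two correct criteria can be expressed as a simple check made before splitting a node. This is only a sketch: the function name and the default values for max_depth and min_examples are hypothetical, not values given in the lecture.

```python
def should_stop(depth, num_examples, max_depth=3, min_examples=5):
    """Stop splitting when the node is at maximum depth or has too few examples.

    max_depth and min_examples are hypothetical defaults chosen for illustration.
    """
    return depth >= max_depth or num_examples < min_examples

print(should_stop(depth=3, num_examples=100))  # at maximum depth
print(should_stop(depth=1, num_examples=4))    # too few examples
print(should_stop(depth=1, num_examples=100))  # keep splitting
```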
Practice quiz: Tree ensembles
Question 1: For the random forest, how do you build each individual tree so that they are not all identical to each other?
Train the algorithm multiple times on the same training set. This will naturally result in different trees.
If you are training B trees, train each one on 1/B of the training set, so each tree is trained on a distinct set of examples.
[Correct] Sample the training data with replacement
Sample the training data without replacement
[Explanation] Correct. You can generate a training set that is unique for each individual tree by sampling the training data with replacement.
Question 2: You are choosing between a decision tree and a neural network for a classification task where the input x is a 100x100 resolution image. Which would you choose?
A neural network, because the input is structured data and neural networks typically work better with structured data.
[Correct] A neural network, because the input is unstructured data and neural networks typically work better with unstructured data.
A decision tree, because the input is unstructured and decision trees typically work better with unstructured data.
A decision tree, because the input is structured data and decision trees typically work better with structured data.
Question 3: What does sampling with replacement refer to?
[Correct] Drawing a sequence of examples where, when picking the next example, we first return all previously drawn examples to the set we are picking from.
Drawing a sequence of examples where, when picking the next example, we first remove all previously drawn examples from the set we are picking from.
It refers to a process of making an identical copy of the training set.
It refers to using a new sample of data that we use to permanently overwrite (that is, to replace) the original data.
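To make the difference concrete, here is a minimal sketch using Python's standard random module. The helper name bootstrap_sample, the seed, and the toy training set of 10 integers are assumptions made for illustration.

```python
import random

def bootstrap_sample(examples, rng):
    """Draw len(examples) items with replacement; items may repeat."""
    return rng.choices(examples, k=len(examples))

rng = random.Random(0)          # fixed seed so the run is repeatable
training_set = list(range(10))  # toy stand-in for 10 training examples

# With replacement: each draw picks from the full set, so one example
# can appear several times and others may not appear at all.
samples = [bootstrap_sample(training_set, rng) for _ in range(3)]

# Without replacement: drawn examples are removed, so drawing all 10
# just yields a reordering of the original set.
permutation = rng.sample(training_set, k=len(training_set))

for s in samples:
    print(sorted(s))
print(sorted(permutation))
```

Because each bootstrap sample is drawn independently, the samples generally differ from one another, which is what lets a random forest train a different tree on each.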