ME41105 - IV Assignment 1
Visual object detection
Intelligent Vehicles group
Delft University of Technology
November 15, 2019

About the assignment

Complete the assignments in student pairs; both students receive the same grade. Please read through this whole document first so that you have an overview of what you need to do.

This assignment contains Questions and Exercises:

• You should address all of the Questions in a 2 or 3 page report (excluding plots and figures). Please provide a separate answer for each Question in your report, using the same Question number as in this document. Your answer should address all the issues raised in the Question, but typically should not be longer than a few lines.
• The Exercises are tasks for you to do, typically implementing a function or performing an experiment. Therefore, first study the relevant provided code before working on an exercise, as the code contains comments referring to each specific exercise. If you do not fully understand the exercise, it may become clearer after reading the relevant code comments! Do not directly address the Exercises in your report. Instead, you should submit your solution code together with the report. Experimental results may be requested in accompanying Questions.

You will be graded on:

1. Quality of your answers in the report: Did you answer the Questions correctly, and demonstrate understanding of the issue at hand? All Questions are weighted equally.
2. Quality of your code: Does the code work as required?
3. Quality of presentation: Is your report readable (sentences easy to understand, no grammar mistakes, clear figures)?
Is the code you wrote clear and commented?

Submitting

• Before you start, go to the course’s Brightspace page, and enroll with your partner in a lab group (found under the ‘Collaboration’ page).
• To submit, upload two items on the Brightspace ‘Assignments’ page:
    pdf attachment: A pdf with your report. Do not forget to add your student names and ids on the report.
    zip attachment: A zip archive with your Matlab code for this assignment. Do NOT add the data files! They are large and we have them already ...
    deadline: Friday, November 29th 2019, 23:59
• Note that only a single submission is required for your group, and only your last group submission is kept by Brightspace. You are responsible for submitting on time. So, do not wait till the last moment to submit your work, and verify that your files were uploaded correctly. Connection problems and forgotten attachments are not valid excuses. The deadline is automatically enforced by Brightspace. If your submission is not on time, you receive an automatic ‘1’.
• You may only hand in work by you and your lab partner, done this year. You are responsible for not sharing your work, either publicly or privately, with students outside your group. Do not put your code or report on public servers. If we believe that you have used material from other groups, or that you have submitted material that is not yours, it will be reported to the exam committee. This may ultimately result in a failing grade or an expulsion.
• If code is submitted that was written with ill intent, e.g. to manipulate files in the user’s home directory that were not specified by the task, you will immediately fail the course.

Getting assistance

The primary occasion to obtain help with this assignment is during the lab practicum contact hours of the Intelligent Vehicles course. An instructor and student assistants will be present at the practicum to give you feedback and support.
If you find errors, ambiguously phrased exercises, or have another question about this lab assignment, please use the Brightspace lab support forum. This way, all students can benefit from the questions and answers equally. If you cannot discuss your issue on the forum, please contact Julian Kooij ([email protected]) directly.

Remember that for help on what a specific Matlab command somefunction does or how to use it, you can type help somefunction or doc somefunction on the Matlab command line.

Visual object detection

In this lab assignment we will study and evaluate feature extraction and pattern classification algorithms that can be used for video-based pedestrian recognition.

Our first goal is to investigate which classifiers give the best result in classifying rectangular region proposals as belonging either to the ‘pedestrian’ or ‘non-pedestrian’ class. We have a dataset containing 3000 samples (1500 pedestrian and 1500 non-pedestrian) for training. For testing, 1000 samples (500 pedestrian and 500 non-pedestrian) are provided, see Figure 1.

The pedestrian and non-pedestrian samples are provided in terms of three feature sets:

• data_hog.mat: Histograms of Oriented Gradients (HOG) features
• data_lrf.mat: Local Receptive Field (LRF) features
• data_intensity_25x50.mat: Gray-level pixel intensity

Figure 1: Examples of pedestrian and non-pedestrian class samples in the training data.

Getting started

Before you start working on the assignments in this manual, please check the following points:

1. Download the provided assignment files from Brightspace, and unzip them in a directory.
2. Read the README.txt file in the directory.
3. Start Matlab, and change the path to the lab1/ subdirectory.
4. To work on the first section, open assignment_eigen_pedestrians.m in Matlab. Note that each of the other sections refers at its start to another Matlab script that will guide you through the exercises.
5. Note that the scripts use ‘Matlab code cells’, so you can run pieces of the script as needed without having to rerun everything from the beginning (which will be slow!). If you do not know about code cells, it is strongly recommended that you check out the official cell mode documentation (https://nl.mathworks.com/help/matlab/matlab_prog/run-sections-of-programs.html), and watch these video tutorials: video 1 and video 2.
6. The exercises often require you to make changes within a separate .m file, rather than the top-level script. This means that inspecting the variables in these files is not easily done without debugging. If you do not know how to debug code in Matlab, then please read Matlab’s documentation on how to set breakpoints (especially the ability to automatically set a breakpoint on error), and subsequently how you can examine values once you are in debug mode.

1 Eigen-pedestrians

Exercises refer to code sections in assignment_eigen_pedestrians.m.

Let us start by exploring the dataset a bit. It may be difficult to interpret the LRF and HOG feature representations, but it is possible to visualize the gray-level intensity images by restoring them to their original size. Note that these 25 × 50 pixel images have been reshaped to 1 × 1250-dimensional vectors. To restore one vector to its original size, the Matlab command reshape can be used. After reshaping the vector into a matrix, it can be visualized by using imshow.

Note: imshow will automatically scale the intensities if you pass it an empty array as second argument, e.g. imshow(I, []).

Exercise 1.1. Visualize several pedestrian and background samples from the training data. Note: You do not need to start from scratch, a lot of the boilerplate is already given in the provided assignment script referred to in the boxed note at the top of this section. In the script, look for the code block with the comment ’Exercise 1.1’. You will see that you only need to complete the function imshow_intensity_features.m.
For instance, this first exercise can be solved using only the correct calls to reshape and imshow.

Some classifiers may not be able to handle the high dimensionality of the input data. Principal Component Analysis (PCA) is one method to reduce the data dimensionality by projecting it into a linear subspace that maintains most of the variance in the data, see Appendix A.

Question 1.1. For each of the three feature types, what is the maximum number of PCA components? Motivate your answer.

In this section, we shall study using PCA on the gray-level intensity features. In your submitted solution, you cannot use the built-in Matlab functions pca or princomp. Instead, make your own implementation, for which you can use Appendix A as a reference. You are allowed to use Matlab’s eig function to compute eigenvectors and eigenvalues. Notice that to correctly project data onto the PCA dimensions, the mean vector should be subtracted from the data, so this vector needs to be computed too.

Exercise 1.2. Compute the principal components from the dataset, and visualize the mean, and the first 10 principal components as images. Note: Be aware that there are different conventions to represent feature vectors. For instance, in math notation (as in Appendix A), feature vectors are typically expressed as column vectors. But in the code and data matrices, the features are given as rows. This difference matters when computing products between matrices and vectors.

These principal components are the eigenvectors of the covariance matrix computed on the given image intensity features. Since the image dataset contains pedestrians, its principal components can also be called eigen-pedestrians.

Question 1.2. Include images of the “mean” pedestrian, and the 10 eigen-pedestrians in your report. How do you interpret the light/dark regions in the eigen-pedestrians? What color would a PCA weight of “0” have in these images?
And what color would an intensity of “0” have in the intensity images from Exercise 1.1?

Exercise 1.3. Project the intensity data of both the pedestrian and background training samples to the first three PCA components. After this projection, each sample will be represented by a 3D vector in the PCA space. Create a 3D point cloud of the 3D vectors of both classes.

Question 1.3. In the 3D plot, which axis has the largest amount of variance? Is this axis alone sufficient to separate the two classes in our dataset? Motivate your answer.

Exercise 1.4. For intensity features, take the top-n PCA dimensions and project some 6 images from the training data onto the corresponding linear subspace, then project the images back to the original image space, and display them. Do this for n = 10 and n = 100.

Question 1.4. Compare the original images to those obtained after projecting to and from the n = 10 subspace, and also the n = 100 subspace. How does n affect the image quality? How much (in percentage) do the PCA projections reduce the feature size compared to the original intensity image feature?

Exercise 1.5. Now for all three feature types, make plots of the percentage of explained variance (y-axis) vs. the number of PCA components/dimensions (x-axis).

Question 1.5. For each feature set, how many PCA dimensions should we keep to maintain 90% of the variance in the data? Include the plots that motivate your answer.

2 Pedestrian classification

Exercises refer to code sections in assignment_pedestrian_classification.m.

For each of the provided feature sets (HOG, LRF, and Intensity) we want to train the following two classifiers:

1. Linear Support Vector Machine (SVM) classifier (see Appendix B)
2. Gaussian-Mixture-Model (GMM) with Bayesian decision model (see Appendix C)

But before we train the classifiers, we have to decide if applying PCA dimensionality reduction is necessary.
To decide on the number of dimensions to use, take into account how many free parameters each of the classifiers has that need to be adapted during training (consider the input parameters, their dimensionality and their properties).

Question 2.1. How many free model parameters does a trained Linear SVM classifier have for M-dimensional feature vectors? Give a formula as a function of M, and motivate your answer. (Note that the model parameters do not include the parameters of the training procedure, such as number of data samples, number of iterations, or C which is used later in the assignment).

Question 2.2. How many free model parameters does a Gaussian-Mixture-Model classifier with K mixture components have for M-dimensional feature vectors? Give a formula as a function of M and K, and motivate your answer.

A good rule of thumb is that to obtain meaningful results, there should be (much) fewer free parameters than training samples.

Question 2.3. For which of these classifiers is PCA dimensionality reduction necessary on this dataset? Motivate your answer.

Now we can train each classifier on each of the feature sets, applying PCA first where appropriate. Afterwards, we compute the classification error (percentage of misclassified samples) for all trained classifiers on the test samples.

Exercise 2.1. Implement the SVM classifier by completing train_SVM for training, and evaluate_SVM for testing. See the comments in the code on available functions to train an SVM (don’t worry about C, use C = 2). Train and test the SVM on all three feature sets.

Exercise 2.2. Implement the GMM classifier and evaluation in train_GMM and evaluate_GMM. As you will see, train_GMM already provides boilerplate code. Use the methods of Matlab’s built-in gmdistribution object to fit GMM distributions on the data of each class, and to evaluate their pdfs on test data. Train and test GMM classifiers on all three feature sets, using K = 5 mixture components per class.

Question 2.4.
What are the six classification errors that you obtained? Which feature/classifier combination is best?

We could also evaluate the pedestrian classifiers using ROC curves (on y-axis: true positive rate [0,1], on x-axis: false positive rate [0,1]), instead of the classification error measure. This requires logging the decision values (classifier outputs) on the test dataset.

Exercise 2.3. Complete the code for plotting the ROC curves. Look at Appendix D and Figure 3 for more information on the ROC plots.

Question 2.5. Which feature/classifier combination performs best? Include the ROC plots in your report to support your answer.

Now that we have these simple classifiers, we will take a look at techniques to further improve the performance.

Parameter selection and overfitting

Both the SVM and the GMM have parameters that we can set before training the model, and which affect the final performance. In case of the SVM, this is the training parameter C ∈ R, and for the GMM we have K, the number of mixture components. (Actually, if we apply PCA, the number of PCA dimensions is a parameter too.) Till now we have kept these parameters fixed, but here we will experiment with optimizing them.

When we train on some training data, we should always validate performance on separate testing data. The error on the training data is not a good indication of performance on new data: you will see that selecting parameters by minimizing the training error leads to overfitting.

Exercise 2.4. Run the code block that evaluates the SVM for various values of C on both the training and testing data, and generate plots of the error as a function of C.

Question 2.6. If we evaluate on the training data, what is the lowest error that we can obtain, and for which C? What is the optimal parameter and error if we evaluate on the test data? Setting C too large leads to overfitting. But why does a large C decrease the training error, but increase the test error?

Exercise 2.5.
Now implement your own code block to evaluate the effect of changing K in the GMM classifier on the HOG features. Try out values for K ranging from 1 up to 7, and again create plots of the error as a function of K when evaluating on the training and test data. You can copy and alter the provided code from the previous exercise.

Question 2.7. If we evaluate on the training data, what is the lowest error that we can obtain, and for which K? What is the optimal parameter and error if we evaluate on the test data? Include the error plots in your report. Why does overfitting occur as we increase K?

Note: For the next exercise, you can keep using C = 2 for the SVM, and K = 5 for the GMM.

Multi-feature classification

Multiple classifiers trained on distinct feature sets might provide complementary results. Intuitively, we should be able to benefit from having multiple distinct ‘expert opinions’. Here we consider two approaches to fuse the decision values (classifier outputs) of the trained SVM classifiers on the HOG and LRF features, namely

1. fused output is the mean of the outputs of HOG/SVM and LRF/SVM
2. fused output is the maximum of the outputs of HOG/SVM and LRF/SVM.

Exercise 2.6. Evaluate both fusion approaches using ROC curves.

Question 2.8. Which fusion approach performs best? Include the respective ROC curves and discuss the effects you observe in your report.

3 Pedestrian detection

Exercises refer to code sections in assignment_pedestrian_detection.m.

Up to now, we have looked at pedestrian classification, i.e. deciding for a provided region proposal if it belongs to the pedestrian or non-pedestrian class. In this last assignment, we will move to pedestrian detection, which considers a broader question: In a given target image, where are pedestrians located?
A fairly simple but effective method to answer this question is to just divide the image into a large set of candidate region proposals of the right size and shape, and classify each of these regions using a trained pedestrian classifier. See Figure 2 for an example.

You are now provided with a new dataset of pedestrian and non-pedestrian samples with computed HOG features, and also with a test video sequence of a pedestrian filmed from a moving vehicle. Furthermore, the region proposals have been pre-computed, and HOG features for each region are provided per video frame.

Figure 2: Pedestrian detection by classifying many region proposals. In this example, the green rectangles correspond to regions classified as pedestrians. Our aim is to avoid false positives and false negatives, though in practice misclassification errors may occur.

Exercise 3.1. Train and evaluate a linear SVM on the HOG features as well as you can.

Question 3.1. Explain your approach: Why did you (not) need to use dimensionality reduction? What value do you use for C and why?

Exercise 3.2. Train and evaluate a GMM on the HOG features as well as you can.

Question 3.2. Explain your approach: Why did you (not) need to use dimensionality reduction? What value do you use for K and why?

Question 3.3. Which of your classifiers is the best to use on this data? Include ROC plots to support your decision.

Exercise 3.3. Now apply your selected classifier on the region proposals of the video sequence, and visualize the regions which are considered ‘pedestrian’.

Question 3.4. Study the pedestrian detection results qualitatively (so by looking at the results, as opposed to quantitative evaluation using error statistics and ROC curves). What kind of false positives and false negatives do you observe?
What would you suggest to counter these errors, and improve the results?

Acknowledgements

We thank Markus Enzweiler for his help in creating part of the data and early versions of the exercises used in this lab.

Appendix

A Principal Component Analysis (PCA)

PCA is a technique for reducing the dimensionality of the feature space. By analyzing how the data is distributed from training samples, PCA computes a linear subspace that maintains most of the variance of the input data. New data samples can later also be projected to this subspace, which is also sometimes referred to as the ‘PCA subspace’, or ‘PCA projection’.

A.1 Computing the subspace

Consider that we have N data samples $x_1, \dots, x_N$ in an M-dimensional feature space, given as a single M × N data matrix $X$ with each column a data sample. We can compute a D-dimensional PCA subspace of $X$, where D ≤ M, to obtain a D-dimensional data representation which maintains most of the variance. The subspace will be defined as a linear projection, consisting of an M × D transformation matrix $W$, and the M-dimensional mean data vector $m$.

First compute the mean data vector $m$,

$$ m = \frac{1}{N} \sum_{j=1}^{N} x_j \tag{1} $$

The mean is then subtracted from the data to get the zero-mean M × N data matrix $\bar{X}$ (with columns $\bar{x}_j$), which is then used to compute the M × M covariance matrix $C$ (up to a constant scale factor, which does not change the eigenvectors),

$$ \bar{x}_j = x_j - m, \quad \forall j, \text{ where } 1 \le j \le N \tag{2} $$

$$ C = \bar{X} \bar{X}^\top \tag{3} $$

Next, compute the eigenvectors $w_i$ and corresponding eigenvalues $\lambda_i$ of the covariance matrix $C$. Recall that the eigenvectors and eigenvalues fulfill the following property,

$$ C w_i = w_i \lambda_i \tag{4} $$

The eigenvectors should be sorted such that the i-th component is the eigenvector with the i-th largest eigenvalue $\lambda_i$, hence the first eigenvector $w_1$ has the largest eigenvalue $\lambda_1$. These M-dimensional vectors $w_i$ are called the principal components of the data $X$, and are all orthonormal to each other.

The eigenvalue $\lambda_i$ is proportional to the amount of variance the data has along PCA subspace dimension i, hence the fraction of variance kept by principal component i is $\lambda_i / \sum_{j=1}^{M} \lambda_j$. Therefore, we only keep the first D eigenvectors $w_1, \dots, w_D$, as they correspond to the D dimensions that retain most of the data variance. These D eigenvectors can be represented as a single M × D matrix $W$, where the i-th column is the i-th component $w_i$.

A.2 Projecting data to the subspace

Once the subspace is computed, we can apply the transformation to reduce the dimensionality of any M-dimensional data vector $x \in \mathbb{R}^M$ to its reduced D-dimensional ‘PCA’ representation $\tilde{x} \in \mathbb{R}^D$ using the following linear equation (assuming all vectors are column vectors),

$$ \tilde{x} = W^\top (x - m) \tag{5} $$

A.3 Back-projecting from the subspace

The inverse back-projection can also easily be achieved,

$$ x' = W \tilde{x} + m \tag{6} $$

such that $x'$ is the (approximate) reconstruction in the original M-dimensional feature space. If we keep all dimensions in the projection, such that D = M, then $W W^\top = I$. In other words, $W$ is an orthonormal transformation and therefore its transpose is its inverse, $W^\top = W^{-1}$. However, typically we use $D \ll M$, so back-projection does not restore the original feature vector exactly.

B Linear Support Vector Machine (SVM) classifier

Support Vector Machines are an advanced topic in Machine Learning. In this session, we will stick to Linear SVMs on two-class data, which form the basis of more complicated approaches.

A Linear SVM learns a hyperplane in the feature space that separates the data points of each class with maximal margin.
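To make the idea of a separating hyperplane concrete, here is a minimal Matlab sketch on toy 2-D data. It assumes a hyperplane written as the set of points x with w'*x + b = 0, where the normal vector w and the offset b are picked by hand for illustration (they are not learned, and the sample values are made up). Each sample is labeled by which side of the hyperplane it falls on.

```matlab
% Toy 2-D illustration of a separating hyperplane.
% ASSUMPTION: w, b and the samples are hand-picked illustrative values;
% nothing here is learned from the lab data.
w = [1; -1];          % normal vector of the hyperplane
b = 0.5;              % offset of the hyperplane

% Four samples stored as rows (the same row convention as the lab data).
X = [ 2.0  0.0;
      1.0  0.5;
      0.0  2.0;
     -1.0  0.0];

d = X * w + b;        % signed value w'*x + b for every sample (4 x 1)

labels = ones(size(d));
labels(d < 0) = -1;   % +1 on the non-negative side, -1 on the negative side

disp([d labels]);     % one row per sample: [value, assigned label]
```

Because the samples are stored as rows, the product is written as X * w rather than w' * x; this is the row-versus-column convention issue mentioned in Exercise 1.2.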
The model parameters of the hyperplane are the M-dimensional weight vector $w$ and a bias $b$, which define the normal and offset of the plane. To test a new M-dimensional feature vector $x$, the Linear SVM computes the following decision value:

$$ d(x) = w^\top x + b \tag{7} $$

The sign of this decision value determines the assigned class label, i.e. $x$ is assigned either to the positive (i.e. pedestrian) class if $d(x) \ge 0$, or to the negative (non-pedestrian) class if $d(x) < 0$.

For learning Linear SVM parameters, the built-in fitcsvm Matlab function can be used. In case you have an older Matlab version, you can also use the provided primal_svm.m function, which should let you obtain similar conclusions in the lab assignments.

Note that there is a training parameter C ∈ R (called BoxConstraint for fitcsvm) which influences how the optimizer computes the linear decision boundary: For large C, the optimizer tries very hard to separate both classes, which means that its boundary is sensitive to outliers. A low C results in a soft margin where a few samples may lie close to or on the wrong side of the boundary, if this enables a wider margin to separate most data points of both classes. While C is used during training, it is not part of the learned model in Equation (7).

C Gaussian-Mixture-Model (GMM) classifier

The Gaussian-Mixture-Model classifier belongs to a family of classifiers that model how the data from each class is distributed. In the training phase, these class conditional distributions are fitted on the available training data.

For a new test sample $x$, the likelihood $P(x|c)$ of the sample belonging to class $c$ is evaluated for each class. Then, Bayes’ rule can be applied to combine these likelihoods with the class priors $P(c)$ (i.e. how likely is each class before observing the feature) to obtain the class posterior distribution $P(c|x)$ (i.e. how likely is each class after observing the feature),

$$ P(c|x) = \frac{P(x|c)\,P(c)}{\sum_{c'} P(x|c')\,P(c')} \quad \text{(Bayes' rule)} \tag{8} $$

We then assign to $x$ the class label with the highest posterior probability, which is also called the maximum a-posteriori solution. In our two-class ‘pedestrian’ vs ‘non-pedestrian’ problem, we classify $x$ as pedestrian if $P(c = \text{pedestrian}|x) \ge \frac{1}{2}$. In the GMM the decision value is therefore $d(x) = P(c = \text{pedestrian}|x)$, with the decision threshold $\frac{1}{2}$.

In case of the GMM classifier, the distributions $P(x|c)$ of each class are modeled by a weighted mixture of K Multivariate Normal (i.e. ‘Gaussian’) distributions, i.e.

$$ P(x|c) = \sum_{i=1}^{K} w_c^{(i)} \, \mathcal{N}\left(x \mid \mu_c^{(i)}, \Sigma_c^{(i)}\right) \tag{9} $$

where $\mu_c^{(i)}$ and $\Sigma_c^{(i)}$ are the mean and covariance of the i-th mixture component for class $c$, and $w_c^{(i)}$ the component’s weight. Fitting such a distribution on training samples is typically done using the Expectation-Maximization (EM) algorithm which iterates between optimizing the weights, and optimizing the K Normal distributions.

In this course, we will not investigate the EM algorithm, and you are not required to know how it works. For training a Gaussian-Mixture-Model, the built-in gmdistribution class in the Matlab Statistical Toolbox can be used, which uses EM internally.

D Receiver Operating Characteristic (ROC) curves

Instead of reporting a single test error value for a fixed decision threshold, one can also consider changing the threshold to trade off different types of classification error. Lowering the threshold results in more test cases being assigned to the positive (e.g. pedestrian) class, while increasing it results in more samples classified negatively (e.g. non-pedestrian). This trade-off is then reflected in the False Positive Rate (FPR) and True Positive Rate (TPR), both of which are numbers between 0 and 1.

Let $d_i$ be the classifier’s decision value for test sample i, and $y_i \in \{-1, +1\}$ the true class label of the sample. Also, let the function $\mathrm{count}_i[x(i)]$ count the number of samples for which condition $x$ holds.
Then the TPR and FPR for a given threshold $\tau$ are expressed as,

$$ P = \mathrm{count}_i[y_i \ge 0] \quad \text{(number of samples in positive class)} \tag{10} $$

$$ N = \mathrm{count}_i[y_i < 0] \quad \text{(number of samples in negative class)} \tag{11} $$

$$ TP(\tau) = \mathrm{count}_i[(d_i \ge \tau) \wedge (y_i \ge 0)] \quad \text{(number of positive samples classified as positive)} \tag{12} $$

$$ FP(\tau) = \mathrm{count}_i[(d_i \ge \tau) \wedge (y_i < 0)] \quad \text{(number of negative samples classified as positive)} \tag{13} $$

$$ TPR(\tau) = \frac{TP(\tau)}{P}, \qquad FPR(\tau) = \frac{FP(\tau)}{N} \quad \text{(true and false positive rates)} \tag{14} $$

By computing the TPR and FPR for varying thresholds, one can create a so-called Receiver Operating Characteristic (ROC) curve. Figure 3 shows an example of an ROC plot for a single classifier. The ROC curve always starts at (0,0), which corresponds to setting the threshold so high that all test samples are assigned to the negative class, hence there would be no false positives, but also no true positives. The other extreme is obtained when the threshold is so low that everything is assigned to the positive class, in which case both FPR and TPR become one as the curve reaches the top-right corner at (1,1). The curve of an ideal classifier would touch the top-left corner, corresponding to a TPR of 1 for an FPR of 0.

Figure 3: Example of an ROC curve for a certain classifier (x-axis: False Positive Rate, y-axis: True Positive Rate, both ranging from 0 to 1). We can see for instance that we obtain a True Positive Rate of about 70% (e.g. actual pedestrians classified as pedestrians) at a False Positive Rate of 20% (e.g. actual non-pedestrians classified as pedestrians).
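The counting definitions in Equations (10) to (14) can be sketched in a few lines of Matlab. The example below is only an illustration on made-up decision values and labels (five toy samples, not the lab data), and the variable names are hypothetical; it sweeps the threshold over every observed decision value, plus +Inf so that the curve starts at (0,0).

```matlab
% Toy decision values and true labels.
% ASSUMPTION: made-up numbers for illustration only, not the lab data.
d = [0.9; 0.8; 0.4; 0.3; 0.1];      % classifier outputs d_i
y = [ +1;  +1;  -1;  +1;  -1];      % true labels y_i in {-1, +1}

P = sum(y >= 0);                    % Eq. (10): number of positive samples
N = sum(y <  0);                    % Eq. (11): number of negative samples

% Thresholds from high to low: +Inf yields the (0,0) end point,
% the smallest decision value yields the (1,1) end point.
taus = [Inf; sort(unique(d), 'descend')];

TPR = zeros(size(taus));
FPR = zeros(size(taus));
for k = 1:numel(taus)
    predicted_pos = (d >= taus(k));               % samples classified as positive
    TPR(k) = sum(predicted_pos & (y >= 0)) / P;   % Eq. (12) and (14)
    FPR(k) = sum(predicted_pos & (y <  0)) / N;   % Eq. (13) and (14)
end

disp([taus FPR TPR]);               % one row per threshold
```

Calling plot(FPR, TPR) afterwards draws the curve. Sweeping only the observed decision values is sufficient here, because the counts in Equations (12) and (13) can only change when the threshold crosses one of them.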