Cardiff School of Computer Science and Informatics
Coursework Assessment Pro-forma

Module Code: CMT307
Module Title: Applied Machine Learning
Lecturer: Jose Camacho-Collados, Yuhua Li
Assessment Title: Coursework 1
Assessment Number: 1
Date Set: Monday, October 28th
Submission Date and Time: Tuesday, January 14th at 9:30am
Return Date: Friday, February 7th

This assignment is worth 50% of the total marks available for this module. If coursework is submitted late (and where there are no extenuating circumstances):

1. If the assessment is submitted no later than 24 hours after the deadline, the mark for the assessment will be capped at the minimum pass mark;
2. If the assessment is submitted more than 24 hours after the deadline, a mark of 0 will be given for the assessment.

Your submission must include the official Coursework Submission Cover sheet, which can be found here:
https://docs.cs.cf.ac.uk/downloads/coursework/Coversheet.pdf

Submission Instructions

This coursework consists of a portfolio divided into two parts with equal weight:

- Part (1) consists of selected homework similar to that handed in throughout the course. The final deliverable is a single PDF file, which may include reflective answers to theoretical exercises, snippets of Python code and solved exercises.
- Part (2) consists of a machine learning project where students implement a basic machine learning algorithm for solving a given task. The deliverable is a zip file with the code, and a written summary (up to 1200 words) describing solutions, design choices and a reflection on the main challenges faced during development.

Description   Type                                                          Name
Cover sheet   Compulsory  One PDF (.pdf) file                               [student number].pdf
Part 1        Compulsory  One PDF (.pdf) file                               part1_[student number].pdf
Part 2        Compulsory  One ZIP (.zip) file containing the Python code    part2code_[student number].zip
Part 2        Compulsory  One PDF (.pdf) file for the reflective report     part2report_[student number].pdf

Any code submitted will be run in Python 3 (Linux) and must be submitted as stipulated in the instructions above.

Any deviation from the submission instructions above (including the number and types of files submitted) will result in a mark of zero for the assessment or question part.

Staff reserve the right to invite students to a meeting to discuss coursework submissions.

Assignment

In this coursework, students demonstrate their familiarity with the topics covered in the module via two separate parts with equal weight (50% each).

Part 1

In Part 1, students are expected to answer two types of questions: theoretical and practical. Please answer the questions in your own words and provide short answers (fewer than 100 words each) for the theoretical questions.

1. Theory (15%)
   1. What is the difference between a rule-based system and a machine learning system? (5%)
   2. What is the difference between unsupervised and supervised learning? (5%)
   3. What do we mean when we say that a machine learning system is overfitting? (5%)

2. Practice (85%)
   1. Your algorithm gets the following results in a classification experiment. Please compute the precision, recall, f-measure and accuracy *manually* (without the help of your computer/Python; please provide all steps and formulas). Include the process to get to the final result. (20%)

      Id   Prediction   Gold
      1    True         True
      2    True         True
      3    False        True
      4    True         True
      5    False        True
      6    False        True
      7    True         True
      8    True         True
      9    True         True
      10   False        False
      11   False        False
      12   False        False
      13   True         False
      14   False        False
      15   False        False
      16   False        False
      17   False        False
      18   True         False
      19   True         False
      20   False        False

   2. You are given a dataset (named Wine dataset) with different measured properties of different wines (dataset available in Learning Central). Your goal is to develop a machine learning model to predict the quality of an unseen wine given these properties. Train two machine learning regression models and check their performance. Write, for each of the models, the main Python instructions to train and predict the labels (one line each, no need to include any data preprocessing) and the performance in the test set in terms of Root Mean Squared Error (RMSE). (30%)
   3. Train an SVM binary classifier using the Hateval dataset (available in Learning Central). The task consists of predicting whether a tweet represents hate speech or not. You can preprocess and choose the features freely. Evaluate the performance of your classifier in terms of accuracy using 10-fold cross-validation. Write a table with the results of the classifier (accuracy, precision, recall and F-measure) in each of the folds and write a small summary (up to 500 words) of how you preprocessed the data, chose the feature/s, and trained and evaluated your model. (35%)

Part 2

In Part 2, students are provided with a sentiment analysis dataset (IMDb). The dataset contains positive and negative movie reviews. Training, development and test splits are provided. Based on this dataset, students will be asked to preprocess the data, select features and train a machine learning model of their choice to solve this problem. Students should include at least three different features to train their model, one of which should be based on some form of word frequency. Students can decide the type of frequency (absolute or relative, normalized or not) and text preprocessing for this mandatory word frequency feature. The remaining two (or more) features can be chosen freely.
Then, students are asked to perform feature selection to reduce the dimensionality of all features.

Deliverables for this part are the Python code including all steps and an essay of up to 1200 words. The Python code should include the Python scripts and a small README file with instructions on how to run the code in Linux. Jupyter notebooks with clear execution paths are also accepted. The code should take the training set as input, and output the results in the test set. The code accounts for 25% of the marks for this part and the essay for the remaining 75%. The code should contain all necessary steps described above: to get full marks for the code, it should work properly and clearly perform all required steps.

The essay should include:

1) Description of all steps taken in the process (preprocessing, choice of features, feature selection and training and testing of the model).
   (25% - The quality of the preprocessing, features and algorithm will not be considered here)
2) Justification of all steps. Some justifications may be numerical; for that purpose a development set is included to perform additional experiments.
   (25% - A reasonable, reasoned justification is enough to get half of the marks here. The usage of the development set is required to get full marks)
3) Overall performance (precision, recall, f-measure and accuracy) of the trained model in the test set.
   (10% - Indicating the results, even if very low, is enough to get half of the marks here. A minimum of 65% accuracy is required to get full marks)
4) Critical reflection on how the deliverable could be improved in the future and on possible biases that the deployed machine learning model may have.
   (15% - The depth and correctness of insights related to your deliverable will be assessed)

The essay may include tables and/or figures.

Extra credit (optional - 15% extra marks in the second part): For this second part students can get extra credit by writing an essay on one specific task related to Part 2 (except for option d, see instructions below). The essay may contain a maximum of 500 words (figures/tables are allowed and encouraged) and will deal with one of the following four specific topics:

a. Error analysis: Check the types of errors that the system submitted for Part 2 makes and reflect on possible solutions. Qualitative analysis with specific examples is encouraged.
b. Literature review: Write an essay about the state of the art of the field (i.e. automatic hate speech detection). Retrieve relevant articles and digest them, connecting them with your proposed solution to the problem in Part 2.
c. Model comparison: Propose and evaluate machine learning systems of a different nature from the ones taught during the course. Write a table with all results and analyze the strengths and limitations of the approaches.
d. Code release: Create a GitHub or Bitbucket repository with the data and Python code used for Part 2, with very clear instructions on how to run the code from the terminal and about its different functionalities/parameters. Include all necessary data, provide full documentation and comment the code. Students only need to include the link to the repository in the PDF.

Learning Outcomes Assessed

This coursework covers the 7 LOs listed in the module description. Specifically:

Part 1: LO1, LO2
Part 2: LO1, LO3, LO4, LO5, LO6

Criteria for assessment

Credit will be awarded against the following criteria.

- Part 1. The main criterion for assessment is the correctness of the answer, unless a written reflection is required, in which case correctness/performance and written justification weigh 50% each.
- Part 2. This part is divided into Python code (25%) and an essay (75%). The code will be evaluated based on whether it works or not, and whether it minimally contains the necessary steps required for the completion of Part 2. Four items will be evaluated in the essay, whose weights and descriptions are indicated in the assessment instructions. The main criteria to evaluate those items will be the adequacy of the answer with respect to what was asked, and the justification provided.

The grade range is divided into:

Distinction (70-100%)
Merit (60-69%)
Pass (50-59%)
Fail (0-49%)

Feedback and suggestions for future learning

Feedback on your coursework will address the above criteria. Feedback and marks will be returned between February 3rd and February 7th via Learning Central. There will be an opportunity for individual feedback at an agreed time.

Feedback for this assignment will be useful for subsequent skills development, such as data science, natural language processing and deep learning (which will be studied during the second semester).