Walkthrough: MSBA7002, Statistics, R, RDatabase|Matlab

Predict the Bankruptcy Situation of Polish Companies
MSBA7002: Business Statistics
Nov 13, 2019
Due Date: 11:55 pm, Dec 1, 2019

Project Deliverables
(1) PowerPoint or PDF file containing clear steps about model selection and five interesting visualizations that can help to answer analytical problems;
(2) R Markdown file that embeds R code within the analysis narrative, or
(3) Python code with comments written in a Jupyter notebook.
(Please make sure that running the code reproduces all the reported results.)

Content
The bankrupt file is about bankruptcy prediction of Polish companies. The data contains financial rates from the 2nd year of the forecasting period and the corresponding class label that indicates bankruptcy status after 4 years. The bankrupt companies were analysed in the period 2000-2012, while the still operating companies were evaluated from 2007 to 2013. In this task, you need to use 64 features from the financial reports to predict which companies will go bankrupt in the next four years.
The lookup file contains the text description of all variable codes in the data file.

Training and testing datasets
Training data: 6000 rows;
Testing data: 3000 rows.

Tasks
1. Data Pre-processing
This dataset contains many missing values. You need to report how you handle these missing values.
2. Model Selection
Report how you build your model or model ensemble with a suitable criterion.
3. Visualizations
Report the 5 most interesting visualizations that can help to answer analytical problems. For example: are any predictors highly correlated, and which two dimensions give good classification performance?
4. Classification
Based on the features available, develop a model that predicts the bankruptcy situation of the companies. The classification results will be evaluated in Kaggle automatically.
The evaluation metric is the F score, with TP, FP, and FN being the numbers of true positives, false positives, and false negatives, respectively.
5. Report results
The slides should include at least three parts: visualization, methodology, and results.
The methodology section should be precise and should justify your decisions, for example, how you chose hyperparameters and why you prefer a particular method over the others.
The code needs to demonstrate that the classification results are reproducible and that the adopted method is consistent with the one introduced in the submission file.

Notice
1. Project deadline: 11:55 pm, Dec 1, 2019.
2. You can use either R or Python for the classification task.
3. The composition of marks is given in the table below.

                         Total Score   Criterion
Visualization            5             Innovation; Aesthetic; Information complexity; Insightfulness of conclusion
Classification           1
Analysis Presentation    10            Clear idea about model selection; Good interpretation about model; Well-structured report

Reposted from: http://www.daixie0.com/contents/18/4374.html
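The brief describes the evaluation metric only as an F score defined through TP, FP, and FN; the exact formula is not reproduced in this copy. The sketch below assumes the common F1 form, F1 = 2TP / (2TP + FP + FN), and assumes labels are coded 1 for bankrupt and 0 otherwise. The function name `f_score` and the `beta` parameter are illustrative, not part of the assignment; the metric Kaggle actually applies may differ.

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-beta score computed from TP, FP, FN counts.

    Assumption: the positive class (bankrupt) is coded as 1.
    beta=1.0 gives the usual F1 score, 2TP / (2TP + FP + FN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positives at all, to avoid dividing by zero.
    return (1 + b2) * tp / denom if denom else 0.0


# Toy labels for illustration only (not taken from the bankruptcy data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(f_score(y_true, y_pred))  # TP=3, FP=1, FN=1 -> 0.75
```

Because bankrupt companies are typically the minority class, a score of this kind rewards finding them without flagging too many healthy firms, which plain accuracy would not capture.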
