Real data coding exercises
MSc in Statistics for Data Science at Carlos III University of Madrid
Eduardo García-Portugués
2020-03-10, v1.0

Datasets

MNIST dataset

The MNIST database contains images of handwritten 0–9 digits used in mailing. The first 60,000 images are stored in the object MNIST inside MNIST-tSNE.RData. Each observation is a grayscale image of 28 × 28 pixels, recorded as a vector of length 28^2 = 784 by concatenating the pixel columns. The entries of these vectors are numbers in the interval [0, 1] indicating the gray level of the pixel: 0 for white, 1 for black. These vectorised images are stored in the 60,000 × 784 matrix MNIST$x. The typical goal is to classify each image as the digit it represents. These 0–9 labels are stored in MNIST$labels.

The data is high-dimensional (it lives in R^784) and is therefore hard to analyse. To simplify its statistical treatment, the nonlinear dimension-reduction technique t-SNE has been applied. t-SNE can be regarded as a kind of "nonlinear PCA" that maps the data into a low-dimensional representation, in this case in R^2.
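To make the column concatenation concrete, here is a minimal R sketch. It uses a hypothetical toy 28 × 28 matrix, img, in place of an actual MNIST digit, so it runs without MNIST-tSNE.RData. It shows that as.vector() stacks the columns, that pixel (i, j) lands at position (j − 1) · 28 + i, and that matrix(vec, nrow = 28), the call used inside show_digit() in the data chunk, undoes the vectorisation.

```r
# Toy stand-in for one MNIST digit (hypothetical data, not from MNIST-tSNE.RData)
img <- matrix(0, nrow = 28, ncol = 28)
img[5:24, 14] <- 1      # a dark stroke along column 14
img[6, 12:13] <- 0.5    # two mid-gray pixels

# Vectorise by concatenating the pixel columns (R matrices are column-major)
vec <- as.vector(img)
length(vec)                         # 784 = 28^2

# Pixel (i, j) ends up at position (j - 1) * 28 + i of the vector
i <- 6
j <- 12
vec[(j - 1) * 28 + i] == img[i, j]  # TRUE

# Refilling a 28-row matrix column by column undoes the vectorisation
identical(matrix(vec, nrow = 28), img)  # TRUE
```

The same indexing rule explains why a single row of MNIST$x can be turned back into an image without storing any extra shape information beyond the 28 rows.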
The “t-SNE scores” are stored in MNIST$y_tsne. The following code chunk illustrates the basics of the data.

    # Load data
    load("datasets/MNIST-tSNE.RData")

    # Visualization of an individual image
    # (grayscale palette from white, 0, to black, 1; the default palette is assumed)
    show_digit <- function(vec, col = gray(12:1 / 12), ...) {
      image(matrix(vec, nrow = 28)[, 28:1], col = col, ...)
    }

    # Plot the first 10 images (first 10 rows of MNIST$x)
    par(mfrow = c(1, 10), mar = c(0, 0, 0, 0))
    for (i in 1:10) show_digit(MNIST$x[i, ], axes = FALSE)

    # Data structure
    str(MNIST)
    ## List of 3
    ##  $ x     : num [1:60000, 1:784] 0 0 0 0 0 0 0 0 0 0 ...
    ##  $ labels: int [1:60000] 5 0 4 1 9 2 1 3 1 4 ...
    ##  $ y_tsne: num [1:60000, 1:2] -58.5 -19 85.2 -23.8 68.3 ...

    # Plot t-SNE scores
    par(mfrow = c(1, 1), mar = c(5, 4, 4, 2) + 0.1)
    plot(MNIST$y_tsne, col = rainbow(10)[MNIST$labels + 1], pch = 16, cex = 0.25)
    legend("bottomright", col = rainbow(10), pch = 16, legend = 0:9)

[Figure: scatterplot of MNIST$y_tsne[,1] (horizontal axis) against MNIST$y_tsne[,2] (vertical axis), colored by digit label; both axes span roughly −100 to 100.]

sunspots_births dataset

The sunspots_births dataset from the rotasym package contains the recorded sunspot births during 1872–2018 from the Debrecen Photoheliographic Data (DPD) sunspot catalogue and the revised version of the Greenwich Photoheliographic Results (GPR) sunspot catalogue. The dataset contains 51,303 observations of 6 features of the births of groups of sunspots. These features include their positions in spherical coordinates (theta and phi), their size (total_area), and their distance to the center of the solar disk (dist_sun_disc). The data has been recently analysed in this paper.

The following code chunk illustrates the basics of the data.

    # Install packages
    # install.packages("rotasym")

    # Load data
    data(sunspots_births, package = "rotasym")

    # Data summary
    summary(sunspots_births)
    ##       date                         cycle          total_area     
    ##  Min.   :1874-04-17 11:38:00   Min.   :11.00   Min.   :   0.00  
    ##  1st Qu.:1929-04-01 01:00:01   1st Qu.:16.00   1st Qu.:   4.00  
    ##  Median :1964-12-16 08:24:00   Median :20.00   Median :  12.00  
    ##  Mean   :1959-06-19 23:05:53   Mean   :18.98   Mean   :  54.63  
    ##  3rd Qu.:1990-07-09 03:43:59   3rd Qu.:22.00   3rd Qu.:  43.00  
    ##  Max.   :2018-06-19 04:58:46   Max.   :24.00   Max.   :2803.00  
    ##  dist_sun_disc        theta            phi            
    ##  Min.   :0.0030   Min.   :0.000   Min.   :-1.0384709  
    ##  1st Qu.:0.4637   1st Qu.:1.600   1st Qu.:-0.2565634  
    ##  Median :0.7450   Median :3.121   Median : 0.0087266  
    ##  Mean   :0.6895   Mean   :3.134   Mean   : 0.0002379  
    ##  3rd Qu.:0.9530   3rd Qu.:4.679   3rd Qu.: 0.2548181  
    ##  Max.   :1.0000   Max.   :6.283   Max.   : 1.0419616  

    # Obtain the data from the 23rd solar cycle
    sunspots_23 <- subset(sunspots_births, cycle == 23)

    # Transform to Cartesian coordinates (theta: longitude, phi: latitude)
    X <- cbind(cos(sunspots_23$phi) * cos(sunspots_23$theta),
               cos(sunspots_23$phi) * sin(sunspots_23$theta),
               sin(sunspots_23$phi))

    # Plot data: transparent unit sphere plus sunspots colored by date
    n_cols <- 100  # number of color bins (illustrative choice)
    rgl::plot3d(0, 0, 0, xlim = c(-1, 1), ylim = c(-1, 1), zlim = c(-1, 1),
                radius = 1, type = "s", col = "lightblue", alpha = 0.25,
                lit = FALSE)
    cuts <- cut(sunspots_23$date, include.lowest = TRUE,
                breaks = quantile(sunspots_23$date,
                                  probs = seq(0, 1, length.out = n_cols + 1)))
    rgl::points3d(X, col = viridisLite::viridis(n_cols)[cuts])

    # Spörer's law: sunspots at the beginning of the solar cycle (dark blue
    # color) tend to appear at higher latitudes, gradually decreasing to the
    # equator as the solar cycle advances (yellow color)

Exercises

Chapter 2

• Exercise 2.19 (solved in exercise_2_19.R). Consider the sunspots_births dataset.
  a. Compute and plot the kernel density estimator for phi using the DPI selector. Describe the result.
  b. Compute and plot the kernel density derivative estimator for phi using the adequate DPI selector. Using a horizontal line at y = 0, determine approximately the location of the main mode(s).
  c. Compute the log-transformed kernel density estimator with adj.positive = 1 for total_area using the NS selector.
  d. Draw the histogram of M = 10^4 samples simulated from the kernel density estimator obtained in a.

• Exercise 2.20 (solved in exercise_2_20.R). Consider the MNIST dataset.
  a. Compute the average gray level, av_gray_one, for each image of the digit “1”.
  b. Compute and plot the kde of av_gray_one, taking into account that it is a positive variable.
  c. Overlay the lognormal distribution density, with parameters estimated by maximum likelihood (use MASS::fitdistr).
  d. Repeat c. for the Weibull density.
  e. Which parametric model seems more adequate?

Chapter 3

• Exercise 3.25 (solved in exercise_3_25.R). Load the ovals.RData file.
  a. Split the dataset into the training sample, consisting of the first 2,000 observations, and the test sample (the rest of the sample). Plot the dataset with colors for its classes. What can you say about the classification problem?
  b. Using the training sample, compute the plug-in bandwidth matrices for all the classes.
  c. Use these plug-in bandwidths to perform kernel discriminant analysis.
  d. Plot the contours of the kernel density estimator of each class and the class partitions. Use coherent colors between contours and points.
  e. Predict the class for the test sample and compare with the true classes. Then report the successful classification rate.
  f. Compare the successful classification rate with the one given by LDA. Is it better than kernel discriminant analysis?
  g. Repeat f. with QDA.

• Exercise 3.26 (solved in exercise_3_26.R). Consider the MNIST dataset. Classify the digit images, via the t-SNE scores, into the digit labels:
  a. Split the dataset into the training sample, consisting of the first 50,000 t-SNE scores and their associated labels, and the test sample (the rest of the sample).
  b. Using the training sample, compute the plug-in bandwidth matrices for all the classes.
  c. Use these plug-in bandwidths to perform kernel discriminant analysis.
  d. Plot the contours of the kernel density estimator of each class and overlay the t-SNE scores as points. Use coherent colors between contours and points, and add a legend.
  e. Obtain the successful classification rate of the kernel discriminant analysis.

• Exercise 3.27 (solved in exercise_3_27.R). Consider the MNIST dataset. Investigate the different ways of writing the digit “3” using kernel mean shift clustering. Follow these steps:
  a. Consider the first 2,000 t-SNE scores of the class “3” (for the sake of computational expediency).
  b. Compute the normal scale bandwidth matrix that is appropriate for performing kernel mean shift clustering with the first 2,000 t-SNE scores of the class “3”.
  c. Perform kernel mean shift clustering on that subset using the previously obtained bandwidth, and obtain the modes ξ_j, j = 1, ..., M, that cluster the t-SNE scores.
  d. Determine the M images whose t-SNE scores are closest, in Euclidean distance, to the modes ξ_j.
  e. Show the images associated with the modes. Do they represent different ways of drawing the digit “3”?

Chapter 4

• Exercise 4.14 (solved in exercise_4_14.R).
  a. Code your own function that computes the local cubic estimator. The function must take as input the vector of evaluation points x, the sample data, the bandwidth h, and the kernel K. The result must be a vector of the same length as x containing the estimator evaluated at x.
  b. Test the implementation by estimating the regression function in the location model Y = m(X) + ε, where m(x) = (x − 1)^2, X ∼ N(1, 1), and ε ∼ N(0, 0.5). Do it from a sample of size n = 500.

• Exercise 4.15 (solved in exercise_4_15.R). Consider the sunspots_births dataset. Then:
  a. Filter the dataset to account only for the 23rd cycle.
  b. Inspect the graphical relation between dist_sun_disc (response) and log10(total_area) (predictor).
  c. Compute the CV bandwidth for the above regression, for the local constant estimator.
  d. Compute and plot the local constant estimator using the CV bandwidth.
  e. Repeat c. and d. for the local linear estimator.
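For Exercises 4.14 and 4.15, the local polynomial estimator of degree p fits, at each evaluation point x, a weighted least squares polynomial in (X_i − x) with weights K_h(X_i − x) and keeps the intercept: p = 0 gives the local constant (Nadaraya–Watson) estimator, p = 1 the local linear one, and p = 3 the local cubic one. The sketch below is one possible base-R implementation, not the code in the solution files. The names loc_poly() and cv_bw() are hypothetical, the normal kernel is only a default, the CV bandwidth is found by brute-force leave-one-out over a user-supplied grid, and the simulation reads N(0, 0.5) as having variance 0.5 (an assumption).

```r
# Local polynomial estimator of degree p (hypothetical helper, a sketch).
# x: evaluation points; X, Y: sample; h: bandwidth; K: kernel function.
loc_poly <- function(x, X, Y, h, p = 3, K = dnorm) {
  sapply(x, function(x0) {
    w <- K((X - x0) / h) / h          # kernel weights K_h(X_i - x0)
    Z <- outer(X - x0, 0:p, "^")      # columns (X_i - x0)^k, k = 0, ..., p
    # Weighted least squares fit; the intercept estimates m(x0)
    lm.wfit(x = Z, y = Y, w = w)$coefficients[[1]]
  })
}

# Brute-force leave-one-out CV bandwidth over a grid (hypothetical helper)
cv_bw <- function(X, Y, h_grid, p = 0, K = dnorm) {
  cv <- sapply(h_grid, function(h) {
    # Fit without observation i and predict at X_i, for every i
    m_loo <- sapply(seq_along(X), function(i)
      loc_poly(X[i], X[-i], Y[-i], h = h, p = p, K = K))
    mean((Y - m_loo)^2)
  })
  h_grid[which.min(cv)]
}

# Simulated location model of Exercise 4.14 b (error variance 0.5 assumed)
set.seed(42)
n <- 500
X <- rnorm(n, mean = 1, sd = 1)
Y <- (X - 1)^2 + rnorm(n, mean = 0, sd = sqrt(0.5))

# CV bandwidth for the local linear estimator, searched over a coarse grid
h_grid <- seq(0.1, 1, by = 0.1)
h_cv <- cv_bw(X, Y, h_grid = h_grid, p = 1)

# True m (black), local linear with CV bandwidth (red), and local cubic
# with an illustrative bandwidth h = 0.5 (blue)
x_grid <- seq(-1, 3, length.out = 200)
plot(X, Y, pch = 16, cex = 0.5, col = "gray")
lines(x_grid, (x_grid - 1)^2, lwd = 2)
lines(x_grid, loc_poly(x_grid, X, Y, h = h_cv, p = 1), col = 2, lwd = 2)
lines(x_grid, loc_poly(x_grid, X, Y, h = 0.5, p = 3), col = 4, lwd = 2)
```

A local polynomial of degree p reproduces polynomials of degree up to p exactly, so a noiseless quadratic such as m(x) = (x − 1)^2 is a quick sanity check for a local cubic implementation. Using lm.wfit() rather than solving the normal equations directly keeps the fit stable when few points carry non-negligible weight. The brute-force CV costs on the order of n^2 operations per bandwidth, so for large samples a dedicated package or the leave-one-out shortcut for linear smoothers is likely preferable.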
