Preparation of Data Analysis: Data from the Samsung Galaxy S Smartphone

This is my project for the "Getting and Cleaning Data" course.

Contents

1.Load the data into R

2.Merges the training and the test sets to create one data set.

3.Extracts only the measurements on the mean and standard deviation for each measurement.

4.Uses descriptive activity names to name the activities in the data set

5.Appropriately labels the data set with descriptive variable names.

6.From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.


Here are the data for the project:

https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip


One of the most exciting areas in all of data science right now is wearable computing. Companies like Fitbit, Nike, and Jawbone Up are racing to develop the most advanced algorithms to attract new users. The data linked from the course website were collected from the accelerometers of the Samsung Galaxy S smartphone.

This time I downloaded the archive into my working directory by hand, which made it easier to read README.txt. If you want to know more about downloading data from within R, see the post "Getting the Raw Data You Want with R: How to Download" (用R获得你想要的原始数据-如何下载).


 

1.Load the data into R

The dataset is already unzipped in my working directory. If you would rather download it from R code, see "how to LOAD the data".
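For readers who prefer to fetch the archive from a script, here is a minimal sketch using base R's download.file() and unzip(). The helper name fetch_har and the local file name UCI_HAR_Dataset.zip are assumptions made for illustration and are not part of the original project; "UCI HAR Dataset" is the folder the archive unpacks to.

```r
# Hypothetical helper (not in the original script): download and unpack the
# archive only when the dataset folder is not already present.
fetch_har <- function(url,
                      zip_file = "UCI_HAR_Dataset.zip",  # assumed local name
                      data_dir = "UCI HAR Dataset") {
  if (!dir.exists(data_dir)) {
    if (!file.exists(zip_file)) {
      # mode = "wb" keeps the zip intact on Windows
      download.file(url, destfile = zip_file, mode = "wb")
    }
    # Unpack next to where the dataset folder is expected
    unzip(zip_file, exdir = dirname(data_dir))
  }
  invisible(data_dir)
}
```

It could be called once with the project URL listed above, followed by setwd() into the returned folder. When the folder already exists, nothing is downloaded.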

# The unzipped dataset folder is used as the working directory
setwd("C:/Users/zhong/Desktop/coursera/R/UCI HAR Dataset")

# Load the training and test tables: measurements (X), activity codes (y) and subject IDs
train_x <- read.table("./train/X_train.txt")
train_y <- read.table("./train/y_train.txt")
train_subject <- read.table("./train/subject_train.txt")
test_x <- read.table("./test/X_test.txt")
test_y <- read.table("./test/y_test.txt")
test_subject <- read.table("./test/subject_test.txt")

 

2.Merges the training and the test sets to create one data set.

# Put subject, activity code and measurements side by side
trainData <- cbind(train_subject, train_y, train_x)
testData <- cbind(test_subject, test_y, test_x)

# Stack the training and test rows into one data set
MergeData <- rbind(trainData, testData)
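To make the shape of the merge easier to picture, here is a small sketch on made-up tables. All toy_ names and values are invented for illustration; only the cbind()/rbind() pattern mirrors the step above.

```r
# Toy stand-ins for the real tables (values are made up)
toy_train_subject <- data.frame(V1 = c(1, 1, 3))
toy_train_y <- data.frame(V1 = c(5, 4, 2))
toy_train_x <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(1.1, 1.2, 1.3))

toy_test_subject <- data.frame(V1 = c(2, 2))
toy_test_y <- data.frame(V1 = c(6, 1))
toy_test_x <- data.frame(V1 = c(0.4, 0.5), V2 = c(1.4, 1.5))

# cbind() glues columns: subject, activity code, then the measurements
toy_train <- cbind(toy_train_subject, toy_train_y, toy_train_x)
toy_test <- cbind(toy_test_subject, toy_test_y, toy_test_x)

# rbind() stacks rows: 3 training rows + 2 test rows
toy_merged <- rbind(toy_train, toy_test)
dim(toy_merged)  # 5 rows, 4 columns
```

The column order matters here: subject comes first and the activity code second, so the measurement columns start at position 3.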

 

3.Extracts only the measurements on the mean and standard deviation for each measurement.

# Extract only the measurements on the mean and standard deviation for each measurement.
## Read the feature names (second column of features.txt)
Feature <- read.table("./features.txt", stringsAsFactors = FALSE)[,2]

## Keep subject, activity and the mean()/std() columns;
## +2 skips the subject and activity columns in front of the measurements
FeatureIndex <- grep("mean\\(\\)|std\\(\\)", Feature)
DATA <- MergeData[, c(1, 2, FeatureIndex+2)]
colnames(DATA) <- c("subject", "activity", Feature[FeatureIndex])
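To see what the pattern keeps and what it drops, here is a small sketch on five names taken from features.txt (picked for illustration out of the 561). The toy_ variable names are assumptions, not part of the original script.

```r
# A handful of feature names as they appear in features.txt
toy_features <- c("tBodyAcc-mean()-X",
                  "tBodyAcc-std()-X",
                  "fBodyAcc-meanFreq()-X",
                  "angle(tBodyAccMean,gravity)",
                  "tBodyAcc-max()-X")

# Escaping the parentheses means only a literal "mean()" or "std()" matches,
# so meanFreq() and the angle(...) features are left out
toy_index <- grep("mean\\(\\)|std\\(\\)", toy_features)
toy_index      # 1 2

# Shift by 2 to get the column positions in the merged table
toy_index + 2  # 3 4
```

Whether meanFreq() should count as a "mean" measurement is a judgment call; this pattern takes the narrow reading.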


4.Uses descriptive activity names to name the activities in the data set

# Use descriptive activity names to name the activities in the data set
## Read the lookup table of activity codes and names
ActivityName <- read.table("./activity_labels.txt")

## Replace the activity codes with their names
DATA$activity <- factor(DATA$activity, levels = ActivityName[,1], labels = ActivityName[,2])
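As a quick illustration of how factor() maps codes to names, here is a sketch with the six activities from activity_labels.txt typed in by hand. The toy_ objects and the sample codes are made up for this example.

```r
# Lookup table with the same layout as activity_labels.txt
toy_labels <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING")
)

# A few made-up activity codes
toy_codes <- c(5, 1, 6, 1)

# levels = the codes to look for, labels = the names to show instead
toy_activity <- factor(toy_codes, levels = toy_labels[, 1], labels = toy_labels[, 2])
as.character(toy_activity)  # "STANDING" "WALKING" "LAYING" "WALKING"
```

Because the levels follow the order of the lookup table, sorting by this factor later keeps WALKING first and LAYING last.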

5.Appropriately labels the data set with descriptive variable names.

# Appropriately label the data set with descriptive variable names.

names(DATA) <- gsub("\\(\\)", "", names(DATA))        # drop the "()"
names(DATA) <- gsub("^t", "time", names(DATA))        # leading t = time domain
names(DATA) <- gsub("^f", "frequency", names(DATA))   # leading f = frequency domain
names(DATA) <- gsub("-mean", "Mean", names(DATA))
names(DATA) <- gsub("-std", "Std", names(DATA))
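Here is a small before/after sketch of the renaming on a few sample column names. The vector toy_names is an assumption for illustration, and only time-domain names are shown.

```r
toy_names <- c("subject", "activity", "tBodyAcc-mean()-X", "tGravityAcc-std()-Z")

toy_names <- gsub("\\(\\)", "", toy_names)    # "tBodyAcc-mean-X", ...
toy_names <- gsub("^t", "time", toy_names)    # only a leading t is replaced
toy_names <- gsub("-mean", "Mean", toy_names)
toy_names <- gsub("-std", "Std", toy_names)

toy_names
# "subject" "activity" "timeBodyAccMean-X" "timeGravityAccStd-Z"
```

The "^" anchor is what keeps "subject" and "activity" untouched. The axis suffix ("-X", "-Z") still carries a hyphen after these steps, so such columns need backticks if they are referred to by name in R code.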

 

6.From the data set in step 5, creates a second, independent tidy data set with the average of each variable for each activity and each subject.

# From the data set in step 5, create a second, independent tidy data set with the average of each variable for each activity and each subject.
# aggregate() ships with base R (stats), so no extra package is needed
tidyData <- aggregate(. ~ subject + activity, DATA, mean)
tidyData <- tidyData[order(tidyData$subject, tidyData$activity), ]

# Save the clean, tidy data
write.table(tidyData, file = "tidyData.txt", row.names = FALSE)
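The averaging step can be checked by hand on a tiny made-up table. Everything named toy_ below is invented for illustration; the column timeBodyAccMagMean stands in for the real measurement columns.

```r
toy <- data.frame(
  subject = c(1, 1, 1, 2, 2),
  activity = factor(c("WALKING", "WALKING", "SITTING", "WALKING", "WALKING"),
                    levels = c("WALKING", "SITTING")),
  timeBodyAccMagMean = c(0.2, 0.4, 0.1, 0.5, 0.7)
)

# One row per subject/activity pair, each measurement replaced by its mean
toy_tidy <- aggregate(. ~ subject + activity, toy, mean)
toy_tidy <- toy_tidy[order(toy_tidy$subject, toy_tidy$activity), ]
toy_tidy$timeBodyAccMagMean  # 0.3 0.1 0.6

# Round trip through a text file; header = TRUE restores the column names
toy_file <- tempfile(fileext = ".txt")
write.table(toy_tidy, file = toy_file, row.names = FALSE)
toy_back <- read.table(toy_file, header = TRUE)
```

The same read.table(..., header = TRUE) call should work for reading tidyData.txt back in.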

More information and code updates: https://github.com/kidpea/Preparation-OF-Data-Analysis.Data-from-Samsung-Galaxy-S-smartphone-/blob/master/run_analysis.R
