Data Science with R in 4 Weeks - Week 1 - Day3

Day 3: Summarizing data - two-dimensional summaries


Example 1: multiple boxplots. How do win rates differ across leagues?

> temp <- read.csv("basketball_teams.csv")

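> teamdata <- as.data.frame(temp)   # optional: read.csv() already returns a data frame

> # win rate = games won / games played; NA for teams that played 0 games (avoids 0/0)
> teamdata$new_column <- ifelse(teamdata$games == 0, NA, teamdata$won / teamdata$games)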

> stats <- teamdata[, c("name","lgID", "year","new_column")]
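To see what the `ifelse()` guard does without the CSV file, here is a minimal sketch on a hypothetical four-team data frame (made-up names and numbers; the objects are called `toy_teams` and `toy_stats` so they do not overwrite the real ones). It also adds a numeric two-dimensional summary, the mean win rate per league, using base R's `tapply()`.

```r
# Hypothetical toy data standing in for basketball_teams.csv (made-up values).
# Named toy_* so it does not overwrite the real `teamdata` / `stats`.
toy_teams <- data.frame(
  name  = c("Team A", "Team B", "Team C", "Team D"),
  lgID  = c("NBA", "NBA", "ABA", "ABA"),
  year  = c(1970, 1971, 1970, 1971),
  won   = c(50, 30, 0, 42),
  games = c(82, 82, 0, 84)
)

# Win rate = won / games; a team with 0 games gets NA instead of NaN (0/0)
toy_teams$new_column <- ifelse(toy_teams$games == 0, NA,
                               toy_teams$won / toy_teams$games)
toy_stats <- toy_teams[, c("name", "lgID", "year", "new_column")]

# Mean win rate per league; na.rm = TRUE skips the NA row
league_means <- tapply(toy_stats$new_column, toy_stats$lgID, mean, na.rm = TRUE)
print(round(league_means, 3))
```

With these made-up numbers Team C ends up as NA and is ignored by `na.rm = TRUE`, so the ABA mean comes from Team D alone.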

> boxplot(new_column ~ lgID, data = stats, col = "red", xlab = "League", ylab = "Win rate")

The result:


[Figure 1: boxplots of win rate by league]
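Each box is drawn from a five-number summary (lower whisker, lower hinge, median, upper hinge, upper whisker). As a sketch on hypothetical made-up win rates (stored in `toy_stats` so the real `stats` is not overwritten), passing `plot = FALSE` to `boxplot()` returns those numbers instead of drawing them, which is one way to check what the figure shows.

```r
# Hypothetical win rates for two leagues (made-up values, not from the CSV)
toy_stats <- data.frame(
  lgID       = c(rep("ABA", 5), rep("NBA", 5)),
  new_column = c(0.20, 0.40, 0.50, 0.60, 0.80,
                 0.30, 0.45, 0.50, 0.55, NA)
)

# plot = FALSE computes the boxplot statistics without drawing anything
b <- boxplot(new_column ~ lgID, data = toy_stats, plot = FALSE)

# b$stats has one column per league; rows are lower whisker, lower hinge,
# median, upper hinge, upper whisker
colnames(b$stats) <- b$names
print(b$stats)
print(b$n)  # number of non-NA values per league
```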

We can also use histograms, one per league:

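> par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))   # 2 rows x 1 column of panels; mar = margins in lines (bottom, left, top, right)

> hist(stats$new_column[stats$lgID == "ABA"], col = "green", main = "ABA", xlab = "Win rate")

> hist(stats$new_column[stats$lgID == "NBA"], col = "green", main = "NBA", xlab = "Win rate")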


[Figure 2: histograms of win rate for the ABA (top) and the NBA (bottom)]
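Writing one `hist()` call per league gets tedious as the number of leagues grows. A sketch that loops over whatever league IDs are present, on hypothetical made-up data; it draws into a temporary PDF via `pdf()`, and the shared `breaks` grid (an added choice, not in the original) keeps the bars comparable across panels.

```r
# Hypothetical win rates (made-up values, not from basketball_teams.csv)
toy_stats <- data.frame(
  lgID       = c("ABA", "ABA", "ABA", "NBA", "NBA", "NBA", "NBL", "NBL"),
  new_column = c(0.35, 0.55, 0.70, 0.20, 0.50, 0.65, 0.40, NA),
  stringsAsFactors = FALSE
)

leagues <- sort(unique(toy_stats$lgID))   # every league present in the data
bins    <- seq(0, 1, by = 0.1)            # same bins in every panel

pdf(tempfile(fileext = ".pdf"))           # draw into a throwaway file
par(mfrow = c(length(leagues), 1), mar = c(4, 4, 2, 1))
counts <- list()
for (lg in leagues) {
  x <- toy_stats$new_column[toy_stats$lgID == lg]
  x <- x[!is.na(x)]                       # drop teams without a win rate
  h <- hist(x, breaks = bins, col = "green", main = lg, xlab = "Win rate")
  counts[[lg]] <- h$counts                # bar heights, kept for checking
}
invisible(dev.off())
```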

Scatterplot of win rate against year:

> par(mfrow = c(1, 1))   # back to a single panel

> with(stats, plot(year, new_column))

> abline(h = 0.7, lwd = 2, lty = 2)   # dashed horizontal reference line at a 0.7 win rate


[Figure 3: win rate by year, with a dashed reference line at 0.7]
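The dashed line at 0.7 is only a visual threshold. One way to put a number on it (a sketch, not part of the original analysis) is the share of team-seasons above 0.7 in each league; the data below are hypothetical and stored in `toy_stats`.

```r
# Hypothetical win rates (made-up values, not from basketball_teams.csv)
toy_stats <- data.frame(
  lgID       = c("ABA", "ABA", "ABA", "ABA", "NBA", "NBA", "NBA", "NBA"),
  new_column = c(0.75, 0.50, 0.72, 0.40, 0.80, 0.30, NA, 0.60)
)

# TRUE/FALSE per team-season; the mean of a logical vector is a proportion
above <- toy_stats$new_column > 0.7
share_above <- tapply(above, toy_stats$lgID, mean, na.rm = TRUE)
print(share_above)
```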

Adding color to the scatterplot, one color per league:

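> # since R 4.0, read.csv() reads lgID as character, so col = lgID fails with "invalid color name";
> # converting to a factor gives integer codes, i.e. one palette color per league
> with(stats, plot(year, new_column, col = as.integer(factor(lgID))))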

From this plot we can see how the win rates of teams in each league (e.g. ABA, NBA) are distributed.

[Figure 4: win rate by year, colored by league]
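The colored plot has no legend, so the reader cannot tell which color is which league. A sketch on hypothetical data that turns `lgID` into a factor, uses its integer codes as indices into the current `palette()`, and adds a matching `legend()`; the plot is written to a temporary PDF so it runs without a screen.

```r
# Hypothetical data (made-up values, not from basketball_teams.csv)
toy_stats <- data.frame(
  lgID       = c("ABA", "NBA", "NBL", "ABA", "NBA", "NBL"),
  year       = c(1950, 1950, 1950, 1970, 1970, 1970),
  new_column = c(0.55, 0.40, 0.65, 0.70, 0.45, 0.30)
)

league <- factor(toy_stats$lgID)   # levels sorted: "ABA", "NBA", "NBL"
cols   <- as.integer(league)       # 1, 2, 3 index colors in palette()

pdf(tempfile(fileext = ".pdf"))    # draw into a throwaway file
plot(toy_stats$year, toy_stats$new_column, col = cols, pch = 1,
     xlab = "Year", ylab = "Win rate")
# Same level order and color indices as the points, so the legend matches
legend("topright", legend = levels(league),
       col = seq_along(levels(league)), pch = 1)
invisible(dev.off())
```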

Alternatively, we can draw a separate scatterplot for each league, here to look at the NBA and NBL win rates separately:

> par(mfrow = c(2, 1))   # two panels, one per league

> with(subset(stats, lgID == "NBA"), plot(year, new_column, main = "NBA"))

> with(subset(stats, lgID == "NBL"), plot(year, new_column, main = "NBL"))


[Figure 5: win rate by year, separate panels for the NBA and the NBL]
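By default each `plot()` call picks its own y-axis range, so two panels can look similar even when their win rates differ. A sketch on hypothetical data that loops over the leagues and fixes `ylim = c(0, 1)` (an added choice, not from the original) so the NBA and NBL panels share one scale.

```r
# Hypothetical win rates (made-up values, not from basketball_teams.csv)
toy_stats <- data.frame(
  lgID       = c("NBA", "NBA", "NBA", "NBL", "NBL", "NBL"),
  year       = c(1950, 1951, 1952, 1947, 1948, 1949),
  new_column = c(0.62, 0.48, 0.55, 0.30, 0.71, 0.45),
  stringsAsFactors = FALSE
)

pdf(tempfile(fileext = ".pdf"))            # draw into a throwaway file
par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))  # one panel per league
points_per_panel <- integer(0)
for (lg in c("NBA", "NBL")) {
  d <- subset(toy_stats, lgID == lg)       # rows for one league
  # ylim fixed to the full win-rate range so both panels share a scale
  plot(d$year, d$new_column, main = lg, ylim = c(0, 1),
       xlab = "Year", ylab = "Win rate")
  points_per_panel[lg] <- nrow(d)
}
invisible(dev.off())
```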
