R极简教程-12:交互式绘图

目前绘制R语言交互图的主要有rchart,Highchart和plotly,我个人只用过最后一个,感觉很好。

本篇案例全部出自plotly官网。

气泡图

气泡图简而言之就是升级版的ScatterPlot,每一个横纵坐标上的点有了两个更多的属性,可以有颜色,还可以有大小,通过它可以再一个图像上显示3-4个维度的数据。

library(plotly)

data <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")

p <- plot_ly(data, 
             x = ~Women, 
             y = ~Men, 
             text = ~School, 
             type = 'scatter', 
             mode = 'markers',
             marker = list(size = ~gap, opacity = 0.5)) %>%
             layout(title = 'Gender Gap in Earnings per University',
             xaxis = list(showgrid = FALSE),
             yaxis = list(showgrid = FALSE))
p

R极简教程-12:交互式绘图_第1张图片

我们需要认真看一下data里边是什么:

> data
       School Women Men gap
1         MIT    94 152  58
2    Stanford    96 151  55
3     Harvard   112 165  53
4      U.Penn    92 141  49
5   Princeton    90 137  47
6     Chicago    78 118  40
7  Georgetown    94 131  37
8       Tufts    76 112  36
9        Yale    79 114  35
10   Columbia    86 119  33
11       Duke    93 124  31
12  Dartmouth    84 114  30
13        NYU    67  94  27
14 Notre Dame    73 100  27
15    Cornell    80 107  27
16   Michigan    62  84  22
17      Brown    72  92  20
18   Berkeley    71  88  17
19      Emory    68  82  14
20       UCLA    64  78  14
21      SoCal    72  81   9

上边的数据很重要!因为它是plotly或者说ggplot2的一个重要特性,也就是用DataFrame显示一切数据,可以把多维度的数据展开成一个二维表格。这个是一个非常重要的设计,我也不知道始于谁,但是它的结果就是,让所有的数据整理可视化存储格式都有迹可循,让所有人对于数据的整理和规划有了一个固定的规范。

可以想象一下,横坐标是年份,纵坐标是国家,气泡大小代表人数,颜色代表发达程度。

看一个比较复杂的例子:

library(plotly)

data <- read.csv("https://raw.githubusercontent.com/plotly/datasets/master/gapminderDataFiveYear.csv")

data_2007 <- data[which(data$year == 2007),]
data_2007 <- data_2007[order(data_2007$continent, data_2007$country),]
slope <- 2.666051223553066e-05
data_2007$size <- sqrt(data_2007$pop * slope)
colors <- c('#4AC6B7', '#1972A4', '#965F8A', '#FF7070', '#C61951')

p <- plot_ly(data_2007, x = ~gdpPercap, y = ~lifeExp, color = ~continent, size = ~size, colors = colors,
        type = 'scatter', mode = 'markers', sizes = c(min(data_2007$size), max(data_2007$size)),
        marker = list(symbol = 'circle', sizemode = 'diameter',
                      line = list(width = 2, color = '#FFFFFF')),
        text = ~paste('Country:', country, '
Life Expectancy:'
, lifeExp, '
GDP:'
, gdpPercap, '
Pop.:'
, pop)) %>% layout(title = 'Life Expectancy v. Per Capita GDP, 2007', xaxis = list(title = 'GDP per capita (2000 dollars)', gridcolor = 'rgb(255, 255, 255)', range = c(2.003297660701705, 5.191505530708712), type = 'log', zerolinewidth = 1, ticklen = 5, gridwidth = 2), yaxis = list(title = 'Life Expectancy (years)', gridcolor = 'rgb(255, 255, 255)', range = c(36.12621671352166, 91.72921793264332), zerolinewidth = 1, ticklen = 5, gridwith = 2), paper_bgcolor = 'rgb(243, 243, 243)', plot_bgcolor = 'rgb(243, 243, 243)') p

R极简教程-12:交互式绘图_第2张图片
途中的纵坐标代表的是人均预期寿命,可以看出,中间的两个大泡是中国和印度。

饼图

饼图是呈现比例的一个绝好的办法,但是做的时候最好确认你的数据适合这样的呈现方式,我见过很多本来应该用barplot来呈现的图最后选择了饼图。

library(plotly)

USPersonalExpenditure <- data.frame("Categorie"=rownames(USPersonalExpenditure), USPersonalExpenditure)
data <- USPersonalExpenditure[,c('Categorie', 'X1960')]

p <- plot_ly(data, labels = ~Categorie, values = ~X1960, type = 'pie') %>%
  layout(title = 'United States Personal Expenditures by Categories in 1960',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
p

R极简教程-12:交互式绘图_第3张图片

一样我们来看一下数据:

> data
                              Categorie X1960
Food and Tobacco       Food and Tobacco 86.80
Household Operation Household Operation 46.20
Medical and Health   Medical and Health 21.10
Personal Care             Personal Care  5.40
Private Education     Private Education  3.64

同理,plotly是一个完全数据驱动的程序,你只需要把数据结构整理好,然后按照一个一个的列指定给plotly函数,就OK了。

柱状图

library(plotly)

x <- c('Feature A', 'Feature B', 'Feature C', 'Feature D', 'Feature E')
y <- c(20, 14, 23, 25, 22)
data <- data.frame(x, y)

p <- plot_ly(data, x = ~x, y = ~y, type = 'bar',
        marker = list(color = c('rgba(204,204,204,1)', 'rgba(222,45,38,0.8)',
                                'rgba(204,204,204,1)', 'rgba(204,204,204,1)',
                                'rgba(204,204,204,1)'))) %>%
  layout(title = "Least Used Features",
         xaxis = list(title = ""),
         yaxis = list(title = ""))
p

R极简教程-12:交互式绘图_第4张图片
这里哦我们可以看出,marker这个参数,是一个list,然后你需要往其中传入一些固定的名字,比如color,这样就可以指定不同的东西的名字。

折线图

library(plotly)

trace_0 <- rnorm(100, mean = 5)
trace_1 <- rnorm(100, mean = 0)
trace_2 <- rnorm(100, mean = -5)
x <- c(1:100)

data <- data.frame(x, trace_0, trace_1, trace_2)

p <- plot_ly(data, x = ~x, y = ~trace_0, name = 'trace 0', type = 'scatter', mode = 'lines') %>%
  add_trace(y = ~trace_1, name = 'trace 1', mode = 'lines+markers') %>%
  add_trace(y = ~trace_2, name = 'trace 2', mode = 'markers')
p

R极简教程-12:交互式绘图_第5张图片

上图的代码里有一个功能是很重要的:就是add_trace,这个函数可以让你无限制地网图上加不同的线。

举个例子,你可以自动地选择一系列的国家,比较他们的增长率,你点击那个国家,哪条线就会被加上去。现在的数据工具越来越多,其中最大的一个特征就是,通过数据直接驱动产品变得愈来愈容易。

下面看一个更为复杂的案例:

library(plotly)

month <- c('January', 'February', 'March', 'April', 'May', 'June', 'July',
           'August', 'September', 'October', 'November', 'December')
high_2014 <- c(28.8, 28.5, 37.0, 56.8, 69.7, 79.7, 78.5, 77.8, 74.1, 62.6, 45.3, 39.9)
low_2014 <- c(12.7, 14.3, 18.6, 35.5, 49.9, 58.0, 60.0, 58.6, 51.7, 45.2, 32.2, 29.1)
data <- data.frame(month, high_2014, low_2014)
data$average_2014 <- rowMeans(data[,c("high_2014", "low_2014")])

#The default order will be alphabetized unless specified as below:
data$month <- factor(data$month, levels = data[["month"]])

p <- plot_ly(data, x = ~month, y = ~high_2014, type = 'scatter', mode = 'lines',
        line = list(color = 'transparent'),
        showlegend = FALSE, name = 'High 2014') %>%
  add_trace(y = ~low_2014, type = 'scatter', mode = 'lines',
            fill = 'tonexty', fillcolor='rgba(0,100,80,0.2)', line = list(color = 'transparent'),
            showlegend = FALSE, name = 'Low 2014') %>%
  add_trace(x = ~month, y = ~average_2014, type = 'scatter', mode = 'lines',
            line = list(color='rgb(0,100,80)'),
            name = 'Average') %>%
  layout(title = "Average, High and Low Temperatures in New York",
         paper_bgcolor='rgb(255,255,255)', plot_bgcolor='rgb(229,229,229)',
         xaxis = list(title = "Months",
                      gridcolor = 'rgb(255,255,255)',
                      showgrid = TRUE,
                      showline = FALSE,
                      showticklabels = TRUE,
                      tickcolor = 'rgb(127,127,127)',
                      ticks = 'outside',
                      zeroline = FALSE),
         yaxis = list(title = "Temperature (degrees F)",
                      gridcolor = 'rgb(255,255,255)',
                      showgrid = TRUE,
                      showline = FALSE,
                      showticklabels = TRUE,
                      tickcolor = 'rgb(127,127,127)',
                      ticks = 'outside',
                      zeroline = FALSE))
p

R极简教程-12:交互式绘图_第6张图片
这个图可以显示一个区间度,所以很适合那种随着时间变化有波动发生的数据。

散点图

library(plotly)

p <- plot_ly(data = iris, x = ~Sepal.Length, y = ~Petal.Length,
        marker = list(size = 10,
                       color = 'rgba(255, 182, 193, .9)',
                       line = list(color = 'rgba(152, 0, 0, .8)',
                                   width = 2))) %>%
  layout(title = 'Styled Scatter',
         yaxis = list(zeroline = FALSE),
         xaxis = list(zeroline = FALSE))
p

R极简教程-12:交互式绘图_第7张图片

让我们看一下朴素版的图像:

plot(iris$Sepal.Length,iris$Petal.Length)

R极简教程-12:交互式绘图_第8张图片

认真看就能发现,图像其实是一模一样的,一点区别都没有,只不过前者炫酷很多吧。

在这里我们就展示这么多,但是其实plotly还有其他的很多功能,很耐得住探索。

你可能感兴趣的:(R语言)