R Data Visualization: ggplot Scales (Part 4), Color Schemes

10. Color

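Besides the position scales covered earlier, the color aesthetics are the ones you will adjust most often.

Choosing colors is an art in its own right. Having no artistic talent myself, I just pick whichever palette looks good and make do with it.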

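10.1 ColorBrewer palettes

ColorBrewer provides three types of palettes: sequential, diverging and qualitative. Each type contains several color schemes.

Three scale functions apply them

# for discrete data
scale_*_brewer()
# for continuous data
scale_*_distiller()
# for binned data
scale_*_fermenter()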

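Here * stands for colour (outline color) or fill (fill color).

The palette argument accepts any of the following strings

Diverging (two hues diverging from a neutral midpoint)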

BrBG, PiYG, PRGn, PuOr, RdBu, RdGy, RdYlBu, RdYlGn, Spectral

Qualitative (distinct hues for categories)

Accent, Dark2, Paired, Pastel1, Pastel2, Set1, Set2, Set3

Sequential (light to dark in one hue or closely related hues)

Blues, BuGn, BuPu, GnBu, Greens, Greys, Oranges, OrRd, PuBu, PuBuGn, PuRd, Purples, RdPu, Reds, YlGn, YlGnBu, YlOrBr, YlOrRd
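
To check these names from within R, the RColorBrewer package (an assumption here: the post itself only uses it later, in the discrete section) exposes the same palettes. A minimal sketch that counts the palettes per type and extracts a few colors:

```r
library(RColorBrewer)

# brewer.pal.info lists every palette with its category and maximum size
info <- brewer.pal.info
palettes_by_type <- split(rownames(info), info$category)

# Number of palettes per category:
# div (diverging), qual (qualitative), seq (sequential)
n_by_type <- lengths(palettes_by_type)
print(n_by_type)

# Extract 5 colors from the sequential "Greens" palette as hex strings
greens5 <- brewer.pal(5, "Greens")
print(greens5)
```

The counts should match the three lists above, and the maxcolors column of brewer.pal.info tells you how many levels a palette can cover.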

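Example

Take the following sample of the diamonds data (the examples need ggplot2, and cowplot for plot_grid())

library(ggplot2); library(cowplot)
dsamp <- diamonds[sample(nrow(diamonds), 1000), ]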

We can map a variable to colour

(d <- ggplot(dsamp, aes(carat, price)) +
    geom_point(aes(colour = clarity)))

But I don't find this color scheme very attractive and want a different one, so let's try a few

p1 <- d + scale_colour_brewer()

p2 <- d + scale_colour_brewer(palette = "Greens")

p3 <- d + scale_colour_brewer(palette = "Set1")

p4 <- d + scale_colour_brewer(palette = "Spectral")

plot_grid(p1, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)

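Hmm... judging by this, panel C looks best and panel D is fine too, while the sequential palettes (A and B) fall a bit short.

Continuous data is handled in much the same way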

d1 <- ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(colour = depth))

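# build on d1 (continuous depth), not on the discrete plot d
d2 <- d1 + scale_colour_distiller(palette = "Greens")

# "Set1" is a qualitative palette, so ggplot2 warns when it is used on a continuous scale
d3 <- d1 + scale_colour_distiller(palette = "Set1")

d4 <- d1 + scale_colour_distiller(palette = "Spectral")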

plot_grid(d1, d2, d3, d4, labels = LETTERS[1:4], nrow = 2)
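
One detail worth knowing: scale_*_distiller() reverses the palette by default (direction = -1), unlike scale_*_brewer(). The sketch below uses its own small sample (demo and the seed are illustrative names and values, not from the original) to compare both directions:

```r
library(ggplot2)

set.seed(42)
demo <- diamonds[sample(nrow(diamonds), 200), ]
base <- ggplot(demo, aes(carat, price)) +
  geom_point(aes(colour = depth))

# Default direction = -1: the ColorBrewer order is reversed
p_default <- base + scale_colour_distiller(palette = "Greens")

# direction = 1 keeps the ColorBrewer order (light for low, dark for high)
p_forward <- base + scale_colour_distiller(palette = "Greens", direction = 1)

# Colors actually assigned to the points by each scale
cols_default <- layer_data(p_default)$colour
cols_forward <- layer_data(p_forward)$colour
```

If the dark end of a palette should mark the large values, direction = 1 is usually what you want.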

Coloring binned data

ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(colour=depth)) +
  scale_colour_fermenter(palette = "Spectral")

Binning the x axis as well, while the binned color scale assigns one color per price bin

ggplot(dsamp, aes(carat, price, colour=price)) +
  geom_point() +
  scale_x_binned() +
  scale_colour_fermenter(palette = "Spectral")

More color schemes can be browsed on the ColorBrewer website:

http://colorbrewer2.org

The palettes look good and are easy to obtain, so I recommend them.

There is also another site
https://www.webdesignrankings.com/resources/lolcolors/

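Its color schemes are decent too, but each one has only 4 colors.

10.2 Gradients

By the number of colors in the gradient, the gradient functions fall into three kinds:

scale_*_gradient: two-color gradient; the low and high arguments set the colors at the two ends
scale_*_gradient2: three-color (diverging) gradient with low, mid and high arguments; low and high work as above, mid is the color at the midpoint (white by default), and the midpoint argument sets the data value where that midpoint sits (0 by default)
scale_*_gradientn: multi-color gradient; pass a vector of colors to the colours argument. With no other arguments the colors are spaced evenly across the data range; for uneven spacing, give the values argument a position between 0 and 1 for each color.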
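
The difference between mid and midpoint, and the role of values, can be sketched as follows. The data frame demo and the chosen colors are illustrative assumptions; scales::rescale() is used only to convert data values into positions between 0 and 1:

```r
library(ggplot2)

set.seed(1)
demo <- data.frame(x = runif(50), y = runif(50), z = rnorm(50))

# mid is a color, midpoint is the data value that receives that color
p_mid <- ggplot(demo, aes(x, y, colour = z)) +
  geom_point() +
  scale_colour_gradient2(low = "blue", mid = "white", high = "red",
                         midpoint = 0)

# values places each color at a position in [0, 1];
# here "white" is pinned to z = 0 even if 0 is not the middle of the range
positions <- scales::rescale(c(min(demo$z), 0, max(demo$z)))
p_vals <- ggplot(demo, aes(x, y, colour = z)) +
  geom_point() +
  scale_colour_gradientn(colours = c("navy", "white", "firebrick"),
                         values = positions)

cols_mid <- layer_data(p_mid)$colour
cols_vals <- layer_data(p_vals)$colour
```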

Example

For the following data

df <- data.frame(
  x = runif(100),
  y = runif(100),
  z1 = rnorm(100),
  z2 = abs(rnorm(100))
)

df_na <- data.frame(
  value = seq(1, 20),
  x = runif(20),
  y = runif(20),
  z1 = c(rep(NA, 10), rnorm(10))
)

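The default color scheme is a two-color gradient from dark blue (low values) to light blue (high values)

# default two-color gradient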
p1 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z2))

# change the two end colors
p2 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z2)) +
  scale_colour_gradient(low = "white", high = "black")

# three-color gradient
p3 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z1)) +
  scale_colour_gradient2()

# change the end colors and move the midpoint
p4 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z1)) +
  scale_colour_gradient2(low = "green", high = "blue", midpoint = 1)

plot_grid(p1, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)

Custom color ranges, and handling NA values

# custom color range: use a function to generate 10 continuous colors
p5 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z1)) +
  scale_colour_gradientn(colours = topo.colors(10))

p6 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z1)) +
  scale_colour_gradientn(colours = rainbow(10))

# na.value = NA leaves missing values undrawn
p7 <- ggplot(df_na, aes(x = value, y)) +
  geom_bar(aes(fill = z1), stat = "identity") +
  scale_fill_gradient(low = "yellow", high = "red", na.value = NA)

p8 <- ggplot(df_na, aes(x, y)) +
  geom_point(aes(colour = z1)) +
  scale_colour_gradient(low = "yellow", high = "red", na.value = NA)

plot_grid(p5, p6, p7, p8, labels = LETTERS[1:4], nrow = 2)

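10.3 Grey scale

Use start and end to control the range of grey levels (0 is black, 1 is white; the defaults are start = 0.2 and end = 0.8)

Example

Default grey scale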
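
To see which greys a given start and end produce without drawing a plot, the palette function from the scales package can be called directly. This is a minimal sketch, assuming scales::grey_pal() is the palette behind scale_colour_grey():

```r
library(scales)

# Three greys over the default range used by scale_colour_grey()
default_greys <- grey_pal(start = 0.2, end = 0.8)(3)
print(default_greys)

# A narrower, darker range: from black up to a medium grey
dark_greys <- grey_pal(start = 0, end = 0.5)(3)
print(dark_greys)
```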

p <- ggplot(mtcars, aes(mpg, wt)) + geom_point(aes(colour = factor(cyl)))

p + scale_colour_grey()

Changing the settings

# set the grey level at the end of the range
p1 <- p + scale_colour_grey(end = 0.5)
# change the theme
p2 <- p + scale_colour_grey() + theme_bw()

# missing values
miss <- factor(sample(c(NA, 1:5), nrow(mtcars), replace = TRUE))

p3 <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(colour = miss)) +
  scale_colour_grey()
# change the color of missing values
p4 <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(colour = miss)) +
  scale_colour_grey(na.value = "green")


plot_grid(p1, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)

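10.4 Binned gradients

Binned gradients are similar to gradients, except that the continuous data is first cut into bins and each bin gets one color. There are again three functions, for two-color, three-color and multi-color gradients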

scale_*_steps()
scale_*_steps2()
scale_*_stepsn()

They are used much like the gradient scales

df <- data.frame(
  x = runif(100),
  y = runif(100),
  z1 = rnorm(100)
)

p1 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z1))

p2 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z1)) +
  scale_colour_steps(low = "skyblue", high = "blue")

p3 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z1)) +
  scale_colour_steps2()

p4 <- ggplot(df, aes(x, y)) +
  geom_point(aes(colour = z1)) +
  scale_colour_stepsn(colours = terrain.colors(10))

plot_grid(p1, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)
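
The bin boundaries can also be set explicitly through the breaks argument. A small sketch (demo, the seed and the break values are illustrative assumptions) showing that three breaks yield at most four colors:

```r
library(ggplot2)

set.seed(1)
demo <- data.frame(x = runif(100), y = runif(100), z = rnorm(100))

# Three breaks cut the range of z into at most four bins
p_bins <- ggplot(demo, aes(x, y)) +
  geom_point(aes(colour = z)) +
  scale_colour_steps(low = "skyblue", high = "blue", breaks = c(-1, 0, 1))

# Count the distinct colors actually assigned to the points
n_colours <- length(unique(layer_data(p_bins)$colour))
print(n_colours)
```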

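10.5 Color wheel

This scale maps discrete data to colors spaced evenly around the color wheel.

The color space is called HCL and has three components:

Hue: a value between 0 and 360 (an angle) that identifies the color, such as red, yellow or blue
Luminance: how light the color is; 0 is black and 100 is white
Chroma: the purity of the color; a chroma of 0 is grey, and the maximum depends on the combination of hue and luminance

The scale function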

scale_*_hue(
  ...,
  h = c(0, 360) + 15,
  c = 100,
  l = 65,
  ...
)

has matching arguments h, c and l for setting these values.

Example
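
To make the defaults concrete, the sketch below rebuilds evenly spaced HCL colors with base R's hcl(). The helper hue_colours() is hypothetical (it is not part of ggplot2) and only covers the default full-circle hue range:

```r
# Hypothetical helper: n hues evenly spaced around the full color wheel,
# starting at 15 degrees, with fixed chroma and luminance
hue_colours <- function(n, h_start = 15, chroma = 100, luminance = 65) {
  hues <- h_start + 360 * (seq_len(n) - 1) / n
  grDevices::hcl(h = hues, c = chroma, l = luminance)
}

three <- hue_colours(3)
print(three)
# → "#F8766D" "#00BA38" "#619CFF"

# Lower luminance and chroma give darker, more muted colors
muted3 <- hue_colours(3, chroma = 30, luminance = 40)
print(muted3)
```

The three default colors are the familiar red, green and blue that ggplot2 assigns to a three-level factor.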

dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
d <- ggplot(dsamp, aes(carat, price)) + 
  geom_point(aes(colour = clarity))

p1 <- d + scale_colour_hue()

p2 <- d + scale_colour_hue("Clarity")

p3 <- d + scale_colour_hue(expression(clarity[beta]))


d + scale_colour_hue(l = 40, c = 30)

plot_grid(d, p1, p2, p3, labels = LETTERS[1:4], nrow = 2)

Adjusting luminance and chroma

p1 <- d + scale_colour_hue(l = 40, c = 30)

p2 <- d + scale_colour_hue(l = 70, c = 30)

p3 <- d + scale_colour_hue(l = 70, c = 150)

p4 <- d + scale_colour_hue(l = 80, c = 150)

plot_grid(p1, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)

Setting the hue range

p1 <- d + scale_colour_hue(h = c(0, 90))

p2 <- d + scale_colour_hue(h = c(90, 180))

p3 <- d + scale_colour_hue(h = c(180, 270))

p4 <- d + scale_colour_hue(h = c(270, 360))

plot_grid(p1, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)

Setting transparency

d <- ggplot(dsamp, aes(carat, price, colour = clarity))

p2 <- d + geom_point(alpha = 0.9)

p3 <- d + geom_point(alpha = 0.5)

p4 <- d + geom_point(alpha = 0.2)

plot_grid(p2, p3, p4, labels = LETTERS[1:3], nrow = 3)

The plot looks much better once transparency is set

Giving NA values a special color

miss <- factor(sample(c(NA, 1:5), nrow(mtcars), replace = TRUE))

p1 <- ggplot(mtcars, aes(mpg, wt)) + geom_point(aes(colour = miss))

p2 <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point(aes(colour = miss)) +
  scale_colour_hue(na.value = "black")

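plot_grid(p1, p2, labels = LETTERS[1:2], nrow = 2)

10.6 Viridis palettes

The color maps provided by viridis are designed so that differences remain easy to see both in color and in black and white.

They are also designed to be readable by viewers with color blindness

Example

For the following data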
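
The black-and-white claim can be checked roughly by converting a few viridis colors to a brightness value. This is only a sketch: the luma weights (0.299, 0.587, 0.114) are a common approximation of perceived brightness, not something viridis itself defines:

```r
library(scales)

# Five colors from the default viridis palette
vir <- viridis_pal()(5)

# Approximate brightness of each color from its RGB components
rgb_vals <- col2rgb(vir)                       # 3 x 5 matrix (red, green, blue)
luma <- as.vector(c(0.299, 0.587, 0.114) %*% rgb_vals)
print(round(luma, 1))

# Brightness rises steadily from the first to the last color,
# so the order survives a greyscale print
increasing <- all(diff(luma) > 0)
print(increasing)
```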

> dsamp <- diamonds[sample(nrow(diamonds), 1000), ]
> dsamp
# A tibble: 1,000 x 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1  1.01 Very Good E     SI1      62.1    56  5461  6.43  6.48  4.01
 2  0.91 Premium   F     SI2      61.7    58  3609  6.22  6.29  3.86
 3  1.26 Very Good H     SI1      60.6    60  6546  6.97  7     4.23
 4  1.31 Premium   H     VS2      61.4    59  7550  7.01  6.96  4.29
 5  1.53 Very Good D     SI1      59.2    57 11873  7.52  7.55  4.46
 6  1.5  Ideal     H     SI2      62.4    58  8471  7.29  7.26  4.54
 7  1.24 Premium   F     VS1      61.5    57  9333  6.97  6.86  4.25
 8  1.14 Premium   H     SI1      59.3    59  5458  6.8   6.76  4.02
 9  0.62 Very Good E     SI2      63.4    56  1426  5.4   5.45  3.44
10  0.72 Premium   G     VS2      62.9    57  2795  5.73  5.65  3.58
# … with 990 more rows

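By default, colors are assigned following the order of the factor levels (clarity is an ordered factor, so ggplot2 already picks a viridis palette for it)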

ggplot(dsamp, aes(carat, price)) +
  geom_point(aes(colour = clarity))

The scale_*_viridis_d() functions color discrete data; the option argument selects a different palette

# viridis_d is for discrete data
txsamp <- subset(txhousing, city %in%
                   c("Houston", "Fort Worth", "San Antonio", "Dallas", "Austin"))
d <- ggplot(data = txsamp, aes(x = sales, y = median)) +
    geom_point(aes(colour = city))

# set the palette and the legend title
p1 <- d + scale_colour_viridis_d("City\nCenter")

# choose a palette; see ?scales::viridis_pal for more details
p2 <- d + scale_colour_viridis_d(option = "plasma")

p3 <- d + scale_colour_viridis_d(option = "inferno")

plot_grid(d, p1, p2, p3, labels = LETTERS[1:4], nrow = 2)

Setting the fill color

# set the fill color
p <- ggplot(txsamp, aes(x = median, fill = city)) +
  geom_histogram(position = "dodge", binwidth = 15000)
p1 <- p + scale_fill_viridis_d()

# reverse the color order
p2 <- p + scale_fill_viridis_d(direction = -1)

plot_grid(p1, p2, labels = LETTERS[1:2], nrow = 2)

Coloring continuous and binned data

# continuous data
v <- ggplot(faithfuld) +
    geom_tile(aes(waiting, eruptions, fill = density))

p2 <- v + scale_fill_viridis_c()

p3 <- v + scale_fill_viridis_c(option = "plasma")
# binned data
p4 <- v + scale_fill_viridis_b()

plot_grid(v, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)

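10.7 Continuous and discrete

The functions above are really all you need. The ones below merely wrap them and offer the same functionality.

Continuous

Color scales for continuous data have in fact all been covered above. Here we use the following two functions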

scale_colour_continuous(
  ...,
  type = getOption("ggplot2.continuous.colour", default = "gradient")
)

scale_fill_continuous(
  ...,
  type = getOption("ggplot2.continuous.fill", default = "gradient")
)

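These two functions choose their colors mainly by reading the options ggplot2.continuous.colour and ggplot2.continuous.fill; the default is a gradient

The type argument can also be set to:

"viridis"
a function that returns a continuous color scale

Example

For the following data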
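
The option mechanism described above can be sketched like this. It assumes a ggplot2 version that reads these options (3.3 or later); the variable names are illustrative. The default scale is resolved when the plot is built, so the option has to be set at that moment:

```r
library(ggplot2)

v <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_tile()

# Temporarily make viridis the default continuous fill scale
old <- options(ggplot2.continuous.fill = "viridis")
fills_option <- layer_data(v)$fill   # builds the plot while the option is set
options(old)                         # restore the previous setting

# The same plot with the viridis scale added explicitly
fills_explicit <- layer_data(v + scale_fill_viridis_c())$fill

# TRUE when the option was picked up as the default scale
same <- identical(fills_option, fills_explicit)
print(same)
```

Setting the option once at the top of a script changes the default for every plot that does not add its own fill scale.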

> faithfuld
# A tibble: 5,625 x 3
   eruptions waiting density
       <dbl>   <dbl>   <dbl>
 1      1.6       43 0.00322
 2      1.65      43 0.00384
 3      1.69      43 0.00444
 4      1.74      43 0.00498
 5      1.79      43 0.00542
 6      1.84      43 0.00574
 7      1.88      43 0.00592
 8      1.93      43 0.00594
 9      1.98      43 0.00581
10      2.03      43 0.00554
# … with 5,615 more rows

We can draw a heat map

v <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_tile()

Changing the color scheme

p1 <- v + scale_fill_continuous(type = "gradient")

p2 <- v + scale_fill_continuous(type = "viridis")

p3 <- v + scale_fill_gradient()

p4 <- v + scale_fill_viridis_c()

plot_grid(p1, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)

As you can see, A is identical to C, and B to D.

Discrete

Like the continuous ones, the discrete functions also choose their colors by reading two options

scale_colour_discrete(
  ...,
  type = getOption("ggplot2.discrete.colour", getOption("ggplot2.discrete.fill"))
)

scale_fill_discrete(
  ...,
  type = getOption("ggplot2.discrete.fill", getOption("ggplot2.discrete.colour"))
)

For example, suppose we want to draw the following plot

ggplot(mpg, aes(cty, colour = factor(class), fill = factor(class))) +
  geom_density(alpha = 0.2)

By default this uses the scale_fill_hue() color scheme

Use scale_*_discrete to change the colors

okabe <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

ggplot(mpg, aes(cty, colour = factor(class), fill = factor(class))) +
  geom_density(alpha = 0.2) +
  scale_color_discrete(type = okabe) +
  scale_fill_discrete(type = okabe)

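The colors can also be changed by setting the ggplot2.discrete.colour or ggplot2.discrete.fill option

# plot_cty() is assumed to be a helper like this one, reproducing the density plot above
plot_cty <- function(var) {
  ggplot(mpg, aes(cty, colour = factor({{ var }}), fill = factor({{ var }}))) +
    geom_density(alpha = 0.2)
}
withr::with_options(list(ggplot2.discrete.fill = okabe), print(plot_cty(class)))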

The default palette can also be chosen by the number of values the color variable takes

discrete_palettes <- list(
  c("skyblue", "orange"),
  RColorBrewer::brewer.pal(3, "Set2"),
  RColorBrewer::brewer.pal(6, "Accent")
)

p1 <- plot_cty(year) + scale_fill_discrete(type = discrete_palettes)
p2 <- plot_cty(drv) + scale_fill_discrete(type = discrete_palettes)
p3 <- plot_cty(fl) + scale_fill_discrete(type = discrete_palettes)
# a function that returns a discrete color scale can also be passed
p4 <- plot_cty(class) + scale_fill_discrete(type = scale_fill_brewer)

plot_grid(p1, p2, p3, p4, labels = LETTERS[1:4], nrow = 2)

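In the example above (panels A-C), the list supplies different palettes depending on the number of distinct values (factor levels) in the discrete data.

With 1-2 levels skyblue and orange are used, with 3 levels the Set2 palette, and with 4-6 levels the Accent palette.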
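
The selection rule can be written out as a small function. pick_palette() is a hypothetical helper that only mimics the behavior described above (take the shortest palette with enough colors); it is not ggplot2's own code:

```r
discrete_palettes <- list(
  c("skyblue", "orange"),
  RColorBrewer::brewer.pal(3, "Set2"),
  RColorBrewer::brewer.pal(6, "Accent")
)

# Hypothetical helper: choose the shortest palette that has at least n colors
pick_palette <- function(palettes, n) {
  sizes <- lengths(palettes)
  candidates <- which(sizes >= n)
  if (length(candidates) == 0) {
    return(NULL)  # no palette is long enough; ggplot2 would use its default
  }
  best <- candidates[which.min(sizes[candidates])]
  palettes[[best]][seq_len(n)]
}

two <- pick_palette(discrete_palettes, 2)    # skyblue and orange
five <- pick_palette(discrete_palettes, 5)   # first 5 colors of Accent
seven <- pick_palette(discrete_palettes, 7)  # NULL: more levels than colors
```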
