[Package] ggraph

Mostly a reorganization of the official documentation, to help my own understanding.
https://ggraph.data-imaginist.com/articles/Edges.html
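The data handed to ggraph consists of two tables, nodes and edges. tidygraph contains many functions for creating new features (columns) on these tables, and ggraph() and create_layout() add some features of their own. All of them can be mapped to grouping, colour and other graphical attributes.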
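A minimal sketch of the two tables and of where derived columns come from, using tidygraph's bundled `bull` notable graph. The `degree` column name is my own choice, not something ggraph requires:

```r
library(ggraph)
library(tidygraph)

# A small undirected graph with 5 nodes and 5 edges
gr <- create_notable('bull') %>%
  mutate(degree = centrality_degree())   # tidygraph adds a derived node column

# The two underlying tables
node_tbl <- tibble::as_tibble(gr, active = 'nodes')
edge_tbl <- tibble::as_tibble(gr, active = 'edges')   # contains `from` and `to`

# create_layout() adds its own columns (x, y, ...) next to the node data
lay <- create_layout(gr, layout = 'stress')
print(names(lay))

# Columns from either source can be mapped to aesthetics
p <- ggraph(lay) +
  geom_edge_link() +
  geom_node_point(aes(colour = degree), size = 4)
```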

1. Layouts

As the layout is a global specification of the spatial position of the nodes it spans all layers in the plot and should thus be defined outside of calls to geoms or stats. In ggraph it is often done as part of the plot initialization using ggraph() — a function equivalent in intent to ggplot().
In addition to being specified during plot creation, the layout can also be computed separately using create_layout(). This function takes the same arguments as ggraph() but returns a layout_ggraph object that can later be used in place of a graph structure in a ggraph() call:

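library(ggraph)
library(tidygraph)

# highschool ships with ggraph; its edges carry a `year` column
graph <- as_tbl_graph(highschool)

# Layout specified during plot initialization
ggraph(graph, layout = 'kk', maxiter = 100) + 
  geom_edge_link(aes(colour = factor(year))) + 
  geom_node_point()

# Layout computed up front with create_layout() and reused
layout <- create_layout(graph, layout = 'eigen')
ggraph(layout) + 
  geom_edge_link(aes(colour = factor(year))) + 
  geom_node_point()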
  1. ggraph supports tbl_graph objects from tidygraph natively. For any other type of object, ggraph will try to coerce it to a tbl_graph automatically. tidygraph provides conversions for most known graph structures in R, so by extension ggraph supports almost any graph data type.

  2. There are a lot of different layouts in ggraph. All layouts from the graphlayouts and igraph packages are available, and ggraph itself also provides some of the more specialised layouts. All in all, ggraph provides well above 20 different layouts to choose from, far more than we can cover in this text.
    If ggraph lacks the needed layout, it is always possible to supply your own layout function that takes a tbl_graph object and returns a data.frame of node positions, or to supply the positions directly by passing a matrix or data.frame to the layout argument.

  3. Sometimes the standard and circular representations of the same layout are used so often that they get different names (e.g. arc diagram and chord diagram). In ggraph they have the same name and only differ in whether or not circular is set to TRUE.
    Not every layout has a meaningful circular representation; in those cases the circular argument is ignored.
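Item 2 above mentions custom layout functions and directly supplied positions. The sketch below tries both on a 6-node ring. `line_layout` is a hypothetical helper of my own, and I give it a `...` argument on the assumption that ggraph may pass extra arguments (such as `circular`) to the function:

```r
library(ggraph)
library(tidygraph)

gr <- create_ring(6)   # 6 nodes joined in a cycle

# Hypothetical custom layout: nodes evenly spaced on a horizontal line.
# It takes the tbl_graph (plus any extra arguments) and returns x/y positions.
line_layout <- function(graph, ...) {
  n <- igraph::gorder(graph)
  data.frame(x = seq_len(n), y = 0)
}
lay_fun <- create_layout(gr, layout = line_layout)

# Positions supplied directly as a data.frame (here: corners of a hexagon)
angle <- seq(0, 2 * pi, length.out = 7)[-7]
pos <- data.frame(x = cos(angle), y = sin(angle))
lay_df <- create_layout(gr, layout = pos)

p <- ggraph(lay_df) + geom_edge_link() + geom_node_point(size = 3)
```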

# An arc diagram
ggraph(graph, layout = 'linear') + 
  geom_edge_arc(aes(colour = factor(year)))
fig 1
# A chord diagram
ggraph(graph, layout = 'linear', circular = TRUE) + 
  geom_edge_arc(aes(colour = factor(year))) + 
  coord_fixed()
fig 2
graph <- tbl_graph(flare$vertices, flare$edges)
# An icicle plot
ggraph(graph, 'partition') + 
  geom_node_tile(aes(fill = depth), size = 0.25)
fig 3
# A sunburst plot
ggraph(graph, 'partition', circular = TRUE) + 
  geom_node_arc_bar(aes(fill = depth), size = 0.25) + 
  coord_fixed()
fig 4
  4. There is no such thing as “the best layout algorithm”, as algorithms have been optimized for different scenarios. Comparing several on the same graph is a practical way to choose:
graph <- as_tbl_graph(highschool) %>% 
  mutate(degree = centrality_degree())
lapply(c('stress', 'fr', 'lgl', 'graphopt'), function(layout) {
  ggraph(graph, layout = layout) + 
    geom_edge_link(aes(colour = factor(year)), show.legend = FALSE) +
    geom_node_point() + 
    labs(caption = paste0('Layout: ', layout))
})

5. Hive plots
A hive plot, while still technically a node-edge diagram, is a bit different from the rest, as it positions nodes using information pertaining to the nodes themselves rather than the connection information in the graph.
They are less common, though, so using one will often require some additional explanation.

graph <- graph %>% 
  mutate(friends = ifelse(
    centrality_degree(mode = 'in') < 5, 'few',
    ifelse(centrality_degree(mode = 'in') >= 15, 'many', 'medium')
  ))
ggraph(graph, 'hive', axis = friends, sort.by = degree) + 
  geom_edge_hive(aes(colour = factor(year))) + 
  geom_axis_hive(aes(colour = friends), size = 2, label = FALSE) + 
  coord_fixed()
fig 5

6. Hierarchical layouts
Trees and hierarchies are an important subset of graph structures, and ggraph provides a range of layouts optimized for their visual representation. Some of these use enclosure and position rather than edges to communicate relations (e.g. treemaps and circle packing). Still, these layouts can just as well be used for drawing edges if you wish to:

graph <- tbl_graph(flare$vertices, flare$edges)
set.seed(1)
ggraph(graph, 'circlepack', weight = size) + 
  geom_node_circle(aes(fill = depth), size = 0.25, n = 50) + 
  coord_fixed()
fig 6
set.seed(1)
ggraph(graph, 'circlepack', weight = size) + 
  geom_edge_link() + 
  geom_node_point(aes(colour = depth)) +
  coord_fixed()
fig 7
ggraph(graph, 'treemap', weight = size) + 
  geom_node_tile(aes(fill = depth), size = 0.25)
fig 8
ggraph(graph, 'treemap', weight = size) + 
  geom_edge_link() + 
  geom_node_point(aes(colour = depth))
fig 9

The most recognized tree plot is probably the dendrogram, though. Unless stated otherwise, the height of each node is calculated based on the distance to its farthest sibling (the tree layout, on the other hand, puts all nodes at a certain depth at the same level):

ggraph(graph, 'tree') + 
  geom_edge_diagonal()
fig 10

The height of each branch point can be set to a variable — e.g. the height provided by hclust and dendrogram objects:

dendrogram <- hclust(dist(iris[, 1:4]))
ggraph(dendrogram, 'dendrogram', height = height) + 
  geom_edge_elbow()
fig 11

Dendrograms are one of the layouts that are amenable to circular transformation, which can be effective in giving more space to the leaves of the tree at the expense of the space given to the root:

ggraph(dendrogram, 'dendrogram', circular = TRUE) + 
  geom_edge_elbow() + 
  coord_fixed()
fig 12

A type of tree especially common in phylogeny is the unrooted tree, where no node is considered the root. A dendrogram layout will often not be faithful to such a tree, as it implicitly positions one node at the root. To avoid that, use the unrooted layout instead:

tree <- create_tree(100, 2, directed = FALSE) %>% 
  activate(edges) %>% 
  mutate(length = runif(n()))
ggraph(tree, 'unrooted', length = length) + 
  geom_edge_link()
fig 13
7. Focal layouts / Matrix layouts / Fabric layouts

2. Nodes

geom_node_*()
All ggraph geoms get a filter aesthetic that allows you to quickly filter the input data. The use of this can be illustrated when plotting a tree:

gr <- tbl_graph(flare$vertices, flare$edges)   # flare hierarchy bundled with ggraph
ggraph(gr, layout = 'dendrogram', circular = TRUE) + 
  geom_edge_diagonal() + 
  geom_node_point(aes(filter = leaf)) + 
  coord_fixed()
fig 14

In the above plot only the terminal nodes are drawn by filtering on the logical leaf column provided by the dendrogram layout.
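The filter aesthetic works with any logical expression over the node columns, not just the layout's own `leaf` column. A minimal sketch on the highschool graph; the `in_degree` column and the threshold of 10 are arbitrary choices for illustration:

```r
library(ggraph)
library(tidygraph)

# Compute in-degree as a node column (name chosen for this example)
gr_hs <- as_tbl_graph(highschool) %>%
  mutate(in_degree = centrality_degree(mode = 'in'))

# Draw all edges, but only the nodes with at least 10 incoming edges;
# `filter` must evaluate to a logical vector over the node rows
p <- ggraph(gr_hs, layout = 'stress') +
  geom_edge_link(alpha = 0.2) +
  geom_node_point(aes(filter = in_degree >= 10), colour = 'red', size = 3)
```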

geom_node_tile() and geom_node_range() are the ggraph counterparts to ggplot2's geom_tile() and geom_linerange(), while geom_node_circle() and geom_node_arc_bar() map to ggforce's geom_circle() and geom_arc_bar(). Common to all of them is that the spatial dimensions of the geoms (e.g. radius, width, and height) are precalculated by their intended layouts and used as defaults by the geoms:

ggraph(gr, layout = 'treemap', weight = size) + 
  geom_node_tile(aes(fill = depth))
fig 15
l <- ggraph(gr, layout = 'partition', circular = TRUE)
l + geom_node_arc_bar(aes(fill = depth)) + 
  coord_fixed()
fig 16
l + geom_edge_diagonal() + 
  geom_node_point(aes(colour = depth)) + 
  coord_fixed()

fig 17
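To see what "precalculated by the layout" means, one can inspect the treemap layout directly. This sketch assumes the layout exposes the columns `x`, `y`, `width` and `height` that geom_node_tile() defaults to:

```r
library(ggraph)
library(tidygraph)

gr <- tbl_graph(flare$vertices, flare$edges)

# The treemap layout precomputes each tile's centre and size
lay <- create_layout(gr, layout = 'treemap', weight = size)
print(head(as.data.frame(lay)[, c('x', 'y', 'width', 'height', 'depth')]))

# geom_node_tile() picks up x, y, width and height by default,
# so only the fill needs to be mapped
p <- ggraph(lay) + geom_node_tile(aes(fill = depth))
```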

geom_node_text() and geom_node_label() differ from their ggplot2 counterparts in one respect: both have a repel argument that, when set to TRUE, uses the repel functionality of the ggrepel package to avoid overlapping text. There is also geom_node_voronoi(), which plots nodes as cells of a Voronoi tessellation. This is useful e.g. for showing the dominance of certain node types in an area, since the cells never overlap:

graph <- create_notable('meredith') %>% 
  mutate(group = sample(c('A', 'B'), n(), TRUE))

ggraph(graph, 'stress') + 
  geom_node_voronoi(aes(fill = group), max.radius = 1) + 
  geom_node_point() + 
  geom_edge_link() + 
  coord_fixed()
fig 18
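The repel argument mentioned above can be tried on the small bull graph, reusing the names from the "Decorating edges" example further down. This assumes the ggrepel package is installed:

```r
library(ggraph)
library(tidygraph)

gr <- create_notable('bull') %>%
  mutate(name = c('Thomas', 'Bob', 'Hadley', 'Winston', 'Baptiste'))

# repel = TRUE hands label placement to ggrepel,
# which pushes overlapping labels apart
p <- ggraph(gr, layout = 'stress') +
  geom_edge_link() +
  geom_node_point(size = 3) +
  geom_node_text(aes(label = name), repel = TRUE)
```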

3. Edges

Take a look at the format of the example data used in this section:

library(ggraph)
library(tidygraph)
library(purrr)
library(rlang)

set_graph_style(plot_margin = margin(1,1,1,1))
hierarchy <- as_tbl_graph(hclust(dist(iris[, 1:4]))) %>% 
  mutate(Class = map_bfs_back_chr(node_is_root(), .f = function(node, path, ...) {
    if (leaf[node]) {
      as.character(iris$Species[as.integer(label[node])])
    } else {
      species <- unique(unlist(path$result))
      if (length(species) == 1) {
        species
      } else {
        NA_character_
      }
    }
  }))

hairball <- as_tbl_graph(highschool) %>% 
  mutate(
    year_pop = map_local(mode = 'in', .f = function(neighborhood, ...) {
      neighborhood %E>% pull(year) %>% table() %>% sort(decreasing = TRUE)
    }),
    pop_devel = map_chr(year_pop, function(pop) {
      if (length(pop) == 0 || length(unique(pop)) == 1) return('unchanged')
      switch(names(pop)[which.max(pop)],
             '1957' = 'decreased',
             '1958' = 'increased')
    }),
    popularity = map_dbl(year_pop, ~ .[1]) %|% 0
  ) %>% 
  activate(edges) %>% 
  mutate(year = as.character(year))

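link: straight lines
fan: multiple edges between the same pair of nodes are drawn as curves fanning out
parallel: multiple edges between the same pair of nodes are drawn as parallel lines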

ggraph(hairball, layout = 'stress') + 
  geom_edge_fan(aes(colour = year))
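The difference between fan and parallel only shows up when there are repeated edges, so here is a sketch on a hypothetical three-node graph with a duplicated a -> b edge:

```r
library(ggraph)
library(tidygraph)

# Hypothetical toy graph: two edges a -> b (a multi-edge) and one edge b -> c
multi <- tbl_graph(
  nodes = data.frame(name = c('a', 'b', 'c')),
  edges = data.frame(from = c(1, 1, 2), to = c(2, 2, 3))
)

# fan: the duplicated a -> b edges are drawn as curves that spread apart
p_fan <- ggraph(multi, layout = 'stress') +
  geom_edge_fan() +
  geom_node_point(size = 4)

# parallel: the same duplicated edges are drawn as offset straight lines
p_parallel <- ggraph(multi, layout = 'stress') +
  geom_edge_parallel() +
  geom_node_point(size = 4)
```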

loop: self-loops (edges from a node to itself)

# let's make some of the students love themselves
# bind_edges() appends new rows to the edge table
loopy_hairball <- hairball %>% 
  bind_edges(tibble::tibble(from = 1:5, to = 1:5, year = rep('1957', 5)))
ggraph(loopy_hairball, layout = 'stress') + 
  geom_edge_link(aes(colour = year), alpha = 0.25) + 
  geom_edge_loop(aes(colour = year))

geom_edge_density() adds shading to the plot based on the density of edges in each area:

ggraph(hairball, layout = 'stress') + 
  geom_edge_density(aes(fill = year)) + 
  geom_edge_link(alpha = 0.25)

Arcs
Using curved edges in a standard "hairball" visualization is a poor choice: it increases overplotting and decreases interpretability for virtually no gain (unless complexity is your thing). That doesn't mean arcs have no use in graph visualizations. Linear and circular layouts can benefit greatly from them, and geom_edge_arc() is provided precisely for this scenario:

ggraph(hairball, layout = 'linear') + 
  geom_edge_arc(aes(colour = year))

Arcs behave differently in circular layouts as they will always bend towards the center no matter the direction of the edge (the same thing can be achieved in a linear layout by setting fold = TRUE).

ggraph(hairball, layout = 'linear', circular = TRUE) + 
  geom_edge_arc(aes(colour = year)) + 
  coord_fixed()
fig 19
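The fold = TRUE option mentioned above can be sketched on the same highschool data, rebuilt here so the block stands alone:

```r
library(ggraph)
library(tidygraph)

gr <- as_tbl_graph(highschool) %>%
  activate(edges) %>%
  mutate(year = as.character(year))

# In a linear layout, fold = TRUE puts every arc on the same side of the axis,
# regardless of edge direction (like arcs bending to the centre in a circle)
p_fold <- ggraph(gr, layout = 'linear') +
  geom_edge_arc(aes(colour = year), fold = TRUE)
```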

Elbow: right-angled edges
Diagonals: smoothly curved edges
Bends: somewhere between the two (an elbow with a rounded corner)

ggraph(hierarchy, layout = 'dendrogram', height = height) + 
  geom_edge_bend()
fig 20
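To compare the three shapes side by side, the sketch below draws the same small binary tree (a hypothetical 7-node example) with each geom:

```r
library(ggraph)
library(tidygraph)

tree <- create_tree(7, 2)   # binary tree: 1 root, 2 inner nodes, 4 leaves

# One layout, three edge shapes
shapes <- list(
  elbow    = ggraph(tree, 'dendrogram') + geom_edge_elbow(),     # right angles
  diagonal = ggraph(tree, 'dendrogram') + geom_edge_diagonal(),  # smooth curve
  bend     = ggraph(tree, 'dendrogram') + geom_edge_bend()       # rounded corner
)
```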

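Hive: edges drawn between the axes of a hive plot (geom_edge_hive())
Span: edges drawn as vertical segments, mainly for fabric layouts (geom_edge_span())
Point and tile: no line is drawn at all; the edge is shown as a glyph where its two nodes intersect, as used by matrix layouts (geom_edge_point(), geom_edge_tile())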

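Three variants
Most edge geoms come in three variants: base, 2 and 0.
Base: the edge is drawn as a path of many points, and an index running from 0 to 1 along the edge is computed, which can be used e.g. to show direction: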

ggraph(hairball, layout = 'linear') + 
  geom_edge_arc(aes(colour = year, alpha = after_stat(index))) + 
  scale_edge_alpha('Edge direction', guide = 'edge_direction')

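2-variant: like the base variant, but node attributes can be used on the edges (prefixed with node.), interpolated between the two end nodes: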

ggraph(hierarchy, layout = 'dendrogram', height = height) + 
  geom_edge_elbow2(aes(colour = node.Class))

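0-variant: drawn directly with ggplot2 geoms and no interpolation, so it is the fastest but also the most limited (no index, no node attributes, no start_cap/end_cap).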
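A minimal 0-variant example, assuming the highschool data rebuilt as in the setup above; geom_edge_link0() is the 0-variant of geom_edge_link():

```r
library(ggraph)
library(tidygraph)

gr <- as_tbl_graph(highschool) %>%
  activate(edges) %>%
  mutate(year = as.character(year))

# 0-variant: plain segments with no interpolation along the edge,
# intended as the quick option when the graph is large
p0 <- ggraph(gr, layout = 'stress') +
  geom_edge_link0(aes(colour = year)) +
  geom_node_point()
```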

Edge strength
Many of the edge geoms take a strength argument that denotes their deviation from a straight line. Setting strength = 0 always results in a straight line, while strength = 1 is the default look.

small_tree <- create_tree(5, 2)

ggraph(small_tree, 'dendrogram') + 
  geom_edge_elbow(strength = 0.75)
fig 21
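To see the effect of strength, one can draw the same tree at a few values. The three values chosen here are arbitrary:

```r
library(ggraph)
library(tidygraph)

small_tree <- create_tree(5, 2)

# strength = 0 gives straight lines and 1 the default elbow;
# intermediate values deviate only partly from the straight line
plots <- lapply(c(0, 0.5, 1), function(s) {
  ggraph(small_tree, 'dendrogram') +
    geom_edge_elbow(strength = s) +
    labs(caption = paste('strength =', s))
})
```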

Decorating edges

simple <- create_notable('bull') %>% 
  mutate(name = c('Thomas', 'Bob', 'Hadley', 'Winston', 'Baptiste')) %>% 
  activate(edges) %>% 
  mutate(type = sample(c('friend', 'foe'), 5, TRUE))

Arrows and labels
Arrowheads are added with the arrow argument, but when nodes are drawn as large points the arrowheads easily end up hidden underneath them. The solution is the start_cap and end_cap aesthetics available in the base and 2-variant edge geoms (sorry, 0-variant). They start and stop the edge drawing at an absolute distance from the terminal nodes. Using the circle(), square(), ellipsis(), and rectangle() helpers, you get a lot of control over how edges are capped at either end. This works for any edge, curved or not:

ggraph(simple, layout = 'linear', circular = TRUE) + 
  geom_edge_arc(arrow = arrow(length = unit(4, 'mm')), 
                start_cap = circle(3, 'mm'),
                end_cap = circle(3, 'mm')) + 
  geom_node_point(size = 5) + 
  coord_fixed()
fig 22
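The same idea with square() caps, which suit square node glyphs better than circles. This is a sketch: the 5 mm cap and node size 6 are chosen by eye, not computed to match exactly:

```r
library(ggraph)
library(tidygraph)

simple <- create_notable('bull') %>%
  mutate(name = c('Thomas', 'Bob', 'Hadley', 'Winston', 'Baptiste'))

# square() caps clip each edge at a 5 mm square around its end nodes,
# roughly matching the filled-square node glyphs (shape 15)
p <- ggraph(simple, layout = 'linear', circular = TRUE) +
  geom_edge_link(arrow = arrow(length = unit(3, 'mm')),
                 start_cap = square(5, 'mm'),
                 end_cap = square(5, 'mm')) +
  geom_node_point(shape = 15, size = 6) +
  coord_fixed()
```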

When plotting node labels, you often want to avoid incoming and outgoing edges overlapping the labels. ggraph provides a helper, label_rect(), that calculates the bounding rectangle of a label so that edges can be capped based on it:

ggraph(simple, layout = 'graphopt') + 
  geom_edge_link(aes(start_cap = label_rect(node1.name),
                     end_cap = label_rect(node2.name)), 
                 arrow = arrow(length = unit(4, 'mm'))) + 
  geom_node_text(aes(label = name))
fig 23

When labelling edges you usually want the labels to run along the edges, but providing a fixed angle only works at one specific aspect ratio. Instead, ggraph can calculate the correct angle dynamically (angle_calc = 'along') so the labels always run along the edge. It can also offset the label from the edge by an absolute length (label_dodge):

ggraph(simple, layout = 'graphopt') + 
  geom_edge_link(aes(label = type), 
                 angle_calc = 'along',
                 label_dodge = unit(2.5, 'mm'),
                 arrow = arrow(length = unit(4, 'mm')), 
                 end_cap = circle(3, 'mm')) + 
  geom_node_point(size = 5)
fig 24

Connections
The estranged cousin of edges are connections. While edges show the relational nature of the nodes in the graph structure, connections connect nodes that are not connected in the graph. This is done by finding the shortest path between the two nodes. Currently the only connection geom available is [geom_conn_bundle()](https://ggraph.data-imaginist.com/reference/geom_conn_bundle.html) that implements the hierarchical edge bundling technique:

flaregraph <- tbl_graph(flare$vertices, flare$edges)
from <- match(flare$imports$from, flare$vertices$name)
to <- match(flare$imports$to, flare$vertices$name)
ggraph(flaregraph, layout = 'dendrogram', circular = TRUE) + 
  geom_conn_bundle(data = get_con(from = from, to = to), alpha = 0.1) + 
  coord_fixed()
fig 25
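A smaller sketch of the same technique on a hypothetical 15-node binary tree, where nodes 8 to 15 are the leaves. The leaf pairs passed to get_con() are arbitrary and are not joined by edges in the tree:

```r
library(ggraph)
library(tidygraph)

# Hypothetical hierarchy: binary tree with 15 nodes; nodes 8-15 are the leaves
tree <- create_tree(15, 2)

# get_con() routes each from/to pair along the shortest path through the tree;
# geom_conn_bundle() then draws the bundled curves
p <- ggraph(tree, layout = 'dendrogram', circular = TRUE) +
  geom_edge_diagonal(alpha = 0.2) +
  geom_conn_bundle(data = get_con(from = c(8, 9, 10), to = c(15, 14, 12)),
                   edge_colour = 'red') +
  coord_fixed()
```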
