1. 引入
融汇贯通之前介绍的内容,来实现一个简单的数据可视化app。首先会在R里进行一些数据探索,然后把这些功能转换为shiny的可交互模式。
本章节会用到下面的包。
library(shiny)
library(vroom)
library(tidyverse)
2. 数据
数据会用到National Electronic Injury Surveillance System (NEISS)的一份数据。记录了长期事故入院的病例数据。
https://github.com/hadley/neiss
本章只会用到2017年的数据,数据大小在10M左右。
dir.create("neiss")
#> Warning in dir.create("neiss"): 'neiss' already exists
download <- function(name) {
url <- "https://github.com/hadley/mastering-shiny/raw/master/neiss/"
download.file(paste0(url, name), paste0("neiss/", name), quiet = TRUE)
}
download("injuries.tsv.gz")
download("population.tsv")
download("products.tsv")
injuries
是长这样的
injuries <- vroom::vroom("neiss/injuries.tsv.gz")
injuries
#> # A tibble: 255,064 × 10
#> trmt_date age sex race body_part diag location prod_code weight
#>
#> 1 2017-01-01 71 male white Upper Trunk Contusion O… Other P… 1807 77.7
#> 2 2017-01-01 16 male white Lower Arm Burns, Ther… Home 676 77.7
#> 3 2017-01-01 58 male white Upper Trunk Contusion O… Home 649 77.7
#> 4 2017-01-01 21 male white Lower Trunk Strain, Spr… Home 4076 77.7
#> 5 2017-01-01 54 male white Head Inter Organ… Other P… 1807 77.7
#> 6 2017-01-01 21 male white Hand Fracture Home 1884 77.7
#> # … with 255,058 more rows, and 1 more variable: narrative
每个变量的定义:
-
trmt_date
is date the person was seen in the hospital (not when the accident occurred). -
age
,sex
, andrace
give demographic information about the person who experienced the accident. -
body_part
is the location of the injury on the body (like ankle or ear);location
is the place where the accident occurred (like home or school). -
diag
gives the basic diagnosis of the injury (like fracture or laceration). -
prod_code
is the primary product associated with the injury. -
weight
is statistical weight giving the estimated number of people who would suffer this injury if this dataset was scaled to the entire population of the US. -
narrative
is a brief story about how the accident occurred.
然后把这个数据和其他两个数据匹配起来
products <- vroom::vroom("neiss/products.tsv")
products
#> # A tibble: 38 × 2
#> prod_code title
#>
#> 1 464 knives, not elsewhere classified
#> 2 474 tableware and accessories
#> 3 604 desks, chests, bureaus or buffets
#> 4 611 bathtubs or showers
#> 5 649 toilets
#> 6 676 rugs or carpets, not specified
#> # … with 32 more rows
population <- vroom::vroom("neiss/population.tsv")
population
#> # A tibble: 170 × 3
#> age sex population
#>
#> 1 0 female 1924145
#> 2 0 male 2015150
#> 3 1 female 1943534
#> 4 1 male 2031718
#> 5 2 female 1965150
#> 6 2 male 2056625
#> # … with 164 more rows
3. 数据探索
在做成app之前,先把数据探索一下。可以从比较好玩的地方开始入手,比方说prod_code
为649的数据,这个表示发生漏电事故的产品编号是厕所。
selected <- injuries %>% filter(prod_code == 649)
nrow(selected)
#> [1] 2993
接着往下看可以找到发生事故的场所最多的是Home
家里,身体部位是Head
头
selected %>% count(location, wt = weight, sort = TRUE)
#> # A tibble: 6 × 2
#> location n
#>
#> 1 Home 99603.
#> 2 Other Public Property 18663.
#> 3 Unknown 16267.
#> 4 School 659.
#> 5 Street Or Highway 16.2
#> 6 Sports Or Recreation Place 14.8
selected %>% count(body_part, wt = weight, sort = TRUE)
#> # A tibble: 24 × 2
#> body_part n
#>
#> 1 Head 31370.
#> 2 Lower Trunk 26855.
#> 3 Face 13016.
#> 4 Upper Trunk 12508.
#> 5 Knee 6968.
#> 6 N.S./Unk 6741.
#> # … with 18 more rows
selected %>% count(diag, wt = weight, sort = TRUE)
#> # A tibble: 20 × 2
#> diag n
#>
#> 1 Other Or Not Stated 32897.
#> 2 Contusion Or Abrasion 22493.
#> 3 Inter Organ Injury 21525.
#> 4 Fracture 21497.
#> 5 Laceration 18734.
#> 6 Strain, Sprain 7609.
#> # … with 14 more rows
然后也可以用ggplot
做简单的可视化分析,比方说按照age
和sex
count一下数量
summary <- selected %>%
count(age, sex, wt = weight)
summary
#> # A tibble: 208 × 3
#> age sex n
#>
#> 1 0 female 4.76
#> 2 0 male 14.3
#> 3 1 female 253.
#> 4 1 male 231.
#> 5 2 female 438.
#> 6 2 male 632.
#> # … with 202 more rows
summary %>%
ggplot(aes(age, n, colour = sex)) +
geom_line() +
labs(y = "Estimated number of injuries")
上图显示的是实际数字,也可以改成按照比例显示。比方说10000个人里有多少个。
summary <- selected %>%
count(age, sex, wt = weight) %>%
left_join(population, by = c("age", "sex")) %>%
mutate(rate = n / population * 1e4)
summary
#> # A tibble: 208 × 5
#> age sex n population rate
#>
#> 1 0 female 4.76 1924145 0.0247
#> 2 0 male 14.3 2015150 0.0708
#> 3 1 female 253. 1943534 1.30
#> 4 1 male 231. 2031718 1.14
#> 5 2 female 438. 1965150 2.23
#> 6 2 male 632. 2056625 3.07
#> # … with 202 more rows
summary %>%
ggplot(aes(age, rate, colour = sex)) +
geom_line(na.rm = TRUE) +
labs(y = "Injuries per 10,000 people")
最后还可以查看一下里面一些案件的文字描述。比方说随机取样10个样本。
selected %>%
sample_n(10) %>%
pull(narrative)
#> [1] "97 YOM FELL HITTING HEAD ON TOILET SEAT.DX: NECK PX, BACK PX, FREQUENT FALLS."
#> [2] "95 YOF - CONTUSION HEAD - PT WAS TRANSFERRING FROM W.C TO TOILETAND FELL HITTING HEAD ON FLOOR@ N.H"
#> [3] "54YOM HAD A MECHANICAL FALL ATTEMPTING TO USE THE TOILET, C/O LT-SIDEDCHEST PAIN. DX - RIB FX, PNEUMOTHORAX, CHEST WALL CONTUSION"
#> [4] "99YF ATTEMPTING TO GET OFF THE TOILET&FELL FWD STRIKING HEAD&CHEST AGAINST THE WALKER, -LOC>>CHI, RIB FX, FREQ FALLS"
#> [5] "79 YOF HAD SYNCOPAL EPISODE AND FELL FROM TOILET HITTING FACE ONFLOOR DX NASAL BONE FRACTURE"
#> [6] "76YOM C/O GLF @HOME JUST PTA. ATTEMPTING TO SIT ON TOILET, MISSED TOILET AND FELL. NO HI, NO LOC DX=LEFT HIP FRACTURE="
#> [7] "6 YO M LAC HEAD-FELL,CLIMBING ON TOILET,STRUCK THE COUNTERTOP"
#> [8] "85YOM SITTING ON THE TOILET AT HOME AND LEANED FORWARD FELL ONTO HEAD SUSTAINED A SUBDURAL HEMATOMA"
#> [9] "79YOF H'TMA HEAD- LOWERING ONTO TOILET, FELL ON FLOOR"
#> [10] "81YOM WENT TO SIT ON A TOILET AND MISSED IT AND SUSTAINED A CLOSED HEADINJURY"
都是一些比较基础的数据探索。接下来要做的就是把这些事情转交给shiny,全都改写成shiny code。
4. Shiny小试牛刀
首先是设置UI界面。
这里会用最简单的方式演示三张表格,一张图。可以提前用笔在纸上打一下草稿规划一下界面的排版。这里打算做成一个2行3列的界面。第一行显示三张表格,第二行显示一张图。
由于每一行的最大宽度是12列,所以三张表格均匀分布的话是每一张占4列。
prod_codes <- setNames(products$prod_code, products$title)
ui <- fluidPage(
fluidRow(
column(6,
selectInput("code", "Product", choices = prod_codes)
)
),
fluidRow(
column(4, tableOutput("diag")),
column(4, tableOutput("body_part")),
column(4, tableOutput("location"))
),
fluidRow(
column(12, plotOutput("age_sex"))
)
)
虽然到目前为止还没有对fluidRow()
和column()
有过详细讲解,但是大概可以猜到是用来做什么的。包括setNames()
和selectInput()
之后的章节里都会有说明。
然后是server端。
server <- function(input, output, session) {
selected <- reactive(injuries %>% filter(prod_code == input$code))
output$diag <- renderTable(
selected() %>% count(diag, wt = weight, sort = TRUE)
)
output$body_part <- renderTable(
selected() %>% count(body_part, wt = weight, sort = TRUE)
)
output$location <- renderTable(
selected() %>% count(location, wt = weight, sort = TRUE)
)
summary <- reactive({
selected() %>%
count(age, sex, wt = weight) %>%
left_join(population, by = c("age", "sex")) %>%
mutate(rate = n / population * 1e4)
})
output$age_sex <- renderPlot({
summary() %>%
ggplot(aes(age, n, colour = sex)) +
geom_line() +
labs(y = "Estimated number of injuries")
}, res = 96)
}
5. 对齐表格样式
刚才在刚才的结果里看出表格的行数层次不齐,不是很美观。如果可以像下面那样指定排名前几的就好了。
injuries %>%
mutate(diag = fct_lump(fct_infreq(diag), n = 5)) %>%
group_by(diag) %>%
summarise(n = as.integer(sum(weight)))
#> # A tibble: 6 × 2
#> diag n
#>
#> 1 Other Or Not Stated 1806436
#> 2 Fracture 1558961
#> 3 Laceration 1432407
#> 4 Strain, Sprain 1432556
#> 5 Contusion Or Abrasion 1451987
#> 6 Other 1929147
可以写一个function,不会写也没关系,之后会有详细的解说。
count_top <- function(df, var, n = 5) {
df %>%
mutate({{ var }} := fct_lump(fct_infreq({{ var }}), n = n)) %>%
group_by({{ var }}) %>%
summarise(n = as.integer(sum(weight)))
}
然后在server端安排一下function。这里面有个细节,width=100%
指定撑满行距,让其看上去不会那么凌乱。
output$diag <- renderTable(count_top(selected(), diag), width = "100%")
output$body_part <- renderTable(count_top(selected(), body_part), width = "100%")
output$location <- renderTable(count_top(selected(), location), width = "100%")
5. 添加选项 rate vs count
fluidRow(
column(8,
selectInput("code", "Product",
choices = setNames(products$prod_code, products$title),
width = "100%"
)
),
column(2, selectInput("y", "Y axis", c("rate", "count")))
),
output$age_sex <- renderPlot({
if (input$y == "count") {
summary() %>%
ggplot(aes(age, n, colour = sex)) +
geom_line() +
labs(y = "Estimated number of injuries")
} else {
summary() %>%
ggplot(aes(age, rate, colour = sex)) +
geom_line(na.rm = TRUE) +
labs(y = "Injuries per 10,000 people")
}
}, res = 96)
6. 添加文字敘述
首先在UI里添加新的元素。比如说actionButton
fluidRow(
column(2, actionButton("story", "Tell me a story")),
column(10, textOutput("narrative"))
)
然后在sever里添加一个eventReactive
,这个是表示只有被点击才会被激活。
arrative_sample <- eventReactive(
list(input$story, selected()),
selected() %>% pull(narrative) %>% sample(1)
)
output$narrative <- renderText(narrative_sample())