R Shiny 基础 3. 实战演练 数据探索

1. 引入

融汇贯通之前介绍的内容,来实现一个简单的数据可视化app。首先会在R里进行一些数据探索,然后把这些功能转换为shiny的可交互模式。

本章节会用到下面的包。

library(shiny)
library(vroom)
library(tidyverse)

2. 数据

数据会用到National Electronic Injury Surveillance System (NEISS)的一份数据。记录了长期事故入院的病例数据。

https://github.com/hadley/neiss

本章只会用到2017年的数据,数据大小在10M左右。

dir.create("neiss")
#> Warning in dir.create("neiss"): 'neiss' already exists
download <- function(name) {
  url <- "https://github.com/hadley/mastering-shiny/raw/master/neiss/"
  download.file(paste0(url, name), paste0("neiss/", name), quiet = TRUE)
}
download("injuries.tsv.gz")
download("population.tsv")
download("products.tsv")

injuries 是长这样的

injuries <- vroom::vroom("neiss/injuries.tsv.gz")
injuries
#> # A tibble: 255,064 × 10
#>   trmt_date    age sex   race  body_part   diag         location prod_code weight
#>                                    
#> 1 2017-01-01    71 male  white Upper Trunk Contusion O… Other P…      1807   77.7
#> 2 2017-01-01    16 male  white Lower Arm   Burns, Ther… Home           676   77.7
#> 3 2017-01-01    58 male  white Upper Trunk Contusion O… Home           649   77.7
#> 4 2017-01-01    21 male  white Lower Trunk Strain, Spr… Home          4076   77.7
#> 5 2017-01-01    54 male  white Head        Inter Organ… Other P…      1807   77.7
#> 6 2017-01-01    21 male  white Hand        Fracture     Home          1884   77.7
#> # … with 255,058 more rows, and 1 more variable: narrative 

每个变量的定义:

  • trmt_date is date the person was seen in the hospital (not when the accident occurred).
  • age, sex, and race give demographic information about the person who experienced the accident.
  • body_part is the location of the injury on the body (like ankle or ear); location is the place where the accident occurred (like home or school).
  • diag gives the basic diagnosis of the injury (like fracture or laceration).
  • prod_code is the primary product associated with the injury.
  • weight is statistical weight giving the estimated number of people who would suffer this injury if this dataset was scaled to the entire population of the US.
  • narrative is a brief story about how the accident occurred.

然后把这个数据和其他两个数据匹配起来

products <- vroom::vroom("neiss/products.tsv")
products
#> # A tibble: 38 × 2
#>   prod_code title                            
#>                                    
#> 1       464 knives, not elsewhere classified 
#> 2       474 tableware and accessories        
#> 3       604 desks, chests, bureaus or buffets
#> 4       611 bathtubs or showers              
#> 5       649 toilets                          
#> 6       676 rugs or carpets, not specified   
#> # … with 32 more rows

population <- vroom::vroom("neiss/population.tsv")
population
#> # A tibble: 170 × 3
#>     age sex    population
#>           
#> 1     0 female    1924145
#> 2     0 male      2015150
#> 3     1 female    1943534
#> 4     1 male      2031718
#> 5     2 female    1965150
#> 6     2 male      2056625
#> # … with 164 more rows

3. 数据探索

在做成app之前,先把数据探索一下。可以从比较好玩的地方开始入手,比方说prod_code 为649的数据,这个表示发生漏电事故的产品编号是厕所。

selected <- injuries %>% filter(prod_code == 649)
nrow(selected)
#> [1] 2993

接着往下看可以找到发生事故的场所最多的是Home家里,身体部位是Head

selected %>% count(location, wt = weight, sort = TRUE)
#> # A tibble: 6 × 2
#>   location                         n
#>                           
#> 1 Home                       99603. 
#> 2 Other Public Property      18663. 
#> 3 Unknown                    16267. 
#> 4 School                       659. 
#> 5 Street Or Highway             16.2
#> 6 Sports Or Recreation Place    14.8

selected %>% count(body_part, wt = weight, sort = TRUE)
#> # A tibble: 24 × 2
#>   body_part        n
#>           
#> 1 Head        31370.
#> 2 Lower Trunk 26855.
#> 3 Face        13016.
#> 4 Upper Trunk 12508.
#> 5 Knee         6968.
#> 6 N.S./Unk     6741.
#> # … with 18 more rows

selected %>% count(diag, wt = weight, sort = TRUE)
#> # A tibble: 20 × 2
#>   diag                       n
#>                     
#> 1 Other Or Not Stated   32897.
#> 2 Contusion Or Abrasion 22493.
#> 3 Inter Organ Injury    21525.
#> 4 Fracture              21497.
#> 5 Laceration            18734.
#> 6 Strain, Sprain         7609.
#> # … with 14 more rows

然后也可以用ggplot做简单的可视化分析,比方说按照agesex count一下数量

summary <- selected %>% 
  count(age, sex, wt = weight)
summary
#> # A tibble: 208 × 3
#>     age sex         n
#>       
#> 1     0 female   4.76
#> 2     0 male    14.3 
#> 3     1 female 253.  
#> 4     1 male   231.  
#> 5     2 female 438.  
#> 6     2 male   632.  
#> # … with 202 more rows

summary %>% 
  ggplot(aes(age, n, colour = sex)) + 
  geom_line() + 
  labs(y = "Estimated number of injuries")

上图显示的是实际数字,也可以改成按照比例显示。比方说10000个人里有多少个。

summary <- selected %>% 
  count(age, sex, wt = weight) %>% 
  left_join(population, by = c("age", "sex")) %>% 
  mutate(rate = n / population * 1e4)

summary
#> # A tibble: 208 × 5
#>     age sex         n population   rate
#>               
#> 1     0 female   4.76    1924145 0.0247
#> 2     0 male    14.3     2015150 0.0708
#> 3     1 female 253.      1943534 1.30  
#> 4     1 male   231.      2031718 1.14  
#> 5     2 female 438.      1965150 2.23  
#> 6     2 male   632.      2056625 3.07  
#> # … with 202 more rows
summary %>% 
  ggplot(aes(age, rate, colour = sex)) + 
  geom_line(na.rm = TRUE) + 
  labs(y = "Injuries per 10,000 people")

最后还可以查看一下里面一些案件的文字描述。比方说随机取样10个样本。

selected %>% 
  sample_n(10) %>% 
  pull(narrative)
#>  [1] "97 YOM FELL HITTING HEAD ON TOILET SEAT.DX:  NECK PX, BACK PX, FREQUENT FALLS."                                                   
#>  [2] "95 YOF - CONTUSION HEAD - PT WAS TRANSFERRING FROM W.C TO TOILETAND FELL HITTING HEAD ON FLOOR@ N.H"                              
#>  [3] "54YOM HAD A MECHANICAL FALL ATTEMPTING TO USE THE TOILET, C/O LT-SIDEDCHEST PAIN. DX - RIB FX, PNEUMOTHORAX, CHEST WALL CONTUSION"
#>  [4] "99YF ATTEMPTING TO GET OFF THE TOILET&FELL FWD STRIKING HEAD&CHEST AGAINST THE WALKER, -LOC>>CHI, RIB FX, FREQ FALLS"             
#>  [5] "79 YOF HAD SYNCOPAL EPISODE AND FELL FROM TOILET HITTING FACE ONFLOOR     DX  NASAL BONE FRACTURE"                                
#>  [6] "76YOM C/O GLF @HOME JUST PTA. ATTEMPTING TO SIT ON TOILET, MISSED TOILET AND FELL. NO HI, NO LOC DX=LEFT HIP FRACTURE="           
#>  [7] "6 YO M LAC HEAD-FELL,CLIMBING ON TOILET,STRUCK THE COUNTERTOP"                                                                    
#>  [8] "85YOM SITTING ON THE TOILET AT HOME AND LEANED FORWARD FELL ONTO HEAD SUSTAINED A SUBDURAL HEMATOMA"                              
#>  [9] "79YOF H'TMA HEAD- LOWERING ONTO TOILET, FELL ON FLOOR"                                                                            
#> [10] "81YOM WENT TO SIT ON A TOILET AND MISSED IT AND SUSTAINED A CLOSED HEADINJURY"

都是一些比较基础的数据探索。接下来要做的就是把这些事情转交给shiny,全都改写成shiny code。

4. Shiny小试牛刀

首先是设置UI界面。
这里会用最简单的方式演示三张表格,一张图。可以提前用笔在纸上打一下草稿规划一下界面的排版。这里打算做成一个2行3列的界面。第一行显示三张表格,第二行显示一张图。
由于每一行的最大宽度是12列,所以三张表格均匀分布的话是每一张占4列。

prod_codes <- setNames(products$prod_code, products$title)

ui <- fluidPage(
  fluidRow(
    column(6,
      selectInput("code", "Product", choices = prod_codes)
    )
  ),
  fluidRow(
    column(4, tableOutput("diag")),
    column(4, tableOutput("body_part")),
    column(4, tableOutput("location"))
  ),
  fluidRow(
    column(12, plotOutput("age_sex"))
  )
)

虽然到目前为止还没有对fluidRow()column()有过详细讲解,但是大概可以猜到是用来做什么的。包括setNames()selectInput() 之后的章节里都会有说明。

然后是server端。

server <- function(input, output, session) {
  selected <- reactive(injuries %>% filter(prod_code == input$code))

  output$diag <- renderTable(
    selected() %>% count(diag, wt = weight, sort = TRUE)
  )
  output$body_part <- renderTable(
    selected() %>% count(body_part, wt = weight, sort = TRUE)
  )
  output$location <- renderTable(
    selected() %>% count(location, wt = weight, sort = TRUE)
  )

  summary <- reactive({
    selected() %>%
      count(age, sex, wt = weight) %>%
      left_join(population, by = c("age", "sex")) %>%
      mutate(rate = n / population * 1e4)
  })

  output$age_sex <- renderPlot({

    summary() %>%
      ggplot(aes(age, n, colour = sex)) +
      geom_line() +
      labs(y = "Estimated number of injuries")
  }, res = 96)
}

5. 对齐表格样式

刚才在刚才的结果里看出表格的行数层次不齐,不是很美观。如果可以像下面那样指定排名前几的就好了。

injuries %>%
  mutate(diag = fct_lump(fct_infreq(diag), n = 5)) %>%
  group_by(diag) %>%
  summarise(n = as.integer(sum(weight)))
#> # A tibble: 6 × 2
#>   diag                        n
#>                      
#> 1 Other Or Not Stated   1806436
#> 2 Fracture              1558961
#> 3 Laceration            1432407
#> 4 Strain, Sprain        1432556
#> 5 Contusion Or Abrasion 1451987
#> 6 Other                 1929147

可以写一个function,不会写也没关系,之后会有详细的解说。

count_top <- function(df, var, n = 5) {
  df %>%
    mutate({{ var }} := fct_lump(fct_infreq({{ var }}), n = n)) %>%
    group_by({{ var }}) %>%
    summarise(n = as.integer(sum(weight)))
}

然后在server端安排一下function。这里面有个细节,width=100% 指定撑满行距,让其看上去不会那么凌乱。

  output$diag <- renderTable(count_top(selected(), diag), width = "100%")
  output$body_part <- renderTable(count_top(selected(), body_part), width = "100%")
  output$location <- renderTable(count_top(selected(), location), width = "100%")

5. 添加选项 rate vs count

fluidRow(
    column(8,
      selectInput("code", "Product",
        choices = setNames(products$prod_code, products$title),
        width = "100%"
      )
    ),
    column(2, selectInput("y", "Y axis", c("rate", "count")))
  ),
output$age_sex <- renderPlot({
    if (input$y == "count") {
      summary() %>%
        ggplot(aes(age, n, colour = sex)) +
        geom_line() +
        labs(y = "Estimated number of injuries")
    } else {
      summary() %>%
        ggplot(aes(age, rate, colour = sex)) +
        geom_line(na.rm = TRUE) +
        labs(y = "Injuries per 10,000 people")
    }
  }, res = 96)

6. 添加文字敘述

首先在UI里添加新的元素。比如说actionButton

fluidRow(
    column(2, actionButton("story", "Tell me a story")),
    column(10, textOutput("narrative"))
  )

然后在sever里添加一个eventReactive ,这个是表示只有被点击才会被激活。

arrative_sample <- eventReactive(
    list(input$story, selected()),
    selected() %>% pull(narrative) %>% sample(1)
  )
  output$narrative <- renderText(narrative_sample())

你可能感兴趣的:(R Shiny 基础 3. 实战演练 数据探索)