Filtering data frame rows by a criterion: dplyr, gsub, filter, slice, grep, str_detect

x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df <- data.frame(x, y)

    x   y
1   aa  101
2   aa  102
3   aa  113
4   bb  201
5   cc  202
6   cc  344
7   cc  407
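library(dplyr)

# Note: from here on, df is a different example data frame with the columns
# player, points and rebounds (its definition is not shown in this post).
# Keep rows that contain 'Guard' or 'Forward' in the player column
df %>% filter(grepl('Guard|Forward', player))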

     player points rebounds
1   P Guard     12        5
2   S Guard     15        7
3 S Forward     19        7
4 P Forward     22       12

 

#filter out rows that contain 'Guard' in the player column
df %>% filter(!grepl('Guard', player))

     player points rebounds
1 S Forward     19        7
2 P Forward     22       12
3    Center     32       11
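
The data frame behind the two results above is never defined in this post. A minimal sketch of what it may have looked like follows, rebuilt only from the printed rows. The name `players` and the row order are assumptions. The same two filters are written with base R indexing so the block runs without extra packages.

```r
# Hypothetical reconstruction of the example data, inferred from the printed
# results above; the name `players` and the row order are assumptions.
players <- data.frame(
  player   = c("P Guard", "S Guard", "S Forward", "P Forward", "Center"),
  points   = c(12, 15, 19, 22, 32),
  rebounds = c(5, 7, 7, 12, 11),
  stringsAsFactors = FALSE
)

# Base R counterpart of filter(grepl('Guard|Forward', player))
guards_forwards <- players[grepl("Guard|Forward", players$player), ]

# Base R counterpart of filter(!grepl('Guard', player))
no_guards <- players[!grepl("Guard", players$player), ]

print(guards_forwards)
print(no_guards)
```

Unlike filter(), base R indexing keeps the original row names, so the second result is labelled 3, 4, 5 rather than 1, 2, 3.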

 

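# Back to the first data frame (columns x and y): drop rows whose y starts with "1".
# grepl() gives one TRUE/FALSE per row; grep() returns positions, so it cannot be compared with y.
df %>% filter(!grepl("^1", y))
# Same filter with stringr (needs library(stringr))
df %>% filter(!str_detect(as.character(y), "^1"))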
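
The difference between grep() and grepl() is easy to miss when filtering rows, so here is a small base R sketch using the x/y data frame from the top of this post. The variable names `hits_idx`, `hits_lgl`, `starts_with_1` and `not_starting_1` are only illustrative.

```r
x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
df <- data.frame(x, y)

# grep() returns the positions of the matching elements
# (the numeric y is coerced to character before matching).
hits_idx <- grep("^1", df$y)

# grepl() returns one logical value per element.
hits_lgl <- grepl("^1", df$y)

# Positions or logicals can both select the matching rows.
starts_with_1 <- df[hits_idx, ]

# For the complement, negate the logical vector. Negative positions
# (df[-hits_idx, ]) return zero rows when nothing matches.
not_starting_1 <- df[!hits_lgl, ]

print(hits_idx)
print(not_starting_1)
```

With dplyr loaded, the positions from grep() are what slice() takes, for example df %>% slice(grep("^1", y)), while filter() needs the logical vector from grepl().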


Pipe functions together
We created multiple new data objects during our explorations of dplyr functions, above. While this works, we can produce the same results more efficiently by chaining functions together and creating only one new data object that encapsulates all of the previously sought information: filter on only females, grepl to get only Peromyscus spp., group_by individual species, and summarise the numbers of individuals.

# combine several functions to get a summary of the numbers of individuals of 
# female Peromyscus species in our dataset.

# remember %>% are "pipes" that allow us to pass information from one function 
# to the next. 

dataBySpFem <- myData %>% 
                  filter(grepl('Peromyscus', scientificName), sex == "F") %>%
                  group_by(scientificName) %>%
                  summarise(n_individuals = n())

## `summarise()` ungrouping output (override with `.groups` argument)

# view the data
dataBySpFem

## # A tibble: 3 x 2
##   scientificName         n_individuals
##   <chr>                          <int>
## 1 Peromyscus leucopus              455
## 2 Peromyscus maniculatus            98
## 3 Peromyscus sp.                     5
Cool!
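
Since myData itself is not included here, the chain can be tried on a small made-up table. This is only a sketch: `toyData` and its values are hypothetical stand-ins, not the real dataset, and it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for myData; the rows are invented for illustration.
toyData <- data.frame(
  scientificName = c("Peromyscus leucopus", "Peromyscus leucopus",
                     "Peromyscus maniculatus", "Mus musculus",
                     "Peromyscus leucopus"),
  sex = c("F", "F", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Same chain as above: keep female Peromyscus, then count rows per species.
toyBySpFem <- toyData %>%
  filter(grepl("Peromyscus", scientificName), sex == "F") %>%
  group_by(scientificName) %>%
  summarise(n_individuals = n())

toyBySpFem
```

The male Peromyscus leucopus row and the Mus musculus row are dropped by filter() before any counting happens, so only two species remain in the summary.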

Base R only
So that is nice, but we had to install a new package, dplyr. You might ask, "Is it really worth it to learn new commands if I can do this in base R?" We think the answer is yes, but why don't you see for yourself? Here is the base R code needed to accomplish the same task.

# For reference, the same output but using R's base functions

# First, subset the data to only female Peromyscus
dataFemPero  <- myData[myData$sex == 'F' & 
                   grepl('Peromyscus', myData$scientificName), ]

# Option 1 --------------------------------
# Use aggregate and then rename columns
dataBySpFem_agg <- aggregate(sex ~ scientificName, 
                   data = dataFemPero, FUN = length)
names(dataBySpFem_agg) <- c('scientificName', 'n_individuals')

# view output
dataBySpFem_agg

##           scientificName n_individuals
## 1    Peromyscus leucopus           455
## 2 Peromyscus maniculatus            98
## 3         Peromyscus sp.             5

# Option 2 --------------------------------------------------------
# Do it by hand

# Get the unique scientificNames in the subset
sppInDF <- unique(dataFemPero$scientificName[!is.na(dataFemPero$scientificName)])

# Use a loop to calculate the numbers of individuals of each species
sciName <- vector(); numInd <- vector()
for (i in sppInDF) {
  sciName <- c(sciName, i)
  numInd <- c(numInd, length(which(dataFemPero$scientificName==i)))
}

#Create the desired output data frame
dataBySpFem_byHand <- data.frame('scientificName'=sciName, 
                   'n_individuals'=numInd)

# view output
dataBySpFem_byHand

##           scientificName n_individuals
## 1    Peromyscus leucopus           455
## 2 Peromyscus maniculatus            98
## 3         Peromyscus sp.             5
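
For comparison, base R's table() can also produce the per-species counts. The sketch below is not one of the two options above. It runs on a made-up table (`toyData` and its values are hypothetical) because myData is not included here.

```r
# Hypothetical stand-in for myData; the rows are invented for illustration.
toyData <- data.frame(
  scientificName = c("Peromyscus leucopus", "Peromyscus leucopus",
                     "Peromyscus maniculatus", "Mus musculus",
                     "Peromyscus leucopus"),
  sex = c("F", "F", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Subset to female Peromyscus, as in the base R code above.
toyFemPero <- toyData[toyData$sex == "F" &
                        grepl("Peromyscus", toyData$scientificName), ]

# table() counts each distinct value (NA is dropped by default).
counts <- table(toyFemPero$scientificName)

# Convert the table into the same two-column layout as the other outputs.
toyBySpFem_tab <- data.frame(scientificName = names(counts),
                             n_individuals = as.vector(counts),
                             stringsAsFactors = FALSE)

print(toyBySpFem_tab)
```

table() sorts the species names alphabetically, so the row order may differ from the loop version, which follows the order of first appearance.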
