Random Sampling in R

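  • Simple random sampling draws n units from a population of N units in such a way that every possible sample has the same probability of being selected.
  • It can be done with the srswr, srswor, and sample functions.
  • srswr and srswor are provided by the sampling package, which must be loaded before use; sample is part of base R.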
    srswr(n, N)
    srswor(n, N)
    sample(x, size, replace = FALSE, prob = NULL)
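
To make the "equal probability" definition concrete for sampling without replacement, here is a minimal sketch that lists every possible sample from a toy population. The sizes N = 5 and n = 2 are assumptions chosen only so that the enumeration stays small; nothing here comes from the sampling package.

```r
# Toy population (assumed sizes, chosen so all samples can be listed)
N <- 5
n <- 2

# Every possible sample of size n, one per column
all_samples <- combn(N, n)
n_samples <- ncol(all_samples)        # equals choose(N, n)

# Under simple random sampling each sample has the same probability
p_sample <- 1 / n_samples

# Inclusion probability of each unit: share of samples that contain it
inclusion <- tabulate(as.vector(all_samples), nbins = N) / n_samples

n_samples
p_sample
inclusion                             # each entry equals n / N
```

Every unit appears in the same number of samples, which is why each unit's chance of being selected is n / N.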

    Example:
install.packages("sampling")
#  Output:
#  WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
#  
#  https://cran.rstudio.com/bin/windows/Rtools/
#  Installing package into ‘C:/Users/Admin/Documents/R/win-library/3.6’
#  (as ‘lib’ is unspecified)
#  also installing the dependency ‘lpSolve’
#  
#  trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/lpSolve_5.6.15.zip'
#  Content type 'application/zip' length 748088 bytes (730 KB)
#  downloaded 730 KB
#  
#  trying URL 'https://cran.rstudio.com/bin/windows/contrib/3.6/sampling_2.8.zip'
#  Content type 'application/zip' length 905090 bytes (883 KB)
#  downloaded 883 KB
#  
#  package ‘lpSolve’ successfully unpacked and MD5 sums checked
#  package ‘sampling’ successfully unpacked and MD5 sums checked
#  
#  The downloaded binary packages are in
#  	       C:\Users\Admin\AppData\Local\Temp\RtmpWuDkpt\downloaded_packages
library(sampling)



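Simple random sampling with replacement: the srswr function

srswr(n, N)
srswr performs simple random sampling with replacement: it draws n units from a population of size N and returns a vector of length N whose i-th component is the number of times unit i was drawn.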
Example:

library(sampling)

s <- srswr(8, 100)     # draw 8 units from a population of 100, with replacement
s
#  Output:
#    [1] 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
#   [41] 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0
#   [81] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
obs <- which(s != 0)   # indices of the units drawn at least once
obs
#  Output:
#  [1]  4  6 29 44 67 69 73 92
n <- s[s != 0]         # number of times each selected unit was drawn
n
#  Output:
#  [1] 1 1 1 1 1 1 1 1
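
In the run above every selected unit happened to be drawn exactly once, but with replacement a count can exceed 1, and indexing by s != 0 alone then loses the repeats. The following is a minimal sketch of expanding a count vector into the full list of draws with rep. So that it runs without the sampling package, the count vector is built with base R's sample.int and tabulate as a stand-in; that substitution is an assumption of this sketch, not a statement about how srswr is implemented.

```r
set.seed(1)   # assumed seed, only to make the sketch reproducible
N <- 100
n <- 8

# Stand-in for a with-replacement count vector of length N
# (with the sampling package loaded, srswr(n, N) could be used instead)
s <- tabulate(sample.int(N, n, replace = TRUE), nbins = N)

# Repeat each unit index as many times as it was drawn
units <- rep(seq_len(N), times = s)
units
length(units)   # always n, even when some unit is drawn more than once
```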


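Simple random sampling without replacement: the srswor function

srswor(n, N)
srswor performs simple random sampling without replacement: it draws n units from a population of size N and returns a vector of length N whose i-th component is 1 if unit i was selected and 0 otherwise.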
Example:

library(sampling)

LETTERS
#  Output:
#   [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
#  [21] "U" "V" "W" "X" "Y" "Z"
s <- srswor(10, 26)     # draw 10 of the 26 positions, without replacement
s
#  Output:
#   [1] 1 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 0 1 0 0
obs <- which(s != 0)    # positions of the selected units
obs
#  Output:
#   [1]  1  2  4 10 13 17 20 21 22 24
sampled <- LETTERS[obs] # named 'sampled' to avoid masking the sample() function
sampled
#  Output:
#   [1] "A" "B" "D" "J" "M" "Q" "T" "U" "V" "X"
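
A 0/1 indicator vector like the one above is often used to pick rows from a data set rather than elements of a vector. Below is a minimal sketch of that pattern. The data frame population and its columns are hypothetical, and the indicator is built with base R's sample.int as a stand-in so the sketch runs without the sampling package.

```r
set.seed(2024)   # assumed seed, only to make the sketch reproducible

# Hypothetical population: one row per unit
population <- data.frame(id = 1:26, letter = LETTERS)
N <- nrow(population)
n <- 10

# Stand-in for a without-replacement 0/1 indicator of length N
# (with the sampling package loaded, srswor(n, N) could be used instead)
s <- integer(N)
s[sample.int(N, n)] <- 1L

# Keep the rows whose indicator is 1
sampled_rows <- population[s == 1, ]
sampled_rows
nrow(sampled_rows)   # n rows, no unit repeated
```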


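The sample function

sample(x, size, replace = FALSE, prob = NULL)
sample performs simple random sampling either with or without replacement and returns the sampled elements themselves. It can also be used to randomly assign data to groups.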

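Argument   Description
x          the data to sample from
size       number of items to draw
replace    replace = FALSE samples without replacement; replace = TRUE samples with replacement
prob       vector of probability weights for the elements of x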
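
Two details of these arguments are easy to miss, so here is a short sketch of both. The weights in prob do not have to sum to 1, and when x is a single positive number, sample draws from 1:x instead of from that one value. The vector x and the weights below are made-up illustration values.

```r
set.seed(123)   # assumed seed, only to make the sketch reproducible

# Weighted sampling with replacement; weights 7:2:1 are used as relative weights
x <- c("a", "b", "c")
draws <- sample(x, 1000, replace = TRUE, prob = c(7, 2, 1))
table(draws)    # "a" should be the most frequent, "c" the least

# A single number as x means "sample from 1:x"
pick <- sample(5, 2)   # two distinct values from 1, 2, 3, 4, 5
pick
```

Because of the second behavior, code that samples from a numeric vector whose length can drop to 1 is usually safer when it samples positions, for example x[sample.int(length(x), size)].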

Example:

LETTERS
#  Output:
#   [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T"
#  [21] "U" "V" "W" "X" "Y" "Z"
sample(LETTERS, 5, replace = TRUE)    #  with replacement
#  Output:
#  [1] "C" "M" "L" "K" "N"
sample(LETTERS, 5, replace = FALSE)   #  without replacement
#  Output:
#  [1] "X" "G" "H" "B" "J"
# Random grouping: label each of the 26 letters 1 or 2 with probabilities 0.7 and 0.3
n <- sample(c(1:2), 26, replace = TRUE, prob = c(0.7, 0.3))
n
#  Output:
#  [1] 2 1 2 1 1 1 1 1 1 2 2 2 2 1 2 1 1 1 1 1 1 2 1 1 2 1
sample1 <- LETTERS[n == 1]   # letters assigned to group 1
sample1
#  Output:
#  [1] "B" "D" "E" "F" "G" "H" "I" "N" "P" "Q" "R" "S" "T" "U" "W" "X" "Z"
sample2 <- LETTERS[n == 2]   # letters assigned to group 2
sample2
#  Output:
#  [1] "A" "C" "J" "K" "L" "M" "O" "V" "Y"
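
The grouping above labels each element independently, so the group sizes themselves are random (17 and 9 in this run rather than an exact 70/30 split). When fixed group sizes are wanted, one common alternative is to sample positions without replacement. This is a minimal sketch of that variant, with the 0.7 share, the seed, and the names group1 and group2 chosen only for illustration.

```r
set.seed(42)   # assumed seed, only to make the sketch reproducible

n_total <- length(LETTERS)
n_first <- round(0.7 * n_total)   # size of the first group

# Draw the positions of the first group without replacement
idx <- sample(n_total, n_first)

group1 <- LETTERS[idx]    # exactly n_first elements
group2 <- LETTERS[-idx]   # everything that was not drawn
length(group1)
length(group2)
```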
