Stratified Sampling in R (With Examples)

By Benjamin Anderson, July 28, 2023, Guide

Researchers often take samples from a population and use the data from the sample to draw conclusions about the population as a whole.

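A commonly used sampling method is stratified random sampling, in which a population is split into groups (strata) and a certain number of members from each group are randomly selected to be included in the sample.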
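To make the idea concrete, here is a minimal base-R sketch. It assumes the population is stored in a data frame with one column that identifies each member's group. The helper `stratified_sample()`, its arguments, and the small `pop` data frame are hypothetical, made up for illustration; they are not part of any package.

```r
# Hypothetical helper (not from any package): draw n_per_group rows
# at random, without replacement, from each stratum of a data frame
stratified_sample <- function(data, group_col, n_per_group) {
  # row indices of the data frame, split by stratum
  idx_by_group <- split(seq_len(nrow(data)), data[[group_col]])
  # within each stratum, keep n_per_group randomly chosen indices
  picked <- unlist(lapply(idx_by_group, function(idx) {
    idx[sample.int(length(idx), n_per_group)]
  }), use.names = FALSE)
  data[picked, , drop = FALSE]
}

# made-up population: 3 groups (strata) of 20 members each
set.seed(42)
pop <- data.frame(group = rep(c("A", "B", "C"), each = 20),
                  value = rnorm(60))

# take 5 members from every group
s <- stratified_sample(pop, "group", n_per_group = 5)
table(s$group)
```

Every group contributes exactly 5 of the 15 sampled rows, which is what distinguishes a stratified sample from simply drawing 15 rows of `pop` at random.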

This tutorial explains how to perform stratified random sampling in R.

Example: Stratified Sampling in R

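A high school consists of 400 students, each of whom is a freshman, sophomore, junior, or senior. Suppose we want to take a stratified sample of 40 students, with 10 students from each grade included in the sample.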

The following code shows how to generate an example data frame of 400 students:

#make this example reproducible
set.seed(1)

#create data frame
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each =100),
                 gpa = rnorm(400, mean=85, sd=3))

#view first six rows of data frame
head(df)

     grade      gpa
1 Freshman 83.12064
2 Freshman 85.55093
3 Freshman 82.49311
4 Freshman 89.78584
5 Freshman 85.98852
6 Freshman 82.53859

Stratified Sampling Using Number of Rows

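The following code shows how to use the group_by() and sample_n() functions from the dplyr package to obtain a stratified random sample of 40 students in total, with 10 students from each grade: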

library(dplyr)

#obtain stratified sample
strat_sample <- df %>%
                  group_by(grade) %>%
                  sample_n(size=10)

#find frequency of students from each grade
table(strat_sample$grade)

 Freshman    Junior    Senior Sophomore 
       10        10        10        10 
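In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(). Assuming a recent dplyr version is installed, the same 10-students-per-grade sample can be drawn as follows; the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)

#make this example reproducible
set.seed(1)

#recreate the example data frame
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each = 100),
                 gpa = rnorm(400, mean = 85, sd = 3))

#obtain stratified sample of 10 students per grade
strat_sample <- df %>%
  group_by(grade) %>%
  slice_sample(n = 10) %>%
  ungroup()

#find frequency of students from each grade
table(strat_sample$grade)
```

The per-grade counts match the sample_n() version (10 each), although the particular students drawn may differ.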

Stratified Sampling Using Fraction of Rows

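The following code shows how to use the group_by() and sample_frac() functions from the dplyr package to obtain a stratified random sample in which 15% of the students in each grade are randomly selected: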

library(dplyr)

#obtain stratified sample
strat_sample <- df %>%
                  group_by(grade) %>%
                  sample_frac(size=.15)

#find frequency of students from each grade
table(strat_sample$grade)

 Freshman    Junior    Senior Sophomore 
       15        15        15        15 
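Likewise, sample_frac() is superseded by slice_sample() with its prop argument in dplyr 1.0.0 and later. A sketch, assuming a recent dplyr version and rebuilding the same data frame so the snippet is self-contained:

```r
library(dplyr)

#make this example reproducible
set.seed(1)

#recreate the example data frame
df <- data.frame(grade = rep(c('Freshman', 'Sophomore', 'Junior', 'Senior'), each = 100),
                 gpa = rnorm(400, mean = 85, sd = 3))

#obtain stratified sample containing 15% of the students in each grade
strat_sample <- df %>%
  group_by(grade) %>%
  slice_sample(prop = 0.15) %>%
  ungroup()

#find frequency of students from each grade
table(strat_sample$grade)
```

Since each grade has 100 students, 15% works out to exactly 15 students per grade, 60 in total.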

Additional Resources

Types of Sampling Methods
Cluster Sampling in R
Systematic Sampling in R

About the Author

Benjamin Anderson

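Hi, I'm Benjamin, a retired statistics professor who became a dedicated teacher at Statorials. With extensive experience and expertise in statistics, I'm eager to share my knowledge and help students through Statorials.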