How to Summarise Multiple Columns Using dplyr

By Benjamin Anderson, July 18, 2023

You can use the following methods to summarise multiple columns of a data frame with dplyr:

Method 1: Summarise All Columns

#summarize mean of all columns
df %>%
  group_by(group_var) %>%
  summarise(across(everything(), mean, na.rm = TRUE))

Method 2: Summarise Specific Columns

#summarize mean of col1 and col2 only
df %>%
  group_by(group_var) %>%
  summarise(across(c(col1, col2), mean, na.rm = TRUE))

Method 3: Summarise All Numeric Columns

#summarize mean and standard deviation of all numeric columns
df %>%
  group_by(group_var) %>%
  summarise(across(where(is.numeric), list(mean=mean, sd=sd), na.rm = TRUE))

The following examples show how to use each method with this data frame:

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 points=c(99, 90, 86, 88, 95, 90),
                 assists=c(33, 28, 31, 39, 34, 25),
                 rebounds=c(NA, 28, 24, 24, 28, 19))

#view data frame
df

  team points assists rebounds
1    A     99      33       NA
2    A     90      28       28
3    A     86      31       24
4    B     88      39       24
5    B     95      34       28
6    B     90      25       19

Example 1: Summarise All Columns

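The following code shows how to calculate the mean of every column, grouped by team. The na.rm = TRUE argument drops the missing rebounds value before averaging: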

library(dplyr)

#summarize mean of all columns, grouped by team
df %>%
  group_by(team) %>%
  summarise(across(everything(), mean, na.rm = TRUE))

# A tibble: 2 x 4
  team  points assists rebounds
  <chr>  <dbl>   <dbl>    <dbl>
1 A       91.7    30.7     26  
2 B       91      32.7     23.7
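dplyr 1.1.0 and later deprecate passing extra arguments such as na.rm through across(), so the call above may print a deprecation warning. Here is a minimal sketch of the same summary with the argument moved into an anonymous function. The data frame is rebuilt so the block runs on its own, and the name means_by_team is only illustrative:

```r
library(dplyr)

# rebuild the example data frame so this block is self-contained
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 points=c(99, 90, 86, 88, 95, 90),
                 assists=c(33, 28, 31, 39, 34, 25),
                 rebounds=c(NA, 28, 24, 24, 28, 19))

# same grouped means, with na.rm set inside an anonymous function
means_by_team <- df %>%
  group_by(team) %>%
  summarise(across(everything(), function(x) mean(x, na.rm = TRUE)))

means_by_team
```

The values should match the tibble shown above; only the way na.rm reaches mean() changes.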

Example 2: Summarise Specific Columns

The following code shows how to calculate the mean of only the points and rebounds columns:

library(dplyr)

#summarize mean of points and rebounds, grouped by team
df %>%
  group_by(team) %>%
  summarise(across(c(points, rebounds), mean, na.rm = TRUE))

# A tibble: 2 x 3
  team  points rebounds
  <chr>  <dbl>    <dbl>
1 A       91.7     26  
2 B       91       23.7
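When a single function is applied, the summarised columns keep their original names. As a sketch, the .names argument of across() can relabel them. The "mean_{.col}" pattern and the name named_means are illustrative choices, not part of the original example:

```r
library(dplyr)

# rebuild the example data frame so this block is self-contained
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 points=c(99, 90, 86, 88, 95, 90),
                 assists=c(33, 28, 31, 39, 34, 25),
                 rebounds=c(NA, 28, 24, 24, 28, 19))

# prefix each summarised column with "mean_" via a glue-style pattern
named_means <- df %>%
  group_by(team) %>%
  summarise(across(c(points, rebounds),
                   function(x) mean(x, na.rm = TRUE),
                   .names = "mean_{.col}"))

names(named_means)
# "team" "mean_points" "mean_rebounds"
```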

Example 3: Summarise All Numeric Columns

The following code shows how to calculate the mean and standard deviation of every numeric column in the data frame:

library(dplyr)

#summarize mean and standard deviation of all numeric columns
df %>%
  group_by(team) %>%
  summarise(across(where(is.numeric), list(mean=mean, sd=sd), na.rm = TRUE))

# A tibble: 2 x 7
  team  points_mean points_sd assists_mean assists_sd rebounds_mean rebounds_sd
  <chr>       <dbl>     <dbl>        <dbl>      <dbl>         <dbl>       <dbl>
1 A            91.7      6.66         30.7       2.52          26          2.83
2 B            91        3.61         32.7       7.09          23.7        4.51

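The output shows the mean and standard deviation of every numeric variable in the data frame, with each result column named after its source column and statistic (for example, points_mean and points_sd).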

Note that in this example we used the list() function to name the summary statistics we wanted to calculate.
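The same list() pattern extends to other statistics. The sketch below adds a hypothetical third entry, n_obs, that counts the non-missing values in each column. That entry and the name stats_by_team are assumptions made for illustration, and na.rm is set inside each function rather than passed through across():

```r
library(dplyr)

# rebuild the example data frame so this block is self-contained
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 points=c(99, 90, 86, 88, 95, 90),
                 assists=c(33, 28, 31, 39, 34, 25),
                 rebounds=c(NA, 28, 24, 24, 28, 19))

# mean, standard deviation, and count of non-missing values per numeric column
stats_by_team <- df %>%
  group_by(team) %>%
  summarise(across(where(is.numeric),
                   list(mean = function(x) mean(x, na.rm = TRUE),
                        sd = function(x) sd(x, na.rm = TRUE),
                        n_obs = function(x) sum(!is.na(x)))))

stats_by_team$rebounds_n_obs
```

Team A has one missing rebounds value, so its rebounds_n_obs is 2 while team B's is 3.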

Note: Each example uses the dplyr across() function. The complete documentation for this function is available in the dplyr reference.

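Additional Resources

The following tutorials explain how to perform other common tasks with dplyr:

How to Remove Rows Using dplyr
How to Arrange Rows Using dplyr
How to Filter by Multiple Conditions Using dplyr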
