How to Remove Duplicate Rows in R (With Examples)

By Benjamin Anderson, July 23, 2023

You can use one of the following two methods to remove duplicate rows from a data frame in R:

Method 1: Use Base R

#remove duplicate rows across entire data frame
df[!duplicated(df), ]

#remove duplicate rows across specific columns of data frame
df[!duplicated(df[c('var1')]), ]

Method 2: Use dplyr

#remove duplicate rows across entire data frame
df %>%
  distinct(.keep_all = TRUE)

#remove duplicate rows across specific columns of data frame
df %>%
  distinct(var1, .keep_all = TRUE)

The following examples show how to use this syntax in practice with the following data frame:

#define data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

#view data frame
df

  team position
1    A    Guard
2    A    Guard
3    A  Forward
4    B    Guard
5    B   Center
6    B   Center

Example 1: Remove Duplicate Rows Using Base R

The following code shows how to remove duplicate rows from a data frame using base R functions:

#remove duplicate rows from data frame
df[!duplicated(df), ]

  team position
1    A    Guard
3    A  Forward
4    B    Guard
5    B   Center
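
To see why rows 2 and 6 were dropped, it can help to look at the logical vector that duplicated() returns. The sketch below rebuilds the example data frame so it runs on its own, and also compares the result with base R's unique(), which should return the same rows. The names dup_flags and deduped are illustrative only and do not appear in the original.

```r
#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

#flag each row that repeats an earlier row (TRUE = duplicate)
dup_flags <- duplicated(df)
dup_flags

#count the duplicate rows
sum(dup_flags)

#keep only the first occurrence of each row
deduped <- df[!dup_flags, ]

#compare with unique(), which also keeps the first occurrence of each row
all.equal(deduped, unique(df))
```

Only the second and later copies of a row are flagged, which is why the first occurrence always survives.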

The following code shows how to use base R to remove rows that have duplicate values in specific columns of a data frame:

#remove rows where there are duplicates in the 'team' column
df[!duplicated(df[c('team')]), ]

  team position
1    A    Guard
4    B    Guard
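
The example above always keeps the first row for each team. As a hedged variation, the sketch below uses the fromLast argument of duplicated() to keep the last row instead, and passes two column names to deduplicate on a combination of columns. The names last_per_team and by_both are made up for illustration.

```r
#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

#keep the LAST row for each team instead of the first
last_per_team <- df[!duplicated(df['team'], fromLast = TRUE), ]
last_per_team

#treat rows as duplicates only when both 'team' and 'position' match
by_both <- df[!duplicated(df[c('team', 'position')]), ]
by_both
```

With this data, last_per_team holds original rows 3 and 6, and by_both matches the result of deduplicating on the entire data frame because the data frame has only these two columns.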

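Example 2: Remove Duplicate Rows Using dplyr

The following code shows how to remove duplicate rows from a data frame using the distinct() function from the dplyr package: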

library(dplyr)

#remove duplicate rows from data frame
df %>%
  distinct(.keep_all = TRUE)

  team position
1    A    Guard
2    A  Forward
3    B    Guard
4    B   Center

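Note that the .keep_all argument tells R to keep all of the columns from the original data frame. When no columns are passed to distinct(), every column is already used and returned, so .keep_all only changes the result when specific columns are listed.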
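
Before dropping duplicates, it may be useful to check which rows actually repeat. The following is a minimal sketch, assuming dplyr is installed, that uses count() and filter() to list the team and position combinations that occur more than once. The name dup_counts is illustrative and not part of the original tutorial.

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

#count each team/position combination and keep those that appear more than once
dup_counts <- df %>%
  count(team, position) %>%
  filter(n > 1)

dup_counts
```

Here count() adds a column named n holding the number of rows in each group, so filtering on n > 1 leaves only the combinations that distinct() would collapse.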

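The following code shows how to use the distinct() function to remove rows that have duplicate values in specific columns of a data frame: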

library(dplyr)

#remove rows where there are duplicates in the 'team' column
df %>%
  distinct(team, .keep_all = TRUE)

  team position
1    A    Guard
2    B    Guard
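
To make the role of .keep_all more concrete, the sketch below (assuming dplyr is installed) runs distinct() on the same data without .keep_all, and then with two columns listed. The names teams_only and team_position are hypothetical and used only for this illustration.

```r
library(dplyr)

#recreate the example data frame
df <- data.frame(team=c('A', 'A', 'A', 'B', 'B', 'B'),
                 position=c('Guard', 'Guard', 'Forward', 'Guard', 'Center', 'Center'))

#without .keep_all, only the 'team' column is returned
teams_only <- df %>%
  distinct(team)
teams_only

#listing several columns treats rows as duplicates only when all of them match
team_position <- df %>%
  distinct(team, position)
team_position
```

The first call returns one column with two rows, while the second returns the four unique team and position pairs.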

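Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Remove Rows in R Based on a Condition
How to Remove Rows with NA in a Specific Column in R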

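About the Author

Benjamin Anderson

Hi, I'm Benjamin, a retired statistics professor turned dedicated Statorials teacher. With extensive experience and expertise in statistics, I am eager to share my knowledge and empower students through Statorials.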
