Pandas: How to Use GroupBy with a Conditional Count

By Benjamin Anderson, July 18, 2023

You can use the following basic syntax to perform a groupby and a conditional count in a pandas DataFrame:

df.groupby('var1')['var2'].apply(lambda x: (x == 'val').sum()).reset_index(name='count')

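This syntax groups the rows of the DataFrame by var1, then counts the rows in each group where var2 equals 'val'. The comparison x == 'val' produces a boolean Series, and sum() counts its True values.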
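As a minimal sketch of why summing a comparison yields a count, the snippet below applies the same two steps to a standalone Series. The values in `s` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical values, for illustration only
s = pd.Series(['val', 'other', 'val'])

# Step 1: the comparison returns a boolean Series (True, False, True)
mask = s == 'val'

# Step 2: True counts as 1 and False as 0, so the sum is the number of matches
n_matches = mask.sum()

print(n_matches)
```

Inside `groupby(...).apply(...)`, the lambda performs these two steps once per group.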

The following example shows how to use this syntax in practice.

Example: GroupBy and Conditional Count in Pandas

Suppose we have the following pandas DataFrame that contains information about various basketball players:

import pandas as pd

# create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'pos': ['Gu', 'Fo', 'Fo', 'Fo', 'Gu', 'Gu', 'Fo', 'Fo'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28]})

# view DataFrame
print(df)

  team pos  points
0    A  Gu      18
1    A  Fo      22
2    A  Fo      19
3    A  Fo      14
4    B  Gu      14
5    B  Gu      11
6    B  Fo      20
7    B  Fo      28

The following code shows how to group the DataFrame by the team variable and count the rows where the pos variable equals 'Gu':

# group by team and count the rows where 'pos' equals 'Gu'
df_count = df.groupby('team')['pos'].apply(lambda x: (x == 'Gu').sum()).reset_index(name='count')

# view results
print(df_count)

  team  count
0    A      1
1    B      2

From the output we can see:

Team A has 1 row where the pos column equals 'Gu'
Team B has 2 rows where the pos column equals 'Gu'
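The same counts can also be obtained without `apply` and a lambda. The sketch below is an alternative formulation, not the approach used in the example above: it builds the boolean mask once for the whole column and then sums it within each team. The names `is_guard` and `guard_counts` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'pos': ['Gu', 'Fo', 'Fo', 'Fo', 'Gu', 'Gu', 'Fo', 'Fo'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28]})

# Boolean mask over the whole column: True where pos equals 'Gu'
is_guard = df['pos'].eq('Gu')

# Group the mask by team and sum it; each True contributes 1 to the count
guard_counts = is_guard.groupby(df['team']).sum().reset_index(name='count')

print(guard_counts)
```

This form avoids calling a Python function once per group, which may matter on larger DataFrames.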

We can use similar syntax to perform a groupby and count with a numeric condition.

For example, the following code shows how to group by the team variable and count the rows where the points variable is greater than 15:

# group by team and count the rows where 'points' is greater than 15
df_count = df.groupby('team')['points'].apply(lambda x: (x > 15).sum()).reset_index(name='count')

# view results
print(df_count)

  team  count
0    A      3
1    B      2

From the output we can see:

Team A has 3 rows where points is greater than 15
Team B has 2 rows where points is greater than 15
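A conditional count is often easier to interpret next to the group size. The sketch below is one possible way to get both in a single pass, using a helper column and named aggregation. The column names `above_15`, `count`, and `total` are illustrative choices and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'pos': ['Gu', 'Fo', 'Fo', 'Fo', 'Gu', 'Gu', 'Fo', 'Fo'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28]})

# Add a temporary boolean column marking rows with more than 15 points,
# then aggregate it two ways per team: matching rows and total rows
summary = (df.assign(above_15=df['points'] > 15)
             .groupby('team')
             .agg(count=('above_15', 'sum'),
                  total=('above_15', 'size'))
             .reset_index())

print(summary)
```

Here `count` is the number of rows that satisfy the condition and `total` is the number of rows in the group.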

You can use similar syntax to perform a groupby and count based on any condition you'd like.
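As one illustration of a more specific condition, the sketch below counts rows that satisfy two conditions at once: pos equals 'Fo' and points is greater than 15. The combined condition is an assumption chosen for illustration. Each comparison is wrapped in parentheses because `&` binds more tightly than `==` and `>` in Python.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'pos': ['Gu', 'Fo', 'Fo', 'Fo', 'Gu', 'Gu', 'Fo', 'Fo'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28]})

# Combine two element-wise conditions with & (logical AND)
mask = (df['pos'] == 'Fo') & (df['points'] > 15)

# Count the rows per team where both conditions hold
df_count = mask.groupby(df['team']).sum().reset_index(name='count')

print(df_count)
```

Use `|` in place of `&` to count rows where either condition holds.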

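Additional Resources

The following tutorials explain how to perform other common tasks in pandas:

How to Count Unique Values Using Pandas GroupBy
How to Apply a Function to Pandas GroupBy
How to Create a Bar Plot from Pandas GroupBy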
