How to Apply a Function to a pandas GroupBy

By Benjamin Anderson, July 21, 2023

You can use the following basic syntax to combine the groupby() and apply() functions on a pandas DataFrame:

df.groupby('var1').apply(lambda x: some_function(x))
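You don't have to use a `lambda` here. Any named function works, as long as it takes one group (a DataFrame) and returns a value. The minimal sketch below uses a small hypothetical DataFrame with columns `var1` and `value` and a hypothetical helper `value_range`. Neither comes from the examples that follow. The sketch also selects the needed column before calling `apply()`, so the function does not receive the grouping column. Recent pandas versions (2.2 and later) warn when `apply()` operates on the grouping columns.

```python
import pandas as pd

# Hypothetical example data, not taken from the article
df = pd.DataFrame({'var1': ['x', 'x', 'y', 'y', 'y'],
                   'value': [1, 4, 2, 8, 5]})

def value_range(group):
    """Hypothetical helper: max minus min of 'value' within one group."""
    return group['value'].max() - group['value'].min()

# apply() calls value_range once per group of var1 and collects the
# scalar results into a Series indexed by the group labels
result = df.groupby('var1')[['value']].apply(value_range)
print(result)
```

In this data, group `x` has a range of 4 - 1 = 3 and group `y` has a range of 8 - 2 = 6, so `result` holds 3 and 6.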

The following examples show how to use this syntax in practice with the following pandas DataFrame:

import pandas as pd

# create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points_for': [18, 22, 19, 14, 11, 20, 28],
                   'points_against': [14, 21, 19, 14, 12, 20, 21]})

# view DataFrame
print(df)

  team  points_for  points_against
0    A          18              14
1    A          22              21
2    A          19              19
3    B          14              14
4    B          11              12
5    B          20              20
6    B          28              21

Example 1: Use groupby() and apply() to Find Relative Frequencies

The following code shows how to use the groupby() and apply() functions to find the relative frequency of each team name in the DataFrame:

# find relative frequency of each team name in DataFrame
df.groupby('team').apply(lambda x: x['team'].count() / df.shape[0])

team
A    0.428571
B    0.571429
dtype: float64

From the output we can see that team A appears in 42.86% (3 of 7) of all rows and team B appears in 57.14% (4 of 7) of all rows.
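You can cross-check these proportions without `apply()`. The sketch below uses two built-in pandas methods instead of the article's lambda. The first divides each group's size by the total number of rows. The second uses `Series.value_counts(normalize=True)`, which returns the proportions directly. `value_counts` sorts by frequency, so team B comes first in its output.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points_for': [18, 22, 19, 14, 11, 20, 28],
                   'points_against': [14, 21, 19, 14, 12, 20, 21]})

# Rows per team divided by total rows: 3/7 for A, 4/7 for B
rel_freq = df.groupby('team').size() / len(df)

# Same proportions in one call, sorted from most to least frequent
rel_freq_vc = df['team'].value_counts(normalize=True)

print(rel_freq)
print(rel_freq_vc)
```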

Example 2: Use groupby() and apply() to Find Max Values

The following code shows how to use the groupby() and apply() functions to find the max "points_for" value for each team:

# find max "points_for" value for each team
df.groupby('team').apply(lambda x: x['points_for'].max())

team
A    22
B    28
dtype: int64

From the output we can see that the max "points_for" value is 22 for team A and 28 for team B.
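For a simple aggregation like this, you can select the column and call the built-in `max()` on the group. This gives the same per-team values without a Python function per group. The sketch below computes the result both ways so you can compare them.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points_for': [18, 22, 19, 14, 11, 20, 28],
                   'points_against': [14, 21, 19, 14, 12, 20, 21]})

# Built-in aggregation on the selected column
max_builtin = df.groupby('team')['points_for'].max()

# Equivalent apply() version on the same column, for comparison
max_apply = df.groupby('team')['points_for'].apply(lambda s: s.max())

print(max_builtin)
```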

Example 3: Use groupby() and apply() to Perform a Custom Calculation

The following code shows how to use the groupby() and apply() functions to find the mean difference between "points_for" and "points_against" for each team:

# find mean difference between "points_for" and "points_against" for each team
df.groupby('team').apply(lambda x: (x['points_for'] - x['points_against']).mean())

team
A    1.666667
B    1.500000
dtype: float64

From the output we can see that the mean difference between "points_for" and "points_against" is 1.67 for team A and 1.50 for team B.
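As a check on the arithmetic, team A's differences are 4, 1, and 0, with a mean of 5/3 ≈ 1.67. Team B's differences are 0, -1, 0, and 7, with a mean of 6/4 = 1.50.

The sketch below shows two variations, both assumptions rather than the article's own code. The first computes the per-row difference once and averages it within each team, with no function call per group. The second shows where `apply()` becomes more useful: when the function returns a Series, you get one column per statistic. It uses a hypothetical helper, `diff_stats`.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points_for': [18, 22, 19, 14, 11, 20, 28],
                   'points_against': [14, 21, 19, 14, 12, 20, 21]})

# Vectorized alternative: row-wise difference, then mean per team
diff = df['points_for'] - df['points_against']
mean_diff = diff.groupby(df['team']).mean()
print(mean_diff)

def diff_stats(group):
    """Hypothetical helper: several statistics of the point difference."""
    d = group['points_for'] - group['points_against']
    return pd.Series({'mean_diff': d.mean(), 'max_diff': d.max()})

# Returning a Series per group yields a DataFrame with one row per team
stats = df.groupby('team')[['points_for', 'points_against']].apply(diff_stats)
print(stats)
```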

Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Perform a GroupBy Sum in Pandas
How to Use Groupby and Plot in Pandas
How to Count Unique Values Using GroupBy in Pandas
