Pandas: How to Use the Equivalent of R's mutate() Function

By Benjamin Anderson, July 16, 2023

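In the R programming language, we can use the mutate() function from the dplyr package to quickly add new columns to a data frame that are calculated from existing columns.

For example, the following code shows how to calculate the mean of a column within each group in R and add that value as a new column in the data frame: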

library(dplyr)

#create data frame
df <- data.frame(team=c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'),
                 points=c(30, 22, 19, 14, 14, 11, 20, 28))

#add new column that shows mean points by team
df <- df %>%
      group_by(team) %>%
      mutate(mean_points = mean(points))

#view updated data frame
df

  team  points mean_points
1 A         30        21.2
2 A         22        21.2
3 A         19        21.2
4 A         14        21.2
5 B         14        18.2
6 B         11        18.2
7 B         20        18.2
8 B         28        18.2

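For grouped calculations like this one, the pandas equivalent of the mutate() function is the transform() function.

The following example shows how to use this function in practice.

Example: Using transform() in pandas to Replicate mutate() in R

Suppose we have the following pandas DataFrame that shows the points scored by basketball players on different teams: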

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

#view DataFrame
print(df)

  team  points
0    A      30
1    A      22
2    A      19
3    A      14
4    B      14
5    B      11
6    B      20
7    B      28

We can use the transform() function to add a new column named mean_points that shows the mean points scored by each team:

#add new column to DataFrame that shows mean points by team
df['mean_points'] = df.groupby('team')['points'].transform('mean')

#view updated DataFrame
print(df)

  team  points  mean_points
0    A      30        21.25
1    A      22        21.25
2    A      19        21.25
3    A      14        21.25
4    B      14        18.25
5    B      11        18.25
6    B      20        18.25
7    B      28        18.25

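The mean points scored by players on team A is 21.25 and the mean for players on team B is 18.25, so these values are assigned to each player's row in the new column according to team.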
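To illustrate why transform() fits this task better than a plain aggregation, the following minimal sketch contrasts the two on the same data. The variable names per_team and per_row are illustrative only and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

# plain aggregation: one value per team, indexed by team label
per_team = df.groupby('team')['points'].mean()

# transform: one value per original row, indexed like df itself
per_row = df.groupby('team')['points'].transform('mean')

print(per_team.shape)  # (2,)
print(per_row.shape)   # (8,)
```

Because the transformed result keeps the original row index, it can be assigned directly as a new column, which is what makes it behave like a grouped mutate().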

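Note that this matches the result obtained with the mutate() function in the introductory example. The R output displays 21.2 and 18.2 only because tibbles print a limited number of significant digits by default.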
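For readers who prefer the chained style of a dplyr pipeline, one possible variation is to wrap the same transform() call in DataFrame.assign(), which returns a new DataFrame rather than modifying df in place. This is only a sketch of an alternative, not the approach used in the example above, and the name result is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

# assign() passes the DataFrame to the lambda and returns a copy
# with the new column appended; df itself is left unchanged
result = df.assign(
    mean_points=lambda d: d.groupby('team')['points'].transform('mean')
)

print(result)
```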

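It's worth noting that you can also pass a lambda function to transform() to perform custom calculations.

For example, the following code shows how to use a lambda function to calculate each player's share of the total points scored by their team: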

#create new column called percent_of_points
df['percent_of_points'] = df.groupby('team')['points'].transform(lambda x: x / x.sum())

#view updated DataFrame
print(df)

  team points percent_of_points
0 A 30 0.352941
1 A 22 0.258824
2 A 19 0.223529
3 A 14 0.164706
4 B 14 0.191781
5 B 11 0.150685
6 B 20 0.273973
7 B 28 0.383562

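Here is how to interpret the output:

The first player on team A scored 30 of team A's 85 total points, so their share of the total is 30/85 = 0.352941.
The second player on team A scored 22 of team A's 85 total points, so their share of the total is 22/85 = 0.258824.

And so on.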
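The arithmetic above can be checked with a short sketch. It computes the team totals explicitly and confirms that the shares within each team add up to 1. The names team_totals and share_sums are illustrative assumptions, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

# total points of each player's team, repeated on every row of that team
team_totals = df.groupby('team')['points'].transform('sum')

# same calculation as the lambda version, written as a column division
df['percent_of_points'] = df['points'] / team_totals

# the shares within each team should sum to 1
share_sums = df.groupby('team')['percent_of_points'].sum()

print(team_totals.iloc[0])  # 85
print(share_sums)
```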

Note that we can pass a lambda function to transform() to perform whatever custom calculation we need.
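As a further illustration, a named function can be passed in place of a lambda. The sketch below computes how far each player's points are from their team's mean. The function name diff_from_team_mean and the column name diff_from_mean are hypothetical, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [30, 22, 19, 14, 14, 11, 20, 28]})

def diff_from_team_mean(points):
    # points is the Series of values for one team;
    # the result has the same length as the group
    return points - points.mean()

df['diff_from_mean'] = df.groupby('team')['points'].transform(diff_from_team_mean)

print(df)
```

Whatever function is used, it should return either a single value per group or a result with the same length as the group, so that the output can be aligned back to the original rows.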

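Additional Resources

The following tutorials explain how to perform other common operations in pandas:

How to Perform a GroupBy Sum in Pandas
How to Use Groupby and Plot in Pandas
How to Count Unique Values Using GroupBy in Pandas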

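About the Author

Benjamin Anderson

Hello, I'm Benjamin, a retired statistics professor turned dedicated Statorials teacher. With extensive experience and expertise in the field of statistics, I am eager to share my knowledge and empower students through Statorials.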
