Pandas: How to Use describe() by Group

By Benjamin Anderson, July 16, 2023

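You can use the describe() function to generate descriptive statistics for the variables in a pandas DataFrame.

You can use the following basic syntax to combine describe() with groupby() in pandas: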

df.groupby('group_var')['values_var'].describe()

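The following example shows how to use this syntax in practice.

Example: Use describe() by Group in Pandas

Suppose we have the following pandas DataFrame that contains information about basketball players on two different teams: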

import pandas as pd

#create DataFrame
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

#view DataFrame
print(df)

  team  points  assists
0    A       8        2
1    A      12        2
2    A      14        3
3    A      14        5
4    B      15        7
5    B      22        6
6    B      27        8
7    B      24       12

We can use the describe() function together with the groupby() function to summarize the values in the points column for each team:

#summarize points by team
df.groupby('team')['points'].describe()

      count  mean       std   min    25%   50%    75%   max
team
A       4.0  12.0  2.828427   8.0  11.00  13.0  14.00  14.0
B       4.0  22.0  5.099020  15.0  20.25  23.0  24.75  27.0

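From the output we can see the following values of the points variable for each team:

count (number of observations)
mean (mean points value)
std (standard deviation of points values)
min (minimum points value)
25% (25th percentile of points)
50% (50th percentile of points, i.e. the median)
75% (75th percentile of points)
max (maximum points value)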
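To make the meaning of these columns concrete, the sketch below recomputes a few of them one at a time with the corresponding groupby methods and places them next to the describe() output. It reuses the example DataFrame from above; the names grouped, summary, and manual are illustrative only. Note that std here is the sample standard deviation (pandas divides by n - 1 by default).

```python
import pandas as pd

# Same example data as in the tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

grouped = df.groupby('team')['points']

# Full summary table produced by describe()
summary = grouped.describe()

# Recompute some of the same statistics individually
manual = pd.DataFrame({
    'count': grouped.count(),
    'mean': grouped.mean(),
    'std': grouped.std(),            # sample standard deviation (ddof=1)
    '25%': grouped.quantile(0.25),   # linear interpolation by default
    '50%': grouped.median(),
})

print(summary[['count', 'mean', 'std', '25%', '50%']])
print(manual)
```

Both printed tables should show the same numbers, which suggests describe() is a convenient shortcut for calling these aggregations separately.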

If you would like the team labels to appear as a regular column instead of as the index, you can chain the reset_index() method:

#summarize points by team
df.groupby('team')['points'].describe().reset_index()

  team  count  mean       std   min    25%   50%    75%   max
0    A    4.0  12.0  2.828427   8.0  11.00  13.0  14.00  14.0
1    B    4.0  22.0  5.099020  15.0  20.25  23.0  24.75  27.0

The team variable is now a column in the DataFrame, and the index values are 0 and 1.
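The practical difference shows up when you look up a single statistic. The sketch below, which again assumes the example DataFrame from above (the variable names summary, flat, mean_a, and mean_a_flat are illustrative), reads team A's mean from both forms of the result.

```python
import pandas as pd

# Same example data as in the tutorial
df = pd.DataFrame({'team': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'points': [8, 12, 14, 14, 15, 22, 27, 24],
                   'assists': [2, 2, 3, 5, 7, 6, 8, 12]})

summary = df.groupby('team')['points'].describe()
flat = summary.reset_index()

# Without reset_index(): team labels are the row index,
# so a label-based lookup works directly
mean_a = summary.loc['A', 'mean']

# With reset_index(): team is an ordinary column,
# so filter the rows with a boolean mask instead
mean_a_flat = flat.loc[flat['team'] == 'A', 'mean'].iloc[0]

print(mean_a, mean_a_flat)
```

Keeping team as the index is handy for label lookups, while the reset_index() form tends to be easier to merge with other tables or export to a file.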

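Additional Resources

The following tutorials explain how to perform other common operations in pandas:

Pandas: How to Calculate Cumulative Sum by Group
Pandas: How to Count Unique Values by Group
Pandas: How to Calculate Correlation by Group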
