How to Center Data in Python (With Examples)

By Benjamin Anderson, July 21, 2023

Centering a dataset means subtracting the mean of the dataset from each individual observation.

Once a dataset has been centered, its mean is zero.
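
To see why the mean becomes zero, write the observations as x_1, ..., x_n with mean x̄ (notation introduced here only for illustration). The mean of the centered values then works out as follows:

```latex
\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)
  = \frac{1}{n}\sum_{i=1}^{n} x_i - \bar{x}
  = \bar{x} - \bar{x}
  = 0
```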

The following examples show how to center data in Python.

Example 1: Center the Values of a NumPy Array

Suppose we have the following NumPy array:

import numpy as np

# create NumPy array
data = np.array([4, 6, 9, 13, 14, 17, 18, 19, 19, 21])

# display mean of array
print(data.mean())

14.0

We can define a function that subtracts the mean of the array from each individual observation:

# create function to center data
center_function = lambda x: x - x.mean()

# apply function to original NumPy array
data_centered = center_function(data)

# view centered array
print(data_centered)

[-10.  -8.  -5.  -1.   0.   3.   4.   5.   5.   7.]

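The resulting values are the centered values of the dataset.

Since the mean of the original array is 14, the function simply subtracts 14 from each individual value in the original array.

For example:

First value in centered array = 4 - 14 = -10
Second value in centered array = 6 - 14 = -8
Third value in centered array = 9 - 14 = -5

And so on.

We can also verify that the mean of the centered array is zero: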

# display mean of centered array
print(data_centered.mean())

0.0
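
The same idea extends to a two-dimensional array, where you would usually center each column separately. The following is a minimal sketch, assuming a small made-up array named `matrix` (it is not part of the example above). It uses `axis=0` to compute one mean per column and relies on NumPy broadcasting to subtract those means from every row.

```python
import numpy as np

# hypothetical 2-D array: 3 observations (rows), 2 variables (columns)
matrix = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# axis=0 computes one mean per column
column_means = matrix.mean(axis=0)

# broadcasting subtracts each column's mean from every row
matrix_centered = matrix - column_means

print(matrix_centered)
```

Passing `axis=1` instead would compute row means, which would then need `keepdims=True` to broadcast correctly against the rows.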

Example 2: Center the Columns of a Pandas DataFrame

Suppose we have the following pandas DataFrame:

import pandas as pd

# create DataFrame
df = pd.DataFrame({'x': [1, 4, 5, 6, 6, 8, 9],
                   'y': [7, 7, 8, 8, 8, 9, 12],
                   'z': [3, 3, 4, 4, 6, 7, 7]})

# view DataFrame
print(df)

   x   y  z
0  1   7  3
1  4   7  3
2  5   8  4
3  6   8  4
4  6   8  6
5  8   9  7
6  9  12  7

We can use the pandas apply() function to center the values in each column of the DataFrame:

# center the values in each column of the DataFrame
df_centered = df.apply(lambda x: x - x.mean())

# view centered DataFrame
print(df_centered)

          x         y         z
0 -4.571429 -1.428571 -1.857143
1 -1.571429 -1.428571 -1.857143
2 -0.571429 -0.428571 -0.857143
3  0.428571 -0.428571 -0.857143
4  0.428571 -0.428571  1.142857
5  2.428571  0.571429  2.142857
6  3.428571  3.571429  2.142857
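
As an alternative sketch, pandas can do the same centering without apply(): subtracting `df.mean()` from `df` lines up each column mean with its column label. The snippet below rebuilds the same DataFrame so that it runs on its own; the name `df_centered_alt` is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5, 6, 6, 8, 9],
                   'y': [7, 7, 8, 8, 8, 9, 12],
                   'z': [3, 3, 4, 4, 6, 7, 7]})

# df.mean() returns one mean per column; subtraction aligns on column labels
df_centered_alt = df - df.mean()

# matches df.apply(lambda x: x - x.mean()) up to floating-point rounding
print(df_centered_alt.round(6))
```

Both forms give the same centered values; the direct subtraction is simply shorter to write.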

We can then verify that the mean of each column is zero:

# display mean of each column in the DataFrame
print(df_centered.mean())

x    2.537653e-16
y   -2.537653e-16
z    3.806479e-16
dtype: float64

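The column means are displayed in scientific notation, but each value is essentially zero; the tiny nonzero remainders are floating-point rounding error.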
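
Because the means are tiny floating-point values rather than exact zeros, an exact comparison such as `== 0` can fail. One way to check them in code is to compare within a tolerance. This is a minimal sketch using `numpy.allclose` with its default tolerances, with the DataFrame rebuilt so that the snippet is self-contained; the variable name `means_are_zero` is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, 4, 5, 6, 6, 8, 9],
                   'y': [7, 7, 8, 8, 8, 9, 12],
                   'z': [3, 3, 4, 4, 6, 7, 7]})

# center each column by subtracting its own mean
df_centered = df.apply(lambda x: x - x.mean())

# compare every column mean to zero within np.allclose's default tolerance
means_are_zero = np.allclose(df_centered.mean(), 0)

print(means_are_zero)
```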

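Additional Resources

The following tutorials explain how to perform other common operations in Python:

How to Calculate a Trimmed Mean in Python
How to Calculate Mean Squared Error (MSE) in Python
How to Calculate the Mean of Selected Columns in Pandas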
