How to Resample Time Series Data in Python (With Examples)

By Benjamin Anderson, July 22, 2023, in Guide

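Resampling time series data means summarizing or aggregating the data over a new time period, for example turning hourly observations into weekly totals.

We can use the following basic syntax to resample time series data in Python with pandas: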

#find sum of values in column1 by month
#(pandas 2.2 and later prefer the alias 'ME' for month end)
monthly_df['column1'] = df['column1'].resample('M').sum()

#find mean of values in column1 by week
weekly_df['column1'] = df['column1'].resample('W').mean()
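A minimal sketch of the basic syntax on a small, made-up series: the values 1 to 14 on consecutive days starting Monday 2020-01-06 (these data are an assumption for illustration only). It uses 'W' so that it behaves the same on older and newer pandas versions:

```python
import pandas as pd

# Hypothetical data: one value per day for two weeks, starting Monday 2020-01-06
s = pd.Series(range(1, 15), index=pd.date_range('2020-01-06', periods=14, freq='D'))

# Aggregate each calendar week ('W' bins end on Sunday by default)
weekly_sum = s.resample('W').sum()
weekly_mean = s.resample('W').mean()

print(weekly_sum)
print(weekly_mean)
```

With the default 'W' rule each bin runs Monday to Sunday and is labelled by its Sunday, so the two rows here are 2020-01-12 and 2020-01-19.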

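Note that we can resample time series data to various time periods, including:

S: seconds
min: minutes
H: hours
D: day
W: week
M: month
Q: quarter
A: year

In pandas 2.2 and later, several of these aliases are deprecated: use lowercase h and s for hours and seconds, and ME, QE and YE for month, quarter and year ends.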
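As a quick sketch of how the alias changes the bin size, the snippet below resamples a hypothetical daily series of ones covering the first quarter of 2020, so each bin's sum is simply the number of days it contains. It uses the start-of-period aliases 'MS' and 'QS' (bins labelled by the first day of the month or quarter), which sidestep the month-end alias rename in newer pandas; this choice is ours, not the original tutorial's.

```python
import pandas as pd

# Hypothetical daily series of ones covering Q1 2020 (a leap year)
daily = pd.Series(1, index=pd.date_range('2020-01-01', '2020-03-31', freq='D'))

# 'W': weekly bins labelled by their Sunday
# 'MS': monthly bins labelled by the first day of the month
# 'QS': quarterly bins labelled by the first day of the quarter
weekly = daily.resample('W').sum()
monthly = daily.resample('MS').sum()
quarterly = daily.resample('QS').sum()

print(monthly)    # number of days in each month
print(quarterly)  # number of days in the quarter
```

The coarser the alias, the fewer rows come back, while the total over all bins stays the same.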

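The following example shows how to resample time series data in practice.

Example: Resample Time Series Data in Python

Suppose we have the following pandas DataFrame that shows the total sales made each hour by a company over the course of one year: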

import pandas as pd
import numpy as np

#make this example reproducible
np.random.seed(0)

#create DataFrame with hourly index
df = pd.DataFrame(index=pd.date_range('2020-01-06', '2020-12-27', freq='h'))

#add column to show sales by hour
df['sales'] = np.random.randint(low=0, high=20, size=len(df.index))

#view first five rows of DataFrame
df.head()

                     sales
2020-01-06 00:00:00     12
2020-01-06 01:00:00     15
2020-01-06 02:00:00      0
2020-01-06 03:00:00      3
2020-01-06 04:00:00      3
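The example above keeps the timestamps in the index, which is what resample() expects by default. If your timestamps live in an ordinary column instead, a sketch like the following (with a hypothetical column named 'timestamp' and randomly generated sales) shows two equivalent options: pass the column name through the on= parameter, or move it into the index with set_index() first.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the example data: timestamps stored in a regular
# column called 'timestamp' rather than in the index (two full weeks, hourly)
rng = np.random.default_rng(0)
long_df = pd.DataFrame({
    'timestamp': pd.date_range('2020-01-06', '2020-01-19 23:00', freq='h'),
})
long_df['sales'] = rng.integers(low=0, high=20, size=len(long_df))

# Option 1: tell resample() which column holds the timestamps
weekly_on = long_df.resample('W', on='timestamp')['sales'].sum()

# Option 2: move the timestamps into the index, then resample as usual
weekly_idx = long_df.set_index('timestamp')['sales'].resample('W').sum()

print(weekly_on)
```

Both options produce the same weekly totals; on= is handy when you want to keep the timestamp column in place.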

To visualize the sales data, we can create a line plot:

import matplotlib.pyplot as plt

#plot time series data
plt.plot(df.index, df.sales, linewidth=3)
plt.show()

This chart is hard to interpret, so we can instead summarize the sales data by week:

#create new DataFrame
weekly_df = pd.DataFrame()

#create 'sales' column that summarizes total sales by week
weekly_df['sales'] = df['sales'].resample('W').sum()

#view first five rows of DataFrame
weekly_df.head()

            sales
2020-01-12   1519
2020-01-19   1589
2020-01-26   1540
2020-02-02   1562
2020-02-09   1614

This new DataFrame shows the total sales for each week.

We can then create a time series plot using the weekly data:
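One detail worth checking is how the weekly rows are labelled. A minimal sketch, assuming a hypothetical series of ones with one entry per hour from Monday 2020-01-06 through Sunday 2020-01-12: 'W' is shorthand for 'W-SUN', so each bin ends on a Sunday and takes that Sunday as its label.

```python
import pandas as pd

# Hypothetical hourly series of ones covering exactly one Monday-to-Sunday week
hours = pd.date_range('2020-01-06 00:00', '2020-01-12 23:00', freq='h')
ones = pd.Series(1, index=hours)

# 'W' means 'W-SUN': each bin ends on Sunday and is labelled by that Sunday
weekly = ones.resample('W').sum()

print(weekly)                 # one row labelled 2020-01-12
print(weekly.index.freqstr)   # the anchored weekly frequency
```

So the row labelled 2020-01-12 in weekly_df holds the sales from Monday 2020-01-06 through Sunday 2020-01-12.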

import matplotlib.pyplot as plt

#plot weekly sales data
plt.plot(weekly_df.index, weekly_df.sales, linewidth=3)
plt.show()
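To compare the two views directly, a sketch along these lines draws the hourly and weekly series in two stacked panels. It rebuilds the same example data so it runs on its own; the Agg backend, figure size and panel titles are our own assumptions, and on a desktop you can drop the matplotlib.use('Agg') line and call plt.show() or fig.savefig(...) instead.

```python
import matplotlib
matplotlib.use('Agg')  # off-screen rendering so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild the example data from the tutorial
np.random.seed(0)
df = pd.DataFrame(index=pd.date_range('2020-01-06', '2020-12-27', freq='h'))
df['sales'] = np.random.randint(low=0, high=20, size=len(df.index))
weekly_df = pd.DataFrame()
weekly_df['sales'] = df['sales'].resample('W').sum()

# Two stacked panels sharing the date axis: hourly on top, weekly totals below
fig, (ax_hourly, ax_weekly) = plt.subplots(2, 1, figsize=(10, 6), sharex=True)
ax_hourly.plot(df.index, df['sales'], linewidth=1)
ax_hourly.set_title('Hourly sales')
ax_weekly.plot(weekly_df.index, weekly_df['sales'], linewidth=3)
ax_weekly.set_title('Weekly sales (sum)')
ax_weekly.set_xlabel('Date')
fig.tight_layout()
```

Sharing the x-axis makes it easy to see that each weekly point summarizes a whole week of the noisy hourly line above it.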

This chart is much easier to read because it shows sales data for only 51 individual weeks instead of the 8,545 individual hours plotted in the first chart.
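The 8,545 and 51 figures can be checked directly. The date range runs 356 days from 2020-01-06 00:00 to 2020-12-27 00:00, which gives 356 × 24 = 8,544 hourly steps plus one because both endpoints are included. In this sketch the final weekly bin (labelled 2020-12-27) is only partly filled, because the data stop at midnight on that Sunday.

```python
import pandas as pd

# Same hourly index as in the example
hours = pd.date_range('2020-01-06', '2020-12-27', freq='h')
print(len(hours))  # 356 days * 24 hours + 1 (both endpoints included)

# Count how many hours fall into each weekly bin
hours_per_week = pd.Series(1, index=hours).resample('W').sum()
print(len(hours_per_week))      # number of weekly bins
print(hours_per_week.iloc[0])   # first bin: a full Monday-to-Sunday week
print(hours_per_week.iloc[-1])  # last bin: Monday 2020-12-21 to Sunday 00:00 only
```

Fifty full weeks of 168 hours plus a final bin of 145 hours account for all 8,545 rows.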

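Note: In this example we summarized the sales data by week, but we could also summarize it by month or by quarter if we wanted to plot even fewer data points.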
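A sketch of the monthly and quarterly versions, rebuilding the same example data. It uses 'MS' and 'QS' (bins labelled by the first day of each month or quarter) rather than 'M' and 'Q', since the month-end and quarter-end aliases were renamed to 'ME' and 'QE' in pandas 2.2; pick whichever labelling convention suits your plot.

```python
import numpy as np
import pandas as pd

# Rebuild the example data from the tutorial
np.random.seed(0)
df = pd.DataFrame(index=pd.date_range('2020-01-06', '2020-12-27', freq='h'))
df['sales'] = np.random.randint(low=0, high=20, size=len(df.index))

# Total sales per month and per quarter, labelled by the start of each period
monthly_df = df[['sales']].resample('MS').sum()
quarterly_df = df[['sales']].resample('QS').sum()

print(monthly_df)    # 12 rows, one per month
print(quarterly_df)  # 4 rows, one per quarter
```

Because resampling only regroups the rows, the monthly and quarterly totals add up to the same grand total as the hourly data.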

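Additional Resources

The following tutorials explain how to perform other common tasks in Python:

How to Plot a Time Series in Matplotlib
How to Plot a Time Series in Seaborn
How to Calculate MAPE for a Time Series in Python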

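About the Author

Benjamin Anderson

Hi, I'm Benjamin, a retired statistics professor turned dedicated Statorials teacher. With extensive experience and expertise in statistics, I'm eager to share my knowledge and help students through Statorials.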
