How to Save a Pandas DataFrame for Later Use (With Example)

By Benjamin Anderson, July 18, 2023

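Often you may want to save a pandas DataFrame for later use without having to re-import the data from a CSV file.

The easiest way to do this is to use to_pickle() to save the DataFrame as a pickle file: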

df.to_pickle("my_data.pkl")

This saves the DataFrame to a file in your current working directory.

You can then use read_pickle() to quickly read the DataFrame back from the pickle file:

df = pd.read_pickle("my_data.pkl")
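Both functions also accept a full path instead of a bare file name. The minimal sketch below prints the current working directory (where a relative name such as "my_data.pkl" would end up) and then saves to an explicit location. The temporary folder and the small `demo` DataFrame are assumptions made for illustration and are not part of the original example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# a relative file name such as "my_data.pkl" is resolved against this directory
print(Path.cwd())

# hypothetical DataFrame used only for this sketch
demo = pd.DataFrame({"team": ["A", "B"], "points": [18, 22]})

# save to an explicit path (here a temporary folder) instead of the working directory
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "my_data.pkl"
    demo.to_pickle(pickle_path)
    file_was_created = pickle_path.exists()

    # read the DataFrame back from the same path
    restored = pd.read_pickle(pickle_path)

print(file_was_created)
```

Writing to an explicit path avoids surprises when a script is launched from a different directory than expected.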

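The following example shows how to use these functions in practice.

Example: Save and Load a Pandas DataFrame

Suppose we create the following pandas DataFrame that contains information about various basketball teams: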

import pandas as pd

# create DataFrame
df = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'points': [18, 22, 19, 14, 14, 11, 20, 28],
                   'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                   'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12

We can use df.info() to display the data type of each column in the DataFrame:

# view DataFrame info (df.info() prints its report directly and returns None)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   team      8 non-null      object
 1   points    8 non-null      int64
 2   assists   8 non-null      int64
 3   rebounds  8 non-null      int64
dtypes: int64(3), object(1)
memory usage: 292.0+ bytes

We can use the to_pickle() function to save this DataFrame to a pickle file with a .pkl extension:

# save DataFrame to pickle file
df.to_pickle("my_data.pkl")

Our DataFrame is now saved as a pickle file in the current working directory.

We can then use the read_pickle() function to quickly read the DataFrame back in:

# read DataFrame from pickle file
df = pd.read_pickle("my_data.pkl")

# view DataFrame
print(df)

  team  points  assists  rebounds
0    A      18        5        11
1    B      22        7         8
2    C      19        7        10
3    D      14        9         6
4    E      14       12         6
5    F      11        9         5
6    G      20        9         9
7    H      28        4        12

We can use df.info() again to confirm that the data type of each column is the same as before:

# view DataFrame info (df.info() prints its report directly and returns None)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8 entries, 0 to 7
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   team      8 non-null      object
 1   points    8 non-null      int64
 2   assists   8 non-null      int64
 3   rebounds  8 non-null      int64
dtypes: int64(3), object(1)
memory usage: 292.0+ bytes
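Comparing two df.info() printouts by eye works for a small table. As a programmatic alternative, the sketch below checks the round trip with `DataFrame.equals()` and `pd.testing.assert_frame_equal()`, two pandas functions that the example above does not use. Writing to a temporary folder is an assumption made so that the sketch cleans up after itself.

```python
import tempfile
from pathlib import Path

import pandas as pd

# same data as in the example above
original = pd.DataFrame({'team': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                         'points': [18, 22, 19, 14, 14, 11, 20, 28],
                         'assists': [5, 7, 7, 9, 12, 9, 9, 4],
                         'rebounds': [11, 8, 10, 6, 6, 5, 9, 12]})

# save and reload through a pickle file in a temporary folder
with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "my_data.pkl"
    original.to_pickle(pickle_path)
    loaded = pd.read_pickle(pickle_path)

# raises AssertionError if values, dtypes, index, or column labels differ
pd.testing.assert_frame_equal(original, loaded)

# True when both DataFrames hold the same values with matching dtypes
print(original.equals(loaded))
```

If the two DataFrames differed, `assert_frame_equal()` would raise an error describing the first mismatch, which is more informative than a bare True or False.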

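The advantage of using a pickle file is that the data type of each column is preserved when we save and load the DataFrame.

This is an advantage over saving and loading a CSV file. A CSV file stores every value as plain text, so the data types have to be inferred again when the file is read, whereas a pickle file restores the DataFrame in its original state with no conversion step.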
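The difference is easiest to see with columns whose types are not plain numbers. The following sketch is an illustration under stated assumptions: the `games` DataFrame, its `game_date` and categorical `team` columns, and the file names are all hypothetical and do not appear in the example above. It saves the same data both ways and prints the dtypes that come back.

```python
import tempfile
from pathlib import Path

import pandas as pd

# hypothetical DataFrame with a categorical column and a datetime column
games = pd.DataFrame({
    "team": pd.Categorical(["A", "B", "A"]),
    "game_date": pd.to_datetime(["2023-01-05", "2023-01-12", "2023-01-19"]),
    "points": [18, 22, 19],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "games.csv"
    pkl_path = Path(tmp_dir) / "games.pkl"

    # save the same data in both formats
    games.to_csv(csv_path, index=False)
    games.to_pickle(pkl_path)

    # load both files back with default settings
    from_csv = pd.read_csv(csv_path)
    from_pickle = pd.read_pickle(pkl_path)

# the CSV round trip no longer has the category and datetime dtypes
print(from_csv.dtypes)

# the pickle round trip reports the same dtypes as the original
print(from_pickle.dtypes)
```

With read_csv(), the dtypes could be restored by hand through parameters such as `parse_dates` and `dtype`; the pickle round trip needs no such step.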

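Additional Resources

The following tutorials explain how to fix other common errors in Python:

How to Fix KeyError in Pandas
How to Fix: ValueError: cannot convert float NaN to integer
How to Fix: ValueError: operands could not be broadcast together with shapes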

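About the Author

Benjamin Anderson

Hello, I'm Benjamin, a retired statistics professor turned dedicated Statorials teacher. With extensive experience and expertise in statistics, I am eager to share my knowledge and empower students through Statorials.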
