How to Get a Regression Model Summary from scikit-learn

By Benjamin Anderson | July 19, 2023 | Guides

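Often you may want to extract a summary of a regression model that you created with scikit-learn in Python.

Unfortunately, scikit-learn offers few built-in functions for analyzing a regression model, because the library is designed mainly for prediction rather than statistical inference.

So, if you want a summary of a regression model in Python, you have two options:

1. Use the limited functions from scikit-learn.

2. Use statsmodels instead.

The following examples show how to use each method in practice with the following pandas DataFrame: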

import pandas as pd

#create DataFrame
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                   'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})

#view first five rows of DataFrame
df.head()

   x1  x2   y
0   1   1  76
1   2   3  78
2   2   3  85
3   4   5  88
4   2   2  72

Method 1: Get Regression Model Summary from Scikit-Learn

We can use the following code to fit a multiple linear regression model using scikit-learn:

from sklearn.linear_model import LinearRegression

#initiate linear regression model
model = LinearRegression()

#define predictor and response variables
X, y = df[['x1', 'x2']], df['y']

#fit regression model
model.fit(X, y)

We can then use the following code to extract the regression coefficients of the model along with its R-squared value:

#display intercept, regression coefficients and R-squared value of model
print(model.intercept_, model.coef_, model.score(X, y))

70.4828205704 [5.7945 -1.1576] 0.766742556527

Using this output, we can write the equation for the fitted regression model:

y = 70.48 + 5.79x₁ − 1.16x₂
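To make the equation concrete, here is a minimal sketch that evaluates it by hand for one observation and compares the result with `model.predict`. The new observation (x1 = 3, x2 = 2) is a hypothetical value chosen only for illustration; it is not part of the original data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same data as the example above
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                   'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})
X, y = df[['x1', 'x2']], df['y']
model = LinearRegression().fit(X, y)

# Hypothetical new observation (illustrative values only)
new_obs = pd.DataFrame({'x1': [3], 'x2': [2]})

# Prediction from the fitted equation: intercept + b1*x1 + b2*x2
b0 = model.intercept_
b1, b2 = model.coef_
manual = b0 + b1 * 3 + b2 * 2

# Prediction from scikit-learn itself
predicted = model.predict(new_obs)[0]

print(manual, predicted)
```

Both numbers should agree up to floating-point error, since `predict` applies the same linear equation.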

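We can also see that the R² value of the model is 0.7667.

This means that 76.67% of the variation in the response variable can be explained by the two predictor variables in the model.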
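The "variation explained" interpretation can be checked directly. The sketch below recomputes R² as one minus the ratio of the residual sum of squares to the total sum of squares, which is the quantity `LinearRegression.score` reports for regressors. The variable names `ss_res` and `ss_tot` are chosen here for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same data as the example above
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                   'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})
X, y = df[['x1', 'x2']], df['y']
model = LinearRegression().fit(X, y)

# Variation left unexplained by the model (residual sum of squares)
residuals = y - model.predict(X)
ss_res = float((residuals ** 2).sum())

# Total variation of the response around its mean
ss_tot = float(((y - y.mean()) ** 2).sum())

# Share of the total variation explained by the predictors
r_squared = 1 - ss_res / ss_tot

print(r_squared, model.score(X, y))
```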

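Although this output is useful, we still don't know the overall F-statistic of the model, the p-values of the individual regression coefficients, or other useful metrics that can help us understand how well the model fits the dataset.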
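scikit-learn does not report these quantities, but they can be derived by hand from the fitted model. The following is a rough sketch, assuming the usual ordinary least squares formulas with homoskedastic errors and using NumPy and SciPy (neither appears in the original example). It is meant to show where the numbers come from, not to replace a dedicated statistics package.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

# Same data as the example above
df = pd.DataFrame({'x1': [1, 2, 2, 4, 2, 1, 5, 4, 2, 4, 4],
                   'x2': [1, 3, 3, 5, 2, 2, 1, 1, 0, 3, 4],
                   'y': [76, 78, 85, 88, 72, 69, 94, 94, 88, 92, 90]})
X, y = df[['x1', 'x2']], df['y']
model = LinearRegression().fit(X, y)

n, k = X.shape                      # observations, predictors
df_resid = n - k - 1                # residual degrees of freedom

# Residual and total sums of squares
residuals = y.to_numpy() - model.predict(X)
ss_res = residuals @ residuals
ss_tot = ((y - y.mean()) ** 2).sum()

# Overall F-statistic and its p-value
f_stat = ((ss_tot - ss_res) / k) / (ss_res / df_resid)
f_pvalue = stats.f.sf(f_stat, k, df_resid)

# Standard errors from the estimated covariance matrix of the coefficients
design = np.column_stack([np.ones(n), X.to_numpy()])   # add intercept column
cov = (ss_res / df_resid) * np.linalg.inv(design.T @ design)
std_err = np.sqrt(np.diag(cov))

# t-statistics and two-sided p-values for intercept, x1, x2
params = np.r_[model.intercept_, model.coef_]
t_stats = params / std_err
p_values = 2 * stats.t.sf(np.abs(t_stats), df_resid)

print(f_stat, f_pvalue)
print(p_values)
```

Writing this by hand is error-prone, which is the main reason to prefer a package that produces the full summary directly.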

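Method 2: Get Regression Model Summary from Statsmodels

If you want to extract a summary of a regression model in Python, you're better off using the statsmodels package.

The following code shows how to use this package to fit the same multiple linear regression model as in the previous example and extract the model summary: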

import statsmodels.api as sm

#define response variable
y = df['y']

#define predictor variables
x = df[['x1', 'x2']]

#add constant (intercept column) to predictor variables
x = sm.add_constant(x)

#fit linear regression model
model = sm.OLS(y, x).fit()

#view model summary
print(model.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.767
Model:                            OLS   Adj. R-squared:                  0.708
Method:                 Least Squares   F-statistic:                     13.15
Date:                Fri, 01 Apr 2022   Prob (F-statistic):            0.00296
Time:                        11:10:16   Log-Likelihood:                -31.191
No. Observations:                  11   AIC:                             68.38
Df Residuals:                       8   BIC:                             69.57
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         70.4828      3.749     18.803      0.000      61.839      79.127
x1             5.7945      1.132      5.120      0.001       3.185       8.404
x2            -1.1576      1.065     -1.087      0.309      -3.613       1.298
==============================================================================
Omnibus:                        0.198   Durbin-Watson:                   1.240
Prob(Omnibus):                  0.906   Jarque-Bera (JB):                0.296
Skew:                          -0.242   Prob(JB):                        0.862
Kurtosis:                       2.359   Cond. No.                         10.7
==============================================================================

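Notice that the regression coefficients and the R-squared value match those calculated by scikit-learn, but we're also given many other useful metrics for the regression model.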

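For example, we can see the p-values for each individual predictor variable:

p-value for x₁ = 0.001
p-value for x₂ = 0.309

We can also see the overall F-statistic of the model, the adjusted R-squared value, the AIC value of the model, and much more.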

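Additional Resources

The following tutorials explain how to perform other common operations in Python:

How to Perform Simple Linear Regression in Python
How to Perform Multiple Linear Regression in Python
How to Calculate AIC of Regression Models in Python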

