How to Calculate Residual Standard Error in R

By Benjamin Anderson, July 27, 2023

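Whenever we fit a linear regression model in R, the model takes the following form:

Y = β₀ + β₁X₁ + … + βₖXₖ + ϵ

where ϵ is an error term that is independent of the predictors X₁, …, Xₖ.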

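No matter how well X can be used to predict the values of Y, there will always be some random error in the model. One way to measure the dispersion of this random error is the residual standard error, which estimates the standard deviation of the residuals ϵ.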
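To make this concrete, here is a minimal sketch using simulated data. The sample size, coefficients, and error standard deviation are illustrative assumptions, not values from this tutorial. Because the errors are generated with a known standard deviation of 2, the residual standard error of the fitted model should land close to 2.

```r
# Simulate data from a known linear model (all values are illustrative)
set.seed(1)
n <- 500
x <- runif(n, 0, 10)
eps <- rnorm(n, mean = 0, sd = 2)   # true error standard deviation is 2
y <- 1 + 3 * x + eps

# Fit the regression model
fit <- lm(y ~ x)

# Extract the residual standard error; it should be close to 2
sigma(fit)
```

The built-in `sigma()` function returns the residual standard error of a fitted `lm` object directly.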

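The residual standard error of a regression model is calculated as:

Residual standard error = √(SS_residuals / df_residuals)

where:

SS_residuals: the residual sum of squares.
df_residuals: the residual degrees of freedom, calculated as n – k – 1, where n = total number of observations and k = number of predictor variables (the model coefficients excluding the intercept).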

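There are three methods we can use to calculate the residual standard error of a regression model in R.

Method 1: Analyze the Model Summary

The first way to obtain the residual standard error is to fit a linear regression model and then use the summary() command to view the model results. Then look for "Residual standard error" near the bottom of the output: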

#load built-in mtcars dataset
data(mtcars)

#fit regression model
model <- lm(mpg ~ disp + hp, data = mtcars)

#view model summary
summary(model)

Call:
lm(formula = mpg ~ disp + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.7945 -2.3036 -0.8246  1.8582  6.9363 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.735904   1.331566  23.083  < 2e-16 ***
disp        -0.030346   0.007405  -4.098 0.000306 ***
hp          -0.024840   0.013385  -1.856 0.073679 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.127 on 29 degrees of freedom
Multiple R-squared: 0.7482, Adjusted R-squared: 0.7309 
F-statistic: 43.09 on 2 and 29 DF, p-value: 2.062e-09

We can see that the residual standard error is 3.127.
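Reading the value off the printed summary works, but it can also be pulled out programmatically. The sketch below uses the `sigma` element of the summary object and the `sigma()` extractor, and confirms the 29 degrees of freedom reported above (n – k – 1 = 32 – 2 – 1). The variable names `rse`, `rse_alt`, and `df_res` are chosen here for illustration.

```r
data(mtcars)
model <- lm(mpg ~ disp + hp, data = mtcars)

# Residual standard error stored in the summary object
rse <- summary(model)$sigma

# Equivalent extractor function from the stats package
rse_alt <- sigma(model)

# Residual degrees of freedom: n - k - 1 = 32 - 2 - 1 = 29
df_res <- df.residual(model)

round(rse, 3)   # 3.127
df_res          # 29
```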

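Method 2: Use a Simple Formula

Another way to obtain the residual standard error (RSE) is to fit a linear regression model and then calculate the RSE with the following formula:

sqrt(deviance(model)/df.residual(model))

Here is how to implement this formula in R: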

#load built-in mtcars dataset
data(mtcars)

#fit regression model
model <- lm(mpg ~ disp + hp, data = mtcars)

#calculate residual standard error
sqrt(deviance(model)/df.residual(model))

[1] 3.126601

We can see that the residual standard error is 3.126601.
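This formula works because, for a model fitted with `lm()`, `deviance()` returns the residual sum of squares. A quick check, sketched here with the illustrative variable name `rss`, confirms that the two quantities match:

```r
data(mtcars)
model <- lm(mpg ~ disp + hp, data = mtcars)

# Residual sum of squares computed directly from the residuals
rss <- sum(residuals(model)^2)

# For an lm fit, deviance() should agree with the residual sum of squares
all.equal(deviance(model), rss)
```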

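Method 3: Use a Step-by-Step Formula

A third way to obtain the residual standard error is to fit a linear regression model and then calculate each individual component of the RSE formula step by step: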

#load built-in mtcars dataset
data(mtcars)

#fit regression model
model <- lm(mpg ~ disp + hp, data = mtcars)

#calculate the number of predictors (model coefficients minus the intercept)
k <- length(model$coefficients) - 1

#calculate sum of squared residuals
SSE <- sum(model$residuals^2)

#calculate total observations in dataset
n <- length(model$residuals)

#calculate residual standard error
sqrt(SSE/(n-(1+k)))

[1] 3.126601

We can see that the residual standard error is 3.126601.
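If the step-by-step calculation is needed more than once, it can be wrapped in a small function. The helper below, `residual_se()`, is a hypothetical name introduced for illustration and is not part of base R. It assumes a full-rank `lm` fit that includes an intercept, so that the number of predictors equals the number of coefficients minus one. The last two lines check that it agrees with the other two methods.

```r
# Hypothetical helper: residual standard error of a fitted lm object.
# Assumes a full-rank model with an intercept.
residual_se <- function(model) {
  res <- residuals(model)
  n <- length(res)                  # number of observations
  k <- length(coef(model)) - 1      # number of predictors
  sqrt(sum(res^2) / (n - k - 1))
}

data(mtcars)
model <- lm(mpg ~ disp + hp, data = mtcars)

# Compare against Method 1 and Method 2
all.equal(residual_se(model), summary(model)$sigma)
all.equal(residual_se(model), sqrt(deviance(model) / df.residual(model)))
```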

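How to Interpret the Residual Standard Error

As mentioned earlier, the residual standard error (RSE) is a way to measure the standard deviation of the residuals in a regression model.

The lower the RSE, the better the model fits the data (but beware of overfitting). Because the RSE is expressed in the same units as the response variable, it can be a useful metric when comparing two or more models fitted to the same response to determine which one fits the data best.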
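As a rough sketch of such a comparison, the code below fits two candidate models for mpg and collects their RSE values. The second model, which adds wt as a predictor, is an illustrative choice made here and not a model from this tutorial. A lower RSE on the data used for fitting does not by itself guard against overfitting, so treat the result as one input among several.

```r
data(mtcars)

# Two candidate models for mpg; the second adds weight (wt) as a predictor
model_small <- lm(mpg ~ disp + hp, data = mtcars)
model_large <- lm(mpg ~ disp + hp + wt, data = mtcars)

# Collect the residual standard error of each model
rse <- c(small = sigma(model_small), large = sigma(model_large))
rse

# Name of the model with the lowest RSE
names(which.min(rse))
```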

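Additional Resources

How to Interpret Residual Standard Error
How to Perform Multiple Linear Regression in R
How to Cross-Validate Model Performance in R
How to Calculate Standard Deviation in R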
