How to Perform a Likelihood Ratio Test in Python

By Benjamin Anderson, July 22, 2023

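A likelihood ratio test compares the goodness of fit of two nested regression models.

A nested model is simply one that contains a subset of the predictor variables in the overall regression model.

For example, suppose we have the following regression model with four predictor variables: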

Y = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + β₄x₄ + ε

One example of a nested model is the following model, which keeps only two of the original predictor variables:

Y = β₀ + β₁x₁ + β₂x₂ + ε

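To determine whether these two models differ significantly, we can perform a likelihood ratio test with the following null and alternative hypotheses:

H₀: The full model and the nested model fit the data equally well. Thus, you should use the nested model.

Hₐ: The full model fits the data significantly better than the nested model. Thus, you should use the full model.

If the p-value of the test is below a chosen significance level (e.g. 0.05), we can reject the null hypothesis and conclude that the full model offers a significantly better fit.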
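For reference, the test statistic behind these hypotheses can be written as follows. The symbols here are our own notation, not taken from any library: ℓ_full and ℓ_reduced are the maximized log-likelihoods of the two models, and k is the number of extra predictors in the full model. The chi-square distribution is the usual large-sample approximation, so treat it as approximate in small samples.

```latex
\[
\mathrm{LR} \;=\; -2\,\bigl(\ell_{\text{reduced}} - \ell_{\text{full}}\bigr)
\;\overset{\text{approx.}}{\sim}\; \chi^2_{k}
\quad \text{under } H_0,
\qquad
p\text{-value} \;=\; P\bigl(\chi^2_{k} \ge \mathrm{LR}\bigr).
\]
```

Because the full model can never fit worse than the nested one, the statistic is non-negative, and larger values count as stronger evidence against H₀.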

The following step-by-step example shows how to perform a likelihood ratio test in Python.

Step 1: Load the Data

In this example, we'll use the mtcars dataset to fit the following two regression models in Python:

Full model: mpg = β₀ + β₁disp + β₂carb + β₃hp + β₄cyl

Reduced model: mpg = β₀ + β₁disp + β₂carb

First, we'll load the dataset:

import statsmodels.api as sm
import pandas as pd
import scipy.stats

#define URL where dataset is located
url = "https://raw.githubusercontent.com/Statorials/Python-Guides/main/mtcars.csv"

#read in data
data = pd.read_csv(url)


Step 2: Fit the Regression Models

First, we'll fit the full model and calculate its log-likelihood:

#define response variable
y1 = data['mpg']

#define predictor variables
x1 = data[['disp', 'carb', 'hp', 'cyl']]

#add constant to predictor variables
x1 = sm.add_constant(x1)

#fit regression model
full_model = sm.OLS(y1, x1).fit()

#calculate log-likelihood of model
full_ll = full_model.llf

print(full_ll)

-77.55789711787898
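To make the log-likelihood value less of a black box, here is a minimal sketch of what a Gaussian log-likelihood for a least-squares fit looks like. It assumes normally distributed errors with the variance estimated as SSR / n, which to our understanding is the convention behind the value reported above. The six data points are made up for illustration and are not from mtcars.

```python
import math

# Toy data, invented for illustration only (not from mtcars)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
n = len(x)

# Closed-form least-squares fit of y = intercept + slope * x
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar

# Residuals and residual sum of squares (SSR)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ssr = sum(r * r for r in residuals)

# Maximum likelihood estimate of the error variance
sigma2_mle = ssr / n

# Closed-form Gaussian log-likelihood at the fitted parameters
ll_closed = -n / 2 * (math.log(2 * math.pi) + math.log(ssr / n) + 1)

# Same quantity as a direct sum of normal log-densities of the residuals
ll_direct = sum(
    -0.5 * math.log(2 * math.pi * sigma2_mle) - r * r / (2 * sigma2_mle)
    for r in residuals
)

print(ll_closed, ll_direct)
```

The two numbers agree up to floating-point error. A smaller SSR gives a larger log-likelihood, which is why adding predictors can only raise it.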

Next, we'll fit the reduced model and calculate its log-likelihood:

#define response variable
y2 = data['mpg']

#define predictor variables
x2 = data[['disp', 'carb']]

#add constant to predictor variables
x2 = sm.add_constant(x2)

#fit regression model
reduced_model = sm.OLS(y2, x2).fit()

#calculate log-likelihood of model
reduced_ll = reduced_model.llf

print(reduced_ll)

-78.60301334355185

Step 3: Perform the Likelihood Ratio Test

Next, we'll use the following code to perform the likelihood ratio test:

#calculate likelihood ratio Chi-Squared test statistic
LR_statistic = -2 * (reduced_ll - full_ll)

print(LR_statistic)

2.0902324513457415

#calculate p-value of test statistic using 2 degrees of freedom
p_val = scipy.stats.chi2.sf(LR_statistic, 2)

print(p_val)

0.35165094613502257

From the output, we can see that the Chi-Squared test statistic is 2.0902 and the corresponding p-value is 0.3517.
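As a quick sanity check, these two numbers can be reproduced with the standard library alone. The sketch below relies on the fact that a chi-square distribution with exactly 2 degrees of freedom has the survival function exp(-x / 2). This shortcut applies only to the 2-degree case, and the log-likelihood values are copied from the output in Step 2.

```python
import math

# Log-likelihood values printed in Step 2 of this tutorial
full_ll = -77.55789711787898
reduced_ll = -78.60301334355185

# Likelihood ratio test statistic
lr_statistic = -2 * (reduced_ll - full_ll)

# For 2 degrees of freedom only, the chi-square survival
# function has the closed form exp(-x / 2)
p_value = math.exp(-lr_statistic / 2)

print(lr_statistic)
print(p_value)
```

For any other number of degrees of freedom there is no such shortcut, and a routine like scipy.stats.chi2.sf is the practical choice.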

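Since this p-value is not less than 0.05, we fail to reject the null hypothesis.

This means we have no evidence that the full model fits the data better than the nested model. We should therefore use the nested model, because the additional predictor variables in the full model don't significantly improve the fit.

Thus, our final model would be:

mpg = β₀ + β₁disp + β₂carb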

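Note: We used 2 degrees of freedom when calculating the p-value because this is the difference in the number of predictor variables between the two models (4 in the full model versus 2 in the reduced model).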

