How to Interpret the Classification Report in sklearn (With Example)

By Benjamin Anderson, July 19, 2023

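When we use classification models in machine learning, there are three common metrics that we use to assess the quality of the model:

1. Precision: The percentage of correct positive predictions relative to the total number of positive predictions.

2. Recall: The percentage of correct positive predictions relative to the total number of actual positives.

3. F1 Score: The harmonic mean of precision and recall, with both weighted equally. The closer the F1 score is to 1, the better the model.

F1 Score: 2 * (Precision * Recall) / (Precision + Recall)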
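
As a minimal sketch, the three definitions above can be written in plain Python from the counts of true positives, false positives, and false negatives. The helper name `precision_recall_f1` and the counts passed to it are made up for illustration; they are not part of sklearn or of the example that follows.

```python
def precision_recall_f1(tp, fp, fn):
    """Hypothetical helper: compute precision, recall, and F1 from raw counts."""
    # Precision: correct positive predictions / all positive predictions
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: correct positive predictions / all actual positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall
    total = precision + recall
    f1 = 2 * (precision * recall) / total if total else 0.0
    return precision, recall, f1

# Made-up counts: 8 true positives, 2 false positives, 4 false negatives
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # prints: 0.8 0.67 0.73
```

The guards against a zero denominator are a design choice of this sketch; they simply return 0.0 when a metric is undefined.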

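Using these three metrics, we can understand how well a given classification model is able to predict the outcomes for some response variable.

Fortunately, when fitting a classification model in Python we can use the classification_report() function from the sklearn library to generate all three of these metrics.

The following example shows how to use this function in practice.

Example: How to Use the Classification Report in sklearn

In this example, we'll fit a logistic regression model that uses points and assists to predict whether or not 1,000 different college basketball players get drafted into the NBA.

First, we'll import the necessary packages to perform logistic regression in Python: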

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

Next, we'll create a data frame that contains information on 1,000 basketball players:

#make this example reproducible
np.random.seed(1)

#create DataFrame
df = pd.DataFrame({'points': np.random.randint(30, size=1000),
                   'assists': np.random.randint(12, size=1000),
                   'drafted': np.random.randint(2, size=1000)})

#view first five rows of DataFrame
df.head()

   points  assists  drafted
0       5        1        1
1      11        8        0
2      12        4        1
3       8        7        0
4       9        0        0

Note: A value of 0 indicates that a player did not get drafted, while a value of 1 indicates that a player did get drafted.

Next, we'll split our data into a training set and a testing set and fit the logistic regression model:

#define the predictor variables and the response variable
X = df[['points', 'assists']]
y = df['drafted']

#split the dataset into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

#instantiate the model
logistic_regression = LogisticRegression()

#fit the model using the training data
logistic_regression.fit(X_train, y_train)

#use model to make predictions on test data
y_pred = logistic_regression.predict(X_test)

Lastly, we'll use the classification_report() function to print the classification metrics for our model:

#print classification report for model
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

           0       0.51      0.58      0.54       160
           1       0.43      0.36      0.40       140

    accuracy                           0.48       300
   macro avg       0.47      0.47      0.47       300
weighted avg       0.47      0.48      0.47       300
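
If the numbers in the report are needed in code rather than as printed text, classification_report() also accepts `output_dict=True` and returns a nested dictionary. The sketch below uses a small set of made-up labels (not the basketball data), so the values differ from the report above.

```python
from sklearn.metrics import classification_report

# Toy labels, made up for illustration only
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# Return the report as a nested dict instead of a formatted string
report = classification_report(y_true, y_pred, output_dict=True)

# Per-class metrics are keyed by the class label as a string
print(report["1"]["precision"])  # 0.75
print(report["1"]["recall"])     # 0.75
print(report["accuracy"])        # 0.75
```

Here class 1 has three true positives, one false positive, and one false negative, which is why precision and recall both come out to 0.75.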

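Here is how to interpret the output:

Precision: Out of all the players that the model predicted would get drafted, only 43% actually did.

Recall: Out of all the players that actually did get drafted, the model only predicted this outcome correctly for 36% of them.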
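
The counts behind precision and recall can be read from a confusion matrix. The following is a small sketch on made-up labels (not the basketball data) that uses sklearn's confusion_matrix(), precision_score(), and recall_score() to show that both routes agree.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Toy labels, made up for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision and recall computed by hand from the counts
precision_manual = tp / (tp + fp)
recall_manual = tp / (tp + fn)

# The same metrics computed by sklearn
precision_sklearn = precision_score(y_true, y_pred)
recall_sklearn = recall_score(y_true, y_pred)

print(precision_manual, precision_sklearn)
print(recall_manual, recall_sklearn)
```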

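F1 Score: This value is calculated as:

F1 Score: 2 * (Precision * Recall) / (Precision + Recall)
F1 Score: 2 * (0.43 * 0.36) / (0.43 + 0.36)
F1 Score: approximately 0.39 from these rounded values. The report shows 0.40 because sklearn computes F1 from the unrounded precision and recall.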
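
The arithmetic can be checked in a few lines of Python. The first part plugs in the rounded precision and recall printed in the report. The second part uses confusion counts (51 true positives, 67 false positives, 89 false negatives) that are an assumption: they were chosen because they reproduce the rounded figures in the report, and they are not taken from the original output.

```python
# Rounded values for class 1, as printed in the report
precision, recall = 0.43, 0.36
f1_rounded = 2 * (precision * recall) / (precision + recall)
print(round(f1_rounded, 2))  # 0.39

# ASSUMED counts, picked to be consistent with the report (not from the source)
tp, fp, fn = 51, 67, 89
p = tp / (tp + fp)  # unrounded precision
r = tp / (tp + fn)  # unrounded recall
f1_exact = 2 * (p * r) / (p + r)
print(round(p, 2), round(r, 2), round(f1_exact, 2))  # 0.43 0.36 0.4
```

Plugging in values that have already been rounded to two decimals can shift the second decimal of the result, which is why a hand calculation may differ slightly from the f1-score column.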

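Since this value isn't very close to 1, it tells us that the model does a poor job of predicting whether or not players will get drafted. That is expected here, because the drafted column was generated at random, independently of points and assists.

Support: These values simply tell us how many players belonged to each class in the test dataset. We can see that among the players in the test dataset, 160 did not get drafted and 140 did get drafted.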
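
Support depends only on the true labels, not on the predictions. As a small sketch with made-up labels (not the basketball data), the support returned by sklearn's precision_recall_fscore_support() matches a plain count of each class in the true labels.

```python
from collections import Counter
from sklearn.metrics import precision_recall_fscore_support

# Toy labels, made up for illustration only
y_true = [0, 0, 1, 0, 1, 1, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0]

# The fourth return value is the support for each class, in sorted label order
_, _, _, support = precision_recall_fscore_support(y_true, y_pred)

# Counting the true labels directly gives the same numbers
counts = Counter(y_true)

print(support.tolist())        # [4, 3]
print(counts[0], counts[1])    # 4 3
```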

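Note: The complete documentation for the classification_report() function is available in the sklearn documentation.

Additional Resources

The following tutorials provide additional information on how to use classification models in Python:

How to Perform Logistic Regression in Python
How to Create a Confusion Matrix in Python
How to Calculate Balanced Accuracy in Python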

