Scikit-learn：使用多列标签编码

经过本杰明·安德森博 7月 16, 2023 指导 0 条评论

在机器学习中，标签编码是将分类变量的值转换为整数值的过程。

例如，以下屏幕截图显示了如何将名为Team的分类变量的每个唯一值转换为基于字母顺序的整数值：

您可以使用以下语法在Python中执行多列标签编码：

 from sklearn. preprocessing import LabelEncoder

#perform label encoding on col1, col2 columns
df[[' col1 ', ' col2 ']] = df[[' col1 ', ' col2 ']]. apply (LabelEncoder(). fit_transform )

以下示例展示了如何在实践中使用此语法。

示例：在 Python 中编码标签

假设我们有以下 pandas DataFrame，其中包含有关各种篮球运动员的信息：

 import pandas as pd

#createDataFrame
df = pd. DataFrame ({' team ': ['A', 'A', 'B', 'B', 'B', 'C', 'C', 'D'],
                   ' position ': ['G', 'F', 'G', 'F', 'F', 'G', 'G', 'F'],
                   ' all_star ': ['Y', 'N', 'Y', 'Y', 'Y', 'N', 'Y', 'N'],
                   ' points ': [11, 8, 10, 6, 6, 5, 9, 12]})

#view DataFrame
print (df)

  team position all_star points
0 AGY 11
1 AFN 8
2 BGY 10
3 BFY 6
4 BFY 6
5 CGN 5
6 CGY 9
7 DFN 12

我们可以使用以下代码执行标签编码，将team 、 position和all_star列中的每个分类值转换为整数值：

 from sklearn. preprocessing import LabelEncoder

#perform label encoding across team, position, and all_star columns
df[[' team ', ' position ', ' all_star ']] = df[[' team ', ' position ', ' all_star ']]. apply (LabelEncoder(). fit_transform )

#view udpated DataFrame
print (df)

   team position all_star points
0 0 1 1 11
1 0 0 0 8
2 1 1 1 10
3 1 0 1 6
4 1 0 1 6
5 2 1 0 5
6 2 1 1 9
7 3 0 0 12

从结果中我们可以看到team 、 position和all_star列的每个值都已转换为整数值。

例如，在团队一栏中我们可以看到：

每个“A”值已转换为0 。
每个“B”值已转换为1 。
每个“C”值都转换为2 。
每个“D”值都转换为3 。

请注意，在本示例中，我们对 DataFrame 的三列执行了标签编码，但我们可以使用类似的语法对任意数量的分类列执行标签编码。

其他资源

以下教程解释了如何在 Python 中执行其他常见任务：

如何在 Pandas 中将分类变量转换为数值
 如何在 Pandas 中将布尔值转换为整数值
 如何使用 Factorize() 将字符串编码为 Pandas 中的数字

关于作者

本杰明·安德森博

大家好，我是本杰明，一位退休的统计学教授，后来成为 Statorials 的热心教师。凭借在统计领域的丰富经验和专业知识，我渴望分享我的知识，通过 Statorials 增强学生的能力。了解更多

示例：在 Python 中编码标签

其他资源

关于作者

本杰明·安德森博

添加评论