Plotting ROC Curves with scikit-learn: Worked Examples

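After training a machine learning model, we usually need to evaluate its performance. For classification models, a common approach is to plot the ROC (receiver operating characteristic) curve, which shows the true positive rate against the false positive rate as the decision threshold varies. In Python, the scikit-learn library provides the tools to do this. What follows is a complete walkthrough of plotting an ROC curve with scikit-learn.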

Dataset Selection and Preprocessing

Before plotting an ROC curve, we first need a dataset. The following snippet generates a simple synthetic one:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

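Each sample in this dataset has a 10-feature input vector and a binary class label. Here we use scikit-learn's make_classification function to generate a random dataset of 1,000 samples, and train_test_split to hold out 30% of them as the test set, leaving 70% for training.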
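As a quick sanity check on the split described above, the sketch below prints the shapes and per-class counts of both subsets. The variable names train_counts and test_counts are ours, not part of the original article.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Same settings as the snippet above.
X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# 70% of the 1,000 rows go to training and 30% to testing.
print(X_train.shape, X_test.shape)

# Count how many samples of each class ended up in each subset.
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train class counts:", train_counts)
print("test class counts:", test_counts)
```

If the class counts differ noticeably between the two subsets, passing stratify=y to train_test_split keeps the class proportions the same in both.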

Training the Model

Next, we need to train a classifier on the data. The following simple example uses scikit-learn's logistic regression model:

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X_train, y_train)

We create the model with the LogisticRegression class and fit it on the training set.
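An ROC curve needs a continuous score per sample rather than a hard label, so it helps to see what the fitted model can output. The sketch below, which repeats the data setup so it runs on its own, compares predict with predict_proba and uses the classes_ attribute to find the column for the positive class. The names pos_col, y_score and labels are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

model = LogisticRegression()
model.fit(X_train, y_train)

# classes_ lists the labels in the same order as the predict_proba columns.
print(model.classes_)

# One row per test sample, one column per class; each row sums to 1.
proba = model.predict_proba(X_test)
print(proba[:3])

# Look up the column of the positive class (label 1) instead of hard-coding it.
pos_col = list(model.classes_).index(1)
y_score = proba[:, pos_col]

# predict returns hard 0/1 labels, which carry no threshold information.
labels = model.predict(X_test)
print(labels[:10])
```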

Plotting the ROC Curve

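Once the classifier is fitted, we can score the test set: we predict the probability of the positive class for each test sample (not hard class labels) and pass those scores to the roc_curve function, which computes the true positive rate and false positive rate. The following complete example shows how to plot the ROC curve with scikit-learn: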

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

y_pred_proba = model.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()

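First, predict_proba computes the predicted class probabilities for each test sample; the slice [:,1] keeps the column for the positive class (label 1). Then roc_curve returns three arrays: the false positive rates, the true positive rates, and the decision thresholds at which they were computed.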
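To make the relationship between the three returned arrays concrete, the sketch below recomputes the rates by hand at every threshold and compares them with the output of roc_curve. The helper rates_at is hypothetical (written for this illustration), and it assumes a sample is called positive when its score is greater than or equal to the threshold.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

model = LogisticRegression().fit(X_train, y_train)
y_score = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_score)


def rates_at(threshold):
    """Return (FPR, TPR) when scores >= threshold are called positive."""
    predicted_pos = y_score >= threshold
    tp = np.sum(predicted_pos & (y_test == 1))
    fp = np.sum(predicted_pos & (y_test == 0))
    return fp / np.sum(y_test == 0), tp / np.sum(y_test == 1)


# One (FPR, TPR) pair per threshold returned by roc_curve.
manual = np.array([rates_at(t) for t in thresholds])
print(np.allclose(manual[:, 0], fpr), np.allclose(manual[:, 1], tpr))
```

The first threshold is larger than every score, so no sample is called positive there and the curve starts at (0, 0).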

Finally, we draw the ROC curve with Matplotlib's plot function. We also add a dashed diagonal line across the ROC space, which represents a classifier that guesses at random.
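The claim that the diagonal corresponds to random guessing can be checked numerically. The sketch below is our own illustration on made-up data: it scores random labels with random numbers, then measures how far the resulting curve strays from the diagonal and computes the area under it with roc_auc_score, which should come out close to 0.5.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up data: labels and scores are drawn independently of each other.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)
random_scores = rng.random(10_000)

fpr, tpr, _ = roc_curve(y_true, random_scores)

# Largest vertical distance between the curve and the diagonal TPR = FPR.
max_gap = np.max(np.abs(tpr - fpr))
print(f"largest gap from the diagonal: {max_gap:.3f}")

# Area under the curve; the diagonal itself has an area of exactly 0.5.
auc_random = roc_auc_score(y_true, random_scores)
print(f"AUC of random scores: {auc_random:.3f}")
```

A useful classifier should therefore produce a curve clearly above the diagonal.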

Example 1

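To illustrate further how an ROC curve is plotted and interpreted, we can switch to another dataset and again predict its labels with a logistic regression classifier. The following code loads the breast cancer dataset bundled with scikit-learn, trains a logistic regression classifier, and plots its ROC curve: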

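from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load the data; in this dataset label 0 is malignant and label 1 is benign
breast_cancer_data = load_breast_cancer()
X = breast_cancer_data.data
y = breast_cancer_data.target

# Hold out 30% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# The features are unscaled, so allow more iterations than the default of 100
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)

# Score the test set with the probability of class 1, then compute the ROC points
y_pred_proba = model.predict_proba(X_test)[:,1]
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

# Plot the ROC curve together with the random-guess diagonal
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.show()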

Example 2

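ROC curves can also be used to compare the performance of different models or classification algorithms. The following example compares the ROC curves of two different classifiers.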

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10, n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# probability=True is required for SVC to provide predict_proba
model_svc = SVC(probability=True)
# Fix the seed so the random forest gives the same curve on every run
model_rf = RandomForestClassifier(random_state=1)

model_svc.fit(X_train, y_train)
model_rf.fit(X_train, y_train)

# Probability of the positive class (label 1) from each model
y_pred_proba_svc = model_svc.predict_proba(X_test)[:,1]
y_pred_proba_rf = model_rf.predict_proba(X_test)[:,1]

fpr_svc, tpr_svc, thresholds_svc = roc_curve(y_test, y_pred_proba_svc)
fpr_rf, tpr_rf, thresholds_rf = roc_curve(y_test, y_pred_proba_rf)

# Draw both curves on the same axes, plus the random-guess diagonal
plt.plot(fpr_svc, tpr_svc, label='SVC')
plt.plot(fpr_rf, tpr_rf, label='Random Forest')
plt.plot([0, 1], [0, 1], linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()

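In this example, we trained two different classifiers using scikit-learn's SVC and RandomForestClassifier classes. SVC is created with probability=True, which is required for it to provide predict_proba. We then computed each classifier's ROC curve and drew both on the same figure so their performance can be compared: the curve that lies closer to the top-left corner indicates the better classifier. Note that we also passed a label to each plot call and called Matplotlib's legend function, so the figure shows which curve belongs to which classifier.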
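When two curves lie close together or cross, comparing them by eye is hard. One common way to summarize each curve in a single number is the area under it (AUC), which scikit-learn computes with roc_auc_score. The sketch below is an addition to the example, not part of the original comparison; it fixes random_state on both models so the numbers are repeatable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_classes=2, n_features=10,
                           n_informative=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

# Fixed seeds (our choice) so that repeated runs give the same scores.
model_svc = SVC(probability=True, random_state=1).fit(X_train, y_train)
model_rf = RandomForestClassifier(random_state=1).fit(X_train, y_train)

# AUC is computed from the positive-class scores, just like the ROC curve.
auc_svc = roc_auc_score(y_test, model_svc.predict_proba(X_test)[:, 1])
auc_rf = roc_auc_score(y_test, model_rf.predict_proba(X_test)[:, 1])

print(f"SVC AUC:           {auc_svc:.3f}")
print(f"Random Forest AUC: {auc_rf:.3f}")
```

An AUC of 0.5 matches the random-guess diagonal and 1.0 is a perfect ranking, so the model with the higher value ranks positive samples above negative ones more reliably on this test set.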

Unless otherwise stated, all articles on this site are original. If you repost this article, please credit the source: 利用scikitlearn画ROC曲线实例 - Python技术站
