```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Iterate over candidate k values and record the mean
# 10-fold cross-validated accuracy for each model
k_range = range(1, 31)
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
    k_scores.append(scores.mean())
```
```python
# Plot accuracy against k
plt.plot(k_range, k_scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated Accuracy')
plt.show()
```
The plot shows that k values from roughly 12 to 18 give the best results. Beyond 18, accuracy starts to drop: with such a large neighborhood the model averages over too many points and becomes too simple, i.e. it underfits (very small k, by contrast, is where KNN tends to overfit).
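Rather than reading the best k off the plot by eye, the same loop can report it directly. A minimal, self-contained sketch (the original `X` and `y` are not shown in this section, so the iris dataset is used here as a stand-in):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Assumption: iris as a stand-in for the tutorial's X, y
X, y = load_iris(return_X_y=True)

k_range = range(1, 31)
k_scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k),
                            X, y, cv=10, scoring='accuracy').mean()
            for k in k_range]

# Pick the k with the highest mean cross-validated accuracy
best_k = k_range[int(np.argmax(k_scores))]
print(best_k, max(k_scores))
```

On a different dataset the winning k will differ, which is exactly why the tutorial sweeps a range instead of hard-coding one value.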
The same sweep can use a loss instead of accuracy. Note that modern scikit-learn names the scorer `'neg_mean_squared_error'` (the old `'mean_squared_error'` name was removed); it returns negative values because scorers follow a higher-is-better convention, so we negate it to get a positive loss:

```python
import matplotlib.pyplot as plt

k_range = range(1, 31)
k_scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    # Negate the scorer's output to turn it back into a positive MSE
    loss = -cross_val_score(knn, X, y, cv=10,
                            scoring='neg_mean_squared_error')
    k_scores.append(loss.mean())

plt.plot(k_range, k_scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Cross-Validated MSE')
plt.show()
```

With a loss, the best k is now the one that minimizes the curve rather than maximizes it.
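The sign convention above is easy to get wrong, so it is worth checking once. A small sketch (again assuming iris in place of the unseen `X`, `y`) showing that the scorer itself is non-positive and that negation recovers an ordinary MSE:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)  # assumption: iris stand-in

raw = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                      X, y, cv=10, scoring='neg_mean_squared_error')
print(raw.mean())    # <= 0: scorers report higher-is-better values
print((-raw).mean()) # >= 0: the usual mean squared error
```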