
We do not expect you to learn every detail of these models in this class.
We will start with a linear kernel, which tries to construct hyperplanes to separate the data.
We will then switch to a new kernel, RBF (radial basis function), which effectively creates new dimensions that aren't linear. You do not need to know the details of how it works (that is for later coursework).
We use make_circles because it gives us control over the data and its separation; we don't have to clean or standardize it.
## Imports
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.datasets import make_circles
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, roc_curve, roc_auc_score
# Generate a small two-class "circles" dataset; a fixed seed makes it reproducible
X, y = make_circles(n_samples=100, random_state=3)
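make_circles also exposes the control mentioned above: a noise parameter (standard deviation of Gaussian noise added to the points) and a factor parameter (ratio of the inner to the outer circle's radius). A quick illustrative sketch, not used in the rest of the notebook:

# Hypothetical variation: noisier rings with a smaller inner circle
X_noisy, y_noisy = make_circles(n_samples=100, noise=0.1, factor=0.5, random_state=3)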
## Plot Circles
plt.scatter(X[:,0], X[:,1], c=y)
plt.xlabel(r'$x_0$'); plt.ylabel(r'$x_1$')
fig = plt.figure(figsize=(10, 7))
ax = plt.axes(projection="3d")
# Plot the same 2D data on the z = 0 plane to show it is still flat
ax.scatter3D(X[:, 0], X[:, 1], 0, c=y)
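To build intuition for what a nonlinear kernel buys us, here is a minimal sketch (not part of the assignment) that lifts the same X, y into a hand-made third dimension, the squared distance from the origin. In that lifted space the two rings sit at different heights, so a flat plane could separate them; the RBF kernel automates this kind of lifting.

# A hand-made nonlinear feature: squared distance from the origin
z = X[:, 0]**2 + X[:, 1]**2
fig = plt.figure(figsize=(10, 7))
ax = plt.axes(projection="3d")
ax.scatter3D(X[:, 0], X[:, 1], z, c=y)
ax.set_zlabel(r'$x_0^2 + x_1^2$')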
# Regenerate with more samples so the train/test split has plenty of data
X, y = make_circles(n_samples=1000, random_state=3)
## Plot Circles
plt.scatter(X[:,0], X[:,1], c=y)
plt.xlabel(r'$x_0$'); plt.ylabel(r'$x_1$')
## Split the data
train_vectors, test_vectors, train_labels, test_labels = train_test_split(X, y, test_size=0.25)
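Note that train_test_split shuffles at random, so the exact numbers you see below may differ from ours. If you want a reproducible split, a sketch (the seed value 42 is arbitrary):

# Fix the shuffle seed so every run produces the same split
train_vectors, test_vectors, train_labels, test_labels = train_test_split(
    X, y, test_size=0.25, random_state=42)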
## Fit with a linear kernel
cls = SVC(kernel="linear", C=10)
cls.fit(train_vectors,train_labels)
## Print the accuracy
print('Accuracy: ', cls.score(test_vectors, test_labels))
Accuracy: 0.44
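Because the kernel is linear, the fitted model really is a hyperplane w·x + b = 0, and we can inspect its parameters (the printed values will depend on your split):

# coef_ holds the hyperplane normal w; intercept_ holds the offset b
print("w =", cls.coef_)
print("b =", cls.intercept_)
print("support vectors per class:", cls.n_support_)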
## Use the model to predict
y_pred = cls.predict(test_vectors)
print("Classification Report:\n", classification_report(test_labels, y_pred))
print("Confusion Matrix:\n", confusion_matrix(test_labels, y_pred))
Classification Report:
               precision    recall  f1-score   support

           0       0.45      0.43      0.44       129
           1       0.43      0.45      0.44       121

    accuracy                           0.44       250
   macro avg       0.44      0.44      0.44       250
weighted avg       0.44      0.44      0.44       250

Confusion Matrix:
 [[55 74]
 [66 55]]
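As a sanity check, the accuracy reported above can be read straight off the confusion matrix: the diagonal holds the correct predictions, so accuracy is (55 + 55) / 250 = 0.44.

# Accuracy = correct predictions / all predictions
cm = confusion_matrix(test_labels, y_pred)
print("accuracy from confusion matrix:", cm.trace() / cm.sum())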
## Construct the ROC and the AUC
fpr, tpr, thresholds = roc_curve(test_labels, y_pred)
auc = np.round(roc_auc_score(test_labels, y_pred),3)
plt.plot(fpr,tpr)
plt.plot([0,1],[0,1], 'k--')
plt.xlabel('FPR'); plt.ylabel('TPR'); plt.text(0.6,0.2, "AUC:"+str(auc));
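One subtlety: we passed hard 0/1 predictions to roc_curve, so the curve has a single corner. A sketch of a richer ROC, using the classifier's continuous decision_function scores instead (same fitted cls):

# decision_function returns a signed distance from the hyperplane, letting
# roc_curve sweep over thresholds rather than a single hard cutoff
scores = cls.decision_function(test_vectors)
fpr_s, tpr_s, _ = roc_curve(test_labels, scores)
plt.plot(fpr_s, tpr_s)
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('FPR'); plt.ylabel('TPR')
plt.text(0.6, 0.2, "AUC:" + str(np.round(roc_auc_score(test_labels, scores), 3)));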
## Fit with an RBF kernel
cls_rbf = SVC(kernel="rbf", C=10)
cls_rbf.fit(train_vectors,train_labels)
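The RBF kernel also has a gamma hyperparameter that controls how localized each training point's influence is; above we rely on sklearn's default (gamma='scale'). A sketch of setting it explicitly (the value 0.5 is purely illustrative):

# Larger gamma = narrower RBF bumps = a more wiggly decision boundary
cls_rbf_alt = SVC(kernel="rbf", C=10, gamma=0.5)
cls_rbf_alt.fit(train_vectors, train_labels)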
## Print the accuracy
print('Accuracy: ', cls_rbf.score(test_vectors, test_labels))
Accuracy: 1.0
## Use the model to predict
y_pred = cls_rbf.predict(test_vectors)
print("Classification Report:\n", classification_report(test_labels, y_pred))
print("Confusion Matrix:\n", confusion_matrix(test_labels, y_pred))
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00       129
           1       1.00      1.00      1.00       121

    accuracy                           1.00       250
   macro avg       1.00      1.00      1.00       250
weighted avg       1.00      1.00      1.00       250

Confusion Matrix:
 [[129   0]
 [  0 121]]
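To see what the RBF kernel actually learned, here is a minimal sketch (again optional) that shades the plane by the model's decision function; the zero contour is the learned boundary:

# Evaluate the decision function on a grid covering the data
xx, yy = np.meshgrid(np.linspace(-1.5, 1.5, 200), np.linspace(-1.5, 1.5, 200))
zz = cls_rbf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, levels=20, alpha=0.5)
plt.contour(xx, yy, zz, levels=[0], colors='k')  # the decision boundary
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k')
plt.xlabel(r'$x_0$'); plt.ylabel(r'$x_1$');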
## Construct the ROC and the AUC
fpr, tpr, thresholds = roc_curve(test_labels, y_pred)
auc = np.round(roc_auc_score(test_labels, y_pred),3)
plt.plot(fpr,tpr)
plt.plot([0,1],[0,1], 'k--')
plt.xlabel('FPR'); plt.ylabel('TPR'); plt.text(0.6,0.2, "AUC:"+str(auc));
In the construction of the SVM, cls = SVC(kernel="linear", C=10), C is a hyperparameter that we can adjust. sklearn can search for the "best" choice automatically via GridSearchCV.
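A minimal sketch of that search, using the training split from above (the candidate values are just illustrative):

from sklearn.model_selection import GridSearchCV

# Try a few candidate values of C (and both kernels) with 5-fold cross-validation
param_grid = {"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(train_vectors, train_labels)
print("best parameters:", search.best_params_)
print("best CV accuracy:", search.best_score_)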
Please ask lots of questions about what the code is doing, because you will not be writing much code today!