Finished Model Evaluation Metrics

2019-07-12 02:01:41 +01:00
parent b5dd5aa345
commit af3c2caa6a
14 changed files with 8668 additions and 0 deletions
--- a/Metrics/Classification_Metrics.ipynb
+++ b/Metrics/Classification_Metrics.ipynb
@@ -0,0 +1,487 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Our Mission\n",
+    "\n",
+    "In this lesson you gained some insight into a number of techniques used to understand how well our model is performing.  This notebook is aimed at giving you some practice with the metrics specifically related to classification problems.  With that in mind, we will again be looking at the spam dataset from the earlier lessons.\n",
+    "\n",
+    "First, run the cell below to prepare the data and instantiate a number of different models."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Import our libraries\n",
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.feature_extraction.text import CountVectorizer\n",
+    "from sklearn.naive_bayes import MultinomialNB\n",
+    "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n",
+    "from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier\n",
+    "from sklearn.svm import SVC\n",
+    "import tests as t\n",
+    "\n",
+    "# Read in our dataset\n",
+    "df = pd.read_table('smsspamcollection/SMSSpamCollection',\n",
+    "                   sep='\\t', \n",
+    "                   header=None, \n",
+    "                   names=['label', 'sms_message'])\n",
+    "\n",
+    "# Fix our response value\n",
+    "df['label'] = df.label.map({'ham':0, 'spam':1})\n",
+    "\n",
+    "# Split our dataset into training and testing data\n",
+    "X_train, X_test, y_train, y_test = train_test_split(df['sms_message'], \n",
+    "                                                    df['label'], \n",
+    "                                                    random_state=1)\n",
+    "\n",
+    "# Instantiate the CountVectorizer method\n",
+    "count_vector = CountVectorizer()\n",
+    "\n",
+    "# Fit the training data and then return the matrix\n",
+    "training_data = count_vector.fit_transform(X_train)\n",
+    "\n",
+    "# Transform testing data and return the matrix. Note we are not fitting the testing data into the CountVectorizer()\n",
+    "testing_data = count_vector.transform(X_test)\n",
+    "\n",
+    "# Instantiate a number of our models\n",
+    "naive_bayes = MultinomialNB()\n",
+    "bag_mod = BaggingClassifier(n_estimators=200)\n",
+    "rf_mod = RandomForestClassifier(n_estimators=200)\n",
+    "ada_mod = AdaBoostClassifier(n_estimators=300, learning_rate=0.2)\n",
+    "svm_mod = SVC()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Step 1**: Now, fit each of the above models to the appropriate data.  Answer the following question to make sure that you fit the models correctly."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fit each of the 5 models\n",
+    "# This might take some time to run\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# The models you fit above were fit on which data?\n",
+    "\n",
+    "a = 'X_train'\n",
+    "b = 'X_test'\n",
+    "c = 'y_train'\n",
+    "d = 'y_test'\n",
+    "e = 'training_data'\n",
+    "f = 'testing_data'\n",
+    "\n",
+    "# Change models_fit_on to only contain the correct string names\n",
+    "# of values that you passed to the above models\n",
+    "\n",
+    "models_fit_on = {a, b, c, d, e, f} # update this to only contain correct letters\n",
+    "\n",
+    "# Checks your solution - don't change this\n",
+    "t.test_one(models_fit_on)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Step 2**: Now make predictions for each of your models on the data that will allow you to understand how well our model will extend to new data.  Then correctly add the strings to the set in the following cell."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Make predictions using each of your models\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Which data was used in the predict method to see how well your\n",
+    "# model would work on new data?\n",
+    "\n",
+    "a = 'X_train'\n",
+    "b = 'X_test'\n",
+    "c = 'y_train'\n",
+    "d = 'y_test'\n",
+    "e = 'training_data'\n",
+    "f = 'testing_data'\n",
+    "\n",
+    "# Change models_predict_on to only contain the correct string names\n",
+    "# of values that you passed to the above models\n",
+    "\n",
+    "models_predict_on = {a, b, c, d, e, f} # update this to only contain correct letters\n",
+    "\n",
+    "# Checks your solution - don't change this\n",
+    "t.test_two(models_predict_on)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that you have set up all your predictions, let's get to the topics addressed in this lesson: measuring how well each of your models performed. First, we will focus on how each metric is calculated for a single model, and then in the final part of this notebook, you will choose the models that are best based on a particular metric.\n",
+    "\n",
+    "You will be writing functions to calculate a number of metrics and then comparing the values to what you get from sklearn.  This will help you build intuition for how each metric is calculated.\n",
+    "\n",
+    "> **Step 3**: As an example of how this will work for the upcoming questions, run the cell below.  The function below calculates accuracy; compare its output to the built-in `accuracy_score` to confirm that they match."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# accuracy is the total correct divided by the total to predict\n",
+    "def accuracy(actual, preds):\n",
+    "    '''\n",
+    "    INPUT\n",
+    "    preds - predictions as a numpy array or pandas series\n",
+    "    actual - actual values as a numpy array or pandas series\n",
+    "    \n",
+    "    OUTPUT:\n",
+    "    returns the accuracy as a float\n",
+    "    '''\n",
+    "    return np.sum(preds == actual)/len(actual)\n",
+    "\n",
+    "\n",
+    "print(accuracy(y_test, preds_nb))\n",
+    "print(accuracy_score(y_test, preds_nb))\n",
+    "print(\"Since these match, we correctly calculated our metric!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Step 4**: Fill in the function below to calculate precision, and then compare your answer to the built-in `precision_score` to confirm you are correct."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# precision is the true positives over the predicted positive values\n",
+    "def precision(actual, preds):\n",
+    "    '''\n",
+    "    INPUT\n",
+    "    (assumes positive = 1 and negative = 0)\n",
+    "    preds - predictions as a numpy array or pandas series \n",
+    "    actual - actual values as a numpy array or pandas series\n",
+    "    \n",
+    "    OUTPUT:\n",
+    "    returns the precision as a float\n",
+    "    '''\n",
+    "    \n",
+    "    return None # calculate precision here\n",
+    "\n",
+    "\n",
+    "print(precision(y_test, preds_nb))\n",
+    "print(precision_score(y_test, preds_nb))\n",
+    "print(\"If the above match, you got it!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Step 5**: Fill in the function below to calculate recall, and then compare your answer to the built-in `recall_score` to confirm you are correct."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# recall is true positives over all actual positive values\n",
+    "def recall(actual, preds):\n",
+    "    '''\n",
+    "    INPUT\n",
+    "    preds - predictions as a numpy array or pandas series\n",
+    "    actual - actual values as a numpy array or pandas series\n",
+    "    \n",
+    "    OUTPUT:\n",
+    "    returns the recall as a float\n",
+    "    '''\n",
+    "\n",
+    "    return None # calculate recall here\n",
+    "\n",
+    "\n",
+    "print(recall(y_test, preds_nb))\n",
+    "print(recall_score(y_test, preds_nb))\n",
+    "print(\"If the above match, you got it!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Step 6**: Fill in the function below to calculate f1-score, and then compare your answer to the built-in `f1_score` to confirm you are correct."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# f1_score is 2*(precision*recall)/(precision+recall))\n",
+    "def f1(actual, preds):\n",
+    "    '''\n",
+    "    INPUT\n",
+    "    preds - predictions as a numpy array or pandas series\n",
+    "    actual - actual values as a numpy array or pandas series\n",
+    "    \n",
+    "    OUTPUT:\n",
+    "    returns the f1score as a float\n",
+    "    '''\n",
+    "    \n",
+    "    return None # calculate f1-score here\n",
+    "\n",
+    "\n",
+    "print(f1(y_test, preds_nb))\n",
+    "print(f1_score(y_test, preds_nb))\n",
+    "print(\"If the above match, you got it!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Step 7:** Now that you have calculated a number of different metrics, let's tie that to when we might use one versus another.  Use the dictionary below to match a metric to each statement that identifies when you would want to use that metric."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# add the letter of the most appropriate metric to each statement\n",
+    "# in the dictionary\n",
+    "a = \"recall\"\n",
+    "b = \"precision\"\n",
+    "c = \"accuracy\"\n",
+    "d = 'f1-score'\n",
+    "\n",
+    "\n",
+    "seven_sol = {\n",
+    "'We have imbalanced classes, which metric do we definitely not want to use?': None, # letter here\n",
+    "'We really want to make sure the positive cases are all caught even if that means we identify some negatives as positives': None, # letter here\n",
+    "'When we identify something as positive, we want to be sure it is truly positive': None, # letter here\n",
+    "'We care equally about identifying positive and negative cases': None # letter here\n",
+    "}\n",
+    "\n",
+    "t.sol_seven(seven_sol)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Step 8:** Given what you know about the metrics now, use this information to correctly match the appropriate model to when it would be best to use each in the dictionary below."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# use the answers you found to the previous questions, then match the model that did best for each metric\n",
+    "a = \"naive-bayes\"\n",
+    "b = \"bagging\"\n",
+    "c = \"random-forest\"\n",
+    "d = 'ada-boost'\n",
+    "e = \"svm\"\n",
+    "\n",
+    "\n",
+    "eight_sol = {\n",
+    "'We have imbalanced classes, which metric do we definitely not want to use?': None, # letter here\n",
+    "'We really want to make sure the positive cases are all caught even if that means we identify some negatives as positives': None, # letter here\n",
+    "'When we identify something as positive, we want to be sure it is truly positive': None, # letter here\n",
+    "'We care equally about identifying positive and negative cases': None # letter here\n",
+    "}\n",
+    "\n",
+    "t.sol_eight(eight_sol)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# cells for work"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# If you get stuck, also notice there is a solution available by hitting the orange button in the top left"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a final step in this workbook, let's take a look at the last three metrics you saw, f-beta scores, ROC curves, and AUC.\n",
+    "\n",
+    "**For f-beta scores:** If you decide that you care more about precision, you should move beta closer to 0.  If you decide you care more about recall, you should move beta towards infinity. \n",
+    "\n",
+    "> **Step 9:** `fbeta_score` works similarly to most of the other metrics in sklearn, but you also need to set beta as your weighting between precision and recall.  Use the space below to show that you can use [fbeta in sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html) to replicate your f1-score from above.  If in the future you want to use a different weighting, [this article](http://mlwiki.org/index.php/Precision_and_Recall) does an amazing job of explaining how you might adjust beta for different situations."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# import fbeta_score\n",
+    "\n",
+    "\n",
+    "# Show that you can produce the same f1_score results using fbeta_score\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Step 10:** Building ROC curves in python is a pretty involved process on your own.  I wrote the function below to assist with the process and make it easier for you to do so in the future as well.  Try it out using one of the other classifiers you created above to see how it compares to the random forest model below.\n",
+    "\n",
+    "Run the cell below to build a ROC curve, and retrieve the AUC for the random forest model."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Function for calculating auc and roc\n",
+    "\n",
+    "def build_roc_auc(model, X_train, X_test, y_train, y_test):\n",
+    "    '''\n",
+    "    INPUT:\n",
+    "    model - an sklearn instantiated model\n",
+    "    X_train - the training data\n",
+    "    y_train - the training response values (must be categorical)\n",
+    "    X_test - the test data\n",
+    "    y_test - the test response values (must be categorical)\n",
+    "    OUTPUT:\n",
+    "    auc - returns auc as a float\n",
+    "    prints the roc curve\n",
+    "    '''\n",
+    "    import matplotlib.pyplot as plt\n",
+    "    from sklearn.metrics import roc_curve, auc\n",
+    "    \n",
+    "    # Fit the model and keep the predicted probability of the positive class\n",
+    "    y_scores = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]\n",
+    "\n",
+    "    # Compute the ROC curve once for the binary problem, then the area under it\n",
+    "    fpr, tpr, _ = roc_curve(y_test, y_scores)\n",
+    "    roc_auc = auc(fpr, tpr)\n",
+    "    \n",
+    "    plt.plot(fpr, tpr, color='darkorange',\n",
+    "             lw=2, label='ROC curve (area = %0.2f)' % roc_auc)\n",
+    "    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')\n",
+    "    plt.xlim([0.0, 1.0])\n",
+    "    plt.ylim([0.0, 1.05])\n",
+    "    plt.xlabel('False Positive Rate')\n",
+    "    plt.ylabel('True Positive Rate')\n",
+    "    plt.title('Receiver operating characteristic example')\n",
+    "    plt.legend(loc='lower right')\n",
+    "    plt.show()\n",
+    "    \n",
+    "    # Return the AUC of the plotted curve (from probabilities, not rounded labels)\n",
+    "    return roc_auc\n",
+    "    \n",
+    "    \n",
+    "# Finding roc and auc for the random forest model    \n",
+    "build_roc_auc(rf_mod, training_data, testing_data, y_train, y_test) "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Your turn here - choose another classifier to see how it compares\n",
+    "\n",
+    "\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
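Step 9 of the notebook notes that moving beta toward 0 favours precision and moving it toward infinity favours recall. The sketch below illustrates that weighting without any dependencies. The helper names `precision_recall` and `fbeta` and the toy labels are assumptions made for illustration; they are not part of the notebook, and the code assumes binary labels with at least one predicted and one actual positive.

```python
def precision_recall(actual, preds):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for a, p in zip(actual, preds) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, preds) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, preds) if a == 1 and p == 0)
    # Assumes tp + fp > 0 and tp + fn > 0; otherwise these divisions fail.
    return tp / (tp + fp), tp / (tp + fn)


def fbeta(actual, preds, beta):
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    prec, rec = precision_recall(actual, preds)
    b2 = beta ** 2
    return (1 + b2) * prec * rec / (b2 * prec + rec)


# Hypothetical toy labels, not taken from the SMS dataset.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

prec, rec = precision_recall(actual, preds)
print('precision', prec, 'recall', rec)
for beta in (0.5, 1, 2):
    print('beta =', beta, 'f-beta =', fbeta(actual, preds, beta))
```

Here precision (2/3) is higher than recall (1/2), so the score moves toward precision as beta shrinks and toward recall as beta grows. With `beta=1` the formula reduces to the f1-score from Step 6.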
--- a/Metrics/Classification_Metrics_sol.ipynb
+++ b/Metrics/Classification_Metrics_sol.ipynb
--- a/Metrics/classification_metrics.py
+++ b/Metrics/classification_metrics.py
@@ -0,0 +1,239 @@
+# Import our libraries
+import pandas as pd
+import numpy as np
+from sklearn.model_selection import train_test_split
+from sklearn.feature_extraction.text import CountVectorizer
+from sklearn.naive_bayes import MultinomialNB
+from sklearn.metrics import accuracy_score, precision_score, recall_score
+from sklearn.metrics import f1_score, fbeta_score
+from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
+from sklearn.ensemble import AdaBoostClassifier
+from sklearn.svm import SVC
+import matplotlib.pyplot as plt
+from sklearn.metrics import roc_curve, auc, roc_auc_score
+# import tests as t
+
+# Read in our dataset
+df = pd.read_csv('smsspamcollection_SMSSpamCollection',
+                 header=None,
+                 names=['label', 'sms_message'])
+
+# Fix our response value
+df['label'] = df.label.map({'ham': 0, 'spam': 1})
+
+# Split our dataset into training and testing data
+X_train, X_test, y_train, y_test = train_test_split(df['sms_message'],
+                                                    df['label'],
+                                                    random_state=1)
+
+# Instantiate the CountVectorizer method
+count_vector = CountVectorizer()
+
+# Fit the training data and then return the matrix
+training_data = count_vector.fit_transform(X_train)
+
+# Transform testing data and return the matrix. Note we are not fitting the
+# testing data into the CountVectorizer()
+testing_data = count_vector.transform(X_test)
+
+# Instantiate a number of our models
+naive_bayes = MultinomialNB()
+bag_mod = BaggingClassifier(n_estimators=200)
+rf_mod = RandomForestClassifier(n_estimators=200)
+ada_mod = AdaBoostClassifier(n_estimators=300, learning_rate=0.2)
+svm_mod = SVC()
+
+# Fit each of the 5 models
+# This might take some time to run
+naive_bayes.fit(training_data, y_train)
+bag_mod.fit(training_data, y_train)
+rf_mod.fit(training_data, y_train)
+ada_mod.fit(training_data, y_train)
+svm_mod.fit(training_data, y_train)
+
+
+# Make predictions using each of your models
+nb = naive_bayes.predict(testing_data)
+bag_pred = bag_mod.predict(testing_data)
+rf_pred = rf_mod.predict(testing_data)
+ada_pred = ada_mod.predict(testing_data)
+svm_pred = svm_mod.predict(testing_data)
+
+
+# accuracy is the total correct divided by the total to predict
+def accuracy(actual, preds):
+    '''
+    INPUT
+    preds - predictions as a numpy array or pandas series
+    actual - actual values as a numpy array or pandas series
+
+    OUTPUT:
+    returns the accuracy as a float
+    '''
+    return np.sum(preds == actual) / len(actual)
+
+
+print(accuracy(y_test, nb))
+print(accuracy_score(y_test, nb))
+print("Since these match, we correctly calculated our metric!")
+
+
+# precision is the true positives over the predicted positive values
+def precision(actual, preds):
+    '''
+    INPUT
+    (assumes positive = 1 and negative = 0)
+    preds - predictions as a numpy array or pandas series
+    actual - actual values as a numpy array or pandas series
+
+    OUTPUT:
+    returns the precision as a float
+    '''
+    TP = np.sum((preds == actual) & (preds > 0))
+    FP = np.sum((preds == 1) & (actual == 0))
+    return TP / (TP + FP)
+
+
+print(precision(y_test, nb))
+print(precision_score(y_test, nb))
+print("If the above match, you got it!")
+
+
+# recall is true positives over all actual positive values
+def recall(actual, preds):
+    '''
+    INPUT
+    preds - predictions as a numpy array or pandas series
+    actual - actual values as a numpy array or pandas series
+
+    OUTPUT:
+    returns the recall as a float
+    '''
+    TP = np.sum((preds == actual) & (preds > 0))
+    FN = np.sum((preds == 0) & (actual == 1))
+    return TP / (TP + FN)
+
+
+print(recall(y_test, nb))
+print(recall_score(y_test, nb))
+print("If the above match, you got it!")
+
+
+# f1_score is 2*(precision*recall)/(precision+recall))
+def f1(actual, preds):
+    '''
+    INPUT
+    preds - predictions as a numpy array or pandas series
+    actual - actual values as a numpy array or pandas series
+
+    OUTPUT:
+    returns the f1score as a float
+    '''
+    prec = precision(actual, preds)
+    rec = recall(actual, preds)
+    return 2 * ((prec * rec) / (prec + rec))
+
+
+print(f1(y_test, nb))
+print(f1_score(y_test, nb))
+print("If the above match, you got it!")
+
+
+# add the letter of the most appropriate metric to each statement
+# in the dictionary
+a = "recall"
+b = "precision"
+c = "accuracy"
+d = 'f1-score'
+
+
+seven_sol = {
+    'We have imbalanced classes, which metric do we definitely not want to'
+    ' use?': c,
+    'We really want to make sure the positive cases are all caught even if'
+    ' that means we identify some negatives as positives': a,
+    'When we identify something as positive, we want to be sure it is truly'
+    ' positive': b,
+    'We care equally about identifying positive and negative cases': d
+}
+
+# This gives: That's right!  It isn't really necessary to memorize these in
+# practice, but it is important to know they exist and know why you might use
+# one metric over another for a particular situation.
+
+
+models = {'nb': nb,
+          'bag_pred': bag_pred,
+          'rf_pred': rf_pred,
+          'ada_pred': ada_pred,
+          'svm_pred': svm_pred}
+metrics = [accuracy_score, precision_score, recall_score, f1_score]
+
+for i in models:
+    for j in range(len(metrics)):
+        print(f'{metrics[j].__name__} for '
+              f'{i} {metrics[j](y_test, models[i]):.4f}')
+    print()
+
+
+beta = 1
+
+print(f1_score(y_test, nb))
+print(fbeta_score(y_test, nb, beta=beta))
+
+for i in models:
+    print(f'fbeta_score for {i} {fbeta_score(y_test, models[i], beta=beta)}')
+    print(f'f1_score for {i} {f1_score(y_test, models[i])}')
+    print()
+
+
+# Function for calculating auc and roc
+
+def build_roc_auc(model, X_train, X_test, y_train, y_test):
+    '''
+    INPUT:
+    model - an sklearn instantiated model
+    X_train - the training data
+    y_train - the training response values (must be categorical)
+    X_test - the test data
+    y_test - the test response values (must be categorical)
+    OUTPUT:
+    auc - returns auc as a float
+    prints the roc curve
+    '''
+    # Fit the model and keep the predicted probability of the positive class
+    y_scores = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
+
+    # Compute the ROC curve once for the binary problem, then the area under it
+    fpr, tpr, _ = roc_curve(y_test, y_scores)
+    roc_auc = auc(fpr, tpr)
+
+    plt.plot(fpr, tpr, color='darkorange',
+             lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
+    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
+    plt.xlim([0.0, 1.0])
+    plt.ylim([0.0, 1.05])
+    plt.xlabel('False Positive Rate')
+    plt.ylabel('True Positive Rate')
+    plt.title('Receiver operating characteristic example')
+    plt.legend(loc='lower right')
+    plt.show()
+
+    # Return the AUC of the plotted curve (from probabilities, not rounded labels)
+    return roc_auc
+
+
+instantiated_models = [naive_bayes, bag_mod, rf_mod]
+
+for i in instantiated_models:
+    build_roc_auc(i, training_data, testing_data, y_train, y_test)
+
+print(build_roc_auc(instantiated_models[0], training_data, testing_data,
+                    y_train, y_test))
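AUC is meant to be computed from continuous scores, such as the positive-class column of `predict_proba`, rather than from hard 0/1 labels. The dependency-free sketch below shows how much information thresholding discards. It uses the pairwise-ranking definition of AUC (the chance that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half). The function name `auc_from_scores` and the toy data are assumptions for illustration, and the 0.5 threshold stands in for rounding.

```python
def auc_from_scores(actual, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities, not from the SMS dataset.
actual = [0, 0, 0, 1, 1, 1]
probs = [0.1, 0.2, 0.6, 0.4, 0.7, 0.9]

# Thresholding at 0.5 turns the probabilities into hard 0/1 predictions.
hard_labels = [1 if p >= 0.5 else 0 for p in probs]

auc_probs = auc_from_scores(actual, probs)
auc_hard = auc_from_scores(actual, hard_labels)
print('AUC from probabilities:', auc_probs)
print('AUC from hard labels:  ', auc_hard)
```

With hard labels every positive-negative pair that lands on the same side of the threshold becomes a tie, so the score collapses to the average of the true positive rate and the true negative rate at that single threshold.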
--- a/Metrics/smsspamcollection_SMSSpamCollection
+++ b/Metrics/smsspamcollection_SMSSpamCollection
--- a/Metrics/Classification
+++ b/Metrics/Classification
@@ -0,0 +1,98 @@
+def test_one(mod_arg):
+    '''
+    INPUT:
+    mod_arg - a set of the strings pertaining to the objects that were passed in the fitting of our models
+
+    OUTPUT:
+    prints correctness of the set
+    nothing returned
+    '''
+    a = 'X_train'
+    b = 'X_test'
+    c = 'y_train'
+    d = 'y_test'
+    e = 'training_data'
+    f = 'testing_data'
+    if mod_arg == {c, e}:
+        print("That's right!  You need to fit on both parts of the data pertaining to training data!")
+    else:
+        print("Oops!  That doesn't look quite right!  Remember you only want to fit your model to the training data!  Notice that X_train hasn't had the data cleaned yet, so that won't work to pass to our fit method. Hint - there are two items you should be passing to your fit method.")
+
+
+def test_two(mod_arg):
+    '''
+    INPUT:
+    mod_arg - a set of the strings pertaining to the objects that were passed in the predicting step
+
+    OUTPUT:
+    prints correctness of the set
+    nothing returned
+    '''
+    a = 'X_train'
+    b = 'X_test'
+    c = 'y_train'
+    d = 'y_test'
+    e = 'training_data'
+    f = 'testing_data'
+    if mod_arg == {f}:
+        print("That's right! To see how well our models perform in a new setting, you will want to predict on the test set of data.")
+    else:
+        print("Oops!  That doesn't look quite right!  Remember you will want to predict on test data to know how well your model will do in a new situation.  Hint - there is only one item that should be passed to the predict method of your model.  Also notice that X_test has not been cleaned yet, so this cannot be passed to the predict method!")
+
+
+def sol_seven(seven_sol):
+    '''
+    INPUT: dictionary with correct matching of metrics
+    OUTPUT: nothing returned - prints statement related to correctness of dictionary
+    '''
+
+    a = "recall"
+    b = "precision"
+    c = "accuracy"
+    d = 'f1-score'
+
+    seven_sol_1 = {
+        'We have imbalanced classes, which metric do we definitely not want to use?': c,
+        'We really want to make sure the positive cases are all caught even if that means we identify some negatives as positives': a,
+        'When we identify something as positive, we want to be sure it is truly positive': b,
+        'We care equally about identifying positive and negative cases': d
+    }
+
+    if seven_sol == seven_sol_1:
+        print("That's right!  It isn't really necessary to memorize these in practice, but it is important to know they exist and know why you might use one metric over another for a particular situation.")
+
+    if seven_sol['We have imbalanced classes, which metric do we definitely not want to use?'] != seven_sol_1['We have imbalanced classes, which metric do we definitely not want to use?']:
+        print("Oops!  The first one isn't right.  If we do not have balanced classes, we probably want to stay away from using accuracy.")
+
+    if seven_sol['We really want to make sure the positive cases are all caught even if that means we identify some negatives as positives'] != seven_sol_1['We really want to make sure the positive cases are all caught even if that means we identify some negatives as positives']:
+        print("Oops!  The second one isn't right.  If we really want to be sure about catching positive cases, we should be closely watching recall, which has all of the positive classes in the denominator - so we are monitoring how many of them we get right with recall.")
+
+    if seven_sol['When we identify something as positive, we want to be sure it is truly positive'] != seven_sol_1['When we identify something as positive, we want to be sure it is truly positive']:
+        print("Oops!  The third one isn't right.  Using precision, we have the predicted positives in the denominator.  Therefore, this will help us be sure the items we identify as positive are actually positive.")
+
+    if seven_sol['We care equally about identifying positive and negative cases'] != seven_sol_1['We care equally about identifying positive and negative cases']:
+        print("Oops!  The last one isn't right.  If we care equally about precision and recall, we should use f1 score.")
+
+
+def sol_eight(eight_sol):
+    '''
+    INPUT: dictionary with correct matching of metrics
+    OUTPUT: nothing returned - prints statement related to correctness of dictionary
+    '''
+    a = "naive-bayes"
+    b = "bagging"
+    c = "random-forest"
+    d = 'ada-boost'
+    e = "svm"
+
+    eight_sol_1 = {
+        'We have imbalanced classes, which metric do we definitely not want to use?': a,
+        'We really want to make sure the positive cases are all caught even if that means we identify some negatives as positives': a,
+        'When we identify something as positive, we want to be sure it is truly positive': c,
+        'We care equally about identifying positive and negative cases': a
+    }
+
+    if eight_sol_1 == eight_sol:
+        print("That's right!  Naive Bayes was the best model for all of our metrics except precision!")
+
+    else:
+        print("Oops!  That doesn't look right.  Make sure you are performing your predictions and matching on the test data.  Hint: The naive bayes model actually performs best on all of the metrics except one.  Try again!")
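The feedback in `sol_seven` says to stay away from accuracy when classes are imbalanced. A small, dependency-free sketch makes the reason concrete. The 5% positive rate and the always-negative "model" are assumptions chosen for illustration, not properties of the SMS dataset, and the helpers mirror the hand-written metric functions rather than the sklearn ones.

```python
def accuracy(actual, preds):
    """Fraction of predictions that match the actual labels."""
    return sum(1 for a, p in zip(actual, preds) if a == p) / len(actual)


def recall(actual, preds):
    """True positives over all actual positives (positive class is 1)."""
    tp = sum(1 for a, p in zip(actual, preds) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, preds) if a == 1 and p == 0)
    return tp / (tp + fn)


# Hypothetical imbalanced labels: 5 positives and 95 negatives.
actual = [1] * 5 + [0] * 95

# A useless classifier that predicts the majority class every time.
always_negative = [0] * len(actual)

acc = accuracy(actual, always_negative)
rec = recall(actual, always_negative)
print('accuracy:', acc)
print('recall:  ', rec)
```

The classifier never finds a single positive case, yet its accuracy looks strong because the negatives dominate the count. Recall exposes the failure immediately.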