udacity/python/Supervised Learning/Model Evaluation Metrics/Classification Metrics/Classification_Metrics.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Our Mission\n",
    "\n",
    "In this lesson you gained some insight into a number of techniques used to understand how well our model is performing.  This notebook is aimed at giving you some practice with the metrics specifically related to classification problems.  With that in mind, we will again be looking at the spam dataset from the earlier lessons.\n",
    "\n",
    "First, run the cell below to prepare the data and instantiate a number of different models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import our libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from sklearn.model_selection import train_test_split\n",
    "from sklearn.feature_extraction.text import CountVectorizer\n",
    "from sklearn.naive_bayes import MultinomialNB\n",
    "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n",
    "from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, AdaBoostClassifier\n",
    "from sklearn.svm import SVC\n",
    "import tests as t\n",
    "\n",
    "# Read in our dataset\n",
    "df = pd.read_table('smsspamcollection/SMSSpamCollection',\n",
    "                   sep='\\t', \n",
    "                   header=None, \n",
    "                   names=['label', 'sms_message'])\n",
    "\n",
    "# Fix our response value\n",
    "df['label'] = df.label.map({'ham':0, 'spam':1})\n",
    "\n",
    "# Split our dataset into training and testing data\n",
    "X_train, X_test, y_train, y_test = train_test_split(df['sms_message'], \n",
    "                                                    df['label'], \n",
    "                                                    random_state=1)\n",
    "\n",
    "# Instantiate the CountVectorizer method\n",
    "count_vector = CountVectorizer()\n",
    "\n",
    "# Fit the training data and then return the matrix\n",
    "training_data = count_vector.fit_transform(X_train)\n",
    "\n",
    "# Transform testing data and return the matrix. Note we are not fitting the testing data into the CountVectorizer()\n",
    "testing_data = count_vector.transform(X_test)\n",
    "\n",
    "# Instantiate a number of our models\n",
    "naive_bayes = MultinomialNB()\n",
    "bag_mod = BaggingClassifier(n_estimators=200)\n",
    "rf_mod = RandomForestClassifier(n_estimators=200)\n",
    "ada_mod = AdaBoostClassifier(n_estimators=300, learning_rate=0.2)\n",
    "svm_mod = SVC()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Step 1**: Now, fit each of the above models to the appropriate data.  Answer the following question to assure that you fit the models correctly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fit each of the 4 models\n",
    "# This might take some time to run\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The models you fit above were fit on which data?\n",
    "\n",
    "a = 'X_train'\n",
    "b = 'X_test'\n",
    "c = 'y_train'\n",
    "d = 'y_test'\n",
    "e = 'training_data'\n",
    "f = 'testing_data'\n",
    "\n",
    "# Change models_fit_on to only contain the correct string names\n",
    "# of values that you oassed to the above models\n",
    "\n",
    "models_fit_on = {a, b, c, d, e, f} # update this to only contain correct letters\n",
    "\n",
    "# Checks your solution - don't change this\n",
    "t.test_one(models_fit_on)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Step 2**: Now make predictions for each of your models on the data that will allow you to understand how well our model will extend to new data.  Then correctly add the strings to the set in the following cell."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make predictions using each of your models\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Which data was used in the predict method to see how well your\n",
    "# model would work on new data?\n",
    "\n",
    "a = 'X_train'\n",
    "b = 'X_test'\n",
    "c = 'y_train'\n",
    "d = 'y_test'\n",
    "e = 'training_data'\n",
    "f = 'testing_data'\n",
    "\n",
    "# Change models_predict_on to only contain the correct string names\n",
    "# of values that you oassed to the above models\n",
    "\n",
    "models_predict_on = {a, b, c, d, e, f} # update this to only contain correct letters\n",
    "\n",
    "# Checks your solution - don't change this\n",
    "t.test_two(models_predict_on)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that you have set up all your predictions, let's get to topics addressed in this lesson - measuring how well each of your models performed. First, we will focus on how each metric was calculated for a single model, and then in the final part of this notebook, you will choose models that are best based on a particular metric.\n",
    "\n",
    "You will be writing functions to calculate a number of metrics and then comparing the values to what you get from sklearn.  This will help you build intuition for how each metric is calculated.\n",
    "\n",
    "> **Step 3**: As an example of how this will work for the upcoming questions, run the cell below.  Fill in the below function to calculate accuracy, and then compare your answer to the built in to assure you are correct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# accuracy is the total correct divided by the total to predict\n",
    "def accuracy(actual, preds):\n",
    "    '''\n",
    "    INPUT\n",
    "    preds - predictions as a numpy array or pandas series\n",
    "    actual - actual values as a numpy array or pandas series\n",
    "    \n",
    "    OUTPUT:\n",
    "    returns the accuracy as a float\n",
    "    '''\n",
    "    return np.sum(preds == actual)/len(actual)\n",
    "\n",
    "\n",
    "print(accuracy(y_test, preds_nb))\n",
    "print(accuracy_score(y_test, preds_nb))\n",
    "print(\"Since these match, we correctly calculated our metric!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Step 4**: Fill in the below function to calculate precision, and then compare your answer to the built in to assure you are correct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# precision is the true positives over the predicted positive values\n",
    "def precision(actual, preds):\n",
    "    '''\n",
    "    INPUT\n",
    "    (assumes positive = 1 and negative = 0)\n",
    "    preds - predictions as a numpy array or pandas series \n",
    "    actual - actual values as a numpy array or pandas series\n",
    "    \n",
    "    OUTPUT:\n",
    "    returns the precision as a float\n",
    "    '''\n",
    "    \n",
    "    return None # calculate precision here\n",
    "\n",
    "\n",
    "print(precision(y_test, preds_nb))\n",
    "print(precision_score(y_test, preds_nb))\n",
    "print(\"If the above match, you got it!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Step 5**: Fill in the below function to calculate recall, and then compare your answer to the built in to assure you are correct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# recall is true positives over all actual positive values\n",
    "def recall(actual, preds):\n",
    "    '''\n",
    "    INPUT\n",
    "    preds - predictions as a numpy array or pandas series\n",
    "    actual - actual values as a numpy array or pandas series\n",
    "    \n",
    "    OUTPUT:\n",
    "    returns the recall as a float\n",
    "    '''\n",
    "\n",
    "    return None # calculate recall here\n",
    "\n",
    "\n",
    "print(recall(y_test, preds_nb))\n",
    "print(recall_score(y_test, preds_nb))\n",
    "print(\"If the above match, you got it!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Step 6**: Fill in the below function to calculate f1-score, and then compare your answer to the built in to assure you are correct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# f1_score is 2*(precision*recall)/(precision+recall))\n",
    "def f1(preds, actual):\n",
    "    '''\n",
    "    INPUT\n",
    "    preds - predictions as a numpy array or pandas series\n",
    "    actual - actual values as a numpy array or pandas series\n",
    "    \n",
    "    OUTPUT:\n",
    "    returns the f1score as a float\n",
    "    '''\n",
    "    \n",
    "    return None # calculate f1-score here\n",
    "\n",
    "\n",
    "print(f1(y_test, preds_nb))\n",
    "print(f1_score(y_test, preds_nb))\n",
    "print(\"If the above match, you got it!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Step 7:** Now that you have calculated a number of different metrics, let's tie that to when we might use one versus another.  Use the dictionary below to match a metric to each statement that identifies when you would want to use that metric."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add the letter of the most appropriate metric to each statement\n",
    "# in the dictionary\n",
    "a = \"recall\"\n",
    "b = \"precision\"\n",
    "c = \"accuracy\"\n",
    "d = 'f1-score'\n",
    "\n",
    "\n",
    "seven_sol = {\n",
    "'We have imbalanced classes, which metric do we definitely not want to use?': None # letter here,\n",
    "'We really want to make sure the positive cases are all caught even if that means we identify some negatives as positives': None # letter here,    \n",
    "'When we identify something as positive, we want to be sure it is truly positive': None # letter here, \n",
    "'We care equally about identifying positive and negative cases': None # letter here    \n",
    "}\n",
    "\n",
    "t.sol_seven(seven_sol)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Step 8:** Given what you know about the metrics now, use this information to correctly match the appropriate model to when it would be best to use each in the dictionary below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# use the answers you found to the previous questiona, then match the model that did best for each metric\n",
    "a = \"naive-bayes\"\n",
    "b = \"bagging\"\n",
    "c = \"random-forest\"\n",
    "d = 'ada-boost'\n",
    "e = \"svm\"\n",
    "\n",
    "\n",
    "eight_sol = {\n",
    "'We have imbalanced classes, which metric do we definitely not want to use?': None # letter here,\n",
    "'We really want to make sure the positive cases are all caught even if that means we identify some negatives as positives': None # letter here,    \n",
    "'When we identify something as positive, we want to be sure it is truly positive': None # letter here, \n",
    "'We care equally about identifying positive and negative cases': None # letter here  \n",
    "}\n",
    "\n",
    "t.sol_eight(eight_sol)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# cells for work"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If you get stuck, also notice there is a solution available by hitting the orange button in the top left"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a final step in this workbook, let's take a look at the last three metrics you saw, f-beta scores, ROC curves, and AUC.\n",
    "\n",
    "**For f-beta scores:** If you decide that you care more about precision, you should move beta closer to 0.  If you decide you care more about recall, you should move beta towards infinity. \n",
    "\n",
    "> **Step 9:** Using the fbeta_score works similar to most of the other metrics in sklearn, but you also need to set beta as your weighting between precision and recall.  Use the space below to show that you can use [fbeta in sklearn](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.fbeta_score.html) to replicate your f1-score from above.  If in the future you want to use a different weighting, [this article](http://mlwiki.org/index.php/Precision_and_Recall) does an amazing job of explaining how you might adjust beta for different situations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import fbeta_score\n",
    "\n",
    "\n",
    "# Show that you can produce the same f1_score results using fbeta_score\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **Step 10:** Building ROC curves in python is a pretty involved process on your own.  I wrote the function below to assist with the process and make it easier for you to do so in the future as well.  Try it out using one of the other classifiers you created above to see how it compares to the random forest model below.\n",
    "\n",
    "Run the cell below to build a ROC curve, and retrieve the AUC for the random forest model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function for calculating auc and roc\n",
    "\n",
    "def build_roc_auc(model, X_train, X_test, y_train, y_test):\n",
    "    '''\n",
    "    INPUT:\n",
    "    model - an sklearn instantiated model\n",
    "    X_train - the training data\n",
    "    y_train - the training response values (must be categorical)\n",
    "    X_test - the test data\n",
    "    y_test - the test response values (must be categorical)\n",
    "    OUTPUT:\n",
    "    auc - returns auc as a float\n",
    "    prints the roc curve\n",
    "    '''\n",
    "    import numpy as np\n",
    "    import matplotlib.pyplot as plt\n",
    "    from itertools import cycle\n",
    "    from sklearn.metrics import roc_curve, auc, roc_auc_score\n",
    "    from scipy import interp\n",
    "    \n",
    "    y_preds = model.fit(X_train, y_train).predict_proba(X_test)\n",
    "    # Compute ROC curve and ROC area for each class\n",
    "    fpr = dict()\n",
    "    tpr = dict()\n",
    "    roc_auc = dict()\n",
    "    for i in range(len(y_test)):\n",
    "        fpr[i], tpr[i], _ = roc_curve(y_test, y_preds[:, 1])\n",
    "        roc_auc[i] = auc(fpr[i], tpr[i])\n",
    "\n",
    "    # Compute micro-average ROC curve and ROC area\n",
    "    fpr[\"micro\"], tpr[\"micro\"], _ = roc_curve(y_test.ravel(), y_preds[:, 1].ravel())\n",
    "    roc_auc[\"micro\"] = auc(fpr[\"micro\"], tpr[\"micro\"])\n",
    "    \n",
    "    plt.plot(fpr[2], tpr[2], color='darkorange',\n",
    "             lw=2, label='ROC curve (area = %0.2f)' % roc_auc[2])\n",
    "    plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')\n",
    "    plt.xlim([0.0, 1.0])\n",
    "    plt.ylim([0.0, 1.05])\n",
    "    plt.xlabel('False Positive Rate')\n",
    "    plt.ylabel('True Positive Rate')\n",
    "    plt.title('Receiver operating characteristic example')\n",
    "    plt.show()\n",
    "    \n",
    "    return roc_auc_score(y_test, np.round(y_preds[:, 1]))\n",
    "    \n",
    "    \n",
    "# Finding roc and auc for the random forest model    \n",
    "build_roc_auc(rf_mod, training_data, testing_data, y_train, y_test) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Your turn here - choose another classifier to see how it compares\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}