completed deep learning section

2019-07-24 00:09:38 +01:00
parent bb46088e76
commit 46660262e0
18 changed files with 7161 additions and 0 deletions
--- a/PyTorch/.ipynb_checkpoints/Part
+++ b/PyTorch/.ipynb_checkpoints/Part
--- a/PyTorch/.ipynb_checkpoints/Untitled-checkpoint.ipynb
+++ b/PyTorch/.ipynb_checkpoints/Untitled-checkpoint.ipynb
@@ -0,0 +1,6 @@
+{
+ "cells": [],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/(Exercises).ipynb
+++ b/(Exercises).ipynb
@@ -0,0 +1,475 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Introduction to Deep Learning with PyTorch\n",
+    "\n",
+    "In this notebook, you'll be introduced to [PyTorch](http://pytorch.org/), a framework for building and training neural networks. In many ways PyTorch behaves like the arrays you love from Numpy; these Numpy arrays, after all, are just tensors. PyTorch takes these tensors and makes it simple to move them to GPUs for the faster processing needed when training neural networks. It also provides a module that automatically calculates gradients (for backpropagation!) and another module specifically for building neural networks. Altogether, PyTorch fits more naturally with Python and the Numpy/Scipy stack than TensorFlow and other frameworks do.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Neural Networks\n",
+    "\n",
+    "Deep learning is based on artificial neural networks, which have been around in some form since the late 1950s. The networks are built from individual parts approximating neurons, typically called units or simply \"neurons.\" Each unit has some number of weighted inputs. These weighted inputs are summed together along with a bias term (a linear combination), and the result is passed through an activation function to get the unit's output.\n",
+    "\n",
+    "<img src=\"assets/simple_neuron.png\" width=400px>\n",
+    "\n",
+    "Mathematically this looks like: \n",
+    "\n",
+    "$$\n",
+    "\\begin{align}\n",
+    "y &= f(w_1 x_1 + w_2 x_2 + b) \\\\\n",
+    "y &= f\\left(\\sum_i w_i x_i +b \\right)\n",
+    "\\end{align}\n",
+    "$$\n",
+    "\n",
+    "In vector form, the weighted sum (before adding the bias) is the dot/inner product of the input vector and the weight vector:\n",
+    "\n",
+    "$$\n",
+    "h = \\begin{bmatrix}\n",
+    "x_1 \\, x_2 \\cdots  x_n\n",
+    "\\end{bmatrix}\n",
+    "\\cdot \n",
+    "\\begin{bmatrix}\n",
+    "           w_1 \\\\\n",
+    "           w_2 \\\\\n",
+    "           \\vdots \\\\\n",
+    "           w_n\n",
+    "\\end{bmatrix}\n",
+    "$$"
+   ]
+  },
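+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check of the equations above, here is a minimal sketch (an addition, not one of the original exercises) that computes a single neuron's output in two ways: with the explicit sum $\\sum_i w_i x_i + b$ and with a dot product. The inputs, weights, and bias are arbitrary illustrative values, and the sigmoid is assumed as the activation $f$."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "# Arbitrary example values (illustrative only)\n",
+    "x = torch.tensor([0.5, -1.0, 2.0])  # inputs x_1..x_3\n",
+    "w = torch.tensor([0.1, 0.4, -0.3])  # weights w_1..w_3\n",
+    "b = torch.tensor(0.2)               # bias\n",
+    "\n",
+    "# Explicit sum over i of w_i * x_i, plus the bias\n",
+    "z_loop = sum(w[i] * x[i] for i in range(len(x))) + b\n",
+    "# The same linear combination written as a dot (inner) product\n",
+    "z_dot = torch.dot(x, w) + b\n",
+    "\n",
+    "print(torch.allclose(z_loop, z_dot))  # True: both forms agree\n",
+    "# Apply the (assumed) sigmoid activation f to get the neuron's output y\n",
+    "print(torch.sigmoid(z_dot))"
+   ]
+  },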
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Tensors\n",
+    "\n",
+    "It turns out neural network computations are just a bunch of linear algebra operations on *tensors*, a generalization of matrices. A vector is a 1-dimensional tensor, a matrix is a 2-dimensional tensor, and an array with three indices is a 3-dimensional tensor (RGB color images, for example). Tensors are the fundamental data structure for neural networks, and PyTorch (as well as pretty much every other deep learning framework) is built around them.\n",
+    "\n",
+    "<img src=\"assets/tensor_examples.svg\" width=600px>\n",
+    "\n",
+    "With the basics covered, it's time to explore how we can use PyTorch to build a simple neural network."
+   ]
+  },
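+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To make the dimension terminology concrete, the sketch below (an illustrative addition) builds a 1-D, a 2-D, and a 3-D tensor and prints how many indices each has along with its shape. The 3-D example assumes the channels-first `(channels, height, width)` layout conventionally used for images in PyTorch; the sizes themselves are arbitrary."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "vector = torch.zeros(4)         # 1-D tensor: 4 elements\n",
+    "matrix = torch.zeros(3, 4)      # 2-D tensor: 3 rows, 4 columns\n",
+    "image = torch.zeros(3, 32, 32)  # 3-D tensor: an RGB image, channels-first\n",
+    "\n",
+    "for name, t in [('vector', vector), ('matrix', matrix), ('image', image)]:\n",
+    "    # .dim() is the number of indices; .shape is the size along each index\n",
+    "    print(name, t.dim(), tuple(t.shape))"
+   ]
+  },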
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# First, import PyTorch\n",
+    "import torch"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def activation(x):\n",
+    "    \"\"\" Sigmoid activation function \n",
+    "    \n",
+    "        Arguments\n",
+    "        ---------\n",
+    "        x: torch.Tensor\n",
+    "    \"\"\"\n",
+    "    return 1/(1+torch.exp(-x))"
+   ]
+  },
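+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The `activation` function above is the logistic sigmoid, which PyTorch also provides as `torch.sigmoid`. As an illustrative check (an addition to the original notebook), the sketch below re-implements the same formula under a separate, hypothetical name, `sigmoid_manual`, so the cell runs on its own, and compares it with the built-in over a range of inputs."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "def sigmoid_manual(x):\n",
+    "    # Hypothetical helper: same formula as `activation` above, 1 / (1 + exp(-x))\n",
+    "    return 1 / (1 + torch.exp(-x))\n",
+    "\n",
+    "x = torch.linspace(-5, 5, steps=11)\n",
+    "print(torch.allclose(sigmoid_manual(x), torch.sigmoid(x)))  # True\n",
+    "print(torch.sigmoid(torch.tensor(0.0)))  # sigmoid(0) is exactly 0.5"
+   ]
+  },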
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "### Generate some data\n",
+    "torch.manual_seed(7) # Set the random seed so things are predictable\n",
+    "\n",
+    "# Features are 5 random normal variables\n",
+    "features = torch.randn((1, 5))\n",
+    "# True weights for our data, random normal variables again\n",
+    "weights = torch.randn_like(features)\n",
+    "# and a true bias term\n",
+    "bias = torch.randn((1, 1))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "tensor([[-0.9179, -0.4578, -0.7245,  1.2799, -0.9941]])"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "torch.randn_like(torch.randn((1, 5)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Above I generated data we can use to get the output of our simple network. This is all just random for now; going forward we'll start using real data. Going through each relevant line:\n",
+    "\n",
+    "`features = torch.randn((1, 5))` creates a tensor with shape `(1, 5)`, one row and five columns, that contains values randomly distributed according to the normal distribution with a mean of zero and standard deviation of one. \n",
+    "\n",
+    "`weights = torch.randn_like(features)` creates another tensor with the same shape as `features`, again containing values from a normal distribution.\n",
+    "\n",
+    "Finally, `bias = torch.randn((1, 1))` creates a single value from a normal distribution.\n",
+    "\n",
+    "PyTorch tensors can be added, multiplied, subtracted, etc., just like Numpy arrays. In general, you'll use PyTorch tensors pretty much the same way you'd use Numpy arrays. They come with some nice benefits, though, such as GPU acceleration, which we'll get to later. For now, use the generated data to calculate the output of this simple single-layer network. \n",
+    "> **Exercise**: Calculate the output of the network with input features `features`, weights `weights`, and bias `bias`. Similar to Numpy, PyTorch has a [`torch.sum()`](https://pytorch.org/docs/stable/torch.html#torch.sum) function, as well as a `.sum()` method on tensors, for taking sums. Use the function `activation` defined above as the activation function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor([[ 0.1595]])\n"
+     ]
+    }
+   ],
+   "source": [
+    "## Calculate the output of this network using the weights and bias tensors\n",
+    "# Multiply element-wise, sum, add the bias, then apply the sigmoid activation\n",
+    "y = activation(torch.sum(features * weights) + bias)\n",
+    "print(y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can do the multiplication and sum in the same operation using a matrix multiplication. In general, you'll want to use matrix multiplications since they are more efficient and are accelerated by modern libraries and high-performance computing on GPUs.\n",
+    "\n",
+    "Here, we want to do a matrix multiplication of the features and the weights. For this we can use [`torch.mm()`](https://pytorch.org/docs/stable/torch.html#torch.mm) or [`torch.matmul()`](https://pytorch.org/docs/stable/torch.html#torch.matmul), which is somewhat more complicated and supports broadcasting. If we try to do it with `features` and `weights` as they are, we'll get an error:\n",
+    "\n",
+    "```python\n",
+    ">> torch.mm(features, weights)\n",
+    "\n",
+    "---------------------------------------------------------------------------\n",
+    "RuntimeError                              Traceback (most recent call last)\n",
+    "<ipython-input-13-15d592eb5279> in <module>()\n",
+    "----> 1 torch.mm(features, weights)\n",
+    "\n",
+    "RuntimeError: size mismatch, m1: [1 x 5], m2: [1 x 5] at /Users/soumith/minicondabuild3/conda-bld/pytorch_1524590658547/work/aten/src/TH/generic/THTensorMath.c:2033\n",
+    "```\n",
+    "\n",
+    "As you're building neural networks in any framework, you'll see this often. Really often. What's happening here is our tensors aren't the correct shapes to perform a matrix multiplication. Remember that for matrix multiplications, the number of columns in the first tensor must equal the number of rows in the second tensor. Both `features` and `weights` have the same shape, `(1, 5)`. This means we need to change the shape of `weights` to get the matrix multiplication to work.\n",
+    "\n",
+    "**Note:** To see the shape of a tensor called `tensor`, use the `tensor.shape` attribute. If you're building neural networks, you'll be using it often.\n",
+    "\n",
+    "There are a few options here: [`weights.reshape()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.reshape), [`weights.resize_()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.resize_), and [`weights.view()`](https://pytorch.org/docs/stable/tensors.html#torch.Tensor.view).\n",
+    "\n",
+    "* `weights.reshape(a, b)` returns a tensor with size `(a, b)` containing the same data as `weights`. When possible the result is a view that shares memory with `weights`; otherwise it is a clone, meaning the data is copied to another part of memory.\n",
+    "* `weights.resize_(a, b)` returns the same tensor with a different shape. However, if the new shape results in fewer elements than the original tensor, some elements will be removed from the tensor (but not from memory). If the new shape results in more elements than the original tensor, the new elements will be uninitialized in memory. Note that the underscore at the end of the method name denotes that this method is performed **in-place**. Here is a great forum thread to [read more about in-place operations](https://discuss.pytorch.org/t/what-is-in-place-operation/16244) in PyTorch.\n",
+    "* `weights.view(a, b)` returns a new tensor with size `(a, b)` that shares the same data (and memory) as `weights`. It never copies, so it raises an error if the tensor's memory layout is incompatible with the requested shape.\n",
+    "\n",
+    "I usually use `.view()`, but any of the three methods will work for this. So, now we can reshape `weights` to have five rows and one column with something like `weights.view(5, 1)`.\n",
+    "\n",
+    "> **Exercise**: Calculate the output of our little network using matrix multiplication."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor([[ 0.1595]])\n"
+     ]
+    }
+   ],
+   "source": [
+    "## Calculate the output of this network using matrix multiplication\n",
+    "# Reshape weights from (1, 5) to (5, 1) so the inner dimensions match\n",
+    "y = activation(torch.mm(features, weights.view(5, 1)) + bias)\n",
+    "print(y)"
+   ]
+  },
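+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The three reshaping options described above differ mainly in how they treat memory. This sketch (an illustrative addition that uses small `arange` tensors rather than `weights`) checks which results share storage with the original, shows `.view()` refusing a non-contiguous (transposed) tensor where `.reshape()` falls back to a copy, and resizes a tensor in place with `.resize_()`. The exact error message varies between PyTorch versions, so it is only caught, not printed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "t = torch.arange(6.)   # tensor with values 0.0 .. 5.0\n",
+    "\n",
+    "v = t.view(2, 3)       # a view always shares memory with t\n",
+    "r = t.reshape(3, 2)    # t is contiguous, so reshape also returns a view\n",
+    "print(v.data_ptr() == t.data_ptr(), r.data_ptr() == t.data_ptr())  # True True\n",
+    "\n",
+    "v[0, 0] = 100.         # writing through the view changes t as well\n",
+    "print(t[0].item())     # 100.0\n",
+    "\n",
+    "# A transpose is non-contiguous: view() fails, reshape() copies instead\n",
+    "tt = v.t()\n",
+    "try:\n",
+    "    tt.view(6)\n",
+    "except RuntimeError as err:\n",
+    "    print('view failed:', type(err).__name__)\n",
+    "flat = tt.reshape(6)\n",
+    "print(flat.data_ptr() == t.data_ptr())  # False: reshape made a copy\n",
+    "\n",
+    "# resize_ changes the shape of the same tensor object in place\n",
+    "w = torch.arange(5.)\n",
+    "w.resize_(5, 1)\n",
+    "print(w.shape)         # torch.Size([5, 1])"
+   ]
+  },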
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Stack them up!\n",
+    "\n",
+    "That's how you can calculate the output for a single neuron. The real power of this algorithm happens when you start stacking these individual units into layers and stacks of layers, into a network of neurons. The output of one layer of neurons becomes the input for the next layer. With multiple input units and output units, we now need to express the weights as a matrix.\n",
+    "\n",
+    "<img src='assets/multilayer_diagram_weights.png' width=450px>\n",
+    "\n",
+    "The first layer shown on the bottom here are the inputs, understandably called the **input layer**. The middle layer is called the **hidden layer**, and the final layer (on the right) is the **output layer**. We can express this network mathematically with matrices again and use matrix multiplication to get linear combinations for each unit in one operation. For example, the hidden layer ($h_1$ and $h_2$ here) can be calculated \n",
+    "\n",
+    "$$\n",
+    "\\vec{h} = [h_1 \\, h_2] = \n",
+    "\\begin{bmatrix}\n",
+    "x_1 \\, x_2 \\cdots \\, x_n\n",
+    "\\end{bmatrix}\n",
+    "\\cdot \n",
+    "\\begin{bmatrix}\n",
+    "           w_{11} & w_{12} \\\\\n",
+    "           w_{21} &w_{22} \\\\\n",
+    "           \\vdots &\\vdots \\\\\n",
+    "           w_{n1} &w_{n2}\n",
+    "\\end{bmatrix}\n",
+    "$$\n",
+    "\n",
+    "The output for this small network is found by treating the hidden layer as inputs for the output unit. The network output is expressed simply\n",
+    "\n",
+    "$$\n",
+    "y =  f_2 \\! \\left(\\, f_1 \\! \\left(\\vec{x} \\, \\mathbf{W_1}\\right) \\mathbf{W_2} \\right)\n",
+    "$$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "### Generate some data\n",
+    "torch.manual_seed(7) # Set the random seed so things are predictable\n",
+    "\n",
+    "# Features are 3 random normal variables\n",
+    "features = torch.randn((1, 3))\n",
+    "\n",
+    "# Define the size of each layer in our network\n",
+    "n_input = features.shape[1]     # Number of input units, must match number of input features\n",
+    "n_hidden = 2                    # Number of hidden units \n",
+    "n_output = 1                    # Number of output units\n",
+    "\n",
+    "# Weights for inputs to hidden layer\n",
+    "W1 = torch.randn(n_input, n_hidden)\n",
+    "# Weights for hidden layer to output layer\n",
+    "W2 = torch.randn(n_hidden, n_output)\n",
+    "\n",
+    "# and bias terms for hidden and output layers\n",
+    "B1 = torch.randn((1, n_hidden))\n",
+    "B2 = torch.randn((1, n_output))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **Exercise:** Calculate the output for this multi-layer network using the weights `W1` & `W2`, and the biases, `B1` & `B2`. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "tensor([[ 0.3171]])\n"
+     ]
+    }
+   ],
+   "source": [
+    "## Your solution here\n",
+    "h = torch.mm(features, W1) + B1\n",
+    "h = activation(h)\n",
+    "output = torch.mm(h, W2) + B2\n",
+    "output = activation(output)\n",
+    "print(output)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you did this correctly, you should see the output `tensor([[ 0.3171]])`.\n",
+    "\n",
+    "The number of hidden units is a parameter of the network, often called a **hyperparameter** to differentiate it from the weights and biases parameters. As you'll see later when we discuss training a neural network, the more hidden units and layers a network has, the more complex the patterns it is able to learn from data, which can lead to more accurate predictions."
+   ]
+  },
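+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Since the hidden-layer size is a hyperparameter, it helps to see how it drives the number of learned parameters. In the two-layer network above, `W1` has `n_input * n_hidden` entries, `B1` has `n_hidden`, `W2` has `n_hidden * n_output`, and `B2` has `n_output`. The helper below, `count_parameters`, is a hypothetical name (not a PyTorch function) that just does that arithmetic."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def count_parameters(n_input, n_hidden, n_output):\n",
+    "    # Hypothetical helper: weights + biases of a one-hidden-layer network\n",
+    "    w1 = n_input * n_hidden   # entries in W1\n",
+    "    b1 = n_hidden             # entries in B1\n",
+    "    w2 = n_hidden * n_output  # entries in W2\n",
+    "    b2 = n_output             # entries in B2\n",
+    "    return w1 + b1 + w2 + b2\n",
+    "\n",
+    "# The network above: 3 inputs, 2 hidden units, 1 output\n",
+    "print(count_parameters(3, 2, 1))   # 6 + 2 + 2 + 1 = 11\n",
+    "# Widening the hidden layer grows the count linearly in n_hidden\n",
+    "print(count_parameters(3, 64, 1))  # 192 + 64 + 64 + 1 = 321"
+   ]
+  },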
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Numpy to Torch and back\n",
+    "\n",
+    "Special bonus section! PyTorch has a great feature for converting between Numpy arrays and Torch tensors. To create a tensor from a Numpy array, use `torch.from_numpy()`. To convert a tensor to a Numpy array, use the `.numpy()` method."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([[ 0.25916044,  0.07974115,  0.13661261],\n",
+       "       [ 0.12537927,  0.68318453,  0.24728824],\n",
+       "       [ 0.49867652,  0.62760544,  0.84363415],\n",
+       "       [ 0.39216351,  0.87075396,  0.0653991 ]])"
+      ]
+     },
+     "execution_count": 27,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "a = np.random.rand(4,3)\n",
+    "a"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "tensor([[ 0.2592,  0.0797,  0.1366],\n",
+       "        [ 0.1254,  0.6832,  0.2473],\n",
+       "        [ 0.4987,  0.6276,  0.8436],\n",
+       "        [ 0.3922,  0.8708,  0.0654]], dtype=torch.float64)"
+      ]
+     },
+     "execution_count": 28,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "b = torch.from_numpy(a)\n",
+    "b"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([[ 0.25916044,  0.07974115,  0.13661261],\n",
+       "       [ 0.12537927,  0.68318453,  0.24728824],\n",
+       "       [ 0.49867652,  0.62760544,  0.84363415],\n",
+       "       [ 0.39216351,  0.87075396,  0.0653991 ]])"
+      ]
+     },
+     "execution_count": 29,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "b.numpy()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The memory is shared between the Numpy array and the Torch tensor, so if you change the values of one object in place, the other will change as well."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "tensor([[ 0.5183,  0.1595,  0.2732],\n",
+       "        [ 0.2508,  1.3664,  0.4946],\n",
+       "        [ 0.9974,  1.2552,  1.6873],\n",
+       "        [ 0.7843,  1.7415,  0.1308]], dtype=torch.float64)"
+      ]
+     },
+     "execution_count": 30,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Multiply PyTorch Tensor by 2, in place\n",
+    "b.mul_(2)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([[ 0.51832089,  0.15948231,  0.27322523],\n",
+       "       [ 0.25075854,  1.36636907,  0.49457648],\n",
+       "       [ 0.99735305,  1.25521088,  1.68726831],\n",
+       "       [ 0.78432703,  1.74150792,  0.1307982 ]])"
+      ]
+     },
+     "execution_count": 31,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Numpy array matches new values from Tensor\n",
+    "a"
+   ]
+  },
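+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`torch.from_numpy()` is not the only way to convert: `torch.tensor()` also accepts a Numpy array, but it always copies the data. The sketch below (an illustrative addition using a fresh array rather than `a` and `b` from above) contrasts the two: an in-place change to the array shows up in the `from_numpy` tensor but not in the copied one."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import torch\n",
+    "\n",
+    "arr = np.ones(3)\n",
+    "shared = torch.from_numpy(arr)  # shares memory with arr\n",
+    "copied = torch.tensor(arr)      # makes an independent copy\n",
+    "\n",
+    "arr *= 5                        # modify the Numpy array in place\n",
+    "print(shared)                   # reflects the change (all fives)\n",
+    "print(copied)                   # unchanged (all ones)\n",
+    "print(np.shares_memory(arr, shared.numpy()))  # True"
+   ]
+  },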
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/(Exercises).ipynb
+++ b/(Exercises).ipynb
--- a/(Exercises).ipynb
+++ b/(Exercises).ipynb
--- a/(Exercises).ipynb
+++ b/(Exercises).ipynb
--- a/(Solution).ipynb
+++ b/(Solution).ipynb
--- a/(Exercises).ipynb
+++ b/(Exercises).ipynb
--- a/Learning/Deep
+++ b/Learning/Deep
--- a/(Exercises).ipynb
+++ b/(Exercises).ipynb
--- a/(Exercises).ipynb
+++ b/(Exercises).ipynb
@@ -0,0 +1,903 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Transfer Learning\n",
+    "\n",
+    "In this notebook, you'll learn how to use pre-trained networks to solve challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html). \n",
+    "\n",
+    "ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep neural networks built from convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).\n",
+    "\n",
+    "Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near-perfect accuracy.\n",
+    "\n",
+    "With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "%config InlineBackend.figure_format = 'retina'\n",
+    "\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "import torch\n",
+    "from torch import nn\n",
+    "from torch import optim\n",
+    "import torch.nn.functional as F\n",
+    "from torchvision import datasets, transforms, models"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Most of the pretrained models require the input to be 224x224 images. Also, we'll need to match the normalization used when the models were trained. Each color channel was normalized separately; the means are `[0.485, 0.456, 0.406]` and the standard deviations are `[0.229, 0.224, 0.225]`."
+   ]
+  },
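+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`transforms.Normalize(mean, std)` computes `(x - mean) / std` separately for each channel of a `(C, H, W)` tensor. The check below (an illustrative addition using a random tensor in place of a real image) compares it with the same arithmetic done by hand through broadcasting. It assumes `torchvision` is installed, as the imports above already require, and computes the manual result first because some older torchvision versions normalize the input tensor in place."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "from torchvision import transforms\n",
+    "\n",
+    "mean = [0.485, 0.456, 0.406]\n",
+    "std = [0.229, 0.224, 0.225]\n",
+    "normalize = transforms.Normalize(mean, std)\n",
+    "\n",
+    "img = torch.rand(3, 224, 224)  # stand-in for a ToTensor() output in [0, 1]\n",
+    "\n",
+    "# Reshape mean/std to (3, 1, 1) so they broadcast over height and width\n",
+    "m = torch.tensor(mean).view(3, 1, 1)\n",
+    "s = torch.tensor(std).view(3, 1, 1)\n",
+    "manual = (img - m) / s         # computed before calling normalize\n",
+    "\n",
+    "print(torch.allclose(normalize(img), manual, atol=1e-6))  # True"
+   ]
+  },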
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_dir = 'Cat_Dog_data'\n",
+    "\n",
+    "# TODO: Define transforms for the training data and testing data\n",
+    "train_transforms = transforms.Compose([transforms.RandomRotation(30),\n",
+    "                                       transforms.RandomResizedCrop(224),\n",
+    "                                       transforms.RandomHorizontalFlip(),\n",
+    "                                       transforms.ToTensor(),\n",
+    "                                       transforms.Normalize([0.485, 0.456, 0.406],\n",
+    "                                                            [0.229, 0.224, 0.225])])\n",
+    "\n",
+    "test_transforms = transforms.Compose([transforms.Resize(255),\n",
+    "                                      transforms.CenterCrop(224),\n",
+    "                                      transforms.ToTensor(),\n",
+    "                                      transforms.Normalize([0.485, 0.456, 0.406],\n",
+    "                                                           [0.229, 0.224, 0.225])])\n",
+    "\n",
+    "# Pass transforms in here, then run the next cell to see how the transforms look\n",
+    "train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)\n",
+    "test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)\n",
+    "\n",
+    "trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)\n",
+    "testloader = torch.utils.data.DataLoader(test_data, batch_size=64)"
+   ]
+  },
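+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To see what the two pipelines produce without needing the `Cat_Dog_data` folder, this sketch (an illustrative addition) rebuilds the same transforms under new names and runs them on a synthetic PIL image. The image size and color are arbitrary. Both pipelines should end in a normalized `(3, 224, 224)` tensor, but only the training pipeline is random; the test pipeline gives the same output every time."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "from PIL import Image\n",
+    "from torchvision import transforms\n",
+    "\n",
+    "norm = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n",
+    "train_tf = transforms.Compose([transforms.RandomRotation(30),\n",
+    "                               transforms.RandomResizedCrop(224),\n",
+    "                               transforms.RandomHorizontalFlip(),\n",
+    "                               transforms.ToTensor(), norm])\n",
+    "test_tf = transforms.Compose([transforms.Resize(255),\n",
+    "                              transforms.CenterCrop(224),\n",
+    "                              transforms.ToTensor(), norm])\n",
+    "\n",
+    "# A synthetic 400x300 RGB image stands in for a real photo\n",
+    "img = Image.new('RGB', (400, 300), color=(120, 200, 80))\n",
+    "\n",
+    "print(train_tf(img).shape)  # torch.Size([3, 224, 224])\n",
+    "print(test_tf(img).shape)   # torch.Size([3, 224, 224])\n",
+    "# The test pipeline is deterministic: same input, same output\n",
+    "print(torch.equal(test_tf(img), test_tf(img)))  # True"
+   ]
+  },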
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/models/densenet.py:212: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "DenseNet(\n",
+       "  (features): Sequential(\n",
+       "    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)\n",
+       "    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "    (relu0): ReLU(inplace)\n",
+       "    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)\n",
+       "    (denseblock1): _DenseBlock(\n",
+       "      (denselayer1): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer2): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(96, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer3): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer4): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer5): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer6): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "    )\n",
+       "    (transition1): _Transition(\n",
+       "      (norm): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "      (relu): ReLU(inplace)\n",
+       "      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
+       "    )\n",
+       "    (denseblock2): _DenseBlock(\n",
+       "      (denselayer1): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer2): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer3): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer4): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer5): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer6): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer7): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer8): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer9): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer10): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer11): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer12): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "    )\n",
+       "    (transition2): _Transition(\n",
+       "      (norm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "      (relu): ReLU(inplace)\n",
+       "      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
+       "    )\n",
+       "    (denseblock3): _DenseBlock(\n",
+       "      (denselayer1): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer2): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer3): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer4): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer5): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer6): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer7): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer8): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer9): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer10): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer11): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer12): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer13): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer14): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer15): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer16): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer17): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer18): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer19): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer20): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer21): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer22): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer23): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer24): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "    )\n",
+       "    (transition3): _Transition(\n",
+       "      (norm): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "      (relu): ReLU(inplace)\n",
+       "      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
+       "    )\n",
+       "    (denseblock4): _DenseBlock(\n",
+       "      (denselayer1): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer2): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer3): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer4): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer5): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer6): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer7): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer8): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer9): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer10): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer11): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer12): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer13): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer14): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer15): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer16): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "    )\n",
+       "    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "  )\n",
+       "  (classifier): Linear(in_features=1024, out_features=1000, bias=True)\n",
+       ")"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "model = models.densenet121(pretrained=True)\n",
+    "model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This model is built out of two main parts: the features and the classifier. The features part is a stack of convolutional layers that acts as a feature detector, and its output can be fed into a classifier. The classifier part is a single fully-connected layer, `(classifier): Linear(in_features=1024, out_features=1000, bias=True)`. This layer was trained on the ImageNet dataset, so its 1000 outputs correspond to ImageNet categories and don't match our problem, which has only two classes (cat and dog). That means we need to replace the classifier, but the features should work well as they are. In general, I think of pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Freeze parameters so we don't backprop through them\n",
+    "for param in model.parameters():\n",
+    "    param.requires_grad = False\n",
+    "\n",
+    "from collections import OrderedDict\n",
+    "classifier = nn.Sequential(OrderedDict([\n",
+    "                          ('fc1', nn.Linear(1024, 500)),\n",
+    "                          ('relu', nn.ReLU()),\n",
+    "                          ('fc2', nn.Linear(500, 2)),\n",
+    "                          ('output', nn.LogSoftmax(dim=1))\n",
+    "                          ]))\n",
+    "    \n",
+    "model.classifier = classifier"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU as usual, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The GPU performs the linear algebra computations in parallel, which can make training many times faster; the exact speedup depends on the hardware and the model. It's also possible to train on multiple GPUs, further decreasing training time.\n",
+    "\n",
+    "PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backward passes on the GPU. In PyTorch, you move a model's parameters to GPU memory with `model.to('cuda')`, and a tensor with `tensor = tensor.to('cuda')` (for tensors, `.to()` returns a new tensor rather than moving the original in place). You can move them back from the GPU with `.to('cpu')`, which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Device = cpu; Time per batch: 5.701 seconds\n",
+      "Device = cuda; Time per batch: 0.010 seconds\n"
+     ]
+    }
+   ],
+   "source": [
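+    "for device in ['cpu', 'cuda']:\n",
+    "\n",
+    "    criterion = nn.NLLLoss()\n",
+    "    # Only train the classifier parameters, feature parameters are frozen\n",
+    "    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)\n",
+    "\n",
+    "    model.to(device)\n",
+    "    total_time = 0.0\n",
+    "\n",
+    "    for ii, (inputs, labels) in enumerate(trainloader):\n",
+    "\n",
+    "        # Move input and label tensors to the selected device\n",
+    "        inputs, labels = inputs.to(device), labels.to(device)\n",
+    "\n",
+    "        # CUDA runs asynchronously; synchronize so the timer only measures finished work\n",
+    "        if device == 'cuda':\n",
+    "            torch.cuda.synchronize()\n",
+    "        start = time.time()\n",
+    "\n",
+    "        optimizer.zero_grad()\n",
+    "        outputs = model(inputs)\n",
+    "        loss = criterion(outputs, labels)\n",
+    "        loss.backward()\n",
+    "        optimizer.step()\n",
+    "\n",
+    "        if device == 'cuda':\n",
+    "            torch.cuda.synchronize()\n",
+    "        # Skip batch 0, which includes one-time warm-up costs\n",
+    "        if ii > 0:\n",
+    "            total_time += time.time() - start\n",
+    "\n",
+    "        if ii == 3:\n",
+    "            break\n",
+    "\n",
+    "    # Batches 1, 2 and 3 were timed\n",
+    "    print(f\"Device = {device}; Time per batch: {total_time/3:.3f} seconds\")"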
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can write device-agnostic code that will automatically use CUDA if it's available, like so:\n",
+    "```python\n",
+    "# at beginning of the script\n",
+    "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+    "\n",
+    "...\n",
+    "\n",
+    "# then whenever you get a new Tensor or Module\n",
+    "# this won't copy if they are already on the desired device\n",
+    "input = data.to(device)\n",
+    "model = MyModule(...).to(device)\n",
+    "```\n",
+    "\n",
+    "From here, I'll let you finish training the model. The process is the same as before, except that now your model is much more powerful. You should easily get better than 95% accuracy.\n",
+    "\n",
+    ">**Exercise:** Train a pretrained model to classify the cat and dog images. Continue with the DenseNet model, or try ResNet, which is also a good model to try first. Make sure you only train the classifier and that the parameters of the features part are frozen."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/models/densenet.py:212: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "device(type='cuda')"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "## TODO: Use a pretrained model to classify the cat and dog images\n",
+    "\n",
+    "# Use GPU if it's available\n",
+    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
+    "\n",
+    "model = models.densenet121(pretrained=True)\n",
+    "\n",
+    "# Freeze the pretrained parameters so we don't backpropagate through them\n",
+    "for param in model.parameters():\n",
+    "    param.requires_grad = False\n",
+    "    \n",
+    "model.classifier = nn.Sequential(nn.Linear(1024, 256),\n",
+    "                                 nn.ReLU(),\n",
+    "                                 nn.Dropout(0.2),\n",
+    "                                 nn.Linear(256, 2),\n",
+    "                                 nn.LogSoftmax(dim=1))\n",
+    "\n",
+    "criterion = nn.NLLLoss()\n",
+    "\n",
+    "# Only train the classifier parameters, feature parameters are frozen\n",
+    "optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)\n",
+    "\n",
+    "model.to(device)\n",
+    "device"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch: 1 out of 1\n",
+      "Training Loss: 0.208\n",
+      "Test Loss: 0.076\n",
+      "Test Accuracy: 0.971\n",
+      "\n",
+      "Epoch: 1 out of 1\n",
+      "Training Loss: 0.214\n",
+      "Test Loss: 0.068\n",
+      "Test Accuracy: 0.978\n",
+      "\n",
+      "Epoch: 1 out of 1\n",
+      "Training Loss: 0.216\n",
+      "Test Loss: 0.075\n",
+      "Test Accuracy: 0.971\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "epochs = 1\n",
+    "steps = 0\n",
+    "runningLoss = 0\n",
+    "printEvery = 5\n",
+    "\n",
+    "for epoch in range(epochs):\n",
+    "    for inputs, labels in trainloader:\n",
+    "        steps += 1\n",
+    "        \n",
+    "        # Move input and label tensors to the device\n",
+    "        inputs, labels = inputs.to(device), labels.to(device)\n",
+    "        \n",
+    "        # Zero the gradients\n",
+    "        optimizer.zero_grad()\n",
+    "        \n",
+    "        # Make forward pass\n",
+    "        logps = model(inputs)\n",
+    "        \n",
+    "        # Calculate loss\n",
+    "        loss = criterion(logps, labels)\n",
+    "        \n",
+    "        # Backpropagate\n",
+    "        loss.backward()\n",
+    "        \n",
+    "        # Update the weights\n",
+    "        optimizer.step()\n",
+    "        \n",
+    "        runningLoss += loss.item()\n",
+    "        \n",
+    "        # Do the validation pass\n",
+    "        if steps % printEvery == 0:\n",
+    "            testLoss = 0\n",
+    "            accuracy = 0\n",
+    "            model.eval()\n",
+    "            \n",
+    "            with torch.no_grad():\n",
+    "                for inputs, labels in testloader:\n",
+    "                    # Move input and label tensors to the device\n",
+    "                    inputs, labels = inputs.to(device), labels.to(device)\n",
+    "                    \n",
+    "                    # Get the output\n",
+    "                    logps = model(inputs)\n",
+    "                    \n",
+    "                    # Get the loss\n",
+    "                    batchLoss = criterion(logps, labels)\n",
+    "                    testLoss += batchLoss.item()\n",
+    "                    \n",
+    "                    # Find the accuracy\n",
+    "                    # Get the probabilities\n",
+    "                    ps = torch.exp(logps)\n",
+    "                    \n",
+    "                    # Get the most likely class for each prediction\n",
+    "                    top_p, top_class = ps.topk(1, dim=1)\n",
+    "                    \n",
+    "                    # Check if the predictions match the actual label\n",
+    "                    equals = top_class == labels.view(*top_class.shape)\n",
+    "                    \n",
+    "                    # Update accuracy\n",
+    "                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()\n",
+    "                    \n",
+    "            # Print output\n",
+    "            print(f'Epoch: {epoch+1} out of {epochs}')\n",
+    "            print(f'Training Loss: {runningLoss/printEvery:.3f}')\n",
+    "            print(f'Test Loss: {testLoss/len(testloader):.3f}')\n",
+    "            print(f'Test Accuracy: {accuracy/len(testloader):.3f}')\n",
+    "            print()\n",
+    "\n",
+    "            runningLoss = 0\n",
+    "            model.train()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/(Solution).ipynb
+++ b/(Solution).ipynb
@@ -0,0 +1,861 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Transfer Learning\n",
+    "\n",
+    "In this notebook, you'll learn how to use pre-trained networks to solve challenging problems in computer vision. Specifically, you'll use networks trained on [ImageNet](http://www.image-net.org/) [available from torchvision](http://pytorch.org/docs/0.3.0/torchvision/models.html).\n",
+    "\n",
+    "ImageNet is a massive dataset with over 1 million labeled images in 1000 categories. It's used to train deep convolutional neural networks, an architecture built from convolutional layers. I'm not going to get into the details of convolutional networks here, but if you want to learn more about them, please [watch this](https://www.youtube.com/watch?v=2-Ol7ZB0MmU).\n",
+    "\n",
+    "Once trained, these models work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network on images not in the training set is called transfer learning. Here we'll use transfer learning to train a network that can classify our cat and dog photos with near-perfect accuracy.\n",
+    "\n",
+    "With `torchvision.models` you can download these pre-trained networks and use them in your applications. We'll include `models` in our imports now."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "%config InlineBackend.figure_format = 'retina'\n",
+    "\n",
+    "import matplotlib.pyplot as plt\n",
+    "\n",
+    "import torch\n",
+    "from torch import nn\n",
+    "from torch import optim\n",
+    "import torch.nn.functional as F\n",
+    "from torchvision import datasets, transforms, models"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Most of the pretrained models require the input to be 224x224 images. We also need to match the normalization used when the models were trained: each color channel was normalized separately, with means of `[0.485, 0.456, 0.406]` and standard deviations of `[0.229, 0.224, 0.225]`."
+   ]
+  },
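+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`transforms.Normalize(mean, std)` standardizes each channel independently as `(value - mean) / std`. It runs after `ToTensor()` has scaled pixel values to the range [0, 1]. The sketch below works through that arithmetic for a single RGB pixel in plain Python. The helper `normalize_pixel` is hypothetical and only illustrates the formula; it is not part of torchvision.\n",
+    "\n",
+    "```python\n",
+    "# ImageNet channel statistics quoted above (R, G, B)\n",
+    "MEANS = [0.485, 0.456, 0.406]\n",
+    "STDS = [0.229, 0.224, 0.225]\n",
+    "\n",
+    "def normalize_pixel(rgb):\n",
+    "    # Per-channel standardization: (value - mean) / std\n",
+    "    return [(v - m) / s for v, m, s in zip(rgb, MEANS, STDS)]\n",
+    "\n",
+    "print(normalize_pixel(MEANS))            # the mean color maps to [0.0, 0.0, 0.0]\n",
+    "print(normalize_pixel([1.0, 1.0, 1.0]))  # pure white lands at roughly 2.2 to 2.6 in each channel\n",
+    "```"
+   ]
+  },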
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data_dir = 'Cat_Dog_data'\n",
+    "\n",
+    "# Define transforms for the training data and testing data\n",
+    "train_transforms = transforms.Compose([transforms.RandomRotation(30),\n",
+    "                                       transforms.RandomResizedCrop(224),\n",
+    "                                       transforms.RandomHorizontalFlip(),\n",
+    "                                       transforms.ToTensor(),\n",
+    "                                       transforms.Normalize([0.485, 0.456, 0.406],\n",
+    "                                                            [0.229, 0.224, 0.225])])\n",
+    "\n",
+    "test_transforms = transforms.Compose([transforms.Resize(255),\n",
+    "                                      transforms.CenterCrop(224),\n",
+    "                                      transforms.ToTensor(),\n",
+    "                                      transforms.Normalize([0.485, 0.456, 0.406],\n",
+    "                                                           [0.229, 0.224, 0.225])])\n",
+    "\n",
+    "# Load the train and test sets with ImageFolder, applying the transforms defined above\n",
+    "train_data = datasets.ImageFolder(data_dir + '/train', transform=train_transforms)\n",
+    "test_data = datasets.ImageFolder(data_dir + '/test', transform=test_transforms)\n",
+    "\n",
+    "trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)\n",
+    "testloader = torch.utils.data.DataLoader(test_data, batch_size=64)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We can load in a model such as [DenseNet](http://pytorch.org/docs/0.3.0/torchvision/models.html#id5). Let's print out the model architecture so we can see what's going on."
+   ]
+  },
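+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When reading the architecture, it helps to track the number of channels. Each `_DenseLayer` concatenates 32 new channels (its `conv2` outputs 32), and each `_Transition` halves the channel count. The sketch below does that bookkeeping in plain Python. It assumes DenseNet-121's standard configuration: 64 initial features, a growth rate of 32, and 6, 12, 24 and 16 layers in the four dense blocks. The result is the 1024 features that reach `model.classifier`, which is why a replacement classifier needs 1024 inputs.\n",
+    "\n",
+    "```python\n",
+    "num_init_features = 64          # channels out of conv0\n",
+    "growth_rate = 32                # channels added by each _DenseLayer\n",
+    "block_config = (6, 12, 24, 16)  # _DenseLayers in denseblock1..denseblock4\n",
+    "\n",
+    "channels = num_init_features\n",
+    "for i, num_layers in enumerate(block_config):\n",
+    "    channels += num_layers * growth_rate  # each dense layer appends its output to its input\n",
+    "    print(f'after denseblock{i + 1}: {channels} channels')\n",
+    "    if i < len(block_config) - 1:\n",
+    "        channels //= 2  # each _Transition's 1x1 conv halves the channels\n",
+    "        print(f'after transition{i + 1}: {channels} channels')\n",
+    "\n",
+    "print('classifier in_features =', channels)  # 1024\n",
+    "```"
+   ]
+  },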
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/models/densenet.py:212: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "DenseNet(\n",
+       "  (features): Sequential(\n",
+       "    (conv0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)\n",
+       "    (norm0): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "    (relu0): ReLU(inplace)\n",
+       "    (pool0): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)\n",
+       "    (denseblock1): _DenseBlock(\n",
+       "      (denselayer1): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer2): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(96, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer3): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer4): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer5): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer6): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "    )\n",
+       "    (transition1): _Transition(\n",
+       "      (norm): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "      (relu): ReLU(inplace)\n",
+       "      (conv): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
+       "    )\n",
+       "    (denseblock2): _DenseBlock(\n",
+       "      (denselayer1): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer2): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(160, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(160, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer3): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(192, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer4): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(224, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(224, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer5): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer6): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer7): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer8): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer9): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer10): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer11): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer12): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "    )\n",
+       "    (transition2): _Transition(\n",
+       "      (norm): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "      (relu): ReLU(inplace)\n",
+       "      (conv): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
+       "    )\n",
+       "    (denseblock3): _DenseBlock(\n",
+       "      (denselayer1): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer2): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(288, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(288, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer3): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(320, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(320, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer4): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(352, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(352, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer5): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(384, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer6): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(416, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(416, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer7): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(448, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(448, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer8): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(480, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(480, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer9): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer10): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer11): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer12): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer13): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer14): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer15): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer16): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer17): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer18): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer19): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer20): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer21): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer22): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer23): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer24): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "    )\n",
+       "    (transition3): _Transition(\n",
+       "      (norm): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "      (relu): ReLU(inplace)\n",
+       "      (conv): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "      (pool): AvgPool2d(kernel_size=2, stride=2, padding=0)\n",
+       "    )\n",
+       "    (denseblock4): _DenseBlock(\n",
+       "      (denselayer1): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer2): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(544, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(544, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer3): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(576, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(576, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer4): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(608, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(608, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer5): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(640, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(640, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer6): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(672, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(672, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer7): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(704, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(704, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer8): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(736, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(736, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer9): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(768, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(768, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer10): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(800, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(800, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer11): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(832, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(832, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer12): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(864, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(864, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer13): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(896, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(896, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer14): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(928, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(928, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer15): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(960, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(960, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "      (denselayer16): _DenseLayer(\n",
+       "        (norm1): BatchNorm2d(992, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu1): ReLU(inplace)\n",
+       "        (conv1): Conv2d(992, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n",
+       "        (norm2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "        (relu2): ReLU(inplace)\n",
+       "        (conv2): Conv2d(128, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n",
+       "      )\n",
+       "    )\n",
+       "    (norm5): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n",
+       "  )\n",
+       "  (classifier): Linear(in_features=1024, out_features=1000, bias=True)\n",
+       ")"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "model = models.densenet121(pretrained=True)\n",
+    "model"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This model is built out of two main parts: the features and the classifier. The features part is a stack of convolutional layers that, taken as a whole, works as a feature detector whose output can be fed into a classifier. The classifier part is a single fully-connected layer, `(classifier): Linear(in_features=1024, out_features=1000)`. This layer was trained on the ImageNet dataset and outputs scores for ImageNet's 1000 classes, so it won't work for our two-class cat-vs-dog problem. That means we need to replace the classifier, but the features will work well on their own. In general, I think about pre-trained networks as amazingly good feature detectors that can be used as the input for simple feed-forward classifiers."
+   ]
+  },
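+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As a quick sanity check on this features/classifier split, you can count how many parameters remain trainable after freezing. The sketch below is a minimal example under stated assumptions: instead of downloading DenseNet121 it uses a tiny hypothetical `TinyNet` with the same `features` and `classifier` attributes, plus a hypothetical helper `count_params`; the layer sizes are arbitrary. After freezing everything and swapping in a new head, only the head should report trainable parameters.\n",
+    "\n",
+    "```python\n",
+    "import torch\n",
+    "from torch import nn\n",
+    "\n",
+    "# Hypothetical stand-in for a pretrained network: same features/classifier split as DenseNet\n",
+    "class TinyNet(nn.Module):\n",
+    "    def __init__(self):\n",
+    "        super().__init__()\n",
+    "        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1),\n",
+    "                                      nn.ReLU(),\n",
+    "                                      nn.AdaptiveAvgPool2d(1),\n",
+    "                                      nn.Flatten())\n",
+    "        self.classifier = nn.Linear(8, 1000)  # plays the role of the 1000-class ImageNet head\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        return self.classifier(self.features(x))\n",
+    "\n",
+    "def count_params(module, trainable):\n",
+    "    # Hypothetical helper: number of parameter elements whose requires_grad equals `trainable`\n",
+    "    return sum(p.numel() for p in module.parameters() if p.requires_grad == trainable)\n",
+    "\n",
+    "model = TinyNet()\n",
+    "for param in model.parameters():\n",
+    "    param.requires_grad = False  # freeze everything, including the old head\n",
+    "\n",
+    "# A freshly created layer has requires_grad=True, so only the new head will be trained\n",
+    "model.classifier = nn.Sequential(nn.Linear(8, 2), nn.LogSoftmax(dim=1))\n",
+    "\n",
+    "print(f'trainable: {count_params(model, True)}, frozen: {count_params(model, False)}')  # trainable: 18, frozen: 224\n",
+    "print(model(torch.randn(4, 3, 32, 32)).shape)  # torch.Size([4, 2])\n",
+    "```\n",
+    "\n",
+    "Applied to the DenseNet model above, the same kind of count should find trainable parameters only in `model.classifier`."
+   ]
+  },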
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Freeze parameters so we don't backprop through them\n",
+    "for param in model.parameters():\n",
+    "    param.requires_grad = False\n",
+    "\n",
+    "from collections import OrderedDict\n",
+    "classifier = nn.Sequential(OrderedDict([\n",
+    "                          ('fc1', nn.Linear(1024, 500)),\n",
+    "                          ('relu', nn.ReLU()),\n",
+    "                          ('fc2', nn.Linear(500, 2)),\n",
+    "                          ('output', nn.LogSoftmax(dim=1))\n",
+    "                          ]))\n",
+    "    \n",
+    "model.classifier = classifier"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "With our model built, we need to train the classifier. However, now we're using a **really deep** neural network. If you try to train this on a CPU like normal, it will take a long, long time. Instead, we're going to use the GPU to do the calculations. The GPU performs the linear algebra computations in parallel, which can speed up training by as much as 100x. It's also possible to train on multiple GPUs, further decreasing training time.\n",
+    "\n",
+    "PyTorch, along with pretty much every other deep learning framework, uses [CUDA](https://developer.nvidia.com/cuda-zone) to efficiently compute the forward and backward passes on the GPU. In PyTorch, you move your model parameters and other tensors to GPU memory with `.to('cuda')`, for example `model.to('cuda')`. You can move them back from the GPU with `.to('cpu')`, which you'll commonly do when you need to operate on the network output outside of PyTorch. As a demonstration of the increased speed, I'll compare how long it takes to perform a forward and backward pass with and without a GPU."
+   ]
+  },
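+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A minimal sketch of the round trip described above, assuming a small hypothetical `nn.Linear` model in place of DenseNet: the parameters and inputs go to whichever device is available, and the output is copied back to the CPU with `.cpu()` before being converted to a NumPy array, since NumPy can only read tensors in host memory. The `.detach()` call is needed first because the output is still part of the autograd graph.\n",
+    "\n",
+    "```python\n",
+    "import torch\n",
+    "from torch import nn\n",
+    "\n",
+    "# Fall back to the CPU so this also runs on machines without CUDA\n",
+    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
+    "\n",
+    "model = nn.Linear(4, 2).to(device)  # hypothetical small model; its parameters now live on `device`\n",
+    "x = torch.randn(3, 4).to(device)    # inputs must be on the same device as the parameters\n",
+    "out = model(x)\n",
+    "\n",
+    "# Detach from the autograd graph, copy to host memory, then hand the data to NumPy\n",
+    "out_np = out.detach().cpu().numpy()\n",
+    "print(next(model.parameters()).device, out_np.shape)\n",
+    "```"
+   ]
+  },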
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import time"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Device = cpu; Time per batch: 5.634 seconds\n",
+      "Device = cuda; Time per batch: 0.010 seconds\n"
+     ]
+    }
+   ],
+   "source": [
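+    "# Time the CPU, and also the GPU if one is available\n",
+    "devices = ['cpu'] + (['cuda'] if torch.cuda.is_available() else [])\n",
+    "n_batches = 3\n",
+    "\n",
+    "for device in devices:\n",
+    "\n",
+    "    criterion = nn.NLLLoss()\n",
+    "    # Only train the classifier parameters, feature parameters are frozen\n",
+    "    # (a fresh optimizer per device keeps Adam's state on the same device as the parameters)\n",
+    "    optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)\n",
+    "\n",
+    "    model.to(device)\n",
+    "\n",
+    "    for ii, (inputs, labels) in enumerate(trainloader):\n",
+    "\n",
+    "        # Move input and label tensors to the current device\n",
+    "        inputs, labels = inputs.to(device), labels.to(device)\n",
+    "\n",
+    "        # Treat batch 0 as a warm-up and time batches 1..n_batches\n",
+    "        if ii == 1:\n",
+    "            if device == 'cuda':\n",
+    "                torch.cuda.synchronize()\n",
+    "            start = time.time()\n",
+    "\n",
+    "        optimizer.zero_grad()\n",
+    "        outputs = model(inputs)\n",
+    "        loss = criterion(outputs, labels)\n",
+    "        loss.backward()\n",
+    "        optimizer.step()\n",
+    "\n",
+    "        if ii == n_batches:\n",
+    "            break\n",
+    "\n",
+    "    # CUDA runs asynchronously, so wait for queued GPU work before stopping the clock\n",
+    "    if device == 'cuda':\n",
+    "        torch.cuda.synchronize()\n",
+    "    print(f\"Device = {device}; Time per batch: {(time.time() - start)/n_batches:.3f} seconds\")"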
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "You can write device-agnostic code that automatically uses CUDA if it's available, like so:\n",
+    "```python\n",
+    "# at beginning of the script\n",
+    "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
+    "\n",
+    "...\n",
+    "\n",
+    "# then whenever you get a new Tensor or Module\n",
+    "# this won't copy if they are already on the desired device\n",
+    "input = data.to(device)\n",
+    "model = MyModule(...).to(device)\n",
+    "```\n",
+    "\n",
+    "From here, I'll let you finish training the model. The process is the same as before except now your model is much more powerful. You should get better than 95% accuracy easily.\n",
+    "\n",
+    ">**Exercise:** Train a pretrained model to classify the cat and dog images. Continue with the DenseNet model, or try ResNet; it's also a good model to try out first. Make sure you are only training the classifier and that the parameters of the features part are frozen."
+   ]
+  },
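+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you try ResNet for the exercise, note that its final layer is named `fc` rather than `classifier`, and for `resnet18` it takes 512 input features instead of DenseNet121's 1024. A minimal sketch, assuming torchvision's `models.resnet18` and reusing the same head layout as the DenseNet solution; the model is built here without pretrained weights only to avoid a download, so load the pretrained weights (as with DenseNet above) for real training.\n",
+    "\n",
+    "```python\n",
+    "import torch\n",
+    "from torch import nn, optim\n",
+    "from torchvision import models\n",
+    "\n",
+    "# No weights are loaded here, only to keep the sketch download-free\n",
+    "model = models.resnet18()\n",
+    "\n",
+    "# Freeze the feature layers\n",
+    "for param in model.parameters():\n",
+    "    param.requires_grad = False\n",
+    "\n",
+    "# ResNet's head is `fc`; read its input size instead of hard-coding 1024\n",
+    "in_features = model.fc.in_features  # 512 for resnet18\n",
+    "model.fc = nn.Sequential(nn.Linear(in_features, 256),\n",
+    "                         nn.ReLU(),\n",
+    "                         nn.Dropout(0.2),\n",
+    "                         nn.Linear(256, 2),\n",
+    "                         nn.LogSoftmax(dim=1))\n",
+    "\n",
+    "# Only the new head's parameters go to the optimizer\n",
+    "optimizer = optim.Adam(model.fc.parameters(), lr=0.003)\n",
+    "\n",
+    "model.eval()\n",
+    "with torch.no_grad():\n",
+    "    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 2])\n",
+    "```"
+   ]
+  },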
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/opt/conda/lib/python3.6/site-packages/torchvision-0.2.1-py3.6.egg/torchvision/models/densenet.py:212: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Use GPU if it's available\n",
+    "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+    "\n",
+    "model = models.densenet121(pretrained=True)\n",
+    "\n",
+    "# Freeze parameters so we don't backprop through them\n",
+    "for param in model.parameters():\n",
+    "    param.requires_grad = False\n",
+    "    \n",
+    "model.classifier = nn.Sequential(nn.Linear(1024, 256),\n",
+    "                                 nn.ReLU(),\n",
+    "                                 nn.Dropout(0.2),\n",
+    "                                 nn.Linear(256, 2),\n",
+    "                                 nn.LogSoftmax(dim=1))\n",
+    "\n",
+    "criterion = nn.NLLLoss()\n",
+    "\n",
+    "# Only train the classifier parameters, feature parameters are frozen\n",
+    "optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)\n",
+    "\n",
+    "model.to(device);"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Epoch 1/1.. Train loss: 0.844.. Test loss: 0.479.. Test accuracy: 0.675\n",
+      "Epoch 1/1.. Train loss: 0.596.. Test loss: 0.331.. Test accuracy: 0.841\n",
+      "Epoch 1/1.. Train loss: 0.314.. Test loss: 0.183.. Test accuracy: 0.939\n",
+      "Epoch 1/1.. Train loss: 0.287.. Test loss: 0.108.. Test accuracy: 0.966\n",
+      "Epoch 1/1.. Train loss: 0.241.. Test loss: 0.091.. Test accuracy: 0.969\n",
+      "Epoch 1/1.. Train loss: 0.210.. Test loss: 0.111.. Test accuracy: 0.959\n",
+      "Epoch 1/1.. Train loss: 0.192.. Test loss: 0.083.. Test accuracy: 0.969\n",
+      "Epoch 1/1.. Train loss: 0.168.. Test loss: 0.076.. Test accuracy: 0.973\n",
+      "Epoch 1/1.. Train loss: 0.176.. Test loss: 0.060.. Test accuracy: 0.979\n",
+      "Epoch 1/1.. Train loss: 0.174.. Test loss: 0.087.. Test accuracy: 0.965\n",
+      "Epoch 1/1.. Train loss: 0.167.. Test loss: 0.100.. Test accuracy: 0.960\n",
+      "Epoch 1/1.. Train loss: 0.243.. Test loss: 0.052.. Test accuracy: 0.980\n",
+      "Epoch 1/1.. Train loss: 0.252.. Test loss: 0.054.. Test accuracy: 0.982\n",
+      "Epoch 1/1.. Train loss: 0.207.. Test loss: 0.054.. Test accuracy: 0.982\n",
+      "Epoch 1/1.. Train loss: 0.199.. Test loss: 0.068.. Test accuracy: 0.979\n",
+      "Epoch 1/1.. Train loss: 0.200.. Test loss: 0.085.. Test accuracy: 0.969\n",
+      "Epoch 1/1.. Train loss: 0.196.. Test loss: 0.052.. Test accuracy: 0.982\n"
+     ]
+    }
+   ],
+   "source": [
+    "epochs = 1\n",
+    "steps = 0\n",
+    "running_loss = 0\n",
+    "print_every = 5\n",
+    "for epoch in range(epochs):\n",
+    "    for inputs, labels in trainloader:\n",
+    "        steps += 1\n",
+    "        # Move input and label tensors to the default device\n",
+    "        inputs, labels = inputs.to(device), labels.to(device)\n",
+    "        \n",
+    "        optimizer.zero_grad()\n",
+    "        \n",
+    "        logps = model(inputs)\n",
+    "        loss = criterion(logps, labels)\n",
+    "        loss.backward()\n",
+    "        optimizer.step()\n",
+    "\n",
+    "        running_loss += loss.item()\n",
+    "        \n",
+    "        if steps % print_every == 0:\n",
+    "            test_loss = 0\n",
+    "            accuracy = 0\n",
+    "            model.eval()\n",
+    "            with torch.no_grad():\n",
+    "                for inputs, labels in testloader:\n",
+    "                    inputs, labels = inputs.to(device), labels.to(device)\n",
+    "                    logps = model(inputs)\n",
+    "                    batch_loss = criterion(logps, labels)\n",
+    "                    \n",
+    "                    test_loss += batch_loss.item()\n",
+    "                    \n",
+    "                    # Calculate accuracy\n",
+    "                    ps = torch.exp(logps)\n",
+    "                    top_p, top_class = ps.topk(1, dim=1)\n",
+    "                    equals = top_class == labels.view(*top_class.shape)\n",
+    "                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()\n",
+    "                    \n",
+    "            print(f\"Epoch {epoch+1}/{epochs}.. \"\n",
+    "                  f\"Train loss: {running_loss/print_every:.3f}.. \"\n",
+    "                  f\"Test loss: {test_loss/len(testloader):.3f}.. \"\n",
+    "                  f\"Test accuracy: {accuracy/len(testloader):.3f}\")\n",
+    "            running_loss = 0\n",
+    "            model.train()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.6.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
--- a/PyTorch/Untitled.ipynb
+++ b/PyTorch/Untitled.ipynb
@@ -0,0 +1,332 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "torch.Size([64, 10])\n",
+      "tensor([[9],\n",
+      "        [9],\n",
+      "        [9],\n",
+      "        [9],\n",
+      "        [9],\n",
+      "        [5],\n",
+      "        [9],\n",
+      "        [5],\n",
+      "        [2],\n",
+      "        [9]])\n",
+      "Accuracy: 1.5625%\n",
+      "Epoch: 1 out of 30\n",
+      "Training Loss: 0.510\n",
+      "Test Loss: 0.454\n",
+      "Test Accuracy: 0.836\n",
+      "\n",
+      "Epoch: 2 out of 30\n",
+      "Training Loss: 0.389\n",
+      "Test Loss: 0.412\n",
+      "Test Accuracy: 0.852\n",
+      "\n",
+      "Epoch: 3 out of 30\n",
+      "Training Loss: 0.353\n",
+      "Test Loss: 0.388\n",
+      "Test Accuracy: 0.861\n",
+      "\n",
+      "Epoch: 4 out of 30\n",
+      "Training Loss: 0.330\n",
+      "Test Loss: 0.428\n",
+      "Test Accuracy: 0.847\n",
+      "\n",
+      "Epoch: 5 out of 30\n",
+      "Training Loss: 0.315\n",
+      "Test Loss: 0.381\n",
+      "Test Accuracy: 0.865\n",
+      "\n",
+      "Epoch: 6 out of 30\n",
+      "Training Loss: 0.303\n",
+      "Test Loss: 0.388\n",
+      "Test Accuracy: 0.864\n",
+      "\n",
+      "Epoch: 7 out of 30\n",
+      "Training Loss: 0.292\n",
+      "Test Loss: 0.364\n",
+      "Test Accuracy: 0.872\n",
+      "\n",
+      "Epoch: 8 out of 30\n",
+      "Training Loss: 0.281\n",
+      "Test Loss: 0.370\n",
+      "Test Accuracy: 0.869\n",
+      "\n",
+      "Epoch: 9 out of 30\n",
+      "Training Loss: 0.270\n",
+      "Test Loss: 0.365\n",
+      "Test Accuracy: 0.877\n",
+      "\n",
+      "Epoch: 10 out of 30\n",
+      "Training Loss: 0.267\n",
+      "Test Loss: 0.366\n",
+      "Test Accuracy: 0.877\n",
+      "\n",
+      "Epoch: 11 out of 30\n",
+      "Training Loss: 0.260\n",
+      "Test Loss: 0.369\n",
+      "Test Accuracy: 0.873\n",
+      "\n",
+      "Epoch: 12 out of 30\n",
+      "Training Loss: 0.254\n",
+      "Test Loss: 0.377\n",
+      "Test Accuracy: 0.876\n",
+      "\n",
+      "Epoch: 13 out of 30\n",
+      "Training Loss: 0.244\n",
+      "Test Loss: 0.369\n",
+      "Test Accuracy: 0.879\n",
+      "\n",
+      "Epoch: 14 out of 30\n",
+      "Training Loss: 0.243\n",
+      "Test Loss: 0.371\n",
+      "Test Accuracy: 0.879\n",
+      "\n",
+      "Epoch: 15 out of 30\n",
+      "Training Loss: 0.237\n",
+      "Test Loss: 0.377\n",
+      "Test Accuracy: 0.883\n",
+      "\n",
+      "Epoch: 16 out of 30\n",
+      "Training Loss: 0.230\n",
+      "Test Loss: 0.407\n",
+      "Test Accuracy: 0.874\n",
+      "\n",
+      "Epoch: 17 out of 30\n",
+      "Training Loss: 0.228\n",
+      "Test Loss: 0.370\n",
+      "Test Accuracy: 0.879\n",
+      "\n",
+      "Epoch: 18 out of 30\n",
+      "Training Loss: 0.221\n",
+      "Test Loss: 0.376\n",
+      "Test Accuracy: 0.878\n",
+      "\n",
+      "Epoch: 19 out of 30\n",
+      "Training Loss: 0.222\n",
+      "Test Loss: 0.376\n",
+      "Test Accuracy: 0.881\n",
+      "\n",
+      "Epoch: 20 out of 30\n",
+      "Training Loss: 0.217\n",
+      "Test Loss: 0.387\n",
+      "Test Accuracy: 0.880\n",
+      "\n",
+      "Epoch: 21 out of 30\n",
+      "Training Loss: 0.209\n",
+      "Test Loss: 0.401\n",
+      "Test Accuracy: 0.877\n",
+      "\n",
+      "Epoch: 22 out of 30\n",
+      "Training Loss: 0.210\n",
+      "Test Loss: 0.392\n",
+      "Test Accuracy: 0.883\n",
+      "\n",
+      "Epoch: 23 out of 30\n",
+      "Training Loss: 0.204\n",
+      "Test Loss: 0.411\n",
+      "Test Accuracy: 0.878\n",
+      "\n",
+      "Epoch: 24 out of 30\n",
+      "Training Loss: 0.202\n",
+      "Test Loss: 0.391\n",
+      "Test Accuracy: 0.882\n",
+      "\n",
+      "Epoch: 25 out of 30\n",
+      "Training Loss: 0.195\n",
+      "Test Loss: 0.392\n",
+      "Test Accuracy: 0.883\n",
+      "\n",
+      "Epoch: 26 out of 30\n",
+      "Training Loss: 0.195\n",
+      "Test Loss: 0.471\n",
+      "Test Accuracy: 0.878\n",
+      "\n",
+      "Epoch: 27 out of 30\n",
+      "Training Loss: 0.191\n",
+      "Test Loss: 0.431\n",
+      "Test Accuracy: 0.881\n",
+      "\n",
+      "Epoch: 28 out of 30\n",
+      "Training Loss: 0.195\n",
+      "Test Loss: 0.418\n",
+      "Test Accuracy: 0.882\n",
+      "\n",
+      "Epoch: 29 out of 30\n",
+      "Training Loss: 0.192\n",
+      "Test Loss: 0.390\n",
+      "Test Accuracy: 0.887\n",
+      "\n",
+      "Epoch: 30 out of 30\n",
+      "Training Loss: 0.185\n",
+      "Test Loss: 0.428\n",
+      "Test Accuracy: 0.875\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "import torch\n",
+    "from torchvision import datasets, transforms\n",
+    "from torch import nn, optim\n",
+    "import torch.nn.functional as F\n",
+    "\n",
+    "# Define a transform to normalize the data\n",
+    "transform = transforms.Compose([transforms.ToTensor(),\n",
+    "                                transforms.Normalize((0.5, 0.5, 0.5),\n",
+    "                                                     (0.5, 0.5, 0.5))])\n",
+    "\n",
+    "# Download and load the training data\n",
+    "trainset = datasets.FashionMNIST(\n",
+    "    '.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)\n",
+    "\n",
+    "trainloader = torch.utils.data.DataLoader(\n",
+    "    trainset, batch_size=64, shuffle=True)\n",
+    "\n",
+    "# Download and load the test data\n",
+    "testset = datasets.FashionMNIST(\n",
+    "    '.pytorch/F_MNIST_data/', download=True, train=False,\n",
+    "    transform=transform)\n",
+    "\n",
+    "testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)\n",
+    "\n",
+    "\n",
+    "class Classifier(nn.Module):\n",
+    "    def __init__(self):\n",
+    "        super().__init__()\n",
+    "        self.fc1 = nn.Linear(784, 256)\n",
+    "        self.fc2 = nn.Linear(256, 128)\n",
+    "        self.fc3 = nn.Linear(128, 64)\n",
+    "        self.fc4 = nn.Linear(64, 10)\n",
+    "\n",
+    "    def forward(self, x):\n",
+    "        # make sure input tensor is flattened\n",
+    "        x = x.view(x.shape[0], -1)\n",
+    "\n",
+    "        x = F.relu(self.fc1(x))\n",
+    "        x = F.relu(self.fc2(x))\n",
+    "        x = F.relu(self.fc3(x))\n",
+    "        x = F.log_softmax(self.fc4(x), dim=1)\n",
+    "\n",
+    "        return x\n",
+    "\n",
+    "\n",
+    "model = Classifier()\n",
+    "\n",
+    "images, labels = next(iter(testloader))\n",
+    "\n",
+    "# Get the class probabilities\n",
+    "ps = torch.exp(model(images))\n",
+    "\n",
+    "# Check the shape: we expect 10 class probabilities for each of the\n",
+    "# 64 examples\n",
+    "print(ps.shape)\n",
+    "\n",
+    "top_p, top_class = ps.topk(1, dim=1)\n",
+    "# Look at the most likely classes for the first 10 examples\n",
+    "print(top_class[:10, :])\n",
+    "\n",
+    "\n",
+    "equals = top_class == labels.view(*top_class.shape)\n",
+    "\n",
+    "\n",
+    "accuracy = torch.mean(equals.type(torch.FloatTensor))\n",
+    "print(f'Accuracy: {accuracy.item()*100}%')\n",
+    "\n",
+    "\n",
+    "# Train the model\n",
+    "\n",
+    "model = Classifier()\n",
+    "criterion = nn.NLLLoss()\n",
+    "optimizer = optim.Adam(model.parameters(), lr=0.003)\n",
+    "\n",
+    "epochs = 30\n",
+    "\n",
+    "trainLosses, testLosses = [], []\n",
+    "for e in range(epochs):\n",
+    "    runningLoss = 0\n",
+    "    for images, labels in trainloader:\n",
+    "\n",
+    "        optimizer.zero_grad()\n",
+    "\n",
+    "        log_ps = model(images)\n",
+    "        loss = criterion(log_ps, labels)\n",
+    "        loss.backward()\n",
+    "        optimizer.step()\n",
+    "\n",
+    "        runningLoss += loss.item()\n",
+    "\n",
+    "    else:  # runs once, after the training loop finishes\n",
+    "        testLoss = 0\n",
+    "        accuracy = 0\n",
+    "\n",
+    "        # Turn off gradients for validation step\n",
+    "        with torch.no_grad():\n",
+    "            for images, labels in testloader:\n",
+    "                # Get the output\n",
+    "                log_ps = model(images)\n",
+    "                # Get the loss\n",
+    "                testLoss += criterion(log_ps, labels).item()\n",
+    "\n",
+    "                # Get the probabilities\n",
+    "                ps = torch.exp(log_ps)\n",
+    "                # Get the most likely class for each prediction\n",
+    "                top_p, top_class = ps.topk(1, dim=1)\n",
+    "                # Check if the predictions match the actual label\n",
+    "                equals = top_class == labels.view(*top_class.shape)\n",
+    "                # Update accuracy\n",
+    "                accuracy += torch.mean(equals.type(torch.FloatTensor))\n",
+    "\n",
+    "        # Update train loss\n",
+    "        trainLosses.append(runningLoss / len(trainloader))\n",
+    "        # Update test loss\n",
+    "        testLosses.append(testLoss / len(testloader))\n",
+    "\n",
+    "        # Print output\n",
+    "        print(f'Epoch: {e+1} out of {epochs}')\n",
+    "        print(f'Training Loss: {runningLoss/len(trainloader):.3f}')\n",
+    "        print(f'Test Loss: {testLoss/len(testloader):.3f}')\n",
+    "        print(f'Test Accuracy: {accuracy/len(testloader):.3f}')\n",
+    "        print()\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.7.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
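The accuracy check in the notebook reshapes `labels` with `labels.view(*top_class.shape)` before comparing. A minimal sketch on made-up toy tensors (not FashionMNIST data; assumes a recent PyTorch install) of why that reshape matters: `topk(1, dim=1)` returns indices of shape `(batch, 1)`, and comparing them against a 1-D label tensor of shape `(batch,)` silently broadcasts to a `(batch, batch)` matrix instead of raising an error.

```python
import torch

# Toy class probabilities for 4 examples and 3 classes (made-up numbers),
# stored as log-probabilities like the output of the Classifier
log_ps = torch.log(torch.tensor([[0.7, 0.2, 0.1],
                                 [0.1, 0.8, 0.1],
                                 [0.3, 0.3, 0.4],
                                 [0.6, 0.3, 0.1]]))
labels = torch.tensor([0, 1, 0, 0])

ps = torch.exp(log_ps)
top_p, top_class = ps.topk(1, dim=1)  # top_class has shape (4, 1)

# Correct: reshape labels to (4, 1) so the comparison is element-wise
equals = top_class == labels.view(*top_class.shape)
accuracy = torch.mean(equals.type(torch.FloatTensor)).item()

# Pitfall: (4, 1) == (4,) broadcasts to a (4, 4) matrix of pairwise comparisons
wrong = top_class == labels
wrong_accuracy = wrong.float().mean().item()

print(equals.shape, accuracy)
print(wrong.shape, wrong_accuracy)
```

Three of the four toy predictions are correct, but the broadcast version averages over all 16 pairwise comparisons and reports a different, meaningless number.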
--- a/PyTorch/fc_model.py
+++ b/PyTorch/fc_model.py
@@ -0,0 +1,104 @@
+import torch
+from torch import nn
+import torch.nn.functional as F
+
+
+class Network(nn.Module):
+    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):
+        ''' Builds a feedforward network with arbitrary hidden layers.
+
+            Arguments
+            ---------
+            input_size: integer, size of the input layer
+            output_size: integer, size of the output layer
+            hidden_layers: list of integers, the sizes of the hidden layers
+            drop_p: float between 0 and 1, dropout probability after each hidden layer
+
+        '''
+        super().__init__()
+        # Input to a hidden layer
+        self.hidden_layers = nn.ModuleList(
+            [nn.Linear(input_size, hidden_layers[0])])
+
+        # Add a variable number of more hidden layers
+        layer_sizes = zip(hidden_layers[:-1], hidden_layers[1:])
+        self.hidden_layers.extend([nn.Linear(h1, h2)
+                                   for h1, h2 in layer_sizes])
+
+        self.output = nn.Linear(hidden_layers[-1], output_size)
+
+        self.dropout = nn.Dropout(p=drop_p)
+
+    def forward(self, x):
+        ''' Forward pass through the network, returns the output log-probabilities (log-softmax) '''
+
+        for each in self.hidden_layers:
+            x = F.relu(each(x))
+            x = self.dropout(x)
+        x = self.output(x)
+
+        return F.log_softmax(x, dim=1)
+
+
+def validation(model, testloader, criterion):
+    accuracy = 0
+    test_loss = 0
+    for images, labels in testloader:
+
+        # Flatten images into 784-long vectors
+        images = images.view(images.shape[0], 784)
+
+        output = model.forward(images)
+        test_loss += criterion(output, labels).item()
+
+        # Calculating the accuracy
+        # Model's output is log-softmax, take exponential to get the probabilities
+        ps = torch.exp(output)
+        # Class with highest probability is our predicted class, compare with true label
+        equality = (labels.data == ps.max(1)[1])
+        # Accuracy is number of correct predictions divided by all predictions, just take the mean
+        accuracy += equality.type_as(torch.FloatTensor()).mean()
+
+    return test_loss, accuracy
+
+
+def train(model, trainloader, testloader, criterion, optimizer, epochs=5, print_every=40):
+
+    steps = 0
+    running_loss = 0
+    for e in range(epochs):
+        # Model in training mode, dropout is on
+        model.train()
+        for images, labels in trainloader:
+            steps += 1
+
+            # Flatten images into 784-long vectors
+            images = images.view(images.shape[0], 784)
+
+            optimizer.zero_grad()
+
+            output = model.forward(images)
+            loss = criterion(output, labels)
+            loss.backward()
+            optimizer.step()
+
+            running_loss += loss.item()
+
+            if steps % print_every == 0:
+                # Model in inference mode, dropout is off
+                model.eval()
+
+                # Turn off gradients for validation, will speed up inference
+                with torch.no_grad():
+                    test_loss, accuracy = validation(
+                        model, testloader, criterion)
+
+                print("Epoch: {}/{}.. ".format(e + 1, epochs),
+                      "Training Loss: {:.3f}.. ".format(
+                          running_loss / print_every),
+                      "Test Loss: {:.3f}.. ".format(
+                          test_loss / len(testloader)),
+                      "Test Accuracy: {:.3f}".format(accuracy / len(testloader)))
+
+                running_loss = 0
+
+                # Make sure dropout and grads are on for training
+                model.train()
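`Network.__init__` chains its hidden layers with `zip(hidden_layers[:-1], hidden_layers[1:])`. The plain-Python sketch below lists the `(in_features, out_features)` pair of every `nn.Linear` that this pairing produces and counts weights plus biases, a quick way to sanity-check a configuration before building it. The helper names `linear_layer_shapes` and `count_parameters` are hypothetical and not part of `fc_model.py`.

```python
def linear_layer_shapes(input_size, output_size, hidden_layers):
    """Return (in_features, out_features) for each nn.Linear that Network builds."""
    # Input layer to the first hidden layer
    shapes = [(input_size, hidden_layers[0])]
    # Same pairing trick as Network.__init__: consecutive hidden sizes
    shapes += list(zip(hidden_layers[:-1], hidden_layers[1:]))
    # Last hidden layer to the output layer
    shapes.append((hidden_layers[-1], output_size))
    return shapes


def count_parameters(shapes):
    """nn.Linear(i, o) holds an o x i weight matrix plus o biases."""
    return sum(i * o + o for i, o in shapes)


shapes = linear_layer_shapes(784, 10, [512, 256, 128])
print(shapes)
print(count_parameters(shapes))
```

Dropout has no learnable parameters, so for the same arguments this count should match `sum(p.numel() for p in Network(784, 10, [512, 256, 128]).parameters())`.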
--- a/PyTorch/helper.py
+++ b/PyTorch/helper.py
@@ -0,0 +1,95 @@
+import matplotlib.pyplot as plt
+import numpy as np
+from torch import nn, optim
+
+
+def test_network(net, trainloader):
+
+    criterion = nn.MSELoss()
+    optimizer = optim.Adam(net.parameters(), lr=0.001)
+
+    dataiter = iter(trainloader)
+    images, labels = next(dataiter)
+
+    # Use the images as both inputs and targets; plain tensors work with
+    # autograd directly, so the deprecated Variable wrapper is not needed
+    inputs = images
+    targets = images
+
+    # Clear the gradients of all parameters
+    optimizer.zero_grad()
+
+    # Forward pass, then backward pass, then update weights
+    output = net.forward(inputs)
+    loss = criterion(output, targets)
+    loss.backward()
+    optimizer.step()
+
+    return True
+
+
+def imshow(image, ax=None, title=None, normalize=True):
+    """Imshow for Tensor."""
+    if ax is None:
+        fig, ax = plt.subplots()
+    image = image.numpy().transpose((1, 2, 0))
+
+    if normalize:
+        mean = np.array([0.485, 0.456, 0.406])
+        std = np.array([0.229, 0.224, 0.225])
+        image = std * image + mean
+        image = np.clip(image, 0, 1)
+
+    ax.imshow(image)
+    ax.spines['top'].set_visible(False)
+    ax.spines['right'].set_visible(False)
+    ax.spines['left'].set_visible(False)
+    ax.spines['bottom'].set_visible(False)
+    ax.tick_params(axis='both', length=0)
+    ax.set_xticklabels('')
+    ax.set_yticklabels('')
+
+    return ax
+
+
+def view_recon(img, recon):
+    ''' Function for displaying an image (as a PyTorch Tensor) and its
+        reconstruction (also a PyTorch Tensor)
+    '''
+
+    fig, axes = plt.subplots(ncols=2, sharex=True, sharey=True)
+    axes[0].imshow(img.numpy().squeeze())
+    axes[1].imshow(recon.data.numpy().squeeze())
+    for ax in axes:
+        ax.axis('off')
+        ax.set_adjustable('box')
+
+
+def view_classify(img, ps, version="MNIST"):
+    ''' Function for viewing an image and its predicted classes.
+    '''
+    ps = ps.data.numpy().squeeze()
+
+    fig, (ax1, ax2) = plt.subplots(figsize=(6, 9), ncols=2)
+    ax1.imshow(img.reshape(1, 28, 28).numpy().squeeze())
+    ax1.axis('off')
+    ax2.barh(np.arange(10), ps)
+    ax2.set_aspect(0.1)
+    ax2.set_yticks(np.arange(10))
+    if version == "MNIST":
+        ax2.set_yticklabels(np.arange(10))
+    elif version == "Fashion":
+        ax2.set_yticklabels(['T-shirt/top',
+                             'Trouser',
+                             'Pullover',
+                             'Dress',
+                             'Coat',
+                             'Sandal',
+                             'Shirt',
+                             'Sneaker',
+                             'Bag',
+                             'Ankle Boot'], size='small')
+    ax2.set_title('Class Probability')
+    ax2.set_xlim(0, 1.1)
+
+    plt.tight_layout()
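`imshow` undoes normalization with `std * image + mean` (then clips to `[0, 1]`), using ImageNet statistics by default. A per-channel, plain-Python sketch of that round trip, and of what happens when an image was normalized with different statistics, such as the `0.5`/`0.5` values used in the inference notebook: the colors come out shifted. The function names below are illustrative, not part of `helper.py`.

```python
def normalize(pixel, mean, std):
    """Per-channel (x - mean) / std, the formula transforms.Normalize applies."""
    return [(x - m) / s for x, m, s in zip(pixel, mean, std)]


def unnormalize(pixel, mean, std):
    """Per-channel std * x + mean, clipped to [0, 1], as imshow does before plotting."""
    return [min(max(s * x + m, 0.0), 1.0) for x, m, s in zip(pixel, mean, std)]


IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

pixel = [0.2, 0.5, 0.9]  # an arbitrary RGB value in [0, 1]

# Matching statistics: the round trip recovers the original pixel
restored = unnormalize(normalize(pixel, IMAGENET_MEAN, IMAGENET_STD),
                       IMAGENET_MEAN, IMAGENET_STD)

# Mismatched statistics: normalized with 0.5/0.5, un-normalized with ImageNet values
mismatched = unnormalize(normalize(pixel, (0.5,) * 3, (0.5,) * 3),
                         IMAGENET_MEAN, IMAGENET_STD)

print(restored)
print(mismatched)
```

When the data was normalized with other statistics, calling `imshow(..., normalize=False)` or adjusting the `mean`/`std` arrays inside it to match is likely the safer choice.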
--- a/PyTorch/python_scripts/fc_model.py
+++ b/PyTorch/python_scripts/fc_model.py
@@ -0,0 +1,104 @@
+import torch
+from torch import nn
+import torch.nn.functional as F
+
+
+class Network(nn.Module):
+    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):
+        ''' Builds a feedforward network with arbitrary hidden layers.
+
+            Arguments
+            ---------
+            input_size: integer, size of the input layer
+            output_size: integer, size of the output layer
+            hidden_layers: list of integers, the sizes of the hidden layers
+            drop_p: float between 0 and 1, dropout probability after each hidden layer
+
+        '''
+        super().__init__()
+        # Input to a hidden layer
+        self.hidden_layers = nn.ModuleList(
+            [nn.Linear(input_size, hidden_layers[0])])
+
+        # Add a variable number of more hidden layers
+        layer_sizes = zip(hidden_layers[:-1], hidden_layers[1:])
+        self.hidden_layers.extend([nn.Linear(h1, h2)
+                                   for h1, h2 in layer_sizes])
+
+        self.output = nn.Linear(hidden_layers[-1], output_size)
+
+        self.dropout = nn.Dropout(p=drop_p)
+
+    def forward(self, x):
+        ''' Forward pass through the network, returns the output log-probabilities (log-softmax) '''
+
+        for each in self.hidden_layers:
+            x = F.relu(each(x))
+            x = self.dropout(x)
+        x = self.output(x)
+
+        return F.log_softmax(x, dim=1)
+
+
+def validation(model, testloader, criterion):
+    accuracy = 0
+    test_loss = 0
+    for images, labels in testloader:
+
+        # Flatten images into 784-long vectors
+        images = images.view(images.shape[0], 784)
+
+        output = model.forward(images)
+        test_loss += criterion(output, labels).item()
+
+        # Calculating the accuracy
+        # Model's output is log-softmax, take exponential to get the probabilities
+        ps = torch.exp(output)
+        # Class with highest probability is our predicted class, compare with true label
+        equality = (labels.data == ps.max(1)[1])
+        # Accuracy is number of correct predictions divided by all predictions, just take the mean
+        accuracy += equality.type_as(torch.FloatTensor()).mean()
+
+    return test_loss, accuracy
+
+
+def train(model, trainloader, testloader, criterion, optimizer, epochs=5, print_every=40):
+
+    steps = 0
+    running_loss = 0
+    for e in range(epochs):
+        # Model in training mode, dropout is on
+        model.train()
+        for images, labels in trainloader:
+            steps += 1
+
+            # Flatten images into 784-long vectors
+            images = images.view(images.shape[0], 784)
+
+            optimizer.zero_grad()
+
+            output = model.forward(images)
+            loss = criterion(output, labels)
+            loss.backward()
+            optimizer.step()
+
+            running_loss += loss.item()
+
+            if steps % print_every == 0:
+                # Model in inference mode, dropout is off
+                model.eval()
+
+                # Turn off gradients for validation, will speed up inference
+                with torch.no_grad():
+                    test_loss, accuracy = validation(
+                        model, testloader, criterion)
+
+                print("Epoch: {}/{}.. ".format(e + 1, epochs),
+                      "Training Loss: {:.3f}.. ".format(
+                          running_loss / print_every),
+                      "Test Loss: {:.3f}.. ".format(
+                          test_loss / len(testloader)),
+                      "Test Accuracy: {:.3f}".format(accuracy / len(testloader)))
+
+                running_loss = 0
+
+                # Make sure dropout and grads are on for training
+                model.train()
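In `train()`, neither `steps` nor `running_loss` is reset at the start of an epoch; `running_loss` is only cleared after each report. The printed training loss is therefore the mean over the last `print_every` batches, and a report window can straddle an epoch boundary. A stdlib sketch of that bookkeeping over a flat list of per-batch losses (`windowed_averages` is a hypothetical helper, not part of `fc_model.py`):

```python
def windowed_averages(batch_losses, print_every):
    """Mimic train(): average the loss over each block of print_every steps."""
    reports = []
    running_loss = 0.0
    for steps, loss in enumerate(batch_losses, start=1):
        running_loss += loss
        if steps % print_every == 0:
            # Report the mean over the window, then start a new window
            reports.append(running_loss / print_every)
            running_loss = 0.0
    return reports


print(windowed_averages([1.0, 2.0, 3.0, 4.0, 5.0], 2))
```

Losses left over after the last full window (the `5.0` above) are not reported; in `train()` they carry into the next epoch's first window, or are dropped after the final epoch.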
--- a/PyTorch/python_scripts/part5_inference_and_validation.py
+++ b/PyTorch/python_scripts/part5_inference_and_validation.py
@@ -0,0 +1,130 @@
+import torch
+from torchvision import datasets, transforms
+from torch import nn, optim
+import torch.nn.functional as F
+import matplotlib.pyplot as plt
+
+# Define a transform to normalize the data (FashionMNIST images have one channel)
+transform = transforms.Compose([transforms.ToTensor(),
+                                transforms.Normalize((0.5,), (0.5,))])
+
+# Download and load the training data
+trainset = datasets.FashionMNIST(
+    '.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
+
+trainloader = torch.utils.data.DataLoader(
+    trainset, batch_size=64, shuffle=True)
+
+# Download and load the test data
+testset = datasets.FashionMNIST(
+    '.pytorch/F_MNIST_data/', download=True, train=False,
+    transform=transform)
+
+testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
+
+
+class Classifier(nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.fc1 = nn.Linear(784, 256)
+        self.fc2 = nn.Linear(256, 128)
+        self.fc3 = nn.Linear(128, 64)
+        self.fc4 = nn.Linear(64, 10)
+
+    def forward(self, x):
+        # make sure input tensor is flattened
+        x = x.view(x.shape[0], -1)
+
+        x = F.relu(self.fc1(x))
+        x = F.relu(self.fc2(x))
+        x = F.relu(self.fc3(x))
+        x = F.log_softmax(self.fc4(x), dim=1)
+
+        return x
+
+
+model = Classifier()
+
+images, labels = next(iter(testloader))
+
+# Get the class probabilities
+ps = torch.exp(model(images))
+
+# Check the shape: we expect 10 class probabilities for each of the
+# 64 examples
+print(ps.shape)
+
+top_p, top_class = ps.topk(1, dim=1)
+# Look at the most likely classes for the first 10 examples
+print(top_class[:10, :])
+
+
+equals = top_class == labels.view(*top_class.shape)
+
+
+accuracy = torch.mean(equals.type(torch.FloatTensor))
+print(f'Accuracy: {accuracy.item()*100}%')
+
+
+# Train the model
+
+model = Classifier()
+criterion = nn.NLLLoss()
+optimizer = optim.Adam(model.parameters(), lr=0.003)
+
+epochs = 30
+
+trainLosses, testLosses = [], []
+for e in range(epochs):
+    runningLoss = 0
+    for images, labels in trainloader:
+
+        optimizer.zero_grad()
+
+        log_ps = model(images)
+        loss = criterion(log_ps, labels)
+        loss.backward()
+        optimizer.step()
+
+        runningLoss += loss.item()
+
+    else:  # runs once, after the training loop finishes
+        testLoss = 0
+        accuracy = 0
+
+        # Turn off gradients for validation step
+        with torch.no_grad():
+            for images, labels in testloader:
+                # Get the output
+                log_ps = model(images)
+                # Get the loss
+                testLoss += criterion(log_ps, labels).item()
+
+                # Get the probabilities
+                ps = torch.exp(log_ps)
+                # Get the most likely class for each prediction
+                top_p, top_class = ps.topk(1, dim=1)
+                # Check if the predictions match the actual label
+                equals = top_class == labels.view(*top_class.shape)
+                # Update accuracy
+                accuracy += torch.mean(equals.type(torch.FloatTensor))
+
+        # Update train loss
+        trainLosses.append(runningLoss / len(trainloader))
+        # Update test loss
+        testLosses.append(testLoss / len(testloader))
+
+        # Print output
+        print(f'Epoch: {e+1} out of {epochs}')
+        print(f'Training Loss: {runningLoss/len(trainloader):.3f}')
+        print(f'Test Loss: {testLoss/len(testloader):.3f}')
+        print(f'Test Accuracy: {accuracy/len(testloader):.3f}')
+        print()
+
+
+plt.plot(trainLosses, label='Training loss')
+plt.plot(testLosses, label='Validation loss')
+plt.legend(frameon=False)
+plt.show()
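The script reports `accuracy/len(testloader)`, the average of per-batch accuracies. With 10,000 FashionMNIST test images and `batch_size=64`, the last of the 157 batches holds only 16 images yet counts as much as any full batch, so this average can differ slightly from the fraction of all test images classified correctly. A stdlib sketch with made-up counts, exaggerated so the gap is visible (both function names are hypothetical):

```python
def mean_of_batch_means(correct, sizes):
    """What the script computes: sum of per-batch accuracies / number of batches."""
    return sum(c / n for c, n in zip(correct, sizes)) / len(sizes)


def overall_accuracy(correct, sizes):
    """Fraction of all examples classified correctly, weighting each example equally."""
    return sum(correct) / sum(sizes)


# Made-up example: two full batches of 4, then a short final batch of 2
correct = [4, 4, 0]
sizes = [4, 4, 2]

print(mean_of_batch_means(correct, sizes))
print(overall_accuracy(correct, sizes))
```

For the real test set the gap is small, but accumulating the number of correct predictions and dividing by `len(testloader.dataset)` gives the exact figure.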