# Project 1: Fully Connected Neural Networks

Instructions and assignment dates can be found on [github link](https://exception1984.github.io/CS294Y-2024/). 
Your submission will be this notebook, with all the outputs embedded in the notebook.
We will grade only what we can see in the notebook.

The main goal of this assignment is to get familiar with:
- Basic Ibex usage
- Basic PyTorch
- Basic tensor operations
- Training a neural network

You will train an autoencoder on the FashionMNIST dataset.

IBEX is the compute cluster we use at KAUST. <br>
The IMPORTANT policy for unassociated students can be [found here](https://docs.hpc.kaust.edu.sa/policy/ibex.html#limits-on-unassociated-users); in short, you are limited to one GPU (1080 Ti or 2080 Ti). <br>
The Ibex Quickstart documentation can be [found here](https://docs.hpc.kaust.edu.sa/quickstart/ibex.html). <br>
The Ibex 101 slides can be [found here](https://drive.google.com/file/d/13tiL3HjCu16cJ3GP_gR37xrvZ4h7W7KH/view).

**Submission note:** We grade the content in this notebook. Make sure outputs are present. You will package everything in a zip file and submit it with the following format:
f"P1_{LastName}_{FirstName}.V{version_number}.zip" <br>e.g. "P1_Smit_John_V1.zip" (see the announcement for more details)

## TASK 1:  Setup (10 points)
### Connecting to IBEX and SETUP

Complete the steps below, and leave the required output of the cells in the notebook <br>
1. Read the [quickstart](https://docs.hpc.kaust.edu.sa/quickstart/ibex.html), and connect to ibex.
2. Follow the instructions on how to [set up miniconda](https://github.com/kaust-rccl/ibex-miniconda-install) (the [full ibex guide](https://docs.hpc.kaust.edu.sa/soft_env/prog_env/python_package_management/conda/ibex.html) covers the same steps).
3. Create a [new conda environment](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html) named "CS294Y" that has:
    * python, [pytorch](https://pytorch.org/get-started/locally/) and [jupyterlab](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html) installed.
4. Understand how to [submit jobs](https://docs.hpc.kaust.edu.sa/soft_env/job_schd/slurm/basic_jobscript.html), and follow the [jupyterlab instructions to spin up a server](https://docs.hpc.kaust.edu.sa/soft_env/job_schd/slurm/interactive_jobs/jupyter.html#job-on-ibex).
5. Choose a front end:
    * a) Connect to the jupyterlab frontend as shown in the instructions. (You [should see this](https://jupyterlab.readthedocs.io/en/stable/_images/jupyterlab.png).)
    * b) Connect via the VSCode jupyter server:
        * You will get something like (http://gpu201-02-l:10009) in the sbatch output. Select kernel > use existing kernel, use this URL to connect, and input the token when requested.
6. Execute the following two cells

Important note: Future projects also require these steps. Setup on a personal computer is similar, **but we expect you to use IBEX.**
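
If the two cells below fail, a quick check along these lines may help to tell whether the kernel runs in the intended environment and whether a GPU is visible. This is only a minimal sketch: the CPU fallback and the `probe` tensor are illustrative assumptions, and the graded cells still expect a GPU.

```python
import sys
import torch

# Report which interpreter and PyTorch build the kernel is using.
print("Python executable:", sys.executable)
print("PyTorch version:", torch.__version__)

# Assumption for illustration: fall back to the CPU if no GPU is visible
# (for example, when the job was not allocated one).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
probe = torch.ones(2, 3, device=device)
print("Device in use:", probe.device.type)
```

If the reported executable does not live inside the "CS294Y" environment, the notebook is most likely attached to a different kernel.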

In [None]:
!nvidia-smi --query-gpu=name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits

In [None]:
# TODO: Execute me in the correct environment
import torch
import os
import sys
import socket
import time
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

cuda_tensor = torch.rand((2,3,256,256)).cuda()
hostname = socket.gethostname()
username = os.getlogin()

print(f"Hostname: {hostname}", f"Username: {username}")
print(torch.__version__, "/", torch.cuda.memory_allocated(0) / 1024**2)
print(sys.executable, "\n", sys.version)

## TASK 2: Basic PyTorch and tensor manipulation

### Task 2.1: Combining two tensors (2.5 points)
Concatenate the two tensors below along the batch dimension and print the result. <br>
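
As a hedged illustration of the general idea (on smaller, made-up shapes rather than the tensors of this task): `torch.cat` requires both tensors to have the same number of dimensions, so a sample without a batch axis can first be given one with `unsqueeze`. The names `batch` and `single` are assumptions for this sketch.

```python
import torch

batch = torch.rand((2, 3, 4, 4))   # already has a batch dimension
single = torch.rand((3, 4, 4))     # one sample without a batch dimension

# torch.cat needs the same number of dimensions, so add a batch axis first.
single_as_batch = single.unsqueeze(0)                    # shape (1, 3, 4, 4)
combined = torch.cat([batch, single_as_batch], dim=0)    # join along dim 0
print(combined.shape)  # torch.Size([3, 3, 4, 4])
```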

In [None]:
a1 = torch.rand((2,3,256,256))
b1 = torch.rand((3,256, 256))
# TODO: Add/Modify your code below and print the shape
r1 = None
print(None)

None


### Task 2.2: PyTorch Gradient (2.5 points)
Compute the gradient of y = 3*x^2 + 2*x + 1 at x = 2 using PyTorch's built-in autograd mechanism. <br>
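
The autograd workflow can be sketched on a different toy function, chosen here purely for illustration: f(x) = x^3 - 4x, whose derivative 3x^2 - 4 equals 23 at x = 3. The tensor is created with `requires_grad=True`, the function is evaluated, and `backward()` fills in `x.grad`.

```python
import torch

# Toy function (an assumption for illustration): f(x) = x^3 - 4x
x = torch.tensor(3.0, requires_grad=True)  # track operations on x
f = x**3 - 4 * x                           # forward pass builds the graph
f.backward()                               # backward pass computes df/dx
print(x.grad)  # tensor(23.)
```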

In [None]:
# TODO: Add/Modify your code below and print the gradient
x = None
y = None

print(None)

None


### Task 2.3: Indexing (2.5 points)
Select every other element in the batch, and reshape the last dimension into a square (64 x 64).
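
A small sketch of the two operations involved, on made-up shapes: a slice with step 2 picks every other entry along the first axis, and `reshape` splits a flat last dimension of length 16 into 4 x 4. The tensor `t` and its size are assumptions for illustration.

```python
import torch

# Illustrative tensor: 6 samples, 2 channels, flat last dimension of 16 = 4 * 4
t = torch.arange(6 * 2 * 16, dtype=torch.float32).reshape(6, 2, 16)

every_other = t[::2]                        # keeps samples 0, 2, 4
squared = every_other.reshape(3, 2, 4, 4)   # split the last dim into 4 x 4
print(every_other.shape, squared.shape)
```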

In [None]:
a1 = torch.rand((4,3,64 ** 2))
# TODO: Add/Modify your code below and print the shape
a1 = None
print(None)


None


### Task 2.4: Fix the code (7.5 points)
What is wrong with the code below?
Fix it and explain why it was wrong. <br>
Hint: Think of the network in an optimization loop. <br> You can import extra classes if needed. <br>
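
One way to investigate a module is to count the parameters it exposes to an optimizer. The helper below is a hypothetical diagnostic (the name `count_trainable_parameters` is not part of the assignment), shown on a single `nn.Linear` layer.

```python
import torch.nn as nn

def count_trainable_parameters(model: nn.Module) -> int:
    """Sum the element counts of all parameters that require gradients."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

probe = nn.Linear(10, 20)                  # 10 * 20 weights + 20 biases
print(count_trainable_parameters(probe))   # 220
```

Applying the same helper to a full network and comparing the result with the count you expect by hand may point to what an optimizer would actually update.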

In [None]:
class SimpleNetwork(nn.Module):
    def __init__(self):
        super(SimpleNetwork, self).__init__()
        self.layers = [
            nn.Linear(10, 20),
            nn.ReLU(),
            nn.Linear(20, 5)
        ]
    
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


Your interpretation and explanation:

TODO: Explain

## TASK 3: Define data loaders (10 points)

Define the transforms, dataset, and dataloaders for the [FashionMNIST](https://pytorch.org/vision/stable/datasets.html#fashion-mnist) dataset by finishing the cell below. <br>
Split the 60k training images into 50k for training and 10k for validation. <br>
**Explain which batch size you selected and why.** <br>
Hint: Use `torchvision.datasets.FashionMNIST` and `torch.utils.data.DataLoader`. <br>

The function `load_data` returns dictionaries keyed by phase name, each holding the corresponding data. <br>
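
The split-and-wrap pattern can be sketched on stand-in data, so that nothing has to be downloaded. Everything here is an assumption for illustration: the 60 random "images", the phase names, the batch size of 16, and the fixed seed. `torch.utils.data.random_split` divides one dataset into non-overlapping subsets of the given lengths.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Stand-in data (assumption): 60 fake 1x28x28 "images" with integer labels.
fake_images = torch.rand(60, 1, 28, 28)
fake_labels = torch.randint(0, 10, (60,))
full_train = TensorDataset(fake_images, fake_labels)

# A seeded generator makes the split reproducible across runs.
generator = torch.Generator().manual_seed(0)
train_set, val_set = random_split(full_train, [50, 10], generator=generator)

# Dictionaries keyed by phase name, mirroring how the training code indexes them.
datasets_by_phase = {"train": train_set, "val": val_set}
loaders = {
    phase: DataLoader(ds, batch_size=16, shuffle=(phase == "train"))
    for phase, ds in datasets_by_phase.items()
}
sizes = {phase: len(ds) for phase, ds in datasets_by_phase.items()}
print(sizes)  # {'train': 50, 'val': 10}
```

Shuffling only the training loader keeps validation batches in a stable order, which makes runs easier to compare.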

In [None]:
# Follow the link above, and define the following variables
device = torch.device("cuda")

def load_data(batch_size, transform, data_dir="mnist_data/"):
    # TODO
    pass
      
dataloaders, actual_datasets, dataset_sizes = None, None, None # TODO


## TASK 4: Fully Connected Autoencoder


### TASK 4.1: Implement the autoencoder (15 points)

Implement a fully connected autoencoder with four encoder layers and four decoder layers. <br>
At each layer, change the layer width by a factor of 4: decrease it in the encoder and increase it in the decoder. <br>
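
To make the factor-of-4 rule concrete, the hypothetical helper below lists the widths an encoder would pass through. Rounding down with integer division is an assumption of this sketch; the assignment does not say how to handle sizes that are not divisible by 4.

```python
def layer_widths(initial_size: int, factor: int = 4, depth: int = 4) -> list[int]:
    """Return the input width followed by the width after each encoder layer."""
    widths = [initial_size]
    for _ in range(depth):
        # Assumption: round down, and never let a layer shrink below one unit.
        widths.append(max(1, widths[-1] // factor))
    return widths

print(layer_widths(4096))  # [4096, 1024, 256, 64, 16]
print(layer_widths(784))   # [784, 196, 49, 12, 3]
```

A decoder that mirrors the encoder would walk the same list in reverse order.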

In [None]:
class MyNet(nn.Module):
    def __init__(self, initial_size=None):
        super(MyNet, self).__init__()
        pass
    def forward(self, x):
        pass

### TASK 4.2: Training and test code (5 points)

1. Modify the code below to make use of the validation set.
2. Note (part of Task 4.4): modify the code below to return the loss on the train and validation sets so that it can be plotted.
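
A validation pass differs from a training pass mainly in that the model is in eval mode and no gradients are tracked. The sketch below is one possible shape for such a helper; the name `evaluate`, the `history` dictionary, and the identity "model" used to exercise it are all assumptions for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion, device):
    """Average reconstruction loss over a loader, without updating weights."""
    model.eval()
    running_loss, seen = 0.0, 0
    with torch.no_grad():
        for inputs, _ in loader:  # labels are not needed for reconstruction
            inputs = inputs.to(device)
            loss = criterion(model(inputs), inputs)
            running_loss += loss.item() * inputs.size(0)
            seen += inputs.size(0)
    return running_loss / seen

# Toy check: an identity "autoencoder" reconstructs its input perfectly.
device = torch.device("cpu")
data = TensorDataset(torch.rand(8, 16), torch.zeros(8))
loader = DataLoader(data, batch_size=4)
history = {"train": [], "val": []}
history["val"].append(evaluate(nn.Identity(), loader, nn.MSELoss(), device))
print(history)  # {'train': [], 'val': [0.0]}
```

Appending one value per epoch to a dictionary like `history` is one way to return the per-phase losses for plotting.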

In [None]:
def train_autoencoder(model, criterion, optimizer, dataloaders,
                       dataset_sizes, device, num_epochs=25, 
                       save_path='saved_weight.pth', verbose=False):
    since = time.time()

    for epoch in range(num_epochs):
        
        epoch_str = 'Epoch {}/{}'.format(epoch, num_epochs - 1)
        # Each epoch has a training phase
        for phase in ['train']:
            if phase == 'train': model.train()  # Set model to training mode

            running_loss = 0.0

            # Iterate over data.
            for inputs, _ in dataloaders[phase]:  # Autoencoder doesn't need labels
                inputs = inputs.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, inputs)  # Reconstruction loss

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)

            epoch_loss = running_loss / dataset_sizes[phase]
            if verbose:
                print('[{}] {} Loss: {:.4f}'.format(epoch_str, phase, epoch_loss))
    if verbose:
        print()
        time_elapsed = time.time() - since
        print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

    torch.save(model.state_dict(), save_path)
    return model

In [None]:
def test_autoencoder(model, dataloaders, dataset_sizes, device, criterion, load_path='saved_weight.pth'):
    # load the model weights
    model.load_state_dict(torch.load(load_path, map_location=device))
    
    since = time.time()

    for phase in ['test']:
        if phase == 'test':
            model.eval()   # Set model to evaluate mode

        running_loss = 0.0

        # Iterate over data.
        for inputs, _ in dataloaders[phase]:  # Autoencoder doesn't need labels
            inputs = inputs.to(device)

            with torch.no_grad():
                outputs = model(inputs)
                loss = criterion(outputs, inputs)  # Reconstruction loss

            # statistics
            running_loss += loss.item() * inputs.size(0)

        epoch_loss = running_loss / dataset_sizes[phase]

        print('{} Loss: {:.4f}'.format(phase, epoch_loss))

    time_elapsed = time.time() - since
    print('Testing complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))

    return

### Task 4.3: Combine code together (15 points)
Create a model, optimizer, loss function, and train the model. <br> 
In addition, print the model summary and check the performance on the test set. <br>
**Explain which hyperparameters you selected and why, including the loss function and any other design choices.** <br>

In [None]:
# TODO: your code

TODO: your explanation here

### Task 4.4: Visualize loss (5 points)
Plot the loss curves for the train and validation sets. <br>
Note (**applies to 4.4 and 4.5**): Back up your arguments by visualizing multiple configurations that support your explanation.
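
A possible plotting helper, assuming the training function returns a dictionary that maps each phase to a list of per-epoch losses. The function name `plot_loss_curves` and the dictionary layout are assumptions, and the numbers passed in are placeholders, not measured results.

```python
import matplotlib.pyplot as plt

def plot_loss_curves(history, title="Reconstruction loss"):
    """Draw one line per phase from a {phase: [loss per epoch]} dictionary."""
    fig, ax = plt.subplots(figsize=(6, 4))
    for phase, losses in history.items():
        ax.plot(range(1, len(losses) + 1), losses, marker="o", label=phase)
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss")
    ax.set_title(title)
    ax.legend()
    return fig, ax

# Placeholder numbers only, to exercise the function; not measured results.
placeholder_history = {"train": [1.0, 0.5, 0.25], "val": [1.1, 0.6, 0.4]}
fig, ax = plot_loss_curves(placeholder_history)
```

Calling the helper once per configuration, with a distinct `title`, gives plots that can be compared side by side.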

In [None]:
from torchvision.utils import make_grid
from torchvision.transforms import ToPILImage
import matplotlib.pyplot as plt

# TODO: your code

TODO: your explanation here with plots above, explaining your findings

### TASK 4.5: Visualize Reconstruction (10 points)
Visualize the inputs and their reconstructions for the first 10 test samples. <br>
The result is expected to have ten images per row (one row of inputs, one row of outputs). <br>
Note: You can use `torchvision.utils.make_grid` to make a grid of images. <br>
**Explain the results of visualization. Did you get what you expected?**

In [None]:
number_of_images = 10
data_input = actual_datasets['test']
# TODO: your code


TODO: your observation here about multiple reconstructions

### Task 4.6: Reconstruction with Noise (5 points)
Add noise to the input data and test output reconstruction of the same images. <br>
Add Gaussian noise of different standard deviations. <br>
Demonstrate different effects and explain the results. <br>
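
A minimal sketch of additive Gaussian noise at several standard deviations. It assumes the images are scaled to [0, 1], which depends on the transform you chose, and clamps the result back into that range; the helper name `add_gaussian_noise` is hypothetical.

```python
import torch

def add_gaussian_noise(images, std):
    """Add zero-mean Gaussian noise and clamp back to the assumed [0, 1] range."""
    noise = torch.randn_like(images) * std
    return (images + noise).clamp(0.0, 1.0)

torch.manual_seed(0)
images = torch.rand(4, 1, 28, 28)  # stand-in for a batch of test images
for std in (0.0, 0.1, 0.5):
    noisy = add_gaussian_noise(images, std)
    mean_change = (noisy - images).abs().mean().item()
    print(f"std={std}: mean absolute change {mean_change:.4f}")
```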

In [None]:
# TODO: your code

TODO: your explanation and interpretation of the results

### TASK 4.7: Retrain the model with noise in mind (10 points)
Retrain the model with noisy input augmentations. <br>
Rerun training (task 4.3) and visualizations (4.5 and 4.6) with noisy input data. <br>
**Explain what types of noise can be added, how you would validate the new results, and what you observe.** <br>
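
One common way to train with noisy inputs is a denoising setup: the model receives the corrupted batch, but the loss compares its output with the clean batch. The sketch below assumes this setup, Gaussian noise, and inputs in [0, 1]; the function `denoising_step` and the toy model are hypothetical and not the required solution.

```python
import torch
import torch.nn as nn

def denoising_step(model, optimizer, criterion, clean, std):
    """Run one optimization step: noisy input in, clean input as the target."""
    model.train()
    noisy = (clean + std * torch.randn_like(clean)).clamp(0.0, 1.0)
    optimizer.zero_grad()
    loss = criterion(model(noisy), clean)  # target is the clean batch
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy model and data, only to exercise the function.
toy_model = nn.Sequential(nn.Linear(16, 4), nn.ReLU(), nn.Linear(4, 16))
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)
clean_batch = torch.rand(8, 16)
loss_value = denoising_step(toy_model, toy_optimizer, nn.MSELoss(), clean_batch, std=0.2)
print(f"loss after one step: {loss_value:.4f}")
```

Validation would then typically apply the noise only to the inputs, keeping the clean images as the reference for the loss and for the visual comparison.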

In [None]:
# TODO: your code

TODO: Explain what type of noise we would use and how we would validate the results.