Deep Learning exercise

This exercise has two parts. First, you will implement an FFNN (feed-forward neural network) from scratch, coding both the forward and backward pass using matrix multiplications. Next, you will update this code to implement the NNAlign forward pass with insertions and deletions for peptides that are not 9-mers.

First, you must access the program templates and exercise data.

Download the file
NNDeep_FFNN_2025.tar.gz.

Unpack the file (using tar -xzvf NNDeep_FFNN_2025.tar.gz) and place the created NNDeep directory in the "Algo/code" directory.

Now the NNDeep directory should contain the Jupyter notebook file

FFNN_from_scratch.ipynb

Next, download the data file

NNdeep_data.tar.gz.

Place the file in the course data directory and unpack it (using tar -xzvf NNdeep_data.tar.gz). The created NNDeep directory should now contain the following files and directories

BLOSUM50
A0201/
A0301/

Now we are ready to code.

Part I - FFNN exercise

Implementing the FFNN algorithms

FFNN

Open the FFNN_from_scratch.ipynb notebook and implement the feed-forward neural network part. You shall fill in the missing code (look for the places marked XXXX).

In detail, for a feed-forward neural network with one hidden layer, weights and biases, you shall:

- Code the ReLU and Sigmoid activation functions.
- Create the forward loop, using ReLU as the activation function in the hidden layer and Sigmoid as the activation function in the output layer.
- Compute the derivatives and the backward pass using the chain rule and matrix multiplications.
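The three steps above can be sketched in NumPy as follows. This is only a minimal sketch under stated assumptions, not the notebook's reference solution: the function and variable names (forward, backward, mse_loss, W1, b1, W2, b2) are hypothetical, the inputs are assumed to be a matrix with one sample per row, and the loss is assumed to be half the mean squared error. The notebook may use different names, shapes or a different loss.

```python
import numpy as np


def relu(z):
    # ReLU: element-wise max(0, z)
    return np.maximum(0.0, z)


def sigmoid(z):
    # Sigmoid: squashes values into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    # Hidden layer with ReLU, output layer with Sigmoid
    Z1 = X @ W1 + b1
    A1 = relu(Z1)
    Z2 = A1 @ W2 + b2
    A2 = sigmoid(Z2)
    return Z1, A1, Z2, A2


def mse_loss(pred, y):
    # Assumed loss: 0.5 * mean squared error over the samples
    return 0.5 * np.sum((pred - y) ** 2) / y.shape[0]


def backward(X, y, Z1, A1, A2, W2):
    # Chain rule, layer by layer, expressed as matrix products
    n = X.shape[0]
    dZ2 = (A2 - y) * A2 * (1.0 - A2) / n   # dL/dZ2 (sigmoid derivative)
    dW2 = A1.T @ dZ2                        # dL/dW2
    db2 = dZ2.sum(axis=0, keepdims=True)    # dL/db2
    dA1 = dZ2 @ W2.T                        # propagate to hidden layer
    dZ1 = dA1 * (Z1 > 0)                    # ReLU derivative is 0 or 1
    dW1 = X.T @ dZ1                         # dL/dW1
    db1 = dZ1.sum(axis=0, keepdims=True)    # dL/db1
    return dW1, db1, dW2, db2
```

A useful sanity check for any backward pass is to compare one analytical gradient entry with a finite-difference estimate of the loss; if the two disagree, the derivation or the matrix shapes are likely wrong.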

What can you tell from the error curves for the training and validation dataset? Is your model training properly?

Test the code by selecting some allele data and running the notebook.

Test different hyperparameters (hidden_size, learning_rate, n_epochs, etc.), plot the results you get, and compare their AUC values (larger is better).
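To compare runs by AUC, the value can be computed directly from its definition: the probability that a randomly chosen binder scores higher than a randomly chosen non-binder, with ties counting one half. The sketch below is an illustration only. The function name auc_from_scores is hypothetical, and the binder threshold of 0.426 on the target values is an assumption that you should replace with whatever threshold the notebook uses.

```python
import numpy as np


def auc_from_scores(targets, scores, threshold=0.426):
    # threshold is an ASSUMED binder cut-off on the target values
    targets = np.asarray(targets, dtype=float)
    scores = np.asarray(scores, dtype=float)
    pos = scores[targets > threshold]    # predictions for binders
    neg = scores[targets <= threshold]   # predictions for non-binders
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one binder and one non-binder")
    # Compare every binder with every non-binder (fine for small datasets)
    diff = pos[:, None] - neg[None, :]
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))
```

The pairwise comparison uses memory proportional to the number of binders times the number of non-binders, so for large evaluation sets a rank-based implementation from an existing library is the more practical choice.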

After you have successfully implemented the neural network, convert the notebook into two Python scripts:

One where you load data, instantiate a model, train it and save its parameters to a file. The script should take the following command line arguments:
- Path to a training data file.
- Path to a validation data file.
- Number of hidden layer neurons.
- Learning rate.
- Path to output directory.
Another, where you load a set of parameters for a model and do inference on a dataset you specify in your command line arguments, i.e. a program with the following command line arguments:
- Path to inference data file.
- Path to parameter file for trained model (or models).
- Path to output directory.
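One possible skeleton for the command line handling and for saving and loading the parameters is sketched below. The option names (--train, --valid, --hidden-size, --learning-rate, --outdir), the default values, the file name model.npz and the helper names are all hypothetical choices made for this illustration; only argparse and NumPy's savez/load are real library calls.

```python
import argparse
import os

import numpy as np


def build_train_parser():
    # Hypothetical option names; choose your own as you see fit
    p = argparse.ArgumentParser(description="Train a one-hidden-layer FFNN")
    p.add_argument("--train", required=True, help="Path to training data file")
    p.add_argument("--valid", required=True, help="Path to validation data file")
    p.add_argument("--hidden-size", type=int, default=16,
                   help="Number of hidden layer neurons")
    p.add_argument("--learning-rate", type=float, default=0.01,
                   help="Learning rate")
    p.add_argument("--outdir", required=True, help="Path to output directory")
    return p


def save_params(outdir, params):
    # params: dict mapping names such as "W1" to NumPy arrays
    os.makedirs(outdir, exist_ok=True)
    path = os.path.join(outdir, "model.npz")
    np.savez(path, **params)
    return path


def load_params(path):
    # Read all arrays back into a plain dict
    with np.load(path) as data:
        return {name: data[name] for name in data.files}
```

In the training script you would call build_train_parser().parse_args(), train the model, and finish with save_params. The inference script would define a similar parser for its three arguments and start with load_params.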

To ensure that all your hard work has paid off, try to run the training script using the same hyperparameter configuration and training data you used in the first neural network exercise (NN exercise).

Is the error more or less the same for the new and the old model? How much faster is the new model?

Now you are done with the FFNN exercise. Remember to upload your Python programs via DTU-Inside.

Part II - NNAlign exercise

Implementing the NNAlign forward algorithm

This exercise follows the FFNN exercise closely.

We will:

- Encode our sequences without padding, since we will now be encoding a 9-mer window for each peptide regardless of its length.
- Modify the forward function to implement the NNAlign forward pass. Peptides shorter than the motif length of 9 will have an insertion, and peptides longer than the motif length of 9 will have either a continuous binding core or a binding core with a deletion inside.
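The idea of insertions and deletions can be illustrated on plain strings before touching the encoded data. The sketch below enumerates candidate 9-mer cores for a peptide and picks the highest-scoring one. It rests on several assumptions that may differ from the notebook: insertions are written as a run of the wildcard character X, a deletion removes exactly len(peptide) - 9 consecutive residues and keeps both termini, and the names candidate_cores, best_core and score_fn are hypothetical.

```python
def candidate_cores(peptide, motif_len=9, wildcard="X"):
    n = len(peptide)
    if n == motif_len:
        # Exactly motif length: the peptide is its own core
        return [peptide]
    if n < motif_len:
        # Too short: insert a run of wildcards at every possible position
        gap = wildcard * (motif_len - n)
        return [peptide[:i] + gap + peptide[i:] for i in range(n + 1)]
    # Too long, option 1: every continuous window of motif length
    cores = [peptide[i:i + motif_len] for i in range(n - motif_len + 1)]
    # Too long, option 2: delete d consecutive residues inside the peptide,
    # keeping the first i and the last (motif_len - i) residues
    d = n - motif_len
    for i in range(1, motif_len):
        cores.append(peptide[:i] + peptide[i + d:])
    return cores


def best_core(peptide, score_fn, motif_len=9):
    # score_fn is a placeholder for the network's prediction on one core
    cores = candidate_cores(peptide, motif_len)
    return max(cores, key=score_fn)
```

In the actual forward pass the same selection is done on encoded windows: the network scores every candidate core of a peptide, and the maximum over the candidates becomes the peptide's prediction.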

First, download the code NNAlign_from_scratch.ipynb and place it in the NNDeep directory in the "Algo/code" directory.

Your first task is to fill in the missing code in the forward function of the NNAlignFFNN class (look for the places marked XXXX).

Try training an NNAlign model with the same hyperparameters as the FFNN model from before.

The NNAlign model should perform better than the FFNN. Why?
Can you think of a scenario where training an NNAlign model using the current Python code would be difficult/impossible? How would you solve it? (hint: limited memory)
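As a starting point for thinking about the memory hint, one common direction (not necessarily the intended answer) is to process the data in smaller chunks rather than all at once. The sketch below only shows how sample indices can be split into mini-batches; the name iterate_minibatches and its arguments are hypothetical, and how the batches are fed to the model is left to you.

```python
import numpy as np


def iterate_minibatches(n_samples, batch_size, rng=None):
    # Yield arrays of sample indices, each at most batch_size long
    idx = np.arange(n_samples)
    if rng is not None:
        rng.shuffle(idx)  # optional shuffling between epochs
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]
```

With this pattern, only the windows belonging to the current batch need to be encoded and held in memory at any one time.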

After you have successfully implemented the NNAlign method, convert the notebook into two Python scripts:

One where you load data, instantiate a model, train it and save its parameters to a file. The script should take the following command line arguments:
- Path to a training data file.
- Path to a validation data file.
- Motif length (in our case it is 9, but it could take other values for other types of data).
- Number of FFNN hidden layer neurons.
- Learning rate.
- Number of training epochs.
- Path to output directory.
Another, where you load a set of parameters for a model and do inference on a dataset you specify in your command line arguments. The script should take the following command line arguments:
- Path to inference data file.
- Path to parameter file for trained model (or models).
- Path to output directory.

The scripts you have developed here are modular and can be used for various purposes, such as cross-validation and hyperparameter tuning. You have already implemented some wrapper scripts for this during the SMM exercise.

You should adapt these wrapper scripts so they can be used to train and evaluate an FFNN or NNAlign model on the SMM dataset splits.
The same scripts could be used for hyperparameter tuning, by using only a single data split and calling the training script with various hyperparameter configurations. Try to think about which part of the dataset split (i.e. training, validation or test partitions) should be used for selecting the best model configuration in an unbiased way.
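A wrapper for hyperparameter tuning could, for example, build one command line per configuration and afterwards pick the configuration with the best score. The sketch below is hypothetical throughout: the script name train_ffnn.py, the option names and the helper names are placeholders to be adapted to your own scripts, and it deliberately leaves open which partition the scores come from.

```python
import itertools


def build_commands(train_file, valid_file, outdir, hidden_sizes,
                   learning_rates, script="train_ffnn.py"):
    # One command (as an argument list) per hyperparameter combination
    commands = []
    for h, lr in itertools.product(hidden_sizes, learning_rates):
        run_dir = f"{outdir}/h{h}_lr{lr}"  # separate output dir per run
        commands.append([
            "python", script,
            "--train", train_file,
            "--valid", valid_file,
            "--hidden-size", str(h),
            "--learning-rate", str(lr),
            "--outdir", run_dir,
        ])
    return commands


def select_best(scores):
    # scores: dict mapping a configuration name to its performance
    # (e.g. AUC) on the partition you chose for model selection
    return max(scores, key=scores.get)
```

Each argument list can be passed to subprocess.run, after which the per-run results are collected into a dictionary and handed to select_best.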

Now you are done. Remember to upload your Python programs via DTU-Inside.