This exercise has two parts. First, you shall implement a conventional Feed-Forward Neural Network (FFNN) in PyTorch. Next, you shall update this code to implement a simplified NNAlign-like forward pass for MHC class II peptides. In this NNAlign-like model, each peptide is represented by all possible contiguous 9-mer binding cores, and the model prediction for the peptide is the maximum prediction over these candidate cores.
In this simplified version we do not model peptide-flanking-region effects.
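The core idea can be illustrated without any neural network. The sketch below is not taken from the notebooks: the function names are hypothetical, and toy_score is a made-up stand-in for the trained network. It only shows how a peptide is split into contiguous 9-mer windows and how the peptide prediction is taken as the maximum over them.

```python
def candidate_cores(peptide, core_len=9):
    """Return all contiguous windows of length core_len in the peptide."""
    if len(peptide) < core_len:
        return []
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


def toy_score(core):
    """Hypothetical stand-in for the network: fraction of hydrophobic residues."""
    hydrophobic = set("AVILMFWY")
    return sum(aa in hydrophobic for aa in core) / len(core)


def predict_peptide(peptide, score_fn=toy_score, core_len=9):
    """Score every candidate core and return the best core with its score."""
    cores = candidate_cores(peptide, core_len)
    if not cores:
        raise ValueError("peptide is shorter than the core length")
    scores = [score_fn(core) for core in cores]
    best = max(range(len(cores)), key=scores.__getitem__)
    return cores[best], scores[best]


# A 15-mer has 15 - 9 + 1 = 7 candidate cores.
cores = candidate_cores("GGGAAAAAAAAAGGG")
best_core, best_score = predict_peptide("GGGAAAAAAAAAGGG")
```

In the actual exercise the scoring function is the neural network itself, so the selected core can change as the weights are updated during training.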
First you must access the program templates and exercise data.
Download the file
NNdeep_2026.tar.gz.
Open the file using:
tar -xzvf NNdeep_2026.tar.gz
and place the created NNDeep directory in the "Algo/code" directory.
Now the NNDeep directory should contain the Jupyter-Notebook files:
FFNN_pytorch.ipynb
NNAlign_pytorch.ipynb
Next, download the data file
NNdeep_data_2026.tar.gz.
Place the file in the course data directory, and open the file using:
tar -xzvf NNdeep_data_2026.tar.gz
The created NNDeep data directory should now contain the BLOSUM encoding file and the peptide-MHC class II datasets used in the notebooks.
The notebooks use PyTorch. If you have created a conda environment for the course, first activate it and then install PyTorch:
conda activate YOUR_ENVIRONMENT_NAME
pip install torch
If you are not using a separate conda environment, activate the base environment and install PyTorch there:
conda activate base
pip install torch
Now we are ready to code.
Open the FFNN_pytorch.ipynb notebook and implement the feed-forward neural network part. You shall fill in the missing code. Find the places marked with missing code (XXX).
In detail you shall, for a one hidden layer feed-forward neural network:
What can you tell from the error curves for the training and validation datasets?
Is your model training properly?
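One possible way to read the error curves is to look for the epoch where the validation error is lowest and check whether the two curves diverge afterwards. The helper below is a minimal sketch under the assumption that you have collected one training loss and one validation loss per epoch in two Python lists; the function name and the loss values are made up for illustration.

```python
def summarize_curves(train_losses, val_losses):
    """Report the epoch with the lowest validation loss and a simple overfitting flag."""
    if not val_losses or len(train_losses) != len(val_losses):
        raise ValueError("need one training and one validation loss per epoch")
    best = min(range(len(val_losses)), key=val_losses.__getitem__)
    # Overfitting sign: training loss keeps falling while validation loss rises again.
    train_still_falling = train_losses[-1] < train_losses[best]
    val_rising = val_losses[-1] > val_losses[best]
    return {
        "best_epoch": best,
        "best_val_loss": val_losses[best],
        "overfitting_after_best": train_still_falling and val_rising,
    }


# Made-up loss values, one entry per epoch (epochs are counted from 0).
train = [0.30, 0.20, 0.15, 0.10, 0.05]
val = [0.32, 0.25, 0.22, 0.24, 0.28]
summary = summarize_curves(train, val)
```

A flag like this is only a rough heuristic; plotting both curves, as the notebook does, remains the main diagnostic.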
Test the code by selecting the allele data and running the notebook.
Test different hyperparameters, for example hidden layer size, learning rate and number of epochs. Plot the various results you get and compare their AUC values. Larger AUC values are better.
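To make the comparison concrete, here is a small sketch of how AUC can be computed and used to rank hyperparameter settings. It assumes binary labels (1 = binder, 0 = non-binder), and the labels, scores and setting names are invented for illustration. The notebook may compute AUC with a library function instead; this version just spells out the definition as the probability that a random positive is scored above a random negative.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions from two hyperparameter settings on the same test set.
labels = [1, 1, 0, 0]
runs = {
    "hidden=16": [0.9, 0.4, 0.5, 0.1],
    "hidden=64": [0.9, 0.8, 0.2, 0.1],
}
results = {name: auc(labels, scores) for name, scores in runs.items()}
best_setting = max(results, key=results.get)
```

The pairwise loop is quadratic in the number of examples, which is fine for a toy check but slow on large test sets.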
After you have successfully implemented the neural network, try training the FFNN both on fixed-length binding cores and on full-length MHC class II peptides.
What happens when a conventional FFNN is trained on full-length peptides of variable length? Why is this a limitation for MHC class II peptide binding prediction?
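One aspect of this question can be checked directly by looking at the size of the encoded input. The sketch below uses a plain one-hot encoding over the 20 standard amino acids as an assumption (the notebooks use a BLOSUM encoding file instead), and the peptide sequences are arbitrary examples.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_flat(peptide):
    """Encode a peptide as one flat vector with 20 values per residue."""
    vec = []
    for aa in peptide:
        row = [0.0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(aa)] = 1.0
        vec.extend(row)
    return vec


core = "AVILMFWYA"            # 9 residues
peptide = "GGGAVILMFWYAGGG"   # 15 residues
core_size = len(one_hot_flat(core))        # 9 * 20
peptide_size = len(one_hot_flat(peptide))  # 15 * 20
```

The first layer of a conventional FFNN expects one fixed input size, so the two vectors above cannot be fed to the same network without padding or truncation.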
After you have successfully implemented the FFNN model in the notebook, make the notebook into two Python scripts:
Now you are done with the FFNN exercise.
This exercise will follow the FFNN exercise closely.
We will:
Open the NNAlign_pytorch.ipynb notebook.
Your task is to first fill in the missing code. Find the places marked with missing code (XXX).
Try training an NNAlign-like model with the same data and hyperparameters as the FFNN model from before.
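For orientation, a batched NNAlign-like forward pass could look roughly like the sketch below. It is not the notebook's solution: the class name, layer sizes, sigmoid activations, zero-padding scheme and the use of 20 features per residue are all assumptions made for illustration. Each peptide is cut into overlapping 9-residue windows, every window is scored by the same small FFNN, windows that overlap the padding are masked out, and the maximum score per peptide is returned.

```python
import torch
import torch.nn as nn


class NNAlignSketch(nn.Module):
    """Hypothetical one-hidden-layer network applied to every 9-mer window."""

    def __init__(self, n_features=20, core_len=9, n_hidden=16):
        super().__init__()
        self.core_len = core_len
        self.net = nn.Sequential(
            nn.Linear(core_len * n_features, n_hidden),
            nn.Sigmoid(),
            nn.Linear(n_hidden, 1),
            nn.Sigmoid(),
        )

    def forward(self, x, lengths):
        # x: (batch, max_len, n_features), zero-padded; lengths: (batch,) true lengths
        # unfold gives (batch, n_windows, n_features, core_len)
        windows = x.unfold(1, self.core_len, 1)
        # reorder to (batch, n_windows, core_len, n_features) and flatten each window
        windows = windows.permute(0, 1, 3, 2).reshape(
            x.size(0), -1, self.core_len * x.size(2)
        )
        scores = self.net(windows).squeeze(-1)  # (batch, n_windows)
        # mask windows that extend into the padding of shorter peptides
        n_valid = lengths - self.core_len + 1
        idx = torch.arange(scores.size(1), device=x.device).unsqueeze(0)
        scores = scores.masked_fill(idx >= n_valid.unsqueeze(1), float("-inf"))
        # peptide prediction = best-scoring core; also return its start position
        best_score, best_start = scores.max(dim=1)
        return best_score, best_start


torch.manual_seed(0)
model = NNAlignSketch()
lengths = torch.tensor([12, 9])
x = torch.zeros(2, 12, 20)
for i, n in enumerate(lengths.tolist()):
    x[i, :n] = torch.rand(n, 20)  # random stand-in for encoded peptides
best_score, best_start = model(x, lengths)
```

Because the maximum is taken inside the forward pass, the loss only propagates gradients through the selected core of each peptide. The sketch assumes every peptide is at least 9 residues long.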
After you have successfully implemented the NNAlign-like method, consider how the notebook could be turned into two Python scripts:
The scripts developed here would be modular and could be used for various purposes, such as cross-validation and hyperparameter tuning. You have already implemented wrapper scripts for this during the SMM exercise.
Now you are done. Remember to upload your completed Python scripts via DTU-Inside.