Artificial Neural Network

From 22113

Description

Implement a simple artificial neural network (ANN) algorithm with backpropagation in Python. ANNs are of great interest in bioinformatics. The institute has created many online prediction servers that use ANNs.
The data is part of a project at DTU HealthTech on predicting whether certain variations of a SNP will lead to a disease or not. A lot of work has already gone into preparing a data set for network training. The resulting data sets can be seen below.
It is probably a good idea to create two programs: one that trains on the data set(s) and then saves the synapses (weights) to a file, and another that reads the synapses and makes a prediction on the given input (data set). Training continues until the error is below a predetermined threshold, or stops when the maximum number of training rounds has been reached.
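As an illustration of the stopping rule and of passing the synapses from one program to the other, here is a minimal sketch. The names train_until, save_weights and load_weights, the default threshold and the .npz file format are assumptions made for this example, not requirements of the project.

```python
import numpy as np

def train_until(run_round, threshold=0.01, max_rounds=1000):
    """Call run_round() repeatedly; it must return the current error.

    Stops when the error is below threshold or after max_rounds rounds.
    Returns (rounds_used, last_error).
    """
    error = float("inf")
    rounds = 0
    while rounds < max_rounds and error >= threshold:
        error = run_round()
        rounds += 1
    return rounds, error

def save_weights(path, weights):
    """Store a list of weight matrices in one .npz file (assumed format)."""
    # Positional arrays are stored under the names arr_0, arr_1, ...
    np.savez(path, *weights)

def load_weights(path):
    """Read the matrices back in the order they were saved."""
    with np.load(path) as data:
        return [data[f"arr_{i}"] for i in range(len(data.files))]
```

The training program would call save_weights once training stops, and the prediction program would start with load_weights. Any other file format works equally well, as long as both programs agree on it.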

Tip: Building an ANN is very different from training an ANN. This project is about building (implementing) the ANN. There are many considerations and tricks that are part of training the ANN, but the student should not focus on these as they have nothing to do with Python. If you find yourself spending a lot of time trying to optimize the network parameters in order to produce the best possible predictions, then YOU ARE USING YOUR TIME WRONG. Spend it on making the code better or writing a better report.

Input and output

The input for training is whitespace-separated numbers, one example per line. Each line has 27 input values followed by a target value (1=disease, 0=healthy).

0.075 0.3 0.075 0.225 -0.3 -0.075 0.15 -0.15 0 -0.3 -0.3 0.15 -0.3 -0.225 0 -0.15 -0.15 -0.3 -0.225 -0.3 -0.075 0.3 0.193746064573032135 1.011 0.7525 0.744312561819980218 0.612 0
-0.075 0.075 0.15 0.15 -0.375 -0.15 0.375 -0.15 -0.225 -0.375 -0.15 0.15 -0.3 -0.375 -0.3 -0.225 -0.225 -0.375 -0.3 -0.375 -0.225 0.15 0.173058043883721677 1.011 0.7525 0.744312561819980218 0.340 0
-0.3 -0.225 -0.15 -0.3 -0.45 -0.15 -0.225 -0.375 0.75 -0.375 -0.075 -0.225 -0.3 -0.3 -0.375 -0.3 -0.3 -0.375 -0.075 -0.375 -0.075 0.75 0.336279812283658667 1.011 0.7525 0.744312561819980218 0.100 1
-0.15 -0.3 0 -0.3 -0.225 0.075 -0.225 -0.375 0.6 -0.075 -0.15 -0.225 0 0.3 -0.225 -0.3 -0.3 -0.225 0.3 -0.3 -0.15 0.6 0.303728369615393505 1.011 0.7525 0.744312561819980218 0.212 1
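
Reading lines like these into arrays could look roughly like the sketch below. The function name parse_examples and the constant N_INPUTS are made up for this example. The two sample lines are copied from the data shown above.

```python
import numpy as np

N_INPUTS = 27  # from the text: 27 input values followed by one target value

def parse_examples(lines):
    """Turn whitespace-separated text lines into (inputs, targets) arrays."""
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != N_INPUTS + 1:
            raise ValueError(f"expected {N_INPUTS + 1} values, got {len(fields)}")
        rows.append([float(f) for f in fields])
    data = np.array(rows)
    return data[:, :N_INPUTS], data[:, N_INPUTS]

sample = [
    "0.075 0.3 0.075 0.225 -0.3 -0.075 0.15 -0.15 0 -0.3 -0.3 0.15 -0.3 "
    "-0.225 0 -0.15 -0.15 -0.3 -0.225 -0.3 -0.075 0.3 0.193746064573032135 "
    "1.011 0.7525 0.744312561819980218 0.612 0",
    "-0.3 -0.225 -0.15 -0.3 -0.45 -0.15 -0.225 -0.375 0.75 -0.375 -0.075 "
    "-0.225 -0.3 -0.3 -0.375 -0.3 -0.3 -0.375 -0.075 -0.375 -0.075 0.75 "
    "0.336279812283658667 1.011 0.7525 0.744312561819980218 0.100 1",
]
inputs, targets = parse_examples(sample)
```

Here inputs has one row per example and 27 columns, and targets holds the matching 0/1 values.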

There are several datasets that can be combined for testing and training of the network: homology_reduced_subset_1.howlin.gz, homology_reduced_subset_2.howlin.gz, homology_reduced_subset_3.howlin.gz, homology_reduced_subset_4.howlin.gz.
Normally you train with 3 of the subsets and evaluate against the fourth.
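
One possible way to set up such a split is sketched below, assuming the four files have been downloaded to the working directory. The helper names load_subset and hold_out are hypothetical. np.loadtxt decompresses files whose name ends in .gz on its own.

```python
import numpy as np

def load_subset(path):
    """Load one data file into a 2-D array, one example per row."""
    return np.loadtxt(path, ndmin=2)

def hold_out(subsets, test_index):
    """Return (training_rows, test_rows), holding out subsets[test_index]."""
    train = [s for i, s in enumerate(subsets) if i != test_index]
    return np.vstack(train), subsets[test_index]

# Usage, assuming the files are present locally:
# paths = [f"homology_reduced_subset_{i}.howlin.gz" for i in range(1, 5)]
# subsets = [load_subset(p) for p in paths]
# train, test = hold_out(subsets, test_index=3)
```

Changing test_index lets each subset take its turn as the evaluation set.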

The input for testing is similar to the training input, and the output should be a value reflecting the network's opinion of the input :-)
You should consider using NumPy for this project due to the speed-up you will gain.
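
To show where the speed-up comes from, the sketch below computes the weighted sums of one layer twice: first with explicit Python loops, then as a single NumPy matrix product. Both function names are made up for this example, and no timings are claimed here. Measure on your own data.

```python
import numpy as np

def layer_sums_loop(inputs, weights):
    """Weighted sum for every neuron, written with explicit Python loops."""
    n_in, n_out = len(weights), len(weights[0])
    sums = [0.0] * n_out
    for j in range(n_out):
        for i in range(n_in):
            sums[j] += inputs[i] * weights[i][j]
    return sums

def layer_sums_numpy(inputs, weights):
    """The same weighted sums as one matrix product."""
    return np.asarray(inputs) @ np.asarray(weights)
```

The two functions give the same numbers. The NumPy version moves the inner loops into compiled code, and it also handles a whole batch of examples at once if inputs is a 2-D array.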

Further information

This project is quite large and requires students who are strong in math and theory.
The ANN should be implemented with a bias neuron in each layer, as the network learns better with one.
Wikipedia: Artificial neural network.
Wikipedia: Backpropagation.
Reference: Quick intro - ha ha, you wish.
Reference: Chapter on Backpropagation in text book on machine learning
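
To make the bias neuron and the backpropagation step concrete, here is a minimal sketch of a network with one hidden layer. Everything in it is an assumption chosen for illustration: the sigmoid activation, the squared-error function, the single hidden layer, the weight range and all function names. The bias neuron is modelled as a constant 1.0 appended to the activations of each layer, so each weight matrix has one extra row.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def add_bias(a):
    """Append a constant 1.0 column: the bias neuron of this layer."""
    return np.hstack([a, np.ones((a.shape[0], 1))])

def init_weights(n_in, n_hidden, rng):
    """Small random weights; the extra row in each matrix is for the bias."""
    w1 = rng.uniform(-0.5, 0.5, (n_in + 1, n_hidden))
    w2 = rng.uniform(-0.5, 0.5, (n_hidden + 1, 1))
    return w1, w2

def forward(x, w1, w2):
    """Return hidden activations and network output for a batch x."""
    hidden = sigmoid(add_bias(x) @ w1)
    output = sigmoid(add_bias(hidden) @ w2)
    return hidden, output

def gradients(x, t, w1, w2):
    """Backpropagation for the error E = 0.5 * sum((output - t)**2)."""
    hidden, output = forward(x, w1, w2)
    delta_out = (output - t) * output * (1.0 - output)
    grad_w2 = add_bias(hidden).T @ delta_out
    # Propagate back through w2, skipping its last row: the bias neuron
    # has no incoming connections, so no error flows into it.
    delta_hid = (delta_out @ w2[:-1].T) * hidden * (1.0 - hidden)
    grad_w1 = add_bias(x).T @ delta_hid
    return grad_w1, grad_w2

def train_step(x, t, w1, w2, rate=0.1):
    """One gradient-descent update; returns the new weights."""
    g1, g2 = gradients(x, t, w1, w2)
    return w1 - rate * g1, w2 - rate * g2
```

Here x is an array with one example per row and t is a column of targets with the same number of rows. A useful check while developing is to compare the output of gradients with a finite-difference estimate of the error for a few individual weights.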

Additional possibilities

There are many other machine learning methods. The same project/data could be implemented with

or another method from Supervised learning in the Machine learning field