Solo git

Latest revision as of 17:28, 11 February 2026

Previous: Intermediate Unix | Next: Collaborative git

Required course material for the lesson

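Powerpoint: Git (https://teaching.healthtech.dtu.dk/material/22118/22118_03-Git.ppt)
Optional, if you want to know more:
Online: Coderefinery's introduction to git (https://coderefinery.github.io/git-intro/). This is one of many git resources on the net.
Online: Markdown guide (https://www.markdownguide.org/)
Online: CI/CD (https://about.gitlab.com/topics/ci-cd/) - Continuous Integration and Continuous Delivery. It is a DevOps practice used with GitHub/GitLab.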

Subjects covered

  • GitHub - short
  • Git - concepts

Exercises to be handed in

In exercises 2-9 the job is to select the lines in the input file, i.e. the exercises are about which lines to select in various ways. The output is just a few of the lines in the input file.

  1. Create a private repository for your exercises on GitHub as described in the powerpoint. You can choose whatever name you like for the repository. All exercises now and in the future should be created and stored in this repository unless explicitly stated otherwise.
  2. The input file scores.txt is a tab-separated file with an accession number in the first column followed by 6 numbers (scores) between 0 and 1. You must find the 10 lines with the highest and the 10 lines with the lowest "combined score" (the combined score is the metric for selection), keeping the entire line (accession number and scores), and save the output in the file scoresextreme.txt.
    The combined score is simply the 6 numbers added together. The order of the output must be from high to low.
  3. Change exercise 2 in the following way: There is an input file, negative_list.txt, which is a list of genes that can NOT be part of the output. They are banned from your analysis. As can be seen, the genes are identified by their swissprot id. To translate from swissprot id to accession number, so you can relate it to scores.txt, you must use the input file translation.txt, where the first item on each line is an accession number and the second item is the corresponding swissprot id.
  4. Change exercise 2 in the following way: Make the program work no matter how many numbers there are on each line. Within one file every line has the same number of numbers, but one file could have 10 numbers on every line and another file could have 7 numbers per line.
  5. Change exercise 2 in the following way: Instead of using the combined score as the metric for selecting the accession numbers and scores, use the average score as the metric. That allows a different number of numbers on each line in the input file.
  6. Change exercise 2 in the following way: When you calculate the combined score, the first number should weigh 50% more than the other numbers and the last should weigh 50% less.
  7. Change exercise 2 in the following way: When you calculate the combined score, the numbers should be weighted on a linear sliding scale: the first number counts for 50% more than its real value, sliding linearly down to the last number, which counts for 50% less. The weight is thus calculated dynamically from how many numbers there are on the line and the position of the number on the line.
    If N is the number of numbers and P is the position of the number (counted from 1), then the weight W is calculated as: W = 1.5 - (P-1)/(N-1)
    A more generic expression for the sliding weighting scale, where B is the beginning weight and E is the ending weight: W = B - (B-E)*(P-1)/(N-1)
  8. Change exercise 2 in the following way: Just find the 10 lines in the input file with the highest combined scores. This should be easier than the original exercise.
  9. This is the same exercise as the previous one (ex 8), but imagine that the input file is enormous, so big that you cannot hold it in memory. You still need to solve the problem, which is done by keeping a running list of the best scores.
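
As a hedged illustration of two ideas from the list above, the sketch below shows one possible way to compute the sliding weight from exercise 7 and to keep a running list of the best lines as in exercise 9. It is not the course's reference solution. The function names (sliding_weight, weighted_score, top_lines) are hypothetical, the use of Python's heapq module for the running list is a choice made here rather than something the exercise prescribes, and returning B when N is 1 is an assumption, since the formula is undefined in that case.

```python
import heapq
from typing import Iterable, List, Tuple


def sliding_weight(position: int, count: int,
                   begin: float = 1.5, end: float = 0.5) -> float:
    """Weight W = B - (B-E)*(P-1)/(N-1), with P counted from 1."""
    if count == 1:
        # Assumption: the formula divides by zero for N = 1,
        # so fall back to the beginning weight.
        return begin
    return begin - (begin - end) * (position - 1) / (count - 1)


def weighted_score(scores: List[float]) -> float:
    """Combined score where each number is scaled by its sliding weight."""
    count = len(scores)
    return sum(score * sliding_weight(position, count)
               for position, score in enumerate(scores, start=1))


def top_lines(lines: Iterable[str], keep: int = 10) -> List[str]:
    """Return the `keep` lines with the highest combined score, high to low.

    Only `keep` lines are held in memory at any time, so `lines` can be
    an open file of any size.
    """
    # Min-heap of (combined score, negated line number, line).
    # The smallest retained score sits at best[0]; the negated line
    # number makes earlier lines win ties and keeps tuples comparable.
    best: List[Tuple[float, int, str]] = []
    for number, line in enumerate(lines):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        combined = sum(float(value) for value in fields[1:])
        entry = (combined, -number, line)
        if len(best) < keep:
            heapq.heappush(best, entry)
        elif entry > best[0]:
            # Replace the current worst of the retained lines.
            heapq.heapreplace(best, entry)
    return [line for _, _, line in sorted(best, reverse=True)]
```

Passing an open file object, for example top_lines(open("scores.txt")), reads the file one line at a time. A plain list that is re-sorted and trimmed to 10 entries after every insertion would also work as a running list; the heap only avoids repeated sorting.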

Exercises for extra practice