cfDNA exercise
Overview
First:
- Navigate to your home directory.
- Create a directory called "cfdna".
- Navigate to the directory you just created.
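The three steps above can be sketched as follows, assuming a Bash-like shell. The -p flag is an addition here so that mkdir does not fail if the directory already exists.

```shell
cd ~            # go to your home directory
mkdir -p cfdna  # create the directory (no error if it already exists)
cd cfdna        # move into the new directory
pwd             # the printed path should end in /cfdna
```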
Blood was drawn from two patients. One is healthy; the other has lung squamous cell carcinoma.
It was previously reported that tumors can leave a signature in the blood plasma via cell-free DNA, namely:
- The distribution of fragment lengths has a higher variance, in particular with more very short fragments.
- Tumor cfDNA (also called ctDNA) tends to carry more copy number variations (CNVs) than normal cfDNA. Note that in the context of cancer, CNVs caused by the tumor are sometimes called copy number alterations (CNAs).
Your goal is to determine which patient is which.
Insert size distribution
To speed things up, the data have already been trimmed and aligned:
/data/shared/exercises/cfdna/patient_1.bam
/data/shared/exercises/cfdna/patient_2.bam
We now have to determine which patient has the greater variance in DNA fragment length. Use the commands that you learned in the alignment exercises and plot the insert size distribution for each patient.
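Alongside the plot, a numeric summary can make the comparison easier. The sketch below defines a hypothetical helper, isize_stats (not part of any tool), that reads one fragment length per line and prints the count, mean and population standard deviation. The commented samtools line is an assumption about how the lengths could be extracted: it takes the TLEN column of properly paired first-in-pair reads. Adapt it to whatever the alignment exercises used.

```shell
# Hypothetical helper: summarise a stream of fragment lengths (one integer per line).
isize_stats() {
  awk '{ n++; s += $1; ss += $1 * $1 }
       END {
         if (n > 0) {
           m = s / n
           v = ss / n - m * m          # population variance
           if (v < 0) v = 0            # guard against rounding below zero
           printf "n=%d mean=%.1f sd=%.1f\n", n, m, sqrt(v)
         }
       }'
}

# Toy input, only to show the output format.
printf '%s\n' 160 170 165 | isize_stats   # prints: n=3 mean=165.0 sd=4.1

# Assumed usage on real data (requires samtools; flag 66 = proper pair + first in pair):
# samtools view -f 66 /data/shared/exercises/cfdna/patient_1.bam \
#   | awk '{ t = ($9 < 0) ? -$9 : $9; if (t > 0) print t }' | isize_stats
```

Running the same pipeline on both BAM files gives two standard deviations that can be compared directly.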
Q1
Which patient would you say has the greater variance in DNA fragment length?
Copy number variations
We will use a program called HMMcopy to infer CNVs. This method relies on the number of DNA fragments that align within windows of a fixed size (e.g., how many fragments align in every 100,000 bp of the genome).
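To make the idea of fixed-size windows concrete, here is a toy sketch of the binning. It is not how HMMcopy or its utilities are implemented; count_windows is a hypothetical helper that takes "chromosome position" pairs (1-based positions) and reports how many fall into each window.

```shell
# Hypothetical helper: count positions per fixed-size window.
# Usage: count_windows WINDOW_SIZE < "chromosome position" lines
count_windows() {
  awk -v w="$1" '{
         bin = int(($2 - 1) / w)          # 0-based window index
         key = $1 "\t" (bin * w + 1)      # chromosome and window start
         c[key]++
       }
       END { for (k in c) print k "\t" c[k] }' | sort -k1,1n -k2,2n
}

# Toy positions: two in the first window of chromosome 1, one in the second,
# and one in the third window of chromosome 2.
printf '%s\n' '1 50' '1 99999' '1 100001' '2 250000' | count_windows 100000
```

Each output line holds the chromosome, the window start and the count. A region with a copy number gain would be expected to show higher counts than its neighbours, and a loss lower counts.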
The following program:
readCounter
is a very simple utility that computes the number of fragments aligning in a series of genomic windows. Run it without any arguments to see the options. Use the following option:
-c 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22
This specifies the order of the chromosomes. For now, we only care about chromosomes 1 to 22. Count the number of fragments that align in windows of 100,000 base pairs for both patients. For each BAM file, redirect the output to a file. You may choose the file name, but please use the extension ".wig", as the output is in wiggle track format (WIG).
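One possible way to run this for both patients is sketched below. The -w flag for the window size is an assumption recalled from the HMMcopy utilities, so confirm it against the usage message that readCounter prints when run without arguments. The output names patient_1.wig and patient_2.wig are arbitrary choices.

```shell
chroms=$(seq 1 22 | paste -sd, -)   # builds "1,2,...,22" for the -c option
window=100000                       # window size in base pairs
bam_dir=/data/shared/exercises/cfdna

for patient in patient_1 patient_2; do
  bam="$bam_dir/$patient.bam"
  # Run only where the tool and the data are available.
  if command -v readCounter >/dev/null 2>&1 && [ -r "$bam" ]; then
    # -w is assumed to set the window size; check the tool's usage message.
    readCounter -w "$window" -c "$chroms" "$bam" > "$patient.wig"
  fi
done
```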
Q2
Inspect the format and determine how many genomic windows you have in total.
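As a hint on the format, the sketch below assumes the file is a fixedStep WIG: each chromosome starts with a header line beginning with "fixedStep", followed by one count per window. The toy file and its counts are made up for illustration, and count_wig_windows is a hypothetical helper.

```shell
# Hypothetical helper: count data lines (windows) in a fixedStep WIG file.
count_wig_windows() {
  grep -c '^[0-9]' "$1"   # data lines start with a digit, header lines do not
}

# Toy WIG file with invented counts: 3 windows on chromosome 1, 2 on chromosome 2.
toy=$(mktemp)
cat > "$toy" <<'EOF'
fixedStep chrom=1 start=1 step=100000 span=100000
12
15
9
fixedStep chrom=2 start=1 step=100000 span=100000
20
18
EOF

grep -c '^fixedStep' "$toy"   # number of chromosome blocks: 2
count_wig_windows "$toy"      # number of windows: 5
```

Use head on your own file first to check that it really follows this layout before trusting the count.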
We will now use the following custom R script to call copy number variations:
/data/shared/exercises/cfdna/plotCNV.R [INPUT WIG]
The input is the WIG file that you generated previously. The script produces three files:
- [INPUT]_bias.pdf which shows you the fragment count as a function of GC bias
- [INPUT]_correction.pdf which shows you the fragment count after correcting for GC bias
- [INPUT]_CNV.pdf which shows you the annotated copy number variations.
For now, the script only produces a plot for chromosome 6.
Q3
Which sample do you believe has a lot of copy number variations?
Q4
Based on your answers to questions 1 and 3, which patient has cancer?
Please find answers here
Congratulations, you have finished the exercise!