22100 - R for Bio Data Science

A few initial questions

Remember, you can get help on any R function by typing e.g. ?str_c in the console

Q1: What are primary, foreign and surrogate keys?
Q2: Discuss with your desk buddy and write notes in your R Markdown document defining, in your own words, what inner, left, right, full, semi and anti joins do
Q3: What would the output be of running: inner_join(), left_join(), right_join(), full_join(), semi_join() and anti_join() on x and y as defined below? (Discuss before running any code!)

x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)
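For comparison once you have discussed Q3, here is a minimal sketch of the six joins on two different toy tables. The tables `patients` and `visits` and their contents are hypothetical and made up for illustration; they are not part of the course data.

```r
library(dplyr)
library(tibble)

# Hypothetical toy tables (deliberately not the x and y from Q3)
patients <- tribble(
  ~id, ~name,
    1, "Ada",
    2, "Bo",
    3, "Cy"
)
visits <- tribble(
  ~id, ~clinic,
    2, "North",
    3, "South",
    5, "East"
)

# Mutating joins: add the columns of `visits` to `patients`
inner <- inner_join(patients, visits, by = "id")  # ids found in both tables
left  <- left_join(patients, visits, by = "id")   # every id from patients
right <- right_join(patients, visits, by = "id")  # every id from visits
full  <- full_join(patients, visits, by = "id")   # ids from either table

# Filtering joins: keep or drop rows of `patients`, adding no columns
semi <- semi_join(patients, visits, by = "id")    # patients with a visit
anti <- anti_join(patients, visits, by = "id")    # patients without a visit
```

Comparing nrow() and ncol() of each result is a quick way to see which joins add rows, which add columns and which only filter.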

Q4: Use the base function sample() in conjunction with str_c() to create a function that returns a random DNA string of length n. Run the function with n = 100 and save the output to my_dna - What fraction of the DNA you created is adenine?
Q5: Use the appropriate str_* function to change my_dna to my_rna - How many start codons did you get?
Q6: Extract the first 3 and the last 3 nucleotides of my_dna and save them in separate variables - Which amino acids do they encode? (Do not translate the entire sequence, just look the codons up using Google)
Q7: Look at my_dna and randomly choose 3 nucleotides forming a codon, hardcode them to a variable and now, split my_dna on those - What is returned?
Q8: Discuss with your desk buddy and write notes on what a factor is and what it means that a factor has levels
Q9: Run the code factor(LETTERS) and factor(rep(LETTERS, 10)), inspect the output and discuss with your desk buddy what is going on
Q10: Run the code factor(rev(LETTERS)), factor(rev(LETTERS), levels = LETTERS) and factor(rev(LETTERS), levels = rev(LETTERS)), inspect the output and discuss with your desk buddy what is going on
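The string and factor tools used in Q4 to Q10 can be tried out on small, hand-made inputs first. The sketch below is only one way to go about it: the helper `random_string()`, its `alphabet` argument, the sequence `seq_dna` and the factor `sizes` are hypothetical names and values chosen for illustration.

```r
library(stringr)

# Collapse a sampled character vector into a single string.
# `random_string()` and `alphabet` are hypothetical names.
random_string <- function(n, alphabet) {
  str_c(sample(alphabet, size = n, replace = TRUE), collapse = "")
}
set.seed(1)  # make the random draw reproducible
s <- random_string(10, c("a", "b"))

# A short fixed sequence, written by hand for illustration
seq_dna <- "ATGGCATGA"
frac_a  <- str_count(seq_dna, "A") / str_length(seq_dna)  # fraction of "A"
seq_rna <- str_replace_all(seq_dna, "T", "U")             # swap every T for U
first3  <- str_sub(seq_dna, 1, 3)                         # first three characters
last3   <- str_sub(seq_dna, -3, -1)                       # last three characters
pieces  <- str_split(seq_dna, "GCA")                      # list of pieces around "GCA"

# Factors: the levels argument fixes the set and order of categories
sizes <- factor(c("low", "high", "low"), levels = c("low", "high"))
levels(sizes)      # level order as given, not alphabetical
as.integer(sizes)  # integer codes pointing into the levels
```

Note that str_split() returns a list with one element per input string, so the pieces of a single string sit in pieces[[1]].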

Download this data I prepared to your computer, then upload it to the data folder in your RStudio Cloud session.

Q11: Which file types are in the uploaded zip file?
Q12: The tidyverse package for reading files is called readr. Which function reads the file types you uploaded? (Hint: Try typing readr::r and then hit the TAB key in the console)
Q13: Read all the files into separate variables and use the appropriate *_join() function(s) to re-create the diabetes data from last session
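The read-then-join pattern from Q12 and Q13 might look roughly like the sketch below. Everything specific in it is an assumption: the file names `part_a.csv` and `part_b.tsv`, the file types, the columns and the choice of full_join() are made up so the example can run on its own, and the uploaded course files will differ.

```r
library(readr)
library(dplyr)

# Write two small hypothetical files to a temporary folder so that the
# example is runnable anywhere; these are not the course files
tmp <- tempdir()
write_csv(tibble::tibble(id = 1:3, age = c(34, 51, 47)),
          file.path(tmp, "part_a.csv"))
write_tsv(tibble::tibble(id = c(1, 2, 4), chol = c(203, 165, 228)),
          file.path(tmp, "part_b.tsv"))

# Pick the reader that matches each file type
part_a <- read_csv(file.path(tmp, "part_a.csv"), show_col_types = FALSE)
part_b <- read_tsv(file.path(tmp, "part_b.tsv"), show_col_types = FALSE)

# Combine on the shared key; full_join() keeps ids from either file
combined <- full_join(part_a, part_b, by = "id")
```

Which join is appropriate for the real files depends on whether every id appears in every file, so it is worth checking that before choosing.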

The original data had the following variables:

id chol stab.glu hdl ratio glyhb location age gender height weight frame bp.1s bp.1d bp.2s bp.2d waist hip

Q14: Take a look at the data you have re-created - Is it wide or long data?

Last session we worked with select(). This function has derivatives, one of which is select_if().

Q15: Try to google dplyr::select_if() - What does this function do?
Q16: Using the diabetes data you re-created previously, subset to id and the numeric variables, then convert to long format keeping the id variable (save the result in the variable diabetes_data_long) and re-create the faceted plot below - What is on the x-axis and what is on the y-axis?
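The subset, reshape and facet steps can be sketched on a small stand-in table. The table `toy` and its values are hypothetical, and the geom and axis mapping in the plot are a guess, since the intended plot is not reproduced here.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the re-created diabetes data
toy <- tibble::tibble(
  id     = 1:4,
  gender = c("female", "male", "female", "male"),
  chol   = c(203, 165, 228, 78),
  age    = c(46, 29, 58, 67)
)

# Keep only the numeric columns (id is numeric here, so it is kept too)
toy_num <- select_if(toy, is.numeric)
# Equivalent in newer dplyr versions: select(toy, where(is.numeric))

# Long format: one row per combination of id and variable
toy_long <- pivot_longer(toy_num, cols = -id,
                         names_to = "variable", values_to = "value")

# One panel per variable; geom and aesthetics are assumptions
p <- ggplot(toy_long, aes(x = id, y = value)) +
  geom_point() +
  facet_wrap(vars(variable), scales = "free_y")
```

Printing p draws the plot. Free y-scales are used here because the variables are measured in different units.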

Q17: Take a good look at the plot - Can you come up with a better way of representing the data recorded for each numerical variable?
Q18: Use the original diabetes_data to join back the gender to the long data you created and redo the plot, colouring for gender
Q19: Could you have included the gender variable, when you originally created the long data and if so how?
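For Q18 and Q19, one possible approach is sketched below on a hypothetical stand-in table `toy`; the names and values are made up for illustration. It shows two routes to long data that carries a categorical column: joining it back afterwards, or keeping it out of the pivot from the start.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the re-created diabetes data
toy <- tibble::tibble(
  id     = 1:4,
  gender = c("female", "male", "female", "male"),
  chol   = c(203, 165, 228, 78),
  age    = c(46, 29, 58, 67)
)

# Long data built from id and the numeric columns only
toy_long <- toy %>%
  select(id, where(is.numeric)) %>%
  pivot_longer(cols = -id, names_to = "variable", values_to = "value")

# Route 1: join gender back on afterwards, using id as the key
toy_long_joined <- left_join(toy_long, select(toy, id, gender), by = "id")

# Route 2: keep gender alongside id and exclude both from the pivot
toy_long_direct <- toy %>%
  select(id, gender, where(is.numeric)) %>%
  pivot_longer(cols = c(-id, -gender),
               names_to = "variable", values_to = "value")
```

Both routes give the same rows here; only the column order differs. In a ggplot2 call, gender can then be mapped with aes(colour = gender).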