A few initial questions

Remember, you can get help on any R function by typing e.g. ?str_c in the console

  • Q1: What are primary, foreign and surrogate keys?
  • Q2: Discuss with your desk buddy and write notes in your R Markdown document, defining in your own words what inner, left, right, full, semi and anti joins do
  • Q3: What would the output be of running: inner_join(), left_join(), right_join(), full_join(), semi_join() and anti_join() on x and y as defined below? (Discuss before running any code!)
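# tribble() comes from the tibble package, which is loaded with the tidyverse
library("tidyverse")

# Two small tables sharing the column 'key'
x <- tribble(
  ~key, ~val_x,
     1, "x1",
     2, "x2",
     3, "x3"
)
y <- tribble(
  ~key, ~val_y,
     1, "y1",
     2, "y2",
     4, "y3"
)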
  • Q4: Use the base function sample() in conjunction with str_c() to create a function that returns a random DNA string of length n. Run the function with n = 100 and save the output to my_dna - What fraction of the DNA you created is adenine?
  • Q5: Use the appropriate str_* function to change my_dna to my_rna - How many start codons did you get?
  • Q6: Extract the first 3 and the last 3 nucleotides of my_dna and save them in separate variables - Which amino acids do they encode? (Do not translate the entire sequence, just look the codons up using Google)
  • Q7: Look at my_dna and randomly choose 3 consecutive nucleotides forming a codon, hardcode them in a variable and then split my_dna on that codon - What is returned?
  • Q8: Discuss with your desk buddy and write notes on what a factor is and what it means that a factor has levels
  • Q9: Run the code factor(LETTERS) and factor(rep(LETTERS, 10)), inspect the output and discuss with your desk buddy what is going on
  • Q10: Run the code factor(rev(LETTERS)), factor(rev(LETTERS), levels = LETTERS) and factor(rev(LETTERS), levels = rev(LETTERS)), inspect the output and discuss with your desk buddy what is going on
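
As a starting point for the factor questions above, here is a minimal sketch of how levels can be inspected. It deliberately uses a toy vector (sizes, a made-up name) rather than the LETTERS examples, so the discussion in Q9 and Q10 is still yours to have.

```r
# Toy data, deliberately different from the LETTERS examples in Q9 and Q10
sizes <- c("small", "large", "medium", "small", "large")

# Without a 'levels' argument, the levels are the sorted unique values
f_default <- factor(sizes)
levels(f_default)      # "large" "medium" "small"
as.integer(f_default)  # 3 1 2 3 1 (each value stored as the index of its level)

# With explicit levels, the values are the same but the level order changes
f_custom <- factor(sizes, levels = c("small", "medium", "large"))
levels(f_custom)       # "small" "medium" "large"
as.integer(f_custom)   # 1 3 2 1 3

# table() counts the observations per level, in level order
table(f_custom)
```

The functions levels(), nlevels() and as.integer() are handy whenever you want to see what a factor stores behind the printed labels.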

Working with joins (Relational data)

Download this data I prepared to your computer, then upload it to the data folder in your RStudio Cloud session.

  • Q11: Which file types are in the uploaded zip file?
  • Q12: The tidyverse package for reading files is called readr. Which function reads the file type(s) you uploaded? (Hint: Try typing readr::r and then hit the TAB key in the console)
  • Q13: Read all the files into separate variables and use the appropriate *_join() function(s) to re-create the diabetes data from last session
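
Joining more than two tables can be done by chaining joins, one after the other. The sketch below illustrates the idea on three hypothetical tibbles (patients, lipids_a and lipids_b) with made-up values. The column names are borrowed from the diabetes variables, but the way the real files are split up is for you to find out, so treat this as an assumption-laden illustration rather than the solution.

```r
library("tibble")
library("dplyr")

# Hypothetical tables with made-up values, all sharing the key column 'id'
patients <- tribble(
  ~id, ~age,
    1,   46,
    2,   29,
    3,   58
)
lipids_a <- tribble(
  ~id, ~chol,
    1,   203,
    2,   165,
    3,   228
)
lipids_b <- tribble(
  ~id, ~hdl,
    1,   56,
    2,   24,
    3,   37
)

# Chain the joins: each step adds the columns of one more table
combined <- patients %>%
  left_join(lipids_a, by = "id") %>%
  left_join(lipids_b, by = "id")
combined
```

Stating the key explicitly with by = "id" avoids relying on dplyr guessing the shared columns, which makes the code easier to read and to debug.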

The original data had the following variables:

id chol stab.glu hdl ratio glyhb location age gender height weight frame bp.1s bp.1d bp.2s bp.2d waist hip

Working with long/wide data

  • Q14: Take a look at the data you have re-created - Is it wide or long data?

Last session we worked with select(). This function has derivatives, one of which is select_if().

  • Q15: Try to google dplyr::select_if() - What does this function do?
  • Q16: Using the diabetes data you re-created previously, subset to id and the numeric variables, then convert to long format keeping the id variable (saving into the variable diabetes_data_long) and re-create the faceted plot below - What is on the x-axis and what is on the y-axis?

  • Q17: Take a good look at the plot - Can you come up with a better way of representing the data recorded for each numerical variable?

  • Q18: Use the original diabetes_data to join the gender variable back onto the long data you created and redo the plot, colouring by gender

  • Q19: Could you have included the gender variable when you originally created the long data and, if so, how?
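
The wide-to-long workflow in the questions above can be sketched on a toy table. Everything here is an assumption for illustration: the tibble toy and its values are made up, and the plot type (points per id) is a guess and not necessarily the plot you are asked to re-create.

```r
library("tibble")
library("dplyr")
library("tidyr")
library("ggplot2")

# Hypothetical wide data with made-up values
toy <- tribble(
  ~id, ~gender,  ~chol, ~hdl,
    1, "female",   203,   56,
    2, "male",     165,   24,
    3, "female",   228,   37
)

# Keep only the numeric columns (id is numeric, so it is kept as well).
# select_if() still works; select(where(is.numeric)) is the newer equivalent.
toy_numeric <- toy %>%
  select_if(is.numeric)

# Convert to long format: one row per id and variable
toy_long <- toy_numeric %>%
  pivot_longer(cols = -id, names_to = "variable", values_to = "value")
toy_long

# One panel per variable, each with its own y-axis range
p <- ggplot(toy_long, aes(x = id, y = value)) +
  geom_point() +
  facet_wrap(vars(variable), scales = "free_y")
p
```

The key design choice is cols = -id, which tells pivot_longer() to gather every column except id, so that id keeps identifying each observation in the long data.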