22100 - Course Programme Spring 2020

Please note - This is the FIRST time the course runs, so the page is being created and updated on-the-fly, i.e. the following is subject to change without notice!

Welcome to the spring 2020 version of R for Bio Data Science! Below you will find some basic information on the course and the complete course schedule. Please note: The course is scheduled for block F1A, i.e. Mondays 8-12.

Information for Course Participants

Course Responsible and Teacher

Course Communication

Course Format

  • Classes will be taught using RStudio Cloud, which is free. Students must sign up for an account
  • Classes will be a mixture of lectures and group work
  • Most of the group work will consist of computer exercises, so students are required to bring their own laptop
  • All learning resources will be open and available through DTU Inside or this site
  • Expected time usage: 1 ECTS point equals approx. 28 hours. This translates to an expected time usage of ~9-10 hours/week for a 5 ECTS 13-week course with 1 exam day and preparation. You will spend 4h in class per week and should therefore expect 5-6h of preparation per week.
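
The workload estimate above can be reproduced with a few lines of R. This is only a rough sketch: the 10-20 hours set aside for the exam day and its preparation is an assumption made here for illustration, not a figure from the course description.

```r
# Workload estimate for a 5 ECTS course taught over 13 weeks
ects_points    <- 5
hours_per_ects <- 28   # approximate value stated above
teaching_weeks <- 13
in_class_hours <- 4    # Mondays 8-12

total_hours <- ects_points * hours_per_ects

# ASSUMPTION (illustrative only): 10 to 20 hours are spent on
# the exam day and its preparation
exam_hours <- c(low = 10, high = 20)

# Hours left for the teaching weeks, spread evenly over 13 weeks
hours_per_week <- (total_hours - exam_hours) / teaching_weeks

# Subtract the in-class time to get weekly preparation time
prep_per_week <- hours_per_week - in_class_hours

round(hours_per_week, 1)
round(prep_per_week, 1)
```

Under this assumption the weekly load lands in the ~9-10 hour range and the preparation time in the 5-6 hour range quoted above.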

Course Project Work and Exam

Course Resources

General Daily Schedule

  • 08.00 - 09.00 Recap of subject covered the prior week and introduction to topic of the day
  • 09.00 - 12.00 Exercises

Location

  • Building and room: Building 208, room 903 (In via 208, down the stairs, through the glass doors on your left and then look for the room on your right)
  • Should you be new to DTU, a map of DTU Lyngby Campus is available here

2020 Course Schedule Overview

The Academic calendar sets the 13-week period for spring 2020 to 3/2 2020 - 12/5 2020, excluding the following holidays and non-teaching study breaks (all dates included):

  • Easter holiday: 6/4 2020 - 13/4 2020
  • St. Bededag (Danish national holiday): 8/5 2020
  • Ascension Day: 21/5 2020 - 22/5 2020
  • Whitsun holiday: 1/6 2020
  • Constitution Day: 5/6 2020.

W01 - Monday Feb 3rd: Course Introduction and The very basics of R

Package(s) Schedule Learning Materials Session Learning Objectives
  • base R

A student who has met the objectives of the session will be able to:

  • Explain why reproducible data analysis is important, as well as identify relevant challenges and explain replicability versus reproducibility
  • Describe the components of a reproducible data analysis
  • Create an RStudio Cloud account and run cloud-based sessions
  • Master the very basics of R
  • Navigate the RStudio IDE
  • Create, edit and run a basic RMarkdown document

W02 - Monday Feb 10th: Data Visualisation

Package(s) Schedule Learning Materials Session Learning Objectives

A student who has met the objectives of the session will be able to:

  • Use ggplot to visualize multilayer data from e.g. high-throughput -omics platforms
  • Decipher the components of a ggplot
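
As an illustration of the components of a ggplot, here is a minimal annotated sketch. It assumes the ggplot2 package is installed; the data frame `expr` and its columns are made up for this example and are not part of the course materials.

```r
library(ggplot2)

# Hypothetical expression data, made up for illustration
expr <- data.frame(
  gene_a    = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.4),
  gene_b    = c(0.9, 2.1, 2.8, 5.1, 4.7, 6.9),
  condition = rep(c("control", "treated"), each = 3)
)

# Data and aesthetic mappings go into ggplot(); layers are added with +
p <- ggplot(data = expr,
            mapping = aes(x = gene_a, y = gene_b, colour = condition)) +
  geom_point() +                                   # geometry layer
  labs(x = "Gene A expression",
       y = "Gene B expression") +                  # axis labels
  theme_minimal()                                  # overall theme

# Printing the object draws the plot
p
```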

W03 - Monday Feb 17th: Data manipulation I: The 6 basic verbs

Package(s) Schedule Learning Materials Session Learning Objectives

A student who has met the objectives of the session will be able to:

  • Understand and apply the 6 basic dplyr verbs filter(), arrange(), select(), mutate(), summarise() and group_by()
  • Understand and apply the additional verbs count(), drop_na(), View()
  • Combine dplyr verbs to form a data manipulation pipeline using the pipe %>% operator
  • Decipher the components of a dplyr pipeline and their functions
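
A pipeline combining the verbs listed above might look like the following sketch. It assumes the dplyr and tidyr packages are installed; the tibble `samples` and its values are made up for illustration.

```r
library(dplyr)
library(tidyr)   # drop_na() is provided by tidyr

# Hypothetical measurements, made up for illustration
samples <- tibble(
  sample_id  = c("s1", "s2", "s3", "s4", "s5", "s6"),
  tissue     = c("liver", "liver", "lung", "lung", "lung", "liver"),
  expression = c(10.5, 12.1, NA, 8.4, 9.6, 11.0),
  batch      = c(1, 1, 2, 2, 2, 1)
)

# count() tallies the rows in each group
tissue_counts <- samples %>% count(tissue)

tissue_summary <- samples %>%
  drop_na(expression) %>%                        # remove rows with missing values
  filter(expression > 9) %>%                     # keep rows matching a condition
  select(sample_id, tissue, expression) %>%      # keep only these columns
  mutate(log2_expr = log2(expression)) %>%       # add a derived column
  group_by(tissue) %>%                           # define the groups
  summarise(n = n(),
            mean_log2_expr = mean(log2_expr)) %>%  # one row per group
  arrange(desc(mean_log2_expr))                  # sort the result

tissue_summary
```

Each step takes the output of the previous step as its first argument, which is what lets the verbs be read top to bottom as a sequence of operations.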

W04 - Monday Feb 24th: Data Manipulation II: Long and wide data, joins, strings and factors

Package(s) Schedule Learning Materials Session Learning Objectives

(These session materials contain repetition; this is intentional)

A student who has met the objectives of the session will be able to:

  • Understand and apply the various str_*() functions for string manipulation
  • Understand and apply the family of *_join() functions for combining data sets
  • Understand and apply pivot_wider() and pivot_longer()
  • Use factors in conjunction with plotting categorical data using ggplot
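
The functions named above can be combined as in this short sketch. It assumes the dplyr, tidyr and stringr packages are installed; the tables `wide` and `annotation` and their contents are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical wide-format data: one column per sample
wide <- tibble(
  gene    = c("BRCA1", "TP53"),
  ctrl_1  = c(5.1, 7.2),
  treat_1 = c(6.3, 9.8)
)

# Hypothetical annotation table to join on
annotation <- tibble(
  gene       = c("BRCA1", "TP53"),
  chromosome = c("chr17", "chr17")
)

long <- wide %>%
  pivot_longer(cols = -gene,
               names_to = "sample",
               values_to = "expression") %>%       # wide -> long
  mutate(condition = str_remove(sample, "_\\d+$"),   # strip the replicate suffix
         condition = factor(condition,
                            levels = c("ctrl", "treat"))) %>%  # fix the category order
  left_join(annotation, by = "gene")                 # add the annotation columns

# Long -> wide again
wide_again <- long %>%
  select(gene, sample, expression) %>%
  pivot_wider(names_from = sample, values_from = expression)

long
```

Setting the factor levels explicitly controls the order in which the categories appear on a ggplot axis or legend.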

W05 - Monday Mar 2nd: Modelling, dimension reduction and clustering

Package(s) Schedule Learning Materials Session Learning Objectives

A student who has met the objectives of the session will be able to:

  • Understand and apply simple map() functions for element-wise function application
  • Understand and apply grouped supervised models to form nested model objects
  • Understand and apply the tidy() function for tidying various model objects
  • Perform a principal component analysis for dimension reduction of high-dimensional data
  • Perform an unsupervised k-means clustering of high-dimensional data
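
The last two objectives can be sketched using only base R. This example uses the built-in iris data set, which has just four numeric variables and is therefore a stand-in for high-dimensional data; the choice of three clusters and of clustering on the first two principal components is an assumption made for illustration.

```r
# Built-in iris data: four numeric measurements per flower
measurements <- iris[, 1:4]

# Principal component analysis on centred and scaled variables
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each principal component
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)

# k-means starts from random centres, so fix the seed for reproducibility
set.seed(42)

# k-means clustering on the first two principal components
clusters <- kmeans(pca$x[, 1:2], centers = 3, nstart = 25)

# Compare the cluster assignments with the known species labels
table(cluster = clusters$cluster, species = iris$Species)
```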

W06 - Monday Mar 9th: Scripting in a Reproducible and Collaborative Framework using GitHub via RStudio

Package(s) Schedule Learning Materials Session Learning Objectives

A student who has met the objectives of the session will be able to:

  • Explain why reproducible data analysis is important, as well as identify relevant challenges and explain replicability versus reproducibility
  • Describe the components of a reproducible data analysis
  • Use RStudio and GitHub for collaborative analysis projects

W07 - Monday Mar 16th: Artificial Neural Networks using Keras / TensorFlow in R

Package(s) Schedule Learning Materials Session Learning Objectives
  • 08.00 - 08.10 Remote teaching setup and brief recap of git exercises
  • 08.10 - 08.15 Brief talk: Introduction to Artificial Neural Networks
  • 08.25 - 08.50 Exercise: Prototyping an ANN in R
  • 08.50 - 08.55 Brief talk: Introduction to TensorFlow/Keras in R 1
  • 08.55 - 09.15 Exercise: TensorFlow Playground
  • 09.15 - 09.30 Brief talk: Introduction to TensorFlow/Keras in R 2
  • 09.30 - 09.40 Brief talk: Session 1 Summary and Q&A
  • 09.40 - 10.00 Coffee Break / Time buffer
  • 10.00 - 10.30 Exercise: Hello Keras (Classification)
  • 10.30 - 10.45 Brief talk: A bit more on Keras
  • 10.45 - 11.15 Exercise: Predicting Price (regression)
  • 11.15 - 11.45 Exercise: Deep Learning for Cancer Immunotherapy
  • 11.45 - 12.00 Brief talk: Session 2 Summary and Q&A

A student who has met the objectives of the session will be able to:

  • Train and apply a simple machine learning model based on a neural network with Keras / TensorFlow in R

W08 - Monday Mar 23rd: Creating a simple R-package

Package(s) Schedule Learning Materials Session Learning Objectives
  • Self-study: Using the site Developing Packages with RStudio, spend time equivalent to your preparation and in-class time to study how to create a simple R package
  • Remember, there is a lot of material available online. A quick Google search revealed this little example
  • Create a blank RStudio Cloud Project to use for studying the subject
  • Use your group Slack channel groupXX to pose and answer questions and interact with your study group. I will monitor these channels ~8-18 on the session date.
  • You are free to choose which function(s) you want to wrap in a package. A suggestion could be to create a set of functions to work programmatically with DNA. Perhaps you want to be able to transcribe, reverse, translate, etc.?
  • Look into including data in your package. Perhaps you want your users to be able to access the BLOSUM62 matrix?
  • Remember to not only create the functions, but also to write the accompanying documentation, so that users can get help by typing, as per usual, ?your_function_name in the console
  • None

A student who has met the objectives of the session will be able to:

  • Prepare a simple R package for distributing documented functions
  • Use relevant online resources to independently obtain new knowledge of R and expand on existing knowledge
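
As a starting point for the DNA suggestion above, here are two documented functions of the kind that could go into the R/ directory of a package. This is a sketch only: the function names are hypothetical, and it assumes the roxygen2 comment style is used to generate the help pages (for example via devtools::document()), which is one common approach and not prescribed by the course.

```r
#' Reverse complement of a DNA sequence
#'
#' @param dna A character string containing only A, C, G and T.
#' @return The reverse complement as a character string.
#' @export
reverse_complement <- function(dna) {
  # Complement each base, then reverse the order of the bases
  complement <- chartr("ACGT", "TGCA", toupper(dna))
  paste(rev(strsplit(complement, "")[[1]]), collapse = "")
}

#' Transcribe a DNA sequence to RNA
#'
#' @param dna A character string containing only A, C, G and T.
#' @return The RNA sequence, with T replaced by U.
#' @export
transcribe <- function(dna) {
  chartr("T", "U", toupper(dna))
}

reverse_complement("ATGC")   # "GCAT"
transcribe("ATGC")           # "AUGC"
```

Once the package is built with documentation generated from these comments, ?reverse_complement shows the corresponding help page.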

W09 - Monday Mar 30th: Creating a simple Shiny application

Package(s) Schedule Learning Materials Session Learning Objectives
  • Self-study: Using the book Mastering Shiny by Hadley Wickham, spend time equivalent to your preparation and in-class time to study how to create a simple Shiny application
  • Here is a nice primer on Shiny basics
  • Briefly on shiny: Think of shiny as a way to connect your data to a pointy-clicky interface, so that non-data users may interact with the data
  • Create a blank RStudio Cloud Project to use for studying the subject
  • Use your group Slack channel groupXX to pose and answer questions and interact with your study group. I will monitor these channels ~8-18 on the session date.
  • You are free to choose what you want to present using your shiny app. You could continue working with the package of DNA functions from last week. If you are interested in sequence logos, I can recommend looking into ggseqlogo
  • Investigate how you can use shinyapps.io to publish your shiny app - Here is a small example of a shiny app, that I have created
  • Your end-product for the day is a simple functional shiny server published on shinyapps.io - Send me the link to the server in a personal Slack message. If circumstances do not allow you to finish, then that is fine, but do try to see if you can get it working.
  • None

A student who has met the objectives of the session will be able to:

  • Prepare a simple shiny application for distributing interactive data exploration
  • Use relevant online resources to independently obtain new knowledge of R and expand on existing knowledge
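
A minimal Shiny application has a user interface and a server function. The sketch below assumes the shiny package is installed and uses a DNA-to-RNA transcription as a made-up example; the input and output IDs are hypothetical.

```r
library(shiny)

# User interface: one text input and one text output
ui <- fluidPage(
  titlePanel("DNA transcription"),
  textInput(inputId = "dna", label = "DNA sequence", value = "ATGC"),
  textOutput(outputId = "rna")
)

# Server logic: the output is recomputed whenever the input changes
server <- function(input, output) {
  output$rna <- renderText({
    chartr("T", "U", toupper(input$dna))
  })
}

# Combine the user interface and the server into an app object
app <- shinyApp(ui = ui, server = server)

# Start the app locally, but only in an interactive session
if (interactive()) {
  runApp(app)
}
```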

Wnn - Monday Apr 6th: Happy Easter Holidays

Wnn - Monday Apr 13th: Happy Easter Holidays

W10-11-12-13 - Monday Apr 20th, Apr 27th, May 4th, May 11th: Project Work

  • Description of Project and Exam
  • Add your groups here
  • Now is the time to put everything you learned to use
  • In groups of 4 students (remember you have to form these yourself), you are to prepare a project (see the description above)
  • Every Monday, each group will have a project-supervision meeting with me according to the schedule below
  • This year, due to the situation, each meeting will take place using Skype (my Skype ID is: jessenleon)
  • It's a tight schedule and each group has ~20 minutes, so in the groups, be sure to prepare any questions you may have prior to the meeting

Time and Group

  • 08.00 - 08.19 Group 1
  • 08.20 - 08.39 Group 2
  • 08.40 - 08.59 Group 3
  • 09.00 - 09.19 Group 4
  • 09.20 - 09.39 Group 5
  • 09.40 - 09.59 Break
  • 10.00 - 10.19 Group 6
  • 10.20 - 10.39 Group 7
  • 10.40 - 10.59 Group 8
  • 11.00 - 11.19 Group 9
  • 11.20 - 11.39 Group 10

Wnn - Thursday May 14th and Friday May 15th: Exam Day

Description of Project and Exam

2020 Exam Schedule

Thursday May 14th (Ordinary Spring F1A)

  • 09.00 - 10.00 Group 8
  • 10.00 - 11.00 Group 4
  • 11.00 - 12.00 Group 10
  • 12.00 - 13.00 Lunch break
  • 13.00 - 14.00 Group 5
  • 14.00 - 15.00 Group 6

Friday May 15th (Extra Exam Day)

  • 09.00 - 10.00 Group 1
  • 10.00 - 11.00 Group 3
  • 11.00 - 12.00 Group 9
  • 12.00 - 13.00 Lunch break
  • 13.00 - 14.00 Group 7
  • 14.00 - 15.00 Group 2