22100 - R for Bio Data Science

Learning Objectives I - As per DTU Course Base

A student who has met the objectives of the course will be able to:

Explain why reproducible data analysis is important, as well as identify relevant challenges and explain replicability versus reproducibility
Describe the components of a reproducible data analysis
Use Tidyverse R to perform exploratory data analysis (EDA) for data insights, including using ggplot to visualize multilayer data from e.g. high-throughput -omics platforms
Use Tidyverse R to perform data cleansing, transformation, visualization and communication
Use RStudio and GitHub for collaborative analysis projects
Perform and interpret standard dimension reduction and clustering techniques, as well as basic statistical tests and models

Learning Objectives II - As per DTU Course Base

A student who has met the objectives of the course will be able to:

Train and apply a machine learning model based on a neural network with Keras / Tensorflow in R
Prepare a dynamic rmarkdown report / presentation for a bio data analysis
Prepare a simple R package
Prepare a simple shiny app
Design and execute a bio data science project focusing on reproducibility
Analyze an already performed bio data science project with a view to assessing methods and reproducibility

Project Groups

Students are responsible for forming groups of 4 students
Enter groups using this Google Sheet
On Slack, there is a channel called project_groups, where students are encouraged to connect with other students with similar interests
All group members are responsible for all parts of the project

Project description

Aim

Using the tools you have learned in the course, the aim is to “Design and execute a collaborative bio data science project focusing on reproducibility”, based on the data science cycle below

Project Organisation Overview

Note: since we are working with small data sets, we will put the entire project on GitHub

Project Requirements - Code

Be organised according to the previous slide on “Project Organisation”, including using GitHub (each group must create a GitHub repository named 2020_groupXX in the course organisation)
Be written in tidyverse R as taught in class, i.e. do not use base R tools unless no tidyverse equivalent exists
Follow the tidyverse style guide
Contain the entire data science cycle
Be executable end-to-end by running 00_doit.R
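As a minimal sketch of what “executable end-to-end” can look like, assuming the numbered analysis scripts sit in an R/ folder and are named with a two-digit prefix (e.g. 01_load.R, 02_clean.R; these file names and the helper run_pipeline() are hypothetical, not prescribed by the course), 00_doit.R could discover and run every step in order. It uses only base R so the driver itself has no package dependencies.

```r
# 00_doit.R: hypothetical driver that runs the whole project end to end.
# Assumption: steps live in R/ and are named 01_xxx.R, 02_xxx.R, ...,
# so that alphabetical order equals execution order.

run_pipeline <- function(script_dir = "R") {
  # Find numbered scripts such as "01_load.R" (returns character(0) if none)
  scripts <- list.files(path = script_dir,
                        pattern = "^[0-9]{2}_.*\\.R$",
                        full.names = TRUE)
  # Skip the driver itself and make the execution order explicit
  scripts <- sort(scripts[basename(scripts) != "00_doit.R"])
  for (script in scripts) {
    message("Running ", script)
    # Each step runs in a fresh environment, so steps should hand results
    # to each other through files (e.g. in a data/ folder), not variables
    source(script, local = new.env())
  }
  invisible(scripts)
}

run_pipeline()
```

Running each step in its own environment is a design choice: it forces every script to read its inputs from disk, so any single step can also be re-run on its own.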

Project Requirements - Data

Be based on a real bio data set
Start out “dirty”, i.e. a completely clean and analysis-ready data set will not allow you to demonstrate that you have met the course learning objectives
Be of limited size, so we can use GitHub for all parts of the project
You can find several suggested data sources in the collection of additional sources on the course site
Note: you should demonstrate the ability to extract biological insights, but keep in mind that the focus should be on demonstrating that you master the data science toolbox

Project Requirements - Presentation

Follow the IMRAD standard scientific structure:
- Introduction
- Materials and Methods
- Results (And)
- Discussion
Have a technical focus, while still communicating whichever biological insights you arrived at
Not include all of your code (we will look at the code during the individual examinations), but rather focus on the broader picture of what you did, including data summaries and visualisations
Be created using the rmarkdown ioslides_presentation format (i.e. the right-most doc column in the project organisation will be an rmarkdown-based presentation)
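For illustration, the YAML header at the top of such an .Rmd presentation file might look like the following; the title and author are placeholders, and widescreen is just one optional ioslides_presentation setting. The file can then be rendered with rmarkdown::render(), for instance as the last step called from 00_doit.R.

```yaml
---
title: "2020_groupXX: Project title"
author: "Group XX"
output:
  ioslides_presentation:
    widescreen: true
---
```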

Exam details - From the DTU Course Base

Date of examination

Type of assessment

Oral examination and reports. In groups, students prepare a bio data science project, which forms the basis for the exam. Handing in the project on time is a prerequisite for attending the exam. The oral exam consists of a group presentation of the project, followed by an individual examination on the project and on the general course learning objectives.

Aid

No Aid

Evaluation

7-step scale, internal examiner

Elaboration

Time

May 14th (7 groups) and May 15th (3 groups) by random draw
Please be advised that the exam may be conducted remotely. In that case, I will of course provide guidelines and a framework to facilitate a smooth exam.

At the Exam

Group Examination

Groups present their project for 15 minutes, including 1-2 general questions
Each group member is then individually examined on the project and the general course learning objectives for ~10 minutes
In the individual exam, we will take a closer look at the code underlying the project in the associated GitHub repository
The total expected time per group is therefore ~1 h
Grades are given after the entire group has been examined
Important: ALL group members are responsible for ALL parts of the project

Questions?