Learning Objectives I - As per DTU Course Base

A student who has met the objectives of the course will be able to:

  • Explain why reproducible data analysis is important, identify relevant challenges, and explain the difference between replicability and reproducibility

  • Describe the components of a reproducible data analysis

  • Use Tidyverse R to perform exploratory data analysis (EDA) for data insights, including using ggplot to visualize multilayer data from e.g. high-throughput -omics platforms

  • Use Tidyverse R to perform data cleansing, transformation, visualization and communication

  • Use RStudio and GitHub for collaborative analysis projects

  • Perform and interpret standard dimension reduction and clustering techniques, as well as basic statistical tests and models

Learning Objectives II - As per DTU Course Base

A student who has met the objectives of the course will be able to:

  • Train and apply a machine learning model based on a neural network with Keras/TensorFlow in R

  • Prepare a dynamic rmarkdown report / presentation for a bio data analysis

  • Prepare a simple R package

  • Prepare a simple shiny app

  • Design and execute a bio data science project focusing on reproducibility

  • Analyze an already performed bio data science project with a view to assessing methods and reproducibility

Project Groups

  • Students are responsible for forming groups of four

  • Register your group in this Google Sheet

  • On Slack, there is a channel called project_groups; use it to connect with students who share your interests

  • All group members are responsible for all parts of the project

Project description

Aim

  • Using the tools you have learned in the course, the aim is to “Design and execute a collaborative bio data science project focusing on reproducibility”, based on the data science cycle below

Project Organisation Overview

  • Note: since we are working with small data sets, the entire project will be put on GitHub

Project Requirements - Code

  • Be organised according to the previous slide on “Project Organisation”, including the use of GitHub (each group must create a GitHub repository named 2020_groupXX in the course organisation)

  • Be written as tidyverse code, as taught in class, i.e. do not use base R tools unless no tidyverse equivalent exists

  • Follow the tidyverse style guide

  • Contain the entire data science cycle

  • Be executable end-to-end by running 00_doit.R

Project Requirements - Data

  • Be based on a real bio data set

  • Start out “dirty”, i.e. a completely clean, analysis-ready data set will not allow you to demonstrate that you have met the course learning objectives

  • Be of limited size, so we can use GitHub for all parts of the project

  • You can find several suggested data sources in the collection of additional sources on the course site

  • Note: you should demonstrate the ability to extract biological insights, but keep in mind that the main focus is on demonstrating that you master the data science toolbox

Project Requirements - Presentation

  • Follow the IMRAD standard scientific structure:
    • Introduction
    • Materials and Methods
    • Results
    • (And) Discussion
  • Have a technical focus, while still communicating whichever biological insights you arrived at

  • Focus on the broader picture of what you did, including data summaries and visualisations, rather than showing all your code (we will look into the code at the individual examinations)

  • Be created using the rmarkdown ioslides_presentation format (i.e. the right-most doc column in the project organisation will be an rmarkdown-based presentation)

Exam details - From the DTU Course Base

Date of examination

  • F1A

Type of assessment

  • Oral examination and reports. In groups, a bio data science project is prepared, which forms the background for the exam. Handing in the project on time is a prerequisite for attending the exam. The oral exam will be a group presentation of the project, followed by an individual examination on the project as well as on the general course learning objectives.

Aid

  • No Aid

Evaluation

  • 7-step scale, internal examiner

Elaboration

Time

  • May 14th (7 groups) and May 15th (3 groups), assigned by random draw

  • Please be advised that the exam may be conducted remotely. In that case, I will of course provide guidelines and a framework to facilitate a smooth exam.

At the Exam

Group Examination

  • Each group presents its project for 15 minutes, including 1-2 general questions

  • Each group member is then individually examined on the project and the general course learning objectives for ~10 minutes

  • In the individual exam, we will take a closer look at the code underlying the project in the associated GitHub repository

  • The total expected time per group is therefore ~1 hour

  • Grades are given after the entire group has been examined

  • Important: ALL group members are responsible for ALL parts of the project

Questions?