22100 - R for Bio Data Science

Many Models

Visualising

T1: Using the diabetes data, convert the variables height and weight to metric units and re-create the plot below (Hint: faceting by rows and columns requires wrapping the variable names in vars())
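The hint about vars() can be illustrated with a minimal sketch. It does not use the real diabetes data: `toy` is a hypothetical stand-in, and it is an assumption that the original height and weight are recorded in inches and pounds. The choice of geoms and of which variable goes on rows versus columns is also a guess, since the target plot is not reproduced here.

```r
library(tidyverse)

# Hypothetical stand-in for the diabetes data (NOT the real data set).
# Assumption: height is in inches and weight is in pounds.
set.seed(1)
toy <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 20),
  gender   = rep(c("female", "male"), times = 20),
  height   = rnorm(40, mean = 66, sd = 4),
  weight   = rnorm(40, mean = 175, sd = 30)
)

# Convert to metric units: 1 inch = 2.54 cm, 1 pound = 0.45359237 kg
toy_metric <- toy %>%
  mutate(height = height * 2.54,
         weight = weight * 0.45359237)

# Facet by rows and columns; the variable names are wrapped in vars()
p <- ggplot(toy_metric, aes(x = height, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(rows = vars(location), cols = vars(gender))
p
```

Overwriting height and weight in place keeps the column names unchanged for any later modelling step; creating new columns such as height_cm would work just as well.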

Modelling

Seeing the trends in a plot is one thing; now let us see whether we can fit the models we are looking at while staying inside the data tibble (Hint: See R4DS chapter 25)

T2: Create the nested tibble below

## # A tibble: 4 x 4
## # Groups:   location, gender [4]
##   location   gender            data mdls  
##   <chr>      <chr>  <list<df[,17]>> <list>
## 1 Buckingham female      [114 × 17] <lm>  
## 2 Buckingham male         [86 × 17] <lm>  
## 3 Louisa     female      [120 × 17] <lm>  
## 4 Louisa     male         [83 × 17] <lm>
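A tibble of this shape can be sketched as follows, again on a hypothetical stand-in data set (`toy`) rather than the diabetes data. The model formula weight ~ height is an assumption, suggested only by the truncated term names in the later outputs.

```r
library(tidyverse)

# Hypothetical stand-in for the (already metric) diabetes data
set.seed(1)
toy <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 20),
  gender   = rep(c("female", "male"), times = 20),
  height   = rnorm(40, mean = 168, sd = 10),
  weight   = rnorm(40, mean = 80, sd = 15)
)

# One row per location/gender group: the remaining columns are nested
# into the list column `data`, and one linear model is fitted per row.
# Assumption: the model is weight ~ height.
toy_nested <- toy %>%
  group_by(location, gender) %>%
  nest() %>%
  mutate(mdls = map(data, ~ lm(weight ~ height, data = .x)))
toy_nested
```

Because the models live in a list column next to the data they were fitted on, every later step can be expressed as another mutate() on the same tibble.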

T3: Create the nested tibble below

## # A tibble: 4 x 15
## # Groups:   location, gender [4]
##   location gender       data mdls  r.squared adj.r.squared sigma statistic
##   <chr>    <chr>  <list<df[> <lis>     <dbl>         <dbl> <dbl>     <dbl>
## 1 Bucking… female [114 × 17] <lm>     0.0571        0.0485  20.1      6.60
## 2 Bucking… male    [86 × 17] <lm>     0.0551        0.0438  18.3      4.90
## 3 Louisa   female [120 × 17] <lm>     0.0955        0.0877  16.2     12.2 
## 4 Louisa   male    [83 × 17] <lm>     0.0704        0.0588  15.8      6.06
## # … with 7 more variables: p.value <dbl>, df <int>, logLik <dbl>,
## #   AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>
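Model-level summaries like the ones above are what broom::glance() returns for an lm object. A minimal sketch on the same kind of hypothetical stand-in data, with weight ~ height as an assumed formula, could look like this. The exact set of columns depends on the installed broom version.

```r
library(tidyverse)
library(broom)

# Hypothetical stand-in data and nested models (assumed formula: weight ~ height)
set.seed(1)
toy_nested <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 20),
  gender   = rep(c("female", "male"), times = 20),
  height   = rnorm(40, mean = 168, sd = 10),
  weight   = rnorm(40, mean = 80, sd = 15)
) %>%
  group_by(location, gender) %>%
  nest() %>%
  mutate(mdls = map(data, ~ lm(weight ~ height, data = .x)))

# glance() gives a one-row tibble per model; unnest() spreads it into columns
toy_glanced <- toy_nested %>%
  mutate(mdl_glance = map(mdls, glance)) %>%
  unnest(mdl_glance)
toy_glanced
```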

T4: Create the nested tibble below

## # A tibble: 8 x 9
## # Groups:   location, gender [4]
##   location gender       data mdls  term  estimate std.error statistic
##   <chr>    <chr>  <list<df[> <lis> <chr>    <dbl>     <dbl>     <dbl>
## 1 Bucking… female [114 × 17] <lm>  (Int…  -34.5      44.4      -0.778
## 2 Bucking… female [114 × 17] <lm>  heig…    0.699     0.272     2.57 
## 3 Bucking… male    [86 × 17] <lm>  (Int…   -9.57     42.2      -0.227
## 4 Bucking… male    [86 × 17] <lm>  heig…    0.529     0.239     2.21 
## 5 Louisa   female [120 × 17] <lm>  (Int…  -36.4      33.1      -1.10 
## 6 Louisa   female [120 × 17] <lm>  heig…    0.719     0.205     3.50 
## 7 Louisa   male    [83 × 17] <lm>  (Int…  -40.1      49.3      -0.812
## 8 Louisa   male    [83 × 17] <lm>  heig…    0.696     0.283     2.46 
## # … with 1 more variable: p.value <dbl>

T5: Create the nested tibble below

## # A tibble: 8 x 11
## # Groups:   location, gender [4]
##   location gender       data mdls  term  estimate std.error statistic
##   <chr>    <chr>  <list<df[> <lis> <chr>    <dbl>     <dbl>     <dbl>
## 1 Bucking… female [114 × 17] <lm>  (Int…  -34.5      44.4      -0.778
## 2 Bucking… female [114 × 17] <lm>  heig…    0.699     0.272     2.57 
## 3 Bucking… male    [86 × 17] <lm>  (Int…   -9.57     42.2      -0.227
## 4 Bucking… male    [86 × 17] <lm>  heig…    0.529     0.239     2.21 
## 5 Louisa   female [120 × 17] <lm>  (Int…  -36.4      33.1      -1.10 
## 6 Louisa   female [120 × 17] <lm>  heig…    0.719     0.205     3.50 
## 7 Louisa   male    [83 × 17] <lm>  (Int…  -40.1      49.3      -0.812
## 8 Louisa   male    [83 × 17] <lm>  heig…    0.696     0.283     2.46 
## # … with 3 more variables: p.value <dbl>, conf.low <dbl>, conf.high <dbl>
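Coefficient-level output of this kind corresponds to broom::tidy(). With conf.int = TRUE it also returns the conf.low and conf.high columns; without it, only the columns shown in T4 are produced. The sketch below uses hypothetical stand-in data and an assumed weight ~ height model.

```r
library(tidyverse)
library(broom)

# Hypothetical stand-in data and nested models (assumed formula: weight ~ height)
set.seed(1)
toy_nested <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 20),
  gender   = rep(c("female", "male"), times = 20),
  height   = rnorm(40, mean = 168, sd = 10),
  weight   = rnorm(40, mean = 80, sd = 15)
) %>%
  group_by(location, gender) %>%
  nest() %>%
  mutate(mdls = map(data, ~ lm(weight ~ height, data = .x)))

# tidy() gives one row per model term; conf.int = TRUE adds the
# confidence interval columns conf.low and conf.high
toy_tidied <- toy_nested %>%
  mutate(mdl_tidy = map(mdls, tidy, conf.int = TRUE)) %>%
  unnest(mdl_tidy)
toy_tidied
```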

T6: Re-create the coefficient plot from David Robinson’s talk at the 2016 New York R Conference
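As a rough starting point, a coefficient plot usually shows each estimate as a point with its confidence interval as a horizontal line and a reference line at zero. The sketch below follows that general idea on hypothetical stand-in data; it is not a reproduction of the plot from the talk, and the `model` label column is introduced here purely for illustration.

```r
library(tidyverse)
library(broom)

# Hypothetical stand-in data, nested models and tidied coefficients
# (assumed formula: weight ~ height)
set.seed(1)
toy_tidied <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 20),
  gender   = rep(c("female", "male"), times = 20),
  height   = rnorm(40, mean = 168, sd = 10),
  weight   = rnorm(40, mean = 80, sd = 15)
) %>%
  group_by(location, gender) %>%
  nest() %>%
  mutate(mdls = map(data, ~ lm(weight ~ height, data = .x)),
         mdl_tidy = map(mdls, tidy, conf.int = TRUE)) %>%
  unnest(mdl_tidy) %>%
  ungroup()

# Keep only the slope and build one label per model
height_coefs <- toy_tidied %>%
  filter(term == "height") %>%
  mutate(model = str_c(location, gender, sep = ", "))

# Point = estimate, horizontal segment = confidence interval
p <- ggplot(height_coefs, aes(x = estimate, y = model)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  geom_segment(aes(x = conf.low, xend = conf.high, y = model, yend = model)) +
  geom_point() +
  labs(x = "Estimated slope for height", y = NULL)
p
```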

Clustering and dimensionality reduction

We have previously worked with a data set called gravier on gene expression profiles in cancer.

Using this data, perform a PCA and a cluster analysis to see whether you can recover the sample/event labels (poor/good) (Hint: Revisit previous exercises to find the data and see the slides from today for code examples)
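The general workflow can be sketched with base R's prcomp() and kmeans(). This is not the code from the slides and it does not use the gravier data: the matrix `x` and the `labels` vector are simulated stand-ins, and running k-means on the first two principal components with two centres is just one possible choice.

```r
library(tidyverse)

# Hypothetical stand-in for an expression data set: 40 samples x 50 genes,
# where the first 10 genes are shifted upwards in the "poor" group
set.seed(1)
n <- 40
labels <- rep(c("good", "poor"), each = n / 2)
x <- matrix(rnorm(n * 50), nrow = n)
x[labels == "poor", 1:10] <- x[labels == "poor", 1:10] + 2
colnames(x) <- str_c("g", 1:50)

# PCA on centred and scaled variables
pca <- prcomp(x, center = TRUE, scale. = TRUE)

# k-means with two clusters on the first two principal components
clusters <- kmeans(pca$x[, 1:2], centers = 2, nstart = 10)

# Collect scores, known labels and cluster assignments in one tibble
pca_tbl <- as_tibble(pca$x[, 1:2]) %>%
  mutate(label = labels,
         cluster = factor(clusters$cluster))

# Cross-tabulate clusters against the known labels
count(pca_tbl, label, cluster)

# Colour shows the known label, shape the assigned cluster
p <- ggplot(pca_tbl, aes(x = PC1, y = PC2, colour = label, shape = cluster)) +
  geom_point()
p
```

Comparing colour and shape in the plot, or reading the cross-tabulation, shows how well the unsupervised clusters line up with the known labels.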

You could also try working with the BLOSUM62 matrix; perhaps it would be interesting to do a PCA visualisation and a cluster analysis of the amino acid profiles?

Day 4 - Exercises: Data Manipulation II

February 24th 2020
