height and weight to the metric system and re-create the plot below (hint: faceting by rows and columns requires wrapping the variable names in vars()).
Now, seeing is one thing, but let us see if we can create the models we are seeing while staying in the data tibble (hint: look in R4DS chapter 25).
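The unit conversion and the vars() hint from the first task can be sketched as follows. This is only an illustration under stated assumptions: toy is simulated stand-in data (not the course data set), height and weight are assumed to be recorded in inches and pounds, and the choice of gender for rows and location for columns is a guess at the plot layout.

```r
library(tidyverse)

# Hypothetical stand-in data: height in inches, weight in pounds (assumed units)
set.seed(1)
toy <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 20),
  gender   = rep(c("female", "male"), times = 20),
  height   = rnorm(40, mean = 66, sd = 4),
  weight   = rnorm(40, mean = 175, sd = 35)
)

# Convert to metric: 1 inch = 2.54 cm, 1 pound = 0.45359237 kg
toy_metric <- toy %>%
  mutate(height = height * 2.54,
         weight = weight * 0.45359237)

# facet_grid() takes its row and column variables wrapped in vars()
p <- ggplot(toy_metric, aes(x = height, y = weight)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  facet_grid(rows = vars(gender), cols = vars(location)) +
  labs(x = "Height [cm]", y = "Weight [kg]")
p
```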
## # A tibble: 4 x 4
## # Groups:   location, gender [4]
##   location   gender data            mdls
##   <chr>      <chr>  <list<df[,17]>> <list>
## 1 Buckingham female [114 × 17]      <lm>
## 2 Buckingham male   [86 × 17]       <lm>
## 3 Louisa     female [120 × 17]      <lm>
## 4 Louisa     male   [83 × 17]       <lm>
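A nested tibble with a model column, like the one shown above, could be built along these lines. This is a sketch, not the reference solution: toy is simulated stand-in data, and the formula weight ~ height is an assumption inferred from the truncated heig… term in the coefficient tables further down.

```r
library(tidyverse)

# Simulated stand-in data (hypothetical), already in cm and kg
set.seed(1)
toy <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 40),
  gender   = rep(c("female", "male"), times = 40),
  height   = rnorm(80, mean = 168, sd = 10)
) %>%
  mutate(weight = -40 + 0.7 * height + rnorm(n(), sd = 15))

# One row per location/gender group; each group's data goes into a list column
# and map() fits one linear model per group into a second list column
toy_nested <- toy %>%
  group_by(location, gender) %>%
  nest() %>%
  mutate(mdls = map(data, ~ lm(weight ~ height, data = .x)))
toy_nested
```

Keeping the models in a list column means the grouping variables, the data and the fitted objects always stay on the same row.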
## # A tibble: 4 x 15
## # Groups:   location, gender [4]
##   location gender data       mdls  r.squared adj.r.squared sigma statistic
##   <chr>    <chr>  <list<df[> <lis>     <dbl>         <dbl> <dbl>     <dbl>
## 1 Bucking… female [114 × 17] <lm>     0.0571        0.0485  20.1      6.60
## 2 Bucking… male   [86 × 17]  <lm>     0.0551        0.0438  18.3      4.90
## 3 Louisa   female [120 × 17] <lm>     0.0955        0.0877  16.2     12.2
## 4 Louisa   male   [83 × 17]  <lm>     0.0704        0.0588  15.8      6.06
## # … with 7 more variables: p.value <dbl>, df <int>, logLik <dbl>,
## #   AIC <dbl>, BIC <dbl>, deviance <dbl>, df.residual <int>
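Model-level summaries such as r.squared and sigma, as in the table above, are what broom's glance() returns. A minimal sketch, again on simulated stand-in data with an assumed weight ~ height model (the column name fit is made up for illustration, and the exact set of glance() columns depends on the installed broom version):

```r
library(tidyverse)
library(broom)

# Simulated stand-in data (hypothetical), nested and modelled per group
set.seed(1)
toy_nested <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 40),
  gender   = rep(c("female", "male"), times = 40),
  height   = rnorm(80, mean = 168, sd = 10)
) %>%
  mutate(weight = -40 + 0.7 * height + rnorm(n(), sd = 15)) %>%
  group_by(location, gender) %>%
  nest() %>%
  mutate(mdls = map(data, ~ lm(weight ~ height, data = .x)))

# glance() gives a one-row summary per model; unnest() spreads it into columns
toy_glanced <- toy_nested %>%
  mutate(fit = map(mdls, glance)) %>%
  unnest(cols = fit)
toy_glanced
```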
## # A tibble: 8 x 9
## # Groups:   location, gender [4]
##   location gender data       mdls  term  estimate std.error statistic
##   <chr>    <chr>  <list<df[> <lis> <chr>    <dbl>     <dbl>     <dbl>
## 1 Bucking… female [114 × 17] <lm>  (Int…  -34.5       44.4      -0.778
## 2 Bucking… female [114 × 17] <lm>  heig…    0.699     0.272     2.57
## 3 Bucking… male   [86 × 17]  <lm>  (Int…   -9.57     42.2      -0.227
## 4 Bucking… male   [86 × 17]  <lm>  heig…    0.529     0.239     2.21
## 5 Louisa   female [120 × 17] <lm>  (Int…  -36.4       33.1      -1.10
## 6 Louisa   female [120 × 17] <lm>  heig…    0.719     0.205     3.50
## 7 Louisa   male   [83 × 17]  <lm>  (Int…  -40.1       49.3      -0.812
## 8 Louisa   male   [83 × 17]  <lm>  heig…    0.696     0.283     2.46
## # … with 1 more variable: p.value <dbl>
## # A tibble: 8 x 11
## # Groups:   location, gender [4]
##   location gender data       mdls  term  estimate std.error statistic
##   <chr>    <chr>  <list<df[> <lis> <chr>    <dbl>     <dbl>     <dbl>
## 1 Bucking… female [114 × 17] <lm>  (Int…  -34.5       44.4      -0.778
## 2 Bucking… female [114 × 17] <lm>  heig…    0.699     0.272     2.57
## 3 Bucking… male   [86 × 17]  <lm>  (Int…   -9.57     42.2      -0.227
## 4 Bucking… male   [86 × 17]  <lm>  heig…    0.529     0.239     2.21
## 5 Louisa   female [120 × 17] <lm>  (Int…  -36.4       33.1      -1.10
## 6 Louisa   female [120 × 17] <lm>  heig…    0.719     0.205     3.50
## 7 Louisa   male   [83 × 17]  <lm>  (Int…  -40.1       49.3      -0.812
## 8 Louisa   male   [83 × 17]  <lm>  heig…    0.696     0.283     2.46
## # … with 3 more variables: p.value <dbl>, conf.low <dbl>, conf.high <dbl>
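The two coefficient tables above differ only in the conf.low and conf.high columns, which broom's tidy() adds when called with conf.int = TRUE. A minimal sketch on simulated stand-in data, assuming a weight ~ height model (the column name coefs is made up for illustration):

```r
library(tidyverse)
library(broom)

# Simulated stand-in data (hypothetical), nested and modelled per group
set.seed(1)
toy_nested <- tibble(
  location = rep(c("Buckingham", "Louisa"), each = 40),
  gender   = rep(c("female", "male"), times = 40),
  height   = rnorm(80, mean = 168, sd = 10)
) %>%
  mutate(weight = -40 + 0.7 * height + rnorm(n(), sd = 15)) %>%
  group_by(location, gender) %>%
  nest() %>%
  mutate(mdls = map(data, ~ lm(weight ~ height, data = .x)))

# tidy() returns one row per model term; conf.int = TRUE adds the interval bounds
toy_tidied <- toy_nested %>%
  mutate(coefs = map(mdls, tidy, conf.int = TRUE)) %>%
  unnest(cols = coefs)
toy_tidied
```

Dropping conf.int = TRUE gives the narrower table without the interval columns.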
We have previously worked with a data set called gravier on gene expression profiles in cancer.
Using these data, perform a PCA and a cluster analysis to see if you can recover the sample/event labels (poor/good) (hint: revisit previous exercises to find the data, and see today's slides for code examples).
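One possible shape for such an analysis is sketched below. It does not use gravier itself: expr is a small simulated expression matrix with two built-in groups, the outcome column is a hypothetical stand-in for the good/poor labels, and k-means on the first two principal components is just one of several reasonable clustering choices, not necessarily the approach from the slides.

```r
library(tidyverse)
library(broom)

# Hypothetical stand-in for gravier: 40 samples x 50 genes, two outcome groups
set.seed(1)
n_genes <- 50
expr <- rbind(
  matrix(rnorm(20 * n_genes, mean = 0), nrow = 20),
  matrix(rnorm(20 * n_genes, mean = 1), nrow = 20)
)
colnames(expr) <- paste0("g", seq_len(n_genes))
toy <- as_tibble(expr) %>%
  mutate(outcome = rep(c("good", "poor"), each = 20))

# PCA on the centred and scaled expression values
pca_fit <- toy %>%
  select(-outcome) %>%
  prcomp(center = TRUE, scale. = TRUE)

# augment() adds the sample coordinates as .fittedPC1, .fittedPC2, ...
toy_pca <- augment(pca_fit, toy)

# k-means with two clusters on the first two principal components
km_fit <- toy_pca %>%
  select(.fittedPC1, .fittedPC2) %>%
  kmeans(centers = 2)

toy_pca <- toy_pca %>%
  mutate(cluster = factor(km_fit$cluster))

# Cross-tabulate the clusters against the known labels
toy_pca %>% count(outcome, cluster)
```

Plotting .fittedPC1 against .fittedPC2, coloured by outcome and shaped by cluster, then shows how well the clusters line up with the labels.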
You could also try working with the BLOSUM62 matrix; perhaps it would be interesting to do a PCA visualisation and a clustering analysis of the amino acid profiles?