Machine Learning for Life Sciences

Course leaders

Payam Emami

Olga Dethlefsen

Eva Freyhult

Description

This course is aimed at those who seek to deepen their biostatistical and machine learning skills. Building on the Introduction to Biostatistics and Machine Learning course, it expands on common life science data analysis methods, including dimensionality reduction techniques beyond PCA, mixed-effects models for the analysis of repeated measures, and survival analysis. We will also dive deeper into machine learning, covering additional classification algorithms, ensemble techniques, optimization strategies, and PLS methods for single- and multi-omics data analysis.

Topics covered

  • Dimensionality reduction beyond PCA
  • Classification algorithms & ensemble techniques
  • Machine learning optimization strategies
  • PLS-based methods for single and multi-omics data analysis
  • Mixed-effects models for repeated measures, longitudinal studies and nested designs
  • Survival analysis
  • Introduction to neural networks

Learning outcomes

  • Machine Learning Workflow: understand and implement core ML stages in R and Python, covering data preprocessing, model selection, training, and evaluation.
  • Dimension Reduction: apply advanced techniques such as UMAP and t-SNE for high-dimensional data analysis and understand their relationship to PCA.
  • Classification Models: implement and tune random forest (RF), support vector machine (SVM), and logistic regression models for classification tasks, using grid search for hyperparameter tuning.
  • Ensemble Methods: understand concepts of bagging, boosting, and stacking, and apply AdaBoost and XGBoost for classification and regression tasks.
  • PLS Analysis: implement PLS, PLS-DA, and sPLS for single- and multi-omics data, including variable selection.
  • Mixed Effects Models: apply mixed models to complex biological data, focusing on repeated measures and longitudinal designs.
  • Survival Analysis: understand censored data, compute Kaplan-Meier estimates of survival functions, compare survival curves, and perform regression analysis with Cox proportional hazards models, including handling of time-dependent covariates and competing risks.
  • Neural Networks: gain foundational knowledge of convolutional neural networks (CNNs) and recurrent neural networks (RNNs); understand applications of large language models (LLMs) in life sciences and apply pre-trained models for cell-type classification and gene expression prediction.
  • Final Challenge: synthesize course methods by implementing ML workflows and statistical models on real-world data.

Pre-requisites

  • Basic knowledge of descriptive statistics, hypothesis testing and linear regression, or completion of the Introduction to Biostatistics and Machine Learning course
  • Basic R and Python data science skills (for more details see course website)
  • BYOL (bring your own laptop)

Level

Intermediate

Upcoming courses

Course | Date | Location | Apply by
Machine Learning for Life Sciences | 2025-06-09 - 2025-06-13 | Uppsala | 2025-05-02

Previous courses

Course | Date | Location | Apply by
Machine Learning for Life Sciences | 2024-11-25 - 2024-11-29 | Uppsala | 2024-10-18