Introduction to Biostatistics and Machine Learning

Course leaders

Olga Dethlefsen

Eva Freyhult

Description

This national course is open to PhD students, postdocs, researchers and other employees at any Swedish university who need biostatistical skills. It is geared towards life scientists who want to understand and use basic statistical and machine learning methods. It also suits those who already apply biostatistical methods but have never had the chance to truly understand the basic statistical concepts, such as the commonly misinterpreted p-value.

This course takes an active learning approach. Participants are expected to do some pre-course reading and exercises, corresponding to up to 40 hours of study. The teaching is organised in blocks that alternate between mini-lectures, group discussions, live coding sessions, etc.

Topics covered

  • Probability theory
  • Hypothesis testing and confidence intervals
  • Resampling
  • Linear regression methods
  • Introduction to generalized linear models
  • Model evaluation
  • Unsupervised learning incl. clustering and dimension reduction methods
  • Supervised learning incl. classification

More information can be found in last year's course.

Learning outcomes

By the end of this course, participants will be able to:

  • Summarize and visualize data using descriptive statistics
  • Understand probability, random variables, and key distributions
  • Compute sampling distributions and standard errors
  • Perform hypothesis testing using resampling techniques and parametric testing
  • Implement and interpret linear regression and classification models
  • Assess (generalized) linear model performance and assumptions
  • Apply Principal Component Analysis for dimensionality reduction
  • Use clustering methods like k-means and hierarchical clustering
  • Understand and apply Random Forest for classification and regression
  • Compare machine learning models using evaluation metrics
  • Use structured machine learning model building and evaluation

Pre-requisites

  • Basic R programming skills (check your skills by taking our self-assessment test):
      • using R as a calculator
      • being able to work with vectors and matrices, incl. subsetting and matrix multiplication
      • reading in data from .csv files, e.g. with read_csv()
      • printing the first or last few rows, e.g. with head() and tail()
      • using built-in summary functions such as sum(), min() or max()
      • being able to use documentation pages for R functions, e.g. with help() or ?()
      • using if else statements, writing simple loops and functions
      • making simple plots (scatter plots, histograms), both with plot() and ggplot()
      • using tidyverse for data transformations, e.g. filtering rows, selecting columns, creating new columns etc.
      • being able to install CRAN packages, e.g. with install.packages()
      • being familiar with the R Markdown or Quarto format
  • No prior biostatistical knowledge is assumed, only basic math skills (pre-course study materials will be available upon course acceptance).
  • BYOL (bring your own laptop) with R and RStudio installed
    

Level

beginner

Upcoming courses

Course | Date | Location | Apply by
Introduction to Biostatistics and Machine Learning | 2025-04-07 - 2025-04-11 | Uppsala | 2025-03-14

Previous courses

Course | Date | Location | Apply by
Introduction to Biostatistics and Machine Learning | 2024-04-22 - 2024-04-26 | Uppsala | 2024-03-24
Introduction to Biostatistics and Machine Learning | 2023-04-24 - 2023-04-28 | Uppsala |
Introduction to Biostatistics and Machine Learning | 2022-09-12 - 2022-09-16 | Uppsala |
Introduction to Biostatistics and Machine Learning | 2021-10-04 - 2021-10-08 | |