Overview
The objective of this focus subject is to teach quantitative skills that enable students to answer questions about ecological processes from data. Skills taught in this module include:
- Models and theories in ecology
- Applied biostatistics
- Data science, machine learning and AI
- Simulations and dynamic models
Like all focus subjects, the TE specialisation consists of:
- A theoretical module
- A practical module
How to get started?
If you are interested in taking this focus subject, start by visiting Seminar 54374: Theoretische Ökologie to get an idea of what we are doing.
If you are sure you want to do the module, start the exercise here, which takes you through all the steps needed to complete the module.
Topics for the oral exam
The oral exam will be based mainly on the topics discussed below. By default, the exam will cover all topics, but an emphasis on particular topics can be discussed on a case-by-case basis.
Ecological foundations
The lectures in the module focus on technical topics (biostatistics and machine learning). Nevertheless, in our internship, we expect you to apply these methods to ecological questions, and thus to have a basic understanding of ecology. To check whether you have this understanding, please consult Begon, Ecology (Ch 15, 22, and 18), and make sure that you are familiar with foundational concepts in ecology such as:
- Natural selection, adaptation, intraspecific variation, speciation mechanisms, biogeography, convergent evolution, coexistence
- Niches, environmental conditions, habitat, life history traits, density dependence, invasion fitness, trade-offs between (life-history) traits
- Competition, population dynamics and coexistence, metapopulations, dispersal, species-area and species-abundance distributions, rarity, inbreeding depression, demographic stochasticity
- Coexistence mechanisms, competitive exclusion, limiting similarity, paradox of the plankton, character displacement, stability
- Trophic interactions, foraging theory, predator-prey cycles, food webs, parasitism and mutualisms
- Community assembly mechanisms / processes, community diversity, community turnover and ecological gradients, effects of large-scale environmental drivers (e.g. light, energy, water) on functional and taxonomic community composition, biotic interactions on community composition, stability of food webs
- Richness patterns and their predictors (energy, harshness, nutrients, … ), effect of spatial structure on richness (e.g. island biogeography), richness + ecosystem functioning
Biostatistics and machine learning
The main part of the exam will be on biostatistics and machine learning. The exam topics follow the three advanced stats and ML lectures:
- Advanced Stats lecture notes
- Machine learning lecture notes
- Introduction to Bayesian Statistics lecture notes
The exam questions will target a general understanding of all topics of the module. After an initial learning phase, you should actively check whether you have understood a topic. Methods for this include:
- Write down the “pseudocode” for a technique - how would you program / design a new hypothesis test?
- Answer questions of other people on the internet, e.g. on stats.stackexchange.com
- Ask an AI to ask you questions and provide feedback on your answers (treat the feedback with caution, many AIs are bad at stats topics)
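As an illustration of the first method, here is a minimal sketch of how a simple hypothesis test could be programmed from scratch: a permutation test for a difference in group means. Python is used only for illustration (the course material is R-based), and the function name `permutation_test` and the example data are made up for this sketch.

```python
import random
from statistics import mean

def permutation_test(x, y, n_perm=2000, seed=1):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))      # 1. test statistic on the real data
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                # 2. simulate H0: group labels are exchangeable
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed:               # 3. count statistics at least as extreme
            extreme += 1
    return (extreme + 1) / (n_perm + 1)    # 4. p-value, with the usual +1 correction

# Hypothetical measurements from two groups
group_a = [4.1, 5.0, 4.8, 5.6, 4.4, 5.2]
group_b = [6.3, 7.1, 6.8, 7.4, 6.0, 6.9]
p_value = permutation_test(group_a, group_b)
print(p_value)
```

The four numbered steps (test statistic, null distribution, comparison, p-value) are the same ingredients that any NHST procedure needs; only the way the null distribution is obtained differs between tests.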
Make sure you have covered the following topics:
Statistical Theory
- NHST: p-values, Type I/II error, power, FDR, etc.; Theory / numerics of NHST (test statistic, distribution of test statistic, …); Knowledge of common tests (t-test, ANOVA, …, see lecture notes)
- Likelihood: MLE: definition, motivation, properties, basic understanding of the numerics of MLE (i.e. optimization methods); Frequentist CIs: definition, motivation, properties, numerics (quadratic approximation, profile likelihood)
- Bayes: Posterior: definition, motivation, properties; Differences Bayes / frequentism; (optional: Basic understanding of posterior estimation via MCMC sampling)
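To make the numerics of MLE concrete, the following sketch fits the rate of a Poisson distribution by Newton-Raphson optimization of the log-likelihood and derives a 95% confidence interval from the quadratic approximation (Wald interval). It is an illustration under simplifying assumptions, not course code: the function names, the starting value and the example counts are made up, and the Newton iteration is only expected to converge for reasonable starting values.

```python
import math

def poisson_derivs(lam, counts):
    """First and second derivative of the Poisson log-likelihood in lam."""
    total, n = sum(counts), len(counts)
    score = total / lam - n            # d logL / d lam
    hessian = -total / lam ** 2        # d^2 logL / d lam^2
    return score, hessian

def fit_poisson_mle(counts, start=1.0, tol=1e-10, max_iter=100):
    """Newton-Raphson MLE and Wald 95% CI for a Poisson rate."""
    lam = start
    for _ in range(max_iter):
        score, hessian = poisson_derivs(lam, counts)
        step = score / hessian
        lam -= step                    # Newton update
        if abs(step) < tol:
            break
    # Quadratic approximation: the curvature at the optimum gives the standard error
    _, hessian = poisson_derivs(lam, counts)
    se = math.sqrt(-1.0 / hessian)
    return lam, (lam - 1.96 * se, lam + 1.96 * se)

counts = [2, 4, 3, 5, 1, 6, 3, 4, 2, 4]   # hypothetical count data
lam_hat, ci = fit_poisson_mle(counts)
print(lam_hat, ci)
```

For the Poisson distribution the MLE is known analytically (the sample mean), which makes this a convenient case for checking that the numerical optimizer does what it should.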
Statistical models (see in particular content of advanced biostatistics course)
- The linear model: Properties, estimation with MLE, residual tests; practical application, main effects, quadratic effects, interactions, centering + scaling, testing blocks of variables via LRT / ANOVA
- Standard GLMs (Poisson, binomial, negative binomial, quasi-distributions): properties, estimation, residual tests, overdispersion, zero-inflation
- The GLS framework: modeling heteroskedasticity; spatial autoregressive models (e.g. CAR, see also here); temporal autocorrelation (e.g. AR1)
- GAMs: theory / functioning, in particular how to control spline complexity
- Mixed models: definition and purpose of random effects, LMM and GLMM - estimation, residuals, etc.
- (Optional) basic ideas of hierarchical models, e.g. state-space model for population dynamics - see here; Occupancy model (see Bayesian Data Analysis Book)
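As a small illustration of the overdispersion topic listed for GLMs, the sketch below computes the Pearson dispersion statistic for an intercept-only Poisson model. Values well above 1 suggest more variance than the Poisson distribution allows. The function name and the counts are hypothetical, and in practice you would use the residual checks covered in the lecture notes rather than this hand-rolled version.

```python
def poisson_dispersion(counts):
    """Pearson chi-square divided by residual df for an intercept-only Poisson model."""
    n = len(counts)
    mu = sum(counts) / n                           # fitted mean = MLE of the Poisson rate
    pearson_chi2 = sum((y - mu) ** 2 / mu for y in counts)
    return pearson_chi2 / (n - 1)                  # one parameter estimated

# Hypothetical counts with many zeros and a few large values
counts = [0, 0, 1, 12, 0, 7, 0, 3, 15, 0]
dispersion = poisson_dispersion(counts)
print(dispersion)
```

Under a correctly specified Poisson model the statistic should be close to 1; a much larger value, as for these data, points to overdispersion, which could for instance motivate a negative binomial or zero-inflated model.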
Important statistical indicators (Intro R and Advanced Biostatistics)
- All standard summary statistics (standard deviation, R², etc.)
- Autocorrelation metrics, e.g. Moran’s I
- Performance measures (RMSE, AUC, etc.)
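The two performance measures named above can be written down in a few lines, which is a useful check of whether the definitions are understood. This is a minimal sketch with made-up data. AUC is computed here directly from its interpretation as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
import math

def rmse(observed, predicted):
    """Root mean squared error between observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # 0.75
```

The pairwise version is quadratic in the number of observations, so it is meant for understanding the measure, not for large data sets.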
Multivariate statistics (Intro R course)
- Basic methods of constrained and unconstrained ordination (e.g. PCA, CA, NMDS), see Einführung in R
Model Selection and Regularization (ML and Advanced Biostatistics course)
- LRT (definition, theory, application, analytical and simulation methods for calculation)
- Information criteria (see also Gelman, A.; Hwang, J. & Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997-1016), in particular AIC, BIC, DIC, WAIC
- Bayes Factor / marginal likelihood (see also Kass, R. E. & Raftery, A. E. (1995). Bayes Factors. Journal of the American Statistical Association, 90, 773-795)
- Regularization / Shrinkage estimators (Ridge, Lasso)
Also: understand the practical problems of calculating these quantities for different model classes.
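The following sketch compares two nested Poisson models (one common rate versus one rate per site) with a likelihood ratio test and with AIC and BIC. It is meant as an illustration of the analytical LRT only: the data and all names are hypothetical, and the p-value uses the asymptotic chi-square distribution with one degree of freedom, for which the survival function can be written with `math.erfc`.

```python
import math

def poisson_loglik(counts, lam):
    """Poisson log-likelihood of the counts for rate lam."""
    return sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)

def aic(loglik, k):
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    return k * math.log(n) - 2 * loglik

# Hypothetical counts from two sites
site_a = [2, 3, 1, 4, 2, 3]
site_b = [6, 8, 5, 7, 9, 6]
pooled = site_a + site_b

# Null model: one common rate (k = 1); alternative: one rate per site (k = 2)
ll_null = poisson_loglik(pooled, sum(pooled) / len(pooled))
ll_alt = (poisson_loglik(site_a, sum(site_a) / len(site_a))
          + poisson_loglik(site_b, sum(site_b) / len(site_b)))

lrt_stat = 2 * (ll_alt - ll_null)
# Survival function of chi-square with df = 1: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lrt_stat / 2))

print(lrt_stat, p_value)
print(aic(ll_null, 1), aic(ll_alt, 2))
print(bic(ll_null, 1, len(pooled)), bic(ll_alt, 2, len(pooled)))
```

When the asymptotic chi-square approximation is doubtful, the same test statistic can instead be compared against a null distribution simulated from the fitted null model.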
Simulation / resampling methods (Advanced Biostatistics course)
- Bootstrap (parametric / non-parametric)
- Cross-validation
- Null-Models (randomization and simulation)
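As an example of the resampling idea, here is a minimal sketch of a non-parametric bootstrap with a percentile confidence interval. The function name `bootstrap_ci`, its defaults and the data are assumptions made for this illustration. Other interval types (e.g. bias-corrected ones) exist and may be preferable in practice.

```python
import random
from statistics import mean

def bootstrap_ci(data, statistic=mean, n_boot=2000, alpha=0.05, seed=42):
    """Percentile confidence interval from a non-parametric bootstrap."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time
    estimates = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical measurements
heights = [12.1, 14.3, 11.8, 15.2, 13.7, 12.9, 16.0, 13.1, 14.8, 12.4]
ci_low, ci_high = bootstrap_ci(heights)
print(ci_low, ci_high)
```

A parametric bootstrap follows the same scheme, except that the new data sets are simulated from the fitted model instead of being resampled from the observations.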
Machine learning (see content of ML course; if you didn’t take this course, we will only cover the basics)
- Basic definitions: supervised, unsupervised, semi-supervised learning
- Concepts: Regularization (e.g. Lasso); Model averaging, weak learners; Overfitting, control of complexity, bias-variance trade-off
- Standard algorithms for unsupervised learning: Clustering methods (hierarchical, k-means)
- Standard algorithms for supervised learning: Distance-based methods (kNN, SVMs), Tree-based methods (Random Forest, BRTs, boosting, bagging, gradient boosting), Neural networks
- Explainable AI
- Deep neural network architectures: DNNs, CNNs, autoencoders and GANs, reinforcement learning
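To illustrate one of the distance-based supervised methods listed above, the sketch below implements a k-nearest-neighbour classifier with majority voting. The function name, the habitat labels and the data points are invented for this example. The parameter `k` controls complexity: small values give flexible, high-variance predictions, larger values give smoother ones.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical two-dimensional environmental measurements with habitat labels
train_x = [(1.0, 1.2), (0.8, 1.0), (1.3, 0.9), (4.0, 4.2), (4.3, 3.9), (3.8, 4.1)]
train_y = ["wet", "wet", "wet", "dry", "dry", "dry"]

print(knn_predict(train_x, train_y, (1.1, 1.0)))   # wet
print(knn_predict(train_x, train_y, (4.1, 4.0)))   # dry
```

In a real application the predictors would usually be scaled first, because the Euclidean distance is otherwise dominated by the variable with the largest numerical range.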
Causal inference (Advanced Biostatistics course)
- Static (Pearl) causality concept: Expressing causal relationships as graphs (DAGs), standard structures (confounder, mediator, collider; see also this blog post), fitting DAGs with Structural Equation Models (SEMs)
- Temporal (Granger) causality
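The effect of a confounder can be demonstrated with a short simulation. In the sketch below, a variable Z causally affects both X and Y, while X has no effect on Y. A naive regression of Y on X nevertheless finds a clear slope, which disappears once Z is adjusted for. The data-generating model, the effect sizes and all names are assumptions chosen for illustration. The adjustment is done by regressing the residuals of Y on the residuals of X (after removing Z from both), which gives the same slope as including Z as a covariate in a multiple regression.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope of y regressed on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(y, z):
    """Residuals of y after a linear regression on z."""
    b, mz, my = slope(z, y), mean(z), mean(y)
    return [yi - my - b * (zi - mz) for yi, zi in zip(y, z)]

rng = random.Random(0)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]          # confounder
x = [zi + rng.gauss(0, 1) for zi in z]           # Z -> X
y = [2 * zi + rng.gauss(0, 1) for zi in z]       # Z -> Y, no effect of X on Y

naive = slope(x, y)                                   # biased by the open path via Z
adjusted = slope(residuals(x, z), residuals(y, z))    # effect of X given Z

print(naive, adjusted)
```

With these settings the naive slope is expected to be close to 1 (cov(X, Y) / var(X) = 2 / 2), while the adjusted slope should be close to the true causal effect of 0. Repeating the exercise with a collider instead of a confounder shows the opposite pattern: adjusting then creates a spurious association.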
Recommended books for learning (in addition to the specialized links above)
- Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical statistics for data scientists: 50+ essential concepts using R and Python. O'Reilly Media.
- McElreath, R. (2018). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC. (online material here)
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. Springer. (online material here)