Focus subject Theoretical Ecology / Ecological Data Science

Overview

The objective of this focus subject is to teach the quantitative skills that enable students to answer questions about ecological processes from data. The skills taught in this module include:

Models and theories in ecology
Applied biostatistics
Data science, machine learning and AI
Simulations and dynamic models

Like all focus subjects, the TE specialisation consists of:

A theoretical module
A practical module

How to get started?

If you are interested in taking this focus subject, start by visiting Seminar 54374: Theoretische Ökologie to get an idea of what we do.

If you are sure you want to do the module, start the exercise here, which takes you through all the steps necessary to finish the module.

Topics for the oral exam

The oral exam will be based mainly on the topics discussed below. By default, the exam covers all topics, but an emphasis on particular topics can be agreed on a case-by-case basis.

Ecological foundations

The lectures in the module focus on technical topics (biostatistics and machine learning). Nevertheless, in our internship we expect you to apply these methods to ecological questions, and thus to have a basic understanding of ecology. To check whether you have this understanding, please consult Begon, Ecology (Ch 15, 22, and 18), and make sure that you are familiar with foundational concepts in ecology such as:

Natural selection, adaptation, intraspecific variation, speciation mechanisms, biogeography, convergent evolution, coexistence
Niches, environmental conditions, habitat, life history traits, density dependence, invasion fitness, trade-offs between (life-history) traits
Competition, population dynamics and coexistence, metapopulations, dispersal, species-area and species-abundance distributions, rarity, inbreeding depression, demographic stochasticity
Coexistence mechanisms, competitive exclusion, limiting similarity, paradox of the plankton, character displacement, stability
Trophic interactions, foraging theory, predator-prey cycles, food webs, parasitism and mutualisms
Community assembly mechanisms / processes, community diversity, community turnover and ecological gradients, effects of large-scale environmental drivers (e.g. light, energy, water) on functional and taxonomic community composition, biotic interactions on community composition, stability of food webs
Richness patterns and their predictors (energy, harshness, nutrients, … ), effect of spatial structure on richness (e.g. island biogeography), richness + ecosystem functioning

Biostatistics and machine learning

The main part of the exam will be on biostatistics and machine learning. The exam topics follow the three advanced statistics and ML lectures.

The exam questions will target a general understanding of all topics of the module. After an initial learning phase, you should try to actively check if you understood a topic. Methods for this include:

Write down the “pseudocode” for a technique - how would you program / design a new hypothesis test?
Answer questions of other people on the internet, e.g. on stats.stackexchange.com
Ask an AI to ask you questions and provide feedback on your answers (treat the feedback with caution, as many AIs are weak on statistics topics)
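
As an illustration of the first method, the following is a minimal sketch of how a simple hypothesis test could be programmed from scratch: a two-sided permutation test for a difference in group means. The data, the function name and the number of permutations are hypothetical choices made for this example, not material from the lectures.

```python
import random
from statistics import mean


def permutation_test(x, y, n_perm=5000, seed=42):
    """Two-sided permutation test for a difference in group means.

    Returns an approximate p-value under the null hypothesis that
    the group labels are exchangeable.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))  # test statistic on the real data
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign group labels
        stat = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if stat >= observed - 1e-12:  # small tolerance for rounding
            extreme += 1
    # Add-one correction: the observed labelling counts as one permutation
    return (extreme + 1) / (n_perm + 1)


# Hypothetical measurements for two groups
control = [4.1, 5.0, 4.8, 5.2, 4.6, 4.9]
treated = [6.0, 6.4, 5.9, 6.8, 6.1, 6.5]
p_value = permutation_test(control, treated)
print(f"p = {p_value:.4f}")
```

The three ingredients of any such test are visible here: a test statistic, its distribution under the null hypothesis (here generated by shuffling labels), and the comparison of the observed value against that distribution.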

Make sure you have covered the following topics:

Statistical Theory

NHST: p-values, Type I/II error, power, FDR, etc.; Theory / numerics of NHST (test statistic, distribution of test statistic, …); Knowledge of common tests (t-test, ANOVA, …, see lecture notes)
Likelihood: MLE: definition, motivation, properties, basic understanding of the numerics of MLE (i.e. optimization methods); Frequentist CIs: definition, motivation, properties, numerics (quadratic approximation, profile likelihood)
Bayes: Posterior: definition, motivation, properties; Differences Bayes / frequentism; (optional: Basic understanding of posterior estimation via MCMC sampling)
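
To make the likelihood items concrete, here is a minimal sketch for a one-parameter Poisson model, assuming hypothetical count data. It computes the MLE, a confidence interval from the quadratic (Wald) approximation, and a likelihood-ratio interval found numerically. With a single parameter there is nothing to profile over, so the likelihood-ratio interval stands in for the profile-likelihood idea; all names are illustrative.

```python
import math

# Hypothetical count data, e.g. individuals per plot
counts = [3, 5, 2, 4, 6, 3, 4, 5]
n = len(counts)


def log_lik(lam):
    """Poisson log-likelihood of the counts for rate lam."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in counts)


# For the Poisson model the MLE has a closed form: the sample mean
lam_hat = sum(counts) / n

# Quadratic (Wald) approximation: the second derivative of the
# log-likelihood at the MLE is -n / lam_hat, so SE = sqrt(lam_hat / n)
se = math.sqrt(lam_hat / n)
wald_ci = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)

# Likelihood-ratio interval: all rates whose deviance from the MLE stays
# below the 0.95 quantile of a chi-squared distribution with 1 df
CHI2_95_DF1 = 3.841459


def excess_deviance(lam):
    return 2.0 * (log_lik(lam_hat) - log_lik(lam)) - CHI2_95_DF1


def find_root(f, lo, hi, tol=1e-9):
    """Bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lr_ci = (find_root(excess_deviance, 1e-6, lam_hat),
         find_root(excess_deviance, lam_hat, 10 * lam_hat))

print("MLE:", lam_hat)
print("Wald CI:", wald_ci)
print("Likelihood-ratio CI:", lr_ci)
```

The Wald interval is symmetric around the MLE by construction, while the likelihood-ratio interval follows the actual shape of the likelihood and is therefore asymmetric for this small sample.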

Statistical models (see in particular content of advanced biostatistics course)

The linear model: Properties, estimation with MLE, residual tests; practical application, main effects, quadratic effects, interactions, centering + scaling, testing blocks of variables via LRT / ANOVA
Standard GLMs (Poisson, binomial, negative binomial, quasi-distributions): Properties, estimation, residual tests, overdispersion, zero-inflation
The GLS framework: modeling heteroskedasticity; spatial autoregressive models (e.g. CAR, see also here); temporal autocorrelation (e.g. AR1)
GAMs: theory / functioning, in particular how to control spline complexity
Mixed models: definition and purpose of random effects, LMM and GLMM - estimation, residuals, etc.
(Optional) basic ideas of hierarchical models, e.g. state-space model for population dynamics - see here; Occupancy model (see Bayesian Data Analysis Book)

Important statistical indicators (Intro R and Advanced Biostatistics)

All standard summary statistics (standard deviation, R2, … etc.)
Autocorrelation metrics, e.g. Moran’s I
Performance measures (RMSE, AUC, etc.)
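
As one example of an autocorrelation metric, the sketch below computes Moran's I from its textbook definition for plots along a transect. The binary neighbour weights (1 for directly adjacent plots, 0 otherwise) and the two data vectors are assumptions chosen for illustration.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Weighted sum of cross-products of deviations between all pairs
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_sum) * cross / sum(d * d for d in dev)


def transect_weights(n):
    """Binary weights: 1 for directly adjacent plots on a line, else 0."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)]
            for i in range(n)]


w = transect_weights(6)
gradient = [1, 2, 3, 4, 5, 6]      # neighbours are similar
alternating = [1, 6, 1, 6, 1, 6]   # neighbours are dissimilar

i_gradient = morans_i(gradient, w)
i_alternating = morans_i(alternating, w)
print(i_gradient, i_alternating)
```

The smooth gradient yields a positive value (about 0.6) and the alternating pattern a negative one (about -1.0), which matches the interpretation of Moran's I as positive versus negative spatial autocorrelation.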

Multivariate statistics (Intro R course)

Basic methods of constrained and unconstrained ordination (e.g. PCA, CA, NMDS), see the Introduction to R course (Einführung in R)

Model Selection and Regularization (ML and Advanced Biostatistics course)

LRT (definition, theory, application, analytical and simulation methods for calculation)
Information criteria, in particular AIC, BIC, DIC, WAIC (see also Gelman, A., Hwang, J., & Vehtari, A. (2014). Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997-1016)
Bayes factor / marginal likelihood (see also Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795)
Regularization / Shrinkage estimators (Ridge, Lasso)

Also understand the practical problems in calculating these quantities for different model classes.
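
For a case where everything can be calculated by hand, the following sketch compares two nested Poisson models with a likelihood ratio test, AIC and BIC. The counts are hypothetical, and the analytical p-value relies on the asymptotic chi-squared distribution with one degree of freedom, which is only an approximation for samples this small.

```python
import math

# Hypothetical counts at two sites
site_a = [2, 3, 1, 4, 2, 3]
site_b = [6, 8, 7, 5, 9, 7]


def poisson_log_lik(data, lam):
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in data)


pooled = site_a + site_b
n = len(pooled)

# M0: one shared rate for both sites (1 parameter), fitted by its MLE
ll0 = poisson_log_lik(pooled, sum(pooled) / n)

# M1: a separate rate per site (2 parameters), each fitted by its MLE
ll1 = (poisson_log_lik(site_a, sum(site_a) / len(site_a))
       + poisson_log_lik(site_b, sum(site_b) / len(site_b)))

# Likelihood ratio test: twice the log-likelihood difference, compared
# against a chi-squared distribution with df = 2 - 1 = 1.
# For df = 1 the upper tail probability equals erfc(sqrt(x / 2)).
lrt_stat = 2.0 * (ll1 - ll0)
p_value = math.erfc(math.sqrt(lrt_stat / 2.0))


def aic(log_lik, k):
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n_obs):
    return k * math.log(n_obs) - 2 * log_lik


aic0, aic1 = aic(ll0, 1), aic(ll1, 2)
bic0, bic1 = bic(ll0, 1, n), bic(ll1, 2, n)

print(f"LRT statistic = {lrt_stat:.2f}, p = {p_value:.5f}")
print(f"AIC: M0 = {aic0:.2f}, M1 = {aic1:.2f}")
print(f"BIC: M0 = {bic0:.2f}, M1 = {bic1:.2f}")
```

All three criteria use the same maximized log-likelihoods and differ only in how they penalize the extra parameter. For mixed models or Bayesian models, neither the likelihood nor the parameter count is this easy to obtain, which is where the practical problems mentioned above begin.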

Simulation / resampling methods (Advanced Biostatistics course)

Bootstrap (parametric / non-parametric)
Cross-validation
Null-Models (randomization and simulation)
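
A non-parametric bootstrap can be sketched in a few lines. The example below computes a percentile confidence interval for the mean of a hypothetical sample; the function name, the number of replicates and the choice of the percentile method are assumptions for illustration, and other interval types exist.

```python
import random
from statistics import mean


def bootstrap_ci(data, stat=mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement and recompute the statistic
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = reps[int((alpha / 2) * n_boot)]
    upper = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical measurements
sample = [12.1, 9.8, 11.4, 10.9, 13.2, 8.7, 10.1, 12.6, 11.8, 9.5]
ci_low, ci_high = bootstrap_ci(sample)
print(f"mean = {mean(sample):.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

A parametric bootstrap would replace the resampling step with simulations from a fitted model, while the rest of the procedure stays the same.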

Machine learning (see content of the ML course; if you did not take this course, we will only cover the basics)

Basic definitions: supervised, unsupervised, semi-supervised learning
Concepts: Regularization (e.g. Lasso); Model averaging, weak learners; Overfitting, complexity control, bias-variance trade-off
Standard algorithms for unsupervised learning: Clustering methods (hierarchical, k-means)
Standard algorithms for supervised learning: Distance-based methods (kNN, SVMs), tree-based methods (random forest, BRTs, boosting, bagging, gradient boosting), neural networks
Explainable AI
Deep neural network architectures: DNNs, CNNs, autoencoders and GANs, reinforcement learning
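
To connect a standard algorithm with the idea of complexity control, here is a minimal sketch of a k-nearest-neighbour classifier evaluated by leave-one-out cross-validation. The two-dimensional data and the class labels "A" and "B" are hypothetical, and the example is deliberately tiny.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to the query.

    train is a list of ((x1, x2), label) tuples.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(train, k):
    """Leave-one-out cross-validation accuracy for a given k."""
    hits = 0
    for i, (point, label) in enumerate(train):
        rest = train[:i] + train[i + 1:]  # hold out one observation
        hits += knn_predict(rest, point, k) == label
    return hits / len(train)


# Hypothetical data: two well-separated classes in two dimensions
train = [((1.0, 1.2), "A"), ((1.5, 0.8), "A"), ((0.8, 1.9), "A"),
         ((2.0, 1.5), "A"), ((1.2, 2.2), "A"),
         ((4.0, 4.2), "B"), ((4.5, 3.8), "B"), ((3.8, 4.9), "B"),
         ((5.0, 4.5), "B"), ((4.2, 5.2), "B")]

for k in (1, 3, 9):
    print(k, loo_accuracy(train, k))
```

The parameter k controls model complexity. Small values fit these data perfectly, whereas k = 9 uses all remaining points, so the vote always goes to the class that is in the majority after the hold-out, and every held-out point is misclassified.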

Causal inference (Advanced Biostatistics course)

Static (Pearl) causality concept: Expressing causal relationships as graphs (DAGs), standard structures (confounder, mediator, collider; see also this blog post), fitting DAGs with Structural Equation Models (SEMs)
Temporal (Granger) causality
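
The effect of a confounder can be checked with a small simulation. The sketch below assumes a hypothetical structure in which Z causes both X and Y and X has no effect on Y; the coefficients, sample size and seed are arbitrary. Adjustment for Z is done by regressing the residuals of Y on the residuals of X, which gives the same slope as including Z in a multiple regression.

```python
import random
from statistics import mean


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, z):
    """Residuals of y after a simple linear regression on z."""
    b = slope(z, y)
    mz, my = mean(z), mean(y)
    return [yi - (my + b * (zi - mz)) for yi, zi in zip(y, z)]


rng = random.Random(7)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]        # confounder
x = [zi + rng.gauss(0, 1) for zi in z]         # Z -> X
y = [2 * zi + rng.gauss(0, 1) for zi in z]     # Z -> Y, no X -> Y effect

naive = slope(x, y)                                   # ignores Z
adjusted = slope(residuals(x, z), residuals(y, z))    # controls for Z

print(f"naive slope = {naive:.2f}, adjusted slope = {adjusted:.2f}")
```

Under these assumptions the naive slope is close to 1 even though X has no causal effect on Y, and it drops to roughly 0 once Z is controlled for. Repeating the exercise with a collider instead of a confounder shows the opposite pattern, where adjusting creates a spurious association.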

Recommended books for learning (in addition to the specialized links above)

Bruce, P., Bruce, A., & Gedeck, P. (2020). Practical statistics for data scientists: 50+ essential concepts using R and Python. O'Reilly Media.
McElreath, R. (2018). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC. (online material here)
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning with applications in R. Springer. (online material here)
