3 Materials and Methods

The experimental procedures are covered only briefly here. For detailed protocols, refer to the three protocols “DNA extraction from activated sludge”, “MiSeq Sequencing of Amplicons” and “16S rRNA V1-3 Amplicon Preparation v1.2” at http://midasfieldguide.org/en/protocols/ or to the supplementary data of S. J. McIlroy et al. (2015).

3.1 Sampling

Activated sludge (AS) from 32 Danish wastewater treatment plants (WWTPs) was sampled at a depth of 1 m below the surface. Between 2006 and 2015, some WWTPs were sampled twice a year (summer and winter) and others four times a year (summer, fall, winter, and spring). The samples were kept on ice as much as possible and stored at -80 °C.

3.2 DNA extraction

DNA was extracted following the manufacturer’s instructions for the FastDNA™ 2 mL SPIN Kit for Soil (MP Biomedicals, USA), with the protocol optimised for activated sludge by increasing the bead beating to 4 × 40 s at 6 m/s.

3.3 Library preparation, purification and pooling

Polymerase Chain Reaction (PCR) was used to amplify the V1-3 variable regions of the 16S rRNA gene using the barcoded forward (F) and reverse (R) primers listed below:

27F: AGAGTTTGATCCTGGCTCAG
534R: ATTACCGCGGCTGCTGG

PCR was performed with the following thermocycler settings: initial denaturation at 95 °C for 2 min; 30 cycles of 95 °C for 20 s, 56 °C for 30 s and 72 °C for 60 s; and final elongation at 72 °C for 5 min. All PCR reactions were run in duplicate and pooled afterwards. The amplicon libraries were purified using the Agencourt® AMPure XP bead kit (Beckman Coulter, USA). DNA concentration and quality were then verified using a Quant-iT™ HS DNA Assay (Thermo Fisher Scientific) and a Tapestation 2200 with D1K ScreenTapes (Agilent), respectively. Based on the measured library DNA concentrations, the samples were pooled in equimolar concentrations before sequencing.
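The arithmetic behind equimolar pooling can be sketched as follows. This is a minimal illustration, not the protocol's actual calculation: the function name, the amplicon length, the target amount and the example concentrations are all hypothetical, and the conversion assumes an average mass of about 660 g/mol per base pair of double-stranded DNA.

```python
def equimolar_volumes(conc_ng_per_ul, amplicon_bp, target_fmol):
    """Return the volume (in µL) of each library that contains target_fmol of DNA.

    conc_ng_per_ul: dict mapping library name -> measured concentration (ng/µL)
    amplicon_bp:    assumed average amplicon length in base pairs (hypothetical)
    target_fmol:    amount of each library wanted in the pool (hypothetical)
    """
    volumes = {}
    for name, conc in conc_ng_per_ul.items():
        # ng/µL -> nM, assuming ~660 g/mol per base pair of double-stranded DNA
        nanomolar = conc * 1e6 / (660.0 * amplicon_bp)
        # 1 nM is the same as 1 fmol/µL, so volume = amount / concentration
        volumes[name] = target_fmol / nanomolar
    return volumes


# Hypothetical example values, not measurements from this study
example = equimolar_volumes({"lib_A": 6.6, "lib_B": 3.3}, amplicon_bp=500, target_fmol=100)
```

A library at half the concentration contributes twice the volume, so every library adds the same number of molecules to the pool.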

3.4 DNA sequencing and bioinformatics

The libraries were then sequenced on a MiSeq (Illumina, USA) with a PhiX control library. The libraries were subsampled to 50,000 raw reads, and low-quality reads were removed using Trimmomatic v0.32. Using USEARCH9 (Edgar, 2013), the sequence pairs were merged, PhiX sequences were filtered out, and unique sequences were identified, classified and clustered into Operational Taxonomic Units (OTUs, \(\geq\) 97% sequence identity). Taxonomy was then assigned to the OTUs based on the MiDAS taxonomic database (S. J. McIlroy et al., 2015), a manually curated version of the SILVA database (Quast et al., 2013) suited for activated sludge microorganisms. The result is a sample-by-OTU table that contains the approximate read abundance of each OTU in each sample alongside its taxonomy.

3.5 Data processing and analysis

The raw sample-by-OTU table was analysed using the R statistical language (Ihaka & Gentleman, 1996) and the RStudio IDE (https://www.rstudio.org). Specifically for this report, I wrote an R function for ordination of the data, ord_mep() (ordination of microbial ecology profiles; 500+ lines of code), based on the vegan and ggplot2 R packages and available at https://github.com/KasperSkytte/ord_mep. The function is inspired by amp_ordinate() from the R package “ampvis” by Mads Albertsen (Albertsen, Karst, Ziegler, Kirkegaard, & Nielsen, 2015) and extends it with support for all ordination methods mentioned in Chapter 2, data transformations, plotting options, and more.

The OTUs were filtered so that only those with a read abundance above 0.1% in at least one sample (across all samples) were analysed. Unless otherwise noted, the data were ordinated after first applying the Hellinger transformation, which accounts for incorrect similarities due to double zeros and downweights low-abundance species (Buttigieg & Ramette, 2014; P. Legendre & Gallagher, 2001):

The Hellinger transformation

\[\begin{equation} D_{Hellinger}(x_1,x_2)= \sqrt{ \sum_{j=1}^p \left( \sqrt{\frac{y_{1j}}{y_{1+}}}- \sqrt{\frac{y_{2j}}{y_{2+}}} \right)^2 } \label{eq:hellinger} \end{equation}\]

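where \(Y=[y_{ij}]\) is a species abundance table of size \((n{\times}p)\) with sites (rows) \(i=1,\ldots,n\), species (columns) \(j=1,\ldots,p\) and row sums \(y_{i+}\), and \(x_1\) and \(x_2\) denote two sites (rows) of \(Y\) (P. Legendre & Gallagher, 2001). The expression is the Euclidean distance between two sites after each abundance has been transformed to \(y'_{ij}=\sqrt{y_{ij}/y_{i+}}\), which is the Hellinger transformation itself.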
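The filtering and transformation steps described above can be illustrated with a short sketch. It is written in plain Python for illustration only and is not the code of ord_mep(). All function names are hypothetical. It assumes that the 0.1% threshold refers to the relative read abundance within a sample, and that row sums for the transformation are computed on the table passed in.

```python
import math


def relative_abundance(counts):
    """Convert each sample (row) of read counts to fractions of its row sum."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([c / total if total > 0 else 0.0 for c in row])
    return result


def filter_otus(counts, threshold=0.001):
    """Keep OTU columns whose relative abundance exceeds threshold in at least one sample.

    Returns the filtered count table and the indices of the retained columns.
    Assumption: the 0.1% cut-off is applied to within-sample relative abundance.
    """
    rel = relative_abundance(counts)
    keep = [j for j in range(len(counts[0]))
            if any(row[j] > threshold for row in rel)]
    return [[row[j] for j in keep] for row in counts], keep


def hellinger_transform(counts):
    """Hellinger transformation: y'_ij = sqrt(y_ij / y_i+)."""
    return [[math.sqrt(v) for v in row] for row in relative_abundance(counts)]


def hellinger_distance(counts_a, counts_b):
    """Hellinger distance: Euclidean distance between the two transformed rows."""
    row_a, row_b = hellinger_transform([counts_a, counts_b])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))


# Hypothetical toy table: 2 samples (rows) x 3 OTUs (columns)
table = [[9995, 5, 0], [5000, 5000, 0]]
filtered, kept_columns = filter_otus(table)
transformed = hellinger_transform(filtered)
```

Because the square root compresses large proportions more than small ones, dominant OTUs carry less weight in the resulting distances than they would with raw relative abundances.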