Subset ampvis2 objects based on sample metadata

Subsets the data in ampvis2 objects based on metadata and returns the subsetted object.

amp_filter_samples(
  data,
  ...,
  minreads = 0,
  rarefy = NULL,
  normalise = FALSE,
  removeAbsentOTUs = TRUE
)

amp_subset_samples(
  data,
  ...,
  minreads = 0,
  rarefy = NULL,
  normalise = FALSE,
  removeAbsentOTUs = TRUE
)

Arguments

data: (required) Data list as loaded with amp_load.
...: Logical expression indicating elements or rows to keep in the metadata. Missing values are treated as FALSE. Passed directly to filter.
minreads: Minimum number of reads per sample. Samples with fewer reads than this value are removed first, before any other step. (default: 0)
rarefy: Rarefy each sample to this number of reads by using rrarefy. This is done before the subset, but after filtering based on the minreads value, if set. (default: NULL)
normalise: (logical) Normalise the OTU read counts to 100 (i.e. percent) per sample BEFORE the subset. (default: FALSE)
removeAbsentOTUs: (logical) Whether to remove OTUs that have 0 read abundance in all samples after the subset. (default: TRUE)
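
The order in which minreads and normalise take effect can be sketched on a toy count table in base R. This is only an illustration under stated assumptions: the matrix `counts` and all variable names are made up, rarefying is left out, and the code is not the ampvis2 implementation.

```r
# Toy OTU table (hypothetical data): rows are OTUs, columns are samples.
# matrix() fills column by column, so each line below is one sample.
counts <- matrix(
  c(50, 30, 20,   # S1: 100 reads
     5,  0,  0,   # S2:   5 reads
     0, 60, 40),  # S3: 100 reads
  nrow = 3,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), c("S1", "S2", "S3"))
)

minreads <- 10

# Step 1: drop samples with fewer than `minreads` total reads (removes S2)
kept <- counts[, colSums(counts) >= minreads, drop = FALSE]

# Step 2 (what normalise = TRUE describes): percent per sample,
# computed before any metadata-based subset would be applied
percent <- sweep(kept, 2, colSums(kept), "/") * 100

print(colnames(kept))
print(colSums(percent))
```

Each remaining column of `percent` sums to 100, which is the state the data would be in before the metadata-based subset is applied.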

Value

A modified ampvis2 object

Details

The subset is performed on the metadata by filter(), and the abundance and taxonomy tables are then adjusted accordingly.
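
As a rough sketch of what "adjusted accordingly" means, the base R snippet below subsets a toy metadata table with a logical expression, keeps only the matching sample columns of the abundance table, and then drops OTUs that are left with zero reads (the behaviour described for removeAbsentOTUs = TRUE). The objects `metadata` and `abund` are hypothetical, and plain logical indexing stands in for whatever ampvis2 does internally.

```r
# Hypothetical metadata and OTU table (not the ampvis2 internals)
metadata <- data.frame(
  SampleID = c("S1", "S2", "S3"),
  Plant = c("Aalborg West", "Aalborg East", "Other"),
  stringsAsFactors = FALSE
)
abund <- matrix(
  c(10, 0, 5,   # S1
     0, 0, 7,   # S2
     3, 9, 0),  # S3
  nrow = 3,
  dimnames = list(c("OTU1", "OTU2", "OTU3"), metadata$SampleID)
)

# Subset the metadata with a logical expression
keep <- metadata$Plant %in% c("Aalborg West", "Aalborg East")
metadata_sub <- metadata[keep, , drop = FALSE]

# Keep only the abundance columns of the remaining samples
abund_sub <- abund[, metadata_sub$SampleID, drop = FALSE]

# Drop OTUs with zero reads in all remaining samples
abund_sub <- abund_sub[rowSums(abund_sub) > 0, , drop = FALSE]

print(abund_sub)
```

Here OTU2 only has reads in the removed sample S3, so it disappears from the subsetted table along with that sample.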

Preserving relative abundances in a subset of larger data

By default, some plotting functions (for example amp_heatmap and amp_timeseries) automatically normalise the raw read counts in the abundance matrix, i.e. transform them to percentages per sample. After a subset, the relative abundances shown are therefore calculated from the remaining taxa only and do not account for any removed taxa. To circumvent this, set normalise = TRUE when subsetting with the amp_filter_taxa and amp_filter_samples functions, and then set normalise = FALSE in the plotting function. The first setting transforms the OTU counts to relative abundances BEFORE the subset, and the second skips the transformation in the plotting function, see the example below.


data("MiDAS")
subsettedData <- amp_filter_samples(MiDAS,
                                    Plant %in% c("Aalborg West", "Aalborg East"),
                                    normalise = TRUE
                                    )
amp_heatmap(subsettedData,
            group_by = "Plant",
            tax_aggregate = "Phylum",
            tax_add = "Genus",
            normalise = FALSE
            )
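
The difference between normalising before and after a subset can also be seen on a single toy sample in base R. The vector `reads` and the choice to remove OTU3 are invented for illustration; the snippet does not call ampvis2.

```r
# One hypothetical sample with three OTUs (100 reads in total)
reads <- c(OTU1 = 60, OTU2 = 30, OTU3 = 10)

# Normalise BEFORE removing OTU3: percentages refer to the full sample
before <- (reads / sum(reads) * 100)[c("OTU1", "OTU2")]

# Normalise AFTER removing OTU3: percentages refer to the remaining reads only
remaining <- reads[c("OTU1", "OTU2")]
after <- remaining / sum(remaining) * 100

print(before)           # OTU1 = 60, OTU2 = 30
print(round(after, 1))  # OTU1 = 66.7, OTU2 = 33.3
```

Normalising first keeps OTU1 at 60% of the original sample, whereas normalising afterwards inflates it to about 66.7% because the removed reads no longer count towards the total.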

References

McMurdie, P.J. & Holmes, S. (2014). Waste not, want not: Why rarefying microbiome data is inadmissible. PLoS Comput Biol 10(4): e1003531. DOI:10.1371/journal.pcbi.1003531

Author

Kasper Skytte Andersen ksa@bio.aau.dk

Mads Albertsen MadsAlbertsen85@gmail.com

Examples

# Load example data
data("MiDAS")

# Show a short summary about the data by simply typing the name of the object in the console
MiDAS
#> ampvis2 object with 5 elements. 
#> Summary of OTU table:
#>      Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads 
#>          658        14969     20890850        10480        46264        31800 
#>    Avg#Reads 
#>     31749.01 
#> 
#> Assigned taxonomy:
#>       Kingdom        Phylum         Class         Order        Family 
#>   14969(100%) 14477(96.71%) 12737(85.09%) 11470(76.63%)  9841(65.74%) 
#>         Genus       Species 
#>   7380(49.3%)     28(0.19%) 
#> 
#> Metadata variables: 5 
#>  SampleID, Plant, Date, Year, Period

# Keep only samples containing Aalborg West or East in the Plant column
MiDASsubset <- amp_filter_samples(MiDAS, Plant %in% c("Aalborg West", "Aalborg East"))
#> 590 samples and 5512 OTUs have been filtered 
#> Before: 658 samples and 14969 OTUs
#> After: 68 samples and 9457 OTUs

# Summary
MiDASsubset
#> ampvis2 object with 5 elements. 
#> Summary of OTU table:
#>      Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads 
#>           68         9457      2072678        17772        44326      30962.5 
#>    Avg#Reads 
#>     30480.56 
#> 
#> Assigned taxonomy:
#>      Kingdom       Phylum        Class        Order       Family        Genus 
#>   9457(100%) 9240(97.71%) 8305(87.82%) 7665(81.05%) 6767(71.56%) 5244(55.45%) 
#>      Species 
#>    23(0.24%) 
#> 
#> Metadata variables: 5 
#>  SampleID, Plant, Date, Year, Period

# Keep only samples containing Aalborg West or East in the Plant column
# and remove the sample "16SAMP-749". Remove any sample(s) with fewer than 20000 total reads
MiDASsubset2 <- amp_filter_samples(MiDAS,
  Plant %in% c("Aalborg West", "Aalborg East") & !SampleID %in% c("16SAMP-749"),
  minreads = 20000
)
#> 594 samples and 5601 OTUs have been filtered 
#> Before: 658 samples and 14969 OTUs
#> After: 64 samples and 9368 OTUs

# Summary
MiDASsubset2
#> ampvis2 object with 5 elements. 
#> Summary of OTU table:
#>      Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads 
#>           64         9368      1987574        21472        44326      31413.5 
#>    Avg#Reads 
#>     31055.84 
#> 
#> Assigned taxonomy:
#>      Kingdom       Phylum        Class        Order       Family        Genus 
#>   9368(100%) 9154(97.72%) 8222(87.77%) 7589(81.01%) 6699(71.51%) 5194(55.44%) 
#>      Species 
#>    23(0.25%) 
#> 
#> Metadata variables: 5 
#>  SampleID, Plant, Date, Year, Period