This function reads an OTU-table and corresponding sample metadata, and returns a list for use in all ampvis2 functions. Data must therefore be loaded with amp_load before any other ampvis2 functions can be used.
(required) File path, data frame, or a phyloseq-class object. OTU-table with the read counts of all OTUs. Rows must be OTUs and columns must be samples; transpose the table otherwise. The taxonomy of the OTUs can be placed anywhere in the table and will be extracted by column name (Kingdom/Domain -> Species). If a file path is provided, the function attempts to read it with either fread (delimited text files) or read_excel (Excel files). Compressed files (zip, bzip2, gzip) are supported unless the file is an Excel file (bzip2 and gzip require data.table 1.14.3 or later). Can also be a path to a BIOM file, which is then parsed using the biomformat package, so both the JSON and HDF5 versions of the BIOM format are supported.
(recommended) File path or a data frame. Sample metadata with any information about the samples. The first column must contain sample IDs matching those in the otutable. If none is provided, dummy metadata will be created. Can be a data frame, matrix, or path to a delimited text file or Excel file, which will be read using fread or read_excel, respectively. Compressed files (zip, bzip2, gzip) are supported unless the file is an Excel file (bzip2 and gzip require data.table 1.14.3 or later). If otutable is a BIOM file that contains sample metadata, metadata will take precedence if provided. (default: NULL)
(recommended) File path or a data frame. Taxonomy table where rows are OTUs and columns are up to 7 levels of taxonomy named Kingdom/Domain -> Species. If taxonomy is also present in otutable, that taxonomy is discarded and only this table is used. Can be a data frame, matrix, or path to a delimited text file or Excel file, which will be read using fread or read_excel, respectively. Compressed files (zip, bzip2, gzip) are supported unless the file is an Excel file (bzip2 and gzip require data.table 1.14.3 or later). Can also be a path to a .sintax taxonomy table from a USEARCH analysis pipeline, in which case the file extension must be .sintax. bzip2 and gzip compression are currently NOT supported for the sintax format. (default: NULL)
(optional) Path to a FASTA file with reference sequences for all OTUs in the OTU-table. (default: NULL)
(optional) Path to a phylogenetic tree file which will be read using read.tree, or an object of class "phylo". (default: NULL)
(logical) Remove OTUs only observed once in all samples. (default: FALSE)
(logical) Remove OTUs with 0 abundance in all samples. Absent OTUs are rarely present in the input data itself, but can occur when some samples are removed because of a mismatch between samples in the OTU-table and sample metadata. (default: TRUE)
Character vector with the name(s) of the column in the otutable that contains the OTUs/ASVs.
Character vector with the name(s) of the column in the taxonomy that contains the OTUs/ASVs.
(optional) Additional arguments are passed on to any of the file reader functions used.
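The two logical pruning options described above can be illustrated with a small base R sketch. This is not the ampvis2 implementation; the `counts` matrix is made up, and a singleton is assumed here to mean an OTU with exactly one read in total across all samples.

```r
# Toy count matrix (made-up numbers): rows are OTUs, columns are samples
counts <- matrix(
  c(5, 0, 3,
    0, 0, 0,
    1, 0, 0,
    2, 2, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("OTU_", 1:4), c("S1", "S2", "S3"))
)

# OTUs with 0 reads in every sample
absent <- rowSums(counts) == 0

# Assumption: a singleton is an OTU with exactly one read in total
singleton <- rowSums(counts) == 1

# Drop absent OTUs only
no_absent <- counts[!absent, , drop = FALSE]

# Drop both absent OTUs and singletons
no_singletons <- counts[!(absent | singleton), , drop = FALSE]

rownames(no_absent)
rownames(no_singletons)
```

Using `drop = FALSE` keeps the result a matrix even if only one OTU remains after pruning.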
A list of class "ampvis2" with 3 to 5 elements.
The amp_load function validates and corrects the provided data frames in different ways to make them suitable for the rest of the ampvis2 functions. For this to work properly, the provided data frames must meet the requirements described in the following sections. If a phyloseq-class object is provided, the metadata, taxonomy, fasta, and tree arguments are ignored, as these are expected to be provided in the phyloseq object.
The OTU-table contains information about the OTUs, their read counts in each sample, and optionally their assigned taxonomy. The provided OTU-table must be a data frame with the following requirements:
The rows are OTU IDs and the columns are samples.
The last 7 columns are optionally the corresponding taxonomy assigned to the OTUs, named "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species".
The OTU IDs are expected to be in either the row names of the data frame or in a column called "OTU", "ASV", or "#OTU ID". Otherwise the function will stop with a message.
The column names of the data frame are the sample IDs, exactly matching those in the metadata (plus the taxonomy columns named Kingdom -> Species, if present).
Generally avoid special characters and spaces in row- and column names.
A minimal example is available with data("example_otutable").
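As a rough illustration of the layout requirements listed above, the following base R sketch builds a tiny OTU-table by hand and checks it against them. The sample names, counts, and the helper variables (`tax_levels`, `id_cols`, `sample_cols`) are hypothetical and only serve the example; the checks are not the validation that amp_load itself performs.

```r
# Hypothetical toy OTU-table: 2 OTUs, 2 samples, followed by 7 taxonomy columns
otutable <- data.frame(
  OTU = c("OTU_1", "OTU_2"),
  Sample_A = c(23L, 675L),
  Sample_B = c(15L, 565L),
  Kingdom = "k__Bacteria",
  Phylum = c("p__Chloroflexi", "p__Actinobacteria"),
  Class = c("c__SJA-15", "c__Actinobacteria"),
  Order = c("o__C10_SB1A", "o__Micrococcales"),
  Family = c("f__C10_SB1A", "f__Intrasporangiaceae"),
  Genus = c("g__Candidatus Amarilinum", "g__Tetrasphaera"),
  Species = "s__",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

tax_levels <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
id_cols <- c("OTU", "ASV", "#OTU ID")

# Is there a recognised OTU ID column?
has_id_column <- any(id_cols %in% colnames(otutable))

# Everything that is neither an ID column nor taxonomy is treated as a sample
sample_cols <- setdiff(colnames(otutable), c(tax_levels, id_cols))

# Read counts should be numeric
all_counts_numeric <- all(vapply(otutable[sample_cols], is.numeric, logical(1)))

# Are the last 7 columns the taxonomy levels, in order?
taxonomy_last <- identical(tail(colnames(otutable), 7), tax_levels)
```

Setting `check.names = FALSE` prevents R from altering sample names that are not syntactically valid R names, such as names starting with a digit.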
The metadata contains additional information about the samples, for example where each sample was taken, date, pH, treatment etc., which is used to compare and group the samples during analysis. The amount of information in the metadata is unlimited and it can contain any number of columns (variables). However, there are a few requirements:
The sample IDs must be in the first column. These sample IDs must match exactly to those in the OTU-table.
Column classes matter: categorical variables should be loaded as character (as.character()) or factor (as.factor()), and continuous variables as numeric (as.numeric()). See below.
Generally avoid special characters and spaces in row- and column names.
If, for example, a column is named "Year" and the entries are simply entered as numbers (2011, 2012, 2013 etc.), then R will automatically consider these as numerical values (as.numeric()) and therefore treat the column as a continuous variable, even though it is a categorical variable and should be loaded with as.factor() or as.character() instead. This has consequences for the analysis as R treats the two differently. Therefore either use the colClasses = argument when loading a csv file or col_types = when loading an Excel file, or manually adjust the column classes afterwards with, for example, metadata$Year <- as.character(metadata$Year).
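The effect of column classes can be seen in a small base R sketch. The inline CSV text and the object names (`csv_text`, `guessed`, `typed`) are made up for illustration; only read.csv with its colClasses argument and as.character() are used.

```r
# Made-up metadata as inline CSV text, so no file is needed
csv_text <- "SampleID,Plant,Year
S1,Aalborg E,2011
S2,Aalborg W,2012
S3,Aalborg E,2013"

# Default behaviour: Year is guessed to be a number (continuous variable)
guessed <- read.csv(text = csv_text)
guessed_class <- class(guessed$Year)

# Option 1: force Year to be character at load time
typed <- read.csv(text = csv_text, colClasses = c(Year = "character"))

# Option 2: adjust the column class afterwards
guessed$Year <- as.character(guessed$Year)

guessed_class
class(typed$Year)
class(guessed$Year)
```

Naming the entry in `colClasses` changes only that column and leaves the classes of the remaining columns to be guessed as usual.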
The amp_load function will automatically use the sample IDs in the first column as row names, but it is important to also have an actual column with sample IDs, so that it is possible to, for example, group by that column during analysis. Any samples that do not match between the otutable and metadata will be removed with a warning.
A minimal example is available with data("example_metadata").
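The sample matching described above can be sketched in base R as follows. This only illustrates the idea of keeping samples present in both tables; it is not the code amp_load runs, and the toy objects `counts` and `metadata` are invented for the example.

```r
# Toy read counts: samples S1-S3 as columns
counts <- data.frame(
  S1 = c(10, 0), S2 = c(3, 7), S3 = c(1, 1),
  row.names = c("OTU_1", "OTU_2")
)

# Toy metadata: sample IDs in the first column; S1 is missing and S4 is extra
metadata <- data.frame(
  SampleID = c("S2", "S3", "S4"),
  Plant = c("A", "B", "B"),
  stringsAsFactors = FALSE
)

# Use the first column as row names while keeping the column itself
rownames(metadata) <- metadata[[1]]

# Samples present in both tables, and those present in only one of them
shared <- intersect(colnames(counts), metadata[[1]])
unmatched <- setdiff(union(colnames(counts), metadata[[1]]), shared)
if (length(unmatched) > 0) {
  message("Unmatched samples: ", paste(unmatched, collapse = ", "))
}

# Subset both tables to the shared samples, in the same order
counts_matched <- counts[, shared, drop = FALSE]
metadata_matched <- metadata[shared, , drop = FALSE]
```

After subsetting, the column order of `counts_matched` matches the row order of `metadata_matched`, and the `SampleID` column is still available for grouping.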
amp_load, amp_filter_samples, amp_filter_taxa
library(ampvis2)
if (FALSE) {
  # Load data by either giving file paths or by passing already loaded R objects
  ### example load with file paths
  d <- amp_load(
    otutable = "path/to/otutable.tsv",
    metadata = "path/to/metadata.xlsx",
    taxonomy = "path/to/taxonomy.txt"
  )
  ### example load with R objects
  # Read the OTU-table as a data frame. It is important to set check.names = FALSE
  myotutable <- read.delim("data/otutable.txt", check.names = FALSE)
  # Read the metadata, probably an excel sheet (requires the readxl package)
  mymetadata <- readxl::read_excel("data/metadata.xlsx", col_names = TRUE)
  # Read the taxonomy
  mytaxonomy <- read.csv("data/taxonomy.csv", check.names = FALSE)
  # Combine the data with amp_load()
  d <- amp_load(
    otutable = myotutable,
    metadata = mymetadata,
    taxonomy = mytaxonomy,
    pruneSingletons = FALSE,
    fasta = "path/to/fastafile.fa", # optional
    tree = "path/to/tree.tree" # optional
  )
  ### Load a phyloseq object
  d <- amp_load(physeq_object)
  ### Show a short summary about the data by simply typing the name of the object in the console
  d
}
### Minimal example metadata:
data("example_metadata")
example_metadata
#> # A tibble: 8 × 4
#> SampleID Plant Date Year
#> <chr> <chr> <dttm> <dbl>
#> 1 16SAMP_3893 Aalborg E 2014-02-06 00:00:00 2014
#> 2 16SAMP_3913 Aalborg E 2014-07-03 00:00:00 2014
#> 3 16SAMP_3941 Aalborg E 2014-08-19 00:00:00 2014
#> 4 16SAMP_3946 Aalborg E 2014-11-13 00:00:00 2014
#> 5 16SAMP_3953 Aalborg W 2014-02-04 00:00:00 2014
#> 6 16SAMP_4591 Aalborg W 2014-05-05 00:00:00 2014
#> 7 16SAMP_4597 Aalborg W 2014-08-18 00:00:00 2014
#> 8 16SAMP_4603 Aalborg W 2014-11-12 00:00:00 2014
### Minimal example otutable:
data("example_otutable")
example_otutable
#> 16SAMP_3893 16SAMP_3913 16SAMP_3941 16SAMP_3946 16SAMP_3953 16SAMP_4591
#> OTU_1 23 15 273 51 127 190
#> OTU_2 675 565 331 411 430 780
#> OTU_3 780 733 405 199 1346 1114
#> OTU_4 272 233 1434 256 736 1338
#> OTU_5 560 339 509 598 223 145
#> OTU_6 906 766 133 390 232 1458
#> OTU_7 297 218 418 130 1354 198
#> OTU_8 28 8 155 72 156 101
#> OTU_9 0 0 9 0 19 25
#> OTU_10 373 256 19 415 43 102
#> 16SAMP_4597 16SAMP_4603 Kingdom Phylum
#> OTU_1 220 83 k__Bacteria p__Chloroflexi
#> OTU_2 699 820 k__Bacteria p__Actinobacteria
#> OTU_3 1630 112 k__Bacteria p__Actinobacteria
#> OTU_4 1224 564 k__Bacteria p__Proteobacteria
#> OTU_5 212 1619 k__Bacteria p__Chloroflexi
#> OTU_6 560 287 k__Bacteria p__Firmicutes
#> OTU_7 283 116 k__Bacteria p__Actinobacteria
#> OTU_8 151 25 k__Bacteria p__Nitrospirae
#> OTU_9 58 0 k__Bacteria p__Bacteroidetes
#> OTU_10 73 138 k__Bacteria p__Bacteroidetes
#> Class Order Family
#> OTU_1 c__SJA-15 o__C10_SB1A f__C10_SB1A
#> OTU_2 c__Actinobacteria o__Micrococcales f__Intrasporangiaceae
#> OTU_3 c__Acidimicrobiia o__Acidimicrobiales f__Microthricaceae
#> OTU_4 c__Betaproteobacteria o__Rhodocyclales f__Rhodocyclaceae
#> OTU_5 c__Anaerolineae o__Anaerolineales f__Anaerolineaceae
#> OTU_6 c__Bacilli o__Lactobacillales f__Carnobacteriaceae
#> OTU_7 c__Acidimicrobiia o__Acidimicrobiales f__Microthricaceae
#> OTU_8 c__Nitrospira o__Nitrospirales f__Nitrospiraceae
#> OTU_9 c__Sphingobacteriia o__Sphingobacteriales f__Saprospiraceae
#> OTU_10 c__Sphingobacteriia o__Sphingobacteriales f__Saprospiraceae
#> Genus Species OTU
#> OTU_1 g__Candidatus Amarilinum s__ OTU_1
#> OTU_2 g__Tetrasphaera s__ OTU_2
#> OTU_3 g__Candidatus Microthrix s__ OTU_3
#> OTU_4 g__Dechloromonas s__ OTU_4
#> OTU_5 g__Candidatus Villogracilis s__ OTU_5
#> OTU_6 g__Trichococcus s__ OTU_6
#> OTU_7 g__Candidatus Microthrix s__ OTU_7
#> OTU_8 g__Nitrospira s__sublineage I OTU_8
#> OTU_9 g__QEDR3BF09 s__ OTU_9
#> OTU_10 g__MK04 s__ OTU_10
### Minimal example taxonomy:
data("example_taxonomy")
example_taxonomy
#> Kingdom Phylum Class
#> OTU_1 k__Bacteria p__Chloroflexi c__SJA-15
#> OTU_2 k__Bacteria p__Actinobacteria c__Actinobacteria
#> OTU_3 k__Bacteria p__Actinobacteria c__Acidimicrobiia
#> OTU_4 k__Bacteria p__Proteobacteria c__Betaproteobacteria
#> OTU_5 k__Bacteria p__Chloroflexi c__Anaerolineae
#> OTU_6 k__Bacteria p__Firmicutes c__Bacilli
#> OTU_7 k__Bacteria p__Actinobacteria c__Acidimicrobiia
#> OTU_8 k__Bacteria p__Nitrospirae c__Nitrospira
#> OTU_9 k__Bacteria p__Bacteroidetes c__Sphingobacteriia
#> OTU_10 k__Bacteria p__Bacteroidetes c__Sphingobacteriia
#> Order Family Genus
#> OTU_1 o__C10_SB1A f__C10_SB1A g__Candidatus Amarilinum
#> OTU_2 o__Micrococcales f__Intrasporangiaceae g__Tetrasphaera
#> OTU_3 o__Acidimicrobiales f__Microthricaceae g__Candidatus Microthrix
#> OTU_4 o__Rhodocyclales f__Rhodocyclaceae g__Dechloromonas
#> OTU_5 o__Anaerolineales f__Anaerolineaceae g__Candidatus Villogracilis
#> OTU_6 o__Lactobacillales f__Carnobacteriaceae g__Trichococcus
#> OTU_7 o__Acidimicrobiales f__Microthricaceae g__Candidatus Microthrix
#> OTU_8 o__Nitrospirales f__Nitrospiraceae g__Nitrospira
#> OTU_9 o__Sphingobacteriales f__Saprospiraceae g__QEDR3BF09
#> OTU_10 o__Sphingobacteriales f__Saprospiraceae g__MK04
#> Species OTU
#> OTU_1 s__ OTU_1
#> OTU_2 s__ OTU_2
#> OTU_3 s__ OTU_3
#> OTU_4 s__ OTU_4
#> OTU_5 s__ OTU_5
#> OTU_6 s__ OTU_6
#> OTU_7 s__ OTU_7
#> OTU_8 s__sublineage I OTU_8
#> OTU_9 s__ OTU_9
#> OTU_10 s__ OTU_10
# load example data
d <- amp_load(
  otutable = example_otutable,
  metadata = example_metadata,
  taxonomy = example_taxonomy
)
# show a summary of the data
d
#> ampvis2 object with 3 elements.
#> Summary of OTU table:
#> Samples OTUs Total#Reads Min#Reads Max#Reads Median#Reads
#> 8 10 32246 2522 5451 3839
#> Avg#Reads
#> 4030.75
#>
#> Assigned taxonomy:
#> Kingdom Phylum Class Order Family Genus Species
#> 10(100%) 10(100%) 10(100%) 10(100%) 10(100%) 10(100%) 1(10%)
#>
#> Metadata variables: 4
#> SampleID, Plant, Date, Year