Calculates basic statistics such as the N50 length, the minimum, mean, and maximum scaffold lengths, the mean GC content, the average coverage, and information about essential genes.

mmstats(mm, original_data = NULL, print = TRUE)

Arguments

mm

(required) A dataframe loaded with mmload.

original_data

If mm is a subset/extraction of another dataframe loaded with mmload, provide the original dataframe here to compare the extraction with the original data. See examples. (Default: NULL)

print

(logical) Whether to print the calculated stats or not. (Default: TRUE)

Value

A dataframe with the calculated stats is returned invisibly.
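
As a minimal sketch of what an invisible return value means in practice, the base R example below uses a hypothetical stand-in function, toy_stats, which is not part of mmgenome2 and does not reproduce the layout of the dataframe returned by mmstats. It only illustrates that the result is not auto-printed at the console but can still be assigned to a variable and used afterwards.

```r
# toy_stats is a hypothetical stand-in, NOT the mmstats implementation.
toy_stats <- function(lengths, print = TRUE) {
  stats <- data.frame(
    Scaffolds = length(lengths),
    Length.total = sum(lengths)
  )
  # Print only when asked to, mirroring the role of the print argument
  if (print) print(stats)
  # invisible() suppresses auto-printing but still returns the value
  invisible(stats)
}

# Nothing is printed here, yet the result can be captured and reused
res <- toy_stats(c(1000, 2500, 4000), print = FALSE)
res$Length.total
```

Under the same assumption, a call such as mmstats(mm, print = FALSE) could be assigned to a variable in the same way.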

Details

The returned stats are calculated as follows:

  • Scaffolds: The number of different scaffolds in the assembly.

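  • N50: The shortest sequence length at 50% of the total assembly length, i.e. the length of the shortest scaffold among the largest scaffolds that together make up at least half of the assembly.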

  • Length.total: The total size of the assembly.

  • Length.max: The size of the largest scaffold in the assembly.

  • Length.mean: The average scaffold size in the assembly.

  • Length.min: The size of the smallest scaffold in the assembly.

  • weighted_GC_mean: The average GC content in the assembly, weighted by scaffold sizes.

  • cov_*: The average coverage of each coverage variable in mm, weighted by scaffold sizes. (Only columns starting with "cov_" will be shown)

  • Ess.genes.total: The total number of essential genes, if any have been loaded.

  • Ess.genes.unique: The number of unique essential genes, if any have been loaded.
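
The N50 and the length-weighted means described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the code mmstats uses internally: the toy dataframe, its column names (length, gc), and the helper n50 are all hypothetical.

```r
# Hypothetical toy data: scaffold lengths (bp) and GC content (%)
scaffolds <- data.frame(
  length = c(100, 200, 300, 400, 500),
  gc     = c(40, 50, 60, 50, 45)
)

# N50: sort lengths in decreasing order, accumulate them, and take the
# first length at which the running total reaches half the assembly size.
n50 <- function(lengths) {
  sorted <- sort(lengths, decreasing = TRUE)
  sorted[which(cumsum(sorted) >= sum(sorted) / 2)[1]]
}

n50(scaffolds$length)
# 400, since 500 + 400 = 900 is the first running total >= 750

# GC content averaged with scaffold lengths as weights, so that long
# scaffolds contribute more than short ones
weighted.mean(scaffolds$gc, w = scaffolds$length)
```

The same weighted.mean call applied to a coverage column would give a length-weighted coverage in the sense of the cov_* entries.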

Author

Kasper Skytte Andersen ksa@bio.aau.dk

Soren M. Karst smk@bio.aau.dk

Mads Albertsen MadsAlbertsen85@gmail.com

Examples

library(mmgenome2)
data(mmgenome2)
mmstats(mmgenome2)
#>                  General stats
#> Scaffolds               97 285
#> N50                      5 291
#> Length.total       331 908 376
#> Length.max           1 446 979
#> Length.mean           3 411.71
#> Length.min               1 000
#> weighted_GC_mean         52.22
#> cov_C13.11.14             2.62
#> cov_C13.11.25            16.06
#> cov_C13.12.03             2.91
#> cov_C14.01.09            11.58
#> Ess.genes.total          7 138
#> Ess.genes.unique           109

# Compare an extraction with the original data from which the extraction originates:
selection <- data.frame(
  cov_C13.11.25 = c(7.2, 16.2, 25.2, 23.3, 10.1),
  cov_C14.01.09 = c(47, 77, 52.8, 29.5, 22.1)
)
mmgenome2_extraction <- mmextract(mmgenome2,
  selection = selection
)
#> 61 scaffolds (or 1.34% of the scaffolds in mm, weighted by length) remain after 97224 of 97285 scaffolds have been filtered.
mmstats(mmgenome2_extraction, original_data = mmgenome2)
#>                               General stats
#> Scaffolds                                61
#> N50                                 199 460
#> Length.total                      4 452 901
#> Length.max                          454 223
#> Length.mean                       72 998.38
#> Length.min                            1 043
#> weighted_GC_mean                      70.96
#> cov_C13.11.14 (% of original)           0.3
#> cov_C13.11.25 (% of original)          1.11
#> cov_C13.12.03 (% of original)          3.19
#> cov_C14.01.09 (% of original)          4.49
#> Ess.genes.total                         109
#> Ess.genes.unique                        105