Searches for all connections between the given set of scaffolds and the scaffolds in mm. The given set of scaffolds is often a subset of mm, but can also be a manually selected set of scaffolds from which to find connections in mm.

mmexpand_network(
  mm,
  scaffolds,
  network,
  min_connections = 2,
  include_connections = "direct"
)

Arguments

mm

(required) A dataframe loaded with mmload in which to find connections from scaffolds.

scaffolds

(required) The scaffolds from which to find connections in mm. Must be a vector of scaffold names or a dataframe with scaffold names in the first column.

network

(required) Paired-end or mate-pair connections between scaffolds in long format. The first and second columns must contain all connected scaffold pairs and the third column the number of connections.

min_connections

Filter all scaffold pairs with equal to or less than this number of connections before the extraction. (Default: 2)

include_connections

The connections to include. One of the following:

"direct"

Extract only scaffolds from mm that are directly connected to any of the scaffolds in scaffolds. (default)

"all"

Same as "direct", except that any further connections to connected scaffolds are also included, continuing until no further connections are found.

Value

A dataframe (tibble) compatible with other mmgenome2 functions.

Author

Kasper Skytte Andersen ksa@bio.aau.dk

Soren M. Karst smk@bio.aau.dk

Mads Albertsen MadsAlbertsen85@gmail.com

Examples

library(mmgenome2) data(mmgenome2) data(paired_ends) mmgenome2
#> # A tibble: 97,285 x 13 #> scaffold length gc cov_C13.11.14 cov_C13.11.25 cov_C13.12.03 cov_C14.01.09 #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 1 8264 57.8 1.44 53.6 0 0.066 #> 2 2 1027 57.0 0.625 24.2 0 0 #> 3 3 1665 55.9 13.5 434. 0.166 0.177 #> 4 4 9056 35.9 0.01 23.4 0 0 #> 5 5 3343 64.0 3.20 16.4 0 0 #> 6 6 98207 39.1 0.00966 24.5 3.29 9.85 #> 7 7 6480 63.0 2.61 19.2 1.46 12.3 #> 8 8 15790 61.7 2.78 21.2 1.62 10.3 #> 9 9 1403 70.4 85.1 192. 0 0 #> 10 10 2018 70.2 50.3 101. 0 0 #> # … with 97,275 more rows, and 6 more variables: PC1 <dbl>, PC2 <dbl>, #> # PC3 <dbl>, geneID <chr>, taxonomy <fct>, rRNA16S <fct>
selection <- data.frame( cov_C13.11.25 = c(7.2, 16.2, 25.2, 23.3, 10.1), cov_C14.01.09 = c(47, 77, 52.8, 29.5, 22.1) ) mmgenome2_extraction <- mmextract(mmgenome2, min_length = 3000, selection = selection, inverse = FALSE )
#> 47 scaffolds (or 1.33% of the scaffolds in mm, weighted by length) remain after 97238 of 97285 scaffolds have been filtered.
mmgenome2_extraction
#> # A tibble: 47 x 13 #> scaffold length gc cov_C13.11.14 cov_C13.11.25 cov_C13.12.03 cov_C14.01.09 #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 180 22093 74.0 0.298 12.0 7.34 37.9 #> 2 227 190826 65.4 0.348 13.0 6.91 38.5 #> 3 253 296715 70.2 1.06 15.2 7.00 41.1 #> 4 315 218862 71.7 0.366 12.6 7.17 38.8 #> 5 343 27203 57.6 0.235 10.7 5.07 32.0 #> 6 381 6128 67.2 8.11 22.6 5.89 48.7 #> 7 386 164942 68.2 0.359 12.3 6.73 37.2 #> 8 396 7448 72.6 0.329 13.0 7.59 41.0 #> 9 423 100199 68.7 1.40 15.3 6.71 40.0 #> 10 431 446220 72.4 0.409 12.6 6.80 37.9 #> # … with 37 more rows, and 6 more variables: PC1 <dbl>, PC2 <dbl>, PC3 <dbl>, #> # geneID <chr>, taxonomy <fct>, rRNA16S <fct>
mmgenome2_extraction_expanded <- mmexpand_network( mm = mmgenome2, scaffolds = mmgenome2_extraction, network = paired_ends, min_connections = 10, include_connections = "direct" )
#> The provided 47 scaffolds had 23 additional direct connections to other scaffolds in mm. #> Total: 70 scaffolds (or 1.92% of the scaffolds in mm, weighted by length).
mmgenome2_extraction_expanded
#> # A tibble: 70 x 13 #> scaffold length gc cov_C13.11.14 cov_C13.11.25 cov_C13.12.03 cov_C14.01.09 #> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 180 22093 74.0 0.298 12.0 7.34 37.9 #> 2 181 2970 68.0 0.34 13.9 7.37 38.6 #> 3 227 190826 65.4 0.348 13.0 6.91 38.5 #> 4 253 296715 70.2 1.06 15.2 7.00 41.1 #> 5 315 218862 71.7 0.366 12.6 7.17 38.8 #> 6 343 27203 57.6 0.235 10.7 5.07 32.0 #> 7 381 6128 67.2 8.11 22.6 5.89 48.7 #> 8 386 164942 68.2 0.359 12.3 6.73 37.2 #> 9 396 7448 72.6 0.329 13.0 7.59 41.0 #> 10 423 100199 68.7 1.40 15.3 6.71 40.0 #> # … with 60 more rows, and 6 more variables: PC1 <dbl>, PC2 <dbl>, PC3 <dbl>, #> # geneID <chr>, taxonomy <fct>, rRNA16S <fct>