Smithsonian Tropical Research Institute, Apartado 0843-03092, Balboa, República de Panamá
To whom correspondence should be addressed. E-mail: jarrod.jude.scott@gmail.com or drasher@bigelow.org
Abstract
Workflow to generate phyloseq objects, analyze data, and create figures for the fish microbiome paper, processed using the previous DADA2 workflow. Here we report both methods and results in plain text and R code. You can choose to show or hide all code using the button in the upper right-hand corner of the front page (default is hide), as well as download the .Rmd file to tweak the code. The document represents a completely reproducible workflow of our study. All you need to do to run this workflow is install the appropriate R packages and download the following file:
We made several additional data products and processing scripts available.
This document is interactive. You can sort and scroll through most of the tables, and the phylogenetic tree is zoomable. In the upper right-hand corner of the front page is a Code button. Use this to show or hide all the code in the document (default is hide), as well as to download the .Rmd file, which you can use to extract the code. Data products from the final paper are highlighted in red.
Before we proceed with microbial community analysis, we will run some analyses on field-based behavioral assays of the different herbivorous reef fish species.
In this first part we go through the steps of defining sample groups, creating phyloseq objects, removing unwanted samples, and removing contaminant ASVs. Various parts of this section can easily be modified to perform different analyses. For example, if you were only interested in a specific taxa or group of samples, you could change the code here to create new phyloseq objects.
In the second part, we assess taxonomic composition as well as alpha and beta diversity. Phyloseq offers many options for assessing diversity, including several alpha diversity metrics, additional ordination and distance methods, and so on. You can play around with these settings to see how they affect the results.
We wanted to understand how ASVs partitioned across host species. We also wanted to assess the specificity of each ASV to determine habitat preference. To our knowledge there is no quantitative way to do this. The only attempt we are aware of was MetaMetaDB, but it is based on a 454 database and no longer seems to be in active development. So we used an approach based on the work of Sullam et al., first identifying differentially abundant ASVs, then searching for closest database hits, and finally using phylogenetic analysis and top hit metadata (isolation source, natural host) to infer habitat preference.
In this section we pull together the results from Part III and try to make sense of the microbiomes from these herbivorous reef fish. How are ASVs partitioning across host? How similar are these ASVs to sequences from other studies? What can these patterns tell us about host specificity?
All tables and figures presented below are named as they appeared in the original publication. We also include many additional data products that were not part of the original publication.
Throughout this workflow we are going to rely on color to help us tell our story. We will use color to delineate host fish species, but more importantly, to delineate microbial taxa. Microbial diversity is pretty vast and it can be difficult to display all of this diversity in a single, static figure. Additionally, many of us have a decreased ability to see color or differences in color. So it is not only important to use relatively few colors but also a color-blind-friendly palette. For our figures, we generated palettes based on Bang Wong’s scheme described in this paper. Wong’s color scheme uses contrasting colors that can be distinguished by folks with color vision deficiency—roughly 8% of people (mostly males) are color blind. Do you want Keanu Reeves to understand your figures or not?
This scheme is conservative—there are only 7 colors. We added black and grey to give us a little wiggle room. Others have developed 12 and 15 color palette schemes, but be careful—figures with too many colors can inhibit our ability to discern patterns. Our conservative palette forces us to choose carefully when deciding which taxa to target or how many groups to display. Here we will create two palettes—one for microbial taxa with all the colors and another for the five host fish species. The latter is just a subset of the full palette. Here is the code:
#Full palette
friend_pal <- c("#009E73", "#D55E00", "#F0E442", "#CC79A7", "#56B4E9",
"#E69F00", "#0072B2", "#7F7F7F", "#B6DBFF", "#000000")
#Fish palette
samp_pal <- c("#CC79A7", "#0072B2", "#009E73", "#56B4E9", "#E69F00")
To characterize the foraging ecology of the different species of herbivorous fishes, we characterized in detail individual bites by each species at three sites in the Florida Keys (Conch, French, Molasses reefs) during the Boreal summers of 2014 and 2016. At each site, we haphazardly selected focal fish over a wide range of sizes and then randomly selected a single bite by each individual to describe (see Supplementary Table 2 for sample sizes). For each bite, we identified the food item(s) targeted as well as characteristics of the substrate (e.g., hard bottom vs. other common substrates such as sponges, gorgonians, etc.) at the precise location of the bite. For hard substrates, we recorded whether a bite was on a convex, concave, or flat surface, and whether that surface was oriented horizontally (< 45 degrees) or vertically (> 45 degrees). In addition, we framed each bite within a 5 x 5 cm micro-quadrat and measured the depth of the sediment and height of the algae at several points to determine the average sediment depth and algal height within the vicinity of the bite. We then manually removed sediments and determined whether the fish left a distinct grazing scar (i.e., where calcium carbonate had been removed from the reef framework in addition to epilithic algae).
To visualize the multivariate patterns of herbivory we used non-metric multidimensional scaling (NMDS) as implemented via the metaMDS function in the vegan package. For each species of herbivorous fish at each site we calculated the proportion of bites focused on each prey item (‘prey variables’) as well as the proportion of bites targeting substrates with different characteristics (e.g., convex vs. concave vs. flat). We also calculated the proportion of bites resulting in a grazing scar. For bites on turf assemblages, we calculated the mean turf height and sediment depth directly adjacent to each bite. Prior to analysis, quantitative variables (e.g., sediment depth and turf height) were rescaled to the range of 0 to 1. In addition, quantitative and categorical variables were rescaled such that they would have similar influence to the ‘prey variables’ by dividing each variable by the number of prey categories. Rescaled data were then analysed via NMDS using a Bray-Curtis dissimilarity matrix.
First, we read in and display the table describing the foraging ecology of these fish.
library(ggthemes)
all_traits <- read.csv("TABLES/INPUT/Mean_bite_characteristics.txt",
header = TRUE, sep = "\t")
ids <- all_traits[, 1:2]
write.table(all_traits, "TABLES/OUTPUT/SUPP/Table_S1.txt",
sep = "\t", row.names = FALSE, quote = FALSE,
col.names = c("Host species", "Site", "Sediment depth (mm)",
"Turf height(mm)", "Prop. mark on substrate",
"Prop. vertical", "Prop. concave", "Prop. convex",
"Articulated red coralline", "CCA/Turf on CCA",
"Dictyota", "Epiphytes", "Gorgonian", "Halimeda",
"Laurencia", "Sponge", "Stypopodium", "Turf"))
datatable(all_traits, rownames = FALSE, width = "100%",
colnames = c("Host species", "Site", "Sediment depth (mm)",
"Turf height(mm)", "Prop. mark on substrate",
"Prop. vertical", "Prop. concave", "Prop. convex",
"Articulated red coralline", "CCA/Turf on CCA",
"Dictyota", "Epiphytes", "Gorgonian", "Halimeda",
"Laurencia", "Sponge", "Stypopodium", "turf"),
caption = htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Supplementary Table 1: ",
htmltools::em("INSERT DESCRIPTION.")),
extensions = "Buttons",
options = list(columnDefs = list(list(
className = "dt-left", targets = 0)), dom = "Blrtip",
pageLength = 5, lengthMenu = c(5, 10, 15),
buttons = c("csv", "copy"), scrollX = TRUE,
scrollCollapse = TRUE))
Next we conduct the NMDS analysis of the feeding data after standardizing the variables so that they have similar weights. Our process is as follows: a) rescale quantitative traits to be in the range 0 to 1 and divide by the number of diet categories to have similar influence as the diet variables; b) rescale all ‘non diet’ traits to have similar influence to the diet traits by dividing by the number of diet categories divided by the number of categories for each substrate characteristic; c) combine these data into a single data frame for NMDS analysis; d) run the NMDS analysis.
# a) rescale quantitative traits (sediment depth, turf height) to 0-1, then
# divide by the number of prey categories (10)
quant_traits_std <- decostand(all_traits[, 3:4], "range") / 10
# b) rescale the substrate-characteristic proportions by the number of prey
# categories (10) divided by the number of categories per characteristic (2 or 3)
Mean_prop_mark_on_substrate_std <- all_traits[, 5] / (10 / 2)
prop_vertical_std <- all_traits[, 6] / (10 / 2)
prop_concave_std <- all_traits[, 7] / (10 / 3)
prop_convex_std <- all_traits[, 8] / (10 / 3)
all_traits_std <- cbind(quant_traits_std,
Mean_prop_mark_on_substrate_std,
prop_vertical_std, prop_concave_std,
prop_convex_std, all_traits[, 9:18])
nmds <- metaMDS(all_traits_std, distance = "bray", k = 3, trymax = 40)
Then we convert the NMDS data into a dataframe and inspect the Shepard plot.
NMDS <- data.frame(NMDS1 = nmds$points[, 1],
NMDS2 = nmds$points[, 2],
MDS3 = nmds$points[, 3])
stressplot(nmds)
Finally, we generate the environmental vectors (correlations of variables with ordination axes).
set.seed(1)
vec_sp <- envfit(nmds$points, all_traits[, 3:18],
perm = 1000, choices = c(1, 2, 3))
spp_scrs <- as.data.frame(scores(vec_sp, display = "vectors"))
spp_scrs$species <- rownames(spp_scrs)
colnames(spp_scrs) <- c("S_MDS1", "S_MDS2", "SMDS3", "species")
Now we can plot the results and generate Figure 1 from the manuscript. For all figures generated in this workflow, we coded as much formatting as we could with R and then made minor stylistic changes in Inkscape. Thus, what you see here and throughout are the raw figures.
#Plot results using ggplot2
#Merge with data that we want to use in plotting
NMDS <- cbind(ids, NMDS)
stuff <- ggplot(NMDS) +
geom_point(mapping = aes(x = NMDS1, y = NMDS2,
shape = Site, colour = Species), size = 4) +
geom_segment(aes(x = 0, y = 0, xend = S_MDS1, yend = S_MDS2),
data = spp_scrs, arrow = arrow(length = unit(0.5, "cm")),
colour = "black", inherit.aes = FALSE) +
geom_text(data = spp_scrs, aes(x = S_MDS1, y = S_MDS2,
label = species), size = 3) +
labs(title = "bray distance on all traits (standardized)",
subtitle = "Stress = 0.04")
#####Now subset just the spp. scrs with the highest correlations
spp_scrs_sub <- spp_scrs[-c(5, 11, 12), ]
#Categorize species scores into different variable types for plotting
variable_type <- as.factor(c(rep("substrate", 5), rep("algae", 8)))
spp_scrs_sub <- cbind(variable_type, spp_scrs_sub)
spp_scrs_substrate <- subset(spp_scrs_sub,
spp_scrs_sub$variable_type == "substrate")
spp_scrs_algae <- subset(spp_scrs_sub,
spp_scrs_sub$variable_type == "algae")
fig_1 <- ggplot(NMDS) +
geom_point(mapping = aes(x = NMDS1, y = NMDS2, shape = Site,
colour = Species), size = 4) +
scale_colour_manual(values = samp_pal) +
geom_segment(aes(x = 0, y = 0, xend = S_MDS1, yend = S_MDS2),
data = spp_scrs_substrate,
arrow = arrow(length = unit(0.5, "cm")), color = "red") +
geom_text(data = spp_scrs_substrate, aes(x = S_MDS1, y = S_MDS2,
label = species),
nudge_y = c(-0.025, -0.025, -0.05, 0.05, 0.05)) +
geom_segment(aes(x = 0, y = 0, xend = S_MDS1, yend = S_MDS2),
data = spp_scrs_algae,
arrow = arrow(length = unit(0.5, "cm")), color = "black") +
geom_text(data = spp_scrs_algae, aes(x = S_MDS1, y = S_MDS2,
label = species),
nudge_y = c(-0.025, 0.025, 0.025, 0.025,
0.025, -0.025, -0.05, -0.025)) +
theme_classic(base_size = 15)
fig_1 <- fig_1 + coord_fixed()
fig_1
Ok, on to the microbial data. First, we load the data packet produced by the final step of the DADA2 workflow (Combine_Runs.Rmd), format sample names, and define groupings. We will use the sample names to define the different groups.
load("combo_pipeline.rdata")
samples.out <- rownames(seqtab)
subject <- sapply(strsplit(samples.out, "[[:digit:]]"), `[`, 1)
# this splits the string at first instance of a digit
# use the whole string for individuals
# use the first two letters for genus
# use the next three letters for species
sample_name <- substr(samples.out, 1, 999)
genus <- substr(samples.out, 1, 2)
species <- substr(samples.out, 1, 5)
Groups
And finally we define a sample data frame that holds the different groups we extracted from the sample names. On the right are a few samples and their different groups names.
#define a sample data frame
samdf <- data.frame(SamName = sample_name, Gen = genus, Sp = species)
rownames(samdf) <- samples.out
kable(samdf[c(1, 13, 20, 30, 44), 1:3], row.names = FALSE) %>%
kable_styling(bootstrap_options = "striped",
full_width = FALSE, position = "float_right") %>%
column_spec(1:3, width = "3.5cm")
SamName | Gen | Sp |
---|---|---|
AcCoe01 | Ac | AcCoe |
AcTra05 | Ac | AcTra |
ScTae03 | Sc | ScTae |
SpAur02 | Sp | SpAur |
SpVir07 | Sp | SpVir |
Abbreviations:
Next we create a phyloseq (ps) object with the Silva (slv) taxonomy. There is also a Greengenes (gg) annotation in the output file from DADA2 which can be used instead of the Silva annotation. Just change tax_silva to tax_gg. At this point we rename the amplicon sequence variants (ASVs) so the designations are a bit more user friendly. By default, DADA2 names each ASV by its unique sequence so that data can be directly compared across studies (which is great). But this convention can get cumbersome downstream, so we rename the ASVs using a simpler convention—ASV1, ASV2, ASV3, and so on, while retaining the exact sequences.
The phyloseq object looks like this:
# this creates the phyloseq object
ps_slv <- phyloseq(otu_table(seqtab, taxa_are_rows = FALSE),
sample_data(samdf), tax_table(tax_silva))
# add the full ASV sequence (the current rownames) as an extra column
tax_table(ps_slv) <- cbind(tax_table(ps_slv), rownames(tax_table(ps_slv)))
# rename the ASVs with simpler, unique names (ASV1, ASV2, ...)
taxa_names(ps_slv) <- paste0("ASV", seq(ntaxa(ps_slv)))
# add the new ASV IDs (the updated rownames) as a final column
tax_table(ps_slv) <- cbind(tax_table(ps_slv), rownames(tax_table(ps_slv)))
ps_slv
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12479 taxa and 53 samples ]
## sample_data() Sample Data: [ 53 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 12479 taxa by 8 taxonomic ranks ]
While the ASV names look like this: ASV1, ASV2, ASV3, ASV4, ASV5, ASV6 and so on…
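If you want to double-check the renaming yourself, a quick peek at the first few taxa names (not part of the original chunk) does the trick:
head(taxa_names(ps_slv))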
At this point we have a completely unadulterated phyloseq object because it contains all ASVs and all samples. We add two final columns with the actual ASV sequences and ASV IDs. This will be useful later when trying to export a fasta file. Finally, we export the sequence and taxonomy tables, for posterity's sake.
colnames(tax_table(ps_slv)) <- c("Kingdom", "Phylum", "Class", "Order",
"Family", "Genus", "ASV_SEQ", "ASV_ID")
write.table(tax_table(ps_slv), "TABLES/OUTPUT/PS/full_tax_table.txt",
sep = "\t", quote = FALSE, col.names = NA)
write.table(t(otu_table(ps_slv)), "TABLES/OUTPUT/PS/full_seq_table.txt",
sep = "\t", quote = FALSE, col.names = NA)
write.table(sample_data(ps_slv), "TABLES/OUTPUT/PS/full_sample_data.txt",
sep = "\t", quote = FALSE, row.names = FALSE)
Remember, three of these samples were omitted because we did not have replicates for those host species. Let's remove those samples. The only way we could figure out how to do this was by selecting the samples we wanted to keep. If you want to change the group of samples, modify the script accordingly.
ps_slv_base <- prune_samples(c("SpAur01", "SpAur02", "SpAur03", "SpAur04",
"SpAur10", "SpAur11", "SpAur12", "SpAur13",
"SpVir01", "SpVir02", "SpVir03", "SpVir04",
"SpVir05", "SpVir06", "SpVir07", "SpVir08",
"SpVir09", "SpVir10", "SpVir11", "AcCoe01",
"AcCoe02", "AcCoe03", "AcCoe04", "AcCoe05",
"AcCoe06", "AcCoe07", "AcCoe08", "AcTra01",
"AcTra02", "AcTra03", "AcTra04", "AcTra05",
"AcTra06", "AcTra07", "AcTra08", "AcTra09",
"ScTae01", "ScTae02", "ScTae03", "ScTae04",
"ScTae05", "ScTae06", "ScTae07", "ScTae08",
"ScTae09", "SpAur05", "SpAur06", "SpAur07",
"SpAur08", "SpAur09"), ps_slv)
ps_slv_base
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12479 taxa and 50 samples ]
## sample_data() Sample Data: [ 50 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 12479 taxa by 8 taxonomic ranks ]
OK, three samples gone. But we probably lost some ASVs when we removed those samples, so we need to get rid of any ASVs that now have a total of 0 reads. This will be our working phyloseq object.
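The pruning step itself is hidden in the rendered document; a minimal sketch of how it was presumably done, using phyloseq's prune_taxa and the object name referenced below:
# keep only ASVs with at least one read across the remaining 50 samples
ps_slv_work <- prune_taxa(taxa_sums(ps_slv_base) > 0, ps_slv_base)
ps_slv_work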
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 12040 taxa and 50 samples ]
## sample_data() Sample Data: [ 50 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 12040 taxa by 8 taxonomic ranks ]
Great, there were 439 ASVs found only in those three samples.
We can also export the sequence and taxonomy tables for our trimmed dataset and get a quick summary of the trimmed dataset before removing unwanted reads.
write.table(tax_table(ps_slv_work), "TABLES/OUTPUT/PS/trim_tax_table.txt",
sep = "\t", quote = FALSE, col.names = NA)
write.table(t(otu_table(ps_slv_work)), "TABLES/OUTPUT/PS/trim_seq_table.txt",
sep = "\t", quote = FALSE, col.names = NA)
write.table(sample_data(ps_slv_work), "TABLES/OUTPUT/PS/trim_sample_data.txt",
sep = "\t", quote = FALSE, row.names = FALSE)
# general stats for the dataset.
sample_sum_df_raw <- data.frame(sum = sample_sums(ps_slv_work))
total_reads_raw <- sum(otu_table(ps_slv_work))
smin_raw <- as.integer(min(sample_sums(ps_slv_work)))
smean_raw <- as.integer(mean(sample_sums(ps_slv_work)))
smax_raw <- as.integer(max(sample_sums(ps_slv_work)))
Looks like the total number of reads in the dataset (after removing unwanted samples) is 3120211, with a range of 12178 to 182696 reads per sample and an average of 62404 reads per sample.
These samples are intestinal communities and we assume that Chloroplast reads are not contributing to host metabolism. These data could be useful later, but for now let's create a phyloseq object without Chloroplast.
WARNING: the subset_taxa command removes anything that is NA for the specified taxonomic level or above. For example, let's say you run the subset_taxa command using Order != "Chloroplast". It seems like you should get a phyloseq object with everything except Chloroplast, but in fact the command not only gets rid of Chloroplast but also of everything else that has NA for Order and above. In our experience this is not well documented and we had to dig through the files to figure out what was happening.
Our dataset has 590 Chloroplast ASVs and running the command as is removed an additional 1244 ASVs. So let's see if we can get rid of just the Chloroplast ASVs without removing everything that is unclassified at Order and above. To do this, we subset the taxa to generate a ps object of just Chloroplast, select the ASV column only, turn it into a factor, and use this to remove Chloroplast from the ps object.
# generate a file with Chloroplast ASVs
chloro_p_ps <- subset_taxa(ps_slv_work, Order == "Chloroplast")
chloro_p_tab <- as(tax_table(chloro_p_ps), "matrix")
chloro_p_tab <- chloro_p_tab[, 8]
chloro_p_df <- as.factor(chloro_p_tab)
goodTaxaCH <- setdiff(taxa_names(ps_slv_work), chloro_p_df)
ps_slv_work_no_cyano <- prune_taxa(goodTaxaCH, ps_slv_work)
ps_slv_work_no_cyano
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 11450 taxa and 50 samples ]
## sample_data() Sample Data: [ 50 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 11450 taxa by 8 taxonomic ranks ]
###### Summarize data
total_asv_chloro_p <- length(chloro_p_df)
sample_sum_chloro_p_ps <- data.frame(sum = sample_sums(chloro_p_ps))
total_reads_chloro_p_ps <- sum(otu_table(chloro_p_ps))
smin_chloro_p_ps <- min(sample_sums(chloro_p_ps))
smean_chloro_p_ps <- mean(sample_sums(chloro_p_ps))
smax_chloro_p_ps <- max(sample_sums(chloro_p_ps))
write.table(tax_table(chloro_p_ps),
"TABLES/OUTPUT/PS/chloroplast_tax_table.txt",
sep = "\t", quote = FALSE, col.names = NA)
write.table(t(otu_table(chloro_p_ps)),
"TABLES/OUTPUT/PS/chloroplast_seq_table.txt",
sep = "\t", quote = FALSE, col.names = NA)
write.table(sample_data(chloro_p_ps),
"TABLES/OUTPUT/PS/chloroplast_sample_data.txt",
sep = "\t", quote = FALSE, row.names = FALSE)
This step removed 590 Chloroplast ASVs encompassing 149274 total reads. Perfect.
And now we use the same approach to remove Mitochondria.
# generate a file with mitochondria ASVs
MT1_ps <- subset_taxa(ps_slv_work_no_cyano, Family == "Mitochondria")
MT1 <- as(tax_table(MT1_ps), "matrix")
MT1 <- MT1[, 8]
MT1df <- as.factor(MT1)
goodTaxa <- setdiff(taxa_names(ps_slv_work_no_cyano), MT1df)
ps_slv_work_filt <- prune_taxa(goodTaxa, ps_slv_work_no_cyano)
ps_slv_work_filt
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 11144 taxa and 50 samples ]
## sample_data() Sample Data: [ 50 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 11144 taxa by 8 taxonomic ranks ]
###### Summarize data
total_asv_MT1 <- length(MT1df)
# colnames(tax_table(MT1_ps))
sample_sum_MT1_ps <- data.frame(sum = sample_sums(MT1_ps))
total_reads_MT1_ps <- sum(otu_table(MT1_ps))
smin_MT1_ps <- min(sample_sums(MT1_ps))
smean_MT1_ps <- mean(sample_sums(MT1_ps))
smax_MT1_ps <- max(sample_sums(MT1_ps))
Sweet, looks like this removed 306 Mitochondria ASVs encompassing 35941 total reads.
# general stats for the dataset.
#colnames(tax_table(ps_slv_work_filt))
sample_sum_df <- data.frame(sum = sample_sums(ps_slv_work_filt))
total_reads <- sum(otu_table(ps_slv_work_filt))
smin <- as.integer(min(sample_sums(ps_slv_work_filt)))
smean <- as.integer(mean(sample_sums(ps_slv_work_filt)))
smax <- as.integer(max(sample_sums(ps_slv_work_filt)))
After removing contaminants, here is what the final dataset looks like: 2934996 total reads across 11144 ASVs from 50 samples.
One last thing to do is to create a merged phyloseq object where samples are grouped by host species. This will come in handy later for some analyses.
mergedGP <- merge_samples(ps_slv_work_filt, "Sp")
SD <- merge_samples(sample_data(ps_slv_work_filt), "Sp")
mergedGP
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 11144 taxa and 5 samples ]
## sample_data() Sample Data: [ 5 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 11144 taxa by 8 taxonomic ranks ]
Great, still the same number of ASVs and now only 5 “samples” corresponding to the 5 species: AcCoe, AcTra, ScTae, SpAur, SpVir.
There are now several phyloseq objects to choose from and, using the above methods, additional objects can easily be created.

- ps_slv –> phyloseq dataset with all 53 samples, all ASVs.
- ps_slv_base –> phyloseq dataset with 50 samples, all ASVs (this is not very useful).
- ps_slv_work –> phyloseq dataset with 50 samples, zero-read ASVs removed.
- ps_slv_work_filt –> phyloseq dataset with 50 samples, ASVs and reads from Mitochondria and Chloroplast removed.
- mergedGP –> ps_slv_work_filt phyloseq dataset collapsed by host species.

Here we can save some phyloseq objects so we can make some Shiny Apps at some point.
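The save step itself is not shown in the rendered document; a minimal sketch (file names are illustrative) using saveRDS:
# save the working and merged objects for later use (e.g., in a Shiny app)
saveRDS(ps_slv_work_filt, "ps_slv_work_filt.rds")
saveRDS(mergedGP, "mergedGP.rds")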
Before we do anything else, let's generate summary data for each host. We can generate a summary report for any ps object, but we will use the object with mitochondria and chloroplasts removed, as well as the low-replicate host species removed. We will also add details about each host. The table is displayed below. We can use these data when we upload the original fastq files to sequence read archives. Later on we will also add alpha diversity stats and save the table.
total_reads <- sample_sums(ps_slv_work_filt)
total_reads <- as.data.frame(total_reads, make.names = TRUE)
total_reads <- total_reads %>% rownames_to_column("host_ID")
total_asvs <- estimate_richness(ps_slv_work_filt, measures = "Observed")
total_asvs <- total_asvs %>% rownames_to_column("host_ID")
sam_details <- sample_data(ps_slv)
sam_details <- sam_details %>% mutate(genus = case_when(
Gen == "Ac" ~ "Acanthurus",
Gen == "Sc" ~ "Scarus",
Gen == "Sp" ~ "Sparisoma"))
sam_details <- sam_details %>% mutate(species = case_when(
Sp == "AcCoe"~ "coeruleus",
Sp == "AcTra"~ "tractus",
Sp == "ScTae"~ "taeniopterus",
Sp == "SpAur"~ "aurofrenatum",
Sp == "SpVir"~ "viride"))
#Sp == "SpChr"~ "chrysopterum",
#Sp == "ScVet"~ "vetula"
sam_details <- sam_details %>% mutate(common_name = case_when(
Sp == "AcCoe" ~ "blue tang surgeonfish",
Sp == "AcTra" ~ "fiveband surgeonfish",
Sp == "ScTae" ~ "princess parrotfish",
Sp == "SpAur" ~ "redband parrotfish",
Sp == "SpVir" ~ "stoplight parrotfish"))
#Sp == "SpChr" ~ "redtail parrotfish",
#Sp == "ScVet" ~ "queen parrotfish"))
sam_details <- sam_details %>% mutate(NCBI_txid = case_when(
Sp == "AcCoe" ~ "157585",
Sp == "AcTra" ~ "1316013",
Sp == "ScTae" ~ "544418",
Sp == "SpAur" ~ "59663",
Sp == "SpVir" ~ "59666"))
#Sp == "SpChr" ~ "51766",
#Sp == "ScVet" ~ "84543"))
sam_details <- sam_details[-c(2, 3)]
colnames(sam_details) <- c("host_ID", "host_genus",
"host_species", "full_name",
"NCBI_txid")
merge_tab <- merge(sam_details, total_reads, by = "host_ID")
merge_tab2 <- merge(merge_tab, total_asvs, by = "host_ID")
colnames(merge_tab2) <- c("host_ID", "host_genus",
"host_species", "common_name",
"NCBI_txid", "total_reads",
"total_ASVs")
# We also have a datatable containing metrics for each host. Lets bring this in
# and merge with the summary table
metrics <- read.table("TABLES/INPUT/host_metrics.txt",
sep = "\t", header = TRUE)
host_details <- merge(merge_tab2, metrics, by = "host_ID")
colnames(host_details) <- c("host_ID", "host_genus", "host_species",
"common_name", "NCBI_txid", "total_reads",
"total_ASVs", "collection_date", "phase",
"weight", "total_length", "foregut_length",
"midgut_length", "hindgut_length",
"total_gut_length")
datatable(host_details, rownames = FALSE, width = "100%",
colnames = c("host_ID", "host_genus", "host_species",
"common_name", "NCBI_txid", "total_reads",
"total_ASVs", "Collection_date", "Phase",
"Weight (g)", "Total length (cm)",
"Fore gut length (cm)", "Mid gut length (cm)",
"Hind gut length (cm)", "Total gut length (cm)"),
caption = htmltools::tags$caption(style = "caption-side:
bottom; text-align: left;",
"Table: ",
htmltools::em("Sample summary.")),
extensions = "Buttons",
options = list(columnDefs =
list(list(className = "dt-left", targets = 0)),
dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 25, 50),
buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = TRUE))
Now we have a nice little summary table about each sample—genus/species, common name, number of reads, number of ASVs, etc. All of this info can be used when submitting samples to sequence read archives. Once we conduct alpha diversity estimates below, we will add that data to the table above and export as Supplementary Table 3.
What are the dominant taxa in this system? How diverse are these communities? How similar are samples to each other?
Here we look at a) Taxonomic diversity, b) \(\alpha\)-diversity, and c) \(\beta\)-diversity
Before we can start to understand a system, we need to know something about its parts. So let's start with a quick look at Class-level diversity. Of course, you can change this to any taxonomic rank you wish. Here we created a sortable table that has the total number of reads and ASVs for each Class.
# generate the ASV table
tax_asv <- table(tax_table(ps_slv_work_filt)[, "Class"],
exclude = NULL, dnn = "Taxa")
tax_asv <- as.data.frame(tax_asv, make.names = TRUE)
#### change <NA> to Unclassified
# Get levels and add "None"
levels <- levels(tax_asv$Taxa)
levels[length(levels) + 1] <- "Unclassified"
# refactor Taxa to include "Unclassified" as a factor level
# and replace NA with "Unclassified"
tax_asv$Taxa <- factor(tax_asv$Taxa, levels = levels)
tax_asv$Taxa[is.na(tax_asv$Taxa)] <- "Unclassified"
# generate the reads table
tax_reads <- factor(tax_table(ps_slv_work_filt)[, "Class"], exclude = NULL)
tax_reads <- apply(otu_table(ps_slv_work_filt), MARGIN = 1,
function(x) {
tapply(x, INDEX = tax_reads,
FUN = sum, na.rm = FALSE,
simplify = TRUE)})
#RENAME NA --> Unclassified
rownames(tax_reads)[72] <- "Unclassified"
tax_reads <- as.data.frame(tax_reads, make.names = TRUE)
tax_reads <- cbind(tax_reads, reads = rowSums(tax_reads))
#DELETE all but last column
tax_reads <- tax_reads[51]
tax_reads <- setDT(tax_reads, keep.rownames = TRUE)[]
# merge the two tables and make everything look pretty
# in an interactive table
taxa_read_asv_tab <- merge(tax_reads, tax_asv, by.x = "rn", by.y = "Taxa")
taxa_read_asv_tab <- mutate(taxa_read_asv_tab,
prop_of_ASVs = Freq / sum(Freq),
prop_of_reads = reads / sum(reads))
taxa_read_asv_tab <- taxa_read_asv_tab[c(1, 2, 5, 3, 4)]
names(taxa_read_asv_tab) <- c("Class", "total_reads", "prop_of_reads",
"total_ASVs", "prop_of_ASVs")
taxa_read_asv_tab2 <- taxa_read_asv_tab
taxa_read_asv_tab2$prop_of_reads <- round(taxa_read_asv_tab2$prop_of_reads,
digits = 6)
taxa_read_asv_tab2$prop_of_ASVs <- round(taxa_read_asv_tab2$prop_of_ASVs,
digits = 6)
For your information, the dataset has a total of 2934996 reads across 11144 ASVs.
#kills sci notation
options(scipen = 999)
write.table(taxa_read_asv_tab2, "TABLES/OUTPUT/SUPP/Table_S4.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
datatable(taxa_read_asv_tab2, rownames = FALSE, width = "100%",
colnames = c("Class", "total_reads", "prop_of_reads",
"total_ASVs", "prop_of_ASVs"),
caption =
htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Table: ",
htmltools::em("Total reads & ASVs by Class")),
extensions = "Buttons",
options = list(columnDefs =
list(list(className = "dt-left", targets = 0)),
dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 35, 70),
buttons = c("csv", "copy")))
Looks like Proteobacteria, Firmicutes, Fusobacteria, Planctomycetes, and Bacteroidetes dominate in the read department. Curiously, Fusobacteria has comparatively low ASV richness.
Much of the analyses we do from here on out will be at the Class & Family levels. We chose not to focus on the Genus level because there simply is not enough resolution in our dataset to build a cohesive story. This is because these fish are (microbially) understudied and we are dealing with short read data. On the other hand, Phylum level is too coarse for groups like Proteobacteria and Firmicutes. Order did not provide any additional information and can be cumbersome for taxa with poorly resolved lineages. Depending on the dataset, you may want to change your strategy.
Let's take a closer look at the Class-level taxonomic content of these communities. There are numerous ways to do this, but here we chose to collapse samples by host species and display the relative abundance of the most dominant taxa. We also generated alternative views of taxonomic composition for individual samples—a box-and-whisker plot as well as separate bar plots—that are included as Supplementary Figure 1 (see below for code).
Stacked bar charts are not the best but we like them for a bird's-eye view of the data. Here we calculate the relative abundance of taxa for each host species at the Class level. It turns out this is not too easy in phyloseq and there is a lot of (messy) code.
# calculate the averages and merge by species
ps_slv_filt_AVG <- transform_sample_counts(ps_slv_work_filt,
function(x) x / sum(x))
mergedGP_BAR <- merge_samples(ps_slv_filt_AVG, "Sp")
SD_BAR <- merge_samples(sample_data(ps_slv_filt_AVG), "Sp")
# merge taxa by rank. If you choose a different rank be sure to change
# the rank throughout this code chunk
mdata_phy <- tax_glom(mergedGP_BAR, taxrank = "Class", NArm = FALSE)
mdata_phyrel <- transform_sample_counts(mdata_phy, function(x) x / sum(x))
meltd <- psmelt(mdata_phyrel)
meltd$Class <- as.character(meltd$Class)
# calculate the total relative abundance for all taxa
means <- ddply(meltd, ~Class, function(x) c(mean = mean(x$Abundance)))
means$mean <- round(means$mean, digits = 8)
# order these in descending fashion
taxa_means <- means[order(-means$mean), ]
# ditch the sci notation
taxa_means <- format(taxa_means, scientific = FALSE)
#RENAME NA to UNCLASSIFIED
taxa_means$Class <- gsub("NA", "Unclassified", taxa_means$Class)
Since our goal is to generate a figure and we only have 9 colors, some taxa will need to be put into an Other category. We can define ‘Other’ however we like, so let's take a look at the overall relative abundance of each Class.
datatable(taxa_means, rownames = FALSE, width = "65%",
colnames = c("Class", "mean"), caption =
htmltools::tags$caption(style = "caption-side: bottom;
text-align: left;", "Table: ",
htmltools::em("Class-level
relative abundance.")),
extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-center",
targets = "_all")),
dom = "Blfrtip", pageLength = 10,
lengthMenu = c(5, 10, 50, 70),
buttons = c("csv", "copy")))
Inspecting the table it looks like if we choose a cutoff of 2% (0.02) we get 9 taxa—sounds pretty good. The rest go into the ‘Other’ category. No matter what, we will always gloss over some groups using such a coarse approach. But as we will see later, some of these low abundance groups will reappear when we look at the level of individual ASVs.
Here we define the Other category by combining all taxa with less than 2% of total reads.
Other <- means[means$mean <= 0.02, ]$Class
# or you can chose specifc taxa like this
# Other_manual <- c("list", "taxa", "in", "this", "format")
At a 2% abundance cutoff, 63 Classes are grouped into the ‘Other’ category. Next we will melt all these classes into the Other category and then craft the bar chart. It took some tweaking to get the bar chart to look just right—so there is a lot of code here—and it could most certainly be better. While we’re at it, we will also save a copy of the figure so we can tweak it later and make it look pretty.
meltd[meltd$Class %in% Other, ]$Class <- "Other"
samp_names <- aggregate(meltd$Abundance,
by = list(meltd$Sample), FUN = sum)[, 1]
.e <- environment()
meltd[, "Class"] <- factor(meltd[, "Class"], sort(unique(meltd[, "Class"])))
meltd <- meltd[order(meltd[, "Class"]), ]
# Here we order Classes by the Phylum they belong to.
meltd$Class <- factor(meltd$Class,
levels = c("Bacteroidia", "Clostridia",
"Erysipelotrichia", "Fusobacteriia",
"Alphaproteobacteria", "Deltaproteobacteria",
"Gammaproteobacteria", "Planctomycetacia",
"Oxyphotobacteria", "Other"))
fig2A <- ggplot(meltd,
aes_string(x = "Sample", y = "Abundance", fill = "Class"),
environment = .e,
ordered = TRUE,
xlab = "x-axis label", ylab = "y-axis label")
fig2A <- fig2A + geom_bar(stat =
"identity",
position = position_stack(reverse = TRUE),
width = 0.95) +
coord_flip() +
theme(aspect.ratio = 1 / 2)
fig2A <- fig2A + scale_fill_manual(values = friend_pal)
fig2A <- fig2A + theme(axis.text.x = element_text(angle = 0,
hjust = 0.45,
vjust = 1))
fig2A <- fig2A + guides(fill = guide_legend(override.aes = list(colour = NULL),
reverse = FALSE)) +
theme(legend.key = element_rect(colour = "black"))
fig2A <- fig2A + labs(x = "Host species",
y = "Relative abundance (% total reads)",
title = "Abundance of bacterial taxa across host species")
fig2A <- fig2A + theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black",
fill = NA, size = 1))
fig2A
Armed with a picture of taxonomic composition we can move on to diversity estimates.
Alpha diversity describes the diversity in a sample or site. There are several alpha diversity metrics available in phyloseq: Observed, Chao1, ACE, Shannon, Simpson, InvSimpson, Fisher. Play around to see how different metrics change or confirm these results.
Here we want to know if diversity is significantly different across host species. In order to do that we need to know whether to run a parametric or non-parametric test, and for that we need to know if our data are normally distributed. Most of the ideas/code for alpha (and subsequent beta) diversity statistics come from this workshop tutorial by Kim Dill-McFarland and Madison Cox.
First we run the diversity estimates, add these data to our summary table, and save a copy of this table.
diversity <- estimate_richness(ps_slv_work_filt,
measures = c("Observed", "Chao1", "ACE",
"Shannon", "Simpson", "InvSimpson",
"Fisher"))
diversity_calc <- diversity %>% rownames_to_column("host_ID")
# round values
diversity_calc[c(3, 5, 10)] <- round(diversity_calc[c(3, 5, 10)], 1)
diversity_calc[c(4, 6, 7, 9)] <- round(diversity_calc[c(4, 6, 7, 9)], 2)
diversity_calc[8] <- round(diversity_calc[8], 3)
host_summary <- merge(host_details, diversity_calc)
host_summary$Observed <- NULL
host_summary <- host_summary[c(1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13,
14, 15, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23)]
write.table(host_summary, "TABLES/OUTPUT/SUPP/Table_S3.txt",
sep = "\t", row.names = FALSE, quote = FALSE,
col.names = c("Sample ID", "Host genus", "Host species",
"Common name", "NCBI tAxID", "Collection date",
"Life phase", "Weight (g)", "Total length (cm)",
"Foregut length (cm)", "Midgut length (cm)",
"Hindgut length (cm)", "Total gut length (cm)",
"Total reads", "Total ASVs", "Chao1", "Chao1 (se)",
"ACE", "ACE (se)", "Shannon", "Simpson",
"InvSimpson", "Fisher"))
datatable(host_summary, rownames = FALSE, width = "100%",
colnames = c("Sample ID", "Host genus", "Host species",
"Common name", "NCBI tAxID", "Collection date",
"Life phase", "Weight (g)", "Total length (cm)",
"Foregut length (cm)", "Midgut length (cm)",
"Hindgut length (cm)", "Total gut length (cm)",
"Total reads", "Total ASVs", "Chao1", "Chao1 (se)",
"ACE", "ACE (se)", "Shannon", "Simpson", "InvSimpson",
"Fisher"),
caption =
htmltools::tags$caption(style =
"caption-side: bottom; text-align:
left;", "Table: ",
htmltools::em("Host-associated metadata &
microbial diversity")),
extensions = "Buttons", options =
list(columnDefs = list(list(className = "dt-left", targets = 0)),
dom = "Blfrtip", pageLength = 5, lengthMenu = c(5, 10, 50),
buttons = c("csv", "copy"), scrollX = TRUE,
scrollCollapse = TRUE))
Next, we add the diversity estimates to our phyloseq object and test whether the data are normally distributed using the Shapiro-Wilk normality test. We will focus on the inverse Simpson and Shannon diversity estimates and Chao's richness estimate, but this approach can be used for any metric.
# Convert to ps object
sample_div <- sample_data(diversity)
# Create new ps object with diversity estimates added to sample_data
ps_slv_work_filt_div <- merge_phyloseq(ps_slv_work_filt, sample_div)
# Run Shapiro test
shapiro_test_Shan <- shapiro.test(sample_data(ps_slv_work_filt_div)$Shannon)
shapiro_test_invSimp <- shapiro.test(sample_data(ps_slv_work_filt_div)$InvSimpson)
shapiro_test_Chao1 <- shapiro.test(sample_data(ps_slv_work_filt_div)$Chao1)
shapiro_test_Observed <- shapiro.test(sample_data(ps_slv_work_filt_div)$Observed)
Shapiro-Wilk Normality Test for Shannon index.
##
## Shapiro-Wilk normality test
##
## data: sample_data(ps_slv_work_filt_div)$Shannon
## W = 0.98228, p-value = 0.6513
Shapiro-Wilk Normality Test for inverse Simpson index.
##
## Shapiro-Wilk normality test
##
## data: sample_data(ps_slv_work_filt_div)$InvSimpson
## W = 0.60454, p-value = 0.0000000002276
Shapiro-Wilk Normality Test for Chao1 richness estimator.
##
## Shapiro-Wilk normality test
##
## data: sample_data(ps_slv_work_filt_div)$Chao1
## W = 0.89305, p-value = 0.0002855
Shapiro-Wilk Normality Test for Observed ASV richness estimator.
##
## Shapiro-Wilk normality test
##
## data: sample_data(ps_slv_work_filt_div)$Observed
## W = 0.89591, p-value = 0.0003531
Ok, since the p-values are significant for the inverse Simpson, Chao richness, and Observed ASV richness, we reject the null hypothesis that these data are normally distributed. However, the Shannon estimates appear normally distributed. So let's see if diversity is significantly different between host species based on the Shannon index.
Since the Shannon data is normally distributed we can test for significance using ANOVA (a parametric test).
sampledataDF <- data.frame(sample_data(ps_slv_work_filt_div))
aov.shannon <- aov(Shannon ~ Sp, data = sampledataDF)
#Call for the summary of that ANOVA, which will include P-values
summary(aov.shannon)
## Df Sum Sq Mean Sq F value Pr(>F)
## Sp 4 19.13 4.782 3.906 0.00833 **
## Residuals 45 55.10 1.224
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Ok, the results of the ANOVA are significant. Here we use Tukey's HSD (honestly significant difference) post-hoc test to determine which pairwise comparisons are different.
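The call itself is hidden in the rendered document; judging from the output below it is presumably just base R's TukeyHSD() run on the ANOVA fit from above:
TukeyHSD(aov.shannon)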
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = Shannon ~ Sp, data = sampledataDF)
##
## $Sp
## diff lwr upr p adj
## AcTra-AcCoe 0.7421379 -0.78560423 2.26987998 0.6431490
## ScTae-AcCoe 0.4727303 -1.05501185 2.00047236 0.9030495
## SpAur-AcCoe -0.9230990 -2.33591246 0.48971439 0.3551484
## SpVir-AcCoe 0.3323910 -1.12853187 1.79331396 0.9664219
## ScTae-AcTra -0.2694076 -1.75153517 1.21271993 0.9852679
## SpAur-AcTra -1.6652369 -3.02859596 -0.30187785 0.0096995
## SpVir-AcTra -0.4097468 -1.82289999 1.00340634 0.9218562
## SpAur-ScTae -1.3958293 -2.75918834 -0.03247023 0.0424253
## SpVir-ScTae -0.1403392 -1.55349237 1.27281396 0.9985589
## SpVir-SpAur 1.2554901 -0.03255018 2.54353034 0.0593081
Looks like Sparisoma aurofrenatum is significantly different from Scarus taeniopterus and Acanthurus tractus.
Now we can look at the results for the inverse Simpson diversity and Chao's richness. Since host species is categorical, we use Kruskal-Wallis (the non-parametric equivalent of ANOVA) to test for significance.
Kruskal-Wallis of inverse Simpson index.
#library(FSA)
#dunnTest(InvSimpson ~ Sp, data = sampledataDF, method="bh")
kruskal.test(InvSimpson ~ Sp, data = sampledataDF)
##
## Kruskal-Wallis rank sum test
##
## data: InvSimpson by Sp
## Kruskal-Wallis chi-squared = 13.779, df = 4, p-value = 0.008034
Kruskal-Wallis of Chao1 richness estimator.
#dunnTest(Chao1 ~ Sp, data = sampledataDF, method="bh")
kruskal.test(Chao1 ~ Sp, data = sampledataDF)
##
## Kruskal-Wallis rank sum test
##
## data: Chao1 by Sp
## Kruskal-Wallis chi-squared = 13.992, df = 4, p-value = 0.007321
Kruskal-Wallis of Observed ASV richness index.
#library(FSA)
#dunnTest(Observed ~ Sp, data = sampledataDF, method="bh")
kruskal.test(Observed ~ Sp, data = sampledataDF)
##
## Kruskal-Wallis rank sum test
##
## data: Observed by Sp
## Kruskal-Wallis chi-squared = 14.153, df = 4, p-value = 0.006822
For the inverse Simpson, Chao1, and Observed richness, the results of the Kruskal-Wallis rank sum test are significant. So we can look at pairwise comparisons using the Wilcoxon rank sum test for post-hoc analysis.
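The calls are hidden in the rendered document; based on the output below they are presumably base R's pairwise.wilcox.test() with FDR correction, run once per metric (swap in Chao1 or Observed for the other two tables):
pairwise.wilcox.test(sampledataDF$InvSimpson, sampledataDF$Sp,
                     p.adjust.method = "fdr")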
Pairwise significance test for inverse Simpson index.
##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: sampledataDF$InvSimpson and sampledataDF$Sp
##
## AcCoe AcTra ScTae SpAur
## AcTra 0.545 - - -
## ScTae 0.963 0.545 - -
## SpAur 0.042 0.042 0.108 -
## SpVir 0.545 0.789 0.545 0.004
##
## P value adjustment method: fdr
Pairwise significance test for Chao1 richness estimator.
##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: sampledataDF$Chao1 and sampledataDF$Sp
##
## AcCoe AcTra ScTae SpAur
## AcTra 0.628 - - -
## ScTae 0.617 0.666 - -
## SpAur 0.030 0.030 0.019 -
## SpVir 0.666 0.628 0.304 0.046
##
## P value adjustment method: fdr
Pairwise significance test for Observed ASV richness index.
##
## Pairwise comparisons using Wilcoxon rank sum test
##
## data: sampledataDF$Observed and sampledataDF$Sp
##
## AcCoe AcTra ScTae SpAur
## AcTra 0.677 - - -
## ScTae 0.677 0.730 - -
## SpAur 0.026 0.026 0.019 -
## SpVir 0.730 0.677 0.304 0.039
##
## P value adjustment method: fdr
Again we see that only Sp. aurofrenatum is significantly different from the other hosts. For the inverse Simpson index, Sp. aurofrenatum is significantly different from three of the four other host species, and for the Chao1 richness estimator it is significantly different from all other host species. Now we can plot the results.
Here we plot the results of Shannon diversity index. We will save a copy of the figure for later tweaking. We use the color palette described above to delineate host species.
fig2B <- plot_richness(ps_slv_work_filt, x = "Sp",
measures = c("Observed",
"Shannon",
"InvSimpson",
"Chao1"),
color = "Sp", nrow = 1)
fig2B <- fig2B + geom_boxplot() + geom_jitter(width = 0.05)
fig2B <- fig2B + scale_colour_manual(values = samp_pal) +
labs(x = "Host species",
y = "Diversity",
title = "Alpha diversity of bacterial
communities in herbivorous reef fish")
#fig2B + geom_boxplot(aes(colour = black))
#fig2B <- fig2B + theme_bw() + geom_point(size = 2.5, aes(color = Sp)) +
fig2B
Next we wanted to know if any alpha-diversity metrics were correlated with host physical characteristics. At the time of collection, we recorded host weight, total length, total gut length, as well as the length of individual gut segments (fore, mid, hind).
When considering the dataset as a whole (i.e., all samples), we found no correlation between any physical characteristics and any diversity metrics. If we split samples by genera, we found that neither Acanthurus nor Scarus was significant for any parameter, while Sparisoma showed significant results for all parameters except hindgut_length.
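The correlation tests behind these statements are only summarized in comments below; a minimal sketch (the object name spar is ours, not from the original code) of how one such test could be run explicitly, e.g. Shannon diversity versus midgut length within Sparisoma:
# Spearman correlation between Shannon diversity and midgut length,
# restricted to Sparisoma samples (illustrative only)
spar <- host_summary[host_summary$host_genus == "Sparisoma", ]
cor.test(spar$Shannon, spar$midgut_length, method = "spearman")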
dt <- read.table("TABLES/OUTPUT/SUPP/Table_S3.txt",
sep = "\t", header = TRUE)
library(ggpubr)
scarus <- host_summary[host_summary$host_genus %in% "Scarus", ]
sparisoma <- host_summary[host_summary$host_genus %in% "Sparisoma", ]
acanthurus <- host_summary[host_summary$host_genus %in% "Acanthurus", ]
alphametric <- c("total_ASVs", "Chao1", "ACE", "Shannon",
"Simpson", "InvSimpson", "Fisher")
physical_char <- c("weight", "total_length", "foregut_length",
"midgut_length", "hindgut_length", "total_gut_length")
# Full set: not significant -> "weight", "total_length"
#"foregut_length", R = 0.32
#"midgut_length", R = 0.50
#"hindgut_length", R = 0.17-0.34
#"total_gut_length" R = 0.50
#By genus and midgut_length :
# scarus NS
# sparisoma R = 0.8
# acanthurus NS
# acanthurus not significant for any parameters
# scarus not significant for any parameters
# sparisoma significant for all parameters except hindgut_length was a bit weak
# "weight", "total_length" "foregut_length" "midgut_length"
# "hindgut_length" "total_gut_length"
# To do all diversity metric change "y = " to y = alphametric and
# ylab = alphametric
par(mfrow = c(2, 3))
shan_by_length <- ggscatter(host_summary, x = "total_gut_length",
y = "Shannon", add = "reg.line",
conf.int = FALSE, cor.coef = TRUE,
cor.method = "spearman",
xlab = "total gut length (cm)", ylab = "Shannon",
color = "host_genus", palette = samp_pal,
legend = "bottom")
shan_by_weight <- ggscatter(host_summary, x = "weight", y = "Shannon",
add = "reg.line", conf.int = FALSE,
cor.coef = TRUE, cor.method = "spearman",
xlab = "weight (g)", ylab = "Shannon",
color = "host_genus", palette = samp_pal, legend = "top")
grid.arrange(shan_by_length, shan_by_weight, ncol = 2)
#ggscatter(host_summary, x = "total_gut_length", y = alphametric,
# add = "reg.line", conf.int = FALSE,
# cor.coef = TRUE, cor.method = "spearman",
# xlab = "mid-gut length (cm)", ylab = alphametric,
# color = "host_genus", palette = samp_pal)
#, facet.by = "host_species")
######################################################
#shapiro.test(host_summary$Shannon) # => p = 0.1229
## Shapiro-Wilk normality test for wt
#shapiro.test(host_summary$midgut_length) # => p = 0.09
#ggqqplot(host_summary$Shannon) # => p = 0.1229
## Shapiro-Wilk normality test for wt
#ggqqplot(host_summary$midgut_length) # => p = 0.09
######################################################
Beta diversity basically tells us how similar or dissimilar samples are to one another. Phyloseq offers several ordination methods and distance metrics. Here we use non-metric multidimensional scaling (NMDS) coupled with Jensen–Shannon divergence. We also save a copy of the figure for later tweaking. To see the full output of the NMDS analysis, remove the results = 'hide' tag from the code chunk.
set.seed(3131)
ord.nmds.jsd_slv <- ordinate(ps_slv_work_filt, method = "NMDS",
distance = "jsd")
stressplot(ord.nmds.jsd_slv)
##
## Call:
## metaMDS(comm = ps.dist)
##
## global Multidimensional Scaling using monoMDS
##
## Data: ps.dist
## Distance: user supplied
##
## Dimensions: 2
## Stress: 0.1646318
## Stress type 1, weak ties
## Two convergent solutions found after 20 tries
## Scaling: centring, PC rotation
## Species: scores missing
We see that convergent solutions were found after 20 tries and our stress is below 0.20, meaning that two axes are sufficient to view the data. Generally, we are looking for stress values below 0.2. If the stress values are high, you may need to add more axes to the ordination. Let's visualize the plot.
fig2C <- plot_ordination(ps_slv_work_filt, ord.nmds.jsd_slv,
color = "Sp", label = "SamName",
title = "Jensen-Shannon divergence")
fig2C <- fig2C + geom_point(size = 4) +
geom_point(shape = 1, size = 3.6, colour = "black", stroke = 0.75)
# + xlim(-0.4, 0.4) + ylim(-0.4, 0.4)
fig2C <- fig2C + scale_colour_manual(values = samp_pal)
fig2C <- fig2C + theme(axis.line = element_line(colour = "black"),
panel.background = element_blank(),
panel.grid.major = element_line("grey"),
panel.grid.minor = element_line("grey"),
panel.border =
element_rect(colour = "black", fill = NA, size = 1)) +
theme(legend.key = element_rect(colour = "black"))
fig2C <- fig2C + coord_fixed()
fig2C <- fig2C + stat_ellipse(type = "t") + theme_bw()
fig2C
So we can see some clustering within groups and spread between groups, but this is not a test for statistical differences. Do microbial communities differ significantly by host taxa?
To test whether microbial communities differ by host species we can use permutational analysis of variance (PERMANOVA) or analysis of similarity (ANOSIM). PERMANOVA does not assume normality but does assume equal beta dispersion between groups. We will test beta dispersion below.
First we use the adonis
function in vegan to run a PERMANOVA test. This will tell us whether host species have similar centroids or not.
set.seed(1911)
fish.jsd <- phyloseq::distance(ps_slv_work_filt, method = "jsd")
sampledf <- data.frame(sample_data(ps_slv_work_filt))
fish_adonis <- adonis(fish.jsd ~ Sp, data = sampledf, permutations = 1000)
fish_adonis
##
## Call:
## adonis(formula = fish.jsd ~ Sp, data = sampledf, permutations = 1000)
##
## Permutation: free
## Number of permutations: 1000
##
## Terms added sequentially (first to last)
##
## Df SumsOfSqs MeanSqs F.Model R2 Pr(>F)
## Sp 4 4.4045 1.10113 16.52 0.59489 0.000999 ***
## Residuals 45 2.9995 0.06665 0.40511
## Total 49 7.4040 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
These results indicate that centroids are significantly different across host species meaning that communities are different by host species.
We can also use the pairwiseAdonis
package for pair-wise PERMANOVA analysis.
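The code for this step is hidden in the rendered document; a minimal sketch (arguments assumed) using pairwise.adonis() from the pairwiseAdonis package with the Jensen–Shannon distance matrix and grouping factor created above:
library(pairwiseAdonis)
# pairwise PERMANOVA on the Jensen-Shannon distances, grouped by host species
pairwise.adonis(fish.jsd, sampledf$Sp)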
Here again we see that communities are different by host species.
However, PERMANOVA assumes equal beta dispersion so we will use the betadisper
function from the vegan
package to calculate beta dispersion values.
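The call is hidden in the rendered document, but it can be read directly off the Call: field in the output below (the object name disp_jsd is ours):
# beta dispersion of Jensen-Shannon distances by host species
disp_jsd <- betadisper(d = fish.jsd, group = sampledf$Sp, bias.adjust = TRUE)
disp_jsd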
##
## Homogeneity of multivariate dispersions
##
## Call: betadisper(d = fish.jsd, group = sampledf$Sp, bias.adjust =
## TRUE)
##
## No. of Positive Eigenvalues: 34
## No. of Negative Eigenvalues: 15
##
## Average distance to median:
## AcCoe AcTra ScTae SpAur SpVir
## 0.2980 0.3098 0.1845 0.1816 0.2570
##
## Eigenvalues for PCoA axes:
## (Showing 8 of 49 eigenvalues)
## PCoA1 PCoA2 PCoA3 PCoA4 PCoA5 PCoA6 PCoA7 PCoA8
## 1.8547 1.5977 0.9602 0.7234 0.5659 0.3413 0.2609 0.2427
And then a pair-wise Permutation test for homogeneity of multivariate dispersions using permutest
(again from the vegan
package).
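Again the call is hidden; based on the output below (1000 permutations, pairwise comparisons) it is presumably vegan's permutest() on the betadisper object, here using the disp_jsd name from our sketch above:
permutest(disp_jsd, pairwise = TRUE, permutations = 1000)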
##
## Permutation test for homogeneity of multivariate dispersions
## Permutation: free
## Number of permutations: 1000
##
## Response: Distances
## Df Sum Sq Mean Sq F N.Perm Pr(>F)
## Groups 4 0.14600 0.036500 4.1891 1000 0.007992 **
## Residuals 45 0.39209 0.008713
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Pairwise comparisons:
## (Observed p-value below diagonal, permuted p-value above diagonal)
## AcCoe AcTra ScTae SpAur SpVir
## AcCoe 0.8051948 0.0029970 0.0119880 0.2797
## AcTra 0.8147277 0.0159840 0.0139860 0.2557
## ScTae 0.0040024 0.0160930 0.9440559 0.0420
## SpAur 0.0125449 0.0160815 0.9425028 0.0699
## SpVir 0.2735320 0.2661694 0.0479863 0.0669727
These results are significant, meaning that host species have different dispersions. Looking at the pairwise p-values and permuted p-values, we see that the significant differences (p-value < 0.05) are between:

- AcCoe and ScTae
- AcCoe and SpAur
- AcTra and ScTae
- AcTra and SpAur
- ScTae and SpVir
This means we are less confident that the PERMANOVA result is a real result, and that the result is possibly due to differences in group dispersions.
We can also use Analysis of Similarity (ANOSIM)—which does not assume equal group variances—to test whether overall microbial communities differ by host species.
spgroup <- get_variable(ps_slv_work_filt, "Sp")
fish_anosim <- anosim(distance(ps_slv_work_filt, "jsd"), grouping = spgroup)
summary(fish_anosim)
##
## Call:
## anosim(x = distance(ps_slv_work_filt, "jsd"), grouping = spgroup)
## Dissimilarity:
##
## ANOSIM statistic R: 0.8544
## Significance: 0.001
##
## Permutation: free
## Number of permutations: 999
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.0556 0.0758 0.0873 0.1111
##
## Dissimilarity ranks between and within classes:
## 0% 25% 50% 75% 100% N
## Between 94 462.50 726.5 977.25 1225 992
## AcCoe 36 185.50 290.5 360.00 582 28
## AcTra 14 162.25 322.0 510.50 789 36
## ScTae 9 34.75 74.5 122.00 419 36
## SpAur 1 30.50 76.0 137.50 562 78
## SpVir 13 95.00 177.0 273.00 584 55
And the ANOSIM result is significant, meaning that host species influences microbial community composition.
At this point we have a good handle on the diversity of the intestinal microbiomes of these herbivorous reef fish. We know that communities are dominated by the same broad-level taxonomic groups. The beta diversity analysis demonstrates that communities partition along host species. Now we want to determine which ASVs are driving these patterns and assess their distribution in nature using publicly available data. To accomplish this task we leave the R
environment and employ some additional tools. To summarize, our goals here are to:
- identify differentially abundant (DA) ASVs across host species,
- find closest database matches to DA ASVs, and
- perform phylogenetic reconstruction on DA ASVs and top hits.
We used LDA Effect Size (LEfSe) to identify differentially abundant (DA) ASVs across host species and the MicrobiomeAnalyst webserver to run the analysis. There are also many other great tools on the MicrobiomeAnalyst webserver by the way. We needed three files for the input—an ‘OTU’ table, a metadata file containing sample information, and a taxonomy table. We generate these tables with the code below.
########## OTU table
OTU1 <- as(otu_table(ps_slv_work_filt), "matrix")
# transpose if necessary
# Coerce to data.frame
OTUdf <- as.data.frame(t(OTU1))
setDT(OTUdf, keep.rownames = TRUE)[]
write.table(OTUdf, "TABLES/OUTPUT/OTHER/seq_tab_for_core.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
colnames(OTUdf)[1] <- "#NAME"
write.table(OTUdf, "TABLES/OUTPUT/LEfSe/LEfSe_INPUT_seq_tab.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
############ TAX Table
# Remember in the `tax_table` we added the last columns as the actual sequence
# of each ASV and the ASV_ID. We do not need those here.
# So lets only keep the first 6 columns (the taxonomic lineage)
TAX1 <- as(tax_table(ps_slv_work_filt), "matrix")
TAXdf <- as.data.frame(TAX1)
setDT(TAXdf, keep.rownames = TRUE)[]
colnames(TAXdf)[1] <- "#TAXONOMY"
TAXdf <- TAXdf[, 1:6]
write.table(TAXdf, "TABLES/OUTPUT/LEfSe/LEfSe_INPUT_tax_tab.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
############ Metadata file
meta_file <- data.frame(NAME = sample_name, SampleType = species, Gen = genus)
rownames(meta_file) <- samples.out
colnames(meta_file) <- c("#NAME", "Species", "Genus")
# but we still have those three samples that need to be removed
meta_file <- filter(meta_file, Species != "SpChr" & Species != "ScVet")
write.table(meta_file, "TABLES/OUTPUT/LEfSe/LEfSe_INPUT_metadata.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
And once we have the three files, we head over to the MicrobiomeAnalyst webserver and upload the files. Be sure to select Silva taxonomy in the drop-down menu.
Check the data summary after uploading the files:
Cool, all looks good. Hit Proceed. Here are the settings we used for the different steps:
- Minimum count = 20, Prevalence in samples (%) = 20, and Percentage to remove (%) = 0. This removed 3796 low-abundance ASVs.
- Data rarefying = Do not rarefy my data, Data scaling = Total sum scaling (TSS), and Data transformation = Do not transform my data.
- Log LDA score = 4 & Adjusted p-value cutoff = 0.0001. We specifically chose these values because we found that they eliminated spurious results such as DA ASVs that were really abundant in a few samples but not consistent across an entire group.
The result was 59 differentially abundant (DA) ASVs.
We can inspect and save the results of the LEfSe analysis. The table shows the Linear discriminant analysis (LDA) scores, P-values adjusted for multiple testing, and False Discovery Rate (FDR) values from the LEfSe analysis. Normalized read abundance values for each host species are also given.
lefse_tab <- read.table("TABLES/INPUT/lefse_results.txt",
header = TRUE, sep = "\t", check.names = FALSE)
write.table(lefse_tab, "TABLES/OUTPUT/SUPP/Table_S5.txt",
quote = FALSE, sep = "\t", row.names = FALSE)
datatable(lefse_tab, rownames = FALSE, width = "100%",
caption =
htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Table 4: ", htmltools::em("Results of LEfSe analysis.")),
extensions = "FixedColumns", "Buttons",
options = list(columnDefs =
list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5, 6, 7, 8))),
dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 25, 60),
buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = TRUE))
Before getting knee deep in the DA ASV analysis, let's see if we can't identify some core elements, or ASVs, common to these fish. First we need a mothur-formatted .shared file. This is the code …
cm <- read.table("TABLES/OUTPUT/OTHER/seq_tab_for_core.txt",
sep = "\t", header = TRUE, row.names = 1)
cm_t <- t(cm)
cm_df <- as.data.frame(cm_t)
numcols <- ncol(cm_df)
cm_df <- cm_df %>% tibble::rownames_to_column("Group")
cm_df <- cm_df %>%
mutate(label = 0.03, numOtus = numcols) %>%
select(label, Group, numOtus, everything())
write.table(cm_df, "TABLES/OUTPUT/OTHER/ps_slv_work_filt.txt",
quote = FALSE, sep = "\t", row.names = FALSE)
####COMBINE by fish species
cm_Merge <- cm %>% tibble::rownames_to_column("ASV")
cm_Merge <- cm_Merge %>% mutate(AcCoe =
AcCoe01 + AcCoe02 + AcCoe03 + AcCoe04 +
AcCoe05 + AcCoe06 + AcCoe07 + AcCoe08)
cm_Merge <- cm_Merge %>% mutate(AcTra =
AcTra01 + AcTra02 + AcTra03 + AcTra04 +
AcTra05 + AcTra06 + AcTra07 + AcTra08 +
AcTra09)
cm_Merge <- cm_Merge %>% mutate(ScTae =
ScTae01 + ScTae02 + ScTae03 + ScTae04 +
ScTae05 + ScTae06 + ScTae07 + ScTae08 +
ScTae09)
cm_Merge <- cm_Merge %>% mutate(SpAur =
SpAur01 + SpAur02 + SpAur03 + SpAur04 +
SpAur05 + SpAur06 + SpAur07 + SpAur08 +
SpAur09 + SpAur10 + SpAur11 + SpAur12 +
SpAur13)
cm_Merge <- cm_Merge %>% mutate(SpVir =
SpVir01 + SpVir02 + SpVir03 + SpVir04 +
SpVir05 + SpVir06 + SpVir07 + SpVir08 +
SpVir09 + SpVir10 + SpVir11)
cm_Merge <- cm_Merge %>% select(ASV, AcCoe, AcTra, ScTae, SpAur, SpVir)
cm_Merge_2 <- cm_Merge[, -1]
rownames(cm_Merge_2) <- cm_Merge[, 1]
cm_Merge_2_t <- t(cm_Merge_2)
cm_Merge_2_df <- as.data.frame(cm_Merge_2_t)
cm_Merge_2_df <- cm_Merge_2_df %>%
tibble::rownames_to_column("Group")
cm_Merge_2_df <- cm_Merge_2_df %>%
mutate(label = 0.03, numOtus = numcols) %>%
select(label, Group, numOtus, everything())
write.table(cm_Merge_2_df, "TABLES/OUTPUT/OTHER/ps_slv_work_filt_combine.txt",
quote = FALSE, sep = "\t", row.names = FALSE)
Next we use the output to run get.coremicrobiome in mothur.
Next we wanted to know where else these ASVs had been detected in nature. There is a huge wealth of publicly available sequence information from many studies and habitats. We can use this information to get a better idea of the distribution and habitat specificity of the DA ASVs. To accomplish this we performed the following steps:
First we needed a phyloseq object that only contained the DA ASVs. To do this, we passed an object consisting of just these 59 ASVs (from the LEfSe analysis) to the phyloseq function prune_taxa. We needed two different ps objects, one from the unmerged object (ps_slv_work_filt) and the other from the merged-by-genus object (mergedGP).
Next we needed a fasta file of our DA ASVs. We could not find an easy way to export a fasta file from the new ps objects. So we tried this using the tax_table. This approach works but, well, it is not very elegant. If you want a fasta file from any other ps object, just swap out the name of the ps object in the code below. Anyway, we will generate and save a fasta file.
Note: because the file is written with sep = "\r", it is saved with the Line Break Type as Legacy Mac (CR). This may be incompatible with other programs and needs to be changed to UNIX (LF). I know, don't quit my day job, except this is my day job :/
# Object of DA ASVs
lefse_asvs <- c("ASV21", "ASV25", "ASV35", "ASV44", "ASV159",
"ASV17", "ASV174", "ASV22", "ASV234", "ASV60",
"ASV114", "ASV268", "ASV14", "ASV226", "ASV23",
"ASV29", "ASV30", "ASV48", "ASV90", "ASV98", "ASV18",
"ASV41", "ASV7", "ASV43", "ASV5", "ASV54", "ASV8",
"ASV9", "ASV250", "ASV12", "ASV32", "ASV34", "ASV39",
"ASV224", "ASV398", "ASV127", "ASV128", "ASV151",
"ASV323", "ASV359", "ASV374", "ASV91", "ASV56", "ASV6",
"ASV165", "ASV284", "ASV395", "ASV450", "ASV15", "ASV2",
"ASV20", "ASV298", "ASV57", "ASV69", "ASV75", "ASV82",
"ASV1", "ASV70", "ASV49")
# Create ps objects
da_asvs <- prune_taxa(lefse_asvs, mergedGP)
da_asvs_full <- prune_taxa(lefse_asvs, ps_slv_work_filt)
# Create fasta file from tax_table
table2format <- tax_table(da_asvs)
#retain only the column with the sequences
table2format_trim <- table2format[, 7]
table2format_trim_df <- data.frame(row.names(table2format_trim),
table2format_trim)
colnames(table2format_trim_df) <- c("ASV_ID", "ASV_SEQ")
#format fasta
table2format_trim_df$ASV_ID <- sub("ASV", ">ASV", table2format_trim_df$ASV_ID)
write.table(table2format_trim_df, "TABLES/OUTPUT/ASV_FOR_BLAST.fasta",
sep = "\r", col.names = FALSE, row.names = FALSE,
quote = FALSE, fileEncoding = "UTF-8")
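As an aside, a slightly tidier way to write the fasta (a sketch only, not run as part of this workflow) is to build a DNAStringSet with Biostrings, which is already attached here, and let writeXStringSet handle the formatting. The output file name is arbitrary.
# Sketch: export the DA ASV sequences with Biostrings instead of write.table.
# As above, column 7 of the tax_table is assumed to hold the ASV sequence.
tax_mat <- as(tax_table(da_asvs), "matrix")
asv_seqs <- Biostrings::DNAStringSet(tax_mat[, 7])
names(asv_seqs) <- rownames(tax_mat)
Biostrings::writeXStringSet(asv_seqs, "TABLES/OUTPUT/ASV_FOR_BLAST_alt.fasta")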
Here are the top BLAST hits for each DA ASV. The table displays a lot of information about each BLAST search (it scrolls along the x-axis by the way). Most important are the accession numbers of the top BLAST hits (subject acc.var), the number of 100% identical matches (num perfect hits), the percent identity, and some info on where/when the hit sequence was originally found. Where applicable, PubMed IDs are also included so you can find the paper that reported the sequence. Looking at this table will give you a preliminary sense of the ecology of these ASVs. For example, most hits come from intestinal communities, many of which are from marine herbivorous fish. But the low percent identity of several ASVs indicates that these sequences have been poorly sampled. This is not surprising given the geographic skew of sampling.
Note This table also scrolls horizontally.
blast_tab <- read.table("TABLES/INPUT/BLAST_results.txt",
header = TRUE, sep = "\t", check.names = FALSE)
write.table(blast_tab, "TABLES/OUTPUT/BLAST_RESULTS.txt",
row.names = FALSE, sep = "\t", quote = FALSE)
datatable(blast_tab, rownames = FALSE,
caption =
htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Supplementary Table 6: ",
htmltools::em("Results of BLASTn analysis.")),
extensions = "FixedColumns", "Buttons",
options = list(columnDefs =
list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5, 6, 7, 8))),
dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 25, 65),
buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = TRUE))
Several ASVs returned more than one match at 100%. For some ASVs, the 100% matches were from the same study/study organism, so we just selected one as the representative. ASVs 6, 12, 224, and 398 returned numerous 100% matches (out of 50 total). These data were impractical to summarize and not very informative anyway. So we elected to leave these data out. If you want to see what these hits are, just grab the ASV sequence and BLAST away.
Also, some ASVs shared the same top hit. Since the table is ‘by ASV’ we retained all duplicate hits.
Short read sequences are not ideal for phylogenetic analysis, but given the data we have, we felt this was a good place to start. This analysis required several steps: align the DA ASVs and their top hits against the Silva (v132) reference alignment, filter the alignment, classify the aligned sequences, and then infer a maximum likelihood tree (with 1000 bootstrap replicates) in RAxML.
mothur "#align.seqs(candidate=sequence_tree.fasta, template=silva.nr_v132.align, processors=20, flip=t)"
mothur "#filter.seqs(fasta=sequence_tree.fasta, vertical = =F, hard=mask.txt)"
mothur "#classify.seqs(fasta=sequence_tree.filter.fasta, template=silva.nr_v132.align, taxonomy=silva.nr_v132.tax, processors=10)"
raxmlHPC-PTHREADS-SSE3 -T 24 -f a -p 2345 -x 3456 -m GTRGAMMA -N 1000 -s sequence_tree.filter.fasta -n fish_arb_align_1000BS_B.tre
Now it was time to put the pieces together and determine… what can these patterns tell us about specificity?
To dig just a little deeper, we screened our DA ASVs against the IMNGS database. IMNGS hosts a curated database of short-read sequences scraped from the International Nucleotide Sequence Database Collaboration (GenBank, DDBJ and EMBL). The database is rebuilt monthly and at the time of this analysis contained 271,237 samples. IMNGS is really designed to screen full-length 16S rRNA sequences and is not ideal for shorter reads. This is because the database is built from short reads, and different studies target different regions of the gene. For example, if we get no hits to an ASV it could mean the organism it came from has really not been detected before, or that it is in the database but is represented by a different 16S region. So take these data with a grain of salt.
IMNGS is not a high-throughput system. Users may only submit a maximum of 10 sequences per query, and a query can (and will) take weeks to run. So choose your ASVs carefully.
IMNGS will return a lot of useful data for each query sequence. All we were interested in here was the number of hits, but there is much more useful information available. Among other data products, IMNGS returns report tables that tally the number of samples that were positive for the presence of query-like sequences in each sample category (categories like shrimp gut metagenome and seawater metagenome). Each category has a number of short-read samples, which could originate from a single study or multiple studies.
A report includes values for several percent identity cutoff values. We set a minimum threshold at 97% so our reports have values for 97 and 99%. You can set the threshold as low as 90%.
IMNGS provides three such reports based on the abundance of your query sequence.
An SRA-derived sample is considered positive if the query-like sequences sum up to more than 0% of the total number of sequences in that sample (i.e. any abundance).
An SRA-derived sample is considered positive if the query-like sequences sum up to more than 0.1% of the total number of sequences in that sample (i.e. excluding rare abundances).
An SRA-derived sample is considered positive if the query-like sequences sum up to more than 1% of the total number of sequences in that sample (i.e. including only dominant OTUs).
We report data from the 97% identity cutoff and 0.1% of total reads in a sample. I think this is a little confusing, so let me explain by example. In the tree below, ASV398 is most closely related to an Alphaproteobacteria associated with the toxic benthic marine dinoflagellate, Ostreopsis ovata. We retrieved this sequence during the BLASTn analysis discussed above. Anyway, ASV398 was screened against IMNGS and returned 2460 hits. This means that at 97% identity, 2460 samples had an ASV398-like sequence comprising greater than 0.1% of a given sample's total number of sequences. If, for example, we increase the percent identity to 99%, the number of sample hits drops to 138. If instead we look at the 0% report (97% identity), the number of sample hits increases to 6323.
We also compared the final list of top hits to the Sullam et. al. paper, specifically Table S1 from that paper. Because this paper was published in 2012, there were many sequence hits in our db that did not appear in the original paper. However, for those that did, we added the Sullam lifestyle category designations to the tree metadata.
We took all of these data and used iTOL to visualize the tree. For each top hit, we added the isolation source/natural host information, taxonomic affiliation, and Sullam lifestyle category. We also overlaid the number of hits to the IMNGS database for each ASV.
To view a full, interactive version of the tree, go to this iTOL page.
We used the tree to infer the lifestyle category of each ASV based on the closest relatives in its clade. This was not a quantitative determination, but rather a user-guided one. Aside from MetaMetaDB (discussed above) we are not aware of any tool currently available to quantitatively assess habitat preference of a 16S rRNA sequence.
For simplicity we focused on three lifestyle categories (though we have seven categories in the tree). Our reasoning, again based on the work of Sullam et al., was that fish intestines contain microbes that are generalists and possibly of environmental origin (what they eat, where they live), microbes that are there because fish are animals with guts, and microbes that are there because fish are fish and have a physiology and evolutionary history that select for specific organisms. Sullam et al. were also looking at fish from different habitats (freshwater, estuary, marine) and trophic levels (carnivores, herbivores, omnivores) while our study was more narrow in scope.
We combined these habitat predictions with the results of the BLASTn analysis, scan of the IMNGS database, Sullam lifestyle categories, etc. and put it all in one editable table. So if you disagree with a habitat prediction, you can feel free to change it.
Description of table headings
Note This table also scrolls horizontally.
habi_tab <- read.table("TABLES/INPUT/habitat_specificity.txt",
header = TRUE, sep = "\t", check.names = FALSE)
# order by habitat and host enriched
habi_tab2 <- habi_tab[order(habi_tab$habitat_code, habi_tab$Enriched), ]
#habi_tab <- habi_tab[, -2] #delete code column
da_asvs_counts <- as.data.frame(taxa_sums(da_asvs))
colnames(da_asvs_counts) <- c("total_reads")
# make rownames a column
da_asvs_counts <- cbind(ASV = rownames(da_asvs_counts), da_asvs_counts)
temp_table <- merge(da_asvs_counts, blast_tab, by = "ASV",
all = TRUE, sort = FALSE)
summ_table <- merge(temp_table, habi_tab2, by = "ASV",
all = TRUE, sort = FALSE, no.dups = TRUE)
summ_table <- summ_table[-c(22, 26, 27, 28, 29, 31)]
summ_table <- summ_table[c(1, 22, 23, 2, 24, 4, 3, 5, 6, 7, 8, 18, 19, 20,
9, 10, 11, 12, 13, 14, 15, 16, 17, 21, 25)]
datatable(summ_table, rownames = FALSE,
colnames = c(
"ASV", "Putative habitat", "Enriched", "Total reads",
"Taxon", "Num perfect hits", "Top hit acc", "% identity",
"Isolation source", "Nat host", "Common name", "Collection year",
"Country", "PubMed ID", "Alignment length", "Mismatches",
"Gap opens", "Q. start", "Q. end", "S. start", "S. end",
"Evalue", "Bit score", "Num IMNGS hits", "Sullam lifestyle"),
editable = TRUE, caption =
htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Supplementary Table 6: ",
htmltools::em("Assessing habitat specificity.")),
extensions = "Buttons",
options = list(columnDefs =
list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5,
6, 7, 8, 9, 10))),
dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 25, 60),
buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = TRUE))
write.table(summ_table, "TABLES/OUTPUT/SUPP/Table_S6.txt",
sep = "\t", col.names = c(
"ASV", "Putative habitat", "Enriched", "Total reads",
"Taxon", "Num perfect hits", "Top hit acc", "% identity",
"Isolation source", "Nat host", "Common name",
"Collection year", "Country", "PubMed ID", "Alignment length",
"Mismatches", "Gap opens", "Q. start", "Q. end", "S. start",
"S. end", "Evalue", "Bit score", "Num IMNGS hits",
"Sullam lifestyle"),
row.names = FALSE, quote = FALSE, fileEncoding = "UTF-8")
NR Indicates Not Recorded. Four ASVs had numerous hits at 100% identity. We did not include top hit data for these ASVs.
Now we can summarize the data for each lifestyle category. This table was constructed in a text file and read into R.
habi_summary <- read.table("TABLES/INPUT/Table_1.txt",
header = TRUE, sep = "\t", check.names = FALSE)
datatable(habi_summary,
rownames = FALSE, editable = TRUE,
caption = htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;",
"Table 1: ", htmltools::em("Summary of habitat specificity.")),
extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5))),
dom = "Brti", buttons = c("csv", "copy"),
scrollX = TRUE, scrollCollapse = TRUE))
So we know that Host X is enriched for some ASV from taxon Y. Is this part of a larger pattern or an isolated case? For a given taxonomic group and rank, what proportion of total reads (from all ASVs) were found in a particular host species? At some point it would be nice if this were an interactive step, but for now we must modify the code below to look at different taxa. This example will look at the family Desulfovibrionaceae (Deltaproteobacteria).
# Change this to select different taxa
calc_tax_prop <- subset_taxa(mergedGP, Family == "Desulfovibrionaceae")
calc_tax_prop
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 71 taxa and 5 samples ]
## sample_data() Sample Data: [ 5 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 71 taxa by 8 taxonomic ranks ]
sample_sums_by_taxa <- sample_sums(calc_tax_prop)
total_taxa_reads <- sum(sample_sums_by_taxa)
sample_sums_by_taxa <- as.data.frame(sample_sums_by_taxa)
sample_sums_by_taxa$proportion <-
(sample_sums_by_taxa$sample_sums_by_taxa /
total_taxa_reads) * 100
colnames(sample_sums_by_taxa) <- c("total taxa reads", "Proportion")
sample_sums_by_taxa$Proportion <- round(sample_sums_by_taxa$Proportion,
digits = 2)
total_taxa_reads_int <- as.integer(total_taxa_reads)
sample_sums_by_taxa
Great. Looks like there are 71 Desulfovibrionaceae ASVs and the majority (> 90%) of the reads are from Acanthurus. This is interesting. We can do this with any taxa we wish.
So let's do the same to look at the proportion of Cyanobacteria reads by host species.
# Change this to select different taxa
calc_tax_prop_Cyan <- subset_taxa(mergedGP, Phylum == "Cyanobacteria")
calc_tax_prop_Cyan
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 484 taxa and 5 samples ]
## sample_data() Sample Data: [ 5 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 484 taxa by 8 taxonomic ranks ]
sample_sums_by_taxa_Cyan <- sample_sums(calc_tax_prop_Cyan)
total_taxa_reads_Cyan <- sum(sample_sums_by_taxa_Cyan)
sample_sums_by_taxa_Cyan <- as.data.frame(sample_sums_by_taxa_Cyan)
sample_sums_by_taxa_Cyan$proportion <-
(sample_sums_by_taxa_Cyan$sample_sums_by_taxa_Cyan / total_taxa_reads_Cyan) * 100
colnames(sample_sums_by_taxa_Cyan) <- c("total taxa reads", "Proportion")
sample_sums_by_taxa_Cyan$Proportion <- round(sample_sums_by_taxa_Cyan$Proportion,
digits = 2)
total_taxa_reads_Cyan_int <- as.integer(total_taxa_reads_Cyan)
sample_sums_by_taxa_Cyan
There were a total of 484 Cyanobacteria ASVs across 72079 reads.
At this point we know which ASVs are enriched in which host species, the lineage of those ASVs, and something about where else these sequences have been detected in nature. Next we would like to know the proportion of total reads for each ASV that is found in each host species. We start with a summary table of these data.
# calculate the averages and merge by species
# grab the da_asv ps object & merge by samples
daASV_mergedGP_BAR <- merge_samples(da_asvs_full, "Sp")
#daASV_SD_BAR <- merge_samples(sample_data(da_asvs_full), "Sp")
# calculate percent proportion
daASV_AVG <- apply(t(otu_table(daASV_mergedGP_BAR)), 1, function(x) x / sum(x))
# transpose
daASV_t_AVG <- t(daASV_AVG)
daASV_t_AVG_df <- as.data.frame(daASV_t_AVG)
######################
# choose columns of interest
da_ASV_tax <- habi_tab[c("ASV", "Taxon", "Putative_habitat")]
da_ASV_tax2 <- da_ASV_tax[, -1]
rownames(da_ASV_tax2) <- da_ASV_tax[, 1]
######################
# combine based on ASV column
daASV_work <- merge(daASV_t_AVG_df, da_ASV_tax2, by = 0, sort = FALSE)
rownames(daASV_work) <- daASV_work[, 1]
daASV_work[, 1] <- NULL
#daASV_work
# then make column row.names
daASV_work2 <- cbind(ASV = rownames(daASV_work), daASV_work)
# melt the df
# wide to long format?
daASV_work3 <- melt(daASV_work2, value.name = "ASV")
colnames(daASV_work3) <- c("ASV", "Taxon", "Putative_habitat",
"Sample", "Proportion")
daASV_work3$Proportion <- round(daASV_work3$Proportion, digits = 4)
datatable(daASV_work3,
rownames = TRUE, editable = FALSE,
caption =
htmltools::tags$caption(
style = "caption-side: bottom; text-align: left;", "Table 8: ",
htmltools::em("DA ASV sample proportion.")),
extensions = "Buttons",
options = list(columnDefs = list(list(className = "dt-center",
targets = c(1, 2, 3, 4, 5))),
dom = "Blfrtip", pageLength = 5,
lengthMenu = c(5, 10, 50, 100, 300),
buttons = c("csv", "copy"), scrollX = TRUE,
scrollCollapse = TRUE))
write.table(daASV_work3, "TABLES/OUTPUT/prop_ASV_reads_by_host.txt",
sep = "\t", row.names = FALSE, quote = FALSE)
Now that we have a list of DA ASVs and their assigned habitat preference, we want to create an R object that organizes these in some logical fashion. We can then use this object to order subsequent graphs and tables. So let's order the ASVs by putative habitat preference and then by the host species in which that ASV was enriched. Seems reasonable enough? Depending on the R command, some objects need to be in ascending order, others in descending order.
asv_order <- c("ASV450", "ASV165", "ASV395", "ASV284", "ASV56",
"ASV6", "ASV359", "ASV128", "ASV127", "ASV91",
"ASV374", "ASV151", "ASV323", "ASV398", "ASV224",
"ASV39", "ASV34", "ASV12", "ASV32", "ASV250",
"ASV43", "ASV54", "ASV9", "ASV5", "ASV49", "ASV8",
"ASV41", "ASV18", "ASV7", "ASV90", "ASV29", "ASV98",
"ASV23", "ASV30", "ASV226", "ASV48", "ASV70", "ASV1",
"ASV14", "ASV298", "ASV82", "ASV75", "ASV69", "ASV57",
"ASV20", "ASV15", "ASV2", "ASV268", "ASV114", "ASV234",
"ASV174", "ASV60", "ASV17", "ASV22", "ASV159", "ASV44",
"ASV25", "ASV21", "ASV35")
asv_order_rev <- rev(asv_order)
Let's see if we can overlay all of this information in one “easy” to understand plot. The first thing to do is plot the proportion of reads for a given ASV from each host species.
daASV_work3$ASV <- as.character(daASV_work3$ASV)
daASV_work3$ASV <- factor(daASV_work3$ASV, levels = unique(daASV_work3$ASV))
daASV_work3$ASV <- factor(daASV_work3$ASV, levels = asv_order)
Next, we created a bar plot of read proportion by host species for each ASV and saved a copy to the FIGURES/ directory.
Code for proportional bar chart
#Bar charts
ASV_bar <- ggplot(daASV_work3, aes_string(x = "ASV", y = "Proportion",
fill = "Sample"),
environment = .e, ordered = TRUE,
xlab = "x-axis label", ylab = "y-axis label")
ASV_bar <- ASV_bar +
geom_bar(stat = "identity", position = position_stack(reverse = TRUE),
width = 0.95) +
coord_flip() +
theme(aspect.ratio = 2 / 1)
ASV_bar <- ASV_bar +
scale_fill_manual(values = samp_pal)
ASV_bar <- ASV_bar +
theme(axis.text.x = element_text(angle = 0, hjust = 0.95, vjust = 1))
ASV_bar <- ASV_bar +
guides(fill = guide_legend(override.aes = list(colour = NULL),
reverse = FALSE)) +
theme(legend.key = element_rect(colour = "black"))
ASV_bar <- ASV_bar +
labs(x = "Host species", y = "Proportion (% total reads)",
title = "ASV Proportion by host species")
ASV_bar <- ASV_bar +
theme(axis.line = element_line(colour = "black"),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_rect(colour = "black", fill = NA, size = 1))
ASV_bar
Code for heatmap
# Heatmap
library(ComplexHeatmap)
library(circlize)
library(heatmap3)
library(gdata)
fig4_heat <- as.data.frame(t(otu_table(da_asvs)))
# Convert habi_table to df and store in new variable
fig4_tax <- as.data.frame(habi_tab2)
# eliminate 1st column so can combine based on row names
fig4_tax_tab <- fig4_tax[, -1]
# Make new row.names from original table
rownames(fig4_tax_tab) <- fig4_tax[, 1]
# Reorder
fig4_tax_tab <- fig4_tax_tab[c(4, 2, 3, 1, 5, 6, 7, 8)]
# Select columns
fig4_tax_tab <- fig4_tax_tab[c(1:4)]
# Combine the two df by rowname If the matching involved row names,
# an extra character column called Row.names
# is added at the left, and in all cases the result has ‘automatic’ row names.
fig4_heatmap2 <- merge(fig4_tax_tab, fig4_heat, by = 0, all = TRUE)
fig4_heatmap <- subset(fig4_heatmap2, select = -c(Row.names))
rownames(fig4_heatmap) <- fig4_heatmap2[, "Row.names"]
# make rownames a column
fig4_heatmap <- cbind(ASV = rownames(fig4_heatmap), fig4_heatmap)
fig4_heatmap$ASV <- factor(fig4_heatmap$ASV, levels = rev(asv_order))
fig4_heatmap <- fig4_heatmap[order(fig4_heatmap$ASV), ]
# combine the columns to make one name
fig4_heatmap$ID <- paste(fig4_heatmap$ASV, fig4_heatmap$Taxon,
fig4_heatmap$Putative_habitat,
fig4_heatmap$Enriched, sep = "_")
# delete the original columns
fig4_heatmap2 <- fig4_heatmap[-c(1:5)]
# reorder
fig4_heatmap2 <- fig4_heatmap2[c(6, 1, 2, 3, 4, 5)]
rownames(fig4_heatmap2) <- fig4_heatmap2[, 1]
fig4_heatmap2 <- fig4_heatmap2[-1]
####Define Colors
taxa_colors <- unlist(lapply(row.names(fig4_heatmap2), function(x) {
if (grepl
# generalists
("Alphaproteobacteria", x)) "#000000"
else if (grepl("Pirellulaceae", x)) "#000000"
else if (grepl("Rubritaleaceae", x)) "#000000"
else if (grepl("Flavobacteriaceae", x)) "#000000"
# fish/animal
else if (grepl("Desulfovibrionaceae", x)) "#0072b2"
else if (grepl("Lachnospiraceae", x)) "#f0e442"
else if (grepl("Erysipelotrichaceae", x)) "#009e73"
else if (grepl("Ruminococcaceae", x)) "#e69f00"
else if (grepl("Bacteroidales", x)) "#d55e00"
else if (grepl("Fusobacteriaceae", x)) "#56b4e9"
else if (grepl("Vibrionaceae", x)) "#cc79a7"
# other
else if (grepl("Family_XIII", x)) "#808080"
else if (grepl("Mollicutes", x)) "#808080"
else if (grepl("Brevinemataceae", x)) "#808080"
else if (grepl("Peptostreptococcaceae", x)) "#808080"
}))
habitat_colors <- unlist(lapply(row.names(fig4_heatmap2), function(x) {
if (grepl
("fish", x)) "#808080"
else if (grepl("animal", x)) "#000000"
else if (grepl("generalist", x)) "#808080"
#else if (grepl("undetermined", x)) "#000000"
}))
heatColors <- cbind(taxa_colors, habitat_colors)
colnames(heatColors)[1] <- "Taxa"
colnames(heatColors)[2] <- "Habitat"
### SAVE/display heatmap
col <- colorRampPalette(bias = 1, c("#000033", "#66CCFF"))(16)
pdf(file = "FIGURES/OUTPUT/Figure_4B.pdf")
heatmap3(fig4_heatmap2, cexRow = 0.5, cexCol = 1,
margins = c(3, 13), RowSideColors = heatColors, scale = "row",
Colv = NA, Rowv = NA, revC = TRUE, balanceColor = FALSE, col = col)
invisible(dev.off())
heatmap3(fig4_heatmap2, cexRow = 0.5, cexCol = 1,
margins = c(3, 13), RowSideColors = heatColors, scale = "row",
Colv = NA, Rowv = NA, revC = TRUE, balanceColor = FALSE, col = col)
Combine the two charts. Either do this outside R or figure out a way to ‘grob’ the heatmap; grid::grid.grab seemed promising.
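Here is a rough sketch of one way to do it inside R (not run in this workflow). It assumes the gridGraphics package, which replays base graphics on the grid device so grid::grid.grab() can capture the heatmap as a grob; gridExtra then arranges it next to the ggplot object.
# Sketch only: capture the heatmap3 output as a grob and place it beside ASV_bar.
# Assumes the gridGraphics package is installed.
library(grid)
library(gridGraphics)
library(gridExtra)
heatmap3(fig4_heatmap2, cexRow = 0.5, cexCol = 1,
         margins = c(3, 13), RowSideColors = heatColors, scale = "row",
         Colv = NA, Rowv = NA, revC = TRUE, balanceColor = FALSE, col = col)
grid.echo()               # redraw the current base-graphics plot using grid
heat_grob <- grid.grab()  # capture the grid scene as a grob
grid.arrange(ggplotGrob(ASV_bar), heat_grob, ncol = 2)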
To test whether intestinal microbes were associated with a) phylogenetic history and/or b) foraging ecology of each herbivore, we used a series of simple and partial Mantel tests. Because we expected the relationships to potentially differ for putative resident symbionts vs. ingested environmental generalist microbes, we constructed separate dissimilarity matrices for host- and environment-associated ASVs. These matrices were constructed using the vegan package in R and were based on Bray-Curtis dissimilarity of Hellinger transformed data. The ecological dissimilarity matrix was based on the behavioural data collected to quantify herbivore trophic niche space. The phylogenetic dissimilarity matrix was based on a phylogenetic tree of the five fish species used in this study. We constructed the tree using cytochrome oxidase subunit 1 (COI) genes retrieved from NCBI’s Nucleotide Database. Clustal Omega was used to align sequences (default settings for DNA). We then used Jalview to manually curate and trim the final alignment to 593 bp. This alignment contained COI genes from n = 5 Scarus taeniopterus, 22 Sparisoma aurofrenatum, 21 Sparisoma viride, 28 Acanthurus coeruleus, and 23 Acanthurus tractus. We used members of the Gerridae (2 Eucinostomus and 4 Gerres) as the outgroup. We used RAxML and the GTR model for tree computation and the GAMMA rate model for likelihoods. The tree was then transformed into a distance matrix using the cophenetic function in R.
detach("package:phyloseq", unload=TRUE)
library(ape)
library(picante)
library(ggtree)
library(tidytree)
library(treeio)
# Get phylogenetic data ------------------
# Read newick tree ------------------
tree <- read.tree("TABLES/INPUT/MANTEL_TEST/item_orders.txt")
#ggtree(tree) + geom_tiplab(color = "blue")
host_tree <- knitr::include_graphics("FIGURES/INPUT/collapse_tree.svg",
dpi = 300)
host_tree
Next, we pruned the tree to one member of each species, removed the outgroup, and changed the names to species names.
d <- matrix(nrow = 1, ncol = 5)
colnames(d) <- c("HM379826_Acanthurus_coeruleus",
"LIDM544-07_Acanthurus_tractus",
"MXIV480-10_Scarus_taeniopterus",
"JQ841390_Sparisoma_aurofrenatum",
"JQ839595_Sparisoma_viride")
tree.p <- prune.sample(phylo = tree, samp = d)
#plot(tree.p)
# Change the names to species names
tree.p$tip.label[1] <- "Sparisoma_aurofrenatum"
tree.p$tip.label[2] <- "Sparisoma_viride"
tree.p$tip.label[3] <- "Scarus_taeniopterus"
tree.p$tip.label[4] <- "Acanthurus_tractus"
tree.p$tip.label[5] <- "Acanthurus_coeruleus"
plot(tree.p)
We then transformed the tree into a distance matrix and generated a dendrogram.
#Transform tree into a distance matrix
trx <- cophenetic(tree.p)
#that works but I need these in alphabetical order by species
T <- dist(cophenetic(tree.p))
ordering <- sort(attr(T, "Labels"))
T.mat <- as.matrix(T)[ordering, ordering]
T <- as.dist(T.mat)
#plot cluster of distance matrix
cluster_phylo <- hclust(T, method = "ward.D")
plot(cluster_phylo, main = "Phylogenetic clustering",
xlab = "Host species", ylab = "Distance",
sub = "hellinger/bray-curtis/ward")
Next we grab the ecological data and standardize the variables so that they have similar weights.
First, rescale quantitative traits to be in the range 0 to 1 and then divide by the number of diet categories so they have similar influence to the diet variables. Then rescale all ‘non diet’ traits to have similar influence to the diet traits by dividing by the number of diet categories divided by the number of categories for each substrate characteristic. Now combine into a single data frame for analysis and get averages for each species.
all_traits <- read.csv(
"TABLES/INPUT/MANTEL_TEST/Mean_bite_characteristics.csv",
header = TRUE
)
ids <- all_traits[, 1:2]
quant_traits_std <- decostand(all_traits[, 3:4], "range") / 10
Mean_prop_mark_on_substrate_std <- all_traits[, 5] / (10 / 2)
prop_vertical_std <- all_traits[, 6] / (10 / 2)
prop_concave_std <- all_traits[, 7] / (10 / 3)
prop_convex_std <- all_traits[, 8] / (10 / 3)
all_traits_std <- cbind(quant_traits_std,
Mean_prop_mark_on_substrate_std,
prop_vertical_std, prop_concave_std,
prop_convex_std,
all_traits[, 9:18]
)
mean_traits <- aggregate(all_traits[, 3:18],
by = list(all_traits$Species), mean)
traits <- mean_traits[, 2:17]
rownames(traits) <- as.vector(mean_traits[, 1])
Fish_species <- as.vector(mean_traits[, 1])
And begin with a Hellinger transformation of Ecological traits.
traits_trans <- decostand(traits, method = "hellinger")
traits_dist <- vegdist(traits_trans, method = "bray")
cluster_traits <- hclust(traits_dist, method = "ward.D")
plot(cluster_traits,
labels = Fish_species, main = "Ecological traits",
xlab = "Host species", ylab = "Distance",
sub = "hellinger/bray-curtis/ward")
So, is ecological data correlated with phylogeny?
# Is ecological data correlated with phylogeny
mantel(traits_dist, T, method = "pearson", permutations = 9999)
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = traits_dist, ydis = T, method = "pearson", permutations = 9999)
##
## Mantel statistic r: 0.4816
## Significance: 0.175
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.593 0.654 0.751 0.817
## Permutation: free
## Number of permutations: 119
OK, looks like there is no correlation between ecological data and phylogeny. (Note that although we requested 9999 permutations, there are only 5! = 120 possible orderings of five host species, so vegan enumerates them completely and reports 119 permutations.) Next, we looked at differentially abundant ASVs split into host-associated vs environmentally associated groups.
asv4 <- read.delim(
"TABLES/INPUT/MANTEL_TEST/2_da_asv_merged_fish.txt",
header = T
)
asv5 <- read.delim(
"TABLES/INPUT/MANTEL_TEST/2_da_asv_merged_animal.txt",
header = T
)
asv_host <- rbind(asv4, asv5)
# Merge the fish and animal datasets together
# Transpose dataframe
asv_host_t <- data.frame(t(asv_host[-1]))
colnames(asv_host_t) <- asv_host[, 1]
library(vegan)
# Create dendrogram based on similarity in asv
Fish_species <- as.vector(colnames(asv_host[2:6]))
Gut_contents <- sqrt(asv_host_t[])
#Try hellinger transformation
Gut_contents <- decostand(asv_host_t, method = "hellinger")
Gut_dist_host <- vegdist(Gut_contents, method = "bray")
cluster_gut_host <- hclust(Gut_dist_host, method = "ward.D")
plot(cluster_gut_host, labels = Fish_species, main = "Host-associated ASVs",
xlab = "Host species", ylab = "Distance", sub = "hellinger/bray-curtis/ward")
So is the distance matrix (based on host associated ASVs) correlated with ecological data or phylogeny?
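The mantel() calls themselves are not echoed in this chunk; reconstructed from the Call lines in the output below, the two tests were:
# Host-associated ASV dissimilarity vs phylogenetic distance (T, from above)
mantel(Gut_dist_host, T, method = "pearson", permutations = 9999)
# Host-associated ASV dissimilarity vs ecological trait dissimilarity
mantel(Gut_dist_host, traits_dist, method = "pearson", permutations = 9999)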
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = Gut_dist_host, ydis = T, method = "pearson", permutations = 9999)
##
## Mantel statistic r: 0.8448
## Significance: 0.0083333
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.556 0.711 0.754 0.820
## Permutation: free
## Number of permutations: 119
Yes, highly correlated with phylogeny…
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = Gut_dist_host, ydis = traits_dist, method = "pearson", permutations = 9999)
##
## Mantel statistic r: 0.2511
## Significance: 0.25833
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.490 0.585 0.662 0.752
## Permutation: free
## Number of permutations: 119
…but not associated with ecological traits.
What about the distance matrix based on environment-associated ASVs? Is it correlated with ecological data or phylogeny?
asv6 <- read.delim(
"TABLES/INPUT/MANTEL_TEST/2_da_asv_merged_environmental.txt",
header = T)
# Transpose dataframe
asv_env_t <- data.frame(t(asv6[-1]))
colnames(asv_env_t) <- asv6[, 1]
# Create dendrogram based on similarity in asv
Fish_species <- as.vector(colnames(asv6[2:6]))
Gut_contents <- sqrt(asv_env_t[])
#Try hellinger transformation
Gut_contents <- decostand(asv_env_t, method = "hellinger")
Gut_dist_env <- vegdist(Gut_contents, method = "bray")
cluster_gut_env <- hclust(Gut_dist_env, method = "ward.D")
plot(cluster_gut_env, labels = Fish_species,
main = "Environment-associated ASVs",
xlab = "Host species", ylab = "Distance",
sub = "hellinger/bray-curtis/ward")
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = Gut_dist_env, ydis = T, method = "pearson", permutations = 9999)
##
## Mantel statistic r: 0.8313
## Significance: 0.075
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.455 0.862 0.893 0.931
## Permutation: free
## Number of permutations: 119
Doesn’t look correlated with phylogeny…
##
## Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel(xdis = Gut_dist_env, ydis = traits_dist, method = "pearson", permutations = 9999)
##
## Mantel statistic r: 0.5963
## Significance: 0.083333
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.531 0.690 0.765 0.788
## Permutation: free
## Number of permutations: 119
…or ecological data.
We could also do partial Mantel tests.
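Again reconstructing the calls from the Call lines in the output that follows, the two partial Mantel tests were:
# Environment-associated ASVs vs phylogeny, controlling for ecological traits
mantel.partial(Gut_dist_env, T, traits_dist, method = "pearson", permutations = 9999)
# Environment-associated ASVs vs ecological traits, controlling for phylogeny
mantel.partial(Gut_dist_env, traits_dist, T, method = "pearson", permutations = 9999)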
##
## Partial Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel.partial(xdis = Gut_dist_env, ydis = T, zdis = traits_dist, method = "pearson", permutations = 9999)
##
## Mantel statistic r: 0.7734
## Significance: 0.091667
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.497 0.829 0.871 0.914
## Permutation: free
## Number of permutations: 119
##
## Partial Mantel statistic based on Pearson's product-moment correlation
##
## Call:
## mantel.partial(xdis = Gut_dist_env, ydis = traits_dist, zdis = T, method = "pearson", permutations = 9999)
##
## Mantel statistic r: 0.4022
## Significance: 0.15
##
## Upper quantiles of permutations (null model):
## 90% 95% 97.5% 99%
## 0.516 0.632 0.830 0.839
## Permutation: free
## Number of permutations: 119
But again, no significance detected…
Here is code for other representations of taxa abundance. These are raw R images that have not been gussied up.
We will create two different representations of relative abundance for each sample (arranged by host species) by major Classes—facet grid box-and-whisker plots and bar charts. We will generate each separately (and save) using an earlier relative abundance phyloseq object and then display the combined output.
Code for box-and-whisker plot.
library(phyloseq)
mdata_phy_all <- tax_glom(ps_slv_filt_AVG, taxrank = "Class", NArm = FALSE)
# You can choose any taxonomic level here
mdata_phyrel_all <- transform_sample_counts(
mdata_phy_all, function(x) x / sum(x)
)
meltd_all <- psmelt(mdata_phyrel_all)
meltd_all$Class <- as.character(meltd_all$Class)
means <- ddply(meltd_all, ~Class, function(x) c(mean = mean(x$Abundance)))
# decending order
taxa_means <- means[order(-means$mean), ]
# ditch the sci notation
taxa_means <- format(taxa_means, scientific = FALSE)
# Here we conglomerate taxa at or below 2.6% mean relative abundance into 'Other'.
Other <- means[means$mean <= 0.026, ]$Class
meltd_all[meltd_all$Class %in% Other, ]$Class <- "Other"
samp_names <- aggregate(meltd_all$Abundance,
by = list(meltd_all$Sample), FUN = sum)[, 1]
.e <- environment()
meltd_all[, "Class"] <- factor(meltd_all[, "Class"],
sort(unique(meltd_all[, "Class"])))
meltd_all <- meltd_all[order(meltd_all[, "Class"]), ]
levels(meltd_all$Class)
# Here we order Classes by the Phylum they belong to.
meltd_all$Class <- factor(meltd_all$Class,
levels = c(
"Bacteroidia", "Clostridia", "Erysipelotrichia",
"Fusobacteriia", "Alphaproteobacteria",
"Deltaproteobacteria", "Gammaproteobacteria",
"Planctomycetacia", "Oxyphotobacteria", "Other"))
sup_fig1 <- qplot(data = meltd_all, x = Sp, y = Abundance, fill = Class,
geom = "boxplot", ylab = "Relative Abundance") +
theme(legend.position = "bottom") +
facet_grid(Class ~ ., scales = "free_y", space = "free_y") +
geom_jitter(width = 0.05) +
geom_point(colour = "black", fill = "white")
#+ guides(guide_legend(reverse = FALSE) )
sup_fig1 <- sup_fig1 +
scale_fill_manual(values = friend_pal) +
labs(x = "Host species", y = "Relative abundance (% total reads)")
pdf("FIGURES/OUTPUT/box_and_whisker.pdf")
sup_fig1
invisible(dev.off())
Code for bar plot.
sup_fig2 <- ggplot(meltd_all, aes_string(x = "Sample", y = "Abundance",
fill = "Class"), environment = .e, Ordered = TRUE)
sup_fig2 <- sup_fig2 + geom_bar(stat = "identity", position = "stack") +
facet_grid(Class ~ Sp, scales = "free", space = "free")
sup_fig2 <- sup_fig2 + scale_fill_manual(values = friend_pal)
# sup_fig2 <- sup_fig2 + theme(axis.text.x = element_text(angle = -90,
# hjust = 0))
sup_fig2 <- sup_fig2 + theme(axis.text.x = element_blank())
sup_fig2 <- sup_fig2 +
guides(fill = guide_legend(override.aes = list(colour = NULL),
reverse = FALSE)) +
theme(legend.key = element_rect(colour = "black"),
legend.position = "bottom") +
labs(x = "Individual samples", y = "Relative abundance (% total reads)")
pdf("FIGURES/OUTPUT/Figure_S1.pdf")
sup_fig2
invisible(dev.off())
Specific tools
– phyloseq as the primary analytical package.
– LEfSe to identify differentially abundant (DA) amplicon sequence variants (ASV) across host fish species.
– MicrobiomeAnalyst to conduct LEfSe analysis.
– BLASTn and Silva ACT to identify closest hits to DA ASVs.
– RAxML for phylogenetic inference of DA ASVs and closest hits.
– iTOL for visualization of tree and associated metadata.
Other valuable resources
– R Markdown: The Definitive Guide
– knitr tutorials. Fantastic site!
– Microbiota analysis in R, UCR Workshop 2018 nicely documented workflow with examples.
It is now time to submit the data to your favorite sequence read archive. We submitted our data to the European Nucleotide Archive (ENA). The ENA does not like RAW data and prefers to have primers removed. So we submitted the trimmed fastq files to the ENA. You can find these data under the study accession number PRJEB28397. The RAW files are on our figshare site. TODO INSERT DOI/LINK
To submit to the ENA you need two data tables (plus your sequence data). The first file describes the samples and the second file describes the sequencing data. The original files can be found on the figshare site.
Below are the specific packages and versions used in this workflow, using both sessionInfo() and devtools::session_info().
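The calls that produced the output below are simply:
# report attached and loaded package versions for this session
sessionInfo()
devtools::session_info()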
## user system elapsed
## -8057.647 -115.817 -3044.500
## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## Random number generation:
## RNG: Mersenne-Twister
## Normal: Inversion
## Sample: Rounding
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] grid stats4 parallel stats graphics grDevices utils
## [8] datasets methods base
##
## other attached packages:
## [1] phyloseq_1.28.0 treeio_1.8.2
## [3] tidytree_0.2.7 ggtree_1.16.6
## [5] picante_1.8 nlme_3.1-141
## [7] ape_5.3 gdata_2.18.0
## [9] heatmap3_1.1.6 circlize_0.4.8
## [11] ComplexHeatmap_2.1.0 ggpubr_0.2.3
## [13] magrittr_1.5 ggthemes_4.2.0
## [15] pairwiseAdonis_0.0.1 cluster_2.1.0
## [17] plotly_4.9.0 RCurl_1.95-4.12
## [19] bitops_1.0-6 svgPanZoom_0.3.3
## [21] gridExtra_2.3 forcats_0.4.0
## [23] stringr_1.4.0 dplyr_0.8.3
## [25] purrr_0.3.2 readr_1.3.1
## [27] tidyr_0.8.3 tibble_2.1.3
## [29] tidyverse_1.2.1 formatR_1.7
## [31] pander_0.6.3 rmarkdown_1.16
## [33] DT_0.8 data.table_1.12.2
## [35] kableExtra_1.1.0 knitr_1.25
## [37] rstudioapi_0.10 reshape2_1.4.3
## [39] scales_1.0.0 vegan_2.5-6
## [41] lattice_0.20-38 permute_0.9-5
## [43] plyr_1.8.4 ggplot2_3.2.1
## [45] ShortRead_1.42.0 GenomicAlignments_1.20.1
## [47] SummarizedExperiment_1.14.1 DelayedArray_0.10.0
## [49] matrixStats_0.55.0 Biobase_2.44.0
## [51] Rsamtools_2.0.0 GenomicRanges_1.36.1
## [53] GenomeInfoDb_1.20.0 Biostrings_2.52.0
## [55] XVector_0.24.0 IRanges_2.18.2
## [57] S4Vectors_0.22.1 BiocParallel_1.18.1
## [59] BiocGenerics_0.30.0 dada2_1.12.1
## [61] Rcpp_1.0.2
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.1.5 igraph_1.2.4.1
## [4] lazyeval_0.2.2 splines_3.6.0 crosstalk_1.0.0
## [7] digest_0.6.21 foreach_1.4.7 htmltools_0.3.6
## [10] fastcluster_1.1.25 modelr_0.1.5 RcppParallel_4.4.3
## [13] colorspace_1.4-1 rvest_0.3.4 haven_2.1.1
## [16] xfun_0.10 crayon_1.3.4 jsonlite_1.6
## [19] zeallot_0.1.0 survival_2.44-1.1 iterators_1.0.12
## [22] glue_1.3.1 gtable_0.3.0 zlibbioc_1.30.0
## [25] webshot_0.5.1 GetoptLong_0.1.7 Rhdf5lib_1.6.1
## [28] shape_1.4.4 viridisLite_0.3.0 xtable_1.8-4
## [31] clue_0.3-57 htmlwidgets_1.3 httr_1.4.1
## [34] RColorBrewer_1.1-2 pkgconfig_2.0.3 tidyselect_0.2.5
## [37] labeling_0.3 rlang_0.4.0 later_0.8.0
## [40] munsell_0.5.0 cellranger_1.1.0 tools_3.6.0
## [43] cli_1.1.0 generics_0.0.2 ade4_1.7-13
## [46] broom_0.5.2 evaluate_0.14 biomformat_1.12.0
## [49] yaml_2.2.0 mime_0.7 xml2_1.2.2
## [52] compiler_3.6.0 png_0.1-7 ggsignif_0.6.0
## [55] stringi_1.4.3 highr_0.8 Matrix_1.2-17
## [58] multtest_2.40.0 vctrs_0.2.0 pillar_1.4.2
## [61] GlobalOptions_0.1.0 httpuv_1.5.1 R6_2.4.0
## [64] latticeExtra_0.6-28 hwriter_1.3.2 promises_1.0.1
## [67] codetools_0.2-16 MASS_7.3-51.4 gtools_3.8.1
## [70] assertthat_0.2.1 rhdf5_2.28.0 rjson_0.2.20
## [73] withr_2.1.2 GenomeInfoDbData_1.2.1 mgcv_1.8-28
## [76] hms_0.5.1 rvcheck_0.1.3 shiny_1.3.2
## [79] lubridate_1.7.4 base64enc_0.1-3
## ─ Session info ──────────────────────────────────────────────────────────
## setting value
## version R version 3.6.0 (2019-04-26)
## os macOS Mojave 10.14.6
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz America/Panama
## date 2019-10-08
##
## ─ Packages ──────────────────────────────────────────────────────────────
## package * version date lib
## ade4 1.7-13 2018-08-31 [1]
## ape * 5.3 2019-03-17 [1]
## assertthat 0.2.1 2019-03-21 [1]
## backports 1.1.5 2019-10-02 [1]
## base64enc 0.1-3 2015-07-28 [1]
## Biobase * 2.44.0 2019-05-02 [1]
## BiocGenerics * 0.30.0 2019-05-02 [1]
## BiocParallel * 1.18.1 2019-08-06 [1]
## biomformat 1.12.0 2019-05-02 [1]
## Biostrings * 2.52.0 2019-05-02 [1]
## bitops * 1.0-6 2013-08-17 [1]
## broom 0.5.2 2019-04-07 [1]
## callr 3.3.1 2019-07-18 [1]
## cellranger 1.1.0 2016-07-27 [1]
## circlize * 0.4.8 2019-09-08 [1]
## cli 1.1.0 2019-03-19 [1]
## clue 0.3-57 2019-02-25 [1]
## cluster * 2.1.0 2019-06-19 [1]
## codetools 0.2-16 2018-12-24 [1]
## colorspace 1.4-1 2019-03-18 [1]
## ComplexHeatmap * 2.1.0 2019-09-10 [1]
## crayon 1.3.4 2017-09-16 [1]
## crosstalk 1.0.0 2016-12-21 [1]
## dada2 * 1.12.1 2019-05-14 [1]
## data.table * 1.12.2 2019-04-07 [1]
## DelayedArray * 0.10.0 2019-05-02 [1]
## desc 1.2.0 2018-05-01 [1]
## devtools 2.2.0 2019-09-07 [1]
## digest 0.6.21 2019-09-20 [1]
## dplyr * 0.8.3 2019-07-04 [1]
## DT * 0.8 2019-08-07 [1]
## ellipsis 0.3.0 2019-09-20 [1]
## evaluate 0.14 2019-05-28 [1]
## fastcluster 1.1.25 2018-06-07 [1]
## forcats * 0.4.0 2019-02-17 [1]
## foreach 1.4.7 2019-07-27 [1]
## formatR * 1.7 2019-06-11 [1]
## fs 1.3.1 2019-05-06 [1]
## gdata * 2.18.0 2017-06-06 [1]
## generics 0.0.2 2018-11-29 [1]
## GenomeInfoDb * 1.20.0 2019-05-02 [1]
## GenomeInfoDbData 1.2.1 2019-09-10 [1]
## GenomicAlignments * 1.20.1 2019-06-18 [1]
## GenomicRanges * 1.36.1 2019-09-06 [1]
## GetoptLong 0.1.7 2018-06-10 [1]
## ggplot2 * 3.2.1 2019-08-10 [1]
## ggpubr * 0.2.3 2019-09-03 [1]
## ggsignif 0.6.0 2019-08-08 [1]
## ggthemes * 4.2.0 2019-05-13 [1]
## ggtree * 1.16.6 2019-08-26 [1]
## GlobalOptions 0.1.0 2018-06-09 [1]
## glue 1.3.1 2019-03-12 [1]
## gridExtra * 2.3 2017-09-09 [1]
## gtable 0.3.0 2019-03-25 [1]
## gtools 3.8.1 2018-06-26 [1]
## haven 2.1.1 2019-07-04 [1]
## heatmap3 * 1.1.6 2019-03-22 [1]
## highr 0.8 2019-03-20 [1]
## hms 0.5.1 2019-08-23 [1]
## htmltools 0.3.6 2017-04-28 [1]
## htmlwidgets 1.3 2018-09-30 [1]
## httpuv 1.5.1 2019-04-05 [1]
## httr 1.4.1 2019-08-05 [1]
## hwriter 1.3.2 2014-09-10 [1]
## igraph 1.2.4.1 2019-04-22 [1]
## IRanges * 2.18.2 2019-08-24 [1]
## iterators 1.0.12 2019-07-26 [1]
## jsonlite 1.6 2018-12-07 [1]
## kableExtra * 1.1.0 2019-03-16 [1]
## knitr * 1.25 2019-09-18 [1]
## labeling 0.3 2014-08-23 [1]
## later 0.8.0 2019-02-11 [1]
## lattice * 0.20-38 2018-11-04 [1]
## latticeExtra 0.6-28 2016-02-09 [1]
## lazyeval 0.2.2 2019-03-15 [1]
## lubridate 1.7.4 2018-04-11 [1]
## magrittr * 1.5 2014-11-22 [1]
## MASS 7.3-51.4 2019-03-31 [1]
## Matrix 1.2-17 2019-03-22 [1]
## matrixStats * 0.55.0 2019-09-07 [1]
## memoise 1.1.0 2017-04-21 [1]
## mgcv 1.8-28 2019-03-21 [1]
## mime 0.7 2019-06-11 [1]
## modelr 0.1.5 2019-08-08 [1]
## multtest 2.40.0 2019-05-02 [1]
## munsell 0.5.0 2018-06-12 [1]
## nlme * 3.1-141 2019-08-01 [1]
## pairwiseAdonis * 0.0.1 2019-09-10 [1]
## pander * 0.6.3 2018-11-06 [1]
## permute * 0.9-5 2019-03-12 [1]
## phyloseq * 1.28.0 2019-05-02 [1]
## picante * 1.8 2019-03-21 [1]
## pillar 1.4.2 2019-06-29 [1]
## pkgbuild 1.0.5 2019-08-26 [1]
## pkgconfig 2.0.3 2019-09-22 [1]
## pkgload 1.0.2 2018-10-29 [1]
## plotly * 4.9.0 2019-04-10 [1]
## plyr * 1.8.4 2016-06-08 [1]
## png 0.1-7 2013-12-03 [1]
## prettyunits 1.0.2 2015-07-13 [1]
## processx 3.4.1 2019-07-18 [1]
## promises 1.0.1 2018-04-13 [1]
## ps 1.3.0 2018-12-21 [1]
## purrr * 0.3.2 2019-03-15 [1]
## R6 2.4.0 2019-02-14 [1]
## RColorBrewer 1.1-2 2014-12-07 [1]
## Rcpp * 1.0.2 2019-07-25 [1]
## RcppParallel 4.4.3 2019-05-22 [1]
## RCurl * 1.95-4.12 2019-03-04 [1]
## readr * 1.3.1 2018-12-21 [1]
## readxl 1.3.1 2019-03-13 [1]
## remotes 2.1.0 2019-06-24 [1]
## reshape2 * 1.4.3 2017-12-11 [1]
## rhdf5 2.28.0 2019-05-02 [1]
## Rhdf5lib 1.6.1 2019-09-09 [1]
## rjson 0.2.20 2018-06-08 [1]
## rlang 0.4.0 2019-06-25 [1]
## rmarkdown * 1.16 2019-10-01 [1]
## rprojroot 1.3-2 2018-01-03 [1]
## Rsamtools * 2.0.0 2019-05-02 [1]
## rstudioapi * 0.10 2019-03-19 [1]
## rvcheck 0.1.3 2018-12-06 [1]
## rvest 0.3.4 2019-05-15 [1]
## S4Vectors * 0.22.1 2019-09-09 [1]
## scales * 1.0.0 2018-08-09 [1]
## sessioninfo 1.1.1 2018-11-05 [1]
## shape 1.4.4 2018-02-07 [1]
## shiny 1.3.2 2019-04-22 [1]
## ShortRead * 1.42.0 2019-05-02 [1]
## stringi 1.4.3 2019-03-12 [1]
## stringr * 1.4.0 2019-02-10 [1]
## SummarizedExperiment * 1.14.1 2019-07-31 [1]
## survival 2.44-1.1 2019-04-01 [1]
## svgPanZoom * 0.3.3 2016-09-26 [1]
## testthat 2.2.1 2019-07-25 [1]
## tibble * 2.1.3 2019-06-06 [1]
## tidyr * 0.8.3 2019-03-01 [1]
## tidyselect 0.2.5 2018-10-11 [1]
## tidytree * 0.2.7 2019-09-12 [1]
## tidyverse * 1.2.1 2017-11-14 [1]
## treeio * 1.8.2 2019-08-21 [1]
## usethis 1.5.1 2019-07-04 [1]
## vctrs 0.2.0 2019-07-05 [1]
## vegan * 2.5-6 2019-09-01 [1]
## viridisLite 0.3.0 2018-02-01 [1]
## webshot 0.5.1 2018-09-28 [1]
## withr 2.1.2 2018-03-15 [1]
## xfun 0.10 2019-10-01 [1]
## xml2 1.2.2 2019-08-09 [1]
## xtable 1.8-4 2019-04-21 [1]
## XVector * 0.24.0 2019-05-02 [1]
## yaml 2.2.0 2018-07-25 [1]
## zeallot 0.1.0 2018-01-28 [1]
## zlibbioc 1.30.0 2019-05-02 [1]
## source
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## Bioconductor
## Bioconductor
## Bioconductor
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Github (jokergoo/ComplexHeatmap@35d1d20)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## Bioconductor
## Bioconductor
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Github (pmartinezarbizu/pairwiseAdonis@6e09713)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
## CRAN (R 3.6.0)
## CRAN (R 3.6.0)
## Bioconductor
##
## [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
## Time difference of 1.353919 mins