Metadata

The metadata is collected from the samplesheet.

Sample ID	Group ID
A	A
B	A
C	B
D	B

Sample-level Quality control

High-level quality control of the samples and their behavior based on the count data matrix.

Sample correlation

MDS Plot

A multidimensional scaling (MDS) plot is generated to inspect how samples cluster after normalization with their relative normalization factors.
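
The MDS plot depends on how the counts were normalized. As an illustration only, the sketch below computes DESeq2-style median-of-ratios size factors, assuming the relative normalization factors used here are of that kind (edgeR's TMM factors, for instance, are computed differently). The function name and toy counts are hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (DESeq2-style sketch).

    counts: dict gene -> list of raw counts, one per sample (same sample order).
    Genes with a zero in any sample are skipped because their geometric mean is 0.
    Raises StatisticsError if every gene contains a zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Each sample's factor is the median ratio to the per-gene geometric mean
    return [math.exp(median(r)) for r in log_ratios]

# Hypothetical raw counts for samples A, B, C, D: B and D are sequenced twice as deeply
toy = {
    "g1": [10, 20, 10, 20],
    "g2": [100, 200, 100, 200],
    "g3": [50, 100, 50, 100],
}
print([round(s, 3) for s in size_factors(toy)])
```

Dividing each sample's counts by its factor puts the samples on a comparable scale before distances between them are computed.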

Mean-variance trend

DESeq2 dispersion estimates (dispersion being a measure of spread or variability in the data) are inversely related to the mean and directly related to the variance. Based on this relationship, the dispersion is higher for small mean counts and lower for large mean counts. The dispersion estimates for genes with the same mean differ only based on their variance. Therefore, the dispersion estimates reflect the variance in gene expression for a given mean value.
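
Under the negative binomial model the variance is Var = mu + alpha * mu^2, so alpha = (Var - mu) / mu^2: larger variance raises the dispersion, a larger mean lowers it. The sketch below uses this simple method-of-moments estimate for illustration; DESeq2 itself estimates dispersions by maximum likelihood with shrinkage, not this way. The gene values are hypothetical.

```python
from statistics import mean, variance

def moment_dispersion(counts):
    """Method-of-moments NB dispersion for one gene: alpha = (var - mu) / mu**2."""
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    return (var - mu) / mu ** 2

# Two hypothetical genes with the same mean (100) but different spread
low_spread = [80, 100, 120, 100]
high_spread = [50, 150, 60, 140]
print(moment_dispersion(low_spread), moment_dispersion(high_spread))
```

With the mean held fixed, the gene with the larger variance gets the larger dispersion, matching the description above.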

The plot of mean versus variance in count data below shows the variance in gene expression increases with the mean expression (each black dot is a gene). Notice that the relationship between mean and variance is linear on the log scale, and for higher means, we could predict the variance relatively accurately given the mean. However, for low mean counts, the variance estimates have a much larger spread; therefore, the dispersion estimates will differ much more between genes with small means.
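
The "linear on the log scale" observation can be made concrete by fitting log10(variance) against log10(mean) across genes. A minimal sketch with ordinary least squares; the function name and toy genes are hypothetical, and the toy data are constructed so that variance = mean^2 / 2 exactly (real data typically give a slope between 1 and 2).

```python
import math
from statistics import mean, variance

def loglog_trend(genes):
    """Fit log10(variance) = intercept + slope * log10(mean) by least squares."""
    xs, ys = [], []
    for counts in genes:
        mu, var = mean(counts), variance(counts)
        if mu > 0 and var > 0:  # the log is undefined otherwise
            xs.append(math.log10(mu))
            ys.append(math.log10(var))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

# Hypothetical genes (two samples each) with means 2, 20 and 200
genes = [[1, 3], [10, 30], [100, 300]]
slope, intercept = loglog_trend(genes)
print(slope, intercept)
```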

Differential Gene Expression (DGE) analysis

Differential expression analysis is performed using DESeq2 and is based on the negative binomial (a.k.a. Gamma-Poisson) distribution.
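
For reference, a sketch of the negative binomial probability mass function in the mean/dispersion parametrization (Var = mu + alpha * mu^2), which is the same distribution as a Poisson whose rate follows a Gamma distribution with shape 1/alpha. The parameter values are hypothetical; the check confirms the mean and variance numerically.

```python
import math

def nb_pmf(k, mu, alpha):
    """P(K = k) for a negative binomial with mean mu and dispersion alpha."""
    r = 1.0 / alpha  # Gamma shape, also called the NB "size"
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))
    return math.exp(log_p)

mu, alpha = 20.0, 0.1  # hypothetical gene mean and dispersion
probs = [nb_pmf(k, mu, alpha) for k in range(2000)]  # tail beyond 2000 is negligible
total = sum(probs)
m = sum(k * p for k, p in enumerate(probs))
v = sum((k - m) ** 2 * p for k, p in enumerate(probs))
print(total, m, v)  # v should be close to mu + alpha * mu**2 = 60
```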

Gene-level Quality control

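DESeq2 also omits genes that have little or no chance of being detected as differentially expressed (their p-values or adjusted p-values are reported as NA). Omitting them reduces the number of tests corrected for and thereby increases the power to detect differentially expressed genes.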

The genes omitted fall into three categories:

Genes with zero counts in all samples
Genes with an extreme count outlier
Genes with low mean normalized counts
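
The first and third categories can be sketched as a simple split of genes on their normalized counts. This is an illustration under stated assumptions: `min_mean` is a hypothetical fixed threshold, whereas DESeq2's independent filtering chooses the threshold that maximizes the number of significant genes; outlier detection (Cook's distance) is not sketched.

```python
from statistics import mean

def filter_genes(norm_counts, min_mean):
    """Split genes into kept and omitted (with a reason), using normalized counts."""
    kept, omitted = [], {}
    for gene, counts in norm_counts.items():
        if all(c == 0 for c in counts):
            omitted[gene] = "zero counts in all samples"
        elif mean(counts) < min_mean:
            omitted[gene] = "low mean normalized count"
        else:
            kept.append(gene)
    return kept, omitted

# Hypothetical normalized counts for four samples
toy = {"g1": [0, 0, 0, 0], "g2": [1, 0, 2, 1], "g3": [150, 180, 90, 110]}
kept, omitted = filter_genes(toy, min_mean=5)
print(kept)     # ['g3']
print(omitted)  # {'g1': 'zero counts in all samples', 'g2': 'low mean normalized count'}
```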

This plot shows per-gene dispersion estimates together with the fitted mean-dispersion relationship.
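
DESeq2's parametric trend has the form dispersion = asymptDisp + extraPois / mean. The sketch below fits that form with plain least squares on 1/mean, a simplification of DESeq2's robust gamma-family fit. The per-gene estimates are hypothetical and generated exactly from a0 = 0.05 and a1 = 2, so the fit should recover those values.

```python
from statistics import mean

def fit_dispersion_trend(means, dispersions):
    """Least-squares fit of alpha(mu) = a0 + a1 / mu (simplified, not robust)."""
    xs = [1.0 / m for m in means]
    x_bar, y_bar = mean(xs), mean(dispersions)
    a1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, dispersions))
          / sum((x - x_bar) ** 2 for x in xs))
    a0 = y_bar - a1 * x_bar
    return a0, a1

means = [5, 20, 100, 1000]
disps = [0.05 + 2 / m for m in means]  # hypothetical per-gene estimates
a0, a1 = fit_dispersion_trend(means, disps)
print(round(a0, 4), round(a1, 4))  # 0.05 2.0
```

The a1 / mu term is what makes the fitted curve rise steeply for low-count genes in the dispersion plot.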

PCA

Principal Component Analysis (PCA) is a dimensionality-reduction technique used to emphasize variation and bring out strong patterns in a dataset. For sample-level QC, DESeq2 uses a regularized log transformation (rlog) of the normalized counts, which moderates the variance across the mean and improves clustering. An alternative is the variance stabilizing transformation (vst). Both techniques aim to remove the dependence of the variance on the mean: genes with low expression, and therefore low read counts, tend to have high variance, which the ordinary logarithmic transformation does not remove efficiently.
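
To see what "removing the dependence of the variance on the mean" means, the sketch below uses the closed-form variance-stabilizing transform for a negative binomial with a single, hypothetical dispersion alpha: g(x) = (2 / sqrt(alpha)) * asinh(sqrt(alpha * x)). It is not DESeq2's `vst` (which derives the transform from the fitted dispersion trend) nor `rlog` (which uses shrinkage); it only checks, via the delta method, that the transformed standard deviation is about 1 at every mean. For large x, g grows like a logarithm.

```python
import math

def nb_vst(x, alpha):
    """Variance-stabilizing transform for counts with Var = mu + alpha * mu**2.

    Its derivative is 1 / sqrt(x + alpha * x**2), so by the delta method
    the standard deviation of g(K) is roughly 1 whatever the mean.
    """
    return 2.0 / math.sqrt(alpha) * math.asinh(math.sqrt(alpha * x))

alpha = 0.1  # hypothetical common dispersion
stabilized = {}
for mu in (1, 10, 100, 1000):
    h = 1e-6 * mu
    slope = (nb_vst(mu + h, alpha) - nb_vst(mu - h, alpha)) / (2 * h)  # numerical g'(mu)
    sd = math.sqrt(mu + alpha * mu ** 2)                                # NB standard deviation
    stabilized[mu] = slope * sd  # delta-method SD of g(K); should be close to 1
print(stabilized)
```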

The chosen technique to transform normalized counts here: rlog

Hierarchical clustering/Heatmap

Hierarchical clustering, using the same transformation as for PCA, shows the correlation between samples. A high overall correlation suggests that there are no outlying samples. As in the PCA plot, samples cluster by group ID.
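
The heatmap displays a sample-by-sample correlation matrix computed on the transformed counts. A minimal sketch of that matrix using Pearson correlation; the correlation method, function names and toy values are assumptions, and the clustering step itself is not sketched.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equally long value lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(samples):
    """Sample-by-sample correlation matrix as a nested dict."""
    names = list(samples)
    return {a: {b: pearson(samples[a], samples[b]) for b in names} for a in names}

# Hypothetical transformed values for five genes; A, B are group A and C, D are group B
samples = {
    "A": [5.0, 8.0, 2.0, 7.0, 3.0],
    "B": [5.2, 7.9, 2.1, 6.8, 3.1],
    "C": [2.0, 8.1, 5.0, 3.0, 7.0],
    "D": [2.2, 7.8, 5.1, 3.2, 6.9],
}
corr = correlation_matrix(samples)
for name, row in corr.items():
    print(name, " ".join(f"{r:.2f}" for r in row.values()))
```

Samples from the same group correlate more strongly with each other than with samples from the other group, which is the pattern the heatmap is read for.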

Pairwise comparisons

The order of the names determines the direction of the reported fold change: the second element is the level used as the baseline. For example, a log2 fold change of -2 means that the gene's expression is 4-fold lower in the first element relative to the baseline (second element), e.g. treatment (first element) vs control (second element).
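
A sketch of the sign convention, using a plain ratio of hypothetical mean normalized counts. DESeq2 estimates the log2 fold change from its model rather than from a raw ratio, so this only illustrates the direction.

```python
import math

def log2_fold_change(first_mean, second_mean):
    """log2 fold change of the first element relative to the baseline (second element)."""
    return math.log2(first_mean / second_mean)

# Hypothetical mean normalized counts: treatment (first) vs control (second, baseline)
print(log2_fold_change(25, 100))  # -2.0: 4-fold lower in treatment
print(log2_fold_change(100, 25))  # 2.0: swapping the order flips the sign
```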

The lfc.cutoff is set to 0.58, which corresponds to a fold change of about 1.5 (log2(1.5) ≈ 0.585).
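
The arithmetic behind the cutoff, plus a hypothetical helper showing how such a cutoff could be applied in both directions. Whether the pipeline uses a strict or non-strict comparison is an assumption here.

```python
import math

lfc_cutoff = 0.58  # value reported for this run

print(math.log2(1.5))   # log2 of a 1.5-fold change, about 0.585
print(2 ** lfc_cutoff)  # fold change implied by a cutoff of 0.58, about 1.495

def passes_fc_cutoff(lfc, cutoff=lfc_cutoff):
    """Hypothetical helper: True if |log2 fold change| exceeds the cutoff (up or down)."""
    return abs(lfc) > cutoff

print(passes_fc_cutoff(math.log2(1.5)), passes_fc_cutoff(-math.log2(1.5)),
      passes_fc_cutoff(math.log2(1.49)))  # True True False
```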

(Pairwise) UP, DOWN and TOTAL-regulated significant genes

Below is a summary of the up-regulated, down-regulated and total significant genes per pairwise comparison.

Comparison	Down	Up	Total
A-B	12	6	18

Full table is written to DEG_all.csv.
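
A sketch of how such a summary can be tallied from per-gene results. It assumes the same thresholds as the volcano plots (adjusted p-value < 0.05, |log2 fold change| > 0.58) and treats NA adjusted p-values as not significant; the function name and toy rows are hypothetical.

```python
def summarize(results, padj_cutoff=0.05, lfc_cutoff=0.58):
    """Count significant down- and up-regulated genes.

    results: list of (gene, log2FoldChange, padj) tuples; padj may be None (NA).
    """
    up = down = 0
    for _gene, lfc, padj in results:
        if padj is None or padj >= padj_cutoff or abs(lfc) <= lfc_cutoff:
            continue
        if lfc > 0:
            up += 1
        else:
            down += 1
    return {"Down": down, "Up": up, "Total": up + down}

toy = [("g1", 1.2, 0.001), ("g2", -0.9, 0.01), ("g3", 2.0, 0.2),
       ("g4", -1.5, None), ("g5", 0.3, 0.0001)]
print(summarize(toy))  # {'Down': 1, 'Up': 1, 'Total': 2}
```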

Top selection of Significant Differentially Expressed Genes

Top selection of significant differentially expressed genes based on their normalized counts.

Volcano plots

Volcano plots of differentially expressed genes in pairwise comparisons, with thresholds of adjusted p-value < 0.05 and fold change > 1.5.
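
A volcano plot places each gene at x = log2 fold change and y = -log10(adjusted p-value). The sketch below computes those coordinates and a label with the thresholds stated above; the function name and toy rows are hypothetical, and clamping a zero p-value to 1e-300 is an assumption to avoid an infinite y value.

```python
import math

def volcano_points(results, padj_cutoff=0.05, fc_cutoff=1.5):
    """Return (gene, x, y, label) with x = log2 fold change and y = -log10(padj)."""
    lfc_cutoff = math.log2(fc_cutoff)
    points = []
    for gene, lfc, padj in results:
        if padj is None:  # NA adjusted p-values cannot be placed on the y-axis
            continue
        y = -math.log10(max(padj, 1e-300))  # clamp to keep y finite
        if padj < padj_cutoff and abs(lfc) > lfc_cutoff:
            label = "up" if lfc > 0 else "down"
        else:
            label = "not significant"
        points.append((gene, lfc, y, label))
    return points

toy = [("g1", 1.2, 0.001), ("g2", -0.9, 0.01), ("g3", 0.2, 0.5), ("g4", -1.5, None)]
for point in volcano_points(toy):
    print(point)
```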

Gene Ontology Analysis

Gene ontology analysis is performed with clusterProfiler, using the org.Mm.eg.db annotation package.

Column description:

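Ontology: BP for Biological Process, MF for Molecular Function, and CC for Cellular Component
ID: Gene ontology ID
Description: Description of the gene ontology term
Gene ratio: Number of input genes annotated to the term, divided by the number of input genes annotated in the ontology
Background ratio: Number of background genes annotated to the term, divided by the number of background genes annotated in the ontology
P-value: P-value of the enrichment test
P-value adjusted: P-value adjusted with the Benjamini-Hochberg method (threshold < 0.05)
qvalue: Q-value
geneID: Input genes annotated to this term (hidden below; available in the CSV file)
Count: Number of input genes annotated to this term
Contrast: Contrast in which the genes were found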
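
The ratio and p-value columns can be related as follows: with k of n input genes in a term that holds M of N background genes, Gene ratio is k/n, Background ratio is M/N, and an over-representation p-value is the hypergeometric tail P(X >= k). A sketch of that test followed by Benjamini-Hochberg adjustment; the term counts and p-values are hypothetical, and the q-value column is not sketched.

```python
from math import comb

def hypergeom_sf(k, N, M, n):
    """P(X >= k): chance of at least k term genes among n input genes,
    when M of the N background genes are annotated to the term."""
    upper = min(M, n)
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical term: 5 of 50 input genes hit a term holding 20 of 1000 background genes
k, n, M, N = 5, 50, 20, 1000
print(f"GeneRatio {k}/{n}, BgRatio {M}/{N}, p = {hypergeom_sf(k, N, M, n):.3g}")
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

The running minimum keeps the adjusted values monotone in the raw p-values, which is why two of the example p-values end up with the same adjusted value.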

Software Versions

## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] C
##
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets
## [8] methods   base
##
## other attached packages:
##  [1] org.Mm.eg.db_3.12.0         org.Hs.eg.db_3.12.0
##  [3] AnnotationDbi_1.52.0        reshape_0.8.8
##  [5] DT_0.17                     ggrepel_0.9.0
##  [7] clusterProfiler_3.18.0      gplots_3.1.1
##  [9] DEGreport_1.26.0            pheatmap_1.0.12
## [11] RColorBrewer_1.1-2          forcats_0.5.0
## [13] stringr_1.4.0               dplyr_1.0.2
## [15] purrr_0.3.4                 readr_1.4.0
## [17] tidyr_1.1.2                 tibble_3.0.4
## [19] ggplot2_3.3.3               tidyverse_1.3.0
## [21] DESeq2_1.30.0               SummarizedExperiment_1.20.0
## [23] Biobase_2.50.0              MatrixGenerics_1.2.0
## [25] matrixStats_0.57.0          GenomicRanges_1.42.0
## [27] GenomeInfoDb_1.26.2         IRanges_2.24.1
## [29] S4Vectors_0.28.1            BiocGenerics_0.36.0
## [31] knitr_1.30                  edgeR_3.32.0
## [33] limma_3.46.0                optparse_1.6.6
## [35] rmarkdown_2.6
##
## loaded via a namespace (and not attached):
##   [1] tidyselect_1.1.0            RSQLite_2.2.2
##   [3] htmlwidgets_1.5.3           grid_4.0.3
##   [5] BiocParallel_1.24.1         scatterpie_0.1.5
##   [7] munsell_0.5.0               withr_2.3.0
##   [9] colorspace_2.0-0            GOSemSim_2.16.1
##  [11] highr_0.8                   rstudioapi_0.13
##  [13] DOSE_3.16.0                 labeling_0.4.2
##  [15] lasso2_1.2-21.1             GenomeInfoDbData_1.2.4
##  [17] mixsqp_0.3-43               mnormt_2.0.2
##  [19] polyclip_1.10-0             bit64_4.0.5
##  [21] farver_2.0.3                downloader_0.4
##  [23] vctrs_0.3.6                 generics_0.1.0
##  [25] xfun_0.20                   R6_2.5.0
##  [27] clue_0.3-58                 graphlayouts_0.7.1
##  [29] invgamma_1.1                locfit_1.5-9.4
##  [31] bitops_1.0-6                fgsea_1.16.0
##  [33] DelayedArray_0.16.0         assertthat_0.2.1
##  [35] scales_1.1.1                ggraph_2.0.4
##  [37] enrichplot_1.10.1           gtable_0.3.0
##  [39] Cairo_1.5-12.2              tidygraph_1.2.0
##  [41] rlang_0.4.10                genefilter_1.72.0
##  [43] GlobalOptions_0.1.2         splines_4.0.3
##  [45] broom_0.7.3                 BiocManager_1.30.10
##  [47] yaml_2.2.1                  reshape2_1.4.4
##  [49] modelr_0.1.8                crosstalk_1.1.0.1
##  [51] backports_1.2.1             qvalue_2.22.0
##  [53] tools_4.0.3                 psych_2.0.12
##  [55] logging_0.10-108            ellipsis_0.3.1
##  [57] ggdendro_0.1.22             Rcpp_1.0.5
##  [59] plyr_1.8.6                  zlibbioc_1.36.0
##  [61] RCurl_1.98-1.2              ps_1.5.0
##  [63] GetoptLong_1.0.5            viridis_0.5.1
##  [65] ashr_2.2-47                 cowplot_1.1.1
##  [67] haven_2.3.1                 cluster_2.1.0
##  [69] fs_1.5.0                    magrittr_2.0.1
##  [71] data.table_1.13.6           DO.db_2.9
##  [73] circlize_0.4.12             reprex_0.3.0
##  [75] truncnorm_1.0-8             tmvnsim_1.0-2
##  [77] SQUAREM_2020.5              hms_0.5.3
##  [79] evaluate_0.14               xtable_1.8-4
##  [81] XML_3.99-0.5                readxl_1.3.1
##  [83] gridExtra_2.3               shape_1.4.5
##  [85] compiler_4.0.3              KernSmooth_2.23-18
##  [87] crayon_1.3.4                shadowtext_0.0.7
##  [89] htmltools_0.5.0             geneplotter_1.68.0
##  [91] lubridate_1.7.9.2           DBI_1.1.0
##  [93] tweenr_1.0.1                dbplyr_2.0.0
##  [95] ComplexHeatmap_2.6.2        MASS_7.3-53
##  [97] Matrix_1.3-2                getopt_1.20.3
##  [99] cli_2.2.0                   igraph_1.2.6
## [101] pkgconfig_2.0.3             rvcheck_0.1.8
## [103] xml2_1.3.2                  annotate_1.68.0
## [105] XVector_0.30.0              rvest_0.3.6
## [107] digest_0.6.27               ConsensusClusterPlus_1.54.0
## [109] cellranger_1.1.0            fastmatch_1.1-0
## [111] gtools_3.8.2                rjson_0.2.20
## [113] lifecycle_0.2.0             nlme_3.1-151
## [115] jsonlite_1.7.2              viridisLite_0.3.0
## [117] fansi_0.4.1                 pillar_1.4.7
## [119] lattice_0.20-41             Nozzle.R1_1.1-1
## [121] httr_1.4.2                  survival_3.2-7
## [123] GO.db_3.12.1                glue_1.4.2
## [125] png_0.1-7                   bit_4.0.4
## [127] ggforce_0.3.2               stringi_1.5.3
## [129] blob_1.2.1                  caTools_1.18.0
## [131] memoise_1.1.0               irlba_2.3.3

Bulk Transcriptomics

A bulk RNA pipeline, implemented in Nextflow and part of the Online Pipelines Platform (OP²).

Pipeline overview

Report info