Metadata
The metadata is collected from the samplesheet.
Sample ID | Group ID |
---|---|
A | A |
B | A |
C | B |
D | B |
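The table above could be derived from a samplesheet with a short script like the one below. This is a minimal Python sketch. It assumes a CSV file whose header row matches the table (`Sample ID`, `Group ID`). The pipeline's actual samplesheet format and file name are not stated in this report, and `groups_from_samplesheet` is a hypothetical helper.

```python
import csv
import io

# Inline stand-in for the samplesheet. The header names are assumed to
# match the metadata table; they are not taken from the pipeline itself.
SAMPLESHEET = """Sample ID,Group ID
A,A
B,A
C,B
D,B
"""


def groups_from_samplesheet(text):
    """Return a dict mapping each group ID to its sample IDs, in file order."""
    groups = {}
    for row in csv.DictReader(io.StringIO(text)):
        groups.setdefault(row["Group ID"], []).append(row["Sample ID"])
    return groups


groups = groups_from_samplesheet(SAMPLESHEET)
print(groups)  # {'A': ['A', 'B'], 'B': ['C', 'D']}
```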
Sample-level Quality control
High-level quality control of the samples and their behavior based on the count data matrix.
Sample correlation
MDS Plot
A multidimensional scaling (MDS) plot is generated to inspect how the samples cluster, based on their expression profiles after normalization with the relative normalization factors.
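One common way to define the distance between two samples in an MDS plot is the "leading log2 fold change" used by limma's `plotMDS`. It is the root-mean-square of the largest absolute log2 differences between the two samples. This report does not say which MDS function produced the plot, so the Python sketch below is an assumption. The toy log2 values are made up. In practice, the resulting distance matrix would then be embedded in two dimensions with classical MDS, which is not shown here.

```python
import math


def leading_logfc_distance(x, y, top=500):
    """RMS of the `top` largest absolute differences between two samples.

    x and y are log2 expression values for the same genes, in the same order.
    """
    diffs = sorted((abs(a - b) for a, b in zip(x, y)), reverse=True)[:top]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def distance_matrix(samples, top=500):
    """Pairwise leading-log2FC distances for a dict of sample -> values."""
    return {
        (a, b): leading_logfc_distance(samples[a], samples[b], top)
        for a in samples
        for b in samples
    }


# Toy log2 values for five genes (hypothetical data).
samples = {
    "A": [5.0, 6.0, 7.0, 8.0, 9.0],
    "B": [5.1, 6.0, 7.1, 8.0, 9.0],
    "C": [7.0, 6.0, 5.0, 8.0, 9.0],
    "D": [7.0, 6.1, 5.0, 8.0, 9.1],
}
dist = distance_matrix(samples, top=2)
print(round(dist[("A", "B")], 3), round(dist[("A", "C")], 3))  # 0.1 2.0
```

With `top=2`, only the two most different genes in each pair contribute. As a result, samples A and B end up close together, and A and C end up far apart.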
Mean-variance trend
DESeq2 dispersion estimates (dispersion being a measure of spread or variability in the data) are inversely related to the mean and directly related to the variance. Because of this relationship, the dispersion is higher for small mean counts and lower for large mean counts. Dispersion estimates for genes with the same mean differ only because of their variance. The dispersion estimates therefore reflect the variance in gene expression at a given mean.
The plot below of mean versus variance in the count data shows that the variance in gene expression increases with the mean expression (each black dot is a gene). The relationship between mean and variance is linear on the log scale. For higher means, the variance can therefore be predicted relatively accurately from the mean. For low mean counts, however, the variance estimates have a much larger spread, so the dispersion estimates differ much more between genes with small means.
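Under the negative binomial model, the variance of a gene's counts is Var = μ + αμ², where μ is the mean and α is the dispersion. Solving for the dispersion gives α = (Var − μ)/μ². The Python sketch below applies this method-of-moments formula to toy counts. It shows that, at the same mean, a larger variance gives a larger dispersion.

This is a simplification for illustration only. DESeq2 estimates per-gene dispersions by Cox-Reid adjusted maximum likelihood, not from moments. The function name is hypothetical.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments NB dispersion, solving Var = mu + alpha * mu**2."""
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    return (var - mu) / mu ** 2


# Two toy genes with the same mean (10) but different spread.
tight = [6, 14, 7, 13]
wide = [4, 16, 6, 14]
print(round(moment_dispersion(tight), 4))  # 0.0667
print(round(moment_dispersion(wide), 4))   # 0.2467
```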
Differential Gene Expression (DGE) analysis
Differential expression analysis is performed using DESeq2, which models the counts with the negative binomial (a.k.a. Gamma-Poisson) distribution.
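The distribution itself can be sketched in a few lines. The Python code below evaluates the negative binomial probability mass function in the mean/dispersion parameterization, with size r = 1/α. This form arises as a Poisson distribution whose rate is gamma-distributed. Summing over a long range of counts confirms numerically that the mean is μ and the variance is μ + αμ². The parameter values are arbitrary and the function name is hypothetical; DESeq2 performs these computations internally in R.

```python
import math


def nb_pmf(k, mu, alpha):
    """P(K = k) for a negative binomial with mean mu and dispersion alpha."""
    r = 1.0 / alpha  # "size" parameter
    log_p = (
        math.lgamma(k + r)
        - math.lgamma(r)
        - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


mu, alpha = 10.0, 0.2
probs = [nb_pmf(k, mu, alpha) for k in range(1000)]  # tail beyond 999 is negligible
total = sum(probs)
nb_mean = sum(k * p for k, p in enumerate(probs))
nb_var = sum((k - nb_mean) ** 2 * p for k, p in enumerate(probs))
print(round(total, 6), round(nb_mean, 6), round(nb_var, 6))  # 1.0 10.0 30.0
```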
Gene-level Quality control
DESeq2 also sets aside genes that have little or no chance of being detected as differentially expressed: their p-values or adjusted p-values are set to NA. Removing these genes from the multiple-testing correction increases the power to detect differentially expressed genes.
The genes omitted fall into three categories:
- Genes with zero counts in all samples
- Genes with an extreme count outlier
- Genes with low mean normalized counts
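The third category comes from independent filtering. DESeq2 searches over thresholds on the mean normalized count and picks one that maximizes the number of genes passing the adjusted p-value cutoff. The Python sketch below is a simplified version. It takes the plain maximum, whereas DESeq2's own procedure smooths the number of rejections across thresholds before choosing.

The sketch also rests on these assumptions:

- All-zero genes have already been removed.
- The Cook's distance outlier filter is not modeled.
- The data are toy values and the function names are hypothetical.
- `alpha` is set to 0.05 to match this report; DESeq2's `results()` defaults to 0.1.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * m / rank.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def independent_filter(base_means, pvalues, alpha=0.05):
    """Return (cutoff, hits): the mean cutoff giving the most adjusted p < alpha."""
    best_cutoff, best_hits = None, -1
    for cutoff in sorted(set(base_means)):
        kept = [p for m, p in zip(base_means, pvalues) if m >= cutoff]
        hits = sum(q < alpha for q in bh_adjust(kept))
        if hits > best_hits:  # ties keep the lower (less aggressive) cutoff
            best_cutoff, best_hits = cutoff, hits
    return best_cutoff, best_hits


# Toy data: six low-mean genes with large p-values, four well-expressed genes.
base_means = [1, 2, 3, 4, 5, 6, 50, 60, 70, 80]
pvalues = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.03, 0.02, 0.015, 0.012]

unfiltered_hits = sum(q < 0.05 for q in bh_adjust(pvalues))
print(unfiltered_hits)                          # 0
print(independent_filter(base_means, pvalues))  # (5, 4)
```

Without filtering, none of the four well-expressed genes survive the correction. Dropping the low-mean genes shrinks the number of tests, and all four then pass.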
This plot shows per-gene dispersion estimates together with the fitted mean-dispersion relationship.
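DESeq2's default ("parametric") fit for the mean-dispersion trend has the form α(μ) = a1/μ + α0. The Python sketch below recovers a1 and α0 from toy per-gene estimates, using ordinary least squares on 1/μ. OLS is a stand-in chosen for illustration only. DESeq2 itself fits the curve with a gamma-family GLM and then shrinks the per-gene estimates toward it. The function names and data are hypothetical.

```python
def fit_parametric_trend(means, dispersions):
    """Fit alpha(mu) = a1 / mu + alpha0 by ordinary least squares on x = 1 / mu."""
    xs = [1.0 / m for m in means]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(dispersions) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, dispersions))
    a1 = sxy / sxx
    alpha0 = y_bar - a1 * x_bar
    return a1, alpha0


def trend(mu, a1, alpha0):
    """Fitted dispersion at mean mu."""
    return a1 / mu + alpha0


# Toy per-gene estimates lying exactly on a1 = 2, alpha0 = 0.05.
means = [5, 10, 50, 100, 500]
dispersions = [2 / m + 0.05 for m in means]
a1, alpha0 = fit_parametric_trend(means, dispersions)
print(round(a1, 6), round(alpha0, 6))  # 2.0 0.05
```

The α0 term is the asymptotic dispersion reached at high means. The a1/μ term dominates at low means, which matches the larger dispersions seen there.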
PCA
Principal Component Analysis (PCA) is a dimensionality reduction technique used to emphasize variation and bring out strong patterns in a dataset. For sample-level QC, DESeq2 offers a regularized log transformation (rlog) of the normalized counts. The rlog moderates the variance across the mean, which improves the clustering. An alternative is the variance stabilizing transformation (vst). Both techniques aim to remove the dependence of the variance on the mean. In particular, genes with low expression levels, and therefore low read counts, tend to have high variance. The ordinary logarithmic transformation does not remove this effect efficiently.
The transformation applied to the normalized counts here is rlog.
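A small simulation shows the problem that rlog and vst address. The Python sketch below draws Poisson counts at a low mean and a high mean and applies the ordinary log2(x + 1) transformation. It then compares the variance of the transformed values. Poisson counts are a simplifying assumption, since real RNA-seq counts are overdispersed. The sketch does not implement rlog or vst, and the seed, means and number of draws are arbitrary choices.

```python
import math
import random
from statistics import variance


def poisson(rng, lam):
    """Draw one Poisson(lam) value with Knuth's multiplication method.

    Only suitable for moderate lam: exp(-lam) must not underflow to 0.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def log2_variance(rng, lam, draws=1000):
    """Variance of log2(count + 1) over repeated Poisson draws."""
    return variance([math.log2(poisson(rng, lam) + 1) for _ in range(draws)])


rng = random.Random(42)
low = log2_variance(rng, 5)     # low mean: large spread on the log scale
high = log2_variance(rng, 500)  # high mean: spread shrinks toward zero
print(low > 10 * high)  # True
```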
Hierarchical clustering/Heatmap
Hierarchical clustering, using the same transformation as for PCA (rlog), shows the correlation between samples. A high overall correlation suggests that there are no outlying samples. As in the PCA plot, the samples cluster by group ID.
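The sample-to-sample correlation behind such a heatmap can be sketched as a Pearson correlation matrix computed on transformed expression values. The Python sketch below uses made-up values for four samples. The report does not state the exact correlation measure or linkage used for its heatmap. The first merge shown here assumes that agglomerative clustering joins the most correlated pair first, that is, it uses 1 − r as the distance.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(samples):
    """Nested dict: corr[a][b] is the correlation between samples a and b."""
    return {a: {b: pearson(samples[a], samples[b]) for b in samples} for a in samples}


def most_correlated_pair(corr):
    """First merge of agglomerative clustering: the off-diagonal pair with the highest r."""
    pairs = [(corr[a][b], a, b) for a in corr for b in corr if a < b]
    _, a, b = max(pairs)
    return a, b


# Made-up transformed values for five genes in four samples.
samples = {
    "A": [1.0, 2.0, 3.0, 4.0, 10.0],
    "B": [1.1, 2.0, 2.9, 4.2, 9.8],
    "C": [4.0, 3.0, 2.0, 1.0, 10.0],
    "D": [4.3, 3.0, 1.7, 1.0, 10.0],
}
corr = correlation_matrix(samples)
print(round(corr["A"]["C"], 3))    # 0.8
print(most_correlated_pair(corr))  # ('A', 'B')
```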
Pairwise comparisons
The order of the names in a comparison determines the direction of the reported fold change: the second element is the level used as the baseline. For example, take treatment (first element) vs control (second element). A log2 fold change of -2 then means that the gene's expression is lower in the treatment than in the control (a four-fold decrease).
The lfc.cutoff is set to 0.58, which corresponds to a fold change of 1.5 on the log2 scale (log2(1.5) ≈ 0.585).
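The arithmetic behind the direction and the cutoff can be checked in a few lines of Python. The sketch assumes that a comparison is written as (first, baseline). It also assumes that the cutoff is applied to the absolute log2 fold change. `log2_fold_change` and `passes_lfc_cutoff` are hypothetical helpers, not DESeq2 functions, and the means are toy values.

```python
import math

LFC_CUTOFF = 0.58  # lfc.cutoff value stated in this report


def log2_fold_change(first_mean, baseline_mean):
    """log2(first / baseline): negative when the first element is lower."""
    return math.log2(first_mean / baseline_mean)


def passes_lfc_cutoff(lfc, cutoff=LFC_CUTOFF):
    """Assumes the cutoff is applied symmetrically to up and down changes."""
    return abs(lfc) > cutoff


print(round(math.log2(1.5), 3))     # 0.585: log2 of a 1.5-fold change
print(round(2 ** LFC_CUTOFF, 3))    # 1.495: fold change implied by 0.58
lfc = log2_fold_change(25, 100)     # toy treatment mean 25 vs control mean 100
print(lfc, passes_lfc_cutoff(lfc))  # -2.0 True
```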
(Pairwise) Up-regulated, down-regulated and total significant genes
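Below is a summary of the number of significant genes per pairwise comparison. Low is the number of down-regulated genes, High is the number of up-regulated genes, and Total is their sum.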
Comparison | Low | High | Total |
---|---|---|---|
A-B | 12 | 6 | 18 |
The full table is written to DEG_all.csv.
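Counts like those in the summary table can be reproduced from a results table by applying the report's thresholds. The Python sketch below assumes that a gene is significant when its adjusted p-value is below 0.05 and its |log2 fold change| exceeds 0.58. It also assumes that Low corresponds to a negative log2 fold change and High to a positive one. The gene names and values are made up.

```python
# Made-up rows: (gene, log2 fold change, adjusted p-value).
results = [
    ("gene1", -1.2, 0.001),
    ("gene2", 0.9, 0.01),
    ("gene3", -0.7, 0.03),
    ("gene4", 0.3, 0.001),   # fold change too small
    ("gene5", 2.0, 0.2),     # not significant
    ("gene6", -3.1, None),   # NA adjusted p-value (set aside by DESeq2)
]


def summarize(rows, padj_cutoff=0.05, lfc_cutoff=0.58):
    """Count down-regulated (Low), up-regulated (High) and total significant genes."""
    down = up = 0
    for _, lfc, padj in rows:
        if padj is None or padj >= padj_cutoff or abs(lfc) <= lfc_cutoff:
            continue
        if lfc < 0:
            down += 1
        else:
            up += 1
    return {"Low": down, "High": up, "Total": down + up}


print(summarize(results))  # {'Low': 2, 'High': 1, 'Total': 3}
```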
Top selection of Significant Differentially Expressed Genes
Top selection of significant differentially expressed genes based on their normalized counts.
Volcano plots
Volcano plots of differentially expressed genes in the pairwise comparisons, with thresholds of adjusted p-value < 0.05 and fold change > 1.5.
Gene Ontology Analysis
Gene ontology analysis is performed using clusterProfiler with the org.Mm.eg.db annotation library.
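- Ontology: BP for Biological Process, MF for Molecular Function, and CC for Cellular Component
- ID: Gene ontology term ID
- Description: Description of the gene ontology term
- Gene ratio: Number of input genes annotated to the term, out of the input genes with any annotation (k/n)
- Background ratio: Number of background genes annotated to the term, out of all annotated background genes (M/N)
- P-value: Unadjusted p-value of the enrichment test
- P-value adjusted: Adjusted with the Benjamini-Hochberg method (cutoff: p-value < 0.05)
- qvalue: Q-value
- geneID: Input genes annotated to this term (hidden below, available in the CSV file)
- Count: Number of input genes annotated to the term
- Contrast: Contrast in which the genes were found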
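In an over-representation analysis, the Gene ratio, Background ratio, Count and P-value columns are related as in the Python sketch below. Let N be the number of annotated background genes, M the number of those in the term, n the number of input genes, and k the number of input genes in the term. The p-value is then the hypergeometric upper tail P(X ≥ k). The numbers are toy values and the helper names are hypothetical; clusterProfiler computes these in R.

```python
from math import comb


def hypergeom_upper_tail(k, n, M, N):
    """P(X >= k) when n genes are drawn without replacement from N, M of which are in the term."""
    upper = min(n, M)
    hits = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def go_term_row(k, n, M, N):
    """Toy version of one enrichment result row."""
    return {
        "GeneRatio": f"{k}/{n}",
        "BgRatio": f"{M}/{N}",
        "pvalue": hypergeom_upper_tail(k, n, M, N),
        "Count": k,
    }


# Toy numbers: 20 background genes, 5 in the term, 4 input genes, 3 of them in the term.
row = go_term_row(k=3, n=4, M=5, N=20)
print(row["GeneRatio"], row["BgRatio"], round(row["pvalue"], 4))  # 3/4 5/20 0.032
```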
Software Versions
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] C
##
## attached base packages:
## [1] parallel stats4 stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] org.Mm.eg.db_3.12.0 org.Hs.eg.db_3.12.0
## [3] AnnotationDbi_1.52.0 reshape_0.8.8
## [5] DT_0.17 ggrepel_0.9.0
## [7] clusterProfiler_3.18.0 gplots_3.1.1
## [9] DEGreport_1.26.0 pheatmap_1.0.12
## [11] RColorBrewer_1.1-2 forcats_0.5.0
## [13] stringr_1.4.0 dplyr_1.0.2
## [15] purrr_0.3.4 readr_1.4.0
## [17] tidyr_1.1.2 tibble_3.0.4
## [19] ggplot2_3.3.3 tidyverse_1.3.0
## [21] DESeq2_1.30.0 SummarizedExperiment_1.20.0
## [23] Biobase_2.50.0 MatrixGenerics_1.2.0
## [25] matrixStats_0.57.0 GenomicRanges_1.42.0
## [27] GenomeInfoDb_1.26.2 IRanges_2.24.1
## [29] S4Vectors_0.28.1 BiocGenerics_0.36.0
## [31] knitr_1.30 edgeR_3.32.0
## [33] limma_3.46.0 optparse_1.6.6
## [35] rmarkdown_2.6
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.1.0 RSQLite_2.2.2
## [3] htmlwidgets_1.5.3 grid_4.0.3
## [5] BiocParallel_1.24.1 scatterpie_0.1.5
## [7] munsell_0.5.0 withr_2.3.0
## [9] colorspace_2.0-0 GOSemSim_2.16.1
## [11] highr_0.8 rstudioapi_0.13
## [13] DOSE_3.16.0 labeling_0.4.2
## [15] lasso2_1.2-21.1 GenomeInfoDbData_1.2.4
## [17] mixsqp_0.3-43 mnormt_2.0.2
## [19] polyclip_1.10-0 bit64_4.0.5
## [21] farver_2.0.3 downloader_0.4
## [23] vctrs_0.3.6 generics_0.1.0
## [25] xfun_0.20 R6_2.5.0
## [27] clue_0.3-58 graphlayouts_0.7.1
## [29] invgamma_1.1 locfit_1.5-9.4
## [31] bitops_1.0-6 fgsea_1.16.0
## [33] DelayedArray_0.16.0 assertthat_0.2.1
## [35] scales_1.1.1 ggraph_2.0.4
## [37] enrichplot_1.10.1 gtable_0.3.0
## [39] Cairo_1.5-12.2 tidygraph_1.2.0
## [41] rlang_0.4.10 genefilter_1.72.0
## [43] GlobalOptions_0.1.2 splines_4.0.3
## [45] broom_0.7.3 BiocManager_1.30.10
## [47] yaml_2.2.1 reshape2_1.4.4
## [49] modelr_0.1.8 crosstalk_1.1.0.1
## [51] backports_1.2.1 qvalue_2.22.0
## [53] tools_4.0.3 psych_2.0.12
## [55] logging_0.10-108 ellipsis_0.3.1
## [57] ggdendro_0.1.22 Rcpp_1.0.5
## [59] plyr_1.8.6 zlibbioc_1.36.0
## [61] RCurl_1.98-1.2 ps_1.5.0
## [63] GetoptLong_1.0.5 viridis_0.5.1
## [65] ashr_2.2-47 cowplot_1.1.1
## [67] haven_2.3.1 cluster_2.1.0
## [69] fs_1.5.0 magrittr_2.0.1
## [71] data.table_1.13.6 DO.db_2.9
## [73] circlize_0.4.12 reprex_0.3.0
## [75] truncnorm_1.0-8 tmvnsim_1.0-2
## [77] SQUAREM_2020.5 hms_0.5.3
## [79] evaluate_0.14 xtable_1.8-4
## [81] XML_3.99-0.5 readxl_1.3.1
## [83] gridExtra_2.3 shape_1.4.5
## [85] compiler_4.0.3 KernSmooth_2.23-18
## [87] crayon_1.3.4 shadowtext_0.0.7
## [89] htmltools_0.5.0 geneplotter_1.68.0
## [91] lubridate_1.7.9.2 DBI_1.1.0
## [93] tweenr_1.0.1 dbplyr_2.0.0
## [95] ComplexHeatmap_2.6.2 MASS_7.3-53
## [97] Matrix_1.3-2 getopt_1.20.3
## [99] cli_2.2.0 igraph_1.2.6
## [101] pkgconfig_2.0.3 rvcheck_0.1.8
## [103] xml2_1.3.2 annotate_1.68.0
## [105] XVector_0.30.0 rvest_0.3.6
## [107] digest_0.6.27 ConsensusClusterPlus_1.54.0
## [109] cellranger_1.1.0 fastmatch_1.1-0
## [111] gtools_3.8.2 rjson_0.2.20
## [113] lifecycle_0.2.0 nlme_3.1-151
## [115] jsonlite_1.7.2 viridisLite_0.3.0
## [117] fansi_0.4.1 pillar_1.4.7
## [119] lattice_0.20-41 Nozzle.R1_1.1-1
## [121] httr_1.4.2 survival_3.2-7
## [123] GO.db_3.12.1 glue_1.4.2
## [125] png_0.1-7 bit_4.0.4
## [127] ggforce_0.3.2 stringi_1.5.3
## [129] blob_1.2.1 caTools_1.18.0
## [131] memoise_1.1.0 irlba_2.3.3