Single-cell Transcriptomics

Our OP² single-cell transcriptomics pipeline is a gold-standard bioinformatics workflow for analyzing single-cell RNA sequencing (scRNA-seq) data.
It gives you insight into the quality of your data, the expression profiles of your cells, the differential expression of genes, cell annotations and identities, and gene enrichment.

The workflow processes raw FastQ input, aligns the reads, generates per-gene counts and performs extensive quality control on the results.
The results are delivered as two interactive reports, plus a data package containing all essential intermediate files for more in-depth data analysis.
The pre-processing workflow takes your raw sequence data through to QC-approved aligned data.
The post-processing workflow then lets you explore the biological meaning of your data through statistical analysis.

Example Pre-processing Report
Example Post-processing Report
  • 1

    Input

    Droplet-based (e.g. 10X Genomics)
    Compressed raw FastQ files (R1 and R2)
    Chemistry specific barcodes (V2 or V3)
    Reference transcriptome (hg19 or hg38 or mm10)

  • 2

    De-multiplexing

    Extract cell barcodes to retrieve single-cell information
    Cell barcodes + UMIs
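
As a minimal sketch of what barcode extraction involves, assume the 10x Genomics read layout in which R1 starts with a 16-base cell barcode followed by the UMI (10 bases for V2 chemistry, 12 for V3). The helper name `split_barcode_umi` is hypothetical and only illustrates the idea; it is not the pipeline's own code.

```python
# Cell barcode length and UMI length per 10x chemistry version (assumed layout).
BARCODE_LEN = 16
UMI_LEN = {"V2": 10, "V3": 12}

def split_barcode_umi(r1_sequence, chemistry="V3"):
    """Split an R1 read into (cell_barcode, umi). Hypothetical helper."""
    umi_len = UMI_LEN[chemistry]
    needed = BARCODE_LEN + umi_len
    if len(r1_sequence) < needed:
        raise ValueError(f"R1 read shorter than {needed} bases")
    barcode = r1_sequence[:BARCODE_LEN]
    umi = r1_sequence[BARCODE_LEN:needed]
    return barcode, umi

# Example: a 16-base barcode followed by a 12-base UMI (V3 chemistry).
barcode, umi = split_barcode_umi("AAACCCAAGAAACACT" + "GGTTAACCGGTT", "V3")
```

Reads sharing the same barcode come from the same droplet, and reads sharing barcode and UMI come from the same original molecule.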

  • 3

    Sequence QC

    Reads with low-quality cell barcodes are discarded
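
One way such a filter could work is sketched below, assuming standard Phred+33 quality encoding in the FastQ files. The threshold of 10 and the function names are illustrative assumptions, not the pipeline's actual settings.

```python
def min_phred(quality_string, offset=33):
    """Lowest Phred score in a FastQ quality string (Phred+33 encoding)."""
    return min(ord(ch) - offset for ch in quality_string)

def keep_read(barcode_quality, min_quality=10):
    """Keep a read only if every barcode base meets the (assumed) threshold."""
    return min_phred(barcode_quality) >= min_quality

good = keep_read("IIIIIIIIIIIIIIII")   # 'I' encodes Phred 40
bad = keep_read("IIIIIII#IIIIIIII")    # '#' encodes Phred 2
```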

  • 4

    Alignment

    STARsolo aligns reads to reference transcriptome

  • 5

    Alignment QC

    Alignment statistics: read depths, per base, GC content, …

  • 1

    Input

    Load cell-gene count matrices

  • 2

    Produce high-quality count matrix

    Identification of cells from empty droplets
    Removal of barcode-swapped pseudo-cells
    Downsampling of the count matrix

  • 3

    Matrix QC

    Identification of low quality libraries
    Metrics: number of UMIs, number of expressed genes and percentage of mitochondrial reads

  • 4

    Remove outliers

    Automated median absolute deviation (MAD) thresholding
    Remove cells whose matrix QC values lie more than 3 MADs from the median
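
The thresholding can be sketched as follows, assuming MAD denotes the median absolute deviation, as is conventional for this kind of outlier filter. The 1.4826 scaling constant and the function name are assumptions of this sketch; tools differ in whether and how they apply it.

```python
from statistics import median

def mad_outliers(values, n_mads=3.0):
    """Return a list of booleans, True where a value is an outlier."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    # 1.4826 makes the MAD comparable to a standard deviation for
    # normally distributed data (an assumption of this sketch).
    scaled_mad = 1.4826 * mad
    return [abs(v - med) > n_mads * scaled_mad for v in values]

# Per-cell UMI totals; the last cell has far fewer UMIs than the rest.
umi_counts = [5200, 4800, 5100, 4950, 5050, 300]
flags = mad_outliers(umi_counts)
```

Because the median and MAD are robust statistics, a single extreme library barely shifts the threshold, which is why this approach suits automated QC.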

  • 5

    Normalize data

    Library size normalisation to remove technical biases
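
A minimal sketch of library size normalisation: each cell's counts are divided by that cell's total count, multiplied by a scale factor and log-transformed. The scale factor of 10,000 and the log1p transform are assumptions here (they match a common default), not confirmed pipeline settings.

```python
import math

def normalize_library_size(counts, scale_factor=10_000):
    """counts: list of cells, each a list of raw gene counts."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty library carries no information; keep it as zeros.
            normalized.append([0.0 for _ in cell])
            continue
        normalized.append([math.log1p(c / total * scale_factor) for c in cell])
    return normalized

# Two cells with the same expression proportions but a 10x depth difference.
raw = [[10, 0, 90], [1, 0, 9]]
norm = normalize_library_size(raw)
```

After normalisation the two cells have matching profiles, so the difference in sequencing depth no longer looks like a difference in expression.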

  • 6

    Identify highly variable features

    Select most variable genes that contain useful information about the biology
    Remove genes that contain noise

  • 7

    Integrate Seurat objects

    Format object to perform statistical analysis

  • 8

    Scale data

    Linear transformation to give equal weights to all genes
    Prevent highly expressed genes from dominating
    Shift the expression of each gene so that its mean across cells is 0
    Scale the expression of each gene so that its variance across cells is 1
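
The scaling amounts to a per-gene z-score across cells. The sketch below uses the population standard deviation for simplicity; whether the pipeline uses the population or sample estimate is not stated, so treat that choice as an assumption.

```python
from statistics import mean, pstdev

def scale_gene(values):
    """Z-score one gene's expression values across cells."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A gene with constant expression cannot be scaled; return zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]

# A lowly and a highly expressed gene with the same relative pattern.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [100.0, 200.0, 300.0, 400.0]
scaled_a = scale_gene(gene_a)
scaled_b = scale_gene(gene_b)
```

Both genes end up on the same scale, so the highly expressed gene no longer outweighs the other in downstream steps such as PCA.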

  • 9

    Linear dimension reduction

    Principal component analysis (PCA) is performed to denoise and compact the data prior to downstream analysis.

  • 10

    Determine dimensionality

    Select components based on the Elbow Plot

  • 11

    Cluster cells

    Construct a K-nearest neighbor (KNN) graph based on Euclidean distance in PCA space
    Refine edge weights by Jaccard similarity
    Cluster cells by modularity optimization (Louvain algorithm)
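
The first two sub-steps can be sketched in plain Python as below. The toy coordinates, `k=2` and the function names are illustrative assumptions, and the Louvain optimization itself is omitted; the sketch only shows how neighbourhood overlap yields edge weights.

```python
import math

def knn_sets(points, k):
    """For each point, return the index set of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbours.append(set(others[:k]))
    return neighbours

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Two well-separated groups of three cells in a toy 2-D "PCA space".
pcs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
nn = knn_sets(pcs, k=2)
# Include each cell in its own neighbourhood so close neighbours overlap.
hoods = [s | {i} for i, s in enumerate(nn)]
within = jaccard(hoods[0], hoods[1])   # cells in the same group
between = jaccard(hoods[0], hoods[3])  # cells in different groups
```

Edges between cells with heavily overlapping neighbourhoods get high weights, which is what a modularity-based method then uses to find communities.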

  • 12

    Non-linear dimension reduction

    t-distributed stochastic neighbour embedding (t-SNE) is widely used for visualizing complex single-cell data sets. It is run on the principal components rather than on the full expression matrix: this improves speed, because a low-rank approximation of the expression matrix is used, and reduces random noise, because only the major factors of variation are retained.

  • 13

    Assign cell types

    Unbiased cell type recognition from single-cell RNA sequencing data, by leveraging reference transcriptomic datasets of pure cell types to infer the cell of origin of each single cell independently.

  • 14

    Identify cell markers

    If a cell in the test dataset is confidently assigned to a particular label, we would expect it to have strong expression of that label’s markers. This can be useful if you want to find markers that are conserved between a treated and untreated condition for a specific cell type or group of cells. 

  • 15

    Identify differentially expressed genes

    After identifying conserved markers, a comparative analysis is performed on the differences induced by stimulation/treatment. We take the average expression of all clusters and generate scatter plots, highlighting the genes identified in the previous step.

  • 16

    Gene ontology

    Check for over-representation of genes or gene products across conditions.
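
Over-representation is commonly assessed with a hypergeometric test. The sketch below assumes that test; the page does not name the statistic the pipeline uses, and the function name and the numbers are purely illustrative.

```python
from math import comb

def enrichment_p_value(n_background, n_in_term, n_selected, n_overlap):
    """P(overlap >= n_overlap) under a hypergeometric null.

    n_background: genes in the background set
    n_in_term:    background genes annotated with the term
    n_selected:   genes in the list being tested
    n_overlap:    selected genes annotated with the term
    """
    max_overlap = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

# Toy numbers: 20 background genes, 5 in the term, 5 selected, 3 overlapping.
p = enrichment_p_value(n_background=20, n_in_term=5, n_selected=5, n_overlap=3)
```

A small p-value suggests the term appears in the gene list more often than chance would predict; in practice such p-values are corrected for testing many terms at once.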