AIRR-Seq

BISC Global OPP AIRR-Seq
An adaptive immune repertoire sequencing (AIRR-Seq) pipeline. Implemented in Nextflow and part of the Online Pipelines Platform (OP2).

Pipeline overview

The 10x BCR immune repertoire sequencing pipeline is a bioinformatics workflow for characterizing B-cell receptor (BCR) sequences at single-cell resolution in libraries generated with 10x technology. It provides insights into the quality of your data, an overview of the clonotype content, somatic hypermutation rates, amino acid properties, gene usage, repertoire overlap and phylogenetic trees.

The workflow consists of two parts: pre-processing and post-processing. The pre-processing workflow processes the raw FASTQ sequence data, using the 10x Chromium cell barcodes and unique molecular identifiers (UMIs) to assemble V(D)J transcripts per cell. Results are made available for download in the standardized AIRR TSV format. The post-processing workflow provides insights into the immune repertoire, for example by determining clonotypes or inferring phylogenetic trees.

See the pipeline page for a more detailed overview.

Do you have any questions about these results? Email us at helpdesk@biscglobal.com.

Report info

Generated on: 2023-01-26 14:40
Experiment: experiment
Pipeline: AIRR-Seq
Report: Post-processing Report
Species: human
Species Build: ig

AIRRseq Data Overview

This section provides a table giving an overview of the samples used to generate this report.

The Simpson diversity estimates the probability that two sequences selected at random from the dataset are different.
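
As a minimal sketch of this definition, the snippet below computes the Gini-Simpson form of the index, i.e. one minus the sum of squared sequence frequencies. The report does not state which variant the pipeline uses (for example, sampling with or without replacement), so the formula and the function name `simpson_diversity` are assumptions for illustration only.

```python
from collections import Counter


def simpson_diversity(sequences):
    """Gini-Simpson index: chance that two draws (with replacement) differ.

    Assumption: the report does not say which Simpson variant is used;
    this is the common 1 - sum(p_i^2) form.
    """
    counts = Counter(sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sequence")
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


# Toy repertoire (made-up sequences): one sequence seen twice, two seen once.
example = ["CARDYW", "CARDYW", "CAKGGW", "CTRSSW"]
print(simpson_diversity(example))  # 0.625
```

A value close to 1 means that almost every pair of randomly drawn sequences differs, which is what a repertoire with few expanded sequences would produce.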

Clones are defined as groups of cells that descend from a common ancestor and cannot be distinguished at the sequence level. In this report, sequences with identical CDRH1, CDRH2 and CDRH3 amino acid sequences are assigned to the same clone.

Clonotypes are sequences with an identical CDRH3 amino acid sequence and the same V-J gene combination.

Lineages refer to groups of cells that all descend from the same naive ancestor and carry the same rearrangements. In this report, they are determined the same way as clonotypes, but any lineages with fewer than 2 sequences are removed from downstream analysis.
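
The three definitions above amount to grouping sequences by different keys. The following sketch shows one way this could look; the record field names (`cdr1_aa`, `cdr2_aa`, `cdr3_aa`, `v_gene`, `j_gene`) and the helper names are hypothetical and are not taken from the pipeline.

```python
from collections import defaultdict

# Hypothetical field names, chosen for illustration only.
CLONE_KEY = ("cdr1_aa", "cdr2_aa", "cdr3_aa")
CLONOTYPE_KEY = ("cdr3_aa", "v_gene", "j_gene")


def group_by(records, key_fields):
    """Group records (dicts) that share the same values for key_fields."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)
    return dict(groups)


def summarize(records):
    """Count sequences, clones, clonotypes and lineages as defined above."""
    clones = group_by(records, CLONE_KEY)
    clonotypes = group_by(records, CLONOTYPE_KEY)
    # Lineages: same grouping as clonotypes, keeping groups with >= 2 sequences.
    lineages = {k: v for k, v in clonotypes.items() if len(v) >= 2}
    return {
        "sequences": len(records),
        "clones": len(clones),
        "clonotypes": len(clonotypes),
        "lineages": len(lineages),
    }


records = [
    {"cdr1_aa": "GFTF", "cdr2_aa": "ISGS", "cdr3_aa": "CARDYW", "v_gene": "IGHV3-23", "j_gene": "IGHJ4"},
    {"cdr1_aa": "GFTF", "cdr2_aa": "ISGS", "cdr3_aa": "CARDYW", "v_gene": "IGHV3-23", "j_gene": "IGHJ4"},
    {"cdr1_aa": "GFTF", "cdr2_aa": "ISGS", "cdr3_aa": "CARDYW", "v_gene": "IGHV3-30", "j_gene": "IGHJ4"},
    {"cdr1_aa": "GYTF", "cdr2_aa": "INPN", "cdr3_aa": "CAKGGW", "v_gene": "IGHV1-2", "j_gene": "IGHJ6"},
]
print(summarize(records))
```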

Sample Nr | Sample Name | Sequences | Clones | Clonotypes | Lineages | Simpson Diversity | Avg V identity | Avg J identity
1 | BSSE_QGF_138968_HM3H7DRXX_2_BC5921798_SI-GA-D1 | 2392 | 2336 | 2329 | 0 | 0.9994427 | 0.97 | 0.99
2 | BSSE_QGF_138972_HM3H7DRXX_1_BC5922423_SI-GA-D5 | 1402 | 1392 | 1390 | 0 | 0.9990487 | 0.97 | 0.99

CDRH3 Overlap

This section visualizes the overlap between repertoires at the CDRH3 amino acid level. Overlap is calculated as the number of unique CDR3s in common divided by the total number of unique CDR3s. Scaling is performed on \(\log_{10}(\text{overlap} + 1)\), at permille scale (i.e. \(\times 1000\)).

## Warning in doTryCatch(return(expr), name, parentenv, handler): The CDR3 overlap
## matrix only contains zeros.
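
The overlap measure described in this section can be sketched as follows. Two points are assumptions rather than facts from the report: that the total number of unique CDR3s means the union of both repertoires (which makes the measure a Jaccard index), and that the permille conversion is applied before the logarithm. The function names are illustrative.

```python
import math


def cdr3_overlap(rep_a, rep_b):
    """Shared unique CDR3s divided by all unique CDR3s in both repertoires.

    Assumption: the denominator is the union of the two sets.
    """
    a, b = set(rep_a), set(rep_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def scaled_overlap(overlap):
    """One possible reading of the scaling: log10 of (permille overlap + 1)."""
    return math.log10(overlap * 1000 + 1)


rep_1 = ["CARDYW", "CAKGGW", "CTRSSW"]
rep_2 = ["CAKGGW", "CTRSSW", "CARWFDYW"]
ov = cdr3_overlap(rep_1, rep_2)
print(ov, scaled_overlap(ov))
```

Under this reading, two repertoires with no shared CDR3 give an overlap of 0 and a scaled value of 0, which is the situation the warning above reports for every pair of samples.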

Top 10 Clonotypes

This section visualizes the top 10 clonotypes of each sample with their corresponding percentages.

The table below provides this information in a more readable format.

Top Clonotype
Sample | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th
BSSE_QGF_138968_HM3H7DRXX_2_BC5921798_SI-GA-D1 | 1342: 0.21% | 318: 0.13% | 511: 0.13% | 1000: 0.13% | 1029: 0.13% | 1068: 0.13% | 1187: 0.13% | 1285: 0.13% | 1629: 0.13% | 1875: 0.13%
BSSE_QGF_138972_HM3H7DRXX_1_BC5922423_SI-GA-D5 | 281: 0.21% | 175: 0.14% | 278: 0.14% | 282: 0.14% | 315: 0.14% | 419: 0.14% | 513: 0.14% | 1010: 0.14% | 1054: 0.14% | 1352: 0.14%

Clonal and Clonotype Accumulation Curves

This section visualizes the clonal and clonotype composition of each repertoire as a size-sorted accumulation curve.
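
A size-sorted accumulation curve can be sketched as the running total of clone (or clonotype) frequencies, ordered from largest to smallest. The function name and the toy labels below are illustrative assumptions; the pipeline's own plotting code is not shown in this report.

```python
from collections import Counter
from itertools import accumulate


def accumulation_curve(labels):
    """Cumulative fraction of sequences covered by the k largest groups.

    labels holds one clone or clonotype identifier per sequence.
    """
    counts = sorted(Counter(labels).values(), reverse=True)
    total = sum(counts)
    return [c / total for c in accumulate(counts)]


# Toy repertoire: clone "a" has 5 sequences, "b" has 3, "c" has 2.
labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(accumulation_curve(labels))  # [0.5, 0.8, 1.0]
```

A curve that rises steeply at the start indicates a few dominant clones, while a curve close to the diagonal indicates that sequences are spread evenly over many small clones.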


SHM Rates

Violin plots allow examination of the distribution and composition of a metric. Somatic hypermutation rates provide a direct measure of the mutation status, and thus of the affinity maturation level, of each sample. In the plot below, the somatic hypermutation of each repertoire is visualized as a violin plot of V gene identity (i.e. the percentage of the V segment that matches the closest germline sequence, so lower identity means more mutations), with the mean indicated by a white point.
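
As a small illustration of how identity relates to mutation rate, the sketch below converts per-sequence V gene identities into mutation rates and summarizes them. It assumes identities are stored as fractions between 0 and 1, as in the Avg V identity column of the overview table; the function name `shm_summary` is hypothetical.

```python
from statistics import mean, median


def shm_summary(v_identities):
    """Summarize mutation rates, taken as 1 - V identity per sequence.

    Assumption: identities are fractions in [0, 1], not percentages.
    """
    if not v_identities:
        raise ValueError("need at least one identity value")
    rates = [1.0 - v for v in v_identities]
    return {
        "mean": mean(rates),
        "median": median(rates),
        "min": min(rates),
        "max": max(rates),
    }


# Made-up identities for three sequences.
print(shm_summary([1.0, 0.75, 0.5]))
```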


Amino Acid Physicochemical Properties

The physicochemical properties provide a quick overview of potential protein interaction metrics. Here we focus on 9 properties of the CDRH3, plus the somatic hypermutation rate across the V gene. Together these metrics allow a quick interpretation of which changes occur in response to, for example, a vaccination or antigenic stimulus.


Clonal Abundance and Diversity

Alakazam provides methods for the inference of a complete clonal abundance distribution along with two approaches to assess the diversity of these distributions.

While observed abundances can be computed, they will not provide confidence intervals. A complete clonal abundance distribution may be inferred using the estimateAbundance function with confidence intervals derived via bootstrapping. In this section we partition the data on the sample column and calculate a 95% confidence interval via 200 bootstrap realizations. Then, we plot a rank abundance curve of the relative clonal abundances.
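
To illustrate the general idea of bootstrapped confidence intervals on clonal abundance, here is a plain percentile bootstrap in Python. It is not a port of alakazam's estimateAbundance, whose internal procedure may differ; the function name, the seed and the toy data are assumptions for illustration.

```python
import random
from collections import Counter


def bootstrap_abundance(labels, n_boot=200, ci=0.95, seed=0):
    """Relative clone abundances with percentile bootstrap intervals.

    labels holds one clone identifier per sequence. Returns tuples of
    (clone, observed_fraction, lower, upper), sorted by abundance rank.
    This is a generic resampling sketch, not alakazam's algorithm.
    """
    if not labels:
        raise ValueError("need at least one sequence")
    rng = random.Random(seed)
    n = len(labels)
    observed = Counter(labels)
    samples = {clone: [] for clone in observed}
    for _ in range(n_boot):
        # Resample n sequences with replacement and recount each clone.
        draw = Counter(rng.choices(labels, k=n))
        for clone in observed:
            samples[clone].append(draw[clone] / n)
    lo_idx = int((1 - ci) / 2 * (n_boot - 1))
    hi_idx = int((1 + ci) / 2 * (n_boot - 1))
    result = []
    for clone, count in observed.items():
        vals = sorted(samples[clone])
        result.append((clone, count / n, vals[lo_idx], vals[hi_idx]))
    result.sort(key=lambda row: row[1], reverse=True)
    return result


labels = ["a"] * 6 + ["b"] * 3 + ["c"]
for clone, freq, lower, upper in bootstrap_abundance(labels):
    print(clone, freq, lower, upper)
```

Plotting the observed fraction against its rank, with the lower and upper bounds as a band, gives a rank abundance curve in the spirit of the one described above.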

Diversity curves, as calculated with the alakazam package, allow the repertoires to be compared across different orders of diversity (q from 1 to 4).

From the alakazam handbook: This method, proposed by Hill (Hill, 1973), quantifies diversity as a smooth function (D) of a single parameter q. Special cases of the generalized diversity index correspond to the most popular diversity measures in ecology: species richness (q = 0), the exponential of the Shannon-Wiener index (q approaches 1), the inverse of the Simpson index (q = 2), and the reciprocal abundance of the largest clone (q approaches +\(\infty\)). At q = 0 different clones are weighted equally, regardless of their size. As the parameter q increases from 0 to +\(\infty\) the diversity index (D) depends less on rare clones and more on common (abundant) ones, thus encompassing a range of definitions that can be visualized as a single curve.

Values of q < 0 are valid, but are generally not meaningful. The value of D at q=1 is estimated by D at q=0.9999.
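
The generalized diversity index described above can be written as D(q) = (sum of p_i^q)^(1/(1-q)), where p_i is the relative abundance of clone i. The sketch below evaluates this formula directly on observed abundances and substitutes q = 0.9999 for q = 1, as stated above. It is an illustration only: alakazam additionally resamples the data, which this snippet does not do, and the function name is hypothetical.

```python
from collections import Counter


def hill_diversity(labels, q):
    """Hill diversity of order q from one clone label per sequence."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one sequence")
    props = [n / total for n in counts.values()]
    if q == 1:
        # The formula is undefined at q = 1; approximate it as the text does.
        q = 0.9999
    return sum(p ** q for p in props) ** (1.0 / (1.0 - q))


labels = ["a", "a", "b", "c"]
for q in (0, 1, 2, 4):
    print(q, hill_diversity(labels, q))
```

For this toy input, q = 0 gives the number of clones (3) and q = 2 gives the inverse Simpson index (1 / 0.375), matching the special cases listed above.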


Clonotype V Gene Usage

This section shows the V gene families and their frequencies.
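
As a rough sketch of how family usage could be tabulated, the snippet below derives the family from a V gene call by dropping the allele and the gene number (for example, IGHV3-23*01 becomes IGHV3) and then computes relative frequencies. The parsing rule is a simple heuristic and an assumption: it does not handle unusual gene names such as orphon genes, and the function names are illustrative.

```python
from collections import Counter


def v_family(v_call):
    """Family part of a V call, e.g. 'IGHV3-23*01' -> 'IGHV3'.

    For ambiguous calls listing several genes, only the first is used.
    """
    first = v_call.split(",")[0].strip()
    gene = first.split("*")[0]  # drop the allele suffix
    return gene.split("-")[0]   # drop the gene number within the family


def family_usage(v_calls):
    """Relative frequency of each V gene family."""
    counts = Counter(v_family(call) for call in v_calls)
    total = sum(counts.values())
    return {family: n / total for family, n in counts.items()}


calls = ["IGHV3-23*01", "IGHV3-30*02", "IGHV1-2*02", "IGHV3-23*01,IGHV3-23D*01"]
print(family_usage(calls))
```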


Software Versions