AIRRseq Data Overview
This section provides a table with an overview of the samples used to generate this report.
Simpson diversity estimates the probability that two sequences drawn at random from the dataset are different.
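The following minimal Python sketch shows this definition. It computes the probability that two sequences drawn without replacement carry different labels. It assumes each sequence has already been assigned a group label, for example its clone or clonotype. The report does not state which grouping its Simpson column uses, and the function name is illustrative.

```python
from collections import Counter

def simpson_diversity(labels):
    """Probability that two sequences drawn without replacement have different labels."""
    counts = Counter(labels)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two sequences")
    # Ordered pairs drawn from the same group, divided by all ordered pairs
    same = sum(c * (c - 1) for c in counts.values())
    return 1 - same / (n * (n - 1))

# Hypothetical clonotype labels for six sequences
labels = ["A", "A", "B", "C", "C", "C"]
print(round(simpson_diversity(labels), 4))  # prints 0.7333
```

A common alternative is the Gini-Simpson form \(1 - \sum_i p_i^2\), which corresponds to drawing with replacement. For the toy labels above, it gives 22/36 ≈ 0.611 instead of 22/30 ≈ 0.733. The report does not say which variant underlies its Simpson column.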
Clones are defined as groups of cells that descend from a common ancestor and cannot be distinguished at the sequence level. Here, sequences are assigned to the same clone if their CDRH1, CDRH2 and CDRH3 amino acid sequences are identical.
Clonotypes are groups of sequences with an identical CDRH3 amino acid sequence and the same V-J gene combination.
Lineages are groups of cells that all descend from the same naive ancestor and carry the same rearrangement. In this report, they are determined in the same way as clonotypes, but lineages with fewer than 2 sequences are removed from downstream analysis.
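The three groupings differ only in which fields form the grouping key. Below is a minimal Python sketch. It assumes each sequence is a dict with the hypothetical fields cdr1_aa, cdr2_aa, cdr3_aa, v_gene and j_gene, using gene-level calls without alleles.

```python
from collections import defaultdict

def group_by(records, key):
    """Group records by the value returned by key(record)."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return dict(groups)

def clone_key(rec):
    # Clones: identical CDRH1, CDRH2 and CDRH3 amino acid sequences
    return (rec["cdr1_aa"], rec["cdr2_aa"], rec["cdr3_aa"])

def clonotype_key(rec):
    # Clonotypes: identical CDRH3 amino acid sequence and same V-J combination
    return (rec["cdr3_aa"], rec["v_gene"], rec["j_gene"])

def lineages(records, min_size=2):
    # Lineages: clonotype groups, keeping only those with at least min_size sequences
    return {k: v for k, v in group_by(records, clonotype_key).items() if len(v) >= min_size}

# Made-up example records
records = [
    {"cdr1_aa": "GFTFSSYA", "cdr2_aa": "ISGSGGST", "cdr3_aa": "AKDRGYFDY", "v_gene": "IGHV3-23", "j_gene": "IGHJ4"},
    {"cdr1_aa": "GFTFSNYA", "cdr2_aa": "ISGSGGST", "cdr3_aa": "AKDRGYFDY", "v_gene": "IGHV3-23", "j_gene": "IGHJ4"},
    {"cdr1_aa": "GYTFTSYG", "cdr2_aa": "ISAYNGNT", "cdr3_aa": "ARGGYSSGWYFDY", "v_gene": "IGHV1-18", "j_gene": "IGHJ4"},
]
print(len(group_by(records, clone_key)), len(group_by(records, clonotype_key)), len(lineages(records)))  # prints 3 2 1
```

In the toy records, the first two sequences share the same CDRH3 and V-J genes but differ in CDRH1. They therefore form one clonotype and one lineage, but two clones. This explains how a sample can have slightly more clones than clonotypes.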
Sample No. | Sample Name | Sequences | Clones | Clonotypes | Lineages | Simpson Diversity | Avg V identity | Avg J identity |
---|---|---|---|---|---|---|---|---|
1 | BSSE_QGF_138968_HM3H7DRXX_2_BC5921798_SI-GA-D1 | 2392 | 2336 | 2329 | 0 | 0.9994427 | 0.97 | 0.99 |
2 | BSSE_QGF_138972_HM3H7DRXX_1_BC5922423_SI-GA-D5 | 1402 | 1392 | 1390 | 0 | 0.9990487 | 0.97 | 0.99 |
CDRH3 Overlap
This section visualizes the overlap between repertoires at the CDRH3 amino acid level. Overlap is calculated as the number of unique CDR3s shared by two repertoires divided by the total number of unique CDR3s. For display, the overlap is expressed in permille (i.e., multiplied by 1000) and transformed as \(\log_{10}(\text{overlap} + 1)\).
Warning: the CDR3 overlap matrix contains only zeros.
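Here is a minimal sketch of the overlap and its display scaling. It assumes that "total unique CDR3s" means the union of the two repertoires; the section does not name the denominator. It also assumes the permille conversion is applied before the \(\log_{10}(x + 1)\) transform. The function names are hypothetical.

```python
import math

def cdr3_overlap(rep_a, rep_b):
    """Shared unique CDR3s divided by all unique CDR3s (union assumed as denominator)."""
    a, b = set(rep_a), set(rep_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def scaled_overlap(overlap):
    # Express overlap in permille (x1000), then take log10(x + 1)
    return math.log10(overlap * 1000 + 1)

# Made-up CDRH3 amino acid sequences for two repertoires
rep1 = ["AKDRGYFDY", "ARGGYSSGWYFDY", "AREGLDY"]
rep2 = ["AKDRGYFDY", "TRDSSGWYFDY"]
ov = cdr3_overlap(rep1, rep2)
print(ov, round(scaled_overlap(ov), 3))
```

An overlap of zero maps to a scaled value of exactly 0. A matrix of zeros would therefore mean that no pair of compared samples shared a CDRH3 sequence.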
Top 10 Clonotypes
This section visualizes the top 10 clonotypes of each sample with their corresponding percentages.
The following table provides the same information in a more readable format; each cell gives the clonotype identifier followed by its percentage.
Sample | 1st | 2nd | 3rd | 4th | 5th | 6th | 7th | 8th | 9th | 10th |
---|---|---|---|---|---|---|---|---|---|---|
BSSE_QGF_138968_HM3H7DRXX_2_BC5921798_SI-GA-D1 | 1342: 0.21% | 318: 0.13% | 511: 0.13% | 1000: 0.13% | 1029: 0.13% | 1068: 0.13% | 1187: 0.13% | 1285: 0.13% | 1629: 0.13% | 1875: 0.13% |
BSSE_QGF_138972_HM3H7DRXX_1_BC5922423_SI-GA-D5 | 281: 0.21% | 175: 0.14% | 278: 0.14% | 282: 0.14% | 315: 0.14% | 419: 0.14% | 513: 0.14% | 1010: 0.14% | 1054: 0.14% | 1352: 0.14% |
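The percentages can be reproduced by counting sequences per clonotype and dividing by the number of sequences in the sample. The sketch below assumes one clonotype identifier per sequence. Ties are broken by ascending identifier, which matches the order in the table above but is an assumption about the report's implementation.

```python
from collections import Counter

def top_clonotypes(clonotype_ids, n=10):
    """Return (clonotype id, percent of sequences) for the n largest clonotypes."""
    counts = Counter(clonotype_ids)
    total = len(clonotype_ids)
    # Sort by descending size, then ascending id to break ties (assumed ordering)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(cid, 100 * size / total) for cid, size in ranked[:n]]

# Hypothetical clonotype ids, one per sequence
ids = [7, 7, 7, 3, 3, 9, 1]
for cid, pct in top_clonotypes(ids, n=3):
    print(f"{cid}: {pct:.2f}%")  # prints 7: 42.86%, 3: 28.57%, 1: 14.29% (one per line)
```

To gauge scale, assuming the percentages are relative to the sequence count: 3 of the 2392 sequences in sample 1 is about 0.13%, and 5 sequences is about 0.21%. Even the most frequent clonotypes would then contain only a handful of sequences.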
Clonal and Clonotype Accumulation Curves
This section visualizes the clonal and clonotype composition of each repertoire as a size-sorted accumulation curve.
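Here is one way to build such a curve, sketched in Python. It assumes the x-axis shows the rank k and the y-axis the cumulative fraction of sequences covered by the k largest clones (or clonotypes). The function name is hypothetical.

```python
from collections import Counter
from itertools import accumulate

def accumulation_curve(group_ids):
    """Cumulative fraction of sequences covered by the k largest groups, k = 1..K."""
    sizes = sorted(Counter(group_ids).values(), reverse=True)
    total = sum(sizes)
    return [running / total for running in accumulate(sizes)]

# Hypothetical clone labels, one per sequence
print(accumulation_curve(["a", "a", "a", "b", "b", "c"]))  # prints [0.5, 0.8333333333333334, 1.0]
```

A curve that rises steeply at low ranks indicates a repertoire dominated by a few expanded clones. A nearly straight line indicates clones of similar size.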
SHM Rates
Violin plots allow examination of the distribution and composition of a metric. Somatic hypermutation (SHM) rates provide a direct measure of mutation status and thus, indirectly, of the level of affinity maturation in each sample. In the plot below, the SHM of each repertoire is visualized as a violin plot of V gene identity (i.e., % identity to the closest germline V gene, so lower identity means more mutation), with the mean indicated by a white point.
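Because the plot uses V gene identity, the mutation frequency of a sequence is simply one minus its identity. The sketch below computes identity from a query and a germline sequence. It assumes they are already aligned to equal length, with gaps written as "-" or ".". In practice, identity values are typically taken from the V(D)J annotation step rather than recomputed, so this is illustrative only. The function names are hypothetical.

```python
from statistics import mean

GAPS = {"-", "."}

def v_identity(query, germline):
    """Fraction of aligned, non-gap positions at which query matches germline."""
    if len(query) != len(germline):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for q, g in zip(query.upper(), germline.upper()):
        if q in GAPS or g in GAPS:
            continue  # skip gap positions in either sequence
        compared += 1
        matches += q == g
    return matches / compared if compared else float("nan")

def mutation_frequency(identity):
    # SHM frequency is the complement of germline identity
    return 1 - identity

# Made-up aligned sequences with one mismatch in 12 positions
identity = v_identity("CAGGTGAAGCTG", "CAGGTGCAGCTG")
print(round(identity, 4), round(mutation_frequency(identity), 4))
# Per-sample mean of identities, the value a white point would mark
print(round(mean([0.97, 0.95, 1.0]), 4))
```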
Amino Acid Physicochemical Properties
Physicochemical properties provide a quick overview of features relevant to potential protein interactions. Here we focus on 9 properties of the HCDR3, plus somatic hypermutation rates across the V gene. Together, these metrics allow a quick interpretation of the changes that occur in response to, e.g., vaccination or another antigenic stimulus.
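The Python sketch below shows how such properties are derived from an HCDR3 amino acid sequence. It computes length, GRAVY hydrophobicity (mean Kyte-Doolittle hydropathy) and the fractions of aromatic, acidic and basic residues. The report does not list its nine properties here, so this selection and the residue classes are assumptions. In R, the alakazam package offers an aminoAcidProperties function for this kind of calculation.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Residue classes (assumed; other tools may classify H differently)
AROMATIC = set("FWY")
ACIDIC = set("DE")
BASIC = set("KRH")

def cdr3_properties(seq):
    """A few simple physicochemical descriptors of an amino acid sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    return {
        "length": n,
        "gravy": sum(KYTE_DOOLITTLE[aa] for aa in seq) / n,
        "aromatic": sum(aa in AROMATIC for aa in seq) / n,
        "acidic": sum(aa in ACIDIC for aa in seq) / n,
        "basic": sum(aa in BASIC for aa in seq) / n,
    }

print(cdr3_properties("ARDY"))
```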
Clonal Abundance and Diversity
Alakazam provides methods for inferring a complete clonal abundance distribution, along with two approaches to assess the diversity of these distributions. Observed abundances can be computed directly, but they do not provide confidence intervals. A complete clonal abundance distribution may instead be inferred using the estimateAbundance function, with confidence intervals derived via bootstrapping. In this section we partition the data by the sample column and calculate a 95% confidence interval from 200 bootstrap realizations. We then plot a rank abundance curve of the relative clonal abundances.
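The bootstrap step can be illustrated with a plain nonparametric sketch in Python. It resamples sequences with replacement 200 times and takes the 2.5th and 97.5th percentiles of the relative abundance at each rank. This is not the estimateAbundance implementation. That function infers a complete abundance distribution, whereas this sketch only resamples the clones actually observed. Function and parameter names are hypothetical.

```python
import random
from collections import Counter

def percentile(sorted_vals, q):
    # Nearest-rank percentile on an already sorted list
    return sorted_vals[round(q * (len(sorted_vals) - 1))]

def bootstrap_rank_abundance(clone_ids, n_boot=200, ci=0.95, seed=1):
    """Observed relative abundance per rank with a percentile bootstrap CI."""
    rng = random.Random(seed)
    n = len(clone_ids)
    observed = sorted(Counter(clone_ids).values(), reverse=True)
    n_ranks = len(observed)
    draws = [[] for _ in range(n_ranks)]
    for _ in range(n_boot):
        resample = rng.choices(clone_ids, k=n)  # draw sequences with replacement
        sizes = sorted(Counter(resample).values(), reverse=True)
        sizes += [0] * (n_ranks - len(sizes))  # clones missing from this resample
        for rank, size in enumerate(sizes):
            draws[rank].append(size / n)
    alpha = 1 - ci
    rows = []
    for rank in range(n_ranks):
        vals = sorted(draws[rank])
        rows.append((rank + 1, observed[rank] / n,
                     percentile(vals, alpha / 2), percentile(vals, 1 - alpha / 2)))
    return rows

# Hypothetical sample: one expanded clone plus smaller ones
ids = ["c1"] * 6 + ["c2"] * 3 + ["c3"]
for rank, obs, lo, hi in bootstrap_rank_abundance(ids):
    print(rank, round(obs, 2), round(lo, 2), round(hi, 2))
```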
Diversity curves calculated with the alakazam package allow comparison of different orders of diversity (\(q\) from 1 to 4) across all samples.
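For reference, the generalized index plotted by these curves can be written in terms of the relative clone abundances \(p_1, \dots, p_S\) (all \(p_i > 0\)) of a repertoire with \(S\) clones:

\[
D(q) = \left( \sum_{i=1}^{S} p_i^{\,q} \right)^{1/(1-q)}, \qquad q \neq 1,
\]
\[
D(1) = \lim_{q \to 1} D(q) = \exp\!\left( -\sum_{i=1}^{S} p_i \ln p_i \right).
\]

Setting \(q = 0\) gives \(D(0) = S\), the number of clones. Setting \(q = 2\) gives \(1 / \sum_i p_i^2\), the inverse Simpson index. As \(q \to \infty\), \(D(q) \to 1 / \max_i p_i\), the reciprocal abundance of the largest clone. These are the special cases described in the handbook passage.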
From the alakazam handbook:
This method, proposed by Hill (Hill, 1973), quantifies diversity as a smooth function (\(D\)) of a single parameter \(q\). Special cases of the generalized diversity index correspond to the most popular diversity measures in ecology: species richness (\(q = 0\)), the exponential of the Shannon-Weiner index (\(q\) approaches \(1\)), the inverse of the Simpson index (\(q = 2\)), and the reciprocal abundance of the largest clone (\(q\) approaches \(+\infty\)). At \(q = 0\) different clones weight equally, regardless of their size. As the parameter \(q\) increases from \(0\) to \(+\infty\) the diversity index (\(D\)) depends less on rare clones and more on common (abundant) ones, thus encompassing a range of definitions that can be visualized as a single curve. Values of \(q < 0\) are valid, but are generally not meaningful. The value of \(D\) at \(q = 1\) is estimated by \(D\) at \(q = 0.9999\).
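Here is a direct Python translation of Hill's definition, as a sketch. It computes observed Hill diversity from clone labels and, following the handbook passage, substitutes \(q = 0.9999\) for \(q = 1\). It omits any resampling or confidence intervals that alakazam may add to its diversity curves. The function names are hypothetical.

```python
import math
from collections import Counter

def hill_diversity(clone_ids, q):
    """Observed Hill diversity D(q) of the clone size distribution."""
    counts = Counter(clone_ids).values()
    total = sum(counts)
    p = [c / total for c in counts]
    if math.isinf(q):
        return 1 / max(p)  # limit as q -> +infinity
    if q == 1:
        q = 0.9999  # D(1) approximated by D(0.9999), as in the handbook passage
    return sum(pi ** q for pi in p) ** (1 / (1 - q))

def diversity_curve(clone_ids, qs):
    """D(q) for each q in qs, e.g. [1, 2, 3, 4]."""
    return [(q, hill_diversity(clone_ids, q)) for q in qs]

# Hypothetical clone labels: relative abundances 0.5, 0.25, 0.25
ids = ["a", "a", "b", "c"]
for q, d in diversity_curve(ids, [0, 1, 2, math.inf]):
    print(q, round(d, 3))
```

For any sample, the resulting values are non-increasing in \(q\). This is why a steep drop along the curve signals a repertoire dominated by a few abundant clones.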
Clonotype V Gene Usage
This section shows the frequency of each V gene family.
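Here is a minimal sketch of the family assignment and frequency count. It assumes one V gene call per clonotype, in IMGT-style notation such as IGHV3-23*01. When a call lists several comma-separated alternatives, only the first is kept; this is an assumption, not the report's documented rule. The function names are hypothetical.

```python
import re
from collections import Counter

def v_family(v_call):
    """Reduce a V gene call such as 'IGHV3-23*01' to its family, 'IGHV3'."""
    first = v_call.split(",")[0].strip()  # keep only the first call if ambiguous
    match = re.match(r"(IG[HKL]V\d+)", first)
    return match.group(1) if match else first

def v_family_frequencies(v_calls):
    """Relative frequency of each V gene family, most frequent first."""
    counts = Counter(v_family(v) for v in v_calls)
    total = sum(counts.values())
    return {fam: c / total for fam, c in counts.most_common()}

# Hypothetical V calls, one per clonotype
calls = ["IGHV3-23*01", "IGHV3-30*18", "IGHV1-69*01,IGHV1-69*02", "IGHV4-34*01"]
print(v_family_frequencies(calls))
```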