12 QC - quality control of features

The quality control (QC) report evaluates whether the extracted features are linked to the experimental groups or to expected effects, and it can also reveal unexpected effects (clustering problems, etc.).

Analysis Feature	Type	Description
Number of Cells Barplot	graphic	Displays cell counts per `.fcs` file, highlighting potential batch effects and identifying samples with low cell counts.
Number of Cells Per Quantiles	table	Summarizes the distribution of cell counts across `.fcs` files using quantiles, aiding in identifying outliers and general data trends.
Number of Cells per `.fcs` Histogram	graphic	Visualizes the distribution of cell counts across `.fcs` files, marking key quantiles to illustrate significant statistical boundaries.
Barplot Number of Cells per Clusters	graphic	Illustrates the distribution of cells across manually gated clusters, showing the percentage and cumulative percentage of cells per cluster.
Density Heatmap	graphic	Visually checks the standardization method and ensures similar abundance ranges across samples using a special distance method.
MFI’s Heatmap	graphic	Displays the median fluorescence intensity (MFI) of markers across clusters, allowing for comparison of MFI patterns across experimental conditions or sample groups.
Mfi x Abundance Heatmap	graphic	Combines information about marker expression and cluster abundance, helping users understand expression patterns and cluster distribution.
Abundance Heatmap	graphic	Shows the percentage of cells in each cluster for each sample, helping identify outliers and check if experimental conditions have distinct cluster abundance patterns.
PCA of Samples Using Percentages per Cluster	graphic	Visualizes samples in a reduced dimensional space based on cluster percentages, assessing sample homogeneity, identifying outliers, and determining if experimental conditions can be separated.
Barplot of Variance of Each PC	graphic	Shows the percentage of explained variance for each principal component in the PCA analysis.
Contribution and Correlation of Variables to Each Axis	tables	Provides tables to analyze PCA results, including contribution, correlation, and R2 tables, helping identify key variables driving sample distribution and variability.
Heatmap Top 10 cos2 for Each First 3 Dimensions Combination	graphic	Shows the top 10 markers with the highest cos2 values for each combination of the first three dimensions in the MFI PCA analysis, helping interpret key markers driving sample separation or clustering.

12.1 Number of Cells Barplot

The “Number of cell per fcs” barplot displays the cell count of each .fcs file, sorted by condition if one is provided. If batch annotations are available, batch colors are drawn around the graph to make potential batch effects easier to spot.

Annotations on the plot help to quickly identify variations between files. Below the graph, the report lists the .fcs files with the fewest cells to highlight samples that may require further examination.

12.2 Number of Cells Per Quantiles table

This table presents the distribution of cell counts across all .fcs files, summarized by quantiles.

It provides key statistical measures at five quantile points. This quantile breakdown shows the distribution and variability of cell counts within your dataset, which helps in identifying outliers and general data trends.
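
As an illustration, a quantile summary of this kind could be computed as in the sketch below. It is not the report's actual code: the function name `cell_count_quantiles`, the example counts, and the choice of the five points (minimum, 25%, 50%, 75%, maximum) are assumptions made for the example.

```python
import statistics

def cell_count_quantiles(counts):
    """Summarize cell counts per .fcs file at five points.

    Assumption: the five points are 0%, 25%, 50%, 75% and 100%.
    """
    # "inclusive" treats the data as the whole population (min and max included)
    q1, q2, q3 = statistics.quantiles(counts, n=4, method="inclusive")
    return {"0%": min(counts), "25%": q1, "50%": q2, "75%": q3, "100%": max(counts)}

# Hypothetical cell counts, one value per .fcs file
counts = [1000, 2000, 3000, 4000, 5000]
summary = cell_count_quantiles(counts)
for label, value in summary.items():
    print(f"{label:>4}: {value:g}")
```

Files whose count falls far below the 25% point are natural candidates for a closer look.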

12.3 Number of Cells per `.fcs` Histogram

This histogram visualizes the distribution of cell counts across all .fcs files, with the number of .fcs files on the y-axis and the cell count on the x-axis.

Key quantiles—Q1 (25%), Q2 (50%, median), Q3 (75%), and Q4 (100%, maximum)—are marked with red dashed lines to illustrate significant statistical boundaries within the data.

This histogram provides a clear graphical representation of how cell counts are spread among the .fcs files, helping to quickly identify patterns and anomalies in the dataset.

12.4 Barplot Number of Cells per Clusters

This barplot illustrates the distribution of cells across different clusters, identified through manual gating. It displays the percentage of total cells per cluster and the cumulative percentage, providing a detailed look at cell distribution patterns.

Barplot Details

Y-Axis (Left): Shows the percentage of total cells in each cluster.
Y-Axis (Right): Displays the cumulative percentage of cells, aiding in understanding the overall distribution as it accumulates across clusters.
Cumulative Annotation: Clusters are sorted by the percentage of cells they contain, and annotations indicate the cumulative percentages, helping to visualize how cell counts build up across clusters.

Below the barplot, the clusters with the lowest percentage of cells are listed to highlight areas with minimal cell counts. This information is crucial for assessing the efficiency and effectiveness of the gating process.
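
The two axes of this barplot can be sketched as follows. This is a minimal illustration, not the report's implementation: the function name `cluster_percentages`, the cluster names, and the cell counts are hypothetical, and sorting from the largest to the smallest cluster is an assumption (the text only states that clusters are sorted by percentage).

```python
def cluster_percentages(cells_per_cluster):
    """Return (cluster, percent, cumulative percent) rows, largest cluster first."""
    total = sum(cells_per_cluster.values())
    # Assumption: clusters are ordered by decreasing share of cells
    ranked = sorted(cells_per_cluster.items(), key=lambda kv: kv[1], reverse=True)
    rows, cumulative = [], 0.0
    for name, n_cells in ranked:
        percent = 100.0 * n_cells / total   # left y-axis
        cumulative += percent               # right y-axis
        rows.append((name, percent, cumulative))
    return rows

# Hypothetical manually gated clusters and their cell counts
cells_per_cluster = {"T cells": 5000, "B cells": 3000, "NK cells": 1500, "Monocytes": 500}
rows = cluster_percentages(cells_per_cluster)
for name, percent, cumulative in rows:
    print(f"{name:<10} {percent:6.1f}% {cumulative:6.1f}%")

# The clusters with the lowest percentages sit at the end of the sorted list
smallest = [name for name, _, _ in rows[-2:]]
```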

12.5 Density Heatmap

A special distance method, denoted as ‘ks,’ measures the similarity between distributions by computing the Kolmogorov-Smirnov statistic between two distributions.

For each sample (.fcs file), the heatmap shows the density of the arcsinh-transformed frequencies (asinh of frequencies / 0.03), centered by the mean per cluster, with the density colored from blue to red.

The aim is to visually check the standardisation method and ensure that the abundance ranges of the samples are similar.
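
For readers unfamiliar with the Kolmogorov-Smirnov statistic, the sketch below shows the standard two-sample definition: the largest gap between the two empirical cumulative distribution functions. How the report applies it to order the heatmap is not specified here, so the function name `ks_statistic` and the example values are illustrative assumptions only.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # The largest gap is always reached at one of the observed values
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)  # fraction of a <= x
        cdf_b = bisect.bisect_right(b, x) / len(b)  # fraction of b <= x
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap

# Hypothetical centered abundance values for two samples
same = ks_statistic([1, 2, 3], [1, 2, 3])
shifted = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])
disjoint = ks_statistic([1, 2, 3], [4, 5, 6])
print(same, shifted, disjoint)
```

A value of 0 means the two distributions coincide, while a value of 1 means they do not overlap at all.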

12.6 MFI’s Heatmap

The MFI’s Heatmap is a visualization that displays the median fluorescence intensity (MFI) of markers across clusters in a cytometry dataset.

The input for this heatmap is the MFI cluster x marker array, where the MFI values have been previously transformed using an asinh(intensity/cofactor) function. The heatmap is organized with clusters in rows and markers in columns, providing a clear overview of the MFI distribution across the dataset. Alongside the heatmap, a barplot is presented to show the number of cells in each metacluster, offering insights into the relative sizes of the clusters.
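
The cluster x marker array described above can be sketched as follows. This is an illustration under stated assumptions, not the report's code: the function name `mfi_table`, the marker names, the event values, and the cofactor of 5 are hypothetical, and the transformation is assumed to be applied to each event before the median is taken.

```python
import math
import statistics

COFACTOR = 5.0  # assumption: the cofactor value is not given in this section

def mfi_table(events, markers, cofactor=COFACTOR):
    """Median of asinh(intensity / cofactor) per cluster (rows) and marker (columns).

    `events` is a list of (cluster, intensities) pairs, one pair per cell.
    """
    by_cluster = {}
    for cluster, intensities in events:
        by_cluster.setdefault(cluster, []).append(intensities)
    table = {}
    for cluster, cells in by_cluster.items():
        table[cluster] = {
            marker: statistics.median(math.asinh(cell[i] / cofactor) for cell in cells)
            for i, marker in enumerate(markers)
        }
    return table

# Hypothetical events: (cluster label, raw intensities for CD3 and CD4)
markers = ["CD3", "CD4"]
events = [
    ("c1", (0.0, 5.0)), ("c1", (5.0, 5.0)), ("c1", (10.0, 5.0)),
    ("c2", (50.0, 0.0)),
]
table = mfi_table(events, markers)
# Number of cells per cluster, as shown in the side barplot
cluster_sizes = {c: sum(1 for cl, _ in events if cl == c) for c in table}
```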

Initially, the heatmap covers all .fcs files in the dataset, allowing for a comprehensive view of the MFI patterns. However, if a specific condition is selected for rendering the document, additional heatmaps are generated to display the MFIs for each subgroup within that condition. This feature enables users to compare and contrast the MFI patterns across different experimental conditions or sample groups.

The MFI’s Heatmap serves as a valuable tool for visually checking the consistency of MFI values between samples and ensuring that the data is comparable across the dataset. By providing a clear and concise representation of the MFI distribution, this visualization aids in the interpretation and quality control of data.

12.7 Mfi x Abundance Heatmap

The Mfi x Abundance Heatmap is a two-part visualization that combines information about marker expression and cluster abundance in cytometry data.

The first heatmap shows the normalized median fluorescence intensity (MFI) of each marker per cluster, allowing for easy comparison of marker expression across clusters. Markers are grouped hierarchically for better interpretation.
The second heatmap displays the relative abundances of each cluster per .fcs file, transformed relative to the cluster mean. A consistent green-purple color scale is used throughout the abundance matrix.
Sample metadata is provided on the left side of the abundance heatmap, offering context for interpreting cluster abundances.

12.8 Abundance Heatmap

The Abundance Heatmap is a graph that shows the percentage of cells in each cluster for each sample in your cytometry dataset. The heatmap has samples in columns and clusters in rows, with the color intensity representing the percentage of cells. The graph helps you:

See if samples within groups or conditions have similar cluster abundances
Identify samples that are different from others (outliers)
Check if experimental conditions have distinct cluster abundance patterns

The heatmap also has a barplot on the right side showing the group abundance values on a log scale, colored by condition if one was selected. This graph is useful for checking data quality and exploring how cluster abundances vary across samples and conditions in your dataset.

13 PCA of Samples Using Percentages per Cluster

The PCA (Principal Component Analysis) graph is a visualization of your samples in a reduced dimensional space based on the percentages of cells in each cluster. This graph helps you assess the homogeneity of sample states, identify outliers or gender effects, and determine if experimental conditions can be separated.

The input for the PCA is a table with samples as rows and clusters as columns, containing the percentage of cells in each cluster for each sample. Before the analysis, the percentages are transformed using an asinh(% + cte) function with a constant (cte) of 0.03 and then centered by cluster. The PCA is computed with the FactoMineR PCA function and visualized with factoextra functions. It is important to check that the first 2 or 3 principal components retain the main information from your dataset.

In the PCA graph, each point represents a sample, and the distance between points reflects their similarity in terms of cluster percentages. Samples with similar cluster abundances will be closer together, while samples with different abundances will be farther apart. By examining the PCA graph, you can:

Assess the homogeneity of sample states within groups or conditions
Identify potential outliers or samples affected by factors like gender
Determine if experimental conditions form distinct clusters, indicating their impact on cluster abundances

This graph provides a concise summary of the relationships between your samples based on their cluster composition, aiding in data quality assessment and exploratory analysis of your cytometry dataset.
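
The preprocessing applied before the PCA, asinh(% + cte) followed by centering per cluster, can be sketched as follows. The PCA itself is computed in R with FactoMineR and is not reproduced here; the function name `transform_and_center` and the percentage table are hypothetical.

```python
import math

CTE = 0.03  # constant stated in the text

def transform_and_center(percentages, cte=CTE):
    """Apply asinh(% + cte) to a samples x clusters table, then center each cluster.

    Rows are samples and columns are clusters.
    """
    transformed = [[math.asinh(p + cte) for p in row] for row in percentages]
    n_samples = len(transformed)
    n_clusters = len(transformed[0])
    # Mean of each cluster (column) across all samples
    means = [sum(row[j] for row in transformed) / n_samples for j in range(n_clusters)]
    return [[row[j] - means[j] for j in range(n_clusters)] for row in transformed]

# Hypothetical percentages of cells per cluster (3 samples x 3 clusters)
percentages = [
    [60.0, 30.0, 10.0],
    [55.0, 35.0, 10.0],
    [20.0, 20.0, 60.0],
]
centered = transform_and_center(percentages)
column_means = [sum(row[j] for row in centered) / len(centered) for j in range(3)]
```

After centering, each cluster column has a mean of zero, so the PCA describes deviations from the average cluster composition.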

13.1 Barplot of Variance of Each PC

The barplot of variance shows the percentage of explained variance for each principal component (PC) in the PCA analysis, both for abundance and MFI data. The bars represent the PCs, with the height indicating the proportion of variance explained. The first few dimensions capturing the most significant information are displayed.
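
The bar heights follow the usual PCA definition: each eigenvalue divided by the sum of all eigenvalues. The sketch below illustrates that arithmetic; the function name `explained_variance` and the eigenvalues are hypothetical, and in the report these values come from the PCA result itself.

```python
def explained_variance(eigenvalues):
    """Percent of variance per principal component, plus the running total."""
    total = sum(eigenvalues)
    percents = [100.0 * ev / total for ev in eigenvalues]
    cumulative, running = [], 0.0
    for p in percents:
        running += p
        cumulative.append(running)
    return percents, cumulative

# Hypothetical eigenvalues, sorted from the first to the last component
eigenvalues = [4.0, 2.0, 1.0, 1.0]
percents, cumulative = explained_variance(eigenvalues)
for i, (p, c) in enumerate(zip(percents, cumulative), start=1):
    print(f"Dim.{i}: {p:5.1f}% (cumulative {c:5.1f}%)")
```

The cumulative column is a quick way to check how much information the first 2 or 3 components retain.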

PCA: The PCA graph visualizes the samples in the reduced dimensional space, with colors representing annotations of interest. Each point is a sample, and colors correspond to different categories or conditions. Circles highlight clustering or separation of groups. This graph assesses the impact of annotated variables on sample distribution.

Biplot: The biplot combines sample points and variable loadings, showing relationships between samples and contributions of variables (clusters or markers) to the principal components. Correlations between variables and PCs are annotated if their cos2 values exceed the mean cos2 for the axes of interest, identifying key variables driving sample separation or clustering.

These graphs provide a comprehensive understanding of PCA results, exploring sample relationships, annotation impact, and main factors contributing to variance in the cytometry dataset.

14 Contribution and Correlation of Variables to Each Axis

In addition to the PCA graphs, three tables are provided to further analyze the results:

Contribution Table: The contribution table shows the contributions of each variable (cluster or marker) to the first five principal components. Only variables that contribute at least 1% to one of these dimensions are included in the table. This table helps identify the key variables that have a significant impact on the variance captured by each principal component. Variables with high contributions are considered important in shaping the sample distribution in the reduced-dimensional space.
Correlation Table: If quantitative associated variables have been detected in the dataset, a correlation table is generated. This table displays the correlations between the quantitative variables and the principal components. High absolute values of correlation indicate a strong relationship between a variable and a specific principal component. This information can help interpret the meaning of the principal components and understand how the quantitative variables are related to the sample distribution in the PCA space.
R2 Table: If qualitative associated variables have been detected in the dataset, an R2 table is provided. This table shows the R-squared values, which measure the proportion of variance in the principal components that can be explained by each qualitative variable. High R-squared values suggest that a qualitative variable has a strong influence on the sample distribution along the corresponding principal component. This information can help identify the categorical factors that significantly contribute to the observed patterns in the PCA results.

These tables complement the PCA graphs by providing numerical summaries of the variable contributions, correlations, and explanatory power. They assist in interpreting the PCA results and identifying the key factors driving the sample distribution and variability in the cytometry dataset.
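
The quantities in the three tables can be illustrated with the sketch below. It is not the report's code: all function names and data are hypothetical, the correlation is assumed to be a Pearson correlation, and the R2 is assumed to be the one-way ANOVA ratio of between-group to total sum of squares.

```python
import math

def filter_contributions(contrib, threshold=1.0, n_dims=5):
    """Keep variables contributing at least `threshold` % to one of the first n_dims PCs."""
    return {v: c[:n_dims] for v, c in contrib.items() if max(c[:n_dims]) >= threshold}

def pearson(x, y):
    """Pearson correlation between a quantitative variable and PC coordinates."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r_squared(scores, groups):
    """Share of the variance of PC coordinates explained by a qualitative variable."""
    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    between_ss = sum(
        len(vals) * (sum(vals) / len(vals) - grand_mean) ** 2
        for vals in by_group.values()
    )
    return between_ss / total_ss

# Hypothetical data: contributions (%) per dimension, PC1 coordinates, annotations
contrib = {"cluster_A": [40.0, 5.0, 0.5], "cluster_B": [0.4, 0.9, 0.2]}
pc1 = [-2.0, -1.0, 1.0, 2.0]
age = [20.0, 30.0, 50.0, 60.0]
condition = ["ctrl", "ctrl", "treated", "treated"]

kept = filter_contributions(contrib)
corr_age = pearson(age, pc1)
r2_condition = r_squared(pc1, condition)
print(sorted(kept), round(corr_age, 3), round(r2_condition, 3))
```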

14.1 Heatmap Top 10 cos2 for Each First 3 Dimensions Combination

The MFI PCA analysis includes heatmaps that show the top 10 markers with the highest cos2 values for each combination of the first three dimensions (Dim.1 vs Dim.2, Dim.1 vs Dim.3, and Dim.2 vs Dim.3). The cos2 value indicates how well a marker is represented and contributes to the variability captured by the corresponding dimensions. A high cos2 value means a marker has a strong influence on the sample distribution in the PCA space. For each dimension combination, a separate heatmap is generated.
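
The selection of markers for each heatmap can be sketched as follows. This is an illustration only: the function name `top_cos2` and the cos2 values are hypothetical, and ranking markers by the sum of their cos2 over the two dimensions of each pair is an assumption about how the top 10 is chosen.

```python
from itertools import combinations

def top_cos2(cos2, dims=(0, 1, 2), top_n=10):
    """For each pair of dimensions, rank markers by their summed cos2 on that pair.

    `cos2` maps a marker name to its cos2 values, one per dimension.
    Keys of the result use 1-based dimension numbers, e.g. (1, 2) for Dim.1 vs Dim.2.
    """
    result = {}
    for d1, d2 in combinations(dims, 2):
        # Assumption: quality of representation on a plane = sum of the two cos2
        ranked = sorted(cos2, key=lambda m: cos2[m][d1] + cos2[m][d2], reverse=True)
        result[(d1 + 1, d2 + 1)] = ranked[:top_n]
    return result

# Hypothetical cos2 values for three markers on the first three dimensions
cos2 = {
    "CD3": [0.8, 0.1, 0.05],
    "CD4": [0.2, 0.6, 0.1],
    "CD8": [0.1, 0.1, 0.7],
}
top = top_cos2(cos2, top_n=2)
for pair, selected in top.items():
    print(f"Dim.{pair[0]} vs Dim.{pair[1]}: {selected}")
```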