This article was co-written with Pan Liu, postdoctoral researcher at UCLA and Fred Hutchinson Cancer Center. Pan is the first author of the mcRigor Nature Communications article.
Single-cell sequencing technologies have advanced rapidly in recent years, providing unprecedented opportunities to uncover cellular diversity, dynamic changes in cell states, and underlying gene regulatory mechanisms. In addition to the widely used single-cell RNA sequencing (scRNA-seq) 1,2, new modalities such as single-cell chromatin accessibility sequencing (scATAC-seq) 3,4 and joint profiling of transcriptome and chromatin accessibility (scMultiome) 5 have enabled the dissection of cellular heterogeneity at single-cell resolution across multiple omics layers. However, the data generated by these technologies are often highly sparse, primarily due to limited sequencing depth per cell, as well as imperfect reverse transcription and nonlinear amplification, which cause highly expressed genes to dominate sequencing capacity and make lowly expressed genes difficult to detect 6.

To alleviate data sparsity and noise, researchers proposed the "metacell" concept, in which cells with similar expression profiles are aggregated into a single representative unit, a metacell, whose expression is defined by the mean expression of its constituent cells, thereby enhancing signal and reducing noise. Yet existing metacell construction methods often yield substantially different metacell partitions and are highly sensitive to hyperparameter settings, particularly the average metacell size 7. This lack of consistency makes it difficult for users to determine which metacell partition is more trustworthy and to what extent the resulting metacell profiles preserve true biological signals. Consequently, the robustness of downstream analyses is compromised, and the potential of metacells as a general data preprocessing framework across diverse tasks and omics modalities remains limited.
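The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any metacell package: the function name `aggregate_metacells`, the toy count matrix, and the labels are all assumptions, and the labels stand in for whatever partition a metacell construction method would produce.

```python
import numpy as np

def aggregate_metacells(counts, labels):
    """Average cell-level profiles within each metacell.

    counts: (n_cells, n_genes) array of expression values.
    labels: length-n_cells array of metacell assignments (hypothetical input;
            any partitioning method could have produced it).
    Returns (metacell_ids, profiles), where profiles[i] is the mean profile
    of the cells assigned to metacell_ids[i].
    """
    counts = np.asarray(counts, dtype=float)
    labels = np.asarray(labels)
    metacell_ids = np.unique(labels)
    profiles = np.vstack([counts[labels == m].mean(axis=0) for m in metacell_ids])
    return metacell_ids, profiles

# Toy data (made up): 4 cells x 3 genes, grouped into 2 metacells.
counts = np.array([[0, 2, 0],
                   [1, 0, 0],
                   [0, 0, 4],
                   [0, 0, 2]])
labels = np.array([0, 0, 1, 1])

ids, profiles = aggregate_metacells(counts, labels)
cell_zero_fraction = float(np.mean(counts == 0))        # sparsity before aggregation
metacell_zero_fraction = float(np.mean(profiles == 0))  # sparsity after aggregation
```

In this toy case the fraction of zeros drops from 8 of 12 entries at the cell level to 3 of 6 at the metacell level, which is the sparsity reduction that motivates metacells. Whether the averaged profile is meaningful depends on the cells in each group actually being similar.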
Our Nature Communications paper 8 provides a rigorous statistical definition of a metacell based on a two-layer model of single-cell sequencing data: the upper layer captures the biological variation in true expression, while the lower layer models the sequencing process that generates measured expression from the true expression. Building on this definition, we develop mcRigor, a statistical framework for detecting dubious metacells within a given partition and selecting the optimal metacell partitioning method and hyperparameter across candidate method-hyperparameter configurations.
mcRigor not only detects and removes dubious metacells, thereby improving the reliability of downstream analyses such as gene co-expression and enhancer–gene regulation, but also enables data-driven selection of the most suitable metacell partitioning method for each dataset. Its extended version, mcRigor two-step, further disassembles dubious metacells into single cells and reassembles them into smaller, more reliable ones. Owing to its flexible compatibility, mcRigor can be readily applied to single-cell transcriptomic, chromatin accessibility, and multi-omic data (Fig. 2). In addition, mcRigor provides a unified evaluation criterion for benchmarking different metacell construction methods, offering reliable guidance for researchers in method selection.
In the first part of our paper 8, we introduce mcRigor's methodology for detecting dubious metacells. Specifically, mcRigor quantifies the internal heterogeneity of each metacell using a feature-correlation-based statistic, mcDiv, which measures the deviation of feature–feature correlations from independence. The rationale is that if all member cells share the same true expression levels and the observed variation among them arises purely from the measurement process, the features should be approximately independent. mcRigor then constructs a null distribution for mcDiv using a novel double permutation procedure and identifies metacells that significantly deviate from this null as dubious (Fig. 2a).
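To make the rationale concrete, here is a simplified Python sketch of testing one metacell for feature–feature dependence. It is not the paper's mcDiv statistic or its double permutation procedure: as a stand-in, it assumes the statistic is the root-mean-square off-diagonal correlation and builds the null with a single within-feature permutation. The function names and the simulated Poisson data are likewise assumptions made for illustration.

```python
import numpy as np

def offdiag_corr_rms(x):
    """Root-mean-square off-diagonal feature-feature correlation (stand-in statistic)."""
    x = x[:, x.std(axis=0) > 0]              # drop constant features
    n_cells, n_features = x.shape
    if n_cells < 3 or n_features < 2:
        return 0.0
    corr = np.corrcoef(x, rowvar=False)      # features are columns
    off = corr[~np.eye(n_features, dtype=bool)]
    return float(np.sqrt(np.mean(off ** 2)))

def permutation_pvalue(x, n_perm=200, seed=0):
    """Compare the observed statistic with a null in which features are independent."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    observed = offdiag_corr_rms(x)
    null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle each feature independently across cells: marginals are kept,
        # feature-feature dependence is destroyed.
        shuffled = np.column_stack(
            [rng.permutation(x[:, j]) for j in range(x.shape[1])]
        )
        null[b] = offdiag_corr_rms(shuffled)
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, float(p_value)

# Simulated metacells (made-up data): 40 cells x 20 features each.
rng = np.random.default_rng(1)
homogeneous = rng.poisson(5, size=(40, 20))              # one shared expression level
heterogeneous = np.vstack([rng.poisson(2, size=(20, 20)),
                           rng.poisson(10, size=(20, 20))])  # two mixed cell states

stat_hom, p_hom = permutation_pvalue(homogeneous)
stat_het, p_het = permutation_pvalue(heterogeneous)
```

Mixing two cell states makes all features rise and fall together across cells, so the heterogeneous metacell shows much larger correlations than its permuted copies and receives a small p-value, while the homogeneous one looks like its own null.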
In both semi-simulated and real PBMC datasets, mcRigor accurately distinguishes trustworthy metacells from dubious ones (Fig. 2b–c). We further demonstrate mcRigor's effectiveness in improving the reliability of multiple downstream analyses. In cell-line data analyses, removing dubious metacells markedly increases the signal-to-noise ratio of cell-cycle marker genes (Fig. 2d). In COVID-19 versus healthy control data analyses, mcRigor eliminates spurious gene correlations caused by dubious metacells and reveals stronger co-expression within adaptive immune response modules (Fig. 2e). In scMultiome data analyses, mcRigor enhances the detectability of enhancer–gene associations, filtering out weakly supported false positives while preserving signals consistent with those observed at the single-cell level (Fig. 2f).


In the second part of our paper 8, we present mcRigor's methodology for evaluating metacell partitions and optimizing hyperparameters. By balancing metacell trustworthiness against data sparsity, mcRigor assigns an overall evaluation score to each candidate partition and automatically selects the optimal method–parameter configuration among all candidates, thereby transforming the empirical process of method and parameter tuning into data-driven automated decision-making (Fig. 3a).
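The trade-off behind this selection step can be illustrated with a toy Python sketch. The scoring formula below is an assumption chosen for simplicity, not mcRigor's actual evaluation score, and the candidate names, sizes, and rates are made-up values rather than results.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    method: str          # hypothetical method label
    target_size: int     # average metacell size hyperparameter
    dubious_rate: float  # fraction of cells in metacells flagged as dubious
    zero_rate: float     # fraction of zeros in the metacell expression matrix

def score(c, weight=0.5):
    """Illustrative trade-off score (higher is better).

    Penalizes both untrustworthy metacells and remaining sparsity;
    `weight` sets their relative importance and is an assumption.
    """
    return 1.0 - (weight * c.dubious_rate + (1.0 - weight) * c.zero_rate)

def select_best(candidates, weight=0.5):
    """Return the candidate configuration with the highest score."""
    return max(candidates, key=lambda c: score(c, weight))

# Made-up candidates: larger metacells are less sparse but more often dubious.
candidates = [
    Candidate("method_A", 5, dubious_rate=0.02, zero_rate=0.70),
    Candidate("method_A", 20, dubious_rate=0.10, zero_rate=0.40),
    Candidate("method_A", 80, dubious_rate=0.60, zero_rate=0.15),
]
best = select_best(candidates)
```

With these toy numbers the intermediate size wins: very small metacells stay sparse, very large ones mix distinct cells, and the score rewards the configuration that limits both problems at once.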
We illustrate the utility of this optimization functionality across diverse downstream tasks. For instance, the zero proportion of mcRigor-optimized metacells closely matches the gold-standard zero proportion measured by smRNA-FISH, demonstrating its ability to distinguish technical zeros from biological zeros (Fig. 3b). In differential expression analysis, results based on mcRigor-optimized metacells align more closely with those obtained from bulk RNA-seq data, indicating improved reliability (Fig. 3c). In time-course data, mcRigor-optimized metacells enhance trajectory resolution and reveal clearer gene-expression dynamics consistent with experimental evidence (Fig. 3d).
The mcRigor R package and online tutorials are available at https://jsb-ucla.github.io/mcRigor/
The full paper is available at https://www.nature.com/articles/s41467-025-63626-5
References:
8. Liu, P. & Li, J. J. mcRigor: a statistical method to enhance the rigor of metacell partitioning in single-cell data analysis. bioRxiv (2024) doi:10.1101/2024.10.30.621093.
















