geo_sketch.Rd
geo_sketch()
Creates a downsampled representation of the data that preserves its structure and rare populations, using the Geometric Sketch method (https://github.com/brianhie/geosketch, https://www.biorxiv.org/content/10.1101/536730v1). The method splits gene expression space into hypercubes (covering boxes) of equal volume and samples uniformly from those cubes. This reduces the density of the data and the number of cells while preserving global structure, which is useful for fitting polytopes to large datasets. The procedure has two steps: first, find PCs using the Facebook PCA implementation (fbpca); second, define the hypercubes and sample cells.
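The sampling step described above can be illustrated with a minimal pure-Python sketch. It is a simplification for intuition only, not the geosketch or geo_sketch() code: the function name `sample_from_boxes` and the fixed box side length `side` are hypothetical, and the real method chooses the side length automatically and works on PCs.

```python
import random
from collections import defaultdict

def sample_from_boxes(points, side, n, seed=4563):
    """Pick n point indices by visiting occupied grid boxes in turn.

    Hypothetical helper: each point is assigned to the equal-volume box
    that contains it, then one point is drawn per box, round after round,
    until n indices have been collected.
    """
    rng = random.Random(seed)
    boxes = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(x // side) for x in p)  # integer grid coordinates
        boxes[key].append(idx)
    members = list(boxes.values())
    for m in members:
        rng.shuffle(m)          # random order within each box
    rng.shuffle(members)        # random order of boxes
    chosen = []
    while len(chosen) < n and members:
        remaining = []
        for m in members:
            if len(chosen) == n:
                break
            chosen.append(m.pop())
            if m:
                remaining.append(m)
        members = remaining
    return sorted(chosen)

# A dense population (1000 points in one box) plus three rare, distant points.
data_rng = random.Random(0)
points = [(data_rng.random(), data_rng.random()) for _ in range(1000)]
points += [(5.5, 5.5), (7.5, 2.5), (2.5, 8.5)]  # indices 1000, 1001, 1002

picked = sample_from_boxes(points, side=1.0, n=4)
print(picked)
```

Because every occupied box contributes one point before any box contributes a second, the three rare points are kept even though they make up well under 1% of the data, while the dense population is thinned to a single representative.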
geo_sketch(data, N, assay_slot = "logcounts", use_PCs = TRUE, PCs = 100, k = "auto", seed = 4563, replace = FALSE, alpha = 0.1, max_iter = 200, verbose = 0, check_installed = TRUE)
data | matrix, sparse matrix or SingleCellExperiment (SummarizedExperiment) with dimensions in rows and examples (cells) in columns: dim(dimensions * examples) |
---|---|
N | number of cells to sample |
assay_slot | slot in data (SingleCellExperiment, SummarizedExperiment) containing matrix to use for PCA and geometric sampling |
use_PCs | logical, use PCs (TRUE) or data directly (FALSE)? |
PCs | number of PCs to use. Identified using Facebook implementation of PCA (fbpca). |
k | Number of covering boxes. When `"auto"` and replace is `TRUE`, draws sqrt(number of cells) covering boxes (sqrt(X.shape[0]) in the underlying Python implementation). When `"auto"` and replace is `FALSE`, draws N covering boxes. |
seed | Random number generation seed passed to numpy |
alpha | Binary search halts when it obtains between `k * (1 - alpha)` and `k * (1 + alpha)` covering boxes. |
max_iter | Maximum number of iterations after which the binary search terminates, for the rare case where the number of covering boxes is non-monotonic in the box side length. |
verbose | report progress |
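The interaction of `k`, `alpha` and `max_iter` can be sketched as a binary search over the box side length. This is an illustrative pure-Python version under stated assumptions, not the geosketch implementation: the helper names `count_boxes` and `find_side` are hypothetical, and the search bounds (zero to the overall data range) are an assumption made for the example.

```python
def count_boxes(points, side):
    """Number of occupied grid boxes for a given box side length."""
    return len({tuple(int(x // side) for x in p) for p in points})

def find_side(points, k, alpha=0.1, max_iter=200):
    """Binary-search a side length giving roughly k occupied boxes.

    Stops when the box count lies between k * (1 - alpha) and
    k * (1 + alpha), or after max_iter iterations.
    """
    lo = 0.0
    hi = max(max(p) for p in points) - min(min(p) for p in points)
    if hi <= 0:
        raise ValueError("points must span a positive range")
    side = hi
    for _ in range(max_iter):
        side = (lo + hi) / 2
        n = count_boxes(points, side)
        if k * (1 - alpha) <= n <= k * (1 + alpha):
            break
        if n > k:
            lo = side  # too many boxes: boxes are too small, so grow them
        else:
            hi = side  # too few boxes: boxes are too large, so shrink them
    return side, count_boxes(points, side)

# 100 evenly spaced one-dimensional points, aiming for about 9 boxes.
points = [(float(i),) for i in range(100)]
side, n_boxes = find_side(points, k=9)
print(side, n_boxes)
```

The `max_iter` cap matters because the box count is not guaranteed to change monotonically with the side length, so the search could otherwise oscillate without ever landing inside the tolerance band.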
geo_sketch() returns an integer vector of cell indices.