geo_sketch() creates a downsampled representation of the data while preserving its structure and rare populations, using the Geometric Sketch method (https://github.com/brianhie/geosketch, https://www.biorxiv.org/content/10.1101/536730v1). The method covers gene expression space with hypercubes (covering boxes) of equal volume and samples uniformly from those boxes. This reduces the density of the data and the number of cells while preserving global structure, which is useful for fitting polytopes to large datasets.

The procedure has two steps: first, find principal components (PCs) using the Facebook PCA implementation (fbpca); second, define the hypercubes and sample cells from them.
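
The covering-box idea can be illustrated with a small, self-contained Python sketch. This is a simplified illustration, not the geosketch or geo_sketch() source: the helper names `assign_boxes` and `sketch_indices` are hypothetical, the box side length is fixed by hand rather than searched for, and the toy data are 2-D points rather than PCs. Boxes are visited in random order and one point is drawn from each box per pass, so sparsely populated regions are not swamped by dense ones.

```python
import math
import random
from collections import defaultdict


def assign_boxes(points, side):
    """Group point indices by the axis-aligned box of width `side` containing them."""
    boxes = defaultdict(list)
    for idx, point in enumerate(points):
        key = tuple(math.floor(coord / side) for coord in point)
        boxes[key].append(idx)
    return boxes


def sketch_indices(points, n, side, seed=4563):
    """Sample n distinct indices by drawing one point per non-empty box per pass."""
    if n > len(points):
        raise ValueError("cannot sample more points than are available")
    rng = random.Random(seed)
    boxes = assign_boxes(points, side)
    chosen = []
    while len(chosen) < n:
        # Visit the remaining non-empty boxes in random order.
        keys = [key for key in sorted(boxes) if boxes[key]]
        rng.shuffle(keys)
        for key in keys:
            members = boxes[key]
            # Draw one point uniformly from this box, without replacement.
            chosen.append(members.pop(rng.randrange(len(members))))
            if len(chosen) == n:
                break
    return sorted(chosen)


# Toy data: a dense cluster of 1000 points plus a rare group of 5 points far away.
data_rng = random.Random(0)
dense = [(data_rng.uniform(0.0, 1.0), data_rng.uniform(0.0, 1.0)) for _ in range(1000)]
rare = [(data_rng.uniform(5.0, 5.4), data_rng.uniform(5.0, 5.4)) for _ in range(5)]
points = dense + rare

sample = sketch_indices(points, n=50, side=0.5)
rare_kept = [i for i in sample if i >= 1000]  # indices 1000..1004 are the rare group
print(len(sample), len(rare_kept))
```

In this toy setting the rare group makes up 0.5% of the data, yet all 5 of its points end up in the sample of 50, because their box is visited on every pass until it is empty. Uniform random sampling of 50 points would be expected to keep none or one of them.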

geo_sketch(data, N, assay_slot = "logcounts", use_PCs = TRUE,
  PCs = 100, k = "auto", seed = 4563, replace = FALSE,
  alpha = 0.1, max_iter = 200, verbose = 0, check_installed = TRUE)

Arguments

data

matrix, sparse matrix or SingleCellExperiment (SummarizedExperiment) with dim(dimensions * examples), i.e. dimensions in rows and examples (cells) in columns

N

number of cells to sample

assay_slot

slot in data (SingleCellExperiment, SummarizedExperiment) containing matrix to use for PCA and geometric sampling

use_PCs

logical, use PCs (TRUE) or data directly (FALSE)?

PCs

number of PCs to use. PCs are identified using the Facebook implementation of PCA (fbpca).

k

Number of covering boxes. When `"auto"` and replace is `TRUE`, draws sqrt(number of cells) covering boxes (sqrt(X.shape[0]) in the underlying Python code). When `"auto"` and replace is `FALSE`, draws N covering boxes.

seed

Random number generation seed passed to numpy

alpha

Binary search halts when it obtains between `k * (1 - alpha)` and `k * (1 + alpha)` covering boxes.

max_iter

Maximum number of iterations after which to terminate the binary search, in the rare case that the number of covering boxes is non-monotonic in the box side length.

verbose

report progress
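
How k, alpha and max_iter interact can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the geosketch implementation: the helper names `auto_k`, `count_boxes` and `find_side` are hypothetical, rounding sqrt down to an integer is an assumption, and the halting bounds are treated as inclusive. The search halves the interval of candidate box side lengths until the number of non-empty boxes falls within the tolerance around k, or until max_iter is reached.

```python
import math


def auto_k(n_cells, n_sample, replace):
    """Number of covering boxes for k = 'auto' (rounding down is an assumption)."""
    if replace:
        return int(math.sqrt(n_cells))
    return n_sample


def count_boxes(points, side):
    """Count the non-empty axis-aligned boxes of width `side` that cover the points."""
    mins = [min(dim) for dim in zip(*points)]
    return len({
        tuple(math.floor((c - m) / side) for c, m in zip(p, mins))
        for p in points
    })


def find_side(points, k, alpha=0.1, max_iter=200):
    """Binary search on box side length until the box count is within alpha of k."""
    low = 0.0
    high = max(max(dim) - min(dim) for dim in zip(*points))
    if high == 0:
        raise ValueError("all points are identical; a single box covers them")
    side, n_boxes = high, count_boxes(points, high)
    for _ in range(max_iter):
        side = (low + high) / 2
        n_boxes = count_boxes(points, side)
        if k * (1 - alpha) <= n_boxes <= k * (1 + alpha):
            break  # close enough to the requested number of boxes
        if n_boxes > k:
            low = side   # too many boxes: boxes are too small
        else:
            high = side  # too few boxes: boxes are too large
    return side, n_boxes


# Toy data: a regular 20 x 20 grid of points (400 "cells" in 2 dimensions).
points = [(float(i), float(j)) for i in range(20) for j in range(20)]

k = 25
side, n_boxes = find_side(points, k, alpha=0.1)
print(side, n_boxes)
```

On this grid the search stops at the second iteration with a side length of 4.75 and 25 boxes. The max_iter cap matters because the box count is not guaranteed to change monotonically with side length, so the search could otherwise fail to land inside the tolerance band.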

Value

geo_sketch() returns an integer vector of indices of the sampled cells