poisson_regression.Rd
Poisson regression models gene expression (Y) as a function of gene mean and sample covariates (X):
mu = beta * X
Y ~ Poisson(mu)
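The model above can be illustrated for a single gene in base R, without greta. This is a minimal sketch rather than the package's implementation. It assumes a log link (log(mu) = X beta), which the example's remark that the first row of beta holds log(mean) suggests but which this page does not state outright. The covariate `coverage` and all variable names are hypothetical.

```r
# Minimal sketch in base R (not the package implementation).
# Assumption: log link, i.e. log(mu) = X %*% beta.
set.seed(1)
n <- 500                                   # number of data points (cells)
coverage <- rnorm(n)                       # hypothetical sample covariate
X <- cbind(intercept = 1, coverage = coverage)
beta_true <- c(2, 0.3)                     # log(mean) and covariate effect
mu <- exp(drop(X %*% beta_true))           # expected expression per data point
y <- rpois(n, mu)                          # Y ~ Poisson(mu)

# Maximum-likelihood fit of the same model for this one gene
fit <- glm(y ~ coverage, family = poisson(link = "log"))
beta_hat <- coef(fit)
```

Here `beta_hat` plays the role of one column of the beta matrix: the intercept corresponds to log(mean) and the second entry to the covariate coefficient.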
compute_gene_deviance(data, family = "poisson", covar = NULL, precision = c("double", "single"), verbose = FALSE, optimiser = greta::adam(), max_iterations = 5000, tolerance = 1e-06)
poisson_regression(data, covar = NULL, beta_mean = 0, beta_sd = 3, precision = c("double", "single"))
data | data matrix (data points * dimensions); can be a dense or sparse matrix, a SummarizedExperiment/SingleCellExperiment, or a Seurat object (the counts slot is used). |
---|---|
family | character naming the data distribution |
covar | matrix (data points * covariates) or vector of column names (for compute_gene_deviance() with SingleCellExperiment or Seurat input) containing covariates that affect expression in addition to gene mean (e.g. coverage, batch). Adding covariates finds genes whose deviance (residuals) is explained neither by these covariates nor by Poisson noise (covar = NULL tests Poisson noise alone). |
precision | numeric precision, passed as the precision argument to greta::model(). Use "single" for large datasets to reduce memory footprint. |
verbose | logical, plot greta model structure and print messages? |
optimiser | method used to find regression coefficients and deviance when covariates are added; see greta::opt. |
max_iterations | number of iterations to run optimiser for. |
tolerance | the numerical tolerance for the solution, the optimiser stops when the (absolute) difference in the joint density between successive iterations drops below this level. |
beta_mean | prior mean for coefficients |
beta_sd | prior sd for coefficients; use small values to regularise (i.e. penalise coefficients that deviate too much from 0). |
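
For intuition about what deviance measures when covar = NULL, the standard Poisson deviance of each gene around its own mean can be computed directly. The sketch below assumes that textbook definition and uses a hypothetical helper name, `poisson_deviance`; compute_gene_deviance() may differ in detail.

```r
# Hypothetical helper: Poisson deviance of each column (gene) around its mean.
# D = 2 * sum(y * log(y / mu) - (y - mu)), taking y * log(y / mu) = 0 when y = 0.
poisson_deviance <- function(counts) {
  apply(counts, 2, function(y) {
    mu <- mean(y)
    term <- ifelse(y > 0, y * log(y / mu), 0)
    2 * sum(term - (y - mu))
  })
}

set.seed(1)
n <- 1000
counts <- cbind(
  noise_only = rpois(n, 5),                          # constant mean, Poisson noise only
  structured = rpois(n, rep(c(1, 9), each = n / 2))  # mean differs between two groups
)
dev <- poisson_deviance(counts)
```

The `structured` column should show a much larger deviance than `noise_only`, because its variability exceeds what Poisson noise around a single mean can explain.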
compute_gene_deviance()
: list containing the deviance vector with dimension names (genes) as names, the beta coefficient matrix (dimensions * coefficients) and the greta model used to compute them. For SingleCellExperiment, the same object is returned with beta coefficients and deviance added to rowData. For Seurat, the same object is returned with beta coefficients and deviance added to Seurat::GetAssay(obj, "RNA")@meta.features.
poisson_regression()
: R environment containing the model and parameters as greta arrays
# Use fake data as example
# Random data that fits into the triangle
set.seed(4355)
arc_data = generate_arc(arc_coord = list(c(7, 3, 10), c(12, 17, 11), c(30, 20, 9)),
                        mean = 0, sd = 1)
data = generate_data(arc_data$XC, N_examples = 1e4, jiiter = 0.04, size = 0.9)
# Take Poisson sample with the mean defined by each entry of the data matrix
# (this creates Poisson-distributed non-negative integer data)
data = matrix(rpois(length(data), (data)), nrow(data), ncol(data))
# Compute deviance from the mean (residuals for Poisson data)
dev = compute_gene_deviance(t(data))
# As you can see, the third dimension has the lowest deviance
dev
#> $deviance
#> [1] 19179.31 16610.19 10733.26
#>
#> $beta
#>      beta_mean
#> [1,]  2.639314
#> [2,]  2.546119
#> [3,]  2.158645
#>
#> $model
#> NULL
#>
# because the vertices of the triangle have almost identical positions in the third dimension.
plot_arc(arc_data = arc_data, data = data, which_dimensions = 1:3, data_alpha = 0.5)
# You can use deviance to find which dimensions have variability to be explained with Archetypal Analysis

# Create a probabilistic Poisson regression model with greta
# to study effects of covariates on Poisson data (requires greta installed)
# NOT RUN {
model = poisson_regression(t(data), covar = matrix(rnorm(ncol(data)), ncol(data), 1))
# plot the structure of tensorflow computation graph
plot(model$model)
# find parameters using adam optimiser
res = greta::opt(model$model, optimiser = greta::adam(), max_iterations = 500)
# did the model converge before 500 iterations?
res$iterations
# Value of Poisson negative log likelihood (see greta documentation for details)
res$value
# View beta parameters for each dimension (columns), log(mean) in the first row,
# covariate coefficients in the subsequent rows
res$par$beta
# }
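The effect of beta_sd can be seen in a greta-free sketch. With a Normal(beta_mean, beta_sd) prior on the coefficients, finding the posterior mode amounts to minimising the Poisson negative log likelihood plus a quadratic penalty on beta. The code below is an illustration under stated assumptions, not the package's code: it assumes a log link, substitutes base R's optim() for greta::opt(), and the function name `fit_map` is hypothetical.

```r
# Hypothetical helper: posterior-mode (MAP) estimate for one gene.
# Assumptions: log link, independent Normal(beta_mean, beta_sd) priors.
fit_map <- function(y, X, beta_mean = 0, beta_sd = 3) {
  neg_log_post <- function(beta) {
    eta <- drop(X %*% beta)                                 # linear predictor
    nll <- -sum(dpois(y, exp(eta), log = TRUE))             # Poisson negative log likelihood
    penalty <- -sum(dnorm(beta, beta_mean, beta_sd, log = TRUE))  # prior term
    nll + penalty
  }
  optim(rep(0, ncol(X)), neg_log_post, method = "BFGS")$par
}

set.seed(1)
n <- 200
X <- cbind(1, rnorm(n))                     # intercept plus one covariate
y <- rpois(n, exp(drop(X %*% c(1, 0.5))))

beta_weak <- fit_map(y, X, beta_sd = 3)       # wide prior: close to maximum likelihood
beta_strong <- fit_map(y, X, beta_sd = 0.05)  # narrow prior: coefficients pulled toward 0
```

Comparing `beta_weak` with `beta_strong` shows why small beta_sd values regularise: the narrow prior shrinks the covariate coefficient toward beta_mean.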