find_decreasing.Rd
find_decreasing()
Fits GAM models to find features that are a decreasing function of distance from the archetype. Both the GAM functions and their first derivatives can be visualised using the plot() method.
fit_arc_gam_1()
Finds a single GAM model fit and its first derivative for a single feature and one or several archetypes.
get_top_decreasing()
Finds the genes with the highest values at each archetype that pass the p-value threshold, and prints the top 12 genes and top 3 gene sets for each archetype.
find_decreasing_wilcox()
Finds features that are a decreasing function of distance from the archetype by finding features with the highest value (median) in the bin closest to the archetype (1-vs-all Wilcoxon test).
bin_cells_by_arch()
Finds which cells are in the bin closest to each archetype.
find_tradeoff_wilcox()
Finds features that differ most between 2 archetypes (at a trade-off; differentially expressed (DE) genes) by finding features with the highest value (median) in the bin closest to the archetype (1-vs-all Wilcoxon test).
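The "bin closest to the archetype" used by bin_cells_by_arch() and the Wilcoxon-based functions can be illustrated with a minimal base-R sketch. It assumes that the bin contains the cells whose distance is at or below the bin_prop quantile of distances to that archetype, or at or below dist_cutoff when that is supplied. The helper name bin_closest_sketch() is hypothetical; this is not the package's implementation.

```r
# Hypothetical sketch of selecting the cells closest to each archetype.
# Assumption: "closest bin" = distance <= bin_prop quantile (or <= dist_cutoff).
bin_closest_sketch <- function(dist_df, arc_col, bin_prop = 0.1,
                               dist_cutoff = NULL, return_names = FALSE) {
  bins <- lapply(arc_col, function(col) {
    d <- dist_df[[col]]
    # an explicit distance cutoff overrides the quantile-based one
    cutoff <- if (is.null(dist_cutoff)) {
      quantile(d, probs = bin_prop, names = FALSE)
    } else {
      dist_cutoff
    }
    idx <- which(d <= cutoff)
    if (return_names) rownames(dist_df)[idx] else idx
  })
  names(bins) <- arc_col
  bins
}

# toy distances: cell_1 is closest to archetype_1, cell_100 to archetype_2
dist_df <- data.frame(archetype_1 = (1:100) / 100,
                      archetype_2 = (100:1) / 100,
                      row.names = paste0("cell_", 1:100))
bins <- bin_closest_sketch(dist_df, c("archetype_1", "archetype_2"))
bins$archetype_1  # indices 1 to 10
bins$archetype_2  # indices 91 to 100
```

With bin_prop = 0.1 and 100 cells, each bin holds the 10 cells nearest to its archetype; setting return_names = TRUE returns the row names instead of indices.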
find_decreasing(data_attr, arc_col,
  features = c("Gpx1", "Alb", "Cyp2e1", "Apoa2")[3],
  min.sp = c(60), N_smooths = 4, n_points = 200, d = 1/n_points,
  weights = c(rep(1, each = n_points), rep(c(1, 0), each = n_points/2))[1],
  return_only_summary = FALSE, stop_at_10 = TRUE, one_arc_per_model = TRUE,
  type = c("s", "m", "cmq")[1], clust_options = list(), ...)

fit_arc_gam_1(feature, col, N_smooths, data_attr, min.sp, ..., d, n_points, weights)

get_top_decreasing(summary_genes, summary_sets = NULL, cutoff_genes = 0.01,
  cutoff_sets = 0.01, cutoff_metric = "wilcoxon_p_val",
  p.adjust.method = c("fdr", "none")[1], gam_fit_pval = 0.01,
  invert_cutoff = FALSE, order_by = "mean_diff", order_decreasing = TRUE,
  min_max_diff_cutoff_g = 0.3, min_max_diff_cutoff_f = 0.1)

find_decreasing_wilcox(data_attr, arc_col,
  features = c("Gpx1", "Alb", "Cyp2e1", "Apoa2")[3], bin_prop = 0.1,
  dist_cutoff = NULL, na.rm = FALSE, type = c("s", "m", "cmq")[1],
  clust_options = list(), method = c("BioQC", "r_stats")[1])

bin_cells_by_arch(data_attr, arc_col, bin_prop = 0.1, dist_cutoff = NULL,
  return_names = FALSE)

find_tradeoff_wilcox(data_attr, arc_col = c("archetype_1", "archetype_2"),
  features = c("Gpx1", "Alb", "Cyp2e1", "Apoa2")[3], bin_prop = 0.1,
  na.rm = FALSE)
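The idea behind fit_arc_gam_1() (fit a smooth of the feature against distance, then take its first derivative on a grid of n_points with interval d) can be sketched with mgcv::gam. This assumes a cubic regression spline basis (bs = "cr"), an evenly spaced grid between the minimum and maximum distance, and a forward finite difference; fit_gam_deriv_sketch() is a hypothetical name, and the package may build its basis or compute the derivative differently.

```r
library(mgcv)

# Hypothetical sketch: fit feature ~ s(distance) and differentiate the fit.
# Assumptions: cubic regression spline basis, evenly spaced grid,
# forward finite difference with interval d.
fit_gam_deriv_sketch <- function(x, y, N_smooths = 4, min.sp = 60,
                                 n_points = 200, d = 1 / n_points) {
  dat <- data.frame(x = x, y = y)
  # min.sp puts a lower bound on the smoothing parameter (stiffer curve)
  fit <- gam(y ~ s(x, k = N_smooths, bs = "cr"), data = dat, min.sp = min.sp)
  grid <- data.frame(x = seq(min(x), max(x), length.out = n_points))
  shifted <- data.frame(x = grid$x + d)
  # slope of the fitted curve at each grid point
  deriv <- (predict(fit, shifted) - predict(fit, grid)) / d
  list(fit = fit, x = grid$x, deriv = as.numeric(deriv))
}

# toy feature that decreases with distance from the archetype
set.seed(42)
distance <- seq(0, 1, length.out = 300)
feature_value <- exp(-3 * distance) + rnorm(300, sd = 0.05)
res <- fit_gam_deriv_sketch(distance, feature_value)
mean(res$deriv)  # negative for a decreasing feature
```

A negative average derivative is what marks a feature as decreasing with distance; the large min.sp keeps the curve from bending sharply near the ends of the distance range.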
Argument | Description |
---|---|
data_attr | data.table dim(examples, dimensions) that includes the distance of each example to each archetype in columns given by arc_col |
arc_col | character vector, columns that give distance to archetypes (column per archetype) |
features | character vector (1L), column that contains feature values |
min.sp | lower bound for the smoothing parameter (see gam for details). The default value of 60 works well to stabilise the curve shape near the minimum and maximum distance |
N_smooths | number of basis functions used to represent the smooth term (s); 4 for cubic splines |
n_points | number of points at which to evaluate the derivative |
d | numeric vector (1L), finite difference interval |
weights | how to weight points along the x axis when calculating the mean (integral) probability. Useful if you care that the function is decreasing near the archetype but not far away. The two suggested defaults weight all points equally or discard the 50 percent of points furthest from the archetype. |
return_only_summary | if TRUE, return only the summary data.table containing p-values for each feature at each archetype and effect-size measures (average derivative). |
stop_at_10 | prevents |
one_arc_per_model | If TRUE, fit a separate GAM model for each archetype. If FALSE, combine all archetypes in one model: feature ~ s(arc1) + s(arc2) + ... + s(arcN). |
type | one of "s", "m" or "cmq". "s" means single-core processing using lapply. "m" means multi-core parallel processing using parLapply. "cmq" means multi-node parallel processing on a computing cluster using the clustermq package. |
clust_options | list of options for parallel processing. The default for "m" is list(cores = parallel::detectCores()-1, cluster_type = "PSOCK"). The default for "cmq" is list(memory = 2000, template = list(), n_jobs = 10, fail_on_error = FALSE). Change these options as required. |
... | arguments passed to gam |
summary_genes | gam_deriv summary data.table for decreasing genes |
summary_sets | gam_deriv summary data.table for decreasing gene sets |
cutoff_genes | value of cutoff_metric (lower bound) for genes |
cutoff_sets | value of cutoff_metric (lower bound) for gene sets |
cutoff_metric | probability metric for selecting decreasing genes: mean_prob, prod_prob, mean_prob_excl or prod_prob_excl. The default is wilcoxon_p_val. |
p.adjust.method | method for correcting p-values for multiple hypothesis testing. See p.adjust.methods and p.adjust for details. |
gam_fit_pval | upper bound on the p-value of the smooth term in the GAM fit |
invert_cutoff | invert the cutoff for genes and sets. If FALSE, select p < cutoff_genes; if TRUE, select p > cutoff_genes. |
order_by | order the decreasing feature list by this measure (column) in the summary data.tables. The default is mean_diff, the average difference between cells in the bin closest to the archetype and all other cells. When using GAM instead of the Wilcoxon test, set this to one of c("deriv100", "deriv50", "deriv20"), the average value of the derivative at the 100/50/20 percent of points closest to the archetype. |
order_decreasing | order significant categories using |
min_max_diff_cutoff_g | minimum mean difference (log-ratio when y is in log space) in gene expression between the point closest to the archetype and the point furthest from it. When the Wilcoxon method was used, this is the difference between the mean of the bin closest to the archetype and that of all other cells. By default, at least 0.3 for genes and 0.1 for functions. |
min_max_diff_cutoff_f | see min_max_diff_cutoff_g |
bin_prop | proportion of data to put in the bin closest to the archetype |
dist_cutoff | upper bound on cell distance to the archetype for putting cells into the bin closest to the archetype |
method | test implementation used by find_decreasing_wilcox(): "BioQC" uses BioQC::wmwTest and "r_stats" uses wilcox.test. BioQC::wmwTest can be up to 1000 times faster, so it is the default. |
return_names | if TRUE, return cell names; if FALSE, return cell indices |
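Two details of the arguments above can be checked directly in R. As written in the usage, the weights default ends in [1], so it evaluates to the single value 1 (equal weights); the two weighting schemes described for weights are the full vectors built below. The sketch also averages a derivative over the 100/50/20 percent of grid points closest to the archetype, mirroring the deriv100/deriv50/deriv20 measures named under order_by. It assumes grid points are ordered from closest to furthest from the archetype; p_point and deriv_summary_sketch() are hypothetical, not package objects.

```r
n_points <- 200
# the usage default ends in [1], so it is the single value 1 (equal weights)
default_w <- c(rep(1, each = n_points), rep(c(1, 0), each = n_points / 2))[1]
equal_w <- rep(1, n_points)                       # weight all points equally
near_half_w <- rep(c(1, 0), each = n_points / 2)  # drop the half furthest away

# hypothetical per-point probability that the curve is decreasing,
# ordered from the point closest to the archetype to the furthest
p_point <- c(rep(0.9, n_points / 2), rep(0.2, n_points / 2))
weighted.mean(p_point, equal_w)      # 0.55
weighted.mean(p_point, near_half_w)  # 0.9

# hypothetical helper: mean derivative over the closest fraction of points
deriv_summary_sketch <- function(deriv,
                                 props = c(deriv100 = 1, deriv50 = 0.5, deriv20 = 0.2)) {
  sapply(props, function(p) mean(deriv[seq_len(ceiling(p * length(deriv)))]))
}
s <- deriv_summary_sketch(seq(-2, 0, length.out = n_points))
s
```

Because the toy derivative is steepest near the archetype, deriv20 is the most negative of the three summaries and deriv100 (the plain mean) is the least negative.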
find_decreasing()
list (S3 object of class gam_deriv) containing summary p-values for each feature and archetype, the function call and (optionally) a data.table with values of the first derivative
fit_arc_gam_1()
list containing the function call, first-derivative values of the GAM model (derivs) and a summary of the GAM model (p-value and r^2; gam_sm)
get_top_decreasing()
print a summary to the output and return a list with a character vector (one element for each archetype) and 2 data.tables with the selected enriched genes and functions.
find_decreasing_wilcox()
data.table containing p-values for each feature at each archetype and effect-size measures (average difference between bins). When log(counts) was used, mean_diff reflects the log-fold change.
bin_cells_by_arch()
list of indices of cells or names of cells that are in bin closest to each archetype
find_tradeoff_wilcox()
data.table containing p-values for each feature at each archetype and effect-size measures (average difference between bins). When log(counts) was used, mean_diff reflects the log-fold change.
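The shape of the Wilcoxon results (a p-value and a mean_diff per feature) and the selection applied by get_top_decreasing() can be sketched in base R. Assumptions, not taken from the package: a one-sided test (values higher in the bin), FDR correction via p.adjust, mean_diff computed as the bin mean minus the mean of all other cells, and a plain data.frame instead of a data.table. wilcox_bin_sketch() and filter_top_sketch() are hypothetical names.

```r
# Hypothetical sketch of a 1-vs-all Wilcoxon comparison (bin vs all other
# cells) followed by filtering similar in spirit to get_top_decreasing().
wilcox_bin_sketch <- function(expr_mat, in_bin) {
  # expr_mat: numeric matrix, rows = cells, columns = features
  # in_bin: logical vector, TRUE for cells in the bin closest to the archetype
  data.frame(
    feature = colnames(expr_mat),
    wilcoxon_p_val = apply(expr_mat, 2, function(v)
      wilcox.test(v[in_bin], v[!in_bin], alternative = "greater")$p.value),
    mean_diff = apply(expr_mat, 2, function(v) mean(v[in_bin]) - mean(v[!in_bin])),
    row.names = NULL, stringsAsFactors = FALSE
  )
}

filter_top_sketch <- function(res, cutoff = 0.01, p.adjust.method = "fdr",
                              min_diff = 0.3) {
  # correct for testing many features, then apply both cutoffs
  res$p_adj <- p.adjust(res$wilcoxon_p_val, method = p.adjust.method)
  top <- res[res$p_adj < cutoff & res$mean_diff >= min_diff, ]
  # largest difference first, as with order_by = "mean_diff"
  top[order(top$mean_diff, decreasing = TRUE), ]
}

set.seed(1)
expr_mat <- matrix(rnorm(100 * 3), nrow = 100,
                   dimnames = list(NULL, c("gene_up", "gene_flat", "gene_down")))
in_bin <- seq_len(100) <= 10
expr_mat[in_bin, "gene_up"] <- expr_mat[in_bin, "gene_up"] + 3
expr_mat[in_bin, "gene_down"] <- expr_mat[in_bin, "gene_down"] - 3
top <- filter_top_sketch(wilcox_bin_sketch(expr_mat, in_bin))
top$feature  # includes "gene_up" but not "gene_down"
```

If the feature values are log(counts), mean_diff in this sketch is a log-fold change between the bin and the remaining cells, matching the interpretation given above.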