fit_logistic_model.Rd
fit_logistic_model() uses TensorFlow (via keras) to fit a logistic regression model that classifies cells by an attribute in the colData slot of sce. Most arguments have sensible defaults. Use the training history plot to check that the model is learning and that it performs equally well on the training and validation sets.
plot_confusion() plots a confusion matrix: the number of cells in each combination of observed and predicted class.
predict_logistic_prob() predicts class assignments for cells in a SingleCellExperiment object. The genes used to train the model are selected automatically.
fit_logistic_model(sce, y = NULL, assay_slot = "logcounts", y_col = "Cell_class",
  activation = "softmax",
  loss = c("categorical_crossentropy", "kullback_leibler_divergence")[1],
  regularizer = keras::regularizer_l1, penalty = 0.01,
  initializer = "random_uniform",
  optimizer = keras::optimizer_sgd(lr = 0.01, nesterov = TRUE),
  metrics = list("accuracy"), epochs = 100, validation_split = 0.3,
  validation_split_per_class = TRUE, callback_early_stopping_patience = 50,
  batch_size = 1000, shuffle = TRUE, verbose = TRUE, model = NULL)

plot_confusion(confusion, normalize = FALSE, text_color = "grey60")

predict_logistic_prob(sce, model_res, assay_slot = "logcounts", ref_y = NULL,
  ref_y_col = NULL, batch_size = 1000, verbose = TRUE)
Argument | Description |
---|---|
sce | SingleCellExperiment object. |
y | Optionally, you can provide your own cell labels rather than taking them from y_col in colData(sce). |
assay_slot | Assay slot in sce to use as input for the model (default "logcounts"). |
y_col | Column in colData(sce) that contains the cell class labels. |
activation | Activation function. Using "softmax" gives logistic regression. |
loss | Loss or cost function that evaluates the difference between predictions and true labels and is minimised during model training. Choose one of the options shown in the default. |
regularizer | Function to penalise high values of model parameters/weights and reduce overfitting (memorisation of the dataset). Details: regularizer_l1. L1 regularisation tends to push most weights to 0 (thus acting as a feature selection method) and enforces sparse weights. L2 regularisation also reduces overfitting by keeping most weights small but does not shrink them to 0. Set to keras::regularizer_l2 to use L2 regularisation. |
penalty | Regularisation penalty between 0 and 1. The higher the penalty, the more stringent the regularisation. Very high values can lead to poor model performance due to high bias (limited flexibility). Sensible values: 0.01 for regularizer_l1 and 0.5 for regularizer_l2. Change this parameter based on the history plot to make sure the model performs equally well on training and validation sets. |
initializer | Method of initialising weights and biases. You do not normally need to change this. See https://keras.io/initializers/ for details. |
optimizer | Which optimiser should be used to fit the model. You do not normally need to change this. See https://keras.io/optimizers/ for details. |
metrics | Metrics that evaluate the performance of the model, usually accuracy and the loss function. See https://keras.io/metrics/ for details. |
epochs | Number of training epochs. You do not normally need to change this. |
validation_split | Proportion of cells to use for validation. |
validation_split_per_class | Perform the validation split within each class to maintain the proportion of classes in the training and validation sets (TRUE)? |
callback_early_stopping_patience | Number of epochs to wait for improvement before stopping early. |
batch_size | Look at the data in batches of batch_size cells. |
shuffle | Logical: shuffle the training data before each epoch? Details: fit.keras.engine.training.Model. |
verbose | Logical: show and plot diagnostic output (TRUE)? |
model | Provide your own keras/TensorFlow model. The number of output units must equal the number of classes (columns of y) and input_shape must equal nrow(sce). This can be used to extend the logistic regression model by adding hidden layers; see the sketch after this table. |
confusion | The confusion table generated by table(), or the output of ParetoTI::fit_logistic_model(), or the output of ParetoTI::predict_logistic_prob(). |
normalize | Normalise so that cells of each observed class sum to 1? |
text_color | Color of on-plot text showing absolute numbers of cells. |
model_res | Output of ParetoTI::fit_logistic_model(), class "logistic_model_fit_TF". |
ref_y | Reference cell labels (as for y), used to evaluate predictions. |
ref_y_col | Reference class column in colData(sce) (as for y_col). |
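As a minimal sketch of the model argument, the code below builds a network with one hidden layer and passes it in. It assumes the keras R package is loaded, that sce is an existing SingleCellExperiment with labels in colData(sce)$Cell_class, and that fit_logistic_model() compiles and trains the supplied model; the hidden layer size of 64 is purely illustrative.

library(keras)
library(SingleCellExperiment)
library(ParetoTI)

# number of classes taken from the colData column used for training
n_classes <- length(unique(colData(sce)$Cell_class))

# extend logistic regression with one hidden layer; output units must equal
# the number of classes and input_shape must equal nrow(sce)
custom_model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = nrow(sce)) %>%
  layer_dense(units = n_classes, activation = "softmax")

model_res <- fit_logistic_model(sce, y_col = "Cell_class", model = custom_model)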
# download PBMC data as SingleCellExperiment object
# split in 2 parts
# fit logistic regression model to 1st part
# use this model to predict cell types in the second part
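A minimal sketch of these steps, assuming sce is an existing SingleCellExperiment with a "logcounts" assay and cell type labels in colData(sce)$Cell_class (the download and normalisation steps are not shown, and the 50/50 split is illustrative):

library(ParetoTI)

set.seed(1)
# split in 2 parts: half the cells for training, half for prediction
train_idx <- sample(ncol(sce), floor(ncol(sce) / 2))
sce_train <- sce[, train_idx]
sce_test  <- sce[, -train_idx]

# fit logistic regression model to the 1st part
model_res <- fit_logistic_model(sce_train, y_col = "Cell_class")

# use this model to predict cell types in the 2nd part,
# providing the reference labels for evaluation
pred <- predict_logistic_prob(sce_test, model_res, ref_y_col = "Cell_class")

# plot the confusion matrix of observed vs predicted classes
plot_confusion(pred, normalize = TRUE)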