fit_logistic_model.Rd
fit_logistic_model() uses TensorFlow (via keras) to fit a logistic regression model that classifies cells by an attribute in the colData slot of sce. Most arguments have sensible defaults. Use the training history plot to check that the model is learning and that it performs equally well on the training and validation sets.
plot_confusion() plots a confusion matrix: the number of cells in each combination of observed and predicted class.
predict_logistic_prob() predicts class assignments for cells in a SingleCellExperiment object. The genes used to train the model are selected automatically.
fit_logistic_model(sce, y = NULL, assay_slot = "logcounts", y_col = "Cell_class",
  activation = "softmax",
  loss = c("categorical_crossentropy", "kullback_leibler_divergence")[1],
  regularizer = keras::regularizer_l1, penalty = 0.01,
  initializer = "random_uniform",
  optimizer = keras::optimizer_sgd(lr = 0.01, nesterov = TRUE),
  metrics = list("accuracy"), epochs = 100, validation_split = 0.3,
  validation_split_per_class = TRUE, callback_early_stopping_patience = 50,
  batch_size = 1000, shuffle = TRUE, verbose = TRUE, model = NULL)

plot_confusion(confusion, normalize = FALSE, text_color = "grey60")

predict_logistic_prob(sce, model_res, assay_slot = "logcounts", ref_y = NULL,
  ref_y_col = NULL, batch_size = 1000, verbose = TRUE)
Argument | Description |
---|---|
sce | SingleCellExperiment object. |
y | Optionally, you can provide your own cell labels rather than taking them from y_col in colData(sce). |
assay_slot | Assay slot in sce to use as input for the model (default "logcounts"). |
y_col | Column in colData(sce) that contains the cell class labels. |
activation | Activation function. Using "softmax" gives logistic regression. |
loss | Loss or cost function that evaluates the difference between predictions and true labels and is minimised during model training. Choose one of the options shown in the default. |
regularizer | Function to penalise high values of model parameters/weights and reduce overfitting (memorisation of the dataset). Details: regularizer_l1. L1 regularisation tends to push most weights to 0 (thus acting as a feature selection method) and enforces sparse weights. L2 regularisation also reduces overfitting by keeping most weights small but does not shrink them to 0. Set to keras::regularizer_l2 to use L2 regularisation. |
penalty | Regularisation penalty between 0 and 1. The higher the penalty, the more stringent the regularisation. Very high values can lead to poor model performance due to high bias (limited flexibility). Sensible values: 0.01 for regularizer_l1 and 0.5 for regularizer_l2. Change this parameter based on the history plot to make sure the model performs equally well on training and validation sets. |
initializer | Method of initialising weights and biases. You do not normally need to change this. See https://keras.io/initializers/ for details. |
optimizer | Which optimiser should be used to fit the model. You do not normally need to change this. See https://keras.io/optimizers/ for details. |
metrics | Metrics that evaluate the performance of the model, usually accuracy and the loss function. See https://keras.io/metrics/ for details. |
epochs | Number of training epochs. You do not normally need to change this. |
validation_split | Proportion of cells to use for validation. |
validation_split_per_class | Perform the validation split within each class to maintain the proportion of classes in the training and validation sets (TRUE)? |
callback_early_stopping_patience | Number of epochs to wait for improvement before stopping early. |
batch_size | Look at the data in batches of batch_size cells. |
shuffle | Logical: shuffle the training data before each epoch? Details: fit.keras.engine.training.Model. |
verbose | Logical: show and plot diagnostic output (TRUE)? |
model | Provide your own keras/TensorFlow model. The number of output units must equal the number of classes (columns of y) and input_shape must equal nrow(sce). This can be used to extend the logistic regression model by adding hidden layers; see the sketch after this table. |
confusion | The confusion table generated by table(), or the output of ParetoTI::fit_logistic_model(), or the output of ParetoTI::predict_logistic_prob(). |
normalize | Normalise so that cells of each observed class sum to 1? |
text_color | Color of on-plot text showing absolute numbers of cells. |
model_res | Output of ParetoTI::fit_logistic_model(), class "logistic_model_fit_TF". |
ref_y | Reference cell labels (as for y), used to evaluate predictions. |
ref_y_col | Reference class column in colData(sce) (as for y_col). |
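As a minimal sketch of the model argument, the code below builds a network with one hidden layer and passes it in. It assumes the keras R package is loaded, that sce is an existing SingleCellExperiment with labels in colData(sce)$Cell_class, and that fit_logistic_model() compiles and trains the supplied model; the hidden layer size of 64 is purely illustrative.

library(keras)
library(SingleCellExperiment)
library(ParetoTI)

# number of classes taken from the colData column used for training
n_classes <- length(unique(colData(sce)$Cell_class))

# extend logistic regression with one hidden layer; output units must equal
# the number of classes and input_shape must equal nrow(sce)
custom_model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = nrow(sce)) %>%
  layer_dense(units = n_classes, activation = "softmax")

model_res <- fit_logistic_model(sce, y_col = "Cell_class", model = custom_model)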
# download PBMC data as SingleCellExperiment object
# split in 2 parts
# fit logistic regression model to 1st part
# use this model to predict cell types in the second part
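A minimal sketch of these steps, assuming sce is an existing SingleCellExperiment with a "logcounts" assay and cell type labels in colData(sce)$Cell_class (the download and normalisation steps are not shown, and the 50/50 split is illustrative):

library(ParetoTI)

set.seed(1)
# split in 2 parts: half the cells for training, half for prediction
train_idx <- sample(ncol(sce), floor(ncol(sce) / 2))
sce_train <- sce[, train_idx]
sce_test  <- sce[, -train_idx]

# fit logistic regression model to the 1st part
model_res <- fit_logistic_model(sce_train, y_col = "Cell_class")

# use this model to predict cell types in the 2nd part,
# providing the reference labels for evaluation
pred <- predict_logistic_prob(sce_test, model_res, ref_y_col = "Cell_class")

# plot the confusion matrix of observed vs predicted classes
plot_confusion(pred, normalize = TRUE)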