map_go_annot() annotates genes with Gene Ontology (GO) terms, including less specific (parent) terms. It automatically loads the correct OrgDb package for the given taxonomy_id, selects GOALL annotations for the genes provided in keys, and filters them by ontology branch and evidence code. Finally, it maps GO term names using GO.db and returns the annotations in list and data.table formats. This function can also be used to retrieve other gene annotations from OrgDb (e.g. OMIM, PFAM), but those will not be mapped to readable names.

filter_go_annot() filters GO annotations by ontology file, GO branch, and number of genes annotated.

measure_activity() annotates genes with GO terms and measures their activities

load_go_ontology() loads the GO slim ontology file and returns its file path.

map_gwas_annot() loads known GWAS loci for all phenotypes in the GWAS Catalog. For each of the keys, it finds which phenotypes the gene is associated with, using the genes mapped by the GWAS Catalog.

map_promoter_tf() loads data mapping TF->target gene regulation via co-occurrence of multiple motifs for the same TF in promoters (courtesy of Krzysztof Polanski, who generated this resource). This TF dataset does not produce useful results with > 5 scRNA-seq datasets, so a more bespoke approach is needed; a TF-target mapping can be provided with the "custom" option of measure_activity().

Usage

map_go_annot(taxonomy_id = 9606, keys = c("TP53", "ALB"),
  columns = c("GOALL"), keytype = "ALIAS", ontology_type = c("BP",
  "MF", "CC"), evidence_code = "all", localHub = FALSE,
  return_org_db = FALSE,
  ann_hub_cache = AnnotationHub::getAnnotationHubOption("CACHE"))

filter_go_annot(annot, ontology_file = NULL, lower = 1, upper = Inf,
  ontology_type = c("BP", "MF", "CC"))

measure_activity(expr_mat, which = c("BP", "MF", "CC", "gwas",
  "custom")[1], activity_method = c("AUCell", "pseudoinverse")[1],
  keys = rownames(expr_mat), keytype = "ALIAS", ontology_file = NULL,
  taxonomy_id = 10090, evidence_code = "all", localHub = FALSE,
  ann_hub_cache = AnnotationHub::getAnnotationHubOption("CACHE"),
  lower = 1, upper = Inf, variant_context = NULL,
  return_as_matrix = FALSE, annot_dt = NULL,
  assay_name = "logcounts", aucell_options = list(aucMaxRank =
  nrow(expr_mat) * 0.05, binary = F, nCores = 3, plotStats = TRUE))

load_go_ontology(ont_dir = "./data/", ont_file = "goslim_generic.obo")

map_gwas_annot(taxonomy_id = 9606, keys = c("TP53", "ZZZ3"),
  keytype = "SYMBOL", localHub = FALSE,
  ann_hub_cache = AnnotationHub::getAnnotationHubOption("CACHE"),
  gwas_url = "https://www.ebi.ac.uk/gwas/api/search/downloads/alternative",
  gwas_file = "gwas_catalog_v1.0.2-associations_e93_r2019-01-31.tsv",
  lower = 1, upper = Inf, variant_context = NULL)

map_promoter_tf(keys = c("TP53", "ZZZ3"), keytype = "SYMBOL",
  lower = 1, upper = Inf)

Arguments

taxonomy_id

Taxonomy id of your species. 9606 is human, 10090 is mouse. Find your species id using taxonomy search by UniProt: https://www.uniprot.org/taxonomy/.

keys

gene or protein identifiers of keytype

columns

columns to retrieve. Default is GOALL: GO identifiers (includes less specific terms). See colsAndKeytypes for details.

keytype

type of keys identifiers. Use keytypes(map_go_annot(taxonomy_id = 9606, return_org_db = T)) to find which keytypes are available.

ontology_type

specify which branch of gene ontology to retrieve annotations from.

evidence_code

specify which evidence should have been used to annotate genes with GO terms. Use keys(map_go_annot(..., return_org_db = T), "EVIDENCE") to find which codes exist. See http://www.geneontology.org/page/guide-go-evidence-codes for explanations and details.

localHub

set localHub = TRUE for working offline (only resources already in the local cache are used). See AnnotationHub for details.

return_org_db

return OrgDb database link instead of gene annotations.

ann_hub_cache

If needed, set location of local AnnotationHub cache to specific directory by providing it here. To view where cache is currently located use getAnnotationHubOption("CACHE")

annot

GO annotations, output of map_go_annot()

ontology_file

path (url) to ontology file in obo format

lower

lower limit on size of gene sets (inclusive)

upper

upper limit on size of gene sets (exclusive)

expr_mat

expression matrix (genes in rows, cells in columns), or one of: dgCMatrix, ExpressionSet, SummarizedExperiment or SingleCellExperiment. The last two require assay_name.

which

which set activities to measure? Currently implemented: "BP", "MF", "CC" gene ontology subsets. Use "gwas" to construct gene sets from GWAS Catalog v1.0.2; the GWAS option works only with human HGNC identifiers. TF data is supplied with the package and works only for human data; only SYMBOL and ENSEMBL can be used as keytype. Use "custom" to provide your own gene sets in the annot_dt table.

activity_method

find activities using find_set_activity_AUCell or find_set_activity_pseudoinv?

variant_context

select traits (which = "gwas") based on variant context (coding, intronic, UTR, intergenic). Find available options by running annot = map_gwas_annot(); t = table(annot$annot_dt$CONTEXT); t[!grepl(";| x ", names(t))]. Suggested options: c("3_prime_UTR_variant", "5_prime_UTR_variant", "frameshift_variant", "inframe_deletion", "inframe_insertion", "intergenic_variant", "intron_variant", "missense_variant", "non_coding_transcript_exon_variant", "regulatory_region_variant", "splice_acceptor_variant", "splice_donor_variant", "splice_region_variant", "start_lost", "stop_gained", "stop_lost", "synonymous_variant", "TF_binding_site_variant", "TFBS_ablation", "upstream_gene_variant").

return_as_matrix

return matrix (TRUE) or data.table (FALSE)

annot_dt

data.table, data.frame or matrix containing custom gene set annotations. The 1st column should contain gene set identifier or name, the 2nd column should contain gene identifiers matching keys.

assay_name

name of assay in SummarizedExperiment or SingleCellExperiment, normally counts or logcounts

aucell_options

additional options specific to AUCell, details: find_set_activity_AUCell

ont_dir

directory where to save ontology

ont_file

which ontology file to load?

gwas_url

where to download annotations? Useful for other versions of GWAS Catalog.

gwas_file

name of the file where to store GWAS Catalog data locally (in ann_hub_cache directory).
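
The lower and upper arguments bound gene set size asymmetrically: lower is inclusive and upper is exclusive. A minimal base R sketch of that rule is shown here; filter_by_size() and the gene sets are hypothetical illustrations, not the package's own code or data.

```r
# Toy gene sets: set identifier -> annotated genes (made-up data)
gene_sets <- list(
  set_small  = c("TP53"),
  set_medium = c("TP53", "ALB", "GPX1"),
  set_large  = c("TP53", "ALB", "GPX1", "ZZZ3", "MDM2")
)

# Hypothetical helper (not part of the package): keep sets whose size is
# >= lower (inclusive) and < upper (exclusive)
filter_by_size <- function(sets, lower = 1, upper = Inf) {
  n_genes <- lengths(sets)
  sets[n_genes >= lower & n_genes < upper]
}

# Sizes are 1, 3 and 5; with lower = 3 and upper = 5 only the set of
# size 3 passes, because a set of exactly 5 genes hits the exclusive bound
kept <- filter_by_size(gene_sets, lower = 3, upper = 5)
names(kept)
```

With the defaults (lower = 1, upper = Inf) every non-empty set is kept.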

Value

list (S3 class ParetoTI_annot) containing GO annotations in list and data.table formats, and OrgDb database link.

Details

The GO consortium annotates genes with the most specific terms, so annotations need to be propagated to less specific parent terms. For example, "apoptotic process in response to mitochondrial fragmentation" (GO:0140208) is an "apoptotic process" (GO:0006915). Conceptually, if a gene functions in the first, it also functions in the second, but it is directly annotated only with the first term. Using GO annotations for functional enrichment therefore requires propagating annotations to less specific terms.
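
The propagation idea can be sketched in base R on a toy ontology. This is an illustration only: the parents list is a heavily simplified, hand-written excerpt, GENE_A is a hypothetical gene, and get_ancestors() is a hypothetical helper. The package itself relies on the GOALL column of OrgDb, which already contains propagated annotations.

```r
# Toy ontology: each term maps to its direct parent terms (simplified;
# real GO terms usually have more parents and relationship types)
parents <- list(
  "GO:0140208" = "GO:0006915",  # ... in response to mitochondrial fragmentation -> apoptotic process
  "GO:0006915" = "GO:0012501",  # apoptotic process -> programmed cell death
  "GO:0012501" = character(0)   # treated as the top of this toy excerpt
)

# Hypothetical helper: collect all ancestors of a term by walking upwards
get_ancestors <- function(term, parents) {
  found <- character(0)
  queue <- parents[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

# Direct annotation: the gene is annotated only with the most specific term
direct <- list(GENE_A = "GO:0140208")

# Propagated annotation: the direct terms plus all of their ancestors
propagated <- lapply(direct, function(terms) {
  unique(c(terms, unlist(lapply(terms, get_ancestors, parents = parents))))
})
propagated$GENE_A
```

After propagation GENE_A is also counted as a member of the "apoptotic process" gene set, which is what enrichment and activity analyses need.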

Examples

annot = map_go_annot(taxonomy_id = 9606, keys = c("TP53", "ALB", "GPX1"), columns = c("GOALL"), keytype = "ALIAS", ontology_type = c("BP", "MF", "CC"))
#> snapshotDate(): 2018-10-24
#> downloading 0 resources
#> loading from cache
#>     ‘/Users/vk7//.AnnotationHub/72902’
annot2 = filter_go_annot(annot, ontology_file = "http://www.geneontology.org/ontology/subsets/goslim_generic.obo", lower = 50, upper = 2000, ontology_type = "BP")
activ = measure_activity(expr_mat, which = "BP", taxonomy_id = 9606, keytype = "ALIAS", ontology_file = load_go_ontology("./data/", "goslim_generic.obo"))
#> snapshotDate(): 2018-10-24
#> downloading 0 resources
#> loading from cache
#>     ‘/Users/vk7//.AnnotationHub/72902’
#> Error in rownames(expr_mat): object 'expr_mat' not found
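
The measure_activity() call above fails because expr_mat is never defined in the example. The following base R sketch shows what the inputs could look like, following the argument descriptions: an expression matrix with genes in rows and cells in columns, and a custom annot_dt whose first column is the gene set identifier and whose second column holds gene identifiers matching keys. All values, the gene set names and the column names set_id and gene are made up for illustration.

```r
set.seed(1)

# Toy expression matrix: genes in rows, cells in columns (random counts)
genes <- c("TP53", "ALB", "GPX1", "ZZZ3")
cells <- paste0("cell_", 1:5)
expr_mat <- matrix(
  rpois(length(genes) * length(cells), lambda = 5),
  nrow = length(genes),
  dimnames = list(genes, cells)
)

# Custom gene sets: 1st column = gene set identifier,
# 2nd column = gene identifiers matching rownames(expr_mat)
annot_dt <- data.frame(
  set_id = c("set_A", "set_A", "set_B", "set_B"),
  gene   = c("TP53", "GPX1", "ALB", "ZZZ3"),
  stringsAsFactors = FALSE
)

# Sanity check: every annotated gene is present in the expression matrix
all(annot_dt$gene %in% rownames(expr_mat))

# With the package installed, the inputs could then be passed on
# (not run here; shown only as an assumption of how the call would look):
# activ <- measure_activity(expr_mat, which = "custom", annot_dt = annot_dt)
```

A matrix this small is only useful for checking the input format; real scRNA-seq data would have thousands of genes and cells.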