Annotate genes with GO terms and measure their activities

map_go_annot() annotates genes with Gene Ontology terms, including less specific terms. It automatically loads the correct OrgDb package given taxonomy_id. Then it selects GOALL annotations for the genes provided in keys and filters them by ontology branch and evidence codes. Finally, it maps GO term names using GO.db and returns annotations in list and data.table formats. This function can also be used to retrieve other gene annotations from OrgDb (e.g. OMIM, PFAM), but those will not be mapped to readable names.

filter_go_annot() filters GO annotations by ontology file, GO branch, and number of genes annotated.

measure_activity() annotates genes with GO terms and measures their activities

load_go_ontology() loads the GO slim ontology file and returns the file path.

map_gwas_annot() loads known GWAS loci for all phenotypes in the GWAS Catalog. For each of the keys, it finds which phenotypes that gene is associated with, using the genes mapped by the GWAS Catalog.

map_promoter_tf() loads data mapping TF->target gene regulation via co-occurrence of multiple motifs for the same TF in promoters (courtesy of Krzysztof Polanski, who generated this resource). This TF dataset does not produce useful results with > 5 scRNA-seq datasets, so a more bespoke approach is needed; a TF-target mapping can be provided with the "custom" option of measure_activity().

map_go_annot(taxonomy_id = 9606, keys = c("TP53", "ALB"),
  columns = c("GOALL"), keytype = "ALIAS", ontology_type = c("BP",
  "MF", "CC"), evidence_code = "all", localHub = FALSE,
  return_org_db = FALSE,
  ann_hub_cache = AnnotationHub::getAnnotationHubOption("CACHE"))

filter_go_annot(annot, ontology_file = NULL, lower = 1, upper = Inf,
  ontology_type = c("BP", "MF", "CC"))

measure_activity(expr_mat, which = c("BP", "MF", "CC", "gwas",
  "custom")[1], activity_method = c("AUCell", "pseudoinverse")[1],
  keys = rownames(expr_mat), keytype = "ALIAS", ontology_file = NULL,
  taxonomy_id = 10090, evidence_code = "all", localHub = FALSE,
  ann_hub_cache = AnnotationHub::getAnnotationHubOption("CACHE"),
  lower = 1, upper = Inf, variant_context = NULL,
  return_as_matrix = FALSE, annot_dt = NULL,
  assay_name = "logcounts", aucell_options = list(aucMaxRank =
  nrow(expr_mat) * 0.05, binary = F, nCores = 3, plotStats = TRUE))

load_go_ontology(ont_dir = "./data/", ont_file = "goslim_generic.obo")

map_gwas_annot(taxonomy_id = 9606, keys = c("TP53", "ZZZ3"),
  keytype = "SYMBOL", localHub = FALSE,
  ann_hub_cache = AnnotationHub::getAnnotationHubOption("CACHE"),
  gwas_url = "https://www.ebi.ac.uk/gwas/api/search/downloads/alternative",
  gwas_file = "gwas_catalog_v1.0.2-associations_e93_r2019-01-31.tsv",
  lower = 1, upper = Inf, variant_context = NULL)

map_promoter_tf(keys = c("TP53", "ZZZ3"), keytype = "SYMBOL",
  lower = 1, upper = Inf)

Arguments

taxonomy_id	Taxonomy id of your species. 9606 is human, 10090 is mouse. Find your species id using taxonomy search by UniProt: https://www.uniprot.org/taxonomy/.
keys	gene or protein identifiers of `keytype`
columns	columns to retrieve. Default is GOALL: GO identifiers (includes less specific terms). See colsAndKeytypes for details.
keytype	type of `keys` identifiers. Use `keytypes(map_go_annot(taxonomy_id = 9606, return_org_db = T))` to find which keytypes are available.
ontology_type	specify which branch of gene ontology to retrieve annotations from.
evidence_code	specify which evidence should have been used to annotate genes with GO terms. Use `keys(map_go_annot(..., return_org_db = T), "EVIDENCE")` to find which codes exist. See http://www.geneontology.org/page/guide-go-evidence-codes for explanations and details.
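localHub	set localHub = TRUE for working offline (only previously downloaded resources in the local cache are used). The default, FALSE, connects to the AnnotationHub server. Details: AnnotationHub.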
return_org_db	return OrgDb database link instead of gene annotations.
ann_hub_cache	If needed, set location of local AnnotationHub cache to specific directory by providing it here. To view where cache is currently located use getAnnotationHubOption("CACHE")
annot	GO annotations, output of map_go_annot()
ontology_file	path (url) to ontology file in obo format
lower	lower limit on size of gene sets (inclusive)
upper	upper limit on size of gene sets (exclusive)
expr_mat	expression matrix (genes in rows, cells in columns) or one of: dgCMatrix, ExpressionSet, SummarizedExperiment or SingleCellExperiment. The last two require assay_name.
which	which set activities to measure? Currently implemented: "BP", "MF", "CC" gene ontology subsets. Use "gwas" for constructing gene sets with gwas_catalog v1.0.2. The GWAS option works only with human HGNC identifiers. TF data is supplied with the package and works only for human data; only SYMBOL and ENSEMBL can be used as keytype. Use "custom" to provide your own gene sets in the `annot_dt` table.
activity_method	find activities using find_set_activity_AUCell or find_set_activity_pseudoinv?
variant_context	select traits (which = "gwas") based on variant context (coding, intronic, UTR, intergenic). Find available options by running `annot = map_gwas_annot(); t = table(annot$annot_dt$CONTEXT); t[!grepl(";| x ", names(t))]`. Suggested options: `c("3_prime_UTR_variant", "5_prime_UTR_variant", "frameshift_variant", "inframe_deletion", "inframe_insertion", "intergenic_variant", "intron_variant", "missense_variant", "non_coding_transcript_exon_variant", "regulatory_region_variant", "splice_acceptor_variant", "splice_donor_variant", "splice_region_variant", "start_lost", "stop_gained", "stop_lost", "synonymous_variant", "TF_binding_site_variant", "TFBS_ablation", "upstream_gene_variant")`.
return_as_matrix	return matrix (TRUE) or data.table (FALSE)
annot_dt	data.table, data.frame or matrix containing custom gene set annotations. The 1st column should contain gene set identifier or name, the 2nd column should contain gene identifiers matching `keys`.
assay_name	name of assay in SummarizedExperiment or SingleCellExperiment, normally counts or logcounts
aucell_options	additional options specific to AUCell, details: find_set_activity_AUCell
ont_dir	directory where to save ontology
ont_file	which ontology file to load?
gwas_url	where to download annotations? Useful for other versions of GWAS Catalog.
gwas_file	name of the file where to store GWAS Catalog data locally (in ann_hub_cache directory).
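
The `annot_dt`, `lower` and `upper` arguments above can be illustrated with a small base R sketch. It builds a toy two-column table in the layout described for `annot_dt` and applies a size filter with an inclusive lower bound and an exclusive upper bound. The gene sets and the helper `filter_sets_by_size()` are hypothetical and written only for illustration; they are not part of the package, and the package's own filtering may differ in detail.

```r
# Hypothetical toy gene sets in the two-column layout described for `annot_dt`:
# 1st column = gene set identifier, 2nd column = gene identifier matching `keys`.
annot_dt <- data.frame(
  set_id = c("setA", "setA", "setA", "setB", "setB", "setC"),
  gene   = c("TP53", "ALB", "GPX1", "TP53", "ZZZ3", "ALB"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a package function): keep gene sets whose size n
# satisfies lower <= n < upper, i.e. lower is inclusive and upper is exclusive.
filter_sets_by_size <- function(annot_dt, lower = 1, upper = Inf) {
  set_size <- table(annot_dt[[1]])
  keep <- names(set_size)[set_size >= lower & set_size < upper]
  annot_dt[annot_dt[[1]] %in% keep, , drop = FALSE]
}

# setA has 3 genes, setB has 2, setC has 1: only setB passes 2 <= n < 3.
filtered <- filter_sets_by_size(annot_dt, lower = 2, upper = 3)
unique(filtered$set_id)
```

A table shaped like `annot_dt` here is what the argument description asks for when `which = "custom"`.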

Value

list (S3 class ParetoTI_annot) containing GO annotations in list and data.table formats, and OrgDb database link.

Details

The GO consortium annotates genes with the most specific terms, so annotations need to be propagated to less specific parent terms. For example, "apoptotic process in response to mitochondrial fragmentation" (GO:0140208) is an "apoptotic process" (GO:0006915). Conceptually, if a gene functions in the first process it also functions in the second, but it is directly annotated only with the first term. Using GO annotations for functional enrichment therefore requires propagating annotations to less specific terms.
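
The idea of propagation can be sketched in base R on a toy ontology. Everything below is an illustrative assumption: the `parents` list, the `TOY:root` term, the gene names and the helper `get_ancestors()` are made up, and the sketch is not how the package or OrgDb computes GOALL. It only shows what "annotating with less specific terms" means.

```r
# Hypothetical toy ontology: each term maps to its direct parent terms.
# "TOY:root" is a made-up term used only to give the example a second level.
parents <- list(
  "GO:0140208" = "GO:0006915",
  "GO:0006915" = "TOY:root",
  "TOY:root"   = character(0)
)

# Hypothetical helper: collect all ancestors of a term by walking parent links.
get_ancestors <- function(term, parents) {
  found <- character(0)
  queue <- parents[[term]]
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    if (!(current %in% found)) {
      found <- c(found, current)
      queue <- c(queue, parents[[current]])
    }
  }
  found
}

# Direct annotations only (made-up genes): GENE1 has the most specific term.
direct <- data.frame(
  gene = c("GENE1", "GENE2"),
  term = c("GO:0140208", "GO:0006915"),
  stringsAsFactors = FALSE
)

# Propagate: each gene keeps its direct term and gains every ancestor term.
propagated <- unique(do.call(rbind, lapply(seq_len(nrow(direct)), function(i) {
  data.frame(
    gene = direct$gene[i],
    term = c(direct$term[i], get_ancestors(direct$term[i], parents)),
    stringsAsFactors = FALSE
  )
})))

# After propagation GENE1 is also annotated with GO:0006915.
propagated[propagated$gene == "GENE1", "term"]
```

In practice this step is not done by hand: as described above, map_go_annot() selects the GOALL column, which already includes the less specific terms.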

Examples

annot = map_go_annot(taxonomy_id = 9606,
                     keys = c("TP53", "ALB", "GPX1"),
                     columns = c("GOALL"), keytype = "ALIAS",
                     ontology_type = c("BP", "MF", "CC"))
#> snapshotDate(): 2018-10-24
#> downloading 0 resources
#> loading from cache 
#>     ‘/Users/vk7//.AnnotationHub/72902’
#> Loading required package: AnnotationDbi
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> Loading required package: parallel
#> 
#> Attaching package: ‘BiocGenerics’
#> The following objects are masked from ‘package:parallel’:
#> 
#>     clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
#>     clusterExport, clusterMap, parApply, parCapply, parLapply,
#>     parLapplyLB, parRapply, parSapply, parSapplyLB
#> The following objects are masked from ‘package:stats’:
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from ‘package:base’:
#> 
#>     anyDuplicated, append, as.data.frame, basename, cbind, colMeans,
#>     colnames, colSums, dirname, do.call, duplicated, eval, evalq,
#>     Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
#>     lengths, Map, mapply, match, mget, order, paste, pmax, pmax.int,
#>     pmin, pmin.int, Position, rank, rbind, Reduce, rowMeans, rownames,
#>     rowSums, sapply, setdiff, sort, table, tapply, union, unique,
#>     unsplit, which, which.max, which.min
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> Loading required package: IRanges
#> Loading required package: S4Vectors
#> 
#> Attaching package: ‘S4Vectors’
#> The following objects are masked from ‘package:data.table’:
#> 
#>     first, second
#> The following object is masked from ‘package:base’:
#> 
#>     expand.grid
#> 
#> Attaching package: ‘IRanges’
#> The following object is masked from ‘package:data.table’:
#> 
#>     shift
annot2 = filter_go_annot(annot, ontology_file = "http://www.geneontology.org/ontology/subsets/goslim_generic.obo",
           lower = 50, upper = 2000, ontology_type = "BP")
# expr_mat must already exist: an expression matrix with genes in rows and cells in columns
activ = measure_activity(expr_mat, which = "BP",
                         taxonomy_id = 9606, keytype = "ALIAS",
                         ontology_file = load_go_ontology("./data/",
                                                  "goslim_generic.obo"))
#> snapshotDate(): 2018-10-24
#> downloading 0 resources
#> loading from cache 
#>     ‘/Users/vk7//.AnnotationHub/72902’
