Full_interactomes_interspecies.Rmd
This article shows how to install the PItools package, download protein interaction data for one species (for example, human) or between two species or taxonomy groups (for example, human and viruses), and filter these data by interaction detection method, publication ID (PMID), or a list of protein IDs (UniProtKB).
if (!requireNamespace("PItools", quietly = TRUE)) {
  # Install the PItools R package from GitHub
  install.packages("BiocManager") # for installing Bioconductor dependencies
  BiocManager::install("vitkl/PItools", dependencies = TRUE)
}
suppressPackageStartupMessages({
library(PItools)
})
## Warning: replacing previous import 'IRanges::desc' by 'plyr::desc' when
## loading 'PSICQUIC'
## Warning: package 'data.table' was built under R version 3.5.2
## Warning: package 'ontologyIndex' was built under R version 3.5.2
## Warning: package 'R.utils' was built under R version 3.5.2
Interactome search by species is taxonomy-hierarchy aware.
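To illustrate what a hierarchy-aware search could mean, here is a minimal sketch in base R. It assumes that a query for a taxon also matches every taxon below it. The parent map is a toy with intermediate ranks omitted, and `descendants_of` is a hypothetical helper, not a PItools function.

```r
# Toy parent map (child taxid -> parent taxid); intermediate ranks are omitted.
# 9606 = Homo sapiens, 63221 = Homo sapiens neanderthalensis,
# 10239 = Viruses, 11676 = HIV-1, 11320 = Influenza A virus
parent <- c("63221" = "9606", "11676" = "10239", "11320" = "10239")

# Hypothetical helper: collect a taxon and everything below it in the toy map
descendants_of <- function(taxid, parent) {
  found <- as.character(taxid)
  repeat {
    children <- names(parent)[parent %in% found]
    new_ids <- setdiff(children, found)
    if (length(new_ids) == 0) break
    found <- c(found, new_ids)
  }
  found
}

interactions <- data.frame(
  taxid_A = c("9606", "63221", "9606", "10090"),
  taxid_B = c("9606", "9606", "11676", "10090"),
  stringsAsFactors = FALSE
)

human_ids <- descendants_of(9606, parent)
viral_ids <- descendants_of(10239, parent)

# Both interactors fall under the human taxon
within_human <- interactions[interactions$taxid_A %in% human_ids &
                             interactions$taxid_B %in% human_ids, ]

# One interactor is human, the other is viral (in either order)
human_viral_toy <- interactions[
  (interactions$taxid_A %in% human_ids & interactions$taxid_B %in% viral_ids) |
  (interactions$taxid_A %in% viral_ids & interactions$taxid_B %in% human_ids), ]
```

In this toy, the subspecies row is kept when searching for human, and the HIV-1 row is found when pairing human with the Viruses group.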
human = fullInteractome(taxid = 9606, database = "IntActFTP",
format = "tab27", clean = TRUE,
protein_only = TRUE,
directory = NULL) # keep data files inside R library - default
## ... looking for the date of the latest IntAct release ...
## ... looking for the date of the latest IntAct release ...
## ... loading local copy ...
## Warning in fread(file_name, header = T, stringsAsFactors = F): Found
## and resolved improper quoting out-of-sample. First healed line 25383:
## <<uniprotkb:P16054 uniprotkb:Q05769 intact:EBI-298451 intact:EBI-298933|
## uniprotkb:Q543K3 psi-mi:kpce_mouse(display_long)|uniprotkb:Prkce(gene
## name)|psi-mi:Prkce(display_short)|uniprotkb:Pkce(gene name synonym)|
## uniprotkb:Pkcea(gene name synonym)|uniprotkb:nPKC-epsilon(gene name
## synonym) psi-mi:pgh2_mouse(display_long)|uniprotkb:Ptgs2(gene name)|
## psi-mi:Ptgs2(display_short)|uniprotkb:Cox-2(gene name synonym)|
## uniprotkb:Cox2(gene name synonym)|uniprotkb:Pghs-b(gene name synonym)|
## uniprotkb:Tis10(gene na>>. If the fields are not quoted (e.g. field
## separator does not appear within any field), try quote="" to avoid this
## warning.
# human-viral interactome used below (assumes taxid2 = 10239, the NCBI taxonomy ID for Viruses)
human_viral = interSpeciesInteractome(taxid1 = 9606, taxid2 = 10239,
database = "IntActFTP", format = "tab27",
clean = TRUE, protein_only = TRUE)
human_mouse = interSpeciesInteractome(taxid1 = 9606, taxid2 = 10090,
database = "IntActFTP", format = "tab27",
clean = TRUE, protein_only = TRUE)
## ... looking for the date of the latest IntAct release ...
## ... looking for the date of the latest IntAct release ...
## ... loading local copy ...
## Warning in fread(file_name, header = T, stringsAsFactors = F): Found
## and resolved improper quoting out-of-sample. First healed line 25383:
## <<uniprotkb:P16054 uniprotkb:Q05769 intact:EBI-298451 intact:EBI-298933|
## uniprotkb:Q543K3 psi-mi:kpce_mouse(display_long)|uniprotkb:Prkce(gene
## name)|psi-mi:Prkce(display_short)|uniprotkb:Pkce(gene name synonym)|
## uniprotkb:Pkcea(gene name synonym)|uniprotkb:nPKC-epsilon(gene name
## synonym) psi-mi:pgh2_mouse(display_long)|uniprotkb:Ptgs2(gene name)|
## psi-mi:Ptgs2(display_short)|uniprotkb:Cox-2(gene name synonym)|
## uniprotkb:Cox2(gene name synonym)|uniprotkb:Pghs-b(gene name synonym)|
## uniprotkb:Tis10(gene na>>. If the fields are not quoted (e.g. field
## separator does not appear within any field), try quote="" to avoid this
## warning.
uniqueNinteractions(human)
## [1] 196473
uniqueNinteractors(human)
## [1] 21521
uniqueNinteractions(human_viral)
## [1] 15003
uniqueNinteractors(human_viral)
## [1] 5538
uniqueNinteractions(human_mouse)
## [1] 24960
uniqueNinteractors(human_mouse)
## [1] 10114
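The counts above can be read as the number of distinct interacting pairs and the number of distinct proteins. The following base R sketch shows one way such counts could be computed on a toy table. The assumption that a pair is order-independent (A|B equals B|A) is mine, and this is not the package's implementation.

```r
toy <- data.frame(
  IDs_interactor_A = c("P1", "P2", "P1", "P3"),
  IDs_interactor_B = c("P2", "P1", "P3", "P4"),
  stringsAsFactors = FALSE
)

# Order-independent pair key: A|B and B|A collapse to the same string
pair_id <- paste(pmin(toy$IDs_interactor_A, toy$IDs_interactor_B),
                 pmax(toy$IDs_interactor_A, toy$IDs_interactor_B),
                 sep = "|")

n_interactions <- length(unique(pair_id))
n_interactors <- length(unique(c(toy$IDs_interactor_A, toy$IDs_interactor_B)))
```

Here the first two rows describe the same pair, so four rows yield three unique interactions among four interactors.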
# subset two-hybrid interactions
human_two_hybrid = subsetMITABbyMethod(MITABdata = human,
Interaction_detection_methods = "MI:0018")
## downloading MI ontology from http://purl.obolibrary.org/obo/mi.obo
uniqueNinteractions(human_two_hybrid)
## [1] 83919
uniqueNinteractors(human_two_hybrid)
## [1] 14396
# subset all interactions but two-hybrid
human_NOT_two_hybrid = subsetMITABbyMethod(MITABdata = human,
Interaction_detection_methods = "MI:0018", inverse_filter = TRUE)
## loading local copy of MI ontology
uniqueNinteractions(human_NOT_two_hybrid)
## [1] 116651
uniqueNinteractors(human_NOT_two_hybrid)
## [1] 16616
# subset affinity purification - mass spectrometry interactions
human_AP_MS = subsetMITABbyMethod(MITABdata = human,
Interaction_detection_methods = "MI:0004", Identification_method = "MI:0433")
## loading local copy of MI ontology
uniqueNinteractions(human_AP_MS)
## [1] 69044
uniqueNinteractors(human_AP_MS)
## [1] 12465
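A minimal sketch of filtering by detection method, with and without inversion, may help when reading the counts above. It matches exact MI term IDs on a toy table; `filter_by_method` and `n_pairs` are hypothetical helpers, and how the package handles child terms of the ontology is not shown here.

```r
toy <- data.frame(
  pair_id = c("P1|P2", "P1|P2", "P1|P3", "P2|P4"),
  Interaction_detection_methods = c("MI:0018", "MI:0004", "MI:0004", "MI:0018"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows with the given methods, or everything else
filter_by_method <- function(data, methods, inverse_filter = FALSE) {
  keep <- data$Interaction_detection_methods %in% methods
  if (inverse_filter) keep <- !keep
  data[keep, , drop = FALSE]
}

# Hypothetical helper: count distinct pairs
n_pairs <- function(data) length(unique(data$pair_id))

two_hybrid <- filter_by_method(toy, "MI:0018")
not_two_hybrid <- filter_by_method(toy, "MI:0018", inverse_filter = TRUE)
```

The pair P1|P2 is supported by both methods, so it is counted in both subsets. This is why the two-hybrid and non-two-hybrid counts above (83919 and 116651) add up to more than the 196473 unique human interactions.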
# subset both published and unpublished Vidal group data
Vidal_all = subsetMITABbyPMIDs(MITABdata = human,
PMIDs = c("25416956", "unassigned1304"))
uniqueNinteractions(Vidal_all)
## [1] 59037
uniqueNinteractors(Vidal_all)
## [1] 9756
# subset Matthias Mann 2015 paper data
Mann = subsetMITABbyPMIDs(MITABdata = human,
PMIDs = "26496610")
uniqueNinteractions(Mann)
## [1] 15589
uniqueNinteractors(Mann)
## [1] 4950
You can get help and more details on these functions (for example, how to find which Molecular Interactions (MI) ontology terms correspond to which methods) with ?subsetMITABbyMethod
mediator_complex_proteins = fread("https://www.uniprot.org/uniprot/?query=GO:0016592%20AND%20taxonomy:9606&format=tab&columns=id")
mediator_complex = subsetMITABbyID(Vidal_all,
ID_seed = mediator_complex_proteins$Entry,
within_seed = TRUE, only_seed2nonseed = FALSE)
uniqueNinteractions(mediator_complex)
## [1] 5
uniqueNinteractors(mediator_complex)
## [1] 7
mediator_complex_interactions = subsetMITABbyID(Vidal_all,
ID_seed = mediator_complex_proteins$Entry,
within_seed = FALSE, only_seed2nonseed = TRUE)
uniqueNinteractions(mediator_complex_interactions)
## [1] 179
uniqueNinteractors(mediator_complex_interactions)
## [1] 179
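The two calls above differ only in how the seed list is applied. The sketch below shows one plausible reading on a toy table: `within_seed` keeps pairs where both proteins are in the seed, and `only_seed2nonseed` keeps pairs where exactly one is. This interpretation is an assumption based on the argument names, not a copy of the package logic.

```r
toy <- data.frame(
  A = c("M1", "M1", "M2", "X1"),
  B = c("M2", "X1", "X2", "X2"),
  stringsAsFactors = FALSE
)
seed <- c("M1", "M2")

a_in <- toy$A %in% seed
b_in <- toy$B %in% seed

# Both interactors are seed proteins (assumed meaning of within_seed = TRUE)
within_seed_pairs <- toy[a_in & b_in, ]

# Exactly one interactor is a seed protein (assumed meaning of only_seed2nonseed = TRUE)
seed_to_nonseed_pairs <- toy[xor(a_in, b_in), ]

# At least one interactor is a seed protein
any_seed_pairs <- toy[a_in | b_in, ]
```

Under this reading, the first call returns interactions inside the mediator complex, and the second returns its interactions with proteins outside the complex.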
Querying PSICQUIC directly can be useful if you need interactions for a small number of proteins or if you want to query non-IMEx databases. Note that queryPSICQUIC doesn’t keep track of the database version, while queryPSICQUICrlib does.
# Query for interactions of the bacterial RNA polymerase sigma factor SigA identified using two-hybrid methods in all IMEx databases
queryPSICQUIC(query = "id:P74565 AND detmethod:\"MI:0018\"",
format = "tab27",
database = "imex",
file = "./P74565_2H_interactions_imex_tab27.tsv")
## Warning in if (N_interactions > 0) {: the condition has length > 1 and only
## the first element will be used
## Warning in N_SPECIES_ID_interactome[indices] <- N_interactions: number of
## items to replace is not a multiple of replacement length
## query
## 1: id:P74565 AND detmethod:%22MI:0018%22
## 2: id:P74565 AND detmethod:%22MI:0018%22
## 3: id:P74565 AND detmethod:%22MI:0018%22
## 4: id:P74565 AND detmethod:%22MI:0018%22
## 5: id:P74565 AND detmethod:%22MI:0018%22
## 6: id:P74565 AND detmethod:%22MI:0018%22
## 7: id:P74565 AND detmethod:%22MI:0018%22
## 8: id:P74565 AND detmethod:%22MI:0018%22
## 9: id:P74565 AND detmethod:%22MI:0018%22
## 10: id:P74565 AND detmethod:%22MI:0018%22
## 11: id:P74565 AND detmethod:%22MI:0018%22
## file format all.databases
## 1: ./P74565_2H_interactions_imex_tab27.tsv tab27 IntAct
## 2: ./P74565_2H_interactions_imex_tab27.tsv tab27 MINT
## 3: ./P74565_2H_interactions_imex_tab27.tsv tab27 bhf-ucl
## 4: ./P74565_2H_interactions_imex_tab27.tsv tab27 MPIDB
## 5: ./P74565_2H_interactions_imex_tab27.tsv tab27 MatrixDB
## 6: ./P74565_2H_interactions_imex_tab27.tsv tab27 HPIDb
## 7: ./P74565_2H_interactions_imex_tab27.tsv tab27 I2D-IMEx
## 8: ./P74565_2H_interactions_imex_tab27.tsv tab27 InnateDB-IMEx
## 9: ./P74565_2H_interactions_imex_tab27.tsv tab27 MolCon
## 10: ./P74565_2H_interactions_imex_tab27.tsv tab27 UniProt
## 11: ./P74565_2H_interactions_imex_tab27.tsv tab27 MBInfo
## n.interactions.in.database database.not.active
## 1: 44
## 2: 0
## 3: 0
## 4: 0
## 5: <html>
## 6: 0
## 7: I2D-IMEx
## 8: InnateDB-IMEx
## 9: MolCon
## 10: 0
## 11: 0
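The query string combines an identifier clause with a quoted detection-method clause, and the report above shows the quotes encoded as %22. A small sketch of assembling such a string in base R follows; `build_miql` is a hypothetical helper, not part of PItools, and only the quote encoding visible in the report is reproduced.

```r
# Hypothetical helper: build an "id:... AND detmethod:..." query string
build_miql <- function(id, detmethod = NULL) {
  query <- paste0("id:", id)
  if (!is.null(detmethod)) {
    # MI term IDs contain a colon, so they are wrapped in double quotes
    query <- paste0(query, " AND detmethod:\"", detmethod, "\"")
  }
  query
}

query <- build_miql("P74565", "MI:0018")

# Encode the double quotes the way they appear in the report above
encoded <- gsub("\"", "%22", query, fixed = TRUE)
```

Building the string with a helper avoids hand-escaping the quotes each time the detection method changes.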
# Query for interactions of sigma factor SigA identified using two-hybrid methods in mentha (a database that aggregates data from all primary databases, but does no interaction prediction)
queryPSICQUIC(query = "id:P74565 AND detmethod:\"MI:0018\"",
format = "tab25",
database = "mentha",
file = "./P74565_2H_interactions_mentha_tab25.tsv")
## query
## 1: id:P74565 AND detmethod:%22MI:0018%22
## file format all.databases
## 1: ./P74565_2H_interactions_mentha_tab25.tsv tab25 mentha
## n.interactions.in.database database.not.active
## 1: 44
# Query for interactions of sigma factor SigA in mentha
queryPSICQUIC(query = "id:P74565",
format = "tab25",
database = "mentha",
file = "./P74565_2H_interactions_mentha_tab25.tsv")
## query file format
## 1: id:P74565 ./P74565_2H_interactions_mentha_tab25.tsv tab25
## all.databases n.interactions.in.database database.not.active
## 1: mentha 44
# Retrieve interactions of any protein encoded by the beta-adrenergic receptor kinase 1 gene (Entrez GeneID 156) from BioGRID (which recognises only this type of ID)
queryPSICQUIC(query = "id:156",
format = "tab25",
database = "BioGrid",
file = "./entrezgene156_interactions_BioGrid_tab25.tsv")
## query file format
## 1: id:156 ./entrezgene156_interactions_BioGrid_tab25.tsv tab25
## all.databases n.interactions.in.database database.not.active
## 1: BioGrid 82
# The function returns a report of how many interactions were found in each database, not the data itself. Read the data into R:
fread("./entrezgene156_interactions_BioGrid_tab25.tsv", header = T, stringsAsFactors = F)[1:5]
## V1 V2
## 1: entrez gene/locuslink:156 entrez gene/locuslink:5579
## 2: entrez gene/locuslink:156 entrez gene/locuslink:5148
## 3: entrez gene/locuslink:156 entrez gene/locuslink:6714
## 4: entrez gene/locuslink:6714 entrez gene/locuslink:156
## 5: entrez gene/locuslink:5148 entrez gene/locuslink:156
## V3
## 1: biogrid:106665|entrez gene/locuslink:ADRBK1
## 2: biogrid:106665|entrez gene/locuslink:ADRBK1
## 3: biogrid:106665|entrez gene/locuslink:ADRBK1
## 4: biogrid:112592|entrez gene/locuslink:SRC|entrez gene/locuslink:RP5-823N20.1
## 5: biogrid:111174|entrez gene/locuslink:PDE6G
## V4
## 1: biogrid:111565|entrez gene/locuslink:PRKCB
## 2: biogrid:111174|entrez gene/locuslink:PDE6G
## 3: biogrid:112592|entrez gene/locuslink:SRC|entrez gene/locuslink:RP5-823N20.1
## 4: biogrid:106665|entrez gene/locuslink:ADRBK1
## 5: biogrid:106665|entrez gene/locuslink:ADRBK1
## V5
## 1: entrez gene/locuslink:BARK1(gene name synonym)|entrez gene/locuslink:BETA-ARK1(gene name synonym)|entrez gene/locuslink:GRK2(gene name synonym)
## 2: entrez gene/locuslink:BARK1(gene name synonym)|entrez gene/locuslink:BETA-ARK1(gene name synonym)|entrez gene/locuslink:GRK2(gene name synonym)
## 3: entrez gene/locuslink:BARK1(gene name synonym)|entrez gene/locuslink:BETA-ARK1(gene name synonym)|entrez gene/locuslink:GRK2(gene name synonym)
## 4: entrez gene/locuslink:ASV(gene name synonym)|entrez gene/locuslink:SRC1(gene name synonym)|entrez gene/locuslink:c-SRC(gene name synonym)|entrez gene/locuslink:p60-Src(gene name synonym)
## 5: entrez gene/locuslink:PDEG(gene name synonym)|entrez gene/locuslink:RP57(gene name synonym)
## V6
## 1: entrez gene/locuslink:PKC-beta(gene name synonym)|entrez gene/locuslink:PKCB(gene name synonym)|entrez gene/locuslink:PRKCB1(gene name synonym)|entrez gene/locuslink:PRKCB2(gene name synonym)
## 2: entrez gene/locuslink:PDEG(gene name synonym)|entrez gene/locuslink:RP57(gene name synonym)
## 3: entrez gene/locuslink:ASV(gene name synonym)|entrez gene/locuslink:SRC1(gene name synonym)|entrez gene/locuslink:c-SRC(gene name synonym)|entrez gene/locuslink:p60-Src(gene name synonym)
## 4: entrez gene/locuslink:BARK1(gene name synonym)|entrez gene/locuslink:BETA-ARK1(gene name synonym)|entrez gene/locuslink:GRK2(gene name synonym)
## 5: entrez gene/locuslink:BARK1(gene name synonym)|entrez gene/locuslink:BETA-ARK1(gene name synonym)|entrez gene/locuslink:GRK2(gene name synonym)
## V7 V8
## 1: psi-mi:MI:0004(affinity chromatography technology) Yang XL (2003)
## 2: psi-mi:MI:0004(affinity chromatography technology) Wan KF (2003)
## 3: psi-mi:MI:0004(affinity chromatography technology) Wan KF (2003)
## 4: psi-mi:MI:0004(affinity chromatography technology) Wan KF (2003)
## 5: psi-mi:MI:0004(affinity chromatography technology) Wan KF (2003)
## V9 V10 V11
## 1: pubmed:12679936 taxid:9606 taxid:9606
## 2: pubmed:12624098 taxid:9606 taxid:9606
## 3: pubmed:12624098 taxid:9606 taxid:9606
## 4: pubmed:12624098 taxid:9606 taxid:9606
## 5: pubmed:12624098 taxid:9606 taxid:9606
## V12 V13
## 1: psi-mi:MI:0915(physical association) psi-mi:MI:0463(biogrid)
## 2: psi-mi:MI:0915(physical association) psi-mi:MI:0463(biogrid)
## 3: psi-mi:MI:0915(physical association) psi-mi:MI:0463(biogrid)
## 4: psi-mi:MI:0915(physical association) psi-mi:MI:0463(biogrid)
## 5: psi-mi:MI:0915(physical association) psi-mi:MI:0463(biogrid)
## V14 V15
## 1: biogrid:259364 -
## 2: biogrid:259971 -
## 3: biogrid:259972 -
## 4: biogrid:259974 -
## 5: biogrid:259977 -
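The raw columns printed above carry a database prefix before each value. As a rough sketch, the identifiers, the MI term, and the PMID can be pulled out with base R regular expressions. The two toy rows copy values from the output above, and the column names of `parsed` are made up for illustration.

```r
raw <- data.frame(
  V1 = c("entrez gene/locuslink:156", "entrez gene/locuslink:6714"),
  V2 = c("entrez gene/locuslink:5579", "entrez gene/locuslink:156"),
  V7 = "psi-mi:MI:0004(affinity chromatography technology)",
  V9 = c("pubmed:12679936", "pubmed:12624098"),
  stringsAsFactors = FALSE
)

parsed <- data.frame(
  # Drop everything up to and including the last colon
  id_A = sub("^.*:", "", raw$V1),
  id_B = sub("^.*:", "", raw$V2),
  # Extract the MI term ID, for example MI:0004
  method = regmatches(raw$V7, regexpr("MI:[0-9]{4}", raw$V7)),
  # Drop the pubmed: prefix
  pmid = sub("^pubmed:", "", raw$V9),
  stringsAsFactors = FALSE
)
```

Fields that hold several values separated by `|`, such as V3 above, would need an extra `strsplit` step that this sketch leaves out.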
All the same operations can be done using the function queryPSICQUICrlib, with the convenience of automatic tracking of the database release date and the exact query text. This function also returns the data as an object of class RAW_MItab which, once cleaned, is ready for use with other tools in the package.
BioGrid_156 = queryPSICQUICrlib(query = "id:156",
format = "tab25",
database = "BioGrid",
directory = "./")
## downloading using PSICQUIC
# The same protein, but only two-hybrid interactions
BioGrid_156_2H = queryPSICQUICrlib(query = "id:156 AND detmethod:\"MI:0018\"",
format = "tab25",
database = "BioGrid",
directory = "./")
## downloading using PSICQUIC
# The data returned by queryPSICQUICrlib contains auxiliary information that is not necessary for most analyses. Let's clean the data.
BioGrid_156 = cleanMITAB(BioGrid_156)
BioGrid_156$data[1:5]
## IDs_interactor_A IDs_interactor_B interactor_IDs_databases_A
## 1: 156 5579 entrez gene/locuslink
## 2: 156 5148 entrez gene/locuslink
## 3: 156 6714 entrez gene/locuslink
## 4: 156 2776 entrez gene/locuslink
## 5: 156 2769 entrez gene/locuslink
## interactor_IDs_databases_B Taxid_interactor_A Taxid_interactor_B
## 1: entrez gene/locuslink 9606 9606
## 2: entrez gene/locuslink 9606 9606
## 3: entrez gene/locuslink 9606 9606
## 4: entrez gene/locuslink 9606 9606
## 5: entrez gene/locuslink 9606 9606
## Publication_Identifiers Confidence_values pair_id
## 1: 12679936 NA 156|5579
## 2: 12624098 NA 156|5148
## 3: 12624098 NA 156|6714
## 4: 12885252 NA 156|2776
## 5: 12885252 NA 156|2769
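Keeping the query text and retrieval date next to the data makes a result easier to reproduce later. The sketch below shows one simple way to bundle them in base R. The `with_provenance` helper and the `toy_query_result` class are hypothetical and say nothing about how RAW_MItab objects are actually structured beyond the `$data` element used above.

```r
# Hypothetical helper: attach the query, database, and date to a result table
with_provenance <- function(data, query, database, retrieved = Sys.Date()) {
  structure(
    list(data = data, query = query, database = database, retrieved = retrieved),
    class = "toy_query_result"
  )
}

res <- with_provenance(
  data = data.frame(IDs_interactor_A = "156", IDs_interactor_B = "5579",
                    stringsAsFactors = FALSE),
  query = "id:156",
  database = "BioGrid"
)
```

With this layout, `res$data` holds the interactions while `res$query` and `res$retrieved` record how and when they were obtained.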
Sys.Date. = Sys.Date()
Sys.Date.
## [1] "2019-04-11"
session_info. = devtools::session_info()
session_info.
## ─ Session info ──────────────────────────────────────────────────────────
## setting value
## version R version 3.5.1 (2018-07-02)
## os macOS High Sierra 10.13.6
## system x86_64, darwin15.6.0
## ui X11
## language (EN)
## collate en_GB.UTF-8
## ctype en_GB.UTF-8
## tz Europe/London
## date 2019-04-11
##
## ─ Packages ──────────────────────────────────────────────────────────────
## package * version date lib source
## AnnotationDbi 1.44.0 2018-10-30 [1] Bioconductor
## assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.1)
## backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.0)
## Biobase 2.42.0 2018-10-30 [1] Bioconductor
## BiocGenerics * 0.28.0 2018-10-30 [1] Bioconductor
## biomaRt * 2.38.0 2018-10-30 [1] Bioconductor
## Biostrings 2.50.2 2019-01-03 [1] Bioconductor
## bit 1.1-14 2018-05-29 [1] CRAN (R 3.5.0)
## bit64 0.9-7 2017-05-08 [1] CRAN (R 3.5.0)
## bitops 1.0-6 2013-08-17 [1] CRAN (R 3.5.0)
## blob 1.1.1 2018-03-25 [1] CRAN (R 3.5.0)
## callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.2)
## cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.2)
## commonmark 1.7 2018-12-01 [1] CRAN (R 3.5.0)
## crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.0)
## curl 3.3 2019-01-10 [1] CRAN (R 3.5.2)
## data.table * 1.12.0 2019-01-13 [1] CRAN (R 3.5.2)
## DBI 1.0.0 2018-05-02 [1] CRAN (R 3.5.0)
## desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.0)
## devtools 2.0.1 2018-10-26 [1] CRAN (R 3.5.1)
## digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.0)
## downloader * 0.4 2015-07-09 [1] CRAN (R 3.5.0)
## evaluate 0.13 2019-02-12 [1] CRAN (R 3.5.2)
## fs 1.2.7 2019-03-19 [1] CRAN (R 3.5.2)
## glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.2)
## gsubfn * 0.7 2018-03-16 [1] CRAN (R 3.5.0)
## hms 0.4.2 2018-03-10 [1] CRAN (R 3.5.0)
## htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.0)
## httr * 1.4.0 2018-12-11 [1] CRAN (R 3.5.0)
## IRanges * 2.16.0 2018-10-30 [1] Bioconductor
## jsonlite * 1.6 2018-12-07 [1] CRAN (R 3.5.0)
## knitr 1.22 2019-03-08 [1] CRAN (R 3.5.1)
## magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.0)
## MASS 7.3-50 2018-04-30 [2] CRAN (R 3.5.1)
## memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.0)
## ontologyIndex * 2.5 2019-01-08 [1] CRAN (R 3.5.2)
## PItools * 0.1.41 2019-04-11 [1] local
## pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.5.1)
## pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.0)
## pkgdown 1.3.0 2018-12-07 [1] CRAN (R 3.5.0)
## pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.0)
## plyr * 1.8.4 2016-06-08 [1] CRAN (R 3.5.0)
## prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.0)
## processx 3.3.0 2019-03-10 [1] CRAN (R 3.5.2)
## progress 1.2.0 2018-06-14 [1] CRAN (R 3.5.0)
## proto * 1.0.0 2016-10-29 [1] CRAN (R 3.5.0)
## ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.0)
## PSICQUIC * 1.20.0 2018-10-30 [1] Bioconductor
## R.methodsS3 * 1.7.1 2016-02-16 [1] CRAN (R 3.5.0)
## R.oo * 1.22.0 2018-04-22 [1] CRAN (R 3.5.0)
## R.utils * 2.8.0 2019-02-14 [1] CRAN (R 3.5.2)
## R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2)
## Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.5.2)
## RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.5.2)
## remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.0)
## rlang 0.3.1 2019-01-08 [1] CRAN (R 3.5.2)
## rmarkdown 1.12 2019-03-14 [1] CRAN (R 3.5.2)
## roxygen2 6.1.1 2018-11-07 [1] CRAN (R 3.5.0)
## rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.0)
## RSQLite 2.1.1 2018-05-06 [1] CRAN (R 3.5.0)
## rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.5.2)
## S4Vectors * 0.20.1 2018-11-09 [1] Bioconductor
## sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.0)
## stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.2)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.2)
## testthat 2.0.1 2018-10-13 [1] CRAN (R 3.5.1)
## usethis 1.5.0 2019-04-07 [1] CRAN (R 3.5.1)
## withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.0)
## xfun 0.6 2019-04-02 [1] CRAN (R 3.5.1)
## XML 3.98-1.19 2019-03-06 [1] CRAN (R 3.5.2)
## xml2 1.2.0 2018-01-24 [1] CRAN (R 3.5.0)
## XVector 0.22.0 2018-10-30 [1] Bioconductor
## yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.0)
## zlibbioc 1.28.0 2018-10-30 [1] Bioconductor
##
## [1] /Users/vk7/Library/R/3.5/library
## [2] /Library/Frameworks/R.framework/Versions/3.5/Resources/library