This article and accompanying slides cover fundamentals of working with PPI data and the problem of study bias in protein interactions networks.

Outline

This workshop covers the following topics:
0. Brief intro into R Markdown
1. The fundamentals of protein-protein interactions (PPI) and networks (lecture)
2. Getting PPI data into R (from IntAct, using PSICQUIC)
3. Finding publication (Pubmed) IDs for every protein
4. Why should you care about the study bias in your network and how to evaluate it

1. The fundamentals of protein-protein interactions (PPI) and networks (lecture)

The PowerPoint presentation for this lecture can be found in the project folder (PPI_networks_and_study_bias.pptx).

2. Getting PPI data into R (from IntAct, using PSICQUIC)

You can download protein-protein interaction data directly from IntAct or using PSICQUIC and PSICQUIC client implemented in PSICQUIC R package. Both ways are implemented in PItools package. Downloading data from IntAct gets you all data from all IMEx consortium databases. PSICQUIC route allows to query non-IMEx databases and construct complex search queries.
As implemented in PItools, PSICQUIC route is slow to download (limit of 2500 interactions per second to lower the load on servers). IntAct route is faster but requires memory intensive processing locally (3 GB table).

First, let’s focus on the IntAct route. Later we will download molecular interactions from a large non-IMEx database called BioGRID.

2.1 Getting PPI data into R from IntAct

Load full human-human interactome and count interactions

By species search is taxonomy-hierarchy aware.

# find the date of the latest locally available release
human = fullInteractome(taxid = 9606, database = "IntActFTP", # 9606 - human taxid
                        clean = TRUE,
                        protein_only = TRUE,
                        directory = "./data_files/", # NULL to keep data files inside R library - default
                        releaseORdate = NULL) # useful to keep track of the release date e.g. 2019Mar23, but for the first download set to NULL 
## ... looking for the date of the latest IntAct release ...
## ... looking for the date of the latest IntAct release ...
## ... loading local copy ...
## Warning in fread(file_name, header = T, stringsAsFactors = F): Found
## and resolved improper quoting out-of-sample. First healed line 25383:
## <<uniprotkb:P16054 uniprotkb:Q05769 intact:EBI-298451 intact:EBI-298933|
## uniprotkb:Q543K3 psi-mi:kpce_mouse(display_long)|uniprotkb:Prkce(gene
## name)|psi-mi:Prkce(display_short)|uniprotkb:Pkce(gene name synonym)|
## uniprotkb:Pkcea(gene name synonym)|uniprotkb:nPKC-epsilon(gene name
## synonym) psi-mi:pgh2_mouse(display_long)|uniprotkb:Ptgs2(gene name)|
## psi-mi:Ptgs2(display_short)|uniprotkb:Cox-2(gene name synonym)|
## uniprotkb:Cox2(gene name synonym)|uniprotkb:Pghs-b(gene name synonym)|
## uniprotkb:Tis10(gene na>>. If the fields are not quoted (e.g. field
## separator does not appear within any field), try quote="" to avoid this
## warning.

Load all human-viral interactions from IntAct, BioGRID, VirHostNet

10239 is the top viral clade identifier.

# read all IntAct data (filter later)
IntAct = loadIntActFTP(dir = "./data_files/IntActRelease_2019Mar23/")
## ... looking for the date of the latest IntAct release ...
## ... loading local copy ...
## Warning in fread(file_name, header = T, stringsAsFactors = F): Found
## and resolved improper quoting out-of-sample. First healed line 25383:
## <<uniprotkb:P16054 uniprotkb:Q05769 intact:EBI-298451 intact:EBI-298933|
## uniprotkb:Q543K3 psi-mi:kpce_mouse(display_long)|uniprotkb:Prkce(gene
## name)|psi-mi:Prkce(display_short)|uniprotkb:Pkce(gene name synonym)|
## uniprotkb:Pkcea(gene name synonym)|uniprotkb:nPKC-epsilon(gene name
## synonym) psi-mi:pgh2_mouse(display_long)|uniprotkb:Ptgs2(gene name)|
## psi-mi:Ptgs2(display_short)|uniprotkb:Cox-2(gene name synonym)|
## uniprotkb:Cox2(gene name synonym)|uniprotkb:Pghs-b(gene name synonym)|
## uniprotkb:Tis10(gene na>>. If the fields are not quoted (e.g. field
## separator does not appear within any field), try quote="" to avoid this
## warning.
# filter for viral-human interaction
human_viral = interSpeciesInteractome(MITABdata = IntAct,
                                      taxid1 = 9606,  taxid2 = 10239,
                                      database = "IntActFTP",
                                      clean = TRUE, protein_only = TRUE, 
                                      directory = "./data_files/",
                        releaseORdate = "2019Mar23")
## Warning in fread(filename): Found and resolved improper quoting out-of-
## sample. First healed line 5839: <<542959 Bpp1virus "BPP-1-like viruses";
## BPP-1-like phages; Bpp-1-like viruses; Bppunalikevirus Genus Viruses; dsDNA
## viruses, no RNA stage; Caudovirales; Podoviridae 10744 >>. If the fields
## are not quoted (e.g. field separator does not appear within any field), try
## quote="" to avoid this warning.
## Warning in fread(filename): Found and resolved improper quoting out-of-
## sample. First healed line 5839: <<542959 Bpp1virus "BPP-1-like viruses";
## BPP-1-like phages; Bpp-1-like viruses; Bppunalikevirus Genus Viruses; dsDNA
## viruses, no RNA stage; Caudovirales; Podoviridae 10744 >>. If the fields
## are not quoted (e.g. field separator does not appear within any field), try
## quote="" to avoid this warning.
uniqueNinteractions(human_viral)
## [1] 15003
uniqueNinteractors(human_viral, taxid = 9606)
## [1] 4644
uniqueNinteractors(human_viral, taxid = 9606, inverse_filter = T)
## [1] 894
# retrieve all VirHostNet data 
# which does not understand taxonomy hierarhy so interSpeciesInteractome() will not work
VirHostNet_raw = queryPSICQUICrlib(query = "taxid:9606",
                                   format = "tab27", database = "VirHostNet",
                                   directory = "./data_files/",
                                   releaseORdate = NULL)
## downloading using PSICQUIC
VirHostNet = interSpeciesInteractome(taxid1 = 9606,  taxid2 = 10239, 
                                     MITABdata = VirHostNet_raw,
                                     clean = TRUE, protein_only = TRUE, 
                                     directory = "./data_files/")
## ... looking for the date of the latest IntAct release ...
## Warning in fread(filename): Found and resolved improper quoting out-of-
## sample. First healed line 5839: <<542959 Bpp1virus "BPP-1-like viruses";
## BPP-1-like phages; Bpp-1-like viruses; Bppunalikevirus Genus Viruses; dsDNA
## viruses, no RNA stage; Caudovirales; Podoviridae 10744 >>. If the fields
## are not quoted (e.g. field separator does not appear within any field), try
## quote="" to avoid this warning.

## Warning in fread(filename): Found and resolved improper quoting out-of-
## sample. First healed line 5839: <<542959 Bpp1virus "BPP-1-like viruses";
## BPP-1-like phages; Bpp-1-like viruses; Bppunalikevirus Genus Viruses; dsDNA
## viruses, no RNA stage; Caudovirales; Podoviridae 10744 >>. If the fields
## are not quoted (e.g. field separator does not appear within any field), try
## quote="" to avoid this warning.
# filter out human-human and viral-viral interactions 
VirHostNet$data = VirHostNet$data[!Taxid_interactor_A == Taxid_interactor_B]
uniqueNinteractions(VirHostNet)
## [1] 23109
uniqueNinteractors(VirHostNet, taxid = 9606)
## [1] 6216
uniqueNinteractors(VirHostNet, taxid = 9606, inverse_filter = T)
## [1] 766
# Find which interactions are uniquely present in VirHostNet (not IntAct)
VirHostNet$data = VirHostNet$data[!pair_id %in% human_viral$data$pair_id]
uniqueNinteractions(VirHostNet)
## [1] 17822
uniqueNinteractors(VirHostNet, taxid = 9606)
## [1] 5512
uniqueNinteractors(VirHostNet, taxid = 9606, inverse_filter = T)
## [1] 679

Load all human-mouse interactions

# filter for viral-mouse interaction
human_mouse = interSpeciesInteractome(taxid1 = 9606,  taxid2 = 10090,
                                      database = "IntActFTP", MITABdata = IntAct,
                                      clean = TRUE, protein_only = TRUE,
                                      directory = "./data_files/",
                        releaseORdate = "2019Mar23")
# save RData file to save time processing this data if you want to proceed to later sections immediately
rm(IntAct)
save(list = ls(), file = "./RData_steps/study_bias_doc_1.RData")

You can find out any other taxid if you know species name using UniProt taxonomy search: https://www.uniprot.org/taxonomy/

Count interactors and interactions

# load previosly saved RData
load("./RData_steps/study_bias_doc_1.RData")

uniqueNinteractions(human)
## [1] 196473
## [1] 21521
uniqueNinteractions(human_viral)
## [1] 15003
uniqueNinteractors(human_viral)
## [1] 5538
uniqueNinteractions(human_mouse)
## [1] 24960
uniqueNinteractors(human_mouse)
## [1] 10114
# how many of these proteins are human?
uniqueNinteractors(human_mouse, taxid = 9606)
## [1] 7060

filter human-human data by detection method and recount interactors and interactions

# subset two-hybrid interactions
human_two_hybrid = subsetMITABbyMethod(MITABdata = human,
                                       Interaction_detection_methods = "MI:0018")
## loading local copy of MI ontology
uniqueNinteractions(human_two_hybrid)
## [1] 83919
uniqueNinteractors(human_two_hybrid)
## [1] 14396
# subset all interactions but two-hybrid
human_NOT_two_hybrid = subsetMITABbyMethod(MITABdata = human,
                                           Interaction_detection_methods = "MI:0018", inverse_filter = T)
## loading local copy of MI ontology
uniqueNinteractions(human_NOT_two_hybrid)
## [1] 116651
uniqueNinteractors(human_NOT_two_hybrid)
## [1] 16616
# subset affinity purification - mass spectrometry interactions
human_AP_MS = subsetMITABbyMethod(MITABdata = human,
                                  Interaction_detection_methods = "MI:0004",  Identification_method = "MI:0433")
## loading local copy of MI ontology
uniqueNinteractions(human_AP_MS)
## [1] 69044
uniqueNinteractors(human_AP_MS)
## [1] 12465

filter human-human data by PMID and recount interactors and interactions

# subset both published and unpublished Vidal group data
Vidal_all = subsetMITABbyPMIDs(MITABdata = human,
                               PMIDs = c("25416956", "unassigned1304"))
uniqueNinteractions(Vidal_all)
## [1] 59037
## [1] 9756
# subset Mattias Mann 2015 paper data
Mann = subsetMITABbyPMIDs(MITABdata = human,
                          PMIDs = "26496610")
uniqueNinteractions(Mann)
## [1] 15589
## [1] 4950

You can get help and more details on these functions (for example, how to find which molecular ontology terms correspond to which methods): ?subsetMITABbyMethod

Find interactions between components of the mediator complex in the Vidal dataset

mediator_complex_proteins = fread("http://www.uniprot.org/uniprot/?query=GO:0016592%20AND%20taxonomy:9606&format=tab&columns=id")
mediator_complex = subsetMITABbyID(Vidal_all,
                                   ID_seed = mediator_complex_proteins$Entry,
                                   within_seed = T, only_seed2nonseed = F)
uniqueNinteractions(mediator_complex)
## [1] 5
uniqueNinteractors(mediator_complex)
## [1] 7

Find interactions of the components of the mediator complex with other proteins

mediator_complex_interactions = subsetMITABbyID(Vidal_all,
                                                ID_seed = mediator_complex_proteins$Entry,
                                                within_seed = F, only_seed2nonseed = T)
uniqueNinteractions(mediator_complex_interactions)
## [1] 179
uniqueNinteractors(mediator_complex_interactions)
## [1] 179

2.2 Getting PPI data from non-IMEx databases into R using PSICQUIC

Let’s retrieve all human-viral interactions from BioGrid database. But first we need to find correct database names.

Find correct database names.

client = PSICQUIC()
providers(client)
##  [1] "BioGrid"           "bhf-ucl"           "ChEMBL"           
##  [4] "HPIDb"             "InnateDB"          "InnateDB-All"     
##  [7] "IntAct"            "IMEx"              "mentha"           
## [10] "MPIDB"             "iRefIndex"         "MatrixDB"         
## [13] "MINT"              "Reactome"          "Reactome-FIs"     
## [16] "EBI-GOA-miRNA"     "UniProt"           "MBInfo"           
## [19] "VirHostNet"        "BAR"               "EBI-GOA-nonIntAct"
## [22] "ZINC"
# Don't run
human_viral_BioGrid = interSpeciesInteractome(taxid1 = 9606,  taxid2 = 10239,
                                              database = "BioGrid",
                                              clean = TRUE, protein_only = TRUE,
                                              directory = "./data_files/")
human_BioGrid = fullInteractome(taxid = 9606, database = "BioGrid", # 9606 - human taxid
                        clean = TRUE,
                        protein_only = TRUE,
                        directory = "./data_files/")

3. Finding publication (Pubmed) IDs for every protein

There are many ways to find publications for a given protein (all proteins). We will discuss 3 and focus on one (UniProtKB references).

  1. UniProtKB references. Details: https://www.uniprot.org/help/publications_section
    Quote:
    “The set of publications fully curated in UniProtKB/Swiss-Prot and publications imported in UniProtKB/TrEMBL is complemented by additional publications that have been computationally mapped from other resources to UniProtKB entries.”

https://www.uniprot.org/help/publications_section#additional_bibliography:
“As a comprehensive and high-quality resource of protein sequence and functional information, UniProtKB strives to provide comprehensive literature citations associated with protein sequences and their characterization. Currently about 2 thirds of the UniProtKB PubMed citations are found in UniProtKB/Swiss-Prot, as a result of active integration in the course of manual curation.

In order to keep up with the explosive growth of literature and to give our users access to additional publications, we decided to integrate additional sources of literature from other annotated databases into UniProtKB. For this purpose we selected a number external databases, e.g. Entrez Gene (GeneRIFs), SGD, MGI, GAD and PDB, and extracted citations that were mapped to UniProtKB entries. This additional protein bibliography information helps our users to better explore the existing knowledge of their proteins of interest."

  1. Europe PMC text-mining efforts yeld multiple identifier-publication associations. Details can be found here: https://europepmc.org/downloads. UniProtKB to pubmedID mapping can be downloaded from ftp: ftp://ftp.ebi.ac.uk/pub/databases/pmc/TextMinedTerms/

  2. NCBI portal: references for entrez geneID. Similar to UniProtKB references, but different ID, documentation about how these links are maintained is provided here: https://www.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html#gene. GeneID to pubmedID mapping can be downloaded from NCBI ftp: ftp://ftp.ncbi.nih.gov/gene/DATA/gene2pubmed.gz

3.1 Retrive UniProtKB references using Uniprot REST API

Let’s download the list of NCBI PubMed identifiers (PubMed IDs) associated with the UniProtKB entry (human proteins) and computationally mapped to the UniProtKB entry (human proteins).

# retrieve human interactions (taxonomy:9606) from SwissProt, a manually reviewed part or the UniProt (reviewed:yes)
url = "https://www.uniprot.org/uniprot/?query=taxonomy:9606&compress=yes&format=tab&columns=id,citation,citationmapping"
filename.gz = "./data_files/uniprot2pubmed_human.tsv.gz"
filename = "./data_files/uniprot2pubmed_human.tsv"
if(!file.exists(filename.gz)) download(url, filename.gz)
R.utils::gunzip(filename.gz, filename, remove = F, overwrite = T)

# Read into R
uniprot2pubmed_human = fread(filename, header = T, stringsAsFactors = F)

# We need to do some processing
## 1. Merge manually associated and computationally mapped publications 
uniprot2pubmed_human[, PMIDs := paste0(`PubMed ID`,";",`Mapped PubMed ID`)]
uniprot2pubmed_human[`PubMed ID` == "" & `Mapped PubMed ID` == "", PMIDs := ""]
uniprot2pubmed_human[`PubMed ID` == "" & `Mapped PubMed ID` != "", PMIDs := `Mapped PubMed ID`]
uniprot2pubmed_human[`PubMed ID` != "" & `Mapped PubMed ID` == "", PMIDs := `PubMed ID`]
uniprot2pubmed_human$`PubMed ID` = NULL
uniprot2pubmed_human$`Mapped PubMed ID` = NULL
## 2. Split concatenated PubMed IDs and count their number per protein
uniprot2pubmed_human = uniprot2pubmed_human[, .(N_PMIDs = uniqueN(unlist(tstrsplit(PMIDs,";")))), by = Entry]

# remove objects necessary for the next part, save
rm(list = ls()[!ls() %in% c("human", "uniprot2pubmed_human", "Vidal_all", "Mann", "human_two_hybrid", "human_AP_MS")])
save(list = ls(), file = "./RData_steps/study_bias_doc_2.RData")

4. Evaluating bias of protein popularity in the literature

Why should you care about the study bias in your network and how to evaluate it

Back to the presentation, slide 61. Study bias is the phenomenon that causes certain properties of proteins to appear biologically meaningful but actually confounded with how well studies proteins are overall. Study bias may cause certain proteins to appear as hubs in the protein interaction network. This may lead to false conclusions regarding the importance of the protein in the network. Therefore, it is important to estimate and if possible correct the effect of study bias on the property of your interest.

First, we need to bin proteins by how often they are mentioned in the literature

Let’s use the number of UniProt reference as a proxy for how well studied the protein is. We will bin all proteins that have interactions in our network by the number of publications.

load("./RData_steps/study_bias_doc_2.RData")
uniprot2pubmed_human = uniprot2pubmed_human[order(N_PMIDs, decreasing = T)]
uniprot2pubmed_human_w = uniprot2pubmed_human[Entry %in% extractInteractors(human)]

# Most proteins have very low number of publications:
hist(log10(uniprot2pubmed_human_w$N_PMIDs), xlab = "log10(number of publications) per protein")

n_bins = 50
n_proteins = nrow(uniprot2pubmed_human_w)
n_proteins_per_bin = round(n_proteins/n_bins)
bins = rep(1:50, each = n_proteins_per_bin)
if(n_proteins > length(bins)){
  bins = c(bins, rep(50, n_proteins - length(bins)))
} else if(n_proteins < length(bins)){
  bins = bins[1:n_proteins]
}


uniprot2pubmed_human_w[, bins := bins]
bin_lists = split(uniprot2pubmed_human_w$Entry, uniprot2pubmed_human_w$bins)

Second, we need to count interactions within and between bins

This method shows the gist of the problem of study bias but needs further improvement.

unique_interactions = unique(human$data[, .(IDs_interactor_A, IDs_interactor_B)])
N_interaction_table = lapply(1:length(bin_lists), function(bin_list1_N, unique_interactions, bin_lists){
    N_interaction_table = lapply(1:length(bin_lists), function(bin_list2_N, bin_list1_N, unique_interactions, bin_lists) {
      bin_list1 = bin_lists[[bin_list1_N]]
      bin_list2 = bin_lists[[bin_list2_N]]
      N_proteins = uniqueN(unique_interactions[(IDs_interactor_A %in% bin_list1 &
                                     IDs_interactor_B %in% bin_list2) |
                                     (IDs_interactor_A %in% bin_list2 &
                                     IDs_interactor_B %in% bin_list1), c(IDs_interactor_A, IDs_interactor_B)])
      N_interactions = unique_interactions[,sum((IDs_interactor_A %in% bin_list1 &
                                     IDs_interactor_B %in% bin_list2) |
                                     (IDs_interactor_A %in% bin_list2 &
                                     IDs_interactor_B %in% bin_list1))]
      data.table(N_interactions_per_protein = N_interactions / N_proteins,
                 x = bin_list1_N, y = bin_list2_N,
                 N_interactions = N_interactions, N_proteins = N_proteins)
    }, bin_list1_N, unique_interactions, bin_lists)
    Reduce(rbind, N_interaction_table)
}, unique_interactions, bin_lists)
N_interaction_table = Reduce(rbind, N_interaction_table)

Third, we need to plot our results

cols = c(colorRampPalette(c("white", "#95cbee", "#0099dc", "#4ab04a", "#ffd73e"), bias = 2)(10), #"#c9e2f6"
            colorRampPalette(c("#eec73a", "#e29421", "#e29421", "#f05336","#ce472e"))(20))
ggplot(N_interaction_table, aes(x, y)) +
 geom_raster(aes(fill = N_interactions_per_protein)) +
    xlab("rank by number of studies") +
  ylab("rank by number of studies") +
  theme_bw() +
    scale_fill_gradientn(colours=cols,
                         na.value=rgb(246, 246, 246, max=255),
                         guide=guide_colourbar(ticks=T, nbin=50,
                                               barheight=15, label=T))

Finally, we can write a function to analyse other datasets

First, let’s put what we just did into a function.

interactionBias = function(ppi_network, uniprot2pubmed, n_bins = 50){
  uniprot2pubmed = copy(uniprot2pubmed)
  uniprot2pubmed = uniprot2pubmed[order(N_PMIDs, decreasing = T)]
  uniprot2pubmed = uniprot2pubmed[Entry %in% extractInteractors(human)]
  
  n_proteins = nrow(uniprot2pubmed)
  n_proteins_per_bin = round(n_proteins/n_bins)
  bins = rep(1:n_bins, each = n_proteins_per_bin)
  if(n_proteins > length(bins)){
    bins = c(bins, rep(n_bins, n_proteins - length(bins)))
  } else if(n_proteins < length(bins)){
    bins = bins[1:n_proteins]
  }
  
  uniprot2pubmed[, bins := bins]
  bin_lists = split(uniprot2pubmed$Entry, uniprot2pubmed$bins)
  unique_interactions = unique(ppi_network$data[, .(IDs_interactor_A, IDs_interactor_B)])
  N_interaction_table = lapply(1:length(bin_lists), function(bin_list1_N, unique_interactions, bin_lists){
    N_interaction_table = lapply(1:length(bin_lists), function(bin_list2_N, bin_list1_N, unique_interactions, bin_lists) {
      bin_list1 = bin_lists[[bin_list1_N]]
      bin_list2 = bin_lists[[bin_list2_N]]
      N_proteins = uniqueN(unique_interactions[(IDs_interactor_A %in% bin_list1 &
                                                  IDs_interactor_B %in% bin_list2) |
                                                 (IDs_interactor_A %in% bin_list2 &
                                                    IDs_interactor_B %in% bin_list1), c(IDs_interactor_A, IDs_interactor_B)])
      N_interactions = unique_interactions[,sum((IDs_interactor_A %in% bin_list1 &
                                                   IDs_interactor_B %in% bin_list2) |
                                                  (IDs_interactor_A %in% bin_list2 &
                                                     IDs_interactor_B %in% bin_list1))]
      data.table(N_interactions_per_protein = N_interactions / N_proteins,
                 x = bin_list1_N, y = bin_list2_N,
                 N_interactions = N_interactions, N_proteins = N_proteins)
    }, bin_list1_N, unique_interactions, bin_lists)
    Reduce(rbind, N_interaction_table)
  }, unique_interactions, bin_lists)
  N_interaction_table = Reduce(rbind, N_interaction_table)
  return(N_interaction_table)
}

4.2 You can see that Marc Vidal two-hybrid and Mattias Mann AP-MS data is substantially less biased by protein popularity among in the literature

N_interaction_table[, title := "All IntAct data"]
N_interaction_Mann = interactionBias(ppi_network = Mann, uniprot2pubmed = uniprot2pubmed_human, n_bins = 50)
N_interaction_Mann[, title := "Mattias Mann unbiased AP-MS dataset"]

N_interaction_Vidal = interactionBias(ppi_network = Vidal_all, uniprot2pubmed = uniprot2pubmed_human, n_bins = 50)
N_interaction_Vidal[, title := "Marc Vidal unbiased two-hybrid dataset"]

N_interaction_two_hybrid = interactionBias(ppi_network = human_two_hybrid, uniprot2pubmed = uniprot2pubmed_human, n_bins = 50)
N_interaction_two_hybrid[, title := "all two-hybrid data"]

N_interaction_AP_MS = interactionBias(ppi_network = human_AP_MS, uniprot2pubmed = uniprot2pubmed_human, n_bins = 50)
N_interaction_AP_MS[, title := "all AP-MS data"]

N_interaction = Reduce(rbind, list(N_interaction_Mann, N_interaction_Vidal,
                                   N_interaction_two_hybrid, N_interaction_AP_MS, N_interaction_table))

ggplot(N_interaction, aes(x, y)) +
  geom_raster(aes(fill = N_interactions_per_protein)) +
  xlab("rank by number of studies") + ylab("rank by number of studies") +
  facet_wrap( ~ title, ncol = 2) +
  theme_bw() +
    scale_fill_gradientn(colours=cols,
                         na.value=rgb(246, 246, 246, max=255),
                         guide=guide_colourbar(ticks=T, nbin=50,
                                               barheight=20, label=T))

R and system details

## [1] "2019-04-11"
devtools::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.5.1 (2018-07-02)
##  os       macOS High Sierra 10.13.6   
##  system   x86_64, darwin15.6.0        
##  ui       X11                         
##  language (EN)                        
##  collate  en_GB.UTF-8                 
##  ctype    en_GB.UTF-8                 
##  tz       Europe/London               
##  date     2019-04-11                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package       * version   date       lib source        
##  AnnotationDbi   1.44.0    2018-10-30 [1] Bioconductor  
##  assertthat      0.2.1     2019-03-21 [1] CRAN (R 3.5.1)
##  backports       1.1.3     2018-12-14 [1] CRAN (R 3.5.0)
##  Biobase         2.42.0    2018-10-30 [1] Bioconductor  
##  BiocGenerics  * 0.28.0    2018-10-30 [1] Bioconductor  
##  biomaRt       * 2.38.0    2018-10-30 [1] Bioconductor  
##  Biostrings      2.50.2    2019-01-03 [1] Bioconductor  
##  bit             1.1-14    2018-05-29 [1] CRAN (R 3.5.0)
##  bit64           0.9-7     2017-05-08 [1] CRAN (R 3.5.0)
##  bitops          1.0-6     2013-08-17 [1] CRAN (R 3.5.0)
##  blob            1.1.1     2018-03-25 [1] CRAN (R 3.5.0)
##  callr           3.2.0     2019-03-15 [1] CRAN (R 3.5.2)
##  cli             1.1.0     2019-03-19 [1] CRAN (R 3.5.2)
##  colorspace      1.4-1     2019-03-18 [1] CRAN (R 3.5.2)
##  commonmark      1.7       2018-12-01 [1] CRAN (R 3.5.0)
##  crayon          1.3.4     2017-09-16 [1] CRAN (R 3.5.0)
##  curl            3.3       2019-01-10 [1] CRAN (R 3.5.2)
##  data.table    * 1.12.0    2019-01-13 [1] CRAN (R 3.5.2)
##  DBI             1.0.0     2018-05-02 [1] CRAN (R 3.5.0)
##  desc            1.2.0     2018-05-01 [1] CRAN (R 3.5.0)
##  devtools        2.0.1     2018-10-26 [1] CRAN (R 3.5.1)
##  digest          0.6.18    2018-10-10 [1] CRAN (R 3.5.0)
##  downloader    * 0.4       2015-07-09 [1] CRAN (R 3.5.0)
##  dplyr           0.8.0.1   2019-02-15 [1] CRAN (R 3.5.2)
##  evaluate        0.13      2019-02-12 [1] CRAN (R 3.5.2)
##  fs              1.2.7     2019-03-19 [1] CRAN (R 3.5.2)
##  ggplot2       * 3.1.1     2019-04-07 [1] CRAN (R 3.5.1)
##  glue            1.3.1     2019-03-12 [1] CRAN (R 3.5.2)
##  gsubfn        * 0.7       2018-03-16 [1] CRAN (R 3.5.0)
##  gtable          0.3.0     2019-03-25 [1] CRAN (R 3.5.1)
##  hms             0.4.2     2018-03-10 [1] CRAN (R 3.5.0)
##  htmltools       0.3.6     2017-04-28 [1] CRAN (R 3.5.0)
##  httr          * 1.4.0     2018-12-11 [1] CRAN (R 3.5.0)
##  IRanges       * 2.16.0    2018-10-30 [1] Bioconductor  
##  jsonlite      * 1.6       2018-12-07 [1] CRAN (R 3.5.0)
##  knitr           1.22      2019-03-08 [1] CRAN (R 3.5.1)
##  labeling        0.3       2014-08-23 [1] CRAN (R 3.5.0)
##  lazyeval        0.2.2     2019-03-15 [1] CRAN (R 3.5.2)
##  magrittr        1.5       2014-11-22 [1] CRAN (R 3.5.0)
##  MASS            7.3-50    2018-04-30 [2] CRAN (R 3.5.1)
##  memoise         1.1.0     2017-04-21 [1] CRAN (R 3.5.0)
##  munsell         0.5.0     2018-06-12 [1] CRAN (R 3.5.0)
##  ontologyIndex * 2.5       2019-01-08 [1] CRAN (R 3.5.2)
##  pillar          1.3.1     2018-12-15 [1] CRAN (R 3.5.0)
##  PItools       * 0.1.41    2019-04-11 [1] local         
##  pkgbuild        1.0.3     2019-03-20 [1] CRAN (R 3.5.1)
##  pkgconfig       2.0.2     2018-08-16 [1] CRAN (R 3.5.0)
##  pkgdown         1.3.0     2018-12-07 [1] CRAN (R 3.5.0)
##  pkgload         1.0.2     2018-10-29 [1] CRAN (R 3.5.0)
##  plyr          * 1.8.4     2016-06-08 [1] CRAN (R 3.5.0)
##  prettyunits     1.0.2     2015-07-13 [1] CRAN (R 3.5.0)
##  processx        3.3.0     2019-03-10 [1] CRAN (R 3.5.2)
##  progress        1.2.0     2018-06-14 [1] CRAN (R 3.5.0)
##  proto         * 1.0.0     2016-10-29 [1] CRAN (R 3.5.0)
##  ps              1.3.0     2018-12-21 [1] CRAN (R 3.5.0)
##  PSICQUIC      * 1.20.0    2018-10-30 [1] Bioconductor  
##  purrr           0.3.2     2019-03-15 [1] CRAN (R 3.5.2)
##  R.methodsS3   * 1.7.1     2016-02-16 [1] CRAN (R 3.5.0)
##  R.oo          * 1.22.0    2018-04-22 [1] CRAN (R 3.5.0)
##  R.utils       * 2.8.0     2019-02-14 [1] CRAN (R 3.5.2)
##  R6              2.4.0     2019-02-14 [1] CRAN (R 3.5.2)
##  Rcpp            1.0.1     2019-03-17 [1] CRAN (R 3.5.2)
##  RCurl           1.95-4.12 2019-03-04 [1] CRAN (R 3.5.2)
##  remotes         2.0.2     2018-10-30 [1] CRAN (R 3.5.0)
##  rlang           0.3.1     2019-01-08 [1] CRAN (R 3.5.2)
##  rmarkdown       1.12      2019-03-14 [1] CRAN (R 3.5.2)
##  roxygen2        6.1.1     2018-11-07 [1] CRAN (R 3.5.0)
##  rprojroot       1.3-2     2018-01-03 [1] CRAN (R 3.5.0)
##  RSQLite         2.1.1     2018-05-06 [1] CRAN (R 3.5.0)
##  rstudioapi      0.10      2019-03-19 [1] CRAN (R 3.5.2)
##  S4Vectors     * 0.20.1    2018-11-09 [1] Bioconductor  
##  scales          1.0.0     2018-08-09 [1] CRAN (R 3.5.0)
##  sessioninfo     1.1.1     2018-11-05 [1] CRAN (R 3.5.0)
##  stringi         1.4.3     2019-03-12 [1] CRAN (R 3.5.2)
##  stringr         1.4.0     2019-02-10 [1] CRAN (R 3.5.2)
##  testthat        2.0.1     2018-10-13 [1] CRAN (R 3.5.1)
##  tibble          2.1.1     2019-03-16 [1] CRAN (R 3.5.2)
##  tidyselect      0.2.5     2018-10-11 [1] CRAN (R 3.5.0)
##  usethis         1.5.0     2019-04-07 [1] CRAN (R 3.5.1)
##  withr           2.1.2     2018-03-15 [1] CRAN (R 3.5.0)
##  xfun            0.6       2019-04-02 [1] CRAN (R 3.5.1)
##  XML             3.98-1.19 2019-03-06 [1] CRAN (R 3.5.2)
##  xml2            1.2.0     2018-01-24 [1] CRAN (R 3.5.0)
##  XVector         0.22.0    2018-10-30 [1] Bioconductor  
##  yaml            2.2.0     2018-07-25 [1] CRAN (R 3.5.0)
##  zlibbioc        1.28.0    2018-10-30 [1] Bioconductor  
## 
## [1] /Users/vk7/Library/R/3.5/library
## [2] /Library/Frameworks/R.framework/Versions/3.5/Resources/library