1. Introduction

Volcano plots represent a useful way to visualise the results of differential expression analyses. Here, we present a highly-configurable function that produces publication-ready volcano plots. EnhancedVolcano (Blighe, Rana, and Lewis 2018) will attempt to fit as many labels in the plot window as possible, thus avoiding ‘clogging’ up the plot with labels that could not otherwise have been read. Other functionality allows the user to identify up to 5 different types of attributes in the same plot space via colour, shape, size, encircling, and shade parameter configurations.

2. Installation

2.1 Download the package from Bioconductor

if (!requireNamespace('BiocManager', quietly = TRUE))
  install.packages('BiocManager')

BiocManager::install('EnhancedVolcano')
## Bioconductor version 3.19 (BiocManager 1.30.23), R 4.4.0 (2024-04-24)
## Warning: package(s) not installed when version(s) same as or greater than current; use
##   `force = TRUE` to re-install: 'EnhancedVolcano'
## Old packages: 'abind', 'afex', 'ape', 'arrow', 'backports', 'bayestestR', 'BH',
##   'bigD', 'BiocManager', 'Biostrings', 'bit', 'bit64', 'bitops', 'BMA', 'boot',
##   'broom', 'canvasXpress', 'car', 'caTools', 'class', 'cluster', 'collapse',
##   'collections', 'commonmark', 'correlation', 'corrplot', 'COUNT',
##   'credentials', 'data.table', 'datawizard', 'DBI', 'DEoptimR', 'Deriv',
##   'diffobj', 'doBy', 'dotCall64', 'downlit', 'dqrng', 'Ecfun', 'effectsize',
##   'ensembldb', 'faraway', 'fastDummies', 'fastmatch', 'fda', 'fields',
##   'fitdistrplus', 'FNN', 'foreign', 'future', 'future.apply', 'gcookbook',
##   'GenomeInfoDb', 'GenomicRanges', 'gert', 'ggblanket', 'gginnards', 'ggiraph',
##   'ggnewscale', 'ggrepel', 'ggstatsplot', 'gh', 'globals', 'gmp',
##   'GPArotation', 'gplots', 'gt', 'haven', 'HDF5Array', 'hdf5r', 'hexbin',
##   'HistData', 'HSAUR', 'httpuv', 'httr2', 'hypergeo', 'igraph', 'inline',
##   'insight', 'IRanges', 'jpeg', 'KEGGREST', 'KernSmooth', 'KMsurv', 'ks',
##   'labelled', 'lattice', 'lintr', 'lme4', 'locfit', 'lubridate', 'markdown',
##   'MASS', 'Matrix', 'MatrixModels', 'matrixStats', 'maxstat', 'mgcv',
##   'microbenchmark', 'miniUI', 'minqa', 'msme', 'mvtnorm', 'nlme', 'nloptr',
##   'nnet', 'parallelly', 'parameters', 'patchwork', 'pbdZMQ', 'pbkrtest',
##   'performance', 'pkgbuild', 'pkgdown', 'pkgload', 'plm', 'PMCMRplus',
##   'polyclip', 'poweRlaw', 'processx', 'profvis', 'progressr', 'ps', 'psych',
##   'quantreg', 'qvcalc', 'R.cache', 'R.oo', 'R.utils', 'ragg', 'RANN', 'raster',
##   'RcppArmadillo', 'RcppEigen', 'RcppRoll', 'RcppTOML', 'RCurl', 'Rdpack',
##   'reactR', 'readxl', 'renv', 'reprex', 'reticulate', 'rjson', 'Rmpfr',
##   'roxygen2', 'rpart', 'rrcov', 'RSpectra', 'RSQLite', 'rstudioapi',
##   'S4Arrays', 'S4Vectors', 'sctransform', 'sessioninfo', 'Seurat',
##   'SeuratObject', 'shiny', 'shinydashboard', 'Signac', 'sp', 'spam',
##   'SparseArray', 'SparseM', 'spatial', 'spatstat.data', 'spatstat.explore',
##   'spatstat.geom', 'spatstat.random', 'spatstat.sparse', 'spatstat.utils',
##   'statsExpressions', 'SuppDists', 'survival', 'survminer', 'svglite',
##   'systemfonts', 'tensor', 'terra', 'testthat', 'textshaping', 'tzdb',
##   'usethis', 'uwot', 'V8', 'VGAM', 'vioplot', 'waldo', 'WRS2', 'XML', 'xml2',
##   'zeallot', 'zip', 'zoo'

2.2 Load the package into R session

Note: to install development version:

if (!("EnhancedVolcano" %in% installed.packages())) {
    devtools::install_github('kevinblighe/EnhancedVolcano')
}
library(EnhancedVolcano)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.1
## Loading required package: ggrepel
if (!("canvasXpress" %in% installed.packages())) {
    devtools::install_github('neuhausi/canvasXpress')
}
#devtools::install_local("~/git/canvas/R/canvasXpress.tar.gz", build_manual = TRUE, upgrade = "always")
library(canvasXpress)

3. Quick start

For this example, we will follow the tutorial (from Section 3.1) of RNA-seq workflow: gene-level exploratory analysis and differential expression. Specifically, we will load the ‘airway’ data, where different airway smooth muscle cells were treated with dexamethasone.

library(airway)
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,
##     tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
library(magrittr)
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:GenomicRanges':
## 
##     subtract
data('airway')
airway$dex %<>% relevel('untrt')

Annotate the Ensembl gene IDs to gene symbols:

ens <- rownames(airway)

library(org.Hs.eg.db)
## Loading required package: AnnotationDbi
## 
symbols <- mapIds(org.Hs.eg.db, keys = ens, column = c('SYMBOL'), keytype = 'ENSEMBL')
## 'select()' returned 1:many mapping between keys and columns
symbols <- symbols[!is.na(symbols)]
symbols <- symbols[match(rownames(airway), names(symbols))]
rownames(airway) <- symbols
keep <- !is.na(rownames(airway))
airway <- airway[keep,]

Conduct differential expression using DESeq2 in order to create 2 sets of results:

library('DESeq2')

dds <- DESeqDataSet(airway, design = ~ cell + dex)
## Warning in DESeqDataSet(airway, design = ~cell + dex): 1337 duplicate rownames
## were renamed by adding numbers
dds <- DESeq(dds, betaPrior=FALSE)
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
res <- results(dds, contrast = c('dex','trt','untrt'))
res <- lfcShrink(dds, contrast = c('dex','trt','untrt'), res=res, type = 'normal')
## using 'normal' for LFC shrinkage, the Normal prior from Love et al (2014).
## 
## Note that type='apeglm' and type='ashr' have shown to have less bias than type='normal'.
## See ?lfcShrink for more details on shrinkage type, and the DESeq2 vignette.
## Reference: https://doi.org/10.1093/bioinformatics/bty895

3.1 Plot the most basic volcano plot

For the most basic volcano plot, only a single data-frame, data-matrix, or tibble of test results is required, containing point labels, log2FC, and adjusted or unadjusted P values. The default cut-off for log2FC is >|2|; the default cut-off for P value is 10e-6.

v <- EnhancedVolcano(res,
  lab = rownames(res),
  x = 'log2FoldChange',
  y = 'pvalue')
v

canvasXpress(v, width = 700, height = 500)

4. Advanced features

Virtually all aspects of an EnhancedVolcano plot can be configured for the purposes of accommodating all types of statistical distributions and labelling preferences. By default, EnhancedVolcano will only attempt to label genes that pass the thresholds that you set for statistical significance, i.e., ‘pCutoff’ and ‘FCcutoff’. In addition, it will only label as many of these that can reasonably fit in the plot space. The user can optionally supply a vector of labels (as ‘selectLab’) that s/he wishes to label in the plot.

4.1 Modify cut-offs for log2FC and P value; specify title; adjust point and label size

The default P value cut-off of 10e-6 may be too relaxed for most studies, which may therefore necessitate increasing this threshold by a few orders of magnitude. Equally, the log2FC cut-offs may be too stringent, given that moderated ‘shrunk’ estimates of log2FC differences in differential expression analysis can now be calculated.

In this example, we also modify the point and label size, which can help to improve clarity where many variables went into the differential expression analysis.

v <- EnhancedVolcano(res,
  lab = rownames(res),
  x = 'log2FoldChange',
  y = 'pvalue',
  title = 'N061011 versus N61311',
  pCutoff = 10e-32,
  FCcutoff = 0.5,
  pointSize = 3.0,
  labSize = 6.0)
v

canvasXpress(v, width = 700, height = 500)

4.2 Adjust colour and alpha for point shading

The default colour scheme may not be to everyone’s taste. Here we make it such that only the variables passing both the log2FC and P value thresholds are coloured red, with everything else black. We also adjust the value for ‘alpha’, which controls the transparency of the plotted points: 1 = 100% opaque; 0 = 100% transparent.

v <- EnhancedVolcano(res,
  lab = rownames(res),
  x = 'log2FoldChange',
  y = 'pvalue',
  title = 'N061011 versus N61311',
  pCutoff = 10e-16,
  FCcutoff = 1.5,
  pointSize = 3.0,
  labSize = 6.0,
  col=c('black', 'black', 'black', 'red3'),
  colAlpha = 1)
v

canvasXpress(v, width = 700, height = 500)