Overview

This package provides a number of plot wrappers that create common bioinformatic analyses visualizations and work with results from our other packages, such as kimma and SEARchways. As such, you will need to install and load several packages to access example data.

Setup

Software and packages

This pipeline should be completed in R version 4.5.0 (2025-04-11) or newer. Please also download the following packages.

# CRAN packages
install.packages(c("tidyverse","patchwork"))

# Bioconductor packages
## These are dependencies for kimma that are beneficial to download separately
install.packages("BiocManager")
BiocManager::install(c("limma", "edgeR"))

# GitHub packages
install.packages("devtools")
devtools::install_github("BIGslu/kimma")
devtools::install_github("BIGslu/BIGpicture")
devtools::install_github("BIGslu/RNAetc")
devtools::install_github("BIGslu/SEARchways")

And load them into your current R session.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(edgeR)

## Loading required package: limma

library(limma)
library(kimma)
library(BIGpicture)
library(SEARchways)
library(patchwork)

To ensure reproducibility with this document, set the random seed value.

set.seed(651)

Data background

Example data were obtained from media controls and human rhinovirus (HRV) infected human plasmacytoid dendritic cells. You can learn more about these data in

Dill-McFarland KA, Schwartz JT, Zhao H, Shao B, Fulkerson PC, Altman MC, Gill MA. 2022. Eosinophil-mediated suppression and Anti-IL-5 enhancement of plasmacytoid dendritic cell interferon responses in asthma. J Allergy Clin Immunol. Epub ahead of print. doi: 10.1016/j.jaci.2022.03.025. — GitHub

Specifically, this vignette uses bulk human RNA-seq processed from fastq to counts using SEAsnake and counts to voom using edgeR/limma. This results in voom-normalized, log2 counts per million (CPM) gene expression and associated sample and gene metadata. In total, 6 donors and 1000 random genes were selected for this vignette. Actual donor identifiers have been altered for privacy.

Load data

Unnormalized RNAseq counts and metadata.

example.dat <- kimma::example.dat
names(example.dat)

## [1] "counts"  "samples" "genes"

Voom normalized RNAseq counts and metadata.

example.voom <- kimma::example.voom
names(example.voom)

## [1] "genes"   "targets" "E"       "weights" "design"

kmFit model results.

example.model <- BIGpicture::example.model

Create an additional kmFit model for plotting.

example.model2 <- kimma::kmFit(
  dat = example.voom,
  model = "~virus*asthma + median_cv_coverage + (1|ptID)",
  run_lme = TRUE, use_weights = TRUE, metrics = TRUE)

## lme/lmerel model: expression~virus*asthma+median_cv_coverage+(1|ptID)

## Input: 12 libraries from 6 unique patients

## Model: 12 libraries

## Complete: 1000 genes

## Failed: 0 genes

Create example enrichment results.

enrich <- SEARchways::flexEnrich(
  gene_list = list("HRV1"=names(SEARchways::example.gene.list$HRV1),
                   "HRV2"=names(SEARchways::example.gene.list$HRV2)),
  ID = "ENSEMBL", species = "human", 
  collection = "H")

gsea <- SEARchways::BIGsea(
  gene_list = SEARchways::example.gene.list,
  ID = "ENSEMBL", species = "human", 
  collection = "C2", subcollection = "CP")

## Running HRV1

## Running HRV2

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (1% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

RNAseq data cleaning

Mean-variance trend

Used in RNAseq data cleaning counts to voom to visualize the association of expression on the X with variance on the Y.

plot_mv(dat = example.dat, design = "~ virus")

PCA

`plot_pca`: PCA all in one

Calculate, annotate, and plot a PCA representing gene expression. By default, we assume unique library IDs are contained in the column libID. Change this with the libraryID parameter.

When running on unnormalized data, transform the counts to log counts per million.

plot_pca(dat = example.dat, 
         vars = "virus",
         scale = TRUE,
         transform_logCPM = TRUE)

## Joining with `by = join_by(libID)`

## $virus

When running on normalized data, don’t transform the data (default).

plot_pca(dat = example.voom, 
         vars = "virus",
         scale = TRUE)

## Joining with `by = join_by(libID)`

## $virus

Automatically identify outliers in PC space. See outlier_group parameter to define outliers within groups independently.

plot_pca(dat = example.voom, 
         vars = "outlier", outlier_sd = 3,
         scale = TRUE)

## Joining with `by = join_by(libID)`

## $outlier

Plot multiple variables and visualize together with patchwork.

plot_pca(dat = example.voom, 
         vars = c("virus","asthma","median_cv_coverage", "norm.factors"),
         scale = TRUE) %>% 
  patchwork::wrap_plots(nrow=2)

## Joining with `by = join_by(libID)`

Other parameters

View different PCs using PCx and PCy
Define outliers within groups independently with outlier_group
Define a difference unique identifier column with libraryID

Coming soon: plot_pca2

Linear modeling

Model fit

`plot_fit2`: Compare fit metrics

These functions use the output from kimma::kmFit.

Compare model fits between 2 models. Note that it is important to choose comparable models - this means the same number of samples and changing only 1 model attribute at a time e.g. adding a covariate, adding weights, adding a random effect, etc. Be careful if you have missing data. For example, if you have ~ variable vs ~ variable + covariate but the covariate has some missing data, changes in fit could be due to the covariate itself and/or the smaller sample size since samples missing any model variable are removed during modeling.

Here, we compare a model with and without kinship correction. The kmFit result has two model results (lme and lmerel), so you need only 1 model result input. Note the results printed in the console. These summarize genes better fit by one model or another. An absolute change in AIC > 7 is significant, change from 2-7 is moderate, and < 2 is non-significant. These cutoffs are also shown as red lines in the plot.

names(example.model)

## [1] "lme"             "lmerel"          "lme.contrast"    "lmerel.contrast"
## [5] "lme.fit"         "lmerel.fit"

plot_fit2(model_result = example.model,
          x="lme", y="lmerel", metrics = "AIC")

## Summary: AIC and BIC

##          Best fit Metric Total genes Mean delta Stdev delta
## 1 Nonsignif (< 2)    AIC        1000  0.2471154   0.3143377

If you’re models are in two separate objects:

plot_fit2(model_result = example.model, x="lme",
          model_result_y = example.model2, y="lme",
          metrics = "AIC")

## Summary: AIC and BIC

##                                                                     Best fit
## 1 Moderate (2-7) lme_virus_asthma_median_cv_coverage_virus:asthma_(1 | ptID)
## 2                                                            Nonsignif (< 2)
## 3   Signif (> 7) lme_virus_asthma_median_cv_coverage_virus:asthma_(1 | ptID)
##   Metric Total genes Mean delta Stdev delta
## 1    AIC         727   4.439370   1.3281120
## 2    AIC         110   1.258108   0.5218337
## 3    AIC         163   9.782923   2.6771227

Note that the model name is not very readable. You can change the labels.

plot_fit2(model_result = example.model, x="lme", 
          x_label = "no covariate",
          model_result_y = example.model2, y="lme", 
          y_label = "+medianCV",
          metrics = "AIC")

## Summary: AIC and BIC

##                   Best fit Metric Total genes Mean delta Stdev delta
## 1 Moderate (2-7) +medianCV    AIC         727   4.439370   1.3281120
## 2          Nonsignif (< 2)    AIC         110   1.258108   0.5218337
## 3   Signif (> 7) +medianCV    AIC         163   9.782923   2.6771227

Plot a different metric. Choose anything in example.model$lme.fit or similar. Note that the cutoffs only apply to AIC and BIC

plot_fit2(model_result = example.model,
          x="lme", y="lmerel", 
          metrics = c("BIC","Rsq"))

## Summary: AIC and BIC

##          Best fit Metric Total genes Mean delta Stdev delta
## 1 Nonsignif (< 2)    BIC        1000  0.2471154   0.3143377

## Summary: Other metrics

##   Best fit Metric Total genes Mean delta Stdev delta
## 1      lme    Rsq         273 0.01979372  0.02558724
## 2   lmerel    Rsq         466 0.01729597  0.01973940
## 3     none    Rsq         261 0.00000000  0.00000000

Add dots to see outlying genes.

plot_fit2(model_result = example.model, x="lme", 
          x_label = "no covariate",
          model_result_y = example.model2, y="lme", 
          y_label = "+medianCV",
          metrics = "AIC",
          outliers = TRUE)

## Joining with `by = join_by(name)`
## Summary: AIC and BIC

##                   Best fit Metric Total genes Mean delta Stdev delta
## 1 Moderate (2-7) +medianCV    AIC         727   4.439370   1.3281120
## 2          Nonsignif (< 2)    AIC         110   1.258108   0.5218337
## 3   Signif (> 7) +medianCV    AIC         163   9.782923   2.6771227

Label genes with the largest changes in fit.

plot_fit2(model_result = example.model, x="lme", 
          x_label = "no covariate",
          model_result_y = example.model2, y="lme", 
          y_label = "+medianCV",
          metrics = "AIC",
          label = 3)

## Summary: AIC and BIC

##                   Best fit Metric Total genes Mean delta Stdev delta
## 1 Moderate (2-7) +medianCV    AIC         727   4.439370   1.3281120
## 2          Nonsignif (< 2)    AIC         110   1.258108   0.5218337
## 3   Signif (> 7) +medianCV    AIC         163   9.782923   2.6771227

Change the labels to another gene identifier.

plot_fit2(model_result = example.model, x="lme", 
          x_label = "no covariate",
          model_result_y = example.model2, y="lme", 
          y_label = "+medianCV",
          metrics = "AIC",
          label = 3, 
          genes = example.voom$genes, genes_label = "hgnc_symbol")

## Summary: AIC and BIC

##                   Best fit Metric Total genes Mean delta Stdev delta
## 1 Moderate (2-7) +medianCV    AIC         727   4.439370   1.3281120
## 2          Nonsignif (< 2)    AIC         110   1.258108   0.5218337
## 3   Signif (> 7) +medianCV    AIC         163   9.782923   2.6771227

Only plot a subset of genes.

plot_fit2(model_result = example.model, x="lme", 
          x_label = "no covariate",
          model_result_y = example.model2, y="lme", 
          y_label = "+medianCV",
          metrics = "AIC",
          subset_genes = example.voom$genes$geneName[1:100])

## Summary: AIC and BIC

##                   Best fit Metric Total genes Mean delta Stdev delta
## 1 Moderate (2-7) +medianCV    AIC          71   4.398087   1.2628561
## 2          Nonsignif (< 2)    AIC          12   1.380337   0.5232679
## 3   Signif (> 7) +medianCV    AIC          17   9.942274   2.3146413

Significant results

`plot_volcano`: Volcano plot

Plot all variables from one model.

plot_volcano(model_result = example.model, model = "lme")

Select variable(s) of interest.

plot_volcano(model_result = example.model, model = "lme",
             variables = "virus")

Plot uncorrected P-values instead of FDR.

plot_volcano(model_result = example.model, model = "lme",
             variables = "virus",
             y="pval")

Color significant genes based on FDR and/or estimate.

plot_volcano(model_result = example.model, model = "lme",
             variables = "virus",
             x_cutoff = 1, y_cutoff = 0.05)

Label most significant genes.

plot_volcano(model_result = example.model, model = "lme",
             variables = "virus",
             x_cutoff = 1, y_cutoff = 0.05,
             label = 3)

Change the gene labels to a different gene indentifier.

plot_volcano(model_result = example.model, model = "lme",
             variables = "virus",
             x_cutoff = 1, y_cutoff = 0.05,
             label = 3,
             genes = example.voom$genes,
             genes_label = "hgnc_symbol")

`plot_venn_genes`: Venn plot of DEGs

Functions the same as plot_upset_genes.

Plot all variables at multiple FDR cutoffs.

plot_venn_genes(model_result = example.model, 
                models = "lme")$venn %>% 
  patchwork::wrap_plots(ncol=2)

Select specific FDR cutoff(s).

plot_venn_genes(model_result = example.model, 
                models = "lme",
                fdr_cutoff = 0.05)

## $venn
## $venn$`0.05`

Select variable(s) of interest.

plot_venn_genes(model_result = example.model, 
                models = "lme",
                fdr_cutoff = 0.05,
                variables = c("asthma","virus"))

## $venn
## $venn$`0.05`

If you’re using a contrasts model, specify contrasts instead.

plot_venn_genes(model_result = example.model, 
                models = "lme.contrast",
                fdr_cutoff = 0.05,
                contrasts = c("HRV healthy - none healthy",
                              "HRV asthma - none asthma"))

## $venn
## $venn$`0.05`

Include the random effect and/or intercept if the model has one.

plot_venn_genes(model_result = example.model, 
                models = "lme",
                fdr_cutoff = 0.05,
                random = TRUE, intercept = TRUE)

## $venn
## $venn$`0.05`

Save the genes in each part of the Venn’s overlap for downstream analyses.

venn_result <- plot_venn_genes(model_result = example.model, 
                models = "lme",
                fdr_cutoff = 0.05,
                return_genes = TRUE)
venn_result$venn

## $`0.05`

venn_result$gene$`0.05` %>% head

##              gene virus:asthma asthma virus
## 1 ENSG00000002587         <NA>   <NA>     Y
## 2 ENSG00000013810         <NA>      Y     Y
## 3 ENSG00000023287         <NA>   <NA>     Y
## 4 ENSG00000034510         <NA>   <NA>     Y
## 5 ENSG00000050130         <NA>   <NA>     Y
## 6 ENSG00000054118         <NA>   <NA>     Y

Compare DEGs in two different models.

plot_venn_genes(model_result = list("noCov"=example.model,
                                    "medianCV"=example.model2), 
                models = "lme",
                variables = "virus",
                fdr_cutoff = 0.05)

## $venn
## $venn$`0.05`

Note that labels might get cutoff if they are too long, so you may need to expand the plot limits. A good way to do this is to first find the current limits by adding a theme with xy axes and then setting lims on the single Venn result. For example, here we expand the y limits (even though the labels are not cutoff).

plot_venn_genes(model_result = list("noCov"=example.model,
                                    "medianCV"=example.model2), 
                models = c("lme","lme"),
                variables = "virus",
                fdr_cutoff = 0.05)$venn$`0.05` +
  theme_bw()

plot_venn_genes(model_result = list("noCov"=example.model,
                                    "medianCV"=example.model2), 
                models = c("lme","lme"),
                variables = "virus",
                fdr_cutoff = 0.05)$venn$`0.05` +
  lims(y=c(-1,2))

## Scale for y is already present.
## Adding another scale for y, which will replace the existing scale.

`plot_upset_genes`: Upset plot of DEGs

Functions the same as plot_venn_genes.

Plot all variables at multiple FDR cutoffs.

plot_upset_genes(model_result = example.model, 
                models = "lme")$upset %>% 
  patchwork::wrap_plots(ncol=2)

Select specific FDR cutoff(s).

plot_upset_genes(model_result = example.model, 
                models = "lme",
                fdr_cutoff = 0.05)

## $upset
## $upset$`0.05`

Select variable(s) of interest.

plot_upset_genes(model_result = example.model, 
                models = "lme",
                fdr_cutoff = 0.05,
                variables = c("asthma","virus"))

## $upset
## $upset$`0.05`

If you’re using a contrasts model, specify contrasts instead.

plot_upset_genes(model_result = example.model, 
                models = "lme.contrast",
                fdr_cutoff = 0.5,
                contrasts = c("HRV healthy - none healthy",
                              "HRV asthma - none asthma"))

## $upset
## $upset$`0.5`

Include the random effect and/or intercept if the model has one.

plot_upset_genes(model_result = example.model, 
                models = "lme",
                fdr_cutoff = 0.05,
                random = TRUE, intercept = TRUE)

## $upset
## $upset$`0.05`

Save the genes in each part of the Venn’s overlap for downstream analyses.

upset_result <- plot_upset_genes(model_result = example.model, 
                models = "lme",
                fdr_cutoff = 0.05,
                return_genes = TRUE)
upset_result$upset

## $`0.05`

upset_result$gene$`0.05` %>% head

## # A tibble: 6 × 4
##   gene            virus asthma `virus:asthma`
##   <chr>           <chr> <chr>  <chr>         
## 1 ENSG00000002587 Y     <NA>   <NA>          
## 2 ENSG00000005884 <NA>  Y      <NA>          
## 3 ENSG00000006712 <NA>  Y      <NA>          
## 4 ENSG00000013810 Y     Y      <NA>          
## 5 ENSG00000018610 <NA>  Y      <NA>          
## 6 ENSG00000023287 Y     <NA>   <NA>

Compare DEGs in two different models.

plot_upset_genes(model_result = list("noCov"=example.model,
                                    "medianCV"=example.model2), 
                models = "lme",
                variables = "virus",
                fdr_cutoff = 0.05)

## $upset
## $upset$`0.05`

`plot_genes`: Expression and model result summary for 1 gene

A single wrapper function to get a table of model result and boxplots associated with model variables for one gene at a time.

plot_genes(dat = example.voom,
           fdr = example.model$lme,
           subset_genes = "ENSG00000173950",
           variables = "virus")

## $ENSG00000173950

Plot multiple variables including interaction terms, which also add the main model terms associated with the interaction (e.g. if A:B, also plots A and B)

plot_genes(dat = example.voom,
           fdr = example.model$lme,
           subset_genes = "ENSG00000173950",
           variables = "virus:asthma")

## $ENSG00000173950

Color boxplots by a variable of interest. This can be a variable already being plotted or a new variable.

plot_genes(dat = example.voom,
           fdr = example.model$lme,
           subset_genes = "ENSG00000173950",
           variables = c("virus","asthma"),
           colorID = "virus")

## $ENSG00000173950

plot_genes(dat = example.voom,
           fdr = example.model$lme,
           subset_genes = "ENSG00000173950",
           variables = c("virus","asthma"),
           colorID = "ptID")

## $ENSG00000173950

Plot many genes quickly using multiple processors. Not run in this document to save space.

deg <- example.model$lme %>% filter(FDR<0.001) %>% pull(gene)
length(deg)

plot.ls <- plot_genes(dat = example.voom,
                      fdr = example.model$lme,
                      subset_genes = deg,
                      variables = "virus",
                      processors = 6)

#Save each plot
#Change settings as needed
outdir <- "~/Desktop/"
fig.width <- 8
fig.height <- 5

for(gene in names(plot.ls)){
  f <- paste0(outdir, gene, ".png")
  ggsave(filename = f, plot = plot.ls[[gene]], 
         width = fig.width, height = fig.height)
}

Other parameters

You can also provide expression data not in a edgeR or voom object with counts, meta, and optionally genes
If your data have different variable names, use libraryID and geneID to change them

Gene set enrichment

Hypergeometric / pathway enrichment / Fisher’s exact test

These functions use the outputs from SEARchways::BIGprofiler and SEARchways::flexEnrich.

`plot_enrich2`: Lollipop plot

The default plot for plot_enrich2 is what we call a lollipop plot, and it only shows results with FDR < 0.2.

plot_enrich2(df = enrich)

`plot_enrich2`: Dotplot

Change the chart_style to a dotplot

plot_enrich2(df = enrich,
             fdr_cutoff = 0.3,
             chart_style = "dot")

Use an outline to denote results below an FDR cutoff. Note this only applies to dotplots.

plot_enrich2(df = enrich,
             fdr_cutoff = 0.3,
             chart_style = "dot",
             dot_sig_cutoff = 0.2)

`plot_enrich2`: Barplot

Change the chart_style to a barplot.

Note that barplots can only be created with a single group of enrichment results.

plot_enrich2(df = enrich %>%
               dplyr::filter(group=="HRV2"),
             fdr_cutoff = 0.3,
             chart_style = "bar")

`plot_enrich2`: Shared parameters

You can change the FDR cutoff and/or FDR colors groups.

plot_enrich2(df = enrich,
             fdr_cutoff = 0.3,
             fdr_binned_colors = c(0.2, 0.3, 1))

For set the FDR coloring to a log color gradient.

plot_enrich2(df = enrich,
             fdr_cutoff = 0.3,
             fdr_continuous = TRUE)

Order the gene sets / pathways. By default, they are ordered by fdr. You can also order by “hclust” of gene membership, “overlap_size”, “gs_size”, “ratio”, “fdr”, or “input”.

plot_enrich2(df = enrich,
             fdr_cutoff = 0.3,
             y_grouping_method = "gs_size")

plot_enrich2(df = enrich,
             fdr_cutoff = 0.3,
             y_grouping_method = "hclust")

#etc...

Add total gene set sizes.

plot_enrich2(df = enrich,
             fdr_cutoff = 0.3,
             include_gssize = TRUE)

Other parameters

Change the data column names to match your data if you are not using SEARchways results with _col parameters (pathway_col, gene_col, ratio_col, fdr_col, gssize_col, dot_groupcol)
- These parameters can also be used to change which variable is mapped to which plot aesthetic.

`plot_enrichgrid`: Add gene level info to `plot_enrich2` plot

#Modify pathway names for a prettier plot
enrich <- enrich %>% 
  mutate(pathway = gsub("HALLMARK_","",pathway),
         pathway = gsub("_","\n",pathway))

plot_enrichgrid(df = enrich, 
                df_group = "HRV2")

plot_enrichgrid(df = enrich, 
                df_group = "HRV2",
                chart_style = "lollipop")

Order genes in the heatmap including “hclust”, “prevalence”, “input”, “alphabetical”, or “geneset”.

plot_enrichgrid(df = enrich, 
                df_group = "HRV2",
                x_grouping_method = "prevalence")

Color genes by a binary prevalence cutoff instead of a heatmap.

plot_enrichgrid(df = enrich, 
                df_group = "HRV2",
                prevalence_color = "cutoff",
                prevalence_cutoff = 0.5)

Only plot specific genes of interest.

plot_enrichgrid(df = enrich, 
                df_group = "HRV2",
                custom_genelist = c("ENSG00000023228", "ENSG00000158864"))

Quickly get the equivalent plot without the gene grid without having to convert to plot_enrich2.

plot_enrichgrid(df = enrich, 
                df_group = "HRV2",
                include_grid = FALSE)

Other parameters

See plot_enrich2 Shared parameters for info on fdr_cutoff, fdr_binned_colors, y_grouping_method, pathway_col, gene_col, ratio_col, fdr_col, gssize_col
Note that chart_style only supports “bar” or “lollipop” for this function

GSEA: Gene set enrichment analysis

`plot_gsea`: Lollipop plot

Plot a lollipop plot of GSEA normalized enrichment scores (NES). Will be superseded by plot_fit2.

Here, we use a ridiculous FDR cutoff, because there are actually no significant results in the example data.

plot_gsea(gsea = gsea, 
          fdr_cutoff = 0.8, fdr_colors = c(0.5))

STRING network

Get some genes of interest. These can be ENSEMBL or HGNC. Here, we pull some genes from gene sets in MSigDB, but you may wish to plot DEGs, GSEA leading edge, etc.

db <- msigdbr::msigdbr(species = "Homo sapiens", collection = "H")
genes <- db %>% 
  filter(gs_name %in% c("HALLMARK_OXIDATIVE_PHOSPHORYLATION",
                        "HALLMARK_HYPOXIA"))%>% 
  pull(ensembl_gene) %>% unique()
genes <- genes[c(1:20,250:270)]
#Add some results from enrichment so plot is more interesting
genes <- unique(c(genes, "ENSG00000079739", "ENSG00000105220", "ENSG00000114023", 
           "ENSG00000122863", "ENSG00000136521", "ENSG00000154174", 
           "ENSG00000158864", "ENSG00000023228"))

`map_string`: Map genes to STRING database

Map genes of interest to the STRING network database. You can update the STRING version as needed. By default, connections are retained with combined score > 400 (e.g. medium confidence).

map <- map_string(genes, version = 12.0)
names(map)

## [1] "map"      "subgraph"

head(map$map)

##              gene            STRING_id
## 1 ENSG00000144476 9606.ENSP00000272928
## 2 ENSG00000148926 9606.ENSP00000436607
## 3 ENSG00000170425 9606.ENSP00000304501
## 4 ENSG00000162433 9606.ENSP00000378743
## 5 ENSG00000131016 9606.ENSP00000384537
## 6 ENSG00000149925 9606.ENSP00000496166

Change the combined scored threshold. For example, to “strong” confidence connections.

map <- map_string(genes, version = 12.0,
                  score_threshold = 700)

Other parameters

You can switch to mouse data with species = 'mouse'

`plot_string`: Plot a STRING map

Now you can plot the STRING network map.

plot_string(map = map)

Control the layout if you don’t like the automatically chosen one. See options in igraph::layout_with_

plot_string(map = map,
            layout = "mds")

plot_string(map = map,
            layout = "kk")

Remove orphans (nodes with no edge connections).

plot_string(map = map,
            edge_min = 1)

Plot just orphans (nodes with no edge connections).

plot_string(map = map, 
            edge_max = 0)

Plot just the largest cluster. The large “lgl” layout is usually best for these.

plot_string(map = map, layout = "lgl",
            main_cluster_only = TRUE)

Map enrichment or GSEA results from SEARchways to your network.

enrich.select <- enrich %>% 
  filter(group=="HRV2")

plot_string(map = map, layout = "lgl",
            main_cluster_only = TRUE,
            enrichment = enrich.select)

Change the cutoffs of what enrichment results are included.

plot_string(map = map, layout = "lgl",
            main_cluster_only = TRUE,
            enrichment = enrich.select,
            overlap = 1, fdr_cutoff = 0.5)

Only included genes in selected (by cutoffs) enrichment pathways.

plot_string(map = map, layout = "lgl",
            main_cluster_only = TRUE,
            enrichment = enrich.select,
            overlap = 1, fdr_cutoff = 0.5,
            enriched_only = TRUE)

Change the aesthetics of the plot.

plot_string(map = map, layout = "lgl",
            main_cluster_only = TRUE,
            enrichment = enrich.select,
            text_size = 1,
            node_size = 2,
            colors = c("#CC6677", "#44AA99", "grey"))

Heatmap

`get_hm_clust`: Extract genes in `ComplexHeatmap` clusters

Create a heatmap to apply the function to.

library(ComplexHeatmap)

## Loading required package: grid

## ========================================
## ComplexHeatmap version 2.24.0
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite either one:
## - Gu, Z. Complex Heatmap Visualization. iMeta 2022.
## - Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##     genomic data. Bioinformatics 2016.
## 
## 
## The new InteractiveComplexHeatmap package can directly export static 
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================

hm.dat <- example.model$lme %>%
          select(gene, variable, pval) %>%
          pivot_wider(names_from = variable, values_from = pval) %>%
          column_to_rownames("gene") %>% as.matrix()

# The km determine cutoffs for clusters
example.hm <- Heatmap(hm.dat, row_km=3, column_km=2)
example.hm <- draw(example.hm)

Extract row (gene usually) clusters.

get_hm_clust(dat = hm.dat,
             hm = example.hm, dimension = "row") %>% 
  head()

##                 row row_within_cluster  cluster
## 581 ENSG00000115568                  1 cluster1
## 582 ENSG00000111642                  2 cluster1
## 583 ENSG00000061794                  3 cluster1
## 584 ENSG00000120875                  4 cluster1
## 585 ENSG00000167325                  5 cluster1
## 586 ENSG00000164300                  6 cluster1

Extract column (variable or sample usually) clusters.

get_hm_clust(dat = hm.dat,
             hm = example.hm, dimension = "col")

##            col col_within_cluster  cluster
## 4   (1 | ptID)                  1 cluster1
## 1 virus:asthma                  1 cluster2
## 2       asthma                  2 cluster2
## 3        virus                  3 cluster2

Superceded functions

`plot_enrich`: Lollipop plot

There is also the older plot_enrich that only supports lollipop plots and will be deprecated once plot_enrich2 can handle GSEA results.

plot_enrich(enrich = enrich)

`plot_fit`: Compare fit metrics

There is also the original plot_fit, which has a subset of the plot_fit2 parameters. Each point is a gene colored by the best fit model (delta AIC > or < 0). This plot is less preferred as it is difficult to see small changes in AIC and many genes overlap.

plot_fit(model_result = example.model, x="lme", 
         model_result_y = example.model, y="lmerel")

## Summary

##                                      Best fit Metric Total genes Mean delta
## 1    lme_virus_asthma_virus:asthma_(1 | ptID)    AIC         273  0.3958556
## 2 lmerel_virus_asthma_virus:asthma_(1 | ptID)    AIC         466  0.2983838
## 3                                        none    AIC         261  0.0000000
##   Stdev delta
## 1   0.4044870
## 2   0.2582185
## 3   0.0000000

R session

sessionInfo()

## R version 4.5.0 (2025-04-11)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] ComplexHeatmap_2.24.0 patchwork_1.3.0       SEARchways_1.5.0     
##  [4] BIGpicture_1.2.0      kimma_1.6.3           edgeR_4.6.1          
##  [7] limma_3.64.0          lubridate_1.9.4       forcats_1.0.0        
## [10] stringr_1.5.1         dplyr_1.1.4           purrr_1.0.4          
## [13] readr_2.1.5           tidyr_1.3.1           tibble_3.2.1         
## [16] ggplot2_3.5.2         tidyverse_2.0.0      
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3  shape_1.4.6.1       rstudioapi_0.17.1  
##   [4] jsonlite_2.0.0      magrittr_2.0.3      farver_2.1.2       
##   [7] rmarkdown_2.29      GlobalOptions_0.1.2 fs_1.6.6           
##  [10] vctrs_0.6.5         memoise_2.0.1       rstatix_0.7.2      
##  [13] htmltools_0.5.8.1   plotrix_3.8-4       broom_1.0.8        
##  [16] Formula_1.2-5       sass_0.4.10         KernSmooth_2.23-26 
##  [19] bslib_0.9.0         gsubfn_0.7          plyr_1.8.9         
##  [22] cachem_1.1.0        igraph_2.1.4        lifecycle_1.0.4    
##  [25] ggnetwork_0.5.13    iterators_1.0.14    pkgconfig_2.0.3    
##  [28] Matrix_1.7-3        R6_2.6.1            fastmap_1.2.0      
##  [31] clue_0.3-66         digest_0.6.37       colorspace_2.1-1   
##  [34] S4Vectors_0.46.0    chron_2.3-62        RSQLite_2.3.9      
##  [37] ggpubr_0.6.0        labeling_0.4.3      timechange_0.3.0   
##  [40] httr_1.4.7          polyclip_1.10-7     abind_1.4-8        
##  [43] compiler_4.5.0      bit64_4.6.0-1       withr_3.0.2        
##  [46] doParallel_1.0.17   backports_1.5.0     BiocParallel_1.42.0
##  [49] carData_3.0-5       viridis_0.6.5       DBI_1.2.3          
##  [52] ggupset_0.4.1       ggforce_0.4.2       gplots_3.2.0       
##  [55] ggsignif_0.6.4      MASS_7.3-65         rjson_0.2.23       
##  [58] gtools_3.9.5        caTools_1.18.3      tools_4.5.0        
##  [61] scatterpie_0.2.4    msigdbr_10.0.2      glue_1.8.0         
##  [64] cluster_2.1.8.1     ggvenn_0.1.10       fgsea_1.34.0       
##  [67] generics_0.1.3      gtable_0.3.6        textshape_1.7.5    
##  [70] tzdb_0.5.0          data.table_1.17.0   hms_1.1.3          
##  [73] tidygraph_1.3.1     car_3.1-3           utf8_1.2.4         
##  [76] BiocGenerics_0.54.0 ggrepel_0.9.6       foreach_1.5.2      
##  [79] pillar_1.10.2       yulab.utils_0.2.0   babelgene_22.9     
##  [82] circlize_0.4.16     tweenr_2.0.3        lattice_0.22-7     
##  [85] bit_4.6.0           tidyselect_1.2.1    locfit_1.5-9.12    
##  [88] knitr_1.50          gridExtra_2.3       IRanges_2.42.0     
##  [91] sqldf_0.4-11        stats4_4.5.0        xfun_0.52          
##  [94] graphlayouts_1.2.2  statmod_1.5.0       matrixStats_1.5.0  
##  [97] proto_1.0.0         stringi_1.8.7       ggfun_0.1.8        
## [100] yaml_2.3.10         evaluate_1.0.3      codetools_0.2-20   
## [103] ggraph_2.2.1        hash_2.2.6.3        cli_3.6.5          
## [106] STRINGdb_2.20.0     jquerylib_0.1.4     Rcpp_1.0.14        
## [109] msigdbdf_24.1.1     png_0.1-8           parallel_4.5.0     
## [112] assertthat_0.2.1    blob_1.2.4          bitops_1.0-9       
## [115] viridisLite_0.4.2   scales_1.4.0        crayon_1.5.3       
## [118] GetoptLong_1.0.5    rlang_1.1.6         cowplot_1.1.3      
## [121] fastmatch_1.1-6

BIGpicture vignette

Useful plot wrappers for ggplot2

Kim Dill-McFarland, kadm@uw.edu

version May 02, 2025

Overview

Setup

Software and packages

Data background

Load data

RNAseq data cleaning

Mean-variance trend

PCA

`plot_pca`: PCA all in one

Linear modeling

Model fit

`plot_fit2`: Compare fit metrics

Significant results

`plot_volcano`: Volcano plot

`plot_venn_genes`: Venn plot of DEGs

`plot_upset_genes`: Upset plot of DEGs

`plot_genes`: Expression and model result summary for 1 gene

Gene set enrichment

Hypergeometric / pathway enrichment / Fisher’s exact test

`plot_enrich2`: Lollipop plot

`plot_enrich2`: Dotplot

`plot_enrich2`: Barplot

`plot_enrich2`: Shared parameters

`plot_enrichgrid`: Add gene level info to `plot_enrich2` plot

GSEA: Gene set enrichment analysis

`plot_gsea`: Lollipop plot

STRING network

`map_string`: Map genes to STRING database

`plot_string`: Plot a STRING map

Heatmap

`get_hm_clust`: Extract genes in `ComplexHeatmap` clusters

Superceded functions

`plot_enrich`: Lollipop plot

`plot_fit`: Compare fit metrics

R session

BIGpicture vignette

Useful plot wrappers for ggplot2

Kim Dill-McFarland, kadm@uw.edu

version May 02, 2025

Overview

Setup

Software and packages

Data background

Load data

RNAseq data cleaning

Mean-variance trend

PCA

plot_pca: PCA all in one

Linear modeling

Model fit

plot_fit2: Compare fit metrics

Significant results

plot_volcano: Volcano plot

plot_venn_genes: Venn plot of DEGs

plot_upset_genes: Upset plot of DEGs

plot_genes: Expression and model result summary for 1 gene

Gene set enrichment

Hypergeometric / pathway enrichment / Fisher’s exact test

plot_enrich2: Lollipop plot

plot_enrich2: Dotplot

plot_enrich2: Barplot

plot_enrich2: Shared parameters

plot_enrichgrid: Add gene level info to plot_enrich2 plot

GSEA: Gene set enrichment analysis

plot_gsea: Lollipop plot

STRING network

map_string: Map genes to STRING database

plot_string: Plot a STRING map

Heatmap

get_hm_clust: Extract genes in ComplexHeatmap clusters

Superceded functions

plot_enrich: Lollipop plot

plot_fit: Compare fit metrics

R session

`plot_pca`: PCA all in one

`plot_fit2`: Compare fit metrics

`plot_volcano`: Volcano plot

`plot_venn_genes`: Venn plot of DEGs

`plot_upset_genes`: Upset plot of DEGs

`plot_genes`: Expression and model result summary for 1 gene

`plot_enrich2`: Lollipop plot

`plot_enrich2`: Dotplot

`plot_enrich2`: Barplot

`plot_enrich2`: Shared parameters

`plot_enrichgrid`: Add gene level info to `plot_enrich2` plot

`plot_gsea`: Lollipop plot

`map_string`: Map genes to STRING database

`plot_string`: Plot a STRING map

`get_hm_clust`: Extract genes in `ComplexHeatmap` clusters

`plot_enrich`: Lollipop plot

`plot_fit`: Compare fit metrics