Overview

This document covers gene set related analyses of RNAseq data in the R package SEARchways. This pipeline explores differentially expressed genes (see voom to DEG) and gene co-expression modules (see RNAseq modules). The goal is to determine pathways associated with genes and modules significantly impacted by viral infection or asthma status. The example data are human, bulk, paired-end RNA-seq, but this pipeline can be applied to other organisms or single-read libraries.

0. Setup

Software

This pipeline should be completed in R and RStudio. You should also install the following packages.

#CRAN packages
install.packages(c("tidyverse", "WGCNA"))

#Bioconductor packages
install.packages("BiocManager")
BiocManager::install(c("edgeR", "limma", "patchwork"))

#GitHub packages
install.packages("devtools")
devtools::install_github("BIGslu/kimma")
devtools::install_github("BIGslu/BIGpicture")
devtools::install_github("BIGslu/RNAetc")
devtools::install_github("BIGslu/SEARchways")

And load them into your current R session.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(kimma)
library(BIGpicture)
library(RNAetc)

## 
## Attaching package: 'RNAetc'
## 
## The following objects are masked from 'package:kimma':
## 
##     example.dat, example.voom

library(SEARchways)
library(limma)
library(WGCNA)

## Loading required package: dynamicTreeCut
## Loading required package: fastcluster
## 
## Attaching package: 'fastcluster'
## 
## The following object is masked from 'package:stats':
## 
##     hclust
## 
## 
## 
## Attaching package: 'WGCNA'
## 
## The following object is masked from 'package:stats':
## 
##     cor

library(patchwork)

set.seed(651)

Example data

Example data were obtained from virus-stimulated human plasmacytoid dendritic cells. Actual patient identifiers and metadata have been altered for this tutorial.

Dill-McFarland KA, Schwartz JT, Zhao H, Shao B, Fulkerson PC, Altman MC, Gill MA. 2022. Eosinophil-mediated suppression and Anti-IL-5 enhancement of plasmacytoid dendritic cell interferon responses in asthma. J Allergy Clin Immunol. Epub ahead of print. doi: 10.1016/j.jaci.2022.03.025. — GitHub

Specifically, this tutorial uses RNAseq data processed using our SEAsnake and counts to voom pipelines, resulting in voom-normalized, log2 counts per million (CPM) expression and associated sample metadata in a limma EList object in the data_clean/ directory. These data are available in the kimma package within the example.voom data object.

1. Load data

All counts, gene, and sample metadata are contained in a single object in the kimma package. Note that these data contain only 1000 genes from the full data set create in our counts to voom pipeline. This is to reduce computational time.

#Extract and rename data object
dat <- kimma::example.voom

2. Differential gene expression

First, we determine genes associated with viral infection and/or asthma in this data set. You can see more about model fitting in our voom to DEG tutorial.

virus_lme <- kmFit(dat = dat, 
                   model = "~ virus*asthma + (1|ptID)", 
                   run_lme = TRUE, 
                   use_weights = TRUE)

## lme/lmerel model: expression~virus*asthma+(1|ptID)

## Input: 12 libraries from 6 unique patients

## Model: 12 libraries

## Complete: 1000 genes

## Failed: 0 genes

2.1 Hypergeometric enrichment

There are hundreds of genes significant for virus and/or asthma in this data set (FDR < 0.05). This is too many to assess individuals so we use enrichment to determine pathways associated with these genes.

summarise_kmFit(virus_lme$lme)

## # A tibble: 5 × 7
##   variable          `fdr<0.05` `fdr<0.1` `fdr<0.2` `fdr<0.3` `fdr<0.4` `fdr<0.5`
##   <fct>                  <int>     <int>     <int>     <int>     <int>     <int>
## 1 (1 | ptID)                 7         9        21        24        33       124
## 2 asthma                    69        98       140       204       284       355
## 3 virus                    266       314       397       468       566       628
## 4 virus:asthma              17        33        46        77        89       106
## 5 total (nonredund…        316       384       493       590       710       797

`BIGprofiler`

SEARchway contains two functions to run enrichment. The first BIGprofiler employs clusterProfiler to run Fisher’s exact tests of your genes of interest against any gene set data base available in the Broad Molecular Signatures Database (MSigDB). Here, let’s run the virus significant genes against the Hallmark data base.

#Virus significant genes
genes.virus <- virus_lme$lme %>% 
  filter(variable == "virus" & FDR < 0.05) %>% 
  pull(gene)

enrich.virus <- BIGprofiler(gene_list = list("virus" = genes.virus),
                            ID = "ENSEMBL",
                            category = "H")

## [1] "virus"

The output includes:

group: Name identifier for genes of interest
size_group: Total genes in each group
gs_cat: Gene set data base category
gs_subcat: Gene set data base subcategory
size_cat.subcat: Total genes in data base category/subcategory
group_in_cat.subcat: Total significant genes in data base category/subcategory e.g. how many of your genes of interest are present in the data base
pathway: Gene set name
size_pathway: Total genes in gene set (K)
group_in_pathway: Total significant genes in gene set (k)
k/K: group_in_pathway / size_pathway, a commonly used metric of enrichment
pval: P-value
FDR: FDR corrected P-value
qvalue: q-value
genes: List of significant genes in gene set (not shown below for length)

The results show that there is 1 pathway enriched in virus significant genes at FDR < 0.05.

enrich.virus %>% 
  filter(FDR < 0.05) %>% 
  select(-genes)

## # A tibble: 1 × 13
##   group size_group gs_collection gs_subcollection size_cat.subcat
##   <chr>      <int> <chr>         <chr>                      <dbl>
## 1 virus        266 H             ""                          4386
## # ℹ 8 more variables: group_in_cat.subcat <dbl>, pathway <chr>,
## #   size_pathway <dbl>, group_in_pathway <dbl>, `k/K` <dbl>, pval <dbl>,
## #   FDR <dbl>, qvalue <dbl>

We can easily plot this result in BIGpicture. We’ll use FDR < 0.5 so you can see multiple pathways. You can see that just over 6% of the IFN gamma response pathway is present in our list of virus significant genes.

plot_enrich(enrich.virus, 
            fdr_cutoff = 0.5, fdr_colors = c(0.05, 0.5))

You can also look at multiple variables by inputing a data frame (gene_df) with the variable in 1 column and the significant genes in another. Note that the plot shows non-significant results for one variable or the other. This can be turned off with show.overlap = FALSE.

gene_df <- virus_lme$lme %>% 
  filter(FDR < 0.05 & variable %in% c("virus","asthma")) %>% 
  select(variable, gene) 
head(gene_df)

## # A tibble: 6 × 2
##   variable gene           
##   <chr>    <chr>          
## 1 virus    ENSG00000002587
## 2 asthma   ENSG00000005884
## 3 asthma   ENSG00000006712
## 4 virus    ENSG00000010818
## 5 virus    ENSG00000013810
## 6 asthma   ENSG00000013810

enrich.all <- BIGprofiler(gene_df = gene_df,
                          ID = "ENSEMBL",
                          category = "H")

## [1] "virus"
## [1] "asthma"

plot_enrich(enrich.all, 
            fdr_cutoff = 0.5, fdr_colors = c(0.05, 0.5))

What about directionality?

This type of enrichment does not address whether genes are up or down-regulated, just that they are significant in some way. Thus, if you would like to know direction, you need to create subsets like virus.up and virus.down, then run enrichment on the lists separately. This should be done with caution as not all gene sets have concordant directionality and shorter gene lists are less likely to be significant in any pathway.

A note on saving `SEARchways` results

Many SEARchways results contain list data. For example, the genes column from all of the BIG functions in this tutorial is a list column. This means these data cannot be saved outside of R as they are.

class(enrich.all$genes)

## [1] "list"

To save this column outside R, simply convert it to a character.

enrich.all %>% 
  mutate(genes = as.character(genes)) %>% 
  write_csv(file = "results/BIGprofiler.results.csv")

Note that if you read this csv back into R, the genes column will not be formatting correctly for downstream our package functions. Thus, we recommend saving data in .RData or .Rds in order to maintain formatting.

Alternatively, you can unnest the data to get 1 row per gene per pathway. Then when you read the csv into R, you can re-nest it like so.

enrich.all %>% 
  unnest(genes) %>% 
  write_csv(file = "results/BIGprofiler.results.csv")

enrich.all <- read_csv("results/BIGprofiler.results.csv") %>% 
  group_by(across(c(-genes))) %>% 
  summarise(genes = list(genes), .groups = "drop")

2.2 Gene set enrichment analysis (GSEA)

Another gene set related analysis is GSEA. This method uses fold change values of all genes in the data set instead of having you pre-define a significant list of genes. GSEA, thus, can find significant pathways with no individually significant genes in them. Instead, smaller up (or down) regulations of many genes can drive overall significance of a pathway.

BIGsea runs GSEA with parameters similar to BIGprofiler only now the input is log fold change (named estimate in the kimma output).

gene_df2 <- virus_lme$lme %>% 
  filter(variable %in% c("virus", "asthma")) %>% 
  select(variable, gene, estimate) 
head(gene_df2)

## # A tibble: 6 × 3
##   variable gene            estimate
##   <chr>    <chr>              <dbl>
## 1 virus    ENSG00000000460    0.425
## 2 asthma   ENSG00000000460    0.783
## 3 virus    ENSG00000001460    0.369
## 4 asthma   ENSG00000001460    0.225
## 5 virus    ENSG00000002587   -0.858
## 6 asthma   ENSG00000002587   -0.130

gsea.all <- BIGsea(gene_df = gene_df2,
                          ID = "ENSEMBL",
                          category = "H")

## Running virus

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (0.7% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

##   |                                                                              |                                                                      |   0%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================================================| 100%

## Running asthma

## Warning in preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, : There are ties in the preranked stats (0.7% of the list).
## The order of those tied genes will be arbitrary, which may produce unexpected results.

##   |                                                                              |                                                                      |   0%  |                                                                              |===================================                                   |  50%  |                                                                              |======================================================================| 100%

Note the warning about ties in the data. This means that 2 or more genes have an identical log fold change value and thus, GSEA must randomly determine their rank order relative to each other. This is not an issue because we’ve already set a random seed for reproducibility. However, if there are a lot of ties in your data, like > 10%, this may indicate a need to re-assess your linear model to better represent differences across the data.

Similarly, we can plot the GSEA results with BIGpicture. Until simple enrichment, GSEA shows directionality such as here where IFN gamma response is up-regulated in response to virus. Remember to define your variable order with factor in the original data if you want something other than alphabetical as this determines whether the fold change values are A-B or B-A.

plot_gsea(gsea.all,
          fdr_cutoff = 0.2)

What if I have more than 2 groups to compare?

GSEA can only support pairwise comparison. Thus, if you have a variable of interest with 3 or more levels, you need to run each pairwise comparison of interest separately. You can access this data in kimma with run.contrasts = TRUE. This is also true for interaction terms, where the model estimate is not a simple fold change.

3. Module annotation

Gene set analyses can also be used to assign function to gene co-expression modules. For more information on module building, see our RNAseq module tutorial.

#Define significant genes
genes.interact <- virus_lme$lme %>% 
  filter(variable == "virus:asthma" & FDR < 0.2) %>% 
  pull(gene)

genes.main <- virus_lme$lme %>% 
  filter(variable %in% c("virus", "asthma") & FDR < 0.2) %>% 
  pull(gene)

genes.all <- unique(c(genes.interact, genes.main))

fit <- fit_modules(dat = kimma::example.voom, 
                   genes = genes.all,
                   powerVector=c(1:30), 
                   networkType = "signed",
                   nThread = 4)

mod.p14 <- make_modules(
  fit,
  sft.value = 14, 
  minModuleSize = 20, 
  maxBlockSize = 500, 
  deepSplit = 2,
  networkType = "signed", 
  mods_mean = TRUE, mods_eigen = TRUE, 
  david = TRUE,
  nThread = 4)

The SEARchways function can be applied to module results from RNAetc like so.

3.1 Hypergeometric enrichment

We see several pathways enrichment in some of the modules.

Note that this also showcases that different gene identifiers can be used in enrichment.

mod_df <- mod.p14$mods %>% 
  select(module.char, geneName) 
head(mod_df)

##   module.char        geneName
## 1          05 ENSG00000002587
## 2          04 ENSG00000005884
## 3          01 ENSG00000006453
## 4          04 ENSG00000006634
## 5          07 ENSG00000006712
## 6          02 ENSG00000010278

enrich.mod <- BIGprofiler(gene_df = mod_df,
                          ID = "ENSEMBL",
                          category = "H")

## [1] "05"
## [1] "04"
## [1] "01"
## [1] "07"
## [1] "02"
## [1] "06"
## [1] "08"
## [1] "03"
## [1] "00"

plot_enrich(enrich.mod, fdr_cutoff = 0.2,
            show_overlap = FALSE)

4. STRING networks

The STRING protein-protein interaction network data base is a useful tool for visualizing genes of interest. We have incorporated this visualization with SEARchways in order to facilitate annotation of the a STRING network with significant pathways.

4.1 Network colored by enrichment

For example, we plot the virus significant genes and color by significant hypergeometric enrichment from BIGprofiler.

#Map gene to STRING
map <- map_string(genes = genes.virus)

## Warning:  we couldn't map to STRING 0% of your identifiers

#plot
plot_string(map = map,
            enrichment = enrich.virus, fdr_cutoff = 0.05)

## Registered S3 method overwritten by 'ggnetwork':
##   method         from  
##   fortify.igraph ggtree

Using other gene identifiers

We used ENSEMBL gene identifiers in our enrichment, so these are used to label the networks nodes. However, you may wish to use more human-readable names like HGNC symbols. We can pull these data from the original dat$genes metadata or if you do not already have this information, checkout biomaRt.

#View gene metadata
head(dat$genes)

##                 hgnc_symbol Previous symbols                   Alias symbols
## ENSG00000000460       RAB12             <NA>                            <NA>
## ENSG00000001460         MIA             <NA>                          CD-RAP
## ENSG00000002587     C1orf52             <NA>                 gm117, FLJ44982
## ENSG00000005189    C12orf73             <NA> FLJ13975, DKFZp547P055, BRAWNIN
## ENSG00000005801        EMC2  KIAA0103, TTC35                            <NA>
## ENSG00000005884      CNPPD1          C2orf24                          CGI-57
##                        gene_biotype        geneName
## ENSG00000000460 protein-coding gene ENSG00000206418
## ENSG00000001460 protein-coding gene ENSG00000261857
## ENSG00000002587 protein-coding gene ENSG00000162642
## ENSG00000005189 protein-coding gene ENSG00000204954
## ENSG00000005801 protein-coding gene ENSG00000104412
## ENSG00000005884 protein-coding gene ENSG00000115649

#Convert ENSEMBL gene list to HGNC symbol
genes.virus.hgnc <- dat$genes %>% 
  filter(geneName %in% genes.virus) %>% 
  pull(hgnc_symbol)
genes.virus.hgnc[1:3]

## [1] "MIA"     "C1orf52" "EMC2"

We also need to convert the enrichment results

genes.key <- dat$genes %>% 
  select(geneName, hgnc_symbol)

enrich.virus.hgnc <- enrich.virus %>% 
  #Match ENSEMBL to HGNC
  unnest(genes) %>% 
  left_join(genes.key, by = c("genes"="geneName")) %>% 
  select(-genes) %>% 
  #Re-nest gene IDs
  group_by(across(c(-hgnc_symbol))) %>% 
  summarise(genes = list(hgnc_symbol), .groups = "drop")

Or rerun the enrichment with the HGNC symbols.

enrich.virus.hgnc <- BIGprofiler(gene_list = list("virus" = genes.virus.hgnc),
                            ID = "SYMBOL",
                            category = "H")

Then we can re-make the network.

#Map gene to STRING
map <- map_string(genes = genes.virus.hgnc)

## Warning:  we couldn't map to STRING 0% of your identifiers

#plot
plot_string(map = map,
            enrichment = enrich.virus.hgnc, fdr_cutoff = 0.05)

4.2 Customizing networks

This network is pretty big so let’s cut if down to just genes with at least 1 connection.

plot_string(map = map,
            enrichment = enrich.virus.hgnc, fdr_cutoff = 0.05,
            edge_min = 1)

The pathway names aren’t nicely formatted so we could change them in the enrichment results.

enrich.virus.hgnc.format <- enrich.virus.hgnc %>% 
  mutate(pathway = gsub("HALLMARK_","",pathway),
         pathway = gsub("_"," ",pathway))

plot_string(map = map,
            enrichment = enrich.virus.hgnc.format, fdr_cutoff = 0.05,
            edge_min = 1)

Or we could look at just nodes with no connections (i.e. orphans). We also use a non-default layout for this visualization.

plot_string(map = map,
            enrichment = enrich.virus.hgnc.format, fdr_cutoff = 0.05,
            edge_max = 0,
            layout = "grid")

4.2 Network colored by GSEA

We can also color by GSEA leading edge genes from BIGsea. Again, we need to convert or rerun GSEA with HGNC symbols to match the current network.

gsea.virus.hgnc <- gsea.all %>% 
  filter(group == "virus") %>% 
  #Match ENSEMBL to HGNC
  unnest(leadingEdge) %>% 
  left_join(genes.key, by = c("leadingEdge"="geneName")) %>% 
  select(-leadingEdge) %>% 
  #Re-nest gene IDs
  group_by(across(c(-hgnc_symbol))) %>% 
  summarise(leadingEdge = list(hgnc_symbol), .groups = "drop")

plot_string(map = map,
            enrichment = gsea.virus.hgnc, fdr_cutoff = 0.2,
            edge_min = 1)

Note that fewer genes are colored in the GSEA network. This is because GSEA identifies the leading edge only. Thus, you can have genes in your network that are annotated to a significant GSEA pathway but not colored because they were not leading edge in your comparison of interest.

R session

sessionInfo()

## R version 4.5.0 (2025-04-11)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] patchwork_1.3.0       WGCNA_1.73            fastcluster_1.2.6    
##  [4] dynamicTreeCut_1.63-1 limma_3.64.0          SEARchways_1.4.0     
##  [7] RNAetc_1.2.0          BIGpicture_1.1.0      kimma_1.6.3          
## [10] lubridate_1.9.4       forcats_1.0.0         stringr_1.5.1        
## [13] dplyr_1.1.4           purrr_1.0.4           readr_2.1.5          
## [16] tidyr_1.3.1           tibble_3.2.1          ggplot2_3.5.2        
## [19] tidyverse_2.0.0      
## 
## loaded via a namespace (and not attached):
##   [1] splines_4.5.0           bitops_1.0-9            ggplotify_0.1.2        
##   [4] R.oo_1.27.0             polyclip_1.10-7         ggnetwork_0.5.13       
##   [7] preprocessCore_1.70.0   rpart_4.1.24            lifecycle_1.0.4        
##  [10] doParallel_1.0.17       lattice_0.22-7          MASS_7.3-65            
##  [13] backports_1.5.0         magrittr_2.0.3          Hmisc_5.2-3            
##  [16] sass_0.4.10             rmarkdown_2.29          jquerylib_0.1.4        
##  [19] yaml_2.3.10             plotrix_3.8-4           ggtangle_0.0.6         
##  [22] cowplot_1.1.3           DBI_1.2.3               RColorBrewer_1.1-3     
##  [25] R.utils_2.13.0          ggraph_2.2.1            BiocGenerics_0.54.0    
##  [28] msigdbr_10.0.2          hash_2.2.6.3            yulab.utils_0.2.0      
##  [31] nnet_7.3-20             tweenr_2.0.3            GenomeInfoDbData_1.2.14
##  [34] IRanges_2.42.0          S4Vectors_0.46.0        enrichplot_1.28.0      
##  [37] ggrepel_0.9.6           tidytree_0.4.6          proto_1.0.0            
##  [40] codetools_0.2-20        DOSE_4.2.0              ggforce_0.4.2          
##  [43] tidyselect_1.2.1        aplot_0.2.5             UCSC.utils_1.4.0       
##  [46] farver_2.1.2            viridis_0.6.5           matrixStats_1.5.0      
##  [49] stats4_4.5.0            base64enc_0.1-3         jsonlite_2.0.0         
##  [52] tidygraph_1.3.1         Formula_1.2-5           survival_3.8-3         
##  [55] iterators_1.0.14        foreach_1.5.2           tools_4.5.0            
##  [58] chron_2.3-62            treeio_1.32.0           Rcpp_1.0.14            
##  [61] glue_1.8.0              gridExtra_2.3           xfun_0.52              
##  [64] qvalue_2.40.0           GenomeInfoDb_1.44.0     withr_3.0.2            
##  [67] fastmap_1.2.0           caTools_1.18.3          digest_0.6.37          
##  [70] timechange_0.3.0        R6_2.6.1                gridGraphics_0.5-1     
##  [73] colorspace_2.1-1        GO.db_3.21.0            gtools_3.9.5           
##  [76] RSQLite_2.3.9           R.methodsS3_1.8.2       utf8_1.2.4             
##  [79] generics_0.1.3          data.table_1.17.0       graphlayouts_1.2.2     
##  [82] httr_1.4.7              htmlwidgets_1.6.4       scatterpie_0.2.4       
##  [85] sqldf_0.4-11            pkgconfig_2.0.3         gtable_0.3.6           
##  [88] blob_1.2.4              impute_1.82.0           XVector_0.48.0         
##  [91] clusterProfiler_4.16.0  htmltools_0.5.8.1       fgsea_1.34.0           
##  [94] msigdbdf_24.1.1         scales_1.4.0            Biobase_2.68.0         
##  [97] png_0.1-8               ggfun_0.1.8             knitr_1.50             
## [100] rstudioapi_0.17.1       tzdb_0.5.0              reshape2_1.4.4         
## [103] checkmate_2.3.2         nlme_3.1-168            cachem_1.1.0           
## [106] KernSmooth_2.23-26      parallel_4.5.0          foreign_0.8-90         
## [109] AnnotationDbi_1.70.0    pillar_1.10.2           grid_4.5.0             
## [112] vctrs_0.6.5             gplots_3.2.0            STRINGdb_2.20.0        
## [115] cluster_2.1.8.1         htmlTable_2.4.3         evaluate_1.0.3         
## [118] gsubfn_0.7              cli_3.6.5               compiler_4.5.0         
## [121] rlang_1.1.6             crayon_1.5.3            labeling_0.4.3         
## [124] plyr_1.8.9              fs_1.6.6                stringi_1.8.7          
## [127] viridisLite_0.4.2       BiocParallel_1.42.0     assertthat_0.2.1       
## [130] babelgene_22.9          Biostrings_2.76.0       lazyeval_0.2.2         
## [133] GOSemSim_2.34.0         Matrix_1.7-3            hms_1.1.3              
## [136] bit64_4.6.0-1           KEGGREST_1.48.0         statmod_1.5.0          
## [139] igraph_2.1.4            memoise_2.0.1           bslib_0.9.0            
## [142] ggtree_3.16.0           fastmatch_1.1-6         bit_4.6.0              
## [145] ape_5.8-1               gson_0.1.0

RNA-seq data analysis

Gene set analyses

Kim Dill-McFarland, kadm@uw.edu

version April 30, 2025

Overview

0. Setup

Software

Example data

1. Load data

2. Differential gene expression

2.1 Hypergeometric enrichment

`BIGprofiler`

What about directionality?

A note on saving `SEARchways` results

2.2 Gene set enrichment analysis (GSEA)

What if I have more than 2 groups to compare?

3. Module annotation

3.1 Hypergeometric enrichment

4. STRING networks

4.1 Network colored by enrichment

Using other gene identifiers

4.2 Customizing networks

4.2 Network colored by GSEA

R session

Navigation

RNA-seq data analysis

Gene set analyses

Kim Dill-McFarland, kadm@uw.edu

version April 30, 2025

Overview

0. Setup

Software

Example data

1. Load data

2. Differential gene expression

2.1 Hypergeometric enrichment

BIGprofiler

What about directionality?

A note on saving SEARchways results

2.2 Gene set enrichment analysis (GSEA)

What if I have more than 2 groups to compare?

3. Module annotation

3.1 Hypergeometric enrichment

4. STRING networks

4.1 Network colored by enrichment

Using other gene identifiers

4.2 Customizing networks

4.2 Network colored by GSEA

R session

Navigation

`BIGprofiler`

A note on saving `SEARchways` results