This document covers our recommended analysis pipeline to determine
differentially expressed co-expression modules. This pipeline includes
linear regression using kimma
and module building using
WGCNA
. The example data are human, bulk, paired-end
RNA-seq, but this pipeline can be applied to other organisms or
single-read libraries.
This pipeline should be completed in R and RStudio. You should also install the following packages.
#CRAN packages
install.packages(c("tidyverse", "WGCNA"))
#Bioconductor packages
install.packages("BiocManager")
BiocManager::install(c("edgeR", "limma", "patchwork"))
#GitHub packages
install.packages("devtools")
devtools::install_github("BIGslu/BIGverse")
#Or if fails, install packages individually
devtools::install_github("BIGslu/kimma")
devtools::install_github("BIGslu/BIGpicture")
devtools::install_github("BIGslu/RNAetc")
devtools::install_github("BIGslu/SEARchways")
And load them into your current R session.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(BIGverse)
## ── Attaching packages ──────────────────────────────────────── BIGverse 1.0.0 ──
## ✔ kimma 1.6.1 ✔ BIGpicture 1.1.0
## ✔ RNAetc 1.1.0 ✔ SEARchways 1.3.2
## ── Conflicts ─────────────────────────────────────────── BIGverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(limma)
library(patchwork)
library(WGCNA)
## Loading required package: dynamicTreeCut
## Loading required package: fastcluster
##
## Attaching package: 'fastcluster'
##
## The following object is masked from 'package:stats':
##
## hclust
##
##
##
## Attaching package: 'WGCNA'
##
## The following object is masked from 'package:stats':
##
## cor
set.seed(651)
Example data were obtained from virus-stimulated human plasmacytoid dendritic cells. Actual patient identifiers and metadata have been altered for this tutorial.
Dill-McFarland KA, Schwartz JT, Zhao H, Shao B, Fulkerson PC, Altman MC, Gill MA. 2022. Eosinophil-mediated suppression and Anti-IL-5 enhancement of plasmacytoid dendritic cell interferon responses in asthma. J Allergy Clin Immunol. Epub ahead of print. doi: 10.1016/j.jaci.2022.03.025. — GitHub
Specifically, this tutorial uses RNAseq data processed using our SEAsnake
and counts
to voom pipelines, resulting in voom-normalized, log2 counts per
million (CPM) expression and associated sample metadata in a
limma EList
object in the data_clean/
directory. These data are available in the kimma
package
within the example.voom
data object.
All counts, gene, and sample metadata are contained in a single
object in the kimma
package. Note that these data contain
only 1000 genes from the full data set create in our counts to voom pipeline. This is to reduce
computational time.
#Extract and rename data object
dat <- kimma::example.voom
We will take a supervised approach to co-expression module creation. Thus, we first determine genes that are associated with our variable(s) of interest. Then, these genes are grouped into modules using weighted gene correlation network analysis (WGCNA). You can also take an unsupervised approach using all genes in the data set. Please see the WGCNA tutorial for an unsupervised example.
We take a supervised approach as it reduces computational time and results in more interpretable modules.
First, we determine genes associated with viral infection and/or asthma in this data set. You can see how we determined this as our best fit model in our voom to DEG tutorial.
virus_lme <- kmFit(dat = dat,
model = "~ virus*asthma + (1|ptID)",
run_lme = TRUE,
use_weights = TRUE)
## lme/lmerel model: expression~virus*asthma+(1|ptID)
## Input: 12 libraries from 6 unique patients
## Model: 12 libraries
## Complete: 1000 genes
## Failed: 0 genes
We extract genes significant for the interaction term as well as those significant for virus or asthma alone.
genes.interact <- virus_lme$lme %>%
filter(variable == "virus:asthma" & FDR < 0.2) %>%
pull(gene)
genes.virus <- virus_lme$lme %>%
filter(variable == "virus" & FDR < 0.2) %>%
pull(gene)
genes.asthma <- virus_lme$lme %>%
filter(variable == "asthma" & FDR < 0.2) %>%
pull(gene)
genes.all <- unique(c(genes.interact, genes.virus, genes.asthma))
This results in 485 genes for use in module building. Note that this example data set contains only 1000 genes out of the roughly 14,000 protein coding genes usually seen in an RNAseq data set. Thus, the genes used in module analysis here are many fewer than would be seen in a full data set.
We filter the dat
object to contain only these
genes.
dat.subset <- dat[genes.all, ]
The goal of weighted gene correlation network analysis (WGCNA) is to reduce data complexity by grouping genes into co-expression modules. This can take 1000s of genes down to only 10s of modules for analysis, thus dramatically reducing multiple comparison penalties (e.g. FDR). WGCNA modules contain correlated genes with similar expression patterns across samples. However, genes in a module do not necessary have the same average expression and a module can, thus, contain both lowly and highly expressed genes.
There are a number of parameters that impact module builds. The first we will explore is soft thresholding power, which determines how correlated genes must be in order to be in the same module. We check values from 1 to 30 in a signed network because positive/negative expression values have meaning.
# You may wish to run on multiple processors
# WGCNA::allowWGCNAThreads(nThreads = 4)
fit <- pickSoftThreshold(data = t(dat.subset$E),
powerVector = c(1:30),
networkType = "signed")
## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1 1 0.26313660 9.943599 0.8813112 502.006481 503.1508775 531.43769
## 2 2 0.03041606 -1.797055 0.8636882 283.230328 280.6590060 330.74298
## 3 3 0.02812478 -1.095957 0.8733717 173.482287 171.1222482 226.52349
## 4 4 0.05936983 -1.048659 0.8945757 113.012955 111.0505139 164.06308
## 5 5 0.11880939 -1.048333 0.9066056 77.246302 75.0113257 123.94006
## 6 6 0.17295138 -1.048439 0.9210418 54.878815 52.6212801 96.89712
## 7 7 0.25743039 -1.072228 0.9179633 40.246468 38.1184746 77.60957
## 8 8 0.36256555 -1.115785 0.9260355 30.311291 28.2546075 63.38037
## 9 9 0.47816903 -1.141153 0.9427763 23.350851 21.4726710 52.59127
## 10 10 0.54194055 -1.114224 0.9113726 18.342379 16.4120450 44.25168
## 11 11 0.62117033 -1.181646 0.9257618 14.654301 12.8113442 38.13064
## 12 12 0.72867667 -1.292638 0.9679070 11.883257 10.1154016 33.55203
## 13 13 0.77459583 -1.348804 0.9650979 9.763989 8.0714519 29.79725
## 14 14 0.81858626 -1.397540 0.9821075 8.117515 6.4733097 26.67198
## 15 15 0.78215560 -1.484216 0.9263327 6.820289 5.2857455 24.03698
## 16 16 0.79957493 -1.544640 0.9232817 5.785278 4.3256524 21.79033
## 17 17 0.88055940 -1.490901 0.9795207 4.950042 3.6054073 19.85593
## 18 18 0.89425851 -1.489373 0.9631134 4.269043 3.0279837 18.17602
## 19 19 0.91864982 -1.495883 0.9768264 3.708566 2.5429408 16.70603
## 20 20 0.93217689 -1.491355 0.9844292 3.243310 2.1659235 15.41106
## 21 21 0.92049496 -1.549493 0.9683322 2.854045 1.8600610 14.58156
## 22 22 0.91613250 -1.596012 0.9574356 2.525993 1.5838550 13.98098
## 23 23 0.94146629 -1.589420 0.9633023 2.247671 1.3623646 13.45295
## 24 24 0.91362068 -1.635744 0.9212029 2.010072 1.1693855 12.98548
## 25 25 0.93581070 -1.651049 0.9444718 1.806065 1.0251724 12.56893
## 26 26 0.91744934 -1.708221 0.9262327 1.629958 0.9068533 12.19550
## 27 27 0.90604648 -1.736650 0.9038590 1.477171 0.7978643 11.85880
## 28 28 0.89618355 -1.745676 0.8812566 1.343993 0.6930648 11.55362
## 29 29 0.85142506 -1.815295 0.8252909 1.227395 0.6114491 11.27562
## 30 30 0.85695984 -1.829880 0.8343792 1.124888 0.5439173 11.02120
You can roughly see how gene correlation associates with power in the automatically generated table but we also create plots for easier interpretation. First looking at the signed R2, we see a general increase in gene correlation with power until correlation is maximized around power 17. In supervised module building, there are also often 1 or 2 local maxima such as powers 2 and 14 here.
as.data.frame(fit$fitIndices) %>%
#Calculate scale free topology
dplyr::mutate(signed.R.sq = -sign(slope)*SFT.R.sq) %>%
ggplot(aes(x = Power, y = signed.R.sq, label = Power)) +
geom_text(size = 3) +
theme_classic() +
labs(y = "Scale free topology model fit,signed R^2",
x = "Soft threshold (power)") +
scale_y_continuous(breaks = round(seq(0, 1, by = 0.1), digits = 1))
Next, we look at connectivity which is a measure of how many genes in a module meet the correlation cutoff. This roughly correlates with how many genes are in each module with lower connectivity resulting in smaller modules. Here, we see an expected decrease in connectivity with increase in power.
as.data.frame(fit$fitIndices) %>%
ggplot(aes(x = Power, y = mean.k., label = Power)) +
geom_text(size = 3) +
theme_classic() +
labs(y = "Mean connectivity",
x = "Soft threshold (power)")
Taken together, we choose a soft thresholding power of 14 because it is a local maximum and achieves correlation > 0.8. We will also compare to power 23 as the true maximum correlation to show how this effects modules in this tutorial.
as.data.frame(fit$fitIndices) %>%
filter(Power %in% c(14, 23))
## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1 14 0.8185863 -1.39754 0.9821075 8.117515 6.473310 26.67198
## 2 23 0.9414663 -1.58942 0.9633023 2.247671 1.362365 13.45295
If you’ve selected few genes and/or your data are not strongly correlated, you many not reach an R2 of 0.8 or your local maximum may be much lower. This is because there are not many genes that meet the correlation cutoff. In this case, we either choose the local maximum closest to 0.8 or go back and select more genes for module building.
We build modules with our two chosen soft thresholding powers and WGCNA defaults, except we continue to use a signed network and ask it for numeric labels.
mod.p14 <- blockwiseModules(datExpr = t(dat.subset$E),
power = 14,
networkType="signed",
numericLabels = TRUE,
nthreads = 4)
mod.p23 <- blockwiseModules(datExpr = t(dat.subset$E),
power = 23,
networkType="signed",
numericLabels = TRUE,
nthreads = 4)
The first thing to note is module 0. This is the trash module; genes in this module are not correlated with one another. In an unsupervised analysis with all genes, module 0 can’t be analyzed as it is a random assortment of genes. In our supervised analysis, however, we already know that these genes are associated with virus and/or asthma, so we may be able to glean some insight from them (more on this in Section 4).
Comparing the two module builds, we see the power 14 has fewer genes in module 0 than power 23. This makes sense as power 23 demands that genes be more highly correlated in order to be in a module together. Since both builds have the same number of modules, the additional genes in module 0, power 23 come from those that “fell out” of all the other modules. In a power comparison, you may also see fewer modules at higher powers when enough genes “fall out” of the smaller modules to no longer meet the minimum module size, thus that entire module is redistributed across the other modules, mostly to module 0.
as.data.frame(mod.p14$colors) %>%
rownames_to_column("gene") %>%
count(`mod.p14$colors`)
## mod.p14$colors n
## 1 0 9
## 2 1 76
## 3 2 67
## 4 3 66
## 5 4 64
## 6 5 60
## 7 6 56
## 8 7 52
## 9 8 35
as.data.frame(mod.p23$colors) %>%
rownames_to_column("gene") %>%
count(`mod.p23$colors`)
## mod.p23$colors n
## 1 0 79
## 2 1 68
## 3 2 58
## 4 3 53
## 5 4 51
## 6 5 50
## 7 6 48
## 8 7 44
## 9 8 34
There are a lot of module building parameters as you can see in the
?blockwiseModules
help page. We like to explore
minModuleSize
, deepSplit
, and
maxBlockSize
as these generally achieve useful modules.
Here, we will use the power 14 build to assess these additional
parameters.
As the name suggests, minimum module size is the minimum number of correlated genes needed to make a module. If you have a group of correlated genes that do not make the minimum, those genes get redistributed, thus changing the overall network.
The default minModSize
is the minimum of 20 or 1/2 the
number of genes. In these data, that is 242.5 so our default run had
minModSize = 20
and we saw that our smallest module 8 was
35 genes.
If we change the minimum to 40 genes, we lose 3 modules, not just the former module 8 that was below this cutoff. This is because the removal of one module results in the redistribution of these genes, thus changing the overall network and shifting more than just the one small module.
mod.p14.m40 <- blockwiseModules(datExpr = t(dat.subset$E),
power = 14,
networkType="signed",
numericLabels = TRUE,
nthreads = 4,
minModuleSize = 40)
as.data.frame(mod.p14.m40$colors) %>%
rownames_to_column("gene") %>%
count(`mod.p14.m40$colors`)
## mod.p14.m40$colors n
## 1 0 14
## 2 1 127
## 3 2 120
## 4 3 96
## 5 4 76
## 6 5 52
The deep network split determines where the network can be split to
form modules. It is a controlled parameter from 0 to 4 with being 0
least and 4 most sensitive. Thus, lower values result in less splitting
and fewer modules. The default deepSplit
is 2 so comparing
to 0, we see fewer modules in our data.
mod.p14.s0 <- blockwiseModules(datExpr = t(dat.subset$E),
power = 14,
networkType="signed",
numericLabels = TRUE,
nthreads = 4,
deepSplit = 0)
as.data.frame(mod.p14.s0$colors) %>%
rownames_to_column("gene") %>%
count(`mod.p14.s0$colors`)
## mod.p14.s0$colors n
## 1 0 9
## 2 1 127
## 3 2 76
## 4 3 67
## 5 4 64
## 6 5 55
## 7 6 52
## 8 7 35
The maximum block size determines the largest group of genes
analyzed. If you have more genes than the maxBlockSize
, the
data are pre-clusterd into blocks less than this size and modules are
built from within these clusters. In general, lower
maxBlockSize
results in more modules and more genes in
module 0 as a full network connecting all genes is not constructed and
you may miss some connections.
Because we only have 485 genes in this analysis, the default
maxBlockSize
of 5000 includes all genes. In most supervised
module builds, this is appropriate and we rarely change this parameter.
However, if you have highly variable gene expression, reducing the
maxBlockSize
can result in smaller, more interpretable
modules.
For example, if we reduce the maxBlockSize
to 200, we
get more genes in module 0 and generally smaller modules. This is not
recommended for these data and just shown as an example here.
mod.p14.b200 <- blockwiseModules(datExpr = t(dat.subset$E),
power = 14,
networkType="signed",
numericLabels = TRUE,
nthreads = 4,
maxBlockSize = 200)
as.data.frame(mod.p14.b200$colors) %>%
rownames_to_column("gene") %>%
count(`mod.p14.b200$colors`)
## mod.p14.b200$colors n
## 1 0 25
## 2 1 84
## 3 2 78
## 4 3 67
## 5 4 67
## 6 5 57
## 7 6 41
## 8 7 35
## 9 8 31
There are no absolute cutoffs in module building. This includes supervised gene selection and module build parameters.
Because we want to capture as much signal as possible, we use a relatively high FDR cutoff for gene selection in module building. You may go even higher than FDR < 0.2 in your analysis. The exact cutoff is arbitrary and we generally aim for 1000 to 5000 genes for module building. This is, of course, not possible in the reduced example data set here. This range is rather large given the roughly 14,000 genes in the full data set, and it will likely take some trial and error in your data. In general, if you find that you have a lot of modules not associated with your variable(s) of interest, you may need to reduce the FDR, and thus the number of genes, selected for module building. Or you may leave those genes and simply ignore modules that are not significant.
Similarly, it is helpful to compare several module builds across different parameters as summarized below. You want to minimize genes in module 0 (because these are the uncorrelated genes) without creating too many or too small modules for analysis. This process has many “right” paths. Hence, the idea that it is an art, not a science.
## power minModSize deepSplit maxBlockSize totalModules sizeModule0
## 1 14 20 2 5000 8 9
## 2 23 20 2 5000 8 79
## 3 14 40 2 5000 5 14
## 4 14 20 0 5000 7 9
## 5 14 20 2 200 8 25
Once you’ve finalized your modules, you can assess their expression
similar to individual genes. Here, we use mean expression of genes in
modules. You may use the module eigenvalues as well (contained in
mod.p14$MEs
). We generally see similar results with the two
methods and chose mean module expression because the values are more
directly interpretable in relation to actual expression.
First, we calculate the mean expression of genes in each module.
# List genes in modules and format module names
mod.genes <- as.data.frame(mod.p14$colors) %>%
tibble::rownames_to_column("geneName") %>%
dplyr::rename(module = "mod.p14$colors") %>%
#add leading 0 to module names for correct sorting of factor
dplyr::mutate(module = ifelse(module <= 9,
paste("0", module, sep=""),
module)) %>%
#Add color var
dplyr::mutate(mod.color = WGCNA::labels2colors(mod.p14$colors))
# Mean gene expression
mod.E <- as.data.frame(dat.subset$E) %>%
rownames_to_column("geneName") %>%
inner_join(mod.genes, by = "geneName") %>%
#Mean
group_by(module) %>%
summarise(across(where(is.numeric), ~mean(., na.rm=TRUE)),
.groups = "drop")
mod.E
## # A tibble: 9 × 13
## module lib1 lib2 lib3 lib4 lib5 lib6 lib7 lib8 lib9 lib10 lib11 lib12
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 00 6.51 7.13 7.35 6.99 7.40 7.31 6.82 6.32 6.55 6.40 7.47 7.22
## 2 01 9.30 8.49 9.26 8.19 9.49 8.89 9.78 8.85 9.96 9.43 9.87 9.27
## 3 02 8.36 7.19 8.26 6.60 8.02 7.00 8.30 6.91 8.22 7.00 7.75 5.78
## 4 03 8.39 9.26 7.80 9.28 8.47 9.24 7.56 9.43 7.39 8.73 7.64 8.85
## 5 04 8.57 8.26 7.98 8.30 7.98 8.23 6.88 7.36 6.59 6.75 6.59 6.90
## 6 05 9.32 8.52 9.20 8.32 8.41 8.04 8.57 7.89 8.51 7.21 8.84 7.81
## 7 06 8.32 9.51 8.50 9.85 9.34 9.97 8.78 9.80 9.09 9.88 8.62 9.68
## 8 07 9.25 9.80 9.79 9.98 10.3 10.4 10.6 10.5 10.4 10.6 10.5 10.5
## 9 08 8.32 8.60 7.37 7.98 6.84 7.19 6.81 7.35 6.92 7.25 6.73 7.63
We run a linear mixed effects model just like in Section 2, only now
we give the mean gene expression for counts. We can use the original
sample metadata as our library names are the same in the module data.
Note that the gene-level weights cannot be applied to module data and
thus, use.weights = FALSE
.
virus_mod <- kmFit(counts = mod.E,
meta = dat.subset$targets,
model = "~ virus*asthma + (1|ptID)",
run_lme = TRUE,
use_weights = FALSE)
## lme/lmerel model: expression~virus*asthma+(1|ptID)
## Input: 12 libraries from 6 unique patients
## Model: 12 libraries
## Complete: 9 genes
## Failed: 0 genes
We see that all modules are significant for virus and/or asthma, though none are signifcant for the interaction.
summarise_kmFit(virus_mod$lme)
## # A tibble: 5 × 7
## variable `fdr<0.05` `fdr<0.1` `fdr<0.2` `fdr<0.3` `fdr<0.4` `fdr<0.5`
## <fct> <int> <int> <int> <int> <int> <int>
## 1 (1 | ptID) 0 1 2 7 7 8
## 2 asthma 4 5 5 6 6 8
## 3 virus 7 8 8 8 8 9
## 4 virus:asthma 0 0 0 0 0 8
## 5 total (nonredund… 8 8 8 9 9 9
Thus, in a true analysis, we would remove the interaction in our final model.
virus_mod2 <- kmFit(counts = mod.E,
meta = dat.subset$targets,
model = "~ virus + asthma + (1|ptID)",
run_lme = TRUE,
use_weights = FALSE)
## lme/lmerel model: expression~virus+asthma+(1|ptID)
## Input: 12 libraries from 6 unique patients
## Model: 12 libraries
## Complete: 9 genes
## Failed: 0 genes
summarise_kmFit(virus_mod2$lme)
## # A tibble: 4 × 7
## variable `fdr<0.05` `fdr<0.1` `fdr<0.2` `fdr<0.3` `fdr<0.4` `fdr<0.5`
## <fct> <int> <int> <int> <int> <int> <int>
## 1 (1 | ptID) 1 1 2 7 7 8
## 2 asthma 4 5 5 6 6 8
## 3 virus 7 8 8 8 8 9
## 4 total (nonredund… 8 8 8 9 9 9
RNAetc
custom module functionsOur package RNAetc
contains functions to make module
building easier. You can filter to genes of interest and test soft
thresholds with fit_modules
.
fit <- fit_modules(dat = dat,
genes = genes.all,
powerVector=c(1:30),
networkType = "signed",
nThread = 4)
## Allowing multi-threading with up to 4 threads.
## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.
## 1 1 0.31300 5.670 0.820 244.00 245.000 262.00
## 2 2 0.08810 2.210 0.921 144.00 144.000 169.00
## 3 3 0.01860 0.601 0.931 93.60 93.000 123.00
## 4 4 0.00160 0.130 0.976 65.00 63.900 95.80
## 5 5 0.00144 0.094 0.973 47.30 46.500 77.10
## 6 6 0.00756 -0.175 0.964 35.50 34.500 63.40
## 7 7 0.04530 -0.370 0.958 27.40 26.500 53.00
## 8 8 0.16500 -0.619 0.953 21.60 20.900 44.90
## 9 9 0.23200 -0.689 0.968 17.30 16.600 38.40
## 10 10 0.29000 -0.727 0.976 14.10 13.200 33.10
## 11 11 0.33900 -0.769 0.948 11.60 10.700 28.80
## 12 12 0.40000 -0.806 0.926 9.66 8.780 25.20
## 13 13 0.47000 -0.874 0.905 8.13 7.290 22.20
## 14 14 0.53300 -0.909 0.919 6.90 5.980 19.70
## 15 15 0.59000 -0.963 0.920 5.90 4.960 17.50
## 16 16 0.64200 -1.010 0.894 5.09 4.230 15.60
## 17 17 0.68200 -1.040 0.888 4.41 3.590 14.00
## 18 18 0.74200 -1.100 0.890 3.85 3.080 12.80
## 19 19 0.76900 -1.130 0.895 3.38 2.670 11.80
## 20 20 0.80000 -1.160 0.918 2.98 2.300 10.80
## 21 21 0.81100 -1.210 0.940 2.64 1.960 10.00
## 22 22 0.84500 -1.220 0.974 2.35 1.700 9.31
## 23 23 0.85500 -1.230 0.949 2.10 1.490 8.67
## 24 24 0.86100 -1.260 0.950 1.89 1.320 8.10
## 25 25 0.88300 -1.270 0.965 1.70 1.160 7.59
## 26 26 0.88300 -1.300 0.954 1.54 1.010 7.13
## 27 27 0.86900 -1.350 0.920 1.39 0.890 6.72
## 28 28 0.86100 -1.360 0.902 1.27 0.784 6.34
## 29 29 0.88000 -1.380 0.927 1.16 0.701 5.99
## 30 30 0.89700 -1.370 0.935 1.06 0.622 5.68
This outputs:
dat
: Filtered EList
containing only genes
of interestgenes
: Vector of genes of interestsft
: Soft thresholding results from
pickSoftThreshold
top.plot
: Scaled topology plot (shown below)connect.plot
: Connectivity plot (shown below)names(fit)
## [1] "dat" "genes" "sft" "top.plot" "connect.plot"
fit$top.plot + fit$connect.plot
Then build modules with make_modules
where you can give
a specific soft thresholding value (sft.value
) or ask it to
determine that value based on the minimum acceptable R2
(Rsq.min
). You can also set our commonly tested parameters
minModuleSize
, deepSplit
, and
maxBlockSize
as well as determine is you want mean module
expression (mods.mean
), eigenvalues
(mods.eigen
), and/or DAVID formatted data frame of genes in
modules (david
).
mod.p14 <- make_modules(fit,
sft.value = 14, # Or Rsq.min = 0.8
minModuleSize = 20,
maxBlockSize = 500,
deepSplit = 2,
networkType = "signed",
mods.mean = TRUE, mods.eigen = TRUE, david = TRUE,
nThread = 4)
This results in:
genes
, sft
, top.plot
, and
connect.plot
from fit
plusmods
: data frame of genes in modules with cleaned
namesmods.mean
: Mean gene expressionmods.eigen
: Eigenvalues of module expressiondavid
: DAVID-formatted data frame of genes in
modulesnames(mod.p14)
## [1] "genes" "mods" "sft" "top.plot" "connect.plot"
## [6] "mods.mean" "mods.eigen" "david"
mod.p14$mods[1:3,]
## geneName module module.char mod.color hgnc_symbol Previous symbols
## 1 ENSG00000002587 5 05 green HS3ST1 <NA>
## 2 ENSG00000005884 4 04 yellow ITGA3 MSK18
## 3 ENSG00000006453 1 01 turquoise BAIAP2L1 <NA>
## Alias symbols gene_biotype
## 1 3OST1 protein-coding gene
## 2 CD49c, VLA3a, VCA-2, GAP-B3 protein-coding gene
## 3 IRTKS protein-coding gene
mod.p14$mods.mean[1:3,1:3]
## module.char lib1 lib2
## module_00 00 6.514825 7.125083
## module_01 01 9.299977 8.485937
## module_02 02 8.356833 7.191985
mod.p14$david[1:3, 1:3]
## rowname module_00 module_01
## 1 1 ENSG00000133302 ENSG00000006453
## 2 2 ENSG00000135446 ENSG00000013810
## 3 3 ENSG00000158560 ENSG00000063761
See our gene set analyses tutorial for methods to assign function to modules.
sessionInfo()
## R version 4.4.0 (2024-04-24)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sonoma 14.2.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Los_Angeles
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] WGCNA_1.72-5 fastcluster_1.2.6 dynamicTreeCut_1.63-1
## [4] patchwork_1.2.0 limma_3.60.0 SEARchways_1.3.2
## [7] BIGpicture_1.1.0 RNAetc_1.1.0 kimma_1.6.1
## [10] BIGverse_1.0.0 lubridate_1.9.3 forcats_1.0.0
## [13] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
## [16] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
## [19] ggplot2_3.5.1 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] DBI_1.2.2 gridExtra_2.3 rlang_1.1.3
## [4] magrittr_2.0.3 matrixStats_1.3.0 compiler_4.4.0
## [7] RSQLite_2.3.7 png_0.1-8 vctrs_0.6.5
## [10] pkgconfig_2.0.3 crayon_1.5.2 fastmap_1.2.0
## [13] backports_1.5.0 XVector_0.44.0 labeling_0.4.3
## [16] utf8_1.2.4 rmarkdown_2.27 tzdb_0.4.0
## [19] UCSC.utils_1.0.0 preprocessCore_1.66.0 bit_4.0.5
## [22] xfun_0.44 zlibbioc_1.50.0 cachem_1.1.0
## [25] GenomeInfoDb_1.40.0 jsonlite_1.8.8 blob_1.2.4
## [28] highr_0.11 parallel_4.4.0 cluster_2.1.6
## [31] R6_2.5.1 bslib_0.7.0 stringi_1.8.4
## [34] rpart_4.1.23 jquerylib_0.1.4 Rcpp_1.0.12
## [37] iterators_1.0.14 knitr_1.47 base64enc_0.1-3
## [40] IRanges_2.38.0 Matrix_1.7-0 splines_4.4.0
## [43] nnet_7.3-19 timechange_0.3.0 tidyselect_1.2.1
## [46] rstudioapi_0.16.0 yaml_2.3.8 doParallel_1.0.17
## [49] codetools_0.2-20 lattice_0.22-6 Biobase_2.64.0
## [52] withr_3.0.0 KEGGREST_1.44.0 evaluate_0.23
## [55] foreign_0.8-86 survival_3.6-4 Biostrings_2.72.0
## [58] pillar_1.9.0 checkmate_2.3.1 foreach_1.5.2
## [61] stats4_4.4.0 generics_0.1.3 S4Vectors_0.42.0
## [64] hms_1.1.3 munsell_0.5.1 scales_1.3.0
## [67] glue_1.7.0 Hmisc_5.1-3 tools_4.4.0
## [70] data.table_1.15.4 grid_4.4.0 impute_1.78.0
## [73] AnnotationDbi_1.66.0 colorspace_2.1-0 GenomeInfoDbData_1.2.12
## [76] htmlTable_2.4.2 Formula_1.2-5 cli_3.6.2
## [79] fansi_1.0.6 gtable_0.3.5 sass_0.4.9
## [82] digest_0.6.35 BiocGenerics_0.50.0 farver_2.1.2
## [85] htmlwidgets_1.6.4 memoise_2.0.1 htmltools_0.5.8.1
## [88] lifecycle_1.0.4 httr_1.4.7 GO.db_3.19.1
## [91] statmod_1.5.0 bit64_4.0.5