This workshop introduces popular data visualization methods for single-cell RNA-seq data. Specifically, we will cover UMAP/tSNE and heatmaps.
During the workshop, we will build an R script together, which is available at https://github.com/BIGslu/workshops/blob/main/2023.02.07_scRNAseq.viz.workshop/2023.02.07_live.notes.R
The video recording is available at https://youtu.be/x-BtCyLEH8c
Please follow the setup instructions at https://bigslu.github.io/workshops/setup/setup.html
If you are brand new to R, we recommend completing our 1-hour Introduction to R workshop to prepare, https://bigslu.github.io/workshops/introR.workshop/introR.html
Create a new R project and new R script to save your code. See Intro R for more information if you are unfamiliar with these.
You should have installed packages prior to the workshop. For more information on installation, see the setup instructions.
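As a quick check before starting, the sketch below reports which of the packages loaded later in this workshop are not yet installed. The package list is taken from the library calls in this document; the variable names are only illustrative.

```r
# Packages loaded in this workshop
pkgs <- c("tidyverse", "Seurat", "ComplexHeatmap", "circlize", "ggalluvial")

# TRUE for each package found in your R library, FALSE otherwise
is_installed <- pkgs %in% rownames(installed.packages())
names(is_installed) <- pkgs

# Names of any packages that still need to be installed
missing_pkgs <- pkgs[!is_installed]
missing_pkgs
```

If missing_pkgs is empty, everything is in place. Otherwise, return to the setup instructions for those packages.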
Every time you open a new RStudio session, the packages you want to use need to be loaded into the R workspace with the library() function. This tells R to access the package’s functions and spares RStudio the lag that would occur if it automatically loaded every installed package each time you opened it. To put this in perspective, I had 464 packages installed at the time this document was made.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
Because tidyverse is a meta-package, it automatically tells you what packages it is loading and their versions. In addition, the Conflicts section lets you know which tidyverse functions mask functions that already exist in your R session. Because you chose to load the package, calling the function filter will use the tidyverse function, not the stats function (which comes with base R). If you for some reason need the stats version, you can specify it with package:: like stats::filter.
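To see the difference in practice, here is a small sketch on a toy data frame (made up for illustration, not workshop data). It assumes dplyr is installed, which it is if you installed tidyverse.

```r
# Toy data made up for illustration
toy <- data.frame(x = 1:5)

# dplyr's filter keeps the rows that match a condition
kept <- dplyr::filter(toy, x > 3)
kept

# stats' filter is unrelated: it applies a linear filter to a series,
# here a 3-point moving average
smoothed <- stats::filter(toy$x, rep(1 / 3, 3))
smoothed
```

Writing the package name explicitly works whether or not the package has been loaded with library(), so it is also a way to make a script unambiguous.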
library(Seurat)
## Attaching SeuratObject
library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.12.1
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
##
## If you use it in published research, please cite either one:
## - Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
## genomic data. Bioinformatics 2016.
## - Gu, Z. Complex Heatmap Visualization. iMeta 2022.
##
##
## The new InteractiveComplexHeatmap package can directly export static
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
##
## This message can be suppressed by:
## suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
library(circlize)
## ========================================
## circlize version 0.4.15
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: https://jokergoo.github.io/circlize_book/book/
##
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization
## in R. Bioinformatics 2014.
##
## This message can be suppressed by:
## suppressPackageStartupMessages(library(circlize))
## ========================================
These packages automatically print some information about dependencies, citations, etc. You don’t need to worry about these messages.
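If you would rather not see these messages, the startup text itself points to suppressPackageStartupMessages(). A minimal sketch, assuming both packages are installed:

```r
# Load both packages without printing their startup banners
suppressPackageStartupMessages({
  library(ComplexHeatmap)
  library(circlize)
})
```

This only hides the startup messages. The packages are loaded exactly as before.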
library(ggalluvial)
And finally, this package loads silently.
We will use example data provided by 10X. These data were pre-cleaned and normalized for this workshop so we can get right to plotting! You can see the cleaning steps in this markdown with more information in the Seurat tutorial.
Please download the data and place the RData file in your project directory.
Then, load the data into R.
load("pbmc_clean.RData")
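load() restores objects under the names they were saved with and returns those names invisibly, which is a handy way to confirm what a file contained. The sketch below demonstrates this with a temporary file and a made-up object so that it runs on its own. With the workshop data, the same pattern would be loaded_names <- load("pbmc_clean.RData").

```r
# Save a made-up object to a temporary .RData file
tmp_file <- tempfile(fileext = ".RData")
example_obj <- c(a = 1, b = 2)
save(example_obj, file = tmp_file)
rm(example_obj)

# load() recreates the object and returns the names of everything it restored
loaded_names <- load(tmp_file)
loaded_names
exists("example_obj")
```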
First let’s explore the final data quality. While Seurat
has a number of plotting functions, we will also build a plot from
scratch so that you can further customize as needed.
The percent mitochondrial data are contained in the data frame pbmc[["percent.mt"]]. See the data cleaning notes for how this metric was calculated.
class(pbmc[["percent.mt"]])
## [1] "data.frame"
head(pbmc[["percent.mt"]])
## percent.mt
## AAACATACAACCAC-1 3.0177759
## AAACATTGAGCTAC-1 3.7935958
## AAACATTGATCAGC-1 0.8897363
## AAACCGTGCTTCCG-1 1.7430845
## AAACCGTGTATGCG-1 1.2244898
## AAACGCACTGGTAC-1 1.6643551
Seurat provides an easy way to make violin plots with VlnPlot.
VlnPlot(pbmc, features = "percent.mt", group.by = "orig.ident")
Now let’s recapitulate this plot in ggplot. Ggplot uses layers connected by + to progressively build more and more complex and customized plots. To start, we create a simple dot plot, but it is not very readable because there are so many data points (one per cell in the data set).
ggplot(data = pbmc[["percent.mt"]]) +
  aes(x = 1, y = percent.mt) +
  geom_point()
We can improve readability by jittering the points on the x axis.
ggplot(data = pbmc[["percent.mt"]]) +
  aes(x = 1, y = percent.mt) +
  geom_jitter(height = 0, width = 0.2)
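Because jittering is random, the points land in slightly different places each time the plot is drawn. If you need a reproducible figure, one option is position_jitter() with a seed. The sketch below uses simulated values in place of the workshop data; toy_mt and the seed value are placeholders.

```r
library(ggplot2)

# Simulated percentages standing in for pbmc[["percent.mt"]]
set.seed(1)
toy_mt <- data.frame(percent.mt = runif(200, min = 0, max = 5))

# Same jitter settings as above, but with a fixed seed
p <- ggplot(data = toy_mt) +
  aes(x = 1, y = percent.mt) +
  geom_point(position = position_jitter(width = 0.2, height = 0, seed = 42))
p
```

Only the x positions are moved (height = 0), so the percent.mt values on the y axis remain exact.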