Description

This workshop introduces popular data visualization methods for single-cell RNA-seq data. Specifically, we will cover UMAP/tSNE and heatmaps.

During the workshop, we will build an R script together, which is available at https://github.com/BIGslu/workshops/blob/main/2023.02.07_scRNAseq.viz.workshop/2023.02.07_live.notes.R

The video recording is available at https://youtu.be/x-BtCyLEH8c

Prior to this workshop

Please follow the setup instructions at https://bigslu.github.io/workshops/setup/setup.html

If you are brand new to R, we recommend completing our 1-hour Introduction to R workshop to prepare: https://bigslu.github.io/workshops/introR.workshop/introR.html

Setup

R project and script

Create a new R project and new R script to save your code. See Intro R for more information if you are unfamiliar with these.

Load packages

You should have installed packages prior to the workshop. For more information on installation, see the setup instructions.

Every time you open a new RStudio session, the packages you want to use need to be loaded into the R workspace with the library function. This tells R to access the package’s functions, and it avoids the lag that would occur if RStudio automatically loaded every installed package each time you opened it. To put this in perspective, I had 464 packages installed at the time this document was made.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.3.0      ✔ stringr 1.5.0 
## ✔ readr   2.1.3      ✔ forcats 1.0.0 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Because tidyverse is a meta-package, it automatically tells you which packages it is loading and their versions. In addition, the Conflicts section lets you know which tidyverse functions share a name with functions that already exist in your R session. Because you chose to load the package, calling the function filter will use the tidyverse (dplyr) function, not the stats function (which comes with base R). If for some reason you need the stats version, you can specify it with package:: like stats::filter.
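As a minimal sketch of how the package:: prefix resolves this conflict, the example below calls both versions of filter on a small made-up data frame. The object names (cells, low_mt, moving_avg) and the values are hypothetical and are not part of the workshop data.

```r
library(dplyr)

# Toy data frame (hypothetical values, not the workshop data)
cells <- data.frame(
  cell = c("a", "b", "c", "d"),
  percent_mt = c(1.2, 6.5, 3.0, 8.1)
)

# With dplyr loaded, a bare filter() call uses dplyr::filter(),
# which keeps the rows that match a condition
low_mt <- filter(cells, percent_mt < 5)

# stats::filter() is a different function: it applies a linear filter
# to a numeric series. Here it is a 2-point trailing moving average.
moving_avg <- stats::filter(cells$percent_mt, rep(1 / 2, 2), sides = 1)

print(low_mt)
print(moving_avg)
```

Writing dplyr::filter explicitly also works and can make scripts easier to read when several loaded packages define a function with the same name.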

library(Seurat)
## Attaching SeuratObject
library(ComplexHeatmap)
## Loading required package: grid
## ========================================
## ComplexHeatmap version 2.12.1
## Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
## Github page: https://github.com/jokergoo/ComplexHeatmap
## Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
## 
## If you use it in published research, please cite either one:
## - Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional 
##     genomic data. Bioinformatics 2016.
## - Gu, Z. Complex Heatmap Visualization. iMeta 2022.
## 
## 
## The new InteractiveComplexHeatmap package can directly export static 
## complex heatmaps into an interactive Shiny app with zero effort. Have a try!
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(ComplexHeatmap))
## ========================================
library(circlize)
## ========================================
## circlize version 0.4.15
## CRAN page: https://cran.r-project.org/package=circlize
## Github page: https://github.com/jokergoo/circlize
## Documentation: https://jokergoo.github.io/circlize_book/book/
## 
## If you use it in published research, please cite:
## Gu, Z. circlize implements and enhances circular visualization
##   in R. Bioinformatics 2014.
## 
## This message can be suppressed by:
##   suppressPackageStartupMessages(library(circlize))
## ========================================

These packages automatically print some information about dependencies, citations, etc. You don’t need to worry about these messages.

library(ggalluvial)

And finally, this package loads silently.

Download and load data

We will use example data provided by 10X. These data were pre-cleaned and normalized for this workshop so we can get right to plotting! You can see the cleaning steps in this markdown with more information in the Seurat tutorial.

Please download the data and place the RData file in your project directory.

Then, load the data into R.

load("pbmc_clean.RData")
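If you are unsure which objects an RData file contains, note that load returns the names of the objects it creates. The sketch below shows this with a small temporary file standing in for pbmc_clean.RData. The file and the object pbmc_toy are hypothetical, used only so the example runs on its own.

```r
# Create a small stand-in RData file (hypothetical, not the workshop data)
tmp <- tempfile(fileext = ".RData")
pbmc_toy <- data.frame(percent.mt = c(3.0, 3.8, 0.9))
save(pbmc_toy, file = tmp)
rm(pbmc_toy)

# load() restores the saved objects and invisibly returns their names
loaded_names <- load(tmp)
print(loaded_names)

# The object is now available in the current environment again
print(exists("pbmc_toy"))
```

With the workshop file, the same pattern would be loaded_names <- load("pbmc_clean.RData"), after which printing loaded_names lists what was added to your environment.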

Data quality

First let’s explore the final data quality. While Seurat has a number of plotting functions, we will also build a plot from scratch so that you can further customize as needed.

The percent mitochondrial data are contained in the data frame pbmc[["percent.mt"]]. See the data cleaning notes for how this metric was calculated.

class(pbmc[["percent.mt"]])
## [1] "data.frame"
head(pbmc[["percent.mt"]])
##                  percent.mt
## AAACATACAACCAC-1  3.0177759
## AAACATTGAGCTAC-1  3.7935958
## AAACATTGATCAGC-1  0.8897363
## AAACCGTGCTTCCG-1  1.7430845
## AAACCGTGTATGCG-1  1.2244898
## AAACGCACTGGTAC-1  1.6643551
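Note that the cell barcodes are stored as row names rather than as a column. If you want the barcodes as a regular column, for example to join with other tables, one option is tibble's rownames_to_column. The sketch below rebuilds the first three rows shown above as a stand-alone data frame so it runs without the Seurat object. The names mt and mt_tbl and the column name "cell" are assumptions made for illustration.

```r
library(tibble)

# Stand-alone copy of the first three rows printed above
mt <- data.frame(
  percent.mt = c(3.0177759, 3.7935958, 0.8897363),
  row.names = c("AAACATACAACCAC-1", "AAACATTGAGCTAC-1", "AAACATTGATCAGC-1")
)

# Move the barcodes from row names into a column named "cell"
mt_tbl <- rownames_to_column(mt, var = "cell")
print(mt_tbl)

# Quick numeric summary of the metric
print(summary(mt$percent.mt))
```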

Violin plot

Seurat provides an easy way to make violin plots with VlnPlot.

VlnPlot(pbmc, features = "percent.mt", group.by = "orig.ident")

Now let’s recapitulate this plot with ggplot2, which uses layers connected by + to progressively build more complex and customized plots. To start, we create a simple dot plot, but it is not very readable because there are so many data points (one per cell in the data set).

ggplot(data = pbmc[["percent.mt"]]) +
  aes(x = 1, y = percent.mt) +
  geom_point()

We can improve readability by jittering the points on the x axis.

ggplot(data = pbmc[["percent.mt"]]) +
  aes(x = 1, y = percent.mt) +
  geom_jitter(height = 0, width = 0.2)
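Setting height = 0 matters here: the points are spread only along x, so the percent.mt values on the y axis are not altered. The sketch below checks this on simulated data, so it runs without the Seurat object. The data frame toy and its values are made up, and layer_data is used only to inspect the coordinates that ggplot2 would draw.

```r
library(ggplot2)

# Simulated stand-in for pbmc[["percent.mt"]] (hypothetical values)
set.seed(42)
toy <- data.frame(percent.mt = runif(200, min = 0, max = 5))

p <- ggplot(data = toy) +
  aes(x = 1, y = percent.mt) +
  geom_jitter(height = 0, width = 0.2)

# Extract the coordinates of the jittered layer
plotted <- layer_data(p)

# x is spread at most 0.2 to either side of 1
print(range(plotted$x))

# y is unchanged because height = 0
print(all.equal(as.numeric(plotted$y), toy$percent.mt))
```

A nonzero height would add random noise to the y values as well, which would misrepresent the measured percentages.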