This is a bare-bones tutorial on how to install R and associated packages on an Amazon AMI EC2 instance. It assumes that you have followed the AWS setup tutorial to create an AWS account and are familiar with the steps to create an EC2 instance.
Create an instance online through the AWS console (https://us-west-2.console.aws.amazon.com/ec2).
Recommendations for R usage:
In the terminal, log in to your instance and perform basic setup. More details are available in 1.AWS_setup_tutorial.pdf.
This directory setup is slightly different from the one in the setup tutorial. We mount the EBS storage to project/ and then fuse a subdirectory, project/data/, to the S3 bucket holding all our data. This is because a fused directory becomes read-only. Thus, if we fused to the main directory as in the setup tutorial, we would not be able to write any of our results to the EBS storage, where we have the extra space.
## Updates if available
sudo yum upgrade -y
sudo yum update -y
## Install AWS command line client if not using Amazon OS
sudo yum install awscli -y
## Configure your account
aws configure
## FILL IN WITH YOUR KEYS ###
## Setup fuse
sudo amazon-linux-extras install -y epel
sudo yum install -y s3fs-fuse
### Fuse key
### FILL IN WITH YOUR KEYS ###
echo UserKey:SecretKey > ~/.passwd-s3fs
chmod 600 ~/.passwd-s3fs
## Setup EBS volumes
lsblk
### WARNING: mkfs erases the device. Confirm the device name in the lsblk output first
sudo mkfs -t ext4 /dev/nvme1n1
sudo mkdir -p ~/project
sudo mount /dev/nvme1n1 ~/project/
### Change permissions
sudo chmod 777 -R ~/project/
## Mount S3 data
mkdir ~/project/data
sudo chmod 777 -R ~/project/data
s3fs kadm-data ~/project/data -o passwd_file=~/.passwd-s3fs \
-o default_acl=public-read -o uid=1000 -o gid=1000 -o umask=0007
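Because the layout depends on the credentials file having the right permissions and on project/ staying writable, a quick check after setup can catch mistakes early. The following is a minimal sketch, not part of the original setup; `check_mode` and `check_writable` are illustrative helper names, and `stat -c` assumes GNU coreutils, as found on Amazon Linux.

```shell
#!/usr/bin/env bash
# Sanity checks to run after the setup commands above.
# check_mode and check_writable are illustrative helper names.

# Succeeds if FILE has exactly the octal permission MODE (e.g. 600).
check_mode() {
  local file=$1 mode=$2
  [ "$(stat -c '%a' "$file" 2>/dev/null)" = "$mode" ]
}

# Succeeds if a file can be created (and removed) inside DIR.
check_writable() {
  local dir=$1 probe
  probe=$(mktemp "$dir/.write_test.XXXXXX" 2>/dev/null) || return 1
  rm -f "$probe"
}

# Example usage (paths follow the tutorial's layout):
# check_mode ~/.passwd-s3fs 600 || echo "fix permissions on ~/.passwd-s3fs"
# check_writable ~/project      || echo "~/project is not writable"
# df -h ~/project ~/project/data   # shows what backs each directory
```

If `check_writable ~/project` fails, results cannot be saved to the EBS volume, which usually points to a problem with the mount or the permissions step.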
Here, we download and install R on the main EC2 instance storage, not the extra EBS storage. You could install it on the extra storage if your main disk were not large enough.
### Download to a directory on the main disk
mkdir ~/apps/
sudo chmod 777 -R ~/apps
cd ~/apps/
#### Update to latest version as necessary ####
wget https://cran.r-project.org/src/base/R-4/R-4.1.1.tar.gz
tar xf R-4.1.1.tar.gz
cd R-4.1.1/
sudo yum install -y gcc gcc-c++ gcc-gfortran readline-devel \
zlib-devel bzip2 bzip2-devel xz xz-devel \
libcurl libcurl.i686 libcurl-devel.x86_64 \
openssl-devel findutils libffi-devel \
libxml2 libxml2-devel pcre java \
nlopt nlopt-devel libpng-devel cmake pkg-config #for kimma
sudo yum update -y
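The download step above hard-codes the R version in three places. One way to make "update to latest version as necessary" a single change is to derive the URL from a version variable. This is a sketch only; `r_source_url` is a hypothetical helper, and it assumes CRAN keeps the `src/base/R-<major>/` layout seen in the URL above.

```shell
#!/usr/bin/env bash
# Build the CRAN source URL for a given R version (hypothetical helper).
# Assumes the src/base/R-<major>/ layout used by the tutorial's URL.
r_source_url() {
  local version=$1
  local major=${version%%.*}   # "4.1.1" -> "4"
  echo "https://cran.r-project.org/src/base/R-${major}/R-${version}.tar.gz"
}

# Example usage:
# R_VERSION=4.1.1
# wget "$(r_source_url "$R_VERSION")"
# tar xf "R-${R_VERSION}.tar.gz"
# cd "R-${R_VERSION}/"
```

If you change the version, remember that later steps (the configure prefix and the PATH entry) also name the R-4.1.1 directory.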
Update CMake and nlopt by building newer versions from source.
cd
#Update CMake
wget https://cmake.org/files/v3.23/cmake-3.23.1.tar.gz
tar -xvzf cmake-3.23.1.tar.gz
cd cmake-3.23.1
./bootstrap
make
sudo make install
#Update nlopt
cd
wget https://github.com/stevengj/nlopt/archive/v2.7.1.tar.gz
tar -xf v2.7.1.tar.gz
cd nlopt-2.7.1/
mkdir build
cd build
cmake ..
make
sudo make install
#Restart instance
Configure and build R from its source directory. This will take several minutes.
cd ~/apps/R-4.1.1/
./configure --prefix=$HOME/apps/R-4.1.1/ --with-x=no --with-pcre1
make
Add the path to R to ~/.bash_profile. Once complete, exit and log back in to your EC2 instance for the PATH to take effect.
echo 'export PATH="$HOME/apps/R-4.1.1/bin:$PATH"' >> ~/.bash_profile
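Running the echo line more than once appends a duplicate entry to ~/.bash_profile each time. A small guard avoids that. This is a sketch under the assumption that an exact-line match is a good enough test; `append_once` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Append LINE to FILE only if FILE does not already contain that exact line.
# append_once is an illustrative helper name.
append_once() {
  local line=$1 file=$2
  # -x: match whole lines, -F: treat LINE as a fixed string, not a regex
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example usage:
# append_once 'export PATH="$HOME/apps/R-4.1.1/bin:$PATH"' ~/.bash_profile
```

The single quotes keep `$HOME` and `$PATH` unexpanded in the file, so they are resolved at each login rather than frozen at the time of writing.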
Open R with R [Enter]. Double-check that the version that opens is the one you downloaded. If it is not, there is likely something wrong with your PATH in .bash_profile.
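The same version check can be done from the terminal without opening an R session. The sketch below compares the first line of `R --version` against the version you expect; `check_r_version` is an illustrative helper name, and it assumes that line has the usual form `R version X.Y.Z (date) ...`.

```shell
#!/usr/bin/env bash
# Succeeds if the first line of `R --version` reports the EXPECTED version.
# check_r_version is an illustrative helper name.
check_r_version() {
  local expected=$1 first_line
  first_line=$(R --version 2>/dev/null | head -n 1)
  case "$first_line" in
    "R version ${expected} "*) return 0 ;;
    *) echo "expected R ${expected}, found: ${first_line:-no R on PATH}" >&2
       return 1 ;;
  esac
}

# Example usage:
# check_r_version 4.1.1 && command -v R   # also prints which R is being used
```

If the check fails, `command -v R` shows which executable the shell is finding first, which helps when debugging the PATH entry.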
Install packages as you would in RStudio’s console. You can speed things up by setting the number of threads (or CPUs) in options, like so. The following packages are recommended for all EC2 instances with R.
options(Ncpus = 20)
install.packages(c("foreach","doParallel",
"tidyverse","BiocManager","devtools"),
repos='http://cran.us.r-project.org')
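The value 20 above suits a large instance; on a smaller one, Ncpus should not exceed the number of cores available. From the terminal, `nproc` reports that number. The sketch below picks all cores but one, which is a judgment call rather than a requirement; `suggest_ncpus` is an illustrative helper name.

```shell
#!/usr/bin/env bash
# Print a thread count for R's Ncpus option: all cores but one, minimum 1.
# suggest_ncpus is an illustrative helper name.
suggest_ncpus() {
  local cores
  cores=$(nproc 2>/dev/null || echo 1)   # fall back to 1 if nproc is missing
  if [ "$cores" -gt 1 ]; then
    echo $((cores - 1))
  else
    echo 1
  fi
}

# Example usage (run from the terminal, not inside R):
# Rscript -e "options(Ncpus = $(suggest_ncpus)); install.packages('foreach', repos = 'http://cran.us.r-project.org')"
```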
Exit with the following.
q()
In the terminal, you can run scripts like so. I recommend using screen so that the script keeps running even if you log out of the instance.
screen
Rscript my_script.R
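As an alternative to starting screen interactively, the script can be launched in a detached, named session with its output sent to a log file. This is a sketch, assuming screen is installed; `run_in_screen` and the `DRY_RUN` variable are illustrative names, and the log file name is derived from the script name. Script names containing single quotes are not handled.

```shell
#!/usr/bin/env bash
# Run an R script in a detached, named screen session and log its output.
# run_in_screen is an illustrative helper; set DRY_RUN=1 to only print the command.
run_in_screen() {
  local session=$1 script=$2
  local log="${script%.R}.log"                 # my_script.R -> my_script.log
  local inner="Rscript '$script' > '$log' 2>&1"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'screen -dmS %s bash -c "%s"\n' "$session" "$inner"
  else
    # -d -m: start detached; -S: name the session
    screen -dmS "$session" bash -c "$inner"
  fi
}

# Example usage:
# run_in_screen my_job my_script.R   # start detached
# screen -ls                         # list sessions
# screen -r my_job                   # reattach; detach again with Ctrl-a d
```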
At any time, you can check what’s running on your instance with top in the terminal.