This repo uses the Packer + Terraform combo to deploy an interactive Clojure development machine with Leiningen and BeakerX on Amazon EC2. This allows coding on a local machine while executing code directly on the AWS infrastructure with easy access to resources such as S3, EMR, Elasticsearch, etc.
The deployment involves 3 simple steps/layers:
-
provisioning an Amazon Machine Image (Packer)
-
creating a reusable EBS volume for data (Terraform)
-
creating an EC2 instance based on the above AMI and mounting the EBS volume onto it at
/data
(Terraform).
This project showcases the declarative and immutable paradigm of Packer and Terraform:
-
Declarative: we describe the resources we wish to create rather than the steps taken to build them or amend them,
-
Immutable: we destroy old resources and create new resources when we want to deploy changes, rather than amending existing resources.
-
Download this repo and install Packer and Terraform on your local machine.
-
If you don't already have one, set up an AWS account.
-
Create an IAM Profile (e.g.
terraform
) to allow Packer and Terraform to manage resources. For simplicity, give itAmazonEC2FullAccess
permissions and and anIAMPassRole
inline policy (see Packer's and Terraform's online doc to narrow the permissions to what is strictly necessary). Register this profile in your local~/.aws/credentials
file. -
Still in IAM, create an IAM role for the instance, with access to the AWS services that you'll wish to access via the instance.
-
In EC2, create the following resources so that Terraform can ssh onto the instance:
-
A key pair, which the ssh server on the instance will use to authenticate connections; download the private key onto your local machine (
.pem
file). -
A security group (firewall rules) that provides inbound TCP access to a port of your choosing for ssh access.
-
Edit the variables in packer/images.json
to match the IAM profile
that you've set up earlier on, the AWS region
and the ID of the latest Amazon Linux 2 AMI (for SSD/gp2 EBS volume)
in that region.
"variables": {
"profile": "terraform",
"region": "eu-west-1",
"ami": "ami-0bdb1d6c15a40392c"
}
Edit the dependencies in packer/files/.packer/project.clj
to
amend the list of pre-loaded Clojure libraries.
In the packer
directory, run packer build image.json
. This will
create the new image and install Leiningen, BeakerX and libraries. The image
will then be ready in the EC2 image registry.
In the terraform/ebs
directory, create a new terraform.tfvars
file from the example file. Edit the variables to match your region
and IAM profile for Terraform.
region = "eu-west-1"
availability_zone = "eu-west-1a"
profile = "terraform"
You can also edit main.tf
to change volume size and type.
In the terraform/ebs
directory, run terraform init
and then terraform apply
to create the EBS volume.
In the terraform/ec2
directory, create a new terraform.tfvars
file from the example file. Edit the variables to match your region,
IAM profile for Terraform, key name, instance IAM role, security group and
ssh connection parameters.
region = "eu-west-1"
availability_zone = "eu-west-1c"
profile = "terraform"
ec2_iam_role = "my-iam-role"
ec2_key_name = "my-ec2-key"
ssh_private_key_path = "/my-private-keys/my-ec2-key.pem"
ec2_security_groups = ["my-security-group"]
ssh_port = "12345"
You can also edit main.tf
to change instance type, root disk size...
In the terraform/ec2
directory, run terraform init
and then terraform apply
.
This creates the instance and mounts the EBS volume to it at /data
.
Ssh into the instance using the port and private key (.pem
file)
that you've set up earlier.
Instruct ssh to forward some ports from your local machine to the instance
in order to locally access the remote nREPL and Jupyter notebooks.
For example:
ssh -i "/my-private-keys/my-ec2-key.pem" \
-p 12345 \
-L 20000:localhost:20000 \
-L 20001:localhost:20001 \
ec2-user@ec2-444-333-222-111.eu-west-1.compute.amazonaws.com
You may then, via shh:
-
Start a Clojure nREPL on one of the remote forwarding ports e.g.
lein :start :port 20000
. Connect your local development environment to it via the matching local port. If started from the home directory~
, the REPL will use the minimalproject.clj
file that is pre-installed there: it will run Clojure 1.9 with thealembic
library, which allows dynamically loading dependencies from the REPL. Use the/data
directory to persist new projects and files on the reusable EBS volume. -
Start Jupyter/BeakerX on one of the remote forwarding ports e.g.
jupyter notebook --port=20001
. In your local browser, open the authentication URL produced by Jupyter (change the port to the matching local port if different). If you start Jupyter from the/data
directory, you will be be able to save notebooks on the reusable EBS volume.
The Packer template pre-populates the local Maven repository ~/.m2
with selected libraries as per packer/files/.packer/project.clj
. Leiningen
will have access to these and will, as usual, download new libraries directly
from Clojars or Maven Central into ~/.m2
if they're not already there.
BeakerX uses its own local Maven repository rather than ~/.m2
, out of concern that ~/.m2
is not entirely safe for concurrent access.
You can still instruct BeakerX to use ~/.m2
instead of its own repo,
so that it can access the pre-loaded libraries.
To do so, run %classpath config resolver mvnLocal
in your Beaker notebook.
The %classpath add mvn
directive will then load libraries from ~/.m2
.
Another reason for switching the BeakerX repo to ~/.m2
is that
BeakerX can only download dependencies from Maven Central, but not from Clojars.
So if you want to use a new Clojars library in BeakerX,
download it first into ~/.m2
with Leiningen
(with Alembic or by running lein deps
over a project.clj
) and then
load it in BeakerX with %classpath config resolver mvnLocal
and %classpath add mvn
.
Because the local Maven repository ~/.m2
is located on the instance's
transient root drive, it will be reset to its AMI state when the instance is
destroyed/recreated with Terraform. So if you use a library frequently,
burn it into the AMI with Packer via packer/files/.packer/project.clj
.
There are a few tasks that you'll want to execute after the initial deployment, such as destroying and recreating the instance, increasing the size of the EBS volume, rolling out a new AMI, or destroying and recreating the whole stack.
In Terraform's (mostly) immutable logic, we destroy and recreate
resources rather than amending/mutating them. To do this painlessly, this project follows
Terraform's guidelines
about breaking an overall configuration into several smaller configurations, and using
data sources
when a configuration needs to access access resources created by another configuration.
In our case, the terraform/ec2
configuration uses data sources to locate the AMI
and the EBS volume created by the terraform/ebs
configuration.
In the terraform/ec2
directory, run terraform destroy
. Amend
the Terraform configuration file if there's any change you want to make.
Later, recreate the instance with terraform apply
.
The new instance will be attached to the
same EBS volume as before, which remains intact.
Destroying and recreating the EC2 instance is preferred to stopping and restarting it. Restarting the instance would indeed occasion a change of IP address and hence a configuration drift between EC2 and Terraform's state files. Also, relying on successive stop/restart cycles would be an encouragement to make successive and possibly undocumented amendment to the instance; wiping the slate clean every time forces us to code changes 'at the source' in the AMI.
This is an amendment that doesn't require destruction of the existing volume.
However, you'll have to detach the volume first, which you can do by destroying
the instance first.
Edit terraform/ebs/main.tf
with then new size and run terraform apply
, after which
you can recreate the instance.
You can also change the disk type and IOPS,
subject to some limitations.
In doubt whether a change requires destruction of a resource or not, run terraform plan
.
Amend the Packer template and its associated files and scripts.
Run packer build image.json
to create the new AMI on AWS.
Destroy and recreate instance;
the Terraform configuration will automatically select
the latest version of the AMI.
Save a snapshot of the EBS volume if you wish to keep its data.
Run terraform destroy
, first in terraform/ec2
and then in terraform/ebs
.
Amend the terraform configuration files if needed, in particular if
the new EBS volume should be based on a snapshot taken earlier
(see aws_ebs_volume).
Then run terraform apply
, first in terraform/ebs
and then
in terraform/ec2
.
Since the machine is provisioned at AMI level by Packer, our Terraform configuration is generic and directly reusable for other projects.
Below a couple of comments about the Terraform code.
The ssh port of the instance can be changed from the onset
by passing a script to the user_data
property of the aws_instance
.
This will be directly executed as root
by AWS when the instance is created.
The code below uses sed
to replace the #Port 22
line in /etc/ssh/sshd_config
with a new port, and it then restarts the ssh server:
user_data = "#!/bin/bash\nsed -i 's/#Port 22/Port ${var.ssh_port}/g' /etc/ssh/sshd_config && service sshd restart"
The user_data
block is executed before Terraform runs provisioners, so Terraform
provisioners only need to know about the new port.
Terraform allows attaching an EBS volume to an EC2 instance via a
resource called aws_volume_attachment
. It's pretty straightforward: you specify
an aws_instance
and an aws_ebs_volume
, and you then specify their attachment
with aws_volume_attachment
.
resource "aws_volume_attachment" "clojure" {
device_name = "/dev/sdd"
volume_id = "${data.aws_ebs_volume.clojure.id}"
instance_id = "${aws_instance.clojure.id}"
}
In its simplest form above, aws_volume_attachment
only attaches the volume as a device (e.g. /dev/sdd
), but it doesn't
mount it onto a folder. Terraform will also refuse
to destroy the attachment if the device is already mounted (although
there is an option to add force_detach = true
to force the detachment).
Getting granular control of the mounting and unmounting process with Terraform
can come across as a confusing matter.
However, using remote_exec
provisiones within the aws_volume_attachment
does provide
such granular control.
The key here is to give these provisioners the proper ssh connection
so that they can connect to the aws_instance
and run
provisioning commands/scripts. Terraform's 'interpolation syntax'
makes this easy, by providing the hostname of the instance via
"${aws_instance.my-instance.public_dns}"
. Below the code of the
aws_volume_attachment
in terraform/ec2/main.tf
:
resource "aws_volume_attachment" "clojure" {
device_name = "/dev/sdd"
volume_id = "${data.aws_ebs_volume.clojure.id}"
instance_id = "${aws_instance.clojure.id}"
connection {
type = "ssh"
user = "ec2-user"
host = "${aws_instance.clojure.public_dns}"
port = "${var.ssh_port}"
private_key = "${file(var.ssh_private_key_path)}"
}
provisioner "remote-exec" {
script = "scripts/ebs-mount"
}
provisioner "remote-exec" {
when = "destroy"
script = "scripts/ebs-unmount"
}
}
The result of this code is that Terraform will run ebs-mount
via ssh onto
the instance once the attachment is created; and it will run ebs-unmount
before
destroying the attachment. My codes for ebs-mount
and ebs-unmount
can be found in ec2/scripts
. ebs-mount
detects whether
the volume needs to be formatted or not before mounting. ebs-unmount
sends a kill signal to processes that use the device before unmounting it.
I'm not a linux expert though, and these scripts can certainly be improved.
Nevertheless, using remote_exec
provisioners as above
seems an idiomatic way to mount/unmount volumes via aws_volume_attachment
and to tailor the process to one's needs.
Copyright © 2018 Nicolas Duchenne, Belove Ltd, London, UK
Distributed under the Creative Commons Attribution 4.0 (CC BY 4.0) license.