AWS ECS with CloudWatch

AWS ECS is a nice environment to run your applications in. But sometimes you want “hot off the press” Docker features which you cannot configure in your task definitions just yet - like the awslogs log driver, which forwards your Docker logs to CloudWatch.

When using the Amazon-provided ECS AMIs the setup can be a bit complicated for non-US regions, so here’s a simple solution to make it work until the task definitions support the log driver:

Use cloud-init userdata to configure your instances properly:

#cloud-config

write_files:
  - path: /etc/ecs/ecs.config
    content: |
        ECS_CLUSTER=my-cluster
    owner: root:root
  - path: /etc/awslogs/awscli.conf
    content: |
        [plugins]
        cwlogs = cwlogs

        [default]
        region = eu-west-1
        aws_access_key_id = AKIAIOSFODNN7EXAMPLE
        aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
    owner: root:root
  - path: /etc/sysconfig/docker
    content: |
      DAEMON_MAXFILES=1048576
      OPTIONS="--default-ulimit nofile=1024:4096 --log-driver=awslogs --log-opt awslogs-region=eu-west-1 --log-opt awslogs-group=my-cluster"

package_upgrade: true
packages:
  - awslogs

runcmd:
  - service awslogs start
  - chkconfig awslogs on
  - sed -i '/region = us-east-1/c\region = eu-west-1' /etc/awslogs/awscli.conf
  - service awslogs restart
  - service docker restart
  - start ecs

First, I’m configuring my ECS agent to join the right cluster, then I’m writing the awslogs agent configuration.
Here’s the catch I’ve tripped over repeatedly:

when installing the awslogs package, the configuration file’s region always gets replaced with us-east-1.

To correct this I’m using sed to replace the wrong region, then restarting the awslogs agent.
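The sed invocation does a whole-line replacement (`c\` changes the matching line). You can verify the behaviour locally before baking it into userdata - a small sketch using a throwaway file in place of the real config:

```shell
# Simulate the config the awslogs package writes, with the wrong region
mkdir -p /tmp/awslogs-demo
cat > /tmp/awslogs-demo/awscli.conf <<'EOF'
[plugins]
cwlogs = cwlogs

[default]
region = us-east-1
EOF

# Same in-place replacement as in the runcmd section above
sed -i '/region = us-east-1/c\region = eu-west-1' /tmp/awslogs-demo/awscli.conf

grep '^region' /tmp/awslogs-demo/awscli.conf   # prints: region = eu-west-1
```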

Lastly, the Docker configuration file is overwritten, instructing Docker to forward all logs to CloudWatch, into a log group called my-cluster. This requires a restart of the Docker daemon, followed by a start of the ECS agent.

Done.

Hopefully this workaround won’t be required for too long, because there are two downsides: a) all logs are forwarded to CloudWatch, even those you are not interested in, and b) you cannot direct them to per-container log groups.
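For comparison, once the task definitions do support the log driver, the per-container configuration should look something along these lines - a sketch, with the group name and region as placeholders mirroring the setup above:

```json
"logConfiguration": {
  "logDriver": "awslogs",
  "options": {
    "awslogs-group": "my-cluster",
    "awslogs-region": "eu-west-1"
  }
}
```

This would live inside a container definition, giving each container its own log group instead of the daemon-wide default.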

But for now, it’s good enough - and easy to integrate into tools like terraform :)


Cheers to 2016

2015 was a turbulent year for me: I switched jobs twice, moved closer to Hamburg, lost a great companion and won a new one.

Guessing from the past two months 2016 will be a turbulent year as well, because I’ll have to help Yumi become the awesome border collie he’s supposed to be.

an image of Yumi, our border collie

So cheers to a great 2016. Cheers to new challenges and new adventures.


Migrating applications to AWS

Today I want to demonstrate how to move a web application into “the cloud”, specifically AWS. Along the way I’ll point out some details about the general steps necessary.

I’ll be using a simple Sinatra based url-shortener service I found on github.

The general steps look like this:

  • create an AMI containing all necessary dependencies
  • create the necessary infrastructure in the cloud
  • adjust deployment strategies

Before going into the details: I’m aiming for a reliable solution which ideally scales quickly. From where I stand, baking AMIs is a good solution for that, bringing us closer to ideas like immutable infrastructure and immutable servers.

While Docker seems like a good candidate I’ll be using traditional EC2 instances with pre-provisioned AMIs, using Vagrant, Ansible, Packer and Terraform for most of the heavy lifting. I’ll probably revisit this example with Docker and ECS some other time.

Assuming you have all tools installed you can also follow along.

provisioning a local VM

We need to write provisioning for our service. Instead of using AWS for this step we’ll prototype quickly using a local VM. Once the VM works we’ll adjust the provisioning to also work with Packer on AWS.

$ vagrant box add ubuntu/trusty64
$ vagrant init ubuntu/trusty64

Note that I’ve modified the Vagrantfile to include a private network with the IP 10.0.0.199.
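For reference, the relevant Vagrantfile change is a one-liner - a sketch assuming the default file generated by vagrant init:

```ruby
# Vagrantfile
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"
  # fixed private IP so ansible can reach the VM via the inventory
  config.vm.network "private_network", ip: "10.0.0.199"
end
```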

Now, start the empty VM

$ vagrant up

Ansible requires an inventory, containing the details on which hosts to provision, as well as a playbook to run:

# hosts
[vagrant]
10.0.0.199
# playbook.yml
- hosts: all
  sudo: true
  gather_facts: false
  vars:
    redis_url: redis://127.0.0.1:6379
    gem_home: /home/ubuntu/.gem
    app_directory: /home/ubuntu/app
  tasks:
    - apt:
        name: "{{ item }}"
        state: latest
        update_cache: yes
      with_items:
        - build-essential
        - ruby
        - ruby-dev
        - zip

    - gem:
        name: bundler
        state: latest
        user_install: no

    - file:
        path: "{{ app_directory }}"
        state: directory
        owner: ubuntu
        group: ubuntu

    - unarchive:
        copy: yes
        src: artifact.zip
        dest: "{{ app_directory }}"
        owner: ubuntu
        group: ubuntu
        creates: /home/ubuntu/app/app_config.rb

    - command: bundle install chdir={{ app_directory }}
      sudo_user: ubuntu
      environment:
        GEM_HOME: "{{ gem_home }}"

    - template:
        src: service.conf.j2
        dest: /etc/init/url-shortener.conf

    - service:
        name: url-shortener
        enabled: yes
        state: started

- hosts: vagrant
  sudo: true
  gather_facts: false
  tasks:
    - apt: name=redis-server state=latest
    - service: name=redis-server enabled=true state=started

Aside: the above playbook is not very well structured. In reality you’d want to split the playbook into multiple roles instead, making it easier to understand and maintain.

# service.conf.j2
description "URL Shortener Service"

start on runlevel [2345]
stop on runlevel [!2345]

env RACK_ENV=production
env GEM_HOME="{{ gem_home }}"
env REDIS_URL="{{ redis_url }}"

setuid ubuntu
setgid ubuntu

chdir {{ app_directory }}

exec /usr/local/bin/bundle exec rackup

respawn

The playbook runs in two stages: the first stage runs against all hosts and sets up the application itself. The second stage installs dependencies which, in the cloud, will later run outside the instance. Running both locally gives us a complete working system to prototype against and verify beforehand.

Packer will execute ansible using ansible-local, which will only execute the first stage.

In our case the first stage installs ruby & bundler. Then a directory is created, owned by the ubuntu user under which the application will run. Lastly, a yet-to-be-created artifact is copied onto the VM & unpacked.

The artifact will contain the source for our service. After installing the gem dependencies a service definition is created, to ensure the application starts up once the VM starts.

The second stage only installs redis, which is used as a data store by the url shortener.

Now, to get a working VM we need to create the artifact. Before that, however, you’ll have to make a tiny adjustment to the service: the ELB needs an endpoint to check whether an instance is healthy, since only healthy instances receive traffic.

I’ve thus modified the source slightly to add such an endpoint:

# app.rb
before do
  uri = URI.parse(ENV["REDIS_URL"])
  if !$r
    $r = Redis.new(:host => uri.host, :port => uri.port, :password => uri.password)
    $r.client.connect
  end
end

get "/status" do
  if $r.client.connected?
    status 200
  else
    status 500
  end
  body nil
end

Now let’s clone the repo & add our adjustments:

$ cd /tmp
$ git clone https://github.com/felipernb/url-shortener.git
$ cd url-shortener
# adjust app.rb
$ zip -r artifact.zip *

That’s all - the artifact is just an archive.

Aside: For compiled languages you’ll create an executable for the target system instead.

Let’s provision the VM now:

$ ansible-playbook -i hosts playbook.yml  --private-key=.vagrant/machines/default/virtualbox/private_key -u vagrant

After a short while the provisioner should finish. Let’s check that the service is running:

$ curl 10.0.0.199:9292/status -vv

If the response is 200 OK, we’re ready to move it to the cloud.

Moving it to the cloud

Packer uses a JSON-based template definition, containing information on where, how and what to build. Since we already wrote our provisioning, the template is pretty straightforward: launch a small AWS EC2 instance, install ansible on it, copy our playbook onto the machine and execute it.

# template.json
{
  "builders": [{
    "type": "amazon-ebs",
    "ami_name": "url_shortener",
    "ssh_username": "ubuntu",
    "source_ami": "ami-47a23a30",
    "instance_type": "t2.micro",
    "region": "eu-west-1",
    "force_deregister": true
  }],
  "provisioners": [
    {
      "type": "shell",
      "inline": [
        "sudo apt-get install -y software-properties-common",
        "sudo apt-add-repository ppa:ansible/ansible",
        "sudo apt-get update",
        "sudo apt-get install -y ansible"
      ]
    },
    {
      "type": "ansible-local",
      "playbook_file": "playbook.yml",
      "playbook_dir": "."
    }
  ]
}

Aside: the source_ami, ami-47a23a30, is the ID of the AWS ubuntu base image. This ID depends on the region you are in.

Executing packer gives us an AMI ID which we can then feed into Terraform.

$ packer build template.json
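If you want to capture the AMI ID for scripting instead of copying it from the build log, packer’s -machine-readable flag emits comma-separated lines you can parse. The sample line below is a hypothetical output fragment, but the field layout is the one packer uses for artifacts:

```shell
# In a real run you'd pipe `packer build -machine-readable template.json`
# through this; here a sample artifact line stands in for the build output.
echo '1450000000,amazon-ebs,artifact,0,id,eu-west-1:ami-e0f25593' \
  | awk -F, '$3 == "artifact" && $5 == "id" {print $6}' \
  | cut -d: -f2   # prints: ami-e0f25593
```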

Once we create a redis server and adjust the configuration the service should work just as expected.

Since we do not know the details about redis at this point, we’ll make use of a cloud-init script to rewrite the configuration once the EC2 instances start up. We’ll be using terraform to render this cloud-init script, since terraform acts as our single source of truth.

Alternatively we could set up service discovery, e.g. with consul, and configure the service ad hoc when the system boots up. I’ll leave that for a follow-up post.

Connecting the dots: terraforming

Terraform only needs the AMI ID from the previous steps to create our entire infrastructure. This means setting up a redis cluster, an elastic load balancer, an autoscaling group and a launch configuration:

# main.tf
provider "aws" {
  region = "eu-west-1"
}

variable "ami" {
  default = "ami-e0f25593"
}

resource "template_file" "init" {
  template = "${file("cloud-init/configure.sh.tpl")}"

  vars {
    redis_url = "${aws_elasticache_cluster.redis.cache_nodes.0.address}"
    redis_port = "${aws_elasticache_cluster.redis.cache_nodes.0.port}"
  }
}

resource "aws_elasticache_cluster" "redis" {
  cluster_id = "redis-cluster"
  engine = "redis"
  node_type = "cache.t2.micro"
  port = 6379
  num_cache_nodes = 1
  parameter_group_name = "default.redis2.8"
}

resource "aws_launch_configuration" "service" {
  image_id = "${var.ami}"
  instance_type = "t2.small"
  user_data = "${template_file.init.rendered}"
}

resource "aws_autoscaling_group" "service" {
  name = "service"
  max_size = 2
  min_size = 1
  health_check_grace_period = 300
  health_check_type = "ELB"
  desired_capacity = 1
  force_delete = true
  launch_configuration = "${aws_launch_configuration.service.name}"
  load_balancers = ["${aws_elb.service.name}"]
  availability_zones = ["eu-west-1a"]
}

resource "aws_elb" "service" {
  listener {
    # rackup serves the application on its default port 9292
    instance_port = 9292
    instance_protocol = "http"
    lb_port = 80
    lb_protocol = "http"
  }

  health_check {
    healthy_threshold = 2
    unhealthy_threshold = 2
    timeout = 3
    target = "HTTP:9292/status"
    interval = 30
  }

  availability_zones = ["eu-west-1a"]
}
# cloud-init/configure.sh.tpl
#!/bin/bash

sed -i '8s/.*/env REDIS_URL="redis:\/\/${redis_url}:${redis_port}"/' /etc/init/url-shortener.conf

sudo stop url-shortener
sudo start url-shortener

Note that you could also write the entire upstart script using the cloud-init daemon. But that would split configuration details between ansible & terraform, e.g. where GEM_HOME is located. I’ve opted to instead only adjust the content that changes.
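As a side note, the `8s` address in the template ties the replacement to line 8 of the upstart script, which silently breaks if the file ever gains or loses a line. Matching on the variable name is less brittle - a sketch, using a throwaway copy of the file and a made-up redis host:

```shell
# throwaway copy standing in for /etc/init/url-shortener.conf
cat > /tmp/url-shortener.conf <<'EOF'
env RACK_ENV=production
env GEM_HOME="/home/ubuntu/.gem"
env REDIS_URL="redis://127.0.0.1:6379"
EOF

# replace the line by matching the variable name, not its position
sed -i 's|^env REDIS_URL=.*|env REDIS_URL="redis://my-redis-host:6379"|' /tmp/url-shortener.conf

grep '^env REDIS_URL' /tmp/url-shortener.conf
```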

Once you execute terraform apply you’ll create all of the above services & connect things together.

To get the DNS of the ELB you can query terraform:

$ terraform show | grep dns_name
  dns_name = tf-lb-vcug5v54yfhyzaamz6hw43ey2i-2098903617.eu-west-1.elb.amazonaws.com
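Alternatively you can declare an output in main.tf, so terraform prints the DNS name after every apply without grepping - a sketch using the aws_elb resource defined above:

```
output "elb_dns_name" {
  value = "${aws_elb.service.dns_name}"
}
```

With that in place, terraform output elb_dns_name returns just the hostname.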

To once again check that your service is running properly you’ll have to curl this DNS:

$ curl tf-lb-vcug5v54yfhyzaamz6hw43ey2i-2098903617.eu-west-1.elb.amazonaws.com/status -vv

Aside: do not forget to terraform destroy if you followed along, otherwise you’ll be charged for the running services.

conclusion

The hashicorp tools make it extremely easy to move existing web applications into the cloud with only minimal changes to the application itself. For most applications all you need to do is add a health check endpoint.

The choice of tools also doesn’t matter. What matters is that you automate your infrastructure, ensuring that every change is checked into version control.

Also note that the above scripts are just a rough starting point. There are plenty of opportunities to improve them along the way:

  • restructure them for readability & maintainability (e.g. ansible -> roles, terraform -> modules)
  • introduce a multi-stage build setup to improve build times (e.g. first hardening the OS, then installing generic ruby dependencies, lastly only the application)

I hope this blog post got you excited about AWS & infrastructure in general!


Farewell Balto

you were a true companion, always - and a great guide.
you will always be with us in our hearts. Forever.

an image of Balto, our border collie