This post was originally published on my company’s engineering blog here: http://engineering.curalate.com/2018/05/16/productionalizing-ecs.html

In January of last year we decided as a company to move towards containerization and began migrating onto AWS ECS. We pushed to move to containers, and off of AMI-based VM deployments, for several reasons: to speed up our deployments, to simplify our build tooling (since it only has to work on containers), to be able to run our production code in a sandbox even locally on our dev machines (something you can’t really do easily with AMIs), and to lower our costs by getting more out of the resources we’re already paying for.

However, making ECS production ready was actually quite the challenge. In this post I’ll discuss:

  • Scaling the underlying ECS cluster
  • Upgrading the backing cluster images
  • Monitoring our containers
  • Cleaning up images and container artifacts
  • Remote debugging of our JVM processes

Chaos monkey for docker

I work at a mostly AWS shop, and while we still have services on raw EC2, nearly all of our new development is on Amazon ECS in docker. I like docker because it provides a unified unit of operation (a container) that makes it easy to build shared tooling regardless of language or application. It also lets you reproduce your applications locally in the same environment they run in remotely, and containers start fast and deploy fast.

However, many services run on a shared ECS node in a cluster. So while things like Chaos Monkey may run around turning whole nodes off, it’d be nice to have a little less of an impact during working hours while still being able to stress our recovery and our alerting.

Dynamic HAProxy configs with puppet

I’ve posted a little about puppet and our team’s ops in the past, since my team has invested pretty heavily in the dev portion of the ops role. Our initial foray into ops included building a pretty basic role-based puppet system, which we use to coordinate docker deployments of our java services.

We use HAProxy as our software load balancer, and the v1 of our infrastructure management had us versioning a hardcoded haproxy.cfg for each environment and pushing out that config whenever we wanted to add or remove machines from the load balancer. It works, but it has a few issues:

  1. Cluster swings involve checking into github. This pollutes our version history with a bunch of unnecessary toggling.
  2. Swings are difficult to automate, since the setup is driven by a flat-file config that has to be pushed out from puppet.

Leveraging message passing to do currying in ruby

I’m not much of a ruby guy, but I had the inkling to play with it this weekend. The first thing I do when I’m in a new language is try to map constructs that I’m familiar with, from basic stuff like object instantiation, singletons, inheritance, to more complicated paradigms like lambdas and currying.

I came across this blog post that shows that ruby has a way to auto curry lambdas, which is actually pretty awesome. However, I was a little confused by the syntax:

a.send(fn, b)

I’m more used to ML style, where you would write:

fn a b 

So what is a.send doing?

Message passing

Ruby exposes its dynamic dispatch as a message passing mechanism (like Objective-C), so you can send “messages” to objects. It’s like being able to say “hey, execute this function (named by a string or symbol) on this context”.
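
To make that concrete, here is a minimal sketch of message passing and auto currying side by side. The helper names (apply_message, curried_add, run_ops) are made up for illustration and are not from the blog post mentioned above; only Object#send and Proc#curry are standard ruby.

```ruby
# a.send(fn, b) asks the receiver `a` to run the method named `fn`
# (a symbol or a string) with the argument `b`.
# apply_message is a hypothetical helper wrapping that call.
def apply_message(receiver, fn, *args)
  receiver.send(fn, *args)
end

# Proc#curry turns a two-argument lambda into a chain of
# one-argument calls, close to ML's `fn a b`.
def curried_add
  ->(a, b) { a + b }.curry
end

# Because operators are just messages, a list of [name, argument]
# pairs can be folded over a starting value with send.
def run_ops(start, ops)
  ops.reduce(start) { |acc, (op, arg)| acc.send(op, arg) }
end

add_five = curried_add.call(5)        # partially applied lambda

puts apply_message(1, :+, 2)          # same as 1 + 2
puts apply_message("hello", "upcase") # method chosen by a string
puts add_five.call(10)                # supplies the second argument
puts run_ops(2, [[:+, 3], [:*, 4]])   # (2 + 3) * 4
```

The point of the last helper is that the method to run is plain data at the call site, which is what makes send useful for building things like currying on top of it.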

Pulling back all repos of a github user

I recently had to relinquish my trusty dev machine (my work laptop) since I got a new job, and as such am relegated to using my old mac laptop at home for development until I either find a new personal dev machine or get a new work laptop. For those who don’t know, I’m leaving the DC area and moving to Seattle to work for Amazon, so that’s pretty cool! Downside is that it’s Java and Java kind of sucks, but I can still do F#, Haskell, and all the other fun stuff on the side.

Anyways, since I’m setting up my home dev environment I wanted to pull back all my github repos in one go. If I only had a few of them I would’ve just cloned them by hand, but I have almost 30 repos, which puts me in the realm of wanting to automate it.
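
One way this could be automated, as a rough sketch rather than the approach this post necessarily took: GitHub's REST API lists a user's public repositories at https://api.github.com/users/<user>/repos, and each entry in the returned JSON array has a clone_url field. The helper name clone_commands and the sample data below are made up for illustration.

```ruby
require "json"

# Turn the JSON body of the "list repositories for a user" response
# into one `git clone` command per repository.
# clone_commands is a hypothetical helper; the URLs are not shell-escaped.
def clone_commands(repos_json)
  JSON.parse(repos_json).map { |repo| "git clone #{repo.fetch("clone_url")}" }
end

# Hand-written sample standing in for a real API response.
sample = <<~JSON
  [
    {"name": "demo-one", "clone_url": "https://github.com/example-user/demo-one.git"},
    {"name": "demo-two", "clone_url": "https://github.com/example-user/demo-two.git"}
  ]
JSON

puts clone_commands(sample)
```

Note that the endpoint is paginated and returns 30 repositories per page by default, so an account with more than that would need the per_page or page query parameters to fetch everything.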

