This post was originally posted on my company’s engineering blog here: http://engineering.curalate.com/2018/05/16/productionalizing-ecs.html

In January of last year we decided as a company to move towards containerization and began a migration to move onto AWS ECS. We pushed to move to containers, and off of AMI based VM deployments, in order to speed up our deployments, simplify our build tooling (since it only has to work on containers), get the benefits of being able to run our production code in a sandbox even locally on our dev machines (something you can’t really do easily with AMI’s), and lower our costs by getting more out of the resources we’re already paying for.

However, making ECS production ready was actually quite the challenge. In this post I’ll discuss:

  • Scaling the underlying ECS cluster
  • Upgrading the backing cluster images
  • Monitoring our containers
  • Cleanup of images, container artifacts
  • Remote debugging of our JVM processes
Read more

Continue reading

Debugging “Maximum String literal length exceeded” with scala

Today I ran into a fascinating bug. We use ficus as a HOCON auto parser for scala. It works great, because parsing configurations into strongly typed case classes is annoying. Ficus works by using a macro to invoke implicitly in scope Reader[T] classes for data types and recursively builds the nested parser.

I went to create a test for a new custom field I added to our config:

class ProductConfigTests extends FlatSpec {
  "Configs" should "be valid in QA" in {
    assert(ConfigLoader.verify(ProductsConfig, Environment.QA).isSuccess)
  }
}

Our config verifier just invokes the hocon parser and makes sure it doesn’t throw an error. ProductsConfig has a lot of fields to it, and I recently added a new one. Suddenly the test broke with the following error:

error] Error while emitting com/services/products/service/tests/ConfigTests
[error] Maximum String literal length exceeded
[error] one error found
[error] (server/test:compileIncremental) Compilation failed
[error] Total time: 359 s, completed May 
Read more
AETR an open source workflow engine

For the past several years I’ve been thinking about the idea of an open source workflow execution engine. Something like AWS workflow but simpler. No need to upload python, or javascript, or whatever. Just call an API with a callback url, and when the API completes its step, callback to the coordinator with a payload. Have the coordinator then send that payload to the next step in the workflow, etc.

This kind of simplified workflow process is really common and I keep running into it at different places that I work at. For example, my company ingests client catalogs to augment imagery with their SKU numbers and other metadata. However that ingestion process is really fragmented and asynchronous. There’s an ingestion step, following that there is a normalization step, then a processing step, then an indexing step, etc. In the happy case everyone is hooked together with a queue pipeline … Read more

Chaos monkey for docker

I work at a mostly AWS shop, and while we still have services on raw EC2, nearly all of our new development is on Amazon ECS in docker. I like docker because it provides a unified unit of operation (a container) that makes it easy to build shared tooling regardless of language/application. It also lets you reproduce your applications local in the same environment they run remote, as well as starting fast and deploying fast.

However, many services run on a shared ECS node in a cluster, and so while things like Chaos Monkey may run around turning nodes off it’d be nice to have a little less of an impact during working hours while still being able to stress recovery and our alerting.

This is actually pretty easy though with a little docker container we call The Beast. All the beast does is run on a ECS Scheduled … Read more

Tracking batch queue fanouts

Edit: This code now exists at https://github.com/paradoxical-io/carlyle

When working in any resilient distributed system invariably queues come into play. You fire events to be handled into a queue, and you can horizontally scale workers out to churn through events.

One thing though that is difficult to do is to answer a question of when is a batch of events finished? This is a common scenario when you have a single event causing a fan-out of other events. Imagine you have an event called ProcessCatalog and a catalog may have N catalog items. The first handler for ProcessCatalog may pull some data and fire N events for catalog items. You may want to know when all catalog items are processed by downstream listeners for the particular event.

It may seem easy though right? Just wait for the last item to come through. Ah, but distributed queues are loosely ordered. If an … Read more

Sbt sonatypeRelease on Travis

I figured I’d drop a quick note here for anyone else running into an issue. If you are trying to do a sonatypeRelease via sbt 1.0.3 on travis and getting a

Credentials file /home/travis/.sbt/credentials does not exist

Even though you are supplying your own inline creds, just drop in a fake creds file into that location:

realm=Sonatype Nexus Repository Manager
host=x.y.z
user=none
password=none

Via

cp scripts/creds.fake /home/travis/.sbt/credentials

And save yourself the 2 days of headache I have had :p… Read more

Functors in scala

A coworker of mine and I frequently talk about higher kinded types, category theory, and lament about the lack of unified types in scala: namely functors. A functor is a fancy name for a thing that can be mapped on. Wanting to abstract over something that is mappable comes up more often than you think. I don’t necessarily care that its an Option, or a List, or a whatever. I just care that it has a map.

We’re not the only ones who want this. Cats, Shapeless, Scalaz, all have implementations of functor. The downside there is that usually these definitions tend to leak throughout your ecosystem. I’ve written before about ecosystem and library management, and it’s an important thing to think about when working at a company of 50+ people. You need to think long and hard about putting dependencies on things. Sometimes you can, if those libraries have … Read more

Tracing High Volume Services

This post was originally posted at engineering.curalate.com

We like to think that building a service ecosystem is like stacking building blocks. You start with a function in your code. That function is hosted in a class. That class in a service. That service is hosted in a cluster. That cluster in a region. That region in a data center, etc. At each level there’s a myriad of challenges.

From the start, developers tend to use things like logging and metrics to debug their systems, but a certain class of problems crops up when you need to debug across services. From a debugging perspective, you’d like to have a higher projection of the view of the system: a linearized view of what requests are doing. I.e. You want to be able to see that service A called service B and service C called service D at the granularity of single requests.… Read more

From Thrift to Finatra

Originally posted on the curalate engineering blog

There are a million and one ways to do (micro-)services, each with a million and one pitfalls. At Curalate, we’ve been on a long journey of splitting out our monolith into composable and simple services. It’s never easy, as there are a lot of advantages to having a monolith. Things like refactoring, code-reuse, deployment, versioning, rollbacks, are all atomic in a monolith. But there are a lot of disadvantages as well. Monoliths encourage poor factoring, bugs in one part of the codebase force rollbacks/changes of the entire application, reasoning about the application in general becomes difficult, build times are slow, transient build errors increase, etc.

To that end our first foray into services was built on top of Twitter Finagle stack. If you go to the page and can’t figure out what exactly finagle does, I don’t blame you. The documentation is … Read more