Category: Code

Serialization of lombok value types with jackson

For anyone who uses lombok with jackson, you should check out jackson-lombok, a fork from xebia that allows lombok value types (and lombok-generated constructors) to be JSON creators.

The original authors compiled their version against jackson-core 2.4.* but the new version uses 2.6.*. Props need to go to github user kazuki-ma for submitting a PR that actually addresses this. Paradoxical just took those fixes and published them.

Anyways, now you get the niceties of being able to do:

@Value
public class ValueType{
    @JsonProperty
    private String name;
    
    @JsonProperty
    private String description;
}

And instantiate your mapper:

new ObjectMapper().setAnnotationIntrospector(new JacksonLombokAnnotationIntrospector());
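
A quick round-trip sketch with that mapper (assuming the ValueType class above; lombok’s @Value generates the all-args constructor jackson needs):

ObjectMapper mapper = new ObjectMapper()
    .setAnnotationIntrospector(new JacksonLombokAnnotationIntrospector());

// serialize the immutable value type
String json = mapper.writeValueAsString(new ValueType("widget", "a thing"));

// and deserialize it back without hand-writing a @JsonCreator
ValueType roundTripped = mapper.readValue(json, ValueType.class);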

Enjoy!

Cassandra DB migrations

When building any application that involves persistent data storage you usually need a way to upgrade and change your database using a set of scripts. Working with patterns like ActiveRecord you get easy up/down by-version migrations. But with cassandra, which traditionally was schemaless, there aren’t that many tools out there to do this.

One thing we have been using at my work and at paradoxical is a simple java based cassandra loader tool that does “up” migrations based on db version scripts.

Assuming you have a folder in your application that stores db scripts like

db/scripts/01_init.cql
db/scripts/02_add_thing.cql
..
db/scripts/10_migrate_users.cql
..

Then each script corresponds to a particular db version state. Its current state depends on all previous states. Our cassandra loader tracks db versions in a db_version table and lets you apply runners against a keyspace to move your schema (and data) to the target version. If your db is already at the target version it does nothing, and if your db is a few versions back the runner will only run the versions required to get you to latest (or to the version number you want).
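
Conceptually the runner boils down to something like this sketch (the helper names here are illustrative, not the loader’s actual API):

int currentVersion = getCurrentDbVersion(session);    // read from the db_version table
int targetVersion  = 10;                              // or whatever "latest" resolves to

for (CqlScript script : scriptsSortedByVersion("db/scripts")) {
    if (script.getVersion() > currentVersion && script.getVersion() <= targetVersion) {
        session.execute(script.getCql());               // apply the "up" migration
        recordDbVersion(session, script.getVersion());  // track the new state
    }
}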

Taking this one step further, when working at least in Java we have the luxury of using cassandra-unit to run an embedded cassandra instance for unit or integration tests. This way you don’t need to mock out your database; you actually run all your db calls through the embedded cassandra. We use this heavily in cassieq (a distributed queue based on cassandra).

One thing our cassandra loader can do is be run in library mode, where you give it the same set of db scripts and you can build a fresh db for your integration tests:

public static Session create() throws Exception {
    return CqlUnitDb.create("../db/scripts");
}

Running the loader in standalone mode (by downloading the runner maven classifier) lets you run the migration runner in your console:

> java -jar cassandra.loader-runner.jar

Unexpected exception:Missing required options: ip, u, pw, k
usage: Main
 -f,--file-path <arg>         CQL File Path (default =
                              ../db/src/main/resources)
 -ip <arg>                    Cassandra IP Address
 -k,--keyspace <arg>          Cassandra Keyspace
 -p,--port <arg>              Cassandra Port (default = 9042)
 -pw,--password <arg>         Cassandra Password
 -recreateDatabase            Deletes all tables. WARNING all
                              data will be deleted! 
 -u,--username <arg>          Cassandra Username
 -v,--upgrade-version <arg>   Upgrade to Version

The advantage to unifying all of this is that you can test your db scripts in isolation and be confident that they work!

Dalloc – coordinating resource distribution using hazelcast

A fun problem that has come up during the implementation of cassieq (a distributed queue based on cassandra) is how to evenly distribute resources across a group of machines. There is a scenario in cassieq where writes can be delayed, and as such there is a custom worker in the app (per queue) that watches a queue to see if a delayed write comes in and, if so, republishes the message to a later bucket. It’s transparent to the user, but if we have multiple workers on the same queue we could potentially republish the message twice. While technically that falls within the SLA we’ve set for cassieq (at least once delivery), it’d be nice to avoid this particular race condition.

To solve this, I’ve clustered the cassieq instances together using hazelcast. Hazelcast is a pretty cool library since it abstracts away member discovery/connection and gives you events on membership changes to make it easy for you to build distributed data grids. It also has a lot of great primitives that are useful in building distributed workflows. Using hazelcast, I’ve built a simple resource distributor that uses shared distributed locks and a master set of allocations across cluster members to coordinate who can “grab” which resource.

For the impatient you can get dalloc from

<dependency>
    <groupId>io.paradoxical</groupId>
    <artifactId>dalloc</artifactId>
    <version>1.0</version>
</dependency>

The general idea in dalloc is that each node creates a resource allocator that is bound to a resource group name (like “Queues”). Each node supplies a function to the allocator that generates the master set of resources to use, and a callback for when resources are allocated. The callback is there so you can wire in async events for when allocations need to be rebalanced outside of a manual invocation (like a cluster member join/leave).

The entire resource allocation library API deals with abstractions on what a resource is, and lets the client map their internal resource into a ResourceIdentity. For cassieq, it’s a queue id.
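
A usage sketch might look something like the following; the factory and method names here are illustrative guesses, not necessarily dalloc’s exact API:

ResourceAllocator allocator = allocatorFactory.getAllocator(
        ResourceGroup.valueOf("Queues"),
        // supplier of the master set of resources this node knows about
        () -> queueRepository.getQueueIds().stream()
                             .map(id -> ResourceIdentity.valueOf(id.toString()))
                             .collect(Collectors.toSet()),
        // callback invoked with the subset of resources this node now owns
        claimed -> queueWorkers.setActiveQueues(claimed));

// manually trigger an allocation pass (also triggered on cluster membership changes)
allocator.claim();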

When an allocation is triggered (either manually or via a member join/leave event) the following occurs:

  • Try and acquire a shared lock for a finite period of time
  • If you acquired the lock, get the map of what has already been allocated to everyone else and compare it against your master set to see what is still available
  • Given the size of the current cluster, determine how many resources you are allowed to claim (by even distribution; see the sketch after this list). If you don’t have your entire share claimed, take as many as you can to fill up. If you have too many claimed, give some resources up
  • Persist your changes to the master state map
  • Dispatch to your callback what the new set of resources should be
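
The even-distribution step is just simple math over the cluster size, roughly like this (the claim/release helpers are illustrative):

// how many resources each member may own if everything is spread evenly
int totalResources = masterSet.size();
int members        = hazelcast.getCluster().getMembers().size();  // hazelcast is the HazelcastInstance
int fairShare      = (int) Math.ceil((double) totalResources / members);

if (myClaims.size() < fairShare) {
    claimUpTo(fairShare - myClaims.size(), unclaimed);   // fill up to our share
} else if (myClaims.size() > fairShare) {
    releaseClaims(myClaims.size() - fairShare);          // give some resources back
}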

Hazelcast supports distributed maps, where the map is sharded by key across different nodes. However, I’m actually explicitly NOT distributing the map across the cluster. I’ve put ownership of the resource set on “one” node (but the map is replicated, so if that node goes down the map still exists). This is because each node is going to have to try and do a claim. If each node claims, and then calls out to every other node, that’s n^2 IO operations. Compare that to every node making N operations.

The library also supports bypassing this mechanism in favor of a much more “low-tech” solution: manual allocation. All this means is that you pre-define how many nodes there should be, and which node number a node is. Then each node sorts the input data and grabs a specific slice out of the input set based on its id. It doesn’t give any guarantees of non-overlap, but it does give you an 80% solution to a hard problem.
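
The manual mode amounts to little more than this kind of slicing (again, just a sketch; nodeNumber and totalNodes are the pre-defined values):

// node `nodeNumber` (0-based) out of `totalNodes` takes its contiguous slice of the sorted input
List<ResourceIdentity> sorted = new ArrayList<>(resources);
sorted.sort(Comparator.comparing(Object::toString));

int sliceSize = (int) Math.ceil((double) sorted.size() / totalNodes);
int from      = Math.min(nodeNumber * sliceSize, sorted.size());
int to        = Math.min(from + sliceSize, sorted.size());

List<ResourceIdentity> mine = sorted.subList(from, to);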

Jake, the other paradoxical member suggested that there could be a nice alternative solution using a similar broadcast style of quorum using paxos. Each node broadcasts what it’s claiming and the nodes agree on who is allowed to do what. I probably wouldn’t use hazelcast for that, though the primitives of paxos (talking to all members of a cluster) are there and it’d be interesting to build paxos on top of hazelcast now that I think about it…

Anyways, abstracting distributed resource allocation is nice, because as we make improvements to how we want to tune the allocation algorithms all dependent services get it for free. And free stuff is my favorite.

Leadership election with cassandra

Cassandra has a neat feature that lets you expire data in a column. Using this handy little feature, you can create simple leadership election with cassandra. The whole process is described here, which talks about leveraging Cassandra’s consensus and the column expiration to create leadership electors.

The idea is that a user will try and claim a slot for a period of time in a leadership table. If a slot is full, someone else has leadership. While the leader is still active it needs to heartbeat the table faster than the column’s TTL to act as a keepalive. If it fails to heartbeat (i.e. it died) then its leadership claim can be relinquished and someone else can claim it. Unlike most leadership algorithms that claim a single “host” as a leader, I needed a way to create leaders sharded by some “group”. I call this a “LeadershipGroup” and we can leverage the expiring columns in cassandra to do this!
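
Under the hood a claim is essentially a lightweight-transaction insert with a TTL against the leadership table shown a bit further down; roughly something like this (illustrative, not necessarily the library’s exact statements):

// try to claim the slot for this group; only succeeds if nobody currently holds it
ResultSet result = session.execute(
        "INSERT INTO leadership_election (group, leader_id) VALUES (?, ?) IF NOT EXISTS USING TTL 30",
        "queue-workers", "node-a");

boolean claimed = result.wasApplied();

// heartbeating is just re-upserting the row (with the TTL) before the 30 seconds run out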

To make this easier, I’ve wrapped this algorithm in a java library available from paradoxical. For the impatient

<dependency>
    <groupId>io.paradoxical</groupId>
    <artifactId>cassandra-leadership</artifactId>
    <version>1.0</version>
</dependency>

The gist here is that you need to provide a schema similar to

CREATE TABLE leadership_election (
    group text PRIMARY KEY,
    leader_id text
);

The actual column names can be custom defined, though. You can define a leadership election factory using Guice like so:

public class LeadershipModule extends AbstractModule {
    @Override
    protected void configure() {
        bind(LeadershipSchema.class).toInstance(LeadershipSchema.Default);

        bind(LeadershipStatus.class).to(LeadershipStatusImpl.class);

        bind(LeadershipElectionFactory.class).to(CassandraLeadershipElectionFactory.class);
    }
}

  • LeadershipStatus is a class that lets you query who is leader for what “group”. For example, you can have multiple workers competing for leadership of a certain resource.
  • LeadershipSchema is a class that defines what the column names in your schema are named. By default, if you use the sample table above, the Default schema maps to that.
  • LeadershipElectionFactory is a class that gives you instances of LeadershipElection classes, and I’ve provided a cassandra leadership factory.

Once we have a leader election we can try and claim leadership:

final LeadershipElectionFactory factory = new CassandraLeadershipElectionFactory(session);

// create an election processor for a group id
final LeadershipElection leadership = factory.create(LeadershipGroup.random());

final LeaderIdentity user1 = LeaderIdentity.valueOf("user1");

final LeaderIdentity user2 = LeaderIdentity.valueOf("user2");

assertThat(leadership.tryClaimLeader(user1, Duration.ofSeconds(2))).isPresent();

Thread.sleep(Duration.ofSeconds(3).toMillis());

assertThat(leadership.tryClaimLeader(user2, Duration.ofSeconds(3))).isPresent();

When you claim leadership you claim it for a period of time, and if you get it you receive a leadership token that you can heartbeat on. And now you have leadership!
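
Heartbeating can then be as simple as re-asserting the claim on a schedule well inside the TTL. The token type and heartbeat method below are assumptions for illustration, not necessarily the library’s exact API:

Optional<LeadershipToken> token = leadership.tryClaimLeader(user1, Duration.ofSeconds(10));

// re-assert the claim every 5 seconds, well inside the 10 second TTL
token.ifPresent(t ->
        Executors.newSingleThreadScheduledExecutor()
                 .scheduleAtFixedRate(t::heartbeat, 0, 5, TimeUnit.SECONDS));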

As usual, full source available at my github

Plugin class loaders are hard

Plugin based systems are really common. Jenkins, Jira, wordpress, whatever. Recently I built a plugin workflow for a system at work and have been mired in the joys of the class loader. For the uninitiated, a class in Java is identified uniquely by the class loader instance it is created from as well as its fully qualified class name. This means that foo.bar class loaded by class loader A is not the same as foo.bar class loaded by class loader B.

There are actually some cool things you can do with this, especially in terms of code isolation. Imagine your plugins are bundled as shaded jars that contain all their internal dependencies. By leveraging class loaders you can isolate potentially conflicting versions of libraries between the host application and the plugin. But in order to communicate with the host layer, you need a strict set of shared interfaces that the host layer always owns. When building the uber jar you exclude the host interfaces from being bundled (along with all of their transitive dependencies, which in maven can be done by using scope provided). This means that they will always be loaded by the host.

In general, class loaders are hierarchical. They ask their parent if a class has been loaded, and if so return that. In order to do plugins you need to invert that process: first look inside the uber-jar, and only if you can’t find a class there do you look up.

An example can be found here and is copied for the sake of internet completeness:

import java.net.URL;
import java.net.URLClassLoader;
import java.net.URLStreamHandlerFactory;
import java.util.UUID;

public class PostDelegationClassLoader extends URLClassLoader {

    private final UUID id = UUID.randomUUID();

    public PostDelegationClassLoader(URL[] urls, ClassLoader parent, URLStreamHandlerFactory factory) {
        super(urls, parent, factory);
    }

    public PostDelegationClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    public PostDelegationClassLoader(URL[] urls) {
        super(urls);
    }

    public PostDelegationClassLoader() {
        super(new URL[0]);
    }

    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        try (ThreadCurrentClassLoaderCapture capture = new ThreadCurrentClassLoaderCapture(this)) {
            Class loadedClass = findLoadedClass(name);

            // Nope, try to load it
            if (loadedClass == null) {
                try {
                    // Ignore parent delegation and just try to load locally
                    loadedClass = findClass(name);
                }
                catch (ClassNotFoundException e) {
                    // Swallow - does not exist locally
                }

                // If not found, just use the standard URLClassLoader (which follows normal parent delegation)
                if (loadedClass == null) {
                    // throws ClassNotFoundException if not found in delegation hierarchy at all
                    loadedClass = super.loadClass(name);
                }
            }
            return loadedClass;
        }
    }

    @Override
    public URL getResource(final String name) {
        final URL resource = findResource(name);

        if (resource != null) {
            return resource;
        }

        return super.getResource(name);
    }
}

But this is just the tip of the fun iceberg. If all your libraries play nice then you may not notice anything. But I recently noticed using the apache xml-rpc library that I would get a SAXParserFactory class def not found exception, specifically bitching about instantiating the sax parser factory. I’m not the only one apparently; here is a discussion about a JIRA plugin that wasn’t happy. After much code digging I found that the classloader being used was the one bound to the thread’s current context.

Why in the world is there a classloader bound to thread local? JavaWorld has a nice blurb about this

Why do thread context classloaders exist in the first place? They were introduced in J2SE without much fanfare. A certain lack of proper guidance and documentation from Sun Microsystems likely explains why many developers find them confusing.

In truth, context classloaders provide a back door around the classloading delegation scheme also introduced in J2SE. Normally, all classloaders in a JVM are organized in a hierarchy such that every classloader (except for the primordial classloader that bootstraps the entire JVM) has a single parent. When asked to load a class, every compliant classloader is expected to delegate loading to its parent first and attempt to define the class only if the parent fails.

Sometimes this orderly arrangement does not work, usually when some JVM core code must dynamically load resources provided by application developers. Take JNDI for instance: its guts are implemented by bootstrap classes in rt.jar (starting with J2SE 1.3), but these core JNDI classes may load JNDI providers implemented by independent vendors and potentially deployed in the application’s -classpath. This scenario calls for a parent classloader (the primordial one in this case) to load a class visible to one of its child classloaders (the system one, for example). Normal J2SE delegation does not work, and the workaround is to make the core JNDI classes use thread context loaders, thus effectively “tunneling” through the classloader hierarchy in the direction opposite to the proper delegation.

This means that whenever I’m delegating work to my plugins I need to be smart about capturing my custom plugin class loader and putting it on the current thread before execution. Otherwise if a misbehaving library accesses the thread classloader, it can now have access to the ambient root class loader and IFF the same class name exists in the host application it will load it. This could potentially conflict with other classes from the same package that aren’t loaded this way and in general cause mayhem.

The solution here was a simple class modeled after .NET’s disposable pattern, using Java’s try-with-resources and AutoCloseable.

public class ThreadCurrentClassLoaderCapture implements AutoCloseable {
    final ClassLoader originalClassLoader;

    public ThreadCurrentClassLoaderCapture(final ClassLoader newClassLoader) {
        originalClassLoader = Thread.currentThread().getContextClassLoader();

        Thread.currentThread().setContextClassLoader(newClassLoader);
    }

    @Override
    public void close() {
        Thread.currentThread().setContextClassLoader(originalClassLoader);
    }
}

Which is used before each and every invocation into the interface of the plugin (where connection is the plugin reference)

@Override
public void start() throws Exception {
    captureClassLoader(connection::start);
}

@Override
public void stop() throws Exception {
    captureClassLoader(connection::stop);
}

@Override
public void heartbeat() throws Exception {
    captureClassLoader(connection::heartbeat);
}

private void captureClassLoader(ExceptionRunnable runner) throws Exception {
    try (ThreadCurrentClassLoaderCapture capture = new ThreadCurrentClassLoaderCapture(connection.getClass().getClassLoader())) {
        runner.run();
    }
}

However, this isn’t the only issue. Imagine a scenario where you support both class path loaded plugins AND remotely loaded plugins (via shaded uber-jar). And let’s pretend that on the classpath is a jar with the same namespaces and classes as those in an uber-jar. To be more succinct, you have a delay loaded shared library on the class path, and a version of that library that is shaded and loaded via the plugin mechanism.

Technically there shouldn’t be any issues here. The class path plugin gets all its classes resolved from the root scope. The plugin gets its classes (of the same name) from the delegated provider. Both use the same shared set of interfaces from the host. The issue arises if you have a library like reflectasm, which dynamically emits bytecode at runtime.

Look at this code:

AccessClassLoader loader = AccessClassLoader.get(type);
synchronized (loader) {
	try {
		accessClass = loader.loadClass(accessClassName);
	} catch (ClassNotFoundException ignored) {
		String accessClassNameInternal = accessClassName.replace('.', '/');
		String classNameInternal = className.replace('.', '/');
		ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);

Which is a snippet from reflectasm as it generates a runtime bytecode emitter that can access fields for you. It creates a class name like your.class.nameMethodAccess. If the class name isn’t found, it generates the bytecode and then writes it into the owning class’s class loader.

In the scenario of a plugin using this library, it will check the loader and see that the plugin classloader AND the root scope loader do not have the emitted class name, and so a class not found exception is thrown. It will then write the class into the target type’s class loader. This would be the delegated loader, and provides the isolation we want.

However, if the class path plugin (what I call an embedded plugin) runs this code, the dynamic runtime class is written into the root scope loader. This means that all delegating class loaders will eventually find this type since they always do a delegated pass to the root!

The important thing to note here is that using a delegated loader does not mean every class that comes out of it is tied to the delegated loader. Only classes that are found inside of the delegated loader are bound to it. If a class is resolved by the parent, the class is linked to the parent.
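
A quick way to see this is to ask a loaded class which loader actually defined it (the class name here is made up):

Class<?> clazz = pluginLoader.loadClass("com.example.SomeSharedType");

// prints the PostDelegationClassLoader only if the class was found inside the plugin jar;
// if the parent resolved it, this prints the host application's loader instead
System.out.println(clazz.getClassLoader());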

In this scenario with the root class loader being polluted with the same class name, I don’t think there is much you can do other than avoid it.

Anyways, maybe I should have used OSGi…?

Project angelhair: Building a queue on cassandra

Edit: this project has since been moved to CassieQ: https://github.com/paradoxical-io/cassieq

A few weeks ago my work had a hack day and I got together with some of my coworker friends and we decided to build a queue on top of Cassandra.

For the impatient, give it a try (docker hub):

docker run -it \
    -e CLUSTER_NAME="" \
    -e KEYSPACE="" \
    -e CONTACT_POINTS="" \
    -e USERNAME="" \
    -e PASSWORD="" \
    -e USE_SSL="" \
    -e DATA_CENTER="" \
    -e METRICS_GRAPHITE="true" \
    -e GRAPHITE_PREFIX="" \
    -e GRAPHITE_URL=""  \
    onoffswitch/angelhair

The core features for what we called Project Angelhair were to handle:

– long term events (so many events that AMQ or RMQ might run out of storage space)
– connectionless – wanted to use http
– invisibility – need messages to disappear when they are processing but be able to come back
– highly scalable – wanted to distribute a docker container that just did all the work

Building a queue on cassandra isn’t a trivial task and is rife with problems. In fact, this is pretty well known and in general the consensus is don’t build a queue on Cassandra.

But why not? There are a few reasons. In general, the question you want to answer with a queue is “what haven’t I seen?”. A simple way to do this is to delete a message when it is consumed. However, with cassandra, deletes aren’t immediate. They are tombstoned, so they will exist for the compaction period. This means even if you have only 1 message in your queue, cassandra has to scan all the old deleted messages before it finds it. With high load this can be a LOT of extra work. But that’s not the only problem. You also have the problem of how to distribute your messages across the ring. If you put all the messages for a queue into one partition key, you haven’t evenly distributed your messages and you end up with a skewed distribution of work. This is going to manifest in really poor performance.

On top of all of that, cassandra has poor support for atomic transactions, so you can’t easily say “let me get, process, and consume” in one atomic action. Backing stores that are owned by a master (like sqlserver) let you do atomic actions much better since they either have an elected leader who can manage this or are a single box. Cassandra isn’t so lucky.

Given all the problems described, it may seem insane to build a queue on Cassandra. But cassandra is a great datastore that is massively horizontally scalable. It also exists at a lot of organizations already. Being able to use a horizontally scalable data store means you can ingest incredible amounts of messages.

How does angelhair work?

Angelhair works with 3 pointers into a queue.

A reader bucket pointer
A repair bucket pointer
An invisibility pointer

In order to scale and efficiently act as a queue we need to leverage cassandra’s partitioning capabilities. Queues are actually messages bucketized into fixed size groups called buckets. Each message is assigned a monotonically increasing id that maps into a bucket. For example, if the bucket size is 20 and you have id 21, that maps into bucket 1 (21/20). This is done using a table in cassandra whose only job is to provide monotonic values for a queue:

CREATE TABLE monoton (
  queuename text PRIMARY KEY,
  value bigint
);

By bucketizing messages we can distribute messages across the cassandra clusters.
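
The id-to-bucket mapping itself is just integer division (the monotonic provider name here is illustrative):

long messageId  = monotonicProvider.nextId(queueName);  // e.g. 21
long bucketSize = 20;

long bucket = messageId / bucketSize;                    // 21 / 20 = bucket 1

// the message is written into partition (queueName, bucket), spreading load across the ring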

Messages are always put into the bucket they correlate to, regardless of whether previous buckets are full. This means that messages just keep getting put at the end, as fast as possible.

Given that messages are put into their corresponding bucket, the reader has a pointer to its active bucket (the reader bucket pointer) and scans the bucket for unacked visible messages. If the bucket is full it tombstones the bucket indicating that the bucket is closed for processing. If the bucket is NOT full, but all messages in the bucket are consumed (or being processed) AND the monotonic pointer has already advanced to the next bucket, the current bucket is also tombstoned. This means no more messages will ever show up in the current bucket… sort of

Repairing delayed writes

Without synchronizing reads and writes you can run into a situation where you can have a delayed write. For example, assume you generate monotonic ids in this sequence:

Id 19
Id 20

Write 20 <-- bucket advances to bucket 1 
             (assuming bucket size of 20) and 
             bucket 0 is tombstoned (closed)

Write 19 <-- but message 19 writes into 
             bucket 0, even though 0 
             was tombstoned!

In this scenario id 20 advances the monotonic bucket to bucket 1 (given buckets are size 20). That means the reader tombstones bucket 0. But what happens to message 19? We don’t want to lose it, but as far as the reader is concerned it has moved on to bucket 1 and off of bucket 0.

This is where the concept of a repair worker comes into play. The repair worker’s job is to slowly follow the reader and wait for tombstoned buckets. It has its own pointer (the repair bucket pointer) and polls to find when a bucket is tombstoned. When a bucket is tombstoned the repair worker will wait for a configured timeout for out of order missing messages to appear. This means if a slightly delayed write occurs then the repair worker will actually pick it up and republish it to the last active bucket. We’re gambling on probability here; the assumption is that if a message is going to be successfully written then it will be written within time T. That time is configurable when you create the queue.

But there is also a scenario like this:

Id 19
Id 20

!!Write 19 ---> This actually dies and fails to write!
Write 20

In this scenario we claimed ids 19 and 20, but 19 failed to write. Once 20 is consumed the reader tombstones the bucket and the repair worker kicks in. But 19 isn’t ever going to show up! In this case, the repair worker waits for the configured time, and if after that time the message isn’t written then we assume that the message is dead and will never be processed. Then the repair worker advances its pointer and moves on.

This means we don’t necessarily guarantee FIFO, however we do (reasonably) guarantee messages will appear. The repair worker never moves past a non-completed bucket, though since it’s just a pointer we can always repair the repair worker by moving the pointer back.

Invisibility

Now the question comes up of how to deal with invisibility of messages. Invisible messages are important since with a connectionless protocol (like http) we need to know if a message worker is dead and its message has to go back for processing. In queues like RMQ this is detected when a channel is disconnected (i.e. the connection is lost). With http we’re not so lucky.

To track invisibility there is a separate pointer tracking the last invisible pointer. When a read comes in, we first check the invisibility pointer to see if that message is now visible.

If it is, we can return it. If not, get the next available message.
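
Roughly, that read path looks like this (names are illustrative, not cassieq’s actual code):

Optional<Message> readNext() {
    Message invis = getMessageAt(invisibilityPointer);

    // a previously invisible message whose timeout has lapsed comes back first
    if (invis != null && invis.isVisible() && !invis.isAcked()) {
        return Optional.of(invis);
    }

    // otherwise scan the reader's current bucket for the next visible, unacked message
    return findNextVisibleUnacked(readerBucketPointer);
}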

If the current invisible pointer is already acked then we need to find the next invisible pointer. This next invisible pointer is the first non-acked non-visible message. If there isn’t one in the current bucket, the invisibility pointer moves to the next bucket until it finds one or no messages exist, but it never moves past a message that hasn’t been delivered before. This way it won’t accidentally skip a message that hasn’t been sent out yet.

If, however, two messages get picked up at the same time while the invis pointer is scanning, the invis pointer could choose the wrong id. In order to prevent this, we update the invis pointer to the destination if it’s less than the current one (i.e. we need to move back); otherwise we only update it if the current reader owns the current invis pointer (doing an atomic update).

API

Angelhair has a simple API.

– Put a message into a queue (and optionally specify an initial invisibility)
– Get a message from a queue
– Ack the message using the message pop receipt (which is encoded version and id metadata; roughly sketched below). The pop receipt is unique for each message dequeue. If a message comes back alive and is available for processing again it gets a new pop receipt. This also lets us identify a unique consumer of a message since the current atomic version of the message is encoded in the pop receipt.
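
A pop receipt can be thought of as something like the following (purely illustrative; not the actual encoding cassieq uses):

// bundle the message id and its current version so an ack from a stale consumer can be rejected
long messageId = message.getId();
long version   = message.getVersion();   // bumped each time the message is handed out

String popReceipt = Base64.getEncoder().encodeToString(
        (messageId + ":" + version).getBytes(StandardCharsets.UTF_8));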

Doesn’t get much easier than that!

Conclusion

There are a couple of implementations of queues on cassandra out there that we found while researching this. One is from netflix, but their implementation builds a lock system on top of cassandra and coordinates reads/writes using locking. Some other implementations used wide rows (or CQL lists in a single row) to get around the tombstoning, but that limits the number of messages in your “queue” to 64k messages.

While we haven’t tested angelhair in a stressed environment, we’ve decided to give it a go in some non-critical areas of our internal tooling. But so far we’ve had great success with it!

Dynamic HAProxy configs with puppet

I’ve posted a little about puppet and our team’s ops in the past, since my team has pretty heavily invested in the dev portion of the ops role. Our initial foray into ops included us building a pretty basic puppet role based system which we use to coordinate docker deployments of our java services.

We use HAProxy as our software load balancer, and the v1 of our infrastructure management had us versioning a hardcoded haproxy.cfg for each environment and pushing out that config when we wanted to add or remove machines from the load balancer. It works, but it has a few issues:

  1. Cluster swings involve checking into github. This pollutes our version history with a bunch of unnecessary toggling
  2. It’s difficult to automate swings since it’s flat-file config driven and requires the config to be pushed out from puppet

Our team did a little brainstorming and came up with a nice solution, which is to data-drive it from some sort of json blob. By abstracting who provides the json blob and just building out our ha proxy config from structured data, we can move to an API to serve this up for us. Step one was to replace our haproxy.cfg with some sort of flat file json. The workflow we have isn’t changing, but it’s setting us up for success. Step two is to tie in something like consul to provide the json for us.

The first thing we need to do to support this is get puppet to know how to load up json from either a file or from an api. To do that we built an extra puppet custom function which we put into our /etc/puppet/modules/custom/lib/puppet/functions folder:

require 'json'
require 'rest-client'

module Puppet::Parser::Functions
  newfunction(:json_provider, :type => :rvalue) do |args|

    begin
      url=args[0]

      info("Getting json from url #{url}")

      if File.exists?(url)
        raw_json = File.read(url)
      else
        raw_json = RestClient.get(url)
      end

      data = JSON.parse(raw_json)

      info("Got json #{data}")

      data
    rescue Exception => e
      warning("Error accessing url #{url} from args '#{args}' with exception #{e}")

      raise Puppet::ParseError, "Error getting value from url #{url} exception #{e}"
    end
  end
end

And we need to make sure the puppetmaster knows where all its gems are, so we’ve added

 if ! defined(Package['json']) {
    package { 'json':
      ensure   => installed,
      provider => 'gem'
    }
  }

  if ! defined(Package['rest-client']) {
    package { 'rest-client':
      ensure   => installed,
      provider => 'gem'
    }
  }

To our puppet master role .pp.

At this point we can define what our ha proxy json file would look like. A sample structure that we’ve settled on looks like this:

{
  "frontends": [
    {
      "name": "main",
      "bind": "*",
      "port": 80,
      "default_backend": "app"
    },
    {
      "name": "legacy",
      "bind": "*",
      "port": 8080,
      "default_backend": "app"
    }
  ],
  "backends": [
    {
      "name": "app",
      "options": [
        "balance roundrobin"
      ],
      "servers": [
        {
          "name": "api1",
          "host": "api1.cloud.dev:8080",
          "option": "check"
        },
        {
          "name": "api2",
          "host": "api1.cloud.dev:8080",
          "option": "check"
        }
      ]
    }
  ]
}

Using this structure we can dynamically build out our haproxy.conf using ruby’s erb templating that puppet hooks into. Below is our ha proxy erb template. It assumes that @config is in the current scope, which should be a json object in the puppet file. While the config is pretty basic (we don’t use any ACLs or too many custom options), we can always tweak the base haproxy config or add more metadata to our json structure to support more options.

#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    # to have these messages end up in /var/log/haproxy.log you will
    # need to:
    #
    # 1) configure syslog to accept network log events.  This is done
    #    by adding the '-r' option to the SYSLOGD_OPTIONS in
    #    /etc/sysconfig/syslog
    #
    # 2) configure local2 events to go to the /var/log/haproxy.log
    #   file. A line like the following can be added to
    #   /etc/sysconfig/syslog
    #
    #    local2.*                       /var/log/haproxy.log
    #
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats level admin

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 3
    timeout http-request    10s
    timeout queue           1m
    timeout connect         10s
    timeout client          1m
    timeout server          1m
    timeout http-keep-alive 10s
    timeout check           10s
    maxconn                 3000

listen stats :1936
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /
    stats auth admin:password
#---------------------------------------------------------------------
# main frontend which proxys to the backends
#---------------------------------------------------------------------
<% @config["frontends"].each do |frontend| %>
frontend  <%= frontend["name"] %> <%= frontend["bind"] %>:<%= frontend["port"] %>
    default_backend             <%= frontend["default_backend"] %>
<% end %>
#---------------------------------------------------------------------
# backends
#---------------------------------------------------------------------

<% @config["backends"].each do |backend| %>
backend <%= backend["name"] %>
    <%- if backend["options"] != nil -%>
        <%- backend["options"].each do |option| -%>
    <%= option %>
        <%- end -%>
    <%- end -%>
    <%- backend["servers"].each do |server| -%>
    server  <%= server["name"] %> <%= server["host"] %> <%= server["option"] %>
    <%- end -%>
<% end %>

This builds out a simple set of named frontends that point to a set of backends. We can populate backends for the different swing configurations (A cluster, B cluster, etc) and then toggle the default frontend to swing.

But, we still have to provide for a graceful reload. There is a lot of documentation out there on this, but the gist is that you want to cause clients to retry under the hood while you restart, so that the actual requester of the connection doesn’t notice a blip in service. To do that we can leverage the codified structure as well with another template

#!/bin/bash

# hold/pause new requests
<% @config["frontends"].each do |frontend| %>
/usr/sbin/iptables -I INPUT -p tcp --dport <%= frontend["port"] %> --syn -j DROP
<% end %>

sleep 1

# gracefully restart haproxy
/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

# allow new requests to come in again
<% @config["frontends"].each do |frontend| %>
/usr/sbin/iptables -D INPUT -p tcp --dport  <%= frontend["port"] %> --syn -j DROP
<% end %>

This inserts a rule for each frontend port to drop SYN packets silently. SYN is the first packet type used in the tcp 3-way handshake, and by dropping it the client will retry a few times after some interval to reconnect. This does mean the initial client will experience a slight delay, but their request will go through vs getting completely dropped.

Now our final haproxy.pp file looks like

class custom::loadbalancers::dynamic_ha(
  $load_balance_path = undef,
  $identity = undef # a unique seed to make sure the haproxy reloads dont stomp
)
{

  if $load_balance_path  == undef {
    fail 'Pass in a load balance source path. Can be either a file on disk or a GET json url'
  }

  if $identity == undef {
    fail "Identity for ha should be unique and set. This creates a temp file for reloading the haproxy gracefully"
  }

  package { 'haproxy':
    ensure => installed
  } ->

  service { 'haproxy':
    enable => true,
    ensure => running,
  }  ->

  package { 'haproxyctl':
    ensure    => installed,
    provider  => "gem"
  }

  $config = json_provider($load_balance_path)

  $rand = fqdn_rand(1000, $identity)

  $file = "/tmp/$identity-ha-reload.sh"

  file { '/etc/haproxy/haproxy.cfg':
    ensure   => present,
    mode     => 644,
    notify   => Exec['hot-reload'],
    content  => template("custom/app/ha.conf.erb")
  }

  file { $file:
    content  => template("custom/app/ha_reload.conf.erb"),
    mode     => 0755
  } ->
  exec { 'hot-reload' :
    require     => File[$file],
    command     => $file,
    path        => "/usr/bin:/usr/sbin",
    refreshonly => true
  }
}

With this, we can now drive everything from either a json file, or from a GET rest endpoint that provides JSON. We’re planning on using consul as a simple key value store with an api to be able to drive the json payload. At that point our swings get the current json configuration, change the default endpoint for the frontend, post it back, and issue a puppet command to the ha proxies via salt nodegroups and we’re all good!

Adventures in pretty printing JSON in haskell

Today I gave atom haskell-ide a whirl and wanted to play with haskell a bit more. I’ve played with haskell in the past and have always been put off by the tooling. To be fair, I still kind of am. I love the idea of the language but the tooling is just not there to make it an enjoyable exploratory experience. I spend half my time in the repl inspecting types, the other half on hoogle, and the 3rd half (yes I know) being frustrated that I can’t just type in package names and explore APIs in sublime or atom or wherever I am. Now that I’m on a mac, maybe I’ll give leksah another try. I tried it a while ago and it didn’t work well.

Anyways, I digress. Playing with haskell, I thought I’d try poking at Aeson, the JSON library. Like Scala, you have to define your objects as json parseable/serializable (unlike java/c#, which use runtime reflection). Thankfully, if you enable some language extensions it’s just a matter of adding Generic to your derives list and making sure the data type is an instance of the ToJSON and FromJSON typeclasses. Mostly I was just copying the examples from here.

My sample class is

{-# LANGUAGE DeriveGeneric #-}

module Types where

import Data.Aeson
import GHC.Generics

data Person =
  Person  { firstName :: String
           ,lastName :: String
          } deriving(Show, Generic)

instance ToJSON Person
instance FromJSON Person

And I just wanted to make a simple hello world where I’d read in some data, make my object, and print pretty json to the screen.

On my first try:

import Data.Aeson
import Types

process :: IO String
process = getLine

main = do
  putStrLn "First name"
  firstName <- process

  putStrLn "Last name"
  lastName <- process

  let person = Person firstName lastName

  print $ (encode person)

  return ()

When I run cabal build;cabal run I now get

First name
anton
Last name
kropp
"{\"lastName\":\"kropp\",\"firstName\":\"anton\"}"

Certainly JSON, but I want it pretty. I found aeson-pretty and gave that a shot. Now I’m doing:

import Data.Aeson
import Data.Aeson.Encode.Pretty
import Types

process :: IO String
process = getLine

main = do
  putStrLn "First name"
  firstName <- process

  putStrLn "Last name"
  lastName <- process

  let person = Person firstName lastName

  print $ (encodePretty person)

  return ()

And I got:

First name
anton
Last name
kropp
"{\n    \"lastName\": \"kropp\",\n    \"firstName\": \"anton\"\n}"

Hmm. I can see that it should be pretty, but it isn’t. How come? Let’s check out the types:

Prelude > import Data.Aeson
Prelude Data.Aeson > :t encode
encode :: ToJSON a => a -> Data.ByteString.Lazy.Internal.ByteString

What’s a lazy bytestring?

Well, from fpcomplete

ByteString provides a more efficient alternative to Haskell’s built-in String which can be used to store 8-bit character strings and also to handle binary data

And the lazy one is the, well, lazy version. Through some googling I found that the right way to print the bytestring is by using the “putStr” functions in the Data.ByteString package. But as a good functional programmer, I want to encapsulate that and basically make a useful function that, given the json object, gives me a plain ol’ happy string so I can decide how to print it later.

I need to somehow make a lazy bytestring into a regular string. This leads me to this:

getJson :: ToJSON a => a -> String 
getJson d = unpack $ decodeUtf8 $ BSL.toStrict (encodePretty d)

So I first evaluate the bytestring into a strict version (instead of lazy), then decode it to utf8, then unpack the Text into a String (apparently Text is more efficient but more APIs use String).

Prelude > :t toStrict
toStrict :: ByteString -> Data.ByteString.Internal.ByteString

Prelude > :t decodeUtf8
decodeUtf8 :: Data.ByteString.Internal.ByteString -> Text

Prelude > :t Data.Text.unpack
Data.Text.unpack :: Text -> String

And now, finally:

import Data.Aeson
import Data.Aeson.Encode.Pretty
import qualified Data.ByteString.Lazy as BSL
import Data.Text
import Data.Text.Encoding
import Types

process :: IO String
process = getLine

getJson :: ToJSON a => a -> String 
getJson d = unpack $ decodeUtf8 $ BSL.toStrict (encodePretty d)

main = do
  putStrLn "First name"
  firstName <- process

  putStrLn "Last name"
  lastName <- process

  let person = Person firstName lastName

  print $ getJson person

  return ()

Which gives me

First name
anton
Last name
kropp
"{\n    \"lastName\": \"kropp\",\n    \"firstName\": \"anton\"\n}"

AGHH! Still! Ok, more googling. Google google google.

Last piece of the puzzle is that print is really putStrLn . show

Prelude > let x = putStrLn . show
Prelude > x "foo\nbar"
"foo\nbar"

And if we just do

Prelude > putStrLn "foo\nbar"
foo
bar

The missing piece. Finally, all put together:

import Data.Aeson
import Data.Aeson.Encode.Pretty
import qualified Data.ByteString.Lazy as BSL
import Data.Text
import Data.Text.Encoding
import Types

process :: IO String
process = getLine

getJson :: ToJSON a => a -> String
getJson d = unpack $ decodeUtf8 $ BSL.toStrict (encodePretty d)

main = do
  putStrLn "First name"
  firstName <- process

  putStrLn "Last name"
  lastName <- process

  let person = Person firstName lastName

  putStrLn $ getJson person

  return ()

Which gives me:

$ cabal build; cabal run
Building sample-0.1.0.0...
Preprocessing executable 'sample' for sample-0.1.0.0...
[3 of 3] Compiling Main             ( src/Main.hs, dist/build/sample/sample-tmp/Main.o )
Linking dist/build/sample/sample ...
Preprocessing executable 'sample' for sample-0.1.0.0...
Running sample...
First name
anton
Last name
kropp
{
    "lastName": "kropp",
    "firstName": "anton"
}

Source available at my github

Automating deployments with salt, puppet, jenkins and docker

I know, it’s a buzzword mouthful. My team has had good first success leveraging jenkins, salt, sensu, puppet, and docker to package and monitor distributed java services with a one click deployment story, so I wanted to share how we’ve set things up.

First and foremost, I’ve never been an ops guy. I’ve spent time on build systems like msbuild and fake, but never a full pipeline solution that also had to manage infrastructure; there’s a first for everything. Most companies I’ve worked at have had all this stuff set up already, and out of the minds of developers, but the place I am at now does not. I actually think it’s been a great opportunity to dive into the full flow of how you get your damn code out the door. I think if you are developing an application and don’t know how it gets from git to your box, you should do some digging, because there is A LOT that happens.

Given that my team doesn’t have a one click solution to just do “magic!”, I set out to build an initial workflow for our team that would let us package, deploy, monitor, and version our infrastructure. I wanted the entire setup to be jenkins driven, so everything we have is versioned in github enterprise with jenkins hooks to build repositories. Full disclaimer, this isn’t meant to work for enormous teams or companies, but just for what mine and another team are doing.

None of this is really all that new or interesting to anyone who’s done it, but I wanted to document how and why we did it the way we have.

Puppet

The first thing we did was create a puppet repo. You may remember from a past post how that led to testing puppet scripts with docker (which sort of fizzled due to puppet not having access to systemd for service starting; it would work on a pure linux box, but given boot2docker doesn’t have cgroups to virtually mount it wouldn’t work on a mac). Our puppet scripts are set up by environment:

Puppet directory structure

Where our “/data” folder will get mapped to “/etc/puppet”.

We use a custom facter to identify nodes in our environments. Instead of setting up our site.pp manifest with actual node matching, everyone has a custom role and our site delegates what to do based on that role:

$rmq_master = "..."
$salt_master = "..."
$sensu_host = "..."

node default {
  case $node_role{
   
    'rmq_master': {
      class { 'domains::monitoring::salt::client':
        salt_master => $salt_master
      }

      class { "domains::rmq::master" :
        sensu_host => $sensu_host
      }
    }

    'rmq_slave': {
      class { 'domains::monitoring::salt::client':
        salt_master => $salt_master
      }

      class { "domains::rmq::slave" :
        master_nodename => $rmq_master,
        sensu_host      => $sensu_host
      }
    }

    'sensu_server': {
      class { 'domains::monitoring::salt::client':
        salt_master => $salt_master
      }

      class { "domains::monitoring::sensu::server": }
    }

    'jenkins-master' : {
      class { 'domains::monitoring::salt::client':
        salt_master => $salt_master
      }

      class { "domains::cicd::jenkins-master":
        sensu_host => $sensu_host
      }
    }

    'jenkins-slave' : {
      class { 'domains::monitoring::salt::client':
        salt_master => $salt_master
      }

      class { "domains::cicd::jenkins-slave":
        sensu_host => $sensu_host
      }
    }

    'elk-server': {
      class { 'domains::monitoring::salt::client':
        salt_master => $salt_master
      }

      class { "domains::monitoring::elk::server" :
        sensu_host => $sensu_host
      }
    }
  }
}

With the exception of the master hostnames that we need to know about, we can now spin up new machines, link them to puppet, and they become who they are. We store the role files in /etc/.config/role on the agents and our facter looks like this

# node_role.rb
require 'facter'
Facter.add(:node_role) do
   confine :kernel => 'Linux'
      setcode do
        Facter::Core::Execution.exec('cat /etc/.config/role')
      end
end

Linking them up to puppet is easy too; we wrote some scripts to help bootstrap machines since we don’t want to have to manually install puppet on each box, then configure its puppet.conf file to point to a master, etc. We want to be able to spin up new VMs in our openstack instance and create new roles quickly, right from our shell. And by adding on zsh autocompletion we get a really nice experience:

Bootstrapping puppet agent

Salt

Saltstack is an orchestration tool built on zeromq that we use to delegate tasks and commands. Basically anytime we need to execute a command on a machine, or role, or group of machines, we’ll use salt. For example, let me ping all the machines in our salt group:

Pinging all machines in our salt group

You can run arbitrary commands on machines too, or you can write your own custom python modules to execute (test.ping is a module called test with a method called ping that will get executed on all machines).

Leveraging salt, we can deploy our puppet scripts continuously, triggered from a git repo change. When you commit into github a jenkins job is triggered. The jenkins job runs the puppet syntax validator on each puppet file:

#!/usr/bin/env bash

pushd data

for file in `find . -name "*.pp"`; do
   echo "Validating $file"

   puppet parser validate $file

   if [ $? -ne 0 ]; then
     popd
   	 exit 1;
   fi
done;

popd

If that succeeds it dispatches a command to salt with a nodegroup of the puppet master. All this does is execute a remote command (via the salt REST api) on the puppet master machine (or group of machines, we really don’t care) that checks out the git repo at a particular commit into a temp folder, blows out the environment and custom modules folders in /etc/puppet, and then copies over the new files.

The nice thing here, is that even if we screwed up with our puppet scripts, we can still execute salt commands and roll things back with our jenkins job.

We take this salt deployment one step further for deploying applications in docker containers.

Docker

My team is developing java services that deploy into openstack VMs running centos7. While we could push RPM’s into spacewalk and orchestrate uninstalling and reinstalling rpms via puppet, I’ve had way too many instances in the past of config files getting left behind, or some other garbage mucking up my otherwise pristine environment. We also wanted a way to simulate the exact prod environment on our local machines. Building distributed services is easy, but debugging them is really hard, especially when you have lots of dependencies (rmq, redis, cassandra, decoupled listeners, ingest apis, health monitors, etc). For that reason we chose to package up our java service as a shaded jar (we use dropwizard) and package it into a docker image.

We built a base docker image for java services and then bundle our config, jar, and bootstrap scripts into the base image. After our jenkins build of the service is complete, our package process takes the final artifact (which is actually an RPM, created using the maven RPM plugin), templates out a dockerfile, and creates a tar file with both files (RPM and Dockerfile) to artifact. This way we can use the jenkins promotion plugin to do the actual docker build. The tar file is there because jenkins promotions are asynchronous and can run on different slaves, so we want to include the raw artifacts as part of the artifacting for access later.

Here is an example of the templated dockerfile which generates our docker image. The --RPM_SRC-- and --RPM_NAME-- placeholders get replaced during the build

FROM artifactory/java-base

# add the rpm
RUN mkdir /rpms

ADD --RPM_SRC-- /rpms/

# install the rpm
RUN yum -y install /rpms/--RPM_NAME--

# set the service to run
ENV SERVICE_RUN /data/bin/service

And below you can see the different jenkins promotion steps here. The first star is artifacting the docker image, and the second is deploying it to puppet via salt:

Deploy services

Once we artifact the docker image, it’s pushed out to artifactory. All our images are tagged with app_name:git_sha in artifactory. To deploy the docker image out into the wild we use salt to push the expected git sha for that application type onto the puppet master. I mentioned one custom facter that we wrote that nodes use to report their information back, but we also have a second.

Nodes store a json payload of their docker app arguments, the container version, and some other metadata, so that they know what they are currently running. They also report this json payload back to puppet, and puppet can determine if their currently running git_sha is the same as the expected one. This way we know all their runtime arguments that were passed, etc. If anything is different then puppet stops the current container and loads the new one. All our containers are named, since it makes managing this easier (vs using anonymous container names). When we do update an application to the newest version we write that json back to disk in a known location. Here is the facter that collects all the docker version information for us:

# Build a hash of all the different app jsons, located in /etc/.config/versions
# there could be many docker applications running on this machine, which is why we bundle them all up together
require 'facter'
require 'json'
Facter.add(:node_info) do
  confine :kernel => 'Linux'
  setcode do
    node_information = {}
    path = '/etc/.config/versions'

    if File.exist?(path)
      Dir.foreach(path) do |app_type|
        next if File.directory?(File.join(path, app_type))
        content = File.read(File.join(path, app_type))
        data_hash = JSON.parse(content)
        node_information[app_type] = data_hash
      end
    end

    node_information
  end
end

To ensure reboot tolerance and application restart tolerance our base docker image runs an instance of monit which is the foreground process for our application. We also make sure all docker containers are started with --restart always and that the docker service is set to start on reboot.

Cluster swinging

We are still working on cluster swinging, but we’ll leverage salt as well as a/b node roles in puppet to do that. For example, we could tag half of our RMQ listeners as A roles, and the other half B roles. When we go to publish we can publish by application type, so RMQ_A_ROLE will be an application type. Those can all switch over while the B roles stay the same. We can also leverage salt to do HAProxy swings. If we send a message to puppet to use a different haproxy config, then tell the haproxy box to update, we’ve done part of a swing. Same in the reverse. All of this can be managed with jenkins promotion steps and can be undone by re-promoting an older step.

Conclusion

This is just the beginning of our ops story on my team, as we have a lot of other work to do, but now that the basics are there we can ramp much faster. What’s interesting is that there are a lot of different varieties of production level tooling available, but no real consensus on how to do any of it. In my research I found endless conflicting resources and suggestions, and it seemed like every team (including mine!) was just sort of reinventing the process. I’m hoping that how we have things set up will work well, but our application is still in the major development phases so we have yet to really see how this will pan out in production (though our development cluster has 30 machines that we manage this way now).

And of course there are still issues. For example, we are using a puppet sensu plugin that keeps restarting the sensu process every 60 seconds. It’s not the end of the world, but it is annoying. We had to write our own docker puppet module since the one out there didn’t do what we wanted. The RMQ puppet module doesn’t support FQDN mode for auto adding slaves, so we had to write that. We had to write tooling to help bootstrap boxes and deploy the puppet scripts, etc. While none of this is technologically challenging from a purely code perspective, it wasn’t an easy process: most of your debugging time is spent on “why the hell isn’t this working, it SHOULD!” only to find you missed a comma somewhere (why can’t we just do everything in strongly typed languages…). And of course, headaches are always there when you try something totally new, but in the end it was an interesting journey and it gives me so much more appreciation for ops people. They never get enough credit.

Testing puppet with docker and python

In all the past positions I’ve been in I’ve been lucky enough to have a dedicated ops team to handle service deployment, cluster health, and machine management. However, at my new company there is much more of a “self serve” mentality such that each team needs to handle things themselves. On the one hand this is a huge pain in my ass, since really the last thing I want to do is deal with clusters and machines. On the other hand though, because we have the ability to spin up openstack boxes in our data centers at the click of a button, each team has the flexibility to host their own infrastructure and stack.

For the most part my team and I are deploying our java services using dockerized containers. Our container is a centos7 base image with a logstash forwarder in it and some other minor tooling, and we run our java service in the foreground. All we need to have on our host boxes is a bootloader script that we execute to shut down old docker containers and spin up new docker containers, and of course docker. To get docker and our bootloader (and of course manage things like our jenkins instances, RMQ clusters, cassandra nodes, etc) we are using puppet.
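
The bootloader itself doesn’t have to be anything clever. As a rough sketch of the idea (hypothetical code, not our actual script), it just removes the old named container and starts the new one from the tagged image:

# Hypothetical sketch of a host bootloader: stop the previous named container
# and start the new one from the app_name:git_sha image. Not our actual script.
import subprocess

def redeploy(app_name, git_sha, extra_docker_args=None):
    image = '%s:%s' % (app_name, git_sha)

    # Containers are named after the app, so we always know what to remove.
    # Ignore the return code in case nothing is running yet.
    subprocess.call(['docker', 'rm', '-f', app_name])

    # --restart always gives us reboot and crash tolerance.
    args = ['docker', 'run', '-d', '--name', app_name, '--restart', 'always']
    args += extra_docker_args or []
    args.append(image)
    subprocess.check_call(args)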

After deep diving into puppet my first question was “how do I test this?”. Most suggestions indicate testing is twofold:

  1. Syntax checking
  2. Integration testing on isolated machines

The first element is a no-brainer. You run the puppet syntax checker and you get some output. That’s not that helpful though, other than making sure I didn’t fat-finger something. And the second point really sucks. You have to manually check if everything worked. As an engineer I shudder at the word “manual”, so I set out to create an isolated test framework that my team can use to simulate and automatically test puppet scripts both locally and on jenkins.

To do that, I wrote puppety. It’s really stupidly simple. The gist is you have a puppet master in a docker container that auto-signs anyone who connects, and you have a puppet agent in a docker container that connects, syncs, and then runs tests validating the sync was complete.

Puppety structure

If you look at the git repo, you’ll see there are two main folders:

/data
/test

The /data folder is going to map to the /etc/puppet folder on our puppet master. It should contain all the stuff we want to deploy as if we plopped that whole folder onto the puppet root.

The test folder contains the python test runners, as well as the dockerized containers for both the master and the agent.

Testing a node

If you have a node configuration in an environment you can test a node by annotating it like so:

# node-test: jenkins/test-server
node "test.foo.com" {
  file {'/tmp/example-ip':                                            # resource type file and filename
    ensure  => present,                                               # make sure it exists
    mode    => 0644,                                                  # file permissions
    content =>  "Here is my Public IP Address: ${ipaddress_eth0}.\n",  # note the ipaddress_eth0 fact
  }
}

Let’s say this node sits in a definition file at /etc/puppet/environments/develop/manifests/nodes/jenkins.pp.

Our test runner can pick up that we asked to test the jenkins node, and template our manifests so that at runtime the actual node definition looks like this:

# node-test: jenkins/test-server
node /docker_host.*/ {
  file {'/tmp/example-ip':                                            # resource type file and filename
    ensure  => present,                                               # make sure it exists
    mode    => 0644,                                                  # file permissions
    content =>  "Here is my Public IP Address: ${ipaddress_eth0}.\n",  # note the ipaddress_eth0 fact
  }
}

Now, when the dockerized puppet container connects, it assumes the role of the jenkins node!
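
There’s nothing magic about that rewrite; conceptually it’s just string munging over the manifest, matching the annotation and swapping the node name for the docker host pattern. A simplified sketch of the idea (not puppety’s actual implementation):

# Simplified sketch of the node-test rewrite: find the node definition that
# follows a "# node-test: <name>" annotation and point it at the docker
# agent's hostname pattern. Not puppety's actual implementation.
import re

def template_node(manifest_text, test_name, host_pattern='/docker_host.*/'):
    # Matches: '# node-test: jenkins/test-server' followed by 'node "test.foo.com" {'
    annotated_node = re.compile(
        r'(#\s*node-test:\s*' + re.escape(test_name) + r'\s*\n\s*node\s+)"[^"]+"')

    # Swap the quoted node name for the regex-style host pattern
    return annotated_node.sub(r'\g<1>' + host_pattern, manifest_text)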

The tests sit in a folder called tests/runners and the test name is the path to the test to run. It’s that simple.

We are also structuring our puppet scripts in terms of roles. Roles use a custom fact that reads /etc/.role/role to find out the role name of a machine. This way, when a machine connects to puppet it’ll say “I’m this role” and puppet can switch on the role to know what configurations to apply.

To support this, we can annotate role tests like so:

node default {
  case $node_role {
    # role-test: roles/slave-test
    'slave': {
      file { '/tmp/node-role':   # resource type file and filename
        ensure  => present,      # make sure it exists
        mode    => 0644,         # file permissions
        content => "Here is my Role ${node_role}.\n",  # note the node_role fact
      }
    }
    # role-test: roles/listener-test
    'listener': {
      file { '/tmp/listener':    # resource type file and filename
        ensure  => present,      # make sure it exists
        mode    => 0644,         # file permissions
        content => "I am a listener",
      }
    }
  }
}

When roles/slave-test gets run, the test runner will write the slave role to the role file, so that when the container connects it’ll assume that role.

The tests themselves are trivial. They use pytest syntax and look like this:

from util.puppet_utils import *

@agent
def test_file_exists():
    assert file_exists("/tmp/example-ip")

@agent
def test_ip_contents_set():
    assert contents_contains('/tmp/example-ip', 'Here is my Public IP Address')

@master
def test_setup():
    print("foo")
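
A role test for the slave case above looks just the same, only asserting on the role file. A hypothetical example using the same helpers (not taken from the repo):

from util.puppet_utils import *

@agent
def test_role_file_written():
    assert file_exists('/tmp/node-role')
    assert contents_contains('/tmp/node-role', 'Here is my Role')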

Tests are annotated with where they’ll run. Agent tests run after a sync, but master tests run BEFORE the sync happens. This is so you can do any setup on the master that you need. Need to drop in some custom data before the agent starts? Perfect place to do it.
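
The annotations and assertion helpers come from util.puppet_utils, which is just a handful of thin wrappers. A minimal sketch of what it could look like, assuming the runner tells each container which side it is via an environment variable (an illustrative mechanism, not necessarily how puppety does it):

# Rough sketch of what util.puppet_utils could look like; the real module
# lives in the puppety repo, so treat these as illustrative stand-ins.
import os
import pytest

# Assumed mechanism: the test runner tells the container which side it is
# (master or agent) through an environment variable.
ROLE = os.environ.get('PUPPET_TEST_ROLE', 'agent')

def file_exists(path):
    return os.path.isfile(path)

def contents_contains(path, text):
    with open(path) as f:
        return text in f.read()

def agent(test):
    # Only run on the agent container; skipped elsewhere.
    return pytest.mark.skipif(ROLE != 'agent', reason='agent-only test')(test)

def master(test):
    # Only run on the master container (before the agent syncs).
    return pytest.mark.skipif(ROLE != 'master', reason='master-only test')(test)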

Getting test results on jenkins

The fun part about this is that we can output the result of each test into a linked docker volume. Our jenkins test runner just looks like:

cd test

PATH=$WORKSPACE/venv/bin:/usr/local/bin:$PATH
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt

python test-runner.py -e develop --all
python test-runner.py -e production --all

And we can collect our results to get a nice test graph in jenkins.


To deploy, we have a cron job on the puppet master that pulls down our puppet scripts git repo and merges the data folder into its /etc/puppet folder.

Debugging the containers

Sometimes using puppety goes wrong and it’s nice to see what’s going on. Because each container exposes an entrypoint script we can pass in a debug flag to get access to a shell so we can run the tests manually:

$ docker run -it -h docker_host -v ~/tmp/:/opt/local/tmp puppet-tests/puppet-agent --debug /bin/bash

Now we can execute the entrypoint by hand, or run puppet by hand and play around.

Conclusion

All in all this has worked really well for our team. It’s made it easy for us to prototype and play with our infrastructure scripts in a controlled environment locally. And since we are now able to actually write tests against our infrastructure, we can feel more comfortable about pushing changes out to prod.