
Extracting Scala method names from objects with macros

I have a soft spot in me for ASTs ever since I went through the exercise of building my own language. Working in Java I missed the dynamic ability to get compile time information, though I knew it was available as part of the annotation processing pipeline during compilation (which is how lombok works). Scala has something similar in the concept of macros: a way to hook into the compiler, manipulate or inspect the syntax tree, and rewrite or inject whatever you want. It’s a wonderfully elegant system that reminds me of Lisp/Clojure macros.

I ran into a situation (as always) where I really wanted to get the name of a function dynamically. i.e.

class Foo {
  val field: String = ""
  def method(): Unit = {}
}

val name: String = ??.field // = "field"

In .NET this is pretty easy since at runtime you can create an expression tree which gives you the AST. But I haven’t been in .NET in a while, so off to macros I went!

First off, I found the documentation regarding macros to be lackluster. It’s either rudimentary with trivial examples, or the learning curve was steep and I was too lazy to read through all of it. Usually when I encounter scenarios like this I turn to exploratory programming: I set up a unit test with a basic example and leverage the debugger and the IntelliJ live REPL to poke through what I can and can’t do. Time to get set up.

First, I needed to create a new submodule in my multi-module maven project to contain my macro. The reason is that you can’t use macros in the same compilation unit they are defined in. You can, however, use macros in the macro module’s own tests, since the compiler compiles test sources separately from regular sources.

That said, debugging macros is harder than normal because you aren’t debugging your running program, you are debugging the actual compiler. I found this blog post which was a life saver, even though it was missing a few minor pieces.

1. Set the main class to
2. Set the VM args to -Dscala.usejavacp=true
3. Set the program arguments to first point to the file containing the macro, then the file to compile that uses the macro:

-cp types.Types macros/src/main/scala/com/devshorts/common/macros/MethodNames.scala config/src/test/scala/config/ConfigProxySpec.scala

Now you can actually debug your macro!

First let me show the test

case class MethodNameTest(field1: Object) {
  def getFoo(arg: Object): Unit = {}
  def getFoo2(arg: Object, arg2: Object): Unit = {}
}

class MethodNamesMacroSpec extends FlatSpec with Matchers {
  "Names macro" should "extract from a function" in {
    methodName[MethodNameTest](_.field1) shouldEqual MethodName("field1")
  }

  it should "extract when the function contains an argument" in {
    methodName[MethodNameTest](_.getFoo(null)) shouldEqual MethodName("getFoo")
  }

  it should "extract when the function contains multiple arguments" in {
    methodName[MethodNameTest](_.getFoo2(null, null)) shouldEqual MethodName("getFoo2")
  }

  it should "extract when the method is curried" in {
    methodName[MethodNameTest](m => m.getFoo2 _) shouldEqual MethodName("getFoo2")
  }
}


methodName here is a macro that extracts the method name from a lambda passed in of the parameterized generic type. What’s nice about how Scala set up its macros is that you provide an alias for your macro, so you can re-use the macro but type it however you want.

object MethodNames {
  implicit def methodName[A](extractor: (A) => Any): MethodName = macro methodNamesMacro[A]

  def methodNamesMacro[A: c.WeakTypeTag](c: Context)(extractor: c.Expr[(A) => Any]): c.Expr[MethodName] = {
    // ...
  }
}

I’ve made the methodName function take a generic and a function that uses that generic (even though no actual instance is ever passed in). The nice thing about this is I can re-use the macro typed as another function elsewhere. Imagine I want to pin [A] so people don’t have to type it. I can do exactly that!

case class Configuration(foo: String)

implicit def config(extractor: Configuration => Any): MethodName = macro MethodNames.methodNamesMacro[Configuration]

config(_.foo) // MethodName("foo")

At this point it’s time to build the bulk of the macro. The idea is to inspect parts of the AST and potentially walk it to find the pieces we want. Here’s what I ended up with:

def methodNamesMacro[A: c.WeakTypeTag](c: Context)(extractor: c.Expr[(A) => Any]): c.Expr[MethodName] = {
  import c.universe._

  def resolveFunctionName(f: Function): String = {
    f.body match {
      // the function name
      case t: Select =>

      // a nested function, recurse
      case t: Function =>
        resolveFunctionName(t)

      // an application of a function and extracting the name
      case t: Apply if t.fun.isInstanceOf[Select] =>
        t.fun.asInstanceOf[Select].name.decoded

      // curried lambda
      case t: Block if t.expr.isInstanceOf[Function] =>
        val func = t.expr.asInstanceOf[Function]

        resolveFunctionName(func)

      case _ =>
        throw new RuntimeException("Unable to resolve function name for expression: " + f.body)
    }
  }

  val name = resolveFunctionName(extractor.tree.asInstanceOf[Function])

  val literal = c.Expr[String](Literal(Constant(name)))

  reify {
    MethodName(literal.splice)
  }
}

For more details on parts of the AST, here is a great resource.

In the first case, when we pass in methodName[Config](_.method), it gets mangled into a function with a body of the form x$1.method. The Select indicates selecting the method expression of the x$1 instance. This is an easy case.

The Block case maps to when we call methodName[Config](c => c.thing _). Here we have a function, but it’s curried: the function body is a block whose inner expression is a function, and the body of that inner function is an Apply.

Apply takes two arguments: a Select or an Ident for the function, and a list of arguments.

So that makes sense.

The rest is just helper methods to recurse.

The last piece of the puzzle is to create an instance of a string literal and splice it into a new expression returning the MethodName case class that contains the string literal.

All in all, a fun afternoon’s worth of code, and now I get semantic string safety. A great use case here is to type configuration values or other string semantics with a trait. You get compile time refactoring plus type safety. Other use cases are things like database configurations and drivers (the .NET mongo driver uses expression trees to type an object to its underlying mongo collection).

Unit testing DNS failovers

Something that’s come up a few times in my career is the difficulty of validating if and when your code can handle actual DNS changes. A lot of times, testing that you have the right JVM settings and that your 3rd party clients can handle DNS changes involves mucking with hosts files, nameservers, or stuff like Route53, and waiting around. That is hard to automate and deterministically reproduce. However, you can hook into DNS resolution in the JVM to control what gets resolved to what, and then tweak the resolution in a test and see what breaks! I found some info in a blog post and cleaned it up a bit for usage in Scala.

The magic sauce to pull this off is to override the default name service provider. Internally, the InetAddress class tries to load an instance of the interface NameServiceDescriptor using the service loader mechanism. The service loader looks for resources under META-INF/services/ and instantiates whatever fully qualified class name is listed in the file named after the interface.
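To make the mechanism concrete, here is a minimal self-contained Java sketch of the service loader lookup. The Greeter interface is my own illustration, not part of the DNS code; real implementations would be registered in a META-INF/services file on the classpath.

```java
import java.util.ServiceLoader;

public class ServiceLoaderDemo {
    // Hypothetical service interface; implementations would be listed, one fully
    // qualified class name per line, in META-INF/services/<this interface's FQCN>.
    public interface Greeter {
        String greet();
    }

    public static int countProviders() {
        int count = 0;
        // ServiceLoader scans every META-INF/services/<interface name> resource
        // on the classpath and instantiates each class name listed inside.
        for (Greeter g : ServiceLoader.load(Greeter.class)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // No provider file is registered in this sketch, so nothing is found.
        System.out.println("providers: " + countProviders());
    }
}
```

The same lookup is what InetAddress does for NameServiceDescriptor, which is why dropping a file into META-INF/services is enough to hook it.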

For example, if we have

cat META-INF/services/

then the io.paradoxical.test.dns.LocalNameServerDescriptor will get created. Nice.

What does that class actually look like?

class LocalNameServerDescriptor extends NameServiceDescriptor {
  override def getType: String = "dns"

  override def createNameService(): NameService = {
    new LocalNameServer()
  }

  override def getProviderName: String = LocalNameServer.dnsName
}

The type is dns and the name service implementation is our own class. The provider name is something we have custom defined as well, shown below:

object LocalNameServer {
  Security.setProperty("networkaddress.cache.ttl", "0")

  protected val cache = new ConcurrentHashMap[String, String]()

  val dnsName = "local-dns"

  def use(): Unit = {
    System.setProperty("", s"dns,${dnsName}")
  }

  def put(hostName: String, ip: String) = {
    cache.put(hostName, ip)
  }

  def remove(hostName: String) = {
    cache.remove(hostName)
  }
}

class LocalNameServer extends NameService {

  import LocalNameServer._

  val default = new DNSNameService()

  override def lookupAllHostAddr(name: String): Array[InetAddress] = {
    val ip = cache.get(name)
    if (ip != null && !ip.isEmpty) {
      InetAddress.getAllByName(ip)
    } else {
      default.lookupAllHostAddr(name)
    }
  }

  override def getHostByAddr(bytes: Array[Byte]): String = {
    default.getHostByAddr(bytes)
  }
}

Pretty simple. We have a cache that is stored in a singleton companion object with some helper methods on it, and all we do is delegate looking into the cache. If we can resolve the data in the cache we return it, otherwise just proxy it to the default resolver.

The use method sets a system property telling the JVM to use the DNS resolver named local-dns as the highest priority via (lower numbers are higher priority).

Now we can write some tests and see if this works!

class DnsTests extends FlatSpec with Matchers {

  "DNS" should "resolve" in {
    val google = resolve("")

    google.getHostAddress shouldNot be("")
  }

  it should "be overridable" in {
    LocalNameServer.put("", "")

    val google = resolve("")

    google.getHostAddress should be("")
  }

  it should "be undoable" in {
    LocalNameServer.put("", "")

    val google = resolve("")

    google.getHostAddress should be("")

    LocalNameServer.remove("")

    resolve("").getHostAddress shouldNot be("")
  }

  def resolve(name: String) = InetAddress.getByName(name)
}

Happy dns resolving!

Consistent hashing for fun

I think consistent hashing is pretty fascinating. It lets you define a ring of machines that shard out data by a hash value. Imagine that your hash space is 0 -> Int.Max and you have 2 machines. One machine gets all values hashed from 0 -> Int.Max/2, and the other from Int.Max/2 -> Int.Max. Clever. This is one of the major algorithms in distributed systems like Cassandra and DynamoDB.
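As a toy illustration of that partitioning (my own sketch, not from any library), each machine owns an equal slice of the non-negative int space, and a key's owner is just the slice its hash lands in:

```java
public class Ring {
    // Split [0, Integer.MAX_VALUE) into `machines` equal ranges and return
    // which machine index owns the given (non-negative) hash.
    public static int ownerOf(int hash, int machines) {
        long rangeSize = ((long) Integer.MAX_VALUE) / machines;
        int owner = (int) (hash / rangeSize);
        // The last range absorbs the rounding remainder.
        return Math.min(owner, machines - 1);
    }

    public static void main(String[] args) {
        // With 2 machines, hashes below Int.Max/2 go to machine 0, the rest to machine 1.
        System.out.println(ownerOf(10, 2));                    // 0
        System.out.println(ownerOf(Integer.MAX_VALUE - 1, 2)); // 1
    }
}
```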

For a good visualization, check out this blog post.

The fun stuff happens when you want to add replication and fault tolerance to your hashing. Now you need to have replicas and manage what happens when machines join and leave. When someone joins, you need to re-partition the space evenly and re-distribute the values that were previously held.

Something similar happens when a node leaves: you need to make sure that whatever it was responsible for in its primary space, AND the things it was responsible for as a secondary replica, are re-distributed amongst the remaining nodes.

But the beauty of consistent hashing is that the replication basically happens for free! And so does redistribution!

Since my new feature is all in Scala, I figured I’d write something up to see how this might play out in scala.

For the impatient, the full source is here.

First I started with some data types

case class HashValue(value: String) extends AnyRef

case class HashKey(key: Int) extends AnyRef with Ordered[HashKey] {
  override def compare(that: HashKey): Int = key.compareTo(that.key)
}

object HashKey {
  def safe(key: Int) = new HashKey(Math.abs(key))
}

case class HashRange(minHash: HashKey, maxHash: HashKey) extends Ordered[HashRange] {
  override def compare(that: HashRange): Int = minHash.compareTo(that.minHash)
}
I chose to wrap the key in a positive space since it made things slightly easier. In reality you’d want to use md5 or some other actual hashing function, but I relied on the object hash code here.

And then a machine to hold values:

import scala.collection.immutable.TreeMap

class Machine[TValue](val id: String) {
  private var map: TreeMap[HashKey, TValue] = new TreeMap[HashKey, TValue]()

  def add(key: HashKey, value: TValue): Unit = {
    map = map + (key -> value)
  }

  def get(hashKey: HashKey): Option[TValue] = {
    map.get(hashKey)
  }

  def getValuesInHashRange(hashRange: HashRange): Seq[(HashKey, TValue)] = {
    map.range(hashRange.minHash, hashRange.maxHash).toSeq
  }

  def keepOnly(hashRanges: Seq[HashRange]): Seq[(HashKey, TValue)] = {
    val keepOnly: TreeMap[HashKey, TValue] =
        .map(range => map.range(range.minHash, range.maxHash))
        .fold(map.empty) { (tree1, tree2) => tree1 ++ tree2 }

    val dropped = map.filter { case (k, v) => !keepOnly.contains(k) }

    map = keepOnly

    dropped.toSeq
  }
}

A machine keeps a sorted tree map of hash values. This lets me really quickly get things within ranges. For example, when we re-partition, a machine is no longer responsible for the entire range set it was before, but it may still be responsible for parts of it. So we want to be able to tell a machine: hey, keep ranges 0-5 and 12-20, but drop everything else. The tree map lets me do this really nicely.
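The Java analog of that tree-map trick is NavigableMap.subMap, which hands back the entries inside a key range without scanning the whole map:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

public class RangeDemo {
    // Returns the sub-map of entries whose keys fall in [from, to).
    public static NavigableMap<Integer, String> slice(TreeMap<Integer, String> map, int from, int to) {
        return map.subMap(from, true, to, false);
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> map = new TreeMap<>();
        map.put(1, "a");
        map.put(7, "b");
        map.put(15, "c");

        // Keys in [0, 12): picks up 1 and 7 but not 15.
        System.out.println(slice(map, 0, 12).keySet()); // [1, 7]
    }
}
```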

Now for the fun part, the actual consistent hashing stuff.

Given a set of machines, we need to define how the circular partition ring is laid out:

private def getPartitions(machines: Seq[Machine[TValue]]): Seq[(HashRange, Machine[TValue])] = {
  val replicatedRanges: Seq[HashRange] = Stream.continually(defineRanges(machines.size)).flatten

  val infiniteMachines: Stream[Machine[TValue]] = Stream.continually(machines).flatten

    .take(machines.size * replicas)
    .toList
}

What we want to make sure is that each node sits on multiple ranges; this gives us the replication factor. To do that I’ve duplicated the machines in the list by the replication factor and made sure the lists cycle around indefinitely, so while the replicas are not evenly distributed around the ring (they are clustered), they do provide fault tolerance.

Let’s look at what it takes to put a value into the ring:

private def put(hashkey: HashKey, value: TValue): Unit = {
  getReplicas(hashkey).foreach(_.add(hashkey, value))
}

private def getReplicas(hashKey: HashKey): Seq[Machine[TValue]] = {
    .filter { case (range, machine) => hashKey >= range.minHash && hashKey < range.maxHash }
    .map { case (range, machine) => machine }
}

We need to make sure that each replica in the ring whose hash range contains the key gets the value inserted. That’s pretty easy, though we could improve it later with better lookups.

Let’s look at a get:

def get(hashKey: TKey): Option[TValue] = {
  val key = HashKey.safe(hashKey.hashCode())

    .collectFirst { case Some(x) => x }
}

Also similar: go through all the replicas and find the first one to return a value.

Now let’s look at how to add a machine into the ring:

def addMachine(): Machine[TValue] = {
  id += 1

  val newMachine = new Machine[TValue]("machine-" + id)

  val oldMachines =

  partitions = getPartitions(Seq(newMachine) ++ oldMachines)

  redistribute(partitions)

  newMachine
}

So we first create a new list of machines and ask how to re-partition the ring. Then the keys in the ring need to redistribute themselves so that only the nodes responsible for certain ranges contain those keys:

def redistribute(newPartitions: Seq[(HashRange, Machine[TValue])]) = {
  newPartitions.groupBy { case (range, machine) => machine }
    .flatMap { case (machine, ranges) => machine.keepOnly( }
    .foreach { case (k, v) => put(k, v) }
}

Redistributing isn’t that complicated either. We group all the nodes in the ring by the machine they are on, then for each machine we tell it to keep only the values in its replica ranges. The machine’s keepOnly function takes a list of ranges and will remove and return anything not in those ranges. We can then aggregate everything “emitted” by the machines and re-insert it into the right locations.

Removing a machine is really similar:

def removeMachine(machine: Machine[TValue]): Unit = {
  val remainingMachines = partitions.filter { case (r, m) => !m.eq(machine) }.map(_._2)

  partitions = getPartitions(remainingMachines.distinct)

  redistribute(partitions)
}

And that’s all there is to it! Now we have a fast, simple consistent hasher.

A toy generational garbage collector

Had a little downtime today and figured I’d make a toy generational garbage collector, for funsies. A friend of mine was once asked this as an interview question so I thought it might make for some good weekend practice.

For those not familiar, a common way of doing garbage collection in managed languages is to have the concept of multiple generations. All newly created objects go in gen0. New objects are also the most likely to be destroyed, as there is a lot of transient stuff in an application. If an element survives a GC round it gets promoted to gen1. Gen1 doesn’t get GC’d as often. Same with gen2.

A GC cycle usually consists of iterating through application root nodes (starting at main and traversing down) and checking which generation a reference lies in. If we’re doing a gen1 collection, we’ll also do gen0. However, if we’re doing gen0 only and a node already lies in gen1, we can bail early and say “meh, this node and all its references are probably OK for now, we’ll try again later”.

For a really great visualization check out this msdn article on generational garbage collection.

And now on to the code! First let’s start with what an object is:

@EqualsAndHashCode(of = "id")
@Getter
public class Node {
    private final String id;

    private final List<Node> references = new ArrayList<>();

    public Node(String id) { = id;
    }

    public void addReference(Node node) {
        references.add(node);
    }

    public void removeReference(Node node) {
        references.removeIf(i -> i.getId().equals(node.getId()));
    }
}

For the purposes of the toy, it’s just some node with a unique id.

Let’s also define an enum of the different generations we’ll support and their ordinal values:

public enum Mode {
    Gen0,
    Gen1
}

Next, let’s make an allocator that can allocate new nodes. This is what a new would map to behind the scenes:

public class Allocator {
    private Set<Node> gen0 = new HashSet<>();

    private Set<Node> gen1 = new HashSet<>();

    public Node newNode() {
        return newNode("");
    }

    public Node newNode(String tag) {
        final Node node = new Node(tag + UUID.randomUUID());

        gen0.add(node);

        return node;
    }

    public Mode locateNode(Node tag) {
        if (gen1.contains(tag)) {
            return Mode.Gen1;
        }

        return Mode.Gen0;
    }
}

At this point we can allocate a new node, and assign nodes references.

final Allocator allocator = new Allocator();

final Node root = allocator.newNode();

root.addReference(allocator.newNode());
We still haven’t actually collected anything yet, though. So let’s write a garbage collector:

public class Gc {
    private final Allocator allocator;

    public Gc(Allocator allocator) {
        this.allocator = allocator;
    }

    public void collect(Node root, Mode mode) {
        final Allocator.Marker marker = allocator.markBuilder(mode);

        mark(root, marker, mode);

        marker.sweep();
    }

    private void mark(Node root, Allocator.Marker marker, Mode mode) {
        final Mode found = allocator.locateNode(root);

        if (found.ordinal() > mode.ordinal()) {
            return;
        }

        marker.mark(root);

        root.getReferences().forEach(ref -> mark(ref, marker, mode));
    }
}

The GC does a DFS from the root object reference and marks all visible nodes with a marker builder (yet to be shown). If the generational heap that the node lives in is less than or equal to the mode we are collecting, we mark it; otherwise we skip it. This works because later we only prune from the generation heaps according to the mode.

Now comes the fun part, and that’s the marker:

public static class Marker {

    private final Set<String> marks;
    private final Allocator allocator;

    private final Mode mode;

    public Marker(Allocator allocator, final Mode mode) {
        this.allocator = allocator;
        this.mode = mode;
        marks = new HashSet<>();
    }

    public void mark(Node node) {
        marks.add(node.getId());
    }

    public void sweep() {
        Predicate<Node> remove = node -> !marks.contains(node.getId());

        // fall through: a gen1 sweep also sweeps gen0
        switch (mode) {
            case Gen1:
                allocator.gen1.removeIf(remove);
            case Gen0:
                allocator.gen0.removeIf(remove);
        }

        allocator.promote(mode);
    }
}
All we do when we mark is tag the node in a set. When we sweep, we go through the generations less than or equal to the current mode, remove unmarked nodes, and promote the surviving nodes to the next heap!

We’re still missing two functions in the allocator: promote and the marker builder.

public Marker markBuilder(final Mode mode) {
    return new Marker(this, mode);
}

private void promote(final Mode mode) {
    switch (mode) {
        case Gen0:
            gen1.addAll(gen0);
            gen0.clear();
            break;
        case Gen1:
            break;
    }
}

Now we can put it all together and write some tests:

Below you can see the promotion in action.

final Allocator allocator = new Allocator();

final Gc gc = new Gc(allocator);

final Node root = allocator.newNode();

final Node removable = allocator.newNode("remove");

removable.addReference(allocator.newNode());
removable.addReference(allocator.newNode());

root.addReference(removable);

gc.collect(root, Mode.Gen0);


Nothing can be collected since all nodes are still referenced, but we’ve cleared the gen0 heap and moved all nodes to gen1.


root.removeReference(removable);

gc.collect(root, Mode.Gen1);


Now we can actually remove the reference and do a gen1 collection. You can see that the gen1 heap size goes down by 3 (the removable node plus its two children) since those nodes are no longer reachable.

And just for fun, let’s show that gen1 collection works as well:

final Node gen1Remove = allocator.newNode();

root.addReference(gen1Remove);

gc.collect(root, Mode.Gen1);

root.removeReference(gen1Remove);

gc.collect(root, Mode.Gen1);


And there you have it, a toy generational garbage collector :)

For the full code, check out this gist

Logging the easy way

This is a cross post from the original posting at GoDaddy’s engineering blog. This is a project I have spent considerable time working on and leverage a lot.

Logging is a funny thing. Everyone knows what logs are and everyone knows you should log, but there are no hard and fast rules on how to log or what to log. Your logs are your first line of defense against figuring out issues live. Sometimes logs are the only line of defense (especially in time sensitive systems).

That said, in any application good logging is critical. Debugging an issue can be made ten times easier with simple, consistent logging. Inconsistent or poor logging can actually make it impossible to figure out what went wrong in certain situations. Here at GoDaddy we want to make sure that we encourage logging that is consistent, informative, and easy to search.

Enter the GoDaddy Logger. This is a SLF4J wrapper library that encourages us to fall into the pit of success when dealing with our logging formats and styles in a few ways:

  • Frees you from having to think about what context fields need to be logged and removes any worries about forgetting to log a value,
  • Provides the ability to keep personally identifiable information from being logged,
  • Abstracts the actual format of the logs from the production of them. By decoupling the output of the framework from the log statements themselves, you can easily swap out the formatter when you want to change the structure, and all of your logging statements will be consistently logged using the new format.

A lot of teams at GoDaddy use ELK (Elasticsearch, Logstash, Kibana) to search logs in a distributed system. By combining consistent logging with ELK (or Splunk or some other solution), it becomes relatively straightforward for developers to correlate and locate related events in their distributed systems.


In an effort to make doing the right thing the easy thing, our team set out to build an extra layer on top of SLF4J – The GoDaddy Logger. While SLF4J is meant to abstract logging libraries and gives you a basic logging interface, our goal was to extend that interface to provide for consistent logging formats. One of the most important things for us was that we wanted to provide an easy way to log objects rather than having to use string formatting everywhere.


One of the first things we did was expose what we call the ‘with’ syntax. The ‘with’ syntax builds a formatted key value pair, which by default is “key=value;”, and allows logging statements to be more human readable. For example:
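A stripped-down sketch of how such a fluent key/value API can be built (a toy of my own, not the GoDaddy Logger's actual API — for one thing, the real logger returns immutable loggers and writes through SLF4J instead of returning a string):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KvLogger {
    private final Map<String, Object> context = new LinkedHashMap<>();

    // Each with(...) accumulates a key/value pair for the eventual log line.
    public KvLogger with(String key, Object value) {
        context.put(key, value);
        return this;
    }

    // Formats "message; key=value; key=value", roughly like the default formatter described above.
    public String info(String message) {
        StringBuilder sb = new StringBuilder(message);
        for (Map.Entry<String, Object> e : context.entrySet()) {
            sb.append("; ").append(e.getKey()).append("=").append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = new KvLogger()
            .with("first-name", "GoDaddy")
            .with("last-name", "Developers!")
            .info("Logging is fun");
        System.out.println(line);
    }
}
```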

logger.with("first-name", "GoDaddy")
     .with("last-name", "Developers!")
     .info("Logging is fun");

Using the default logging formatter this log statement outputs:

Logging is fun; first-name="GoDaddy"; last-name="Developers!"

We can build on this to support deep object logging as well. A good example is to log the entire object from an incoming request. Instead of relying on the .toString() of the object to be its loggable representation, we can crawl the object using reflectasm and format it globally and consistently. Let’s look at an example of how a full object is logged.

Logger logger = LoggerFactory.getLogger(LoggerTest.class);
Car car = new Car("911", 2015, "Porsche", 70000.00, Country.GERMANY, new Engine("V12"));
logger.with(car).info("Logging Car");

Like the initial string ‘with’ example, the above log line produces:

14:31:03.943 [main] INFO – Logging Car; cost=70000.0; country=GERMANY;"V12"; make="Porsche"; model="911"; year=2015

All of the car object’s info is cleanly logged in a consistent way. We can easily search for a model property in our logs and we won’t be at the whim of the spelling errors of forgetful developers. You can also see that our logger nests object properties in dot object notation, like"V12". To accomplish the same behavior using SLF4J, we would need to do something akin to the following:

Use the Car’s toString functionality. Implement the Car object’s toString function:

String toString() {
    return "cost=" + cost + "; country=" + country + ";" + (engine == null ? "null" : engine.getName()) … etc.
}

Then log the car via its toString() function:"Logging Car; {}", car.toString());

Or use String formatting:"Logging Car; cost={}; country={};\"{}\"; make=\"{}\"; model=\"{}\"; " + "year={}; test=\"{}\"", car.getCost(), car.getCountry(), car.getEngine() == null ? null : car.getEngine().getName(), car.getMake(), car.getModel(), car.getYear());

Our logger combats these unfortunate scenarios and many others by allowing you to set the recursive logging level, which defines the amount of levels deep into a nested object you want to have logged and takes into account object cycles so there isn’t infinite recursion.
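A rough sketch of depth-limited field crawling with cycle protection, using plain java.lang.reflect (the Car and Engine classes here are stand-ins of mine; the library itself uses reflectasm and a richer formatter):

```java
import java.lang.reflect.Field;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

public class Crawler {
    // Renders an object's fields as "prefix.field=value; ", stopping at maxDepth
    // and skipping objects already seen (protection against reference cycles).
    public static String dump(Object o, String prefix, int maxDepth, Set<Object> seen) throws Exception {
        if (o == null || maxDepth < 0 || !seen.add(o)) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Field f : o.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            Object value = f.get(o);
            String name = prefix.isEmpty() ? f.getName() : prefix + "." + f.getName();
            if (value instanceof String || value instanceof Number) {
                sb.append(name).append("=").append(value).append("; ");
            } else {
                // nested object: recurse one level deeper with a dotted prefix
                sb.append(dump(value, name, maxDepth - 1, seen));
            }
        }
        return sb.toString();
    }

    static class Engine { String name = "V12"; }
    static class Car { String model = "911"; Engine engine = new Engine(); }

    public static void main(String[] args) throws Exception {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        System.out.println(dump(new Car(), "", 2, seen));
    }
}
```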


The GoDaddy Logger provides annotation based logging scope support giving you the ability to prevent fields/methods from being logged with the use of annotations. If you don’t want to skip the entity completely, but would rather provide a hashed value, you can use an injectable hash processor to hash the values that are to be logged. Hashing a value can be useful since you may want to log a piece of data consistently but you may not want to log the actual data value. For example:

import lombok.Data;

public class AnnotatedObject {
     private String notAnnotated;

     @LoggingScope(scope = Scope.SKIP)
     private String annotatedLogSkip;

     public String getNotAnnotatedMethod() {
          return "Not Annotated";
     }

     @LoggingScope(scope = Scope.SKIP)
     public String getAnnotatedLogSkipMethod() {
          return "Annotated";
     }

     @LoggingScope(scope = Scope.HASH)
     public String getCreditCardNumber() {
          return "1234-5678-9123-4567";
     }
}

If we were to log this object:

AnnotatedObject annotatedObject = new AnnotatedObject();
annotatedObject.setAnnotatedLogSkip("SKIP ME");
annotatedObject.setNotAnnotated("NOT ANNOTATED");

logger.with(annotatedObject).info("Annotation Logging");

The following would be output to the logs:

09:43:13.306 [main] INFO – Annotating Logging; creditCardNumber="5d4e923fe014cb34f4c7ed17b82d6c58"; notAnnotated="NOT ANNOTATED"; notAnnotatedMethod="Not Annotated"

Notice that the annotatedLogSkip value of “SKIP ME” is not logged. You can also see that the credit card number has been hashed. The GoDaddy Logger uses Guava’s MD5 hashing algorithm by default which is not cryptographically secure, but definitely fast. And you’re able to provide your own hashing algorithm when configuring the logger.
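For reference, the hash-instead-of-value idea with the plain JDK (the library uses Guava's hasher; this just shows the transformation applied before logging):

```java

public class HashDemo {
    // Returns the lowercase hex MD5 of the input, as one might log
    // in place of a sensitive raw value.
    public static String md5Hex(String input) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(input.getBytes(StandardCharsets.UTF_8))) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Hex("1234-5678-9123-4567"));
    }
}
```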


One of the more powerful things about the logger is that the ‘with’ syntax returns a new immutable captured logger. This means you can do something like this:

Logger contextLogger = logger.with("request-id", 123);"enter");

// .. do work"exit");

All logs generated off the captured logger will include the captured with statements. This lets you factor out common logging statements and cleans up your logs so you see what you really care about (and make fewer mistakes).


With consistent logging we can easily search through our logs and debug complicated issues with confidence. As an added bonus, since our log formatting is centralized and abstracted, we can also make team-wide or company-wide formatting shifts without impacting developers or existing code bases.

Logging is hard. There is a fine line between logging too much and too little. Logging is also best done while you write code vs. as an afterthought. We’ve really enjoyed using the GoDaddy Logger and it’s really made logging into a simple and unobtrusive task. We hope you take a look and if you find it useful for yourself or your team let us know!

For more information about the GoDaddy Logger, check out the GitHub project, or if you’re interested in working on these and other fun problems with us, check out our jobs page.

Serialization of lombok value types with jackson

For anyone who uses lombok with jackson, you should check out jackson-lombok, a fork from xebia that allows lombok value types (and lombok generated constructors) to be json creators.

The original authors compiled their version against jackson-core 2.4.* but the new version uses 2.6.*. Props go to github user kazuki-ma for submitting a PR that actually addresses this. Paradoxical just took those fixes and published.

Anyways, now you get the niceties of being able to do:

public class ValueType {
    private String name;
    private String description;
}

And instantiate your mapper:

new ObjectMapper().setAnnotationIntrospector(new JacksonLombokAnnotationIntrospector());


Cassandra DB migrations

When building any application that involves persistent data storage, you usually need a way to upgrade and change your database using a set of scripts. With patterns like ActiveRecord you get easy up/down by-version migrations. But with cassandra, which traditionally was schemaless, there aren’t many tools out there to do this.

One thing we have been using at my work and at paradoxical is a simple java based cassandra loader tool that does “up” migrations based on db version scripts.

Assuming you have a folder in your application that stores db scripts, each script corresponds to a particular db version state, and its current state depends on all previous states. Our cassandra loader tracks db versions in a db_version table and lets you apply runners against a keyspace to move your schema (and data) to the target version. If your db is already at a version it does nothing, and if your db is a few versions back the runner will run only the required versions to get you to latest (or to the version number you want).
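The versioning logic can be sketched as: sort the scripts by version, then apply only those above the current version and at or below the target (a simplification with made-up names, not the loader's actual API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MigrationRunner {
    // Applies, in order, every script whose version is greater than currentVersion
    // and at most targetVersion; returns the versions that were run.
    public static List<Integer> plan(Map<Integer, String> scripts, int currentVersion, int targetVersion) {
        List<Integer> applied = new ArrayList<>();
        // TreeMap iterates keys in ascending version order
        for (Map.Entry<Integer, String> e : new TreeMap<>(scripts).entrySet()) {
            if (e.getKey() > currentVersion && e.getKey() <= targetVersion) {
                // here we would execute e.getValue() against the keyspace
                applied.add(e.getKey());
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<Integer, String> scripts = Map.of(1, "init.cql", 2, "add_table.cql", 3, "add_index.cql");
        // db already at version 1, target latest: only 2 and 3 run
        System.out.println(plan(scripts, 1, 3)); // [2, 3]
    }
}
```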

Taking this one step further, when working in Java we have the luxury of using cassandra-unit to run an embedded cassandra instance for unit or integration tests. This way you don’t need to mock out your database; you actually run all your db calls through the embedded cassandra. We use this heavily in cassieq (a distributed queue based on cassandra).

One thing our cassandra loader can do is be run in library mode, where you give it the same set of db scripts and you can build a fresh db for your integration tests:

public static Session create() throws Exception {
    return CqlUnitDb.create("../db/scripts");
}
Running the loader in standalone mode (by downloading the runner maven classifier) lets you run the migration runner in your console:

> java -jar cassandra.loader-runner.jar

Unexpected exception:Missing required options: ip, u, pw, k
usage: Main
 -f,--file-path <arg>         CQL File Path (default =
 -ip <arg>                    Cassandra IP Address
 -k,--keyspace <arg>          Cassandra Keyspace
 -p,--port <arg>              Cassandra Port (default = 9042)
 -pw,--password <arg>         Cassandra Password
 -recreateDatabase            Deletes all tables. WARNING all
                              data will be deleted! 
 -u,--username <arg>          Cassandra Username
 -v,--upgrade-version <arg>   Upgrade to Version

The advantage to unifying all of this is that you can test your db scripts in isolation and be confident that they work!

Dalloc – coordinating resource distribution using hazelcast

A fun problem that has come up during the implementation of cassieq (a distributed queue based on cassandra) is how to evenly distribute resources across a group of machines. There is a scenario in cassieq where writes can be delayed, and as such there is a custom worker in the app (one per queue) that watches a queue to see if a delayed write comes in and republishes the message to a bucket later on. It's transparent to the user, but if we have multiple workers on the same queue we could potentially republish the message twice. While technically that falls within the SLA we've set for cassieq (at least once delivery), it'd be nice to avoid this particular race condition.

To solve this, I’ve clustered the cassieq instances together using hazelcast. Hazelcast is a pretty cool library since it abstracts away member discovery/connection and gives you events on membership changes to make it easy for you to build distributed data grids. It also has a lot of great primitives that are useful in building distributed workflows. Using hazelcast, I’ve built a simple resource distributor that uses shared distributed locks and a master set of allocations across cluster members to coordinate who can “grab” which resource.

For the impatient you can get dalloc from


The general idea in dalloc is that each node creates a resource allocator that is bound to a resource group name (like "Queues"). Each node supplies a function to the allocator that generates the master set of resources to use, and a callback for when resources are allocated. The callback is there so you can wire in async events when allocations need to be rebalanced outside of a manual invocation (like on a cluster member join).

The entire resource allocation library API deals with abstractions on what a resource is, and lets the client map their internal resource into a ResourceIdentity. For cassieq, it’s a queue id.

When an allocation is triggered (either manually or via a member join/leave event) the following occurs:

  • Try and acquire a shared lock for a finite period of time
  • If you acquired the lock, acquire a map of what has been allocated to everyone else and compare what is available from your master set to what is available
  • Given the size of the current cluster, determine how many resources you are allowed to claim (by even distribution). If you don’t have your entire set claimed, take as many as you can to fill up. If you have too many claimed, give some resources up
  • Persist your changes to the master state map
  • Dispatch to your callback what the new set of resources should be
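The even-distribution math in the claim step can be sketched without any hazelcast machinery. This is a simplified single-process sketch (the class and method names are mine, not dalloc's actual API): given the master set, what others hold, and what this node already holds, decide what to keep and claim.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the rebalance step; dalloc's real implementation
// runs this under a shared hazelcast lock against a replicated state map.
public class EvenDistribution {
    public static Set<String> rebalance(List<String> masterSet,
                                        Set<String> claimedByOthers,
                                        Set<String> mine,
                                        int clusterSize) {
        // Even split: each node may own at most ceil(total / clusterSize) resources
        int allowed = (masterSet.size() + clusterSize - 1) / clusterSize;

        Set<String> result = new HashSet<>();
        // Keep what we already own, up to the allowed count (give the rest up)
        for (String r : mine) {
            if (result.size() < allowed) {
                result.add(r);
            }
        }
        // Fill up with unclaimed resources from the master set
        for (String r : masterSet) {
            if (result.size() >= allowed) {
                break;
            }
            if (!claimedByOthers.contains(r)) {
                result.add(r);
            }
        }
        return result;
    }
}
```

With two nodes and four resources, a node holding nothing and seeing two resources already claimed ends up taking the remaining two.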

Hazelcast supports distributed maps, where the map is sharded by its keys across different nodes. However, I'm actually explicitly NOT distributing the map across the cluster. I've put ownership of the resource set on "one" node (but the map is replicated, so if that node goes down the map still exists). This is because each node is going to have to try and do a claim. If each node claims and then calls out to every other node, that's N^2 IO operations. Compare that to every node making N operations against the single owner.

The library also supports bypassing this mechanism in favor of a much more "low-tech" solution of manual allocation. All this means is that you pre-define how many nodes there should be, and which node number a node is. Then each node sorts the input data and grabs a specific slice out of the input set based on its id. It doesn't give any guarantees of non-overlap, but it does give you an 80% solution to a hard problem.
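That manual slicing is simple enough to sketch directly (a hypothetical helper, not dalloc's actual API): sort the resources, then take the chunk belonging to this node's id.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the "low-tech" mode: node `nodeId` of `nodeCount` deterministically
// takes its slice of the sorted input. No coordination, no non-overlap guarantee
// beyond every node agreeing on nodeCount and the input set.
public class ManualAllocation {
    public static List<String> slice(List<String> resources, int nodeCount, int nodeId) {
        List<String> sorted = new ArrayList<>(resources);
        Collections.sort(sorted);

        // Evenly sized chunk per node; the last node absorbs any shortfall
        int chunk = (sorted.size() + nodeCount - 1) / nodeCount;
        int from = Math.min(nodeId * chunk, sorted.size());
        int to = Math.min(from + chunk, sorted.size());
        return sorted.subList(from, to);
    }
}
```

As long as every node sees the same input and agrees on the node count, node 0 and node 1 end up with disjoint halves of the sorted set.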

Jake, the other paradoxical member, suggested that there could be a nice alternative solution using a similar broadcast style of quorum using paxos. Each node broadcasts what it's claiming and the nodes agree on who is allowed to do what. I probably wouldn't use hazelcast for that, though the primitives of paxos (talking to all members of a cluster) are there, and it'd be interesting to build paxos on top of hazelcast now that I think about it…

Anyways, abstracting distributed resource allocation is nice, because as we make improvements to how we want to tune the allocation algorithms all dependent services get it for free. And free stuff is my favorite.

Leadership election with cassandra

Cassandra has a neat feature that lets you expire data in a column. Using this handy little feature, you can create simple leadership election using cassandra. The whole process is described here, which talks about leveraging Cassandra's consensus and the column expiration to create leadership electors.

The idea is that a user will try and claim a slot for a period of time in a leadership table. If a slot is full, someone else has leadership. While the leader is still active it needs to heartbeat the table faster than the column's TTL to act as a keepalive. If it fails to heartbeat (i.e. it died), its leadership claim is relinquished and someone else can claim it. Unlike most leadership algorithms that claim a single "host" as a leader, I needed a way to create leaders sharded by some "group". I call this a "LeadershipGroup" and we can leverage the expiring columns in cassandra to do this!
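The claim/heartbeat cycle can be sketched with an in-memory toy that stands in for the real cassandra TTL columns and conditional writes (all names here are illustrative, not the library's API):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy in-memory model of TTL-based leadership: a claim succeeds only if the
// slot for the group is empty or its previous claim has expired.
public class ExpiringLeaderTable {
    private static class Claim {
        final String leaderId;
        final Instant expiresAt;

        Claim(String leaderId, Instant expiresAt) {
            this.leaderId = leaderId;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Claim> claims = new HashMap<>();

    // Try to claim the slot for `group`; succeeds if empty or expired
    public synchronized Optional<String> tryClaim(String group, String leaderId, Duration ttl) {
        Claim current = claims.get(group);
        if (current == null || Instant.now().isAfter(current.expiresAt)) {
            claims.put(group, new Claim(leaderId, Instant.now().plus(ttl)));
            return Optional.of(leaderId);
        }
        return Optional.empty();
    }

    // Heartbeat: only the current, unexpired leader may refresh its TTL
    public synchronized boolean heartbeat(String group, String leaderId, Duration ttl) {
        Claim current = claims.get(group);
        if (current != null && current.leaderId.equals(leaderId) && Instant.now().isBefore(current.expiresAt)) {
            claims.put(group, new Claim(leaderId, Instant.now().plus(ttl)));
            return true;
        }
        return false;
    }
}
```

The real implementation gets the "empty or expired" check for free: cassandra simply deletes the column when the TTL lapses, and a conditional insert takes the role of the synchronized block.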

To make this easier, I’ve wrapped this algorithm in a java library available from paradoxical. For the impatient


The gist here is that you need to provide a schema similar to

CREATE TABLE leadership_election (
    group text PRIMARY KEY,
    leader_id text
);

Though the actual column names can be custom defined. You can define a leadership election factory using Guice like so

public class LeadershipModule extends AbstractModule {
    @Override
    protected void configure() {
        // bind the LeadershipElectionFactory, LeadershipSchema, etc. here
    }
}


  • LeadershipStatus is a class that lets you query who is leader for what “group”. For example, you can have multiple workers competing for leadership of a certain resource.
  • LeadershipSchema is a class that defines what the column names in your schema are named. If you use the sample table above, the default schema maps to it
  • LeadershipElectionFactory is a class that gives you instances of LeadershipElection classes, and I’ve provided a cassandra leadership factory

Once we have a leader election we can try and claim leadership:

final LeadershipElectionFactory factory = new CassandraLeadershipElectionFactory(session);

// create an election processor for a group id
final LeadershipElection leadership = factory.create(LeadershipGroup.random());

final LeaderIdentity user1 = LeaderIdentity.valueOf("user1");

final LeaderIdentity user2 = LeaderIdentity.valueOf("user2");

assertThat(leadership.tryClaimLeader(user1, Duration.ofSeconds(2))).isPresent();

// once user1's 2 second claim lapses, user2 can claim the group
assertThat(leadership.tryClaimLeader(user2, Duration.ofSeconds(3))).isPresent();

When you claim leadership, you claim it for a period of time; if you succeed, you get a leadership token that you can heartbeat on. And now you have leadership!

As usual, full source available at my github

Plugin class loaders are hard

Plugin based systems are really common. Jenkins, Jira, wordpress, whatever. Recently I built a plugin workflow for a system at work and have been mired in the joys of the class loader. For the uninitiated, a class in Java is identified uniquely by the class loader instance it is created from as well as its fully qualified class name. This means that a class loaded by class loader A is not the same class as the identically named class loaded by class loader B.

There are actually some cool things you can do with this, especially in terms of code isolation. Imagine your plugins are bundled as shaded jars that contain all their internal dependencies. By leveraging class loaders you can isolate potentially conflicting versions of libraries between the host application and the plugin. But, in order to communicate with the host layer, you need a strict set of shared interfaces that the host layer always owns. When building the uber jar you exclude the host interfaces (and all their transitive dependencies) from being bundled, which in maven can be done by using scope provided. This means that they will always be loaded by the host.
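To make the provided-scope trick concrete, a minimal dependency entry in the plugin's pom might look like this (the coordinates are hypothetical):

```xml
<!-- The host-owned interface jar is "provided": it compiles against it,
     but the shade plugin leaves it (and its transitives) out of the uber jar,
     so at runtime the host's class loader supplies these classes. -->
<dependency>
    <groupId>com.example</groupId>
    <artifactId>plugin-api</artifactId>
    <version>1.0</version>
    <scope>provided</scope>
</dependency>
```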

In general, class loaders are hierarchical: they ask their parent if a class has been loaded and, if so, return that. In order to do plugins you need to invert that process: first look inside the uber-jar, and only if you can't find a class, look up to the parent.

An example can be found here and copied for the sake of internet completeness:

import java.net.URL;
import java.net.URLClassLoader;
import java.net.URLStreamHandlerFactory;
import java.util.UUID;

public class PostDelegationClassLoader extends URLClassLoader {

    private final UUID id = UUID.randomUUID();

    public PostDelegationClassLoader(URL[] urls, ClassLoader parent, URLStreamHandlerFactory factory) {
        super(urls, parent, factory);
    }

    public PostDelegationClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    public PostDelegationClassLoader(URL[] urls) {
        super(urls);
    }

    public PostDelegationClassLoader() {
        super(new URL[0]);
    }

    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        try (ThreadCurrentClassLoaderCapture capture = new ThreadCurrentClassLoaderCapture(this)) {
            Class<?> loadedClass = findLoadedClass(name);

            // Nope, try to load it
            if (loadedClass == null) {
                try {
                    // Ignore parent delegation and just try to load locally
                    loadedClass = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Swallow - does not exist locally
                }

                // If not found, just use the standard URLClassLoader (which follows normal parent delegation)
                if (loadedClass == null) {
                    // throws ClassNotFoundException if not found in delegation hierarchy at all
                    loadedClass = super.loadClass(name);
                }
            }
            return loadedClass;
        }
    }

    @Override
    public URL getResource(final String name) {
        final URL resource = findResource(name);

        if (resource != null) {
            return resource;
        }

        return super.getResource(name);
    }
}

But this is just the tip of the fun iceberg. If all your libraries play nice then you may not notice anything. But I recently noticed, using the apache xml-rpc library, that I would get a SAXParserFactory class def not found exception, specifically bitching about instantiating the sax parser factory. I'm not the only one apparently; here is a discussion about a JIRA plugin that wasn't happy. After much code digging I found that the classloader being used was the one bound to the thread's current context.

Why in the world is there a classloader bound to thread local? JavaWorld has a nice blurb about this

Why do thread context classloaders exist in the first place? They were introduced in J2SE without much fanfare. A certain lack of proper guidance and documentation from Sun Microsystems likely explains why many developers find them confusing.

In truth, context classloaders provide a back door around the classloading delegation scheme also introduced in J2SE. Normally, all classloaders in a JVM are organized in a hierarchy such that every classloader (except for the primordial classloader that bootstraps the entire JVM) has a single parent. When asked to load a class, every compliant classloader is expected to delegate loading to its parent first and attempt to define the class only if the parent fails.

Sometimes this orderly arrangement does not work, usually when some JVM core code must dynamically load resources provided by application developers. Take JNDI for instance: its guts are implemented by bootstrap classes in rt.jar (starting with J2SE 1.3), but these core JNDI classes may load JNDI providers implemented by independent vendors and potentially deployed in the application’s -classpath. This scenario calls for a parent classloader (the primordial one in this case) to load a class visible to one of its child classloaders (the system one, for example). Normal J2SE delegation does not work, and the workaround is to make the core JNDI classes use thread context loaders, thus effectively “tunneling” through the classloader hierarchy in the direction opposite to the proper delegation.

This means that whenever I’m delegating work to my plugins I need to be smart about capturing my custom plugin class loader and putting it on the current thread before execution. Otherwise if a misbehaving library accesses the thread classloader, it can now have access to the ambient root class loader and IFF the same class name exists in the host application it will load it. This could potentially conflict with other classes from the same package that aren’t loaded this way and in general cause mayhem.

The solution here was a simple class modeled after .NET’s disposable pattern using Java’s try/finally auto closeable.

public class ThreadCurrentClassLoaderCapture implements AutoCloseable {
    final ClassLoader originalClassLoader;

    public ThreadCurrentClassLoaderCapture(final ClassLoader newClassLoader) {
        originalClassLoader = Thread.currentThread().getContextClassLoader();

        Thread.currentThread().setContextClassLoader(newClassLoader);
    }

    @Override
    public void close() {
        Thread.currentThread().setContextClassLoader(originalClassLoader);
    }
}
Which is used before each and every invocation into the interface of the plugin (where connection is the plugin reference)

public void start() throws Exception {
    captureClassLoader(() -> connection.start());
}

public void stop() throws Exception {
    captureClassLoader(() -> connection.stop());
}

public void heartbeat() throws Exception {
    captureClassLoader(() -> connection.heartbeat());
}

private void captureClassLoader(ExceptionRunnable runner) throws Exception {
    try (ThreadCurrentClassLoaderCapture capture = new ThreadCurrentClassLoaderCapture(connection.getClass().getClassLoader())) {
        runner.run();
    }
}

However, this isn’t the only issue. Imagine a scenario where you support both class path loaded plugins AND remote loaded plugins (via shaded uber-jar). And let’s pretend that on the classpath is a jar with the same namespaces and classes as those in an uberjar. To be more succinct: you have a delay loaded shared library on the class path, and a shaded version of that library loaded via the plugin mechanism.

Technically there shouldn’t be any issues here. The class path plugin gets all its classes resolved from the root scope. The plugin gets its classes (of the same name) from the delegated provider. Both use the same shared set of host interfaces. The issue arises if you have a library like reflectasm, which dynamically emits bytecode at runtime.

Look at this code:

AccessClassLoader loader = AccessClassLoader.get(type);
synchronized (loader) {
	try {
		accessClass = loader.loadClass(accessClassName);
	} catch (ClassNotFoundException ignored) {
		String accessClassNameInternal = accessClassName.replace('.', '/');
		String classNameInternal = className.replace('.', '/');
		ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);
		// ... bytecode generation continues, then the generated class
		// is defined into the owning class loader
	}
}
Which is a snippet from reflectasm as it's generating a runtime bytecode emitter that can access fields for you. It creates a class name like your.class.nameMethodAccess. If the class name isn't found, it generates the bytecode and then writes it into the owning class's class loader.

In the scenario of a plugin using this library, it will check the loader and see that neither the plugin classloader NOR the rootscope loader has the emitted class name, so a class not found exception is thrown. It will then write the class into the target type's class loader. This would be the delegated loader, which provides the isolation we want.

However, if the class path plugin (what I call an embedded plugin) runs this code, the dynamic runtime class is written into the root scope loader. This means that all delegating class loaders will eventually find this type since they always do a delegated pass to the root!

The important thing to note here is that using a delegated loader does not mean every class that comes out of it is tied to the delegated loader. Only classes that are found inside of the delegated loader are bound to it. If a class is resolved by the parent, the class is linked to the parent.
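You can see this linking rule with a tiny demo: even when you ask a child loader, a class resolved by a parent reports that parent (here the bootstrap loader) as its owner.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Demonstrates that a class resolved via parent delegation is linked to the
// parent loader, not to the child loader you asked.
public class LoaderLinkDemo {
    public static void main(String[] args) throws Exception {
        try (URLClassLoader child = new URLClassLoader(new URL[0], LoaderLinkDemo.class.getClassLoader())) {
            Class<?> c = child.loadClass("java.lang.String");

            // String delegates all the way up to the bootstrap loader,
            // which is reported as null
            System.out.println(c.getClassLoader() == null);

            // And it is the exact same class object the host already has
            System.out.println(c == String.class);
        }
    }
}
```

Both prints are `true`: the child loader never "owns" String, so every loader in the hierarchy sees one and the same class. That is exactly why a dynamic class written into the root loader leaks into every delegating child.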

In this scenario with the root class loader being polluted with the same class name, I don’t think there is much you can do other than avoid it.

Anyways, maybe I should have used OSGi…?