Coproducts and polymorphic functions for safety

I was recently exploring shapeless and a coworker turned me onto the interesting features of coproducts and how they can be used with polymorphic functions.

Frequently when using pattern matching you want to make sure that all cases are exhaustively checked. A non exhaustive pattern match is a runtime exception waiting to happen. As a scala user, I’m all about compile time checking. For classes that I own I can enforce exhaustiveness by creating a sealed trait heirarchy:

sealed trait Base
case class Sub1() extends Base
case class Sub2() extends Base

And if I ever try and match on an Base type I’ll get a compiler warning (that I can fail on) if all the types aren’t matched. This is nice because if I ever add another type, I’ll get a (hopefully) failed build.

But what about the scenario where you don’t own the types?

case class Type1()
case class 
Read more
Mocking nested objects with mockito

Yes, I know its a code smell. But I live in the real world, and sometimes you need to mock nested objects. This is a scenario like:


The usual pattern here is to create a mock for each object and return the previous mock:

val a = mock[A]
val b = mock[B]
val c = mock[C]
val d = mock[D]


But again, in the real world the signatures are longer, the types are nastier, and its never quite so clean. I figured I’d sit down and solve this for myself once and for all and came up with:

import org.junit.runner.RunWith
import org.mockito.Mockito
import org.scalatest.junit.JUnitRunner
import org.scalatest.{FlatSpec, Matchers}

class Tests extends FlatSpec with Matchers {
  "Mockito" should "proxy nested objects" in {
    val parent = Mocks.mock[Parent]


    parent.value.getChild1.getChild2.getChild3.doWork() shouldEqual 3

class Child3 {
  def doWork(): Int = 0
Read more
Extracting scala method names from objects with macros

I have a soft spot in me for AST’s ever since I went through the exercise of building my own language. Working in Java I missed the dynamic ability to get compile time information, though I knew it was available as part of the annotation processing pipleine during compilation (which is how lombok works). Scala has something similiar in the concept of macros: a way to hook into the compiler, manipulate or inspect the syntax tree, and rewrite or inject whatever you want. It’s a wonderfully elegant system that reminds me of Lisp/Clojure macros.

I ran into a situation (as always) where I really wanted to get the name of a function dynamically. i.e.

class Foo {
   val field: String = "" 
   def method(): Unit = {}

val name: String = ??.field // = "field"

In .NET this is pretty easy since at runtime you can create an … Read more

Dealing with a bad symbolic reference in scala

Every time this hits me I have to think about it. The compiler barfs at you with something ambiguous like

[error] error: bad symbolic reference. the classpath might be incompatible with the version used when compiling Foo.class.

What this really is saying is that Foo.class references some import or class whose namespace isn’t on the classpath or has fields missing. I usually get this when I have a project level circular dependency via transitive includes. I.e.

Repo 1/
  /project A
  /project B -> depends on C and A
Repo 2
  /project C -> depends on A

So here the dependency C pulls in a version of A but that version may not be the same that project B pulls in. If I do namespace refactoring in project A, then project B won’t compile if those namespaces are used by project C. It’s a mess.

Thankfully scala lets you maintain folder … Read more

Scripting deployment of clusters in asgard

We use asgard at work to do deployments in both qa and production. Our general flow is to check in, have jenkins build, an AMI is created, and then … we have to manually go to asgard and deploy it. That sucks.

However, its actually not super hard to write some scripts to find the latest AMI for a cluster and prepare an automated deployment pipeline from a template. Here you go:

function asgard(){
  http ${VERB} --verify=no "$url" -b

function next-ami(){

  prepare-ami $cluster true | \
    jq ".environment.images | reverse | .[0]"

function prepare-ami(){


  asgard GET "deployment/prepare/${cluster}?deploymentTemplateName=CreateAndCleanUpPreviousAsg&includeEnvironment=${includeEnv}"

function get-next-ami(){

  next=`next-ami ${cluster} | jq ".id"`

  prepare-ami ${cluster} "false" | jq ".lcOptions.imageId |= ${next}"

function start-deployment(){

  echo $payload | asgard POST "deployment/start/${cluster}"

The gist here is to

  • Find the next AMI image of a cluster
  • Get the prepared
Read more
Unit testing DNS failovers

Something that’s come up a few times in my career is the difficulty of validating if and when your code can handle actual DNS changes. A lot of times testing that you have the right JVM settings and that your 3rd party clients can handle it involves mucking with hosts files, nameservers, or stuff like Route53 and waiting around. Then its hard to automate and deterministically reproduce. However, you can hook into the DNS resolution in the JVM to control what gets resolved to what. And this way you can tweak the resolution in a test and see what breaks! I found some info at this blog post and cleaned it up a bit for usage in scala.

The magic sauce to pull this off is to make sure you override the default Internally in the InetAddress class it tries to load an instance of the interface NameServiceDescriptorRead more

CassieQ at the Seattle Cassandra Users Meetup

Last night Jake and I presented CassieQ (the distributed message queue on cassandra) at the seattle cassandra users meetup at the Expedia building in Bellevue. Thanks for everyone who came out and chatted with us, we certainly learned a lot and had some great conversations regarding potential optimizations to include in CassieQ.

A couple good points that came up where how to minimize the use of compare and set with the monoton provider, whether we can move to time UUID’s for “auto” incrementing monotons. Another interesting tidbit was the discussion of using potential time based compaction strategies that are being discussed that could give a big boost given the workflow cassieq has.

But my favorite was the suggestion that we create “kafka” mode and move the logic of storing pointer offsets out of cassieq and onto the client, in which case we could get enormous gains since we no longer … Read more

Consistent hashing for fun

I think consistent hashing is pretty fascinating. It lets you define a ring of machines that shard out data by a hash value. Imagine that your hash space is 0 -Int.Max, and you have 2 machines. Well one machine gets all values hashed from 0 -Int.Max/2 and the other from Int.Max/2 -Int.Max. Clever. This is one of the major algorithms of distributed systems like cassandra and dynamoDB.

For a good visualization, check out this blog post.

The fun stuff happens when you want to add replication and fault tolerance to your hashing. Now you need to have replicants and manage when machines join and add. When someone joins, you need to re-partition the space evenly and re-distribute the values that were previously held.

Something similar when you have a node leave, you need to make sure that whatever it was responsible for in its primray space … Read more

A toy generational garbage collector

Had a little downtime today and figured I’d make a toy generational garbage collector, for funsies. A friend of mine was once asked this as an interview question so I thought it might make for some good weekend practice.

For those not familiar, a common way of doing garbage collection in managed languages is to have the concept of multiple generations. All newly created objects go in gen0. New objects are also the most probably to be destroyed, as there is a lot of transient stuff that goes in an application. If an element survives a gc round it gets promoted to gen1. Gen1 doesn’t get GC’d as often. Same with gen2.

A GC cycle usually consists of iterating through application root nodes (so starting at main and traversing down) and checking to see where a reference lays in which generation. If we’re doing a gen1 collection, we’ll also do … Read more