Last night Jake and I presented CassieQ (the distributed message queue on Cassandra) at the Seattle Cassandra Users meetup at the Expedia building in Bellevue. Thanks to everyone who came out and chatted with us. We certainly learned a lot and had some great conversations about potential optimizations to include in CassieQ.
A couple of good points that came up were how to minimize the use of compare-and-set with the monoton provider, and whether we can move to time UUIDs for “auto” incrementing monotons. Another interesting tidbit was the discussion of time-based compaction strategies currently being proposed, which could give a big boost given the workflow CassieQ has.
But my favorite was the suggestion that we create a “kafka” mode and move the logic of storing pointer offsets out of CassieQ and onto the client, in which case we could get enormous gains since we no longer … Read more
When building any application that involves persistent data storage, you usually need a way to upgrade and change your database using a set of scripts. With patterns like ActiveRecord you get easy up/down migrations by version. But with Cassandra, which traditionally was schemaless, there aren’t that many tools out there to do this.
One thing we have been using at my work and at paradoxical is a simple Java-based Cassandra loader tool that does “up” migrations based on db version scripts.
Assuming you have a folder in your application that stores db scripts like
db/scripts/01_init.cql
db/scripts/02_add_thing.cql
..
db/scripts/10_migrate_users.cql
..
Then each script corresponds to a particular db version state. Its current state depends on all previous states. Our Cassandra loader tracks db versions in a db_version table and lets you apply runners against a keyspace to move your schema (and data) to the target version. If your … Read more
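To make the versioning idea concrete, here is a minimal sketch of how a loader could decide which scripts to run. It is not the actual loader's code: the function names `parse_version` and `pending_scripts` are made up for illustration, and it assumes the version is the numeric prefix of each file name, as in the folder listing above.

```python
import re
from typing import List, Tuple

# Assumption: scripts are named "<version>_<description>.cql".
SCRIPT_NAME = re.compile(r"^(\d+)_.+\.cql$")


def parse_version(filename: str) -> int:
    """Extract the numeric version prefix from a script file name."""
    match = SCRIPT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a versioned script: {filename}")
    return int(match.group(1))


def pending_scripts(
    filenames: List[str], current_version: int, target_version: int
) -> List[Tuple[int, str]]:
    """Return (version, filename) pairs to apply, in ascending version order.

    Only "up" migrations are considered: versions above the current one
    and at or below the target.
    """
    versioned = sorted((parse_version(name), name) for name in filenames)
    return [
        (version, name)
        for version, name in versioned
        if current_version < version <= target_version
    ]


scripts = ["10_migrate_users.cql", "01_init.cql", "02_add_thing.cql"]
plan = pending_scripts(scripts, current_version=1, target_version=10)
print(plan)  # [(2, '02_add_thing.cql'), (10, '10_migrate_users.cql')]
```

Sorting by the parsed integer rather than by file name matters here, since each state depends on all previous ones and a plain string sort would only be correct while every prefix is zero-padded to the same width.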
Cassandra has a neat feature that lets you expire data in a column. Using this handy little feature, you can build simple leadership election on top of Cassandra. The whole process is described here, which talks about leveraging Cassandra’s consensus and column expiration to create leadership electors.
The idea is that a user will try to claim a slot for a period of time in a leadership table. If a slot is full, someone else has leadership. While the leader is still active it needs to heartbeat the table faster than the column’s TTL to act as a keepalive. If it fails to heartbeat (i.e. it died) then its leadership claim expires and someone else can claim it. Unlike most leadership algorithms that claim a single “host” as a leader, I needed a way to create leaders sharded by some “group”. I call this a “LeadershipGroup” and we can … Read more
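The claim/heartbeat/expire cycle can be sketched with a small in-memory model. This is only a stand-in to show the logic, not the Cassandra-backed implementation: the class `LeadershipTable` and its methods are hypothetical, time is passed in explicitly instead of read from a clock, and a dictionary plays the role of the table with expiring columns.

```python
from typing import Dict, Optional, Tuple


class LeadershipTable:
    """In-memory stand-in for a leadership table whose rows expire (hypothetical)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # group -> (leader id, time at which the claim expires)
        self._slots: Dict[str, Tuple[str, float]] = {}

    def current_leader(self, group: str, now: float) -> Optional[str]:
        """Return the leader of a group, or None if the slot is empty or expired."""
        slot = self._slots.get(group)
        if slot is None or slot[1] <= now:
            return None
        return slot[0]

    def try_claim(self, group: str, candidate: str, now: float) -> bool:
        """Claim the slot only if nobody currently holds it."""
        if self.current_leader(group, now) is not None:
            return False
        self._slots[group] = (candidate, now + self.ttl_seconds)
        return True

    def heartbeat(self, group: str, leader: str, now: float) -> bool:
        """Extend the claim; fails if the caller is no longer the leader."""
        if self.current_leader(group, now) != leader:
            return False
        self._slots[group] = (leader, now + self.ttl_seconds)
        return True


table = LeadershipTable(ttl_seconds=30)
a_claims = table.try_claim("group-1", "node-a", now=0)      # slot empty, succeeds
b_blocked = table.try_claim("group-1", "node-b", now=10)    # slot full, fails
b_other = table.try_claim("group-2", "node-b", now=10)      # separate group, succeeds
a_beats = table.heartbeat("group-1", "node-a", now=20)      # claim now lasts until 50
b_takes_over = table.try_claim("group-1", "node-b", now=51) # node-a stopped heartbeating
```

Because each group has its own slot, one node can lead "group-2" while another leads "group-1", which is the sharding by group described above. In a real deployment the check-then-write in `try_claim` would have to be a single atomic operation on the database side; here it is only safe because the model is single-threaded.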
Edit: this project has since been moved to CassieQ: https://github.com/paradoxical-io/cassieq
A few weeks ago my work had a hack day and I got together with some of my coworker friends and we decided to build a queue on top of Cassandra.
For the impatient, give it a try (docker hub):
docker run -it \
  -e CLUSTER_NAME="" \
  -e KEYSPACE="" \
  -e CONTACT_POINTS="" \
  -e USERNAME="" \
  -e PASSWORD="" \
  -e USE_SSL="" \
  -e DATA_CENTER="" \
  -e METRICS_GRAPHITE="true" \
  -e GRAPHITE_PREFIX="" \
  -e GRAPHITE_URL="" \
  onoffswitch/angelhair
The core features we wanted from what we called Project Angelhair were:
– long term events (so many events that AMQ or RMQ might run out of storage space)
– connectionless – wanted to use http
– invisibility – need messages to disappear while they are being processed but be able to come back
– highly scalable – wanted to distribute a docker … Read more
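The invisibility requirement in the list above can be illustrated with a toy in-memory queue. This is a sketch of the general idea only, not how Angelhair stores or tracks messages: the class `InvisibilityQueue` and its methods are invented for this example, and the current time is passed in explicitly to keep it deterministic.

```python
from typing import Dict, List, Optional, Set, Tuple


class InvisibilityQueue:
    """Toy queue where a delivered message is hidden until acked or timed out."""

    def __init__(self) -> None:
        self._messages: List[str] = []
        self._invisible_until: Dict[int, float] = {}  # message index -> reappear time
        self._acked: Set[int] = set()

    def put(self, body: str) -> int:
        """Append a message and return its index."""
        self._messages.append(body)
        return len(self._messages) - 1

    def get(self, now: float, invisibility_seconds: float) -> Optional[Tuple[int, str]]:
        """Return the oldest visible, unacked message and hide it for a while."""
        for index, body in enumerate(self._messages):
            if index in self._acked:
                continue
            if self._invisible_until.get(index, 0.0) > now:
                continue  # still being processed by another consumer
            self._invisible_until[index] = now + invisibility_seconds
            return index, body
        return None

    def ack(self, index: int) -> None:
        """Mark a message as processed so it never comes back."""
        self._acked.add(index)


queue = InvisibilityQueue()
queue.put("a")
queue.put("b")
first = queue.get(now=0, invisibility_seconds=30)    # delivers "a", hidden until 30
second = queue.get(now=1, invisibility_seconds=30)   # delivers "b", hidden until 31
empty = queue.get(now=2, invisibility_seconds=30)    # everything is invisible
queue.ack(1)                                         # "b" was processed
redelivered = queue.get(now=31, invisibility_seconds=30)  # "a" was never acked
```

A consumer that crashes simply never acks, so its message becomes visible again once the timeout passes and another consumer can pick it up.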