February, 2016

A toy generational garbage collector

Had a little downtime today and figured I’d make a toy generational garbage collector, for funsies. A friend of mine was once asked this as an interview question so I thought it might make for some good weekend practice.

For those not familiar, a common way of doing garbage collection in managed languages is to have the concept of multiple generations. All newly created objects go in gen0. New objects are also the most probably to be destroyed, as there is a lot of transient stuff that goes in an application. If an element survives a gc round it gets promoted to gen1. Gen1 doesn’t get GC’d as often. Same with gen2.

A GC cycle usually consists of iterating through application root nodes (so starting at main and traversing down) and checking to see where a reference lays in which generation. If we’re doing a gen1 collection, we’ll also do … Read more

, ,

RMQ failures from suspended VMs

My team recently ran into a bizarre RMQ partition failure in a production cluster. RMQ doesn’t handle partition failures well, and while you can set up auto recovery (such as suspension of minority groups) you need to manually recover from it. The one time I’ve encountered this I got a very useful message in the admin managment page indicating that parts of the cluster were in partition failure, but this time things went weird.

Symptoms:

  • Could not gracefully restart rmq using rabbitmqctl stop_app/start_app. The commands would stall
  • Could not list queues for any vhost. rabbitmqctl list_queues -p [vhost] would stall
  • Logs showed partition failure
  • People could not consistently log into the admin api without stalls, or other strange issues even when clearing browsing data/local storage/incognito/different browsers
  • Rebooting the master did not help

In the end the solution was to do an NTP time sync, turn off all clustered slaves … Read more

, ,