Category: Uncategorized

CassieQ at the Seattle Cassandra Users Meetup

Last night Jake and I presented CassieQ (the distributed message queue on Cassandra) at the Seattle Cassandra Users meetup at the Expedia building in Bellevue. Thanks to everyone who came out and chatted with us; we learned a lot and had some great conversations about potential optimizations to include in CassieQ.

A couple of good points that came up were how to minimize the use of compare-and-set with the monoton provider, and whether we can move to time UUIDs for “auto” incrementing monotons. Another interesting tidbit was a discussion of the time-based compaction strategies currently being proposed, which could give a big boost given the workflow CassieQ has.

But my favorite was the suggestion that we create a “Kafka” mode and move the logic of storing pointer offsets out of CassieQ and onto the client. In that case we could get enormous gains, since we would no longer need to do compare-and-sets for multiple consumers. If we do see that pull request come in, I think both Jake and I would be pretty stoked.

Anyways, the slides of our presentation are available here: paradoxical.io/slides/cassieq (keynote)

AngularJS for .Net developers

A few months ago I was asked to be a technical reviewer on a new Packt Publishing book called AngularJS for .Net developers. It mostly revolves around ServiceStack (not Web API) and building a full stack application with Angular. I actually really enjoyed reading it and thought it touched on a lot of great points that a serious developer needs to know about.

Unfortunately I think the book doesn’t do a very good job of explaining Angular in general. It’s certainly geared to the experienced developer who has worked with Angular and ServiceStack/C# REST before.

Still, if you are interested in using Angular as a .NET developer, it’s an informative and quick read!

Leveraging message passing to do currying in ruby

I’m not much of a Ruby guy, but I had the inkling to play with it this weekend. The first thing I do when I’m in a new language is try to map constructs that I’m familiar with. These range from basic stuff like object instantiation, singletons, and inheritance to more complicated paradigms like lambdas and currying.

I came across this blog post showing that Ruby has a way to auto-curry lambdas, which is actually pretty awesome. However, I was a little confused by the syntax:

a.send(fn, b)

I’m more used to ML style where you would do

fn a b 

So what is a.send doing?

Message passing

Ruby exposes its dynamic dispatch as a message passing mechanism (like Objective-C), so you can send “messages” to objects. It’s like being able to say “hey, execute this method (named by a symbol or a string) on this object”.

If you think of it that way, then a.send(fn, b) translates to “execute method ‘fn’ on the receiver a, with the argument b”. This means that a had better actually respond to fn.
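A quick way to see this in irb: send takes the method name as a symbol (or a string), followed by the arguments. The safe_send helper below is a hypothetical name, used only to show checking respond_to? before dispatching. Note that send can also reach private methods, while public_send cannot.

```ruby
# send dispatches a message (a method name) to a receiver at runtime.
product = 5.send(:*, 2)        # same as 5 * 2
shout   = "hi".send("upcase")  # a String method name works too

# Hypothetical helper: only dispatch if the receiver handles the message.
def safe_send(receiver, message, *args)
  receiver.respond_to?(message) ? receiver.public_send(message, *args) : nil
end

puts product                 # prints 10
puts shout                   # prints HI
p safe_send(5, :+, 1)        # prints 6
p safe_send(5, :no_such_fn)  # prints nil
```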

As an example, this curries the multiplication function:

apply_onContext = lambda do |fn, a, b|
  a.send(fn, b)
end

mult = apply_onContext.curry.(:*, 5)

puts mult.(2)

This prints out 10. First, a lambda is created that sends a message to the object ‘a’, asking it to execute the function * (represented as a symbol, Ruby’s interned string).

Then we can leverage the curry function to auto-curry the lambda for us, creating almost F#-style curried functions. The “.(” syntax is shorthand for .call, which executes a lambda.
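To make the currying explicit, here is the same lambda applied one argument at a time. It is renamed apply_on_context to follow Ruby’s usual snake_case. Each partial call returns another lambda until all three arguments have been supplied. Both .() and [] are just aliases for call.

```ruby
apply_on_context = lambda { |fn, a, b| a.send(fn, b) }

curried = apply_on_context.curry  # waits until it has all 3 arguments
times   = curried.(:*)            # 1 of 3 supplied: returns a lambda
times5  = times.(5)               # 2 of 3 supplied: returns a lambda
result  = times5.(2)              # all 3 supplied: evaluates 5.send(:*, 2)

puts result           # prints 10
puts times5.call(2)   # prints 10 (.() is sugar for .call)
puts times5[2]        # prints 10 (Proc#[] also calls the lambda)
```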

Once we understand message passing, we can construct other lambdas too:

class Test
  def add(x, y)
    x + y
  end

  def addOne
    apply_onClass = lambda do |fn, x, y|
      send(fn, x, y)
    end

    apply_onClass.curry.(:add, 1)
  end
end

puts Test.new.addOne.(4)

addOne returns a curried lambda that sends the :add message to the Test instance it was created in, so the last line prints 5.
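As an aside, Ruby 2.2 added Method#curry, so on newer versions the same thing can be written without the wrapper lambda. The Adder class below is a hypothetical stand-in for Test.

```ruby
class Adder
  def add(x, y)
    x + y
  end

  # Method#curry (Ruby 2.2+) curries a bound method directly, so the
  # receiver is captured by method(:add) instead of an explicit send.
  def add_one
    method(:add).curry.(1)
  end
end

puts Adder.new.add_one.(4)  # prints 5
```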

Getting rid of the dot

Ruby 1.9 doesn’t let you define what () does, so you are forced to call lambdas with the dot syntax. However, Ruby has other interesting features, such as letting you alias a method under another name. It’s like copying the original method to a new name, which leaves you free to redefine the old one.

You can do this to any method you have access to, so you get the benefits of method overriding without actually needing inheritance.
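Here is a small sketch of that aliasing trick with a hypothetical Greeter class. After reopening the class, alias_method keeps the original implementation reachable under a new name, while the old name is redefined to wrap it.

```ruby
class Greeter
  def greet(name)
    "Hello, #{name}"
  end
end

# Reopen the class: keep the original under a new name, then redefine greet.
class Greeter
  alias_method :plain_greet, :greet

  def greet(name)
    plain_greet(name) + "!"
  end
end

puts Greeter.new.greet("Ruby")        # prints Hello, Ruby!
puts Greeter.new.plain_greet("Ruby")  # prints Hello, Ruby
```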

Taking advantage of this, you can hook into method_missing on Object. Ruby invokes it when an object receives a “message” it has no method for, and the default implementation raises a NoMethodError. Intercepting that missing method and executing .call on the object (if it accepts that message) lets us fake the parentheses.

Here is a blog post that shows how to do it.

Obviously it sucks to lean on missing-method handling for this, but hey, still neat.
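The sketch below is not the linked post’s exact approach. It only shows the general idea of turning an unknown message into a .call. It uses method_missing on a small, hypothetical Functions registry, which keeps the hook scoped to one class rather than patching Object.

```ruby
class Functions
  def initialize
    @fns = {}
  end

  # Store any callable (lambda, curried lambda, Method) under a name.
  def register(name, fn)
    @fns[name] = fn
  end

  # Unknown messages land here; forward them to .call on the stored callable.
  def method_missing(name, *args, &block)
    fn = @fns[name]
    return super unless fn # no callable: fall back to the usual NoMethodError
    fn.call(*args, &block)
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    @fns.key?(name) || super
  end
end

f = Functions.new
f.register(:mult, ->(a, b) { a * b }.curry[5])
puts f.mult(2)             # prints 10, no dot needed
puts f.respond_to?(:mult)  # prints true
```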

Conclusion

Ruby is nowhere near as succinct as F#, where this is simply

let addOne = (+) 1

But learning new things about other languages is interesting :)
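For what it’s worth, the closest Ruby one-liner for the F# version grabs the + method off the receiver 1 as a Method object. That object already behaves like a partially applied function.

```ruby
add_one = 1.method(:+)  # a Method bound to receiver 1
puts add_one.(4)        # prints 5
```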

Machine Learning with disaster video posted

A few weeks ago we had our second DC F# meetup with speaker Phil Trelford, where he led a hands-on session introducing decision trees. The goal of the meetup was to see how good a predictor we could make of who would live and die on the Titanic. Kaggle has an excellent data set that shows age, sex, ticket price, cabin number, class, and a bunch of other useful features describing Titanic passengers.

Phil followed Mathias’ format and had an excellent .fsx script that walked everyone through it. I think the best predictor that someone made was close to 84% accurate, though it was surprisingly difficult to exceed that in the short time we had to work on it. I’d implemented my own Shannon-entropy-based ID3 decision tree in C#, so this wasn’t my first foray into decision trees, but the compactness of the tree in F# was great to see.

On top of that, Phil extended the tree to test not just features but also combinations of features, by having the tree accept functions describing features. This was cool and something I hadn’t thought of. By the end you had sort of built out a small DSL describing the feature relationships you were trying to test. I like it when a problem domain devolves into a series of small DSL-like functions!
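For anyone curious what the entropy part looks like, here is a rough Ruby sketch of the ID3 splitting step. It is not Phil’s F# code or the C# tree mentioned above. It computes the Shannon entropy of the labels and the information gain of splitting on a feature, then picks the feature with the highest gain. Following Phil’s idea, a feature is just a lambda from a row to a value, so a combination of columns is simply another lambda. The rows are made-up toy data, not the Kaggle set.

```ruby
# Shannon entropy (in bits) of a list of labels.
def entropy(labels)
  total = labels.size.to_f
  labels.group_by(&:itself).values.sum do |group|
    prob = group.size / total
    -prob * Math.log2(prob)
  end
end

# Information gain from splitting rows on a feature (a lambda row -> value).
def information_gain(rows, label, feature)
  remainder = rows.group_by(&feature).values.sum do |group|
    (group.size.to_f / rows.size) * entropy(group.map(&label))
  end
  entropy(rows.map(&label)) - remainder
end

# ID3 picks the feature with the highest information gain to split on.
def best_feature(rows, label, features)
  name, _fn = features.max_by { |_name, fn| information_gain(rows, label, fn) }
  name
end

survived = ->(row) { row[:survived] }
features = {
  sex:           ->(row) { row[:sex] },
  pclass:        ->(row) { row[:pclass] },
  sex_and_class: ->(row) { [row[:sex], row[:pclass]] } # a feature combination
}

# Made-up toy rows, purely for illustration.
rows = [
  { sex: "female", pclass: 1, survived: true },
  { sex: "female", pclass: 3, survived: true },
  { sex: "male",   pclass: 1, survived: true },
  { sex: "male",   pclass: 3, survived: false },
  { sex: "male",   pclass: 3, survived: false },
  { sex: "female", pclass: 3, survived: false }
]

puts best_feature(rows, survived, features) # prints sex_and_class
```

On this toy data the combined feature wins, which is the point of passing features as functions. Bear in mind that plain information gain always favors finer splits, so a real tree needs some guard against overfitting.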

If anyone is interested, Phil let us post all of his slides and information on our GitHub. Anyways, here is the video of the session!

Thinking about haskell functors in .net

I’ve been teaching myself Haskell lately and came across an interesting language feature called functors. Functors are a way of describing a transformation when you have a boxed container. They have a generic signature of

('a -> 'b) -> f 'a -> f 'b

Where f isn’t a “function”, it’s a type constructor (like a list or an option) that wraps values of type 'a.

The idea is you can write custom map functions for types that act as generic containers. Generic containers are things like lists, an option type, or other things that hold something. By itself a list is nothing; it has to be a list OF something. Not to get sidetracked too much, but many of these boxes are also monads, which is a related but stronger abstraction than functors.

Anyways, let’s do this in C# by assuming that we have a box type that holds something.


public class Box<T>
{
    public T Data { get; set; }   
}

var boxes = new List<Box<string>>();

IEnumerable<string> boxNames  = boxes.Select(box => box.Data);

We have a type Box and a list of boxes. Then we Select (or map) a box’s inner data into another list. We could extract the projection into a separate function too:

public string BoxString(Box<string> p)
{
    return p.Data;
}

The type signature of this function is

Box<string> -> string

But wouldn’t it be nice to be able to do work on a box’s data without having to explicitly project it out? For example, we could define things so that if you pass in a box and a function that works on a string, the data is automatically unboxed and the function is applied to it.

For example, something like this (which, of course, won’t compile):

public String AddExclamation(String input){
   return input + "!";
}

IEnumerable<Box<string>> boxes = new List<Box<string>>();

IEnumerable<string> boxStringsExclamation = boxes.Select(AddExclamation);

In C# we have to add the projection step ourselves (here as an overload of AddExclamation):

public String AddExclamation(Box<String> p){
   return AddExclamation(p.Data);
}

In F# you have to do basically the same thing:

type Box<'T> = { Data: 'T }

let boxes = List.init 10 (fun i -> { Data= i.ToString() })

let boxStrings = List.map (fun i -> i.Data) boxes

But in Haskell, you can define this projection as part of the type by making it an instance of the Functor type class. When you make a generic type an instance of Functor, you define how mapping works on the insides of that type.

data Box a = Data a deriving (Show)

instance Functor Box where
    fmap f (Data inside) = Data(f inside)    

main =
    print $ fmap (++"... your name!") (Data "my name")

This outputs

Data "my name... your name!"

Here I have a box that contains a value, and I define how the box behaves when someone maps over it. As long as the function’s input type matches the type of the box’s contents, the call to fmap type-checks.
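Ruby has no type classes, but the shape of the idea carries over. Here is a sketch with a hypothetical Box class whose fmap applies a block to the contents and re-boxes the result. Nothing enforces the types the way Haskell does, and that enforcement is exactly what the Functor instance buys you there. The last two checks are the functor laws (identity and composition) that any sensible fmap should satisfy.

```ruby
class Box
  attr_reader :data

  def initialize(data)
    @data = data
  end

  # fmap: apply the block to the contents and wrap the result in a new Box.
  def fmap
    Box.new(yield(@data))
  end

  def ==(other)
    other.is_a?(Box) && other.data == data
  end

  # Mimic the Haskell Show output for comparison.
  def to_s
    "Data #{data.inspect}"
  end
end

named = Box.new("my name").fmap { |s| s + "... your name!" }
puts named # prints Data "my name... your name!"

# A list of boxes: map over the list, fmap inside each box.
shouted = [Box.new("a"), Box.new("b")].map { |box| box.fmap(&:upcase) }

# Functor laws on a sample box.
box = Box.new(3)
inc = ->(x) { x + 1 }
dbl = ->(x) { x * 2 }
identity_holds    = box.fmap { |x| x } == box
composition_holds = box.fmap { |x| dbl.(inc.(x)) } == box.fmap(&inc).fmap(&dbl)
```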

ParsecClone on nuget

Today I published the first version of ParsecClone to NuGet. I blogged recently about creating my own parser combinator library, and it’s come along pretty well. FParsec is more performant and better optimized. Mine has other advantages, though: it can work on arbitrary consumption streams (like binary or bit level) and works directly on strings with regex instead of character by character. That said, I wouldn’t recommend using ParsecClone for production string parsing if you have big data sets. The string parsing isn’t streamed; it works directly on a string. That’s still on the todo list, though the binary parsing does work on streams.

Things included:

  • All your favorite Parsec-style operators: <|>, >>., .>>, |>>, etc. I won’t list them all since there are a lot.
  • String parsing. Match on full string terms, do regular expression parsing, inverted regular expressions, etc. I have a full working CSV parser written in ParsecClone.
  • Binary parsing. Do byte level parsing with endianness conversion for reading byte arrays, floats, ints, unsigned ints, longs, etc.
  • Bit-level parsing. Capture a byte array from the byte parsing stream and then reprocess it with bit-level parsing. Extract any bit, fold bits to numbers, or get a list of zeros and ones representing the bits you captured. This works for any size byte array (though converting to int will only work for captures of up to 32 bits).
  • The fun thing about ParsecClone is you can now parse anything you want as long as you create a streamable container. The combinator libraries don’t care what they are consuming, just that they are combining and consuming. This made it easy to support strings, bytes, and bits, all as separate consumption containers.

    Anyways, maybe someone will find it useful, as I don’t think there are any binary combinator libraries out there for F# other than this one. I’d love to get feedback if anyone does use it!
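The point about combinators not caring what they consume can be illustrated with a toy Ruby sketch. This is not ParsecClone’s actual F# API. Here a parser is a lambda that takes an input and a position and returns [value, new_position], or nil on failure. Because it only uses [] and size, the same combinators run over a String of characters or an Array of bytes. The method names alt, keep_right, keep_left, and map_result are hypothetical stand-ins for <|>, >>., .>>, and |>>.

```ruby
# Primitive: consume one item if it satisfies the predicate.
def satisfy(&pred)
  lambda do |input, pos|
    return nil if pos >= input.size
    item = input[pos]
    pred.call(item) ? [item, pos + 1] : nil
  end
end

# p1 <|> p2: try p1, and on failure try p2 at the same position.
def alt(p1, p2)
  ->(input, pos) { p1.(input, pos) || p2.(input, pos) }
end

# p1 >>. p2: run both in sequence, keep the right result.
def keep_right(p1, p2)
  lambda do |input, pos|
    r1 = p1.(input, pos)
    return nil unless r1
    p2.(input, r1[1])
  end
end

# p1 .>> p2: run both in sequence, keep the left result.
def keep_left(p1, p2)
  lambda do |input, pos|
    r1 = p1.(input, pos)
    return nil unless r1
    r2 = p2.(input, r1[1])
    return nil unless r2
    [r1[0], r2[1]]
  end
end

# p |>> fn: transform the parsed value.
def map_result(p, fn)
  lambda do |input, pos|
    r = p.(input, pos)
    r && [fn.(r[0]), r[1]]
  end
end

# Over a String of characters.
digit = map_result(satisfy { |c| c.match?(/\A\d\z/) }, ->(c) { c.to_i })
comma_then_digit = keep_right(satisfy { |c| c == "," }, digit)
p comma_then_digit.(",7", 0) # prints [7, 2]

# Same combinators over bytes: match 0xCA 0xFE, keep the first byte.
magic = keep_left(satisfy { |b| b == 0xCA }, satisfy { |b| b == 0xFE })
p magic.([0xCA, 0xFE, 0x01], 0) # prints [202, 2]
```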

Machine learning from disaster

If any of my readers are in the DC/MD/VA area, you should all come to the next DC F# meetup that I’m organizing on Monday, September 16th. The topic this time is machine learning from disaster, and we’ll get to find out who lives and dies on the Titanic! We’re bringing in guest speaker Phil Trelford, so you know it’s going to be awesome! Phil is in the DC area on his way to the F# Skills Matter conference in NYC a few days later. I won’t be there, but I expect it will be top notch since all the big F# players (such as Don Syme and Tomas Petricek) will be there!

For more info check out our meetup page.

F# and Machine learning Meetup in DC

As you may have figured out, I like F# and I like functional languages. At some point I tweeted to the F# community lamenting that there was a dearth of F# meetups in the DC area. Lo and behold, tons of people replied saying they’d be interested in forming one, and some notable speakers piped up and said they’d come and speak if I set something up.

So, if any of my readers live in the DC metro area, I’m organizing an F# meetup featuring Mathias Brandewinder. We’ll be doing a hands-on F# and machine learning coding dojo, which should be a whole buttload of fun. Here’s the official blurb:

Machine Learning is the art of writing programs that get better at performing a task as they gain experience, without being explicitly programmed to do so. Feed your program more data, and it will get smarter at handling new situations.

Some machine learning algorithms use fairly advanced math, but simple approaches can be surprisingly effective. In this Session, we’ll take a classic Machine Learning challenge from Kaggle.com, automatically recognizing hand-written digits (http://www.kaggle.com/c/digit-recognizer), and build a classifier, from scratch, using F#. So bring your laptop, and let’s see how smart we can make our machines!

This session will be organized as an interactive workshop. Come over, and learn yourself a Machine Learning and F# for great good! No prior experience with Machine Learning required, and F# beginners are very welcome – it will be a great opportunity to see F# in action, and why it’s awesome.

To get the most from the session please try and bring a laptop along with F# installed (ideally either MonoDevelop or Visual Studio Web Express/Full Edition).

Mathias Brandewinder has been writing software in C# for 7+ years, and loving every minute of it, except maybe for a few release days. He is an F# MVP, enjoys arguing about code and how to make it better, and gets very excited when discussing TDD or F#. His other professional interests are applied math and probability. If you want to know more about him, you can check out his blog at www.clear-lines.com/blog or find him on Twitter as @Brandewinder.

For more info go RSVP at meetup.com

QCon NYC 2013

If anyone is at QCon this year, come find me (I’m wearing an Adult Swim hoodie)! There won’t be a tech talk this week since I’m busy at the conference, but things will return to normal next week.

Thread Synchronization With Aspects

This article was originally published at tech.blinemedical.com

Aspect-oriented programming is an interesting way to decouple common method-level logic into localized methods that can be applied at build time. For C#, PostSharp is a great tool that does the heavy lifting of the MSIL rewrites, injecting itself in and around your methods based on method tagging with attributes. PostSharp’s offerings are split into free aspects and pro aspects. That makes diving into aspect-oriented programming easy, since you can get a lot done with the free offerings.

One of their free aspects, the method interception aspect, lets you control how a method gets invoked. Using this capability, my general idea was to expose some sort of lock and automatically wrap the method invocation in a lock statement using a shared object. This way, we can manage thread synchronization using aspects.

Managing thread synchronization with aspects isn’t a new idea: the PostSharp site already has an example of thread synchronization. However, they are using a pro feature aspect that allows them to auto-implement a new interface for tagged classes. For the purposes of my example, we can do the same thing without using the pro feature and simultaneously add a little extra functionality.

There are two things I wanted to accomplish. The first was to simplify local method locking (basically what the PostSharp example solves). The second was to facilitate locking of objects across multiple files and namespace boundaries. You can imagine a situation where you have two or more singletons that work on a shared resource. These objects need some sort of shared lock reference to synchronize on, which means you need to expose the synchronization object to all the classes. Not only does this tie classes together, but it can also get messy and error-prone as your application grows.

First, I’ve defined an interface that exposes a basic lock. Implementing the interface is optional as you’ll see later.

public interface IAspectLock
{
    object Lock { get; }
}

Next we have the actual aspect we’ll be tagging methods with.

using System;
using System.Collections.Generic;
using PostSharp.Aspects;

[Serializable]
public class Synchronize : MethodInterceptionAspect
{
    private static readonly object FlyweightLock = new object();

    private static readonly Dictionary<string, object> LocksByName = new Dictionary<string, object>();

    private String LockName { get; set; }

    /// <summary>
    /// Constructor when using a shared lock by name
    /// </summary>
    /// <param name="lockName"></param>
    public Synchronize(String lockName)
    {
        LockName = lockName;
    }

    /// <summary>
    /// Constructor for when an object implements IAspectLock
    /// </summary>
    public Synchronize()
    {

    }

    public override void OnInvoke(MethodInterceptionArgs args)
    {
        object locker;

        if (String.IsNullOrEmpty(LockName))
        {
            var aspectLockObject = args.Instance as IAspectLock;

            if (aspectLockObject != null)
            {
                locker = aspectLockObject.Lock;
            }
            else
            {
                throw new Exception(String.Format("Method {0} didn't define a lock name nor implement IAspectLock", args.Method.Name));
            }
        }
        else
        {
            lock (FlyweightLock)
            {
                if (!LocksByName.TryGetValue(LockName, out locker))
                {
                    locker = new object();
                    LocksByName[LockName] = locker;
                }
            }
        }

        lock (locker)
        {
            args.Proceed();
        }
    }
}

The attribute can either take a string representing the name of the global lock we want to use, or, if none is provided, we can test to see if the instance implements our special interface and use its lock. When an object implements IAspectLock the code path is simple: get the lock from the object and use it on the method.

The second code path, when you use a global lock name, lets you lock across the entire application without having to tie classes together, keeping things clean and decoupled.

For the scenario where a global lock name was defined, I used a static dictionary that maps each lock name to the object to lock on. This way I can maximize throughput by using a flyweight container: lock first on the dictionary just to get the lock I want, then lock on the value retrieved. Locking the dictionary is always fast and shouldn’t be contended for very often. Uncontended lock acquisition in .NET is very cheap, so this step is usually extremely quick. Once you have the actual lock you want to use for this method, you can call args.Proceed(), which actually invokes the tagged method.
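For comparison outside of PostSharp, here is a rough Ruby sketch of the same named-lock idea (a translation for illustration, not part of the original aspect). It keeps a registry hash of locks guarded by one registry lock. A hypothetical synchronize_method helper rewraps a method so each call runs while holding the named lock. It uses Monitor rather than Mutex because Monitor is reentrant, like C#’s lock. Unlike the C# aspect, it resolves the named lock once, when the method is wrapped.

```ruby
require "monitor"

module NamedLocks
  REGISTRY_LOCK = Mutex.new
  LOCKS_BY_NAME = {}

  # Flyweight lookup: briefly lock the registry, fetch or create the named lock.
  def self.lock_for(name)
    REGISTRY_LOCK.synchronize { LOCKS_BY_NAME[name] ||= Monitor.new }
  end

  # Class-level helper (via extend): wrap an instance method in the named lock.
  def synchronize_method(method_name, lock_name)
    original = instance_method(method_name)
    lock = NamedLocks.lock_for(lock_name)
    define_method(method_name) do |*args, &block|
      lock.synchronize { original.bind(self).call(*args, &block) }
    end
  end
end

class Counter
  extend NamedLocks
  attr_reader :value

  def initialize
    @value = 0
  end

  def increment_many(times)
    times.times { @value += 1 }
  end
  synchronize_method :increment_many, "counter"
end

counter = Counter.new
threads = Array.new(8) { Thread.new { counter.increment_many(10_000) } }
threads.each(&:join)
puts counter.value # prints 80000
```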

Just to be sure this all works, I wrote a unit test for the attribute. The test spawns 10,000 threads, each of which loops 100,000 times and increments the _syncTest integer. The idea is to introduce a race condition: given enough threads and enough work, some threads won’t see the updated value of the integer, so their increments get lost. For example, at some point two threads may both read _syncTest as 134 and both write back 135. If the method were synchronized, the value after those two increments would be 136. Since race conditions are timing-dependent, we want the unit test to be stressful to maximize the probability that this happens. Theoretically we could run this test and never hit the race, since by definition its results are non-deterministic. However, on my machine I was able to consistently reproduce the expected failure.

private int _syncTest = 0;
private const int ThreadCount = 10000;
private const int IterationCount = 100000;

[Test]
public void TestSynchro()
{
    var threads = new List<Thread>();
    for (int i = 0; i < ThreadCount; i++)
    {
        var thread = new Thread(SynchroMethod) { Name = "SyncTester" + i };
        thread.Start();
        threads.Add(thread);
    }

    threads.ForEach(t=>t.Join());

    Assert.True(_syncTest == ThreadCount * IterationCount,
            String.Format("Expected synchronized value to be {0} but was {1}", ThreadCount * IterationCount, _syncTest));
}

[Synchronize("SynchroMethodTest")]
private void SynchroMethod()
{
    for (int i = 0; i < IterationCount; i++)
    {
        _syncTest++;
    }
}

When the method doesn’t have the attribute, we get an NUnit failure like:

  Expected synchronized value to be 1000000000 but was 630198141
  Expected: True
  But was:  False

   at NUnit.Framework.Assert.That(Object actual, IResolveConstraint expression, String message, Object[] args)
   at NUnit.Framework.Assert.True(Boolean condition, String message)
   at AspectTests.TestSynchro() in AspectTests.cs: line 35

This shows that the race condition we expected did happen (the value changes on each run). When the method is synchronized, the test passes.