A few weeks ago we had our second DC F# meetup with speaker Phil Trelford where he led a hands on session introducing decision trees. The goal of meetup was to see how good of a predictor we could make of who would live and die on the titanic. Kaggle has an excellent data set that shows age, sex, ticket price, cabin number, class, and a bunch of other useful features describing Titanic passengers.
Phil followed Mathias‘ format and had an excellent .fsx script that walked everyone through it. I think the best predictor that someone made was close to 84%, though it was surprisingly difficult to exceed that in the short period of time that we had to work on it. I’d implemented my own shannon entropy based ID3 decision tree in C# so this wasn’t my first foray into decision tree’s, but the compactness of the tree in F# was great to see. On top of that Phil extended the tree to test not just features, but also combinations of features by having the tree accept functions describing features. This was cool and something I hadn’t thought of. By the end you sort of had built out a small DSL describing the feature relationships of what you were trying to test. I like it when a problem domain devolves into a series of small DSL like functions!
If anyone is interested Phil let us post all of his slides and information on our github. Anyways, here is the video of the session!