Coding Dojo: a gentle introduction to Machine Learning with F# review

Recently I organized an F# meetup in DC, and for our first event we brought in a wonderful speaker, Mathias Brandewinder, whose session was titled “Coding Dojo: a gentle introduction to Machine Learning with F#”.

I was certainly a little nervous about our first meetup, but a ton of great people came out: experienced F# users, people who had used other functional languages (like OCaml), and people with no functional experience at all. The goal of the meetup was to write a k-nearest neighbors classifier for a previously posted Kaggle exercise that classifies pixelated images of numbers.

Mathias introducing F#

Mathias did a great job of breaking people up into groups and then explaining what machine learning is and the criteria of the project in a surprisingly short time. I think people were a little scared of jumping in since he only talked for about 10 to 15 minutes, but in place of a long lecture Mathias had a really well put together guided document that encouraged users to play and interact with F#.

The first step was to create an F# project and to download his fsx gist. The gist was broken down into 7 steps where each step walked a user through the basics of F# and machine learning to build their classifier. For example, one step was how to execute lines in F# interactive. Another step was explaining the map function. Another step talked about how to read a file and parse a csv. And yet another discussed distance functions and converting raw data into records.

The meetup group

In the end, if you followed his steps, in a span of under 2 hours, even a novice could end up with a fully working classifier! The classifier’s accuracy, by default, was about 94.4%. Not too bad.

I wanted to share my version of his classifier, which is based on Mathias’ well-guided steps.

open System
open System.IO
 
type Number = { Label: string; Pixels: int[] }

let splitLine (line:String) = line.Split([|','|])

let extract file = File.ReadAllLines file |> Array.map splitLine

let strippedHeaders (arr:'a[]) = arr.[1..]

let convertToInt (str:string) = Convert.ToInt32 str

let lineToInt arr = Array.map convertToInt arr

let linesAsInts = Array.map lineToInt 

let toNum (line:int[]) = {Label = line.[0].ToString(); Pixels = line.[1..] }

let convertToNum lines = Array.map toNum lines

let dist (a:int) (b:int) = (a-b)*(a-b)

let arrayDist = Array.map2 dist
 
let totalDist a b = arrayDist a b |> Array.reduce (+)

let train file = 
    extract file 
        |> strippedHeaders
        |> linesAsInts
        |> convertToNum        

let kNNSet trainingSet pixels k =
    trainingSet 
        |> Array.map (fun i -> (i.Label, totalDist i.Pixels pixels)) 
        |> Array.sortBy (fun (label, dist) -> dist)
        |> fun sorted -> sorted.[0..(k - 1)]          
    
let classify trainingSet pixels k = 
    kNNSet trainingSet pixels k
        |> Array.toSeq
        |> Seq.groupBy (fun (label, dist) -> label)
        |> Seq.maxBy (fun (label, items) -> Seq.length items)
        |> fun (label, items) -> label

// Percentage of validation examples classified correctly, truncated to a whole number
let accuracy trainingSet validationSet k = 
    validationSet
        |> Array.map (fun i -> classify trainingSet i.Pixels k = i.Label)
        |> Array.map (fun correct -> if correct then 1 else 0)
        |> Array.sum 
        |> fun sum -> double sum / double (Array.length validationSet)
        |> fun acc -> int (acc * 100.0)
    
let training = train @"C:\Projects\Personal2\DcDojo\DcDojo\trainingsample.csv"
let validation = train @"C:\Projects\Personal2\DcDojo\DcDojo\validationsample.csv"                       

Had I written this without following his steps I probably would have inlined a lot of the simple helper functions, but I wanted to show how Mathias really brought the “start small, build big” mentality to the project. This is something that really works well in functional languages and I think all the meetup participants picked up on that.
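As a rough illustration of that inlined style, here is a minimal sketch of what a more compact classifier might look like. It is not the code from the dojo: the `sampleTraining` data is made up so the snippet runs without the CSV files, and `Array.truncate` and `Array.countBy` stand in for the slicing and grouping steps used above.

```fsharp
type Number = { Label: string; Pixels: int[] }

// Sum of squared per-pixel differences (same measure as totalDist above)
let totalDist (a: int[]) (b: int[]) =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum

// Take the k closest training examples and return the most common label
let classify (trainingSet: Number[]) (pixels: int[]) k =
    trainingSet
    |> Array.sortBy (fun n -> totalDist n.Pixels pixels)
    |> Array.truncate k
    |> Array.countBy (fun n -> n.Label)
    |> Array.maxBy snd
    |> fst

// Hypothetical four-pixel "images", only here to exercise the function
let sampleTraining =
    [| { Label = "0"; Pixels = [| 0; 0; 0; 0 |] }
       { Label = "0"; Pixels = [| 0; 10; 0; 0 |] }
       { Label = "1"; Pixels = [| 255; 255; 255; 255 |] }
       { Label = "1"; Pixels = [| 250; 255; 240; 255 |] } |]

let predicted = classify sampleTraining [| 5; 0; 3; 0 |] 3
printfn "%s" predicted   // prints 0
```

The compact form is shorter, but the small named helpers in the version above are easier to try one at a time in F# interactive, which is the point of the "start small, build big" approach.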

Another meetup participant (my coworker Sam) also posted his kNN classifier and worked through it with a side-by-side C# example, which was cool, so go check it out.

If you get a chance to see Mathias during his summer of F# tour, you should! While DC was on the tail end of the trip, Boston and Detroit are still on the agenda.


Edit:

Here is a YouTube video of a portion of the dojo:
