A few weekends ago I decided to give solving Sudoku a try. In case you aren’t familiar with Sudoku, here is what an unsolved board looks like
And here is a solved one
Sudoku, of size 3 is pretty easy. Make a snapshot of the board, pick a random open cell, find out what its available possibilities are and set it to a value. To figure out it’s possibilities you need get the cells “group”. This means all the values of the 3×3 cell it’s in, as well as all the values of the row that it’s in and the columns that it’s in.
Based on what is available, you can choose a number that isn’t taken, plop it in down, and then recursively repeat. If nothing is available, and the board isn’t empty, you messed up and the recursion will backtrack.
After following Mathias Brandewinder’s series on converting the python from “Machine Learning in Action” to F#, I decided I’d give the book a try myself. Brandewinder’s blog is great and he went through chapter by chapter working through F# conversions. If you followed his series, this won’t be anything new. Still, I decided to do the same thing as a way to solidify the concepts for myself, and in order to differentiate my posts I am reworking the python code into C#. For the impatient, the full source is available at my github.
This post will discuss the ID3 decision tree algorithm. ID3 is an algorithm that’s used to create a decision tree from a sample data set. Once you have the tree, you can then follow the branches of the tree until you reach a leaf and that will give you a classification for your sample.
I was recently asked to write some code to find the largest contiguous group of synonymous elements in a two dimensional array. The idea is that you want to find the largest “land mass” in a problem where you have a game board that looks something like
L L L W
W L L W
L W W W
W L L W
Where L stands for land, and W stands for water. In this example, the largest land mass would be of size 5. But there are also 2 other land masses, one of size one, and another of size two. Elements can be contiguous only if their direct adjacent neighbor is the same type, so diagonals don’t count.
In general, you can think of the largest mass problem as almost exactly the same as the flood fill problem in image graphics. Except with flood fill, you are given … Read more
Recently we had a discussion on clustering techniques at one of our weekly tech talks and the k-means clustering algorithm came up. K-means is considered one of the simplest unsupervised machine learning techniques and I thought it would be cool to try my hand at an F# implementation of it.
In short, k-means is a way to put data into groups based on distance between nearest neighbors. Technically it works with any dimension but for this post I’ll stick with 1-d data to make things easy.
Imagine you have some 1-dimensional data like
1, 2, 5, 14, 17, 19, 20
And you want to group this into two groups. Pretty easily you can see that 1, 2, and 5 go into one group, and 14, 17, 19, 20 should go in another group. But why? For this small data set we … Read more