At work we use FogBugz as our bug tracker, and over our company’s lifetime we have accumulated tens of thousands of cases. I was thinking recently that this is an interesting repository of historical data, and I wanted to see what I could do with it. What if I were able to predict, to some degree of accuracy, who a case would be assigned to based solely on the case title? What about area? Or priority? Being able to predict who a case gets assigned to could take a big time burden off the bug triager.
Thankfully, I’m reading “Machine Learning in Action” and came across the naive Bayes classifier, which seemed a good fit for categorizing cases based on their titles. Naive Bayes is most famously used as part of spam filtering algorithms. The general idea is you train the classifier … Read more
After following Mathias Brandewinder’s series on converting the Python from “Machine Learning in Action” to F#, I decided I’d give the book a try myself. Brandewinder’s blog is great, and he went chapter by chapter working through F# conversions. If you followed his series, this won’t be anything new. Still, I decided to do the same thing as a way to solidify the concepts for myself, and to differentiate my posts I am reworking the Python code into C#. For the impatient, the full source is available at my GitHub.
This post will discuss the ID3 decision tree algorithm. ID3 is an algorithm that’s used to create a decision tree from a sample data set. Once you have the tree, you can then follow the branches of the tree until you reach a leaf and that will give you a classification for your sample.
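ID3 picks the feature to branch on by information gain, which is the drop in Shannon entropy of the class labels after a split. As a minimal sketch of that one step, assuming each row is a string array whose last element is the class label (the names `Id3Sketch`, `Entropy`, and `InformationGain` are illustrative, not taken from the book or the full source):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Id3Sketch
{
    // Shannon entropy (in bits) of a collection of class labels.
    public static double Entropy(IEnumerable<string> labels)
    {
        var list = labels.ToList();
        if (list.Count == 0) return 0.0;

        return list.GroupBy(label => label)
                   .Select(group => (double)group.Count() / list.Count)
                   .Sum(p => -p * Math.Log(p, 2));
    }

    // Information gain from splitting the rows on the feature at featureIndex.
    // Assumption: the last element of every row is its class label.
    public static double InformationGain(IReadOnlyList<string[]> rows, int featureIndex)
    {
        double baseEntropy = Entropy(rows.Select(r => r[r.Length - 1]));

        // Weighted entropy of the subsets produced by the split.
        double remainder = rows
            .GroupBy(r => r[featureIndex])
            .Sum(g => (double)g.Count() / rows.Count
                      * Entropy(g.Select(r => r[r.Length - 1])));

        return baseEntropy - remainder;
    }
}
```

A tree builder would compute the gain for every remaining feature, branch on the largest one, and recurse into each subset until the labels in a subset are all the same.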
For example, … Read more
I was trying to read a binary file created by a native app using the C# BinaryReader class, but kept getting weird numbers. When I checked the hex in Visual Studio I saw that the bytes were backwards from what I expected, indicating endianness issues. This threw me for a loop, since I was writing the file from C++ on the same machine where I was reading it in C#. I also wasn’t sending any data over the network, so I was a little confused. Endianness is usually an issue across machine architectures or over the network.
It turned out that I ran into an endianness problem by writing values byte by byte instead of writing them using the object’s actual data type. Let me demonstrate the issue.
What happens if I write 65297 (0xFF11) using C++?
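To make the byte order concrete, here is a small C# sketch of how this kind of mismatch can arise on a single machine. It assumes the writer emitted the most significant byte first; `EndianSketch`, `BigEndianBytes`, and `ReadWithBinaryReader` are hypothetical helpers for illustration, not code from the native app. BinaryReader always decodes integers as little-endian, so bytes laid out by hand in the other order come back scrambled.

```csharp
using System;
using System.IO;

public static class EndianSketch
{
    // Lays out an int by hand, most significant byte first (big-endian order).
    public static byte[] BigEndianBytes(int value) => new[]
    {
        (byte)(value >> 24),
        (byte)(value >> 16),
        (byte)(value >> 8),
        (byte)value
    };

    // Reads four bytes back with BinaryReader, which decodes little-endian.
    public static int ReadWithBinaryReader(byte[] bytes)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            return reader.ReadInt32();
        }
    }
}
```

With this sketch, `BigEndianBytes(0xFF11)` produces the bytes 00 00 FF 11, and reading those back through `ReadWithBinaryReader` gives 0x11FF0000 rather than 0xFF11. Reversing the array before reading restores the original value.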
int _tmain(int argc, _TCHAR* argv[])
… Read more
Every software developer has at some point heard the adage:
If you have a problem and you think you can solve it with [threads|pointers|regex|etc], now you have two problems.
For me, I’ve always told it with regex (and I think that’s the official way to do it). It’s not that threads and pointers aren’t hard, but with proper stylistic choices and experience they can be manageable and simple to debug. Regexes, though, have a tendency to spiral out of control. What starts as something simple always bloats into an enormously difficult-to-read haze of PERLgasms.
For example, I frequently wonder why, in the 21st century, we still deal with a syntax like this:
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t]
\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(
\t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\0
Even the most seasoned engineers couldn’t tell me what … Read more
I ran into a neat C# memory leak today that I wanted to share. It’s not often you get a clear, undeniable leak in C#, so I really had fun figuring this one out.
Look at this and see if you can spot the leak:
public static class Extensions
{
    public static Image Append(this Image source, Image append)
    {
        var newImage = new Bitmap(source.Width + append.Width, source.Height);
        using (var g = Graphics.FromImage(newImage))
        {
            g.DrawImage(source, 0, 0);
            g.DrawImage(append, source.Width, 0);
        }
        return newImage;
    }
}
private static void Main(string[] args)
{
    var src = @"C:\users\anton\desktop\bigImage.jpg";
    var images = Enumerable.Repeat(Image.FromFile(src), 25).ToList();
    var appendedImage = images.Aggregate((acc, i) => acc.Append(i));
    foreach (var image in images)
        image.Dispose();
}
What this code does is create 25 instances of my bigImage.jpg (6.5MB), and then creates a new image consisting of those 25 images side by side. The aggregate function folds the list … Read more
Today, at 1pm EST, the venerable Jon Skeet gave a GoToMeeting webinar sponsored by JetBrains, reviewing weird and cool stuff about C# and ReSharper. For those who aren’t in the know, ReSharper is a static analysis tool for C# that is pretty much the best thing ever. Skeet’s a great speaker, and my entire team at work and I watched the webinar in our conference room while eating lunch.
I took some notes and wanted to share some of the interesting things that Jon mentioned. You can watch the video here. It’s an hour long and definitely worth viewing.
Skeet talked about how ReSharper, and in fact the C# compiler, let you do weird stuff like this:
public class SuperContainer<T> { }
public class Container<T> : SuperContainer<Container<Container<T>>> { }
This lends itself to recursive parameterization, yet it compiles just fine. However, even if … Read more
Locking is a necessary aspect of multithreaded code: it prevents unpredictable behavior and makes sure code that is expected to run synchronously does so. Some situations can leverage lock-free code, but not all. When you do need a lock, you shouldn’t take it carelessly. If you lock a section of code that does some major work (such as database access) and it blocks other pending calls, you need to be cognizant that there could be a delay or bottleneck. However, just because we have to lock doesn’t mean we can’t do some simple optimizations depending on our business logic. If we only need to lock items per defined group, then we can leverage flyweight locking. Let’s go through an example to make this scenario clearer.
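One way to picture locking per group instead of globally is a table that hands out a single shared lock object per key. The following is a minimal sketch under that assumption, not necessarily the design this post arrives at; `KeyedLocks`, `For`, and `RunExclusive` are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class KeyedLocks<TKey>
{
    // One lock object per key; callers using the same key share the same object.
    private readonly ConcurrentDictionary<TKey, object> _locks =
        new ConcurrentDictionary<TKey, object>();

    // Returns the lock object for a key, creating it on first use.
    public object For(TKey key) => _locks.GetOrAdd(key, _ => new object());

    // Runs the work while holding only the lock that belongs to this key,
    // so work for other keys is not blocked.
    public void RunExclusive(TKey key, Action work)
    {
        lock (For(key))
        {
            work();
        }
    }
}
```

Two calls with the same key serialize, while calls with different keys proceed in parallel. Note that this sketch never removes entries, so the table grows with the number of distinct keys; a real implementation would need some eviction policy if the key space is large.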
Imagine we have a WCF service that signs a student into a class where the student has a name, an … Read more
When it is OK to abort a thread is a question that comes up every so often. Usually everyone jumps on the bandwagon that you should never, ever abort a thread, but I don’t agree. Certainly there are times when it’s valid, and if you understand what you are doing then it’s OK to use.
The reasoning behind never using thread abort is that calling abort on a thread raises an asynchronous exception, meaning that exceptions could happen where you think there never should be any, such as in dispose methods or finally blocks. This post describes what happens with thread abort and I found it to be a good read.
But I still don’t think you should never use thread abort. The big issue is what if you don’t have access to the code that is running in the thread? If a third-party library is blocking your app … Read more
This next section I had a lot of fun with, and originally I didn’t plan on implementing it at all. The only reason I did it is because I had a stroke of genius while in the shower one morning. Today, I’m going to talk about how I supported partial functions in my toy programming language.
First let’s look at what a partial function looks like in my language. I took an F# approach where calling a function with fewer arguments than it declares produces a new function (although F# functions are curried by default and mine are not). For example, in ML type notation you could have a function of type
'a -> 'b -> 'c
This means that it takes something of type 'a and something of type 'b as arguments, and returns a 'c. If we pass this function only an 'a then it’ll return … Read more
In an earlier post I gave a brief overview of the scope builder and its jobs. There I mentioned that supporting forward references required some extra work. In this post I’ll talk more about how I solved forward references.
Here is what I mean by forward references.
func is declared after it’s being referenced
string item = func();
If we iterate over the program only once from the top down using our visitor-pattern-based scope builder, then when we try to resolve the func method invocation symbol we’ll get an error (it hasn’t been defined yet).
Remember that when things are declared (such as methods, classes, or variables) we create a symbol (with a type) in the current scope tree. Later, when we are referencing them, we need to resolve that symbol. Resolution both validates that we can properly see the symbol and … Read more