June, 2013

Ordered Consumable

I had the need for a specific collection type where I would only ever process an element once, but be able to arbitrarily jump around and process different elements. Once a jump happened, the elements would be processed in circular order: continue to the end, then loop around to the beginning and process any remaining items.
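To make the circular, process-once behavior concrete, here is a minimal Python sketch. It is not the implementation from the post: the class name `OrderedConsumable` and the methods `jump_to` and `take_next` are hypothetical names chosen for illustration, and it assumes a fixed list of items known up front.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class OrderedConsumable(Generic[T]):
    """Hands out each item at most once, scanning circularly from a movable position."""

    def __init__(self, items: List[T]) -> None:
        self._items = list(items)
        self._consumed = [False] * len(self._items)
        self._position = 0  # index where the next scan starts

    def jump_to(self, index: int) -> None:
        """Move the scan position, e.g. when the user seeks."""
        if not 0 <= index < len(self._items):
            raise IndexError(index)
        self._position = index

    def take_next(self) -> Optional[T]:
        """Return the next unconsumed item at or after the position,
        wrapping around to the start; return None once everything is consumed."""
        count = len(self._items)
        for offset in range(count):
            i = (self._position + offset) % count
            if not self._consumed[i]:
                self._consumed[i] = True
                self._position = (i + 1) % count
                return self._items[i]
        return None


consumable = OrderedConsumable(list(range(6)))
first = consumable.take_next()       # 0
consumable.jump_to(4)                # simulate a seek
rest = [consumable.take_next() for _ in range(5)]
print(first, rest)                   # 0 [4, 5, 1, 2, 3]
```

After the jump, items 4 and 5 are processed first, then the scan wraps around and picks up 1, 2, and 3, skipping 0 because it was already consumed.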

The use case that prompted this: I have an image generator that creates snapshots from a video on demand. However, I need to be able to seek in the video and create snapshots starting at the point I seeked to. Once all the snapshots are created I don’t need to create them again; it’s one-time processing, but the snapshot generation has to follow the user’s actions. This also means that if a user seeks to a point in time where a snapshot was already generated, the snapshot generation doesn’t need … Read more


Threadpooling in netduino

Sometimes you want to do asynchronous work without holding up your current thread, but the work that needs to be done doesn’t really warrant the cost of spinning up a new thread (though I’m not sure what the exact cost is in an embedded environment).

This is where threadpooling comes into play. A threadpool has a fixed number of pre-started threads that you can reuse for actions. You push actions onto the threadpool, and when a thread is available it’ll run your action. While threadpools aren’t free (you still incur context switching and the initial overhead of firing up each thread), you can limit your context switches and minimize thread start/cleanup time by reusing threads. Threadpooling is a handy feature and C# has built-in support for it, but it’s not in the .NET Micro Framework, so I decided to write my own.
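The idea of pushing actions onto a queue that a fixed set of reusable threads drains can be sketched as follows. This is an illustrative Python sketch, not the Netduino code from the post; the names `SimpleThreadPool`, `queue_action`, and `shutdown` are assumptions made for the example.

```python
import queue
import threading
from typing import Callable, List, Optional


class SimpleThreadPool:
    """A fixed set of worker threads that run queued actions."""

    def __init__(self, worker_count: int = 3) -> None:
        self._actions: "queue.Queue[Optional[Callable[[], None]]]" = queue.Queue()
        self._workers: List[threading.Thread] = []
        for _ in range(worker_count):
            worker = threading.Thread(target=self._run, daemon=True)
            worker.start()  # threads are started once and then reused
            self._workers.append(worker)

    def queue_action(self, action: Callable[[], None]) -> None:
        """Enqueue an action; the next free worker picks it up."""
        self._actions.put(action)

    def _run(self) -> None:
        while True:
            action = self._actions.get()  # blocks until work arrives
            if action is None:            # sentinel: stop this worker
                return
            try:
                action()
            except Exception:
                pass  # a failing action must not kill the worker thread

    def shutdown(self) -> None:
        """Let queued actions finish, then stop and join every worker."""
        for _ in self._workers:
            self._actions.put(None)
        for worker in self._workers:
            worker.join()


pool = SimpleThreadPool(worker_count=2)
done = []
done_lock = threading.Lock()


def record(value: int) -> Callable[[], None]:
    def action() -> None:
        with done_lock:
            done.append(value)
    return action


for n in range(5):
    pool.queue_action(record(n))
pool.shutdown()
print(sorted(done))  # [0, 1, 2, 3, 4]
```

Because the queue is first-in, first-out, the stop sentinels are only reached after every queued action has been handed to a worker, so `shutdown` drains pending work before returning.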

For a basic threadpool manager it’s … Read more


QCon NYC 2013

If anyone is at QCon this year, come find me (I’m wearing an Adult Swim hoodie)! There won’t be a tech talk this week since I’m busy at the conference, but things will return to normal next week.… Read more

Automatic FogBugz triage with naive Bayes

At my work we use FogBugz as our bug tracker, and over our company’s lifetime we have accumulated tens of thousands of cases. I was thinking recently that this is an interesting repository of historical data and I wanted to see what I could do with it. What if I were able to predict, to some degree of accuracy, who a case would be assigned to based solely on the case title? What about area? Or priority? Being able to predict who a case gets assigned to could take a big time burden off the bug triager.

Thankfully, I’m reading “Machine Learning In Action” and came across the naive Bayes classifier, which seemed like a good fit for categorizing cases based on their titles. Naive Bayes is most famously used as part of spam filtering algorithms. The general idea is you train the classifier … Read more


Tech talk: B-Trees

Yesterday’s tech talk was on b-trees. B-trees are an interesting tree data structure used to minimize disk read access. Also, since they are self-balancing and optimized for sequential reads and inserts, they’re really good for file systems and databases. CouchDB, MongoDB, SQLite, SQL Server and other databases all use either a b-tree or a b+ tree for their data indexes, so it was interesting to discuss b-tree properties.

Disk I/O optimizations

A big part of the need for b-trees is disk I/O optimization. The need to optimize for disk reads comes from the fact that disk I/O is slow. Imagine you have to make thousands of reads off disk to try to find a certain piece of data. The rotational delay in a platter drive to read a certain disk block can be up to 20 milliseconds. If you have to read hundreds, or even thousands of … Read more


Working on a long term svn branch

I work on a reasonably small team and for the most part everyone works in trunk. But sometimes you need to switch over to a long-term feature branch, one that lives for more than a week or two and sometimes for months. The problem here is that your branch can easily diverge from trunk. If the intent is that the feature branch will eventually become the master (trunk), then you should merge trunk into the feature branch frequently. For me, this method has worked really well.

Merging often lets you pick up the fixes that land in trunk and manually resolve any conflicts as they come in. Since the feature branch is going to be the final thing (when the feature is done), svn needs to know how to deal with these conflicts. It’s much better to deal with them as they come in, rather than try to integrate a feature branch … Read more


Building an ID3 decision tree

After following Mathias Brandewinder’s series on converting the Python from “Machine Learning in Action” to F#, I decided I’d give the book a try myself. Brandewinder’s blog is great and he went through chapter by chapter working through F# conversions. If you followed his series, this won’t be anything new. Still, I decided to do the same thing as a way to solidify the concepts for myself, and in order to differentiate my posts I am reworking the Python code into C#. For the impatient, the full source is available on my GitHub.

This post will discuss the ID3 decision tree algorithm. ID3 is an algorithm that’s used to create a decision tree from a sample data set. Once you have the tree, you can then follow the branches of the tree until you reach a leaf and that will give you a classification for your sample.
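As a rough illustration of both halves (building the tree from samples, then walking it to a leaf), here is a compact Python sketch. It is not the C# code from the post: the function names, the tuple-based tree layout, and the toy data set are all assumptions made for this example. Each row holds feature values followed by the class label in the last column, and the split is chosen by highest information gain.

```python
import math
from collections import Counter


def entropy(rows):
    """Shannon entropy of the class labels (last column of each row)."""
    counts = Counter(row[-1] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def best_feature(rows, features):
    """Feature index whose split yields the highest information gain."""
    base = entropy(rows)

    def gain(index):
        remainder = 0.0
        for value in {row[index] for row in rows}:
            subset = [row for row in rows if row[index] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return base - remainder

    return max(features, key=gain)


def build_tree(rows, features):
    """Return a label (leaf) or a tuple (feature_index, {value: subtree})."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]                      # pure node: stop splitting
    if not features:
        return Counter(labels).most_common(1)[0][0]  # majority vote
    index = best_feature(rows, features)
    remaining = [f for f in features if f != index]
    branches = {}
    for value in {row[index] for row in rows}:
        subset = [row for row in rows if row[index] == value]
        branches[value] = build_tree(subset, remaining)
    return (index, branches)


def classify(tree, sample):
    """Follow branches until a leaf; raises KeyError for an unseen feature value."""
    while isinstance(tree, tuple):
        index, branches = tree
        tree = branches[sample[index]]
    return tree


# Hypothetical toy data: two binary features, label in the last column.
rows = [
    [1, 1, "yes"],
    [1, 1, "yes"],
    [1, 0, "no"],
    [0, 1, "no"],
    [0, 1, "no"],
]
tree = build_tree(rows, [0, 1])
print(classify(tree, [1, 1]))  # yes
print(classify(tree, [0, 1]))  # no
```

On this toy data the first feature gives the larger information gain, so it becomes the root. Samples with a first feature of 0 reach a "no" leaf immediately, while the others are split again on the second feature.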

For example, … Read more
