language implementation

Locale parser with fparsec

Localizing an application consists of extracting out user directed text and managing it outside of hardcoded strings in your code. This lets you tweak strings without having to recompile, and if done properly, allows you to support multiple languages. Localizing is no easy task, it messes up spacing, formatting, name/date other cultural information, but thats a separate issue. The crux of localizing is text.

But, who just uses bare text to display things to the user? Usually you want to have text be a little dynamic. Something like

Hello {user}! Welcome!

Here, user will be some sort of dynamic property. To support this, your locale files need a way to handle arguments.

One way of storing contents in a locale file is like this:

ExampleText = Some Text {argName:argType} other text etc
            = This is on a seperate newline
UserLoginText = ... 

This consists of an identifier, followed by an … Read more

, , ,

Implementing partial functions

This next section I had a lot of fun with, and originally I didn’t plan on implementing it at all. The only reason I did it is because I had a stroke of genius while in the shower one morning. Today, I’m going to talk about how I supported partial functions in my toy programming language.

First let’s look at what a partial function looks like in my language. I took an F# approach where any function whose argument count is less than the declared count becomes a new function (even though F# functions are curried by default but mine are not). For example, in ML type notation you could have a function of type

'a -> 'b -> 'c 

Which means that it takes something of type a and type b as arguments, and returns a type c. If we pass this function only a 'a then it’ll return … Read more


Adding static typing and scope references, part 3: solving forward references

In an earlier post I gave a brief overview of the scope builder and its jobs. There I mentioned that supporting forward references required some extra work. In this post I’ll talk more about how I solved forward references.

Here is what I mean by forward references. func is declared after it’s being referenced

string item = func();

string func(){
    return "yes";

print item;

If we iterate over the program only once from the top down using our visitor pattern based scope builder, when we try and resolve the func method invocation symbol we’ll get an error (it hasn’t been defined yet).

Remember that when things are declared (such as methods, classes, or variables) we create a symbol (with a type) in the current scope tree. Later, when we are referencing them, we need to resolve that symbol. Resolution both validates that we can properly see the symbol and … Read more


Adding static typing and scope validation, part 2: type inference and validation

This post continues my series describing how I solved certain problems while creating a toy programming language. Today I’ll discuss static typing and type inference.

Tracking types

To have static typing, each syntax tree node needs to track what kind of type it is. Integers are integers, words are resolved user defined types, quoted strings are strings. But terminals are not the only nodes with types. Each syntax trees type is derived from its terminals. For example the expression syntax tree that represents the following:

(1 + 2) == 5

Is actually this:

    /    \
   +      5
 /   \
1     2  

We have a tree that has an expression on the left, and a literal on the right. We need to know what the expression on the lefts type is before we can do anything. Since the scope builder is depth first and each syntax tree’s type is … Read more


Adding static typing and scope validation into the language, part 1

Continuing on my series discussing the language I wrote, this next post is going to talk about the basics of static typing and scope rules. So far my language implementation follows very closely to Parr’s examples in his book Language Implementation Patterns, which is what gave me the inspiration to do this project.

However, starting here, I began to get confused with Parr’s examples. I didn’t really see how to translate his examples into real code. On top of that some of his diagrams weren’t matching my mental model of what was going on, so that didn’t help either. In general, I understood what he was doing, but I didn’t want to follow his examples anymore – Parr switched over to using ANTLR for everything which didn’t help me: I wanted to know how to do this without a generator. So I just jumped in and took a stab … Read more


A handrolled language parser

In my previous post about building a custom lexer I mentioned that, for educational purposes, I created a simple toy programming language (still unnamed). There, I talked about building a tokenizer and lexer from scratch. In this post I’ll discuss building a parser that is responsible for generating an abstract syntax tree (AST) for my language. This syntax tree can then be passed to other language components such as a scope and type resolver, and finally an interpreter.

The parser I made is a recursive descent packrat parser that uses backtracking. Short of memoizing found AST, there aren’t any other real optimizations. The goal was to create a working parser, not a production parser to distribute or use (or reuse) in any professional sense. Like the lexer, this is an academic exercise to try and hit on some of the points covered by Terence Parr’s Language Implementation Patterns book that … Read more

, , , , ,

Building a custom lexer

As a software engineer I spend all day (hopefully) writing code. I love code and I love that there are languages that help me solve problems and create solutions. But as an engineer I always want to know more about the tools I work with so I recently picked up “Language Implementation Patterns” by Terence Parr and decided I was going to learn how to build a language. After reading through most of the book and working on examples for about 5 weeks I ended up building an interpreted toy general purpose language that has features like:

  • type inference
  • partial functions
  • static typing
  • classes
  • first class methods

The language I wrote is by no means production level. It’s a toy, and it can do toy things. It can’t do IO, it can’t talk to the OS, it doesn’t do GC, but it does evaluate expressions, call methods, basic … Read more

, ,