Takeaways from building a language in Rust

For the last month I've been working on Melody, an open source language that compiles to regular expressions, to learn more about building languages, compilers, and Rust. While I have worked on smaller projects in Rust in the past, this was my largest dive into the language to date, and my first attempt at creating a language. Here are a few things I've learned in the process, musings I had, and things I wanted to share, in no particular order:

ASTs make your life easier

I started Melody out by using a Logos lexer and iterating over each token it produced in a linear fashion. As someone new to language development, I knew about ASTs (abstract syntax trees) but hadn't much experience implementing them, especially in Rust, and thought that a linear mechanism with some external flags (e.g. "in a group") would be faster to implement and sufficient to target regular expression output for a POC. I was able to implement most of the basic features I wanted, but once I hit things like nested blocks the mechanism quickly became convoluted and limited what the language could support. I ended up rewriting the compiler using Pest and a proper AST; the result is a lot easier to follow and makes new features much easier to build. I may look into using Nom at some point in the future.
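To illustrate why a tree helps with nesting, here is a minimal sketch of a recursive AST that compiles to a regular expression. It is not Melody's actual implementation: the `Node` type, its variants, and the `compile`, `escape` and `sample_pattern` functions are all hypothetical names made up for this example, and it assumes non-capturing groups are an acceptable output.

```rust
// Hypothetical AST for a tiny regex-targeting language.
// These names are illustrative only and are not Melody's real types.
enum Node {
    Literal(String),
    Group(Vec<Node>),
    Repeat { count: usize, inner: Box<Node> },
}

// Recursively turn a node into regex source. Nesting needs no external
// "in a group" flags: each recursive call handles its own subtree.
fn compile(node: &Node) -> String {
    match node {
        Node::Literal(text) => escape(text),
        Node::Group(children) => {
            let body: String = children.iter().map(compile).collect();
            format!("(?:{})", body)
        }
        Node::Repeat { count, inner } => {
            let body = compile(inner);
            // A group is already delimited; anything else is wrapped so the
            // quantifier applies to the whole inner expression.
            match **inner {
                Node::Group(_) => format!("{}{{{}}}", body, count),
                _ => format!("(?:{}){{{}}}", body, count),
            }
        }
    }
}

// Escape characters that have a special meaning in regex syntax.
fn escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for ch in text.chars() {
        if r"\.+*?()|[]{}^$".contains(ch) {
            out.push('\\');
        }
        out.push(ch);
    }
    out
}

// Builds "repeat 3 times: a group containing `a.` and a nested group `b`".
fn sample_pattern() -> String {
    let ast = Node::Repeat {
        count: 3,
        inner: Box::new(Node::Group(vec![
            Node::Literal("a.".to_string()),
            Node::Group(vec![Node::Literal("b".to_string())]),
        ])),
    };
    compile(&ast)
}
```

With a flat token stream, the nested group inside the repeat is where the bookkeeping tends to get messy; with a tree, it is just another recursive call.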

"Marketing" open source is tricky

I shared Melody on /r/rust not long after I started working on the language, mainly to get feedback and possibly interest people in contributing. That post, along with shares by readers on /r/programming and Hacker News, accounts for the lion's share of the 3.3k stars Melody has. The response was far more than I expected at that stage, but it's also difficult to strike a balance between letting people know about updates to the project and not spamming subreddits / overdoing it in general.

Apple Silicon is mostly supported for Rust

I'm using a 14" M1 Pro MacBook Pro, and I've found that most things are supported at this point on Apple Silicon. There have been a few outliers like Tarpaulin, which only supports Linux at the moment, and Miri, which does not run on Apple Silicon, but you can make do with GitHub Actions for most of the cases where something will not work locally.

Rust tooling is a pleasure to use

Melody is the largest project I've built in Rust to date, and while working on it I've been exposed to a lot of new tools. I've been working on frontend projects for the last 3 years, and the experience of setting up tooling for a Rust project is night and day by comparison. Not to knock the frontend tooling community, they're doing a great job and there are a lot of quality tools in the frontend space, but things like linting, testing, fuzzing and benchmarking have a lot less friction in Rust and are an extremely positive experience.

Memory golfing is hard to resist

Working in Rust, both because memory and lifetimes are part of the syntax and because of the inherent performance of the language, tends to give you an itch to implement everything in the most performant way. I've been calling this "memory golfing", which is similar to code golfing, but rather than trying to implement something in as few characters as possible, you try to use as little memory as possible. Wanting to write performant code isn't a bad thing, and should be part of your consideration when working on a project, but it's important to start with something that works well and is readable, and to optimize where needed after benchmarking rather than rushing into arbitrary optimizations. As the adage says, "premature optimization is the root of all evil".

Should Results and Options collapse?

I'm used to TypeScript, which has the concept of a type guard. A type guard is a normal condition that checks for a variant of an ADT, e.g. if I have a variable that is either a number or a string and I'm inside a condition (or past an early return in such a condition) that verified that I'm dealing with a specific variant, the variable "collapses" into the variant (e.g. string) rather than remaining as the original type. Rust requires you to explicitly handle the Result or Option via match, if let, unwrap or the ? operator, but if you have a case as follows:

// optional_string is an Option<String>

if let None = optional_string {
    return;
}
    
// optional_string is still an Option here, although it has to be Some(String)

The compiler still considers the Option or Result to be "wrapped", even though it is proven to be a specific variant.

I'm not sure whether the TypeScript way is the "better" way in this instance, especially since Results and Options are monads and aren't purely type-level, but in some cases it would be convenient for it to automatically unwrap if it's proven to hold a value / not be an error.
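For comparison, the closest thing Rust offers is to bind the inner value at the point of the check rather than check first and unwrap later. The sketch below shows two ways to do that for the early-return case above: `let`-`else`, which I believe requires Rust 1.65 or newer, and a plain `match` that works on older toolchains. The function names and the returned strings are hypothetical and only exist for the example.

```rust
// Hypothetical function: early return on None, then use the inner String.
fn describe(optional_string: Option<String>) -> String {
    // let-else either binds the inner value or takes the diverging branch.
    let Some(string) = optional_string else {
        return String::from("nothing");
    };

    // `string` is a plain String here, no unwrap needed.
    format!("got {} bytes", string.len())
}

// The same early return written with match, for toolchains without let-else.
fn describe_with_match(optional_string: Option<String>) -> String {
    let string = match optional_string {
        Some(string) => string,
        None => return String::from("nothing"),
    };

    format!("got {} bytes", string.len())
}
```

This isn't the compiler narrowing the type for you, since the unwrapped value is a new binding, but it covers most cases where the "collapse" would have been convenient.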

There's no convention for crate naming

Apparently there's no set guideline for naming crates, which I found a bit surprising. Not much to say here, just a note.

Yoav Lavi
