Monkey in C++
A C++ implementation of the Monkey programming language from Writing an Interpreter in Go, built as a tree-walking interpreter with a lexer, Pratt parser, explicit AST, evaluator, small REPL, and Catch2-based test suite.
Overview
This project follows the general path of the Monkey interpreter from Writing an Interpreter in Go, but the implementation is written in modern C++ and goes further into the runtime side of the interpreter than the tutorial baseline.
At a high level it is a tree-walking interpreter with a lexer, a Pratt parser, an explicit AST, an evaluator that walks that AST, a small REPL, and a Catch2-based test suite. The familiar educational shape is still there, but the repository also puts significant effort into how symbols are resolved, how values are represented at runtime, which allocations can remain temporary, and how profiling reshapes the design once the implementation grows beyond the smallest examples.
It is still an interpreter rather than a compiler or a VM, so the core model remains direct and readable. That is part of what I like about it. The code keeps the language pipeline visible from lexing all the way to evaluation, while still making room for performance-oriented changes that matter once recursion, closures, arrays, hashes, and repeated evaluation start stressing the runtime.
Language and interpreter structure
Monkey is a small dynamically typed language, and this implementation covers the language features you would expect from that kind of project: integers and booleans, prefix and infix expressions, if / else, let, and return, along with functions, closures, strings, arrays, hashes, indexing, and builtin operations such as len, first, last, rest, and push. That language surface is broad enough that the project quickly stops being only about recognizing syntax. Once closures, arrays, hashes, and recursive calls are involved, the runtime representation and evaluation path start to matter just as much as the parser.
The parser is a Pratt parser, so expression parsing is driven by token precedence and small prefix and infix parse functions instead of a large grammar table. Source text is tokenized by the lexer, parsed into an AST, passed through a resolver that assigns indexed references for globals, locals, and builtins, and then evaluated by recursively walking the tree. That overall pipeline stays simple enough to follow while still showing the full shape of an interpreter.
1 + 2 * 3
add(1, 2, 3)
array[1 + 1]
The AST is important because it preserves meaning that would be awkward to recover from raw text. An expression like 1 + 2 * 3 is not just a string of tokens. Once parsed, it becomes a tree where precedence is explicit, so evaluation can operate on semantics rather than on surface order. That is why a tree-walking interpreter can stay conceptually straightforward while still handling nontrivial expressions correctly.
Examples from the language
The easiest way to understand the scope of the implementation is to look at the kinds of programs it already handles. Simple bindings and arithmetic work as expected, comparisons and conditionals behave normally, and the structure is expressive enough to support user-defined functions and closures. That moves the interpreter past toy arithmetic fairly quickly and into the territory where environments, captured values, and function application become real concerns for the runtime.
let a = 5;
let b = a * 2;
b;
1 + 2 * 3;
10 / 2;
1 < 2;
1 == 1;
true != false;
if (1 < 2) {
10
} else {
20
}
let newAdder = fn(x) {
fn(y) { x + y; };
};
let addTwo = newAdder(2);
addTwo(3);
The language also includes strings, arrays, hashes, and a small set of builtins, which helps exercise more of the evaluator and object model than a purely arithmetic language would. Concatenating strings, indexing into arrays, extending them with push, or accessing hash values with different key types all stress different parts of the interpreter and make the implementation more representative of a real dynamic language runtime.
"Hello" + " " + "World!";
len("hello");
let xs = [1, 2, 3];
xs[1];
push(xs, 4);
rest(xs);
let m = {
"one": 1,
"two": 2,
true: 3
};
m["one"];
m[true];
Example program
Recursive Fibonacci is a useful example here because it does two jobs at once. It demonstrates that functions, branching, recursion, integer arithmetic, and return values all work together end to end, and it also provides a consistent benchmark that puts visible pressure on the evaluator. A small interpreter can look complete when it only runs simple expressions, but a recursive program immediately exposes where the hot paths and allocation costs actually live.
let fibonacci = fn(x) {
if (x == 0) {
0
} else {
if (x == 1) {
1
} else {
fibonacci(x - 1) + fibonacci(x - 2);
}
}
};
fibonacci(10);
Runtime and performance work
This is still an AST interpreter, not a compiler or VM, so there is a natural ceiling to how fast it can get. Even so, the runtime side of the project is one of the most interesting parts, because the implementation was repeatedly profiled and reworked instead of being left at the first correct version. Local fibonacci(33) measurements currently land around ~1.3s - 1.6s wall-clock with peak memory around ~1.6MB - 1.8MB. Those numbers come from local runs rather than a formal benchmark suite, but they still show how much the runtime changed over time.
The profiling history is useful because it shows how the bottleneck moved as each layer improved. Early versions spent a lot of time in evaluator dispatch and RTTI-style type checks. After that was improved, runtime name lookup became more visible. Once lookup was less expensive, allocation and ownership churn became the dominant problem. After the Value and arena refactor, the next problem was not general heap cost so much as recursive call frames being kept alive longer than necessary. After temporary call-frame separation, what remained was mostly the expected cost of recursive AST evaluation, plus smaller overhead around argument handling and the rest of the interpreter machinery.
Several implementation changes sit underneath that progression. AST nodes carry an explicit NodeType, which lets the evaluator dispatch by node kind instead of leaning on heavier runtime type paths. Name resolution is moved partly out of the hot path by assigning indexed symbols for globals, locals, and builtins ahead of time, which reduces string-based environment work during evaluation and makes repeated recursive calls less expensive. The runtime value model is also deliberately split: simple values such as int, bool, and null use immediate representations, while composite values remain heap-backed only where that is actually needed.
Lifetime management is where the project becomes particularly interesting. Temporary composite values created during evaluation go through a ScratchArena, while values that genuinely need to outlive the current scratch lifetime live in a PersistentStore. The transition between those two worlds is explicit through operations like promote(Value, PersistentStore&) and detach(Value), which makes it much clearer when a value is escaping and when it should remain temporary. That explicit separation became especially important for recursive execution, where normal calls now use temporary stack-like Environment frames and persistent storage is reserved for closures or returned values that actually need to survive beyond the current evaluation context.
There are also smaller but still meaningful changes around the edges. Common cases with zero, one, or two arguments avoid allocating a std::vector<Value>, which helps with benchmark shapes like Fibonacci where tiny calls happen over and over again.
On the build side, the repository exposes several CMake presets, including release, release-lto, and release-native-lto, so the optimized measurements can be read together with the exact build configuration that produced them.
At this point the main remaining bottleneck is less about one missed allocation trick and more about the nature of the interpreter itself: recursive AST evaluation, repeated tree-walking through eval_node, and the ordinary overhead of an interpreter handling calls, branching, and expressions directly. A substantially larger speedup would probably require moving toward compilation or a VM rather than continuing to reshape the AST evaluator alone.
Build, test, and run
The repository exposes the workflow through CMake presets, which keeps the available build modes explicit instead of hiding them behind ad hoc shell commands. The ordinary development path uses the debug preset, while progressively stronger optimized builds are available through release, release-lto, and release-native-lto.
cmake --preset debug
cmake --build --preset debug
cmake --preset release
cmake --build --preset release
cmake --preset release-lto
cmake --build --preset release-lto
cmake --preset release-native-lto
cmake --build --preset release-native-lto
Tests can be run against either the debug or optimized configurations, and the project also includes a small REPL plus a dedicated benchmark target for the recursive Fibonacci workload that was used during profiling.
ctest --preset debug --output-on-failure
ctest --preset release-native-lto --output-on-failure
./build/debug/monkey_repl
./build/release-native-lto/monkey_repl
>> let add = fn(x, y) { x + y; };
>> add(2, 3);
5
./build/release-native-lto/monkey_performance_tests "Performance fibonacci(33)"
Background
The project follows the spirit and structure of Writing an Interpreter in Go, so the overall shape is familiar if you know the book: tokens, lexer, Pratt parser, AST, evaluator, environments, and closures. What changes here is the implementation language and the amount of runtime work layered on top of the core interpreter design. That combination makes it useful both as a language implementation exercise and as a closer look at how apparently small runtime choices start to matter once an interpreter begins running more demanding programs.