← Back to blog
cpplox

cpplox

January 16, 2022

Early in my time at the Recurse Center I (re-)familiarized myself with C++ by reading A Tour of C++ by Bjarne Stroustrup. Since I am someone who thought that The Crying of Lot 49 was a lot of fun and is constantly looking for new things to learn, it took me only one encounter with function signatures like const int* const f(const std::vector<int>& x) const and move semantics to get me hooked.

So when I later started a study group to work through Robert Nystrom's book Crafting Interpreters I thought "why not do it in C++?". So I did. For a while.

The short version of why I switched to Java for the implementation of Lox is:

'virtual' cannot be specified on member function templates

This post is the long version. It has several parts:

  • dynamic and static dispatch
  • the visitor pattern
  • why C++ can't combine them
  • what to do about it

Dispatch

First, a riddle: here is the (at first glance) exact same code, once in Java and once in C++. Can you predict what will be printed in each case?

class Shape {
  String description() {
      return "Hi, I'm a Shape!";
  }
}

class Circle extends Shape {
    String description() {
        return "Hi, I'm a Circle!";
    }
}

public class Main {
    static void describe(Shape s) {
        System.out.println(s.description());
    }

    public static void main(String[] args) {
        var circle = new Circle();
        describe(circle);
    }
}
class Shape {
public:
    std::string description() {
        return "Hi, I'm a Shape!";
    }
};

class Circle : public Shape {
public:
    std::string description() {
        return "Hi, I'm a Circle!";
    }
};

void describe(Shape& s) {
    std::cout << s.description() << std::endl;
}

int main(int argc, char** argv) {
    auto circle = Circle{};
    describe(circle);
}

Did you get it right? Java prints "Hi, I'm a Circle", because it has a mechanism called dynamic dispatch. This allows the language to resolve which method to actually call during runtime.

C++ prints "Hi, I'm a Shape" This happens, because in C++ the default mechanism is static dispatch, so the implementation that will be called is selected at compile-time.

In order to get dynamic dispatch in C++, you must declare a function virtual. Go ahead, try it. By simply adding the virtual keyword to the signature of Shape::description you can get the same result as in Java: "Hi, I'm a Circle!"

That was easy. So what's the problem?

The Visitor Pattern

I wrote a whole separate article about what the visitor pattern is and what problem it solves here. The short version is: Rather than adding every new piece of functionality in a model class itself, you add an accept method that accepts a visitor and a visitor that implements the desired new functionality for all relevant model classes.

Imagine you're writing an interpreter and you have a couple of classes with an inheritance relationship to model the elements of your grammar. A very simple one could look like this:

abstract class Expr {
    abstract <T> T accept(ExprVisitor<T> visitor);
}

class Number extends Expr {
    double value;

    <T> T accept(ExprVisitor<T> visitor) {
        return visitor.visitNumber(this);
    }
}

class Binary extends Expr {
    Expr left;
    String operator;
    Expr right;

    <T> T accept(ExprVisitor<T> visitor) {
        return visitor.visitBinary(this);
    }
}

In this case rather than adding print functionality to Binary and Number, we'd create a PrettyPrintVisitor that takes care of the printing of all our classes.

So then later, when we need evaluation, typechecking or maybe even bytecode generation, we don't need to open up our Expr subclasses and add all this unrelated functionality to a class that could otherwise have remained happily closed.

Instead, we can create an EvaluationVisitor, a TypeCheckingVisitor and a ByteCodeGenerationVisitor that neatly encapsulate their functionality in a separate class:

interface ExprVisitor<T> {
    T visitBinary(Binary expr);
    T visitNumber(Number expr);
}

class EvaluationVisitor implements ExprVisitor<Double> {
    Double visitBinary(Binary expr) { /* ... */ }
    Double visitNumber(Number expr) { return expr.value; }
}

class TypeCheckingVisitor implements ExprVisitor<Type> {
    Type visitBinary(Binary expr) { /* ... */ }
    Type visitNumber(Number expr) { return Type.NUMBER; }
}

The EvaluationVisitor returns Double. The TypeCheckingVisitor returns Type. Each visitor knows its own return type, and Java's generics handle this cleanly.

The Problem

So let's try this in C++.

We need accept to be virtual. When we have an Expr* that's actually pointing to a Number, we want to call Number::accept, not some base class version. That's what dynamic dispatch is for.

We also want accept to work with different visitors that return different types. The natural C++ way to express "works with different types" is templates.

So we write:

class Expr {
public:
    template <typename T>
    virtual T accept(ExprVisitor<T>& visitor) = 0;  // <- error

    virtual ~Expr() = default;
};

And the compiler says:

'virtual' cannot be specified on member function templates

We want two things. C++ will only give us one.

Why?

To understand what's going on here, you need to know how C++ implements virtual functions.

When you declare a virtual function, the compiler creates a table of function pointers for each class. This is called the vtable. Every object of that class carries a hidden pointer to its class's vtable, and when you call a virtual function, the program looks up the right function pointer at runtime.

The important bit: the vtable must have a fixed size, known at compile time.

When the compiler sees:

class Shape {
    virtual std::string description();
};

It knows: "Shape's vtable needs one slot for description()." Done.

Now consider templates. When you write:

template <typename T>
T accept(ExprVisitor<T>& visitor);

This isn't one function. It's a recipe for generating functions. accept<double>, accept<Type>, accept<std::string>, accept<SomeTypeYouInventNextTuesday>. Each instantiation is a separate function.

The compiler doesn't know which instantiations you'll need until it sees all the code that calls accept. Maybe just two. Maybe two hundred. Maybe someone in a different .cpp file uses a type the compiler hasn't seen yet.

So how big should the vtable be?

You can't have a vtable with a slot for every possible template instantiation. That's infinite. Or at least unknowable. C++ doesn't do unknowable.

So it just says no. Virtual functions require all signatures to be known when the class is defined. Templates defer that knowledge until instantiation. You can have one or the other, not both.

The C++ Philosophy

Here's the thing. Java would just make this work. It would box things, add some runtime overhead, let the JVM figure it out. You wouldn't have to think about it.

C++ refuses. And I think this refusal is actually the point.

C++ has this principle: you don't pay for what you don't use. Want runtime polymorphism? Fine, you get vtables, but only for the functions you explicitly mark virtual. Want compile-time polymorphism? Use templates. Zero runtime overhead, full optimization, but no runtime flexibility.

What you don't get is both at once for free. Because "free" usually means "the cost is hidden somewhere."

Java hides the costs. C++ makes you see them. Neither is wrong, they're just different. But if you're the kind of person who likes knowing what your code is actually doing (and if you're reading about C++ vtable layouts, you probably are), there's something satisfying about a language that forces you to choose.

What To Do Instead

Okay, so how do you actually implement the visitor pattern in C++?

Option 1: Fix the return type

If all your visitors can return the same type, the problem goes away:

class ExprVisitor {
public:
    virtual Value visitBinary(Binary& expr) = 0;
    virtual Value visitNumber(Number& expr) = 0;
    virtual ~ExprVisitor() = default;
};

class Expr {
public:
    virtual Value accept(ExprVisitor& visitor) = 0;
    virtual ~Expr() = default;
};

Where Value is some variant or union type that can hold numbers, strings, types, whatever you need.

Trade-off: Every visitor must return Value, even if it only ever produces one type. You lose compile-time type checking on visitor return types.

Option 2: std::variant (if you know the types)

If you know all possible return types upfront:

using VisitorResult = std::variant<double, Type, std::string>;

class Expr {
public:
    virtual VisitorResult accept(ExprVisitorBase& visitor) = 0;
};

Trade-off: Closed set of types. Adding a new return type means modifying the variant. But you get type-safe access via std::visit.

Option 3: std::any

Accept that visitors return "something" and erase the type entirely:

class Expr {
public:
    virtual std::any accept(ExprVisitorBase& visitor) = 0;
};

// Usage:
std::any result = expr->accept(evaluator);
double value = std::any_cast<double>(result);

Trade-off: Runtime type checking. If you cast wrong, you get an exception. You've essentially reinvented Java's Object.

Option 4: Template the whole hierarchy (CRTP)

Use the Curiously Recurring Template Pattern to bake the visitor type into the class hierarchy:

template <typename V>
class Expr {
public:
    virtual typename V::result_type accept(V& visitor) = 0;
};

Trade-off: Your AST is now templated on the visitor type. Different visitors mean different AST instantiations. This can cause code bloat, and you lose the ability to have a polymorphic collection of expressions that works with any visitor.

Option 5: Just use Java

This is what I did.

Look, if you're writing an interpreter and you're going to be walking the AST at runtime anyway, the vtable overhead is noise. Java gives you the pattern directly, the code is cleaner, and you can focus on the interesting parts. Which, if you're working through Crafting Interpreters, is the language implementation, not fighting the type system.

The trade-off is that you don't learn as much about C++.

(But you learn something important about C++: when to use it and when not to.)

The Lesson

The visitor pattern wants to dispatch on two things at once: the type of the expression (Number, Binary, Unary) and the type of the operation (evaluate, typecheck, print). This is sometimes called double dispatch.

Java handles both at runtime. C++ makes you pick which one.

  • Virtual functions: runtime dispatch on object type, compile-time on operation
  • Templates: compile-time dispatch on everything, no runtime flexibility
  • Type erasure (std::any, std::variant): runtime dispatch on everything, but you pay for it

There's no secret fifth option where everything is dynamic and also free.

That compiler error, 'virtual' cannot be specified on member function templates, is C++ telling you: you've asked for two things that work differently. Pick one.

I picked Java.

But now I understand why I had to.